Index

A-matrix, 262
Acceptance probability, 53
Accepted point , see PAM
Affine gap penalties, 378
AIC, see Akaike information criterion
Akaike information criterion, 20, 204, 293, 465
Alignment
  multiple, see Multiple alignment
  pairwise, see Pairwise alignment
Alignment algorithms, 375
  BLAST, 376
  Clustal, 376
  hidden Markov models, 376
  Smith-Waterman, 376
Alignment methods, 376
Alphabet, 328
  rate matrices, 11
AU test, 468, 480
Autocorrelated rate-variation model, 332
Autocorrelation parameter λ, 331

Background frequencies, 327
Background selection, 370
BAMBE program, 50, 449
Base composition, 355
Base composition evolution, 366
  a two-state model, 366
  selection coefficients, 366
  selection parameter, 368
Base composition variation, 355
  biased DNA repair, 357
  biased composition, 356
  empirical patterns, 355
  GC content, 356
  spectrum, 356
Base frequencies, 107, 188
Bayes factor, 152, 153, 204–206
Bayes’ theorem, 197
Bayesian approach, 46, 439
Bayesian dating, 244
Bayesian estimation, 184
  applications, 186
Bayesian hypothesis testing, 451
  predictive distributions, 441
Bayesian inference, 35, 46, 47, 184, 186, 188, 242, 467, 469, 470, 488, 489
  assessing uncertainty in phylogenetics, 467
  divergence times, 242, 244
  empirical approach, 118
  phylogenetic inference, 50
    hybrid samplers, 50
  posterior distribution, 46
  posterior probability, 35
  prior distribution, 35, 46
  unnormalized posterior distribution, 47
Bayesian information criterion, 337, 469
Bayesian methods, 35, 46, 184, 197
BEAST program, 399
Biased base composition, 362
Biases of probability values, 477
BIC, see Bayesian information criterion
Binomial distribution, 27
Birth-death process, 4, 5, 378, 380
Bivariate distribution, 150
Block , see Substitution matrix, BLOSUM
Bootstrap methods, 19, 142, 143, 149, 171, 199, 242, 249, 442, 472–476, 478, 485, 487, 488
  approximately unbiased tests, 474
  ML estimate, 32
  multiscale, 474
  nonparametric, 199, 200, 242, 249, 451, 472, 478, 485–487
  parametric, 19, 142, 149, 199, 294, 442, 451–453, 457, 478
  sampling error, 472
  speed improvements, 474
Bootstrap probability, 468, 473
Bootstrap replicate, 242, 473–476, 488
BP, see Bootstrap probability
Branch lengths, 235
Breakpoint graph, 308, 313, 314, 317
Brownian motion, 3, 4, 315
Burn-in, 55

Calibration point, 239, 240, 249
Calibration times, 216, 218, 219
Character history, 195, 442, 447
  sampling, 447
Character mapping, 440
  Bayesian approaches, 440
  maximum likelihood, 440
  parsimony, 440
Chromosomal fission, 307, 315
Chromosomal fusion, 307, 315
Chromosomal inversions, 307
  Bayesian approach, 311
  breakpoint graph, 308
  comparative map, 312, 318
  fortress of hurdles, 309
  hurdles, 308
  Markov chain Monte Carlo method, 311
  nonuniformity, 320
  signed permutation, 307
  unsigned permutation, 309
Chromosomal segment, 307
Chronological rate, 233–236, 252
Clades, 200, 215, 483
Codon bias, 360
Codon frequency, 360
Codon models, 12, 17, 20, 120, 144, 281, 338
  local, 147
  reversible, 106
Codon usage, 90, 105, 107, 108, 360, 361, 364, 366–370
Coevolution, 278–280
Coevolutionary Markov model, 279, 280
Computer underflow, 196
Confidence intervals, 19, 31, 132–134, 154, 171
CONSEL software, 468, 475
Context-dependent substitution, 333, 335
Continuous-time evolutionary model, 377
Continuous-time , 187, 296
Correlated character evolution, 458
Correlated rate change, 251
Covarion models, 17, 275
Cox test, 204, 479, 484, 485, 487
CpG islands, 357
cpREV model, 211, 265

Dayhoff model, 148, 262, 263
Degrees of freedom, 271
Dependent sampling, 45
Detailed balance condition, 48, 378
Dimension matching, 53
Dirichlet prior distribution, 214
DIST-PC model, 273, 274
Divergence times, 233
  Bayesian inference, 242
  branch lengths, 235
  estimation, 215–217, 233–235, 240, 248, 252
  local clock, 239
    overdispersed, 239
  multigene analyses, 250
  penalized likelihood, 240
  rate change, 251, 252
  uncertainties, 248
  uncertainty
    , 249
    topological, 250
DNA motif bias, 358
DNA repair
  biased, 357
  very short patch, 359
  vsp, see very short patch
DNA substitution matrices, see Substitution matrix, DNA

Effective divergence time, 426, 427, 430, 431
Effective number of codons, 360
Effective population size, 368
EM, see EM algorithm
EM algorithm, 342, 414–416, 418, 419
  continuous time, 418
  discrete time, 416
  E step, 417
  M step, 417
  tree EM, 420
Empirical Bayes, 118, 154
Empirical Bayesian mapping, 274
ENC, see Effective number of codons
Equilibrium frequencies, 272
Equilibrium length distribution, 378
Erdős-Rényi graph, 314
  edge occupancy probability, 314
Evolutionary constraints, 260, 270
  pattern, 267
Evolutionary distance, 363, 364, 410
Evolutionary divergence estimation, 363
Evolutionary rate, 15, 236
  modelling rate variations, 15
Expectation maximization, see EM algorithm
Expected amount of evolution, 146, 236, 237
Expected information, 31, 32
Exponential random variables, 189

F81, see Felsenstein model
FASTA format, 127
Felsenstein model, 159, 160, 203, 204
Felsenstein pruning algorithm, 328, 447
  likelihood calculation, 281
FISH, 319
Fisher information matrix, 132, 133
FIT-GEN model, 274
FIT-PC model, 271, 273, 274, 279
Fitness functions, 271
  amino acid, 279
  amino acid fitnesses, 271
  Metropolis-Hastings function, 272
Fitness model, 271, 272, 274, 279
  amino acid, 271
  coevolutionary, 279
Fluorescent in situ hybridization, see FISH
Forward algorithm, 328, 341
Forward-backward algorithm, 328, 331, 341, 387, 395

Gamma distribution, 16, 212, 266, 267, 273
Gamma shape parameters, 217
GC content, 356–358, 368
General time-reversible model, 203, 263, 363
Generalized Dirichlet prior, 247
Generation length, 235
Genetic drift, 64, 67, 69–71, 79, 80, 89, 90
Genetic markers, 289
Genome rearrangement, 307
  breakpoint distance, 310, 312
  breakpoint graph, 308, 313, 314
  chromosomal fission, 307, 315
  chromosomal fusion, 307, 315
  chromosome segment
    syntenic, 320
  chromosome shuffling, 310
  coagulation-fragmentation process, 313
  conserved segments, 316
  cycle structure, 313
  distance, 312
  inversion tract lengths, 321
  inversions, 307, see Chromosomal inversions
  maximum parsimony, 310
  n-inversion chain, 309
    chromosome markers, 309
  Nadeau and Taylor method, 318
  number of inversions, 307, 310–312, 317
  parsimony, 312, 320
  parsimony distance, 312
  parsimony methods, 307
  permutation cycles, 308
  random transpositions, 312
  reciprocal translocations, 307, 315
  θ-inversion model, 321
Genomic distance, 315
Genomic signature, 359
Gibbs sampler, 49, 50
  random-scan, 49, 50
  systematic-scan, 49, 50
Graph, 338
  directed, 338
  edges, 338
  nodes, 338
  undirected, 338
  vertices, 338
Graphical models, 325, 338
  belief-propagation algorithm, 340–342
  elimination algorithm, 340, 341
  junction-tree algorithm, 342
  Markov chain Monte Carlo, 344
  moralization, 342
  parents, 338
  probabilistic inference, 339
GTR model, see General time-reversible model

Hardy-Weinberg equilibrium, 36
  maximum likelihood estimator, 36
HBL, see HyPhy Batch Language
Heterogeneity models over time, 275
Hidden Markov model, 268, 325, 326, 385, 386, 389
  across sites, 268
  emission-equivalent, 389
  hidden classes, 268
  hidden path, 326
  HMMER, 391
  matrix of state-transition probabilities, 327
  multiple alignment, 391
  path, 326
  path-equivalent, 389
  phylogenetic models, 325
  posterior probability, 326
  recombination events, 325
  SAM, 391
  secondary structure prediction, 325
  silent states, 392
Hidden site classes, 273
Higher-order Markov models, 333
Hill-Robertson effect, 370
HKY model, 11, 328, 336, 337, 363
HKY85 model, 52–54, 127, 187, 188, 190, 192, 196, 199, 203–205, 228
Holding times, 296
Homotachy, 275
HP algorithm, 307
Hypermutability, 358
HyPhy, 125
  Alignment data, 159
  data filter, 127, 128
  data set, 127
  defining a likelihood function, 161
  HKY85, 127, 129
  hypothesis testing, 141
  instantaneous rate matrix, 127
  likelihood function, 128, 130
  local branch parameters, 135
  maximizing the likelihood, 162
  MLE, 130, 132
  model description, 160
  multiple partitions, 139
  object inspector, 136
  input, 161
  substitution models, 127
  tree, 128
  tree viewer, 131
HyPhy batch files, 159, 162
HyPhy batch language, 157, 158, 162
  analyzing codon data, 178
  model definition, 162
  molecular clocks, 168
  simulation tools, 170
  site-to-site rate heterogeneity, 175
Hypothesis testing, 19, 33, 113, 139, 141, 148
  acceptance region, 33
  alternative hypothesis, 33
  null hypothesis, 33
  rejection region, 33
  significance level, 34
  type I error, 34
  type II error, 34

Indel rate per fragment, 382
Indels, 377
Independent sites–structurally constrained evolution, see IS-SCPE method
Individual, 26
Instantaneous transition matrix, 263
Instantaneous transition rate matrix, 262
IS-SCPE method, 270
Ising model, 343
Isochore, 357

JC69 model, see Jukes and Cantor model
JTT model, 12, 149, 264, 265
JTT+Γ model, 270
Jukes and Cantor model, 10, 36, 201, 363, 445
  maximum likelihood estimator, 37

KH test, see Kishino-Hasegawa test
Kimura two-parameter model, 363
Kimura’s formula, 366
Kishino-Hasegawa test, 482, 484, 485, 487
KL divergence, 465, 466
Kolmogorov’s forward equations, 379
Kullback-Leibler divergence, see KL divergence

Likelihood, 193
Likelihood function, 8, 9, 25–27, 46, 106, 107, 130, 184, 185
  Bayesian inference, 46, 467
  binomial distribution, 27
  multiple-parameter models, 184
  phylogenetic, 269, 280
  , 39
  tree, 8, 464
Likelihood methods, 25, 464
Likelihood profile, 133, 135
Likelihood ratio, 19, 48, 465
Likelihood ratio test, 19, 34, 104
  generalized, 34
Lineage sorting, 77
Link, 378
  immortal, 378
Local algorithm, 50
Local clocks, 239
Log-likelihood, 9, 31, 196, 464
Long indel model, 383, 384, 392
LR, see Likelihood ratio
LR statistics, 19
LRT, 19, see Likelihood ratio test

MAP, see Maximum a posteriori
Markov chain, 3, 187, 325, 408
  continuous-time, 5, 296
  EM algorithm, see EM algorithm
  equilibrium distribution, 409
  ergodic, 6
  higher-order, 333
  homogeneous, 409
  inhomogeneity, 427
  posterior probability, 17
  rate matrix, 409
  resolvent, 422
  reversible, 409
  stationary, 409
  stationary distribution, 6
  substitution matrix, 408
  time reversibility, 7
  time-reversible, 48
  transition probabilities, 409
    calculations, 7
  transition rates, 5
Markov chain Monte Carlo, 45, 197, 469
  assessing convergence, 54
  burn-in, 55, 57, 58, 400, 470
  Metropolis-coupled, 58, 471
    cold chain, 59
    hot chain, 59
  reversible jump, 52
    dimension matching, 53
  temperature, 470
  trace plots, 55
Markov models, 3, 105, 189, 259, 327, 362, 386
  codon evolution, 105
  coevolutionary, 279
  continuous-time, 263
  emission probability, 329, 333
  forward algorithm, 328
  higher-order, 333
  matrix of state-transition probabilities, 327
  phylogenetic analyses, 18
  REV, 261
  sequence evolution, 10, 362
    continuous-time transition matrix, 362
  state-transition diagram, 332
  Viterbi algorithm, 328
Markov process, see Markov chain
Markov property, 3, 408
Markov-dependent models, 335
Maximum a posteriori, 469
Maximum likelihood, 20, 28, 103–105, 184, 264, 464
  computing the estimate, 29
  confidence intervals, 31
  estimate, 20, 28, 464
    bootstrap, 32
    efficiency, 33
    variance, 31
  estimation, 28
  estimator, 184
  expected information, 31
  hypothesis testing, 33
  likelihood equation(s), 29
  multinomial distributions, 30
  PAML, 104
  parameter estimation, 20
  phylogenetic inference, 463
  phylogeny estimation, 197
  point estimate, 31
  support, 29
  variance, 242
Maximum parsimony, 310
MB, see Multiscale bootstrap method
MCMC, see Markov chain Monte Carlo
MCMCMC, see Markov chain Monte Carlo, Metropolis-coupled
MCS, see Multi-species conserved sequences
Mechanistic models, 270
Message Passing Interface, 155
Metropolis-coupled MCMC, see Markov chain Monte Carlo, Metropolis-coupled
Metropolis-Hastings algorithm, 47, 48, 247
  candidate, 48
  Hastings ratio, 48
  likelihood ratio, 48
  posterior ratio, 48
  prior ratio, 48
  proposal distribution, 47
  proposal ratio, 48
  target ratio, 48
MG94×HKY85 3×4 model, 144–146, 149–151, 156
Microsatellite evolution
  models, 291
  point mutations, 293
  random walk models, 291
  slippage model, 291
  stepwise mutation model, 291
  symmetric slippage models, 295
Microsatellite markers, 289
Microsatellites, 289
  PCR, 294
  Polymerase chain reaction, see PCR
Mixed data analyses, 155
ML, see Maximum likelihood
MLE, see Maximum likelihood estimate
Model, 26
Model estimation
  likelihood methods, 264
Model selection tests, 463, 482
Modeling correlated evolution between sites, 278
MODELTEST program, 204, 207, 465
Molecular clock, 145, 234, 235
  hypothesis, 235
Monte Carlo, 45
Monte Carlo approximations, 191
Monte Carlo procedure, 191
Monte Carlo simulation, 191
MPI, see Message Passing Interface, 155
MrBayes, 183
  phylogenetic inference
    complex models, 208
MrBayes program, 449
MT126 model, 270
Multi-species conserved sequences, 331
Multigene analyses, 250
Multinomial distributions, 30
Multiple alignment, 376, 390, 391, 393
  algorithms, 393
  corner-cutting methods, 396
  goodness, 390
  HMMER, 391
  multiple forward-backward algorithm, 395
  multiple HMM, 391
  multiple Viterbi algorithms, 396
  phylogenetic inference, 390
  programs, 391
    ClustalW, 391
    DiAlign, 391
    PSI-Blast, 391
    T-Coffee, 391
  SAM, 391
  score-based approach, 390
  time complexity, 391, 394
  TKF91 model, 392
  TKF92 model, 392
  transducers, 392
Multiple alignments, 208
Multiple substitutions, 364
Multiscale bootstrap method, 474
  CONSEL software, 475
Multivariate normal densities, 248
  covariance matrix, 248
Mutability, 262
Mutation bias, 357
Mutation probability matrix, 262
Mutation rate, 235
Mutation rate matrix, see Instantaneous transition rate matrix
Mutation-selection-drift theory, 366

Nadeau and Taylor method, 318
, 79, 364
Nearly neutral molecular evolution, 84
Negative selection, 67
Neutral theory, 65, 67, 266
NEXUS format, 127, 228
Node rates, 244, 245
Node time constraints, 247
Node times, 245
Node-dating, 241
Non-independence, 364
Nonhomogeneous models, 13
Nonindependence between sites, 14
Nonparametric bootstrap, see Bootstrap methods, nonparametric
Nonphylo-HMM, 330, 331
Nonstationarity, 364
Nonsynonymous rate change, 252
Nonsynonymous substitution, 103, 105, 140, 146, 335, 370
Nonsynonymous substitution rate, 108, 144, 156
NP problems, 390
Nucleotide composition, 356
  GC content variation, 356
Nucleotide-level sensitivity, 330

Overdispersed molecular clock, 239
Oxford graph, 320
Oxford grid, 320

P-matrix, see Substitution probability matrix
Pairwise alignment, 375, 384
  hidden Markov models, 384
  likelihood calculations, 384
  nonequivalence of paths, 390
  time complexity, 385
  TKF models, 384
    hidden Markov models, 386
PAM, 263, 410
PAML software, 104, 210
Parameter estimation, 20
Parametric bootstrap, see Bootstrap methods, parametric
Parsimony, 195, 307, 376
Parsimony mapping, 443
PASSML model, 268, 269, 273, 278
PASSML-TM model, 269
Pattern heterogeneity
  across sites, 267
PAUP* program, 41, 197, 203–205
PB, see Bootstrap methods, parametric
Penalized likelihood, 240
Penalized log-likelihood, 241
Penalty function, 241
PHAS, see Pattern heterogeneity across sites
PHAS model, 270, 271, 275
PHYLIP format, 127
Phylo-HMM, 325–327
  as graphical models, 338
  autocorrelated rate-variation model, 332
  autocorrelation parameter λ, 331
  context-dependent substitution, 346
  full process-based model, 346
  highly conserved regions, 331
  HKY, 336
  junction-tree algorithm, 342
  Markov-dependent model, 346
  matrix of state-transition probabilities, 327
  simple-lattice model, 346
  toy gene finding, 329
  U2S, 336, 337
  U3S, 336, 337
  UNR, 336
Phylogenetic analyses, 18
  simulations, 18
Phylogenetic hidden Markov models, see Phylo-HMM
Phylogenetic inference, 187, 463
  assessing uncertainty, 463
    Bayesian inference, 467
  Bayesian approach, 50
  Bayesian information criterion, 469
  Bootstrap methods, 472
  Cox test, 479, 484, 485, 487
  hypothesis testing, 478
    combining nonnested models, 481
  maximum likelihood
    Akaike information criterion, 465
    substitution process selection, 465
    tree topology selection, 466
  maximum likelihood estimate, 464
  model selection tests, 482
    Kishino-Hasegawa test, 482
    multiple-comparisons, 485
  MrBayes, 208
  multiple alignment, 390
  SOWH test, 479
Phylogenetic model, 325, 327
  Bayesian, 196
  context-dependent, 335
  hidden Markov model, 325
  Markov models, 18
  Phylo-HMM, 325
Phylogenetic tree, 8
  EM, 420
  Likelihood function
    calculations, 39
  likelihood function, 39
  maximum likelihood, 41
  MLE, 127
  posterior probabilities, 200
  rooted, 187
  unrooted, 187
Phylogeny estimation, 183, 196
Physicochemical properties, 261
Point accepted mutation, see PAM
Poisson random variable, 315
Population, 26
Positive selection, 67, 103, 458
  detection methods, 458
Posterior decoding, 390
Posterior distribution, 46
Posterior mapping, 439, 442
Posterior predictive distributions, 439, 441
Posterior predictive probabilities, 441
Posterior predictive values, 441, 455
Posterior probabilities on trees, 200
Posterior probability, 35, 185, 467
Potential function, 342
PP, see Posterior probability
Predictive distributions, 451
  parametric bootstrap, 451
Prior distribution, 35, 46
Prior probability, 185
Probabilistic inference, 339
Probabilistic models, 338
Protein evolutionary models, 259
Protein folding, 260
Purifying selection, 103

R-matrix, see Relative rate matrix
Rate heterogeneity
  across sites, see RHAS model
  Gamma distribution, 266
Rate heterogeneity across time, see RHAT model
Rate matrix, 6, 10, 187
  amino acid models, 11
  codon models, 12
  DNA models, 10
  nonhomogeneous models, 13
Rate matrix, Q, 188
Rate multiplier, 212
Rate trajectory, 238
Rate variation parameter ν, 245
Recombination, 89
Relative rate matrix, 263
Relative ratio test, 143
Relative substitution rates, 268
Relative synonymous codon usage, 360
RELL method, 474, 486, 488
Replication slippage, 290
RES, 413, 422
Residue disequilibrium value, 280
Resolvent method, 422
REV model, 11, 164, 261, 263–267, 270, 273, 274
REV+Γ model, 267
Reversible jump MCMC, 52, 205
Reversible rate matrix, 413
  estimation
    ML I, 413
    ML II, 414
    pair EM, 416
  Maximum likelihood, 413
  RES, 422
  resolvent method, 422
  tree EM, 420
RHAS model, 266, 275, 277
RHAT model, 275, 277
Rooted trees, 187
RSCU, see Relative synonymous codon usage
rtREV model, 265

Sample, 26
Secondary structure, 260
Selective interference, 370
Sequence evolution
  Markov models, 10
Shared rate, 238
Signal detection, 345
SIMMAP program, 195, 443
Simulations, 18
Site-specific models, 212
Site-to-site dependence, 269
SOWH test, 479–481, 485, 487
Spatial rate heterogeneity, 149
SS, see Site-specific models
Star tree, 391
State-transition diagram, 332
Stationary distribution, 6, 192
Stationary frequencies, 7, 193, 210
Statistical alignment, 375
  Markov chain Monte Carlo, 397
  multiple, see Multiple alignment
Stem regions, 209
Stirling’s formula, 314
Stochastic models, 183
Substitution function
  amino acid, 271
Substitution matrix, 10, 365, 407, 408
  amino acid, 407
  amino acid models, 11
  blocks, 423
  BLOSUM, 407, 408, 410, 413, 423–427, 430–433
  BLOSUMθ, 424
  BLOSUM40, 425
  BLOSUM45, 413
  BLOSUM62, 413
  BLOSUM80, 425
  calibration, 410
  codon models, 12
  comparison, 426
    simulations, 431
  DNA, 434
    HOXD70, 434
  DNA models, 10
  estimation, 407
  Jukes and Cantor, 10
  Markov process, 408
  nonhomogeneous models, 13
  PAM, 407, 408, 410–413, 415, 426
  PAM1, 410
  PAM160, 413
  substitution score, 407
  theoretical comparison, 430
, 36, 457
Substitution probability matrix, 261
Substitution process selection, 465
Substitution rate, 368
Substitution rate matrix, 327
Substitutions per synonymous site, 365
Swofford-Olsen-Waddell-Hillis, see SOWH test
Synonymous rate change, 252
Synonymous substitution, 103
Syntenic segment, 320

Taxon bipartitions, see Clades
Tertiary structure, 260
Test statistic, 33, 34
Thermodynamic Hypothesis, 260
Time reversibility, 7
Time-reversible, 48
TKF models, 377, 384
  hidden Markov models
    algorithms, 387
    forward-backward algorithm, 387
    maximum likelihood path, 389
    Viterbi algorithm, 388
  likelihood calculations, 387
  maximum likelihood, 382
  parameters, 382
TKF91 model, 377, 378, 392, 394, 395, 398
  deletion rate, 379
  hidden Markov models, 387
  transition rate, 378
TKF92 model, 377, 381, 392
  fragments, 381
  indel, 381
tmREV matrix, 264
TN93 model, 52–54
Total variation distance, 310
Trace plots, 55
Transition probabilities, 7, 188, 191, 409
Transition probability matrix, 127, 191, 192
Transition rates
  continuous time, 5
Transition/transversion rate ratio, 188, 328
Transitions, 10
Translational efficiency, 361
Transversions, 10
Tree, 38
  branch length, 38
  branches, 38
  likelihood function, 464
  nodes, 38
Type I error, 34
Type II error, 34

Unrooted trees, 187

Variable, 26
Viterbi algorithm, 328, 388, 389, 393, 396

WAG matrix, 265
Waiting time, 189
Weighted SH test, 486
Wright-Fisher model, 71