A-Matrix, 262 Acceptance Probability, 53 Accepted Point Mutations, See

Index

A-matrix, 262
Acceptance probability, 53
Accepted point mutations, see PAM
Affine gap penalties, 378
AIC, see Akaike information criterion
Akaike information criterion, 20, 204, 293, 465
Alignment
  multiple, see Multiple alignment
  pairwise, see Pairwise alignment
Alignment algorithms, 375
  BLAST, 376
  Clustal, 376
  hidden Markov models, 376
  Smith-Waterman, 376
Alignment methods, 376
Alphabet, 328
Amino acid rate matrices, 11
AU test, 468, 480
Autocorrelated rate-variation model, 332
Autocorrelation parameter λ, 331

Background frequencies, 327
Background selection, 370
BAMBE program, 50, 449
Base composition, 355
Base composition evolution, 366
  a two-state model, 366
  selection coefficients, 366
  selection parameter, 368
Base composition variation, 355
  biased mutation
    DNA repair, 357
  biased nucleotide composition, 356
  empirical patterns, 355
  GC content, 356
  spectrum, 356
Base frequencies, 107, 188
Bayes factor, 152, 153, 204–206
Bayes’ theorem, 197
Bayesian approach, 46, 439
Bayesian dating, 244
Bayesian estimation, 184
  applications, 186
Bayesian hypothesis testing, 451
  predictive distributions, 441
Bayesian inference, 35, 46, 47, 184, 186, 188, 242, 467, 469, 470, 488, 489
  assessing uncertainty in phylogenetics, 467
  divergence times, 242, 244
  empirical approach, 118
  phylogenetic inference, 50
    hybrid samplers, 50
  posterior distribution, 46
  posterior probability, 35
  prior distribution, 35, 46
  unnormalized posterior distribution, 47
Bayesian information criterion, 337, 469
Bayesian methods, 35, 46, 184, 197
BEAST program, 399
Biased base composition, 362
Biases of probability values, 477
BIC, see Bayesian information criterion
Binomial distribution, 27
Birth-death process, 4, 5, 378, 380
Bivariate distribution, 150
Block substitution matrix, see Substitution matrix, BLOSUM
Bootstrap methods, 19, 142, 143, 149, 171, 199, 242, 249, 442, 472–476, 478, 485, 487, 488
  approximately unbiased tests, 474
  ML estimate, 32
  multiscale, 474
  nonparametric, 199, 200, 242, 249, 451, 472, 478, 485–487
  parametric, 19, 142, 149, 199, 294, 442, 451–453, 457, 478
  sampling error, 472
  speed improvements, 474
Bootstrap probability, 468, 473
Bootstrap replicate, 242, 473–476, 488
BP, see Bootstrap probability
Branch lengths, 235
Breakpoint graph, 308, 313, 314, 317
Brownian motion, 3, 4, 315
Burn-in, 55

Calibration point, 239, 240, 249
Calibration times, 216, 218, 219
Character history, 195, 442, 447
  sampling, 447
Character mapping, 440
  Bayesian approaches, 440
  maximum likelihood, 440
  parsimony, 440
Chromosomal fission, 307, 315
Chromosomal fusion, 307, 315
Chromosomal inversions, 307
  Bayesian approach, 311
  breakpoint graph, 308
  comparative map, 312, 318
  fortress of hurdles, 309
  hurdles, 308
  Markov chain Monte Carlo method, 311
  nonuniformity, 320
  signed permutation, 307
  unsigned permutation, 309
Chromosomal segment, 307
Chronological rate, 233–236, 252
Clades, 200, 215, 483
Codon bias, 360
Codon frequency, 360
Codon models, 12, 17, 20, 120, 144, 281, 338
  local, 147
  reversible, 106
Codon usage, 90, 105, 107, 108, 360, 361, 364, 366–370
Coevolution, 278–280
Coevolutionary Markov model, 279, 280
Computer underflow, 196
Confidence intervals, 19, 31, 132–134, 154, 171
CONSEL software, 468, 475
Context-dependent substitution, 333, 335
Continuous-time evolutionary model, 377
Continuous-time Markov chain, 187, 296
Correlated character evolution, 458
Correlated rate change, 251
Covarion models, 17, 275
Cox test, 204, 479, 484, 485, 487
CpG islands, 357
cpREV model, 211, 265

Dayhoff model, 148, 262, 263
Degrees of freedom, 271
Dependent sampling, 45
Detailed balance condition, 48, 378
Dimension matching, 53
Dirichlet prior distribution, 214
DIST-PC model, 273, 274
Divergence times, 233
  Bayesian inference, 242
  branch lengths, 235
  estimation, 215–217, 233–235, 240, 248, 252
  local clock, 239
  molecular clock
    overdispersed, 239
  multigene analyses, 250
  penalized likelihood, 240
  rate change, 251, 252
  uncertainties, 248
  uncertainty
    fossil, 249
    topological, 250
DNA motif bias, 358
DNA repair
  biased, 357
  very short patch, 359
  vsp, see very short patch
DNA substitution matrices, see Substitution matrix, DNA

Effective divergence time, 426, 427, 430, 431
Effective number of codons, 360
Effective population size, 368
EM, see EM algorithm
EM algorithm, 342, 414–416, 418, 419
  continuous time, 418
  discrete time, 416
  E step, 417
  M step, 417
  tree EM, 420
Empirical Bayes, 118, 154
Empirical Bayesian mapping, 274
ENC, see Effective number of codons
Equilibrium frequencies, 272
Equilibrium length distribution, 378
Erdös-Renyi graph, 314
  edge occupancy probability, 314
Evolutionary constraints, 260, 270
  pattern, 267
Evolutionary distance, 363, 364, 410
Evolutionary divergence estimation, 363
Evolutionary rate, 15, 236
  modelling rate variations, 15
Expectation maximization, see EM algorithm
Expected amount of evolution, 146, 236, 237
Expected information, 31, 32
Exponential random variables, 189

F81, see Felsenstein model
FASTA format, 127
Felsenstein model, 159, 160, 203, 204
Felsenstein pruning algorithm, 328, 447
  likelihood calculation, 281
FISH, 319
Fisher information matrix, 132, 133
FIT-GEN model, 274
FIT-PC model, 271, 273, 274, 279
Fitness functions, 271
  amino acid, 279
  amino acid fitnesses, 271
  Metropolis-Hastings function, 272
Fitness model, 271, 272, 274, 279
  amino acid, 271
  coevolutionary, 279
Fluorescent in situ hybridization, see FISH
Forward algorithm, 328, 341
Forward-backward algorithm, 328, 331, 341, 387, 395

Gamma distribution, 16, 212, 266, 267, 273
Gamma shape parameters, 217
GC content, 356–358, 368
General time-reversible model, 203, 263, 363
Generalized Dirichlet prior, 247
Generation length, 235
Genetic drift, 64, 67, 69–71, 79, 80, 89, 90
Genetic markers, 289
Genome rearrangement, 307
  breakpoint distance, 310, 312
  breakpoint graph, 308, 313, 314
  chromosomal fission, 307, 315
  chromosomal fusion, 307, 315
  chromosome segment
    syntenic, 320
  chromosome shuffling, 310
  coagulation-fragmentation process, 313
  conserved segments, 316
  cycle structure, 313
  distance, 312
  inversion tract lengths, 321
  inversions, 307, see Chromosomal inversions
  maximum parsimony, 310
  n-inversion chain, 309
    chromosome markers, 309
  Nadeau and Taylor method, 318
  number of inversions, 307, 310–312, 317
  parsimony, 312, 320
  parsimony distance, 312
  parsimony methods, 307
  permutation cycles, 308
  random transpositions, 312
  reciprocal translocations, 307, 315
  θ-inversion model, 321
Genomic distance, 315
Genomic signature, 359
Gibbs sampler, 49, 50
  random-scan, 49, 50
  systematic-scan, 49, 50
Graph, 338
  directed, 338
  edges, 338
  nodes, 338
  undirected, 338
  vertices, 338
Graphical models, 325, 338
  belief-propagation algorithm, 340–342
  elimination algorithm, 340, 341
  junction-tree algorithm, 342
  Markov chain Monte Carlo, 344
  moralization, 342
  parents, 338
  probabilistic inference, 339
GTR model, see General time-reversible model

Hardy-Weinberg equilibrium, 36
  maximum likelihood estimator, 36
HBL, see HyPhy Batch Language
Heterogeneity models over time, 275
Hidden Markov model, 268, 325, 326, 385, 386, 389
  across sites, 268
  emission-equivalent, 389
  hidden classes, 268
  hidden path, 326
  HMMER, 391
  matrix of state-transition probabilities, 327
  multiple alignment, 391
  path, 326
  path-equivalent, 389
  phylogenetic models, 325
  posterior probability, 326
  recombination events, 325
  SAM, 391
  secondary structure prediction, 325
  silent states, 392
Hidden site classes, 273
Higher-order Markov models, 333
Hill-Robertson effect, 370
HKY model, 11, 328, 336, 337, 363
HKY85 model, 52–54, 127, 187, 188, 190, 192, 196, 199, 203–205, 228
Holding times, 296
Homotachy, 275
HP algorithm, 307
Hypermutability, 358
HyPhy, 125
  Alignment data, 159
  data filter, 127, 128
  data set, 127
  defining a likelihood function, 161
  HKY85, 127, 129
  hypothesis testing, 141
  instantaneous rate matrix, 127
  likelihood function, 128, 130
  local branch parameters, 135
  maximizing the likelihood, 162
  MLE, 130, 132
  model description, 160
  multiple partitions, 139
  object inspector, 136
  phylogenetic tree input, 161
  substitution models, 127
  tree, 128
  tree viewer, 131
HyPhy batch files, 159, 162
HyPhy batch language, 157, 158, 162
  analyzing codon data, 178
  model definition, 162
  molecular clocks, 168
  simulation tools, 170
  site-to-site rate heterogeneity, 175
Hypothesis testing, 19, 33, 113, 139, 141, 148
  acceptance region, 33
  alternative hypothesis, 33
  null hypothesis, 33
  rejection region, 33
  significance level, 34
  type I error, 34
  type II error, 34

Indel rate per fragment, 382
Indels, 377
Independent sites–structurally constrained protein evolution, see IS-SCPE method
Individual, 26
Instantaneous transition matrix, 263
Instantaneous transition rate matrix, 262
IS-SCPE method, 270
Ising model, 343
Isochore, 357

JC69 model, see Jukes and Cantor model
JTT model, 12, 149, 264, 265
JTT+Γ model, 270
Jukes and Cantor model, 10, 36, 201, 363, 445
  maximum likelihood estimator, 37

KH test, see Kishino-Hasegawa test
Kimura two-parameter model, 363
Kimura’s formula, 366
Kishino-Hasegawa test, 482, 484, 485, 487
KL divergence, 465, 466
Kolmogorov’s forward equations, 379

MAP, see Maximum a posteriori
Markov chain, 3, 187, 325, 408
  continuous-time, 5, 296
  EM algorithm, see EM algorithm
  equilibrium distribution, 409
  ergodic, 6
  higher-order, 333
  homogeneous, 409
  inhomogeneity, 427
  posterior probability, 17
  rate matrix, 409
  resolvent, 422
  reversible, 409
  stationary, 409
  stationary distribution, 6
  substitution matrix, 408
  time reversibility, 7
  time-reversible, 48
  transition probabilities, 409
    calculations, 7
  transition rates, 5
Markov chain Monte Carlo, 45,
