Direct Solutions of the Wright-Fisher Model

University of Calgary
PRISM: University of Calgary's Digital Repository
Graduate Studies, The Vault: Electronic Theses and Dissertations

Date: 2019-12-23
Author: Kryukov, Ivan
Citation: Kryukov, I. (2019). Direct Solutions of the Wright-Fisher Model (Unpublished doctoral thesis). University of Calgary, Calgary, AB.
Handle: http://hdl.handle.net/1880/111434
Type: doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.

Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Direct Solutions of the Wright-Fisher Model

by

Ivan Kryukov

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

GRADUATE PROGRAM IN BIOCHEMISTRY AND MOLECULAR BIOLOGY

CALGARY, ALBERTA
DECEMBER, 2019

© Ivan Kryukov 2019

Contents

List of figures
List of tables
Abstract
Preface
Acknowledgements

1 Introduction
  1.1 The Wright-Fisher model
    1.1.1 Transition probability matrix of the Wright-Fisher model
    1.1.2 Selection and mutation
    1.1.3 Probability of fixation
  1.2 Applications in evolutionary biology
    1.2.1 Mutation-selection limitations
  1.3 Direct computation of substitution rate

2 Wright-Fisher Exact Solver
  2.1 Introduction
  2.2 Results
    2.2.1 Implementation
    2.2.2 Evaluation
    2.2.3 Discussion
  2.3 Methods
    2.3.1 Finite absorbing Markov chain theory
    2.3.2 Rapid solution of restricted linear systems
    2.3.3 Parameterization of the Wright-Fisher model
    2.3.4 Rapid calculation of the transition matrix
  2.4 Additional Results
    2.4.1 Comparison of Exact and Approximate Results
    2.4.2 Effect of Truncation

3 Allele Age
  3.1 Introduction
  3.2 Results
    3.2.1 Validation by comparison to other methods
    3.2.2 Computational advantages of the exact approach
    3.2.3 Direct demonstration of classical results
    3.2.4 Selective strolls and stochastic slowdowns
    3.2.5 Fast mutation and age imbalance
    3.2.6 Allowing the starting number of copies to vary
  3.3 Discussion
  3.4 Materials and Methods
    3.4.1 Theory
    3.4.2 Implementation
  3.5 Simulations
  3.6 Supporting information
  3.7 Supplementary methods

4 Modelling Time-Heterogeneous Evolution and Changing Population Size
  4.1 Abstract
  4.2 Introduction
    4.2.1 Background
    4.2.2 Consideration of selection
    4.2.3 From fixation probabilities to allele frequency spectra
  4.3 Methods
    4.3.1 Time-homogeneous Wright-Fisher model
    4.3.2 Time-heterogeneous Wright-Fisher model
    4.3.3 Allele frequency spectrum calculation
  4.4 Results
    4.4.1 Fluctuating population size
    4.4.2 Increasing population size
    4.4.3 Distribution of allele frequencies
  4.5 Conclusions
  4.6 Supplement
    4.6.1 Supplementary methods
    4.6.2 AFS approximation scaling
    4.6.3 Supplementary figures

5 Rate of Substitution with Standing Genetic Variation
  5.1 Abstract
  5.2 Introduction
  5.3 Methods
    5.3.1 Rate of approach to equilibrium
    5.3.2 Rate of substitution in the Wright-Fisher model
    5.3.3 Modelling single-origin selective sweeps
    5.3.4 Modelling multiple- and single-origin selective sweeps by recurrent mutation
    5.3.5 Modelling multiple- and single-origin selective sweeps by recurrent mutation and standing genetic variation
    5.3.6 Simulations
  5.4 Results
    5.4.1 Approach to equilibrium
    5.4.2 Finite-sites substitution rate with bidirectional mutation, selection, and SGV
    5.4.3 Validation by simulation
    5.4.4 Sojourn times prior to absorption
  5.5 Conclusions
    5.5.1 Data availability
  5.6 Supplement
    5.6.1 Solving for equilibrium
    5.6.2 Approach to equilibrium via spectral decomposition
  5.7 Wright-Fisher simulation code
  5.8 Validation by simulation
  5.9 Supplementary figures

6 WFES2: New models, computations, and improved performance
  6.1 Abstract
  6.2 Introduction
    6.2.1 Available computations
  6.3 Methods and Results
    6.3.1 Time-dependent distributions
    6.3.2 Discrete phase-type distributions
  6.4 Small population size approximation and truncation
  6.5 Conclusions
  6.6 Supplement
    6.6.1 Adjustable sparsity threshold
    6.6.2 Distributions of time to absorption with selection

7 Future Directions
  7.1 Multiple Alleles
  7.2 Comparison to diffusion theory
  7.3 Coalescent

List of Figures

1.1 Change of number of copies per Wright-Fisher generation
2.1 Relative probabilities of fixation
2.2 Probabilities of fixation
2.3 A small N = 10 WF transition matrix
3.1 Distributions of allele age by simulation
3.2 Expected allele age and variance as a function of selection, dominance, and mutation rate
3.3 Expected extinction and fixation times when mutation is strong
3.4 Difference in conditional sojourn times (compared to neutral) for selected alleles going to extinction
3.5 Effect of integrating out uncertainty in p
3.6 Expected allele age and variance - larger range of mutation rates
3.7 Simulated neutral allele age distributions - larger range of mutation rates
3.8 Simulated non-neutral allele age distributions (θ = 0.01) - larger range of mutation rates
3.9 Simulated non-neutral allele age distributions (θ = 0.96) - larger range of mutation rates
4.1 Markov-Modulated Wright-Fisher model
4.2 Fluctuating population size in a reversible switching model - MMWF vs. HM
4.3 Instantaneous doubling of population size from N1 = 1000 to N2 = 2000, after an average of t generations - MMWF vs. HM
4.4 Instantaneous doubling of population size from N1 = 1000 to N2 = 2000, after an average of 5,000 generations in N1, with variable selection
4.5 Full allele frequency spectra after a non-equilibrium demography for neutral variants


Joint Bayesian Estimation of Mutation Location and Age Using Linkage Disequilibrium

A Maximum Likelihood Approach

The Evolutionary History of the CCR5-Δ32 HIV-Resistance Mutation

Population Genetics of Rare Variants and Complex Diseases

Joint Nonparametric Coalescent Inference of Mutation Spectrum

The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data

Convergent Adaptation of Human Lactase Persistence in Africa and Europe

Estimating Time to the Common Ancestor for a Beneficial Allele

The Geographic Spread of the CCR5 Δ32 HIV-Resistance Allele

Supplemental Materials

Linkage Disequilibrium — Understanding the Evolutionary Past and Mapping the Medical Future