Booker2018 Redacted.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (e.g. PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work is protected by copyright and other intellectual property rights, which are retained by the thesis author, unless otherwise stated. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. Understanding variation in nucleotide diversity across the mouse genome Tom Booker Submitted for the degree of Doctor of Philosophy University of Edinburgh 2018 Declaration I declare that the thesis has been composed by myself and that the work has not be submitted for any other degree or professional qualification. I confirm that the work submitted is my own, except where work which has formed part of jointly-authored publications has been included. My contribution and those of the other authors to this work have been explicitly indicated below. I confirm that appropriate credit has been given within this thesis where reference has been made to the work of others. I composed this thesis, under guidance from Professor Peter Keightley, and with contributions from other authors for individual chapters as stated below. Parts of the introduction were included in a published review paper. Chapter 2 is published and Chapter 3 is a manuscript currently under review. I have retained the use of "we" in Chapters 2-3 and, for consistency, also use \we" in Chapter 4. I use \I" for Chapter 5. All published papers are reproduced in the Appendix. I confirm that I carried out all analyses myself. Chapter 2 I designed the analyses with Professor Peter Keightley and Dr Rob Ness. I performed all analyses and wrote the paper, under guidance from Professor Keightley and Dr Ness. Chapter 3 I designed the analyses for this project with Professor Peter Keightley. I performed all analyses and wrote the paper, under guidance from Professor Keightley. Chapter 4 I designed the analyses for this project with Professor Peter Keightley and Professor Brian Charlesworth. I performed all analyses and wrote the paper, under guidance from Professors Keightley and Charlesworth. Tom Booker - 22nd October 2018 i Thesis Abstract It is well known that nucleotide diversity varies across the genomes of eukaryotic species in ways consistent with the effects of natural selection. However, the contribution of selection on advantageous and deleterious mutations to the observed variation is not well understood. In this thesis, I aim to disentangle the contribution of background selection and selective sweeps to patterns of genetic diversity in the mouse genome, thus furthering our understanding of natural selection in mammals. In chapter 1, I introduce core concepts in evolutionary genetics and describe how recombination and selection interact to shape patterns of genetic diversity. I will then describe three projects in which I examine aspects of molecular evolution in house mice. In the first of these, I estimate the landscape of recombination rate variation in wild mice using population genomic data. In the second, I estimate the distribution of fitness effects for new mutations, based on the site frequency spectrum, then analyse population genomic simulations parametrized using my estimates. In the third, I use a model of selective sweeps to estimate and compare the strength of selection in protein-coding and regulatory regions of the mouse genome. This thesis demonstrates that selective sweeps are responsible for a large amount of the variation in genetic diversity across the mouse genome. ii Lay Summary Evolutionary genetics is the study of how and why an organisms' genetic code changes through time and space. In organisms such as humans and mice, a large proportion of the mutations that arise in any given generation are benign. Evolutionary genetic model have shown that mutations that are not benign, whether they are beneficial or harmful, can influence variation in genetic diversity across the genome. In the last 30 years or so, numerous studies have shown that patterns of genetic diversity in mammals are consistent with the predictions of such models. What remains unclear is the contribution that beneficial and harmful mutations make to variation in genetic diversity. This thesis aims to determine how natural selection, which acts to remove harmful mutations and promote beneficial ones, affects levels of genetic diversity in mammals. To this end, I study wild mice as they are a highly tractable model organism. I analyse the genome sequences of 10 wild mice using population genetic models to estimate the strength and frequency of natural selection. Using my estimates I test how natural selection contributes to variation in genetic diversity in the mouse genome. The research presented in this thesis confirms that natural selection plays a major role in shaping patterns of diversity in the mouse genome. Furthermore, this thesis demonstrates that selection on beneficial mutations is responsible for a considerable proportion of the variation. iii I dedicate this thesis to my good friend Arya NCSJTN83 iv Acknowledgements First and foremost, thanks to Peter Keightley. He has been an excellent supervisor and mentor for the last four years. Peter's help and guidance has helped me develop in many ways and has made my PhD an incredibly enjoyable and rewarding experience. Peter's thorough and rigorous approach to science is something that I aspire to. Thanks for introducing me to disc golf too! I would also thank Brian Charlesworth for helping me understand the thornier aspects of population genetics that I have come across. Brian has been very patient with me and willing to give help and advice whenever I have asked for it, which has been hugely helpful throughout my PhD. Thanks to Deborah Charlesworth for being very kind and generous in helping me understand many aspects of evolutionary biology, not limited to my own research. Thanks to participants of the evolutionary genetics lab group and genetics journal club in IEB for great discussions and feedback about my research, particularly Susie Johnston and Konrad Lohse. I would also extend massive thanks to Sally Otto and members of her lab group at UBC for giving me such a welcoming place to work and write-up my thesis in Vancouver. Thanks to members of the Keightley lab past and present for listening to me go on and on about selective sweeps or other things for the past four years. In no particular order: thanks Ben Jackson, Rory Craig, Susanne Kraemer, Eva Deinum, Thanasis Kousathanas and Matty Hartfield. Rob Ness taught me to code and helped me out in numerous ways, it's just a shame he left Edinburgh when he did. Without the help of Dan Halligan, who tolerated me bugging him even after he left the lab, I would have been very lost. On the personal side of things, my family has been very supportive throughout my whole PhD, particularly my amazing Mum and Dad. My brothers and their SAPs (spouses and partners) are all really great, but Stella, you really nailed it. Thanks to my friends in IEB and the whole Berwickshire gang for palling around with me in Edinburgh. Thanks to Jaz and Michael for all the great hangs in Vancouver. Finally, Arya has looked after me and kept my head straight for the past four years. She is the best and even married me during this PhD, can you believe that? v Publications The following publications have arisen from this thesis: • Booker, T. R., Ness, R. W., & Keightley, P. D. (2017). The recombination landscape in wild house mice inferred using population genomic data. Genetics, 207(1), 297-309. • Booker, T. R., Jackson, B. C., & Keightley, P. D. (2017). Detecting positive selection in the genome. BMC Biology, 15(1), 98. The following has been accepted at Molecular Biology and Evolution and has been deposited on bioRXiv. • Booker, T. R., & Keightley, P. D. (In Press). Understanding the factors that shape patterns of nucleotide diversity in the house mouse genome. Molecular Biology and Evolution I contributed to the following papers during my PhD, but these do not form part of this thesis: • Booker, T., Ness, R. W., & Charlesworth, D. (2015). Molecular evolution: breakthroughs and mysteries in Batesian mimicry. Current Biology, 25(12), R506- R508. • Keightley, P. D., Campos, J. L., Booker, T. R., & Charlesworth, B. (2016). Inferring the frequency spectrum of derived variants to quantify adaptive molecular evolution in protein-coding genes of Drosophila melanogaster. Genetics, 203(2), 975-984. vi Contents Declaration i Acknowledgements v Publications vi Contents vii 1 Introduction 1 1.1 Molecular evolution . 2 1.1.1 Genetic drift and coalescence . 2 1.1.2 Natural selection . 4 1.1.3 The neutral theory . 6 1.2 Detecting positive selection . 7 1.2.1 The McDonald-Kreitman test . 10 1.2.2 Using models of selective sweeps to estimate positive selection parameters . 11 1.2.3 The correlation between diversity and the rate of recombination 12 1.2.4 Correlations between neutral diversity and non-neutral divergence 13 1.2.5 Patterns of diversity around the targets of selection . 15 vii CONTENTS viii 1.2.6 Fitting genome wide patterns . 17 1.3 Studying molecular evolution in wild house mice . 17 1.3.1 Halligan et al. (2013) . 18 1.3.2 Moving forward . 20 1.4 Thesis aims .