From Array Genotypes to Whole-Genome Sequences: Harnessing Genetic Variation for Dairy Cattle Selection
Total Page:16
File Type:pdf, Size:1020Kb
From array genotypes to whole-genome sequences: harnessing genetic variation for dairy cattle selection by Adrien M. Butty A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Doctor of Philosophy in Animal Biosciences Guelph, Ontario, Canada © Adrien Butty, September 2019 ABSTRACT FROM ARRAY GENOTYPES TO WHOLE-GENOME SEQUENCES: HARNESSING GENETIC VARIATION FOR DAIRY CATTLE SELECTION Adrien M. Butty Advisor: University of Guelph, 2019 Prof. Dr. Christine F. Baes Genetic markers are currently used as a tool in animal breeding to measure and make use of genetic variation. After the unsuccessful implementation of marker-assisted selection including microsatellites in genetic evaluation models, implementation of genomic selection principles in breeding programs allowed drastic acceleration of the genetic gains made in the dairy industry. Although great genetic improvements have been made possible with the introgression of array- based single nucleotide polymorphisms (SNP) genotypes into genetic evaluation models, more variants are needed to explain a larger proportion of the genetic variance observed in economically important traits. This would allow for more precise estimation of breeding values (EBV) and greater genetic progress. The increase in the number and type of variants through using whole- genome sequencing (WGS), for instance copy number variants (CNV), in genetic evaluation models could contribute to higher accuracies of EBV. Sequencing of a large number of animals is still prohibitively expensive, but the large number of genotyped samples already available allows for 1) the accurate imputation of genotypes to WGS variants, 2) the identification of CNV relying on the signal intensity values produced at the time of array genotyping, and 3) the use of the CNV identified with high confidence in silico to gain knowledge about the genetic architecture of traits of economic importance, for example hoof health traits. In this thesis, haplotype-based methods were developed that improve reference population animal selection for sequencing, to allow for more accurate imputation of common or rare variants. Secondly, CNV were identified with high confidence using both array genotypes and WGS information. Finally, CNV regions were identified that were associated with hoof health traits recorded for the Canadian Holstein genetic evaluation. Starting with SNP that were phased to haplotypes and looking at the structural variants that are CNV, this thesis bridges current and possible future genetic markers to exploit the maximum genomic information present in the dairy population. Altogether, the advances made in this thesis will permit an increase in the rate of genetic improvement for dairy cattle once breeding value estimation models have been developed that efficiently include and combine CNV and SNP information. iv DEDICATION A Marie-Berthe et Henri v ACKNOWLEDGEMENTS My first thank you goes to my supervisor of long date, Dr. Christine Baes, not only for her constant support and patience, but also for the great moments spent listening to her stories, which are always so interesting. Thank you, Chris, for bringing me back on track and for challenging me in situations where I definitely needed to get a kick to go on. I am also very grateful to my co-advisor, Dr. Filippo Miglior. Thank you, Filippo, for your support and mentorship through this PhD adventure. I very much appreciate your steady support, your guidance in this unknown process that is the development of a PhD thesis, but also the support in times where I needed an ear for my concerns. I would like to thank all members of my advisory committee members for their help, and constructive feedbacks: Dr. Paul Stothard, Dr. Mehdi Sargolzaei, Dr. Birgit Gredler-Grandl, and Dr. Angela Cánovas. Special thanks to Paul and his team at the University of Alberta for the opportunity that I had to visit and learn about next-generation sequence data, copy number variants and the winter in Alberta. I acknowledge the financial support of the Efficient Dairy Genome Project and all partners. In addition, I am grateful to the Swiss Association for Animal Science and Qualitas AG (Zug, Switzerland) for their support. A special thanks goes here to the genetic evaluation team of Qualitas AG who accepted me as part of their group in the times I was working on my thesis from their offices. Technical difficulties along my PhD thesis were always resolved in a very bright and rapid manner by the IT manager of the Department of Animal Sciences. I thank from the bottom of my heart Bill Szkotnicki for his immeasurable help to get or keep my (sometimes somewhat large) analyses running. The people I met in Guelph all have their share in my well-being during my stay and I am very grateful for all the encounters. I especially want to thank Marta McCarthy and the Gryphon Singers for welcoming me in the choir. A huge thank you also to Nathalie Lauriault and Gord Kirby for giving me a home away from home and so many great memories. Merci! vi It is a great experience to be part of the CGIL community, from everyday lunch chats to more festive party nights, I am very grateful for all the great memories and the support in my academic time. Thank you all of you! Last but definitely not least, I thank from the bottom of my heart, Marco, my family and my friends in Switzerland and in Canada for their support throughout all the more and the less good phases I went through during my thesis. The support I received every day is what got me to the end of this adventure. To all of you: thank you! vii TABLE OF CONTENTS Abstract ........................................................................................................................................... ii Dedication ...................................................................................................................................... iv Acknowledgements ......................................................................................................................... v Table of Contents .......................................................................................................................... vii List of Tables ................................................................................................................................. ix List of Figures ................................................................................................................................. x List of Abbreviations ................................................................................................................... xiii Chapter 1: General introduction...................................................................................................... 1 1.1 Cattle breeding and genetic markers ............................................................................... 1 1.2 Thesis objectives ............................................................................................................. 5 1.3 References ....................................................................................................................... 5 Chapter 2: Optimizing selection of the reference population for genotype imputation from array to sequence variants ............................................................................................................................ 8 2.1 Abstract ........................................................................................................................... 9 2.2 Introduction ................................................................................................................... 10 2.3 Material and methods .................................................................................................... 13 2.4 Results ........................................................................................................................... 20 2.5 Discussion ..................................................................................................................... 24 2.6 Conclusions ................................................................................................................... 28 2.7 References ..................................................................................................................... 28 2.8 Acknowledgements ....................................................................................................... 32 2.9 Tables ............................................................................................................................ 33 2.10 Figures........................................................................................................................... 37 Chapter 3: High confidence copy number variants identified in Holstein dairy cattle from whole genome sequence and genotype array data ................................................................................... 48 3.1 Abstract ......................................................................................................................... 49 3.2 Introduction ................................................................................................................... 49 3.3 Materials and methods .................................................................................................. 51 3.4 Results ........................................................................................................................... 56 3.5 Discussion