Exploiting Whole Genome Sequence Variants in Cattle Breeding

Exploiting whole genome sequence variants in cattle breeding Qianqian Zhang Supervisors committee in Aarhus University Main supervisors Dr. Goutam Sahana Senior Researcher, Center for Quantitative Genetics and Genomics Aarhus University, Denmark Dr. Mario Calus Associate professor, Animal Breeding and Genomics Wageningen University & Research, the Netherlands Co-supervisors Prof. Dr. Mogens Sandø Lund Professor of Center for Quantitative Genetics and Genomics Aarhus University, Denmark Dr. Bernt Guldbrandtsen Associate Professor of Center for Quantitative Genetics and Genomics Aarhus University, Denmark This research was conducted under the joint auspices of the Graduate School of Science and Technology (GSST), Aarhus University, Denmark and the Graduate School Wageningen Institute of Animal Sciences (WIAS), Wageningen University and is part of the Erasmus Joint Doctorate Program “EGS-ABG”. Thesis committee Wageningen University Promotor Prof. Dr Henk Bovenhuis Professor of Animal Breeding and Genomics Wageningen University & Research, the Netherlands Prof. Dr. Mogens Sandø Lund Professor of Center for Quantitative Genetics and Genomics Aarhus University, Denmark Co-promotors Dr. Goutam Sahana Senior Researcher, Center for Quantitative Genetics and Genomics Aarhus University, Denmark Dr. Mario Calus Associate professor, Animal Breeding and Genomics Wageningen University & Research, the Netherlands Other members Prof. Dr Fred van Eeuwijk, Wageningen University & Research, the Netherlands Prof. Just Jensen, Aarhus University, Denmark Prof. Dr Johann Sölkner, University of Natural Resources and Life Sciences, Vienna, Austria Prof. Dr Hubert Pausch, Swiss Federal Institute of Technology, Zurich, Switzerland This research was conducted under the joint auspices of the Graduate School of Science and Techonology (GSST), Aarhus University, Denmark and the Graduate School Wageningen Institute of Animal Sciences (WIAS), Wageningen University and is part of the Erasmus Joint Doctorate Program “EGS-ABG”. Exploiting whole genome sequence variants in cattle breeding Unraveling the distribution of genetic variants and role of rare variants in genomic evaluation Qianqian Zhang Thesis submitted in fulfillment of the requirements for the joint degree of doctor between Aarhus University by the authority of the Head of Graduate School of Science and Technology and Wageningen University by the authority of the Rector Magnificus Prof. Dr A.P.J. Mol, in the presence of the Thesis Committee appointed by the Head of Graduate School of Science and Technology at Aarhus University and the Academic Board of Wageningen University to be defended in public on Tuesday 19 December 2017 at 9.00 a.m. in Foulum, Aarhus University Qianqian Zhang Exploiting whole genome sequence variants in cattle breeding PhD thesis, Aarhus University, Foulum, Denmark and Wageningen University, Wageningen, the Netherlands (2017) With summaries in English and Danish ISBN: 978-87-93643-14-7 DOI: https://doi.org/10.18174/428523 Abstract Zhang, Q. (2017). Exploiting whole genome sequence variants in cattle breeding: Unraveling the distribution of genetic variants and role of rare variants in genomic evaluation. Joint PhD thesis, Aarhus University, Denmark and Wageningen University & Research, the Netherlands. The availability of whole genome sequence data enables to better explore the genetic mechanisms underlying different quantitative traits that are targeted in animal breeding. This thesis presents different strategies and perspectives on utilization of whole genome sequence variants in cattle breeding. Using whole genome sequence variants, I show the genetic variation, recent and ancient inbreeding, and genome-wide pattern of introgression across the demographic and breeding history in different cattle populations. Using the latest genomic tools, I demonstrate that recent inbreeding can accurately be estimated by runs of homozygosity (ROH). This can further be utilized in breeding programs to control inbreeding in breeding programs. In chapter 2 and 4, by in-depth genomic analysis on whole genome sequence data, I demonstrate that the distribution of functional genetic variants in ROH regions and introgressed haplotypes was shaped by recent selective breeding in cattle populations. The contribution of whole genome sequence variants to the phenotypic variation partly depends on their allele frequencies. Common variants associated with different traits have been identified and explain a considerable proportion of the genetic variance. For example, common variants from whole genome sequence associated with longevity have been identified in chapter 5. However, the identified common variants cannot explain the full genetic variance, and rare variants might play an important role here. Rare variants may account for a large proportion of the whole genome sequence variants, but are often ignored in genomic evaluation, partly because of difficulty to identify associations between rare variants and phenotypes. I compared the powers of different gene-based association mapping methods that combine the rare variants within a gene using a simulation study. Those gene- based methods had a higher power for mapping rare variants compared with mixed linear models applying single marker tests that are commonly used for common variants. Moreover, I explored the role of rare and low-frequency variants in the variation of different complex traits and their impact on genomic prediction reliability. Rare and low-frequency variants contributed relatively more to variation for health-related traits than production traits, reflecting the potential of improving prediction reliability using rare and low-frequency variants for health-related traits. However, in practice, only marginal improvement was observed using selected rare and low-frequency variants when combined with 50k SNP genotype data on the reliability of genomic prediction for fertility, longevity and health traits. A simulation study did show that reliability of genomic prediction could be improved provided that causal rare and low-frequency variants affecting a trait are known. To my grandfather, always deeply missed. Contents 13 1 – General introduction 33 2 – Runs of homozygosity and distribution of functional variants in the cattle genome 65 3 – Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds 89 4 – Detection of introgressed genomic regions in modern dairy breeds: A case study of the hybrid Modern Danish Red cattle 117 5 – Genome-wide association study for longevity with whole-genome sequencing in 3 cattle breeds 137 6 – Comparison of gene-based rare variant association mapping methods for quantitative traits in a bovine population with complex familial relationships 163 7 – Contribution of rare and low-frequency whole-genome sequence variants to complex traits variation in dairy cattle 187 8 – Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle 209 9 – General discussion 229 Summary 233 Sammendrag 237 Acknowledgements 241 Curriculum Vitae 249 Colophon 1 General introduction 1 General introduction 1.1 Utilizing genomic information in cattle breeding Animal breeding has been very successful in achieving genetic gain to improve economically relevant traits in breeding goal even without knowing underlying genetic mechanisms controlling the traits. However, to reach a higher genetic gain and more accurate selection, it is of great importance to understand genetic mechanisms underlying including genetic variants affecting the traits, but also how selective breeding shapes the distribution of these genetic variants in the current population. If all the causal variants affecting a trait can be identified, the breeding goal for the trait will be achieved simply using the known genetic mechanisms with 100% accuracy in selection (Goddard, 2017). Recently, genomic information generated is helping to explore these genetic mechanisms (Dekkers, 2012, Kiplagat et al., 2012, Brondum et al., 2015). For example, selection for functional traits such as health-related traits will probably be improved with mining the deleterious genetic variants in the population. Therefore, I believe that the next generation animal breeding will be directly selecting on the variants controlling the trait with more accurate biological knowledge underlying different traits, once identified. The availability of genomic data has provided the unique opportunity to examine the changes in genomic composition in different populations over the period under selective breeding, and to start identifying the causal genetic variants underlying different phenotypic traits using statistical methods. The causal genetic variants and their combinations are not randomly distributed across individuals and populations. Artificial selection has shaped the variation landscape in cattle genomes, by eliminating the variants with negative effects and increasing the frequency of alleles with favorable effects on phenotypic traits (Xu et al., 2015). Using genomic data, it is possible to study the effect of intense artificial selection on the distribution of genetic variants in cattle populations. The genomic data are routinely generated for genomic selection in livestock breeding (Bouquet and Juga, 2013). Genomic selection in cattle uses genotype information such as 50k and high density (HD) chip data to predict breeding values for selection candidates (Meuwissen, 2007). By using genomic

Exploiting Whole Genome Sequence Variants in Cattle Breeding

Livestock Genomics Comes of Age Michel Georges and Leif Anclersson 2

The Genome Sequence of Taurine Cattle: a Window to Ruminant Biology and Evolution the Bovine Genome Sequencing and Analysis Consortium, Christine G

Bovinehd Genotyping Beadchip More Than 777,000 Snps That Deliver the Densest Coverage Available for the Bovine Genome

Bovine Genomic Sequencing Initiative Cattle-Izing the Human Genome

Unlocking the Bovine Genome

Livestock 2.0 – Genome Editing for Fitter, Healthier, and More Productive Farmed Animals Christine Tait-Burkard, Andrea Doeschl-Wilson, Mike J

Diversity and Evolution of 11 Innate Immune Genes in Bos Taurus Taurus and Bos Taurus Indicus Cattle

The Genome 10K Project: a Way Forward

A Primary Assembly of a Bovine Haplotype Block Map Based On

A Primary Assembly of a Bovine Haplotype Block Map Based on a 15,036-Single-Nucleotide Polymorphism Panel Genotyped in Holstein–Friesian Cattle

Introduction to Genomic Selection

Novel Functional Sequences Uncovered Through a Bovine Multi-Assembly Graph (Version 1.0) [Dataset]