From Sequence to Phenotype: the Impact of Deleterious Variation in Livestock
Total Page:16
File Type:pdf, Size:1020Kb
From Sequence to Phenotype: From Sequence to Phenotype: The Impact of Deleterious Variation in Livestock Martijn F.L. Derks The Impact of Deleterious Variation in Livestock From sequence to phenotype: the impact of deleterious variation in livestock Martijn F.L. Derks Thesis committee Promotor Prof. Dr M.A.M. Groenen Professor of Animal Breeding and Genomics Wageningen University & Research Co-promotor Dr H.-J.W.C. Megens Assistant professor, Animal Breeding and Genomics Wageningen University & Research Other members Dr C. Charlier, University of Liège, Belgium Dr P. Knap, Genus-PIC, Rabenkirchen-Fauluck, Germany Prof. Dr B.J. Zwaan, Wageningen University & Research, Netherlands Prof. Dr Cord Drögemüller, University of Bern, Switzerland This research was conducted under the auspices of the Graduate School of Wageningen Institute of Animal Sciences (WIAS). From sequence to phenotype: the impact of deleterious variation in livestock Martijn F.L. Derks Thesis submitted in fulfillment of the requirements for the degree of doctor at Wageningen University by the authority of the Rector Magnificus, Prof. Dr A.P.J. Mol, in the presence of the Thesis Committee appointed by the Academic Board to be defended in public on Wednesday 2 September, 2020 at 4. p.m. in the Aula. Martijn F.L. Derks From sequence to phenotype: the impact of deleterious variation in livestock PhD thesis, Wageningen University, the Netherlands (2020) With references, with summary in English ISBN 978-94-6395-320-7 DOI: https://doi.org/10.18174/515207 Abstract Derks, MFL. (2020). From sequence to phenotype: the impact of deleterious variation in livestock. PhD thesis, Wageningen University, the Netherlands The genome provides a blueprint of life containing the instruction, together with the environment, that determine the phenotype. In animal breeding, we try to understand this relationship between an animals genetic code (genome sequence) and its performance (phenotype) to select the best performing animals for the next generation. Recently, rapid improvements in genome sequencing have opened up new possibilities to explore and use the complete set of genetic variation seen in (livestock) animals that can be exploited by scientists to try and further close the genotype-phenotype link. However, despite the vast increase of molecular data, pinpointing the exact variants underlying a phenotype of interest is still challenging. In this thesis I provide an in-depth analysis of population genomics and transcriptomics data to identify deleterious and functional variation in livestock populations. More specifically, I report several variants that causes lethality in homozygous state within different stages of development. Moreover, I pinpoint the exact causal mutations and describe its functional consequences at the molecular, phenotypic, and population level. Subsequently, I focus on the identification of functional variation underlying important selection traits. I combine various sources of functional (epi)genomic data to predict the impact of variation in livestock. Together I provide a comprehensive overview of high-impact variation and molecular mechanisms affecting important phenotypes in various livestock breeds, and discuss that molecular genomics could benefit genomic prediction in livestock. 5 Contents 9 1 – General introduction 23 2 – A systematic survey to identify lethal recessive variation in highly managed pig populations 45 3 – Early and late feathering in turkey and chicken: same gene but different mutations 57 4 – A survey of functional genomic variation in domesticated chickens 81 5 – Balancing selection on a recessive lethal deletion with pleiotropic effects on two neighboring genes in the porcine genome 105 6 – Loss of function mutations in essential genes cause embryonic lethality in pigs 129 7 – Detection of a frameshift deletion in the SPTBN4 gene leads to prevention of severe myopathy and postnatal mortality in pigs 145 8 – Accelerated discovery of functional variation in pigs 167 9 – General discussion 186 References 209 Summary 213 Curriculum Vitae 221 Acknowledgements 1 General introduction 1 General introduction 1.1 Genome variation The central dogma in molecular biology describes a process by which DNA is transcribed to RNA which is translated to proteins (Crick 1970). The proteins perform most of the work in our cells, which in return are responsible for the regulation, structure, and functioning of the individuals tissues and organs. The DNA provides a blueprint of life containing the instruction to grow, develop, survive, and reproduce. The complete genetic code (i.e. the genome sequence) defines the phenotype of an individual (Lehner 2013). Hence, the differences (i.e. genetic variation) in genome sequences contribute to the observed phenotypic variation in a population. Animal breeders try to utilize variation in genomes to select the best performing animals for the next generation (Georges et al. 2019). However, until recently, the characterization of the entire genome sequence of an individual was a costly and time-consuming procedure (Shendure et al. 2017). Yet, rapid improvements in genome sequencing have opened up new possibilities to explore and use the complete set of genetic variation seen in (livestock) animals. Within the department of Animal Breeding and Genomics at Wageningen University, hundreds of animals from various livestock populations have been sequenced. The sequences can be mapped back to a reference genome, built from a single individual to produce a high quality genome (Hillier et al. 2004; Groenen et al. 2012), to explore the millions of genetic variants present in the population. Most of these variants are substitutions of a single nucleotide that occurs at a specific position in the genome (SNPs), or short deletions or insertions (indels). Until recently, the livestock genomes were of much lower quality compared to the human or mouse genome. With the developments in long read sequencing technology, much improved reference genomes for chicken, cattle, and pig have been created (Warren et al. 2017; Warr et al. 2019). These improved reference genomes have significantly improved variant discovery and further downstream analysis. 1.2 From sequence to phenotype Domestication constitutes a long-term biological experiment, started by early farming societies and currently done by animal breeders, to constantly improve stock. The ultimate goal in livestock genomics is to understand the relationship between the genome of the animal and its phenotype (i.e. the performance). However, this link between an animals genetic code, and the performance is extremely complex (Gjuvsland et al. 2013). 11 1 General introduction A major breakthrough in animal breeding was provided with the discovery of genomic selection, at the start of this century (Meuwissen et al. 2001), which led to enhanced rates of genetic improvement and shortening of generation intervals. In general, genomic selection uses a variant panel on a chip (SNP chip) that is distributed across the genome, allowing to capture within-breed genetic variation used to select the best animals for breeding. By that, a wealth of genomic information has become available for many species, including livestock (Dekkers 2012). However, genomic selection uses the genome as a black box, as the SNPs on the chip are not causal, but genetically linked to the actual causal variants and genes (Habier et al. 2013). Despite the remarkable success of genomic selection, the molecular drivers underlying the selection traits are still largely unknown. One factor that explains the success of genomic selection is that most important traits (e.g. growth, fertility) are actually regulated by a large set of genes (Hayes and Goddard 2001), with often subtle effects (polygenic traits), while only a few important traits are inherited by a variant in a single gene, or a small subset of genes with large effects (monogenic traits). To unravel the genetic factors underlying these mono- and polygenic traits requires both genomic and phenotypic information to associate regions in the DNA with important traits, the so-called quantitative trait loci (QTL). Although thousands of such QTL have been reported by now (Hu et al. 2019), identification of the causal genes and variants underlying these QTL is still a major challenge. Hence, only few functional variants have been discovered that are currently used in animal breeding (Goddard et al. 2016). An additional type of functional (high-impact) variants are those that exhibit harmful effects (i.e. deleterious). Inferring the function of deleterious variants (e.g. underlying recessive defects), and underpinning the genetic basis of the disorder is still a major challenge. However, variants underlying severe phenotypes can likely be identified more easily compared to variation with small effects on important selection traits, thereby providing a starting point to study the genotype-phenotype link in livestock. 12 1 General introduction Glossary SNP: Single nucleotide polymorphism, resulting in another nucleotide at one single base in the genome. Indel: Small deletion or insertion of bases in the genome. Haplotype: A group of alleles in an organism that are inherited together from a single parent. Genetic drift: The random fluctuations of allele frequencies from generation to generation. Mutation load: The total genetic burden in a population resulting from accumulated deleterious mutations Recessive lethal: A variant resulting in a lethal phenotype in homozygous