1 Identification of Selection Signatures in Cattle Breeds Selected for Dairy Production
Total Page:16
File Type:pdf, Size:1020Kb
Genetics: Published Articles Ahead of Print, published on May 17, 2010 as 10.1534/genetics.110.116111 1 Identification of Selection Signatures in Cattle Breeds Selected for Dairy Production 2 3 Alessandra Stella*, Paolo Ajmone-Marsan†, Barbara Lazzari‡ and Paul Boettcher§,1 4 *Istituto di Biologia e Biotecnologia Agraria, Consiglio Nazionale delle Ricerche, Milan, Italy 5 20133 6 †Istituto di Zootecnica, Università Cattolica del Sacro Cuore, Piacenza, Italy 29100 7 ‡Parco Tecnologico Padano, Lodi, Italy 26900 8 §Joint FAO-IAEA Division for Nuclear Applications to Food and Agriculture, International 9 Atomic Energy Agency, Vienna, Austria A-1400 10 11 12 13 14 1 Present Address: Animal Production and Health Division, Food and Agriculture Organization of the United Nations, Rome, Italy 1 15 Running Head: Selection Signatures in Dairy Cattle 16 Key Words: Selection signature, breed, phenotype, permutation test 17 Corresponding Author: 18 Paul Boettcher 19 FAO-AGAP 20 Viale delle Terme di Caracalla 21 00153 Rome, Italy 22 +39 06 570 55620 23 [email protected] 24 25 26 2 27 ABSTRACT 28 The genomics revolution has spurred the undertaking of HapMap studies of numerous 29 species, allowing for population genomics to increase the understanding of how selection has 30 created genetic differences between subspecies populations. The objectives of this study were 31 1) to develop an approach to detect signatures of selection in subsets of phenotypically similar 32 breeds of livestock by comparing SNP diversity between the subset and a larger population, 33 2) to verify this method in breeds selected for simply-inherited traits, and 3) apply this method 34 to the dairy breeds in the International Bovine HapMap (IBHM) study. The data consisted of 35 genotypes for 32,689 SNP of 497 animals from 19 breeds. For a given subset of breeds, the 36 test statistic was the parametric composite log likelihood (CLL) of the differences in allelic 37 frequencies between the subset and the IBHM for a sliding window of SNP. The null 38 distribution was obtained by calculating CLL for 50,000 random subsets (per chromosome) of 39 individuals. The validity of this approach was confirmed by obtaining extremely large CLL at 40 the sites of causative variation for polled (BTA1) and black-coat-color (BTA18) phenotypes. 41 Across the 30 bovine chromosomes, 699 putative selection signatures were detected. The 42 largest CLL was on BTA6 and corresponded to KIT, which is responsible for the piebald 43 phenotype present in four of the five dairy breeds. Potassium channel-related genes were at 44 the site of the largest CLL on three chromosomes (BTA14, 16 and 25) whereas integrins 45 (BTA18 and 19) and serine/arginine rich splicing factors (BTA20 and 23) each had the largest 46 CLL on two chromosomes. Based on the results of this study, the application of population 47 genomics to farm animals seems quite promising. Comparisons between breed groups have 48 the potential to identify genomic regions influencing complex traits with no need for complex 49 equipment and the collection of extensive phenotypic records and can contribute to the 50 identification of candidate genes and to the understanding of the biological mechanisms 51 controlling complex traits. 52 3 53 Recent advances in genomics technology have created the opportunity to greatly expand our 54 ability to study the genetics of the organisms in the world around us. Numerous “HapMap” 55 studies have been undertaken in various species, whereby subpopulations of a given species 56 have been genotyped and compared for differences in their respective genomes. In livestock, 57 HapMap studies can provide insight on the differentiation of breeds and in the process of 58 long-term selection for particular phenotypes of complex traits. When a favorable mutation 59 occurs within a population subject to directional selection, the frequency of the favorable 60 allele is likely increase over subsequent generations. Because DNA is a linear molecule and 61 the probability of recombination is inversely proportional to the distance separating them, the 62 nucleotides immediately adjacent to the favorable mutation also tend to increase in frequency, 63 in a sort of “hitch-hiking” process (Maynard Smith and Haigh, 1974; Fay and Wu, 2000). 64 This process leads to “signatures of selection” that are characterized by distributions of 65 nucleotides around favorable mutations that differ statistically from that expected purely by 66 chance (Kim and Stephan, 2002). Detection of selection signatures can increase the 67 understanding of the evolution and biology underlying a given phenotype, and may provide 68 tools to increase efficiency of selection. 69 A number of methods have been developed for detection of selection signatures through 70 genomic analysis. In general, most of these methods are based on comparison of the 71 distribution of allelic frequencies, either directly, or indirectly, by calculating population 72 genetics statistics that are a function of allelic or genotypic frequencies. As examples of the 73 latter, FST (e.g. Weir et al., 2006; The Bovine HapMap Consortium et al., 2009) and linkage 74 disequilibrium (e.g. Parsch et al., 2001; Przeworski, 2002; Kim and Nielsen, 2004; Ennis, 75 2007), have been used to detect selection signatures. In addition, specific tests for detecting 76 signatures have been developed (e.g. Fu and Li, 1993; Tajima, 1998; Fay and Wu, 2000; Kim 77 and Stephan, 2002; Voight et al., 2006). 4 78 With many of these methods, constructing a significance test is not straight-forward, 79 especially when searching for selection signatures within a single population. Determining the 80 null distribution of the test statistic, i.e. if no selection were operating, often requires making 81 assumptions about the null distribution and applying a parametric test based on statistical 82 theory. An alternative approach is to use simulation to derive a distribution of the test statistic 83 under the assumption of no selection. For example, Kim and Stephan, 2002) proposed the use 84 of a coalescent simulation. The use of simulation, however, requires that the simulation model 85 accurately mimics the dynamics of the population within which the search for selection 86 signatures is being undertaken and that the model is relatively robust to its underlying 87 assumptions. Another factor that may complicate significance testing is the fact that methods 88 to identify selection signatures often involve a large number of tests, on many non- 89 independent loci, across multiple chromosomes or even entire genomes. 90 When data are available from a large number of populations, and one desires to search for a 91 signature of selection within a subset of similar populations, construction of a permutation test 92 may be possible. Livestock breeds selected for various phenotypic traits may offer one such 93 opportunity. For example, the International Bovine HapMap project (IBHM - The Bovine 94 HapMap Consortium et al., 2009) evaluated a range of breeds that have been historically 95 selected, both naturally and artificially, for different phenotypic traits. 96 The primary objective of this study was to develop a test for selection signatures in a subset of 97 breeds sharing a similar phenotype. Randomly drawn sets of individuals from the whole 98 population of breeds were used to establish a null distribution of marker alleles of animals 99 that were not undergoing artificial selection for a specific quantitative phenotype, such as 100 dairy production. In addition, the test developed was designed to account for the multiple 101 testing of many loci and a complete genome consisting of multiple chromosomes. 5 102 This method was first tested by using it to identify selection signatures for qualitative 103 phenotypes, each under the control of a single well-characterized locus. The method was then 104 applied to identify putative signatures within breeds of dairy cattle. In a final step, a brief and 105 subjective evaluation was undertaken of the potential biological significance of several of the 106 genes located closest to the center of regions carrying putative selection signatures. 107 108 MATERIALS AND METHODS 109 The data used in this study were from the IBHM (The Bovine HapMap Consortium et al., 110 2009). The data are available to the public at www.bovinehapmap.org. The IBHM evaluated 111 genotypes of animals from 19 breeds of cattle (see Table 1) plus single animals of two 112 outgroup species (Anoa and Water Buffalo), which were not included in this study. Sampling 113 included Bos taurus, Bos indicus and synthetic breeds from different geographic locations and 114 historically different breeding goals (Table 1). A total of 497 animals were included in the 115 study. The IBHM sampled 24 animals per breed, with the exception of Red Angus (12), 116 Holstein (53) and Limousin (42). Animals were generally unrelated, with the exception of a 117 few breeds for which parent-offspring trios were included to help validate genotyping. The 118 offspring of these trios were not considered in this study. 119 For the IBHM, genotypes were obtained for 37,470 single nucleotide polymorphisms (SNP). 120 Only those SNP that had been assigned to a chromosome (29 autosomes and X) in the 121 Btau_4.0 build of the bovine genome were considered in this analysis, however, leaving 122 32,689 SNP. The distribution of SNP across chromosomes is in Table 2. Chromosomes 6, 14 123 and 25 had more SNP in the IBHM, because these chromosomes were targeted, as they have 124 genes with effects on economically important phenotypic traits in cattle (Khatkar et al., 2002). 125 Test statistic 6 126 The test statistic used in this study was derived from the approach originally proposed by Kim 127 and Stephan (2002) and modified by Nielsen et al. (2005). The approach is based on the 128 calculation of a composite likelihood of the allelic frequencies of SNP observed across 129 “sliding windows”, of adjacent loci.