A Worldwide Map of Swine Short Tandem Repeats and Their
Total Page:16
File Type:pdf, Size:1020Kb
Wu et al. Genet Sel Evol (2021) 53:39 https://doi.org/10.1186/s12711-021-00631-4 Genetics Selection Evolution RESEARCH ARTICLE Open Access A worldwide map of swine short tandem repeats and their associations with evolutionary and environmental adaptations Zhongzi Wu1, Huanfa Gong1, Mingpeng Zhang1, Xinkai Tong1, Huashui Ai1, Shijun Xiao1, Miguel Perez‑Enciso2,3, Bin Yang1* and Lusheng Huang1* Abstract Background: Short tandem repeats (STRs) are genetic markers with a greater mutation rate than single nucleotide polymorphisms (SNPs) and are widely used in genetic studies and forensics. However, most studies in pigs have focused only on SNPs or on a limited number of STRs. Results: This study screened 394 deep‑sequenced genomes from 22 domesticated pig breeds/populations world‑ wide, wild boars from both Europe and Asia, and numerous outgroup Suidaes, and identifed a set of 878,967 poly‑ morphic STRs (pSTRs), which represents the largest repository of pSTRs in pigs to date. We found multiple lines of evidence that pSTRs in coding regions were afected by purifying selection. The enrichment of trinucleotide pSTRs in coding sequences (CDS), 5′UTR and H3K4me3 regions suggests that trinucleotide STRs serve as important com‑ ponents in the exons and promoters of the corresponding genes. We demonstrated that, compared to SNPs, pSTRs provide comparable or even greater accuracy in determining the breed identity of individuals. We identifed pSTRs that showed signifcant population diferentiation between domestic pigs and wild boars in Asia and Europe. We also observed that some pSTRs were signifcantly associated with environmental variables, such as average annual tem‑ perature or altitude of the originating sites of Chinese indigenous breeds, among which we identifed loss‑of‑function and/or expanded STRs overlapping with genes such as AHR, LAS1L and PDK1. Finally, our results revealed that several pSTRs show stronger signals in domestic pig—wild boar diferentiation or association with the analysed environmen‑ tal variables than the fanking SNPs within a 100‑kb window. Conclusions: This study provides a genome‑wide high‑density map of pSTRs in diverse pig populations based on genome sequencing data, enabling a more comprehensive characterization of their roles in evolutionary and environ‑ mental adaptation. Background animals [1], and they account for 3 and 3.4% of the human Short tandem repeats (STRs), also known as microsat- and mouse genomes, respectively [2]. STRs stem mainly ellite DNA loci, are defned as tandem repetitive DNA from DNA polymerase slippage events during DNA rep- elements with core unit lengths of one to six base pairs. lication [3]. In humans, the mutation rate of STRs ranges STRs are largely present in the genomes of plants and from 10–8 to 10–2 per locus per generation [4], which is much higher than the mutation rate of single nucleotide polymorphisms (SNPs) (10–9 to 10–8) [5]. Owing to their *Correspondence: [email protected]; [email protected] high genotype diversity and broad distribution across the 1 State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China genome, STRs have been widely used in genetic mapping Full list of author information is available at the end of the article of complex diseases [6], forensics [7], population genetics © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Wu et al. Genet Sel Evol (2021) 53:39 Page 2 of 16 [8], linkage analysis [9], and cell line identifcation [10]. pSTRs substantially augment the pSTRs landscape in In humans, over 40 Mendelian diseases are caused by pigs. We surveyed the pSTRs for their genomic distri- STRs, many of which are trinucleotide repeats. Tese butions by diferent unit lengths, genetic diversities in diseases include the fragile X syndrome, which is associ- diferent populations, and their accuracy in classifying ated with amplifcation of the CGG motif in the 5′ UTR individuals by breed identity. We annotated loss-of-func- of the FMR1 gene [11], and Huntington’s disease, which tion and expanded pSTRs. We also identifed a number is linked to CAG repeat expansion in the frst exon of the of pSTRs that showed signifcant diferentiation between HTT gene [12]. Studies have also shown important roles domesticated pigs and wild boars in Asia and Europe for STRs in the regulation of gene expression traits [13, and pSTRs that are associated with environmental vari- 14]. ables such as average annual temperature or altitude in Population genetics studies using dozens to hundreds Chinese indigenous pigs, and compared the signals of of STRs have been carried out since the early 1990s [15]. population diferentiation or environmental adaptation- Recently developed software tools such as lobSTR [16], related pSTRs with those of SNPs called from the same STR-FM [17] and HipSTR [18] have enabled accurate dataset. genotyping of STRs from genome sequence data and their characterization in terms of their association with Methods evolutionary and environmental adaptations and with Samples complex traits in cohorts of various species including Sequence data of 394 pigs or other Sus species were used humans [19], macaques [20], cattle [21] and ducks [22]. to genotype the genome-wide STRs, of which 241 were For example, Xu et al. [21] generated the frst high-den- generated by our laboratory, and the rest were down- sity bovine STR map and identifed STRs associated with loaded from public databases (see Additional fle 1: dairy traits. Fan and his colleagues [22] found that STRs Table S1 and Fig. 1). Te 394 individuals comprise 24 that diferentiated Pekin ducks from mallards are associ- Asian wild boars from South China and Korea; 157 ated with energy metabolism and nervous system genes. Chinese domesticated pigs representing 15 indigenous Forman et al. [23] observed that a (GAA)n repeat expan- breeds, including Bamaxiang (BMX), Luchuan (LC), sion in intron 35 of the ITPR1 gene potentially causes Wuzhishan (WZS), Laiwu (LWH), Hetao (HT), Min spinocerebellar ataxia in Italian spinone dogs. (MIN), Bamei (BM), Baoshan (BS), Neijiang (NJ), Jinhua Pigs are among the most important domestic animals. (JH), Erhualian (EHL), Yunnan Tibetan (YT), Sichuan To date, most of the population genetics studies that Tibetan (ST), Gansu Tibetan (GT) and Tibet Tibetan have been performed in pigs have been based on SNPs (TT); 21 European wild boars; 43 European domesti- called from high-density SNP arrays [24] or second- cated pigs; 130 international commercial pigs, including generation sequencing data [25, 26], or on a small num- Duroc (DU), Landrace (LR), Pietrain (PI), Hampshire ber (from dozens to hundreds) of STRs genotyped with (HP) and Large White (LW) breeds, and 26 outgroup the Sanger sequencing technology [27]. A recent study species, including seven Sus cebifrons, two Sus verru- identifed approximately 16,000 high-quality polymor- cosus, one Sus celebensis, one Sus barbatus, six Porcula phic STRs in sequence data from 102 pigs by referring salvania, one Phacochoerus africanus and one Babyrousa to the Sscrofa10.2 genome assembly and characterized babyrussa. their genetic diversity among diferent breeds [28]. How- ever, this study included a limited number of samples Annotation of repeat sequences in the reference genome and breeds, which restricted its ability to investigate We downloaded the Sscrofa11.1 reference genome the diversity of STRs and their evolutionary roles in sequences from ENSEMBL (ftp:// ftp. ensem bl. org/ pub/ pigs. Moreover, the reference genome of pigs has been relea se- 95/ fasta/ sus_ scrofa/). Te Tandem Repeat Finder updated to Sscrofa11.1, which is a much better assem- (TRF v4.09) software [30] was used to search for poten- bly than Sscrofa10.2 [29]. Terefore, characterizing the tial STR loci in the reference genome with the following STRs in larger cohorts and using the latest porcine refer- command: “${trf} genome.fa 2 7 7 80 10 20 6 -d–h”. All ence genome would be helpful to better understand the overlapping and equivocal STRs were excluded. Repeat- genetic diversity, evolutionary and environmental adap- Masker and RepeatModeler were used to detect other tations of STRs in pigs. repeat elements, including short interspersed nucleo- In this study, we investigated polymorphic STRs tide elements (SINE), long interspersed nucleotide ele- (pSTRs) in 330 domestic pigs from 22 indigenous pig ments (LINE), long terminal repeat retrotransposons populations, 24 Asian and 21 European wild boars, and (LTR), and DNA transposable elements (DNA_TE). 19 individuals from other Suidae species based on the Briefy, RepeatMasker (v4.0.7) [31] was used to search for Sscrofa11.1 genome assembly (Fig. 1). Te identifed repeat sequences by BLAST search of the existing repeat Wu et al. Genet Sel Evol (2021) 53:39 Page 3 of 16 Fig. 1 Geographical distributions and STR‑based neighbor joining tree on 394 samples investigated in this study.