Evidence for and Localization of Proposed Causative Variants in Cattle and Pig Genomes Martin Johnsson1* and Melissa K
Total Page:16
File Type:pdf, Size:1020Kb
Johnsson and Jungnickel Genet Sel Evol (2021) 53:67 https://doi.org/10.1186/s12711-021-00662-x Genetics Selection Evolution REVIEW Open Access Evidence for and localization of proposed causative variants in cattle and pig genomes Martin Johnsson1* and Melissa K. Jungnickel2 Abstract Background: This paper reviews the localization of published potential causative variants in contemporary pig and cattle reference genomes, and the evidence for their causality. In spite of the difculties inherent to the identifcation of causative variants from genetic mapping and genome-wide association studies, researchers in animal genetics have proposed putative causative variants for several traits relevant to livestock breeding. Results: For this review, we read the literature that supports potential causative variants in 13 genes (ABCG2, DGAT1, GHR, IGF2, MC4R, MSTN, NR6A1, PHGK1, PRKAG3, PLRL, RYR1, SYNGR2 and VRTN) in cattle and pigs, and localized them in contemporary reference genomes. We review the evidence for their causality, by aiming to separate the evidence for the locus, the proposed causative gene and the proposed causative variant, and report the bioinformatic searches and tactics needed to localize the sequence variants in the cattle or pig genome. Conclusions: Taken together, there is usually good evidence for the association at the locus level, some evidence for a specifc causative gene at eight of the loci, and some experimental evidence for a specifc causative variant at six of the loci. We recommend that researchers who report new potential causative variants use referenced coordinate systems, show local sequence context, and submit variants to repositories. Background over 100 kb (reviewed by [1]). Furthermore, while genetic Identifcation of causative variants from genetic map- mapping studies follow relatively standardized linkage ping and genome-wide association studies is a difcult mapping or genome-wide association workfows, there is problem, especially for quantitative traits. Te difculties no clear recipe for the experimental biology studies that stem from several unfortunate facts of genetics. Quan- are needed to go from the associated locus to the causa- titative traits are afected by many variants each with a tive variant. small efect, which limits the power of genetic map- In spite of these difculties, researchers in animal ping, even with large sample sizes. Te genomic resolu- genetics have isolated a small number, probably less than tion of genetic mapping is also limited by the correlation 50, putative causative variants for traits relevant to live- between genetic variants (linkage disequilibrium), mean- stock breeding (reviewed for example by [1–3]). Te vari- ing that there are many candidate genes and variants ants and the evidence that support them are documented for each association. Especially in commercial livestock in a somewhat ad hoc fashion in scientifc papers and breeds that have seen systematic breeding, familial rela- databases. tionship leads to linkage disequilibrium that can extend As larger datasets of genotyped and sequenced live- stock animals that are phenotyped for complex traits, as well as functional genomic data from livestock, are *Correspondence: [email protected] accruing [4], we might expect a new boom in the identi- 1 Department of Animal Breeding and Genetics, Swedish University fcation of causative variants. Large datasets increase the of Agricultural Sciences, Box 7023, 750 07 Uppsala, Sweden Full list of author information is available at the end of the article power to detect loci for quantitative traits, and sequence © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Johnsson and Jungnickel Genet Sel Evol (2021) 53:67 Page 2 of 18 data will allow them to be fne-mapped down to the limit SNVs that are imported from the dbSNP database that set by linkage disequilibrium. New functional genomic have been remapped to the reference genome in ques- assays will make it easier to take candidate variants iden- tion, and have had consequences predicted with the tifed from fne-mapping studies further and to test their Ensembl Variant Efect Predictor [16]; in one case, we efects in the laboratory. In particular, more comprehen- generated predictions by inputting a modifed sequence sive open chromatin data (e.g. ATAC-seq or ChIP-seq into the VEP web interface. However, the dbSNP data- of histone marks) from multiple tissues (e.g. [5–7]), and base has discontinued non-human animals, and has expression quantitative trait locus studies (e.g. [8–11]) been superseded by the European Variation Archive will help prioritize non-coding variants, that are likely to (https:// www. ebi. ac. uk/ eva/) as a repository for live- explain a substantial part of the genetic variation in quan- stock genetic variants. For a few of the variants, which titative traits (quantifed by [12] in cattle), but with efects can be considered as responsible for genetic disorders that are challenging to predict from DNA sequence. We or monogenic traits, there are entries in the Online expect that this will make challenges in the identifcation Mendelian Inheritance In Animals (OMIA) database of causative variants more pressing in the near future. [17] (https:// www. omia. org), which are also listed in For the purpose of this paper, a “locus” refers to a Tables 1 and 2. Te reference genome versions used region of a genome associated with a trait, a “causative were ARS-UCD1.2 for cattle [18] and Sscrofa11.1 for variant” refers to a sequence variant that causes such a pig [19]. In one case where the gene was missing from genetic association, and a “causative gene” is the gene that the Ensembl gene annotation, we used the NCBI gene mediates that causative efect. In the literature, loci are annotation instead (https:// www. ncbi. nlm. nih. gov/ sometimes specifed as “quantitative trait loci” and causa- genome/ annot ation_ euk/ proce ss/) [20]. tive variants are variously referred to as “quantitative trait Using the Ensembl genome browser, we looked for nucleotides (QTN)”, “causative mutations” or “causal vari- variants in the gene that matched the original description ants”. For our purposes, these terms are interchangeable. (position on the protein, and amino acid substitution) in For this review, we read the literature that supports any of the Ensembl transcripts associated with the gene. potential causative variants in 13 genes (ABCG2, DGAT1, We looked in Ensembl Variation for variants with litera- GHR, IGF2, MC4R, MSTN, NR6A1, PHGK1, PRKAG3, ture citations. When the original publications gave the PLRL, RYR1, SYNGR2 and VRTN) in cattle and pigs, sequence (amino acid or nucleotide) close to the variant, and localized them in contemporary reference genomes. we verifed the position by pairwise alignment of amino Most of them are single nucleotide variants (SNVs), and acid sequences with the Emboss program Needle for some are short insertions/deletions (indels). We have global alignment or the Emboss program Water for local concentrated on causative variants that have been pro- alignment [21], or by alignment of nucleotide sequences posed for economically important traits, in particular for to the genome with the BLAT program [22]. We used the quantitative traits, but also included a few major genetic Ensembl REST API web service to map coordinates of defects; however, we have excluded causative variants the amino acid positions in the Ensembl gene database to for breed-type traits such as pigmentation, and recessive the reference genomes [23]. We used the LiftOver tool of lethal haplotypes. We review the evidence for causal- the UCSC genome browser to map coordinates between ity, aiming to separate the evidence for the locus, for the reference genome versions when coordinates were given proposed causative gene and for the proposed causative for an older reference genome (https:// genome. ucsc. edu/ variant, and report the bioinformatic searches and tac- cgi- bin/ hgLif tOver). tics needed to localize the sequence variants in the cattle or pig genome. We hope that this paper will be useful to researchers confronted with the task of following up on Proposed causative variants established genetic mapping results, and point out what Below, we report the localisation and citation for each information might be helpful to include when reporting of the potential causative variants, and comment on the new candidate causative variants. evidence that supports these variants. Tables 1 and 2 list the variants and their localisation in the cattle and pig Main text genome, respectively. We will discuss the evidence for To localise putative causative variants in contemporary each variant at three levels: reference genomes, we used the Ensembl Genes [13] and Ensembl Variation [14] database version 102. Te • whether the proposed gene is the causative gene livestock genomics resources provided by Ensembl and mediating the genetic efect at the locus; how to use them have recently been reviewed by Mar- • whether the specifc variant proposed is the causative tin et al.