www.nature.com/scientificreports

OPEN The ABCC4 gene is associated with pyometra in golden retriever dogs
Maja Arendt1,2*, Aime Ambrosen3, Tove Fall4, Marcin Kierczak5, Katarina Tengvall2, Jennifer R. S. Meadows2, Åsa Karlsson2, Anne-Sofie Lagerstedt3, Tomas Bergström7, Göran Andersson7, Kerstin Lindblad-Toh2,6 & Ragnvi Hagman3*

Pyometra is one of the most common diseases in female dogs, presenting as purulent inflammation and bacterial infection of the uterus. On average 20% of intact female dogs are affected before 10 years of age, a proportion that varies greatly between breeds (3–66%). The clear breed predisposition suggests that genetic risk factors are involved in disease development. To identify genetic risk factors associated with the disease, we performed a genome-wide association study (GWAS) in golden retrievers, a breed with increased risk of developing pyometra (risk ratio: 3.3). We applied a mixed model approach comparing 98 cases and 96 healthy controls and identified an associated locus on chromosome 22 (p = 1.2 × 10⁻⁶, passing Bonferroni corrected significance). This locus contained five significantly associated SNPs positioned within introns of the ATP-binding cassette transporter 4 (ABCC4) gene. This gene encodes a transmembrane transporter that is important for prostaglandin transport. Next generation sequencing and genotyping of cases and controls subsequently identified four missense SNPs within the ABCC4 gene. One missense SNP at chr22:45,893,198 (p.Met787Val) showed complete linkage disequilibrium with the associated GWAS SNPs, suggesting a potential role in disease development. Another locus on chromosome 18, overlapping the TESMIN gene, is also potentially implicated in the development of the disease.

Purulent bacterial infection of the uterus (pyometra) is one of the most common diseases of intact female dogs. On average one in five female dogs is diagnosed with the disease before 10 years of age¹,². The proportion of affected bitches varies greatly between different breeds, i.e. some breeds develop the disease to a much larger extent and at an earlier age than others (from 3% in Finnish spitz to 66% in Bernese mountain dogs)¹,². The clear breed predisposition indicates that genetic risk factors play a role in the pathogenesis. The golden retriever is among the breeds that have an increased risk of pyometra (age corrected risk ratio 3.3)². By 10 years of age, approximately 37% of all intact Swedish female golden retrievers will have been affected by the disease¹,².

Pyometra is a potentially life-threatening illness that develops as a consequence of a combination of hormonal and bacterial factors. During the luteal phase of the oestrus cycle, high progesterone levels make the uterus susceptible to opportunistic bacterial infections, foremost by Escherichia coli. Infection of the uterus can lead to sepsis and related endotoxemia and organ dysfunctions in severely affected individuals. In addition, circulating inflammatory mediators increase³,⁴. The treatment of choice is surgical ovariohysterectomy. Non-surgical treatment alternatives are possible in less severe cases, but are frequently associated with disease relapse⁵.

Diseases of the reproductive organs, such as pyometra, are more commonly diagnosed in Sweden than in many other countries, where most non-breeding female dogs are spayed to prevent reproduction⁶. Of all Swedish dogs, 90% are insured and 67% are registered in the Swedish Kennel Club

1Faculty of Health and Medical Sciences, Department of Veterinary Clinical Sciences, University of Copenhagen, Copenhagen, Denmark. 2Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. 3Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden. 4Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden. 5Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden. 6Broad Institute of MIT and Harvard, Cambridge, MA, USA. 7Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden. *email: [email protected]; [email protected]

Scientific Reports | (2021) 11:16647 | https://doi.org/10.1038/s41598-021-95936-1

(SKK), which facilitates identification of cases and control dogs suitable for genetic research studies through insurance company databases and the SKK registry⁷.

Here we present an investigation of Swedish golden retrievers to identify genetic risk factors for pyometra using a genome-wide association study approach with a case–control population consisting of clinically well-defined affected and healthy dogs.

Results

Genome-wide significant locus on chromosome 22 in CanFam 3.1. To identify disease-associated loci, 194 female golden retrievers were genotyped using the 170 k CanineHD BeadChip. Ninety-eight of the dogs were classified as cases and 96 as controls. The mean age of onset for the cases was 6.6 years (SD 2.1 years). All controls were intact and > 7 years old with a mean age of 8.6 years (SD 1.4 years). At initial quality control and filtering, 1000 SNPs were removed for low genotyping rate (< 95%) and 72,878 SNPs were removed for having a minor allele frequency of less than 5%, leaving 97,468 SNPs for further analysis. No individuals were removed for having a low genotyping rate and the average genotyping rate in the population was 99%. A multidimensional scaling plot was generated showing the first two dimensions (C1–C2) (Figure S1). No clustering of cases versus controls was noted in the population as a whole. Calculation of relatedness showed that two control dogs were related at sibling level (PI_HAT 0.51; both individuals were kept).

A GWAS was performed using EMMAX to account for cryptic relatedness between individuals and population structure⁸. One genome-wide significant locus containing 5 SNPs in complete LD was identified on chromosome 22 at ~ 45 Mb, p = 1.24 × 10⁻⁶, which was below the LD-corrected Bonferroni significance threshold calculated as 4.2 × 10⁻⁶ (see QQ-plot and Manhattan plot in Fig. 1a, b).
The QQ-plot did not show evidence of inflation (lambda 0.99), with the associated SNPs on chromosome 22 lying above the dotted line and reaching Bonferroni corrected significance. Two tentative additional loci were seen in the Manhattan plot (Fig. 1b): a locus on chromosome 18 (top SNP chr18:51,224,157, p = 5.2 × 10⁻⁵) and a locus on chromosome 28 (top SNP chr28:8,872,257, p = 6.0 × 10⁻⁵). Neither of these reached Bonferroni corrected significance.
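The inflation factor quoted with the QQ-plot is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null. The text does not state how lambda was obtained here, so the following is only a minimal sketch of that common definition; the function name `genomic_inflation` and the toy p-values are illustrative assumptions, not the study data.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Return lambda: median observed 1-df chi-square over the null median."""
    norm = NormalDist()
    # Convert each two-sided p-value into the 1-df chi-square statistic it implies.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution under the null (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Toy input (hypothetical): 1001 evenly spaced p-values, i.e. a perfectly null scan.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(null_p)
print(f"lambda = {lam:.2f}")
```

A value close to 1, such as the reported 0.99, indicates that the bulk of the test statistics follow the null distribution, so that the chromosome 22 signal is unlikely to be an artefact of population stratification.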

Conditioning on the top locus. To evaluate whether either of the two additional loci represented risk factors independent of the chromosome 22 locus, a conditional genome-wide analysis was performed using the genotype of one of the top-associated SNPs (chr22:45,875,420) on chromosome 22 as a covariate. As seen in the QQ-plot (Fig. 1c) and Manhattan plot (Fig. 1d), the locus on chromosome 18 shifted ~ 2 Mb but showed a mildly improved p-value, leaving it as a suggestive locus (chr18:49,198,998, p = 2.8 × 10⁻⁵), whilst the association to the SNP on chromosome 28, located in intron 7 of the SORBS1 gene, disappeared. The most significantly associated SNP on chromosome 18 identified in this analysis was located in intron 4 of the TESMIN (MTL5) gene (the allele frequency was 0.64 in cases and 0.44 in controls). A close-up of the locus on chromosome 18, including the LD structure and annotation of the region, can be found in Fig. 2. The risk alleles on chromosome 22 are present in 40% of the cases; however, when looking at the distribution of alleles for the chromosome 22 and chromosome 18 loci (at 49 Mb), 96% of the cases carried at least one risk allele from one of the two loci versus only 70% of the controls. The distribution of genotypes between cases and controls is shown in Figure S2.

Association to age of onset. To investigate potential loci associated with early onset of pyometra, we performed an association analysis within the cases only, using age of onset in days as a continuous variable. No SNPs reached Bonferroni corrected significance (Fig. 1e, f). Two loci, on chromosomes 15 and 32, stood out and were considered suggestively associated. The most strongly associated SNP on chromosome 15 (chr15:59,440,763, p = 9.0 × 10⁻⁶) is located within intron 3 of the Nuclear Assembly Factor 1 ribonucleoprotein (NAF1) gene. The most strongly associated SNP on chromosome 32 (chr32:22,285,412, p = 4.32 × 10⁻⁵) is located in intron 1 of the Endomucin (EMCN) gene.

Investigation of top locus identifies non-synonymous SNP in the ABCC4 gene. The genome-wide significant locus on chromosome 22 was defined as an 18.2 kb haplotype block of 5 GWAS SNPs in complete LD (r² = 1.00, chr22:45,875,420–45,893,599 bp) spanning introns 18 and 19 of the ATP-binding cassette transporter 4 gene (ABCC4, ENSCAFG00000005433, ENSCAFT00000008769, UniProt F1PNA2) (Fig. 3a–c). The allele frequency for the risk haplotype was 0.21 in the cases versus 0.05 in the controls (Table 1), resulting in an odds ratio of 4.8 (95% CI 2.3–9.9). A summary of the allele frequencies and p values for the five GWAS SNPs is shown in Table 1.

To further investigate the associated locus on chromosome 22, we generated whole genome sequencing data from a pool of 10 pyometra cases (22X mean coverage), which were all heterozygous for the associated risk haplotype on chromosome 22. In addition, sequencing data was generated from one individual homozygous for the GWAS risk haplotype (23X coverage) and 10 individually barcoded individuals homozygous for the non-risk alleles (3 cases and 7 controls; 4.4X mean coverage). A 0.25 Mb region covering the ABCC4 gene (chr22:45,767,063–46,013,484 bp) was extracted from the sequencing data and called variants were annotated. In total 1,051 SNPs were identified in the region, of which 627 were known variants. Four missense variants were identified within the ABCC4 gene (Table 2), one of which, chr22:45,893,198 (rs8937218), was located within the associated GWAS locus.

To expand the study of the ABCC4 gene in a larger population of golden retrievers, we designed TaqMan genotyping assays for the four identified missense variants. Genotyping of these selected SNPs was carried out in 292 golden retrievers, including 134 cases and 158 controls. The 292 dogs included the 97 cases and 96 controls that were part of the GWAS analysis (see supplementary Figure S3 for an overview of the data). The additional dogs


Figure 1. GWAS of pyometra in golden retrievers. (a) QQ-plot (λ = 0.99) and (b) Manhattan plot for GWAS of 98 cases and 96 controls, identifying a genome-wide significant signal on chromosome 22. The stippled line shows the Bonferroni corrected significance threshold. (c) QQ-plot and (d) Manhattan plot of the conditional GWAS using the genotype for one of the most associated SNPs (chr22:45,875,420) on chromosome 22 as covariate, illustrating a slightly stronger association on chromosome 18 and the disappearance of the association on chromosome 28. (e) QQ-plot and (f) Manhattan plot for the age of onset analysis, showing two suggestive loci on chromosomes 15 and 32.

were individuals who were not chosen to be part of the GWAS analysis due to relatedness. For the dogs that did not have 170 k genotyping data available, two of the GWAS SNPs (chr22:45,882,260 and chr18:49,198,998) were also genotyped. The TaqMan data from the original GWAS dogs was merged with the GWAS dataset. A genome-wide association analysis was repeated on the merged GWAS dataset using a mixed model approach (EMMAX). One of the ABCC4 coding SNPs (chr22:45,893,198) was in complete LD (r² = 1.00) with the identified GWAS locus on chr22, i.e. it was equally associated with the disease phenotype based on the p value (p = 1.24 × 10⁻⁶, lambda 0.99). This coding sequence variant (A > G) causes an amino acid substitution, p.Met787Val, in the encoded ABCC4 protein. The SIFT score for the amino acid change is 1.0, indicating that this is likely to be a well-tolerated change.

When performing a basic association test using PLINK 1.07 for the six TaqMan SNPs for all 292 genotyped dogs, three dogs were removed for low genotyping rate (< 0.5). For the remaining dogs, complete LD (r² = 1.00) was seen between the two SNPs chr22:45,882,260 and chr22:45,893,198, indicating that the risk haplotype seen in the smaller GWAS dataset is still present in this larger population. For this basic association including all TaqMan genotyped individuals, the best association was to the chr18:49,198,998 SNP with a p value of 7.89 × 10⁻⁵, with


Figure 2. Detailed view of the associated locus on chromosome 18. (a) Zoomed-in view of chromosome 18. The LD R²-value relative to the most highly associated SNP is illustrated by colour. (b) Further zoomed-in view showing chr18 48.9–50 Mbp. (c) Annotation of the region chr18 48.9–50 Mbp based on the UCSC browser annotation.


Figure 3. Detailed view of the associated locus on chromosome 22. (a) Zoomed-in view of chromosome 22 at 44.7–47.2 Mbp. The LD R²-value relative to the most highly associated SNPs is illustrated by colour. (b) Further zoomed-in view of the most associated SNPs on chromosome 22 (cfa22) at 45.7–46.2 Mbp. (c) The ABCC4 gene is located within the chromosome 22 45.7–46.2 Mbp region.

SNP Id | Chr | Bp | Allele risk/non-risk | Location in relation to ABCC4 gene | Risk allele freq. (cases) | Risk allele freq. (controls) | GWAS P value
BICF2G630333328 | 22 | 45,875,420 | T/G | intron 19/30 | 0.21 | 0.05 | 1.24e−06
BICF2G630333353 | 22 | 45,882,260 | A/G | intron 19/30 | 0.21 | 0.05 | 1.24e−06
BICF2G630333364 | 22 | 45,886,617 | T/G | intron 19/30 | 0.21 | 0.05 | 1.24e−06
BICF2G630333376 | 22 | 45,892,301 | G/A | intron 19/30 | 0.21 | 0.05 | 1.24e−06
BICF2G630333380 | 22 | 45,893,599 | C/T | intron 18/30 | 0.21 | 0.05 | 1.24e−06

Table 1. Significantly associated GWAS SNPs. The CanFam3.1 genomic coordinates are shown for the five associated GWAS SNPs reaching Bonferroni corrected significance. Allele frequencies are shown for the 98 cases and 96 controls.
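As a worked check of the odds ratio quoted for the risk haplotype, the allele frequencies in Table 1 can be turned back into approximate allele counts and fed into the standard allelic odds ratio with a Wald 95% confidence interval. The counts below (41 risk alleles among 196 case alleles, 10 among 192 control alleles) are an assumption reconstructed from the rounded frequencies 0.21 and 0.05, and the text does not state which software or interval method produced the published figures, so this is a sketch rather than the authors' procedure.

```python
import math


def odds_ratio_wald_ci(risk_cases, other_cases, risk_controls, other_controls, z=1.96):
    """Allelic odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (risk_cases * other_controls) / (other_cases * risk_controls)
    # Standard error of log(OR) from the four cell counts of the 2 x 2 table.
    se = math.sqrt(1 / risk_cases + 1 / other_cases + 1 / risk_controls + 1 / other_controls)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# ASSUMPTION: allele counts reconstructed from the rounded frequencies in Table 1.
case_alleles = 2 * 98       # 98 cases, two alleles each
control_alleles = 2 * 96    # 96 controls, two alleles each
risk_cases = round(0.21 * case_alleles)         # 41
risk_controls = round(0.05 * control_alleles)   # 10

or_, ci_low, ci_high = odds_ratio_wald_ci(
    risk_cases, case_alleles - risk_cases,
    risk_controls, control_alleles - risk_controls,
)
print(f"OR = {or_:.1f} (95% CI {ci_low:.1f}-{ci_high:.1f})")  # OR = 4.8 (95% CI 2.3-9.9)
```

Under these reconstructed counts the result agrees with the reported odds ratio of 4.8 (95% CI 2.3–9.9) to one decimal place.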

the candidate SNP chr22:45,893,198 being less strongly associated (p = 1.48 × 10⁻⁴) and with a mildly reduced OR of 2.6 (95% CI 1.6–4.5). In total 120 SNPs were identified within the ~ 18 kb haplotype block defined by the 5 SNPs in complete LD. Though the chr22:45,893,198 variant is the only coding variant within this region, there are other variants in the locus located in genetically conserved regions or in cis-regulatory elements. In Fig. 4 we have visualized the SNPs lifted over to the human genome (hg38) in the UCSC browser⁹. In total, 85 of 120 SNPs could be lifted over to the human genome (hg38).

Allele frequencies of the chr22:45,893,198 SNP in other breeds. Allele frequencies for the ABCC4 candidate SNP chr22:45,893,198 were available in five other dog breeds and in a separate pool of American golden retrievers, from a Panel of Normals (PON) dataset¹⁰. This dataset was collected as part of a cancer study


Chr | Position | Reference | Alternative | Risk allele | Amino acid change | SIFT score | LD with GWAS locus | P value GWAS (98 cases/96 controls) | Allele frequency (98 cases/96 controls), risk allele | Dominant amino acid, 100 mammals
chr22 | 45,815,581 | G | A | A | 972 [A/T] | Tolerated (0.53) | 0.64 | 4.9 × 10⁻⁴ | 0.14/0.03 | A (T not seen in other species)
chr22 | 45,823,359 | G | A | G* | 874 [V/I] | Tolerated (1) | Nil | Nil | 1.0/1.0 | I (V seen in mouse, shrew, dog)
chr22 | 45,893,198 | A | G | A | 787 [M/V] | Tolerated (1) | 1.00 | 1.2 × 10⁻⁶ | 0.21/0.05 | V (M seen in pig, naked mole rat, chinchilla, brush tailed rat, dog, lizard and flared flycatcher)
chr22 | 45,934,522 | C | T | T | 297 [A/V] | Tolerated (0.08) | 0.01 | 0.23 | 0.74/0.69 | A and S (V not seen in other species)

Table 2. Fine-mapped SNPs with potential function in the chr22 region. The four missense SNPs in the ABCC4 gene are displayed. They were identified in the 45–46 Mb region of chromosome 22 based on next generation sequencing. The reference, risk and protective alleles for each SNP are listed, as well as the associated p-values for the SNPs when incorporated into the GWAS dataset for 98 cases and 96 controls. The r² value is listed for the LD between the most associated GWAS SNP on chromosome 22 and each of the coding SNPs. An evaluation of the most common amino acid residues for the ABCC4 coding changes based on the UCSC genome browser is also noted. *Only one dog (control) in the extended dataset with 292 dogs was heterozygous for the chr22:45,823,359 SNP.
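To illustrate what the LD column in Table 2 measures, the sketch below computes r² between two SNPs as the squared Pearson correlation of their allele counts (0, 1 or 2 copies of a chosen allele per dog). This is a simple approximation for unphased genotypes and not necessarily the estimator PLINK applied in this study; the function name `genotype_r2` and the three toy genotype vectors are hypothetical.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation of allele counts; None marks a missing call."""
    pairs = [(a, b) for a, b in zip(dosages_a, dosages_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov * cov / (var_a * var_b)


# Toy genotypes (hypothetical), one entry per dog.
gwas_snp = [0, 1, 0, 2, 1, 0, 0, 1]
coding_snp = [0, 1, 0, 2, 1, 0, 0, 1]    # always co-inherited with gwas_snp
other_snp = [0, 1, 1, 2, 0, None, 1, 1]  # only partly correlated, one missing call

r2_complete = genotype_r2(gwas_snp, coding_snp)
r2_partial = genotype_r2(gwas_snp, other_snp)
print(f"complete LD: r2 = {r2_complete:.2f}; partial LD: r2 = {r2_partial:.2f}")
```

An r² of 1.00, as listed for chr22:45,893,198, means that the two variants carry identical association information, so the association test alone cannot separate the coding variant from the GWAS SNPs.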

unrelated to the current study; hence the dataset contained both male and female dogs, and their disease history and neutering status are unknown. The allele frequency for the chr22:45,893,198 SNP in this population is summarized in Table S1. In this population of different dog breeds, the allele frequency for the risk variant varies from 0.37 in golden retrievers to 0.98 in Rottweilers. The Rottweiler is one of the breeds with the highest risk of developing pyometra (adjusted risk 4.4).

Discussion

We performed a GWAS comparing golden retrievers affected by pyometra with healthy intact female dogs older than 7 years of age. We found a genome-wide significant association to a region on chromosome 22 localized in the ABCC4 gene. Sequencing data identified four non-synonymous SNPs in the ABCC4 gene. Genotyping of the coding SNPs in a larger cohort of golden retriever cases and controls showed that one SNP in particular, chr22:45,893,198, was in complete LD with the top-associated GWAS SNPs, suggesting a potential causal function relating to the risk of developing pyometra in this breed, though other non-coding SNPs in this region could also be implicated. When performing a conditional analysis with the chromosome 22 associated SNP, the SNP at chr18:49,198,998 became more strongly associated, implying that this complex disease is likely promoted by several genetic risk factors.

The ABCC4 protein, also known as multidrug resistance protein 4 (MRP4), is a member of the ATP-binding cassette transporter family, which encodes proteins that are important for transportation of endogenous and exogenous molecules across cell membranes¹¹,¹². Although ABCC4 has not previously been associated with uterine inflammation in dogs, it is known that the protein has central functions in the reproductive system, and its role in transporting prostaglandins (PGs) in the endometrium is well-established¹¹,¹². Prostaglandins have many roles in reproduction¹³,¹⁴.
The function and life-span of the corpus luteum and the physiology of parturition are regulated by complex interactions in which PGs participate¹⁵. PGs are also important inflammatory mediators and are produced and released from neutrophils, macrophages, lymphocytes, and platelets during inflammation. Importantly, uterine endometrial cells are capable of synthesizing and releasing PGs¹⁶. In dogs as well as many other mammals, circulating levels of PGs increase in uterine inflammatory conditions¹⁶–²². In the uterine tissue during canine pyometra, Prostaglandin-endoperoxide synthase 2 (PTGS2), a gene that is responsible for PG synthesis, is among the top genes for which expression is increased²³,²⁴. Furthermore, PGF₂α induces myometrial contractions, and is used therapeutically in medical treatment of pyometra in dogs²⁵,²⁶. Taken together, the many roles of PGs suggest that altered prostaglandin transport could contribute to the development of pyometra²⁷,²⁸.

The associated coding sequence variant at chr22:45,893,198 causes an amino acid change, p.M787V. This amino acid is located in the cytoplasmic loop 3 (CL3) of the ABCC4 protein, a region of the protein that is phylogenetically conserved²⁹. Across mammalian species the most common amino acid residue at position 787 is valine, and out of 61 mammals only the dog, pig, brush tailed rat, naked mole rat and chinchilla have methionine in this position⁹,³⁰. In this study the risk allele retains the less common amino acid methionine, whilst the protective allele results in a change to the more common valine. A recent paper described the importance of the cytoplasmic loop 3 and how a single amino acid substitution in ABCC4, p.T796M, could reduce the expression and stability of the human ABCC4 protein.
In that study the p.T796M ABCC4 substitution was predicted to be benign and well tolerated by SIFT and PolyPhen; however, the authors found this unlikely due to the larger size of methionine²⁹. In our study, the canine p.M787V ABCC4 substitution was also


Figure 4. Detailed view of the associated locus on chromosome 22 lifted over to the human genome GRCh38. (a) Liftover of identified genetic variants located within the complete LD locus from CanFam3.1 to hg38. A separate track shows the location of the five associated GWAS SNPs and the identified coding variant within ABCC4. (b) Zoom in on the coding variant and the two nearest associated GWAS SNPs lifted over to human hg38. Tracks below the coding track include the ENCODE cis-regulatory elements track, the gene expression track, the Multiz Alignments of 100 Vertebrates conservation track and the H3K27Ac regulatory elements track⁹.

predicted to be benign and well tolerated by PolyPhen and SIFT. Nevertheless, it is possible that it could influence the ABCC4 cellular transportation capacity²⁹.

The ABCC4 risk variant, rather than the non-risk allele, is present in the canine reference genome sequence. The CanFam 3.1 assembly is based on a single individual (a female boxer) and it is feasible that this individual carries the risk allele, as the disease is common in boxers (diagnosed in 28% by 10 years of age, age adjusted relative risk 2.7)². Though we predict that the coding sequence variant chr22:45,893,198 can influence the risk of developing pyometra, it is also possible that this SNP could increase the reproductive potential of the individuals, for example by contributing to increased fertility, litter size, conception, or fetal growth, and therefore could have been under selection. The susceptibility to disease is likely not selected against, as the majority of pyometra cases develop after the main reproductive period.

The discrepancy between the association results in the smaller GWAS population and the larger TaqMan genotyping population can possibly be explained by the larger dataset including dogs that were excluded from the original GWAS dataset due to first degree relatedness. Hence the association analysis containing more dogs included many highly related individuals, which could falsely skew the data away from the original association. Due to the few variants in the dataset it is not possible to correct this association using a mixed model approach.

Interestingly, the chr22:45,893,198 SNP showed variation in allele frequency across breeds, with the Rottweiler being almost completely fixed for the risk allele. The Rottweiler is ranked as one of the dog breeds with the highest risk of developing pyometra (61% are diagnosed with the illness by 10 years of age)¹,². This suggests


that this variant could contribute to the disease risk in other breeds; however, due to its high allele frequency it would be difficult to identify an association in the Rottweiler.

The ABCC4 transporter is known for its involvement in transporting drugs and other molecules across the cell membrane, and altered function can lead to increased cellular toxicity on exposure to various drugs. It is unknown whether the ABCC4 amino acid changes identified here can affect drug transport, but differences in NSAID transportation have been linked with altered ABCC4 function³¹.

In total 120 SNPs were identified in the associated GWAS locus. When lifted over to the human genome hg38, 85 SNPs could be transferred. Some of these SNPs were found to be located in areas with H3K27Ac histone enrichment and areas with a high conservation score based on data from Multiz alignments of 100 Vertebrates (Fig. 4)⁹,³⁰. Even though this study focused on evaluating the coding mutations, it is possible that other variants could have a functional impact on gene expression or function and influence the risk of disease development.

Pyometra is likely induced by multifactorial genetic and environmental factors; hence we investigated potential associated loci independent of the chr22 locus, as well as risk factors associated with age of onset. Though none of the other loci reached Bonferroni corrected significance, we detected suggestive associations to SNPs located within introns of several genes (TESMIN, SORBS1, NAF1 and EMCN). Based on the known function of the proteins encoded by these genes, potential implications for the development of pyometra could be considered³²–³⁶. A stronger association was seen to the SNP chr18:49,198,998 in the larger genotyping cohort. This SNP lies within the intronic region of TESMIN. The TESMIN gene encodes a metallothionein-like protein.
In the mouse, this gene is expressed in both male and female reproductive organs and changes in expression have been observed in the endometrium in response to stress³⁷. This locus was not studied in detail here, though many additional potentially implicated genes were located within high LD of the associated SNP and additional genetic variants were identified in this region. Larger genome-wide association studies of more golden retrievers and other high-risk breeds will be necessary to achieve sufficient power to detect additional loci at genome-wide significance.

In conclusion, this GWAS identified an association to a locus in the ABCC4 gene and subsequently identified a non-synonymous SNP in complete LD with the most highly associated GWAS SNP. Further validation studies are needed to establish the direct functional consequence of the ABCC4 risk variant on the transport of prostaglandins and development of pyometra.

Materials and methods

Ethical approval. All methods were carried out in accordance with relevant guidelines and regulations. Samples were collected with the owners' written informed consent and in agreement with ethical guidelines. Ethical approval was granted by the regional animal ethics committee (Uppsala ethics committee on animal experiments/Uppsala djurförsöksetiska nämnd: Dnr C269/8, D318/9, C139/9, C41/12).

Sample collection. Blood samples were collected from female golden retrievers affected by or with a history of pyometra. The dogs were identified through the diagnostic code for pyometra in the Agria Animal Insurance Inc. database, which has been validated for research purposes³⁸. A questionnaire was filled in by each dog owner directly prior to blood sample collection. Details of the dog's Swedish Kennel Club (SKK) registry number, name, age, birth date, previous whelping, onset of signs of pyometra, whether surgical treatment (ovariohysterectomy) had been performed or not, and past or present other diseases or medications were noted in the questionnaire. In parallel, blood samples were similarly collected from control dogs (> 7 years old intact female golden retrievers), identified via the SKK registry. Questionnaire and health information was also collected from the control dogs. Information regarding pedigree was extracted from the SKK database, based on the individual dog's registration number. Siblings were excluded so as to include only one individual from each sibling group in the case and control groups, respectively. The health status of the control dogs was updated yearly by telephone contact, to ensure that none of the controls had developed pyometra at the time of data analysis.

DNA extraction. Genomic DNA was extracted from EDTA blood by a robotic method using the QIASymphony robot (Qiagen, Hilden, Germany) together with the QIAamp DNA Blood Midi Kit (Qiagen).

Genome‑wide genotyping. DNA from each dog was genotyped using the Illumina 170 K CanineHD BeadChip (Illumina, San Diego, CA, USA). A total of 98 cases and 96 control samples were genotyped on the NeoGen Genomics genotyping platform (NeoGen Genomics, Lincoln, NE, USA).

GWAS analysis. Data quality control and filtering was performed using the software PLINK 1.07³⁹. SNPs were removed if they had a minor allele frequency (MAF) less than 5% or if they had failed to be genotyped in more than 5% of samples (--maf 0.05, --geno 0.05). Individuals with a genotyping rate of less than 95% were removed (--mind 0.05). To evaluate and visualize the population structure, a multidimensional scaling (MDS) plot was generated on the filtered dataset using PLINK 1.07³⁹. The first 4 coordinates (C1–C4) were calculated and the first two were plotted against each other to illustrate population structure. An additional control for relatedness within the cases or controls was performed using PLINK v 1.9 by calculating PI_HAT on the final dataset after LD pruning³⁹. To account for cryptic relatedness within the population and population structure, the GWAS analysis was performed using the efficient mixed model association expedited software (EMMAX)⁸. The software was used with an identity by state (IBS) matrix. QQ-plots and Manhattan plots were generated in R using the software package Lattice⁴⁰. The significance threshold was determined using an LD corrected Bonferroni significance


threshold based on 11,897 SNPs that were not in complete or near-complete LD as calculated by PLINK --indep 100 10 10, as previously described⁴¹.

A conditional association analysis to look for risk factors independent of the top associated locus was performed using the genotype for the associated SNP (chr22:45,875,420) as a covariate in the GWAS analysis using EMMAX.

To search for risk factors associated with age of onset, age was calculated in days for all the cases. A GWAS analysis was then performed including only the cases and using the age of onset in days as a continuous variable using the EMMAX software.
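The LD-corrected Bonferroni threshold can be reproduced from the number of approximately independent SNPs given above. The sketch below assumes a family-wise significance level of 0.05, which is not stated explicitly in the text but matches the reported threshold of 4.2 × 10⁻⁶; the variable names are illustrative.

```python
# ASSUMPTION: family-wise alpha of 0.05 (not stated in the text, but it
# reproduces the reported threshold of 4.2e-06).
ALPHA = 0.05
N_INDEPENDENT_SNPS = 11_897  # SNPs left after LD pruning, as reported

threshold = ALPHA / N_INDEPENDENT_SNPS
print(f"Bonferroni threshold = {threshold:.1e}")  # 4.2e-06

# P-values reported in the Results, compared against the threshold.
reported_p = {
    "chr22 locus (top SNPs)": 1.24e-6,
    "chr18:51,224,157": 5.2e-5,
    "chr28:8,872,257": 6.0e-5,
}
passes = {locus: p < threshold for locus, p in reported_p.items()}
for locus, significant in passes.items():
    label = "genome-wide significant" if significant else "suggestive only"
    print(f"{locus}: {label}")
```

Dividing by the pruned SNP count rather than by all 97,468 tested SNPs is less conservative, on the reasoning that SNPs in near-complete LD do not constitute independent tests.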

Whole genome sequencing. Whole genome sequencing data was generated from one pool of 10 golden retrievers, all heterozygous for the risk haplotype, and from one dog homozygous for the risk haplotype. In addition, individually barcoded sequencing data was generated from 10 individuals homozygous for the non-risk haplotype (PRJNA693123). Samples were sequenced as paired-end libraries with 100 bp read length on the Illumina HiSeq2000 system. Data was aligned to the CanFam3.1 reference genome sequence (http://www.ncbi.nlm.nih.gov/assembly/317138) according to the Genome Analysis Tool Kit (GATK) best practices workflow⁴². Variants in the regions of interest were annotated using Variant Effect Predictor⁴³, including annotation with SIFT⁴⁴ and PolyPhen-2⁴⁵.

TaqMan genotyping of SNPs in candidate region. TaqMan custom arrays (Table S2) were designed for genotyping of four non-synonymous coding SNPs in the ABCC4 gene using the Custom TaqMan Assay Design Tool (ThermoFisher Scientific, Waltham, MA, USA). Assays were also designed for two of the top GWAS SNPs, as controls and to evaluate genotypes for additional samples that were not genotyped on the 170 k CanineHD BeadChip. Genotyping was performed in 292 golden retrievers, including 134 cases and 158 controls. This included 97 cases and 96 controls genotyped for the GWAS analysis and additional novel cases and controls. One of the cases from the GWAS study could not be genotyped for the additional candidate SNPs due to lack of DNA. Manual imputation of the ABCC4 coding SNPs with an r² above 0.99 to the original GWAS locus was performed for this individual. The basic association on the TaqMan genotyping data was carried out using PLINK 1.07³⁹.

Allele frequencies in panel of normals. Genotypes for the four coding SNPs in the ABCC4 gene were extracted from a Panel of Normals (PON) database generated for a separate study, which is displayed as a track on the UCSC Genome Browser10.

Received: 28 March 2021; Accepted: 29 July 2021

References
1. Jitpean, S. et al. Breed variations in the incidence of pyometra and mammary tumours in Swedish dogs. Reprod. Domest. Anim. 47(Suppl 6), 347–350. https://doi.org/10.1111/rda.12103 (2012).
2. Egenvall, A. et al. Breed risk of pyometra in insured dogs in Sweden. J. Vet. Intern. Med. 15, 530–538 (2001).
3. Hagman, R., Kindahl, H. & Lagerstedt, A. S. Pyometra in bitches induces elevated plasma endotoxin and prostaglandin F2alpha metabolite levels. Acta Vet. Scand. 47, 55–67 (2006).
4. Jitpean, S. et al. Serum insulin-like growth factor-I, iron, C-reactive protein, and serum amyloid A for prediction of outcome in dogs with pyometra. Theriogenology 82, 43–48. https://doi.org/10.1016/j.theriogenology.2014.02.014 (2014).
5. Hagman, R. Pyometra in small animals. Vet. Clin. N. Am. Small Anim. Pract. 48, 639–661. https://doi.org/10.1016/j.cvsm.2018.03.001 (2018).
6. Egenvall, A., Hedhammar, A., Bonnett, B. N. & Olson, P. Survey of the Swedish dog population: Age, gender, breed, location and enrollment in animal insurance. Acta Vet. Scand. 40, 231–240 (1999).
7. https://www.skk.se/sv/nyheter/2017/11/allt-fler-hundar-i-sverige/.
8. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354. https://doi.org/10.1038/ng.548 (2010).
9. http://genome.ucsc.edu.
10. Elvers, I. et al. Exome sequencing of lymphomas from three dog breeds reveals somatic mutation patterns reflecting genetic background. Genome Res. 25, 1634–1645. https://doi.org/10.1101/gr.194449.115 (2015).
11. Rius, M., Thon, W. F., Keppler, D. & Nies, A. T. Prostanoid transport by multidrug resistance protein 4 (MRP4/ABCC4) localized in tissues of the human urogenital tract. J. Urol. 174, 2409–2414. https://doi.org/10.1097/01.ju.0000180411.03808.cb (2005).
12. Russel, F. G., Koenderink, J. B. & Masereeuw, R. Multidrug resistance protein 4 (MRP4/ABCC4): A versatile efflux transporter for drugs and signalling molecules. Trends Pharmacol. Sci. 29, 200–207. https://doi.org/10.1016/j.tips.2008.01.006 (2008).
13. Samuelsson, B. The synthesis and biological role of prostaglandins. Biochem. J. 128, 4P (1972).
14. Bergstrom, S. et al. Prostaglandins in fertility control. Science 175, 1280–1287. https://doi.org/10.1126/science.175.4027.1280 (1972).
15. McCracken, J. A. et al. Prostaglandin F 2 identified as a luteolytic hormone in sheep. Nat. New Biol. 238, 129–134 (1972).
16. Heap, R. B. Prostaglandins in pyometrial fluid from the cow, bitch and ferret. Br. J. Pharmacol. 55, 515–518 (1975).
17. Hughes, J. P. et al. Pyometra in the mare. J. Reprod. Fertil. Suppl., 321–329 (1979).
18. Silva, E., Leitao, S., Ferreira-Dias, G., Lopes da Costa, L. & Mateus, L. Prostaglandin synthesis genes are differentially transcripted in normal and pyometra endometria of bitches. Reprod. Domest. Anim. 44(Suppl 2), 200–203. https://doi.org/10.1111/j.1439-0531.2009.01393.x (2009).
19. Hagman, R. et al. Differentiation between pyometra and cystic endometrial hyperplasia/mucometra in bitches by prostaglandin F2alpha metabolite analysis. Theriogenology 66, 198–206. https://doi.org/10.1016/j.theriogenology.2005.11.002 (2006).


20. Fu, X., Favini, R., Kindahl, K. & Ulmsten, U. Prostaglandin F2alpha-induced Ca++ oscillations in human myometrial cells and the role of RU 486. Am. J. Obstet. Gynecol. 182, 582–588 (2000).
21. Fredriksson, G., Kindahl, H., Sandstedt, K. & Edqvist, L. E. Intrauterine bacterial findings and release of PGF2 alpha in the postpartum dairy cow. Zentralbl. Veterinarmed. A 32, 368–380 (1985).
22. Wlodawer, P., Kindahl, H. & Hamberg, M. Biosynthesis of prostaglandin F2alpha from arachidonic acid and prostaglandin endoperoxides in the uterus. Biochim. Biophys. Acta 431, 603–614 (1976).
23. Voorwald, F. A. et al. Molecular expression profile reveals potential biomarkers and therapeutic targets in canine endometrial lesions. PLoS ONE 10, e0133894. https://doi.org/10.1371/journal.pone.0133894 (2015).
24. Hagman, R., Ronnberg, E. & Pejler, G. Canine uterine bacterial infection induces upregulation of proteolysis-related genes and downregulation of homeobox and zinc finger factors. PLoS ONE 4, e8039. https://doi.org/10.1371/journal.pone.0008039 (2009).
25. Hirsbrunner, G., Knutti, B., Kupfer, U., Burkhardt, H. & Steiner, A. Effect of prostaglandin E2, DL-cloprostenol, and prostaglandin E2 in combination with D-cloprostenol on uterine motility during diestrus in experimental cows. Anim. Reprod. Sci. 79, 17–32 (2003).
26. Fieni, F., Topie, E. & Gogny, A. Medical treatment for pyometra in dogs. Reprod. Domest. Anim. 49(Suppl 2), 28–32. https://doi.org/10.1111/rda.12302 (2014).
27. Sheldon, I. M., Cronin, J., Goetze, L., Donofrio, G. & Schuberth, H. J. Defining postpartum uterine disease and the mechanisms of infection and immunity in the female reproductive tract in cattle. Biol. Reprod. 81, 1025–1032. https://doi.org/10.1095/biolreprod.109.077370 (2009).
28. Herath, S. et al. Expression of genes associated with immunity in the endometrium of cattle with disparate postpartum uterine disease and fertility. Reprod. Biol. Endocrinol. 7, 55. https://doi.org/10.1186/1477-7827-7-55 (2009).
29. Cheepala, S. B. et al. Crucial role for phylogenetically conserved cytoplasmic loop 3 in ABCC4 protein expression. J. Biol. Chem. 288, 22207–22218. https://doi.org/10.1074/jbc.M113.476218 (2013).
30. Rosenbloom, K. R. et al. The UCSC genome browser database: 2015 update. Nucleic Acids Res. 43, D670–681. https://doi.org/10.1093/nar/gku1177 (2015).
31. Reid, G. et al. The human multidrug resistance protein MRP4 functions as a prostaglandin efflux transporter and is inhibited by nonsteroidal antiinflammatory drugs. Proc. Natl. Acad. Sci. USA 100, 9244–9249. https://doi.org/10.1073/pnas.1033060100 (2003).
32. Matsuura, T. et al. Germ cell-specific nucleocytoplasmic shuttling protein, tesmin, responsive to heavy metal stress in mouse testes. J. Inorg. Biochem. 88, 183–191. https://doi.org/10.1016/s0162-0134(01)00377-4 (2002).
33. Liu, J. L., Wang, T. S. & Zhao, M. Genome-wide association mapping for female infertility in inbred mice. G3 (Bethesda) 6, 2929–2935. https://doi.org/10.1534/g3.116.031575 (2016).
34. https://www.proteinatlas.org/ENSG00000145414-NAF1/tissue.
35. Stanley, S. E. et al. Loss-of-function mutations in the RNA biogenesis factor NAF1 predispose to pulmonary fibrosis-emphysema. Sci. Transl. Med. 8, 351ra107. https://doi.org/10.1126/scitranslmed.aaf7837 (2016).
36. Zahr, A. et al. Endomucin prevents leukocyte-endothelial cell adhesion and has a critical role under resting and inflammatory conditions. Nat. Commun. 7, 10363. https://doi.org/10.1038/ncomms10363 (2016).
37. Kalma, Y. et al. Endometrial biopsy-induced gene modulation: first evidence for the expression of bladder-transmembranal uroplakin Ib in human endometrium. Fertil. Steril. 91, 1042–1049. https://doi.org/10.1016/j.fertnstert.2008.01.043 (2009).
38. Egenvall, A., Bonnett, B. N., Olson, P. & Hedhammar, A. Validation of computerized Swedish dog and cat insurance data against veterinary practice records. Prev. Vet. Med. 36, 51–65 (1998).
39. Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. https://doi.org/10.1086/519795 (2007).
40. Sarkar, D. in Use R! (eds R. Gentleman, K. Hornik, & G. Parmigiani) (Springer, 2008).
41. Hayward, J. J. et al. Complex disease and phenotype mapping in the domestic dog. Nat. Commun. 7, 10460. https://doi.org/10.1038/ncomms10460 (2016).
42. McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. https://doi.org/10.1101/gr.107524.110 (2010).
43. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122. https://doi.org/10.1186/s13059-016-0974-4 (2016).
44. Sim, N. L. et al. SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 40, W452–457. https://doi.org/10.1093/nar/gks539 (2012).
45. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249. https://doi.org/10.1038/nmeth0410-248 (2010).

Acknowledgements
The owners of participating golden retrievers and all veterinarians that performed sampling are gratefully acknowledged; without your help and support the study would not have been possible. The Agria Animal Insurance Inc. and Swedish Kennel Club (SKK) Research Foundation and Fjällveterinärerna are acknowledged for financial support. In addition, the Agria Animal Insurance Inc. and SKK contributed with data from the insurance database and SKK registry. We would like to thank Annalena Hartmann, Leonie Neumann and Eva Murén for help with the TaqMan genotyping. Further we would like to thank Susanne Gustafsson and the biobank at the Swedish University of Agricultural Sciences for DNA extractions and handling and storing of genetic material, and Veronika Scholtz for helping out during her MSc studies. We would like to thank the Swedish Golden Retriever Club for encouragement, support and for spreading information about the project. We thank the National Genomics Infrastructure in Uppsala funded by Science for Life Laboratory, for assistance, and for access to the UPPMAX computational infrastructure.

Author contributions
R.H., K.L.T. and G.A. designed the study. T.F. contributed during the initial phase of study design. R.H. was project leader. R.H. and A.A. collected and phenotyped samples. A.S.L., S.G., K.T. and T.B. gave input on data collection. M.L.A. analyzed the GWAS and whole genome sequencing data and designed the TaqMan assays. M.K. and T.F. assisted in GWAS data analysis. Å.K. and J.R.S.M. performed the TaqMan genotyping of SNPs. R.H., M.L.A., K.L.T. and G.A. drafted the manuscript with input from all co-authors.

Funding
Open access funding provided by Uppsala University and the Swedish University of Agriculture. The Agria Animal Insurance Inc and SKK Research Foundation and Fjällveterinärerna financially supported the study.


Competing interests
A patent application has been submitted (ref HSJ114786P.SEA) covering the findings from this research study.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-95936-1.
Correspondence and requests for materials should be addressed to M.A. or R.H.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021
