Endoscopy

Original research

Genome-wide analysis of 944 133 individuals provides insights into the etiology of haemorrhoidal disease

Tenghao Zheng,1,2 David Ellinghaus,3,4 Simonas Juzenas,3,5 François Cossais,6 Greta Burmeister,7 Gabriele Mayr,3 Isabella Friis Jørgensen,4 Maris Teder-Laving,8 Anne Heidi Skogholt,9 Sisi Chen,10 Peter R Strege,10 Go Ito,3,11 Karina Banasik,4 Thomas Becker,12 Frank Bokelmann,13 Søren Brunak,4 Stephan Buch,14 Hartmut Clausnitzer,15 Christian Datz,16 DBDS Consortium, Frauke Degenhardt,3 Marek Doniec,17 Christian Erikstrup,18 Tõnu Esko,8 Michael Forster,3 Norbert Frey,19,20,21 Lars G Fritsche,22 Maiken Elvestad Gabrielsen,9 Tobias Gräßle,23,24 Andrea Gsur,25 Justus Gross,7 Jochen Hampe,14,26 Alexander Hendricks,7 Sebastian Hinz,7 Kristian Hveem,9 Johannes Jongen,27,28 Ralf Junker,15 Tom Hemming Karlsen,29 Georg Hemmrich-Stanisak,3 Wolfgang Kruis,30 Juozas Kupcinskas,31 Tilman Laubert,27,28,32 Philip C Rosenstiel,3,33 Christoph Röcken,34 Matthias Laudes,35 Fabian H Leendertz,23,24 Wolfgang Lieb,36 Verena Limperger,15 Nikolaos Margetis,37 Kerstin Mätz-Rensing,38 Christopher Georg Németh,12,39 Eivind Ness-Jensen,9,40,41 Ulrike Nowak-Göttl,15 Anita Pandit,22 Ole Birger Pedersen,42 Hans Günter Peleikis,27,28 Kenneth Peuker,14,26 Cristina Leal Rodriguez,4 Malte Christoph Rühlemann,3 Bodo Schniewind,43 Martin Schulzky,3 Jurgita Skieceviciene,31 Jürgen Tepel,44 Laurent Thomas,9,45,46,47 Florian Uellendahl-Werth,3 Henrik Ullum,48 Ilka Vogel,49 Henry Volzke,50 Lorenzo von Fersen,51 Witigo von Schönfels,12 Brett Vanderwerff,22 Julia Wilking,7 Michael Wittig,3 Sebastian Zeissig,14,26 Myrko Zobel,52 Matthew Zawistowski,22 Vladimir Vacic,53 Olga Sazonova,53 Elizabeth S Noblin,53 The 23andMe Research Team, Gianrico Farrugia,10 Arthur Beyder,10 Thilo Wedel,6 Volker Kahlke,27,28,54 Clemens Schafmayer,7 Mauro D’Amato,1,2,55,56 Andre Franke3,33

►► Additional supplemental material is published online only. To view, please visit the journal online (http://dx.doi.org/10.1136/gutjnl-2020-323868).

For numbered affiliations see end of article.

Correspondence to Prof Mauro D’Amato, Gastrointestinal Genetics Lab, CIC bioGUNE - BRTA, Bizkaia Science and Technology Park, Building 801A, 48160 Derio, Spain; mdamato@cicbiogune.es and Prof Andre Franke, Institute of Clinical Molecular Biology (IKMB) & University Hospital Schleswig-Holstein (UKSH), Christian-Albrechts-University of Kiel (CAU), Rosalind-Franklin-Str. 12, D-24105 Kiel, Germany; a.franke@mucosa.de

Gut: first published as 10.1136/gutjnl-2020-323868 on 22 April 2021. Downloaded from http://gut.bmj.com/ on September 25, 2021 by guest. Protected copyright.

TZ, DE and SJ contributed equally. CS, MD and AF contributed equally.

Received 18 December 2020
Revised 19 March 2021
Accepted 30 March 2021

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

To cite: Zheng T, Ellinghaus D, Juzenas S, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/gutjnl-2020-323868

ABSTRACT
Objective Haemorrhoidal disease (HEM) affects a large and silently suffering fraction of the population but its aetiology, including suspected genetic predisposition, is poorly understood. We report the first genome-wide association study (GWAS) meta-analysis to identify genetic risk factors for HEM to date.
Design We conducted a GWAS meta-analysis of 218 920 patients with HEM and 725 213 controls of European ancestry. Using GWAS summary statistics, we performed multiple genetic correlation analyses between HEM and other traits, calculated HEM polygenic risk scores (PRS) and evaluated their translational potential in independent datasets. Using functional annotation of GWAS results, we identified HEM candidate genes, whose differential expression and coexpression in HEM tissues were evaluated employing RNA-seq analyses. The localisation of proteins expressed at selected loci was investigated by immunohistochemistry.
Results We demonstrate modest heritability and genetic correlation of HEM with several other diseases from the GI, neuroaffective and cardiovascular domains. HEM PRS validated in 180 435 individuals from independent datasets allowed the identification of those at risk and correlated with younger age of onset and recurrent surgery. We identified 102 independent HEM risk loci harbouring genes whose expression is enriched in blood vessels and GI tissues, and in pathways associated with smooth muscles, epithelial and endothelial development and morphogenesis. Network transcriptomic analyses highlighted HEM coexpression modules that are relevant to the development and integrity of the musculoskeletal and epidermal systems, and the organisation of the extracellular matrix.
Conclusion HEM has a genetic component that predisposes to smooth muscle, epithelial and connective tissue dysfunction.

Zheng T, et al. Gut 2021;0:1–12. doi:10.1136/gutjnl-2020-323868 1 Endoscopy

Significance of this study

What is already known about this subject?
►► Human haemorrhoidal disease (HEM) is a prevalent anorectal pathology characterised by the symptomatic enlargement and distal displacement of anal cushions.
►► As a consequence of scarce research and of being a taboo topic in society, only a few HEM non-genetic risk factors have been suggested, thus the aetiology of the disease still remains unclear.
►► Genetic susceptibility to HEM has been suspected but was never systematically investigated.

What are the new findings?
►► Here, we report the first well-powered genome-wide association study (GWAS) meta-analysis with a sample size of 944 133 individuals to identify genetic risk factors for HEM.
►► We describe 102 novel independent HEM risk loci that are functionally linked to pathways associated with smooth muscles, epithelial and endothelial development and morphogenesis.
►► We show genetic correlations of HEM with several other diseases that are classified as GI, neuroaffective and cardiovascular disorders.
►► We report significance of computed HEM polygenic risk scores, which were validated in independent population-based cohorts.
►► We show significant enrichment of HEM genes in tissue coexpression modules responsible for the development of musculoskeletal and epidermal systems, and the organisation of the extracellular matrix.
►► Based on our data, we outline HEM as a disorder of impaired neuromuscular motility, smooth muscle contraction and extracellular matrix organisation.

How might it impact on clinical practice in the foreseeable future?
►► The results from this GWAS provide new insights on HEM genetic predisposition to smooth muscle, epithelial and connective tissue dysfunction.
►► HEM polygenic risk scores identify individuals at risk and correlate with a more severe phenotype.

INTRODUCTION
Haemorrhoids are normal anal vascular cushions filled with blood at the junction of the rectum and the anus. It is assumed that their main role in humans is to maintain continence¹ but other functions such as sensing fullness, pressure and perceiving anal contents have been suggested given the sensory innervation.² Haemorrhoidal disease (hereafter referred to as HEM) occurs when haemorrhoids enlarge and become symptomatic (sometimes associated with rectal bleeding and itching/soiling) due to the deterioration or prolapse of the anchoring connective tissue, the dilation of the haemorrhoidal plexus or the formation of blood clots. Severe forms of HEM often require surgical treatment and the removal of abnormally enlarged and/or thrombosed haemorrhoids.¹ HEM prevalence increases with age and shows staggering figures worldwide (up to 86% prevalence in some reports),³ whereby a large proportion of cases remain undetected as asymptomatic or mild enough to be self-treated with over-the-counter treatment. HEM represents a considerable medical and socioeconomic burden with an estimated annual cost of US$800 million in the USA alone, mainly related to the large number of haemorrhoidectomies performed every year.⁴

A number of HEM risk factors have been suggested, including human erect position. The tight anal sealing provided by the elaborated haemorrhoidal plexus may have developed during human evolution co-occurring with permanent bipedalism, as shown by our histology comparison of four different mammals (human, gorilla, baboon, mouse; online supplemental figure 1, online supplemental material 1). Other suggested risk factors are a sedentary lifestyle, obesity, reduced dietary fibre intake, spending excess time on the toilet, straining during defecation, strenuous lifting, constipation, diarrhoea, pelvic floor dysfunction, pregnancy and giving natural birth, with several being controversially reported. The hypothetical model shown in online supplemental figure 2 summarises the contemporary concepts regarding the pathophysiology of HEM development.⁵

Until today, HEM aetiopathogenesis is poorly investigated, and neither the exact molecular mechanisms nor the reason(s) why only some people develop HEM are known. Genetic susceptibility may play a role in HEM development, but no large-scale, genome-wide association study (GWAS) for HEM has ever been conducted. To evaluate the contribution of genetic variation to the genetic architecture of HEM, we carried out a GWAS meta-analysis in 218 920 affected individuals and 725 213 population controls of European ancestry.

METHODS
Detailed methods are provided in the online supplemental material 1 and summarised in the online supplemental figure 3.

RESULTS
GWAS meta-analysis and fine-mapping genomic regions
We conducted a GWAS meta-analysis in 944 133 individuals of European ancestry from five large population-based cohorts (23andMe,⁶ UK Biobank (UKBB),⁷ Estonian Genome Centre at the University of Tartu,⁸ Michigan Genomics Initiative,⁹ and Genetic Epidemiology Research on Aging (GERA)¹⁰) including 218 920 HEM cases and 725 213 controls (online supplemental table 1). HEM cases had higher body mass index (BMI), were significantly older and more often women compared with non-HEM controls; hence, age, sex, BMI (if available) and top principal components from principal component analysis (PCA) were included as covariates in individual GWAS (see ‘Methods’ section in online supplemental material 1, hereafter referred to as online methods).

After data harmonisation and quality control, 8 494 288 high-quality, common (minor allele frequency >0.01) single nucleotide polymorphisms (SNPs) were included in a fixed-effect inverse variance meta-analysis using the software METAL¹¹ (online methods). We identified 5480 genome-wide significant associations (PMeta <5×10⁻⁸), which were mapped to 102 independent genomic regions using FUMA¹² (online methods). A summary of the association results for 102 lead SNPs is provided in figure 1 and online supplemental table 2. Although genomic inflation was observed (λ=1.3; online supplemental figure 4), this was likely due to polygenicity rather than population stratification, as determined via linkage disequilibrium score regression analysis (LDSC, intercept=1.06; online methods) and based on a normalised λ1000=1.001. The heritability of HEM was estimated at 5% (SNP-based heritability, h²SNP computed with LDSC), whereby the newly identified 102 risk variants explained about 0.9% of the variance (online methods).


Figure 1 Annotation of 102 haemorrhoidal disease (HEM) genome-wide association study (GWAS) risk loci. From left to right: Manhattan plot of GWAS meta-analysis results (genome-wide significance level, PMeta <5×10⁻⁸, indicated with vertical dotted red line); Lead single nucleotide polymorphism (SNP): marker associated with the strongest association signal from each locus (also annotated with a red circle in the Manhattan plot); Effect allele: allele associated with reported genetic risk effects (OR), also always the minor allele; OR with respect to the effect allele; Effect allele frequency: frequency of the effect allele in the discovery dataset; Number of SNPs in 95% credible set: the minimum set of variants from Bayesian fine-mapping analysis that is >95% likely to contain the causal variant; SNP with probability >50%: single variant (if detected) with >50% probability of being causal (coding SNPs highlighted in red); Nearest gene (#genes within locus boundaries): gene closest to the lead SNP (if within 100 kb distance, otherwise ‘na’) and number of additional genes positionally mapped to the locus using FUMA (online supplemental table 2 and online methods). Signif. DGEx: locus containing HEM genes differentially expressed in RNA Combo-Seq analysis of HEM affected tissue, detected at higher (green) and/or lower (red) level of expression (see online methods).
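The notion of a 95% credible set used in this figure can be illustrated with a short Python sketch: variants at a locus are ranked by posterior probability of being causal and added until their cumulative probability reaches the requested coverage. This is a simplified illustration of the general idea, not the FINEMAP algorithm; the function name, the SNP identifiers and the posterior probabilities below are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    posteriors: dict mapping a variant ID to its posterior probability of
    being causal at one locus (hypothetical values, assumed to sum to ~1).
    Returns the variant IDs in decreasing order of posterior probability.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # A locus resolved to a single variant (set of size 1).
    print(credible_set({"snpA": 0.97, "snpB": 0.02, "snpC": 0.01}))
    # A locus where several variants are needed to reach 95% coverage.
    print(credible_set({"snp1": 0.60, "snp2": 0.30, "snp3": 0.08, "snp4": 0.02}))
```

A locus whose credible set contains a single variant corresponds to the "single causal variant" situation reported in the figure, whereas larger sets indicate that the signal cannot be resolved further with the available data.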


Bayesian fine-mapping analysis with FINEMAP¹³ identified a total of 3323 SNPs that belong to the 95% credible sets of variants most likely to be causal at each locus (online methods, online supplemental figure 5 and online supplemental table 3). For six loci, we pinpointed the association signal to a single causal variant with greater than 95% certainty, including two missense variants rs2186797 (ANO1) and rs35318931 (SRPX). For another 24 loci, there was evidence that the lead variant is causal with >50% certainty (figure 1).

Cross-trait analyses
A lookup of HEM association signals in previous GWAS studies retrieved from GWAS Atlas,¹⁴ GWAS Catalog¹⁵ and via Phenoscanner v2¹⁶ (online methods) revealed that 76/102 loci had been previously associated with diseases and traits across the metabolic, cardiovascular, digestive, psychiatric, environmental and other domains (online supplemental figure 6 and online supplemental table 4).

We then investigated genetic correlations with 1387 other human traits and conditions using LDSC as implemented in CTG-VL¹⁷ (online methods). The strongest correlations (rg) were observed with diseases and traits from the GI domain, including ‘other diseases of anus and rectum’ (rg=0.78, PFDR=4.94×10⁻⁸), ‘fissure and fistula of anal and rectal regions’ (rg=0.58, PFDR=2.70×10⁻¹²), ‘self-reported IBS’ (rg=0.42, PFDR=1.87×10⁻⁷), use of ‘laxatives’ (rg=0.42, PFDR=2.35×10⁻¹⁴), and ‘diverticular disease’ (rg=0.23, PFDR=6.68×10⁻⁹) among others (figure 2). Given these similarities, we performed a genome-wide pleiotropy analysis for diverticular disease, IBS and HEM, revealing 44 independent genomic regions shared by at least two of these three phenotypes (online supplemental table 5). Other notable correlations were detected for psychiatric and neuroaffective disorders (anxiety, depression and neuroticism), pain-related traits (including abdominal pain and painful gums), diseases of the circulatory system, and diseases of the musculoskeletal system and connective tissue (figure 2 and online supplemental table 6).

Figure 2 Genetic correlation between haemorrhoidal disease and other traits estimated by linkage disequilibrium score regression analysis. Genetic correlations (rg +se) are shown for selected traits, grouped by domain. Only correlations significant after Bonferroni correction were considered (full list available in online supplemental table 6). ICD, International Classification of Diseases.
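For readers less familiar with the multiple-testing corrections referred to here, the sketch below contrasts a Bonferroni significance threshold with false discovery rate (FDR) adjusted p values. It is a hedged illustration: we assume that the Bonferroni correction is applied over the 1387 traits tested and that FDR adjustment follows the Benjamini-Hochberg procedure, neither of which is specified in this section; the function names and the example p values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Threshold if 1387 traits are tested at alpha=0.05 (assumed setting).
    print(f"Bonferroni threshold: {bonferroni_threshold(0.05, 1387):.2e}")
    # Toy p values (not data from this study).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Bonferroni correction is the more conservative of the two, which is consistent with its use here to select the traits displayed, while FDR-adjusted values are those reported alongside the individual correlations.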


Figure 3 Analysis of haemorrhoidal disease (HEM) genetically correlated traits in UK Biobank (UKBB) and the Danish National Patient Registry (DNPR). Traits and conditions identified in linkage disequilibrium score regression analyses of genetic correlation with HEM (outer ring in the circos plot, see also figure 2 and online supplemental table 6) were studied for their differential prevalence in UKBB and DNPR, based on data extracted from participants’ healthcare records. Significant results are reported, respectively, as ORs (log(OR), UKBB, middle ring) and relative risk (log(RR), DNPR, inner ring) or ‘ns’ (for non-significant findings). Diseases and traits are categorised according to ICD10 diagnostic codes or self-reported conditions and use of medications from questionnaire data (see online methods). Self-reported traits in UKBB (dark blue colour) were manually mapped to ICD10 codes in DNPR.
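The two effect measures plotted in this figure can be derived from a 2×2 table of trait status by HEM status. The following Python sketch shows the unadjusted calculation with hypothetical counts; the function names are illustrative, and the analyses in UKBB and DNPR may include covariate adjustment that this simple example does not attempt to reproduce.

```python
import math


def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio from a 2x2 table.

    a: HEM patients with the trait      b: HEM patients without the trait
    c: controls with the trait          d: controls without the trait
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Unadjusted relative risk (ratio of trait prevalence in the two groups)."""
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical counts (not data from this study).
    table = (20, 80, 10, 90)
    or_value = odds_ratio(*table)
    rr_value = relative_risk(*table)
    print(f"OR={or_value:.2f}, log(OR)={math.log(or_value):.3f}")
    print(f"RR={rr_value:.2f}, log(RR)={math.log(rr_value):.3f}")
```

On the log scale used in the figure, values above zero indicate that the trait is more frequent among patients with HEM and values below zero that it is less frequent.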

To gain further insight into potential cross-trait overlaps and to validate the LDSC results, we assessed whether genetically correlated traits and conditions also occur more frequently in patients with HEM by analysing data on diagnoses and medications from UKBB. The results were highly consistent with those obtained via LDSC (figure 3, online supplemental table 6). Compared with controls, patients with HEM additionally suffered more often from diverticular disease, IBS and other functional GI disorders (FGIDs), abdominal pain, hypertension, ischaemic heart disease, depression, anxiety, and diseases of the musculoskeletal system and connective tissue among others. To further consolidate these findings, we analysed an independent population-scale healthcare record dataset comprising 8 172 531 individuals from the Danish National Patient Registry (online methods), obtaining similar results (see inner circle of figure 3). Therefore, based on these largely overlapping observations at the genetic and epidemiological level, HEM appears to be strongly associated with other diseases of the digestive, neuroaffective and cardiovascular domains.

Polygenic risk scores (PRS)
We next exploited meta-analysis summary statistics to compute HEM PRS and evaluate their relevance and translational potential in independent datasets. HEM PRS were calculated with PRSice-2¹⁸ (online methods) and their performance tested in three independent datasets: (1) the Norwegian Trøndelag Health Study (HUNT¹⁹; n=69 291; 977 cases vs 68 314 controls), (2) the Danish Blood Donor Study (DBDS²⁰; n=56 397; 1754 cases vs 54 643 controls), and (3) 1144 HEM cases from gastroenterology clinics compared with 2740 cross-sectional population controls from Germany²¹ (online supplemental table 1). In all three datasets, patients with HEM showed significantly higher PRS values compared with controls (OR=1.24, p=1.98×10⁻¹¹; OR=1.28, p=3.52×10⁻²² and OR=1.36, p=7.64×10⁻¹⁵, respectively for the two population-based cohorts HUNT and DBDS, and the German case-control cohort). In HUNT and DBDS, HEM prevalence increased across PRS percentile distributions, with individuals from the top 5% tail being exposed to higher HEM risk compared with the rest of the population (OR=1.68, p=1.55×10⁻⁵, and OR=1.68, p=6.11×10⁻⁸, respectively for HUNT and DBDS, figure 4). Higher HEM PRS were also associated with a more severe phenotype as defined by the need for recurrent invasive procedures (OR=1.03, p=8.63×10⁻³ in German patients) and a younger age of onset (p=1.90×10⁻³ in DBDS; p=4.01×10⁻³ in German patients).

Functional annotation of GWAS loci, tissue and pathway enrichment analyses
In order to identify most likely candidate genes and relevant molecular pathways, we used independent computational pipelines for the functional annotation of GWAS results. This yielded


a total of 819 non-redundant HEM-associated transcripts (hereafter referred to as HEM genes; online supplemental table 7) derived from alternative positional (N=540 total from FUMA,¹² DEPICT²² and MAGMA²³) and expression quantitative trait loci (eQTL, N=562 from FUMA) mapping efforts (online methods). Tissue-specific enrichment analyses (TSEA) of 540 positional candidates led to very similar results for all three approaches, with enrichment of HEM gene expression in blood vessels, colon and other relevant tissues (online supplemental figure 7 and online supplemental table 8). Similarly, gene-set enrichment analyses (GSEA) highlighted common pathways important in the development of vasculature and the intestinal tissue including the Gene Ontology (GO) terms ‘tube morphogenesis and development’, ‘artery morphogenesis and development’, ‘epithelium morphogenesis’, ‘smooth muscle tissue morphogenesis’ and others. Additionally, DEPICT detected enrichment for a number of traits from the Mammalian Phenotype Ontology including ‘abnormal intestinal morphology’, ‘rectal prolapse’, ‘abnormal blood vessel (and artery) morphology’, and ‘abnormal smooth muscle morphology and physiology’ (online supplemental table 8). TSEA and GSEA analyses performed on all 819 transcripts including eQTL genes from FUMA gave rise to similar results although these did not reach statistical significance (not shown).

Figure 4 Haemorrhoidal disease (HEM) prevalence across polygenic risk score (PRS) percentile distributions. PRS was derived from the results of the association meta-analysis (see online methods). HEM prevalence (%, Y-axis) is reported on a scatter plot in relation to PRS percentile distribution (X-axis) in the Norwegian Trøndelag Health Study (HUNT) (A) and the Danish Blood Donor Study (DBDS) (B) population cohorts. The top 5% of the distribution is highlighted with a shaded area in both cohorts, and the results of testing HEM prevalence in this group versus the rest of the population are also reported (p value and OR from logistic regression; online methods).

Gene expression in HEM tissue and gene-network analyses
The expression of HEM genes was studied in integrated mRNA and microRNA Combo-Seq analysis of enlarged haemorrhoidal tissue from 20 patients with HEM and normal specimens from 18 controls (online methods, online supplemental table 1). HEM genes were examined with regard to their expression status, differential expression between cases and controls, and their connectivity and topology in gene coexpression networks. After normalisation for cell-type heterogeneity in different tissues (online methods, online supplemental figure 8), 720 out of 819 candidate genes were found to be expressed in haemorrhoidal tissue, with 287 (39.9%) of these being among the most strongly expressed transcripts (upper quartile) (online supplemental table 7). Compared with normal tissue from controls, 18 HEM candidate genes from 14 independent loci showed differential expression in haemorrhoidal tissue (PFDR <0.05 and |log₂ fold change|>0.5), with 12 genes showing increased and 6 decreased expression (figure 5A).

To obtain further biological insight from transcriptomic profiling, HEM candidate genes were further characterised for their membership in coexpressed gene modules identified via weighted gene correlation network analysis (WGCNA)²⁴ (figure 5B; online methods). The final network consisted of 36 342 genes partitioned into 41 coexpression modules, and 3 of these (M1, M4 and M7; referred to as HEM modules) were significantly enriched (PFDR <0.05) for HEM genes (figure 5C). In total, 260 (35.7%) expressed HEM genes were members of modules M1 (N=121), M4 (N=75) or M7 (N=64) (online supplemental table 9). Functional annotation of these modules revealed M1 enrichment for ‘extracellular matrix (ECM) organisation’ (PFDR=5.17×10⁻¹⁸) and ‘muscle contraction’ (PFDR=7.90×10⁻¹⁷); M4 for ‘mitochondrion organisation’ (PFDR=2.35×10⁻⁴) and ‘glycosylation’ (PFDR=5.70×10⁻⁴); and M7 for ‘cornification’ (PFDR=3.97×10⁻⁵²) and ‘epidermis development’ (PFDR=2.07×10⁻⁴¹) (figure 5C). The WGCNA analysis also allowed for the identification of the most interconnected genes, or module hub genes (ie, central nodes in the scale-free network). Of note, several HEM candidate genes, namely NEGR1, MRVI1, MYH11, ELN and CHRDL1, were among the top 50 hub genes for module M1 (figure 5D). M1 is the module


most significantly enriched for HEM genes (PFDR=6.40×10⁻⁵), and also the largest (n=3975) coexpression module in the haemorrhoidal tissue (online supplemental table 9), thus pointing to the importance of its associated GO terms ‘ECM organisation’ and ‘muscle contraction’ in HEM.

Figure 5 Analysis of mRNA and microRNA (Combo-Seq) data from haemorrhoidal disease (HEM) affected tissue, in relation to HEM genes coexpression networks. (A) Volcano plot reporting HEM genes differentially expressed in haemorrhoidal tissue (significantly upregulated=red, and downregulated=green); (B) schematic representation of the analytical flow for HEM genes coexpression network module identification and characterisation; (C) upper panel (barplot): overrepresentation analysis of HEM genes in coexpression network modules, with significant enrichment (PFDR <0.05) in modules M1, M4 and M7; lower panel (dotplot): top five gene ontology terms (biological process) from gene set enrichment analysis relative to M1, M4 and M7 coexpression modules (gene counts and false discovery rate (FDR)-adjusted significance level are also reported as indicated); (D) coexpression hub network of module M1. The network represents strength of connections (weighted Pearson’s correlation >0.7) among the top 50 hub genes with highest values of intramodular membership (size of the node). HEM genes and the top 5 hub genes are highlighted in red and black, respectively.

Prioritised HEM genes
In order to identify genes most likely to play a causative role in HEM, we selected candidates based on a scoring approach by prioritising those associated with one or more of the following: (1) linked to a high-confidence fine-mapped variant (posterior probability (PP)>50%), (2) differentially expressed in enlarged haemorrhoidal tissue, (3) highlighted by pathway and tissue/cell-type enrichment DEPICT analysis (online methods), or (4) predicted hub of a WGCNA coexpression HEM module (M1, M4, M7). This reduced the number of candidates from 819 to 100 prioritised genes associated with 58 independent HEM loci (online supplemental table 7). Some notable observations were made in relation to a subset of these prioritised genes, whose associated evidence and known biological function(s) make them remarkably good candidates to play a role in HEM risk.

Two genes, ANO1 and SRPX, were both linked to a single coding variant fine-mapped as causal with very high confidence (rs2186797, ANO1, p.Phe608Ser with PP=97.0%, and rs35318931, SRPX, p.Ser413Phe with PP=87.3%, respectively). ANO1 encodes the voltage-gated calcium-activated anion channel anoctamin-1, which is highly expressed in the interstitial cells of Cajal (ICCs) throughout the human GI tract, where it contributes to the control of intestinal motility and peristalsis.²⁵ The ANO1:p.Phe608Ser (F608S) variant is predicted to destabilise local protein structure and to disturb ANO1-activating phospholipid interactions (online supplemental figure 9). Indeed, site-directed mutagenesis and electrophysiology experiments in vitro showed that the amino acid 608 Phe to


Ser change leads to an increased voltage-dependent instantaneous Cl⁻ current, and a slowing of activation and deactivation kinetics (especially at high Ca²⁺ concentrations) (online supplemental figure 10). The X-linked gene SRPX codes for a Sushi repeat-containing protein whose domain composition implies a role in ECM, and is expressed in various ECM tissues including colon and liver.²⁶ The SRPX:p.Ser413Phe variant (rs35318931) may potentially destabilise the C-terminal domain of unknown function (DUF4174) (online supplemental figure 11), which is conserved in various ECM proteins (online methods, section In silico variant protein analysis).

At locus 7q22.1, the signal was fine-mapped with very high confidence to rs4556017 (PP=96.2%), which exerts eQTL effects on ACHE and SRRT, both showing high levels of expression in enlarged haemorrhoidal tissue, and found in the M4 and M7 coexpression modules (figure 5), respectively. However, while SRRT encodes a poorly characterised capped-RNA binding protein, ACHE appears a much better candidate as the gene encodes an enzyme that hydrolyses the neurotransmitter acetylcholine at neuromuscular junctions, and corresponds to the Cartwright blood group antigen Yt.²⁷ One more locus was fine-mapped to single-variant resolution with high confidence, SNP rs10956488 from locus 8q24.21 (PP=96.2%), which is linked to an eQTL for GSDMC. Gasdermin C (encoded by GSDMC) is a poorly characterised member of the gasdermin family of proteins expressed in epithelial cells and in enlarged haemorrhoidal tissue, though the mechanisms of its eventual HEM involvement remain elusive.

Other genes were linked via eQTL to a variant mapped with PP >50% and were prioritised based on additional experimental evidence. Among these, the fine-mapped SNP rs6498573 from locus 16p13.11 (PP=63.2%) is associated with eQTL effects on MYH11 (encoding muscle myosin heavy chain 11; online supplemental table 10), a gene coding for a smooth muscle myosin heavy chain that shows mRNA upregulation in enlarged haemorrhoidal tissue and constitutes a hub for the M1 coexpression module associated with ECM organisation and muscle contraction. Notable prioritised candidates were also observed at loci that were not fine-mapped, including ELN (encoding elastin, a key component of the ECM found in the connective tissue of many organs, highly expressed in enlarged haemorrhoidal tissue and hub gene for the M1 coexpression module), COL5A2 (encoding type V collagen protein; highly expressed in HEM tissue and belonging to the M1 coexpression module), PRDM6 (encoding a putative histone methyltransferase regulating vascular smooth muscle cell contractility, expressed in enlarged haemorrhoidal tissue and hub for the M1 module), and others (online supplemental table 7).

Finally, while no candidate genes could be highlighted from the top GWAS hit region on chr12q14.3 (rs11176001), both the second and third strongest GWAS signals were detected at loci linked to genes involved in the determination of blood groups, namely ABO (rs676996) and the Kell Blood Group Complex Subunit-Related Family member XKR9 (rs1838392). In addition, blood group antigens are encoded at additional loci, including ACHE (rs4556017) and XKR6 (identified by MAGMA). Imputation of human ABO blood types from genotype data revealed the O type to be associated with increased, and the A and B types with decreased, HEM risk, both in UKBB and GERA datasets (online supplemental figure 12).

Localisation of selected HEM gene-encoded proteins
A site-specific analysis of selected candidate proteins for their localisation in anorectal tissues underlined the complex multifactorial nature of cellular components potentially involved in the pathogenesis of HEM. Indeed, analysed candidates displayed a broad spectrum of expression in intestinal mucosal, neuromuscular, immune and anodermal tissues (figure 6, online supplemental figure 13, online supplemental tables 11 and 12). They were also directly colocalised with haemorrhoidal blood vessels, suggesting a putative role connected with the haemorrhoidal vasculature itself.

DISCUSSION
Given the lack of large and systematic epidemiological and molecular studies, and despite its worldwide distribution, HEM can still be regarded as an understudied disease. In this study, we report the largest and most detailed genome-wide analysis of HEM, implemented via a combination of classical GWAS approaches and the use of minimal phenotyping, as recently shown to be effective in boosting sample size for increased statistical power.²⁸ We demonstrate for the first time that HEM is a partly inherited condition with a weak but detectable heritability estimated at 5% based on SNP data. We identify 102 independent risk loci, which were functionally annotated based on computational predictions and gene expression analysis of diseased and normal tissue. These loci alone explain approximately 0.9% of HEM heritability, which is of similar magnitude with respect to the genetic contribution of other common complex traits.²⁹ They provide important novel pathophysiological insight, which we discuss below in relation to individual pathways and mechanisms proposed to contribute to HEM aetiology.

ECM, elasticity of the connective tissue and smooth muscle function
The non-vascular components of the anal cushions consist of the transitional epithelium, connective tissue (elastic and collagenous) and the submucosal anal muscle (muscle of Treitz).⁵ Treitz’s muscle tightly maintains the anal cushions in their normal position, and its deterioration is considered one of the most important pathogenetic factors in the formation of enlarged and prolapsed haemorrhoids (online supplemental figure 2). Anal cushion fixation is further facilitated by elastic and collagenous connective tissue, whose degeneration, due, for instance, to abnormalities in collagen composition, has been implicated in HEM aetiology,³⁰ although without strong molecular evidence.

The coexpression module M1 identified from HEM tissue is linked to ECM organisation and muscle function and is enriched for HEM genes, thus providing novel important evidence for a role of these two interconnected processes in HEM pathogenesis. ELN (lead SNP rs11770437) is one of the HEM prioritised genes, and also a main hub gene for the M1 module. ELN codes for the elastin protein, a key component of elastic fibres that comprise part of the ECM and confer elasticity to organs and tissues including blood vessels. Mutations in the ELN gene have been shown to cause cutis laxa, a disease in which dysfunctional elastin interferes with the formation of elastic fibres, thus weakening connective tissue in the skin and blood vessels. Of note, ELN has been recently implicated also in diverticular disease,³¹ a condition characterised by outpouchings of the colonic wall at sites of relative weakness and/or defective elasticity of the connective and muscle layers, as well as in the common skin condition non-syndromic striae distensae (NSD, also known as stretch marks), whose manifestation is due to lost tissue elasticity


Figure 6 Immunohistochemistry for selected haemorrhoidal disease (HEM) candidate proteins. Illustration of the rectum and anal canal (A) indicating the site-specific localisation of the immunohistochemical panels analysed in (B). Results of fluorescence immunohistochemistry are shown for selected HEM candidate proteins encoded by HEM genes COL5A2 (rs16831319), SRPX (rs35318931), ANO1 (rs2186797), MYH11 (rs6498573) and ELN (rs11770437) (see also online supplemental table 11 and online supplemental figure 13). Antibody staining was performed on FFPE colorectal tissue specimens from control individuals. Picture layers correspond to the rectal mucosa (top row, epithelial surface delimited by a white dotted line, *=intestinal lumen), smooth musculature (second row), enteric ganglia (third row, ganglionic boundaries delimited by a white dotted line), haemorrhoidal plexus (fourth row, endothelial surface delimited by a white dotted line, *=vascular lumen) and the anoderm (bottom row, surface of the anoderm delimited by a white dotted line). Blue: DAPI; green: α-SMA (anti-alpha smooth muscle actin antibody) for rows 2 and 4 (smooth musculature/haemorrhoidal plexus) and PGP9.5 (member of the ubiquitin hydrolase family of proteins, neuronal marker) for row 3 (enteric ganglia); red: antibody for the respective candidate protein. Arrows point to respective candidate-positive cells within the vascular wall. Arrowheads point to respective candidate-positive nucleated immune cells.

at affected skin sites.32 Hence, similar mechanisms may underlie HEM risk due to genetic variation in ELN and, notably, also the SRPX (lead SNP rs35318931) and COL5A2 (lead SNP rs16831319) genes. The SRPX lead SNP is also strongly associated with NSD, and is a coding variant potentially impacting the function of an ECM protein. COL5A2 codes for type V fibril-forming collagen that has regulatory roles during development and growth of type I collagen-positive tissues. Mutations in this gene are known to cause Ehlers-Danlos syndrome, a rare connective tissue disease that affects the skin, joints and blood vessels, and for which a link to HEM has already been postulated.33 An additional hub gene for the coexpression module M1, MYH11 (lead SNP rs6498573), encodes a smooth muscle myosin protein that is important for muscle contraction and relaxation, and whose dysfunction has been linked to vascular diseases34 and GI dysmotility.35 Functional studies in smooth muscle cells showed that overexpression of MYH11 led to a paradoxical decrease of protein levels through increased autophagic degradation followed by disruption of contractile signalling.36 Our transcriptome analysis also showed MYH11 RNA upregulation in HEM tissue, thus suggesting possible alterations in smooth muscle action that may be relevant, for instance, to Treitz’s muscle function(s). Additional evidence for the involvement of the musculoskeletal system may come from the associations detected for GSDMC (lead SNP rs10956488), an uncharacterised gene also shown to be relevant to lumbar disc herniation and back pain,37 and PRDM6, a histone methyltransferase that acts as a transcriptional repressor of smooth muscle gene expression. Finally, besides individual association signals, we detected significant genetic correlation with several other complex diseases of shared aetiology (hernia, dorsalgia), for which connective tissue and/or muscle alterations are described.

Gut motility
Several lines of evidence link gut motility to the pathophysiology of HEM in this study. In our UKBB analyses, patients with HEM were found to suffer more often from IBS and other dysmotility syndromes than controls, as evidenced also by the increased use of medications including laxatives. These conditions also showed the strongest genetic correlation with HEM among all tested traits, indicating similar genetic architecture and predisposing mechanisms. Moreover, given the important role of the gut-brain axis in IBS and other FGIDs,38 it is possible that the correlations observed for anxiety, depression and other neuroaffective traits may mediate genetic risk effects at least in part via similar mechanisms also involving gut motility. Constipation and prolonged sitting and straining during defecation are associated with delayed GI transit and reduced peristalsis and are among the proposed HEM risk factors (online supplemental figure 2). Harder stools, due for instance to infrequent defecations, can cause difficulty in bowel emptying and therefore increase pressure and mechanical friction on the haemorrhoidal cushions, leading to excessive engorgement and stretching or tissue damage. The relationship between HEM and gut motility is probably best evidenced by the association signal detected at the ANO1 locus: its lead SNP rs2186797 corresponds to the missense variant ANO1:p.Phe608Ser (F608S) that was fine-mapped with very high confidence and shown to impact anoctamin-1 function. Anoctamin-1 is an ion channel expressed in the ICCs, the pacemakers of the GI tract controlling intestinal peristalsis, has already been implicated in IBS,39 and is also expressed in the vasculature. Hence, it represents an ideal candidate to also affect HEM risk via genotype-driven modulation of ICC function. An additional interesting candidate is ACHE, which shows an eQTL for the fine-mapped lead SNP rs4556017: ACHE codes for an enzyme that hydrolyses the neurotransmitter acetylcholine at neuromuscular junctions and is overexpressed in Hirschsprung’s disease, a condition in which gut motility is compromised due to the absence of nerve cells (aganglionosis) in the distal or entire segments of the large bowel.40 Of interest, expression in the enteric ganglia next to the haemorrhoidal plexus was observed for several proteins encoded by HEM genes in our immunofluorescence experiments.

Vasculature and circulatory system
Previous observations showed that HEM does not represent varicosities, and accelerated blood flow velocities were observed in afferent vessels of patients with HEM.41 An impaired drainage or filling of the anal cushion may contribute to cushion slippage and may thus be considered as one of many disease-causing factors, as already previously proposed.41 Our genetic data support the involvement of the vasculature as an important player in HEM pathophysiology. TSEA and GSEA results downstream of HEM GWAS meta-analysis highlight blood vessels and artery morphogenesis among the HEM gene-enriched tissues and GO pathways, respectively. At the same time, moderate genetic correlation is detected for diseases of the circulatory system in the LDSC analyses. We identified a very strong association signal in correspondence of the ABO locus on chromosome 9, which determines the corresponding ABO blood group type (A, B, AB and O). In addition to red blood cells, ABO antigens are expressed on the surface of many cells and tissues, and have been strongly associated with coronary artery disease, thrombosis, haemorrhage,42 GI bleeding43 and other conditions related to the circulatory system. Interestingly, increased risk for coronary artery disease has been reported in patients with HEM in at least some studies44 and replicated in our UKBB cross-disease analyses, although amidst hundreds of other associations of similar magnitude of effects. We imputed ABO blood types from genotype data in UKBB, and detected increased HEM risk for carriers of the O type. O type has been reported to be protective for coronary artery disease in UKBB, although also predisposing to hypertension.45 Hence, the potential mechanism(s) by which variation at this locus impacts HEM risk remain elusive at this stage. Blood antigens are nevertheless likely relevant, as other genes involved in the determination of specific blood groups are also among the 102 HEM GWAS hits (Kell blood group locus XKR9, lead SNP rs1838392).

In summary, our data provide important new insight into currently lacking evidence46 about HEM pathogenesis (online supplemental figure 14). This sets the stage for more detailed genetic and mechanistic follow-up analyses, the search for therapeutically actionable genes and pathways, and the eventual exploitation for the adoption of preventive measures based on computed individual predisposition.

Author affiliations
1 School of Biological Sciences, Monash University, Clayton, Victoria, Australia
2 Unit of Clinical Epidemiology, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
3 Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
4 Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
5 Institute of Biotechnology, Life Science Centre, Vilnius University, Vilnius, Lithuania
6 Institute of Anatomy, Christian-Albrechts-University of Kiel, Kiel, Germany
7 Department for General, Visceral, Vascular and Transplantation Surgery, University Medical Center Rostock, Rostock, Germany
8 Institute of Genomics, University of Tartu, Tartu, Estonia
9 K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
10 Enteric NeuroScience Program, Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
11 Institute of Advanced Research, Tokyo Medical and Dental University, Tokyo, Japan
12 Department of General-, Visceral- Transplant-, Thoracic and Pediatric Surgery, Kiel University, Kiel, Germany
13 Medical Office for Surgery Preetz, Preetz, Germany
14 Medical Department 1, University Hospital Dresden, Technische Universität Dresden (TU Dresden), Dresden, Germany
15 University Hospital Schleswig-Holstein, Institute of Clinical Chemistry, Thrombosis & Hemostasis Treatment Center, Campus Kiel & Lübeck, Kiel, Germany
16 Department of Internal Medicine, Hospital Oberndorf, Teaching Hospital of the Paracelsus Private Medical University of Salzburg, Oberndorf, Austria
17 Medical office for Colo-Proctology Kiel, Kiel, Germany
18 Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
19 Department of Internal Medicine III, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
20 Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
21 DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck, Kiel, Germany
22 Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
23 Epidemiology of highly pathogenic microorganisms, Robert Koch-Institute, Berlin, Germany
24 Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
25 Department of Medicine I, Institute of Cancer Research, Medical University Vienna, Vienna, Austria
26 Center for Regenerative Therapies Dresden (CRTD), Technische Universität (TU) Dresden, Dresden, Germany
27 Department of Proctological Surgery Park Klinik Kiel, Kiel, Germany
28 Proctological Office Kiel, Kiel, Germany
29 Research Institute for Internal Medicine, Division of Surgery, Inflammatory Diseases and Transplantation, Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway
30 Faculty of Medicine, University of Cologne, Cologne, Germany
31 Department of Gastroenterology, Institute for Digestive Research, Lithuanian University of Health Sciences, Kaunas, Lithuania
32 University of Lübeck, Lübeck, Germany
33 University Hospital of Schleswig-Holstein (UKSH), Kiel Campus, Kiel, Germany
34 Department of Pathology, University Medical Center Schleswig-Holstein, Kiel, Germany
35 Division of Endocrinology, Diabetes and Clinical Nutrition, Department of Medicine 1, University of Kiel, Kiel, Germany
36 Institute of Epidemiology, Christian-Albrechts-University of Kiel, Kiel, Germany
37 University of Athens, Athens Euroclinic, Athens, Greece
38 Pathology Unit, German Primate Center, Leibniz Institute for Primatology, Göttingen, Germany
39 Department of Ophthalmology, Hospital Frankfurt Hoechst, Frankfurt, Germany
40 Upper Gastrointestinal Surgery, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
41 Department of Medicine, Levanger Hospital, Nord-Trøndelag Hospital Trust, Levanger, Norway
42 Department of Clinical Immunology, Naestved Hospital, Naestved, Denmark
43 General Hospital Lüneburg, Lüneburg, Germany


44 Department of General and Thoracic Surgery, Hospital Osnabrück, Osnabrück, Germany
45 Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
46 BioCore – Bioinformatics Core Facility, Norwegian University of Science and Technology, Trondheim, Norway
47 Clinic of Laboratory Medicine, St.Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
48 Department of Clinical Immunology, Copenhagen University Hospital, Copenhagen, Denmark
49 Department of Surgery, Community Hospital Kiel, Kiel, Germany
50 Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
51 Department Research & Conservation, Zoo Nuremberg, Nuremberg, Germany
52 Department of Gastroenterology, Helios Hospital Weißeritztal, Freital, Germany
53 23andMe Inc, Sunnyvale, CA, USA
54 Christian-Albrechts-University of Kiel, Kiel, Germany
55 Gastrointestinal Genetics Lab, CIC bioGUNE - BRTA, Derio, Spain
56 Ikerbasque, Basque Foundation for Science, Bilbao, Spain

Twitter Malte Christoph Rühlemann @mruehlemann, Arthur Beyder @beyderlab and Mauro D’Amato @damato_mauro

Contributors AF, CS and MD’A were responsible for the concept and the design of the study. CS and GB coordinated the recruitment of the German clinical cohort. VK, JG, IV, BS, HGP, AH, JJ, JW, TB, MD, FB, SH, JK, JS, WK, MZo, CGN, TL, JT and WS recruited and phenotyped patients of the clinical cohorts. VK collected and phenotyped the biopsy samples that were used in the immunohistochemistry and transcriptome analyses. Animal samples for histology were taken and provided by KP, SZ, LF, TG, FHL and KM-R. Immunohistochemistry and histology analyses were performed by FC and TW with support of JW and CR. TZ, DE and FD performed genotype and phenotype data collection. TZ, DE, and FD performed GWAS data quality control and imputation. SJ performed the transcriptome (RNA-Seq) analysis with help from GI and PR. FU-W, SBu, and MF helped with bioinformatic analyses. IFJ, KB, SBr, CLR, CE, OBP, and HU conducted analyses in DBDS and DNPR. HC, VL, RJ, NF, WL, ML, and UN-G contributed German control data. SBu, CD, HV, AG and JH contributed the GWAS data on diverticular disease. Norwegian HUNT data were contributed by AHS, THK, MEG, LT, EN-J and KH. The Estonian dataset was provided by TE and MT-L. Protein models and analyses were performed by GM. The analysis of the MGI data set was performed by MZa, BV, AP and LGF. The 23andMe data set was analyzed and provided by OS, ESN, and VV. MW implemented ABO blood-group inference. SC, PRS, GF and AB performed site-directed ANO1 mutagenesis and whole-cell electrophysiology experiments. TZ and DE implemented statistical models and performed the (meta-)analysis. TZ, DE, and SJ curated and interpreted results. MS designed online supplementary figure S2 with scientific input from NM and AF. The GWAS data browser was implemented by MR and set up by GH-S. AF, MD’A and DE wrote the manuscript draft with substantial contributions from TZ and SJ. All authors reviewed, edited and approved the final manuscript.

Funding This project was funded by Andre Franke’s and Clemens Schafmayer’s DFG grant “Discovery of risk factors for hemorrhoids” (ID: FR 2821/19-1). The study received infrastructure support from the DFG Cluster of Excellence 2167 “Precision Medicine in Chronic Inflammation (PMI)” (DFG Grant: “EXC2167”). The project was supported by grants from the Swedish Research Council to MD (VR 2017-02403), the Novo Nordisk Foundation (grants NNF17OC0027594 and NNF14CC0001) and BigTempHealth (grant 5153-00002B). We are indebted to the valuable assistance by Tanja Wesse (Genotyping of the German samples) and Petra Röthgen (German DZHK control data set). We thank Clemens Franke (Institute of Anatomy, Christian-Albrechts-University of Kiel, Kiel, Germany) for graphic assistance (figure 6, online supplementary figure S1 and S13). This research has been conducted using the UK Biobank Resource under Application Number 31435. The work on cross-trait analysis for diverticular disease presented in this manuscript was supported by the German Research Council (DFG, ID: Ha3091/9-1) and the Austrian Science Fund (FWF, ID: I1542-B13). Data access to the UK Biobank data for diverticular disease was granted under project number 22691. EGCUT work has also been supported by the European Regional Development Fund and grants SP1GI18045T, No. 2014-2020.4.01.15-0012 GENTRANSMED and 2014-2020.4.01.16-0125. This study was also funded by EU H2020 grant 692145, Estonian Research Council Grant PUT1660. Data analyses with Estonian datasets were carried out in part in the High-Performance Computing Center of University of Tartu. The work on site-directed mutagenesis of ANO1 and whole-cell electrophysiology was funded by NIH DK057061. We would like to thank the research participants and employees of 23andMe for making this work possible. The following members of the 23andMe Research Team contributed to this study: Michelle Agee, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, David A. Hinds, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Jennifer C. McCreight, Matthew H. McIntyre, Joanna L. Mountain, Elizabeth S. Noblin, Carrie A.M. Northover, Steven J. Pitts, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, and Vladimir Vacic.

Competing interests Vladimir Vacic and Olga V. Sazonova are/were employed by and hold stock or stock options in 23andMe, Inc. All other authors have nothing to declare.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement Data are available in a public, open access repository. Genome-wide summary statistics of our analyses are publicly available through our web browser (http://hemorrhoids.online) and have been submitted to the European Bioinformatics Institute (www.ebi.ac.uk/gwas) under the accession number GCST90014033. RNA-seq data have been deposited at NCBI Gene Expression Omnibus (GEO) under the accession number GSE154650.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

ORCID iDs
Tenghao Zheng http://orcid.org/0000-0003-1587-7481
David Ellinghaus http://orcid.org/0000-0002-4332-6110
Simonas Juzenas http://orcid.org/0000-0001-9293-0691
Christian Datz http://orcid.org/0000-0001-7838-4532
Michael Forster http://orcid.org/0000-0001-9927-5124
Andrea Gsur http://orcid.org/0000-0002-9795-1528
Jochen Hampe http://orcid.org/0000-0002-2421-6127
Christoph Röcken http://orcid.org/0000-0002-6989-8002
Eivind Ness-Jensen http://orcid.org/0000-0001-6005-0729
Malte Christoph Rühlemann http://orcid.org/0000-0002-0685-0052
Gianrico Farrugia http://orcid.org/0000-0003-3473-5235
Mauro D’Amato http://orcid.org/0000-0003-2743-5197
Andre Franke http://orcid.org/0000-0003-1530-5811

REFERENCES
1 Jacobs D. Clinical practice. Hemorrhoids. N Engl J Med 2014;371:944–51.
2 Duthie HL, Gairns FW. Sensory nerve-endings and sensation in the anal region of man. Br J Surg 1960;47:585–95.
3 Haas PA, Haas GP, Schmaltz S, Fox TA, Jr. The prevalence of hemorrhoids. Dis Colon Rectum 1983;26:435–9.
4 Yang JY, Peery AF, Lund JL, et al. Burden and cost of outpatient hemorrhoids in the United States Employer-Insured population, 2014. Am J Gastroenterol 2019;114:798–803.
5 Margetis N. Pathophysiology of internal hemorrhoids. Ann Gastroenterol 2019;32:264–72.
6 Tung JY, Do CB, Hinds DA, et al. Efficient replication of over 180 genetic associations with self-reported medical data. PLoS One 2011;6:e23473.
7 Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562:203–9.
8 Leitsalu L, Haller T, Esko T, et al. Cohort profile: Estonian Biobank of the Estonian genome center, University of Tartu. Int J Epidemiol 2015;44:1137–47.
9 Fritsche LG, Gruber SB, Wu Z, et al. Association of polygenic risk scores for multiple cancers in a Phenome-wide study: results from the Michigan genomics initiative. Am J Hum Genet 2018;102:1048–61.
10 Kvale MN, Hesselson S, Hoffmann TJ, et al. Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics 2015;200:1051–60.
11 Willer CJ, Li Y, Abecasis GR. Metal: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190–1.
12 Watanabe K, Taskesen E, van Bochoven A, et al. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 2017;8:8.
13 Benner C, Spencer CCA, Havulinna AS, et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 2016;32:1493–501.
14 Watanabe K, Stringer S, Frei O, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 2019;51:1339–48.


15 Buniello A, MacArthur JAL, Cerezo M, et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019;47:D1005–12.
16 Kamat MA, Blackshaw JA, Young R, et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 2019;35:4851–3.
17 Cuéllar-Partida G, Lundberg M, Kho PF, et al. Complex-Traits genetics virtual lab: a community-driven web platform for post-GWAS analyses. bioRxiv 2019;518027.
18 Choi SW, O’Reilly PF. PRSice-2: polygenic risk score software for biobank-scale data. Gigascience 2019;8. doi:10.1093/gigascience/giz082. [Epub ahead of print: 01 07 2019].
19 Krokstad S, Langhammer A, Hveem K, et al. Cohort profile: the HUNT study, Norway. Int J Epidemiol 2013;42:968–77.
20 Hansen TF, Banasik K, Erikstrup C, et al. Dbds genomic cohort, a prospective and comprehensive resource for integrative and temporal analysis of genetic, environmental and lifestyle factors affecting health of blood donors. BMJ Open 2019;9:e028401.
21 Krawczak M, Nikolaus S, von Eberstein H, et al. PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet 2006;9:55–61.
22 Pers TH, Karjalainen JM, Chan Y, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 2015;6:5890.
23 de Leeuw CA, Mooij JM, Heskes T, et al. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 2015;11:e1004219.
24 Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559.
25 Malysz J, Gibbons SJ, Saravanaperumal SA, et al. Conditional genetic deletion of Ano1 in interstitial cells of Cajal impairs Ca2+ transients and slow waves in adult mouse small intestine. Am J Physiol Gastrointest Liver Physiol 2017;312:G228–45.
26 Naba A, Clauser KR, Whittaker CA, et al. Extracellular matrix signatures of human primary metastatic colon cancers and their metastases to liver. BMC Cancer 2014;14:518.
27 Bartels CF, Zelinski T, Lockridge O. Mutation at codon 322 in the human acetylcholinesterase (ACHE) gene accounts for YT blood group polymorphism. Am J Hum Genet 1993;52:928–36.
28 DeBoever C, Tanigawa Y, Aguirre M, et al. Assessing digital phenotyping to enhance genetic studies of human diseases. Am J Hum Genet 2020;106:611–22.
29 Okada Y, Wu D, Trynka G, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 2014;506:376–81.
30 Nasseri YY, Krott E, Van Groningen KM, et al. Abnormalities in collagen composition may contribute to the pathogenesis of hemorrhoids: morphometric analysis. Tech Coloproctol 2015;19:83–7.
31 Schafmayer C, Harrison JW, Buch S, et al. Genome-Wide association analysis of diverticular disease points towards neuromuscular, connective tissue and epithelial pathomechanisms. Gut 2019;68:854–65.
32 Tung JY, Kiefer AK, Mullins M, et al. Genome-Wide association analysis implicates elastic microfibrils in the development of nonsyndromic striae distensae. J Invest Dermatol 2013;133:2628–31.
33 Plackett TP, Kwon E, Gagliano RA, et al. Ehlers-Danlos syndrome-hypermobility type and hemorrhoids. Case Rep Surg 2014;2014:1–3.
34 Zhu L, Vranckx R, Khau Van Kien P, et al. Mutations in myosin heavy chain 11 cause a syndrome associating thoracic aortic aneurysm/aortic dissection and patent ductus arteriosus. Nat Genet 2006;38:343–9.
35 Gilbert MA, Schultz-Rogers L, Rajagopalan R, et al. Protein-elongating mutations in MYH11 are implicated in a dominantly inherited smooth muscle dysmotility syndrome with severe esophageal, gastric, and intestinal disease. Hum Mutat 2020;41:973–82.
36 Kwartler CS, Chen J, Thakur D, et al. Overexpression of smooth muscle myosin heavy chain leads to activation of the unfolded protein response and autophagic turnover of thick filament-associated proteins in vascular smooth muscle cells. J Biol Chem 2014;289:14075–88.
37 Suri P, Palmer MR, Tsepilov YA, et al. Genome-Wide meta-analysis of 158,000 individuals of European ancestry identifies three loci associated with chronic back pain. PLoS Genet 2018;14:e1007601.
38 Holtmann G, Shah A, Morrison M. Pathophysiology of functional gastrointestinal disorders: a holistic overview. Dig Dis 2017;35 Suppl 1:5–13.
39 Mazzone A, Gibbons SJ, Eisenman ST, et al. Direct repression of anoctamin 1 (ANO1) gene transcription by Gli proteins. Faseb J 2019;33:6632–42.
40 Moore SW, Johnson G. Acetylcholinesterase in Hirschsprung’s disease. Pediatr Surg Int 2005;21:255–63.
41 Aigner F, Gruber H, Conrad F, et al. Revised morphology and hemodynamics of the anorectal vascular plexus: impact on the course of hemorrhoidal disease. Int J Colorectal Dis 2009;24:105–13.
42 Franchini M, Lippi G. Relative risks of thrombosis and bleeding in different ABO blood groups. Semin Thromb Hemost 2016;42:112–7.
43 Huang ES, Khalili H, Strate LL, et al. Su1786 ABO blood group and the risk of gastrointestinal bleeding. Gastroenterology 2012;142:S-503.
44 Chang S-S, Sung F-C, Lin C-L, et al. Association between hemorrhoid and risk of coronary heart disease: a nationwide population-based cohort study. Medicine 2017;96:e7662.
45 Groot HE, Villegas Sierra LE, Said MA, et al. Genetically determined ABO blood group and its associations with health and disease. Arterioscler Thromb Vasc Biol 2020;40:830–8.
46 Sandler RS, Peery AF. Rethinking what we know about hemorrhoids. Clin Gastroenterol Hepatol 2019;17:8–15.
