ARTICLE

DOI: 10.1038/s41467-018-05328-9 OPEN Enhancer -QTLs are enriched on autoimmune risk haplotypes and influence gene expression within networks

Richard C. Pelikan 1, Jennifer A. Kelly1, Yao Fu1, Caleb A. Lareau 2,3, Kandice L. Tessneer1, Graham B. Wiley1, Mandi M. Wiley1, Stuart B. Glenn1, John B. Harley4, Joel M. Guthridge5, Judith A. James5,6, Martin J. Aryee 2,3, Courtney Montgomery1 & Patrick M. Gaffney1 1234567890():,; Genetic variants can confer risk to complex genetic diseases by modulating gene expression through changes to the epigenome. To assess the degree to which genetic variants influence epigenome activity, we integrate epigenetic and genotypic data from lupus patient lympho- blastoid cell lines to identify variants that induce allelic imbalance in the magnitude of histone post-translational modifications, referred to herein as histone quantitative trait loci (hQTLs). We demonstrate that enhancer hQTLs are enriched on autoimmune disease risk haplotypes and disproportionately influence gene expression variability compared with non-hQTL var- iants in strong linkage disequilibrium. We show that the epigenome regulates HLA class II genes differently in individuals who carry HLA-DR3 or HLA-DR15 haplotypes, resulting in differential 3D chromatin conformation and gene expression. Finally, we identify significant expression QTL (eQTL) x hQTL interactions that reveal substructure within eQTL gene expression, suggesting potential implications for functional genomic studies that leverage eQTL data for subject selection and stratification.

1 Division of Genomics and Data Sciences, Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City 73104 OK, USA. 2 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Charlestown 02115 MA, USA. 3 Department of Pathology, Harvard Medical School, Boston 02115 MA, USA. 4 Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital, Cincinnati 45229 OH, USA. 5 Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City 73104 OK, USA. 6 Departments of Medicine and Pathology, University of Oklahoma Health Sciences Center, Oklahoma City 73104 OK, USA. Correspondence and requests for materials should be addressed to P.M.G. (email: [email protected])

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9

fundamental objective of human genetics is to under- Results Astand how genotypes influence phenotypes. To this end, Genome-wide scan identifies 6261 enhancer hQTLs. Chromatin genome-wide association studies (GWAS) have success- immunoprecipitation (ChIP) sequencing peaks that were repro- fully identified thousands of convincing and reproducible statis- ducibly measured across two technical replicates in at least 13 of tical associations between genetic variants, phenotypic traits, and 25 LCLs were used to construct a consensus peak map specific for diseases in humans1. GWAS data, however, do not carry funda- each chromatin mark. In total, we detected 33,437 H3K27ac and mental information about how the flow of genomic information 39,613 H3K4me1 consensus peaks, of which 27,404 overlapped from genotype to phenotype is acted upon by the epigenome, both marks. potentially limiting the effectiveness of translating GWAS data Genotyping each cell line using the Illumina HumanOmni 2.5 into actionable clinical knowledge for diagnosis, prognosis, and M SNP array resulted in 1,468,562 variants that passed quality prediction of complex diseases in patients. Thus, in the post- control. Imputation with the 1000 Genomes Phase 3 reference GWAS era, significant effort has been directed toward char- panel12 produced a total dataset of 9,603,466 SNPs. Of these, acterizing epigenetic states, and the mechanisms by which the 315,210 heterozygous SNPs were located within a H3K27ac or epigenome orchestrates the flow of genomic information in spe- H3K4me1 consensus enhancer peak. Following realignment of cific cellular contexts2–4. These studies have demonstrated that ChIP-seq reads using WASP to control for reference genome much of the non-protein-coding genome is dedicated to epige- alignment bias7, the 315,210 SNPs were tested for allelic nomic activity2, and that specific post-translational modifications imbalance on histone PTMs using the combined haplotype test (PTMs) on can define the location and functional state (CHT)8. Comparison of the observed read count distribution with of enhancer elements and regions of the genome that are tran- permuted read counts demonstrated that the CHT was well scriptionally activated or inhibited3. Moreover, the epigenome calibrated for both histone marks (Supplementary Figure 1). In coordinates information flow in three-dimensional (3D) space total, we identified 6261 significant hQTLs (2007 with r2 ≤ 0.8) through chromatin loops that facilitate long-range engagement of distributed throughout the genome; 5829 significant at the 10% enhancers with promoters of genes whose expression sustains or family-wise error rate (FWER)-adjusted p-value (nominal modulates the cell state4. p-value < 8.9E−07), and 432 suggestive at 20% FWER (nominal From this framework, it follows that DNA and p-value < 9.8E−07) (Fig. 1a, b, Supplementary Data 1). H3K27ac polymorphisms have the potential to modify cellular phenotypes hQTLs were strongest in the HLA region, with greatest by inducing changes in the epigenome circuitry and how it significance located between HLA-DRB1 and HLA-DQA1 (p = processes information, particularly for complex genetic diseases. 1.23E−142) (Fig. 1a). While strong H3K4me1 hQTLs were also Accordingly, the majority of GWAS variants locate to regions of observed in the HLA region (HLA-DPB2; p = 3.13E−24), the non-protein-coding DNA5,6 and are enriched in enhancer ele- most significant H3K4me1 hQTLs were located at LRRC16A (p = ments that function as epigenome modulators of gene 7.82E−40), ~4 Mb upstream of the major histocompatibility expression4,6. Genetic variants can induce epigenetic “foot- complex (MHC) locus (Fig. 1b). The most significant non-HLA prints”—manifested as allele-specific imbalances in the magni- hQTLs (p <1E−45) for H3K27ac were observed with CACNA1E tude of histone PTMs (histone quantitative trait loci (hQTLs))— (chr 1), a region between SLMAP and FLNB (chr 3); TEC (chr 4); that identify functional states of enhancer elements7. These ANTXRLP1 (chr 10); OAS1 (chr 12); and TSHZ1 (chr 18) hQTLs can disrupt factor binding motifs leading to (Fig. 1a). Strong H3K4me1 non-HLA hQTLs (p <1E−24) were enhancer dysfunction that is heritable from parent to offspring8,9. observed in a region between ZBTB18 and C1orf100 (chr 1); These results suggest that a priori knowledge of epigenome RNASEH1 (chr 1); PNOC (chr 8); TIMM23B (chr 10); CHST11 alterations induced by hQTLs could focus analysis of disease risk (chr 12); and PLCG2 (chr 16) (Fig. 1b). haplotypes on enhancer elements most likely to harbor disease- To replicate our findings, we utilized publicly available modifying variants, even within the context of strong linkage genotyping data and LCL H3K27ac ChIP-seq data9,13 to identify disequilibrium (LD). In addition, knowledge of hQTLs and their hQTLs in ten independent Caucasian subjects. Of the 6261 effects on quantitative gene expression traits (eQTLs), particularly H3K27ac hQTLs in our discovery set, 5068 hQTLs in the in the context of the 3D chromatin network, could improve the replication set had sufficient read depth to test for allelic precision of this widely used method of genotype-to-phenotype imbalance. Following our CHT analysis, we observed 2181 analysis. (43%) hQTLs in the replication set with evidence of allelic To quantify the impact of hQTLs on complex disease risk imbalance at p < 0.05 (Supplementary Data 1; Supplementary haplotypes and gene expression traits, we performed a genome- Table 1). Given that the replication dataset was only 40% the wide screen in 25 lymphoblastoid cell lines (LCLs) from sample size and had less than one third the sequencing read depth European-American patients with systemic lupus erythematosus of our discovery sample (Supplementary Figure 2), we interpret (SLE) to identify hQTLs in weak and strong enhancers defined by these results as strong support for the reproducibility of our LCL the presence of H3K4me1 or H3K27ac, respectively10,11. Our hQTLs. results show that enhancer hQTLs are significantly enriched in We defined the effect size (ES) for hQTLs as the alpha autoimmune disease risk haplotypes and exert a disproportionate parameter/beta parameter ratio produced by the CHT, where influence on gene expression variability when compared with alpha was the maximum likelihood estimate of the reference allele non-hQTL variants in strong LD with them. We show that the read count and beta was the maximum likelihood estimate of the HLA class II locus is densely populated with enhancer hQTLs, alternative allele read count. We observed an average log2(ES) of resulting in differential 3D chromatin conformation and gene 0.89 for all hQTLs, with stronger effects observed with non-HLA = = expression between the two most common HLA class II auto- variants (log2(ESaverage) 0.90; log2(ESmax) 3.57) compared to — = = immune disease risk haplotypes HLA-DR3 and HLA-DR15. HLA variants (log2(ESaverage) 0.83; log2(ESmax) 2.47) (Fig. 1c; Finally, we identify statistically significant physical interactions Supplementary Data 1). These results demonstrate that our between eQTLs and hQTLs, in LD, that modify eQTL-based gene hQTLs produced strong effects on allelic imbalance with variants expression and explain, in part, gene expression variability of having, on average, almost two-fold more histone reads with one eQTL data, suggesting potential implications for functional allele versus the other. genomic studies that leverage eQTL data for subject selection and The distribution of hQTLs was skewed such that 4858 (78%) stratification. were unique to the H3K27ac mark and located in 879 of the

2 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE

ab–log10(p) –log10(p) FWER 20% FWER 10% FWER 20% FWER 10% Effects with p < 9E-20 0 20406080140 Effects with p < 1E-11 0 10203040

CHRGene(s) nearby SNP BP p CHISQ 1 DENND4B rs10127484 153900980 1.79E-33 145.359 CHR Gene(s) nearby SNP BP p CHISQ 1 CACNA1E rs12085947 181398405 1.91E-59 264.37 1 LUZP1 rs9426688 23451360 2.37E-15 62.729 1 ZBTB18-C1orf100 rs111815157 244486976 1.75E-32 140.831 1 1 KCNN3 rs6426988 154815320 2.65E-14 57.981 1 2 DTNB rs10206012 25624186 1.41E-22 95.598 1 CD244 rs60755678 160825706 2.05E-13 53.957 2 CTNNA2 rs17340143 80756301 5.97E-21 88.181 1 ZBTB18-C1orf100 rs77638395 244484884 2.06E-26 113.091 2 IKZF2 rs13405747 213855490 2.57E-35 153.792 2 RNASEH1 rs75802434 3611491 3.22E-25 107.642 3 SLMAP-FLNB rs1675383 57945274 9.57E-48 210.719 2 PDIA6 rs1734426 10932989 6.58E-18 74.339 4 SLC2A9 rs60045583 9971950 7.51E-26 110.528 2 FTOP1 rs11691418 43025422 1.85E-15 63.215 4 TEC rs4695350 48175191 2.88E-79 355.361 2 RAMP1-GPR35 rs895572 238815252 8.22E-14 55.753 2 5 LPCAT1 rs36982 1503205 5.71E-34 147.633 2 3 SSUH2 rs1965975 8670161 2.07E-13 53.934 5 CTD-2247C11.1 rs114338318 5007267 1.21E-35 155.287 5 CWC27 rs447583 64357583 2.80E-35 153.625 3 PRKCD-TKT rs34041412 53246926 2.22E-24 103.813 6 LRRC16A rs9467502 25405877 1.57E-32 141.053 3 SLMAP-FLNB rs1675383 57945274 2.91E-19 80.501 6 HLA-DRB1/DQA1 rs9271908 32596131 1.23E-142 646.597 3 4 TEC rs62309327 48174474 2.31E-17 71.856 3 6 SAMD3 rs4895868 130591318 1.09E-45 201.294 4 TENM3 rs17267298 183730650 2.97E-19 80.459 6 MOXD1 rs9493309 132725738 7.51E-27 115.094 5 LHFPL2 rs10069173 77929023 2.07E-12 49.414 7 GPR146 rs11768486 1101290 4.42E-31 134.42 5 Gene desert rs1549171 116222074 3.93E-15 61.733 7 EIF2AK1 rs10263017 6065804 4.86E-28 120.521 4 5 TNFAIP8 rs6883657 118683346 9.76E-12 46.376 4 7 DFNA5 rs754555 24758753 9.71E-41 178.618 5 SPRY4-FGF1 rs450988 141758788 6.56E-13 51.672 7 FAM1888 rs1034750 30838147 2.33E-29 126.551 6 LRRC16A rs6937918 25407295 7.82E-40 174.486 7 RBM33 rs1920451 155546598 4.31E-26 111.628 6 BTN3A2 rs9358932 26362705 7.81E-18 74 8 FAM167A rs2618444 11338370 1.09E-34 150.925 5 6 HLA-DPB2 rs9277655 33084051 3.13E-24 103.137 5 8 TPD52 rs717340 81046850 1.43E-33 145.814 7 FAM188B rs1017012 30837003 1.74E-17 72.423 9 DAPK1 rs10868631 90171305 1.94E-23 99.524 7 TRGV5 rs111405005 38384542 5.47E-13 52.03 9 TRIM14 rs12552242 100866201 3.70E-21 89.127 7 C7orf65 rs6953275 47708823 1.14E-17 73.249 9 DEC1 rs4246908 117969959 1.21E-37 164.443 6 6 10 ANTXRLP1 rs11259761 47640726 1.93E-47 209.322 8 FAM167A rs2736332 11339965 5.64E-14 56.492 10 CHST3 rs4746104 73744987 5.61E-31 133.949 8 PNOC rs2722906 28186027 4.71E-25 106.887 10 ADK rs10824142 76050007 4.55E-21 88.719 8 PRDM14 rs455041 70927624 6.43E-22 92.591 11 GLYAT rs11607534 58414794 5.85E-26 111.022 7 9 AK3 rs62544266 4726759 3.67E-19 80.039 7 11 MAP6 rs599816 75293982 3.79E-26 111.884 9 KIAA1432 rs4348560 5768609 2.78E-12 48.841 12 BCAT1 rs2220270 24977058 3.96E-28 120.931 9 VAV2 rs3858101 136686242 8.46E-15 60.225 12 CLLU1-C12orf74 rs1012529 92927720 4.34E-22 93.371 10 ALOX5 rs3780895 45874817 1.86E-14 58.677 8 12 CHST11 rs60351248 104900636 8.18E-25 105.795 8 10 TIMM23B rs2611475 51503178 9.43E-26 110.076 12 OAS1 rs6489870 113363972 1.32E-55 246.767 10 SH3PXD2A rs7923100 105518482 2.93E-18 75.935 12 SLC15A4 rs35907548 129282180 3.56E-33 143.994 11 GLYAT rs1954694 58416262 4.07E-12 48.089 14 SLC7A7 rs12884337 23273114 1.44E-21 90.99 11 FKBP2 rs12794145 64006484 1.73E-14 58.823 9 14 SPTLC2 rs2886131 78020290 1.92E-37 163.521 9 12 PCED1B rs10881070 47584050 4.73E-15 61.371 15 HERC1-DAPK2 rs8037081 64175305 4.23E-28 120.799 12 CHST11 rs60351248 104900636 4.60E-31 134.341 16 MMP2 rs1558668 55477585 6.89E-32 138.111 12 OAS1 rs6489870 113363972 1.23E-23 100.42 10 16 TANGO6 rs72789210 69091823 5.43E-27 115.735 10 16 TLDC1 rs12924123 84541424 4.47E-31 134.4 14 GPATCH2L rs17104243 76662370 9.91E-14 55.384 15 SRP14-AS1 rs76119380 40360843 2.51E-14 58.087 16 FAM92B-GSE1 rs59458341 85259889 6.73E-25 106.181 11 17 C17orf97 rs7502594 260182 8.86E-21 87.401 11 15 USP3 rs11852815 63888113 5.92E-13 51.873 17 XAF1 rs1983180 6656774 1.45E-29 127.496 15 MIR422A rs8037081 64175305 8.15E-19 78.463 17 SLFN5 rs883416 33570441 1.22E-40 178.167 16 ZNF500 rs17137242 4798072 4.59E-16 65.965 17 KANSL1 rs2696604 44214094 1.83E-21 90.518 16 MYH11 rs9932695 15948539 1.68E-18 77.035 12 17 LLGL2 rs1661716 73567535 1.48E-33 145.763 12 16 PLCG2 rs7198191 81811485 2.42E-27 117.339 17 ENPP7 rs11868696 77706406 1.11E-28 123.454 16 TAF1C-TLDC1 rs35044427 84542333 3.23E-20 84.846 17 RPTOR rs2672884 78865171 8.31E-23 96.642 16 SLC7A5 rs56722741 87877313 2.76E-21 89.709 13 17 BAHCC1-ACTG1 rs9911340 79455609 1.61E-26 113.579 13 17 MYO1D rs395110 30892951 6.06E-16 65.419 18 NDC80-CBX3P2 rs7241649 2640201 3.45E-24 102.941 17 SPACA3 rs2346390 31297879 1.93E-13 54.072 18 PIEZO2 rs462781 10951824 4.09E-31 134.575 17 PGAP3 rs12150298 37834541 2.56E-12 49.001 14 18 SKOR2-SMAD2 rs1393515 45206799 4.57E-26 111.51 14 KANSL1- 18 TSHZ1 rs7240363 72973762 1.55E-47 209.761 17 LRRC37A rs56130708 44366665 4.69E-18 75.006 19 SMARCA4 rs11085752 11065409 3.33E-23 98.452 15 15 17 GPRC5C rs2706522 72451848 9.37E-16 64.558 19 ZNF254 rs62117617 24214709 2.16E-25 108.431 17 LLGL2 rs1661716 73567535 3.07E-21 89.496 19 ZNF568 rs1644647 37464307 4.41E-24 102.457 16 17 NPLOC4 rs7209180 79581519 4.38E-18 75.14 20 RASSF2 rs7346461 4791074 7.26E-24 101.468 16 CLC 20 BCAS1 rs462174 52680298 5.99E-25 106.413 19 rs385895 40217907 1.63E-13 54.401 17 21 SAMSN1 rs2822745 15899275 2.72E-32 139.957 17 20 BCAS1 rs458674 52679327 3.72E-12 48.269 21 SLC37A1 rs79918208 43977153 7.34E-20 83.219 21 Gene desert rs619484 44783957 6.81E-12 47.08 18 18 21 ICOSLG rs8134436 45616497 3.46E-27 116.628 21 YBEY-C21orf58 rs2839204 47716547 1.71E-21 90.656 19 21 C21orf58 rs11700398 47716552 4.41E-34 148.147 19 22 MYH9-TXN2 rs28625284 36830042 3.18E-21 89.429 22 IGLV4-60 rs743519 22522272 2.05E-20 85.743 22 ELFN2 rs5750423 37734452 1.69E-13 54.331 20 22 APOBEC3H rs9611092 39497181 1.66E-24 104.394 20 21 22 PACSIN2 rs1807570 43319558 3.88E-26 111.836 21 22 NUP50-UPK3A rs12170732 45643680 1.27E-23 100.361 22 22 c 160 Non-HLA hQTLs

140 HLA hQTLs

120

100 -value)

p 80 ( 10 60 –log

40

20

0 –3–2–101234

log2(hQTL effect size)

Fig. 1 hQTL Manhattan plots and plot of hQTL effect sizes. Top effects for a the H3K27ac mark (p <1E−30, vertical gray line), and b the H3K4me1 mark (p ,<1E−15, vertical gray line) are indicated. Strongest effects for each mark are in bold. Red vertical line indicates the 10% FWER threshold (nominal p , = , 8.9E−07); blue vertical line indicates the 20% FWER threshold (nominal p = 9.8E−07). Y-axis is the −log10 (p) from the CHT; position is plotted on the x-axis. c Volcano plot of the hQTL effect sizes. The log2(effect size) is plotted on the x-axis where values >0 had more reference than alternate allele specific reads and those with effect size values <0 had more alternate than reference allele specific reads. Y-axis is the −log10 (p) from the CHT. hQTLs located in the HLA region are in yellow and non-HLA hQTLs are in blue

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9

H3K27ac consensus peaks, while 817 (13%) were unique to using the 44 AI disease risk haplotypes that contained an hQTL in H3K4me1 and found in 628 of the H3K4me1 consensus peaks. strong LD (D′ ≥ 0.8) with the lead GWAS variant. We limited the Only 586 (9%) hQTLs were in consensus peaks shared by both analysis to variants with low scoring RegulomeDB scores of 1 and marks. Consensus peaks containing hQTLs were enriched, when 2, thus focusing on variants with high confidence for functional compared to the non-hQTL consensus peaks, for chromatin impact (Supplementary Data 3). Among the 44 haplotypes, 63 out states consistent with enhancers, as well as of 386 total hQTLs (16%) compared with 424 out of 5351 total motifs found in the B lymphoid lineage (Supplementary Figure 4; non-hQTLs (8%) scored a 1 or 2, thus demonstrating a significant Supplementary Data 2). enrichment of hQTLs versus non-hQTLs ranked for high likelihood of functional impact (p = 1.88E−8). Although we cannot rule out the functional contribution of non-hQTL putative hQTLs map to disease risk haplotypes and impact gene causal variants based on these analyses alone, our results suggest expression. To add disease specific context to our hQTL dataset, that knowledge of hQTLs would be valuable in deconstructing LD we utilized 1436 index SNPs from 21 autoimmune (AI) diseases on risk haplotypes into component SNPs in enhancers that most (Supplementary Table 2) reported in the NHGRI-EBI Catalog of likely contribute to locus-specific causality. Published Genome-Wide Association Studies (October 17, 2016; www.ebi.ac.uk/gwas) to construct risk haplotypes based on LD HLA-DR3 and -DR15 hQTLs differentially regulate HLA (D, ′ ≥ 0.8). We found that our hQTLs were enriched on risk genes. The HLA region spans ~3.4 MB on Chromosome 6 and is haplotypes, likely identifying enhancers at risk for regulatory enriched for genes that orchestrate and regulate innate and dysfunction. A total of 386 hQTLs mapped to 44 reported AI adaptive immune responses. in the HLA Class = − = disease risk haplotypes (ppermutation 4.9E 62; N 180 hQTLs II region remains the most reproducibly significant finding in expected by chance). Interestingly, 15 hQTL SNPs were the exact GWAS studies of autoimmune diseases18–27. Our hQTL analysis index AI SNP reported in the catalog, while 344 hQTL SNPs were revealed significant allelic imbalance within the HLA Class I and strong proxies (r2 ≥ 0.8) of an AI index SNP. Considering only II regions for both histone marks (N = 1063 hQTLs) (Supple- SLE, we observed 68 hQTLs mapping to SLE risk haplotypes in mentary Figure 4), but no significant allelic imbalance in the Class BLK, FAM167A, HCG27, HLA-DQA1, HLA-DRB1, PXK, and III region for either mark. = − = SLC15A4 (ppermutation 1.9E 4; N 45 hQTLs expected by The region between HLA-DRB1 and HLA-DQB1, which chance). Altogether, 1520 hQTLs (24%) mapped to 550 risk loci contains the XL9 regulatory sequence that epigenetically from 239 different genetic traits reported in the GWAS catalog coordinates HLA-D gene expression28, demonstrated the most = − = (ppermutation 2.1E 90; N 1007 hQTLs expected by chance). concentrated and significant evidence of H3K27ac allelic To ensure our results were not driven solely by HLA loci, we removed all HLA hQTLs from the analysis and still observed significant enrichment with AI disease (ppermutation = 4.4E−3; N , = 177 observed/146 expected by chance), and complex traits 0.50 reported in the NHGRI database (ppermutation = 4.2E−40; N = 1151 observed/839 expected by chance), and suggestive enrich- 0.45 ment with non-HLA SLE risk haplotypes (ppermutation = 0.073; = N , 48 hQTLs observed/38 expected by chance). 0.40 We would expect hQTLs to produce measurable fluctuations on target gene expression variance if the allelic imbalance results 0.35 in altered enhancer potency. To test this possibility, we first calculated gene-level eQTLs in the gEUVADIS European LCL 0.30 dataset (N = 358)14 using the same methods described there but fi with updated RNA-seq alignment and quanti cation 0.25 methods15,16 and identified 2403 (74%) of the 3259 eQTLs 14 originally reported by gEUVADIS . Using these eQTLs as 0.20 functional surrogates of risk haplotypes, we identified 245 haplotypes of sufficient size (≥six variants) that contained an 0.15 hQTL in strong LD (D′ ≥ 0.8) with the eQTL but not highly correlated (r2 < 0.6) in the genetic information it captured. We Proportion of additional gene expression 0.10 calculated joint effect generalized linear models for each of the variation explained by incorporation of hQTL 245 haplotypes to determine if incorporation of the genotype at the hQTL and each variant within the haplotype block 0.05 significantly contributed to expression variation of the eQTL target gene (deviance, or D2). To generate the null distribution of 0.00 D2 and significance at each locus, we permuted the position of the <0.01 <0.02 <0.03 <0.04 <0.05 hQTL at each non-QTL variant position within the haplotype. p p p p For 87 haplotypes (36%), the hQTL demonstrated significantly 0.01< 0.02< 0.03< 0.04< increased ability to model expression variation (Pperm < 0.05) of Permutation p -value the eQTL target gene over the eQTL alone or any other non-QTL variant on the haplotype (Fig. 2). These results suggest that Fig. 2 hQTLs impact gene expression in haplotype blocks that also contain variants with strong allelic imbalance at histone marks can eQTLs. Figure plots the 87 haplotypes for which the hQTL demonstrated produce measurable effects on gene expression variance, even significantly increased ability to model expression variation of the eQTL above that of known eQTLs or other non-QTL variants in tight target gene (ppermutation < 0.05). The proportion of additional gene LD with them. expression variation by incorporation of the hQTL genotypes (D2) is plotted We then tested how RegulomeDB, an established functional on the y-axis and the permutation p-value is plotted on the x-axis. HLA variant annotation method17, would rank hQTLs and non-hQTLs haplotypes are colored in red and non-HLA haplotypes are in black

4 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE imbalance at 32.596 MB (p = 1.23E−142) (Fig 1a; Supplementary coupled with chromatin immunoprecipitation (HiChIP) Figure 4; Supplementary Data 1). For SLE, two HLA risk to develop 3D chromatin topology maps for H3K27ac and haplotypes (HLA-DR3 and HLA-DR15 (historically named HLA- CTCF-mediated loops from independent LCLs heterozygous for DR2)) confer non-symmetrical risk for disease, with HLA-DR3 HLA-DR3 or HLA-DR15 (N = 3 for each haplotype). Again, being a more potent risk factor29–31. In addition, eQTLs carried sequencing reads were aligned to the appropriate COX or PGF on these haplotypes show increased expression of HLA-DR and genome to minimize reference genome bias in the MHC locus. In HLA-DQ genes compared to non-risk HLA haplotypes28; aggregate across all six cell lines, we identified 175,833 H3K27ac however, it is not known if eQTL effects differ when the two and 86,891 CTCF high-confidence chromatin loops with at least risk haplotypes are compared to each other. To determine if SLE four supporting paired-end tags (PETs) spanning two anchors; risk haplotypes contributed differential allelic imbalance in 8345 loops were shared by both H3K27ac and CTCF. A total H3K27ac enhancer marks, we imputed the HLA classical alleles of 5608 (90%) hQTLs were located within loop anchors. in our 25 LCL samples and performed haplotype specific hQTL As expected, since our hQTLs were H3K27ac-biased, 5196 analyses (Fig. 3, Supplementary Figures 5-7); the majority of hQTLs were found only in H3K27ac-mediated loops, while 37 individuals (N = 21) were heterozygous for either HLA-DR3 hQTLs were present only in CTCF-mediated loops, and 375 in (N = 9) or HLA-DR15 (N = 12). Since the human reference both loop types. genome is homozygous for HLA-DR1532, we wanted to ensure A strong CTCF loop extended from BTNL2 to a region that any observed differences between the two haplotypes between HLA-DQB1 and HLA-DQA2 (Fig. 5a). This loop likely reflected the true intrinsic effect of genetic variation and were defines an insulated neighborhood33,34 and is supported by Hi-C not due to alignment bias of sequencing reads. We constructed a data from the GM12878 cell line, which demonstrates a custom genome containing the COX (HLA-DR3) Major topologically associating domain (TAD)35,36 in this region Histocompatibility Complex (MHC) sequence in place of the (Supplementary Figure 8). Within this insulated neighborhood, PGF (HLA-DR15) MHC sequence in the human reference we observed differential H3K27ac-mediated chromatin loop genome and mapped the sequencing reads from the HLA-DR3 frequencies between the HLA-DR3 and HLA-DR15 subjects. and HLA-DR15 subjects to their respective HLA reference Higher loop frequencies (as determined by PET counts) were genomes to evaluate haplotype-specific effects. Focusing on the observed within the HLA-DR15 subjects with PGF enhancers HLA-DRB1 to HLA-DQB1 region, we observed three strong 2(p = 0.008) and 5 (p = 0.045) (Fig. 5b, c). The PGF enhancer hQTL signals (A: 32.568–32.571 Mb, B: 32.578–32.582 Mb and C: 2 corresponds to the exact positions of the preferential HLA- 32.588–32.606 Mb (Fig. 3; Supplementary Figure 5). Signals A DR15 hQTL signals A and B described above, and the PGF and B were produced primarily from the HLA-DR15 haplotype enhancer 5 lies within the region of HLA-DQB1 (32.625–32.641 with over three-fold more total allele-specific reads (ASR) than Mb on the human reference genome) that exhibits stronger = the HLA-DR3 haplotype (mean ASR(Total)/SNP 49 vs. 15% for hQTLs within the HLA-DR15 subjects compared to HLA-DR3 HLA-DR15 and HLA-DR3 subjects, respectively) (Supplementary subjects (Fig. 3b); additionally, these two enhancers interact Data 4). Signal C, while exhibiting comparable total ASR between through a loop that is absent in the HLA-DR3 subjects (Fig. 5b, c the two haplotypes, had more hQTLs with highly significant CHT (red loop)). Thus, the strong allelic imbalance observed in this p-values coming from the HLA-DR3 haplotype compared to the region with the HLA-DR15 haplotype likely results in HLA-DR15 haplotype (N = 261 vs. N = 110) (Supplementary the significantly increased loop frequency to these enhancers Data 5). To understand why the HLA-DR3 subjects had more and the increased expression of HLA-DQA1 and HLA-DQB1 significant hQTLs when the total ASR between the two within the HLA-DR15 subjects. haplotypes was comparable, we separated the total ASR into By comparison, the HLA-DR3 subjects demonstrated signifi- reference and alternate ASR. When mapped to the COX reference cantly higher loop frequency with the HLA-DRB1 COX fi = genome, the HLA-DR3 subjects had signi cantly higher ASR(ref)/ enhancer 1 (p 0.013). This enhancer exhibited strong down- ASR(alt) ratios at strong hQTLs (ASR(ref)/ASR(alt) = 21.35) stream looping interactions with the COX enhancers 3 and 4; compared to HLA-DR15 subjects mapped to the PGF genome these downstream looping interactions were absent with the (ASR(ref)/ASR(alt) = 2.35) (Mann Whitney p = 1.4E-05). Since the HLA-DRB1 promoter enhancer on the HLA-DR15 haplotype alternate allele on the PGF haplotype is the reference allele on the (Fig. 5b, c (blue loops)). The COX enhancers 3 and 4 correspond COX haplotype for most of these variants, these results to the location of the strongest HLA-DR3 hQTL signal (Fig. 3b; demonstrate stronger allelic imbalance with the variant alleles Signal C); therefore, the strong allelic imbalance observed with carried on the COX haplotype, which results in more significant HLA-DR3 in this region may be driving the significantly hQTLs within the DR3 subjects at signal C (Supplementary increased looping frequency to the HLA-DRB1 promoter and Data 5). increased expression of the HLA-DRB1 gene in the HLA-DR3 To determine if the HLA-DR15 and HLA-DR3 haplotypes subjects. In addition, the HLA-DRB1 promoter has two additional correlated with local gene expression, we measured the expression Alu repeats present only on the COX genome in this region; of Class II genes stratified by HLA risk haplotype in our 25 LCLs. otherwise, the two sequences were very similar. Some Alu repeats HLA-DR15 carriers demonstrated a significant increase in have the potential to modulate gene transcription37, modulate expression of HLA-DQB1 (Fig. 4), whereas the HLA-DR3 carriers nucleosome positioning38 and shape the epigenomic landscape, demonstrated significant increases in expression of HLA-DRB1 thus these HLA-DR3 specific Alu repeats may also contribute to (Fig. 4). Imputing the classical HLA alleles in the European the preferential looping and increased HLA-DRB1 gene expres- gEUVADIS dataset and conducting the same analysis provided sion observed in HLA-DR3 subjects. confirmation of these results in a more powerful dataset. In addition, HLA-DR15 carriers within the gEUVADIS dataset (N = 69) exhibited a significant increase in expression of HLA- hQTLs reveal substructure in eQTL regulatory networks. DQA1 compared to HLA-DR3 carriers (N = 54) (Fig. 4). hQTLs located within the chromatin looping network of an eQTL To determine if the genetic variation resulting in differential could potentially modify the eQTL effect if interacting on the hQTL allelic imbalance between HLA risk haplotypes influences same target gene promoter. To assess this in our dataset, we used the chromatin topology in the HLA class II region, we used high- a generalized linear model to measure the joint action of inde- throughput sequencing of chromosome conformation capture pendent (r2 < 0.6; D′ < 0.6) hQTL x eQTL SNP combinations on

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 5 lte nthe on plotted eeoyosfrteHAD1 tppnl rHAD3(otmpnl haplotypes. panel) (bottom HLA-DR3 or panel) (top HLA-DR15 the for heterozygous 6 3 Fig. ARTICLE leespeci allele b,adC(32.588 C and Mb), tapriua SNP. particular a at L ls IhTshv ipootoaeD1/R S ratios. ASE DR15/DR3 disproportionate have hQTLs II Class HLA fi ed tapriua N nteHAD1 n akoag aus(min values orange dark and HLA-DR15 the in SNP particular a at reads c x ai.hTsaeclrdb h log the by colored are hQTLs -axis. b – omdi einhglgtdb lc o in box black by highlighted region in Zoomed 266M)aeindicated are Mb) 32.606 b a –log10(p -value) –log10(p -value) DR3 DR15

DR3 DR15 100 100 20 40 60 AUECOMMUNICATIONS NATURE 20 40 60 80 20 40 60 20 40 60 80 0 0 0 0 243. 263. 283. 33.0 32.9 32.8 32.7 32.6 32.5 32.4 HLA–DRA HLA–DRB1 25 25 32.60 32.58 32.56 HLA–DRB5 A HLA–DRB6 2 D1 leespeci allele (DR15 B HLA–DRB1 (08 :95 O:1.08s16-1-52- www.nature.com/naturecommunications | 10.1038/s41467-018-05328-9 DOI: | (2018)9:2905 | HLA–DQA1 HLA–DQB1 Position onchr6(Mb) a C (32.56 fi Position onchr6(Mb) ed/R leespeci allele reads/DR3 c HLA–DQA2 a HLA–DQA1 – ersnaino L ls I(32.4 II Class HLA of Representation HLA–DQB2 26 b sdslydin displayed is Mb) 32.66 HLA–DOB AUECMUIAIN O:10.1038/s41467-018-05328-9 DOI: | COMMUNICATIONS NATURE PSMB8–AS1 Y = ai sthe is -axis TAP2 − PSMB8 26 32.64 32.62 PSMB9 .1 niaemr leespeci allele more indicate 6.11) TAP1 LOC100294145 fi ed) akprl aus(max values purple dark reads); c HLA–DQB1 − HLA–DMB log HLA–DMA b inl (32.568 A Signals . 10 BRD2 ( p rmteCT hoooepsto M)is (Mb) position chromosome CHT; the from ) HLA–DOA – 31M)hTsb individuals by hQTLs Mb) 33.1 Log HLA–DPB1 Log HLA–DPA1 ASR ratio) 2 ASR ratio) fi 2 HLA–DPB2 (DR15/DR3 (DR15/DR3 ed nteHAD3samples HLA-DR3 the in reads c – 251M) (32.578 B Mb), 32.571 –6 –3 0 3 –6 –3 0 3 33.1 = .3 niaemore indicate 5.63) – 32.582 NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE

HLA-DRB1 HLA-DQA1 HLA-DQB1 (p = 0.01) (p = NS) (p = 0.02) 400 300 150

300 200 100

200

100 50 100 Normalized counts (thousands) 0 0 0 HLA-DR3 HLA-DR15HLA-DR3 HLA-DR15HLA-DR3 HLA-DR15

HLA-DRB1 HLA-DQA1 HLA-DQB1 (p < 0.0001) (p = 0.0002) (p < 0.0001) 80 60 40

60 30 40

40 20

20 20 10 Normalized counts (thousands) 0 0 0 HLA-DR3 HLA-DR15HLA-DR3 HLA-DR15 HLA-DR3 HLA-DR15

Fig. 4 Impact of HLA-DR3 and DR15 haplotypes on HLA Class II gene expression. Data presented are from our 25 LCL samples (top panel) and the European gEUVADIS dataset (bottom panel). Normalized gene counts (measured in thousands) for each gene are plotted on the y-axis and colored by risk haplotype (DR3: blue; DR15: red); risk haplotype is presented on the x-axis. Mean and standard error of the mean bars are given in black eQTL target genes. Following multiple hypothesis correction, we with the eQTL rs10098664 (p = 2.7E−14) and was driven by identified 17,228 significant (FDR ≤ 0.05) interactions (median/ individuals who carry an increasing dose of the alternate eQTL C mean distance between two SNPs: ~115 kb/~370 kb). The allele. This variant, located within the BLK ninth intron, was majority of significant interactions (N = 15,567, 90.4%) were brought into proximity with the BLK promoter through observed with genes in the HLA region, most frequently in the chromatin interactions (Fig. 6d). We identified a hQTL Class II region (N = 13,280) and less frequently in the Class I and (rs13256690) that is positioned between BLK and FAM167A, Class III regions (N = 1510 and 777, respectively) (Supplemen- ~79KB upstream of the eQTL, and was also physically associated tary Data 6). The remaining 9.6% of significant interactions (N = with the BLK promoter through chromatin looping. The 1661) involved 752 genes (Supplementary Data 7). The direc- significant interaction of the two variants with BLK expression tionality of these modifying effects was not uniform (Fig. 6a, b). (p-interaction = 8.05E−05) revealed enhanced reduction of BLK Most interactions (N = 13,506, 78%) demonstrated an opposing expression in individuals with increasing alternate allele dosage at effect on eQTL expression in samples carrying the alternate allele the hQTL. Lowest BLK expression was observed in individuals at the hQTL—7279 produced repressive hQTL effects on an who were homozygous for the alternate alleles at both loci. These enhancing eQTL (Fig. 6a, b, green), while 6227 did the opposite results illustrate that an important component of variance in (Fig. 6a, b, red). The remaining 3722 interactions (22%) eQTL genotypic expression is due to independent hQTLs located demonstrated concordant directionality with the eQTL, i.e., within the chromatin network of an eQTL, revealing gene expression in the same direction as the eQTL (Fig. 6a, b, blue and expression stratification that is masked by observing eQTL data yellow). alone. In general, we found that significant gene expression substructure was revealed only when considering the joint contribution of genotypes at both the hQTL and eQTL; this Discussion substructure was not evident when evaluating the effect of the Causal variants on disease risk haplotypes are the molecular link eQTL alone. We highlight ICOSLG and BLK as representative connecting genotype to complex disease phenotypes; however, examples (Fig. 6c, d). There is a strong eQTL (rs56124762) within the precision and pace of causal variant discovery is hampered by the Crohn’s disease and ulcerative colitis-associated ICOSLG strong LD with neutral variants. Our approach demonstrates that locus39–41 in which individuals with increased dosage of the hQTLs are enriched in autoimmune and complex disease risk alternate G allele exhibited significantly higher levels of ICOSLG haplotypes. In addition, hQTLs disproportionately contribute to gene expression, on average, than those who only carried the gene expression variation compared to non-hQTL variants in reference A allele (p = 2.7E−09) (Fig. 6c). We identified an hQTL strong LD; when located on haplotypes with known eQTLs, the (rs8134436) located 42 kb upstream of the eQTL that hQTL demonstrated significantly increased ability to model was physically associated with the ICOSLG promoter by long- expression variation of the eQTL target gene over the eQTL alone range cis-interaction. When the eQTL and hQTL genotypes or any other non-QTL variants 36% of the time. This suggests are evaluated together, gene expression substructure emerged that a priori knowledge of variant-induced allelic imbalance of with increasing dosage of the hQTL’s alternate G allele epigenetic PTMs in enhancer elements can facilitate identification (p-interaction = 1.11E−09). Similarly, for the BLK locus of the most influential putative causal variants on risk haplotypes associated with rheumatoid arthritis, SLE, and Kawasaki even in the context of strong LD. Current approaches to causal disease22,42–48, a significant decrease in expression was observed variant discovery use bioinformatic predictions to develop

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 prioritized lists of putative causal variants. However, bioinfor- established bioinformatic annotation method, hQTLs had matic algorithms lack information about allele-specific effects twice as many high functional RegulomeDB scores compared of most genetic variants that modulate epigenome to non-hQTLs, demonstrating that the identification of function. We found that when using RegulomeDB, an hQTLs could help prioritize and constrain the variants that would

HLA-DOB a HLA-DQA1 PSMB9 HLA-DRA HLA-DRB6 HLA-DQB1 TAP1 HLA-DRB3 HLA-DQA2 HLA-DRB1 TAP2 DR3 COX-aligned

3.85 3.88 3.91 3.94 3.97 4 4.02 4.05 4.08 4.11 4.14 4.17 4.2 4.23 4.26 4.29

3.85 3.88 3.91 3.94 3.97 4 4.02 4.05 4.08 4.11 4.14 4.17 4.2 4.23 4.26 4.29 DR15 PGF-aligned HLA-DQB1 HLA-DQA1 HLA-DRA HLA-DRB6 HCG23 HLA-DRB1 HLA-DQA2 BTNL2 HLA-DRB5

HLA-DQA1 b HLA-DRA HLA-DRB6 HLA-DQB1 HLA-DRB3 HLA-DRB1 HLA-DQA2 DR3 COX-aligned

3.85 3.87 3.89 3.91 3.93 3.95 3.97 3.99 4.01 4.03 4.05 4.07 4.09 4.11 4.13 4.15 4.17 1 2345

1 2 3 4 5 3.85 3.87 3.89 3.91 3.93 3.95 3.97 3.99 4.01 4.03 4.05 4.07 4.09 4.11 4.13 4.15 4.17 DR15

PGF-aligned HLA-DQA1 HCG23 HLA-DRB6 HLA-DRA HLA-DRB1 BTNL2 HLA-DRB5 HLA-DQB1 c LCL#3 LCL #6 LCL#2 LCL#5 HLA-DR3 HLA-DR15 LCL#4 LCL#1

3.85 3.89 3.93 3.97 4 4.03 4.07 4.11 4.15 3.85 3.89 3.93 3.97 4 4.03 4.07 4.11 4.15 12 34 5 1 2 3 4 5

8 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE need experimental validation for mechanistic causality, thus eQTLs for the selection of subjects for functional analysis of hastening the pace for understanding how associated causal GWAS effects or genetically guided clinical studies. variants modulate gene expression phenotypes and risk for In this report, we used an SLE-derived LCL model system to complex diseases. perform a genome-wide screen for allelic imbalance in PTM of Our results shed new light on the HLA region and how it is enhancer histones induced by common genetic variants. Epstein- regulated in the context of chromatin architecture and con- Barr Virus (EBV) transformation modifies the epigenome of B- siderable genetic variation. HLA Class II loci demonstrate allele cells resulting in promoter demethylation51,52, more regions of specific gene expression when comparing HLA risk and non-risk open chromatin53, and increased numbers of expressed haplotypes, and in eQTL studies without stratification on disease transcripts52,53, while maintaining most of the transcripts status28,49. Our results extend these observations to the level of expressed in the original B-cell phenotype52. We view this as a the epigenome, where we show that the HLA Class II region strength of this model system since more of the genome is capable exhibits significant levels of allelic imbalance in enhancer PTMs of being interrogated for allelic imbalance using our approach. between the two most frequent HLA risk haplotypes for human Nevertheless, careful analysis of hQTLs in specific primary cell SLE and other autoimmune diseases (HLA-DR3 and HLA- lineages and cell states will be necessary to develop a compre- DR15). Using 3D chromatin topography mapping, we found that hensive portrait for how genetic variation modulates gene the HLA-DR15 risk haplotype demonstrated significantly expression through allele-specific epigenome mechanisms. increased H3K27ac-mediated chromatin loop frequencies to Moreover, hQTLs are likely to differ across ethnic backgrounds enhancers flanking HLA-DQA1 and HLA-DQB1. These enhan- since the allele frequency distribution of SNPs varies across dis- cers correspond to regions of strong allelic imbalance and hQTLs parate populations. Further work will be necessary to assess the observed in the HLA-DR15 subjects, which are likely associated power of this approach in admixed populations. with the observed higher levels of HLA-DQA1 and HLA-DQB1 gene expression in these individuals. Alternatively, enhancers Methods located in the region producing the strongest hQTLs among the Study population and cell culture. Experiments were approved by the Institu- HLA-DR3 individuals produced unique loops to the promoter of tional Review Board at the Oklahoma Medical Research Foundation (OMRF) prior fi HLA-DRB1 and likely contribute to subsequent higher HLA- to initiation. A total of 25 de-identi ed EBV-transformed B cell lines generated from European SLE patients (21 female, 4 male) enrolled into the Lupus Family DRB1 gene expression compared to HLA-DR15 subjects. While Registry and Repository (LFRR)54 were obtained from OMRF’s Autoimmune the underlying mechanisms remain to be fully clarified, it is Disease Institute Biorepository, Phenotyping, and Bioinformatics Cores and are tempting to speculate that the differences in HLA class II chro- hereafter denoted as lymphoblastoid cell lines (LCLs). All study participants pro- matin topology between these two haplotypes and correlated gene vided written informed consent prior to study enrollment into the LFRR. Using previously generated genotype data, cell lines were selected on their risk haplotype expression of HLA-D genes may explain, in part, the increased repertoire for 24 SLE genes: ATG5, BANK1, BLK, CD44, IFIH1, IKZF1, IKZF3, risk for SLE conferred by the HLA-DR3 risk haplotype. IL10, IRF5, IRF7, IRF8, ITGAM, JAZF1, LRRC18/WDFY4, NCF2, PRDM1, STAT4, Expression QTL analysis provides a valuable intermediate TNFAIP3, TNFSF4, TNIP1, TYK2, UBE2L3, and XKR6. Cell lines were maintained phenotype for the functional characterization of causal variants in RPMI 1640 medium supplemented with 10% FBS, penicillin, streptomycin, and on disease risk haplotypes. Moreover, as precision medicine L-. approaches develop, eQTLs may serve as potential genetic bio- ChIP sequencing and analysis. We crosslinked 10 million Epstein-Barr Virus markers for selection of study subjects carrying variants that alter (EBV)-transformed B cells with 1% formaldehyde for 5 min followed by careful cell expression of genes encoding drug targets and pathways. How- lysis using the truChIP Chromatin Shearing Kit (Covaris). Nuclei were isolated and ever, eQTL data aggregate the effects of all forces influencing gene fragmented using a Covaris E220e sonicator with the following operating condi- expression. Our data demonstrate that genetically independent tions: Peak Incident Power: 140 Watts; duty cycle: 5%; cycles per burst: 200; hQTLs (r2 < 0.6, D′ < 0.6) within an eQTL target gene chromatin treatment time: 15 min; setpoint temperature: 6 °C). Crosslinked protein/DNA was fl immunoprecipitated using H3K27ac (rabbit polyclonal, Abcam #ab4729, 20ug/ml) network can in uence subsets of subjects for which the aggregate or H3K4me1 (rabbit polyclonal, Abcam #ab8895, 20 µg/ml) antibodies on Protein eQTL effect is reduced or augmented. These results confirm and A + G immunomagnetic beads. Anti-acetyl histone H3 antibodies and isotype extend the recent work of Corradin et al., where “outside var- matched IgG antibodies were included as positive and negative controls, respec- iants” in close proximity to GWAS risk variants, but not in LD tively. An aliquot of the fragmented chromatin, not subjected to immunoprecipi- tation, was also used as an input control. Chromatin crosslinks were reversed, then with them, contributed to variation in target gene expression and protein was digested with proteinase K and DNA fragments were purified using 50 clinical risk . Overall, these results highlight the complexity Ampure XP beads (Beckman Coulter), made into libraries, and sequenced on the embedded in eQTL datasets and the need to clarify the compo- Illumina NextSeq 500 (Illumina Inc., San Diego, CA) using 75 bp single-end reads. nent contributions of all variants within the chromatin network Biological replicates of each cell line were sequenced and the resulting reads were fi pooled. Total input DNA was also processed in replicate and pooled. To assess the of an eQTL target locus. De ning these interactions will be quality of the ChIP-seq data, we calculated the fraction of reads in peaks (FRiP) for important for improving the precision of studies that leverage acCBP/p300 and RNAPII enrichment peaks and scrutinized experiments with FRiP

Fig. 5 HLA Class II chromatin landscapes of HLA-DR3 and DR15 haplotypes. a CTCF HiChIP looping interactions within the HLA Class II region reveal an insulated neighborhood that extends from BTNL2 to upstream of HLA-DQA2. The COX-aligned DR3 haplotype is presented on top and the PGF-aligned DR15 haplotype is presented on the bottom. Gene and bp positions are presented specific for each haplotype. Orange arcs represent CTCF-mediated HiChIP interactions. Arc thickness is proportional to the frequency of observed paired-end tags (PETs) at each enhancer. b H3K27ac HiChIP looping interactions within the HLA Class II region reveal differential frequency and pattern in the HLA-DRB1 to HLA-DQB1 region between the HLA-DR15 and HLA- DR3 haplotypes. Black arcs represent H3K27ac-mediated HiChIP interactions. Significant p-values are presented for enhancers that exhibit differential binding frequencies (mean PETs) for one haplotype or the other and are colored by the haplotype that exhibits higher PET counts (HLA-DR3: blue; HLA- DR15: red). H3K27ac ChIP peaks are presented in blue (HLA-DR3) and red (HLA-DR15). Haplotype specific loops are indicated in blue (COX) or red (PGF). Corresponding haplotype specific enhancers are labeled as follows—COX: 1: 4.007–4.013 Mb, 2: 4.021–4.031 Mb, 3: 4.039–4.047 Mb, 4: 4.052–4.06 Mb, 5: 4.078–4.083 Mb; PGF: 1: 4.076–4.081 Mb, B: 4.092–4.102 Mb, C: 4.111–4.120 Mb, D: 4.124–4.129 Mb, E: 4.154–4.159 Mb. c Individual H3K27 HiChIP looping interactions for three subjects heterozygous for HLA-DR3 (left) and three subjects homozygous for HLA-DR15 (right). H3K27ac ChIP peaks, haplotype specific loops, and haplotype specific enhancers are presented in blue (HLA-DR3) and red (HLA-DR15)

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 scores <1%. We also used strand cross-correlation to calculate the normalized excluded using Bowtie v2.2.456 with option -L 10. Filtered reads were aligned using strand coefficient (NSC) and relative strand coefficient (RSC), and repeated the Burrows Wheeler Aligner (BWA)57 to the human hg19 genome with the fol- experiments with NSC < 1.05 and RSC < 0.8. Raw FASTQ reads were quality fil- lowing options: -n 2 –o0–l 20. To reduce reference genome bias on read tered and trimmed for Illumina adapters using Trimmomatic v0.3555. Reads mappability, we used the WASP pipeline7 to relocate reads potentially biased by aligning to Poly-A, Poly-C, ribosomal, mitochondrial, or Phi-X sequences were genetic variation. Only reads overlapping single nucleotide polymorphisms (SNPs)

a Relationship: b Relationship: 2 Opposing 2 Opposing enhancement enhancement Concordant Concordant enhancement enhancement Concordant Concordant attenuation attenuation 1 1 Opposing Opposing attenuation attenuation

550 194 5677 1761 0 0 257 660 1492 6637 Interaction coefficient Interaction Interaction coefficient Interaction –1 –1

–2 –2 –3 –2 –1 0 1 2 3 –3 –2 –1 0 123 eQTL coefficient eQTL coefficient

c

QTLs rs8134436 rs56124762

AP001055.1 AIRE C21orf33 DNMT3L COSLG PFKL chr21 45.53 45.56 45.59 45.62 45.65 45.68 45.71 Mb epiQTL REF/REF 4000 4000 eQTL p = 2.7E-09 epiQTL REF/ALT epiQTL ALT/ALT

3000 3000

pinteraction 2000 = 1.11E-09 2000

1000 1000 ICOSLG gene expression ICOSLG gene expression

0 0 AA AG GG AA AG GG eQTL rs56124762 eQTL rs56124762 d

QTLs rs13256690 rs10098664

FAM167A C8orf12 BLK chr8 Mb 11.275 11.315 11.355 11.395

epiQTL REF/REF 8000 8000 eQTL p = 2.7E-14 epiQTL REF/ALT epiQTL ALT/ALT

6000 6000

pinteraction 4000 = 8.05E-05 4000 BLK gene expression 2000 BLK gene expression 2000

0 0 TT CT CC TT CT CC eQTL rs10098664 eQTL rs10098664

10 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE which were segregating in the genotyping data (≥2 minor alleles across population) human reference genome using HiC-Pro65. Aligned data were processed and ana- were targeted by the realignment procedure (Supplementary Tables 3 & 4). lyzed through the hichipper pipeline where consensus ChIP-seq tracks for either factor (H3K27ac or CTCF) were used to scaffold anchors for loop calling as recommended with default parameters66. Long range interactions spanning two Genotyping. DNA was harvested from each LCL using standard phenol- anchor regions, termed DNA loops, were derived from linked paired-end reads that chloroform extraction. Genotyping data for the 25 LCL samples were generated overlapped restriction fragments containing these peaks (Supplementary Table 5). using Illumina HumanOmni 2.5–8 SNP arrays (Illumina Inc., San Diego, CA). Samples that passed stringent quality control had a minimum of 15% long inter- Variants were included for analysis if they met the following criteria: genotyping actions and contained intrachromosomal loops with a minimum length of 5 Kbp rate > 0.9, minor allele frequency > 0.01, Hardy-Weinberg equilibrium ≥ 0.001. A and a maximum length of 2 Mbp. The 3D chromatin structures generated by the total of 1,468,563 variants passed our filtering thresholds. Data were then pre- HiChIP data were analyzed and visualized using the R package diffloop67. phased using reference information from the HapMap project phase 3 genotype map, release 358 together with SHAPEIT, v2.r79059. Imputation was carried out using IMPUTE v2.3.160 and the 1000 Genomes Phase 3 reference panel, v512.A Statistical analysis. ChIP-seq peaks were called using MACS268 using the fol- total of 9,603,466 SNPs resulted from the phasing and imputation process. lowing options: nomodel, extsize 175. Reproducible peaks were identified on an gEUVADIS genotyping data were obtained from the May 02, 2013 release of the individual basis using the Irreproducible Discovery Rate (IDR) method69 on each 1000 Genomes project12. Haplotype blocks (D’ > 0.8) were generated using PLINK261 pair of technical replicates. Population-wide consensus peak sets were derived from from 358 Caucasian individuals using the following options:—blocks no-pheno-req individual peak profiles by identifying genomic regions that were represented by –blocks-max-kb 10,000 –blocks-min-maf 0.05, using the same reference information reproducible peaks in at least 13 individuals (>50% of the population). This used in the imputation of our SLE cell lines. majority rule has been shown to perform reliably in distinguishing genuinely enriched peaks across multiple biological replicates70. Disjoint consensus genomic regions separated by fewer than 147 bp were merged to create the final consensus RNA-seq alignment and quantification. RNA was isolated from each LCL using peak map for each epigenetic track. TRIzol. RNA concentrations were quantified using a Qubit fluorometer, and We utilized the combined haplotype test (CHT)8 to identify allele-dependent mRNA libraries were generated using 500 ng of each RNA sample and Illumina’s epigenetic footprints in ChIP-seq data. Allele-specific ChIP-seq reads were TruSeq Stranded mRNA Library Prep Kit. RNA-seq was performed on the Illu- aggregated within 2000 bp regions centered around testable SNPs. SNPs evaluated mina HiSeq 3000 (Illumina Inc., San Diego, CA) in the OMRF’s Clinical Genomics by the CHT were required to have a minimum of 15 allele-specific reads in the SNP Center following Center procedures. Samples were sequenced using 75 bp paired- region. Only variants on autosomal were considered for analysis by end reads with 10 samples per lane, which yielded ~30 million reads per sample. the CHT. Region-specific read counts were adjusted for GC content bias and Primary RNA-seq FASTQ files for gEUVADIS individuals were obtained from haplotype probabilities calibrated before learning dispersion parameters for both ArrayExpress62. Post-sequence reads were quality filtered and trimmed for Illu- the allele-specific and read-count portions of the CHT model. We used Holm’s mina adapters using Trimmomatic v0.3555. Resulting reads were pseudo-aligned to Family-wise Error Rate (FWER) correction procedure71 to adjust the statistical coding regions of the GRCh38 genome (release 79), using Kallisto v0.43.015 with significance given by the CHT to variants and their correlation with epigenetic read the following options:—bias enabled, 50 bootstraps. Expression values for 173,260 counts. We chose correcting the FWER over the more common false discovery rate unique transcripts were measured in transcripts per million reads sequenced (FDR) since it tends to offer better control of the Type I error rate in LD-based (TPM). To perform eQTL analyses, expression values were summarized at the gene QTL analyses72. Significant hQTLs (N = 5829) were identified with FWER ≤ 0.1, level to transcript-length adjusted, library-size scaled counts per million (CPM) and suggestive hQTLs (N = 432) with FWER ≤ 0.2. To evaluate the calibration of with the R package tximport16. Finally, to avoid false positives due to technical the CHT, we randomly permuted genotypes and their read counts at the tested variation and outliers, RNA-seq expression data was normalized with PEER63 variants and compared the distribution of significance values through using K = 10 latent factors on genes with positive expression in >50% of indivi- quantile–quantile plots (Supplementary Figure 1). These distributions indicated duals. The PEER-derived weights were then scaled by the mean of each gene. distinct signal arising from the CHT under the true genotype and read count information, suggesting that the test was well-calibrated. HiChIP sequencing and analysis. We crosslinked 10 million EBV-transformed B We conducted an hQTL replication study for the H3K27ac hQTLs using cells using 1% formaldehyde for 10 min at room temperature. Intact nuclei were publicly available H3K27ac ChIP-seq data collected in LCLs from ten independent isolated and digested using the MboI restriction enzyme for 4 h at 37 °C. DNA Caucasian 1000 Genomes phase three samples from two published studies9,13.We fragment ends were filled and labeled with dCTP, dGTP, dTTP and biotin-dATP. used the methods described above with the exception that SNPs evaluated by the Proximity ligation was performed at room temperature overnight. Following the CHT were required to have a minimum of 10 allele-specific reads in the SNP in situ ligation, we fragmented the ligated chromatin as previously described64. region due to the smaller sample size and lower ChIP sequencing depth. Crosslinked protein/DNA was immunoprecipitated using antibodies to H3K27ac Motif enrichment analysis was done using the HOMER suite v4.773 via the (rabbit polyclonal, Abcam #ab4729, 20ug/ml) and CTCF (rabbit mAb, clone findMotifsGenome tool with the hg19 database to compare consensus peaks with D31H2, Cell Signaling #3418, 20ug/ml) and then purified by Protein A + G and without hQTLs. Background genomic regions were specified as the list of all immunomagnetic beads. The amount of MboI enzyme, antibody, and beads are unique consensus peaks. We used the LOLA (Locus Overlap) tool74 coupled with determined by the number of cells in each sample64. After immunoprecipitation, data obtained from the human Cistrome project75 to systematically test enrichment DNA was eluted from the beads by incubating at 65 °C for 4 h with 5% Proteinase of transcription factor (TF) binding events within hQTL-containing consensus K. DNA was purified by Zymo DNA Clean & Concentrator Column and quantified peaks versus all consensus peaks. We used annotations from the Roadmap by Qubit High Sensitivity Assay Kit. A minimum of 2 ng DNA was required for Epigenomics Project release 93 to assign GM12878 chromatin states to consensus library construction. Biotin-labeled DNA fragments were further immunoprecipi- peak regions. tated by Streptavidin M-280 Dynabeads. HiChIP libraries were generated on the To evaluate hQTLs for enrichment in catalogs of risk haplotypes, we performed streptavidin beads using the Nextera DNA Library Prep Kit. The PCR products were SNP-based enrichment analyses as follows. First, hQTLs were binned into categories purified by two-sided size selection using the Ampure XP beads to capture DNA based on minor allele frequency (MAF) and distance to the nearest 5′ transcription fragments between 300 and 700 bp. The integrity and quality of the libraries were start site (TSS). Each category represented hQTLs from a given decile of assessed using an Agilent TapeStation 2200 bioanalyzer. HiChIP libraries were then measurements of MAF and TSS distance, for a total of 100 categories. Next, for each pooled together and quantified using a KAPA Biosystem library quantification category, we matched variant sets to hQTLs by randomly sampling an equal number qPCR kit. qPCRs were performed on a Roche LightCycler 480 system using primers of variants from those which both (a) occurred in any consensus peak region and (b) specific to the HiSeq flowcell. Using the template size determined from the Qubit, matched the criteria for MAF and TSS distance of that category. Across all categories, the pools were diluted and requantified using qPCR to double-check the starting a total number of 337,242 variants formed the pool of matchable SNPs. Risk concentrations. The pooled samples were then denatured for 5 min with NaOH. haplotypes were established around catalog index SNPs using the PLINK-defined LD Libraries were sequenced on the Illumina NextSeq 500 sequencer on a 150-cycle blocks obtained from the 358 European individuals in the 1000 Genomes phase 3 paired-end Mid flowcell. HiChIP raw reads (fastq files) were aligned to the hg19 dataset. The total number of matchable SNPs in risk catalogs were determined to be

Fig. 6 hQTLs alter eQTL gene expression within chromatin networks. Quadrant plots for statistically significant interactions with genes within the HLA region (a) and outside the HLA region (b). Each statistically significant (FDR < 0.05%) distal interaction is represented by a point oriented by the value of coefficients in the generalized linear model for the eQTL main effect (x-axis) and the interaction effect (y-axis). The sign and magnitude of the coefficients determine the directionality and strength of the two effects on the eQTL target gene’s expression. The total number of interactions within each quadrant are labeled. c Example of ICOSLG triplet interaction with eQTL rs56124762 and hQTL rs8134436. d Example of BLK triplet interaction with eQTL rs10098664 and hQTLs rs13256690. For both c and d, 3D topology map with H3K27ac looping data is represented by red loops. H3K27ac ChIP peaks are in blue and anchored to looping data. hQTLs (black dots), eQTLs (orange dots), and gene locations are presented. Scatterplots of the eQTL genotypes alone (left panel) and stratified by the hQTL genotypes (right panel) are given

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9

8278 (autoimmune risk haplotypes), 2036 (SLE risk haplotypes) and 50,766 (NHGRI transcript’s transcription start site (TSS) to the transcription end site (TES)); GWAS catalog risk haplotypes). We estimated a distribution of SNP enrichment in “Joint” if the hQTL haplotype and eQTL haplotype were connected by a loop; risk haplotypes under the null hypothesis by performing 1000 repetitions of the “Disjoint” otherwise (Supplementary Figure 9). variant matching process and calculating the fold-enrichment of matched variants in risk haplotypes by a one-sided Fisher’s exact test. We then computed fold-enrichment Data availability. The ChIP-seq, HiChIP-seq, and RNA-seq data that support the of the true set of hQTLs in risk haplotypes by Fisher’s exact test, comparing to the null findings of this study have been deposited into NCBI’s Gene Expression Omni- distribution to obtain a p-value. bus79 and are accessible through GEO Series accession number GSE116193. To maximize our power for analyses using eQTL data, we obtained the raw RNA-sequencing reads from the gEUVADIS Caucasian dataset14 and performed an eQTL analysis. Our analysis only identified 2403 (74%) of the 3259 eQTLs Received: 3 January 2018 Accepted: 2 July 2018 originally reported by gEUVADIS, which resulted from the following differences between the two analyses: (1) while we attempted to use the same analysis tools used by gEUVADIS, RNA-sequencing alignment tools have rapidly evolved and improved since the original gEUVADIS paper that utilized GEM. We, therefore, decided to use Kallisto15 and tximport16 for the alignment of RNA-sequencing reads. (2) The original study included 373 samples from Phase 1 of the 1000 Genomes Project in the analysis; due to sample dropout in the Phase 3 genotyping References data, publicly available data included only 358 samples. 1. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait – eQTL analysis within our own sample was performed with the Matrix eQTL associations. Nucleic Acids Res. 42, D1001 D1006 (2014). software package (v.2.1.1) in R (v.3.2.2)76,77. Expression quantifications were 2. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human – standardized to the normal distribution to suit the assumptions of the Matrix eQTL genome. Nature 489,57 74 (2012). linear model. We considered cis-eQTL interactions with less than 1 Mbp separating 3. Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference human the SNP and the interacting transcript. Testable eQTL interactions were required to epigenomes. Nature 518, 317–330 (2015). have variants with minor allele frequency ≥ 0.05 and transcripts with coefficient of 4. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine variation ≥ 0.15 across gEUVADIS RNA-seq samples. We retained 214,702 eQTLs human cell types. Nature 473,43–49 (2011). at FDR ≤ 5%. For each gene with eQTL interactions, proxy SNPs were iteratively 5. Maurano, M. T. et al. Systematic localization of common disease-associated pruned by sorting the cis-interacting SNPs by nominal p-value and calculating r2 variation in regulatory DNA. Science 337, 1190–1195 (2012). among them; SNPs with r2 < 0.8 were retained. 6. Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune To determine if the genotype at the local hQTL significantly contributed to disease variants. Nature 518, 337–343 (2015). expression variation of the eQTL target gene, we calculated a negative binomial 7. van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele- generalized linear model (R function glm.nb, package MASS) with logarithmic link specific software for robust molecular quantitative trait locus discovery. Nat. function to evaluate the variance explained between the lead eQTL and other non- Methods 12, 1061–1063 (2015). proxy variants (r2 < 0.6) within each haplotype block. If a haplotype contained 8. McVicker, G. et al. Identification of genetic variants that affect histone multiple eQTLs or hQTLs, the lead eQTL was identified as the variant with modifications in human cells. Science 342, 747–749 (2013). maximum variance explained (D2 statistic, generalized linear model) and the lead 9. Kasowski, M. et al. Extensive variation in chromatin states across humans. hQTL was defined as having the maximum nominal CHT p-value among identified Science 342, 750–752 (2013). hQTLs. The contribution of expression variation was estimated as: 10. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 2 ¼ ðÞÀ = ; D Null deviance Residual deviance Null deviance (2009). 11. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the . Nat. Genet. where the null deviance was calculated with a model without genotypic 39, 311–318 (2007). information, and residual deviance incorporates variant allele dosage (homozygote 12. Genomes Project, C et al. An integrated map of genetic variation from 1092 reference = 0, heterozygote = 1, homozygote alternate = 2). For the local haplotype human genomes. Nature 491,56–65 (2012). block analysis, we ranked eQTLs by D2 by modeling an eQTL’s gene expression as 13. Kilpinen, H. et al. Coordinated effects of sequence variation on DNA a function of the eQTL variant’s allele dosage alone. An empirical null distribution binding, chromatin structure, and transcription. Science 342, 744–747 of D2 was calculated for each haplotype from 10,000 permutations of hQTL (2013). positions to establish a 95% confidence interval of the p-value estimate. For both 14. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers the local and distal analyses, joint effect models were fit with additive main effect functional variation in humans. Nature 501, 506–511 (2013). terms for both eQTL and hQTL, and a multiplicative interaction term. 15. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic To rank hQTLs by functional annotation tools, we retrieved information from RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016). RegulomeDB17 for hQTL variants and those in strong LD (D′ ≥ 0.8) with them on 16. Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: 44 AI disease risk haplotypes that contained an hQTL. For simplicity, we collapsed transcript-level estimates improve gene-level inferences. F1000Res. 4, 1521 RegulomeDB scores across subcategories into a single score (i.e., 1a-1f were all (2015). scored as 1), such that RegulomeDB collapsed scores range from 8 (no 17. Boyle, A. P. et al. Annotation of functional variation in personal genomes RegulomeDB record) to 1 (highest functional class). using RegulomeDB. Genome Res. 22, 1790–1797 (2012). fi To perform haplotype-speci c alignment of the sequencing data, we obtained 18. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to and the COX (HLA-DR3) and PGF (HLA-DR15) MHC haplotype sequences from the drug discovery. Nature 506, 376–381 (2014). 32 MHC Haplotype Project . A custom genome was created by replacing the PGF 19. Wellcome Trust Case Control, C. et al. Association scan of 14,500 sequence from the hg19 reference genome at chr6:28477798–33351543 with the fi 78 nonsynonymous SNPs in four diseases identi es autoimmunity variants. Nat. COX MHC sequence. BLAT was used to locate the PGF sequence endpoints. Genet. 39, 1329–1337 (2007). ChIP and HiChIP data for HLA-DR3 individuals were then reprocessed with the 20. Jostins, L. et al. Host-microbe interactions have shaped the genetic custom COX genome. architecture of inflammatory bowel disease. Nature 491, 119–124 For the distal eQTL-hQTL interaction modeling, we required the eQTL and (2012). hQTL SNPs be independent of one another by LD (r2 < 0.6; D′ < 0.6), and the 21. Cooper, J. D. et al. Meta-analysis of genome-wide association study data hQTL to be both within an H3K27ac loop anchor and the chromatin network of identifies additional type 1 diabetes risk loci. Nat. Genet. 40, 1399–1401 the eQTL target gene as defined by our 3D chromatin topography data. We (2008). modeled interaction between hQTLs and LD-independent eQTLs at an eQTL target gene as described above. Statistical significance of these models was 22. Bentham, J. et al. Genetic association analyses implicate aberrant regulation of fi innate and adaptive immunity genes in the pathogenesis of systemic lupus calculated using a two-tailed z-test of the interaction coef cient. Statistical – significance was controlled with the FDR. Proxies (determined by rAggr (raggr.usc. erythematosus. Nat. Genet. 47, 1457 1464 (2015). fi fi edu; r2 > 0.8) using the European 1000G, Phase 3, Oct 2014; hg19 reference 23. Liu, J. Z. et al. Dense ne-mapping study identi es new susceptibility loci for – genome) were removed from each set of eQTLs and hQTLs prior to analysis. The primary biliary cirrhosis. Nat. Genet. 44, 1137 1141 (2012). sets of hQTL and eQTL variants tested for distal interactions were required to be 24. Harley, J. B. et al. Genome-wide association scan in women with systemic fi located on separate haplotype blocks as determined by PLINK261. Only lupus erythematosus identi es susceptibility variants in ITGAM, PXK, interactions that had representatives for all nine eQTL x hQTL genotype KIAA1542 and other loci. Nat. Genet. 40, 204–210 (2008). combinations were considered. Distal interactions were further classified as follows: 25. Pickrell, J. K. et al. Detection and interpretation of shared genetic influences “Anchored” if both hQTL and eQTL were in the same loop anchor; “Unanchored” on 42 human traits. Nat. Genet. 48, 709–717 (2016). if the hQTL was not in a loop anchor; “Looped” if hQTL was in a loop anchor 26. Allanore, Y. et al. Genome-wide scan identifies TNIP1, PSORS1C1, and without the eQTL; “Off-target” if the loop containing the hQTL did not loop to any RHOB as novel risk loci for systemic sclerosis. PLoS Genet. 7, e1002091 part of the eQTL target transcript’s range (defined as 10Kbp upstream of the (2011).

12 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9 ARTICLE

27. International Multiple Sclerosis Genetics, C. et al. Analysis of immune-related 56. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. Methods 9, 357–359 (2012). 45, 1353–1360 (2013). 57. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows- 28. Raj, P. et al. Regulatory polymorphisms modulate the expression of HLA Wheeler transform. Bioinformatics 25, 1754–1760 (2009). class II molecules and promote autoimmunity. eLife 5, e12089 (2016). 58. International HapMap, C. et al. Integrating common and rare genetic 29. Kachru, R. B., Sequeira, W., Mittal, K. K., Siegel, M. E. & Telischi, M. A variation in diverse human populations. Nature 467,52–58 (2010). significant increase of HLA-DR3 and DR2 in systemic lupus erythematosus 59. Delaneau, O., & Marchini, J., Genomes Project, C. & Genomes Project, C. among blacks. J. Rheumatol. 11, 471–474 (1984). Integrating sequence and array data to create an improved 1000 Genomes 30. Graham, R. R. et al. Specific combinations of HLA-DR2 and DR3 class II Project haplotype reference panel. Nat. Commun. 5, 3934 (2014). haplotypes contribute graded risk for disease susceptibility and autoantibodies 60. Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast in human SLE. Eur. J. Hum. Genet. 15, 823–830 (2007). and accurate genotype imputation in genome-wide association studies 31. Niu, Z., Zhang, P. & Tong, Y. Value of HLA-DR genotype in systemic lupus through pre-phasing. Nat. Genet. 44, 955–959 (2012). erythematosus and lupus nephritis: a meta-analysis. Int J. Rheum. Dis. 18, 61. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger 17–28 (2015). and richer datasets. Gigascience 4, 7 (2015). 32. Horton, R. et al. Variation analysis and gene annotation of eight MHC 62. Parkinson, H. et al. ArrayExpress—a public database of microarray experiments haplotypes: the MHC Haplotype Project. Immunogenetics 60,1–18 and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007). (2008). 63. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic 33. Dowen, J. M. et al. Control of cell identity genes occurs in insulated estimation of expression residuals (PEER) to obtain increased power and neighborhoods in mammalian chromosomes. Cell 159, 374–387 (2014). interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012). 34. Ji, X. et al. 3D chromosome regulatory landscape of human pluripotent cells. 64. Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein- Cell Stem Cell 18, 262–275 (2016). directed genome architecture. Nat. Methods 13, 919–922 (2016). 35. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions 65. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data reveals folding principles of the human genome. Science 326, 289–293 processing. Genome Biol. 16, 259 (2015). (2009). 66. Lareau, C. A. & Aryee, M. J. hichipper: a preprocessing pipeline for calling 36. Dixon, J. R. et al. Topological domains in mammalian genomes identified by DNA loops from HiChIP data. Nat. Methods 15, 155–156 (2018). analysis of chromatin interactions. Nature 485, 376–380 (2012). 67. Lareau, C. A. & Aryee, M. J. diffloop: a computational framework for 37. Hasler, J. & Strub, K. Alu elements as regulators of gene expression. Nucleic identifying and analyzing differential DNA loops from sequencing data. Acids Res. 34, 5491–5497 (2006). Bioinformatics 34, 672–674 (2018). 38. Salih, F., Salih, B., Kogan, S. & Trifonov, E. N. Epigenetic nucleosomes: Alu 68. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, sequences and CG as nucleosome positioning element. J. Biomol. Struct. Dyn. R137 (2008). 26,9–16 (2008). 69. Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of 39. Barrett, J. C. et al. Genome-wide association defines more than 30 high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011). distinct susceptibility loci for Crohn’s disease. Nat. Genet. 40, 955–962 70. Yang, Y. et al. Leveraging biological replicates to improve analysis in ChIP-seq (2008). experiments. Comput. Struct. Biotechnol. J. 9, e201401002 (2014). 40. Anderson, C. A. et al. Meta-analysis identifies 29 additional ulcerative colitis 71. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 6,65–70 (1979). 43, 246–252 (2011). 72. Fu, G., Saunders, G. & Stevens, J. Holm multiple correction for large-scale 41. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of gene-shape association mapping. BMC Genet. 15 Suppl 1, S5 (2014). confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125 73. Heinz, S. et al. Simple combinations of lineage-determining transcription (2010). factors prime cis-regulatory elements required for macrophage and B cell 42. Gregersen, P. K. et al. REL, encoding a member of the NF-kappaB family of identities. Mol. Cell 38, 576–589 (2010). transcription factors, is a newly defined risk locus for rheumatoid arthritis. 74. Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets Nat. Genet. 41, 820–823 (2009). and regulatory elements in R and Bioconductor. Bioinformatics 32, (587–589 43. Freudenberg, J. et al. Genome-wide association study of rheumatoid arthritis (2016). in Koreans: population-specific loci as well as overlap with European 75. Mei, S. et al. Cistrome Data Browser: a data portal for ChIP-Seq and susceptibility loci. Arthritis Rheum. 63, 884–893 (2011). chromatin accessibility data in human and mouse. Nucleic Acids Res. 45, 44. Morris, D. L. et al. Genome-wide association meta-analysis in Chinese and D658–D662 (2017). European individuals identifies ten new loci associated with systemic lupus 76. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix erythematosus. Nat. Genet. 48, 940–946 (2016). operations. Bioinformatics 28, 1353–1358 (2012). 45. Hom, G. et al. Association of systemic lupus erythematosus with C8orf13-BLK 77. Team, R. C. R. R Foundation for Statistical Computing (Vienna, Austria, and ITGAM-ITGAX. N. Engl. J. Med. 358, 900–909 (2008). 2015). 46. Han, J. W. et al. Genome-wide association study in a Chinese Han population 78. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 identifies nine new susceptibility loci for systemic lupus erythematosus. Nat. (2002). Genet. 41, 1234–1237 (2009). 79. Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI 47. Lee, Y. C. et al. Two new susceptibility loci for Kawasaki disease identified gene expression and hybridization array data repository. Nucleic Acids Res. 30, through genome-wide association analysis. Nat. Genet. 44, 522–525 (2012). 207–210 (2002). 48. Onouchi, Y. et al. A genome-wide association study identifies three new risk loci for Kawasaki disease. Nat. Genet. 44, 517–521 (2012). 49. Fairfax, B. P. et al. Genetics of gene expression in primary immune cells Acknowledgements fi fi identi es cell type-speci c master regulators and roles of HLA alleles. Nat. We want to thank the SLE patients whose sample contributions were essential for these – Genet. 44, 502 510 (2012). studies. We also thank Bryce van de Geijn and Graham McVicker for their time and 50. Corradin, O. et al. Modeling disease risk through analysis of physical excellent support with the WASP and CHT programs. Research reported in this pub- interactions between genetic variants within chromatin regulatory circuitry. lication was supported by the National Institute of Allergy and Infectious Diseases: – Nat. Genet. 48, 1313 1320 (2016). U19AI082714; National Institute of Arthritis and Musculoskeletal and Skin Diseases: 51. Hernando, H. et al. The B cell transcription program mediates P30AR053483, R01AR056360, R01AR063124; and the National Institute of General hypomethylation and overexpression of key genes in Epstein-Barr virus- Medical Sciences: P30GM110766 and U54GM104938. The content of this publication is associated proliferative conversion. Genome Biol. 14, R3 (2013). solely the responsibility of the authors and does not necessarily represent the official 52. Caliskan, M., Cusanovich, D. A., Ober, C. & Gilad, Y. The effects of EBV views of the National Institutes of Health. transformation on gene expression levels and methylation profiles. Hum. Mol. Genet. 20, 1643–1652 (2011). 53. Hernando, H. et al. Epstein-Barr virus-mediated transformation of B cells Author contributions induces global chromatin changes independent to the acquisition of P.M.G. supervised the project. R.C.P., J.A.K., and P.M.G. generated the main idea of the – proliferation. Nucleic Acids Res. 42, 249 263 (2014). work and developed the study design. R.C.P., J.A.K., C.A.L., S.B.G., and P.M.G. per- 54. Rasmussen, A. et al. The lupus family registry and repository. Rheumatology formed the analysis and made interpretations of the data. Y.F., and G.B.W. made con- – 50,47 59 (2011). tributions to the acquisition of the data. J.B.H., J.M.G., and J.A.J. provided the LCLs for fl 55. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a exible trimmer for the project. M.M.W. maintained the LCLs. M.J.A., C.M., and P.M.G. supervised the data – Illumina sequence data. Bioinformatics 30, 2114 2120 (2014). analysis. R.C.P., J.A.K., and P.M.G. wrote the manuscript from first draft to completion.

NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-05328-9

fi Y.F. and K.L.T. made comments, suggested appropriate modi cations, and corrections. Open Access This article is licensed under a Creative Commons fi All authors read and approved the nal manuscript. Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give Additional information appropriate credit to the original author(s) and the source, provide a link to the Creative Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- Commons license, and indicate if changes were made. The images or other third party ’ 018-05328-9. material in this article are included in the article s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the Competing interests: The authors declare no competing interests. article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from Reprints and permission information is available online at http://npg.nature.com/ the copyright holder. To view a copy of this license, visit http://creativecommons.org/ reprintsandpermissions/ licenses/by/4.0/.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. © The Author(s) 2018

14 NATURE COMMUNICATIONS | (2018) 9:2905 | DOI: 10.1038/s41467-018-05328-9 | www.nature.com/naturecommunications