Mapping pathogenic regulatory regions and

Chris Cotsapas Yale/Broad Mapping pathogenic regulatory regions

Chris Cotsapas Yale/Broad Mapping pathogenic regulatory regions

Chris Cotsapas Yale/Broad cotsapaslab.info/positions

http://biorxiv.org/content/early/2016/05/19/054361

Common risk variants localize to DHS

Maurano et. al, Science 2012

Gusev et. al, AJHG 2014

Trynka et al, AJHG 2015 over loci shown for 9 AID).

and adjusted forIdentifying the correlation specific structure regulatory in the elements expression driving data (see risk supplementary to the disease, and the materials forgenes more that details). they control: We called any DHS cluster harboring a CI SNP (⇢d > 0) a We convertedpathogenic P values DHS of correlationcluster. For to chi-squareeach lead test SNP, statistics we defined as an a intermediate 2 Mbp region step centered on it, required toand measure identified posterior all pathogenic of association DHS transmitted clusters and from genes a pathogenic overlapping DHS the cluster region to using intersect a gene. Asfunction shown in of Supplementary BEDTools [45]. Figure We reasoned 7 34, P values that if of a correlationDHS cluster follow controls a uniform expression of a gene, distribution;its therefore, presence/absence this conversion should is be valid. correlated We showed to the the expression proportion of the of associationtarget gene over matched posterior transmittedsamples. We from therefore, DHS cluster measuredd to P gene valueg ofby correlationd,g, and computed between all it pairs using of the pathogenic DHS following formula.clusters and genes in the using paired expression and DHS data over 22 cell types from 2 2 /2 d,g /2 Roadmap Epigenome Projectd,g = e d,g (see/ supplementarye i materials), and asked whether expression g of any gene in the locus changed conditionalXi on presence of a pathogenic DHS cluster. 2 We converted P values of correlation to chi-square test statistics as an intermediate step where d,gi is correlation chi-squared test statistics between DHS cluster d and gene gi. In the above formula,required genes to measure with no posteriorcorrelations of to association DHS cluster transmittedd get non-zero from low a pathogenicd,g values; DHS cluster however, into order a gene. to compute As shownd,g inof theSupplementary correlated genes Figure with 4, higher P values accuracy, of correlationd,g of non- follow a uniform computedcorrelated posterior genesdistribution; probability should be of therefore, associationset zero. this To of decideeach conversion SNP on in a set reasonable isS valid.as cut-oWe showed↵ threshold the forproportiond,g,we of association

considered posteriord,g of all pairstransmitted of pathogenic from2 DHS DHS2 cluster cluster andd to genes Gene ing allby loci,d,g and, and sorted computed them it using the s/2 i /2 in a decreasingcomputedfollowing posterior order formula. (See probability supplementaryPPs = ofe association/ figuree of 8, each in SNP which in theset S blueas vertical line separates i S 2 2 /2 X2 d,g/2 d,g 25% top d,g values from the rest). As shown2 in this= e figure,2 / aftere thisi cut-o↵ value, the d,g PosteriorRegulatory Probability of Association fine-mapping models/2 d,g /2 i 0.0 0.1 0.2 0.3 0.42 PPs = e / e wherecurvei is is the flat. association This cut-o chi-square↵ value test is equivalent statistics of to SNP thei gene-DHScalculatedg cluster byi converting correlation as- P value of i S sociation P value from Immunochip data into chi-square test2 statistics. WeX sorted SNPs by 6 ● 0.25. We therefore assigned2 0 to d,g if Pd,g > 0.25.X If a DHS cluster had zero d,g values for Posterior(Probability( where d,gi is correlation chi-squared test statistics between DHS cluster d and Gene gi. Posterior'Probability'of'their posterior probability2 values (i.e. PPs)andselectedthen top SNPs with the highest as- ●● ● all geneswhere in theis region, the association it was considered chi-square test to be statistics not correlated of SNP i tocalculated any gene by in converting the region as- and ●● ● Associa/on((PPAssocia%on'(PP )'s)( i ● sociations posteriorsThe such genes that with lowPP correlationss 0.99 and to a DHSPP clusters < 0.99. (equivalent The selected to low d,g)introducenoise ● ● ● sociation P value froms Immunochip=1..n data into chi-squares=1..(n 1) test statistics. We sorted SNPs by ●●●●●● it was removed from downstream analysis. For each remaining DHS cluster d in the locus, ●● ●● ● 4 Posterior Probability●● of● ●Association● ●● ● ●● ● ● n top SNPs formedto theour 99% calculations; credible interval therefore, set for their the risk corresponding locus of interest.35d,g need to be zeroed out to avoid this ● 0" 0.1" 0.2" 0.3" 0.4" their posterior probability values (i.e. PPs)andselectedthen top SNPs with the highest as- (P) ● ● 0.0 0.1 0.2we0.3 0.4 identified all genes with Pd,g 0.25, and put them in set Gd. ● ● ● P P 10 ● ● ●● ●●● ●● ● ● e↵ect. However, we first needed to decide on a reasonable cut-o↵ threshold for d,g. In log ●● ● sociation posteriors such that s=1..n PPs 0.99 and s=1..(n 1) PPs < 0.99. The selected − ●●● ●● ● We updated our formula for calculating d,g as follows. 6 ● ● ●●●● Estimating regulatory potential of disease loci: In order to identify pathogenic 2 ●●●●●● ●● n top SNPsSupplementary formed our 99% Figure credible 5, we interval plotted set sorted for the riskd,g values locus of (before interest. zero-ing out). The blue vertical ● ● ● ● ● ● ●● ● ●●● ● ●● ● ●● ● ● P 2 P ●● ●● ● ●● regulatory elements, we overlapped credible2 interval (CI) SNPs with our DHS clusters using ● ●● ●● ●●●● ●● ● /2 /2 ●● ●●●●● ● ● ● ● ● line separates 25% topd,gd,g values fromd,g the rest. As shown in this figure, after this cut-o↵ ●●●● ● ● i ● ● ● ●●●●●●●●●●●●● ● e / e if g Gd ● ●●●●●●●●●●●● ● ●●●●●●●●●● ● ● the● intersect● function of BedTools software suite [49].gi WeGd computed posterior probability of ●● ● ●●●●● ●●●●●●●●●●●●● ● Summary%P%values%from%gene>cs% 4 ● ●● ●●●●●●●● ●●● ●●●●●●● ● Estimating regulatory = potential of2 disease loci: In order to identify pathogenic ● ● ● ●●●●●●●●●●●●●●●●●●●●●● ● ●●● ●● d,g 2 0 ●●●●●●● ● ●●●●● ●● ● ● ●●●●● ●● ● ● ●● ● value, the d,g curve remains almost flat with small values. This cut-o↵ value is equivalent

(P) ● ● ● ● ● association attributableassocia>on%data% to each DHS cluster0otherwised as the sum of per-SNP posteriors over all CI 10 ● ● ( ●● ●●● regulatory elements, we overlapped credible interval (CI) SNPs with our DHS clusters using ●● ● ● P log 90.50 90.75 ●● 91.00● 91.25 91.50 to the gene-DHS cluster correlation P value of 0.25. Given this, we assigned 0 to d,g if − ●●● ●● SNPs overlapping the DHS. That is, Posi%on'on''(Position● on ChromosomeMbp (Mbp))' ● ●●●● the intersect function of BedTools software suite [49]. We computed posterior probability of 2 ●●●●● ●● ● ● ● ● We computed the total posterior of association transmitted from DHS cluster d to gene ●● ●● Pd,g > 0.25. If a DHS cluster had zero d,g values for all genes in the region, it was considered ● ● ● ● ● ● ●●● ● ● association attributable to each DHS cluster d as the sum of per-SNP posteriors over all CI ●● ● ●●●● ● ● ● ●●●●● ● ● SNPs% ● ●●●●●● ●● ● ● g (shownPer3SNP%posterior%probability% as d,g) by multiplying⇢ = itsPP posteriorO (s) of association (⇢d)andtheproportionof ● ● ●●●●●●●●●●● ● to be not correlatedd to anys gened in the region and it was removed from downstream analysis. ● ●●●●●●●●●●●● ● ●●● ● ● ● ● ●● ● ●●●●●● ●● ● ●●●●●● ● SNPs overlapping the DHS. That is, ● ●● ●● ●●●●●● ● ●●●●●● ● ⇥ ● ● ● ● ●●●●●● ●●● ●●●●●●● ● ●●● ● ●●●●●● ● ●●●●●●●●●● ● ●●●●●●●● ● ● ● ●transmitted association posteriors CI ( ). That is, 0 ● ● ● ● ● ● ● ●● For each remaining DHSX2 d,g cluster d in the locus, we identified all genes with Pd,g 0.25, and 90.50 90.75 91.00 91.25 91.50  PositionDHS1% on Chromosome (Mbp) DHS2% Per3DHS%posterior%probability%put them in set G . We⇢ updatedd = PP ours formulaOd(s) for calculating as follows. DHS%Clusters% where PPs is the posterior probabilityd of association for SNP⇥ s. Od(s) is equal to one if SNPd,g d,gs =CI ⇢d d,g s is located on DHS cluster d or the 100 bp flankingX2 region⇥ each side of DHS cluster d,andit 2 /2 2 /2 is zero otherwise. If one SNP overlapped two or more DHSe d,g clusters/ or theire d,g 100i bpif flankingg G By summingwhere PPs isover the posterior all pathogenic probability DHS of clusters, association we for computed SNPgi Gsd. O overalld(s) is equal posterior to oned probability if SNP Normalized%correla>on%d,g = 2 2 regions, thens is we located split on its DHS posterior cluster probabilityd or the 100d,gPPs bpbetween flanking those region DHS each clusters side of equally. DHS cluster d,andit of associationcoefficient% attributable to each gene (g)intheregion.Thatis,(0otherwise We measuredis zero otherwise. regulatory If potential one SNP of overlapped each disease two risk or more locus DHSP by summing clusters per-DHSor their 100 pos- bp flanking teriors over all DHSsWe in the computed locus. That the is, total posterior of association transmitted from DHS cluster d to Gene regions, then we split its posterior probability PPs between those DHS clusters equally. Genes% Gene1% Gene2% Gene3% Per3gene%posterior%probability% g = ⇢d d,g Weg measured(shown regulatory as d,g)bymultiplyingitsposteriorofassociation( potential of each disease⇥ risk locus by summing per-DHS⇢d)andtheproportionof pos- ⇢ = ⇢d d D teriorstransmitted over all DHSs association in the locus. posterior That is,X2 (d,g). That is, d D X2 where D is the set of all pathogenic DHS clusters in the region. We used g to prioritize where D is the set of all DHS clusters in the region,⇢ and= ⇢ represents⇢d d,g = ⇢ overalld d,g posterior prob- genes in the disease risk locus. The higher g, the higher per-gene⇥ posterior of association, d D ability of association attributable to all DHS clusters in2 the region. We used ⇢ to rank risk By summing d,g over all pathogenicX DHS clusters, we computed overall posterior probability loci and towhere identifyD is candidate the set of loci all DHS for regulatory clusters in e↵ theects region, (See Figure and ⇢ represents X for distribution overall posterior of ⇢ prob- 34 of association attributable to each gene (g)intheregion.Thatis, over lociDONE: shownability forComment of 9association AID). for Chris: attributable Please take to all a look DHS at clusters the figure. in the My region. question We is that,used ⇢ shouldto rank we riskgenerate the same figure for all traits? loci and to identify candidate loci for regulatory e↵ects= (See⇢ Figure X for distribution of ⇢ Identifying35Comment specific to Parisa: regulatory Edit this paragraph elements again. driving riskg to thed disease,⇥ d,g and the over loci shown for 9 AID). d D genes that they control: We adopted the following approach toX2 identifying specific regu- latory elementsIdentifying driving risk specific to the disease, regulatory and the elements genes that driving they control. risk to We the called disease, any and the 8 7 DHS clustergenes harboring that they a CI control: SNP (⇢d We> 0) adopted a pathogenic the following DHS cluster. approach For to each identifying lead SNP, specific regu- we identifiedlatory all pathogenic elements driving DHS clusters risk to the and disease, genes overlapping and the genes the2 that Mbp they region control. around We it called any using the intersect function of BEDTools [49]. DHS cluster harboring a CI SNP (⇢d > 0) a pathogenic DHS cluster. For each lead SNP, We thenwe reasoned identified that all pathogenic if a DHS cluster DHS controlsclusters and expression genes overlapping of a gene, its the presence 2 Mbp should region around it be correlatedusing to the theintersect increasedfunction expression of BEDTools of the target [49]. gene. We therefore measured the P value of correlationWe then reasoned between thatall pairs if a DHS of pathogenic cluster controls DHS clusters expression and of genes a gene, in the its presence locus should using pairedbe expression correlated and to the DHS increased data over expression 22 cell types of the from target Roadmap gene. Epigenome We therefore Project. measured the In order toP value compute of correlationP value of between correlation, all pairs we usedof pathogenic two-sided DHS Wilcoxon clusters rank and sum genes test in the locus using paired expression and DHS data over 22 cell types from Roadmap Epigenome Project. In order to compute P value of correlation,7 we used two-sided Wilcoxon rank sum test

7 Aligning DHSsProportion Over of Heritability Cell Types Proportion of Heritability Thymus Thymus CD14 CD14 56 x 2 DHS REP tissuesCD19 CD19 CD3 CD3 CD4 CD4 CD56 CD56 CD8 CD8 IMR90 Hotspot - Peak CallingIMR90 Mobilized_CD34 Mobilized_CD34 Mobilized_CD3 Mobilized_CD4 Mobilized_CD3 Lung Mobilized_CD4 Lung_Left Lung Lung_Right Lung_Left Muscle_Arm DHS peaks for 112Lung_Right samples Muscle_Back Muscle_Leg Muscle_Arm Muscle_Trunk Muscle_Back Breast Muscle_Leg Adrenal_Gland Muscle_Trunk Brain Markov Clustering Heart Breast Intestine_Large Adrenal_Gland Intestine_Small 1,079,138/1,994,675 clustersBrain Kidney Heart Kidney_Left Status Kidney_Right (~54%) passIntestine_Large QC DHS_Peaks_Hotspot 1,994,675 DHS clusters Placenta Intestine_Small Renal_Cortex DHS_Clusters Kidney Renal_Cortex_Left Cell Types Kidney_Left Renal_Cortex_Right Status Kidney_Right Renal_Pelvis 8% of genome (cf. 14% all peaks) Renal_Pelvis_Left DHS_Peaks_Hotspot Replication TestPlacenta Renal_Pelvis_Right Renal_Cortex Spinal_Cord DHS_Clusters Renal_Cortex_Left Stomach Testes Cell Types Renal_Cortex_Right Renal_Pelvis Skin_Abdomen Skin_Back Renal_Pelvis_Left Skin_Biceps_Left Renal_Pelvis_Right Skin_Biceps_Right Spinal_Cord Skin_Quadriceps_Left Stomach Skin_Quadriceps_Right Skin_Scalp Testes Skin_Upper_Back Skin_Abdomen Gastric Skin_Back H1 Skin_Biceps_Left H1_BMP4_Mesendoderm Skin_Biceps_Right H1_BMP4_Trophoblast H1_Mesenchymal Skin_Quadriceps_Left H1_Neuronal_Progenitor Skin_Quadriceps_Right H9 Skin_Scalp Pancreas Skin_Upper_Back Foreskin_Fibroblast Gastric Foreskin_Keratinocyte 0.0 0.2 0.4 H1 H1_BMP4_Mesendoderm H1_BMP4_Trophoblast Proportion of Heritability H1_Mesenchymal H1_Neuronal_Progenitor H9 Pancreas Foreskin_Fibroblast Foreskin_Keratinocyte 0.0 0.2 0.4

Proportion of Heritability Posterior probabilities of association Posterior Probability of Association 0.0 0.1 0.2 0.3 0.4 Association Plot for rs72928038 ● 6 ● Posterior Probability of Association ●● ● 0.0 0.1 0.2 0.3 0.4 ●● ● ● ● ● ● ● ● 6 ●●●●● ●● 4 Posterior Probability●●●● of●●● ●●Association 4 ● ●● ●●● ● ● ● ●● 0.0 0.1● 0.2 0.3●0.4 ● ● (P) ● (P) ● ● ● ● ● ● 10 ● 10 ● ●●● ●● ● ● ● ● ● ●●●●●● ●● ● ● ●● ●●●●● ● 6 ● 4 ●● ●● ● − log − log ●●● ●● ● ● ●● ● ● (P) ● ● ●●●●● ● ● ● ●●●●10 ● ● 2 ●●● ●● ● ●● ●●● ● ● ●● ● ● ● ●● ● ● ●● ●●● ● ●● ● ● ● − log ● ●●● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●●●● ●● ●●2●●●●● ● ●●●●●●● ● ●●●●● ●● 4 ● ● ● ●● ● ●●●●●●● ● ● ● ● ● ● ●● ●●●●●● ●●● ● ●● ●● ●● ● ●●●●●● ●●● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●● ● ● ●●● ● ● ● ●●●●●●●●●●●●● ●● ●●● ● ● ●●●● ●●●● ●● ● (P) ●● ● ● ●●● ●●● ●●●●●●● ●● ●●● ●● ● ● ●● ● ● ● ●●●● ● ●●● ● ●● ●●●●●● ● ●● ● ● 10 ● ● ●● ●●● ● ●● ●●●●●●● ● ●● ● ●●●● ● ● ● ● ●●●●●●● ●●●●●●●●●●●●● ● ● ●●● ● ●●●●●●●●●●●●● ● ● ●●●●●● ● ●●●●●●●●●● ● ●●●●●●● ● ●● ●● ●●●●●●●●●●●●●● ●● ●●●●● ● ● ● ● 0 ●●● ● ● ● ●● ● ● ●● ● ● ●●●● ● ●●●●●●●● ●●● ●●●●● ● ● ● ●●● ●●●●● ●● ●●●●● ● ● ● − log ●●● ●● ● ● ● ● ●●●●●●●●●● ●●●●●●●●●●● ● ●●● ● ● 0 ●●●●●●● ● ●● ● ● ● ● ●●● ● ● ● ●● ● ● ●●●● 2 ●●●●●● ●● 90.50 90.75 ● ● 91.00● ● ● 91.25 91.50 ● ●● 90.50● 90.75 91.00 91.25 91.50 ● ● ●● ● ● ● ● PositionPosition on Chromosome●● ● ● (Mbp) (Mbp) Position on Chromosome (Mbp) ● ●● ●● ●●●● ● ● ●● ●●●●● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●●●●●●●●●●● ● ● ●●●●●●●●●●●● ● ●● ●●● ● ● ● ● ●● ● ●●●●● ●● ●●●●●●● ● ● ●● ●● ●●●●● ● ●●●●●● ● ● ● ● ●●●●●●●● ●●●●●●●●●●●● ● ●●● ●● SNPs 0 ●●●●●●● ● ●●●●● ●● ● ● ●●●●● ●● ● ● ●● ● 90.50 90.75 91.00 91.25 91.50 Position on Chromosome (Mbp)

DHSs DHS1 DHS2

Gene Gene1 Gene2 Gene3 s Position on (Mbp)

39.47 39.97 40.47 40.97 41.47

30 Posterior 0.18

20

10 − log10(P) IBD GWAS

0 0

DSCR8 ERG AF064858.6 WRBSH3BGR PCP4

DSCR4 KCNJ15 ETS2 PSMG1 LCA5L B3GALT5

DHS 1 Correlation

DHS and Gene Gene DSCR4 DSCR8 KCNJ15 ERG ETS2 PSMG1 WRB SH3BGR B3GALT5 PCP4

12 Absent Present 10

8

6 by DHS State by 4 DHS1 DHS1 DHS1 DHS1 DHS1 DHS1 DHS1 DHS1 DHS1 DHS1

* 0 0.3 0.98 0.11 0.69 0.23 0.92 0.37 0.19 0.66 P Value of P Value Correlation Position on (Mbp)

116.08 116.58 117.08 117.58 118.08

24 Posterior 0.77

18

12 − log10(P) MS GWAS 6 0 0

CASQ2 SLC22A15 ATP1A1 IGSF3 PTGFRN TRIM45

VANGL1 NHLH2 MAB21L3 CD58 CD2 TTF2 MAN1A2

DHS 1 2 Correlation

DHS and Gene Gene VANGL1 CASQ2 SLC22A15 MAB21L3 ATP1A1 CD58 CD2 PTGFRN TTF2 MAN1A2

13 Absent Present 11

9

7 by DHS State by Gene Expression 5 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2 DHS1 DHS2

* 0.7 0.73 0.15 0.69 0.78 0.95 0.54 0.06 0.99 0.52 0.04 0.08 0.17 0.38 0.56 0.21 0.46 0.59 0.44 0.81 P Value of P Value Correlation Position on Chromosome 16 (Mbp)

85.02 85.52 86.02 86.52 87.02

6 Posterior 0.36 4

2 − log10(P) RA GWAS

0 0

KIAA0513 GINS2 COX4NB IRF8 FOXL1

ZDHHC7 LINC00311 C16orf74COX4I1 FOXC2

DHS 1 2 3 4 5

Gene ZDHHC7 KIAA0513 LINC00311 GINS2 C16orf74 COX4NB COX4I1 IRF8 FOXC2 FOXL1

12 DHS State Absent Present

10

8 by DHS State by Gene Expression 6 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5 DHS1 DHS2 DHS3 DHS5

* * * * * * 1 0 0.09 0.12 0.53 0.16 0.04 0.08 0.23 0.26 0.35 0.93 0.91 0.07 0.38 0.23 0.76 0.08 0.47 0.76 0.18 0.28 0.44 0.3 0.62 0.61 0.98 0.72 0.63 0.03 0.43 0.89 0.02 0.01 0.01 0.12 0.88 0.12 0.06 0.77 P Value A DHS Clusters B Genes

PBC ●

AITD ●

PSO ● ●

JIA ●

●●●● ●

T1D ●

●● ●

CEL ● ●

RA ●● ● ●

●● ●● ●●● ● ●

IBD ●● ● ●● ●●●

● ● ●

MS ● ●

1 2 3 4 5 10 20 50 200 1100 1 2 3 4 5 10 20 40 65 Number of DHS Clusters (log scale) Number of Genes (log scale)

C Regulatory Potential Per Locus D Posterior Probability of Association Per Gene

PBC ●

AITD ● ●

PSO ● ●

JIA ● ●

T1D ● ●● ● ● Traits CEL ●

RA ●

IBD ● ●●

MS ● ●

0 0.2 0.4 0.6 0.8 1 0.1 0.3 0.5 0.7 0.9 Total Posterior Probability Attributable to DHS Clusters Normalized Posterior Probability of Association of the Top Genes

E Allelic Imbalance in DHS Accessibility F Tissue Activity of Prioritized DHS Clusters

120 Tissue 20 7e−03 1e−08 3e−02 3e−03 3e−02 4e−03 4e−03 8e−02 3e−01 100 Immune Non−Immune log10) 80 − 15 Expected 60 Observed 10

40 5 20 DHS Accessibility (5% FDR) P Value of Enrichment ( P Value

Number of CI SNPs with Imbalanced 0 0

MS IBD RA CEL T1D JIA PSO AITD PBC MS IBD RA CEL T1D JIA PSO AITD PBC BACH2 locus (chr6:91Mb) in MS Posterior Probability of Association 0.0 0.1 0.2 0.3 0.4

6 ● Posterior Probability of Association 0.0 0.1 0.2 0.3 0.4 ●● ● ●● ● Posterior Probability of Association ● 6 ● ● ● ● 0.0 0.1 0.2 0.3 0.4 ●●●●● ●● 4 ● ●●● ●●●● ● ●● ●●● ●● ● ● ● ● ● ●● ● ● (P) ● ● ● 6 ● ● ● ● 10 ● ● ● ● ●●●●●● ●● ●● ●●●●● ● ● 4 ●● ●● ● ●● ● ● ● ● ●● ●● ● ●● − log ● ●●● ●● ●● (P) ● ● ● ● ● ● ● ● 10 ● ● ● ● ● ●● ●●● ● ●●●● ● 2 ● ●● ● ● ●●●● ● ●● ● ● ●● ●● ● ● ● ● − log ●●● ● ●● ● ● ●● ●● ● ●● ● ●● ●● 4 ●● ●●● ●● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ● ● ● 2 GABRR2 ●●●●●● ●● MDN1 ● ● MAP3K7 ● RRAGD● ● ● ● GJA10 BACH2 ●● ●● ●●●● ● ● (P) ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ●●●● ● ● ● ● 10 ● ● ●● ● ● ● ● ●● ●● ●●● ● ●●● ● ● ●●●●●● ●● ● ● ●● ●● ●● ● ● ● ● ● ●● ●●●●●● ● ●● ● ● ● ●● ●●●●●● ● ●● ● ● ● ●●●●●●●●●●●●● ● ● ●●● ● ● ● ● ● ● ●●●● ● ● chr6rs72928038●● (Total ●PPA● ●●●● = ●●0.59)●●●●●● ● − log ●● ● ● ●●●●●●● ● ● ● ● ● ● ●● ● ● ●●●●●●●●●●●● ● ●●● ● ● ● ● ● ●● ● ● ●●●● ● ● ●●●●●● ● ● ●● ● ●●●●●● ●● ● ●●●●●● ●● ● ● ● ● ● ● ●●● ● ● ● ●●●●●● ● ●●● ● ● ●●● ● ●● ●● ●●●●●● ● ●●●●●●● ● ● ● ● ● ●●●●●●●●●●● ●● ●●●●●● ● ● ● ● ●●●● ● ● ● ● ● ●●●●●● ●●● ●●●●●●● ● ●●● ● ●●●●●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●● 2 ●0● ●● ●●●●●●● ● ●●●●●●●●●● ● ●●●●●●●●● ● ● ● ● ● ● ● 0● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ●● ●●●● 90.50● 90.75● 91.00 91.25 91.50 ●● ●●●●● ● ● ● ● 90.0● 90.5 91.0 91.5 ● ●●● ● ● ● ● ● ●●●●●●●●●●●● ● Position on Chromosome (Mbp) ● ●●●●●●●●●●●● ● ●● ●●● ● ● ● ● Position on Chromosome (Mbp) ●● ● ●●●●● ●● ●●●●●●● ● ● ●● ●● ●●●●● ● ●●●●●● ● ● ● ● ●●●●●●●● ●●●●●●●●●●●● ● ●●● ●● 0 ●●●●●●● ● ●●●●● ●● ● ● ●●●●● ●● ● ● ●● ●

90.50 90.75 91.00 91.25 91.50 Position onD1 Chromosome (0.07) (Mbp) D2 (0.003) D3 (0.003) D4 (0.003) D5 (0.002) D6 (0.002) D7 (0.48) D8 (0.005) D9 (0.005) D10 (0.02)

GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7

Boxplots of Gene Expression 12 p=0.33 p=0.43 p=0.02 p=0.74 p=0.12 p=0.9

10 ● ● ● D7.Activity ● ● 8 Absent Present

● ●

Gene Expression 6 ●

● 4 GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7 Genes A" Position on Chromosome 6 (Mbp) B" Position on Chromosome 6 (Mbp) 89.98 90.48 90.98 91.48 91.98 89.81 90.31 90.81 91.31 91.81

6 6 Posterior Posterior 0.95 0.23

4 4 log10(P) log10(P) 2 −

− 2 CEL GWAS ATD GWAS ATD 0 0 0 0

RRAGD GJA10 MAP3K7 GABRR1 RRAGD GJA10 MAP3K7

GABRR2 MDN1 BACH2 SRSF12 GABRR2 MDN1 BACH2

DHS 2 12 15 DHS 1 2 3 4 5 6 7 8 10 11 12 13 14

Gene GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7 Gene SRSF12 GABRR1 GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7

10 DHS State 10 DHS State Absent Absent Present Present

8 8

6 6 by DHS State by by DHS State by Gene Expression Gene Expression 4 4 DHS2 DHS2 DHS2 DHS2 DHS2 DHS2 DHS2 DHS2 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS13 DHS14 DHS12 DHS12 DHS12 DHS12 DHS12 DHS12

0.57 0.35 0.27 0.79 0.91 0.89 0.99 0.44 0.13 0.34 0.09 0.39 0.5 0.43 0.55 0.65 0.36 0.19 0.21 0.8 0.74 0.93 0.35 0.12 0.09 0.92 0.88 0.9 0.88 0.51 0.33 0.43 0.01* 0.75 0.12 0.9 0.02* 0.03* P Value P Value

C" Position on Chromosome 6 (Mbp) D" Position on Chromosome 6 (Mbp)

89.97 90.47 90.97 91.47 91.97 89.98 90.48 90.98 91.48 91.98

6 Posterior Posterior 0.59 0.48 6 4

4

log10(P) log10(P) 2 − − MS GWAS

IBD GWAS 2

0 0 0 0

RRAGD GJA10 MAP3K7 RRAGD GJA10 MAP3K7

IBD GABRR2 MDN1 BACH2 GABRR2 MDN1 BACH2 (MAP3K7) DHS 3 4 6 9 DHS 2 3 4 6 7 8 10 11 12 13 14 15

Gene GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7 Gene GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7

10 DHS State 10 DHS State Absent Absent Present Present

8 8

6 6 by DHS State by DHS State by Gene Expression Gene Expression 4 4 DHS9 DHS9 DHS9 DHS9 DHS9 DHS9 DHS2 DHS2 DHS2 DHS2 DHS2 DHS2 DHS12 DHS15 DHS12 DHS15 DHS12 DHS15 DHS12 DHS15 DHS12 DHS15 DHS12 DHS15

0.78 0.41 0.38 0.27 0.86 0.12 0.12 0.33 0.13 0.51 0.43 0.26 0.35 0.02* 0.85 0.81 0.75 0.63 0.35 0.12 0.14 0.87 0.9 0.53 P Value P Value

Color Key E" Position on Chromosome 6 (Mbp) F"

0.4 0.8 89.98 90.48 90.98 91.48 91.98 Value

Posterior 0.85 MS log10(P) − T1D GWAS 0 ATD 0

RRAGD GJA10 MAP3K7

GABRR2 MDN1 BACH2 T1D Traits DHS 7 12 13 14

IBD

Gene GABRR2 RRAGD MDN1 GJA10 BACH2 MAP3K7

10 DHS State Absent CEL Present

8 MS IBD ATD T1D CEL 6 Traits by DHS State by Gene Expression 4 DHS12 DHS12 DHS12 DHS12 DHS12 DHS12

0.33 0.43 0.02* 0.75 0.12 0.9 P Value Concordance Discordance Jaccard Coefficient

Most Associated SNPs 6 45 0.12

CI SNPs (mean) 6.8 31.16 0.21

Prioritized CI SNPs (mean) 2.2 9.47 0.25 Prioritized DHS Clusters (mean) 2.47 8.51 0.27

Prioritized Genes 16 35 0.31 Concordance Discordance Jaccard Coefficient

Most Associated SNPs 6 45 0.12

CI SNPs (mean) 6.8 31.16 0.21

Prioritized CI SNPs (mean) 2.2 9.47 0.25 Prioritized DHS Clusters (mean) 2.47 8.51 0.27

Prioritized Genes 16 35 0.31

Fisher’s exact test P = 0.001