Supplementary Information for “INFIMA leverages multi-omics model organism data to identify effector of human GWAS variants”

Chenyang Dong1, Shane P. Simonett3, Sunyoung Shin4, Donnie S. Stapleton3, Kathryn L. Schueler3, Gary A. Churchill5, Leina Lu6, Xiaoxiao Liu6, Fulai Jin6, Yan Li6, Alan D. Attie3, Mark P. Keller3,*, and Sund¨ uz¨ Keles¸1,2,*

1Department of Statistics, 2Department of Biostatistics and Medical Informatics, 3Department of Biochemistry, University of Wisconsin-Madison 4Department of Mathematical Sciences, University of Texas at Dallas 5The Jackson Laboratory 6Case Western University Corresponding authors: *[email protected], [email protected]

Contents

1 Supplementary Notes 2 1.1 ATAC-seq library amplification primers ...... 2 1.2 Irreproducible discovery rate (IDR) analysis ...... 3 1.3 Quality trimming of ATAC-seq master peaks ...... 3 1.4 Promoter/Enhancer annotation of ATAC-seq master peaks ...... 3 1.5 Collective enrichment of islet β-cell TFs ...... 3 1.6 Application of DAP-G and SuSiE for fine-mapping DO mice eQTLs ...... 3

2 Supplementary Tables 5

3 Supplementary Figures 7

1 1 Supplementary Notes

1.1 ATAC-seq library amplification primers

Ad1 noMX AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTCAGATGTG Ad2.1 TAAGGCGA CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGGAGATGT Ad2.2 CGTACTAG CAAGCAGAAGACGGCATACGAGATCTAGTACGGTCTCGTGGGCTCGGAGATGT Ad2.3 AGGCAGAA CAAGCAGAAGACGGCATACGAGATTTCTGCCTGTCTCGTGGGCTCGGAGATGT Ad2.4 TCCTGAGC CAAGCAGAAGACGGCATACGAGATGCTCAGGAGTCTCGTGGGCTCGGAGATGT Ad2.5 GGACTCCT CAAGCAGAAGACGGCATACGAGATAGGAGTCCGTCTCGTGGGCTCGGAGATGT Ad2.6 TAGGCATG CAAGCAGAAGACGGCATACGAGATCATGCCTAGTCTCGTGGGCTCGGAGATGT Ad2.7 CTCTCTAC CAAGCAGAAGACGGCATACGAGATGTAGAGAGGTCTCGTGGGCTCGGAGATGT Ad2.8 CAGAGAGG CAAGCAGAAGACGGCATACGAGATCCTCTCTGGTCTCGTGGGCTCGGAGATGT Ad2.9 GCTACGCT CAAGCAGAAGACGGCATACGAGATAGCGTAGCGTCTCGTGGGCTCGGAGATGT Ad2.10 CGAGGCTG CAAGCAGAAGACGGCATACGAGATCAGCCTCGGTCTCGTGGGCTCGGAGATGT Ad2.11 AAGAGGCA CAAGCAGAAGACGGCATACGAGATTGCCTCTTGTCTCGTGGGCTCGGAGATGT Ad2.12 GTAGAGGA CAAGCAGAAGACGGCATACGAGATTCCTCTACGTCTCGTGGGCTCGGAGATGT Ad2.13 GTCGTGAT CAAGCAGAAGACGGCATACGAGATATCACGACGTCTCGTGGGCTCGGAGATGT Ad2.14 ACCACTGT CAAGCAGAAGACGGCATACGAGATACAGTGGTGTCTCGTGGGCTCGGAGATGT Ad2.15 TGGATCTG CAAGCAGAAGACGGCATACGAGATCAGATCCAGTCTCGTGGGCTCGGAGATGT Ad2.16 CCGTTTGT CAAGCAGAAGACGGCATACGAGATACAAACGGGTCTCGTGGGCTCGGAGATGT Ad2.17 TGCTGGGT CAAGCAGAAGACGGCATACGAGATACCCAGCAGTCTCGTGGGCTCGGAGATGT Ad2.18 GAGGGGTT CAAGCAGAAGACGGCATACGAGATAACCCCTCGTCTCGTGGGCTCGGAGATGT Ad2.19 AGGTTGGG CAAGCAGAAGACGGCATACGAGATCCCAACCTGTCTCGTGGGCTCGGAGATGT Ad2.20 GTGTGGTG CAAGCAGAAGACGGCATACGAGATCACCACACGTCTCGTGGGCTCGGAGATGT Ad2.21 TGGGTTTC CAAGCAGAAGACGGCATACGAGATGAAACCCAGTCTCGTGGGCTCGGAGATGT Ad2.22 TGGTCACA CAAGCAGAAGACGGCATACGAGATTGTGACCAGTCTCGTGGGCTCGGAGATGT Ad2.23 TTGACCCT

2 CAAGCAGAAGACGGCATACGAGATAGGGTCAAGTCTCGTGGGCTCGGAGATGT Ad2.24 CCACTCCT CAAGCAGAAGACGGCATACGAGATAGGAGTGGGTCTCGTGGGCTCGGAGATGT

1.2 Irreproducible discovery rate (IDR) analysis

We identified regions of accessible chromatin with MOSAiCS for both separate and pooled samples of each founder strain at false discovery rate (FDR) of 0.05 and applied irreproducible discovery rate (IDR) analysis at IDR of 0.05 to generate peak sets of each strain.

1.3 Quality trimming of ATAC-seq master peaks

Quality trimming addressed a potential bias revealed by the positive correlation of the number of strains shared by an ATAC-seq peak with the ATAC-seq signal. Specifically, we trimmed the master peak list to maximize the overlap of the peaks with 15-state chromHMM annotations across an existing large collection of mouse tissues from the ENCODE project. In total, 12 tissues from mouse were available: liver e16.5, heart e16.5, lung e16.5, kidney e16.5, forebrain e16.5, midbrain e16.5, stomach e16.5, intestine e16.5, neural tube e15.5, hindbrain e13.5, limb e11.5, embryonic facial e13.5 (Supplementary Fig. S5 and S4).

1.4 Promoter/Enhancer annotation of ATAC-seq master peaks

We overlapped ATAC-seq master peaks with ChIP-seq based promoter/enhancer lists from ENCODE and aggregated the results over existing mouse tissues (pancreatic islet was not available). The following datasets were included: embryofacial (H3K4me3 e14.5; H3K27ac e14.5), forebrain (H3K4me3 e14.5; H3K27ac e14.5), hindbrain (H3K4me3 e14.5; H3K27ac e14.5), lung (H3K4me3 e14.5; H3K27ac e14.5), midbrain (H3K4me3 e14.5; H3K27ac e14.5), heart (H3K4me3 e16.5; H3K27ac e16.5), intestine (H3K4me3 e16.5; H3K27ac e16.5), kidney (H3K4me3 e16.5; H3K27ac e16.5), limb (H3K4me3 e15.5; H3K27ac e15.5), liver (H3K4me3 e16.5; H3K27ac e16.5), stomach (H3K4me3 e16.5; H3K27ac e16.5).

1.5 Collective enrichment of islet β-cell TFs

For each of the 744 human or mouse JASPAR 2020 motifs, we computed the footprint depth (FPD, Supplementary Fig. S21) for aggregated footprint profiles in B6 ATAC-seq samples. In order to evaluate the collective enrichment of the known β-cell TFs (PB0042.1;Mafk 1, PB0146.1;Mafk 2, PH0131.1;Pax4, PH0111.1;Nkx2-2, PB0015.1;Foxa2 1, PB0119.1;Foxa2 2, PH0132.1;Pax6, MA0132.1;Pdx1, PH0118.1;Nkx6-1 1, PH0119.1;Nkx6-1 2), we first treated their averaged FPD as observed statistic. We then applied a randomization test by uniformly drawing similar TFs for each β-cell TF (width difference ≤ 1 and information content difference ≤ 0.2). We repeated the randomization for 100,000 times and computed the average FPD during each iteration. A p-value was obtained by comparing the observed averaged FPD to those obtained from the randomized samples.

1.6 Application of DAP-G and SuSiE for fine-mapping DO mice eQTLs

A standard application of human GWAS fine-mapping methods to DO-eQTL results considers all the SNPs around the eQTL marker as candidates. As a showcase, we consider the Adcy5 with 27,307 SNPs within the 1 Mb of the eQTL marker of Adcy5 (representative LD structure for 4, 616 SNPs within the 200Kb of the marker is provided in Fig. S31). The top signal

3 cluster estimated by DAP-G contained 3,590 SNPs with cluster Posterior Inclusive Probability (PIP) of 0.434 and an average LD of 0.956. SuSiE did not output any credible sets for any coverage level. The maximum estimated SNP PIPs for both approaches were less than 0.003. Next, we utilized the multi-omics prior for the 27,307 SNPs by classifying them as: (i) not within an ATAC-seq peak; (ii) within an ATAC-seq peak but is not a local-ATAC-MV; (iii) local-ATAC- MV. For (iii), we utilized the multi-omics priors (Πg) that we have used for INFIMA and set the priors to 0.5 and 1, for (i) and (ii), respectively. However, the outputs from both DAP-G and SuSiE were almost identical with or without using a prior, further supporting that the level of LD in the DO mice hampers a standard application of fine-mapping methods designed for human genetic studies.

4 2 Supplementary Tables

Table S1: Founder ATAC-seq data alignment rates. In this manuscript, 129, AJ, B6, CAST, NOD, NZO, PWK, and WSB are short for 129S1 SvImJ, A J, C57BL/6J, CAST EiJ, NOD ShiLtJ, NZO HlLtJ, PWK PhJ, and WSB EiJ respectively. Sample AlignOnce % AlignMultiple % chrM % Duplicate % 1 129-F 66.42 29.74 28.71 11.80 2 129-M 57.96 29.19 41.65 17.74 3 AJ-F 55.95 37.11 41.75 18.86 4 AJ-M 55.00 38.83 37.05 16.31 5 B6-F 55.47 32.22 35.35 15.68 6 B6-M 59.47 34.01 16.46 9.14 7 CAST-F 69.99 23.84 18.57 11.41 8 CAST-M 68.34 25.37 19.83 8.41 9 NOD-F 58.39 32.55 33.39 17.79 10 NOD-M 63.50 32.21 29.16 11.51 11 NZO-F 66.15 26.68 19.33 8.94 12 NZO-M 66.60 24.45 12.72 7.75 13 PWK-F 55.80 35.56 34.35 19.16 14 PWK-M 57.01 34.25 25.27 13.49 15 WSB-F 71.38 24.97 21.23 7.81 16 WSB-M 62.91 24.77 20.93 7.60

5 Table S2: Comparison of rate of alignments to reference genome (mm10) and personalized genomes. Sample AlignReference % AlignPersonalized % 1 129-F 96.16 96.38 2 129-M 87.16 87.23 3 AJ-F 93.06 93.17 4 AJ-M 93.83 93.97 5 B6-F 87.64 87.64 6 B6-M 93.44 93.44 7 CAST-F 93.84 96.12 8 CAST-M 93.71 95.98 9 NOD-F 90.94 91.11 10 NOD-M 95.71 95.92 11 NZO-F 92.84 93.03 12 NZO-M 91.05 91.36 13 PWK-F 91.36 94.13 14 PWK-M 91.26 94.59 15 WSB-F 96.35 96.63 16 WSB-M 87.69 87.97

Table S3: Number of strain-specific ATAC-seq peaks before and after quality trimming. Strain BeforeTrim AfterTrim 1 129 368 105 2 AJ 242 19 3 B6 1048 103 4 CAST 15894 1406 5 NOD 995 45 6 NZO 244 83 7 PWK 1924 430 8 WSB 5914 813

Table S4: P-values from pairwise Kolmogorov-Smirnov tests between cumulative distribution curves of the five fine-mapping strategies. ”Most Likely” and ”Least Likely” refer to INFIMA most and least likely predictions, respectively. Strategy Most Likely Least Likely Random Closest to Marker Least Likely <2.2e-16 Random <2.2e-16 <2.2e-16 Closest to Marker <2.2e-16 <2.2e-16 1.0e-15 Closest to Gene <2.2e-16 <2.2e-16 <2.2e-16 <2.2e-16

6 Table S5: Kullback–Leibler (KL) divergence between density curves of the five fine-mapping strategies. ”Most Likely” and ”Least Likely” refer to INFIMA most and least likely predictions, respectively. Strategy Most Likely Least Likely Random Closest to Marker Closest to Most Likely 0 0.548 0.248 0.121 0.080 Least Likely 0.464 0 0.091 0.189 0.419 Random 0.236 0.098 0 0.069 0.215 Closest to Marker 0.126 0.216 0.072 0 0.085 Closest to Gene 0.091 0.478 0.220 0.082 0

Table S6: Bonferroni adjusted p-values from pairwise Chi-Squared tests comparing the distri- bution of Hi-C scores of the five fine-mapping strategies across ten quantile bins. ”Most Likely” and ”Least Likely” refer to INFIMA most and least likely predictions, respectively. Strategy Most Likely Least Likely Random Closest to Marker Least Likely 1.16e-201 Random 1.19e-86 2.61e-23 Closest to Marker 1.31e-39 3.36e-72 2.92e-12 Closest to Gene 3.08e-17 6.12e-160 1.71e-67 1.35e-17

3 Supplementary Figures

7 129−F 129−M AJ−F AJ−M 15 20 TSS score = 25.55 TSS score = 14.09 TSS score = 19.71 15 TSS score = 16.78 20 15 10

10 10 10 5 5 5 TSS enrichment score TSS enrichment score TSS enrichment score TSS enrichment score

0 0 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) B6−F B6−M CAST−F CAST−M 20 20 20 20 TSS score = 21.96 TSS score = 20.43 TSS score = 20.07 TSS score = 19.06 15 15 15 15

10 10 10 10

5 5 5 5 TSS enrichment score TSS enrichment score TSS enrichment score TSS enrichment score

0 0 0 0 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) NOD−F NOD−M NZO−F NZO−M

20 20 TSS score = 18.73 15 TSS score = 20.4 15 TSS score = 17.64 TSS score = 16.58 15 15

10 10 10 10

5 5 5 5 TSS enrichment score TSS enrichment score TSS enrichment score TSS enrichment score

0 0 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) PWK−F PWK−M WSB−F WSB−M

20 20 20 TSS score = 19.81 TSS score = 21.52 20 TSS score = 21.99 TSS score = 19.14 15 15 15 15

10 10 10 10

5 5 5 5 TSS enrichment score TSS enrichment score TSS enrichment score TSS enrichment score

0 0 0 0 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp) Distance to TSS (bp)

Fig. S1: Summary of transcription start site (TSS) enrichment analysis. TSS scores are calcu- lated by ataqv (https://github.com/ParkerLab/ataqv) and represent an enrichment metric based on the transposition activity around the transcription start sites. Scores of 15 or larger are deemed as ideal quality.

8 Fig. S2: The full version of Fig. 2e.

2e+05 1e+05

Count Count

106 106

104 104

102 102 1e+05 5e+04 100 100 AJ−M ATAC−seq reads AJ−M ATAC−seq 129−M ATAC−seq reads 129−M ATAC−seq

2 2 0e+00 R = 0.893 0e+00 R = 0.998

0 50000 100000 150000 200000 0 50000 100000 150000 200000 129−F ATAC−seq reads AJ−F ATAC−seq reads

9 150000

150000

Count 100000 Count

100000 106 106

104 104

102 102

100 50000 100 B6−M ATAC−seq reads B6−M ATAC−seq 50000 reads Cast−M ATAC−seq

2 2 0 R = 0.995 0 R = 0.973

0 50000 100000 150000 0 50000 100000 150000 B6−F ATAC−seq reads Cast−F ATAC−seq reads

200000

150000 150000

Count Count

106 106 100000 100000 104 104

102 102

100 100 NZO−M ATAC−seq reads NZO−M ATAC−seq NOD−M ATAC−seq reads NOD−M ATAC−seq 50000 50000

2 2 0 R = 0.990 0 R = 0.885

0e+00 1e+05 2e+05 0 50000 100000 150000 NOD−F ATAC−seq reads NZO−F ATAC−seq reads

90000

1e+05 Count Count

106 106 60000 104 104

102 102 5e+04 100 100 PWK−M ATAC−seq reads PWK−M ATAC−seq reads WSB−M ATAC−seq 30000

2 2 0 R = 0.894 0e+00 R = 0.993

0 50000 100000 150000 200000 0e+00 5e+04 1e+05 PWK−F ATAC−seq reads WSB−F ATAC−seq reads

Fig. S3: Reproducibility of ATAC-seq samples. ATAC-seq read counts summarized over 200 bp bins along the genome across the male (M) and female (F) samples.

10 5 ● ●

● ●

● ●

● ● ● ● ● ● ● ● ● ● 4 ● ● ● ●

) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Before trim ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● MeanSignal ● ● ● ● After trim ● ● ● ● ● ● ● ● ● ● ● ( ● ● ● ● ● ● ● ● ● ● 3 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● g ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● o ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● l ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

2

1 2 3 4 5 6 7 8 Number of strains sharing a peak

Fig. S4: ATAC-seq signal of peaks shared by different numbers of strains. ”Before trim” and ”After trim” denote ATAC-seq signal of the master peaks before and after trimming, respectively, as a function of the total number of strains a peak is shared by.

Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 1 Number of strains sharing a peak = 1 Max = 0.622 Max = 1 Min = 0.583 Min = 0.374

1 1 0.62 1 2 2 3 3 4 0.615 4 0.9 5 5 6 0.61 6 7 7 0.8 8 8 9 0.605 9 10 10 0.7 11 0.6 11 12 12 13 13 0.6 14 0.595 14 15 15 16 0.59 16 0.5 17 17 18 0.585 18 19 19 0.4 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

11 Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 2 Number of strains sharing a peak = 2 Max = 0.735 Max = 1 Min = 0.692 Min = 0.403

1 1 1 2 2 3 0.73 3 4 4 0.9 5 5 6 6 7 0.72 7 0.8 8 8 9 9 10 10 0.7 11 11 12 0.71 12 13 13 0.6 14 14 15 15 16 0.7 16 0.5 17 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 3 Number of strains sharing a peak = 3 Max = 0.778 Max = 1 Min = 0.752 Min = 0.402

1 1 1 2 2 3 0.775 3 4 4 0.9 5 5 6 0.77 6 7 7 0.8 8 8 9 9 10 0.765 10 0.7 11 11 12 12 13 13 0.6 14 0.76 14 15 15 16 16 0.5 17 0.755 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

12 Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 4 Number of strains sharing a peak = 4 Max = 0.798 Max = 1 Min = 0.777 Min = 0.412

1 1 1 2 2 3 3 4 0.795 4 0.9 5 5 6 6 7 7 0.8 8 0.79 8 9 9 10 10 0.7 11 11 12 0.785 12 13 13 0.6 14 14 15 15 16 16 17 0.78 17 0.5 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 5 Number of strains sharing a peak = 5 Max = 0.822 Max = 1 Min = 0.803 Min = 0.408

1 1 1 2 2 3 0.82 3 4 4 0.9 5 5 6 6 7 7 0.8 8 0.815 8 9 9 10 10 0.7 11 11 12 0.81 12 13 13 0.6 14 14 15 15 16 16 0.5 17 0.805 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

13 Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 6 Number of strains sharing a peak = 6 Max = 0.832 Max = 1 Min = 0.811 Min = 0.417

1 1 1 2 0.83 2 3 3 4 4 0.9 5 5 6 6 7 0.825 7 0.8 8 8 9 9 10 10 0.7 11 0.82 11 12 12 13 13 14 14 0.6 15 15 16 0.815 16 17 17 0.5 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 7 Number of strains sharing a peak = 7 Max = 0.862 Max = 1 Min = 0.842 Min = 0.419

1 1 1 2 2 3 0.86 3 4 4 0.9 5 5 6 6 7 0.855 7 0.8 8 8 9 9 10 10 0.7 11 11 12 0.85 12 13 13 14 14 0.6 15 15 16 16 17 0.845 17 0.5 18 18 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

14 Proportion of non−quiescent peaks after trimming Proportion of peaks after trimming Number of strains sharing a peak = 8 Number of strains sharing a peak = 8 Max = 0.954 Max = 1 Min = 0.936 Min = 0.456

1 1 1 2 2 3 3 4 4 0.9 5 0.95 5 6 6 7 7 8 8 0.8 9 9 10 0.945 10 11 11 0.7 12 12 13 13 14 14 15 0.94 15 0.6 16 16 17 17 18 18 0.5 19 19 20 20 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30

Percentile of MeanSignal Percentile 31 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Percentile of MeanP

Fig. S5: Trimming of the master peak list. Heatmaps depict the proportion of retained peaks (Right columns) and the proportion of peaks in non-quiescent regions of the genome pooled across a set of tissues (Left columns) as a function of the two trimming parameters for each value of the number of strains sharing the peaks. The final percentiles for the tuning parameters (MeanP, MeanSignal) were set as: (51, 45), (23, 6), (46, 1), (45, 0), (47, 0), (2, 14), (0, 0), (0,0) for the number of strains sharing the peaks = 1, 2, 3, 4, 5, 6, 7, 8, respectively. Here, ”(MeanP, MeanSignal) = (a, b)” corresponds to removing peaks of the master peak list with the lowest (a-1)% of ‘MeanP’ and/or the lowest (b-1)% of ‘MeanSignal’.

15 21285

20000

15000

10000 Intersection Size

5000 4219 2974

17211406 895 872 865 831 814 813 797 532 455 442 430 416 402 347 330 330 326 303 0 B6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 129 ● ● ● ● ● ● ● ● NZO ● ● ● ● ● ● ● AJ ● ● ● ● ● ● ● NOD ● ● ● ● ● ● ● ● ● ● ● ● ● WSB ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● PWK ● ● ● ● ● ● ● ● ● ● ● ● ● Cast ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 40000300002000010000 0 Set Size

Fig. S6: The UpSet plot for 51,014 ATAC-seq master peaks after quality trimming. Bars repre- sent the numbers of peaks that originate from different combinations of strains.

● ● ● ● ● ●

3.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 3.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● After trim ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● MeanSignal ● ● ● ● ● ● ● ● ● ● ● ( ● ● ● ● ● ● ● ● ● Before trim ● ● ● ● ● ● ● ● ● ● ● 10 ● ● ● ● 2.5 ● ● ● ● ● ● ● ● ● g ● ● ● ● ● ● ● ● ● ● ● ● ● ● o ● ● ● l ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

2.0

129 AJ B6 CAST NOD NZO PWK WSB Strain−specific ATAC−seq peaks

Fig. S7: ATAC-seq signal of strain-specific peaks. ”Before trim” and ”After trim” denote ATAC- seq signal of the master peaks before and after trimming, respectively.

16 unclassified promoter enhancer both

100

75

50

25 Percentage of ATAC−seq Peaks of ATAC−seq Percentage 0 1 2 3 4 5 6 7 8 Number of strains sharing a peak

Fig. S8: Promoter/enhancer annotation of the ATAC-seq master peak list stratified by the num- bers of strains sharing a peak (see URLs; Supplementary Notes).

17 430 19 103 45

75 813

1406

105 83

50

25 Percentage of Non−Promoter/Enhancer ATAC−seq Peaks of Non−Promoter/Enhancer ATAC−seq Percentage

0

129 AJ B6 Cast NOD NZO PWK WSB Strain

Fig. S9: Percentage of strain-specific ATAC-seq peaks not categorized as promoter or en- hancer. The total numbers of strain-specifc ATAC-seq peaks are displayed on top of each bar.

18 Intron: 30.70% Distal Intergenic: 34.54% Promoter: 34.50% 30 Distal Intergenic: 28.81% Promoter: 26.64%

Intron: 28.54%

20 StrainEffect No Strain Effect Strain Effect

10 Observed percentage of peaks Exon: 4.73%

Exon: 4.45% 0

0 20 40 Expected percentage of peaks

Fig. S10: Genomic location analysis of the ATAC-seq peaks with or without strain effect. Ex- pected percentages are computed with the regioneR package using genomic location annota- tions from ChIPseeker package.

19 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 ●

log2(gene expression) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ●

0 ●

Ahr Arx Irx2 Nfic Yy1 Arnt Ctcf Ets1 Mafb Mafk Mzf1 Pax6 Pdx1 Spi1 Spib Foxa2 Gata2 Gata4 Mnx1 Nfatc2 Sox10 Nkx2−2 Nkx6−1

Fig. S11: Founder islet expressions of TFs in Fig. 3a. TFs that are not shown did not have detectable levels of expression.

20 3 3 3

● ● ● ● ● 2 ● 2 ● 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 1 1

● ● ● ● ● ● ● ● ● 0 ● 0 ● 0 ● KL divergence KL divergence KL divergence PB0029.1;Hic1_1 CN0070.1;LM70 CN0209.1;LM209 CN0207.1;LM207 CN0113.1;LM113 CN0202.1;LM202 PB0132.1;Hbp1_2 PB0018.1;Foxk1_1 PB0061.1;Sox11_1 PB0193.1;Tcfe2a_2

−1 −1 −1 CN0124.1;LM124 PB0168.1;Sox14_2 CN0128.1;LM128 CN0141.1;LM141 PB0071.1;Sox4_1 MA0142.1;Pou5f1 MA0119.1;TLX1::NFIC MA0143.1; PB0166.1;Sox12_2 −2 −2 −2 CN0014.1;LM14 MA0442.1;SOX10 MA0095.1;YY1 MA0161.1;NFIC

3 3 3

● ● ● ● ● ● ● ● ● 2 ● 2 ● 2 ● ● ● ● ● ● ● ● ● ● ●

1 1 1

● ● ● ● ● ● ● ● ● 0 ● 0 ● 0 ● PH0129.1;Otx1 PH0015.1;Crx MA0076.1;ELK4 MA0028.1;ELK1 CN0099.1;LM99 KL divergence KL divergence KL divergence PB0011.1;Ehf_1 MA0062.1;GABPA PB0020.1;Gabpa_1 MA0156.1;FEV

−1 PB0077.1;Spdef_1 −1 −1 MA0136.1;ELF5 MA0136.1;ELF5 PH0122.1;Obox2 PB0012.1;Elf3_1 PB0115.1;Ehf_2 PB0058.1;Sfpi1_1 −2 MA0156.1;FEV −2 −2 MA0098.1;ETS1 MA0081.1;SPIB MA0056.1;MZF1_1−4

Fig. S12: Kullback-Leibler (KL) divergence of highlighted TF motifs with footprints in Fig. 3a to all other 744 TF motifs queried. Similarities between motifs were computed based on both Kullback-Leibler divergence and Pearson correlation coefficient. After taking intersec- tion among top 20 with respect to the two similarity metrics, top 10 motifs with KL less than 0.25 are labelled.

21 3 3 3

● ● ● ● ● ● ● ● 2 ● 2 ● 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 1 1

● ● ● ● ● ● ● ● ● 0 ● 0 ● 0 ● KL divergence KL divergence KL divergence MA0036.1;GATA2 PH0111.1;Nkx2−2 PH0115.1;Nkx2−6 PB0021.1;Gata3_1 PB0022.1;Gata5_1 PB0023.1;Gata6_1 PB0111.1;Bhlhb2_2 −1 −1 −1 PH0004.1;Nkx3−2 PH0114.1;Nkx2−5 PB0048.1;Nkx3−1_1 MA0140.1;Tal1::Gata1 −2 −2 −2 MA0130.1;ZNF354C MA0035.1;Gata1 MA0035.2;Gata1

3 3 3

● ● ● ● ● ● ● ● ● ● ● 2 2 ● 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

1 1 1

● ● ● ● ● ● ● ● ● ● ● 0 ● 0 ● 0 ● MA0136.1;ELF5 CN0099.1;LM99 PB0011.1;Ehf_1 KL divergence KL divergence KL divergence PB0012.1;Elf3_1 CN0110.1;LM110 PB0162.1;Sfpi1_2 PB0011.1;Ehf_1 PB0125.1;Gata3_2

−1 −1 −1 PB0012.1;Elf3_1 PB0058.1;Sfpi1_1 MA0140.1;Tal1::Gata1 MA0144.1;Stat3 MA0080.2;SPI1 MA0080.1;SPI1 PB0058.1;Sfpi1_1 MA0062.1;GABPA

−2 −2 −2 MA0156.1;FEV MA0037.1;GATA3 MA0080.1;SPI1 MA0080.2;SPI1

Fig. S13: Kullback-Leibler (KL) divergence of highlighted TF motifs with footprints in Fig. 3a to all other 744 TF motifs queried. Similarities between motifs were computed based on both Kullback-Leibler divergence and Pearson correlation coefficient. After taking intersec- tion among top 20 with respect to the two similarity metrics, top 10 motifs with KL less than 0.25 are labelled.

22 3 ● 3 3 ● ●

● ● ● ● ● ● ● 2 ● 2 2 ● ● ● ●

1 1 1

● ● ● ● ● 0 ● 0 ● 0 PH0085.1;Irx4 MA0069.1;Pax6 CN0036.1;LM36 KL divergence KL divergence KL divergence PH0083.1;Irx3_1 CN0214.1;LM214 CN0231.1;LM231 CN0124.1;LM124 PH0087.1;Irx6 MA0162.1;Egr1 −1 −1 −1 CN0114.1;LM114 PH0086.1;Irx5 PH0084.1;Irx3_2 CN0178.1;LM178 CN0186.1;LM186 CN0225.1;LM225 CN0229.1;LM229

−2 −2 −2 MA0152.1;NFATC2 MA0006.1;Arnt::Ahr PH0082.1;Irx2

3 3 3

2 2 2

● ● ●

1 1 1

0 0 0 PH0086.1;Irx5 PH0172.1;Tlx2 KL divergence KL divergence KL divergence PH0070.1;Hoxc5 MA0125.1;Nobox PB0145.1;Mafb_2 PH0054.1;Hoxa7_1 PH0020.1;Dlx1

−1 −1 MA0075.1;Prrx2 −1 PH0118.1;Nkx6−1_1 MA0075.1;Prrx2 PH0119.1;Nkx6−1_2 PB0042.1;Mafk_1 MA0125.1;Nobox PH0120.1;Nkx6−3 −2 −2 −2 PB0041.1;Mafb_1 PH0118.1;Nkx6−1_1 PH0119.1;Nkx6−1_2

Fig. S14: Kullback-Leibler (KL) divergence of highlighted TF motifs with footprints in Fig. 3a to all other 744 TF motifs queried. Similarities between motifs were computed based on both Kullback-Leibler divergence and Pearson correlation coefficient. After taking intersec- tion among top 20 with respect to the two similarity metrics, top 10 motifs with KL less than 0.25 are labelled.

23 3 3 3

● ● ● 2 ● 2 2

● ● ●

1 1 1

0 0 0 PH0175.1;Vax2 PH0032.1;Evx2 PH0031.1; MA0132.1;Pdx1 PH0034.1;Gbx2 MA0132.1;Pdx1 KL divergence KL divergence KL divergence PH0045.1;Hoxa1 PH0050.1;Hoxa3 PH0103.1;Meox1 CN0126.1;LM126 PH0036.1;Gsx2 CN0069.1;LM69 PB0031.1;Hoxa3_1

−1 −1 MA0075.1;Prrx2 −1 PH0055.1;Hoxa7_2 PH0027.1;Emx2 PH0060.1;Hoxb5 PH0175.1;Vax2 PH0031.1;Evx1 MA0075.1;Prrx2 MA0075.1;Prrx2 PH0081.1;Pdx1 PH0050.1;Hoxa3 PH0052.1;Hoxa5 PH0135.1;Phox2a PH0049.1;Hoxa2

−2 −2 −2 PH0062.1;Hoxb7 MA0132.1;Pdx1 PH0081.1;Pdx1 PH0039.1;Mnx1

3 3 3

2 2 2

● ● ● ● ● ● ● ● 1 1 1

● ● 0 ● 0 0 ● PH0030.1;Esx1 KL divergence KL divergence KL divergence PH0010.1;Alx1_1 MA0032.1;FOXC1 PB0016.1;Foxj1_1 CN0026.1;LM26 MA0006.1;Arnt::Ahr

−1 −1 MA0075.1;Prrx2 −1 MA0031.1;FOXD1 PH0099.1;Lhx9 MA0148.1;FOXA1 PB0015.1;Foxa2_1 −2 −2 −2 MA0069.1;Pax6 PH0132.1;Pax6 MA0047.2;Foxa2

Fig. S15: Kullback-Leibler (KL) divergence of highlighted TF motifs with footprints in Fig. 3a to all other 744 TF motifs queried. Similarities between motifs were computed based on both Kullback-Leibler divergence and Pearson correlation coefficient. After taking intersec- tion among top 20 with respect to the two similarity metrics, top 10 motifs with KL less than 0.25 are labelled.

24 3 3 3

2 2 2

● ● ● 1 1 1

0 0 0 PH0087.1;Irx6 KL divergence KL divergence KL divergence PB0102.1;Zic2_1 PB0101.1;Zic1_1 MA0148.1;FOXA1 MA0031.1;FOXD1 MA0157.1;FOXO3 PB0016.1;Foxj1_1 PB0075.1;Sp100_1 PB0075.1;Sp100_1 PB0192.1;Tcfap2e_2

−1 −1 −1 PB0202.1;Zfp410_2 PB0103.1;Zic3_1 PB0186.1;Tcf3_2 PB0017.1;Foxj3_1 PB0092.1;Zbtb7b_1 PB0019.1;Foxl1_1 PB0097.1;Zfp281_1 PB0018.1;Foxk1_1 PB0145.1;Mafb_2

−2 −2 −2 PB0100.1;Zfp740_1 PB0015.1;Foxa2_1 PB0119.1;Foxa2_2 MA0068.1;Pax4

3 3 3

2 2 2 ● ● ● ● ● ● ● 1 1 1

0 0 0 PH0086.1;Irx5 PH0090.1;Lbx2 PH0031.1;Evx1 PH0156.1;Rax KL divergence KL divergence KL divergence PH0050.1;Hoxa3 PB0186.1;Tcf3_2 PH0004.1;Nkx3−2 MA0130.1;ZNF354C PB0145.1;Mafb_2 −1 −1 −1 MA0075.1;Prrx2 PH0003.1;Arx PH0001.1;Alx3 PB0041.1;Mafb_1 PH0113.1;Nkx2−4 PB0092.1;Zbtb7b_1 PH0114.1;Nkx2−5 −2 PH0027.1;Emx2 −2 −2 PH0131.1;Pax4 PH0111.1;Nkx2−2 PB0042.1;Mafk_1

Fig. S16: Kullback-Leibler (KL) divergence of highlighted TF motifs with footprints in Fig. 3a to all other 744 TF motifs queried. Similarities between motifs were computed based on both Kullback-Leibler divergence and Pearson correlation coefficient. After taking intersec- tion among top 20 with respect to the two similarity metrics, top 10 motifs with KL less than 0.25 are labelled.

25 MA0130.1;ZNF354C

PH0111.1;Nkx2-2

Fig. S17: Sequence logos of ZNF354C and Nkx2-2.

26 10000

5000 Number of ATAC−seq peaks Number of ATAC−seq

0

0 10 20 30 40 Number of local−ATAC−MVs

Fig. S18: Histogram of numbers of local-ATAC-MVs within a SNP harboring differential ATAC- seq master peak.

27 129:AJ:CAST:NOD:NZO:PWK 129:AJ:CAST:NOD:NZO:WSB 129:AJ:NOD:WSB CAST:NZO:PWK:WSB CAST:NOD:PWK:WSB NZO 129:AJ:NOD:NZO CAST:NOD:PWK 129:CAST:PWK:WSB 129:AJ:NOD:NZO:WSB PWK:WSB Positive NZO:PWK Negative 129:CAST:PWK CAST:NZO:PWK CAST:WSB WSB

Strains with the alternative allele Strains PWK CAST:PWK:WSB CAST:PWK CAST 5000 0 5000 10000 Number of local−ATAC−MVs

Fig. S19: Genotype decomposition of local-ATAC-MVs. Groups of strains with the alternative allele compared to reference B6 allele are displayed in the y-axis. Positive category: alternative allele is associated with an increase in chromatin accessibility. Negative category: alternative allele is associated with a decrease in chromatin accessibility.

28 100

75 Annotation 5' UTR 3' UTR Downstream 50 Exon

Percentage Promoter Distal Intergenic 25 Intron

0

Fig. S20: Genomic location annotation of local-ATAC-MVs. The annotation categories and coordinates are from the ChIPseeker R package

29 tan ihterfrneadatraieallso h N o ahSPmtfcombination. SNP-motif each for SNP bp, lines the defined black 25 of is - The alleles depth alternative 0 footprint respectively. and Right The reference site, Middle, the binding regions. with bp, these strains TF 25 of centered - each the 0 within FPD to Left values as bp, respect signal 50 average with - the bp depict 26 50 Left - of bp. 26 in regions Right site within binding signal Tn5 TF normalized average to represents the distance y-axis represents calculation. x-axis (FPD) depth and footprint cuts the of Illustration S21: Fig.

Normalized Tn5 Cuts 1 = 50 − ( � L ! a ,L M b ,R b 25 ,R a ) P ausa niiulmtflctosaecmae across compared are locations motif individual at values FPD . � " 0 30 � 0 L � a " , L b , M 25 , R b and , � ! R a 50 represent bp Fig. S22: Summary of the atSNP and FPD analysis for individual footprints. Only the footprints with pval rank < 0.05 and pval fpd < 0.05 are displayed in the figure (grey dashed lines). The sign of x-axis: negative means ∆FPD < 0; positive means ∆FPD > 0. The sign of y-axis: negative means motif disruption; positive means motif enhancing. Blue and red colored points are the concordant footprints passing FDR of 0.05.

31 1500

1000 category Disrupt Enhance

Number of SNPs 500

0

0.0 − 0.20.2 − 0.40.4 − 0.60.6 − 0.80.8 − 1.01.0 − 1.21.2 − 1.41.4 − 1.61.6 − 1.81.8 − 2.0 Motif Information Content (IC)

Fig. S23: Summary of the atSNP results for individual footprints binned by motif information content.

32 Fig. S24: Heatmap of islet RNA-seq data from founder DO strains: row and columns represent genes and samples, respectively. Transcripts of genes (rows) across samples are standardized to [0, 1] and are clustered by using k-means with k = 10. Columns are not clustered.

33 1 0.9 0.8 0.7 0.6 0.5 0.4

0.3 Strain 129 AJ B6 Cast NOD NZO PWK WSB

Fig. S25: Heatmap of pairwise Pearson correlations between 91 islet RNA-seq samples from DO founder strains. Rows and columns are organized hierarchically.

34 0.4

0.3

0.2 Density

0.1

0.0 0 2 4 6 log10(distance + 1)

Fig. S26: The density of the genomic distances between the local-ATAC-MVs and the tran- scription start sites (TSSs) of the genes that they associate with. Blue and red shading denote promoter-proximal (distance ≤ 3 Kb) and promoter-distal ATAC-seq peaks harboring the local- ATAC-MVs (distance > 3 Kb), respectively. The dashed lines represent the means for different groups.

B6 eGenes Cast eGenes 2.5

3 2.0

1.5 2 B6 1.0 Cast 1 0.5 Normalized ATAC−seq signal Normalized ATAC−seq signal Normalized ATAC−seq 0.0 0 −2000 −1000 0 1000 2000 −2000 −1000 0 1000 2000 Distance to TSS Distance to TSS

Fig. S27: B6 and CAST normalized ATAC-seq signals at the TSSs of eGenes that are differen- tially expressed between the two strains. Left and right panels depict eGenes for B6 (expressed more in B6) and CAST (expressed more in CAST), respectively.

35 129

AJ AJ

B6 B6

Cast Cast

NOD NOD

NZO ● NZO

PWK PWK

WSB

0.36 0.4 0.44 0.48 0.52 0.57 0.61 0.65 0.69 0.73 0.78

Fig. S28: Proportion of concordant eGenes across strains, i.e., differentially expressed genes with agreement between the direction of higher expression and higher TSS accessibility.

36 ●

6000

4000

2000

Number of DO−eQTL markers ●

● ● 0 ● ● ● ● ● ●

1 2 3 4 5 6 7 8 9 11 Number of genes associated with eQTL marker

Fig. S29: Number of DO-eQTL markers (y-axis) vs. the number of DO genes a marker is associated with (x-axis). Most eQTL markers are associated with only one gene.

37 p < 2.22e−16 p < 2.22e−16 0.013 0.92 0.950 p < 2.22e−16 p < 2.22e−16 5.1e−13

● ● ● ● ● ● ● ● 0.650 ● ● ● ● 0.945 ● ● ●

0.91 0.625 0.940 AUPR Power AUROC

0.600 0.935

0.90 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.575 0.930 ● ● ● ● ● ● ● ● ● ●

Simulation setting NI MI HI

Fig. S30: INFIMA model performance quantified by inferred posterior probabilities of having a causal SNP across all the genes, i.e., Vˆg, ··· , g = 1, ··· ,G. Statistically significant Wilcoxon rank sum tests are highlighted at the top.

38 Fig. S31: The DO mice LD matrix of the imputed SNPs at the Adcy5 locus. Row and columns depict the 4,616 SNPs within the 200 Kb of the eQTL marker and are ordered with respect to their genomic coordinates. Red color indicates a perfect LD.

39 a b

0.15 1500

0.10 1000 Most Likely Least Likely Most Likely Random Closest to Gene Closest to Marker Closest to Gene 0.05 500 Number of causal local−ATAC−MVs Proportion of causal local−ATAC−MVs

0 0.00

1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 Quantile bins of normalized Hi−C score Quantile bins of normalized Hi−C score

Fig. S32: a. Evaluation of fine-mapping strategies by counting the number of causal local- ATAC-MVs in each Hi-C quantile bin. b. A direct comparison between INFIMA ”Most Likely” and ”Closest to Gene”.

40 Fig. S33: The DO mice LD matrix for the 3,671 local-ATAC-MVs from 1. Rows and columns depict local-ATAC-MVs and are ordered with respect to their genomic coordinates. Red color indicates a perfect LD.

41 1.00

80%

0.75

60% C Hi − e d

i z 0.50

ar d 50% tan d S

0.25 40%

20% 0.00

ely ely er om k rior) rior) rior) rior) and R (no p Most Lik Least Lik G (no p (with p iE G (with p − iE − uS Closest to Gene AP uS S Closest to Mar AP D S D

Fig. S34: Standardized Hi-C scores for the inferred causal local-ATAC-MVs by INFIMA and the competing approaches. Shaded areas mark different percentiles of normalized scores: Light blue: 20% and 80% percentiles; Blue: 40% and 60% percentiles; Dark blue: median.

42 4000

3000

2000

Number of GWAS SNPs Number of GWAS 1000

0

Insulin Proinsulin BodyWeight Triglycerides BloodGlucose BodyMassIndex Type1DiabetesType2Diabetes InsulinResistance DiabetesGestational MetabolicSyndrome DiabeticNeuropathies InsulinSecretingCells DiabeticNephropathiesGlucoseToleranceTestGlycatedHemoglobinA

Fig. S35: An overview of the number of GWAS SNPs for 16 diabetic traits.

GWAS SNP nearest ATAC peak GWAS SNP nearest ATAC peak

nearest ATAC peak nearest ATAC peak ✓

syntenic region local-ATAC-MVs best syntenic local-ATAC-MVs region

Fig. S36: Mapping scheme between human and mouse: direct lift-over (left); ATAC-seq peak- based lift-over (right).

43 99 100 100 ● ● 95 ●

75

57 ● 54 ● 50

40 ●

30 Number of ATAC−seq peaks Number of ATAC−seq ●

25 23 ●

12 13 ● ● 5 4 4 ● 3 2 ● ● 2 2 ● 1 1 1 ● ● 0 0 0 0 ● ● ● ● 0 ● ● ● ●

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Number of genes

Fig. S37: Number of DO-eQTL genes an ATAC-seq peak is linked to by INFIMA.

GWAS SNP pcHi-C GENE1 GWAS SNP GENE1

Gene1 Gene1

best syntenic local-ATAC-MVs best syntenic local-ATAC-MVs region regulating Gene1 region regulating Gene1

Fig. S38: Validation scheme with promoter capture Hi-C (pcHi-C) data. Left: GWAS SNP and/or human ATAC-seq peak at enhancer region in contact with orthologous gene promoter. Right: GWAS SNP and/or human ATAC-seq peak at promoter region of the orthologous gene. pcHi-C assays excludes short range interactions between a gene and its own promoter (Right panel).

44 ATAC-MVs. S40: Fig. ATAC-MVs. S39: Fig. a ABCC8 PDX1 Human GWAS SNPs rhlgu tN eut.Lf:hmnGA Ns ih:muelocal- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous pval_rank =1.68e-3 pval_rank =5.99e-3 pval_rank =6.33e-4 rhlgu tN eut.Lf:hmnGA Ns ih:muelocal- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous 45 b

PWM (+) (+) PWM PWM (+) (+) PWM 5' 5' 5' 5' 5' 5' 5' 5' 1 1 PB0084.1;Tcf7l2_1 MotifScanfor rs227822836 2 2 1 1 MA0095.1;YY1 MotifScanfor rs230081777 local-ATAC-QTLs 3 3 pval_rank =2.03e-2 4 4 pval_rank =8.53e-3

PWM (−) (−) PWM Best match tothereference genome Best match tothereference genome 2 2 5 5 Best match to the SNP genome Best match totheSNP genome Best match totheSNP 3' 3' 3' 3' 6 6 1 1 MA0018.2;CREB1 MotifScanfor rs229501323 7 7 3 3 8 8 2 2 9 9 Best match tothereference genome 10 10 Best match to the SNP genome Best match totheSNP 3 3 11 11 4 4 12 12 4 4 13 13 14 14 5 5 5 5 15 15 16 16 6 6 17 17 6 6 3' 18 18 7 7 3' 3' 3' 3' 3' 3' 3' 8 8 5' 5' 5' 5' ATAC-MVs. S42: Fig. ATAC-MVs. S41: Fig. PDX1 PDX1 rhlgu tN eut.Lf:hmnGA Ns ih:muelocal- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous local- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous 46

PWM (−) (−) PWM 3' 3' PWM (+) (+) PWM 3' 5' 5' 5' 5' 1 1 MA0162.1;Egr1 MotifScanfor rs241858428 1 1 MA0062.1;GABPA MotifScanfor rs229501323 2 2 3' 2 2 3 3 Best match tothereference genome Best match tothereference genome Best match to the SNP genome Best match totheSNP 3 3 4 4 Best match to the SNP genome Best match totheSNP 5 5 4 4 6 6 5 5 7 7 6 6 8 8 7 7 9 9 8 8 10 10 9 9 11 11 5' 10 10 12 12 3' 3' 3' 3' 13 13 5' 5' 5' ATAC-MVs. S45: Fig. ATAC-MVs. S44: Fig. ATAC-MVs. S43: Fig. PDX1 PDX1 PDX1 rhlgu tN eut.Lf:hmnGA Ns ih:muelocal- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous local- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous local- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous 47

PWM (−) (−) PWM 3' 3' 3' 1 1 MA0122.1;Nkx3−2 MotifScanfor rs241858428 PWM (−) (−) PWM 2 2 PWM (+) (+) PWM 3' 3' 3' 3' 5' 5' 5' 5' 1 1 3 3 1 1 Best match tothereference genome 3' 2 2 MA0060.1;NFYA MotifScanfor rs32366259 PB0133.1;Hic1_2 MotifScanfor rs229501323 Best match to the SNP genome Best match totheSNP 2 2 4 4 3 3 3 3 4 4 Best match tothereference genome 4 4 Best match tothereference genome 5 5 Best match to the SNP genome Best match totheSNP 5 5 Best match to the SNP genome Best match totheSNP 5 5 6 6 6 6 6 6 7 7 7 7 8 8 7 7 8 8 9 9 9 9 8 8 10 10 10 10 11 11 11 11 9 9 12 12 12 12 5' 13 13 10 10 13 13 14 14 14 14 11 11 15 15 15 15 16 16 16 16 12 12 3' 3' 3' 3' 5' 5' 5' 5' 5' 5' 5' ATAC-MVs. S47: Fig. ATAC-MVs. S46: Fig. PDX1 PDX1 rhlgu tN eut.Lf:hmnGA Ns ih:muelocal- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous local- mouse Right: SNPs. GWAS human Left: results. atSNP orthologous 48

PWM (+) (+) PWM 5' 5' 5' 5' 1 1 MA0003.1;TFAP2A MotifScanfor rs229501323 2 2 Best match tothereference genome Best match to the SNP genome Best match totheSNP 3 3

PWM (+) (+) PWM 5' 5' 5' 5' 4 4 1 1 2 2 MA0106.1;TP53 MotifScanfor rs32366259 3 3 5 5 4 4 5 5 Best match tothereference genome Best match to the SNP genome Best match totheSNP 6 6 6 6 7 7 8 8 9 9 10 10 7 7 11 11 12 12 13 13 8 8 14 14 15 15 16 16 17 17 9 9 18 18 19 19 3' 3' 3' 3' 20 20 3' 3' 3' 3' ADCY5 (chr3) 50 Kb

SEC22A ADCY5 HACD2 Type2Diabetes: rs11708067 BloodGlucose & InsulinSecretingCells: rs11708903, rs6438788, rs4450740 GWAS SNP ATAC-seq peak

pcHi-C interactions

Scale 100 kb mm10 chr16: 35,150,000 35,200,000 35,250,000 35,300,000 GENCODE VM23 Comprehensive Transcript Set (only Basic displayed by default) Adcy5 (chr16) Adcy5 50 Kb Gm25179 Gm49702 Sec22a Adcy5 Sec22a Sec22a Gm49706 127 _ 129

0 _ 127 _ AJ

0 _ 127 _ B6

0 _ 127 _ Cast

0 _ 127 _ NOD

0 _ 127 _ NZO

0 _ 127 _ PWK

0 _ 127 _ WSB ATAC-seq signals across eight founder strains ATAC-seq 0 _ DO-eQTL marker Trimmed Peak List rs212054017 C>T

Fig. S48: Promoter capture Hi-C links ADCY5 promoter to 4 GWAS SNPs which map to 1 mouse local-ATAC-MV with INFIMA predicted susceptibility gene Adcy5. Top: depictions of intronic GWAS SNPs (translucent green) interactions with the ADCY5 promoter (translucent gray), together with the human ATAC-seq peaks. Bottom: mouse genome de- pictions of ATAC-seq signal for local-ATAC-MVs (translucent red) where INFIMA fine-maps DO-eQTL marker (dashed line) linked to Adcy5 (promoters highlighted in translucent gray).

49 KCNQ1 (chr11) 100 Kb TNNT3 TNNT3 IGF2 ASCL2 CD81 KCNQ1 NAP1L4 LSP1 PRR33 TH C11orf21 TSSC4 SLC22A18AS TNNT3 TNNT3 INS TSPAN32 TRPM5 CDKN1C TNNT3 TNNT3 TNNT3 SLC22A18 TNNT3 PHLDA2 051649.1 LSP1 LSP1 Type1Diabetes: rs6356, rs10770141, rs10743152 TNNT3 LSP1GWAS SNP LSP1 ATAC-seqTNNT3 peak LINC01150 TNNT3 pcHi-CScale 200 kb mm10 interactionschr7: 142,900,000 143,000,000 143,100,000 143,200,000 143,300,000 143,400,000 143,500,000 GENCODE VM23 Comprehensive Transcript Set (only Basic displayed by default) Gm6471 Th Ascl2 R74862 Gm37235 Kcnq1ot1 Gm27901 Nap1l4 Gm6471 Th Tspan32 Tssc4 Gm27395 Gm38095 Cdkn1c Cars Gm39117 Ascl2 Cd81 Gm27389 Gm38065 Cdkn1c Cars Tspan32os Tssc4 Gm27803 Gm44732 Tspan32 Tssc4 Gm27539 Slc22a18 100 Kb Kcnq1 (chr7) Tspan32 Tssc4 Gm37364 Slc22a18 Tspan32 Kcnq1 Phlda2 Th Ascl2 Tspan32 Cd81 Kcnq1 Cdkn1c Nap1l4 Tspan32 Kcnq1 Nap1l4 Tspan32 Cd81Gm38321 4933417O13Rik Nap1l4Phlda2 Tspan32 Tssc4Gm28821 Slc22Gm23297a18 R74862 Trpm5 Trpm5 200 _ 129

0 _ 200 _ AJ

0 _ 200 _ B6

0 _ 200 _ Scale 200 kb Cast mm10 chr7: 142,900,000 143,000,000 143,100,000 143,200,000 143,300,000 143,400,000 143,500,000 0 _ GENCODE VM23 Comprehensive Transcript Set (only Basic displayed by default) 200 _ Gm6471 Th Ascl2 R74862 Gm37235 Kcnq1ot1NOD Gm27901 Nap1l4 Gm6471 Th Tspan32 Tssc4 Gm27395 Gm38095 Cdkn1c Cars Gm39117 Ascl2 Cd81 Gm27389 Gm38065 Cdkn1c Cars 0 _ Tspan32os Tssc4 Gm27803 Gm44732 200 _ Tspan32 Tssc4 NZO Gm27539 Slc22a18 Tspan32 Tssc4 Gm37364 Slc22a18 0 _ Tspan32 Kcnq1 Phlda2 Tspan32 Kcnq1 Nap1l4 200 _ Tspan32 Gm38321 PWK 4933417O13Rik Nap1l4 Tspan32 Gm28821 Gm23297 0 _ R74862 200 _ Trpm5 WSB

ATAC-seq signals across eight founder strains ATAC-seq 200 _ 129 129 0 _ 0 _ 200 _ rs238279370 C>A Trimmed PeakAJ List DO-eQTL marker AJ rs234511411 A>C 0 _ Fig. S49: Promoter capture Hi-C links KCNQ1 promoter to 3 GWAS SNPs which map to 2 mouse local-ATAC-MV with INFIMA predicted susceptibility gene Kcnq1. Top: human genome depictions of TH promoter or distal GWAS SNPs (translucent green) interactions with the KCNQ1 promoter (translucent gray), together with the human ATAC-seq peaks. Bottom: mouse genome depictions of ATAC-seq signal for local-ATAC-MVs (translucent red) where INFIMA fine-maps DO-eQTL marker (dashed line) linked to Kcnq1 (promoters highlighted in translu- cent gray).

50 60

40

20

0 ation annotations % of lo c lo c 60 al − Genomi c

40 A T A C − 20 Q T Ls

0 2000 4000 6000 0 # of mapped GWAS SNPs

50 Intron 40 Distal Intergenic Promoter 30

Exon 20 3' UTR Peak-based liftover 10 Downstream 0 5' UTR Genomic location annotations

Fig. S50: Comparison of genomic location annotations between human GWAS SNPs and the orthologous mouse genetic variants. Left: numbers of mapped GWAS SNPs within intronic, distal, promoter, exonic, UTR and downstream genomic locations. Right: Barplots of the local- ATAC-MVs mapping to the intronic, distal and promoter groups of GWAS SNPs, highlighting marked conservation of genomic location types. Note that the mapping here is without the 10 Kb distance constraint in Fig. 8a. The level of genomic compartment conservation declines when removing the distance constraint.

51