bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Functional characterization of thousands of type 2 diabetes-associated and

2 chromatin-modulating variants under steady state and endoplasmic reticulum

3 stress

4

5 Shubham Khetan1,3, Susan Kales2, Romy Kursawe1, Alexandria Jillette1, Steven K.

6 Reilly4, Duygu Ucar1,3,5, Ryan Tewhey2,6,7*, and Michael L. Stitzel1,3,5*

7

8 1The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA

9 2The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, 04609, USA

10 3Department of Genetics and Genome Sciences, University of Connecticut, Farmington,

11 CT, 06032, USA,

12 4Broad Institute of MIT and Harvard, Cambridge, MA, USA

13 5Institute of Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA

14 6Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono,

15 ME, 04469, USA.

16 7Tufts University School of Medicine, Boston, MA, 02111, USA

17

18 *Corresponding authors: [email protected] and [email protected]

19

20 Keywords: Massively Parallel Reporter Assay (MPRA), human pancreatic islets, MIN6

21 beta cells, chromatin accessibility, ATAC-seq, chromatin accessibility quantitative trait

22 locus (caQTLs), genome-wide association study (GWAS), single nucleotide

23 polymorphism (SNP), Type 2 diabetes (T2D), endoplasmic reticulum (ER) stress, short

24 interspersed nuclear elements (SINE), Alu elements. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

25 Abstract: (199 words)

26 A major goal in functional genomics and complex disease genetics is to identify

27 functional cis-regulatory elements (CREs) and single nucleotide polymorphisms (SNPs)

28 altering CRE activity in disease-relevant cell types and environmental conditions. We

29 tested >13,000 sequences containing each allele of 6,628 SNPs associated with altered

30 in vivo chromatin accessibility in human islets and/or type 2 diabetes risk (T2D GWAS

31 SNPs) for transcriptional activity in ß cell under steady state and endoplasmic reticulum

32 (ER) stress conditions using the massively parallel reporter assay (MPRA).

33 Approximately 30% (n=1,983) of putative CREs were active in at least one condition.

34 SNP allelic effects on in vitro MPRA activity strongly correlated with their effects on in

35 vivo islet chromatin accessibility (Pearson r=0.52), i.e., alleles associated with increased

36 chromatin accessibility exhibited higher MPRA activity. Importantly, MPRA identified

37 220/2500 T2D GWAS SNPs, representing 104 distinct association signals, that

38 significantly altered transcriptional activity in ß cells. This study has thus identified

39 functional ß cell transcription-activating sequences with in vivo relevance, uncovered

40 regulatory features that modulate transcriptional activity in ß cells under steady state and

41 ER stress conditions, and substantially expanded the set of putative functional variants

42 that modulate transcriptional activity in ß cells from thousands of genetically-linked T2D

43 GWAS SNPs. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

44 Introduction

45 Studies over the past decade have identified millions of single nucleotide

46 polymorphisms (SNPs) in the human population1. Genome-wide association studies

47 (GWAS) have linked thousands of these SNPs to variability in physiological traits and

48 disease susceptibility2, including type 2 diabetes (T2D)3. The overwhelming majority of

49 GWAS SNPs reside in non-coding, regulatory regions of the genome3,4, implicating

50 altered transcriptional regulation as a common molecular mechanism underlying disease

51 risk. Compared to -coding regions, predicting regulatory functions of non-coding

52 sequences, and the effect of SNPs within them, remains a significant challenge.

53 Approaches such as ATAC-seq identify regions of accessible chromatin as a

54 marker of in vivo cis-regulatory elements (CREs)5. However, CREs identified by

55 chromatin accessibility (ATAC-seq peaks) vary significantly in length. Moreover, ATAC-

56 seq peaks are characterized by 1 or more summits, which have higher chromatin

57 accessibility than flanking genomic regions also within ATAC-seq peaks6. Few studies

58 have explored the relationship between transcriptional activity and open chromatin

59 regions defined by ATAC-seq7,8. For instance, it is not known whether higher chromatin

60 accessibility within ATAC-seq peaks also corresponds to enhanced transcriptional

61 activity, or whether proximity to ATAC-seq peak summits influences the likelihood for a

62 SNP to affect enhancer activity. Studies designed to identify chromatin accessibility

63 quantitative trait loci (caQTL) are being increasingly applied to nominate SNPs altering

64 CRE activity9,10. Previously, we identified 2,949 caQTLs in human islets11, which were

65 significantly enriched for type 2 diabetes (T2D)-associated SNPs, highlighting the utility

66 of this approach to nominate putative disease-associated SNPs affecting CRE activity in

67 relevant cell types. However, caQTLs are correlative associations, and high throughput

68 assays to directly test variant effects are needed to experimentally establish causality.

69 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

70 Massively parallel reporter assays (MPRAs) have been developed as a functional

71 genomics platform to interrogate the enhancer potential of thousands of sequences

72 simultaneously12 under a variety of (patho)physiologic conditions. By introducing

73 nucleotide changes in a given sequence of interest, the effect of naturally occurring

74 sequence variants in the human population on MPRA activity can also be elucidated13.

75 Several studies have employed MPRA to identify functional SNPs associated with red

76 blood cell traits14, adiposity15, osteoarthritis16, cancer and eQTLs13.

77 Although studies over the past several years implicate altered islet CRE function

78 in T2D genetic risk and progression, they have colocalized only ~1/4 of T2D-associated

79 loci to altered chromatin accessibility and/or expression levels in islets11,17–20. We

80 hypothesize that this is partially because previous studies measured chromatin

81 accessibility (caQTLs) and gene expression (eQTL) in islets cultured under steady state

82 conditions11,21, consequently missing important gene-environment interactions

83 characteristic of a complex disorder like T2D. For example, endoplasmic reticulum (ER)

84 stress, which is elicited by the high demands of insulin production and secretion on

85 protein folding capacity in β cells, leads to (patho)physiologic adaptations. Low levels of

86 ER stress are a signal for β cells to proliferate as a mechanism to meet higher insulin

87 secretion demands22. However, uncompensated ER stress induced by genetic and/or

88 environmental perturbations leads to β cell failure and T2D23.

89 Here, we applied MPRA to directly test >13,000 sequences, including those

90 overlapping T2D GWAS and islet caQTL SNPs, for their ability to activate ß cell

91 transcription in steady state conditions and after exposure to the ER stress-inducing

92 agent thapsigargin or DMSO solvent control. We identified motifs and features of

93 sequences affecting MPRA activity in each condition and linked allelic effects on MPRA

94 activity to T2D genetics and in vivo islet chromatin accessibility. These results

95 demonstrate the power of MPRA to elucidate in vivo relevant ß cell transcriptional bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

96 activating sequences and define the regulatory grammar of these sequences under

97 steady state and stress-responsive conditions. We nominate 54 putative causal variants

98 among multiple genetically-linked islet dysfunction and T2D GWAS SNPs for further

99 study.

100

101 Results

102 Selection and testing of sequences for MPRA activity in β cells

103 To test putative islet β cell regulatory sequences for their ability to enhance

104 transcription from a minimal promoter and to identify SNPs that alter regulatory activity,

105 we generated an MPRA library containing: i) 1,910 SNPs significantly associated with

106 changes in human islet chromatin accessibility (caQTLs)11; ii) 2,218 control SNPs

107 overlapping human islet ATAC-seq peaks, which were reported to have no correlation

108 with changes in human islet chromatin accessibility11; and iii) 2,500 index and genetically

109 linked (r2>0.8) SNPs/indels corresponding to 259 T2D association signals in the

110 NHGRI/EBI GWAS Catalogue (Figure 1a, Table S2, and Methods). In total, 13,628 two-

111 hundred (bp) sequences from the were included in this MPRA

112 library.

113 MIN6 mouse β cells have been extensively employed to study human islet

114 regulatory sequences because human and mouse transcription factor (TF) expression

115 and DNA binding motifs are extensively conserved. MIN6 cells express many β cell-

116 specific TFs, such as Pdx1, Nkx6.1 and Foxa2, and in a comparison with nine human

117 tissues/cell-types, the chromatin accessibility profile of MIN6 mouse β cells most

118 resembled that of human islets (Supplementary Figure 1). To test human sequences for

119 islet β cell transcriptional activity, we transfected the MPRA plasmid library into five

120 independent batches of MIN6 β cells under standard culture conditions (25mM glucose,

121 Methods). Thirty hours later, transfected cells were harvested for RNA isolation, GFP bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

122 mRNA capture, and Illumina sequencing of the regulatory sequence-associated

123 barcodes (Figure 1b). As anticipated, principal component analysis indicated that RNA

124 expression of the transfected MPRA libraries was highly correlated between the five

125 biological replicates and clustered distinctly from the MPRA plasmid library input

126 (Supplementary Figure 2, compare RNA vs. DNA). 2,224/13,628 sequences (16.3%),

127 representing 1,372 distinct elements, exhibited significantly higher counts after

128 transfection compared to the plasmid library input under standard culture conditions

129 (FDR<1%; Figure 1c; Supplementary Table 2). We refer to these sequences as ‘MPRA

130 active’ throughout the remainder of the manuscript. MPRA active sequences were

131 significantly enriched for in vivo binding by human islet-specific TFs (PDX1, NKX6.1 and

132 FOXA2; ChIP-seq; Figure 1d), suggesting that MPRA identifies human islet β cell

133 regulatory sequences with in vivo relevance and validating MIN6 as a powerful cellular

134 model to test human sequences for islet β cell transcriptional activity.

135

136 Identification of β cell regulatory elements responding to ER stress

137 The high secretory burden of insulin production, processing, and secretion

138 makes β cells particularly susceptible to ER stress. In fact, activation of the unfolded

139 protein response (UPR) and ER stress has been implicated in the genetic etiology and

140 pathophysiology of both monogenic24,25 and type 2 diabetes23. Thapsigargin (TG) blocks

141 calcium transport into the ER lumen and is widely used to induce ER stress in cells26–33.

142 Therefore, to identify sequences whose ß cell transcriptional activity is modulated by ER

143 stress, MIN6 cells transfected with the MPRA library were grown in standard culture

144 media (25 mM glucose) supplemented with 250 nM TG or DMSO solvent control for 24

145 hours (Figure 2a). As expected, exposing MIN6 cells to 250 nM TG induced expression

146 of ER stress response such as Ddit3 (Chop), Hspa5, and Edem1, and reduced

147 Ins2 expression (Figure 2b). Compared to the DMSO solvent control, ER stress bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

148 decreased MPRA activity of 656 sequences (mapping to 449 elements) and increased

149 MPRA activity of 328 sequences (mapping to 275 elements), respectively, at FDR <1%

150 (Figure 2c).

151 We compared these differentially active sequences to identify TF motifs that may

152 be mediating their transcriptional activity differences. Elements enriched for sequence

153 binding motifs of β cell-specific TFs that mediate insulin gene transcription and beta cell

154 function34,35, such as MAFA, FOXA2 and PAX6, had lower MPRA activity under ER

155 stress (Figure 2d). Concordantly, we observed a significant decrease in insulin gene

156 expression (Figure 2b), suggesting ER stress leads to inactivation of these β cell-specific

157 TFs. It is known that uncompensated ER stress leads to preferential ATF4 translation

158 and induction of DDIT3/CHOP36,37. Consistently, we found that motifs for these TFs were

159 enriched in elements with higher MPRA activity after ER stress induction (Figure 2d).

160 Activity measurements with MPRA, therefore, recapitulate TF dynamics during ER stress

161 and identify β cell regulatory elements that respond to ER stress.

162

163 ATAC-seq peak summits are enriched for TF binding motifs and MPRA activity

164 In total, 30% of sequences containing one or both SNP alleles were MPRA active

165 in at least one treatment condition (Figure 3a, n=1983/6628, FDR ≤1%). MPRA active

166 sequences were significantly enriched for 114 TF binding motifs (Supplementary Table

167 3), including those of islet TFs such as HNF1, MAFA, RFX5, and FOXA2. Human-mouse

168 sequence similarity did not influence the probability that a tested human sequence was

169 MPRA active in MIN6 (black dashed line in Figure 3b; Supplementary Figure 3). As

170 anticipated, sequences overlapping regions with in vivo evidence of transcriptional

171 regulatory potential in human islets (ATAC-seq peaks) were significantly more likely to

172 be identified as MPRA active regardless of the sequence similarity between human and

173 mouse (Figure 3b). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

174 Notably, the majority (60%) of elements accessible in human islets were not

175 MPRA active (Figure 3b). Islet ATAC-seq peaks vary in length (100-3500 bps).

176 Moreover, within an ATAC-seq peak, there are differences in chromatin accessibility

177 levels at nucleotide resolution, where peak summits are locations with higher read

178 counts than the flanking regions (schematic in Figure 3c; example in Figure 5a). Peak

179 summits are significantly enriched in TF binding motifs compared to flanking regions

180 within the same ATAC-seq peak (Figure 3c and Supplementary Figure 4). Therefore, for

181 MPRA library sequences overlapping ATAC-seq peaks, we examined the distance of

182 SNPs, which we designed to be in the center of all MPRA sequences tested, to the

183 summit of ATAC-seq peaks in which they reside. Sequences closer to ATAC-seq peak

184 summits were significantly more likely to be MPRA active (Figure 3d). However, this

185 effect was observed only when SNPs were less than 100 bps from the ATAC-seq peak

186 summit (dashed green line in Figure 3d). Since the MPRA library tests the 100 base

187 pairs flanking both sides of a given SNP of interest, ATAC-seq peak summits were

188 included in the MPRA sequences only if they were ≤100 bp from the SNP. Therefore,

189 MPRA sequences overlapping ATAC-seq peaks were categorized based on whether the

190 summit was within 100 bps of the SNP or not, i.e., whether the ATAC-seq peak summit

191 was included in the sequence tested with MPRA or not. Sequences that included the

192 ATAC-seq peak summit were indeed more likely to be active with MPRA, irrespective of

193 the number of times a genomic region was accessible in a cohort of 19 islet donors

194 (Figure 3e). Interestingly, inclusion of the ATAC-seq peak summit was predictive of

195 MPRA activity even for genomic regions with accessibility in only one of 19 islet donors.

196 Together, these analyses indicate that sequences overlapping human islet

197 ATAC-seq peaks were more likely to be active with MPRA in MIN6 mouse β cells than

198 sequences outside peaks, regardless of sequence similarity between human and

199 mouse. ATAC-seq peak summits are enriched for TF binding motifs, and exclusion of bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

200 ATAC-seq peak summits among the elements overlapping ATAC-seq peaks,

201 significantly decreased the probability of observing MPRA activity.

202

203 SNPs proximal to ATAC-seq summits have a higher probability of altering MPRA

204 activity and chromatin accessibility

205 To identify SNPs that modulate regulatory element activity, we assessed allelic

206 differences (skew) in MPRA activity (schematic in Figure 1b) under each MIN6 beta cell

207 culture condition (standard, 250nM TG, and DMSO solvent control). In total, 879 SNPs

208 exhibited allelic skew at FDR <10% (Table S2; Supplementary Figure 5). Importantly,

209 when allelic skew was detected in more than one experimental condition, the direction-

210 of-effect was concordant for >98.5% of the SNPs (Supplementary Figure 5). Assessing

211 allelic skew for SNPs across all conditions increased the power to detect allelic skew

212 and identified both highly reproducible and condition-specific SNP allelic effects on

213 MPRA activity.

214 SNPs associated with chromatin accessibility changes in islets (caQTLs) were

215 significantly closer to the ATAC-seq peak summits than control SNPs, which are SNPs

216 that reside in ATAC-seq peaks11 but were not associated with chromatin accessibility

217 changes in islets (Figure 4a). Consequently, caQTL SNPs were more likely to be MPRA

218 active than were control SNPs (Figure 4b). Among elements with MRPA activity, caQTL

219 SNPs were also more likely to exhibit allelic skew than control SNPs (Figure 4c) and

220 showed a significantly greater magnitude of skew between alleles (Figure 4d). Together,

221 these results suggest that caQTL SNPs are closer to ATAC-seq peak summits and more

222 likely to modulate the magnitude of both in vitro MPRA activity and in vivo chromatin

223 accessibility, likely by disrupting TF binding motifs that are enriched in these regions.

224 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

225 Allelic effects on MIN6 beta cell MPRA activity and in vivo islet chromatin

226 accessibility are significantly correlated

227 After identifying SNPs with allelic skew in MPRA activity, we investigated the

228 relationship between SNP effects on in vivo islet chromatin accessibility (caQTLs) and in

229 vitro MPRA activity. Figure 5a shows an example caQTL (rs17396537) for which islet

230 donors with homozygous GG genotypes (n=6) exhibit higher chromatin accessibility than

231 islet donors with GC (n=11) or CC (n=2) genotypes at this variant. Concordantly, the

232 sequence containing the ‘G’ allele for rs17396537 exhibited higher MPRA activity than

233 the one containing the ‘C’ allele (Figure 5a, b). 297/1910 lead caQTL SNPs

234 demonstrated significant allelic skew in MPRA activity. Overall, caQTL and MPRA

235 directions-of-effect for these 297 SNPs were highly correlated (Pearson R = 0.526,

236 Figure 5b), and a significant concordance of direction was observed (i.e., alleles

237 associated with higher chromatin accessibility also had higher MPRA activity) (n =

238 246/297; 82.8%).

239 A subset of lead caQTL SNPs (17.2%) showed discordant allelic effects on

240 chromatin accessibility levels and MPRA activity (blue; Figure 5B). We hypothesized that

241 discordance may be driven by a nearby SNP that has MPRA activity antagonistic to the

242 lead caQTL SNP10. For example, the reference allele of the lead caQTL SNP,

243 rs1515555, is associated with in vivo higher chromatin accessibility but lower MPRA

244 activity (Figure 5b). This region contains a second SNP, rs1515556, 24 bps away from

245 rs1515555 and in tight LD (r2 = 0.99). We therefore assessed sequences containing

246 each of the four possible allelic combinations of these SNPs in the MPRA library for

247 additive, antagonistic, or synergistic transcriptional effects. In comparison to

248 rs1515555:rs1515556 REF:REF sequence (G:T in Figure 5c), the ALT:REF (A:T)

249 sequence had higher MPRA activity, while the REF:ALT (G:C) sequence had lower

250 MPRA activity (G:C). However, due to high LD between the 2 SNPs, both of these bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

251 combinations (‘A:T’ and ‘G:C’; ALT:REF and REF:ALT; Figure 5c) are rarely observed in

252 the human population. Sequences with either REF:REF (G:T) or ALT:ALT (A:C) at

253 rs1515555:rs1515556 are more likely to be observed in individuals in the population.

254 Comparison of these haplotypes revealed concordant effects on both chromatin

255 accessibility and MPRA activity and underscores the importance of accounting for

256 haplotypes when characterizing allelic effects.

257 For 94 lead caQTL SNPs, we tested for potential interactions with neighboring

258 SNPs by including all 4 allelic combinations in the MPRA library. However, these were

259 too few to clearly implicate antagonism between neighboring SNPs for the observed

260 discordance between chromatin accessibility and MPRA activity. Therefore, we cannot

261 rule out discordant SNPs as merely being false positives or mechanistic difference due

262 to technical constraints of the assay. However, MPRA provides a unique opportunity to

263 test and dissect the contributions of neighboring SNP to in vivo effects.

264 Allelic effects on MPRA activity and chromatin accessibility are therefore

265 significantly correlated. Importantly, these results demonstrate that that MPRA detects in

266 vivo relevant allelic effects on ß cell transcriptional activity.

267

268 Functional identification of T2D SNPs altering β cell regulatory element activity

269 A key challenge to translate GWAS associations into a mechanistic

270 understanding of the genes and pathways altered by T2D risk variants is identifying the

271 SNP(s) that affect functional or active CREs among tens to hundreds of genetically

272 linked, associated SNPs in each locus. The ability to functionally annotate groups of

273 SNPs with empirical measurements of CRE modulation would help prioritize variants

274 and expedite our ability to identify T2D causal alleles. To uncover transcriptional effects

275 of T2D-associated SNP alleles, we tested sequences containing each allele of 2,500

276 index and genetically linked (r2>0.8) SNPs/indels for 259 association signals for T2D and bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

277 related quantitative traits reported in the NHGRI/EBI GWAS Catalog. One or both alleles

278 of 492 SNPs/indels were active by MPRA. Active sequences were enriched for the

279 binding motifs of TFs that play important roles in β cell maturation and insulin secretion

280 (Figure 6a), such as liver x receptor (LXR)38,39, thyroid hormone receptor (THRa and

281 THRb)40,41, retinoic acid receptor (RARa)42, and BCL11A43. Allelic effects on MPRA

282 activity were observed for approximately half of these elements (n=220/492),

283 corresponding to 104 distinct T2D-associated loci (Supplementary Table 4). Importantly,

284 this list included T2D-associated SNPs that were previously identified for their in vivo

285 effects on islet chromatin accessibility and/or in vitro effects on luciferase reporter

286 assays, such as rs7903146 (TCF7L2)44,45, rs1635852 (JAZF1)46, rs10428126

287 (IGF2BP2)11,47, and rs12189774 (VEGFA)48 (Figure 6b, Table 1). Importantly, MPRA has

288 substantially expanded the set of T2D-associated GWAS SNPs with empiric effects on

289 transcriptional activation in ß cells, and therefore nominated them for the first time as the

290 putative causal/functional variants in these T2D loci. These include rs2881632

291 (SDHAF4), rs13096599 (KBTBD8), rs7783500 (GCC1), rs72697237 (NOTCH2),

292 rs13405776 (THADA), rs17748864 (PEX5L), rs7957197 (OASL), and rs13026123

293 (SRBD1), which are highlighted in Table 1.

294 For 54/104 T2D-associated signals, one SNP among those tested exhibited

295 allelic effects on MPRA activity (black points in Figure 6b). For example, rs987964 was

296 the only SNP of 49 tested in the ZBTB20 locus (Figure 6c; Supplementary Table 5) that

297 exhibited allelic effects on MPRA activity. For this and 53 additional T2D-associated

298 signals, MPRA thus provides empiric evidence that the variant directly modulates CRE

299 activity and assists in prioritizing variants for additional investigation as putative causal

300 alleles (Supplementary Table 5). The number of SNPs with allelic skew was found to

301 generally correlate with the number of SNPs tested per locus (Figure 6b). For example,

302 10/45 tested SNPs in LD with the index SNP rs12304921 in the HIGD1C locus had an bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

303 allelic skew in MPRA activity (Figure 6d), not all of which were in the same direction with

304 respect to the T2D risk alleles. Investigating the relative importance and potential

305 interactions between these ten putative causal variants will be important to establish

306 causality for this locus. These results underscore the value of assessing MPRA activity

307 in disease-relevant cell types as a tool to nominate putative causal T2D SNPs based on

308 systematic testing of their transcription-modulating effects in beta cells.

309

310 ER stress modulates MPRA activity of T2D SNP-containing sequences

311 MPRA activity of some putative CREs were exclusive to one out of the three

312 conditions tested. For example, ~5% of those tested were identified as MPRA active

313 exclusively in ER stressed beta cells (Figure 3a, n=106). However, we found that ER

314 stress elicited substantial quantitative changes in MPRA activity of 724 CREs (Figure

315 2c). To investigate both the impact of ER stress on MPRA activity and how SNPs modify

316 this response, we examined elements meeting two criteria: 1) allelic effects on MPRA

317 activity was observed in at least one condition; and 2) a minimum of one allele showed

318 differences in MPRA activity under ER stress (Figure 7a).

319 We previously characterized islet caQTLs from individuals with intact insulin

320 expression and secretion, and observed an enrichment of islet-specific TF binding motifs

321 at caQTL regulatory elements11. ER stress, however, leads to the inactivation of β cell

322 specific TFs (Figure 2d), with a concomitant decrease in Ins2 gene expression (Figure

323 2b). As might be expected based on these observations, steady state islet caQTL-

324 containing sequences predominantly exhibited lower MPRA activity in TG-treated cells

325 than the DMSO solvent controls. (Figure 7a, red dashes and 7b, red). In contrast, T2D

326 SNP-containing elements were significantly more likely to have higher MPRA activity

327 under ER stress (Figure 7a, green and 7b). Given the majority of SNPs tested in this set bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

328 are non-causal/functional ‘passenger’ alleles, this effect may reflect general CRE

329 architecture across T2D-associated loci.

330 Curiously, only 13% (30/220) of T2D-associated SNPs with allelic skew in MPRA

331 activity overlapped islet ATAC-seq peaks. We hypothesized that this may partially be

332 due to mappability issues since 63.1% (139 / 220) of T2D-associated SNPs with allelic

333 skew in MPRA activity overlapped repetitive sequences. Since there are no mappability

334 issues with MPRA, we first asked if repetitive sequences have MPRA activity. Among

335 the various classes of repetitive sequences, only elements overlapping short

336 interspersed nuclear elements (SINEs) were significantly more likely to be active

337 (Supplementary Figure 6a). Since >50% of elements overlapping SINEs in the MPRA

338 library harbored T2D-associated SNPs (Supplementary Figure 6b and S6c), we next

339 asked whether T2D-associated elements overlapping SINEs showed higher activity

340 under ER stress. Indeed, >75% of T2D-associated elements with higher MPRA activity

341 under ER stress overlapped SINEs, a significantly higher proportion compared to T2D-

342 associated elements with lower or no change in MPRA activity under ER stress (Figure

343 7c). Alu elements, the most common SINEs in the human genome, are primate-

344 specific49. When we investigated conservation in 20 mammalian genomes, T2D-

345 associated elements with higher MPRA activity under ER stress were indeed less likely

346 to be conserved in all non-primate mammalian species for which comparisons were

347 made (Figure 7d).

348 In conclusion, we have identified 220 elements whose MPRA activity is disrupted

349 by T2D-associated SNPs. 40% (n=86) of these T2D-associated SNPs had significantly

350 higher MPRA activity under ER stress, >75% of which overlapped SINEs.

351

352 Discussion bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

353 In this study, we used MPRA to test >13,000 sequences containing each allele of

354 6628 SNPs for MPRA activity in MIN6 β cells under standard culture conditions and after

355 ER stress or paired solvent control exposures. In total, 30% (n=1983/6628) of putative

356 CREs tested increased transcriptional activity of a minimal promoter. SNP alleles,

357 including those associated with altered in vivo chromatin accessibility in human islets

358 and with T2D genetic risk by GWAS, altered MPRA activity for approximately half of

359 these elements (n=879/1983).

360 MPRA activity exhibited a striking, positive correlation with in vivo islet chromatin

361 accessibility. As anticipated, elements accessible in vivo were far more likely to have

362 MPRA activity in vitro. However, MPRA refined our understanding of the specific content

363 and location of DNA sequences within ATAC-seq peaks that drive β cell transcriptional

364 activation. Likelihood of an element showing MPRA activity increased as a function of its

365 proximity to ATAC-seq peak summits, and SNPs closer to ATAC-seq peak summits

366 were more likely to alter MPRA activity and chromatin accessibility. These results help to

367 prioritize sequences within open chromatin regions for their importance in regulating

368 enhancer activity by disrupting TF binding. Interactions between neighboring SNPs

369 sometimes led to discordant effects between chromatin accessibility and activity

370 measured with MPRA. In these instances, MPRA helped to determine the relative

371 contributions of neighboring SNPs (e.g., additive or antagonistic effects) on

372 transcriptional activation.

373 T2D-associated SNPs are overwhelmingly non-coding, and primarily alter

374 regulatory activity rather than protein structure and function. A key challenge in T2D

375 genetics is to identify the putative causal SNP(s) from among tens to hundreds of

376 genetically linked, noncoding variants per association signal. This is the first study to

377 systematically test thousands of T2D-associated SNP alleles for their effects on

378 transcriptional activity in β cells. MPRA identified 492 elements in T2D-associated loci as bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

379 active. 220/492 SNP alleles, representing 104 distinct T2D association signals,

380 significantly altered β cell MPRA activity. MPRA recapitulated, and thereby confirmed,

381 allelic effects of several T2D SNPs that have been characterized previously using

382 targeted, low throughput luciferase assays, such as rs10428126 and rs2943656 at the

383 IGF2BP2 and IRS1 locus, respectively. For 54/104 T2D association signals, only one

384 SNP showed an allelic skew among all the SNPs tested, implicating them as the putative

385 causal SNP within their respective locus. Although not an exhaustive test of all credible

386 set SNPs identified to date, this approach has substantially expanded the list of T2D-

387 associated SNP alleles that empirically alter ß cell regulatory sequence activity. These

388 results should facilitate targeted follow-up experiments and integrated genomic analyses

389 for a better understanding of the functional genetics of islet (dys)function and T2D risk

390 and strongly motivate future studies of more comprehensive panels of T2D-associated

391 SNPs for their transcription-modulating effects in beta cells and other diabetes-relevant

392 cell types under steady state, stimulatory, and/or stress conditions.

393 Islet dysfunction and beta cell failure in T2D results from both genetic and

394 environmental risk factors. MPRA of sequences containing T2D-associated SNPs, when

395 tested under control and ER stress conditions, revealed the effects of these factors on ß

396 cell transcriptional activation. A significant proportion (40%; 86/220) of sequences

397 containing T2D-associated index or linked SNPs exhibiting allelic skew under control

398 conditions had significantly higher MPRA activity under ER stress, suggesting – i) MIN6

399 cells cultured under standard, high glucose (25 mM) conditions are already under a

400 basal level of ER stress, and ii) many T2D-associated SNPs overlap CREs relevant for β

401 cells to respond to ER stress. Uncompensated ER stress was found to lead to the

402 inactivation of β cell-specific TFs causing the downregulation of insulin transcription and

403 secretion50–54. ER folding and trafficking capacity may therefore be a major factor

404 determining how much insulin can be released by β cells under elevated blood glucose bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

405 levels, before stress ensues. SNPs at regulatory elements modulating the expression of

406 genes relevant to meet higher demands for insulin synthesis may impair or improve ER

407 capacity, ultimately determining the threshold at which ER stress ensues, leading to β

408 cell failure.

409 Finally, we uncovered evidence that repetitive element sequences, most notably

410 SINEs, elicited robust transcriptional activation in ß cells and that multiple T2D-

411 associated SNPs residing in these sequences modulated their transcriptional activity.

412 The MPRA results obtained in this study thus suggest that repetitive element-containing

413 sequences may play important roles in modulating stimulus and/or stress-responsive ß

414 cell transcriptional programs and remind us to consider the potentially important roles

415 that repetitive elements, and SNPs within them, may play in the genetics of islet

416 (dys)function, diabetes risk and progression. Studies over the past few years

417 demonstrating repetitive element-mediated oncogene activation and modulation of

418 chromatin structure have contributed to an emerging appreciation of the importance of

419 these sequences in epigenetic and transcriptional regulation55–63. Recently, Hernandez

420 et al found that Alu elements are transcriptionally induced by cellular stress, including

421 thermal and ER stress, and that the corresponding SINE RNAs function as critical

422 transcriptional switches during stress64. Future studies to elucidate the target genes of

423 these and other MPRA active, SINE-containing regulatory sequences, will be necessary

424 to fully understand the functional consequences of sequence variation in these

425 transcriptionally active sequences and the potential role(s) of exaptation57,65,66 in the

426 genetics of islet (dys)function and T2D.

427

428 Figure Legends

429 Figure 1. Massively Parallel Reporter Assay (MPRA) identifies beta cell

430 transcription activating sequences. (a) Features of the 6,628 two hundred base pair bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

431 (bp) elements selected for MPRA library construction and testing. 4,329 elements

432 overlap islet ATAC-seq peaks, of which 1,910 contain SNPs associated with altered in

433 vivo chromatin accessibility (caQTLs). 2,500 elements contain T2D-associated SNPs

434 (index or linked (r2>0.8)) reported in the NHGRI/EBI GWAS catalog, 201 of which

435 overlap islet ATAC-seq peaks. (b) Schematic of MPRA approach. A mock SNP,

436 rs123456789, is used as an example to show how MPRA activity and allelic skew is

437 measured for each 200 bp regulatory element. mP = minimal promoter. (c) Log2 fold-

438 change in barcode/sequence counts in plasmid input (DNA) and gfp mRNA transcripts

439 (RNA) 30 hr after transfection of MPRA library into MIN6 cells grown in standard culture

440 conditions (DMEM, 25 mM glucose). Red points indicate sequences with MPRA activity

441 (FDR<1%; n = 2224/13,628). (d) Barplot showing the odds of MPRA active sequences

442 overlapping in vivo human islet TF ChIP-seq17 peaks compared to MPRA-inactive

443 sequences. ***p<0.001; **p<0.01, Fisher’s Exact Test.

444

445 Figure 2. MPRA identifies endoplasmic reticulum (ER) stress-responsive

446 transcriptional activating sequences. (a) Experimental design to identify ER stress-

447 responsive transcription activating sequences. (b) RT-qPCR of ER stress response

448 genes (Ddit3, Hspa5, Edem1, and Hsp90b1), Ins2, and the control gene Actb in MIN6

449 cells exposed to the ER stress inducer thapsigargin (TG, 250 nM) or dimethylsulfoxide

450 (DMSO) solvent control for 24 hours. Bar plots show mean +/- SEM from three biological

451 replicates. *p<0.05; ** p<0.01; *** p<0.001, paired t-test. (c) Heatmap of sequences with

452 higher (far right column, red bar; n=328, mapping to 275 elements) or lower MPRA

453 activity (blue bar; n=656, mapping to 449 elements) under ER stress (FDR<1%). Red

454 annotation bars to the left of the heatmap indicate sequences identified as MPRA active

455 in DMSO and/or TG based on their sequence counts in RNA vs. plasmid DNA input. (d)

456 Comparison of transcription factor (TF) binding motifs enriched in elements with higher bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

457 (n=275) or lower (n=449) MPRA activity under ER stress. Red and blue dots denote TF

458 motifs significantly enriched in elements with higher or lower MPRA activity in ER

459 stressed MIN6 beta cells, respectively (FDR<1%). Yellow dots denote TF motifs with no

460 significant enrichment in either comparison.

461

462 Figure 3. ATAC-seq peak summits are enriched for MPRA activity. (a) Venn

463 diagram showing the number of elements with at least one allele identified as MPRA

464 active in standard culture, TG, and DMSO conditions (FDR<1%). Note that elements

465 identified as MPRA active in ≥1 experimental condition may still exhibit quantitative

466 differences in MPRA activity under ER stress. (b) The probability that an element has

467 significant MPRA activity (y-axis; at least 1 allele; any experimental condition) is shown

468 as a function of sequence similarity (x-axis) between human and mouse genomes

469 (expected probability is the black dotted line). Elements overlapping islet ATAC-seq

470 peaks (brown line) had a significantly higher probability of MPRA activity than elements

471 that did not overlap ATAC-seq peaks (green line), regardless of human-mouse

472 sequence similarity. ***, **, and * indicate Bonferroni-corrected Fisher’s Exact Test

473 p<0.001, 0.01, 0.05, respectively. (c) 200 bp genomic regions around human islet ATAC-

474 seq peak summits were compared to 200 bp regions also within ATAC-seq peaks, but

475 flanking the peak summit. ATAC-seq peak summits were significantly enriched for many

476 TF motifs compared to genomic regions that overlap ATAC-seq peaks but are further

477 away from the summit. (d) Elements where the SNP (center of all tested loci) was ± 100

478 bp (green dashed lines) from ATAC-seq peak summit were more likely to be identified as

479 MPRA active. This effect is restricted to SNPs within 100 bps of the ATAC-seq peak

480 summit because our MPRA library was designed to test only the 200 bp sequence on

481 either side of a given SNP of interest. ATAC-seq peak summits were therefore included

482 in the sequences tested only if the distance of the SNP was within 100 bps of the ATAC- bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

483 seq peak summit. (e) (Top) The probability of an element (at least one allele) being

484 identified as MPRA active (y-axis) binned by the number of islet donors (out of 19 total)

485 in whom the ATAC-seq peak was present (x-axis). Elements where the ATAC-seq peak

486 summit was included in the sequence tested, i.e., distance of SNP to ATAC-seq peak

487 summit was less than 100 bps (red), had a significantly higher probability of being

488 identified as MPRA active. * indicates FDR < 10%, Fisher’s Exact Test. (Bottom)

489 Stacked barplot of the number of times an ATAC-seq peak overlapping the MPRA

490 sequence tested is detected in the cohort (n = 19 islet donors).

491

492 Figure 4. SNPs closer to ATAC-seq peak summits are more likely to affect MPRA

493 activity and chromatin accessibility. (a) caQTL SNPs (blue bars) are closer to ATAC-

494 seq peak summits than control SNPs (red bars), irrespective of MPRA activity (pairwise

495 Wilcoxon tests with Bonferroni correction for multiple testing). (b) Elements containing

496 caQTL SNPs are more likely to be identified as MPRA active (any experimental

497 condition; Fisher’s Exact Test). (c) Among SNPs with at least 1 allele identified as MPRA

498 active, caQTLs are significantly more likely show allelic skew (any experimental

499 condition), suggesting polymorphisms proximal to ATAC-seq peak summits are more

500 likely to affect MPRA activity (Fisher’s Exact Test). (d) Among SNPs with allelic skew in

501 MPRA activity, caQTLs have a significantly larger difference in MPRA activity between

502 alleles than control SNPs (Wilcoxon Test). For all panels, *, **, and *** indicate adjusted

503 p <0.05, p < 0.01, and p < 0.001, respectively.

504

505 Figure 5. Alleles associated with higher chromatin accessibility also have higher

506 MPRA activity. (a) (Left) caQTL example: Islet samples GG homozygous for

507 rs17396537 (green; average ATAC-seq read counts) were associated with significantly

508 higher chromatin accessibility than samples with GC (pink) or CC (black) genotypes bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

509 (hg19 coordinates - chr2:133408787-133409072). (Right): Sequence with the reference

510 ‘G’ allele at rs17396537 displayed significantly higher MPRA activity than the alternate

511 ‘C’ allele. (b) Correlations between allelic effects on in vivo islet chromatin accessibility

512 (x-axis) and MPRA activity (y-axis) for sequences containing islet caQTL SNPs.

513 Quadrants 1 and 3 (red) correspond to SNPs where the allele associated with higher

514 chromatin accessibility also had higher MPRA activity (concordant). The number of

515 SNPs in each quadrant is indicated. Pearson R = 0.526, p value < 2.2e-16) among the

516 caQTL SNPs with significant allelic skew in MPRA activity (FDR<10%). Brackets indicate

517 points beyond the axis limits. For example, there are a total of 135 SNPs in the bottom

518 left quadrant, 2 of which are beyond the axis limits. REF=hg19 reference allele;

519 ALT=alternate allele. (c) As shown in panel b, the lead caQTL SNP rs1515555

520 demonstrated apparently discordant allelic effects between chromatin accessibility and

521 MPRA activity. Since there is another SNP, rs1515556, 24 bps from the lead caQTL

522 SNP (r2=0.99), sequences containing all four allelic combinations were tested in the

523 MPRA library. The alternate allele of rs1515555 increased MPRA activity, whereas that

524 of rs1515556 decreased MPRA activity. Ref/Ref for rs1515555 and rs1515556 exhibited

525 higher MPRA activity than the Alt/Alt combination, which is concordant with its higher

526 chromatin accessibility associated with Ref/Ref for rs1515555 and rs1515556.

527

528 Figure 6. Functional identification of T2D SNPs altering β cell regulatory element

529 activity. (a) TF binding motifs enriched at 492 elements containing T2D-associated

530 SNPs with significant MPRA activity (at least one allele). Red dots denote significantly

531 enriched TF binding motifs (FDR<1%). (b) MPRA identifies a subset of T2D-associated

532 index and genetically linked (r2≥0.8) SNPs representing 259 signals from the NHGRI/EBI

533 GWAS catalog (x-axis; log scale) that demonstrate significant allelic effects on MPRA

534 activity (y-axis) in one or more condition tested. A total of 220 SNPs in linkage bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

535 disequilibrium with 104 T2D-associated index SNPs show significant allelic effects on

536 MPRA activity (FDR<10%). Jitter in the plot is used to visually separate individual points.

537 (c) (Top). In total, 49 SNPs in LD (r2 > 0.8) with the T2D-associated index SNP

538 rs73230612 were tested with MPRA (ZBTB20 locus). Allelic skew in MPRA activity was

539 detected in the same direction across all 3 experimental conditions (standard culture,

540 DMSO and TG) at 1 SNP only, rs987964. (Bottom; zoom-in) The 200 bp genomic region

541 spanning rs987964 is specific to humans. (d) In total, 45 SNPs in LD (r2 > 0.8) with the

542 T2D-associated index SNP rs12304921 were tested with MPRA (HIGD1C locus). Allelic

543 skew in MPRA activity was detected at 10 SNPs.

544

545 Figure 7. ER stress increases MPRA activity of 40% of T2D-associated SNP

546 containing sequences with allelic skew. (a) Among the SNPs with allelic skew (any

547 condition), the heatmap includes those with at least 1 allele showing significantly higher

548 or lower MPRA activity under ER stress vs. DMSO (row annotation (right); red and blue

549 bars, respectively). For each SNP (rows), z-scores are obtained by centering and scaling

550 the normalized RNA/DNA ratios for the reference and alternate alleles (5 replicates

551 each: DMSO and TG). Annotation horizontal bars on the left indicate whether an

552 element (row) harbors a caQTL or T2D-associated SNP. Note that the majority of

553 elements with higher MPRA activity under ER stress contain T2D-associated SNPs. (b)

554 SNPs with allelic skew (any condition) are categorized as having at least 1 allele with – i)

555 lower, ii) no difference in, or iii) higher MPRA activity under ER stress vs. DMSO (x-axis).

556 SNPs in each of the above categories are further fractioned as being – i) caQTLs (red),

557 ii) control SNPs (yellow), or iii) T2D-associated (green). A significant proportion of caQTL

558 SNPs with allelic skew were significantly more likely to show reduced MPRA activity, and

559 significantly less likely to show higher MPRA activity, under ER stress. A significant

560 proportion of T2D-associated SNPs with allelic skew displayed the opposite trend: they bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

561 were significantly less likely to show reduced MPRA activity, and significantly more likely

562 to show higher MPRA activity (86/220), under ER stress. (c) Fraction of T2D-associated

563 elements or 10,000 random genomic regions overlapping SINE repeats in the human

564 genome. The T2D-associated elements are categorized based on whether the elements

565 show – i) MPRA activity, ii) Allelic skew in MPRA activity, or iii) higher MPRA activity

566 under ER stress. A significant fraction of T2D-associated elements with higher activity

567 under ER stress overlap SINEs. (d) The probability that elements containing SNPs with

568 allelic skew are conserved with sequence similarity greater than 20% (y-axis) is plotted

569 for 20 mammals whose genomes are available (x-axis). Red and yellow dots show

570 comparisons for caQTL and control SNPs, respectively. T2D-associated SNPs with

571 allelic skew were split based on whether they had significantly higher MPRA activity

572 under ER stress (n=86; purple), or no change (n=110; green). Compared to 10,000

573 random 200 bps regions from the human genome, elements overlapping caQTL and

574 control SNPs were significantly more likely, and T2D-associated elements with higher

575 MPRA activity under ER stress significantly less likely, to be conserved in non-primate

576 mammalian species (Fisher’s exact test; FDR<5%). For Panels B and C, Fisher’s Exact

577 Test p-value (corrected for multiple testing) < 0.001 is indicated as ***, ** indicates p-

578 value < 0.01, and * indicates p-value < 0.05.

579

580 Supplementary Figure Legends

581 Supplementary Figure 1: The chromatin accessibility profile of MIN6 mouse β cells

582 most resembles that of human islets. (a) ATAC-seq peaks from 9 human tissues were

583 mapped to the mouse genome (mm9) using the UCSC Genome Browser liftover tool

584 and compared to MIN6 ATAC-seq peak locations. (b) Heatmap showing the enrichment

585 of islet-specific TF motifs at human ATAC-seq peaks uniquely overlapping MIN6 ATAC-

586 seq peaks. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

587

588 Supplementary Figure 2: QC Metrics of MPRA libraries. (a) Scatter plot of the first 2

589 principal components, which together explain >99% of the variation in 5 plasmid and 5

590 RNA MPRA replicates. RNA replicates were obtained after transfection of the MPRA

591 library into MIN6 cells under standard culture conditions. (b) Pairwise Pearson

592 correlation coefficients are shown as a heatmap with unsupervised row and column

593 clustering.

594

595 Supplementary Figure 3: Human-Mouse sequence similarity of elements with

596 MPRA activity. Human-mouse sequence similarity for elements with (red) or without

597 (black) MPRA activity is plotted. Human elements with 0% sequence similarity did not

598 liftover to the mouse genome (mm9) with even 1% sequence similarity. A bootstrap

599 hypothesis test of equality did not find a statistically significant difference between

600 elements with (red) or without (black) MPRA activity for human-mouse sequence

601 similarity. The upper and lower end-points for equality is indicated as a blue reference

602 band.

603

604 Supplementary Figure 4: Enrichment of islet-specific TF binding motifs at islet

605 ATAC-seq peak summits. Islet ATAC-seq peak summits are significantly enriched for

606 many islet-specific TF motifs compared to genomic regions that overlap ATAC-seq

607 peaks but are further away from the summit.

608

609 Supplementary Figure 5: Allelic skew in MPRA activity for 879 SNPs across the 3

610 experimental conditions. (Top) Venn diagrams showing the number of SNPs with

611 allelic skew in MPRA activity (FDR<10%) in each condition. (Bottom) Scatter plot of log

612 fold change in MPRA activity for SNPs with allelic skew in 2 experimental conditions. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

613 SNPs for which comparisons are made are indicated as pink in the corresponding Venn

614 diagram above each scatter plot.

615

616 Supplementary Figure 6: The majority of elements overlapping SINEs harbor T2D-

617 associated SNPs. (a) Barplot showing the odds of MPRA active sequences (at least

618 one allele; any experimental condition) overlapping long interspersed nuclear element

619 (LINE), long terminal repeat (LTR), or short interspersed nuclear element (SINE)

620 repetitive element classes. ***p< 0.001, Fisher’s Exact test. (b) Fraction of elements

621 containing i) caQTL, ii) control, or iii) T2D-associated SNPs that overlap SINEs. 10,000

622 random genomic regions from hg19 are included for context. (c) Stacked barplot

623 showing the fraction of elements in the MPRA library overlapping SINEs in the human

624 genome.

625

626

627

628 Methods

629 MPRA library design: 200 base pair sequences, with 100 bps flanking each side of

630 6628 SNPs were included in our MPRA library. The SNPs belong to 3 categories:

631 1. Islet chromatin accessibility quantitative trait loci (caQTLs): These are SNPs we

632 previously identified as having a significant association with altered in vivo

633 chromatin accessibility in islet samples11. For 1806 caQTLs, only the lead SNP

634 was included in the MPRA library. For 94 caQTLs, the 200-bp sequences

635 contained two SNPs in LD less than 25 bp apart. For these sequences, all four

636 allelic combinations were synthesized and tested.

637 2. Control SNPs: SNPs that overlapped islet ATAC-seq peaks but did not

638 significantly alter accessibility of those peaks were also synthesized and tested11. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

639 Since islet caQTLs were identified in a relatively small cohort of individuals

640 (n=19), the following criteria were used to include SNPs for which the caQTL

641 study was likely more powered to detect associations with chromatin

642 accessibility:

643 a. unadjusted p value > 0.2,

644 b. minor allele frequency > 0.125

645 SNPs overlapping individual-specific peaks or sharing peaks with other SNPs

646 were further filtered out. Of the remaining 15,178 SNPs, 2218 were randomly

647 selected to be included in the MPRA library.

648 3. T2D-associated SNPs/indels: 200 bp sequences overlapping SNPs (n=2299),

649 small insertions (n=72) and small deletions (n=129) in linkage disequilibrium

650 (r2>0.8) with T2D-associated index SNPs (n=259) from the NHGRI/EBI GWAS

651 Catalog (accessed January 19th, 2017) were synthesized and tested, as

652 previously described67. Briefly, T2D-associated GWAS SNPs were pruned using

653 PLINK version 1.968 to identify SNPs in high linkage disequilibrium (r2>0.8).

654

655 The vast majority of SNPs and sequences tested belonged to only 1 of the 3

656 categories. However, T2D-associated SNPs overlapping 13 ATAC-seq peaks were

657 significantly associated with chromatin accessibility in islets (caQTLs). Therefore, for

658 analysis purposes, whenever SNPs were required to belong to only 1 of the 3 categories

659 above (such as Figures 1A, 7A and 7B), they were not categorized as caQTLs, but as

660 being T2D-associated only.

661

662 MPRA library construction: The MPRA library was constructed as previously

663 described13. Briefly, oligos were synthesized (Agilent Technologies) as 230 bp

664 sequences containing 200 bp of genomic sequences and 15 bp of adaptor sequence on bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

665 either end. Unique 20 bp barcodes were added by PCR along with additional constant

666 sequence for subsequent incorporation into a backbone vector by Gibson assembly. The

667 oligo library was expanded by electroporation into E. coli, and the resulting plasmid

668 library was sequenced by Illumina 2 X 150 bp chemistry to acquire oligo-barcode

669 pairings. The library underwent restriction digestion, and GFP with a minimal TATA

670 promoter was inserted by Gibson assembly resulting in the 200 bp oligo sequence

671 positioned directly upstream of the promoter and the 20 bp barcode falling in the 3’ UTR

672 of GFP. After expansion within E. coli the final MPRA plasmid library was sequenced by

673 Illumina 1 X 31 bp chemistry to acquire a baseline representation of each oligo-barcode

674 pair within the library. Barcodes mapping to more than 1 sequence were discarded from

675 all downstream analyses. Note: 2 separate batches of the MPRA library were prepared.

676 The first batch was used to perform MPRA under standard culture conditions. This

677 MPRA library was then electroporated into E. coli to obtain a second batch of the MRPA

678 library, which was used for the DMSO-TG experiments.

679

680 MPRA library transfection into MIN6 cells: 10 million MIN6 cells were seeded in each

681 of seven 15 cm2 dishes. The cells were 60-70% confluent the next day. Each 15 cm2

682 dish was replaced with 20 ml of fresh media, and transfected with 7ug of the MPRA

683 plasmid library using 55ul Lipofectamine 2000 (38% transfection efficiency). Six hours

684 after transfection, media was either i) not changed (MPRA under standard culture

685 conditions), ii) replaced with media containing 250 nM Thapsigargin dissolved in 0.025%

686 DMSO, or iii) replaced with media containing 0.025% DMSO. Thirty hours after

687 transfection, cells were trypsinized and collected by centrifugation. Cell pellets

688 were frozen at -80°C. For each condition (standard culture / DMSO / 250 nM TG), MIN6

689 cells were transfected on five separate days to generate biological replicates.

690 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

691 RNA isolation and MPRA RNA-seq library generation: RNA was extracted from

692 frozen cell pellets using the Qiagen RNeasy Midi kit. Following DNase treatment, a

693 mixture of 3 GFP-specific biotinylated primers (Supplementary Table 1; #120, #123 and

694 #126) were used to immunoprecipitated GFP transcripts using Streptavidin C1

695 Dynabeads (Life Technologies). Following another round of DNase treatment, cDNA

696 was synthesized from GFP mRNA using SuperScript IV and purified with AMPure XP

697 beads. Quantitative PCR using primers specific for GFP (Supplementary Table 1; #34

698 and #52) was used to determine the cycle at which linear amplification begins for each

699 replicate. Replicates were diluted to approximately the same concentration based on the

700 qPCR results, and PCR with primers #34 and #52 was used to amplify barcodes

701 associated with the ~13.5k sequences included in the MRPA library for each replicate (9

702 cycles for standard culture, and 13 cycles for DMSO / 250 nM TG). A second round of

703 PCR (6 cycles) was used to add Illumina sequencing adaptors to the DNA/RNA

704 replicates. The resulting MPRA barcode libraries were spiked with 5% PhiX and

705 sequenced using Illumina single-end 31 bp chemistry (with 8 bp index read), clustered at

706 80-90% maximum density.

707

708 MPRA data analysis: Data from the MPRA was analyzed as previously described13.

709 Briefly, the sum of the barcode counts for each oligo within replicates was median

710 normalized, and oligos showing differential expression relative to the plasmid input were

711 identified by modeling a negative binomial distribution with DESeq2 and applying a false

712 discovery rate (FDR) threshold of 1%. For sequences that displayed significant MPRA

713 activity, a paired t-test was applied on the log-transformed RNA/plasmid ratios for each

714 experimental replicate to test whether the reference and alternate allele had similar

715 activity. An FDR threshold of 10% was used to identify SNPs with a significant skew in

716 MPRA activity between alleles (allelic skew). Because the MPRA testing standard bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

717 culture conditions was performed with a separate MRPA library preparation. Therefore,

718 the DMSO-TG MPRA results were not directly compared to MPRA performed under

719 standard culture conditions.

720

721 Annotating repetitive elements tested with MPRA: The ‘RepeatMasker’ track for

722 hg19 was downloaded from the UCSC genome browser. Among the ten different

723 classes of repeats, only three classes (long interspersed nuclear element (LINE), long

724 terminal repeat (LTR), and short interspersed nuclear element (SINE)) overlapped more

725 than 100 elements tested with MPRA. Therefore, only these three classes of repeats

726 were assessed for associations with MRPA activity.

727

728 Mapping human regulatory sequences tested with MPRA to mammalian genomes:

729 Liftover tool in the UCSC genome browser was used to map human sequences (hg19)

730 tested with MPRA to 20 mammalian genomes (with a minimum ratio of 0.20 bases that

731 must remap; allowing for multiple output regions). The 20 mammalian genomes are:

732 papAnu2 (Baboon), felCat5 (Cat), PanTro6 (Chimpanzee), BosTau7 (Cow), canFAM3

733 (Dog), loxAfr3 (Elephant), nomLeu3 (Gibbon), gorGor3 (Gorilla), equCab2 (Horse), mm9

734 (mouse), ponAbe2 (Orangutan), aiMel1 (Panda), susScr11 (Pig), ochPri3 (Pika),

735 oryCun2 (Rabbit), rn5 (Rat), rheMac8 (Rhesus), oviAri3 (Sheep), sorAra2 (Shrew),

736 speTri2 (Squirrel). Human sequences that did not liftover to the genome assembly of a

737 given species were subsequently classified as not conserved (with a minimum ratio of

738 0.20 bases that must remap; allowing for multiple output regions).

739 To obtain human-mouse sequence similarity measures, liftover was performed

740 99 times with the minimum ratio of bases that must remap ranging from 0.01 to 1.00 in

741 increments of 0.01 (allowing for multiple output regions). The R package ‘sm’ was used

742 plot density of human-mouse sequence similarity and perform non-parametric bootstrap bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

743 hypothesis tests of equality. Human sequences that did not liftover to the mm9 mouse

744 genome with even 1% sequence similarity were classified as having 0% sequence

745 similarity.

746

747 Cross-species mapping of human ATAC-seq peaks to MIN6 ATAC-seq peaks:

748 Human and mouse ATAC-seq data were processed as previously described11. Briefly,

749 low quality portions of reads were trimmed using Trimmomatic70 and aligned to the hg19

750 or mm9 genome assembly using Burrows Wheeler Aligner-MEM. For each replicate,

751 duplicate reads were removed after shifting. Technical replicates were merged using

752 SAMtools and peaks were called using MACS26 (with parameters -callpeak --nomodel -f

753 BAMPE) at FDR<1%. ATAC-seq peak summit positions were obtained from MACS2

754 output files. The liftover tool in the UCSC genome browser was used to map human

755 ATAC-seq peaks to the mouse (mm9) genome using a minimum ratio of 0.10 bases that

756 must remap (not allowing for multiple output regions). Using bedtools, human ATAC-seq

757 peaks mapping to the mouse genome were then overlapped with MIN6 ATAC-seq

758 peaks.

759

760 Transcription factor (TF) motif enrichment: Homer findMotifsGenome.pl script was

761 used to investigate TF motifs enriched in a given set of sequences. Elements with lower

762 MPRA activity under ER stress were used as background to identify TF motifs enriched

763 in elements with higher MPRA activity under ER stress, and vice-versa (parameters:

764 hg19, -size given). 2,008 T2D-associated elements with no MPRA activity were used as

765 background to identify TF motifs enriched in the 492 T2D SNP-containing sequences

766 with significant MPRA activity (parameters: hg19, -size given). For the cross-species

767 analysis of ATAC-seq peaks, ATAC-seq peaks shared with other human cell types were bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

768 used as background to identify TF motifs enriched in unique ATAC-seq peaks

769 (parameters: mm9, -size given).

770

771 Analysis of islet ChIP-seq data: Chromatin immunoprecipitation sequencing (ChIP-

772 seq) data from Pasquali et al.17 were aligned to the hg19 reference human genome as

773 previously described44. Elements tested with MPRA were then overlapped with ChIP-seq

774 peaks to conduct Fisher’s exact tests using R.

775

776 CentriMo analysis of ATAC-seq peak summits: CentriMo software in the MEME Suite

777 was used to identify TF motifs enriched in the 100 bp genomic regions flanking human

778 islet ATAC-seq peak summits71. Negative controls were 200 bp genomic regions

779 overlapping ATAC-seq peaks but flanking the human islet ATAC-seq peak summits

780 (default parameters).

781

782 Acknowledgements. We thank members of the Stitzel and Ucar labs for helpful

783 discussion and critiques during study design and execution and Francis S. Collins, D.

784 Leland Taylor, Christine Beck and Taneli Helenius for helpful comments on the

785 manuscript. This work was supported by W81XWH-18-0401 (to MLS and DU) and

786 R00HG008179 (to RT).

787

788 References

789 1. 1000 Genomes Project Consortium et al. A global reference for human genetic

790 variation. Nature 526, 68–74 (2015).

791 2. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide

792 association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

793 3. Mahajan, A. et al. Fine-mapping of an expanded set of type 2 diabetes loci to

794 single-variant resolution using high-density imputation and islet-specific

795 epigenome maps. (2018) doi:10.1101/245506.

796 4. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536,

797 41–47 (2016).

798 5. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J.

799 Transposition of native chromatin for fast and sensitive epigenomic profiling of

800 open chromatin, DNA-binding and nucleosome position. Nat Meth 10,

801 1213–1218 (2013).

802 6. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137

803 (2008).

804 7. Wang, X. et al. High-resolution genome-wide functional dissection of

805 transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9,

806 5380 (2018).

807 8. Movva, R. et al. Deciphering regulatory DNA sequences and noncoding genetic

808 variants using neural network models of massively parallel reporter assays. PloS

809 One 14, e0218073 (2019).

810 9. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human

811 expression variation. Nature 482, 390–394 (2012).

812 10. Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with

813 RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).

814 11. Khetan, S. et al. Type 2 Diabetes-Associated Genetic Variants Regulate Chromatin

815 Accessibility in Human Islets. Diabetes 67, 2466–2477 (2018). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

816 12. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers

817 in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30,

818 271–277 (2012).

819 13. Tewhey, R. et al. Direct Identification of Hundreds of Expression-Modulating

820 Variants using a Multiplexed Reporter Assay. Cell 165, 1519–1529 (2016).

821 14. Ulirsch, J. C. et al. Systematic Functional Dissection of Common Genetic Variation

822 Affecting Red Blood Cell Traits. Cell 165, 1530–1545 (2016).

823 15. Vockley, C. M. et al. Massively parallel quantification of the regulatory effects of

824 noncoding genetic variation in a human cohort. Genome Res. 25, 1206–1214

825 (2015).

826 16. Klein, J. C. et al. Functional testing of thousands of osteoarthritis-associated

827 variants for regulatory activity. Nat. Commun. 10, 2434 (2019).

828 17. Pasquali, L. et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes

829 risk-associated variants. Nat. Genet. 46, 136–143 (2014).

830 18. Parker, S. C. J. et al. Chromatin stretch enhancer states drive cell-specific gene

831 regulation and harbor human disease risk variants. Proc. Natl. Acad. Sci. U. S. A.

832 110, 17921–17926 (2013).

833 19. Roman, T. S. et al. A Type 2 Diabetes-Associated Functional Regulatory Variant in

834 a Pancreatic Islet Enhancer at the Adcy5 Locus. Diabetes (2017)

835 doi:10.2337/db17-0464.

836 20. Kycia, I. et al. A common type 2 diabetes risk variant potentiates activity of an

837 evolutionarily conserved islet stretch enhancer and increases C2CD4A/B

838 expression. Am. J. Hum. Genet. in press, (2018). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

839 21. Varshney, A. et al. Genetic regulatory signatures underlying islet gene expression

840 and type 2 diabetes. Proc. Natl. Acad. Sci. U. S. A. (2017)

841 doi:10.1073/pnas.1621192114.

842 22. Sharma, R. B. et al. Insulin demand regulates β cell number via the unfolded

843 protein response. J. Clin. Invest. 125, 3831–3846 (2015).

844 23. Dooley, J. et al. Genetic predisposition for beta cell fragility underlies type 1 and

845 type 2 diabetes. Nat. Genet. 48, 519–527 (2016).

846 24. Bonnycastle, L. L. et al. Autosomal dominant diabetes arising from a Wolfram

847 syndrome 1 mutation. Diabetes 62, 3943–3950 (2013).

848 25. Delépine, M. et al. EIF2AK3, encoding translation initiation factor 2-alpha kinase

849 3, is mutated in patients with Wolcott-Rallison syndrome. Nat. Genet. 25, 406–

850 409 (2000).

851 26. Booth, C. & Koch, G. L. Perturbation of cellular calcium induces secretion of

852 luminal ER proteins. Cell 59, 729–737 (1989).

853 27. Sehgal, P. et al. Inhibition of the sarco/endoplasmic reticulum (ER) Ca2+-ATPase

854 by thapsigargin analogs induces cell death via ER Ca2+ depletion and the

855 unfolded protein response. J. Biol. Chem. 292, 19656–19673 (2017).

856 28. Stone, S. et al. Pancreatic stone protein/regenerating protein is a potential

857 biomarker for endoplasmic reticulum stress in beta cells. Sci. Rep. 9, 5199

858 (2019).

859 29. Robbins, R. D. et al. Inhibition of deoxyhypusine synthase enhances islet {beta}

860 cell function and survival in the setting of endoplasmic reticulum stress and type

861 2 diabetes. J. Biol. Chem. 285, 39943–39952 (2010). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

862 30. Cunha, D. A. et al. Pancreatic β-cell protection from inflammatory stress by the

863 endoplasmic reticulum proteins thrombospondin 1 and mesencephalic

864 astrocyte-derived neutrotrophic factor (MANF). J. Biol. Chem. 292, 14977–14988

865 (2017).

866 31. Syed, I. et al. PAHSAs attenuate immune responses and promote β cell survival in

867 autoimmune diabetic mice. J. Clin. Invest. 129, 3717–3731 (2019).

868 32. Chen, Y.-C. et al. PAM haploinsufficiency does not accelerate the development of

869 diet- and human IAPP-induced diabetes in mice. Diabetologia 63, 561–576

870 (2020).

871 33. Suetomi, R. et al. Adrenomedullin has a cytoprotective role against ER stress for

872 pancreatic ß-cells in autocrine and paracrine manners. J. Diabetes Investig.

873 (2020) doi:10.1111/jdi.13218.

874 34. Gao, N. et al. Foxa1 and Foxa2 maintain the metabolic and secretory features of

875 the mature beta-cell. Mol. Endocrinol. Baltim. Md 24, 1594–1604 (2010).

876 35. Fu, Z., Gilbert, E. R. & Liu, D. Regulation of insulin synthesis and secretion and

877 pancreatic Beta-cell dysfunction in diabetes. Curr. Diabetes Rev. 9, 25–53 (2013).

878 36. Cullinan, S. B. & Diehl, J. A. Coordination of ER and oxidative stress signaling: the

879 PERK/Nrf2 signaling pathway. Int. J. Biochem. Cell Biol. 38, 317–332 (2006).

880 37. Oyadomari, S. & Mori, M. Roles of CHOP/GADD153 in endoplasmic reticulum

881 stress. Cell Death Differ. 11, 381–389 (2004).

882 38. Efanov, A. M., Sewing, S., Bokvist, K. & Gromada, J. Liver X receptor activation

883 stimulates insulin secretion via modulation of glucose and lipid metabolism in

884 pancreatic beta-cells. Diabetes 53 Suppl 3, S75-78 (2004). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

885 39. Ogihara, T. et al. Liver X receptor agonists augment human islet function through

886 activation of anaplerotic pathways and glycerolipid/free fatty acid cycling. J. Biol.

887 Chem. 285, 5392–5404 (2010).

888 40. Aguayo-Mazzucato, C. et al. Thyroid hormone promotes postnatal rat pancreatic

889 β-cell development and glucose-responsive insulin secretion through MAFA.

890 Diabetes 62, 1569–1580 (2013).

891 41. Aguayo-Mazzucato, C. et al. T3 Induces Both Markers of Maturation and Aging in

892 Pancreatic β-Cells. Diabetes 67, 1322–1331 (2018).

893 42. Brun, P.-J. et al. Retinoic acid receptor signaling is required to maintain glucose-

894 stimulated insulin secretion and β-cell mass. FASEB J. Off. Publ. Fed. Am. Soc. Exp.

895 Biol. 29, 671–683 (2015).

896 43. Peiris, H. et al. Discovering human diabetes-risk gene function with genetics and

897 physiological assays. Nat. Commun. 9, 3855 (2018).

898 44. Stitzel, M. L. et al. Global epigenomic analysis of primary human pancreatic islets

899 provides insights into type 2 diabetes susceptibility loci. Cell Metab. 12, 443–455

900 (2010).

901 45. Gaulton, K. J. et al. A map of open chromatin in human pancreatic islets. Nat.

902 Genet. 42, 255–259 (2010).

903 46. Fogarty, M. P., Panhuis, T. M., Vadlamudi, S., Buchkovich, M. L. & Mohlke, K. L.

904 Allele-specific transcriptional activity at type 2 diabetes-associated single

905 nucleotide polymorphisms in regions of pancreatic islet open chromatin at the

906 JAZF1 locus. Diabetes 62, 1756–1762 (2013). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

907 47. Greenwald, W. W. et al. Pancreatic islet chromatin accessibility and

908 conformation reveals distal enhancer networks of type 2 diabetes risk. Nat.

909 Commun. 10, 2078 (2019).

910 48. Miguel-Escalada, I. et al. Human pancreatic islet three-dimensional chromatin

911 architecture provides insights into the genetics of type 2 diabetes. Nat. Genet. 51,

912 1137–1148 (2019).

913 49. Deininger, P. Alu elements: know the SINEs. Genome Biol. 12, 236 (2011).

914 50. Kono, T. et al. Impaired Store-Operated Calcium Entry and STIM1 Loss Lead to

915 Reduced Insulin Secretion and Increased Endoplasmic Reticulum Stress in the

916 Diabetic β-Cell. Diabetes 67, 2293–2304 (2018).

917 51. Ghiasi, S. M. et al. Endoplasmic Reticulum Chaperone Glucose-Regulated Protein

918 94 Is Essential for Proinsulin Handling. Diabetes 68, 747–760 (2019).

919 52. Xin, Y. et al. Pseudotime Ordering of Single Human β-Cells Reveals States of

920 Insulin Production and Unfolded Protein Response. Diabetes 67, 1783–1794

921 (2018).

922 53. Kim, M. J. et al. Attenuation of PERK enhances glucose-stimulated insulin

923 secretion in islets. J. Endocrinol. 236, 125–136 (2018).

924 54. Hasnain, S. Z., Prins, J. B. & McGuckin, M. A. Oxidative and endoplasmic reticulum

925 stress in β-cell dysfunction in diabetes. J. Mol. Endocrinol. 56, R33-54 (2016).

926 55. Pehrsson, E. C., Choudhary, M. N. K., Sundaram, V. & Wang, T. The epigenomic

927 landscape of transposable elements across normal human development and

928 anatomy. Nat. Commun. 10, 5640 (2019). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

929 56. Su, M., Han, D., Boyd-Kirkup, J., Yu, X. & Han, J.-D. J. Evolution of Alu elements

930 toward enhancers. Cell Rep. 7, 376–385 (2014).

931 57. Jang, H. S. et al. Transposable elements drive widespread expression of

932 oncogenes in human cancers. Nat. Genet. 51, 611–617 (2019).

933 58. Franchini, L. F. et al. Convergent evolution of two mammalian neuronal

934 enhancers by sequential exaptation of unrelated retroposons. Proc. Natl. Acad.

935 Sci. U. S. A. 108, 15270–15275 (2011).

936 59. Choudhary, M. N. et al. Co-opted transposons help perpetuate conserved higher-

937 order chromosomal structures. Genome Biol. 21, 16 (2020).

938 60. Sundaram, V. et al. Widespread contribution of transposable elements to the

939 innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).

940 61. Xie, M. et al. DNA hypomethylation within specific transposable element families

941 associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841

942 (2013).

943 62. Sundaram, V. et al. Functional cis-regulatory modules encoded by mouse-specific

944 endogenous retrovirus. Nat. Commun. 8, 14550 (2017).

945 63. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter

946 profiling reveals widespread alternative promoter usage and transposon-driven

947 developmental gene expression. Genome Res. 23, 169–180 (2013).

948 64. Hernandez, A. J. et al. B2 and ALU retrotransposons are self-cleaving ribozymes

949 whose activity is enhanced by EZH2. Proc. Natl. Acad. Sci. U. S. A. 117, 415–425

950 (2020). bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

951 65. Bejerano, G. et al. A distal enhancer and an ultraconserved exon are derived

952 from a novel retroposon. Nature 441, 87–90 (2006).

953 66. Mita, P. & Boeke, J. D. How retrotransposons shape genome regulation. Curr.

954 Opin. Genet. Dev. 37, 90–100 (2016).

955 67. Lawlor, N. et al. Multiomic Profiling Identifies cis-Regulatory Networks

956 Underlying Human Pancreatic β Cell Identity and Function. Cell Rep. 26, 788-

957 801.e6 (2019).

958 68. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-

959 based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

960 69. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and

961 dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

962 70. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for

963 Illumina sequence data. Bioinforma. Oxf. Engl. 30, 2114–2120 (2014).

964 71. Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic

965 Acids Res. 40, e128 (2012).

966 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1 a 1.00 b AA Allele-A

RNA = DNA Allele-A Overlap caQTLs not Enhancer

Islet RNA DNA (n=1910) ATAC-seq rs123456789-allele-A mP GFP Barcode-A Peak

0.75 >

AA AA AA Allele-B RNA > DNA AA Allele-B control SNPs AA is Enhancer AA 0.50 (n=2218) RNA DNA

Fraction rs123456789-allele-B mP GFP Barcode-B

Transfect Plasmid Pool MIN6 beta cells (DNA) 0.25 T2D-Associated n=201 SNPs/Indels 2 rs123456789-allele-A mP GFP Barcode-A Sequences (n=2500) 1 (259 Signals) Example SNP Element rs123456789-allele-B mP GFP Barcode-B rs123456789 200 bps 20 bps 0.00 c d *** 10.0 *** *** 7.5 *** ** ** *** 5.0 ( RNA / DNA ) 2 2.5 log

0.0 Odds of MRPA active sequences overlapping human islet ChIP-seq peaks 0.0 0.5 1.0 1.5 2.0 2.5 −2.5 PDX1 CTCF MAFB H2A.Z

0 2000 4000 6000 FOXA2 NKX6.1

Mean DNA Count H3K27ac bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2 a c MPRA MPRA activity

Plasmid Pool 2 1 0 −1 −2 Active Lower in TG (n=656) rs123456789-allele-A mP GFP Barcode-A Higher in TG (n=328)

rs123456789-allele-B mP GFP Barcode-B 6,628 SNPs 13,628 sequences

Transfect MIN6 mouse beta cells Standard Culture 6 hours

24 hrs 24 hrs

0.025% DMSO 250 nM solvent control Thapsigargin (TG) Replicates 1-5 Replicates 1-5 Replicates 1-5

Harvest cells for RNA TG - - -RNA - -RNA- 30 hrs after transfection DNA

DMSO DMSO TG b d 5.0 6

CEBP Atf4 4 Chop 2.5 LXRE 2 THRa

ed to Gapdh Mef2a

(TG / DMSO) 0.0 2 n−Myc

mali z 0 log ichment) TG-DMSO Nor CRE CLOCK −2 MafA (Enr 2 −2.5 RXR

log Foxa2 Rfx1 −4 ns X−box ns * *** ** * TFE3 USF1 PAX6 Hnf1 −5.0 Actb Ins2 Hspa5 Edem1 −5.0 −2.5 0.0 2.5 5.0 Hsp90b1 log (Enrichment) DMSO-TG Ddit3/Chop 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a c e Figure 3 Elements tested: JUN p=7.8e-169 TG Model FOS p=2.1e-162 Included ATAC-seq 106 BATF p=2.0e-124 ATAC-seq peak summit (<100 bp) * 129 NF2L2 p=2.2e-116 * * peak NFE2 p=1.6e-107 Excluded ATAC-seq BACH2 p=7.9e-99 peak summit (>100 bp) 81 CREB1 p=5.8e-87 # Active CEBPZ p=1.5e-62 # Elements * * *

Summit Flank NFYB p=1.9e-49 = * 0.0080 949 Summit * * * 0.0070 Flank * n.s. 375 * 153 0.0060 * n.s. * * 0.0050 * n.s.

25mM Probability 0.0040 Glucose 190 DMSO 0.0030 Standard Culture -80 -60 -40 -20 0 20 40 60 80 MPRA activity Distance from sequence center b d 0.2 0.3 0.4 0.5 0.6 0.7

Overlap Islet ATAC-seq peak (n=4329) Elements overlapping ATAC-seq peaks are: Probability of an elemet showing

All elements tested MPRA 0.1 Elements tested: # Active Doesn’t Overlap ATAC-seq peak (n=2299) 0.004 Active # Elements *** (n=1553) Included ATAC-seq = peak summit (<100 bp) *** * Not *** *** Excluded ATAC-seq *** MPRA peak summit (>100 bp) Active

Density (n=2666) # Elements 0 200 400 600 1 35 7 9 11 1315 17 19

MPRA activity 19 19 19 19 19 19 19 19 19 19 0.0 0.2 0.4

0.0 # Islets with overlapping ATAC-seq Peak # Islets −300 −200 −100 0 100 200 300 < 20 % >80 % Probability of an elemet showing 20-40 % 40-60 % 60-80 % Median distance from islet ATAC-seq Human-mouse sequence conservation peak summit t o sequence center bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 4 a b c d ** *** *** ***

** # Allelic Skew # Allelic # MPRA Active = ** # Elements # MPRA Active = (Higher Allele / Lower Allele) 2 has allelic skew in activity 0.1 0.2 0.3 0.4 0.5 0.6 0.7 showing MPRA activity Absolute (median) distance of SNP to ATAC-seq peak summit ATAC-seq of SNP to (median) distance Absolute 0 100 200 300 400 500 Probability an MPRA active element Probability of an element 707 778 283 376 MPRA Activity log 2218 1910 707 778 0.0 0.1 0.2 0.3 0.4 0.5 Control SNPs + + 0.0 0.1 0.2 0.3 0.4 0.5 caQTL SNPs + + MPRA Active + + Control SNPs Control SNPs Control(n=283) SNPs (n=376) caQTL SNPs caQTL SNPs caQTL SNPs bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5 a * 0.4

rs17396537 0.2

GG n=6 (RNA/DNA) 2 GC n=11 log CC n=2 0.0

ATAC-seq Peak Summit LYPD1

200 bp G C rs17396537 (G/C) element REF ALT b Pearson R = 0.526 *** c *

1 N=24[+2] N=111 * * * Ref < Alt *

rs1515555 (DMSO) rs2943656 0 rs17396537 IRS1 (RNA/DNA) 2 0.2 0.3 0.4 0.5 0.6 log MPRA log2 (Alt / Ref)

rs10428126 0.1 IGF2BP2 n.s

−1 N=133[+2] N=23[+2] Ref > Alt rs1515555 G G A A 0.2 0.3 0.4 0.5 0.6 0.7 0.8 rs1515556 T C T C Ref > Alt caQTL effect size Ref < Alt REF/REF ALT/ALT bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 6 a c 60 LXRE rs9879646

GSC Atf1

THRa EAR2 40 COUP−TFII Std. Culture

value) CRX (25mM Glu)

RARa (allelic skew) DMSO THRb Usf2 Bcl11a Sp2 TG −log10(p KLF5 Sp5

Sp1 0 0.05 0.10 0.15 20 Mef2b log2 (minor allele / major allele) Mef2c KLF14 Mef2a Chop ZBTB20 Nkx3.1 USF1 RAR:RXR Pitx1 NRF rs9879646 200 bp NFkB−p65−Rel Rhesus Mouse 0 Dog −2 0 2 Elephant log2(# Target / # Background) Opossum

b 10 d HIGD1C rs143582379

9 PCNXL2 0.2

8 INTS8 HMG20A

7 0.0 6 5 ke w in MPRA activity IGF2BP2 rs4768945 JAZF1 rs12315928 4 CDKN2B rs4768942 VEGFA 3 rs4768941 MAP3K1 rs12316856 TCF7L2 2 rs224587 −0.4 −0.2 # SNPs with allelic s rs11169627 log2 (minor allele / major allele) (allelic skew) rs73090647 1 ZBTB20 rs73310020 0 HIGD1C SLC11A2 0 1 2 3 4 5 6 Rhesus Mouse log2(# SNPs tested) bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a b c Figure 7 MPRA activity *** z-score of RNA/DNA 1.00 under ER stress *** *** (n=134) (n=226) (n=16) *** 4 2 0 −2 −4 Lower n.s. caQTLsT2D Higher * 0.75 (n=31)

* *** 0.50 10,000 *** (n=185) random n.s. regions

n.s. SNPs overlapping SINE Repeats 0.25 (n=67) Fraction of elements with T2D-associated 0.0 0.2 0.4 0.6 0.8 1.0

** *** 0.00 (n=24) (n=110) (n=86) No MPRA Activity - + + + + Lower Higher change Allelic Skew - - + - + Change in MPRA activity under Higher Activity - - - + + ER stress of SNPs with allelic Skew under ER stress

d SNPs with Allelic skew caQTLs (n=376) Controls (n=283)

rs224587

rs73090647

10,000 random rs9879646 200 bps Enhancer Activity under ER stress of regions T2D-associated SNPs with Allelic skew from hg19 No change

rs12316856 Probability SNPs with allelic skew are conserved Higher (n=86) 0.0 0.2(n=110) 0.4 0.6 0.8 1.0 12345123451234512345 Replicate ---- (Sequences are conservation if sequence similarity > 20%) Pig Rat Cat Pika Dog Ref Alt Ref Alt Allele Cow Horse Panda Shrew Sheep Gorilla Rabbit Mouse Rhesus

- --- Gibbon Squirrel DMSO TG Treatment Baboon Elephant Orangutan Chimpamzee bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

IndexSNP Locus SNP REF_ALLELE ALT_ALLELE MAF_AFR MAF_AMR rs7903146 TCF7L2 rs7903146 C T 0.28 0.25 rs9472138 VEGFA rs12189774 G A 0.19 0.21 rs3842770 INS-IGF2 rs115420895 G A 0.06 0.01 rs7578326 IRS1 rs2943656 A G 0.6 0.73 rs10507349 RNF6 rs4630391 C T 0.08 0.36 rs1535500 KCNK16 rs12663159 C A 0.86 0.49 rs864745 rs1635852 T C 0.22 0.4 JAZF1 rs864745 rs849133 C T 0.21 0.4 rs10811660 G A 0.05 0.14 rs10811661 CDKN2B-AS1 rs10811661 T C 0.05 0.14 rs10965246 T C 0.07 0.14 rs10428126 T C 0.6 0.29 rs9808924 G A 0.8 0.32 rs1470579 IGF2BP2 rs9854769 A G 0.8 0.31 rs7630554 A G 0.79 0.31 rs35188816 CA C 0.78 0.31 rs1048886 SDHAF4 rs2881632 C A 0.4 0.22 rs12496230 KBTBD8 rs13096599 T C 0.02 0.2 rs6467136 GCC1 rs7783500 G A 0.54 0.4 rs10923931 NOTCH2 rs72697237 A T 0.36 0.12 rs7578597 THADA rs13405776 C T 0.34 0.07 rs7630877 PEX5L rs17748864 C T 0.24 0.27 rs7957197 OASL rs7957197 T A 0.17 0.15 rs34669198 SRBD1 rs13026123 T A 0.09 0.09 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Allele with higher MPRA activity under MAF_ASN MAF_EUR Standard Culture DMSO 250 nM TG 0.03 0.31 - T - 0.11 0.29 - G G 0 0.0013 A - - 0.94 0.62 - G G 0.44 0.24 C C C 0.43 0.49 A - - 0.21 0.54 - C - 0.21 0.53 C C - 0.44 0.16 - G - 0.44 0.16 - C - 0.44 0.17 C C - 0.28 0.29 T T - 0.29 0.29 A - - 0.29 0.29 G G G 0.29 0.28 G G G 0.28 0.27 - CA - 0.12 0.19 A A A 0.08 0.18 C C C 0.67 0.42 G G G 0.03 0.11 T T T 0.01 0.09 T T T 0.15 0.34 T T T 0 0.2 T T T 0.1 0.13 T T T bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

References Comments Stitzel 2010, Gaulton 2010 Validated with MPRA Miguel-Escalada 2019 Validated with MPRA Khetan 2018 Validated with MPRA Khetan 2018 Validated with MPRA Khetan 2018 MPRA has further clarified haplotype Varshney 2017 MPRA has further clarified haplotype Fogarty 2013 Validated with MPRA Novel Wesolowska-Andersen 2019, bioRxiv Validated with MPRA Khetan 2018 Validated with MPRA Novel WW Greenwald 2019 Validated with MPRA Novel Novel Novel Novel Novel Novel Novel Novel Novel Novel Novel Novel