bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 Functional characterization of thousands of type 2 diabetes-associated and
2 chromatin-modulating variants under steady state and endoplasmic reticulum
3 stress
4
5 Shubham Khetan1,3, Susan Kales2, Romy Kursawe1, Alexandria Jillette1, Steven K.
6 Reilly4, Duygu Ucar1,3,5, Ryan Tewhey2,6,7*, and Michael L. Stitzel1,3,5*
7
8 1The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
9 2The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, 04609, USA
10 3Department of Genetics and Genome Sciences, University of Connecticut, Farmington,
11 CT, 06032, USA
12 4Broad Institute of MIT and Harvard, Cambridge, MA, USA
13 5Institute of Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA
14 6Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono,
15 ME, 04469, USA.
16 7Tufts University School of Medicine, Boston, MA, 02111, USA
17
18 *Corresponding authors: [email protected] and [email protected]
19
20 Keywords: Massively Parallel Reporter Assay (MPRA), human pancreatic islets, MIN6
21 beta cells, chromatin accessibility, ATAC-seq, chromatin accessibility quantitative trait
22 locus (caQTLs), genome-wide association study (GWAS), single nucleotide
23 polymorphism (SNP), Type 2 diabetes (T2D), endoplasmic reticulum (ER) stress, short
24 interspersed nuclear elements (SINE), Alu elements.
25 Abstract: (199 words)
26 A major goal in functional genomics and complex disease genetics is to identify
27 functional cis-regulatory elements (CREs) and single nucleotide polymorphisms (SNPs)
28 altering CRE activity in disease-relevant cell types and environmental conditions. We
29 tested >13,000 sequences containing each allele of 6,628 SNPs associated with altered
30 in vivo chromatin accessibility in human islets and/or type 2 diabetes risk (T2D GWAS
31 SNPs) for transcriptional activity in β cells under steady state and endoplasmic reticulum
32 (ER) stress conditions using the massively parallel reporter assay (MPRA).
33 Approximately 30% (n=1,983) of putative CREs were active in at least one condition.
34 SNP allelic effects on in vitro MPRA activity strongly correlated with their effects on in
35 vivo islet chromatin accessibility (Pearson r=0.52), i.e., alleles associated with increased
36 chromatin accessibility exhibited higher MPRA activity. Importantly, MPRA identified
37 220/2500 T2D GWAS SNPs, representing 104 distinct association signals, that
38 significantly altered transcriptional activity in β cells. This study has thus identified
39 functional β cell transcription-activating sequences with in vivo relevance, uncovered
40 regulatory features that modulate transcriptional activity in β cells under steady state and
41 ER stress conditions, and substantially expanded the set of putative functional variants
42 that modulate transcriptional activity in β cells from thousands of genetically-linked T2D
43 GWAS SNPs.
44 Introduction
45 Studies over the past decade have identified millions of single nucleotide
46 polymorphisms (SNPs) in the human population1. Genome-wide association studies
47 (GWAS) have linked thousands of these SNPs to variability in physiological traits and
48 disease susceptibility2, including type 2 diabetes (T2D)3. The overwhelming majority of
49 GWAS SNPs reside in non-coding, regulatory regions of the genome3,4, implicating
50 altered transcriptional regulation as a common molecular mechanism underlying disease
51 risk. Compared to protein-coding regions, predicting regulatory functions of non-coding
52 sequences, and the effect of SNPs within them, remains a significant challenge.
53 Approaches such as ATAC-seq identify regions of accessible chromatin as a
54 marker of in vivo cis-regulatory elements (CREs)5. However, CREs identified by
55 chromatin accessibility (ATAC-seq peaks) vary significantly in length. Moreover, ATAC-
56 seq peaks are characterized by 1 or more summits, which have higher chromatin
57 accessibility than flanking genomic regions also within ATAC-seq peaks6. Few studies
58 have explored the relationship between transcriptional activity and open chromatin
59 regions defined by ATAC-seq7,8. For instance, it is not known whether higher chromatin
60 accessibility within ATAC-seq peaks also corresponds to enhanced transcriptional
61 activity, or whether proximity to ATAC-seq peak summits influences the likelihood for a
62 SNP to affect enhancer activity. Studies designed to identify chromatin accessibility
63 quantitative trait loci (caQTL) are being increasingly applied to nominate SNPs altering
64 CRE activity9,10. Previously, we identified 2,949 caQTLs in human islets11, which were
65 significantly enriched for type 2 diabetes (T2D)-associated SNPs, highlighting the utility
66 of this approach to nominate putative disease-associated SNPs affecting CRE activity in
67 relevant cell types. However, caQTLs are correlative associations, and high throughput
68 assays to directly test variant effects are needed to experimentally establish causality.
69
70 Massively parallel reporter assays (MPRAs) have been developed as a functional
71 genomics platform to interrogate the enhancer potential of thousands of sequences
72 simultaneously12 under a variety of (patho)physiologic conditions. By introducing
73 nucleotide changes in a given sequence of interest, the effect of naturally occurring
74 sequence variants in the human population on MPRA activity can also be elucidated13.
75 Several studies have employed MPRA to identify functional SNPs associated with red
76 blood cell traits14, adiposity15, osteoarthritis16, cancer and eQTLs13.
77 Although studies over the past several years implicate altered islet CRE function
78 in T2D genetic risk and progression, they have colocalized only ~1/4 of T2D-associated
79 loci to altered chromatin accessibility and/or gene expression levels in islets11,17–20. We
80 hypothesize that this is partially because previous studies measured chromatin
81 accessibility (caQTLs) and gene expression (eQTL) in islets cultured under steady state
82 conditions11,21, consequently missing important gene-environment interactions
83 characteristic of a complex disorder like T2D. For example, endoplasmic reticulum (ER)
84 stress, which is elicited by the high demands of insulin production and secretion on
85 protein folding capacity in β cells, leads to (patho)physiologic adaptations. Low levels of
86 ER stress are a signal for β cells to proliferate as a mechanism to meet higher insulin
87 secretion demands22. However, uncompensated ER stress induced by genetic and/or
88 environmental perturbations leads to β cell failure and T2D23.
89 Here, we applied MPRA to directly test >13,000 sequences, including those
90 overlapping T2D GWAS and islet caQTL SNPs, for their ability to activate β cell
91 transcription in steady state conditions and after exposure to the ER stress-inducing
92 agent thapsigargin or DMSO solvent control. We identified motifs and features of
93 sequences affecting MPRA activity in each condition and linked allelic effects on MPRA
94 activity to T2D genetics and in vivo islet chromatin accessibility. These results
95 demonstrate the power of MPRA to elucidate in vivo relevant β cell transcriptional
96 activating sequences and define the regulatory grammar of these sequences under
97 steady state and stress-responsive conditions. We nominate 54 putative causal variants
98 among multiple genetically-linked islet dysfunction and T2D GWAS SNPs for further
99 study.
100
101 Results
102 Selection and testing of sequences for MPRA activity in β cells
103 To test putative islet β cell regulatory sequences for their ability to enhance
104 transcription from a minimal promoter and to identify SNPs that alter regulatory activity,
105 we generated an MPRA library containing: i) 1,910 SNPs significantly associated with
106 changes in human islet chromatin accessibility (caQTLs)11; ii) 2,218 control SNPs
107 overlapping human islet ATAC-seq peaks, which were reported to have no correlation
108 with changes in human islet chromatin accessibility11; and iii) 2,500 index and genetically
109 linked (r² > 0.8) SNPs/indels corresponding to 259 T2D association signals in the
110 NHGRI/EBI GWAS Catalog (Figure 1a, Table S2, and Methods). In total, 13,628
111 two-hundred base pair (bp) sequences from the human genome were included in this MPRA
112 library.
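The library composition described above can be tallied in a few lines. This is only a bookkeeping sketch: the counts are taken from the text, while the dictionary keys and variable names are illustrative labels, not identifiers from the study. The text does not itemize, at this point, the sequences beyond one per SNP allele, so the remainder is reported without interpretation.

```python
# Counts are taken from the text; the dictionary keys are illustrative labels.
snp_sets = {
    "islet_caQTL": 1910,
    "control_in_ATAC_peak": 2218,
    "T2D_GWAS_index_and_linked": 2500,
}

total_snps = sum(snp_sets.values())
total_sequences = 13628  # total 200 bp sequences reported for the library

# If every SNP contributed exactly one sequence per allele (two alleles assumed):
two_allele_sequences = 2 * total_snps
extra_sequences = total_sequences - two_allele_sequences

print(total_snps)       # 6628
print(extra_sequences)  # 372
```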
113 MIN6 mouse β cells have been extensively employed to study human islet
114 regulatory sequences because human and mouse transcription factor (TF) expression
115 and DNA binding motifs are extensively conserved. MIN6 cells express many β cell-
116 specific TFs, such as Pdx1, Nkx6.1 and Foxa2, and in a comparison with nine human
117 tissues/cell-types, the chromatin accessibility profile of MIN6 mouse β cells most
118 resembled that of human islets (Supplementary Figure 1). To test human sequences for
119 islet β cell transcriptional activity, we transfected the MPRA plasmid library into five
120 independent batches of MIN6 β cells under standard culture conditions (25 mM glucose,
121 Methods). Thirty hours later, transfected cells were harvested for RNA isolation, GFP
122 mRNA capture, and Illumina sequencing of the regulatory sequence-associated
123 barcodes (Figure 1b). As anticipated, principal component analysis indicated that RNA
124 expression of the transfected MPRA libraries was highly correlated between the five
125 biological replicates and clustered distinctly from the MPRA plasmid library input
126 (Supplementary Figure 2, compare RNA vs. DNA). 2,224/13,628 sequences (16.3%),
127 representing 1,372 distinct elements, exhibited significantly higher counts after
128 transfection compared to the plasmid library input under standard culture conditions
129 (FDR<1%; Figure 1c; Supplementary Table 2). We refer to these sequences as ‘MPRA
130 active’ throughout the remainder of the manuscript. MPRA active sequences were
131 significantly enriched for in vivo binding by human islet-specific TFs (PDX1, NKX6.1 and
132 FOXA2; ChIP-seq; Figure 1d), suggesting that MPRA identifies human islet β cell
133 regulatory sequences with in vivo relevance and validating MIN6 as a powerful cellular
134 model to test human sequences for islet β cell transcriptional activity.
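As a rough illustration of how a threshold such as FDR<1% turns per-sequence p-values into 'MPRA active' calls, the sketch below applies the Benjamini-Hochberg adjustment. This is an assumption made for illustration: this section does not state which multiple-testing procedure or statistical test was used, and the p-values shown are toy numbers, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for five hypothetical sequences (RNA vs. plasmid DNA counts).
toy_p = [0.0001, 0.002, 0.03, 0.2, 0.9]
q_values = benjamini_hochberg(toy_p)

# Hypothetical call: a sequence is 'active' if its adjusted p-value is below 0.01.
active = [q < 0.01 for q in q_values]
print(active)
```

In practice a call of this kind would also require the RNA signal to exceed the plasmid input, since the text defines active sequences as those with significantly higher counts after transfection.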
135
136 Identification of β cell regulatory elements responding to ER stress
137 The high secretory burden of insulin production, processing, and secretion
138 makes β cells particularly susceptible to ER stress. In fact, activation of the unfolded
139 protein response (UPR) and ER stress has been implicated in the genetic etiology and
140 pathophysiology of both monogenic24,25 and type 2 diabetes23. Thapsigargin (TG) blocks
141 calcium transport into the ER lumen and is widely used to induce ER stress in cells26–33.
142 Therefore, to identify sequences whose β cell transcriptional activity is modulated by ER
143 stress, MIN6 cells transfected with the MPRA library were grown in standard culture
144 media (25 mM glucose) supplemented with 250 nM TG or DMSO solvent control for 24
145 hours (Figure 2a). As expected, exposing MIN6 cells to 250 nM TG induced expression
146 of ER stress response genes such as Ddit3 (Chop), Hspa5, and Edem1, and reduced
147 Ins2 expression (Figure 2b). Compared to the DMSO solvent control, ER stress
148 decreased MPRA activity of 656 sequences (mapping to 449 elements) and increased
149 MPRA activity of 328 sequences (mapping to 275 elements) at FDR <1%
150 (Figure 2c).
151 We compared these differentially active sequences to identify TF motifs that may
152 be mediating their transcriptional activity differences. Elements enriched for sequence
153 binding motifs of β cell-specific TFs that mediate insulin gene transcription and beta cell
154 function34,35, such as MAFA, FOXA2 and PAX6, had lower MPRA activity under ER
155 stress (Figure 2d). Concordantly, we observed a significant decrease in insulin gene
156 expression (Figure 2b), suggesting ER stress leads to inactivation of these β cell-specific
157 TFs. It is known that uncompensated ER stress leads to preferential ATF4 translation
158 and induction of DDIT3/CHOP36,37. Consistently, we found that motifs for these TFs were
159 enriched in elements with higher MPRA activity after ER stress induction (Figure 2d).
160 Activity measurements with MPRA, therefore, recapitulate TF dynamics during ER stress
161 and identify β cell regulatory elements that respond to ER stress.
162
163 ATAC-seq peak summits are enriched for TF binding motifs and MPRA activity
164 In total, 30% of sequences containing one or both SNP alleles were MPRA active
165 in at least one treatment condition (Figure 3a, n=1983/6628, FDR ≤1%). MPRA active
166 sequences were significantly enriched for 114 TF binding motifs (Supplementary Table
167 3), including those of islet TFs such as HNF1, MAFA, RFX5, and FOXA2. Human-mouse
168 sequence similarity did not influence the probability that a tested human sequence was
169 MPRA active in MIN6 (black dashed line in Figure 3b; Supplementary Figure 3). As
170 anticipated, sequences overlapping regions with in vivo evidence of transcriptional
171 regulatory potential in human islets (ATAC-seq peaks) were significantly more likely to
172 be identified as MPRA active regardless of the sequence similarity between human and
173 mouse (Figure 3b).
174 Notably, the majority (60%) of elements accessible in human islets were not
175 MPRA active (Figure 3b). Islet ATAC-seq peaks vary in length (100-3500 bps).
176 Moreover, within an ATAC-seq peak, there are differences in chromatin accessibility
177 levels at nucleotide resolution, where peak summits are locations with higher read
178 counts than the flanking regions (schematic in Figure 3c; example in Figure 5a). Peak
179 summits are significantly enriched in TF binding motifs compared to flanking regions
180 within the same ATAC-seq peak (Figure 3c and Supplementary Figure 4). Therefore, for
181 MPRA library sequences overlapping ATAC-seq peaks, we examined the distance of
182 SNPs, which we designed to be in the center of all MPRA sequences tested, to the
183 summit of ATAC-seq peaks in which they reside. Sequences closer to ATAC-seq peak
184 summits were significantly more likely to be MPRA active (Figure 3d). However, this
185 effect was observed only when SNPs were less than 100 bps from the ATAC-seq peak
186 summit (dashed green line in Figure 3d). Since the MPRA library tests the 100 base
187 pairs flanking both sides of a given SNP of interest, ATAC-seq peak summits were
188 included in the MPRA sequences only if they were ≤100 bp from the SNP. Therefore,
189 MPRA sequences overlapping ATAC-seq peaks were categorized based on whether the
190 summit was within 100 bps of the SNP or not, i.e., whether the ATAC-seq peak summit
191 was included in the sequence tested with MPRA or not. Sequences that included the
192 ATAC-seq peak summit were indeed more likely to be active with MPRA, irrespective of
193 the number of times a genomic region was accessible in a cohort of 19 islet donors
194 (Figure 3e). Interestingly, inclusion of the ATAC-seq peak summit was predictive of
195 MPRA activity even for genomic regions with accessibility in only one of 19 islet donors.
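The categorization described above follows directly from the library design: with the SNP centered and 100 bp tested on each side, a peak summit is part of the tested sequence only when it lies within 100 bp of the SNP. A minimal sketch of that rule is shown below; the function name and the coordinates are hypothetical and chosen only for illustration.

```python
FLANK_BP = 100  # bases tested on each side of the centered SNP (from the text)


def summit_in_tested_sequence(snp_pos, summit_pos, flank_bp=FLANK_BP):
    """True if the ATAC-seq peak summit falls inside the tested MPRA sequence."""
    return abs(snp_pos - summit_pos) <= flank_bp


# Hypothetical (SNP position, summit position) pairs on the same chromosome.
examples = [(1_000_050, 1_000_000), (1_000_250, 1_000_000)]
labels = [summit_in_tested_sequence(snp, summit) for snp, summit in examples]
print(labels)  # [True, False]
```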
196 Together, these analyses indicate that sequences overlapping human islet
197 ATAC-seq peaks were more likely to be active with MPRA in MIN6 mouse β cells than
198 sequences outside peaks, regardless of sequence similarity between human and
199 mouse. ATAC-seq peak summits are enriched for TF binding motifs, and exclusion of
200 ATAC-seq peak summits among the elements overlapping ATAC-seq peaks
201 significantly decreased the probability of observing MPRA activity.
202
203 SNPs proximal to ATAC-seq summits have a higher probability of altering MPRA
204 activity and chromatin accessibility
205 To identify SNPs that modulate regulatory element activity, we assessed allelic
206 differences (skew) in MPRA activity (schematic in Figure 1b) under each MIN6 beta cell
207 culture condition (standard, 250 nM TG, and DMSO solvent control). In total, 879 SNPs
208 exhibited allelic skew at FDR <10% (Table S2; Supplementary Figure 5). Importantly,
209 when allelic skew was detected in more than one experimental condition, the direction-
210 of-effect was concordant for >98.5% of the SNPs (Supplementary Figure 5). Assessing
211 allelic skew for SNPs across all conditions increased the power to detect allelic skew
212 and identified both highly reproducible and condition-specific SNP allelic effects on
213 MPRA activity.
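To make the notion of allelic skew and direction-of-effect concordance concrete, the sketch below uses one common formulation: activity as log2(RNA/DNA) and skew as the difference between the alternate and reference alleles. This formulation is an assumption for illustration, since this section does not give the exact statistic used, and the counts are toy values.

```python
import math


def log2_activity(rna_count, dna_count):
    """Activity of one sequence as log2 of RNA over plasmid DNA counts."""
    return math.log2(rna_count / dna_count)


def allelic_skew(ref, alt):
    """ref and alt are (rna_count, dna_count) tuples; positive means ALT is more active."""
    return log2_activity(*alt) - log2_activity(*ref)


# Toy normalized counts for one hypothetical SNP in two culture conditions.
skew_standard = allelic_skew(ref=(200, 100), alt=(400, 100))
skew_tg = allelic_skew(ref=(150, 100), alt=(600, 100))

# Direction-of-effect is concordant when the skews share the same sign.
concordant = (skew_standard > 0) == (skew_tg > 0)
print(skew_standard, skew_tg, concordant)
```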
214 SNPs associated with chromatin accessibility changes in islets (caQTLs) were
215 significantly closer to the ATAC-seq peak summits than control SNPs, which are SNPs
216 that reside in ATAC-seq peaks11 but were not associated with chromatin accessibility
217 changes in islets (Figure 4a). Consequently, caQTL SNPs were more likely to be MPRA
218 active than were control SNPs (Figure 4b). Among elements with MPRA activity, caQTL
219 SNPs were also more likely to exhibit allelic skew than control SNPs (Figure 4c) and
220 showed a significantly greater magnitude of skew between alleles (Figure 4d). Together,
221 these results suggest that caQTL SNPs are closer to ATAC-seq peak summits and more
222 likely to modulate the magnitude of both in vitro MPRA activity and in vivo chromatin
223 accessibility, likely by disrupting TF binding motifs that are enriched in these regions.
224
225 Allelic effects on MIN6 beta cell MPRA activity and in vivo islet chromatin
226 accessibility are significantly correlated
227 After identifying SNPs with allelic skew in MPRA activity, we investigated the
228 relationship between SNP effects on in vivo islet chromatin accessibility (caQTLs) and in
229 vitro MPRA activity. Figure 5a shows an example caQTL (rs17396537) for which islet
230 donors with homozygous GG genotypes (n=6) exhibit higher chromatin accessibility than
231 islet donors with GC (n=11) or CC (n=2) genotypes at this variant. Concordantly, the
232 sequence containing the ‘G’ allele for rs17396537 exhibited higher MPRA activity than
233 the one containing the ‘C’ allele (Figure 5a, b). 297/1910 lead caQTL SNPs
234 demonstrated significant allelic skew in MPRA activity. Overall, caQTL and MPRA
235 directions-of-effect for these 297 SNPs were highly correlated (Pearson R = 0.526,
236 Figure 5b), and a significant concordance of direction was observed (i.e., alleles
237 associated with higher chromatin accessibility also had higher MPRA activity) (n =
238 246/297; 82.8%).
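The two summary statistics reported above, a Pearson correlation and a direction-of-effect concordance, can be computed as sketched below. The effect-size vectors are toy values invented for illustration; only the 246/297 ratio comes from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def direction_concordance(x, y):
    """Fraction of pairs whose effects share the same sign."""
    same = sum((a > 0) == (b > 0) for a, b in zip(x, y))
    return same / len(x)


# Toy per-SNP effects: caQTL effect on accessibility and MPRA allelic skew.
caqtl_effect = [0.8, -0.5, 0.3, -0.9, 0.4]
mpra_skew = [0.6, -0.2, -0.1, -0.7, 0.5]

r = pearson_r(caqtl_effect, mpra_skew)
concordance = direction_concordance(caqtl_effect, mpra_skew)  # 4 of 5 pairs agree

# Concordance reported in the text, as a percentage.
reported_pct = round(100 * 246 / 297, 1)
print(round(r, 3), concordance, reported_pct)
```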
239 A subset of lead caQTL SNPs (17.2%) showed discordant allelic effects on
240 chromatin accessibility levels and MPRA activity (blue; Figure 5b). We hypothesized that
241 discordance may be driven by a nearby SNP that has MPRA activity antagonistic to the
242 lead caQTL SNP10. For example, the reference allele of the lead caQTL SNP,
243 rs1515555, is associated with higher in vivo chromatin accessibility but lower MPRA
244 activity (Figure 5b). This region contains a second SNP, rs1515556, 24 bps away from
245 rs1515555 and in tight LD (r² = 0.99). We therefore assessed sequences containing
246 each of the four possible allelic combinations of these SNPs in the MPRA library for
247 additive, antagonistic, or synergistic transcriptional effects. In comparison to
248 rs1515555:rs1515556 REF:REF sequence (G:T in Figure 5c), the ALT:REF (A:T)
249 sequence had higher MPRA activity, while the REF:ALT (G:C) sequence had lower
250 MPRA activity. However, due to high LD between the 2 SNPs, both of these
251 combinations (‘A:T’ and ‘G:C’; ALT:REF and REF:ALT; Figure 5c) are rarely observed in
252 the human population. Sequences with either REF:REF (G:T) or ALT:ALT (A:C) at
253 rs1515555:rs1515556 are more likely to be observed in individuals in the population.
254 Comparison of these haplotypes revealed concordant effects on both chromatin
255 accessibility and MPRA activity and underscores the importance of accounting for
256 haplotypes when characterizing allelic effects.
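The four allelic combinations tested for this SNP pair can be enumerated mechanically. In the sketch below the REF and ALT alleles are those given in the text (G/A for rs1515555, T/C for rs1515556); the variable names are illustrative.

```python
from itertools import product

# (REF, ALT) alleles for each SNP, as given in the text.
snps = {
    "rs1515555": ("G", "A"),
    "rs1515556": ("T", "C"),
}

# All rs1515555:rs1515556 allelic combinations included in the MPRA library.
combos = [":".join(alleles) for alleles in product(*snps.values())]
print(combos)  # ['G:T', 'G:C', 'A:T', 'A:C']

# Given the tight LD, REF:REF and ALT:ALT are the haplotypes expected to be common.
common_haplotypes = [combos[0], combos[-1]]
print(common_haplotypes)  # ['G:T', 'A:C']
```

Comparing the two common haplotypes, rather than each SNP in isolation, is the comparison the text reports as concordant with the in vivo chromatin accessibility effect.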
257 For 94 lead caQTL SNPs, we tested for potential interactions with neighboring
258 SNPs by including all 4 allelic combinations in the MPRA library. However, these were
259 too few to clearly implicate antagonism between neighboring SNPs for the observed
260 discordance between chromatin accessibility and MPRA activity. Therefore, we cannot
261 rule out discordant SNPs as merely being false positives or mechanistic differences due
262 to technical constraints of the assay. However, MPRA provides a unique opportunity to
263 test and dissect the contributions of neighboring SNPs to in vivo effects.
264 Allelic effects on MPRA activity and chromatin accessibility are therefore
265 significantly correlated. Importantly, these results demonstrate that MPRA detects in
266 vivo relevant allelic effects on β cell transcriptional activity.
267
268 Functional identification of T2D SNPs altering β cell regulatory element activity
269 A key challenge to translate GWAS associations into a mechanistic
270 understanding of the genes and pathways altered by T2D risk variants is identifying the
271 SNP(s) that affect functional or active CREs among tens to hundreds of genetically
272 linked, associated SNPs in each locus. The ability to functionally annotate groups of
273 SNPs with empirical measurements of CRE modulation would help prioritize variants
274 and expedite our ability to identify T2D causal alleles. To uncover transcriptional effects
275 of T2D-associated SNP alleles, we tested sequences containing each allele of 2,500
276 index and genetically linked (r² > 0.8) SNPs/indels for 259 association signals for T2D and
277 related quantitative traits reported in the NHGRI/EBI GWAS Catalog. One or both alleles
278 of 492 SNPs/indels were active by MPRA. Active sequences were enriched for the
279 binding motifs of TFs that play important roles in β cell maturation and insulin secretion
280 (Figure 6a), such as liver X receptor (LXR)38,39, thyroid hormone receptor (THRa and
281 THRb)40,41, retinoic acid receptor (RARa)42, and BCL11A43. Allelic effects on MPRA
282 activity were observed for approximately half of these elements (n=220/492),
283 corresponding to 104 distinct T2D-associated loci (Supplementary Table 4). Importantly,
284 this list included T2D-associated SNPs that were previously identified for their in vivo
285 effects on islet chromatin accessibility and/or in vitro effects on luciferase reporter
286 assays, such as rs7903146 (TCF7L2)44,45, rs1635852 (JAZF1)46, rs10428126
287 (IGF2BP2)11,47, and rs12189774 (VEGFA)48 (Figure 6b, Table 1). Importantly, MPRA has
288 substantially expanded the set of T2D-associated GWAS SNPs with empiric effects on
289 transcriptional activation in β cells, and therefore nominated them for the first time as the
290 putative causal/functional variants in these T2D loci. These include rs2881632
291 (SDHAF4), rs13096599 (KBTBD8), rs7783500 (GCC1), rs72697237 (NOTCH2),
292 rs13405776 (THADA), rs17748864 (PEX5L), rs7957197 (OASL), and rs13026123
293 (SRBD1), which are highlighted in Table 1.
294 For 54/104 T2D-associated signals, one SNP among those tested exhibited
295 allelic effects on MPRA activity (black points in Figure 6b). For example, rs987964 was
296 the only SNP of 49 tested in the ZBTB20 locus (Figure 6c; Supplementary Table 5) that
297 exhibited allelic effects on MPRA activity. For this and 53 additional T2D-associated
298 signals, MPRA thus provides empiric evidence that the variant directly modulates CRE
299 activity and assists in prioritizing variants for additional investigation as putative causal
300 alleles (Supplementary Table 5). The number of SNPs with allelic skew was found to
301 generally correlate with the number of SNPs tested per locus (Figure 6b). For example,
302 10/45 tested SNPs in LD with the index SNP rs12304921 in the HIGD1C locus had an
303 allelic skew in MPRA activity (Figure 6d), not all of which were in the same direction with
304 respect to the T2D risk alleles. Investigating the relative importance and potential
305 interactions between these ten putative causal variants will be important to establish
306 causality for this locus. These results underscore the value of assessing MPRA activity
307 in disease-relevant cell types as a tool to nominate putative causal T2D SNPs based on
308 systematic testing of their transcription-modulating effects in beta cells.
309
310 ER stress modulates MPRA activity of T2D SNP-containing sequences
311 MPRA activity of some putative CREs was exclusive to one out of the three
312 conditions tested. For example, ~5% of those tested were identified as MPRA active
313 exclusively in ER stressed beta cells (Figure 3a, n=106). However, we found that ER
314 stress elicited substantial quantitative changes in MPRA activity of 724 CREs (Figure
315 2c). To investigate both the impact of ER stress on MPRA activity and how SNPs modify
316 this response, we examined elements meeting two criteria: 1) allelic effects on MPRA
317 activity were observed in at least one condition; and 2) a minimum of one allele showed
318 differences in MPRA activity under ER stress (Figure 7a).
319 We previously characterized islet caQTLs from individuals with intact insulin
320 expression and secretion, and observed an enrichment of islet-specific TF binding motifs
321 at caQTL regulatory elements11. ER stress, however, leads to the inactivation of β cell
322 specific TFs (Figure 2d), with a concomitant decrease in Ins2 gene expression (Figure
323 2b). As might be expected based on these observations, steady state islet caQTL-
324 containing sequences predominantly exhibited lower MPRA activity in TG-treated cells
325 than the DMSO solvent controls (Figure 7a, red dashes and 7b, red). In contrast, T2D
326 SNP-containing elements were significantly more likely to have higher MPRA activity
327 under ER stress (Figure 7a, green and 7b). Given the majority of SNPs tested in this set
328 are non-causal/non-functional ‘passenger’ alleles, this effect may reflect general CRE
329 architecture across T2D-associated loci.
330 Curiously, only 13% (30/220) of T2D-associated SNPs with allelic skew in MPRA
331 activity overlapped islet ATAC-seq peaks. We hypothesized that this may partially be
332 due to mappability issues since 63.1% (139 / 220) of T2D-associated SNPs with allelic
333 skew in MPRA activity overlapped repetitive sequences. Since there are no mappability
334 issues with MPRA, we first asked if repetitive sequences have MPRA activity. Among
335 the various classes of repetitive sequences, only elements overlapping short
336 interspersed nuclear elements (SINEs) were significantly more likely to be active
337 (Supplementary Figure 6a). Since >50% of elements overlapping SINEs in the MPRA
338 library harbored T2D-associated SNPs (Supplementary Figure 6b and S6c), we next
339 asked whether T2D-associated elements overlapping SINEs showed higher activity
340 under ER stress. Indeed, >75% of T2D-associated elements with higher MPRA activity
341 under ER stress overlapped SINEs, a significantly higher proportion compared to T2D-
342 associated elements with lower or no change in MPRA activity under ER stress (Figure
343 7c). Alu elements, the most common SINEs in the human genome, are primate-
344 specific49. When we investigated conservation in 20 mammalian genomes, T2D-
345 associated elements with higher MPRA activity under ER stress were indeed less likely
346 to be conserved in all non-primate mammalian species for which comparisons were
347 made (Figure 7d).
348 In conclusion, we have identified 220 elements whose MPRA activity is disrupted
349 by T2D-associated SNPs. 40% (n=86) of these T2D-associated SNPs had significantly
350 higher MPRA activity under ER stress, >75% of which overlapped SINEs.
351
352 Discussion
353 In this study, we used MPRA to test >13,000 sequences containing each allele of
354 6628 SNPs for MPRA activity in MIN6 β cells under standard culture conditions and after
355 ER stress or paired solvent control exposures. In total, 30% (n=1983/6628) of putative
356 CREs tested increased transcriptional activity of a minimal promoter. SNP alleles,
357 including those associated with altered in vivo chromatin accessibility in human islets
358 and with T2D genetic risk by GWAS, altered MPRA activity for approximately half of
359 these elements (n=879/1983).
360 MPRA activity exhibited a striking, positive correlation with in vivo islet chromatin
361 accessibility. As anticipated, elements accessible in vivo were far more likely to have
362 MPRA activity in vitro. However, MPRA refined our understanding of the specific content
363 and location of DNA sequences within ATAC-seq peaks that drive β cell transcriptional
364 activation. Likelihood of an element showing MPRA activity increased as a function of its
365 proximity to ATAC-seq peak summits, and SNPs closer to ATAC-seq peak summits
366 were more likely to alter MPRA activity and chromatin accessibility. These results help to
367 prioritize sequences within open chromatin regions for their importance in regulating
368 enhancer activity by disrupting TF binding. Interactions between neighboring SNPs
369 sometimes led to discordant effects between chromatin accessibility and activity
370 measured with MPRA. In these instances, MPRA helped to determine the relative
371 contributions of neighboring SNPs (e.g., additive or antagonistic effects) on
372 transcriptional activation.
373 T2D-associated SNPs are overwhelmingly non-coding, and primarily alter
374 regulatory activity rather than protein structure and function. A key challenge in T2D
375 genetics is to identify the putative causal SNP(s) from among tens to hundreds of
376 genetically linked, noncoding variants per association signal. This is the first study to
377 systematically test thousands of T2D-associated SNP alleles for their effects on
378 transcriptional activity in β cells. MPRA identified 492 elements in T2D-associated loci as
379 active. 220/492 SNP alleles, representing 104 distinct T2D association signals,
380 significantly altered β cell MPRA activity. MPRA recapitulated, and thereby confirmed,
381 allelic effects of several T2D SNPs that have been characterized previously using
382 targeted, low throughput luciferase assays, such as rs10428126 and rs2943656 at the
383 IGF2BP2 and IRS1 loci, respectively. For 54/104 T2D association signals, only one
384 SNP showed an allelic skew among all the SNPs tested, implicating it as the putative
385 causal SNP within its respective locus. Although not an exhaustive test of all credible
386 set SNPs identified to date, this approach has substantially expanded the list of T2D-
387 associated SNP alleles that empirically alter β cell regulatory sequence activity. These
388 results should facilitate targeted follow-up experiments and integrated genomic analyses
389 for a better understanding of the functional genetics of islet (dys)function and T2D risk
390 and strongly motivate future studies of more comprehensive panels of T2D-associated
391 SNPs for their transcription-modulating effects in beta cells and other diabetes-relevant
392 cell types under steady state, stimulatory, and/or stress conditions.
393 Islet dysfunction and beta cell failure in T2D result from both genetic and
394 environmental risk factors. MPRA of sequences containing T2D-associated SNPs, when
395 tested under control and ER stress conditions, revealed the effects of these factors on β
396 cell transcriptional activation. A significant proportion (40%; 86/220) of sequences
397 containing T2D-associated index or linked SNPs exhibiting allelic skew under control
398 conditions had significantly higher MPRA activity under ER stress, suggesting – i) MIN6
399 cells cultured under standard, high glucose (25 mM) conditions are already under a
400 basal level of ER stress, and ii) many T2D-associated SNPs overlap CREs relevant for β
401 cells to respond to ER stress. Uncompensated ER stress was found to lead to the
402 inactivation of β cell-specific TFs causing the downregulation of insulin transcription and
403 secretion50–54. ER folding and trafficking capacity may therefore be a major factor
404 determining how much insulin can be released by β cells under elevated blood glucose
405 levels, before stress ensues. SNPs at regulatory elements modulating the expression of
406 genes relevant to meet higher demands for insulin synthesis may impair or improve ER
407 capacity, ultimately determining the threshold at which ER stress ensues, leading to β
408 cell failure.
409 Finally, we uncovered evidence that repetitive element sequences, most notably
410 SINEs, elicited robust transcriptional activation in β cells and that multiple T2D-
411 associated SNPs residing in these sequences modulated their transcriptional activity.
412 The MPRA results obtained in this study thus suggest that repetitive element-containing
413 sequences may play important roles in modulating stimulus and/or stress-responsive β
414 cell transcriptional programs and remind us to consider the potentially important roles
415 that repetitive elements, and SNPs within them, may play in the genetics of islet
416 (dys)function, diabetes risk and progression. Studies over the past few years
417 demonstrating repetitive element-mediated oncogene activation and modulation of
418 chromatin structure have contributed to an emerging appreciation of the importance of
419 these sequences in epigenetic and transcriptional regulation55–63. Recently, Hernandez
420 et al. found that Alu elements are transcriptionally induced by cellular stress, including
421 thermal and ER stress, and that the corresponding SINE RNAs function as critical
422 transcriptional switches during stress64. Future studies to elucidate the target genes of
423 these and other MPRA active, SINE-containing regulatory sequences, will be necessary
424 to fully understand the functional consequences of sequence variation in these
425 transcriptionally active sequences and the potential role(s) of exaptation57,65,66 in the
426 genetics of islet (dys)function and T2D.
427
428 Figure Legends
429 Figure 1. Massively Parallel Reporter Assay (MPRA) identifies beta cell
430 transcription activating sequences. (a) Features of the 6,628 two hundred base pair
431 (bp) elements selected for MPRA library construction and testing. 4,329 elements
432 overlap islet ATAC-seq peaks, of which 1,910 contain SNPs associated with altered in
433 vivo chromatin accessibility (caQTLs). 2,500 elements contain T2D-associated SNPs
434 (index or linked (r2>0.8)) reported in the NHGRI/EBI GWAS catalog, 201 of which
435 overlap islet ATAC-seq peaks. (b) Schematic of MPRA approach. A mock SNP,
436 rs123456789, is used as an example to show how MPRA activity and allelic skew are
437 measured for each 200 bp regulatory element. mP = minimal promoter. (c) Log2 fold-
438 change in barcode/sequence counts in plasmid input (DNA) and gfp mRNA transcripts
439 (RNA) 30 hr after transfection of MPRA library into MIN6 cells grown in standard culture
440 conditions (DMEM, 25 mM glucose). Red points indicate sequences with MPRA activity
441 (FDR<1%; n = 2224/13,628). (d) Barplot showing the odds of MPRA active sequences
442 overlapping in vivo human islet TF ChIP-seq17 peaks compared to MPRA-inactive
443 sequences. ***p<0.001; **p<0.01, Fisher’s Exact Test.
444
445 Figure 2. MPRA identifies endoplasmic reticulum (ER) stress-responsive
446 transcriptional activating sequences. (a) Experimental design to identify ER stress-
447 responsive transcription activating sequences. (b) RT-qPCR of ER stress response
448 genes (Ddit3, Hspa5, Edem1, and Hsp90b1), Ins2, and the control gene Actb in MIN6
449 cells exposed to the ER stress inducer thapsigargin (TG, 250 nM) or dimethylsulfoxide
450 (DMSO) solvent control for 24 hours. Bar plots show mean +/- SEM from three biological
451 replicates. *p<0.05; ** p<0.01; *** p<0.001, paired t-test. (c) Heatmap of sequences with
452 higher (far right column, red bar; n=328, mapping to 275 elements) or lower MPRA
453 activity (blue bar; n=656, mapping to 449 elements) under ER stress (FDR<1%). Red
454 annotation bars to the left of the heatmap indicate sequences identified as MPRA active
455 in DMSO and/or TG based on their sequence counts in RNA vs. plasmid DNA input. (d)
456 Comparison of transcription factor (TF) binding motifs enriched in elements with higher
457 (n=275) or lower (n=449) MPRA activity under ER stress. Red and blue dots denote TF
458 motifs significantly enriched in elements with higher or lower MPRA activity in ER
459 stressed MIN6 beta cells, respectively (FDR<1%). Yellow dots denote TF motifs with no
460 significant enrichment in either comparison.
461
462 Figure 3. ATAC-seq peak summits are enriched for MPRA activity. (a) Venn
463 diagram showing the number of elements with at least one allele identified as MPRA
464 active in standard culture, TG, and DMSO conditions (FDR<1%). Note that elements
465 identified as MPRA active in ≥1 experimental condition may still exhibit quantitative
466 differences in MPRA activity under ER stress. (b) The probability that an element has
467 significant MPRA activity (y-axis; at least 1 allele; any experimental condition) is shown
468 as a function of sequence similarity (x-axis) between human and mouse genomes
469 (expected probability is the black dotted line). Elements overlapping islet ATAC-seq
470 peaks (brown line) had a significantly higher probability of MPRA activity than elements
471 that did not overlap ATAC-seq peaks (green line), regardless of human-mouse
472 sequence similarity. ***, **, and * indicate Bonferroni-corrected Fisher’s Exact Test
473 p<0.001, 0.01, 0.05, respectively. (c) 200 bp genomic regions around human islet ATAC-
474 seq peak summits were compared to 200 bp regions also within ATAC-seq peaks, but
475 flanking the peak summit. ATAC-seq peak summits were significantly enriched for many
476 TF motifs compared to genomic regions that overlap ATAC-seq peaks but are further
477 away from the summit. (d) Elements where the SNP (center of all tested loci) was ± 100
478 bp (green dashed lines) from ATAC-seq peak summit were more likely to be identified as
479 MPRA active. This effect is restricted to SNPs within 100 bps of the ATAC-seq peak
480 summit because our MPRA library was designed to test only the 100 bp sequence on
481 either side of a given SNP of interest. ATAC-seq peak summits were therefore included
482 in the sequences tested only if the distance of the SNP was within 100 bps of the ATAC-
483 seq peak summit. (e) (Top) The probability of an element (at least one allele) being
484 identified as MPRA active (y-axis) binned by the number of islet donors (out of 19 total)
485 in whom the ATAC-seq peak was present (x-axis). Elements where the ATAC-seq peak
486 summit was included in the sequence tested, i.e., distance of SNP to ATAC-seq peak
487 summit was less than 100 bps (red), had a significantly higher probability of being
488 identified as MPRA active. * indicates FDR < 10%, Fisher’s Exact Test. (Bottom)
489 Stacked barplot of the number of times an ATAC-seq peak overlapping the MPRA
490 sequence tested is detected in the cohort (n = 19 islet donors).
491
492 Figure 4. SNPs closer to ATAC-seq peak summits are more likely to affect MPRA
493 activity and chromatin accessibility. (a) caQTL SNPs (blue bars) are closer to ATAC-
494 seq peak summits than control SNPs (red bars), irrespective of MPRA activity (pairwise
495 Wilcoxon tests with Bonferroni correction for multiple testing). (b) Elements containing
496 caQTL SNPs are more likely to be identified as MPRA active (any experimental
497 condition; Fisher’s Exact Test). (c) Among SNPs with at least 1 allele identified as MPRA
498 active, caQTLs are significantly more likely to show allelic skew (any experimental
499 condition), suggesting polymorphisms proximal to ATAC-seq peak summits are more
500 likely to affect MPRA activity (Fisher’s Exact Test). (d) Among SNPs with allelic skew in
501 MPRA activity, caQTLs have a significantly larger difference in MPRA activity between
502 alleles than control SNPs (Wilcoxon Test). For all panels, *, **, and *** indicate adjusted
503 p <0.05, p < 0.01, and p < 0.001, respectively.
504
505 Figure 5. Alleles associated with higher chromatin accessibility also have higher
506 MPRA activity. (a) (Left) caQTL example: Islet samples GG homozygous for
507 rs17396537 (green; average ATAC-seq read counts) were associated with significantly
508 higher chromatin accessibility than samples with GC (pink) or CC (black) genotypes
509 (hg19 coordinates - chr2:133408787-133409072). (Right): Sequence with the reference
510 ‘G’ allele at rs17396537 displayed significantly higher MPRA activity than the alternate
511 ‘C’ allele. (b) Correlations between allelic effects on in vivo islet chromatin accessibility
512 (x-axis) and MPRA activity (y-axis) for sequences containing islet caQTL SNPs.
513 Quadrants 1 and 3 (red) correspond to SNPs where the allele associated with higher
514 chromatin accessibility also had higher MPRA activity (concordant). The number of
515 SNPs in each quadrant is indicated. Pearson R = 0.526 (p value < 2.2e-16) among the
516 caQTL SNPs with significant allelic skew in MPRA activity (FDR<10%). Brackets indicate
517 points beyond the axis limits. For example, there are a total of 135 SNPs in the bottom
518 left quadrant, 2 of which are beyond the axis limits. REF=hg19 reference allele;
519 ALT=alternate allele. (c) As shown in panel b, the lead caQTL SNP rs1515555
520 demonstrated apparently discordant allelic effects between chromatin accessibility and
521 MPRA activity. Since there is another SNP, rs1515556, 24 bps from the lead caQTL
522 SNP (r2=0.99), sequences containing all four allelic combinations were tested in the
523 MPRA library. The alternate allele of rs1515555 increased MPRA activity, whereas that
524 of rs1515556 decreased MPRA activity. Ref/Ref for rs1515555 and rs1515556 exhibited
525 higher MPRA activity than the Alt/Alt combination, which is concordant with its higher
526 chromatin accessibility associated with Ref/Ref for rs1515555 and rs1515556.
527
528 Figure 6. Functional identification of T2D SNPs altering β cell regulatory element
529 activity. (a) TF binding motifs enriched at 492 elements containing T2D-associated
530 SNPs with significant MPRA activity (at least one allele). Red dots denote significantly
531 enriched TF binding motifs (FDR<1%). (b) MPRA identifies a subset of T2D-associated
532 index and genetically linked (r2≥0.8) SNPs representing 259 signals from the NHGRI/EBI
533 GWAS catalog (x-axis; log scale) that demonstrate significant allelic effects on MPRA
534 activity (y-axis) in one or more conditions tested. A total of 220 SNPs in linkage
535 disequilibrium with 104 T2D-associated index SNPs show significant allelic effects on
536 MPRA activity (FDR<10%). Jitter in the plot is used to visually separate individual points.
537 (c) (Top). In total, 49 SNPs in LD (r2 > 0.8) with the T2D-associated index SNP
538 rs73230612 were tested with MPRA (ZBTB20 locus). Allelic skew in MPRA activity was
539 detected in the same direction across all 3 experimental conditions (standard culture,
540 DMSO and TG) at 1 SNP only, rs987964. (Bottom; zoom-in) The 200 bp genomic region
541 spanning rs987964 is specific to humans. (d) In total, 45 SNPs in LD (r2 > 0.8) with the
542 T2D-associated index SNP rs12304921 were tested with MPRA (HIGD1C locus). Allelic
543 skew in MPRA activity was detected at 10 SNPs.
544
545 Figure 7. ER stress increases MPRA activity of 40% of T2D-associated SNP
546 containing sequences with allelic skew. (a) Among the SNPs with allelic skew (any
547 condition), the heatmap includes those with at least 1 allele showing significantly higher
548 or lower MPRA activity under ER stress vs. DMSO (row annotation (right); red and blue
549 bars, respectively). For each SNP (rows), z-scores are obtained by centering and scaling
550 the normalized RNA/DNA ratios for the reference and alternate alleles (5 replicates
551 each: DMSO and TG). Annotation horizontal bars on the left indicate whether an
552 element (row) harbors a caQTL or T2D-associated SNP. Note that the majority of
553 elements with higher MPRA activity under ER stress contain T2D-associated SNPs. (b)
554 SNPs with allelic skew (any condition) are categorized as having at least 1 allele with – i)
555 lower, ii) no difference in, or iii) higher MPRA activity under ER stress vs. DMSO (x-axis).
556 SNPs in each of the above categories are further fractioned as being – i) caQTLs (red),
557 ii) control SNPs (yellow), or iii) T2D-associated (green). A significant proportion of caQTL
558 SNPs with allelic skew were significantly more likely to show reduced MPRA activity, and
559 significantly less likely to show higher MPRA activity, under ER stress. A significant
560 proportion of T2D-associated SNPs with allelic skew displayed the opposite trend: they
561 were significantly less likely to show reduced MPRA activity, and significantly more likely
562 to show higher MPRA activity (86/220), under ER stress. (c) Fraction of T2D-associated
563 elements or 10,000 random genomic regions overlapping SINE repeats in the human
564 genome. The T2D-associated elements are categorized based on whether the elements
565 show – i) MPRA activity, ii) Allelic skew in MPRA activity, or iii) higher MPRA activity
566 under ER stress. A significant fraction of T2D-associated elements with higher activity
567 under ER stress overlap SINEs. (d) The probability that elements containing SNPs with
568 allelic skew are conserved with sequence similarity greater than 20% (y-axis) is plotted
569 for 20 mammals whose genomes are available (x-axis). Red and yellow dots show
570 comparisons for caQTL and control SNPs, respectively. T2D-associated SNPs with
571 allelic skew were split based on whether they had significantly higher MPRA activity
572 under ER stress (n=86; purple), or no change (n=110; green). Compared to 10,000
573 random 200 bp regions from the human genome, elements overlapping caQTL and
574 control SNPs were significantly more likely, and T2D-associated elements with higher
575 MPRA activity under ER stress significantly less likely, to be conserved in non-primate
576 mammalian species (Fisher’s exact test; FDR<5%). For Panels B and C, Fisher’s Exact
577 Test p-value (corrected for multiple testing) < 0.001 is indicated as ***, ** indicates p-
578 value < 0.01, and * indicates p-value < 0.05.
579
580 Supplementary Figure Legends
581 Supplementary Figure 1: The chromatin accessibility profile of MIN6 mouse β cells
582 most resembles that of human islets. (a) ATAC-seq peaks from 9 human tissues were
583 mapped to the mouse genome (mm9) using the UCSC Genome Browser liftover tool
584 and compared to MIN6 ATAC-seq peak locations. (b) Heatmap showing the enrichment
585 of islet-specific TF motifs at human ATAC-seq peaks uniquely overlapping MIN6 ATAC-
586 seq peaks.
587
588 Supplementary Figure 2: QC Metrics of MPRA libraries. (a) Scatter plot of the first 2
589 principal components, which together explain >99% of the variation in 5 plasmid and 5
590 RNA MPRA replicates. RNA replicates were obtained after transfection of the MPRA
591 library into MIN6 cells under standard culture conditions. (b) Pairwise Pearson
592 correlation coefficients are shown as a heatmap with unsupervised row and column
593 clustering.
594
595 Supplementary Figure 3: Human-Mouse sequence similarity of elements with
596 MPRA activity. Human-mouse sequence similarity for elements with (red) or without
597 (black) MPRA activity is plotted. Human elements with 0% sequence similarity did not
598 liftover to the mouse genome (mm9) with even 1% sequence similarity. A bootstrap
599 hypothesis test of equality did not find a statistically significant difference between
600 elements with (red) or without (black) MPRA activity for human-mouse sequence
601 similarity. The upper and lower end-points for equality are indicated as a blue reference
602 band.
603
604 Supplementary Figure 4: Enrichment of islet-specific TF binding motifs at islet
605 ATAC-seq peak summits. Islet ATAC-seq peak summits are significantly enriched for
606 many islet-specific TF motifs compared to genomic regions that overlap ATAC-seq
607 peaks but are further away from the summit.
608
609 Supplementary Figure 5: Allelic skew in MPRA activity for 879 SNPs across the 3
610 experimental conditions. (Top) Venn diagrams showing the number of SNPs with
611 allelic skew in MPRA activity (FDR<10%) in each condition. (Bottom) Scatter plot of log
612 fold change in MPRA activity for SNPs with allelic skew in 2 experimental conditions.
613 SNPs for which comparisons are made are indicated as pink in the corresponding Venn
614 diagram above each scatter plot.
615
616 Supplementary Figure 6: The majority of elements overlapping SINEs harbor T2D-
617 associated SNPs. (a) Barplot showing the odds of MPRA active sequences (at least
618 one allele; any experimental condition) overlapping long interspersed nuclear element
619 (LINE), long terminal repeat (LTR), or short interspersed nuclear element (SINE)
620 repetitive element classes. ***p< 0.001, Fisher’s Exact test. (b) Fraction of elements
621 containing i) caQTL, ii) control, or iii) T2D-associated SNPs that overlap SINEs. 10,000
622 random genomic regions from hg19 are included for context. (c) Stacked barplot
623 showing the fraction of elements in the MPRA library overlapping SINEs in the human
624 genome.
625
626
627
628 Methods
629 MPRA library design: 200 base pair sequences, with 100 bps flanking each side of
630 6628 SNPs were included in our MPRA library. The SNPs belong to 3 categories:
631 1. Islet chromatin accessibility quantitative trait loci (caQTLs): These are SNPs we
632 previously identified as having a significant association with altered in vivo
633 chromatin accessibility in islet samples11. For 1806 caQTLs, only the lead SNP
634 was included in the MPRA library. For 94 caQTLs, the 200-bp sequences
635 contained two SNPs in LD less than 25 bp apart. For these sequences, all four
636 allelic combinations were synthesized and tested.
637 2. Control SNPs: SNPs that overlapped islet ATAC-seq peaks but did not
638 significantly alter accessibility of those peaks were also synthesized and tested11.
639 Since islet caQTLs were identified in a relatively small cohort of individuals
640 (n=19), the following criteria were used to include SNPs for which the caQTL
641 study was likely more powered to detect associations with chromatin
642 accessibility:
643 a. unadjusted p value > 0.2,
644 b. minor allele frequency > 0.125
645 SNPs overlapping individual-specific peaks or sharing peaks with other SNPs
646 were further filtered out. Of the remaining 15,178 SNPs, 2218 were randomly
647 selected to be included in the MPRA library.
648 3. T2D-associated SNPs/indels: 200 bp sequences overlapping SNPs (n=2299),
649 small insertions (n=72) and small deletions (n=129) in linkage disequilibrium
650 (r2>0.8) with T2D-associated index SNPs (n=259) from the NHGRI/EBI GWAS
651 Catalog (accessed January 19th, 2017) were synthesized and tested, as
652 previously described67. Briefly, T2D-associated GWAS SNPs were pruned using
653 PLINK version 1.968 to identify SNPs in high linkage disequilibrium (r2>0.8).
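
As a hedged illustration of the design described in category 1, where elements containing two nearby SNPs were synthesized in all four allelic combinations, the sketch below enumerates the sequences for one or two SNPs. The function name, the 0-based offsets and the toy 12 bp sequence are assumptions made for illustration; this is not the library design code used in the study, and indels are not handled.

```python
from itertools import product

def allelic_oligos(ref_seq, snps):
    """Return {label: sequence} for every allelic combination.

    ref_seq : element sequence carrying the reference allele at each SNP
    snps    : list of (offset, ref_allele, alt_allele) with 0-based offsets;
              single-nucleotide alleles only (indels are not handled here)
    """
    for offset, ref, _alt in snps:
        if ref_seq[offset] != ref:
            raise ValueError(f"reference allele mismatch at offset {offset}")
    oligos = {}
    # Each SNP is either left as reference (0) or switched to alternate (1).
    for choice in product((0, 1), repeat=len(snps)):
        seq = list(ref_seq)
        for use_alt, (offset, _ref, alt) in zip(choice, snps):
            if use_alt:
                seq[offset] = alt
        label = "/".join("Alt" if use_alt else "Ref" for use_alt in choice)
        oligos[label] = "".join(seq)
    return oligos

# Toy 12 bp sequence standing in for a 200 bp element (hypothetical data).
toy = "ACGTACGTACGT"
single = allelic_oligos(toy, [(5, "C", "T")])
double = allelic_oligos(toy, [(5, "C", "T"), (9, "C", "G")])
print(len(single), len(double))
```

A single SNP yields two sequences (Ref, Alt) and a pair of SNPs yields four (Ref/Ref, Ref/Alt, Alt/Ref, Alt/Alt), matching the counts described above.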
654
655 The vast majority of SNPs and sequences tested belonged to only 1 of the 3
656 categories. However, T2D-associated SNPs overlapping 13 ATAC-seq peaks were
657 significantly associated with chromatin accessibility in islets (caQTLs). Therefore, for
658 analysis purposes, whenever SNPs were required to belong to only 1 of the 3 categories
659 above (such as Figures 1A, 7A and 7B), they were not categorized as caQTLs, but as
660 being T2D-associated only.
661
662 MPRA library construction: The MPRA library was constructed as previously
663 described13. Briefly, oligos were synthesized (Agilent Technologies) as 230 bp
664 sequences containing 200 bp of genomic sequences and 15 bp of adaptor sequence on
665 either end. Unique 20 bp barcodes were added by PCR along with additional constant
666 sequence for subsequent incorporation into a backbone vector by Gibson assembly. The
667 oligo library was expanded by electroporation into E. coli, and the resulting plasmid
668 library was sequenced by Illumina 2 X 150 bp chemistry to acquire oligo-barcode
669 pairings. The library underwent restriction digestion, and GFP with a minimal TATA
670 promoter was inserted by Gibson assembly resulting in the 200 bp oligo sequence
671 positioned directly upstream of the promoter and the 20 bp barcode falling in the 3’ UTR
672 of GFP. After expansion within E. coli the final MPRA plasmid library was sequenced by
673 Illumina 1 X 31 bp chemistry to acquire a baseline representation of each oligo-barcode
674 pair within the library. Barcodes mapping to more than 1 sequence were discarded from
675 all downstream analyses. Note: 2 separate batches of the MPRA library were prepared.
676 The first batch was used to perform MPRA under standard culture conditions. This
677 MPRA library was then electroporated into E. coli to obtain a second batch of the MPRA
678 library, which was used for the DMSO-TG experiments.
679
680 MPRA library transfection into MIN6 cells: 10 million MIN6 cells were seeded in each
681 of seven 15 cm2 dishes. The cells were 60-70% confluent the next day. The media in each
682 15 cm2 dish was replaced with 20 ml of fresh media, and cells were transfected with 7 µg of the MPRA
683 plasmid library using 55 µl Lipofectamine 2000 (38% transfection efficiency). Six hours
684 after transfection, media was either i) not changed (MPRA under standard culture
685 conditions), ii) replaced with media containing 250 nM Thapsigargin dissolved in 0.025%
686 DMSO, or iii) replaced with media containing 0.025% DMSO. Thirty hours after
687 transfection, cells were trypsinized and collected by centrifugation. Cell pellets
688 were frozen at -80°C. For each condition (standard culture / DMSO / 250 nM TG), MIN6
689 cells were transfected on five separate days to generate biological replicates.
690
691 RNA isolation and MPRA RNA-seq library generation: RNA was extracted from
692 frozen cell pellets using the Qiagen RNeasy Midi kit. Following DNase treatment, a
693 mixture of 3 GFP-specific biotinylated primers (Supplementary Table 1; #120, #123 and
694 #126) was used to immunoprecipitate GFP transcripts using Streptavidin C1
695 Dynabeads (Life Technologies). Following another round of DNase treatment, cDNA
696 was synthesized from GFP mRNA using SuperScript IV and purified with AMPure XP
697 beads. Quantitative PCR using primers specific for GFP (Supplementary Table 1; #34
698 and #52) was used to determine the cycle at which linear amplification begins for each
699 replicate. Replicates were diluted to approximately the same concentration based on the
700 qPCR results, and PCR with primers #34 and #52 was used to amplify barcodes
701 associated with the ~13.5k sequences included in the MPRA library for each replicate (9
702 cycles for standard culture, and 13 cycles for DMSO / 250 nM TG). A second round of
703 PCR (6 cycles) was used to add Illumina sequencing adaptors to the DNA/RNA
704 replicates. The resulting MPRA barcode libraries were spiked with 5% PhiX and
705 sequenced using Illumina single-end 31 bp chemistry (with 8 bp index read), clustered at
706 80-90% maximum density.
707
708 MPRA data analysis: Data from the MPRA was analyzed as previously described13.
709 Briefly, the sum of the barcode counts for each oligo within replicates was median
710 normalized, and oligos showing differential expression relative to the plasmid input were
711 identified by modeling a negative binomial distribution with DESeq2 and applying a false
712 discovery rate (FDR) threshold of 1%. For sequences that displayed significant MPRA
713 activity, a paired t-test was applied on the log-transformed RNA/plasmid ratios for each
714 experimental replicate to test whether the reference and alternate allele had similar
715 activity. An FDR threshold of 10% was used to identify SNPs with a significant skew in
716 MPRA activity between alleles (allelic skew). Because the MPRA testing standard
717 culture conditions was performed with a separate MPRA library preparation,
718 the DMSO-TG MPRA results were not directly compared to MPRA performed under
719 standard culture conditions.
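
The normalization and allelic-skew steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which follows ref. 13 and uses DESeq2); several details are assumptions made only for illustration: median normalization is taken to mean dividing each replicate's per-oligo counts by that replicate's median, and the FDR is taken to be controlled with the Benjamini-Hochberg procedure. The function names are hypothetical. Converting the t statistic to a p-value requires the t distribution with n-1 degrees of freedom and is left to a statistics library.

```python
import math
from statistics import mean, median, stdev


def median_normalize(counts):
    """Scale one replicate's per-oligo counts so that their median equals 1.

    Assumption: this is one plausible reading of "median normalized".
    """
    med = median(counts.values())
    return {oligo: c / med for oligo, c in counts.items()}


def paired_t_statistic(ref_log_ratios, alt_log_ratios):
    """Paired t statistic on per-replicate log RNA/plasmid ratios (alt - ref)."""
    diffs = [a - r for r, a in zip(ref_log_ratios, alt_log_ratios)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage with made-up numbers.
normalized = median_normalize({"ref": 120, "alt": 60, "ctrl": 80})
t_stat = paired_t_statistic([1.0, 1.2, 0.9], [1.5, 1.6, 1.5])
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
allelic_skew = [q <= 0.10 for q in q_values]  # 10% FDR threshold
```

With five replicates per condition, the paired test has four degrees of freedom, so the adjusted p-values rather than the raw t statistics should be compared against the 10% threshold.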
720
721 Annotating repetitive elements tested with MPRA: The ‘RepeatMasker’ track for
722 hg19 was downloaded from the UCSC genome browser. Among the ten different
723 classes of repeats, only three classes (long interspersed nuclear element (LINE), long
724 terminal repeat (LTR), and short interspersed nuclear element (SINE)) overlapped more
725 than 100 elements tested with MPRA. Therefore, only these three classes of repeats
726 were assessed for associations with MPRA activity.
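
The class-level filter described above can be sketched as follows. This is an illustrative Python example rather than the code used in the study; the tuple layouts, function names, and half-open (BED-style) coordinates are assumptions.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_elements_per_class(elements, repeats):
    """Count tested elements overlapping at least one repeat of each class.

    elements: list of (chrom, start, end)
    repeats:  list of (chrom, start, end, repeat_class)
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, rep_class in repeats:
        by_chrom[chrom].append((start, end, rep_class))
    counts = defaultdict(int)
    for chrom, start, end in elements:
        # A set ensures each element is counted once per class.
        hit_classes = {
            rep_class
            for r_start, r_end, rep_class in by_chrom[chrom]
            if overlaps(start, end, r_start, r_end)
        }
        for rep_class in hit_classes:
            counts[rep_class] += 1
    return dict(counts)


def classes_to_assess(counts, min_elements=100):
    """Keep repeat classes overlapping more than min_elements tested elements."""
    return sorted(c for c, n in counts.items() if n > min_elements)


# Toy usage with made-up coordinates.
toy_elements = [("chr1", 100, 300), ("chr1", 500, 700), ("chr2", 0, 200)]
toy_repeats = [
    ("chr1", 250, 400, "SINE"),
    ("chr1", 260, 280, "SINE"),
    ("chr1", 650, 900, "LINE"),
    ("chr2", 200, 300, "LTR"),
]
toy_counts = count_elements_per_class(toy_elements, toy_repeats)
```

The default `min_elements=100` mirrors the cutoff stated in the text; the toy data are far smaller, so a lower cutoff is needed to see any class retained.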
727
728 Mapping human regulatory sequences tested with MPRA to mammalian genomes:
729 The liftOver tool in the UCSC genome browser was used to map human sequences (hg19)
730 tested with MPRA to 20 mammalian genomes (with a minimum ratio of 0.20 bases that
731 must remap; allowing for multiple output regions). The 20 mammalian genomes are:
732 papAnu2 (Baboon), felCat5 (Cat), panTro6 (Chimpanzee), bosTau7 (Cow), canFam3
733 (Dog), loxAfr3 (Elephant), nomLeu3 (Gibbon), gorGor3 (Gorilla), equCab2 (Horse), mm9
734 (Mouse), ponAbe2 (Orangutan), aiMel1 (Panda), susScr11 (Pig), ochPri3 (Pika),
735 oryCun2 (Rabbit), rn5 (Rat), rheMac8 (Rhesus), oviAri3 (Sheep), sorAra2 (Shrew),
736 speTri2 (Squirrel). Human sequences that did not lift over to the genome assembly of a
737 given species were classified as not conserved (with a minimum ratio of
738 0.20 bases that must remap; allowing for multiple output regions).
739 To obtain human-mouse sequence similarity measures, liftover was performed
740 99 times with the minimum ratio of bases that must remap ranging from 0.01 to 1.00 in
741 increments of 0.01 (allowing for multiple output regions). The R package ‘sm’ was used
742 to plot the density of human-mouse sequence similarity and perform non-parametric bootstrap
743 hypothesis tests of equality. Human sequences that did not lift over to the mm9 mouse
744 genome with even 1% sequence similarity were classified as having 0% sequence
745 similarity.
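
One way to turn the repeated liftOver runs into a per-sequence similarity measure is sketched below in Python. The scoring rule is an assumption based on the description above: each sequence receives the highest minimum-remap ratio at which it still maps to mm9, and sequences that never map receive 0. The function name and input layout are hypothetical.

```python
def similarity_scores(all_ids, mapped_by_threshold):
    """Assign each sequence the highest threshold at which it still mapped.

    all_ids: iterable of sequence identifiers submitted to liftOver
    mapped_by_threshold: dict {minimum remap ratio: set of ids that mapped}
    Sequences that never mapped keep a score of 0.0.
    """
    scores = {seq_id: 0.0 for seq_id in all_ids}
    for threshold, mapped in mapped_by_threshold.items():
        for seq_id in mapped:
            if seq_id in scores and threshold > scores[seq_id]:
                scores[seq_id] = threshold
    return scores


# Toy usage: three elements and three of the thresholds.
toy_scores = similarity_scores(
    ["e1", "e2", "e3"],
    {0.01: {"e1", "e2"}, 0.50: {"e1"}, 1.00: set()},
)
```

Here `e1` scores 0.50, `e2` scores 0.01, and `e3`, which failed to map even at the lowest threshold, scores 0.0.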
746
747 Cross-species mapping of human ATAC-seq peaks to MIN6 ATAC-seq peaks:
748 Human and mouse ATAC-seq data were processed as previously described11. Briefly,
749 low quality portions of reads were trimmed using Trimmomatic70 and aligned to the hg19
750 or mm9 genome assembly using Burrows Wheeler Aligner-MEM. For each replicate,
751 duplicate reads were removed after shifting. Technical replicates were merged using
752 SAMtools and peaks were called using MACS26 (with parameters -callpeak --nomodel -f
753 BAMPE) at FDR<1%. ATAC-seq peak summit positions were obtained from MACS2
754 output files. The liftover tool in the UCSC genome browser was used to map human
755 ATAC-seq peaks to the mouse (mm9) genome using a minimum ratio of 0.10 bases that
756 must remap (not allowing for multiple output regions). Using bedtools, human ATAC-seq
757 peaks mapping to the mouse genome were then overlapped with MIN6 ATAC-seq
758 peaks.
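
The final overlap step was performed with bedtools; the Python sketch below is only a stand-in that shows the same logic on in-memory intervals. It assumes half-open BED coordinates and reports each lifted human peak that overlaps at least one MIN6 peak by one base or more. All names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(peaks):
    """Per chromosome: sorted start positions and running maximum of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends, running = [], 0
        for _, e in intervals:
            running = max(running, e)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def has_overlap(index, chrom, start, end):
    """True if [start, end) overlaps any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    n_candidates = bisect_left(starts, end)  # intervals starting before `end`
    return n_candidates > 0 and max_ends[n_candidates - 1] > start


def shared_peaks(lifted_human_peaks, min6_peaks):
    """Lifted human peaks that overlap at least one MIN6 peak."""
    index = build_index(min6_peaks)
    return [p for p in lifted_human_peaks if has_overlap(index, *p)]


# Toy usage with made-up mm9 coordinates.
toy_min6 = [("chr1", 100, 200), ("chr1", 150, 1000), ("chr1", 400, 450), ("chr2", 10, 20)]
toy_lifted = [
    ("chr1", 900, 950),
    ("chr1", 1000, 1100),
    ("chr1", 50, 100),
    ("chr3", 0, 10),
    ("chr2", 19, 30),
]
toy_shared = shared_peaks(toy_lifted, toy_min6)
```

The running maximum of interval ends lets a single binary search answer each query even when peaks have different widths.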
759
760 Transcription factor (TF) motif enrichment: The HOMER findMotifsGenome.pl script was
761 used to investigate TF motifs enriched in a given set of sequences. Elements with lower
762 MPRA activity under ER stress were used as background to identify TF motifs enriched
763 in elements with higher MPRA activity under ER stress, and vice-versa (parameters:
764 hg19, -size given). 2,008 T2D-associated elements with no MPRA activity were used as
765 background to identify TF motifs enriched in the 492 T2D SNP-containing sequences
766 with significant MPRA activity (parameters: hg19, -size given). For the cross-species
767 analysis of ATAC-seq peaks, ATAC-seq peaks shared with other human cell types were
768 used as background to identify TF motifs enriched in unique ATAC-seq peaks
769 (parameters: mm9, -size given).
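
For readers reproducing these comparisons, the Python sketch below assembles a HOMER call with a custom background set. The `-size given` option comes from the text, and `-bg` is HOMER's option for supplying background regions; the file and directory names are hypothetical placeholders, and the helper only builds the argument list without running HOMER.

```python
def homer_command(foreground_bed, background_bed, genome, out_dir):
    """Build (not run) a findMotifsGenome.pl call with a custom background."""
    return [
        "findMotifsGenome.pl",
        foreground_bed,   # regions of interest
        genome,           # e.g. "hg19" or "mm9"
        out_dir,          # HOMER writes its results here
        "-size", "given",
        "-bg", background_bed,
    ]


# Hypothetical file names, for illustration only.
higher_vs_lower = homer_command(
    "higher_in_TG.bed", "lower_in_TG.bed", "hg19", "homer_higher_in_TG"
)
lower_vs_higher = homer_command(
    "lower_in_TG.bed", "higher_in_TG.bed", "hg19", "homer_lower_in_TG"
)
```

Swapping the foreground and background files gives the reciprocal ("vice-versa") comparison. The list can be passed to `subprocess.run(cmd, check=True)` on a system where HOMER and the relevant genome package are installed.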
770
771 Analysis of islet ChIP-seq data: Chromatin immunoprecipitation sequencing (ChIP-
772 seq) data from Pasquali et al.17 were aligned to the hg19 reference human genome as
773 previously described44. Elements tested with MPRA were then overlapped with ChIP-seq
774 peaks to conduct Fisher’s exact tests using R.
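
The tests themselves were run in R. As an illustration of the underlying calculation, the Python sketch below computes a two-sided Fisher's exact test for a 2x2 table whose cells are assumed to be: MPRA-active elements overlapping a ChIP-seq peak (a), active elements not overlapping (b), inactive elements overlapping (c), and inactive elements not overlapping (d). It returns the sample odds ratio, which differs from the conditional maximum-likelihood estimate reported by R's `fisher.test`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)  # tolerance for floating-point ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Toy table (not data from this study).
toy_odds, toy_p = fisher_exact_two_sided(3, 1, 1, 3)
```

For the toy table the sample odds ratio is 9 and the two-sided p-value is 34/70, about 0.486.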
775
776 CentriMo analysis of ATAC-seq peak summits: CentriMo software in the MEME Suite
777 was used to identify TF motifs enriched in the 100 bp genomic regions flanking human
778 islet ATAC-seq peak summits71. Negative controls were 200 bp genomic regions
779 overlapping ATAC-seq peaks but flanking the human islet ATAC-seq peak summits
780 (default parameters).
781
782 Acknowledgements. We thank members of the Stitzel and Ucar labs for helpful
783 discussion and critiques during study design and execution and Francis S. Collins, D.
784 Leland Taylor, Christine Beck and Taneli Helenius for helpful comments on the
785 manuscript. This work was supported by W81XWH-18-0401 (to MLS and DU) and
786 R00HG008179 (to RT).
787
788 References
789 1. 1000 Genomes Project Consortium et al. A global reference for human genetic
790 variation. Nature 526, 68–74 (2015).
791 2. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide
792 association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
793 3. Mahajan, A. et al. Fine-mapping of an expanded set of type 2 diabetes loci to
794 single-variant resolution using high-density imputation and islet-specific
795 epigenome maps. (2018) doi:10.1101/245506.
796 4. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536,
797 41–47 (2016).
798 5. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J.
799 Transposition of native chromatin for fast and sensitive epigenomic profiling of
800 open chromatin, DNA-binding proteins and nucleosome position. Nat Meth 10,
801 1213–1218 (2013).
802 6. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137
803 (2008).
804 7. Wang, X. et al. High-resolution genome-wide functional dissection of
805 transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9,
806 5380 (2018).
807 8. Movva, R. et al. Deciphering regulatory DNA sequences and noncoding genetic
808 variants using neural network models of massively parallel reporter assays. PloS
809 One 14, e0218073 (2019).
810 9. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human
811 expression variation. Nature 482, 390–394 (2012).
812 10. Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with
813 RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).
814 11. Khetan, S. et al. Type 2 Diabetes-Associated Genetic Variants Regulate Chromatin
815 Accessibility in Human Islets. Diabetes 67, 2466–2477 (2018).
816 12. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers
817 in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30,
818 271–277 (2012).
819 13. Tewhey, R. et al. Direct Identification of Hundreds of Expression-Modulating
820 Variants using a Multiplexed Reporter Assay. Cell 165, 1519–1529 (2016).
821 14. Ulirsch, J. C. et al. Systematic Functional Dissection of Common Genetic Variation
822 Affecting Red Blood Cell Traits. Cell 165, 1530–1545 (2016).
823 15. Vockley, C. M. et al. Massively parallel quantification of the regulatory effects of
824 noncoding genetic variation in a human cohort. Genome Res. 25, 1206–1214
825 (2015).
826 16. Klein, J. C. et al. Functional testing of thousands of osteoarthritis-associated
827 variants for regulatory activity. Nat. Commun. 10, 2434 (2019).
828 17. Pasquali, L. et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes
829 risk-associated variants. Nat. Genet. 46, 136–143 (2014).
830 18. Parker, S. C. J. et al. Chromatin stretch enhancer states drive cell-specific gene
831 regulation and harbor human disease risk variants. Proc. Natl. Acad. Sci. U. S. A.
832 110, 17921–17926 (2013).
833 19. Roman, T. S. et al. A Type 2 Diabetes-Associated Functional Regulatory Variant in
834 a Pancreatic Islet Enhancer at the Adcy5 Locus. Diabetes (2017)
835 doi:10.2337/db17-0464.
836 20. Kycia, I. et al. A common type 2 diabetes risk variant potentiates activity of an
837 evolutionarily conserved islet stretch enhancer and increases C2CD4A/B
838 expression. Am. J. Hum. Genet. in press, (2018).
839 21. Varshney, A. et al. Genetic regulatory signatures underlying islet gene expression
840 and type 2 diabetes. Proc. Natl. Acad. Sci. U. S. A. (2017)
841 doi:10.1073/pnas.1621192114.
842 22. Sharma, R. B. et al. Insulin demand regulates β cell number via the unfolded
843 protein response. J. Clin. Invest. 125, 3831–3846 (2015).
844 23. Dooley, J. et al. Genetic predisposition for beta cell fragility underlies type 1 and
845 type 2 diabetes. Nat. Genet. 48, 519–527 (2016).
846 24. Bonnycastle, L. L. et al. Autosomal dominant diabetes arising from a Wolfram
847 syndrome 1 mutation. Diabetes 62, 3943–3950 (2013).
848 25. Delépine, M. et al. EIF2AK3, encoding translation initiation factor 2-alpha kinase
849 3, is mutated in patients with Wolcott-Rallison syndrome. Nat. Genet. 25, 406–
850 409 (2000).
851 26. Booth, C. & Koch, G. L. Perturbation of cellular calcium induces secretion of
852 luminal ER proteins. Cell 59, 729–737 (1989).
853 27. Sehgal, P. et al. Inhibition of the sarco/endoplasmic reticulum (ER) Ca2+-ATPase
854 by thapsigargin analogs induces cell death via ER Ca2+ depletion and the
855 unfolded protein response. J. Biol. Chem. 292, 19656–19673 (2017).
856 28. Stone, S. et al. Pancreatic stone protein/regenerating protein is a potential
857 biomarker for endoplasmic reticulum stress in beta cells. Sci. Rep. 9, 5199
858 (2019).
859 29. Robbins, R. D. et al. Inhibition of deoxyhypusine synthase enhances islet {beta}
860 cell function and survival in the setting of endoplasmic reticulum stress and type
861 2 diabetes. J. Biol. Chem. 285, 39943–39952 (2010).
862 30. Cunha, D. A. et al. Pancreatic β-cell protection from inflammatory stress by the
863 endoplasmic reticulum proteins thrombospondin 1 and mesencephalic
864 astrocyte-derived neutrotrophic factor (MANF). J. Biol. Chem. 292, 14977–14988
865 (2017).
866 31. Syed, I. et al. PAHSAs attenuate immune responses and promote β cell survival in
867 autoimmune diabetic mice. J. Clin. Invest. 129, 3717–3731 (2019).
868 32. Chen, Y.-C. et al. PAM haploinsufficiency does not accelerate the development of
869 diet- and human IAPP-induced diabetes in mice. Diabetologia 63, 561–576
870 (2020).
871 33. Suetomi, R. et al. Adrenomedullin has a cytoprotective role against ER stress for
872 pancreatic β-cells in autocrine and paracrine manners. J. Diabetes Investig.
873 (2020) doi:10.1111/jdi.13218.
874 34. Gao, N. et al. Foxa1 and Foxa2 maintain the metabolic and secretory features of
875 the mature beta-cell. Mol. Endocrinol. Baltim. Md 24, 1594–1604 (2010).
876 35. Fu, Z., Gilbert, E. R. & Liu, D. Regulation of insulin synthesis and secretion and
877 pancreatic Beta-cell dysfunction in diabetes. Curr. Diabetes Rev. 9, 25–53 (2013).
878 36. Cullinan, S. B. & Diehl, J. A. Coordination of ER and oxidative stress signaling: the
879 PERK/Nrf2 signaling pathway. Int. J. Biochem. Cell Biol. 38, 317–332 (2006).
880 37. Oyadomari, S. & Mori, M. Roles of CHOP/GADD153 in endoplasmic reticulum
881 stress. Cell Death Differ. 11, 381–389 (2004).
882 38. Efanov, A. M., Sewing, S., Bokvist, K. & Gromada, J. Liver X receptor activation
883 stimulates insulin secretion via modulation of glucose and lipid metabolism in
884 pancreatic beta-cells. Diabetes 53 Suppl 3, S75-78 (2004).
885 39. Ogihara, T. et al. Liver X receptor agonists augment human islet function through
886 activation of anaplerotic pathways and glycerolipid/free fatty acid cycling. J. Biol.
887 Chem. 285, 5392–5404 (2010).
888 40. Aguayo-Mazzucato, C. et al. Thyroid hormone promotes postnatal rat pancreatic
889 β-cell development and glucose-responsive insulin secretion through MAFA.
890 Diabetes 62, 1569–1580 (2013).
891 41. Aguayo-Mazzucato, C. et al. T3 Induces Both Markers of Maturation and Aging in
892 Pancreatic β-Cells. Diabetes 67, 1322–1331 (2018).
893 42. Brun, P.-J. et al. Retinoic acid receptor signaling is required to maintain glucose-
894 stimulated insulin secretion and β-cell mass. FASEB J. Off. Publ. Fed. Am. Soc. Exp.
895 Biol. 29, 671–683 (2015).
896 43. Peiris, H. et al. Discovering human diabetes-risk gene function with genetics and
897 physiological assays. Nat. Commun. 9, 3855 (2018).
898 44. Stitzel, M. L. et al. Global epigenomic analysis of primary human pancreatic islets
899 provides insights into type 2 diabetes susceptibility loci. Cell Metab. 12, 443–455
900 (2010).
901 45. Gaulton, K. J. et al. A map of open chromatin in human pancreatic islets. Nat.
902 Genet. 42, 255–259 (2010).
903 46. Fogarty, M. P., Panhuis, T. M., Vadlamudi, S., Buchkovich, M. L. & Mohlke, K. L.
904 Allele-specific transcriptional activity at type 2 diabetes-associated single
905 nucleotide polymorphisms in regions of pancreatic islet open chromatin at the
906 JAZF1 locus. Diabetes 62, 1756–1762 (2013).
907 47. Greenwald, W. W. et al. Pancreatic islet chromatin accessibility and
908 conformation reveals distal enhancer networks of type 2 diabetes risk. Nat.
909 Commun. 10, 2078 (2019).
910 48. Miguel-Escalada, I. et al. Human pancreatic islet three-dimensional chromatin
911 architecture provides insights into the genetics of type 2 diabetes. Nat. Genet. 51,
912 1137–1148 (2019).
913 49. Deininger, P. Alu elements: know the SINEs. Genome Biol. 12, 236 (2011).
914 50. Kono, T. et al. Impaired Store-Operated Calcium Entry and STIM1 Loss Lead to
915 Reduced Insulin Secretion and Increased Endoplasmic Reticulum Stress in the
916 Diabetic β-Cell. Diabetes 67, 2293–2304 (2018).
917 51. Ghiasi, S. M. et al. Endoplasmic Reticulum Chaperone Glucose-Regulated Protein
918 94 Is Essential for Proinsulin Handling. Diabetes 68, 747–760 (2019).
919 52. Xin, Y. et al. Pseudotime Ordering of Single Human β-Cells Reveals States of
920 Insulin Production and Unfolded Protein Response. Diabetes 67, 1783–1794
921 (2018).
922 53. Kim, M. J. et al. Attenuation of PERK enhances glucose-stimulated insulin
923 secretion in islets. J. Endocrinol. 236, 125–136 (2018).
924 54. Hasnain, S. Z., Prins, J. B. & McGuckin, M. A. Oxidative and endoplasmic reticulum
925 stress in β-cell dysfunction in diabetes. J. Mol. Endocrinol. 56, R33-54 (2016).
926 55. Pehrsson, E. C., Choudhary, M. N. K., Sundaram, V. & Wang, T. The epigenomic
927 landscape of transposable elements across normal human development and
928 anatomy. Nat. Commun. 10, 5640 (2019).
929 56. Su, M., Han, D., Boyd-Kirkup, J., Yu, X. & Han, J.-D. J. Evolution of Alu elements
930 toward enhancers. Cell Rep. 7, 376–385 (2014).
931 57. Jang, H. S. et al. Transposable elements drive widespread expression of
932 oncogenes in human cancers. Nat. Genet. 51, 611–617 (2019).
933 58. Franchini, L. F. et al. Convergent evolution of two mammalian neuronal
934 enhancers by sequential exaptation of unrelated retroposons. Proc. Natl. Acad.
935 Sci. U. S. A. 108, 15270–15275 (2011).
936 59. Choudhary, M. N. et al. Co-opted transposons help perpetuate conserved higher-
937 order chromosomal structures. Genome Biol. 21, 16 (2020).
938 60. Sundaram, V. et al. Widespread contribution of transposable elements to the
939 innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
940 61. Xie, M. et al. DNA hypomethylation within specific transposable element families
941 associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841
942 (2013).
943 62. Sundaram, V. et al. Functional cis-regulatory modules encoded by mouse-specific
944 endogenous retrovirus. Nat. Commun. 8, 14550 (2017).
945 63. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter
946 profiling reveals widespread alternative promoter usage and transposon-driven
947 developmental gene expression. Genome Res. 23, 169–180 (2013).
948 64. Hernandez, A. J. et al. B2 and ALU retrotransposons are self-cleaving ribozymes
949 whose activity is enhanced by EZH2. Proc. Natl. Acad. Sci. U. S. A. 117, 415–425
950 (2020).
951 65. Bejerano, G. et al. A distal enhancer and an ultraconserved exon are derived
952 from a novel retroposon. Nature 441, 87–90 (2006).
953 66. Mita, P. & Boeke, J. D. How retrotransposons shape genome regulation. Curr.
954 Opin. Genet. Dev. 37, 90–100 (2016).
955 67. Lawlor, N. et al. Multiomic Profiling Identifies cis-Regulatory Networks
956 Underlying Human Pancreatic β Cell Identity and Function. Cell Rep. 26, 788-
957 801.e6 (2019).
958 68. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-
959 based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
960 69. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and
961 dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
962 70. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for
963 Illumina sequence data. Bioinforma. Oxf. Engl. 30, 2114–2120 (2014).
964 71. Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic
965 Acids Res. 40, e128 (2012).
Figure 1 (panels a-d; image not reproduced). Recoverable labels: MPRA schematic with allele-specific constructs (rs123456789-allele-A/-B, minimal promoter (mP), GFP, barcode; scale labels 200 bp and 20 bp) in a plasmid pool transfected into MIN6 beta cells; sequence groups: caQTLs (n=1910), control SNPs (n=2218), T2D-associated SNPs/indels (n=2500; 259 signals); axes: Fraction; log2(RNA/DNA) vs. mean DNA count; odds of MPRA-active sequences overlapping human islet ChIP-seq peaks (PDX1, CTCF, MAFB, H2A.Z, FOXA2, NKX6.1, H3K27ac).
Figure 2 (panels a-d; image not reproduced). Recoverable labels: plasmid pool (6,628 SNPs; 13,628 sequences) transfected into MIN6 mouse beta cells; 6 hours standard culture followed by 24 hrs in 0.025% DMSO (solvent control) or 250 nM thapsigargin (TG), replicates 1-5; cells harvested for RNA 30 hrs after transfection; log2(TG/DMSO) expression normalized to Gapdh for Actb, Ins2, Hspa5, Edem1, Hsp90b1, and Ddit3/Chop; MPRA activity heatmap with elements Lower in TG (n=656) and Higher in TG (n=328); TF motif log2(Enrichment), TG-DMSO vs. DMSO-TG (CEBP, Atf4, Chop, LXRE, THRa, Mef2a, n-Myc, CRE, CLOCK, MafA, RXR, Foxa2, Rfx1, X-box, TFE3, USF1, PAX6, Hnf1).
Figure 3 (panels a-e; image not reproduced). Recoverable labels: elements tested that include (<100 bp) or exclude (>100 bp) the ATAC-seq peak summit; elements overlapping an islet ATAC-seq peak (n=4329) vs. not overlapping (n=2299); MPRA active (n=1553) vs. not MPRA active (n=2666); density vs. median distance from islet ATAC-seq peak summit to sequence center; # islets with overlapping ATAC-seq peak; motif probability vs. distance from sequence center for summit and flank sequences (JUN p=7.8e-169, FOS p=2.1e-162, BATF p=2.0e-124, NF2L2 p=2.2e-116, NFE2 p=1.6e-107, BACH2 p=7.9e-99, CREB1 p=5.8e-87, CEBPZ p=1.5e-62, NFYB p=1.9e-49); probability of an element showing MPRA activity (TG, DMSO, standard culture with 25 mM glucose) vs. human-mouse sequence conservation (<20%, 20-40%, 40-60%, 60-80%, >80%).
Figure 4 (panels a-d; image not reproduced). Recoverable labels: probability of an element showing MPRA activity (# MPRA active / # elements; control SNPs 707 of 2218, caQTL SNPs 778 of 1910); probability an MPRA-active element has allelic skew in activity (# allelic skew / # MPRA active; control SNPs 283 of 707, caQTL SNPs 376 of 778); MPRA activity log2(higher allele / lower allele) for control SNPs (n=283) and caQTL SNPs (n=376); absolute (median) distance of SNP to ATAC-seq peak summit.
Figure 5 (panels a-c; image not reproduced). Recoverable labels: rs17396537 (G/C) element, 200 bp, at the LYPD1 locus with ATAC-seq peak summit and genotypes GG (n=6), GC (n=11), CC (n=2); log2(RNA/DNA) for REF and ALT alleles; MPRA log2(Alt/Ref) vs. caQTL effect size (Pearson R = 0.526), with quadrant counts N=24[+2], N=111, N=133[+2], N=23[+2] and labeled SNPs rs1515555, rs2943656 (IRS1), rs17396537, rs10428126 (IGF2BP2); log2(RNA/DNA) in DMSO for rs1515555 (G/A) and rs1515556 (T/C) allele combinations.
Figure 6 (panels a-d; image not reproduced). Recoverable labels: TF motif enrichment, -log10(p value) vs. log2(# Target / # Background) (motifs include LXRE, GSC, Atf1, THRa, EAR2, COUP-TFII, CRX, RARa, THRb, Usf2, Bcl11a, Sp1, Sp2, Sp5, KLF5, KLF14, Mef2a, Mef2b, Mef2c, Chop, ZBTB20, Nkx3.1, USF1, RAR:RXR, Pitx1, NRF, NFkB-p65-Rel); # SNPs with allelic skew in MPRA activity vs. log2(# SNPs tested) per locus (HIGD1C, PCNXL2, INTS8, HMG20A, IGF2BP2, JAZF1, CDKN2B, VEGFA, MAP3K1, TCF7L2, ZBTB20, SLC11A2); allelic skew, log2(minor allele / major allele), for rs9879646 and rs143582379 under standard culture (25 mM glucose), DMSO, and TG, with conservation tracks (Rhesus, Mouse, Dog, Elephant, Opossum).

Figure 7 (panels a-d; image not reproduced). Recoverable labels: heatmap of MPRA activity (z-score of RNA/DNA) under ER stress by replicate (1-5), allele (Ref, Alt), and treatment (DMSO, TG) for T2D-associated SNPs with allelic skew (including rs224587, rs73090647, rs9879646, rs12316856); fraction of elements overlapping SINE repeats for caQTLs, T2D-associated SNPs, and 10,000 random regions from hg19, grouped by MPRA activity, allelic skew, and higher activity under ER stress; change in MPRA activity under ER stress of SNPs with allelic skew (lower, no change, higher); probability that SNPs with allelic skew (caQTLs, n=376; controls, n=283) are conserved (sequences counted as conserved if sequence similarity > 20%) across Baboon, Cat, Chimpanzee, Cow, Dog, Elephant, Gibbon, Gorilla, Horse, Mouse, Orangutan, Panda, Pig, Pika, Rabbit, Rat, Rhesus, Sheep, Shrew, and Squirrel.
Columns 10-12 give the allele with higher MPRA activity under standard culture, DMSO, and 250 nM TG, respectively.

IndexSNP | Locus | SNP | REF_ALLELE | ALT_ALLELE | MAF_AFR | MAF_AMR | MAF_ASN | MAF_EUR | Standard Culture | DMSO | 250 nM TG | References | Comments
rs7903146 | TCF7L2 | rs7903146 | C | T | 0.28 | 0.25 | 0.03 | 0.31 | - | T | - | Stitzel 2010, Gaulton 2010 | Validated with MPRA
rs9472138 | VEGFA | rs12189774 | G | A | 0.19 | 0.21 | 0.11 | 0.29 | - | G | G | Miguel-Escalada 2019 | Validated with MPRA
rs3842770 | INS-IGF2 | rs115420895 | G | A | 0.06 | 0.01 | 0 | 0.0013 | A | - | - | Khetan 2018 | Validated with MPRA
rs7578326 | IRS1 | rs2943656 | A | G | 0.6 | 0.73 | 0.94 | 0.62 | - | G | G | Khetan 2018 | Validated with MPRA
rs10507349 | RNF6 | rs4630391 | C | T | 0.08 | 0.36 | 0.44 | 0.24 | C | C | C | Khetan 2018 | MPRA has further clarified haplotype
rs1535500 | KCNK16 | rs12663159 | C | A | 0.86 | 0.49 | 0.43 | 0.49 | A | - | - | Varshney 2017 | MPRA has further clarified haplotype
rs864745 | JAZF1 | rs1635852 | T | C | 0.22 | 0.4 | 0.21 | 0.54 | - | C | - | Fogarty 2013 | Validated with MPRA
rs864745 | JAZF1 | rs849133 | C | T | 0.21 | 0.4 | 0.21 | 0.53 | C | C | - | | Novel
rs10811661 | CDKN2B-AS1 | rs10811660 | G | A | 0.05 | 0.14 | 0.44 | 0.16 | - | G | - | Wesolowska-Andersen 2019, bioRxiv | Validated with MPRA
rs10811661 | CDKN2B-AS1 | rs10811661 | T | C | 0.05 | 0.14 | 0.44 | 0.16 | - | C | - | Khetan 2018 | Validated with MPRA
rs10811661 | CDKN2B-AS1 | rs10965246 | T | C | 0.07 | 0.14 | 0.44 | 0.17 | C | C | - | | Novel
rs1470579 | IGF2BP2 | rs10428126 | T | C | 0.6 | 0.29 | 0.28 | 0.29 | T | T | - | WW Greenwald 2019 | Validated with MPRA
rs1470579 | IGF2BP2 | rs9808924 | G | A | 0.8 | 0.32 | 0.29 | 0.29 | A | - | - | | Novel
rs1470579 | IGF2BP2 | rs9854769 | A | G | 0.8 | 0.31 | 0.29 | 0.29 | G | G | G | | Novel
rs1470579 | IGF2BP2 | rs7630554 | A | G | 0.79 | 0.31 | 0.29 | 0.28 | G | G | G | | Novel
rs1470579 | IGF2BP2 | rs35188816 | CA | C | 0.78 | 0.31 | 0.28 | 0.27 | - | CA | - | | Novel
rs1048886 | SDHAF4 | rs2881632 | C | A | 0.4 | 0.22 | 0.12 | 0.19 | A | A | A | | Novel
rs12496230 | KBTBD8 | rs13096599 | T | C | 0.02 | 0.2 | 0.08 | 0.18 | C | C | C | | Novel
rs6467136 | GCC1 | rs7783500 | G | A | 0.54 | 0.4 | 0.67 | 0.42 | G | G | G | | Novel
rs10923931 | NOTCH2 | rs72697237 | A | T | 0.36 | 0.12 | 0.03 | 0.11 | T | T | T | | Novel
rs7578597 | THADA | rs13405776 | C | T | 0.34 | 0.07 | 0.01 | 0.09 | T | T | T | | Novel
rs7630877 | PEX5L | rs17748864 | C | T | 0.24 | 0.27 | 0.15 | 0.34 | T | T | T | | Novel
rs7957197 | OASL | rs7957197 | T | A | 0.17 | 0.15 | 0 | 0.2 | T | T | T | | Novel
rs34669198 | SRBD1 | rs13026123 | T | A | 0.09 | 0.09 | 0.1 | 0.13 | T | T | T | | Novel