Functional Characterization of Thousands of Type 2 Diabetes-Associated And
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 Functional characterization of thousands of type 2 diabetes-associated and 2 chromatin-modulating variants under steady state and endoplasmic reticulum 3 stress 4 5 Shubham Khetan1,3, Susan Kales2, Romy Kursawe1, Alexandria Jillette1, Steven K. 6 Reilly4, Duygu Ucar1,3,5, Ryan Tewhey2,6,7*, and Michael L. Stitzel1,3,5* 7 8 1The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA 9 2The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, 04609, USA 10 3Department of Genetics and Genome Sciences, University of Connecticut, Farmington, 11 CT, 06032, USA, 12 4Broad Institute of MIT and Harvard, Cambridge, MA, USA 13 5Institute of Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA 14 6Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, 15 ME, 04469, USA. 16 7Tufts University School of Medicine, Boston, MA, 02111, USA 17 18 *Corresponding authors: [email protected] and [email protected] 19 20 Keywords: Massively Parallel Reporter Assay (MPRA), human pancreatic islets, MIN6 21 beta cells, chromatin accessibility, ATAC-seq, chromatin accessibility quantitative trait 22 locus (caQTLs), genome-wide association study (GWAS), single nucleotide 23 polymorphism (SNP), Type 2 diabetes (T2D), endoplasmic reticulum (ER) stress, short 24 interspersed nuclear elements (SINE), Alu elements. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 25 Abstract: (199 words) 26 A major goal in functional genomics and complex disease genetics is to identify 27 functional cis-regulatory elements (CREs) and single nucleotide polymorphisms (SNPs) 28 altering CRE activity in disease-relevant cell types and environmental conditions. We 29 tested >13,000 sequences containing each allele of 6,628 SNPs associated with altered 30 in vivo chromatin accessibility in human islets and/or type 2 diabetes risk (T2D GWAS 31 SNPs) for transcriptional activity in ß cell under steady state and endoplasmic reticulum 32 (ER) stress conditions using the massively parallel reporter assay (MPRA). 33 Approximately 30% (n=1,983) of putative CREs were active in at least one condition. 34 SNP allelic effects on in vitro MPRA activity strongly correlated with their effects on in 35 vivo islet chromatin accessibility (Pearson r=0.52), i.e., alleles associated with increased 36 chromatin accessibility exhibited higher MPRA activity. Importantly, MPRA identified 37 220/2500 T2D GWAS SNPs, representing 104 distinct association signals, that 38 significantly altered transcriptional activity in ß cells. This study has thus identified 39 functional ß cell transcription-activating sequences with in vivo relevance, uncovered 40 regulatory features that modulate transcriptional activity in ß cells under steady state and 41 ER stress conditions, and substantially expanded the set of putative functional variants 42 that modulate transcriptional activity in ß cells from thousands of genetically-linked T2D 43 GWAS SNPs. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 44 Introduction 45 Studies over the past decade have identified millions of single nucleotide 46 polymorphisms (SNPs) in the human population1. Genome-wide association studies 47 (GWAS) have linked thousands of these SNPs to variability in physiological traits and 48 disease susceptibility2, including type 2 diabetes (T2D)3. The overwhelming majority of 49 GWAS SNPs reside in non-coding, regulatory regions of the genome3,4, implicating 50 altered transcriptional regulation as a common molecular mechanism underlying disease 51 risk. Compared to protein-coding regions, predicting regulatory functions of non-coding 52 sequences, and the effect of SNPs within them, remains a significant challenge. 53 Approaches such as ATAC-seq identify regions of accessible chromatin as a 54 marker of in vivo cis-regulatory elements (CREs)5. However, CREs identified by 55 chromatin accessibility (ATAC-seq peaks) vary significantly in length. Moreover, ATAC- 56 seq peaks are characterized by 1 or more summits, which have higher chromatin 57 accessibility than flanking genomic regions also within ATAC-seq peaks6. Few studies 58 have explored the relationship between transcriptional activity and open chromatin 59 regions defined by ATAC-seq7,8. For instance, it is not known whether higher chromatin 60 accessibility within ATAC-seq peaks also corresponds to enhanced transcriptional 61 activity, or whether proximity to ATAC-seq peak summits influences the likelihood for a 62 SNP to affect enhancer activity. Studies designed to identify chromatin accessibility 63 quantitative trait loci (caQTL) are being increasingly applied to nominate SNPs altering 64 CRE activity9,10. Previously, we identified 2,949 caQTLs in human islets11, which were 65 significantly enriched for type 2 diabetes (T2D)-associated SNPs, highlighting the utility 66 of this approach to nominate putative disease-associated SNPs affecting CRE activity in 67 relevant cell types. However, caQTLs are correlative associations, and high throughput 68 assays to directly test variant effects are needed to experimentally establish causality. 69 bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 70 Massively parallel reporter assays (MPRAs) have been developed as a functional 71 genomics platform to interrogate the enhancer potential of thousands of sequences 72 simultaneously12 under a variety of (patho)physiologic conditions. By introducing 73 nucleotide changes in a given sequence of interest, the effect of naturally occurring 74 sequence variants in the human population on MPRA activity can also be elucidated13. 75 Several studies have employed MPRA to identify functional SNPs associated with red 76 blood cell traits14, adiposity15, osteoarthritis16, cancer and eQTLs13. 77 Although studies over the past several years implicate altered islet CRE function 78 in T2D genetic risk and progression, they have colocalized only ~1/4 of T2D-associated 79 loci to altered chromatin accessibility and/or gene expression levels in islets11,17–20. We 80 hypothesize that this is partially because previous studies measured chromatin 81 accessibility (caQTLs) and gene expression (eQTL) in islets cultured under steady state 82 conditions11,21, consequently missing important gene-environment interactions 83 characteristic of a complex disorder like T2D. For example, endoplasmic reticulum (ER) 84 stress, which is elicited by the high demands of insulin production and secretion on 85 protein folding capacity in β cells, leads to (patho)physiologic adaptations. Low levels of 86 ER stress are a signal for β cells to proliferate as a mechanism to meet higher insulin 87 secretion demands22. However, uncompensated ER stress induced by genetic and/or 88 environmental perturbations leads to β cell failure and T2D23. 89 Here, we applied MPRA to directly test >13,000 sequences, including those 90 overlapping T2D GWAS and islet caQTL SNPs, for their ability to activate ß cell 91 transcription in steady state conditions and after exposure to the ER stress-inducing 92 agent thapsigargin or DMSO solvent control. We identified motifs and features of 93 sequences affecting MPRA activity in each condition and linked allelic effects on MPRA 94 activity to T2D genetics and in vivo islet chromatin accessibility. These results 95 demonstrate the power of MPRA to elucidate in vivo relevant ß cell transcriptional bioRxiv preprint doi: https://doi.org/10.1101/2020.02.12.939348; this version posted February 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 96 activating sequences and define the regulatory grammar of these sequences under 97 steady state and stress-responsive conditions. We nominate 54 putative causal variants 98 among multiple genetically-linked islet dysfunction and T2D GWAS SNPs for further 99 study. 100 101 Results 102 Selection and testing of sequences for MPRA activity in β cells 103 To test putative islet β cell regulatory sequences for their ability to enhance 104 transcription from a minimal promoter and to identify SNPs that alter regulatory activity, 105 we generated an MPRA library containing: i) 1,910 SNPs significantly associated with 106 changes in human islet chromatin accessibility (caQTLs)11; ii) 2,218 control SNPs 107 overlapping human islet ATAC-seq peaks, which were reported to have no correlation 108 with changes in human islet chromatin accessibility11; and iii) 2,500 index and genetically 109 linked (r2>0.8) SNPs/indels corresponding to 259 T2D association signals in the 110 NHGRI/EBI GWAS Catalogue (Figure 1a,