Identifying regulatory mechanisms of human disease
Tim Reddy
Genomics and Precision Medicine Forum
April 5, 2018

My long-term goal
To understand how changes in gene regulation contribute to human health and disease

Two stories:
• Identifying genetic mechanisms of disease
• Quantifying the gene regulatory effects of drugs

There are now thousands of known associations between genotype and phenotype

Genetic associations with human traits and diseases are largely non-coding
• Protein Coding Exonic (7%)
• Untranslated (2%)
• Non-coding (91%)

Why this is important
• Improved diagnostics
• Improved preventative measures
• Potential to identify therapeutically actionable mechanisms
• Regulatory elements may be targetable therapeutically
For all of these reasons, understanding the regulatory mechanisms of human disease has immense value for improving health.

Case study: hyperglycemia during pregnancy

GDM and fetal health contribute to a transgenerational cycle of diabetes and obesity
[Figure: cycle diagram linking Maternal Obesity/Diabetes; Fetal Overnutrition and Macrosomia; Postnatal Overnutrition; Adolescent Obesity and Early-Onset Type 2 Diabetes; and Adult Obesity, Type 2 Diabetes, and Metabolic Syndrome]
Slide from Bill Lowe. Adapted from: Dabelea and Crume, Diabetes 60:1849, 2011

Hyperglycemia and Adverse Pregnancy Outcomes (HAPO)
• The HAPO study was designed to address the hypothesis that hyperglycemia is associated with adverse neonatal outcomes.
• GWA between maternal genotype and measures of glucose metabolism identified variants in several genomic regions that are known to be associated with type 2 diabetes
• HAPO also found novel genetic associations with hyperglycemia specifically during pregnancy.

Genetic variation on chr10 associated with hyperglycemia during pregnancy
Hayes et al, Diabetes, 2013

Lead imputed variant in 1st intron of HKDC1
[Figure labels: rs4746822 (Lead SNP, Imputation); rs1983127 (Genotyped SNP)]
Hayes MG et al, Diabetes, 2013

Lots of epigenetic signals of regulation
[Figure labels: rs4746822 (Lead SNP, Imputation); rs1983127 (Genotyped SNP)]
Hayes MG et al, Diabetes, 2013

Candidate regulatory elements in the locus
[Figure labels: rs4746822 (Lead SNP); rs1983127 (Genotyped SNP)]
[Figure: candidate regulatory elements labeled I through XI]
Guo et al, Nat Comm, 2015

Allele-specific reporter assays
[Diagram: paired luciferase reporter constructs, one with the C allele and one with the G allele]

Many regulatory variants near HKDC1
[Figure labels: rs4746822 (Lead SNP); rs1983127 (Genotyped SNP)]
Guo et al, Nat Comm, 2015

Regulatory effects are coordinated with respect to risk allele
[Figure labels: rs4746822 (Lead SNP); rs1983127 (Genotyped SNP); legend: Direction of effect with respect to GWAS risk allele]
Guo et al, Nat Comm, 2015

HKDC1 Complete Literature Review (ca. 2013)
Inferring therapeutic targets from heterogeneous data: HKDC1 is a novel potential therapeutic target for cancer. Li GH, Huang JF., Bioinformatics, 2013
Identification and characterization of genes that control fat deposition in chickens. Claire D'Andre H, Paul W, Shen X, Jia X, Zhang R, Sun L, Zhang X., J Anim Sci Biotechnol. 2013
Identification of HKDC1 and BACE2 as genes influencing glycemic traits during pregnancy through genome-wide association studies. HAPO Study Cooperative Research Group., Diabetes. 2013
Case-control genome-wide association study of attention-deficit/hyperactivity disorder. IMAGE II Consortium Group, J Am Acad Child Adolesc Psychiatry. 2010
Molecular evolution of the vertebrate hexokinase gene family: Identification of a conserved fifth vertebrate hexokinase gene. Irwin DM, Tan H., Comp Biochem Physiol Part D Genomics Proteomics. 2008

Hexokinase catalyzes the first step in glycolysis

Review of hexokinases
Wikipedia: There are four important mammalian hexokinase isozymes. (emphasis added)
HK1, HK2, HK3:
• Km < 1 mM glucose
• Can metabolize various hexose sugars
• Activity saturated at physiological glucose concentrations.
HK4 (Glucokinase):
• Km ~ 8 mM
• Activity is dynamic over physiological [glucose]

Genetic variation near HK1 and HK4 has been associated with diabetes
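The contrast between a low-Km hexokinase and glucokinase can be made concrete with a small calculation. The sketch below assumes simple Michaelis-Menten kinetics, which is a simplification (glucokinase is known to show cooperative kinetics). The 0.1 mM value is a hypothetical stand-in for "Km < 1 mM", and the 4 to 10 mM glucose range is an assumed illustration of physiological concentrations, not a value from the slides.

```python
def fractional_rate(glucose_mM, km_mM):
    """Fraction of Vmax under Michaelis-Menten kinetics: v/Vmax = S / (Km + S)."""
    if glucose_mM < 0 or km_mM <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mM / (km_mM + glucose_mM)


# Illustrative Km values. LOW_KM_MM is an assumption standing in for
# "Km < 1 mM"; GCK_KM_MM is the ~8 mM glucokinase value quoted on the slide.
LOW_KM_MM = 0.1
GCK_KM_MM = 8.0

# Assumed glucose concentrations (mM) spanning a plausible physiological range.
for glucose in (4.0, 6.0, 8.0, 10.0):
    low = fractional_rate(glucose, LOW_KM_MM)
    gck = fractional_rate(glucose, GCK_KM_MM)
    print(f"{glucose:4.1f} mM glucose: low-Km {low:.2f} of Vmax, "
          f"glucokinase {gck:.2f} of Vmax")
```

Under these assumptions the low-Km enzyme stays near saturation across the whole range, while the glucokinase rate keeps changing with glucose, which is the sense in which its activity is "dynamic".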
[Figure: HKDC1 knockdown and overexpression experiments. Recoverable labels: HKDC1 mRNA Expression and Cellular HK Activity (relative values) for Scrambled vs. HKDC1 siRNA 1, 2, and 1+2; relative expression of HK1, HK2, GCK, and HKDC1; Adenovirus: GFP, HK1, HKDC1; blots for anti-HKDC1 and anti-β-actin; activity plotted against log [glucose (mM)]]

Purified HKDC1 has hexokinase activity
[Figure labels: Control; HKDC1; size markers in kD]

Conclusion: HKDC1 is a 5th human hexokinase
Guo et al, Nat Comm, 2015

Mouse Model of HKDC1: non-pregnant adults are normoglycemic
Ludvik et al, Endocrinology, 2016

Mouse Model of HKDC1: impaired glucose tolerance in pregnancy
Ludvik et al, Endocrinology, 2016

Summary
• Much of the genetics of complex disease maps to non-coding regions of the genome
• Mapping the causes underlying those associations suggests that multiple genetic variants may underlie each association signal
• Doing so can reveal unexpected candidate genes that themselves could have therapeutic potential

That was hard.
(And that was an easy case.)

The 3q25 locus associated with fetal adiposity
Vockley, Guo, Majoros, et al, Genome Research, 2015

The 3q25 locus associated with fetal adiposity
Epigenetically predicted candidate regulatory elements
Vockley, Guo, Majoros, et al, Genome Research, 2015

The 3q25 locus associated with fetal adiposity
Candidate target genes:
• Long-noncoding RNAs in adiposity (Sun et al, 2013)
• Cyclin-L1: could be involved in cell cycling
• Makes fruit flies fat (Melted gene, Teleman et al, 2005)
Vockley, Guo, Majoros, et al, Genome Research, 2015

Making allele-specific reporter assays high-throughput
[Diagram: paired luciferase reporter constructs, one with the C allele and one with the T allele]

STARR-seq reporter assays
[Diagram: GFP reporter construct, built up over successive slides]
• Regulatory elements located in the 3’ UTR of the reporter gene.
• From that position, the elements regulate their own expression.
[Diagram: polyadenylated reporter transcripts (AAAAAAA.....), sequenced as Read 1 and Read 2; regulatory element activity shown as 4 vs. 1]
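A minimal sketch of how such a read-out could be turned into a number, assuming activity is scored as reporter RNA reads relative to input DNA reads for each element. The function name, the counts-per-million scaling, the pseudocount, and the element names and counts are all illustrative assumptions, not the pipeline used in the cited work.

```python
import math


def starr_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Return log2(RNA/DNA) per element after scaling each library to counts per million."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)  # elements with no RNA reads count as 0
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[element] = math.log2(rna_cpm / dna_cpm)
    return activity


# Hypothetical counts: equal DNA input, four times more RNA from element_A.
dna = {"element_A": 100, "element_B": 100, "element_C": 100}
rna = {"element_A": 400, "element_B": 100, "element_C": 100}
scores = starr_activity(rna, dna)
for element, score in sorted(scores.items()):
    print(f"{element}: log2(RNA/DNA) = {score:.2f}")
```

Dividing by the DNA input is what lets elements that are simply over-represented in the library be distinguished from elements that drive more transcription.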
STARR-seq: Arnold et al, Science, 2013

A platform for diverse studies
[Diagram: GFP reporter construct; input source shown: Bacterial Artificial Chromosomes]
STARR-seq: Arnold et al, Science, 2013

Comprehensively assaying genomic responses to steroid hormones
[Diagram: GFP reporter construct]
Vockley et al, Cell, 2016

Probe-based capture of GWAS regions
[Diagram labels: Patient DNA; Custom RNA Baits; Pulldown of Selected Regions; Input; GFP reporter construct]

Coverage of Target Regions
[Figure: coverage of reporter assays in the region]

Output from POP-STARR assays in HepG2 cells
[Figure: coverage of reporter assays in the region]

Differences in Regulatory Activity Predict Hyperglycemia
[Figure: normalized expression of the Ref and Alt alleles for rs6517656, rs1541103, rs2776343, and rs13049843]

$$\text{Normalized Expression} = \frac{\text{Alternate Allele RNA} / \text{Alternate Allele DNA}}{\text{Ref Alleles RNA} / \text{Ref Alleles DNA}}$$
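The normalized expression ratio on this slide (alternate-allele RNA over DNA, divided by reference-allele RNA over DNA) can be sketched as a short function. The function name and the read counts below are hypothetical, chosen only to show the arithmetic; they are not data from the study.

```python
def normalized_expression(alt_rna, alt_dna, ref_rna, ref_dna):
    """(alt RNA / alt DNA) / (ref RNA / ref DNA); 1.0 means no allelic difference."""
    for name, value in (("alt_dna", alt_dna), ("ref_rna", ref_rna), ("ref_dna", ref_dna)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return (alt_rna / alt_dna) / (ref_rna / ref_dna)


# Hypothetical read counts for one variant: both alleles are equally
# represented in the DNA input, but the alternate allele yields half the RNA.
ratio = normalized_expression(alt_rna=150, alt_dna=500, ref_rna=300, ref_dna=500)
print(f"alt/ref normalized expression: {ratio:.2f}")
```

Normalizing each allele by its own DNA count keeps unequal allele representation in the patient-derived library from being mistaken for a regulatory difference.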
Long Distance Noncoding Variants
[Figure: -log10 p value plotted against position on chr21 (Mb), from 41 to 42 Mb]

CRISPR/Cas9 epigenome editing
[Figure panels: Genome Editing; Epigenome Editing]
Nature (2016)

Enhancer Activation: dCas9-P300 at HS2 activates globin expression 46 kb away
Hilton et al, Nature Biotechnology, 2015

Enhancer Repression: dCas9-KRAB causes H3K9me3...
[Figure: genome browser view of the globin locus (coordinates 5265000 to 5305000) showing HS2 (gRNA target), HBE1, HBG2, HBG1, HBBP1, HBD, and HBB; tracks: DNase HS, and H3K9me3 ChIP-seq for dCas9-KRAB only, dCas9 + Cr4, dCas9-KRAB + Cr4, dCas9 + Cr10, and dCas9-KRAB + Cr10]
Thakore et al, Nature Methods, 2015

...and specific silencing of globin gene expression
[Figure: log2 (relative expression) plotted against average expression, with HBB, HBD, HBG1, HBG2, HBBP1, and HBE1 labeled]
Thakore et al, Nature Methods, 2015

Using dCas9-P300 to reveal target genes in genetic associations
gRNAs designed to target dCas9 or dCas9-P300 to candidate HKDC1 regulatory elements.
Karl Guo

Summary
• Using reporter assays, we can systematically and comprehensively assay regulatory variants from patient DNA across genetic association loci
• Combined with CRISPR-based approaches to reprogram the epigenome, we can find target genes for those variants.
• Together, this combination of approaches is directing us to unstudied target genes, revealing valuable new biological mechanisms of disease

Two stories:
• Identifying genetic mechanisms of disease
• Quantifying the gene regulatory effects of drugs

Glucocorticoids (GCs)
• Suppress the immune system
• Reduce inflammation
• Increase blood sugar
• Reduce bone density

The basic mechanism of the GC response

The GC response is one of the most extensively characterized genomic responses
GR binds tens of thousands of sites in the human genome across many different cell types.
That binding coordinates with a host of other TFs and co-factors to ultimately regulate gene expression.
There are hundreds to thousands of GC-responsive genes, with slightly more activated genes than repressed genes.

Many fundamental questions remain open
We still do not know how GR chooses which genomic sites to bind.
We still do not know which of those binding sites are functional.
We still do not know how GR typically represses genes.

Quantifying the functionality of GR binding sites
• Luciferase Reporter Assays
• ChIP-STARR-seq Assays
Vockley et al, Cell, 2016

A platform for diverse studies
[Diagram: GFP reporter construct; input source shown: Whole human genome]
STARR-seq: Arnold et al, Science, 2013

Whole human genome STARR-seq libraries
[Workflow diagram labels: NA12878 DNA; 20 mL of Gibson Assembly reactions; WG-SS library; transformation; A549 transfection; dex exposure (h): 0, 1, 4, 8, 12; 5x biological replicates; extract RNAs; library construction; deep sequencing; call peaks; differential enrichment; Activity]
Graham Johnson

Library coverage and fragment size
Median fragment size of 390 bp
Median 55x genome coverage
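As a rough back-of-envelope check on what these two numbers imply for library size, the sketch below uses the standard mean-coverage relation C = N × L / G. It treats the reported medians as if they were means and assumes a genome size of about 3.1 Gb; both are simplifying assumptions, so the result is an order-of-magnitude estimate only.

```python
def expected_coverage(n_fragments, fragment_bp, genome_bp):
    """Mean per-base coverage if fragments tile the genome uniformly: N * L / G."""
    return n_fragments * fragment_bp / genome_bp


def fragments_for_coverage(coverage, fragment_bp, genome_bp):
    """Invert the same relation: N = C * G / L."""
    return coverage * genome_bp / fragment_bp


GENOME_BP = 3.1e9   # assumed approximate human genome size
FRAGMENT_BP = 390   # median fragment size from the slide, used here as a mean
COVERAGE = 55       # median genome coverage from the slide, used here as a mean

n = fragments_for_coverage(COVERAGE, FRAGMENT_BP, GENOME_BP)
print(f"approximately {n:.2e} unique fragments")
```

Under these assumptions the library would need on the order of a few hundred million unique fragments.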
Graham Johnson

Whole human genome STARR-seq libraries
• 10^9 cells
• 5 replicates each
• 8 rxns per replicate
• 3 x 10^9 reads for DNA
• 3 x 10^9 reads for RNA
Graham Johnson

Whole human genome STARR-seq for the GC response
Graham Johnson

GC regulatory activity increases across the time course
Graham Johnson, PhD

The landscape of the genomic DEX response
• Data browser: http://ggr.reddylab.org
• Data download: http://www.encodeproject.org
• DEX-responsive regulatory elements (McDowell poster #99)

Whole human genome STARR-seq for the GC response
In contrast, GR binding occurs early and wanes over time
Graham Johnson, PhD

GR and AP1 motifs prominent at dex-induced sites
[Figure: panel A, motif enrichment heatmap (-Log10 Adj. P-values) across quartiles Q1 to Q4 of binned steady-state REs and binned 8 h DREs, with rows including GR, AP1, TP53, TP73, HES2, FOXG1, NFAT5, CREB1, MEF2D, GATA2, NFKB2, SP8, IRF8, TCF3, TBP, ZFX, HOXA10, STAT1, HSF1, NRF1, SOX6, POU3F1, and ZBTB33; panels B and C, comparisons across quartiles Q1 to Q4 with pairwise significance marks]

P53 and AP1 prominent at repressed sites
[Figure: same motif enrichment panels as on the previous slide]

Identification of allele-specific GC responses

50% of dex-induced activity is outside of DHS
One hypothesis: some of those sites are sub-threshold DHS
• Q: Do those sites bind TFs? A: About 10%
• Q: Do those sites bind GR? A: 258 of 1700 (~15%)
Our hypothesis: the remaining closed chromatin sites are latent GR response elements in other cell types.
Graham Johnson, PhD

Summary
High coverage wgSTARR-seq assays provide detailed maps of the effect of GCs on regulatory element activity.
The assays can also be used to map allele-specific activity and latent enhancers in other cells.
Because the assays do not require prior knowledge of mechanism, they can be used to measure regulatory responses to many environmental signals.

Reddy Lab: Acknowledgements Funding: Graham Johnson, PhD William Lowe (NW) Sarah Cunningham Geoff Hayes (NW) Bill Majoros, PhD Denise Scholtens (NW) Keith Siklenka, PhD Mike Nodzenski Anthony D’Ippolito, PhD Ian McDowell, PhD Brian Layden Lab (UIC) Young-Sook Kim Jungkyun Seo Greg Crawford Lab: Reddy Lab Alums: Alejandro Barrera Alexias Safi Chris Vockley, PhD Luke Bartelt Ling Song Karl Guo, PhD Courtney Williams Nicole Clark Linda Hong Ana Berglind Alex Hartemink Lab: Sarah Leichter Charlie Gersbach Lab: Kevin Luo, PhD Dewran Kocak Barbara Engelhardt Lab: Tyler Klann, Josh Black Bianca Dumitrascu