Identifying regulatory mechanisms of human disease

Tim Reddy
Genomics and Precision Medicine Forum
April 5, 2018

My long term goal

To understand how changes in regulation contribute to human health and diseases

Two stories:
• Identifying genetic mechanisms of disease
• Quantifying the gene regulatory effects of drugs

There are now thousands of known associations between genotype and phenotype

Genetic associations with human traits and diseases are largely non-coding
[Figure: pie chart. Protein Coding Exonic (7%); Untranslated (2%); Non-coding (91%)]

Why this is important
• Improved diagnostics
• Improved preventative measures
• Potential to identify therapeutically actionable mechanisms
• Regulatory elements may be targetable therapeutically

For all of these reasons, understanding the regulatory mechanisms of human disease has immense value for improving health.

Case study: hyperglycemia during pregnancy

GDM and fetal health contribute to a transgenerational cycle of diabetes and obesity

[Figure: cycle diagram with the stages Maternal Obesity/Diabetes; Fetal Overnutrition, Macrosomia; Postnatal Overnutrition; Adolescent Obesity, Early-Onset Type 2 Diabetes; Postnatal Overnutrition; Adult Obesity and Type 2 Diabetes, Metabolic Syndrome]

Slide from Bill Lowe, Adapted from: Dabelea and Crume, Diabetes 60:1849, 2011

Hyperglycemia and Adverse Pregnancy Outcomes (HAPO)

• The HAPO study was designed to address the hypothesis that hyperglycemia is associated with adverse neonatal outcomes.

• A GWAS between maternal genotype and glycemic traits identified variants in several genomic regions that are known to be associated with type 2 diabetes

• HAPO also found novel genetic associations with hyperglycemia specifically during pregnancy.

Genetic variation on chr10 associated with hyperglycemia during pregnancy
Hayes MG et al, Diabetes, 2013

Lead imputed variant in 1st intron of HKDC1
[Figure: locus view marking rs4746822 (lead SNP, imputed) and rs1983127 (genotyped SNP)]
Hayes MG et al, Diabetes, 2013

Lots of epigenetic signals of regulation
[Figure: locus view marking rs4746822 (lead SNP, imputed) and rs1983127 (genotyped SNP)]
Hayes MG et al, Diabetes, 2013

Candidate regulatory elements in the locus
[Figure: candidate regulatory elements I through XI, with rs4746822 (lead SNP) and rs1983127 (genotyped SNP) marked]
Guo et al, Nat Comm, 2015

Allele-specific reporter assays
[Figure: luciferase reporter constructs carrying the C or G allele]

Many regulatory variants near HKDC1
[Figure: rs4746822 (lead SNP) and rs1983127 (genotyped SNP) marked]
Guo et al, Nat Comm, 2015

Regulatory effects are coordinated with respect to risk allele
[Figure: direction of effect with respect to GWAS risk allele]
Guo et al, Nat Comm, 2015

HKDC1 Complete Literature Review (ca. 2013)


Inferring therapeutic targets from heterogeneous data: HKDC1 is a novel potential therapeutic target for cancer. Li GH, Huang JF., Bioinformatics, 2013

Identification and characterization of genes that control fat deposition in chickens. Claire D'Andre H, Paul W, Shen X, Jia X, Zhang R, Sun L, Zhang X., J Anim Sci Biotechnol. 2013

Identification of HKDC1 and BACE2 as genes influencing glycemic traits during pregnancy through genome-wide association studies. HAPO Study Cooperative Research Group., Diabetes. 2013

Case-control genome-wide association study of attention-deficit/hyperactivity disorder. IMAGE II Consortium Group, J Am Acad Child Adolesc Psychiatry. 2010

Molecular evolution of the vertebrate hexokinase gene family: Identification of a conserved fifth vertebrate hexokinase gene. Irwin DM, Tan H., Comp Biochem Physiol Part D Genomics Proteomics. 2008

Hexokinase catalyzes the first step in glycolysis

Review of Wikipedia: There are four important mammalian hexokinase isozymes. (emphasis added)

HK1: Km < 1 mM glucose
HK2: Can metabolize various hexose sugars
HK3: Activity saturated at physiological glucose concentrations.
HK4 (GCK): Km ~ 8 mM. Activity is dynamic over physiological [glucose]

Genetic variation near HK1 and HK4 has been associated with diabetes
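The contrast between a sub-millimolar Km and a Km near 8 mM can be made concrete with the Michaelis-Menten relation v/Vmax = [S] / (Km + [S]). The sketch below is illustrative only: the 0.5 mM value is a hypothetical stand-in for "Km < 1 mM", the 4 and 8 mM glucose levels are assumed example concentrations, and simple Michaelis-Menten kinetics is itself a simplifying assumption rather than anything stated on the slide.

```python
def fractional_velocity(glucose_mm: float, km_mm: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: S / (Km + S)."""
    if glucose_mm < 0 or km_mm <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mm / (km_mm + glucose_mm)


def fold_change(km_mm: float, low_mm: float = 4.0, high_mm: float = 8.0) -> float:
    """How much the rate rises when glucose goes from low_mm to high_mm."""
    return fractional_velocity(high_mm, km_mm) / fractional_velocity(low_mm, km_mm)


LOW_KM_MM = 0.5   # hypothetical value standing in for "Km < 1 mM"
HIGH_KM_MM = 8.0  # the ~8 mM figure quoted for HK4

for label, km in (("low-Km isozyme", LOW_KM_MM), ("HK4-like isozyme", HIGH_KM_MM)):
    print(f"{label}: v/Vmax at 4 mM = {fractional_velocity(4.0, km):.2f}, "
          f"at 8 mM = {fractional_velocity(8.0, km):.2f}, "
          f"fold change = {fold_change(km):.2f}")
```

Under these assumptions, doubling glucose from 4 to 8 mM raises the rate by about 6% for the low-Km enzyme, which is already near saturation, and by 50% for the enzyme with a Km of 8 mM.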

[Figure, panels a to e: HKDC1 mRNA Expression and Cellular HK Activity (relative) for Scrambled siRNA and HKDC1 siRNA 1, 2, and 1+2; relative mRNA expression of HK1, HK2, GCK, and HKDC1; adenovirus conditions with anti-HKDC1 and anti-β-actin blots]

Purified HKDC1 has hexokinase activity
[Figure: Control versus HKDC1; gel with markers at 117, 80, and 38 kD; activity of GFP, HK1, and HKDC1 versus log10 [glucose (mM)]]

Conclusion: HKDC1 is a 5th human hexokinase
Guo et al, Nat Comm, 2015

Mouse Model of HKDC1: non-pregnant adults are normoglycemic

Ludvik et al, Endocrinology, 2016

Mouse Model of HKDC1: impaired glucose tolerance in pregnancy
Ludvik et al, Endocrinology, 2016

Summary
• Much of the genetics of complex disease maps to non-coding regions of the genome
• Mapping the causes underlying those associations suggests that multiple genetic variants may contribute to each association signal
• Doing so can reveal unexpected candidate genes that themselves could have therapeutic potential

That was hard.
(And that was an easy case.)

The 3q25 locus associated with fetal adiposity

Vockley, Guo, Majoros, et al, Genome Research, 2015

Epigenetically predicted candidate regulatory elements

Candidate target genes
• Long-noncoding RNAs in adiposity (Sun et al, 2013)
• Cyclin-L1: could be involved in cell cycling
• Makes fruit flies fat (Melted gene, Teleman et al, 2005)

Vockley, Guo, Majoros, et al, Genome Research, 2015

Making allele-specific reporter assays high-throughput

[Figure: luciferase reporter constructs carrying the C or T allele]

STARR-seq reporter assays
[Figure: GFP reporter construct]

Regulatory elements located in the 3’ UTR of the reporter gene.

From that position, the elements regulate their own expression.

[Figure: polyadenylated reporter transcripts (AAAAAAA.....) sequenced as Read 1 and Read 2, with a readout labeled Regulatory Element activity]
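Because each element is transcribed as part of its own reporter mRNA, its activity can be read out by comparing how often it appears in the RNA library with how often it appears in the input DNA library. The sketch below shows one common way to express that as a depth-normalized log2 ratio. The function name, the counts-per-million scaling, and the pseudocount are assumptions made for illustration and are not taken from the talk.

```python
import math


def starr_activity(rna_reads: int, dna_reads: int,
                   rna_total: int, dna_total: int,
                   pseudocount: float = 1.0) -> float:
    """log2 of the depth-normalized RNA/DNA ratio for one element.

    rna_reads / dna_reads: reads from this element in each library.
    rna_total / dna_total: total reads in each library (sequencing depth).
    pseudocount: small constant (an assumption) that avoids division by zero.
    """
    if rna_total <= 0 or dna_total <= 0:
        raise ValueError("library totals must be positive")
    rna_cpm = (rna_reads + pseudocount) / rna_total * 1e6
    dna_cpm = (dna_reads + pseudocount) / dna_total * 1e6
    return math.log2(rna_cpm / dna_cpm)


# Hypothetical counts: an element seen 4x more often in RNA than in input DNA.
print(starr_activity(rna_reads=39, dna_reads=9,
                     rna_total=1_000_000, dna_total=1_000_000))
```

Positive values indicate that the element is over-represented among transcripts relative to the input library; values near zero indicate no detectable regulatory activity under this normalization.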

STARR-seq: Arnold et al, Science, 2013

A platform for diverse studies
[Figure: GFP reporter construct; one input labeled "Bacterial Artificial"]
STARR-seq: Arnold et al, Science, 2013

Comprehensively assaying genomic responses to steroid hormones

[Figure: GFP reporter construct]
Vockley et al, Cell, 2016

Probe-based capture of GWAS regions
[Figure: Patient DNA Input; Custom RNA Baits; Pulldown of Selected Regions; GFP reporter construct]

Coverage of Target Regions
[Figure: output from POP-STARR assays in HepG2 cells; coverage of reporter assays in the region]

Differences in Regulatory Activity Predict Hyperglycemia
[Figure: bar plots of Normalized Expression for the Ref and Alt alleles at rs6517656, rs1541103, rs2776343, and rs13049843]

Normalized expression per allele: Alternate allele RNA / Alternate allele DNA, and Ref allele RNA / Ref allele DNA
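The per-allele normalization shown on the slide (RNA reads divided by DNA reads for each allele) can be turned into a single allelic effect by comparing the two ratios. The sketch below is a minimal illustration under assumptions: the function names and the example counts are hypothetical, and taking the log2 ratio of the two normalized values is one reasonable summary rather than the analysis used in the talk.

```python
import math


def normalized_expression(rna_reads: float, dna_reads: float) -> float:
    """RNA/DNA ratio for one allele, as in the per-allele normalization above."""
    if dna_reads <= 0:
        raise ValueError("allele must be observed in the input DNA library")
    return rna_reads / dna_reads


def allelic_effect(alt_rna: float, alt_dna: float,
                   ref_rna: float, ref_dna: float) -> float:
    """log2 of (Alt RNA/DNA) over (Ref RNA/DNA); negative means Alt is weaker."""
    alt = normalized_expression(alt_rna, alt_dna)
    ref = normalized_expression(ref_rna, ref_dna)
    if alt <= 0 or ref <= 0:
        raise ValueError("both alleles need nonzero RNA counts for a log ratio")
    return math.log2(alt / ref)


# Hypothetical counts: both alleles equally represented in the input DNA,
# but the alternate allele yields a quarter as much RNA.
print(allelic_effect(alt_rna=50, alt_dna=100, ref_rna=200, ref_dna=100))
```

Dividing by the DNA counts first matters because the two alleles are rarely present at exactly equal frequency in a library built from patient DNA.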

Long Distance Noncoding Variants
[Figure: -log10 p value (0 to 8) versus position on chr21 (Mb), 41 to 42]

CRISPR/Cas9 epigenome editing

[Figure: Genome Editing versus Epigenome Editing. Nature (2016)]

Enhancer Activation: dCas9-P300 at HS2 activates globin expression 46 kb away
Hilton et al, Nature Biotechnology, 2015

Enhancer Repression: dCas9-KRAB causes H3K9me3...
[Figure: globin locus, coordinates 5265000 to 5305000, with the gRNA target at HS2 and the genes HBE1, HBG2, HBG1, HBBP1, HBD, HBB. Tracks: DNase HS; H3K9me3 ChIP-seq for dCas9-KRAB only, dCas9 + Cr4, dCas9-KRAB + Cr4, dCas9 + Cr10, dCas9-KRAB + Cr10]
Thakore et al, Nature Methods, 2015

...and specific silencing of globin gene expression
[Figure: log2 (relative expression) versus average expression for HBB, HBD, HBG1, HBG2, HBBP1, HBE1]
Thakore et al, Nature Methods, 2015

Using dCas9-P300 to reveal target genes in genetic associations

gRNAs designed to target dCas9 or dCas9-P300 to candidate HKDC1 regulatory elements.

Karl Guo

Summary
• Using reporter assays, we can systematically and comprehensively assay regulatory variants from patient DNA across genetic association loci
• Combined with CRISPR-based approaches to reprogram the epigenome, we can find target genes for those variants.
• Together, this combination of approaches is directing us to unstudied target genes, revealing valuable new biological mechanisms of disease

Two stories:
• Identifying genetic mechanisms of disease
• Quantifying the gene regulatory effects of drugs

Glucocorticoids (GCs)
• Suppress the immune system
• Reduce inflammation
• Increase blood sugar
• Reduce bone density

The basic mechanism of the GC response

The GC response is one of the most extensively characterized genomic responses

GR binds tens of thousands of sites in the genome across many different cell types.

That binding coordinates with a host of other TFs and co-factors to ultimately regulate gene expression.

There are hundreds to thousands of GC-responsive genes, with slightly more activated genes than repressed genes.

Many fundamental questions remain open

We still do not know how GR chooses which genomic sites to bind.

We still do not know which of those binding sites are functional.

We still do not know how GR typically represses genes.

Quantifying the functionality of GR binding sites
• Luciferase Reporter Assays
• ChIP-STARR-seq Assays
Vockley et al, Cell, 2016

A platform for diverse studies
[Figure: GFP reporter construct; input: whole human genome]
STARR-seq: Arnold et al, Science, 2013

Whole human genome STARR-seq libraries

[Figure: workflow with the labels NA12878 DNA; WG-SS library (20 mL of Gibson Assembly reactions; transformation); A549 transfection; dex exposure (h): 0, 1, 4, 8, 12; 5x biological replicates; extract RNAs; library construction; deep sequencing; call peaks; differential enrichment; Activity]
Graham Johnson

Library coverage and fragment size

Median fragment size of 390 bp; median 55x genome coverage
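As a rough sense of scale, the two numbers above imply a library on the order of hundreds of millions of distinct fragments. The sketch below does that back-of-the-envelope arithmetic. It assumes a haploid genome size of about 3.1 x 10^9 bp (not stated in the talk) and treats the median fragment size as if it were the mean, so the result is an approximation only.

```python
GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size


def fragments_for_coverage(coverage: float, fragment_bp: float,
                           genome_bp: float = GENOME_SIZE_BP) -> float:
    """Number of fragments whose combined length gives the requested coverage."""
    if fragment_bp <= 0 or genome_bp <= 0:
        raise ValueError("fragment and genome sizes must be positive")
    return coverage * genome_bp / fragment_bp


def coverage_from_fragments(n_fragments: float, fragment_bp: float,
                            genome_bp: float = GENOME_SIZE_BP) -> float:
    """Inverse calculation: fold coverage provided by n_fragments."""
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_fragments * fragment_bp / genome_bp


n = fragments_for_coverage(coverage=55, fragment_bp=390)
print(f"approximately {n / 1e6:.0f} million fragments")
```

Under these assumptions the estimate comes out at roughly 437 million fragments.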

Graham Johnson

Whole human genome STARR-seq libraries
• 10^9 cells
• 5 replicates each
• 8 rxns per replicate
• 3 x 10^9 reads for DNA
• 3 x 10^9 reads for RNA
Graham Johnson

Whole human genome STARR-seq for the GC response
Graham Johnson

GC regulatory activity increases across the time course
Graham Johnson, PhD

The landscape of the genomic DEX response
DEX-responsive regulatory elements
McDowell, poster #99
Data browser: http://ggr.reddylab.org
Data download: http://www.encodeproject.org

Whole human genome STARR-seq for the GC response

In contrast, GR binding occurs early and wanes over time

Graham Johnson, PhD

GR and AP1 motifs prominent at dex-induced sites
[Figure: motif enrichment (-Log10 Adj. P-values) across quartiles Q1 to Q4 of binned steady-state REs and binned 8 h DREs. Motifs shown include TP53, GR, AP1, HES2, FOXG1, NFAT5, CREB1, MEF2D, TP73, GATA2, NFKB2, SP8, IRF8, TCF3, TBP, ZFX, HOXA10, STAT1, HSF1, NRF1, SOX6, POU3F1, and ZBTB33. Panels B and C show pairwise quartile comparisons, Q1 vs. Q2 through Q3 vs. Q4]

P53 and AP1 prominent at repressed sites
[Figure: the same motif enrichment panels]

Identification of allele-specific GC responses

50% of dex-induced activity is outside of DHS

One hypothesis: some of those sites are sub-threshold DHS

Q: Do those sites bind TFs?
A: About 10%

Q: Do those sites bind GR?
A: 258 of 1700 (~15%)

Our hypothesis: the remaining closed chromatin sites are latent GR response elements in other cell types.

Graham Johnson, PhD

Summary

High-coverage wgSTARR-seq assays provide detailed maps of the effect of GCs on regulatory element activity.

They can also be used to map allele-specific activity and latent enhancers in other cells.

Because the assays do not require prior knowledge of mechanism, they can be used to measure regulatory responses to many environmental signals.

Acknowledgements

Reddy Lab: Funding: Graham Johnson, PhD William Lowe (NW) Sarah Cunningham Geoff Hayes (NW) Bill Majoros, PhD Denise Scholtens (NW) Keith Siklenka, PhD Mike Nodzenski Anthony D’Ippolito, PhD Ian McDowell, PhD Brian Layden Lab (UIC) Young-Sook Kim Jungkyun Seo Greg Crawford Lab: Reddy Lab Alums: Alejandro Barrera Alexias Safi Chris Vockley, PhD Luke Bartelt Ling Song Karl Guo, PhD Courtney Williams Nicole Clark Linda Hong Ana Berglind Alex Hartemink Lab: Sarah Leichter Charlie Gersbach Lab: Kevin Luo, PhD Dewran Kocak Barbara Engelhardt Lab: Tyler Klann, Josh Black Bianca Dumitrascu