Unweaving)the)circuitry)) of)complex)disease!

Manolis Kellis

Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory Personal!!today:!23!and!Me! Recombina:on)breakpoints)

Me)vs.)) my)brother) Family)Inheritance)

Dad’s)mom) My)dad) Mom’s)dad) Human)ancestry) Disease)risk)

Genomics:)Regions))mechanisms)) Systems:)genes))combina:ons)) drugs) pathways) 1000s)of)diseaseHassociated)loci)from)GWAS)

• Hundreds)of)studies,)each)with)1000s)of)individuals) – Power!of!gene7cs:!find!loci,!whatever!the!mechanism!may!be! – Challenge:!mechanism,!cell!type,!drug!target,!unexplained!heritability! GenomeHwide)associa:on)studies)(GWAS))

• Iden7fy!regions!that!coBvary!with!the!disease! • Risk!allele!G!more!frequent!in!pa7ents,!A!in!controls! • But:!large!regions!coBinherited!!!find!causal!variant! • Gene7cs!does!not!specify!cell!type!or!process! E environment

causes syndrome

G epigenomeX D S genome disease symptoms biomarkers effects Epidemiology The study of the patterns, causes, and effects of health and disease conditions in defined populations Gene:c) Tissue/) Molecular)Phenotypes) Organismal) Variant) cell)type) Epigene:c) Gene) phenotypes) Changes) Expression) Changes) Methyl.) ) Heart) Gene) Endo) DNA) expr.) phenotypes) Muscle) access.) ) Lipids) Cortex) Tension) CATGACTG! Enhancer) CATGCCTG! Lung) ) Gene) Heartrate) Disease) H3K27ac) expr.) Metabol.) Blood) ) Drug)resp) Skin) Promoter) Disease! ) Gene) cohorts! Nerve) Insulator) expr.) GTEx!/! ENCODE/! GTEx! Environment) Roadmap! /! Epigenomics! 29!mammals! Feedback)from)environment)/)disease)state) Building systems-level views of genome regulation

Goal: A systems-level understanding of genomes and gene regulation:! • The regulators: Transcription factors, microRNAs, sequence specificities" • The regions: enhancers, promoters, and their tissue-specificity" • The targets: TFs"targets, regulators"enhancers, enhancers"genes" • The grammars: Interplay of multiple TFs " prediction of gene expression" ! The parts list = Building blocks of gene regulatory networks" ! Our tools: Comparative genomics & large-scale experimental datasets. ! • Evolutionary signatures for coding/non-coding genes, microRNAs, motifs" • Chromatin signatures for regulatory regions and their tissue specificity" • Activity signatures for linking regulators " enhancers " target genes" • Predictive models for gene function, gene expression, chromatin state" ! Integrative models = Define roles in development, health, disease" Compara:ve)genomics)maps) Func:onal)Genomics)) • Measure)constraint)across)species) and)Epigenomics) ENCODE) and)iden:fy)conserved)regions) Roadmap)) • Define)evolu:onary)signatures)for) in)reference)cell)types) Epigenomics) genes,)ncRNAs,),)miRNAs,)mo:fs) • FineHmap)topHscoring)loci) • Measure)lineageHspecific)constraint) • Iden:fy)relevant)cell)types) within)the)human)popula:on) • Iden:fy)relevant)pathways) 29)mammals) • Detect)addi:onal)loci) 2! 1! REFERENCE)MAPS)

CATGACTG! Gene:c) CATGCCTG! GWAS) Disease) Varia:on) • TopHscoring)loci) 3! • PHvalues,)effect)sizes) • Agnos:c)to)mechanism) Molecular)Varia:on)) Molecular)Varia:on)) in)reference)individuals) in)cases/controls)Disease)) ) Environment) epigenomics Covariates) • Variants)changing)gene)expression) • Intermediate)molecular)phenotypes) • TissueHspecific,)mul:H:ssue)eQTLs) • Measured)in)diseaseHrelevant):ssues) ) • Pinpoint)regulatory)regions) GTEx • Capture)environmental)effects)) • Link)SNPs)to)target)genes/regions) • Capture)downstream)disease)effects)

4! MAPS)THAT)VARY)ACROSS)INDIVIDUALS) 5! Genomic/Epigenomic tools for disease SNPs • Evolutionary signatures  gene/genome annotation – High-resolution annotation: genes, RNAs, motif instances – Measuring selection within the human population • Chromatin states  regulatory region annotation – Classes of promoter/enhancer/transcribed/repressed/etc – Prioritizing chromatin experiments, mark/state imputation • Activity signatures  linking enhancer networks – Activity-based linking of TFs " enhancers " targets – Testing activators/repressors in 2000+ human enhancers • Interpreting disease-associated sequence variants – Mechanistic predictions for individual top-scoring SNPs – Functional roles of 1000s of disease-associated SNPs • Personal (epi)genomes: geno-/phenotype, cis/trans – Allele-specific activity. Alzheimer’s and brain methylation Evolutionary signatures reveal genes, RNAs, motifs Compare 29 mammals

Increased conservation pinpoints functional regions!

Distinct patterns of change distinguish diff. functions! Protein-coding genes! - Codon Substitution Frequencies" - Reading Frame Conservation" RNA structures! - Compensatory changes" - Silent G-U substitutions" microRNAs! - Shape of conservation profile" - Structural features: loops, pairs" - Relationship with 3’UTR motifs" Regulatory motifs! - Mutations preserve consensus" Lindblad-Toh Nature 2011! - Increased Branch Length Score" Stark Nature 2007! - Genome-wide conservation" Measuring constraint at individual nucleotides NRSF motif!

• Reveal individual transcription factor binding sites • Within motif instances reveal position-specific bias • More species: motif consensus directly revealed Translational read-through in human & fly! Overlapping selection in human exons!

Synonym. Substitut. Rate! Jungreis, Genome Research 2011! Lin, Genome Research 2011! Protein-coding Continued protein- No more Stop codon! 2nd stop! Reveal splicing signals, RNA structures, conservation coding conservation conserv read through! codon! enhancer motifs, dual-coding genes!

RNA structure families: ortholog/paralog cons! Regions of codon-level positive selection! distributed"

localized"

Parker Gen. Res. 2011! Lindblad-Toh Nature 2011! Ex:MAT2A S-adeosyl-methionic level detection! Distributed vs. localized positive selection! Structure/loop sequence deep conservation! Immunity/taste vs. retinal/bone/secretion! Human constraint outside conserved regions Active regions Average diversity (heterozygosity) " Aggregate over the genome"

• Non-conserved regions: • Conserved regions: – ENCODE-active regions – Non-ENCODE regions show reduced diversity show increased diversity ! Lineage-specific constraint in ! Loss of constraint in human biochemically-active regions when biochemically-inactive Detect!SNPs!that!disrupt!conserved!regulatory!mo7fs!

• Func7onallyBassociated!SNPs!enriched!in!states,!constraint! • Priori7ze!candidates,!increase!resolu7on,!disrupted!mo7fs! What!frac7on!of!the!genome!maWers?!

1%!proteinBcoding! Ac7ve!in!Cell!Type!1!

40%B80%!ac7ve! 5%!mammalian! across!mul7ple! conserved! cell!types! +4B11%!human! constrained! Ac7ve!in!Cell!Type!2!

…!

3,000,000,000)nucleo:des) • Evidence)of)lineageHspecific)constraint)in)human)for)nonHconserved)elements)) • Muta:ons)in)nonHconserved)regions)can)have)lateHonset)disease)phenotypes) • Gene:c)disease)requires)understanding)the)whole)human)genome) Functional genomics and GTEx for disease

1. Reference Epigenomes  chromatin states, linking – Annotate dynamic regulatory elements in multiple cell types – Activity-based linking of regulators " enhancers " targets 2. Interpreting disease-associated sequence variants – Mechanistic predictions for individual top-scoring SNPs – Functional roles of 1000s of disease-associated SNPs 3. Multi-cell systems-level expression changes in GTEx – Learn expression programs in 40 tissues for each person – Genetic basis of gene module membership changes 4. Genetic / epigenomic variation in health and disease – Genetic variation#Brain methylation#Alzheimer’s disease

– Global repression of distal enhancers. NRSF, ELK1, CTCF Compara:ve)genomics)maps) Func:onal)Genomics)) • Measure)constraint)across)species) and)Epigenomics) ENCODE) and)iden:fy)conserved)regions) Roadmap)) • Define)evolu:onary)signatures)for) in)reference)cell)types) Epigenomics) genes,)ncRNAs,),)miRNAs,)mo:fs) • FineHmap)topHscoring)loci) • Measure)lineageHspecific)constraint) • Iden:fy)relevant)cell)types) within)the)human)popula:on) • Iden:fy)relevant)pathways) 29)mammals) • Detect)addi:onal)loci) 2! 1! REFERENCE)MAPS)

CATGACTG! Gene:c) CATGCCTG! GWAS) Disease) Varia:on) • TopHscoring)loci) • PHvalues,)effect)sizes) • Agnos:c)to)mechanism) Chromatin signatures for genome annotation Epigenomic maps

1. DNA methylation 2. Histone modifications

Multi-variate HMM captures combinations

3. DNA accessibility

Ernst & Kellis Nature Biotech 2010. Ernst & Kellis Nature Methods, 2012.ChromHMM Chroma:n)state)dynamics)reveal)linking/regulators)

Predicted linking

Correlated activity • Single annotation track for each cell type • Summarize cell-type activity at a glance • Study activity pattern across tissues Ernst , Kheradpour, et al, Nature 2011 Multi-cell activity profiles connect enhancers

Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases

HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1

Link enhancers to target genes

ON Active enhancer Motif enrichment TF On Motif aligned OFF Repressed Motif depletion TF Off Flat profile Multi-cell activity profiles connect enhancers

Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases

HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1

Link TFs to target enhancers! Predict activators vs. repressors!

ON Active enhancer Motif enrichment TF On Motif aligned OFF Repressed Motif depletion TF Off Flat profile Coordinated)ac:vity)reveals)ac:vators/repressors) Enhancer activity! Activity signatures for each TF!

Ex1: Oct4 predicted activator Ex2: Gfi1 repressor of of embryonic stem (ES) cells" K562/GM cells"

• Enhancer networks: Regulator " enhancer " target gene EnhancerHgene)links)supported)by)eQTLHgene)links) eQTL study! 15kb Valida:on)ra:onale:)) Individuals • Expression!Quan7ta7ve!Trait!Loci!(eQTLs)! Indiv. 1 B0.5! A provide!independent!SNPHtoHgene!links! Indiv. 2 B1.5! A • Do!they!agree!with!ac7vityBbased!links?! Indiv. 3 B1.8! A Indiv. 4 3.1! C Example:))Lymphoblastoid)(GM))cells)study) Indiv. 5 1.1! A • Expression/genotype)across!60!individuals! (Montgomery!et#al,!Nature!2010)!! Indiv. 6 B1.8! A • Indiv. 7 B1.4! A 120)eQTLs!are!eligible!for!enhancerBgene! linking!based!on!our!datasets! Indiv. 8 3.2! C • 51!actually!linked!(43%)!using!predic7ons!! Indiv. 9 4.4! C ! … … … !4Hfold)enrichment)(10%!exp.!by!chance)!

• Expression Sequence variant Independent!valida7on!of!links.! level of gene at distal position • Relevance!to!disease!datasets.!23" Causal motifs supported by dips & enhancer assays

Predicted causal HNF motifs (that also showed dips) in HepG2 enhancers"

Tarjei Mikkelsen " Dip evidence of TF binding Enhancer activity halved (nucleosome displacement) by single-motif disruption

 Motifs bound by TF, contribute to enhancers24! Epigenomics Roadmap across 110 tissues/cell types

• ChIP-Seq: 8-20 histone marks Integration: • Methyl: WGBS, RRBS, MeDip, MRE chromatin states, • Accessibility: DNase, Footprints regulatory regions, • RNA: mRNA, smRNA, Exons hi-res, high-coverage Epigenomics Roadmap: 90 reference epigenomes

Interpret GWAS, global effects, reveal relevant cell types CellHtype)specific)func:onal)enrichments)for)cellHtype)specific)enhancers)

Axon)extension)

Development) Muscle) Ac:on)poten:al) Transla:on)

ES cells

ES derived Gastric

Fetal Brain Fetal Organs

Mucosa Skeletal muscle

Smooth muscle

Adult Brain

Skin Mesenchymal Cell)Types) Fibroblast Melanocyte Macrophage

HSCs B-cell T-cell

500k)Enhancers)

TGFHbeta)

Neuronal) THcell)s:mula:on) Lymphocyte)ac:vity) Wouter)Meuleman,)Epigenomics)Roadmap)project,)in)prepara:on) Systema7c!mo7f!disrup7on!for!5!ac7vators! and!2!repressors!in!2!human!cell!lines!

54000+ measurements (x2 cells, 2x repl)!

Kheradpour et al Genome Research 2013 Example!ac7vator:! conserved!HNF4! mo7f!match! WT!expression!! specific!to!HepG2! Mo7f!match! disrup7ons!reduce! expression!to! background!

NonBdisrup7ve! changes!maintain! expression!

Random!changes! depend!on!effect! to!mo7f!match! Functional genomics and GTEx for disease

1. Reference Epigenomes  chromatin states, linking – Annotate dynamic regulatory elements in multiple cell types – Activity-based linking of regulators " enhancers " targets 2. Interpreting disease-associated sequence variants – Mechanistic predictions for individual top-scoring SNPs – Functional roles of 1000s of disease-associated SNPs 3. Multi-cell systems-level expression changes in GTEx – Learn expression programs in 40 tissues for each person – Genetic basis of gene module membership changes 4. Genetic / epigenomic variation in health and disease – Genetic variation#Brain methylation#Alzheimer’s disease

– Global repression of distal enhancers. NRSF, ELK1, CTCF Revisiting disease- xx associated variants

• Disease-associated SNPs enriched for enhancers in relevant cell types • E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator Mechanistic predictions for top disease-associated SNPs Lupus erythromatosus in GM lymphoblastoid! Erythrocyte phenotypes in K562 leukemia cells!

`

Disrupt activator Ets-1 motif Creation of repressor Gfi1 motif  Loss of GM-specific activation  Gain K562-specific repression  Loss of enhancer function  Loss of enhancer function  Loss of HLA-DRB1 expression  Loss of CCDC162 expression Acknowledgements and Collaborators 1. Epigenomics Roadmap Project and ENCODE – Brad Bernstein, Bing Ren, John Stam, Joe Costello – Anshul Kundaje, Wouter Meuleman, Matt Eaton, Pouya Kheradpour 2. GWAS interpretation: HaploReg and whole-genome – Luke Ward, Abhishek Sarkar – Collaborat.: Vineeta Agarwala, David Altshuler, Martin Aryee 3. Natural molecular variation: network QTLs – GTEx: Kristin Ardlie, Gaddy Getz, Manolis Dermitzakis – Benjamin Iriarte 4. Molecular variation in disease: AD brain methylation – David Bennett, Phil De Jager, Memory and Aging Project (MAP), and Religious Order Study (ROS) cohorts – Matthew Eaton Collaborators and Acknowledgements

• ENCODE – Brad Bernstein, Tarjei Mikkelsen, Noam Shoresh, David Epstein • Massively parallel enhancer reporter assays – Tarjei Mikkelsen, • Epigenome Roadmap – Bing Ren, Brad Bernstein, John Stam, Joe Costello • 2X mammals – Kerstin Lindblad-Toh, , Manuel Garber, Or Zuk • Funding – NHGRI, NIH, NSF Sloan Foundation MIT group Compbio.mit.edu

Mike Lin Ben Holmes Soheil Luke Angela Feizi Bob Yen Ward Altshul Mukul Chris Bansal Bristow er Stefan Washietl Pouya Kheradpour Manoli Irwin Rachel s Jason Jessica Daniel Ernst Jungreis Sealfon Kellis Wu Marbach Louisa DiStefano Dave Loyal Sushmita Hendrix Goff Roy Stata3 Stata4