Unweaving)The)Circuitry)) Of)Complex)Disease!
Total Page:16
File Type:pdf, Size:1020Kb
Unweaving)the)circuitry)) of)complex)disease! Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory Personal!genomics!today:!23!and!Me! Recombina:on)breakpoints) Me)vs.)) my)brother) Family)Inheritance) Dad’s)mom) My)dad) Mom’s)dad) Human)ancestry) Disease)risk) Genomics:)Regions))mechanisms)) Systems:)genes))combina:ons)) drugs) pathways) 1000s)of)diseaseHassociated)loci)from)GWAS) • Hundreds)of)studies,)each)with)1000s)of)individuals) – Power!of!gene7cs:!find!loci,!whatever!the!mechanism!may!be! – Challenge:!mechanism,!cell!type,!drug!target,!unexplained!heritability! GenomeHwide)associa:on)studies)(GWAS)) • Iden7fy!regions!that!coBvary!with!the!disease! • Risk!allele!G!more!frequent!in!pa7ents,!A!in!controls! • But:!large!regions!coBinherited!!!find!causal!variant! • Gene7cs!does!not!specify!cell!type!or!process! E environment causes syndrome G epigenomeX D S genome disease symptoms biomarkers effects Epidemiology The study of the patterns, causes, and effects of health and disease conditions in defined populations Gene:c) Tissue/) Molecular)Phenotypes) Organismal) Variant) cell)type) Epigene:c) Gene) phenotypes) Changes) Expression) Changes) Methyl.) ) Heart) Gene) Endo) DNA) expr.) phenotypes) Muscle) access.) ) Lipids) Cortex) Tension) CATGACTG! Enhancer) CATGCCTG! Lung) ) Gene) Heartrate) Disease) H3K27ac) expr.) Metabol.) Blood) ) Drug)resp) Skin) Promoter) Disease! ) Gene) cohorts! Nerve) Insulator) expr.) GTEx!/! ENCODE/! GTEx! Environment) Roadmap! Epigenomics/! Epigenomics! 29!mammals! Feedback)from)environment)/)disease)state) Building systems-level views of genome regulation Goal: A systems-level understanding of genomes and gene regulation:! • The regulators: Transcription factors, microRNAs, sequence specificities" • The regions: enhancers, promoters, and their tissue-specificity" • The targets: TFs"targets, regulators"enhancers, enhancers"genes" • The grammars: Interplay of multiple TFs " prediction of gene expression" ! The parts list = Building blocks of gene regulatory networks" ! Our tools: Comparative genomics & large-scale experimental datasets. ! • Evolutionary signatures for coding/non-coding genes, microRNAs, motifs" • Chromatin signatures for regulatory regions and their tissue specificity" • Activity signatures for linking regulators " enhancers " target genes" • Predictive models for gene function, gene expression, chromatin state" ! Integrative models = Define roles in development, health, disease" Compara:ve)genomics)maps) Func:onal)Genomics)) • Measure)constraint)across)species) and)Epigenomics) ENCODE) and)iden:fy)conserved)regions) Roadmap)) • Define)evolu:onary)signatures)for) in)reference)cell)types) Epigenomics) genes,)ncRNAs,),)miRNAs,)mo:fs) • FineHmap)topHscoring)loci) • Measure)lineageHspecific)constraint) • Iden:fy)relevant)cell)types) within)the)human)popula:on) • Iden:fy)relevant)pathways) 29)mammals) • Detect)addi:onal)loci) 2! 1! REFERENCE)MAPS) CATGACTG! Gene:c) CATGCCTG! GWAS) Disease) Varia:on) • TopHscoring)loci) 3! • PHvalues,)effect)sizes) • Agnos:c)to)mechanism) Molecular)Varia:on)) Molecular)Varia:on)) in)reference)individuals) in)cases/controls)Disease)) ) Environment) epigenomics Covariates) • Variants)changing)gene)expression) • Intermediate)molecular)phenotypes) • TissueHspecific,)mul:H:ssue)eQTLs) • Measured)in)diseaseHrelevant):ssues) ) • Pinpoint)regulatory)regions) GTEx • Capture)environmental)effects)) • Link)SNPs)to)target)genes/regions) • Capture)downstream)disease)effects) 4! MAPS)THAT)VARY)ACROSS)INDIVIDUALS) 5! Genomic/Epigenomic tools for disease SNPs • Evolutionary signatures gene/genome annotation – High-resolution annotation: genes, RNAs, motif instances – Measuring selection within the human population • Chromatin states regulatory region annotation – Classes of promoter/enhancer/transcribed/repressed/etc – Prioritizing chromatin experiments, mark/state imputation • Activity signatures linking enhancer networks – Activity-based linking of TFs " enhancers " targets – Testing activators/repressors in 2000+ human enhancers • Interpreting disease-associated sequence variants – Mechanistic predictions for individual top-scoring SNPs – Functional roles of 1000s of disease-associated SNPs • Personal (epi)genomes: geno-/phenotype, cis/trans – Allele-specific activity. Alzheimer’s and brain methylation Evolutionary signatures reveal genes, RNAs, motifs Compare 29 mammals Increased conservation pinpoints functional regions! Distinct patterns of change distinguish diff. functions! Protein-coding genes! - Codon Substitution Frequencies" - Reading Frame Conservation" RNA structures! - Compensatory changes" - Silent G-U substitutions" microRNAs! - Shape of conservation profile" - Structural features: loops, pairs" - Relationship with 3’UTR motifs" Regulatory motifs! - Mutations preserve consensus" Lindblad-Toh Nature 2011! - Increased Branch Length Score" Stark Nature 2007! - Genome-wide conservation" Measuring constraint at individual nucleotides NRSF motif! • Reveal individual transcription factor binding sites • Within motif instances reveal position-specific bias • More species: motif consensus directly revealed Translational read-through in human & fly! Overlapping selection in human exons! Synonym. Substitut. Rate! Jungreis, Genome Research 2011! Lin, Genome Research 2011! Protein-coding Continued protein- No more Stop codon! 2nd stop! Reveal splicing signals, RNA structures, conservation coding conservation conserv read through! codon! enhancer motifs, dual-coding genes! RNA structure families: ortholog/paralog cons! Regions of codon-level positive selection! distributed" localized" Parker Gen. Res. 2011! Lindblad-Toh Nature 2011! Ex:MAT2A S-adeosyl-methionic level detection! Distributed vs. localized positive selection! Structure/loop sequence deep conservation! Immunity/taste vs. retinal/bone/secretion! Human constraint outside conserved regions Active regions Average diversity (heterozygosity) " Aggregate over the genome" • Non-conserved regions: • Conserved regions: – ENCODE-active regions – Non-ENCODE regions show reduced diversity show increased diversity ! Lineage-specific constraint in ! Loss of constraint in human biochemically-active regions when biochemically-inactive Detect!SNPs!that!disrupt!conserved!regulatory!mo7fs! • Func7onallyBassociated!SNPs!enriched!in!states,!constraint! • Priori7ze!candidates,!increase!resolu7on,!disrupted!mo7fs! What!frac7on!of!the!genome!maWers?! 1%!proteinBcoding! Ac7ve!in!Cell!Type!1! 40%B80%!ac7ve! 5%!mammalian! across!mul7ple! conserved! cell!types! +4B11%!human! constrained! Ac7ve!in!Cell!Type!2! …! 3,000,000,000)nucleo:des) • Evidence)of)lineageHspecific)constraint)in)human)for)nonHconserved)elements)) • Muta:ons)in)nonHconserved)regions)can)have)lateHonset)disease)phenotypes) • Gene:c)disease)requires)understanding)the)whole)human)genome) Functional genomics and GTEx for disease 1. Reference Epigenomes chromatin states, linking – Annotate dynamic regulatory elements in multiple cell types – Activity-based linking of regulators " enhancers " targets 2. Interpreting disease-associated sequence variants – Mechanistic predictions for individual top-scoring SNPs – Functional roles of 1000s of disease-associated SNPs 3. Multi-cell systems-level expression changes in GTEx – Learn expression programs in 40 tissues for each person – Genetic basis of gene module membership changes 4. Genetic / epigenomic variation in health and disease – Genetic variation#Brain methylation#Alzheimer’s disease – Global repression of distal enhancers. NRSF, ELK1, CTCF Compara:ve)genomics)maps) Func:onal)Genomics)) • Measure)constraint)across)species) and)Epigenomics) ENCODE) and)iden:fy)conserved)regions) Roadmap)) • Define)evolu:onary)signatures)for) in)reference)cell)types) Epigenomics) genes,)ncRNAs,),)miRNAs,)mo:fs) • FineHmap)topHscoring)loci) • Measure)lineageHspecific)constraint) • Iden:fy)relevant)cell)types) within)the)human)popula:on) • Iden:fy)relevant)pathways) 29)mammals) • Detect)addi:onal)loci) 2! 1! REFERENCE)MAPS) CATGACTG! Gene:c) CATGCCTG! GWAS) Disease) Varia:on) • TopHscoring)loci) • PHvalues,)effect)sizes) • Agnos:c)to)mechanism) Chromatin signatures for genome annotation Epigenomic maps 1. DNA methylation 2. Histone modifications Multi-variate HMM captures combinations 3. DNA accessibility Ernst & Kellis Nature Biotech 2010. Ernst & Kellis Nature Methods, 2012.ChromHMM Chroma:n)state)dynamics)reveal)linking/regulators) Predicted linking Correlated activity • Single annotation track for each cell type • Summarize cell-type activity at a glance • Study activity pattern across tissues Ernst , Kheradpour, et al, Nature 2011 Multi-cell activity profiles connect enhancers Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1 Link enhancers to target genes ON Active enhancer Motif enrichment TF On Motif aligned OFF Repressed Motif depletion TF Off Flat profile Multi-cell activity profiles connect enhancers Gene Chromatin Active TF motif TF regulator Dip-aligned expression States enrichment expression motif biases HUVEC NHEK GM12878 K562 HepG2 NHLF HMEC HSMM H1 Link TFs to target enhancers! Predict activators vs. repressors! ON Active enhancer Motif enrichment TF On Motif aligned OFF Repressed Motif depletion TF Off Flat profile Coordinated)ac:vity)reveals)ac:vators/repressors) Enhancer activity! Activity signatures for each TF! Ex1: Oct4 predicted activator Ex2: Gfi1 repressor