
What Can the Epigenome Teach Us About Cellular States and Diseases?

(a computer scientist’s view)

Luca Pinello

Outline

• Epigenetics: the code over the code

• What can we learn from epigenomic data?

• Resources for epigenomic data & analysis

Human Genome Project

We are what we are thanks to our genes

Genes determine:

• Cellular state

• Disease

• Can be used to make diagnoses and design therapies

The first draft of our genome became available in 2001

Genes are not sufficient to explain the complexity of an organism

We need to learn to “read” the non-coding part of the genome

Adapted from: http://www.opont.hu/hirek3.php?k_hirAzn=27234&k_hirFl=201112&k_hirKat=7

[Figure labels: Gene, “Regulatory region”?]

Transcription Factors

• Transcription factors are proteins that control which genes are turned on or off in the genome

• Their activity determines how cells function and respond to cellular environments

• We have many TFs (>1000)

OK, how can I find my spot?

• Single and multiple alignments, motif search tools

DNA: a protein parking lot organized by sequences?
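If sequence alone decided where a factor binds, a motif scan would be enough to predict its spots. The sketch below shows, in simplified form, what a motif search tool does: it turns a count matrix into log-odds scores and slides it along a sequence. The count matrix, the pseudocount, the uniform background and the function names are all made up for illustration; only the forward strand is scanned.

```python
import math

# Hypothetical position count matrix for a 4-bp motif (made-up counts).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        column = {
            base: math.log2((counts[base][i] + pseudocount) / total / background)
            for base in "ACGT"
        }
        pwm.append(column)
    return pwm


def scan(sequence, pwm):
    """Return (start, score) for every forward-strand window of the sequence."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring (start, score) pair."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


pwm = build_pwm(COUNTS)
print(best_hit("TTTACGTTT", pwm))
```

A scan like this returns many sequence matches genome-wide, most of which are not bound in any given cell type, which is why the question on this slide matters.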

• A fundamental question is: is there a natural order dictated by the sequence, or are the binding locations of a protein dictated by other factors?

The Epigenetic revolution

Interest in Epigenetics is still rising

“Epigenetics can be used to describe anything other than DNA sequence that influences the development of an organism.”

Same genotype, different phenotypes

“inheritance of phenotype is presumably based on epigenetic modifications of the IAP that may include DNA methylation or packaging”

Different diets in otherwise identical mice can determine glucose intolerance and obesity risk in offspring

I told you!

Lamarck’s theory of inheritance of acquired characters

Darwin’s theory of Natural Selection

You can inherit “something” beyond the DNA sequence!

Epigenetics and gene regulation

Epigenetics and chromatin structure

• (Almost) all the cells of our body share the same genome but run very different programs…

Adapted from: http://jpkc.scu.edu.cn/ywwy/zbsw(E)/edetail12.htm

Chromatin Structure: it’s not static!

Adapted from: http://alexnabaum.blogspot.com/

The code over the code

• The chromatin structure and the accessibility are mainly controlled by:

1. Nucleosome positioning

2. DNA methylation

3. Histone modifications

Adapted from: The Cell Biology of Stem Cells (2010)

Histone Modifications

Specific histone modifications, or combinations of modifications, confer unique biological functions on the regions of the genome associated with them:

Gene

Enhancer

Heterochromatin

Adapted from Turner, Cell 2002

Histone Modifications are not static

Epigenetics: (part of) the control logic of the “software”

Sequence them all!

ChIP-seq
• Transcription Factors
• Histone modifications, nucleosomes
• Chromatin remodelers

Bisulfite-seq
• DNA Methylation

DNase-seq
• Open Chromatin

RNA-seq
• Gene Expression

Epigenetics and diseases?

• De-regulation of chromatin regulators/modifiers and chromatin structure

• Use epigenetic information to annotate genetic variants involved in disease

The problem: What are the functional mechanisms underlying genetic variants and epigenetic alterations associated with complex traits and diseases?

[Figure: Genetic Variation (chromosome rearrangements, insertions, deletions, mutations, SNPs) and Epigenetic Variation (dysregulation of chromatin regulators, nucleosome positioning, histone modifications and DNA methylation; non-coding RNA), both linked to Disease]

Epigenetics and Disease

• Deregulation of chromatin remodelers and modifiers, and aberrant patterns of histone modifications

• Chromatin instability: although epigenetic changes do not alter the sequence of DNA, an altered chromatin can facilitate mutations and erroneous recombination

• Hyper-methylation of tumor suppressor genes, hypo-methylation or hypo-acetylation of oncogenes

• Loss of Imprinting: activation of the normally silenced allele of an imprinted gene.

Genetic and Epigenetic variation in

[Figure labels: Transcription factors, Chromatin regulators]

Suvà et al. Science 2013

GWAS: you can find variants associated with different diseases but…

We need to look into the “junk”

“Although 88% of trait/disease-associated SNPs (TASs) were intronic (45%) or intergenic (43%), TASs were not overrepresented in introns and were significantly depleted in intergenic regions…”

Where should we look?

adapted from: Paul et al. Bioessays 2014

• Correlations between nearby variants make it difficult to pinpoint the causal one(s).

Where should we look?

• We can use external functional annotations:

• Conservation
• Enhancers
• Open Chromatin
• ...
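Annotating a variant with tracks like these comes down to an interval lookup. A minimal sketch, assuming each track is a sorted list of non-overlapping half-open intervals on a single chromosome; the track names, coordinates, SNP identifiers and function names are hypothetical:

```python
from bisect import bisect_right

# Hypothetical annotation tracks on one chromosome.
# Each track is a sorted list of non-overlapping half-open intervals [start, end).
ANNOTATIONS = {
    "conserved": [(100, 200), (900, 1000)],
    "enhancer": [(150, 400)],
    "open_chromatin": [(180, 260), (950, 1200)],
}

# Hypothetical variant positions on the same chromosome.
VARIANTS = {"snp_a": 190, "snp_b": 500, "snp_c": 960}


def overlaps(intervals, position):
    """Return True if position falls inside one of the sorted intervals."""
    starts = [start for start, _ in intervals]
    idx = bisect_right(starts, position) - 1  # last interval starting at or before position
    return idx >= 0 and position < intervals[idx][1]


def annotate(variants, annotations):
    """Map each variant name to the sorted list of tracks that cover it."""
    return {
        name: sorted(track for track, ivs in annotations.items() if overlaps(ivs, pos))
        for name, pos in variants.items()
    }


labels = annotate(VARIANTS, ANNOTATIONS)
for name, tracks in labels.items():
    print(name, tracks)
```

The lookup itself is cell-type agnostic; the tracks are not, so the output depends entirely on which cell type the annotations come from.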

adapted from: Paul et al. Bioessays 2014

• Which one should we use? In which cell type?

Exploit epigenetic variability to highlight functional regions and regulators

Where to focus?

Exploit the cross cell-type variability to find interesting regions

Cell type 1: Boring

Cell type 2: What’s that?

Cell type 3: Boring

DHS and genetic variants

• Tissue-selective enrichment of disease-associated variants within DHSs

• Disease-associated variants systematically perturb transcription factor recognition sequences

DHS and partitioning heritability of regulatory variants

Across the 11 diseases, DNase I hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of the heritability (h²g) from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷)

Exploit the cross cell-type variability to find interesting regions

GOAL

Find non-redundant cell type specific regulators and functional regions
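One simple way to make "variability" concrete is to score each region by how much its signal changes across cell types and then ask which cell type stands out. The sketch below uses a variance-to-mean ratio on made-up numbers. The metric, the data and the function names are assumptions chosen for illustration and should not be read as the Haystack implementation.

```python
from statistics import mean, pvariance

# Hypothetical normalized signal per region across three cell types.
SIGNAL = {
    "region_1": {"cell_type_1": 5.0, "cell_type_2": 5.1, "cell_type_3": 4.9},
    "region_2": {"cell_type_1": 0.2, "cell_type_2": 9.0, "cell_type_3": 0.3},
    "region_3": {"cell_type_1": 0.0, "cell_type_2": 0.1, "cell_type_3": 0.0},
}


def variability(values):
    """Variance-to-mean ratio of the signal; 0.0 when the mean is zero."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0


def most_variable_regions(signal, top_n=1):
    """Rank regions by variability across cell types, highest first."""
    ranked = sorted(
        signal,
        key=lambda region: variability(list(signal[region].values())),
        reverse=True,
    )
    return ranked[:top_n]


def specific_cell_type(profile):
    """Return the cell type with the strongest signal in a region."""
    return max(profile, key=profile.get)


for region in most_variable_regions(SIGNAL):
    print(region, specific_cell_type(SIGNAL[region]))
```

Here region_1 is uniformly high and region_3 uniformly low, so both are "boring"; only region_2 changes between cell types.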

We used 19 ChIP-seq datasets for H3K27me3 from the ENCODE project and validated a novel TF in blood development.

Haystack Pipeline

Using this pipeline we predicted and experimentally validated regions and novel transcription factors important in blood development

Haystack integrative analysis

H3K27ac and gene expression of lymphoblastoid lines from 19 individuals

Data from Kasowski et al., Science 2014

Use Haystack on your data!

• Exploiting the variability to find non-redundant cell type/sample specific regulators and functional regions

A Python package called HAYSTACK implements our pipeline

github.com/lucapinello/Haystack

hub.docker.com/r/lucapinello/haystack_bio/

We have many histone modifications…

Idea: We need a way to summarize the combinatorial patterns of multiple histone marks

http://compbio.mit.edu/ChromHMM/

Scaling up: Chromatin States

• Chromatin states are defined based on different combinations of histone modifications and correspond to different functional regions

• The goal is to segment the genome into biologically meaningful units.

How can we learn the combinatorial code?

• ChromHMM quantifies the presence or absence of each mark in bins of fixed size
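The binning step can be sketched in a few lines. This is a deliberately simplified illustration: it calls a mark present when a bin holds at least a fixed number of reads, whereas ChromHMM derives its threshold from a Poisson background model. The read positions, the threshold and the function names are made up; 200 bp is ChromHMM's default bin size.

```python
BIN_SIZE = 200  # ChromHMM's default bin size in base pairs

# Hypothetical read start positions for two marks over a 600-bp region.
READS = {
    "H3K4me3": [10, 50, 120, 130, 450],
    "H3K27ac": [20, 220, 230, 260, 390],
}


def count_reads_per_bin(read_positions, region_length, bin_size=BIN_SIZE):
    """Count reads falling in each fixed-size bin of the region."""
    n_bins = (region_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < region_length:
            counts[pos // bin_size] += 1
    return counts


def binarize(counts, threshold):
    """Call a mark present (1) in a bin when its count reaches the threshold."""
    return [1 if c >= threshold else 0 for c in counts]


def combinatorial_patterns(matrix, marks):
    """Return, for each bin, the tuple of presence calls in the order of marks."""
    return list(zip(*(matrix[mark] for mark in marks)))


# Simplified fixed threshold of 3 reads per bin (an assumption, not ChromHMM's rule).
matrix = {
    mark: binarize(count_reads_per_bin(positions, 600), threshold=3)
    for mark, positions in READS.items()
}
print(combinatorial_patterns(matrix, ["H3K4me3", "H3K27ac"]))
```

The resulting per-bin tuples are the observations that a hidden Markov model then groups into chromatin states.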

[Figure: binary presence/absence calls (1/0) for each mark in consecutive bins along the genomic sequence]

ChromHMM and segmentation

Chromatin states and diseases

Intersection of strong enhancer states with disease-associated SNPs from GWASs shows significant enrichment in relevant cell types
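An enrichment of this kind can be quantified with a fold change and a one-sided hypergeometric test. The slide does not say which test was used, so the choice of test is an assumption, and the counts below are invented for illustration.

```python
from math import comb


def fold_enrichment(k, N, K, n):
    """Observed fraction of hits among the n disease SNPs over the background fraction."""
    return (k / n) / (K / N)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n SNPs are drawn without replacement from N, of which K are in enhancers."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts for one cell type:
N = 1000  # all SNPs considered
K = 100   # SNPs falling in strong enhancer states
n = 50    # disease-associated SNPs
k = 15    # disease-associated SNPs falling in strong enhancer states

print(fold_enrichment(k, N, K, n))
print(hypergeom_sf(k, N, K, n))
```

Repeating the calculation with the enhancer calls of each cell type in turn shows which cell types carry the enrichment. A real analysis would also need to account for correlation between nearby SNPs, which this sketch ignores.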

Roadmap Epigenomics

Resources

Roadmap Epigenomics portal
http://www.roadmapepigenomics.org

ENCODE portal
https://www.encodeproject.org/

Blueprint epigenome
http://www.blueprint-epigenome.eu

BLUEPRINT focuses on haematopoietic cells from healthy and diseased individuals

International Human Epigenome Consortium (IHEC)

IHEC makes available comprehensive sets of reference epigenomes relevant to health and disease

HaploReg
www.broadinstitute.org/mammals/haploreg/haploreg.php

RegulomeDB
http://www.regulomedb.org/

http://screen.umassmed.edu/

Interesting directions…

• Single cell epigenomics

• Genome editing to uncover other uncharacterized/unmarked regulatory elements? (see for example Rajagopal et al., Nat Biotech 2016)

• Epigenome-Wide Association Studies (EWAS)

pinellolab.org

Computational positions available*

* Boats not included