
Decoding States with Epigenome Data

02-715 Advanced Topics in Computational Genomics

HMMs for Decoding Chromatin States

• Epigenetic modifications have been associated with
  – Establishing cell identities during development
  – DNA repair and replication
  – Human diseases
• De novo discovery of chromatin states from epigenetic marks with HMMs
  – Emission probabilities: which histone marks co-occur?
  – Transition probabilities: how chromatin states are distributed spatially across the genome

Dataset

• Genome-wide occupancy data in human CD4+ T-cells from ChIP-seq experiments
  – 38 different histone methylation and acetylation marks
  – Histone variant H2A.Z
  – RNA polymerase II
  – CTCF
  – E.g., H3K9me3: trimethylation of lysine 9 on histone H3

HMMs for Decoding Chromatin States

• Hidden states represent the unknown chromatin states
  – Models with varying numbers of states were trained
  – 79 states, pruned to 51 states

• Histone mark data as observations
  – Data are binarized (after thresholding) in windows of 200 bp
  – Each histone mark has a Bernoulli (single-trial binomial) emission distribution
  – All histone marks are treated as independent given the state

Example of Chromatin State Annotation
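The following is a minimal sketch of the decoding step. It assumes NumPy and HMM parameters that have already been estimated; training (e.g., Baum-Welch) is omitted. The emissions follow the bullets above. Each binarized mark is Bernoulli given the state, and marks are independent, so a window's log-likelihood is a sum over marks. Scaled forward-backward then gives the posterior probability of each state in each window, which is the quantity shown in the annotation example. The function names, the two-state setup, and the toy numbers are hypothetical.

```python
import numpy as np


def bernoulli_log_emissions(X, P):
    """Log-likelihood of each window under each state.

    X: (T, M) array of 0/1 calls, one row per 200-bp window, one column per mark.
    P: (K, M) array, P[k, m] = probability that mark m is present in state k
       (entries must lie strictly between 0 and 1).
    Marks are independent given the state, so per-mark log-likelihoods add.
    Returns a (T, K) array.
    """
    X = np.asarray(X, dtype=float)
    return X @ np.log(P).T + (1.0 - X) @ np.log(1.0 - P).T


def posterior_states(X, P, pi, A):
    """Scaled forward-backward: returns a (T, K) array of P(state_t = k | X).

    pi: (K,) initial state probabilities.
    A:  (K, K) transition matrix, A[i, j] = P(state_{t+1} = j | state_t = i).
    """
    log_e = bernoulli_log_emissions(X, P)
    # Subtract each row's maximum before exponentiating to avoid underflow;
    # a per-window constant cancels in the per-window normalization below.
    E = np.exp(log_e - log_e.max(axis=1, keepdims=True))
    T, K = E.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))

    alpha[0] = pi * E[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ A) * E[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (E[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy 2-state, 3-mark model (hypothetical numbers).
    P = np.array([[0.90, 0.80, 0.10],    # state 0: "active-like"
                  [0.05, 0.10, 0.90]])   # state 1: "repressed-like"
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    X = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]])
    post = posterior_states(X, P, pi, A)
    print(post.argmax(axis=1))  # expected: [0 0 1 1]
```

Taking the argmax of each posterior row assigns every window its most probable state. This gives a hard annotation, while the posterior itself keeps the uncertainty.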

(Figure: posterior probability of each state at each locus, given the data)

Estimated Chromatin States - Emission Probabilities

(Figures: emission probabilities; genomic functional enrichment)

GO Enrichment for Promoter States

• Although states 3-8 are all promoter states, each is enriched for genes in different GO categories

Comparison of Promoter States

• Different promoter states peak at different positions relative to the transcription start site

Comparison of Transcribed States

GWAS and Chromatin States

• State 33: a chromatin state enriched for GWAS-associated variants

Power for Discovering Chromatin States

Feature Selection

• We may not need all of the histone marks to explain the chromatin states

• Feature selection: stepwise forward selection picks a subset of histone marks that describes the chromatin states

Feature Selection

Epigenome and Transcription

• Histone modification levels can influence gene expression

• Nucleosome positions can influence gene expression
  – DNA sequence specificities of nucleosome and transcription factor binding sites
  – Nucleosomes as repressors

• DNA methylation usually represses transcription

Key Questions

• Is there a quantitative relationship between histone modification levels and transcription?

• Is there a subset of histone modifications that predicts transcription better than others?

• Do different promoter types require different epigenetic marks?

• Do these relationships between histone modifications and transcription hold in different tissue types?

Dataset

• 38 histone modifications and one histone variant in human CD4+ T-cells
  – ChIP-seq data
  – In a 4,001-bp region surrounding the transcription start sites of 14,801 RefSeq genes

• Gene expression levels in the CD4+ T-cells

• 9 histone modifications in CD36+ and CD133+ cells

• Gene expression levels in CD36+ and CD133+ cells

Histone modification levels are predictive for gene expression (Karlic et al., PNAS, 2010)

Linear Models

• Linear regression method
  – Predictors: histone marks
    • No binarization
    • For genes with no signal for a particular modification, add a pseudocount
  – Responses: gene expression levels
  – Promoter regions of different genes serve as samples

Linear Models

• Full model including all histone modificaons

• Compute r² between observed gene expression and predicted values to assess the predictive power of the model

Linear Models
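The following is a minimal sketch of the full model and its r² evaluation. It assumes NumPy and a log-linear form in which both modification levels and expression are log-transformed after adding a pseudocount, which is one common reason for the pseudocount mentioned above. The exact transform, the pseudocount value, and the function names are assumptions. Here r² is the squared Pearson correlation between observed and predicted values. Computing it on genes held out from fitting (e.g., by cross-validation) gives a less optimistic estimate than computing it on the training genes.

```python
import numpy as np


def fit_linear_model(H, y, pseudocount=1.0):
    """Ordinary least squares of log expression on log modification levels.

    H: (n_genes, n_marks) nonnegative modification levels, one row per promoter.
    y: (n_genes,) nonnegative expression levels.
    Assumption: both sides are log-transformed after adding a pseudocount, so
    promoters with no signal for a mark (or no expression) stay defined.
    Returns (coefficients with the intercept first, predicted, observed).
    """
    X = np.log(np.asarray(H, dtype=float) + pseudocount)
    X = np.column_stack([np.ones(len(X)), X])        # intercept column
    t = np.log(np.asarray(y, dtype=float) + pseudocount)
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return coef, X @ coef, t


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)
```

The call `coef, pred, obs = fit_linear_model(H, y)` followed by `r_squared(obs, pred)` scores the full model. Passing only some columns of `H` scores a reduced model in the same way.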

• Selecting the histone modifications with the most predictive power

Linear Models

• Selecting the histone modifications with the most predictive power using BIC scores

Prediction Accuracy

Searching for Histone Modifications with the Most Predictive Power
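The following sketches the subset search. It assumes NumPy, predictors already transformed as in the full model, and the standard Gaussian-likelihood BIC, n·ln(RSS/n) + k·ln(n), up to an additive constant. Every subset of 1, 2, or 3 marks is scored exhaustively. For 39 marks that means 9,139 three-mark subsets, which is cheap to enumerate. The marks that appear most often among the top-ranked subsets can then be tallied. The function names and the `top` cutoff are hypothetical.

```python
import itertools
from collections import Counter

import numpy as np


def bic_linear(X, t):
    """BIC of an OLS fit with Gaussian errors, up to an additive constant.

    BIC = n * ln(RSS / n) + k * ln(n), where k counts the intercept and slopes.
    Lower is better.
    """
    n = len(t)
    Xd = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Xd, t, rcond=None)
    rss = float(np.sum((t - Xd @ coef) ** 2))
    k = Xd.shape[1]
    return n * np.log(rss / n) + k * np.log(n)


def best_subsets(X, t, names, max_size=3, top=5):
    """For each size 1..max_size, rank all subsets of marks by BIC (best first)."""
    ranked = {}
    for size in range(1, max_size + 1):
        scored = [
            (bic_linear(X[:, list(idx)], t), tuple(names[i] for i in idx))
            for idx in itertools.combinations(range(X.shape[1]), size)
        ]
        scored.sort()
        ranked[size] = scored[:top]
    return ranked


def mark_frequencies(ranked_models):
    """Count how often each mark appears in a list of (score, subset) pairs."""
    return Counter(mark for _, subset in ranked_models for mark in subset)
```

Within a single subset size every model has the same number of parameters, so ranking by BIC is the same as ranking by residual sum of squares. The ln(n) penalty only matters when models of different sizes are compared, e.g. `min(m for models in ranked.values() for m in models)`.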

• The most frequently appearing histone modifications in models with 1, 2, and 3 histone modifications

Model with One Histone Modification

• Correlations between expression and each histone modification

• Redundancy among histone modifications

Histone Modifications and Promoter Types

• Different promoter types to be considered
  – LCPs: low CpG content promoters
  – HCPs: high CpG content promoters
  – Nucleosomes in HCPs almost always carry H3K4me3, whereas nucleosomes in LCPs carry this modification only when the gene is expressed

• Hypothesis: expression levels of genes with LCPs and HCPs can be predicted by different sets of histone modifications

Histone Modifications and Promoter Types

• Experimental setup
  – 1,779 LCPs and 7,089 HCPs in the dataset
  – Fit separate models to LCPs and HCPs and compare them with the model estimated from the full dataset

Histone Modifications and Promoter Types

Considering Different Tissue Types

• Used the model trained on CD4+ data to predict gene expression in CD133+ and CD36+ cells

• Used only genes whose expression differs more than fivefold between CD4+ and CD133+ cells (and likewise between CD4+ and CD36+ cells)

Nucleosome and Transcription

• DNA sequence motifs with high nucleosome binding affinities
  – Potentially related to bending DNA around the nucleosome

• DNA sequence motifs with high transcription factor binding affinities
  – TF concentration can also influence gene expression

• Competition between nucleosomes and transcription factors can influence transcription

DNA Sequence, DNA-binding Proteins, and Gene Expression

• Mixture model for predicting gene expression from nucleosomes and other DNA-binding proteins
  – E: gene expression
  – C: protein configurations

DNA Sequence, DNA-binding Proteins, and Gene Expression
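The following is a toy sketch of the mixture. It assumes thermodynamic-style statistical weights. Each configuration C of bound proteins gets a weight equal to the product of concentration and affinity over its bound molecules. The mixture proportions P(C) are the normalized weights, and expected expression is the mixture mean, the sum over C of P(C) times E[E | C]. The one-site promoter, the weights, and the per-configuration mean expression are hypothetical. They were chosen to show how a nucleosome competing for the same DNA lowers the chance that the TF is bound.

```python
def configuration_probs(weights):
    """Mixture proportions P(C): statistical weights normalized over configurations."""
    total = sum(weights.values())
    return {config: w / total for config, w in weights.items()}


def toy_promoter(tf_conc, tf_aff, nuc_conc, nuc_aff):
    """Hypothetical promoter: one TF site lying inside one nucleosome footprint.

    The TF and the nucleosome cannot occupy the DNA at the same time, so only
    three configurations exist. Each weight is concentration * affinity of the
    bound molecule; the empty promoter is the reference with weight 1.
    """
    return {
        "empty": 1.0,
        "tf": tf_conc * tf_aff,
        "nucleosome": nuc_conc * nuc_aff,
    }


def expected_expression(probs, mean_expr):
    """Mixture mean: sum over configurations C of P(C) * E[expression | C]."""
    return sum(p * mean_expr[config] for config, p in probs.items())


if __name__ == "__main__":
    mean_expr = {"empty": 0.0, "tf": 10.0, "nucleosome": 0.0}  # TF-bound = active
    for nuc_aff in (1.0, 10.0):
        probs = configuration_probs(toy_promoter(1.0, 4.0, 1.0, nuc_aff))
        print(nuc_aff, round(probs["tf"], 3), round(expected_expression(probs, mean_expr), 3))
```

Raising the nucleosome affinity from 1 to 10 lowers P(TF bound) from 4/6 to 4/15, and the expected expression falls with it. This is the competition effect in its simplest form.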

• Mixture proportions

• Mixture component models

Nucleosome and Transcription

Competition between Nucleosomes and Transcription Factors

Transcriptional Noise

Cooperative Binding Reduces Transcriptional Noise

Fuzzy Nucleosomes

• Well-positioned vs. fuzzy nucleosomes
  – Can be inferred from DNA sequences
  – For fuzzy nucleosomes, many different positions are observed

(Figure: well-positioned nucleosomes vs. fuzzy nucleosomes)

Summary

• Histone modifications contain information about chromatin states, so chromatin states can potentially be decoded from epigenetic data.

• Epigenetics and gene expression
  – Histone modifications can influence gene expression
  – Nucleosome positions and the competition between TFs and nucleosomes can influence gene expression