monaLisa: MOtif aNAlysis with Lisa
European Bioconductor Meeting 2019
Dania Machlab, Lukas Burger, Michael Stadler
Friedrich Miescher Institute for Biomedical Research
Background and Motivation
Use monaLisa to:
• Identify enriched motifs
• Select motifs explaining observed changes
[Figure panel labels: Co-binding, Chromatin remodeling, Blocking, repositioning, Architectural role. Source: Francois Spitz & Eileen E. M. Furlong (2012) Nature Reviews Genetics]
[Figure: schematic of an enhancer and Gene A on the genome, with ATAC-seq and RNA-seq tracks for Condition 1 and Condition 2 and predicted TFBS]
Identify Enriched Motifs
[Figure: motif enrichment. Axis and panel labels: delta methylation, Percent G+C, density of promoters, enrichment (log2), FDR (−log10). Motifs shown: ID4, Klf1, SP4, FEV, ERF, FLI1, ERG, Klf12, KLF4, E2F7, ELK3, ZEB1, ETS1, ETV5, ETV1, ETV4, ETV3, CTCF, KLF13, KLF14, CTCFL, Rarb(var.2), BHLHE41, RARA(var.2)]
Select Motifs using Stability Selection
Randomized lasso stability selection
Y: observed logFC
X: predicted TFBS
Perform regularized regression of Y ~ X
[Figure: true signal versus noise along the regularization parameter, from large λ to small λ. Panels: Lasso with Cross Validation, Lasso Stability Selection, Randomized Lasso Stability Selection (with weakness parameter)]
Meinshausen & Bühlmann (2010) Journal of the Royal Statistical Society
Select Motifs Explaining Observed Changes in Accessibility
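The idea of randomized lasso stability selection can be sketched in a few lines. This is a minimal illustration in plain Python, not the monaLisa implementation (which, per the slides, relies on glmnet::glmnet and stabs::stabsel). It makes several assumptions: toy Gaussian data stand in for the predicted TFBS matrix X and the observed logFC vector Y, a single regularization value `lam` is used instead of a full path, and the per-motif penalty is rescaled by a weight drawn uniformly from [weakness, 1]. All function and variable names are hypothetical.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, penalties, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + sum_j penalties[j]*|b_j|.
    X and y are assumed to be centered (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def center(X, y):
    """Subtract column means from X and the mean from y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    return Xc, [v - y_mean for v in y]

def stability_selection(X, y, lam, weakness, n_subsamples, rng):
    """Fraction of random half-subsamples in which each predictor is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)          # half of the regions
        Xs, ys = center([X[i] for i in idx], [y[i] for i in idx])
        # Randomized lasso: penalty lam / W_j with W_j in [weakness, 1].
        penalties = [lam / rng.uniform(weakness, 1.0) for _ in range(p)]
        beta = lasso_cd(Xs, ys, penalties)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]

# Toy data (assumption): 80 regions, 8 motifs, only motifs 0 and 1 carry signal.
rng = random.Random(0)
n_regions, n_motifs = 80, 8
X = [[rng.gauss(0, 1) for _ in range(n_motifs)] for _ in range(n_regions)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

probs = stability_selection(X, y, lam=0.3, weakness=0.5, n_subsamples=50, rng=rng)
for j, pr in enumerate(probs):
    print(f"motif_{j}: selection probability {pr:.2f}")
```

Motifs whose selection probability stays high across subsamples are the ones retained. In practice the probability is tracked over a range of regularization values rather than a single `lam`.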
[Figure: heatmap of pairwise Pearson correlation (color scale from −1 to 1) between motifs, with a bar plot of selection probability (0.0 to 1.0). Motifs shown: NFATC1, TEAD2, TEAD3, NKX2−8, Nkx2−5(var.2), NFIC, KLF5, GATA3, Gata1, GATA1::TAL1, HNF1A, Nr2f6, Hnf4a]
glmnet::glmnet and stabs::stabsel used
Summary and Outlook
• We can identify TFs enriched in regions of interest that display certain log-fold changes
• We can select TFs that are likely to explain the observed log-fold changes using stability selection
• We can use any fold-change defined on regions of interest (ATAC-seq, methylation, expression, ChIP-seq …) to select motifs explaining the observed logFC
• We want to look at motif enrichment without using existing databases (unbiased view)
• Identify enriched k-mers, group them, and align them to predict the motif
• Submit to Bioconductor
• https://github.com/fmicompbio/monaLisa
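As a rough illustration of the k-mer idea in the outlook, the first step (finding enriched k-mers) could look like the sketch below. It is not part of monaLisa. It assumes a foreground and a background set of sequences, counts k-mers on the given strand only (no reverse complement), and scores each k-mer by the log2 ratio of pseudocount-smoothed frequencies. The sequences and function names are made up for the example. Grouping and aligning the enriched k-mers are not covered.

```python
import math
from collections import Counter

def count_kmers(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def kmer_log2_enrichment(fg_seqs, bg_seqs, k, pseudo=1.0):
    """log2 ratio of smoothed k-mer frequencies, foreground over background."""
    fg, bg = count_kmers(fg_seqs, k), count_kmers(bg_seqs, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudo * len(kmers)
    bg_total = sum(bg.values()) + pseudo * len(kmers)
    return {
        km: math.log2(((fg[km] + pseudo) / fg_total) / ((bg[km] + pseudo) / bg_total))
        for km in kmers
    }

# Toy sequences (assumption): the foreground shares the 6-mer GATAAG.
foreground = ["ACGATAAGCT", "TTGATAAGGA", "CAGATAAGTC"]
background = ["ACGTCCGACT", "TTCAGGCTGA", "CAGCTTACTC"]

enrich = kmer_log2_enrichment(foreground, background, k=6)
top = max(enrich, key=enrich.get)
print(top, round(enrich[top], 2))
```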