monaLisa: MOtif aNAlysis with Lisa

European Bioconductor Meeting 2019

Dania Machlab, Lukas Burger, Michael Stadler

Friedrich Miescher Institute for Biomedical Research

Background and Motivation

[Figure labels: Co-binding, Chromatin remodeling, Blocking, repositioning, Architectural role]

Use monaLisa to:
• Identify enriched motifs
• Select motifs explaining observed changes

Francois Spitz & Eileen E. M. Furlong (2012) Nature Reviews Genetics

Background and Motivation

[Figure labels: Genome, Enhancer A, ATAC-seq Condition 1, RNA-seq Condition 1, ATAC-seq Condition 2, RNA-seq Condition 2, Predicted TFBS]

Identify Enriched Motifs

[Figure: motif enrichment plot. Axis and panel labels: delta methylation, Percent G+C, enrichment (log2), FDR (−log10). Motifs shown: ID4, Klf1, SP4, FEV, ERF, FLI1, ERG, Klf12, E2F7, ELK3, ZEB1, ETS1, ETV5, ETV1, ETV4, ETV3, CTCF, KLF13, KLF14, CTCFL, Rarb(var.2), BHLHE41, RARA(var.2)]
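The enrichment plot reports a log2 enrichment and an FDR per motif. The slides do not say how these values are computed, so the following is only a minimal sketch of one simple way to obtain such numbers, and not monaLisa's implementation. It assumes regions have already been assigned to bins (for example by fold change) and that each motif has a present/absent call per region. The function names, the pseudocount, the hypergeometric test, and the toy data are all assumptions made for illustration.

```python
from math import comb, log2


def hypergeom_upper_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when n_draw regions are drawn from n_total regions,
    of which n_success contain the motif."""
    upper = min(n_draw, n_success)
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_draw)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def binned_enrichment(bin_labels, motif_hits, pseudocount=1.0):
    """Compare each bin against all other bins, for every motif.

    bin_labels: one bin label per region (hypothetical input format).
    motif_hits: dict mapping motif name to one bool per region.
    """
    bins = sorted(set(bin_labels))
    n_total = len(bin_labels)
    results = []
    for motif, hits in motif_hits.items():
        n_hits = sum(hits)
        for b in bins:
            in_bin = [h for h, lab in zip(hits, bin_labels) if lab == b]
            n_bin, k = len(in_bin), sum(in_bin)
            # Pseudocounts keep the ratio finite when a count is zero.
            fg = (k + pseudocount) / (n_bin + 2 * pseudocount)
            bg = (n_hits - k + pseudocount) / (n_total - n_bin + 2 * pseudocount)
            results.append({
                "motif": motif,
                "bin": b,
                "log2_enrichment": log2(fg / bg),
                "pvalue": hypergeom_upper_tail(k, n_bin, n_hits, n_total),
            })
    # Correct over all motif/bin combinations at once.
    fdr = benjamini_hochberg([r["pvalue"] for r in results])
    for r, q in zip(results, fdr):
        r["fdr"] = q
    return results


# Toy data, made up for illustration: 12 regions in three bins.
bin_labels = ["down"] * 4 + ["flat"] * 4 + ["up"] * 4
motif_hits = {
    "ETS1": [False] * 5 + [True] + [False] * 2 + [True] * 4,
    "CTCF": [True, False, True, False, False, True,
             False, True, True, False, False, True],
}
results = binned_enrichment(bin_labels, motif_hits)
for r in results:
    print(r["motif"], r["bin"], round(r["log2_enrichment"], 2), round(r["fdr"], 3))
```

In this toy setup the first motif is concentrated in the "up" bin and the second is spread evenly across bins, which is the contrast the enrichment plot is meant to expose.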

[Figure axis label: density of promoters]

Select Motifs using Stability Selection

Randomized lasso stability selection

[Figure panels: Lasso with Cross Validation, Lasso Stability Selection, Randomized Lasso Stability Selection. Labels: regularization parameter, weakness parameter]

Y: observed logFC
X: predicted TFBS

Perform regularized regression Y ~ X

[Figure labels: large λ, small λ, true signal, noise]
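Randomized lasso stability selection can be sketched in a few lines. The sketch below follows the general recipe of Meinshausen & Bühlmann (2010): repeatedly draw half of the regions, rescale each motif's penalty by 1/W with W drawn uniformly from [weakness, 1], fit a lasso at a fixed regularization parameter, and report how often each motif gets a nonzero coefficient. It is a plain-Python illustration on simulated data, not the glmnet/stabs-based implementation used in monaLisa. The function names, the coordinate-descent solver, the default parameter values, and the simulated data are assumptions made for this example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(X, y):
    """Center every column of X and the response y (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[k] - col_means[k] for k in range(p)] for row in X]
    return Xc, [v - y_mean for v in y]


def weighted_lasso(X, y, lam, penalty_weights, n_iter=100, tol=1e-8):
    """Coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_k penalty_weights[k] * |b_k|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue  # constant column in this subsample
            # Covariance of feature k with the residual, adding back its own fit.
            rho = sum(X[i][k] * resid[i] for i in range(n)) / n + col_sq[k] * beta[k]
            new = soft_threshold(rho, lam * penalty_weights[k]) / col_sq[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][k] * delta
                beta[k] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def randomized_lasso_stability(X, y, lam, weakness=0.5, n_subsamples=50, seed=1):
    """Selection probability of each feature over random half-subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        Xs, ys = center([X[i] for i in idx], [y[i] for i in idx])
        # Randomized lasso: penalty of feature k is lam / W_k, W_k in [weakness, 1].
        weights = [1.0 / rng.uniform(weakness, 1.0) for _ in range(p)]
        beta = weighted_lasso(Xs, ys, lam, weights)
        for k in range(p):
            if beta[k] != 0.0:
                counts[k] += 1
    return [c / n_subsamples for c in counts]


# Simulated toy data: X holds motif presence per region (predicted TFBS),
# y plays the role of the observed logFC. Only motifs 0 and 1 carry signal.
data_rng = random.Random(42)
n_regions, n_motifs = 120, 6
X = [[1.0 if data_rng.random() < 0.5 else 0.0 for _ in range(n_motifs)]
     for _ in range(n_regions)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0.0, 0.3) for row in X]

selection_prob = randomized_lasso_stability(X, y, lam=0.1)
print([round(pr, 2) for pr in selection_prob])
```

Motifs whose selection probability exceeds a chosen cutoff would be kept. The cutoff and the regularization parameter are left to the user here.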

Meinshausen & Bühlmann (2010) Journal of the Royal Statistical Society

Select Motifs Explaining Observed Changes in Accessibility

[Figure: selected motifs with their selection probability (axis from 0.0 to 1.0) and a Pearson correlation color scale (Pear. Cor., from −1 to 1). Motifs shown: NFATC1, TEAD2, TEAD3, NKX2−8, Nkx2−5(var.2), NFIC, KLF5, GATA3, Gata1, GATA1::TAL1, HNF1A, Nr2f6, Hnf4a]

glmnet::glmnet and stabs::stabsel used

Summary and Outlook

• We can identify TFs enriched in regions of interest that display certain log-fold changes
• We can select TFs that are likely to explain the observed log-fold changes using stability selection
• We can use any fold-change defined on regions of interest (ATAC-seq, methylation, expression, ChIP-seq …) to select motifs explaining the observed logFC

• We want to look at motif enrichment without using existing databases (unbiased view)
• Enriched k-mers, grouping them, aligning them to predict the motif
• Submit to Bioconductor

• https://github.com/fmicompbio/monaLisa