Martin Stöter

HT - Technology Development Studio (TDS), the HC-Screening Unit at the MPI-CBG

KNIME workshop, March 2nd – 4th 2011, Zürich

Screen Mining with KNIME

A user-friendly framework for high throughput/content

Outline

- Our challenges with High-Content Screening (HCS) data

- HCS Tools (Community nodes)

- Example Workflows (pseudo demo)
  - with HCS Tools
  - templates
  - Dose Response with variables

Martin Stöter, MPI-CBG, Dresden, Germany

Technology Development Studio (TDS)

MPI-CBG, Dresden, Germany

Screening facility for academic laboratories

Provide full service for automation and cell-based screens, RNAi and chemical screens

Equipment: liquid handling robots, drop dispensers, plate washers, plate readers, High Content Screening platforms

Is Data Analysis a Bottleneck in HCS?

Complex Experiments

Lots of data (too much for Excel)

Fancy data analysis / mining

Many scientists, but few data analysts

Sometimes different languages

Data analysis is often a bottleneck!

Konstanz Information Miner (KNIME)

- Open source data analysis platform
- Modular data pipelining concept

Courtesy of Holger Brandl, MPI-CBG

Template Scripts for KNIME

- Scientist: Excel, tables, graphical user interfaces
- Data analyst: R, Python, Matlab, programming

Solution: Data analysts write template scripts AND users access these using a graphical interface!

R
- Local or R server, easy to update & to edit
- Visualizations (boxplots, histograms, profiles, …)

Matlab
- Open in Matlab, Matlab snippet, Matlab plot

Python
- Open in Python, Python snippet, Python plot

Other scripting languages
- Groovy, Java snippets, JPython, Perl

Other KNIME functionalities
- Chemoinformatics
- Bioinformatics

Courtesy of Holger Brandl, MPI-CBG

High-Content Screening (HCS) data

Data generation
- Cells (RNAi, compounds)
- Microscopy -> images
- Image analysis
- Cell features/parameters -> well data

Tasks/problems
- Read data from various sources (SQL database, XML, Excel, various .csv, …)
- Screening specific statistics
- Screening specific utilities
- Data mining, visualization

[Figure: example 384-well plate layout, rows A to P by columns 1 to 24, with DMSO control wells, a dilution series (10, 3, 1, 0.3, 0.1, 0.001) and "no AB" wells]

HCS Tools for KNIME

Data Import
- Image Analysis (Opera, Operetta, Cell Profiler, MotionTracking)
- Plate Readers (Envision, GeniusPro, MSD SectorImager)

Normalization
- Percent-of-control (POC), Normalized percent inhibition (NPI)
- Z-score, B-score
- Optional: robust statistics (Median + MAD)
- Select wells to normalize (controls, samples)
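The normalization methods listed above can be illustrated with a minimal Python sketch. It is not the HCS Tools implementation: the function names are hypothetical, the sign convention for NPI (0 % at the negative control, 100 % at the positive control) is an assumption, and the B-score is left out because it needs a median polish over the plate grid.

```python
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation, scaled by 1.4826 to estimate the SD of normal data."""
    values = list(values)
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


def poc(x, controls, robust=False):
    """Percent of control: x as a percentage of the control centre."""
    center = median(controls) if robust else mean(controls)
    return 100.0 * x / center


def npi(x, pos_controls, neg_controls, robust=False):
    """Normalized percent inhibition (assumed convention: neg = 0 %, pos = 100 %)."""
    loc = median if robust else mean
    neg, pos = loc(neg_controls), loc(pos_controls)
    return 100.0 * (neg - x) / (neg - pos)


def z_score(x, reference, robust=False):
    """Distance of x from the reference wells in units of SD (or MAD if robust)."""
    if robust:
        return (x - median(reference)) / mad(reference)
    return (x - mean(reference)) / stdev(reference)


# Made-up readouts for illustration only
neg = [100, 102, 98, 100]   # e.g. DMSO wells
pos = [10, 12, 8, 10]       # e.g. fully inhibited wells
print(poc(50, neg), npi(55, pos, neg), z_score(104, neg, robust=True))
```

The `reference` argument mirrors the "select wells to normalize" option: passing the control wells normalizes against controls, passing all sample wells normalizes against the plate population.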

Quality Control
- CV, Z', Multivariate Z', SSMD
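As a rough illustration of these quality-control metrics, the sketch below computes CV, Z' and SSMD from control wells using their standard textbook definitions. The function names are hypothetical, the SSMD form assumes independent control groups, and the multivariate Z' is not covered.

```python
from math import sqrt
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation in percent."""
    return 100.0 * stdev(values) / mean(values)


def z_prime(pos, neg):
    """Z' factor: 1 - 3 * (SD_pos + SD_neg) / |mean_pos - mean_neg|."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def ssmd(pos, neg):
    """Strictly standardized mean difference for two independent groups."""
    return (mean(pos) - mean(neg)) / sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)


# Made-up control readouts for illustration only
neg = [100, 102, 98, 100]
pos = [10, 12, 8, 10]
print(cv(neg), z_prime(pos, neg), ssmd(pos, neg))
```

A Z' close to 1 indicates well separated controls with little spread; swapping `mean`/`stdev` for median/MAD would give the robust variants mentioned under Normalization.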

Screen Mining
- Annotation of screen data from database
- Dose Response (IC50)

Utilities
- Handle barcodes & wells, join layouts
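Handling wells mostly means converting between well labels and row/column indices. A minimal sketch for a 384-well plate (rows A to P, columns 1 to 24) follows; the helper names and the zero-padded output format are assumptions made for illustration, not the conventions of the HCS Tools nodes.

```python
ROWS = "ABCDEFGHIJKLMNOP"  # 16 rows
N_COLS = 24                # 24 columns, 16 x 24 = 384 wells


def parse_well(well):
    """Turn a label such as 'C7' or 'c07' into 1-based (row, column) indices."""
    row = ROWS.index(well[0].upper()) + 1  # raises ValueError for unknown rows
    col = int(well[1:])
    if not 1 <= col <= N_COLS:
        raise ValueError(f"column out of range: {well!r}")
    return row, col


def format_well(row, col):
    """Turn 1-based (row, column) indices back into a zero-padded label."""
    return f"{ROWS[row - 1]}{col:02d}"


print(parse_well("P24"), format_well(3, 7))
```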

Visualization
- Plate Viewer -> heatmaps, browse wells, ...
- Mondrian, R templates

HCS Tools: Standardized Data Format

- Different reader nodes to shape a common data structure
- Enforce standardization of data format
- Lower the knowledge entry barrier for new users

HCS Tools: Barcode Standard

Regular expression for interpretation of barcode:

(?[0-9]{3})(?[A-z]{2})(?[0-9]{6})(?[A-z]{1})

Standardized table structure -> connection to our TDS compound database

Configurable in Preferences -> KNIME -> HCA Tools
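The barcode pattern above has four groups: three digits, two letters, six digits and one letter. A minimal Python sketch of parsing such a barcode follows. The group names (`plate_number`, `project`, `date`, `replicate`) are hypothetical labels chosen for illustration, not the names used by HCS Tools, and the example barcode is made up. The sketch also uses `[A-Za-z]` instead of `[A-z]`, since `[A-z]` additionally matches a few punctuation characters that lie between `Z` and `a` in ASCII.

```python
import re

# Hypothetical group names; only the character classes and lengths
# come from the pattern shown on the slide.
BARCODE = re.compile(
    r"(?P<plate_number>[0-9]{3})"
    r"(?P<project>[A-Za-z]{2})"
    r"(?P<date>[0-9]{6})"
    r"(?P<replicate>[A-Za-z]{1})"
)


def parse_barcode(barcode):
    """Return the barcode fields as a dict, or None if the barcode does not match."""
    match = BARCODE.fullmatch(barcode)
    return match.groupdict() if match else None


print(parse_barcode("001AB110302A"))
```

Each named group becomes one column of plate metadata, which is what makes a join against a compound database straightforward.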

HCS Tools: Annotate Experiment

Excel is (still) the tool of choice for assay development

The Join Layout node is an Excel reader for a defined spreadsheet format

Plate format with multiple well attributes (1 plate layout -> 1 column in KNIME)

KNIME Workflow Example 1

Plate Viewer: Heatmaps of entire Screen

179 plates x 384 wells = ~70,000 data points, times x parameters

Workflow Example 2

Workflow Example 2: Clustering

Workflow Example 2: Dose Response node

Using flow variables in R scripts

Dose Response node looks like KNIME, but actually it is R!!!
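The Dose Response node delegates the curve fitting to R. Independent of that, the idea behind an IC50 can be sketched in plain Python: a four-parameter logistic curve describes the response, and the IC50 is the concentration at which the response crosses the half-way level. The sketch below is not the node's R template. It swaps the curve fit for a simple log-linear interpolation between the two bracketing concentrations, and all names and values are illustrative assumptions.

```python
from math import log10


def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic curve: response falls from top to bottom as conc rises."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def ic50_by_interpolation(concs, responses, half=50.0):
    """Estimate the concentration where the response crosses `half`.

    Interpolates linearly in log10(concentration) between the two
    neighbouring points that bracket `half`; returns None if no crossing exists.
    """
    points = sorted(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if (r_lo - half) * (r_hi - half) <= 0 and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            return 10 ** (log10(c_lo) + frac * (log10(c_hi) - log10(c_lo)))
    return None


# Simulated dilution series (made-up parameters, not screen data)
concs = [0.1, 0.3, 1, 3, 10]
responses = [four_pl(c, bottom=0.0, top=100.0, ic50=1.0, hill=1.0) for c in concs]
print(ic50_by_interpolation(concs, responses))
```

Interpolation is only a quick estimate; a proper fit of all four parameters, as done by dedicated dose-response tooling, also yields the Hill slope and the plateaus.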

Summary

• HCS Tools provides useful functionality for HCS-specific applications in KNIME.

• The scripting template environment lets users configure scripts through a GUI.

• R scripts from templates can be modified and customized in any node and support flow variables.

How to get the HCS Tools & Scripting nodes?

- MPI-CBG: http://www.mpi-cbg.de/facilities/profiles/software-engineering/hcs-tools.html
- KNIME community website: http://tech.knime.org/hcs-tools
- KNIME community update site: http://tech.knime.org/update/community-contributions/nightly

KNIME -> Help -> Install Software -> Add…

Paste the update link and select the node packages to be installed

…and the R templates? -> The links come automatically with the Scripting nodes, but can be configured in the KNIME preferences.

Acknowledgements

Software Development / TDS team (MPI-CBG), Bioinformatics Facility (MPI-CBG)

Marc Bickle, Felix Meyerhofer, Holger Brandl, Cordula Andre, Claudia Möbius, Rico Barsacchi, Antje Niederlein, Sara Christ, Nadine Tomschke, Milan Esner, Jan Wagner, Annett Lohmann

KNIME: Michael Berthold and the KNIME team

Thank you for your attention!