Martin Stöter
HT - Technology Development Studio (TDS), the HC-Screening Unit at the MPI-CBG
KNIME workshop, March 2nd – 4th 2011, Zürich
Screen Mining with KNIME
A user-friendly framework for high throughput/content data analysis
Outline
- Our challenges with High-Content Screening (HCS) data
- HCS Tools (Community nodes)
- Example Workflows (pseudo demo)
  - with HCS Tools
  - R templates
  - Dose Response with variables
Martin Stöter, MPI-CBG, Dresden, Germany
Technology Development Studio (TDS)
MPI-CBG, Dresden, Germany
Screening facility for academic laboratories
Provide full service for automation and cell-based screens, RNAi and chemical screens
Equipment: liquid handling robots, drop dispensers, plate washers, plate readers, High Content Screening platforms
Is Data Analysis a Bottleneck in HCS?
Complex Experiments
Lots of data (too much for Excel)
Fancy data analysis / mining
Many scientists, but few data analysts
Sometimes different languages (scientists vs. data analysts)
Data analysis is often a bottleneck!
Konstanz Information Miner (KNIME)
- Open source data analysis platform
- Modular data pipelining concept
Courtesy of Holger Brandl, MPI-CBG
Template Scripts for KNIME
- Scientist: Excel, tables, graphical user interfaces
- Data analyst: R, Python, Matlab, Java, programming
Solution: Data analysts write template scripts AND users access these using a graphical interface!
- R: local or R server, easy to update & to edit; visualizations (boxplots, histograms, profiles, …)
- Matlab: Open in Matlab, Matlab snippet, Matlab plot
- Python: Open in Python, Python snippet, Python plot
- Other scripting languages: Groovy, Java snippets, JPython, Perl
- Other KNIME functionalities: Chemoinformatics, Bioinformatics, Data mining
Courtesy of Holger Brandl, MPI-CBG
High-Content Screening (HCS) data
Data generation
- Cells (RNAi, compounds)
- Microscopy -> images
- Image analysis
- Cell features/parameters -> well data
Tasks/problems
- Read data from various sources: SQL database, XML, Excel, various .csv …
- Screening specific statistics
- Screening specific utilities
- Data mining, visualization
[Figure: 384-well plate layout (rows A-P, columns 1-24) with DMSO control wells, "no AB" wells, and a dilution series of 10, 3, 1, 0.3, 0.1 and 0.001]
HCS Tools for KNIME
Data Import
- Image Analysis (Opera, Operetta, Cell Profiler, MotionTracking)
- Plate Readers (Envision, GeniusPro, MSD SectorImager)
Normalization
- Percent-of-control (POC), Normalized percent inhibition (NPI)
- Z-score, B-score
- Optional: robust statistics (Median + MAD)
- Select wells to normalize (controls, samples)
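The normalization options listed above can be sketched in a few lines of Python. This is an illustration using common textbook definitions, not the HCS Tools implementation: the function names, the choice of reference wells, and the NPI sign convention (high control minus value) are assumptions. B-score, which needs a row/column median polish, is left out.

```python
from statistics import mean, median, stdev

def poc(values, controls):
    """Percent of control: each value as a percentage of the control mean."""
    ref = mean(controls)
    return [100.0 * v / ref for v in values]

def npi(values, high, low):
    """Normalized percent inhibition (assumed convention):
    0 % at the high-control mean, 100 % at the low-control mean."""
    hi, lo = mean(high), mean(low)
    return [100.0 * (hi - v) / (hi - lo) for v in values]

def z_scores(values, reference=None, robust=False):
    """Z-score against the chosen reference wells (default: the values themselves).
    With robust=True, median and scaled MAD replace mean and standard deviation."""
    ref = list(values if reference is None else reference)
    if robust:
        center = median(ref)
        # 1.4826 scales the MAD to match the standard deviation for normal data.
        spread = 1.4826 * median([abs(v - center) for v in ref])
    else:
        center = mean(ref)
        spread = stdev(ref)
    return [(v - center) / spread for v in values]
```

Passing the control wells as `reference` normalizes against controls; leaving it out normalizes against the samples themselves, which mirrors the "select wells to normalize" option.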
Quality Control
- CV, Z', Multivariate Z', SSMD
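For reference, three of these quality-control measures can be written down directly from their usual definitions. The sketch below is not taken from HCS Tools; the function names are assumptions, SSMD is given in its form for two independent groups, and the multivariate Z' is not covered.

```python
import math
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: standard deviation relative to the mean."""
    return 100.0 * stdev(values) / mean(values)

def z_prime(pos, neg):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

def ssmd(pos, neg):
    """Strictly standardized mean difference for two independent control groups."""
    return (mean(pos) - mean(neg)) / math.sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)
```

All three take plain lists of control-well readouts from one plate, so they can be computed per plate and compared across a screen.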
Screen Mining
- Annotation of screen data from database
- Dose Response (IC50)
Utilities
- Handle barcodes & wells, join layouts
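As an illustration of what handling wells involves, the sketch below converts between well names and row/column indices on a 384-well plate (16 rows A-P, 24 columns). It is a minimal example, not the HCS Tools utility nodes; the function names and the zero-padded output format are assumptions.

```python
import re
import string

ROWS = string.ascii_uppercase[:16]  # 'A'..'P' for a 384-well plate
N_COLS = 24

def parse_well(well):
    """Split a well name such as 'B07' or 'p24' into 1-based (row, column) indices."""
    match = re.fullmatch(r"([A-Za-z])(\d{1,2})", well.strip())
    if match is None:
        raise ValueError(f"not a well name: {well!r}")
    row = ROWS.find(match.group(1).upper()) + 1  # find() returns -1 for rows beyond 'P'
    col = int(match.group(2))
    if row == 0 or not 1 <= col <= N_COLS:
        raise ValueError(f"well outside a 384-well plate: {well!r}")
    return row, col

def format_well(row, col):
    """Build a zero-padded well name from 1-based indices, e.g. (2, 7) -> 'B07'."""
    if not (1 <= row <= len(ROWS) and 1 <= col <= N_COLS):
        raise ValueError(f"indices outside a 384-well plate: {(row, col)!r}")
    return f"{ROWS[row - 1]}{col:02d}"
```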
Visualization
- Plate Viewer -> heatmaps, browse wells, ...
- Mondrian, R templates
HCS Tools: Standardized Data Format
- Different reader nodes shape a common data structure
- Enforce standardization of data format
- Lower the knowledge entry barrier for new users
HCS Tools: Barcode Standard
Regular expression for interpretation of barcode:
(?
Configurable in Preferences -> KNIME -> HCA Tools
Standardized table structure -> connection to our TDS compound database
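The general idea of interpreting a barcode with a regular expression can be sketched as follows. The pattern on the slide is truncated and is not reconstructed here; the barcode layout, the group names and the function below are entirely hypothetical and only show how named groups turn a barcode string into table columns. Note that Python writes named groups as `(?P<name>...)`.

```python
import re

# Hypothetical barcode layout, NOT the TDS standard:
# <library>_<three-digit plate number>_<replicate letter>, e.g. "KIN_012_A"
BARCODE_PATTERN = re.compile(
    r"(?P<library>[A-Za-z]+)_(?P<plate>\d{3})_(?P<replicate>[A-Z])"
)

def parse_barcode(barcode):
    """Return the barcode fields as a dict, or None if the barcode does not match."""
    match = BARCODE_PATTERN.fullmatch(barcode)
    if match is None:
        return None
    fields = match.groupdict()
    fields["plate"] = int(fields["plate"])  # store the plate number as an integer
    return fields
```

Each named group becomes one column of the standardized table, which is what makes a later join against a compound or library database straightforward.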
HCS Tools: Annotate Experiment
Excel is (still) the tool of choice for assay development
The Join Layout node is an Excel reader for a defined spreadsheet format
Plate format with multiple well attributes (1 plate layout -> 1 column in KNIME)
KNIME Workflow Example 1
Plate Viewer: Heatmaps of entire screen
179 plates x 384 wells = ~70,000 data points, times x parameters
Workflow Example 2
Workflow Example 2: Clustering
Workflow Example 2
Workflow Example 2: Dose Response node
Using flow variables in R scripts
The Dose Response node looks like a regular KNIME node, but it is actually R!
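The dose-response step estimates an IC50 from a dilution series. The node itself runs an R template, which is not reproduced here. As a much simpler stand-in, the sketch below estimates the IC50 by linear interpolation on a log10 concentration axis between the two doses that bracket the 50 % response. The function name, the default threshold and any data passed in are assumptions for illustration.

```python
import math

def ic50_by_interpolation(concentrations, responses, half=50.0):
    """Estimate the concentration at which the response crosses `half`.

    Interpolates linearly in log10(concentration) between the first pair of
    neighbouring doses whose responses bracket `half`. Returns None if the
    response never crosses `half`. Concentrations must be positive.
    """
    points = sorted(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        brackets = (r_lo - half) * (r_hi - half) <= 0
        if brackets and r_lo != r_hi:
            frac = (half - r_lo) / (r_hi - r_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None
```

Interpolation only uses two data points, so it is sensitive to noise in those wells; a fitted curve over the whole series is the more robust choice when enough doses are available.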
Summary
• HCS Tools provides useful functionality for HCS-specific applications in KNIME.
• The scripting template environment lets users configure scripts through a GUI.
• R scripts from templates can be modified and customized in any node and support flow variables.
How to get the HCS Tools & Scripting nodes?
- MPI-CBG: http://www.mpi-cbg.de/facilities/profiles/software-engineering/hcs-tools.html
- KNIME community website: http://tech.knime.org/hcs-tools
- KNIME community update site: http://tech.knime.org/update/community-contributions/nightly
KNIME -> Help -> Install Software -> Add…
Paste the update link and select the node packages to be installed
…and the R templates? -> The links come automatically with the Scripting nodes, but can be configured in the KNIME preferences.
Acknowledgements
Software Development / TDS team (MPI-CBG), Bioinformatics Facility (MPI-CBG):
Marc Bickle, Felix Meyerhofer, Holger Brandl, Cordula Andre, Claudia Möbius, Rico Barsacchi, Antje Niederlein, Sara Christ, Nadine Tomschke, Milan Esner, Jan Wagner, Annett Lohmann
KNIME: Michael Berthold and the KNIME team
Thank you for your attention!