Multiple Usage of KNIME in a Screening Laboratory Environment
KNIME UGM Zürich, 02.02.2012
Marc Bickle
HT-TDS, MPI-CBG

Outline
• Presentation of TDS
• Our problem: large complex datasets
• KNIME as data mining tool for screening (Community nodes)
• HCS tools (Community nodes)
• R, Python, Matlab, Groovy integration
• Some examples of other usage of KNIME

The High Throughput Technology Development Studio (HT-TDS)
Mission: provide cell-based screening services
Automated microscopy and automated image analysis
• High spatio-temporal resolution on a cell-by-cell basis
• Quantitative measurement of many cellular parameters (intensity, subcellular localization), allowing finely resolved phenotypic classification
• Systems biology readouts of chemogenomic screens (genome-wide RNAi screens + chemical screens)
• Clustering of RNAi and chemical phenotypes for mode-of-action identification (170,000 compounds, genome-wide libraries)

Automated Confocal Microscopy
• high resolution
• high throughput
Automated Image Analysis
• multiple parameters
• high definition of phenotypes
Profiling/clustering
• advanced statistics
• identify target of compounds
[Figure: example readouts: shape of cells, number of nuclei, intensity, distribution/distance, subcellular localization]

Identifying MOA By Integrating Chemical And Genetic Screens
[Figure: parameter profile plots for the Compound Screen and the RNAi Screen; panel title "CHML all oligos run 5"]
Which genes influence the same parameters as the compounds?
Devise assays to test predictions on pathways

Image Screen Dataflow
1-10 million images (tif), 2-10 TB
➜ Image Analysis
• 1-10 x 10^4 wells
• 1-10 x 10^6 fields
• 1-10 x 10^8 cells
• 1-10 x 10^10 objects
• CSV files/database, 2-100 GB
• 1-10 million images (png)
➜ Data Mining
• 1-50 plots (png/svg)
• 1-10 result files (CSV/xls/pdf), 1-10 MB
C. Collinet et al., Nature (2010), doi:10.1038/nature08779

Available Software Solutions
• Few software packages can deal with n-dimensional data structures several GB in size
1. Scripting languages: R, S, Matlab, Python, Java, C
Issues:
• Biologists are rarely at ease with scripting languages
• No overview of the data and the analysis flow (not graphical)
2. Commercial software: Genedata, Spotfire, Pipeline Pilot
Issues:
• Very expensive
• Not flexible, no possibility to extend the code
• No or small community to share problems and solutions with (but there are field scientists to help out)
3. Graphical Open Source software: KNIME, RapidMiner
Issues:
• None?

KNIME
1. KNIME can handle very large datasets on normal desktop computers
2. The workspace makes it easy to assemble analysis pipelines
• Good overview of the analysis path and operations (annotation of nodes)
3. Many useful data manipulation nodes, powerful clustering methods and cheminformatics nodes already existed
4. The integrated scripting languages (R, Java) offered great flexibility

HCS Tools
• KNIME did not have any screening-specific tools implemented
➜ We created a set of KNIME nodes for analyzing screening data
1. Instrument output readers
2. Well annotation tools, barcode tools
3. Typical QC tools: Z’ factor, SSMD, CV
4. Typical normalization tools: Z score, Percent of Control, Normalized Percent Inhibition, B score
5. Typical visualization tools: heatmap

Scripting Integration
• Some methods were not implemented in KNIME nodes
➜ We integrated the R, Python, Groovy and Matlab (requires licensed server) scripting languages with RGG:
• Hides the script behind a GUI
• Choose from a set of templates for methods or plots
• Parametrization with buttons or drop-down boxes
(http://idisk-srv1.mpi-cbg.de/knime/scripting-templates_tds/Matlab/TDS_figure-templates.txt)

Workflow
[Figure: KNIME workflow with steps for reading, annotating, quality-controlling and normalizing the screen data]
Snippets to test for normality, transform to normality (Box-Cox), and calculate Mahalanobis distances and Pearson correlations

Other Applications I
• Create a loop to open many CSV files, calculate something, then save and close the files
Example: merge a measurement column from many files into many other files

Other Applications II
• Standardization of libraries (compound and siRNA libraries)
• Different providers have different datasheets. To integrate all libraries into a common database for screen annotation, they need to be standardized and rearrayed to 384-well format

Other Applications III
• Hit picking and rearraying. After a screen, we need to reconfirm hits, cherry-pick them from the library and transfer them to a new 384-well plate. The workflow takes into account the work logic of the robot to obtain the final plate layout.

Other Applications IV
• Use the generic XML reader to read files that are often not accessible
• Example on the OPERA database: read microscope-specific data such as focus height, sublayout and dichroic mirror settings, and combine them with a QC image analysis script running on the fly.

Other Applications IV
• Plot the intensity of a channel per column and plate
• Verify the parameter profile of controls

Workshop: Multifactorial Optimization
• The HT-TDS offers a one-week workshop to learn:
1. How to optimize siRNA transfection and antibody staining in 96-well format
2. How to use the Perkin Elmer OPERETTA (widefield microscope)
3. How to perform image analysis with the Open Source software CellProfiler
4. How to perform multiparametric analysis with KNIME
Please contact Marc Bickle: [email protected]

Summary
• The HT-TDS is a screening facility specialized in automated imaging (HCS), open to any user
• We have created a set of nodes and templates for analyzing screening data in KNIME
• KNIME can be used for many other common tasks
• The HT-TDS offers a one-week workshop for learning automated microscopy, image processing and multivariate analysis using KNIME tools

Acknowledgements
MPI-CBG
Holger Brandl
Antje Niederlein
Martin Stoeter
Felix Meyenhofer
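
Appendix: sketch of the QC and normalization measures
The QC and normalization measures named under HCS Tools (Z’ factor, SSMD, CV, Z score, Percent of Control, Normalized Percent Inhibition) can be illustrated with a few lines of plain Python. This is a minimal sketch using the common textbook definitions, not the code of the KNIME nodes themselves. All function names and the example values are hypothetical. The control convention for Normalized Percent Inhibition (uninhibited control = 0 %, inhibited control = 100 %) is an assumption, since conventions differ between labs. The B score is left out because it needs a median polish over the plate layout.

```python
from math import sqrt
from statistics import mean, stdev  # stdev is the sample standard deviation


def z_prime(pos, neg):
    """Z' factor: separation of positive and negative control wells."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def ssmd(pos, neg):
    """Strictly standardized mean difference between two control groups."""
    return (mean(pos) - mean(neg)) / sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)


def cv_percent(values):
    """Coefficient of variation in percent."""
    return 100.0 * stdev(values) / mean(values)


def z_scores(values):
    """Z score of each well relative to all wells passed in."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def percent_of_control(values, control):
    """Each well as a percentage of the mean of the control wells."""
    ref = mean(control)
    return [100.0 * v / ref for v in values]


def normalized_percent_inhibition(values, uninhibited, inhibited):
    """NPI, assuming uninhibited controls mark 0 % and inhibited controls 100 %."""
    high, low = mean(uninhibited), mean(inhibited)
    return [100.0 * (high - v) / (high - low) for v in values]


if __name__ == "__main__":
    # Invented readouts for one plate, for illustration only.
    neg_ctrl = [100, 102, 98]   # uninhibited wells
    pos_ctrl = [10, 12, 8]      # fully inhibited wells
    samples = [55, 97, 20]
    print("Z' factor:", z_prime(pos_ctrl, neg_ctrl))
    print("SSMD:", ssmd(pos_ctrl, neg_ctrl))
    print("CV of negative controls (%):", cv_percent(neg_ctrl))
    print("NPI:", normalized_percent_inhibition(samples, neg_ctrl, pos_ctrl))
```

In practice these measures are computed per plate, so that plate-to-plate variation is removed before wells from different plates are compared.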