Scripting without Scripts: A User-Friendly Integration of R, Python, Matlab and Groovy into KNIME
Felix Meyenhofer Technology Development Studio
3. March 2011 4th KNIME Users Group Meeting and Workshop
Friday, March 4, 2011 1 Outline
• Motivation • Architecture and design • Scripting languages • Generic R • Prototyping • Perspectives
2
Friday, March 4, 2011 2 Motivation Outline Architecture and design Scripting languages Generic R Prototyping Perspectives
2
Friday, March 4, 2011 2 Motivation Architecture and design Our situation Scripting languages Generic R Prototyping Perspectives
3
Friday, March 4, 2011 3 Motivation Architecture and design The actors Scripting languages Generic R Prototyping Perspectives
I’m a scientist. I use Excel because I rely on a proper UI, but it I’m a data analyst. does not scale, does not I can script in R, work well for all types of Python, Matlab experiments, etc. . and a little Java . So I often need to go to I hate user see my data analyst interfaces!
4
Friday, March 4, 2011 4 Motivation Architecture and design Are data analysts lazy people? Scripting languages Generic R Prototyping Perspectives
A (typical) research project
• Experiment
• Lots of data
• Fancy data mining scripts
• A little Excel
• Great results
• Paper
5
Friday, March 4, 2011 5 Motivation Architecture and design Are data analysts lazy people? Scripting languages Generic R Prototyping Perspectives
A (typical) research project
• Experiment
• Lots of data
• Fancy data mining scripts
• A little Excel Answer: No, but staff-ratio = 8:1 • Great results
• Paper
5
Friday, March 4, 2011 5 Motivation Architecture and design Can fix the problem? Scripting languages Generic R Prototyping Perspectives
+ It is able to handle big data sets + Has ready to use data interfaces (txt, xls, sql...) + It’s modular architecture allows customization
- Native KNIME visualisation are insufficient (for our purpose) - It lacks specific methods and tool - node developement requires java-geeks and takes time, where the data analysis requirements evolve quickly along with the experiments
In contrast: • Scripting solutions can be quickly developed but are not end-user ready • Scripting allows rapid prototyping of methods and tools
6
Friday, March 4, 2011 6 Motivation Architecture and design Idea Scripting languages Generic R Prototyping Perspectives
What if data analysts could write their scripts as usual, AND users could somehow access these within a well designed framework using a graphical interface?
+ =
7
Friday, March 4, 2011 7 Motivation Architecture and design The basic components Scripting languages Generic R Prototyping Perspectives
Java KNIME
Module Scripting Server + Framework API
8
Friday, March 4, 2011 8 Motivation Architecture and design Script template system Scripting languages Generic R Prototyping Perspectives
XML User Script template interface
Visne et al., RGG: A general GUI Framework for R scripts, 2009, Bioinformatics, doi:10.1186/1471-2105-10-74 97
Friday, March 4, 2011 9 Motivation Architecture and design Scripting node state diagram Scripting languages Generic R Prototyping Perspectives
Empty Script Scripting
Execute Script Create Node on execute Reattach/ Not Unlink Convert to rgg template
Create Template UI Configure Discards all script customizations
Rebuild UI Edit Template
X
Friday, March 4, 2011 10 Motivation Architecture and design Reduction to the necessairy Scripting languages Generic R Prototyping Perspectives
internal internal
10
Friday, March 4, 2011 11 Motivation Architecture and design Template management Scripting languages Generic R Prototyping Perspectives
• Local or remote template repositorie • Plain-text template definition files • Hierarchically organized • Previews for visualization templates
11
Friday, March 4, 2011 12 Motivation Architecture and design The (new) situation Scripting languages Generic R Prototyping Perspectives • Facilitated knowledge propagation within and among research groups • Facilitated knowledge (work) preservation with the template system • Applicable form different levels of expertise
Users Advanced Users Data Analysts Developers • Use Knime and • Customize templates • Write scripts to fill • Provide Knime-nodes templates in-place Knime-gaps for most popular • Be happy! • Use flow variables and • Evolve scripts into new templates loops templates
12
Friday, March 4, 2011 13 Motivation Architecture and design The (new) situation Scripting languages Generic R Prototyping Perspectives • Facilitated knowledge propagation within and among research groups • Facilitated knowledge (work) preservation with the template system • Applicable form different levels of expertise
Users Advanced Users Data Analysts Developers • Use Knime and • Customize templates • Write scripts to fill • Provide Knime-nodes templates in-place Knime-gaps for most popular • Be happy! • Use flow variables and • Evolve scripts into new templates loops templates
12
Friday, March 4, 2011 13 Motivation Architecture and design Scripting nodes vs. conventional nodes Scripting languages Generic R Prototyping Perspectives
• Less integrated than native nodes • ‘Real’ nodes are likely to be (much) faster • ‘Real’ nodes scale better/work with huge data-sets • ‘Real’ nodes can be updated by updating Knime • ‘Real’ node-development requires 10x more resources
13
Friday, March 4, 2011 14 Motivation Architecture and design From templates to nodes Scripting languages Generic R Prototyping Perspectives
+ = Node
• Templates become updateable • Rapid R prototyping • Trivial deployment (1node/min) into end-user ready nodes
14
Friday, March 4, 2011 15 Motivation Architecture and design From templates to nodes Scripting languages Generic R Prototyping Perspectives
+ = Node
• Templates become updateable • Rapid R prototyping • Trivial deployment (1node/min) into end-user ready nodes
14
Friday, March 4, 2011 15 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R
Rserve as backend
15
Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R MATLAB
Rserve as mpicbg-matlab web-client backend (needs a floating license!)
15
Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R MATLAB Python
Rserve as mpicbg-matlab web-client mpicbg-python backend (needs a floating license!) backend
15
Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R MATLAB Python Groovy
Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment
15
Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R MATLAB Python Groovy
Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment
server infrastructure makes it easier for maintenance. (power users can use a local instance)
15
Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives
R MATLAB Python Groovy
Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment
What else?
All that is necessary is (1) a table conversion mechanism, (2) the possibility to invoke a script with the converted table as argument, and (3) a way to convert the results back into a table
15
Friday, March 4, 2011 16 Motivation Architecture and design Feature overview Scripting languages Generic R Prototyping Perspectives
R MATLAB Python Groovy
• STATISTICS • Statistics • BIOINFORMATICS • Java prototyping • Visualization • IMAGE • Statistics • regexp • Handy data-types PROCESSING • (Image Processing) • ... • Visualization • Visualization • Easy Java interface (Matplotlib)
features • Documentation
Exponential growth Many people know it #4 progamming in packages (#29) * language *
rather slow without memory
)-: grid computing interface syntax (just joking) management toolbox
* http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html 16
Friday, March 4, 2011 17 Motivation Architecture and design Generic R Scripting languages Generic R Prototyping Perspectives
• Knime is focused on table transformations • Some problems do not fit into this scheme • Solution: Generic R nodes
17
Friday, March 4, 2011 18 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives
18
Friday, March 4, 2011 19 The R integration trio
19
Friday, March 4, 2011 20 The R integration trio
19
Friday, March 4, 2011 20 The R integration trio
19
Friday, March 4, 2011 20 The R integration trio
19
Friday, March 4, 2011 20 The R integration trio
19
Friday, March 4, 2011 20 The R integration trio
19
Friday, March 4, 2011 20 Motivation Architecture and design The Python integration trio Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design The Python integration trio Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
20
Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives
21
Friday, March 4, 2011 22 Motivation Architecture and design Question to the audience: Scripting languages Generic R Prototyping Perspectives
Is this KNIME or R?
22
Friday, March 4, 2011 23 Motivation Architecture and design Question to the audience: Scripting languages Generic R Prototyping Perspectives
Is this KNIME or R?
95%
22
Friday, March 4, 2011 23 Motivation Architecture and design Our experience so far: Scripting languages Generic R Prototyping Perspectives
I love templates, because they can I love templates, because they be used to quickly provide me the exact tools I make custom need. And my data analyst can solutions create them almost accessible for my instantaneously. scientists.
23
Friday, March 4, 2011 24 Motivation Architecture and design Aggregation Scripting languages Generic R Prototyping Perspectives Additional features of our scripting integrations: • Better editor (undo, redo) • Better attribute name insertion • Preserve column names • UI templates + centralized user template repositories • Worker engine can be run locally or on a centralized server • Dynamic repainting of plots • OpenIn* nodes for rapid prototyping • Faster table conversion (~2x for R) • Flow-variable support in scripts • Generic R nodes to process arbitrary data structures
Additional scripting languages: • Python • Matlab
24
Friday, March 4, 2011 25 Acknowledgements
Holger Brandl (Software developer for Bioinformatics)
Tom Haux (Software developer SWENG)
Martin Stöter (Scientist TDS)
Michael Berthold and the entire KNIME team
25
Friday, March 4, 2011 26