Scripting without Scripts: A User-Friendly Integration of , Python, Matlab and Groovy into KNIME

Felix Meyenhofer Technology Development Studio

3. March 2011 4th KNIME Users Group Meeting and Workshop

Friday, March 4, 2011 1 Outline

• Motivation • Architecture and design • Scripting languages • Generic R • Prototyping • Perspectives

2

Friday, March 4, 2011 2 Motivation Outline Architecture and design Scripting languages Generic R Prototyping Perspectives

2

Friday, March 4, 2011 2 Motivation Architecture and design Our situation Scripting languages Generic R Prototyping Perspectives

3

Friday, March 4, 2011 3 Motivation Architecture and design The actors Scripting languages Generic R Prototyping Perspectives

I’m a scientist. I use Excel because I rely on a proper UI, but it I’m a data analyst. does not scale, does not I can script in R, work well for all types of Python, Matlab experiments, etc. . and a little . So I often need to go to I hate user see my data analyst interfaces!

4

Friday, March 4, 2011 4 Motivation Architecture and design Are data analysts lazy people? Scripting languages Generic R Prototyping Perspectives

A (typical) research project

• Experiment

• Lots of data

• Fancy scripts

• A little Excel

• Great results

• Paper

5

Friday, March 4, 2011 5 Motivation Architecture and design Are data analysts lazy people? Scripting languages Generic R Prototyping Perspectives

A (typical) research project

• Experiment

• Lots of data

• Fancy data mining scripts

• A little Excel Answer: No, but staff-ratio = 8:1 • Great results

• Paper

5

Friday, March 4, 2011 5 Motivation Architecture and design Can fix the problem? Scripting languages Generic R Prototyping Perspectives

+ It is able to handle big data sets + Has ready to use data interfaces (txt, xls, sql...) + It’s modular architecture allows customization

- Native KNIME visualisation are insufficient (for our purpose) - It lacks specific methods and tool - node developement requires java-geeks and takes time, where the requirements evolve quickly along with the experiments

In contrast: • Scripting solutions can be quickly developed but are not end-user ready • Scripting allows rapid prototyping of methods and tools

6

Friday, March 4, 2011 6 Motivation Architecture and design Idea Scripting languages Generic R Prototyping Perspectives

What if data analysts could write their scripts as usual, AND users could somehow access these within a well designed framework using a graphical interface?

+ =

7

Friday, March 4, 2011 7 Motivation Architecture and design The basic components Scripting languages Generic R Prototyping Perspectives

Java KNIME

Module Scripting Server + Framework API

8

Friday, March 4, 2011 8 Motivation Architecture and design Script template system Scripting languages Generic R Prototyping Perspectives

XML User Script template interface

Visne et al., RGG: A general GUI Framework for R scripts, 2009, Bioinformatics, doi:10.1186/1471-2105-10-74 97

Friday, March 4, 2011 9 Motivation Architecture and design Scripting node state diagram Scripting languages Generic R Prototyping Perspectives

Empty Script Scripting

Execute Script Create Node on execute Reattach/ Not Unlink Convert to rgg template

Create Template UI Configure Discards all script customizations

Rebuild UI Edit Template

X

Friday, March 4, 2011 10 Motivation Architecture and design Reduction to the necessairy Scripting languages Generic R Prototyping Perspectives

internal internal

10

Friday, March 4, 2011 11 Motivation Architecture and design Template management Scripting languages Generic R Prototyping Perspectives

• Local or remote template repositorie • Plain-text template definition files • Hierarchically organized • Previews for visualization templates

11

Friday, March 4, 2011 12 Motivation Architecture and design The (new) situation Scripting languages Generic R Prototyping Perspectives • Facilitated knowledge propagation within and among research groups • Facilitated knowledge (work) preservation with the template system • Applicable form different levels of expertise

Users Advanced Users Data Analysts Developers • Use Knime and • Customize templates • Write scripts to fill • Provide Knime-nodes templates in-place Knime-gaps for most popular • Be happy! • Use flow variables and • Evolve scripts into new templates loops templates

12

Friday, March 4, 2011 13 Motivation Architecture and design The (new) situation Scripting languages Generic R Prototyping Perspectives • Facilitated knowledge propagation within and among research groups • Facilitated knowledge (work) preservation with the template system • Applicable form different levels of expertise

Users Advanced Users Data Analysts Developers • Use Knime and • Customize templates • Write scripts to fill • Provide Knime-nodes templates in-place Knime-gaps for most popular • Be happy! • Use flow variables and • Evolve scripts into new templates loops templates

12

Friday, March 4, 2011 13 Motivation Architecture and design Scripting nodes vs. conventional nodes Scripting languages Generic R Prototyping Perspectives

• Less integrated than native nodes • ‘Real’ nodes are likely to be (much) faster • ‘Real’ nodes scale better/work with huge data-sets • ‘Real’ nodes can be updated by updating Knime • ‘Real’ node-development requires 10x more resources

13

Friday, March 4, 2011 14 Motivation Architecture and design From templates to nodes Scripting languages Generic R Prototyping Perspectives

+ = Node

• Templates become updateable • Rapid R prototyping • Trivial deployment (1node/min) into end-user ready nodes

14

Friday, March 4, 2011 15 Motivation Architecture and design From templates to nodes Scripting languages Generic R Prototyping Perspectives

+ = Node

• Templates become updateable • Rapid R prototyping • Trivial deployment (1node/min) into end-user ready nodes

14

Friday, March 4, 2011 15 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R

Rserve as backend

15

Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R MATLAB

Rserve as mpicbg- web-client backend (needs a floating license!)

15

Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R MATLAB Python

Rserve as mpicbg-matlab web-client mpicbg-python backend (needs a floating license!) backend

15

Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R MATLAB Python Groovy

Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment

15

Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R MATLAB Python Groovy

Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment

server infrastructure makes it easier for maintenance. (power users can use a local instance)

15

Friday, March 4, 2011 16 Motivation Architecture and design Scripting language support Scripting languages Generic R Prototyping Perspectives

R MATLAB Python Groovy

Rserve as mpicbg-matlab web-client mpicbg-python Using Knime API but in backend (needs a floating license!) backend a dynamically complied environment

What else?

All that is necessary is (1) a table conversion mechanism, (2) the possibility to invoke a script with the converted table as argument, and (3) a way to convert the results back into a table

15

Friday, March 4, 2011 16 Motivation Architecture and design Feature overview Scripting languages Generic R Prototyping Perspectives

R MATLAB Python Groovy

• STATISTICS • Statistics • BIOINFORMATICS • Java prototyping • Visualization • IMAGE • Statistics • regexp • Handy data-types PROCESSING • (Image Processing) • ... • Visualization • Visualization • Easy Java interface (Matplotlib)

features • Documentation

Exponential growth Many people know it #4 progamming in packages (#29) * language *

rather slow without memory

)-: grid computing interface syntax (just joking) management toolbox

* http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html 16

Friday, March 4, 2011 17 Motivation Architecture and design Generic R Scripting languages Generic R Prototyping Perspectives

• Knime is focused on table transformations • Some problems do not fit into this scheme • Solution: Generic R nodes

17

Friday, March 4, 2011 18 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 Motivation Architecture and design The MATLAB integration trio Scripting languages Generic R Prototyping Perspectives

18

Friday, March 4, 2011 19 The R integration trio

19

Friday, March 4, 2011 20 The R integration trio

19

Friday, March 4, 2011 20 The R integration trio

19

Friday, March 4, 2011 20 The R integration trio

19

Friday, March 4, 2011 20 The R integration trio

19

Friday, March 4, 2011 20 The R integration trio

19

Friday, March 4, 2011 20 Motivation Architecture and design The Python integration trio Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design The Python integration trio Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

20

Friday, March 4, 2011 21 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Creating a Python plot template Scripting languages Generic R Prototyping Perspectives

21

Friday, March 4, 2011 22 Motivation Architecture and design Question to the audience: Scripting languages Generic R Prototyping Perspectives

Is this KNIME or R?

22

Friday, March 4, 2011 23 Motivation Architecture and design Question to the audience: Scripting languages Generic R Prototyping Perspectives

Is this KNIME or R?

95%

22

Friday, March 4, 2011 23 Motivation Architecture and design Our experience so far: Scripting languages Generic R Prototyping Perspectives

I love templates, because they can I love templates, because they be used to quickly provide me the exact tools I make custom need. And my data analyst can solutions create them almost accessible for my instantaneously. scientists.

23

Friday, March 4, 2011 24 Motivation Architecture and design Aggregation Scripting languages Generic R Prototyping Perspectives Additional features of our scripting integrations: • Better editor (undo, redo) • Better attribute name insertion • Preserve column names • UI templates + centralized user template repositories • Worker engine can be run locally or on a centralized server • Dynamic repainting of plots • OpenIn* nodes for rapid prototyping • Faster table conversion (~2x for R) • Flow-variable support in scripts • Generic R nodes to process arbitrary data structures

Additional scripting languages: • Python • Matlab

24

Friday, March 4, 2011 25 Acknowledgements

Holger Brandl (Software developer for Bioinformatics)

Tom Haux (Software developer SWENG)

Martin Stöter (Scientist TDS)

Michael Berthold and the entire KNIME team

25

Friday, March 4, 2011 26