Using KNIME in Metabolomics

Using KNIME for metabolomics data analysis
Visual Programming for Metabolomics
Stephan Beisken (PhD), Reza Salek (PhD)
Cheminformatics and metabolism
The European Bioinformatics Institute (EMBL-EBI)
Email: [email protected]

Data standards efforts
• http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648992/
• Supported by PhenoMeNal

Data analysis and Audit trails
• Capturing the different steps of data processing
• Relationship and reproducibility
[Figure: DIMS data processing workflow. Samples (labelled QC, C and S) are measured as technical triplicates (e.g. C5, C5’, C5’’). Stages and intermediate data named in the figure: DIMS Data Collection; Instrument .RAW files (IRF); Averaged Transients (AT); Apodisation, Zero-filling and FFT; TIC Filtering; Frequency Spectra (FS); Mass Calibration and SIM-stitching (with a Calibrant List); Stitched Peak Lists (SPL); Replicate Filtering; Replicate Filtered Peak Lists (RFPL); Sample Filtering and Blank Filtering; Sample Filtered Peak Matrix (SFPM); Missing-value Filtering; PQN Normalisation; Batch Correction; Spectral Cleaning; Impute Missing Values using KNN; Glog Transformation. The final matrices are labelled SFPM PQN + KNN + GLOG, SFPM PQN + BATCH + KNN + GLOG and SFPM PQN + BATCH + CLEAN + KNN + GLOG.]

Visual Programming
• “Visual programming languages enable physicians and other computer users with little knowledge of programming to develop computer software. The physician uses a visual paradigm to "draw" the computer interface and then attaches short segments of computer code to buttons, menus, and list boxes.” Ebell, M. H. (1993). Visual programming languages. M.D. Computing: Computers in Medical Practice, 10(5), 305–11.
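Two of the processing steps named in the DIMS workflow above, PQN normalisation and glog transformation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the slides: the function names are made up for this example, the reference spectrum is assumed to be the feature-wise median over all samples, and the glog form shown (with a hypothetical parameter lam) is one common variant among several.

```python
import numpy as np


def pqn_normalise(matrix):
    """Probabilistic quotient normalisation of a samples x features matrix.

    Assumption: the reference spectrum is the feature-wise median over all
    samples; a QC-only or control-only median is another common choice.
    """
    matrix = np.asarray(matrix, dtype=float)
    reference = np.nanmedian(matrix, axis=0)
    # Only features with a positive reference intensity yield usable quotients.
    usable = reference > 0
    quotients = matrix[:, usable] / reference[usable]
    # One dilution factor per sample: the median of its quotients.
    factors = np.nanmedian(quotients, axis=1)
    return matrix / factors[:, np.newaxis], factors


def glog(matrix, lam=1.0):
    """Generalised log transform, here log(x + sqrt(x^2 + lam)).

    Assumption: lam is a fixed illustrative value; in practice it is
    usually estimated from the data (for example from QC samples).
    """
    matrix = np.asarray(matrix, dtype=float)
    return np.log(matrix + np.sqrt(matrix ** 2 + lam))


if __name__ == "__main__":
    # Toy peak matrix: three samples that differ only by dilution.
    base = np.array([10.0, 20.0, 30.0, 40.0])
    peaks = np.vstack([base, 2.0 * base, 0.5 * base])
    normalised, factors = pqn_normalise(peaks)
    print("dilution factors:", factors)
    print("glog-transformed matrix:")
    print(glog(normalised))
```

In a KNIME workflow, a step like this would typically sit inside a scripting node, with the peak matrix arriving on the input port and the transformed matrix leaving on the output port.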
Motivation
• Simplify your (working) life
• Data processing and analysis requires various different tools to work together in sequence
  • Data input and output
    • Spreadsheets
  • Data transformation
    • Transposition, aggregation, string manipulation
  • ISAcreator
    • Formatting of tables
  • Submission to MetaboLights

Disclaimer
• Workflows are great
• It does not have to be KNIME; there are many other solutions
• Every method that captures information in a consistent manner and enables reproducibility is great
• Transparency
  • Ability to share data and ‘everything’ that was done to the data

Introduction
• KNIME: Konstanz Information Miner
  • http://www.knime.org/
• Developed at the University of Konstanz in Germany
• Desktop version available free of charge (open source)
• Modular platform for building and executing workflows using predefined components: nodes
• Core functionality available for tasks such as data mining, analysis, and manipulation
• Extra features and functionality available in KNIME through extensions from various groups (community) and vendors
• Written in Java, based on the Eclipse SDK platform

Workflow Concepts
• Workflow execution
  • Can execute complex, multi-step operations on input data
  • Can be run by “non-experts” using predefined parameter templates ensuring optimal results
  • Can be set up for specific measurement systems
  • Can be shared across researchers

Functionality
• Data manipulation and analysis
  • File & database I/O, sorting, filtering, grouping, joining, pivoting
• Data mining and machine learning
  • R, WEKA, KNIME, interactive plotting
• Cheminformatics
  • Conversions, similarity, clustering, (Q)SAR analysis, etc.
• Scripting integration
  • R, Perl, Python, Matlab, Octave, Groovy
• Reporting and much more
  • Bioinformatics, HTS & image analysis, network & text mining
  • Marketing, big data and business analytics

Modules (Community Extensions)
• http://tech.knime.org/community
• Chemoinformatics
  • CDK (EMBL-EBI), RDKit (Novartis), Indigo (GGA), ErlWood (Eli Lilly), Enalos (NovaMechanics)
  • ChEMBL and ChEBI (EMBL-EBI)
• Bioinformatics
  • OpenMS (Tübingen, ETH Zurich)
  • MassCascade (EMBL-EBI)
  • HCS (MPI), NGS (Konstanz), Image analysis
• Integration
  • Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis), REST and SOAP web service support

Workflow Platforms
[Slide images; surviving labels, in order: Applications of metabolomics EMBO course; Applications metabolite reporting frequency; Applications MVDA Calibration Regression; Applications Stat]

Advantages
• Intuitive to use
• No or little programming experience required
• Good for prototyping
• Lots of functionality
• Very modular and flexible
• Active community
• Extensible
• Visual feedback

Disadvantages
• Steep learning curve
• Resource greedy
• No (free) server edition
• Slower execution than standalone scripts

Workbench
[Screenshot: KNIME workbench. Labelled elements: Auto-layout, Execute, Execute all nodes, node description, tabs, workflow projects, favorite nodes, public server, workflow editor, node repository, outline, console]

Nodes
• Node: basic processing unit of a workflow
  • Performs a particular task
• Title
• Icon
• Input port(s): on the left of the icon
• Output port(s): on the right of the icon
• Sequence number
• Status display (‘traffic lights’)
  • Red (not ready)
  • Amber (ready)
  • Green (executed)
  • Blue bar during execution (with percentage or flashing)
• Right-click menu: to configure and execute the node, display the output views, edit the node, and display data for the ports

Dialogs
• Double-click opens configuration dialogs
• Explicit column types

MassCascade
https://bitbucket.org/sbeisken/masscascadeknime/wiki/ExampleWorkflows

XCMS
http://www.bioconductor.org/packages/devel/data/experiment/manuals/faahKO/man/faahKO.pdf

A great example: OpenMS
http://ftp.mi.fu-berlin.de/OpenMS/release-documentation/OpenMS_tutorial.pdf
[Figure: processing overview. Labels: Extracted Mass Chromatograms, Feature Filter, Deconvolution, Sample Alignment, Compound Spectra]

[Slide images; surviving labels, in order: Workflow Parameter Tuning Calibration Regression Applications Data Sources]

Final Remarks
• Workflows can make exploratory or repetitive data tasks easier and save time
  • Extensive data pre-processing functionality
  • Extensions for statistics, machine learning, bio-, and cheminformatics
• Integration of R (XCMS) and spectrometry extensions can help you to build elaborate pipelines and share work
• Can help to organize one’s thoughts.
• It’s actually quite a bit of fun.

Resources
• KNIME Forum
  • http://www.knime.org/
• KNIME Learning Hub
  • http://www.knime.org/learning-hub
• Quickstart Guide
  • http://tech.knime.org/files/KNIME_quickstart.pdf
• Happy to Help
  • [email protected]
