® Cheat Sheet: Building a KNIME Workflow for Beginners

Getting started with KNIME Analytics Platform EXPLORE ANALYZE

100.00101.27 ● Read through the installation guide at Scatter Plot: Represents Sunburst Chart: Displays Stacked Area Chart: Plots Decision Tree Decision Tree: The Learner node trains a C4.5 input data rows as points categorical columns through multiple numerical data 90.00 knime.com/installation 80.00 decision tree. The configuration window

in a two dimensional plot. a hierarchy of rings. Each columns on top of each other 70.00 includes options for pruning, early stopping, ● Check out the 7 things you should do after installing Input dimensions ring is sliced according to using the previous line as the 60.00 information measures, splitting values, and KNIME Analytics Platform at (columns) on the x-y axis the nominal values in the base reference. The areas in 50.00 more. Both the Learner and the Predictor knime.com/blog/seven-things plot and graphical corresponding column and between lines are colored for 40.00 node provide an interactive view where the properties can be changed to the selected hierarchy. easier comparison. This chart 30.00 decision tree is displayed together with the 20.00

● Take the E-Learning Course at in the configuration This is a powerful chart for is commonly used to visualize 10.00 input data propagation. knime.com/knime-introductory-course window or interactively in multivariate analysis. trending topics. 0.00 k-Means k-Means: Implements the k-Means clustering the node view. -10.00 -14.72 1.00 5.00 10.00 15.00 20.00 25.00 31.00 algorithm. Number of clusters must be set Browse workflows, nodes, and components at interactive) are (All visualizations ● prior to node execution. This node builds the hub.knime.com Line Plot Color Manager Pie Chart Line Plot: Plots numerical values in data columns Color Manager: Assigns a color property to Pie Chart: Visualizes one aggregated metric for clusters. The Cluster Assigner node finds the (y-axis) against values in a reference column each input row based on the row’s value in a different data partitions with colored slices on a Understanding the traffic light system: closest cluster and assigns it to the input (x-axis). Data points are connected via colored lines. selected column. This color property affects circle where the areas are proportional to the metric data row. Being an unsupervised algorithm, Not configured: Node is not yet configured and If the reference column on the x-axis contains the graphical representation in the upcoming values. The partitions are defined by a categorical this node pair doesn’t follow the classic cannot be executed with its current settings sorted time values, the line plot graphically views. column. Learner - Predictor scheme. represents the evolution of a time series. Configured: Node has been correctly configured Logistic Regression Data Explorer Box Plot Bar Chart Logistic Regression: The Learner node trains a and may be executed at any time Data Explorer: Provides an interactive view to Box Plot: Visualizes numeric columns using Bar Chart: Visualizes one or more aggregated metrics summarize the statistics of the input data via the quartile statistics. Watch out for the points for different data partitions with rectangular bars logistic regression model to predict categorical target values. The configuration window Executed: Node has been successfully statistical measures and histograms - for both at the end of the whiskers - they might mark where the heights are proportional to the metric includes options for solver, input feature executed and results can be viewed and used in numerical and nominal columns. outliers! values. The partitions are defined by a categorical choice, regularization functions to avoid downstream nodes column. overfitting, & more. Scorer Scorer: Calculates a number of performance READ measures such as accuracy, F1-score, or Cohen’s Kappa, to quantify the quality of a File Reader File Reader: Reads all text files, Table Reader Table Reader: Reads data from a .table file. Explore classifier. particularly character separated files, .table files are organized using a KNIME Learner Learner Nodes: Supervised algorithms in such as CSV files. The File Reader is the proprietary format, including the full file KNIME Analytics Platform have a Learner workhorse for reading text data. structure and are optimized for space and Predictor node to train a model on a previously labelled Numeric Scorer Numeric Scorer: Calculates a number of speed - providing maximum performance with training set. numerical error measures, such as root mean minimum configuration! squared error, mean absolute error, or ^2, to Excel Reader (XLS) Excel Reader (XLS): Reads content from Predictor Nodes: Used for applying models. quantify the quality of a numerical predictor sheets in Excel files (XLS, XLSX). Sheet The two inputs are the trained model and the model. and cells to be read can be defined in the Google Sheets Google Sheets Reader: Reads data from a Reader data to process. The output contains the configuration window. Google Sheet file. Authentication occurs on original data and the model predictions. ROC Curve ROC Curve: Displays the Receiver Operating the Google site. Google credentials are not Characteristic (ROC) curve of a classifier saved within the KNIME workflow. Table Creator Table Creator: Allows users to manually working on a binary class problem. One of create a data table in its configuration Read Transform Analyze Deploy the two classes is arbitrarily chosen as the window as a data sheet. Data cells can be positive class and the ROC curve is built on copied and pasted in the sheet. Perfect knime:// protocol: References a file path relative to the probabilities/scores produced for that for generating small data sets. some key location of the current KNIME installation like class on the input data set. knime://knime.workflow/../<filename> or knime://

TRANSFORM DEPLOY Resources Data to Report Data to Report: Marks the data table to be exported to BIRT GroupBy GroupBy: Groups the rows of a table by the unique Math Formula Math Formula: Implements a number of math Joiner Joiner: Joins rows from two data tables based - a partially open source reporting tool integrated within ● KNIME Forum: Join our global values in selected columns and calculates operations across multiple input columns, from on common values in one or more key columns. KNIME. When switching from KNIME to BIRT, the marked community and engage in conversa- aggregation and statistical measures for the simple sum and average, to logarithms and The most common join types are possible: inner data sets are imported into BIRT. The Image To Report tions at forum.knime.com defined groups. Despite its simple name, it offers exponentials. All Math Formula operators are also join, left outer join, right outer join, and full outer node marks the input images to be exported to BIRT. powerful functionality and has many unsuspected available in the Column Expressions node. join. ● KNIME Books: More tips, ideas, and usages. For example - row deduplication. Excel Writer (XLS) Excel Writer (XLS): Writes the input data table to a sheet lessons from knime.com/knimepress in an Excel file (XLS or XLSX). Pivoting Pivoting: Extends the aggregation functionality of String to Date&Time String to Date&Time: Converts values in a String Sorter Sorter: Sorts the table in ascending or ● KNIME Events: Take a course, attend the GroupBy node by creating an output data table column into Date&Time values. The Date&Time descending order based on the values of a a workshop, or join a meetup at with columns and rows for the unique values in format contained in the String values can be chosen column. In addition, it is possible to sort knime.com/events selected input columns. Note: the unique values of manually defined or auto guessed. based on multiple columns. Table Writer Table Writer: Writes the input data table to a file using the the grouping column become rows and the unique ● KNIME Blog: Engaging topics, values of the pivoting column become columns. .table KNIME proprietary format. This format includes the full file structure and is optimized for space and speed. challenges, industry news, and knowledge at knime.com/blog Rule Engine Rule Engine: Applies a set of rules to each row of Cell Splitter Cell Splitter: Splits values in a selected column into Concatenate Concatenate: Merges vertically two data tables, Including the table structure in the file is a great advantage the input data table. All Rule Engine operators are two or more substrings, as defined by a delimiter by piling up cells in columns with the same - especially when exchanging data files among users. also available in the Column Expressions node. match. Delimiter is a set character, such as a name. Cells in uncommon columns are filled with ● KNIME Hub: Browse and share CSV Writer comma, space, or any other character or character missing values. The Concatenate (Optional in) CSV Writer: Writes the input data table to a CSV file. workflows, nodes, and components, sequence. node merges vertically up to four data tables. and/or share your own workflows. Add ratings, or comments to other workflows at hub.knime.com Partitioning Partitioning: Splits data into two subsets Column Filter Column Filter: Filters columns in or out from the Missing Value Missing Value: Defines a strategy to deal with according to a sampling strategy. This node is input data table according to a filtering rule. missing values in the input data table - either Google Sheets Google Sheets Writer: Writes the input data table into a ● More Guides: Still using SAS or generally used to produce a training and a test set Columns to be retained can be manually picked or globally on all columns, or individually for each Writer Google Sheet file. Authentication occurs on the Google Excel? Transition to KNIME Analytics to train and evaluate a machine learning model. selected according to their type, or of a regex single column. site. Google credentials are not saved within the KNIME expression matching their name. Platform with these handy guides at workflow. knime.com/knimepress Row Filter Row Filter: Filters rows in or out from the input data Column Rename Column Rename: Assigns new names and types to String Manipulation String Manipulation: Performs operations on Send to Tableau Connectors to Tableau: Export input data table into a ● KNIME Server: For team-based table according to a filtering rule. The filtering rule selected columns, as configured in the dialog. String values in columns, such as combining two Server can match a value in a selected column or numbers or more Strings together, extracting one or more Tableau file or server for reporting. collaboration, automation, in a numerical range. substrings, trimming blank spaces, and so on. All management, and deployment check operators are also available in the Column out KNIME Server at knime.com/server Expressions node. ® KNIME Press

Extend your KNIME knowledge with our collection of books from KNIME Press. For beginner and advanced users, through to those interested in specialty topics such as topic detection, data blending, and classic solutions to common use cases using KNIME Analytics Platform - there’s something for everyone. Available for download at www.knime.com/knimepress.

$ $ $

$

© 2019 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany. KNIME AG Technoparkstrasse 1, 8005 Zurich, Switzerland / [email protected] / www.knime.com