<<

Cheat Sheet: Wrangling with KNIME Analytics Platform

ACCESS DATA COMBINE DATA FILTER DATA WRITE DATA DATE&TIME Row Filter CSV Reader Reads a CSV file from either your local Amazon S3 Concatenates the Connector Connects to Amazon S3 and points to a Filters rows in or out of the input table Writes the input table(s) to sheet(s) Parses the strings in the selected file system or another connected file rows of all input working directory (with a UNIX-like syntax, e.g., Concatenate according to a filtering rule. The filtering rule Excel Writer in an Excel file (XLS or XLSX). Click String to Date&Time columns according to a date/time system. Click the three dots in the lower tables by writing /mybucket/myfolder/myfile). Allows can match a value in a selected column or the three dots in the lower left format and converts them into left corner to add a dynamic connection them below each downstream reader nodes to access data from numbers in a numerical range. corner to add a dynamic sheet input Date&Time cells. Four Date&Time input port to connect to an external file other. This is Amazon S3 as a file system. port to write multiple data tables forms are supported: only date, only system, like Amazon S3, Azure Blob especially useful for Rule-based Row Filter into multiple sheets. time, date&time, and date&time plus Storage, etc. tables with shared Filters rows in or out according to a set of rules, defined in its configuration window. time zone. Excel Reader column headers. Writes the input data table to a CSV Reads sheet(s) from one or more Common settings of Reader and Writer nodes Rules are evaluated from top to bottom. CSV Writer file. Click the three dots in the lower Extracts rows where the time value in Excel files. One sheet from each Joins the columns Using TRUE as the antecedent applies the Date&Time-based File path: All Reader and Writer nodes require a file path. Row Filter Excel file. A loop can be used to read of the two input Full outer join Inner join left corner to add a dynamic the selected column lies within a The file path can be expressed as an absolute path in rule to all unmatched rows. multiple sheets from one Excel file. tables based on one Reference connection input port to write to an given time window. The time window the local file system, a relative path to a key location in Joiner Left Right Left Right Row Filter external file system, like Amazon is specified either by a start and /or or multiple joining Table Table Table Table Filters rows in or out from the top input Table Reader Reads data from a .table file. The .table the current KNIME installation, or a path defined in an S3, Azure Blob Storage, etc. an end date or by a start date and a columns. Allows table according to matching values in the Send to Tableau external file system if such a connection is used. Server duration. files are organized using a KNIME you to select Right outer join Left outer join selected column of the lower input table. proprietary format, including the full file Uploads the input table to a Tableau Multiple files: Reader nodes can read and concatenate between different Calculates the difference between two structure, and are optimized for space and joiner modes and to Left Right Left Right server for reporting. multiple files, according to a selected file extension or Table Table Table Table Column Filter Filters columns in or out from the input date&time objects e.g., from two speed - providing maximum performance use multiple joining Date&Time file name pattern. table according to a filtering rule. Difference selected columns, from a selected with minimum configuration. columns. SAP Reader Columns to be retained can be manually Send to Power BI column and a fixed value, from a (Theobald Software) Transformation tab: Reader nodes include a Cell Replacer picked or selected according to their type, selected column and the current Transformation tab for renaming, filtering, re-ordering, Replaces the values in one column of the table at Uploads the input table to Microsoft Loads data from various SAP systems or based on a regex expression matching execution time, or from one cell and and type changing of the columns. the top input port with values from a look up table Power BI for reporting. (e.g. SAP S/4HANA, SAP BW, SAP /3). their name. the cell in the previous row for a provided at the bottom input port. selected column. Extract Date&Time Fields DB Reader DB Writer DB Connector DB Row Filter Expands the input SQL query to include the Inserts the data rows from the top Extracts selected time and date fields Connects to any JDBC-compliant Executes the input SQL query on row filter criteria defined in the input port into a table in the database from a selected column of type database. The JDBC driver must be added the database and exports the Expands the input SQL query to include the join DB Joiner configuration window. Grouping of multiple specified by the input connection port. date&time and appends their values in in the KNIME Preferences and then results into a KNIME data table. of two tables. It has a similar configuration conditions with an AND or OR conjunctions If the database table does not exist it new columns. selected in the node configuration window. window as the joiner node. No SQL coding is also supported. No SQL coding required. will be created. H2 Connector required. There are more DB nodes, all expanding DB Connection DB Table Selector Creates a SQL query to access the database DB Query Connects to an H2 database. Similar the input SQL query with additional SQL Table Writer table selected in the configuration window. The Modifies the input SQL query using Writes the resulting rows from the dedicated connector nodes connect to other instructions. Besides the SQL Query node, no DB CLEAN DATA table can be selected either via browsing the custom SQL. The input SQL query is input SQL query into a new table inside databases, such as MySQL or PostgreSQL. nodes require SQL coding. DATABASES represented by the place holder database metadata or via a custom SQL query. the database. Missing Value #table#. Defines and applies a strategy to replace missing values in the input table - either globally on all columns, or individually for each single column.

RESHAPE AND AGGREGATE DATA Duplicate DYNAMIC PORT Row Filter Detects duplicate rows and applies the Concatenate selected operation, e.g. removes Dynamic ports: Additional input ports can be duplicate rows. Duplicates are rows Groups the rows of a table by the added by clicking the three dots in the bottom left that have the same value in all selected unique values in selected Combine Filter Aggregate Write corner of a node. columns. GroupBy columns and calculates Numeric Outliers aggregation and statistical Detects and treats numerical outliers for measures for the defined groups. each of the selected columns Despite its simple name, it offers individually using the interquartile range powerful functionality and has FORMAT EXCEL SHEETS (IQR). many unsuspected usages. The Continental Nodes for KNIME extension allows you to automatically Extends the aggregation format an existing Excel sheet. The key is an additional data table of the functionality of the GroupBy same size as the original Excel sheet, where each cell contains one or node by creating an output table Group Pivot Pivoting more comma separated tag values e.g., header, border, etc. Based on with columns and rows for the these tags, the XLS Formatter nodes add new formatting instructions to unique values in the selected the existing instructions, as available at the lower (optional) input port. E-Books: KNIME Advanced Luck covers input columns. The unique advanced features & more. Practicing Data values of the grouping columns XLS Control Science is a collection of case DATA TYPES & CONVERSIONS Table Generator Transforms the input table to an XLS Control Table, become rows and the unique studies from past projects. Both available at values of the pivoting columns meaning it exchanges the column names to A, B, C, ... and S String: Sequence of characters, e.g. "This is a string" Collection Cell: Collection of multiple values of either .com/knimepress become columns. the row IDs to 1, 2, 3, ... It is the kickoff node to collect I Integer: Whole real valued number, e.g. -100 or 345 the same or different types e.g., can be a list of values formatting instructions for an Excel sheet and feeds all XLS Splits values in the selected column D Double: Real valued number, e.g. -0.432 or 45.39 or a set of values. In a set each value occurs only once. formatter nodes. Maps the categorical values in the Cell Splitter KNIME Blog: Engaging topics, challenges, into two or more substrings, as Date&Time : A data format for date, time, date&time, or XLS Background selected columns to integer values Document/Image: KNIME Analytics Platform Colorizer industry news, & knowledge nuggets at Category to Number and exports the mapping rules to defined by a delimiter match. A date&time plus time zone. supports many more data types like text documents, Adds background color and/or pattern fill formatting knime.com/blog the model output port. The delimiter is a defined character, such B Boolean: Two possible values only, e.g. TRUE and FALSE images, fingerprints, etc. instructions to all cells with a specified tag in the XLS as a comma, space, or any other Category to Number (Apply) and Number to String String to Number Control Table at the top input port. E-Learning Courses: Take our free online Number to Category (Apply) nodes character or character sequence. Converts the data type of the selected Converts the data type of the selected self-paced courses to learn about the apply the mapping rule in both 2 S columns from a number format, e.g. integer S 2 columns from string to either double or XLS Border Ungroup Formatter different steps in a data directions. Ungroups a collection-type cell by or double, to string. integer. Adds border formatting instructions for a given range creating one row for each value in the specified by a tag in the XLS control table at the top science project (with exercises & solutions to Creates one new column for each collection cell. Other columns from input port. test your knowledge) at value in the selected input One to Many the input table are left unaltered. www.knime.com/knime-self-paced-courses column. These values become the column headers. Cells in the CREATE COLUMNS XLS Cell Merger Stacks the cells of the selected value newly created columns are set to Unpivoting Adds formatting instructions to merge all cells with a KNIME Hub: Browse and share workflows, columns into one column. The cells Math Formula String Manipulation 0 if the value is not present, Implements a number of math operations Performs operations on string values in specified tag in the XLS control table at the top input port. nodes, and components. Add ratings, or of the selected retained columns are otherwise 1. This type of encoding across multiple input columns. The math S columns, such as combining two or more comments to other workflows at is called one-hot vector. appended to the corresponding operations can be applied to multiple columns strings together, extracting one or more hub.knime.com Transpose output rows. with the Math Formula (Multi Column) node. substrings, trimming blank spaces, and so on. XLS Conditional Formatter Sorter Converts the rows to columns and Rule Engine String Replacer Adds formatting instructions to color cell backgrounds KNIME Forum: Join our global community & the columns to rows. Sorts the table in ascending or Applies a set of rules to each row of the according to their numeric value for all cells specified by a engage in conversations at Replaces values in a selected string column if descending order based on the input table. Rules are applied from top to tag in the XLS control table at the top they match a defined pattern. forum.knime.com values of one or more columns. bottom. The first rule that matches is used. XLS Formatter Performs several transformations (apply) KNIME Server: For team-based collaboration, Table Manipulator Value Counter at once, such as renaming, filtering, Column Expressions Counts the number of occurrences of Counter Generation Combines the functionality of the Math automation, management, & deployment re-ordering and type changing, on Applies all formatting instructions to an existing Excel sheet. all values in a selected column from Creates a new column with a counter. The Formula, Rule Engine, and String check out KNIME Server at the input columns. By adding the input table. start value and step size are defined in the Manipulation nodes. More than one www.knime.com/knime-server dynamic ports it can replace a configuration window. expression can be defined to modify or add concatenate node. multiple columns at the same time. KNIME Press

Extend your KNIME knowledge with our collection of books from KNIME Press. For beginner and advanced users, through to those interested in specialty topics such as topic detection, , and classic solutions to common use cases using KNIME Analytics Platform - there’s something for everyone. Available for download at www.knime.com/knimepress.

KNIME ® BEGINNER·S LUCK

Decision Tree Learner File Reader Partitioning

Decision Tree training to Predictor Scorer original 80 vs. 20 predict income data set

attach class confusion matrix probabilities + scores

A Guide to KNIME Analytics Platform for Beginners Authors: Satoru Hayasaka and Rosaria Silipo

SECOND EDITION

Data Blending with KNIME

KNIME® Rosaria Silipo & Lada Rudnitckaia

© 2021 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.