[L1-DS] KNIME Analytics Platform for Data Scientists: Basics

KNIME AG

Copyright © 2020 KNIME AG Overview KNIME Analytics Platform

® Copyright © 2020 KNIME AG 12 What is KNIME Analytics Platform?

• A tool for data analysis, manipulation, visualization, and reporting • Based on the graphical programming paradigm • Provides a diverse array of extensions: – Text Mining – Network Mining – Cheminformatics – Many integrations, such as Java, , Python, Weka, Keras, Plotly, H2O, etc.

® Copyright © 2020 KNIME AG 23 Visual KNIME Workflows

NODES perform tasks on data Not Configured Configured Inputs Outputs Executed Status Error

Nodes are combined to create

WORKFLOWS

® Copyright © 2020 KNIME AG 34 Data Access

• Databases – MySQL, PostgreSQL – any JDBC (Oracle, DB2, MS SQL Server) • Files – CSV, txt – Excel, Word, PDF – SAS, SPSS – XML – PMML – Images, texts, networks, chem • Web, Cloud – REST, Web services – Twitter, Google

® Copyright © 2020 KNIME AG 45 Big Data

• Spark & Databricks • HDFS support • Hive • Impala • In-database processing

® Copyright © 2020 KNIME AG 56 Transformation

• Preprocessing – Row, column, matrix based • Data blending – Join, concatenate, append • Aggregation – Grouping, pivoting, binning • Feature Creation and Selection

® Copyright © 2020 KNIME AG 67 Analysis & Data Mining

• Regression – Linear, logistic • Classification – Decision tree, ensembles, SVM, MLP, Naïve Bayes • Clustering – k-means, DBSCAN, hierarchical • Validation – Cross-validation, scoring, ROC • Deep Learning – Keras, DL4J • External – R, Python, Weka, H2O, Keras

® Copyright © 2020 KNIME AG 78 Visualization

• Interactive Visualizations • JavaScript-based nodes – Scatter Plot, Box Plot, Line Plot – Networks, ROC Curve, Decision Tree – Plotly Integration – Adding more with each release! • Misc – Tag cloud, open street map, molecules • Script-based visualizations – R, Python

® Copyright © 2020 KNIME AG 89 Deployment

• Database • Files – Excel, CSV, txt – XML – PMML – to: local, KNIME Server, SSH-, FTP-Server • BIRT Reporting

® Copyright © 2020 KNIME AG 109 Over 2000 Native and Embedded Nodes Included:

Data Access Transformation Analysis & Mining Visualization Deployment MySQL, Oracle, ... Row Statistics R via BIRT SAS, SPSS, ... Column Data Mining JFreeChart PMML Excel, Flat, ... Matrix JavaScript Machine Learning Plotly XML, JSON Hive, Impala, ... Text, Image Web Analytics Community / 3rd Databases XML, JSON, PMML Time Series Text Mining Excel, Flat, etc. Text, Doc, Image, ... Java Network Analysis Text, Doc, Image Web Crawlers Python Social Media Analysis Industry Specific Industry Specific Community / 3rd R, Weka, Python Community / 3rd Community / 3rd Community / 3rd

® Copyright © 2020 KNIME AG 1011 Overview

• Installing KNIME Analytics Platform • The KNIME Workspace • The KNIME File Extensions • The KNIME Workbench – Workflow editor – Explorer – Node Repository – Description • Installing new features

® Copyright © 2020 KNIME AG 1112 Install KNIME Analytics Platform

• Select the KNIME version for your computer: – Mac – Windows – 32 or 64 bit – Linux • Download archive and extract the file, or download installer package and run it

® Copyright © 2020 KNIME AG 1213 Start KNIME Analytics Platform

• Use the shortcut created by the installer

• Or go to the installation directory and launch KNIME via the .exe

® Copyright © 2020 KNIME AG 1314 The KNIME Workspace

• The workspace is the folder/directory in which workflows (and potentially data files) are stored for the current KNIME session. • Workspaces are portable (just like KNIME)

® Copyright © 2020 KNIME AG 1415 The KNIME Workbench

® Copyright © 2020 KNIME AG 1516 KNIME Explorer

• In LOCAL you can access your own workflow projects. • The Explorer toolbar on the top has a search box and buttons to – select the workflow displayed in the active editor – refresh the view • The KNIME Explorer can contain 4 types of content: – Workflows – Workflow groups – Data files – Shared Components

® Copyright © 2020 KNIME AG 1617 Creating New Workflows, Importing and Exporting

• Right-click in KNIME Explorer to create new workflow or workflow group or to import workflow • Right-click on workflow or workflow group to export

® Copyright © 2020 KNIME AG 1718 Node Repository

• The Node Repository lists all KNIME nodes

• The search box has 2 modes – Standard Search – exact match of node name – Fuzzy Search – finds the most similar node name

• Nodes can be added by drag and drop from the Node Repository to the Workflow Editor.

® Copyright © 2020 KNIME AG 1819 Console and Other Views

• Console view prints out error and warning messages about what is going on under the hood

• Click on View and select Other… to add different views – Node Monitor, Licenses, etc.

® Copyright © 2020 KNIME AG 1920 Description

• The Description window gives information about: – Node Functionality – Input & Output – Node Settings – Ports – References to literature

® Copyright © 2020 KNIME AG 2021 Workflow Description

• When selecting the workflow, the Description window gives information about the workflow’s: – Title – Description – Associated Tags and Links – Creation Date – Author

® Copyright © 2020 KNIME AG 2122 Workflow Coach

• Node recommendation engine – Gives hints about which node use next in the workflow – Based on KNIME communities' usage statistics – Based on own KNIME workflows

® Copyright © 2020 KNIME AG 2223 Tool Bar

The buttons in the toolbar can be used for the active workflow. The most important buttons: – Execute selected and executable nodes (F7) – Execute all executable nodes – Execute selected nodes and open first view – Cancel all selected, running nodes (F9) – Cancel all running nodes

® Copyright © 2020 KNIME AG 2324 KNIME File Extensions

• Dedicated file extensions for Workflows and Workflow groups associated with KNIME Analytics Platform

• *.knwf for KNIME Workflow Files

• *.knar for KNIME Archive Files

® Copyright © 2020 KNIME AG 2425 More on Nodes…

A node can have 3 states:

Not Configured: The node is waiting for configuration or incoming data.

Configured: The node has been configured correctly, and can be executed.

Executed: The node has been successfully executed. Results may be viewed and used in downstream nodes.

® Copyright © 2020 KNIME AG 2526 Inserting and Connecting Nodes

• Insert nodes into workspace by dragging them from Node Repository or by double-clicking in Node Repository • Connect nodes by left-clicking output port of Node A and dragging the cursor to (matching) input port of Node B • Common port types:

Model Flow Variable

Image

DB Data Data DB Connection

® Copyright © 2020 KNIME AG 2627 Node Configuration

• Most nodes require configuration • To access a node configuration window: – Double-click the node – Right-click -> Configure

® Copyright © 2020 KNIME AG 2728 Node Execution

• Right-click node • Select Execute in the context menu • If execution is successful, status shows green light • If execution encounters errors, status shows red light

® Copyright © 2020 KNIME AG 2829 Node Views

• Right-click node • Select Views in context menu • Select output port to inspect execution results

Plot View Data View

® Copyright © 2020 KNIME AG 2930 Curved Connections!

® Copyright © 2020 KNIME AG 3031 Getting Started: KNIME Example Server

• Connect via KNIME Explorer to a public repository with large selection of example workflows for many, many applications • Workflows also available on KNIME Hub

® Copyright © 2020 KNIME AG 3132 Sharing Workflows How to use the KNIME Hub

® Copyright © 2020 KNIME AG 3233 KNIME Hub

A place to share knowledge about Workflows and Nodes https://hub.knime.com

® Copyright © 2020 KNIME AG 34 The KNIME Hub

® Copyright © 2020 KNIME AG 3435 Searching Nodes and Workflows

® Copyright © 2020 KNIME AG 3536 Opening a Workflow from the Hub

® Copyright © 2020 KNIME AG 3637 Open Workflow in KNIME Analytics Platform

® Copyright © 2020 KNIME AG 3738 Saving the Workflow

® Copyright © 2020 KNIME AG 3839 Edit the Workflow

Drag & Drop

® Copyright © 2020 KNIME AG 3940 Sharing the Workflow

1. Save your Edits

2. Connect to KNIME Hub

® Copyright © 2020 KNIME AG 4041 Log in the Hub

KNIME Forum Account Credentials

® Copyright © 2020 KNIME AG 4142 Publish your Workflow

1. Edit Metadata

2. Drag & Drop Private: Visible only to you Public: Visible to the KNIME Community via the KNIME Hub

® Copyright © 2020 KNIME AG 4243 Open your Workflow in the Hub

® Copyright © 2020 KNIME AG 4344 Open your Workflow in the Hub

® Copyright © 2020 KNIME AG 4445 Hot Keys (for Future Reference)

Task Hot key Description Node Configuration F6 opens the configuration window of the selected node F7 executes selected configured nodes Shift + F7 executes all configured nodes Node Execution Shift + F10 executes all configured nodes and opens all views F9 cancels selected running nodes Shift + F9 cancels all running nodes Node Connections Ctrl + L connects selected nodes Ctrl + Shift + L disconnects selected nodes Ctrl + Shift + Arrow moves the selected node in the arrow direction Move Nodes and Ctrl + Shift + PgUp/PgDown moves the selected annotation in the front or in the back of all Annotations overlapping annotations F8 resets selected nodes Ctrl + S saves the workflow Workflow Operations Ctrl + Shift + S saves all open workflows Ctrl + Shift + W closes all open workflows Metanode Shift + F12 opens metanode wizard

® Copyright © 2020 KNIME AG 4546 Stay connected with KNIME

Blog: knime.com/blog Follow us on social media: Forum: forum.knime.com

KNIME Hub: hub.knime.com

KNIME E-Learning Course: www.knime.com/e-learning-course

® Copyright © 2020 KNIME AG 4647 Today’s Example: Churn Prediction

• Build a data science application step by step

• Each section of the course has an associated workflow with exercises

• The exercises complete the steps in the CRISP- DM cycle

® Copyright © 2020 KNIME AG 481 Today’s Example: Churn Prediction

Data Model Model Model Deployment Preparation Training Optimization Evaluation

It always starts with some data … Data Manipulation Model Training Parameter Tuning Performance Measures Files & DBs Data Blending Bag of Models Parameter Optimization Accuracy Dashboards Missing Values Handling Model Selection Regularization ROC Curve REST API Feature Generation Ensemble Models Model Size Cross-Validation SQL Code Export Dimensionality Reduction Own Ensemble Model No. Iterations … Reporting Feature Selection External Models … … Outlier Removal Import Existing Models Normalization Model Factory Partitioning … …

® Copyright © 2020 KNIME AG 49 The Data

• The data files used in the exercises are available in the “data” folder: data files in different file formats, web-based data, data on a database, etc.

• For churn prediction, customer data are blended from different sources • The Data Explorer node is helpful in inspecting data

® Copyright © 2020 KNIME AG 503 Today’s Example: Churn Prediction

Explore the final workflow

® Copyright © 2020 KNIME AG 514 Importing Data Accessing Files and Databases

® Copyright © 2020 KNIME AG 521 Data Source Nodes

Typically characterized by: - color - No input ports, 1-2 output ports

Output port Status

Node label

® Copyright © 2020 KNIME AG 532 New Node: File Reader

Workhorse of the KNIME Source nodes • Reads all text based files (e.g. csv, txt, etc.) • Many advanced features allow it to read most ‘weird’ files – Short lines, inline comments, headers and special encoding

YouTube KNIME TV Channel video: https://youtu.be/flaHQw-Qhlg

® Copyright © 2020 KNIME AG 543 File Reader Configuration

File path

Basic Settings Advanced Settings

Preview

Help Button

® Copyright © 2020 KNIME AG 554 Alternative Faster Way …

Drag & Drop OR Copy & Paste

® Copyright © 2020 KNIME AG 565 Filenames and the knime:// Protocol

Absolute URL

Mountpoint-relative URL

Local path

® Copyright © 2020 KNIME AG 576 Workflow-Relative File Paths

• Best choice if workflows are to be shared • Requires matching folder structure within workflow group - Independent of environment outside of workflow group • Example: Path to „Sentiment Analysis.table“ - Local path: – :\Users\rb\knime-workspace\KNIMEUserTraining\data\Sentiment Analysis.table - Workflow relative:

YouTube KNIME TV Channel: https://youtu.be/U9sP4g4yGwY

® Copyright © 2020 KNIME AG 587 New Node: Excel Reader (XLS)

• Reads .xls and .xlsx file from Microsoft Excel • Supports reading from multiple sheets

® Copyright © 2020 KNIME AG 598 Excel Reader Configuration

File path

File system

Sheet specific settings

Preview

® Copyright © 2020 KNIME AG 609 Filenames and the knime:// Protocol

Absolute URL

Local Path

Custom URL

® Copyright © 2020 KNIME AG 1061 New Node: Table Reader

• Reads tables from the native KNIME Format. • Maximum performance, minimum configuration

File path

YouTube KNIME TV channel video: https://youtu.be/tid1qi2HAOo

® Copyright © 2020 KNIME AG 1162 Database Connectivity

• Read data from any JDBC enabled database • Write your own SQL or model it using dedicated nodes

® Copyright © 2020 KNIME AG 1263 New Nodes: Database Connectors

• Native: Postgres, MySQL, MS SQL Server, SQLite • DB Connector (e.g. DB2, HANA). • Big Data: HIVE and Impala

® Copyright © 2020 KNIME AG 1364 Other Useful Data Sources

• PMML Reader – reads standard predictive models • XML Reader with XPATH support • Python/R Source nodes • Tika Parser – extracts textual data from 200+ file types • REST Web Services, and many more

® Copyright © 2020 KNIME AG 1465 Importing Data Exercise

Start with exercise: Importing Data Read the following files • Sentiment Analysis.table • Sentiment Rating.csv • Product Data2.xls Optional: Read the web_activity table from the database WebActivity.sqlite (hint: drag and drop the files from the KNIME Explorer panel to get started) You can download the training workflows from the KNIME Hub: https://hub.knime.com/knime/space/Education/01%20KNIME%20User%20Training/

® Copyright © 2020 KNIME AG 1566 RESTful Web Services

• Use KNIME nodes to interact with RESTful web services • Send requests using standard HTTP methods

JSON Response:

XML Response:

® Copyright © 2020 KNIME AG 1667 RESTful Web Services

Provide authentication Enter URL, or use if necessary from column

Add delay between individual requests

https://www.knime.com/blog/a-restful-way-to-find-and-retrieve-data https://www.knime.com/blog/OSM-meets-CSV-file-and-Google-API

® Copyright © 2020 KNIME AG 1768 JSON Reader and JSON Path nodes

• Use the JSON Reader (or GET Request) node to get a JSON cell • Use the JSON Path node to query the JSON file and extract parameters • Editor window simplifies construction of JSON queries by auto-generating them (click on properties)

® Copyright © 2020 KNIME AG 1869 Google Sheets

• Access your data stored in Google Services - Read data from Google Sheets - Write data to new sheets - Modify existing sheets • Makes collaboration and sharing of data easy - (especially vs. sending Excel sheets via email...) Authenticate via pop- up window (Oauth2)

® Copyright © 2020 KNIME AG 1970 Google Sheets

• Select from available sheets on Google Drive • Transform data in KNIME, or enrich with new data • Create new sheet or update existing sheets - Allows to read from / write to specific range of sheet (e.g. A1:G10)

Authenticate via pop- up window (Oauth2)

Select from available sheets, Specify target sheet, select open in browser for preview which columns to write, etc.

® Copyright © 2020 KNIME AG 2071 Today’s Example

® Copyright © 2020 KNIME AG 2172 Data Manipulation Clean, Join, Aggregate

® Copyright © 2020 KNIME AG 731 Data Manipulation Nodes

• Yellow color with a variety of input and output ports • Apply a transformation to input data • Many, many nodes!

® Copyright © 2020 KNIME AG 742 New Node: Concatenate

Combine rows from 2 or more tables with shared columns • Handles duplicate row keys gracefully • Take the union or intersection of columns

® Copyright © 2020 KNIME AG 753 Dynamic Ports

Add and remove node ports based on your needs, e.g. in order to concatenate three or more tables

Click on the three dots in the bottom left corner

® Copyright © 2020 KNIME AG 764 New Node: Cell Replacer

Replaces the content of a column based on a lookup • Top port references the table to be searched • Bottom port holds the lookup table (search keys and replacement values)

® Copyright © 2020 KNIME AG 775 New Node: String Manipulation

Create and edit values in String columns • Clean up capitalization (eg. Lowercase) • Replace strings • Modify existing strings or create new columns

® Copyright © 2020 KNIME AG 786 Data Manipulation Exercise, Activity I

Start with exercise: Data Manipulation, Activity I • Concatenate web activity data from the old and new systems • Replace the written sentiment values with the numeric sentiment scores • Make sure that all product names in the product data spreadsheet are written in lower case letters

® Copyright © 2020 KNIME AG 797 Joining Columns of Data

Left Table Right Table

CustomerKey OrderDate OrderID Join by CustomerKey CustomerKey DoB City Gender 22 2019-09-23 #23444 17 1974-02-23 Berlin F 24 2019-09-30 #23457 65 2001-05-25 Stuttgart F 15 2019-10-07 #28985 Inner Join 35 1988-08-05 Cologne M 10 2091-10-13 #29999 15 1983-07-20 Hamburg M 10 1993-01-13 Berlin M

CustomerKey OrderDate OrderID DoB City Gender

15 2019-10-07 #28985 1983-07-20 Hamburg M Left Outer Join 10 2091-10-13 #29999 1993-01-13 Berlin M Right Outer Join

CustomerKey OrderDate OrderID DoB City Gender CustomerKey OrderDate OrderID DoB City Gender

22 2019-09-23 #23444 ? ? ? 17 ? ? 1974-02-23 Berlin F 24 2019-09-30 #23457 ? ? ? 65 ? ? 2001-05-25 Stuttgart F 15 2019-10-07 #28985 1983-07-20 Hamburg M 35 ? ? 1988-08-05 Cologne M

10 2091-10-13 #29999 1993-01-13 Berlin M 15 2019-10-07 #28985 1983-07-20 Hamburg M 10 2091-10-13 #29999 1993-01-13 Berlin M

® Copyright © 2020 KNIME AG 808 Joining Columns of Data

Left Table Right Table

CustomerKey OrderDate OrderID Join by CustomerKey CustomerKey DoB City Gender 22 2019-09-23 #23444 17 1974-02-23 Berlin F 24 2019-09-30 #23457 65 2001-05-25 Stuttgart F 15 2019-10-07 #28985 35 1988-08-05 Cologne M 10 2091-10-13 #29999 15 1983-07-20 Hamburg M 10 1993-01-13 Berlin M

Full Outer Join

CustomerKey OrderDate OrderID DoB City Gender Missing values in the left table 17 ? ? 1974-02-23 Berlin F 65 ? ? 2001-05-25 Stuttgart F 35 ? ? 1988-08-05 Cologne M 15 2019-10-07 #28985 1983-07-20 Hamburg M Missing values in 10 2091-10-13 #29999 1993-01-13 Berlin M the right table 22 2019-09-23 #23444 ? ? ? 24 2019-09-30 #23457 ? ? ?

® Copyright © 2020 KNIME AG 819 New Node: Joiner

Combines columns from 2 different tables • Top port contains “Left” data table • Bottom port contains “Right” data table

® Copyright © 2020 KNIME AG 1082 Joiner Configuration – Linking Rows

Joiner mode

Values to join on. Multiple joining columns are allowed

® Copyright © 2020 KNIME AG 1183 Joiner Configuration – Column Selection

Columns from left table to output table

Columns from right table to output table

® Copyright © 2020 KNIME AG 1284 Data Aggregation

Product ID Category # Ordered Items P 1 Clothing 2 Group Sum(# Ordered Items) P 2 Home 3 Clothing 8 P 3 Clothing 1 Home 3 P 4 Clothing 5 Electronics 12 P 5 Electronics 7 P 6 Electronics 5

Aggregated on Category (group) by Sum (aggregation method)

® Copyright © 2020 KNIME AG 1385 New Node: GroupBy

Aggregate rows to summarize data • First tab provides grouping options

• Second tab provides control over aggregation details Aggregation columns

Aggregation methods

YouTube KNIME TV video: https://youtu.be/bDwF-TOMtWw

® Copyright © 2020 KNIME AG 1486 New Node: Duplicate Row Filter

Detect duplicate row and apply a selected treatment • First tab provides the option to select columns • Second tab provides options for treating duplicated values Flag or Remove Duplicates

Select criteria to keep row

® Copyright © 2020 KNIME AG 1587 New Node: Column Expression

• Append or modify an arbitrary number of columns using expressions • Many different functions are available • No restriction on number of lines per expression allow to write complex expressions • Part of the KNIME Labs extension

® Copyright © 2020 KNIME AG 1688 Workflow Organization and Documentation

® Copyright © 2020 KNIME AG 1789 Comments & Annotations DoubleDouble--clickclick toto writewrite UseUse thethe panelpanel toto changechange propertiesproperties

YouTube KNIME TV Channel: https://youtu.be/AHURYB_O8sA

® Copyright © 2020 KNIME AG 1890 Workflow Organisation – Good Practices

• Workflow annotations • Node labels • Metanodes − Right click -> Create Metanode... − Organize workflow by task − Hide complexity & improve readability

® Copyright © 2020 KNIME AG 1991 Workflow Organisation – Components

• Component encapsulates a reusable functionality as a KNIME workflow • Components can be configured as any KNIME nodes Drag and drop from the KNIME Hub to your workflow • Access and share components on the KNIME Hub

® Copyright © 2020 KNIME AG 2092 KNIME WorkflowDiff

• Automates identification and comparison of nodes in a workflow, metanodes, and two different workflows • Identifies insertions, deletions, substitutions, and parameter changes

® Copyright © 2020 KNIME AG 2193 Data Manipulation Exercise, Activity II

Start with exercise Data Manipulation, Activity II • Join all data into one table using a series of joiner nodes (use "Customer Key" as the joining column) • Filter out duplicate rows • Clean up and document your workflow using annotations, node labels, and metanodes

® Copyright © 2020 KNIME AG 2294 Today’s Example

® Copyright © 2020 KNIME AG 2395 Data Visualization Charts and Tables

® Copyright © 2020 KNIME AG 961 Data Visualization

• Large selection of easy to use visualization nodes − Web-based and interactive − Dedicated nodes, no scripting required • Plotly nodes − Similar but integrated from an external library

• R and Python View nodes for highly customizable graphics − Require scripting

® Copyright © 2020 KNIME AG 972 Visualizations using 1 Column

® Copyright © 2020 KNIME AG 98 Visualizations using 2 Columns

1 Column

® Copyright © 2020 KNIME AG 99 Visualizations using 3 Columns

® Copyright © 2020 KNIME AG 100 New Node: Scatter Plot

Image outport • Plots different columns on X and Y • Displays data including color information Interactivity options • Produces an interactive view and an image • Select data points and publish selection to other views

® Copyright © 2020 KNIME AG 1016 New Node: Scatter Plot

Four configuration tabs

® Copyright © 2020 KNIME AG 1027 New Node: Color Manager

• Color by nominal or continuous values • Sync colors between views using the color model port and Color Appender node

Color range Discrete for numerical colors for values nominal values

® Copyright © 2020 KNIME AG 1038 New Node: Bar Chart

• Show numerical values across categories • Vertical or horizontal bars • Bars can be grouped or stacked

® Copyright © 2020 KNIME AG 1049 New Node: Line Plot

• Plot sequence of values, e.g. over time • Useful to identify trends, also between groups

® Copyright © 2020 KNIME AG 10510 New Node: Stacked Area Chart

• Visualizes numerical values from multiple columns as stacked areas • Great for plotting distributions over time

® Copyright © 2020 KNIME AG 10611 Selection & Filtering in JavaScript Views

Interactivity allows you to select data points in views − Selection is propagated to other views − Highlight selected rows or filter them − Click “Apply” to add column to data that indicates selection (true/false) for use in downstream nodes

Apply selection

® Copyright © 2020 KNIME AG 10712 Components – Combined Views

• Multiple JavaScript View nodes Scatter Plot can be combined in Components • Selections are transmitted to all other views • Also for use on the KNIME WebPortal

Table View

® Copyright © 2020 KNIME AG 10813 Interactivity across Charts: Selection and Filter Events

® Copyright © 2020 KNIME AG 109 Interactivity across Charts: Selection and Filter Events

® Copyright © 2020 KNIME AG 110 Interactivity across Charts: Selection and Filter Events

Subscribing to Selection and Filter

® Copyright © 2020 KNIME AG 111 Interactivity across Charts: Selection and Filter Events

Subscribing to Selection and Filter Publishing Selection

® Copyright © 2020 KNIME AG 112 Configure Content and Views Layout

• Click layout button when inside • Add views and rows via drag&drop Component to assign views to rows • Add columns using + buttons and columns

® Copyright © 2020 KNIME AG 11318 Data Aggregation

Aggregation: Count Category Online Onsite Product Store Category # Ordered ID Items Clothing 2 1 P 1 Online Clothing 2 Home 0 1 P 2 Onsite Home 3 Electronics 2 0 P 3 Onsite Clothing 1 P 4 Online Clothing 5 P 5 Online Electronics 7 Aggregation: Sum (# Ordered Items) P 6 Online Electronics 5 Category Online Onsite Clothing 7 1 Home 0 3 Electronics 12 0 Solution: Pivoting Node

® Copyright © 2020 KNIME AG 19 114 Data Aggregation

Product Store Category # Ordered ID Items Aggregation: Sum (# Ordered Items) P 1 Online Clothing 2 Category Online Onsite P 2 Onsite Home 3 Clothing 7 1 P 3 Onsite Clothing 1 Home 0 3 P 4 Online Clothing 5 Electronics 12 0 P 5 Online Electronics 7 P 6 Online Electronics 5

Pivoting Node: Group - Pivot - Aggregate

® Copyright © 2020 KNIME AG 20 115 New Node: Pivoting

Performs pivoting on selected columns for grouping and pivoting • Values of group columns become unique rows • Values of the pivot columns become unique columns for each set of column combination together with each aggregation • Many aggregation methods are provided (similar to GroupBy)

® Copyright © 2020 KNIME AG 11621 New Node: Pivoting

Groups ~ Rows

Pivots ~ Columns

Aggregation

® Copyright © 2020 KNIME AG 11722 Script-based View Nodes

• R View nodes for greater customizability − Use your favorite libraries, e.g. ggplot2 • If you prefer Python: Python View node • For JS developers: Generic JavaScript View

® Copyright © 2020 KNIME AG 11823 Visualization Exercise Start with exercise: Visualization • Read sales.csv data • Assign a different color to each product • Plot BasketValue against BasketSize using the Scatter Plot node • Show the total BasketValue by time and product in a Line Plot and a Stacked Area Chart (Use the Pivoting node to get the sum of sales by Quarter and Product!)

• Execute the Fully Joined Data metanode • Show the number of customers in the different web activity categories in a Bar Chart • Show the age distribution of the customers in a Histogram • Create a composite view by combining the Bar Chart and Histogram • Select one web activity class in the Bar Chart. Which age classes are represented in the selected web activity class?

® Copyright © 2020 KNIME AG 11925 Today’s Example

® Copyright © 2020 KNIME AG 12026 Data Mining Partition, Learn, Predict, Score

® Copyright © 2020 KNIME AG 1211 Data Mining Strategies

Example Applications: • Anomaly Detection (fraud, predictive maintenance) • Association Rule Learning (market basket analysis) • Clustering (market segmentation) • Classification (next best offer, churn preventions) • Regression (trend estimation)

® Copyright © 2020 KNIME AG 1222 Data Mining: Process Overview

Train Training Model Set Apply Score Model Model Original Data Set Test Set

Train and Evaluate Partition data apply models performance

® Copyright © 2020 KNIME AG 1233 Data Mining in KNIME

• KNIME has many modeling tools! - Decision tree, random forest, SVM, regression, neural networks, clustering, … - and integrations with other libraries: R, Python, H2O, WEKA, libSVM, etc. • And many model evaluation nodes - ROC, standard, numeric and entropy scorers - Feature elimination - Cross validation

® Copyright © 2020 KNIME AG 1244 New Node: Partitioning

• Use to split data into training and evaluation sets - Partition by count (e.g. 10 rows) or fraction (e.g. 10%) - Sample by a variety of methods; random, linear, stratified

® Copyright © 2020 KNIME AG 1255 Learner-Predictor Motif

• Most data mining approaches in KNIME use a Learner-predictor motif. • The Learner node trains the model with

its input data. Trained Model • The Predictor node applies the model to a different subset of data.

New Data!

® Copyright © 2020 KNIME AG 1266 Classification

Predict nominal outcomes on existing data (supervised) • Applications - Churn analysis (yes/no) - Chemical activity (active/inactive) - Spam detection (spam/not spam) - Optical character recognition (A-Z)

• Methods - Decision Trees - Neural Networks - Naïve Bayes -

® Copyright © 2020 KNIME AG 1277 Target Column

• Target column contains values that are predicted by the classification model • Binomial target values are often encoded to 1 and 0

Application Target Target Values Column Churn Churn Yes/No or 1/0 analysis Chemical Active Yes/No or 1/0 activity Spam Spam Yes/No or 1/0 Detection Optical Character A-Z Character Recognition

® Copyright © 2020 KNIME AG 1288 KNIME’s Decision Tree

J.R. Quinlan, “C4.5 Programs for machine learning” J. Shafer, R. Agrawal, M. Mehta, “SPRINT: A Scalable Parallel Classifier for Data Mining”

• C4.5 builds a tree from a set of training data using the concept of information entropy. • At each node of the tree, the attribute of the data with the highest normalized information gain (difference in entropy) is chosen to split the data. • The C4.5 algorithm then recurses on the smaller sub lists.

® Copyright © 2020 KNIME AG 1299 New Node: Decision Tree Learner

® Copyright © 2020 KNIME AG 13010 Decision Tree View

Most of the people who don‘t churn have more than one contract

® Copyright © 2020 KNIME AG 13111 New Node: Decision Tree Predictor

• Takes a decision tree model & applies it to new data • Check the box to append class probabilities

® Copyright © 2020 KNIME AG 13212 New Node: Scorer

Compare predicted results to known truth in order to evaluate model quality

® Copyright © 2020 KNIME AG 13313 New Node: Scorer

Confusion matrix shows the distribution of model errors

An accuracy statistics table provides a detailed analysis of model quality

® Copyright © 2020 KNIME AG 13414 Confusion Matrix

® Copyright © 2020 KNIME AG 13515 Receiver Operating Characteristics

• Sort by confidence in target class • Plot true positive rate vs false positive rate • Ideal models achieve 100% TPR with 0% FPR • Area under the curve indicates model quality - (1=ideal model, 0.5 = random outcome)

® Copyright © 2020 KNIME AG 13616 New Node: ROC Curve

• Requires individual class probabilities from a preceding predictor • User must define: 1. Original class column 2. Positive class value 3. Probability for the selected positive class value for one or multiple models

® Copyright © 2020 KNIME AG 13717 Data Mining Exercise, Activity I

Start with exercise: Data Mining, Activity I: • Partition the fully joined data into a training and test set (50%, Stratified Sampling on Target) • Train a decision tree on the training set to predict Target • Use the trained model to predict Target in the test set • What is the overall accuracy of your model? • Optional: Evaluate the accuracy and robustness of the model with the ROC Curve node

® Copyright © 2020 KNIME AG 13818 Regression

Predict numeric outcomes on existing data (supervised) • Applications - Forecasting - Quantitative Analysis

• Methods - Linear - Polynomial - Regression Trees - Partial Least Squares

® Copyright © 2020 KNIME AG 13919 New Nodes: Linear Regression Learner & Regression Predictor

A linear model relating a dependent variable to 1 or more independent variables - Model coefficients provided in 2nd output port - Also available: Polynomial and Tree Ensemble Regression nodes

® Copyright © 2020 KNIME AG 14020 New Node: Numeric Scorer

Similar to scorer node, but for nodes with numeric predictions (e.g. linear/polynomial regression) • Compare dependent variable values to predicted values to evaluate goodness of fit. • Report R2, MAE, MSE, RMSE etc.

® Copyright © 2020 KNIME AG 14121 Data Mining Exercise, Activity II

Start with exercise: Data Mining, Activity II: • Read weather.table data • Split the data into rows up to 2016 (training set) and rows from 2017 on (test set) • Train a linear regression model that predicts the AIR_TEMP as a function of all other features in the dataset • Use the model to predict the temperature in 2017 and evaluate the model with the Numeric Scorer node • Optional: – Calculate the mean temperature per month in the training data – Join the mean temperature per month to the test set – Use the Numeric Scorer to see if the average monthly temperature provides a better prediction than the Linear Regression model

® Copyright © 2020 KNIME AG 14222 Today’s Example

® Copyright © 2020 KNIME AG 14323 Clustering

Discover hidden structure in unlabeled data (unsupervised)

• Applications - Market Segmentation - Diversity picking • Methods - K-means/medoids - Hierarchical - DBScan - OPTICS - Neighbourgrams

® Copyright © 2020 KNIME AG 14424 New Nodes: k-Means Clustering

• Looks at n observations to define the means for k clusters. • Each observation is then assigned to its closest cluster center. • You must provide k.

® Copyright © 2020 KNIME AG 14525 Data Mining Exercise, Activity III

Start with exercise: Data Mining, Activity III • Read location_data.table data • Filter the data to entries from California (region_code = CA) • Perform k-means clustering with k=3. Use only latitude and longitude for clustering. • Optional: plot latitude and longitude in a view (OSM Map or Scatter Plot) and use the view to visually optimize k

® Copyright © 2020 KNIME AG 14626 Scripting Integrations: R and Python

• Run R or Python code in KNIME Analytics Platform • Works with existing Python and R installations • Syntax highlighting support • Different nodes for many tasks, e.g training a model using an algorithm available in Python

® Copyright © 2020 KNIME AG 14727 Java Snippet

• Fastest running scripting node in KNIME • Syntax highlighting, auto completion, error checking • Templates allow you to save scripts for later re-use • Import custom libraries

® Copyright © 2020 KNIME AG 14828 Exporting Data & Deployment

® Copyright © 2020 KNIME AG 1491 Exporting Data

After an analysis is completed, what next? - Write results to a file - Create/update a database - Save the model for use elsewhere - Generate a rich report - Deploy via KNIME WebPortal - Deploy via workflow as RESTful web service

® Copyright © 2020 KNIME AG 1502 Input/Output in Deployment

Input Output • File (CSV, Table, XLS, …) • Report (BIRT, Tableau, • Database Spotfire, PowerBI) • JSON for REST API • Email • File (CSV, Table, XLS, …) • WebPortal

® Copyright © 2020 KNIME AG 1513 To Report / Email

To BIRT Report

Also available: Nodes for Tableau and Spotfire

® Copyright © 2020 KNIME AG 1524 To File / Database

® Copyright © 2020 KNIME AG 1535 REST API (Available on KNIME Server)

® Copyright © 2020 KNIME AG 1546 To Dashboard on WebPortal

Step 3 Step 1 Step 2 Step 4 Step 5 Customize Upload File Select Columns Interactive View Download Image Column Domains

® Copyright © 2020 KNIME AG 1557 Workflow on KNIME WebPortal

WebPortal Page (Step 1) Upload File

Available in WebPortal Page KNIME Server (Step 4) Interactive View

® Copyright © 2020 KNIME AG 1568 Components to Produce Dashboard on Web Page

File Selection Column Selection

Stacked Area Chart Filter by Row Filter Range

® Copyright © 2020 KNIME AG 1579 Data Export Nodes

Typically characterized by: - Magenta color - 1 input port, no output ports - Create file on file system or write to database

® Copyright © 2020 KNIME AG 15810 New Node: Table Writer

® Copyright © 2020 KNIME AG 15911 New Node: XLS Writer

® Copyright © 2020 KNIME AG 16012 New Node: Database Writer

® Copyright © 2020 KNIME AG 16113 Automation: Call Local Workflow

• Use Call Local Workflow node to send data and parameters to other workflows and trigger execution - Send results back to caller-workflow

- Include report from called workflow Path to workflow • Create modular workflows Add report to output - E.g. separate workflows for ETL and prediction

• Alternative: Call Remote Workflow Click to query the expected input(s) - Trigger execution of workflows on KNIME Server via REST API Specify source column(s) with input data / parameters

® Copyright © 2020 KNIME AG 16214 Automation: Call Local Workflow

Convert output data to KNIME table ETL

Calls workflow once for each input row

Send results back to Prediction caller-workflow

Enter default format for incoming data

® Copyright © 2020 KNIME AG 16315 Use Call Local Workflow to Send Conditional Emails with Report

Sometimes, report should be sent under specific circumstances • E.g. if some KPI is below threshold

Provide email credentials, host, etc. Convert binary column to file and save to temp dir

Workflow creates Define rule, only Path to file report, sends back send email if with report binary column conditions apply

® Copyright © 2020 KNIME AG 16416 KNIME Server as a REST Resource

https://www.knime.org/blog/giving-the-knime-server-a-rest

® Copyright © 2020 KNIME AG 16517 KNIME Server as a REST resource

• Use Swagger, SOAPUI or Chrome extension Postman to explore the HTTP requests and test them

® Copyright © 2020 KNIME AG 16618 Remote File Handling – Cloud Storage

• Integrate remote data sources from Amazon AWS, Microsoft Azure, and Google Cloud - Upload files - Download files, or read their content directly into KNIME - List files in remote directories - Create directories - Delete files / directories

® Copyright © 2020 KNIME AG 16719 Remote File Handling – Cloud Storage

Example: Upload all files from a local directory to Amazon S3

Enter credentials

Upload!

Create Create new directory in bucket bucket Create URIs of local file paths

® Copyright © 2020 KNIME AG 16820 Reporting in KNIME

® Copyright © 2020 KNIME AG 16921 Reporting in KNIME

• Reporting in KNIME is done via a 3rd party application named BIRT (Business Intelligence Reporting Tool) • Data is sent to BIRT from KNIME using special nodes. • Reports in BIRT are constructed from report items, which may include images, tables, charts and labels. • Reports may be generated in a variety of formats (html, pdf, pptx, xlsx, docx, …)

® Copyright © 2020 KNIME AG 17022 Installation

• Can be installed via KNIME -> Install KNIME Extension • Install the KNIME Report Designer

® Copyright © 2020 KNIME AG 17123 New Node: Data to Report

Send a data table to BIRT

Set the node label!

Hint: The node label will be used to identify the data source in the reporting view -> Make sure to use understandable labels if you have more than one data source

® Copyright © 2020 KNIME AG 17224 New Node: Image to Report

Send an image to BIRT - PNG and SVG are supported formats (see node description for details)

Hint: Customize the image size in the Data to Report node to fit the report

® Copyright © 2020 KNIME AG 17325 Edit the Report

Open the workflow and click the Report Editor button in the tool bar

® Copyright © 2020 KNIME AG 17426 Reporting Perspective

Click button to create report

Data from KNIME - names of data Report layout – only sources are taken structure, data is filled in from node label when creating the report View tabs

Add report items via drag and drop

® Copyright © 2020 KNIME AG 17527 Charting in BIRT

• Many chart types • Fine control of plot appearance • Familiar ‘Excel Like’ interface • Supports interactivity

® Copyright © 2020 KNIME AG 17628 Tips & Tricks

• Use an underlying grid to structure the report • Names of columns should not change • Use the grouping function to combine results • Use the Master Layout Tab (For footers etc.)

® Copyright © 2020 KNIME AG 17729 Exporting Data Exercise

Start with exercise: Exporting Data • Write the predictions to a KNIME table • Write the decision tree model to a PMML model file • Create a heatmap of the normalized confusion matrix of your model and send it to a BIRT report • Send your model accuracy to a BIRT report • Create a simple report showing the overall accuracy and the heatmap of the confusion matrix • Generate a PDF of your report

® Copyright © 2020 KNIME AG 17830 Today’s Example

® Copyright © 2020 KNIME AG 17931 Thank You!

[email protected]

The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by Copyright © 2020 KNIME AG KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.