Visual Programming for Data Science Doing Complex Things the Right

Visual Programming for Data Science Doing Complex Things the Right

Visual Programming for Data Science Doing Complex Things the Right Way Michael Berthold Spring Data Talks March 24, 2021 1 Data Science Trends Democratizing Data Science No-Code / Low-Code © 2021 KNIME AG. All rights reserved. 2 Data Science Trends Democratizing Data Science No-Code / Low-Code “How to become a Data Scientist in One Weekend” © 2021 KNIME AG. All rights reserved. 3 Common Data Science Requirements Blend & Transform Model & Visualize Deploy & Manage Consume & Interact Techniques: • Joins • Clustering • Model Monitoring • Active Learning • Pivot • Regression • Updating • Interactive Applications • Normalization • Deep Learning • Validation • Services • PCA • Visualizations • … • … • … • … Technologies and Languages: • SQL • R, Python • Go • Plotly, R-Shiny • Data Lakes Data Science• TeamsJava, shouldn’tC really have• toKubernetes worry about all of this…• JavaScript (AWS, Azure, Google…) • JavaScript • Docker • Vue.js • … • … • Grafana • REST • … • … © 2021 KNIME AG. All rights reserved. 4 Visual Programming for Data Science © 2021 KNIME AG. All rights reserved. 5 Technologies & Languages under the Hood © 2021 KNIME AG. All rights reserved. 6 Data Science for Real . Data Science *is* Complex (that’s why it’s called “…Science”) . It requires a broad spectra of skills – often done in collaboration . It requires understanding and access to the inner wheels of an algorithm (but not to the algorithm itself) . No single language or tool has all the answers – it’s about tool blending as well! © 2021 KNIME AG. All rights reserved. 7 Data Science *is* complex: Data Wrangling © 2021 KNIME AG. All rights reserved. 8 Data Science *is* complex: Deep Learning www.knime.com/codeless-deep-learning-book (40% off until March 30th via Amazon) © 2021 KNIME AG. All rights reserved. 9 Data Science *is* complex: Visual Programming Visual Data Science Algorithms: Recursive Feature Elimination: © 2021 KNIME AG. All rights reserved. 10 Data Science *is* complex: More Topics . Data Blending: . Compliance . Feature Selection . Best Practices . Feature Engineering . Required Standards . Anonymization . Bias Detection and Removal . … . Data Privacy / GDPR . Explainable “AI” . Modeling . … . Model Selection . Parameter Optimization . Governance . Ensemble Creation . Integration / Dependencies . Active Learning . Traceable / Lineage . Transfer Learning . Documented . … . Reproducibility . … © 2021 KNIME AG. All rights reserved. 11 Data Science for Real . Data Science *is* Complex (that’s why it’s called “…Science”) . It requires a broad spectra of skills – often done in collaboration . It requires understanding and access to the inner wheels of an algorithm (but not to the algorithm itself) . No single language or tool has all the answers – it’s about tool blending as well! . Visual Programming (if done right) . provides the appropriate level of abstraction to allow focus on modeling a data process . allows access to all relevant nuts and bolts . removes tool interface complexity . enables abstraction / encapsulation to safely reuse . provides transparency . enables governance © 2021 KNIME AG. All rights reserved. 12 The Data Scientist Top-5 What Should a Visual Programming Environment for Data Science provide? 1. Consistent UI (similar look&feel no matter what the underlying technology) 2. One flexible Framework (interface the underlying technologies) 3. Uniform access to all relevant Parameters (translating different tool APIs to data science configurations) 4. Handle all Dependencies and Versions (guarantee (backwards) compatibility) 5. Access to the Underlying Technology when desired (inventing, writing, optimizing new algorithms is usually not part of the job) © 2021 KNIME AG. All rights reserved. 13 The Organizational Top-5 What does my Organization get? 1. Collaboration (No technology breaks inhibiting experts working together) 2. Reusability (Expertise easy to encapsulate & reuse) 3. Governance (The environment handles all SW integration and dependency nightmares) 4. Compliance (The environment enables best practices and permission handling) 5. Documentation and Meta Information (Capture, reproduce, and communicate the flow of data & information) © 2021 KNIME AG. All rights reserved. 14 “Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.” Donald Knuth 1974 © 2021 KNIME AG. All rights reserved. 15.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    15 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us