Integrating High-Performance Machine Learning: H2O and KNIME

Mark Landry (H2O), Christian Dietz (KNIME)
© 2017 KNIME AG. All Rights Reserved.

H2O: in-memory machine learning platform designed for speed on distributed systems
[Figure labels: Speed, Accuracy]

High Level Architecture
[Figure: architecture diagram with the following labels]
• Data sources: HDFS, S3, NFS, Local, SQL
• Load Data
• H2O Compute Engine: Distributed In-Memory, Loss-less Compression
• Exploratory & Descriptive Analysis
• Feature Engineering & Selection
• Supervised & Unsupervised Modeling
• Model Evaluation & Selection
• Predict
• Data & Model Storage
• Data Prep Export: Plain Old Java Object
• Model Export: Plain Old Java Object
• Production Scoring Environment
• Your Imagination

Distributed Algorithms

Advantageous Foundation
• Foundation for In-Memory Distributed Algorithm Calculation: Distributed Data Frames and columnar compression
• All algorithms are distributed in H2O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations.
• Only enterprise-grade, open-source distributed algorithms in the market

User Benefits
• “Out-of-box” functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java
• Designed for all sizes of data sets, especially large data
• Highly optimized Java code for model exports
• In-house expertise for all algorithms

[Figure: Foundation for Distributed Algorithms. Panels: Parallel Parse into Distributed Rows; Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM]

Scientific Advisory Council

Machine Learning Benchmarks (https://github.com/szilard/benchm-ml)
[Figure: Gradient Boosting Machine Benchmark (also available for GLM and Random Forest). Panels: Time (s) and AUC, each plotted against X = millions of samples]

H2O Algorithms

Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
Deep Neural Networks
• Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations

Unsupervised Learning
Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables to independent components
• Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
Anomaly Detection
• Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning

H2O in KNIME: Live Demo

H2O in KNIME
• Offer our users high-performance machine learning algorithms from H2O in KNIME
• Allow users to mix & match with other KNIME functionality
  – Data wrangling functionality (KNIME Analytics Platform)
  – KNIME Big-Data Connectors
  – Text Mining, Image Processing, Cheminformatics, …
  – and more!

H2O in KNIME – Cross Validation

H2O in KNIME – Parameter Optimization

H2O in KNIME – Nodes in KNIME 3.4

H2O in KNIME – What’s cooking?

Thank you!
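
The "Distributed Histogram Calculation for GBM" illustration named in the Distributed Algorithms slide can be sketched in a few lines. This is a minimal, single-process sketch of the general map-reduce idea only, not H2O's implementation: the function names, the equal-width binning, and the choice of per-bin statistics (row count and sum of targets) are all assumptions made for illustration. Each chunk of rows stands in for the data held by one node.

```python
from functools import reduce


def split_into_chunks(rows, n_chunks):
    """Partition rows into contiguous chunks (hypothetical stand-ins for nodes)."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def bin_index(x, lo, hi, n_bins):
    """Map a feature value to an equal-width bin index in [0, n_bins - 1]."""
    if x <= lo:
        return 0
    if x >= hi:
        return n_bins - 1
    return min(n_bins - 1, int((x - lo) / (hi - lo) * n_bins))


def local_histogram(chunk, lo, hi, n_bins):
    """Map step: per-bin [row count, sum of targets] for one chunk of rows."""
    hist = [[0, 0.0] for _ in range(n_bins)]
    for x, y in chunk:
        b = bin_index(x, lo, hi, n_bins)
        hist[b][0] += 1
        hist[b][1] += y
    return hist


def merge_histograms(a, b):
    """Reduce step: element-wise sum of two partial histograms."""
    return [[ca + cb, sa + sb] for (ca, sa), (cb, sb) in zip(a, b)]


def distributed_histogram(rows, n_chunks, lo, hi, n_bins):
    """Build one histogram per chunk, then merge the partial results."""
    chunks = split_into_chunks(rows, n_chunks)
    partials = [local_histogram(c, lo, hi, n_bins) for c in chunks]
    empty = [[0, 0.0] for _ in range(n_bins)]
    return reduce(merge_histograms, partials, empty)


# Made-up (feature value, target) pairs, used only to exercise the sketch.
rows = [(0.1, 1.0), (0.4, 0.0), (0.6, 1.0), (0.9, 1.0), (0.35, 0.0), (0.75, 1.0)]
hist = distributed_histogram(rows, n_chunks=3, lo=0.0, hi=1.0, n_bins=2)
print(hist)
```

Because the merge is a plain element-wise sum, the partial histograms can be combined in any order, and only the small per-bin summaries need to be exchanged rather than the rows themselves. That property is what makes this kind of calculation a natural fit for the fine-grained map-reduce iterations the slide mentions.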
