Watson Machine Learning for z/OS
Jamar Smith, Data Scientist, North America z Hybrid Cloud
Goal
Demonstrate the value of enterprise analytics on the IBM Z platform.

Agenda
• Enterprise Analytics Strategy
• Machine Learning Overview
• Value of Analytics in Place
• IBM Cloud Pak for Data

Enterprise Analytics Strategy

Current trends in analytics
• The need for pervasive analytics is increasing in almost every industry
• Real-time or near-real-time analytic results are necessary
• Need to leverage all relevant data sources available for insight
• Ease demands on highly sought-after analytic skill base
• Embrace rapid pace of innovation

Data gravity key to enterprise analytics
• Performance matters for variety of data on and off IBM Z
• Core transactional systems of record are on IBM Z
• Predominance of data originates on IBM Z, z/OS (transactions, member info, …)
• Real-time / near-real-time insights are valuable
• Data volume is large, distilling data provides operational efficiencies
• Data security / data privacy needs to be preserved
Podcast on data gravity: http://www.ibmbigdatahub.com/podcast/making-data-simple-what-data-gravity

IBM Z Analytics
Keep your data in place: a different approach to enterprise analytics
• Keep data in place for analytics
• Keep data in place, encrypted and secure
• Minimize latency, cost and complexity of data movement
• Transform data on platform
• Improve data quality and governance
• Apply the same resiliency to analytics as your operational applications
• Combine insight from structured & unstructured data from z and non-z data sources
• Leverage existing people, processes and infrastructure

Machine Learning Overview

What is Machine Learning?
Computers that learn without being explicitly programmed.
Provide Data -> Perform Analysis -> Actionable Insight
Hint: It’s just a bunch of math.
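To make "learning without being explicitly programmed" concrete, here is a minimal sketch of the provide data, perform analysis, actionable insight flow. It is not taken from the deck: the house sizes and prices are invented for illustration, and `fit_line` and `predict_price` are hypothetical helper names. The "bunch of math" here is ordinary least squares, chosen only because it is the simplest case of fitting a function to data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Provide data: invented house sizes (square feet) and sale prices (thousands)
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 290, 410, 500, 600]

# Perform analysis: the rule is learned from the data, not hand-coded
slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Apply the learned function to a house the model has not seen."""
    return slope * size + intercept


# Actionable insight: an estimate for a new 1800 square foot house
print(f"Estimated price: {predict_price(1800):.1f} thousand")
```

Nobody wrote a pricing rule here. The slope and intercept come entirely from the examples, so changing the data changes the behavior without changing the code.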
Traditional decision process
• Loan Application -> Approve or Reject
• House Data -> Appraise Home Value
• Warranty Resolution -> Predict Causality
• Customer Satisfaction -> Churn

Decision process with ML
Mathematical Function f(x): represents a pattern with a mathematical function
• Loan Application -> f(x) -> Approve or Reject
• House Data -> f(x) -> Appraise Home Value
• Warranty Resolution -> f(x) -> Predict Causality
• Customer Satisfaction -> f(x) -> Churn

What’s involved in Machine Learning
Machine learning prep
• Clearly define business problem
• Select data set to address business problem
• Transform data
Machine learning process
• Build a model using a subset of the data
• Deploy the model to score against new data
Model management
• Monitor model performance over time
• Retrain model if performance has degraded

Why Machine Learning?
• Tap into the rich value of historical data
• Discover insights and generate predictive models to make better decisions
• Don’t just generate reports, use predictive analytics
Predictive analytics in the future means things like:
• Fraud detection
• Optimization of resources
• Many others, all meant to increase revenue or provide savings
The value of machine learning is rooted in its ability to create accurate models to guide future actions and to discover patterns that we’ve never seen before.

Value of Analytics in Place

QMF: Move Towards ML with BI
Start with the Data!
• QMF, the BI tool on IBM Z for the first step in a data-driven enterprise
• Create stories with the data that influence questions about the business

Analytics Stack on IBM Z Driven by Data Gravity
[Diagram: stack layers, top to bottom]
• MACHINE LEARNING SOLUTIONS: IBM Z Operations Analytics (IZOA, formerly IOAz), Db2 AI for z/OS (Db2ZAI), ML-based anti-fraud solutions
• MACHINE LEARNING PLATFORM: Machine Learning for z/OS
• ANALYTICS ENGINE: Open Data Analytics for z/OS (Spark cluster, Anaconda, Optimized Data Access Layer), Db2 Analytics Accelerator (IDAA) (loader, data warehouse engine), Spark clusters (z/OS and distributed platform)
• DATA: Data Virtualization Manager (DVM), Db2 Analytics Accelerator (IDAA), transactions, HTAP, news, merchants, user behavior, Twitter, client data (z/OS and distributed platform)

Open Source at its Core
[Diagram]
• IBM Machine Learning for z/OS: Govern, Manage, Algorithm Assist…; pipeline of Data, Data Prep, Algo, Model, Deploy, Predict, Monitor, Feedback; delivers distilled insight to business ML applications
• IBM Open Data Analytics for z/OS: Python, distributed Apache Spark, Optimized Data Integration Layer; query acceleration and analytic result sets over merchant, transaction and customer data
• Platform: pauseless GC, new SIMD instructions, 32 TB memory, pervasive encryption
Federate analytics leveraging data in place for more current insights at scale, optimized security, privacy and reduced costs

Full Lifecycle Machine Learning Platform
Stages: Ingest -> Explore & Visualize -> Data Preparation -> Train & Evaluate Models -> Deploy -> Go Live: Predict and Monitor
Roles: Data Engineers, Data Scientists, Application Developers, Production Engineers
• Platform agnostic model development
• Leverage open source software
• Real-time insight with transactions
• Insight incorporated from any platform
• Industry leading encryption, security, reliability & availability

Tools for Both Coders and Non-Coders
• Visual productivity tool around data science
• Open-source data science tools (Python, Spark, Jupyter Notebooks)
• Quicker time to value
• Inclusion of full-fledged data preparation and many machine learning algorithms
VISUAL
• Commercial tools (SPSS)
• Line of business/solution focused
• Trained in data mining/analytic methodology
• Background in social sciences, economics, mathematics
PROGRAMMATIC
• Trained using open source or self-taught
• Works within a start-up, technology firm, CIO office or dedicated …
• Background in mathematics, computer science
• Uses programming languages, APIs and avoids packages
Better Together

Utilities to accelerate every stage of Machine Learning
Auto data preparation (ADP)
Automatically analyzes input data and prepares it for training
• Fills missing values
• Encodes/decodes categorical data
• Indexes string data
• Groups all numeric types into vectors
• Normalizes data
Auto feature engineering
Automatically recommends the feature set that can produce the model with the best accuracy
• Joins multiple tables and automatically selects relevant features
• Feature selection based on underlying correlation analysis
Auto modelling
Cognitive assistant for data scientists (CADS)
• Selects the algorithm with the best performance from a set of candidates
Hyperparameter optimization (HPO)
• Selects the hyperparameter with the best performance from a set of candidates given a specific algorithm
CADS and HPO use the performance of models on small data sets to predict performance on large data sets. They use ML to facilitate ML.

Data visualization of SPSS Modeler in ML for z/OS
[Figure: chart themes]

ML for z/OS Fraud detection solution templates
Tree based sampling for skewed data
[Diagram label: sample the records in every leaf node]
• Data for fraud detection are generally skewed, e.g. 1/5000 fraud ratio
  – Leads to biased model
• Random sampling method may lead to information loss and unstable model performance
• Tree based sampling method to populate training data set
• Goal/Results
  – Amplify probability of discovering fraud from the data
  – Minimize false positives and maximize finding truly positive fraud

Db2 Health Tree - using IBM WML for z/OS
• Leverages machine learning and data science
• Ingests SMF data for model training and scoring
• Analyzes, monitors, and visualizes large amounts of operational data
  – Builds a hierarchical health tree to represent the health status of the Db2 sub-systems, transactions and individual KPIs
  – Monitors the changes in health status over time
• Highlights abnormal KPIs in a timeline to assist root cause diagnosis
• Uses ML for z/OS functionalities to provide model life cycle management
• Provides real-time scoring capability by adopting SMF real-time interface

IBM Cloud Pak for Data

The building blocks of data and analytics
IBM Cloud Pak for Data (ICP4D)
1. Services Ecosystem (Services Layer)
   With a click, access and deploy an ecosystem of 45+ analytics services and templates from IBM and third parties.
2. Data Virtualization
   Quickly and easily query across multiple data sources without moving your data.
3. Platform Interface (Platform Interface Layer)
   Speed time-to-value with a single user experience that integrates data management, data governance and analysis for greater efficiency and improved use of resources.
4. Red Hat OpenShift® (Kubernetes Layer)
   Leverage the leading hybrid cloud, enterprise container platform for an innovative and fast deployment strategy.
5. Any Cloud (Infrastructure Layer, including on-premises)
   Avoid lock-in and leverage all cloud infrastructures with our multi-cloud approach.

ICP4D Use Case with WMLz
- Get access to data on and off IBM Z
- Deploy ML models into production at the speed of your business

Summary
• Train anywhere, deploy anywhere
  Leveraging WMLz for in-transaction scoring
• Data gravity
  Limiting data movement via coexistence of WMLz with ICP4D
• Several coexistence scenarios
  Generating benefits of both WMLz and ICP4D
• IBM Db2 Analytics Accelerator
  Access IDAA directly from WMLz and ICP4D
• Data virtualization
  Provision Z data to ICP4D via IBM Data Virtualization Manager for z/OS

Thank you

Appendix

More resources
• Machine Learning and z Systems: https://www.youtube.com/watch?v=T2HtyNX7aHc
• Machine Learning Launch Event interview: https://www.youtube.com/watch?v=WHenFAa6iPw&feature=youtu.be&list=PLenh213llmca-QogcjfSW9RHPtNye9N_p