TIBCO Data Science, Streaming on AWS
Total Page:16
File Type:pdf, Size:1020Kb
A I M 2 0 1 - S Hot paths to anomaly detection with TIBCO data science, streaming on AWS Steven Hillion Michael O’Connell Sr Director, Data Science Chief Analytics Officer @StevenHillion @MichOConnell © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. The Ideal Data Science Platform Offer Cross-Sell Products Streaming Predict Impending Equipment Failure Data Marts Optimize Batch Scoring Pricing Transactional Stores Real-Time Scoring Prevent Fraud Live Applications Point-Click | Code Manage Inventory External Sources Data Access & Prep Edge Devices Modeling, ML, DL © Copyright 2000-2019 TIBCO Software Inc. 3 Agenda • TIBCO Data Science and AWS Marketplace • The TIBCO Connected Intelligence Cloud • Anomaly Detection and Analysis • Demonstration – Spatial Anomaly Analysis • Links and Assets © Copyright 2000-2019 TIBCO Software Inc. 4 TIBCO Data Science Data Access/Prep Modeling Operations Business Apps + Distributed compute + Visual composition + Model lifecycle management + Engineering/IoT FUNCTION + Feature engineering + Multilingual notebook + Batch automation + Customer analytics + Reusable templates + Native ML & OS + Real-time event processing + Risk management + Auto-ML, data prep + REST, applications, embedding + Supply chain Medic; e.g., researcher on epidemic monitoring Engineer; e.g., aerodynamics engineer Marketeer; e.g., customer engagement analyst Business User USER or Data Scientist Data Scientist Analytics Operations Analytics Operations AUTOMATION Citizen Data Scientist Citizen Data Scientist IT / Software Engineer IT / Administration © Copyright 2000-2019 TIBCO Software Inc. TIBCO Data Science on AWS TIBCO DS on AWS Marketplace; Biggest vCPU Grid; Lightest Serverless Footprint © Copyright 2000-2019 TIBCO Software Inc. 6 Data Science in the Cloud: Leidos Healthcare Analytics Leidos Collaborative Advanced Analytics & Data Sharing Platform (CAADS) uses TIBCO Data Science and AWS to deliver analytics services in Healthcare CDC NIH CMS NASA Disease Outbreaks: Disease Outbreaks: Data Governance: Space Exploration: Determining the Run simulations of Analyzing and Analyzing human cause of an HIV disease propagation consolidating data factors that affect the outbreak in the to guide public policy, around emerging ability to transport Midwest specifically around Healthcare policies astronauts on long the Zika virus across 56 regions in fights (e.g., to Mars) the United States © Copyright 2000-2019 TIBCO Software Inc. 7 TIBCO analytics transformation platform Powered by shared data assets ANALYTICS ACTIONS Visual Analytics Data Science Operational Analytics AUGMENT INTELLIGENCE DATA OPERATIONS INFORMATION MANAGEMENT METADATA MDM / RDM Metadata Spark Compute & Store Data Virtualization Metadata Master and Master & (Data Governance, Reference Data UNIFY Reference Data Data Catalog) Management DATA Data Store 1 Data Store n EDW/Lake Transactional Data EVENTS INTEGRATION Streaming / Hybrid API Events Integration Management INTERCONNECT EVERYTHING DATA SOURCES Streaming In-memory Enterprise ... Enterprise Sources Data Grid Application 1 Application n Cloud Native Open Platform AI Foundation © Copyright 2000-2019 TIBCO Software Inc. TIBCO Data Science and AWS EDWs & Data Operational Analytics Analytics Sources Systems Sandboxes Platform Federated Data Amazon SageMaker Science Containers/Code Real Services Streaming Business Events & - StreamBase Time Time (BW) SQL Flogo Spark TIBCO Data Microservices and APIs Amazon EMR Science Amazon SageMaker Model Management Data Marts (DV) Batch Automation Visual UI / Amazon Redshift Notebook Collaboration / Transactional Project Stores Manage Execute TIBCO Live Apps Amazon Simple Storage Service (S3) External Visual Analytics Sources (TIBCO Spotfire) AWS Deployments © Copyright 2000-2019 TIBCO Software Inc. 9 TIBCO Data Science Solutions on AWS Cloud Apps: Visual Analytics, Data Science, Streaming, Case Management Anomaly Detection Risk Management Customer Engagement Starter Set Process Mining IoT Analytics Anomaly Detection Risk Management Customer Engagement Blockchain – Dovetail Partner Management Starter Toolkit Review Status: TIBCO Spotfire Model: TIBCO Data Science Analyze Event Stream: Case Manage: TIBCO Live Apps Identify issues, sweet spots Supervised: Train TIBCO Flogo, Cloud Integration Investigate identified cases Unsupervised: Anomalies Batch and Real-Time Updates Audit trail + recycle © Copyright 2000-2019 TIBCO Software Inc. Anomaly Analysis Solution Overview 1 Collect data from equipment, normalize, model to predict 3 Alert raised and case created – TIBCO Cloud magnitude of anomaly – TIBCO Data Science & AWS Live Apps 2 Model detects anomaly – TIBCO Data Science 4 Case manager investigates and takes action to the equipment – TIBCO Cloud Integration © Copyright 2000-2019 TIBCO Software Inc. Data Challenges in High-Tech Manufacturing Stack 3D Flash Memory Cell Layers Vertically 96 Memory Cell Layers © Copyright 2000-2019 TIBCO Software Inc. Hot Paths to Anomaly Detection 14 Longitudinal Anomaly Analysis © Copyright 2000-2019 TIBCO Software Inc. Cross-Sectional Anomaly Analysis © Copyright 2000-2019 TIBCO Software Inc. Spatial Anomaly Analysis One-Valued Wafer Maps Big Red Spot Plaid Geometric Patterns One Big Blob & One-Valued One Big Blob © Copyright 2000-2019 TIBCO Software Inc. Cross-Sectional Anomaly Detection © Copyright 2000-2019 TIBCO Software Inc. Cross-Sectional Anomaly Detection © Copyright 2000-2019 TIBCO Software Inc. Analysis: Cluster Incidents, View Signatures © Copyright 2000-2019 TIBCO Software Inc. Analytic workflow – methodologies + demos Find and cluster anomalous events [Demo #1] • Transform wafer maps into vectorized coefficients, then cluster on quality • Many measured parameters, e.g., quality tests: storage fidelity, logic circuits • Approach 1: SVD + K-means • Focus on failure mode parameters • Approach 2: Bessel functions + hierarchical clustering • Radial basis functions • Rotationally invariant | Null-value tolerant | Efficient storage • Better than SVD + K-means for multi-parameter analysis Monitor anomalies [Demo #2] • Stream wafer data • Vectorize and cluster Predict when and why anomalies occur [Demo #3] Demo data • Reduce dimensionality of very wide data • 733 dies • Train models to determine sensor importance • 20 x 40 layout • 1,500 measured parameters • Identify responsible process parameters • Color by die test (white = Process variable corrections/models rebasing pass all, colors = fail a test) • Identify new patterns as they emerge (e.g., incident analysis) • Factory monitoring staff can click to characterize the new pattern © Copyright 2000-2019 TIBCO Software Inc. 21 (TIBCO Streaming operator for Python) Capture and Ingest data Read data Preprocess Perform SVD on the send data to into Amazon from Kinesis and transform bin values for wafers livestream Kinesis for into TIBCO the data using Python further Streaming processing Send data for live viewing into Spotfire Data Streams Combine data PMML operator to & results cluster the data using a model trained in TIBCO Data Science Refine clusters, identify wafers of interest Send data to Spark/Hadoop cluster for more in-depth analysis in TIBCO Data Science Retrain models for clustering and root-cause analysis © Copyright 2000-2019 TIBCO Software Inc. 22 Anomaly Detection with Spatial Signature Analysis 23 Anomaly Analysis Find and explore anomalous events Monitor anomalies Predict when and why they occur © Copyright 2000-2019 TIBCO Software Inc. 24 Anomaly Detection and Analysis Find and explore anomalous events Monitor anomalies Predict when and why they occur © Copyright 2000-2019 TIBCO Software Inc. 25 © Copyright 2000-2019 TIBCO Software Inc. 26 © Copyright 2000-2019 TIBCO Software Inc. 27 Anomaly Detection and Analysis Find and explore anomalous events Monitor anomalies Predict when and why they occur © Copyright 2000-2019 TIBCO Software Inc. 28 © Copyright 2000-2019 TIBCO Software Inc. 29 © Copyright 2000-2019 TIBCO Software Inc. 30 © Copyright 2000-2019 TIBCO Software Inc. 31 Anomaly Detection and Analysis Find and explore anomalous events Monitor anomalies Predict when and why they occur © Copyright 2000-2019 TIBCO Software Inc. 32 Digital Twin for Semiconductor Yield Digital Twins for Semiconductor Manufacturing Yield: Wide-and-Big Data Analysis Build Models to Relate Product Yield Failure Modes ( Yi ) with Process Parameters ( Pi ) Product Test: Process Process Measurement Process Start Process Step i+2 Step 1 Step i Step 1 Step n Pass / Fail, Failure Modes ・・・ ・・・ P1 Pi Pi+1 Pi+2 Pn ~ 1K Steps, > 1M Process Parameters Equipment Sensor Data Physical Measurement Data Equipment Names Run continuously to capture new Yield, relationships as they emerge Yield Components or Process Optimization Process Monitoring Clusters Process Control Digital Twin Yi = f(P P P … P ) Y = f(Y1 Y2, Y3, … Yn ) Yield Models 1 , 2, 3, n , © Copyright 2000-2019 TIBCO Software Inc. 33 The Extreme Challenge of Big & Wide Data • Not just big data – many rows: lots, wafers, die, units • Also wide data – many columns: > 1M process parameters • Sensor traces • Time series for every sensor on each machine in each run • Physical measurements • Film thickness, critical dimensions, layer-to-layer overlay, defect classes & counts • Equipment and process attributes • Machine and component IDs, process recipe info • Supplies • Chemical batch IDs, QA sample data “Today [semiconductor] fabs collect more than 5 billion sensor data points each day. The challenge is to turn massive amounts of data