Handling Concept Drift: Importance, Challenges & Solutions

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Handling Concept Drift: Importance, Challenges & Solutions A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė PAKDD-2011 Tutorial May 27, Shenzhen, China http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/ What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Motivation for the Tutorial • in the real world data – more and more data is organized as data streams – data comes from complex environment, and it is evolving over time – concept drift = underlying distribution of data is changing • attention to handling concept drift is increasing, since predictions become less accurate over time • to prevent that, learning models need to adapt to changes quickly and accurately PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 2 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Objectives of the Tutorial • many approaches to handle drifts in a wide range of applications have been proposed • the tutorial aims to provide an integrated view of the adaptive learning methods and application needs PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 3 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Outline Block 1: PROBLEM Block 2: TECHNIQUES Block 3: EVALUATION Block 4: APPLICATIONS OUTLOOK PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 4 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & Schedule Indrė Žliobaitė Block 1 : what concept drift is, why it needs 14:05 – 14:50 to be handled and how João Gama 14:50 – 15:40 Block 2: approaches for adaptive machine learning to handle concept drift PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 5 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & Schedule Albert Bifet Block 3: training/evaluation 16:00 – 16:45 of adaptive learning approaches and a software demo Mykola Pechenizkiy Block 4: challenges for handling concept drift 16:45 – 17:30 in different applications PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 6 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 7 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block 1: introduction PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 8 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The goals: • to introduce the problem of concept drift in supervised learning • to overview and categorize the main principles how concept drift can be (and is) handled PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 9 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Example: production PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 10 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook What CD is About :: Types of Driftsmany & Approaches process :: Techniques in Detail :: Evaluation :: MOA Demomany :: Types ofprocess Applications :: Outlook Example:parameters productionparameters prediction predict quality reduced and transformed t1, t2 q parameters 3 79 3S 78 monitor 2 2S 77 1 76 75 Target t[2] 0 74 Konz. (INA&INAL) Konz. -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 11 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook What CD is About :: Types of Driftsmany & Approaches process :: Techniques in Detail :: Evaluation :: MOA Demomany :: Types ofprocess Applications :: Outlook Example:parameters productionparameters prediction predict quality reduced and transformed t1, t2 q parameters 3 79 3S 78 monitor 2 2S 77 1 76 75 Target t[2] 0 74 Konz. (INA&INAL) Konz. -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 after 2-3 months predictions get “out of tune” PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 12 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Static model Model does not change Process changes source: Evonik Industries PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 13 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook the supplier of raw a sensor breaks/ new operating material changes “wears off”/ crew is replaced Model does not change Process changes source: Evonik Industries new regulations to new production save electricity procedures PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 14 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Desired Properties of a System To Handle Concept Drift • Adapt to concept drift asap • Distinguish noise from changes – robust to noise, but adaptive to changes • Recognizing and reacting to reoccurring contexts • Adapting with limited resources – time and memory PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 15 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Setting: adaptive learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 16 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Supervised Learning Training: Learning Testing X' a mapping function population data y = L (x) 2. Application: Applying L to unseen data X Historical data Learner 1. training y' = L (X') y labels 2. application testing labels ? y' PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 17 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Learning with concept drift population Training: = ?? Testing X' data y = L (x) population Application: y' = L?? (X') X Historical data Learner y labels testing labels ? y' PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 18 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Online setting ? time Data arrives in a stream PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 19 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive Learning ? time updated model as time passes the model updates/retrains prediction itself if needed PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 20 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook How to Adapt Select the right training data and or adjust retrain the model incrementally

Handling Concept Drift: Importance, Challenges & Solutions

Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Adaptation Strategies for Automated Machine Learning on Evolving Data

DATA STREAM MINING a Practical Approach

A Review of Meta-Level Learning in the Context of Multi-Component

SAMOA: Scalable Advanced Massive Online Analysis

Iovfdt for Concept-Drift Problem in Big Data Rajeswari P

Mining Data Streams with Concept Drift

Automated Machine Learning Techniques for Data Streams Alexandru-Ionut Imbrea University of Twente P.O

Mitigating Concept Drift in Data Mining Applications for Intrusion Detection Systems

Anomaly Detection for Data Streams Based on Isolation Forest Using

Extremely Fast Decision Tree Mining for Evolving Data Streams

Scikit-Multiflow: a Multi-Output Streaming Framework