Handling Concept Drift: Importance, Challenges & Solutions
Total Page:16
File Type:pdf, Size:1020Kb
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Handling Concept Drift: Importance, Challenges & Solutions A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė PAKDD-2011 Tutorial May 27, Shenzhen, China http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/ What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Motivation for the Tutorial • in the real world data – more and more data is organized as data streams – data comes from complex environment, and it is evolving over time – concept drift = underlying distribution of data is changing • attention to handling concept drift is increasing, since predictions become less accurate over time • to prevent that, learning models need to adapt to changes quickly and accurately PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 2 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Objectives of the Tutorial • many approaches to handle drifts in a wide range of applications have been proposed • the tutorial aims to provide an integrated view of the adaptive learning methods and application needs PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 3 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Outline Block 1: PROBLEM Block 2: TECHNIQUES Block 3: EVALUATION Block 4: APPLICATIONS OUTLOOK PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 4 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & Schedule Indrė Žliobaitė Block 1 : what concept drift is, why it needs 14:05 – 14:50 to be handled and how João Gama 14:50 – 15:40 Block 2: approaches for adaptive machine learning to handle concept drift PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 5 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & Schedule Albert Bifet Block 3: training/evaluation 16:00 – 16:45 of adaptive learning approaches and a software demo Mykola Pechenizkiy Block 4: challenges for handling concept drift 16:45 – 17:30 in different applications PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 6 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 7 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block 1: introduction PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 8 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The goals: • to introduce the problem of concept drift in supervised learning • to overview and categorize the main principles how concept drift can be (and is) handled PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 9 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Example: production PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 10 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook What CD is About :: Types of Driftsmany & Approaches process :: Techniques in Detail :: Evaluation :: MOA Demomany :: Types ofprocess Applications :: Outlook Example:parameters productionparameters prediction predict quality reduced and transformed t1, t2 q parameters 3 79 3S 78 monitor 2 2S 77 1 76 75 Target t[2] 0 74 Konz. (INA&INAL) Konz. -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 11 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook What CD is About :: Types of Driftsmany & Approaches process :: Techniques in Detail :: Evaluation :: MOA Demomany :: Types ofprocess Applications :: Outlook Example:parameters productionparameters prediction predict quality reduced and transformed t1, t2 q parameters 3 79 3S 78 monitor 2 2S 77 1 76 75 Target t[2] 0 74 Konz. (INA&INAL) Konz. -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 after 2-3 months predictions get “out of tune” PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 12 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Static model Model does not change Process changes source: Evonik Industries PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 13 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook the supplier of raw a sensor breaks/ new operating material changes “wears off”/ crew is replaced Model does not change Process changes source: Evonik Industries new regulations to new production save electricity procedures PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 14 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Desired Properties of a System To Handle Concept Drift • Adapt to concept drift asap • Distinguish noise from changes – robust to noise, but adaptive to changes • Recognizing and reacting to reoccurring contexts • Adapting with limited resources – time and memory PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 15 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Setting: adaptive learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 16 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Supervised Learning Training: Learning Testing X' a mapping function population data y = L (x) 2. Application: Applying L to unseen data X Historical data Learner 1. training y' = L (X') y labels 2. application testing labels ? y' PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 17 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Learning with concept drift population Training: = ?? Testing X' data y = L (x) population Application: y' = L?? (X') X Historical data Learner y labels testing labels ? y' PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 18 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Online setting ? time Data arrives in a stream PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 19 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive Learning ? time updated model as time passes the model updates/retrains prediction itself if needed PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 20 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook How to Adapt Select the right training data and or adjust retrain the model incrementally