Automatic Learning to Detect Concept Drift

Hang Yu1, Tianyu Liu1, Jie Lu1 and Guangquan Zhang1
1Australian Artificial Intelligence Institute, University of Technology Sydney, Australia
{hang.yu, Tianyu.Liu-1}@student.uts.edu.au, {Jie.Lu, guangquan.zhang}@uts.edu.au

Abstract

Many methods have been proposed to detect concept drift, i.e., the change in the distribution of streaming data, because concept drift causes a decrease in the prediction accuracy of learning algorithms. However, most current detection methods are based on assessing the degree of change in the data distribution and cannot identify the type of concept drift. In this paper, we propose Active Drift Detection with Meta learning (Meta-ADD), a novel framework that learns to classify concept drift by tracking the changed pattern of error rates. Specifically, in the training phase, we extract meta-features based on the error rates of various concept drifts, after which a meta-detector is developed via a prototypical neural network that represents the various concept drift classes as corresponding prototypes. In the detection phase, the learned meta-detector is fine-tuned to adapt to the corresponding data stream via stream-based active learning. Hence, Meta-ADD learns to detect concept drifts and identify their types automatically, which can directly support drift understanding. The experiment results verify the effectiveness of Meta-ADD.

[Figure 1: Types of concept drift — sudden drift, gradual drift, incremental drift, and reoccurring context, each illustrated as a change of the data distribution over time.]

1 Introduction

Concept drift is a hot research topic in online learning [He et al., 2019], incremental learning [Losing et al., 2018], and non-stationary learning [Pratama and Wang, 2019], and it extends to machine learning more broadly. In the machine learning area, concept drift is a phenomenon in which the statistical properties of the predicted variable change over time in an arbitrary way [Lu et al., 2017]. This causes predictions to become less accurate as time passes, and drift therefore needs to be detected. Figure 1 shows the four types of concept drift, categorized according to the changed pattern of the data distribution [Gama et al., 2014].

In the last two decades, although many methods of detecting concept drift have been proposed, they all, in general, follow a framework [Lu et al., 2019] that includes four steps: 1) data retrieval aims to retrieve data chunks from data streams; 2) data modeling aims to abstract the retrieved data and extract the key features; 3) test statistics calculation measures dissimilarity, or estimates distance [de Lima Cabral and de Barros, 2018]; and 4) hypothesis testing uses a specific hypothesis test to evaluate the statistical significance of the change observed in step 3, i.e., the p-value.

However, this framework faces two drawbacks. The first is the cold start problem: current drift detection methods need an initial window to collect the basic statistical properties for the hypothesis test. As a result, no detection strategy can be implemented in the initial window, yet concept drift may already occur there. The second is that, to the best of our knowledge, existing methods can only identify whether concept drift has occurred; they cannot identify what type of concept drift has occurred. The reason is that although a change in the data distribution can be represented by test statistics such as the average error rate, test statistics cannot represent the changed pattern of the data distribution, i.e., the relationship between concept drift at adjacent timestamps cannot be captured. In practice, knowing the type of concept drift is useful for understanding it. For example, if a person's weight suddenly decreases, this may indicate health problems that should be addressed instantly; but if a person's weight gradually decreases over a long time, such as a year, the situation may be normal because the person is making an effort to lose weight.

In this paper, we propose a novel framework to learn a meta-detector with the ability to classify concept drift, and this meta-detector can be trained in advance. When a data stream is inputted, the meta-detector can directly classify concept drift without setting an initial window. However, this framework faces two main challenges. On the one hand, although concept drift can be classified into four types, the presentations, i.e., the changed patterns of data distribution, of one type of drift are almost unlimited (possibly over 1,000,000), whereas the presentations of that type included in the training data are very few (possibly 1,000). On the other hand, for each type of concept drift, each data stream may have its own unique presentation, so the meta-detector must be able to adjust itself to fit the detected data stream. To address these two challenges, we propose Active Drift Detection with Meta learning (Meta-ADD). The contributions of this paper are summarized as follows:

• We propose a framework that learns to detect drift by learning the presentation of each type of concept drift. As a result, in our framework, not only can concept drift be detected, the type of drift can also be identified, which helps us understand it.

• We propose to transform the pre-training of a meta-detector into a few-shot learning problem, and thereby propose a meta-learning method [Finn et al., 2017] based on prototypical networks [Snell et al., 2017] to learn the meta-detector. In this meta-detector, each concept drift class is represented as a corresponding single prototype.

• To improve the accuracy of detecting concept drift, we propose a stream-based active learning algorithm (SAL) which is capable of handling various concept drifts by adapting the meta-detector to the underlying distribution in the stream.

The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 provides the details of our proposed Meta-ADD framework. Section 4 discusses the experiment results for several well-known datasets in this research area. Finally, Section 5 concludes the paper and presents future work.

2 Related work

A survey of the literature on drift detection methods reveals that, in general, they can be divided into three categories.

2.1 Error Rate-based Drift Detection

Error rate-based drift detection methods form the largest category of drift detection algorithms. The intuition behind this type of algorithm is to monitor fluctuations in the error rate as time passes. Once the change of the error rate is proven to be statistically significant, a drift alarm is triggered. One of the most-referenced algorithms of this type is the Drift Detection Method (DDM) [Gama et al., 2004], the first algorithm to define warning and drift levels for signalling concept drift. Many subsequent algorithms have adopted a similar implementation, e.g., the Early Drift Detection Method (EDDM) [Baena-García et al., 2006], Hoeffding's inequality-based Drift Detection Method (HDDM) [Frías-Blanco et al., 2015], and KSWIN, which is based on the Kolmogorov-Smirnov (KS) statistical test. In contrast to methods such as DDM, a two-time-window-based drift detection algorithm called Adaptive Windowing (ADWIN) [Bifet and Gavalda, 2007] was proposed, in which the size of the compared windows is adjusted automatically.

2.2 Data Distribution-based Drift Detection

The second largest category of drift detection algorithms is data distribution-based drift detection. In this category of algorithms, the dissimilarity between the distribution of new data and historical data is quantified using a distance metric. Once the dissimilarity is proven to be statistically significant, a drift alarm is triggered. According to the literature, the first formal treatment of change detection in data streams was proposed by [Kifer et al., 2004]. Another typical data distribution-based drift detection algorithm is the Information-Theoretic Approach (ITA) [Dasu et al., 2006]. Similar implementations have been adopted in Competence Model-based drift detection (CM) [Lu et al., 2014], Equal Density Estimation (EDE) [Yu et al., 2020], and Local Drift Degree-based Density Synchronized Drift Adaptation (LDD-DSDA) [Liu et al., 2017].

2.3 Multiple Hypothesis Test Drift Detection

Multiple hypothesis test drift detection algorithms apply techniques similar to those in the previous two categories. The novelty of this type of algorithm is that multiple hypothesis tests are used to detect concept drift in different ways. For example, the Just-In-Time adaptive classifier (JIT) [Alippi and Roveri, 2008] is the first algorithm to set multiple drift detection hypotheses in this way, while Hierarchical Change-Detection Tests (HCDTs) [Alippi et al., 2017] are the first attempt to address concept drift using a hierarchical architecture.

3 Active Drift Detection with Meta-learning

In this section, we elaborate on Active Drift Detection with Meta learning (Meta-ADD). Meta-ADD includes two main phases: a training phase and a detection phase. During the training phase, we extract the meta-features of various concept drifts and then learn a meta-detector in which each concept drift class is represented as a corresponding single prototype [Shao et al., 2018]. In the detection phase, the learned meta-detector is used to detect concept drift and is further fine-tuned via stream-based active learning [Krawczyk and Cano, 2019].

3.1 Extracting Meta-Features

In this subsection, we describe the extraction of meta-features that can be used across different datasets. Although there are four types of concept drift, the reoccurring concept type focuses on tracking whether an old concept occurs again, whereas our work focuses on detecting concept drift immediately rather than deciding whether a concept is an old one. We therefore do not use the reoccurring concept type and instead use a type we name normal to indicate that no concept drift has occurred. It has been shown that a change in the data distribution can be represented by a change of error rates, so intuitively, the change of error rate is critical for deciding which type of concept drift has occurred:

• Sudden drift: the error rate will increase sharply within a short time.

• Gradual drift: the error rate will fluctuate over a period of time.

• Incremental drift: the error rate will increase incrementally over a period of time.

• Normal: the error rate will remain stable.

Based on the intuitions above, we empirically extract the gap between the average error rates of two windows as meta-features. Specifically, for each type of concept drift, we first generate N data streams. A prediction model such as a neural network, decision tree, or SVR is applied directly to a data stream to obtain the error rate e_t at each timestamp t. In addition, we use a window W_i of length n, that is, W_i contains n samples, so the average error rate of W_i is calculated as:

    \hat{e}_i = \frac{1}{n} \sum_{t=1}^{n} e_t    (1)

Furthermore, the gap between the average error rates of two consecutive windows is calculated as:

    Gap_i = \hat{e}_{i+1} - \hat{e}_i    (2)

Assume there are m = l × n samples in each data stream, so l windows can be obtained, and thereby l − 1 values of Gap_i can be obtained. Hence, a training sample in meta-training is represented as G : {X̂ × ŷ} ∈ R^l, where X̂ = {Gap_1, ..., Gap_{l−1}} and ŷ ∈ {1, ..., K} is the corresponding label. Thereby, an original dataset with N data streams, i.e., g : {X × y}^{N×m}, is mapped to G : {X̂ × ŷ} ∈ R^{N×l}, such that G is less dependent on the dataset.

Note that our framework allows flexibility in the choice of features. For example, we may be able to improve performance by using different prediction models or a more intelligent windowing strategy. To keep our contribution focused, we adopt simple models and tumbling windows in all our experiments, which yields reasonable performance in our empirical results. How to better model the transferable information is interesting future work to enhance the Meta-ADD framework.
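To make this extraction concrete, here is a minimal NumPy sketch of Eqs. (1) and (2) under the tumbling-window setup described above; the function name extract_meta_features and the synthetic error stream are our own illustration, not code from the paper.

```python
import numpy as np

def extract_meta_features(errors, n):
    """Map a stream of 0/1 prediction errors to gap meta-features.

    errors: array of m = l * n per-timestamp errors e_t.
    n:      window length (sampling frequency).
    Returns the l - 1 gaps between the average error rates of
    consecutive tumbling windows, per Eqs. (1) and (2).
    """
    errors = np.asarray(errors, dtype=float)
    l = len(errors) // n                     # number of tumbling windows
    windows = errors[:l * n].reshape(l, n)   # one row per window W_i
    e_hat = windows.mean(axis=1)             # Eq. (1): average error rate per window
    return np.diff(e_hat)                    # Eq. (2): Gap_i = e_hat_{i+1} - e_hat_i

# Toy usage: a sudden drift makes the second half of the stream much noisier.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.binomial(1, 0.1, 1000),   # before the drift
                         rng.binomial(1, 0.5, 1000)])  # after the drift
print(extract_meta_features(errors, n=100))  # a clear spike at the drift boundary
```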
3.2 Meta-detector via a Prototypical Neural Network

In this subsection, a prototypical neural network is applied to extract meta-features from various data streams. The concept drift detection issue is treated as a classification task: a neural network is learned to map a data stream into an embedding space, and the prototype of each concept drift category is extracted as the mean vector of the embedded support data streams. Concept drift detection is then performed for an embedded data stream by selecting the nearest concept drift prototype.

Based on the support dataset G, the prototypical neural network consists of two steps, as illustrated in Figure 2. First, an embedding neural network f_θ : R^l → R^M with learnable parameters θ maps the stream data X̂_i into the embedding space as f_θ(X̂_i). Then a single prototype representation c_k for each class k is taken as the mean of its support set S_k in the embedding space:

    c_k = \frac{1}{|S_k|} \sum_{(X̂_i, ŷ_i) \in S_k} f_θ(X̂_i)

where S_k is the set of stream data with label k. Given a distance function d : R^M × R^M → R, e.g., cosine similarity, classification is then performed for a query point x̂ by finding the nearest prototype c_k over this distance in the embedding space:

    p_θ(ŷ = k | x̂) = \frac{\exp(-d(f_θ(x̂), c_k))}{\sum_{k'} \exp(-d(f_θ(x̂), c_{k'}))}

We then use the negative log-likelihood J(θ) = −log p_θ(ŷ = k | x̂) as the training objective. During the training phase, N_s examples are selected randomly for each concept drift class to form its prototype in the embedding space, and N_q samples are chosen randomly as the query set to measure the estimation accuracy of the learned meta-learner. The pseudocode to train the model is provided in Algorithm 1.

Algorithm 1: Training the meta-detector

Input: training dataset G : {(X̂_1, ŷ_1), ..., (X̂_N, ŷ_N)}, where each label ŷ_i ∈ {1, ..., K} and D_k is the set of stream data with label k.
Parameters: the number of support data streams N_s, the number of query data streams N_q, the number of concept drift classes K.
Output: the training loss J(θ).

1: for k in {1, ..., K} do
2:     S_k ← RANDOMSAMPLE(D_k, N_s)
3:     Q_k ← RANDOMSAMPLE(D_k, N_q)
4:     c_k ← (1 / N_s) Σ_{(X̂_i, ŷ_i) ∈ S_k} f_θ(X̂_i)
5: end for
6: J ← 0
7: for k in {1, ..., K} do
8:     for (x̂, ŷ) in Q_k do
9:         J ← J − (1 / (K N_q)) log [ exp(−d(f_θ(x̂), c_k)) / Σ_{k'} exp(−d(f_θ(x̂), c_{k'})) ]
10:    end for
11: end for

[Figure 2: The prototypical neural network. The left side shows the whole structure, consisting of an embedding phase and a clustering phase; on the right side, a new sample is classified by finding the nearest prototype over a distance in the embedding space.]
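As a concrete illustration of Algorithm 1, the NumPy sketch below computes the episode loss J(θ) for one training episode. The linear embed function, the squared Euclidean distance (the experiments use cosine similarity), and all array shapes are assumptions made for this example, not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(1)
K, Ns, Nq, D_in, M = 4, 5, 15, 99, 16   # classes, support/query sizes, input/embedding dims

W = rng.normal(scale=0.1, size=(D_in, M))  # stand-in for the learnable embedding f_theta

def embed(X):
    # Toy linear embedding f_theta (the paper uses an FCN or RNN with ReLU).
    return X @ W

def episode_loss(support, query):
    """support/query: dict class k -> (Ns, D_in) / (Nq, D_in) arrays of meta-features."""
    protos = [embed(support[k]).mean(axis=0) for k in range(K)]   # c_k, line 4
    J = 0.0
    for k in range(K):                                # lines 7-11 of Algorithm 1
        Z = embed(query[k])                           # embed the query streams
        d = np.stack([((Z - c) ** 2).sum(axis=1) for c in protos], axis=1)
        log_p = -d[:, k] - np.log(np.exp(-d).sum(axis=1))  # log p_theta(y = k | x)
        J -= log_p.sum() / (K * Nq)
    return J

support = {k: rng.normal(size=(Ns, D_in)) for k in range(K)}
query = {k: rng.normal(size=(Nq, D_in)) for k in range(K)}
print(episode_loss(support, query))   # one optimizer step would reduce this value
```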

3.3 Active Drift Detection

Although a meta-detector can be obtained in the training process and directly applied to detect concept drift, the representation of each type of concept drift can vary from one data stream to another, and each stream has its own unique characteristics. Hence, when the meta-detector is applied to a data stream, it must be able to adapt to that stream to achieve better performance.

To update the meta-detector, when we collect a sample S_i = {e_{i−l}, ..., e_i}, where e_i is the error rate obtained at timestamp i, we need to know the true type of concept drift that has occurred on this sample. Unfortunately, the true type of concept drift is unknown. Although the true type can be inputted manually, the labelling cost is too expensive if a true label is requested as soon as each sample S_i is collected. Hence, only the most valuable samples are labelled to update the meta-detector.

In Meta-ADD, the meta-detector outputs the probability P(c) that sample S_i belongs to class c, where Σ_{c∈C} P(c) = 1. The entropy of the classification can then be calculated as:

    H(C) = -\sum_{c \in C} P(c) \log P(c)    (3)

The entropy expresses the uncertainty of the classification: a larger entropy means the type of this concept drift is more uncertain, and therefore this concept drift needs to be labelled manually.

The structure of the proposed Meta-ADD algorithm is shown in Figure 3.

[Figure 3: Structure of the Meta-ADD algorithm. A data stream is embedded and classified as normal, gradual, incremental, or sudden drift; samples whose classification is uncertain are sent to an analyst for labelling, and the returned labels are used to update the meta-detector.]
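The sketch below shows how such an entropy-based query rule might look; the threshold value and the function name are illustrative assumptions, not values given in the paper.

```python
import numpy as np

def should_query_analyst(class_probs, threshold=0.8):
    """Return True when the meta-detector's prediction is too uncertain.

    class_probs: the probabilities P(c) over the drift classes for sample S_i.
    threshold:   fraction of the maximum possible entropy log(K) above which
                 we pay the labelling cost (an illustrative value).
    """
    p = np.asarray(class_probs, dtype=float)
    p = p[p > 0]                              # avoid log(0)
    entropy = -(p * np.log(p)).sum()          # Eq. (3)
    return entropy >= threshold * np.log(len(class_probs))

print(should_query_analyst([0.97, 0.01, 0.01, 0.01]))  # False: prediction is confident
print(should_query_analyst([0.30, 0.25, 0.25, 0.20]))  # True: ask the analyst for a label
```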

4 Experiments

To evaluate Meta-ADD, we conducted four experiments. The first two experiments, described in Section 4.1, were designed to analyze the effectiveness of Meta-ADD in identifying the type of concept drift that has occurred. The third experiment, described in Section 4.2, compares the performance of other drift detection algorithms and Meta-ADD on toy datasets with known drifts. The fourth experiment, described in Section 4.3, evaluates the strengths and weaknesses of Meta-ADD on real-world datasets with unknown drifts. All experiments are implemented in Python 3.7 on Windows 10, running on a PC with an Intel Core i7 processor (2.80 GHz) and 16 GB of RAM.

4.1 Demonstration of Meta-ADD

Experiment 1. Drift detection on data streams with different meta-learning structures

In this experiment, we focus on investigating how meta-learning impacts learning performance because, in our proposed Meta-ADD, the quality of the detector depends on the results of meta-learning, and the results of meta-learning depend on the structure of the prototypical networks. Hence, the intuition is to generate a set of data streams with predefined types of concept drift, and then observe the results when Meta-ADD with different types of prototypical networks and parameters is applied to those streams. We expect to see that meta-learning enables our detector to classify concept drift.

Experiment Settings. First, based on the Python package skmultiflow [Montiel et al., 2018] (version 0.5.3), we generate a total of 8000 data streams, and each data stream has 2000 instances to simulate the prequential prediction errors of a learning model. Each data stream has only one type of drift. We then select 1250 data streams of each type of concept drift, i.e., a total of 5000 data streams, as the original training dataset from which to extract meta-features. The remaining data streams are mixed as testing datasets. GaussianNB is imported from sklearn [Pedregosa et al., 2011] as the default learning model because it has a fast processing speed and a partial-fit option. As for the structure of the prototypical neural network, two embedding functions, a 4-layer fully connected neural network (FCN) and a recurrent neural network (RNN), are selected as the nonlinear mapping. In the training phase, for each concept drift category, 5 support data streams are chosen randomly to form the prototype, and 15 query data streams are selected to evaluate the performance of the model. During the testing phase, 20 support data streams are randomly selected from the training dataset to form the final prototype. For all the experiments, ReLU activations are applied, the distance function is the cosine similarity function, and all our models are trained with the Adam optimizer at a learning rate of 10^{-3}.
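To illustrate how such labelled training streams can be produced, the sketch below uses skmultiflow's ConceptDriftStream to simulate one sudden and one gradual drift stream and records the prequential errors of GaussianNB. The choice of SEA generator functions and the drift position and width are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from skmultiflow.data import SEAGenerator, ConceptDriftStream
from sklearn.naive_bayes import GaussianNB

def prequential_errors(stream, n_samples=2000):
    """Test-then-train loop that returns the 0/1 error at each timestamp."""
    model, errors = GaussianNB(), []
    for t in range(n_samples):
        X, y = stream.next_sample()
        if t > 0:
            errors.append(int(model.predict(X)[0] != y[0]))
        model.partial_fit(X, y, classes=np.array([0, 1]))
    return np.array(errors)

def make_drift_stream(width, position=1000):
    # A small width yields a sudden drift; a large width yields a gradual one.
    return ConceptDriftStream(stream=SEAGenerator(classification_function=0),
                              drift_stream=SEAGenerator(classification_function=2),
                              position=position, width=width, random_state=7)

sudden_errors = prequential_errors(make_drift_stream(width=1))     # label: sudden drift
gradual_errors = prequential_errors(make_drift_stream(width=500))  # label: gradual drift
# These error streams are what the meta-feature extraction of Section 3.1 consumes.
```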
Type          Network   Acc
Sudden        FCN       98.5%
              RNN       32.3%
Gradual       FCN       94.2%
              RNN       53.4%
Incremental   FCN       91.3%
              RNN       52.1%
Normal        FCN       93.8%
              RNN       82.2%

Table 1: Accuracy of different deep structures. (Acc is the abbreviation of accuracy; the number of meta-features is 50 and the sampling frequency is 25.)

Findings and Discussion. The results of Experiment 1 are shown in Table 1. From Table 1, we can see that Meta-ADD was able to identify the simulated concept drifts and, overall, the results were in line with expectations: the FCN achieves the better performance because a strong relationship between the samples in a data stream is not required. In addition, we can see that each type of concept drift, under the same settings, performed differently.

Experiment 2. Drift detection on data streams with different meta-features

In this experiment, we focus on investigating how the meta-features impact learning performance. In our proposed Meta-ADD, extracting meta-features depends on two parameters, l and n, where l is the number of meta-features and n is the sampling frequency. Hence, the intuition is to generate a set of data streams with a predefined type of concept drift, and then observe the results when Meta-ADD with different l and n is applied to these streams. We expect Meta-ADD with different l and n to perform differently in classifying concept drifts.

Experiment Settings. We again chose GaussianNB as the default learning model and the same parameter settings as in Experiment 1. Based on the results of Experiment 1, the deep structure of meta-learning is implemented with the FCN. To explore the influence of l and n when extracting the meta-features, the number of meta-features l was varied within the set l_valid ∈ {50, 100, 200}, and the sampling frequency n was varied within the set n_valid ∈ {1, 25, 50}.

          n = 1     n = 25    n = 50
l = 50    71.92%    86.45%    92.57%
l = 100   73.83%    72.53%    94.56%
l = 200   70.42%    72.98%    90.68%

Table 2: Accuracy of different meta-features.

Findings and Discussion. The results of Experiment 2 are reported in Table 2. From Table 2, two interesting points are worth mentioning. First, l and n influence the prediction accuracy. Second, the relationships between the detection accuracy and the number of meta-features l and the sampling frequency n, respectively, are not the same: when n increases, the prediction accuracy increases in general, whereas when l increases, the prediction accuracy may decrease.

4.2 Stream Learning on Toy Datasets with Simulated Concept Drift

Experiment 3. Drift Detection on Toy Datasets

In this experiment, we focus on investigating the performance of Meta-ADD compared to current drift detection methods. However, current drift detection methods cannot classify concept drift, so we only compare the prediction errors in extreme cases. We generate a set of data streams with a predefined type of drift and predefined drift frequencies and severities to simulate the prequential prediction errors of a learning model equipped with different drift detection methods. Fewer prediction errors mean the drift detection method performs better. We expect to see that the learning model with our proposed Meta-ADD achieves the best performance on different data streams with different concept drifts.

Experiment Settings. GaussianNB is again chosen as the default learning model. The toy datasets were generated by skmultiflow's built-in data generation functions. We chose the SEA generator, the Rotating Hyperplane (HYP) generator, the AGR (AGRAWAL) generator, the RTG (Random Tree) generator, and the RBF generator. All generators generate 10000 samples. The compared algorithms were ADWIN, DDM, EDDM, HDDM-A, HDDM-W, Page-Hinkley, and KSWIN, implemented in skmultiflow with their advised drift thresholds. We use the parameters with the best performance in Experiment 1 to train the model and set l to 100 and n to 1 to extract the meta-features.
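For reference, the sketch below shows the kind of prequential loop these comparisons rely on, pairing a skmultiflow generator with GaussianNB and one of the compared detectors (ADWIN). The reset-on-drift policy is a common baseline choice that we assume here; it is not the paper's stated adaptation protocol.

```python
import numpy as np
from skmultiflow.data import HyperplaneGenerator
from skmultiflow.drift_detection import ADWIN
from sklearn.naive_bayes import GaussianNB

stream = HyperplaneGenerator(random_state=1)   # the "HYP" rotating hyperplane stream
model, detector = GaussianNB(), ADWIN()
classes = np.array([0, 1])

errors, n_drifts = [], 0
for t in range(10000):                   # 10000 samples, as in Experiment 3
    X, y = stream.next_sample()
    if t > 0:                            # test-then-train (prequential) protocol
        err = int(model.predict(X)[0] != y[0])
        errors.append(err)
        detector.add_element(err)        # feed the 0/1 error to the detector
        if detector.detected_change():
            n_drifts += 1
            model = GaussianNB()         # assumed policy: retrain after a drift alarm
    model.partial_fit(X, y, classes=classes)

print(f"accuracy = {1 - np.mean(errors):.3f}, drifts detected = {n_drifts}")
```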
Findings and Discussion. The results of Experiment 3 are shown in Table 3. In general, the prediction accuracy of Meta-ADD is better than that of the other drift detection methods over all data streams. In addition, Table 3 shows that Meta-ADD has the largest F1 score, i.e., the drifts detected by Meta-ADD are more authentic. Thus, our proposed Meta-ADD achieves better performance than current drift detection methods. A comparison of the last two rows of Table 3 shows that the performance of Meta-ADD was improved by active learning.

                SEA            HYP            AGR            RBF            RTG
                Acc     F1     Acc     F1     Acc     F1     Acc     F1     Acc     F1
*               83.7%   NaN    81.6%   NaN    70.3%   NaN    69.3%   NaN    64.5%   NaN
*+ADWIN         84.3%   0.62   81.6%   0.00   71.3%   0.50   69.3%   0.00   64.5%   0.00
*+DDM           83.7%   0.00   86.1%   0.63   71.2%   0.50   71.5%   0.66   64.5%   0.00
*+EDDM          84.0%   0.38   86.1%   0.63   65.1%   0.25   67.8%   0.38   65.2%   0.38
*+HDDM-A        84.7%   0.69   81.6%   0.00   72.3%   0.63   69.3%   0.00   64.5%   0.00
*+HDDM-W        84.5%   0.63   86.4%   0.70   72.4%   0.63   69.8%   0.50   66.3%   0.55
*+Page-Hinkley  85.0%   0.71   81.6%   0.00   71.4%   0.50   69.3%   0.00   64.5%   0.00
*+KSWIN         84.1%   0.38   86.2%   0.66   71.5%   0.50   69.9%   0.63   65.2%   0.38
*+Meta-DD       86.1%   0.77   87.8%   0.83   73.2%   0.69   72.2%   0.71   67.4%   0.70
*+Meta-ADD      89.2%   0.85   91.1%   0.89   74.6%   0.79   73.7%   0.74   68.9%   0.75

* represents the GaussianNB algorithm. NaN means a null value, and an F1 of 0.00 indicates that no drifts were detected.

Table 3: Experiment results on toy datasets.

4.3 Stream Learning on Real-world Datasets

Experiment 4. Drift Detection on Real-world Datasets

This experiment investigates whether our proposed Meta-ADD continues to obtain the best performance on real-world concept drift evaluation datasets. We expect to see results similar to those of Experiment 3.

Experiment Settings. The basic settings are the same as in Experiment 3. The main difference is that we used a group of real-world datasets with unknown drifts; these datasets are the ones most often used to evaluate the performance of drift detection methods. The datasets are [Bifet et al., 2010]:

• Elec: the Electricity Price Dataset has 45325 samples, and each sample has 9 features.

• Weather: the NOAA Weather Dataset has 18712 samples, and each sample has 9 features.

• Spam: the Spam Filtering Dataset has 9829 samples, and each sample has 501 features.

• Air: the Airline Delay Dataset has 539395 samples, and each sample has 8 features.

• Poke: the Poker Hand Dataset has 923720 samples, and each sample has 11 features.

In the Elec, Weather, and Spam datasets, parameters l and n were set to 100 and 1, respectively. In the remaining datasets, parameters l and n were set to 100 and 25, respectively.

Findings and Discussion. The learning accuracies of the different drift detection methods appear in Table 4. From Table 4, we can see that Meta-ADD continues to obtain the best performance on all datasets. In addition, we compared the number of drifts detected under different settings of parameters l and n. In general, with an increase in l and n, the number of detected drifts decreases. For example, in the Elec dataset, when parameter l = 50 and n increases from 1 to 25, the number of detected drifts decreases from 158 to 28.

                Elec           Weather        Spam           Air            Poke
                Acc     DN     Acc     DN     Acc     DN     Acc     DN     Acc     DN
*               73.9%   0      68.7%   0      89.3%   0      58.1%   0      51.2%   0
*+ADWIN         79.1%   46     69.5%   8      92.4%   3      62.2%   70     54.2%   1
*+DDM           82.1%   117    70.3%   3      91.9%   8      61.9%   17     54.2%   1
*+EDDM          82.6%   115    73.7%   49     93.4%   13     61.5%   178    54.2%   1
*+HDDM-A        83.0%   123    72.3%   29     92.1%   4      62.6%   122    54.5%   4
*+HDDM-W        82.7%   99     72.6%   38     92.2%   2      61.5%   617    52.0%   604
*+Page-Hinkley  78.2%   17     69.6%   5      91.8%   1      62.8%   63     54.1%   15
*+KSWIN         81.4%   91     67.8%   45     92.4%   5      61.3%   664    53.6%   306
*+Meta-DD       83.5%   125    74.0%   50     94.5%   12     63.9%   155    55.7%   267
*+Meta-ADD      85.7%   101    76.1%   55     95.2%   11     64.7%   136    56.1%   202

* represents the GaussianNB algorithm. DN represents the number of detected drifts.

Table 4: Experiment results on real-world datasets.

5 Conclusion

In this paper, we propose a novel framework named Meta-ADD to learn a meta-detector for detecting concept drift. Unlike current drift detection methods, Meta-ADD does not rely on rules set by humans to judge whether concept drift has occurred; rather, it shows that a machine can be trained offline to classify concept drift, thereby solving the problems facing existing drift detection methods, i.e., the cold start problem and the inability to classify concept drift. Hence, compared with current methods, detecting concept drift with Meta-ADD supports a better understanding of the drifts. Extensive experimental results also show the effectiveness of Meta-ADD.
References

[Alippi and Roveri, 2008] Cesare Alippi and Manuel Roveri. Just-in-time adaptive classifiers—part I: Detecting nonstationary changes. IEEE Transactions on Neural Networks, 19(7):1145–1153, 2008.

[Alippi et al., 2017] C. Alippi, G. Boracchi, and M. Roveri. Hierarchical change-detection tests. IEEE Transactions on Neural Networks and Learning Systems, 28(2):246–258, 2017.

[Baena-García et al., 2006] Manuel Baena-García, José del Campo-Ávila, Raúl Fidalgo, Albert Bifet, R. Gavaldà, and R. Morales-Bueno. Early drift detection method. In Fourth International Workshop on Knowledge Discovery from Data Streams, volume 6, pages 77–86, 2006.

[Bifet and Gavalda, 2007] Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 443–448. SIAM, 2007.

[Bifet et al., 2010] Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen, and Thomas Seidl. MOA: Massive online analysis, a framework for stream classification and clustering. In Proceedings of the First Workshop on Applications of Pattern Analysis, pages 44–50. PMLR, 2010.

[Dasu et al., 2006] Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, and Ke Yi. An information-theoretic approach to detecting changes in multi-dimensional data streams. In Proceedings of the Symposium on the Interface of Statistics, Computing Science, and Applications. Citeseer, 2006.

[de Lima Cabral and de Barros, 2018] Danilo Rafael de Lima Cabral and Roberto Souto Maior de Barros. Concept drift detection based on Fisher's exact test. Information Sciences, 442:220–234, 2018.

[Finn et al., 2017] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1126–1135, 2017.

[Frías-Blanco et al., 2015] I. Frías-Blanco, J. del Campo-Ávila, G. Ramos-Jiménez, R. Morales-Bueno, A. Ortiz-Díaz, and Y. Caballero-Mota. Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering, 27(3):810–823, 2015.

[Gama et al., 2004] João Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. Learning with drift detection. In Brazilian Symposium on Artificial Intelligence, pages 286–295. Springer, 2004.

[Gama et al., 2014] João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4):1–37, 2014.

[He et al., 2019] Yi He, Baijun Wu, Di Wu, Ege Beyazit, Sheng Chen, Xindong Wu, et al. Online learning from capricious data streams: A generative approach. In IJCAI, pages 2491–2497, 2019.

[Kifer et al., 2004] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In VLDB, volume 4, pages 180–191, Toronto, Canada, 2004.

[Krawczyk and Cano, 2019] Bartosz Krawczyk and Alberto Cano. Adaptive ensemble active learning for drifting data stream mining. In IJCAI, pages 2763–2771, 2019.

[Liu et al., 2017] Anjin Liu, Yiliao Song, Guangquan Zhang, and Jie Lu. Regional concept drift detection and density synchronized drift adaptation. In IJCAI International Joint Conference on Artificial Intelligence, 2017.

[Losing et al., 2018] Viktor Losing, Barbara Hammer, and Heiko Wersing. Incremental on-line learning: A review and comparison of state of the art algorithms. Neurocomputing, 275:1261–1274, 2018.

[Lu et al., 2014] Ning Lu, Guangquan Zhang, and Jie Lu. Concept drift detection via competence models. Artificial Intelligence, 209:11–28, 2014.

[Lu et al., 2017] Yang Lu, Yiu-ming Cheung, and Yuan Yan Tang. Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift. In IJCAI, pages 2393–2399, 2017.

[Lu et al., 2019] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang. Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering, 31(12):2346–2363, 2019.

[Montiel et al., 2018] Jacob Montiel, Jesse Read, Albert Bifet, and Talel Abdessalem. Scikit-multiflow: A multi-output streaming framework. The Journal of Machine Learning Research, 19(1):2915–2914, 2018.

[Pedregosa et al., 2011] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830, 2011.

[Pratama and Wang, 2019] Mahardhika Pratama and Dianhui Wang. Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams. Information Sciences, 495:150–174, 2019.

[Shao et al., 2018] J. Shao, F. Huang, Q. Yang, and G. Luo. Robust prototype-based learning on data streams. IEEE Transactions on Knowledge and Data Engineering, 30(5):978–991, 2018.

[Snell et al., 2017] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pages 4077–4087, 2017.

[Yu et al., 2020] H. Yu, J. Lu, and G. Zhang. Online topology learning by a Gaussian membership-based self-organizing incremental neural network. IEEE Transactions on Neural Networks and Learning Systems, 31(10):3947–3961, 2020.