ONLINE PARAMETRIC AND NON PARAMETRIC DRIFT DETECTION TECHNIQUES FOR STREAMING DATA IN MACHINE LEARNING: A REVIEW P.V.N. Rajeswari1, Dr. M. Shashi2 1Assoc. Professor, Dept of Comp. Sc. & Engg., Visvodaya Engineering College, Kavali, AP, India 2Professor, Dept of Comp. Sc. & System Engg, Andhra University, Visakhapatnam, AP, India
Abstract applied in the areas like, image processing, data Enormous amount of data streams are mining, medical diagnosis, search engines, etc. generated by large number of real time Machine learning categorized into two types applications with the stupendous namely Batch learning and Online learning based advancement in technology. Incremental and on representation and presentation of training online - offline learning algorithms are more samples. Batch learning systems accept a large relevant to process the data streams in the number of examples and learn to build a model data mining context. Detecting the changes from all of them at once. In contrast, Online and reacting against them is an interesting learning systems are given examples research area in current era in knowledge sequentially as streaming data [1] from which discovery process, in Machine Learning. The learning happens incrementally. target function also is changed frequently, due to the occurrence of changes in data streams. Learning from data streams aims to This is an intrinsic problem of online learning, capture the target concept as it changes over popularly known as Concept Drift. To handle time, and it is a current research area of growing the problem despite of the learning model, interest. For instance, changes can emerge due to novel methods are needed to examine the changing interests of people towards news performance metrics. This examination will topics, shopping preferences, energy be done during the learning process and consumption, etc. (e.g. influenced by seasonal prompt drift signals whenever a significant changes). Spam filtering [2] is another example, variation has been detected. The Concept wherein spammers try to elude filters by Drift problem can be detected using various disguising their emails as genuine with newer techniques. This paper focuses on reviewing tricks challenging spam filters to continuously on existing Parametric and non-Parametric enhance their capability to successfully identify techniques Online for drift detection. spam over time. So it is possible that a learning Keywords: Stream mining; Concept drift; model previously induced may be incompatible Parametric; Non-parametric; Machine with the present data, making an update Learning; mandatory. This type of problem is commonly known as Concept Drift. Many online learning I. INTRODUCTION algorithms[18], which are variants of the base Adaptation of science and technological models such as Rule-based systems, Decision outcomes including computerization in every Trees, Naive Bayes, Support Vector Machines, walk of life leads to accumulation of voluminous Instance based learning, and ensemble of amount of data that need to be processed by classifiers [3] [4], have been implemented for machines for pattern extraction for its effective handling Concept Drift. usage. Machine Learning algorithms are widely
ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 45 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)
In supervised incremental learning, a Exponentially Weighted Moving Average well-extended approach continuously monitors a For Concept Drift Detection (ECDD) [15]. performance measure such as accuracy, or speed etc., of the learning model to handle Concept The comparative study of all algorithms is Drift. A Concept Drift is assumed if a significant provided in Table 1. fall in this measure is estimated and required actions are defined to update the model II. ONLINE PARAMETRIC DRIFT according to the latest data available. In this DETECTION TECHNIQUES scenario, automatic/ inherent change detectors of A. Shewhart’s Control Charts the learning algorithm play a crucial role [5] [6] To verify the validity of manufacturing or [7]. These detectors often operate over a stream business control state, Shewhart’s control charts of real values corresponding to a given (widely known as Process Behavior charts) are performance measure. Most existing approaches used as statistical tools [17]. In other words, monitor changes in terms of a suitable statistic certain statistical assumptions about the process (such as the mean or median) with standard data it produces are used to test the steadiness of deviation [8], to successfully capture the some of the process properties over time. Mean, distributional changes associated with the stream median, variance or standard deviation, of performance measures. Thus, the problem of distribution shape or fraction of non-conforming concept drift detection is reduced to estimating items are the commonly considered properties in significant changes in the statistic calculated the charts. from the sequence of values that measure a performance characteristic. Often, this stream of Shewhart’s control charts are used to real values is also huge (probably infinite), as the continuously monitor any deviations from the learning model is monitored over longer times. current state. These are aimed to identify events, which are very unlikely when the controlled Therefore, it is very common to enforce process is in stable condition. An alarm is used restrictions on these online change detectors [8], to signal when any change is noticed in the [9]. The computational complexity required to stability of the process, so that possible steps process each performance value must be constant should be taken for investigating the causes of as well as methods should be single-pass, where the change as well as for correction. each performance value is processed once and then discarded. These change detectors must also For instance, in case of the control limits able to deal with common types of change UCL (Upper Control Limit) or LCL (Lower widespread in much real-world data [4]. Under Control Limit)) are exceeded. The control limits these conditions, many traditional statistical are constructed at ±3σ from the mean of the approaches that assume a fixed size of the input performance measure, which can be exceeded data for estimating distributional changes are not with relative frequency of 0.27%. suitable. In addition to the LCL and UCL, warning Some of the most studied parametric limits for LWL and UWL are also constructed at schemes to detect changes online are: ±2σ and as well as ±σ. Having multiple limits Shewhart’s control charts [17] defined at various levels around the mean Exponentially Weighted Moving Average statistic, provides fine-grain control through (EWMA) [11] more sophisticated rules specifying low Control charts and Page’s cumulative sum probability event possessing when the process is (CUSUM) [10] under control. Procedure and Non-parametric approaches Some of the following rules can be are Drift Detection Method (DDM) [5] selected in the Shewhart control chart dialog Early Drift Detection Method (EDDM) [6] panel; by default all available rules are selected Adaptive WINdow (ADWIN) [7] [17] Adaptive WINdow2 (ADWIN2) [7] Detection With a Statistical Test of Equal Proportions STEPD [16]
ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 46 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)
One point exceeds LCL/UCL B. CUSUM : Cumulative Sum Control Two out of three points exceed LWL or Chart UWL limits CUSUM is commonly used as sequential Four out of five points are above/below the analysis technique in statistical quality control. It central line and exceed ± limits is mainly used for monitoring abrupt or sudden Six consecutive points show change detection. In this, ‘θ’ is used as the increasing/decreasing trend parameter of probability distribution referred to Eight consecutive values are beyond ± as "quality number". It is a technique to verify limits any changes in mean, median, etc., for deciding Nine points above/below the middle line when to take corrective action. When the Difference of consecutive values alternates CUSUM [10] method is applied to changes in in sign for fourteen points mean, it can be used for step detection of a time Fifteen points are within ± limits series. Alternative forms of CUSUM charts are given below: Application of Shewhart Control Charts Direct form or Recursive forms for monitoring a control process involves two One- or Two-sided forms steps namely Chart construction and Chart application. X[n] is defined as a discrete random The purpose of construction step is to signal with independent and identically specify the middle/ central line (CL) and control distributed samples in CUSUM technique. Each limits to describe the interval of real process sample follows a probability density function correctly. When these values are not provided in (PDF) represented as p(x[n], θ) that depends on advance, assuming normally distributed data, the ‘θ’ value known as ‘deterministic parameter’ they are set to mean and interval which contains i.e., it can be considered as either mean ‘µx’ or 2 ’ 99.3% of available training data, with the the variance ‘σ x of X[n]. At the change time ‘n’, statistical characteristics. These are based on the discrete random signal may possess one observations of the process under control, abrupt change occurrence, indicated with ‘c’, and excluding outliers or otherwise suspect data at the time nc, the abrupt change is represented by points. The chart is applied to control the an instant change in the value of ‘θ’. Hence θ = process, by using a particular relevant rule set θ0 before nc and θ = θ1 from nc in the current sample. A software tool named Quality Control Expert (QC. Expert) offers seven common types Therefore, according to the given of Shewhart control charts namely X-bar and S, hypothesis/ assumptions, the whole PDF of the X-bar and R , X-individual for continuous signal px that is observed between the first variables, np, p, u, c for discrete quality attributes sample x[0] and the current sample, i.e., x[k] can Figure 1 shows an example of process variant take two different types. chart borrowed from Wikipedia. Let H0 be the no change premise and the PDF of X[n] is given by: