ONLINE PARAMETRIC AND NON PARAMETRIC DRIFT DETECTION TECHNIQUES FOR STREAMING DATA IN MACHINE LEARNING: A REVIEW P.V.N. Rajeswari1, Dr. M. Shashi2 1Assoc. Professor, Dept of Comp. Sc. & Engg., Visvodaya Engineering College, Kavali, AP, India 2Professor, Dept of Comp. Sc. & System Engg, Andhra University, Visakhapatnam, AP, India

Abstract applied in the areas like, image processing, data Enormous amount of data streams are mining, medical diagnosis, search engines, etc. generated by large number of real time Machine learning categorized into two types applications with the stupendous namely Batch learning and Online learning based advancement in technology. Incremental and on representation and presentation of training online - offline learning algorithms are more samples. Batch learning systems accept a large relevant to process the data streams in the number of examples and learn to build a model data mining context. Detecting the changes from all of them at once. In contrast, Online and reacting against them is an interesting learning systems are given examples research area in current era in knowledge sequentially as streaming data [1] from which discovery process, in Machine Learning. The learning happens incrementally. target function also is changed frequently, due to the occurrence of changes in data streams. Learning from data streams aims to This is an intrinsic problem of online learning, capture the target concept as it changes over popularly known as Concept Drift. To handle time, and it is a current research area of growing the problem despite of the learning model, interest. For instance, changes can emerge due to novel methods are needed to examine the changing interests of people towards news performance metrics. This examination will topics, shopping preferences, energy be done during the learning process and consumption, etc. (e.g. influenced by seasonal prompt drift signals whenever a significant changes). Spam filtering [2] is another example, variation has been detected. The Concept wherein spammers try to elude filters by Drift problem can be detected using various disguising their emails as genuine with newer techniques. This paper focuses on reviewing tricks challenging spam filters to continuously on existing Parametric and non-Parametric enhance their capability to successfully identify techniques Online for drift detection. spam over time. So it is possible that a learning Keywords: Stream mining; Concept drift; model previously induced may be incompatible Parametric; Non-parametric; Machine with the present data, making an update Learning; mandatory. This type of problem is commonly known as Concept Drift. Many online learning I. INTRODUCTION algorithms[18], which are variants of the base Adaptation of science and technological models such as Rule-based systems, Decision outcomes including computerization in every Trees, Naive Bayes, Support Vector Machines, walk of life leads to accumulation of voluminous Instance based learning, and ensemble of amount of data that need to be processed by classifiers [3] [4], have been implemented for machines for pattern extraction for its effective handling Concept Drift. usage. Machine Learning algorithms are widely

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 45 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

In supervised incremental learning, a  Exponentially Weighted Moving Average well-extended approach continuously monitors a For Concept Drift Detection (ECDD) [15]. performance measure such as accuracy, or speed etc., of the learning model to handle Concept The comparative study of all algorithms is Drift. A Concept Drift is assumed if a significant provided in Table 1. fall in this measure is estimated and required actions are defined to update the model II. ONLINE PARAMETRIC DRIFT according to the latest data available. In this DETECTION TECHNIQUES scenario, automatic/ inherent change detectors of A. Shewhart’s Control Charts the learning algorithm play a crucial role [5] [6] To verify the validity of manufacturing or [7]. These detectors often operate over a stream business control state, Shewhart’s control charts of real values corresponding to a given (widely known as Process Behavior charts) are performance measure. Most existing approaches used as statistical tools [17]. In other words, monitor changes in terms of a suitable statistic certain statistical assumptions about the process (such as the mean or median) with standard data it produces are used to test the steadiness of deviation [8], to successfully capture the some of the process properties over time. Mean, distributional changes associated with the stream median, variance or standard deviation, of performance measures. Thus, the problem of distribution shape or fraction of non-conforming concept drift detection is reduced to estimating items are the commonly considered properties in significant changes in the statistic calculated the charts. from the sequence of values that measure a performance characteristic. Often, this stream of Shewhart’s control charts are used to real values is also huge (probably infinite), as the continuously monitor any deviations from the learning model is monitored over longer times. current state. These are aimed to identify events, which are very unlikely when the controlled Therefore, it is very common to enforce process is in stable condition. An alarm is used restrictions on these online change detectors [8], to signal when any change is noticed in the [9]. The computational complexity required to stability of the process, so that possible steps process each performance value must be constant should be taken for investigating the causes of as well as methods should be single-pass, where the change as well as for correction. each performance value is processed once and then discarded. These change detectors must also For instance, in case of the control limits able to deal with common types of change UCL (Upper Control Limit) or LCL (Lower widespread in much real-world data [4]. Under Control Limit)) are exceeded. The control limits these conditions, many traditional statistical are constructed at ±3σ from the mean of the approaches that assume a fixed size of the input performance measure, which can be exceeded data for estimating distributional changes are not with relative frequency of 0.27%. suitable. In addition to the LCL and UCL, warning Some of the most studied parametric limits for LWL and UWL are also constructed at schemes to detect changes online are: ±2σ and as well as ±σ. Having multiple limits  Shewhart’s control charts [17] defined at various levels around the mean  Exponentially Weighted Moving Average statistic, provides fine-grain control through (EWMA) [11] more sophisticated rules specifying low  Control charts and Page’s cumulative sum probability event possessing when the process is (CUSUM) [10] under control.  Procedure and Non-parametric approaches Some of the following rules can be are Drift Detection Method (DDM) [5] selected in the Shewhart dialog  Early Drift Detection Method (EDDM) [6] panel; by default all available rules are selected  Adaptive WINdow (ADWIN) [7] [17]  Adaptive WINdow2 (ADWIN2) [7]  Detection With a Statistical Test of Equal Proportions STEPD [16]

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 46 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

 One point exceeds LCL/UCL B. CUSUM : Cumulative Sum Control  Two out of three points exceed LWL or Chart UWL limits CUSUM is commonly used as sequential  Four out of five points are above/below the analysis technique in statistical quality control. It central line and exceed ±  limits is mainly used for monitoring abrupt or sudden  Six consecutive points show change detection. In this, ‘θ’ is used as the increasing/decreasing trend parameter of probability distribution referred to  Eight consecutive values are beyond ±  as "quality number". It is a technique to verify limits any changes in mean, median, etc., for deciding  Nine points above/below the middle line when to take corrective action. When the  Difference of consecutive values alternates CUSUM [10] method is applied to changes in in sign for fourteen points mean, it can be used for step detection of a time  Fifteen points are within ±  limits series. Alternative forms of CUSUM charts are given below: Application of Shewhart Control Charts  Direct form or Recursive forms for monitoring a control process involves two  One- or Two-sided forms steps namely Chart construction and Chart application. X[n] is defined as a discrete random The purpose of construction step is to signal with independent and identically specify the middle/ central line (CL) and control distributed samples in CUSUM technique. Each limits to describe the interval of real process sample follows a probability density function correctly. When these values are not provided in (PDF) represented as p(x[n], θ) that depends on advance, assuming normally distributed data, the ‘θ’ value known as ‘deterministic parameter’ they are set to mean and interval which contains i.e., it can be considered as either mean ‘µx’ or 2 ’ 99.3% of available training data, with the the variance ‘σ x of X[n]. At the change time ‘n’, statistical characteristics. These are based on the discrete random signal may possess one observations of the process under control, abrupt change occurrence, indicated with ‘c’, and excluding outliers or otherwise suspect data at the time nc, the abrupt change is represented by points. The chart is applied to control the an instant change in the value of ‘θ’. Hence θ = process, by using a particular relevant rule set θ0 before nc and θ = θ1 from nc in the current sample. A software tool named Quality Control Expert (QC. Expert) offers seven common types Therefore, according to the given of Shewhart control charts namely X-bar and S, hypothesis/ assumptions, the whole PDF of the X-bar and R , X-individual for continuous signal px that is observed between the first variables, np, p, u, c for discrete quality attributes sample x[0] and the current sample, i.e., x[k] can Figure 1 shows an example of process variant take two different types. chart borrowed from Wikipedia. Let H0 be the no change premise and the PDF of X[n] is given by:

| , And the change premise is represented as H1, and the PDF is represented as:

| , ,

With the help of PDF of each incoming Figure 1: Process Control Chart Sample sample or example p(x[n], θ), it is possible to detect the abrupt change and also the values of

the parameters before (θ0) and after (θ1), using

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 47 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR) this method. Abrupt changes for 1000 data Percentage is used as single parameter for values using CUSUM algorithm is shown in percent-based EMA and duration is used as a Figure 1. But in this method, some unknown parameter for period-based EMA, duration is the values are also to be determined as mentioned parameter. The formula for an EMA is: below: EMA (present) = ((Price (present) – EMA  There is a deficient in abrupt change rate (previous)) x (Multiplier) + EMA (previous). between the values for n = 0 and n = k. The parameter “Multiplier” in the above  The value of the possible change time nc. equation has different notations in different Following are the two different steps are EMAs. In percent-based EMA, specific performed in this algorithm: percentage is used where as in period-based EMA, it is equal to 2 /(1 + N), where N is the 1. Detection step: To decide the true specified number of periods. hypothesis between H0 and H1. 2. Estimation step: To estimate the change The EWMA chart may also be used time nc, in case H1 is true. whenever there is only a single response is Here, H0 and H1 are the hypotheses for no available at each time point. An alternative change and change in the data respectively. option for single responses is the Individuals and C. EWMA : Exponentially Weighted Moving Moving Range (I-MR) control charts. Averages Control Charts III. ONLINE NON-PARAMETRIC DRIFT Like the CUSUM chart, even small DETECTION TECHNIQUES deviation in the mean value of the process can be A. ADWIN : ADaptive WINdow detected using EWMA control chart. These An adaptive sliding window method, control charts observe the mean of a given proposed by Bifet [12, 13], for detecting sudden process based on samples/data collected from the drifts rapidly generating and voluminous data process at given time-intervals which span in streams is termed as ADWIN. The working of terms of hours, days, weeks, or even months. the algorithm mainly depends on the parameter Subgroups are constituted based on the called as ‘W’value i.e., a sliding window for the measurements of the data samples taken. most recently read data samples. The main idea of ADWIN is as follows: The EWMA chart relies on the whenever two ‘large enough’ sub-windows of specification of a target value and on known or size ‘W’ shows signs of different average values, consistent approximate of the standard deviation one can conclude that the equivalent expected of the input samples. For this rationale, the values are different from each other, and the moving average chart is better used after process older portion of the window is dropped. This control has been established. This Exponential gives the answer for the question whether “the Moving Averages (EMA) can be obtained in two average µt remained constant in W with different behaviors named as Percent-based confidence value δ” or not. The related pseudo- EMA and Period-based EMA. code for ADWIN [13] is given in Figure 3. The important part of the algorithm lies in the definition of ‘Єcut’ value and the test it is used for.

Input: S: a data stream of examples δ:confidence level Output : W : a window of examples

1: initialize window W; 2: for all xi Є S do 3: W ← W U { xi }; 4: repeat 5: drop the oldest element from W; 6: until |µw0- µw1|< Єcut holds for every Figure 2 : Typical behavior of CUSUM split of W into W=W0.W1 algorithm using Guassian signal Figure 3 : ADWIN algorithm Psuedocode.

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 48 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

In this algorithm ‘n’ denote the size of W, where C. DDM : Drift Detection Method n0 and n1 are the sizes of W0 and W1 The DDM method was proposed by respectively. Gama et al. [5] that includes Early Drift So that n = n0+n1. Let, µ`w0 and µ`w1 are the Detection Method (EDDM) [6], and Detection averages of W0 and W1 values, and also µw0 and with a Statistical Test of Equal Proportions µw1 as their predictable values. (STEPD) method [14], applies to single classifier learning methods. Only one classifier is used at The value of ‘Єcut’ is proposed as follows: any point of time in these methods, and old classifier is replaced with newly trained classifier ∈ . ` after occurrence concept drift. It is assured that error rate of the online where, and ` classifies will decrease if the stability is But, ADWIN algorithm proved maintained in the target concept even with the inefficient in terms of time and memory progression in time. Therefore, whenever there is constraints. a considerable increase in the error rate of the online classifier it’s true that the concept is B. ADWIN2 : ADaptive WINdow2 changing which results as concept drift. Warning level and drift level are the two types of ADWIN2 is developed by Bifet and thresholds mentioned to estimate the severity of Gavalda [7] with the ideas from different data the drift in inflow. Data examples or inflow are stream algorithms [20 - 23] to overcome the stored in short-term memory at the occurrence of limitations of ADWIN algorithm in terms of time warning level as well as all available samples are and memory. ADWIN algorithm re-initialized at the occurrence of the drift. Also computationally expensive as it verifies all online classifier is remodeled automatically. “large enough” possible sub-windows of the DDM reacts fastly and positively when sudden current window exhaustively, for possible cuts and significant changes are detected, but reacts which lead to increase in the memory size slowly and negatively for gradual and small exponentially. The ADWIN2 algorithm retains a changes. Sometimes DDM is ineffective because data structure with the following properties: of short memory overflow and also sometimes  For each available memory word which incapable of detecting sudden change extends up to ‘W’ value, the space occurrences. complexity results as O(M*log(W/M)). DDM employs binomial distribution,  In O(1) amortized and with worst case which is used to represent the form of underlying value O(logW) it can process the arrival probability concept for concerned random of a new element. variable. The random variable is meant to notate  It can provide the accurate counts of 1’s the total number of errors in ‘n’ samples. For for all the existed sub-windows whose each ‘i’, a sampled point in the input sequence, lengths are of the form b(1+1/M), in (pi), indicates the probability of misclassifying O(1) time complexity per query. error rate and si = 1 ⁄ indicates the ADWIN2 is an adaptive windowing related standard deviation obtained. With (pi), scheme which is based on the use of the value, it is assumed that the error rate of the Hoeffding bound to detect concept change with learning algorithm will also decrease even with less memory and time limits. ADWIN used a the increase in the number of samples, if variation of exponential histograms and a distribution in examples stand stationary. It is memory parameter to limit the number of stated in PAC (Probability Approximate Correct) hypothesis tests done on a given window. learning model [10], [24]). So, with the ADWIN2 was shown to be superior to Gama’s considerable increase in the error of the method and fixed size window with flushing [16] algorithm, it suggests that the changing in the on almost all performance measures such as the class distribution occurring and present decision false positive rate, false negative rate and learning model will become inappropriate for the sensitivity to slow gradual changes.

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 49 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

1 1 1 1 current stream, and it checks when the following 2. p i+2.s i /p max+2.s max< β indicates the conditions set off: drift level. It can be assured that the concept drift occurred and the learning model needs to be reset i. pi+si ≥ pmin+2.smin for the warning level above this level. At the same time new model has ii. pi + si ≥ pmin + 3. smin for the drift level to be built by using the examples stored since the warning level triggered. Ahead of the drift level, occurrence of the 1 1 changes in the concept is supposed to be true, The values for p max and s max also have then the model induced by the learning method to reset without any delay. The technique need to be updated and a new learning model is considers the thresholds and then searches learnt using the stored examples in the memory. occurrence of drift, if any, in given concept, only And the values for pmin and smin variables are reset after the occurrence of a minimum of 30 errors. as well. This approach suits well in detecting abrupt and gradual changes, when the gradual E. ECDD : Exponentially Weighted Moving change is not very slow. But it may prove Average For Concept Drift Detection difficult in dealing when there is slow and Ross & Adams et al., proposed a drift gradual change. In that scenario, the stored detection method based on Exponentially examples are observed for long time and as a Weighted Moving Average (EWMA) chart [11], consequence triggering at drift level will happen and used for identifying an increase in the mean after long time. In addition to that, the required of a sequence of random variables to monitor the memory for the examples can be exceeded which misclassification of the streaming data. ECDD results need of more memory which proved as a method possess both single pass and disadvantage to this method. computationally efficient with an overhead of D. EDDM : Early Drift Detection Method O(1) properties. Here, means µ0 and µ1 are used as indications for, before and after change EDDM was developed by Baena-Garcia respectively. Change is occurred in the et al. [6], to overcome the drawback in DDM, so independent random variables X1…..Xn.., where that to improve the gradual concept drift µt is the time of mean µ and σx is the standard detection. In parallel, it shows better deviation., The values of success and failure i.e., performance in abrupt concept drift too. The probability (1 and 0) are computed online using basic scheme introduced in this technique is, ECDD. The basic idea is on the classification instead of considering only the number of errors, accuracy of the base learner in the actual the distance between two errors classification instance, together with an estimator of the have to be taken into account for finding the drift. expected time between false positive detections. Predictions are measured by learning method and the distance between two errors will increase if F. STEPD : Detection With a Statistical there is drift. The adequate distance between two Test of Equal proportions 1 errors (represented as p i) and its standard Nishida et al, proposed the STEPD 1 deviation (represented as s i) are to be calculated. technique [16], in which, monitoring of the These values are to be stored if and only if predictive accuracy in single classifier is 1 1 p i+2.s i reaches its maximum value by obtaining performed. Two different analytical accuracies 1 1 p max and s max. respectively. Thus, the value are taken into account, one is which occurred 1 1 p max+2.s max corresponds with the point where recently and the other is overall one, in STEPD. the distribution of distances between errors is The two accuracies are compared by using maximum. STEPD. In this, if the target concept is stationary, it is assumed that the accuracy of a The method defines two thresholds too: classifier for recent W examples will be equal 1 1 1 1 1. p i+2.s i / p max+2. s max < α for the to the overall accuracy from the beginning of warning level. There is a possibility of change in the learning; and whenever there is a significant context further if the calculated value is more decrease of recent accuracy indicates that there than this level. Ahead of this level, the samples is a change in the concerned concept. To obtain are stored in advance of context change the value for observed significant level, a chi- possibility. square test is performed by computing a statistic and the obtained value is compared to

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 50 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

the percentile of the standard normal The outcome of the status is presented in distribution. If this value is less than a terms of three states termed as Stable, Warning significance level, then the null-hypothesis is and Drift state. Stable state occurs when there is rejected, assuming that a concept drift has no change in the generated input data stream. If occurred. In this, two significant levels viz. any changes are observed in the streaming data warning and drift are also used which are with Pr{µ-2σ ≤ p ≤ µ+2σ}≈ 0.95, it results in a similar to the ones used in DDM, EDDM, PHT, warning state as well as with Pr{µ-2σ ≤ p ≤ and ECDD detection methods. µ+2σ}≈ 0.99 results in drift level. Here ‘p’ is the statistic, computed from the last observed values G. HDDM : Hoeffding Bound Drift Detection and ‘µ’ is for expected value. Alternative Method classifier will replace the old classifier HDDM method based on the design of automatically with occurrence of the changes statistical process control by using a Hoeffding (drift) in the streaming data. inequality which assumes only independent and bounded random variables and no probability IV. CONCLUSION function is assumed [19]. Two important In order to detect any occurrence of modules namely change detector and learning changes either in online or offline streaming data algorithm were proposed in HDDM for detection or samples i.e., whenever target functions are 1 of the change in concepts. Loss function: L(c i, ci changing, a drift detection technique should be ) is used to detect changes in change detector used to intimate the changes in the data which module. Loss function results in two states: improves the accuracy of the classification 1  in-control (L(c i, ci )=0), if concept is stable process. As drift detection and classification 1 algorithms are needed while changes occurs in  out-control (L(c i, ci ) =1), if changes are detected data stream. There are numerous drift detection techniques like DDM, Drift detection through re- In this technique, following two tests are sampling, EDDM, ECDD, STEPD, DCDD, performed: HDDM, etc.  A-test or Bounded Moving Averages - Some of the drift detection techniques implemented using window strategies uses any of the windowing methods, statistical or approach ensemble methodologies. This Paper gives an overview about concept drift and different types  W-test or Bounded Weighted Moving of drift detection techniques which are classified Averages -implemented using weighted in terms of Parametric and Non-Parametric instance approach. methods for online process along with its requirements and details of changes.

Table 1: Comparative study of ONLINE & OFFLINE drift detection algorithms S. Author Method Merit Demerit No Identifies when process is out It uses only the information from of control by using mean, last sample provided & Walter Shewhart median or standard deviation insensitive to small process 1 A. Control methods. shifts, violates likelihood Shewhart chart principle and uses ARL(Average Run Lengths) for finding the deviation. Efficient for detecting small This CUSUM charts or schemes CUSUM shifts in the process mean from results as more complicated E. S. 2 Control shifts of 0.5 to 2 standard nature in design process. And it Page chart deviations from the target is slower to detect large shifts in mean. the process mean.

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 51 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

EWMA Very efficient in detecting Slower in detecting large shifts S. S. 3 Control small shifts i.e., from .5 sigma in the process mean. Roberts charts to 2 sigma in process mean. Bifet et Used to sudden drifts in Gives high false positive rate and 4 ADWIN al voluminous data. in-tolerance. By using Hoeffding bounds The drawback with this is, it is Bifet and results in less memory - time useful in dealing changes only 5 ADWIN2 Gavalda limits and guarantees on false for mean values. positive and negative rates. Well suits in detecting abrupt Need more memory and time 6 Gama et changes and gradual changes, when gradual change is very DDM al. but for only when the gradual slow and triggers false alarms change is not very slow quickly and frequently Proves efficient and improved Less efficient for noisy data. Baena- for gradual drifts and it also 7 Garc et EDDM keeps a better performance al. with sudden concept drift. Used for identifying an No guarantee of accurate increase in the mean of performance.

misclassification of the Ross & 8 streaming data. This method Adams ECDD possess both single pass and et al computationally efficient with an overhead of O(1). Used to find the difference in The algorithm have to reset Nishida accuracy on older and more every time upon change 9 STEPD et al recent training examples along detection. with a statistical test. Isvani Very efficient for both abrupt Frıas and gradual changes detection In this, there is no distribution Blanco using A-test & W-test in terms over the incoming samples. 10 and HDDM of WARNING, STABLE & Jose del DRIFT states. Campo Avila

V. REFERENCES Concept Drift Adaptation, ACM Comp. [1] Mohamed Medhat Gaber, Arkady Zaslavsky Surv., Vol. 46, No. 4, 2014. and Krishnaswamy, “Mining Data Streams: [5] J. Gama, P. Medas, G. Castillo, and P. A Review”, SIGMOD Record, Vol. 34, No. Rodrigues, “Learning with drift detection”, 2, 2005. in Proc. SBIA Adv. Artificial Intelligence, [2] Christina V, Karpagavalli P, Suganya G: “A Vol. 3171, pp. 286–294, 2004. Study On Email Spam Filtering [6] M. Baena, J. Del Campo, R. Fidalgo, A. Techniques”, Intl. J. of Comp. Applications, Bifet, R. Gavalda, and R.Morales: “Early 0975 – 8887 p, Vol. 12– Issue No.1, 2010. Drift Detection Method”, Proc. of 4th Intl. [3] H. Wang, W. Fan, P. Yu, and J. Han, Conference on Knowledge Discovery Data “Mining Concept-Drifting Data Streams Streams, pp. 77–86, 2006. th using Ensemble Classifiers”, Proc. of 9 [7] A. Bifet and R. Gavalda, “Learning from ACM SIGKDD Int. Conf. on Knowledge Time-Changing Data with Adaptive Discovery Data Mining, pp. 226– 235, 2003. Windowing”, Proc. of SIAM Intl. Conf. on [4] J. Gama, I. Zliobaite, A. Bifet, M. Data Mining, pp. 443–448, 2010. Pechenizkiy, and Pouchachia, “A Survey on

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 52 INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

[8] G. Ross, D. Tasoulis, and N. Adams: “Non- [17] M. B. (Thijs) Vermaat, Roxana A. Ion, Parametric Monitoring of Data Streams for Ronald J. M. M. Does1, and Chris A. J. Changes in Location and Scale, Klaassen, “A Comparison of Shewhart Technometrics”, Vol. 53, No. 4, pp. 379– Individuals Control Harts Based on Normal, 389, 2011. Non-parametric, and Extreme- value [9] M. Datar, A. Gionis, P. Indyk, and R. Theory”, Quality & Reliability Engineering Motwani, “Maintaining Stream International, 2003. over Sliding Windows”, SIAM J. Comp., [18] Joseph Carmona and Ricard Gavald,“Online Vol. 31, No. 6, pp. 1794–1813, 2002. Techniques for Dealing with Concept Drift in [10] D. Montgomery, “Introduction to Statistical Process Mining”, Proc. of Int. Symp. on Data Quality Control”, 5th Edition, New York, Analytics (IDA), pp 90-102, 2012. USA: Wiley, 2001. [19] Isvani Frıas - Blanco Jose Del Campo, [11] E. Dalton, “EWMA charts”, NCSS “Online Non – Parametric Drift Detection Statistical Software chapter 249, pg 249-1 to Methods on Hoeffding’s Bounds”, IEEE 249-18, 2014. Transactions on Knowledge and Data Engg., [12] Albert Bifet and Ricard Gavald`a, Kalman: Vol. 27, No.3, 2015. “Filters for an Adaptive Windows for [20] B. Babcock, S. Babu, M. Datar, R. Motwani, Learning in Data Streams”, Discovery and J. Widom, “Models and Issues in Data Science, Vol. 4265 of Lecture Notes in Stream Systems”, Proc. of 21st ACM Symp. Comp. Sc., Springer, pages 29–40, 2006. on Principles of Database Systems, 2002. [13] Albert Bifet and Ricard Gavalda, “Learning [21] B. Babcock, M. Datar, and R. Motwani, from time changing data with adaptive “Sampling from a Moving Window Over window”,in SDM, SIAM, 2007 (Repeat of Streaming Data”, Proc. of 13th Annual ACM [7]). - SIAM Symp. on Discrete Algorithms, 2002. [14] Nishida and K. Yamauchi, “Detecting [22] M. Datar, A. Gionis, P. Indyk, and R. Concept Drift Using Statistical Testing”, Motwani, “Maintaining Stream Statistics Proc. of 10th Intl. Conf. on Discovery Sc., pp. over Sliding Windows”, SIAM J. on 264–269, 2007. Computing, 14(1), pp 27–45, 2002. [15] G. J. Ross, N. M. Adams, D. Tasoulis, D. [23] S. Muthukrishnan, “Data Streams, Alg’s and Hand, “Exponentially Weighted Moving appl’s”, Proc. of 14th Annual ACM SIAM Average Charts for Detecting Concept Symp. on Discrete Algorithms, 2003. Drift”, Intl. J. of Pattern Recognition Letters, [24] Leslie Valiant, “Probability Approximate 191-198, 2014. Correct (PAC), Universal Framework for [16] K. Nishida, “Learning and Detecting Learning in SIAM News, Volume 46, No. 8 Concept Drift, A Dissertation” Doctor of (2013). Philosophy in Info. Sc. and Tech., Graduate School of Info., 2008.

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-5, ISSUE-3, 2018 53