A New Framework for On-Line Change Detection
Ankur Jain and Yuan-Fang Wang
Department of Computer Science, University of California, Santa Barbara, CA 93106

SUMMARY

In this paper, an on-line change-detection algorithm is proposed. The algorithm is applicable for detecting changes in both independent and dependent random processes. It is specifically tailored for on-line applications, and uses only a small amount of memory and a reasonable computation effort. The proposed algorithm does not require knowledge of the form (e.g., that the distribution is Gaussian) or the parameters (e.g., that the distribution has zero mean) of the probability distributions of the processes before and after the changeover. This represents a significant relaxation of the assumptions in most other algorithms (e.g., CUSUM and GLR), which must know the form and the parameter values of the process before the changeover (and often of the process after the changeover as well). While such knowledge is available in, say, process-control applications, it is not in many others. For example, in signal segmentation, the statistical properties of the before and after processes are often not known. What matters is that if a certain statistical property of the signal has changed, then the signal should be broken into pieces. Theoretically, it is proven that the proposed algorithm produces the correct detection results in the expected sense, and is an unbiased estimator of the changeover location under certain general conditions. In practice, not only is the proposed technique more general than the traditional techniques, it is also significantly more accurate, especially in the difficult cases where the populations of the before and after processes are not well separated.
As will be demonstrated in the experimental results, the CUSUM method, widely regarded as one of the best techniques for process monitoring, performs significantly worse than the new technique in terms of both detection accuracy and detection bias.

1 INTRODUCTION

Detecting changes (novelty, abnormality, irregularity, faults, etc.) in data is important in diverse application domains. To give a few examples: The ability to detect malicious intrusion attempts and identify denial-of-service attacks is a must in modern computer networks. A network of sensors might be deployed in a forest to monitor the humidity, temperature, and acidity of the soil; abnormal readings might indicate an incipient forest fire or illegal toxic-waste dumping. In signal analysis, a basic operation is to decompose a signal into stationary, or weakly stationary, segments with distinct statistical properties. Quality control in an assembly line requires continuous, on-line monitoring of the production process to identify it as either in control or out of control. In an aircraft or spacecraft navigation system, it is critical to identify and isolate faulty sensors and devices to maintain normal system operation under adverse conditions. In a water (or air) quality control system, it is important that an alarm be raised if an abnormal concentration of certain harmful substances (e.g., lead in the water or ozone in the air) is detected. All these problems fall in the realm of change/abnormality/fault/novelty detection. It is obvious that a good change-detection algorithm must be able to quickly and accurately identify the changes while raising as few false alarms as possible. In [6], change-detection algorithms are classified into two categories: those detecting changes (1) in independent random sequences¹ whose distributions are parameterized by scalar parameters, and (2) in dependent sequences characterized by a multi-dimensional parameter vector.
The latter can be further divided based on the types of changes allowed (either additive or non-additive/spectral changes) and the system models assumed (a linear or a nonlinear system model). In the case of a linear system model, the model can be an iterative model (corresponding to an FIR filter), a recursive model (corresponding to an IIR filter), or a state-space model (corresponding to the model used in the Kalman filter) [15, 26, 8, 25]. While the change-detection problems in the second category appear to be much more complicated than those in the first category, these two types of problems are actually intimately related. One systematic approach (Fig. 1) to detecting the changes in a dependent random sequence is to perform a "whitening" transform (or an inverse transform). The whitening transform T⁻¹, together with the original system transform T, produces an identity mapping as shown in Fig. 1. It has the effect of "peeling away" the dependency induced by the dependency models to reveal the "driving force" behind the system change. Often, the driving force can be treated as an independent random process, which implies that the change-detection algorithms in the first category are again applicable.

[Figure 1: A unified framework to address the change-detection problems of both dependent and independent random processes.]

¹An independent (dependent) random sequence is a realization of a random process where the random variables at each time instance are independent (not independent).

2 METHODS

The proposed technique is specifically designed to address change detection in independent random sequences. The algorithm operates on the hypothesis-and-verification principle: in the first stage, we compute the index where the changeover is most likely to occur within the processing window (hypothesis), and in the second stage, we validate whether such a hypothesis is correct (verification), as there might be no changeover at all.
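Before turning to the independent case, the whitening idea of Fig. 1 can be made concrete with a minimal sketch. Here a dependent sequence is generated by an AR(1) system transform, and the inverse transform recovers the independent driving sequence; the AR(1) model and its coefficient are illustrative assumptions, not part of the proposed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.8                         # assumed AR(1) coefficient (illustrative)
e = rng.normal(size=500)        # independent "driving force" e_t

# System transform T: the dependent sequence y_t = a*y_{t-1} + e_t
y = np.zeros_like(e)
for t in range(1, len(e)):
    y[t] = a * y[t - 1] + e[t]

# Whitening transform T^{-1}: e_t = y_t - a*y_{t-1} peels away the
# dependency, exposing the independent sequence to which the
# first-category detectors apply.
e_hat = y[1:] - a * y[:-1]
```

After the inverse transform, `e_hat` matches the original driving noise `e` (up to floating-point tolerance), so a change in the driving process can be detected with the independent-sequence machinery below.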
We formulate the change-detection problem in an independent random sequence mathematically as follows: Consider a data stream X_1, X_2, ..., X_n of length n, composed of random variables drawn from possibly two distributions², i.e., X_1, X_2, ..., X_{k-1} are I.I.D. variables having a distribution D_1 while X_k, X_{k+1}, ..., X_n are I.I.D. variables having a distribution D_2. The changeover point k is unknown and can be anywhere between 2 and n, or beyond (if k = 1 or k > n, then there is only one type of I.I.D. variable in X_1, ..., X_n). The core algorithm then determines the occurrence and the position of the changeover. For sensor-network and computer-network applications, it is especially important that the design be an on-line algorithm using a small amount of memory and a reasonable computation effort.

In an on-line application, data are continuously streamed from a source to a receiver. Our algorithm operates by sliding an analysis window of length n over the data stream X_i, i = 1, ..., and examines whether the changeover point occurs within the window. To simplify the notation, we denote the n data items within the analysis window X_1, ..., X_n, regardless of their true positions in the stream. Furthermore, we allow the algorithm to operate on a slightly longer data stream X_{1-m}, ..., X_0, X_1, ..., X_n, X_{n+1}, ..., X_{n+m}; that is, the processing window is padded with m extra data items at each end (X_{1-m}, ..., X_0 and X_{n+1}, ..., X_{n+m}). This padding is necessary to ascertain with accuracy the occurrence of the changeover point from index 2 up to and including index n. The window can be positioned in either an overlapped or a non-overlapped manner over the data stream.

²As will be shown in the experimental results, the analysis window size n in practice is very short. Hence, it is reasonable to assume that there is at most one changeover within the window.
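The padded-window bookkeeping above can be sketched as follows; the generator name and the use of a deque are illustrative choices, not the paper's implementation.

```python
from collections import deque

def padded_windows(stream, n, m):
    """Yield successive padded analysis windows
    [X_{1-m}, ..., X_0, X_1, ..., X_n, X_{n+1}, ..., X_{n+m}]
    of length n + 2m, slid one item at a time (the overlapped mode)."""
    buf = deque(maxlen=n + 2 * m)
    for x in stream:
        buf.append(x)
        if len(buf) == buf.maxlen:   # window full: emit a snapshot
            yield list(buf)

# A stream of 10 items with n = 4 and m = 2 yields 3 windows of length 8.
windows = list(padded_windows(range(10), n=4, m=2))
```

Sliding by one item per step corresponds to centering the window at every data item, which is what cuts the reporting latency from n/2 + m to m steps.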
If the window is positioned in a non-overlapped manner over successive groups of n data items, there is an average latency of n/2 + m steps in reporting the changeover, if one did occur within the window. If the window is allowed to center at each and every data item, the latency is cut to m steps. While we do not assume that the form or the parameters of the underlying distributions are known, we can nonetheless test for the changeover by observing whether certain statistical properties (such as the data stream's mean and variance) have changed. Denote the statistic to watch as f. Then for an index l, 1 < l ≤ n, within the processing window, we define the following four functions of l (the means and variances of the statistics of the before and after component distributions, separated at l)³:

\bar{f}_1(l) = \frac{\sum_{i=1-m}^{l-1} f(X_i)}{n_1}, \qquad s_{f_1}^2(l) = \frac{\sum_{i=1-m}^{l-1} \bigl(f(X_i) - \bar{f}_1(l)\bigr)^2}{n_1 - 1}, \quad \text{and}

\bar{f}_2(l) = \frac{\sum_{i=l}^{n+m} f(X_i)}{n_2}, \qquad s_{f_2}^2(l) = \frac{\sum_{i=l}^{n+m} \bigl(f(X_i) - \bar{f}_2(l)\bigr)^2}{n_2 - 1} \qquad (1)

where n_1 = m + l − 1, n_2 = n + m − l + 1, and n_1 + n_2 = n + 2m. If the windowed stream does contain two distributions, intuitively, each individual distribution should be homogeneous in f while the two distributions should be distinct in f. Hence, we define two statistics on the stream: the between-class

³To be consistent with the notation commonly used in statistics [46, 43], we denote the mean, standard deviation, and variance of the statistic f of a finite sample as \bar{f}, s_f, and s_f^2, while those of the underlying population are μ_f, σ_f, and σ_f^2. It is important to properly distinguish the sample quantities from the population quantities. For example, the sample variance for a sample of size n is computed by dividing the squared sum by n − 1 instead of n if the sample mean is estimated from the same data set.
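Eq. (1) transcribes directly into code. In this sketch f is taken as the identity (i.e., watching the raw values; any other statistic could be substituted), and the function name and the 0-indexed layout of the padded window are illustrative assumptions.

```python
import statistics

def split_stats(x, l, m):
    """Eq. (1): sample means and variances of the padded window
    x = [X_{1-m}, ..., X_{n+m}] split at index l, with f the identity.
    With 0-based indexing, X_l sits at offset m + l - 1, so the 'before'
    part holds n1 = m + l - 1 items and the 'after' part the remaining
    n2 = n + m - l + 1 items."""
    split = m + l - 1
    before, after = x[:split], x[split:]
    # statistics.variance divides by (count - 1), matching s^2 in Eq. (1)
    return (statistics.mean(before), statistics.variance(before),
            statistics.mean(after), statistics.variance(after))

# Padded window (n = 4, m = 1) with a changeover at l = 3:
stats = split_stats([1, 2, 3, 10, 11, 12], l=3, m=1)
# before-mean 2, before-variance 1, after-mean 11, after-variance 1
```

Note that the built-in `statistics.variance` already uses the n − 1 divisor discussed in footnote 3, which is why no correction term appears in the code.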
scatter:

s_b(l) = \frac{n_1 n_2}{n_1 + n_2} \bigl(\bar{f}_1(l) - \bar{f}_2(l)\bigr)^2 \qquad (2)

that measures the degree of dissimilarity between the two distributions (large values indicating a high degree of dissimilarity), and the within-class scatter:

s_w(l) = (n_1 - 1)\, s_{f_1}^2(l) + (n_2 - 1)\, s_{f_2}^2(l) \qquad (3)

that measures the degree of homogeneity within the two distributions, weighted by the sample sizes (small values indicating a high degree of homogeneity).
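Eqs. (2) and (3) likewise transcribe directly; as above, f is taken as the identity and the helper name is an illustrative choice. The example only evaluates the two quantities at a split, without prescribing how the algorithm combines them.

```python
import statistics

def scatters(x, l, m):
    """Between-class scatter s_b (Eq. 2) and within-class scatter s_w
    (Eq. 3) for the padded window x split at index l, with f the identity."""
    split = m + l - 1                     # n1 items fall before the split
    before, after = x[:split], x[split:]
    n1, n2 = len(before), len(after)
    f1, f2 = statistics.mean(before), statistics.mean(after)
    s_b = n1 * n2 / (n1 + n2) * (f1 - f2) ** 2
    s_w = ((n1 - 1) * statistics.variance(before)
           + (n2 - 1) * statistics.variance(after))
    return s_b, s_w

# At the true changeover the classes separate cleanly (n = 4, m = 1):
s_b, s_w = scatters([0, 0, 0, 5, 5, 5], l=3, m=1)
# s_b = 3*3/6 * 5**2 = 37.5 (high dissimilarity), s_w = 0 (perfect homogeneity)
```

At the correct split the between-class scatter is large and the within-class scatter small, which is the intuition the hypothesis stage exploits when ranking candidate indices l.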