THE MODIFIED CUSUM ALGORITHM FOR SLOW AND DRASTIC CHANGE DETECTION IN GENERAL HMMS WITH UNKNOWN CHANGE PARAMETERS

Namrata Vaswani

ECE Dept., Georgia Institute of Technology, Atlanta, GA 30332 [email protected]

ABSTRACT

We study the change detection problem in a general HMM when the change parameters are unknown and the change can be slow or drastic. Drastic changes can be detected easily using the increase in tracking error or the negative log of the observation likelihood (OL), but slow changes usually get missed. In past work we have proposed a statistic called ELL which works for slow change detection. Since single time instant estimates of any statistic can be noisy, we propose a modification of the Cumulative Sum (CUSUM) algorithm which can be applied to both ELL and OL and thus improves both slow and drastic change detection performance.

1. INTRODUCTION

Change detection is required in many practical problems arising in flight control, fault detection and in surveillance problems like abnormal activity detection [1]. In most cases, the underlying system in its normal state can be modeled as a parametric stochastic model which is usually nonlinear. The observations are usually noisy (making the system partially observed). Such a system forms a "general HMM" [2] (also referred to as a "partially observed nonlinear dynamical model" or a "stochastic state space model" in different contexts). It can be approximately tracked (i.e., the hidden state variables can be estimated given the observations) using a Particle Filter (PF) [3].

We study here the change detection problem in a general HMM when the change parameters are unknown and the change can be slow or drastic. We use a PF to estimate the posterior probability distribution of the state at time t, X_t, given observations up to t, π_t(dx) ≜ Pr(X_t ∈ dx | Y_{1:t}). Even when change parameters are unknown, drastic changes can be detected easily using the increase in tracking error [4] or the negative log of the observation likelihood (OL). OL is the negative log likelihood of the current observation conditioned on all past observations. But slow changes usually get missed. In past work [5], we have proposed a statistic called ELL (Expected Log-Likelihood) which is able to detect slow changes, and we have also shown the complementary behavior of ELL and OL for slow and drastic changes [1].

Now, single time instant estimates of ELL or OL may be noisy and are prone to outliers. Hence in practice, one needs to average the statistic over a set of past time instants. A principled way of doing this is the CUSUM algorithm [6]. For known change parameters, the CUSUM algorithm finds the maximum, over all previous time instants t − p + 1, of the observation likelihood ratio LR(p, t), assuming that the change occurred at time t − p + 1 (see equation (2) in Section 3). For linear systems the LR is well defined, but for nonlinear systems linearization techniques like extended Kalman filters are the main tools [6]. These are computationally efficient but are not always applicable. In [7], the authors use a particle filtering approach for evaluating the CUSUM statistic without linearization. In the most general case (when the PF is not asymptotically stable), this would require running (1 + t) PFs to evaluate CUSUM at time t; the authors of [7] therefore define a modification of the CUSUM statistic whose computational complexity does not grow with t.

For unknown changed-system parameters, the LR can be replaced by the Generalized Likelihood Ratio (GLR). The GLR solution for linear systems is well known [6]. But for nonlinear systems, CUSUM applied to GLR would require running one PF for each value of the unknown parameter. In [8], the authors considered a case where the unknown parameter belongs to a finite set with cardinality P. This required running (1 + Pt) PFs at time t to evaluate CUSUM at t. Hence the authors proposed to search over a finite past of length ∆, i.e., to use CUSUM_{t,∆} = max_{0≤p≤∆} LR(p, t). For the general case where the unknown parameter belongs to an uncountable set, ML parameter estimation procedures have been proposed; see the recent survey article [9]. But most of these use a PF estimate of the joint posterior of all states, π_t^N(dx_{1:t} | Y_{1:t}), and the error in the joint posterior estimated using a fixed number of particles increases with time [9] (the estimate is unstable). Also, all of the above algorithms are based on the observation LR and hence are unable to detect slow changes [1].

In this paper, we propose to modify the CUSUM algorithm in a different way. This modification is applicable to any single time instant change detection statistic. We apply it to both OL and ELL and so are able to detect drastic as well as slow changes. Also, it can be evaluated using a single PF even when the unknown change parameter belongs to an uncountable set, and the CUSUM statistic estimate is stable. The paper is organized as follows: In Section 2, we discuss ELL and OL. The modified CUSUM algorithm and its application to ELL and OL are discussed in Section 3. In Section 4, we study two example applications, discuss their stability and show simulation results. Conclusions are presented in Section 5.

1.1. The General HMM Model and Problem Definition

The system (or state) process {X_t} is a Markov process with state transition kernel Q_t(x_t, dx_{t+1}), and the observation process is a memoryless function of the state given by Y_t = h_t(X_t) + w_t, where w_t is an i.i.d. noise process and h_t is, in general, a nonlinear function. The conditional distribution of the observation given the state, G_t(dy_t, x_t), is assumed to be absolutely continuous, and its pdf is given by g_t(Y_t, x) ≜ ψ_t(x). Now p_0(x) (the prior initial state distribution), G_t(dy_t, x_t) and Q_t(x_t, dx_{t+1}) are known and assumed to be absolutely continuous. (For ease of notation, we denote a pdf either by the same symbol as the probability distribution or by its lowercase.) Thus the prior distribution of the state at any t is also absolutely continuous and admits a density, p_t(x). We use the superscript c to denote any parameter related to the changed system, 0 for the original system, and c,0 for the case when the observations of the changed system are filtered using a filter optimal for the original system. (At most places, the superscript 0,0 is abbreviated to 0 and c,c to c.) Also, the unnormalized filter kernel [2] is defined as R_t(x_t, dx_{t+1}) ≜ ψ_t(x_{t+1}) Q_t(x_t, dx_{t+1}).

We assume that the normal (original/unchanged) system has state transition kernel Q_t^0. A change (which can be slow or drastic) in the system model begins to occur at some finite time t_c and lasts till a final finite time t_f. In the time interval [t_c, t_f], the state transition kernel is Q_t^c, and after t_f it again becomes Q_t^0. Both Q_t^c and the change start and end times t_c, t_f are assumed to be unknown. The goal is to detect the change with minimum delay.

The Kerridge Inaccuracy [10] between two pdfs p and q is defined as K(p : q) ≜ ∫ p(x)[−log q(x)] dx. We have ELL(Y_{1:t}) = E_{π_t}[−log p_t^0(x)] = K(π_t : p_t^0).

Why ELL works: Taking the expectation of ELL(Y_{1:t}^0) = K(π_t^0 : p_t^0) over normal observation sequences, we get E_{Y_{1:t}^0}[ELL(Y_{1:t}^0)] = E_{Y_{1:t}^0} E_{π_t^0}[−log p_t^0(x)] = E_{p_t^0}[−log p_t^0(x)] = H(p_t^0) = K(p_t^0 : p_t^0) ≜ EK_t^0, where H(.) denotes entropy. Similarly, for the changed system observations, E_{Y_{1:t}^c}[ELL(Y_{1:t}^c)] = K(p_t^c : p_t^0) ≜ EK_t^c, i.e., the expectation of ELL(Y_{1:t}^c) taken over changed-system observation sequences is actually the Kerridge Inaccuracy between the changed system prior, p_t^c, and the original system prior, p_t^0, which will be larger than the Kerridge Inaccuracy between p_t^0 and itself (the entropy of p_t^0) [11].

Thus, ELL will detect the change when EK_t^c is "significantly" larger than EK_t^0. Setting the change threshold to κ_t ≜ EK_t^0 + 3√(VK_t^0), where VK_t^0 = Var_{Y_{1:t}^0}(K_t^0), will ensure a false alarm probability of less than 0.11 (this follows from the Chebyshev inequality [12]). By the same logic, if EK_t^c − 3√(VK_t^c) > κ_t, then the miss probability [12] (the probability of missing the change) will also be less than 0.11.

1.2. Approximate Non-linear Filtering using a Particle Filter

The problem of nonlinear filtering is to compute, at each time t, the conditional probability distribution of the state X_t given the observation sequence Y_{1:t}, π_t(dx) = Pr(X_t ∈ dx | Y_{1:t}). The transition from π_{t−1} to π_t is:

π_{t−1} → π_{t|t−1} = Q_t(π_{t−1}) → π_t = ψ_t π_{t|t−1} / ⟨π_{t|t−1}, ψ_t⟩.

A Particle Filter [3] is a recursive algorithm for approximate nonlinear filtering which produces, at each time t, a cloud of N particles {x_t^(i)} whose empirical measure, π_t^N, closely "follows" π_t. At t = 0, it samples N times from π_0 to approximate it by π_0^N(dx) ≜ (1/N) Σ_{i=1}^N δ_{x_0^(i)}(dx). Then, for each t, it runs the Bayes recursion:

π_{t−1}^N ≜ (1/N) Σ_{i=1}^N δ_{x_{t−1}^(i)}(dx) → π_{t|t−1}^N ≜ (1/N) Σ_{i=1}^N δ_{x̄_t^(i)}(dx) → π̄_t^N ≜ Σ_{i=1}^N w_t^(i) δ_{x̄_t^(i)}(dx) → π_t^N ≜ (1/N) Σ_{i=1}^N δ_{x_t^(i)}(dx),

where x̄_t^(i) ∼ Q_t(x_{t−1}^(i), dx), x_t^(i) ∼ Multinomial({x̄_t^(i), w_t^(i)}_{i=1}^N) and w_t^(i) ≜ ψ_t(x̄_t^(i)) / (N ⟨π_{t|t−1}^N, ψ_t⟩).

2. CHANGE DETECTION STATISTICS: ELL AND OL

2.1. The ELL statistic

"Expected (negative) Log Likelihood" or ELL [5] at time t is the conditional expectation of the negative logarithm of the prior likelihood of the state at time t, under the no-change hypothesis (H_0), given observations till time t, i.e.,

ELL(Y_{1:t}) ≜ E[−log p_t^0(x) | Y_{1:t}] = E_{π_t}[−log p_t^0(x)].   (1)

Using the PF estimate π_t^N, the ELL estimate becomes ELL^N = (1/N) Σ_{i=1}^N [−log p_t^0(x_t^(i))]. ELL as defined above is equal to the Kerridge Inaccuracy [10] between the posterior and prior state pdfs.

2.2. When ELL fails: The OL Statistic

The above analysis assumed no estimation errors in evaluating ELL; if this were true, ELL would work for drastic changes as well. But the PF being used is optimal for the unchanged system. Hence, when estimating π_t (required for evaluating ELL) for the changed system, there is "exact filtering error" [1]. The particle filtering error is also much larger in this case. The upper bound on the approximation error in estimating ELL is proportional to the "rate of change" [1]. Hence ELL is approximated accurately for a slow change, and thus it detects such a change as soon as it becomes "detectable" [1]. But ELL fails to detect drastic changes because of the large estimation error in evaluating π_t. A large estimation error in evaluating π_t, however, also corresponds to a large value of OL (Observation Likelihood), which can be used for detecting such changes (Theorem 2.4 of Chapter 2 of [1]). OL, as explained earlier, is the negative log likelihood of the current observation conditioned on past observations under the no-change hypothesis, i.e., OL_t ≜ −log Pr(Y_t | Y_{1:t−1}, H_0). It is evaluated using a PF as OL_t^N = −log⟨Q_t^0 π_{t−1}^N, ψ_t⟩. OL, on the other hand, takes longer to detect a slow change, or may not detect it at all (discussed in [1]).

3. THE MODIFIED CUSUM ALGORITHM

As noted above, single time instant estimates of ELL or OL may be noisy and prone to outliers, so in practice one needs to average the statistic over a set of past time instants. Averaging OL over the past p time instants gives aOL(p, t) = (1/p)[−log Pr(Y_{t−p+1:t} | Y_{1:t−p})]. Average ELL is aELL(p, t) = (1/p) Σ_{k=t−p+1}^t ELL(Y_{1:k}). But since ELL(Y_{1:t}) is not stationary, and hence also not ergodic, this averaging cannot be justified for large p. For small p, one can use the standard assumption of approximating any nonstationary process by a piecewise stationary process over small time intervals (the same idea used to justify any other moving average). Instead of aELL, one can evaluate the joint ELL, jELL(p, t) = (1/p) E[−log p_{t−p+1:t}(X_{t−p+1:t}) | Y_{1:t}], which is the Kerridge Inaccuracy between the joint posterior distribution of X_{t−p+1:t} given Y_{1:t} and their joint prior. If using aELL(p, t), the threshold can be set as Th^aELL(p, t) = (1/p) E_{Y_{1:t}}[aELL(p, t)], which is the sum of the individual entropies of X_{t−p+1:t}. If using jELL(p, t), Th^jELL(p, t) = (1/p) E_{Y_{1:t}}[jELL(p, t)], which is the joint entropy of X_{t−p+1:t}.

Now, for the case of known change parameters, the CUSUM statistic applied to the observation likelihood ratio is [6, 7]

CUSUM_t = max_{1≤p≤t} log LR(p, t), where
log LR(p, t) = log Pr^c(Y_{t−p+1:t} | Y_{1:t−p}) − log Pr^0(Y_{t−p+1:t} | Y_{1:t−p}) = [aOL^c(p, t) − aOL^0(p, t)],   (2)

and a change is declared if CUSUM_t > λ for some positive threshold λ. For unknown change parameters and for any statistic, denoted stat(p, t), one can modify this as follows: set a threshold for stat(p, t) under normal observations, Th^stat(p, t) (usually we take Th^stat(p, t) = E_{Y_{1:t}}[stat(p, t)]), and replace log LR(p, t) by [stat(p, t) − Th^stat(p, t)], i.e., use

CUSUM_t^stat = max_{1≤p≤t} [stat(p, t) − Th^stat(p, t)],   (3)

where stat is aOL^0, aELL or jELL. A change is declared if CUSUM_t^stat > λ. The change time is given by t − p* + 1, where p* is the argument maximizing [stat(p, t) − Th^stat(p, t)] over p. Now, the error in the PF estimate of the joint posterior of X_{t−p+1:t} given Y_{1:t} increases as p increases and, as a rule of thumb, the estimates are meaningless for p > 5 [9]. Also, as explained above, aELL can be justified only for small values of p. Thus in practice we use the following modification (with ∆ = 5):

CUSUM_{t,∆}^stat = max_{1≤p≤∆} [stat(p, t) − Th^stat(p, t)].   (4)

4. TWO APPLICATIONS

4.1. Bearings-only Target Tracking

In bearings-only target tracking (details in [3]), the target moves on the x-y plane according to the standard second order model:

X_t = Φ X_{t−1} + Γ n_t,  Φ = [1 1 0 0; 0 1 0 0; 0 0 1 1; 0 0 0 1],  Γ = [0.5 0; 1 0; 0 0.5; 0 1],   (5)

where X_t = [x_{1,t}, ẋ_{1,t}, x_{2,t}, ẋ_{2,t}]^T. Here x_{1,t}, x_{2,t} denote the x and y components of the target location and ẋ_{1,t}, ẋ_{2,t} denote the x and y components of the target velocity. The observation, Y_t, is a noisy measurement of the target bearing, Y_t = tan^{−1}(x_{2,t}/x_{1,t}) + w_t. In this case, the observation model is nonlinear, but the system model is linear Gaussian and hence p_t(x) can be defined in closed form. The system noise, n_t, is zero mean i.i.d. Gaussian with Σ_sys = 0.001 I, and the observation noise is zero mean i.i.d. truncated Gaussian with variance σ_obs² = 0.005 and truncation parameter B = 10 σ_obs. We attempt to detect a change in the state dynamics, where the change is due to an additive bias of Γ [r σ_sys, 0]^T for 10 time steps starting at t = 5. The initial state was assumed to be known (zero variance).

Stability: With the following mild assumption, the above system satisfies the stability results of [1]: assume that for each t, the x component of the target location is bounded, i.e., there exists a P_t < ∞ s.t. −P_t ≤ x_{1,t} ≤ P_t. Using this assumption, along with the fact that h(x) = tan^{−1}(x_{2,t}/x_{1,t}) and that the observation noise is truncated, it is easy to see that the support set of ψ_t(x), E_{x,Y_t}, is compact. By Example 3.10 of [2], this, along with the facts that Φ X_{t−1} is continuous, that π_0 has compact support and that the system noise is Gaussian, makes the unnormalized filter kernels R_t^0, R_t^{c,0}, R_t^c mixing. Also, since E_{x,Y_t} is compact, sup_{x∈E_{x,Y_t}} ψ_t(x) is finite and M_t = sup_{x∈E_{x,Y_t}}[−log p_t(x)] is finite (continuous functions, here ψ_t(x) and [−log p_t(x)], map compact sets onto compact sets). Thus we satisfy all assumptions of Theorem 2.2 of Chapter 2 of [1]. Also, using an argument similar to that in Section 2.7.1 of [1] (the difference here is that, since −P_t ≤ x_{1,t} ≤ P_t, the system noise is truncated Gaussian distributed instead of Gaussian), assumption (iv)' of the stronger Theorem 2.1 is also satisfied and hence it holds, i.e., the error between the true ELL and its approximation, averaged over PF realizations and observation sequences, is eventually monotonically decreasing with time for large N, and hence is stable. This implies that the error in the CUSUM-ELL (a finite sum of past ELLs) estimate is also stable.

Simulation Results: The ROC plots of Figure 1 compare the performance of ELL (blue -o), OL (red -*), CUSUM-ELL (black -square) and CUSUM-OL (magenta -x). An ROC (Receiver Operating Characteristics) plot for a change detection problem [6] is obtained by plotting average detection delay against average time between false alarms for different values of the detection threshold. We simulated 20 realizations and calculated the average detection delay and the average time between false alarms for different values of the detection threshold. As can be seen from the figure, CUSUM-ELL performs much better than ELL in all 3 cases. Also, CUSUM-ELL and ELL significantly outperform CUSUM-OL and OL for the slow change (r=1). In fact, in this example, because of the nature of h(x), CUSUM-ELL outperforms CUSUM-OL and OL even for the faster (r=10) and drastic (r=20) changes.

4.2. Nonlinear State Dynamics

If the system dynamics is nonlinear, in most cases it is not possible to define p_t(x) in closed form. One solution for such cases is to use prior knowledge to define p_t(x) as coming from a certain parametric family, for example a Gaussian or a mixture of Gaussians. We study here an example discussed in [3] which has the following nonlinear state dynamics:

X_t = X_{t−1}/2 + 25 X_{t−1}/(1 + X_{t−1}²) + 8 cos(1.2(t−1)) + n_t  and  Y_t = X_t²/20 + w_t,

where n_t is Gaussian system noise with σ_sys² = 10 and w_t is truncated Gaussian observation noise with σ_obs² = 1. The initial state is taken to be zero. We introduce a change by adding a bias r σ_sys to the state equation for 10 time instants starting at t = 5. Based on prior knowledge, we take p_t(x) to be a Gaussian with mean Σ_{τ=1}^t 8 cos(1.2(τ − 1)) and variance t σ_sys².

Stability: It is easy to see that ψ_t(x) has compact support, E_{x,Y_t}. Also, the system dynamics is continuous and the system noise is Gaussian. This, along with the fact that π_0 has compact support, makes R_t^0, R_t^{c,0}, R_t^c mixing. Since E_{x,Y_t} is compact, sup_{x∈E_{x,Y_t}} ψ_t(x) is finite and M_t = sup_{x∈E_{x,Y_t}}[−log p_t(x)] is finite. Thus all assumptions of Theorem 2.2 of Chapter 2 of [1] are satisfied.

Simulation Results: We show ROC plots in Figure 2 comparing the performance of ELL (blue -o), OL (red -*), CUSUM-aELL (black -square), CUSUM-jELL (green -△) and CUSUM-OL (magenta -x). Here ELL outperforms OL for the slow (r=1) and faster (r=2) changes.

[Figure 1: three ROC panels, (a) Slow Change, (b) Faster Change, (c) Drastic Change; each plots Average Detection Delay vs. Mean Time between False Alarms.]

Fig. 1. Bearings-only tracking example: ROC curves comparing ELL, CUSUM on aELL, OL and CUSUM on OL
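All of the statistics compared in these ROC curves are computed from the bootstrap PF of Section 1.2 together with the PF-based ELL estimate of Section 2.1. The following is a minimal sketch, assuming user-supplied callables for sampling the transition kernel Q_t, the observation likelihood ψ_t and the prior log-pdf (all names are illustrative, not from the paper):

```python
import numpy as np

def pf_step(particles, sample_q, psi, rng):
    """One Bayes-recursion step of the bootstrap PF (Section 1.2):
    predict through Q_t, weight by psi_t, then multinomial resampling."""
    xbar = sample_q(particles, rng)        # xbar^(i) ~ Q_t(x_{t-1}^(i), dx)
    w = psi(xbar)                          # unnormalized weights psi_t(xbar^(i))
    w = w / w.sum()                        # normalize
    idx = rng.choice(len(xbar), size=len(xbar), p=w)  # multinomial resampling
    return xbar[idx]                       # equally weighted cloud pi_t^N

def ell_estimate(particles, log_p0):
    """ELL^N = (1/N) * sum_i [-log p_t^0(x_t^(i))] (Section 2.1)."""
    return -np.mean(log_p0(particles))
```

Running pf_step once per time instant and feeding the resulting cloud to ell_estimate gives the per-instant ELL sequence that aELL and the modified CUSUM then average.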

[Figure 2: three ROC panels, (a) Slow Change (r=1), (b) Faster Change (r=2), (c) Drastic Change (r=10); each plots Average Detection Delay vs. Mean Time between False Alarms for ELL, CUSUM-aELL, CUSUM-jELL, OL and CUSUM-OL.]

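The CUSUM_{t,∆} detector of equation (4), used for all CUSUM curves in Figures 1 and 2, reduces to a maximization over a short window. A sketch, with stat(p, t) and Th^stat(p, t) supplied as callables (this interface is our choice, not the paper's):

```python
import numpy as np

def modified_cusum(stat, th, t, delta=5):
    """CUSUM_{t,Delta}^{stat} = max_{1<=p<=Delta} [stat(p,t) - Th^{stat}(p,t)]
    (equation (4)). Returns the statistic value and the maximizing p,
    so the estimated change time can be read off as t - p_star + 1."""
    ps = range(1, min(delta, t) + 1)
    vals = [stat(p, t) - th(p, t) for p in ps]
    p_star = int(np.argmax(vals)) + 1
    return vals[p_star - 1], p_star
```

A change is declared at time t when the returned value exceeds the design threshold λ; sweeping λ traces out the ROC curves shown above.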
Fig. 2. Nonlinear System Model example: ROC curves comparing ELL, CUSUM on aELL, CUSUM on jELL, OL and CUSUM on OL. Note that in (b), CUSUM-aELL and CUSUM-jELL remain at zero, while in (c), CUSUM-OL and OL remain at zero.

Similarly, CUSUM-aELL and CUSUM-jELL outperform CUSUM-OL for the slow (r=1) and faster (r=2) changes, and vice versa for the drastic (r=10) change. In fact, for r=10, CUSUM-OL and OL have zero detection delay while CUSUM-ELL and ELL completely fail. Also, in all cases, CUSUM-ELL is better than ELL and CUSUM-OL is better than OL. CUSUM-aELL and CUSUM-jELL have comparable performance: jELL is better at lower thresholds while aELL is better at higher thresholds.

5. CONCLUSIONS

We have modified the CUSUM algorithm to work for change detection in nonlinear systems with unknown change parameters. This modification can be applied to the ELL statistic [5] and is the first application of CUSUM to slow change detection. Also, our modified CUSUM can be evaluated using a single PF, and the CUSUM statistic estimates are stable. As part of future work, we would like to use ML parameter estimation [9] to reduce the "exact filtering error" [1] in the ELL (and CUSUM-ELL) approximation.

6. REFERENCES

[1] N. Vaswani, Change Detection in Stochastic Shape Dynamical Models with Applications in Activity Modeling and Abnormality Detection, Ph.D. Thesis, ECE Dept., University of Maryland at College Park, August 2004.
[2] F. LeGland and N. Oudjane, "Stability and Uniform Approximation of Nonlinear Filters using the Hilbert Metric, and Application to Particle Filters," Tech. Report RR-4215, INRIA, 2002.
[3] N.J. Gordon, D.J. Salmond, and A.F.M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings-F (Radar and Signal Processing), vol. 140, no. 2, pp. 107–113, 1993.
[4] Y. Bar-Shalom and T.E. Fortmann, Tracking and Data Association, Academic Press, 1988.
[5] N. Vaswani, "Change detection in partially observed nonlinear dynamic systems with unknown change parameters," in American Control Conference (ACC), 2004.
[6] M. Basseville and I. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice Hall, 1993.
[7] B. Azimi-Sadjadi and P.S. Krishnaprasad, "Change detection for nonlinear systems: A particle filtering approach," in ACC, 2002.
[8] P. Li and V. Kadirkamanathan, "Particle filtering based likelihood ratio approach to fault diagnosis in nonlinear stochastic systems," IEEE Trans. Syst., Man, Cybern. C, vol. 31, pp. 337–343, August 2001.
[9] C. Andrieu, A. Doucet, S.S. Singh, and V.B. Tadic, "Particle methods for change detection, system identification, and control," Proc. of the IEEE, vol. 93, pp. 423–438, March 2004.
[10] D.F. Kerridge, "Inaccuracy and inference," J. Royal Statist. Society, Ser. B, vol. 23, 1961.
[11] R. Kulhavy, "A geometric approach to statistical estimation," in IEEE CDC, Dec. 1995.
[12] G. Casella and R. Berger, Statistical Inference, Duxbury Thomson Learning, second edition, 2002.