Tracking Dynamic Sparse Signals with Hierarchical Kalman Filters: A Case Study

Jason Filos, Evripidis Karseras and Wei Dai
Electrical and Electronic Engineering, Imperial College London, London, UK
{j.filos, e.karseras11, wei.dai1}@imperial.ac.uk

Shulin Yan
Department of Computing, Imperial College London, London, UK
[email protected]

Abstract—Tracking and recovering dynamic sparse signals using traditional Kalman filtering techniques tends to fail. Compressive sensing (CS) addresses the problem of reconstructing signals whose support is assumed to be sparse, but is not fit for dynamic models. This paper provides a study of the performance of a hierarchical Bayesian Kalman (HB-Kalman) filter that succeeds in promoting sparsity and accurately tracks time-varying sparse signals. Two case studies using real-world data show how the proposed method outperforms the traditional Kalman filter when tracking dynamic sparse signals. It is shown that the Bayesian Subspace Pursuit (BSP) algorithm, which is at the core of the HB-Kalman method, achieves better performance than previously proposed greedy methods.

Index Terms—Kalman filtering; compressed sensing; sparse Bayesian learning; sparse representations.

I. INTRODUCTION

In this work we consider the problem of reconstructing time sequences of signals that are assumed to be sparse in some transform domain. Recent studies in the theory of compressed sensing (CS) have shown that sparse signals can be recovered accurately from fewer observations than is considered necessary by traditional sampling criteria [1], [2]. However, there are a number of practical limitations. First, the recovery of sparse signals using CS requires solving an NP-hard minimization problem [1], [3]. Secondly, CS reconstruction is not fit for dynamic models. Existing solutions that address the dynamic problem either treat the entire time sequence as a single spatiotemporal signal and perform CS to reconstruct it [4], or alternatively apply CS at each time instance separately [5]. In [6] a non-Bayesian CS reconstruction is presented that assumes known sparsity levels and a non-dynamic model. A class of adaptive filters, based on the least mean squares (LMS) algorithm, is presented in [7] and [8]; the adaptive framework of the LMS is used in these approaches for CS. However, these approaches are not fit for dynamic sparse signal reconstruction.

For a single time instance, the problem of sparse signal reconstruction was solved in [9] using a Bayesian network that elegantly promotes sparsity. This learning framework, referred to as the Relevance Vector Machine (RVM), results in highly sparse models for the input and has gained popularity in the signal processing community for its use in compressed sensing applications [5] and basis selection [10]. Sparsity is rendered possible by a hierarchy of prior distributions that intuitively confines the space of all possible states. The key fact behind this technique is that it provides estimates of full distributions. Non-Bayesian sparse recovery algorithms do not take into account the signals' statistics, which makes their use in tracking sparse signals difficult. The resulting statistical information can be used to make predictions for future states without the risk of them not being sparse.

For multiple time instances the system state can be tracked accurately using Kalman filtering. Unfortunately, the classic Kalman approach is not fit for sparse signals. A Kalman filtering-based CS approach was first presented in [11], where the filter is externally modified to admit sparse solutions. The idea in [11] and [12] is to enforce sparsity by applying threshold operators. Work in [13] adopts a probabilistic model, but signal amplitudes and support are estimated separately. Finally, the techniques presented in [14] incorporate prior sparsity knowledge into the tracking process. All these approaches typically require a number of parameters to be pre-set.

In this work a sparsity-promoting Bayesian network is employed that extends the data model adopted in traditional tracking. The problem of sparse signal recovery is tackled efficiently by the hierarchical nature of the Bayesian model. The statistical information that is obtained as a by-product of the hierarchical model is then incorporated in updating previous estimates to produce sparsity-aware state estimates. Additionally, the automatic determination of the active components removes the need to assume fixed sparsity levels and leaves only the noise parameter to be adjusted manually. The hierarchical Bayesian Kalman filter (Section II), proposed in [15], [16], is tested on real data. First, the time-domain signal of a real piano recording is reconstructed using only 25% of the originally sampled data (Section III). Second, incomplete satellite data, measuring the distribution of ozone gas in the planet's atmosphere, is recovered to yield the original content (Section IV). On the basis of these two case studies, simulations show how the proposed method outperforms both the traditional Kalman filter, when dealing with dynamic sparse signals, and a state-of-the-art greedy CS reconstruction algorithm.
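To make the greedy CS alternatives discussed above concrete, the following is a minimal orthogonal matching pursuit (OMP) sketch of the kind the later sections benchmark against. It is an illustrative implementation for a single time instance, not the paper's BSP algorithm; the dimensions and sparsity level are chosen arbitrarily.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit: greedily pick the column of Phi most
    correlated with the residual, then re-fit by least squares."""
    n = Phi.shape[1]
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(sparsity):
        # column with the largest absolute correlation to the residual
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares re-fit on the current support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

# recover a 3-sparse vector from m = 40 random Gaussian measurements
rng = np.random.default_rng(0)
n, m = 128, 40
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
x_true = np.zeros(n)
x_true[[5, 40, 99]] = [1.0, -2.0, 0.5]
x_hat = omp(Phi, Phi @ x_true, sparsity=3)
```

Note that the sparsity level must be supplied here; removing that requirement is precisely the point of the hierarchical model developed next.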
II. SYSTEM MODEL

Let the random variables x_t and y_t, at some time t, describe the system state and observations respectively. The standard Kalman filter formulation is given by

x_t = x_{t-1} + q_t,  (1)
y_t = \Phi_t x_t + n_t,  (2)

where q_t and n_t denote the process and measurement noise and \Phi_t is a design matrix. We assume that the signal x_t \in \mathbb{R}^n is sparse in some transform domain and that the entries of \Phi_t \in \mathbb{R}^{m \times n} are independent identically distributed Gaussian random variables drawn from \mathcal{N}(0, 1/m).

Based on the Gaussian assumption one has p(x_t | x_{t-1}) = \mathcal{N}(x_{t-1}, Q_t) and p(y_t | x_t) = \mathcal{N}(\Phi_t x_t, \sigma^2 I), with p(q_t) = \mathcal{N}(0, Q_t) and p(n_t) = \mathcal{N}(0, \sigma^2 I), so that the Kalman filter continuously alternates between the prediction and update steps. The prediction step calculates the parameters of p(x_t | y_{t-1}) while the update step evaluates those of p(x_t | y_t).

In order to address the sparsity of the state vector, an additional level of parameters \alpha is introduced to control the variance of each component x_i [9]:

p(x | \alpha) = \prod_{i=1}^{n} \mathcal{N}(0, \alpha_i^{-1}) = \mathcal{N}(0, A^{-1}),

where A = \mathrm{diag}([\alpha_1, \ldots, \alpha_n]). By driving \alpha_i \to +\infty it follows that p(x_i | \alpha_i) = \mathcal{N}(0, 0) and consequently it is certain that x_i = 0.

A. Hierarchical Kalman filter

The principles behind the Kalman filter and sparse Bayesian learning (SBL) are put together to derive the Hierarchical Bayesian Kalman (HB-Kalman) filter. The proposed approach has several advantages. First, by adopting the same system model as described in Equations (1) and (2), one can track the mean and covariance of the state vector x_t. Second, the employment of hyper-parameters to model the state innovation promotes sparsity.

The measurement noise is chosen to be Gaussian with known covariance, i.e. n_t \sim \mathcal{N}(0, \sigma^2 I). The state innovation process is given by q_t \sim \mathcal{N}(0, A_t^{-1}), with A_t = \mathrm{diag}(\alpha_t) = \mathrm{diag}([\alpha_1, \ldots, \alpha_n]), and the hyper-parameters \alpha_i are learned online from the data, as opposed to the traditional Kalman filter where the covariance matrix Q_t of q_t is given.

Similar to the standard Kalman filter, two steps, prediction and update, need to be performed at each time instance. In the prediction step, one evaluates:

x_{t|t-1} = x_{t-1},  \Sigma_{t|t-1} = \Sigma_{t-1} + A_t^{-1},
y_{t|t-1} = \Phi_t x_{t|t-1},  y_{e,t} = y_t - y_{t|t-1},  (3)

where the notation t|t-1 denotes the prediction at time instance t given measurements up to time instance t-1. In the update step, one computes:

x_{t|t} = x_{t|t-1} + K_t y_{e,t},  \Sigma_{t|t} = (I - K_t \Phi_t) \Sigma_{t|t-1},
K_t = \Sigma_{t|t-1} \Phi_t^T (\sigma^2 I + \Phi_t \Sigma_{t|t-1} \Phi_t^T)^{-1}.

Differently from the standard Kalman filter, one has to perform the additional step of learning the hyper-parameters \alpha_t. From Equation (3) we get y_{e,t} = \Phi_t q_t + n_t, where a sparse q_t is preferred so as to produce a sparse x_t. Following the analysis in [9] and [17], maximising the likelihood p(y_t | \alpha_t) is equivalent to minimising the following cost function:

L(\alpha_t) = \log |\Sigma_\alpha| + y_{e,t}^T \Sigma_\alpha^{-1} y_{e,t},  (4)

where \Sigma_\alpha = \sigma^2 I + \Phi_t A_t^{-1} \Phi_t^T.

B. Bayesian Subspace Pursuit

The algorithms described in [17], which optimise cost function (4), are greedy algorithms at heart. An important observation can be made by deriving the version of this cost function scaled by the noise variance. After some basic linear algebra manipulation, and for a single time instance, the function becomes:

\sigma^2 L = \sigma^2 \log |\sigma^2 I + \Phi_I A_I^{-1} \Phi_I^T| + y^T (I - \Phi_I (\sigma^2 A_I + \Phi_I^T \Phi_I)^{-1} \Phi_I^T) y,  (5)

where the subscript I denotes the subset of columns of \Phi for which 0 < \alpha_i < +\infty. By taking the limit of Equation (5) as the noise variance approaches zero we obtain:

\lim_{\sigma^2 \to 0} \sigma^2 L(\alpha) = \| y - \Phi_I \Phi_I^\dagger y \|_2^2.  (6)

This result suggests that the principle behind the optimisation of the cost function is the same as the one used in Basis Pursuit and subsequently in many greedy sparse reconstruction algorithms such as OMP [18] and Subspace Pursuit [19]. In fact, it can be shown that the proposed algorithm requires stricter bounds on the mutual coherence of the design matrix \Phi than the OMP algorithm. The proposed algorithm, described in Algorithm 1, adopts attributes from Subspace Pursuit into the optimisation procedure. The qualities of the inference procedure are consequently improved. For a detailed analysis of the mathematical derivations the interested reader is referred to [16]. For clarity, the fundamental quantities needed for Algorithm 1 are listed below. These are the scaled versions of the corresponding quantities derived in [17]:

\sigma^2 s_i = \phi_i^T (\sigma^2 C_{-i}^{-1}) \phi_i,  \sigma^2 q_i = \phi_i^T (\sigma^2 C_{-i}^{-1}) y.

The motivation for deriving scaled versions of the quantities given in [17] is the poor performance of the original derivations when the noise variance is known a priori. It is simple to ascertain this fact by setting \sigma^2 = 0 in [17]. This is an important observation since in most real-world tracking scenarios the noise floor is usually estimated or specified by the sensor manufacturer.

Algorithm 1 Bayesian Subspace Pursuit
Input: \Phi, y, \sigma^2
Initialise:
- Initialise iteration counter, k = 1.
- Calculate \theta_i = |\phi_i^T y| - 1 for i \in [1, n].
- T_k = \{ i \in [1, n] : \theta_i > 0 \}.
- If |T_k| = 0 then T_k = \{ the index i \in [1, n] for which |\phi_i^T y| is maximised \}.
- Calculate \alpha_i = 1 / |\phi_i^T y| for i \in T_k.
Iteration:
- Store \alpha_{max} = \max_{i \in T_k} |\alpha_i|.
- Calculate the values \alpha_i and \theta_i = (\sigma^2 q_i)^2 - (\sigma^2 s_i)^2 for i \in [1, n].
- Construct the subsets T_{\theta > 0} = \{ i \in [1, n] : \theta_i > 0 \} and T_{\alpha \le \alpha_{max}} = \{ i \in [1, n] : |\alpha_i| \le \alpha_{max}, |\alpha_i| < +\infty \}.
- If |T_{\theta > 0}| = 0 then s = |T_{\alpha \le \alpha_{max}}| + 1, else s = |T_{\theta > 0}| + |T_{\alpha \le \alpha_{max}}|.
- T' = T_k \cup \{ indices corresponding to the s smallest values of \alpha_i for i \in [1, n] \}.
- Increase counter, k = k + 1.
- Compute the covariance matrix \sigma^{-2} \Sigma_x and mean \mu_x.
- T_k = \{ indices corresponding to the s largest non-zero values of |\mu_x| for which \alpha_i > 0 \}.
- If \sigma^2 L_k has reached a steady state then quit; continue otherwise.
Output:
- Estimated support set T_k and sparse signal \hat{x} with |T_k| non-zero components, \hat{x}_{T_k} = \mu_x.
- Estimated covariance matrix \sigma^{-2} \Sigma_x.
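The prediction and update recursions of the HB-Kalman filter can be sketched directly in code. The following is a minimal NumPy sketch of one prediction/update cycle, corresponding to Equation (3) and the update step; the hyper-parameter learning of Algorithm 1 is deliberately left out, so the innovation covariance A_t^{-1} is passed in as given. All dimensions and values in the toy example are illustrative, not taken from the paper.

```python
import numpy as np

def hb_kalman_step(x_prev, Sigma_prev, y, Phi, sigma2, A_inv):
    """One HB-Kalman prediction/update cycle.

    A_inv plays the role of A_t^{-1} = diag(alpha_t)^{-1}: the innovation
    covariance that the full method learns from the data (components pruned
    by alpha_i -> +inf would simply have zero innovation variance here).
    """
    # prediction step, Eq. (3)
    x_pred = x_prev
    Sigma_pred = Sigma_prev + A_inv
    y_err = y - Phi @ x_pred                       # innovation y_{e,t}
    # update step
    S = sigma2 * np.eye(len(y)) + Phi @ Sigma_pred @ Phi.T
    K = Sigma_pred @ Phi.T @ np.linalg.inv(S)      # Kalman gain K_t
    x_new = x_pred + K @ y_err
    Sigma_new = (np.eye(len(x_prev)) - K @ Phi) @ Sigma_pred
    return x_new, Sigma_new

# one cycle on a toy instance
rng = np.random.default_rng(3)
n, m = 8, 4
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
y = rng.normal(size=m)
x0, Sigma0 = np.zeros(n), np.eye(n)
A_inv = np.diag(np.full(n, 0.5))   # stand-in for the learned variances
x1, Sigma1 = hb_kalman_step(x0, Sigma0, y, Phi, sigma2=0.01, A_inv=A_inv)
```

In the full method, Algorithm 1 would be run between the prediction and update steps to re-estimate the \alpha_i from the innovation y_{e,t}.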

III. AUDIO SIGNAL RECONSTRUCTION

We consider the problem of reconstructing a recording of a classical piano piece in a real reverberant environment. The recorded signal is highly non-stationary, broadband and contains overlapping notes (that might be harmonically related). To make things worse, the pedal of the piano is engaged throughout, causing significant time-frequency smearing. The piano recording is sampled at a frequency of f_s = 44.1 kHz and split into T non-overlapping frames of length n = 1024 samples.

Fig. 1. Snapshot of the sparse frequency content in a single time-frame of piano data.

A. Experimental setup

The support of each frame in the frequency domain is assumed to be sparse, i.e., only a small number of frequencies is present, as can be seen from Figure 1. For each of these frames only m samples are kept. The set I_t \subset [1, n] contains the time indices of the samples kept for each block and is chosen uniformly at random. In analogy with Equations (1) and (2), y_t \in \mathbb{R}^m denotes the random samples taken from each block at time instance t, x_t \in \mathbb{C}^n denotes the support of each frame in the frequency domain, and the matrix \Phi_t = F^{-1}_{I_t} \in \mathbb{C}^{m \times n} is formed by keeping the rows of the matrix F^{-1} with indices in the set I_t. The matrix F \in \mathbb{C}^{n \times n} denotes the Fourier basis matrix.

B. Results

At each time instance an estimate for the support x_t is recovered. Since the assumed basis is the Fourier basis, the support is tracked in both the real and the imaginary domain. The measurement matrices now become \Re\{\Phi_t\} and \Im\{\Phi_t\}. Note that these matrices belong to \mathbb{R}^{m \times n/2} because of the symmetry of the Fourier transform for real-valued functions. The simulation time for this experiment is T = 100 time-domain frames. At each time instance m = 256 samples are kept. The measurement noise variance is set to the sufficiently small value of \sigma^2 = 0.1^5 for the entire simulation time.

The root-mean-square error (RMSE) is evaluated at each time instance t such that

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 },

where y_i and \hat{y}_i are the n individual elements of the recovered time-domain signal and the original signal respectively. Three recovery algorithms are considered, viz. standard Kalman filtering, BCS [5], and the proposed method [15], [16].

Fig. 2. Original time-domain representation (dotted line) of a frame of audio, along with the reconstructed data using the HB-Kalman.

The resulting RMSE for the whole simulation time T is shown in Figure 3. As can be seen from the resulting graph, the error levels are much lower for the HB-Kalman filter when compared to the standard Kalman filter; this is a direct consequence of the assumed sparse model. By comparing to the repeated application of the BCS method, i.e. assuming independent, identically distributed data, we see that incorporating statistical information from previous estimates results in lower reconstruction error. It is important to emphasise that the parameters are identical for both the BCS algorithm and the HB-Kalman for a fair comparison; tuning certain parameters individually for each algorithm can lead to better reconstruction results depending on the specific application scenario.

Fig. 3. Reconstruction error given over a time period of 100 frames. Each frame contains n = 1024 samples from which m = 256 are chosen at random.
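The measurement construction of this experiment can be sketched as follows, using the paper's frame and sampling parameters but a synthetic two-sinusoid frame in place of the piano data; the variable names are illustrative. The sketch builds the set I_t, forms \Phi_t from the rows of the inverse-DFT matrix, and defines the per-frame RMSE.

```python
import numpy as np

n, m = 1024, 256
rng = np.random.default_rng(1)

# a stand-in frame that is sparse in the Fourier domain (two sinusoids)
t = np.arange(n)
frame = np.sin(2 * np.pi * 5 * t / n) + 0.5 * np.sin(2 * np.pi * 37 * t / n)

# the set I_t: m time indices kept per frame, chosen uniformly at random
I_t = np.sort(rng.choice(n, size=m, replace=False))

# Phi_t keeps the rows of the inverse-DFT matrix indexed by I_t, so that
# y_t = Phi_t x_t relates the kept time samples to the frame's spectrum x_t
F_inv = np.fft.ifft(np.eye(n))
Phi_t = F_inv[I_t, :]
y_t = frame[I_t]

def rmse(x, x_hat):
    """Per-frame root-mean-square error, as used in the experiments."""
    return np.sqrt(np.mean(np.abs(x - x_hat) ** 2))
```

As a sanity check, applying \Phi_t to the frame's spectrum np.fft.fft(frame) reproduces the observed samples y_t, which is exactly the measurement model y_t = \Phi_t x_t.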

IV. OZONE DISTRIBUTION SIGNAL RECONSTRUCTION

Here, we aim to reconstruct an ozone distribution signal from a limited number of samples. The satellite data for this experiment is obtained from the Ozone Monitoring Instrument (OMI) on the NASA Aura spacecraft [20], which provides near-real-time measurements of ozone in the atmosphere. The test data is taken during a one-month period in January 2012. The blue vertical strips that can be seen in the top half of Figures 4(a)-(b) represent missing ozone measurements. They are due to the trajectory of the satellite around the rotating earth.

A. Experimental setup

The original data is cropped to a square image and undersampled by a factor of 8. The resolution of the resulting map is 75 x 75, corresponding to n = 5625. The area covered corresponds to latitudes 90 degrees South to 90 degrees North and longitudes 180 degrees West to 180 degrees East. We assume that the support of each block in the discrete cosine transform (DCT) domain is sparse. For each block only m samples are available, equal to approximately 75% of the observed data. The set J_t \subset [1, n] contains the indices of the samples that correspond to nonzero elements in the original incomplete data. Similar to the example in the previous section, x_t \in \mathbb{C}^n denotes the support of each block in the DCT domain, and the matrix \Phi_t = G^{-1}_{J_t} \in \mathbb{C}^{m \times n} is formed by keeping the rows of the matrix G^{-1} with indices in the set J_t, where the matrix G \in \mathbb{C}^{n \times n} denotes the DCT basis matrix.

B. Results

At each day an estimate for the support x_t is recovered, as can be seen in Figures 4(a)-(b). The simulation time for this experiment is T = 31 days. At each time instance m = 4275 samples are observed from the data. The measurement noise variance is set to \sigma^2 = 0.16 for the entire simulation time.

The root-mean-square error (RMSE) is evaluated at each time instance t such that

RMSE = \sqrt{ \frac{1}{|J_t|} \sum_{i \in J_t} (y_i - \hat{y}_i)^2 },

where y_i and \hat{y}_i are the |J_t| individual pixels of the original and recovered ozone map, respectively, represented as a vector. Note that the error is calculated only for pixels that are present in both the original and the reconstructed image; missing data is ignored in the total RMSE since there are no ground truths to compare the predicted data to.

Fig. 4. Atmospheric ozone distribution measured in (normalised) Dobson units. The original data, displayed on the top of each figure respectively, includes missing data. Reconstructed data using the HB-Kalman (a) and the BCS approach (b) displayed on the bottom respectively.

Fig. 5. Reconstruction error given over a time period of one month for the ozone distribution data.

The reconstruction accuracy is evaluated over the whole one-month time period. As can be seen from Figure 5, the standard Kalman filter fails to accurately track the dynamic sparse signal. Additionally, although not explicit from this figure, the standard Kalman filter does not reconstruct the missing data. By contrast, the repeated application of the BCS method and the HB-Kalman filter exhibit much lower error levels and accurately reconstruct the missing data. Furthermore, it can be seen that the HB-Kalman method outperforms the BCS method, since it incorporates statistical information from previous days, resulting in lower reconstruction error over the whole time period under analysis.
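The masked error metric above can be sketched as follows: the RMSE is computed only over the pixel set J_t present in the original data, since the missing satellite strips have no ground truth. The function name, map contents and strip location are illustrative, not taken from the paper's code.

```python
import numpy as np

def masked_rmse(original, reconstructed, mask):
    """RMSE over the observed pixels only (the set J_t); pixels missing
    from the original data are excluded from the error."""
    diff = (original - reconstructed)[mask]
    return np.sqrt(np.mean(diff ** 2))

# an illustrative 75 x 75 map with one vertical strip of missing data
rng = np.random.default_rng(2)
ozone = rng.random((75, 75))
mask = np.ones((75, 75), dtype=bool)
mask[:, 10:14] = False                  # a missing satellite strip
recon = ozone + 0.01 * rng.standard_normal((75, 75))

err = masked_rmse(ozone, recon, mask)
```

Restricting the mean to the masked pixels is what makes the comparison fair: an algorithm is not penalised (or rewarded) for values it predicts where no measurement exists.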

V. CONCLUSIONS

This paper presented a study on the performance of a novel hierarchical Bayesian Kalman filter, first proposed by the authors in [15], [16], that succeeds in promoting sparsity and accurately tracks time-varying sparse signals. Two case studies using real-world data were used to experimentally verify the performance of the proposed method. In the first experiment, the time-domain waveform of a real-world piano recording was reconstructed using only 25% of the originally sampled data. In the second experiment, incomplete satellite data of the ozone distribution in the atmosphere was accurately tracked and recovered. In both cases, the sparsity of the signals in the Fourier domain (first experiment) and the DCT domain (second experiment) was exploited to show the suitability of the proposed method for tracking dynamic sparse signals. On the basis of these two experiments it was shown that the proposed method outperformed both the traditional Kalman filter and a benchmark greedy method.

ACKNOWLEDGMENT

This work is partly supported by the University Defence Research Centre (UDRC), UK. The authors would also like to acknowledge the European Commission for funding SmartEN ITN (Grant No. 238726) under the Marie Curie ITN FP7 programme.

REFERENCES

[1] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[2] D. Angelosante, G. Giannakis, and E. Grossi, "Compressed sensing of time-varying signals," in International Conference on Digital Signal Processing, Jul. 2009, pp. 1-8.
[3] R. Chartrand, "Exact reconstruction of sparse signals via nonconvex minimization," IEEE Signal Processing Letters, vol. 14, no. 10, pp. 707-710, Oct. 2007.
[4] M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk, "An architecture for compressive imaging," in IEEE International Conference on Image Processing, Oct. 2006, pp. 1273-1276.
[5] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346-2356, Jun. 2008.
[6] G. Mileounis, B. Babadi, N. Kalouptsidis, and V. Tarokh, "An adaptive greedy algorithm with application to nonlinear communications," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 2998-3007, 2010.
[7] J. Jin, Y. Gu, and S. Mei, "A stochastic gradient approach on compressive sensing signal reconstruction based on adaptive filtering framework," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 409-420, 2010.
[8] Y. Chen, Y. Gu, and A. Hero, "Sparse LMS for system identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 3125-3128.
[9] M. E. Tipping, "The relevance vector machine," 2000.
[10] D. Wipf and B. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153-2164, Aug. 2004.
[11] N. Vaswani, "Kalman filtered compressed sensing," in IEEE International Conference on Image Processing, Oct. 2008, pp. 893-896.
[12] A. Carmi, P. Gurfil, and D. Kanevsky, "Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms," IEEE Trans. Signal Process., vol. 58, no. 4, pp. 2405-2409, Apr. 2010.
[13] J. Ziniel, L. Potter, and P. Schniter, "Tracking and smoothing of time-varying sparse signals via approximate belief propagation," in Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2010, pp. 808-812.
[14] A. Charles, M. Asif, J. Romberg, and C. Rozell, "Sparsity penalties in dynamical system estimation," in Conference on Information Sciences and Systems (CISS), 2011, pp. 1-6.
[15] E. Karseras, K. Leung, and W. Dai, "Tracking dynamic sparse signals using hierarchical Bayesian Kalman filters," to appear in ICASSP, 2013.
[16] ——, "Tracking dynamic sparse signals with Kalman filters: A less greedy approach," to appear in SampTA, 2013.
[17] M. Tipping and A. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," in International Workshop on Artificial Intelligence and Statistics, Jan. 2003.
[18] J. Tropp, "Greed is good: algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231-2242, Oct. 2004.
[19] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, pp. 2230-2249, 2009.
[20] NASA, "Space-based Measurements of Ozone and Air Quality," 2012. [Online] http://ozoneaq.gsfc.nasa.gov/.