Proceedings of the 10th International Conference on Sampling Theory and Applications

Tracking Dynamic Sparse Signals with Kalman Filters: Framework and Improved Inference

Evripidis Karseras, Kin Leung and Wei Dai
Department of Electrical and Electronic Engineering, Imperial College London, UK
{e.karseras11, kin.leung, wei.dai1}@imperial.ac.uk

The authors would like to acknowledge the European Commission for funding SmartEN ITN (Grant No. 238726) under the Marie Curie ITN FP7 programme.

Abstract—The standard Kalman filter performs optimally for conventional signals but tends to fail when it comes to recovering dynamic sparse signals. In this paper a method to solve this problem is proposed. The basic idea is to model the system dynamics with a hierarchical Bayesian network which successfully captures the inherent sparsity of the data, in contrast to the traditional state-space model. This probabilistic model provides all the necessary statistical information needed to perform sparsity-aware predictions and updates in the Kalman filter steps. A set of theorems shows that a properly scaled version of the associated cost function can lead to less greedy optimisation algorithms, unlike the ones previously proposed. It is demonstrated empirically that the proposed method outperforms the traditional Kalman filter for dynamic sparse signals, and that the redesigned inference algorithm, termed here Bayesian Subspace Pursuit (BSP), greatly improves the inference procedure.

I. INTRODUCTION

The Kalman filter has been the workhorse approach in the area of linear dynamic system modelling in both practical and theoretic scenarios. The escalating trend towards sparse signal representation has rendered this estimator useless when it comes to tracking dynamic sparse signals. It is easy to verify that the estimation process behind the Kalman filter is not fit for sparse signals. Intuitively, the Gaussian prior distribution placed over the system's observations does not place any sparsity constraints over the space of all possible solutions.

The Kalman filter has been externally modified in the literature to admit sparse solutions. The idea in [1] and [2] is to enforce sparsity by thresholds. Work in [3] adopts a probabilistic model, but signal amplitudes and support are estimated separately. Finally, the techniques presented in [4] incorporate prior sparsity knowledge into the tracking process. All these approaches typically require a number of parameters to be pre-determined. It also remains unclear how these methods perform under model and parameter mismatch.

For a single time instance of the sparse reconstruction problem, the Relevance Vector Machine (RVM) introduced in [10] has been used with great success in Compressed Sensing applications [5] and basis selection [6]. The hierarchical Bayesian network behind the RVM achieves highly sparse models for the observations, providing not only estimates of sparse signals but their full posterior distributions as well. This is of great importance since it provides all the necessary statistical information to use in the prediction step of the tracking process. Additionally, the inference procedure used in this framework allows for automatic determination of the active components, hence the need for a pre-determined level of sparsity is eliminated. This is an appealing attribute for an on-line tracking algorithm.

In this work the aforementioned Bayesian network is employed to extend the state-space model adopted in the traditional Kalman filter. This way the problem of modelling sparsity is tackled efficiently. The resulting statistical information from the inference procedure is then incorporated in the Kalman filter steps, thus producing sparsity-aware state estimates.

A set of theorems dictates that a proper scaling of the cost function associated with the inference procedure can lead to more efficient inference algorithms. The techniques initially proposed are greedy methods at heart. By scaling the cost function with the noise variance, and by using knowledge gained from well-known algorithms, it is possible to redesign these methods to admit better qualities. The gains are twofold. Firstly, the improved inference mechanism bears far better qualities than the one previously proposed. Secondly, the proposed method outperforms the traditional Kalman filter in terms of reconstruction error when it comes to dynamic sparse signals.

In Section II we present the basic idea for amalgamating the Bayesian network of the RVM in the Kalman filter, termed here the Hierarchical Bayesian Kalman filter (HB-Kalman). In Section III we present a set of theorems and explain the motivation to improve upon previous techniques. Additionally, we provide the steps for a revised inference algorithm based on the Subspace Pursuit (SP) reconstruction algorithm in [8], termed here Bayesian Subspace Pursuit (BSP). In Section IV we demonstrate the performance of the proposed methods in some synthetic scenarios.

II. HIERARCHICAL BAYESIAN KALMAN FILTER

The system model is described by the following equations:

x_t = F_t x_{t-1} + z_t,  (1)
y_t = Φ_t x_t + n_t,  (2)

where vectors x_t and y_t denote the system's state and observation respectively. The state innovation and observation noise processes are modelled by z_t and n_t respectively.
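To make the model concrete, the following sketch (ours, not the authors' code) simulates Equations (1) and (2) with F_t = I: a state that is sparse on a fixed support follows a Gaussian random walk on its active entries and is observed through a compressive Gaussian design matrix. The dimensions and noise variances mirror the simulation settings of Section IV; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K, T = 256, 128, 30, 200        # dimensions and sparsity as in Section IV
sigma_z2, sigma_n2 = 0.1, 0.01        # innovation / observation noise variances (Section IV values)

support = rng.choice(n, size=K, replace=False)
x = np.zeros(n)
states, measurements = [], []

for t in range(T):
    # Equation (1) with F_t = I: only the active entries receive an innovation z_t
    z = np.zeros(n)
    z[support] = np.sqrt(sigma_z2) * rng.standard_normal(K)
    x = x + z
    # Equation (2): compressive measurement with a Gaussian design matrix, re-drawn per instance
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    y = Phi @ x + np.sqrt(sigma_n2) * rng.standard_normal(m)
    states.append(x.copy())
    measurements.append(y)
```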


We assume that the signal x_t ∈ R^n is sparse in some domain, which is considered to remain the same at all time instances (e.g. the frames of a video are sparse in the same transform domain). This allows us to set the state transition matrix F_t equal to the unitary matrix I. Equation (1) becomes

x_t = x_{t-1} + z_t.

As in the standard Kalman filter we adopt the Gaussian assumption, so that p(z_t) = N(0, Z_t), p(n_t) = N(0, σ²I), and hence p(x_t|x_{t-1}) = N(x_{t-1}, Z_t) and p(y_t|x_t) = N(Φ_t x_t, σ²I). At each time instance, the Kalman filter involves the prediction step, where the parameters of p(x_t|y_{t-1}) are calculated, while the update step evaluates those of p(x_t|y_t). The advantages of the standard Kalman filter include the ability to track the full statistics, and that the mean squared error solution coincides with the maximum a posteriori solution, which has a closed form. The major issue when applying the filter to dynamic sparse signals is that the solution is typically not sparse. This drawback is due to the fact that in the standard approach the covariance matrix Z_t is given a priori. Variants such as the non-linear Kalman filter also suffer because of the special nature of the non-linearities associated with sparse reconstruction.

To alleviate this problem, the key idea behind Sparse Bayesian Learning (SBL) [10] is employed. As opposed to the traditional Kalman filter, where the covariance matrix Z_t of z_t is given, here it is assumed that the state innovation process is given by

z_t ∼ N(0, A_t^{-1}),

where A_t = diag(α_t) = diag([α_1, ..., α_n]_t), and the hyper-parameters α_i are unknown and have to be learned from y_t. To see how this promotes a sparse solution, let us drop the subscript t for simplicity. Then it holds that

p(x|α) = N(0, A^{-1}) = ∏_{i=1}^n N(0, α_i^{-1}).

Driving α_i → +∞ means that p(x_i|α_i) = N(0, 0); hence it is certain that x_i = 0. What remains is to find the maximum likelihood solution of α for the given observation vector y. The explicit form of the likelihood function p(y|α, σ²) was derived in [10], and a set of fast algorithms to estimate α, and consequently z and x, is proposed in [9].

Finally, the principles behind the Kalman filter and SBL are put together. Similar to the standard Kalman filter, two steps, prediction and update, need to be performed at each time instance. In the prediction step one has to evaluate

μ_{t|t-1} = μ_{t-1|t-1},  Σ_{t|t-1} = Σ_{t-1|t-1} + A_t^{-1},
y_{t|t-1} = Φ_t μ_{t|t-1},  y_{e,t} = y_t − y_{t|t-1},

where the notation t|t−1 means prediction at time instance t from measurements up to time instance t−1. In the update step one computes

K_t = Σ_{t|t-1} Φ_t^T (σ²I + Φ_t Σ_{t|t-1} Φ_t^T)^{-1},
μ_{t|t} = μ_{t|t-1} + K_t y_{e,t},  Σ_{t|t} = (I − K_t Φ_t) Σ_{t|t-1}.

Differently from the standard Kalman filter, one has to perform the additional step of learning the hyper-parameters α_t. From Equation (2) we get y_{e,t} = Φ_t z_t + n_t, where a sparse z_t is preferred so as to produce a sparse x_t. Following the analysis in [10] and [9], maximising the likelihood p(y_t|α_t) is equivalent to minimising the following cost function:

L(α_t) = log|Σ_α| + y_{e,t}^T Σ_α^{-1} y_{e,t},  (3)

where Σ_α = σ²I + Φ_t A_t^{-1} Φ_t^T. The algorithms described in [9] can be applied to estimate α_t. Note that the cost function L(α) is not convex. The obtained estimate α_t is generally sub-optimal; details on the estimation of the globally optimal α_t are given in the next section.
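The prediction and update formulas above translate directly into code. The sketch below is a minimal illustration of one HB-Kalman cycle, assuming the hyper-parameters α_t have already been estimated by the inference procedure discussed next; the function and variable names are ours, not the authors'.

```python
import numpy as np

def hb_kalman_step(mu_prev, Sigma_prev, Phi_t, y_t, alpha_t, sigma2):
    """One HB-Kalman prediction/update cycle with F_t = I, given learned hyper-parameters alpha_t."""
    # Prediction: mu_{t|t-1} = mu_{t-1|t-1}, Sigma_{t|t-1} = Sigma_{t-1|t-1} + A_t^{-1}
    A_inv = np.diag(1.0 / alpha_t)          # alpha_i = inf gives zero innovation variance for entry i
    mu_pred = mu_prev
    Sigma_pred = Sigma_prev + A_inv
    y_pred = Phi_t @ mu_pred
    y_err = y_t - y_pred                    # innovation y_{e,t}
    # Update: Kalman gain and posterior moments
    S = sigma2 * np.eye(len(y_t)) + Phi_t @ Sigma_pred @ Phi_t.T
    K = Sigma_pred @ Phi_t.T @ np.linalg.inv(S)
    mu_post = mu_pred + K @ y_err
    Sigma_post = (np.eye(len(mu_pred)) - K @ Phi_t) @ Sigma_pred
    return mu_post, Sigma_post, y_err
```

In a full tracker, μ_{0|0} and Σ_{0|0} would be initialised before the first measurement, and the estimation of α_t described in Section III would be interleaved with these two steps.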
III. BAYESIAN SUBSPACE PURSUIT

Here we discuss the performance guarantees for a single time instance of the inference procedure. For convenience, subscript t is dropped and focus is turned to Equation (2), where x|α ∼ N(0, A^{-1}). This was analysed in [6] for the purpose of basis selection. It had also been proven in [6] that a maximally sparse solution of y = Φx attains the global minimum of the cost function. However, the analysis did not specify the conditions to avoid local minima. By contrast, we provide a more refined analysis. Due to space constraints, only the main results are presented.

We follow [6] by driving the noise variance σ² → 0. The following theorem specifies the behaviour of the cost function L(α).

Theorem 1. For any given α, define the set I ≜ {1 ≤ i ≤ n : 0 < α_i < ∞}. Then it holds that

lim_{σ²→0} σ² L(α) = ||y − Φ_I Φ_I^† y||_2²,  (4)

where Φ_I is the sub-matrix of Φ formed by the columns indexed by I, and Φ_I^† denotes the pseudo-inverse of Φ_I. Furthermore, if |I| < m and y ∈ span(Φ_I), then L(α) → −∞ and σ²L(α) → 0 as σ² → 0.

Two observations can be obtained: (a) the scenarios analysed in [6] can be seen as special cases of Theorem 1 where L(α) → −∞; and (b) a proper scaling of the cost function gives the squared ℓ₂-norm of the reconstruction error. Reconstruction is then equivalent to recovering a support set that minimises the reconstruction distortion. This principle is the same as the one behind many greedy algorithms such as OMP [7] and SP [8]. Theorem 1 suggests such connections.

According to Theorem 1, the key quantities concerning the algorithms described in [9] must be scaled by the noise variance. The original formulae can be found in [9], while the revised ones are given below:

σ^{-2} Σ_x = (σ² A_I + Φ_I^T Φ_I)^{-1},  μ_x = σ^{-2} Σ_x Φ_I^T y,
σ² C_{-i}^{-1} = I − Φ_{I−i} (σ² A_{I−i} + Φ_{I−i}^T Φ_{I−i})^{-1} Φ_{I−i}^T,
s̄_i = σ² s_i = φ_i^T (σ² C_{-i}^{-1}) φ_i,  q̄_i = σ² q_i = φ_i^T (σ² C_{-i}^{-1}) y.

Subscript I denotes the set of indices for which 0 < α_i < +∞. The notation I − i means removal of index i from I.
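As an illustration, the scaled quantities can be evaluated for a candidate active set exactly as written, without ever dividing by σ², so the noiseless limit σ² = 0 causes no numerical difficulty. The sketch below is ours; it is a naive implementation that recomputes one matrix inverse per index instead of using the efficient updates of [9], and the names are not taken from that reference.

```python
import numpy as np

def scaled_quantities(Phi, y, alpha, I, sigma2):
    """Scaled SBL quantities for an active set I (sigma2 may be 0)."""
    m, n = Phi.shape
    Phi_I = Phi[:, I]
    A_I = np.diag(alpha[I])
    # sigma^{-2} Sigma_x = (sigma^2 A_I + Phi_I^T Phi_I)^{-1},  mu_x = sigma^{-2} Sigma_x Phi_I^T y
    Sx_scaled = np.linalg.inv(sigma2 * A_I + Phi_I.T @ Phi_I)
    mu_x = Sx_scaled @ (Phi_I.T @ y)
    s_bar = np.empty(n)
    q_bar = np.empty(n)
    for i in range(n):
        J = [j for j in I if j != i]                      # the set I - i
        Phi_J = Phi[:, J]
        M = np.linalg.inv(sigma2 * np.diag(alpha[J]) + Phi_J.T @ Phi_J)
        C_scaled = np.eye(m) - Phi_J @ M @ Phi_J.T        # sigma^2 C_{-i}^{-1}
        s_bar[i] = Phi[:, i] @ C_scaled @ Phi[:, i]
        q_bar[i] = Phi[:, i] @ C_scaled @ y
    return mu_x, s_bar, q_bar
```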


Subsequently, formula [9, Equation (20)] for the optimal α_i given all other α_j, j ≠ i, becomes

α_i = s̄_i² / (q̄_i² − σ² s̄_i).

Finally, the scaled cost function becomes

L̄ = σ² L = σ² log|σ²I + Φ_I A_I^{-1} Φ_I^T| + y^T (I − Φ_I (σ² A_I + Φ_I^T Φ_I)^{-1} Φ_I^T) y.

Let us clarify the importance of using the scaled quantities. Assume that σ² = 0. It is then easy to show that the original formulae in [9] result in poor performance. Scaling the cost function (and consequently these quantities) is necessary when we want to account for a given noise variance. This may seem irrelevant, but in many tracking applications the noise floor is assumed to be estimated in some way or provided by the manufacturer of specific devices. The initial work in [10] provides the formula to infer the noise level from the observations. The scaled versions of the aforementioned quantities can still be applied if desired.

We now have a better understanding of the inference procedure, but it still remains unclear what the selection criterion for the basis functions should be. In [9] selection is based on the value of α_i which maximises the difference ∆L in the likelihood function, while algorithms such as OMP and SP make decisions on different grounds. The following theorem sheds some light on this matter.

Theorem 2. Assume the noiseless setting y = Φx, where Φ ∈ R^{m×n} and φ_i^T φ_i = 1 for all 1 ≤ i ≤ n. Furthermore assume that t = max |φ_i^T φ_j| for 1 ≤ i ≠ j ≤ n. Then an algorithm similar to the one in [9], based on one of the following criteria, recovers all s-sparse signals exactly given the sufficient condition t < 0.375/s: (a) the maximum σ²∆L, (b) the maximum x_i, or (c) the minimum α_i.

Theorem 2 is the starting point for redesigning the inference algorithm. Based on the scaled quantities we can re-derive the algorithm in [9], termed Fast Marginal Likelihood Maximisation (FMLM). It is possible to have variants with OMP-like performance guarantees based on different criteria, as Theorem 2 suggests. Actually, the inference algorithm then greatly resembles OMP, where the basis functions are recovered sequentially in decreasing order of correlation with the residual signal. For brevity we only present the version based on maximising x_i, hence the algorithm is termed FMLM-x_i. The steps are given in Algorithm 1.

Algorithm 1 FMLM-x_i
Input: Φ, y, σ²
Initialise:
- T̂ = {index i ∈ [1, n] for maximum |φ_i^T y|}.
Iteration:
- Calculate the values of α_i and [μ_x]_i for i ∈ [1, n] \ T̂.
- T′ = T̂ ∪ {index i corresponding to the maximum value of [μ_x]_i for i ∉ T̂}.
- Calculate the values α_i for i ∈ T′.
- T̃ = {i ∈ T′ : 0 < α_i < +∞}.
- If |L̄_T̃ − L̄_T̂| = 0 then compute σ^{-2}Σ_x, μ_x for T̃ and quit. Set T̂ = T̃ and continue otherwise.
Output:
- Estimated support set T̃ and sparse signal x̃ with |T̃| non-zero components, x̃_T̃ = μ_x.
- Estimated covariance matrix σ^{-2}Σ_x.
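To convey the OMP-like structure of Algorithm 1, the following simplified sketch implements the selection loop in the noiseless limit σ² = 0, where q̄_i and s̄_i reduce to correlations with the projection residual and the scaled cost of Theorem 1 is the squared residual norm. It is our sketch of the idea rather than a faithful transcription of Algorithm 1: the per-index α_i bookkeeping is folded into the tentative coefficients q̄_i/s̄_i, and all names are illustrative.

```python
import numpy as np

def fmlm_x_sketch(Phi, y, K_max=None, tol=1e-10):
    """Greedy FMLM-x_i style selection loop, sketched for the noiseless limit sigma^2 = 0."""
    m, n = Phi.shape
    K_max = K_max or m
    T = [int(np.argmax(np.abs(Phi.T @ y)))]        # initialise with the best-correlated atom
    cost_prev = np.inf
    while len(T) < K_max:
        Phi_T = Phi[:, T]
        resid = y - Phi_T @ np.linalg.lstsq(Phi_T, y, rcond=None)[0]
        cost = resid @ resid                        # scaled cost of Theorem 1
        if abs(cost_prev - cost) < tol:
            break
        cost_prev = cost
        # residualised quantities q_bar_i, s_bar_i and tentative coefficients for i outside T
        P_perp = np.eye(m) - Phi_T @ np.linalg.pinv(Phi_T)
        q_bar = Phi.T @ (P_perp @ y)
        s_bar = np.einsum('ij,ij->j', Phi, P_perp @ Phi)
        mu = np.where(s_bar > tol, q_bar / np.maximum(s_bar, tol), 0.0)
        mu[T] = 0.0
        T.append(int(np.argmax(np.abs(mu))))        # FMLM-x_i: add the largest tentative coefficient
    x_hat = np.zeros(n)
    x_hat[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]
    return np.array(T), x_hat
```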
Theorem 3. Assume that the same conditions hold as in Theorem 2. An algorithm similar to the one in [9], based on the less greedy criterion of maximum θ_i = q̄_i² − s̄_i, recovers all s-sparse signals exactly given the sufficient condition t < 0.5/s. The algorithm presented in Algorithm 2 recovers all s-sparse signals exactly if the matrix Φ satisfies the RIP with parameter δ_{3s} < 0.205.

Theorem 3 suggests further improvements to the performance guarantees, to match those of the OMP, by altering the optimisation criterion. Also, results from [8] motivate us to extend the FMLM procedure to a less greedy optimisation procedure by borrowing ideas from the SP algorithm. The SP selects a subset of basis functions at each iteration based also on correlation maximisation, but adds a backtracking step so as to retain only the sparse components with the largest magnitudes. The redesigned algorithm, termed here Bayesian Subspace Pursuit, is described in Algorithm 2.

Algorithm 2 Bayesian Subspace Pursuit (BSP)
Input: Φ, y, σ²
Initialise:
- T̂ = {index i ∈ [1, n] for minimum α_i = 1/|φ_i^T y|}.
Iteration:
- Store α_max = max_{i∈T̂} |α_i|.
- Calculate the values α_i and θ_i = q̄_i² − s̄_i for i ∈ [1, n].
- Calculate t_{θ_i>0} = |{i ∈ [1, n] : θ_i > 0}| and t_{α_i≤α_max} = |{i ∈ [1, n] : |α_i| ≤ α_max}|.
- If t_{θ_i>0} = 0 then s = t_{α_i≤α_max} + 1, else s = t_{θ_i>0} + t_{α_i≤α_max}.
- T′ = T̂ ∪ {indices corresponding to the s smallest values of α_i for i ∈ [1, n]}.
- Compute σ^{-2}Σ_x and μ_x for T′.
- T̃ = {indices corresponding to the s largest non-zero values of |μ_x| for which 0 < α_i < +∞}.
- If |L̄_T̃ − L̄_T̂| = 0 then quit. Otherwise set T̂ = T̃ and continue.
Output:
- Estimated support set T̃ and sparse signal x̃ with |T̃| non-zero components, x̃_T̃ = μ_x.
- Estimated covariance matrix σ^{-2}Σ_x for T̃.
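The backtracking idea of Algorithm 2 can be sketched as follows, again in the noiseless limit and, for simplicity, with a fixed expansion size s instead of the adaptive size chosen through t_{θ_i>0} and t_{α_i≤α_max}. The sketch is ours and is only meant to show the expand-then-prune structure borrowed from SP.

```python
import numpy as np

def bsp_sketch(Phi, y, s, max_iter=20, tol=1e-10):
    """Backtracking (SP-style) sketch of the BSP loop for sigma^2 = 0 and fixed sparsity s."""
    m, n = Phi.shape
    T = list(np.argsort(-np.abs(Phi.T @ y))[:s])       # initial support: s best-correlated atoms
    cost_prev = np.inf
    for _ in range(max_iter):
        # expansion: in the noiseless limit alpha_i = s_bar_i^2 / q_bar_i^2, so small alpha = strong atom
        P_perp = np.eye(m) - Phi[:, T] @ np.linalg.pinv(Phi[:, T])
        q_bar = Phi.T @ (P_perp @ y)
        s_bar = np.einsum('ij,ij->j', Phi, P_perp @ Phi)
        alpha = np.where(np.abs(q_bar) > tol, s_bar**2 / np.maximum(q_bar**2, tol), np.inf)
        candidates = [i for i in np.argsort(alpha) if i not in T][:s]
        T_ext = T + candidates
        # backtracking: keep only the s coefficients of largest magnitude over the extended support
        coef = np.linalg.lstsq(Phi[:, T_ext], y, rcond=None)[0]
        keep = np.argsort(-np.abs(coef))[:s]
        T = [T_ext[k] for k in keep]
        resid = y - Phi[:, T] @ np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]
        cost = resid @ resid                           # scaled cost of Theorem 1
        if abs(cost_prev - cost) < tol:
            break
        cost_prev = cost
    x_hat = np.zeros(n)
    x_hat[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]
    return np.array(T), x_hat
```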


Figure 1. Exact reconstruction rates for m = 128, n = 256: frequency of exact reconstruction versus the number of non-zero components, for FMLM, FMLM-x_i, FMLM-α_i, FMLM-δl_i, FMLM-θ_i and BSP.

Figure 2. Reconstruction error comparison for m = 128: reconstruction error versus time, for the Kalman filter, HBK-FMLM-x_i and HBK-BSP.

IV. EMPIRICAL RESULTS

A. Single Time Instance

We concentrate on the performance of the algorithms for a single time instance and for σ² = 0. The algorithms under comparison are the FMLM algorithm as originally presented in [9], the variants based on the scaled quantities, namely FMLM-x_i, FMLM-α_i, FMLM-δl_i and FMLM-θ_i, and the BSP. The experiment is as follows:
1) Generate Φ ∈ R^{128×256} with i.i.d. entries from N(0, 1/m).
2) Generate T uniformly at random so that |T| = K.
3) Choose the values of x_T from N(0, 1).
4) Compute y = Φx and apply each reconstruction algorithm. Compare the estimate x̂ to x.
5) Repeat the experiment for increasing values of K and for 100 realisations.
The results of this procedure are depicted in Figure 1. The first critical observation is that the original FMLM performs poorly when σ² = 0, due to the improperly scaled cost function. The three scaled variants of FMLM based on the criteria mentioned in Theorem 2 perform, within computational accuracy, in the same manner. We observe the increase in performance for FMLM-θ_i, a consequence of altering the selection criterion to θ_i = q̄_i² − s̄_i. Even though changing the criterion gives theoretically better performance, as Theorem 3 suggests, empirically this gain is not great. By redesigning the inference algorithm based on ideas from the SP we are able to achieve far better performance, as the curve for the BSP algorithm shows.
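A Monte-Carlo harness for steps 1)–5) might look like the sketch below (ours). For concreteness it plugs in the bsp_sketch function from Section III and declares exact reconstruction when the estimate matches the true signal to numerical precision; note that, unlike the algorithms in the paper, this fixed-size sketch is handed the true sparsity K.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, trials = 128, 256, 100

def exact_recovery_rate(recover, K):
    hits = 0
    for _ in range(trials):
        Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # step 1
        T_true = rng.choice(n, size=K, replace=False)          # step 2
        x = np.zeros(n)
        x[T_true] = rng.standard_normal(K)                     # step 3
        y = Phi @ x                                            # step 4 (noiseless)
        _, x_hat = recover(Phi, y)
        hits += np.allclose(x_hat, x, atol=1e-6)               # proxy for exact reconstruction
    return hits / trials

# step 5: sweep the sparsity level K and average over realisations
rates = {K: exact_recovery_rate(lambda Phi, y: bsp_sketch(Phi, y, s=K), K)
         for K in range(10, 91, 10)}
```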
B. Dynamic Sparse Signal

We now compare the proposed method, the HB-Kalman filter, against the original Kalman filter. The signal x_t ∈ R^n is assumed to be sparse in its natural basis, with support set S chosen uniformly at random from [1, n], where n = 256. The magnitudes of the non-zero entries of x_t evolve according to the model of Section II, with Z_t = σ_z² I and σ_z² = 0.1. The simulation time for this experiment was T = 200 time instances. At two randomly chosen time instances, t = 50 and t = 150, a change in the support of x_t is introduced: a non-zero component is added to the support of x_50 and a non-zero component is removed from the support of x_150. Apart from these two time instances the support of x_t remains unchanged. At t = 1 the support is initialised with K = 30 non-zero components. The observation noise variance is set to σ_n² = 0.01 for the entire simulation time. We compare the following techniques: the classic Kalman filter, the HBK with FMLM-x_i as the optimisation procedure, and the HBK with BSP.

In this scenario noisy measurements y_t are taken by choosing the design matrix Φ_t ∈ R^{128×256} as described in Subsection IV-A, re-sampled at each time instance. The number of observations m remains constant at each time instance. In Figure 2 we primarily notice how the HBK outperforms the original Kalman filter, a direct consequence of the sparse dynamic model. The HBK-BSP captures the evolution in the support set with greater success, due to the improved optimisation algorithm.

REFERENCES

[1] N. Vaswani, "Kalman filtered compressed sensing," in Proc. 15th IEEE International Conference on Image Processing (ICIP), Oct. 2008, pp. 893–896.
[2] A. Carmi, P. Gurfil, and D. Kanevsky, "Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms," IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, Apr. 2010.
[3] J. Ziniel, L. C. Potter, and P. Schniter, "Tracking and smoothing of time-varying sparse signals via approximate belief propagation," in Conference Record of the Forty-Fourth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2010, pp. 808–812.
[4] A. Charles, M. S. Asif, J. Romberg, and C. Rozell, "Sparsity penalties in dynamical system estimation," in Proc. 45th Annual Conference on Information Sciences and Systems (CISS), 2011, pp. 1–6.
[5] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[6] D. P. Wipf and B. D. Rao, "Sparse Bayesian learning for basis selection," IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, Aug. 2004.
[7] J. A. Tropp, "Greed is good: algorithmic results for sparse approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[8] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[9] M. E. Tipping and A. C. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," in Proc. Ninth International Workshop on Artificial Intelligence and Statistics, Jan. 2003.
[10] M. E. Tipping, "The relevance vector machine," in Advances in Neural Information Processing Systems 12, 2000.
