EURASIP Journal on Applied Signal Processing
Particle Filtering in Signal Processing
Guest Editors: Petar M. Djurić, Simon J. Godsill, and Arnaud Doucet
Copyright © 2004 Hindawi Publishing Corporation. All rights reserved.
This is a special issue published in volume 2004 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Editor-in-Chief Marc Moonen, Belgium
Senior Advisory Editor: K. J. Ray Liu, College Park, USA

Associate Editors: Gonzalo Arce, USA; Jaakko Astola, Finland; Kenneth Barner, USA; Mauro Barni, Italy; Sankar Basu, USA; Jacob Benesty, Canada; Helmut Bölcskei, Switzerland; Joe Chen, USA; Chong-Yung Chi, Taiwan; M. Reha Civanlar, Turkey; Luciano Costa, Brazil; Satya Dharanipragada, USA; Petar M. Djurić, USA; Jean-Luc Dugelay, France; Touradj Ebrahimi, Switzerland; Frank Ehlers, Germany; Moncef Gabbouj, Finland; Sharon Gannot, Israel; Fulvio Gini, Italy; A. Gorokhov, The Netherlands; Peter Handel, Sweden; Ulrich Heute, Germany; John Homer, Australia; Jiri Jan, Czech Republic; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, Korea; Aggelos Katsaggelos, USA; Walter Kellermann, Germany; Alex Kot, Singapore; C.-C. Jay Kuo, USA; Chin-Hui Lee, USA; Sang Uk Lee, Korea; Geert Leus, The Netherlands; Mark Liao, Taiwan; Yuan-Pei Lin, Taiwan; Bernie Mulgrew, UK; King N. Ngan, Hong Kong; Douglas O'Shaughnessy, Canada; Antonio Ortega, USA; Montse Pardas, Spain; Vincent Poor, USA; Phillip Regalia, France; Markus Rupp, Austria; Hideaki Sakai, Japan; Bill Sandham, UK; Dirk Slock, France; Piet Sommen, The Netherlands; John Sorensen, Denmark; Sergios Theodoridis, Greece; Dimitrios Tzovaras, Greece; Jacques Verly, Belgium; Xiaodong Wang, USA; Douglas Williams, USA; Xiang-Gen Xia, USA; Jar-Ferr Yang, Taiwan
Contents
Editorial, Petar M. Djurić, Simon J. Godsill, and Arnaud Doucet Volume 2004 (2004), Issue 15, Pages 2239-2241
Global Sampling for Sequential Filtering over Discrete State Space, Pascal Cheung-Mon-Chan and Eric Moulines Volume 2004 (2004), Issue 15, Pages 2242-2254
Multilevel Mixture Kalman Filter, Dong Guo, Xiaodong Wang, and Rong Chen Volume 2004 (2004), Issue 15, Pages 2255-2266
Resampling Algorithms for Particle Filters: A Computational Complexity Perspective, Miodrag Bolić, Petar M. Djurić, and Sangjin Hong Volume 2004 (2004), Issue 15, Pages 2267-2277
A New Class of Particle Filters for Random Dynamic Systems with Unknown Statistics, Joaquín Míguez, Mónica F. Bugallo, and Petar M. Djurić Volume 2004 (2004), Issue 15, Pages 2278-2294
A Particle Filtering Approach to Change Detection for Nonlinear Systems, Babak Azimi-Sadjadi and P. S. Krishnaprasad Volume 2004 (2004), Issue 15, Pages 2295-2305
Particle Filtering for Joint Symbol and Code Delay Estimation in DS Spread Spectrum Systems in Multipath Environment, Elena Punskaya, Arnaud Doucet, and William J. Fitzgerald Volume 2004 (2004), Issue 15, Pages 2306-2314
Particle Filtering Equalization Method for a Satellite Communication Channel, Stéphane Sénécal, Pierre-Olivier Amblard, and Laurent Cavazzana Volume 2004 (2004), Issue 15, Pages 2315-2327
Channel Tracking Using Particle Filtering in Unresolvable Multipath Environments, Tanya Bertozzi, Didier Le Ruyet, Cristiano Panazio, and Han Vu Thien Volume 2004 (2004), Issue 15, Pages 2328-2338
Joint Tracking of Manoeuvring Targets and Classification of Their Manoeuvrability, Simon Maskell Volume 2004 (2004), Issue 15, Pages 2339-2350
Bearings-Only Tracking of Manoeuvring Targets Using Particle Filters, M. Sanjeev Arulampalam, B. Ristic, N. Gordon, and T. Mansell Volume 2004 (2004), Issue 15, Pages 2351-2365
Time-Varying Noise Estimation for Speech Enhancement and Recognition Using Sequential Monte Carlo Method, Kaisheng Yao and Te-Won Lee Volume 2004 (2004), Issue 15, Pages 2366-2384
Particle Filtering Applied to Musical Tempo Tracking, Stephen W. Hainsworth and Malcolm D. Macleod Volume 2004 (2004), Issue 15, Pages 2385-2395
Editorial
Petar M. Djurić Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA Email: [email protected]
Simon J. Godsill Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Email: [email protected]
Arnaud Doucet Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Email: [email protected]
In most problems of sequential signal processing, measured or received data are processed in real time. Typically, the data are modeled by state-space models with linear or nonlinear unknowns and noise sources that are assumed either Gaussian or non-Gaussian. When the models describing the data are linear and the noise is Gaussian, the optimal solution is the renowned Kalman filter. For models that deviate from linearity and Gaussianity, many different methods exist, of which the best known perhaps is the extended Kalman filter. About a decade ago, Gordon et al. published an article on nonlinear and non-Gaussian state estimation that captured much attention of the signal processing community [1]. The article introduced a method for sequential signal processing based on Monte Carlo sampling and showed that the method may have profound potential. Not surprisingly, it has incited a great deal of research, which has contributed to making sequential signal processing by Monte Carlo methods one of the most prominent developments in statistical signal processing in recent years.

The underlying idea of the method is the approximation of posterior densities by discrete random measures. The measures are composed of samples from the states of the unknowns and of weights associated with the samples. The samples are usually referred to as particles, and the process of updating the random measures with the arrival of new data as particle filtering. One may view particle filtering as exploration of the space of unknowns with random grids whose nodes are the particles. With the acquisition of new data, the random grids evolve and their nodes are assigned weights to approximate optimally the desired densities. The assignment of new weights is carried out recursively and is based on Bayesian importance sampling theory.

The beginnings of particle filtering can be traced back to the late 1940s and early 1950s, which were followed in the last fifty years by sporadic outbreaks of intense activity [2]. Although its implementation is computationally intensive, the widespread availability of fast computers and the amenability of particle filtering methods to parallel implementation make them very attractive for solving difficult signal processing problems.

The papers of the special issue may be arranged into four groups, that is, papers on (1) general theory, (2) applications of particle filtering to target tracking, (3) applications of particle filtering to communications, and (4) applications of particle filtering to speech and music processing. In this issue, we do not have tutorials on particle filtering; instead, we refer the reader to some recent references [3, 4, 5, 6].

General theory

In the first paper, "Global sampling for sequential filtering over discrete state space," Cheung-Mon-Chan and Moulines study conditionally Gaussian linear state-space models, which, when conditioned on a set of indicator variables taking values in a finite set, become linear and Gaussian. The authors propose a global sampling algorithm for such filters and compare it with other state-of-the-art implementations.

Guo et al. in "Multilevel mixture Kalman filter" propose a new Monte Carlo sampling scheme for implementing the mixture Kalman filter. The authors use a multilevel structure of the space for the indicator variables and draw samples in a multilevel fashion. They begin with sampling from the highest-level space and follow up by drawing samples from associated subspaces of lower-level spaces. They demonstrate the method on examples from wireless communication.

In the third paper, "Resampling algorithms for particle filters: a computational complexity perspective," Bolić et al. propose and analyze new resampling algorithms for particle filters that are suitable for real-time implementation. By decreasing the number of operations and memory accesses, the algorithms reduce the complexity of both hardware and DSP realization. The performance of the algorithms is evaluated on particle filters applied to bearings-only tracking and joint detection and estimation in wireless communications.

In "A new class of particle filters for random dynamic systems with unknown statistics," Míguez et al. propose a new class of particle filtering methods that do not assume explicit mathematical forms of the probability distributions of the noise in the system. This implies simpler, more robust, and more flexible particle filters than the standard ones. The performance of these filters is shown on autonomous positioning of a vehicle in a 2-dimensional space.

Finally, in "A particle filtering approach to change detection for nonlinear systems," Azimi-Sadjadi and Krishnaprasad present a particle filtering method for change detection in stochastic systems with nonlinear dynamics, based on a statistic that allows for recursive computation of likelihood ratios. They use the method in an Inertial Navigation System/Global Positioning System application.

Applications in communications

In "Particle filtering for joint symbol and code delay estimation in DS spread spectrum systems in multipath environment," Punskaya et al. develop receivers based on several algorithms that involve both deterministic and randomized schemes. They test their method against other deterministic and stochastic procedures by means of extensive simulations.

In the second paper, "Particle filtering equalization method for a satellite communication channel," Sénécal et al. propose a particle filtering method for inline and blind equalization of satellite communication channels and restoration of the transmitted messages. The performance of the algorithms is presented by bit error rates as functions of signal-to-noise ratio.

Bertozzi et al. in "Channel tracking using particle filtering in unresolvable multipath environments" propose a new timing error detector for the timing tracking loops of Rake receivers in spread spectrum systems. In their scheme, the delays of each path of the frequency-selective channels are estimated jointly. Their simulation results demonstrate that the proposed scheme has better performance than the one based on conventional early-late gate detectors in indoor scenarios.

Applications to target tracking

In "Joint tracking of manoeuvring targets and classification of their manoeuvrability" by Maskell, semi-Markov models are used to describe the behavior of maneuvering targets. The author proposes an architecture that allows particle filters to be robust and efficient when they jointly track and classify targets. He also shows that with his approach, one can classify targets on the basis of their maneuverability.

In the other paper, "Bearings-only tracking of manoeuvring targets using particle filters," Arulampalam et al. investigate the problem of bearings-only tracking of maneuvering targets. They formulate the problem in the framework of a multiple-model tracking problem in jump Markov systems and propose three different particle filters. They conduct extensive simulations and show that their filters outperform the trackers based on standard interacting multiple models.

Applications to speech and music

In "Time-varying noise estimation for speech enhancement and recognition using sequential Monte Carlo method," Yao and Lee develop particle filters for sequential estimation of time-varying mean vectors of noise power in the log-spectral domain, where the noise parameters evolve according to a random walk model. The authors demonstrate the performance of the proposed filters in automated speech recognition and speech enhancement, respectively.

Hainsworth and Macleod in "Particle filtering applied to musical tempo tracking" aim at estimating the time-varying tempo process in musical audio analysis. They present two algorithms for generic beat tracking that can be used across a variety of musical styles. The authors have tested the algorithms on a large database and have discussed existing problems and directions for further improvement of the current methods.

In summary, this special issue provides some interesting theoretical developments in particle filtering theory and novel applications in communications, tracking, and speech/music signal processing. We hope that these papers will not only be of immediate use to practitioners and theoreticians but will also instigate further development in the field. Lastly, we thank the authors for their contributions and the reviewers for their valuable comments and criticism.

Petar M. Djurić
Simon J. Godsill
Arnaud Doucet

REFERENCES

[1] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[2] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, New York, NY, USA, 2001.
[3] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[4] A. Doucet, S. J. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[5] P. M. Djurić and S. J. Godsill, Eds., "Special issue on Monte Carlo methods for statistical signal processing," IEEE Transactions on Signal Processing, vol. 50, no. 2, 2002.
[6] P. M. Djurić, J. H. Kotecha, J. Zhang, et al., "Particle filtering," IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, 2003.
Petar M. Djurić received his B.S. and M.S. degrees in electrical engineering from the University of Belgrade in 1981 and 1986, respectively, and his Ph.D. degree in electrical engineering from the University of Rhode Island in 1990. From 1981 to 1986, he was a Research Associate with the Institute of Nuclear Sciences, Vinča, Belgrade. Since 1990, he has been with Stony Brook University, where he is a Professor in the Department of Electrical and Computer Engineering. He works in the area of statistical signal processing, and his primary interests are in the theory of modeling, detection, estimation, and time series analysis and its application to a wide variety of disciplines, including wireless communications and biomedicine.

Simon J. Godsill is a Reader in statistical signal processing in the Department of Engineering, Cambridge University. He is an Associate Editor for IEEE Transactions on Signal Processing and the Journal of Bayesian Analysis, and is a Member of the IEEE Signal Processing Theory and Methods Committee. He has research interests in Bayesian and statistical methods for signal processing, Monte Carlo algorithms for Bayesian problems, modelling and enhancement of audio and musical signals, tracking, and genomic signal processing. He has published extensively in journals, books, and conferences. He coedited in 2002 a special issue of IEEE Transactions on Signal Processing on Monte Carlo methods in signal processing and has organized many conference sessions on related themes.

Arnaud Doucet was born in France on the 2nd of November 1970. He graduated from the Institut National des Télécommunications in June 1993 and obtained his Ph.D. degree from Université Paris-Sud, Orsay, in December 1997. From January 1998 to February 2001, he was a Research Associate in Cambridge University. From March 2001 to August 2002, he was a Senior Lecturer in the Department of Electrical Engineering, Melbourne University, Australia. Since September 2002, he has been a University Lecturer in information engineering at Cambridge University. His research interests include simulation-based methods and their applications to Bayesian statistics and control.
Global Sampling for Sequential Filtering over Discrete State Space
Pascal Cheung-Mon-Chan École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris Cédex 13, France Email: [email protected]
Eric Moulines École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris Cédex 13, France Email: [email protected]
Received 21 June 2003; Revised 22 January 2004
In many situations, there is a need to approximate a sequence of probability measures over a growing product of finite spaces. Whereas it is in general possible to determine analytic expressions for these probability measures, the number of computations needed to evaluate these quantities grows exponentially, thus precluding real-time implementation. Sequential Monte Carlo (SMC) techniques, which consist in approximating the flow of probability measures by the empirical distribution of a finite set of particles, are attractive techniques for addressing this type of problem. In this paper, we present a simple implementation of the sequential importance sampling/resampling (SISR) technique for approximating these distributions; this method relies on the fact that, the space being finite, it is possible to consider every offspring of the trajectory of particles. The procedure is straightforward to implement and well suited for practical use. A limited Monte Carlo experiment is carried out to support our findings. Keywords and phrases: particle filters, sequential importance sampling, sequential Monte Carlo sampling, sequential filtering, conditionally linear Gaussian state-space models, autoregressive models.
1. INTRODUCTION

State-space models have been around for quite a long time to model dynamic systems. State-space models are used in a variety of fields such as computer vision, financial data analysis, mobile communication, and radar systems, among others. A main challenge is to design efficient methods for online estimation, prediction, and smoothing of the hidden state given the continuous flow of observations from the system. Except in a few special cases, including linear state-space models (see [1]) and hidden finite-state Markov chains (see [2]), this problem does not admit computationally tractable exact solutions.

From the mid 1960s, considerable research efforts have been devoted to developing computationally efficient methods to approximate these distributions; in the last decade, a great deal of attention has been devoted to sequential Monte Carlo (SMC) algorithms (see [3] and the references therein). The basic idea of SMC methods consists in approximating the conditional distribution of the hidden state with the empirical distribution of a set of random points, called particles. These particles can either give birth to offspring particles or die, depending on their ability to represent the distribution of the hidden state conditional on the observations. The main difference between the various implementations of SMC algorithms lies in the way this population of particles evolves in time. It is no surprise that most of the effort in this field has been dedicated to finding numerically efficient and robust methods which can be used in real-time implementations.

In this paper, we consider a special case of state-space model, often referred to in the literature as conditionally Gaussian linear state-space models (CGLSSMs), which has received a lot of attention in recent years (see, e.g., [4, 5, 6, 7]). The main feature of a CGLSSM is that, conditionally on a set of indicator variables, here taking their values in a finite set, the system becomes linear and Gaussian. Efficient recursive procedures, such as the Kalman filter/smoother, are available to compute the distribution of the state variable conditional on the indicator variables and the observations. By embedding these algorithms in the sequential importance sampling/resampling (SISR) framework, it is possible to derive computationally efficient sampling procedures which focus their attention on the space of indicator variables.
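The generic SMC recursion described above (propagate the particles, weight them by how well they explain the new observation, and let unlikely particles die in a resampling step) can be illustrated with a minimal bootstrap filter. This is only an illustrative sketch, not the algorithm proposed in this paper; the model functions passed in are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(observations, propagate, likelihood, init, n_particles=500):
    """Minimal bootstrap particle filter (illustrative sketch).

    propagate(x)      -- samples x_t given x_{t-1} (state transition)
    likelihood(y, x)  -- p(y_t | x_t), evaluated for every particle
    init(n)           -- samples n particles from the prior
    Returns the filtered posterior-mean estimates of the state.
    """
    x = init(n_particles)
    estimates = []
    for y in observations:
        x = propagate(x)                          # particles give birth to offspring
        w = likelihood(y, x)                      # weight by fit to the new datum
        w = w / w.sum()
        estimates.append(np.sum(w * x))           # posterior-mean estimate
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # unlikely particles die
    return np.array(estimates)

# Toy linear-Gaussian model: x_t = 0.9 x_{t-1} + v_t, y_t = x_t + e_t.
propagate = lambda x: 0.9 * x + rng.normal(0.0, 1.0, size=x.shape)
likelihood = lambda y, x: np.exp(-0.5 * (y - x) ** 2)
init = lambda n: rng.normal(0.0, 1.0, size=n)
y_seq = np.array([0.5, 1.0, 0.3, -0.2])
est = bootstrap_filter(y_seq, propagate, likelihood, init)
```

For this linear-Gaussian toy model the Kalman filter gives the exact answer, so the sketch is only meant to show the propagate/weight/resample cycle.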
These algorithms are collectively referred to as mixture Kalman filters (MKFs), a term first coined by Chen and Liu [8], who developed a generic sampling algorithm; closely related ideas appeared earlier in the automatic control/signal processing and computational statistics literature (see, e.g., [9, 10] for early work in this field, [5] and the references therein for a tutorial on these methods, and [3] for practical implementations of these techniques). Because these sampling procedures operate on a lower-dimensional space, they typically achieve lower Monte Carlo variance than "plain" particle filtering methods.

In the CGLSSM considered here, it is assumed that the indicator variables are discrete and take a finite number of different values. It is thus feasible to consider every possible offspring of a trajectory, defined here as a particular realization of a sequence of indicator variables from the initial time 0 to the current time t. This has been observed by the authors of [5, 7, 8], among many others, who have used this property to design appropriate proposal distributions for improving the accuracy and performance of SISR procedures.

In this work, we use this key property in a different way, along the lines drawn in [11, Section 3]; the basic idea consists in considering the population of every possible offspring of every trajectory and globally sampling from this population. This algorithm is referred to as the global sampling (GS) algorithm. It can be seen as a simple implementation of the SISR algorithm for the so-called optimal importance distribution.

Some limited Monte Carlo experiments on prototypal examples show that this algorithm compares favorably with state-of-the-art implementations of the MKF; in a joint symbol estimation and channel equalization task, we have in particular achieved extremely encouraging performance with as few as 5 particles, making the proposed algorithm amenable to real-time applications.

2. SEQUENTIAL MONTE CARLO ALGORITHMS

2.1. Notations and definitions

Before going further, some additional definitions and notations are required. Let X (resp., Y) be a general set and let B(X) (resp., B(Y)) denote a σ-algebra on X (resp., Y). If Q is a nonnegative function on X × B(Y) such that

(i) for each B ∈ B(Y), Q(·, B) is a nonnegative measurable function on X,
(ii) for each x ∈ X, Q(x, ·) is a measure on B(Y),

then we call Q a transition kernel from (X, B(X)) to (Y, B(Y)) and we write Q : (X, B(X)) ≺ (Y, B(Y)). If for each x ∈ X, Q(x, ·) is a finite measure on (Y, B(Y)), then we say that the transition kernel is finite. If for all x ∈ X, Q(x, ·) is a probability measure on (Y, B(Y)), then Q is said to be a Markov transition kernel.

Denote by B(X) ⊗ B(Y) the product σ-algebra (the smallest σ-algebra containing all the sets A × B, where A ∈ B(X) and B ∈ B(Y)). If µ is a measure on (X, B(X)) and Q : (X, B(X)) ≺ (Y, B(Y)) is a transition kernel, we denote by µ ⊗ Q the measure on the product space (X × Y, B(X) ⊗ B(Y)) defined by

µ ⊗ Q(A × B) = ∫_A µ(dx) Q(x, B) ∀A ∈ B(X), B ∈ B(Y). (1)

Let X : (Ω, F) → (X, B(X)) and Y : (Ω, F) → (Y, B(Y)) be two random variables and µ and ν two measures on (X, B(X)) and (Y, B(Y)), respectively. Assume that the probability distribution of (X, Y) has a density f(x, y) with respect to µ ⊗ ν. We denote by f(y|x) = f(x, y) / ∫_Y f(x, y) ν(dy) the conditional density of Y given X.

2.2. Sequential importance sampling

Let {F_t}_{t≥0} be a sequence of probability measures on (Z^{t+1}, P(Z)^{⊗(t+1)}), where Z = {z_1, ..., z_M} is a finite set with cardinal equal to M. It is assumed in this section that for any λ_{0:t−1} ∈ Z^t such that f_{t−1}(λ_{0:t−1}) = 0, we have

f_t([λ_{0:t−1}, λ]) = 0 ∀λ ∈ Z, (2)

where for any τ ≥ 0, f_τ denotes the density of F_τ with respect to the counting measure. For any t ≥ 1, there exists a finite transition kernel Q_t : (Z^t, P(Z)^{⊗t}) ≺ (Z, P(Z)) such that

F_t = F_{t−1} ⊗ Q_t. (3)

We denote by q_t the density of the kernel Q_t with respect to the counting measure, which can simply be expressed as

q_t(λ_{0:t−1}, λ) = f_t([λ_{0:t−1}, λ]) / f_{t−1}(λ_{0:t−1}) if f_{t−1}(λ_{0:t−1}) ≠ 0, and 0 otherwise. (4)

In the SIS framework (see [5, 8]), the probability distribution F_t on Z^{t+1} is approximated by particles (Λ^{(1,t)}, ..., Λ^{(N,t)}) associated to nonnegative weights (w^{(1,t)}, ..., w^{(N,t)}); the estimator of the probability measure associated to this weighted particle system is given by

F_t^N = Σ_{i=1}^N w^{(i,t)} δ_{Λ^{(i,t)}} / Σ_{i=1}^N w^{(i,t)}. (5)

These trajectories and weights are obtained by drawing N independent trajectories Λ^{(i,t)} under an instrumental probability distribution G_t on (Z^{t+1}, P(Z)^{⊗(t+1)}) and computing the importance weights as

w^{(i,t)} = f_t(Λ^{(i,t)}) / g_t(Λ^{(i,t)}), i ∈ {1, ..., N}, (6)

where g_t is the density of the probability measure G_t with respect to the counting measure on (Z^{t+1}, P(Z)^{⊗(t+1)}). It is assumed that for each t, F_t is absolutely continuous with respect to the instrumental probability G_t, that is, f_t(λ_{0:t}) = 0 for all λ_{0:t} ∈ Z^{t+1} such that g_t(λ_{0:t}) = 0. In the SIS framework, these weighted trajectories are updated by drawing at each time step an offspring of each particle and then computing the associated importance weight. It is assumed in the sequel that the instrumental probability measure satisfies a decomposition similar to (3), that is,

G_t = G_{t−1} ⊗ K_t, (7)

where K_t : (Z^t, P(Z)^{⊗t}) ≺ (Z, P(Z)) is a Markov transition kernel: Σ_{j=1}^M K_t(λ_{0:t−1}, {z_j}) = 1. Hence, Σ_{j=1}^M g_t([λ_{0:t−1}, z_j]) = g_{t−1}(λ_{0:t−1}) for all λ_{0:t−1} ∈ Z^t, showing that whenever g_{t−1}(λ_{0:t−1}) = 0, g_t([λ_{0:t−1}, z_j]) = 0 for all j ∈ {1, ..., M}. Define by k_t the density of the Markov transition kernel K_t with respect to the counting measure:

k_t(λ_{0:t−1}, λ) = g_t([λ_{0:t−1}, λ]) / g_{t−1}(λ_{0:t−1}) if g_{t−1}(λ_{0:t−1}) ≠ 0, and 0 otherwise. (8)

In the SIS framework, at each time t, for each particle Λ^{(i,t−1)}, i ∈ {1, ..., N}, and then for each particular offspring j ∈ {1, ..., M}, we evaluate the weights

ρ^{(i,j,t)} = k_t(Λ^{(i,t−1)}, z_j) (9)

and we draw an index J^{(i,t)} from a multinomial distribution with parameters (ρ^{(i,1,t)}, ..., ρ^{(i,M,t)}), conditionally independently from the past:

P(J^{(i,t)} = j | G_{t−1}) = ρ^{(i,j,t)}, i ∈ {1, ..., N}, j ∈ {1, ..., M}, (10)

where G_t is the history of the particle system at time t,

G_t = σ(Λ^{(j,τ)}, w^{(j,τ)}, 1 ≤ j ≤ N, 1 ≤ τ ≤ t). (11)

The updated system of particles then is

Λ^{(i,t)} = [Λ^{(i,t−1)}, z_{J^{(i,t)}}]. (12)

If (Λ^{(1,0)}, ..., Λ^{(N,0)}) is an independent sample from the distribution G_0, it is then easy to see that at each time t the particles (Λ^{(1,t)}, ..., Λ^{(N,t)}) are independent and distributed according to G_t; the associated (unnormalized) importance weights w^{(i,t)} = f_t(Λ^{(i,t)})/g_t(Λ^{(i,t)}) can be written as a product w^{(i,t)} = u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) w^{(i,t−1)}, where the incremental weight u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) is given by

u_t([λ_{0:t−1}, λ]) = q_t(λ_{0:t−1}, λ) / k_t(λ_{0:t−1}, λ) ∀λ_{0:t−1} ∈ Z^t, λ ∈ Z. (13)

It is easily shown that the instrumental distribution k_t which minimizes the variance of the importance weights conditionally on the history of the particle system (see [5, Proposition 2]) is given by

k_t(λ_{0:t−1}, ·) = q_t(λ_{0:t−1}, ·) / Σ_{j=1}^M q_t(λ_{0:t−1}, z_j) for any λ_{0:t−1} ∈ Z^t. (14)

The choice of the optimal instrumental distribution (14) was introduced in [12] and has since been used and/or rediscovered by many authors (see [5, Section II-D] for a discussion and extended references). Using this particular form of the importance kernel, the incremental importance sampling weights (13) are given by

u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) = Σ_{j=1}^M q_t(Λ^{(i,t−1)}, z_j), i ∈ {1, ..., N}. (15)

It is worthwhile to note that u_t([Λ^{(i,t−1)}, z_j]) = u_t([Λ^{(i,t−1)}, z_l]) for all j, l ∈ {1, ..., M}; the incremental importance weights do not depend upon the particular offspring of the particle which is drawn.

2.3. Sequential importance sampling/resampling

The normalized importance weights w̄^{(i,t)} = w^{(i,t)} / Σ_{i=1}^N w^{(i,t)} reflect the contribution of the imputed trajectories to the importance sampling estimate F_t^N. A weight close to zero indicates that the associated trajectory has a "small" contribution. Such trajectories are thus ineffective and should be eliminated.

Resampling is the method usually employed to combat the degeneracy of the system of particles. Let [Λ^{(1,t−1)}, ..., Λ^{(N,t−1)}] be a set of particles at time t−1 and let [w^{(1,t−1)}, ..., w^{(N,t−1)}] be the associated importance weights. An SISR iteration, in its most elementary form, produces a set of particles [Λ^{(1,t)}, ..., Λ^{(N,t)}] with equal weights 1/N. The SISR algorithm is a two-step procedure. In the first step, each particle is updated according to the importance transition kernel k_t and the incremental importance weights are computed according to (12) and (13), exactly as in the SIS algorithm. This produces an intermediate set of particles Λ̃^{(i,t)} with associated importance weights w̃^{(i,t)} defined as

Λ̃^{(i,t)} = [Λ^{(i,t−1)}, z_{J̃^{(i,t)}}], w̃^{(i,t)} = u_t([Λ^{(i,t−1)}, z_{J̃^{(i,t)}}]) w^{(i,t−1)}, i ∈ {1, ..., N}, (16)

where the random variables J̃^{(i,t)}, i ∈ {1, ..., N}, are drawn conditionally independently from the past according to a multinomial distribution with parameters

P(J̃^{(i,t)} = j | G_{t−1}) = k_t(Λ^{(i,t−1)}, z_j), i ∈ {1, ..., N}, j ∈ {1, ..., M}. (17)

We denote by S̃_t = ((Λ̃^{(i,t)}, w̃^{(i,t)}), i ∈ {1, ..., N}) this intermediate set of particles. In the second step, we resample the intermediate particle system. Resampling consists in transforming the weighted approximation of the probability measure F_t, F_t^N = Σ_{i=1}^N w̃^{(i,t)} δ_{Λ̃^{(i,t)}}, into an unweighted one, F̃_t^N = N^{−1} Σ_{i=1}^N δ_{Λ^{(i,t)}}. To avoid introducing bias during the resampling step, an unbiased resampling procedure should be used.
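As a concrete illustration of the two-step SISR iteration with the optimal importance kernel (14), the following sketch (our illustrative code, not the authors' implementation) takes the transition densities q_t(Λ^{(i,t−1)}, z_j) as a precomputed N × M array, a hypothetical input, draws one offspring per particle, forms the incremental weights (15), and resamples multinomially:

```python
import numpy as np

def sisr_step(particles, weights, qt, rng):
    """One SISR iteration with the optimal importance kernel (sketch).

    particles -- (N, t) array of integer trajectories over Z = {0, ..., M-1}
    weights   -- (N,) importance weights from time t-1
    qt        -- (N, M) array, qt[i, j] = q_t(Lambda^(i,t-1), z_j)
    """
    n, m = qt.shape
    kt = qt / qt.sum(axis=1, keepdims=True)       # optimal kernel, eq. (14)
    # Step 1: draw one offspring per particle, eq. (17).
    offspring = np.array([rng.choice(m, p=kt[i]) for i in range(n)])
    u = qt.sum(axis=1)                            # incremental weights, eq. (15)
    w_tilde = u * weights
    w_tilde = w_tilde / w_tilde.sum()
    # Step 2: unbiased multinomial resampling, eq. (18).
    idx = rng.choice(n, size=n, p=w_tilde)
    new_particles = np.hstack([particles[idx], offspring[idx, None]])
    return new_particles, np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
particles = np.zeros((4, 1), dtype=int)           # N = 4 trajectories of length 1
weights = np.full(4, 0.25)
qt = rng.uniform(0.1, 1.0, size=(4, 3))           # M = 3 possible offsprings
particles, weights = sisr_step(particles, weights, qt, rng)
```

Note how the incremental weight of particle i is the same for every offspring j, as remarked after (15), so it can be computed before the offspring is drawn.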
More precisely, we draw with replacement N indices I^{(1,t)}, ..., I^{(N,t)} in such a way that N^{(i,t)} = Σ_{k=1}^N δ_{i,I^{(k,t)}}, the number of times the ith trajectory is chosen, satisfies

Σ_{i=1}^N N^{(i,t)} = N, E[N^{(i,t)} | G̃_t] = N w̃^{(i,t)} for any i ∈ {1, ..., N}, (18)

where G̃_t is the history of the particle system just before the resampling step (see (11)), that is, G̃_t is the σ-algebra generated by the union of G_{t−1} and σ(J̃^{(1,t)}, ..., J̃^{(N,t)}):

G̃_t = G_{t−1} ∨ σ(J̃^{(1,t)}, ..., J̃^{(N,t)}). (19)

Then, we set, for k ∈ {1, ..., N},

(I^{(k,t)}, J^{(k,t)}) = (I^{(k,t)}, J̃^{(I^{(k,t)},t)}), Λ^{(k,t)} = (Λ^{(I^{(k,t)},t−1)}, z_{J^{(k,t)}}), w^{(k,t)} = 1/N. (20)

Note that the sampling is done with replacement in the sense that the same particle can be either eliminated or copied several times in the final updated sample. We denote by S_t = ((Λ^{(i,t)}, w^{(i,t)}), i ∈ {1, ..., N}) this set of particles.

There are several options to obtain an unbiased sample. The most obvious choice consists in drawing the N particles, conditionally independently on G̃_t, according to a multinomial distribution with normalized weights (w̃^{(1,t)}, ..., w̃^{(N,t)}). In the literature, this is referred to as multinomial sampling. As a result, under multinomial sampling, the particles Λ^{(i,t)} are, conditionally on G̃_t, independent and identically distributed (i.i.d.). There are, however, better algorithms which reduce the added variability introduced during the sampling step (see the appendix).

This procedure is referred to as the SISR procedure. The particles with large normalized importance weights are likely to be selected and will be kept alive; on the contrary, the particles with low normalized importance weights are eliminated. Resampling provides more efficient samples of future states but increases sampling variation in the past states because it reduces the number of distinct trajectories.

Recall that, when the optimal importance distribution is used, for each particle i ∈ {1, ..., N}, the random variables J̃^{(i,t)} are conditionally independent from G_{t−1} and follow a multinomial distribution with parameters

P(J̃^{(i,t)} = j | G_{t−1}) = q_t(Λ^{(i,t−1)}, z_j) / Σ_{j'=1}^M q_t(Λ^{(i,t−1)}, z_{j'}), i ∈ {1, ..., N}, j ∈ {1, ..., M}. (22)

We may compute, for i, k ∈ {1, ..., N} and j ∈ {1, ..., M},

P((I^{(k,t)}, J^{(k,t)}) = (i, j) | G_{t−1})
= E[P(I^{(k,t)} = i, J̃^{(i,t)} = j | G̃_t) | G_{t−1}]
= E[P(I^{(k,t)} = i | G̃_t) 1(J̃^{(i,t)} = j) | G_{t−1}]
= [Σ_{j'=1}^M q_t(Λ^{(i,t−1)}, z_{j'}) w^{(i,t−1)} / Σ_{i'=1}^N Σ_{j'=1}^M q_t(Λ^{(i',t−1)}, z_{j'}) w^{(i',t−1)}] P(J̃^{(i,t)} = j | G_{t−1})
= q_t(Λ^{(i,t−1)}, z_j) w^{(i,t−1)} / Σ_{i'=1}^N Σ_{j'=1}^M q_t(Λ^{(i',t−1)}, z_{j'}) w^{(i',t−1)} := w̄^{(i,j,t)}, (23)

showing that the SISR algorithm is equivalent to drawing, conditionally independently from G_{t−1}, N random variables out of the N × M possible offsprings of the system of particles, with weights (w̄^{(i,j,t)}, i ∈ {1, ..., N}, j ∈ {1, ..., M}).

Resampling can be done at any time. When resampling is done at every time step, it is said to be systematic. In this case, the importance weights at each time t, w^{(i,t)}, i ∈ {1, ..., N}, are all equal to 1/N. Systematic resampling is not always recommended, since resampling is costly from the computational point of view and may result in a loss of statistical efficiency by introducing some additional randomness into the particle system. However, the effect of resampling is not necessarily negative, because it allows one to control the degeneracy of the particle system, which has a positive impact on the quality of the estimates. Therefore, systematic resampling yields in some situations better estimates than the standard SIS procedure (without resampling); in some cases (see Section 4.2 for an illustration), it compares favorably with more sophisticated versions of the SISR algorithm, where resampling is done at random times (e.g., when the entropy or the coefficient of variation of the normalized importance weights is below a threshold).

The SISR algorithm with multinomial sampling defines a Markov chain on the path space.
The transition kernel of this chain depends upon the choice of the proposal distribution and of the unbiased procedure used in the resampling step. These transition kernels are, except in a few special cases, involved. However, when the “optimal” importance distribution (14) is used in conjunction with multinomial sampling, the transition kernel has a simple and intuitive expression. As already mentioned above, the incremental weights for all the possible offsprings of a given particle are, in this case, identical; as a consequence, under multinomial sampling, the indices I^{(k,t)}, k ∈ {1, ..., N}, are i.i.d. with multinomial distribution

P(I^{(k,t)} = i | G̃_t) = Σ_{j=1}^M q_t(Λ^{(i,t−1)}, z_j) w^{(i,t−1)} / Σ_{i'=1}^N Σ_{j=1}^M q_t(Λ^{(i',t−1)}, z_j) w^{(i',t−1)}, i ∈ {1, ..., N}. (21)

2.4. The global sampling algorithm

When the instrumental distribution is the so-called optimal sampling distribution (14), it is possible to combine the sampling/resampling steps above into a single sampling step. This idea has already been mentioned and worked out in [11, Section 3] under the name of deterministic/resample low weights (RLW) approach, yet the algorithm given below is not stated explicitly in this reference.

Let [Λ^{(1,t−1)}, ..., Λ^{(N,t−1)}] be a set of particles at time t − 1 and let [w^{(1,t−1)}, ..., w^{(N,t−1)}] be the associated importance weights. Similar to the SISR step, the GS algorithm produces a set of particles [Λ^{(1,t)}, ..., Λ^{(N,t)}] with equal weights. The GS algorithm combines the two-stage sampling procedure (first, sample a particular offspring of a particle, update the importance weights, and then resample from the population) into a single one.

(i) We first compute the weights

w^{(i,j,t)} = w^{(i,t−1)} q_t(Λ^{(i,t−1)}, z_j), i ∈ {1, ..., N}, j ∈ {1, ..., M}. (24)

(ii) We then draw N random variables ((I^{(1,t)}, J^{(1,t)}), ..., (I^{(N,t)}, J^{(N,t)})) in {1, ..., N} × {1, ..., M} using an unbiased sampling procedure, that is, for all (i, j) ∈ {1, ..., N} × {1, ..., M}, the number of times the pair (i, j) is drawn,

N^{(i,j,t)} := #{k ∈ {1, ..., N} : (I^{(k,t)}, J^{(k,t)}) = (i, j)}, (25)

satisfies the following two conditions:

Σ_{i'=1}^N Σ_{j'=1}^M N^{(i',j',t)} = N,
E[N^{(i,j,t)} | G_{t−1}] = N w^{(i,j,t)} / Σ_{i'=1}^N Σ_{j'=1}^M w^{(i',j',t)}. (26)

The updated set of particles is then defined as

Λ^{(k,t)} = (Λ^{(I^{(k,t)},t−1)}, z_{J^{(k,t)}}), w^{(k,t)} = 1/N. (27)

If multinomial sampling is used, then the GS algorithm is a simple implementation of the SISR algorithm which combines the two-stage sampling into a single one. Since the computational cost of drawing L random variables grows linearly with L, the cost of simulations is proportional to NM for the GS algorithm and NM + N for the SISR algorithm. There is thus a (slight) advantage in using the GS implementation. When sampling is done using a different unbiased method (see the appendix), then there is a more substantial difference between these two algorithms.

3. CONDITIONALLY LINEAR GAUSSIAN STATE-SPACE MODELS

We consider the conditionally linear Gaussian state-space model

s_t = Ψ_t(Λ_{0:t}), X_t = A_{s_t} X_{t−1} + C_{s_t} W_t, Y_t = B_{s_t} X_t + D_{s_t} V_t, (28)

where

(i) {Λ_t}_{t≥0} are the indicator variables, here assumed to take values in a finite set Z = {z_1, z_2, ..., z_M}, where M denotes the cardinal of the set Z; the law of {Λ_t}_{t≥0} is assumed to be known but is otherwise not specified;
(ii) for any t ≥ 0, Ψ_t is a function Ψ_t : Z^{t+1} → S, where S is a finite set;
(iii) {X_t}_{t≥0} are the (n_x × 1) state vectors; these state variables are not directly observed;
(iv) the distribution of X_0 is complex Gaussian with mean µ_0 and covariance Γ_0;
(v) {Y_t}_{t≥0} are the (n_y × 1) observations;
(vi) {W_t}_{t≥0} and {V_t}_{t≥0} are n_w- and n_v-dimensional complex Gaussian white noises, W_t ∼ N_c(0, I_{n_w×n_w}) and V_t ∼ N_c(0, I_{n_v×n_v}), where I_{p×p} is the p × p identity matrix; {W_t}_{t≥0} is referred to as the state noise, whereas {V_t}_{t≥0} is the observation noise;
(vii) {A_s, s ∈ S} are the state transition matrices, {B_s, s ∈ S} are the observation matrices, and {C_s, s ∈ S} and {D_s, s ∈ S} are Cholesky factors of the covariance matrices of the state noise and measurement noise, respectively; these matrices are assumed to be known;
(viii) the indicator process {Λ_t}_{t≥0} and the noise processes {V_t}_{t≥0} and {W_t}_{t≥0} are independent.

This model has been considered by many authors, following the pioneering work in [13, 14] (see [5, 7, 8, 15] for authoritative recent surveys). Despite its simplicity, this model is flexible enough to describe many situations of interest, including linear state-space models with non-Gaussian state or observation noise (heavy-tailed noise), jump linear systems, linear state-space models with missing observations and, of course, digital communication over fading channels.

Our aim in this paper is to compute recursively in time an estimate of the conditional probability of the (unobserved) indicator variable Λ_n given the observations up to time n + ∆, that is, P(Λ_n | Y_{0:n+∆} = y_{0:n+∆}), where ∆ is a nonnegative integer and, for any sequence {λ_t}_{t≥0} and any integers 0 ≤ i ≤ j, λ_{i:j} denotes (λ_i, ..., λ_j).
For this model, the conditional likelihood of the observation given the indicators is Gaussian,

f(y_t | λ_{0:t}, y_{0:t−1}) = g_c(y_t; B_{s_t} µ_{t|t−1}[λ_{0:t}], B_{s_t} Γ_{t|t−1}[λ_{0:t}] B†_{s_t} + D_{s_t} D†_{s_t}), (31)

where µ_{t|t−1}[λ_{0:t}] and Γ_{t|t−1}[λ_{0:t}] denote the predictive mean and covariance of the state, that is, the conditional mean and covariance of the state given the indicator variables λ_{0:t} and the observations up to time t − 1 (the dependence of the predictive mean µ_{t|t−1}[λ_{0:t}] on the observations y_{0:t−1} is implicit). These quantities can be computed recursively using the following Kalman one-step prediction/correction formula. Denote by µ_{t−1}([λ_{0:t−1}]) and Γ_{t−1}([λ_{0:t−1}]) the mean and covariance of the filtering density, respectively. These quantities can be recursively updated as follows:

(i) predictive mean:
µ_{t|t−1}[λ_{0:t}] = A_{s_t} µ_{t−1}[λ_{0:t−1}]; (32)

(ii) predictive covariance:
Γ_{t|t−1}[λ_{0:t}] = A_{s_t} Γ_{t−1}[λ_{0:t−1}] A†_{s_t} + C_{s_t} C†_{s_t}; (33)

(iii) innovation covariance:
Σ_t[λ_{0:t}] = B_{s_t} Γ_{t|t−1}[λ_{0:t}] B†_{s_t} + D_{s_t} D†_{s_t}; (34)

(iv) Kalman gain:
K_t[λ_{0:t}] = Γ_{t|t−1}[λ_{0:t}] B†_{s_t} (Σ_t[λ_{0:t}])^{−1}; (35)

(v) filtered mean:
µ_t[λ_{0:t}] = µ_{t|t−1}[λ_{0:t}] + K_t[λ_{0:t}] (y_t − B_{s_t} µ_{t|t−1}[λ_{0:t}]); (36)

(vi) filtered covariance:
Γ_t[λ_{0:t}] = (I − K_t[λ_{0:t}] B_{s_t}) Γ_{t|t−1}[λ_{0:t}]. (37)

Note that the conditional distribution of the state vector X_t given the observations up to time t, y_{0:t}, is a mixture of Gaussian distributions with a number of components equal to M^{t+1}, which grows exponentially with t.

The GS algorithm for this model proceeds as follows. At time 0, compute the weights

w_j = γ_0(z_j) / Σ_{j'=1}^M γ_0(z_{j'}), j ∈ {1, ..., M}, (39)

where γ_0 is the conditional likelihood (38) at time 0, and draw {I_i, i ∈ {1, ..., N}} in such a way that, for j ∈ {1, ..., M}, E[N_j] = N w_j, where N_j = Σ_{i=1}^N δ_{I_i,j}. Then, set Λ^{(i,0)} = z_{I_i}, i ∈ {1, ..., N}.

At time t ≥ 1, assume that we have N trajectories Λ^{(i,t−1)} = (Λ_0^{(i,t−1)}, ..., Λ_{t−1}^{(i,t−1)}) and that, for each trajectory, we have stored the filtered mean µ^{(i,t−1)} and covariance Γ^{(i,t−1)} defined in (36) and (37), respectively.

(1) For i ∈ {1, ..., N} and j ∈ {1, ..., M}, compute the predictive mean µ_{t|t−1}[Λ^{(i,t−1)}, z_j] and covariance Γ_{t|t−1}[Λ^{(i,t−1)}, z_j] using (32) and (33), respectively. Then, compute the innovation covariance Σ_t[Λ^{(i,t−1)}, z_j] using (34) and evaluate the likelihood γ^{(i,j,t)} of the particle [Λ^{(i,t−1)}, z_j] using (31). Finally, compute the filtered mean µ_t([Λ^{(i,t−1)}, z_j]) and covariance Γ_t([Λ^{(i,t−1)}, z_j]).

(2) Compute the weights

w^{(i,j,t)} = γ^{(i,j,t)} / Σ_{i'=1}^N Σ_{j'=1}^M γ^{(i',j',t)}, i ∈ {1, ..., N}, j ∈ {1, ..., M}. (40)

(3) Draw {(I_k, J_k), k ∈ {1, ..., N}} using an unbiased sampling procedure (see (26)) with weights {w^{(i,j,t)}}, i ∈ {1, ..., N}, j ∈ {1, ..., M}; set, for k ∈ {1, ..., N}, Λ^{(k,t)} = (Λ^{(I_k,t−1)}, z_{J_k}). Store the filtered mean and covariance µ_t([Λ^{(k,t)}]) and Γ_t([Λ^{(k,t)}]) computed using (36) and (37), respectively.

Remark 1. From the trajectories and the computed weights, it is possible to evaluate, for any δ ≥ 0 and t ≥ δ, the posterior probability of Λ_{t−δ} given Y_{0:t} = y_{0:t} as

P̂(Λ_{t−δ} = z_k | Y_{0:t} = y_{0:t}) ∝
Σ_{i=1}^N w^{(i,k,t)}, δ = 0 (filtering),
Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} δ_{Λ_{t−δ}^{(i,t−1)}, z_k}, δ > 0 (fixed-lag smoothing). (41)
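The recursions (32)–(37) translate directly into code. The sketch below is real-valued for readability (for the complex model of this section, the transposes become conjugate transposes), and the returned innovation mean and covariance are exactly the quantities needed to evaluate the likelihood (31):

```python
import numpy as np

def kalman_one_step(mu, Gamma, y, A, B, C, D):
    """One prediction/correction step following (32)-(37); A, B, C, D are the
    matrices selected by the current indicator value s_t."""
    mu_pred = A @ mu                                   # predictive mean, (32)
    Gamma_pred = A @ Gamma @ A.T + C @ C.T             # predictive covariance, (33)
    S = B @ Gamma_pred @ B.T + D @ D.T                 # innovation covariance, (34)
    K = Gamma_pred @ B.T @ np.linalg.inv(S)            # Kalman gain, (35)
    mu_f = mu_pred + K @ (y - B @ mu_pred)             # filtered mean, (36)
    Gamma_f = (np.eye(len(mu)) - K @ B) @ Gamma_pred   # filtered covariance, (37)
    return mu_f, Gamma_f, B @ mu_pred, S               # last two feed the likelihood (31)

# one scalar step with hypothetical parameter values
A = np.array([[0.9]]); B = np.array([[1.0]])
C = np.array([[0.5]]); D = np.array([[0.3]])
mu_f, Gamma_f, y_pred, S = kalman_one_step(np.zeros(1), np.eye(1), np.array([1.0]), A, B, C, D)
```

In the GS algorithm, this step is evaluated once per (particle, offspring) pair, that is, NM times per time instant.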
Similarly, we can approximate the filtering and the smoothing distributions of the state variable as mixtures of Gaussians. For example, we can estimate the filtered mean and variance of the state as follows:

(i) filtered mean:
Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} µ_t([Λ^{(i,t−1)}, z_j]); (42)

(ii) filtered covariance:
Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} Γ_t([Λ^{(i,t−1)}, z_j]). (43)

3.3. Fixed-lag smoothing

Since the state process is correlated, the future observations contain information about the current value of the state; therefore, whenever it is possible to delay the decision, fixed-lag smoothing estimates yield more reliable information on the indicator process than filtering estimates.

As pointed out above, it is possible to determine an estimate of the fixed-lag smoothing distribution for any delay δ from the trajectories and the associated weights produced by the SISR or GS method described above; nevertheless, we should be aware that this estimate can be rather poor when the delay δ is large, as a consequence of the impoverishment of the system of particles (the system of particles “forgets” its past). To address this problem, well known in all particle methods, it has been proposed by several authors (see [11, 16, 17, 18]) to sample at time t from the conditional distribution of Λ_t given Y_{0:t+∆} = y_{0:t+∆} for some ∆ > 0. The computation of the fixed-lag smoothing distribution is also amenable to GS approximation.

Consider the distribution of the indicator variables Λ_{0:t} conditional on the observations Y_{0:t+∆} = y_{0:t+∆}, where ∆ is a positive integer. Denote by {F_t^∆}_{t≥0} this sequence of probability measures, the dependence on the observations y_{0:t+∆} being, as in the previous section, implicit. This sequence of distributions also satisfies (3), that is, there exists a finite transition kernel Q_t^∆ : (Z^t, P(Z)^{⊗t}) → (Z, P(Z)) such that F_t^∆ = F_{t−1}^∆ ⊗ Q_t^∆ for all t ≥ 1. Elementary conditional probability calculations exploiting the conditional independence structure of (28) show that the transition kernel Q_t^∆ can be determined, up to a normalization constant, by the relation

Q_t^∆(λ_{0:t−1}; λ_t) ∝ Σ_{λ_{t+1:t+∆}} Π_{τ=t}^{t+∆} f(y_τ | y_{0:τ−1}, λ_{0:τ}) f(λ_τ | λ_{0:τ−1}) / Σ_{λ_{t:t+∆−1}} Π_{τ=t}^{t+∆−1} f(y_τ | y_{0:τ−1}, λ_{0:τ}) f(λ_τ | λ_{0:τ−1}), (44)

where, for all λ_{0:t−1} ∈ Z^t, the terms f(y_τ | y_{0:τ−1}, λ_{0:τ}) can be determined recursively using the Kalman filter fixed-lag smoothing update formula.

Below, we describe a straightforward implementation of the GS method to approximate the smoothing distribution by the delayed sampling procedure; more sophisticated techniques, using early pruning of the possible prolonged trajectories, are currently under investigation. For any t ∈ N and for any λ_{0:t} ∈ Z^{t+1}, denote

D_t^∆(λ_{0:t}) := Σ_{λ_{t+1:t+∆}} Π_{τ=t+1}^{t+∆} γ_τ(λ_{0:τ}), (45)

where the function γ_τ is defined in (38). With this notation, (44) may be rewritten as

Q_t^∆(λ_{0:t−1}; λ_t) ∝ γ_t(λ_{0:t}) D_t^∆(λ_{0:t}) / D_{t−1}^∆(λ_{0:t−1}). (46)

We now describe one iteration of the algorithm. Assume that, for some time instant t ≥ 1, we have N trajectories Λ^{(j,t−1)} = (Λ_0^{(j,t−1)}, ..., Λ_{t−1}^{(j,t−1)}); in addition, for each trajectory Λ^{(j,t−1)}, the following quantities are stored:

(1) the factor D_{t−1}^∆(Λ^{(j,t−1)}) defined in (45);
(2) for each prolongation λ_{t:τ} ∈ Z^{τ−t+1} with τ ∈ {t, t + 1, ..., t + ∆ − 1}, the conditional likelihood γ_τ(Λ^{(j,t−1)}, λ_{t:τ}) given in (38);
(3) for each prolongation λ_{t:t+∆−1} ∈ Z^∆, the filtered conditional mean µ_{t+∆−1}([Λ^{(j,t−1)}, λ_{t:t+∆−1}]) and covariance Γ_{t+∆−1}([Λ^{(j,t−1)}, λ_{t:t+∆−1}]).

One iteration of the algorithm is then described below.

(1) For each i ∈ {1, ..., N} and each λ_{t:t+∆} ∈ Z^{∆+1}, compute the predictive conditional mean and covariance of the state, µ_{t+∆|t+∆−1}([Λ^{(i,t−1)}, λ_{t:t+∆}]) and Γ_{t+∆|t+∆−1}([Λ^{(i,t−1)}, λ_{t:t+∆}]), using (32) and (33), respectively. Then compute the innovation covariance Σ_{t+∆}[(Λ^{(i,t−1)}, λ_{t:t+∆})] using (34) and the likelihood γ_{t+∆}(Λ^{(i,t−1)}, λ_{t:t+∆}) using (31).

(2) For each i ∈ {1, ..., N} and j ∈ {1, ..., M}, compute

D_t^∆(Λ^{(i,t−1)}, z_j) = Σ_{λ_{t+1:t+∆}} Π_{τ=t+1}^{t+∆} γ_τ(Λ^{(i,t−1)}, z_j, λ_{t+1:τ}),
γ^{(i,j,t)} = γ_t(Λ^{(i,t−1)}, z_j) D_t^∆(Λ^{(i,t−1)}, z_j) / D_{t−1}^∆(Λ^{(i,t−1)}),
w^{(i,j,t)} = γ^{(i,j,t)} / Σ_{i'=1}^N Σ_{j'=1}^M γ^{(i',j',t)}. (47)

(3) Update the trajectories of particles using an unbiased sampling procedure {(I_k, J_k), k ∈ {1, ..., N}} with weights {w^{(i,j,t)}}, i ∈ {1, ..., N}, j ∈ {1, ..., M}, and set Λ^{(k,t)} = (Λ^{(I_k,t−1)}, z_{J_k}), k ∈ {1, ..., N}.
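The factor D_t^∆ of (45) can be computed by brute-force enumeration of the M^∆ prolongations, which is exactly how the iteration above scales. A naive sketch (the callback `gamma` is a stand-in for the conditional likelihood γ_τ of (38)):

```python
import itertools

def delayed_factor(traj, delta, Z, gamma):
    """D_t^Delta(traj), eq. (45): sum, over all prolongations lambda_{t+1:t+Delta}
    in Z^Delta, of the product of the conditional likelihoods gamma_tau."""
    t = len(traj) - 1
    total = 0.0
    for prolong in itertools.product(Z, repeat=delta):   # all M**delta prolongations
        path = list(traj) + list(prolong)
        prod = 1.0
        for tau in range(t + 1, t + delta + 1):          # product over tau in (45)
            prod *= gamma(path[: tau + 1], tau)
        total += prod
    return total

# sanity check: with a constant likelihood c, D_t^Delta = (M * c)^Delta
D = delayed_factor([0], 3, [0, 1], lambda path, tau: 0.5)
```

With M = 2 and c = 0.5, the result is (2 · 0.5)³ = 1, which makes the exponential cost in ∆ (and the interest of early pruning) apparent.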
4. SOME EXAMPLES

4.1. Autoregressive model with jumps

To illustrate how the GS method works, we consider the state-space model

X_t = a_{Λ_t} X_{t−1} + σ_{Λ_t} ε_t,
Y_t = X_t + ρ η_t, (48)

where {ε_t}_{t≥0} and {η_t}_{t≥0} are i.i.d. unit-variance Gaussian noises. We assume that {Λ_t}_{t≥0} is an i.i.d. sequence of random variables taking values in Z := {1, 2}, which is independent from both {ε_t}_{t≥0} and {η_t}_{t≥0}, and such that P[Λ_0 = i] = π_i, i ∈ Z. This can easily be extended to deal with the Markovian case. This simple model has been dealt with, among others, in [19] and [20, Section 5.1]. We focus in this section on the filtering problem, that is, we approximate the distribution of the hidden state X_t given the observations up to time t, Y_{0:t} = y_{0:t}. For this model, we can carry out the computations easily. The transition kernel q_t defined in (30) is given, for all λ_{0:t−1} ∈ Z^t, λ_t ∈ Z, by

q_t(λ_{0:t−1}, λ_t) ∝ (π_{λ_t} / √(2π Σ_t[λ_{0:t}])) exp(−(y_t − µ_{t|t−1}[λ_{0:t}])² / (2 Σ_t[λ_{0:t}])), (49)

where the mean µ_{t|t−1}[λ_{0:t}] and covariance Σ_t[λ_{0:t}] are computed recursively from the filtering mean µ_{t−1}([λ_{0:t−1}]) and covariance Γ_{t−1}([λ_{0:t−1}]) according to the following one-step Kalman update equations, derived from (32), (33), and (34):

(i) predictive mean:
µ_{t|t−1}[λ_{0:t}] = a_{λ_t} µ_{t−1}[λ_{0:t−1}]; (50)

(ii) predictive covariance:
Γ_{t|t−1}[λ_{0:t}] = a²_{λ_t} Γ_{t−1}[λ_{0:t−1}] + σ²_{λ_t}; (51)

(iii) innovation covariance:
Σ_t[λ_{0:t}] = Γ_{t|t−1}[λ_{0:t}] + ρ²; (52)

(iv) filtered mean:
µ_t[λ_{0:t}] = µ_{t|t−1}[λ_{0:t}] + (Γ_{t|t−1}[λ_{0:t}] / (Γ_{t|t−1}[λ_{0:t}] + ρ²)) (y_t − µ_{t|t−1}[λ_{0:t}]); (53)

(v) filtered covariance:
Γ_t[λ_{0:t}] = ρ² Γ_{t|t−1}[λ_{0:t}] / (Γ_{t|t−1}[λ_{0:t}] + ρ²). (54)

We have used the parameters of the experiments carried out in [20, Section 5.1]: a_i = 0.9 (i = 1, 2), σ_1 = 0.5, σ_2 = 1.5, π_1 = 0.7, and ρ = 0.3, and applied the GS and the SISR algorithms for online filtering. We compare estimates of the filtered state mean using the GS and the SIS with systematic resampling. In both cases, we use the estimator (42) of the filtered mean. Two different unbiased sampling strategies are used: multinomial sampling and the modified stratified sampling (detailed in the appendix).¹ In Figure 1, we have displayed the box and whisker plot² of the difference between the filtered mean estimate (42) and the true value of the state variables for N = 5, 10, 50 particles, using multinomial sampling (Figure 1a) and the modified stratified sampling (Figure 1b). These results are obtained from 100 independent Monte Carlo experiments where, for each experiment, a new set of observations and state variables is simulated. These simulations show that, for the autoregressive model, the filtering algorithm performs reasonably well even when the number of particles is small (the difference between N = 5 and N = 50 particles is negligible; N = 50 particles is suggested in the literature for the same simulation setting [20]). There are no noticeable differences between the standard SISR implementation and the GS implementation. Note that the error in the estimate is dominated by the filtered variance E[(X_t − E[X_t | Y_{0:t}])²]; the additional variations induced by the fluctuations of the particle estimates are an order of magnitude lower than this quantity.

To visualize the difference between the sampling schemes, it is more appropriate to consider the fluctuation of the filtered mean estimates around their sample mean for a given value of the time index and of the observations. In Figure 2, we have displayed the box and whisker plot of the error, at time index 25, between the filtered mean estimates and their sample mean; these results have been obtained from 100 independent runs of the particle filter (this time, the set of observations and of states is held fixed over all the Monte Carlo simulations). As above, we have used N = 5, 10, 50 particles and two sampling methods: multinomial sampling (Figure 2a) and modified stratified sampling (Figure 2b). This figure shows that the GS estimate of the sample mean has a lower standard deviation than any other estimator included in this comparison, independently of the number of particles used. The differences between these estimators are however small compared to the filtering variance.

¹ The Matlab code to reproduce these experiments is available at http://www.tsi.enst.fr/∼moulines/.
² The lower and upper limits of the box are the quartiles; the horizontal line in the box is the sample median; the upper and lower whiskers are at 3/2 times the interquartile range.

4.2. Joint channel equalization and symbol detection on a flat Rayleigh-fading channel

4.2.1. Model description

We consider in this section a problem arising in transmission over a Rayleigh-fading channel.
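For this scalar model, the recursions (50)–(54) and the kernel (49) fit in a few lines; the following sketch (the function and parameter names are ours) evaluates one candidate indicator value and the corresponding kernel entry:

```python
import numpy as np

def ar_jump_step(mu_prev, Gamma_prev, y, a, sigma2, rho2):
    """One step of the scalar recursions (50)-(54) for model (48), for a
    candidate indicator value lambda_t with coefficient a = a_{lambda_t}
    and state-noise variance sigma2 = sigma_{lambda_t}^2."""
    mu_pred = a * mu_prev                               # predictive mean, (50)
    Gamma_pred = a**2 * Gamma_prev + sigma2             # predictive covariance, (51)
    S = Gamma_pred + rho2                               # innovation covariance, (52)
    mu_f = mu_pred + (Gamma_pred / S) * (y - mu_pred)   # filtered mean, (53)
    Gamma_f = rho2 * Gamma_pred / S                     # filtered covariance, (54)
    return mu_f, Gamma_f, mu_pred, S

def q_kernel(mu_prev, Gamma_prev, y, params, pi, rho2):
    """Unnormalized transition kernel q_t of (49), one entry per value of lambda_t."""
    q = []
    for (a, sigma2), p in zip(params, pi):
        _, _, mu_pred, S = ar_jump_step(mu_prev, Gamma_prev, y, a, sigma2, rho2)
        q.append(p * np.exp(-(y - mu_pred) ** 2 / (2 * S)) / np.sqrt(2 * np.pi * S))
    return np.array(q)

mu_f, Gamma_f, mu_pred, S = ar_jump_step(0.0, 1.0, 0.5, 0.9, 0.25, 0.09)
q = q_kernel(0.0, 1.0, 0.5, [(0.9, 0.25), (0.9, 2.25)], [0.7, 0.3], 0.09)
```

With the parameters of the experiment above, `params = [(0.9, 0.5**2), (0.9, 1.5**2)]` and `rho2 = 0.3**2`; normalizing `q` gives the optimal offspring probabilities for one particle.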
Figure 1: Box and whisker plot of the difference between the filtered mean estimates and the actual value of the state for 100 independent Monte Carlo experiments. (a) Multinomial sampling. (b) Residual sampling with the modified stratified sampling.
Consider a communication system signaling through a flat-fading channel with additive noise. In this context, the indicator variables {Λ_t} in the representation (28) are the input bits which are transmitted over the channel, and {S_t}_{t≥0} are the symbols, generally taken in an M-ary complex alphabet. The function Ψ_t is thus the function which maps the stream of input bits into a stream of complex symbols: this function combines channel encoding and symbol mapping. In the simple example considered below, we assume binary phase shift keying (BPSK) modulation with differential encoding: S_t = S_{t−1}(2Λ_t − 1). The input-output relationship of the flat-fading channel is described by

Y_t = α_t S_t + V_t, (55)

where Y_t, α_t, S_t, and V_t denote the received signal, the fading channel coefficient, the transmitted symbol, and the additive noise at time t, respectively. It is assumed in the sequel that

(i) the processes {α_t}_{t≥0}, {Λ_t}_{t≥0}, and {V_t}_{t≥0} are mutually independent;
(ii) the noise {V_t} is a sequence of i.i.d. zero-mean complex random variables, V_t ∼ N_c(0, σ_V²).

It is further assumed that the channel fading process is Rayleigh, that is, {α_t} is a zero-mean complex Gaussian process, here modelled as an ARMA(L, L) process,

α_t − φ_1 α_{t−1} − ··· − φ_L α_{t−L} = θ_0 η_t + θ_1 η_{t−1} + ··· + θ_L η_{t−L}, (56)
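A direct simulation of (55)–(56) with the differential BPSK encoding above, together with the incoherent differential detector used later as a benchmark, can be sketched as follows (the function names and the zero initial state of the ARMA recursion are our choices):

```python
import numpy as np

def simulate_channel(T, phi, theta, sigma_v, rng):
    """Simulate the flat Rayleigh-fading model: ARMA(L, L) fading, eq. (56),
    differential BPSK, and the observation Y_t = alpha_t S_t + V_t, eq. (55)."""
    L = len(phi)
    eta = (rng.standard_normal(T + L) + 1j * rng.standard_normal(T + L)) / np.sqrt(2)
    alpha = np.zeros(T + L, dtype=complex)
    for t in range(L, T + L):                           # ARMA recursion, zero initial state
        ar = sum(phi[k] * alpha[t - 1 - k] for k in range(L))
        ma = sum(theta[k] * eta[t - k] for k in range(L + 1))
        alpha[t] = ar + ma
    alpha = alpha[L:]
    bits = rng.integers(0, 2, size=T)                   # Lambda_t in {0, 1}
    s = np.ones(T)
    for t in range(1, T):
        s[t] = s[t - 1] * (2 * bits[t] - 1)             # differential encoding
    v = sigma_v * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
    return bits, s, alpha, alpha * s + v                # observation, eq. (55)

def differential_detector(y):
    """Incoherent differential detection: hat-Lambda_t = sign(Re{Y_t Y*_{t-1}})."""
    return (np.real(y[1:] * np.conj(y[:-1])) > 0).astype(int)
```

With the ARMA(3, 3) coefficients of the fast-fading example of Section 4.2.2 and low noise, the differential detector recovers most of the bits, since the fading phase varies slowly from one symbol to the next.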
Figure 2: Box and whisker plot of the difference between the filtered mean estimates and their sample mean, over 100 independent runs, at time index 25 and for a fixed realization of the observations. (a) Multinomial sampling. (b) Residual sampling with the modified stratified sampling.
where φ_1, ..., φ_L and θ_0, ..., θ_L are the autoregressive and moving average (ARMA) coefficients, respectively, and {η_t} is a white complex Gaussian noise with zero mean and unit variance. This model can be written in state-space form as follows:

X_{t+1} = [ 0    1        0    ···  0
            0    0        1    ···  0
            ·    ·        ·         ·
            φ_L  φ_{L−1}  ···       φ_1 ] X_t + [ψ_1, ψ_2, ..., ψ_L]^T η_t,
α_t = [1 0 ··· 0] X_t + ψ_0 η_t, (57)

where {ψ_k}_{k≥0} are the coefficients of the expansion of θ(z)/φ(z), for |z| ≤ 1, with

φ(z) = 1 − φ_1 z − ··· − φ_L z^L,
θ(z) = θ_0 + θ_1 z + ··· + θ_L z^L. (58)

This particular problem has been considered, among others, in [10, 16, 18, 21, 22].

4.2.2. Simulation results

To allow comparison with previously reported work, we consider the example studied in [16, Section VIII]. In this example, the fading process is modelled by the output of a Butterworth filter of order L = 3 whose cutoff frequency is 0.05, corresponding to a normalized Doppler frequency f_d T = 0.05 with respect to the symbol rate 1/T, which is a fast-fading scenario. More specifically, the fading process is modelled by the ARMA(3, 3) process

α_t − 2.37409 α_{t−1} + 1.92936 α_{t−2} − 0.53208 α_{t−3} = 10⁻² (0.89409 η_t + 2.68227 η_{t−1} + 2.68227 η_{t−2} + 0.89409 η_{t−3}), (59)

where η_t ∼ N_c(0, 1). It is assumed that BPSK modulation is used, that is, S_t ∈ {−1, +1}, with differential encoding and no channel code; more precisely, we assume that S_t = S_{t−1} Λ_t, where Λ_t ∈ {−1, +1} is the bit sequence, assumed to be i.i.d. Bernoulli random variables with probability of success P(Λ_t = 1) = 1/2.

The performance of the GS receiver (using the modified stratified sampling algorithm of the appendix) has been compared with the following receiver schemes.
(1) Known channel lower bound. We assume that the true fading coefficients α_t are known to the receiver and we calculate the optimal coherent detection rule Ŝ_t = sign(Re{α_t* Y_t}) and Λ̂_t = Ŝ_t Ŝ_{t−1}.

(2) Genie-aided lower bound. We assume that a genie allows the receiver to observe Ỹ_t = α_t + Ṽ_t, with Ṽ_t ∼ N_c(0, σ_V²). We use Ỹ_t to calculate an estimate α̂_t of the fading coefficients via a Kalman filter, and we then evaluate the optimal coherent detection rule Ŝ_t = sign(Re{α̂_t* Y_t}) and Λ̂_t = Ŝ_t Ŝ_{t−1} using the filtered fading process.

(3) Differential detector. In this scenario, no attempt is made to estimate the fading process, and the input bits are estimated using incoherent differential detection: Λ̂_t = sign(Re{Y_t Y*_{t−1}}).

(4) MKF detector. The SMC filter described in [16, Sections IV and V] is used to estimate Λ_t. The MKF detector uses the SISR algorithm to draw samples in the indicator space and implements a Kalman filter for each trajectory in order to compute its trial sampling density and its importance weight. Resampling is performed when the ratio between the effective sample size defined in [16, equation (45)] and the actual sample size N is lower than a threshold β. The delayed-weight method is used to obtain an estimate of Λ_t with a delay δ.

In all the simulations below, we have used only the concurrent sampling method, because in the considered simulation scenarios the use of the delayed sampling method did not bring significant improvement. This is mainly due to the fact that we have only considered, due to space limitations, the uncoded communication scenario.

Figure 3 shows the BER performance of each receiver versus the SNR. The SNR is defined as var(α_t)/var(V_t), and the BER is obtained by averaging the error rate over 10⁶ symbols. The first 50 symbols were not taken into account in counting the BER. The BER performance of the GS receiver is shown for estimation delays δ = 0 (concurrent estimation) and δ = 1. Also shown are the BER curves for the known channel lower bound, the genie-aided lower bound, the differential detector, and the MKF detector with estimation delays δ = 0 and δ = 1 and resampling thresholds β = 0.1 and β = 1 (systematic resampling). The number of particles for both the GS receiver and the MKF detector is set to 50.

Figure 3: BER performance of the GS receiver versus the SNR. The BER corresponding to delays δ = 0 and δ = 1 are shown. Also shown in this figure are the BER curves for the MKF detector (δ = 0 and δ = 1), the known channel lower bound, the genie-aided lower bound, and the differential detector. The number of particles for the GS receiver and the MKF detector is 50.

From this figure, it can be seen that, with 50 particles, there is no significant performance difference between the proposed receiver and the MKF detector with the same estimation delay and β = 0.1 or β = 1. Note that, as observed in [16], the performance of the receiver is significantly improved by the delayed-weight method with δ = 1 compared with the concurrent estimate; there is no substantial improvement when increasing the delay further; the GS receiver achieves essentially the genie-aided bound over the considered SNR range.

Figure 4 shows the BER performance of the GS receiver versus the number of particles at SNR = 20 dB and δ = 1. Also shown in this figure is the BER performance of the MKF detector with β = 0.1 and β = 1, respectively. It can be seen from this plot that, when the number of particles is decreased from 50 to 10, the BER of the MKF receiver with β = 0.1 increases by 67%, whereas the BER of the GS receiver increases by 11% only. In fact, Figure 4 also shows that, for this particular example, the BER performance of the GS receiver is identical to the BER performance of an MKF with
the same number of particles and a resampling threshold set to β = 1 (systematic resampling). This suggests that, contrary to what is usually argued in the literature [5, 16], systematic resampling of the particles seems to be, for reasons which remain yet unclear from a theoretical standpoint, more robust when the number of particles is decreased to meet the constraints of real-time implementation.

Figure 4: BER performance of the GS receiver versus the number of particles at SNR = 20 dB and δ = 1. Also shown in this figure are the BER curves for the MKF detector with β = 0.1 and β = 1.

Figure 5 shows the BER performance of each receiver versus the SNR when the number of particles for both the GS receiver and the MKF detector is set to 5. For these simulations, the BER is obtained by averaging the error rate over 10⁵ symbols. From this figure, it can be seen that, with 5 particles, there is a significant performance difference between the proposed receiver and the MKF detector with the same estimation delay and a β = 0.1 resampling threshold.

Figure 5: BER performance of the GS receiver versus the SNR. The BER corresponding to delay δ = 1 is shown. Also shown in this figure are the BER curves for the MKF detector (δ = 1, β = 0.1), the known channel lower bound, the genie-aided lower bound, and the differential detector. The number of particles for the GS receiver and the MKF detector is 5.
This difference remains significant even for SNR values close to 10 dB. Figure 5 also shows that, for this particular example, the BER performance of the GS receiver is identical to the BER performance of an MKF with the same estimation delay and a resampling threshold β set to 1.

5. CONCLUSION

In this paper, a sampling algorithm for conditionally linear Gaussian state-space models has been introduced. This algorithm exploits the particular structure of the flow of probability measures and the fact that, at each time instant, a global exploration of all possible offsprings of a given trajectory of indicator variables can be considered. The number of trajectories is kept constant by sampling from this set (selection step).

The global sampling algorithm appears, in the examples considered here, to be robust even when a very limited number of particles is used, which is a basic requirement for the implementation of such a solution in real-world applications: the global sampling algorithm is close to the optimal genie-aided bound with as few as 5 particles and thus provides a realistic alternative to the joint channel equalization and symbol detection algorithms reported earlier in the literature.

APPENDIX

MODIFIED STRATIFIED SAMPLING

In this appendix, we present the so-called modified stratified sampling strategy. Let M and N be integers and let (w_1, ..., w_M) be nonnegative weights such that Σ_{i=1}^M w_i = 1. A sampling procedure is said to be unbiased if the random vector (N_1, ..., N_M) (where N_i is the number of times the index i is drawn) satisfies

Σ_{i=1}^M N_i = N, E[N_i] = N w_i, i ∈ {1, ..., M}. (A.1)

The modified stratified sampling is summarized as follows.

(1) For i ∈ {1, ..., M}, compute [N w_i], where [x] is the integer part of x; then compute the residual number Ñ = N − Σ_{i=1}^M [N w_i] and the residual weights

w̃_i = (N w_i − [N w_i]) / Ñ, i ∈ {1, ..., M}. (A.2)
(2) Draw Ñ i.i.d. random variables U_1, ..., U_Ñ with a uniform distribution on [0, 1/Ñ] and compute, for k ∈ {1, ..., Ñ},

Ũ_k = (k − 1)/Ñ + U_k. (A.3)

(3) For i ∈ {1, ..., M}, set Ñ_i as the number of indices k ∈ {1, ..., Ñ} satisfying

Σ_{j=1}^{i−1} w̃_j < Ũ_k ≤ Σ_{j=1}^i w̃_j. (A.4)

REFERENCES

[1] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, Prentice Hall, Englewood Cliffs, NJ, USA, 1st edition, 2000.
[2] I. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, vol. 70 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1997.
[3] A. Doucet, N. de Freitas, and N. Gordon, “An introduction to sequential Monte Carlo methods,” in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., pp. 3–13, Springer, New York, NY, USA, 2001.
[4] C. Carter and R. Kohn, “Markov chain Monte Carlo in conditionally Gaussian state space models,” Biometrika, vol. 83, no. 3, pp. 589–601, 1996.
[5] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[6] N. Shephard, “Partial non-Gaussian state space,” Biometrika, vol. 81, no. 1, pp. 115–131, 1994.
[7] J. Liu and R. Chen, “Sequential Monte Carlo methods for dynamic systems,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1032–1044, 1998.
[8] R. Chen and J. Liu, “Mixture Kalman filters,” Journal of the Royal Statistical Society Series B, vol. 62, no. 3, pp. 493–508, 2000.
[9] G. Rigal, Filtrage non-linéaire, résolution particulaire et applications au traitement du signal, Ph.D. dissertation, Université Paul Sabatier, Toulouse, France, 1993.
[10] J. Liu and R. Chen, “Blind deconvolution via sequential imputations,” Journal of the American Statistical Association, vol. 90, no. 430, pp. 567–576, 1995.
[11] E. Punskaya, C. Andrieu, A. Doucet, and W. Fitzgerald, “Particle filtering for multiuser detection in fading CDMA channels,” in Proc. 11th IEEE Signal Processing Workshop on Statistical Signal Processing, pp. 38–41, Orchid Country Club, Singapore, August 2001.
[12] V. Zaritskii, V. Svetnik, and L. Shimelevich, “Monte-Carlo technique in problems of optimal information processing,” Automation and Remote Control, vol. 36, no. 12, pp. 95–103, 1975.
[13] H. Akashi and H. Kumamoto, “Random sampling approach to state estimation in switching environments,” Automatica, vol. 13, no. 4, pp. 429–434, 1977.
[14] J. Tugnait, “Adaptive estimation and identification for discrete systems with Markov jump parameters,” IEEE Trans. Automatic Control, vol. 27, no. 5, pp. 1054–1065, 1982.
[15] A. Doucet, N. Gordon, and V. Krishnamurthy, “Particle filters for state estimation of jump Markov linear systems,” IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[16] R. Chen, X. Wang, and J. Liu, “Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering,” IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[17] X. Wang, R. Chen, and D. Guo, “Delayed-pilot sampling for mixture Kalman filter with application in fading channels,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 241–254, 2002.
[18] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, “Particle filtering for demodulation in fading channels with non-Gaussian additive noise,” IEEE Transactions on Communications, vol. 49, no. 4, pp. 579–582, 2001.
[19] G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian non-linear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[20] J. Liu, R. Chen, and T. Logvinenko, “A theoretical framework for sequential importance sampling and resampling,” in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., Springer, New York, NY, USA, 2001.
[21] F. Ben Salem, “Récepteur particulaire pour canaux mobiles évanescents,” in Journées Doctorales d’Automatique (JDA ’01), pp. 25–27, Toulouse, France, September 2001.
[22] F. Ben Salem, Réception particulaire pour canaux multi-trajets évanescents en communication radiomobile, Ph.D. dissertation, Université Paul Sabatier, Toulouse, France, 2002.

Pascal Cheung-Mon-Chan graduated from the École Normale Supérieure de Lyon in 1994 and received the Diplôme d’Ingénieur from the École Nationale Supérieure des Télécommunications (ENST), Paris, in the same year. After working for General Electric Medical Systems as a Research and Development Engineer, he received the Ph.D. degree from the ENST, Paris, in 2003. He is currently a member of the research staff at France Telecom Research and Development.

Eric Moulines was born in Bordeaux, France, in 1963. He received the M.S. degree from École Polytechnique in 1984, the Ph.D. degree in signal processing from the École Nationale Supérieure des Télécommunications (ENST) in 1990, and an “Habilitation à Diriger des Recherches” in applied mathematics (probability and statistics) from Université René Descartes (Paris V) in 1995. From 1986 until 1990, he was a member of the technical staff at the Centre National de Recherche des Télécommunications (CNET). Since 1990, he has been with ENST, where he is presently a Professor (since 1996). His teaching and research interests include applied probability, mathematical and computational statistics, and signal processing.
Multilevel Mixture Kalman Filter
Dong Guo Department of Electrical Engineering, Columbia University, New York, NY 10027, USA Email: [email protected]
Xiaodong Wang Department of Electrical Engineering, Columbia University, New York, NY 10027, USA Email: [email protected]
Rong Chen Department of Information and Decision Sciences, University of Illinois at Chicago, Chicago, IL 60607-7124, USA Email: [email protected] Department of Business Statistics & Econometrics, Peking University, Beijing 100871, China
Received 30 April 2003; Revised 18 December 2003
The mixture Kalman filter is a general sequential Monte Carlo technique for conditionally linear dynamic systems. It generates samples of some indicator variables recursively based on sequential importance sampling (SIS) and integrates out the linear and Gaussian state variables conditioned on these indicators. Due to the marginalization process, the complexity of the mixture Kalman filter is quite high if the dimension of the indicator sampling space is high. In this paper, we address this difficulty by developing a new Monte Carlo sampling scheme, namely, the multilevel mixture Kalman filter. The basic idea is to make use of the multilevel or hierarchical structure of the space from which the indicator variables take values. That is, we draw samples in a multilevel fashion, beginning by sampling from the highest-level sampling space and then drawing samples from the associated subspace of the newly drawn samples in a lower-level sampling space, until reaching the desired sampling space. Such a multilevel sampling scheme can be used in conjunction with a delayed estimation method, such as the delayed-sample method, resulting in a delayed multilevel mixture Kalman filter. Examples in wireless communication, specifically coherent and noncoherent 16-QAM over flat-fading channels, are provided to demonstrate the performance of the proposed multilevel mixture Kalman filter.

Keywords and phrases: sequential Monte Carlo, mixture Kalman filter, multilevel mixture Kalman filter, delayed-sample method.
1. INTRODUCTION

Recently there has been significant interest in the use of sequential Monte Carlo (SMC) methods to solve online estimation and prediction problems in dynamic systems. Compared with the traditional filtering methods, the simple, flexible, yet powerful SMC provides effective means to overcome the computational difficulties in dealing with nonlinear dynamic models. The basic idea of the SMC technique [...]

[...] dynamic linear model (DLM) [16], and it can be generally described as follows:

    x_t = F_{λ_t} x_{t−1} + G_{λ_t} u_t,
    y_t = H_{λ_t} x_t + K_{λ_t} v_t,    (1)

where u_t ∼ N(0, I) and v_t ∼ N(0, I) are the state and observation noise, respectively, and λ_t is a sequence of random indicator variables which may form a Markov chain, but are independent of u_t and v_t and the past x_s and y_s, s < t.
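To make the model concrete, the following minimal sketch simulates a scalar instance of (1) and runs the Kalman filter conditioned on a known indicator trajectory, which is the exact conditional inference that the MKF performs per sample stream. The two-regime parameters, dimensions, and i.i.d. indicators are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar CDLM of the form (1): two regimes indexed by lambda_t.
F = {0: 0.95, 1: 0.5}   # state transition coefficients F_lambda
G = {0: 0.3,  1: 1.0}   # state noise gains G_lambda
H = {0: 1.0,  1: 1.0}   # observation maps H_lambda
K = {0: 0.5,  1: 0.5}   # observation noise gains K_lambda

T = 100
lam = rng.integers(0, 2, size=T)   # indicator sequence (i.i.d. for simplicity)
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = F[int(lam[t])] * x[t - 1] + G[int(lam[t])] * rng.standard_normal()
    y[t] = H[int(lam[t])] * x[t] + K[int(lam[t])] * rng.standard_normal()

# Conditioned on the indicator trajectory, (1) is linear-Gaussian, so a
# standard Kalman filter yields the exact conditional mean and variance.
mu, sig2 = 0.0, 1.0
for t in range(1, T):
    f, g, h, k = F[int(lam[t])], G[int(lam[t])], H[int(lam[t])], K[int(lam[t])]
    mu_p = f * mu                                    # predicted mean
    sig2_p = f * f * sig2 + g * g                    # predicted variance
    gain = sig2_p * h / (h * h * sig2_p + k * k)     # Kalman gain
    mu = mu_p + gain * (y[t] - h * mu_p)             # updated mean
    sig2 = (1.0 - gain * h) * sig2_p                 # updated variance
```

Marginalizing over the unknown indicators then amounts to running one such filter per sampled indicator trajectory and mixing the resulting Gaussians, which is exactly where the cost of a large indicator space enters.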
[Figure 1 shows a 16-QAM constellation partitioned into four quadrants with centers c1, ..., c4.]

Figure 1: The multilevel structure of the 16-QAM modulation used in digital communications. The set of transmitted symbols A1 = {s_{i,j}, i = 1, ..., 4, j = 1, ..., 4} is the original sampling space, and the centers A2 = {c1, c2, c3, c4} constitute a higher-level sampling space.

However, the computational complexity of the MKF can be quite high, especially in the case of a high-dimensional indicator space, due to the need of marginalizing out the indicator variables. Fortunately, the space from which the indicator variables take values often exhibits multilevel or hierarchical structures, which can be exploited to reduce the computational complexity of the MKF. For example, a multilevel structure of the 16-QAM modulation used in digital communications is shown in Figure 1. The set of transmitted symbols A1 = {s_{i,j}, i = 1, ..., 4, j = 1, ..., 4} is the original sampling space, and the centers A2 = {c1, c2, c3, c4} constitute a higher-level sampling space. Thus, based on the observed data, for every sample stream, we first draw a sample (say c1) from the higher-level sampling space A2 and then draw a new sample from the associated subspace {s_{1,1}, s_{1,2}, s_{1,3}, s_{1,4}} of c1 in the original sampling space A1. In this way, we need not sample from the entire original sampling space, and many Kalman filter update steps associated with the standard MKF can be saved.

This kind of hierarchical structure imposed on the indicator space is also employed in the partitioned sampling strategy [18], which greatly improved the efficiency and the accuracy of multiple target tracking over the original SMC methods. However, in this paper, the hierarchical structure is employed to reduce the computational load associated with the MKF, especially for a high-dimensional indicator space, while retaining the desirable properties of the MKF.

Dynamic systems often possess strong memories, that is, future observations can reveal substantial information about the current state. Therefore, it is often beneficial to make use of these future observations in sampling the current state. However, an MKF method usually does not go back to regenerate past samples in view of the new observations, although the past estimation can be adjusted by using the new importance weights. To overcome this difficulty, a delayed-sample method was developed [13]. It makes use of future observations in generating samples of the current state. It is seen there that this method is especially effective in improving the performance of the MKF. However, the computational complexity of the delayed-sample method is very high. For example, for a ∆-step delayed-sample method, the algorithmic complexity is O(|A1|^∆), where |A1| is the cardinality of the original sampling space. Here, we also provide a delayed multilevel MKF by exploiting the multilevel structure of the indicator space. Instead of exploring the entire original space of future states, we only sample the future states in a higher-level sampling space, thus significantly reducing the dimension of the search space and the computational complexity.

In recent years, the SMC methods have been successfully employed in several important problems in communications, such as detection in flat-fading channels [13, 19, 20], space-time coding [21, 22], OFDM systems [23], and so on. To show the good performance of the proposed novel receivers, we apply them to the problem of adaptive detection in flat-fading channels in the presence of Gaussian noise.

The remainder of the paper is organized as follows. Section 2 briefly reviews the MKF algorithm and its variants. In Section 3, we present the multilevel MKF algorithm. In Section 4, we treat the delayed multilevel MKF algorithm. In Section 5, we provide simulation examples. Section 6 concludes the paper.

2. BACKGROUND OF MIXTURE KALMAN FILTER

2.1. Mixture Kalman filter

Consider again the CDLMs defined by (1). The MKF exploits the conditional Gaussian property conditioned on the indicator variable and utilizes a marginalization operation to improve the algorithmic efficiency. Instead of dealing with both x_t and λ_t, the MKF draws Monte Carlo samples only in the indicator space and uses a mixture of Gaussian distributions to approximate the target distribution. Compared with the generic SMC method, the MKF is substantially more efficient (e.g., it produces more accurate results with the same computational resources).

First we define an important concept that is used throughout the paper. A set of random samples and the corresponding weights {(η^(j), w^(j))}_{j=1}^m is said to be properly weighted with respect to the distribution π(·) if, for any measurable function h, we have

    sum_{j=1}^m h(η^(j)) w^(j) / sum_{j=1}^m w^(j)  →  E_π[h(η)]  as m → ∞.    (2)

In particular, if η^(j) is sampled from a trial distribution g(·) which has the same support as π, and if w^(j) = π(η^(j))/g(η^(j)), then {(η^(j), w^(j))}_{j=1}^m is properly weighted with respect to π(·).

Let Y_t = (y_0, y_1, ..., y_t) and Λ_t = (λ_0, λ_1, ..., λ_t). By recursively generating a set of properly weighted random samples {(Λ_t^(j), w_t^(j))}_{j=1}^m to represent p(Λ_t | Y_t), the MKF approximates the target distribution p(x_t | Y_t) by a random mixture of Gaussian distributions

    sum_{j=1}^m w_t^(j) N(μ_t^(j), Σ_t^(j)),    (3)

where

    μ_t^(j) = μ_t(Λ_t^(j)),    Σ_t^(j) = Σ_t(Λ_t^(j))    (4)

are obtained with a Kalman filter on the system (1) for the given indicator trajectory Λ_t^(j). Denote

    κ_t^(j) ≜ (μ_t^(j), Σ_t^(j)).    (5)

Thus, a key step in the MKF is the production at time t of the weighted samples of indicators {(Λ_t^(j), κ_t^(j), w_t^(j))}_{j=1}^m based on the set of samples {(Λ_{t−1}^(j), κ_{t−1}^(j), w_{t−1}^(j))}_{j=1}^m at the previous time (t − 1). Suppose that the indicator λ_t takes values from a finite set A1. The MKF algorithm is as follows.

Algorithm 1 (MKF). Suppose at time (t − 1), a set of properly weighted samples {(Λ_{t−1}^(j), κ_{t−1}^(j), w_{t−1}^(j))}_{j=1}^m is available with respect to p(Λ_{t−1} | Y_{t−1}). Then at time t, as the new data y_t becomes available, the following steps are implemented to update each weighted sample.

For j = 1, ..., m, the following steps are applied.

(i) Based on the new data y_t, for each a_i ∈ A1, run a one-step Kalman filter update assuming λ_t = a_i to obtain

    κ_{t−1}^(j)  —(y_t, λ_t → a_i)→  κ_{t,i}^(j) ≜ [μ_t(Λ_{t−1}^(j), λ_t = a_i), Σ_t(Λ_{t−1}^(j), λ_t = a_i)].    (6)

(ii) For each a_i ∈ A1, compute the sampling density

    ρ_{t,i}^(j) = P(λ_t = a_i | Λ_{t−1}^(j), Y_t)
              ∝ p(y_t | λ_t = a_i, Λ_{t−1}^(j), Y_{t−1}) P(λ_t = a_i | Λ_{t−1}^(j)).    (7)

Note that by the model (1), the first density in (7) is Gaussian and can be computed based on the Kalman filter update (6). Draw a sample λ_t^(j) according to the above sampling density. Append λ_t^(j) to Λ_{t−1}^(j) and obtain Λ_t^(j). If λ_t^(j) = a_i, then set κ_t^(j) = κ_{t,i}^(j).

(iii) Compute the importance weight

    w_t^(j) = w_{t−1}^(j) · p(y_t | Λ_{t−1}^(j), Y_{t−1}) ∝ w_{t−1}^(j) · sum_{i=1}^{|A1|} ρ_{t,i}^(j).    (8)

The new sample {Λ_t^(j), κ_t^(j), w_t^(j)} is then properly weighted with respect to p(Λ_t | Y_t).

(iv) Perform a resampling step as discussed below.

2.2. Resampling procedure

The importance sampling weight w_t^(j) measures the "quality" of the corresponding imputed indicator sequence Λ_t^(j). A relatively small weight implies that the sample is drawn far from the main body of the posterior distribution and has a small contribution in the final estimation. Such a sample is said to be ineffective. If there are too many ineffective samples, the Monte Carlo procedure becomes inefficient. To avoid the degeneracy, a useful resampling procedure, which was suggested in [7, 11], may be used. Roughly speaking, resampling is to multiply the streams with the larger importance weights, while eliminating the ones with small importance weights. A simple, but efficient, resampling procedure consists of the following two steps.

(1) Sample a new set of streams {Λ'_t^(j), μ'_t^(j), Σ'_t^(j)}_{j=1}^m from {Λ_t^(j), μ_t^(j), Σ_t^(j)}_{j=1}^m with probability proportional to the importance weights {w_t^(j)}_{j=1}^m.

(2) To each stream in {Λ'_t^(j), μ'_t^(j), Σ'_t^(j)}_{j=1}^m, assign equal weight, that is, w_t^(j) = 1/m, j = 1, ..., m.

Resampling can be done at every fixed-length time interval (say, every five steps) or it can be conducted dynamically. The effective sample size can be used to monitor the variation of the importance weights of the sample streams and to decide when to resample as the system evolves. The effective sample size is defined as in [13]:

    m̄_t ≜ m / (1 + υ_t²),    (9)

where υ_t, the coefficient of variation, is given by

    υ_t² = (1/m) sum_{j=1}^m (w_t^(j) / w̄_t − 1)²,    (10)

with w̄_t = sum_{j=1}^m w_t^(j) / m. In dynamic resampling, a resampling step is performed once the effective sample size m̄_t is below a certain threshold.

Heuristically, resampling can provide chances for good sample streams to amplify themselves and hence "rejuvenate" the sampler to produce a better result for future states as the system evolves. It can be shown that the samples drawn by the above resampling procedure are indeed properly weighted with respect to p(Λ_t | Y_t), provided that m is sufficiently large. In practice, when a small to modest m is used (we use m = 50 in this paper), the resampling procedure can be seen as a tradeoff between the bias and the variance. That is, the new samples with their weights resulting from the resampling procedure are only approximately proper, and this introduces a small bias in the Monte Carlo estimation. On the other hand, however, resampling significantly reduces the Monte Carlo variance for future samples.

2.3. Delayed estimation

Model (1) often exhibits strong memory. As a result, future observations often contain information about the current state. Hence a delayed estimate is usually more accurate than the concurrent estimate. In delayed estimation, instead of making inference on Λ_t instantaneously with the posterior distribution p(Λ_t | Y_t), we delay this inference to a later time (t + ∆), ∆ > 0, with the distribution p(Λ_t | Y_{t+∆}).
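The effective-sample-size test of (9)-(10) and the two-step resampling procedure can be sketched as follows. The weights, the stream placeholders, and the 0.5m threshold are illustrative assumptions; the multinomial scheme below is the basic version of step (1), not the only possible one.

```python
import numpy as np

rng = np.random.default_rng(1)

def effective_sample_size(w):
    """Effective sample size m_bar_t of (9), via the coefficient of
    variation upsilon_t of (10), for unnormalized weights w."""
    w = np.asarray(w, dtype=float)
    m = w.size
    w_bar = w.mean()
    upsilon2 = np.mean((w / w_bar - 1.0) ** 2)
    return m / (1.0 + upsilon2)

def resample(streams, w):
    """Steps (1)-(2) of Section 2.2: duplicate streams with probability
    proportional to their weights, then assign equal weights 1/m."""
    m = len(w)
    p = np.asarray(w, dtype=float) / np.sum(w)
    idx = rng.choice(m, size=m, p=p)
    return [streams[i] for i in idx], np.full(m, 1.0 / m)

m = 50
streams = list(range(m))        # stand-ins for the indicator trajectories
w = rng.exponential(size=m)     # illustrative importance weights

# Dynamic resampling: resample only when m_bar_t drops below a threshold.
if effective_sample_size(w) < 0.5 * m:
    streams, w = resample(streams, w)
```

Note that (9)-(10) reduce to the familiar identity m̄_t = (sum_j w_t^(j))² / sum_j (w_t^(j))², so equal weights give m̄_t = m and a single nonzero weight gives m̄_t = 1.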
As discussed in [13], there are primarily two approaches to delayed estimation, namely, the delayed-weight method and the delayed-sample method.

2.3.1. Delayed-weight method

In the concurrent MKF algorithm, if the set {(Λ_{t+δ}^(j), w_{t+δ}^(j))}_{j=1}^m is properly weighted with respect to p(Λ_{t+δ} | Y_{t+δ}), then when we focus our attention on λ_t at time (t + δ), we have that {(λ_t^(j), w_{t+δ}^(j))}_{j=1}^m is properly weighted with respect to p(λ_t | Y_{t+δ}). Then any inference about the indicator λ_t, E[h(λ_t) | Y_{t+δ}], can be approximated by

    E[h(λ_t) | Y_{t+δ}] ≈ [...]

[...] The delayed-sample MKF algorithm recursively propagates the samples properly weighted for p(Λ_{t−1} | Y_{t+∆−1}) to those for p(Λ_t | Y_{t+∆}) and is summarized as follows.

Algorithm 2 (delayed-sample MKF). Suppose, at time (t + ∆ − 1), a set of properly weighted samples {(Λ_{t−1}^(j), κ_{t−1}^(j), w_{t−1}^(j))}_{j=1}^m is available with respect to p(Λ_{t−1} | Y_{t+∆−1}). Then at time (t + ∆), as the new data y_{t+∆} becomes available, the following steps are implemented to update each weighted sample.

For j = 1, 2, ..., m, the following steps are performed.

(i) For each λ_{t+∆} = a_i ∈ A1, and for each λ_t^{t+∆−1} ∈ A1^∆, perform a one-step update on the corresponding Kalman filter κ_{t+∆−1}(λ_t^{t+∆−1}), that is,