
EURASIP Journal on Applied Signal Processing

Particle Filtering in Signal Processing

Guest Editors: Petar M. Djurić, Simon J. Godsill, and Arnaud Doucet

Copyright © 2004 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2004 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief Marc Moonen, Belgium

Senior Advisory Editor: K. J. Ray Liu, College Park, USA

Associate Editors: Gonzalo Arce, USA; Jaakko Astola, Finland; Kenneth Barner, USA; Mauro Barni, Italy; Sankar Basu, USA; Jacob Benesty, Canada; Helmut Bölcskei, Switzerland; Joe Chen, USA; Chong-Yung Chi, Taiwan; M. Reha Civanlar, Turkey; Luciano Costa, Brazil; Satya Dharanipragada, USA; Petar M. Djurić, USA; Jean-Luc Dugelay, France; Touradj Ebrahimi, Switzerland; Frank Ehlers, Germany; Moncef Gabbouj, Finland; Sharon Gannot, Israel; Fulvio Gini, Italy; A. Gorokhov, The Netherlands; Peter Handel, Sweden; Ulrich Heute, Germany; John Homer, Australia; Jiri Jan, Czech Republic; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, Korea; Aggelos Katsaggelos, USA; Walter Kellermann, Germany; Alex Kot, Singapore; C.-C. Jay Kuo, USA; Chin-Hui Lee, USA; Sang Uk Lee, Korea; Geert Leus, The Netherlands; Mark Liao, Taiwan; Yuan-Pei Lin, Taiwan; Bernie Mulgrew, UK; King N. Ngan, Hong Kong; Douglas O'Shaughnessy, Canada; Antonio Ortega, USA; Montse Pardas, Spain; Vincent Poor, USA; Phillip Regalia, France; Markus Rupp, Austria; Hideaki Sakai, Japan; Bill Sandham, UK; Dirk Slock, France; Piet Sommen, The Netherlands; John Sorensen, Denmark; Sergios Theodoridis, Greece; Dimitrios Tzovaras, Greece; Jacques Verly, Belgium; Xiaodong Wang, USA; Douglas Williams, USA; Xiang-Gen Xia, USA; Jar-Ferr Yang, Taiwan

Contents

Editorial, Petar M. Djurić, Simon J. Godsill, and Arnaud Doucet Volume 2004 (2004), Issue 15, Pages 2239-2241

Global Sampling for Sequential Filtering over Discrete State Space, Pascal Cheung-Mon-Chan and Eric Moulines Volume 2004 (2004), Issue 15, Pages 2242-2254

Multilevel Mixture Kalman Filter, Dong Guo, Xiaodong Wang, and Rong Chen Volume 2004 (2004), Issue 15, Pages 2255-2266

Resampling Algorithms for Particle Filters: A Computational Complexity Perspective, Miodrag Bolić, Petar M. Djurić, and Sangjin Hong Volume 2004 (2004), Issue 15, Pages 2267-2277

A New Class of Particle Filters for Random Dynamic Systems with Unknown Statistics, Joaquín Míguez, Mónica F. Bugallo, and Petar M. Djurić Volume 2004 (2004), Issue 15, Pages 2278-2294

A Particle Filtering Approach to Change Detection for Nonlinear Systems, Babak Azimi-Sadjadi and P. S. Krishnaprasad Volume 2004 (2004), Issue 15, Pages 2295-2305

Particle Filtering for Joint Symbol and Code Delay Estimation in DS Spread Spectrum Systems in Multipath Environment, Elena Punskaya, Arnaud Doucet, and William J. Fitzgerald Volume 2004 (2004), Issue 15, Pages 2306-2314

Particle Filtering Equalization Method for a Satellite Communication Channel, Stéphane Sénécal, Pierre-Olivier Amblard, and Laurent Cavazzana Volume 2004 (2004), Issue 15, Pages 2315-2327

Channel Tracking Using Particle Filtering in Unresolvable Multipath Environments, Tanya Bertozzi, Didier Le Ruyet, Cristiano Panazio, and Han Vu Thien Volume 2004 (2004), Issue 15, Pages 2328-2338

Joint Tracking of Manoeuvring Targets and Classification of Their Manoeuvrability, Simon Maskell Volume 2004 (2004), Issue 15, Pages 2339-2350

Bearings-Only Tracking of Manoeuvring Targets Using Particle Filters, M. Sanjeev Arulampalam, B. Ristic, N. Gordon, and T. Mansell Volume 2004 (2004), Issue 15, Pages 2351-2365

Time-Varying Noise Estimation for Speech Enhancement and Recognition Using Sequential Monte Carlo Method, Kaisheng Yao and Te-Won Lee Volume 2004 (2004), Issue 15, Pages 2366-2384

Particle Filtering Applied to Musical Tempo Tracking, Stephen W. Hainsworth and Malcolm D. Macleod Volume 2004 (2004), Issue 15, Pages 2385-2395

EURASIP Journal on Applied Signal Processing 2004:15, 2239–2241 © 2004 Hindawi Publishing Corporation

Editorial

Petar M. Djurić Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA Email: [email protected]

Simon J. Godsill Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Email: [email protected]

Arnaud Doucet Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Email: [email protected]

In most problems of sequential signal processing, measured or received data are processed in real time. Typically, the data are modeled by state-space models with linear or nonlinear unknowns and noise sources that are assumed either Gaussian or non-Gaussian. When the models describing the data are linear and the noise is Gaussian, the optimal solution is the renowned Kalman filter. For models that deviate from linearity and Gaussianity, many different methods exist, of which the best known perhaps is the extended Kalman filter. About a decade ago, Gordon et al. published an article on nonlinear and non-Gaussian state estimation that captured much attention of the signal processing community [1]. The article introduced a method for sequential signal processing based on Monte Carlo sampling and showed that the method may have profound potential. Not surprisingly, it has incited a great deal of research, which has contributed to making sequential signal processing by Monte Carlo methods one of the most prominent developments in statistical signal processing in recent years.

The underlying idea of the method is the approximation of posterior densities by discrete random measures. The measures are composed of samples from the states of the unknowns and of weights associated with the samples. The samples are usually referred to as particles, and the process of updating the random measures with the arrival of new data as particle filtering. One may view particle filtering as exploration of the space of unknowns with random grids whose nodes are the particles. With the acquisition of new data, the random grids evolve and their nodes are assigned weights to approximate optimally the desired densities. The assignment of new weights is carried out recursively and is based on Bayesian theory.

The beginnings of particle filtering can be traced back to the late 1940s and early 1950s, which were followed in the last fifty years with sporadic outbreaks of intense activity [2]. Although its implementation is computationally intensive, the widespread availability of fast computers and the amenability of the particle filtering methods for parallel implementation make them very attractive for solving difficult signal processing problems.

The papers of the special issue may be arranged into four groups, that is, papers on (1) general theory, (2) applications of particle filtering to target tracking, (3) applications of particle filtering to communications, and (4) applications of particle filtering to speech and music processing. In this issue, we do not have tutorials on particle filtering, and instead, we refer the reader to some recent references [3, 4, 5, 6].

General theory

In the first paper, “Global sampling for sequential filtering over discrete state space,” Cheung-Mon-Chan and Moulines study conditionally Gaussian linear state-space models, which, when conditioned on a set of indicator variables taking values in a finite set, become linear and Gaussian. In this paper, the authors propose a global sampling algorithm for such filters and compare it with other state-of-the-art implementations.

Guo et al. in “Multilevel mixture Kalman filter” propose a new Monte Carlo sampling scheme for implementing the mixture Kalman filter. The authors use a multilevel structure of the space for the indicator variables and draw samples in a multilevel fashion. They begin with sampling from the highest-level space and follow up by drawing samples from associate subspaces from lower-level spaces. They demonstrate the method on examples from wireless communication.

In the third paper, “Resampling algorithms for particle filters: A computational complexity perspective,” Bolić et al. propose and analyze new resampling algorithms for particle filters that are suitable for real-time implementation. By decreasing the number of operations and memory accesses, the algorithms reduce the complexity of both hardware and DSP realization. The performance of the algorithms is evaluated on particle filters applied to bearings-only tracking and joint detection and estimation in wireless communications.

In “A new class of particle filters for random dynamic systems with unknown statistics,” Míguez et al. propose a new class of particle filtering methods that do not assume explicit mathematical forms of the probability distributions of the noise in the system. This implies simpler, more robust, and more flexible particle filters than the standard particle filters. The performance of these filters is shown on autonomous positioning of a vehicle in a 2-dimensional space.

Finally, in “A particle filtering approach to change detection for nonlinear systems,” Azimi-Sadjadi and Krishnaprasad present a particle filtering method for change detection in stochastic systems with nonlinear dynamics that allows for recursive computation of likelihood ratios. They use the method in an Inertial Navigation System/Global Positioning System application.

Applications in communications

In “Particle filtering for joint symbol and code delay estimation in DS spread spectrum systems in multipath environment,” Punskaya et al. develop receivers based on several algorithms that involve both deterministic and randomized schemes. They test their method against other deterministic and stochastic procedures by means of extensive simulations.

In the second paper, “Particle filtering equalization method for a satellite communication channel,” Sénécal et al. propose a particle filtering method for inline and blind equalization of satellite communication channels and restoration of the transmitted messages. The performance of the algorithms is presented by bit error rates as functions of signal-to-noise ratio.

Bertozzi et al. in “Channel tracking using particle filtering in unresolvable multipath environments” propose a new timing error detector for timing tracking loops of Rake receivers in spread spectrum systems. In their scheme, the delays of each path of the frequency-selective channels are estimated jointly. Their simulation results demonstrate that the proposed scheme has better performance than the one based on conventional early-late gate detectors in indoor scenarios.

Applications to target tracking

In “Joint tracking of manoeuvring targets and classification of their manoeuvrability” by Maskell, semi-Markov models are used to describe the behavior of maneuvering targets. The author proposes an architecture that allows particle filters to be robust and efficient when they jointly track and classify targets. He also shows that with his approach, one can classify targets on the basis of their maneuverability.

In the other paper, “Bearings-only tracking of manoeuvring targets using particle filters,” Arulampalam et al. investigate the problem of bearings-only tracking of maneuvering targets. They formulate the problem in the framework of a multiple-model tracking problem in jump Markov systems and propose three different particle filters. They conduct extensive simulations and show that their filters outperform the trackers based on standard interacting multiple models.

Applications to speech and music

In “Time-varying noise estimation for speech enhancement and recognition using sequential Monte Carlo method,” Yao and Lee develop particle filters for sequential estimation of time-varying vectors of noise power in the log-spectral domain, where the noise parameters evolve according to a model. The authors demonstrate the performance of the proposed filters in automated speech recognition and speech enhancement, respectively.

Hainsworth and Macleod in “Particle filtering applied to musical tempo tracking” aim at estimating the time-varying tempo process in musical audio analysis. They present two algorithms for generic beat tracking that can be used across a variety of musical styles. The authors have tested the algorithms on a large database and have discussed existing problems and directions for further improvement of the current methods.

In summary, this special issue provides some interesting theoretical developments in particle filtering theory and novel applications in communications, tracking, and speech/music signal processing. We hope that these papers will not only be of immediate use to practitioners and theoreticians but will also instigate further development in the field. Lastly, we thank the authors for their contributions and the reviewers for their valuable comments and criticism.

Petar M. Djurić
Simon J. Godsill
Arnaud Doucet

REFERENCES

[1] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[2] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, New York, NY, USA, 2001.
[3] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[4] A. Doucet, S. J. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Stat. Comput., vol. 10, no. 3, pp. 197–208, 2000.
[5] P. M. Djurić and S. J. Godsill, Eds., “Special issue on Monte Carlo methods for statistical signal processing,” IEEE Trans. Signal Processing, vol. 50, no. 2, 2002.
[6] P. M. Djurić, J. H. Kotecha, J. Zhang, et al., “Particle filtering,” IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, 2003.

Petar M. Djurić received his B.S. and M.S. degrees in electrical engineering from the University of Belgrade in 1981 and 1986, respectively, and his Ph.D. degree in electrical engineering from the University of Rhode Island in 1990. From 1981 to 1986, he was a Research Associate with the Institute of Nuclear Sciences, Vinča, Belgrade. Since 1990, he has been with Stony Brook University, where he is a Professor in the Department of Electrical and Computer Engineering. He works in the area of statistical signal processing, and his primary interests are in the theory of modeling, detection, estimation, and analysis and its application to a wide variety of disciplines including wireless communications and biomedicine.

Simon J. Godsill is a Reader in statistical signal processing in the Department of Engineering, Cambridge University. He is an Associate Editor for IEEE Transactions on Signal Processing and the Journal of Bayesian Analysis, and is a Member of the IEEE Signal Processing Theory and Methods Committee. He has research interests in Bayesian and statistical methods for signal processing, Monte Carlo algorithms for Bayesian problems, modelling and enhancement of audio and musical signals, tracking, and genomic signal processing. He has published extensively in journals, books, and conferences. He coedited in 2002 a special issue of IEEE Transactions on Signal Processing on Monte Carlo methods in signal processing and has organized many conference sessions on related themes.

Arnaud Doucet was born in France on the 2nd of November 1970. He graduated from the Institut National des Télécommunications in June 1993 and obtained his Ph.D. degree from Université Paris-Sud Orsay in December 1997. From January 1998 to February 2001, he was a research associate in Cambridge University. From March 2001 to August 2002, he was a Senior Lecturer in the Department of Electrical Engineering, Melbourne University, Australia. Since September 2002, he has been a University Lecturer in information engineering at Cambridge University. His research interests include simulation-based methods and their applications to Bayesian statistics and control.

EURASIP Journal on Applied Signal Processing 2004:15, 2242–2254 © 2004 Hindawi Publishing Corporation

Global Sampling for Sequential Filtering over Discrete State Space

Pascal Cheung-Mon-Chan École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris Cédex 13, France Email: [email protected]

Eric Moulines École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris Cédex 13, France Email: [email protected]

Received 21 June 2003; Revised 22 January 2004

In many situations, there is a need to approximate a sequence of probability measures over a growing product of finite spaces. Whereas it is in general possible to determine analytic expressions for these probability measures, the number of computations needed to evaluate these quantities grows exponentially, thus precluding real-time implementation. Sequential Monte Carlo techniques (SMC), which consist in approximating the flow of probability measures by the empirical distribution of a finite set of particles, are attractive techniques for addressing this type of problem. In this paper, we present a simple implementation of the sequential importance sampling/resampling (SISR) technique for approximating these distributions; this method relies on the fact that, the space being finite, it is possible to consider every offspring of the trajectory of particles. The procedure is straightforward to implement and well suited for practical implementation. A limited Monte Carlo experiment is carried out to support our findings. Keywords and phrases: particle filters, sequential importance sampling, sequential Monte Carlo sampling, sequential filtering, conditionally linear Gaussian state-space models, autoregressive models.

1. INTRODUCTION

State-space models have been around for quite a long time to model dynamic systems. State-space models are used in a variety of fields such as computer vision, financial data analysis, mobile communication, radar systems, among others. A main challenge is to design efficient methods for online estimation, prediction, and smoothing of the hidden state given the continuous flow of observations from the system. Except in a few special cases, including linear state-space models (see [1]) and hidden finite-state Markov chains (see [2]), this problem does not admit computationally tractable exact solutions.

From the mid 1960s, considerable research efforts have been devoted to developing computationally efficient methods to approximate these distributions; in the last decade, a great deal of attention has been devoted to sequential Monte Carlo (SMC) algorithms (see [3] and the references therein). The basic idea of the SMC method consists in approximating the conditional distribution of the hidden state with the empirical distribution of a set of random points, called particles. These particles can either give birth to offspring particles or die, depending on their ability to represent the distribution of the hidden state conditional on the observations. The main difference between the different implementations of the SMC algorithms lies in the way this population of particles evolves in time. It is no surprise that most of the efforts in this field have been dedicated to finding numerically efficient and robust methods, which can be used in real-time implementations.

In this paper, we consider a special case of state-space model, often referred to in the literature as conditionally Gaussian linear state-space models (CGLSSMs), which has received a lot of attention in recent years (see, e.g., [4, 5, 6, 7]). The main feature of a CGLSSM is that, conditionally on a set of indicator variables, here taking their values in a finite set, the system becomes linear and Gaussian. Efficient recursive procedures, such as the Kalman filter/smoother, are available to compute the distribution of the state variable conditional on the indicator variables and the observations. By embedding these algorithms in the sequential importance sampling/resampling (SISR) framework, it is possible to derive computationally efficient sampling procedures which focus their attention on the space of indicator variables.
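To fix ideas, here is a minimal numerical sketch of the CGLSSM property just described (Python/NumPy; all parameter values, function names, and the i.i.d. indicator assumption are illustrative choices, not taken from the paper): conditionally on a known indicator path, the model is linear and Gaussian, so the Kalman recursion computes the exact filtering moments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar CGLSSM (illustrative parameters): the indicator lam_t picks
# the regime driving the linear Gaussian state equation.
A = np.array([0.9, 0.5])   # per-regime state transition coefficients
Q = np.array([0.1, 1.0])   # per-regime state noise variances
C, R = 1.0, 0.5            # observation gain and noise variance

def simulate(T):
    lam = rng.integers(0, 2, size=T)   # i.i.d. binary indicators (toy choice)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = A[lam[t]] * x[t - 1] + np.sqrt(Q[lam[t]]) * rng.normal()
    y = C * x + np.sqrt(R) * rng.normal(size=T)
    return lam, x, y

def conditional_kalman(lam, y, m0=0.0, p0=1.0):
    # Conditionally on the full indicator path, the model is linear and
    # Gaussian, so the Kalman recursion gives the exact filtering moments.
    m, p, means = m0, p0, []
    for t, obs in enumerate(y):
        if t > 0:                              # prediction step
            m, p = A[lam[t]] * m, A[lam[t]] ** 2 * p + Q[lam[t]]
        s = C ** 2 * p + R                     # innovation variance
        k = p * C / s                          # Kalman gain
        m, p = m + k * (obs - C * m), (1.0 - k * C) * p
        means.append(m)
    return np.array(means)

lam, x, y = simulate(200)
est = conditional_kalman(lam, y)
mse_filter = float(np.mean((est - x) ** 2))
mse_obs = float(np.mean((y / C - x) ** 2))
print(mse_filter, mse_obs)
```

Conditionally on the indicator path, no Monte Carlo is needed at all; the sampling procedures developed in this paper are required only because the indicator path itself is unobserved.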

These algorithms are collectively referred to as mixture Kalman filters (MKFs), a term first coined by Chen and Liu [8], who have developed a generic sampling algorithm; closely related ideas have appeared earlier in the automatic control/signal processing and computational statistics literature (see, e.g., [9, 10] for early work in this field; see [5] and the references therein for a tutorial on these methods; see [3] for practical implementations of these techniques). Because these sampling procedures operate on a lower-dimensional space, they typically achieve lower Monte Carlo variance than "plain" particle filtering methods.

In the CGLSSM considered here, it is assumed that the indicator variables are discrete and take a finite number of different values. It is thus feasible to consider every possible offspring of a trajectory, defined here as a particular realization of a sequence of indicator variables from initial time 0 to the current time t. This has been observed by the authors of [5, 7, 8], among many others, who have used this property to design appropriate proposal distributions for improving the accuracy and performance of SISR procedures.

In this work, we use this key property in a different way, along the lines drawn in [11, Section 3]; the basic idea consists in considering the population of every possible offspring of every trajectory and globally sampling from this population. This algorithm is referred to as the global sampling (GS) algorithm. It can be seen as a simple implementation of the SISR algorithm for the so-called optimal importance distribution. Some limited Monte Carlo experiments on prototypal examples show that this algorithm compares favorably with state-of-the-art implementations of the MKF; in a joint symbol estimation and channel equalization task, we have in particular achieved extremely encouraging performance with as few as 5 particles, making the proposed algorithm amenable to real-time applications.

2. SEQUENTIAL MONTE CARLO ALGORITHMS

2.1. Notations and definitions

Before going further, some additional definitions and notations are required. Let X (resp., Y) be a general set and let B(X) (resp., B(Y)) denote a σ-algebra on X (resp., Y). If Q is a nonnegative function on X × B(Y) such that

(i) for each B ∈ B(Y), Q(·, B) is a nonnegative measurable function on X,
(ii) for each x ∈ X, Q(x, ·) is a measure on B(Y),

then we call Q a transition kernel from (X, B(X)) to (Y, B(Y)) and we denote Q : (X, B(X)) ≺ (Y, B(Y)). If for each x ∈ X, Q(x, ·) is a finite measure on (Y, B(Y)), then we say that the transition kernel is finite. If for all x ∈ X, Q(x, ·) is a probability measure on (Y, B(Y)), then Q is said to be a Markov transition kernel.

Denote by B(X) ⊗ B(Y) the product σ-algebra (the smallest σ-algebra containing all the sets A × B, where A ∈ B(X) and B ∈ B(Y)). If µ is a measure on (X, B(X)) and Q : (X, B(X)) ≺ (Y, B(Y)) is a transition kernel, we denote by µ ⊗ Q the measure on the product space (X × Y, B(X) ⊗ B(Y)) defined by

    µ ⊗ Q(A × B) = ∫_A µ(dx) Q(x, B)   for all A ∈ B(X), B ∈ B(Y).   (1)

Let X : (Ω, F) → (X, B(X)) and Y : (Ω, F) → (Y, B(Y)) be two random variables and let µ and ν be two measures on (X, B(X)) and (Y, B(Y)), respectively. Assume that the distribution of (X, Y) has a density f(x, y) with respect to µ ⊗ ν. We denote by f(y|x) = f(x, y) / ∫_Y f(x, y) ν(dy) the conditional density of Y given X.

2.2. Sequential importance sampling

Let {F_t}_{t≥0} be a sequence of probability measures on (Z^{t+1}, P(Z)^{⊗(t+1)}), where Z = {z_1, ..., z_M} is a finite set with cardinal equal to M. It is assumed in this section that, for any λ_{0:t−1} ∈ Z^t such that f_{t−1}(λ_{0:t−1}) = 0, we have

    f_t([λ_{0:t−1}, λ]) = 0   for all λ ∈ Z,   (2)

where, for any τ ≥ 0, f_τ denotes the density of F_τ with respect to the counting measure. For any t ≥ 1, there then exists a finite transition kernel Q_t : (Z^t, P(Z)^{⊗t}) ≺ (Z, P(Z)) such that

    F_t = F_{t−1} ⊗ Q_t.   (3)

We denote by q_t the density of the kernel Q_t with respect to the counting measure, which can simply be expressed as

    q_t(λ_{0:t−1}, λ) = f_t([λ_{0:t−1}, λ]) / f_{t−1}(λ_{0:t−1})   if f_{t−1}(λ_{0:t−1}) ≠ 0,
    q_t(λ_{0:t−1}, λ) = 0   otherwise.   (4)

In the SIS framework (see [5, 8]), the probability distribution F_t on Z^{t+1} is approximated by particles (Λ^{(1,t)}, ..., Λ^{(N,t)}) associated with nonnegative weights (w^{(1,t)}, ..., w^{(N,t)}); the estimator of the probability measure associated with this weighted particle system is given by

    F_t^N = Σ_{i=1}^N w^{(i,t)} δ_{Λ^{(i,t)}} / Σ_{i=1}^N w^{(i,t)}.   (5)

These trajectories and weights are obtained by drawing N independent trajectories Λ^{(i,t)} under an instrumental probability distribution G_t on (Z^{t+1}, P(Z)^{⊗(t+1)}) and computing the importance weights as

    w^{(i,t)} = f_t(Λ^{(i,t)}) / g_t(Λ^{(i,t)}),   i ∈ {1, ..., N},   (6)

where g_t is the density of the probability measure G_t with respect to the counting measure on (Z^{t+1}, P(Z)^{⊗(t+1)}). It is assumed that, for each t, F_t is absolutely continuous with respect to the instrumental probability G_t, that is, f_t(λ_{0:t}) = 0 for all λ_{0:t} ∈ Z^{t+1} such that g_t(λ_{0:t}) = 0.

In the SIS framework, these weighted trajectories are updated by drawing at each time step an offspring of each particle and then computing the associated importance weight. It is assumed in the sequel that the instrumental probability measure satisfies a decomposition similar to (3), that is,

    G_t = G_{t−1} ⊗ K_t,   (7)

where K_t : (Z^t, P(Z)^{⊗t}) ≺ (Z, P(Z)) is a Markov transition kernel: Σ_{j=1}^M K_t(λ_{0:t−1}, {z_j}) = 1. Hence, for all λ_{0:t−1} ∈ Z^t, Σ_{j=1}^M g_t([λ_{0:t−1}, z_j]) = g_{t−1}(λ_{0:t−1}), showing that whenever g_{t−1}(λ_{0:t−1}) = 0, g_t([λ_{0:t−1}, z_j]) = 0 for all j ∈ {1, ..., M}. Define by k_t the density of the Markov transition kernel K_t with respect to the counting measure:

    k_t(λ_{0:t−1}, λ) = g_t([λ_{0:t−1}, λ]) / g_{t−1}(λ_{0:t−1})   if g_{t−1}(λ_{0:t−1}) ≠ 0,
    k_t(λ_{0:t−1}, λ) = 0   otherwise.   (8)

In the SIS framework, at each time t, for each particle Λ^{(i,t−1)}, i ∈ {1, ..., N}, and then for each particular offspring j ∈ {1, ..., M}, we evaluate the weights

    ρ^{(i,j,t)} = k_t(Λ^{(i,t−1)}, z_j)   (9)

and we draw an index J^{(i,t)} from a multinomial distribution with parameters (ρ^{(i,1,t)}, ..., ρ^{(i,M,t)}), conditionally independently from the past:

    P(J^{(i,t)} = j | G_{t−1}) = ρ^{(i,j,t)},   i ∈ {1, ..., N}, j ∈ {1, ..., M},   (10)

where G_t is the history of the particle system at time t,

    G_t = σ(Λ^{(j,τ)}, w^{(j,τ)}, 1 ≤ j ≤ N, 1 ≤ τ ≤ t).   (11)

The updated system of particles then is

    Λ^{(i,t)} = [Λ^{(i,t−1)}, z_{J^{(i,t)}}].   (12)

If (Λ^{(1,0)}, ..., Λ^{(N,0)}) is an independent sample from the distribution G_0, it is then easy to see that at each time t the particles (Λ^{(1,t)}, ..., Λ^{(N,t)}) are independent and distributed according to G_t; the associated (unnormalized) importance weights w^{(i,t)} = f_t(Λ^{(i,t)})/g_t(Λ^{(i,t)}) can be written as a product w^{(i,t)} = u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) w^{(i,t−1)}, where the incremental weight u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) is given by

    u_t([λ_{0:t−1}, λ]) = q_t(λ_{0:t−1}, λ) / k_t(λ_{0:t−1}, λ)   for all λ_{0:t−1} ∈ Z^t, λ ∈ Z.   (13)

It is easily shown that the instrumental distribution k_t which minimizes the variance of the importance weights conditionally to the history of the particle system (see [5, Proposition 2]) is given by

    k_t(λ_{0:t−1}, ·) = q_t(λ_{0:t−1}, ·) / Σ_{j=1}^M q_t(λ_{0:t−1}, z_j)   for any λ_{0:t−1} ∈ Z^t.   (14)

The choice of the optimal instrumental distribution (14) has been introduced in [12] and has since then been used and/or rediscovered by many authors (see [5, Section II-D] for a discussion and extended references). Using this particular form of the importance kernel, the incremental importance sampling weights (13) are given by

    u_t([Λ^{(i,t−1)}, z_{J^{(i,t)}}]) = Σ_{j=1}^M q_t(Λ^{(i,t−1)}, z_j),   i ∈ {1, ..., N}.   (15)

It is worthwhile to note that u_t([Λ^{(i,t−1)}, z_j]) = u_t([Λ^{(i,t−1)}, z_l]) for all j, l ∈ {1, ..., M}; the incremental importance weights do not depend upon the particular offspring of the particle which is drawn.

2.3. Sequential importance sampling/resampling

The normalized importance weights w̄^{(i,t)} := w^{(i,t)} / Σ_{i=1}^N w^{(i,t)} reflect the contribution of the imputed trajectories to the importance sampling estimate F_t^N. A weight close to zero indicates that the associated trajectory has a "small" contribution. Such trajectories are thus ineffective and should be eliminated. Resampling is the method usually employed to combat this degeneracy of the system of particles. Let [Λ^{(1,t−1)}, ..., Λ^{(N,t−1)}] be a set of particles at time t − 1 and let [w^{(1,t−1)}, ..., w^{(N,t−1)}] be the associated importance weights. An SISR iteration, in its most elementary form, produces a set of particles [Λ^{(1,t)}, ..., Λ^{(N,t)}] with equal weights 1/N. The SISR algorithm is a two-step procedure. In the first step, each particle is updated according to the importance transition kernel k_t and the incremental importance weights are computed according to (12) and (13), exactly as in the SIS algorithm. This produces an intermediate set of particles Λ̃^{(i,t)} with associated importance weights w̃^{(i,t)} defined as

    Λ̃^{(i,t)} = [Λ^{(i,t−1)}, z_{J̃^{(i,t)}}],
    w̃^{(i,t)} = u_t([Λ^{(i,t−1)}, z_{J̃^{(i,t)}}]) w^{(i,t−1)},   i ∈ {1, ..., N},   (16)

where the random variables J̃^{(i,t)}, i ∈ {1, ..., N}, are drawn conditionally independently from the past according to a multinomial distribution with parameters

    P(J̃^{(i,t)} = j | G_{t−1}) = k_t(Λ^{(i,t−1)}, z_j),   i ∈ {1, ..., N}, j ∈ {1, ..., M}.   (17)

We denote by S̃_t = ((Λ̃^{(i,t)}, w̃^{(i,t)}), i ∈ {1, ..., N}) this intermediate set of particles. In the second step, we resample the intermediate particle system. Resampling consists in transforming the weighted approximation of the probability measure F_t, F_t^N = Σ_{i=1}^N w̃^{(i,t)} δ_{Λ̃^{(i,t)}}, into an unweighted one, F̃_t^N = N^{−1} Σ_{i=1}^N δ_{Λ^{(i,t)}}. To avoid introducing bias during the resampling step, an unbiased resampling procedure should be used.
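The unbiasedness requirement on the resampling step can be sketched numerically as follows (Python/NumPy; the weights are arbitrary test values, not from any model). Multinomial sampling is the basic scheme described in the text; the single-uniform scheme shown alongside it is one common lower-variance alternative of the kind the appendix is concerned with, and is unrelated to the resample-at-every-step policy that the paper also calls "systematic".

```python
import numpy as np

rng = np.random.default_rng(1)

def multinomial_resample(w_bar, rng):
    # Draw N indices i.i.d. from the normalized weights, with replacement.
    N = len(w_bar)
    return rng.choice(N, size=N, p=w_bar)

def single_uniform_resample(w_bar, rng):
    # One uniform shifted over a regular grid; still unbiased, and the
    # count of each index deviates from N * w_bar_i by less than one.
    N = len(w_bar)
    u = (rng.random() + np.arange(N)) / N
    return np.minimum(np.searchsorted(np.cumsum(w_bar), u), N - 1)

# Empirical check of the unbiasedness condition E[N^(i)] = N * w_bar_i.
N, trials = 8, 10000
w_bar = rng.random(N)
w_bar /= w_bar.sum()

counts_m = np.zeros(N)
counts_s = np.zeros(N)
for _ in range(trials):
    counts_m += np.bincount(multinomial_resample(w_bar, rng), minlength=N)
    counts_s += np.bincount(single_uniform_resample(w_bar, rng), minlength=N)

dev_m = float(np.max(np.abs(counts_m / (trials * N) - w_bar)))
dev_s = float(np.max(np.abs(counts_s / (trials * N) - w_bar)))
print(dev_m, dev_s)
```

Both schemes satisfy the unbiasedness constraint; the single-uniform variant simply adds less extra randomness per step, which is the point of the better algorithms referred to above.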
More precisely, we draw with replacements N in- (1,t) (N,t) (i,t) = N (14) dices I , ..., I in such a way that N k=1 δi,I(k,t) , Global Sampling for Sequential Filtering 2245 the number of times the ith trajectory is chosen satisfies Recall that, when the optimal importance distribution is used, for each particle i ∈{1, ..., N}, the random variables N ˜(i,t) ∈{ } (i,t) (i,t) (i,t) J , i 1, ..., M , are conditionally independent from N = N, E N | G˜ t = Nw˜ G −1 and are distributed with multinomial random variable = (18) t i 1 with parameters ∈{ } for any i 1, ..., N , Λ(i,t−1) (i,t) qt , zj G˜ P J˜ = j | Gt−1 = , where t is the history of the particle system just before the M Λ(i,t−1) j=1 qt , zj (22) resampling step (see (11)), that is, G˜ t is the σ-algebra gener- (1,t) (N,t) ∈{ } ∈{ } ated by the union of Gt−1 and σ(J˜ , ..., J˜ ): i 1, ..., N , j 1, ..., M . (1,t) (N,t) We may compute, for , ∈{1, , } and ∈{1, , }, G˜ t = Gt−1∨σ J˜ , ..., J˜ . (19) i k ... N j ... M (k,t) (k,t) ∈{ } P I , J = (i, j) | Gt−1 Then,weset,fork 1, ..., N , = E P (k,t) = ˜(i,t) = | G˜ G ( , ) I i, J j t t−1 I(k,t), J(k,t) = I(k,t), J˜(I k t ,t) , = E P (k,t) = | G˜ ˜(i,t) = G (20) I i t 1(J j) t−1 Λ(k,t) = Λ(I(k,t),t−1) (k,t) = 1 , zJ(k,t) , w . M Λ(i,t−1) (i,t−1) N j=1 qt , zj w = (23) N M Λ(i,t−1) (i,t−1) =1 =1 qt , zj w Note that the sampling is done with replacement in the sense i j (i,t) that the same particle can be either eliminated or copied sev- × P J˜ = j | Gt−1 = eral times in the final updated sample. We denote by St (i,t−1) (i,t−1) qt(Λ , zj )w ((Λ(i,t), w(i,t)), i ∈{1, ..., N}) this set of particles. = = w¯ (i,j,t), N M Λ(i,t−1) (i,t−1) There are several options to obtain an unbiased sample. i=1 j=1 qt , zj w The most obvious choice consists in drawing the N particles conditionally independently on G˜ according to a multino- showing that the SISR algorithm is equivalent to drawing, t G mial distribution with normalized weights (w˜(1,t), ..., w˜(N,t)). 
conditionally independently from G_{t−1}, N random variables out of the N × M possible offsprings of the system of particles, with weights (w̄^{(i,j,t)}, i ∈ {1, ..., N}, j ∈ {1, ..., M}).

In the literature, this is referred to as multinomial sampling. As a result, under multinomial sampling, the particles Λ^{(i,t)} are, conditionally on G̃_t, independent and identically distributed (i.i.d.). There are, however, better algorithms which reduce the added variability introduced during the sampling step (see the appendix).

This procedure is referred to as the SISR procedure. The particles with large normalized importance weights are likely to be selected and will be kept alive. On the contrary, the particles with low normalized importance weights are eliminated. Resampling provides more efficient samples of future states but increases sampling variation in the past states because it reduces the number of distinct trajectories.

Resampling can be done at any time. When resampling is done at every time step, it is said to be systematic. In this case, the importance weights at each time t, w^{(i,t)}, i ∈ {1, ..., N}, are all equal to 1/N. Systematic resampling is not always recommended since resampling is costly from the computational point of view and may result in a loss of statistical efficiency by introducing some additional randomness into the particle system. However, the effect of resampling is not necessarily negative, because it allows one to control the degeneracy of the particle system, which has a positive impact on the quality of the estimates. Therefore, systematic resampling yields in some situations better estimates than the standard SIS procedure (without resampling); in some cases (see Section 4.2 for an illustration), it compares favorably with more sophisticated versions of the SISR algorithm, where resampling is done at random times (e.g., when the entropy or the coefficient of variation of the normalized importance weights is below a threshold).

The SISR algorithm with multinomial sampling defines a Markov chain on the path space.
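The trade-off discussed above (resampling at every step versus only when the weights degenerate) is often implemented with an effective-sample-size test. The following is a hedged NumPy sketch, not the paper's implementation; function names and the threshold convention are ours, with `beta = 1` reproducing systematic (every-step) resampling.

```python
import numpy as np

def ess(weights):
    """Effective sample size of a weight vector: equals N for uniform
    weights and 1 when a single particle carries all the mass."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights, rng, beta=0.5):
    """Multinomial resampling, triggered only when ESS/N < beta.

    Returns (particles, weights); after a resampling step all weights
    are reset to 1/N, as in the SISR description above."""
    n = len(particles)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if ess(w) / n >= beta:
        return particles, w                       # keep the weighted system
    # unbiased multinomial draw: E[N_i] = N * w_i
    idx = rng.choice(n, size=n, replace=True, p=w)
    return [particles[i] for i in idx], np.full(n, 1.0 / n)
```

Other degeneracy criteria mentioned above (entropy, coefficient of variation of the weights) can be substituted for `ess` without changing the rest of the loop.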
The transition kernel of this chain depends upon the choice of the proposal distribution and of the unbiased procedure used in the resampling step. These transition kernels are, except in a few special cases, involved. However, when the "optimal" importance distribution (14) is used in conjunction with multinomial sampling, the transition kernel has a simple and intuitive expression. As already mentioned above, the incremental weights for all the possible offsprings of a given particle are, in this case, identical; as a consequence, under multinomial sampling, the indices I^{(k,t)}, k ∈ {1, ..., N}, are i.i.d. with multinomial distribution, for all k ∈ {1, ..., N},

P(I^{(k,t)} = i | G̃_t) = Σ_{j=1}^M q_t(Λ^{(i,t−1)}, z_j) w^{(i,t−1)} / Σ_{i'=1}^N Σ_{j=1}^M q_t(Λ^{(i',t−1)}, z_j) w^{(i',t−1)}, i ∈ {1, ..., N}. (21)

2.4. The global sampling algorithm

When the instrumental distribution is the so-called optimal distribution (14), it is possible to combine the sampling/resampling steps above into a single sampling step. This idea has already been mentioned and worked out in [11, Section 3] under the name of deterministic/resample low weights (RLW) approach, yet the algorithm given below is not given explicitly in this reference.

Let [Λ^{(1,t−1)}, ..., Λ^{(N,t−1)}] be a set of particles at time t − 1 and let [w^{(1,t−1)}, ..., w^{(N,t−1)}] be the associated importance weights. Similar to the SISR step, the GS algorithm produces a set of particles [Λ^{(1,t)}, ..., Λ^{(N,t)}] with equal weights.
The GS algorithm combines the two-stage sampling procedure (first, sample a particular offspring of a particle, update the importance weights, and then resample from the population) into a single one.

(i) We first compute the weights

w^{(i,j,t)} = w^{(i,t−1)} q_t(Λ^{(i,t−1)}, z_j), i ∈ {1, ..., N}, j ∈ {1, ..., M}. (24)

(ii) We then draw N random variables ((I^{(1,t)}, J^{(1,t)}), ..., (I^{(N,t)}, J^{(N,t)})) in {1, ..., N} × {1, ..., M} using an unbiased sampling procedure, that is, for all (i, j) ∈ {1, ..., N} × {1, ..., M}, the number of times the particle (i, j) is drawn,

N^{(i,j,t)} = #{k ∈ {1, ..., N}, (I^{(k,t)}, J^{(k,t)}) = (i, j)}, (25)

satisfies the following two conditions:

Σ_{i'=1}^N Σ_{j'=1}^M N^{(i',j',t)} = N,
E[N^{(i,j,t)} | G_t] = N w^{(i,j,t)} (Σ_{i'=1}^N Σ_{j'=1}^M w^{(i',j',t)})^{−1}. (26)

The updated set of particles is then defined as

Λ^{(k,t)} = (Λ^{(I^{(k,t)},t−1)}, z_{J^{(k,t)}}), w^{(k,t)} = 1/N. (27)

If multinomial sampling is used, then the GS algorithm is a simple implementation of the SISR algorithm, which combines the two-stage sampling into a single one. Since the computational cost of drawing L random variables grows linearly with L, the cost of simulations is proportional to NM for the GS algorithm and NM + N for the SISR algorithm. There is thus a (slight) advantage in using the GS implementation. When sampling is done using a different unbiased method (see the appendix), then there is a more substantial difference between these two algorithms.

3. GLOBAL SAMPLING FOR CONDITIONALLY GAUSSIAN STATE-SPACE MODELS

3.1. Conditionally linear Gaussian state-space model

As emphasized in the introduction, CGLSSMs are a particular class of state-space models which are such that, conditional to a set of indicator variables, the system becomes linear and Gaussian. More precisely, in the CGLSSM

S_t = Ψ_t(Λ_{0:t}),
X_t = A_{S_t} X_{t−1} + C_{S_t} W_t, (28)
Y_t = B_{S_t} X_t + D_{S_t} V_t,

where

(i) {Λ_t}_{t≥0} are the indicator variables, here assumed to take values in a finite set Z = {z_1, z_2, ..., z_M}, where M denotes the cardinal of the set Z; the law of {Λ_t}_{t≥0} is assumed to be known but is otherwise not specified;
(ii) for any t ≥ 0, Ψ_t is a function Ψ_t : Z^{t+1} → S, where S is a finite set;
(iii) {X_t}_{t≥0} are the (n_x × 1) state vectors; these state variables are not directly observed;
(iv) the distribution of X_0 is complex Gaussian with mean µ_0 and covariance Γ_0;
(v) {Y_t}_{t≥0} are the (n_y × 1) observations;
(vi) {W_t}_{t≥0} and {V_t}_{t≥0} are n_w- and n_v-dimensional (complex) Gaussian noises, W_t ∼ N_c(0, I_{n_w×n_w}) and V_t ∼ N_c(0, I_{n_v×n_v}), where I_{p×p} is the p × p identity matrix; {W_t}_{t≥0} is referred to as the state noise, whereas {V_t}_{t≥0} is the observation noise;
(vii) {A_s, s ∈ S} are the state transition matrices, {B_s, s ∈ S} are the observation matrices, and {C_s, s ∈ S} and {D_s, s ∈ S} are Cholesky factors of the covariance matrices of the state noise and measurement noise, respectively; these matrices are assumed to be known;
(viii) the indicator process {Λ_t}_{t≥0} and the noise processes {V_t}_{t≥0} and {W_t}_{t≥0} are independent.

This model has been considered by many authors, following the pioneering work in [13, 14] (see [5, 7, 8, 15] for authoritative recent surveys). Despite its simplicity, this model is flexible enough to describe many situations of interest, including linear state-space models with non-Gaussian state noise or observation noise (heavy-tail noise), jump linear systems, linear state space with missing observations and, of course, digital communication over fading channels, and so forth.

Our aim in this paper is to compute recursively in time an estimate of the conditional distribution of the (unobserved) indicator variable Λ_n given the observations up to time n + ∆, that is, P(Λ_n | Y_{0:n+∆} = y_{0:n+∆}), where ∆ is a nonnegative integer and, for any sequence {λ_t}_{t≥0} and any integers 0 ≤ i ≤ j, λ_{i:j} ≝ (λ_i, ..., λ_j). When ∆ = 0, this conditional distribution is called the filtering distribution; when ∆ > 0, it is called the fixed-lag smoothing distribution, and ∆ is the lag.

3.2. Filtering

In this section, we describe the implementation of the GS algorithm to approximate the filtering probability of the indicator variables given the observations,

f_t(λ_{0:t}) = P(Λ_{0:t} = λ_{0:t} | Y_{0:t} = y_{0:t}), (29)

in the CGLSSM (28). We will first show that the filtering probability F_t satisfies condition (3), that is, for any t ≥ 1, F_t = F_{t−1} ⊗ Q_t; we then present an efficient recursive algorithm to compute the transition kernel Q_t using the Kalman filter update equations. For any t ≥ 1 and for any λ_{0:t} ∈ Z^{t+1}, under the conditional independence structure implied by the CGLSSM (28), the Bayes formula shows that

q_t(λ_{0:t−1}; λ_t) ∝ f(y_t | y_{0:t−1}, λ_{0:t}) f(λ_t | λ_{0:t−1}). (30)

The predictive distribution of the observations given the indicator variables, f(y_t | y_{0:t−1}, λ_{0:t}), can be evaluated along each trajectory of indicator variables λ_{0:t} using the Kalman filter recursions. Denote by g_c(·; µ, Γ) the density of a complex circular Gaussian random vector with mean µ and covariance matrix Γ and, for A a matrix, let A† be the transpose conjugate of A; we have, with s_t = Ψ_t(λ_{0:t}) (and Ψ_t is defined in (28)),

f(y_t | λ_{0:t}, y_{0:t−1}) = g_c(y_t; B_{s_t} µ_{t|t−1}[λ_{0:t}], B_{s_t} Γ_{t|t−1}[λ_{0:t}] B†_{s_t} + D_{s_t} D†_{s_t}), (31)

where µ_{t|t−1}[λ_{0:t}] and Γ_{t|t−1}[λ_{0:t}] denote the predicted mean and covariance of the state, that is, the conditional mean and covariance of the state given the indicator variables λ_{0:t} and the observations up to time t − 1 (the dependence of the predictive mean µ_{t|t−1}[λ_{0:t}] on the observations y_{0:t−1} is implicit). These quantities can be computed recursively using the following Kalman one-step prediction/correction formulas. Denote by µ_{t−1}([λ_{0:t−1}]) and Γ_{t−1}([λ_{0:t−1}]) the mean and covariance of the filtering density, respectively. These quantities can be recursively updated as follows:

(i) predictive mean:
µ_{t|t−1}[λ_{0:t}] = A_{s_t} µ_{t−1}([λ_{0:t−1}]); (32)

(ii) predictive covariance:
Γ_{t|t−1}[λ_{0:t}] = A_{s_t} Γ_{t−1}([λ_{0:t−1}]) A†_{s_t} + C_{s_t} C†_{s_t}; (33)

(iii) innovation covariance:
Σ_t[λ_{0:t}] = B_{s_t} Γ_{t|t−1}[λ_{0:t}] B†_{s_t} + D_{s_t} D†_{s_t}; (34)

(iv) Kalman gain:
K_t[λ_{0:t}] = Γ_{t|t−1}[λ_{0:t}] B†_{s_t} (Σ_t[λ_{0:t}])^{−1}; (35)

(v) filtered mean:
µ_t([λ_{0:t}]) = µ_{t|t−1}[λ_{0:t}] + K_t[λ_{0:t}] (y_t − B_{s_t} µ_{t|t−1}[λ_{0:t}]); (36)

(vi) filtered covariance:
Γ_t([λ_{0:t}]) = (I − K_t[λ_{0:t}] B_{s_t}) Γ_{t|t−1}[λ_{0:t}]. (37)

Note that the conditional distribution of the state vector X_t given the observations up to time t, y_{0:t}, is a mixture of Gaussian distributions with a number of components equal to M^{t+1}, which grows exponentially with t. We now have at hand all the necessary ingredients to derive the GS approximation of the filtering distribution. For any t ∈ N and for any λ_{0:t} ∈ Z^{t+1}, denote

γ_t(λ_{0:t}) ≝ f(y_0 | λ_0) f(λ_0) for t = 0,
γ_t(λ_{0:t}) ≝ f(y_t | λ_{0:t}, y_{0:t−1}) f(λ_t | λ_{0:t−1}) for t > 0. (38)

With these notations, (30) reads q_t(λ_{0:t−1}; λ_t) ∝ γ_t(λ_{0:t}).

The first step consists in initializing the particle tracks. For t = 0 and i ∈ {1, ..., N}, set µ^{(i,0)} = µ_0 and Γ^{(i,0)} = Γ_0, where µ_0 and Γ_0 are the initial mean and variance of the state vector (which are assumed to be known); then, compute the weights

w_j = γ_0(z_j) / Σ_{j'=1}^M γ_0(z_{j'}), j ∈ {1, ..., M}, (39)

and draw {I_i, i ∈ {1, ..., N}} in such a way that, for j ∈ {1, ..., M}, E[N_j] = N w_j, where N_j = Σ_{i=1}^N δ_{I_i,j}. Then, set Λ^{(i,0)} = z_{I_i}, i ∈ {1, ..., N}.

At time t ≥ 1, assume that we have N trajectories Λ^{(i,t−1)} = (Λ_0^{(i,t−1)}, ..., Λ_{t−1}^{(i,t−1)}) and that, for each trajectory, we have stored the filtered mean µ^{(i,t−1)} and covariance Γ^{(i,t−1)} defined in (36) and (37), respectively.

(1) For i ∈ {1, ..., N} and j ∈ {1, ..., M}, compute the predictive mean µ_{t|t−1}[Λ^{(i,t−1)}, z_j] and covariance Γ_{t|t−1}[Λ^{(i,t−1)}, z_j] using (32) and (33), respectively. Then, compute the innovation covariance Σ_t[Λ^{(i,t−1)}, z_j] using (34) and evaluate the likelihood γ^{(i,j,t)} of the particle [Λ^{(i,t−1)}, z_j] using (31). Finally, compute the filtered mean µ_t([Λ^{(i,t−1)}, z_j]) and covariance Γ_t([Λ^{(i,t−1)}, z_j]).

(2) Compute the weights

w^{(i,j,t)} = γ^{(i,j,t)} / Σ_{i'=1}^N Σ_{j'=1}^M γ^{(i',j',t)}, i ∈ {1, ..., N}, j ∈ {1, ..., M}. (40)

(3) Draw {(I_k, J_k), k ∈ {1, ..., N}} using an unbiased sampling procedure (see (26)) with weights {w^{(i,j,t)}}, i ∈ {1, ..., N}, j ∈ {1, ..., M}; set, for k ∈ {1, ..., N}, Λ^{(k,t)} = (Λ^{(I_k,t−1)}, z_{J_k}). Store the filtered mean µ_t([Λ^{(k,t)}]) and covariance Γ_t([Λ^{(k,t)}]) using (36) and (37), respectively.

Remark 1. From the trajectories and the computed weights, it is possible to evaluate, for any δ ≥ 0 and t ≥ δ, the posterior probability of Λ_{t−δ} given Y_{0:t} = y_{0:t} as

P̂(Λ_{t−δ} = z_k | Y_{0:t} = y_{0:t}) ∝ Σ_{i=1}^N w^{(i,k,t)}, δ = 0 (filtering),
P̂(Λ_{t−δ} = z_k | Y_{0:t} = y_{0:t}) ∝ Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} δ_{Λ_{t−δ}^{(i,t−1)}, z_k}, δ > 0 (fixed-lag smoothing). (41)
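Steps (1)-(3) above can be sketched generically as follows. This is a minimal sketch under our own naming conventions: the Kalman evaluations (31)-(37) are abstracted into a user-supplied callable `gamma` that returns the unnormalized offspring weight γ^{(i,j,t)}, trajectories are tuples of symbol indices, and multinomial sampling stands in for the generic unbiased procedure of (26).

```python
import numpy as np

def gs_step(trajectories, M, gamma, rng):
    """One global-sampling update: score all N*M offsprings, normalize
    as in (40), draw N pairs (I_k, J_k) in a single unbiased draw, and
    return the prolonged trajectories with equal weights 1/N (27)."""
    N = len(trajectories)
    # step (1)-(2): unnormalized weights of every offspring (traj, z_j)
    g = np.array([[gamma(tr, j) for j in range(M)] for tr in trajectories])
    w = (g / g.sum()).ravel()                # weights over the N*M offsprings
    # step (3): multinomial draw over the flattened (i, j) index set
    draws = rng.choice(N * M, size=N, replace=True, p=w)
    new = [trajectories[k // M] + (k % M,) for k in draws]
    return new, np.full(N, 1.0 / N)
```

In a full CGLSSM implementation, `gamma` would also cache the filtered mean and covariance of each offspring so that step (3) can store them alongside the selected trajectories.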

Similarly, we can approximate the filtering and the smoothing distributions of the state variable as mixtures of Gaussians. For example, we can estimate the filtered mean and variance of the state as follows:

(i) filtered mean:
Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} µ_t([Λ^{(i,t−1)}, z_j]); (42)

(ii) filtered covariance:
Σ_{i=1}^N Σ_{j=1}^M w^{(i,j,t)} Γ_t([Λ^{(i,t−1)}, z_j]). (43)

3.3. Fixed-lag smoothing

Since the state process is correlated, the future observations contain information about the current value of the state; therefore, whenever it is possible to delay the decision, fixed-lag smoothing estimates yield more reliable information on the indicator process than filtering estimates.

As pointed out above, it is possible to determine an estimate of the fixed-lag smoothing distribution for any delay δ from the trajectories and the associated weights produced by the SISR or GS method described above; nevertheless, we should be aware that this estimate can be rather poor when the delay δ is large, as a consequence of the impoverishment of the system of particles (the system of particles "forgets" its past). To address this well-known problem in all particle methods, it has been proposed by several authors (see [11, 16, 17, 18]) to sample at time t from the conditional distribution of Λ_t given Y_{0:t+∆} = y_{0:t+∆} for some ∆ > 0. The computation of the fixed-lag smoothing distribution is also amenable to GS approximation.

Consider the distribution of the indicator variables Λ_{0:t} conditional to the observations Y_{0:t+∆} = y_{0:t+∆}, where ∆ is a positive integer. Denote by {F_t^∆}_{t≥0} this sequence of probability measures, the dependence on the observations y_{0:t+∆} being, as in the previous section, implicit. This sequence of distributions also satisfies (3), that is, there exists a finite transition kernel Q_t^∆ : (Z^t, P(Z)^{⊗t}) ≺ (Z, P(Z)) such that F_t^∆ = F_{t−1}^∆ ⊗ Q_t^∆ for all t ≥ 1. Elementary conditional probability calculations exploiting the conditional independence structure of (28) show that the transition kernel Q_t^∆ can be determined, up to a normalization constant, by the relation

Q_t^∆(λ_{0:t−1}; λ_t) ∝ Σ_{λ_{t+1:t+∆}} Π_{τ=t}^{t+∆} f(y_τ | y_{0:τ−1}, λ_{0:τ}) f(λ_τ | λ_{0:τ−1}) / Σ_{λ_{t:t+∆−1}} Π_{τ=t}^{t+∆−1} f(y_τ | y_{0:τ−1}, λ_{0:τ}) f(λ_τ | λ_{0:τ−1}), (44)

where, for all λ_{0:t−1} ∈ Z^t, the terms f(y_τ | y_{0:τ−1}, λ_{0:τ}) can be determined recursively using the Kalman filter fixed-lag smoothing update formula.

Below, we describe a straightforward implementation of the GS method to approximate the smoothing distribution by the delayed sampling procedure; more sophisticated techniques, using early pruning of the possible prolonged trajectories, are currently under investigation. For any t ∈ N and for any λ_{0:t} ∈ Z^{t+1}, denote

D_t^∆(λ_{0:t}) ≝ Σ_{λ_{t+1:t+∆}} Π_{τ=t+1}^{t+∆} γ_τ(λ_{0:τ}), (45)

where the function γ_τ is defined in (38). With this notation, (44) may be rewritten as

Q_t^∆(λ_{0:t−1}; λ_t) ∝ γ_t(λ_{0:t}) D_t^∆(λ_{0:t}) / D_{t−1}^∆(λ_{0:t−1}). (46)

We now describe one iteration of the algorithm. Assume that, for some time instant t ≥ 1, we have N trajectories Λ^{(j,t−1)} = (Λ_0^{(j,t−1)}, ..., Λ_{t−1}^{(j,t−1)}); in addition, for each trajectory Λ^{(j,t−1)}, the following quantities are stored:

(1) the factor D_{t−1}^∆(Λ^{(j,t−1)}) defined in (45);
(2) for each prolongation λ_{t:τ} ∈ Z^{τ−t+1} with τ ∈ {t, t+1, ..., t+∆−1}, the conditional likelihood γ_τ(Λ^{(j,t−1)}, λ_{t:τ}) given in (38);
(3) for each prolongation λ_{t:t+∆−1} ∈ Z^∆, the filtering conditional mean µ_{t+∆−1}([Λ^{(j,t−1)}, λ_{t:t+∆−1}]) and covariance Γ_{t+∆−1}(Λ^{(j,t−1)}, λ_{t:t+∆−1}).

One iteration of the algorithm is then described below.

(1) For each i ∈ {1, ..., N} and for each λ_{t:t+∆} ∈ Z^{∆+1}, compute the predictive conditional mean and covariance of the state, µ_{t+∆|t+∆−1}([Λ^{(i,t−1)}, λ_{t:t+∆}]) and Γ_{t+∆|t+∆−1}([Λ^{(i,t−1)}, λ_{t:t+∆}]), using (32) and (33), respectively. Then compute the innovation covariance Σ_{t+∆}[(Λ^{(i,t−1)}, λ_{t:t+∆})] using (34) and the likelihood γ_{t+∆}(Λ^{(i,t−1)}, λ_{t:t+∆}) using (31).

(2) For each i ∈ {1, ..., N} and j ∈ {1, ..., M}, compute

D_t^∆(Λ^{(i,t−1)}, z_j) = Σ_{λ_{t+1:t+∆}} Π_{τ=t+1}^{t+∆} γ_τ(Λ^{(i,t−1)}, z_j, λ_{t+1:τ}),
γ^{(i,j,t)} = γ_t(Λ^{(i,t−1)}, z_j) D_t^∆(Λ^{(i,t−1)}, z_j) / D_{t−1}^∆(Λ^{(i,t−1)}),
w^{(i,j,t)} = γ^{(i,j,t)} / Σ_{i'=1}^N Σ_{j'=1}^M γ^{(i',j',t)}. (47)

(3) Update the trajectories of the particles using an unbiased sampling procedure {(I_k, J_k), k ∈ {1, ..., N}} with weights {w^{(i,j,t)}}, i ∈ {1, ..., N}, j ∈ {1, ..., M}, and set Λ^{(k,t)} = (Λ^{(I_k,t−1)}, z_{J_k}), k ∈ {1, ..., N}.
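For small M and ∆, the factor D_t^∆ of (45) can be evaluated by brute-force enumeration of the M^∆ prolongations. The following is a hedged sketch with names of our choosing: `gamma` stands for the conditional likelihoods γ_τ of (38), applied to a full path of symbol indices, and no pruning of prolongations is attempted.

```python
import itertools

def D_delta(traj, delta, M, gamma):
    """Brute-force evaluation of D_t^Delta (45): sum, over all M**delta
    prolongations of 'traj', of the product of conditional likelihoods.

    traj: tuple of symbol indices (the trajectory lambda_{0:t});
    gamma: callable mapping a full path to gamma_tau(lambda_{0:tau}).
    Exhaustive enumeration, so only practical for small delta."""
    total = 0.0
    for prolong in itertools.product(range(M), repeat=delta):
        prod = 1.0
        path = traj
        for sym in prolong:                # extend the path one step at a time
            path = path + (sym,)
            prod *= gamma(path)            # factor gamma_tau of (45)
        total += prod
    return total
```

The weight γ^{(i,j,t)} of step (2) is then the product γ_t(Λ^{(i,t−1)}, z_j) · D_t^∆(Λ^{(i,t−1)}, z_j) / D_{t−1}^∆(Λ^{(i,t−1)}), as in (46).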

4. SOME EXAMPLES

4.1. Autoregression with jumps

To illustrate how the GS method works, we consider the state-space model

X_t = a_{Λ_t} X_{t−1} + σ_{Λ_t} ε_t,
Y_t = X_t + ρ η_t, (48)

where {ε_t}_{t≥0} and {η_t}_{t≥0} are i.i.d. unit-variance Gaussian noises. We assume that {Λ_t}_{t≥0} is an i.i.d. sequence of random variables taking their values in Z ≝ {1, 2}, which is independent from both {ε_t}_{t≥0} and {η_t}_{t≥0}, and such that P[Λ_0 = i] = π_i, i ∈ Z. This can easily be extended to deal with the Markovian case. This simple model has been dealt with, among others, in [19] and [20, Section 5.1]. We focus in this section on the filtering problem, that is, we approximate the distribution of the hidden state X_t given the observations up to time t, Y_{0:t} = y_{0:t}. For this model, we can carry out the computations easily. The transition kernel q_t defined in (30) is given, for all λ_{0:t−1} ∈ Z^t, λ_t ∈ Z, by

q_t(λ_{0:t−1}, λ_t) ∝ (π_{λ_t} / √(2π Σ_t[λ_{0:t}])) exp(−(y_t − µ_{t|t−1}[λ_{0:t}])² / (2 Σ_t[λ_{0:t}])), (49)

where the mean µ_{t|t−1}[λ_{0:t}] and covariance Σ_t[λ_{0:t}] are computed recursively from the filtering mean µ_{t−1}([λ_{0:t−1}]) and covariance Γ_{t−1}([λ_{0:t−1}]) according to the following one-step Kalman update equations derived from (32), (33), and (34):

(i) predictive mean:
µ_{t|t−1}[λ_{0:t}] = a_{λ_t} µ_{t−1}([λ_{0:t−1}]); (50)

(ii) predictive covariance:
Γ_{t|t−1}[λ_{0:t}] = a²_{λ_t} Γ_{t−1}([λ_{0:t−1}]) + σ²_{λ_t}; (51)

(iii) innovation covariance:
Σ_t[λ_{0:t}] = Γ_{t|t−1}[λ_{0:t}] + ρ²; (52)

(iv) filtered mean:
µ_t([λ_{0:t}]) = µ_{t|t−1}[λ_{0:t}] + (Γ_{t|t−1}[λ_{0:t}] / (Γ_{t|t−1}[λ_{0:t}] + ρ²)) (y_t − µ_{t|t−1}[λ_{0:t}]); (53)

(v) filtered covariance:
Γ_t([λ_{0:t}]) = ρ² Γ_{t|t−1}[λ_{0:t}] / (Γ_{t|t−1}[λ_{0:t}] + ρ²). (54)

We have used the parameters (used in the experiments carried out in [20, Section 5.1]) a_i = 0.9 (i = 1, 2), σ_1 = 0.5, σ_2 = 1.5, π_1 = 1.7, and ρ = 0.3, and applied the GS and the SISR algorithms for online filtering. We compare estimates of the filtered state mean using the GS and the SIS with systematic resampling. In both cases, we use the estimate (42) of the filtered mean. Two different unbiased sampling strategies are used: multinomial sampling and the modified stratified sampling (detailed in the appendix).¹ In Figure 1, we have displayed the box and whisker plot² of the difference between the filtered mean estimate (42) and the true value of the state variables for N = 5, 10, 50 particles using multinomial sampling (Figure 1a) and the modified stratified sampling (Figure 1b). These results are obtained from 100 independent Monte Carlo experiments where, for each experiment, a new set of observations and state variables is simulated. These simulations show that, for the autoregressive model, the filtering algorithm performs reasonably well even when the number of particles is small (the difference between N = 5 and N = 50 particles is negligible; N = 50 particles is suggested in the literature for the same simulation setting [20]). There are no noticeable differences between the standard SISR implementation and the GS implementation of the SISR. Note that the error in the estimate is dominated by the filtered variance E[(X_t − E[X_t | Y_{0:t}])²]; the additional variations induced by the fluctuations of the particle estimates are an order of magnitude lower than this quantity.

¹ The Matlab code to reproduce these experiments is available at http://www.tsi.enst.fr/∼moulines/.
² The lower and upper limits of the box are the quartiles; the horizontal line in the box is the sample median; the upper and lower whiskers are at 3/2 times the interquartile range.

[Figure 1: Box and whisker plot of the difference between the filtered mean estimates and the actual value of the state estimate for 100 independent Monte Carlo experiments. (a) Multinomial sampling. (b) Residual sampling with the modified stratified sampling. Error versus method: SISR5, SISR10, SISR50, GS5, GS10, GS50.]

To visualize the difference between the different sampling schemes, it is more appropriate to consider the fluctuation of the filtered mean estimates around their sample mean for a given value of the time index and of the observations. In Figure 2, we have displayed the box and whisker plot of the error at time index 25 between the filtered mean estimates and their sample mean at each time instant; these results have been obtained from 100 independent particles (this time, the set of observations and of states is held fixed over all the Monte Carlo simulations). As above, we have used N = 5, 10, 50 particles and two sampling methods: multinomial sampling (Figure 2a) and modified stratified sampling (Figure 2b). This figure shows that the GS estimate of the sampled mean has a lower dispersion than any other method included in this comparison, independently of the number of particles which are used. The differences between these estimators are, however, small compared to the filtering variance.

4.2. Joint channel equalization and symbol detection on a flat Rayleigh-fading channel

4.2.1. Model description

We consider in this section a problem arising in transmission over a Rayleigh-fading channel. Consider a communication system signaling through a flat-fading channel with additive noise. In this context, the indicator variables {Λ_t} in the representation (28) are the input bits which are transmitted over the channel and {S_t}_{t≥0} are the symbols, generally taken from an M-ary complex alphabet. The function Ψ_t is thus the function which maps the stream of input bits into a stream of complex symbols: this function combines channel encoding and symbol mapping. In the simple example considered below, we assume binary phase shift keying (BPSK) modulation with differential encoding: S_t = S_{t−1}(2Λ_t − 1). The input-output relationship of the flat-fading channel is described by

Y_t = α_t S_t + V_t, (55)

where Y_t, α_t, S_t, and V_t denote the received signal, the fading channel coefficient, the transmitted symbol, and the additive noise at time t, respectively. It is assumed in the sequel that

(i) the processes {α_t}_{t≥0}, {Λ_t}_{t≥0}, and {V_t}_{t≥0} are mutually independent;
(ii) the noise {V_t} is a sequence of i.i.d. zero-mean complex random variables, V_t ∼ N_c(0, σ_V²).

It is further assumed that the channel fading process is Rayleigh, that is, {α_t} is a zero-mean complex Gaussian process, here modelled as an ARMA(L, L) process,

α_t − φ_1 α_{t−1} − ··· − φ_L α_{t−L} = θ_0 η_t + θ_1 η_{t−1} + ··· + θ_L η_{t−L}, (56)

[Figure 2: Box and whisker plot of the difference between the filtered mean estimates and their sample mean for 100 independent particles for a given value of the time index 25 and of the observations. (a) Multinomial sampling. (b) Residual sampling with the modified stratified sampling. Error (×10⁻³) versus method: SISR5, SISR10, SISR50, GS5, GS10, GS50.]

where φ_1, ..., φ_L and θ_0, ..., θ_L are the autoregressive and the moving average (ARMA) coefficients, respectively, and {η_t} is a white complex Gaussian noise with zero mean and unit variance. This model can be written in state-space form as follows:

X_{t+1} = [ 0 1 0 ··· 0 ; 0 0 1 ··· 0 ; ⋮ ; φ_L φ_{L−1} ··· φ_1 ] X_t + [ ψ_1 ; ψ_2 ; ⋮ ; ψ_L ] η_t,
α_t = [ 1 0 ··· 0 ] X_t + η_t, (57)

where {ψ_k}_{1≤k≤m} are the coefficients of the expansion of θ(z)/φ(z), for |z| ≤ 1, with

φ(z) = 1 − φ_1 z − ··· − φ_p z^p,
θ(z) = 1 + θ_1 z + ··· + θ_q z^q. (58)

This particular problem has been considered, among others, in [10, 16, 18, 21, 22].

4.2.2. Simulation results

To allow comparison with previously reported work, we consider the example studied in [16, Section VIII]. In this example, the fading process is modelled by the output of a Butterworth filter of order L = 3 whose cutoff frequency is 0.05, corresponding to a normalized Doppler frequency f_d T = 0.05 with respect to the symbol rate 1/T, which is a fast-fading scenario. More specifically, the fading process is modelled by the ARMA(3, 3) process

α_t − 2.37409 α_{t−1} + 1.92936 α_{t−2} − 0.53208 α_{t−3} = 10⁻² (0.89409 η_t + 2.68227 η_{t−1} + 2.68227 η_{t−2} + 0.89409 η_{t−3}), (59)

where η_t ∼ N_c(0, 1). It is assumed that a BPSK modulation is used, that is, S_t ∈ {−1, +1}, with differential encoding and no channel code; more precisely, we assume that S_t = S_{t−1} Λ_t, where Λ_t ∈ {−1, +1} is the bit sequence, assumed to be i.i.d. Bernoulli random variables with probability of success P(Λ_t = 1) = 1/2.

The performance of the GS receiver (using the modified residual sampling algorithm) has been compared with the following receiver schemes.

(1) Known channel lower bound. We assume that the true fading coefficients α_t are known to the receiver and we calculate the optimal coherent detection rule Ŝ_t = sign(Re{α_t* Y_t}) and Λ̂_t = Ŝ_t Ŝ_{t−1}.

(2) Genie-aided lower bound. We assume that a genie allows the receiver to observe Ỹ_t = α_t + Ṽ_t, with Ṽ_t ∼ N_c(0, σ_V²). We use Ỹ_t to calculate an estimate α̂_t of the fading coefficients via a Kalman filter and we then evaluate the optimal coherent detection Ŝ_t = sign(Re{α̂_t* Y_t}) and Λ̂_t = Ŝ_t Ŝ_{t−1} using the filtered fading process.

(3) Differential detector. In this scenario, no attempt is made to estimate the fading process and the input bits are estimated using incoherent differential detection: Λ̂_t = sign(Re{Y_t Y*_{t−1}}).

(4) MKF detector. The SMC filter described in [16, Sections IV and V] is used to estimate Λ_t. The MKF detector uses the SISR algorithm to draw samples in the indicator space and implements a Kalman filter for each trajectory in order to compute its trial sampling density and its importance weight. Resampling is performed when the ratio between the effective sample size defined in [16, equation (45)] and the actual sample size N is lower than a threshold β. The delayed-weight method is used to obtain an estimate of Λ_t with a delay δ.

In all the simulations below, we have used only the concurrent sampling method because, in the considered simulation scenarios, the use of the delayed sampling method did not bring significant improvement. This is mainly due to the fact that we have only considered, due to space limitations, the uncoded communication scenario.

Figure 3 shows the BER performance of each receiver versus the SNR. The SNR is defined as var(α_t)/var(V_t) and the BER is obtained by averaging the error rate over 10⁶ symbols. The first 50 symbols were not taken into account in counting the BER. The BER performance of the GS receiver is shown for estimation delays δ = 0 (concurrent estimation) and δ = 1. Also shown are the BER curves for the known channel lower bound, the genie-aided lower bound, the differential detector, and the MKF detector with estimation delays δ = 0 and δ = 1 and resampling thresholds β = 0.1 and β = 1 (systematic resampling). The number of particles for both the GS receiver and the MKF detector is set to 50.

[Figure 3: BER performance of the GS receiver versus the SNR. The BER corresponding to delays δ = 0 and δ = 1 are shown. Also shown in this figure are the BER curves for the MKF detector (δ = 0 and δ = 1, β = 0.1 and β = 1), the known channel lower bound, the genie-aided lower bound, and the differential detector. The number of particles for the GS receiver and the MKF detector is 50.]

From this figure, it can be seen that with 50 particles, there is no significant performance difference between the proposed receiver and the MKF detector with the same estimation delay and β = 0.1 or β = 1. Note that, as observed in [16], the performance of the receiver is significantly improved by the delayed-weight method with δ = 1 compared with the concurrent estimate; there is no substantial improvement when increasing the delay further; the GS receiver achieves essentially the genie-aided bound over the considered SNR range.

Figure 4 shows the BER performance of the GS receiver versus the number of particles at SNR = 20 dB and δ = 1. Also shown in this figure is the BER performance for the MKF detector with β = 0.1 and β = 1, respectively. It can be seen from this plot that when the number of particles is decreased from 50 to 10, the BER of the MKF receiver with β = 0.1 increases by 67%, whereas the BER of the GS receiver increases by 11% only. In fact, Figure 4 also shows that, for this particular example, the BER performance of the GS receiver is identical to the BER performance of an MKF with


the same number of particles and a resampling threshold set to β = 1 (systematic resampling). This suggests that, contrary to what is usually argued in the literature [5, 16], systematic resampling of the particles seems to be, for reasons which remain yet unclear from a theoretical standpoint, more robust when the number of particles is decreased to meet the constraints of real-time implementation.

[Figure 4: BER performance of the GS receiver versus the number of particles at SNR = 20 dB and δ = 1. Also shown in this figure are the BER curves for the MKF detector with β = 0.1 and β = 1. Curves: GS, RMS (δ = 1); SISR, RMS (δ = 1, β = 1); SISR, RMS (δ = 1, β = 0.1).]

Figure 5 shows the BER performance of each receiver versus the SNR when the number of particles for both the GS receiver and the MKF detector is set to 5. For these simulations, the BER is obtained by averaging the error rate over 10⁵ symbols. From this figure, it can be seen that with 5 particles, there is a significant performance difference between the proposed receiver and the MKF detector with the same estimation delay and a β = 0.1 resampling threshold. This difference remains significant even for SNR values close to 10 dB. Figure 5 also shows that, for this particular example, the BER performance of the GS receiver is identical to the BER performance of an MKF with the same estimation delay and a resampling threshold β set to 1.

[Figure 5: BER performance of the GS receiver versus the SNR. The BER corresponding to delay δ = 1 is shown. Also shown in this figure are the BER curves for the MKF detector (δ = 1, β = 0.1), the known channel lower bound, the genie-aided lower bound, and the differential detector. The number of particles for the GS receiver and the MKF detector is 5.]

5. CONCLUSION

In this paper, a sampling algorithm for conditionally linear Gaussian state-space models has been introduced. This algorithm exploits the particular structure of the flow of probability measures and the fact that, at each time instant, a global exploration of all possible offsprings of a given trajectory of indicator variables can be considered. The number of trajectories is kept constant by sampling from this set (selection step).

The global sampling algorithm appears, in the example considered here, to be robust even when a very limited number of particles is used, which is a basic requirement for the implementation of such a solution in real-world applications: the global sampling algorithm is close to the optimal genie-aided bound with as few as 5 particles and thus provides a realistic alternative to the joint channel equalization and symbol detection algorithms reported earlier in the literature.

APPENDIX

MODIFIED STRATIFIED SAMPLING

In this appendix, we present the so-called modified stratified sampling strategy. Let M and N be integers and (w_1, ..., w_M) be nonnegative weights such that Σ_{i=1}^M w_i = 1. A sampling procedure is said to be unbiased if the random vector (N_1, ..., N_M) (where N_i is the number of times the index i is drawn) satisfies

Σ_{i=1}^M N_i = N, E[N_i] = N w_i, i ∈ {1, ..., M}. (A.1)

The modified stratified sampling is summarized as follows.

(1) For i ∈ {1, ..., M}, compute [N w_i], where [x] is the integer part of x; then compute the residual number Ñ = N − Σ_{i=1}^M [N w_i] and the residual weights

w̃_i = (N w_i − [N w_i]) / Ñ, i ∈ {1, ..., M}. (A.2)

(2) Draw Ñ i.i.d. random variables U_1, ..., U_Ñ with a uniform distribution on [0, 1/Ñ] and compute, for k ∈ {1, ..., Ñ},

Ũ_k = (k − 1)/Ñ + U_k.  (A.3)

(3) For i ∈ {1, ..., M}, set Ñ_i as the number of indices k ∈ {1, ..., Ñ} satisfying

Σ_{j=1}^{i−1} w̃_j ≤ Ũ_k < Σ_{j=1}^{i} w̃_j.  (A.4)

REFERENCES

[1] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, Prentice Hall, Englewood Cliffs, NJ, USA, 1st edition, 2000.
[2] I. MacDonald and W. Zucchini, Hidden Markov and Other Models for Discrete-Valued Time Series, vol. 70 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1997.
[3] A. Doucet, N. de Freitas, and N. Gordon, “An introduction to sequential Monte Carlo methods,” in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., pp. 3–13, Springer, New York, NY, USA, January 2001.
[4] C. Carter and R. Kohn, “Markov chain Monte Carlo in conditionally Gaussian state space models,” Biometrika, vol. 83, no. 3, pp. 589–601, 1996.
[5] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[6] N. Shephard, “Partial non-Gaussian state space,” Biometrika, vol. 81, no. 1, pp. 115–131, 1994.
[7] J. Liu and R. Chen, “Sequential Monte Carlo methods for dynamic systems,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1032–1044, 1998.
[8] R. Chen and J. Liu, “Mixture Kalman filter,” Journal of the Royal Statistical Society, Series B, vol. 62, no. 3, pp. 493–508, 2000.
[9] G. Rigal, Filtrage non-linéaire, résolution particulaire et applications au traitement du signal, Ph.D. dissertation, Université Paul Sabatier, Toulouse, France, 1993.
[10] J. Liu and R. Chen, “Blind deconvolution via sequential imputations,” Journal of the American Statistical Association, vol. 90, no. 430, pp. 567–576, 1995.
[11] E. Punskaya, C. Andrieu, A. Doucet, and W. Fitzgerald, “Particle filtering for multiuser detection in fading CDMA channels,” in Proc. 11th IEEE Signal Processing Workshop on Statistical Signal Processing, pp. 38–41, Orchid Country Club, Singapore, August 2001.
[12] V. Zaritskii, V. Svetnik, and L. Shimelevich, “Monte-Carlo technique in problems of optimal information processing,” Automation and Remote Control, vol. 36, no. 12, pp. 95–103, 1975.
[13] H. Akashi and H. Kumamoto, “Random sampling approach to state estimation in switching environments,” Automatica, vol. 13, no. 4, pp. 429–434, 1977.
[14] J. Tugnait, “Adaptive estimation and identification for discrete systems with Markov jump parameters,” IEEE Trans. Automatic Control, vol. 27, no. 5, pp. 1054–1065, 1982.
[15] A. Doucet, N. Gordon, and V. Krishnamurthy, “Particle filters for state estimation of jump Markov linear systems,” IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[16] R. Chen, X. Wang, and J. Liu, “Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering,” IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[17] X. Wang, R. Chen, and D. Guo, “Delayed-pilot sampling for mixture Kalman filter with application in fading channels,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 241–254, 2002.
[18] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, “Particle filtering for demodulation in fading channels with non-Gaussian additive noise,” IEEE Transactions on Communications, vol. 49, no. 4, pp. 579–582, 2001.
[19] G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian non-linear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[20] J. Liu, R. Chen, and T. Logvinenko, “A theoretical framework for sequential importance sampling and resampling,” in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., Springer, New York, NY, USA, 2001.
[21] F. Ben Salem, “Récepteur particulaire pour canaux mobiles évanescents,” in Journées Doctorales d’Automatique (JDA ’01), pp. 25–27, Toulouse, France, September 2001.
[22] F. Ben Salem, Réception particulaire pour canaux multi-trajets évanescents en communication radiomobile, Ph.D. dissertation, Université Paul Sabatier, Toulouse, France, 2002.

Pascal Cheung-Mon-Chan graduated from the École Normale Supérieure de Lyon in 1994 and received the Diplôme d’Ingénieur from the École Nationale Supérieure des Télécommunications (ENST) in Paris in the same year. After working for General Electric Medical Systems as a Research and Development Engineer, he received the Ph.D. degree from the ENST in Paris in 2003. He is currently a member of the research staff at France Telecom Research and Development.

Eric Moulines was born in Bordeaux, France, in 1963. He received the M.S. degree from École Polytechnique in 1984, the Ph.D. degree from the École Nationale Supérieure des Télécommunications (ENST) in 1990 in signal processing, and an “Habilitation à Diriger des Recherches” in applied mathematics (probability and statistics) from Université René Descartes (Paris V) in 1995. From 1986 until 1990, he was a member of the technical staff at Centre National de Recherche des Télécommunications (CNET). Since 1990, he has been with ENST, where he is presently a Professor (since 1996). His teaching and research interests include applied probability, mathematical and computational statistics, and signal processing.

EURASIP Journal on Applied Signal Processing 2004:15, 2255–2266
© 2004 Hindawi Publishing Corporation

Multilevel Mixture Kalman Filter

Dong Guo Department of Electrical Engineering, Columbia University, New York, NY 10027, USA Email: [email protected]

Xiaodong Wang Department of Electrical Engineering, Columbia University, New York, NY 10027, USA Email: [email protected]

Rong Chen Department of Information and Decision Sciences, University of Illinois at Chicago, Chicago, IL 60607-7124, USA Email: [email protected] Department of Business Statistics & , Peking University, Beijing 100871, China

Received 30 April 2003; Revised 18 December 2003

The mixture Kalman filter is a general sequential Monte Carlo technique for conditional linear dynamic systems. It generates samples of some indicator variables recursively based on sequential importance sampling (SIS) and integrates out the linear and Gaussian state variables conditioned on these indicators. Due to the marginalization process, the complexity of the mixture Kalman filter is quite high if the dimension of the indicator sampling space is high. In this paper, we address this difficulty by developing a new Monte Carlo sampling scheme, namely, the multilevel mixture Kalman filter. The basic idea is to make use of the multilevel or hierarchical structure of the space from which the indicator variables take values. That is, we draw samples in a multilevel fashion, beginning with sampling from the highest-level sampling space and then drawing samples from the associated subspace of the newly drawn samples in a lower-level sampling space, until reaching the desired sampling space. Such a multilevel sampling scheme can be used in conjunction with a delayed estimation method, such as the delayed-sample method, resulting in the delayed multilevel mixture Kalman filter. Examples in wireless communication, specifically the coherent and noncoherent 16-QAM over flat-fading channels, are provided to demonstrate the performance of the proposed multilevel mixture Kalman filter.

Keywords and phrases: sequential Monte Carlo, mixture Kalman filter, multilevel mixture Kalman filter, delayed-sample method.

1. INTRODUCTION

Recently there have been significant interests in the use of the sequential Monte Carlo (SMC) methods to solve online estimation and prediction problems in dynamic systems. Compared with the traditional filtering methods, the simple, flexible, yet powerful SMC provides effective means to overcome the computational difficulties in dealing with nonlinear dynamic models. The basic idea of the SMC technique is [...]; an important such class is the conditional dynamic linear model (DLM) [16] and it can be generally described as follows:

x_t = F_{λ_t} x_{t−1} + G_{λ_t} u_t,
y_t = H_{λ_t} x_t + K_{λ_t} v_t,  (1)

where u_t ∼ N(0, I) and v_t ∼ N(0, I) are the state and observation noise, respectively, and λ_t is a sequence of random indicator variables which may form a Markov chain, but are independent of u_t and v_t and the past x_s and y_s, s < t.
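As a concrete (hypothetical) instance of model (1), the sketch below simulates a scalar CDLM whose indicator λ_t follows a two-state Markov chain; all coefficients and numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar coefficients F, G, H, K of model (1), one per indicator value.
F = {0: 0.9, 1: 0.5}
G = {0: 0.5, 1: 1.0}
H = {0: 1.0, 1: 1.0}
K = {0: 0.2, 1: 0.2}
P = np.array([[0.95, 0.05],   # assumed Markov transition probabilities for lambda_t
              [0.10, 0.90]])

def simulate_cdlm(T):
    """Simulate (lambda_t, x_t, y_t) for t = 1..T from model (1)."""
    lam, x = 0, 0.0
    lams, xs, ys = [], [], []
    for _ in range(T):
        lam = int(rng.choice(2, p=P[lam]))               # indicator: Markov chain
        x = F[lam] * x + G[lam] * rng.standard_normal()  # x_t = F x_{t-1} + G u_t
        y = H[lam] * x + K[lam] * rng.standard_normal()  # y_t = H x_t + K v_t
        lams.append(lam); xs.append(x); ys.append(y)
    return np.array(lams), np.array(xs), np.array(ys)

lams, xs, ys = simulate_cdlm(200)
```

Conditioned on the indicator path, the model is linear Gaussian, which is exactly the structure that lets the mixture Kalman filter integrate x_t out with a Kalman filter.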

Figure 1: The multilevel structure of the 16-QAM modulation used in digital communications. The set of transmitted symbols A_1 = {s_{i,j}, i = 1, ..., 4, j = 1, ..., 4} is the original sampling space, and the centers A_2 = {c_1, c_2, c_3, c_4} constitute a higher-level sampling space.

However, the computational complexity of the MKF can be quite high, especially in the case of high-dimensional indicator space, due to the need of marginalizing out the indicator variables. Fortunately, often the space from which the indicator variables take values exhibits multilevel or hierarchical structures, which can be exploited to reduce the computational complexity of the MKF. For example, a multilevel structure of the 16-QAM modulation used in digital communications is shown in Figure 1. The set of transmitted symbols A_1 = {s_{i,j}, i = 1, ..., 4, j = 1, ..., 4} is the original sampling space, and the centers A_2 = {c_1, c_2, c_3, c_4} constitute a higher-level sampling space. Thus, based on the observed data, for every sample stream, we first draw a sample (say c_1) from the higher-level sampling space A_2 and then draw a new sample from the associated subspaces s_{1,1}, s_{1,2}, s_{1,3}, s_{1,4} of c_1 in the original sampling space A_1. In this way, we need not sample from the entire original sampling space, and many Kalman filter update steps associated with the standard MKF can be saved.

This kind of hierarchical structure imposed on the indicator space is also employed in the partitioned sampling strategy [18], which greatly improved the efficiency and the accuracy for multiple target tracking over the original SMC methods. However, in this paper, the hierarchical structure is employed to reduce the computational load associated with the MKF, especially for high-dimensional indicator space, while retaining the desirable properties of the MKF.

Dynamic systems often possess strong memories, that is, future observations can reveal substantial information about the current state. Therefore, it is often beneficial to make use of these future observations in sampling the current state. However, an MKF method usually does not go back to regenerate past samples in view of the new observations, although the past estimation can be adjusted by using the new importance weights. To overcome this difficulty, a delayed-sample method is developed [13]. It makes use of future observations in generating samples of the current state. It is seen there that this method is especially effective in improving the performance of the MKF. However, the computational complexity of the delayed-sample method is very high. For example, for a ∆-step delayed-sample method, the algorithmic complexity is O(|A_1|^∆), where |A_1| is the cardinality of the original sampling space. Here, we also provide a delayed multilevel MKF by exploring the multilevel structure of the indicator space. Instead of exploring the original entire space of future states, we only sample the future states in a higher-level sampling space, thus significantly reducing the dimension of the search space and the computational complexity.

In recent years, the SMC methods have been successfully employed in several important problems in communications, such as the detection in flat-fading channels [13, 19, 20], space-time coding [21, 22], OFDM system [23], and so on. To show the good performance of the proposed novel receivers, we apply them to the problem of adaptive detection in flat-fading channels in the presence of Gaussian noise.

The remainder of the paper is organized as follows. Section 2 briefly reviews the MKF algorithm and its variants. In Section 3, we present the multilevel MKF algorithm. In Section 4, we treat the delayed multilevel MKF algorithm. In Section 5, we provide simulation examples. Section 6 concludes the paper.

2. BACKGROUND OF MIXTURE KALMAN FILTER

2.1. Mixture Kalman filter

Consider again the CDLMs defined by (1). The MKF exploits the conditional Gaussian property conditioned on the indicator variable and utilizes a marginalization operation to improve the algorithmic efficiency. Instead of dealing with both x_t and λ_t, the MKF draws Monte Carlo samples only in the indicator space and uses a mixture of Gaussian distributions to approximate the target distribution. Compared with the generic SMC method, the MKF is substantially more efficient (e.g., it produces more accurate results with the same computational resources).

First we define an important concept that is used throughout the paper. A set of random samples and the corresponding weights {(η^{(j)}, w^{(j)})}_{j=1}^{m} is said to be properly weighted with respect to the distribution π(·) if, for any measurable function h, we have

Σ_{j=1}^{m} h(η^{(j)}) w^{(j)} / Σ_{j=1}^{m} w^{(j)} → E_π[h(η)]  as m → ∞.  (2)

In particular, if η^{(j)} is sampled from a trial distribution g(·) which has the same support as π, and if w^{(j)} = π(η^{(j)})/g(η^{(j)}), then {(η^{(j)}, w^{(j)})}_{j=1}^{m} is properly weighted with respect to π(·).

Let Y_t = (y_0, y_1, ..., y_t) and Λ_t = (λ_0, λ_1, ..., λ_t). By recursively generating a set of properly weighted random samples {(Λ_t^{(j)}, w_t^{(j)})}_{j=1}^{m} to represent p(Λ_t | Y_t), the MKF approximates the target distribution p(x_t | Y_t) by a random mixture of Gaussian distributions

Σ_{j=1}^{m} w_t^{(j)} N_c(μ_t^{(j)}, Σ_t^{(j)}),  (3)

where

μ_t^{(j)} = μ_t(Λ_t^{(j)}),  Σ_t^{(j)} = Σ_t(Λ_t^{(j)})  (4)

are obtained with a Kalman filter on the system (1) for the given indicator trajectory Λ_t^{(j)}. Denote

κ_t^{(j)} ≜ (μ_t^{(j)}, Σ_t^{(j)}).  (5)

Thus, a key step in the MKF is the production at time t of the weighted samples of indicators {(Λ_t^{(j)}, κ_t^{(j)}, w_t^{(j)})}_{j=1}^{m} based on the set of samples {(Λ_{t−1}^{(j)}, κ_{t−1}^{(j)}, w_{t−1}^{(j)})}_{j=1}^{m} at the previous time (t − 1). Suppose that the indicator λ_t takes values from a finite set A_1. The MKF algorithm is as follows.

Algorithm 1 (MKF). Suppose at time (t − 1), a set of properly weighted samples {(Λ_{t−1}^{(j)}, κ_{t−1}^{(j)}, w_{t−1}^{(j)})}_{j=1}^{m} is available with respect to p(Λ_{t−1} | Y_{t−1}). Then at time t, as the new data y_t becomes available, the following steps are implemented to update each weighted sample.

For j = 1, ..., m, the following steps are applied.

(i) Based on the new data y_t, for each a_i ∈ A_1, run a one-step Kalman filter update assuming λ_t = a_i to obtain

κ_{t−1}^{(j)} --(y_t, λ_t = a_i)--> κ_{t,i}^{(j)} ≜ (μ_t(Λ_{t−1}^{(j)}, λ_t = a_i), Σ_t(Λ_{t−1}^{(j)}, λ_t = a_i)).  (6)

(ii) For each a_i ∈ A_1, compute the sampling density

ρ_{t,i}^{(j)} ≜ P(λ_t = a_i | Λ_{t−1}^{(j)}, Y_t) ∝ p(y_t | λ_t = a_i, Λ_{t−1}^{(j)}, Y_{t−1}) P(λ_t = a_i | Λ_{t−1}^{(j)}).  (7)

Note that by the model (1), the first density in (7) is Gaussian and can be computed based on the Kalman filter update (6). Draw a sample λ_t^{(j)} according to the above sampling density. Append λ_t^{(j)} to Λ_{t−1}^{(j)} and obtain Λ_t^{(j)}. If λ_t^{(j)} = a_i, then set κ_t^{(j)} = κ_{t,i}^{(j)}.

(iii) Compute the importance weight

w_t^{(j)} = w_{t−1}^{(j)} · p(y_t | Λ_{t−1}^{(j)}, Y_{t−1}) ∝ w_{t−1}^{(j)} · Σ_{i=1}^{|A_1|} ρ_{t,i}^{(j)}.  (8)

The new sample {Λ_t^{(j)}, κ_t^{(j)}, w_t^{(j)}} is then properly weighted with respect to p(Λ_t | Y_t).

(iv) Perform a resampling step as discussed below.

2.2. Resampling procedure

The importance sampling weight w_t^{(j)} measures the “quality” of the corresponding imputed indicator sequence Λ_t^{(j)}. A relatively small weight implies that the sample is drawn far from the main body of the posterior distribution and has a small contribution in the final estimation. Such a sample is said to be ineffective. If there are too many ineffective samples, the Monte Carlo procedure becomes inefficient. To avoid the degeneracy, a useful resampling procedure, which was suggested in [7, 11], may be used. Roughly speaking, resampling is to multiply the streams with the larger importance weights, while eliminating the ones with small importance weights. A simple, but efficient, resampling procedure consists of the following two steps.

(1) Sample a new set of streams {Λ̃_t^{(j)}, μ̃_t^{(j)}, Σ̃_t^{(j)}}_{j=1}^{m} from {Λ_t^{(j)}, μ_t^{(j)}, Σ_t^{(j)}}_{j=1}^{m} with probability proportional to the importance weights {w_t^{(j)}}_{j=1}^{m}.

(2) To each stream in {Λ̃_t^{(j)}, μ̃_t^{(j)}, Σ̃_t^{(j)}}_{j=1}^{m}, assign equal weight, that is, w̃_t^{(j)} = 1/m, j = 1, ..., m.

Resampling can be done at every fixed-length time interval (say, every five steps) or it can be conducted dynamically. The effective sample size can be used to monitor the variation of the importance weights of the sample streams and to decide when to resample as the system evolves. The effective sample size is defined as in [13]:

m̄_t ≜ m / (1 + υ_t²),  (9)

where υ_t, the coefficient of variation, is given by

υ_t² = (1/m) Σ_{j=1}^{m} (w_t^{(j)} / w̄_t − 1)²,  (10)

with w̄_t = Σ_{j=1}^{m} w_t^{(j)} / m. In dynamic resampling, a resampling step is performed once the effective sample size m̄_t is below a certain threshold.

Heuristically, resampling can provide chances for good sample streams to amplify themselves and hence “rejuvenate” the sampler to produce a better result for future states as the system evolves. It can be shown that the samples drawn by the above resampling procedure are also indeed properly weighted with respect to p(Λ_t | Y_t), provided that m is sufficiently large. In practice, when small to modest m is used (we use m = 50 in this paper), the resampling procedure can be seen as a tradeoff between the bias and the variance. That is, the new samples with their weights resulting from the resampling procedure are only approximately proper, and this introduces small bias in the Monte Carlo estimation. On the other hand, however, resampling significantly reduces the Monte Carlo variance for future samples.

2.3. Delayed estimation

Model (1) often exhibits strong memory. As a result, future observations often contain information about the current state. Hence a delayed estimate is usually more accurate than the concurrent estimate. In delayed estimation, instead of making inference on Λ_t instantaneously with the posterior distribution p(Λ_t | Y_t), we delay this inference to a later time (t + ∆), ∆ > 0, with the distribution p(Λ_t | Y_{t+∆}).
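The effective-sample-size rule (9)-(10) and the two-step resampling procedure of Section 2.2 can be sketched as follows; this is an illustrative sketch, not the authors' code, and the resampling threshold m/2 is a common choice rather than one prescribed by the paper.

```python
import numpy as np

def effective_sample_size(w):
    """m_bar_t = m / (1 + v_t^2), where v_t is the coefficient of
    variation of the importance weights, as in (9)-(10)."""
    w = np.asarray(w, dtype=float)
    m = w.size
    v2 = np.mean((w / w.mean() - 1.0) ** 2)
    return m / (1.0 + v2)

def resample(streams, w, rng):
    """Step (1): draw m streams with probability proportional to the weights;
    step (2): assign each surviving stream equal weight 1/m."""
    w = np.asarray(w, dtype=float)
    m = w.size
    idx = rng.choice(m, size=m, p=w / w.sum())
    return [streams[i] for i in idx], np.full(m, 1.0 / m)

# Dynamic resampling: resample once m_bar_t drops below a threshold (m/2 here).
rng = np.random.default_rng(1)
weights = rng.exponential(size=50)   # mock importance weights, m = 50 as in the paper
streams = list(range(50))            # stand-ins for the indicator trajectories
if effective_sample_size(weights) < 25:
    streams, weights = resample(streams, weights, rng)
```

With equal weights the coefficient of variation is zero and m̄_t = m, so no resampling is triggered; as the weights degenerate, m̄_t falls toward 1.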

As discussed in [13], there are primarily two approaches to delayed estimation, namely, the delayed-weight method and the delayed-sample method.

2.3.1. Delayed-weight method

In the concurrent MKF algorithm, if the set {(Λ_{t+δ}^{(j)}, w_{t+δ}^{(j)})}_{j=1}^{m} is properly weighted with respect to p(Λ_{t+δ} | Y_{t+δ}), then when we focus our attention on λ_t at time (t + δ), we have that {(λ_t^{(j)}, w_{t+δ}^{(j)})}_{j=1}^{m} is properly weighted with respect to p(λ_t | Y_{t+δ}). Then any inference about the indicator λ_t, E{h(λ_t) | Y_{t+δ}}, can be approximated by

E[h(λ_t) | Y_{t+δ}] ≅ (1/W_{t+δ}) Σ_{j=1}^{m} h(λ_t^{(j)}) w_{t+δ}^{(j)},  W_{t+δ} = Σ_{j=1}^{m} w_{t+δ}^{(j)}.  (11)

Since the weights {w_{t+δ}^{(j)}}_{j=1}^{m} contain information about the future observations (y_{t+1}, ..., y_{t+δ}), the estimate in (11) is usually more accurate than the concurrent estimate. Note that such a delayed estimation method incurs no additional computational cost (i.e., CPU time), but it requires some extra memory for storing {(λ_t^{(j)}, ..., λ_{t+δ}^{(j)})}_{j=1}^{m}. For most systems, this simple delayed-weight method is quite effective for improving the performance over the concurrent method. However, if this method is not sufficient for exploiting the constraint structures of the indicator variable, we must resort to the delayed-sample method, which is described next.

2.3.2. Delayed-sample method

An alternative method of delayed estimation is to generate both the delayed samples and the weights {(λ_t^{(j)}, w_t^{(j)})}_{j=1}^{m} based on the observations Y_{t+∆}, hence making p(Λ_t | Y_{t+∆}) the target distribution at time (t + ∆). The procedure will provide better Monte Carlo samples since it utilizes the future observations (y_{t+1}, ..., y_{t+∆}) in generating the current samples of λ_t. But the algorithm is also more demanding both analytically and computationally because of the need of marginalizing out λ_{t+1}, ..., λ_{t+∆}.

For each possible “future” (relative to time t − 1) symbol sequence at time (t + ∆ − 1), that is,

(λ_t, λ_{t+1}, ..., λ_{t+∆−1}) ∈ A_1^{∆},  (12)

we keep the value of a ∆-step Kalman filter {κ_{t+τ}^{(j)}(λ_t^{t+τ})}_{τ=0}^{∆−1}, where

κ_{t+τ}^{(j)}(λ_t^{t+τ}) ≜ (μ_{t+τ}(Λ_{t−1}^{(j)}, λ_t^{t+τ}), Σ_{t+τ}(Λ_{t−1}^{(j)}, λ_t^{t+τ})),  τ = 0, ..., ∆ − 1,  (13)

with λ_a^b ≜ (λ_a, λ_{a+1}, ..., λ_b). Denote

κ^{(j)} ≜ (κ_{t−1}^{(j)}, {κ_{t+τ}^{(j)}(λ_t^{t+τ})}_{τ=0}^{∆−1} : λ_t^{t+τ} ∈ A_1^{τ+1}).  (14)

The delayed-sample MKF algorithm recursively propagates the samples properly weighted for p(Λ_{t−1} | Y_{t+∆−1}) to those for p(Λ_t | Y_{t+∆}) and is summarized as follows.

Algorithm 2 (delayed-sample MKF). Suppose, at time (t + ∆ − 1), a set of properly weighted samples {(Λ_{t−1}^{(j)}, κ_{t−1}^{(j)}, w_{t−1}^{(j)})}_{j=1}^{m} is available with respect to p(Λ_{t−1} | Y_{t+∆−1}). Then at time (t + ∆), as the new data y_{t+∆} becomes available, the following steps are implemented to update each weighted sample.

For j = 1, 2, ..., m, the following steps are performed.

(i) For each λ_{t+∆} = a_i ∈ A_1, and for each λ_t^{t+∆−1} ∈ A_1^{∆}, perform a one-step update on the corresponding Kalman filter κ_{t+∆−1}^{(j)}(λ_t^{t+∆−1}), that is,

κ_{t+∆−1}^{(j)}(λ_t^{t+∆−1}) --(y_{t+∆}, λ_{t+∆} = a_i)--> κ_{t+∆}^{(j)}(λ_t^{t+∆}),  λ_{t+∆} = a_i.  (15)

(ii) For each a_i ∈ A_1, compute the sampling density

ρ_{t,i}^{(j)} ≜ P(λ_t = a_i | Λ_{t−1}^{(j)}, Y_{t+∆})
∝ P(λ_t = a_i | Λ_{t−1}^{(j)}) Σ_{λ_{t+1}^{t+∆} ∈ A_1^{∆}} [ Π_{τ=0}^{∆} p(y_{t+τ} | Y_{t+τ−1}, Λ_{t−1}^{(j)}, λ_t = a_i, λ_{t+1}^{t+τ}) × Π_{τ=1}^{∆} P(λ_{t+τ} | Λ_{t−1}^{(j)}, λ_t = a_i, λ_{t+1}^{t+τ−1}) ].  (16)

Note that the second density in (16) is Gaussian and can be computed based on the results of the Kalman filter updates in (15). Draw a sample λ_t^{(j)} according to the above sampling density. Append λ_t^{(j)} to Λ_{t−1}^{(j)} and obtain Λ_t^{(j)}. Based on this sample, form κ_t^{(j)} using the results from the previous step.

(iii) Compute the importance weight. If λ_{t−1}^{(j)} = a_k and λ_t^{(j)} = a_i, then

w_t^{(j)} = w_{t−1}^{(j)} · p(Λ_t^{(j)} | Y_{t+∆}) / [ p(Λ_{t−1}^{(j)} | Y_{t+∆−1}) P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_{t+∆}) ]
∝ w_{t−1}^{(j)} · [ Σ_{λ_{t+1}^{t+∆} ∈ A_1^{∆}} Π_{τ=0}^{∆} p(y_{t+τ} | Y_{t+τ−1}, Λ_{t−1}^{(j)}, λ_t^{t+τ}) Π_{τ=0}^{∆} P(λ_{t+τ} | Λ_{t−1}^{(j)}, λ_t^{t+τ−1}) ]
/ [ Σ_{λ_t^{t+∆−1} ∈ A_1^{∆}} Π_{τ=0}^{∆−1} p(y_{t+τ} | Y_{t+τ−1}, Λ_{t−1}^{(j)}, λ_t^{t+τ}) Π_{τ=0}^{∆−1} P(λ_{t+τ} | Λ_{t−1}^{(j)}, λ_t^{t+τ−1}) ]
× [ p(y_{t−1} | Y_{t−2}, Λ_{t−1}^{(j)}) P(λ_{t−1} = a_k | Λ_{t−2}^{(j)}) / ρ_{t−1,k}^{(j)} ] · Σ_{i=1}^{|A_1|} ρ_{t,i}^{(j)}.  (17)

(iv) Resample if the effective sample size is below a certain threshold, as discussed in Section 2.2.

Finally, as noted in [13], we can use the above delayed-sample method in conjunction with the delayed-weight method. For example, using the delayed-sample method, we generate delayed samples and weights {(Λ_t^{(j)}, w_t^{(j)})}_{j=1}^{m} based on observations Y_{t+∆}. Then with an additional delay δ, we can use the following delayed-weight method to approximate any inference about the indicator λ_t:

E[h(λ_t) | Y_{t+∆+δ}] ≅ (1/W_{t+δ}) Σ_{j=1}^{m} h(λ_t^{(j)}) w_{t+δ}^{(j)},  W_{t+δ} = Σ_{j=1}^{m} w_{t+δ}^{(j)}.  (18)

3. MULTILEVEL MIXTURE KALMAN FILTER

Suppose that the indicator space has a multilevel structure. For example, consider the following scenario of a two-level sampling space. The first level is the original sampling space Ω with Ω = {λ_i, i = 1, 2, ..., N}. We also have a higher-level sampling space Ω′ with Ω′ = {c_i, i = 1, 2, ..., N′}. The higher-level sampling space can be obtained as follows. Define the elements (say c_i) in the higher-level sampling space as the centers of subset ω_i in the original sampling space. That is,

c_i = (1/|ω_i|) Σ_j λ_j I(λ_j ∈ ω_i),  i = 1, 2, ..., N′,  (19)

where ω_i, i = 1, 2, ..., N′, is the subset in the original sampling space:

Ω = ∪_i ω_i,  ω_i ∩ ω_j = ∅,  i, j = 1, ..., N′, i ≠ j.  (20)

We call c_i the parent of the elements in subset ω_i and ω_i the child set of c_i. We can also iterate the above merging procedure on the newly created higher-level sampling space to get an even higher-level sampling space.

For example, we consider the 16-QAM modulation system often used in digital communications. The values of the symbols are taken from the set

Ω = {(a, b) : a, b = ±0.5, ±2.5}.  (21)

As shown in Figure 1, the sampling space Ω can be divided into four disjoint subsets:

ω_1 = {(0.5, 0.5), (0.5, 2.5), (2.5, 0.5), (2.5, 2.5)},
ω_2 = {(−0.5, 0.5), (−0.5, 2.5), (−2.5, 0.5), (−2.5, 2.5)},
ω_3 = {(−0.5, −0.5), (−0.5, −2.5), (−2.5, −0.5), (−2.5, −2.5)},
ω_4 = {(0.5, −0.5), (0.5, −2.5), (2.5, −0.5), (2.5, −2.5)}.  (22)

Moreover, the centers of these four subspaces are c_1 = (1.5, 1.5), c_2 = (−1.5, 1.5), c_3 = (−1.5, −1.5), and c_4 = (1.5, −1.5). Thus, we have obtained a higher-level sampling space composed of four elements. Then the MKF can draw samples first from the highest-level sampling space and then from the associated child set in the next lower-level sampling space. The procedure is iterated until reaching the original sampling space.

For simplicity, we will use c_{i,l} to represent the ith symbol value in the lth-level sampling space. Assume that there are, in total, L levels of sampling space and the number of elements at the lth level is |A_l|. The original sampling space is defined as the first level. Then the multilevel MKF is summarized as follows.

Algorithm 3 (multilevel MKF). Suppose, at time (t − 1), a set of properly weighted samples {(Λ_{t−1}^{(j)}, κ_{t−1}^{(j)}, w_{t−1}^{(j)})}_{j=1}^{m} is available with respect to p(Λ_{t−1} | Y_{t−1}). Then at time t, as the new data y_t becomes available, the following steps are implemented to update each weighted sample.

For j = 1, ..., m, perform the following steps.

(A1) Draw the sample c_{t,L}^{(j)} in the Lth-level sampling space.

(a) Based on the new data y_t, for each c_{i,L} ∈ A_L, perform a one-step update on the corresponding Kalman filter κ_{t−1}^{(j)}(λ_{t−1}) assuming c_{t,L} = c_{i,L} to obtain

κ_{t−1}^{(j)}(λ_{t−1}) --(y_t, c_{i,L})--> κ_{t,i}^{(j)}(λ_{t−1}, c_{i,L}).  (23)

(b) For each c_{i,L} ∈ A_L, compute the Lth-level sampling density

ρ_{t,L}^{(i,j)} ≜ P(c_{t,L} = c_{i,L} | Λ_{t−1}^{(j)}, Y_t) ∝ p(y_t | c_{i,L}, Λ_{t−1}^{(j)}, Y_{t−1}) P(c_{i,L} | Λ_{t−1}^{(j)}, Y_{t−1}).  (24)

Note that by the model (1), the first density in (24) is Gaussian and can be computed based on the Kalman filter update (23).

(c) Draw a sample c_{t,L}^{(j)} from the Lth-level sampling space A_L according to the above sampling density, that is,

P(c_{t,L}^{(j)} = c_{i,L}) ∝ ρ_{t,L}^{(i,j)},  c_{i,L} ∈ A_L.  (25)

(A2) Draw the sample c_{t,l}^{(j)} in the lth-level sampling space. First, find the child set ω_l^{(j)} in the current lth-level sampling space for the drawn sample c_{t,l+1}^{(j)} in the (l + 1)th-level sampling space, then proceed with three more steps.

(a) For each c_{i,l}^{(j)}, i = 1, 2, ..., |ω_l^{(j)}|, in the child set ω_l^{(j)}, perform a one-step update on the corresponding Kalman filter κ_{t−1}^{(j)}(λ_{t−1}) to obtain

κ_{t−1}^{(j)}(λ_{t−1}) --(y_t, c_{i,l}^{(j)})--> κ_{t,i}^{(j)}(λ_{t−1}, c_{i,l}^{(j)}).  (26)

(b) For each c_{i,l}^{(j)}, compute the sampling density

ρ_{t,l}^{(i,j)} ≜ P(c_{t,l}^{(j)} = c_{i,l}^{(j)} | Λ_{t−1}^{(j)}, Y_t) ∝ p(y_t | c_{i,l}^{(j)}, Λ_{t−1}^{(j)}, Y_{t−1}) P(c_{i,l}^{(j)} | Λ_{t−1}^{(j)}, Y_{t−1}).  (27)

Note that the first density in (27) is also Gaussian and can be computed based on the Kalman filter update (26).

(c) Draw a sample c_{t,l}^{(j)} according to the sampling density, that is,

P(c_{t,l}^{(j)} = c_{i,l}^{(j)}) ∝ ρ_{t,l}^{(i,j)},  c_{i,l}^{(j)} ∈ ω_l^{(j)}.  (28)

Repeat the above steps for the next level until we draw a sample λ_t^{(j)} = c_{t,1}^{(j)} from the original sampling space.

(A3) Append the symbol λ_t^{(j)} to Λ_{t−1}^{(j)} and obtain Λ_t^{(j)}.

(A4) Compute the trial sampling probability. Assuming the drawn sample c_{t,l}^{(j)} = c_{t,l}^{i,j} in the lth-level sampling space and the associated sampling probability is ρ_{t,l}^{(i,j)}, then the effective sampling probability P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_t) can be computed as follows:

P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_t) = Π_{l=1}^{L} ρ_{t,l}^{(i,j)}.  (29)

(A5) Compute the importance weight

w_t^{(j)} = w_{t−1}^{(j)} p(Λ_{t−1}^{(j)}, λ_t^{(j)} | Y_t) / [ p(Λ_{t−1}^{(j)} | Y_{t−1}) P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_t) ]
 = w_{t−1}^{(j)} p(y_t | λ_t^{(j)}, Y_{t−1}) P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_{t−1}) / Π_{l=1}^{L} ρ_{t,l}^{(i,j)}.  (30)

(A6) Resample if the effective sample size is below a certain threshold, as discussed in Section 2.2.

Remark 1 (complexity). Note that the dominant computation required for the above multilevel MKF is the update of the Kalman filter in (23) and (26). Denote J ≜ |A_1| and K_l ≜ |ω_l|. The number of one-step Kalman filter updates in the multilevel MKF is N = Σ_{l=1}^{L} K_l. Consider the 16-QAM and its corresponding two-level sampling space shown in Figure 1. There are N = 8 one-step Kalman filter updates needed by the multilevel MKF, whereas the original MKF requires J = 16 Kalman updates for each Markov stream. Hence, the computation complexity is reduced by half by the multilevel MKF.

Remark 2 (properties of the weighted samples). From (30), we have

w_{t−1}^{(j)} = w_{t−2}^{(j)} p(Λ_{t−2}^{(j)}, λ_{t−1}^{(j)} | Y_{t−1}) / [ p(Λ_{t−2}^{(j)} | Y_{t−2}) P(λ_{t−1}^{(j)} | Λ_{t−2}^{(j)}, Y_{t−1}) ].  (31)

Substituting it into (30), and repeating the procedure with w_{t−2}^{(j)}, ..., w_1^{(j)}, respectively, we finally have

w_t^{(j)} = p(Λ_{t−1}^{(j)}, λ_t^{(j)} | Y_t) / [ P(λ_1^{(j)} | Λ_0^{(j)}, Y_1) ··· P(λ_{t−1}^{(j)} | Λ_{t−2}^{(j)}, Y_{t−1}) P(λ_t^{(j)} | Λ_{t−1}^{(j)}, Y_t) ].  (32)

Consequently, the samples {Λ_t^{(j)}, w_t^{(j)}}_{j=1}^{m} drawn by the above procedure are properly weighted with respect to p(Λ_t | Y_t) provided that {Λ_{t−1}^{(j)}, w_{t−1}^{(j)}}_{j=1}^{m} are properly weighted with respect to p(Λ_{t−1} | Y_{t−1}).

4. DELAYED MULTILEVEL MIXTURE KALMAN FILTER

The delayed-sample method is used to generate samples {(λ_t^{(j)}, w_t^{(j)})}_{j=1}^{m} based on the observations Y_{t+∆}, hence making p(Λ_t | Y_{t+∆}) the target distribution at time (t + ∆). The procedure will provide better Monte Carlo samples since it utilizes the future observations (y_{t+1}, ..., y_{t+∆}) in generating the current samples of λ_t. But the algorithm is also more demanding both analytically and computationally because of the need of marginalizing out λ_{t+1}, ..., λ_{t+∆}.

Instead of exploring the “future” symbol sequences in the original sampling space, our proposed delayed multilevel MKF will marginalize out the future symbols in a higher-level sampling space. That is, for each possible “future” (relative to time t − 1) symbol sequence at time (t + ∆), that is,

(λ_t, c_{t+1,l}, ..., c_{t+∆,l}) ∈ A_1 × A_l^{∆}  (l > 1),  (33)

where c_{t,l} is the symbol in the lth-level sampling space, we compute the value of a ∆-step Kalman filter {κ_{t+τ}^{(j)}(λ_t, c_{t+1,l}^{t+τ})}_{τ=1}^{∆}, where

κ_{t+τ}^{(j)}(λ_t, c_{t+1,l}^{t+τ}) ≜ (μ_{t+τ}(Λ_{t−1}^{(j)}, λ_t, c_{t+1,l}^{t+τ}), Σ_{t+τ}(Λ_{t−1}^{(j)}, λ_t, c_{t+1,l}^{t+τ})),  τ = 1, ..., ∆,  (34)

with c_{a,l}^{b} ≜ (c_{a,l}, c_{a+1,l}, ..., c_{b,l}). The delayed multilevel MKF recursively propagates the samples properly weighted for p(Λ_{t−1} | Y_{t+∆−1}) to those for p(Λ_t | Y_{t+∆}) and is summarized as follows.

Algorithm 4 (delayed multilevel MKF). Suppose, at time (t − 1), a set of properly weighted samples {(Λ_{t−1}^{(j)}, κ_{t−1}^{(j)}, w_{t−1}^{(j)})}_{j=1}^{m} is available with respect to p(Λ_{t−1} | Y_{t+∆−1}). Then at time t, as the new data y_{t+∆} becomes available, the following steps are implemented to update each weighted sample.

For j = 1, ..., m, apply the following steps.

(A1) For each $\lambda_t = a_i \in A_1$, and for each $c_{t+1,l}^{t+\Delta} \in A_l^{\Delta}$, perform the update on the corresponding Kalman filter $\kappa_{t+\Delta-1}^{(j)}(\lambda_t, c_{t+1,l}^{t+\Delta-1})$, that is,
$$\kappa_{t+\tau-1}^{(j)}\big(\lambda_t, c_{t+1,l}^{t+\tau-1}\big) \xrightarrow{\;y_{t+\tau},\, c_{t+\tau,l}\;} \kappa_{t+\tau}^{(j)}\big(\lambda_t, c_{t+1,l}^{t+\tau}\big). \quad (35)$$

(A2) For each $a_i \in A_1$, compute the sampling density
$$\rho_{t,i}^{(j)} \triangleq P\big(\lambda_t = a_i \mid \Lambda_{t-1}^{(j)}, Y_{t+\Delta}\big) \propto \sum_{c_{t+1,l}^{t+\Delta} \in A_l^{\Delta}} p\big(y_t^{t+\Delta}, \lambda_t = a_i, c_{t+1,l}^{t+\Delta} \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big) \propto P\big(\lambda_t = a_i \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big) \sum_{c_{t+1,l}^{t+\Delta} \in A_l^{\Delta}} \prod_{\tau=0}^{\Delta} p\big(y_{t+\tau} \mid Y_{t+\tau-1}, \Lambda_{t-1}^{(j)}, \lambda_t = a_i, c_{t+1,l}^{t+\tau}\big) \prod_{\tau=1}^{\Delta} P\big(c_{t+\tau,l} \mid c_{t+1,l}^{t+\tau-1}, \lambda_t = a_i, \Lambda_{t-1}^{(j)}\big). \quad (36)$$

(A3) Draw a sample $\lambda_t^{(j)}$ according to the above sampling density, that is,
$$P\big(\lambda_t^{(j)} = a_i\big) \propto \rho_{t,i}^{(j)}. \quad (37)$$

(A4) Append the sample $\lambda_t^{(j)}$ to $\Lambda_{t-1}^{(j)}$ and obtain $\Lambda_t^{(j)}$.

(A5) Compute the importance weight
$$w_t^{(j)} = w_{t-1}^{(j)} \, \frac{p\big(\Lambda_{t-1}^{(j)}, \lambda_t^{(j)} \mid Y_{t+\Delta}\big)}{p\big(\Lambda_{t-1}^{(j)} \mid Y_{t+\Delta-1}\big)\, P\big(\lambda_t^{(j)} \mid \Lambda_{t-1}^{(j)}, Y_{t+\Delta}\big)} \propto w_{t-1}^{(j)} \, \frac{\sum_{c_{t+1,l}^{t+\Delta} \in A_l^{\Delta}} p\big(y_t^{t+\Delta}, c_{t+1,l}^{t+\Delta}, \lambda_t^{(j)} \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big)}{\sum_{c_{t,l}^{t+\Delta-1} \in A_l^{\Delta}} p\big(y_t^{t+\Delta-1}, c_{t,l}^{t+\Delta-1} \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big)\, \rho_{t,i}^{(j)}} = w_{t-1}^{(j)} \, \frac{\sum_{c_{t+1,l}^{t+\Delta} \in A_l^{\Delta}} \prod_{\tau=1}^{\Delta} p\big(y_{t+\tau} \mid Y_{t+\tau-1}, c_{t+1,l}^{t+\tau}, \lambda_t^{(j)}, \Lambda_{t-1}^{(j)}\big)\; P\big(\lambda_t^{(j)} \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big) \prod_{\tau=1}^{\Delta} P\big(c_{t+\tau,l} \mid c_{t+1,l}^{t+\tau-1}, \Lambda_t^{(j)}\big)}{\sum_{c_{t,l}^{t+\Delta-1} \in A_l^{\Delta}} \prod_{\tau=0}^{\Delta-1} p\big(y_{t+\tau} \mid Y_{t+\tau-1}, c_{t,l}^{t+\tau}, \Lambda_{t-1}^{(j)}\big) \prod_{\tau=0}^{\Delta-1} P\big(c_{t+\tau,l} \mid c_{t,l}^{t+\tau-1}, \Lambda_{t-1}^{(j)}\big)\; \rho_{t,i}^{(j)}}. \quad (38)$$

(A6) Do resampling if the effective sample size is below a certain threshold, as discussed in Section 2.2.

Remark 3 (properties of the weighted samples). Similar to the multilevel sampling algorithm in Section 3, it can be shown that the samples $\{(\Lambda_t^{(j)}, w_t^{(j)})\}_{j=1}^{m}$ drawn by the above procedure are properly weighted with respect to $p(\Lambda_t \mid Y_{t+\Delta})$, provided that $\{(\Lambda_{t-1}^{(j)}, w_{t-1}^{(j)})\}_{j=1}^{m}$ are properly weighted with respect to $p(\Lambda_{t-1} \mid Y_{t+\Delta-1})$. However, $p(y_{t+\tau} \mid Y_{t+\tau-1}, c_{t,l}^{t+\tau}, \Lambda_{t-1}^{(j)})$ is no longer simply a Gaussian distribution, but a mixture of Gaussian components. Since this Gaussian mixture is implausible to obtain within the coarse sampling space, it has to be approximated with the Gaussian distribution computed by the Kalman filter assuming that the elements in the higher-level sampling space are transmitted. Therefore, some bias will be introduced into the computation of the weight. On the other hand, we make use of more information in the approximation of the better distribution $p(\Lambda_{t-1}, \lambda_t \mid Y_{t+\Delta})$, which makes the algorithm more efficient than the original MKF.

Remark 4 (properly weighted samples). To mitigate the bias problem introduced in the weight computation, instead of (38) we can use the following importance weight:
$$w_t^{(j)} = w_{t-1}^{(j)} \, \frac{p\big(\Lambda_{t-1}^{(j)}, \lambda_t^{(j)} \mid Y_t\big)}{p\big(\Lambda_{t-1}^{(j)} \mid Y_{t-1}\big)\, P\big(\lambda_t^{(j)} \mid \Lambda_{t-1}^{(j)}, Y_{t+\Delta}\big)} = w_{t-1}^{(j)} \, \frac{p\big(y_t \mid \lambda_t^{(j)}, Y_{t-1}\big)\, P\big(\lambda_t^{(j)} \mid \Lambda_{t-1}^{(j)}, Y_{t-1}\big)}{\rho_{t,i}^{(j)}}. \quad (39)$$
Similar to the multilevel sampling algorithm in Section 3, it is easily seen that the samples $\{(\Lambda_t^{(j)}, w_t^{(j)})\}_{j=1}^{m}$ drawn by the above procedure are properly weighted with respect to $p(\Lambda_t \mid Y_t)$, provided that $\{(\Lambda_{t-1}^{(j)}, w_{t-1}^{(j)})\}_{j=1}^{m}$ are properly weighted with respect to $p(\Lambda_{t-1} \mid Y_{t-1})$. Since the whole procedure is just a way to get better samples based on the future information, the delayed weight may be very effective. Furthermore, there is no longer any bias in the weight computation, although we still approximate the mixture Gaussian distribution with a Gaussian distribution as in Remark 3.

Remark 5 (complexity). Note that the dominant computation required for the above delayed multilevel MKF is mainly in the first step. Denote $J \triangleq |A_1|$ and $M \triangleq |A_l|$. The number of one-step Kalman filter updates in the delayed multilevel MKF can be computed as follows:
$$N = J + JM + \cdots + JM^{\Delta} = J\,\frac{M^{\Delta+1} - 1}{M - 1}. \quad (40)$$
Note that the delayed-sample MKF requires $J^{\Delta+1}$ Kalman updates at each time. Since usually $M \ll J$, compared with the delayed-sample MKF, the computational complexity of the delayed multilevel MKF is significantly reduced.

Remark 6 (alternative sampling method). To further reduce the computational complexity in the first step, we can take the alternative sampling method composed of the following steps.

Algorithm 5 (alternative sampling method). (1) At time $t+\Delta$, for each $c_{t,l}^{t+\Delta} \in A_l^{\Delta}$, perform the update on the corresponding Kalman filter $\kappa_{t+\Delta-1}^{(j)}(c_{t,l}^{t+\Delta-1})$, that is,
$$\kappa_{t+\tau-1}^{(j)}\big(c_{t,l}^{t+\tau-1}\big) \xrightarrow{\;y_{t+\tau},\, c_{t+\tau,l}\;} \kappa_{t+\tau}^{(j)}\big(c_{t,l}^{t+\tau}\big). \quad (41)$$
(2) Select $K$ paths from $c_{t+1,l}^{t+\Delta}$ based on the computed weight
$$\prod_{\tau=0}^{\Delta} p\big(y_{t+\tau} \mid Y_{t+\tau-1}, \Lambda_{t-1}^{(j)}, c_{t,l}^{t+\tau}\big)\, P\big(c_{t+\tau,l} \mid c_{t,l}^{t+\tau-1}, \Lambda_{t-1}^{(j)}\big). \quad (42)$$

(3) For each $\lambda_t = a_i \in A_1$, and for each path $k$ in the selected $K$ paths, perform the update on the corresponding Kalman filter $\kappa_{t+\Delta-1}^{(j)}(\lambda_t, c_{t+1,l}^{t+\Delta-1,k})$, that is,
$$\kappa_{t+\tau-1}^{(j)}\big(\lambda_t, c_{t+1,l}^{t+\tau-1,k}\big) \xrightarrow{\;y_{t+\tau},\, c_{t+\tau,l}^{k}\;} \kappa_{t+\tau}^{(j)}\big(\lambda_t, c_{t+1,l}^{t+\tau,k}\big). \quad (43)$$
(4) For each $a_i \in A_1$, compute the sampling density
$$\rho_{t,i}^{(j)} \triangleq P\big(\lambda_t = a_i \mid \Lambda_{t-1}^{(j)}\big) \sum_{k=1}^{K} \prod_{\tau=0}^{\Delta} p\big(y_{t+\tau} \mid Y_{t+\tau-1}, \Lambda_{t-1}^{(j)}, \lambda_t = a_i, c_{t+1,l}^{t+\tau,k}\big) \prod_{\tau=1}^{\Delta} P\big(c_{t+\tau,l}^{k} \mid c_{t+1,l}^{t+\tau-1,k}, \lambda_t = a_i, \Lambda_{t-1}^{(j)}\big). \quad (44)$$

The weight can then be calculated by (39) for the target distribution $p(\lambda_t \mid Y_t)$, or by (38) for the target distribution $p(\lambda_t \mid Y_{t+\Delta})$. Besides the bias resulting from the Gaussian approximation in Remark 3, the latter also introduces a new bias because of the summation over the $K$ selected paths rather than over the whole higher-level sampling space. The former, however, does not introduce any bias, as in Remark 4. Denote $J \triangleq |A_1|$ and $M \triangleq |A_l|$. The number of one-step Kalman filter updates in this version of the delayed multilevel MKF can be computed as $N = JK + M^{\Delta}$.

Remark 7 (the choice of the parameters). Note that the performance of the delayed multilevel MKF is mainly dominated by two important parameters: the number of prediction steps $\Delta$ and the specific level of the sampling space $A_l$. With the same computation, the delayed multilevel MKF can see further "future" steps (larger $\Delta$) with a coarser-level sampling space (larger $l$), whereas it can see the "future" samples in a finer sampling space (smaller $l$), but with fewer steps.

Remark 8 (multiple sampling spaces). In Algorithm 5, we can also use different sampling spaces for different delay steps. That is, we can gradually increase the sampling space from the lower level to the higher level with an increase in the delay step.

5. SIMULATIONS

We consider the problem of adaptive detection in flat-fading channels in the presence of Gaussian noise. This problem is of fundamental importance in communication theory, and an array of methodologies has been developed to tackle it. Specifically, the optimal detector for flat-fading channels with known channel statistics is studied in [24, 25]; it has a prohibitively high complexity. Suboptimal receivers in flat-fading channels employ a two-stage structure, with a channel estimation stage followed by a sequence detection stage. Channel estimation is typically implemented by a Kalman filter or a linear predictor, and is facilitated by per-survivor processing (PSP) [26], decision feedback [27], pilot symbols [28], or a combination of the above [29]. Other suboptimal receivers for flat-fading channels include the method based on a combination of a hidden Markov model and Kalman filtering [30], and the approach based on the expectation-maximization (EM) algorithm [31].

In the communication system, the transmitted data symbols $\{s_t\}$ take values from a finite alphabet set $A_1 = \{a_1, \ldots, a_{|A_1|}\}$, and each symbol is transmitted over a Rayleigh fading channel. As shown in [32, 33], the fading process is adequately represented by an ARMA model whose parameters are chosen to match the spectral characteristics of the fading process. That is, the fading process is modelled by the output of a lowpass Butterworth filter of order $r$ driven by white Gaussian noise,
$$\alpha_t = \frac{\Theta(D)}{\Phi(D)}\, u_t, \quad (45)$$
where $D$ is the back-shift operator, $D^k u_t \triangleq u_{t-k}$; $\Phi(z) \triangleq \phi_r z^r + \cdots + \phi_1 z + 1$; $\Theta(z) \triangleq \theta_r z^r + \cdots + \theta_1 z + \theta_0$; and $\{u_t\}$ is a white complex Gaussian noise sequence with unit variance and independent real and imaginary components. The coefficients $\{\phi_i\}$ and $\{\theta_i\}$, as well as the order $r$ of the Butterworth filter, are chosen so that the transfer function of the filter matches the power spectral density of the fading process, which in turn is determined by the channel Doppler frequency. On the other hand, a simpler method, which uses a two-path model to build the ARMA process, can be found in [34]; the results there closely approximate more complex path models. Such a communication system then has the following state-space model form:
$$x_t = F x_{t-1} + g u_t, \quad (46)$$
$$y_t = s_t h^H x_t + \sigma v_t, \quad (47)$$
where
$$F \triangleq \begin{bmatrix} -\phi_1 & -\phi_2 & \cdots & -\phi_r & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \qquad g \triangleq \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \quad (48)$$
The fading coefficient sequence $\{\alpha_t\}$ can then be written as
$$\alpha_t = h^H x_t, \qquad h \triangleq \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_r \end{bmatrix}^H. \quad (49)$$
In the state-space model, $\{u_t\}$ in (46) and $\{v_t\}$ in (47) are white complex Gaussian noise sequences with unit variance and independent real and imaginary components:
$$u_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}_c(0, 1), \qquad v_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}_c(0, 1). \quad (50)$$

In our simulations, the fading process is specifically modeled by the output of a Butterworth filter of order $r = 3$ driven by a complex white Gaussian noise process. The cutoff frequency of this filter is 0.05, corresponding to a normalized Doppler frequency (with respect to the symbol rate $1/T$) of $f_d T = 0.05$, which is a fast-fading scenario.

That is, the fading coefficients $\{\alpha_t\}$ are modeled by the following ARMA(3,3) process:
$$\alpha_t - 2.37409\,\alpha_{t-1} + 1.92936\,\alpha_{t-2} - 0.53208\,\alpha_{t-3} = 10^{-2}\big(0.89409\,u_t + 2.68227\,u_{t-1} + 2.68227\,u_{t-2} + 0.89409\,u_{t-3}\big), \quad (51)$$
where $u_t \sim \mathcal{N}_c(0, 1)$. The filter coefficients in (51) are chosen such that $\operatorname{Var}\{\alpha_t\} = 1$.

We then apply the proposed multilevel MKF methods to the problem of online estimation of the a posteriori probability of the symbol $s_t$ based on the received signals up to time $t$. That is, at time $t$, we need to estimate
$$P\big(s_t = a_i \mid Y_t\big), \quad a_i \in A_1. \quad (52)$$
A hard maximum a posteriori (MAP) decision on symbol $s_t$ is then given by
$$\hat{s}_t = \arg\max_{a_i \in A_1} P\big(s_t = a_i \mid Y_t\big). \quad (53)$$

If we obtain a set of Monte Carlo samples of the transmitted symbols $\{(S_t^{(j)}, w_t^{(j)})\}_{j=1}^{m}$, properly weighted with respect to $p(S_t \mid Y_t)$, then the a posteriori symbol probability in (52) is approximated by
$$\hat{P}\big(s_t = a_i \mid Y_t\big) \cong \frac{1}{W_t} \sum_{j=1}^{m} \mathbf{1}\big(s_t^{(j)} = a_i\big)\, w_t^{(j)}, \quad a_i \in A_1, \quad (54)$$
with $W_t \triangleq \sum_{j=1}^{m} w_t^{(j)}$.

In this paper, we use 16-QAM modulation. Note that 16-QAM has many more phase ambiguities than the BPSK in [13]. We provide the following two schemes to resolve the phase ambiguities.

Scheme 1 (pilot-assisted). Pilot symbols are inserted periodically every fixed number of symbols; a similar scheme was used in [20]. In this paper, 10% and 20% pilot symbols are studied.

Scheme 2 (differential 16-QAM). We view a 16-QAM symbol as a pair of QPSK symbols, and two differential QPSKs are then used to resolve the phase ambiguity. Given the transmitted symbol $s_{t-1}$ and the information symbol $d_t$, they can be represented by the QPSK symbol pairs $(r_{s_{t-1},1}, r_{s_{t-1},2})$ and $(r_{d_t,1}, r_{d_t,2})$, respectively. The differential modulation scheme is then given by
$$r_{s_t,1} = r_{s_{t-1},1} \cdot r_{d_t,1}, \qquad r_{s_t,2} = r_{s_{t-1},2} \cdot r_{d_t,2}. \quad (55)$$

We next show the performance of the multilevel MKF algorithm for detecting 16-QAM over flat-fading channels. The receiver implements the decoding algorithms discussed in Section 3, in combination with the delayed-weight method, under Schemes 1 and 2 discussed above. In our simulations, we take $m = 50$ Markov streams, and the length of the symbol sequence is 256. We first show the bit error rate (BER) performance versus the signal-to-noise ratio (SNR) of the PSP, the multilevel MKF, and the MKF receivers under different pilot schemes, without delayed weight in Figure 2 and with delayed weight in Figure 3. The numbers of bit errors were collected over 50 independent simulations at low SNR, or more at high SNR. In these figures, we also plotted the "genie-aided lower bound."¹ The BER performance of PSP is far from the genie bound at SNR higher than 10 dB, but the performance of the multilevel MKF with 20% pilot symbols is very close to the genie bound. Furthermore, with the delayed-weight method, the performance of the multilevel MKF can be significantly improved. We next show the BER performance in Figure 4 under the differential coding scheme. As in the pilot-assisted scheme, the BER performance of PSP is far from the genie bound at SNR higher than 15 dB.

Figure 2: The BER performance of the PSP, MKF, and multilevel MKF for pilot-aided 16-QAM over flat-fading channels.
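The Monte Carlo approximation (54) and the MAP decision (53) amount to a normalized weighted vote over the streams. The following minimal sketch uses a toy four-symbol alphabet and made-up weights (none of these values come from the paper):

```python
def posterior_and_map(samples, weights, alphabet):
    """Approximate P(s_t = a_i | Y_t) via (54) and take the MAP symbol (53).

    samples[j] is the time-t symbol drawn by stream j and weights[j] its
    importance weight; the estimate is the normalized weight mass per symbol.
    """
    W = sum(weights)  # W_t in (54)
    post = {a: sum(w for s, w in zip(samples, weights) if s == a) / W
            for a in alphabet}
    s_map = max(alphabet, key=lambda a: post[a])
    return post, s_map

# toy example: 4-symbol alphabet, 5 weighted streams (illustrative numbers)
alphabet = [1, 2, 3, 4]
samples = [1, 2, 2, 3, 2]
weights = [0.1, 0.4, 0.2, 0.1, 0.2]
post, s_map = posterior_and_map(samples, weights, alphabet)
# symbol 2 carries weight mass 0.4 + 0.2 + 0.2 = 0.8, so it is the MAP decision
```

The same routine applies stream-wise at every time instant; only the weights change as (38) or (39) is updated.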

The two QPSK symbol pairs $(r_{s_t,1}, r_{s_t,2})$ are mapped to 16-QAM symbols and then transmitted through the fading channel. The traditional differential receiver performs the following steps:
$$\hat{r}_{d_t,1} = r_{s_t,1} \cdot r_{s_{t-1},1}^{*}, \qquad \hat{r}_{d_t,2} = r_{s_t,2} \cdot r_{s_{t-1},2}^{*}. \quad (56)$$

¹The genie-aided lower bound is obtained as follows. We assume that at each time $t$, a genie provides the receiver with an observation of the modulation-free channel coefficient corrupted by additive white Gaussian noise with the same variance, that is, $y_t = \alpha_t + n_t$, where $n_t \sim \mathcal{N}_c(0, \sigma^2)$. The receiver employs a Kalman filter to track the fading process based on the information provided by the genie, that is, it computes $\hat{\alpha}_t = E\{\alpha_t \mid y_t\}$. An estimate of the transmitted 16-QAM symbol is obtained by slicing $(y_t \hat{\alpha}_t^{*})$.
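The point of Scheme 2 can be checked numerically: encoding with (55) and decoding with (56) recovers the information pairs even when every transmitted pair is rotated by a common QPSK phase ambiguity; only the first (reference) pair is affected. A minimal sketch, with symbol ordering and helper names of our own choosing:

```python
QPSK = (1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j)  # unit-modulus QPSK symbols

def diff_encode(d_seq, s0=(1 + 0j, 1 + 0j)):
    """Differential modulation (55): each 16-QAM symbol is a QPSK pair."""
    s_prev, out = s0, []
    for d1, d2 in d_seq:
        s = (s_prev[0] * d1, s_prev[1] * d2)
        out.append(s)
        s_prev = s
    return out

def diff_decode(s_seq, s0=(1 + 0j, 1 + 0j)):
    """Differential demodulation (56): r_hat = r_t * conj(r_{t-1})."""
    s_prev, out = s0, []
    for s in s_seq:
        out.append((s[0] * s_prev[0].conjugate(),
                    s[1] * s_prev[1].conjugate()))
        s_prev = s
    return out

d = [(QPSK[0], QPSK[1]), (QPSK[3], QPSK[2]), (QPSK[2], QPSK[0])]
tx = diff_encode(d)
# a common pi/2 rotation (a QPSK phase ambiguity) applied to every pair
rot = 0 + 1j
rx = [(rot * a, rot * b) for a, b in tx]
decoded = diff_decode(rx)
# the rotation cancels in every product r_t * conj(r_{t-1}) after the first
```

Since the rotation multiplies both $r_{s_t}$ and $r_{s_{t-1}}$, it cancels in (56), which is exactly why differential coding resolves the phase ambiguity of 16-QAM.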


Figure 3: The BER performance of the PSP, the MKF with delayed weight (δ = 10), and the multilevel MKF with delayed weight (δ = 10) for pilot-aided 16-QAM over flat-fading channels.

Figure 4: The BER performance of the PSP, the MKF, and the multilevel MKF for differential 16-QAM over flat-fading channels. The delayed-weight method is used with δ = 10.

On the contrary, the performance of the multilevel MKF is very close to that of the original MKF, although there is a 5 dB gap between the genie bound and the performance of the multilevel MKF.

The BER performance of the MKF and the multilevel MKF, with or without delayed weight, versus the number of Markov streams is shown in Figure 5 under the differential coding scheme. The BER is gradually improved from the

value 0.16 to about 0.08 for the multilevel MKF, 0.07 for the MKF, 0.065 for the multilevel MKF with delayed weight, and 0.062 for the MKF with delayed weight, at 25 Markov streams. However, the BER performance cannot be improved anymore with more than 25 streams. Therefore, the optimal number of Markov streams is 25 in this example.
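All of the BER curves above are generated over the fading process in (51); as a side check, that ARMA(3,3) recursion can be simulated directly. A minimal pure-Python sketch (the function name, seed, and sample count are our own choices, not from the paper):

```python
import math
import random

def simulate_arma_fading(T, seed=1):
    """Run the ARMA(3,3) recursion (51) driven by unit-variance
    complex white Gaussian noise u_t (zero initial conditions)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # per-component std so that E|u_t|^2 = 1
    u = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(T)]
    a = [0j, 0j, 0j]  # alpha_{t-1}, alpha_{t-2}, alpha_{t-3}
    out = []
    for t in range(T):
        u1 = u[t - 1] if t >= 1 else 0j
        u2 = u[t - 2] if t >= 2 else 0j
        u3 = u[t - 3] if t >= 3 else 0j
        alpha = (2.37409 * a[0] - 1.92936 * a[1] + 0.53208 * a[2]
                 + 1e-2 * (0.89409 * u[t] + 2.68227 * u1
                           + 2.68227 * u2 + 0.89409 * u3))
        out.append(alpha)
        a = [alpha, a[0], a[1]]
    return out

alpha = simulate_arma_fading(5000)
# empirical second moment; (51) is designed so that Var{alpha_t} = 1,
# though a finite, correlated sample only approximates it
var_hat = sum(abs(x) ** 2 for x in alpha) / len(alpha)
```

The slow decay of the autocorrelation (normalized Doppler 0.05) is what makes the detection problem hard: the channel moves noticeably over a few symbols, so the Kalman filters inside the MKF must track it symbol by symbol.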

Next, we illustrate the performance of the delayed multilevel MKF. The receiver implements the decoding algorithms discussed in Section 4. We show the BER performance versus SNR in Figure 6, computed by the delayed-sample or the delayed multilevel MKF with one delayed step (Δ = 1). The BER performance of the MKF and the MKF with delayed weight is also plotted in the same figure. In the delayed multilevel MKF, we implement two schemes for choosing the sampling space for the "future" symbols. In the first scheme, we choose the second-level ($l = 2$) sampling space $A_2 = \{c_1, c_2, c_3, c_4\}$, where $c_i$, $i = 1, \ldots, 4$, are the solid circles shown in Figure 1; in the second scheme, we choose the third-level ($l = 3$) sampling space $A_3 = \{c_1 + c_2, c_3 + c_4\}$. It is seen that the BER of the delayed multilevel MKF is very close to that of the delayed-sample algorithm. It is also seen that the delayed multilevel MKF based on the second-level sampling space $A_2$ is nearly 2 dB better than that based on the third-level sampling space $A_3$; hence, the performance of the delayed multilevel MKF is conditioned on the specific level of the sampling space.

Figure 5: The BER performance of the MKF and the multilevel MKF for differential 16-QAM over flat-fading channels versus the number of Markov streams. The delayed-weight method is used with δ = 10.
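The Kalman-update counts compared in Remarks 5 and 6 are easy to tabulate. The sketch below uses illustrative values $J = 16$ (the 16-QAM alphabet), $M = 4$ (a four-point coarse level), $K = 4$, and $\Delta = 2$; these are assumptions for illustration, not the exact simulation settings of the paper:

```python
def delayed_sample_updates(J, delta):
    """Delayed-sample MKF: one Kalman update per path in A_1^{delta+1}."""
    return J ** (delta + 1)

def delayed_multilevel_updates(J, M, delta):
    """Delayed multilevel MKF, eq. (40): N = J + J*M + ... + J*M^delta."""
    return sum(J * M ** tau for tau in range(delta + 1))

def alternative_updates(J, M, K, delta):
    """Alternative sampling method (Remark 6): N = J*K + M^delta."""
    return J * K + M ** delta

J, M, K, delta = 16, 4, 4, 2
n_sample = delayed_sample_updates(J, delta)        # 16**3
n_multi = delayed_multilevel_updates(J, M, delta)  # 16 * (1 + 4 + 16)
n_alt = alternative_updates(J, M, K, delta)        # 16*4 + 4**2
# the closed form of (40) matches the explicit sum
assert n_multi == J * (M ** (delta + 1) - 1) // (M - 1)
```

Even for this modest setting the per-stream counts drop from 4096 (delayed-sample) to 336 (delayed multilevel) to 80 (alternative method), which is the complexity reduction the delayed multilevel MKF trades against the small BER gap seen in Figure 6.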

Figure 6: The BER performance of the delayed multilevel MKF with the second-level ($l = 2$) or the third-level ($l = 3$) sampling space for the differential 16-QAM system over flat-fading channels. The BER performance of the delayed-sample method is also shown.

6. CONCLUSION

In this paper, we have developed a new sequential Monte Carlo (SMC) sampling method, the multilevel mixture Kalman filter (MKF), under the framework of the MKF for conditional dynamic linear systems. This new scheme generates random streams using sequential importance sampling (SIS), based on the multilevel or hierarchical structure of the indicator random variables. This technique can also be used in conjunction with the delayed estimation methods, resulting in a delayed multilevel MKF. Moreover, the performance of both the multilevel MKF and the delayed multilevel MKF can be further enhanced when employed in conjunction with the delayed-weight method.

We have also applied the multilevel MKF algorithm and the delayed multilevel MKF algorithm to solve the problem of adaptive detection in flat-fading communication channels. It is seen that, compared with the receiver algorithm based on the original MKF, the proposed multilevel MKF techniques offer very good performance. It is also seen that the receivers based on the delayed multilevel MKF can achieve similar performance as those based on the delayed-sample MKF, but with a much lower computational complexity.

ACKNOWLEDGMENT

This work was supported in part by the US National Science Foundation (NSF) under Grants DMS-0225692 and CCR-0225826, and by the US Office of Naval Research (ONR) under Grant N00014-03-1-0039.

REFERENCES

[1] C. Andrieu, N. de Freitas, and A. Doucet, "Robust full Bayesian learning for radial basis networks," Neural Computation, vol. 13, no. 10, pp. 2359–2407, 2001.
[2] E. Beadle and P. Djurić, "A fast-weighted Bayesian bootstrap filter for nonlinear model state estimation," IEEE Trans. on Aerospace and Electronic Systems, vol. 33, no. 1, pp. 338–343, 1997.
[3] R. Chen and J. S. Liu, "Mixture Kalman filters," Journal of the Royal Statistical Society: Series B, vol. 62, no. 3, pp. 493–508, 2000.
[4] C.-M. Cho and P. Djurić, "Bayesian detection and estimation of cisoids in colored noise," IEEE Trans. Signal Processing, vol. 43, no. 12, pp. 2943–2952, 1995.
[5] P. M. Djurić and J. Chun, "An MCMC sampling approach to estimation of nonstationary hidden Markov models," IEEE Trans. Signal Processing, vol. 50, no. 5, pp. 1113–1123, 2002.
[6] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, NY, USA, 2001.
[7] A. Doucet, S. J. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[8] W. Fong, S. J. Godsill, A. Doucet, and M. West, "Monte Carlo smoothing with application to audio signal enhancement," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 438–449, 2002.
[9] Y. Huang and P. Djurić, "Multiuser detection of synchronous code-division multiple-access signals by perfect sampling," IEEE Trans. Signal Processing, vol. 50, no. 7, pp. 1724–1734, 2002.
[10] A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputations and Bayesian missing data problems," Journal of the American Statistical Association, vol. 89, no. 425, pp. 278–288, 1994.
[11] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.
[12] M. K. Pitt and N. Shephard, "Filtering via simulation: auxiliary particle filters," Journal of the American Statistical Association, vol. 94, no. 446, pp. 590–599, 1999.
[13] R. Chen, X. Wang, and J. S. Liu, "Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[14] X. Wang, R. Chen, and D. Guo, "Delayed-pilot sampling for mixture Kalman filter with application in fading channels," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 241–254, 2002.
[15] A. Doucet, N. J. Gordon, and V. Krishnamurthy, "Particle filters for state estimation of jump Markov linear systems," IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[16] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models, Springer-Verlag, New York, NY, USA, 1989.
[17] R. Y. Rubinstein, Simulation and the Monte Carlo Method, John Wiley & Sons, New York, NY, USA, 1981.
[18] J. MacCormick and A. Blake, "A probabilistic exclusion principle for tracking multiple objects," International Journal of Computer Vision, vol. 39, no. 1, pp. 57–71, 2000.
[19] D. Guo, X. Wang, and R. Chen, "Wavelet-based sequential Monte Carlo blind receivers in fading channels with unknown channel statistics," IEEE Trans. Signal Processing, vol. 52, no. 1, pp. 227–239, 2004.
[20] E. Punskaya, C. Andrieu, A. Doucet, and W. Fitzgerald, "Particle filtering for demodulation in fading channels with non-Gaussian additive noise," IEEE Trans. Communications, vol. 49, no. 4, pp. 579–582, 2001.

[21] D. Guo and X. Wang, "Blind detection in MIMO systems via sequential Monte Carlo," IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, pp. 464–473, 2003.
[22] J. Zhang and P. Djurić, "Joint estimation and decoding of space-time trellis codes," EURASIP Journal on Applied Signal Processing, vol. 2002, no. 3, pp. 305–315, 2002.
[23] Z. Yang and X. Wang, "A sequential Monte Carlo blind receiver for OFDM systems in frequency-selective fading channels," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 271–280, 2002.
[24] R. Haeb and H. Meyr, "A systematic approach to carrier recovery and detection of digitally phase modulated signals on fading channels," IEEE Trans. Communications, vol. 37, no. 7, pp. 748–754, 1989.
[25] D. Makrakis, P. T. Mathiopoulos, and D. P. Bouras, "Optimal decoding of coded PSK and QAM signals in correlated fast fading channels and AWGN: a combined envelope, multiple differential and coherent detection approach," IEEE Trans. Communications, vol. 42, no. 1, pp. 63–75, 1994.
[26] G. M. Vitetta and D. P. Taylor, "Maximum likelihood decoding of uncoded and coded PSK signal sequences transmitted over Rayleigh flat-fading channels," IEEE Trans. Communications, vol. 43, no. 11, pp. 2750–2758, 1995.
[27] Y. Liu and S. D. Blostein, "Identification of frequency non-selective fading channels using decision feedback and adaptive linear prediction," IEEE Trans. Communications, vol. 43, no. 2/3/4, pp. 1484–1492, 1995.
[28] H. Kong and E. Shwedyk, "On channel estimation and sequence detection of interleaved coded signals over frequency nonselective Rayleigh fading channels," IEEE Trans. Vehicular Technology, vol. 47, no. 2, pp. 558–565, 1998.
[29] G. T. Irvine and P. J. McLane, "Symbol-aided plus decision-directed reception for PSK/TCM modulation on shadowed mobile satellite fading channels," IEEE Journal on Selected Areas in Communications, vol. 10, no. 8, pp. 1289–1299, 1992.
[30] I. B. Collings and J. B. Moore, "An adaptive hidden Markov model approach to FM and M-ary DPSK demodulation in noisy fading channels," Signal Processing, vol. 47, no. 1, pp. 71–84, 1995.
[31] C. N. Georghiades and J. C. Han, "Sequence estimation in the presence of random parameters via the EM algorithm," IEEE Trans. Communications, vol. 45, no. 3, pp. 300–308, 1997.
[32] G. L. Stüber, Principles of Mobile Communication, Kluwer Academic Publishers, Boston, Mass, USA, 2000.
[33] R. A. Ziegler and J. M. Cioffi, "Estimation of time-varying digital radio channels," IEEE Trans. Vehicular Technology, vol. 41, no. 2, pp. 134–151, 1992.
[34] A. Duel-Hallen, "Decorrelating decision-feedback multiuser detector for synchronous code-division multiple-access channel," IEEE Trans. Communications, vol. 41, no. 2, pp. 285–290, 1993.

Xiaodong Wang received the B.S. degree in electrical engineering and applied mathematics from Shanghai Jiao Tong University, Shanghai, China, in 1992; the M.S. degree in electrical and computer engineering from Purdue University in 1995; and the Ph.D. degree in electrical engineering from Princeton University in 1998. From 1998 to 2001, he was an Assistant Professor in the Department of Electrical Engineering, Texas A&M University. In 2002, he joined the Department of Electrical Engineering, Columbia University, as an Assistant Professor. Dr. Wang's research interests fall in the general areas of computing, signal processing, and communications. He has worked in the areas of wireless communications, statistical signal processing, and parallel and distributed computing. He has published extensively in these areas. Among his publications is a recent book entitled Wireless Communication Systems: Advanced Techniques for Signal Reception. Dr. Wang has received the 1999 NSF CAREER Award and the 2001 IEEE Communications Society and Information Theory Society Joint Paper Award. He currently serves as an Associate Editor for the IEEE Transactions on Signal Processing, the IEEE Transactions on Communications, the IEEE Transactions on Wireless Communications, and the IEEE Transactions on Information Theory.

Rong Chen is a Professor at the Department of Information and Decision Sciences, College of Business Administration, University of Illinois at Chicago (UIC). Before joining UIC in 1999, he was with the Department of Statistics, Texas A&M University. Dr. Chen received his B.S. degree in mathematics from Peking University, China, in 1985, and his M.S. and Ph.D. degrees in statistics from Carnegie Mellon University in 1987 and 1990, respectively. His main research interests are in time series analysis, statistical computing and Monte Carlo methods in dynamic systems, and statistical applications in engineering and business. He is an Associate Editor for the Journal of the American Statistical Association, the Journal of Business and Economic Statistics, Statistica Sinica, and Computational Statistics.

Dong Guo received the B.S. degree in geophysics and computer science from China University of Mining and Technology (CUMT), Xuzhou, China, in 1993, and the M.S. degree in geophysics from the Graduate School, Research Institute of Petroleum Exploration and Development (RIPED), Beijing, China, in 1996. In 1999, he received the Ph.D. degree in applied mathematics from Beijing University, Beijing, China. He is now pursuing a second Ph.D. degree in the Department of Electrical Engineering, Columbia University, New York. His research interests are in the area of statistical signal processing and communications.

EURASIP Journal on Applied Signal Processing 2004:15, 2267–2277
© 2004 Hindawi Publishing Corporation

Resampling Algorithms for Particle Filters: A Computational Complexity Perspective

Miodrag Bolić
Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794-2350, USA
Email: [email protected]

Petar M. Djurić
Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794-2350, USA
Email: [email protected]

Sangjin Hong
Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794-2350, USA
Email: [email protected]

Received 30 April 2003; Revised 28 January 2004

Newly developed resampling algorithms for particle filters suitable for real-time implementation are described, and their analysis is presented. The new algorithms reduce the complexity of both hardware and DSP realizations by addressing common issues such as decreasing the number of operations and the number of memory accesses. Moreover, the algorithms allow for the use of higher sampling frequencies by overlapping the resampling step in time with the other particle filtering steps. Since resampling is not dependent on any particular application, the analysis is appropriate for all types of particle filters that use resampling. The performance of the algorithms is evaluated on particle filters applied to bearings-only tracking and to joint detection and estimation in wireless communications. We have demonstrated that the proposed algorithms reduce the complexity without performance degradation.

Keywords and phrases: particle filters, resampling, computational complexity, sequential implementation.

1. INTRODUCTION critical for practical implementations. The performance of the algorithms is analyzed when they are executed on a dig- Particle filters (PFs) are very suitable for nonlinear and/or ital signal processor (DSP) and specially designed hardware. non-Gaussian applications. In their operation, the main Note that resampling is the only PF step that does not de- principle is recursive generation of random measures, which pend on the application or the state-space model. Therefore, approximate the distributions of the unknowns. The ran- the analysis and the algorithms for resampling are general. dom measures are composed of particles (samples) drawn From an algorithmic standpoint, the main challenges in- from relevant distributions and of importance weights of clude development of algorithms for resampling that are the particles. These random measures allow for computation suitable for applications requiring temporal concurrency.1 of all sorts of estimates of the unknowns, including mini- A possibility of overlapping PF operations is considered be- mum mean square error (MMSE) and maximum a posteriori cause it directly affects hardware performance, that is, it in- (MAP) estimates. As new observations become available, the creases speed and reduces memory access. We investigate se- particles and the weights are propagated by exploiting Bayes quential resampling algorithms and analyze their computa- theorem and the concept of sequential importance sampling tional complexity metrics including the number of opera- [1, 2]. tionsaswellastheclassandtypeofoperationbyperform- The main goals of this paper are the development of re- ing behavioral profiling [3]. We do not consider fixed point sampling methods that allow for increased speeds of PFs, that require less memory, that achieve fixed timings regardless of the statistics of the particles, and that are computationally 1Temporal concurrency quantifies the expected number of operations less complex. 
Development of such algorithms is extremely important for high-speed implementations in which the basic PF operations are simultaneously executed, that is, are overlapped in time. A related issue is fixed-precision arithmetic, where a hardware solution of resampling suitable for fixed-precision implementation has already been presented [4].

The analysis in this paper is related to the sample importance resampling (SIR) type of PFs. However, the analysis can easily be extended to any PF that performs resampling, for instance, the auxiliary SIR (ASIR) filter. First, in Section 2 we provide a brief review of the resampling operation. We then consider random and deterministic resampling algorithms as well as their combinations. The main feature of the random resampling algorithm, referred to as residual systematic resampling (RSR) and described in Section 3, is that it performs resampling in fixed time that does not depend on the number of particles at the output of the resampling procedure. The deterministic algorithms, discussed in Section 4, are threshold-based algorithms, where particles with moderate weights are not resampled. Thereby, significant savings can be achieved in computations and in the number of times the memories are accessed. We show two characteristic types of deterministic algorithms: a low-complexity algorithm and an algorithm that allows for overlapping of the resampling operation with the particle generation and weight computation. The performance and complexity analysis are presented in Section 5, and the summary of our contributions is outlined in Section 6.

2. OVERVIEW OF RESAMPLING IN PFs

PFs are used for tracking states of dynamic state-space models described by the set of equations

    x_n = f(x_{n-1}) + u_n,
    y_n = g(x_n) + v_n,                                            (1)

where x_n is an evolving state vector of interest, y_n is a vector of observations, u_n and v_n are independent noise vectors with known distributions, and f(·) and g(·) are known functions. The most common objective is to estimate x_n as it evolves in time.

PFs accomplish tracking of x_n by updating a random measure {x_{1:n}^{(m)}, w_n^{(m)}}_{m=1}^{M}, which is composed of M particles x_n^{(m)} and their weights w_n^{(m)} defined at time instant n, recursively in time [5, 6, 7]. (The notation x_{1:n} signifies x_{1:n} = {x_1, x_2, ..., x_n}.) The random measure approximates the posterior density of the unknown trajectory x_{1:n}, p(x_{1:n} | y_{1:n}), where y_{1:n} is the set of observations.

In the implementation of PFs, there are three important operations: particle generation, weight computation, and resampling. Resampling is a critical operation in particle filtering because with time, a small number of weights dominate the remaining weights, thereby leading to a poor approximation of the posterior density and consequently to inferior estimates. With resampling, the particles with large weights are replicated and the ones with negligible weights are removed. After resampling, the future particles are more concentrated in domains of higher posterior probability, which entails improved estimates.

The PF operations are performed according to:

(1) generation of particles (samples) x_n^{(m)} ~ π(x_n | x_{n-1}^{(i_{n-1}^{(m)})}, y_{1:n}), where π(x_n | x_{n-1}^{(i_{n-1}^{(m)})}, y_{1:n}) is an importance density and i_{n-1}^{(m)} is an array of indexes, which shows that the particle m should be reallocated to the position i_n^{(m)};

(2) computation of weights by

    w_n^{*(m)} = w_{n-1}^{(i_{n-1}^{(m)})} · [ p(y_n | x_n^{(m)}) p(x_n^{(m)} | x_{n-1}^{(i_{n-1}^{(m)})}) ] / [ a_{n-1}^{(i_{n-1}^{(m)})} π(x_n^{(m)} | x_{n-1}^{(i_{n-1}^{(m)})}, y_{1:n}) ],    (2)

followed by normalization w_n^{(m)} = w_n^{*(m)} / Σ_{j=1}^{M} w_n^{*(j)};

(3) resampling i_n^{(m)} ~ a_n^{(m)}, where a_n^{(m)} is a suitable resampling function whose support is defined by the particle x_n^{(m)} [8].

The above representation of the PF algorithm provides a certain level of generality. For example, the SIR filter with stratified resampling is implemented by choosing a_n^{(m)} = w_n^{(m)} for m = 1, ..., M. When a_n^{(m)} = 1/M, there is no resampling and i_n^{(m)} = m. The ASIR filter can be implemented by setting a_n^{(m)} = w_n^{(m)} p(y_{n+1} | μ_{n+1}^{(m)}) and π(x_n) = p(x_n | x_{n-1}^{(m)}), where μ_n^{(m)} is the mean, the mode, or some other likely value associated with the density p(x_n | x_{n-1}^{(m)}).

3. RESIDUAL SYSTEMATIC RESAMPLING ALGORITHM

In this section, we consider stratified random resampling algorithms, where a_n^{(m)} = w_n^{(m)} [9, 10, 11]. Standard algorithms used for random resampling are different variants of stratified sampling such as residual resampling (RR) [12], branching corrections [13], and systematic resampling (SR) [6]. SR is the most commonly used since it is the fastest resampling algorithm for computer simulations.

We propose a new resampling algorithm which is based on stratified resampling, and we refer to it as RSR [14]. Similar to RR, RSR calculates the number of times each particle is replicated, except that it avoids the second iteration of RR in which the residual particles need to be resampled. Recall that in RR, the number of replications of a specific particle is determined in the first loop by truncating the product of the number of particles and the particle weight. In RSR instead, the updated uniform random number is formed in a different fashion, which allows for only one iteration loop and a processing time that is independent of the distribution of the weights at the input. The RSR algorithm for N input and M output (resampled) particles is summarized by Algorithm 1.

Figure 1 graphically illustrates the SR and RSR methods for the case of N = M = 5 particles with the weights given in Table 1. SR calculates the cumulative sum of the weights C^{(m)} = Σ_{i=1}^{m} w_n^{(i)} and compares C^{(m)} with the updated uniform number U^{(m)} for m = 1, ..., N.
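As an illustrative rendering of steps (1)-(3) above, the following Python sketch performs one SIR iteration for an assumed scalar random-walk instance of model (1). The model, the noise parameters, and the function name `sir_step` are our own choices for the example, and plain multinomial resampling stands in for the resampling function a_n studied in this paper.

```python
import math
import random

def sir_step(particles, weights, y, sigma_u=1.0, sigma_v=1.0):
    """One SIR iteration for an assumed 1-D model x_n = x_{n-1} + u_n,
    y_n = x_n + v_n with Gaussian noises. The prior is used as the
    importance density, so the weight update reduces to the likelihood."""
    M = len(particles)
    # (1) particle generation from pi(x_n | x_{n-1}) = p(x_n | x_{n-1})
    particles = [x + random.gauss(0.0, sigma_u) for x in particles]
    # (2) weight computation w* ~ w_{n-1} p(y_n | x_n), then normalization
    w = [wm * math.exp(-0.5 * ((y - x) / sigma_v) ** 2)
         for x, wm in zip(particles, weights)]
    s = sum(w)
    w = [wm / s for wm in w]
    # (3) resampling (multinomial here for brevity; the paper studies
    # cheaper alternatives such as SR and RSR)
    particles = random.choices(particles, weights=w, k=M)
    return particles, [1.0 / M] * M
```

After resampling, all weights are reset to 1/M, which matches the SIR case a_n^{(m)} = w_n^{(m)} described above.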

The uniform number U^{(0)} is generated by drawing from the uniform distribution U[0, 1/M] and is updated by U^{(m)} = U^{(m-1)} + 1/M. The number of replications for particle m is determined as the number of times the updated uniform number falls in the range [C^{(m-1)}, C^{(m)}). For particle 1, U^{(0)} and U^{(1)} belong to the range [0, C^{(1)}), so that this particle is replicated twice, which is shown with the two arrows that correspond to particle 1. Particles 2 and 3 are replicated once. Particle 4 is discarded (i^{(4)} = 0) because no U^{(m)} for m = 1, ..., N appears in the range [C^{(3)}, C^{(4)}).

Purpose: generation of an array of indexes {i}_{1}^{N} at time instant n, n > 0.
Input: an array of weights {w_n}_{1}^{N}; input and output numbers of particles, N and M, respectively.
Method:
  (i) = RSR(N, M, w)
  Generate a random number ΔU^{(0)} ~ U[0, 1/M]
  for m = 1 to N
      i^{(m)} = ⌊(w_n^{(m)} − ΔU^{(m−1)}) · M⌋ + 1
      ΔU^{(m)} = ΔU^{(m−1)} + i^{(m)}/M − w_n^{(m)}
  end

Algorithm 1: Residual systematic resampling algorithm.

The RSR algorithm draws the uniform random number U^{(0)} = ΔU^{(0)} in the same way but updates it by ΔU^{(m)} = ΔU^{(m−1)} + i^{(m)}/M − w_n^{(m)}. In the figure, we display both U^{(m)} = ΔU^{(m−1)} + i^{(m)}/M and ΔU^{(m)} = U^{(m)} − w_n^{(m)}. Here, the uniform number is updated with reference to the origin of the currently considered weight, while in SR, it is propagated with reference to the origin of the coordinate system. The difference ΔU^{(m)} between the updated uniform number and the current weight is propagated. Figure 1 shows that i^{(1)} = 2 and that ΔU^{(1)} is calculated and then used as the initial uniform random number for particle 2. Particle 4 is discarded because ΔU^{(3)} = U^{(4)} > w^{(4)}, so that ⌊(w^{(4)} − ΔU^{(3)}) · M⌋ = −1 and i^{(4)} = 0. If we compare ΔU^{(1)} with the relative position of U^{(2)} and C^{(1)} in SR, ΔU^{(2)} in RSR with the relative position of U^{(3)} and C^{(2)} in SR, and so on, we see that they are equal. Therefore, SR and RSR produce identical resampling results.

3.1. Particle allocation and memory usage

We call particle allocation the way in which particles are placed in their new memory locations as a result of resampling. With proper allocation, we want to reduce the number of memory accesses and the size of the state memory. The allocation is performed through index addressing, and its execution can be overlapped in time with the particle generation step. In Figure 2, three different outputs of resampling for the input weights from Figure 1 are considered. In Figure 2a, the indexes represent positions of the replicated particles. For example, i^{(2)} = 1 means that particle 1 replaces particle 2. Particle allocation is easily overlapped with particle generation using x^{(m)} = x^{(i^{(m)})} for m = 1, ..., M, where {x^{(m)}}_{m=1}^{M} is the set of resampled particles. The randomness of the resampling output makes it difficult to realize in-place storage, so that additional temporary memory for storing the resampled particles x^{(m)} is necessary. In Figure 2a, particle 1 is replicated twice and occupies the locations of particles 1 and 2. Particle 2 is replicated once and must be stored in the memory of x^{(m)} or it would be rewritten. We refer to this method as particle allocation with index addressing.

In Figure 2b, the indexes represent the number of times each particle is replicated. For example, i^{(1)} = 2 means that particle 1 is replicated twice. We refer to this method as particle allocation with replication factors. This method still requires additional memory for particles and memory for storing indexes.

The additional memory for storing the particles x^{(m)} is not necessary if the particles are replicated to the positions of the discarded particles. We call this method particle allocation with arranged indexes of positions and replication factors (Figure 2c). Here, the addresses of both replicated particles and discarded particles, as well as the number of times they are replicated (replication factor), are stored. The indexes are arranged in a way such that the replicated particles are placed in the upper and the discarded particles in the lower part of the index memory. In Figure 2c, the replicated particles take the addresses 1-4 and the discarded particle is on the address 5. When one knows in advance the addresses of the discarded particles, there is no need for additional memory for storing the resampled particles x^{(m)} because the new particles are placed on the addresses occupied by the particles that are discarded. This is useful for PFs applied to multidimensional models since it avoids the need for excessive memory for storing the temporary particles.

For the RSR method, it is natural to use particle allocation with replication factors and arranged indexes because RSR produces replication factors. In the particle generation step, a for loop whose number of iterations corresponds to the replication factor is used for each replicated particle. The difference between the SR and the RSR methods is in the way the inner loop of the resampling step for SR, and of the particle generation step for RSR, is performed. Since the number of replicated particles is random, the while loop in SR has an unspecified number of operations. To allow for an unspecified number of iterations, complicated control structures in hardware are needed [15]. The main advantage of our approach is that the while loop of SR is replaced with a for loop with a known number of iterations.

4. DETERMINISTIC RESAMPLING

4.1. Overview

In the literature, threshold-based resampling algorithms are based on the combination of RR and rejection control, and they result in nondeterministic timing and increased complexity [8, 16]. Here, we develop threshold-based algorithms whose purpose is to reduce complexity and processing time. We refer to these methods as partial resampling (PR) because only a part of the particles is resampled.

In PR, the particles are grouped in two separate classes: one composed of particles with moderate weights and another with dominating and negligible weights.

Figure 1: (a) Systematic and (b) residual systematic resampling for an example with M = 5 particles.
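To make the SR-RSR comparison concrete, the following Python functions are an illustrative reimplementation (not the authors' code) that return the replication counts i^{(m)}. On the weights of Table 1 and a common initial uniform number, both return the counts (2, 1, 1, 0, 1) listed in Table 1, in line with the claim that SR and RSR produce identical resampling results.

```python
import math
import random

def systematic_resampling(w, M, u0=None):
    """SR: count how many of the stratified uniforms U^(k) = u0 + k/M
    fall into each cumulative-weight interval [C^(m-1), C^(m))."""
    N = len(w)
    u0 = random.uniform(0.0, 1.0 / M) if u0 is None else u0
    counts = [0] * N
    C = 0.0          # cumulative sum of weights C^(m)
    k = 0            # index of the next uniform number
    for m in range(N):
        C += w[m]
        while k < M and u0 + k / M < C:
            counts[m] += 1
            k += 1
    return counts

def residual_systematic_resampling(w, M, u0=None):
    """RSR (Algorithm 1): a single loop; dU is propagated relative to
    the origin of the current weight, not the coordinate origin."""
    dU = random.uniform(0.0, 1.0 / M) if u0 is None else u0
    counts = []
    for wm in w:
        i = math.floor((wm - dU) * M) + 1   # replication count i^(m)
        dU += i / M - wm                    # update for the next particle
        counts.append(i)
    return counts
```

Note that the RSR loop has a fixed number of iterations (one per input particle), whereas the inner while loop of SR runs a data-dependent number of times, which is exactly the implementation difference discussed in the text.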

The particles with moderate weights are not resampled, whereas the negligible and dominating particles are resampled. It is clear that, on average, resampling would be performed much faster because the particles with moderate weights are not resampled. We propose several PR algorithms which differ in the resampling function.

Table 1: Weights of particles.

    m    w^(m)    i^(m)
    1    7/20     2
    2    6/20     1
    3    2/20     1
    4    2/20     0
    5    3/20     1

Figure 2: Types of memory usage: (a) indexes are positions of the replicated particles, (b) indexes are replication factors, and (c) indexes are arranged as positions and replication factors.

4.2. Partial resampling: suboptimal algorithms

PR could be seen as a way of a partial correction of the variance of the weights at each time instant. PR methods consist of two steps: one in which the particles are classified as dominating, negligible, or moderate, and one in which the actual resampling is carried out. Let T_h and T_l be high and low thresholds on the weights, with 0 < T_l < 1/M < T_h < 1. We denote by N_h the number of particles with dominating weights (w_n^{(m)} > T_h), by N_l the number of particles with negligible weights (w_n^{(m)} < T_l), and by W_h = Σ w_n^{(m)} over w_n^{(m)} > T_h and W_l = Σ_{m=1}^{N_l} w_n^{(m)} over w_n^{(m)} < T_l the corresponding sums of weights.

The resampling function of the first PR algorithm (PR1) is shown in Figure 3a, and it corresponds to the stratified resampling case. The number of particles at the input and at the output of the resampling procedure is the same and equal to N_h + N_l. The resampling function is given by

    a_n^{(m)} = { w_n^{(m)},                           for w_n^{(m)} > T_h or w_n^{(m)} < T_l,
                { (1 − W_h − W_l)/(M − N_h − N_l),     otherwise.                                (3)

The second PR algorithm (PR2) is shown in Figure 3b.


Figure 3: Resampling functions for the PR algorithms (a) PR1, (b) PR2, and (c) PR3.

In PR2, particles with negligible weights are not used in the resampling procedure; particles with dominating weights replace those with negligible weights with certainty. The resampling function is given as

    a_n^{(m)} = { w_n^{(m)},                           for w_n^{(m)} > T_h,
                { (1 − W_h − W_l)/(M − N_h − N_l),     for T_l < w_n^{(m)} ≤ T_h,
                { 0,                                    otherwise.                               (4)

There are only N_h input particles, and N_h + N_l particles are produced at the output of the resampling procedure.

The third PR algorithm (PR3) is shown in Figure 3c. The weights of all the particles above the threshold T_h are scaled with the same number, so PR3 is a deterministic algorithm whose resampling function is given as

    a_n^{(m)} = { (N_h + N_l)/M,    for w_n^{(m)} > T_h,
                { 1/M,              for T_l < w_n^{(m)} ≤ T_h,
                { 0,                otherwise.                                                   (5)

The dominating particles are replicated r = ⌊N_l/N_h⌋ + 1 times. The weights are calculated as w^{*(m)} = w^{(m)}, where m represents positions of particles with moderate weights, and as w^{*(l)} = w^{(m)}/r + W_l/(N_h + N_l), where m are positions of particles with dominating weights and l positions of particles with both dominating and negligible weights.
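To make the bookkeeping concrete, here is an illustrative Python sketch of PR3-style partial resampling; it is our own reconstruction with hypothetical names, not the authors' implementation. Each dominating particle is replicated r = ⌊N_l/N_h⌋ + 1 times onto the addresses of dominating and negligible particles, moderate particles are untouched, and the replicated weights are adjusted as in the text. The paper does not spell out the corner case in which r·N_h exceeds the number of available addresses, so the sketch simply stops replicating when the addresses run out.

```python
def pr3_resample(w, T_h, T_l):
    """Deterministic partial resampling (PR3-style sketch).

    Returns (idx, new_w): idx[k] is the index of the particle stored at
    address k after resampling; moderate particles keep their weights."""
    M = len(w)
    dom = [m for m, wm in enumerate(w) if wm > T_h]   # dominating particles
    neg = [m for m, wm in enumerate(w) if wm < T_l]   # negligible particles
    idx, new_w = list(range(M)), list(w)
    if not dom:
        return idx, new_w                  # nothing to resample
    r = len(neg) // len(dom) + 1           # replications per dominating particle
    W_l = sum(w[m] for m in neg)           # total negligible weight
    slots = dom + neg                      # addresses that get overwritten
    k = 0
    for m in dom:                          # replicate each dominating particle
        for _ in range(r):
            if k == len(slots):
                break
            idx[slots[k]] = m
            new_w[slots[k]] = w[m] / r + W_l / len(slots)
            k += 1
    return idx, new_w
```

With the weight redistribution above, the total weight is preserved, so no renormalization pass over all M particles is needed.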


Figure 4: OPR method combined with the PR3 method used for final computation of weights and replication factors.

Another way of performing PR is to use a set of thresholds. The idea is to perform an initial classification of the particles while the weights are computed, and then to carry out the actual resampling together with the particle generation step. So, the resampling consists of two steps as in the PR2 algorithm, where the classification of the particles is overlapped with the weight computation. We refer to this method as overlapped PR (OPR).

A problem with the classification of the particles is the necessity of knowing the overall sum of the nonnormalized weights in advance. The problem can be resolved as follows. The particles are partitioned into groups according to their weights. The thresholds for group i are defined as (T_{i−1}, T_i] for i = 1, ..., K, where K is the number of groups and T_{i−1} < T_i.

In the worst case of the PR1 algorithm, all the particles must be resampled, thereby implying that there cannot be improvements from an implementation standpoint. The main purpose of the PR2 algorithm is to improve the worst-case timing of the PR1 algorithm. Here, only the N_h dominating particles are resampled. So, the input number of particles of the resampling procedure is N_h, while the output number of particles is N_h + N_l. If the RSR algorithm is used for resampling, then the complexity of the second step is O(N_h).

PR1 and PR2 contain two loops, and their timings depend on the weight statistics. As such, they do not have advantages for real-time implementation in comparison with RSR, which has only one loop of M iterations and whose processing time does not depend on the distribution of the weights.


Figure 5: Performance of the PR3 algorithm for different threshold values applied to joint detection and estimation.

Figure 6: Comparison of the PR2, PR3, and OPR algorithms with SR applied to the joint detection and estimation problem.

The OPR classification can be made more efficient if it is implemented using structures similar to linked lists, where an element in a group contains two fields: a field with the address of the particle and a field that points to the next element on the list. Thus, dynamic allocation requires memory with a capacity of 2M words. As expected, overlapping increases the required resources.

5. PARTICLE FILTERING PERFORMANCE AND COMPLEXITY

5.1. Performance analysis

The proposed resampling algorithms are applied, and their performance is evaluated, for the joint detection and estimation problem in communications [18, 19] and for the bearings-only tracking problem [7].

5.1.1. Joint detection and estimation

The experiment considered a Rayleigh fading channel with additive Gaussian noise and a differentially encoded BPSK modulation scheme. The detector was implemented for a channel with a normalized Doppler spread given by B_d = 0.01, which corresponds to fast fading. An AR(3) process was used to model the channel. The AR coefficients were obtained from the method suggested in [20]. The proposed detectors were compared with the clairvoyant detector, which performs matched filtering and detection assuming that the channel is known exactly by the receiver. The number of particles was N = 1000.

In Figure 5, the bit error rate (BER) versus signal-to-noise ratio (SNR) is depicted for the PR3 algorithm with different sets of thresholds, that is, T_h = {2/M, 5/M, 10/M} and T_l = {1/(2M), 1/(5M), 1/(10M)}. In the figure, the PR3 algorithm with the thresholds 2/M and 1/(2M) is denoted as PR3(2), the one with thresholds 5/M and 1/(5M) as PR3(5), and so on. The BERs for the matched filter (MF) and for the case when SR is performed are shown as well. It is observed that the BER is similar for all types of resampling. However, the best results are obtained when the thresholds 2/M and 1/(2M) are used. Here, the effective number of particles is the largest in comparison with the PR3 algorithm with greater T_h and smaller T_l. This is a logical result because, with these thresholds, all the particles are concentrated in the narrower area between the two thresholds, producing in this way a larger effective sample size. PR3 with thresholds 2/M and 1/(2M) slightly outperforms the SR algorithm, which is a bit surprising. The reason for this could be that the particles with moderate weights are not unnecessarily resampled in the PR3 algorithm. The same result is obtained even with different values of the Doppler spread.

In Figure 6, the BER versus SNR is shown for different resampling algorithms: PR2, PR3, OPR, and SR. The thresholds used for PR2 and PR3 are 2/M and 1/(2M). The OPR uses K = 24 groups and thresholds which are powers of two. Again, all the results are comparable. The OPR and PR2 algorithms slightly outperform the other algorithms.

5.1.2. Bearings-only tracking

We tested the performance of PFs by applying the resampling algorithms to bearings-only tracking [7] with different initial conditions. In the experiment, PR2 and PR3 are used with two sets of threshold values, that is, T_h ∈ {2/M, 10/M} and T_l ∈ {1/(2M), 1/(10M)}. In Figure 7, we show the number of times the track is lost versus the number of particles, for the two different pairs of thresholds. We consider that the track is lost if all the particles have zero weights. In the figure, the PR3 algorithm with thresholds 2/M and 1/(2M) is denoted as PR3(2), and the one with 10/M and 1/(10M) as PR3(10). The algorithms used are SR, SR performed after every fifth observation, PR2, and PR3. The resampling algorithms again show similar performances. The best results for PR2 and PR3 are obtained when the thresholds 10/M and 1/(10M) are used.
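Since the OPR thresholds are powers of two, group membership can be read directly off a weight's binary exponent instead of being found by comparing against K thresholds. The following Python sketch, with hypothetical names of our own (the paper does not prescribe this code), classifies a nonnormalized weight into one of K groups:

```python
import math

def opr_group(w, K=24):
    """Return the OPR group of weight w, where group i (0-based) holds
    weights in [2**-(i+1), 2**-i). Because the thresholds are powers of
    two, the group index comes from the binary exponent in O(1)."""
    if w <= 0.0:
        return K - 1                     # treat zero weight as negligible
    _, exp = math.frexp(w)               # w = mant * 2**exp, mant in [0.5, 1)
    return min(max(-exp, 0), K - 1)      # clamp very large/small weights
```

This is one way the classification can be overlapped with weight computation at negligible extra cost per particle.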

Figure 7: Number of times the track is lost for the PR2, PR3, and SR algorithms applied to the bearings-only tracking problem.

Table 2: Comparison of the number of operations for different resampling algorithms.

    Operation          SR        RR     RSR    PR3
    Multiplications    0         N      N      0
    Additions          2M + N    6N     3N     2N
    Comparisons        N + M     3N     0      2N

5.2. Complexity analysis

The complexity of the proposed resampling algorithms is evaluated. We consider both computational complexity and memory requirements. We also present the benefits of the proposed algorithms when concurrency in hardware is exploited.

5.2.1. Computational complexity

In Table 2, we provide a comparison of the different resampling algorithms. The results for RR are obtained for the worst-case scenario. The complexity of the RR, RSR, and PR algorithms is O(N), and the complexity of the SR algorithm is O(max(N, M)), where N and M are the input and output numbers of particles of the resampling procedure. When the number of particles at the input of the resampling algorithm is equal to the number of particles at the output, the RR algorithm is by far the most complex. While the numbers of additions for the SR and RSR algorithms are the same, the RSR algorithm performs N multiplications. Since multiplication is more complex than addition, we can view SR as the less complex algorithm. However, when M is a power of two, so that the multiplications by M are avoided, the RSR algorithm is the least complex.

The resampling algorithms SR, RSR, and PR3 were implemented on the Texas Instruments (TI) floating-point DSP TMS320C67xx. Several steps of profiling brought about a five-fold speed-up when the number of resampled particles was 1000. The particle allocation step was not considered. The number of clock cycles per particle was around 18 for RSR and 4.1 for PR3. The SR algorithm does not have fixed timing; its mean duration was 24.125 cycles per particle with a standard deviation of 5.17. On the processor TMS320C6711C, whose cycle time is 5 nanoseconds, the processing of RSR with 1000 particles took 90 microseconds.

5.2.2. Memory requirements

In our analysis, we considered the memory requirement not only for resampling but for the complete PF. The memory size for the weights and the memory accesses during weight computation do not depend on the resampling algorithm. We consider particle allocation without indexes and with index addressing for the SR algorithm, and with arranged indexing for RSR, PR2, PR3, and OPR. For both particle allocation methods, the SR algorithm has to use two memories for storing particles. In Table 3, we can see the memory capacity for the RSR, PR2, and PR3 algorithms. The difference among these methods is only in the size of the index memory. For the RSR algorithm, which uses particle allocation with arranged indexes, the index memory has a size of 2M, where M words are used for storing the addresses of the particles that are replicated or discarded and the other M words represent the replication factors.

The number of resampled particles in the worst case of the PR2 algorithm corresponds to the number of particles in the RSR algorithm. Therefore, their index memories are of the same size. From an implementation standpoint, the most promising algorithm is PR3. It is the simplest one, and it requires the smallest memory. The replication factor of the dominating particles is the same, and that of the moderate particles is one. So, the size of the index memory of PR3 is M, and it requires only one additional bit to represent whether a particle is dominating or moderate.

The OPR algorithm needs the largest index memory. When all the PF steps are overlapped, it requires a different access pattern than the other deterministic algorithms. Due to possible overwriting of the indexes that are formed during the weight computation step with the ones that are read during particle generation, it is necessary to use two index-memory banks. Furthermore, particle generation and weight computation should access these memories alternately. Writing to the first memory is performed in the resampling step in one time instance, whereas in the next one, the same memory is used by particle generation for reading. The second memory bank is used alternately. If we compare the memory requirements of the OPR algorithm with those of the PR3 algorithm, it is clear that OPR requires four times more memory for storing indexes for resampling.

5.2.3. PF speed improvements

The PF sampling frequency can be increased in hardware by exploiting temporal concurrency. Since there are no data dependencies among the particles in the particle generation and weight computation, the operations of these two steps can be overlapped.

Table 3: Memory capacity for different resampling algorithms.

               SR without indexes   SR with indexes   RSR     PR2     PR3     OPR
    States          2·Ns·M               2·Ns·M       Ns·M    Ns·M    Ns·M    Ns·M
    Weights           M                    M           M       M       M       M
    Indexes           0                    M           2M      2M      M       4M

(Here, Ns denotes the dimension of the state vector.)
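As an illustration, Table 3 can be turned into a small calculator for the total PF memory (state, weight, and index words); the function name and the method labels below are our own:

```python
def memory_words(method, M, Ns):
    """Total PF memory in words per Table 3 for M particles and an
    Ns-dimensional state: state words + M weight words + index words."""
    state_words = {"SR_without_indexes": 2 * Ns * M,
                   "SR_with_indexes": 2 * Ns * M,
                   "RSR": Ns * M, "PR2": Ns * M,
                   "PR3": Ns * M, "OPR": Ns * M}
    index_words = {"SR_without_indexes": 0, "SR_with_indexes": M,
                   "RSR": 2 * M, "PR2": 2 * M, "PR3": M, "OPR": 4 * M}
    return state_words[method] + M + index_words[method]
```

For example, with M = 1000 particles and a four-dimensional state, RSR needs 4000 + 1000 + 2000 = 7000 words, versus 9000 for SR without indexes, which illustrates the saving from in-place allocation with arranged indexes.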


Figure 8: The timing of the PF (a) with the RSR or PR methods and (b) with the OPR method.

Furthermore, the number of memory accesses is reduced because during weight computation, the values of the states do not need to be read from the memory since they are already in the registers.

The normalization step requires the use of an additional loop of M iterations as well as M divisions per observation. It has been noted that normalization represents an unnecessary step which can be merged with the resampling and/or the computation of the importance weights. Avoiding normalization requires additional changes which depend on whether resampling is carried out at each time instant and on the type of resampling. For PFs which perform SR or RSR at each time instant, the uniform random number in the resampling algorithm should be drawn from [0, W_M/M) and updated with the step W_M/M, where W_M is the sum of the weights. Normalization in the PR methods can be avoided by including information about the sum W_M in the thresholds, that is, by using T_hn = T_h · W_M and T_ln = T_l · W_M. With this approach, the dynamic-range problems of fixed-precision arithmetic that usually appear with division are reduced. The computational burden is decreased as well because the number of divisions is reduced from M to 1.

The timing of the operations for a hardware implementation in which all the blocks are fine-grain pipelined is shown in Figure 8a. Here, the particle generation and weight calculation operations are overlapped in time and normalization is avoided. The symbol L is the constant hardware latency defined by the depth of pipelining in the particle generation and weight computation, T_clk is the clock period, M is the number of particles, and T is the minimum processing time of any of the basic PF operations. SR is not suitable for hardware implementations where fixed and minimal timings are required, because its processing time depends on the weight distribution and is longer than M·T_clk. So, in order to have the resampling operation performed in M clock cycles, the RSR or PR3 algorithms with particle allocation with arranged indexes must be used. The minimum PF sampling period that can be achieved is then 2M·T_clk + L.

OPR in combination with the PR3 algorithm allows for higher sampling frequencies. In OPR, the classification of the particles is overlapped with the weight calculation, as shown in Figure 8b. The symbol L_r is the constant latency of the part of the OPR algorithm that determines which group contains moderate, and which contains negligible and dominating, particles. The latency L_r is proportional to the number of OPR groups. The speed of the PF can almost be doubled if we consider a pipelined hardware implementation: in Figure 8b, it is evident that the PF processing time is reduced to M·T_clk + L + L_r.

5.3. Final remarks

We summarize the impact of the proposed resampling algorithms on the PF speed and memory requirements.

(1) The RSR is an improved RR algorithm with higher speed and fixed processing time. As such, besides hardware implementations, it is also a better algorithm for resampling executed on standard computers.

(2) Memory requirements are reduced. The number of memory accesses and the size of the memory are reduced when RSR or any of the PR algorithms are used for multidimensional state-space models. These methods can be appropriate for both hardware and DSP applications, where the available memory is limited. When the state-space model is one-dimensional, there is no purpose in adding an index memory and introducing more complex control; in this case, the SR algorithm is recommended.

(3) In a hardware implementation, and with the use of temporal concurrency, the PF sampling frequency can be considerably improved. The best results are achieved for the OPR algorithm, at the expense of hardware resources.

(4) The average amount of operations is reduced. This is true for PR1, PR2, and PR3 since they perform resampling on a smaller number of particles. This is desirable in PC simulations and some DSP applications.

6. CONCLUSION

Resampling is a critical step in the hardware implementation of PFs. We have identified design issues of resampling algorithms related to execution time and storage requirements. We have proposed new resampling algorithms whose processing time is not random and which are more suitable for hardware implementation. The new resampling algorithms reduce the number of operations and memory accesses or allow for overlapping the resampling step with the weight computation and particle generation. While these algorithms minimize performance degradation, their complexity is remarkably reduced. We have also provided a performance analysis of PFs that use our resampling algorithms when applied to joint detection and estimation in wireless communications and to bearings-only tracking. Even though the algorithms were developed with the aim of improving hardware implementations, they should also be considered as resampling methods in simulations on standard computers.

ACKNOWLEDGMENT

This work has been supported under the NSF Awards CCR-0082607 and CCR-0220011.

REFERENCES

[1] J. M. Bernardo and A. F. M. Smith, Bayesian Theory, John Wiley & Sons, New York, NY, USA, 1994.
[2] J. Geweke, "Antithetic acceleration of Monte Carlo integration in Bayesian inference," Journal of Econometrics, vol. 38, no. 1-2, pp. 73–89, 1988.
[3] A. Gerstlauer, R. Dömer, J. Peng, and D. Gajski, System Design: A Practical Guide with SpecC, Kluwer Academic Publishers, Boston, Mass, USA, 2001.
[4] S. Hong, M. Bolić, and P. M. Djurić, "An efficient fixed-point implementation of residual resampling scheme for high-speed particle filters," IEEE Signal Processing Letters, vol. 11, no. 5, pp. 482–485, 2004.
[5] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[6] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[7] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[8] J. S. Liu, R. Chen, and T. Logvinenko, "A theoretical framework for sequential importance sampling and resampling," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., pp. 225–246, Springer, New York, NY, USA, January 2001.
[9] C. Berzuini, N. G. Best, W. R. Gilks, and C. Larizza, "Dynamic conditional independence models and Markov chain Monte Carlo methods," Journal of the American Statistical Association, vol. 92, no. 440, pp. 1403–1412, 1997.
[10] A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputations and Bayesian missing data problems," Journal of the American Statistical Association, vol. 89, no. 425, pp. 278–288, 1994.
[11] J. S. Liu and R. Chen, "Blind deconvolution via sequential imputations," Journal of the American Statistical Association, vol. 90, no. 430, pp. 567–576, 1995.
[12] E. R. Beadle and P. M. Djurić, "A fast-weighted Bayesian bootstrap filter for nonlinear model state estimation," IEEE Trans. on Aerospace and Electronic Systems, vol. 33, no. 1, pp. 338–343, 1997.
[13] D. Crisan, P. Del Moral, and T. J. Lyons, "Discrete filtering using branching and interacting particle systems," Markov Processes and Related Fields, vol. 5, no. 3, pp. 293–318, 1999.
[14] M. Bolić, P. M. Djurić, and S. Hong, "New resampling algorithms for particle filters," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 2, pp. 589–592, Hong Kong, China, April 2003.
[15] K. Compton and S. Hauck, "Reconfigurable computing: a survey of systems and software," ACM Computing Surveys, vol. 34, no. 2, pp. 171–210, 2002.
[16] J. S. Liu, R. Chen, and W. H. Wong, "Rejection control and sequential importance sampling," Journal of the American Statistical Association, vol. 93, no. 443, pp. 1022–1031, 1998.
[17] M. Bolić, P. M. Djurić, and S. Hong, "Resampling algorithms and architectures for distributed particle filters," submitted to IEEE Trans. Signal Processing.
[18] R. Chen, X. Wang, and J. S. Liu, "Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[19] P. M. Djurić, J. H. Kotecha, J. Zhang, et al., "Particle filtering," IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, 2003.
[20] H. Zamiri-Jafarian and S. Pasupathy, "Adaptive MLSD receiver with identification of flat fading channels," in Proc. IEEE Vehicular Technology Conference, vol. 2, pp. 695–699, 1997.

Miodrag Bolić received the B.S. and M.S. degrees in electrical engineering and computer sciences (EECS) from the University of Belgrade, Yugoslavia, in 1996 and 2001, respectively. He is now a Ph.D. student at the State University of New York (SUNY), Stony Brook, USA, and he is working part-time for Symbol Technologies, Holtsville, NY. Before joining SUNY, he worked at the Institute of Nuclear Sciences Vinča, Yugoslavia. His research is related to the development and modification of statistical signal processing algorithms for more efficient hardware implementation and to VLSI architectures for digital signal processing.

Petar M. Djurić received his B.S. and M.S. degrees in electrical engineering from the University of Belgrade in 1981 and 1986, respectively, and his Ph.D. degree in electrical engineering from the University of Rhode Island in 1990. From 1981 to 1986, he was a Research Associate with the Institute of Nuclear Sciences, Vinča, Belgrade. Since 1990, he has been with the State University of New York (SUNY), Stony Brook, where he is a Professor in the Department of Electrical and Computer Engineering. He works in the area of statistical signal processing, and his primary interests are in the theory of modeling, detection, estimation, and time series analysis and its application to a wide variety of disciplines, including wireless communications and biomedicine.

Sangjin Hong received the B.S. and M.S. degrees in electrical engineering and computer sciences (EECS) from the University of California, Berkeley. He received his Ph.D. degree in EECS from the University of Michigan, Ann Arbor. He is currently with the Department of Electrical and Computer Engineering at the State University of New York (SUNY), Stony Brook. Before joining SUNY, he worked at Ford Aerospace Corp. Computer Systems Division as a Systems Engineer. He also worked at Samsung Electronics in Korea as a Technical Consultant. His current research interests are in the areas of low-power VLSI design of multimedia wireless communications and digital signal processing systems, reconfigurable SoC design and optimization, VLSI signal processing, and low-complexity digital circuits. Prof. Hong has served on numerous technical program committees for IEEE conferences. Prof. Hong is a Member of the IEEE.

EURASIP Journal on Applied Signal Processing 2004:15, 2278–2294
© 2004 Hindawi Publishing Corporation

A New Class of Particle Filters for Random Dynamic Systems with Unknown Statistics

Joaquín Míguez
Departamento de Electrónica e Sistemas, Universidade da Coruña, Facultade de Informática, Campus de Elviña s/n, 15071 A Coruña, Spain
Email: [email protected]

Mónica F. Bugallo
Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794-2350, USA
Email: [email protected]

Petar M. Djurić
Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794-2350, USA
Email: [email protected]

Received 4 May 2003; Revised 29 January 2004

In recent years, particle filtering has become a powerful tool for tracking signals and time-varying parameters of random dynamic systems. These methods require a mathematical representation of the dynamics of the system evolution, together with assumptions of probabilistic models. In this paper, we present a new class of particle filtering methods that do not assume explicit mathematical forms of the probability distributions of the noise in the system. As a consequence, the proposed techniques are simpler, more robust, and more flexible than standard particle filters. Apart from the theoretical development of specific methods in the new class, we provide computer simulation results that demonstrate the performance of the algorithms in the problem of autonomous positioning of a vehicle in a 2-dimensional space.

Keywords and phrases: particle filtering, dynamic systems, online estimation, stochastic optimization.

1. INTRODUCTION

Many problems in signal processing can be stated in terms of the estimation of an unobserved discrete-time random signal in a dynamic system of the form

    x_t = f_x(x_{t-1}) + u_t,    t = 1, 2, ...,    (1)
    y_t = f_y(x_t) + v_t,        t = 1, 2, ...,    (2)

where

(a) x_t ∈ R^{L_x} is the signal of interest, which represents the system state at time t;
(b) f_x : R^{L_x} → I_x ⊆ R^{L_x} is a (possibly nonlinear) state transition function;
(c) u_t ∈ R^{L_x} is the state perturbation or system noise at time t;
(d) y_t ∈ R^{L_y} is the vector of observations collected at time t, which depends on the system state;
(e) f_y : R^{L_x} → I_y ⊆ R^{L_y} is a (possibly nonlinear) transformation of the state;
(f) v_t ∈ R^{L_y} is the observation noise vector at time t, assumed statistically independent of the system noise u_t.

Equation (1) describes the dynamics of the system state vector and, hence, is usually termed the state equation or system equation, whereas (2) is commonly referred to as the observation equation or measurement equation. It is convenient to distinguish the structure of the dynamic system, defined by the functions f_x and f_y, from the associated probabilistic model, which depends on the probability distribution of the noise signals and the a priori distribution of the state, that is, the statistics of x_0.

We denote the a priori probability density function (pdf) of a random signal s as p(s). If the signal s is statistically dependent on some observation z, then we write the conditional (or a posteriori) pdf as p(s|z). From the Bayesian point of view, all the information relevant for the estimation of the state at time t is contained in the so-called filtering pdf, that is, the a posteriori density of the system state given the observations up to time t,

    p(x_t | y_{1:t}),    (3)

where y_{1:t} = {y_1, ..., y_t}. The density (3) usually involves a multidimensional integral which does not have a closed-form solution for an arbitrary choice of the system structure and the probabilistic model. Indeed, analytical solutions can only be obtained for particular setups. The most classical example occurs when f_x and f_y are linear functions and the noise processes are Gaussian with known parameters. Then the filtering pdf of x_t is itself Gaussian, with mean m_t and covariance matrix C_t, which we denote as

    p(x_t | y_{1:t}) = N(m_t, C_t),    (4)

where the posterior parameters m_t and C_t can be recursively computed, as time evolves, using the elegant algorithm known as the Kalman filter (KF) [1]. Unfortunately, the assumptions of linearity and Gaussianity do not hold for most real-world problems. Although modified Kalman-like solutions that account for general nonlinear and non-Gaussian settings have been proposed, including the extended KF (EKF) [2] and the unscented KF (UKF) [3], such techniques are based on simplifications of the system dynamics and suffer from severe degradation when the true dynamic system departs from the linear and Gaussian assumptions.

Since general analytical solutions are intractable, Bayesian estimation in nonlinear, non-Gaussian systems must be addressed using numerical techniques. Deterministic approaches, such as classical numerical integration procedures, turn out ineffective or too demanding except for very low-dimensional systems and, as a consequence, methods based on the Monte Carlo methodology have progressively gained momentum. Monte Carlo methods are simulation-based techniques aimed at estimating the a posteriori pdf of the state signal given the available observations. The pdf estimate consists of a random grid of weighted points in the state space R^{L_x}. These points, usually termed particles, are Monte Carlo samples of the system state that are assigned nonnegative weights, which can be interpreted as probabilities of the particles.

The collection of particles and their weights yield an empirical measure which approximates the continuous a posteriori pdf of the system state [4]. The recursive update of this measure whenever a new observation is available is known as particle filtering (PF). Although there are other popular Monte Carlo methods based on the idea of producing empirical measures with random support, for example, Markov chain Monte Carlo (MCMC) techniques [5], PF algorithms have recently received a lot of attention because they are particularly suitable for real-time estimation. The sequential importance sampling (SIS) algorithm [6, 7] and the bootstrap filter (BF) [8, 9] are the most successful members of the PF class of methods [10]. Existing PF techniques rely on

(i) the knowledge of the probabilistic model of the dynamic system (1)-(2), which includes the densities p(x_0), p(u_t), and p(v_t);
(ii) the ability to numerically evaluate the likelihood p(y_t|x_t) and to sample from the transition density p(x_t|x_{t-1}).

Therefore, the practical performance of PF algorithms in real-world problems heavily depends on how accurate the underlying probabilistic model of choice is. Although this may seem irrelevant in engineering problems where it is relatively straightforward to associate the observed signals with realistic and convenient probability distributions, in many situations this is not the case. Many times, it is very hard to find an adequate model using the information available in practice. On other occasions, the working models obtained after a lot of effort (involving, e.g., time-series analysis techniques) are so complicated that they render any subsequent signal processing algorithm impractical due to its high complexity.

In this paper, we introduce a new PF approach to deal with uncertainty in the probabilistic modeling of the dynamic system (1)-(2). We start with the requirement that the ultimate objective of PF is to yield an estimate of the signals of interest x_{0:t}, given the observations y_{1:t}. If a suitable probabilistic model is at hand, good signal estimates can be computed from the filtering pdf (3) induced by the model, and a PF algorithm can be employed to recursively build up a random grid that approximates the posterior distribution. However, it is often possible to use signal estimation methods that do not explicitly rely on the a posteriori pdf; for example, blind detection in digital communications can be performed according to several criteria, such as the constrained minimization of the received signal power [11] or the constant modulus method [12, 13]. Such approaches are very popular because they are based on a simple figure of merit, and this simplicity leads to robust and easy-to-design algorithms.

The same advantages in robustness and easy design can be gained in PF whenever a complex (yet possibly not strongly tied to physical reality) probabilistic model can be substituted by a simpler reference for signal estimation. Contrary to initial intuition, estimation techniques based on ad hoc, heuristic, or, simply, alternative references, different from the state posterior distribution, are not precluded by the use of the PF methodology. It is important to realize that the procedure for sequential build-up of the random grid is not tied to the concept of a posteriori pdf. We will show that, by simply specifying a stochastic mechanism for generating particles, the PF methodology can be successfully used to build an approximation of any function of the system state that admits a recursive decomposition. Specifically, we propose a new family of PF algorithms in which the statistical reference of the a posteriori state pdf is substituted by a user-defined cost function that measures the quality of the state signal estimates according to the available observations. Hence, methods within the new class are termed cost-reference particle filters (CRPFs), in contrast to conventional statistical-reference particle filters (SRPFs).
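To make the generic model (1)-(2) concrete, a minimal simulation sketch follows. The scalar functions f_x and f_y, the noise scales, and the horizon are illustrative assumptions for the example, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar instance of the generic model (1)-(2); f_x, f_y and the
# noise scales below are illustrative assumptions.
def f_x(x):
    # (possibly nonlinear) state transition
    return 0.9 * x + 0.1 * np.sin(x)

def f_y(x):
    # (possibly nonlinear) observation transformation
    return x + 0.5 * x ** 2

T = 50
x = np.zeros(T + 1)
y = np.zeros(T + 1)
x[0] = rng.uniform(-1.0, 1.0)  # initial state drawn from a bounded interval
for t in range(1, T + 1):
    x[t] = f_x(x[t - 1]) + 0.1 * rng.standard_normal()  # system noise u_t
    y[t] = f_y(x[t]) + 0.05 * rng.standard_normal()     # observation noise v_t
```

Only the observations y[1:] would be available to a filter; the trajectory x is what the methods discussed below try to recover.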

As long as a recursive decomposition of the cost function is found, a PF algorithm, similar to the SIS and bootstrap methods, can be used to construct a random-grid approximation of the cost function in the vicinity of its minima. For this reason, CRPFs yield local representations of the cost function specifically built to facilitate the computation of minimum-cost estimates of the state signal.

The remainder of this paper is organized as follows. The fundamentals of the CRPF family are introduced in Section 2. This includes three basic building blocks: the cost and risk functions, which provide a measure of the quality of the particles, and the stochastic mechanism for particle generation and sequential update of the random grid. Since the usual tools for PF algorithm design (e.g., proposal distributions, auxiliary variables, etc.) do not necessarily extend to the new framework, this section also contains a discussion of design issues. In particular, we identify the factors on which the choice of the cost and risk functions will usually depend, and derive consequent design guidelines, including a useful choice of parameters that leads to a simple interpretation of the algorithm and its connection with the theory of stochastic approximation (SA) [14].

Due to the change in the reference, convergence results regarding SRPFs are not valid for CRPFs. Section 3 is devoted to the problem of identifying sufficient conditions for asymptotically optimal propagation (AOP) of particles. The stochastic procedure for drawing new samples of the state signal and propagating the existing particles using the new samples is the key to the convergence of the algorithm. We term this particle propagation step asymptotically optimal when the increment in the average cost of the particles in the filter after propagation is minimal. A set of sufficient conditions for optimal propagation, related to the properties of the sampling and weighting methods, is provided.

Section 4 is devoted to the discussion of resampling in the new family of PF techniques. We argue that the objective of this important algorithmic step is different from its usual role in conventional PF algorithms, and exploit this difference to propose a local resampling scheme suitable for a straightforward implementation using parallel VLSI hardware (note that resampling is a major bottleneck for the parallel implementation of PF methods [15]).

Computer simulation results that illustrate the validity of our approach are presented in Section 5. In particular, we tackle the problem of positioning a vehicle that moves in a 2-dimensional space. An instance of the proposed CRPF class of methods that employs a simple cost function is compared with the standard auxiliary BF [9] technique. Finally, Section 6 contains conclusions.

2. COST-REFERENCE PARTICLE FILTERING

The basics of the new family of PF methods are introduced in this section. We start with a general description of the CRPF technique, where key concepts, namely, the cost and risk functions, particle propagation, and particle selection, are introduced. The second part of the section is devoted to practical design issues. We suggest guidelines for the design of CRPFs and propose a simple choice of the algorithm parameters that leads to a straightforward interpretation of the CRPF technique.

2.1. Sequential algorithm

The ultimate aim of the method is the online estimation of the sequence of system states from the available observations; that is, we intend to estimate x_t given y_{1:t}, t = 0, 1, 2, ..., according to some reference function that yields a quantitative measure of quality. In particular, we propose the use of a real cost function with a recursive additive structure, that is,

    C(x_{0:t} | y_{1:t}, λ) = λ C(x_{0:t-1} | y_{1:t-1}, λ) + ΔC(x_t | y_t),    (5)

where 0 < λ < 1 is a forgetting factor, ΔC : R^{L_x} × R^{L_y} → R is the incremental cost function, and C(x_{0:t} | y_{1:t}, λ) complies with the definition

    C : R^{(t+1)L_x} × R^{tL_y} × R → R.    (6)

We should remark that (5) is not the only recursive decomposition that can be employed. A straightforward alternative is to choose a cost function which is built at time t as the convex sum

    C(x_{0:t} | y_{1:t}, λ) = λ C(x_{0:t-1} | y_{1:t-1}, λ) + (1 − λ) ΔC(x_t | y_t).    (7)

This form of cost function is perfectly valid for the definition and construction of CRPFs, and choosing it would not affect (or would affect only trivially) the arguments presented in the rest of this paper, including the asymptotic convergence results in Section 3. However, we will constrain ourselves to the familiar form of (5) for simplicity.

A high value of C(x_{0:t} | y_{1:t}, λ) means that the state sequence x_{0:t} is not a good estimate given the sequence of observations y_{1:t}, while a low value of C(x_{0:t} | y_{1:t}, λ) indicates that x_{0:t} is close to the true state signal. The sequence x_{0:t} is said to have a high cost, in the former case, or a low cost, in the latter case. Notice, in particular, the recursive structure in (5), where the cost of a sequence up to time t − 1 can be updated by looking only at the state and observation vectors at time t, x_t and y_t, respectively, which are used to compute the cost increment ΔC(x_t | y_t). The forgetting factor λ avoids attributing an excessive weight to old observations when a long series of data is collected, hence allowing for potential adaptivity.

We also introduce a one-step risk function of the form

    R : R^{L_x} × R^{L_y} → R,    (x_{t-1}, y_t) ↦ R(x_{t-1} | y_t),    (8)

that measures the adequacy of the state at time t − 1 given the new observation y_t. It is convenient to view the risk function R(x_{t-1} | y_t) as a prediction of the cost increment ΔC(x_t | y_t) that can be obtained before x_t is actually propagated. Hence, a natural choice of the risk function is

    R(x_{t-1} | y_t) = ΔC(f_x(x_{t-1}) | y_t).    (9)
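The recursions (5) and (9) can be sketched in a few lines. The norm-based incremental cost used here and the default forgetting factor are illustrative assumptions:

```python
import numpy as np

# Sketch of the recursive cost (5) and the natural risk choice (9).
# delta_cost stands in for the incremental cost ΔC(x_t | y_t); the
# norm-based form and the default λ are illustrative assumptions.
def delta_cost(x_t, y_t, f_y):
    return float(np.linalg.norm(y_t - f_y(x_t)))

def update_cost(prev_cost, x_t, y_t, f_y, lam=0.95):
    # C(x_{0:t} | y_{1:t}, λ) = λ C(x_{0:t-1} | y_{1:t-1}, λ) + ΔC(x_t | y_t)
    return lam * prev_cost + delta_cost(x_t, y_t, f_y)

def risk(x_prev, y_t, f_x, f_y):
    # R(x_{t-1} | y_t) = ΔC(f_x(x_{t-1}) | y_t): the cost increment
    # predicted before x_t is actually propagated
    return delta_cost(f_x(x_prev), y_t, f_y)
```

With f_x and f_y both the identity, risk reduces to the one-step mismatch ||y_t − x_{t−1}||, the cost increment expected if the state barely moves.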

The proposed estimation technique proceeds sequentially in a similar manner as the BF. Given a set of M state samples and associated costs up to time t, that is, the weighted-particle set (wps)

    Ξ_t = {x_t^{(i)}, C_t^{(i)}}_{i=1}^{M},    (10)

where C_t^{(i)} = C(x_{0:t}^{(i)} | y_{1:t}, λ), the grid of state trajectories is randomly propagated when y_{t+1} is observed in order to build an updated wps Ξ_{t+1}. The state and observation signals are those described in the dynamic system (1)-(2). We only add the following mild assumptions:

(1) the initial state is known to lie in a bounded interval I_{x_0} ⊂ R^{L_x};
(2) the system and observation noise are both zero mean.

Assumption (1) is needed to ensure that we initially draw a set of samples that is not infinitely far from the true state x_0. Notice that this is a structural assumption, not a probabilistic one. Assumption (2) is made for the sake of simplicity, because zero-mean noise is the rule in most systems.

The sequential CRPF algorithm based on the structure of system (1)-(2), the definitions of the cost and risk functions given by (5) and (8), respectively, and assumptions (1) and (2), is described below.

(1) Time t = 0 (initialization). Draw M particles from the uniform distribution in the interval I_{x_0},

    x_0^{(i)} ~ U(I_{x_0}),    (11)

and assign them a zero cost. The initial wps

    Ξ_0 = {x_0^{(i)}, C_0^{(i)} = 0}_{i=1}^{M}    (12)

is obtained.

(2) Time t + 1 (selection of the most promising trajectories). The goal of the selection step is to replicate those particles with a low cost, while high-cost particles are discarded. As usual in PF methods, selection is implemented by a resampling procedure [7]. We point out that, differently from the standard BF, resampling in CRPFs does not produce equally weighted particles. Instead, each particle preserves its own cost. Notice that the equal weighting of resampled particles in standard PF algorithms comes from the use of a statistical reference. In CRPF, preserving the particle costs after resampling actually shifts the random grid representation of the cost function toward its local minima. Such a behavior is sound, as we are interested in minimum cost signal estimates. Further issues related to resampling are discussed in Section 4.

For i = 1, 2, ..., M, compute the one-step risk of particle i and let

    R_{t+1}^{(i)} = λ C_t^{(i)} + R(x_t^{(i)} | y_{t+1}),    (13)

which yields a predictive cost of the trajectory x_{0:t} according to the new observation y_{t+1}. Define a probability mass function (pmf) of the form

    π̂_{t+1}^{(i)} ∝ µ(R_{t+1}^{(i)}),    (14)

where µ : R → [0, +∞) is a monotonically decreasing function. An intermediate wps is obtained by resampling the trajectories {x_t^{(i)}}_{i=1}^{M} according to the pmf π̂_{t+1}^{(i)}. Specifically, we select x̂_t^{(i)} = x_t^{(k)} with probability π̂_{t+1}^{(k)}, and build the set Ξ̂_{t+1} = {x̂_t^{(i)}, Ĉ_t^{(i)}}_{i=1}^{M}, where Ĉ_t^{(i)} = C_t^{(k)} if and only if x̂_t^{(i)} = x_t^{(k)}.

(3) Time t + 1 (random particle propagation). Select an arbitrary conditional pdf of the state p_{t+1}(x_{t+1} | x_t) with the constraint that

    E_{p_{t+1}(x_{t+1}|x_t)}[x_{t+1}] = f_x(x_t),    (15)

where E_{p(s)}[·] denotes the expected value with respect to the pdf in the subindex. Using the selected propagation density, draw new particles

    x_{t+1}^{(i)} ~ p_{t+1}(x_{t+1} | x̂_t^{(i)})    (16)

and update the associated costs

    C_{t+1}^{(i)} = λ Ĉ_t^{(i)} + ΔC_{t+1}^{(i)},    (17)

where

    ΔC_{t+1}^{(i)} = ΔC(x_{t+1}^{(i)} | y_{t+1})    (18)

for i = 1, 2, ..., M. As a result, the updated wps Ξ_{t+1} = {x_{t+1}^{(i)}, C_{t+1}^{(i)}}_{i=1}^{M} is obtained.

(4) Time t + 1 (estimation of the state). Estimation procedures are better understood if a pmf π_{t+1}^{(i)}, i = 1, 2, ..., M, is assigned to the particles in Ξ_{t+1}. The natural way to define this pmf is according to the particle costs, that is,

    π_{t+1}^{(i)} ∝ µ(C_{t+1}^{(i)}),    (19)

where µ is a monotonically decreasing function.

The minimum cost estimate at time t + 1 is trivially computed as

    i_0 = arg max_i π_{t+1}^{(i)},    x̃_{0:t+1}^{min} = x_{t+1}^{(i_0)},    (20)

and its physical meaning is obvious. An equally useful estimate can be computed as the mean value of x_{t+1}^{(i)} according to the pmf π_{t+1}^{(i)}, that is,

    x̃_{t+1}^{mean} = Σ_{i=1}^{M} π_{t+1}^{(i)} x_{t+1}^{(i)}.    (21)

Note that x̃_{t+1}^{mean} can also be regarded as a minimum cost estimate because the particle set Ξ_{t+1} is a random-grid local representation of the cost function in the vicinity of its minima. In fact, estimator (21) has slight advantages over (20). Namely, the averaging of particles according to the pmf π_{t+1}^{(i)} yields an estimated state trajectory which is smoother than the one resulting from simply choosing the particle with the least cost at each time step. Besides, computing the mean of the particles under π_{t+1}^{(i)} may result in an estimate with a slightly smaller cost than the least cost particle, since x̃_{t+1}^{mean} is obtained by interpolation of particles around the least cost state. Sufficient conditions for the mean estimate (21) to attain an asymptotically minimum cost are given in Section 3.

We will refer to the general procedure described above as a CRPF algorithm. It is apparent that many implementations are possible for a single problem, so in the next section, we discuss the choice of the functions and parameters involved in the method.

2.2. Design issues

An instance of the CRPF class of algorithms is selected by choosing

(i) the cost function ΔC(·|·);
(ii) the risk function R(·|·);
(iii) the monotonically decreasing function µ : R → [0, +∞) that maps costs and risks into the resampling and estimation pmfs, as indicated in (14) and (19), respectively;
(iv) the sequence of pdfs p_{t+1}(x_{t+1} | x_t) for particle generation.

The cost and risk functions measure the quality of the particles in the filter. Recall that the risk is conveniently interpreted as a prediction of the cost of a particle, given a new observation, before random propagation is actually carried out (see the selection step in Section 2.1). Therefore, the cost and the risk should be closely related, and we suggest to choose R(·|·) according to (9). Whenever possible, both the cost and risk functions should be

(i) strictly convex in the range of values of x_t where the state is expected to lie, in order to avoid ambiguities in the estimators (20) and (21) as well as in the selection (resampling) step;
(ii) easy to compute, in order to facilitate the practical implementation of the algorithm;
(iii) dependent on the complete state and observation signals, that is, they should involve all the elements of x_t and y_t.

A simple, yet useful and physically meaningful, choice of ΔC(·|·) and R(·|·) that will be used in the numerical examples of Section 5 is given by

    C(x_0) = 0,    (22)
    ΔC(x_t | y_t) = ||y_t − f_y(x_t)||^q,    (23)
    R(x_t | y_{t+1}) = ||y_{t+1} − f_y(f_x(x_t))||^q,    (24)

where q ≥ 1 and ||v|| = √(v^T v) denotes the norm of v. Given a fixed and bounded sequence of observations y_{1:t}, the optimal (minimum cost) sequence of state vectors is

    x_{0:t}^{opt} = arg min_{x_{0:t}} C(x_{0:t} | y_{1:t}, λ) = arg min_{x_{0:t}} Σ_{k=0}^{t} λ^{t−k} ΔC(x_k | y_k).    (25)

We call x_{0:t}^{opt} optimal because it is obtained by minimization of the continuous cost function, and it is in general different from the minimum cost estimate obtained by CRPF, which we have denoted as x̃_{0:t}^{min} in Section 2.1.

With the assumed choice of cost and risk functions given by (22)–(24), an invertible observation function f_y : R^{L_x} → I_y ⊆ R^{L_y}, and y_t ∈ I_y for all t ≥ 1, it is straightforward to derive a pointwise solution of the form¹

    x_t^{opt} = arg min_{x_t} ΔC(x_t | y_t) = f_y^{-1}(y_t).    (26)

Therefore, as the CRPF algorithm randomly selects and propagates the sample states with the least cost, it can be understood (again, under the assumption of (22)–(24)) as a numerical stochastic method for approximately solving the set of (possibly nonlinear) equations

    y_t − f_y(x_t) = 0,    t = 1, 2, ....    (27)

Furthermore, setting q = 1 in (23) and (24), we obtain a Monte Carlo estimate of the mean absolute deviation solution of the above set of equations, while q = 2 results in a stochastic optimization of the least squares type. This interpretation of the CRPF algorithm as a method for numerically solving (27) allows us to establish a connection between the proposed methodology and the theory of SA [14], which is briefly commented upon in Appendix A.

The function µ : R → [0, +∞) should be selected to guarantee an adequate discrimination of low-cost particles from those presenting higher costs (recall that we are interested in computing a local representation of the cost function in the vicinity of its minima). As shown in Section 5, the choice of µ has a direct impact on the algorithm performance. Specifically, notice that the accuracy of the selection step is highly dependent on the ability of µ to assign large probability masses to lower-cost particles.

¹ Note, however, that additional solutions may exist at ∇_x f_y(x) = 0, depending on f_y(·).
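Steps (1)-(4), together with the cost and risk choice (22)-(24) and q = 2, can be sketched as a compact scalar simulation. The model functions, noise levels, number of particles, and the choice µ(c) = 1/(c + ε) are all illustrative assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Compact scalar CRPF following steps (1)-(4) with the cost/risk
# choice (22)-(24), q = 2; model and constants are illustrative.
f_x = lambda x: 0.9 * x
f_y = lambda x: x
lam, q, M, T, sigma = 0.95, 2.0, 200, 30, 0.3

dC = lambda x, y: np.abs(y - f_y(x)) ** q          # incremental cost (23)
risk = lambda x, y: np.abs(y - f_y(f_x(x))) ** q   # one-step risk (24)
mu = lambda c: 1.0 / (c + 1e-12)                   # decreasing µ (cf. µ1)

# Synthetic trajectory and observations from (1)-(2).
x_true = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    x_true[t] = f_x(x_true[t - 1]) + 0.1 * rng.standard_normal()
    y[t] = f_y(x_true[t]) + 0.1 * rng.standard_normal()

x = rng.uniform(-1.0, 1.0, M)       # step (1): particles in I_{x_0}
C = np.zeros(M)                     # ... with zero initial cost (22)
est = np.zeros(T + 1)
for t in range(1, T + 1):
    # Step (2): selection driven by the risks (13)-(14);
    # each survivor keeps its own cost.
    R = lam * C + risk(x, y[t])
    w = mu(R); w /= w.sum()
    idx = rng.choice(M, size=M, p=w)
    x, C = x[idx], C[idx]
    # Step (3): random propagation (15)-(16) and cost update (17)-(18).
    x = f_x(x) + sigma * rng.standard_normal(M)
    C = lam * C + dC(x, y[t])
    # Step (4): mean estimate (21) under the cost-based pmf (19).
    p = mu(C); p /= p.sum()
    est[t] = np.sum(p * x)
```

Note that no noise pdf is ever evaluated: only the cost, the risk, and a zero-mean propagation density are needed, which is the point of the cost-reference construction.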

A straightforward choice of this function is

    µ1(C^{(i)}) = (C^{(i)})^{-1},    C^{(i)} ∈ R \ {0},    (28)

which is simple to compute and potentially useful in many systems. It has a serious drawback, however, in situations where the range of the costs, that is, max_i{C_t^{(i)}} − min_i{C_t^{(i)}}, is much smaller than the average cost (1/M) Σ_{i=1}^{M} C_t^{(i)}. In such scenarios, µ1 yields nearly uniform probability masses and the algorithm performance degrades. Better discrimination properties can be achieved with an adequate modification of µ1, for example, with

    µ2(C^{(i)}) = 1 / (C^{(i)} − min_k{C^{(k)}} + δ)^β,    (29)

where 0 < δ < 1 and β > 1. When compared with µ1, µ2 assigns larger masses to low-cost particles and much smaller masses to higher-cost samples. The discrimination ability of µ2 is enhanced by reducing the value of δ (i.e., δ → 0) and/or increasing β. The relative merit of µ2 over µ1 is experimentally illustrated in Section 5.

The last selection to be made is the pdf for particle propagation, p_{t+1}(x_{t+1} | x_t), in step (3) of the CRPF algorithm. The theoretical properties required for optimal propagation are explored in Section 3. From a practical and intuitive² point of view, it is desirable to use easy-to-sample pdfs with a large enough variance to avoid losing track of the state signal, but not too large, to prevent the generation of too dispersed particles. A simple strategy implemented in the simulations of Section 5 consists of using zero-mean Gaussian densities with adaptively selected variance. Specifically, the particle i is propagated from time t to time t + 1 as

    x_{t+1}^{(i)} ~ N(f_x(x̂_t^{(i)}), σ_t^{2,(i)} I_{L_x}),    (30)

where I_{L_x} is the L_x × L_x identity matrix and the variance σ_t^{2,(i)} is recursively computed as

    σ_t^{2,(i)} = (t − 1) σ_{t−1}^{2,(i)} / t + ||x_t^{(i)} − f_x(x̂_{t−1}^{(i)})||² / (t L_x).    (31)

This adaptive-variance technique has appeared useful and efficient in our simulations, as illustrated in Section 5, but alternative approaches (including the simple choice of a fixed variance) can also be successfully exploited.

3. CONVERGENCE OF CRPF ALGORITHMS

In this section, we assess the convergence of the proposed CRPF algorithm. In particular, we seek sufficient conditions for AOP of the particles from time t − 1 to time t. Let Ξ_t = {x_t^{(i)}, C_t^{(i)}}_{i=1}^{M} be the wps computed at time t. We say that Ξ_t has been obtained by AOP from Ξ_{t−1} if and only if

    lim_{M→∞} ΔC(x_t^{opt} | y_t) − C̄_t = 0 (in some sense),    (32)

where x_t^{opt} is the optimal state according to (26) and

    C̄_t = Σ_{i=1}^{M} π̄_t^{(i)} ΔC_t^{(i)},    (33)

with a pmf π̄_t^{(i)} ∝ µ(C_t^{(i)}), is the mean incremental cost at time t. The results presented in this section prove that AOP can be ensured by adequately choosing the propagation density and the function µ : R → [0, ∞) that relates the cost to the pmfs π̂_t^{(i)} and π_t^{(i)}. Notice that π_t^{(i)} = π̄_t^{(i)} when λ = 0.

A corollary of the AOP convergence theorem is also established that provides sufficient conditions for the mean state estimate given by (21), for the case λ = 0, to be asymptotically optimal in terms of its incremental cost.

3.1. Preliminary definitions

Some preliminary definitions are necessary before stating and proving sufficient AOP conditions. If the selection and propagation steps of the CRPF method are considered jointly, it turns out that, at time t, M particles are sampled as

    x_t^{(i)} ~ p_t^{M̄}(x),    (34)

where M̄ < ∞ denotes the number of particles available at time t − 1, and sampling the pdf p_t^{M̄}(x) amounts to resampling M times in Ξ_{t−1} = {x_{t−1}^{(i)}, C_{t−1}^{(i)}}_{i=1}^{M̄} and then propagating the resulting particles and updating the costs to build the new wps Ξ_t = {x_t^{(i)}, C_t^{(i)}}_{i=1}^{M} (note that we explicitly allow M̄ = M). Although other possibilities exist, for example, as described in Section 4, when multinomial resampling is used in the selection step of the CRPF algorithm, the pdf in (34) is a finite mixture of the form

    p_t^{M̄}(x) = Σ_{k=1}^{M̄} π̂_t^{(k)} p_t(x | x_{t−1}^{(k)}).    (35)

We also introduce the following notation for a ball centered at x_t^{opt} with radius ε > 0:

    S(x_t^{opt}, ε) = {x ∈ R^{L_x} : ||x − x_t^{opt}|| < ε},    (36)

and we write

    S^M(x_t^{opt}, ε) = {x ∈ {x_t^{(i)}}_{i=1}^{M} : ||x − x_t^{opt}|| < ε}    (37)

for its discrete counterpart built from the particles in Ξ_t.

² Part of this intuition is confirmed by the convergence theorem in Section 3.
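The difference in discrimination between µ1 in (28) and µ2 in (29) can be seen on a toy set of costs whose spread is small relative to their mean; the cost values, δ, and β below are example choices:

```python
import numpy as np

# Toy comparison of µ1 in (28) and µ2 in (29) on costs whose spread
# is small relative to their mean; δ and β are example values.
C = np.array([100.0, 100.5, 101.0, 101.5])

mu1 = 1.0 / C
p1 = mu1 / mu1.sum()                       # nearly uniform pmf

delta, beta = 0.01, 2.0
mu2 = 1.0 / (C - C.min() + delta) ** beta
p2 = mu2 / mu2.sum()                       # mass concentrates on the best particle

print(p1.round(3), p2.round(3))
```

Here µ1 cannot tell the particles apart, while µ2 places almost all probability mass on the lowest-cost one, which is the behavior the selection step needs.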

3.2. Convergence theorem

Lemma 1. Let {x_t^{(i)}}_{i=1}^{M} be a set of particles drawn at time t using the propagation pdf p_t^{M̄}(x) as defined by (34), let y_{1:t} be a fixed bounded sequence of observations, and let ΔC(x | y_t) ≥ 0 be a continuous cost function, bounded in S(x_t^{opt}, ε), with a minimum at x = x_t^{opt}. If the three following conditions are met:

(1) any ball with center at x_t^{opt} has a nonzero probability under the propagation density, that is,

    ∫_{S(x_t^{opt}, ε)} p_t^{M̄}(x) dx = γ > 0    ∀ε > 0,    (38)

(2) the supremum of the function µ(ΔC(·|·)) for points outside S(x_t^{opt}, ε) is a finite constant, that is,

    S_out = sup_{x_t ∈ R^{L_x} \ S(x_t^{opt}, ε)} µ(ΔC(x_t | y_t)) < ∞,    (39)

(3) the supremum of the function µ(ΔC(·|·)) for points inside S^M(x_t^{opt}, ε) converges to infinity faster than the identity function, that is,

    lim_{M→∞} M / S_in = 0,    (40)

where

    S_in = sup_{x_t ∈ S^M(x_t^{opt}, ε)} µ(ΔC(x_t | y_t)),    (41)

then the set function µ_t : A ⊆ {x_t^{(i)}}_{i=1}^{M} → [0, ∞) defined as

    µ_t(A ⊆ {x_t^{(i)}}_{i=1}^{M}) = Σ_{x ∈ A} µ(ΔC(x | y_t))    (42)

is an infinite discrete measure (see definition in, e.g., [16]) that satisfies

    lim_{M→∞} Pr[ |1 − µ_t(S^M(x_t^{opt}, ε)) / µ_t({x_t^{(i)}}_{i=1}^{M})| ≥ δ ] = 0    ∀δ > 0,    (43)

that is,

    lim_{M→∞} µ_t(S^M(x_t^{opt}, ε)) / µ_t({x_t^{(i)}}_{i=1}^{M}) = 1 (i.p.),    (44)

where i.p. stands for "in probability."

See Appendix B for a proof.

Theorem 1. If conditions (38), (39), and (40) in Lemma 1 hold true, then the mean incremental cost at time t,

    C̄_t = Σ_{i=1}^{M} π̄_t^{(i)} ΔC(x_t^{(i)} | y_t),    (45)

converges to the minimal incremental cost as M → ∞,

    lim_{M→∞} ΔC(x_t^{opt} | y_t) − C̄_t = 0 (i.p.).    (46)

See Appendix C for a proof.

Finally, an interesting corollary that justifies the use of the mean estimate (21) can be easily derived from Lemma 1 and Theorem 1.

Corollary 1. Assuming (38), (39), and (40) in Lemma 1, and forgetting factor λ = 0, the mean cost estimate is asymptotically optimal, that is,

    lim_{M→∞} ΔC(x̃_t^{mean} | y_t) − ΔC(x_t^{opt} | y_t) = 0 (i.p.),    (47)

where

    x̃_t^{mean} = Σ_{i=1}^{M} π_t^{(i)} x_t^{(i)}.    (48)

See Appendix D for a proof.

3.3. Discussion

Theorem 1 states that conditions (38)–(40) are sufficient to achieve AOP (i.p.). The validity of this result clearly depends on the existence of a propagation pdf, p_t^{M̄}(·), and a measure µ_t with good properties in order to meet the required conditions. It is impossible to guarantee that condition (38) holds true in general, as the value of x_t^{opt} is a priori unknown, but if the number of particles is large enough and they are evenly distributed on the state space, it is reasonable to expect that the region around x_t^{opt} has a nonzero probability. Intuitively, if the wps is locked to the system state at time t − 1, using the system dynamics to propagate the particles to time t should keep the filtering algorithm locked to the state trajectory. Indeed, our computer simulation experiments give evidence that the propagation pdf is not a critical weakness, and the proposed sequence of Gaussian densities given by (30) and (31) yields a remarkably good performance.

Conditions (39) and (40) are related to the choice of µ or, equivalently, the measure µ_t. For the proposed cost model given by (22) and (23), it is simple to show that condition (39) holds true, both for µ = µ1 and µ = µ2, as defined in (28) and (29), respectively. The analysis of condition (40) is more demanding and will not be addressed here. An educated intuition, also supported by the computer simulation results in Section 5, points in the direction of selecting µ = µ2 with a small enough value of δ.
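Condition (40) favors µ functions that blow up at the minimum cost, and a quick numerical experiment illustrates the resulting concentration (44): the fraction of µ-mass inside an ε-ball around the optimum captures almost everything. The cost ΔC(x|y) = |x − x_opt|, the Gaussian particle cloud, and all constants below are example choices, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Numerical illustration of the concentration in (43)-(44) with µ = µ2
# and a small δ; the cost model and constants are example choices.
x_opt, eps, delta, beta = 0.0, 0.1, 1e-3, 2.0

def mass_ratio(M):
    x = rng.normal(x_opt, 1.0, M)             # particles from a propagation pdf
    C = np.abs(x - x_opt)                     # incremental costs
    m = 1.0 / (C - C.min() + delta) ** beta   # µ2 evaluated at the costs
    inside = C < eps                          # members of S^M(x_opt, ε)
    return float(m[inside].sum() / m.sum())

print(mass_ratio(100), mass_ratio(100000))
```

Both ratios are close to 1: because µ2 explodes at the lowest-cost particle, which lies inside the ε-ball with overwhelming probability, the µ-mass of the ball dominates the total.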

4. RESAMPLING AND PARALLEL IMPLEMENTATION

Resampling is an indispensable algorithmic component in sequential methods for statistical reference PF, which otherwise suffer from weight degeneracy and do not converge to useful solutions [4, 7, 15]. However, resampling also becomes a major obstacle for efficient implementation of PF algorithms in parallel VLSI hardware devices because it creates full data dependencies among the processing units [15]. Although some promising methods have been recently proposed [15, 17], parallelization of resampling algorithms remains an open problem.

The selection step in CRPFs (see Section 2.1) is much less restrictive than resampling in conventional SRPFs. Specifically, while resampling methods in SRPFs must ensure that the probability distribution of the resampled population is an unbiased and unweighted approximation of the original distribution of the particles [4], selection in CRPFs is only aimed at ensuring that the particles are close to the locations that produce cost function minima. We have found evidence that the state estimates obtained by the CRPF are better when the random grid of particles comprises small regions of the state space around these minima. Therefore, selection algorithms can be devised with the only, and mild, constraint that they do not increase the average cost of the particles.

Now we briefly describe a simple resampling technique for CRPFs that lends itself to a straightforward parallelization. Figure 1 shows an array of independent processors connected in a ring configuration. We assume, for simplicity, that the number of processors is equal to the number of particles M, although the algorithm is easily generalized to a smaller number of processing elements (PEs). The ith PE (PE_i) contains the triple {x_t^(i), C_t^(i), R_{t+1}^(i)} in its memory. The proposed local resampling technique proceeds in two steps.

(i) PE_i transmits {x_t^(i), C_t^(i), R_{t+1}^(i)} to PE_{i+1} and PE_{i−1} and receives the corresponding information from its neighbors. This communication step can typically be carried out in a single cycle and, when complete, PE_i contains the three particles {x_t^(k), C_t^(k), R_{t+1}^(k)}_{k=i−1}^{i+1}.

(ii) Each PE draws a single particle with probabilities according to the risks, that is, for the ith PE,

    x̂_t^(i) = x_t^(k),   Ĉ_t^(i) = C_t^(k),   k ∈ {i − 1, i, i + 1},                       (49)

with probability π̂_t^(k) = µ(R_{t+1}^(k)) / Σ_{l=i−1}^{i+1} µ(R_{t+1}^(l)).

Note that, in these two simple steps, the algorithm stochastically selects the particles with smaller risks. It is apparent that the method lends itself easily to parallelization, with very limited communication requirements. The term local resampling comes from the observation that low-risk particles are only locally spread by the method, that is, a PE containing a high-risk particle can only get a low-risk sample from its two neighbors.

Figure 1: M = 6 processors (PE_1, ..., PE_6) in a ring configuration for parallel implementation of the local resampling algorithm.

5. COMPUTER SIMULATIONS

In this section, we present computer simulations that illustrate the validity of our approach. We have considered the problem of autonomous positioning of a vehicle moving along a 2-dimensional space. The vehicle is assumed to have means to estimate its current speed every T_s seconds, and it also measures, with the same frequency, the power of three radio signals emitted from known locations and with known attenuation coefficients. This information can be used by a particle filter to estimate the actual vehicle position.

Following [18], we model the positioning problem by the state-space system

(i) state equation:

    x_t = G_x x_{t−1} + G_v v_t + G_u u_t;                                                  (50)

(ii) observation equation:

    y_{i,t} = 10 log_{10}( P_{i,0} / ||r_i − x_t||^{α_i} ) + w_{i,t},                       (51)

where x_t ∈ R² indicates the position of the vehicle in the 2-dimensional reference set, G_x = I_2 and G_v = G_u = T_s I_2 are known transition matrices, v_t ∈ R² is the observable vehicle speed, which is assumed constant during the interval ((t − 1)T_s, tT_s), and u_t is a noise process that accounts for measurement errors of the speed. The vector y_t = [y_{1,t}, y_{2,t}, y_{3,t}]^T collects the received power from three emitters located at known reference locations r_i ∈ R², i = 1, 2, 3, that transmit their signals with initial power P_{i,0} through a fading channel with attenuation coefficient α_i, and, finally, w_t = [w_{1,t}, w_{2,t}, w_{3,t}]^T is the observation noise. Each time step represents T_s seconds, the position vectors x_t and r_i have units of meters (m), the speed is given in m/s, and the received power is measured in dB. The initial vehicle position x_0 is drawn from a standard 2-dimensional Gaussian distribution, that is, x_0 ∼ N(0, I_2).

We have applied the proposed CRPF methodology to solve the positioning problem and, for comparison and benchmarking purposes, we have also implemented the popular auxiliary BF [9], which has an algorithmic structure (resampling, importance sampling, and state estimation) very similar to that of the proposed CRPF family. Algorithm 1 summarizes the details of the CRPF algorithm with multinomial selection, including the alternatives in the choice of function µ. The selection step can be substituted by the local resampling procedure shown in Algorithm 2. A pseudocode for the auxiliary BF is also provided in Algorithm 3.

    Parameters. For all i: λ = 0.95; q = 1, 2; δ = 0.01; β = 2; M = 50; σ²,(i)_0 = 10.
    Initialization. For i = 1, ..., M:
        x_0^(i) ∼ U(−8, +8),   C_0^(i) = 0.
    Recursive update. For i = 1, ..., M:
        R_{t+1}^(i) = λ C_t^(i) + || y_{t+1} − f_y(G_x x_t^(i) + G_v v_{t+1}) ||^q.
    Multinomial selection (resampling). pmf:
        π̂_{t+1}^(i) = (R_{t+1}^(i))^{−1} / Σ_{l=1}^M (R_{t+1}^(l))^{−1}                            (function µ1), or
        π̂_{t+1}^(i) = (R_{t+1}^(i) − min_{j∈{1,...,M}} R_{t+1}^(j) + δ)^{−β}
                       / Σ_{l=1}^M (R_{t+1}^(l) − min_{j∈{1,...,M}} R_{t+1}^(j) + δ)^{−β}           (function µ2).
    Selection. (x̂_t^(i), Ĉ_t^(i)) = (x_t^(k), C_t^(k)), k ∈ {1, ..., M}, with probability π̂_{t+1}^(k).
    Variance update.
        t ≤ 10:  σ²,(i)_{t+1} = σ²,(i)_t,
        t > 10:  σ²,(i)_t = ((t − 1)/t) σ²,(i)_{t−1} + || x_t^(i) − f_x(x̂_{t−1}^(i)) ||² / (t L_x).
    Propagation. Let x_{t+1}^(i) ∼ p_{t+1}(x_{t+1} | x̂_t^(i)), where
        E_{p_{t+1}(x_{t+1}|x̂_t^(i))}[x_{t+1}] = f_x(x̂_t^(i)),
        Cov_{p_{t+1}(x_{t+1}|x̂_t^(i))}[x_{t+1}] = σ²,(i)_{t+1} I_2,
        x_{0:t+1}^(i) = {x̂_{0:t}^(i), x_{t+1}^(i)},
        C_{t+1}^(i) = λ Ĉ_t^(i) + || y_{t+1} − f_y(x_{t+1}^(i)) ||^q.
    State estimation.
        π_t^(i) ∝ µ1(C_t^(i))  or  π_t^(i) ∝ µ2(C_t^(i)),
        x̃_t^mean = Σ_{i=1}^M π_t^(i) x_t^(i).

Algorithm 1: CRPF algorithm with multinomial resampling for the 2-dimensional positioning problem.

In the following subsections, we describe different computer experiments that were carried out using synthetic data generated according to model (50)-(51). Two types of plots are presented, both for the CRPF and BF algorithms. Vehicle trajectories in the 2-dimensional space, resulting from a single simulation of the dynamic system, are shown to illustrate the ability of the algorithms to remain locked to the state trajectory. We chose the mean absolute deviation as a performance figure of merit. It was measured between the true vehicle trajectory in R² and the trajectory estimated by the particle filters, and its unit was the meter. All mean-deviation plots were obtained by averaging 50 independent simulations. Both the BF and the CRPF types of algorithms were run with M = 50 particles.

5.1. Mixture Gaussian noise processes

In the first experiment, we modeled the system and observation noise processes u_t and w_t, respectively, as independent and temporally white, with the mixture Gaussian pdfs

    u_t ∼ 0.3 N(0, 0.2 I_2) + 0.4 N(0, I_2) + 0.3 N(0, 10 I_2),
    w_{l,t} ∼ 0.3 N(0, 0.2) + 0.4 N(0, 1) + 0.3 N(0, 10),   l = 1, 2, 3.                    (52)

In Figure 2, we compare the auxiliary BF with perfect knowledge of the noise distributions and several CRPF algorithms that use the cost and risk functions proposed in Section 2.2 (see (22)–(24)). For all CRPF methods, the forgetting factor was λ = 0.95, but we ran algorithms with different values of q, q = 1, 2, and functions µ1 and µ2 (see (28) and (29)). For the latter function, µ2, we set δ = 0.01 and β = 2. The propagation mechanism for the CRPF methods consisted of the sequence of Gaussian densities given by (30) and (31), with initial value σ²,(i)_0 = 10 for all i.

Figure 2a shows the system trajectory in a single run and the estimates corresponding to the BF and CRPF algorithms. The trajectory started in an unknown position close to (0, 0) and evolved for one hour, with sampling period T_s = 2 seconds. It is apparent that all the algorithms remained locked to the vehicle position during the whole simulation interval.
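To make the recursion of Algorithm 1 concrete, the following Python sketch implements one risk–selection–propagation–cost update with the µ2 selection function. The observation map f_y, the emitter layout, and the powers P_{i,0} and exponents α_i it uses are illustrative assumptions, not the values used in the paper's experiments; G_x and G_v follow (50).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed model constants for the sketch (the paper fixes Gx = I2 and
# Gv = Ts*I2, but the emitter layout, powers, and exponents are made up).
Ts = 2.0
Gx, Gv = np.eye(2), Ts * np.eye(2)
r_emit = np.array([[0.0, 0.0], [2000.0, 0.0], [0.0, 2000.0]])
P0, alpha = np.array([1e6, 1e6, 1e6]), np.array([2.0, 2.0, 2.0])

def f_y(x):
    # Observation map in the style of (51): received power (dB) at 3 emitters.
    d = np.linalg.norm(r_emit - x, axis=-1)
    return 10.0 * np.log10(P0 / d ** alpha)

def crpf_step(x, C, sigma2, y_next, v_next, lam=0.95, q=2, delta=0.01, beta=2.0):
    """One CRPF update: risks, mu2 multinomial selection, propagation, costs."""
    M = x.shape[0]
    mean_next = x @ Gx.T + Gv @ v_next              # f_x(x_t) = Gx x_t + Gv v_{t+1}
    obs_pred = np.array([f_y(m) for m in mean_next])
    R = lam * C + np.linalg.norm(y_next - obs_pred, axis=1) ** q   # risks
    w = (R - R.min() + delta) ** (-beta)            # mu2 selection pmf (unnormalized)
    idx = rng.choice(M, size=M, p=w / w.sum())      # multinomial selection
    x_sel, C_sel = x[idx], C[idx]
    # Propagation: Gaussian with mean f_x(x_hat) and covariance sigma2 * I2.
    x_new = x_sel @ Gx.T + Gv @ v_next + np.sqrt(sigma2) * rng.standard_normal((M, 2))
    obs_new = np.array([f_y(xi) for xi in x_new])
    C_new = lam * C_sel + np.linalg.norm(y_next - obs_new, axis=1) ** q
    # Mean estimate with weights proportional to mu2(C).
    pw = (C_new - C_new.min() + delta) ** (-beta)
    x_est = (pw[:, None] * x_new).sum(axis=0) / pw.sum()
    return x_new, C_new, x_est

# Usage: a particle cloud around the current position, one filter step.
v_next = np.array([5.0, -3.0])
x_true = np.array([500.0, 400.0])
x_true_next = Gx @ x_true + Gv @ v_next
y_next = f_y(x_true_next)
particles = x_true + rng.normal(0.0, 50.0, size=(50, 2))
costs = np.zeros(50)
particles, costs, x_est = crpf_step(particles, costs, 25.0, y_next, v_next)
```

Note that no likelihood is ever evaluated: the update only compares predicted and received observations through the cost, which is the defining feature of the CRPF family.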

    Local selection (resampling) at the ith PE.
    For k = i − 1, i, i + 1:
        π̂_{t+1}^(k) = (R_{t+1}^(k))^{−1} / Σ_{l=i−1}^{i+1} (R_{t+1}^(l))^{−1}                              (function µ1), or
        π̂_{t+1}^(k) = (R_{t+1}^(k) − min_{j∈{i−1,i,i+1}} R_{t+1}^(j) + δ)^{−β}
                       / Σ_{l=i−1}^{i+1} (R_{t+1}^(l) − min_{j∈{i−1,i,i+1}} R_{t+1}^(j) + δ)^{−β}           (function µ2).
    Selection. (x̂_t^(i), Ĉ_t^(i)) = (x_t^(l), C_t^(l)), l ∈ {i − 1, i, i + 1}, with probability π̂_{t+1}^(l).

Algorithm 2: Local resampling for the CRPF algorithm.
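The ring-local selection of Algorithm 2 can be sketched in a few lines. Here each PE is simulated as one loop iteration and, as a simplifying assumption, the µ1 choice (probabilities proportional to inverse risk) is used; the ring is closed by indexing modulo M.

```python
import numpy as np

def local_resample(risks, rng):
    """Algorithm 2 sketch: PE i draws an index from {i-1, i, i+1} (mod M)
    with probabilities proportional to 1/risk (the mu1 option)."""
    M = len(risks)
    chosen = np.empty(M, dtype=int)
    for i in range(M):
        nbr = np.array([(i - 1) % M, i, (i + 1) % M])
        w = 1.0 / risks[nbr]                   # lower risk -> higher probability
        chosen[i] = rng.choice(nbr, p=w / w.sum())
    return chosen

rng = np.random.default_rng(0)
risks = np.array([1.0, 1.0, 1e6, 1.0, 1.0, 1.0])  # PE 3 holds a high-risk particle
sel = local_resample(risks, rng)
```

In a parallel implementation each loop iteration runs on its own PE after a single neighbor-exchange cycle, which is exactly the limited-communication property claimed in Section 4; the high-risk particle can only be replaced by one of its two ring neighbors.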

    Initialization. For i = 1, ..., M:
        x_0^(i) ∼ N(0, I_2),   w_0^(i) = 1/M.
    Recursive update.
    For t = 1, ..., K:
        For i = 1, ..., M:
            x̂_t^(i) = f_x(x_{t−1}^(i)),
            κ_i = k with probability p(y_t | x̂_t^(k)) w_{t−1}^(k),
            x_t^(i) ∼ p[x_t | x_{t−1}^(κ_i)].
        Weight update. w̃_t^(i) = p(y_t | x_t^(i)) / p(y_t | x̂_t^(κ_i)).
        Weight normalization. w_t^(i) = w̃_t^(i) / Σ_{k=1}^M w̃_t^(k).

Algorithm 3: Auxiliary BF for the 2-dimensional positioning problem.

The latter observation is confirmed by the mean absolute deviation plot in Figure 2b. The deviation signal was computed as

    e_t = (1/50) Σ_{j=1}^{50} (1/2) ( | x_{1,t,j} − x^est_{1,t,j} | + | x_{2,t,j} − x^est_{2,t,j} | ),      (53)

where j is the simulation number, x_{t,j} = [x_{1,t,j}, x_{2,t,j}]^T is the true position at time t, and x^est_{t,j} = [x^est_{1,t,j}, x^est_{2,t,j}]^T is the corresponding estimate obtained with the particle filter. We observe that the CRPF algorithms with µ2 attained the lowest deviation and outperformed the auxiliary BF. Although it is not shown here, the auxiliary BF improved its performance as the sampling period was decreased, and achieved a lower deviation than the CRPFs for T_s ≤ 0.5 second. The reason is that, as T_s decreases, the correlation of the states increases due to the variation of G_u, and the BF exploits this statistical information better. Therefore, we can conclude that the BF can be more accurate when strong statistical information is available, and that the proposed CRPFs are more robust and steadily attain a good performance for a wider range of scenarios.³ This conclusion is easily confirmed by the remaining experiments presented in this section.

³ Obviously, the BF will also deliver a better performance as the number of particles M grows. In fact, it can be shown [4] that the estimate of the a posteriori pdf and its moments obtained from the BF converge uniformly to the true density and the true values of the moments. This means that, as M → ∞, the state estimates given by the BF become optimal (in the mean square error sense) and that, for large M, the BF will outperform the CRPF algorithm.

Figures 3 and 4 show the trajectories and the mean absolute deviations for the BF and CRPF algorithms when the sampling period was increased to T_s = 5 seconds and T_s = 10 seconds, respectively. Note that increasing T_s also increases the speed measurement error. As before, the CRPF techniques with µ2 outperformed the BF in the long term.

Because of its better performance, we also checked the behavior of the CRPF method that uses µ2 for different values of the parameter δ. Figure 5a shows the true position and the estimates obtained using three different values of δ, namely, 0.1, 0.01, and 0.001, with fixed β = 2. All the algorithms appear to perform similarly for the considered range of values. This is confirmed by the results presented in Figure 5b in terms of the mean absolute deviation. They also illustrate the robustness and stability of the method.

In the following, unless stated differently, the CRPF algorithm was always implemented with µ2 and parameters q = 2, δ = 0.01, and β = 2. The sampling period was also fixed, at T_s = 5 seconds.

5.2. Mixture Gaussian system and observation noise—Gaussian BF

Figure 6 shows the results (trajectory and mean deviation) obtained with the same system and observation noise distributions as in Section 5.1 when the auxiliary BF (labeled BF (Gaussian)) is mismatched with the true noise statistics and models the noise processes with the Gaussian densities

    p[u_t] = N(0, 0.2 I_2),
    p[w_{l,t}] = N(0, 0.2),   l = 1, 2, 3.                                                   (54)

It is apparent that the use of the correct statistical information is critical for the bootstrap algorithm (in the figure, we also plotted the result obtained when the BF used the true mixture Gaussian density, labeled BF (M-Gaussian)). Note that the CRPF algorithm also drew the state particles from a Gaussian sequence of densities (see Section 2.2), but it attained a superior performance compared to the BF.

5.3. Local versus multinomial resampling

We have verified the performance of the CRPF that uses the new resampling scheme proposed in Section 4. The results

×104 ×104 4 8

3 6

2 y 4 y

1 2

0 0

−6 −4 −2 0 2 −4 −3 −2 −1 0 1 x ×104 x ×104 q = µ q = µ True trajectory CRPF ( 2, 1) True trajectory CRPF ( 2, 1) q = µ q = µ BF CRPF ( 1, 2) BF CRPF ( 1, 2) q = µ CRPF ( 1, 1) CRPF (q = 2, µ2) CRPF (q = 1, µ1) CRPF (q = 2, µ2) (a) (a)

200 250

200 150

150 100

100

Joint mean abs. dev. (m) 50 Joint mean abs. dev. (m) 50

0 0 0 500 1000 1500 2000 0 200 400 600 800 t × 2(s) t × 5(s) q = µ BF CRPF ( 1, 2) BF CRPF (q = 1, µ2) q = µ q = µ CRPF ( 1, 1) CRPF ( 2, 2) CRPF (q = 1, µ1) CRPF (q = 2, µ2) q = µ CRPF ( 2, 1) CRPF (q = 2, µ1)

(b) (b)

Figure 2: Mixture Gaussian noise processes. Ts = 2 seconds. (a) Figure 3: Mixture Gaussian noise processes. Ts = 5 seconds. (a) Trajectory. (b) Mean absolute deviation. Trajectory. (b) Mean absolute deviation.

can be observed in Figure 7. The CRPF with local resampling shows approximately the same performance as the BF with perfect knowledge of the noise statistics. Although it presents a slight degradation with respect to the CRPF with multinomial resampling, the feasibility of a simple parallel implementation makes the local resampling method extremely appealing.

5.4. Different estimation criteria

Figure 8 compares the trajectory and mean deviation of two CRPF algorithms that used different criteria to obtain the estimates of the state: the minimum cost estimate x̃_t^min (see (20)) and the mean cost estimate x̃_t^mean (see (21)). It is clear that both algorithms performed similarly and outperformed the BF in the long term.

5.5. Laplacian noise

Finally, we have repeated our experiment by modeling the noises using Laplacian distributions, that is,

    p[u_t] = L(0, 0.5 I_2) = (1/(2 · 0.5)) e^{−||u_t||/0.5},
    p[w_{l,t}] = L(0, 0.5) = (1/(2 · 0.5)) e^{−|w_{l,t}|/0.5},   l = 1, 2, 3.                (55)
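The two estimators compared in Section 5.4 are simple to express: the minimum-cost estimate picks the single particle with the lowest cost, while the mean-cost estimate averages the particles with weights that grow as the cost shrinks. The sketch below uses µ2-style weights with the Section 5 values δ = 0.01 and β = 2; the particle values and costs are an invented toy example.

```python
import numpy as np

def crpf_estimates(particles, costs, delta=0.01, beta=2.0):
    """Minimum-cost estimate vs. weighted-mean estimate, sketched with
    mu2-style weights (delta and beta as used in Section 5)."""
    x_min = particles[np.argmin(costs)]                  # lowest-cost particle
    w = (costs - costs.min() + delta) ** (-beta)         # mu2-style weights
    x_mean = (w[:, None] * particles).sum(0) / w.sum()   # weighted mean
    return x_min, x_mean

particles = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, -4.0]])
costs = np.array([0.5, 0.4, 30.0])
x_min, x_mean = crpf_estimates(particles, costs)
```

With a sharply decreasing weight function the two estimates nearly coincide whenever one particle clearly dominates, which is consistent with the similar performance reported in Figure 8.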

×104 ×104 4 1

0 3 −1

−2 y 2 y −3

1 −4

−5

0 −6 −10000 −5000 0 5000 −8 −6 −4 −2 0 2 x x ×104

q = µ True trajectory CRPF ( 2, 1) True trajectory CRPF (q = 2, µ2, δ = 0.01) q = µ BF CRPF ( 1, 2) BF CRPF (q = 2, µ2, δ = 0.001) q = µ q = µ CRPF ( 1, 1) CRPF ( 2, 2) CRPF (q = 2, µ2, δ = 0.1)

(a) (a)

300 250

250 200 200 150 150

100 100 Joint mean abs. dev. (m) Joint mean abs. dev. (m) 50 50

0 0 0 100 200 300 400 0 200 400 600 800 t × 10 (s) t × 5(s)

BF CRPF (q = 1, µ2) q = µ BF CRPF (q = 1, µ1) CRPF ( 2, 2) CRPF (q = 2, µ2, δ = 0.1) CRPF (q = 2, µ1) CRPF (q = 2, µ2, δ = 0.01) CRPF (q = 2, µ2, δ = 0.001)

(b) (b)

Figure 4: Mixture Gaussian noise processes. Ts = 10 seconds. Figure 5: Different δ values. Ts = 5 seconds. (a) Trajectory. (a) Trajectory. (b) Mean absolute deviation. (b) Mean absolute deviation.

Figure 9 depicts the results obtained for the BF with perfect knowledge of the probability distribution of the noise and for the CRPF algorithm. Again, the proposed method attained the better performance in terms of mean absolute deviation.

6. CONCLUSIONS

Particle filters provide optimal numerical solutions in problems that amount to the estimation of unobserved time-varying states of dynamic systems. Such methods rely on the knowledge of the distributions of the initial state and the noise processes that affect the system, and require the ability to evaluate likelihood functions and the state transition densities. Under these assumptions, different methods have been proposed that recursively estimate posterior densities by generating a collection of samples and associated importance weights. In this paper, we introduced a new class of particle filtering methods that aim at the estimation of system states from available observations without a priori knowledge of any probability density functions. The proposed method is based on cost functions that measure the quality of the state signal estimates given the available observations.

×103 ×104 20 2

0 15

−2 10 y y −4 5 −6

0 −8

−5 −10 −5 0 5 10 15 −1 0 1 2 x ×104 x ×104

True trajectory True trajectory BF (M-Gaussian) BF BF (Gaussian) CRPF (multinomial) CRPF (q = 2, µ2) CRPF (local)

(a) (a)

350 250

300 200 250 150 200

150 100

100 Joint mean abs. dev. (m)

Joint mean abs. dev. (m) 50 50 0 0 0 200 400 600 800 0 200 400 600 800 t × 5(s) t × 5(s) BF BF (M-Gaussian) CRPF (multinomial) BF (Gaussian) CRPF (local) CRPF (q = 2, µ2)

(b) (b)

Figure 6: Mixture Gaussian system and observation noise— Figure 7: Local versus multinomial resampling. Ts = 5 seconds. Gaussian BF. Ts = 5 seconds. (a) Trajectory. (b) Mean absolute de- (a) Trajectory. (b) Mean absolute deviation. viation.

Since they do not assume explicit probabilistic models for the dynamic system, the proposed techniques, which have been termed CRPFs, are more robust than standard particle filters in problems where there is uncertainty (or a mismatch with the physical phenomena) in the probabilistic model of the dynamic system. The basic concepts related to the formulation and design of these new algorithms, as well as theoretical results concerning their convergence, were provided. We also proposed a local resampling scheme that allows for simple implementations of the CRPF techniques with parallel VLSI hardware. Computer simulation results illustrate the robustness and the excellent performance of the proposed algorithms when compared to the popular auxiliary BF.

APPENDICES

A. CRPF AND SA

It is interesting to compare the CRPF method with the SA algorithm. The subject of SA can be traced back to the 1951 paper of Robbins and Monro [19], and a recent tutorial review can be found in [14].

×104 ×104 10 5

8 4

6 3

y 4 y 2

2 1

0 0

−2 −1 −2 0 2 4 6 8 −2 0 2 4 6 x ×104 x ×104

True trajectory True trajectory BF BF CRPF (mean) CRPF CRPF (min)

(a) (a)

250 100

200 80

150 60

100 40 Joint mean abs. dev. (m) 50 Joint mean abs. dev. (m) 20

0 0 0 200 400 600 800 0 200 400 600 800 t × 5(s) t × 5(s)

BF BF CRPF (mean) CRPF CRPF (min)

(b) (b)

Figure 8: Different estimation criteria. Ts = 5 seconds. (a) Trajec- Figure 9: Laplacian. Ts = 5 s. (a) Trajectory. (b) Mean absolute tory. (b) Mean absolute deviation. deviation.

In a typical problem addressed by SA, an objective function that has to be minimized involves expectations, for example, the minimization of E(Q(x, ψ_t)), where Q(·) is a function of the unknown x and random variables ψ_t. The problem is that the distributions of the random variables are unknown, and the expectation of the function cannot be analytically found. To make the problem tractable, one approximates the expectation by simply dropping the expectation operator and proceeding as if E(Q(x, ψ_t)) = Q(x, ψ_t). Robbins and Monro proposed the following scheme that solves for x_t:

    x̂_t = x̂_{t−1} + γ_t Q(x̂_{t−1}, ψ_t),                                                  (A.1)

where γ_t is a sequence of positive scalars that have to satisfy the conditions Σ_t γ_t = ∞ and Σ_t γ_t² < ∞. In the signal processing literature, the best known SA method is the LMS algorithm.

The CRPF method also attempts to estimate the unknown x_t without probabilistic assumptions. In doing so, it actually aims at inverting the dynamic model and, therefore, it performs SA similarly to RM, though by other means.⁴ In CRPF, the dynamics of the state are taken into account both through the propagation step and by recursively solving the optimization problem (25). Further research in CRPF from the perspective of SA can probably yield new and deeper insight into this new class of algorithms.

⁴ Note, however, that the notion of inversion must be understood in a broad sense, since f_y may not necessarily be invertible and, even if f_y^{−1} exists, it may happen that y_t does not belong to its domain.

B. PROOF OF LEMMA 1

The proof is carried out in two steps. First, we prove the implication

    lim_{M→∞} ((1 − δ)/δ) S_out E_{Pr[n_M]}[n_M] / µ_t(S^M(x_t^opt, ε)) = 0                 (B.1)
        ⇓
    lim_{M→∞} Pr[ 1 − µ_t(S^M(x_t^opt, ε)) / µ_t({x_t^(i)}_{i=1}^M) ≥ δ ] = 0               (B.2)

for any ε, δ > 0. Then, we only need to show that (B.1) holds true under conditions (38)–(40) in order to complete the proof.

Straightforward manipulation of the inequality in (43) leads to the following equivalence chain that holds true for any ε, δ > 0:

    lim_{M→∞} Pr[ 1 − µ_t(S^M(x_t^opt, ε)) / µ_t({x_t^(i)}_{i=1}^M) ≥ δ ] = 0               (B.3)
        ⇕
    lim_{M→∞} Pr[ µ_t(S^M(x_t^opt, ε)) ≤ (1 − δ) µ_t({x_t^(i)}_{i=1}^M) ] = 0               (B.4)
        ⇕
    lim_{M→∞} Pr[ µ_t(S^M(x_t^opt, ε)) ≤ ((1 − δ)/δ) µ_t({x_t^(i)}_{i=1}^M \ S^M(x_t^opt, ε)) ] = 0,   (B.5)

where we have exploited that

    µ_t({x_t^(i)}_{i=1}^M) = µ_t(S^M(x_t^opt, ε)) + µ_t({x_t^(i)}_{i=1}^M \ S^M(x_t^opt, ε)).   (B.6)

Using the notation

    1_A(x) = 1 if x ∈ A,  0 otherwise,                                                      (B.7)

for the indicator function, we can write

    µ_t({x_t^(i)}_{i=1}^M \ S^M(x_t^opt, ε))
        = Σ_{i=1}^M µ(C(x_t^(i) | y_t)) 1_{{x : ||x − x_t^opt|| ≥ ε}}(x_t^(i))
        ≤ S_out Σ_{i=1}^M 1_{{x : ||x − x_t^opt|| ≥ ε}}(x_t^(i)) = n_M S_out,               (B.8)

where n_M is the cardinality of the discrete set {x_t^(i)}_{i=1}^M \ S^M(x_t^opt, ε). Therefore, using (B.8) and the equivalence between (B.3) and (B.5), we arrive at the implication

    lim_{M→∞} Pr[ µ_t(S^M(x_t^opt, ε)) ≤ ((1 − δ)/δ) S_out n_M ] = 0
        ⇓
    lim_{M→∞} Pr[ 1 − µ_t(S^M(x_t^opt, ε)) / µ_t({x_t^(i)}_{i=1}^M) ≥ δ ] = 0.              (B.9)

Since both µ(C(x_t | y_t)) ≥ 0 and (1 − δ) S_out n_M / δ > 0, we can use the relationship [16, equation 4.4-5] to obtain

    Pr[ µ_t(S^M(x_t^opt, ε)) ≤ ((1 − δ)/δ) S_out n_M ]
        ≤ ((1 − δ)/δ) S_out E_{Pr[n_M]}[n_M] / µ_t(S^M(x_t^opt, ε)),                        (B.10)

where we have used the fact that the supremum S_out does not depend on M or n_M. When jointly considered, (B.9) and (B.10) yield the implication (B.1) ⇒ (B.2), and we only have to show that (B.1) holds true in order to complete the proof.

The expectation on the left-hand side of (B.1) can be computed by resorting to assumption (38), which yields, after straightforward manipulations,

    lim_{M→∞} E_{Pr[n_M]}[n_M] = (1 − γ) lim_{M→∞} M   (i.p.).                              (B.11)

Substituting (B.11) into (B.1) yields

    lim_{M→∞} ((1 − δ)/δ) S_out E_{Pr[n_M]}[n_M] / µ_t(S^M(x_t^opt, ε))
        = ((1 − δ)/δ) S_out (1 − γ) lim_{M→∞} M / µ_t(S^M(x_t^opt, ε))
        ≤ ((1 − δ)/δ) S_out (1 − γ) lim_{M→∞} M / S_in = 0,                                 (B.12)

where the last equality is obtained from assumptions (39) and (40). The proof is complete by going back to the implication (B.1) ⇒ (B.2).
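The Robbins–Monro recursion (A.1) referenced in Appendix A is easy to illustrate: it drives x̂ toward the root of E[Q(x, ψ)] = 0 using only noisy evaluations, with step sizes γ_t = 1/t, which satisfy Σγ_t = ∞ and Σγ_t² < ∞. The target function below (a noisy linear function with root at x = 2) is an invented example, not one from the paper.

```python
import random

random.seed(7)

def Q(x, psi):
    # Noisy observation of the decreasing mean function 2 - x (root at x = 2);
    # psi plays the role of the random disturbance psi_t in (A.1).
    return (2.0 - x) + psi

x_hat = 0.0
for t in range(1, 20001):
    gamma_t = 1.0 / t                 # sum diverges, sum of squares converges
    psi_t = random.gauss(0.0, 1.0)
    x_hat = x_hat + gamma_t * Q(x_hat, psi_t)
```

The decaying step size is what lets the iterates average out the noise while still moving far enough to reach the root; with a constant step size the iterates would keep fluctuating, which is the LMS trade-off mentioned in Appendix A.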

C. PROOF OF THEOREM 1

Using Lemma 1, we obtain that the set S^M(x_t^opt, ε) has (asymptotically) a unit probability mass after the propagation step, that is,

    lim_{M→∞} Σ_{i : x_t^(i) ∈ S^M(x_t^opt, ε)} π̃_t^(i) = 1   (i.p.)  ∀ε > 0.              (C.1)

Therefore, the limit

    lim_{M→∞} | C(x_t^opt | y_t) − C_t |
        = lim_{M→∞} | C(x_t^opt | y_t) − Σ_{i : x_t^(i) ∈ S^M(x_t^opt, ε)} π̃_t^(i) C(x_t^(i) | y_t) |   (i.p.)   (C.2)

can be upper bounded as

    lim_{M→∞} | C(x_t^opt | y_t) − Σ_{i : x_t^(i) ∈ S^M(x_t^opt, ε)} π̃_t^(i) C(x_t^(i) | y_t) |
        ≤ lim_{M→∞} | C(x_t^opt | y_t) − sup_{x_t ∈ S^M(x_t^opt, ε)} C(x_t | y_t) |   (i.p.).   (C.3)

We write the upper bound on the right-hand side of (C.3) as a function of the radius ε:

    B(ε) = lim_{M→∞} | C(x_t^opt | y_t) − sup_{x_t ∈ S^M(x_t^opt, ε)} C(x_t | y_t) |.       (C.4)

It can be easily proved that lim_{M→∞} #S^M(x_t^opt, ε = 1/√M) = ∞, where # denotes the number of elements in a discrete set. Since lim_{M→∞} 1/√M = 0 and it is assumed that C(· | y_t) is continuous and bounded in S(x_t^opt, ε) for all ε > 0, it follows that

    B(ε = 1/√M) = 0   (i.p.)                                                                (C.5)

and, by exploiting the fact that the left-hand side of (C.2) does not depend on ε, we can readily use (C.5) to obtain

    lim_{M→∞} | C(x_t^opt | y_t) − C_t | ≤ B(ε = 1/√M) = 0   (i.p.),                        (C.6)

which concludes the proof.

D. PROOF OF COROLLARY 1

When λ = 0, π̃_t^(i) = π_t^(i) and, according to Lemma 1,

    lim_{M→∞} Σ_{i : x_t^(i) ∈ S^M(x_t^opt, ε)} π_t^(i) = 1   (i.p.)                        (D.1)

for all ε > 0. Hence, we can write the mean state estimate (in the limit M → ∞) as

    lim_{M→∞} x̃_t^mean = lim_{M→∞} Σ_{x_t^(i) ∈ S^M(x_t^opt, ε)} π_t^(i) x_t^(i)           (D.2)

and, therefore, the incremental cost of the mean state estimate can be upper bounded as

    lim_{M→∞} C(x̃_t^mean | y_t) ≤ lim_{M→∞} sup_{x_t ∈ S^M(x_t^opt, ε)} C(x_t | y_t).      (D.3)

Using inequality (D.3) and the obvious fact that C(x_t^opt | y_t) is minimal by definition, we find that

    lim_{M→∞} | C(x̃_t^mean | y_t) − C(x_t^opt | y_t) |
        ≤ lim_{M→∞} | sup_{x_t ∈ S^M(x_t^opt, ε)} C(x_t | y_t) − C(x_t^opt | y_t) | = B(ε)   (D.4)

for all ε > 0, where B(ε) is the same as defined in (C.4). Therefore, we can apply the same technique as in the proof of Theorem 1 and, taking ε = 1/√M, we obtain

    lim_{M→∞} | C(x̃_t^mean | y_t) − C(x_t^opt | y_t) | ≤ B(ε = 1/√M) = 0   (i.p.),         (D.5)

which concludes the proof of the corollary.

ACKNOWLEDGMENTS

This work has been supported by the Ministerio de Ciencia y Tecnología of Spain and Xunta de Galicia (Project TIC2001-0751-C04-01), and by the National Science Foundation (Award CCR-0082607). The authors wish to thank the anonymous reviewers of this paper for their very constructive comments that assisted us in improving the original draft. J. Míguez would also like to thank Professor Ricardo Cao for intellectually stimulating discussions which certainly contributed to the final form of this work.

REFERENCES

[1] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME—Journal of Basic Engineering, vol. 82, pp. 35–45, March 1960.
[2] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, USA, 1979.
[3] S. Julier, J. Uhlmann, and H. F. Durrant-Whyte, "A new method for the nonlinear transformation of means and covariances in filters and estimators," IEEE Trans. on Automatic Control, vol. 45, no. 3, pp. 477–482, 2000.
[4] D. Crisan and A. Doucet, "A survey of convergence results on particle filtering methods for practitioners," IEEE Trans. Signal Processing, vol. 50, no. 3, pp. 736–746, 2002.
[5] W. J. Fitzgerald, "Markov chain Monte Carlo methods with applications to signal processing," Signal Processing, vol. 81, no. 1, pp. 3–18, 2001.
[6] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.
[7] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[8] N. Gordon, D. Salmond, and A. F. M. Smith, "Novel approach to nonlinear and non-Gaussian Bayesian state estimation," IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[9] M. K. Pitt and N. Shephard, "Auxiliary variable based particle filters," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., pp. 273–293, Springer, New York, NY, USA, 2001.
[10] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[11] M. L. Honig, U. Madhow, and S. Verdú, "Blind adaptive multiuser detection," IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 944–960, 1995.
[12] J. R. Treichler, M. G. Larimore, and J. C. Harp, "Practical blind demodulators for high-order QAM signals," Proceedings of the IEEE, vol. 86, no. 10, pp. 1907–1926, 1998.
[13] J. Míguez and L. Castedo, "A linearly constrained constant modulus approach to blind adaptive multiuser interference suppression," IEEE Communications Letters, vol. 2, no. 8, pp. 217–219, 1998.
[14] T. L. Lai, "Stochastic approximation," The Annals of Statistics, vol. 31, no. 2, pp. 391–406, 2003.
[15] M. Bolić, P. M. Djurić, and S. Hong, "New resampling algorithms for particle filters," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 2, pp. 589–592, Hong Kong, April 2003.
[16] H. Stark and J. W. Woods, Probability and Random Processes with Applications to Signal Processing, Prentice-Hall, Upper Saddle River, NJ, USA, 3rd edition, 2002.
[17] M. Bolić, P. M. Djurić, and S. Hong, "Resampling algorithms for particle filters suitable for parallel VLSI implementation," in Proc. 37th Annual Conference on Information Sciences and Systems (CISS '03), Baltimore, Md, March 2003.
[18] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P.-J. Nordlund, "Particle filters for positioning, navigation, and tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 425–437, 2002.
[19] H. Robbins and S. Monro, "A stochastic approximation method," The Annals of Mathematical Statistics, vol. 22, pp. 400–407, 1951.

Joaquín Míguez was born in Ferrol, Galicia, Spain, in 1974. He obtained the Licenciado en Informática (M.S.) and Doctor en Informática (Ph.D.) degrees from Universidade da Coruña, Spain, in 1997 and 2000, respectively. Late in 2000, he joined Departamento de Electrónica e Sistemas, Universidade da Coruña, where he became an Associate Professor in July 2003. From April to December 2001, he was a Visiting Scholar in the Department of Electrical and Computer Engineering, Stony Brook University. His research interests are in the field of statistical signal processing, with emphasis on the topics of Bayesian analysis, sequential Monte Carlo methods, adaptive filtering, stochastic optimization, and their applications to multiuser communications, smart antenna systems, sensor networks, target tracking, and vehicle positioning and navigation.

Mónica F. Bugallo received the Ph.D. degree in computer engineering from Universidade da Coruña, Spain, in 2001. From 1998 to 2000, she was with the Departamento de Electrónica e Sistemas, the Universidade da Coruña, where she worked in interference cancellation applied to multiuser communication systems. In 2001, she joined the Department of Electrical and Computer Engineering, Stony Brook University, where she is currently an Assistant Professor and teaches courses in digital communications and information theory. Her research interests lie in the area of statistical signal processing and its applications to different disciplines including communications.

Petar M. Djurić received his B.S. and M.S. degrees in electrical engineering from the University of Belgrade in 1981 and 1986, respectively, and his Ph.D. degree in electrical engineering from the University of Rhode Island in 1990. From 1981 to 1986, he was a Research Associate with the Institute of Nuclear Sciences, Vinča, Belgrade. Since 1990, he has been with Stony Brook University, where he is a Professor in the Department of Electrical and Computer Engineering. He works in the area of statistical signal processing, and his primary interests are in the theory of modeling, detection, estimation, and time series analysis and its application to a wide variety of disciplines including wireless communications and biomedicine.

EURASIP Journal on Applied Signal Processing 2004:15, 2295–2305
© 2004 Hindawi Publishing Corporation

A Particle Filtering Approach to Change Detection for Nonlinear Systems

Babak Azimi-Sadjadi
Electrical, Computer, and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA
The Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
Email: [email protected]

P. S. Krishnaprasad
The Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
Email: [email protected]

Received 13 September 2003; Revised 22 March 2004

We present a change detection method for nonlinear stochastic systems based on particle filtering. We assume that the parameters of the system before and after the change are known. The statistic for this method is chosen in such a way that it can be calculated recursively, while the computational complexity of the method remains constant with respect to time. We present simulation results that show the advantages of this method compared to linearization techniques.

Keywords and phrases: nonlinear filtering, generalized likelihood ratio test, CUSUM algorithm, online change detection.

1. INTRODUCTION

Page states the change detection problem as follows [1]: “Whenever observations are taken in order it can happen that the whole set of observations can be divided into subsets, each of which can be regarded as a random sample from a common distribution, each subset corresponding to a different parameter value of the distribution. The problems to be considered in this paper are concerned with the identification of the subsamples and the detection of changes in the parameter value.”

We refer to a change or an abrupt change as any change in the parameters of the system that happens either instantaneously or much faster than any change that the nominal bandwidth of the system allows.

The key difficulty of all change detection methods is that of detecting intrinsic changes that are not necessarily directly observed but are measured together with other types of perturbations [2].

The change detection could be offline or online. In online change detection, we are only interested in detecting the change as quickly as possible (e.g., to minimize the detection delay with fixed mean time between false alarms), and the estimate of the time when the change occurs is not of importance. In offline change detection, we assume that the whole observation sequence is available at once, and finding the estimate of the time of change could be one of the goals of the detection method. In this paper, we limit our concern to online detection of abrupt changes.

The change detection methods that we consider here can be classified under the general name of likelihood ratio (LR) methods. Cumulative sum (CUSUM) and generalized LR (GLR) tests are among these methods. CUSUM was first proposed by Page [1]. The most basic CUSUM algorithm assumes that the observation signal is a sequence of stochastic variables which are independent and identically distributed (i.i.d.) with known common probability density function before the change time, and i.i.d. with another known probability density after the change time. In the CUSUM algorithm, the log-likelihood ratio for the observations from time i to time k is calculated, and its difference with its current minimum is compared with a certain threshold. If this difference exceeds the threshold, an alarm is issued.

Properties of the CUSUM algorithm have been studied extensively. Its most important property is its asymptotic optimality, which was first proven in [3]. More precisely, CUSUM is optimal, with respect to the worst mean delay, when the mean time between false alarms goes to infinity. This asymptotic point of view is convenient in practice because a low rate of false alarms is always desirable.

In the case of unknown system parameters after change, the GLR algorithm can be used as a generalization of the CUSUM algorithm. Since, in this algorithm, the exact information of the change pattern is not known, the LR is maximized over all possible change patterns.1

1 If the maximum does not exist, the supremum of the LR should be calculated.
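As a concrete illustration of the CUSUM recursion described above, the following sketch detects a mean shift in i.i.d. Gaussian data. The densities, the threshold, and the change time here are illustrative assumptions, not taken from the paper:

```python
import math
import random

def cusum_alarm(ys, loglik0, loglik1, threshold):
    """Basic CUSUM: accumulate the log-likelihood ratio and raise an
    alarm when its excess over the running minimum crosses a threshold.
    The recursion g_k = max(0, g_{k-1} + ln[p1(y_k)/p0(y_k)]) is
    equivalent to comparing the cumulative sum with its minimum."""
    g = 0.0
    for k, y in enumerate(ys):
        g = max(0.0, g + loglik1(y) - loglik0(y))
        if g >= threshold:
            return k  # alarm time
    return None

def loggauss(y, mu):
    # log density of N(mu, 1)
    return -0.5 * (y - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)

# toy data: the mean of unit-variance Gaussian noise jumps from 0 to 2 at k = 50
random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(50)] + \
     [random.gauss(2.0, 1.0) for _ in range(50)]
alarm = cusum_alarm(ys,
                    loglik0=lambda y: loggauss(y, 0.0),
                    loglik1=lambda y: loggauss(y, 2.0),
                    threshold=10.0)
print(alarm)  # an alarm a few samples after the change at k = 50
```

Because the pre-change log-likelihood-ratio increments have negative mean, g hovers near zero until the change, after which it drifts upward and crosses the threshold within a few samples.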

For stochastic systems with linear dynamics and linear observations, the observation sequence is not i.i.d. Therefore, the regular CUSUM algorithm cannot be applied for detection of changes in such systems. However, if such systems are driven by Gaussian noise, the innovation process associated with the system is known to be a sequence of independent random variables. The regular CUSUM algorithm or its more general counterpart, GLR, can be applied to this innovation process [2, 4].

In this paper, we are interested in the change detection problem for stochastic systems with nonlinear dynamics and observations. We show that for such systems, the complexity of the CUSUM algorithm grows with respect to time. This growth in complexity cannot be tolerated in practical problems. Therefore, instead of the statistic used in the CUSUM algorithm, we introduce an alternative statistic. We show that with this statistic, the calculation of the LR can be done recursively and the computational complexity of the method stays constant with respect to time.

Unlike the linear case, change detection for nonlinear stochastic systems has not been investigated in any depth. In the cases where a nonlinear system experiences a sudden change, linearization and change detection methods for linear systems are the main tools for solving the change detection problem (see, e.g., [5]). The reason this subject has not been pursued is clear; even when there is no change, the estimation of the state of the system, given the observations, results in an infinite-dimensional nonlinear filter [6], and the change in the system can only make the estimation harder.

In the last decade, there has been an increasing interest in simulation-based nonlinear filtering methods. These filtering methods are based on a gridless approximation of the conditional density of the state, given the observations. Gridless simulation-based filtering, now known by many different names such as particle filtering (PAF) [7, 8], the condensation algorithm [9], the sequential Monte Carlo (SMC) method [10], and Bayesian bootstrap filtering [11], was first introduced by Gordon et al. [11] and then it was rediscovered independently by Isard and Blake [9] and Kitagawa [12].

The theoretical results regarding the convergence of the approximate conditional density given by PAF to the true conditional density (in some proper sense) suggest that this method is a strong alternative for nonlinear filtering [7]. The advantage of this method over the nonlinear filter is that PAF is a finite-dimensional filter. The authors believe that PAF and its modifications are a starting point to study change detection for nonlinear stochastic systems. In this paper, we use the results in [13] and we develop a new change detection method for nonlinear stochastic systems.

In [13], we showed that when the number of satellites is below a critical number, linearization methods such as extended Kalman filtering (EKF) result in an unacceptable position error for an integrated inertial navigation system/global positioning system (INS/GPS). We also showed that the approximate nonlinear filtering methods, the projection particle filter [13], in particular, are capable of providing an acceptable estimate of the position in the same situation.

If the carrier phase is used for position information in an integrated INS/GPS, one sudden change that happens rather often is the cycle slip. A cycle slip happens when the phase of the received signal estimated by the phase lock loop in the receiver has a sudden jump. An integrated INS/GPS with a carrier phase receiver is used as an application for the method introduced in this paper for detection of a cycle slip with known strength.

In Section 2, we state the approximate nonlinear filtering method used in this paper. In Section 3, we briefly define the change detection problem. In Section 4, we review the CUSUM algorithm for linear systems with additive changes. Then, in Sections 5 and 6, we present a new change detection method for nonlinear stochastic systems. In Section 7, we lay out the formulation for an integrated INS/GPS. In Section 8, we present some simulation results. In Section 9, we summarize the results and lay out the future work.

2. APPROXIMATE NONLINEAR FILTERING

Consider the dynamical system

x_{k+1} = f_k(x_k) + G_k(x_k) w_k,
y_k = h_k(x_k) + v_k,   (1)

where the distribution of x_0 is given, x_k ∈ R^n, y_k ∈ R^d, and w_k ∈ R^q and v_k ∈ R^d are white noise processes with known statistics, and the functions f_k(·) and h_k(·) and the matrix G_k(·) have the proper dimensions. The noise processes w_k, v_k, k = 0, 1, ..., and the initial condition x_0 are assumed independent.

We assume the initial distribution for x_0 is given. The goal is to find the conditional distribution of the state, given the observations, that is, P_k(dx_k | Y_1^k), where Y_1^k = {y_1, y_2, ..., y_k} is the observation up to and including time k. The propagation of the conditional distribution, at least conceptually, can be expressed as follows [6].

Step (1). Initialization:

P_0(dx_0 | y_0) = P(dx_0).   (2)

Step (2). Diffusion:

P_{(k+1)^-}(dx_{k+1} | Y_1^k) = ∫ P(dx_{k+1} | x_k) P_k(dx_k | Y_1^k).   (3)

Step (3). Bayes' rule update:

P_{k+1}(dx_{k+1} | Y_1^{k+1}) = p(y_{k+1} | x_{k+1}) P_{(k+1)^-}(dx_{k+1} | Y_1^k) / ∫ p(y_{k+1} | x_{k+1}) P_{(k+1)^-}(dx_{k+1} | Y_1^k).   (4)

Step (4). k ← k + 1; go to Step (2).

We have assumed that P(dy_{k+1} | x_{k+1}) = p(y_{k+1} | x_{k+1}) dy_{k+1}, where p(y_{k+1} | x_{k+1}) is the conditional density of the observation, given the state at time k + 1.

The conditional distribution given by the above steps is exact, but in general, it can be viewed as an infinite-dimensional filter, thus not implementable. PAF, in brief,

is an approximation method that mimics the above calculations with a finite number of operations using the Monte Carlo method. Algorithm 1 shows one manifestation of PAF [7, 11].

Algorithm 1 (particle filtering).
Step (1). Initialization. Sample N i.i.d. random vectors x_0^1, ..., x_0^N from the initial distribution P_0(dx).
Step (2). Diffusion. Find x̂_{k+1}^1, ..., x̂_{k+1}^N from the given x_k^1, ..., x_k^N, using the dynamic rule x_{k+1} = f_k(x_k) + G_k(x_k) w_k.
Step (3). Use Bayes' rule and find the empirical distribution

P_{k+1}^N(dx) = Σ_{j=1}^N [ p(y_{k+1} | x̂_{k+1}^j) / Σ_{i=1}^N p(y_{k+1} | x̂_{k+1}^i) ] δ_{x̂_{k+1}^j}(dx).

Step (4). Resampling. Sample x_{k+1}^1, ..., x_{k+1}^N according to P_{k+1}^N(dx).
Step (5). k ← k + 1; go to Step (2).

It is customary to call x_k^1, ..., x_k^N particles. The key idea in PAF is to eliminate the particles that have low importance weights p(y_k | x_k) and to multiply particles having high importance weights [11, 14]. The surviving particles are thus approximately distributed according to P_k^N(dx). This automatically makes the approximation one of better resolution in the areas where the probability is higher.

In the simulations in this paper, we use a modified version of the classical PAF method called projection PAF. For completeness' sake, we repeat the algorithm that was given in [13]. In projection PAF, we assume that the conditional density of the state of the system, given the observations, is close to an exponential family of densities S defined as follows.2

Definition 1 (Brigo [15]). Let {c_1(·), ..., c_p(·)} be affinely independent3 scalar functions defined on k-dimensional Euclidean space R^k. Assume that

Θ_0 = { θ ∈ R^p : Υ(θ) = log ∫ exp(θ^T c(x)) dx < ∞ }   (5)

is a convex set with a nonempty interior, where c(x) = [c_1(x), ..., c_p(x)]^T. Then S, defined as

S = { p(·, θ), θ ∈ Θ },   (6)

is called an exponential family of densities.

With this definition for the exponential family of densities, the projection PAF algorithm is stated as in Algorithm 2.

Algorithm 2 (projection particle filtering for an exponential family of densities).
Step (1). Initialization. Sample N i.i.d. random vectors x_0^1, ..., x_0^N from the density p_0(x).
Step (2). Diffusion. Find x̂_{k+1}^1, ..., x̂_{k+1}^N from the given x_k^1, ..., x_k^N, using the dynamic rule x_{k+1} = f_k(x_k) + G_k(x_k) w_k.
Step (3). Find the MLE θ̂_{(k+1)^-} of θ, given x̂_{k+1}^1, ..., x̂_{k+1}^N:

θ̂_{(k+1)^-} = arg max_θ Π_{i=1}^N exp( θ^T c(x̂_{k+1}^i) − Υ(θ) ).

Step (4). Use Bayes' rule:

p(x, θ̂_{k+1}) = exp( θ̂_{(k+1)^-}^T c(x) − Υ(θ̂_{(k+1)^-}) ) p(y_{k+1} | x) / ∫ exp( θ̂_{(k+1)^-}^T c(x) − Υ(θ̂_{(k+1)^-}) ) p(y_{k+1} | x) dx.

Step (5). Resampling. Sample x_{k+1}^1, ..., x_{k+1}^N according to p(x, θ̂_{k+1}).
Step (6). k ← k + 1; go to Step (2).

3. CHANGE DETECTION: PROBLEM DEFINITION

Online detection of a change can be formulated as follows [2]. Let Y_1^k be a sequence of observed random variables with conditional densities p_θ(y_k | y_{k−1}, ..., y_1). Before the unknown change time t_0, the parameter θ of the conditional density is constant and equal to θ_0. After the change, this parameter is equal to θ_1. In online change detection, one is interested in detecting the occurrence of such a change. The exact time and the estimation of the parameters before and after the change are not required. In the case of multiple changes, we assume that the changes are detected fast enough so that in each time instance, only one change needs to be considered. Online change detection is performed by a stopping rule [2]

t_a = inf { k : g_k(Y_1^k) ≥ λ },   (7)

where λ is a threshold, {g_k}_{k≥1} is a family of functions, and t_a is the alarm time, that is, the time when the change is detected. If t_a < t_0, a false alarm has occurred.

where x_k ∈ R^n, y_k ∈ R^d, and w_k ∈ R^q and v_k ∈ R^d are white noise processes with known statistics; F_k, G_k, H_k, Γ_k, and Ξ_k are matrices of proper dimensions; and Υ_x(k, t_0) and Υ_y(k, t_0) are the dynamic profiles of the assumed changes, of dimensions ñ ≤ n and d̃ ≤ d, respectively. w_k and v_k are white Gaussian noise processes, independent of the initial condition x_0. It is assumed that Υ_x(k, t_0) = 0 and Υ_y(k, t_0) = 0 for k < t_0.

The functions f_k^0(·), f_k^1(·), h_k^0(·), h_k^1(·) and the matrices G_k^0(·), G_k^1(·) have the proper dimensions. The sudden change occurs when i_k changes from 0 to 1. In this setup, S_j^k can be written as follows:

S_j^k = ln [ p(Y_j^k | Y_1^{j−1}, t_0 = j) / p(Y_j^k | Y_1^{j−1}, t_0 > k) ].   (13)

[Figure 1 diagram: a bank of nonlinear filters run in parallel, one branch per change hypothesis, propagating P(dx_k | Y_0^{k−1}, t_0 = j) for j = 1, ..., k − 1 alongside the no-change branch P(dx_k | Y_0^{k−1}, t_0 ≥ k).]

Figure 1: Combination of nonlinear filters used in the CUSUM change detection algorithm.

that is, joins them. The time needed for the branch of the independent nonlinear filter to join the other forked branches depends on the convergence rate and the target accuracy of the approximation.

Although the procedure mentioned above can be used for asymptotically stable nonlinear filters, there are several problems associated with this method. The known theoretical results for identifying asymptotically stable filters are limited to either requiring compactness of the state space [18, 19, 20] or very special cases of the observation equation [17]. The rate of convergence of the filters in different branches is another potential shortcoming of the mentioned procedure. If the convergence rate is low in comparison to the rate of parameter change in the system, then the algorithm cannot take advantage of this convergence.

6. NONLINEAR CHANGE DETECTION: NONGROWING COMPUTATIONAL COMPLEXITY

In this section, we introduce a new statistic to overcome the problem of growing computational complexity of the change detection method. We emphasize that the parameters of the system before and after the change are assumed to be known. Therefore, the conditional density of the state of the system, given the observations, can be calculated using a nonlinear filter. We show that this statistic can be calculated recursively.

Consider the following statistic:

T_j^k = ln [ p(Y_j^k | Y_1^{j−1}, t_0 ∈ {j, ..., k}) / p(Y_j^k | Y_1^{j−1}, t_0 > k) ].   (16)

The change detection algorithm based on the statistic T_j^k can be presented as

t_a = min { k ≥ j | T_j^k ≥ λ or T_j^k ≤ −α },   (17)

where j is the last time when T_j^k ≥ λ or T_j^k ≤ −α, and λ > 0 and α > 0 are chosen such that the detection delay is minimized for a fixed mean time between two false alarms.

For the rest of this paper, we assume that the probability of change at time i, conditioned on no change before i, is q, that is,

P(t_0 = i | t_0 ≥ i) = q.   (18)

Without loss of generality and to simplify the notation, we assume that j = 1. To calculate the statistic T_1^k, it is sufficient to find P(dx_k, t_0 ≤ k | Y_1^k) and P(dx_k, t_0 > k | Y_1^k). Then T_1^k is given by

T_1^k = ln [ (1 − q)^k P(dx_k, t_0 ≤ k | Y_1^k) / ( (1 − (1 − q)^k) P(dx_k, t_0 > k | Y_1^k) ) ]
      = ln [ (1 − q)^k W_k^1 / ( (1 − (1 − q)^k) W_k^0 ) ],   (19)

where W_k^0 = P(dx_k, t_0 > k | Y_1^k) and W_k^1 = P(dx_k, t_0 ≤ k | Y_1^k). Therefore, to calculate T_1^k recursively, it is sufficient to calculate P(dx_k, t_0 ≤ k | Y_1^k) and P(dx_k, t_0 > k | Y_1^k) recursively. P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^{k+1}) can be written as follows:

P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^{k+1})
= P(dx_{k+1}, t_0 ≤ k + 1, y_{k+1} | Y_1^k) / [ ∫ P(dx_{k+1}, t_0 ≤ k + 1, y_{k+1} | Y_1^k) + ∫ P(dx_{k+1}, t_0 > k + 1, y_{k+1} | Y_1^k) ]
= p(y_{k+1} | x_{k+1}, t_0 ≤ k + 1) P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) / [ ∫ p(y_{k+1} | x_{k+1}, t_0 ≤ k + 1) P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) + ∫ p(y_{k+1} | x_{k+1}, t_0 > k + 1) P(dx_{k+1}, t_0 > k + 1 | Y_1^k) ]
= p_1(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) / [ ∫ p_1(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) + ∫ p_0(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 > k + 1 | Y_1^k) ],   (20)

where p_1(y_{k+1} | x_{k+1}) is the conditional density of the observation y_{k+1}, given the state x_{k+1}, under the hypothesis that the change has occurred, and p_0(y_{k+1} | x_{k+1}) is the same conditional density under the hypothesis that no change has occurred. Similarly, for P(dx_{k+1}, t_0 > k + 1 | Y_1^{k+1}), we have the following:

P(dx_{k+1}, t_0 > k + 1 | Y_1^{k+1})
= p_0(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 > k + 1 | Y_1^k) / [ ∫ p_1(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) + ∫ p_0(y_{k+1} | x_{k+1}) P(dx_{k+1}, t_0 > k + 1 | Y_1^k) ].   (21)

Also, we have

P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k)
= ∫ P(dx_{k+1}, t_0 ≤ k + 1 | x_k, t_0 ≤ k, Y_1^k) P(dx_k, t_0 ≤ k | Y_1^k)
  + ∫ P(dx_{k+1}, t_0 ≤ k + 1 | x_k, t_0 > k, Y_1^k) P(dx_k, t_0 > k | Y_1^k)
= W_k^1 ∫ P_1^{k+1}(dx_{k+1} | x_k) P(dx_k | t_0 ≤ k, Y_1^k)
  + q W_k^0 ∫ P(dx_{k+1} | x_k, t_0 = k + 1) P(dx_k | t_0 > k, Y_1^k),   (22)

P(dx_{k+1}, t_0 > k + 1 | Y_1^k)
= (1 − q) W_k^0 ∫ P_0^{k+1}(dx_{k+1} | x_k) P(dx_k | t_0 > k, Y_1^k).   (23)

Equations (19)–(23) show that the statistic T_1^k can be calculated recursively. They also show that in the prediction step of the nonlinear filter at each time instance, only two conditional distributions should be calculated, P(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) and P(dx_{k+1}, t_0 > k + 1 | Y_1^k). Therefore, if a PAF method is used to approximate (22) and (23), we only need two sets of particles to approximate these two conditional distributions. In the Bayes' rule update step and the resampling step of the PAF, (20) and (21) are used. One possible way of implementing (19)–(23) using a PAF method is as follows.

In Algorithm 3, the same number of particles is used to find the conditional distribution before and after the change. This guarantees that enough particles are always available for approximating the conditional densities before and after the change.
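To make the two-population recursion concrete, the following sketch implements (19)–(23) for a toy scalar model. The model, the value of q, the post-change drift, and all numerical settings are illustrative assumptions, not the paper's INS/GPS setup:

```python
import math
import random

def change_detect_pf(ys, q=0.01, n=500):
    """Two-population particle filter sketch of the recursion (19)-(23).
    Toy model: before the change x_{k+1} = 0.9 x_k + w_k, after the change
    x_{k+1} = 0.9 x_k + 2 + w_k, and y_k = x_k + v_k with w, v ~ N(0, 1).
    One particle set represents P(dx_k, t0 > k | Y), the other
    P(dx_k, t0 <= k | Y); w0 and w1 carry the masses W_k^0 and W_k^1."""
    xs0 = [random.gauss(0.0, 1.0) for _ in range(n)]
    xs1 = list(xs0)
    w0, w1 = 1.0, 0.0
    stats = []
    for k, y in enumerate(ys, start=1):
        # diffusion: each population uses its own dynamic rule
        new0 = [0.9 * x + random.gauss(0.0, 1.0) for x in xs0]
        new1 = [0.9 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs1]
        # prior mixing (22)-(23): mass q*w0 migrates to the change hypothesis
        w0n, w1n = (1.0 - q) * w0, w1 + q * w0
        keep = int(n * (w1 / w1n)) if w1n > 0 else 0
        xs1 = random.choices(new1, k=keep) + random.choices(new0, k=n - keep)
        xs0 = new0
        w0, w1 = w0n, w1n
        # Bayes' rule update (20)-(21): reweight the hypothesis masses
        l0 = [math.exp(-0.5 * (y - x) ** 2) for x in xs0]
        l1 = [math.exp(-0.5 * (y - x) ** 2) for x in xs1]
        a0, a1 = w0 * sum(l0) / n, w1 * sum(l1) / n
        w0, w1 = a0 / (a0 + a1), a1 / (a0 + a1)
        # resampling within each population
        xs0 = random.choices(xs0, weights=l0, k=n)
        xs1 = random.choices(xs1, weights=l1, k=n)
        # statistic (19) with the geometric prior P(t0 > k) = (1-q)^k
        pk = (1.0 - q) ** k
        stats.append(math.log(pk * w1) - math.log((1.0 - pk) * w0))
    return stats

# simulate: the dynamics acquire the +2 drift at k = 60
random.seed(2)
x, ys = 0.0, []
for k in range(100):
    x = 0.9 * x + (2.0 if k >= 60 else 0.0) + random.gauss(0.0, 1.0)
    ys.append(x + random.gauss(0.0, 1.0))
stats = change_detect_pf(ys)
```

Before the change the statistic stays negative (the change population persistently mistracks the data), and after the change it grows rapidly; note that the per-step cost is fixed by n, independent of k, which is the whole point of the recursion.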

In (22) and (23), P_0^{k+1}(dx_{k+1} | x_k) and P_1^{k+1}(dx_{k+1} | x_k) are the Markov transition kernels under the hypothesis that no change has occurred before k + 1 and that the change has occurred before k, respectively. P(dx_{k+1} | x_k, t_0 = k + 1) is the Markov transition kernel under the hypothesis that the change occurs at k + 1. For the dynamics in (11), we have P(dx_{k+1} | x_k, t_0 = k + 1) = P_0^{k+1}(dx_{k+1} | x_k).

In the remaining sections, we use the introduced statistic T_j^k for detecting a sudden change in the phase measurement of an integrated INS/GPS.

7. INTEGRATED INS/GPS

GPS provides worldwide accurate positioning if four or more satellites are in view of the receiver. Although the satellite

Algorithm 3 (change detection using particle filtering).
Step (1). Initialization. Sample 2N i.i.d. random vectors from the initial distribution P_0(dx, t_0 > 0): x_0^1, ..., x_0^N with the weight W_0^0/N, and x_0^{N+1}, ..., x_0^{2N} with the weight W_0^1/N, where W_0^1 = 1 − W_0^0 = ε. (Since we start with the assumption that no change has happened, ε is either 0 or a very small number.)
Step (2). Diffusion. Find x̂_{k+1}^1, ..., x̂_{k+1}^N from the given x_k^1, ..., x_k^N using the dynamic rule x_{k+1} = f_k^0(x_k) + G_k^0(x_k) v_k, and find x̂_{k+1}^{N+1}, ..., x̂_{k+1}^{2N} from the given x_k^{N+1}, ..., x_k^{2N} using the dynamic rule x_{k+1} = f_k^1(x_k) + G_k^1(x_k) v_k.
Step (3). Find an estimate for P(dx_{k+1} | t_0 > k, Y_1^k) and P(dx_{k+1} | t_0 ≤ k, Y_1^k), either by using the empirical distributions

P_{(k+1)^-}^N(dx_{k+1} | t_0 > k, Y_1^k) = (1/N) Σ_{j=1}^N δ_{x̂_{k+1}^j}(dx_{k+1}),
P_{(k+1)^-}^N(dx_{k+1} | t_0 ≤ k, Y_1^k) = (1/N) Σ_{j=N+1}^{2N} δ_{x̂_{k+1}^j}(dx_{k+1}),

or by using Step (3) of Algorithm 2. Then

P_{(k+1)^-}^N(dx_{k+1}, t_0 > k + 1 | Y_1^k) = (1 − q) W_k^0 P_{(k+1)^-}^N(dx_{k+1} | t_0 > k, Y_1^k),
P_{(k+1)^-}^N(dx_{k+1}, t_0 ≤ k + 1 | Y_1^k) = q W_k^0 P_{(k+1)^-}^N(dx_{k+1} | t_0 > k, Y_1^k) + W_k^1 P_{(k+1)^-}^N(dx_{k+1} | t_0 ≤ k, Y_1^k).

Step (4). Use Bayes' rule, (20) and (21), to find P_{k+1}^N(dx_{k+1}, t_0 > k + 1 | Y_1^{k+1}) and P_{k+1}^N(dx_{k+1}, t_0 ≤ k + 1 | Y_1^{k+1}). Then set W_{k+1}^0 = P_{k+1}^N(dx_{k+1}, t_0 > k + 1 | Y_1^{k+1}) and W_{k+1}^1 = P_{k+1}^N(dx_{k+1}, t_0 ≤ k + 1 | Y_1^{k+1}).
Step (5). Resampling. Sample x_{k+1}^1, ..., x_{k+1}^N according to P_{k+1}^N(dx_{k+1}, t_0 > k + 1 | Y_1^{k+1})/W_{k+1}^0, and sample x_{k+1}^{N+1}, ..., x_{k+1}^{2N} according to P_{k+1}^N(dx_{k+1}, t_0 ≤ k + 1 | Y_1^{k+1})/W_{k+1}^1.
Step (6). k ← k + 1; go to Step (2).

Table 1: Definition of the parameters for the WGS84 reference frame.

a       6378137.0 m               Semimajor axis
b       6356752.3142 m            Semiminor axis
ω_ie    7.292115 × 10^−5 rad/s    Earth angular velocity
e       √(a² − b²)/a              Ellipsoid eccentricity

constellation guarantees availability of four or more satellites worldwide, natural or man-made obstacles can block the satellite signals easily. Integrating dead reckoning or INS with GPS [21, 22, 23, 24] is a method to overcome this vulnerability. Here, INS or the dead reckoning provides positioning that is calibrated by the GPS. In this section, we consider the case of an integrated INS/GPS. In [25], we showed that using nonlinear filtering for positioning is essential. We compared the proposed PAF with regular PAF and EKF.

Using carrier phase measurements enables differential GPS to reach centimeter-level accuracy. A phase lock loop cannot measure the full-cycle part of the carrier phase. This unmeasured part is called the integer ambiguity, and it has to be resolved through other means. In this paper, we assume that the integer ambiguity is resolved (see, e.g., [26]). However, the measured phase can experience a sudden change undetected by the phase lock loop. This sudden change is called a cycle slip, and if it is undetected by the integrated INS/GPS, it results in an error in the estimated position. We will use the method introduced in this paper to detect such changes.

We consider the observation equation provided by the ith GPS satellite at time k to have the form

y_k^i = ρ^i(p_x, p_y, p_z) − ρ^i(b_x, b_y, b_z) + n_i(t, t_0) + cδ + v_k^i,   (24)

where [p_x, p_y, p_z]^T and [b_x, b_y, b_z]^T are the rover and (known) base coordinates at time k, respectively, ρ^i(x_1, x_2, x_3) is the distance from the point [x_1, x_2, x_3]^T to satellite i, δ is a combination of the receiver clock biases in the base and the rover, c is the speed of light, v_k^i is the measurement noise for the ith satellite signal, t_0 is the unknown time of the cycle slip, and n_i(t, t_0) = 0 for t < t_0.

The main goal in the simulations is to detect the change in the phase measurement as soon as it happens. Here we point out that the nonlinearity in the measurement is not solely due to the function ρ. Integrated INS/GPS requires coordinate transformations between INS parameters and GPS parameters, which contributes to the nonlinearity of the measurement.

Parameters of an integrated INS/GPS are expressed in different coordinate systems. The GPS measurements are given in an earth-centered earth-fixed (ECEF) frame [27, 28]. The GPS position is given either in the rectangular coordinate system (see (24)) or in the geodetic coordinate system with the familiar latitude, longitude, and height coordinate vector [p_λ, p_φ, p_h]^T. For the latter, the earth's geoid is approximated by an ellipsoid. Table 1 shows the defining parameters for the geoid according to the WGS84 reference frame.

The parameters and measurements of INS are given in the local geographical frame or in the body frame system, where the velocity is given by the north-east-down velocity vector [v_N, v_E, v_D]^T. The transformation matrix from the body frame to the local geographical frame is given by the matrix R_{b2g}. In this paper, we assume the estimation problem for the gyro measurements is solved, hence R_{b2g} is known.
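As a small numerical check of Table 1, the ellipsoid eccentricity and the curvature radii R_λ and R_φ that appear in the INS equations can be computed directly from the two axes (the 45° test latitude is an arbitrary choice):

```python
import math

# WGS84 defining axes from Table 1
a = 6378137.0        # semimajor axis (m)
b = 6356752.3142     # semiminor axis (m)

# ellipsoid eccentricity: e = sqrt(a^2 - b^2) / a
e = math.sqrt(a**2 - b**2) / a

def curvature_radii(lat_rad):
    """Meridian radius R_lambda and prime-vertical radius R_phi
    at geodetic latitude lat_rad, as used in the INS dynamics (25)."""
    s2 = math.sin(lat_rad) ** 2
    r_lam = a * (1 - e**2) / (1 - e**2 * s2) ** 1.5
    r_phi = a / math.sqrt(1 - e**2 * s2)
    return r_lam, r_phi

print(round(e, 6))                    # ~0.081819
r_lam, r_phi = curvature_radii(math.radians(45.0))
print(round(r_lam), round(r_phi))     # both within a fraction of a percent of a
```

This confirms that the flattening of the geoid changes the effective earth radius by only a few tens of kilometers, which is nevertheless far too large to neglect at GPS accuracy levels.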

The GPS clock drift and the INS equations constitute key dynamics in integrated INS/GPS. The INS dynamic equation can be expressed as follows (see [25] for details):

d[p_λ, p_φ, p_h]^T = [ v_N/(R_λ + p_h), v_E/((R_φ + p_h) cos p_λ), −v_D ]^T dt,

d[v_N, v_E, v_D]^T =
  [ −(v_E²/(R_φ + p_h)) tan p_λ − 2 ω_ie sin(p_λ) v_E + v_N v_D/(R_λ + p_h) ;
    (v_E v_N/(R_λ + p_h)) tan p_λ + 2 ω_ie sin(p_λ) v_N + v_E v_D/(R_φ + p_h) + 2 ω_ie cos(p_λ) v_D ;
    −v_N²/(R_λ + p_h) − v_E²/(R_φ + p_h) − 2 ω_ie cos(p_λ) v_E ] dt
  + ( R_{b2g} ( [ã_u, ã_v, ã_w]^T + [b_u, b_v, b_w]^T ) + [0, 0, g]^T ) dt + dw_t^v,   (25)

where g = 9.780327 m/s² is the gravitational acceleration, R_λ = a(1 − e²)/(1 − e² sin²(p_λ))^{3/2}, R_φ = a/(1 − e² sin²(p_λ))^{1/2}, [ã_u, ã_v, ã_w]^T and [b_u, b_v, b_w]^T are the accelerometer measurement and the accelerometer measurement bias, respectively, both expressed in the body frame, and w_t^v is a vector-valued Brownian motion process with zero mean and known covariance matrix. The measurement bias b = [b_u, b_v, b_w]^T has the dynamics

db = −a_b b dt + dw_t^b,   (26)

where w_t^b is a vector-valued Brownian motion with zero mean and known covariance matrix, and a_b is a small positive constant. The receiver clock drift δ_t is represented by the integration of an exponentially correlated random process ρ_t [24]:

dρ_t = −(1/500) ρ_t dt + dw_t^ρ,
dδ_t = ρ_t dt,   (27)

where w_t^ρ is a zero-mean Brownian motion process with variance σ_ρ² = 10^−24. This dynamic model is typical for a quartz TCXO with frequency drift rate 10^−9 s/s [24].

8. SIMULATIONS AND RESULTS

The dimension of the dynamical system in the simulation is eleven. The state x of the dynamical system is given as follows:

x = [p_λ, p_φ, p_h, v_N, v_E, v_D, b_u, b_v, b_w, ρ, δ]^T.   (28)

The differential equation describing the dynamics of the system is the combination of the differential equations in (25), (26), and (27). Here we assume that a_b = 0.001 and that the covariance matrices for the Brownian motions in the INS dynamic equations, Σ_b and Σ_v, are diagonal. To be more specific, Σ_b = 10^−6 I and Σ_v = 10^−4 I, where I is the identity matrix of the proper size. The observation equation is given in (24), where y_k^i is one component of the observation vector. The dimension of the observation vector is the same as the number of available satellites. In (24), the observation is given as a function of the position in the ECEF rectangular coordinate system, which is then transformed to the geodetic coordinate system [25].

For the simulation, we simply chose an eleven-dimensional Gaussian density for the projection PAF. This choice of density makes the random vector generation easy and computationally affordable. To be able to use the projection PAF, we used the maximum likelihood criterion to estimate the parameters of the Gaussian density before and after the Bayes' rule correction.

We used two Novatel GPS receivers to collect navigation data (April 2, 2000). From this data, we extracted position information for the satellites, the pseudorange, and the carrier phase measurement noise powers for the L1 frequency. From the collected information we generated the pseudorange and the carrier phase data for one static and one moving receiver (base and rover, respectively). We assume that for the carrier phase measurement, the integer ambiguity problem is already solved (see [26] for details). The movement of the INS/GPS platform was simulation based, and it was the basis for the measurement data measured by the accelerometers, the gyros, the GPS pseudorange, and the GPS carrier phase data.

As a precursor, we note that in the simulation in [13] we showed that for an integrated INS/GPS, when the number of satellites is less than a critical number, projection PAF provides a very accurate estimate of the position, while the position solution given by EKF is unacceptable. In Figures 2 and 3, a comparison of the position estimation error in the rectangular coordinate system for one typical run of each method is shown. For that simulation, we assumed that the GPS receiver starts with six satellites. At time t = 100, the receiver loses the signal from three satellites, and it gains one satellite signal back at t = 400. From these two figures, it is clear that when the number of satellites in view is below a certain number (here four satellites), the EKF is unable to provide a reasonable estimate of the position for the integrated INS/GPS. Since the error of the position estimate of linearization methods is unacceptable even when no change in the phase measurement occurs, using these methods in the presence of an abrupt change is fruitless as well.
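The clock model (27) can be simulated with a simple Euler-Maruyama discretization; the 1-second step size and one-hour horizon below are illustrative assumptions, not values from the paper:

```python
import math
import random

def simulate_clock_drift(t_end=3600.0, dt=1.0, sigma_rho2=1e-24):
    """Euler-Maruyama discretization of the receiver clock model (27):
    d rho = -(1/500) rho dt + dw,  d delta = rho dt,
    with Brownian increment variance sigma_rho2 * dt."""
    rho, delta = 0.0, 0.0
    std_w = math.sqrt(sigma_rho2 * dt)
    for _ in range(int(t_end / dt)):
        rho += -(1.0 / 500.0) * rho * dt + random.gauss(0.0, std_w)
        delta += rho * dt
    return rho, delta

random.seed(3)
rho, delta = simulate_clock_drift()
# rho settles near its stationary scale sqrt(250 * sigma_rho^2) ~ 1.6e-11 s/s,
# so the accumulated clock bias delta stays tiny over one hour
print(rho, delta)
```

Multiplying such a delta by the speed of light shows why even this slow drift matters at the centimeter accuracy level targeted by carrier phase positioning.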

Figure 2: Position estimation error (= Euclidean distance between the Cartesian position vector and its estimate) for three methods: EKF, PAF, and projection PAF. The system starts with six satellites in view. At t = 100 seconds, the signals from three satellites are lost. At t = 400 seconds, the system regains the signal from one satellite.

Figure 4: The plot of T_j^k with respect to time for 100 independent runs of an INS/GPS system. At time t = 15, the receiver loses three satellites. The cycle slip in channel one occurred at t = 20 seconds.

Figure 5: The plot of g_k with respect to time for 100 independent runs of an INS/GPS system. g_k is the statistic used in the CUSUM algorithm. At time t = 15, the receiver loses three satellites. The cycle slip in channel one occurred at t = 20 seconds.

Figure 3: Details of Figure 2, where the difference between the projection PAF method and the PAF method is clear.

To apply our method described in (17) for sudden phase change detection in an integrated INS/GPS, we use projection PAF as our nonlinear filtering method. We use the CUSUM algorithm to evaluate the proposed change detection scheme, comparing the statistic in (17) with that of the CUSUM algorithm. We wish to emphasize that in the example given in this section, we assume that the parameter of change, before and after the change, is known, and the only unknown parameter is the change time. In the future, we will address the more general problem of unknown change parameters.

For the simulation in this paper, we assumed that the phase lock loop associated with satellite one experiences a cycle slip at time t = 20 and the phase changes suddenly. The size of the change is one cycle. We assumed that the GPS receiver starts with six satellites. At time t = 15, the receiver loses three satellites (we eliminate them from the data). We used Algorithm 2 to calculate the statistic g_k in the CUSUM algorithm, and Algorithm 3 with projection PAF to calculate T_j^k. In Figure 4, we have plotted T_j^k with respect to time for

100 independent runs. In Figure 5, we have plotted the statistic g_k for the same 100 independent runs. The figures show that there are sudden changes both in T_j^k and g_k when a cycle slip occurs, and this is true for all 100 runs. These figures also show that for this simulation, the performance of the algorithm given in this paper is comparable to the performance of the CUSUM algorithm. This simulation shows that T_j^k can be used successfully to detect a cycle slip of known strength.

9. CONCLUSION AND FUTURE WORK

In this paper, we developed a new method for the detection of abrupt changes for known parameters after change. We showed that, unlike the CUSUM algorithm, the statistic in this method can be calculated recursively for nonlinear stochastic systems. In the future, we intend to extend our results to the case where the parameters after change are unknown. The major obstacle in this extension is the complexity of the change detection method.

ACKNOWLEDGMENT

This research was supported in part by the Army Research Office under ODDR&E MURI97 Program Grant no. DAAG55-97-1-0114 to the Center for Dynamics and Control of Smart Structures (through Harvard University), under ODDR&E MURI01 Program Grant no. DAAD19-01-1-0465 to the Center for Communicating Networked Control Systems (through Boston University), and by the National Science Foundation Learning and Intelligent Systems Initiative Grant no. CMS9720334.

REFERENCES

[1] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, pp. 100–115, 1954.
[2] M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
[3] G. Lorden, "Procedures for reacting to a change in distribution," Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, 1971.
[4] A. S. Willsky and H. L. Jones, "A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems," IEEE Trans. Automatic Control, vol. 21, no. 1, pp. 108–112, 1976.
[5] I. V. Nikiforov, "New optimal approach to global positioning system/differential global positioning system integrity monitoring," Journal of Guidance, Control and Dynamics, vol. 19, no. 5, pp. 1023–1033, 1996.
[6] P. S. Maybeck, Stochastic Models, Estimation, and Control, vol. 2, Academic Press, New York, NY, USA, 1982.
[7] P. Del Moral, "Nonlinear filtering: interacting particle solution," Markov Processes and Related Fields, vol. 2, no. 4, pp. 555–580, 1996.
[8] F. Le Gland, C. Musso, and N. Oudjane, "An analysis of regularized interacting particle methods in nonlinear filtering," in Proc. IEEE 3rd European Workshop on Computer-Intensive Methods in Control and Signal Processing, J. Rojíček, M. Valečková, M. Kárný, and K. Warwick, Eds., vol. 1, pp. 167–174, Prague, Czech Republic, September 1998.
[9] M. Isard and A. Blake, "Contour tracking by stochastic propagation of conditional density," in Proc. European Conference on Computer Vision (ECCV '96), vol. 1, pp. 343–356, Cambridge, UK, April 1996.
[10] P. Fearnhead, Sequential Monte Carlo methods in filter theory, Ph.D. thesis, Merton College, University of Oxford, Oxford, UK, 1998.
[11] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[12] G. Kitagawa, "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models," Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[13] B. Azimi-Sadjadi and P. S. Krishnaprasad, "Approximate nonlinear filtering and its applications for an integrated INS/GPS," to appear in Automatica.
[14] A. Doucet, N. de Freitas, and N. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds., pp. 1–14, Springer, New York, NY, USA, 2001.
[15] D. Brigo, Filtering by projection on the manifold of exponential densities, Ph.D. thesis, Department of Economics and Econometrics, Vrije Universiteit, Amsterdam, The Netherlands, October 1996.
[16] B. R. Crain, "Exponential models, maximum likelihood estimation, and the Haar condition," Journal of the American Statistical Association, vol. 71, no. 355, pp. 737–740, 1976.
[17] A. Budhiraja and D. Ocone, "Exponential stability in discrete-time filtering for non-ergodic signals," Stochastic Processes and Their Applications, vol. 82, no. 2, pp. 245–257, 1999.
[18] R. Atar and O. Zeitouni, "Exponential stability for nonlinear filtering," Annales de l'Institut Henri Poincaré. Probabilités et Statistiques, vol. 33, no. 6, pp. 697–725, 1997.
[19] R. Atar, "Exponential stability for nonlinear filtering of diffusion processes in a noncompact domain," The Annals of Probability, vol. 26, no. 4, pp. 1552–1574, 1998.
[20] F. Le Gland and N. Oudjane, "Stability and approximation of nonlinear filters in the Hilbert metric, and application to particle filters," in Proc. IEEE 39th Conference on Decision and Control, vol. 2, pp. 1585–1590, Sydney, Australia, December 2000.
[21] E. Abbott and D. Powell, "Land-vehicle navigation using GPS," Proceedings of the IEEE, vol. 87, no. 1, pp. 145–162, 1999.
[22] G. Elkaim, M. O'Connor, T. Bell, and B. Parkinson, "System identification of a farm vehicle using carrier-phase differential GPS," in Proc. 9th International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GPS-96), pp. 485–494, Kansas City, Mo, USA, September 1996.
[23] R. Zickel and N. Nehemia, "GPS aided dead reckoning navigation," in Proc. 1994 National Technical Meeting, pp. 577–586, San Diego, Calif, USA, January 1994.
[24] H. Carvalho, P. Del Moral, A. Monin, and G. Salut, "Optimal nonlinear filtering in GPS/INS integration," IEEE Trans. Aerospace and Electronic Systems, vol. 33, no. 3, pp. 835–850, 1997.
[25] B. Azimi-Sadjadi, Approximate nonlinear filtering with applications to navigation, Ph.D. thesis, University of Maryland at College Park, College Park, Md, USA, August 2001.
[26] B. Azimi-Sadjadi and P. S. Krishnaprasad, "Integer ambiguity resolution in GPS using particle filtering," in Proc. American Control Conference, vol. 5, pp. 3761–3766, Arlington, Va, USA, June 2001.
[27] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins, Global Positioning System: Theory and Practice, Springer, New York, NY, USA, 2nd edition, 1993.
[28] J. A. Farrell and M. Barth, The Global Positioning System and Inertial Navigation, McGraw-Hill, New York, NY, USA, 1998.

Change Detection for Nonlinear Systems 2305

Babak Azimi-Sadjadi received his B.S. degree from Sharif University of Technology in 1989, his M.S. degree from Tehran University in 1992, and his Ph.D. degree from the University of Maryland at College Park in 2001, all in electrical engineering. He is currently with the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, where he is a Research Assistant Professor. He also has a joint appointment, as a Research Associate, with the Institute for Systems Research, University of Maryland. His research interests include nonlinear filtering, networked control systems, and wireless networks.

P. S. Krishnaprasad received the Ph.D. degree from Harvard University in 1977. He was on the faculty of the Systems Engineering Department at Case Western Reserve University from 1977 to 1980. He has been with the University of Maryland since August 1980, where he has held the position of Professor of electrical engineering since 1987 and has held a joint appointment with the Institute for Systems Research since 1988. He is also a member of the Applied Mathematics faculty. He has held visiting positions with Erasmus University (Rotterdam); the Department of Mathematics at the University of California, Berkeley; the Department of Mathematics at the University of Groningen (The Netherlands); the Mathematical Sciences Institute at Cornell University; and the Mechanical and Aerospace Engineering Department at Princeton University. Krishnaprasad's interests include geometric control theory, filtering and signal processing theory, acoustics, and biologically inspired approaches to control, sensing, and computation. He is currently actively engaged in the study of collectives, that is, communicating networked control systems. P. S. Krishnaprasad was elected a Fellow of the IEEE in 1990.

EURASIP Journal on Applied Signal Processing 2004:15, 2306–2314
© 2004 Hindawi Publishing Corporation

Particle Filtering for Joint Symbol and Code Delay Estimation in DS Spread Spectrum Systems in Multipath Environment

Elena Punskaya
Signal Processing Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Email: [email protected]

Arnaud Doucet
Signal Processing Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Email: [email protected]

William J. Fitzgerald
Signal Processing Laboratory, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Email: [email protected]

Received 19 June 2003; Revised 18 June 2004

We develop a new receiver for joint symbol, channel characteristics, and code delay estimation for DS spread spectrum systems under conditions of multipath fading. The approach is based on particle filtering techniques and combines sequential importance sampling, a selection scheme, and a variance reduction technique. Several algorithms involving both deterministic and randomized schemes are considered, and an extensive simulation study is carried out in order to demonstrate the performance of the proposed methods.

Keywords and phrases: direct sequence spread spectrum system, multipath fading, particle filters, signal detection, synchronization.

1. INTRODUCTION

Direct sequence (DS) spread spectrum systems are robust to many channel impairments, allow multiuser CDMA and low-detectability signal transmission, and are therefore widely used in different areas of digital communications. Unlike many other communication systems, however, spread spectrum receivers require additional code synchronization, which can be a rather challenging task under conditions of multipath fading, when severe amplitude and phase variations take place.

The problem of joint symbol, delay, and multipath estimation has been addressed in the literature before (see, e.g., [1, 2]), and proved to be a difficult one due to its inherent nonlinearity. The previously proposed approaches were mainly based on the use of the extended Kalman filter (EKF). However, many of them concentrated on channel parameter and delay estimation only; moreover, in a number of cases, when EKF methods were applied, the estimated parameters were divergent [1]. Joint signal detection and channel estimation was performed using deterministic maximum likelihood (DML) methods [3, 4]. However, since the unknown parameters of interest were assumed deterministic in this case, a serious drawback of DML-type approaches was the phenomenon of error propagation. Later, a stochastic maximum likelihood (ML) approach for the estimation of channel parameters was adopted, with subsequent symbol detection using Viterbi algorithms [5]. The space-alternating generalized expectation maximization (SAGE) scheme for maximum a posteriori (MAP) estimation was presented in [6].

In this paper, we propose to estimate the channel parameters, code delays, and symbols jointly using particle filtering techniques, a set of powerful and versatile simulation-based methods that have recently appeared in the literature (see [7] for a survey). The idea is to approximate the posterior distribution of interest by swarms of N (N ≫ 1) weighted points in the sample space, called particles, which evolve randomly in time in correlation with each other and either give birth to offspring particles or die according to their ability to represent the different zones of interest of the state space.

The methods have already been successfully applied to problems arising in digital communications, in particular, demodulation in fading channels [7, 8, 9] and detection in synchronous CDMA [10]. In all this work, the unknown fading channel characteristics were integrated out and only the symbols needed to be imputed. The algorithm thus made use of the structure of the model, and the unknown state involved discrete parameters only. Later investigation [10, 11], however, revealed some concerns regarding the efficiency of the standard randomized particle filtering techniques in this context. It has been shown that, for a fixed computational complexity, more efficient deterministic schemes can be designed, leading to improved receiver performance. We attempt here to study these results further and compare various randomized and nonrandomized approaches.

Iltis [12] has recently developed a particle filtering method to address a problem closely related to ours. However, in his approach, the unknown symbol sequence is obtained through a standard algorithm, and only channel parameters and code delays are estimated using particle filtering. The problem we are dealing with is more complex, since it involves both discrete (symbols) and continuous-valued (delays) unknowns. The deterministic particle method, unfortunately, is not directly applicable in this case. However, in view of the recent results, we propose to combine it with sequential importance sampling for the mixed, discrete and continuous-valued parameter case, followed by an appropriate selection procedure. The resulting algorithm explores the state space in a more systematic way at little or no extra cost in comparison with standard particle filtering employing a suboptimal importance distribution. We develop and test this approach against other deterministic and stochastic schemes, and demonstrate its performance by means of an extensive simulation study.

The remainder of the paper is organized as follows. The model specifications and estimation objectives are stated in Section 2. In Section 3, a particle filtering method is developed for joint symbol/channel coefficients/code delay estimation. This section also introduces and reviews several alternative deterministic and stochastic schemes, with simulation results and comparisons presented in Section 4. Some conclusions are drawn at the end of the paper.

2. PROBLEM STATEMENT AND ESTIMATION OBJECTIVES

Transmitted waveform

We denote, for any generic sequence κ_t, κ_{i:j} = (κ_i, κ_{i+1}, ..., κ_j)^T, and let d_n be the nth information symbol and s_tx(τ) the corresponding analog bandpass spread spectrum signal waveform transmitted in the symbol interval of duration T:

    s_tx(τ) = Re{ s_n(d_{1:n}) u(τ) exp(j 2π f_0 τ) }  for (n − 1)T < τ ≤ nT,    (1)

where s_n(·) performs the mapping from the digital sequence to waveforms and corresponds to the modulation technique employed, f_0 denotes the carrier frequency, and u(τ) is a wideband pseudonoise (PN) waveform defined by

    u(τ) = Σ_{k=1}^{K} c_k η(τ − k T_c).    (2)

Here, c_{1:K} is a spreading code sequence consisting of K chips (with values {±1}) per symbol, η(τ − k T_c) is a rectangular pulse of unit height and duration T_c, transmitted at (k − 1)T_c < τ ≤ k T_c, and T_c is the chip interval satisfying the relation T_c = T/K.

Channel model

The signal is passed through a noisy multipath fading channel, which causes random amplitude and phase variations on the signal. The channel can be represented by a time-varying tapped-delay line with taps spaced T_s seconds apart, where T_s is the Nyquist sampling rate for the transmitted waveform; T_s = T_c/2 due to the PN bandwidth being approximately 1/T_c (see the sampling theorem [13]). The equivalent discrete-time impulse response of the channel is given by

    h_t = Σ_{l=0}^{L−1} f_t^(l) δ_{t,l},    (3)

where t is a discrete time index, t = 1, 2, .... By L we understand the maximum number of paths (nonzero coefficients of h_t) of the channel [5], f_t^(l) are the complex-valued time-varying multipath coefficients arranged into the vector f_t, and δ_{t,l} denotes the Kronecker delta, which is 1 if t = l, and 0 otherwise.

We assume here that the channel coefficients f_t and code delay θ_t propagate according to the first-order autoregressive (AR) model

    f_t = A f_{t−1} + B v_t,   v_t ~ i.i.d. N_c(0, I_L),    (4)
    θ_t = γ θ_{t−1} + σ_θ ϑ_t,   ϑ_t ~ i.i.d. N(0, 1),    (5)

which corresponds to a Rayleigh uncorrelated scattering channel model; here A = diag(α^(0), ..., α^(L−1)) and B = diag(σ_f^(0), ..., σ_f^(L−1)), with σ_f^(l) being the standard deviation and α^(l) accounting for the Doppler spread (see [2, 14] for details and a discussion on the use of higher-order AR models). In this paper, the matrices A, B and the parameters γ and σ_θ are assumed known. Directions on the choice of these parameters are given in [2, 14].

Received signal

The complex output of the channel sampled at the Nyquist rate (in which case samples t = 2K(n − 1) + 1, ..., 2Kn correspond to the nth symbol transmitted, that is, d_n ↔ y_{2K(n−1)+1:2Kn}) can thus be expressed as

    y_t = C(d_{1:n}, θ_t) + σ ε_t,   ε_t ~ i.i.d. N_c(0, 1),    (6)

where C(d_{1:n}, θ_t) = Σ_{l=0}^{L−1} f_t^(l) s_rx((t − l)T_s − θ_t) and σ² is the noise variance. (The case of non-Gaussian noise can be easily treated using the techniques presented in [9].) The noise sequences ϑ_t, ε_t, and v_t^(l), l = 0, ..., L − 1, are assumed mutually independent and independent of the initial states f_0 ~ N_c(f̂_0, Σ_{f,0}), θ_0 ~ N(θ̂_0, Σ_{θ,0}). The received waveform s_rx(τ) is obtained after ideal lowpass filtering of the rectangular pulses and is given by [2]

    s_rx(τ) = s_n(d_{1:n}) (1/π) Σ_{k=1}^{K} c_k [ Si(2π (τ − (k − 1)T_c)/T_c) − Si(2π (τ − k T_c)/T_c) ]  for (n − 1)T < τ ≤ nT,    (7)

where

    Si(φ) = ∫_0^φ (sin ϕ / ϕ) dϕ.    (8)

Estimation objectives

The symbols d_n, which are assumed i.i.d., the channel characteristics f_t, and the code delay θ_t are unknown for n, t > 0. Our aim is to obtain, sequentially in time, an estimate of the joint posterior probability density of these parameters, p(d_{1:n}, f_{0:2Kn}, θ_{0:2Kn} | y_{1:2Kn}), and some of its characteristics, such as the MAP estimates of the symbols

    d̂_{1:n} = arg max_{d_{1:n}} p(d_{1:n} | y_{1:2Kn}),    (9)

and the minimum mean square error (MMSE) estimates of the channel characteristics, E(f_{0:2Kn} | y_{1:2Kn}), and of the delays, E(θ_{0:2Kn} | y_{1:2Kn}). This problem, unfortunately, does not admit any analytical solution and, thus, approximate methods must be employed. One method that has proved useful in practice is particle filtering, and, in the next section, we propose a receiver based on the use of these techniques.

3. PARTICLE FILTERING RECEIVER

Particle filtering receivers have already been designed in [7, 8, 9], although for a much simpler case involving symbol estimation only. The problem considered here is more complicated, since an additional continuous parameter is involved, and, in this section, the particle filtering algorithm for the joint estimation of all unknown parameters is detailed. We begin our treatment by incorporating a variance reduction technique, namely Rao-Blackwellisation, and then proceed with the derivation of the particle filtering equations for the estimation of the required posterior distribution. Alternative deterministic and stochastic approaches are considered at the end of the section.

3.1. Rao-Blackwellisation

In this paper, we follow a Bayesian approach and, given the measurements y_{1:2Kn}, base our inference on the joint posterior distribution p(d_{1:n}, f_{0:2Kn}, θ_{0:2Kn} | y_{1:2Kn}). A straightforward application of particle filtering would thus focus on the estimation of this joint probability distribution and, consequently, obtain the estimates of d_{1:n}, f_{0:2Kn}, and θ_{0:2Kn} sequentially in time. It is beneficial, however, to improve the standard approach by making the most of the structure of the model and applying variance reduction techniques.

Indeed, similar to [7, 8, 9, 15], the problem of estimating p(d_{1:n}, f_{0:2Kn}, θ_{0:2Kn} | y_{1:2Kn}) can be reduced to one of sampling from a lower-dimensional posterior p(d_{1:n}, θ_{0:2Kn} | y_{1:2Kn}). If an approximation of p(d_{1:n}, θ_{0:2Kn} | y_{1:2Kn}) could be obtained, say, via particle filtering:

    p_N(d_{1:n}, θ_{0:2Kn} | y_{1:2Kn}) = Σ_{i=1}^{N} w̄_n^(i) δ((d_{1:n}, θ_{0:2Kn}) − (d_{1:n}^(i), θ_{0:2Kn}^(i))),    (10)

one could compute the probability density p(f_{0:2Kn} | y_{1:2Kn}, d_{1:n}, θ_{0:2Kn}) using the Kalman filter associated with (4) and (6). As a result, the posterior p(f_{0:2Kn} | y_{1:2Kn}) could be approximated by a random mixture of Gaussians,

    p_N(f_{0:2Kn} | y_{1:2Kn}) = ∫ p(f_{0:2Kn} | y_{1:2Kn}, d_{1:n}, θ_{0:2Kn}) p_N(d_{1:n}, θ_{0:2Kn} | y_{1:2Kn}) dθ_{0:2Kn}
                              = Σ_{i=1}^{N} w̄_n^(i) p(f_{0:2Kn} | y_{1:2Kn}, d_{1:n}^(i), θ_{0:2Kn}^(i)),    (11)

leading to lower variance of the estimates and, therefore, increased algorithm efficiency [15].

Strictly speaking, we are interested in estimating the information symbols only, with the tracking of the channel being naturally incorporated into the proposed algorithm. However, following this approach, the MMSE (conditional mean) estimates of the fading coefficients can, of course, be obtained if necessary as follows:

    E_N(f_{2K(n−1)+1:2Kn} | y_{1:2Kn}) = ∫ f_{2K(n−1)+1:2Kn} p_N(f_{0:2Kn} | y_{1:2Kn}) df_{0:2Kn}
                                       = Σ_{i=1}^{N} w̄_n^(i) E(f_{2K(n−1)+1:2Kn} | y_{1:2Kn}, d_{1:n}^(i), θ_{0:2Kn}^(i)),    (12)

with E[f_{2K(n−1)+1:2Kn} | y_{1:2Kn}, d_{1:n}^(i), θ_{0:2Kn}^(i)] being computed by the Kalman filter, with 2K steps required for each symbol transmitted.

3.2. Particle filtering algorithm

We can now proceed with the estimation of p(d_{1:n}, θ_{0:2Kn} | y_{1:2Kn}) using particle filtering techniques. The method is based on the following remark. Suppose N particles {d_{1:n}^(i), θ_{0:n}^(i)}_{i=1}^N can easily be sampled from an importance distribution π(d_{1:n}, θ_{0:n} | y_{1:n}) (such that p(d_{1:n}, θ_{0:n} | y_{1:n}) > 0 implies π(d_{1:n}, θ_{0:n} | y_{1:n}) > 0). Then, using the importance sampling identity, an estimate of p(d_{1:n}, θ_{0:n} | y_{1:n}) is given by the following point mass approximation:

    p_N(d_{1:n}, θ_{0:n} | y_{1:n}) = Σ_{i=1}^{N} w̄_n^(i) δ((d_{1:n}, θ_{0:n}) − (d_{1:n}^(i), θ_{0:n}^(i))),    (14)

where w̄_n^(i) are the so-called normalized importance weights,

    w̄_n^(i) = w_n^(i) / Σ_{j=1}^{N} w_n^(j),   w_n^(i) ∝ p(d_{1:n}^(i), θ_{0:n}^(i) | y_{1:n}) / π(d_{1:n}^(i), θ_{0:n}^(i) | y_{1:n}),    (15)

and y_n denotes

    y_n = y_{2K(n−1)+1:2Kn}    (16)

for n = 1, 2, .... The distribution π(d_{1:n}^(i), θ_{0:n}^(i) | y_{1:n}) has to admit π(d_{1:n−1}^(i), θ_{0:n−1}^(i) | y_{1:n−1}) as a marginal distribution, so that one can propagate this estimate sequentially in time without subsequently modifying the past simulated trajectories. The weights w_n^(i) can also be updated online in this case:

    w_n^(i) ∝ w_{n−1}^(i) × p(y_n | d_{1:n}^(i), θ_{1:n}^(i), y_{1:n−1}) p(d_n^(i), θ_n^(i) | d_{n−1}^(i), θ_{n−1}^(i)) / π(d_n^(i), θ_n^(i) | d_{1:n−1}^(i), θ_{0:n−1}^(i), y_{1:n}).    (17)

The sequential importance sampling described above is combined with a selection procedure when the effective sample size N_eff,

    N_eff = ( Σ_{i=1}^{N} (w̄_n^(i))² )^(−1),    (18)

falls below some fraction of N, say N_thres (see [15] for details). This helps to avoid the degeneracy of the algorithm by discarding particles with low normalized importance weights and multiplying those with high ones.

Given, for the (n − 1)th symbol, N particles {d_{1:n−1}^(i), θ_{0:n−1}^(i)}_{i=1}^N distributed approximately according to p(d_{1:n−1}, θ_{0:n−1} | y_{1:n−1}), the general particle filtering receiver proceeds as in Algorithm 1.

Algorithm 1 (general particle filtering receiver):
Sequential importance sampling step:
  For i = 1, ..., N, sample (d_n^(i), θ_n^(i)) ~ π(d_n, θ_n | d_{1:n−1}^(i), θ_{0:n−1}^(i), y_{1:n}).
  For i = 1, ..., N, evaluate the importance weights w_n^(i) up to a normalizing constant.
  For i = 1, ..., N, normalize w_n^(i) to obtain w̄_n^(i).
Selection step:
  If N_eff < N_thres, select N particles by discarding/multiplying those with low/high normalized weights.

3.3. Implementation issues

The choice of importance distribution and selection scheme is discussed in [16]; depending on these choices, the computational complexity of the algorithm varies.

3.3.1. Importance density

Prior density

The simplest solution is to take the prior as the importance distribution, that is,

    π(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) = p(d_n) p(θ_n | θ_{n−1}) = p(d_n) Π_{t=2K(n−1)+1}^{2Kn} p(θ_t | θ_{t−1});    (19)

then w_n becomes

    w_n ∝ p(y_n | y_{1:n−1}, d_{1:n}, θ_{0:n}) = Π_{t=2K(n−1)+1}^{2Kn} p(y_t | d_{1:n}, θ_{0:t}, y_{1:t−1}),    (20)

which requires the evaluation of 2K one-step Kalman filter updates for each symbol, as shown in Algorithm 2.

Algorithm 2 (sequential importance sampling, prior as importance distribution):
For i = 1, ..., N, sample d_n^(i) ~ p(d_n), set w_n^(i) = 1.
For t = 2K(n − 1) + 1, ..., 2Kn:
  sample θ_t^(i) ~ p(θ_t | θ_{t−1}^(i));
  perform a one-step Kalman filter update:
    w_n^(i) = w_n^(i) p(y_t | d_{1:n}, θ_{0:t−1}^(i), θ_t^(i), y_{1:t−1}).
For i = 1, ..., N, normalize w_n^(i) to obtain w̄_n^(i).

If K is large, it is useful to resample the particles at intermediate steps between t = 2K(n − 1) + 1 and t = 2Kn. One can also use Markov chain Monte Carlo (MCMC) steps to rejuvenate the particles, and in particular d_n.
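Each weight factor p(y_t | ·) in Algorithm 2 is the predictive likelihood of a one-step Kalman filter running on the linear-Gaussian channel, and N_eff of (18) decides when to resample. The sketch below is ours, not the authors' code, and uses a simplified single-tap (L = 1) scalar observation y_t = c f_t + noise as a stand-in for the full model of (6):

```python
import numpy as np

def kf_step(m, P, y, c, a, q, r):
    """One Kalman prediction/update for a single complex fading tap
    f_t = a f_{t-1} + v_t, v_t ~ N_c(0, q), observed through
    y_t = c f_t + w_t, w_t ~ N_c(0, r). Returns the posterior mean and
    variance plus the Gaussian predictive likelihood p(y_t | y_{1:t-1}),
    which supplies the weight factor of Algorithm 2."""
    m_pred, P_pred = a * m, abs(a) ** 2 * P + q
    e = y - c * m_pred                   # innovation
    S = abs(c) ** 2 * P_pred + r         # innovation variance
    lik = np.exp(-abs(e) ** 2 / S) / (np.pi * S)   # circular complex Gaussian
    K = P_pred * np.conj(c) / S          # Kalman gain
    return m_pred + K * e, (1.0 - K * c).real * P_pred, lik

def ess(w):
    """Effective sample size of (18): 1 / sum of squared normalized
    weights; equals N for uniform weights and 1 for a degenerate set."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

# One particle's weight accumulated over two channel samples.
m, P, w = 0.0 + 0.0j, 1.0, 1.0
for y, c in [(0.3 + 0.1j, 1.0), (0.25 + 0.12j, -1.0)]:
    m, P, lik = kf_step(m, P, y, c, a=0.999, q=1e-4, r=0.1)
    w *= lik                             # weight update of Algorithm 2
```

In the receiver, every particle carries its own (m, P) pair, and resampling is triggered whenever ess(weights) < N_thres.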

Suboptimal importance density

Of course, using the prior distribution in our case can be inefficient, as no information carried by the observations is used to explore the state space. The optimal choice, in the sense of minimizing the conditional variance of the importance weights [15], would consist of taking

    π(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) = p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n})    (21)

as the importance density. From Bayes' rule, p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) may be expressed as

    p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) = p(y_n | y_{1:n−1}, d_{1:n−1}, d_n, θ_{0:n−1}, θ_n) p(d_n) p(θ_n | θ_{n−1}) / p(y_n | y_{1:n−1}, d_{1:n−1}, θ_{0:n−1}),    (22)

in which case the weight,

    w_n = p(y_n | y_{1:n−1}, d_{1:n−1}, θ_{0:n−1}) = Σ_{m=1}^{M} ∫ p(y_n | y_{1:n−1}, d_{1:n−1}, d_n = m, θ_{0:n−1}, θ̆_n) p(d_n = m) p(θ̆_n | θ_{n−1}) dθ̆_n,    (23)

cannot be computed analytically. Our aim then is to develop a suboptimal importance density "closest" to p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}).

The probability density p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) can be factorized as

    p(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) = p(d_n | d_{1:n−1}, θ_{0:n}, y_{1:n}) p(θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}),    (24)

where p(d_n | d_{1:n−1}, θ_{0:n}, y_{1:n}), which would be the optimal importance function if θ_n were fixed, is given by

    p(d_n | d_{1:n−1}, θ_{0:n}, y_{1:n}) = p(y_n | y_{1:n−1}, d_{1:n−1}, d_n, θ_{0:n}) p(d_n) / p(y_n | y_{1:n−1}, d_{1:n−1}, θ_{0:n}).    (25)

The second term in (24), p(θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}), unfortunately presents a problem, since the integral in (23) cannot be evaluated in closed form. As a solution, we propose to use the prior density p(θ_n | θ_{n−1}) instead of p(θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) and, thus, employ the following suboptimal importance function (see [17] for a similar approach developed independently):

    π(d_n, θ_n | d_{1:n−1}, θ_{0:n−1}, y_{1:n}) = p(d_n | d_{1:n−1}, θ_{0:n}, y_{1:n}) p(θ_n | θ_{n−1}).    (26)

The importance weights in this case can be calculated as

    w_n ∝ p(y_n | y_{1:n−1}, d_{1:n−1}, θ_{0:n}) = Σ_{m=1}^{M} p(y_n | y_{1:n−1}, d_{1:n−1}, d_n = m, θ_{0:n−1}, θ_n) p(d_n = m),    (27)

where θ_n is drawn from the prior Gaussian distribution with mean γθ_{n−1} and variance σ_θ²:

    θ_n^(i) ~ N(γθ_{n−1}^(i), σ_θ²)  for i = 1, ..., N.    (28)

The importance weight w_n^(i) in (27) does not actually depend on d_n^(i), so the weight evaluation and selection steps can be performed prior to the sampling of d_n^(i), as in Algorithm 3.

Algorithm 3 (evaluation of importance weights, suboptimal importance distribution):
For i = 1, ..., N:
  For m = 1, ..., M, set w_n^(i,m) = 1.
  For t = 2Kn + 1, ..., 2K(n + 1):
    sample θ_t^(i) ~ p(θ_t | θ_{t−1}^(i));
    for m = 1, ..., M, perform a one-step Kalman filter update:
      w_n^(i,m) = w_n^(i,m) p(y_t | d_{1:n−1}^(i), d_n = m, θ_{0:t−1}^(i), θ_t^(i), y_{1:t−1}).
  Evaluate the importance weight w_n^(i) up to a normalizing constant:
      w_n^(i) ∝ Σ_{m=1}^{M} w_n^(i,m) p(d_n = m).
For i = 1, ..., N, normalize w_n^(i) to obtain w̄_n^(i).

For each symbol detection, this procedure requires the evaluation of M 2K-step-ahead Kalman filters, which is quite computationally expensive. Further research should, therefore, concentrate on the development of other, more efficient suboptimal importance distributions on a case-by-case basis.

3.3.2. Selection

As far as the selection step is concerned, a stratified sampling scheme [18] is employed in this paper, since it achieves the minimum variance in the class of unbiased schemes [19]. The algorithm is based on generating N points equally spaced in the interval [0, 1], with the number of offspring N_i for each particle being equal to the number of points lying between the partial sums of weights q_{i−1} and q_i, where q_i = Σ_{j=1}^{i} w̄_t^(j). The procedure can be implemented in O(N) operations.

3.4. Deterministic particle filter

The use of the suboptimal importance distribution described in Section 3.3.1 increases the efficiency of the algorithm in comparison with the standard approach using the prior.
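The stratified scheme described above can be sketched as follows. This is a hedged NumPy version of ours; the O(N) merge against the cumulative weights is done here with a vectorized `searchsorted`, which yields the same offspring counts:

```python
import numpy as np

def stratified_resample(weights, rng):
    """Stratified resampling: draw one uniform point in each of the N
    equal strata [i/N, (i+1)/N) of [0, 1] and map it through the CDF of
    the normalized weights. Particle i receives as many offspring as
    there are points between the partial sums q_{i-1} and q_i."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    N = len(w)
    u = (np.arange(N) + rng.random(N)) / N   # one uniform per stratum
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                            # guard against rounding error
    return np.searchsorted(cdf, u)           # ancestor index of each offspring

rng = np.random.default_rng(1)
idx = stratified_resample([0.1, 0.2, 0.3, 0.4], rng)
```

Because each stratum contributes exactly one point, every offspring count N_i differs from N·w̄_i by less than one, which is the source of the variance reduction over multinomial resampling.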

However, as shown in [11], if one already opts for (27), and all the calculations have to be performed anyway, it might be better to base our approximation of p(d_{1:n}, θ_{0:n} | y_{1:n}) directly on

    p_{N×M}(d_{1:n}, θ_{0:n} | y_{1:n}) = Σ_{i=1}^{N} Σ_{m=1}^{M} w̄_n^(i,m) δ((d_{1:n}, θ_{0:n}) − (d_{1:n−1}^(i), d_n = m, θ_{0:n−1}^(i), θ_n^(i))),    (29)

where the corresponding weights w̄_n^(i,m) are equal to

    w_n^(i,m) ∝ w̄_{n−1}^(i) p(y_n | y_{1:n−1}, d_{1:n−1}^(i), d_n = m, θ_{0:n−1}^(i), θ_n^(i)) p(d_n = m),    (30)

and θ_n^(i) is still drawn from its prior (28). Indeed, all possible "extensions" of the existing state sequences at each step n are considered in this case, and one does not unnecessarily discard any information by selecting randomly one path out of the M available. In the above expression, w̄_{n−1}^(i) is the weight of the "parent" particle, which has M "offspring" instead of the usual one, resulting in a total number of N × M particles at each stage. This number increases exponentially with time, and, therefore, a selection procedure has to be employed at each step n.

The simplest way to perform such selection is just to choose the N most likely offspring and discard the others (as, e.g., in [20]). The superiority of this approach over other methods in the fully discrete framework is shown in [10, 11]. A more complicated procedure involves preserving the particles with high weights and resampling the ones with low weights, thus reducing their total number to N. An algorithm of this type is presented in [21], but other selection schemes can be designed. Contrary to the case involving discrete parameters only, in this scenario a resampling scheme with replacement can be employed, since θ_n^(i) is chosen randomly. Therefore, stratified resampling can be used in order to select N particles from the N × M particles available.

Whether we choose to preserve the most likely particles, employ the selection scheme proposed in [21], or use stratified resampling, the computational load of the resulting algorithms at each time step is that of N × M × 2K Kalman filters, and the selection step in the first two cases is implemented in O(N × M log(N × M)) operations. Of course, if M is large, which is the case in many applications, all these methods are too computationally expensive to be used, and one should employ a standard particle filter.

Figure 1: Bit error rate performance (4DPSK signal, N = 100).

4. SIMULATION RESULTS

In the following experiments, the bit error rate (BER) and the tracking delay error (θ_t − θ̂_t) were evaluated by means of computer simulations. Gray-encoded M-ary differential phase shift keyed (MDPSK) signals were employed, with mapping function

    s_n = exp(jφ_n),   φ_n = Σ_{j=1}^{n} Σ_{m=1}^{M} (2πm/M) δ(d_j − m).    (31)

In order to assess the performance of the proposed approaches, we first applied them to a simpler case of synchronization in flat fading conditions, L = 1, for a system with no spectrum spreading employed, c_1 = 1, K = 1. In the first experiment, 4DPSK signals were considered, with the average signal-to-noise ratio (SNR) varying from 5 to 20 dB. The AR coefficients for the channel, (4), were set to α^(0) = 0.999, σ_f^(0) = 0.01, and the delay model parameters in (5) were chosen to be the same, γ = 0.999 and σ_θ = 0.01. The BER obtained by the particle filtering receiver employing the prior (PFP) and suboptimal (PFS) importance distributions, and by the deterministic receiver preserving the N most likely particles (DML) and using stratified resampling (DSR), is presented in Figure 1. The marginal maximum a posteriori (MMAP) estimate

    d̂_n = arg max_{d_n} p(d_n | y_{1:2Kn})    (32)

was employed to obtain the symbols. The number of particles used in these algorithms was equal to N = 100, and little or no improvement in BER was gained by increasing this number for the deterministic schemes. For the randomized approaches, the number of particles required to achieve the BER of the DSR algorithm was equal to N = 1200. In Figure 2, the mean square delay error (MSE) is presented as a function of the number of particles N for SNR = 10 dB:

    θ_MSE = (1/(2K L_d)) Σ_{n=1}^{2K L_d} (θ_n − θ̂_n)²,    (33)

where L_d is the length of the symbol sequence, L_d = 1000. The results for the different SNRs are given in Figure 3. As one can see, the deterministic particle filter with stratified resampling slightly outperforms the receiver selecting the most likely
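The deterministic N-best selection used by the DML receiver (keep the N largest of the N × M weights in (30)) can be sketched as follows; this is illustrative NumPy code of ours, with toy numbers, not the authors' implementation:

```python
import numpy as np

def deterministic_select(parent_w, offspring_lik, prior, N):
    """Deterministic expansion step: every parent spawns M offspring
    (one per candidate symbol); keep the N most likely of the N*M.

    parent_w      : (N,) normalized parent weights w_{n-1}^(i)
    offspring_lik : (N, M) predictive likelihoods p(y_n | ..., d_n = m)
    prior         : (M,) symbol prior p(d_n = m)
    Returns (parent_idx, symbol_idx, new_w) for the N survivors."""
    w = parent_w[:, None] * offspring_lik * prior[None, :]   # weights (30)
    flat = w.ravel()
    keep = np.argsort(flat)[-N:]          # indices of the N largest weights
    parent_idx, symbol_idx = np.unravel_index(keep, w.shape)
    new_w = flat[keep]
    new_w = new_w / new_w.sum()           # renormalize the survivors
    return parent_idx, symbol_idx, new_w

# Toy usage: N = 3 particles, M = 2 candidate symbols, uniform prior.
pw = np.array([0.5, 0.3, 0.2])
lik = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
pi, si, nw = deterministic_select(pw, lik, prior=np.array([0.5, 0.5]), N=3)
```

Unlike resampling with replacement, this step is fully deterministic given the weights, which is why repeated runs of the DML receiver differ only through the randomly drawn delays θ_n^(i).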

Figure 2: Mean square delay error for different numbers of particles (4DPSK, SNR = 10 dB).

Figure 4: The error in delay estimation.

Figure 3: Mean square delay error versus SNR (4DPSK, N = 100).

Figure 5: Bit error rate (DS spread spectrum system).

particles, and is more efficient than both standard particle filtering schemes.

The particle filtering approach was also compared with the EKF method [2] in the second experiment. The algorithm was simplified to consider channel and code delay estimation only (the transmitted symbols were assumed known, as, e.g., with pilot symbols being used). Otherwise, the simulation set-up was the same (N = 100 particles were employed). The results for SNR = 10 dB presented in Figure 4 demonstrate the good performance of the particle filtering method. Note that the deterministic particle filter is in a sense a variant of the conventional per-survivor processing (PSP) algorithm combined with the randomized particle filtering procedure for channel and delay estimation. This procedure has thus proved more efficient even in a situation which is more favorable for the EKF, that is, one which does not involve the uncertainty associated with the unknown transmitted symbols. Simulations demonstrating the superiority of the particle filtering approach over the RAKE receiver for related problems are presented in [22]; for symbol detection, comparisons of particle filtering methods with other well-known approaches can be found in [8, 9, 23].
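The stratified resampling step that distinguishes the DSR scheme can be sketched as follows. This is a generic, illustrative implementation of stratified resampling over a set of importance weights (the function name and the toy weights are ours, not from the paper): one uniform variate is drawn in each of the n_out equal-width strata of [0, 1), which keeps the selection variance low compared with multinomial resampling.

```python
import numpy as np

def stratified_resample(weights, n_out, rng=None):
    """Select n_out particle indices from len(weights) weighted particles.

    One uniform draw is made in each of the n_out equal-width strata of
    [0, 1), so heavy particles are selected with low variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    cdf = np.cumsum(weights / weights.sum())
    cdf[-1] = 1.0  # guard against floating-point round-off
    # one uniform per stratum [i/n_out, (i+1)/n_out)
    u = (np.arange(n_out) + rng.random(n_out)) / n_out
    return np.searchsorted(cdf, u)

# Example: reduce N*M = 8 weighted candidates to N = 4 survivors.
w = [0.05, 0.05, 0.4, 0.1, 0.1, 0.2, 0.05, 0.05]
idx = stratified_resample(w, 4, rng=np.random.default_rng(0))
```

Applied to the N × M weighted candidates produced at each step of the deterministic scheme, such a routine selects the N particles that are propagated further.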

Figure 6: The error in delay estimation (DS spread spectrum system, SNR = 10 dB); curves: PFS, DSR.

Figure 7: Mean square delay error via SNR (DS spread spectrum system, N = 100); curves: PFS, DSR.
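The experiments above rely on differentially encoded PSK symbols. As an illustration of the mapping (31), the sketch below (our own hypothetical helper, shown for M = 4) accumulates phase increments of 2πm/M along the symbol stream; it is one plausible reading of (31), not the authors' code.

```python
import numpy as np

def mdpsk_modulate(d, M=4):
    """Differential M-PSK mapping, cf. (31): the transmitted phase is the
    running sum of the increments 2*pi*d_j/M selected by the symbols d_j."""
    d = np.asarray(d)
    phases = 2 * np.pi * np.cumsum(d % M) / M
    return np.exp(1j * phases)

# Example: four 4-DPSK symbols.
s = mdpsk_modulate([1, 3, 0, 2], M=4)
```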

Finally, we applied the proposed algorithm to perform joint symbol/channel coefficient/code delay estimation for DS spread spectrum systems with K = 15, L = 4. A binary DPSK modulation scheme was employed, with the multipath channel response and AR coefficients chosen as in Channel B in [2]. As shown in Figure 5, the algorithm employing 100 particles exhibits good BER performance. A tracking error trajectory for 100 information symbols (corresponding to 1500 chips and 3000 channel samples) and an average SNR equal to 10 dB is presented in Figure 6. Figure 7 also illustrates the mean square delay error as a function of SNR for Ld = 1000.

5. CONCLUSION

In this paper, we propose the application of particle filtering techniques to the challenging problem of joint symbol, channel coefficient, and code delay estimation for DS spread spectrum systems in multipath fading. The algorithm is designed to make use of the structure of the model and incorporates a variance reduction technique. The work is based on recent results on the superiority of the DML approach in a fully discrete environment [10, 11]. The method cannot be applied straightforwardly, however, and several procedures combining both deterministic and randomized schemes are considered. The algorithms are tested and compared. Although computer simulations show that all methods are capable of providing good performance, in this particular case involving additional continuous-valued parameters, the deterministic scheme employing stratified resampling turns out to be the most efficient one. The choice of the algorithm might, however, be application-dependent, so further investigation is necessary. The receiver can be extended to address multiuser DS-CDMA transmission, using the techniques proposed in [24], for example, or simplified to consider channel tracking only, since it is naturally incorporated in the proposed algorithm. Future research should concentrate on the development of suboptimal importance distributions and selection schemes capable of increasing the algorithm's efficiency.

REFERENCES

[1] R. A. Iltis, "A DS-CDMA tracking mode receiver with joint channel/delay estimation and MMSE detection," IEEE Trans. Communications, vol. 49, no. 10, pp. 1770–1779, 2001.
[2] R. A. Iltis, "Joint estimation of PN code delay and multipath using the extended Kalman filter," IEEE Trans. Communications, vol. 38, no. 10, pp. 1677–1685, 1990.
[3] U. Fawer and B. Aazhang, "A multiuser receiver for code division multiple access communications over multipath channels," IEEE Trans. Communications, vol. 43, no. 2/3/4, pp. 1556–1565, 1995.
[4] H. Zamiri-Jafarian and S. Pasupathy, "Adaptive MLSDE using the EM algorithm," IEEE Trans. Communications, vol. 47, no. 8, pp. 1181–1193, 1999.
[5] J. Míguez and L. Castedo, "Semiblind maximum-likelihood demodulation for CDMA systems," IEEE Trans. Vehicular Technology, vol. 51, no. 4, pp. 775–781, 2002.
[6] A. Logothetis and C. Carlemalm, "SAGE algorithms for multipath detection and parameter estimation in asynchronous CDMA systems," IEEE Trans. Signal Processing, vol. 48, no. 11, pp. 3162–3174, 2000.
[7] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[8] R. Chen, X. Wang, and J. S. Liu, "Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[9] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, "Particle filtering for demodulation in fading channels with non-Gaussian additive noise," IEEE Trans. Communications, vol. 49, no. 4, pp. 579–582, 2001.
[10] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, "Particle filtering for multiuser detection in fading CDMA channels," in Proc. IEEE 11th Signal Processing Workshop on Statistical Signal Processing (SSP '01), pp. 38–41, Singapore, August 2001.
[11] E. Punskaya, A. Doucet, and W. J. Fitzgerald, "On the use and misuse of particle filtering in digital communications," in Proceedings XI European Signal Processing Conference (EUSIPCO '02), Toulouse, France, September 2002.
[12] R. A. Iltis, "A sequential Monte Carlo filter for joint linear/nonlinear state estimation with application to DS-CDMA," IEEE Trans. Signal Processing, vol. 51, no. 2, pp. 417–426, 2003.
[13] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 3rd edition, 1995.
[14] C. Komninakis, C. Fragouli, A. H. Sayed, and R. D. Wesel, "Channel estimation and equalization in fading," in Proc. IEEE 33rd Asilomar Conference on Signals, Systems, and Computers, vol. 2, pp. 1159–1163, Pacific Grove, Calif, USA, October 1999.
[15] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[16] A. Doucet, N. J. Gordon, and V. Krishnamurthy, "Particle filters for state estimation of jump Markov linear systems," IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[17] T. Ghirmai, M. F. Bugallo, J. Míguez, and P. M. Djurić, "Joint symbol detection and timing estimation using particle filtering," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 4, pp. 596–599, Hong Kong, April 2003.
[18] G. Kitagawa, "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models," Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[19] D. Crisan, "Particle filters—a theoretical perspective," in Sequential Monte Carlo Methods in Practice, A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Springer, New York, NY, USA, 2001.
[20] J. K. Tugnait and A. H. Haddad, "A detection-estimation scheme for state estimation in switching environments," Automatica, vol. 15, no. 4, pp. 477–481, 1979.
[21] P. Fearnhead, Sequential Monte Carlo methods in filter theory, Ph.D. thesis, University of Oxford, Oxford, UK, 1998.
[22] T. Bertozzi, Applications of particle filtering to digital communications, Ph.D. thesis, CNAM, Paris, France, December 2003.
[23] E. Punskaya, Sequential Monte Carlo methods for digital communications, Ph.D. thesis, Engineering Department, Cambridge University, Cambridge, UK, June 2003.
[24] B. Dong, X. Wang, and A. Doucet, "A new class of soft MIMO demodulation algorithms," IEEE Trans. Signal Processing, vol. 51, no. 11, pp. 2752–2763, 2003.

Arnaud Doucet was born in France on November 2, 1970. He received the M.S. degree from Telecom INT in 1993 and the Ph.D. degree from University Paris XI Orsay in December 1997. From 1998 to 2000, he was a Research Associate in the Signal Processing Group of Cambridge University. From 2001 to 2002, he was a Senior Lecturer in the Department of Electrical and Electronic Engineering at Melbourne University, Australia. Since September 2002, he has been a University Lecturer in information engineering at Cambridge University. He is a Member of the IEEE Signal Processing Theory and Methods Committee and an Associate Editor for The Annals of The Institute of Statistical Mathematics and Test. His research interests include Bayesian statistics and simulation-based methods for estimation and control. He has coedited, with J. F. G. de Freitas and N. J. Gordon, Sequential Monte Carlo Methods in Practice, 2001.

William J. Fitzgerald received the B.S., M.S., and Ph.D. degrees in physics from the University of Birmingham, Birmingham, UK. He is Professor of applied statistics and signal processing and Head of Research at the Signal Processing Laboratory, Department of Engineering, University of Cambridge, UK. Prior to joining Cambridge, he worked at the Institut Laue-Langevin, Grenoble, France, for seven years and was then a Professor of physics at the Swiss Federal Institute of Technology (ETH), Zürich, for five years. He is a Fellow of Christ's College, Cambridge, where he is Director of Studies in engineering. He works on Bayesian inference applied to signal and data modeling. He is also interested in nonlinear signal processing, image restoration and medical imaging, extreme value statistics, bioinformatics, data mining, and data classification.

Elena Punskaya received the B.S. and M.S. degrees in mechanical engineering from the Bauman Moscow State Technical University, Moscow, Russia, in 1997, and the M.Phil. and Ph.D. degrees in information engineering from the University of Cambridge, Cambridge, UK, in 1999 and 2003, respectively. She is currently with the Signal Processing Laboratory, Department of Engineering, University of Cambridge, UK, and is a Research Fellow of Homerton College, Cambridge, UK, where she is a Director of Studies in Engineering. Her research interests focus on Bayesian inference, simulation-based methods, in particular, Markov chain Monte Carlo and particle filtering, and their application to digital communications.

EURASIP Journal on Applied Signal Processing 2004:15, 2315–2327
© 2004 Hindawi Publishing Corporation

Particle Filtering Equalization Method for a Satellite Communication Channel

Stéphane Sénécal
The Institute of Statistical Mathematics, 4-6-7 Minami Azabu, Minato-ku, Tokyo 106-8569, Japan
Email: [email protected]

Pierre-Olivier Amblard
Groupe Non Linéaire, Laboratoire des Images et des Signaux (LIS), ENSIEG, BP 46, 38402 Saint Martin d'Hères Cedex, France
Email: [email protected]

Laurent Cavazzana Ecole´ Nationale Sup´erieure d’Informatique et de Math´ematiques Appliqu´ees de Grenoble (ENSIMAG), BP 72, 38402 Saint Martin d’H`eres Cedex, France Email: [email protected]

Received 23 April 2003; Revised 23 March 2004

We propose the use of particle filtering techniques and Monte Carlo methods to tackle the in-line and blind equalization of a satellite communication channel. The main difficulties encountered are the nonlinear distortions caused by the amplifier stage in the satellite. Several processing methods manage to take these nonlinearities into account, but they require the knowledge of a training input sequence for updating the equalizer parameters. Blind equalization methods also exist, but they require a Volterra modelization of the system, which is not suited to equalization for the present model. The aim of the method proposed in this paper is also to blindly restore the emitted message. To reach this goal, a Bayesian point of view is adopted. Prior knowledge of the emitted symbols and of the nonlinear amplification model, as well as the information available from the received signal, is jointly used by considering the posterior distribution of the input sequence. Such a probability distribution is very difficult to study and thus motivates the implementation of Monte Carlo simulation methods. The presentation of the equalization method is split into two parts. The first part solves the problem for a simplified model, focusing on the nonlinearities of the model. The second part deals with the complete model, using the sampling approaches previously developed. The algorithms are illustrated and their performance is evaluated using bit error rate versus signal-to-noise ratio curves.

Keywords and phrases: traveling-wave-tube amplifier, Bayesian inference, Monte Carlo estimation method, sequential simulation, particle filtering.

1. INTRODUCTION

Telecommunication has been taking on increasing importance in the past decades, and this has led to the use of satellite-based means for transmitting information. A major implementation task to deal with in such an approach is the attenuation of emitted communication signals during their trip through the atmosphere. Indeed, one of the most important roles devoted to telecommunication satellites is to amplify the received signal before sending it back to Earth. Severe technical constraints, due to the lack of space and energy available on board, can be addressed thanks to special devices, namely, traveling-wave-tube (TWT) amplifiers [1]. A common model for such a satellite transmission chain is depicted in Figure 1.

Although efficient for amplifying tasks, TWT devices suffer from nonlinear behaviors in their characteristics, thus implying complex modeling and processing methods for equalizing the transmission channel.

The very first approaches for solving the equalization problem of models similar to the one depicted in Figure 1 were developed in the framework of neural networks. These methods are based on a modelization of the nonlinearities using layers of perceptrons [2, 3, 4, 5, 6]. Most of these approaches require a learning or training input sequence for adapting the parameters of the equalization algorithm. However, the knowledge or the use of such sequences is sometimes impossible: if the signal is intensely corrupted by noise at the receiver stage, or for noncooperative applications, for instance.

Figure 1: Satellite communication channel. The emitted signal e(t) passes through an emission filter and is corrupted by emission noise ne(t); on board the satellite, input multiplexing, the TWTA, and output multiplexing follow; the signal then goes through a multipath fading channel and is corrupted by reception noise nr(t), yielding the received signal r(t).

Figure 2: Identification of the model depicted in Figure 1 with a Volterra filter.

Figure 3: Identification error, scheme of Figure 2 (mean and standard deviation of the quadratic error versus time index).

Blind equalization methods have thus to be considered. These methods often need precise hypotheses on the emitted signals: Gaussianity or circularity properties of the probability density function of the signal, for instance [7]. Recently, some methods have made it possible to blindly identify [8] or equalize [9, 10, 11] nonlinear communication channels under general hypotheses. These blind equalization methods assume that the transfer function of the system can be modeled as a Volterra filter [12, 13].

However, for the transmission model considered here, a Volterra modelization happens to be suitable only for the task of identification, and not for a direct equalization. For instance, a method based on a Volterra modelization of the TWT amplifier and a Viterbi algorithm at the receiver stage is considered in [14]. Such an identification method can easily be implemented through a recursive adaptation rule of the filter parameters with a least mean squares approach (cf. Figure 2). The mean of the quadratic error (straight line) and its standard deviation (dotted lines) are depicted in Figure 3 for 100 realizations of binary phase shift keying (BPSK) symbol sequences, each composed of 200 samples. Similarly, the equalization problem of the transmission chain of Figure 1 can be considered with a Volterra filter scheme, adapted with a recursive least squares algorithm as depicted in Figure 4. However, in this case, the error function happens not to converge, showing that the Volterra filter is unstable and that the system is not invertible with such a modelization.

Figure 4: Equalization of the model depicted in Figure 1 with a Volterra filter.

It is then necessary to consider a different approach for realizing blindly the equalization of this communication model. The aim of this paper is thus to introduce a blind and sequential equalization method based on particle filtering (sequential Monte Carlo techniques) [15, 16]. For realizing the equalization of the communication channel, it

seems interesting to fully exploit the analytical properties of the nonlinearities induced by TWT amplifiers through parametric models of these devices [1]. Sequential Monte Carlo methods, originally developed for the recursive estimation of nonlinear and/or non-Gaussian state space models [17, 18], are well suited for reaching this goal. The field of communication seems to be particularly propitious for applying particle filtering techniques, as shown in the recent literature of the signal processing community ([15, 19] and this issue) and of the statistics community (see [20, 21], [16, Section 4]). Such Monte Carlo approaches were successfully applied to blind deconvolution [22], equalization of flat-fading channels [23], and phase tracking problems [24], for instance.

The paper is organized as follows. Firstly, models for the emitted signal, the TWT amplification stage, and the other parts of the transmission chain are introduced in Section 2. A procedure for estimating the emitted signal is considered in a Bayesian framework. Monte Carlo estimation techniques are then proposed in Section 3 for implementing the computation of the estimated signal under the assumption of a simpler communication model, focusing on the nonlinear part of the channel. This approach uses analytical formulae of the TWT amplifier model described in Section 2 and sampling methods for estimating integral expressions. The method is then generalized in Section 4 for building a blind and recursive equalization scheme of the complete transmission chain. The sequential simulation algorithm proposed is based on particle filtering techniques. This approach makes it possible to process the data in-line and without the help of a learning input sequence. The performance of the algorithm is illustrated by numerical experiments in Section 5. Finally, some conclusions are drawn in Section 6. Details of the Monte Carlo approach are given in Appendix A.

2. MODELING OF THE TRANSMISSION MODEL

The model of the satellite communication channel depicted in Figure 1 is roughly the same as the one considered for various problems dealing with TWT amplifier devices (cf., e.g., [2]). The different stages of this communication channel are detailed below.

2.1. Emission stage

The information signal to transmit is denoted by e(t). It is usually a digital signal composed of a sequence of Ne symbols (ek)1≤k≤Ne. The signal is transmitted under the analog form

e(t) = Σ_{k=1}^{Ne} ek I_{[(k−1)T, kT[}(t), (1)

where T denotes the symbol rate and I_Ω(·) is the indicator function of the set Ω. The symbols ek are generated from classical modulations used in the field of digital telecommunication, like PSK or quadratic amplitude modulation (QAM), for instance. In the following, the case of 4-QAM symbols is considered. Each symbol can be written as

ek = exp(ıφk), (2)

where the sequence of samples (φk)1≤k≤Ne is independently and identically distributed from

U{π/4, 3π/4, 5π/4, 7π/4}, (3)

where U_Ω denotes the uniform distribution on the set Ω. The signal is emitted through the atmosphere to the satellite. The emission process is modeled by a Chebyshev filter. This class of filters admits an IIR representation, and their parameters, particularly their cutoff frequency, depend on the value of the symbol rate T [2]. A detailed introduction to Chebyshev filters is given in [25], for instance. In the present case, the emission filter is assumed to be modeled with a 3 dB bandwidth equal to 1.66/T. The emitted signal is altered during its trip in the atmosphere by disturbance signals. These phenomena are modeled by an additive corrupting noise ne(t), which is assumed to be Gaussianly, independently, and identically distributed:

ne(t) ∼ NCC(0, σe²), (4)

where NCC(0, σe²) is a complex circular Gaussian distribution with zero mean and variance equal to σe².

Remark 1. The amplitude of signal (1) is adjusted in practice at the emission stage in order to reach a signal-to-noise ratio (SNR) roughly equal to 15 dB during the transmission.

2.2. Amplification

After being received by the satellite, the signal is amplified and sent back to Earth. This amplification stage is processed by a TWT device. A simple model for a TWT amplifier is an instantaneous nonlinear filter defined by

z = r exp(ıφ) → Z = A(r) exp(ı(φ + Φ(r))), (5)

where r denotes the modulus of the input signal. The amplitude gain and phase wrapping can be modeled by the following expressions:

A(r) = αa r / (1 + βa r²), (6)

Φ(r) = αp r² / (1 + βp r²). (7)

These formulae have been shown to model various types of TWT amplifier devices with accuracy [1]. Figures 5 and 6 represent functions (6) and (7) for two sets of parameters estimated in [1, Table 1] from real data and duplicated in Table 1. Curves with straight lines represent functions obtained with the set of parameters of the first row of Table 1. The ones with dashed lines represent functions obtained with the other set of parameters.

A drawback of model (5) is that it is not invertible in a strict theoretical sense, as drawn in Figure 5. However, only the amplificative and invertible part of the system, represented above the dotted line in Figure 5, will be considered.
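The memoryless TWT model (5)-(7) is simple to evaluate numerically. The sketch below is a direct transcription of the amplitude gain (6) and phase wrapping (7) applied to a complex baseband sample; the parameter values are illustrative assumptions in the range suggested by Table 1, not a definitive calibration.

```python
import numpy as np

def twt_amplifier(z, alpha_a, beta_a, alpha_p, beta_p):
    """Memoryless TWT model (5): AM/AM gain (6) and AM/PM shift (7)."""
    r = np.abs(z)
    gain = alpha_a * r / (1.0 + beta_a * r**2)            # A(r), eq. (6)
    phase_shift = alpha_p * r**2 / (1.0 + beta_p * r**2)  # Phi(r), eq. (7)
    return gain * np.exp(1j * (np.angle(z) + phase_shift))

# Example: a 4-QAM symbol of modulus 0.5 through the amplifier.
z_in = 0.5 * np.exp(1j * np.pi / 4)
z_out = twt_amplifier(z_in, alpha_a=2.0, beta_a=1.0, alpha_p=4.0, beta_p=9.1)
```

For an input of modulus 0.5, the output modulus is αa·0.5/(1 + βa·0.25), and the phase is advanced by Φ(0.5), illustrating the amplitude-dependent phase distortion that makes the channel nonlinear.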

Table 1: Parameters of (6) and (7) measured in practice.

αa    βa    αp    βp
2     1     4     9.1
2.2   1.2   2.5   2.8

Figure 5: Amplitude gain (6) of TWT models (versus modulus of input signal).

Figure 6: Phase wrapping (7) of TWT models (versus modulus of input signal).

Signal processing in the satellite also performs the task of multiplexing. The devices used for this purpose are modeled by Chebyshev filters. Tuning of their parameters is given in [2], for instance. In the present case, the filters at the input and at the output of the amplifier are assumed to have bandwidths equal, respectively, to 2/T and 3.3/T.

2.3. Reception

The transmission of the signal back to Earth is much less powerful than at the emission stage. This is mainly due to severe technical constraints because of the satellite design. The influence of the atmospheric propagation medium is then modeled by a multipath fading channel [26, Section 11], with one reflected path representing an attenuation of 10 dB in this case: z(t) → z(t) + αz(t − ∆). Moreover, the signal is still corrupted by disturbance signals, modeled by an additive noise signal nr(t), Gaussianly, independently, and identically distributed:

nr(t) ∼ NCC(0, σr²). (8)

This noise is always much more intense than at the emission stage. This is mainly due to the weak emission power available in the satellite. The received signal, denoted r(t), is sampled at rate Ts.

2.4. Equalization

The goal of equalization is to recover the emitted sequence (ek)1≤k≤Ne from the knowledge of the sampled sequence (r(jTs))1≤j≤Nr. The equalization method proposed in this paper consists in estimating the symbol sequence (φk)1≤k≤Ne by considering its posterior distribution conditionally to the sequence (r(jTs))1≤j≤Nr of samples of the received signal:

p((φk)1≤k≤Ne | (r(jTs))1≤j≤Nr). (9)

To reach this goal, a Bayesian estimation procedure is considered with the computation of maximum a posteriori (MAP) estimates [27].

Remark 2. Bayesian approaches have already been successfully applied in digital signal processing in the field of mobile communication. In [28], for instance, autoregressive models and discrete-valued signals are considered.

The computation of the estimates is implemented via Monte Carlo simulation methods [16, 29]. As the complete transmission chain is a complex system, a simpler model focusing on the nonlinear part of the channel is considered in the following section, where Monte Carlo estimation approaches are introduced. These estimation techniques will be used in the equalization algorithm for the global transmission chain in Section 4.

3. MONTE CARLO ESTIMATION METHODS

As a first approximation, to focus on the nonlinearity of the model, only a TWT amplifier is considered, in a transmission channel corrupted with noises at its input and output parts, as shown in Figure 7.

The received signal r(t) is assumed to be sampled at the symbol rate T. The problem is then to estimate a 4-QAM symbol φ, a priori distributed from (3), with the knowledge of the model depicted in Figure 7 (cf. relations (4), (6), (7), and (8)) and the information of a received sample r. A Bayesian approach is developed [27] by considering the posterior distribution

p(φ | r) (10)

and its classical MAP estimate. The method proposed in the following consists in estimating values of distribution (10) thanks to Monte Carlo simulation schemes [29], using relations (6) and (7), which model the nonlinearities of the TWT amplifier in a parametric manner.

Figure 7: Simple communication channel with TWT amplifier: the emitted sample e(t) is corrupted by ne(t), giving x(t); the TWTA output y(t) is corrupted by nr(t), giving r(t).

3.1. Estimation with known parameters

In order to further simplify the study, the parameters of the transmission channel depicted in Figure 7 are firstly assumed to be known. In the sequel, the coefficients of expressions (6) and (7) are denoted by the symbol TWT. This information is taken into account in the posterior distribution (10), which thus becomes

p(φ | A, σe, TWT, σr, r), (11)

where A denotes the amplitude of the emitted signal. From Bayes' formula, the probability density function of this distribution is proportional to

p(r | A, φ, σe, TWT, σr) × p(φ | A, σe, TWT, σr). (12)

The prior distribution at the right-hand side of the above expression reduces to p(φ), which is given by (3). The problem is then to compute the likelihood

p(r | A, φ, σe, TWT, σr). (13)

Indeed, this formula can be viewed (cf. Appendix A) as the expectation

E[ exp(−(1/σr²) |r − TWT(x)|²) ] (14)

with respect to the random variable x, which is Gaussianly distributed:

x ∼ NCC(A exp(ıφ), σe²). (15)

Considering a sequence of samples (x_ℓ)1≤ℓ≤N independently and identically distributed from (15), a Monte Carlo approximation of (14) is given by

(1/N) Σ_{ℓ=1}^{N} exp(−(1/σr²) |r − TWT(x_ℓ)|²), (16)

which is accurate for a number N of samples large enough. References [29, 30] provide detailed ideas and references about Monte Carlo methods. To illustrate such an approach, approximation (16) is computed for the emitted symbol φ = π/4 and the values of the TWT amplifier parameters given by the first row of Table 1. The amplitude of the emitted signal equals 0.5, and the variances of the transmission noises are such that SNRe = 10 dB and SNRr = 3 dB. One hundred realizations are simulated and, for each, a sequence (15) of 100 samples is considered. Table 2 gives the mean values obtained from (16) and their standard deviations.

Table 2: Estimates of (11), SNRe = 10 dB, SNRr = 3 dB, 100 realizations.

φ        p̂(φ|r)
π/4      0.69 ± 0.28
3π/4     0.13 ± 0.21
5π/4     0.02 ± 0.05
7π/4     0.16 ± 0.24

The error of the estimated values (16) of the probabilities (11) might seem quite large; the standard deviations can be reduced by providing a larger number of samples. In the sequel, we are only interested in obtaining rough estimates of (11), enabling the comparison of mean values for different φ, as shown in Table 2. Thus, even with a reduced number of samples (15), it is possible to estimate accurately the MAP estimate of (11).

The performance of the Monte Carlo estimation method is then considered with respect to the SNR at the input and output of the amplifier (cf. Figure 7). The bit error rate (BER) is computed by averaging the results obtained with a MAP approach. Statistics of the Monte Carlo estimates (16) of distribution (11) are computed with 100 realizations of symbol sequences composed of 1000 samples each. For each estimate, sequences (15) composed of 100 samples are considered. The results of these simulations for SNRe taking values 10, 12, and 15 dB are depicted in Figure 8; the curves from bottom to top are associated with decreasing SNRe.

The Bayesian approach and its Monte Carlo implementation make it possible to estimate the emitted signal with accuracy for a wide range of noise variances (cf. Figure 8). However, the estimation method described previously requires the knowledge of the model parameters. For many applications in the field of telecommunication, it is necessary to assume these parameters unknown. This is the case for nonstationary transmission models and for communication in noncooperative contexts like passive listening, for instance. The equalization problem of the simplified model depicted in Figure 7 is now tackled in the case where the parameters (A, σe, TWT, σr) of the transmission channel are assumed to be unknown.

3.2. Estimation with unknown parameters

If the parameters are unknown, there are at least two Bayesian estimation approaches to be considered with respect to the posterior distribution (10). A first method consists

Assuming that symbols and the model parameters are inde- pendent, expression (18) is proportional to

      p φ × p r φ A σe σr p A σe σr − ( ) , , ,TWT, , ,TWT, 10 1   (20) × d A, σe,TWT,σr .

From the study of the previous case, the likelihood term in BER the integrand can be computed via a Monte Carlo estimate of expression (14) with a sequence of samples (15). An ap- proach to estimate (18) is then to consider the integral ex- 10−2 pression in (20) as the expectation    p rφ A σ σ Ep(A,σe,TWT,σr ) , , e,TWT, r (21)

345678910which is estimated via a Monte Carlo approximation of the SNR (dB) following form:

SNRe =15 dB Np e =     SNR 12 dB 1  SNRe =10 dB p r φ, Ak, σe(k), TWTk, σr (k) , (22) Np k=1 Figure 8: Mean BER values for MAP estimates of signals ver- A σ k σ k where ( k, e( ), TWTk, r ( ))k=1,...,Np is a sequence of sam- sus SNRr in dB, model of Figure 7 with known parameters (A, σe,TWT,σr ), for various SNRe values. ples independently and identically distributed from the prior distribution   in dealing with the joint distribution of all the parameters of p A, σe,TWT,σr . (23) the model Remark 3. The algorithm for sampling distribution (17) in-     p φ, A, σe,TWT,σr r . (17) troduced in [31] requires also the setting of prior distribution (23).

This method makes it possible theoretically to jointly esti- The model of the parameters includes generally prior in- mate the emitted symbols and the parameters of the trans- formation thanks to physical constraints. For instance, the mission channel by implementing MAP and/or posterior TWT amplifier is assumed to work in the amplificative part mean approaches. The probability density function of dis- of its characteristic (cf. Figure 5). Thus, as a first rough ap- tribution (17) being generally very complex, Markov chain proximation, it can be assumed that A ∼ U[0,1] apriori. This Monte Carlo (MCMC) simulation methods [29, 30]canbe parameter is also tuned such that SNRe equals 15 dB during used to perform these estimation tasks. Such an approach the emission process (cf. Remark 1) implying the constraint is developed in [31] particularly for equalizing the complete σe = 0.2A. In a less strict case, it is sufficient to assume that transmission chain depicted in Figure 1.However,resultsob- σe ∼ U[0.01,0.5]. The parameters (αa, βa, αp, βp) of the TWT tained with this method happen not to give accurate esti- amplifier are supposed to be independent of other variables mates of the model parameters in practice. Indeed, MCMC of the system and also to be mutually independent. From the methods are generally useful for estimating various models values introduced in Table 1,anadequateprior distribution in the field of telecommunication [28, 32]. is Another approach consists in considering a marginalized   version of distribution (10) with respect to the parameters of αa, βa, αp, βp ∼ U[1,3] × U[0,2] × U[1,5] × U[2,10]. (24) the model:

$$p(\varphi \mid r) = \int p\bigl(\varphi, A, \sigma_e, \mathrm{TWT}, \sigma_r \mid r\bigr)\, d\bigl(A, \sigma_e, \mathrm{TWT}, \sigma_r\bigr). \qquad (18)$$

Such a technique, called Rao-Blackwellization in the statistics literature (see, for example, [33, 34]), makes it possible to improve the efficiency of sampling schemes (see [20, 21] and [16, Section 24]). From Bayes' formula, the integrand of expression (18) is proportional to

$$p\bigl(r \mid \varphi, A, \sigma_e, \mathrm{TWT}, \sigma_r\bigr) \times p\bigl(\varphi, A, \sigma_e, \mathrm{TWT}, \sigma_r\bigr). \qquad (19)$$

The extremal values of the downlink transmission noise variance σr can be estimated with respect to the prior ranges of values defined above; a uniform U[0.1, 1.1] prior distribution for σr is thus chosen. Once all the prior distributions have been defined, it is possible to implement a Monte Carlo estimation procedure for (20) with the help of approximations (22) and (16). Such an approach is tested for the computation of values of the posterior distribution for an emitted symbol φ = π/4, considering that the values of the TWT amplifier are given by the first row of Table 1, that the amplitude of the emitted signal is given by A = 0.5, and that the noise variances are such that

Particle Filtering for Satellite Channel Equalization 2321

Table 3: Estimates of (10), SNRe = 10 dB, SNRr = 3 dB, 100 realizations.

  φ        p̂(φ | r)
  π/4      0.61 ± 0.21
  3π/4     0.23 ± 0.20
  5π/4     0.05 ± 0.03
  7π/4     0.12 ± 0.14

SNRe = 10 dB and SNRr = 3 dB. One hundred estimations are simulated, and for each realization sequences of 100 samples

$$\bigl(A_k, \sigma_e(k), \mathrm{TWT}_k, \sigma_r(k)\bigr)_{1 \le k \le N_p} \qquad (25)$$

are drawn from distribution (23). For each sequence, as in the case where the model parameters are known, approximations (16) are computed from sequences (25) composed of 100 samples each. Table 3 shows the mean values of the estimated (22) and their standard deviations computed from these simulations.

As in the previous case, where the parameters (A, σe, TWT, σr) are known, we are only interested in obtaining rough mean values of the Monte Carlo estimates of the MAP expression (10) and thus do not consider larger sample sizes for reducing the standard deviation of these estimates.

The performance of this Monte Carlo estimation method is now considered with respect to uplink and downlink SNR. As previously, BERs are computed by averaging the results obtained with a MAP approach for the Monte Carlo estimate (22) of posterior distribution (10). One hundred realizations of 1000-symbol sequences are considered for each value of SNR. For every Monte Carlo estimate, sequences (25) and (15) are composed of 100 samples. The results of simulations for SNRe equal to 10, 12, and 15 dB are depicted in Figure 9. Curves from bottom to top are associated with decreasing uplink SNR. As a comparison, the estimated mean values of BER in the case where the parameters of the TWT amplifier are known are represented with dashed lines.

[Figure 9: Mean BER values for MAP estimates of signals versus SNRr in dB, model of Figure 7 with unknown/known parameters (A, σe, TWT, σr) (straight/dashed lines), for various SNRe values.]

Performance is not much degraded in the case where the model parameters are unknown. Thus, considering the posterior distribution of interest (18), marginalized with respect to these parameters, seems to be a good strategy for tackling the equalization problem. An in-line simulation method based on the Monte Carlo estimation techniques previously developed is proposed hereinafter for realizing the equalization of the complete transmission chain depicted in Figure 1.

4. PARTICLE FILTERING EQUALIZATION METHOD

4.1. Transmission model

Equalizing the complete satellite communication channel depicted in Figure 1 is a difficult problem, as it requires taking into account several phenomena:

(1) effects of the filters modeling the emission and multiplexing stages;
(2) attenuation of the received signal, mainly due to multiple paths during the downlink transmission;
(3) correlation induced by the filters and the emission and fading models.

An equalization method is proposed for this model within a Bayesian estimation framework [27]. It consists in considering the posterior distribution of the sampled symbols conditionally on the sequence of the received samples:

$$p\Bigl(\bigl(e(jT_s)\bigr)_{1 \le j \le N_r} \mid \bigl(r(jT_s)\bigr)_{1 \le j \le N_r}\Bigr). \qquad (26)$$

An estimation procedure is then implemented by computing the MAP estimate of distribution (26). The Monte Carlo estimation methods developed in the previous paragraphs can be slightly modified in order to take into account the parameters of the complete transmission chain (cf. points (1) and (2) above).

The correlation of the samples at the receiver stage mainly comes from the linear filters in the channel. In fact, this problem leads to the estimation of the parameter p = T/Ts, the number of received samples per symbol period, as the parameters of the Chebyshev filters at the emission and multiplexing stages depend on its value.

Computing the correlation of the received samples makes it possible to give an estimate of p [31] in the case where this quantity is an integer, and thus to estimate the parameters of the filters. In the sequel, we consider that this is the case, assuming that proper synchronization processing has been performed at the receiver stage. This task can also be achieved via Monte Carlo simulation methods [24]. This parameter p will be used in an explicit manner in the recursive equalization algorithm introduced in Section 4.3.
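The correlation-based estimate of p mentioned above can be sketched as follows. This is an illustrative toy (noise-free, piecewise-constant QPSK samples), not the paper's procedure: for a signal held constant over p samples, the lag-1 autocorrelation is about (p − 1)/p, so p is recovered by inverting that relation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: QPSK phase samples held constant over p_true
# received samples per symbol.
p_true, n_sym = 8, 10_000
symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n_sym)))
x = np.repeat(symbols, p_true)          # oversampled received sequence

# Lag-1 autocorrelation: within a symbol the product is 1, across symbol
# boundaries it averages to 0, giving rho_1 ~ (p - 1)/p.
rho1 = np.real(np.mean(x[:-1] * np.conj(x[1:])))
p_hat = round(1.0 / (1.0 - rho1))
```

With additive noise one would normalize the autocorrelation by the lag-0 value first; the integer rounding then still recovers p at reasonable SNR.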

An MCMC simulation scheme [29, 30] for the batch processing of received data was studied in [31]. A sequential simulation method for sampling the distribution (26) is now introduced, as many applications in the field of telecommunication require in-line processing methods when data is available sequentially.

[Figure 10: Formal updating scheme of a particle filter.]

Algorithm 1 (equalization algorithm):

(1) Initialization. Sample φ0(i) ∼ U{π/4, 3π/4, 5π/4, 7π/4} and set the weights w0(i) = 1/M for i = 1, ..., M; set j = 1.

(2) Importance sampling. Diffuse/propagate the particles by drawing

$$\varphi_j(i) \sim p\bigl(\varphi_j \mid \varphi_{j-1}(i)\bigr) \qquad (28)$$

for i = 1, ..., M, and actualize the paths (φ0(i), ..., φj(i)) = (φ0(i), ..., φj−1(i), φj(i)).

(3) Compute/update the weights

$$w_j(i) = p\bigl(r(jT_s) \mid \varphi_j(i)\bigr) \times w_{j-1}(i), \qquad (29)$$

and normalize them: wj(i) = wj(i) / Σ_{k=1}^{M} wj(k).

(4) Selection/actualization of the particles. Resample M particles (φ0(i), ..., φj(i)) from the set (φ0(i), ..., φj(i))_{1≤i≤M} according to their weights (wj(i))_{1≤i≤M} and set the weights equal to 1/M.

(5) j ← j + 1 and go to (2).
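Algorithm 1 can be sketched compactly. This is an illustrative implementation under stated assumptions: the weight factor in (29), which the paper evaluates by Monte Carlo integration over the channel parameters, is replaced here by a user-supplied toy Gaussian likelihood, and the proposal is the stay-or-redraw scheme of Section 4.3.

```python
import numpy as np

rng = np.random.default_rng(3)
PHASES = np.array([np.pi / 4, 3 * np.pi / 4, 5 * np.pi / 4, 7 * np.pi / 4])

def bootstrap_equalizer(r, likelihood, M=50, p=8):
    # Sketch of Algorithm 1; `likelihood(r_j, phi)` stands for the
    # factor p(r(jTs) | phi_j(i)) in (29).
    T = len(r)
    paths = np.empty((T, M))
    phi = rng.choice(PHASES, M)            # (1) initialization
    w = np.full(M, 1.0 / M)
    for j in range(T):
        if j > 0:                          # (2) proposal: keep the phase
            redraw = rng.random(M) < 1.0 / p   # w.p. 1 - 1/p, else redraw
            phi = np.where(redraw, rng.choice(PHASES, M), phi)
        paths[j] = phi
        w = w * likelihood(r[j], phi)      # (3) weight update (29)
        w = w / w.sum()
        idx = rng.choice(M, size=M, p=w)   # (4) multinomial selection
        paths[: j + 1] = paths[: j + 1][:, idx]
        phi = paths[j].copy()
        w = np.full(M, 1.0 / M)            # (5) next j
    return paths

# Toy run: QPSK phases held over p = 8 samples, observed in Gaussian noise.
true_phi = np.repeat(rng.choice(PHASES, 20), 8)
noise = 0.2 * (rng.standard_normal(160) + 1j * rng.standard_normal(160))
obs = np.exp(1j * true_phi) + noise
lik = lambda rj, phi: np.exp(-np.abs(rj - np.exp(1j * phi)) ** 2 / 0.08)
paths = bootstrap_equalizer(obs, lik, M=50, p=8)
```

A per-sample MAP estimate (majority phase among the surviving paths) then recovers most of the emitted sequence in this toy setting.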

4.2. Sequential simulation method

A sequential method for sampling distribution (26) can be implemented via particle filtering techniques [15, 16]. The wide scope of this approach, originally developed for the recursive estimation of nonlinear state-space models [17, 18, 20, 21], is well suited to the sampling task of this equalization problem. The basic idea of particle filtering is to generate iteratively sequences of the variables of interest, each of them denoted as a "particle," here written as

$$\bigl(x_0(i), x_1(i), \dots, x_t(i)\bigr)_{1 \le i \le M}, \qquad (27)$$

such that the particles (xt(1), ..., xt(M)) at time t are distributed from the desired distribution, denoted as pt(x). This goal can be reached with the use of two "tuning factors" in the algorithm:

(i) the way the particles are propagated or diffused, xt(i) → xt+1(i), in the sampling space, namely, the choice of a proposal or candidate distribution;

(ii) the way the distribution of particles (xt(1), ..., xt(M)) approximates the target distribution pt(x): by assigning a weight wt(i) to each particle depending on the proposal distribution, and updating these weights with an appropriate recursive scheme.

These two tasks are illustrated in Figure 10, where each "ball" stands for a particle xt(i) whose weight is represented by the length of an associated arrow.

Such a recursive simulation algorithm is referred to as sequential importance sampling or particle filtering in the Monte Carlo literature [16, 18, 20, 21]. A good choice of the candidate distribution generally makes it possible to reduce the computational time of the sampling scheme, as explained in the next paragraph. Such a Monte Carlo simulation scheme is now proposed to tackle the sequential sampling of distribution (26).

4.3. Equalization algorithm

In the present case, phase samples φj = φ(jTs) of the emitted signal are directly sampled. The simulation scheme which is considered is the bootstrap filter [15, 16, 17, 18] and is given in Algorithm 1.

The importance sampling and weight computation steps (28) and (29) are detailed hereinafter.

The information brought by the parameter p, the number of received samples per symbol duration, is taken into account via the proposal distribution (28). Indeed, candidates for particles φj(i) can be naturally sampled from the following scheme:

(i) set φj(i) = φj−1(i) with probability 1 − 1/p;
(ii) sample φj(i) ∼ U{π/4, 3π/4, 5π/4, 7π/4} with probability 1/p.

This sampling scheme is very simple and can easily be improved, for instance by considering φkp(i) ∼ U{π/4, 3π/4, 5π/4, 7π/4} and φkp+s(i) = φkp(i) for 1 ≤ s ≤ p − 1. However, the scheme above gives sufficiently accurate results as a first approximation, due to its flexibility (if a false symbol is chosen, there is a probability of switching to other symbols again) and its ability to deal with possible uncertainty on the value of the parameter p. This scheme is also efficient in limiting the negative effect of sample impoverishment due to the resampling step, as detailed hereinafter. The proposed scheme, however, does not completely take into account the information coming from the emitted and received signals, and if some coding techniques are used to generate the symbols, this knowledge should be introduced in the sampling scheme (28) if possible.

The computation of the weights (29) is realized by using Monte Carlo approaches similar to the ones introduced previously, including the filters and their parameters in expressions (14) and (18). In this respect, Algorithm 1 can be seen as a Rao-Blackwellized particle filter [20, 21] where the parameters of the channel, considered here as nuisance parameters, are integrated out. This generally leads to more robust estimates in practice [16, 33, 34].

A crucial point in the implementation of particle filtering techniques lies in the resampling stage, step (4) in Algorithm 1. As the computations for sampling the candidates (28) and updating the weights (29) can be performed in parallel, the resampling step gives the main contribution to the computing time of the algorithm, as its achievement needs the whole set of particles. This stage is nevertheless compulsory in practice if one wants the sampler to work efficiently. This is mainly due to the fact that the sequential importance sampling algorithm without resampling naturally increases the variance of the weights (wj(i))_{1≤i≤M} with time [20, 22, 35]. In such a case, only a few particles are assigned nonnegligible weights after several iterations, implying a poor approximation of the target distribution and a waste of computation.

To limit this effect, several approaches can be considered [15]. One consists in using very large numbers of particles M and/or in performing the resampling step at each iteration [17, 18]. However, resampling too many times often leads to severe sample impoverishment [16, 20, 21]. Other methods, also aiming at minimizing computational and memory costs, consist in using efficient sampling schemes for diffusing the particles [20] and performing the resampling stage occasionally, when it seems to be needed [15, 21]. When to perform resampling can be decided by measuring the variance of the weights via the computation of the effective sample size M/(1 + var(wj(i))), one estimate of which is given by $1 / \sum_{i=1}^{M} w_j^2(i)$ [15, 16, 21, 35]. In this case, the resampling stage can be performed each time the estimated effective sample size is small, this quantity measuring how efficient the propagation of the particles in the sampling space is. It equals M for uniformly weighted particles and equals 1 in the degenerate case where all the particles have zero weights except one.

It is also possible to compute the entropy of the weights, describing "how far" the distribution of the weights is from the uniform distribution. Indeed, the entropy is maximized for uniform weights and minimized for the degenerate configurations mentioned above. In this sense, the entropy of the weights quantifies the information of the samples and measures the efficiency of representation of a given population of particles. This approach is adopted in [24, 36], for instance, and also in our algorithm, as follows. Step (4) of Algorithm 1 is therefore replaced by the computation of the entropy of the weights

$$H\bigl(w_t(1), \dots, w_t(M)\bigr) = -\sum_{i=1}^{M} w_t(i) \log w_t(i), \qquad (30)$$

and a resampling/selection step is processed only if the condition

$$H\bigl(w_t(1), \dots, w_t(M)\bigr) \le \lambda \times \max_{w(1),\dots,w(M)} H\bigl(w(1), \dots, w(M)\bigr) = \lambda \log M \qquad (31)$$

holds, where λ is a threshold value set by the user.

To show that the estimated effective sample size and the entropy lead to similar results for the resampling task, their values for one realization of the algorithm are depicted in Figure 11.

[Figure 11: (a) Estimated effective sample size (ESS) and (b) entropy (H) of the weights for one realization of the particle filtering algorithm, M = 100 particles, resampling performed when ESS ≤ M/10 or H ≤ log M/2.]

The resampling step can also be performed via different techniques [18, 21]. In the sequel, we use the general multinomial sampling procedure [16, 18]. As the variable of interest φ is distributed over a set of discrete values, using large numbers of particles turned out to be unnecessary: simulations have shown that several dozen are sufficient for the problem considered here. This is also the case for other Monte Carlo simulation methods used for estimating telecommunication models [37]. Moreover, the degeneracy phenomenon is not really a nuisance in the present case, as Algorithm 1 is only used for MAP estimation purposes and not for computing mean integral estimates such as (18) and (A.4).

Some simulations concerning the performance of the equalization algorithm with this sequential sampling scheme are now presented.
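The two resampling criteria above, the estimated effective sample size and the entropy condition (31), can be sketched as follows (a minimal illustration; the weight vectors are synthetic):

```python
import numpy as np

def ess(w):
    # Estimated effective sample size, 1 / sum(w_i^2): equals M for
    # uniform weights and 1 in the fully degenerate case.
    return 1.0 / np.sum(w ** 2)

def entropy(w):
    # Entropy (30) of the normalized weights; its maximum is log M.
    w = w[w > 0]
    return -np.sum(w * np.log(w))

def need_resampling(w, lam=0.1):
    # Condition (31): resample when H(w) <= lambda * log M.
    return entropy(w) <= lam * np.log(len(w))

M = 100
uniform = np.full(M, 1.0 / M)            # best case: ESS = M, H = log M
degenerate = np.zeros(M)                 # worst case: one particle
degenerate[0] = 1.0                      # carries all the mass
```

Both diagnostics order these two extreme weight configurations the same way, which is the point made by Figure 11.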


[Figure 12: Mean ± standard deviation (straight/dotted lines) of the BER values for MAP estimates of signals versus SNRr in dB, model of Figure 1 for SNRe = 12 dB.]

[Figure 13: Mean BER values for MAP estimates of signals versus SNRr in dB, model of Figure 1 for various SNRe values (SNRe = 15, 12, 10 dB).]

5. NUMERICAL EXPERIMENTS

The equalization method was run for 100 realizations of sequences of 1000 samples for each value of SNR at the receiver stage and for an SNR equal to 12 dB at the emission stage. The number of received samples per symbol period is p = 8. The channel model is depicted in Figure 1.

(a) The amplifier model is described in Section 2.2, where the parameters are set as in the first row of Table 1.

(b) The fading channel is composed of one delayed path of 3 samples (∆ = 3Ts; cf. Section 2.3), attenuated by 10 dB in comparison with the principal trajectory.

For each realization, the emitted symbol sequence is estimated by considering the MAP trajectory computed from a Monte Carlo approximation of the distribution (26) with M = 50 particles (27). The weights (29) are computed with the help of the Monte Carlo estimation techniques introduced in Section 3.2, considering sequences of 100 samples. In our simulations, the value λ = 0.1 for threshold (31) gave acceptable estimation results. The mean values (straight lines) and their associated variances (dotted lines) of the BER of the MAP estimates are depicted in Figure 12.

The equalization algorithm was also run for different values of the uplink SNR: 10 dB and 15 dB. The mean values of the BER computed from the estimated phases are depicted in Figure 13. Curves from bottom to top are associated with decreasing SNRe.

One of the advantages of the proposed equalization method is its robustness with respect to nonstationarities of the transmission channel. This property comes from the MAP estimation procedure considering the distribution (18) marginalized with respect to the channel parameters. Simulations including perturbations of the parameters of the transmission chain lead to results similar to those presented in Figures 12 and 13. On the other hand, in case of dysfunction in the devices of the amplifier and/or of the filters, or if a sudden change of noise intensities happens during the transmission, the estimation method remains almost insensitive to these changes. The approach currently developed thus cannot be used for diagnostic purposes, as is the case for certain methods based on neural networks [4]. An interesting hybrid approach is proposed in the conclusion to cope with this task.

In order to compare the performance of the equalization method, the BERs computed from the MAP estimates of the symbols are compared with the ones obtained with signals estimated by an equalizer built from a 2-10-4 multilayer neural network, using hyperbolic tangents as activation functions [3]. The mean values (straight lines) and standard deviations (dotted lines) of the BER computed from Monte Carlo MAP estimates are represented in Figures 14 and 15. The mean values of the BER computed from signals estimated with the neural network method are depicted in dashed lines.

The two methods give similar results for this configuration. Nevertheless, an important and interesting characteristic of the sequential Monte Carlo estimation method is that it does not require any learning sequence for equalizing the transmission chain, contrary to approaches based on neural networks. The proposed method is thus efficient in the context of blind communication. A calibration step is at least necessary in order to estimate the number of received samples per symbol period. This tuning can be realized in a simple manner by computing the correlation of the received samples.

6. CONCLUSION

The particle filtering equalization method proposed in this paper makes it possible to sequentially estimate digital signals

[Figure 14: Mean ± standard deviation (straight/dotted lines) of the BER values for MAP estimates of signals versus SNRr in dB, model of Figure 1 for SNRe = 15 dB. Means of the BER values for the signals estimated with neural networks are depicted in dashed lines.]

[Figure 15: Mean ± standard deviation (straight/dotted lines) of the BER values for MAP estimates of signals versus SNRr in dB, model of Figure 1 for SNRe = 15 dB. Means of the BER values for the signals estimated with neural networks are depicted in dashed lines.]

transmitted through a satellite communication channel. The approach explicitly takes into account the nonlinearities induced by the amplification stage in the satellite, thanks to a Bayesian estimation framework implemented via Monte Carlo simulation methods. This approach makes it possible to recursively estimate the distribution of interest, namely, the posterior distribution of the variables, marginalized with respect to the parameters of the transmission chain.

An advantage of this approach is that it is robust to nonstationarities of the channel. On the other hand, the method cannot detect these nonstationarities and thus cannot be employed to predict dysfunctions of the transmission devices, as is the case for certain neural network approaches. It therefore seems interesting to use Markov chain and/or sequential Monte Carlo methods to train appropriate neural network models; this approach was successfully applied in the field of statistical learning, for instance [16, Section 17].

As for many particle-filtering-based methods, the implementation of the sampling algorithm is generally demanding in terms of computing time when compared to classical nonsimulation approaches, especially for tackling problems arising in the field of digital communication [19]. However, an advantage of the proposed Monte Carlo equalization method is that it does not require the knowledge of any learning input sequence in order to update the equalizer parameters. To the best of our knowledge, it is at the moment the only method for directly achieving the blind equalization task of such a transmission channel. This blind property can be of premium importance in the framework of communication in a noncooperative context. This is the case in passive listening, for instance, or when transmission of learning sequences cannot be completed correctly because of intense noise during the transmission.

APPENDICES

A. MONTE CARLO ESTIMATION OF THE POSTERIOR DISTRIBUTION p(φ | A, σe, TWT, σr, r)

As stated in Section 3.1, the problem is to compute likelihood (13). A solution consists in integrating this expression with respect to y, the amplified signal (cf. Figure 7):

$$\int p\bigl(r, y \mid A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr)\, dy. \qquad (A.1)$$

From Bayes' formula, this expression is proportional to

$$\int p\bigl(r \mid y, A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr)\, p\bigl(y \mid A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr)\, dy. \qquad (A.2)$$

As the downlink transmission noise is assumed to be Gaussian (cf. (8)), the likelihood is

$$p\bigl(r \mid y, \sigma_r\bigr) \propto \exp\Bigl(-\frac{|r - y|^2}{\sigma_r^2}\Bigr). \qquad (A.3)$$

The right probability density function in integral (A.2) can be computed by marginalizing it with respect to x, the signal to amplify (cf. Figure 7):

$$p\bigl(y \mid A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr) = \int p\bigl(y, x \mid A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr)\, dx \propto \int p\bigl(y \mid x, A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr) \times p\bigl(x \mid A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr)\, dx. \qquad (A.4)$$
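Carrying this derivation through, the appendix expresses the likelihood as an expectation (cf. (A.8) and (A.9)) that can be estimated by straightforward Monte Carlo averaging. A sketch under stated assumptions: the TWT is modeled with the Saleh AM/AM and AM/PM equations of [1], and the numerical parameter values here are illustrative, not the first row of Table 1.

```python
import numpy as np

rng = np.random.default_rng(4)

def twt(x, aa=2.0, ba=1.0, ap=4.0, bp=9.0):
    # Saleh model of a TWT amplifier [1]: amplitude gain (AM/AM) and
    # phase shift (AM/PM) as functions of the input modulus.
    rho = np.abs(x)
    gain = aa * rho / (1.0 + ba * rho ** 2)
    shift = ap * rho ** 2 / (1.0 + bp * rho ** 2)
    return gain * np.exp(1j * (np.angle(x) + shift))

def likelihood_A8(r, phi, A=0.5, sigma_e=0.1, sigma_r=0.3, n=1000):
    # Monte Carlo estimate of the expectation (A.8): draw
    # x ~ N_CC(A exp(i phi), sigma_e^2) as in (A.9) and average
    # exp(-|r - TWT(x)|^2 / sigma_r^2).
    x = A * np.exp(1j * phi) + sigma_e * (
        rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return np.mean(np.exp(-np.abs(r - twt(x)) ** 2 / sigma_r ** 2))

r0 = twt(0.5 * np.exp(1j * np.pi / 4))   # noise-free observation of phi = pi/4
vals = [likelihood_A8(r0, ph)
        for ph in (np.pi / 4, 3 * np.pi / 4, 5 * np.pi / 4, 7 * np.pi / 4)]
```

As expected, the estimated likelihood is largest for the phase actually sent through the amplifier.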

The signal y is entirely determined by the signal x through relations (6) and (7). The left probability density function in integral (A.4) thus equals

$$p\bigl(y \mid x, A, \varphi, \sigma_e, \mathrm{TWT}, \sigma_r\bigr) = \delta\bigl(y - \mathrm{TWT}(x)\bigr). \qquad (A.5)$$

The right probability density function in expression (A.4) is

$$p\bigl(x \mid A, \varphi, \sigma_e\bigr) \propto \exp\Bigl(-\frac{|x - A\exp(\imath\varphi)|^2}{\sigma_e^2}\Bigr), \qquad (A.6)$$

as the uplink transmission noise is assumed to be Gaussian (cf. (4)). Expression (13) is thus proportional to

$$\int \exp\Bigl(-\frac{1}{\sigma_r^2}\bigl|r - \mathrm{TWT}(x)\bigr|^2\Bigr) \exp\Bigl(-\frac{|x - A\exp(\imath\varphi)|^2}{\sigma_e^2}\Bigr)\, dx, \qquad (A.7)$$

and the above formula can be viewed as an expectation

$$E\Bigl[\exp\Bigl(-\frac{1}{\sigma_r^2}\bigl|r - \mathrm{TWT}(x)\bigr|^2\Bigr)\Bigr], \qquad (A.8)$$

where

$$x \sim \mathcal{N}_{CC}\bigl(A\exp(\imath\varphi), \sigma_e^2\bigr). \qquad (A.9)$$

ACKNOWLEDGMENTS

This work was partly supported by the research program BLISS (IST-1999-14190) from the European Commission. The first author is grateful to the Japan Society for the Promotion of Science and the Grant-in-Aid for Scientific Research in Japan for their funding. The authors thank P. Comon and C. Jutten for helpful comments and are grateful to the anonymous reviewers for their helpful suggestions, which have greatly improved the presentation of this paper.

REFERENCES

[1] A. A. M. Saleh, "Frequency-independent and frequency-dependent nonlinear models of TWT amplifiers," IEEE Trans. Communications, vol. 29, no. 11, pp. 1715–1720, 1981.
[2] S. Bouchired, M. Ibnkahla, D. Roviras, and F. Castanié, "Equalization of satellite mobile communication channels using RBF networks," in Proc. IEEE Workshop on Personal Indoor and Mobile Radio Communication (PIMRC '98), Boston, Mass, USA, September 1998.
[3] G. J. Gibson, S. Siu, and C. F. N. Cowen, "Multilayer perceptron structures applied to adaptive equalisers for data communications," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '89), vol. 2, pp. 1183–1186, Glasgow, UK, May 1989.
[4] M. Ibnkahla, N. J. Bershad, J. Sombrin, and F. Castanié, "Neural network modeling and identification of nonlinear channels with memory: algorithms, applications, and analytic models," IEEE Trans. Signal Processing, vol. 46, no. 5, pp. 1208–1220, 1998.
[5] M. Ibnkahla, J. Sombrin, F. Castanié, and N. J. Bershad, "Neural networks for modeling nonlinear memoryless communication channels," IEEE Trans. Communications, vol. 45, no. 7, pp. 768–771, 1997.
[6] C. You and D. Hong, "Blind equalization techniques using the complex multi-layer perceptron," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '96), pp. 1340–1344, November 1996.
[7] S. Prakriya and D. Hatzinakos, "Blind identification of LTI-ZMNL-LTI nonlinear channel models," IEEE Trans. Signal Processing, vol. 43, no. 12, pp. 3007–3013, 1995.
[8] S. Prakriya and D. Hatzinakos, "Blind identification of linear subsystems of LTI-ZMNL-LTI models with cyclostationary inputs," IEEE Trans. Signal Processing, vol. 45, no. 8, pp. 2023–2036, 1997.
[9] G. B. Giannakis and E. Serpedin, "Linear multichannel blind equalizers of nonlinear FIR Volterra channels," IEEE Trans. Signal Processing, vol. 45, no. 1, pp. 67–81, 1997.
[10] G. Kechriotis, E. Zervas, and E. S. Manolakos, "Using recurrent neural networks for adaptive communication channel equalization," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 267–278, 1994.
[11] G. M. Raz and B. D. Van Veen, "Blind equalization and identification of nonlinear and IIR systems - a least squares approach," IEEE Trans. Signal Processing, vol. 48, no. 1, pp. 192–200, 2000.
[12] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, vol. 7 of Cambridge Nonlinear Science Series, Cambridge University Press, Cambridge, UK, 1997.
[13] H. Tong, Non-linear Time Series: A Dynamical System Approach, vol. 6 of Oxford Statistical Science Series, Oxford University Press, Oxford, UK, 1990.
[14] J. Y. Huang, "On the design of nonlinear satellite CDMA receiver," M.S. thesis, Institute of Communication Engineering, College of Engineering and Computer Science, National Chiao Tung University, Hsinchu, Taiwan, June 1998.
[15] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[16] A. Doucet, N. De Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, Springer, New York, NY, USA, 2001.
[17] N. Gordon, D. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings F, Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[18] G. Kitagawa, "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models," Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
[19] E. Punskaya, A. Doucet, and W. J. Fitzgerald, "On the use and misuse of particle filtering in digital communications," in Proc. European Signal Processing Conference (EUSIPCO '02), September 2002.
[20] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[21] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.

[22] J. S. Liu and R. Chen, "Blind deconvolution via sequential imputations," Journal of the American Statistical Association, vol. 90, no. 430, pp. 567–576, 1995.
[23] R. Chen, X. Wang, and J. S. Liu, "Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[24] P.-O. Amblard, J.-M. Brossier, and E. Moisan, "Phase tracking: what do we gain from optimality? Particle filtering vs. phase-locked loops," Signal Processing, vol. 83, no. 1, pp. 151–167, 2003.
[25] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall Signal Processing Series, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.
[26] C. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers, Wiley Series in Telecommunications and Signal Processing, John Wiley & Sons, New York, NY, USA, 1995.
[27] J. M. Bernardo and A. F. M. Smith, Bayesian Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Chichester, UK, 1994.
[28] T. Clapp and S. J. Godsill, "Bayesian blind deconvolution for mobile communications," in Proc. IEE Colloquium on Adaptive Signal Processing for Mobile Communication Systems, vol. 9, pp. 9/1–9/6, London, UK, October 1997.
[29] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, NY, USA, 1999.
[30] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice, Interdisciplinary Statistics, Chapman & Hall, London, UK, 1996.
[31] S. Sénécal and P.-O. Amblard, "Blind equalization of a nonlinear satellite system using MCMC simulation methods," EURASIP Journal on Applied Signal Processing, vol. 2002, no. 1, pp. 46–57, 2002.
[32] R. Chen and T. H. Li, "Blind restoration of linearly degraded discrete signals by Gibbs sampling," IEEE Trans. Signal Processing, vol. 43, no. 10, pp. 2410–2413, 1995.
[33] G. Casella and C. Robert, "Rao-Blackwellisation of sampling schemes," Biometrika, vol. 83, no. 1, pp. 81–94, 1996.
[34] J. S. Liu, W. H. Wong, and A. Kong, "Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes," Biometrika, vol. 81, no. 1, pp. 27–40, 1994.
[35] A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputations and Bayesian missing data problems," Journal of the American Statistical Association, vol. 89, no. 425, pp. 278–288, 1994.
[36] D. T. Pham, "Stochastic methods for sequential data assimilation in strongly nonlinear systems," Monthly Weather Review, vol. 129, no. 5, pp. 1194–1207, 2001.
[37] P. Cheung-Mon-Chan and E. Moulines, "Global sampling for sequential filtering over discrete state space," EURASIP Journal on Applied Signal Processing, vol. 2004, no. 15, pp. 2242–2254, 2004.

Pierre-Olivier Amblard was born in Dijon, France, in 1967. He received the Ingénieur degree in electrical engineering in 1990 from the École Nationale Supérieure d'Ingénieurs Électriciens de Grenoble, Institut National Polytechnique de Grenoble (ENSIEG, INPG). He received the DEA degree and the Ph.D. degree in signal processing in 1990 and 1994, respectively, both from the INPG. Since 1994, he has been Chargé de Recherches with the Centre National de la Recherche Scientifique (CNRS), and he is working in the Laboratoire des Images et des Signaux (LIS, UMR 5083), where he is in charge of the Groupe Non Linéaire. His research interests include higher-order statistics, nonstationary and nonlinear signal processing, and applications of signal processing in physics.

Laurent Cavazzana graduated from the École Nationale Supérieure d'Informatique et de Mathématiques Appliquées de Grenoble (France) and received a DEA degree in applied mathematics from the Joseph Fourier University (Grenoble, France) in 2003. His areas of interest lie in stochastic modeling, neural networks, and speech processing.

Stéphane Sénécal received the M.S. degree in electrical engineering in 1999 and the Ph.D. degree in statistical signal processing in 2002 from the Institut National Polytechnique de Grenoble, France. He is now a Research Associate at the Institute of Statistical Mathematics, Japan. His research interests include stochastic simulation methods and their applications in signal processing and modeling.

EURASIP Journal on Applied Signal Processing 2004:15, 2328–2338
© 2004 Hindawi Publishing Corporation

Channel Tracking Using Particle Filtering in Unresolvable Multipath Environments

Tanya Bertozzi
Diginext, 45 Impasse de la Draille, 13857 Aix-en-Provence Cedex 3, France
Conservatoire National des Arts et Métiers (CNAM), 292 rue Saint-Martin, 75141 Paris Cedex 3, France
Email: [email protected]

Didier Le Ruyet
Conservatoire National des Arts et Métiers (CNAM), 292 rue Saint-Martin, 75141 Paris Cedex 3, France
Email: [email protected]

Cristiano Panazio
Conservatoire National des Arts et Métiers (CNAM), 292 rue Saint-Martin, 75141 Paris Cedex 3, France
Email: [email protected]

Han Vu Thien
Conservatoire National des Arts et Métiers (CNAM), 292 rue Saint-Martin, 75141 Paris Cedex 3, France
Email: [email protected]

Received 1 May 2003; Revised 9 June 2004

We propose a new timing error detector (TED) for the timing tracking loops inside the Rake receiver of spread spectrum systems. Based on a particle filter, this timing error detector jointly tracks the delays of all the paths of frequency-selective channels. Instead of using a conventional channel estimator, we have introduced a joint time delay and channel estimator with almost no additional computational complexity. The proposed scheme avoids the drawback of the classical early-late gate detector, which is not able to separate closely spaced paths. Simulation results show that the proposed detectors outperform the conventional early-late gate detector in indoor scenarios.

Keywords and phrases: sequential Monte Carlo, multipath channels, importance sampling, timing estimation.

1. INTRODUCTION

In wireless communications, direct-sequence spread spectrum (DS-SS) techniques have received an increasing interest, especially for the third generation of mobile systems. In DS-SS systems, the adapted filter typically employed is the Rake receiver. This receiver is efficient to counteract the effects of frequency-selective channels. It is composed of fingers, each assigned to one of the most significant channel paths. The outputs of the fingers are combined proportionally to the power of each path for estimating the transmitted symbols (maximum-ratio combining). Unfortunately, the performance of the Rake receiver strongly depends on the quality of the estimation of the parameters associated with the channel paths. As a consequence, we have to estimate the delay of each path using a timing error detector (TED). This goal is generally achieved in two steps: acquisition and tracking. During the acquisition phase, the number and the delays of the most significant paths are determined. These delays are estimated within one half chip of the exact delays. Then, the tracking module refines this first estimation and follows the delay variations during the permanent phase. The conventional TED used during the tracking phase is the early-late gate TED (ELG-TED) associated with each path. It is well known that the ELG-TED works very well in the case of a single fading path. However, in the presence of multipath propagation, the interference between the different paths can degrade its performance. In fact, the ELG-TED cannot separate the individual paths when they are closer than one chip period to the other paths, whereas a discrimination up to Tc/4 can still increase the diversity of the receiver (Tc denotes the chip time) [1]. When the difference between the delays of two paths is contained in the interval 0–1.5 Tc, we are in the presence of unresolvable multipaths. This scenario corresponds, for example, to the indoor case. The problem of unresolvable multipaths has recently been analyzed in [2, 3, 4].

Particle filtering (PF) or sequential Monte Carlo (SMC) methods [5] represent the most powerful approach for the sequential estimation of the hidden state of a nonlinear dynamic model. The solution to this problem depends on the knowledge of the posterior probability density (PPD) of the hidden state given the observations. Except in a few special cases including linear Gaussian system models, it is impossible to analytically calculate a sequential expression of this PPD; it is necessary to adopt numerical approximations. The PF methods give a discrete approximation of the PPD of the hidden state by weighted points or particles, which can be recursively updated as new observations become available.

The first main application of the PF methods was target tracking. More recently, these techniques have been successfully applied in communications, including blind equalization in Gaussian [6] and non-Gaussian [7, 8] noises and joint symbol and timing estimation [9]. For a complete survey of the communication problems dealt with using PF methods, see [10].

In this paper we propose to use the PF methods for estimating the delays of the paths in multipath fading channels. Since these methods are based on a joint approach, they provide optimal estimates of the different channel delays. In this way, we can overcome the problem of the adjacent paths, which causes the failure of the conventional single-path-tracking approaches in the presence of unresolvable multipaths. Moreover, we will combine the PF-based TED (PF-TED) with a conventional estimator for estimating the amplitudes of the channel coefficients. We will also apply the PF methods to the estimation of the channel coefficients in order to jointly estimate the delays and the coefficients.

This paper is organized as follows. In Section 2, we will introduce the system model. Then in Section 3, we will describe the conventional ELG-TED and the PF-TED. In Section 4, we will present the conventional estimators of the channel coefficients and the application of the PF methods to the joint estimation of the delays and the channel coefficients. In Section 5, we will give simulation results. Finally, we will draw a conclusion in Section 6.

2. SYSTEM MODEL

We consider a DS-SS system sending a complex data sequence {s_n}. The data symbols are spread by a spreading sequence {d_m}, m = 0, ..., Ns − 1, where Ns is the spreading factor. The resulting baseband equivalent transmitted signal is given by

e(t) = Σ_n s_n Σ_{m=0}^{Ns−1} d_m g(t − mTc − nT),   (1)

where Tc and T are respectively the chip and symbol periods and g(t) is the impulse response of the root-raised cosine filter, with a rolloff factor equal to 0.22 in the case of the universal mobile telecommunications system (UMTS) [11]. h(t, τ) denotes the overall impulse response of the multipath propagation channel with Lh independent paths (wide-sense stationary uncorrelated scatterers (WSSUS) model):

h(t, τ) = Σ_{l=1}^{Lh} h_l(t) δ(τ − τ_l(t)).   (2)

Each path is characterized by its time-varying delay τ_l(t) and channel coefficient h_l(t).

The signal at the output of the matched filter is given by

r(t) = Σ_{l=1}^{Lh} h_l(t) Σ_n s_n Σ_{m=0}^{Ns−1} d_m R_g(t − mTc − nT − τ_l(t)) + ñ(t),   (3)

where ñ(t) represents the additive white Gaussian noise (AWGN) n(t) filtered by the matched filter and

R_g(t) = ∫_{−∞}^{+∞} g*(τ) g(t + τ) dτ   (4)

is the total impulse response of the transmission and receiver filters. Figure 1 shows the equivalent lowpass transmission model considered in this paper.

Figure 1: Equivalent lowpass transmission system model.

The output of the matched filter is used as the input of the Rake receiver. The Rake receiver model is shown in Figure 2. The Rake receiver is composed of L branches corresponding to the L most significant paths. In the lth branch, the received and filtered signal r(t) is sampled at time mTc + nT + τ̂_l in order to compensate the timing delay τ_l of the associated path with the estimate τ̂_l. The outputs of the branches are combined to estimate the transmitted symbols. The output of the Rake receiver is given as

ŝ_n = ŝ(nT) = (1/Ns) Σ_{l=1}^{L} ĥ_l* Σ_{m=0}^{Ns−1} d_m* r(mTc + nT + τ̂_l).   (5)

3. THE TIMING ERROR DETECTION

3.1. The conventional TED

The Rake receiver needs good timing delay and channel estimators for each path to extract the most signal power from the received signal and to maximize the signal-to-noise ratio at the output of the Rake receiver.
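To make the system model of Section 2 concrete, the sketch below builds a chip-rate version of the transmitted signal (1) and the multipath superposition in (3), with rectangular chips standing in for the root-raised cosine pulse g(t) and integer-chip path delays. Every name and numeric value here (spreading factor, path gains, noise level) is an illustrative assumption of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

Ns = 8        # spreading factor (the paper uses 64; 8 keeps the sketch small)
n_sym = 4     # number of data symbols

# QPSK data symbols s_n and a +/-1 spreading sequence d_m, as in (1)
s = (rng.choice([-1.0, 1.0], n_sym) + 1j * rng.choice([-1.0, 1.0], n_sym)) / np.sqrt(2)
d = rng.choice([-1.0, 1.0], Ns)

# chip-rate baseband signal: each symbol is spread by the Ns chips
e = np.concatenate([sym * d for sym in s])

# two-path channel h(t, tau) as in (2): coefficients h_l and delays tau_l
h = np.array([0.8 + 0.0j, 0.0 + 0.5j])
tau = np.array([0, 1])    # delays in chips (integer-valued for simplicity)

# received signal in the spirit of (3): superposed delayed replicas plus AWGN
r = np.zeros(len(e) + tau.max(), dtype=complex)
for hl, tl in zip(h, tau):
    r[tl:tl + len(e)] += hl * e
r += 0.05 * (rng.standard_normal(len(r)) + 1j * rng.standard_normal(len(r)))
```

With a proper pulse shape and matched filtering, r would be the output described by (3); the rectangular-chip shortcut only keeps the example a few lines long.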

Figure 2: Rake receiver model.
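The despread-and-combine rule (5) behind Figure 2 can be sketched as follows, under the simplifying assumption of integer-chip delays (the interpolator/decimator of the actual receiver handles fractional ones); the helper name rake_output and all numeric values are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
Ns = 8
d = rng.choice([-1.0, 1.0], Ns)   # spreading sequence (illustrative)

def rake_output(r, d, h_hat, tau_hat, n, Ns):
    """Symbol estimate in the spirit of (5): each finger samples r at its
    delay, despreads with d_m*, and the finger outputs are combined with
    weights conj(h_l) (maximum-ratio combining)."""
    out = 0.0 + 0.0j
    for hl, tl in zip(h_hat, tau_hat):
        chips = r[tl + n * Ns: tl + (n + 1) * Ns]
        out += np.conj(hl) * np.sum(np.conj(d) * chips) / Ns
    return out

# noiseless single-path sanity check: the transmitted symbol is recovered
s0 = (1 - 1j) / np.sqrt(2)
r = s0 * d                         # one symbol, unit channel gain, zero delay
s_hat = rake_output(r, d, [1.0 + 0.0j], [0], 0, Ns)
```

With several overlapping paths the finger outputs also contain inter-path interference, which is exactly why the delay and coefficient estimates feeding this combiner matter.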

The conventional TED for DS-SS systems is the ELG-TED. The ELG-TED is devoted to the tracking of the delay of one path. It is composed of an early and a late branch. The signal r(t) is sampled at time mTc + nT + τ̂_l ± ∆. In this paper, we will use ∆ = Tc/2. We will restrict ourselves to the coherent ELG-TED, where the algorithm uses an estimation of the transmitted data or the pilots when they are available. The output of a coherent ELG-TED associated with the lth path is given by

x_n = x(nT) = Re{ Σ_{m=nNs}^{(n+1)Ns−1} ŝ_n* ĥ_l* [ r(mTc + τ̂_l + Tc/2) − r(mTc + τ̂_l − Tc/2) ] d̂_m* }.   (6)

The main limitation of the ELG-TED is its discrimination capability. Indeed, when the paths are unresolvable (separated by less than Tc), the ELG-TED is not able to correctly distinguish and track the paths. This scenario corresponds for example to the indoor case. These drawbacks motivated the proposed PF-TED.

3.2. The PF-TED

We propose to use the PF methods in order to jointly track the delay of each individual path of the channel. We assume that the acquisition phase has allowed us to determine the number of the most significant paths and to roughly estimate their delays.

The PF methods are used to sequentially estimate time-varying quantities from measures provided by sensors. In general, the physical phenomenon is represented by a state space model composed of two equations: the first describes the evolution of the unknown quantities, called the hidden state (evolution equation), and the second the relation between the measures, called observations, and the hidden state (observation equation). Given the initial distribution of the hidden state, the estimation of the hidden state at time t based on the observations until time t is known as Bayesian inference or Bayesian filtering. This estimation can be obtained through the knowledge of two distributions: the PPD of the sequence of hidden states from time 1 to time t given the corresponding sequence of observations, and the marginal distribution of the hidden state at time t given the sequence of observations until time t. Except in a few special cases including linear Gaussian state space models, it is impossible to analytically calculate these distributions. The PF methods provide a discrete and sequential approximation of these distributions, which can be updated when a new observation is available, without reprocessing the previous observations. The support of the distributions is discretized by particles, which are weighted samples evolving in time.

Tracking the delays of the individual channel paths can be interpreted as a Bayesian inference. The delays are the hidden state of the system, and the model (3) of the received samples relating the observations to the delays represents the observation equation. We notice that this equation is nonlinear with respect to the delays; as a consequence, we cannot analytically estimate the delays. To overcome this nonlinearity, we propose to apply the PF methods.

The PF methods have previously been applied to delay estimation in DS-CDMA systems [12, 13]. In [12], the PF methods are used to jointly estimate the data, the channel coefficients, and the propagation delay. In [13], the PF methods are combined with a Kalman filter (KF) to respectively estimate the propagation delay and the channel coefficients; the information symbols are assumed known, provided by a Rake receiver. In both papers, the delays of each channel path are considered known and multiples of the sampling time; therefore, only the propagation delay is estimated. In this paper, the approach is different. We suppose that each channel path has a slow time-varying delay, unknown at the receiver. This environment can represent an indoor wireless communication. We assume that the information symbols are known or have been estimated, essentially for three reasons:

(i) the computational complexity of the receiver should be reduced;
(ii) the channel estimation is typically performed by transmitting known pilot symbols, for example using a specific channel such as the common pilot channel (CPICH) of the UMTS;
(iii) the PF methods applied to the estimation of the information symbols perform slightly worse than simple deterministic algorithms [12, 14].

Firstly, we will apply the PF methods only to the estimation of the delays of each channel path, considering that the channel coefficients are known. In Section 4, we will introduce the estimation of the channel coefficients.

The structure of the proposed PF-TED is shown in Figure 3. This estimator operates on samples from the matched filter output taken at an arbitrary sampling rate 1/Ts (at least Nyquist sampling). Then, the samples are processed by means of interpolation and decimation in order to obtain intermediate samples at the chip rate 1/Tc. These samples are the input of the particle filter. In order to reduce the computational complexity of the PF-TED, and since the time variation of the delays is slow with respect to the symbol duration, we choose that the particle filter works at the symbol rate 1/T. Moreover, in order to exploit all the information contained in the chips of a symbol period, the equations of the PF algorithm are modified. The PF algorithm proposed in this paper is thus an adaptation of the PF methods to a DS-SS system.

Figure 3: Structure of the proposed PF-TED.

Following [15], the evolution of the delays of the channel paths can be described as a first-order autoregressive (AR) process:

τ_{1,n} = α_1 τ_{1,n−1} + v_{1,n},
...
τ_{L,n} = α_L τ_{L,n−1} + v_{L,n},   (7)

where τ_{l,n} for l = 1, ..., L denotes the delay of the lth channel path at time n, α_1, ..., α_L express the possible time variation of the delays from one time instant to the next, and v_1, ..., v_L are AWGN with zero mean and variance σ_v². Note that the time index n is an integer multiple of the symbol duration.

The estimation of the delays can be achieved using the minimum mean square error (MMSE) method or the maximum a posteriori (MAP) method. The MMSE solution is given by the following expectation:

τ̂_n = E[τ_n | r_{1:n}],   (8)

where τ_n = {τ_{1,n}, ..., τ_{L,n}} and r_{1:n} is the sequence of received samples from time 1 to n. The calculation of (8) involves the knowledge of the marginal distribution p(τ_n | r_{1:n}). Unlike the MMSE solution, which yields an estimate of the delays at each time, the MAP method provides the estimate of the hidden state sequence τ_{1:n} = {τ_1, ..., τ_n}:

τ̂_{1:n} = arg max_{τ_{1:n}} p(τ_{1:n} | r_{1:n}).   (9)

The calculation of (9) requires the knowledge of the PPD p(τ_{1:n} | r_{1:n}). The simulations give similar results for the MMSE method and the MAP method; hence, we choose to adopt the MMSE solution as in [9]. In order to obtain samples from the marginal distribution, we use the sequential importance sampling (SIS) approach [16]. Applying the definition of the expectation, (8) can be expressed as follows:

τ̂_n = ∫ τ_n p(τ_n | r_{1:n}) dτ_n.   (10)

The aim of the SIS technique is to approximate the marginal distribution p(τ_n | r_{1:n}) by means of weighted particles:

p(τ_n | r_{1:n}) ≈ Σ_{i=1}^{Np} w̃_n^(i) δ(τ_n − τ_n^(i)),   (11)

where Np is the number of particles, w̃_n^(i) is the normalized importance weight at time n associated with the particle i, and δ(τ_n − τ_n^(i)) denotes the Dirac delta centered at τ_n = τ_n^(i).

The phases of the PF-TED based on the SIS approach are summarized below.

(1) Initialization. In this paper, we apply the PF methods for the tracking phase, assuming that the number of channel paths and the initial value of the delay for each path have been estimated during the acquisition phase [17]. We assume that the error on the delay estimated by the acquisition phase belongs to the interval (−Tc/2, Tc/2). Hence, the a priori probability density p(τ_0) can be considered uniformly distributed in (τ̂_0 − Tc/2, τ̂_0 + Tc/2), where τ̂_0 is the delay provided by the acquisition phase. Note that the PF methods can also be used for the acquisition phase; however, the number of particles has to be increased, because we have no a priori information on the initial value of the delays.

(2) Importance sampling. The time evolution of the particles is achieved with an importance sampling distribution. When r_n is observed, the particles are drawn according to the importance function. In general, the importance function is chosen to minimize the variance of the importance weights associated with each particle. In fact, it can be shown that the variance of the importance weights can only increase stochastically over time [16]. This means that, after a few iterations of the SIS algorithm, only one particle has a normalized weight almost equal to 1 and the other weights are very close to zero. Therefore, a large computational effort is devoted to updating paths with almost no contribution to the final estimate. In order to avoid this behavior, a resampling phase of the particles is inserted among the recursions of the SIS algorithm. To limit this degeneracy phenomenon, we need to use the optimal importance function [16], given by

π(τ_n^(i) | τ_{1:n−1}^(i), r_{1:n}) = p(τ_n^(i) | τ_{n−1}^(i), r_n).   (12)

Unfortunately, the optimal importance function can be analytically calculated only in a few cases, including the class of models represented by a Gaussian state space model with a linear observation equation. In our case, the observation equation (3) is nonlinear and thus the optimal importance function cannot be analytically determined. We can consider two solutions to this problem [16]:

(i) the a priori importance function p(τ_n^(i) | τ_{n−1}^(i));
(ii) an approximated expression of the optimal importance function obtained by linearization of the observation equation about τ_{l,n}^(i) = α_l τ_{l,n−1}^(i) for l = 1, ..., L.

Since the second solution involves the calculation of the derivative of the nonlinear observation equation, and hence very complex operations, we choose the a priori importance function as in [9]. Considering that the noises v_{l,n} for l = 1, ..., L in (7) are Gaussian, the importance function for each delay l is a Gaussian distribution with mean α_l τ_{l,n−1}^(i) and variance σ_v².

(3) Weight update. The evaluation of the importance function for each particle at time n enables the calculation of the importance weights [16]:

w_n^(i) = w_{n−1}^(i) · p(r_n | τ_n^(i)) p(τ_n^(i) | τ_{n−1}^(i)) / π(τ_n^(i) | τ_{1:n−1}^(i), r_{1:n}).   (13)

This expression represents the calculation of the importance weights if we only consider the samples of the received signal at the symbol rate. However, in a DS-SS system we have additional information provided by the Ns samples of each symbol period due to the spreading sequence. Consequently, we modify (13) to take into account the presence of the spreading sequence. Indeed, observing that the received samples are independent, the probability density p(r_n | τ_n^(i)) at the symbol rate can be written as

p(r_n | τ_n^(i)) = ∏_{m=nNs}^{(n+1)Ns−1} p(r_m | τ_n^(i)).   (14)

Considering (3) at the chip rate and recalling the assumption of known symbols, the probability density p(r_m | τ_n^(i)) is Gaussian. Typically, the received sample r_m is complex. For the calculation of the Gaussian distribution, we can write r_m as a bidimensional vector with components the real part and the imaginary part of r_m. The probability density p(r_m | τ_n^(i)) is thus given by

p(r_m | τ_n^(i)) = (1/(πσ_n²)) exp(−(1/σ_n²) |r_m − µ_m^(i)|²),   (15)

where σ_n² is the variance of the AWGN ñ(t) in (3) and the mean µ_m^(i) is obtained by

µ_m^(i) = Σ_{l=1}^{L} h_{l,n} s_n Σ_{k=m−3}^{m+3} d_k R_g(mTc − kTc − nT − τ_{l,n}^(i)).   (16)

In order to reduce the computational complexity of the PF-TED, in (16) we have assumed that the contribution of the raised cosine filter R_g to the sum on the spreading sequence is limited to the previous 3 and next 3 samples. By substitution of (15) in (14), the latter becomes

p(r_n | τ_n^(i)) = (1/(πσ_n²))^{Ns} exp(−(1/σ_n²) Σ_{m=nNs}^{(n+1)Ns−1} |r_m − µ_m^(i)|²).   (17)

Assuming the a priori importance function, (13) yields

w_n^(i) = w_{n−1}^(i) p(r_n | τ_n^(i)) = w_{n−1}^(i) (1/(πσ_n²))^{Ns} exp(−(1/σ_n²) Σ_{m=nNs}^{(n+1)Ns−1} |r_m − µ_m^(i)|²).   (18)

Finally, the importance weights in (18) are normalized using the following expression:

w̃_n^(i) = w_n^(i) / Σ_{j=1}^{Np} w_n^(j).   (19)

(4) Estimation. By substitution of (11) into (10), we obtain at each time the MMSE estimate:

τ̂_n = Σ_{i=1}^{Np} w̃_n^(i) τ_n^(i).   (20)

(5) Resampling. This algorithm presents a degeneracy phenomenon: after a few iterations, only one particle has a normalized weight almost equal to 1 and the other weights are very close to zero. This problem of the SIS method can be eliminated with a resampling of the particles. A measure of the degeneracy is the effective sample size Neff, estimated by

N̂eff = 1 / Σ_{i=1}^{Np} (w̃_n^(i))².   (21)

When N̂eff is below a fixed threshold Nthres, the particles are resampled according to the weight distribution [16]. After each resampling task, the normalized weights are initialized to 1/Np.

4. THE ESTIMATION OF THE CHANNEL COEFFICIENTS

4.1. The conventional estimators

Channel estimation is performed using the known pilot symbols. If we suppose that the channel remains almost unchanged during the slot, the conventional estimator of the channel coefficients of the lth path is obtained by correlation using the known symbols [18]:

ĥ_l = (1/(Npilot Ns)) Σ_{n=0}^{Npilot−1} Σ_{m=0}^{Ns−1} s_n* d_m* r(mTc + nT + τ̂_l),   (22)

where Npilot is the number of pilot symbols in a slot. For each path, the received signal is sampled at time mTc + nT + τ̂_{l,n} in order to compensate its delay. Then the samples are multiplied by the despreading sequence and summed over the whole sequence of pilot symbols. The problem of this estimator is that when the delays are unresolvable, the estimation becomes biased. To eliminate this bias, we can use an estimator based on the maximum likelihood (ML) criterion. In [1, 19], a simplified version of the ML estimation is proposed. The channel coefficients which maximize the ML criterion are given by

ĥ = P^{−1} a,   (23)

where ĥ = (ĥ_1, ..., ĥ_L), P is an L × L matrix with elements P_ij = R_g(τ_{i,n} − τ_{j,n}), and a is the vector of the channel coefficients calculated using (22).

4.2. The PF-based joint estimation of the delays and the channel coefficients

We can apply the PF methods to jointly estimate the delays of each path and the channel coefficients with a very low additional cost in terms of computational complexity. This is a suboptimal solution, since the observation equation (3) is linear and Gaussian with respect to the channel coefficients; the optimal solution is represented by a KF. However, combining the PF methods and the KF to jointly estimate the delays and the channel coefficients involves the implementation of a KF. It is better to use the particles employed for the delay estimation and to associate to each particle the estimation of the channel coefficients.

In this case, the hidden state is composed of the L delays and the L channel coefficients of the individual paths. When a particle evolves in time, its new position is thus determined by the evolution of the delays and the evolution of the channel coefficients. The delays evolve as described for the PF-TED. For the channel coefficients, we assume that the time variations are slow as, for example, in indoor environments. Hence, the evolution of the channel coefficients can be expressed by the following first-order AR model:

h_{1,n} = β_1 h_{1,n−1} + z_{1,n},
...
h_{L,n} = β_L h_{L,n−1} + z_{L,n},   (24)

where β_1, ..., β_L describe the possible time variation of the channel coefficients from one time instant to the next and z_1, ..., z_L are AWGN with zero mean and variance σ_z². The parameters of the channel AR model (24) are chosen according to the Doppler spread of the channel [20]. Notice that this joint estimator operates at the symbol rate, as the PF-TED does.

As for the delays, we only consider the MMSE method for the estimation of the channel coefficients, and we use the a priori importance function:

π(h_n^(i) | h_{1:n−1}^(i), r_{1:n}) = p(h_n^(i) | h_{n−1}^(i)),   (25)

where h_n = {h_{1,n}, ..., h_{L,n}}. Considering that the noises z_{l,n} for l = 1, ..., L in (24) are Gaussian, the importance function for the channel coefficients is a Gaussian distribution with mean β_l h_{l,n−1}^(i) and variance σ_z². To determine the positions of the particles at time n from the positions at time n − 1, each particle is drawn according to p(τ_n^(i) | τ_{n−1}^(i)) and (25).

The calculation of the importance weights is very similar to the case of the PF-TED. The only difference is that the channel coefficients h_{l,n} are replaced by the support of the particles h_{l,n}^(i) to calculate the mean (16).

5. SIMULATION RESULTS

In this section, we will compare the performance of the conventional ELG-TED and the PF-TED. In order to demonstrate the gain achieved using the latter, we will consider different indoor scenarios with a two-path Rayleigh channel with the same average power on each path and a maximum Doppler frequency of 19 Hz, corresponding to a mobile speed of 10 km/h at a carrier frequency of 2 GHz. The simulation setup is compatible with the UMTS standard. In these conditions, the time variations of the channel delays can be expressed by the model (7), with α_1 = ··· = α_L = 0.99999 and σ_v² = 10^−5 [15]. Moreover, the time variations of the channel coefficients can be represented by the model (24), with β_1 = ··· = β_L = 0.999 and σ_z² = 10^−3.

In these simulations, a CPICH is used. In each slot of the CPICH, 40 pilot symbols equal to 1 are spread at the chip level by a spreading factor of 64. The spreading sequence is a PN sequence changing at each symbol.

5.1. Tracking performance

We assume that the channel coefficients are known in order to evaluate the TED's tracking capacity, and the simulation time is equal to 0.333 second, corresponding to 500 slots. We have firstly considered the delays of the two paths varying according to the following model:

τ_{1,n} = α_1 τ_{1,n−1} + v_{1,n},
τ_{2,n} = α_2 τ_{2,n−1} + v_{2,n},   (26)

where α_1 = α_2 = 0.999, σ_{v,1}² = σ_{v,2}² = 0.001, τ_{1,0} = 0, and τ_{2,0} = 1.

Figure 4 shows one realization of the considered delays and the tracking performance of two ELG-TEDs used for the estimation of the two delays. We assume that Es/N0 = 10 dB, where Es is the energy per symbol and N0 is the unilateral spectral power density. The classical ELG-TED presents difficulties to follow the time variations of the two delays, especially when the delay separation becomes less than 1 Tc.

Figure 4: Delay tracking with the conventional ELG-TED.

However, it is very important for the TED to distinguish the different paths of the channel to enable the Rake receiver to exploit the diversity contained in the multipath nature of the channel. In [1], it has been shown that the gain in diversity decreases as the separation between the paths decreases. In particular, a loss of 2.5 dB in the performance of the matched filter bound for a BER equal to 10^−2, passing from Tc to Tc/4, has been observed. Moreover, it has been noted that an interesting gain in diversity occurs if the TED distinguishes paths separated by more than Tc/4. On the other hand, it has been found that the performance of the matched filter bound for a separation of Tc/8 is very close to the one obtained with only one path. Consequently, the TED discrimination capacity has to be equal to Tc/4. Unfortunately, the ELG-TED fails to distinguish all the paths with a delay separation less than 1 Tc. In Figure 5, we can observe how the discrimination capacity of the TED can be improved using the PF methods.

Figure 5: Delay tracking with the PF-TED.

In order to better highlight this behavior, we have fixed the delay of the first path at 0, and the delay of the second path decreases linearly from 2 Tc to 0 over a simulation time of 0.333 second, corresponding to 500 slots. We assume that Es/N0 = 10 dB.

Firstly, we consider that the channel coefficients are known in order to evaluate the TED's tracking capacity. Figure 6 gives a representative example of the evolution of the two estimated delays using two ELG-TEDs. As soon as the difference between the two delays is lower than 1 Tc, due to the correlation between the two paths, the estimated delays tend to oscillate around each real delay. The ELG-TEDs are no longer able to perform the correct tracking of the delays. On the other hand, as shown in Figure 7, the proposed PF-TED is able to track the two paths almost perfectly. These results have been obtained using a particle filter with only 10 particles.

Figure 6: Delay tracking with the conventional ELG-TED.

Then, we have introduced the estimation of the channel coefficients into the TED. Figure 8 shows the results obtained with two ELG-TEDs combined with the conventional estimator based on the correlation. As soon as the difference between the two delays is lower than 1 Tc, the detectors no longer recognize the two paths: the weaker path merges with the stronger one.

In Figure 9, the PF-TED is associated with the conventional estimator of the channel coefficients based on the correlation. When the delay of the second path becomes less than 1 Tc, the channel estimator loses its capacity to track the time variations of the channel coefficients, and the PF-TED cannot track the delays of the two paths. To improve the channel estimation, we associate the PF-TED with the ML estimator, as shown in Figure 10. In this case, the PF-TED can track the delay of the second path up to Tc/2.
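The PF-TED recursion of Section 3.2 (steps (1)–(5): uniform initialization, a priori importance sampling, weighting by the likelihood, MMSE estimate, and resampling on the effective sample size) can be sketched generically. For brevity the DS-SS likelihood (17) is replaced here by a placeholder scalar Gaussian, and the observation model, names, and parameter values are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(2)

Np, Tc = 10, 1.0                 # 10 particles, as in the experiments above
alpha, sigma_v = 0.999, np.sqrt(1e-3)

# (1) initialization: particles uniform in (tau_hat0 - Tc/2, tau_hat0 + Tc/2)
tau_hat0 = 0.3
particles = tau_hat0 + Tc * (rng.random(Np) - 0.5)
weights = np.full(Np, 1.0 / Np)

def likelihood(r_n, tau):
    # placeholder for p(r_n | tau_n^(i)); the paper's DS-SS form multiplies
    # per-chip Gaussian densities over one symbol period, cf. (17)
    return np.exp(-0.5 * (r_n - tau) ** 2 / 0.01)

def pf_ted_step(particles, weights, r_n):
    # (2) draw from the a priori importance function: Gaussian with
    #     mean alpha * tau_{n-1}^(i) and variance sigma_v^2
    particles = alpha * particles + sigma_v * rng.standard_normal(Np)
    # (3) weight update and normalization, cf. (18)-(19)
    weights = weights * likelihood(r_n, particles)
    weights = weights / weights.sum()
    # (4) MMSE estimate: posterior mean over the particle cloud, cf. (20)
    tau_mmse = np.sum(weights * particles)
    # (5) resample when N_eff = 1 / sum(w^2) falls below a threshold, cf. (21)
    if 1.0 / np.sum(weights ** 2) < Np / 2:
        idx = rng.choice(Np, size=Np, p=weights)
        particles, weights = particles[idx], np.full(Np, 1.0 / Np)
    return particles, weights, tau_mmse

true_tau = 0.25                  # delay seen only through the likelihood
for _ in range(20):
    particles, weights, tau_hat = pf_ted_step(particles, weights, true_tau)
```

Swapping the placeholder likelihood for (17), evaluated from the received chips and the means (16), turns this skeleton into the detector used in the experiments.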

Figure 7: Delay tracking with the PF-TED.

Figure 8: Delay tracking with the conventional ELG-TED associated with a conventional channel coefficient estimator based on the correlation.

Figure 9: Delay tracking with the PF-TED associated with a conventional channel coefficient estimator based on the correlation.

Figure 10: Delay tracking with the PF-TED associated with a channel coefficient estimator based on the ML.
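Figures 8–10 contrast the correlation estimator (22) with its ML correction (23). The noise-free sketch below shows the mechanism: with unresolvable paths the correlator outputs mix the true coefficients through the pulse autocorrelation, a ≈ P h, and solving P ĥ = a removes that bias. The triangular stand-in for R_g and all numeric values are assumptions of ours.

```python
import numpy as np

def Rg(t):
    # triangular stand-in for the pulse autocorrelation R_g of (4)
    return np.maximum(0.0, 1.0 - np.abs(t))

h_true = np.array([1.0 + 0.0j, 0.6 - 0.3j])
tau = np.array([0.0, 0.5])      # unresolvable paths: Tc/2 apart

# each correlator (22), tuned to one delay, also picks up the other path
# through R_g, so the raw estimates are biased: a = P @ h_true
P = Rg(tau[:, None] - tau[None, :])
a = P @ h_true

# the ML correction (23): h_hat = P^{-1} a
h_ml = np.linalg.solve(P, a)
```

Here a differs from h_true (each entry leaks half of the other path, since Rg(0.5) = 0.5), while h_ml recovers the true coefficients exactly; with noise, (23) trades this bias for some noise enhancement.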

For smaller delays, the PF-TED continues to distinguish the two paths, but it cannot follow the time variations of the second delay: the delay of the second path remains close to the values estimated at Tc/2.

Using the PF methods to jointly estimate the delays and the channel coefficients, we can notice in Figure 11 that the PF-TED can track the time variations of the second path. This solution implies only a low additional cost in terms of computational complexity with respect to the PF-TED alone, since it exploits the set of particles used for the delay estimation for the channel coefficient estimation.

5.2. Mean square error of the delay estimators

In this section, we will compare the mean square error (MSE) in estimating τ_n of the ELG-TED and the PF-TED with the lower posterior Cramér-Rao bound


Figure 11: Delay tracking with a joint delay and channel coefficient estimator based on the PF methods.

Figure 12: Comparison of the PCRB with the MSE estimating τ_n of the ELG-TED and the PF-TED.

In the Bayesian context of this paper, the PCRB [21] is more suitable than the Cramér-Rao bound [22] to evaluate the MSE of time-varying unknown parameters.

The PCRB for estimating τ_n using r_{1:n} has the form

E[(τ̂_n − τ_n)²] ≥ J_{n,n}^{−1},  (27)

where J_{n,n} is the lower right element of the n × n Fisher information matrix.

In [21], the authors have shown how to recursively evaluate J_{n,n}. For our application, the nonlinear filtering system is

τ_{n+1} = α τ_n + v_n,
r_n = z_n(τ_n) + ñ_n,  (28)

where the second relation represents the nonlinear observation equation (3) at chip rate. Since the spreading sequence is different at each chip time, we have to evaluate z_n(τ_n) at this rate.

From the general recursive equation given in [21], the sequence {J_{n,n}} can be obtained as follows:

J_{n+1,n+1} = σ_v^{−2} + E[z′_{n+1}(τ_{n+1})²] σ_n^{−2} − α² σ_v^{−4} (J_{n,n} + α² σ_v^{−2})^{−1}.  (29)

In order to calculate E[z′_{n+1}(τ_{n+1})²], we have applied a Monte Carlo evaluation. We generate M i.i.d. state trajectories of a given length N_t, {τ_0^i, τ_1^i, ..., τ_{N_t}^i} with 1 ≤ i ≤ M, by simulating the system model defined in (28), starting from an initial state τ_0 drawn from the a priori probability density p(τ_0). For the calculation, we fixed M = 100.

In Figure 12, we show the comparison of the PCRB with the MSE in estimating τ_n of the ELG-TED and the PF-TED. For both algorithms, we use a uniform initial pdf p(τ_0). For the PF-TED, the 10 particles were initialized uniformly in the interval [−T_c/2, T_c/2]. The signal-to-noise ratio E_s/N_0 was fixed to 10 dB. We can see in Figure 12 that the PF-TED outperforms the ELG-TED and reaches the PCRB after 15 slots. The slow convergence of the ELG-TED and the PF-TED compared to the PCRB can be explained by the fact that the two TEDs are updated at each symbol while the PCRB is calculated at each chip.

5.3. Performance evaluation

Figure 13 shows the BERs versus E_s/N_0 considering a two-path channel with the same average power on each path. The delays of the first and second paths were respectively fixed at 0 and 1 T_c. The same maximum Doppler frequency as above was used. The BER values have been averaged over 50 000 bits.

When using two ELG-TEDs, except when the channel is known, the performance is very poor compared to the maximum achievable performance (known delays and channel coefficients). On the other hand, the PF-TED with channel coefficients known or estimated reaches the optimal performance. We can conclude that the considered TED must be able to separate the different paths of the channel; otherwise, the performance of the Rake receiver breaks down.

6. CONCLUSIONS

In this paper, we have proposed to use the PF methods in order to track the delays of the different channel paths. We have assumed that an acquisition phase has already provided an initial estimation of these delays.

We have firstly considered that the channel coefficients are known. We have compared the tracking capacity of the conventional ELG-TED and the proposed PF-TED. We have shown that when the delays of the channel paths become very close (less than 1 T_c), the ELG-TED is unable to track the time variations of the delays. However, the PF-TED continues to track the delays.

We have then introduced the channel coefficient estimation into the TED. We have considered two classical methods: the estimation based on the correlation using pilot symbols and the estimation based on the ML criterion. We have shown that the ELG-TED with estimation of the channel coefficients loses the capacity to distinguish the paths when the delays are very close. On the other hand, the PF-TED associated with the two classical channel estimators is able to separate the different paths. However, for very close delays the channel estimators prevent the PF-TED from tracking the time variations of the delays. We have proposed to estimate jointly the delays and the channel coefficients using the PF methods to avoid this loss of tracking. We have found that the joint estimation enables a better tracking of the delays.

Finally, we have seen that it is very important for the Rake receiver that the TED can distinguish the different paths of the channel. We have observed that in the case of unresolvable paths, the ELG-TED confuses the paths and the performance of the Rake receiver is very poor.

As a conclusion, we can say that the PF-TED based on the joint estimation of the delays and the channel coefficients can be a good substitute for the classical ELG-TED, especially for indoor wireless communications. Moreover, the computational complexity of the PF-TED is very limited, since we have used only 10 particles.

Figure 13: Performance comparison of the ELG-TED and the PF-TED (BER versus E_s/N_0 for the Rake receiver with known delays and known channel, the ELG-TED and PF-TED with known channel, the ELG-TED and PF-TED with channel estimated by correlation, and the PF-TED with channel estimated by PF).

REFERENCES

[1] H. Boujemaa and M. Siala, "Rake receivers for direct sequence spread spectrum systems," Ann. Telecommun., vol. 56, no. 5-6, pp. 291–305, 2001.
[2] V. Aue and G. P. Fettweis, "A non-coherent tracking scheme for the RAKE receiver that can cope with unresolvable multipath," in Proc. IEEE International Conference on Communications (ICC '99), pp. 1917–1921, Vancouver, BC, Canada, June 1999.
[3] R. De Gaudenzi, "Direct-sequence spread-spectrum chip tracking in the presence of unresolvable multipath components," IEEE Trans. Vehicular Technology, vol. 48, no. 5, pp. 1573–1583, 1999.
[4] G. Fock, J. Baltersee, P. Schulz-Rittich, and H. Meyr, "Channel tracking for RAKE receivers in closely spaced multipath environments," IEEE Journal on Selected Areas in Communications, vol. 19, no. 12, pp. 2420–2431, 2001.
[5] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[6] J. S. Liu and R. Chen, "Blind deconvolution via sequential imputations," Journal of the American Statistical Association, vol. 90, no. 430, pp. 567–576, 1995.
[7] R. Chen, X. Wang, and J. S. Liu, "Adaptive joint detection and decoding in flat-fading channels via mixture Kalman filtering," IEEE Transactions on Information Theory, vol. 46, no. 6, pp. 2079–2094, 2000.
[8] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, "Particle filtering for demodulation in fading channels with non-Gaussian additive noise," IEEE Trans. Communications, vol. 49, no. 4, pp. 579–582, 2001.
[9] T. Ghirmai, M. F. Bugallo, J. Míguez, and P. M. Djurić, "Joint symbol detection and timing estimation using particle filtering," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 4, pp. 596–599, Hong Kong, April 2003.
[10] P. M. Djurić, J. H. Kotecha, J. Zhang, et al., "Particle filtering," IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, September 2003.
[11] Third Generation Partnership Project, "Spreading and modulation (FDD) (Release 1999)," 3G Tech. Spec. (TS) 25.213, v. 3.3.0, Technical Specification Group Radio Access Network, December 2000.
[12] E. Punskaya, A. Doucet, and W. J. Fitzgerald, "On the use and misuse of particle filtering in digital communications," in Proc. 11th European Signal Processing Conference (EUSIPCO '02), Toulouse, France, September 2002.
[13] R. A. Iltis, "A sequential Monte Carlo filter for joint linear/nonlinear state estimation with application to DS-CDMA," IEEE Trans. Signal Processing, vol. 51, no. 2, pp. 417–426, 2003.
[14] T. Bertozzi, D. Le Ruyet, G. Rigal, and H. Vu-Thien, "On particle filtering for digital communications," in Proc. IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC '03), pp. 570–574, Rome, Italy, June 2003.
[15] R. A. Iltis, "Joint estimation of PN code delay and multipath using the extended Kalman filter," IEEE Trans. Communications, vol. 38, no. 10, pp. 1677–1685, 1990.
[16] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[17] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley Wireless Communications, Prentice Hall, Englewood Cliffs, NJ, USA, 1995.

[18] H. Meyr, M. Moeneclaey, and S. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing, Wiley Series in Telecommunications and Signal Processing, John Wiley & Sons, New York, NY, USA, 2nd edition, 1998.
[19] E. Sourour, G. Bottomley, and R. Ramesh, "Delay tracking for direct sequence spread spectrum systems in multipath fading channels," in Proc. IEEE 49th Vehicular Technology Conference (VTC '99), vol. 1, pp. 422–426, Houston, Tex, USA, July 1999.
[20] C. Komninakis, C. Fragouli, A. H. Sayed, and R. D. Wesel, "Multi-input multi-output fading channel tracking and equalization using Kalman estimation," IEEE Trans. Signal Processing, vol. 50, no. 5, pp. 1065–1076, 2002.
[21] P. Tichavsky, C. H. Muravchik, and A. Nehorai, "Posterior Cramér-Rao bounds for discrete-time nonlinear filtering," IEEE Trans. Signal Processing, vol. 46, no. 5, pp. 1386–1396, 1998.
[22] U. Mengali and A. N. D'Andrea, Synchronization Techniques for Digital Receivers, Plenum Press, New York, NY, USA, 1997.

Han Vu Thien received the M.Eng. degree in telecommunications from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1967 and the Ph.D. in physical science from the University of Paris XI, Orsay, France, in 1972. He has spent his entire career with the Conservatoire National des Arts et Métiers (CNAM), where he became a Professor of electrical engineering in 1982. Since 1984, he is also the Director of the Laboratoire des Signaux et Systèmes. His main research interests lie in image and signal processing for medical applications, digital television, and HF communication.

Tanya Bertozzi received the Dr. Eng. degree in electrical engineering with a specialization in telecommunications from the University of Parma, Italy, in 1998. From October 1998 to February 1999, she was a Research Engineer in the Information Department, University of Parma, Italy. In March 1999, she joined Diginext, Aix-en-Provence, France, as a Research Engineer. She is currently a Project Manager. In 2003, she received the Ph.D. degree in digital communications from the Conservatoire National des Arts et Métiers (CNAM), Paris, France. Her research interests are in the areas of digital communications and signal processing including particle filtering applications, channel estimation, multiuser detection, and space-time coding.

Didier Le Ruyet received the M.Eng. degree, the M.S. degree, and the Ph.D. degree from the Conservatoire National des Arts et Métiers (CNAM), Paris, France, in 1994 and 2001, respectively. From 1988 to 1995, he was a Research Engineer in the image processing and telecommunication departments of Sagem, Cergy, France. He joined the Department of Electrical and Computer Engineering at CNAM, Paris, in 1996, where he became an Assistant Professor in 2002. His main research interests lie in the areas of digital communications and signal processing including channel estimation, channel coding, iterative decoding, and space-time coding.

Cristiano Panazio was born in Brasília, Brazil, in 1977. He received his B.S. and his M.S. degrees in electrical engineering from the University of Campinas (UNICAMP), in 1999 and 2001, respectively. Since 2002, he is pursuing a Ph.D. degree at the Conservatoire National des Arts et Métiers (CNAM), Paris, France. His interests include adaptive filtering, synchronization, and nonlinear signal processing.

EURASIP Journal on Applied Signal Processing 2004:15, 2339–2350
© 2004 Hindawi Publishing Corporation

Joint Tracking of Manoeuvring Targets and Classification of Their Manoeuvrability

Simon Maskell

QinetiQ Ltd, St. Andrews Road, Malvern, Worcestershire WR14 3PS, UK
Email: [email protected]

Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

Received 30 May 2003; Revised 23 January 2004

Semi-Markov models are a generalisation of Markov models that explicitly model the state-dependent sojourn time distribution, the time for which the system remains in a given state. Markov models result in an exponentially distributed sojourn time, while semi-Markov models make it possible to define the distribution explicitly. Such models can be used to describe the behaviour of manoeuvring targets, and particle filtering can then facilitate tracking. An architecture is proposed that enables particle filters to be both robust and efficient when conducting joint tracking and classification. It is demonstrated that this approach can be used to classify targets on the basis of their manoeuvrability. Keywords and phrases: tracking, classification, manoeuvring targets, particle filtering.
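The contrast drawn above between Markov and semi-Markov sojourn times can be made concrete with a small simulation. This is purely illustrative (the gamma sojourn distribution and every number are arbitrary choices, not the models used later in the paper): two models can share the same mean sojourn time, and hence the same best-fitting Markov model, yet differ markedly in sojourn dispersion, which is what makes classification on manoeuvrability possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Markov switching: a constant per-epoch probability of leaving the current
# regime makes the sojourn time geometrically distributed (the discrete-time
# analogue of the exponential distribution).
markov_sojourns = rng.geometric(0.1, size=10_000)

# Semi-Markov: the sojourn time is drawn explicitly from any chosen
# distribution; a gamma distribution is used here as an arbitrary example.
semi_sojourns = rng.gamma(9.0, 10.0 / 9.0, size=10_000)

# Both models have a mean sojourn of about 10 epochs, so they share the same
# best-fitting Markov model, yet their dispersions differ markedly.
print(markov_sojourns.mean(), markov_sojourns.std())  # std comparable to the mean
print(semi_sojourns.mean(), semi_sojourns.std())      # std roughly a third of the mean
```

A classifier that only sees the exponential fit cannot separate these two targets; one that models the sojourn distribution explicitly can.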

1. INTRODUCTION

When tracking a manoeuvring target, one needs models that can cater for each of the different regimes that can govern the target's evolution. The transitions between these regimes are often (either explicitly or implicitly) taken to evolve according to a Markov model. At each time epoch there is a probability of being in one discrete state given that the system was in another discrete state. Such Markov switching models result in an exponentially distributed sojourn time, the time for which the system remains in a given discrete state. Semi-Markov models (also known as renewal processes [1]) are a generalisation of Markov models that explicitly model the (discrete-state-dependent) distribution over sojourn time. At each time epoch there is a probability of being in one discrete state given that the system was in another discrete state and how long it has been in that discrete state. Such models offer the potential to better describe the behaviour of manoeuvring targets.

However, it is believed that the full potential of semi-Markov models has not yet been realised. In [2], sojourns were restricted to end at discrete epochs and filtered mode probabilities were used to deduce the parameters of the time-varying Markov process equivalent to the semi-Markov process. In [3], the sojourns were taken to be gamma-distributed with integer shape parameters such that the gamma variate could be expressed as a sum of exponential variates; the semi-Markov model could then be expressed as a (potentially highly dimensional) Markov model. This paper proposes an approach that does not rely on the sojourn time distribution being of a given form, and so is capable of capitalising on all available model fidelity regarding this distribution. The author asserts that the restrictions of the aforementioned approaches currently limit the use of semi-Markov models in tracking systems and that the improved modelling (and so estimation) accuracy that semi-Markov models make possible has not been realised up to now.

This paper further considers the problem of both tracking and classifying targets. As discussed in [4], joint tracking and classification is complicated by the fact that sequentially updating a distribution over class membership necessarily results in an accumulation of errors. This is because, when tracking, errors are forgotten. In this context, the capacity to not forget, memory, is a measure of how rapidly the distribution over states becomes increasingly diffuse, making it difficult to predict where the target will be given knowledge of where it was. Just as the system forgets where it was, so any algorithm that mimics the system forgets any errors that are introduced. So, if the algorithm forgets any errors, it must converge. In the case of classification, this diffusion does not take place; if one knew the class at one point, it would be known for all future times. As a result, when conducting joint tracking and classification, it becomes not just pragmatically attractive but essential that the tracking process introduces as few errors as possible, so that the accumulation of errors that necessarily takes place has as little impact as possible on the classification process.

There have been some previous approaches to solving the problem of joint tracking and identification that have been based on both grid-based approximations [5] and particle filters [6, 7]. An important failing of these implementations is that target classes with temporarily low likelihoods can end up being permanently lost. As a consequence of this same feature of the algorithms, these implementations cannot recover from any miscalculations and are not robust. This robustness issue has been addressed by stratifying the classifier [4]; one uses separate filters to track the target for each class (i.e., one might use a particle filter for one class and a Kalman filter for another) and then combines the outputs to estimate the class membership probabilities and so classification of the target. This architecture does enable different state spaces and filters to be used for each class, but has the deficiency that this choice could introduce biases and so systematic errors.

So, the approach taken here is to adopt a single state space common to all the classes and a single (particle) filter, but to then attempt to make the filter as efficient as possible while maintaining robustness. This ability to make the filter efficient by exploiting the structure of the problem in the structure of the solution is the motivation for the use of a particle filter specifically.

This paper demonstrates this methodology by considering the challenging problem of classifying targets which differ only in terms of their similar sojourn time distributions; the set of dynamic models used to model the different regimes are taken to be the same for all the classes. Were one using a Markov model, all the classes would have the same mean sojourn time and so the same best-fitting Markov model. Hence, it is only possible to classify the targets because semi-Markov models are being used.

Since the semi-Markov models are nonlinear and non-Gaussian, the particle-filtering methodology [8] is adopted for solving this joint tracking and classification problem. The particle filter represents uncertainty using a set of samples. Here, each of the samples represents different hypotheses for the sojourn times and state transitions. Since there is uncertainty over both how many transitions occurred and when they occurred, the particles represent the diversity over the number of transitions and their timing. Hence, the particles differ in dimensionality. This is different from the usual case for which the dimensionality of all the particles is the same. Indeed, this application of the particle filter is a special case of the generic framework developed concurrently by other researchers [9]. The approach described here exploits the specifics of the semi-Markov model, but the reader interested in the more generic aspects of the problem is referred to [9].

Since, if the sojourn times are known, the system is linear and Gaussian, the Kalman filter is used to deduce the parameters of the uncertainty over target state given the hypothesised history of sojourns. So, the particle filter is only used for the difficult part of the problem, that of deducing the timings of the sojourn ends, and the filter operates much like a multiple hypothesis tracker, with hypotheses in the (continuous) space of transition times. To make this more explicit, it should be emphasised that the complexity of the particle filter is not being increased by using semi-Markov models, but rather particle filters are being applied to the problem associated with semi-Markov models. The resulting computational cost is roughly equivalent to one Kalman filter per particle, and in the example considered in Section 6 just 25 particles were used for each of the three classes.¹ The author believes that this computational cost is not excessive and that, in applications for which it is beneficial to capitalise on the use of semi-Markov models (which the author believes to be numerous), the approach is practically useful. However, this issue of the trade-off between the computational cost and the resulting performance for specific applications is not the focus of this paper; here the focus is on proposing the generic methodology. For this reason, a simple yet challenging, rather than necessarily practically useful, example is used to demonstrate that the methodology has merit.

¹ This number is small and one might use more in practical situations, but the point is that the number of particles is not large and so the computational expense is roughly comparable to other existing algorithms.

A crucial element of the particle filter is the proposal distribution, the method by which each new sample is proposed from the old samples. Expedient choice of proposal distribution can make it possible to drastically reduce the number of particles necessary to achieve a certain level of performance. Often, the trade-off between complexity and performance is such that this reduction in the number of particles outweighs any additional computation necessary to use the more expedient proposal distributions. So, the choice of proposal distribution can be motivated as a method for reducing computational expense. Here, however, if as few errors as possible are to be introduced, as is critically important when conducting joint tracking and classification, it is crucial that the proposal distribution is well matched to the true system.

Hence, the set of samples is divided into a number of strata, each of which had a proposal that was well matched to one of the classes. Whatever the proposal distribution, it is possible to calculate the probability of every class. So, to minimise the errors introduced, for each particle (and so hypothesis for the history of state transitions and sojourn times), the probability of all the classes is calculated. So each particle uses a proposal matched to one class, but calculates the probability of the target being a member of every class. Note that this calculation is not computationally expensive, but provides information that can be used to significantly improve the efficiency of the filter.

So, the particles are used to estimate the manoeuvres and a Kalman filter is used to track the target. The particles are split into strata, each of which is well suited to tracking one of the classes, and the strata of particles are used to classify the target on the basis of the target's manoeuvrability. The motivation for this architecture is the need to simultaneously achieve robustness and efficiency.

This paper is structured as follows: Section 2 begins by introducing the notation and the semi-Markov model


structure that is used. Section 3 describes how a particle filter can be applied to the hard parts of the problem, the estimation of the semi-Markov process' states. Some theoretical concerns relating to robust joint tracking and identification are discussed in Section 4. Then, in Section 5, efficient and robust particle-filter architectures are proposed as solutions for the joint tracking and classification problem. Finally, an exemplar problem is considered in Section 6 and some conclusions are drawn in Section 7.

Figure 1: Diagram showing the relationship between continuous time, the time when measurements were received, and the time of sojourn ends. The circles represent the receipt of measurements or the start of a sojourn.

Table 1: Definition of notation.

τ_k : continuous time relating to the kth measurement
τ_t : continuous time relating to the tth sojourn time
t_k : sojourn prior to the kth measurement, so that τ_{t_k} ≤ τ_k ≤ τ_{t_k+1}
k_t : measurement prior to the tth sojourn, so that τ_{k_t} ≤ τ_t ≤ τ_{k_t+1}
s_t : manoeuvre regime for τ_t < τ < τ_{t+1}
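Under the conventions of Table 1, the index maps t_k and k_t can be computed from the two time grids with a binary search each. The sketch below is illustrative only; the onset and measurement times are made-up values, not taken from the paper.

```python
import numpy as np

# Continuous times of sojourn onsets (tau_t) and of measurements (tau_k);
# both grids are made-up values for illustration.
sojourn_onsets = np.array([0.0, 1.3, 2.1, 4.0])  # tau_1 .. tau_4
meas_times = np.array([0.5, 1.0, 1.9, 3.5])      # tau_1 .. tau_4 (measurements)

# t_k: the sojourn in progress at the kth measurement, i.e. the largest t
# with tau_t <= tau_k; searchsorted with side='right' counts onsets <= tau_k,
# which is exactly that 1-based index.
t_k = np.searchsorted(sojourn_onsets, meas_times, side='right')

# k_t: the most recent measurement prior to (or at) the onset of the tth
# sojourn, again as a 1-based index (0 means no measurement has arrived yet).
k_t = np.searchsorted(meas_times, sojourn_onsets, side='right')

print(t_k)  # sojourn index for each measurement
print(k_t)  # last measurement index before each sojourn onset
```

With the example grids above, the third measurement (at 1.9) falls in the second sojourn (onset 1.3), and two measurements precede the second sojourn's onset, matching the definitions in Table 1.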

2. MODEL

When using semi-Markov models, there is a need to distinguish between continuous time, the indexing of the measurements, and the indexing of the sojourns. Here, continuous time is taken to be τ, measurements are indexed by k, and manoeuvre regimes (or sojourns) are indexed by t. The continuous time when the kth measurement was received is τ_k. The time of the onset of the tth sojourn is τ_t; t_k is then the index of the sojourn during which the kth measurement was received. Similarly, k_t is the most recent measurement prior to the onset of the tth sojourn. This is summarised in Table 1 while Figure 1 illustrates the relationship between such quantities as (t_k + 1) and t_{k+1}.

The model corresponding to sojourn t is s_t. s_t is a discrete semi-Markov process with transition probabilities p(s_t | s_{t−1}) that are known; note that since, at the sojourn end, a transition must occur, p(s_t | s_{t−1}) = 0 if s_t = s_{t−1};

p(s_t | s_{t−1}) = p(s_t | s_{1:t−1}),  (1)

where s_{1:t−1} is the history of states for the first to the (t−1)th regime and, similarly, y_{1:k} will be used to denote the history of measurements up to the kth measurement.

For simplicity, the transition probabilities are here considered invariant with respect to time once it has been determined that a sojourn is to end; that is, p(s_t | s_{t−1}) is not a function of τ. The length of time for which the process remains in state s_t is distributed as g(τ − τ_t | s_t):

p(τ_{t+1} | τ_t, s_t) = g(τ_{t+1} − τ_t | s_t).  (2)

The s_t process governs a continuous-time process, x_τ, which, given s_t and a state at a time after the start of the sojourn, τ_{t+1} > τ > τ_t, has a distribution f(x_τ | x_{τ_t}, s_t). So, the distribution of x_τ given the initial state at the start of the sojourn and the fact that the sojourn continues to time τ is

p(x_τ | x_{τ_t}, s_t, τ_{t+1} > τ) = f(x_τ | x_{τ_t}, s_t).  (3)

If x_k is the history of states (in continuous time), then a probabilistic model exists for how each measurement, y_k, is related to the state at the corresponding continuous time:

p(y_k | x_k) = p(y_k | x_{τ_1:τ_k}) = p(y_k | x_{τ_k}).  (4)

This formulation makes it straightforward to then form a dynamic model for the s_{1:t_k} process and τ_{1:t_k} as follows:

p(s_{1:t_k}, τ_{1:t_k}) = [ ∏_{t=2}^{t_k} p(s_t | s_{t−1}) p(τ_t | τ_{t−1}, s_{t−1}) ] p(s_1) p(τ_1),  (5)

where p(s_1) is the initial prior on the state of the sojourn time (which we later assume to be uniform) and p(τ_1) is the prior on the time of the first sojourn end (which we later assume to be a delta function). This can then be made conditional on s_{1:t_{k−1}} and τ_{1:t_{k−1}}, which makes it possible to sample the semi-Markov process' evolution between measurements:

p({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} | s_{1:t_{k−1}}, τ_{1:t_{k−1}})
  ∝ p(s_{1:t_k}, τ_{1:t_k}) / p(s_{1:t_{k−1}}, τ_{1:t_{k−1}})
  = [ ∏_{t=2}^{t_k} p(s_t | s_{t−1}) p(τ_t | τ_{t−1}, s_{t−1}) p(s_1) p(τ_1) ] / [ ∏_{t=2}^{t_{k−1}} p(s_t | s_{t−1}) p(τ_t | τ_{t−1}, s_{t−1}) p(s_1) p(τ_1) ]
  = ∏_{t=t_{k−1}+1}^{t_k} p(s_t | s_{t−1}) p(τ_t | τ_{t−1}, s_{t−1}),  (6)

where A \ B is the set A without the elements of the set B. Note that in this case {s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} could be the empty set, in which case p({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} | s_{1:t_{k−1}}, τ_{1:t_{k−1}}) = 1.

So, it is possible to write the joint distribution of the s_t and x_τ processes and the times of the sojourns, τ_{1:t_k}, up to the time of the kth measurement, τ_k, as

p(s_{1:t_k}, x_k, τ_{1:t_k} | y_{1:k})
  ∝ p(s_{1:t_k}, τ_{1:t_k}) p(x_k, y_{1:k} | s_{1:t_k}, τ_{1:t_k})
  = p(s_{1:t_k}, τ_{1:t_k}) p(x_k | s_{1:t_k}, τ_{1:t_k}) p(y_{1:k} | x_k)
  = p(s_{1:t_k}, τ_{1:t_k}) [ p(x_{τ_k} | x_{τ_{t_k}}, s_{t_k}) ∏_{t=2}^{t_k} p(x_{τ_t} | x_{τ_{t−1}}, s_{t−1}) ] p(x_{τ_1}) ∏_{k′=1}^{k} p(y_{k′} | x_{τ_{k′}})
  ∝ p(s_{1:t_{k−1}}, x_{k−1}, τ_{1:t_{k−1}} | y_{1:k−1})   (the posterior at k−1)
    × p({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} | s_{1:t_{k−1}}, τ_{1:t_{k−1}})   (evolution of the semi-Markov model)
    × p(y_k | x_{τ_k})   (likelihood)
    × p(x_{τ_k} | x_{τ_{t_k}}, s_{t_k}) / p(x_{τ_{k−1}} | x_{τ_{t_{k−1}}}, s_{t_{k−1}})   (effect on x_τ of incomplete regimes)
    × ∏_{t=t_{k−1}+1}^{t_k} p(x_{τ_t} | x_{τ_{t−1}}, s_{t−1}),   (effect on x_τ of sojourns between k−1 and k)  (7)

This is a recursive formulation of the problem. The annotations indicate the individual terms' relevance.

3. APPLICATION OF PARTICLE FILTERING

Here, an outline of the form of particle filtering used is given so as to provide some context for the subsequent discussion and introduce notation. The reader who is unfamiliar with the subject is referred to the various tutorials (e.g., [8]) and books (e.g., [10]) available on the subject.

A particle filter is used to deduce the sequence of sojourn times, τ_{1:t_k}, and the sequence of transitions, s_{1:t_k}, as a set of measurements are received. This is achieved by sampling N times from a proposal distribution of a form that extends the existing set of sojourn times and the s_t process with samples of the sojourns that took place between the previous and the current measurements:

({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}})^i ∼ q({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} | s_{1:t_{k−1}}^i, τ_{1:t_{k−1}}^i, y_k),  i = 1, ..., N.  (8)

A weight is then assigned according to the principle of importance sampling:

w̄_k^i = w_{k−1}^i × [ p(({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}})^i | s_{1:t_{k−1}}^i, τ_{1:t_{k−1}}^i) / q(({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}})^i | s_{1:t_{k−1}}^i, τ_{1:t_{k−1}}^i, y_k) ] × p(y_k | s_{1:t_k}^i, τ_{1:t_k}^i).  (9)

These unnormalised weights are then normalised:

w_k^i = w̄_k^i / ∑_{j=1}^{N} w̄_k^j,  (10)

and estimates of expectations are calculated using the (normalised) weighted set of samples. When the weights become skewed, some of the samples dominate these expectations, so the particles are resampled; particles with low weights are probabilistically discarded and particles with high weights are probabilistically replicated in such a way that the expected number of offspring resulting from a given particle is proportional to the particle's weight. This resampling can introduce unnecessary errors, so it should be used as infrequently as possible. To this end, a threshold can be put on the approximate effective sample size, so that the resampling step is performed only when this effective sample size falls below the predefined threshold. This approximate effective sample size can be calculated as follows:

N_eff ≈ 1 / ∑_{i=1}^{N} (w_k^i)².  (11)

It is also possible to calculate the incremental likelihood:

p(y_k | y_{1:k−1}) ≈ ∑_{i=1}^{N} w̄_k^i,  (12)

which can be used to calculate the likelihood of the entire data sequence, which will be useful in later sections:

p(y_{1:k}) = p(y_1) ∏_{k′=2}^{k} p(y_{k′} | y_{1:k′−1}),  (13)

where p(y_1) ≜ p(y_1 | y_{1:0}), so can be calculated using (12).

4. THEORETICAL CONCERNS RELATING TO JOINT TRACKING AND CLASSIFICATION

The proofs of convergence for particle filters rely on the ability of the dynamic models used to forget the errors introduced by the approximation [11, 12]. If errors are forgotten, then the errors cannot accumulate and so the algorithm must converge on the true uncertainty relating to the path through the state space.
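The weight normalisation (10), the effective-sample-size test (11), and the likelihood increment (12) translate directly into a few lines of code. The sketch below is an illustration, not the author's implementation; in particular, multinomial resampling is assumed, whereas the text above leaves the resampling scheme unspecified beyond the proportional-replication property.

```python
import numpy as np

def reweight_and_resample(w_prev, incr, threshold, rng=None):
    """One weight update of the particle filter.

    w_prev    : normalised weights w_{k-1}^i
    incr      : the p(.)/q(.) * p(y_k | .) factors from (9), one per particle
    threshold : resampling trigger on the effective sample size
    Returns the normalised weights, the likelihood increment (12), and the
    resampling indices (None if N_eff stayed above the threshold)."""
    rng = np.random.default_rng(rng)
    w_bar = w_prev * incr                  # unnormalised weights, as in (9)
    lik_incr = w_bar.sum()                 # estimate of p(y_k | y_{1:k-1}), (12)
    w = w_bar / lik_incr                   # normalisation, (10)
    n_eff = 1.0 / np.sum(w ** 2)           # approximate effective sample size, (11)
    idx = None
    if n_eff < threshold:                  # resample only when weights are skewed
        idx = rng.choice(len(w), size=len(w), p=w)  # multinomial resampling
        w = np.full(len(w), 1.0 / len(w))  # replicated particles share equal weight
    return w, lik_incr, idx
```

Chaining the returned likelihood increments over k gives the data likelihood (13); resampling only below the threshold keeps the unnecessary errors that resampling introduces to a minimum, as argued above.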

Conversely, if the system does not forget, then errors will accumulate and this will eventually cause the filter to diverge. This applies to sequential algorithms in general, including Kalman filters,² which accumulate finite precision errors, though such errors are often sufficiently small that such problems rarely arise and have even more rarely been noticed.

For a system to forget, its model needs to involve the states changing with time; it must be ergodic. There is then a finite probability of the system being in any state given that it was in any other state at some point in the past; so, it is not possible for the system to get stuck in a state. Models for classification do not have this ergodic property since the class is constant for all time; such models have infinite memory. Approaches to classification (and other long-memory problems) have been proposed in the past based on both implicit and explicit modifications of the model that reduce the memory of the system by introducing some dynamics. Here, the emphasis is on using the models in their true form.

However, if the model's state is discrete, as is the case with classification, there is a potential solution described in this context in [4]. The idea is to ensure that all probabilities are calculated based on the classes remaining constant and to run a filter for each class; these filters cannot be reduced in number when the probability passes a threshold if the system is to be robust. In such a case, the overall filter is conditionally ergodic. The approach is similar to that advocated for classification alone whereby different classifiers are used for different classes [13].

The preceding argument relates to the way that the filter forgets errors. This enables the filter to always be able to visit every part of the state space; and the approach advocated makes it possible to recover from a misclassification. However, this does not guarantee that the filter can calculate classification probabilities with any accuracy. The problem is the variation resulting from different realisations of the errors caused in the inference process. In a particle-filter context, this variation is the Monte Carlo variation and is the result of having sampled one of many possible different sets of particles at a given time. Put more simply, performing the sampling step twice would not give the same set of samples.

Equation (13) means that, if each iteration of the tracker introduces errors, the classification errors necessarily accumulate. There is nothing that can be done about this. All that can be done is to attempt to minimise the errors that are introduced such that the inevitable accumulation of errors will not impact performance on a time scale that is of interest.

So, to be able to classify targets based on their dynamic behaviour, all estimates of probabilities must be based on the classes remaining constant for all time and the errors introduced into the filter must be minimised. As a result, classification performance is a good test of algorithmic performance.

5. EFFICIENT AND ROBUST CLASSIFICATION

The previous section asserts that to be robust, it is essential to estimate probabilities based on all the classes always remaining constant. However, to be efficient, the filter should react to the classification estimates and focus its effort on the most probable classes (this could equally be the class with the highest expected cost according to some nonuniform cost function, but this is not considered here).

To resolve these two seemingly contradictory requirements of robustness twinned with efficiency, the structure of the particle filter can be capitalised upon. The particle filter distinguishes between the proposal used to sample the particles' paths and the weights used to reflect the disparity between the proposal and the true posterior. So, it is possible for the proposal to react to the classification probabilities and favour proposals well suited to the more probable classes while calculating the weights for the different classes; this is equivalent to Rao-Blackwellising the discrete distribution over class for each particle.

One could enable the system to react to the classification probabilities while remaining robust to misclassification by each particle sampling the importance function from a set of importance samplers according to the classification probabilities. Each importance sampler would be well suited to the corresponding class and each particle would calculate the weights with respect to all the classes given its sampled values of the state.

However, here a different architecture is advocated; the particles are divided into strata, such that the different strata each use an importance function well suited to one of the classes. For any particle in the jth stratum, S_j, and in the context of the application of particle filtering to semi-Markov models, the importance function is then of the form \(q(\{s_{1:t_k}, \tau_{1:t_k}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, y_k, S_j)\). The strata then each have an associated weight and these weights sum to unity across the strata. If each particle calculates the probability of all the classes given its set of hypotheses, then the architecture will be robust. It is then possible to make the architecture efficient by adding a decision logic that reacts to the weights on the strata; one might add and remove strata on the basis of the classification probabilities. The focus here is not on designing such a decision logic, but to propose an architecture that permits the use of such logic.

To use this architecture, it is necessary to manipulate strata of particles and so to be able to calculate the total weight on a class or equally on a stratum. To this end, the relations that enable this to happen are now outlined. The classes are indexed by c, particles by i, and the strata by j. The model used to calculate the weights is M_c and the stratum is S_j. So, the unnormalised weight for the ith particle in stratum S_j, using model M_c, is \(\bar{w}_k^{(i,j,c)}\). The weight on a stratum, p(S_j | y_{1:k}), can be deduced from

² It is well documented that extended Kalman filters can accumulate linearisation errors which can cause filter divergence, but here the discussion relates to Kalman filtering with linear Gaussian distributions such that the Kalman filter is an analytic solution to the problem of describing the pdf.

\[
p\bigl(S_j \mid y_{1:k}\bigr) \propto p\bigl(y_{1:k} \mid S_j\bigr)\, p\bigl(S_j\bigr), \tag{14}
\]

where p(S_j) is the (probably uniform) prior across the strata.
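In practice, (14) and the class posteriors of the recursion that follows are all read off sums over a three-index array of unnormalised weights, and the normalisations (19)-(23) below are just reductions over one index at a time. A sketch of that bookkeeping, assuming a `[stratum, particle, class]` array layout (the layout and function name are not from the paper):

```python
import numpy as np

def stratified_weight_summaries(w_bar):
    """Bookkeeping for the stratified filter: w_bar[j, i, c] is the
    unnormalised weight of particle i in stratum j under class model c
    (the array layout is an assumption for illustration).  Returns the
    normalised stratum weights and the conditional weights defined by
    the normalisations (19)-(23)."""
    w_bar_particle = w_bar.sum(axis=2)            # total weight of each particle, cf. (20)
    w_bar_stratum = w_bar_particle.sum(axis=1)    # total weight of each stratum, cf. (22)
    w_class = w_bar / w_bar_particle[:, :, None]          # sums to one over classes, cf. (19)
    w_particle = w_bar_particle / w_bar_stratum[:, None]  # sums to one within a stratum, cf. (21)
    w_stratum = w_bar_stratum / w_bar_stratum.sum()       # sums to one across strata, cf. (23)
    return w_stratum, w_particle, w_class
```

With a uniform prior p(S_j), the returned stratum weights are (up to the shared prior terms) the posterior of (14).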

This leads to the following recursion:

\[
p\bigl(S_j \mid y_{1:k}\bigr) \propto p\bigl(y_k \mid y_{1:k-1}, S_j\bigr)\, p\bigl(S_j \mid y_{1:k-1}\bigr), \tag{15}
\]

where p(y_k | y_{1:k−1}, S_j) can be estimated using a minor modification of (12) as follows:

\[
p\bigl(y_k \mid y_{1:k-1}, S_j\bigr) \approx \sum_{i,c} \bar{w}_k^{(i,j,c)}. \tag{16}
\]

Similarly, for the classes,

\[
p\bigl(M_c \mid y_{1:k}\bigr) \propto p\bigl(y_k \mid y_{1:k-1}, M_c\bigr)\, p\bigl(M_c \mid y_{1:k-1}\bigr), \tag{17}
\]

where

\[
p\bigl(M_c \mid y_{1:k}\bigr) = \sum_j p\bigl(S_j, M_c \mid y_{1:k}\bigr) = \sum_j p\bigl(S_j \mid y_{1:k}\bigr)\, p\bigl(M_c \mid S_j, y_{1:k}\bigr), \qquad
p\bigl(M_c \mid S_j, y_{1:k}\bigr) \propto \sum_i \bar{w}_k^{(i,j,c)}. \tag{18}
\]

To implement this recursion, the weights of the classes are normalised such that they sum to unity over the particles in the strata:

\[
w_k^{(c \mid i,j)} \triangleq \frac{\bar{w}_k^{(i,j,c)}}{\bar{w}_k^{(i,j)}}, \tag{19}
\]

where \(\bar{w}_k^{(i,j)}\) is the total unnormalised weight of the particle:

\[
\bar{w}_k^{(i,j)} \triangleq \sum_c \bar{w}_k^{(i,j,c)}. \tag{20}
\]

These weights are then normalised such that they sum to unity within each stratum:

\[
w_k^{(i \mid j)} \triangleq \frac{\bar{w}_k^{(i,j)}}{\bar{w}_k^{(j)}}, \tag{21}
\]

where \(\bar{w}_k^{(j)}\) is the total unnormalised weight of the stratum:

\[
\bar{w}_k^{(j)} \triangleq \sum_i \bar{w}_k^{(i,j)}. \tag{22}
\]

These weights are also normalised such that they sum to unity across the strata:

\[
w_k^{(j)} \triangleq \frac{\bar{w}_k^{(j)}}{\sum_{j'} \bar{w}_k^{(j')}}. \tag{23}
\]

The approximate effective sample size of each stratum is then used to assess whether that stratum has degenerated and so whether resampling is necessary for the set of particles in that stratum. This means that the weight relating to M_c for the ith particle within the jth stratum is

\[
w_k^{(i,j,c)} \propto w_k^{(j)}\, w_k^{(i \mid j)}\, w_k^{(c \mid i,j)}. \tag{24}
\]

So, with N_P particles and N_M classes (and so N_M strata), running the algorithm over N_K steps can be summarised as in Algorithm 1. p(x_0) is the initial prior on the state and Implement Recursion is conducted as in Algorithm 2, where V_j is the sum of the squared weights, the reciprocal of which approximates the effective sample size and is used to decide whether or not it is necessary to Resample. N_T is then the threshold on the approximate effective sample size which determines when to resample; N_T ≈ N_P/2 might be typical. Note that the resampling operation will result in replicants of a subset of the particles within the jth stratum, but that for each copy of the ith particle in the jth stratum, w_k^(c|i,j) is left unmodified.

    For j = 1 : N_M
        Initialise: w_0^(j) = 1/N_M
        For i = 1 : N_P
            Initialise: w_0^(i|j) = 1/N_P
            Initialise: x_0^(i,j) ∼ p(x_0)
            For c = 1 : N_M
                Initialise: w_0^(c|i,j) = 1/N_M
            End For
        End For
    End For
    For k = 1 : N_K
        Implement recursion
    End For

Algorithm 1.

6. EXAMPLE

6.1. Model

The classification of targets which differ solely in terms of the semi-Markov model governing the s_t process is considered. The classes have different gamma distributions for their sojourn times but all have the same mean value for the sojourn time, and so the same best-fitting Markov model. As stated in the introduction, this example is intended to provide a difficult-to-analyse, yet simple-to-understand, exemplar problem. The author does not intend the reader to infer that the specific choice of models and parameters is well suited to any specific application.

The x_τ process is taken to be a constant velocity model; an integrated diffusion process:

\[
f\bigl(x_{\tau+\Delta} \mid x_\tau, s\bigr) = \mathcal{N}\bigl(x_{\tau+\Delta};\, A(\Delta) x_\tau,\, Q_s(\Delta)\bigr), \tag{25}
\]

where \(\mathcal{N}(x; m, C)\) denotes a Gaussian distribution for x, with mean, m, and covariance, C, and where

\[
A(\Delta) = \begin{bmatrix} 1 & \Delta \\ 0 & 1 \end{bmatrix}, \qquad
Q_s(\Delta) = \begin{bmatrix} \dfrac{\Delta^3}{3} & \dfrac{\Delta^2}{2} \\[2mm] \dfrac{\Delta^2}{2} & \Delta \end{bmatrix} \sigma_s^2. \tag{26}
\]
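Because the sojourn ends fall at arbitrary continuous times, A(Δ) and Q_s(Δ) have to be rebuilt for each inter-event interval Δ. A direct transcription of (26), per spatial coordinate, as a sketch:

```python
import numpy as np

def cv_discretisation(delta, sigma_s):
    """A(delta) and Q_s(delta) of (26) for the integrated-diffusion constant
    velocity model (25); the state is [position, velocity] per coordinate."""
    A = np.array([[1.0, delta],
                  [0.0, 1.0]])
    Q = sigma_s**2 * np.array([[delta**3 / 3.0, delta**2 / 2.0],
                               [delta**2 / 2.0, delta]])
    return A, Q
```

The class only enters through σ_s, so the same routine serves every regime of the s_t process.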

    Initialise w̄_k = 0
    For c = 1 : N_M
        Initialise output classification probabilities: P̄_k^c = 0
    End For
    For j = 1 : N_M
        Initialise V_j = 0
        Initialise w̄_k^(j) = 0
        For i = 1 : N_P
            Initialise w̄_k^(i,j) = 0
            Sample x_k^(i,j) ∼ q(x_k | x_{k−1}^(i,j), y_k, S_j)
            For c = 1 : N_M
                w_k^(i,j,c) = w_{k−1}^(j) w_{k−1}^(i|j) w_{k−1}^(c|i,j)
                w̄_k^(i,j,c) = w_k^(i,j,c) ( p(y_k | x_k^(i,j), M_c) p(x_k^(i,j) | x_{k−1}^(i,j), M_c) / q(x_k^(i,j) | x_{k−1}^(i,j), y_k, S_j) )
                w̄_k^(i,j) = w̄_k^(i,j) + w̄_k^(i,j,c)
                w̄_k^(j) = w̄_k^(j) + w̄_k^(i,j,c)
                w̄_k = w̄_k + w̄_k^(i,j,c)
                P̄_k^c = P̄_k^c + w̄_k^(i,j,c)
            End For
        End For
    End For
    For c = 1 : N_M
        P_k^c = P̄_k^c / w̄_k, which can be output as necessary
    End For
    For j = 1 : N_M
        w_k^(j) = w̄_k^(j) / w̄_k
        For i = 1 : N_P
            w_k^(i|j) = w̄_k^(i,j) / w̄_k^(j)
            For c = 1 : N_M
                w_k^(c|i,j) = w̄_k^(i,j,c) / w̄_k^(i,j)
            End For
            V_j = V_j + (w_k^(i|j))²
        End For
        Resample jth stratum if 1/V_j < N_T
    End For

Algorithm 2.

The three classes' sojourn distributions are

\[
g\bigl(\tau - \tau_t \mid s_t, M_c\bigr) =
\begin{cases}
\mathcal{G}\bigl(\tau - \tau_t;\, 2, 5\bigr), & s_t = 1,\ c = 1, \\
\mathcal{G}\bigl(\tau - \tau_t;\, 10, 1\bigr), & s_t = 1,\ c = 2, \\
\mathcal{G}\bigl(\tau - \tau_t;\, 50, 0.2\bigr), & s_t = 1,\ c = 3, \\
\mathcal{G}\bigl(\tau - \tau_t;\, 10, 0.1\bigr), & s_t = 2,\ \forall c,
\end{cases} \tag{29}
\]

where \(\mathcal{G}(x; \alpha, \beta)\) is a gamma distribution over x, with shape parameter α and scale parameter β. Figure 2 shows these different sojourn time distributions. Note that since the mean of the gamma distribution is αβ, all the sojourn distributions for s_t = 1 have the same mean. Hence, the exponential distribution (which only has a single parameter that defines the mean) for all three classes would be the same.

Figure 2: Sojourn time distributions for s_t = 1 for the different classes: g(τ; 2, 5), g(τ; 10, 1), and g(τ; 50, 0.2).

6.2. Tracking of manoeuvring targets

A target from the first class is considered. A Rao-Blackwellised particle filter is used. The particle filter samples the sojourn ends and then, conditional on the sampled sojourn ends and state transitions, uses a Kalman filter to exactly describe the uncertainty relating to x_τ and a discrete distribution over class to exactly describe the classification probabilities (as described previously).
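The class-conditional sojourn densities of (29) make the design of the example concrete: for s_t = 1 every class shares the mean sojourn αβ = 10, so only the shape of the distribution separates the classes. A small helper for sampling them (the names are illustrative, not from the paper):

```python
import numpy as np

# Shape/scale pairs of (29) for s_t = 1; every class has mean sojourn
# alpha * beta = 10, so the classes differ only in higher moments.
SOJOURN_PARAMS = {1: (2.0, 5.0), 2: (10.0, 1.0), 3: (50.0, 0.2)}

def sample_sojourn(class_index, rng):
    """Draw one s_t = 1 sojourn time under the given class (illustrative helper)."""
    alpha, beta = SOJOURN_PARAMS[class_index]
    return rng.gamma(shape=alpha, scale=beta)
```

Note that NumPy's gamma sampler is parameterised by shape and scale, matching the (α, β) convention used in (29).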

For the proposal in the particle filter, (6), the dynamic prior for the s_t process is used, with a minor modification:

\[
q\bigl(\{s_{1:t_k}, \tau_{1:t_k}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, y_k\bigr)
\triangleq p\bigl(\{s_{1:t_k}, \tau_{1:t_k}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, \tau_{t_k+1} > \tau_k, M_j\bigr)
= \int p\bigl(\{s_{1:t_k}, \tau_{1:t_k+1}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, \tau_{t_k+1} > \tau_k, M_j\bigr)\, d\tau_{t_k+1}, \tag{32}
\]

that is, when sampling up to time τ_k, the s_t process is extended to beyond τ_k, but the sample of the final sojourn time is integrated out (so forgotten); the proposal simply samples that the next sojourn is after the time of the measurement, not what time it actually took place. This exploits some structure in the problem since τ_{t_k+1} has no impact on the estimation up to time τ_k and so on classification on the basis of y_{1:k}. The weight update equation simplifies since the dynamics are used as the proposal:

\[
\bar{w}_k^i = w_{k-1}^i\, p\bigl(y_k \mid \{s_{1:t_k}, \tau_{1:t_k}\}^i\bigr), \tag{33}
\]

where \(p(y_k \mid \{s_{1:t_k}, \tau_{1:t_k}\}^i)\) can straightforwardly be calculated by a Kalman filter with a time-varying process model (with model transitions at the sojourn ends) and measurement updates at the times of the measurements.

    For i = 1 : N_P
        Initialise w_0^i = 1/N_P
        Initialise τ_1^i = 0
        Initialise s_1^i as 1 if i > N_P/N_M, or 2 otherwise
        Initialise Kalman filter mean m_0^i = m
        Initialise Kalman filter covariance C_0^i = C
    End For
    For k = 1 : N_K
        Initialise V = 0
        Initialise w̄_k = 0
        For i = 1 : N_P
            Sample {s_{1:t_k}, τ_{1:t_k}}^i \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}}^i ∼ p({s_{1:t_k}, τ_{1:t_k}} \ {s_{1:t_{k−1}}, τ_{1:t_{k−1}}} | {s_{1:t_{k−1}}, τ_{1:t_{k−1}}}^i)
            Calculate m_k^i and C_k^i from m_{k−1}^i and C_{k−1}^i using s_{1:t_k}^i \ s_{1:t_{k−1}}^i
            Calculate p(y_k | {s_{1:t_k}, τ_{1:t_k}}^i) from y_k, m_k^i, and C_k^i
            w̄_k^i = w_{k−1}^i p(y_k | {s_{1:t_k}, τ_{1:t_k}}^i)
            w̄_k = w̄_k + w̄_k^i
        End For
        For i = 1 : N_P
            w_k^i = w̄_k^i / w̄_k
            V = V + (w_k^i)²
        End For
        Resample if 1/V < N_T
    End For
Algorithm 3.

Having processed the kth measurement, the ith particle then needs to store the time of the hypothesised last sojourn, τ_{t_k}^{(i)}, the current state, s_{t_k}^{(i)}, a mean and covariance for x_{τ_k}, and a weight, w_k^{(i)}.

Just N_P = 25 particles are used and initialised with samples from p(s_1) and p(τ_1) (so all the same τ_1). Each particle's initial value for the Kalman filter's mean is the true initial state, m. The initial value for the covariance is then defined as C:

\[
C = \begin{bmatrix} 100 & 0 \\ 0 & 10 \end{bmatrix}. \tag{34}
\]

The weights are all initialised as equal for all the particles. Resampling takes place if the approximate effective sample size given in (11) falls below N_T = 12.5. Since each particle needs to calculate the parameters of a Kalman filter, the computational cost is roughly equivalent to that of a multiple hypothesis tracker [14] with 25 hypotheses; here the hypotheses (particles) are in the continuous space of the times of the sojourn ends rather than the discrete space of the associations of measurements with the track. The computational cost is therefore relatively low and the algorithm is amenable to practical real-time implementation.

With N_P particles and N_K iterations, the algorithm is implemented as in Algorithm 3.

The true trajectory through the discrete space is given in Figure 3. The hypothesis for the trajectory through the discrete space for some of the particles is shown in Figure 4. Note that, as a result of the resampling, all the particles have the same hypothesis for the majority of the trajectory through the discrete space, which is well matched (for the most part) to the true trajectory. The diversity of the particles represents the uncertainty over the later part of the state sequence with the particles representing different hypothesised times and numbers of recent regime switches.

Figure 3: True trajectory for target through s_t state space.

6.3. Classification on the basis of manoeuvrability

The proposals that are well suited to each class each use the associated class' prior as their proposal:

\[
q\bigl(\{s_{1:t_k}, \tau_{1:t_k}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, y_k, S_j\bigr)
\triangleq p\bigl(\{s_{1:t_k}, \tau_{1:t_k}\} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}, M_j\bigr). \tag{35}
\]

The weight update equation is then

Figure 4: A subset of the particles’ hypothesised trajectories through st space. (a) Particle 1. (b) Particle 2. (c) Particle 3. (d) Particle 4. (e) Particle 5. (f) Particle 6. (g) Particle 7. (h) Particle 8. (i) Particle 9.

\[
\bar{w}_k^{(i,j,c)} = w_{k-1}^{(i,j,c)}\,
\frac{p\bigl(y_k \mid \{s_{1:t_k}, \tau_{1:t_k}\}^{(i,j)}\bigr)\;
p\bigl(\{s_{1:t_k}, \tau_{1:t_k}\}^{(i,j)} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}^{(i,j)} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}^{(i,j)}, M_c\bigr)}
{p\bigl(\{s_{1:t_k}, \tau_{1:t_k}\}^{(i,j)} \setminus \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}^{(i,j)} \mid \{s_{1:t_{k-1}}, \tau_{1:t_{k-1}}\}^{(i,j)}, M_j\bigr)}. \tag{36}
\]
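In (36) the only nontrivial quantity is p(y_k | ·), which the per-particle Kalman filter supplies as the predictive density of the new measurement; the ratio of priors then reweights the stratum's proposal towards each class. A generic sketch of both pieces (the matrix arguments and function names are assumptions, not the paper's code):

```python
import numpy as np

def predictive_likelihood(m, C, y, A, Q, H, R):
    """Kalman predict step and the Gaussian predictive density of the
    measurement, i.e. the p(y_k | path) term in (36).  Generic linear
    Gaussian matrices; an illustrative sketch."""
    m_pred = A @ m
    S = H @ (A @ C @ A.T + Q) @ H.T + R   # innovation covariance
    nu = y - H @ m_pred                    # innovation
    d = len(np.atleast_1d(nu))
    return np.exp(-0.5 * nu @ np.linalg.solve(S, nu)) / np.sqrt(
        (2.0 * np.pi) ** d * np.linalg.det(S))

def class_weight_update(w_prev, lik, prior_under_class, prior_under_stratum):
    """The update (36): likelihood times the ratio of the sampled extension's
    prior under class M_c to its prior under the stratum's class M_j."""
    return w_prev * lik * prior_under_class / prior_under_stratum
```

For the stratum's own class (c = j) the prior ratio is one and the update reduces to (33).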

Having processed the kth measurement, the ith particle in the jth stratum stores the time of the hypothesised last sojourn, τ_{t_k}^{(i,j)}, the current state, s_{t_k}^{(i,j)}, a mean and covariance for x_{τ_k}, a weight for each class, w_k^{(c|i,j)}, and a weight, w_k^{(i|j)}.
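Degeneracy within each stratum is monitored through the approximate effective sample size described earlier, with resampling triggered when it falls below N_T (12.5 in the example). A standard sketch, with systematic resampling chosen arbitrarily since the paper does not prescribe a particular scheme:

```python
import numpy as np

def effective_sample_size(w):
    """Approximate effective sample size 1 / sum(w^2) of a stratum's
    normalised weights; resampling is triggered when it falls below N_T."""
    return 1.0 / np.sum(np.asarray(w) ** 2)

def systematic_resample(w, rng):
    """One common resampling scheme (an illustrative choice); returns the
    indices of the particles to replicate."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)
```

As the text notes, only the particle indices are resampled; each copy retains its per-class weights w_k^(c|i,j) unmodified.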

Figure 5: Standard deviation (std) of estimated classification probabilities, std(p(class)), across ten filter runs for simulations according to each of the three models, labelled as 1, 2, and 3. (a) Data simulated from class 1. (b) Data simulated from class 2. (c) Data simulated from class 3.

Each stratum also stores w_k^{(j)}. The reader is referred to the preceding sections' summaries of the algorithms for the implementation details.

N_P = 25 particles are used per stratum, each initialised as described previously with a uniform distribution over the classes and with the weights on the strata initialised as being equal. Resampling for a given stratum takes place if the approximate effective sample size given in (11) for the stratum falls below N_T = 12.5. Since each of the N_M = 3 strata has N_P = 25 particles, the computational cost is approximately that of a multiple hypothesis tracker which maintains 75 hypotheses; the algorithm is practicable in terms of its computational expense.

However, it should be noted that, for this difficult problem of joint tracking and classification using very similar models, the number of particles used is small. This is intentional and is motivated by the need to look at the difference between the variance in the class membership probabilities and the variance of the strata weights.

Ten runs were conducted with data simulated according to each of the three models. The number of particles used is deliberately sufficiently small that the inevitable accumulation of errors causes problems in the time frame considered. This enables a comparison between the time variation in the variance across the runs of the classification probabilities and the variance across the runs of the strata weights. So, Figures 5 and 6 show the time variation in the variance across the runs of these two quantities. It is indeed evident that there is significant variation across the runs; the errors are indeed accumulating with time. It is also evident that this accumulation is faster for the importance weights than for the classification probabilities. This implies that the choice of importance function is less important, in terms of robustness of the estimation of the classification probabilities, than

calculating the probabilities of all the classes for every sample.

Figure 6: Variance of strata weights across ten filter runs for simulations according to each of the three models. (a) Data simulated from class 1. (b) Data simulated from class 2. (c) Data simulated from class 3.

It is difficult to draw many conclusions from the variations across the true class. Since such issues are quite specific to the models and parameters, which are not the focus of this paper, this is not further investigated or discussed.

7. CONCLUSIONS

Particle filtering has been applied to the use of semi-Markov models for tracking manoeuvring targets. An architecture has been proposed that enables particle filters to be both robust and efficient when classifying targets on the basis of their dynamic behaviour. It has been demonstrated that it is possible to jointly track such manoeuvring targets and classify their manoeuvrability.

ACKNOWLEDGMENTS

This work was funded by the UK MoD Corporate Research Programme. The author also gratefully acknowledges the award of his Industrial Fellowship by the Royal Commission for the Exhibition of 1851. The author would like to thank John Boyd and Dave Sworder for useful discussions on the subject of joint tracking and classification associated with the use of semi-Markov models, and Arnaud Doucet and Matthew Orton for discussions of how particle filters can be used in such problems. The author also thanks the reviewers for their comments.

REFERENCES

[1] D. R. Cox and V. Isham, Point Processes, Chapman and Hall, New York, NY, USA, 1980.
[2] L. Campo, P. Mookerjee, and Y. Bar-Shalom, “State estimation for systems with sojourn-time-dependent Markov model

switching,” IEEE Trans. Automatic Control, vol. 36, no. 2, pp. 238–243, 1991.
[3] D. D. Sworder, M. Kent, R. Vojak, and R. G. Hutchins, “Renewal models for maneuvering targets,” IEEE Trans. on Aerospace and Electronic Systems, vol. 31, no. 1, pp. 138–150, 1995.
[4] N. J. Gordon, S. Maskell, and T. Kirubarajan, “Efficient particle filters for joint tracking and classification,” in Signal and Data Processing of Small Targets, vol. 4728 of Proceedings of SPIE, pp. 439–449, Orlando, Fla, USA, April 2002.
[5] S. Challa and G. Pulford, “Joint target tracking and classification using radar and ESM sensors,” IEEE Trans. on Aerospace and Electronic Systems, vol. 37, no. 3, pp. 1039–1055, 2001.
[6] Y. Boers and H. Driessen, “Integrated tracking and classification: an application of hybrid state estimation,” in Signal and Data Processing of Small Targets, vol. 4473 of Proceedings of SPIE, pp. 198–209, San Diego, Calif, USA, July 2001.
[7] S. Herman and P. Moulin, “A particle filtering approach to FM-band passive radar tracking and automatic target recognition,” in Proc. IEEE Aerospace Conference, vol. 4, pp. 1789–1808, Big Sky, Mont, USA, March 2002.
[8] M. S. Arulampalam, S. Maskell, N. J. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[9] J. Vermaak, S. J. Godsill, and A. Doucet, “Radial basis function regression using trans-dimensional sequential Monte Carlo,” in Proc. 12th IEEE Workshop on Statistical Signal Processing, pp. 525–528, St. Louis, Mo, USA, October 2003.
[10] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[11] F. Le Gland and N. Oudjane, “Stability and uniform approximation of nonlinear filters using the Hilbert metric, and application to particle filters,” Tech. Rep. RR-4215, INRIA, Chesnay Cedex, France, 2001.
[12] D. Crisan and A.
Doucet, “Convergence of sequential Monte Carlo methods,” Tech. Rep. CUED/F-INFENG/TR381, Department of Engineering, University of Cambridge, Cambridge, UK, 2000.
[13] P. M. Baggenstoss, “The PDF projection theorem and the class-specific method,” IEEE Trans. Signal Processing, vol. 51, no. 3, pp. 672–685, 2003.
[14] Y. Bar-Shalom and X. R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, Storrs, Conn, USA, 1995.

Simon Maskell is a Ph.D. degree graduand and holds a first-class degree with Distinction from the Department of Engineering, University of Cambridge. His Ph.D. degree was funded by one of six prestigious Royal Commission for the Exhibition of 1851 Industrial Fellowships awarded on the basis of outstanding excellence to researchers working in British industry. At QinetiQ, he is a Lead Researcher for tracking in the Advanced Signal and Information Processing Group (ASIP). As such, he leads several projects and coordinates ASIP's tracking research while also supervising a number of other researchers. Simon has authored a number of papers, as well as several technical reports and a patent. He has also been the leading force behind the development of a QinetiQ product to provide a generic solution to all of QinetiQ's tracking problems.

EURASIP Journal on Applied Signal Processing 2004:15, 2351–2365 © 2004 Hindawi Publishing Corporation

Bearings-Only Tracking of Manoeuvring Targets Using Particle Filters

M. Sanjeev Arulampalam Maritime Operations Division, Defence Science and Technology Organisation (DSTO), Edinburgh, South Australia 5111, Australia Email: [email protected]

B. Ristic Intelligence, Surveillance & Reconnaissance Division, Defence Science and Technology Organisation (DSTO), Edinburgh, South Australia 5111, Australia Email: [email protected] N. Gordon Intelligence, Surveillance & Reconnaissance Division, Defence Science and Technology Organisation (DSTO), Edinburgh, South Australia 5111, Australia Email: [email protected] T. Mansell Maritime Operations Division, Defence Science and Technology Organisation (DSTO), Edinburgh, South Australia 5111, Australia Email: [email protected]

Received 2 June 2003; Revised 17 December 2003

We investigate the problem of bearings-only tracking of manoeuvring targets using particle filters (PFs). Three different PFs are proposed for this problem, which is formulated as a multiple model tracking problem in a jump Markov system (JMS) framework. The proposed filters are (i) multiple model PF (MMPF), (ii) auxiliary MMPF (AUX-MMPF), and (iii) jump Markov system PF (JMS-PF). The performance of these filters is compared with that of standard interacting multiple model (IMM)-based trackers such as IMM-EKF and IMM-UKF for three separate cases: (i) single-sensor case, (ii) multisensor case, and (iii) tracking with hard constraints. A conservative CRLB applicable for this problem is also derived and compared with the RMS error performance of the filters. The results confirm the superiority of the PFs for this difficult nonlinear tracking problem.

Keywords and phrases: bearings-only tracking, manoeuvring target tracking, particle filter.

1. INTRODUCTION

The problem of bearings-only tracking arises in a variety of important practical applications. Typical examples are submarine tracking (using a passive sonar) or aircraft surveillance (using a radar in a passive mode or an electronic warfare device) [1, 2, 3]. The problem is sometimes referred to as target motion analysis (TMA), and its objective is to track the kinematics (position and velocity) of a moving target using noise-corrupted bearing measurements. In the case of autonomous TMA (single observer only), which is the focus of a major part of this paper, the observation platform needs to manoeuvre in order to estimate the target range [1, 3]. This need for ownship manoeuvre and its impact on target state observability have been explored extensively in [4, 5].

Most researchers in the field of bearings-only tracking have concentrated on tracking a nonmanoeuvring target. Due to inherent nonlinearity and observability issues, it is difficult to construct a finite-dimensional optimal Bayesian filter even for this relatively simple problem. As for the bearings-only tracking of a manoeuvring target, the problem is much more difficult and so far, very limited research has been published in the open literature. For example, interacting multiple model (IMM)-based trackers were proposed in [6, 7] for this problem. These algorithms employ a constant velocity (CV) model along with manoeuvre models to capture the dynamic behaviour of a manoeuvring target scenario. Le Cadre and Tremois [8] modelled the manoeuvring target using the CV model with process noise and developed a tracking filter in this framework.

This paper presents the application of particle filters (PFs) [9, 10, 11] for bearings-only tracking of manoeuvring targets and compares their performance with traditional IMM-based filters. This work builds on the investigation carried out by the authors in [12] for the same problem. The additional features considered in this paper include (a) the use of different manoeuvre models, (b) two additional PFs, and (c) tracking with hard constraints. The error performance of the developed filters is analysed by Monte Carlo (MC) simulations and compared to the theoretical Cramér-Rao lower bounds (CRLBs). Essentially, the manoeuvring target problem is formulated in a jump Markov system (JMS) framework and these filters provide suboptimal solutions to the target state, given a sequence of bearing measurements and the particular JMS framework. In the JMS framework considered in this paper, the target motion is modelled by three switching dynamics models whose evolution follows a Markov chain. One of these models is the standard CV model while the other two correspond to coordinated turn (CT) models that capture the manoeuvre dynamics.

Three different PFs are proposed for this problem: (i) multiple model PF (MMPF), (ii) auxiliary MMPF (AUX-MMPF), and (iii) JMS-PF. The MMPF [12, 13] and AUX-MMPF [14] represent the target state and the mode at every time by a set of paired particles and construct the joint posterior density of the target state and mode, given all measurements. The JMS-PF, on the other hand, involves a hybrid scheme where it uses particles to represent only the distribution of the modes, while mode-conditioned state estimation is carried out using extended Kalman filters (EKFs).

The performance of the above algorithms is compared with two conventional schemes: (i) IMM-EKF and (ii) IMM-UKF. These filters represent the posterior density at each time epoch by a finite Gaussian mixture, and they merge and mix these Gaussian mixture components at every step to avoid the exponential growth in the number of mixture components. The IMM-EKF uses EKFs while the IMM-UKF utilises unscented Kalman filters (UKFs) [15] to compute the mode-conditioned state estimates.

In addition to the autonomous bearings-only tracking problem, two further cases are investigated in the paper: multisensor bearings-only tracking, and tracking with hard constraints. The multisensor bearings-only problem involves a slight modification to the original problem, where a second static sensor sends its target bearing measurements to the original platform. The problem of tracking with hard constraints involves the use of prior knowledge, such as speed constraints, to improve tracker performance.

The organisation of the paper is as follows. Section 2 presents the mathematical formulation of the bearings-only tracking problem for each of the three different cases investigated: (i) single-sensor case, (ii) multisensor case, and (iii) tracking with hard constraints. In Section 3 the relevant CRLBs are derived for all but case (iii), for which the analytic derivation is difficult (due to the non-Gaussian prior and process noise vectors). The tracking algorithms for each case are then presented in Section 4, followed by simulation results in Section 5.

2. PROBLEM FORMULATION

2.1. Single-sensor case

Conceptually, the basic problem in bearings-only tracking is to estimate the trajectory of a target (i.e., position and velocity) from noise-corrupted sensor bearing data. For the case of a single-sensor problem, these bearing data are obtained from a single moving observer (ownship). The target state vector is

\[
\mathbf{x}^t = \begin{bmatrix} x^t & y^t & \dot{x}^t & \dot{y}^t \end{bmatrix}^T, \tag{1}
\]

where (x, y) and (ẋ, ẏ) are the position and velocity components, respectively. The ownship state vector x^o is similarly defined. We now introduce the relative state vector defined by

\[
\mathbf{x} \triangleq \mathbf{x}^t - \mathbf{x}^o = \begin{bmatrix} x & y & \dot{x} & \dot{y} \end{bmatrix}^T, \tag{2}
\]

for which the discrete-time state equation will be written. The dynamics of a manoeuvring target is modelled by multiple switching regimes, also known as a JMS. We make the assumption that at any time in the observation period, the target motion obeys one of s = 3 dynamic behaviour models: (a) CV motion model, (b) clockwise CT model, and (c) anticlockwise CT model. Let S ≜ {1, 2, 3} denote the set of three models for the dynamic motion, and let r_k be the regime variable in effect in the interval (k − 1, k], where k is the discrete-time index. Then, the target dynamics can be mathematically written as

\[
\mathbf{x}_{k+1} = \mathbf{f}^{(r_{k+1})}\bigl(\mathbf{x}_k, \mathbf{x}_k^o, \mathbf{x}_{k+1}^o\bigr) + \mathbf{G}\mathbf{v}_k, \qquad r_{k+1} \in S, \tag{3}
\]

where

\[
\mathbf{G} = \begin{bmatrix} \dfrac{T^2}{2} & 0 \\[1mm] 0 & \dfrac{T^2}{2} \\[1mm] T & 0 \\ 0 & T \end{bmatrix}, \tag{4}
\]

T is the sampling time, and v_k is a 2 × 1 i.i.d. process noise vector with v_k ∼ N(0, Q). The process noise covariance matrix is chosen to be Q = σ_a² I, where I is the 2 × 2 identity matrix and σ_a is a process noise parameter. Note that G v_k in (3) corresponds to a piecewise constant white acceleration noise model [16] which is adequate for the large sampling time chosen in our paper. The mode-conditioned transition function f^{(r_{k+1})}(·, ·, ·) in (3) is given by

\[
\mathbf{f}^{(r_{k+1})}\bigl(\mathbf{x}_k, \mathbf{x}_k^o, \mathbf{x}_{k+1}^o\bigr) = \mathbf{F}^{(r_{k+1})}\bigl(\mathbf{x}_k\bigr)\bigl(\mathbf{x}_k + \mathbf{x}_k^o\bigr) - \mathbf{x}_{k+1}^o. \tag{5}
\]
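Equations (3)-(5) taken together: the relative state is propagated by adding the ownship state back, applying the transition matrix for the regime in effect, subtracting the ownship state at the next step, and adding the acceleration noise through G. A sketch under the stated model (the helper names are illustrative, not from the paper):

```python
import numpy as np

def noise_gain(T):
    """G of (4): maps the 2-vector acceleration noise v_k into the
    4-vector state [x, y, xdot, ydot] (piecewise constant white
    acceleration model)."""
    return np.array([[T**2 / 2.0, 0.0],
                     [0.0, T**2 / 2.0],
                     [T, 0.0],
                     [0.0, T]])

def jms_step(x, x_own, x_own_next, F, v, T):
    """One draw of the dynamics (3) with the mode-conditioned transition (5):
    propagate the absolute target state with the regime's matrix F, re-centre
    on the new ownship state, and add the acceleration noise through G."""
    return F @ (x + x_own) - x_own_next + noise_gain(T) @ v
```

Mode switching then amounts to drawing r_{k+1} from the Markov chain and supplying the corresponding F before calling this step.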

Here F^{(r_{k+1})}(·) is the transition matrix corresponding to mode r_{k+1}, which, for the particular problem of interest, can be specified as follows. F^{(1)}(·) corresponds to CV motion and is thus given by the standard CV transition matrix

    F^{(1)}(x_k) = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.   (6)

The next two transition matrices correspond to CT transitions (clockwise and anticlockwise, respectively). These are given by

    F^{(j)}(x_k) = \begin{bmatrix}
    1 & 0 & \sin(Ω_k^{(j)} T)/Ω_k^{(j)} & -(1 - \cos(Ω_k^{(j)} T))/Ω_k^{(j)} \\
    0 & 1 & (1 - \cos(Ω_k^{(j)} T))/Ω_k^{(j)} & \sin(Ω_k^{(j)} T)/Ω_k^{(j)} \\
    0 & 0 & \cos(Ω_k^{(j)} T) & -\sin(Ω_k^{(j)} T) \\
    0 & 0 & \sin(Ω_k^{(j)} T) & \cos(Ω_k^{(j)} T)
    \end{bmatrix},   j = 2, 3,   (7)

where the mode-conditioned turning rates are

    Ω_k^{(2)} = a_m / \sqrt{(ẋ_k + ẋ_k^o)^2 + (ẏ_k + ẏ_k^o)^2},
    Ω_k^{(3)} = -a_m / \sqrt{(ẋ_k + ẋ_k^o)^2 + (ẏ_k + ẏ_k^o)^2}.   (8)

Here a_m > 0 is a typical manoeuvre acceleration. Note that the turning rate is expressed as a function of target speed (a nonlinear function of the state vector x_k), and thus models 2 and 3 are clearly nonlinear transitions.

We model the mode r_k in effect at (k − 1, k] by a time-homogeneous 3-state first-order Markov chain with known transition probability matrix Π, whose elements are

    π_{ij} ≜ P(r_k = j | r_{k−1} = i),   i, j \in S,   (9)

such that

    π_{ij} ≥ 0,   Σ_j π_{ij} = 1.   (10)

The initial probabilities are denoted by π_i ≜ P(r_1 = i) for i ∈ S, and they satisfy

    π_i ≥ 0,   Σ_i π_i = 1.   (11)

The available measurement at time k is the angle from the observer's platform to the target, referenced clockwise positive to the y-axis, and is given by

    z_k = h(x_k) + w_k,   (12)

where w_k is zero-mean independent Gaussian noise with variance σ_θ² and

    h(x_k) = \arctan(x_k / y_k)   (13)

is the true bearing angle. The state variable of interest for estimation is the hybrid state vector y_k = (x_k^T, r_k)^T. Thus, given a set of measurements Z_k = {z_1, ..., z_k} and the jump-Markov model (3), the problem is to obtain estimates of the hybrid state vector y_k. In particular, we are interested in computing the kinematic state estimate x̂_{k|k} = E[x_k | Z_k] and the mode probabilities P(r_k = j | Z_k), for every j ∈ S.

2.2. Multisensor case

Suppose there is a possibility of the ownship receiving additional (secondary) bearing measurements from a sensor located at (x_k^s, y_k^s), whose measurement errors are independent of those of the ownship sensor. For simplicity, we assume that (a) the additional measurements are synchronous with the primary sensor measurements and that (b) there is a zero transmission delay from the sensor at (x_k^s, y_k^s) to the ownship at (x_k^o, y_k^o). The secondary measurement can be modelled as

    z_k′ = h′(x_k) + w_k′,   (14)

where

    h′(x_k) = \arctan( (x_k + x_k^o − x_k^s) / (y_k + y_k^o − y_k^s) )   (15)

and w_k′ is a zero-mean white Gaussian noise sequence with variance σ_θ′². If the additional bearing measurement is not received at time k, we set z_k′ = ∅. The bearings-only tracking problem for this multisensor case is then to estimate the state vector x_k given a sequence of measurements Z_k = {z_1, z_1′, ..., z_k, z_k′}.

2.3. Tracking with constraints

In many tracking problems, one has hard constraints on the state vector which can be a valuable source of information in the estimation process. For example, we may know the minimum and maximum speeds of the target, given by

    s_min ≤ \sqrt{(ẋ_k + ẋ_k^o)^2 + (ẏ_k + ẏ_k^o)^2} ≤ s_max.   (16)
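The CT transition matrix (7), the turning rates (8), and a draw from the Markov chain (9) can be sketched as follows. The transition matrix Π and the manoeuvre acceleration passed to `turn_rate` are illustrative values, not the paper's parameters.

```python
import numpy as np

def turn_rate(x_rel, xdot_own, ydot_own, a_m, mode=2):
    """Mode-conditioned turning rate of (8): +/- a_m divided by the
    absolute target speed (relative velocity plus ownship velocity)."""
    speed = np.hypot(x_rel[2] + xdot_own, x_rel[3] + ydot_own)
    return a_m / speed if mode == 2 else -a_m / speed

def F_ct(omega, T):
    """Coordinated-turn transition matrix of (7) for turn rate omega."""
    s, c = np.sin(omega * T), np.cos(omega * T)
    return np.array([[1.0, 0.0, s / omega, -(1.0 - c) / omega],
                     [0.0, 1.0, (1.0 - c) / omega, s / omega],
                     [0.0, 0.0, c, -s],
                     [0.0, 0.0, s, c]])

def sample_regime(r_prev, Pi, rng):
    """Draw r_k from the Markov chain (9): row r_prev of Pi."""
    return rng.choice(len(Pi), p=Pi[r_prev])

# Unit-speed target, stationary ownship: a quarter turn in one step
# rotates the velocity vector by 90 degrees and preserves speed.
x = np.array([0.0, 0.0, 1.0, 0.0])
omega = turn_rate(x, 0.0, 0.0, a_m=np.pi / 2)  # = pi/2 for unit speed
x_next = F_ct(omega, 1.0) @ x

# Illustrative, diagonally dominant transition matrix (not the paper's).
Pi = np.array([[0.9, 0.05, 0.05],
               [0.05, 0.9, 0.05],
               [0.05, 0.05, 0.9]])
rng = np.random.default_rng(0)
r1 = sample_regime(0, Pi, rng)
```

Note that a CT step leaves the speed unchanged; only the heading rotates, which is what makes the constant a_m in (8) act as a fixed lateral acceleration.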

Suppose some constraint (such as the speed constraint) is imposed on the state vector, and denote the set of constrained state vectors by Ψ. Let the initial distribution of the state vector in the absence of constraints be x_0 ∼ p(x_0). With constraints, this initial distribution becomes a truncated density p̃(x_0), that is,

    p̃(x_0) = p(x_0) / ∫_{x_0 ∈ Ψ} p(x_0) dx_0   for x_0 ∈ Ψ,   and 0 otherwise.   (17)

Likewise, the dynamics model should be modified in such a way that x_k is always constrained to Ψ. In the absence of hard constraints, suppose that the process noise v_k ∼ g(v) is used in the filter. With constraints, the pdf of v becomes a state-dependent truncated density given by

    g̃(v; x_k) = g(v) / ∫_{v ∈ G(x_k)} g(v) dv   for v ∈ G(x_k),   and 0 otherwise,   (18)

where G(x_k) = {v : x_k ∈ Ψ}.

For the bearings-only tracking problem, we will consider hard constraints in the target dynamics only. The measurement model remains the same as that for the unconstrained problem. Given a sequence of measurements Z_k and some constraint Ψ on the state vector, the aim is to obtain estimates of the state vector x_k, that is,

    x̂_{k|k} = E[x_k | Z_k, Ψ] = ∫ x_k p(x_k | Z_k, Ψ) dx_k,   (19)

where p(x_k | Z_k, Ψ) is the posterior density of the state, given the measurements and hard constraints.

3. CRAMÉR-RAO LOWER BOUNDS

We follow the approach taken in [12] for the development of a conservative CRLB for the manoeuvring target tracking problem. This bound assumes that the true model history of the target trajectory

    H_k^* = (r_1^*, r_2^*, ..., r_k^*)   (20)

is known a priori. Then, a bound on the covariance of x̂_k was shown to be

    E[(x̂_k − x_k)(x̂_k − x_k)^T] ≥ E[(x̂_k − x_k)(x̂_k − x_k)^T | H_k^*] ≥ (J_k^*)^{-1},   (21)

where the mode-history-conditioned information matrix J_k^* is

    J_k^* = E[ ∇_{x_k} \log p(x_k, Z_k) (∇_{x_k} \log p(x_k, Z_k))^T | H_k^* ].   (22)

Following [17], a recursion for J_k^* can be written as

    J_{k+1}^* = D_k^{22} − D_k^{21} (J_k^* + D_k^{11})^{-1} D_k^{12},   (23)

where, in the case of the additive Gaussian noise models applicable to our problem, the matrices D_k^{ij} are given by

    D_k^{11} = E[ (F̃_k^{(r_{k+1}^*)})^T Q_k^{-1} F̃_k^{(r_{k+1}^*)} ],
    D_k^{12} = −E[ (F̃_k^{(r_{k+1}^*)})^T ] Q_k^{-1} = (D_k^{21})^T,   (24)
    D_k^{22} = Q_k^{-1} + E[ H̃_{k+1}^T R_{k+1}^{-1} H̃_{k+1} ],

where

    F̃_k^{(r_{k+1}^*)} = [ ∇_{x_k} (f^{(r_{k+1}^*)}(x_k))^T ]^T,   H̃_{k+1} = [ ∇_{x_{k+1}} h(x_{k+1})^T ]^T,   (25)

R_{k+1} = σ_θ² is the variance of the bearing measurements, and Q_k is the process noise covariance matrix. The Jacobian F̃_k^{(1)} for the case of r_{k+1}^* = 1 is simply the transition matrix given in (6). For r_{k+1}^* ∈ {2, 3}, the Jacobian F̃_k^{(r_{k+1}^*)} can be computed as

    F̃_k^{(j)} = \begin{bmatrix}
    1 & 0 & ∂f_1^{(j)}/∂ẋ_k & ∂f_1^{(j)}/∂ẏ_k \\
    0 & 1 & ∂f_2^{(j)}/∂ẋ_k & ∂f_2^{(j)}/∂ẏ_k \\
    0 & 0 & ∂f_3^{(j)}/∂ẋ_k & ∂f_3^{(j)}/∂ẏ_k \\
    0 & 0 & ∂f_4^{(j)}/∂ẋ_k & ∂f_4^{(j)}/∂ẏ_k
    \end{bmatrix},   j = 2, 3,   (26)

where f_i^{(j)}(·) denotes the ith element of the dynamics model function f^{(j)}(·). The detailed evaluation of F̃_k^{(j)} is given in the appendix. Likewise, the Jacobian of the measurement function is given by

    H̃_{k+1} = [ ∂h/∂x_{k+1}, ∂h/∂y_{k+1}, ∂h/∂ẋ_{k+1}, ∂h/∂ẏ_{k+1} ],   (27)

where

    ∂h/∂x_{k+1} = y_{k+1} / (x_{k+1}^2 + y_{k+1}^2),   ∂h/∂y_{k+1} = −x_{k+1} / (x_{k+1}^2 + y_{k+1}^2),   ∂h/∂ẋ_{k+1} = ∂h/∂ẏ_{k+1} = 0.   (28)
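A minimal sketch of the information-matrix recursion (23)–(24) under the fixed-trajectory simplification used later in the paper (expectations dropped, Jacobians evaluated at the true states), restricted to the CV regime for brevity. The scenario numbers are hypothetical, and a small regularisation term is an assumption added here because the lifted process noise covariance G Q G^T is rank-deficient.

```python
import numpy as np

def H_jac(x):
    """Bearing-measurement Jacobian (27)-(28) at relative state x."""
    px, py = x[0], x[1]
    r2 = px**2 + py**2
    return np.array([[py / r2, -px / r2, 0.0, 0.0]])

def crlb_step(J, F, Q, H, R):
    """One step of (23) with expectations dropped (fixed trajectory):
    J_{k+1} = D22 - D21 (J + D11)^{-1} D12, with D's from (24)."""
    Qi = np.linalg.inv(Q)
    D11 = F.T @ Qi @ F
    D12 = -F.T @ Qi
    D21 = D12.T
    D22 = Qi + H.T @ np.linalg.inv(R) @ H
    return D22 - D21 @ np.linalg.inv(J + D11) @ D12

T = 60.0
F = np.array([[1, 0, T, 0], [0, 1, 0, T], [0, 0, 1, 0], [0, 0, 0, 1.0]])
G = np.array([[T**2 / 2, 0], [0, T**2 / 2], [T, 0], [0, T]])
# G Q G^T is singular (rank 2), so regularise before inverting (an
# implementation choice for this sketch, not from the paper).
Q = G @ (1e-6 * np.eye(2)) @ G.T + 1e-6 * np.eye(4)
R = np.array([[np.radians(1.5)**2]])

J = np.linalg.inv(np.diag([1e6, 1e6, 1.0, 1.0]))  # J_1 = P_1^{-1}, cf. (31)
x_true = np.array([2000.0, 4000.0, -2.0, 1.0])    # hypothetical trajectory
for _ in range(5):
    J = crlb_step(J, F, Q, H_jac(x_true), R)
    x_true = F @ x_true  # propagate the fixed true trajectory

P_bound = np.linalg.inv(J)
rms_bound = np.sqrt(P_bound[0, 0] + P_bound[1, 1])  # cf. metric (68)
```

For the jump-Markov case, the true mode history H_k^* would select which Jacobian F̃_k^{(j)} enters each step.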

For the case of additional measurements from a secondary sensor, the only change required is in the computation of D_k^{22}. In particular, R_{k+1} and H̃_{k+1} are replaced by R_{k+1}′ and H̃_{k+1}′, corresponding to the augmented measurement vector (z_{k+1}, z_{k+1}′). These are given by

    R_{k+1}′ = \begin{bmatrix} σ_θ^2 & 0 \\ 0 & σ_{θ′}^2 \end{bmatrix},   H̃_{k+1}′ = \begin{bmatrix} ∂h/∂x_{k+1} & ∂h/∂y_{k+1} & ∂h/∂ẋ_{k+1} & ∂h/∂ẏ_{k+1} \\ ∂h′/∂x_{k+1} & ∂h′/∂y_{k+1} & ∂h′/∂ẋ_{k+1} & ∂h′/∂ẏ_{k+1} \end{bmatrix},   (29)

where σ_{θ′}² is the noise variance of the secondary sensor, and the first row of H̃_{k+1}′ is identical to (27). The elements of the second row of H̃_{k+1}′ are given by

    ∂h′/∂x_{k+1} = (y_{k+1} + y_{k+1}^o − y_{k+1}^s) / [ (x_{k+1} + x_{k+1}^o − x_{k+1}^s)^2 + (y_{k+1} + y_{k+1}^o − y_{k+1}^s)^2 ],
    ∂h′/∂y_{k+1} = −(x_{k+1} + x_{k+1}^o − x_{k+1}^s) / [ (x_{k+1} + x_{k+1}^o − x_{k+1}^s)^2 + (y_{k+1} + y_{k+1}^o − y_{k+1}^s)^2 ],   (30)
    ∂h′/∂ẋ_{k+1} = ∂h′/∂ẏ_{k+1} = 0.

The simulation experiments for this problem are carried out on fixed trajectories. This means that for the corresponding CRLBs, the expectation operators in (24) vanish and the required Jacobians are computed at the true trajectories. The recursion (23) is initialised by

    J_1^* = P_1^{-1},   (31)

where P_1 is the initial covariance matrix of the state estimate. This can be computed using expression (38), where we replace the measurement θ_1 by the true initial bearing.

4. TRACKING ALGORITHMS

This section describes five recursive algorithms designed for tracking a manoeuvring target using bearings-only measurements. Two of the algorithms are IMM-based and the other three are PF-based schemes. The algorithms considered are (i) the IMM-EKF, (ii) the IMM-UKF, (iii) the MMPF, (iv) the AUX-MMPF, and (v) the JMS-PF. All five algorithms are applicable to both the single-sensor and multisensor tracking problems formulated in Section 2. Sections 4.1, 4.2, 4.3, 4.4, and 4.5 present the elements of the five tracking algorithms to be investigated. The IMM-based trackers are not presented in detail; the interested reader is referred to [7, 12, 16] for a detailed exposition of these trackers. Section 4.6 presents the required methodology for the multisensor case, while Section 4.7 discusses the modifications required in the PF-based trackers for tracking with hard constraints.

4.1. IMM-EKF algorithm

The IMM-EKF algorithm is an EKF-based routine that has been utilised for manoeuvring target tracking problems formulated in a JMS framework [7, 12]. The basic idea is that, for each dynamic model of the JMS, a separate EKF is used, and the filter outputs are weighted according to the mode probabilities to give the state estimate and covariance. At each time index, the target state pdf is characterised by a finite Gaussian mixture, which is then propagated to the next time index. Ideally, this propagation results in an s-fold increase in the number of mixture components, where s is the number of modes in the JMS. However, the IMM-EKF algorithm avoids this growth by merging groups of components using mixture probabilities. The details of the IMM-EKF algorithm can be found in [7], where slightly different motion models to the one used here were proposed.

The sources of approximation in the IMM-EKF algorithm are twofold. First, the EKF approximates nonlinear transformations by linear transformations at some operating point. If the nonlinearity is severe or if the operating point is not chosen properly, the resultant approximation can be poor, leading to filter divergence. Second, the IMM approximates the exponentially growing Gaussian mixture by a finite Gaussian mixture. These two approximations can cause filter instability in certain scenarios. Next, we provide details of the filter initialisation for the EKF routines used in this algorithm.

4.1.1. Filter initialisation

Suppose the initial prior range is r ∼ N(r̄, σ_r²), where r̄ and σ_r² are the mean and variance of the initial range. Then, given the first bearing measurement θ_1, the position components of the relative target state vector are initialised according to the standard procedure [12], that is,

    x_1 = r̄ \sin θ_1,   y_1 = r̄ \cos θ_1,   (32)

with covariance

    P_{xy} = \begin{bmatrix} σ_x^2 & σ_{xy} \\ σ_{yx} & σ_y^2 \end{bmatrix},   (33)

    σ_x^2 = r̄^2 σ_θ^2 \cos^2 θ_1 + σ_r^2 \sin^2 θ_1,   (34)
    σ_y^2 = r̄^2 σ_θ^2 \sin^2 θ_1 + σ_r^2 \cos^2 θ_1,   (35)
    σ_{xy} = σ_{yx} = (σ_r^2 − r̄^2 σ_θ^2) \sin θ_1 \cos θ_1,   (36)

where σ_θ is the bearing-measurement standard deviation. We adopt a similar procedure to initialise the velocity components. Suppose we have some prior knowledge of the target speed and course given by s ∼ N(s̄, σ_s²) and c ∼ N(c̄, σ_c²), respectively. Then, the overall relative target state vector is initialised as

    x̂_1 = [ r̄ \sin θ_1,  r̄ \cos θ_1,  s̄ \sin(c̄) − ẋ_1^o,  s̄ \cos(c̄) − ẏ_1^o ]^T,   (37)

where (ẋ_1^o, ẏ_1^o) is the velocity of the ownship at time index 1. The corresponding initial covariance matrix is given by

    P_1 = \begin{bmatrix} σ_x^2 & σ_{xy} & 0 & 0 \\ σ_{yx} & σ_y^2 & 0 & 0 \\ 0 & 0 & σ_{ẋ}^2 & σ_{ẋẏ} \\ 0 & 0 & σ_{ẏẋ} & σ_{ẏ}^2 \end{bmatrix},   (38)

where σ_x², σ_{xy}, σ_{yx}, σ_y² are given by (34)–(36), and

    σ_{ẋ}^2 = s̄^2 σ_c^2 \cos^2(c̄) + σ_s^2 \sin^2(c̄),
    σ_{ẏ}^2 = s̄^2 σ_c^2 \sin^2(c̄) + σ_s^2 \cos^2(c̄),   (39)
    σ_{ẋẏ} = σ_{ẏẋ} = (σ_s^2 − s̄^2 σ_c^2) \sin(c̄) \cos(c̄).

4.2. IMM-UKF algorithm

This algorithm is similar to the IMM-EKF, the main difference being that the model-matched EKFs are replaced by model-matched UKFs [15]. The UKF for model 1 uses the unscented transform (UT) only for the filter update (because only the measurement equation is nonlinear). The UKFs for models 2 and 3 use the UT for both the prediction and the update stages of the filter. The IMM-UKF is initialised in a similar manner to the IMM-EKF.

4.3. MMPF

The MMPF [12, 13] has been used to solve various manoeuvring target tracking problems. Here we briefly review the basics of this filter. The MMPF estimates the posterior density p(y_k | Z_k), where y_k = [x_k^T, r_k]^T is the augmented (hybrid) state vector. In order to recursively compute the PF estimates, the MC representation of p(y_k | Z_k) has to be propagated in time. Let {y_{k−1}^i, w_{k−1}^i}_{i=1}^N denote a random measure that characterises the posterior pdf p(y_{k−1} | Z_{k−1}), where y_{k−1}^i, i = 1, ..., N, is a set of support points with associated weights w_{k−1}^i, i = 1, ..., N. The posterior density of the augmented state is then propagated by importance sampling. Each new mode sample is drawn from the importance function given by the ith row of the Markov chain transition probability matrix; that is, we choose r_k^{*i} ∼ p(r_k | r_{k−1}^i) such that

    P(r_k^{*i} = j) = π_{ij}.   (42)

Next, given the mode r_k^{*i}, one can easily sample x_k^{*i} ∼ p(x_k | x_{k−1}^i, r_k^{*i}) by generating a process noise sample v_{k−1}^i ∼ N(0, Q) and propagating x_{k−1}^i, r_k^{*i}, and v_{k−1}^i through the dynamics model (3). This gives the sample {y_k^{*i} = [(x_k^{*i})^T, r_k^{*i}]^T}_{i=1}^N, which can be used to approximate the posterior pdf p(y_k | Z_k) as

    p(y_k | Z_k) ≈ Σ_{i=1}^N w_k^i δ(y_k − y_k^{*i}),   (43)

where

    w_k^i ∝ w_{k−1}^i p(z_k | y_k^{*i}).   (44)

Note that p(z_k | y_k^{*i}) = p(z_k | x_k^{*i}), which can be computed using the measurement equation (12). This completes the description of a single recursion of the MMPF. The filter is initialised by generating N samples {x_1^i}_{i=1}^N from the initial density N(x̂_1, P_1), where x̂_1 and P_1 were specified in (37) and (38), respectively.

A common problem with PFs is the degeneracy phenomenon, where, after a few iterations, all but one particle will have negligible weight. A measure of degeneracy is the effective sample size N_eff, which can be empirically evaluated as

    N̂_eff = 1 / Σ_{i=1}^N (w_k^i)^2.   (45)

The usual approach to overcoming the degeneracy problem is to introduce resampling whenever N̂_eff falls below a fixed threshold.
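The degeneracy check (45) and a resampling step can be sketched as below. Systematic resampling is one common low-variance choice; the paper only calls for resampling when N̂_eff is low, so the specific scheme here is an assumption.

```python
import numpy as np

def effective_sample_size(w):
    """N_eff estimate of (45): reciprocal of the sum of squared
    normalised weights; equals N for uniform weights, 1 at worst."""
    w = np.asarray(w) / np.sum(w)
    return 1.0 / np.sum(w**2)

def systematic_resample(w, rng):
    """Return indices of resampled particles. One uniform offset is
    shared by N evenly spaced positions on the weight CDF."""
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(1)
w = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
neff = effective_sample_size(w)    # ~1.9: heavily degenerate
idx = systematic_resample(w, rng)  # dominant particle is replicated
```

After resampling, all weights are reset to 1/N, which is why the scheme is applied only when N̂_eff indicates degeneracy rather than at every step.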

where p(r_k | r_{k−1}) is an element of the transition probability matrix Π defined by (9). To sample directly from p(x_k, i, r_k | Z_k) as given by (47) is not practical. Hence, we again use importance sampling [9, 14] to first obtain a sample from a density which closely resembles (47), and then weight the samples appropriately to produce an MC representation of p(x_k, i, r_k | Z_k). This can be done by introducing the function q(x_k, i, r_k | Z_k) with proportionality

    q(x_k, i, r_k | Z_k) ∝ p(z_k | µ_k^i(r_k)) p(x_k | x_{k−1}^i, r_k) p(r_k | r_{k−1}^i) w_{k−1}^i,   (48)

where

    µ_k^i(r_k) = E[x_k | x_{k−1}^i, r_k] = f^{(r_k)}(x_{k−1}^i, x_{k−1}^o, x_k^o).   (49)

The importance density q(x_k, i, r_k | Z_k) differs from (47) only in the first factor. Now, we can write q(x_k, i, r_k | Z_k) as

    q(x_k, i, r_k | Z_k) = q(i, r_k | Z_k) g(x_k | i, r_k, Z_k),   (50)

where

    g(x_k | i, r_k, Z_k) ≜ p(x_k | x_{k−1}^i, r_k).   (51)

In order to obtain a sample from the density q(x_k, i, r_k | Z_k), we first integrate (48) with respect to x_k to get an expression for q(i, r_k | Z_k),

    q(i, r_k | Z_k) ∝ p(z_k | µ_k^i(r_k)) p(r_k | r_{k−1}^i) w_{k−1}^i.   (52)

A random sample can now be obtained from the density q(x_k, i, r_k | Z_k) as follows. First, a sample {i^j, r_k^j}_{j=1}^N is drawn from the discrete distribution q(i, r_k | Z_k) given by (52). This can be done by splitting each of the N particles at k − 1 into s groups (s is the number of possible modes), each corresponding to a particular mode. Each of the sN particles is assigned a weight proportional to (52), and N points {i^j, r_k^j}_{j=1}^N are then sampled from this discrete distribution. From (50) and (51), it is seen that the samples {x_k^j}_{j=1}^N from the joint density q(x_k, i, r_k | Z_k) can now be generated from p(x_k | x_{k−1}^{i^j}, r_k^j). The resultant triplet sample {x_k^j, i^j, r_k^j}_{j=1}^N is a random sample from the density q(x_k, i, r_k | Z_k). To use these samples to characterise the density p(x_k, i, r_k | Z_k), we attach a weight w_k^j to each particle, where w_k^j is the ratio of (47) and (48), evaluated at {x_k^j, i^j, r_k^j}, that is,

    w_k^j = [ p(z_k | x_k^j) p(x_k^j | x_{k−1}^{i^j}, r_k^j) p(r_k^j | r_{k−1}^{i^j}) w_{k−1}^{i^j} ] / [ p(z_k | µ_k^{i^j}(r_k^j)) p(x_k^j | x_{k−1}^{i^j}, r_k^j) p(r_k^j | r_{k−1}^{i^j}) w_{k−1}^{i^j} ] = p(z_k | x_k^j) / p(z_k | µ_k^{i^j}(r_k^j)).   (53)

By defining the augmented vector y_k ≜ (x_k^T, i, r_k)^T, we can write down an MC representation of the pdf p(x_k, i, r_k | Z_k) as

    p(x_k, i, r_k | Z_k) = p(y_k) ≈ Σ_{j=1}^N w_k^j δ(y_k − y_k^j).   (54)

Observe that by omitting the {i^j, r_k^j} components of the triplet sample, we have a representation of the marginalised density p(x_k | Z_k), that is,

    p(x_k | Z_k) ≈ Σ_{j=1}^N w_k^j δ(x_k − x_k^j).   (55)

The AUX-MMPF is initialised according to the same procedure as for the MMPF.

4.5. JMS-PF

The JMS-PF is based on the jump Markov linear system (JMLS) PF proposed in [18, 19] for a JMLS. Let

    X_k = (x_1, ..., x_k),   R_k = (r_1, ..., r_k)   (56)

denote the sequences of states and modes up to time index k. Standard particle filtering techniques focus on the estimation of the pdf of the state vector x_k. In the JMS-PF, however, we place the emphasis on the estimation of the pdf of the mode sequence R_k, given measurements Z_k = {z_1, ..., z_k}. The density p(X_k, R_k | Z_k) can be factorised as

    p(X_k, R_k | Z_k) = p(X_k | R_k, Z_k) p(R_k | Z_k).   (57)

Given a specific mode sequence R_k and measurements Z_k, the first term on the right-hand side of (57), p(X_k | R_k, Z_k), can easily be estimated using an EKF or some other nonlinear filter. Therefore, we focus our attention on p(R_k | Z_k); for the estimation of this density, we propose to use a PF. Using Bayes' rule, we note that

    p(R_k | Z_k) = [ p(z_k | Z_{k−1}, R_k) p(r_k | r_{k−1}) / p(z_k | Z_{k−1}) ] p(R_{k−1} | Z_{k−1}).   (58)

Equation (58) provides a useful recursion for the estimation of p(R_k | Z_k) using a PF. We describe a general recursive algorithm which generates N particles {R_k^i}_{i=1}^N at time k that characterise the pdf p(R_k | Z_k). The algorithm requires the introduction of an importance function q(r_k | Z_k, R_{k−1}). Suppose at time k − 1 one has a set of particles {R_{k−1}^i}_{i=1}^N that characterises the pdf p(R_{k−1} | Z_{k−1}), that is,

    p(R_{k−1} | Z_{k−1}) ≈ (1/N) Σ_{i=1}^N δ(R_{k−1} − R_{k−1}^i).   (59)

Now draw N samples r_k^i ∼ q(r_k | Z_k, R_{k−1}^i). Then, from (58) and the principle of importance sampling, one can write

    p(R_k | Z_k) ≈ Σ_{i=1}^N w_k^i δ(R_k − R_k^i),   (60)

where R_k^i ≡ {R_{k−1}^i, r_k^i} and the weight is

    w_k^i ∝ p(z_k | Z_{k−1}, R_k^i) p(r_k^i | r_{k−1}^i) / q(r_k^i | Z_k, R_{k−1}^i).   (61)

From (60), we note that one can perform resampling (if required) to obtain an approximate i.i.d. sample from p(R_k | Z_k). The recursion can be initialised according to the specified initial state distribution of the Markov chain, π_i = P(r_1 = i).

How do we choose the importance density q(r_k | Z_k, R_{k−1})? A sensible selection criterion is to choose a proposal that minimises the variance of the importance weights at time k, given R_{k−1} and Z_k. According to this strategy, it was shown in [18] that the optimal importance density is p(r_k | Z_k, R_{k−1}^i). Now, it is easy to see that this density satisfies

    p(r_k | Z_k, R_{k−1}^i) = p(z_k | Z_{k−1}, R_{k−1}^i, r_k) p(r_k | r_{k−1}^i) / p(z_k | Z_{k−1}, R_{k−1}^i).   (62)

Note that p(r_k | Z_k, R_{k−1}^i) is proportional to the numerator of (62), as the denominator is independent of r_k. Also, the term p(r_k | r_{k−1}) is simply the Markov chain transition probability (specified by the transition probability matrix Π). The term p(z_k | Z_{k−1}, R_k), which features in the numerator of (62), can be approximated by one-step-ahead EKF outputs, that is, we can write

    p(z_k | Z_{k−1}, R_k) ≈ N(ν_k(R_k, Z_{k−1}); 0, S_k(R_k, Z_{k−1})),   (63)

where ν_k(·, ·) and S_k(·, ·) are the mode-history-conditioned innovation and its covariance, respectively. Thus, p(r_k | r_{k−1}) and (63) allow the computation of the optimal importance density. Using (62) as the importance density q(·|·) in (61), we find that the weight satisfies

    w_k^i ∝ p(z_k | Z_{k−1}, R_{k−1}^i).   (64)

Since r_k ∈ {1, ..., s}, the importance weights given above can be computed as

    w_k^i ∝ Σ_{j=1}^s p(z_k | Z_{k−1}, R_{k−1}^i, r_k = j) P(r_k = j | r_{k−1}^i).   (65)

Note that the computation of the importance weights in (65) requires s one-step-ahead EKF innovations and their covariances. This completes the description of the PF for the estimation of the Markov chain distribution p(R_k | Z_k). As mentioned earlier, given a particular mode sequence, the state estimates are easily obtained using a standard EKF.

4.6. Methodology for the multisensor case

The methodology for the multisensor case is similar to that of the single-sensor case. The two main points to note are that (a) the secondary measurement is processed in a sequential manner, assuming a zero time delay between the primary and secondary measurements, and (b) for the processing of the secondary measurement, the measurement function (15) is used in place of (13) in the computation of the necessary quantities such as Jacobians, predicted measurements, and weights.

4.7. Modifications for tracking with hard constraints

The problem of bearings-only tracking with hard constraints was described in Section 2.3. Recall that for the constraint x_k ∈ Ψ, the state estimate is given by the mean of the posterior density p(x_k | Z_k, Ψ). This density cannot be easily constructed by standard Kalman-filter-based techniques. However, since PFs place no restrictions on the prior density or on the distributions of the process and measurement noise vectors, it turns out that p(x_k | Z_k, Ψ) can be constructed using PFs. The only modifications required in the PFs for the case of the constraint x_k ∈ Ψ are as follows:

(i) the prior distribution needs to be p̃(x) defined in (17), and the filter needs to be able to sample from this density;
(ii) in the prediction step, samples are drawn from the constrained process noise density g̃(v; x_k) instead of the standard process noise pdf.

Both changes require the ability to sample from a truncated density. A simple method to sample from a generic truncated density t̃(x) defined by

    t̃(x) = t(x) / ∫_{x ∈ Ψ} t(x) dx   for x ∈ Ψ,   and 0 otherwise,   (66)

is as follows. Suppose we can easily sample from t(x). Then, to draw x ∼ t̃(x), we repeatedly draw samples from t(x) until the condition x ∈ Ψ is satisfied.¹ The resulting sample is then distributed according to t̃(x). This simple technique will be adopted in the modifications required in the PF for the constrained problem. With the above modifications, the PF leads to a cloud of particles that characterises the posterior density p(x_k | Z_k, Ψ), from which the state estimate x̂_{k|k} and its covariance P_{k|k} can be obtained.

¹This rejection sampling scheme can potentially be inefficient. For more efficient schemes to sample from truncated densities, see [20].
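The rejection scheme for sampling from the truncated density (66) can be sketched as follows, here using the speed band of (16) as the constraint set Ψ. The Gaussian prior, its parameters, and the simplification of a zero-velocity ownship are hypothetical choices for illustration, not the paper's values.

```python
import numpy as np

def sample_truncated(sample_fn, in_support, rng, max_tries=10000):
    """Draw from the truncated density (66) by rejection: sample from
    t(x) until x lands in Psi. Footnote 1 notes this can be inefficient
    when Psi has small probability under t(x)."""
    for _ in range(max_tries):
        x = sample_fn(rng)
        if in_support(x):
            return x
    raise RuntimeError("constraint set has very low probability under t(x)")

# Speed constraint (16) on the velocity components, with the ownship
# velocity assumed zero here for simplicity.
s_min, s_max = 3.5, 4.5

def in_speed_band(x):
    return s_min <= np.hypot(x[2], x[3]) <= s_max

rng = np.random.default_rng(2)
mean = np.array([0.0, 5000.0, 0.0, 0.0])        # hypothetical prior mean
sigma = np.array([2000.0, 2000.0, 4.0, 4.0])    # hypothetical prior stddevs
draw = lambda r: mean + r.normal(0.0, sigma)
x0 = sample_truncated(draw, in_speed_band, rng)
```

In the constrained PF, the same helper would be used both to draw the initial particles from p̃(x) of (17) and to draw process noise from g̃(v; x_k) of (18) at every prediction step.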

5. SIMULATION RESULTS

In this section, we present a performance comparison of the various tracking algorithms described in the previous section. The comparison is based on a set of 100 MC simulations and, where possible, the CRLB is used to indicate the best possible performance that one can expect for a given scenario and set of parameters. Before proceeding, we describe the four performance metrics used in this analysis: (i) RMS position error, (ii) efficiency η, (iii) root time-averaged mean square (RTAMS) error, and (iv) number of divergent tracks.

To define these metrics, let (x_k^i, y_k^i) and (x̂_k^i, ŷ_k^i) denote the true and estimated target positions at time k in the ith MC run. Suppose M such MC runs are carried out. Then, the RMS position error at k can be computed as

    RMS_k = \sqrt{ (1/M) Σ_{i=1}^M [ (x̂_k^i − x_k^i)^2 + (ŷ_k^i − y_k^i)^2 ] }.   (67)

[Figure 1: A typical bearings-only tracking scenario with a manoeuvring target.]

Now, if J_k^{-1}[i, j] denotes the ijth element of the inverse information matrix for the problem at hand, then the corresponding CRLB for the metric (67) can be written as

    CRLB(RMS_k) = \sqrt{ J_k^{-1}[1, 1] + J_k^{-1}[2, 2] }.   (68)

The second metric is the efficiency parameter η, defined as

    η_k ≜ ( CRLB(RMS_k) / RMS_k ) × 100%,   (69)

which indicates "closeness" to the CRLB; η_k = 100% implies an efficient estimator that achieves the CRLB exactly. For a particular scenario and set of parameters, the overall performance of a filter is evaluated using the third metric, the RTAMS error, defined as

    RTAMS = \sqrt{ 1/((t_max − ℓ) M) Σ_{k=ℓ+1}^{t_max} Σ_{i=1}^M [ (x̂_k^i − x_k^i)^2 + (ŷ_k^i − y_k^i)^2 ] },   (70)

where t_max is the duration of the observation period and ℓ is the time index after which the averaging is carried out.

5.1. Single-sensor case

The target-observer geometry for this case is shown in Figure 1. The target, which is initially 5 km away from the ownship, maintains an initial course of 140°. It executes a manoeuvre in the interval 20–25 minutes to attain a new course of 100°, and maintains this new course for the rest of the observation period. The ownship, travelling at a fixed speed of 5 knots and an initial course of 140°, executes a manoeuvre in the interval 13–17 minutes to attain a new course of 20°. It maintains this course for a period of 15 minutes, at the end of which it executes a second manoeuvre and attains a new course of 155°. The final target-observer range for this case is 2.91 km. Bearing measurements with accuracy σ_θ = 1.5° are received every T = 1 minute for an observation period of 40 minutes.

Unless otherwise mentioned, the following nominal filter parameters were used in the simulations. The initial range and speed prior standard deviations were set to σ_r = 2 km and σ_s = 2 knots, respectively. The initial course and its standard deviation were set to c̄ = θ_1 + π and σ_c = π/√12, where θ_1 is the initial bearing measurement. The process noise parameter was set to σ_a = 1.6 × 10⁻⁶ km/s². The MMPF and AUX-MMPF used N = 5000 particles, while the JMS-PF used N = 100 particles. Resampling was carried out if N̂_eff fell below a threshold.
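The metrics (67), (69), and (70) can be sketched as follows (the divergent-track count is omitted). Array shapes and the test values are illustrative.

```python
import numpy as np

def rms_position_error(est, truth):
    """RMS position error (67) across M Monte Carlo runs, per time step.
    est, truth: arrays of shape (M, K, 2) holding (x, y) positions."""
    sq = np.sum((est - truth)**2, axis=2)  # (M, K) squared position errors
    return np.sqrt(np.mean(sq, axis=0))    # (K,) RMS over the M runs

def efficiency(crlb_rms, rms):
    """Efficiency metric (69): closeness to the CRLB, in percent."""
    return 100.0 * crlb_rms / rms

def rtams(est, truth, start):
    """RTAMS error (70): mean squared error over k = start+1, ..., K
    and all runs, then square-rooted."""
    sq = np.sum((est - truth)**2, axis=2)[:, start:]
    return np.sqrt(np.mean(sq))

# Two runs, three steps, constant 3-4-5 position error of 5 units.
truth = np.zeros((2, 3, 2))
est = truth + np.array([3.0, 4.0])
rms_k = rms_position_error(est, truth)  # 5.0 at every step
```

A "divergent" run would normally be excluded before computing these averages, which is why the paper reports the number of divergent tracks as a separate metric.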

Table 1: Performance comparison for the single-sensor case.

Algorithm | RMS error (final) (km) | RMS error (final) (%) | η (%) | RTAMS (km) | Improvement (%) | Divergent tracks
IMM-EKF   | 1.18 | 40 | 22  | 1.07 | 0  | 0
IMM-UKF   | 0.80 | 28 | 32  | 0.72 | 32 | 1
MMPF      | 0.59 | 20 | 43  | 0.44 | 59 | 0
AUX-MMPF  | 0.55 | 19 | 46  | 0.47 | 56 | 0
JMS-PF    | 0.77 | 27 | 33  | 0.64 | 40 | 0
CRLB      | 0.25 | 9  | 100 | 0.21 | 80 | —

The typical manoeuvre acceleration parameter for the filters was set to a_m = 1.08 × 10⁻⁵ km/s². Figure 2 shows the RMS error curves corresponding to the five filters: IMM-EKF, IMM-UKF, MMPF, AUX-MMPF, and JMS-PF.

[Figure 2: RMS position error versus time for a manoeuvring target scenario.]

A detailed comparison is also given in Table 1. Note that the column "improvement" refers to the percentage improvement in RTAMS error compared with a baseline filter, which is chosen to be the IMM-EKF. From the graph and the table, it is clear that the performance of the IMM-EKF and IMM-UKF is poor compared to the other three filters. Though the final RMS error performance of the IMM-UKF is comparable to that of the JMS-PF, since it has one divergent track, its overall performance is considered worse than that of the JMS-PF. The best filters for this case were the MMPF and AUX-MMPF, which achieved 59% and 56% improvement, respectively, over the IMM-EKF. Also note that the JMS-PF performance lies between that of the IMM-EKF/IMM-UKF and the MMPF/AUX-MMPF. From the simulations, it appears that the relative computational requirements (with respect to the IMM-EKF) of the IMM-UKF, MMPF, AUX-MMPF, and JMS-PF are 2.6, 23, 32, and 30, respectively.

The rationale for the performance differences noted above can be explained as follows. There are two sources of approximation in both the IMM-EKF and the IMM-UKF. First, the probability of the mode history is approximated by the IMM routine, which merges mode histories. Second, the mode-conditioned filter estimates are obtained using an EKF and a UKF (for the IMM-EKF and IMM-UKF, respectively), both of which approximate the non-Gaussian posterior density by a Gaussian. In contrast, the MMPF and AUX-MMPF alleviate both sources of approximation: they estimate the mode probabilities with no merging of histories, they make no linearisation (as in the EKF), and they characterise the non-Gaussian posterior density in a near-optimal manner. Thus we observe the superior performance of the MMPF and AUX-MMPF. The JMS-PF, on the other hand, is worse than the MMPF/AUX-MMPF but better than the IMM-EKF/IMM-UKF, as it alleviates only one of the sources of approximation discussed above. Specifically, while the JMS-PF computes the mode history probability exactly, it uses an EKF (a local linearisation approximation) to compute the mode-conditioned filtered estimates. Hence, even if the number of particles for the JMS-PF is increased, its performance can never reach that of the MMPF/AUX-MMPF. It is interesting to note from the improvement figures for the JMS-PF and MMPF that the first source of approximation is more critical than the second; in fact, the contributions of the first and second sources of approximation appear to be in the ratio 2:1.

It is worth noting that in the above simulations, the performance of the AUX-MMPF is comparable to that of the MMPF. This is expected due to the low process noise used in the simulations, as one would expect the performance of the AUX-MMPF to approach that of the MMPF as the process noise tends to zero. However, for problems with moderate to high process noise, the AUX-MMPF is likely to outperform the MMPF.

Next, we illustrate a case where the IMM-EKF shows a tendency to diverge while the MMPF tracks the target well for the same set of measurements. Figure 3a shows the estimated track and 95% error ellipses (plotted every 8 minutes) for the IMM-EKF. Note that the IMM-EKF covariance estimate at 8 minutes is poor, as it does not encapsulate the true target position. This has resulted in not only subsequent poor track estimates, but also an inability to detect the target manoeuvre.

[Figure 3: IMM-EKF tracker results. (a) Track estimates and 95% confidence ellipses. (b) Mode probabilities for the CV model, the correct CT model, and the opposite CT model.]

[Figure 4: MMPF tracker results. (a) Track estimates and 95% confidence ellipses. (b) Mode probabilities.]

This is clear from the mode probability curves shown in Figure 3b, where we note that, though there is a slight bump in the mode probability for the correct manoeuvre model, the algorithm is unable to establish the occurrence of the manoeuvre. The overall result is a track that shows a tendency to diverge from the true track.

For the same set of measurements, the MMPF shows excellent performance, as can be seen from Figure 4. Here we note that the 95% confidence ellipse of the PF encapsulates the true target position at all times. Notice that the size of the covariance matrix shortly after the target manoeuvre is small compared to other times. The reason for this is that the

target observability is best at that instant compared to other times. For the given scenario, both the ownship manoeuvre and the target manoeuvre have resulted in a geometry that is very observable at that instant. After the target manoeuvre, the relative position of the target increases, and this leads to a slight decrease in observability and hence a slight enlargement of the covariance matrix. The mode probability curves for the MMPF show that, unlike the results of the IMM-EKF, the MMPF mode probabilities indicate a higher probability of occurrence of a manoeuvre. The overall result is a much better tracker performance for the same set of measurements.

5.2. Multisensor case

Here we consider a scenario identical to the one considered in Section 5.1, except that an additional static sensor, located at (5 km, −2 km), provides bearing measurements to the ownship at regular time intervals. These measurements, with accuracy σ_θ′ = 2°, arrive at only 3 time epochs, namely, at k = 10, 20, and 30. Figure 5 shows a comparison of the IMM-EKF, IMM-UKF, and MMPF for this case.

[Figure 5: RMS position error versus time for a multisensor case.]

It is seen that the MMPF exhibits excellent performance, with RMS error results very close to the CRLB. The detailed comparison given in Table 2 shows that the MMPF achieves a final RMS error accuracy that is within 8% of the final range. By comparing with the corresponding results for the single-sensor case, we note that the final RMS error is reduced by a factor of 2.5. Interestingly, the IMM-EKF and IMM-UKF performance is very poor, worse than their corresponding performance when no additional measurement is received. Though this may seem counterintuitive, it can be explained as follows. For the given geometry, at the time of the first arrival of the bearing measurement from the second sensor, it is possible that, due to nonlinearities and low observability in the time interval 0–10 minutes, the track estimates and filter-calculated covariance of the IMM-based filters are in error, leading to a large innovation for the second sensor measurement. The inaccurate covariance estimate results in an incorrect filter gain computation for the second sensor measurement. In the update equations of these filters, the large innovation gets weighted by the computed gain, which does not properly reflect the contribution of the new measurement. The consequence of this is filter divergence. It turns out that for the ownship-measurements-only case, even if the track and covariance estimates are in error, the errors introduced in the filter gain computation are not as severe as in the multisensor case. Furthermore, as the uncertainty is mainly along the line of bearing, the innovation for this case is not likely to be very large. Thus the severity of the track and covariance error for this particular scenario is worse for the multisensor case than for the single-sensor case. Similar results have been observed in the context of an air surveillance scenario [12].

5.3. Tracking with hard constraints

In this section, we present the results for the case of bearings-only tracking with hard constraints. The scenario and parameters used for this case are identical to the ones considered in Section 5.1. This time, however, in addition to the available bearing measurements, we also impose hard constraints on the target speed. Specifically, assume that we have prior knowledge that the target speed is in the range 3.5 ≤ s ≤ 4.5 knots. This type of nonstandard information is difficult to incorporate into the standard EKF-based algorithms (such as the IMM-EKF), and so in the comparison below, the IMM-EKF does not utilise the hard constraints. However, the PF-based algorithms, and, in particular, the MMPF and AUX-MMPF, can easily incorporate such nonstandard information according to the technique described in Section 4.7.

Figure 6 shows the RMS error in estimated position for the MMPF that incorporates prior knowledge of the speed constraint (referred to as MMPF-C). The figure also shows the performance curves of the IMM-EKF and the standard MMPF, which do not utilise knowledge of the hard constraints. A detailed numerical comparison is given in Table 3. It can be seen that the MMPF-C achieves an 83% improvement in RTAMS over the IMM-EKF. Also, observe that by incorporating the hard constraints, the MMPF-C achieves a 50% reduction in RTAMS error over the standard MMPF that does not utilise hard constraints, emphasising the significance of this nonstandard information. Incorporating such nonstandard information results in highly non-Gaussian posterior pdfs, which the PF is able to characterise effectively.

6. CONCLUSIONS

This paper presented a comparative study of PF-based trackers against conventional IMM-based routines for the problem of bearings-only tracking of a manoeuvring target. Three separate cases were analysed: the single-sensor case, the multisensor case, and tracking with speed constraints. The results overwhelmingly confirm the superior performance of the PF-based algorithms against the conventional IMM-based

Table 2: Performance comparison for the multisensor case.

Algorithm   RMS error (final) (km)   RMS error (final) (%)   η     RTAMS (km)   Improvement (%)   Divergent tracks
IMM-EKF     5.03                     173                     3     3.16         0                 17
IMM-UKF     3.51                     121                     4     2.32         27                7
MMPF        0.25                     8                       63    0.22         93                1
CRLB        0.15                     5                       100   0.13         96                —

Table 3: Performance comparison for tracking with hard constraints.

Algorithm   RMS error (final) (km)   RMS error (final) (%)   RTAMS (km)   Improvement (%)   Divergent tracks
IMM-EKF     1.37                     47                      1.21         0                 0
MMPF        0.53                     18                      0.44         64                0
MMPF-C      0.12                     4                       0.20         83                0
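The quantities reported in Tables 2 and 3 (final RMS position error, RTAMS, and the divergent-track count) can be computed from a set of Monte Carlo runs along the following lines. This is a hedged sketch, not the authors' code: the divergence test (a threshold on the final position error) and the start index of the time-averaging window are our assumptions.

```python
import numpy as np

def tracking_metrics(est, truth, t_start=0, div_threshold=20.0):
    """Monte Carlo tracking metrics.

    est:   (runs, steps, 2) estimated x-y positions
    truth: (steps, 2) true x-y positions
    t_start: first time index included in the time-averaged error
             (e.g. just after the target manoeuvre)
    div_threshold: a run whose final position error exceeds this
                   value (km) is counted as divergent and excluded
    """
    err = np.linalg.norm(est - truth[None, :, :], axis=2)  # (runs, steps)
    divergent = err[:, -1] > div_threshold
    ok = ~divergent
    # RMS position error at the final time, over non-divergent runs
    rms_final = np.sqrt(np.mean(err[ok, -1] ** 2))
    # root time-averaged mean square (RTAMS) error
    rtams = np.sqrt(np.mean(err[ok, t_start:] ** 2))
    return rms_final, rtams, int(divergent.sum())
```

With the estimates stacked per run, one call yields all three columns of the tables for a given algorithm.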

Figure 6: RMS position error versus time for the case of tracking with speed constraint 3.5 ≤ s ≤ 4.5 knots.

The key strength of the PF, demonstrated in this application, is its flexibility to handle nonstandard information, along with its ability to deal with nonlinear and non-Gaussian models.

APPENDIX

JACOBIANS OF THE MANOEUVRE DYNAMICS

The Jacobians F̃_k^{(r*_{k+1})}, r*_{k+1} ∈ {2, 3}, of the manoeuvre dynamics can be computed by taking the gradients of the respective transitions. Let f_i^{(j)}(·) denote the ith element of the dynamics model function f^{(j)}(·), and let (ẋ_k^t, ẏ_k^t) denote the target velocity vector. Then, by taking the gradients of f^{(j)}(·) for j = 2, 3, the required Jacobians can be computed to give

            | 1  0  ∂f_1^{(j)}/∂ẋ_k  ∂f_1^{(j)}/∂ẏ_k |
F̃_k^{(j)} = | 0  1  ∂f_2^{(j)}/∂ẋ_k  ∂f_2^{(j)}/∂ẏ_k |,   j = 2, 3,   (A.1)
            | 0  0  ∂f_3^{(j)}/∂ẋ_k  ∂f_3^{(j)}/∂ẏ_k |
            | 0  0  ∂f_4^{(j)}/∂ẋ_k  ∂f_4^{(j)}/∂ẏ_k |

where

∂f_1^{(j)}/∂ẋ_k = sin(Ω_k^{(j)} T)/Ω_k^{(j)} + g_1^{(j)}(k) ∂Ω_k^{(j)}/∂ẋ_k,
∂f_1^{(j)}/∂ẏ_k = −(1 − cos(Ω_k^{(j)} T))/Ω_k^{(j)} + g_1^{(j)}(k) ∂Ω_k^{(j)}/∂ẏ_k,
∂f_2^{(j)}/∂ẋ_k = (1 − cos(Ω_k^{(j)} T))/Ω_k^{(j)} + g_2^{(j)}(k) ∂Ω_k^{(j)}/∂ẋ_k,
∂f_2^{(j)}/∂ẏ_k = sin(Ω_k^{(j)} T)/Ω_k^{(j)} + g_2^{(j)}(k) ∂Ω_k^{(j)}/∂ẏ_k,
∂f_3^{(j)}/∂ẋ_k = cos(Ω_k^{(j)} T) + g_3^{(j)}(k) ∂Ω_k^{(j)}/∂ẋ_k,
∂f_3^{(j)}/∂ẏ_k = −sin(Ω_k^{(j)} T) + g_3^{(j)}(k) ∂Ω_k^{(j)}/∂ẏ_k,
∂f_4^{(j)}/∂ẋ_k = sin(Ω_k^{(j)} T) + g_4^{(j)}(k) ∂Ω_k^{(j)}/∂ẋ_k,
∂f_4^{(j)}/∂ẏ_k = cos(Ω_k^{(j)} T) + g_4^{(j)}(k) ∂Ω_k^{(j)}/∂ẏ_k,   (A.2)

with

g_1^{(j)}(k) = T cos(Ω_k^{(j)} T) ẋ_k^t / Ω_k^{(j)} − sin(Ω_k^{(j)} T) ẋ_k^t / (Ω_k^{(j)})² − T sin(Ω_k^{(j)} T) ẏ_k^t / Ω_k^{(j)} + (−1 + cos(Ω_k^{(j)} T)) ẏ_k^t / (Ω_k^{(j)})²,

g_2^{(j)}(k) = T sin(Ω_k^{(j)} T) ẋ_k^t / Ω_k^{(j)} − (1 − cos(Ω_k^{(j)} T)) ẋ_k^t / (Ω_k^{(j)})² + T cos(Ω_k^{(j)} T) ẏ_k^t / Ω_k^{(j)} − sin(Ω_k^{(j)} T) ẏ_k^t / (Ω_k^{(j)})²,

g_3^{(j)}(k) = −sin(Ω_k^{(j)} T) T ẋ_k^t − cos(Ω_k^{(j)} T) T ẏ_k^t,

g_4^{(j)}(k) = cos(Ω_k^{(j)} T) T ẋ_k^t − sin(Ω_k^{(j)} T) T ẏ_k^t,

∂Ω_k^{(j)}/∂ẋ_k = (−1)^{j+1} a_m ẋ_k^t / ((ẋ_k^t)² + (ẏ_k^t)²)^{3/2},

∂Ω_k^{(j)}/∂ẏ_k = (−1)^{j+1} a_m ẏ_k^t / ((ẋ_k^t)² + (ẏ_k^t)²)^{3/2},   (A.3)

for j = 2, 3.

REFERENCES

[1] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems, Artech House, Norwood, Mass, USA, 1999.
[2] J. C. Hassab, Underwater Signal and Data Processing, CRC Press, Boca Raton, Fla, USA, 1989.
[3] S. Nardone, A. Lindgren, and K. Gong, "Fundamental properties and performance of conventional bearings-only target motion analysis," IEEE Trans. Automatic Control, vol. 29, no. 9, pp. 775–787, 1984.
[4] E. Fogel and M. Gavish, "Nth-order dynamics target observability from angle measurements," IEEE Transactions on Aerospace and Electronic Systems, vol. 24, no. 3, pp. 305–308, 1988.
[5] T. L. Song, "Observability of target tracking with bearings-only measurements," IEEE Transactions on Aerospace and Electronic Systems, vol. 32, no. 4, pp. 1468–1472, 1996.
[6] S. S. Blackman and S. H. Roszkowski, "Application of IMM filtering to passive ranging," in Proc. SPIE Signal and Data Processing of Small Targets, vol. 3809 of Proceedings of SPIE, pp. 270–281, Denver, Colo, USA, July 1999.
[7] T. Kirubarajan, Y. Bar-Shalom, and D. Lerro, "Bearings-only tracking of maneuvering targets using a batch-recursive estimator," IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 3, pp. 770–780, 2001.
[8] J.-P. Le Cadre and O. Tremois, "Bearings-only tracking for maneuvering sources," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 1, pp. 179–193, 1998.
[9] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[10] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York, NY, USA, 2001.
[11] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[12] B. Ristic and M. S. Arulampalam, "Tracking a manoeuvring target using angle-only measurements: algorithms and performance," Signal Processing, vol. 83, no. 6, pp. 1223–1238, 2003.
[13] S. McGinnity and G. W. Irwin, "Multiple model bootstrap filter for maneuvering target tracking," IEEE Transactions on Aerospace and Electronic Systems, vol. 36, no. 3, pp. 1006–1012, 2000.
[14] R. Karlsson and N. Bergman, "Auxiliary particle filters for tracking a maneuvering target," in Proc. 39th IEEE Conference on Decision and Control, vol. 4, pp. 3891–3895, Sydney, Australia, December 2000.
[15] S. Julier, J. Uhlmann, and H. F. Durrant-Whyte, "A new method for the nonlinear transformation of means and covariances in filters and estimators," IEEE Trans. Automatic Control, vol. 45, no. 3, pp. 477–482, 2000.
[16] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques and Software, Artech House, Norwood, Mass, USA, 1993.
[17] P. Tichavsky, C. H. Muravchik, and A. Nehorai, "Posterior Cramér-Rao bounds for discrete-time nonlinear filtering," IEEE Trans. Signal Processing, vol. 46, no. 5, pp. 1386–1396, 1998.
[18] A. Doucet, N. J. Gordon, and V. Krishnamurthy, "Particle filters for state estimation of jump Markov linear systems," IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[19] N. Bergman, A. Doucet, and N. Gordon, "Optimal estimation and Cramér-Rao bounds for partial non-Gaussian state space models," Annals of the Institute of Statistical Mathematics, vol. 53, no. 1, pp. 97–112, 2001.
[20] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, NY, USA, 1999.
[21] J. Carpenter, P. Clifford, and P. Fearnhead, "Improved particle filter for nonlinear problems," IEE Proceedings—Radar, Sonar and Navigation, vol. 146, no. 1, pp. 2–7, 1999.

M. Sanjeev Arulampalam received the B.S. degree in mathematics and the B.E. degree with first-class honours in electrical and electronic engineering from the University of Adelaide in 1991 and 1992, respectively. In 1993, he won a Telstra Research Labs Postgraduate Fellowship award to work toward a Ph.D. degree in electrical and electronic engineering at the University of Melbourne, which he completed in 1997. Since 1998, Dr. Arulampalam has been with the Defence Science and Technology Organisation (DSTO), Australia, working on many aspects of target tracking with a particular emphasis on nonlinear/non-Gaussian tracking problems. In March 2000, he won the Anglo-Australian Postdoctoral Research Fellowship, awarded by the Royal Academy of Engineering, London. This postdoctoral research was carried out in the UK, both at the Defence Evaluation and Research Agency (DERA) and at Cambridge University, where he worked on particle filters for nonlinear tracking problems. Currently, he is a Senior Research Scientist in the Submarine Combat Systems Group of Maritime Operations

Division, DSTO, Australia. His research interests include estimation theory, target tracking, and sequential Monte Carlo methods. Dr. Arulampalam coauthored a recent book, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004.

B. Ristic received all his degrees in electrical engineering: a Ph.D. degree from Queensland University of Technology (QUT), Australia, in 1995, an M.S. degree from Belgrade University, Serbia, in 1991, and a B.E. degree from the University of Novi Sad, Serbia, in 1984. He began his career in 1984 at the Vinča Institute, Serbia. From 1989 to 1994, he was with the University of Queensland, Brisbane, and QUT, Australia, doing research related to the automatic placement and routing of integrated circuits and the design and analysis of time-frequency and time-scale distributions. In 1995, he was with GEC Marconi Systems in Sydney, Australia, developing a concept demonstrator for noise cancellation in towed arrays. Since 1996, he has been with the Defence Science and Technology Organisation (DSTO), Australia, where he has been involved in the design and analysis of target tracking and data fusion systems. During 2003/2004, he has been on study leave with Université Libre de Bruxelles (ULB), Belgium, doing research on reasoning under uncertainty. Dr. Ristic coauthored a recent book, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004. During his career, he has published more than 80 technical papers.

N. Gordon obtained a B.S. in mathematics and physics from Nottingham University in 1988 and a Ph.D. degree in statistics from Imperial College, University of London in 1993. He was with the Defence Evaluation and Research Agency (DERA) and QinetiQ in the UK from 1988 to 2002. In 2002, he joined the Tracking and Sensor Fusion Re- search Group at Defence Science and Tech- nology Organisation (DSTO) in Australia. Neil has written approximately 65 articles in peer-reviewed journals and international conferences on tracking and other dynamic state estimation problems. He is the coauthor/coeditor of two books on particle filtering.

T. Mansell received a B.S. degree with first-class honours in physics and electronics from Deakin University in 1989. Following graduation, he joined the Defence Science and Technology Organisation (DSTO) in Melbourne as a Cadet Research Scientist, and in 1990 began a Ph.D. in artificial intelligence with The University of Melbourne. On completion of his Ph.D. in 1994, Dr. Mansell began working on the application of information fusion techniques to naval problems (including mine warfare and combat systems). In 1996, Dr. Mansell undertook a 15-month posting with Canada's Defence Research Establishment Atlantic in the sonar information management group looking at tactical decision-making and naval combat systems. On return to Australia, Dr. Mansell relocated to DSTO, Edinburgh, South Australia, to lead the submarine combat systems task. In 2000, he became Head of the Submarine Combat Systems Group (SMCS) within Maritime Operations Division. In 2003, Dr. Mansell was appointed Head of Maritime Combat Systems and the DSTO lead for the Air Warfare Destroyer, Combat System Studies. Dr. Mansell is currently the Counsellor for Defence Science in the Australian High Commission, London. Dr. Mansell's main research interests are in combat system architectures, open system architectures, information fusion, artificial intelligence, human-machine interaction, and tactical decision support systems.

EURASIP Journal on Applied Signal Processing 2004:15, 2366–2384
© 2004 Hindawi Publishing Corporation

Time-Varying Noise Estimation for Speech Enhancement and Recognition Using Sequential Monte Carlo Method

Kaisheng Yao Institute for Neural Computation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0523, USA Email: [email protected]

Te-Won Lee Institute for Neural Computation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0523, USA Email: [email protected]

Received 4 May 2003; Revised 9 April 2004

We present a method for sequentially estimating time-varying noise parameters. Noise parameters are sequences of time-varying mean vectors representing the noise power in the log-spectral domain. The proposed sequential Monte Carlo method generates a set of particles in compliance with the prior distribution given by clean speech models. The noise parameters in this model evolve according to random walk functions, and the model uses extended Kalman filters to update the weight of each particle as a function of observed noisy speech signals, speech model parameters, and the evolved noise parameters in each particle. Finally, the updated noise parameter is obtained by means of minimum mean square error (MMSE) estimation on these particles. For efficient computation, residual resampling and Metropolis-Hastings smoothing are used. The proposed sequential estimation method is applied to noisy speech recognition and speech enhancement under strongly time-varying noise conditions. In both scenarios, this method outperforms some alternative methods.

Keywords and phrases: sequential Monte Carlo method, speech enhancement, speech recognition, Kalman filter, robust speech recognition.

1. INTRODUCTION

A speech processing system may be required to work in conditions where the speech signals are distorted due to background noise. Those distortions can drastically drop the performance of automatic speech recognition (ASR) systems, which usually perform well in quiet environments. Similarly, speech-coding systems spend much of their coding capacity encoding additional noise information.

There has been great interest in developing algorithms that achieve robustness to those distortions. In general, the proposed methods can be grouped into two approaches. One approach is based on front-end processing of speech signals, for example, speech enhancement. Speech enhancement can be done either in the time domain, for example, in [1, 2], or, more widely used, in the spectral domain [3, 4, 5, 6, 7]. The objective of speech enhancement is to increase the signal-to-noise ratio (SNR) of the processed speech with respect to the observed noisy speech signal.

The second approach is based on statistical models of speech and/or noise. For example, parallel model combination (PMC) [8] adapts speech mean vectors according to the input noise power. In [9], code-dependent cepstral normalization (CDCN) modifies speech signals based on probabilities from speech models. Since methods in this model-based approach are devised in a principled way, for example, by maximum likelihood estimation [9], they usually have better performances than methods in the first approach, particularly in applications such as noisy speech recognition [10].

However, a main shortcoming in some of the methods described above lies in their assumption that the background noise is stationary (noise statistics do not change in a given utterance). Based on this assumption, noise is often estimated from segmented noise-alone slices, for example, by voice-activity detection (VAD) [7]. Such an assumption may not hold in many real applications because the estimated noise may not be pertinent to noise in speech intervals in nonstationary environments.

Recently, methods have been proposed for speech enhancement in nonstationary noise. For example, in [11], a method based on the sequential Monte Carlo method is applied to estimate time-varying coefficients of speech models for speech enhancement. This algorithm is more advanced in its assumption that autocorrelation coefficients of speech models are time varying. In fact, the sequential Monte Carlo method is also applied to estimate noise parameters for robust speech recognition in nonstationary noise [12] through a nonlinear model [8], which was recently found to be effective for speech enhancement [13] as well.

The purpose of this paper is to present a method based on sequential Monte Carlo for estimation of the noise parameter (the time-varying mean vector of a noise model), with its application to speech enhancement and recognition. The method is based on a nonlinear function that models noise effects on speech [8, 12, 13]. The sequential Monte Carlo method generates particles of parameters (including speech and noise parameters) from a prior speech model that has been trained from a clean speech database. These particles approximate the posterior distribution of speech and noise parameter sequences given the observed noisy speech sequence. Minimum mean square error (MMSE) estimation of the noise parameter is obtained from these particles. Once the noise parameter has been estimated, it is used in subtraction-type speech enhancement methods, for example, the Wiener filter and the perceptual filter,(1) and in adaptation of speech mean vectors for speech recognition.

The remainder of the paper is organized as follows. The model specification and estimation objectives for the noise parameters are stated in Section 2. In Section 3, the sequential Monte Carlo method is developed to solve the noise parameter estimation problem. Section 4.3 demonstrates application of this method to speech recognition by modifying speech model parameters. Application to speech enhancement is shown in Section 4.4. Discussions and conclusions are presented in Section 5.

Notation

Sets are denoted as {·, ·}. Vectors and sequences of vectors are denoted by uppercased letters. The time index is in the parenthesis of vectors. For example, a sequence Y(1 : T) = (Y(1) Y(2) ··· Y(T)) consists of vectors Y(t) at times t, where the ith element of Y(t) is y_i(t). The distribution of the vector Y(t) is p(Y(t)). Superscript T denotes transpose. The symbol X (or x) is exclusively used for original speech, and Y (or y) is used for noisy speech in testing environments. N (or n) is used to denote noise. By default, observation (or feature) vectors are in the log-spectral domain. Superscripts lin, l, and c denote the linear spectral domain, the log-spectral domain, and the cepstral domain. The symbol ∗ denotes convolution.

2. PROBLEM DEFINITION

2.1. Model definitions

Consider a clean speech signal x(t) at time t that is corrupted by additive background noise n(t).(2) In the time domain, the received speech signal y(t) can be written as

y(t) = x(t) + n(t).   (1)

Assume that the speech signal x(t) and noise n(t) are uncorrelated. Hence, the power spectrum of the input noisy signal is the summation of the power spectra of the clean speech signal and those of the noise. The output at filter bank j can be described by

y_j^{lin}(t) = Σ_m b(m) |Σ_{l=0}^{L−1} v(l) y(t − l) e^{−j2πlm/L}|²,

summing the power spectra of the windowed signal v(t) ∗ y(t) with length L at each frequency m with binning weight b(m). v(t) is a window function (usually a Hamming window) and b(m) is a triangle window.(3) Similarly, we denote the filter bank outputs for the clean speech signal x(t) and noise n(t) as x_j^{lin}(t) and n_j^{lin}(t) for the jth filter bank, respectively. They are related as

y_j^{lin}(t) = x_j^{lin}(t) + n_j^{lin}(t),   (2)

where j is from 1 to J, and J is the number of filter banks.

The filter bank output exhibits a large variance. In order to achieve an accurate estimate, in some applications, for example, speech recognition, logarithmic compression of y_j^{lin}(t) is used instead. The corresponding compressed power spectrum is called log-spectral power, which has the following relationship (derived in Appendix A) with the noisy signal, clean speech signal, and noise:

y_j^l(t) = x_j^l(t) + log(1 + exp(n_j^l(t) − x_j^l(t))).   (3)

The function is plotted in Figure 1. We observe that this function is convex and continuous. For noise log-spectral power n_j^l(t) that is much smaller than the clean speech log-spectral power x_j^l(t), the function outputs x_j^l(t). This shows that the function is not "sensitive" to noise log-spectral power that is much smaller than the clean speech log-spectral power.(4)

We consider the vector of clean speech log-spectral power X^l(t) = (x_1^l(t), ..., x_J^l(t))^T. Suppose that the statistics of the log-spectral power sequence X^l(1 : T) can be modeled by a hidden Markov model (HMM) with the output density at each state s_t (1 ≤ s_t ≤ S) represented by a mixture of Gaussians Σ_{k_t=1}^{M} π_{s_t k_t} N(X^l(t); μ^l_{s_t k_t}, Σ^l_{s_t k_t}), where M denotes the number of Gaussian densities in each state.

Footnotes:
(1) A model for frequency masking [14, 15] is applied.
(2) Channel distortion and reverberation are not considered in this paper. In this paper, x(t) can be considered as a speech signal received by a close-talking microphone, and n(t) is the background noise picked up by the microphone.
(3) In Mel-scaled filter bank analysis [16], b(m) is a triangle window centered in the Mel scale.
(4) We will discuss later in Sections 3.5 and 4.2 that such a property may result in a larger-than-necessary estimate of the noise log-spectral power.
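Equation (3) is simply log(e^x + e^n) rewritten in mismatch form, which is why it is insensitive to noise far below the speech level. A small NumPy sketch (our illustration, not the authors' code) makes this explicit:

```python
import numpy as np

def log_spectral_mismatch(x_log, n_log):
    """Eq. (3): y = x + log(1 + exp(n - x)) = log(exp(x) + exp(n)).
    np.logaddexp evaluates exactly this, in a numerically stable way."""
    return np.logaddexp(x_log, n_log)

# noise far below speech: the output is essentially the clean speech power
print(log_spectral_mismatch(1.0, -10.0))   # ~1.0000167
# equal powers: the output rises by log 2
print(log_spectral_mismatch(1.0, 1.0))     # = 1 + log 2
```

The second case shows the largest relative effect of the noise; as n_log decreases, the output converges to x_log, matching the "not sensitive" property noted above.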

To model the statistics of the noise log-spectral power N^l(1 : T), we use a single Gaussian density with a time-varying mean vector μ_n^l(t) and a constant diagonal variance matrix V_n^l.

Figure 1: Plot of the function y_j^l(t) = x_j^l(t) + log(1 + exp(n_j^l(t) − x_j^l(t))), with x_j^l(t) = 1.0 and n_j^l(t) ranging from −10.0 to 10.0.

With the above-defined statistical models, we may plot the dependence among their parameters and the observation sequence Y^l(1 : t) by a graphical model [17] in Figure 2. In this figure, the rectangular boxes correspond to discrete state/mixture indexes, and the round circles correspond to continuous-valued vectors. Shaded circles denote the observed noisy speech log-spectral power.

The state s_t ∈ {1, ..., S} gives the current state index at frame t. The state sequence is Markovian with state transition probability p(s_t | s_{t−1}) = a_{s_{t−1} s_t}. At state s_t, an index k_t ∈ {1, ..., M} assigns a Gaussian density N(·; μ^l_{s_t k_t}, Σ^l_{s_t k_t}) with prior probability p(k_t | s_t) = π_{s_t k_t}. The speech parameter μ^l_{s_t k_t}(t) is thus Gaussian distributed given s_t and k_t; that is,

s_t ∼ p(s_t | s_{t−1}) = a_{s_{t−1} s_t},   (4)

k_t ∼ p(k_t | s_t) = π_{s_t k_t},   (5)

μ^l_{s_t k_t}(t) ∼ N(·; μ^l_{s_t k_t}, Σ^l_{s_t k_t}).   (6)

Assuming that the variances of X^l(t) and N^l(t) are very small (as done in [8]) for each filter bank j, given s_t and k_t, we may relate the observed signal Y^l(t) to the speech mean vector μ^l_{s_t k_t}(t) and the time-varying noise mean vector μ_n^l(t) with the function

Y^l(t) = μ^l_{s_t k_t}(t) + log(1 + exp(μ_n^l(t) − μ^l_{s_t k_t}(t))) + w_{s_t k_t}(t),   (7)

where w_{s_t k_t}(t) is distributed as N(·; 0, Σ^l_{s_t k_t}), representing the possible modeling error and measurement noise in the above equation.

Furthermore, to model time-varying noise statistics, we assume that the noise parameter μ_n^l(t) follows a random walk function; that is,

μ_n^l(t) ∼ p(μ_n^l(t) | μ_n^l(t − 1)) = N(μ_n^l(t); μ_n^l(t − 1), V_n^l).   (8)

We collectively denote the parameters {s_t, k_t, μ^l_{s_t k_t}(t), μ_n^l(t); μ^l_{s_t k_t}(t) ∈ R^J, 1 ≤ s_t ≤ S, 1 ≤ k_t ≤ M, μ_n^l(t) ∈ R^J} as θ(t). It is clearly seen from (4)–(8) that they have the following prior distribution and likelihood at each time t:

p(θ(t) | θ(t − 1)) = a_{s_{t−1} s_t} π_{s_t k_t} N(μ^l_{s_t k_t}(t); μ^l_{s_t k_t}, Σ^l_{s_t k_t}) N(μ_n^l(t); μ_n^l(t − 1), V_n^l),   (9)

p(Y^l(t) | θ(t)) = N(Y^l(t); μ^l_{s_t k_t}(t) + log(1 + exp(μ_n^l(t) − μ^l_{s_t k_t}(t))), Σ^l_{s_t k_t}).   (10)

Remark 1. In comparison with the traditional HMM, the new model shown in Figure 2 may provide more robustness to contaminating noise, because it includes explicit modeling of the time-varying noise parameters. However, probabilistic inference in the new model can no longer be done by the efficient Viterbi algorithm [18].

2.2. Estimation objective

The objective of this method is to estimate, up to time t, a sequence of noise parameters μ_n^l(1 : t) given the observed noisy speech log-spectral sequence Y^l(1 : t) and the above-defined graphical model, in which the speech models are trained from clean speech signals. Formally, μ_n^l(1 : t) is calculated by the MMSE estimation

μ̂_n^l(1 : t) = ∫ μ_n^l(1 : t) p(μ_n^l(1 : t) | Y^l(1 : t)) dμ_n^l(1 : t),   (11)

where p(μ_n^l(1 : t) | Y^l(1 : t)) is the posterior distribution of μ_n^l(1 : t) given Y^l(1 : t).

Based on the graphical model shown in Figure 2, Bayesian estimation of the time-varying noise parameter μ_n^l(1 : t) involves construction of a likelihood function of the observation sequence Y^l(1 : t) given the parameter sequence Θ(1 : t) = (θ(1), ..., θ(t)) and a prior probability p(Θ(1 : t)) for t = 1, ..., T. The posterior distribution of Θ(1 : t) given the observation sequence Y^l(1 : t) is

p(Θ(1 : t) | Y^l(1 : t)) ∝ p(Y^l(1 : t) | Θ(1 : t)) p(Θ(1 : t)).   (12)

Figure 2: The graphical model representation of the dependence of the speech and noise model parameters. s_t and k_t denote the state and Gaussian mixture at frame t in the speech model. μ^l_{s_t k_t}(t) and μ_n^l(t) denote the speech and noise parameters. Y^l(t) is the observed noisy speech signal at frame t.

Due to the Markovian property shown in (9) and (10), the above posterior distribution can be written as

p(Θ(1 : t) | Y^l(1 : t)) ∝ [∏_{τ=2}^{t} p(Y^l(τ) | θ(τ)) p(θ(τ) | θ(τ − 1))] p(Y^l(1) | θ(1)) p(θ(1)).   (13)

Based on this posterior distribution, the MMSE estimation in (11) can be achieved by

μ̂_n^l(1 : t) = Σ_{s_{1:t}, k_{1:t}} ∫∫ μ_n^l(1 : t) p(Θ(1 : t) | Y^l(1 : t)) dμ^l_{s_{1:t} k_{1:t}}(1 : t) dμ_n^l(1 : t).   (14)

Note that there are difficulties in evaluating this MMSE estimate. The first relates to the nonlinear function in (10), and the second arises from the unseen state sequence s_{1:t} and mixture sequence k_{1:t}. These unseen sequences, together with the nodes {μ^l_{s_t k_t}(t)}, {Y^l(t)}, and {μ_n^l(t)}, form loops in the graphical model. These loops in Figure 2 make exact inference on the posterior probabilities of the unseen sequences s_{1:t} and k_{1:t} computationally intractable. In the following section, we devise a sequential Monte Carlo method to tackle these problems.

3. SEQUENTIAL MONTE CARLO METHOD FOR NOISE PARAMETER ESTIMATION

This section presents a sequential Monte Carlo method for estimating noise parameters from observed noisy signals and pretrained clean speech models. The method applies sequential Bayesian importance sampling (BIS) in order to generate particles of speech and noise parameters from a proposal distribution. These particles are selected according to their weights, calculated as a function of their likelihood. It should be noted that the application here is one particular case of a more general sequential BIS method [19, 20].

3.1. Importance sampling

Suppose that there are N particles {Θ^{(i)}(1 : t); i = 1, ..., N}. Each particle is denoted as

Θ^{(i)}(1 : t) = (s^{(i)}_{1:t}, k^{(i)}_{1:t}, μ^{l(i)}_{s^{(i)}_{1:t} k^{(i)}_{1:t}}(1 : t), μ_n^{l(i)}(1 : t)).   (15)

These particles are generated according to p(Θ(1 : t) | Y^l(1 : t)). Then, these particles form an empirical distribution of Θ(1 : t), given by

p̄_N(Θ(1 : t) | Y^l(1 : t)) = (1/N) Σ_{i=1}^{N} δ_{Θ^{(i)}(1:t)}(dΘ(1 : t)),   (16)

where δ_x(·) is the Dirac delta measure concentrated on x.
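When the particles cannot be drawn from the target itself, the empirical-distribution idea in (15)–(16) carries over to weighted samples from an easier proposal q. A minimal self-normalized importance-sampling sketch (our illustration, with a toy Gaussian target and proposal) is:

```python
import numpy as np

def self_normalized_is(f_vals, log_p, log_q):
    """Estimate E_p[f] from samples drawn from q.
    Weights w = p/q are computed in the log domain and normalized,
    so unnormalized densities are sufficient."""
    log_w = log_p - log_q
    log_w -= log_w.max()              # guard against overflow
    w = np.exp(log_w)
    return np.sum(f_vals * w / w.sum())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=200_000)        # proposal q = N(0, 4)
log_q = -0.5 * (x / 2.0) ** 2                 # constants cancel after normalization
log_p = -0.5 * (x - 1.0) ** 2                 # target p = N(1, 1)
est = self_normalized_is(x, log_p, log_q)     # should be close to E_p[x] = 1
```

Because the weights are normalized, p and q only need to be known up to constants, which is exactly the situation with the posterior in (13).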

Using this distribution, an estimate of a function f̄_Θ(1:t) of the parameters of interest can be obtained by

    \bar{f}_\Theta(1:t) = \int f_\Theta(1:t)\, \bar{p}_N\big(\Theta(1:t) \mid Y^l(1:t)\big)\, d\Theta(1:t) = \frac{1}{N} \sum_{i=1}^{N} f^{(i)}_\Theta(1:t),    (17)

where, for example, the function f_Θ(1:t) is Θ(1:t) and f^{(i)}_Θ(1:t) = Θ^{(i)}(1:t) if f̄_Θ(1:t) is used for estimating the posterior mean of Θ(1:t). As the number of particles N goes to infinity, this estimate approaches the true estimate under mild conditions [21].

It is common to encounter the situation that the posterior distribution p(Θ(1:t) | Y^l(1:t)) cannot be sampled directly. Alternatively, the importance sampling (IS) method [22] implements the empirical estimate in (17) by sampling from an easier distribution q(Θ(1:t) | Y^l(1:t)), whose support includes that of p(Θ(1:t) | Y^l(1:t)); that is,

    \bar{f}_\Theta(1:t) = \int f_\Theta(1:t)\, \frac{p(\Theta(1:t) \mid Y^l(1:t))}{q(\Theta(1:t) \mid Y^l(1:t))}\, q\big(\Theta(1:t) \mid Y^l(1:t)\big)\, d\Theta(1:t) \approx \frac{\sum_{i=1}^{N} f^{(i)}_\Theta(1:t)\, w^{(i)}(1:t)}{\sum_{i=1}^{N} w^{(i)}(1:t)},    (18)

where Θ^{(i)}(1:t) is sampled from the distribution q(Θ(1:t) | Y^l(1:t)), and each particle (i) has a weight given by

    w^{(i)}(1:t) = \frac{p\big(\Theta^{(i)}(1:t) \mid Y^l(1:t)\big)}{q\big(\Theta^{(i)}(1:t) \mid Y^l(1:t)\big)}.    (19)

Equation (18) can be written as

    \bar{f}_\Theta(1:t) = \sum_{i=1}^{N} f^{(i)}_\Theta(1:t)\, \tilde{w}^{(i)}(1:t),    (20)

where the normalized weight is given as \tilde{w}^{(i)}(1:t) = w^{(i)}(1:t) / \sum_{j=1}^{N} w^{(j)}(1:t).

3.2. Sequential Bayesian importance sampling

Making use of the Markovian property in (13), we have the following sequential BIS method to approximate the posterior distribution p(Θ(1:t) | Y^l(1:t)). Basically, given an estimate of the posterior distribution at the previous time t − 1, the method updates the estimate of p(Θ(1:t) | Y^l(1:t)) by combining a prediction step from a proposal sampling distribution in (24) and (25) and a sampling-weight updating step in (26).

Suppose that a sequence of parameters Θ̂(1:t−1) up to the previous time t − 1 is given. By the Markovian property in (13), the posterior distribution of Θ(1:t) = (Θ̂(1:t−1) θ(t)) given Y^l(1:t) can be written as

    p\big(\Theta(1:t) \mid Y^l(1:t)\big) \propto p\big(Y^l(t) \mid \theta(t)\big)\, p\big(\theta(t) \mid \hat\theta(t-1)\big) \prod_{\tau=2}^{t-1} p\big(Y^l(\tau) \mid \hat\theta(\tau)\big)\, p\big(\hat\theta(\tau) \mid \hat\theta(\tau-1)\big) \cdot p\big(Y^l(1) \mid \hat\theta(1)\big)\, p\big(\hat\theta(1)\big).    (21)

We assume that the proposal distribution is in fact given as

    q\big(\Theta(1:t) \mid Y^l(1:t)\big) = q\big(Y^l(t) \mid \theta(t)\big)\, q\big(\theta(t) \mid \hat\theta(t-1)\big) \prod_{\tau=2}^{t-1} q\big(\hat\theta(\tau) \mid \hat\theta(\tau-1)\big)\, q\big(Y^l(\tau) \mid \hat\theta(\tau)\big) \cdot q\big(Y^l(1) \mid \hat\theta(1)\big)\, q\big(\hat\theta(1)\big).    (22)

Plugging (21) and (22) into (19), we can update the weight recursively; that is,

    w^{(i)}(1:t) = \frac{p(Y^l(t) \mid \theta^{(i)}(t))\, p(\theta^{(i)}(t) \mid \hat\theta^{(i)}(t-1))}{q(Y^l(t) \mid \theta^{(i)}(t))\, q(\theta^{(i)}(t) \mid \hat\theta^{(i)}(t-1))} \cdot \frac{\prod_{\tau=2}^{t-1} p(\hat\theta^{(i)}(\tau) \mid \hat\theta^{(i)}(\tau-1))\, p(Y^l(\tau) \mid \hat\theta^{(i)}(\tau))}{\prod_{\tau=2}^{t-1} q(\hat\theta^{(i)}(\tau) \mid \hat\theta^{(i)}(\tau-1))\, q(Y^l(\tau) \mid \hat\theta^{(i)}(\tau))} \cdot \frac{p(Y^l(1) \mid \hat\theta^{(i)}(1))\, p(\hat\theta^{(i)}(1))}{q(Y^l(1) \mid \hat\theta^{(i)}(1))\, q(\hat\theta^{(i)}(1))}
                 = w^{(i)}(1:t-1)\, \frac{p(Y^l(t) \mid \theta^{(i)}(t))\, p(\theta^{(i)}(t) \mid \hat\theta^{(i)}(t-1))}{q(Y^l(t) \mid \theta^{(i)}(t))\, q(\theta^{(i)}(t) \mid \hat\theta^{(i)}(t-1))}.    (23)

Such a time-recursive evaluation of the weights can be further simplified by choosing the proposal distribution to be the prior distribution of the parameters. In this paper, the proposal distribution is given as

    q\big(Y^l(t) \mid \theta^{(i)}(t)\big) = 1,    (24)
    q\big(\theta^{(i)}(t) \mid \hat\theta^{(i)}(t-1)\big) = a_{s^{(i)}_{t-1} s^{(i)}_t}\, \pi_{s^{(i)}_t k^{(i)}_t}\, \mathcal{N}\big(\mu^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t);\ \mu^l_{s^{(i)}_t k^{(i)}_t},\ \Sigma^l_{s^{(i)}_t k^{(i)}_t}\big).    (25)

Consequently, the weight is updated by

    w^{(i)}(t) \propto w^{(i)}(t-1)\, p\big(Y^l(t) \mid \theta^{(i)}(t)\big)\, p\big(\mu^{l(i)}_n(t) \mid \hat\mu^{l(i)}_n(t-1)\big).    (26)

Remark 2. Given Θ̂(1:t−1), there is an optimal proposal distribution that minimizes the variance of the importance weights. This optimal proposal distribution is in fact the posterior distribution p(θ(t) | Θ̂(1:t−1), Y^l(1:t)) [23, 24].
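The recursion (23)-(26) can be sketched numerically: with the prior taken as the proposal, as in (24)-(25), the incremental weight collapses to the observation likelihood. The model below (a 1-D Gaussian random walk with Gaussian observations) is a made-up stand-in for the paper's speech/noise model, used only to show the weight bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 5000   # number of particles
V = 0.1    # state (random-walk) variance
R = 0.5    # observation noise variance
T = 30     # time steps

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Simulate hidden states and observations from the same model.
truth = np.cumsum(rng.normal(0.0, np.sqrt(V), T))
ys = truth + rng.normal(0.0, np.sqrt(R), T)

theta = rng.normal(0.0, 1.0, N)   # initial particles
w = np.full(N, 1.0 / N)           # initial weights

for y in ys:
    theta = theta + rng.normal(0.0, np.sqrt(V), N)  # sample from the prior, cf. (25)
    w = w * gauss_pdf(y, theta, R)                  # weight update, cf. (26)
    w = w / w.sum()                                 # normalization, cf. (20)

est = np.sum(w * theta)   # posterior-mean estimate, cf. (17)
```

Without a selection step the weights degenerate over time, which is exactly the motivation for the resampling discussed in Section 3.4.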

3.3. Rao-Blackwellization and the extended Kalman filter

Note that μ^{l(i)}_n(t) in particle (i) is assumed to be distributed as N(μ^{l(i)}_n(t); μ^{l(i)}_n(t−1), V^l_n). By the Rao-Blackwell theorem [25], the variance of the weight in (26) can be reduced by marginalizing out μ^{l(i)}_n(t). Therefore, we have

    w^{(i)}(t) \propto w^{(i)}(t-1) \int_{\mu^{l(i)}_n(t)} p\big(Y^l(t) \mid \theta^{(i)}(t)\big)\, p\big(\mu^{l(i)}_n(t) \mid \hat\mu^{l(i)}_n(t-1)\big)\, d\mu^{l(i)}_n(t).    (27)

Referring to (9) and (10), we notice that the integrand p(Y^l(t) | θ^{(i)}(t)) p(μ^{l(i)}_n(t) | μ̂^{l(i)}_n(t−1)) is a state-space model given by (7) and (8). In this state-space model, given s^{(i)}_t, k^{(i)}_t, and μ^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t), the hidden continuous-valued vector μ^{l(i)}_n(t) is distributed as N(μ^{l(i)}_n(t); μ̂^{l(i)}_n(t−1), V^l_n), and Y^l(t) is the observed signal of this model. The integral in (27) can be obtained analytically if we linearize (7) with respect to μ^{l(i)}_n(t). The linearized state-space model provides an extended Kalman filter (EKF) (see Appendix B for the details of the EKF), and the integral is p(Y^l(t) | s^{(i)}_t, k^{(i)}_t, μ^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t), μ̂^{l(i)}_n(t−1), Y^l(t−1)), which is the predictive likelihood shown in (B.1). An advantage of updating the weight by (27) is its simplicity of implementation.

Because the predictive likelihood is obtained from an EKF, the weight w^{(i)}(t) may not asymptotically approach the target posterior distribution. One way to achieve the target posterior distribution asymptotically is the method called the extended Kalman particle filter in [26], where the weight is updated by

    w^{(i)}(t) \propto w^{(i)}(t-1)\, \frac{p\big(Y^l(t) \mid \theta^{(i)}(t)\big)\, p\big(\mu^{l(i)}_n(t) \mid \hat\mu^{l(i)}_n(t-1)\big)}{q\big(\mu^{l(i)}_n(t) \mid \hat\mu^{l(i)}_n(t-1), s^{(i)}_t, k^{(i)}_t, \mu^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t), Y^l(t)\big)},    (28)

and the proposal distribution for μ^{l(i)}_n(t) is the posterior distribution of μ^{l(i)}_n(t) given by the EKF; that is,

    q\big(\mu^{l(i)}_n(t) \mid \hat\mu^{l(i)}_n(t-1), s^{(i)}_t, k^{(i)}_t, \mu^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t), Y^l(t)\big) = \mathcal{N}\big(\mu^{l(i)}_n(t);\ \mu^{l(i)}_n(t-1) + G^{(i)}(t)\,\alpha^{(i)}(t-1),\ K^{(i)}(t)\big),    (29)

where the Kalman gain G^{(i)}(t), the innovation vector α^{(i)}(t−1), and the posterior variance K^{(i)}(t) are given in (B.7), (B.2), and (B.4), respectively.

However, for the following reasons, we did not apply the stricter extended Kalman particle filter to our problem. First, the scheme in (28) is not Rao-Blackwellized; the variance of the sampling weights might be larger than with the Rao-Blackwellized method in (27). Second, although the observation function (7) is nonlinear, it is convex and continuous. Therefore, linearization of (7) with respect to μ^l_n(t) may not affect the mode of the posterior distribution p(μ^l_n(1:t) | Y^l(1:t)). By asymptotic theory (see [25, page 430]), under the mild condition that the variance of the noise N^l(t) (parameterized by V^l_n) is finite, the bias of estimating μ̂^l_n(t) by MMSE estimation via (17) with the weight given by (27) may be reduced as the number of particles N grows large. (However, unbiasedness of the estimate μ̂^l_n(t) may not be established, since there are zero derivatives with respect to the parameter μ^l_n(t) in (7).) Third, evaluation of (28) is computationally more expensive than (27), because (28) involves calculations on two state-space models. We will show some experiments in Section 4.1 to support the above considerations.

Remark 3. Working in the linear spectral domain in (2) for noise estimation does not require an EKF. Thus, if the noise parameter in Θ(t) and the observations are both in the linear spectral domain, the corresponding sequential BIS can asymptotically achieve the target posterior distribution (12). In practice, however, due to the large variance in the linear spectral domain, we may frequently encounter numerical problems that make it difficult to build an accurate statistical model for both clean speech and noise. Compressing linear spectral power into the log-spectral domain is commonly used in speech recognition to achieve more accurate models. Furthermore, because the performance obtained by adapting acoustic models (modifying the mean and variance of acoustic models) is usually higher than that obtained by enhancing noisy speech signals for noisy speech recognition [10], in the context of speech recognition it is beneficial to devise an algorithm that works in the domain used for building acoustic models. In our examples, acoustic models are trained from cepstral or log-spectral features; thus, the parameter estimation algorithm is devised in the log-spectral domain, which is linearly related to the cepstral domain. We will show later that the estimated noise parameter μ̂^l_n(t) substitutes μ̂^l_n in a log-add method (36) to adapt acoustic model mean vectors. Thus, to avoid inconsistency due to transformations between different domains, the noise parameter may be estimated in the log-spectral domain instead of the linear spectral domain.

3.4. Avoiding degeneracy by resampling

Since the above particles are discrete approximations of the posterior distribution p(Θ(1:t) | Y^l(1:t)), in practice, after several steps of sequential BIS, the weights of some (though not all) particles may become insignificant. This could cause a large variance in the estimate. In addition, it is not necessary to compute particles with insignificant weights. Selection of the particles is thus necessary to reduce the variance and to make efficient use of computational resources.

Many methods for selecting particles have been proposed, including sampling-importance resampling (SIR) [27], residual resampling [28], and so forth. We apply residual resampling for its computational simplicity. This method basically avoids degeneracy by discarding those particles with insignificant weights and, in order to keep the number of particles constant, duplicating particles with significant weights. The steps are as follows. First, set Ñ^{(i)} = ⌊N w̃^{(i)}(1:t)⌋. Second, select the remaining N̄ = N − Σ_{i=1}^{N} Ñ^{(i)} particles with new weights ẃ^{(i)}(1:t) = N̄^{-1}(w̃^{(i)}(1:t) N − Ñ^{(i)}), and obtain these particles by sampling from the distribution defined by the new weights. Finally, add these particles to those obtained in the first step. After this residual resampling step, the weight of each particle is 1/N. Besides computational simplicity, residual resampling is known to have smaller variance, N ẃ^{(i)}(1:t)(1 − ẃ^{(i)}(1:t)), compared to that of SIR, which is N w̃^{(i)}(1:t)(1 − w̃^{(i)}(1:t)). We denote the particles after the selection step as {Θ̃^{(i)}(1:t); i = 1, ..., N}.

After the selection step, the discrete nature of the approximation may lead to large bias/variance, the extreme case being that all particles share the same estimated parameters. Therefore, it is necessary to introduce a smoothing step to avoid such degeneracy. We apply a Metropolis-Hastings smoothing [19] step to each particle by sampling a candidate parameter θ′(t), given the currently estimated parameter, according to the proposal distribution q(θ′(t) | θ̃^{(i)}(t)). For each particle, a value is calculated as

    g^{(i)}(t) = g_1^{(i)}(t)\, g_2^{(i)}(t),    (30)

where g_1^{(i)}(t) = p((Θ̃^{(i)}(t−1) θ′(t)) | Y^l(1:t)) / p(Θ̃^{(i)}(1:t) | Y^l(1:t)) and g_2^{(i)}(t) = q(θ̃^{(i)}(t) | θ′(t)) / q(θ′(t) | θ̃^{(i)}(t)). With acceptance probability min{1, g^{(i)}(t)}, the Markov chain moves to the new parameter θ′(t); otherwise, it remains at the original parameter.

To simplify calculations, we assume that the proposal distribution q(θ′(t) | θ̃^{(i)}(t)) is symmetric.^5 Note that p(Θ̃^{(i)}(1:t) | Y^l(1:t)) is proportional to w̃^{(i)}(1:t) up to a scalar factor. With (27), (B.1), and w̃^{(i)}(1:t−1) = 1/N, we can obtain the acceptance probability as

    \min\left\{1,\ \frac{p\big(Y^l(t) \mid s'^{(i)}_t, k'^{(i)}_t, \mu'^{l(i)}_{s'_t k'_t}(t), \hat\mu^{l(i)}_n(t-1), Y^l(t-1)\big)}{p\big(Y^l(t) \mid \tilde{s}^{(i)}_t, \tilde{k}^{(i)}_t, \tilde\mu^{l(i)}_{\tilde{s}_t \tilde{k}_t}(t), \hat\mu^{l(i)}_n(t-1), Y^l(t-1)\big)}\right\}.    (31)

Denote the obtained particles hereafter as {Θ̂^{(i)}(1:t); i = 1, ..., N} with equal weights.

^5 Generating θ′(t) involves sampling the speech state s′_t from s̃_{1:t} according to a first-order Markovian transition probability p(s′_t | s̃^{(i)}_t) in the graphical model in Figure 2. Usually, this transition probability matrix is not symmetric; that is, p(s′_t | s̃^{(i)}_t) ≠ p(s̃^{(i)}_t | s′_t). Our assumption of a symmetric proposal distribution q(θ′(t) | θ̃^{(i)}(t)) is for simplicity in calculating the acceptance probability.

3.5. Noise parameter estimation via the sequential Monte Carlo method

Following the above considerations, we present the implemented algorithm for noise parameter estimation. Given that, at time t − 1, N particles {Θ̂^{(i)}(1:t−1); i = 1, ..., N} are distributed approximately according to p(Θ(1:t−1) | Y^l(1:t−1)), the sequential Monte Carlo method proceeds as follows at time t.

Algorithm 1.

Bayesian importance sampling step.
(1) Sampling. For i = 1, ..., N, sample a proposal Θ̂^{(i)}(1:t) = (Θ̂^{(i)}(1:t−1) θ̂^{(i)}(t)) by
    (a) sampling ŝ^{(i)}_t ∼ a_{s^{(i)}_{t−1} s_t};
    (b) sampling k̂^{(i)}_t ∼ π_{ŝ^{(i)}_t k_t};
    (c) sampling μ̂^{l(i)}_{ŝ^{(i)}_t k̂^{(i)}_t}(t) ∼ N(μ^l_{ŝ^{(i)}_t k̂^{(i)}_t}(t); μ^l_{ŝ^{(i)}_t k̂^{(i)}_t}, Σ^l_{ŝ^{(i)}_t k̂^{(i)}_t}).
(2) Extended Kalman prediction. For i = 1, ..., N, evaluate (B.2)-(B.7) for each particle by EKFs. Predict the noise parameter for each particle by

    \hat\mu^{l(i)}_n(t) = \hat\mu^{l(i)}_n(t \mid t-1),    (32)

where μ̂^{l(i)}_n(t | t−1) is given in (B.3).
(3) Weighting. For i = 1, ..., N, evaluate the weight of each particle Θ̂^{(i)} by

    \hat{w}^{(i)}(1:t) \propto \hat{w}^{(i)}(1:t-1)\, p\big(Y^l(t) \mid \hat{s}^{(i)}_t, \hat{k}^{(i)}_t, \hat\mu^{l(i)}_{\hat{s}^{(i)}_t \hat{k}^{(i)}_t}(t), \hat\mu^{l(i)}_n(t-1), Y^l(t-1)\big),    (33)

where the second term on the right-hand side is the predictive likelihood, given in (B.1), of the EKF.
(4) Normalization. For i = 1, ..., N, the weight of the ith particle is normalized by

    \tilde{w}^{(i)}(1:t) = \hat{w}^{(i)}(1:t) \Big/ \sum_{i=1}^{N} \hat{w}^{(i)}(1:t).    (34)

Resampling step.
(1) Selection. Use residual resampling to select particles with larger normalized weights and discard those with insignificant weights. Duplicate particles with large weights in order to keep the number of particles at N. Denote the set of particles after the selection step as {Θ̃^{(i)}(1:t); i = 1, ..., N}. These particles have equal weights w̃^{(i)}(1:t) = 1/N.
(2) Metropolis-Hastings smoothing. For i = 1, ..., N, sample Θ′^{(i)}(1:t) = (Θ̃^{(i)}(1:t−1) θ′(t)) by running step (1) to step (3) of the Bayesian importance sampling step with starting parameters given by Θ̃^{(i)}(1:t). For i = 1, ..., N, compute the acceptance probability (31), and accept Θ′^{(i)}(1:t) (i.e., substitute Θ̃^{(i)}(1:t) by Θ′^{(i)}(1:t)) if it exceeds a draw r^{(i)}(t) ∼ U(0,1). The particles after this step are {Θ̂^{(i)}(1:t); i = 1, ..., N} with equal weights ŵ^{(i)}(1:t) = 1/N.
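The residual resampling selection used above can be sketched as follows (NumPy; a minimal illustration of the steps described in Section 3.4, not the authors' implementation):

```python
import numpy as np

def residual_resample(weights, rng):
    """Residual resampling: keep floor(N * w_i) copies of particle i
    deterministically, then draw the remaining particles from the
    residual weights.  Returns indices into the particle array;
    afterwards all particles carry weight 1/N.
    """
    N = len(weights)
    counts = np.floor(N * weights).astype(int)   # deterministic copies
    residual = N * weights - counts              # leftover mass per particle
    n_left = N - counts.sum()                    # particles still to draw
    if n_left > 0:
        residual = residual / residual.sum()
        extra = rng.choice(N, size=n_left, p=residual)
        counts += np.bincount(extra, minlength=N)
    return np.repeat(np.arange(N), counts)

rng = np.random.default_rng(2)
w = np.array([0.5, 0.3, 0.15, 0.05])
idx = residual_resample(w, rng)
print(idx)  # 4 indices; particle 0 appears at least twice, particle 1 at least once
```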

Table 1: State estimation experiment results. The results show the mean and variance of the mean squared error (MSE) calculated over 100 independent runs.

Algorithm                          | MSE mean | MSE variance | Averaged execution time (s)
Particle filter                    | 8.713    | 49.012       | 5.338
Extended Kalman particle filter    | 6.496    | 34.899       | 13.439
Rao-Blackwellized particle filter  | 4.559    | 8.096        | 6.810

Noise parameter estimation step.
(1) Noise parameter estimation. With the above generated particles at each time t, an estimate of the noise parameter μ^l_n(t) may be acquired by MMSE. Since each particle has the same weight, the MMSE estimate of μ̂^l_n(t) can be easily carried out as

    \hat\mu^l_n(t) = \frac{1}{N} \sum_{i=1}^{N} \hat\mu^{l(i)}_n(t).    (35)

The computational complexity of the algorithm at each time t is O(2N) and is roughly equivalent to 2N EKFs. These steps are highly parallel and, if resources permit, can be implemented in a parallel way. Since the sampling is based on BIS, the storage required for the calculation does not change over time. Thus the computation is efficient and fast.

Note that the estimated μ̂^l_n(t) may be biased from the true physical mean vector of the log-spectral noise power N^l(t), because the function plotted in Figure 1 has zero derivative with respect to n^l_j(t) in regions where n^l_j(t) is much smaller than x^l_j(t). For those μ̂^{l(i)}_n(t) that are initialized with values larger than the speech mean vector μ^{l(i)}_{s^{(i)}_t k^{(i)}_t}, updating by the EKF may be lower bounded around the speech mean vector. As a result, the updated μ̂^l_n(t) = (1/N) Σ_{i=1}^{N} μ̂^{l(i)}_n(t) may not be the true noise log-spectral power.

Remark 4. The above problem, however, may not hurt a model-based noisy speech recognition system, since it is the modified likelihood in (10) that is used to decode the speech signals.^6 But in a speech enhancement system, the noisy speech spectrum is processed directly with the estimated noise parameter. Therefore, biased estimation of the noise parameter may hurt performance more noticeably than in a speech recognition system.

4. EXPERIMENTS

We first conducted synthetic experiments in Section 4.1 to compare the three types of particle filters presented in Sections 3.2 and 3.3. Then, in the following sections, we present applications of the above noise parameter estimation method based on the Rao-Blackwellized particle filter (27). We consider particularly difficult tasks for speech processing, speech enhancement, and noisy speech recognition in nonstationary noisy environments. We show in Section 4.2 that the method can track noise dynamically. In Section 4.3, we show that the method improves system robustness to noise in an ASR system. Finally, we present results on speech enhancement in Section 4.4, where the estimated noise parameter is used in a time-varying linear filter to reduce noise power.

4.1. Synthetic experiments

This section^7 presents some experiments^8 to show the validity of the Rao-Blackwellized filter applied to the state-space model in (7) and (8). A sequence μ^l_n(1:t) was generated from (8), where the state-process noise variance V^l_n was set to 0.75. The speech mean vector μ^l_{s_t k_t}(t) in (7) was set to a constant 10. The observation noise variance Σ^l_{s_t k_t} was set to 0.00005. Given only the noisy observations Y^l(1:t) for t = 1, ..., 60, different filters (the particle filter of (26), the extended Kalman particle filter of (28), and the Rao-Blackwellized particle filter of (27)) were used to estimate the underlying state sequence μ^l_n(1:t). The number of particles in each type of filter was 200, and all the filters applied residual resampling [28]. The experiments were repeated 100 times with random re-initialization of μ^l_n(1) for each run. Table 1 summarizes the mean and variance of the MSE of the state estimates, together with the averaged execution time of each filter. Figure 3 compares the estimates generated from a single run of the different filters. In terms of MSE, the extended Kalman particle filter performed better than the particle filter. However, the execution time of the extended Kalman particle filter was the longest (more than two times longer than that of the particle filter (26)). The performance of the Rao-Blackwellized particle filter of (27) is clearly the best in terms of MSE. Notice that its averaged execution time was comparable to that of the particle filter.

4.2. Estimation of noise parameter

Experiments were performed on the TI-Digits database downsampled to 16 kHz. Five hundred clean speech utterances from 15 speakers and 111 utterances unseen in the training set were used for training and testing, respectively.

^6 The likelihood of the observed signal Y^l(t), given a speech model parameter and a noise parameter, is the same as long as the noise parameter is much smaller than the speech parameter μ^{l(i)}_{s^{(i)}_t k^{(i)}_t}(t).
^7 A Matlab implementation of the synthetic experiments is available by sending email to the corresponding author.
^8 All variables in these experiments are one dimensional.
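The synthetic setup of Section 4.1 can be approximated as follows. The observation function (7) is not reproduced in this excerpt, so the log-add form suggested by (36), Y(t) = μ_s + log(1 + exp(μ_n(t) − μ_s)) + v(t), is an assumption, and a plain bootstrap particle filter (cf. (26)) stands in for the paper's Rao-Blackwellized filter (27):

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed observation function (log-add form suggested by (36)):
#   Y(t) = mu_s + log(1 + exp(mu_n(t) - mu_s)) + v(t)
V_n = 0.75       # state-process (driving) noise variance, as in the paper
mu_s = 10.0      # constant speech mean, as in the paper
R = 0.00005      # observation noise variance, as in the paper
T, N = 60, 200   # frames and particles, as in the paper

def h(mu_n):
    return mu_s + np.log1p(np.exp(mu_n - mu_s))

# Hidden random walk (8) and noisy observations; the start value 12.0
# is arbitrary.
mu_n = 12.0 + np.cumsum(rng.normal(0.0, np.sqrt(V_n), T))
Y = h(mu_n) + rng.normal(0.0, np.sqrt(R), T)

p = rng.normal(12.0, 1.0, N)   # random re-initialization of mu_n(1)
est = np.empty(T)
for t in range(T):
    p = p + rng.normal(0.0, np.sqrt(V_n), N)    # propagate through (8)
    logw = -0.5 * (Y[t] - h(p)) ** 2 / R        # log-likelihood weights
    w = np.exp(logw - logw.max())               # stabilized in log domain
    w /= w.sum()
    est[t] = np.sum(w * p)                      # MMSE estimate, cf. (35)
    p = p[rng.choice(N, size=N, p=w)]           # multinomial resampling

mse = np.mean((est - mu_n) ** 2)
```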

Figure 3: Plot of estimates generated by the different filters on the synthetic state estimation experiment versus the true state. PF denotes the particle filter of (26), PF-EKF the particle filter with EKF proposal sampling of (28), and PF-RB the Rao-Blackwellized particle filter of (27).

Digits and silence were modeled by 10-state and 3-state whole-word HMMs, respectively, with 4 diagonal Gaussian mixtures in each state. The window size was 25.0 milliseconds with a 10.0-millisecond shift. Twenty-six filter banks were used in the binning stage; that is, J = 26. Speech feature vectors were Mel-scaled frequency cepstral coefficients (MFCCs), generated by transforming the log-spectral power vectors with the discrete cosine transform (DCT). The baseline system had 98.7% word accuracy for speech recognition under clean conditions.

For testing, a white noise signal was multiplied by a chirp signal and by a rectangular signal in the time domain. As a result, the time-varying mean of the noise power changed either continuously (denoted as experiment A) or abruptly (denoted as experiment B). The SNR of the noisy speech ranged from 0 dB to 20.4 dB. We plot the noise power in the 12th filter bank versus frame index in Figure 4, together with the noise power estimated by the sequential method with the number of particles N set to 120 and the environment driving noise variance V^l_n set to 0.0001. As a comparison, Figure 5 shows the noise power and its estimate obtained by the method with the same number of particles but a larger driving noise variance of 0.001.

Four seconds of contaminating noise were used to initialize μ̂^l_n(0) in the noise estimation method. The initial value μ̂^{l(i)}_n(0) of each particle was obtained by sampling from N(μ̂^l_n(0) + ζ(0), 10.0), where ζ(0) was distributed as U(−1.0, 9.0). To apply the estimation algorithm in Section 3.5, observation vectors were transformed into the log-spectral domain.

Based on the results in Figures 4 and 5, we make the following observations. First, the method can track the evolution of the noise power. Second, a larger driving noise variance V^l_n gives faster convergence but larger estimation error. Third, as discussed in Section 3.5, there was a large bias in the regions where the noise power changed from large to small. This observation was more explicit in experiment B (noise multiplied by a rectangular signal).

4.3. Noisy speech recognition in time-varying noise

The experimental setup was the same as in the previous experiments in Section 4.2. Features for speech recognition were MFCCs plus their first- and second-order time differentials. We compared three systems. The first was the baseline trained on clean speech without noise compensation (denoted as Baseline). The second was a system with noise compensation, which transformed the clean speech acoustic models by mapping the clean speech mean vector μ^l_{s_t k_t} at each state and Gaussian density with the function [8]

    \hat\mu^l_{s_t k_t} = \mu^l_{s_t k_t} + \log\big(1 + \exp(\hat\mu^l_n - \mu^l_{s_t k_t})\big),    (36)

where μ̂^l_n was obtained by averaging the noise log-spectral power over noise-alone segments in the testing set. This system was denoted as stationary noise assumption (SNA). The third system used the method in Section 3.5 to estimate the noise parameter μ̂^l_n(t) without a training transcript. The estimated noise parameter was plugged in for μ̂^l_n in (36) to adapt the acoustic mean vectors at each time t. This system is denoted according to the number of particles and the variance of the environment driving noise V^l_n.

4.3.1. Results in the simulated nonstationary noise

In terms of recognition performance in the simulated nonstationary noise described in Section 4.2, Table 2 shows that the method can effectively improve system robustness to the time-varying noise. For example, with 60 particles and the environment driving noise variance V^l_n set to 0.001, the method improved word accuracy from the 75.3% achieved by SNA to 94.3% in experiment A. The table also shows that word accuracy can be improved by increasing the number of particles. For example, with driving noise variance V^l_n set to 0.0001, increasing the number of particles from 60 to 120 improved word accuracy from 77.1% to 85.8% in experiment B.

4.3.2. Speech recognition in real noise

In this experiment, speech signals were contaminated by highly nonstationary machine gun noise at different SNRs. The number of particles was set to 120, and the environment driving noise variance V^l_n was set to 0.0001. Recognition performances are shown in Table 3, together with Baseline and SNA. In all SNR conditions, the method in Section 3.5 further improved system performance in comparison with SNA. For example, at 8.9 dB SNR, the method improved word accuracy from 75.6% (SNA) to 83.1%. As a whole, it reduced the word error rate by 39.9% relative to SNA.

Figure 4: Estimation of the time-varying parameter μ^l_n(t) by the sequential Monte Carlo method at the 12th filter bank in experiment A. The number of particles is 120. The environment driving noise variance is 0.0001. The solid curve is the true noise power; the dash-dotted curve is the estimated noise power.

Figure 5: Estimation of the time-varying parameter μ^l_n(t) by the sequential Monte Carlo method at the 12th filter bank in experiment A. The number of particles is 120. The environment driving noise variance is 0.001. The solid curve is the true noise power; the dash-dotted curve is the estimated noise power.

4.4. Perceptual speech enhancement

Enhanced speech x̂(t) is obtained by filtering the noisy speech sequence y(t) through a time-varying linear filter h(t); that is,

    \hat{x}(t) = h(t) * y(t).    (37)

This process can be studied in the frequency domain as multiplication of the noisy speech power spectrum y^{lin}_j(t) by a time-varying linear coefficient at each filter bank; that is,

    \hat{x}^{lin}_j(t) = h_j(t) \cdot y^{lin}_j(t),    (38)

where h_j(t) is the gain at filter bank j at time t. Referring to (2), we can expand it as

    \hat{x}^{lin}_j(t) = h_j(t)\, x^{lin}_j(t) + h_j(t)\, n^{lin}_j(t).    (39)

We are left with two choices for the linear time-varying filter.

Table 2: Word accuracy (%) in simulated nonstationary noise, achieved by the sequential Monte Carlo method in comparison with the baseline without noise compensation, denoted as Baseline, and noise compensation assuming stationary noise, denoted as stationary noise assumption.

           |          | Stationary       | No. of particles = 60      | No. of particles = 120
Experiment | Baseline | noise assumption | V^l_n = 0.001 | V^l_n = 0.0001 | V^l_n = 0.001 | V^l_n = 0.0001
A          | 48.2     | 75.3             | 94.3          | 94.0           | 94.3          | 94.6
B          | 53.0     | 78.0             | 82.2          | 77.1           | 85.8          | 85.8

Table 3: Word accuracy (%) in machine gun noise, achieved by the sequential Monte Carlo method in comparison with the baseline without noise compensation, denoted as Baseline, and noise compensation assuming stationary noise, denoted as stationary noise assumption.

SNR (dB) | Baseline | Stationary noise assumption | No. of particles = 120, V^l_n = 0.0001
28.9     | 90.4     | 92.8                        | 97.6
14.9     | 64.5     | 76.8                        | 88.3
8.9      | 56.0     | 75.6                        | 83.1
1.6      | 50.0     | 69.0                        | 72.9

(1) The Wiener filter constructs the coefficient as

    h_j(t) = 1 - \frac{\hat{n}^{lin}_j(t)}{y^{lin}_j(t)},    (40)

where n̂^{lin}_j(t) is the estimate of the noise power spectrum.

(2) The criterion for the perceptual filter is to construct h_j(t) so that the amplitude of the filtered noise power spectrum h_j(t) · n^{lin}_j(t) is below the masking threshold of the denoised speech; that is,

    h_j(t) \cdot n^{lin}_j(t) \le T_j(t),    (41)

where T_j(t) is the masking threshold of the denoised speech signal. The threshold is a function of the clean speech spectrum x^{lin}_j(t). Since x^{lin}_j(t) is not directly observed, the following equation is used instead, which makes the masking threshold a function of the estimated noise power spectrum n̂^{lin}_j(t):

    \hat{x}^{lin}_j(t) = y^{lin}_j(t) - \hat{n}^{lin}_j(t).    (42)

The perceptual filter exploits the masking properties of the human auditory system, and it has been employed by many researchers (e.g., [14]) to provide improved performance over the Wiener filter in low SNR conditions. Masking occurs because the auditory system is incapable of distinguishing two signals that are close in the time or frequency domain. This is manifested as an elevation of the minimum threshold of audibility due to a masker signal. Masking has been widely applied to speech and audio coding [15]. We consider frequency masking [15], in which a weak signal is made inaudible by a stronger signal occurring simultaneously.

Both the Wiener filter and the perceptual filter require the estimated noise power spectrum n̂^{lin}_j(t). Under the assumption of stationary noise, the noise power spectrum can be estimated from noise-alone segments provided by an explicit VAD, as in, for example, the speech enhancement scheme in [7]. However, in real applications, we encounter time-varying noise, which may change its statistics during speech utterances.

The objective of this section is to test the method devised in Section 3.5 for speech enhancement in time-varying noise. The estimated μ̂^l_n(t) is converted to the linear spectral domain μ̂^{lin}_n(t) by the exponential operation. The corresponding jth element of μ̂^{lin}_n(t) substitutes n̂^{lin}_j(t) in (40) and (42), respectively, to construct the Wiener filter and the perceptual filter. Therefore, the proposed speech enhancement algorithm is a combination of sequential noise parameter estimation by the sequential Monte Carlo method and speech enhancement by time-varying linear filtering. A diagram of the algorithm is shown in Figure 6. At each frame t, the algorithm carries out noise parameter estimation in the log-spectral domain and perceptual enhancement of the noisy speech in the time domain.

Noise parameter estimation in the module "noise parameter estimation" works in the log-spectral domain of the input speech signals; the estimation is given by Algorithm 1. With the estimated noise parameter at the current frame, the module "Wiener filter" outputs the enhanced speech spectrum in the linear spectral domain, and the enhanced speech spectrum is used in "masking threshold calculation." The perceptual filter, based on the masking threshold and the estimated noise parameter, is constructed in the module "perceptual filter." With the time-varying perceptual filter constructed, the input noisy speech signal is filtered in the time domain in the module "filtering" to obtain the perceptually enhanced signal x̂(t). A detailed description of this algorithm is provided in the following sections.
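The two gain constructions can be sketched per filter bank as follows. The flooring of the Wiener gain at zero and the particular choice of the largest admissible perceptual gain are our assumptions; (40) and (41) do not fix these details:

```python
import numpy as np

def wiener_gain(y_pow, n_est):
    """Wiener coefficient (40): h = 1 - n_hat/y, floored at 0
    (the flooring is an assumption; negative gains are not physical).
    """
    return np.maximum(1.0 - n_est / y_pow, 0.0)

def perceptual_gain(n_est, threshold):
    """One simple gain satisfying the masking criterion (41):
    the largest h <= 1 with h * n_hat <= T.
    """
    return np.minimum(threshold / n_est, 1.0)

y_pow = np.array([4.0, 2.0, 1.0])    # noisy power per filter bank (made up)
n_est = np.array([1.0, 1.0, 1.0])    # estimated noise power
T_mask = np.array([0.5, 2.0, 0.1])   # masking thresholds

print(wiener_gain(y_pow, n_est))       # [0.75 0.5  0.  ]
print(perceptual_gain(n_est, T_mask))  # [0.5 1.  0.1]
```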

[Diagram: y(t) → windowing + FFT → log-spectral conversion → noise parameter estimation (with voice activity detection) → linear spectral conversion → Wiener filter → masking threshold calculation → perceptual filter → filtering → x̂(t).]

Figure 6: Diagram of the proposed speech enhancement method. The noisy signal y(t) is converted into linear spectral amplitudes in "windowing + FFT." The noise parameter is sequentially estimated in "noise parameter estimation." The estimated noise parameter is converted back into the linear spectral domain and is fed into "Wiener filter" to obtain the enhanced linear power spectrum. The enhanced spectrum is input to "masking threshold calculation," and the obtained masking threshold is used in the perceptual filter together with the estimated noise parameter in the linear spectral domain. The module "perceptual filter" outputs filter coefficients for speech enhancement in "filtering," which outputs the enhanced signal x̂(t).

4.4.1. Masking threshold calculation

The masking threshold T_j(t) is obtained by modeling the frequency selectivity of the human ear and its masking property. This paper applies a computational model of masking by Johnston [15]. The frequency masking threshold calculation proceeds as follows.

(1) Frequency analysis. According to a mapping between linear frequency and Bark frequency [14], the power spectrum x^{lin}_j(t) after the short-time Fourier transform (STFT) of the input speech signal is combined in each Bark band b (1 ≤ b ≤ B) by

    x^{lin}_b(t) = \sum_{j=b_L}^{b_H} x^{lin}_j(t),    (43)

where b_L and b_H denote the lowest and highest frequency bins for the Bark index b.

(2) Convolution with the spreading function. The spreading function B_b is used to estimate the effects of masking across critical bands. One example of the spreading function at b = 2 is plotted in Figure 7. The spread Bark spectrum at Bark index b is denoted as C^{lin}_b(t) = B_b x^{lin}_b(t).

(3) Relative threshold calculation based on tone-like or noise-like determination. Whether the signal is tone-like or noise-like is determined by the spectral flatness measure (SFM), calculated as the ratio, in decibels (dB), of the geometric mean of the power spectrum to the arithmetic mean of the power spectrum.

(4) Masking threshold calculation. The relative threshold is subtracted from the spread critical band spectrum to yield the spread threshold estimate.

(5) Renormalization and inclusion of the absolute threshold information [15].

(6) Conversion of the masking threshold from the Bark frequency domain to the linear frequency domain. The masking threshold in the linear spectral domain, T_j(t), is obtained as a result.

Figure 7: Spreading function for the noise masking threshold calculation. The plot shows the spreading function applied to the critical band at b = 2.

An example of the masking threshold in the linear spectral domain for a given input spectrum is plotted in Figure 8. The sampling frequency is 8 kHz; therefore, the total number of critical bands is B = 18. In the method presented above, the masking threshold is calculated from the clean speech signal.

4.4.2. Wiener filter and perceptual filter

We apply the method in Section 3.5 for time-varying noise parameter estimation. The jth element of μ̂^l_n(t) is converted to the linear spectral domain by the exponential operation and then substitutes n̂^{lin}_j(t) in (40) and (42), respectively, for the Wiener filter and the perceptual filter. The masking threshold of the perceptual filter is obtained as in Section 4.4.1.
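The spectral flatness measure used in step (3) can be sketched as follows. The −60 dB normalization to a tonality coefficient follows Johnston's model [15] and is an assumption here, not stated in the excerpt:

```python
import numpy as np

def spectral_flatness_db(power):
    """SFM in dB: ratio of the geometric mean to the arithmetic mean
    of the power spectrum.  Near 0 dB for flat (noise-like) spectra,
    strongly negative for peaky (tone-like) spectra.
    """
    geo = np.exp(np.mean(np.log(power)))
    ari = np.mean(power)
    return 10.0 * np.log10(geo / ari)

flat = np.ones(64)          # white, noise-like spectrum
peaky = np.full(64, 1e-6)
peaky[10] = 1.0             # single strong tone
print(spectral_flatness_db(flat))   # 0.0
print(spectral_flatness_db(peaky))  # strongly negative

# Tonality coefficient, assuming Johnston's SFM_max = -60 dB.
alpha = min(spectral_flatness_db(peaky) / -60.0, 1.0)
```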
2378 EURASIP Journal on Applied Signal Processing

Figure 8: Example of the masking threshold Tb(t). The plot shows the amplitude of the speech spectrum and the masking threshold (dB) against frequency (0–4000 Hz).
4.4.3. Experimental results

Experiments were performed on the Aurora 2 database. Speech models were trained on 8840 clean speech utterances. The model was an HMM with 18 states and 8 Gaussian mixtures in each state. The noise model was a single Gaussian density with a time-varying mean vector. The window size was 25.0 milliseconds with a 10.0-millisecond shift. J was set to 65.

We compared three systems. The first system, denoted as Baseline, was a speech enhancement system based on the ETSI proposal [7], in which a VAD is used for the decision of speech/nonspeech segments. Noise parameters were estimated from the segmented noise-alone frames. The second system, denoted as Known, differs from the first system in that the Wiener filter was designed with noise parameters estimated by the proposed method. The third system, denoted as Perceptual, was a perceptual filter with noise parameters estimated by the proposed method.

The VAD was initialized during the first three frames in each utterance. The driving variance V_n^l in (9) was set to 0.0003. The number of particles (N in (35)) was set to 800. The noise signals were (1) simulated nonstationary noise, generated by multiplying white noise with a time-varying continuous factor in the time domain, (2) Babble noise, and (3) Restaurant noise.

Figure 9: An example of signals. (a) Clean speech signal in English "Oh Oh Two One Six." (b) Noisy signal (noise is the simulated nonstationary noise and SNR is −0.2 dB). (c) Enhanced speech signal by Wiener filter (system Baseline). (d) Enhanced speech signal by Wiener filter with noise parameters estimated by the proposed method (system Known). (e) Enhanced speech signal by perceptual filter with noise parameters estimated by the proposed method (system Perceptual).

4.4.4. Performance evaluation

Spectrogram

An example of the original clean speech signal, the noisy signal in the simulated nonstationary noise, and the enhanced signals is shown in Figure 9. The contrast is more evident by viewing their corresponding spectrograms in Figure 10. It is observed that the noise power appeared after 0.4 seconds, which was almost at the time when the speech segments occurred. Figure 10c shows that Baseline cannot handle this nonstationarity of the noise, and the enhanced signal by the system still contains much noise power in the speech segments. On the contrary, with the proposed method, the enhanced signal by Known has reduced noise power in the speech segments (shown in Figure 10d). Perceptual reduces noise in the enhanced signal to a greater extent than the other two systems (shown in Figure 10e). An example in Babble noise is shown in Figure 11, and the corresponding spectrogram is shown in Figure 12.

Time-Varying Noise Estimation Using Monte Carlo Method 2379

Figure 10: An example of the spectrum of the signals (from top to bottom). (a) Spectrogram of the clean speech signal in English "Oh Oh Two One Six." (b) Spectrogram of the noisy signal (noise is the simulated nonstationary noise and SNR is −0.2 dB). (c) Spectrogram of the enhanced signal by Wiener filter (system Baseline). (d) Spectrogram of the enhanced signal by Wiener filter with noise parameters estimated by the proposed method (system Known). (e) Spectrogram of the enhanced signal by perceptual filter with noise parameters estimated by the proposed method (system Perceptual). Each panel shows frequency (0–4 kHz) against time (0–2 s), with amplitude in dB.

However, the nonstationary noise was not perfectly removed in the final part of the sequence in Figure 10. This was in part due to inefficiency in the proposal distribution. Note that the speech states and mixtures were sampled according to the proposal distribution in (25). Thus, at the end of an utterance, the proposed speech states might not yet reach the states of silence. As a result, the speech parameter μ̂_{s_t^{(i)} k_t^{(i)}}^{l(i)}(t) might still mask (be larger than) the noise parameter μ̂_n^l(t). In this situation, the noise parameter may not have been updated (it remained small if the previously estimated noise parameter was smaller than the speech parameters μ̂_{s_t^{(i)} k_t^{(i)}}^{l(i)}) because the Kalman gain in the EKF was (approaching) zero. Therefore, noise in the final part of the sequence cannot be perfectly removed in some utterances.

Another direction in which the method needs to be improved is obvious in Figure 12. In this figure, high-frequency components are attenuated more than necessary. Since the Mel scale and the Bark scale are wider in the higher-frequency components than in the lower-frequency components, noise parameters may not be accurately estimated due to the frequency uncertainty between linear frequency and the Mel scale (or Bark scale). Constructing speech enhancement algorithms that work directly in the linear spectral domain (rather than the Bark-scaled log-spectral domain of this work) may achieve higher frequency resolution and hence better enhancement results.

SNR improvement

The amount of noise reduction is generally measured with the segmental SNR (SegSNR) improvement, the difference between the input and output SegSNR:

G_{\mathrm{SNR}} = \frac{1}{B} \sum_{b=0}^{B-1} 10 \cdot \log_{10} \frac{(1/D) \sum_{d=0}^{D-1} n^2(d + Db)}{(1/D) \sum_{d=0}^{D-1} \left( x(d + Db) - \hat{x}(d + Db) \right)^2},  (44)

where B represents the number of frames in the signal and D is the number of observation samples per frame, set to 256.

Figure 13 shows the SegSNR improvement obtained in various noise types and at various noise levels. We can see that the system Known with the sequential Monte Carlo method has improved SegSNR over the system Baseline. Figure 13 also shows that both systems Known and Perceptual benefit from the sequential Monte Carlo method. Furthermore, Perceptual shows much greater improvement than Known, which implies that it is effective to employ human auditory properties for speech enhancement.⁹

⁹However, as discussed in Section 4.4.4, because the system Perceptual attenuated higher-frequency components more than traditional Wiener filters, the subjective quality of the perceptually enhanced speech signal in human hearing was in fact no better than that by Wiener filters.

Figure 11: An example of signals. (a) Clean speech signal in English "Oh Oh Two One Six." (b) Noisy signal (noise is babble and SNR is −1.86 dB). (c) Enhanced speech by Wiener filter (system Baseline). (d) Enhanced speech by Wiener filter with noise parameters estimated by the proposed method (system Known). (e) Enhanced speech by perceptual filter with noise parameters estimated by the proposed method (system Perceptual).

5. CONCLUSIONS AND DISCUSSIONS

We have presented a sequential Monte Carlo method for Bayesian estimation of time-varying noise parameters. The method is derived from the general sequential Monte Carlo method for time-varying parameter estimation, but with particular considerations for time-varying noise parameter estimation. The estimated noise parameters are used in a Wiener filter and a perceptual filter for speech enhancement in nonstationary noisy environments. We also demonstrate that, with the estimated noise parameters, a sequential modification of the time-varying mean vector of the speech models can improve speech recognition performance in nonstationary noise. The results show that this is a promising approach to handle speech signal processing in nonstationary noise scenarios.
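To make the SegSNR measure concrete, the improvement in (44) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the three signals are aligned one-dimensional arrays, only complete frames are used, and the frame length defaults to D = 256 as in the text.

```python
import numpy as np

def seg_snr_gain(noise, clean, enhanced, frame_len=256):
    """Sketch of the SegSNR improvement in (44): for each frame b, take
    10*log10 of the ratio of noise power to residual error power, then
    average over the B complete frames."""
    n_frames = len(clean) // frame_len              # B in (44)
    gains = []
    for b in range(n_frames):
        seg = slice(b * frame_len, (b + 1) * frame_len)
        noise_power = np.mean(noise[seg] ** 2)                  # (1/D) sum n^2
        err_power = np.mean((clean[seg] - enhanced[seg]) ** 2)  # (1/D) sum (x - x_hat)^2
        gains.append(10.0 * np.log10(noise_power / err_power))
    return float(np.mean(gains))
```

For instance, if enhancement leaves exactly one tenth of the noise amplitude in place (x̂ = x + 0.1 n), every frame ratio is 100 and the improvement is 20 dB.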

Figure 12: An example of the spectrum of the signals (from top to bottom). (a) Spectrogram of the clean speech signal in English "Oh Oh Two One Six." (b) Spectrogram of the noisy signal (noise is babble and SNR is −1.86 dB). (c) Spectrogram of the enhanced speech by Wiener filter (system Baseline). (d) Spectrogram of the enhanced speech by Wiener filter with noise parameters estimated by the proposed method (system Known). (e) Spectrogram of the enhanced speech by perceptual filter with noise parameters estimated by the proposed method (system Perceptual).

The sequential Monte Carlo method in this paper is successfully applied to two seemingly different areas in speech processing: speech enhancement and speech recognition. This is possible because the graphical model shown in Figure 2 is applicable to both areas. The graphical model incorporates two hidden state sequences: one is the speech state sequence for modeling transitions of speech units, and the other is a continuous-valued state sequence for modeling noise statistics. With the sequential Monte Carlo method, noise parameter estimation can be conducted via sampling the speech state sequences and updating the continuous-valued noise states with 2N EKFs at each time. The highly parallel scheme of the method allows an efficient parallel implementation.

We are currently considering the following steps for improved performance: (1) making use of a more efficient proposal distribution, for example, auxiliary sampling [29], (2) accurate training of speech models, and (3) design of algorithms working directly in the linear spectral domain for speech enhancement. Improvements may be achieved if explicit speech modeling, for example, autocorrelation modeling of speech signals [11], a pitch model [30], and so forth, can be incorporated in the framework. Because there is a nonlinear function involved, we also believe that incorporating smoothing techniques recently proposed for nonlinear time series [31] may achieve improved performances.

APPENDICES

A. APPROXIMATION OF THE ENVIRONMENT EFFECTS ON SPEECH FEATURES

Effects of additive noise on speech power at the jth filter bank can be approximated by (2), where y_j^{\mathrm{lin}}(t), x_j^{\mathrm{lin}}(t), and n_j^{\mathrm{lin}}(t) denote the noisy speech power, speech power, and additive noise power in filter bank j [8, 9]. In the log-spectral domain, this equation can be written as

\log\left(x_j^{\mathrm{lin}}(t) + n_j^{\mathrm{lin}}(t)\right) = \log x_j^{\mathrm{lin}}(t) + \log\left(1 + \frac{n_j^{\mathrm{lin}}(t)}{x_j^{\mathrm{lin}}(t)}\right) = \log x_j^{\mathrm{lin}}(t) + \log\left(1 + \exp\left(\log n_j^{\mathrm{lin}}(t) - \log x_j^{\mathrm{lin}}(t)\right)\right).  (A.1)

Substituting x_j^l(t) = \log x_j^{\mathrm{lin}}(t), n_j^l(t) = \log n_j^{\mathrm{lin}}(t), and y_j^l(t) = \log y_j^{\mathrm{lin}}(t), we have (3).

B.

The prediction likelihood of the EKF is given by [24]

p\left(Y^l(t) \mid s_t^{(i)}, k_t^{(i)}, \mu_{s_t^{(i)} k_t^{(i)}}^{l(i)}(t), \hat{\mu}_n^{l}(t-1), Y^l(t-1)\right) = \int p\left(Y^l(t) \mid \theta^{(i)}(t)\right) p\left(\mu_n^{l(i)}(t) \mid \hat{\mu}_n^{l(i)}(t-1)\right) d\mu_n^{l(i)}(t) \propto \exp\left(-\frac{1}{2}\, \alpha^{(i)}(t)^T \left[C^{(i)}(t)\, \Sigma_{s_t^{(i)} k_t^{(i)}}^{l}\, C^{(i)}(t)^T + V_n^l\right]^{-1} \alpha^{(i)}(t)\right),  (B.1)

where, respectively, the innovation vector \alpha^{(i)}(t), the one-step-ahead prediction of the noise parameter \hat{\mu}_n^{l(i)}(t \mid t-1), the correlation matrix of the error in \hat{\mu}_n^{l(i)}(t), the correlation matrix of the error in \hat{\mu}_n^{l(i)}(t \mid t-1), the measurement matrix at time t (obtained by the first-order differentiation of (7) with respect to \mu_n^{l(i)}(t)), and the gain function G^{(i)}(t) are given as

\alpha^{(i)}(t) = Y^l(t) - \mu_{s_t^{(i)} k_t^{(i)}}^{l(i)}(t) - \log\left(1 + \exp\left(\hat{\mu}_n^{l(i)}(t-1) - \mu_{s_t^{(i)} k_t^{(i)}}^{l(i)}(t)\right)\right),  (B.2)

\hat{\mu}_n^{l(i)}(t \mid t-1) = \hat{\mu}_n^{l(i)}(t-1) + G^{(i)}(t)\, \alpha^{(i)}(t-1),  (B.3)

K^{(i)}(t) = K^{(i)}(t, t-1) - G^{(i)}(t)\, C^{(i)}(t)\, K^{(i)}(t, t-1),  (B.4)

K^{(i)}(t, t-1) = K^{(i)}(t-1) + V_n^l,  (B.5)

C^{(i)}(t) = \frac{\exp\left(\hat{\mu}_n^{l(i)}(t \mid t-1) - \mu_{s_t^{(i)} k_t^{(i)}}^{l}(t)\right)}{1 + \exp\left(\hat{\mu}_n^{l(i)}(t \mid t-1) - \mu_{s_t^{(i)} k_t^{(i)}}^{l}(t)\right)},  (B.6)

G^{(i)}(t) = K^{(i)}(t, t-1)\, C^{(i)}(t)^T \left[C^{(i)}(t)\, K^{(i)}(t, t-1)\, C^{(i)}(t)^T + \Sigma_{s_t^{(i)} k_t^{(i)}}^{l}\right]^{-1}.  (B.7)

Figure 13: Segmental SNR improvement in the following noise: (a) simulated nonstationary noise; (b) babble noise; (c) restaurant noise. The tested systems are the following: ( ) Wiener filter (system Baseline); ( ) Wiener filter with noise parameters estimated by the proposed method (system Known); (o) perceptual filter with noise parameters estimated by the proposed method (system Perceptual).

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their helpful comments on this paper, which substantially improved its presentation. The corresponding author thanks Dr. S. Nakamura (ATR SLT) for helpful discussions. Part of the work was performed when K. Yao was with ATR SLT.

REFERENCES

[1] J. S. Lim, Speech Enhancement, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
[2] K. K. Paliwal and A. Basu, "A speech enhancement method based on Kalman filtering," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 12, pp. 177–180, Dallas, Tex, USA, April 1987.
[3] J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 26, no. 3, pp. 197–210, 1978.
[4] Y. Ephraim, "A Bayesian estimation approach for speech enhancement using hidden Markov models," IEEE Trans. Signal Processing, vol. 40, no. 4, pp. 725–735, 1992.
[5] Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[6] M. G. Hall, A. V. Oppenheim, and A. S. Willsky, "Time-varying parametric modeling of speech," Signal Processing, vol. 5, no. 3, pp. 267–285, 1983.
[7] ETSI, "Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms," Tech. Rep. ETSI ES 202 050, European Telecommunications Standards Institute, Sophia Antipolis, France, 2002.
[8] M. J. F. Gales and S. J. Young, "Robust speech recognition in additive and convolutional noise using parallel model combination," Computer Speech and Language, vol. 9, no. 4, pp. 289–307, 1995.
[9] A. Acero, Acoustical and environmental robustness in automatic speech recognition, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pa, USA, September 1990.
[10] S. V. Vaseghi and B. P. Milner, "Noise compensation methods for hidden Markov model speech recognition in adverse environments," IEEE Trans. Speech and Audio Processing, vol. 5, no. 1, pp. 11–21, 1997.
[11] J. Vermaak, C. Andrieu, A. Doucet, and S. J. Godsill, "Particle methods for Bayesian modeling and enhancement of speech signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 3, pp. 173–185, 2002.
[12] K. Yao and S. Nakamura, "Sequential noise compensation by sequential Monte Carlo method," in Advances in Neural Information Processing Systems, vol. 14, pp. 1205–1212, MIT Press, Cambridge, Mass, USA, 2001.
[13] D. Burshtein and S. Gannot, "Speech enhancement using a mixture-maximum model," IEEE Trans. Speech and Audio Processing, vol. 10, no. 6, pp. 341–351, 2002.
[14] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech and Audio Processing, vol. 7, no. 2, pp. 126–137, 1999.
[15] J. D. Johnston, "Estimation of perceptual entropy using noise masking criteria," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, pp. 2524–2527, New York, NY, USA, April 1988.
[16] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
[17] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," in Learning in Graphical Models, pp. 105–162, Kluwer Academic, Dordrecht, 1998.
[18] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.
[19] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
[20] G. Casella and C. P. Robert, "Rao-Blackwellisation of sampling schemes," Biometrika, vol. 83, no. 1, pp. 81–94, 1996.
[21] A. Doucet, N. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, NY, USA, 2001.
[22] J. M. Bernardo and A. F. M. Smith, Bayesian Theory, John Wiley & Sons, New York, NY, USA, 1994.
[23] V. S. Zaritskii, V. B. Svetnik, and L. I. Shimelevich, "Monte-Carlo techniques in problems of optimal information processing," Automation and Remote Control, vol. 36, no. 3, pp. 2015–2022, 1975.
[24] A. Doucet, S. J. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[25] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer-Verlag, New York, NY, USA, 1998.
[26] R. van der Merwe, A. Doucet, N. de Freitas, and E. Wan, "The unscented particle filter," Tech. Rep. CUED/F-INFENG/TR 380, Signal Processing Group, Engineering Department, Cambridge University, Cambridge, UK, 2000.
[27] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, NY, USA, 1993.
[28] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.
[29] M. K. Pitt and N. Shephard, "Filtering via simulation: auxiliary particle filters," Journal of the American Statistical Association, vol. 94, no. 446, pp. 590–599, 1999.

[30] M. Davy and S. J. Godsill, "Bayesian harmonic models for musical pitch estimation and analysis," Tech. Rep. CUED/F-INFENG/TR.431, Signal Processing Group, Engineering Department, Cambridge University, Cambridge, UK, 2002.
[31] S. J. Godsill, A. Doucet, and M. West, "Monte Carlo smoothing for nonlinear time series," Journal of the American Statistical Association, vol. 99, no. 465, pp. 156–168, 2004.

Kaisheng Yao is a Postgraduate Researcher at the Institute for Neural Computation, University of California, San Diego. Dr. Yao received his B.Eng. and M.Eng. degrees in electrical engineering from Huazhong University of Science & Technology (HUST), Wuhan, China, in 1993 and 1996. In 2000, he received his Dr.Eng. degree for work performed jointly at the State Key Laboratory on Microwave and Digital Communications, Department of Electronic Engineering, Tsinghua University, China, and at the Human Language Technology Center (HLTC), Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong. From 2000 to 2002, he was a Researcher in the Spoken Language Translation Research Laboratories at the Advanced Telecommunications Research Institute International (ATR), Japan. Since August 2002, he has been with the Institute for Neural Computation, University of California, San Diego. His research interests are mainly in speech recognition in noise, acoustic modeling, dynamic Bayesian networks, statistical signal processing, and related general problems. He has served as a reviewer for the IEEE Transactions on Speech and Audio Processing and for ICASSP.

Te-Won Lee is an Associate Research Professor at the Institute for Neural Computation, University of California, San Diego, and a Collaborating Professor in the Biosystems Department at the Korea Advanced Institute of Science and Technology (KAIST). Dr. Lee received his diploma degree in March 1995 and his Ph.D. degree in October 1997 (summa cum laude) in electrical engineering from the University of Technology Berlin. During his studies, he received the Erwin-Stephan Prize for excellent studies from the University of Technology Berlin and the Carl-Ramhauser Prize for excellent dissertations from the DaimlerChrysler Corporation. He was a Max-Planck Institute Fellow from 1995 till 1997 and a Research Associate at the Salk Institute for Biological Studies from 1997 till 1999. Dr. Lee's research interests include algorithms with applications in signal and image processing. Recently, he has worked on variational Bayesian methods for independent component analysis, algorithms for speech enhancement and recognition, models for computational vision, and classification algorithms for medical informatics.

EURASIP Journal on Applied Signal Processing 2004:15, 2385–2395
© 2004 Hindawi Publishing Corporation

Particle Filtering Applied to Musical Tempo Tracking

Stephen W. Hainsworth Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Email: [email protected]

Malcolm D. Macleod QinetiQ, Malvern, WR14 3PS, UK Email: [email protected]

Received 30 May 2003; Revised 1 May 2004

This paper explores the use of particle filters for beat tracking in musical audio examples. The aim is to estimate the time-varying tempo process and to find the time locations of beats, as defined by human perception. Two alternative algorithms are presented: one performs Rao-Blackwellisation to produce an almost deterministic formulation, while the second models tempo as a Brownian motion process. The algorithms have been tested on a large and varied database of examples, and the results are comparable with the current state of the art. The deterministic algorithm gives the better performance of the two.

Keywords and phrases: beat tracking, particle filters, music.

1. INTRODUCTION

Musical audio analysis has been a growing area for research over the last decade. One of the goals in the area is fully automated transcription of real polyphonic audio signals, though this problem is currently only partially solved. More realistic subtasks in the overall problem exist and can be explored with greater success; beat tracking is one of these and has many applications in its own right (automatic accompaniment of solo performances [1], auto-DJs, expressive rhythmic transformations [2], uses in database retrieval [3], metadata generation [4], etc.).

This paper describes an investigation into beat tracking utilising particle filtering algorithms as a framework for sequential stochastic estimation, where the state space under consideration is a complex one and does not permit a closed-form solution.

Historically, a number of methods have been used to attempt solution of the problem, though they can be broadly categorised into a number of distinct methodologies.¹ The oldest approach is to use oscillating filterbanks and to look for the maximum output; Scheirer [7] typifies this approach, though Large [8] is another example. Autocorrelative methods have also been tried, and Tzanetakis [3] or Foote [9] are examples, though these tend to find only the average tempo and not the phase (as defined in Section 2) of the beat. Multiple-hypothesis approaches (e.g., Goto [10] or Dixon [11]) are very similar to more rigorously probabilistic approaches (Laroche [12] or Raphael [13], for instance) in that they all evaluate the likelihood of a hypothesis set; only the framework varies from case to case. Klapuri [14] also presents a method for beat tracking which takes the approach typified by Scheirer [7] and applies a probabilistic tempo smoothness model to the raw output. This is tested on an extensive database and the results are the current state of the art.

More recently, particle filters have been applied to the problem; Morris and Sethares [15] briefly present an algorithm which extracts features from the signal and then uses these feature vectors to perform sequential estimation, though their implementation is not described. Cemgil [16] also uses a particle filtering method in his comprehensive paper applying Monte Carlo methods to the beat tracking of expressively performed MIDI signals.² This model will be discussed further later, as it shares some aspects with one of the models described in this paper.

The remainder of the paper is organised as follows: Section 2 introduces tempo tracking; Section 3 covers basic particle filtering theory; Sections 4, 5, and 6 discuss onset detection and the two beat tracking models proposed. Results and discussion are presented in Sections 7 and 8, and conclusions in Section 9.

¹A comprehensive literature review can be found in Seppänen [5] or Hainsworth [6].
²MIDI stands for "musical instrument digital interface" and is a language for expressing musical events in binary. In the context described here, the note start times are extracted from the MIDI signal.

2. TEMPO TRACKING AND BEAT PERCEPTION

So what is beat tracking?³ The least jargon-ridden description is that it is the pulse defined by a human listener tapping in time to music. However, the terms tempo, beat, and rhythm need to be defined. The highest-level descriptor is the rhythm; this is the full description of every timing relationship inside a piece of music. However, Bilmes [17] breaks this down into four subdivisions: the hierarchical metrical structure, which describes the idealised timing relationships between musical events (as they might exist in a musical score, for instance); tempo variations, which link these together in a possibly time-varying flow; timing deviations, which are individual timing discrepancies ("swing" is an example of this); and finally arrhythmic sections. If one ignores the last of these as fundamentally impossible to analyse meaningfully, the task is to estimate the tempo curve (tempo tracking) and idealised event times quantised to a grid of "score locations," given an input set of musical changepoint times.

To represent the tempo curve, a frequency and phase are required such that the phase is zero at beat locations. The metrical structure can then be broken down into a set of levels described by Klapuri [14]: the beat or tactus is the preferred human tapping tempo; the tatum is the shortest commonly occurring interval; and the bar or measure is related to harmonic change and often correlates to the bar line in common score notation of music. It should be noted that the beat often corresponds to the 1/4 note or crotchet in common notation, but this is not always the case: in fast jazz music, the beat is often felt at half this rate; in hymn music, traditional notation often gives the beat two crotchets (i.e., a 1/2 note). The moral is that one must be careful about relating perception to musical notation! Figure 1 gives a diagrammatic representation of the beat relationships for a simple example. The beat is subdivided by two to get the tatum and grouped in fours to find the bar. The lowest level shows timing deviations around the fixed metrical grid.

Figure 1: Diagram of relationships between metrical levels.

Perception of rhythm by humans has long been an active area of research and there is a large body of literature on the subject. Drake et al. [18] found that humans with no musical training were able to tap along to a musical audio sample "in time with the music," though trained musicians were able to do this more accurately. Many other studies have been undertaken into the perception of simple rhythmic patterns (e.g., Povel and Essens [19]) and various models of beat perception have been proposed (e.g., [20, 21, 22]) from which ideas can be gleaned. However, the models presented in the rest of this paper are not intended as perceptual models or even as perceptually motivated models; they are engineering equivalents of the human perception. Having said that, it is hoped that a successful computer algorithm could help shed light onto potential and as yet unexplained human cognitive processes.

³A fuller discussion on this topic can be found in [6].

2.1. Problem statement

To summarise, the aim of this investigation is to extract the beat from music as defined by the preferred human tapping tempo; to make the computer tap its hypothetical foot along in time to the music. This requires a tempo process to be explicitly estimated in both frequency and phase, a beat lying where the phase is zero. In the process of this, detected "notes" in the audio are assigned "score locations," which is equivalent to quantising them to an underlying, idealised metrical grid. We are not interested in real-time implementation nor in causal beat tracking, where only data up to the currently considered time is used for estimation.

3. PARTICLE FILTERING

Particle filters are a sequential Monte Carlo estimation method which is powerful, versatile, and increasingly used in tracking problems. Consider the state space system defined by

x_k = f_k\left(x_{k-1}, \xi_k\right),  (1)

where f_k : \mathbb{R}^{n_x} \times \mathbb{R}^{n_\xi} \to \mathbb{R}^{n_x}, k \in \mathbb{N}, is a possibly nonlinear function of the state x_{k-1}, of dimension n_x, and \xi_k, which is an i.i.d. noise process of dimension n_\xi. The objective is to estimate x_k given the observations

y_k = h_k\left(x_k, \nu_k\right),  (2)

where h_k : \mathbb{R}^{n_x} \times \mathbb{R}^{n_\nu} \to \mathbb{R}^{n_y} is a separate possibly nonlinear transform and \nu_k is a separate i.i.d. noise process of dimension n_\nu describing the observation error.

The posterior of interest is given by p(x_{0:k} \mid y_{1:k}), which is represented in particle filters by a set of point estimates or particles \{x_{0:k}^{(i)}, w_k^{(i)}\}_{i=1}^N, where \{x_{0:k}^{(i)}, i = 1, \ldots, N\} is a set of support points with associated weights \{w_k^{(i)}, i = 1, \ldots, N\}. The weights are normalised such that \sum_{i=1}^N w_k^{(i)} = 1. The posterior is then approximated by

p\left(x_{0:k} \mid y_{1:k}\right) \approx \sum_{i=1}^{N} w_k^{(i)}\, \delta\left(x_{0:k} - x_{0:k}^{(i)}\right).  (3)

As N \to \infty, this approximation asymptotically tends to the true posterior. The particles are drawn according to importance sampling, x_{0:k}^{(i)} \sim \pi(x_{0:k} | y_{1:k}), where \pi(\cdot) is the so-called importance density. The weights are then given by

    w_k^{(i)} \propto \frac{p(x_{0:k}^{(i)} | y_{1:k})}{\pi(x_{0:k}^{(i)} | y_{1:k})}.    (4)

If we restrict ourselves to importance functions of the form

    \pi(x_{0:k} | y_{1:k}) = \pi(x_k | x_{0:k-1}, y_{1:k}) \pi(x_{0:k-1} | y_{1:k-1}),    (5)

implying a Markov dependency of order 1, the posterior can be factorised to give

    p(x_{0:k} | y_{1:k}) = \frac{p(y_k | x_{0:k}, y_{1:k-1}) p(x_k | x_{0:k-1}, y_{1:k-1})}{p(y_k | y_{1:k-1})} \times p(x_{0:k-1} | y_{1:k-1})
                        \propto p(y_k | x_{0:k}, y_{1:k-1}) p(x_k | x_{0:k-1}, y_{1:k-1}) p(x_{0:k-1} | y_{1:k-1}),    (6)

which allows sequential update. The weights can then be shown [23] to update according to

    w_k^{(i)} \propto w_{k-1}^{(i)} \frac{p(y_k | x_k^{(i)}) p(x_k^{(i)} | x_{k-1}^{(i)})}{\pi(x_k^{(i)} | x_{0:k-1}^{(i)}, y_{1:k})},    (7)

up to a proportionality. Often we are interested in the filtered estimate p(x_k | y_{1:k}), which can be approximated by

    p(x_k | y_{1:k}) \approx \sum_{i=1}^N w_k^{(i)} \delta(x_k - x_k^{(i)}).    (8)

Particle filters often suffer from degeneracy, as all but a small number of weights drop to almost zero; a measure of this is approximated by N_{eff} = 1 / \sum_{i=1}^N (w_k^{(i)})^2 [23]. Good choice of the importance density \pi(x_k | x_{0:k-1}, y_{1:k}) can delay this and is crucial to general performance. The introduction of a stochastic jitter into the particle set can also help [24]; however, the most common solution is to perform resampling [25], whereby particles with small weights are eliminated and a new sample set \{x_k^{(i)*}\}_{i=1}^N is generated by resampling N times from the approximate posterior given by (8) such that \Pr(x_k^{(i)*} = x_k^{(j)}) = w_k^{(j)}. The new sample set is then more closely distributed according to the true posterior, and the weights should be set to w_k^{(i)} = 1/N to reflect this. Further details on particle filtering can be found in [23, 26].

A special case of model is the jump Markov linear system (JMLS) [27], where the state space x_{0:k} can be broken down into \{r_{0:k}, z_{0:k}\}. Here r_{0:k}, the jump Markov process, defines a path through a bounded and discrete set of potential states and, conditional upon r_{0:k}, z_{0:k} is then defined to be linear Gaussian. The chain rule gives the expansion

    p(r_{0:k}, z_{0:k} | y_{1:k}) = p(z_{0:k} | r_{0:k}, y_{1:k}) p(r_{0:k} | y_{1:k}),    (9)

and p(z_{0:k} | r_{0:k}, y_{1:k}) is deterministically evaluated via the Kalman filter equations given below in Section 5. After this marginalisation process (called Rao-Blackwellisation [28]), p(r_{0:k} | y_{1:k}) is then expanded as

    p(r_{0:k} | y_{1:k}) \propto p(y_k | r_{0:k}, y_{1:k-1}) p(r_k | r_{k-1}) \times p(r_{0:k-1} | y_{1:k-1}),    (10)

with associated (unnormalised) importance weights given by

    w_k^{(i)} \propto w_{k-1}^{(i)} \frac{p(y_k | r_{0:k}^{(i)}, y_{1:k-1}) p(r_k^{(i)} | r_{k-1}^{(i)})}{\pi(r_k^{(i)} | r_{0:k-1}^{(i)}, y_{1:k})}.    (11)

By splitting the state space up in this way, the dimensionality considered in the particle filter itself is dramatically decreased, and the number of particles needed to achieve a given accuracy is also significantly reduced.
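The sequential importance sampling and resampling scheme just described can be sketched as follows. The scalar random-walk state, the noise levels, and the N_eff < N/2 resampling trigger are illustrative choices for the sketch, not the tempo model of this paper:

```python
import numpy as np

# Minimal bootstrap particle filter: propagate through the state prior,
# reweight by the likelihood as in (7), monitor degeneracy via
# N_eff = 1 / sum_i (w^(i))^2, and resample with Pr(x* = x^(j)) = w^(j).
rng = np.random.default_rng(1)
N, T = 500, 50
true_x, xs = 0.0, []
particles = rng.normal(0.0, 1.0, N)
weights = np.full(N, 1.0 / N)
for k in range(T):
    true_x += rng.normal(0.0, 0.1)                  # latent state evolves
    y = true_x + rng.normal(0.0, 0.5)               # noisy observation
    particles += rng.normal(0.0, 0.1, N)            # propagate via the prior
    weights *= np.exp(-0.5 * ((y - particles) / 0.5) ** 2)  # likelihood
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < N / 2:          # degeneracy detected
        idx = rng.choice(N, size=N, p=weights)      # resample by weight
        particles = particles[idx]
        weights = np.full(N, 1.0 / N)               # reset to 1/N
    xs.append(np.sum(weights * particles))          # filtered estimate (8)
```

The filtered estimates `xs` track the latent random walk; deleting the resampling branch makes the weight variance grow until one particle dominates, which is exactly the degeneracy discussed above.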
4. CHANGE DETECTION

The success of any algorithm is dependent upon the reliability of the data which is provided as an input. Thus, detecting note events in the music for the particle filtering algorithms to track is as important as the actual algorithms themselves. Onset detection falls into two categories: firstly, there is detection of transient events which are associated with strong energy changes, epitomised by drum sounds; secondly, there is detection of harmonic changes without large associated energy changes (e.g., in a string quartet). To implement the first of these, our method approximately follows many algorithms in the literature [7, 11, 12]: frequency bands, f, are separated and an energy evolution envelope E_f(n) formed. A three-point fit is used to find the gradient of E_f(n), and peaks in this gradient function are detected (equivalent to finding sharp, positive increases in energy which hopefully correspond to the start of notes). Low-energy onsets are ignored and, when there are closely spaced pairs of onsets, the lower-amplitude one is also discarded. Three frequency bands were used: 20-200 Hz, to capture low-frequency information; 200 Hz-15 kHz, which captures most of the harmonic spectral region; and 15-22 kHz which, contrary to the opinion of Duxbury [29], is generally free from harmonic sounds and therefore clearly shows any transient information.

Harmonic change detection is a harder problem and has received very little attention in the past, though two recent studies have addressed this [29, 30]. To separate harmonics in the frequency domain, long short-time Fourier transform (STFT) windows (4096 samples) with a short hop rate (1/8 frame) were used. As a measure of spectral change from one frame to the next, a modified Kullback-Leibler distance measure was used:

    d_n(k) = \log_2 \left( \frac{X[k, n]}{X[k, n-1]} \right),
    D_{MKL}(n) = \sum_{k \in K, \; d_n(k) > 0} d_n(k),    (12)

where X[k, n] is the STFT with time index n and frequency bin k. The modified measure is thus tailored to accentuate positive energy change. K defines the region 40 Hz-5 kHz, where the majority of harmonic energy is to be found. To pick peaks, a local average of the function D_{MKL} was formed, and the maximum was then picked between each pair of crossings of the actual function and its average.
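The one-sided character of the measure in (12) can be sketched as follows; the "frames" here are synthetic magnitude arrays rather than real STFT columns, purely for illustration:

```python
import numpy as np

def d_mkl(frame_prev, frame_curr, eps=1e-12):
    """Modified Kullback-Leibler distance of (12): sum of the
    positive log2 bin ratios only, so that appearing energy counts
    while decaying energy is ignored."""
    d = np.log2((frame_curr + eps) / (frame_prev + eps))
    return np.sum(d[d > 0])

quiet = np.full(64, 1e-3)        # a frame with little energy
onset = quiet.copy()
onset[:8] = 1.0                  # new energy appears in 8 bins
rising = d_mkl(quiet, onset)     # energy appearing: large distance
falling = d_mkl(onset, quiet)    # energy decaying: contributes nothing
```

`rising` is large while `falling` is zero, which is the desired asymmetry: note starts register strongly, note decays do not.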
A further discussion of the MKL measure can be found in [31], but a comprehensive analysis is beyond the scope of this paper. For beat tracking purposes, it is desirable to have a low false detection rate, though missed detections are not so important. While no actual false alarm rates have been determined, the average detected inter-onset interval (IOI) was compared with an estimate given by T/(N_b \times F), where T is the length of the example in seconds, N_b is the number of manually labelled beats, and F is the number of tatums in a beat. The detected average IOI was always of the order of, or larger than, the estimate, which shows that under-detection is occurring.

In summary, there are four vectors of onset observations: three from energy change detectors and one from a harmonic change detector. The different detectors may all observe an actual note, or any combination of them might not. In fact, clustering of the onset observations from each of the individual detection functions is performed prior to the start of particle filtering. A group is formed if any events from different streams fall within 50 ms of each other for transient onsets, and 80 ms for harmonic onsets (reflecting the lower time resolution inherent in the harmonic detection process). Inspection of the resulting grouped onsets shows that the inter-group separation is usually significantly more than the within-group time differences. A set of amplitudes is then associated with each onset cluster.
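The pre-filter grouping step can be sketched as below. The 50 ms gap follows the text for transient onsets (80 ms would be used for harmonic ones), while the event times themselves are invented for illustration:

```python
# Group pooled onset times (seconds) from all detector streams:
# successive events closer than `gap` join the same cluster.
def cluster_onsets(times, gap=0.05):
    times = sorted(times)
    clusters, current = [], [times[0]]
    for t in times[1:]:
        if t - current[-1] <= gap:
            current.append(t)        # within gap of previous event: merge
        else:
            clusters.append(current) # gap exceeded: start a new cluster
            current = [t]
    clusters.append(current)
    return clusters

events = [0.10, 0.12, 0.13, 0.50, 0.53, 1.00]   # pooled from all streams
groups = cluster_onsets(events)
# groups -> [[0.1, 0.12, 0.13], [0.5, 0.53], [1.0]]
```

Each resulting cluster becomes one entry of the variable-length observation vector y_k described in Section 5, with its associated set of amplitudes.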
5. BEAT MODEL 1

The model used in this section is loosely based on that of Cemgil et al. [16], designed for MIDI signals. Given the series of onset observations generated as above, the problem is to find a tempo profile which links them together and to assign each observation to a quantised score location.

The system can be represented as a JMLS where, conditional on the "jump" parameter, the system is linear Gaussian and the traditional Kalman filter can be used to evaluate the sequence likelihood. The system equations are then

    x_k = \Phi_k(\gamma_k) x_{k-1} + \xi_k,    (13)
    y_k = H_k x_k + \nu_k,    (14)

where x_k is the tempo process at iteration k and can be described as x_k = [\rho_k, \Delta_k]^T. Here \rho_k is the predicted time of the kth observation and \Delta_k the tempo period, that is, \Delta_k = 60/T_k, where T_k is the tempo in beats per minute (bpm). This is equivalent to a constant velocity process, and the state innovation \xi_k is modelled as zero-mean Gaussian with covariance Q_k.

To solve the quantisation problem, the score location is encoded as the jump parameter, \gamma_k, in \Phi_k(\gamma_k). This is equivalent to deciding upon the notation that describes the rhythm of the observed notes. \Phi_k(\gamma_k) is then given by

    \Phi_k(\gamma_k) = \begin{bmatrix} 1 & \gamma_k \\ 0 & 1 \end{bmatrix},
    \gamma_k = c_k - c_{k-1}.    (15)

The associated evolution covariance matrix is [32]

    Q_k = q \begin{bmatrix} \gamma_k^3/3 & \gamma_k^2/2 \\ \gamma_k^2/2 & \gamma_k \end{bmatrix},    (16)

for a continuous constant velocity process which is observed at discrete time intervals, where q is a scale parameter.

While the state transition matrix is dependent upon \gamma_k, this is a difference term between two actual locations, c_k and c_{k-1}. It is this process which is important, and the prior on c_k becomes a critical issue as it determines the performance characteristics. Cemgil breaks a single beat into subdivisions of two and uses a prior related to the number of significant digits in the binary expansion of the quantised location. Cemgil's application was in MIDI signals, where there is 100% reliability in the data and the onset times are accurate. In audio signals, the event detection process introduces errors both in localisation accuracy and in generating entirely spurious events. Also, Cemgil's prior cannot cope with triplet figures or swing. Thus, we break the notated beat down into 24 quantised sub-beat locations, c_k = \{1/24, 2/24, \ldots, 24/24, 25/24, \ldots\}, and assign a prior

    p(c_k) \propto \exp(-\log_2 d(c_k)),    (17)

where d(c_k) is the denominator of the fraction of c_k when expressed in its most reduced form; that is, d(3/24) = 8, d(36/24) = 2, and so forth. This prior is motivated by the simple concern of making metrically stronger sub-beat locations more likely; it is a generic prior designed to work with all styles and situations.

Finally, the observation model must be considered. Bearing in mind the pre-processing step of clustering onset observations from different observation functions, the input to the particle filter at each step, y_k, will be a variable-length vector containing between one and four individual onset observation times. Thus, H_k becomes a function of the length j of the observation vector y_k, but is essentially j rows of the form [1 0]. The observation error \nu_k is also of length j and is modelled as zero-mean Gaussian with diagonal covariance R_k, where the elements r_{jj} are related to whichever observation vector is being considered at y_k(j).
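The prior of (17) depends only on the reduced denominator d(c_k), which Python's `fractions` module supplies directly. This sketch (illustrative code, not the paper's) reproduces the quoted values d(3/24) = 8 and d(36/24) = 2:

```python
from fractions import Fraction
import math

def d(c_num, grid=24):
    """Denominator of c = c_num/grid in lowest terms, as used in (17)."""
    return Fraction(c_num, grid).denominator

def prior(c_num, grid=24):
    """Unnormalised p(c_k) proportional to exp(-log2 d(c_k))."""
    return math.exp(-math.log2(d(c_num, grid)))

# Metrically stronger locations get more mass: the beat (24/24) is
# preferred over the half-beat (12/24), which beats the 1/8 position (3/24).
```

Note that exp(-log2 d) decays smoothly with the denominator, so on-beat, half-beat, and quaver positions are ordered by metrical strength without any style-specific tuning, which is exactly the generic behaviour the text asks of this prior.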
Thus, conditional upon the c_k process which defines the update rate, everything is modelled as linear Gaussian and the traditional Kalman filter [33] can be used. This is given by the recursion

    x_{k|k-1} = \Phi_k x_{k-1|k-1},
    P(k|k-1) = \Phi_k P(k-1|k-1) \Phi_k^T + Q_k,
    K(k) = P(k|k-1) H_k^T \left( H_k P(k|k-1) H_k^T + R_k \right)^{-1},
    x_{k|k} = x_{k|k-1} + K(k) \left( y_k - H_k x_{k|k-1} \right),
    P(k|k) = \left( I - K(k) H_k \right) P(k|k-1).    (18)

Each particle must maintain its own covariance estimate P(k|k) as well as its own state. The innovation or residual vector is defined to be the difference between the measured and predicted quantities,

    \tilde{y}_k = y_k - H_k x_{k|k-1},    (19)

and has covariance given by

    S_k = H_k P_{k|k-1} H_k^T + R_k.    (20)
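One cycle of the recursion in (18) can be sketched as follows. The numerical values are illustrative; \Phi_k is built from a jump of \gamma_k = 1 (one whole beat) as in (15), and H observes the predicted time \rho_k as in (14):

```python
import numpy as np

def kalman_step(x, P, y, Phi, Q, H, R):
    """One predict/update cycle of (18) for the state x = [rho, Delta]^T."""
    x_pred = Phi @ x                           # x_{k|k-1}
    P_pred = Phi @ P @ Phi.T + Q               # P(k|k-1)
    S = H @ P_pred @ H.T + R                   # innovation covariance (20)
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain K(k)
    innov = y - H @ x_pred                     # innovation (19)
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

gamma = 1.0                                    # gamma_k = c_k - c_{k-1}
Phi = np.array([[1.0, gamma], [0.0, 1.0]])     # transition matrix of (15)
H = np.array([[1.0, 0.0]])                     # observe the time rho only
x, P = np.array([0.0, 0.5]), np.eye(2)         # rho = 0 s, period = 0.5 s
x, P = kalman_step(x, P, np.array([0.55]), Phi,
                   0.01 * np.eye(2), H, 0.1 * np.eye(1))
```

After one update the predicted onset time moves from 0.5 s towards the observation at 0.55 s, and the covariance contracts, mirroring the per-particle Kalman bookkeeping described above.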
5.1. Amplitude modelling

The algorithm as described so far will assign the beat (i.e., the phase of c_{1:k}) to the most frequent subdivision, which may not be the right one. To aid the correct determination of phase, attention is turned to the amplitude of the onsets. The assumption is made that the onsets at some score locations (e.g., on the beat) will have higher energy than others. Each of the three transient onset streams maintains a separate amplitude process, while the harmonic onset stream does not have one associated with it, amplitude not being relevant for this feature.

The amplitude processes can be represented as separate JMLSs conditional upon c_k. The state equations are given by

    \alpha_p^l = \Theta_p^l \alpha_{p-1}^l + \epsilon_p,
    a_p^l = \alpha_p^l + \sigma_p,    (21)

where a_p^l is the amplitude of the pth onset from observation stream l. Thus, an individual process is maintained for each observation function and updated only when a new observation from that stream is encountered. This requires the introduction of conditioning on p rather than k; 1:p then represents all the indices within the full set 1:k where an observation from stream l is found. \Theta_p^l(c_{p-1}, c_p) is a function of c_p and c_{p-1}. To build up the matrix \Theta_p^l, a selection of real data was examined and a 24 \times 24 matrix constructed for the expected amplitude ratio between a pair of score locations. This is then indexed by the currently considered score location c_p and also the previously identified one found in stream l, c_{p-1}^l, and the value given is returned to \Theta_p^l. For example, the expected amplitude for a beat could be modelled as twice that of a quaver off-beat: if the particle history shows that the previous onset from a given stream was assigned to be on the beat and the currently considered location is a quaver, \Theta_p^l would equal 0.5. This relative relationship allows the same model to cope with both quiet and loud sections in a piece. The evolution and observation error terms, \epsilon_p and \sigma_p, are assumed to be zero-mean Gaussian with appropriate variances.

From now on, to avoid complicating the notation, the amplitude process will be represented without sums or products over the three vectors, using a_p = \{a_p^1, a_p^2, a_p^3\} and \alpha_p = \{\alpha_p^1, \alpha_p^2, \alpha_p^3\} (noting that some of these might well be given a null value at any given iteration). For each iteration k, between zero and all three of the amplitude processes will be updated.

5.2. Methodology

Given the above system, a particle filtering algorithm can be used to estimate the posterior at any given iteration. The posterior which we wish to estimate is given by p(c_{1:k}, x_{1:k}, \alpha_{1:p} | y_{1:k}, a_{1:p}), but Rao-Blackwellisation breaks down the posterior into separate terms,

    p(c_{1:k}, x_{1:k}, \alpha_{1:p} | y_{1:k}, a_{1:p})
        = p(x_{1:k} | c_{1:k}, y_{1:k}) \times p(\alpha_{1:p} | c_{1:k}, a_{1:p}) p(c_{1:k} | y_{1:k}, a_{1:p}),    (22)

where p(x_{1:k} | c_{1:k}, y_{1:k}) and p(\alpha_{1:p} | c_{1:k}, a_{1:p}) can be deduced exactly by use of the traditional Kalman filter equations. Thus the only space to search over and perform recursion upon is that defined by p(c_{1:k} | y_{1:k}, a_{1:p}). This space is discrete but too large to enumerate all possible paths; thus we turn to the approximation approach offered by particle filters.

By assuming that the distribution of c_k is dependent only upon c_{1:k-1}, y_{1:k}, and a_{1:p}, the importance function can be factorised into terms such as \pi(c_k | y_{1:k}, a_{1:p}, c_{1:k-1}). This allows recursion of the Rao-Blackwellised posterior

    p(c_{1:k} | y_{1:k}, a_{1:p})
        \propto p(y_k, a_p | y_{1:k-1}, a_{1:p-1}, c_{1:k}) \times p(c_k | c_{k-1}) p(c_{1:k-1} | y_{1:k-1}, a_{1:p-1}),    (23)

where

    p(y_k, a_p | y_{1:k-1}, a_{1:p-1}, c_{1:k}) = p(y_k | y_{1:k-1}, c_{1:k}) \times p(a_p | a_{1:p-1}, c_{1:k}),    (24)

and recursive updates to the weights are given by

    w_k^{(i)} = w_{k-1}^{(i)} \times \frac{p(y_k | y_{1:k-1}, c_{1:k}^{(i)}) p(a_p | a_{1:p-1}, c_{1:k}^{(i)}) p(c_k^{(i)} | c_{k-1}^{(i)})}{\pi(c_k^{(i)} | y_{1:k}, a_{1:p}, c_{1:k-1}^{(i)})}.    (25)

5.3. Algorithm

The algorithm therefore proceeds as given in Algorithm 1. At each iteration, each particle is propagated to a set of new score locations and the probability of each is evaluated. Given the N \times S set of potential states, there are then two ways of choosing a new set of updated particles: either stochastic or deterministic selection. The first proceeds in a similar manner to that described by Cemgil [16], where for each particle the new state is picked from the importance function with a given probability. Deterministic selection simply takes the best N particles from the whole set of propagated particles. Fully stochastic resampling selection of the particles is not an optimal procedure in this case, as duplication of particles is wasteful. This leaves a choice between Cemgil's method of stochastically selecting one of the update proposals for each particle and the deterministic N-best approach. The latter has been adopted as intuitively sensible.

Algorithm 1: Rao-Blackwellised particle filter.
    For k = 1:
        for i = 1:N, draw x_1^{(i)}, \alpha_1^{(i)}, and c_1^{(i)} from their respective priors.
    For k = 2:end:
        for i = 1:N:
            Propagate particle i to a set, s = \{1, \ldots, S\}, of new locations c_k^{(s)}.
            Evaluate the new weight w_k^{(s,i)} for each of these by propagating through the respective Kalman filter.
            This generates \pi(c_k | y_{1:k}, a_{1:p}, c_{1:k-1}^{(i)}).
        for i = 1:N:
            Pick a new state for each particle from \pi(c_k | y_{1:k}, a_{1:p}, c_{1:k-1}^{(i)}).
            Update weights according to (25).

The terms p(y_k | y_{1:k-1}, c_{1:k}) and p(a_p | a_{1:p-1}, c_{1:k}) are calculated from the innovation vector and covariance of the respective Kalman filters (see (19) and (20)). p(c_k | c_{k-1}) is simplified to p(c_k) and is hence the prior on score location as given in Section 5.

Particle filters suffer from degeneracy in that the posterior will eventually be represented by a single particle with high weight while many particles have negligible probability mass. Traditional PFs overcome this with resampling (see [23]), but both methods for particle selection in the previous section implicitly include resampling. However, degeneracy still exists, in that the PF will tend to converge to a single c_k state, so a number of methods were explored for increasing the diversity of the particles. Firstly, jitter [24] was added to the tempo process to increase local diversity. Secondly, a Metropolis-Hastings (MH) step [34] was used to explore jumps to alternative phases of the signal (i.e., to jump from tracking off-beat quavers to being on the beat). Also, an MH step to propose related tempos (i.e., doubling or halving the tracked tempo) was investigated but found to be counterproductive.
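The deterministic N-best selection step of Algorithm 1 can be sketched as follows; the candidate weights are random stand-ins for the w_k^{(s,i)} that the Kalman filters would supply:

```python
import numpy as np

# N particles each propose S candidate score locations; keep the N
# candidates with the highest weights across the whole N x S set.
rng = np.random.default_rng(3)
N, S = 4, 3
cand_weights = rng.random((N, S))      # stand-ins for w_k^(s,i)
flat = cand_weights.ravel()
keep = np.argsort(flat)[-N:]           # indices of the N best overall
parent = keep // S                     # which particle each survivor extends
proposal = keep % S                    # which of its S proposals survived
```

Because selection is a straight top-N over the pooled candidates, a single strong particle may contribute several survivors while weak particles contribute none, which is how the scheme keeps distinct c_{1:k} paths alive without wasteful duplication.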

6. BEAT MODEL 2

The model described above formulates beat location as the free variable and time as a dependent, non-continuous variable, which seems counter-intuitive. Noting that the model is bilinear, a reformulation of the tempo process is thus presented now where time is the independent variable and tempo is modelled as Brownian motion⁴ [35]. The state vector is now given by z_k = [\tau_k, \dot{\tau}_k]^T, where \tau_k is in beats and \dot{\tau}_k is in beats per second (obviously related to bpm). Brownian motion, which is a limiting form of the random walk, is related to the tempo process by

    d\dot{\tau}(t) = q \, dB(t) + \dot{\tau}(0),    (26)

where q controls the variance of the Brownian motion process B(t) (which is loosely the integral of a Gaussian noise process [32]) and hence the state evolution. This leads to

    \tau(t) = \tau(0) + \int_0^t \dot{\tau}(s) \, ds.    (27)

Time t is now a continuous variable and hence \tau(t) is also a continuously varying parameter, though it is only "read" at algorithmic iterations k, giving \tau_k \triangleq \tau(t_k). The new state equations are given by

    z_k = \Xi(\delta_k) z_{k-1} + \beta_k,    (28)
    y_k = \Gamma_k t_k + \kappa_k,    (29)

where

    t_k = t_0 + \sum_{j=1}^k \delta_j.    (30)

Here t_k is the absolute time of an observation and \delta_k is the inter-observation time. \Xi(\delta_k) is the state update matrix and is given by

    \Xi(\delta_k) = \begin{bmatrix} 1 & \delta_k \\ 0 & 1 \end{bmatrix}.    (31)

\Gamma_k acts in a similar manner to H_k in model one and is of variable length, being a vector of ones of the same length as y_k. \kappa_k is modelled as zero-mean Gaussian with covariance R_k as described above. \beta_k is modelled as zero-mean Gaussian noise with covariance given, as before, by

    Q_k = q \begin{bmatrix} \delta_k^3/3 & \delta_k^2/2 \\ \delta_k^2/2 & \delta_k \end{bmatrix}.    (32)

One of the problems associated with Brownian motion is that there is no simple, closed-form solution for the prediction density p(t_k | \cdot). Thus attention is turned to an alternative method for drawing a hitting-time sample of \{t_k | z_{k-1}, \tau_k = B, t_{k-1}\}.

⁴Also termed a Wiener or Wiener-Lévy process.
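Discretised at the observation times, the tempo model of (28) and (31)-(32) can be simulated as below; the scale q and the inter-observation intervals \delta_k are illustrative values, not fitted ones:

```python
import numpy as np

# Simulate z_k = Xi(delta_k) z_{k-1} + beta_k with beta_k ~ N(0, Q(delta_k)),
# for z = [tau, tau_dot]^T (tau in beats, tau_dot in beats per second).
def Xi(delta):
    return np.array([[1.0, delta], [0.0, 1.0]])          # as in (31)

def Q(delta, q=0.01):
    return q * np.array([[delta**3 / 3, delta**2 / 2],   # as in (32)
                         [delta**2 / 2, delta]])

rng = np.random.default_rng(2)
z = np.array([0.0, 2.0])                 # start at 2 beats/s (120 bpm)
for delta in [0.5, 0.5, 0.25]:           # inter-observation times, seconds
    z = Xi(delta) @ z + rng.multivariate_normal(np.zeros(2), Q(delta))
```

With small q the trajectory stays near the deterministic constant-tempo path (here, about 2.5 beats elapsed at roughly 2 beats/s), while larger q lets the tempo wander, which is the role the text assigns to q in the Brownian motion.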

This is an iterative process and, conditional upon initial conditions, a linear prediction for the time of the new beat is made. The system is then stochastically propagated for this length of time and a new tempo and beat position found. The beat position might under- or overshoot the intended location. If it undershoots, the above process is repeated. If it overshoots, then an interpolation estimate is made conditional upon both the previous and subsequent data estimates. The iteration terminates when the error on \tau_t falls below a threshold. At this point, the algorithm returns the hitting time t_k and the new tempo \dot{\tau}_k at that hitting time. This is laid out explicitly in Algorithm 2, where \Xi_i is given by

    \Xi_i = \begin{bmatrix} 1 & dt \\ 0 & 1 \end{bmatrix}    (33)

and Q_i by

    Q_i = q \begin{bmatrix} dt^3/3 & dt^2/2 \\ dt^2/2 & dt \end{bmatrix}.    (34)

N denotes the Gaussian distribution. The interpolation mean and covariance are given by [36]

    Q = \left( Q_{I:J}^{-1} + \Xi_{J:I+1} Q_{J:I+1}^{-1} \Xi_{J:I+1} \right)^{-1},    (35)
    m = Q \left( Q_{I:J}^{-1} \Xi_{I:J} z_I + \Xi_{J:I+1}^T Q_{J:I+1}^{-1} z_{I+1} \right),    (36)

where the index denotes the use of \Xi and Q from (33) and (34) with appropriate values of dt.

Algorithm 2: Sample hitting time.
    Initialise: i = 1; z_1 = z_k; X_k is the predicted inter-onset number of beats.
    While dt > tol:
        i = i + 1; propagate z_i forward over dt using \Xi_i and Q_i.
        If max(\tau_{1:i}) > X_k:
            t_e = t_I + (t_{I+1} - t_I) \times (X_k - \tau_I)/(\tau_{I+1} + \tau_I)
            insert state J between I and I + 1, with t_J = t_e
            dt = min(t_{I+1} - t_e, t_e - t_I)
            Draw z_J \sim N(m, Q), where m and Q are given in (35) and (36).
    q = Index(min |\tau_{1:i} - X_k|).
    Return \tau_k = X_k, t_k = t_q, and \dot{\tau}_k = \dot{\tau}_q.

Thus, we now have a method of drawing a time t_k and a new tempo \dot{\tau}_k given a previous state z_{k-1} and a proposed new score (beat) location \tau_k. The algorithm then proceeds as before with a particle filter. The posterior can be updated thus:

    p(z_{1:k}, t_{1:k} | y_{1:k}) \propto p(y_k | z_{1:k}, t_{1:k}) p(t_k | t_{1:k-1}, z_{1:k}) p(z_k | z_{1:k-1}) \times p(z_{1:k-1}, t_{1:k-1} | y_{1:k-1}),

where

    p(z_k | z_{1:k-1}) = p(\tau_k | z_{k-1}) p(\dot{\tau}_k | z_{k-1}, \tau_k).    (37)

Prior importance sampling [23] is used via the hitting time algorithm above to draw samples of \dot{\tau}_k and t_k:

    \pi(z_k, t_k | z_{1:k-1}, t_{1:k-1}, y_{1:k}) = p(\dot{\tau}_k | z_{k-1}, \tau_k) p(t_k | t_{1:k-1}, z_{1:k}).    (38)

This leads to the weight update being given by

    w_k^{(i)} = w_{k-1}^{(i)} \times p(y_k | z_{1:k}^{(i)}, t_{1:k}^{(i)}) p(\tau_k^{(i)} | z_{k-1}^{(i)}).    (39)

As before in Section 5, a single beat is split into 24 subdivisions and a prior set upon these as given above in (17); p(\tau_k | z_{k-1}) again reduces to p(\tau_k) \equiv p(c_k). p(y_k | z_{1:k}^{(i)}, t_{1:k}^{(i)}) is the likelihood; if \kappa_k from (29) is modelled in the same way as \nu_k from (14), then the likelihood is Gaussian with covariance again given by R_k, which is diagonal and of the same dimension, j, as the observation vector y_k. \Gamma_k is then a j \times 1 matrix with all entries being 1.

Also as before, to explore the beat quantisation space \tau_{1:k} effectively, each particle is predicted onward to S new positions for \tau_k, and therefore again a set of N \times S potential particles is generated. Deterministic selection in this setting is not appropriate, so resampling is used to stochastically select N particles from the set. This acts instead of the traditional resampling step in selecting high-probability particles. Amplitude modelling is also included, in an identical form to that described in Section 5.1, which modifies (39) to

    w_k^{(i)} = w_{k-1}^{(i)} \times p(y_k | z_{1:k}^{(i)}, t_{1:k}^{(i)}) p(a_p | z_{1:k}^{(i)}, t_{1:k}^{(i)}) p(\tau_k^{(i)} | z_{k-1}^{(i)}).    (40)

Also, the MH step described in Section 5.3 to explore different phases of the beat is used again.

7. RESULTS

The algorithms described above in Sections 5 and 6 have been tested on a large database of musical examples drawn from a variety of genres and styles, including rock/pop, dance, classical, folk, and jazz. 200 samples, averaging about one minute in length, were used, and a "ground truth" was manually generated for each by recording a trained musician clapping in time to the music.

The aim is to estimate the tempo and quantisation parameters over the whole dataset; in both models, the sequence of filtered estimates is not the best representation of this, due to locally unlikely data.

Therefore, because each particle maintains its own state history, the maximum a posteriori particle at the final iteration was chosen. The parameter sets used within each algorithm were chosen heuristically; it was deemed impractical to optimise them over the whole database. Various numbers of particles N were tried, though results are given below for N = 200 and N = 500 for models one and two, respectively. Above these values, performance continued to increase very slightly, as one would expect, but computational effort also increased proportionally.

Tracking was deemed to be accurate if the tempo was correct (interbeat interval matches to within 10%) and a beat was located within 15% of the annotated beat location.⁵ Klapuri [14] defines a measure of success as the longest consecutive region of beats tracked correctly as a proportion of the total (denoted "C-L" for consecutive-length). Also presented is a total percentage of correctly tracked beats (labelled "TOT"). The results are presented in Table 1. It was noted that the algorithms sometimes tracked at double or half tempo in psychologically plausible patterns; also, dance music with heavy off-beat accents often caused the algorithm to track 180° out of phase. The "allowed" columns of the table show results accepting these errors. Also shown for comparison are the results obtained using Scheirer's algorithm [7].

Table 1: Results for beat tracking algorithms expressed as a total percentage averaged over the whole database.

                   Raw            Allowed
                C-L    TOT     C-L    TOT
    Model 1     51.5   58.0    69.2   82.2
    Model 2     34.1   38.4    54.4   72.8
    Scheirer    26.8   41.9    33.0   53.4

⁵The clapped signals were often slightly in error themselves.

The current state of the art is the algorithm of Klapuri [14], with 69% success for longest consecutive sequence and 78% for total correct percentage (accepting errors) on his test database consisting of over 400 examples. Thus the performance of our algorithm is at least comparable with this.

Figure 2 shows the results for model one over the whole database graphically, while Figure 3 shows the same for model two. These are ordered by style and then by performance within the style category. Figure 4 shows the tempo profile for a correctly tracked example using model one; note the close agreement between the hand-labelled data and the tracked tempo.

Figure 2: Results on test database for model one. The solid line represents raw performance and the dashed line is performance after acceptable tracking errors have been taken into account. (a) Maximum length correct (% of total). (b) Total percentage correct.

8. DISCUSSION

The algorithms described above have some similar elements, but their fundamental operation is quite different: the Rao-Blackwellised model of Section 5 actually bears a significant resemblance to an interacting multiple models system of the type used in radar tracking [33], as many of the stages are actually deterministic. The second model, however, is much more typically a particle filter with mainly stochastic processes. Both have many underlying similarities in the model, though the inference processes are significantly different. Thus, the results highlight some interesting comparisons between these two philosophies.
On close examination, model two was better at finding the most likely local path through the data, though this was not necessarily the correct one in the long term. A fundamental weakness of the models is the prior on c_k (or equivalently, \tau_k in model two), which intrinsically prefers higher tempos: doubling a given tempo places more onsets in metrically stronger positions, which is deemed more likely by the prior given in (17). Because the stochastic resampling step efficiently selects and boosts high-probability regions of the posterior, model two would often pick high tempos to track (150-200 bpm), which accounts for the very low "raw" results.

A second problem also occurs in model two: because duplication of paths through the \tau_{1:k} space is necessary to fully populate each quantisation hypothesis, fewer distinct paths are kept at each iteration. By comparison, the N-best selection scheme of model one ensures that each particle represents a unique c_{1:k} set, and more paths through the state space are kept for a longer lag. This allows model one to recover better from a region of poor data. This also provides an explanation for why model one does not track at high tempo so often: because more paths through the state space are retained for longer, more time is allowed for the amplitude process to influence the choice of tempo mode. Thus, the conclusion is drawn that the first model is more attractive: the Rao-Blackwellisation of the tempo process allows the search of the quantisation space to be much more effective.

Figure 3: Results for model two. (a) Maximum length correct (% of total). (b) Total percentage correct.

Figure 4: Tempo evolution for a correctly tracked example using model one (Dave Matthews Band, "Best of What's Around" (live)), showing the hand-labelled tempo against the estimated tempo.

The remaining lack of performance can be attributed to four causes. The first is tracking at multiple tempo modes: sometimes tracking fails at one mode and settles a few beats later into a second mode, and the results only reflect one of these modes. Secondly, stable tracking sometimes occurs at psychologically implausible modes (e.g., 1.5 times the correct tempo), which are not included in the results above. The third cause is poor onset detection. Finally, there are also examples in the database which exhibit extreme tempo variation that is never followed.

The result of this is a number of suggestions for improvements. Firstly, the onset detection is crucial, and if the detected onsets are unreliable (especially at the start of an example) it is unlikely that the algorithm will ever be able to track the beat properly. This may suggest an "online" onset detection scheme where the particles propose onsets in the data, rather than the current offline, hard-decision system. The other potential scheme for overcoming this would be to propose a salience measure (e.g., [21]) and directly incorporate this into the state evolution process, thus hoping to differentiate between likely and unlikely beat locations in the data; currently, the Rao-Blackwellised amplitude process has been given weak variances and hence has little effect in the algorithm, other than to propose correct phase. The other problems commonly encountered were tempo errors by plausible ratios; Metropolis-Hastings steps [27] to explore other modes of the tempo posterior were tried but have met with little success.

Thus it seems likely that any real further improvement will have to come from music theory incorporated into the algorithm directly, and in a style-specific way; it is unlikely that a beat tracker designed for dance music will work well on choral music! Thus, data expectations and also anticipated tempo evolutions and onset locations would have to be worked into the priors in order to select the correct tempo. This will probably result in an algorithm with many ad hoc features but, given that musicians have spent the better part of 600 years trying to create music which confounds expectation, it is unlikely that a simple, generic model to describe all music will ever be found.

9. CONCLUSIONS

Two algorithms using particle filters for generic beat tracking across a variety of musical styles have been presented. One is based upon the Kalman filter and is close to a multiple hypothesis tracker. This performs better than a more stochastic implementation which models tempo as a Brownian motion process. Results with the first model are comparable with the current state of the art [14].
However, the advantage of particle filtering as a framework is that the model and the implementation are separated, allowing the easy addition of extra measures to discriminate the correct beat. It is conjectured that further improvement is likely to require music-specific knowledge.

ACKNOWLEDGMENTS

This work was partly supported by the research program BLISS (IST-1999-14190) from the European Commission. The first author is grateful to the Japan Society for the Promotion of Science and the Grant-in-Aid for Scientific Research in Japan for their funding. The authors thank P. Comon and C. Jutten for helpful comments and are grateful to the anonymous reviewers for their helpful suggestions which have greatly improved the presentation of this paper.

REFERENCES

[1] C. Raphael, “A probabilistic expert system for automatic musical accompaniment,” J. Comput. Graph. Statist., vol. 10, no. 3, pp. 486–512, 2001.
[2] F. Gouyon, L. Fabig, and J. Bonada, “Rhythmic expressiveness transformations of audio recordings: swing modifications,” in Proc. Int. Conference on Digital Audio Effects Workshop, London, UK, September 2003.
[3] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[4] E. D. Scheirer, “About this business of metadata,” in Proc. International Symposium on Music Information Retrieval, pp. 252–254, Paris, France, October 2002.
[5] J. Seppänen, “Tatum grid analysis of musical signals,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 131–134, New Paltz, NY, USA, October 2001.
[6] S. W. Hainsworth, Techniques for the automated analysis of musical audio, Ph.D. thesis, Cambridge University Engineering Department, Cambridge, UK, 2004.
[7] E. D. Scheirer, “Tempo and beat analysis of acoustical musical signals,” J. Acoust. Soc. Amer., vol. 103, no. 1, pp. 588–601, 1998.
[8] E. W. Large and M. R. Jones, “The dynamics of attending: How we track time varying events,” Psychological Review, vol. 106, no. 1, pp. 119–159, 1999.
[9] J. Foote and S. Uchihashi, “The beat spectrum: a new approach to rhythm analysis,” in Proc. IEEE International Conference on Multimedia and Expo, pp. 881–884, Tokyo, Japan, August 2001.
[10] M. Goto, “An audio-based real-time beat tracking system for music with or without drum-sounds,” J. of New Music Research, vol. 30, no. 2, pp. 159–171, 2001.
[11] S. Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. of New Music Research, vol. 30, no. 1, pp. 39–58, 2001.
[12] J. Laroche, “Estimating tempo, swing and beat locations in audio recordings,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 135–138, New Paltz, NY, USA, October 2001.
[13] C. Raphael, “Automated rhythm transcription,” in Proc. International Symposium on Music Information Retrieval, pp. 99–107, Bloomington, Ind, USA, October 2001.
[14] A. Klapuri, “Musical meter estimation and music transcription,” in Proc. Cambridge Music Processing Colloquium, pp. 40–45, Cambridge University, UK, March 2003.
[15] R. D. Morris and W. A. Sethares, “Beat tracking,” in 7th Valencia International Meeting on Bayesian Statistics, Tenerife, Spain, June 2002, personal communication with R. Morris.
[16] A. T. Cemgil and B. Kappen, “Monte Carlo methods for tempo tracking and rhythm quantization,” J. Artificial Intelligence Research, vol. 18, no. 1, pp. 45–81, 2003.
[17] J. A. Bilmes, “Timing is of the essence: perceptual and computational techniques for representing, learning and reproducing expressive timing in percussive rhythm,” M.S. thesis, Media Lab, MIT, Cambridge, Mass, USA, 1993.
[18] C. Drake, A. Penel, and E. Bigand, “Tapping in time with mechanical and expressively performed music,” Music Perception, vol. 18, no. 1, pp. 1–23, 2000.
[19] D.-J. Povel and P. Essens, “Perception of musical patterns,” Music Perception, vol. 2, no. 4, pp. 411–440, 1985.
[20] H. C. Longuet-Higgins and C. S. Lee, “The perception of musical rhythms,” Perception, vol. 11, no. 2, pp. 115–128, 1982.
[21] R. Parncutt, “A perceptual model of pulse salience and metrical accent in musical rhythms,” Music Perception, vol. 11, no. 4, pp. 409–464, 1994.
[22] M. J. Steedman, “The perception of musical rhythm and metre,” Perception, vol. 6, no. 5, pp. 555–569, 1977.
[23] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[24] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings Part F: Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[25] A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” Amer. Statist., vol. 46, no. 2, pp. 84–88, 1992.
[26] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[27] A. Doucet, N. J. Gordon, and V. Krishnamurthy, “Particle filters for state estimation of jump Markov linear systems,” IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 613–624, 2001.
[28] G. Casella and C. P. Robert, “Rao-Blackwellisation of sampling schemes,” Biometrika, vol. 83, no. 1, pp. 81–94, 1996.
[29] C. Duxbury, M. Sandler, and M. Davies, “A hybrid approach to musical note detection,” in Proc. 5th Int. Conference on Digital Audio Effects Workshop, pp. 33–38, Hamburg, Germany, September 2002.
[30] S. Abdallah and M. Plumbley, “Unsupervised onset detection: a probabilistic approach using ICA and a hidden Markov classifier,” in Proc. Cambridge Music Processing Colloquium, Cambridge, UK, March 2003.
[31] S. W. Hainsworth and M. D. Macleod, “Onset detection in musical audio signals,” in Proc. International Computer Music Conference, pp. 163–166, Singapore, September–October 2003.
[32] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association, vol. 179 of Mathematics in Science and Engineering, Academic Press, Boston, Mass, USA, 1988.
[33] S. S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems, Artech House, Norwood, Mass, USA, 1999.
[34] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Eds., Markov Chain Monte Carlo in Practice, Chapman & Hall, London, UK, 1996.
[35] B. Øksendal, Stochastic Differential Equations, Springer-Verlag, New York, NY, USA, 3rd edition, 1992.
[36] M. Orton and A. Marrs, “Incorporation of out-of-sequence measurements in non-linear dynamic systems using particle filters,” Tech. Rep., Cambridge University Engineering Department, Cambridge, UK, 2001.

Stephen W. Hainsworth was born in 1978 in Stratford-upon-Avon, England. During 8 years at the University of Cambridge, he was awarded the B.A. and M.Eng. degrees in 2000 and the Ph.D. in 2004, with the latter concentrating on techniques for the automated analysis of musical audio. Since graduating for the third time, he has been working in London for Tillinghast-Towers Perrin, an actuarial consultancy.

Particle Filtering Applied to Musical Tempo Tracking 2395

Malcolm D. Macleod was born in 1953 in Cathcart, Glasgow, Scotland. He received the B.A. degree in 1974, and the Ph.D. on discrete optimisation of DSP systems in 1979, from the University of Cambridge. From 1978 to 1988 he worked for Cambridge Consultants Ltd, on a wide range of signal processing, electronics, and software research and development projects. From 1988 to 1995 he was a Lecturer in the Signal Processing and Communications Group, the Engineering Department of Cambridge University, and from 1995 to 2002 he was the Department’s Director of Research. In November 2002 he joined the Advanced Signal Processing Group at QinetiQ, Malvern, as a Senior Research Scientist. He has published many papers in the fields of digital filter design, nonlinear filtering, adaptive filtering, efficient implementation of DSP systems, optimal detection, high-resolution spectrum estimation and beamforming, image processing, and applications in sonar, instrumentation, and communication systems.