
Proceedings of the 18th IFAC Symposium on System Identification (SYSID 2018), July 9-11, 2018, Stockholm, Sweden.
IFAC PapersOnLine 51-15 (2018) 670–675. Available online at www.sciencedirect.com. doi: 10.1016/j.ifacol.2018.09.207

Improving the particle filter in high dimensions using conjugate artificial process noise

Anna Wigren* Lawrence Murray* Fredrik Lindsten*

* Department of Information Technology, Uppsala University, Sweden. E-mail: {anna.wigren, lawrence.murray, fredrik.lindsten}@it.uu.se

⋆ This research is financially supported by the Swedish Research Council via the project Learning of Large-Scale Probabilistic Dynamical Models (contract number: 2016-04278) and by the Swedish Foundation for Strategic Research via the projects Probabilistic Modeling and Inference for Machine Learning (contract number: ICA16-0015) and ASSEMBLE (contract number: RIT15-0012).

Abstract: The particle filter is one of the most successful methods for state inference and identification of general non-linear and non-Gaussian models. However, standard particle filters suffer from degeneracy of the particle weights, in particular for high-dimensional problems. We propose a method for improving the performance of the particle filter for certain challenging state space models, with implications for high-dimensional inference. First we approximate the model by adding artificial process noise in an additional state update, then we design a proposal that combines the standard and the locally optimal proposal. This results in a bias-variance trade-off, where adding more noise reduces the variance of the estimate but increases the model bias. The performance of the proposed method is empirically evaluated on a linear-Gaussian state space model and on the non-linear Lorenz'96 model. For both models we observe a significant improvement in performance over the standard particle filter.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Data assimilation; Sequential Monte Carlo; Estimation and filtering; State-space models; Nonlinear system identification

1. INTRODUCTION

Non-linear and high-dimensional state space models arise in many areas of application, such as oceanography (Mattern et al., 2013), numerical weather prediction (Evensen, 1994) and epidemiology (He et al., 2010; Shaman and Karspeck, 2012), to mention a few. Here we consider models of the form

x_t = f(x_{t-1}, v_t)
y_t = C x_t + e_t    (1)

where f is a non-linear function of the previous state x_{t-1}, v_t is process noise, and the observations y_t are a linear function of the current state x_t with additive Gaussian noise e_t ∼ N(0, R). This form of the non-linear state dynamics f(x_{t-1}, v_t) is very general compared, e.g., to the case when the process noise is just additive, and allows for using blackbox simulation models, discretized stochastic differential equations, etc. (a minimal simulator sketch in this spirit is given at the end of this section). For the observations we restrict our attention to the linear-Gaussian case, which is common in many applications. However, the method we propose can handle any observation likelihood for which there is a conjugate prior. One such case is a Poisson distributed observation likelihood (with a Gamma distributed prior).

When performing filtering we wish to recover the unknown states x_t at time t given all observations y_{1:t} of the process up to time t. The filtering distribution p(x_t | y_{1:t}) can only be evaluated in closed form for a few specific models, such as the linear-Gaussian state space model (the Kalman filter). In more general cases the filtering distribution must be approximated. The particle filter is one way to do this by representing the filtering distribution with a set of weighted samples from the distribution.

In addition to being of significant interest on its own, the filtering problem is also intimately related to model identification via both maximum likelihood and Bayesian formulations (see, e.g., Schön et al. (2015)). Indeed, the data log-likelihood can be expressed as a sum of filtering expectations. Thus, even though we will restrict our attention to the filtering problem, we emphasize that the improvements offered by the new method are useful also for identification of models of the form (1).

The particle filter can, unlike the Kalman filter, handle highly non-linear models, but may also experience degeneracy of the particle weights. Degeneracy occurs when one particle has a weight close to one while the weights of all other particles are close to zero. The filtering distribution is then effectively represented by a single particle, which results in a very poor approximation. It has been shown that to avoid weight degeneracy the number of particles must increase exponentially with the state dimension (Snyder et al., 2015). Weight degeneracy is therefore a frequent issue for high-dimensional problems.

A range of different techniques for high-dimensional filtering have been previously developed. Methods like the ensemble Kalman filter (Evensen, 1994) can be used to solve the filtering problem if the system is mildly non-linear. For more difficult cases adaptation of the particle filter to higher dimensions is necessary. Some examples of particle filters for high-dimensional problems include Djurić and Bugallo (2013); Naesseth et al. (2015); Rebeschini and van Handel (2015); Robert and Künsch (2017). These methods all aim to approximate the particle filter algorithm in some sense to avoid degeneracy. The method we propose here is in a different vein—we will instead approximate the model (1) by adding artificial process noise. The filtering problem is then solved using a regular particle filter with the proposal chosen as a combination of the standard (bootstrap) proposal and the locally optimal proposal. This is related to "roughening", first introduced by Gordon et al. (1993), where artificial noise is added after resampling to spread the particles in an attempt to mitigate degeneracy. Here we refine this concept by proposing a specific proposal for the approximate model to improve the performance for high-dimensional models. Based on results by Snyder et al. (2015), we also provide insights on how approximating the model by adding noise can be seen as a bias-variance trade-off where the magnitude of the artificial process noise is a tuning parameter.
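As a concrete illustration of the model class (1), consider the following minimal sketch (ours, not from the paper): the transition function f is an arbitrary blackbox simulator, here a hypothetical Euler-Maruyama step of an SDE whose drift and constants are made up for illustration, so that p(x_t | x_{t-1}) has no closed form but is easy to sample from.

    import numpy as np

    def transition(x_prev, rng, dt=0.1):
        """Blackbox simulator of x_t = f(x_{t-1}, v_t); hypothetical SDE step.
        The implied transition density has no closed form expression."""
        v = rng.standard_normal(x_prev.shape)          # process noise v_t
        return x_prev + dt * np.tanh(x_prev) + np.sqrt(dt) * 0.5 * v

    def observe(x, C, R_chol, rng):
        """Linear-Gaussian observation y_t = C x_t + e_t, e_t ~ N(0, R),
        with R_chol a Cholesky factor of R."""
        return C @ x + R_chol @ rng.standard_normal(C.shape[0])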


2. BACKGROUND ON THE PARTICLE FILTER

The particle filter sequentially approximates the filtering distribution as p̂^N(x_t | y_{1:t}) = Σ_{i=1}^N w_t^i δ_{x_t^i}(x_t), where x_t^i are random samples (particles), w_t^i are their corresponding weights, δ is the Dirac delta function and N is the number of particles. It is often impossible to sample from the filtering distribution directly; instead importance sampling is used, where samples are drawn sequentially from a proposal distribution q(x_t | x_{t-1}, y_t). The proposal can be any distribution from which it is possible to draw samples and for which q(x_t) > 0 whenever p(x_t) > 0. To adjust for not sampling from p a correction is introduced in the weight update. The unnormalized importance weights are given by

w̃_t^i ∝ p(x_t^i, x_{t-1}^i | y_{1:t}) / q(x_t^i, x_{t-1}^i | y_{1:t}) ∝ [ p(x_t^i | x_{t-1}^i) p(y_t | x_t^i) / q(x_t^i | x_{t-1}^i, y_t) ] w_{t-1}^i    (2)

where w_{t-1}^i is the normalized weight from the previous time step. The normalized importance weights are w_t^i = w̃_t^i / Σ_{i=1}^N w̃_t^i.

Each iteration in the filtering algorithm consists of three steps. First resampling is (possibly) performed according to the normalized weights w_{t-1} from the previous time step, and the weights are set to 1/N. The particles are then propagated to the next time step using the proposal distribution q. Finally the normalized importance weights are computed as described above. Further details on the particle filtering algorithm can be found e.g. in (Doucet et al., 2000).

2.1 The standard proposal

A common choice of proposal distribution is the transition density p(x_t | x_{t-1}), referred to as the standard proposal. Inserting this proposal in (2) gives the unnormalized weights w̃_t = p(y_t | x_t) w_{t-1}. If resampling is performed in every iteration this choice of proposal corresponds to the original version of the particle filter, the bootstrap filter (Gordon et al., 1993). Note that it is sufficient to be able to simulate from p(x_t | x_{t-1}); exact evaluation of the expression is not required. Therefore the standard proposal can be used even for models like (1) where, typically, no closed form expression is available for the transition density. Furthermore, the corresponding weights, given by the observation likelihood, can be easily evaluated. However, this choice of proposal is prone to weight degeneracy, in particular when the system is high-dimensional or when the observations are very informative (low observation noise) or contain outliers (Cappé et al., 2007). This degeneracy occurs when there is little overlap between the observation likelihood and the prior distribution of particles.
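To make the three-step recursion concrete, here is a minimal sketch (ours, not code from the paper) of one bootstrap particle filter iteration for a model of the form (1); the simulator f and the observation parameters C, R are assumed to be supplied by the user.

    import numpy as np

    def bootstrap_pf_step(particles, log_weights, y, f, C, R, rng):
        """One iteration of the bootstrap particle filter (standard proposal).
        particles: (N, d) array of states x_{t-1}^i; log_weights: (N,) log-weights;
        y: observation y_t; f: simulator of p(x_t | x_{t-1}); C, R: observation model."""
        N = particles.shape[0]
        # 1) Resample according to the normalized weights, then reset weights to 1/N.
        w = np.exp(log_weights - log_weights.max())
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)
        particles = particles[idx]
        # 2) Propagate with the standard proposal: simulate from p(x_t | x_{t-1}).
        particles = np.array([f(x, rng) for x in particles])
        # 3) Reweight with the observation likelihood N(y; C x_t, R), cf. (2);
        #    constants are dropped since they cancel upon normalization.
        resid = y - particles @ C.T
        Rinv = np.linalg.inv(R)
        log_weights = -0.5 * np.einsum('ij,jk,ik->i', resid, Rinv, resid)
        return particles, log_weights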
2.2 The locally optimal proposal

A possible remedy for the shortcomings of the standard proposal is to use a proposal which shifts the particles towards the observations by taking both the previous state x_{t-1} and the current observation y_t into account when propagating and reweighting the particles. One such choice is the locally optimal proposal, which is optimal in the sense that the variance of the importance weights is minimized when compared to other proposals depending only on x_{t-1} and y_t (Doucet et al., 2000). The locally optimal proposal propagates the particles according to

q(x_t | x_{t-1}, y_t) = p(x_t | x_{t-1}, y_t) = p(x_t | x_{t-1}) p(y_t | x_t) / p(y_t | x_{t-1})    (3)

and then reweights using the importance weights w̃_t^i = p(y_t | x_{t-1}^i) w_{t-1}^i. Unfortunately it is often not possible to use this proposal due to two major difficulties; it must be possible both to sample from the proposal p(x_t | x_{t-1}, y_t) and to evaluate p(y_t | x_{t-1}) = ∫ p(y_t | x_t) p(x_t | x_{t-1}) dx_t. This integral can, in most cases, only be evaluated when p(y_t | x_t) is conjugate to p(x_t | x_{t-1}).

The locally optimal proposal is in general not available for the model (1). One exception is the special case when the state dynamics are non-linear with additive Gaussian noise, that is when x_t = f(x_{t-1}) + v_t (Doucet et al., 2000). Assuming v_t and e_t are independent Gaussian noise with mean zero and covariances Q and R respectively, (1) can be expressed using the densities

x_t | x_{t-1} ∼ N(f(x_{t-1}), Q),    y_t | x_t ∼ N(C x_t, R).    (4)

Both densities are Gaussian, so well-known relations give the proposal p(x_t | x_{t-1}, y_t) = N(x_t | μ_c, Σ_c) where

μ_c = f(x_{t-1}) + Q C^T (R + C Q C^T)^{-1} (y_t − C f(x_{t-1}))
Σ_c = Q − Q C^T (R + C Q C^T)^{-1} C Q,    (5)

and for the corresponding weights we obtain

p(y_t | x_{t-1}) = N(y_t | C f(x_{t-1}), R + C Q C^T).    (6)
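As an illustration, the following is a minimal sketch (ours) of the locally optimal proposal step (5)-(6) for this additive-Gaussian special case; f, C, Q and R are assumed given.

    import numpy as np

    def locally_optimal_step(x_prev, y, f, C, Q, R, rng):
        """Propagate one particle with the locally optimal proposal (5) and
        return the new particle and its log-weight log p(y_t | x_{t-1}), cf. (6)."""
        m = f(x_prev)                           # predictive mean f(x_{t-1})
        S = R + C @ Q @ C.T                     # innovation covariance in (6)
        K = Q @ C.T @ np.linalg.inv(S)          # gain Q C^T (R + C Q C^T)^{-1}
        mu_c = m + K @ (y - C @ m)              # proposal mean in (5)
        Sigma_c = Q - K @ C @ Q                 # proposal covariance in (5)
        x_new = rng.multivariate_normal(mu_c, Sigma_c)
        resid = y - C @ m
        _, logdet = np.linalg.slogdet(S)
        # log N(y; C f(x_{t-1}), S), up to an additive constant
        log_w = -0.5 * (logdet + resid @ np.linalg.solve(S, resid))
        return x_new, log_w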

3. PARTICLE FILTER WITH CONJUGATE ARTIFICIAL PROCESS NOISE

The locally optimal proposal has minimal degeneracy compared to other proposals, and for high-dimensional systems it can improve upon the standard proposal by several orders of magnitude (Snyder et al., 2015). Unfortunately, as pointed out in the previous section the locally optimal proposal is in general not available for (1), and common approximations, e.g. based on local linearizations (Doucet et al., 2000), are not applicable when the transition density function is intractable. However, to still be able to leverage the benefits of the locally optimal proposal we propose a controlled approximation of (1) where artificial noise is added in an extra state update. The approximate model is given by
671 2018 IFAC SYSID 2018 IFAC SYSID July 9-11, 2018. Stockholm, Sweden July672 9-11, 2018. Stockholm, Sweden Anna Wigren et al. / IFAC PapersOnLine 51-15 (2018) 670–675 the benefits of the locally optimal proposal we propose a on ε implies that there is a lot of freedom in moving the controlled approximation of (1) where artificial noise is particles in the second stage of the proposal which results added in an extra state update. The approximate model in a lower Monte-Carlo variance, but at the cost of a higher is given by model bias.

xt = f(xt 1,vt) (7a) − For the case of a linear-Gaussian observation model the xt = xt + εξt (7b) covariance matrix S, describing the correlation structure, yt = Cxt + et (7c) must also be specified. A simple choice is to use the identity where ε is a parameter adjusting the magnitude of the matrix which corresponds to adding noise of the same artificial noise. We consider a linear-Gaussian observation magnitude to all states, but with no correlation between model (7c), hence for conjugacy between (7b) and (7c) states. However, if some states are not observed they will not be affected by the artificial process noise—they ξt (0,S) where S is a covariance matrix. Note that by choosing∼N ε = 0 we recover the original model (1). will just be propagated blindly forwards according to the standard proposal. Another possible choice is to use the Fig. 1. Marginal log-likelihood and MSE as a function of ε for a 10-dimensional linear-Gaussian state space model. To design a particle filter for (7) we must choose a proposal weighted sample covariance matrix. This choice will take Left: Block-diagonal covariance matrix. Right: Weighted sample covariance matrix. The black solid line is the true Kalman filter estimate, the dashed red line is the Kalman filter estimate for (7) and the blue dots are the particle and derive the corresponding weights. If xt and xt are the correlation structure of the states into account which taken to be the system states, the model (7) suggests using can mitigate the impact of not observing some states. Each filter estimates for (7). a combination of the standard and the locally optimal element of the weighted sample covariance matrix Ξ at proposal. First the particles are propagated according to time t is given by where A is tridiagonal with value 0.6 on the diagonal, ter estimate for the approximate model (7) corresponding the standard proposal (7a). Noting that the two latter N 0.2 on the first diagonal above and below giving a local to the best we can do. equations (7b) and (7c) are linear-Gaussian the particles 1 i i i dependence between the states, and zeros everywhere else. Ξjk = w (xj µj)(xk µk) (10) are then propagated according to the locally optimal 1 N (wi)2 − − We observe half of the states, hence C is 5 10 where 4.2 Lorenz’96 model − i=1 i=1 the left half is an identity matrix and the right× half is a proposal taking xt to be the previous state. Using (5) and  i where j, k =1...d, d is the state dimension, w is the zero matrix. To make the estimation problem harder we (6) we obtain the combined proposal i Next we consider the non-linear, possibly chaotic Lorenz’96 normalized weight of particle i, xj is the value of the j:th assume that the covariance of the measurement noise is model which is often used for testing data assimilation q(xt,xt xt 1,yt)=p(xt xt ,yt)p(xt xt 1) dimension of particle i and µ ,µ are the sample mean for | − | | − j k two orders of magnitude smaller than the covariance of algorithms (Lorenz, 1995). The d-dimensional Lorenz’96 N i i 4 2 = (xt µ, Σ)p(xt xt 1) dimension j, k given by µ = w x . the process noise (10 vs 10 ). Data for T = 200 time N | | − j i=1 j − − model is defined in terms of coupled stochastic differ- 2 T 2 T 1 (8) steps was generated from (11) and N = 1000 particles were µ = xt + ε SC (R + Cε SC )− (yt Cxt ) ential equations which, for the continuous state X(t)= − T 2 2 T 2 T 1 2 4. 
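A sketch of one particle's two-stage propagation under the combined proposal (8), with the weight (9), might look as follows (our illustration, not code from the paper; f_sim simulates (7a), and C, R, S, eps are assumed given).

    import numpy as np

    def artificial_noise_step(x_prev, y, f_sim, C, R, S, eps, rng):
        """Two-stage propagation for model (7): standard proposal for (7a),
        then the locally optimal proposal for the linear-Gaussian pair (7b)-(7c)."""
        x_bar = f_sim(x_prev, rng)                  # stage 1: sample from p(x_bar | x_{t-1})
        P = eps**2 * S                              # artificial noise covariance in (7b)
        Sigma_y = R + C @ P @ C.T                   # innovation covariance in (9)
        K = P @ C.T @ np.linalg.inv(Sigma_y)        # gain in (8)
        mu = x_bar + K @ (y - C @ x_bar)            # proposal mean (8)
        Sigma = P - K @ C @ P                       # proposal covariance (8)
        x_new = rng.multivariate_normal(mu, Sigma)  # stage 2: locally optimal move
        resid = y - C @ x_bar
        _, logdet = np.linalg.slogdet(Sigma_y)
        # log of (9), up to an additive constant
        log_w = -0.5 * (logdet + resid @ np.linalg.solve(Sigma_y, resid))
        return x_new, log_w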
3.1 Choice of parameters

The parameter ε adjusts the magnitude of the noise and hence controls the bias-variance trade-off. A high value on ε implies that there is a lot of freedom in moving the particles in the second stage of the proposal, which results in a lower Monte Carlo variance, but at the cost of a higher model bias.

For the case of a linear-Gaussian observation model the covariance matrix S, describing the correlation structure, must also be specified. A simple choice is to use the identity matrix which corresponds to adding noise of the same magnitude to all states, but with no correlation between states. However, if some states are not observed they will not be affected by the artificial process noise—they will just be propagated blindly forwards according to the standard proposal. Another possible choice is to use the weighted sample covariance matrix. This choice will take the correlation structure of the states into account, which can mitigate the impact of not observing some states. Each element of the weighted sample covariance matrix Ξ at time t is given by

Ξ_{jk} = [1 / (1 − Σ_{i=1}^N (w^i)²)] Σ_{i=1}^N w^i (x_j^i − μ_j)(x_k^i − μ_k)    (10)

where j, k = 1 ... d, d is the state dimension, w^i is the normalized weight of particle i, x_j^i is the value of the j:th dimension of particle i, and μ_j, μ_k are the sample means for dimensions j and k, given by μ_j = Σ_{i=1}^N w^i x_j^i.
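A direct transcription of (10) into code might look as follows (our sketch; particles is the N × d particle array and w the normalized weights).

    import numpy as np

    def weighted_sample_covariance(particles, w):
        """Weighted sample covariance matrix Xi of (10)."""
        mu = w @ particles                 # weighted means mu_j = sum_i w^i x_j^i
        centered = particles - mu          # x^i - mu, row-wise
        # sum_i w^i (x_j^i - mu_j)(x_k^i - mu_k), divided by 1 - sum_i (w^i)^2
        return (centered.T * w) @ centered / (1.0 - np.sum(w**2))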
4. NUMERICAL EXAMPLES

To evaluate our proposed method we consider two examples; a linear-Gaussian state space model and the non-linear Lorenz'96 model (Lorenz, 1995). For both models we examine the 10-dimensional case where only half of the states are observed (in noise) at each time step. Two choices of covariance matrices for the artificial process noise are considered; the block-diagonal matrix B with an identity matrix in the upper block and zeros in the lower block, and the weighted sample covariance matrix Ξ with elements given by (10). The first choice will add artificial process noise to the observed states only, with no correlation between the states, whereas the latter choice will add artificial process noise to all states and allows for correlation between the states.

To quantify the performance of our method we use two measures; the marginal log-likelihood of the data, log Z = log p(y_{1:T}), and the mean square error (MSE) of the state estimates averaged over all time steps and all dimensions. The likelihood of the data is of particular interest for parameter estimation and model checking, and the MSE of the state estimate is of interest for filtering applications. In our discussion of the performance we will also refer to the effective sample size (ESS) for the particle filter, given by N_eff = 1 / Σ_{i=1}^N (w_{t-1}^i)². If the ESS drops too low at some point the filter estimates will be degenerate since all particles will originate from a small number of ancestor particles.
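For reference, the ESS and the degeneracy check used in the experiments below (the ESS dropping below two) are one-liners (our sketch).

    import numpy as np

    def effective_sample_size(w):
        """ESS N_eff = 1 / sum_i (w^i)^2 for normalized weights w."""
        return 1.0 / np.sum(w**2)

    def is_degenerate(weight_history):
        """Flag a run as degenerate if the ESS drops below two at any time step."""
        return any(effective_sample_size(w) < 2.0 for w in weight_history)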

4.1 Linear-Gaussian model

We first consider a 10-dimensional linear state space model with Gaussian noise of the form

x_t = A x_{t-1} + v_t,    y_t = C x_t + e_t    (11)

where A is tridiagonal with value 0.6 on the diagonal, 0.2 on the first diagonal above and below, giving a local dependence between the states, and zeros everywhere else. We observe half of the states, hence C is 5 × 10 where the left half is an identity matrix and the right half is a zero matrix. To make the estimation problem harder we assume that the covariance of the measurement noise is two orders of magnitude smaller than the covariance of the process noise (10^{-4} vs 10^{-2}). Data for T = 200 time steps was generated from (11) and N = 1000 particles were used to estimate the states with the particle filter given by (8) and (9).

For the linear-Gaussian state space model the optimal solution to the filtering problem is available from the Kalman filter, hence it is possible to compare the performance of our method with the best achievable performance. We will compare both with the true Kalman filter estimate and with the Kalman filter estimate for the approximate model (7).

Fig. 1. Marginal log-likelihood and MSE as a function of ε for a 10-dimensional linear-Gaussian state space model. Left: Block-diagonal covariance matrix. Right: Weighted sample covariance matrix. The black solid line is the true Kalman filter estimate, the dashed red line is the Kalman filter estimate for (7) and the blue dots are the particle filter estimates for (7).

Fig. 1 shows the log-likelihood and the MSE as a function of ε for S = B (left) and S = Ξ (right). It is clear from the log-likelihood plots that the standard proposal (ε = 0) degenerates, whereas with our proposed method it is possible to almost reach the log-likelihood and MSE of the true Kalman filter for certain ranges of ε. It is also evident that there is a bias-variance trade-off, with a higher variance for low values on ε and a lower variance but bigger bias for larger values on ε.

Remark 1. Note that for small values of ε the negative bias in the estimate of log Z is an effect of the increased Monte Carlo variance. It is well known that the particle filter estimate of Z is unbiased, which by Jensen's inequality implies E(log Ẑ) ≤ log Z, where a large Monte Carlo variance tends to result in a large negative bias. By a log-normal central limit theorem (Bérard et al., 2014) it holds that the bias is roughly −σ²/2 for N large enough, where σ² is the Monte Carlo variance.

For the sample covariance the highest log-likelihood and lowest MSE are obtained for similar values on ε, around 0.5. For the identity matrix, on the other hand, the range of values for ε giving the highest log-likelihood is lower than the range of ε giving the lowest MSE. As ε increases both choices of covariance matrices approach the Kalman filter estimate for the approximate model (7), corresponding to the best we can do.

4.2 Lorenz'96 model

Next we consider the non-linear, possibly chaotic Lorenz'96 model which is often used for testing data assimilation algorithms (Lorenz, 1995). The d-dimensional Lorenz'96 model is defined in terms of coupled stochastic differential equations which, for the continuous state X(t) = (X_1(t), ..., X_d(t))^T, are given by

dX_k(t) = ( (X_{k+1}(t) − X_{k-2}(t)) X_{k-1}(t) − X_k(t) + F ) dt + b dW_k(t),    (12)

where the first term is drift and the second term is diffusion. The model is cyclic, hence X_{-1}(t) = X_{d-1}(t), X_0(t) = X_d(t) and X_{d+1}(t) = X_1(t) is assumed. F is a forcing constant confining the volume in which the solution can move and, for a fixed dimension d of the state space, it determines whether the system exhibits chaotic, decaying or periodic behavior (Karimi and Paul, 2010). We consider the 10-dimensional case, and to obtain a highly non-linear model exhibiting chaotic behavior we use F = 12. For the diffusion term we choose b = 0.1 and W(t) is a standard 10-dimensional Wiener process. The observations are linear-Gaussian and, like in (11), we observe only half of the states.

Data was generated for T = 200 steps assuming observations are made with a timestep Δt = 0.1. The system can be discretized by considering the discrete state x_t = X(tΔt) which, between observations, is propagated forward according to (12) using M = 15 iterations of the Euler-Maruyama method for numerical solution of stochastic differential equations. Note that this system, unlike the previously considered linear-Gaussian system, is one example of (1) where no closed form expression for p(x_t | x_{t-1}) exists, which makes the filtering problem particularly challenging. For the artificial process noise particle filter we use N = 2000 particles, and the propagation of the states is a simulation forward in time using the Euler-Maruyama method for solving (12) numerically.
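The transition simulation just described, a blackbox simulator in the sense of (1), can be sketched as follows (ours; d = 10, F = 12, b = 0.1, Δt = 0.1 and M = 15 as in the text).

    import numpy as np

    def lorenz96_step(x, rng, F=12.0, b=0.1, dt=0.1, M=15):
        """Propagate x_t -> x_{t+1} by M Euler-Maruyama iterations of (12)."""
        h = dt / M                                   # inner step size
        for _ in range(M):
            # drift (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F; np.roll handles the cyclic boundary
            drift = (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
            x = x + h * drift + b * np.sqrt(h) * rng.standard_normal(x.shape)
        return x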
For small values of ε the particle filter tends to degenerate, resulting in very poor estimates of the log-likelihood (as low as −3·10^6) and high MSE values (as high as 60).


course like to know which ε to pick beforehand, or have an small populations as a case study. Journal of The Royal adaptive scheme for tuning ε online. Devising such rules- Society Interface, 7(43), 271–283. of-thumb or adaptive methods is a topic for future work. Kantas, N., Doucet, A., Singh, S.S., Maciejowski, J., and The same hold for the choice of base covariance S which Chopin, N. (2015). On particle methods for parameter needs further investigation. We note, however, that our estimation in state-space models. Statistical Science, empirical results suggest that the method is fairly robust 30(3), 328–351. to selecting ε “too large”, whereas a too small ε resulted Karimi, A. and Paul, M.R. (2010). Extensive chaos in the in very poor performance. A possible approach is therefore Lorenz-96 model. Chaos: An Interdisciplinary Journal to start with a large ε which is then gradually decreased of Nonlinear Science, 20(4), 043105. while monitoring the ESS. Lorenz, E. (1995). Predictability: a problem partly solved. In Seminar on Predictability, 4-8 September 1995, vol- Finally, we note that the proposed method is useful not ume 1, 1–18. Shinfield Park, Reading. only for filtering problems, but also for system identifi- Mattern, J.P., Dowd, M., and Fennel, K. (2013). Particle cation. Several state-of-the-art methods for identification Fig. 2. Marginal log-likelihood and MSE as a function of ε for the 10-dimensional Lorenz’96 model, zoomed in on filter-based data assimilation for a three-dimensional bi- of non-linear state-space models are based on the log- the choices of ε which (mostly) avoids degeneracy. Left: Block-diagonal covariance matrix. Right: Weighted sample ological ocean model and satellite observations. Journal likelihood estimate (see, e.g., Sch¨onet al. (2015); Kantas covariance matrix. Red crosses show estimates where the ESS drops below two at least once (indicating a degenerate of Geophysical Research: Oceans, 118(5), 2746–2760. et al. (2015)). Thus, the significant improvement in these filter estimate) and blue dots show estimates where the ESS is always greater than two. Naesseth, C., Lindsten, F., and Sch¨on, T. (2015). Nested estimates offered by the proposed method should have sequential Monte Carlo methods. In Proceedings of the 5. CONCLUSION direct bearing also on non-linear system identification. 1 1 32nd International Conference on , < 2 < 2 1292–1301. Lille, France. eff eff 0.8 0.8 REFERENCES Rebeschini, P. and van Handel, R. (2015). Can local 0.6 0.6 The particle filter is a powerful inference method with B´erard, J., Del Moral, P., and Doucet, A. (2014). A lognor- particle filters beat the curse of dimensionality? The strong convergence guarantees. However, for challenging 0.4 0.4 mal central limit theorem for particle approximations of Annals of Applied Probability, 25(5), 2809–2866. cases such as high-dimensional models the well-known 0.2 0.2 normalizing constants. Electronic Journal of Probability, Robert, S. and K¨unsch, H.R. (2017). Localizing the Probability of min N Probability of min N degeneracy issue of the particle filter can cause the Monte 0 0 19, 28 pp. ensemble Kalman particle filter. Tellus A: Dynamic 0 0.05 0.1 0 0.1 0.2 0.3 0.4 0.5 Carlo error to be significant, effectively rendering the stan- 0 0 Capp´e, O., Godsill, S.J., and Moulines, E. (2007). An Meteorology and Oceanography, 69(1), 1282016. dard particle filter useless. 
Furthermore, for many models overview of existing methods and recent advances in Sch¨on, T.B., Lindsten, F., Dahlin, J., W˚agberg, J., Naes- Fig. 3. Estimated probabilities of degeneracy in terms of of interest—in particular non-linear models of the form sequential Monte Carlo. Proceedings of the IEEE, 95(5), seth, C.A., Svensson, A., and Dai, L. (2015). Sequential the ESS for varying ε. Left: Block-diagonal covariance (1)—it is difficult to improve over the standard particle fil- 899–924. Monte Carlo methods for system identification. IFAC- matrix. Right: Weighted sample covariance matrix. ter proposal due to the intractability of the transition den- Djuri´c, P.M. and Bugallo, M.F. (2013). Particle filter- PapersOnLine, 48(28), 775 – 786. sity function. To alleviate this issue we have proposed to ing for high-dimensional systems. In 2013 5th IEEE Shaman, J. and Karspeck, A. (2012). Forecasting seasonal instead perform filtering for an approximate model where International Workshop on Computational Advances in outbreaks of influenza. Proceedings of the National the approximation is such that it opens up for a more Multi-Sensor Adaptive Processing (CAMSAP), 352–355. Academy of Sciences, 109(50), 20425–20430. efficient particle filter proposal. This is in contrast with Doucet, A., Godsill, S., and Andrieu, C. (2000). On Snyder, C., Bengtsson, T., and Morzfeld, M. (2015). Per- clarity of presentation we therefore split the particle filter many existing approaches to approximate filtering, which sequential Monte Carlo sampling methods for Bayesian formance bounds for particle filters using the optimal runs into two groups: one containing the degenerate cases, are often based on approximating the inference algorithm filtering. Statistics and Computing, 10(3), 197–208. proposal. Monthly Weather Review, 143(11), 4750–4761. in which the ESS dropped below two at least once during itself. One motivation for this approach is that the model Evensen, G. (1994). Sequential data assimilation with a the run, and one for the non-degenerate cases, in which the in most cases is an approximation to begin with, so adding nonlinear quasi-geostrophic model using Monte Carlo Appendix A. ADDITIONAL PLOTS ESS stayed above two. Fig. 2 has been zoomed in on the a further approximation does not necessarily deteriorate methods to forecast error statistics. Journal of Geo- non-degenerate runs, for S = B (left) and S = Ξ (right). the data analysis to any large degree. Another motivation physical Research: Oceans, 99(C5), 10143–10162. Fig. 4 shows all runs for the 10-dimensional Lorenz’96 Plots showing all runs are given in Fig. 4 in the appendix. is that an error analysis of the proposed procedure can Gordon, N.J., Salmond, D.J., and Smith, A.F.M. (1993). model. For small values on ε the filter degenerates for both focus on the model bias. Indeed, the inference method that Novel approach to nonlinear/non-Gaussian Bayesian From Fig. 2 we observe that indeed there is a bias-variance choices of covariance matrices. When ε is increased the we use is a regular particle filter (albeit for a non-standard state estimation. IEE Proceedings F Radar and Signal trade-off for the proposed method where a higher value of ε number of degenerate estimates gradually decreases. model) and the properties of these methods are by know Processing, 140(2), 107. reduces the variance but increases the bias. A comparison fairly well understood. of the log-likelihood plots for S = B and S = Ξ shows He, D., Ionides, E.L., and King, A.A. (2010). 
Finally, we note that the proposed method is useful not only for filtering problems, but also for system identification. Several state-of-the-art methods for identification of non-linear state-space models are based on the log-likelihood estimate (see, e.g., Schön et al. (2015); Kantas et al. (2015)). Thus, the significant improvement in these estimates offered by the proposed method should have direct bearing also on non-linear system identification.
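For concreteness, the log-likelihood estimate referred to here is the standard particle filter estimator, which accumulates at each time step the log of the average unnormalized weight. A minimal sketch, with a helper name of our own choosing:

```python
import numpy as np

def log_likelihood_estimate(log_weight_history):
    """Standard particle-filter estimate of the marginal log-likelihood,
    log p(y_{1:T}) ~= sum_t log( (1/N) sum_i w_t^i ), accumulated from
    the unnormalized log-weights at each step (log-sum-exp for stability).
    Such estimates are the quantity that likelihood-based identification
    methods optimize or sample over model parameters."""
    ll = 0.0
    for log_w in log_weight_history:
        m = np.max(log_w)
        ll += m + np.log(np.mean(np.exp(log_w - m)))
    return ll
```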

REFERENCES

Bérard, J., Del Moral, P., and Doucet, A. (2014). A lognormal central limit theorem for particle approximations of normalizing constants. Electronic Journal of Probability, 19, 28 pp.
Cappé, O., Godsill, S.J., and Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95(5), 899–924.
Djurić, P.M. and Bugallo, M.F. (2013). Particle filtering for high-dimensional systems. In 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 352–355.
Doucet, A., Godsill, S., and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.
Evensen, G. (1994). Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5), 10143–10162.
Gordon, N.J., Salmond, D.J., and Smith, A.F.M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F Radar and Signal Processing, 140(2), 107.
He, D., Ionides, E.L., and King, A.A. (2010). Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of The Royal Society Interface, 7(43), 271–283.
Kantas, N., Doucet, A., Singh, S.S., Maciejowski, J., and Chopin, N. (2015). On particle methods for parameter estimation in state-space models. Statistical Science, 30(3), 328–351.
Karimi, A. and Paul, M.R. (2010). Extensive chaos in the Lorenz-96 model. Chaos: An Interdisciplinary Journal of Nonlinear Science, 20(4), 043105.
Lorenz, E. (1995). Predictability: a problem partly solved. In Seminar on Predictability, 4–8 September 1995, volume 1, 1–18. Shinfield Park, Reading.
Mattern, J.P., Dowd, M., and Fennel, K. (2013). Particle filter-based data assimilation for a three-dimensional biological ocean model and satellite observations. Journal of Geophysical Research: Oceans, 118(5), 2746–2760.
Naesseth, C., Lindsten, F., and Schön, T. (2015). Nested sequential Monte Carlo methods. In Proceedings of the 32nd International Conference on Machine Learning, 1292–1301. Lille, France.
Rebeschini, P. and van Handel, R. (2015). Can local particle filters beat the curse of dimensionality? The Annals of Applied Probability, 25(5), 2809–2866.
Robert, S. and Künsch, H.R. (2017). Localizing the ensemble Kalman particle filter. Tellus A: Dynamic Meteorology and Oceanography, 69(1), 1282016.
Schön, T.B., Lindsten, F., Dahlin, J., Wågberg, J., Naesseth, C.A., Svensson, A., and Dai, L. (2015). Sequential Monte Carlo methods for system identification. IFAC-PapersOnLine, 48(28), 775–786.
Shaman, J. and Karspeck, A. (2012). Forecasting seasonal outbreaks of influenza. Proceedings of the National Academy of Sciences, 109(50), 20425–20430.
Snyder, C., Bengtsson, T., and Morzfeld, M. (2015). Performance bounds for particle filters using the optimal proposal. Monthly Weather Review, 143(11), 4750–4761.

Appendix A. ADDITIONAL PLOTS

Fig. 4 shows all runs for the 10-dimensional Lorenz'96 model. For small values of ε the filter degenerates for both choices of covariance matrix. When ε is increased, the number of degenerate estimates gradually decreases.

Fig. 4. Marginal log-likelihood and MSE as a function of ε for the 10-dimensional Lorenz’96 model. Left: Block-diagonal covariance matrix. Right: Weighted sample covariance matrix. Red crosses show estimates where the ESS drops below two at least once (indicating a degenerate filter estimate) and blue dots show estimates where the ESS is always greater than two.
