Particle Filter Recurrent Neural Networks

The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Particle Filter Recurrent Neural Networks Xiao Ma,∗ Peter Karkus,∗ David Hsu, Wee Sun Lee National University of Singapore {xiao-ma, karkus, dyhsu, leews}@comp.nus.edu.sg Abstract thus increasing the number of network parameters and the amount of data required for training. Recurrent neural networks (RNNs) have been extraordinar- ily successful for prediction with sequential data. To tackle We introduce Particle Filter Recurrent Neural Networks highly variable and multi-modal real-world data, we intro- (PF-RNNs), a new family of RNNs that seeks to improve duce Particle Filter Recurrent Neural Networks (PF-RNNs), belief approximation without lengthening the latent vector a new RNN family that explicitly models uncertainty in its in- h, thus reducing the data required for learning. Particle fil- ternal structure: while an RNN relies on a long, deterministic tering (Del Moral 1996) is a model-based belief tracking al- latent state vector, a PF-RNN maintains a latent state distribu- gorithm. It approximates the belief as a set of sampled states tion, approximated as a set of particles. For effective learning, that typically have well-understood meaning. PF-RNNs bor- we provide a fully differentiable particle filter algorithm that row from particle filtering the idea of approximating the updates the PF-RNN latent state distribution according to the belief as a set of weighted particles, and combine it with Bayes rule. Experiments demonstrate that the proposed PF- RNNs outperform the corresponding standard gated RNNs the powerful approximation capacity of RNNs. PF-RNN on a synthetic robot localization dataset and 10 real-world se- approximates the variable and multi-modal belief as a set 1 2 quence prediction datasets for text classification, stock price of weighted latent vectors {h ,h ,...} sampled from the prediction, etc. same distribution. Like standard RNNs, PF-RNNs follow a model-free approach: PF-RNNs’ latent vectors are learned distributed representations, which are not necessarily inter- Introduction pretable. As an alternative to the Gaussian based filters, Prediction with sequential data is a long-standing challenge e.g., Kalman filters, particle filtering is a non-parametric in machine learning. It has many applications, e.g., object approximator that offers a more flexible belief representa- tracking (Blake and Isard 1997), speech recognition (Xiong tion (Del Moral 1996); it is also proven to give a tighter et al. 2018), and decision making under uncertainty (So- evidence lower bound (ELBO) in the data generation do- mani et al. 2013). For effective prediction, predictors require main (Burda, Grosse, and Salakhutdinov 2015). In our case, “memory”, which summarizes and tracks information in the the approximate representation is trained from data to opti- input sequence. The memory state is generally not observ- mize the prediction performance. For effective training with able, hence the need for a belief, i.e., a posterior state dis- gradient methods, we employ a fully differentiable particle tribution that captures the sufficient statistic of the input for filter algorithm that maintains the latent belief. See Fig. 1 for making predictions. Modeling the belief manually is often a comparison of RNN and PF-RNN. difficult. Consider the task of classifying news text—treated We apply the underlying idea of PF-RNN to gated RNNs, as a sequence of words—into categories, such as politics, which are easy to implement and have shown strong per- education, economy, etc. It is difficult to handcraft the belief formance in many sequence prediction tasks. Specifically, representation and dynamics for accurate classification. we propose PF-LSTM and PF-GRU, the particle filter exten- State-of-the-art sequence predictors often use recurrent sions of Long Short Term Memory (LSTM) (Hochreiter and neural networks (RNNs), which learn a vector h of deter- Schmidhuber 1997) and Gated Recurrent Unit (GRU) (Cho ministic time-dependent latent variables as an approxima- et al. 2014). PF-LSTM and PF-GRU serve as drop-in re- tion to the belief. Real-world data, however, are highly vari- placements for LSTM and GRU, respectively. They aim able and often multi-modal. To cope with the complexity of to learn a better belief representation from the same data, uncertain real-world data and achieve better belief approx- though at a greater computational cost. h imation, one could increase the length of latent vector , We evaluated PF-LSTM and PF-GRU on 13 data sets: ∗equal contribution 3 synthetic dataset for systematic understanding and 10 real- Copyright c 2020, Association for the Advancement of Artificial world datasets with different sample sizes for performance Intelligence (www.aaai.org). All rights reserved. comparison. The experiments show that our PF-RNNs out- 5101 single vector ... ≈ ≈ K particles deterministic stochastic update Bayesian update inputRNN output inputPF-RNN output Figure 1: A comparison of RNN and PF-RNN. An RNN approximates the belief as a long latent vector and updates it with a deterministic nonlinear function. A PF-RNN approximates the belief as a set of weighted particles and updates them with the stochastic particle filtering algorithm. perform the corresponding standard RNNs with a compara- ticle filter in an RNN for learning belief tracking, but fol- ble number of parameters. Further, the PF-RNNs achieve the lows a model-based approach and relies on handcrafted be- best results on almost all datasets when there is no restriction lief representation (Jonschkowski, Rastogi, and Brock 2018; on the number of model parameters used.1 Karkus, Hsu, and Lee 2018). PF-RNNs retain the model- free nature of RNNs and exploit their powerful approx- Related Work imation capabilities to learn belief representation directly from data. Other work explicitly addresses belief repre- There are two general categories for prediction with sequen- sentation learning with RNNs (Gregor and Besse 2018; tial data: model-based and model-free. The model-based ap- Guo et al. 2018); however, they do not involve Bayesian be- proach includes, e.g., the well-known hidden Markov mod- lief update or particle filtering. els (HMMs) and the dynamic Bayesian networks (DBNs) (Murphy 2002). They rely on handcrafted state representations with well-defined semantics, e.g., phonemes in speech Particle Filter Recurrent Neural Networks recognition. Given a model, one may perform belief tracking according to the Bayes’ rule. The main difficulty here Overview is the state space and the computational complexity of be- The general sequence prediction problem is to predict an lief tracking grow exponentially with the number of state output sequence, given an input sequence. In this paper, we dimensions. To cope with this difficulty, particle filters rep- focus on predicting the output yt at time t, given the input resent the belief as a set of sampled states and perform ap- history x1,x2,...,xt. proximate inference. Alternatively, the model-free approach, Standard RNNs handle sequence prediction by maintain- such as RNNs, approximates the belief as a latent state vec- ing a deterministic latent state ht that captures the sufficient tor, learned directly from data, and updates it through a de- statistic of the input history, and updating h sequentially terministic nonlinear function, also learned from data. t given new inputs. Specifically, RNNs update ht with a deter- The proposed PF-RNNs build upon RNNs and com- ministic nonlinear function learned from data. The predicted bine their powerful data-driven approximation capabilities output yˆt is another nonlinear function of the latent state ht, with the sample-based belief representation and approxi- also learned from data. mate Bayesian inference used in particle filters. Related To handle highly variable, noisy real-world data, one key sample-based methods have been applied to generative mod- idea of PF-RNN is to capture the sufficient statistic of the els. Importance sampling is used to improve variational input history in the latent belief b(h ) by forming multi- auto-encoders (Burda, Grosse, and Salakhutdinov 2015). t ple hypotheses over h . Specifically, PF-RNNs approximate This is extended to sequence generation (Le et al. 2018) and t b(h ) by a set of K weighted particles {(hi,wi)}K , for la- to reinforcement learning (Igl et al. 2018). Unlike the earlier t t t i=1 tent state hi and weight wi; the particle filtering algorithm works that focus on generation, we combine RNNs and par- t t is used to update the particles according to the Bayes rule. ticle filtering for sequence prediction. PF-RNNs are trained The Bayesian treatment of the latent belief naturally cap- discriminatively, instead of generatively, with the target loss tures the stochastic nature of real-world data. Further, all function on the model output. As a result, PF-RNN training particles share the same parameters in a PF-RNN. The num- prioritizes target prediction over data generation which may ber of particles thus does not affect the number of PF-RNN be irrelevant to the prediction task. network parameters. Given a fixed training data set, we ex- PF-RNNs exploit the general idea of embedding algo- pect that increasing the number of particles improves the be- rithmic priors, in this case, filtering algorithms, in neu- lief approximation and leads to better learning performance, ral networks and train them discriminatively (Jonschkowski but at the cost of greater computational complexity. and Brock 2016; Jonschkowski, Rastogi, and Brock 2018; Karkus, Hsu, and Lee 2018). Earlier work embeds a par- Similar to RNNs, we used learned functions for updating latent states and predict the output yt based on the mean par- 1 ¯ ¯ K i i More details are publicly available in arXiv:1905.12885. The ticle: yˆt = fout(ht), where ht = i=1 wtht and fout is a code is available at https://github.com/Yusufma03/pfrnns task-dependent prediction function.

Particle Filter Recurrent Neural Networks

Real-Time Wavelet Compression and Self-Modeling Curve

Curriculum Vitae

Multivariate Chemometrics As a Strategy to Predict the Allergenic Nature of Food Proteins

Copula-Based Analysis of Dependent Data with Censoring and Zero Inﬂation

Principal Component Analysis

Chemometrics in Europe: I (R;Q,P)= I (Q11p) Na (2) Selected Results

Chemometrics: Theory and Application

A Synergistic Use of Chemometrics and Deep Learning Improved the Predictive Performance of Near-Infrared Spectroscopy Models for Dry Matter Prediction in Mango Fruit

Philip Ernst: Tweedie Award the Institute of Mathematical Statistics CONTENTS Has Selected Philip A

Abstracts Chemometrics and Analytical Chemistry 2014

Chemometric Study of the Correlation Between Human Exposure to Benzene and Pahs and Urinary Excretion of Oxidative Stress Biomarkers

Sequential Truncation of R-Vine Copula Mixture Model for High- Dimensional Datasets