
Stochastic Approximation for Optimal Observer Trajectory Planning

Sumeetpal Singh^a, Ba-Ngu Vo^a, Arnaud Doucet^b, Robin Evans^a

^a Department of Electrical and Electronic Engineering, The University of Melbourne, VIC 3010, Australia. {ssss,b.vo,r.evans}@ee.mu.edu.au

^b Signal Processing Group, Engineering Department, University of Cambridge, Cambridge CB2 1PZ, UK. [email protected]

This work was supported in part by CENDSS, ARC and the Defense Advanced Research Projects Agency of the US Department of Defense and was monitored by the Office of Naval Research under Contract No. N00014-02-1-0802.

Abstract— A maneuvering target is to be tracked based on noise-corrupted measurements of the target's state that are received by a moving observer. Additionally, the quality of the target state observations can be improved by the appropriate positioning of the observer relative to the target during tracking. The bearings-only tracking problem is an example of this scenario. The question of optimal observer trajectory planning naturally arises, i.e. how should the observer manoeuvre relative to the target in order to optimise the tracking performance? In this paper, we formulate this problem as a discrete-time stochastic optimal control problem and present a novel stochastic approximation algorithm for designing the observer trajectory. Numerical examples are presented to demonstrate the utility of the proposed methodology.

I. INTRODUCTION

Consider the problem of tracking a maneuvering target for N epochs, and let X_k denote the state of the target at epoch k. At each epoch, a sensor provides a noisy (possibly nonlinear) partial observation of the target state Y_k = g(X_k, A_k, V_k), where V_k denotes noise. We have denoted by A_k some parameter of the sensor that may be adjusted to improve the "quality" of the observation Y_k. In tracking, one is interested in computing the probability density π_k of the target state X_k given the sequence of observations {Y_1, ..., Y_k} received and sensor parameters {A_1, ..., A_k} until epoch k; π_k is known as the filtering density. The adaptive optimal tracking problem is to minimise

\[
\mathbb{E}\Big\{ \sum_{k=1}^{N} \gamma^{k} \big( \phi(X_k) - \langle \phi, \pi_k \rangle \big)^{2} \Big\}
\tag{1}
\]

with respect to (w.r.t.) the choice of sensors {A_1, ..., A_N}. Here γ ∈ (0, 1] is known as the discount factor. The term adaptive implies that the sensor parameter at time k, A_k, is to be selected based on all information available prior to time k, which is {Y_1, A_1, ..., Y_{k-1}, A_{k-1}}. The criterion in (1) is the discrepancy between the true state X_k and the estimate ⟨φ, π_k⟩, measured through a suitable function φ. Problem (1), which is very general, is also known as a Partially Observed Markov Decision Process (POMDP). In this paper, we are concerned with the scenario where a moving platform (or observer) is to be adaptively maneuvered to optimise the tracking performance of a maneuvering target; thus A_k denotes the position of the observer at epoch k. This problem is termed the optimal observer trajectory planning (OTP) problem.

Literature review: When the dynamics of the target and the observation process are linear and Gaussian, the optimal solution to problem (1) (when φ gives a quadratic cost) can be computed off-line, i.e. open and closed loop control yield the same performance [8]. A linear Gaussian target and a bearings-only observer (non-linear observation) was studied in [6], where a sub-optimal observer trajectory was computed using a linearised observation equation. The possible observer paths were discretised and a Dynamic Programming (DP) algorithm was provided to compute the optimal discretised path. In [11], the authors only assumed Markovian target motion and bearings measurements. By discretising the target and observation equation, the OTP problem was posed as a POMDP and a sub-optimal closed-loop controller was obtained.

Contributions: 1) The methodology proposed in this paper does not assume linear Gaussian dynamics for the target and observation. We only assume a Markovian target and an observation process that admits a differentiable density; see Section II for a precise statement. Algorithms are presented in full generality such that they may be applied to solve any problem of the form (1), subject to the assumptions stated above, and not just the trajectory planning problem. 2) In this paper we use Stochastic Approximation (SA) to construct the optimal observer trajectory. Being an iterative gradient descent method, SA requires estimates of the gradient of the performance criterion (1) w.r.t. the observer trajectory. In our general setting, there is no closed-form expression for π_k, the performance criterion (1) or its gradient w.r.t. A_k. We demonstrate how low variance estimates of the gradient may be obtained by using Sequential Monte Carlo (SMC), aka Particle Filters, and the control variate approach. 3) Because we do not use discretisation, the SA algorithm presented in this paper solves the open-loop version of problem (1) (nearly) exactly. To obtain a closed-loop controller, we use an open-loop feedback policy. Note that in general, even the open-loop version of problem (1) cannot be solved exactly because a closed-form expression is not available; only an approximate solution is possible via discretisation and DP, and an iterative gradient method appears to be the only way to solve it (nearly) exactly. (Detailed comments are provided in Section II-A.) Although we demonstrate convergence of the SA (main) algorithm for observer trajectory design in numerical examples, its theoretical convergence has not been established and is the subject of future research.

II. PROBLEM FORMULATION

Let X_k, A_k and Y_k respectively denote the state of the target, the position of the observer and the observation (target state measurement) received at time k, where k denotes a discrete-time index. The target state {X_k}_{k≥0} is an unobserved Markov process with initial distribution and transition law given by

\[
X_0 \sim \pi_0, \qquad X_{k+1} \sim p(\,\cdot \mid X_k),
\tag{2}
\]

respectively. (The symbol "∼" means distributed according to.) The transition density p does not depend on the observer motion. The observation process {Y_k}_{k≥0} is generated according to the state and observer dependent probability density

\[
Y_k \sim q(\,\cdot \mid X_k, A_k).
\tag{3}
\]

We assume that the observation process admits a density and that this density is differentiable w.r.t. the second conditioning argument, namely A_k. We place no restrictions on the target motion model except that the target motion is Markovian. Note that we do not assume a linear Gaussian model.
In this paper, we concentrate on the jump Markov linear model (JMLM) for maneuvering targets and a bearings-only measurement process. For this application of the above general framework, the components of the state are

\[
X_k = [r_{x,k}, v_{x,k}, r_{y,k}, v_{y,k}, \theta_k]^{T} \in \mathbb{R}^{4} \times \Theta =: \mathcal{X},
\tag{4}
\]

where (r_{x,k}, r_{y,k}) denotes the target's (Cartesian) coordinates, (v_{x,k}, v_{y,k}) denotes the target velocity in the x and y direction and θ_k denotes the model (mode) of the target, which belongs to the finite set Θ. (The subscript k indicates time k.) The state of the target is comprised of continuous and discrete valued variables and is used to model maneuvering targets: the target switches between models, as indicated by θ_k, e.g. between constant velocity manoeuvres. Let X̃_k ∈ R^4 be the continuous valued components of X_k, i.e. X_k = [X̃_k^T, θ_k]^T. In the JMLM, {θ_k}_{k≥0} is a time-homogeneous Markov chain while

\[
\tilde{X}_{k+1} = F(\theta_{k+1})\,\tilde{X}_k + G(\theta_{k+1})\,W_k,
\tag{5}
\]

where {W_k}_{k≥0} is a sequence of independent and identically distributed (i.i.d.) Gaussian random vectors taking values in R^2, W_k ∼ N(0, Q), which represents the uncertainty of the acceleration in the x and y direction.

The state of the observer is similarly defined:

\[
X^{o}_{k} = [r^{o}_{x,k}, v^{o}_{x,k}, r^{o}_{y,k}, v^{o}_{y,k}]^{T}
\tag{6}
\]

with

\[
A_k = [r^{o}_{x,k}, r^{o}_{y,k}]^{T}.
\tag{7}
\]

The observation process {Y_k}_{k≥0} is generated according to the state and observer dependent probability density (3). In the context of bearings-only tracking,

\[
Y_k = \arctan\!\Big( \frac{r_{x,k} - A_k(1)}{r_{y,k} - A_k(2)} \Big) + V_k,
\tag{8}
\]

where V_k ∼ N(0, σ_Y^2) i.i.d., in which case

\[
q(y \mid X_k, A_k) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\!\left( - \frac{ \Big( y - \arctan\!\big( \tfrac{r_{x,k} - A_k(1)}{r_{y,k} - A_k(2)} \big) \Big)^{2} }{ 2\sigma_Y^{2} } \right).
\tag{9}
\]

(z(i) denotes the i-th component of the vector z.)

In this paper, we will assume a simple linear model for the observer motion X^o_k, k = 0, 1, ..., namely

\[
X^{o}_{k+1} = F^{o} X^{o}_{k} + G^{o} U_{k+1},
\tag{10}
\]

where U_{k+1} := [u_{x,k+1}, u_{y,k+1}]^T and X^o_0 is given. The variables u_{x,k+1} and u_{y,k+1} denote the acceleration in the x and y direction respectively and are the subject of control.
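To make the bearings-only measurement model concrete, the short Python sketch below simulates a bearing from (8) and evaluates the density (9). It is a minimal illustration only; the function names and the numerical values (noise level, positions) are assumptions of this sketch and do not come from the paper.

```python
import numpy as np

SIGMA_Y = 0.1  # bearing noise standard deviation (illustrative value only)

def simulate_bearing(target_pos, observer_pos, rng):
    """Draw a noisy bearing Y_k as in (8): arctan of the relative position plus Gaussian noise."""
    rx, ry = target_pos
    mean_bearing = np.arctan2(rx - observer_pos[0], ry - observer_pos[1])
    return mean_bearing + SIGMA_Y * rng.standard_normal()

def bearing_density(y, target_pos, observer_pos):
    """Evaluate q(y | X_k, A_k) in (9): a Gaussian density in the bearing error."""
    rx, ry = target_pos
    mean_bearing = np.arctan2(rx - observer_pos[0], ry - observer_pos[1])
    return np.exp(-(y - mean_bearing) ** 2 / (2 * SIGMA_Y ** 2)) / (np.sqrt(2 * np.pi) * SIGMA_Y)

rng = np.random.default_rng(0)
y = simulate_bearing(target_pos=(3.0, 4.0), observer_pos=(0.0, 0.0), rng=rng)
print(y, bearing_density(y, (3.0, 4.0), (0.0, 0.0)))
```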

We now state the optimal OTP problem. Assume a suitable function φ : R^4 → R is given (e.g., φ could pick out a component of interest of the state vector). Consider a fixed initial observer state X^o_0 and initial target distribution π_0, as well as a fixed sequence of accelerations U_{1:N} = {U_1, U_2, ..., U_N}. The observer trajectory is completely determined, using (7) and (10), once X^o_0 and U_{1:N} are given, and

\[
A_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\Big[ (F^{o})^{k} X^{o}_{0} + \sum_{i=1}^{k} (F^{o})^{k-i} G^{o} U_{i} \Big].
\tag{11}
\]

The optimal OTP problem is to solve

\[
\min_{A_{1:N}} \; \mathbb{E}_{(\pi_0, A_{1:N})} \Big\{ \sum_{k=1}^{N} \gamma^{k} \big( \phi(X_k) - \langle \phi, \pi_k \rangle \big)^{2} \Big\}
\quad \text{s.t. (11)}, \; U_k \in \mathcal{U}_k, \; 1 \le k \le N,
\tag{12}
\]

where N is the horizon of interest, γ ∈ (0, 1) is a discount factor, π_k is the filtering density at time k (see (13) below), ⟨φ, π_k⟩ denotes ∫ φ(x) π_k(x) dx, and 𝒰_k ⊂ R^2. Note that we are solving for A_{1:N} in terms of the accelerations U_k = [u_{x,k}, u_{y,k}]^T. The set 𝒰_k bounds the x and y components of the acceleration as determined by the physical limitations of the observer. Note that we are assuming a known (fixed) initial observer state X^o_0 and target distribution π_0; hence the subscript (π_0, A_{1:N}) in the expectation operator. (See (16) below for details of the probability density w.r.t. which the expectation E_{(π_0,A_{1:N})} is taken.)

The aim is to optimise tracking performance with respect to the observer trajectory described by the variables A_{1:N}. The observer locations A_{1:N} are completely determined once X^o_0 and U_{1:N} are specified, (11). The independent variables of the optimisation problem are U_{1:N}, which themselves are subject to bounds.

Feedback control: The OTP problem stated in (12) is an open-loop stochastic control problem. In order to utilise feedback, we will use the open-loop feedback control (OLFC) approach [1]. Let U*_{1:N} be the solution to (12). The control applied at epoch 1 is then U*_1, for which an observation Y_1 is received and the filtering density is updated to π_1. At epoch k, 1 < k ≤ N, let π_{k-1} be the filtering density (density of X_{k-1}) corresponding to all controls taken and observations received up to (and including) epoch k − 1, {U*_1, Y_1, ..., U*_{k-1}, Y_{k-1}}, and let the state of the observer be X^o_{k-1}. The filtering density satisfies the Bayes recursion

\[
\pi_k(x) = \frac{ q(Y_k \mid x, A_k) \int p(x \mid x')\, \pi_{k-1}(x')\, dx' }{ \int q(Y_k \mid x, A_k) \int p(x \mid x')\, \pi_{k-1}(x')\, dx'\, dx }.
\tag{13}
\]

Now solve problem (12) for initial target distribution π_{k-1}, observer state X^o_{k-1} and horizon N − k, and let the solution (with an abuse of notation) be denoted by U*_{k:N}. The control for epoch k is U*_k. This procedure is repeated until epoch N. The propagation of the filtering density π_{k-1} can be done using SMC as detailed in [2].

Certain key issues concerning problem (12) are now discussed. Firstly, we place no restrictions on the target motion model except as stated in (2), i.e. the target motion is Markovian; we do not assume a linear Gaussian model. The specific model in (5) is used only to provide a concrete scenario for the numerical example. Similarly, we do not assume a linear Gaussian observation process. The only requirement on the observation process is that it has a known density (3) which is differentiable w.r.t. the second conditioning argument (A_k). The bearings-only case in (9) is one example of an observation density that satisfies these assumptions. Under these general conditions, the filtering density π_k cannot be represented in closed form. Hence, we will resort to an SMC method [2] in order to evaluate π_k and the criterion in (12). The algorithms in this paper are presented in enough generality such that they may be applied to solve any problem of the form (12), subject to the assumptions stated above.
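Since π_k in (13) is not available in closed form, it is propagated numerically with SMC [2]. The sketch below shows one simple SMC scheme (a bootstrap-type filter) for a generic model of the form (2)-(3); it is only an illustration, and the arguments transition_sample and obs_density are placeholders standing in for the user-supplied densities p and q, not code from the paper.

```python
import numpy as np

def particle_filter_step(particles, weights, y, a, transition_sample, obs_density, rng):
    """One step of the Bayes recursion (13) with particles.

    particles         : array of particle states approximating pi_{k-1}
    weights           : corresponding normalised weights
    y, a              : current observation Y_k and observer position A_k
    transition_sample : function (particles, rng) -> new particles, sampling p(.|x)
    obs_density       : function (y, particles, a) -> q(y | x, a) evaluated per particle
    """
    particles = transition_sample(particles, rng)              # prediction through p(.|x)
    weights = weights * obs_density(y, particles, a)            # correction by q(Y_k | x, A_k)
    weights = weights / np.sum(weights)
    idx = rng.choice(len(particles), size=len(particles), p=weights)  # multinomial resampling
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```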
Finally, we remark that the gradient method proposed can be used to solve general state space POMDPs where the choice of control also influences the dynamics of the state process, as was done in [3].

A. Motivation for Gradient Methods and OLFC

Problem (12) is a Partially Observed Markov Decision Process (POMDP) with a continuous state space. Because of the absence of a closed-form expression, standard state-space discretisation appears to be the only way towards numerical implementation. In discretisation, the target, observation and control state-space is discretised to obtain a finite POMDP. This was the approach adopted in [11]. As an example, consider a target state comprised of a continuous and a discrete component, (x, θ) ∈ R^d × Θ. If we discretise each component of x to L values, then the discretised target state-space has L^d |Θ| elements! The observation and control spaces also need to be discretised. The problem is compounded even further by observer path constraints. The previous location of the observer has to be augmented to the state descriptor to yield states (x_k, θ_k, A_{k-1}), where A_{k-1} is the location of the observer at the previous epoch k − 1. However, observer trajectory constraints are handled nicely by a gradient method, as one only needs to do a projection of the computed gradient to ensure satisfaction of the path constraints; see Section III-A.

Although the solution for a finite POMDP can be obtained exactly, it is often too computationally intensive. It is well known that the controller (or policy) of a finite horizon finite POMDP is characterised by a finite set of vectors [10]. However, the number of vectors needed to characterise the optimal policy grows exponentially with the horizon, and computing these vectors may be computationally infeasible for values of N as small as 5. In practice, various pruning methods are employed [7], which effectively amounts to another level of discretisation.

The open-loop problem (12) cannot be solved exactly because a closed-form expression for the criterion is not available. Only an approximate solution is possible via discretisation (as described above) and DP. An iterative gradient method appears to be the only way to solve it (nearly) exactly.

III. GRADIENT OF THE OTP CRITERION

Stochastic approximation is a stochastic gradient descent method. Let J(A_{1:N}) denote the cost function to be minimised in (12). We will use the following iterative procedure

\[
A^{(k+1)}_{1:N} = A^{(k)}_{1:N} - \epsilon_k \Big( \nabla J(A_{1:N}) \big|_{A_{1:N} = A^{(k)}_{1:N}} + \text{noise} \Big),
\tag{14}
\]

where ∇J(A_{1:N}) denotes the gradient of J w.r.t. A_{1:N}. (Actually, we have to perform a projection step in (14) to ensure the constraints are satisfied.) Once again, we do not have a closed-form expression for ∇J for the same reasons as for J: the filtering density π_k and integration with respect to it cannot be evaluated in closed form in our general setting. In this section we will show how one may obtain asymptotically unbiased estimates of ∇J, denoted ∇̂J, via the so-called score-function method [9]. The score-function method gives an unbiased estimate of ∇J but unfortunately with high variance. We propose an adaptive control variate method to reduce the variance of ∇̂J by coupling with (14) a second stochastic iteration that estimates the optimal control variate for ∇J. We then demonstrate in numerical examples that the variance of ∇̂J is reduced by several orders of magnitude and that the convergence of (14) to the minimising solution is rapid.
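The projection step mentioned after (14), and the handling of path constraints discussed in the preceding subsection, are particularly simple when each constraint set 𝒰_k is a box of acceleration bounds, the natural case for physical limits on the observer. The sketch below (with an assumed symmetric bound u_max, not a value from the paper) clips each component, which is exactly the Euclidean projection onto such a box.

```python
import numpy as np

def project_accelerations(U, u_max):
    """Project an acceleration sequence U (shape N x 2) onto the box |u| <= u_max, componentwise."""
    return np.clip(U, -u_max, u_max)

# Example: the second acceleration violates the bound and gets clipped.
U = np.array([[0.5, -0.2], [3.0, 0.1], [-0.2, 0.0]])
print(project_accelerations(U, u_max=1.0))
```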
A. Derivation of the OTP Criterion Gradient

In this section, we derive the gradient of the cost function (12) with respect to A_{1:N}. Henceforth, without loss of generality, we assume a fixed initial observer state X^o_0 and initial target distribution π_0. Also, to simplify exposition, we consider the problem of optimising the tracking performance at time N only. Define the mapping J : (R^2)^N → R as

\[
J(A_{1:N}) := \mathbb{E}_{(\pi_0, A_{1:N})} \big\{ \big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} \big\}.
\tag{15}
\]

We call J the OTP criterion, and the omission of (π_0, X^o_0) in the definition of J follows from the opening sentences of this paragraph. For any integrable function h : 𝒳^N × (R^2)^N × R^N → R,

\[
\mathbb{E}_{(\pi_0, A_{1:N})} \{ h(X_{1:N}, A_{1:N}, Y_{1:N}) \} = \int h(x_{1:N}, A_{1:N}, y_{1:N}) \prod_{i=1}^{N} q(y_i \mid x_i, A_i)\, p(x_i \mid x_{i-1})\, \pi_0(x_0)\, dx_{1:N}\, dy_{1:N}\, dx_0.
\tag{16}
\]

The integral in (16) follows by definition of the dynamics in (2) and (3). Note that the integrand in (15) is indeed some function with the same domain as h above; this follows since π_k (13) is a function of (A_{1:k}, Y_{1:k}) only. In the unconstrained setting, we solve

\[
\min_{A_{1:N} \in (\mathbb{R}^2)^N} J(A_{1:N})
\tag{17}
\]

instead of (12), since the observer can move between any 2 points on the xy-plane in one epoch.

For a function f : R^n → R with argument z ∈ R^n, we denote (∂f/∂z(i))(z) by ∇_{z(i)} f(z). For a fixed (y, x) ∈ R × 𝒳, consider the mapping a ∈ R^2 → q(y | x, a) ∈ R. Define the score [9] as follows:

\[
S_i(y, x, a) := \frac{ \nabla_{a(i)} q(y \mid x, a) }{ q(y \mid x, a) }, \qquad i \in \{1, 2\}.
\tag{18}
\]

For any A_{1:N} ∈ (R^2)^N, it may be shown, by taking the derivative inside the integral in (15) and applying the product rule, that

\[
\nabla_{i,j} J(A_{1:N}) := \nabla_{A_i(j)} J(A_{1:N}) = \mathbb{E}_{(\pi_0, A_{1:N})} \big\{ \big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} S_j(Y_i, X_i, A_i) \big\},
\tag{19}
\]

for j ∈ {1, 2}, i = 1, 2, ..., N. We define

\[
\nabla J(A_{1:N}) := \Big[ \big[ \nabla_{1,1} J(A_{1:N}), \nabla_{1,2} J(A_{1:N}) \big]^{T}, \ldots, \big[ \nabla_{N,1} J(A_{1:N}), \nabla_{N,2} J(A_{1:N}) \big]^{T} \Big]^{T}.
\tag{20}
\]

Let the mapping from U_{1:N} to A_{1:N} in (11) be denoted H, i.e. A_{1:N} = H(U_{1:N}). Problem (12) can be written as

\[
\min_{U_{1:N} \in (\mathbb{R}^2)^N} J(H(U_{1:N})), \quad \text{s.t. } U_k \in \mathcal{U}_k, \; 1 \le k \le N.
\tag{21}
\]

If ∇J(A_{1:N}) is known, then the gradient of J w.r.t. U_{1:N} may be obtained by an application of the chain rule, since H is known precisely. Therefore, we concentrate on the problem of deriving a low variance estimator for ∇J(A_{1:N}) only.

B. Monte Carlo (MC) Approximation to ∇J

The gradient in (19) cannot be computed analytically and we will resort to MC. By Bayes' rule, the density of X_{0:N} given Y_{1:N} and A_{1:N} is

\[
\pi_{0:N}(x_{0:N}) = \frac{ \prod_{i=1}^{N} q(Y_i \mid x_i, A_i)\, p(x_i \mid x_{i-1})\, \pi_0(x_0) }{ \int \prod_{i=1}^{N} q(Y_i \mid x_i, A_i)\, p(x_i \mid x_{i-1})\, \pi_0(x_0)\, dx_{0:N} }.
\tag{22}
\]

Assume that we have L i.i.d. target trajectory samples {X^{(j)}_{0:N}}_{j=1}^{L}, simulated from ∏_{i=1}^{N} p(x_i | x_{i-1}) π_0(x_0). Then, for any integrable function of interest h : 𝒳^{N+1} → R,

\[
\int h(x_{0:N})\, \pi_{0:N}(x_{0:N})\, dx_{0:N} \approx \int h(x_{0:N})\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N},
\tag{23}
\]

where

\[
\hat{\pi}_{0:N}(x_{0:N}) := \sum_{j=1}^{L} w^{(j)}_{N}\, \delta_{X^{(j)}_{0:N}}(x_{0:N}),
\tag{24}
\]

δ_{X^{(j)}_{0:N}} denotes the Dirac delta-mass located at X^{(j)}_{0:N}, and the w^{(j)}_{N} are the importance weights defined as

\[
w^{(j)}_{N} := \prod_{i=1}^{N} q(Y_i \mid X^{(j)}_{i}, A_i) \Big[ \sum_{j'=1}^{L} \prod_{i=1}^{N} q(Y_i \mid X^{(j')}_{i}, A_i) \Big]^{-1}.
\tag{25}
\]

The MC approximation in (23) is asymptotically unbiased [2]. Henceforth, π̂_{0:N} is to be understood as the MC approximation to the posterior distribution π_{0:N}.
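A minimal sketch of the approximation (23)-(25): trajectories are drawn from the prior dynamics and then weighted by the accumulated observation likelihood. The helper names and the obs_density argument (standing in for q) are assumptions of this sketch, not code from the paper.

```python
import numpy as np

def posterior_weights(trajectories, observations, A, obs_density):
    """Self-normalised importance weights (25) for L prior trajectories of shape (L, N+1)."""
    log_w = np.zeros(trajectories.shape[0])
    for i, (y, a) in enumerate(zip(observations, A), start=1):
        log_w += np.log(obs_density(y, trajectories[:, i], a))  # accumulate log q(Y_i | X_i^(j), A_i)
    w = np.exp(log_w - np.max(log_w))                           # stabilised exponentiation
    return w / np.sum(w)

def filter_estimate(trajectories, weights, phi=lambda x: x):
    """MC estimate of <phi, pi_N> using the terminal marginal of the weighted trajectories (24)."""
    return np.sum(weights * phi(trajectories[:, -1]))
```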
1) Variance Reduction by Conditioning: In order to obtain an unbiased estimate of the gradient ∇_{A_i(j)} J(Ã_{1:N}), we should sample a target trajectory X_{0:N} and the corresponding observations Y_{1:N} according to the law ∏_{i=1}^{N} q(Y_i | x_i, Ã_i) p(x_i | x_{i-1}) π_0(x_0). Then

\[
\big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} S_j(Y_i, X_i, \tilde{A}_i)
\]

is an unbiased estimate of ∇_{A_i(j)} J(Ã_{1:N}). However, since we cannot compute π_N in the general state space setting, we are compelled to use

\[
\big( \phi(X_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} S_j(Y_i, X_i, \tilde{A}_i)
\tag{26}
\]

instead, where π̂_N is the MC approximation to the filtering density, which is the marginal of the MC approximation to the posterior density given in (24). To reduce the variance of (26), we may exploit the fact that var{E(X | Y)} ≤ var{X}, where Y is some other random variable correlated with X. Now, since we have the posterior π̂_{0:N}, we may use

\[
\mathbb{E}_{(\pi_0, \tilde{A}_{1:N})} \big\{ \big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} S_j(Y_i, X_i, \tilde{A}_i) \,\big|\, Y_{1:N} \big\}
= \int \big( \phi(x_N) - \langle \phi, \pi_N \rangle \big)^{2} S_j(Y_i, x_i, \tilde{A}_i)\, \pi_{0:N}(x_{0:N})\, dx_{0:N},
\tag{27}
\]

with π̂_N and π̂_{0:N} replacing π_N and π_{0:N} respectively, as a reduced variance estimate of ∇_{A_i(j)} J(Ã_{1:N}), where Y_{1:N} is the sampled observation trajectory. (Note that ⟨φ, π_N⟩ is a function of Y_{1:N}, i.e. it is σ(Y_{1:N})-measurable.)

2) Variance Reduction by Control Variate: Consider 2 correlated random variables W and Z where E(Z) = 0. We would like to estimate E(W), for which W is an unbiased estimate. For some constant b, the estimator W − bZ is also unbiased. The function

\[
f(b) := \operatorname{var}(W - bZ) = \operatorname{var}(W) - 2b\,\operatorname{cov}(W, Z) + b^{2}\operatorname{var}(Z)
\tag{28}
\]

is convex and is minimised at b* = cov(W, Z)/var(Z), which implies var(W − b*Z) = f(b*) ≤ f(0) = var(W). Thus, the variance of the estimate W − b*Z of E(W) is less than the variance of W. The random variable Z is referred to as the control variate (CV) and we call b* the CV constant.

We now detail the random variables W and Z in the context of the gradient of the OTP criterion (19). Let b_i(j), j ∈ {1, 2}, i = 1, 2, ..., N, be real valued constants. It is straightforward to verify that E_{(π_0, Ã_{1:N})}{ S_j(Y_i, X_i, Ã_i) } = 0. Let

\[
W_i(j) = \mathbb{E}_{(\pi_0, \tilde{A}_{1:N})} \big\{ \big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} S_j(Y_i, X_i, \tilde{A}_i) \,\big|\, Y_{1:N} \big\}
\quad \text{and} \quad
Z_i(j) = \mathbb{E}_{(\pi_0, \tilde{A}_{1:N})} \big\{ S_j(Y_i, X_i, \tilde{A}_i) \,\big|\, Y_{1:N} \big\}.
\]

Thus ∇_{A_i(j)} J(Ã_{1:N}) = E_{(π_0, Ã_{1:N})}[ W_i(j) − b_i(j) Z_i(j) ]. The optimal CV constant is

\[
b^{*}_{i}(j) := \arg\min_{b} \big\{ -2b\,\operatorname{cov}(W_i(j), Z_i(j)) + b^{2}\operatorname{var}(Z_i(j)) \big\}.
\tag{29}
\]

We may solve for b*_i = [b*_i(1), b*_i(2)]^T ∈ R^2, i = 1, 2, ..., N, using stochastic approximation, as presented in the algorithm below.

To summarise the ideas presented in Sections III-B.1 and III-B.2, we now present an algorithm for estimating ∇J and the optimal CV constants b*_i(j) (29).

Algorithm: Estimating ∇J(Ã_{1:N}) and optimal CV constants b*_i, i = 1, 2, ..., N.

Initialisation: Fix π_0, Ã_{1:N}. Set b^{(0)}_i = [0, 0]^T ∈ R^2, i = 1, 2, ..., N. Select a non-increasing positive sequence {γ_k}_{k≥1} satisfying Σ_k γ_k = ∞, Σ_k γ_k^2 < ∞.

Iteration k (k ≥ 1): Sample a target trajectory X_{0:N} and observations Y_{1:N} according to the law ∏_{i=1}^{N} q(Y_i | x_i, Ã_i) p(x_i | x_{i-1}) π_0(x_0) and compute π̂_{0:N}. Update: for j ∈ {1, 2}, i = 1, 2, ..., N,

\[
\nabla J^{(k)}_{i}(j) = \int \Big( \big( \phi(x_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} - b^{(k-1)}_{i}(j) \Big) S_j(Y_i, x_i, \tilde{A}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N},
\tag{30}
\]

\[
b^{(k)}_{i}(j) = b^{(k-1)}_{i}(j) - \gamma_k \Big[ b^{(k-1)}_{i}(j) \Big( \int S_j(Y_i, x_i, \tilde{A}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big)^{2}
- \Big( \int \big( \phi(x_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} S_j(Y_i, x_i, \tilde{A}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big)
\Big( \int S_j(Y_i, x_i, \tilde{A}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big) \Big].
\tag{31}
\]

b^{(k)}_i(j) and ∇J^{(k)}_i(j) are, respectively, the estimates of b*_i(j) and ∇_{A_i(j)} J(Ã_{1:N}) at iteration k. A typical choice for {γ_k}_{k≥1} is γ_k = k^{-α}, where α ∈ (0.5, 1] is a constant. The right-hand side (integral) of ∇J^{(k)}_i(j) is the unbiased estimate of the gradient given in (27), computed using MC. The recursion for b^{(k)}_i(j) is a SA algorithm that estimates the optimal CV constants in (29): the term multiplying γ_k in the right-hand side of (31) is the gradient of the criterion in (28), taken w.r.t. b^{(k-1)}_i(j) and computed using MC. Finally, note that the updates of b^{(k)}_i(j) and ∇J^{(k)}_i(j) for each i and j can be carried out in parallel, as there is no dependence between terms.

The rationale for the above algorithm is as follows. As b^{(k)}_i(j) converges to b*_i(j), the variance of the gradient estimates ∇J^{(k)}_i(j) decreases. We expect (1/k) Σ_{l=1}^{k} ∇J^{(l)}_i(j) to converge (with probability one) more quickly to its limiting value than the empirical average of (30) with b^{(k-1)}_i(j) set to 0, as we are averaging terms that have smaller variance than those of the zero-CV case. Typical performance of the above algorithm is shown in the numerical examples of Section V.
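Before moving to the main algorithm, the following Python sketch shows one update of (30)-(31) for a single epoch i, written for a scalar observer coordinate so that the score has one component. Here w are the self-normalised weights (25), phi_N is φ evaluated at the terminal particle states and s_i the scores evaluated along the particles; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def cv_update(w, phi_N, s_i, b, gamma):
    """One iteration of (30)-(31) for one epoch i and one score component.

    w     : self-normalised posterior weights over the L trajectories, shape (L,)
    phi_N : phi evaluated at the terminal state of each trajectory, shape (L,)
    s_i   : score S(Y_i, x_i, A_i) evaluated along each trajectory, shape (L,)
    b     : current control-variate constant estimate
    gamma : step size gamma_k
    """
    err2 = (phi_N - np.sum(w * phi_N)) ** 2               # (phi(x_N) - <phi, pi_hat_N>)^2
    grad_est = np.sum(w * (err2 - b) * s_i)               # gradient estimate (30)
    W_hat = np.sum(w * err2 * s_i)                        # MC estimate of the conditional mean of W
    Z_hat = np.sum(w * s_i)                               # MC estimate of the conditional mean of Z
    b_new = b - gamma * (b * Z_hat ** 2 - W_hat * Z_hat)  # SA step (31)
    return grad_est, b_new
```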
IV. MAIN ALGORITHM: STOCHASTIC APPROXIMATION FOR OPTIMAL OTP

We present a SA algorithm for solving problem (12) or, equivalently, the reformulated problem (21). Note that the OTP criterion optimised is not the weighted sum of the filtering errors in (12) but rather the filtering error at time N only (15); the same approach applies to the more general criterion in (12). For a sequence of accelerations U_{1:N} ∈ (R^2)^N, let P(U_{1:N}) denote the projection of U_{1:N} onto the feasible set ∏_{k=1}^{N} 𝒰_k.

Algorithm: Stochastic Approximation for Problem (21).

Initialisation: Fix π_0, X^o_0. Pick an initial feasible U^{(0)}_{1:N}. Set b^{(0)}_i = ∇J^{(0)}_i = [0, 0]^T ∈ R^2, i = 1, 2, ..., N. Select non-increasing positive sequences {γ_k}_{k≥1}, {ε_k}_{k≥1} satisfying Σ_k γ_k = ∞, Σ_k γ_k^2 < ∞, Σ_k ε_k = ∞, Σ_k ε_k^2 < ∞, lim sup_k ε_k/γ_k < ∞.

Iteration k (k ≥ 1): Set A^{(k-1)}_{1:N} = H(U^{(k-1)}_{1:N}). Sample a target trajectory X_{0:N} and observations Y_{1:N} according to the law ∏_{i=1}^{N} q(Y_i | x_i, A^{(k-1)}_i) p(x_i | x_{i-1}) π_0(x_0) and compute π̂_{0:N}. Update: for j ∈ {1, 2}, i = 1, 2, ..., N,

\[
\nabla J^{(k)}_{i}(j) = \int \Big( \big( \phi(x_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} - b^{(k-1)}_{i}(j) \Big) S_j(Y_i, x_i, A^{(k-1)}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N},
\tag{32}
\]

\[
b^{(k)}_{i}(j) = b^{(k-1)}_{i}(j) - \gamma_k \Big[ b^{(k-1)}_{i}(j) \Big( \int S_j(Y_i, x_i, A^{(k-1)}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big)^{2}
- \Big( \int \big( \phi(x_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} S_j(Y_i, x_i, A^{(k-1)}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big)
\Big( \int S_j(Y_i, x_i, A^{(k-1)}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N} \Big) \Big],
\tag{33}
\]

followed by

\[
\tilde{U}^{(k)}_{1:N} = U^{(k-1)}_{1:N} - \epsilon_k\, \nabla H(U^{(k-1)}_{1:N})\, \nabla J^{(k)},
\tag{34}
\]

\[
U^{(k)}_{1:N} = P(\tilde{U}^{(k)}_{1:N}).
\tag{35}
\]

In (34), ∇J^{(k)} = [(∇J^{(k)}_1)^T, ..., (∇J^{(k)}_N)^T]^T, which is the estimate of the gradient of the OTP criterion at A^{(k-1)}_{1:N}. Note that the exact expression for ∇H(U^{(k-1)}_{1:N}) is known. A typical choice for the step-sizes is γ_k = k^{-α}, ε_k = k^{-β}, with α, β ∈ (0.5, 1] and β > α, in which case lim_k ε_k/γ_k = 0. Because ε_k tends to 0 more quickly than γ_k, the recursion (34) is said to evolve on a slower time-scale than (33). By having U^{(k)}_{1:N} evolve more slowly than b^{(k)}_i(j), we allow the b^{(k)}_i(j) recursions to "track" the optimal CV constants in (29), which depend on the point at which the gradient ∇J of the OTP criterion is evaluated. In the numerical example presented in Section V, we use constant step-sizes γ_k = γ and ε_k = ε and demonstrate "convergence". For SA in general, decreasing step-sizes are essential for w.p.1 convergence. If fixed step-sizes are used, then we may still have convergence, but now the iterates "oscillate" about their limiting values with variance proportional to the step-size. In our case, theoretical convergence is more difficult to establish since we have 2 coupled SA algorithms using biased estimates of the gradients of interest. See [9] for theoretical convergence issues of basic SA.

V. NUMERICAL EXAMPLE

In the numerical examples below, we demonstrate the variance reduction achieved in the gradient estimator as well as the convergence of the SA algorithm to the solution of the unconstrained OTP problem. Consider the following problem of tracking a target with a one dimensional state using bearings-only measurements:

\[
X_{k+1} = X_k + 1.5 + \sigma_X W_k, \quad X_0 = 0, \qquad
Y_k = \arctan(X_k - A_k) + \sigma_Y V_k,
\tag{36}
\]

where W_k, V_k ∼ N(0, 1) are independent and A_k ∈ R is the position of the observer at epoch k. At each epoch, the target advances its position by an amount equal to the sum of 1.5 and a zero-mean Gaussian random variable. The target evolves on the x-axis and the observation equation corresponds to an observer whose y-position is fixed at 1 but who is allowed to alter its x-position. We assume an otherwise unconstrained observer, and the criterion being minimised is (cf. (17))

\[
J(A_{1:N}) := \mathbb{E}_{(\pi_0, A_{1:N})} \big\{ \big( \phi(X_N) - \langle \phi, \pi_N \rangle \big)^{2} \big\}
\]

w.r.t. A_{1:N} ∈ R^N. All the simulations below were for σ_X = 0.5, σ_Y = 1 and N = 3. The posterior density π_{0:N} was estimated using MC with L = 1000.
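To make the experimental setup concrete, the following Python sketch simulates the model (36) and evaluates the score (18) for it, which is available analytically here because q is Gaussian in the bearing error. It is an illustrative reconstruction with assumed helper names, not the authors' code.

```python
import numpy as np

SIGMA_X, SIGMA_Y, N = 0.5, 1.0, 3

def simulate(A, rng):
    """Simulate target states X_1..X_N and bearings Y_1..Y_N from model (36), with X_0 = 0."""
    x = np.zeros(N + 1)
    y = np.zeros(N)
    for k in range(N):
        x[k + 1] = x[k] + 1.5 + SIGMA_X * rng.standard_normal()
        y[k] = np.arctan(x[k + 1] - A[k]) + SIGMA_Y * rng.standard_normal()
    return x, y

def score(y, x, a):
    """Score (18) for model (36): d/da log q(y | x, a), with q Gaussian in y - arctan(x - a)."""
    return -(y - np.arctan(x - a)) / (SIGMA_Y ** 2 * (1.0 + (x - a) ** 2))

rng = np.random.default_rng(1)
x, y = simulate(A=np.zeros(N), rng=rng)
print(x, y, score(y[0], x[1], 0.0))
```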
We first demonstrate the convergence of the algorithm presented in Section III-B.2 to the optimal CV constants, as well as the reduction of variance in the estimate of the OTP criterion gradient. Figure 1 depicts the result of the simulation for Ã_{1:3} = [0, 0, 0]^T and constant step-size γ_k = 0.1. The simulations were performed for 4500 iterations. Note that the iterates (b^{(k)}_1, b^{(k)}_2, b^{(k)}_3) converge to (0.8, 0.9, 0.9) and then oscillate; the oscillations are due to the constant step-size. The vastly different convergence rates are explained below.

The 4500 realisations of the target trajectory X_{0:N} and observations Y_{1:N} (generated according to the law ∏_{i=1}^{N} q(Y_i | x_i, Ã_i) p(x_i | x_{i-1}) π_0(x_0)), as well as the MC approximation π̂_{0:N}, used to generate Figure 1 were stored. For a fixed b ∈ R and realisation k, define (cf. (30))

\[
\nabla J^{b,(k)}_{i} := \int \Big[ \big( \phi(x_N) - \langle \phi, \hat{\pi}_N \rangle \big)^{2} - b \Big] S(Y_i, x_i, \tilde{A}_i)\, \hat{\pi}_{0:N}(x_{0:N})\, dx_{0:N}.
\tag{37}
\]

The empirical variance var(b, i) := var{ ∇J^{b,(k)}_i : k = 1, ..., 4500 }, i = 1, 2, 3, is plotted in Figure 2 for different values of b ∈ [0, 1.5] on the horizontal axis. As indicated in Figure 2, var(b, 1), var(b, 2) and var(b, 3) are minimised at 0.8, 0.9 and 0.9 respectively, which are precisely the converged values of the iterates (b^{(k)}_1, b^{(k)}_2, b^{(k)}_3). Also, the shallow curve var(b, 3) explains the slow convergence of b^{(k)}_3 in Figure 1. At the converged values of (b^{(k)}_1, b^{(k)}_2, b^{(k)}_3), the variance in the gradient estimate was reduced by more than 100 fold: by factors of approximately 260, 170 and 150 for i = 1, 2 and 3 respectively.

We now demonstrate the convergence of the observer trajectory iterates generated by the algorithm presented in Section IV. Without motion constraints, we execute the following iteration instead of (34):

\[
A^{(k)}_{1:N} = A^{(k-1)}_{1:N} - \epsilon_k\, \nabla J^{(k)}.
\]

The simulations below were performed for N = 3, A^{(0)}_{1:N} = [0, ..., 0]^T and constant step-sizes ε_k = 0.03 and γ_k = 0.1. The iterates A^{(k)}_{1:N} are depicted in Figure 3. In Figure 4, the corresponding CV constant iterates (b^{(k)}_1, b^{(k)}_2, b^{(k)}_3) are plotted. The initial values (b^{(0)}_1, b^{(0)}_2, b^{(0)}_3) were (0.8, 0.9, 0.9), which were the optimal CV constants for the estimator of ∇J at A_{1:3} = [0, 0, 0]^T identified from the previous simulation. The final values of the iterates (b^{(k)}_1, b^{(k)}_2, b^{(k)}_3) were the optimal values for ∇J at A_{1:3} = [1.5, 3, 4.5]^T. The optimal observer locations are (1.5, 3, 4.5), which is precisely the expected trajectory of the target.
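To tie the pieces together, here is a compact, self-contained Python sketch of the unconstrained two-time-scale iteration just described for model (36): a slow gradient step on A_{1:N} coupled with fast tracking of the CV constants. It is an illustrative reconstruction under assumed step sizes, particle number and iteration count, not the authors' implementation.

```python
import numpy as np

SIGMA_X, SIGMA_Y, N, L = 0.5, 1.0, 3, 1000
EPS, GAMMA, ITERS = 0.03, 0.1, 3000     # assumed constant step sizes and iteration count

def score(y, x, a):
    """Score of the bearings density of (36) w.r.t. the observer position a."""
    return -(y - np.arctan(x - a)) / (SIGMA_Y ** 2 * (1.0 + (x - a) ** 2))

rng = np.random.default_rng(0)
A = np.zeros(N)                          # observer positions A_1..A_N
b = np.zeros(N)                          # control-variate constants
for _ in range(ITERS):
    # One realisation of the target and its observations at the current observer positions.
    x_true = np.cumsum(np.full(N, 1.5) + SIGMA_X * rng.standard_normal(N))
    y_obs = np.arctan(x_true - A) + SIGMA_Y * rng.standard_normal(N)
    # L prior trajectories and their self-normalised posterior weights (25).
    X = np.cumsum(np.full((L, N), 1.5) + SIGMA_X * rng.standard_normal((L, N)), axis=1)
    log_w = np.sum(-(y_obs - np.arctan(X - A)) ** 2 / (2 * SIGMA_Y ** 2), axis=1)
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    err2 = (X[:, -1] - np.sum(w * X[:, -1])) ** 2     # (phi(x_N) - <phi, pi_hat_N>)^2, phi = identity
    grad = np.zeros(N)
    for i in range(N):
        s = score(y_obs[i], X[:, i], A[i])
        grad[i] = np.sum(w * (err2 - b[i]) * s)       # gradient estimate, cf. (32)
        W_hat, Z_hat = np.sum(w * err2 * s), np.sum(w * s)
        b[i] -= GAMMA * (b[i] * Z_hat ** 2 - W_hat * Z_hat)   # CV tracking, cf. (33)
    A -= EPS * grad                                    # unconstrained slow-time-scale update
print(A)   # per Section V, the optimum is near the mean target trajectory (1.5, 3, 4.5)
```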

Fig. 1. Convergence of (b_1^{(k)}, b_2^{(k)}, b_3^{(k)}) in (31).

Fig. 2. Empirical variance (37) for different values of the CV constant b.

Fig. 3. Convergence of the observer trajectory.

Fig. 4. Tracking of the optimal CV constants.

VI. CONCLUSION

We have presented a stochastic approximation algorithm for optimal observer trajectory planning. Our approach only requires the target motion to be Markovian. The only restriction on the observation process is that its law (3) admits a "known" density and that this density is differentiable w.r.t. the second conditioning argument (A_k). Under these general conditions, the filtering density π_k cannot be evaluated in closed form. Hence, we resorted to Sequential Monte Carlo methods. We derived the score-function gradient estimator for the performance criterion. We proposed an adaptive control variate method to reduce the variance of ∇̂J by introducing a second stochastic iteration that estimates the optimal control variate for ∇J. We then demonstrated in simple numerical examples that the variance of ∇̂J is reduced by about two orders of magnitude and that convergence of the observer trajectory to the minimising solution is rapid. In future work, we aim to investigate open-loop feedback control for adaptive trajectory optimisation.

VII. REFERENCES

[1] D.P. Bertsekas, Dynamic Programming and Optimal Control. Belmont: Athena Scientific, 1995.
[2] A. Doucet, J.F.G. de Freitas and N.J. Gordon, Sequential Monte Carlo Methods in Practice. New York: Springer, 2001.
[3] A. Doucet, V. Tadić and S. Singh, "Particle methods for average cost control in nonlinear non-Gaussian state-space models," Technical report, Department of Electrical and Electronic Engineering, University of Melbourne, June 2002.
[4] D.J. Kershaw and R.J. Evans, "Optimal waveform selection for target tracking," IEEE Trans. Inform. Theory, vol. 40, no. 5, pp. 1536–1550, Sep. 1994.
[5] O. Hernandez-Lerma, Adaptive Markov Control Processes. New York: Springer, 1989.
[6] A. Logothetis, A. Isaksson and R.J. Evans, "An information theoretic approach to observer path design for bearings-only tracking," in Proc. IEEE Conf. Decision and Control, 1997, pp. 3132–3137.
[7] W.S. Lovejoy, "A survey of algorithmic methods for partially observed Markov decision processes," Annals Oper. Res., vol. 28, pp. 47–66, 1991.
[8] L. Meier, J. Peschon and R.M. Dressler, "Optimal control of measurement systems," IEEE Trans. Automat. Contr., vol. 12, no. 5, pp. 528–536, Oct. 1967.
[9] G.Ch. Pflug, Optimization of Stochastic Models: The Interface Between Simulation and Optimization. Boston: Kluwer, 1996.
[10] R.D. Smallwood and E.J. Sondik, "The optimal control of partially observable Markov processes over a finite horizon," Oper. Res., vol. 21, pp. 1071–1088, 1973.
[11] O. Tremois and J.-P. Le Cadre, "Optimal observer trajectory in bearings-only tracking for manoeuvring sources," IEE Proc. Radar, Sonar Navig., vol. 146, no. 1, pp. 31–39, Feb. 1999.