
DOI:10.1145/3363294 Demystifying the uses of a powerful tool for uncertain information.

An Elementary Introduction to Kalman Filtering

BY YAN PEI, SWARNENDU BISWAS, DONALD S. FUSSELL, AND KESHAV PINGALI

KALMAN FILTERING IS a state estimation technique used in many application areas such as spacecraft navigation, motion planning in robotics, signal processing, and wireless sensor networks because of its ability to extract useful information from noisy data and its small computational and memory requirements.12,20,27–29 Recent work has used Kalman filtering in controllers for computer systems.5,13,14,23

Although many introductions to Kalman filtering are available in the literature,1–4,6–11,17,21,25,29 they are usually focused on particular applications such as robot motion or state estimation in linear systems, making it difficult to see how to apply Kalman filtering to other problems. Other presentations derive Kalman filtering as an application of Bayesian inference, assuming that noise is Gaussian. This leads to the common misconception that Kalman filtering can be applied only if noise is Gaussian.15

Abstractly, Kalman filtering can be seen as a particular approach to combining approximations of an unknown value to produce a better approximation.

key insights
- This article presents an elementary derivation of Kalman filtering, a classic state estimation technique.
- Understanding Kalman filtering is useful for more principled control of computer systems.
- Kalman filtering is used as a black box by many computer scientists.

Suppose we use two devices of different designs to measure the temperature of a CPU core. Because devices are usually noisy, the measurements are likely to differ from the actual temperature of the core. As the devices are of different designs, let us assume that noise affects the two devices in unrelated ways (this is formalized here using the notion of correlation). Therefore, the measurements x1 and x2 are likely to be different from each other and from the actual core temperature xc. A natural question is the following: is there a way to combine the information in the noisy measurements x1 and x2 to obtain a good approximation of the actual temperature xc?

One ad hoc solution is to use the formula 0.5*x1 + 0.5*x2 to take the average of the two measurements, giving them equal weight. Formulas of this sort are called linear estimators because they use a weighted sum to fuse values; for our temperature problem, their general form is β*x1 + α*x2. In this presentation, we use the term estimate to refer to both a noisy measurement and a value computed by an estimator, as both are approximations of unknown values of interest.

Suppose we have additional information about the two devices, say the second one uses more advanced temperature sensing. Because we would have more confidence in the second measurement, it seems reasonable that we should discard the first one, which is equivalent to using the linear estimator 0.0*x1 + 1.0*x2. Kalman filtering tells us that in general, this intuitively reasonable linear estimator is not "optimal;" paradoxically, there is useful information even in the measurement from the lower quality device, and the optimal estimator is one in which the weight given to each measurement is proportional to the confidence we have in the device producing that measurement. Only if we have no confidence whatever in the first device should we discard its measurement.a

a An extended version of this article that includes additional background material and proofs is available.30

The goal of this article is to present the abstract concepts behind Kalman filtering in a way that is accessible to most computer scientists while clarifying the key assumptions, and then show how the problem of state estimation in linear systems can be solved as an application of these general concepts.
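To make the claim about confidence-weighted fusion concrete, the following small Python sketch (not part of the original presentation) simulates the two-device scenario. The device variances and the true temperature are made-up values, and Gaussian noise is used purely because it is convenient to sample; nothing in the argument depends on the noise being Gaussian. The "confidence-weighted" estimator gives each measurement a weight proportional to our confidence in its device, taken here to be the inverse of its noise variance as formalized in the next section.

    import random

    TRUE_TEMP = 60.0            # ground truth x_c (hypothetical value)
    SIGMA1, SIGMA2 = 3.0, 1.0   # noise std. deviations of devices 1 and 2 (hypothetical)
    TRIALS = 100_000

    def mse(beta, alpha):
        """Mean square error of the linear estimator beta*x1 + alpha*x2."""
        total = 0.0
        for _ in range(TRIALS):
            x1 = random.gauss(TRUE_TEMP, SIGMA1)  # noisy reading from device 1
            x2 = random.gauss(TRUE_TEMP, SIGMA2)  # noisy reading from device 2
            total += (beta * x1 + alpha * x2 - TRUE_TEMP) ** 2
        return total / TRIALS

    v1, v2 = SIGMA1 ** 2, SIGMA2 ** 2
    estimators = [
        ("ad hoc average (0.5, 0.5)", 0.5, 0.5),
        ("discard device 1 (0.0, 1.0)", 0.0, 1.0),
        # weights proportional to confidence, i.e., inversely proportional to variance
        ("confidence-weighted", v2 / (v1 + v2), v1 / (v1 + v2)),
    ]
    for name, beta, alpha in estimators:
        print(f"{name:30s} MSE ~= {mse(beta, alpha):.3f}")

Running the sketch shows the confidence-weighted combination has a lower mean square error than both the ad hoc average and the estimator that discards the noisier device, illustrating that the lower quality measurement still carries useful information.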


First, the informal ideas discussed here are formalized using the notions of distributions and random samples from distributions. Confidence in estimates is quantified using the variances and covariances of these distributions.b Two algorithms are described next. The first one shows how to fuse estimates (such as core temperature measurements) optimally, given a reasonable definition of optimality. The second addresses a problem that arises frequently in practice: estimates are vectors (for example, the position and velocity of a robot), but only a part of the vector can be measured directly; in such a situation, how can an estimate of the entire vector be obtained from an estimate of just a part of that vector? The best linear unbiased estimator (BLUE) is used to solve this problem.16,19,26 It is shown that the Kalman filter can be derived in a straightforward way by using these two algorithms to solve the problem of state estimation in linear systems. The extended Kalman filter and unscented Kalman filter, which extend Kalman filtering to nonlinear systems, are described briefly at the end of the article.

b Basic concepts such as probability density function, mean, expectation, variance, and covariance are introduced in the online appendix.

Formalizing Estimates
Scalar estimates. To model the behavior of devices producing noisy temperature measurements, we associate each device i with a random variable that has a probability density function (pdf) pi(x) such as the ones shown in Figure 1 (the x-axis in this figure represents temperature). Random variables need not be Gaussian.c Obtaining a measurement from device i corresponds to drawing a random sample from the distribution for that device. We write xi ∼ pi(µi, σi²) to denote that xi is a random variable with pdf pi whose mean and variance are µi and σi², respectively; following convention, we use xi to represent a random sample from this distribution as well.

c The role of Gaussians in Kalman filtering is discussed later in the article.

Means and variances of distributions model different kinds of inaccuracies in measurements. Device i is said to have a systematic error or bias in its measurements if the mean µi of its distribution is not equal to the actual temperature xc (in general, to the value being estimated, which is known as ground truth); otherwise, the instrument is unbiased. Figure 1 shows pdfs for two devices that have different amounts of systematic error. The variance σi², on the other hand, is a measure of the random error in the measurements. The impact of random errors can be mitigated by taking many measurements with a given device and averaging their values, but this approach will not reduce systematic error.

Figure 1. Using pdfs p1(x) and p2(x) to model devices with systematic and random errors. Ground truth is 60°C. Dashed lines are means of pdfs.

In the formulation of Kalman filtering, it is assumed that measuring devices do not have systematic errors. However, we do not have the luxury of taking many measurements of a given state, so we must take into account the impact of random error on a single measurement. Therefore, confidence in a device is modeled formally by the variance of the distribution associated with that device; the smaller the variance, the higher our confidence in the measurements made by the device. In Figure 1, the fact that we have less confidence in the first device has been illustrated by making p1 more spread out than p2, giving it a larger variance.

The informal notion that noise should affect the two devices in "unrelated ways" is formalized by requiring that the corresponding random variables be uncorrelated. This is a weaker condition than requiring them to be independent, as explained in our online appendix (http://dl.acm.org/citation.cfm?doid=3363294&picked=formats). Suppose we are given the measurement made by one of the devices (say x1) and we have to guess what the other measurement (x2) might be. If knowing x1 does not give us any new information about what x2 might be, the random variables are independent. This is expressed formally by the equation p(x2|x1) = p(x2); intuitively, knowing the value of x1 does not change the pdf for the possible values of x2. If the random variables are only uncorrelated, knowing x1 might give us new information about x2, such as restricting its possible values, but the mean of x2|x1 will still be µ2. Using expectations, this can be written as E[x2|x1] = E[x2], which is equivalent to requiring that E[(x1−µ1)(x2−µ2)], the covariance between the two variables, be equal to zero. This is obviously a weaker condition than independence.

Although the discussion in this section has focused on measurements, the same formalization can be used for estimates produced by an estimator. Lemma 1(i) shows how the mean and variance of a linear combination of pairwise uncorrelated random variables can be computed from the means and variances of the random variables.18 The mean and variance can be used to quantify bias and random errors for the estimator as in the case of measurements. An unbiased estimator is one whose mean is equal to the unknown value being estimated, and it is preferable to a biased estimator with the same variance. Only unbiased estimators are considered in this article. Furthermore, an unbiased estimator with a smaller variance is preferable to one with a larger variance, as we would have more confidence in the estimates it produces. As a step toward generalizing this discussion to estimators that produce vector estimates, we refer to the variance of an unbiased scalar estimator as the mean square error of that estimator, or MSE for short.

Lemma 1(ii) asserts that if a random variable is pairwise uncorrelated with a set of random variables, it is uncorrelated with any linear combination of those variables.

Lemma 1. Let x1 ∼ p1(µ1, σ1²), ..., xn ∼ pn(µn, σn²) be a set of pairwise uncorrelated random variables. Let y = α1*x1 + ... + αn*xn be a random variable that is a linear combination of the xi's.

(i) The mean and variance of y are:

    µy = α1*µ1 + ... + αn*µn    (1)

    σy² = α1²*σ1² + ... + αn²*σn²    (2)

(ii) If random variable xn+1 is pairwise uncorrelated with x1, ..., xn, it is uncorrelated with y.
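The formulas in Lemma 1(i) are easy to check numerically. The Python sketch below is not from the article; the means, variances, and coefficients are made-up, and the samples are drawn independently, which is stronger than the pairwise uncorrelatedness the lemma actually requires but is the simplest condition to simulate.

    import random

    mus    = [60.0, 58.0, 63.0]   # means mu_i (made-up)
    sigmas = [1.0, 2.0, 0.5]      # standard deviations sigma_i (made-up)
    alphas = [0.2, 0.5, 0.3]      # coefficients alpha_i of the linear combination
    TRIALS = 200_000

    # Draw samples of y = alpha_1*x_1 + ... + alpha_n*x_n.
    samples = []
    for _ in range(TRIALS):
        xs = [random.gauss(m, s) for m, s in zip(mus, sigmas)]
        samples.append(sum(a * x for a, x in zip(alphas, xs)))

    emp_mean = sum(samples) / TRIALS
    emp_var  = sum((y - emp_mean) ** 2 for y in samples) / TRIALS

    pred_mean = sum(a * m for a, m in zip(alphas, mus))             # Equation (1)
    pred_var  = sum(a * a * s * s for a, s in zip(alphas, sigmas))  # Equation (2)

    print(f"mean: empirical {emp_mean:.3f}, predicted {pred_mean:.3f}")
    print(f"variance: empirical {emp_var:.3f}, predicted {pred_var:.3f}")

Up to sampling error, the empirical mean and variance of y match the values predicted by Equations (1) and (2).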


Vector estimates. In some applications, estimates are vectors. For example, the state of a mobile robot might be represented by a vector containing its position and velocity. Similarly, the vital signs of a person might be represented by a vector containing his temperature, pulse rate, and blood pressure. Here, we denote a vector by a boldfaced lowercase letter, and a matrix by an uppercase letter.

The covariance matrix ∑xx of a random variable x is the matrix E[(x − µx)(x − µx)T], where µx is the mean of x.

Fusing Scalar Estimates
We now consider the problem of choosing the optimal values of the parameters α and β in the linear estimator β*x1 + α*x2 for fusing two estimates x1 and x2 from uncorrelated scalar-valued random variables. The first reasonable requirement is that if the two estimates x1 and x2 are equal, fusing them should produce the same value. This implies that α + β = 1. Therefore, the linear estimators of interest are of the form (1−α)*x1 + α*x2.
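As a small illustration of where this constraint leads, the sketch below (not from the article; the device variances are hypothetical) uses Equation (2) of Lemma 1 to tabulate the variance of the fused estimate (1−α)*x1 + α*x2 as α varies. Every member of this family is unbiased when x1 and x2 are, since by Equation (1) its mean is (1−α)*µ + α*µ = µ; the sweep shows the variance is minimized strictly between α = 0 and α = 1, which is the "paradox" noted in the introduction.

    def fused_variance(v1, v2, alpha):
        """Variance of (1-alpha)*x1 + alpha*x2 for uncorrelated x1, x2 (Equation (2))."""
        return (1.0 - alpha) ** 2 * v1 + alpha ** 2 * v2

    v1, v2 = 9.0, 1.0   # hypothetical variances of devices 1 and 2
    for alpha in [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]:
        print(f"alpha = {alpha:4.2f}  variance = {fused_variance(v1, v2, alpha):6.3f}")

Neither the plain average (α = 0.5) nor discarding the noisier device (α = 1.0) gives the smallest variance; the article goes on to derive the optimal α.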