Chapter 11

Tutorial: The Kalman Filter

Tony Lacey.

11.1 Introduction

The Kalman filter [1] has long been regarded as the optimal solution to many tracking and data prediction tasks [2]. Its use in the analysis of visual motion has been documented frequently. The standard Kalman filter derivation is given here as a tutorial exercise in the practical use of some of the statistical techniques outlined in previous sections. The filter is constructed as a mean squared error minimiser, but an alternative derivation of the filter is also provided, showing how the filter relates to maximum likelihood. Documenting this derivation furnishes the reader with further insight into the statistical constructs within the filter.

The purpose of filtering is to extract the required information from a signal, ignoring everything else. How well a filter performs this task can be measured using a cost or loss function. Indeed we may define the goal of the filter to be the minimisation of this loss function.

11.2 Mean squared error

Many signals can be described in the following way;

    y_k = a_k x_k + n_k                                                         (11.1)

where; y_k is the time dependent observed signal, a_k is a gain term, x_k is the information bearing signal and n_k is the additive noise.

The overall objective is to estimate x_k. The difference between the estimate of x_k, \hat{x}_k, and x_k itself is termed the error;

    f(e_k) = f(x_k - \hat{x}_k)                                                 (11.2)

The particular shape of f(e_k) is dependent upon the application, however it is clear that the function should be both positive and increase monotonically [3]. An error function which exhibits these characteristics is the squared error function;

    f(e_k) = (x_k - \hat{x}_k)^2                                                (11.3)

Since it is necessary to consider the ability of the filter to predict many data over a period of time, a more meaningful metric is the expected value of the error function;

    \text{loss function} = E[f(e_k)]                                            (11.4)

This results in the mean squared error (MSE) function;

    \epsilon(t) = E[e_k^2]                                                      (11.5)
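As a minimal numerical sketch (the signal values and noise level below are assumed for illustration, not taken from the text), the squared error of equation 11.3 averaged over many samples approximates the MSE of equation 11.5 for the signal model of equation 11.1:

    import numpy as np

    # Assumed example values for y_k = a_k * x_k + n_k (eq. 11.1)
    rng = np.random.default_rng(0)
    a, x_true, sigma = 0.9, 2.5, 0.4

    y = a * x_true + sigma * rng.standard_normal(10_000)   # observations y_k
    x_hat = y / a                                           # naive per-sample estimate of x_k
    mse = np.mean((x_true - x_hat) ** 2)                    # E[(x_k - x_hat_k)^2], eq. 11.5

    print(f"empirical MSE = {mse:.4f}, predicted sigma^2/a^2 = {(sigma / a) ** 2:.4f}")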

11.3 Maximum likelihood

The above derivation of mean squared error, although intuitive, is somewhat heuristic. A more rigorous derivation can be developed using maximum likelihood statistics. This is achieved by redefining the goal of the filter to finding the \hat{x} which maximises the probability or likelihood of y. That is;

    \max [P(y | \hat{x})]                                                       (11.6)

Assuming that the additive random noise is Gaussian distributed with a standard deviation of \sigma_k gives;

    P(y_k | \hat{x}_k) = K_k \exp\left( -\frac{(y_k - a_k \hat{x}_k)^2}{2\sigma_k^2} \right)        (11.7)

where K_k is a normalisation constant. The maximum likelihood function of this is;

    P(y | \hat{x}) = \prod_k K_k \exp\left( -\frac{(y_k - a_k \hat{x}_k)^2}{2\sigma_k^2} \right)     (11.8)

Which leads to;

    \log P(y | \hat{x}) = -\frac{1}{2} \sum_k \frac{(y_k - a_k \hat{x}_k)^2}{\sigma_k^2} + \text{constant}        (11.9)

The driving function of equation 11.9 is the MSE, so the likelihood is maximised by the variation of \hat{x}_k which minimises the MSE. Therefore the mean squared error function is applicable when the expected variation of y_k is best modelled as a Gaussian distribution. In such a case the MSE serves to provide the value of \hat{x}_k which maximises the likelihood of the signal y_k.
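A small numerical check (again using assumed example values, not from the text) illustrates the point: the \hat{x} that minimises the summed squared error driving equation 11.9 is the same \hat{x} that maximises the Gaussian log-likelihood:

    import numpy as np

    rng = np.random.default_rng(1)
    a, x_true, sigma = 1.3, -0.7, 0.5
    y = a * x_true + sigma * rng.standard_normal(500)

    candidates = np.linspace(-2.0, 1.0, 2001)
    sse = np.array([np.sum((y - a * c) ** 2) for c in candidates])   # driving term of eq. 11.9
    loglik = -sse / (2.0 * sigma ** 2)                               # log P(y | x_hat) up to a constant

    print("argmin MSE :", candidates[np.argmin(sse)])
    print("argmax logL:", candidates[np.argmax(loglik)])             # the same candidate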

In the following derivation the optimal filter is defined as being that filter, from the set of all possible filters, which minimises the mean squared error.

11.4 Kalman Filter Derivation

Before going on to discuss the Kalman filter the work of Norbert Wiener [4] should first be acknowledged. Wiener described an optimal finite impulse response (FIR) filter in the mean squared error sense. His solution will not be discussed here even though it has much in common with the Kalman filter. Suffice to say that his solution uses both the autocorrelation and the cross correlation of the received signal with the original data, in order to derive an impulse response for the filter.

Kalman also presented a prescription of the optimal MSE filter. However Kalman's prescription has some advantages over Wiener's; it sidesteps the need to determine the impulse response of the filter, something which is poorly suited to numerical computation. Kalman described his filter using state space techniques, which, unlike Wiener's prescription, enables the filter to be used as either a smoother, a filter or a predictor. The latter of these three, the ability of the Kalman filter to be used to predict data, has proven to be a very useful function. It has led to the Kalman filter being applied to a wide range of tracking and prediction problems. Defining the filter in terms of state space methods also simplifies the implementation of the filter in the discrete domain, another reason for its widespread appeal.

11.5 State space derivation

Assume that we want to know the value of a variable within a process of the form;

    x_{k+1} = \Phi x_k + w_k                                                    (11.10)

where; x_k is the state vector of the process at time k, (n x 1); \Phi is the state transition matrix of the process from the state at k to the state at k+1, and is assumed stationary over time, (n x n); w_k is the associated white noise process with known covariance, (n x 1).

Observations on this variable can be modelled in the form;

    z_k = H x_k + v_k                                                           (11.11)

where; z_k is the actual measurement of x at time k, (m x 1); H is the noiseless connection between the state vector and the measurement vector, and is assumed stationary over time, (m x n); v_k is the associated measurement error. This is again assumed to be a white noise process with known covariance and has zero cross-correlation with the process noise, (m x 1).

As was shown in section 11.3, for the minimisation of the MSE to yield the optimal filter it must be possible to correctly model the system errors using Gaussian distributions. The covariances of the two noise models are assumed stationary over time and are given by;

    Q = E[w_k w_k^T]                                                            (11.12)

    R = E[v_k v_k^T]                                                            (11.13)
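To make the dimensions concrete, the sketch below sets up a hypothetical constant-velocity tracking model (an assumed example, not taken from the text) with state vector [position, velocity]^T, so n = 2 and m = 1:

    import numpy as np

    dt = 1.0

    # State transition and measurement matrices for equations 11.10 and 11.11
    Phi = np.array([[1.0, dt],
                    [0.0, 1.0]])          # state transition matrix, n x n
    H = np.array([[1.0, 0.0]])            # measurement matrix, m x n (position is observed)

    # Noise covariances for equations 11.12 and 11.13 (values assumed for illustration)
    q, r = 1e-3, 0.25
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt      ]])   # process noise covariance, E[w w^T]
    R = np.array([[r]])                          # measurement noise covariance, E[v v^T]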

The mean squared error is given by 11.5. This is equivalent to;

    E[e_k e_k^T] = P_k                                                          (11.14)

where; P_k is the error covariance matrix at time k, (n x n).

Equation 11.14 may be expanded to give;

    P_k = E[e_k e_k^T] = E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]                (11.15)

Assuming the prior estimate of \hat{x}_k is called \hat{x}'_k, and was gained by knowledge of the system, it is possible to write an update equation for the new estimate, combining the old estimate with measurement data thus;

    \hat{x}_k = \hat{x}'_k + K_k (z_k - H \hat{x}'_k)                           (11.16)

where; K_k is the Kalman gain, which will be derived shortly. The term z_k - H \hat{x}'_k in eqn. 11.16 is known as the innovation or measurement residual;

    i_k = z_k - H \hat{x}'_k                                                    (11.17)

Substitution of 11.11 into 11.16 gives;

    \hat{x}_k = \hat{x}'_k + K_k (H x_k + v_k - H \hat{x}'_k)                   (11.18)

Substituting 11.18 into 11.15 gives;

    P_k = E\left[ [(I - K_k H)(x_k - \hat{x}'_k) - K_k v_k] [(I - K_k H)(x_k - \hat{x}'_k) - K_k v_k]^T \right]        (11.19)

At this point it is noted that x_k - \hat{x}'_k is the error of the prior estimate. It is clear that this is uncorrelated with the measurement noise and therefore the expectation may be re-written as;

    P_k = (I - K_k H) E[(x_k - \hat{x}'_k)(x_k - \hat{x}'_k)^T] (I - K_k H)^T + K_k E[v_k v_k^T] K_k^T        (11.20)

Substituting equations 11.13 and 11.15 into 11.19 gives;

    P_k = (I - K_k H) P'_k (I - K_k H)^T + K_k R K_k^T                          (11.21)

where P'_k is the prior estimate of P_k.
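Equation 11.21 holds for any gain K_k, not only the optimal one derived below; a minimal numpy sketch of it as a helper function (the function name is assumed, not from the text):

    import numpy as np

    # Covariance update of equation 11.21, valid for an arbitrary gain K
    def covariance_update(P_prior, K, H, R):
        I = np.eye(P_prior.shape[0])
        A = I - K @ H
        return A @ P_prior @ A.T + K @ R @ K.T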

Equation 11.21 is the error covariance update equation. The diagonal of the covariance matrix contains the mean squared errors as shown;

    P_{kk} = \begin{bmatrix}
        E[e_{k-1} e_{k-1}^T] & E[e_{k-1} e_k^T] & E[e_{k-1} e_{k+1}^T] \\
        E[e_k e_{k-1}^T]     & E[e_k e_k^T]     & E[e_k e_{k+1}^T]     \\
        E[e_{k+1} e_{k-1}^T] & E[e_{k+1} e_k^T] & E[e_{k+1} e_{k+1}^T]
    \end{bmatrix}                                                               (11.22)

The sum of the diagonal elements of a matrix is the trace of the matrix. In the case of the error covariance matrix the trace is the sum of the mean squared errors. Therefore the mean squared error may be minimised by minimising the trace of P_k, which in turn will minimise the trace of P_{kk}.

The trace of P_k is first differentiated with respect to K_k and the result set to zero in order to find the conditions of this minimum.

Expansion of 11.21 gives;

    P_k = P'_k - K_k H P'_k - P'_k H^T K_k^T + K_k (H P'_k H^T + R) K_k^T       (11.23)

Note that the trace of a matrix is equal to the trace of its transpose, therefore it may be written as;

    T[P_k] = T[P'_k] - 2 T[K_k H P'_k] + T[K_k (H P'_k H^T + R) K_k^T]          (11.24)

where; T[P_k] is the trace of the matrix P_k.

Differentiating with respect to K_k gives;

    \frac{d\, T[P_k]}{d K_k} = -2 (H P'_k)^T + 2 K_k (H P'_k H^T + R)           (11.25)

Setting to zero and re-arranging gives;

    (H P'_k)^T = K_k (H P'_k H^T + R)                                           (11.26)

Now solving for K_k gives;

    K_k = P'_k H^T (H P'_k H^T + R)^{-1}                                        (11.27)

Equation 11.27 is the Kalman gain equation. The innovation, i_k, defined in eqn. 11.17, has an associated measurement prediction covariance. This is defined as;

    S_k = H P'_k H^T + R                                                        (11.28)
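A minimal sketch of equations 11.27 and 11.28 (function and variable names are assumed), written with a linear solve rather than an explicit inverse of S_k:

    import numpy as np

    # Innovation covariance S_k (eq. 11.28) and Kalman gain K_k = P'_k H^T S_k^{-1} (eq. 11.27)
    def kalman_gain(P_prior, H, R):
        S = H @ P_prior @ H.T + R
        K = np.linalg.solve(S.T, (P_prior @ H.T).T).T   # solves K S = P' H^T without inverting S
        return K, S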

Finally, substitution of equation 11.27 into 11.23 gives;

    P_k = P'_k - P'_k H^T (H P'_k H^T + R)^{-1} H P'_k
        = P'_k - K_k H P'_k
        = (I - K_k H) P'_k                                                      (11.29)

Equation 11.29 is the update equation for the error covariance matrix with optimal gain. The three equations 11.16, 11.27, and 11.29 develop an estimate of the variable x_k. State projection is achieved using;

    \hat{x}'_{k+1} = \Phi \hat{x}_k                                             (11.30)

To complete the recursion it is necessary to find an equation which projects the error covariance matrix into the next time interval, k+1. This is achieved by first forming an expression for the prior error;

    e'_{k+1} = x_{k+1} - \hat{x}'_{k+1}
             = (\Phi x_k + w_k) - \Phi \hat{x}_k
             = \Phi e_k + w_k                                                   (11.31)

Extending equation 11.15 to time k+1;

    P'_{k+1} = E[e'_{k+1} e'^T_{k+1}] = E[(\Phi e_k + w_k)(\Phi e_k + w_k)^T]   (11.32)

Note that e_k and w_k have zero cross-correlation because the noise w_k actually accumulates between k and k+1 whereas the error e_k is the error up until time k. Therefore;

    P'_{k+1} = E[e'_{k+1} e'^T_{k+1}]
             = \Phi E[e_k e_k^T] \Phi^T + E[w_k w_k^T]
             = \Phi P_k \Phi^T + Q                                              (11.33)
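Equations 11.30 and 11.33 together form the projection (prediction) step; a minimal sketch (function and variable names are assumed):

    import numpy as np

    # Projection into k+1: equations 11.30 and 11.33
    def predict(x_post, P_post, Phi, Q):
        x_prior = Phi @ x_post                  # eq. 11.30
        P_prior = Phi @ P_post @ Phi.T + Q      # eq. 11.33
        return x_prior, P_prior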

This completes the recursive filter. The algorithmic loop is summarised in the diagram of figure 11.1.

[Figure 11.1 block diagram: measurements and initial estimates feed the Kalman gain computation; the state estimate and covariance are updated; the updated state estimates are projected into k+1 and fed back as the next prior (projected) estimates.]

    Description           Equation
    Kalman Gain           K_k = P'_k H^T (H P'_k H^T + R)^{-1}
    Update Estimate       \hat{x}_k = \hat{x}'_k + K_k (z_k - H \hat{x}'_k)
    Update Covariance     P_k = (I - K_k H) P'_k
    Project into k+1      \hat{x}'_{k+1} = \Phi \hat{x}_k
                          P'_{k+1} = \Phi P_k \Phi^T + Q

Figure 11.1: Kalman Filter Recursive Algorithm
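A minimal sketch of the complete recursive loop of figure 11.1 (function and variable names are assumed; it can be driven with the hypothetical \Phi, H, Q, R defined earlier):

    import numpy as np

    # One pass of the recursive loop per measurement z_k; x_prior/P_prior are the
    # projected (primed) quantities of the text, x_post/P_post the updated ones.
    def kalman_filter(zs, Phi, H, Q, R, x0, P0):
        x_prior, P_prior = x0, P0
        estimates = []
        for z in zs:
            S = H @ P_prior @ H.T + R                        # innovation covariance, eq. 11.28
            K = P_prior @ H.T @ np.linalg.inv(S)             # Kalman gain, eq. 11.27
            x_post = x_prior + K @ (z - H @ x_prior)         # update estimate, eq. 11.16
            P_post = (np.eye(len(x0)) - K @ H) @ P_prior     # update covariance, eq. 11.29
            x_prior = Phi @ x_post                           # project into k+1, eq. 11.30
            P_prior = Phi @ P_post @ Phi.T + Q               # eq. 11.33
            estimates.append(x_post)
        return np.array(estimates)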

11.6 The Kalman filter as a chi-square merit function

The objective of the Kalman filter is to minimise the mean squared error between the actual and estimated data. Thus it provides the best estimate of the data in the mean squared error sense. This being the case it should be possible to show that the Kalman filter has much in common with the chi-square merit function. The chi-square merit function is a maximum likelihood function, and was derived earlier, equation 11.9. It is typically used as a criterion to fit a set of model parameters to a model, a process known as least squares fitting. The Kalman filter is commonly known as a recursive least squares (RLS) fitter. Drawing similarities to the chi-square merit function will give a different perspective on what the Kalman filter is doing.

The chi-square merit function is;

    \chi^2 = \sum_{i=1}^{k} \left( \frac{z_i - h_i(a; x)}{\sigma_i} \right)^2   (11.34)

where; z_i is the measured value; h_i is the data model with parameters x, assumed linear in a; \sigma_i is the standard deviation associated with the measured value.

The optimal set of parameters can then be defined as that which minimises the above function. Expanding out the variance gives;

    \chi^2 = \sum_{i=1}^{k} \frac{1}{\sigma_i^2} [z_i - h_i(a; x)]^2            (11.35)
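A minimal sketch of equation 11.35 (the straight-line data model in the usage lines is an assumed example, not from the text):

    import numpy as np

    # Chi-square merit function of eq. 11.35: z are measurements, h the model
    # predictions h_i(a; x), sigma the per-point standard deviations.
    def chi_square(z, h, sigma):
        return np.sum(((z - h) / sigma) ** 2)

    # Usage with an assumed straight-line model fitted exactly: the merit value
    # is then roughly the number of data points.
    x = np.linspace(0.0, 9.0, 10)
    z = 2.0 * x + 1.0 + 0.3 * np.random.default_rng(4).standard_normal(10)
    print(chi_square(z, 2.0 * x + 1.0, 0.3))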

Representing the chi-square in vector form and using notation from the earlier Kalman derivation;

    \chi^2_k = [z_k - h(a; x_k)]^T R^{-1} [z_k - h(a; x_k)]                     (11.36)

where; R^{-1} is the matrix of inverse squared errors, i.e. 1/\sigma_i^2.

i i

The ab ove merit function is the merit function asso ciated with the latest, k th, measurement and provides

a measure of how accurately the mo del predicted this measurement. Given that the inverse mo del

covariance matrix is known up to time k , the merit function up to time k may b e re-written as;

T

01

2

= x x^  P x x^  11.37

k 1 k 1 k 1 k 1

k 1

k 1

To combine the new data with the previous, fitting the model parameters so as to minimise the overall chi-square function, the merit function becomes the summation of the two;

    \chi^2 = (x_{k-1} - \hat{x}_{k-1})^T P'^{-1}_{k-1} (x_{k-1} - \hat{x}_{k-1}) + [z_k - h(a; x_k)]^T R^{-1} [z_k - h(a; x_k)]        (11.38)

Where the first derivative of this is given by;

    \frac{d\chi^2}{dx} = 2 P'^{-1}_{k-1} (x_{k-1} - \hat{x}_{k-1}) - 2 \nabla_x h(a; x_k)^T R^{-1} [z_k - h(a; x_k)]        (11.39)

The model function h(a; x_k), with parameters fitted from information to date, may be considered as;

    h(a; x_k) = h(a; \hat{x}_k + \Delta x_k)                                    (11.40)

where \Delta x_k = x_k - \hat{x}_k. The expansion of the model function to first order is;

    h(\hat{x}_k + \Delta x_k) = h(\hat{x}_k) + \Delta x_k \nabla_x h(\hat{x}_k)        (11.41)

Substituting this result into the derivative equation 11.39 gives;

    \frac{d\chi^2}{dx} = 2 P'^{-1}_k (x_k - \hat{x}_k) - 2 \nabla_x h(a; \hat{x}_k)^T R^{-1} [z_k - h(a; \hat{x}_k) - (x_k - \hat{x}_k) \nabla_x h(a; \hat{x}_k)]        (11.42)

It is assumed that the estimated model parameters are a close approximation to the actual model parameters. Therefore it may be assumed that the derivatives of the actual model and the estimated model are the same. Further, for a system which is linear in a the model derivative is constant and may be written as;

    \nabla_x h(a; x_k) = \nabla_x h(a; \hat{x}_k) = H                           (11.43)

Substituting this into equation 11.39 gives;

    \frac{d\chi^2}{dx} = 2 P'^{-1}_k \Delta x_k + 2 H^T R^{-1} H \Delta x_k - 2 H^T R^{-1} [z_k - h(a; \hat{x}_k)]        (11.44)

Re-arranging gives;

    \frac{d\chi^2}{dx} = 2 \left( P'^{-1}_k + H^T R^{-1} H \right) \Delta x_k - 2 H^T R^{-1} [z_k - h(a; \hat{x}_k)]        (11.45)

For a minimum the derivative is zero; re-arranging in terms of \Delta x_k gives;

    \Delta x_k = \left( P'^{-1}_k + H^T R^{-1} H \right)^{-1} H^T R^{-1} [z_k - h(a; \hat{x}_k)]        (11.46)

    x_k = \hat{x}_k + \left( P'^{-1}_k + H^T R^{-1} H \right)^{-1} H^T R^{-1} [z_k - h(a; \hat{x}_k)]   (11.47)

Comparison of equation 11.47 to 11.16 allows the gain, K_k, to be identified as;

    K_k = \left( P'^{-1}_k + H^T R^{-1} H \right)^{-1} H^T R^{-1}               (11.48)

Giving a parameter update equation of the form;

    x_k = \hat{x}_k + K_k [z_k - h(a; \hat{x}_k)]                               (11.49)

Equation 11.49 is identical to 11.16 and describes the improvement of the parameter estimate using the error between measured and model projected values.
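A quick numerical check (with assumed test matrices, not from the text) that the gain of equation 11.48 agrees with the standard gain of equation 11.27:

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 3, 2
    A = rng.standard_normal((n, n)); P_prior = A @ A.T + n * np.eye(n)   # a valid prior covariance
    H = rng.standard_normal((m, n))
    B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)         # a valid measurement covariance

    K_standard = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)                                      # eq. 11.27
    K_info = np.linalg.inv(np.linalg.inv(P_prior) + H.T @ np.linalg.inv(R) @ H) @ H.T @ np.linalg.inv(R)   # eq. 11.48

    print(np.allclose(K_standard, K_info))   # True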

11.7 Model covariance update

The model parameter covariance has been considered in its inverted form, where it is known as the information matrix¹. It is possible to formulate an alternative update equation for the covariance matrix using standard error propagation;

    P^{-1}_k = P'^{-1}_k + H^T R^{-1} H                                         (11.50)

It is possible to show that the covariance updates of equation 11.50 and equation 11.29 are equivalent. This may be achieved using the identity P^{-1}_k P_k = I. The original (eqn 11.29) and alternative (eqn 11.50) forms of the covariance update equations are;

    P_k = (I - K_k H) P'_k    and    P^{-1}_k = P'^{-1}_k + H^T R^{-1} H

Therefore;

    (I - K_k H) P'_k \left( P'^{-1}_k + H^T R^{-1} H \right) = I                (11.51)

Substituting for K_k gives;

    \left[ P'_k - P'_k H^T (H P'_k H^T + R)^{-1} H P'_k \right] \left( P'^{-1}_k + H^T R^{-1} H \right)

    = I - P'_k \left[ H^T (H P'_k H^T + R)^{-1} H - H^T R^{-1} H + H^T (H P'_k H^T + R)^{-1} H P'_k H^T R^{-1} H \right]

    = I - P'_k H^T \left[ (H P'_k H^T + R)^{-1} \left( I + H P'_k H^T R^{-1} \right) - R^{-1} \right] H

    = I - P'_k H^T \left[ (H P'_k H^T + R)^{-1} \left( R + H P'_k H^T \right) R^{-1} - R^{-1} \right] H

    = I - P'_k H^T \left[ R^{-1} - R^{-1} \right] H

    = I                                                                          (11.52)
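The equivalence just derived can also be checked numerically; a minimal sketch (with assumed test matrices, not from the text) comparing the information-form update of equation 11.50 with equation 11.29:

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 3, 2
    A = rng.standard_normal((n, n)); P_prior = A @ A.T + n * np.eye(n)
    H = rng.standard_normal((m, n))
    B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)

    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)                      # optimal gain, eq. 11.27
    P_standard = (np.eye(n) - K @ H) @ P_prior                                    # eq. 11.29
    P_info = np.linalg.inv(np.linalg.inv(P_prior) + H.T @ np.linalg.inv(R) @ H)   # eq. 11.50 inverted

    print(np.allclose(P_standard, P_info))   # True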

¹When the Kalman filter is built around the information matrix it is known as the information filter.