System Identification, Lecture 10: Prediction error methods and pseudo-linear regressions

Roy Smith


Prediction

Signal model: $y(k) = G(z)\,u(k) + v(k)$, with $v(k) = H(z)\,e(k)$.

Typical assumptions:

§ $G(z)$ and $H(z)$ are stable,

§ $H(z)$ is stably invertible (no zeros outside the unit disk),

§ $e(k)$ has known statistics: known pdf or known moments.

One-step ahead prediction

Given $Z_K = \{u(0), y(0), \ldots, u(K-1), y(K-1)\}$, what is the best estimate of $y(K)$?

Prediction

Noise model: $v(k) = H(z)\,e(k)$.

Noise model invertibility: given $v(k)$, $k = 0, \ldots, K-1$, can we determine $e(k)$, $k = 0, \ldots, K-1$?

Inverse filter, $H_{inv}(z)$:

$$e(k) = \sum_{i=0}^{\infty} h_{inv}(i)\,v(k-i)$$

We also want the inverse filter to be causal and stable:

$$h_{inv}(k) = 0,\ k < 0, \qquad \text{and} \qquad \sum_{k=0}^{\infty} |h_{inv}(k)| < \infty.$$

If $H(z)$ has no zeros for $|z| \ge 1$, then,

$$H_{inv}(z) = \frac{1}{H(z)}.$$
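As a quick numerical check (a sketch, not part of the original slides), the impulse response of the inverse filter can be computed with MATLAB's filter; the first-order H(z) = 1 + c z^-1 below is an assumed example.

% Sketch: impulse response of the inverse filter Hinv(z) = 1/H(z),
% for the assumed example H(z) = 1 + c z^-1 with |c| < 1.
c = 0.5;                        % assumed MA coefficient
N = 20;                         % number of impulse response samples
imp = [1; zeros(N-1,1)];        % unit impulse
h_inv = filter(1, [1 c], imp);  % h_inv(i+1) = (-c)^i
sum(abs(h_inv))                 % absolutely summable since |c| < 1

The coefficients decay geometrically, illustrating the stability requirement on the inverse.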


Prediction

Noise model: $v(k) = H(z)\,e(k)$.

One-step ahead prediction: given measurements of $v(k)$, $k = 0, \ldots, K-1$, can we predict $v(K)$? Assume that we know $H(z)$; how much can we say about $v(K)$? Assume also that $H(z)$ is monic, i.e. $h(0) = 1$.

$$v(k) = \sum_{i=0}^{\infty} h(i)\,e(k-i) = e(k) + \underbrace{\sum_{i=1}^{\infty} h(i)\,e(k-i)}_{=:\,m(k-1)\ \text{(``observed'')}}$$

Prediction

Noise model: $v(k) = H(z)\,e(k)$.

One-step ahead prediction

The prediction of $v(k)$, based on measurements up to time $k-1$, is $\hat{v}(k|k-1)$. We will argue that a good choice in this case is,

$$\hat{v}(k|k-1) = m(k-1) = \sum_{i=1}^{\infty} h(i)\,e(k-i).$$

The error in our prediction is $e(k)$, which we clearly can't reduce.


One-step prediction statistics

General case

Say $e(k)$ is identically distributed with pdf $f_e(x)$:

$$\mathrm{Prob}\{x \le e(k) \le x + \delta x\} = \int_x^{x+\delta x} f_e(x)\,dx \approx f_e(x)\,\delta x.$$

A posteriori distribution

What are the statistics of $v(k)$ given $v_{-\infty}^{k-1} = \{v(-\infty), \ldots, v(k-1)\}$?

$$\begin{aligned}
\mathrm{Prob}\{x \le v(k) \le x + \delta x \mid v_{-\infty}^{k-1}\}
&= \mathrm{Prob}\{x \le e(k) + m(k-1) \le x + \delta x\} \\
&= \mathrm{Prob}\{x - m(k-1) \le e(k) \le x - m(k-1) + \delta x\} \\
&\approx f_e\big(x - m(k-1)\big)\,\delta x.
\end{aligned}$$

One-step ahead prediction statistics

Maximum of the conditional (a posteriori) distribution

Select the prediction estimate as the peak value of the conditional distribution:

$$\hat{v}(k|k-1) = \arg\max_x\, f_e\big(x - m(k-1)\big) = m(k-1) \ \text{for the Gaussian case.}$$

This is the most probable value of $v(k|k-1)$.

Mean of the conditional distribution

Select the prediction estimate as the mean value of the conditional distribution:

$$\hat{v}(k|k-1) = E\{v(k) \mid v_{-\infty}^{k-1}\} = E\{e(k) + m(k-1)\} = m(k-1) + E\{e(k)\} = m(k-1),$$

assuming $E\{e(k)\} = 0$. This is the expected value of $v(k|k-1)$.


One-step ahead prediction

Calculation:

$$\begin{aligned}
\hat{v}(k|k-1) = m(k-1) &= \sum_{i=1}^{\infty} h(i)\,e(k-i) \\
&= \big(H(z) - 1\big)\,e(k) \qquad \text{(assuming $H(z)$ is monic)} \\
&= \frac{H(z) - 1}{H(z)}\,v(k) \\
&= \big(1 - H_{inv}(z)\big)\,v(k) \\
&= -\sum_{i=1}^{\infty} h_{inv}(i)\,v(k-i)
\end{aligned}$$

Note that $\hat{v}(k|k-1)$ depends only on values up to time $k-1$. With a finite data record, the best we can do is:

$$\hat{v}(k|k-1) = -\sum_{i=1}^{\infty} h_{inv}(i)\,v(k-i) \approx -\sum_{i=1}^{k} h_{inv}(i)\,v(k-i).$$
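As a sketch (assumed coefficients, not from the slides): for a rational monic noise model H(z) = hnum(z)/hden(z), the predictor is a single causal filtering operation, since (H(z) - 1)/H(z) = (hnum - hden)/hnum.

% Sketch: vhat(k|k-1) = (1 - 1/H(z)) v(k) for monic, stably invertible H(z).
hnum = [1, 0.5];                      % numerator of H(z)   (assumed)
hden = [1, -0.3];                     % denominator of H(z) (assumed)
e = randn(200,1);                     % white innovations
v = filter(hnum, hden, e);            % simulated noise record v(k)
vhat = filter(hnum - hden, hnum, v);  % (H-1)/H = (hnum - hden)/hnum
% v - vhat recovers e, up to initial-condition transients.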

Example

Moving average model

$$v(k) = e(k) + c\,e(k-1) \implies H(z) = 1 + cz^{-1}.$$

For $H(z)$ to be stably invertible we require $|c| < 1$.

$$H_{inv}(z) = \frac{1}{1 + cz^{-1}} = \sum_{i=0}^{\infty} (-c)^i z^{-i}.$$

One-step ahead predictor

$$\begin{aligned}
\hat{v}(k|k-1) &= \big(1 - H_{inv}(z)\big)\,v(k) = -\sum_{i=1}^{\infty} (-c)^i v(k-i) \approx -\sum_{i=1}^{k} (-c)^i v(k-i) \\
&= c\,v(k-1) - c^2 v(k-2) + c^3 v(k-3) + \cdots - (-c)^k v(0).
\end{aligned}$$


Example

Moving average model

$$v(k) = e(k) + c\,e(k-1) \implies H(z) = 1 + cz^{-1}.$$

Recursive formulation

Note that,

$$H(z)\,\hat{v}(k|k-1) = \big(H(z) - 1\big)\,v(k).$$

So,

$$\hat{v}(k|k-1) + c\,\hat{v}(k-1|k-2) = c\,v(k-1)$$

$$\hat{v}(k|k-1) = c\,\underbrace{\big(v(k-1) - \hat{v}(k-1|k-2)\big)}_{\epsilon(k-1)\ \text{(prediction error at $k-1$)}} = c\,\epsilon(k-1).$$
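This recursion is easy to verify numerically; a sketch with an assumed value of c:

% Sketch: recursive MA(1) predictor, vhat(k|k-1) = c*eps(k-1).
c = 0.8; K = 50;                % assumed coefficient and record length
e = randn(K,1);                 % true innovations
v = filter([1 c], 1, e);        % v(k) = e(k) + c e(k-1)
vhat = zeros(K,1);
epsl = zeros(K,1);              % prediction errors eps(k)
for k = 1:K
    if k > 1
        vhat(k) = c*epsl(k-1);  % predict from the last prediction error
    end
    epsl(k) = v(k) - vhat(k);   % here epsl(k) reproduces e(k) exactly
end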

Another example

Autoregressive noise model

Our noise model is:

$$v(k) = \sum_{i=0}^{\infty} a^i e(k-i), \qquad |a| < 1 \text{ for stability.}$$

So,
$$H(z) = \sum_{i=0}^{\infty} a^i z^{-i} = \frac{1}{1 - az^{-1}},$$

and $H_{inv}(z) = 1 - az^{-1}$ (a moving average process).

Our one-step ahead predictor is,

$$\hat{v}(k|k-1) = \big(1 - H_{inv}(z)\big)\,v(k) = a\,v(k-1).$$
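Again the check is immediate (a sketch with an assumed a):

% Sketch: AR(1) noise and its one-step-ahead predictor.
a = 0.6; K = 100;               % assumed coefficient and record length
e = randn(K,1);
v = filter(1, [1 -a], e);       % v(k) = a v(k-1) + e(k)
vhat = [0; a*v(1:K-1)];         % vhat(k|k-1) = a v(k-1)
% v - vhat reproduces e after the first sample.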


Output prediction

Signal model: $y(k) = G(z)\,u(k) + v(k)$, with $v(k) = H(z)\,e(k)$.

One-step ahead prediction

Take the prediction as the expected value of the conditional distribution,

$$\begin{aligned}
\hat{y}(k|k-1) = E\{y(k) \mid Z_K\} &= G(z)\,u(k) + \hat{v}(k|k-1) \\
&= G(z)\,u(k) + \big(1 - H_{inv}(z)\big)\,v(k) \\
&= H_{inv}(z)\,G(z)\,u(k) + \big(1 - H_{inv}(z)\big)\,y(k)
\end{aligned}$$
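A sketch of this predictor as two filtering operations, for rational G(z) = bG(z)/aG(z) and monic H(z) = bH(z)/aH(z) (all coefficients below are assumed examples): Hinv G = (aH bG)/(bH aG) and 1 - Hinv = (bH - aH)/bH.

% Sketch: yhat(k|k-1) = Hinv(z)G(z) u(k) + (1 - Hinv(z)) y(k).
bG = [0, 0.5]; aG = [1, -0.7];  % assumed G(z) = 0.5 z^-1 / (1 - 0.7 z^-1)
bH = [1, 0.4]; aH = [1, 0];     % assumed H(z) = 1 + 0.4 z^-1 (monic)
u = sign(randn(200,1));         % PRBS-like input
e = 0.1*randn(200,1);
y = filter(bG, aG, u) + filter(bH, aH, e);     % simulated data
yhat = filter(conv(aH, bG), conv(bH, aG), u) ...
     + filter(bH - aH, bH, y);                 % one-step-ahead prediction
% y - yhat recovers e, up to initial-condition transients.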

Output prediction

Signal model: $y(k) = G(z)\,u(k) + v(k)$, with $v(k) = H(z)\,e(k)$.

Prediction error

$$\begin{aligned}
y(k) - \hat{y}(k|k-1) &= -H_{inv}(z)\,G(z)\,u(k) + H_{inv}(z)\,y(k) \\
&= H_{inv}(z)\,\big(y(k) - G(z)\,u(k)\big) = H_{inv}(z)\,v(k) \\
&= e(k).
\end{aligned}$$

The innovation, $e(k)$, is the part of the output that cannot be predicted from past measurements.


Prediction error based identification

The one-step ahead predictor is parametrised by θ,

$$\hat{y}(k\,|\,\theta, Z_K) = H_{inv}(\theta,z)\,G(\theta,z)\,u(k) + \big(1 - H_{inv}(\theta,z)\big)\,y(k)$$

Define a parametrised prediction error,

$$\epsilon(k,\theta) = y(k) - \hat{y}(k,\theta),$$

which we can optionally filter,

$$\epsilon_F(k,\theta) = F(z)\,\epsilon(k,\theta) \quad \text{(weighted error)}.$$

Define a cost function,

$$J(\theta, Z_K) = \frac{1}{K}\sum_{k=0}^{K-1} l\big(\epsilon_F(k,\theta)\big), \qquad \text{typically } l\big(\epsilon_F(k,\theta)\big) = \|\epsilon_F(k,\theta)\|^2.$$

$$\hat{\theta} = \arg\min_{\theta}\, J(\theta, Z_K).$$
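For illustration (a sketch, with an assumed first-order ARX predictor and true system), the cost can be minimised directly with a general-purpose optimiser such as fminsearch:

% Sketch: direct prediction-error minimisation, first-order ARX predictor.
K = 200;
u = sign(randn(K,1));                          % PRBS-like input
y = filter([0, 0.5], [1, -0.7], u) ...
  + filter(1, [1, -0.7], 0.1*randn(K,1));      % ARX data: A y = B u + e
epsf = @(th) y - filter([0, th(1)], 1, u) ...
           - filter([0, -th(2)], 1, y);        % eps(k,theta), theta = [b1; a1]
J = @(th) mean(epsf(th).^2);                   % quadratic cost
theta_hat = fminsearch(J, [0; 0]);             % minimise J over theta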

Prediction error methods: ARX models

ARX structure:

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{1}{A(\theta,z)}, \qquad y(k) = \frac{B(\theta,z)}{A(\theta,z)}\,u(k) + \frac{1}{A(\theta,z)}\,e(k).$$

$$\begin{aligned}
\hat{y}(k|\theta) &= H_{inv}(\theta,z)\,G(\theta,z)\,u(k) + \big(1 - H_{inv}(\theta,z)\big)\,y(k) \\
&= B(z)\,u(k) + \big(1 - A(z)\big)\,y(k) \\
&= \theta^T\varphi(k) = \varphi^T(k)\,\theta.
\end{aligned}$$

So,

$$Y - \Phi\theta = \mathcal{E} \quad \leftarrow \quad \text{vector of prediction errors}$$

The regression approach minimises the prediction errors.
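Because the ARX predictor is linear in θ, the estimate is a single least-squares solve; a self-contained sketch with an assumed first-order system:

% Sketch: ARX least squares with n = m = 1 (true system is an assumption).
K = 200;
u = sign(randn(K,1));                          % PRBS-like input
y = filter([0, 0.5], [1, -0.7], u) ...
  + filter(1, [1, -0.7], 0.1*randn(K,1));      % ARX data: A y = B u + e
Phi = [[0; u(1:K-1)], [0; -y(1:K-1)]];         % rows: [u(k-1), -y(k-1)]
theta_hat = Phi \ y;                           % minimises ||Y - Phi*theta||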


Model structures

General model: $y(k) = G(\theta,z)\,u(k) + H(\theta,z)\,e(k)$.

ARX model structure (equation error)

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{1}{A(\theta,z)}, \qquad y(k) = \frac{B(\theta,z)}{A(\theta,z)}\,u(k) + \frac{1}{A(\theta,z)}\,e(k).$$

ARMAX model structure

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)}, \qquad y(k) = \frac{B(\theta,z)}{A(\theta,z)}\,u(k) + \frac{C(\theta,z)}{A(\theta,z)}\,e(k),$$

with $A(z)$, $C(z)$ monic.

Prediction error structure

$$\hat{y}(k|\theta) = \frac{B(z)}{C(z)}\,u(k) + \left(1 - \frac{A(z)}{C(z)}\right) y(k)$$

$$C(z)\,\hat{y}(k|\theta) = B(z)\,u(k) + \big(C(z) - A(z)\big)\,y(k)$$

$$\hat{y}(k|\theta) = B(z)\,u(k) + \big(1 - A(z)\big)\,y(k) + \big(C(z) - 1\big)\underbrace{\big(y(k) - \hat{y}(k|\theta)\big)}_{\epsilon(k)}$$


Pseudolinear regression

One-step ahead ARMAX predictor

$$\hat{y}(k|\theta) = B(z)\,u(k) + \big(1 - A(z)\big)\,y(k) + \big(C(z) - 1\big)\,\epsilon(k)$$

$$= \begin{bmatrix} u(k-1) & \cdots & -y(k-1) & \cdots & \epsilon(k-1) & \cdots \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ a_1 \\ \vdots \\ c_1 \\ \vdots \end{bmatrix} = \varphi^T(\theta,k)\,\theta.$$

Because $\varphi(\theta,k)$ contains the prediction errors $\epsilon(k-1), \ldots$, which themselves depend on $\theta$, this is not linear in $\theta$.

Optimisation-based algorithm

$$\begin{aligned}
\underset{\theta,\,\epsilon}{\text{minimise}}\quad & \|\epsilon\|^2 \quad \text{(or more generally, } l(\epsilon)\text{)} \\
\text{subject to}\quad & Y = \Phi(\epsilon)^T\theta + \epsilon \quad \text{(nonlinear equality constraint)}
\end{aligned}$$

ARMAX example

ARMAX model structure:

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}, \qquad B(z) = b_1 z^{-1} + b_2 z^{-2}, \qquad C(z) = 1 + c_1 z^{-1} + c_2 z^{-2},$$

$$y(k) = \frac{B(\theta,z)}{A(\theta,z)}\,u(k) + \frac{C(\theta,z)}{A(\theta,z)}\,e(k), \qquad \theta = \begin{bmatrix} b_1 & b_2 & a_1 & a_2 & c_1 & c_2 \end{bmatrix}^T.$$

Experiments

§ The plant is “at rest”.

§ Length K = 31.

§ PRBS input signal, u(k).


ARMAX example

Typical experimental data

[Figure: typical experimental data, K = 31; plots of u(k), y(k), and v(k) against index k.]

Constrained minimisation code

% Create the data part of the regressor. Assume the plant is at rest.
K = length(y);
PhiTyu = zeros(K,4);                  % rows: [u(k-1), u(k-2), -y(k-1), -y(k-2)]
PhiTyu(2,:) = [u(1), 0, -y(1), 0];
for i = 3:K
    PhiTyu(i,:) = [u(i-1), u(i-2), -y(i-1), -y(i-2)];
end

% Decision vector x stacks the parameters and the prediction errors:
% x = [theta; e], with theta = [b1; b2; a1; a2; c1; c2].
[x,fval] = fmincon(@(x) ARMAXobjective(x), x0, ...
    [],[],[],[],[],[], @(x) ARMAXconstraint(x,y,PhiTyu));

function f = ARMAXobjective(x)
% Norm of the prediction-error part of x = [theta; e].
f = sqrt(x(7:end)'*x(7:end));
end

function [c,ceq] = ARMAXconstraint(x,y,PhiTyu)
% Enforce Y = [PhiTyu, PhiTe]*theta + e as an equality constraint.
theta = x(1:6);
e = x(7:end);
K = length(y);
PhiTe = zeros(K,2);                   % rows: [eps(k-1), eps(k-2)]
PhiTe(2,1) = e(1);
for j = 3:K
    PhiTe(j,:) = [e(j-1), e(j-2)];
end
ceq = y - [PhiTyu, PhiTe]*theta - e;
c = [];
end
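Note that the snippet assumes u, y, and an initial guess x0 are already in the workspace; x stacks the six ARMAX parameters and the K prediction errors. One reasonable choice for x0 (an assumption, not specified on the slide) is an ARX least-squares estimate of theta padded with zeros for the error vector.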


ARMAX example

Transfer function averages: 128 experiments (data length K = 31)

[Figure: mean estimate comparison, K = 31; magnitude and phase of the mean estimates (Gz, Hz) and the optimal models (Gopt, Hopt) over 10^-2 to 10^0 rad/sample.]

ARMAX example

Coefficient statistics for 128 experiments

[Figure: coefficient statistics, experiment data length 31; spread of the b1, b2, a1, a2, c1, c2 estimates; green circles are the true values.]


ARMAX example

Coefficient error for averages of 2, 4, 8, ..., 128 experiments

[Figure: coefficient (theta) error for b1, b2, a1, a2, c1, c2 against total data length (number of averages x experiment length).]

ARMAX example

Transfer function estimates: 128 experiments (data length K = 31)

[Figure: |G(jω)| estimates (128) and mean estimate for K = 31; magnitude against frequency (rad/sample).]


ARMAX example

Transfer function estimates: 128 experiments (data length K = 31)

[Figure: |H(jω)| estimates (128) and mean estimate for K = 31; magnitude against frequency (rad/sample).]

ARMAX example

Prediction errors and actual innovations (data length K = 31)

[Figure: actual innovations vs. optimum prediction error; mean ||e||/sqrt(N) and mean ||eps||/sqrt(N) against total data length (number of averages x experiment length).]


ARMAX example

Longer experiments: K = 127

[Figure: typical experimental data, K = 127; plots of u(k), y(k), and v(k) against index k.]

ARMAX example

Coefficient statistics comparison: K = 31 and K = 127

[Figure: spread of the coefficient estimates, shown side by side for K = 31 and K = 127.]


ARMAX example

Coefficient error comparison: K = 31 and K = 127

[Figure: coefficient (theta) error for b1, b2, a1, a2, c1, c2 against total data length, comparing K = 31 and K = 127.]

ARMAX example

Prediction error comparison: K = 31 and K = 127

[Figure: actual innovations vs. optimum prediction error; mean ||e||/sqrt(N) and mean ||eps||/sqrt(N) against total data length.]


ARMAX example

Transfer function estimates: 32 experiments (data length K = 127)

[Figure: |G(jω)| estimates (32) and mean estimate for K = 127; magnitude against frequency (rad/sample).]

ARMAX example

Transfer function estimates: 32 experiments (data length K = 127)

[Figure: |H(jω)| estimates (32) and mean estimate for K = 127; magnitude against frequency (rad/sample).]


ARARMAX model structure

Block diagram: $u(k)$ enters through $B(\theta,z)$ and $e(k)$ through $C(\theta,z)/D(\theta,z)$; their sum is filtered by $1/A(\theta,z)$ to give $y(k)$:

$$y(k) = \frac{1}{A(\theta,z)}\left( B(\theta,z)\,u(k) + \frac{C(\theta,z)}{D(\theta,z)}\,e(k) \right).$$

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)D(\theta,z)},$$

with $A(z)$, $C(z)$ and $D(z)$ monic.

Output error model structure

$$G(\theta,z) = \frac{B(\theta,z)}{F(\theta,z)}, \qquad H(\theta,z) = 1, \qquad y(k) = \frac{B(\theta,z)}{F(\theta,z)}\,u(k) + e(k),$$

with $F(z)$ monic.

Pseudolinear predictor framework

$$\hat{y}(k|\theta) = \frac{B(\theta,z)}{F(\theta,z)}\,u(k) = \varphi(k,\theta)^T\theta,$$

where

$$\varphi(k,\theta)^T = \big[\, u(k-1)\ \ldots\ u(k-m)\ \ \underbrace{-\hat{y}(k-1,\theta)\ \ldots\ -\hat{y}(k-n_f,\theta)}_{\text{pseudolinear terms}} \,\big].$$
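One common way to handle the dependence of the regressor on θ is a fixed-point iteration: rebuild φ(k, θ) from the current predictor output and refit. A sketch for a first-order output-error model (system, orders, and iteration count are assumptions; convergence is not guaranteed in general):

% Sketch: pseudolinear (fixed-point) iteration for a first-order OE model,
% yhat(k) = b1 u(k-1) - f1 yhat(k-1), with theta = [b1; f1].
K = 200;
u = sign(randn(K,1));
y = filter([0, 0.5], [1, -0.7], u) + 0.1*randn(K,1);  % OE data (assumed)
theta = [0; 0];
for it = 1:20
    yhat = filter([0, theta(1)], [1, theta(2)], u);   % current predictor
    Phi = [[0; u(1:K-1)], [0; -yhat(1:K-1)]];         % pseudolinear regressor
    theta = Phi \ y;                                  % refit theta
end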


Box-Jenkins model structure

Block diagram: $y(k) = \dfrac{B(\theta,z)}{F(\theta,z)}\,u(k) + v(k)$, with $v(k) = \dfrac{C(\theta,z)}{D(\theta,z)}\,e(k)$.

$$G(\theta,z) = \frac{B(\theta,z)}{F(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{D(\theta,z)}.$$

Predictor

$$\hat{y}(k|\theta) = \frac{D(z)B(z)}{C(z)F(z)}\,u(k) + \left(1 - \frac{D(z)}{C(z)}\right) y(k)$$

General model structure

Block diagram: $y(k) = \dfrac{1}{A(\theta,z)}\left( \dfrac{B(\theta,z)}{F(\theta,z)}\,u(k) + \dfrac{C(\theta,z)}{D(\theta,z)}\,e(k) \right)$.

$$G(\theta,z) = \frac{B(\theta,z)}{A(\theta,z)F(\theta,z)}, \qquad H(\theta,z) = \frac{C(\theta,z)}{A(\theta,z)D(\theta,z)}.$$

Predictor

$$\hat{y}(k|\theta) = \frac{D(z)B(z)}{C(z)F(z)}\,u(k) + \left(1 - \frac{D(z)A(z)}{C(z)}\right) y(k)$$

A pseudolinear regression can be derived.


Known noise model (with ARMAX dynamics)

Assume that the noise model is known,

$$v(k) = L(z)\,e(k).$$

So

$$A(z)\,y(k) = B(z)\,u(k) + L(z)\,e(k).$$

Filter the signals via $L^{-1}(z)$,

$$y_L(k) = L^{-1}(z)\,y(k), \qquad u_L(k) = L^{-1}(z)\,u(k).$$

Giving,

$$A(z)\,y_L(k) = B(z)\,u_L(k) + e(k),$$

for which LS methods give consistent estimates.
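A sketch of the whole recipe with an assumed first-order system and known first-order L(z): prefilter both signals by 1/L(z), then solve ordinary least squares.

% Sketch: known noise model L(z); filter by 1/L(z), then ARX least squares.
K = 500;
u = sign(randn(K,1));
e = 0.1*randn(K,1);
Lnum = [1, 0.6];                              % known (assumed) monic L(z)
y = filter([0, 0.5], [1, -0.7], u) ...
  + filter(Lnum, [1, -0.7], e);               % data: A y = B u + L e
yL = filter(1, Lnum, y);                      % y_L = L^-1(z) y
uL = filter(1, Lnum, u);                      % u_L = L^-1(z) u
Phi = [[0; uL(1:K-1)], [0; -yL(1:K-1)]];
theta_hat = Phi \ yL;                         % consistent estimate of [b1; a1]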

High-order model fitting

Assume that the noise is autoregressive (ARARX structure),

$$A(z)\,y(k) = B(z)\,u(k) + \frac{1}{D(z)}\,e(k), \qquad e(k) \sim \mathcal{N}(0, \lambda).$$

Fit a high-order model (the order of $D(z)$ is $n_d$):

$$A(z)D(z)\,y(k) = B(z)D(z)\,u(k) + e(k).$$

Compute the least-squares estimate with orders $n + n_d$ and $m + n_d$. This gives a consistent estimate of

$$\frac{B(z)D(z)}{A(z)D(z)} = \frac{B(z)}{A(z)}.$$

This amounts to making the noise model sufficiently rich to capture additional autoregressive features in the noise.

In practice the cancellation will not be exact: $\hat{A}(z)$ and $\hat{B}(z)$ will be high order.
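A sketch of the high-order fit for an assumed first-order system with first-order AR noise (n = m = n_d = 1), so the inflated ARX orders are n + n_d = m + n_d = 2:

% Sketch: absorb AR noise by fitting a higher-order ARX model.
K = 500;
u = sign(randn(K,1));
e = 0.1*randn(K,1);
A = [1, -0.7]; B = [0, 0.5]; D = [1, -0.4];       % assumed true polynomials
y = filter(B, A, u) + filter(1, conv(A, D), e);   % ARARX data
Phi = zeros(K, 4);                                % orders n+nd = m+nd = 2
for k = 3:K
    Phi(k,:) = [u(k-1), u(k-2), -y(k-1), -y(k-2)];
end
theta_hat = Phi \ y;   % Bhat/Ahat approximates (B*D)/(A*D) = B/A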


Bibliography

Prediction: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, [section 3.2].

Model parametrisations: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, [sections 1.3 and 4.2].

Linear and pseudolinear regression: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, [sections 10.1 and 10.2].
