System Identification Lecture 10: Prediction error methods and pseudo-linear regressions
Roy Smith
2016-11-22 10.1
Prediction
Block diagram: $e(k) \to H(z) \to v(k)$ and $u(k) \to G(z)$, with $y(k) = G(z)\,u(k) + v(k)$.

Typical assumptions: $G(z)$ and $H(z)$ are stable; $H(z)$ is stably invertible (no zeros outside the unit disk); $e(k)$ has known statistics: known pdf or known moments.
One-step ahead prediction
Given $Z_K = \{u(0),\, y(0),\, \dots,\, u(K-1),\, y(K-1)\}$, what is the best estimate of $y(K)$?
Prediction
Noise model invertibility

Given $v(k)$, $k = 0, \dots, K-1$, can we determine $e(k)$, $k = 0, \dots, K-1$?
Inverse filter, $H_{\mathrm{inv}}(z)$:
$$ e(k) = \sum_{i=0}^{\infty} h_{\mathrm{inv}}(i)\, v(k-i). $$
We also want the inverse filter to be causal and stable:
$$ h_{\mathrm{inv}}(k) = 0,\ k < 0, \qquad \text{and} \qquad \sum_{k=0}^{\infty} |h_{\mathrm{inv}}(k)| < \infty. $$
If $H(z)$ has no zeros for $|z| \ge 1$, then
$$ H_{\mathrm{inv}}(z) = \frac{1}{H(z)}. $$
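As a sketch of this inversion (Python used purely for illustration), the coefficients of $H_{\mathrm{inv}}(z)$ for a monic, stably invertible $H(z)$ can be generated recursively from the convolution identity $h * h_{\mathrm{inv}} = \delta$:

```python
def inverse_impulse_response(h, n):
    """First n coefficients of Hinv(z) = 1/H(z) for a monic H (h[0] == 1),
    solved from the convolution identity (h * hinv)(k) = delta(k)."""
    hinv = [1.0]
    for k in range(1, n):
        # 0 = sum_i h(i) hinv(k-i) for k > 0  =>  solve for hinv(k)
        s = sum(h[i] * hinv[k - i] for i in range(1, min(k, len(h) - 1) + 1))
        hinv.append(-s)
    return hinv

# Example: H(z) = 1 + 0.5 z^-1  =>  hinv(k) = (-0.5)^k
print(inverse_impulse_response([1.0, 0.5], 5))
# [1.0, -0.5, 0.25, -0.125, 0.0625]
```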
Prediction
One-step ahead prediction

Given measurements of $v(k)$, $k = 0, \dots, K-1$, can we predict $v(K)$? Assume that we know $H(z)$: how much can we say about $v(K)$? Assume also that $H(z)$ is monic ($h(0) = 1$). Then
$$ v(k) = \sum_{i=0}^{\infty} h(i)\, e(k-i) = e(k) + \underbrace{\sum_{i=1}^{\infty} h(i)\, e(k-i)}_{=\, m(k-1),\ \text{``observed''}}. $$
Prediction
One-step ahead prediction

The prediction of $v(k)$, based on measurements up to time $k-1$, is $\hat v(k|k-1)$. We will argue that a good choice in this case is
$$ \hat v(k|k-1) = m(k-1) = \sum_{i=1}^{\infty} h(i)\, e(k-i). $$
The error in our prediction is $e(k)$, which we clearly can't reduce.
One-step prediction statistics
General case
Say $e(k)$ is identically distributed with pdf $f_e(x)$:
$$ \mathrm{Prob}\{x \le e(k) \le x + \delta x\} = \int_x^{x+\delta x} f_e(x)\,dx \approx f_e(x)\,\delta x. $$
A posteriori distribution
What are the statistics of $v(k)$ given $v_{-\infty}^{k-1} = \{v(-\infty), \dots, v(k-1)\}$?
$$ \begin{aligned} \mathrm{Prob}\{x \le v(k) \le x + \delta x \mid v_{-\infty}^{k-1}\} &= \mathrm{Prob}\{x \le e(k) + m(k-1) \le x + \delta x\} \\ &= \mathrm{Prob}\{x - m(k-1) \le e(k) \le x - m(k-1) + \delta x\} \\ &\approx f_e(x - m(k-1))\,\delta x. \end{aligned} $$
One-step ahead prediction statistics
Maximum of the conditional (a posteriori) distribution
Select the prediction estimate as the peak value of the conditional distribution:
$$ \hat v(k|k-1) = \arg\max_x\, f_e(x - m(k-1)) = m(k-1) \quad \text{for the Gaussian case}. $$
This is the most probable value of $v(k|k-1)$.

Mean of the conditional distribution

Select the prediction estimate as the mean value of the conditional distribution:
$$ \hat v(k|k-1) = E\{v(k) \mid v_{-\infty}^{k-1}\} = E\{e(k) + m(k-1)\} = m(k-1) + E\{e(k)\} = m(k-1). $$
This is the expected value of $v(k|k-1)$.
One-step ahead prediction
Calculation
$$ \begin{aligned} \hat v(k|k-1) &= m(k-1) = \sum_{i=1}^{\infty} h(i)\, e(k-i) \\ &= (H(z) - 1)\, e(k) \quad \text{(assuming } H(z) \text{ is monic)} \\ &= \frac{H(z) - 1}{H(z)}\, v(k) \\ &= (1 - H_{\mathrm{inv}}(z))\, v(k) \\ &= -\sum_{i=1}^{\infty} h_{\mathrm{inv}}(i)\, v(k-i). \end{aligned} $$
Note that $\hat v(k|k-1)$ depends only on values up to time $k-1$. With data starting at time $0$, the best we can do is
$$ \hat v(k|k-1) = -\sum_{i=1}^{\infty} h_{\mathrm{inv}}(i)\, v(k-i) \approx -\sum_{i=1}^{k} h_{\mathrm{inv}}(i)\, v(k-i). $$
Example
Moving average model
$$ v(k) = e(k) + c\, e(k-1) \implies H(z) = 1 + c z^{-1}. $$
For $H(z)$ to be stably invertible we require $|c| < 1$, giving
$$ H_{\mathrm{inv}}(z) = \frac{1}{1 + c z^{-1}} = \sum_{i=0}^{\infty} (-c)^i z^{-i}. $$
One-step ahead predictor
$$ \begin{aligned} \hat v(k|k-1) &= (1 - H_{\mathrm{inv}}(z))\, v(k) = -\sum_{i=1}^{\infty} (-c)^i\, v(k-i) \approx -\sum_{i=1}^{k} (-c)^i\, v(k-i) \\ &= c\, v(k-1) - c^2\, v(k-2) + c^3\, v(k-3) + \cdots - (-c)^k\, v(0). \end{aligned} $$
Example
Moving average model
$$ v(k) = e(k) + c\, e(k-1) \implies H(z) = 1 + c z^{-1}. $$
Recursive formulation

Note that
$$ H(z)\, \hat v(k|k-1) = (H(z) - 1)\, v(k), $$
so
$$ \hat v(k|k-1) + c\, \hat v(k-1|k-2) = c\, v(k-1), $$
$$ \hat v(k|k-1) = c\, \underbrace{\big(v(k-1) - \hat v(k-1|k-2)\big)}_{\text{prediction error at } k-1} = c\, \varepsilon(k-1). $$
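This recursion is easy to check numerically. The following sketch (illustrative values: $c = 0.8$, Gaussian $e(k)$, plant at rest) confirms that the recursive predictor's error $v(k) - \hat v(k|k-1)$ reproduces the innovation $e(k)$ exactly:

```python
import random

random.seed(0)
c = 0.8                       # |c| < 1: stably invertible
N = 200
e = [random.gauss(0.0, 1.0) for _ in range(N)]
# MA model from rest: v(k) = e(k) + c e(k-1), with e(-1) = 0
v = [e[0]] + [e[k] + c * e[k - 1] for k in range(1, N)]

vhat = 0.0                    # vhat(0|-1) = 0 (plant at rest)
for k in range(1, N):
    vhat = c * (v[k - 1] - vhat)          # vhat(k|k-1) = c * eps(k-1)
    # the prediction error equals the innovation e(k)
    assert abs((v[k] - vhat) - e[k]) < 1e-9
```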
Another example
Autoregressive noise model

Our noise model is
$$ v(k) = \sum_{i=0}^{\infty} a^i\, e(k-i), \qquad |a| < 1 \ \text{for stability}. $$
So
$$ H(z) = \sum_{i=0}^{\infty} a^i z^{-i} = \frac{1}{1 - a z^{-1}}, $$
and
$$ H_{\mathrm{inv}}(z) = 1 - a z^{-1} \quad \text{(a moving average process)}. $$
Our one-step ahead predictor is
$$ \hat v(k|k-1) = (1 - H_{\mathrm{inv}}(z))\, v(k) = a\, v(k-1). $$
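The same numerical check for this AR(1) case (a sketch with the illustrative value $a = 0.9$):

```python
import random

random.seed(1)
a = 0.9
N = 500
e = [random.gauss(0.0, 1.0) for _ in range(N)]
v = [e[0]]
for k in range(1, N):
    v.append(a * v[k - 1] + e[k])    # v(k) = sum_i a^i e(k-i), from rest

# One-step ahead predictor: vhat(k|k-1) = a v(k-1);
# its error recovers the innovation e(k) exactly.
errors = [v[k] - a * v[k - 1] for k in range(1, N)]
assert all(abs(errors[k - 1] - e[k]) < 1e-9 for k in range(1, N))
```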
Output prediction
Block diagram: $e(k) \to H(z) \to v(k)$, with $y(k) = G(z)\,u(k) + v(k)$.
One-step ahead prediction

Take the mean of the conditional distribution:
$$ \begin{aligned} \hat y(k|k-1) &= E\{y(k) \mid Z_K\} = G(z)\, u(k) + \hat v(k|k-1) \\ &= G(z)\, u(k) + (1 - H_{\mathrm{inv}}(z))\, v(k) \\ &= H_{\mathrm{inv}}(z)\, G(z)\, u(k) + (1 - H_{\mathrm{inv}}(z))\, y(k). \end{aligned} $$
Output prediction
Prediction error
$$ \begin{aligned} y(k) - \hat y(k|k-1) &= -H_{\mathrm{inv}}(z)\, G(z)\, u(k) + H_{\mathrm{inv}}(z)\, y(k) \\ &= H_{\mathrm{inv}}(z)\,\big(y(k) - G(z)\, u(k)\big) = H_{\mathrm{inv}}(z)\, v(k) \\ &= e(k). \end{aligned} $$
The innovation is the part of the output prediction that cannot be estimated from past measurements.
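A numerical check of this result (a sketch with illustrative first-order choices $G(z) = g\,z^{-1}$ and $H(z) = 1 + c\,z^{-1}$; rearranging the predictor as $H(z)\,\hat y(k|k-1) = G(z)\,u(k) + (H(z)-1)\,y(k)$ gives the recursion used below):

```python
import random

random.seed(2)
g, c = 2.0, 0.5               # G(z) = g z^-1, H(z) = 1 + c z^-1
N = 300
u = [random.choice([-1.0, 1.0]) for _ in range(N)]
e = [random.gauss(0.0, 1.0) for _ in range(N)]
# y(k) = g u(k-1) + e(k) + c e(k-1), from rest
y = [e[0]] + [g * u[k - 1] + e[k] + c * e[k - 1] for k in range(1, N)]

yhat = 0.0                    # yhat(0|-1) = 0
for k in range(1, N):
    # H(z) yhat = G(z) u + (H(z)-1) y, i.e.
    # yhat(k) = g u(k-1) + c (y(k-1) - yhat(k-1))
    yhat = g * u[k - 1] + c * (y[k - 1] - yhat)
    # prediction error equals the innovation e(k)
    assert abs((y[k] - yhat) - e[k]) < 1e-9
```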
Prediction error based identification
The one-step ahead predictor is parametrised by θ,
$$ \hat y(k \mid \theta, Z_K) = H_{\mathrm{inv}}(\theta, z)\, G(\theta, z)\, u(k) + (1 - H_{\mathrm{inv}}(\theta, z))\, y(k). $$
Define a parametrised prediction error,
$$ \varepsilon(k, \theta) = y(k) - \hat y(k, \theta), $$
which we can optionally filter,
$$ \varepsilon_F(k, \theta) = F(z)\, \varepsilon(k, \theta) \quad \text{(weighted error)}. $$
Define a cost function,
$$ J(\theta, Z_K) = \frac{1}{K} \sum_{k=0}^{K-1} l(\varepsilon_F(k, \theta)), \qquad \text{typically } l(\varepsilon_F(k, \theta)) = \|\varepsilon_F(k, \theta)\|^2, $$
and take
$$ \hat\theta = \arg\min_{\theta}\, J(\theta, Z_K). $$
Prediction error methods: ARX models
$$ G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)}, \qquad H(\theta, z) = \frac{1}{A(\theta, z)}, \qquad y(k) = \frac{B(\theta, z)}{A(\theta, z)}\, u(k) + \frac{1}{A(\theta, z)}\, e(k). $$
The predictor is
$$ \begin{aligned} \hat y(k|\theta) &= H_{\mathrm{inv}}(\theta, z)\, G(\theta, z)\, u(k) + (1 - H_{\mathrm{inv}}(\theta, z))\, y(k) \\ &= B(z)\, u(k) + (1 - A(z))\, y(k) \\ &= \theta^T \varphi(k) = \varphi^T(k)\, \theta. \end{aligned} $$
So
$$ Y - \Phi\,\theta = \text{vector of prediction errors}, $$
and the least squares regression approach minimises the prediction errors.
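The ARX least squares fit can be sketched as follows (a first-order illustration; the system, input, and noise values are invented for the example, and the $2 \times 2$ normal equations are solved directly):

```python
import random

random.seed(3)
# First-order ARX system: y(k) = -a1 y(k-1) + b1 u(k-1) + e(k)
a1, b1, K = -0.7, 2.0, 2000
u = [random.choice([-1.0, 1.0]) for _ in range(K)]   # PRBS-like input
e = [random.gauss(0.0, 0.1) for _ in range(K)]
y = [0.0] * K
for k in range(1, K):
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + e[k]

# Least squares via the normal equations, phi(k) = [u(k-1), -y(k-1)],
# theta = [b1, a1]
S = [[0.0, 0.0], [0.0, 0.0]]
t = [0.0, 0.0]
for k in range(1, K):
    phi = (u[k - 1], -y[k - 1])
    for i in range(2):
        t[i] += phi[i] * y[k]
        for j in range(2):
            S[i][j] += phi[i] * phi[j]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
b1_hat = (S[1][1] * t[0] - S[0][1] * t[1]) / det
a1_hat = (S[0][0] * t[1] - S[1][0] * t[0]) / det
print(b1_hat, a1_hat)    # close to (2.0, -0.7)
```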
Model structures
Block diagram: $e(k) \to H(\theta, z) \to v(k)$, with $y(k) = G(\theta, z)\, u(k) + v(k)$.

ARX model structure (equation error)
$$ G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)}, \qquad H(\theta, z) = \frac{1}{A(\theta, z)}, \qquad y(k) = \frac{B(\theta, z)}{A(\theta, z)}\, u(k) + \frac{1}{A(\theta, z)}\, e(k). $$
ARMAX model structure
$$ G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)}, \qquad H(\theta, z) = \frac{C(\theta, z)}{A(\theta, z)}, \qquad y(k) = \frac{B(\theta, z)}{A(\theta, z)}\, u(k) + \frac{C(\theta, z)}{A(\theta, z)}\, e(k), $$
with $A(z)$, $C(z)$ monic.

Prediction error structure
$$ \hat y(k|\theta) = \frac{B(z)}{C(z)}\, u(k) + \left(1 - \frac{A(z)}{C(z)}\right) y(k), $$
$$ C(z)\, \hat y(k|\theta) = B(z)\, u(k) + \big(C(z) - A(z)\big)\, y(k), $$
$$ \hat y(k|\theta) = B(z)\, u(k) + (1 - A(z))\, y(k) + (C(z) - 1)\underbrace{\big(y(k) - \hat y(k|\theta)\big)}_{\varepsilon(k)}. $$
Pseudolinear regression
One-step ahead ARMAX predictor
$$ \begin{aligned} \hat y(k|\theta) &= B(z)\, u(k) + (1 - A(z))\, y(k) + (C(z) - 1)\, \varepsilon(k) \\ &= \big[\, u(k-1)\ \cdots \mid {-y(k-1)}\ \cdots \mid \varepsilon(k-1)\ \cdots \,\big] \big[\, b_1\ \cdots \mid a_1\ \cdots \mid c_1\ \cdots \,\big]^T \\ &= \varphi^T(\theta, k)\, \theta. \end{aligned} $$
Because $\varepsilon(k)$ itself depends on $\theta$, this is not linear in $\theta$.
Optimisation-based algorithm
$$ \underset{\theta,\, \varepsilon}{\text{minimise}} \quad \|\varepsilon\|^2 \ \ \text{(or more generally, } l(\varepsilon)\text{)} \qquad \text{subject to} \quad Y = \Phi(\varepsilon)\, \theta + \varepsilon \ \ \text{(nonlinear equality constraint)}. $$
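A common alternative to the constrained formulation is to iterate ordinary least squares with the regressor rebuilt from the previous pass's residuals (extended, or pseudolinear, least squares). A sketch for an illustrative ARMAX case with $A(z) = 1$, so the regression is two-dimensional (all numerical values invented for the example):

```python
import random

random.seed(4)
# ARMAX with A = 1: y(k) = b u(k-1) + e(k) + c e(k-1)
b, c, K = 1.5, 0.5, 4000
u = [random.choice([-1.0, 1.0]) for _ in range(K)]
e = [random.gauss(0.0, 0.2) for _ in range(K)]
y = [e[0]] + [b * u[k - 1] + e[k] + c * e[k - 1] for k in range(1, K)]

def ls2(col0, col1, rhs):
    """Solve the 2x2 normal equations for regressor (col0(k-1), col1(k-1))."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    t = [0.0, 0.0]
    for k in range(1, K):
        phi = (col0[k - 1], col1[k - 1])
        for i in range(2):
            t[i] += phi[i] * rhs[k]
            for j in range(2):
                S[i][j] += phi[i] * phi[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return ((S[1][1] * t[0] - S[0][1] * t[1]) / det,
            (S[0][0] * t[1] - S[1][0] * t[0]) / det)

# Initialise residuals from a crude fit ignoring the C-term (u(k) = +/-1,
# so the denominator of the scalar LS is just K)
b_hat = sum(u[k - 1] * y[k] for k in range(1, K)) / float(K)
eps = [y[0]] + [y[k] - b_hat * u[k - 1] for k in range(1, K)]

# Pseudolinear iterations with regressor phi(k) = [u(k-1), eps(k-1)]
for _ in range(20):
    b_hat, c_hat = ls2(u, eps, y)
    new_eps = [y[0]]
    for k in range(1, K):
        new_eps.append(y[k] - b_hat * u[k - 1] - c_hat * new_eps[k - 1])
    eps = new_eps
```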
ARMAX example
ARMAX model structure
$$ A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}, \qquad B(z) = b_1 z^{-1} + b_2 z^{-2}, \qquad C(z) = 1 + c_1 z^{-1} + c_2 z^{-2}, $$
$$ y(k) = \frac{B(\theta, z)}{A(\theta, z)}\, u(k) + \frac{C(\theta, z)}{A(\theta, z)}\, e(k), \qquad \theta = \begin{bmatrix} b_1 & b_2 & a_1 & a_2 & c_1 & c_2 \end{bmatrix}^T. $$
Experiments
§ The plant is “at rest”.
§ Data length $K = 31$.
§ PRBS input signal, $u(k)$.
ARMAX example
Typical experimental data
[Figure: time plots of u(k), y(k), and v(k) versus index k, for k = 0 to 30.]
Constrained minimisation code
% Create data part of regressor. Assume plant at rest.
PhiTyu(1,:) = [0, 0, 0, 0];
PhiTyu(2,:) = [u(1), 0, -y(1), 0];
for i = 3:K,
    PhiTyu(i,:) = [u(i-1), u(i-2), -y(i-1), -y(i-2)];
end

% x0: initial guess for the stacked variable x = [theta; e].
[x,fval] = fmincon(@(x)ARMAXobjective(x),x0,...
                   [],[],[],[],[],[],@(x)ARMAXconstraint(x,y,PhiTyu));

function [f] = ARMAXobjective(x)
% x = [theta; e]
f = sqrt(x(7:end)'*x(7:end));

function [c,ceq] = ARMAXconstraint(x,y,PhiTyu)
theta = x(1:6);        % coefficients [b1; b2; a1; a2; c1; c2]
e     = x(7:end);      % innovation sequence
K     = length(e);
PhiTe = zeros(K,2);    % regressor columns for e(k-1) and e(k-2)
PhiTe(2,1) = e(1);
for j = 3:K,
    PhiTe(j,:) = [e(j-1), e(j-2)];
end
ceq = y - [PhiTyu, PhiTe]*theta - e;   % enforce Y = Phi(e)*theta + e
c = [];
ARMAX example
Transfer function averages: 128 experiments (data length K = 31)

[Figure: mean estimate comparison for K = 31; magnitude and phase of the G(z) and H(z) estimates against the optimal G_opt and H_opt, plotted over frequency 10^-2 to 10^0.]
ARMAX example
Coefficient statistics for 128 experiments
[Figure: statistics of the estimated coefficients b_1, b_2, a_1, a_2, c_1, c_2 over 128 experiments with data length K = 31; green circles are the true values.]
ARMAX example
Coefficient error for averages: 2, 4, 8, ..., 128 experiments

[Figure: error in the averaged estimates of b_1, b_2, a_1, a_2, c_1, c_2 versus total data length (number of averages x experiment length).]
ARMAX example
Transfer function estimates: 128 experiments (data length K = 31)

[Figure: |G(jω)| estimates (128 experiments) and mean estimate for K = 31; magnitude versus frequency (rad/sample).]
ARMAX example
Transfer function estimates: 128 experiments (data length K = 31)

[Figure: |H(jω)| estimates (128 experiments) and mean estimate for K = 31; magnitude versus frequency (rad/sample).]
ARMAX example
Prediction errors and actual innovations (data length K = 31)

[Figure: mean ||e||/√N (actual innovations) and mean ||ε||/√N (optimum prediction error) versus total data length (number of averages x experiment length).]
ARMAX example
Longer experiments: K = 127

[Figure: time plots of u(k), y(k), and v(k) versus index k, for k = 0 to 126.]
ARMAX example

Coefficient statistics comparison: K = 31 and K = 127

[Figure: statistics of the estimated coefficients, comparing K = 31 and K = 127 side by side for each coefficient.]
ARMAX example
Coefficient error comparison: K = 31 and K = 127

[Figure: error in the averaged estimates of b_1, b_2, a_1, a_2, c_1, c_2 versus total data length (number of averages x experiment length), for K = 31 and K = 127.]
ARMAX example
Prediction error comparison: K = 31 and K = 127

[Figure: mean ||e||/√N (actual innovations) and mean ||ε||/√N (optimum prediction error) versus total data length, for K = 31 and K = 127.]
ARMAX example
Transfer function estimates: 32 experiments (data length K = 127)

[Figure: |G(jω)| estimates (32 experiments) and mean estimate for K = 127; magnitude versus frequency (rad/sample).]
ARMAX example
Transfer function estimates: 32 experiments (data length K = 127)

[Figure: |H(jω)| estimates (32 experiments) and mean estimate for K = 127; magnitude versus frequency (rad/sample).]
ARARMAX model structure
Block diagram: $e(k) \to \dfrac{C(\theta, z)}{D(\theta, z)} \to v(k)$, with $y(k) = \dfrac{1}{A(\theta, z)}\big( B(\theta, z)\, u(k) + v(k) \big)$.

$$ G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)}, \qquad H(\theta, z) = \frac{C(\theta, z)}{A(\theta, z)\, D(\theta, z)}, $$
with $A(z)$, $C(z)$ and $D(z)$ monic.
Output error model structure
$$ G(\theta, z) = \frac{B(\theta, z)}{F(\theta, z)}, \qquad H(\theta, z) = 1, \qquad y(k) = \frac{B(\theta, z)}{F(\theta, z)}\, u(k) + e(k), $$
with $F(z)$ monic.

Pseudolinear predictor framework
$$ \hat y(k|\theta) = \frac{B(\theta, z)}{F(\theta, z)}\, u(k) = \varphi(k, \theta)^T \theta, $$
where
$$ \varphi(k, \theta)^T = \big[\, u(k-1)\ \cdots\ u(k-m) \mid \underbrace{-\hat y(k-1, \theta)\ \cdots\ -\hat y(k-n_f, \theta)}_{\text{pseudolinear terms}} \,\big]. $$
Box-Jenkins model structure
Block diagram: $e(k) \to \dfrac{C(\theta, z)}{D(\theta, z)} \to v(k)$, with $y(k) = \dfrac{B(\theta, z)}{F(\theta, z)}\, u(k) + v(k)$.

$$ G(\theta, z) = \frac{B(\theta, z)}{F(\theta, z)}, \qquad H(\theta, z) = \frac{C(\theta, z)}{D(\theta, z)}. $$
Predictor
$$ \hat y(k|\theta) = \frac{D(z)\, B(z)}{C(z)\, F(z)}\, u(k) + \left(1 - \frac{D(z)}{C(z)}\right) y(k). $$
General model structure
Block diagram: $e(k) \to \dfrac{C(\theta, z)}{D(\theta, z)} \to v(k)$, with $y(k) = \dfrac{1}{A(\theta, z)}\left( \dfrac{B(\theta, z)}{F(\theta, z)}\, u(k) + v(k) \right)$.

$$ G(\theta, z) = \frac{B(\theta, z)}{A(\theta, z)\, F(\theta, z)}, \qquad H(\theta, z) = \frac{C(\theta, z)}{A(\theta, z)\, D(\theta, z)}. $$
Predictor
$$ \hat y(k|\theta) = \frac{D(z)\, B(z)}{C(z)\, F(z)}\, u(k) + \left(1 - \frac{D(z)\, A(z)}{C(z)}\right) y(k). $$
A pseudo-linear regression can be derived.
Known noise model (with ARMAX dynamics)
Assume that the noise is known,
$$ v(k) = L(z)\, e(k). $$
So
$$ A(z)\, y(k) = B(z)\, u(k) + L(z)\, e(k). $$
Filter the signals by $L^{-1}(z)$:
$$ y_L(k) = L^{-1}(z)\, y(k), \qquad u_L(k) = L^{-1}(z)\, u(k), $$
giving
$$ A(z)\, y_L(k) = B(z)\, u_L(k) + e(k), $$
for which LS methods give consistent estimates.
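A sketch of this filtering approach (illustrative first-order $A$, $B$, and a known $L(z) = 1 + l\,z^{-1}$; filtering by $L^{-1}(z)$ is the IIR recursion $y_L(k) = y(k) - l\,y_L(k-1)$):

```python
import random

random.seed(3)
# True system: (1 + a z^-1) y(k) = b z^-1 u(k) + (1 + l z^-1) e(k), L known
a, b, l, K = -0.6, 1.0, 0.7, 4000
u = [random.choice([-1.0, 1.0]) for _ in range(K)]
e = [random.gauss(0.0, 0.2) for _ in range(K)]
y = [0.0] * K
for k in range(1, K):
    y[k] = -a * y[k - 1] + b * u[k - 1] + e[k] + l * e[k - 1]

def linv_filter(x, l):
    """Filter x by Linv(z) = 1/(1 + l z^-1): out(k) = x(k) - l out(k-1)."""
    out = [0.0] * len(x)
    for k in range(len(x)):
        out[k] = x[k] - (l * out[k - 1] if k > 0 else 0.0)
    return out

uL, yL = linv_filter(u, l), linv_filter(y, l)

# Ordinary LS on the filtered signals:
# yL(k) = -a yL(k-1) + b uL(k-1) + e(k), phi(k) = [uL(k-1), -yL(k-1)]
S = [[0.0, 0.0], [0.0, 0.0]]
t = [0.0, 0.0]
for k in range(1, K):
    phi = (uL[k - 1], -yL[k - 1])
    for i in range(2):
        t[i] += phi[i] * yL[k]
        for j in range(2):
            S[i][j] += phi[i] * phi[j]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
b_hat = (S[1][1] * t[0] - S[0][1] * t[1]) / det
a_hat = (S[0][0] * t[1] - S[1][0] * t[0]) / det
```

Without the prefilter, LS on $(u, y)$ directly would be biased, since the equation error $L(z)e(k)$ is correlated with $y(k-1)$.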
High-order model fitting
Assume that the noise is autoregressive (ARARX structure),
$$ A(z)\, y(k) = B(z)\, u(k) + \frac{1}{D(z)}\, e(k), \qquad e(k) \sim \mathcal{N}(0, \lambda). $$
Fit a high-order model (the order of $D(z)$ is $n_d$):
$$ A(z)\, D(z)\, y(k) = B(z)\, D(z)\, u(k) + e(k). $$
A least squares estimate with orders $n + n_d$ and $m + n_d$ gives a consistent estimate of
$$ \frac{B(z)\, D(z)}{A(z)\, D(z)} = \frac{B(z)}{A(z)}. $$
This amounts to making the noise model sufficiently rich to capture additional autoregressive features in the noise.

In practice the cancellation will not be exact: $\hat A(z)$ and $\hat B(z)$ will be high order.
Bibliography

Prediction: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [Section 3.2].

Model parametrisations: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [Sections 1.3 and 4.2].

Linear and pseudolinear regression: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [Sections 10.1 and 10.2].