System Identification
Lecture 2: Prediction error method

Paul Van den Hof
Control Systems Group, Eindhoven University of Technology
Dutch Institute of Systems and Control (DISC)
DISC - System Identification 2018 - Lecture 2 - Prediction error method

Contents
Introduction
Model sets
Identification criterion
Convergence
Consistency
Distribution
Schedule

Apr 9    PVdH  Introduction; concepts
Apr 16   PVdH  Prediction error identification
Apr 23   PVdH  Asymptotic analysis
Apr 30   GB    Bayesian estimation; machine learning
May 7    JS    Frequency domain identification
May 14   JS    Nonlinear identification
May 28   XB    Experiment design
June 4   PVdH  Closed-loops and dynamic networks
One-step-ahead prediction

For a dynamical model y(t) = G(q)u(t) + H(q)e(t), the one-step-ahead prediction ŷ(t|t−1) := E{y(t) | y^{t−1}, u^t} is given by

ŷ(t|t−1) = H^{-1}(q)G(q)u(t) + [1 − H^{-1}(q)]y(t)

At the same time:

y(t) = ŷ(t|t−1) + e(t)

The one-step-ahead predictor of a dynamical model predicts the output of the model up to the white-noise term e(t), called the innovation process.
One-step-ahead prediction is going to be used as a way to measure the deviation/distance between a data set {(y(t), u(t))}_{t=1,...,N} and a predictor model (G(q), H(q)). If the data has been generated by (G, H), then

y(t) − ŷ(t|t−1) = e(t)
If the data comes from another system, but we calculate the prediction on the basis of (G, H), then
y(t) − ŷ(t|t−1) = ε(t)

with

ε(t) = H^{-1}(q)[y(t) − G(q)u(t)]

the one-step-ahead prediction error.
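As a numerical illustration (a sketch, not part of the original slides; all coefficient values are chosen for the example), the code below simulates data from a first-order ARX-type system with G(q) = b1 q^{-1}/(1 + a1 q^{-1}) and H(q) = 1/(1 + a1 q^{-1}), and verifies that ε(t) = H^{-1}(q)[y(t) − G(q)u(t)] recovers the innovation e(t) exactly when the predictor model equals the data-generating system:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
a1, b1 = -0.7, 2.0           # illustrative coefficients
u = rng.standard_normal(N)   # input signal
e = rng.standard_normal(N)   # white innovation process

# Simulate y(t) = -a1*y(t-1) + b1*u(t-1) + e(t),
# i.e. G(q) = b1 q^{-1}/(1 + a1 q^{-1}), H(q) = 1/(1 + a1 q^{-1})
y = np.zeros(N)
y[0] = e[0]
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

# One-step-ahead prediction error eps(t) = H^{-1}(q)[y(t) - G(q)u(t)],
# which for this structure reduces to eps(t) = y(t) + a1*y(t-1) - b1*u(t-1)
eps = np.zeros(N)
eps[0] = y[0]
eps[1:] = y[1:] + a1 * y[:-1] - b1 * u[:-1]

# With the true (G, H) as predictor, the prediction error equals e(t)
```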
[Block diagram: input u and white noise e drive the presumed data-generating system (G, H), producing y = G u + H e; the predictor model filters u and y through G and H^{-1} to reconstruct e(t).]
[Block diagram: the same setup with data generated by (G0, H0); when the predictor model (G, H) differs from (G0, H0), the filter H^{-1} produces the prediction error ε(t).]
ε(t) is going to serve as a basis for the identification criterion.
Black box model structures and model sets
Predictor model: (G(q), H(q))
Model set: M = {(G(q, θ), H(q, θ)), θ ∈ Θ ⊂ R^d}

A parametrization is used to represent models by real-valued coefficients.

Examples of parametrizations:
• Coefficients in polynomial fractions, e.g. θ = (b1 a1 c1 d1)^T with

  G(q, θ) = b1 q^{-1} / (1 + a1 q^{-1});   H(q, θ) = (1 + c1 q^{-1}) / (1 + d1 q^{-1})

• Coefficients in series expansions
• Matrix coefficients in state-space models
ARX model structure

G(q, θ) = B(q^{-1}, θ) / A(q^{-1}, θ);   H(q, θ) = 1 / A(q^{-1}, θ)

with

B(q^{-1}, θ) = q^{-nk} {b0 + b1 q^{-1} + ··· + b_{nb−1} q^{-nb+1}}
A(q^{-1}, θ) = 1 + a1 q^{-1} + ··· + a_{na} q^{-na}
θ = [a1 a2 ··· a_{na} b0 b1 ··· b_{nb−1}]^T.

na and nb are the numbers of parameters in the A and B polynomials; nk is the number of time delays.

Predictor:

ŷ(t|t−1; θ) = B(q^{-1}, θ)u(t) + [1 − A(q^{-1}, θ)]y(t)
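A minimal sketch of this ARX predictor (the helper `arx_predict` and all coefficient values are illustrative, not from the slides):

```python
import numpy as np

def arx_predict(y, u, a, b, nk):
    """One-step-ahead ARX predictor
    yhat(t|t-1) = B(q^{-1})u(t) + [1 - A(q^{-1})]y(t),
    with A = 1 + a[0] q^{-1} + ... + a[na-1] q^{-na}
    and  B = q^{-nk}(b[0] + b[1] q^{-1} + ... + b[nb-1] q^{-nb+1})."""
    yhat = np.zeros(len(y))
    for t in range(len(y)):
        acc = 0.0
        for i, bi in enumerate(b):            # B(q^{-1}) u(t): delays nk .. nk+nb-1
            if t - nk - i >= 0:
                acc += bi * u[t - nk - i]
        for j, aj in enumerate(a, start=1):   # [1 - A(q^{-1})] y(t): delays 1 .. na
            if t - j >= 0:
                acc -= aj * y[t - j]
        yhat[t] = acc
    return yhat

# Data generated by the same ARX(1,1) structure (illustrative values):
rng = np.random.default_rng(4)
N = 300
u = rng.standard_normal(N)
e = rng.standard_normal(N)
y = np.zeros(N)
y[0] = e[0]
for t in range(1, N):
    y[t] = 0.6 * y[t - 1] + 1.5 * u[t - 1] + e[t]

yhat = arx_predict(y, u, a=[-0.6], b=[1.5], nk=1)
# With the true parameters, the prediction error y - yhat equals the innovation e
```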
Model structures
Model structure      G(q, θ)                    H(q, θ)
ARX                  B(q^{-1}, θ)/A(q^{-1}, θ)  1/A(q^{-1}, θ)
ARMAX                B(q^{-1}, θ)/A(q^{-1}, θ)  C(q^{-1}, θ)/A(q^{-1}, θ)
OE (Output Error)    B(q^{-1}, θ)/F(q^{-1}, θ)  1
FIR                  B(q^{-1}, θ)               1
BJ (Box-Jenkins)     B(q^{-1}, θ)/F(q^{-1}, θ)  C(q^{-1}, θ)/D(q^{-1}, θ)
Properties of model structures

• Linearity-in-the-parameters (ARX):

  ŷ(t|t−1; θ) = B(q^{-1})u(t) + [1 − A(q^{-1})]y(t) = φ^T(t)θ

  is a linear function of θ ⇒ important computational advantages.

• Independent parametrization of G(q, θ) and H(q, θ): there are no common parameters in G and H ⇒ advantages for independent identification of G and H.
Both properties to be utilized later on.
System in the model set
Data-generating system: S : [G0, H0]
Model set: M : {[G(q, θ), H(q, θ)], θ ∈ Θ ⊂ R^d}.

S ∈ M denotes that the data-generating system can be represented exactly within M, i.e. there exists θ0 ∈ Θ such that

G(q, θ0) = G0(q)
H(q, θ0) = H0(q)

The notion G0 ∈ G, with G = {G(q, θ), θ ∈ Θ ⊂ R^d}, denotes that only G0 can be represented exactly, i.e.

G(q, θ0) = G0(q)
Identification criterion

Consider the data-generating system:

y(t) = G0(q)u(t) + H0(q)e(t)

and the parametrized model (G(q, θ), H(q, θ)).

Denote V̄(θ) = Ē ε²(t, θ). Then V̄(θ) ≥ σ_e², with equality for θ̂ if

G(q, θ̂) = G0(q)
H(q, θ̂) = H0(q)
Uniqueness of this solution requires some more conditions, to be specified later.
Reasoning:

ε(t, θ) = [(G0(q) − G(q, θ))/H(q, θ)] u(t) + [H0(q)/H(q, θ)] e(t)

u and e are uncorrelated; the first term becomes 0 for G(q, θ̂) = G0(q). For the second term,

[H0(q)/H(q, θ)] e(t) = [1 + γ1(θ)q^{-1} + γ2(θ)q^{-2} + ···] e(t)

so that

Ē ε²(t, θ) = [1 + γ1(θ)² + γ2(θ)² + ···] σ_e²

is minimal for γ1(θ̂) = γ2(θ̂) = ··· = 0.
⇒ The cost-function minimum σ_e² is achieved for G(q, θ̂) = G0 and H(q, θ̂) = H0.
Identification criterion

The power of the prediction error is estimated from a data sequence through the quadratic criterion:

V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε²(t, θ)

Parameter estimation through minimizing V_N:

θ̂_N = arg min_θ V_N(θ, Z^N)
This optimization problem may or may not be convex.
[Figure: a convex criterion V_N(θ) with a single minimum versus a non-convex criterion with local minima.]

The optimization is convex if the criterion is quadratic in θ. In that situation, the solution can be found analytically.
Linear regression
If the predictor is linear-in-the-parameters (ARX, FIR):

ŷ(t|t−1; θ) = φ^T(t)θ

then

V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} [y(t) − φ^T(t)θ]²

and an analytical solution is obtained:

θ̂_N = [ (1/N) Σ_{t=1}^{N} φ(t)φ^T(t) ]^{-1} · (1/N) Σ_{t=1}^{N} φ(t)y(t)
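A sketch of this analytical least-squares solution for an ARX(1,1) example (all system and noise values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
a1, b1 = -0.7, 2.0                     # illustrative true parameters
u = rng.standard_normal(N)
e = 0.5 * rng.standard_normal(N)

# True system: y(t) = -a1*y(t-1) + b1*u(t-1) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

# Regressor phi(t) = [-y(t-1), u(t-1)]^T so that yhat(t|t-1) = phi(t)^T theta
# with theta = [a1, b1]^T
Phi = np.column_stack([-y[:-1], u[:-1]])   # rows are phi(t), t = 1..N-1
Y = y[1:]

# Analytical solution: theta = [sum phi phi^T]^{-1} sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
# theta_hat should be close to the true [-0.7, 2.0]
```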
Assumptions for analysis:
• The data has been generated by a linear dynamical system, possibly operating in a (stabilized) feedback situation, with bounded excitation signals and with v having bounded moments of order 4;
• M is uniformly stable (i.e. the predictor filters and their derivatives w.r.t. θ are uniformly stable).

Convergence result
Under the above assumptions, θ̂_N = arg min_θ V_N(θ, Z^N) converges for N → ∞ with probability 1 to the minimizing argument θ* of V̄(θ) := Ē ε²(t, θ).
Differently stated:
θ̂_N → θ* := arg min_θ Ē ε²(t, θ)   w.p. 1 for N → ∞.

The estimate converges to the best possible predictor (in the sense of Ē ε²) within the model set M, and becomes independent of the particular noise sequence (the variance goes to 0).
Persistence of excitation
Definition - persistently exciting input
A quasi-stationary signal u is persistently exciting of order n if its spectral density Φ_u(ω) is nonzero in at least n points of the interval (−π, π].
Example
The signal u(t) = sin(ω0 t) is persistently exciting of order 2 (Φ_u has contributions at ±ω0): 2 degrees of freedom (amplitude and phase).
Equivalent formulation

Proposition
A quasi-stationary signal u is persistently exciting of order n if and only if the (Toeplitz) matrix

        [ Ru(0)     Ru(1)   ···   Ru(n−1) ]
R̄_n := [ Ru(1)     Ru(0)   ···   Ru(n−2) ]
        [   ⋮         ⋮       ⋱      ⋮    ]
        [ Ru(n−1)    ···    Ru(1)  Ru(0)  ]

with Ru(τ) := Ē{u(t)u(t − τ)}, is nonsingular.
Example: A white noise process (Ru(τ) = δ(τ)) is persistently exciting of any order. Note: R̄_n = I_n.
Example: block signal

[Figure: periodic block signal switching between +1 and −1.]

Ru(0) = 1,  Ru(1) = 1/3,  Ru(2) = −1/3,  Ru(3) = −1,
Ru(4) = −1/3,  Ru(5) = 1/3,  Ru(6) = 1,  etcetera.
        [  1     1/3   −1/3   −1   ]
R̄_4 =  [ 1/3     1     1/3  −1/3  ]
        [ −1/3   1/3     1    1/3  ]
        [ −1    −1/3    1/3    1   ]

R̄_3 is regular and R̄_4 is singular, so u is p.e. of order 3.
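This rank test can be checked numerically. The sketch below builds R̄_n from sample autocorrelations of a period-6 block signal (an assumed realization consistent with the correlations above) and confirms that R̄_3 is nonsingular while R̄_4 is singular:

```python
import numpy as np

# Periodic block signal of period 6:
period = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
u = np.tile(period, 400)

def Ru(tau, u, P=6):
    """Autocorrelation Ru(tau), averaged over whole periods of length P."""
    n = (len(u) - tau) // P * P        # truncate to a multiple of the period
    return np.mean(u[tau:tau + n] * u[:n])

def toeplitz_Rbar(n, u):
    """Toeplitz matrix Rbar_n with entries Ru(|i - j|)."""
    r = np.array([Ru(abs(tau), u) for tau in range(n)])
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

R3 = toeplitz_Rbar(3, u)
R4 = toeplitz_Rbar(4, u)
rank3 = np.linalg.matrix_rank(R3)   # 3: nonsingular
rank4 = np.linalg.matrix_rank(R4)   # 3: singular, so u is p.e. of order 3, not 4
```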
A step signal:

u(t) = 0, t < 0;   u(t) = c, t ≥ 0

is actually a transient signal, and therefore does not fit well into the "persistent signals" framework. Although a step response reveals all dynamics of an LTI system, a step signal is p.e. of order 1 only, having a spectral contribution only at ω = 0. So it only persistently excites frequency 0.
Consistency result for S ∈ M
Let data be generated by a system (G0, H0), and consider a model set M with G(q, θ) parametrized according to
G(q, θ) = q^{-nk} · (b0 + b1 q^{-1} + ··· + b_{nb−1} q^{-nb+1}) / (1 + f1 q^{-1} + ··· + f_{nf} q^{-nf}).
If S ∈ M and u is persistently exciting of order ≥ nf + nb then
G(e^{iω}, θ*) = G0(e^{iω})
H(e^{iω}, θ*) = H0(e^{iω}),   −π ≤ ω ≤ π
Note that nb and nf are the numbers of parameters in the numerator and denominator of G(q, θ), respectively.
Consistency result for G0 ∈ G
Consider the same situation as on the previous slide. If
• G0 ∈ G, and
• G and H are independently parametrized in M, and
• u is persistently exciting of order ≥ nf + nb,

then
G(e^{iω}, θ*) = G0(e^{iω}),   −π ≤ ω ≤ π
G0 can be estimated consistently, irrespective of whether H0 can be modeled exactly.
Independent parametrization
Model structure   G(q, θ)   H(q, θ)
OE                B/F       1
FIR               B         1
BJ                B/F       C/D
Not ARX or ARMAX, because of the common denominator in G and H.
Asymptotic distribution
The asymptotic pdf of the estimate θ̂_N is specified as follows:

Asymptotic distribution
Consider the data assumptions as present in the convergence result. Then for N → ∞,

√N (θ̂_N − θ*) → N(0, P_θ)

where

P_θ = [V̄''(θ*)]^{-1} Q [V̄''(θ*)]^{-1}
Q = lim_{N→∞} N · E{[V_N'(θ*, Z^N)][V_N'(θ*, Z^N)]^T}

while (·)' and (·)'' denote the first and second derivative with respect to θ, respectively.
Justification of the normal distribution
As an example: a linear regression structure with S ∈ M, leading to

θ̂_N − θ0 = [ (1/N) Σ_t φ(t)φ^T(t) ]^{-1} · (1/N) Σ_t φ(t)e(t)

with e(t) a random variable and e a white noise process. For N → ∞, the above expression becomes a weighted sum of a growing number of random variables with a fixed distribution. By the central limit theorem, this weighted sum tends to a Gaussian pdf.
P_θ/N has the interpretation of the covariance matrix of the parameter estimate:

P_θ/N = E[(θ̂_N − θ0)(θ̂_N − θ0)^T],

quantifying the variability of the estimate.

Covariance matrix in case of S ∈ M
For the situation θ* = θ0 (S ∈ M) it holds that

P_θ = σ_e² · [ Ē ψ(t, θ0) ψ^T(t, θ0) ]^{-1}

with

ψ(t, θ0) = ∂ŷ(t|t−1; θ)/∂θ |_{θ=θ0} = − ∂ε(t, θ)/∂θ |_{θ=θ0}
Justification of the covariance matrix expression
In the linear regression situation with S ∈ M, we have

θ̂_N − θ0 = [ (1/N) Σ_t φ(t)φ^T(t) ]^{-1} · (1/N) Σ_t φ(t)e(t)

so that, with θ̃ := θ̂_N − θ0, Ē θ̃θ̃^T can be written as

Ē { [ (1/N) Σ_t φ(t)φ^T(t) ]^{-1} · (1/N) Σ_t φ(t)e(t) · (1/N) Σ_t φ^T(t)e(t) · [ (1/N) Σ_t φ(t)φ^T(t) ]^{-1} }
With φ(t) deterministic and E e²(t) = σ_e², this gives

P_θ := N · Ē θ̃θ̃^T = σ_e² · [ (1/N) Σ_t φ(t)φ^T(t) ]^{-1}

In linear regression, ŷ(t|t−1; θ) = φ^T(t)θ, so that

φ(t) = ∂ŷ(t|t−1; θ)/∂θ.

In general structures, the role of φ(t) is taken over by

ψ(t, θ0) = ∂ŷ(t|t−1; θ)/∂θ |_{θ=θ0}.

(end of justification)
P_θ/N = E(θ̂_N − θ0)(θ̂_N − θ0)^T can be estimated from the data and θ̂_N as:

P̂_θ = σ̂_e² · [ (1/N) Σ_{t=1}^{N} ψ(t, θ̂_N) ψ^T(t, θ̂_N) ]^{-1}

σ̂_e² = (1/N) Σ_{t=1}^{N} ε²(t, θ̂_N)

with σ̂_e² an estimate of σ_e².
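For the linear-regression (ARX) case, where ψ(t, θ̂_N) = φ(t), this estimate can be sketched as follows (system coefficients and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
a1, b1 = -0.7, 2.0
u = rng.standard_normal(N)
e = 0.5 * rng.standard_normal(N)        # sigma_e^2 = 0.25
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

Phi = np.column_stack([-y[:-1], u[:-1]])          # psi(t) = phi(t) for ARX
Y = y[1:]
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

eps = Y - Phi @ theta_hat                          # prediction errors at theta_hat
sigma2_hat = np.mean(eps**2)                       # noise variance estimate
P_hat = sigma2_hat * np.linalg.inv(Phi.T @ Phi / len(Y))
cov_theta = P_hat / len(Y)                         # covariance of theta_hat
std_theta = np.sqrt(np.diag(cov_theta))            # parameter standard errors
```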
Covariance matrix in case of G0 ∈ G
Let M = {(G(q, ρ), H(q, η)), θ = (ρ^T η^T)^T ∈ Θ} and θ* = ((ρ0)^T (η*)^T)^T. Then

√N (ρ̂_N − ρ0) ∈ As N(0, P_ρ)

with

P_ρ = σ_e² · [Ē ψ_ρ(t)ψ_ρ^T(t)]^{-1} · [Ē ψ̃(t)ψ̃^T(t)] · [Ē ψ_ρ(t)ψ_ρ^T(t)]^{-1}

ψ_ρ(t) = H^{-1}(q, η*) · ∂G(q, ρ)/∂ρ |_{ρ=ρ0} · u(t)

ψ̃(t) = Σ_{i=0}^{∞} f_i ψ_ρ(t + i),   where F(z) := H0(z)/H(z, η*) = Σ_{i=0}^{∞} f_i z^{-i}.
Parameter uncertainty regions

The asymptotic normal distribution can be used to quantify parameter uncertainty regions. Start with:

√N (θ̂_N − θ0) → N(0, P_θ)

Let P_θ^{-1} be decomposed as P_θ^{-1} = R_θ^T R_θ; then

√N R_θ (θ̂_N − θ0) → N(0, I_n)

i.e. a vector of n standard (independent) normally distributed random variables.
If x ∈ N(0, I_n), then the sum of the squared variables

x^T x ∈ χ²(n)

i.e. it follows a χ² distribution with n degrees of freedom (scalar valued). This implies that

N (θ̂_N − θ0)^T P_θ^{-1} (θ̂_N − θ0) → χ²(n).

[Figure: example of the χ²(n) probability density function.] The level c_χ(α, n) for x ∈ χ²(n) is defined by

∫_0^{c_χ(α,n)} f_{χ²(n)}(x) dx = α

where f_{χ²(n)} is the χ²(n) pdf.
Note that the contour lines

N (θ̂_N − θ0)^T P_θ^{-1} (θ̂_N − θ0) = c

are ellipsoidal contour lines of the multivariable Gaussian distribution.

[Figure: ellipse centered at (θ0^(1), θ0^(2)), with principal axes along the eigenvectors w_i of P_θ and axis lengths proportional to √(c λ_i), where λ_i are the eigenvalues of P_θ.]
while the level of probability related to the event
N (θ − θ0)^T P_θ^{-1} (θ − θ0) < c

is determined by the χ²(n) distribution.
Denote by c_χ(α, n) the value that satisfies

Pr(x ≤ c_χ(α, n)) = α for x ∈ χ²(n).

Then θ̂_N ∈ D_{θ0} with probability α, with

D_{θ0} = { θ | (θ − θ0)^T P_θ^{-1} (θ − θ0) < c_χ(α, n)/N }

an ellipsoid centered at θ0.
Alternatively, denote:

D̂_{θ̂_N} = { θ | (θ − θ̂_N)^T P_θ^{-1} (θ − θ̂_N) < c_χ(α, n)/N }

an ellipsoid centered at θ̂_N. Then it can simply be verified that

θ̂_N ∈ D_{θ0} ⇔ θ0 ∈ D̂_{θ̂_N}

so that θ0 ∈ D̂_{θ̂_N} with probability α.
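A Monte Carlo sketch of this confidence region for an assumed 2-parameter case (for n = 2 the χ² quantile has the closed form c_χ(α, 2) = −2 ln(1 − α); the values of θ0 and P_θ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.95
n = 2
c = -2.0 * np.log(1.0 - alpha)      # c_chi(alpha, 2) in closed form (~5.99)

theta0 = np.array([-0.7, 2.0])      # illustrative "true" parameters
N = 1000
P_theta = np.array([[0.8, 0.1],     # assumed asymptotic covariance P_theta
                    [0.1, 0.5]])

# Draw M estimates theta_hat ~ N(theta0, P_theta/N) and count how often the
# quadratic form N (theta_hat - theta0)^T P^{-1} (theta_hat - theta0) stays
# below c, i.e. how often theta0 lies inside the ellipsoid around theta_hat.
M = 20000
L = np.linalg.cholesky(P_theta / N)
draws = theta0 + (L @ rng.standard_normal((n, M))).T
Pinv = np.linalg.inv(P_theta)
d = draws - theta0
quad = N * np.einsum('ij,jk,ik->i', d, Pinv, d)   # ~ chi^2(2)
coverage = np.mean(quad < c)        # empirical coverage, close to alpha
```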
Uncertainty regions for frequency responses of estimated models
The identified parameter vector θ̂_N is a random variable distributed as θ̂_N ∼ As N(θ0, P_θ/N). Consequently, the identified models (frequency responses) G(e^{iω}, θ̂_N) (and H(e^{iω}, θ̂_N)) are also random variables:
• G(e^{iω}, θ̂_N) is an (asymptotically) unbiased estimate of G(e^{iω}, θ0);
• the variance of G(e^{iω}, θ̂_N) is defined in the frequency domain as:

cov(G(e^{iω}, θ̂_N)) := E |G(e^{iω}, θ̂_N) − G(e^{iω}, θ0)|²
Covariance of the frequency response
cov{G(e^{iω}, θ̂_N)} = E |G(e^{iω}, θ̂_N) − G(e^{iω}, θ0)|²

is obtained by

cov{G(e^{iω}, θ̂_N)} ≈ (∂G(e^{iω}, θ)/∂θ |_{θ0})^* · (P_θ/N) · (∂G(e^{iω}, θ)/∂θ |_{θ0})

which is a first-order Taylor approximation that is exact for models that are linear in the parameters ((·)^* is the complex conjugate transpose).

cov{G(e^{iω}, θ̂_N)} can be estimated by replacing P_θ by P̂_θ and θ0 by θ̂_N.
Proof of the expression for cov(G(e^{iω}, θ̂_N))

First-order (Taylor) approximation:

G(e^{iω}, θ̂_N) ≈ G(e^{iω}, θ0) + (θ̂_N − θ0)^T Λ_G(e^{iω}, θ0)

with Λ_G(e^{iω}, θ0) = ∂G(e^{iω}, θ)/∂θ |_{θ0}. Consequently:

|G(e^{iω}, θ̂_N) − G(e^{iω}, θ0)|² = (G(e^{iω}, θ̂_N) − G(e^{iω}, θ0))^* (G(e^{iω}, θ̂_N) − G(e^{iω}, θ0))
  ≈ Λ_G(e^{iω}, θ0)^* (θ̂_N − θ0)(θ̂_N − θ0)^T Λ_G(e^{iω}, θ0)

so that

E(·) = Λ_G(e^{iω}, θ0)^* · (P_θ/N) · Λ_G(e^{iω}, θ0)
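For a model that is linear in the parameters this formula is exact. The sketch below evaluates it for a hypothetical FIR(2) model with an assumed parameter covariance (both are illustrative choices, not from the slides):

```python
import numpy as np

# FIR(2) model G(q, theta) = theta_1 q^{-1} + theta_2 q^{-2}:
# the gradient Lambda_G(e^{iw}) = dG/dtheta = [e^{-iw}, e^{-2iw}]^T,
# so cov{G} = Lambda^* (P_theta/N) Lambda holds exactly here.
P_theta = np.array([[0.30, 0.05],    # assumed parameter covariance
                    [0.05, 0.20]])
N = 500

def cov_G(w):
    """Frequency-domain variance of G(e^{iw}, theta_hat_N)."""
    Lam = np.array([np.exp(-1j * w), np.exp(-2j * w)])
    return np.real(np.conj(Lam) @ (P_theta / N) @ Lam)

w_grid = np.linspace(0, np.pi, 5)
cov_vals = np.array([cov_G(w) for w in w_grid])
# At w = 0 all entries of P_theta/N add up: cov_G(0) = 0.6/500
```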
Observation: the larger N and/or the larger the power of u(t), the smaller cov(G(e^{iω}, θ̂_N)), a direct consequence of the fact that P_θ/N has this property.
Separate bounds for the amplitude and phase of G(e^{iω}, θ̂_N)

Define

f_{a,ω}(θ) = |G(e^{iω}, θ)|
f_{p,ω}(θ) = arg{G(e^{iω}, θ)};

then covariance information on f_{a,ω} and f_{p,ω} is obtained from the first-order approximations

Cov{f_{a,ω}(θ̂_N)} ≈ (∂f_{a,ω}(θ)/∂θ |_{θ=θ̂_N})^T · (P_θ/N) · (∂f_{a,ω}(θ)/∂θ |_{θ=θ̂_N})
Cov{f_{p,ω}(θ̂_N)} ≈ (∂f_{p,ω}(θ)/∂θ |_{θ=θ̂_N})^T · (P_θ/N) · (∂f_{p,ω}(θ)/∂θ |_{θ=θ̂_N})

This is implemented in Matlab's System Identification Toolbox (3σ bounds).
[Figure: Bode plot (amplitude and phase versus frequency) of an estimated model with uncertainty bounds.]
Remarks
• Estimated uncertainty bounds are reliable only if the models are correct (validated), i.e. S ∈ M or G0 ∈ G.
• If the uncertainty bound for G(e^{iω}, θ̂_N) is too large in a particular frequency range, then (a) redo the experiment with increased Φ_u(ω) at those frequencies, or (b) increase the number of data points (the length of the experiment).
• A related analysis can be developed for H(e^{iω}, θ̂_N).
Asymptotic-in-order covariance of the frequency response
Consider the situation as in the convergence result. Let S ∈ M, and let G(q, θ) and H(q, θ) have McMillan degree n. Denote T(q, θ) := [G(q, θ) H(q, θ)]. If n, N → ∞ and n/N → 0, then

Cov(T̂_N(e^{iω})) ∼ (n/N) Φ_v(ω) [Φ_χ(ω)]^{-1}

with χ(t) = [u(t) e(t)]^T.
If the system operates in open loop, u and e are uncorrelated, and the result simplifies to
Cov(Ĝ_N(e^{iω})) ∼ (n/N) · Φ_v(ω)/Φ_u(ω)
Cov(Ĥ_N(e^{iω})) ∼ (n/N) · Φ_v(ω)/σ_e² = (n/N) · |H0(e^{iω})|²
But note the double limit, n, N → ∞.
Summary - discussion
• Comprehensive PE theory
• Asymptotic results in N: convergence, consistency, covariance matrix
• Variance expressions leading to uncertainty sets
• Guiding principle: find the "best" model in a given structured model set (of a particular order)
• Relation with Maximum Likelihood estimation to be shown next, as well as model order selection