Introduction Model sets Identification criterion Convergence Consistency Distribution

System Identification
Lecture 2: Prediction error method

Paul Van den Hof

Control Systems Group, Eindhoven University of Technology

Dutch Institute of Systems and Control (DISC)

Control Systems Group, Eindhoven University of Technology; DISC System Identification 2018, Lecture 2: Prediction error method

Contents

Introduction

Model sets

Identification criterion

Convergence

Consistency

Distribution

Schedule

Apr 9    PVdH   Introduction; concepts
Apr 16   PVdH   Prediction error identification
Apr 23   PVdH   Asymptotic analysis
Apr 30   GB     Bayesian estimation; machine learning
May 7    JS     domain identification
May 14   JS     Nonlinear identification
May 28   XB     design
June 4   PVdH   Closed loops and dynamic networks


One-step-ahead prediction

For a dynamical model y(t) = G(q)u(t) + H(q)e(t), the one-step-ahead prediction ŷ(t|t−1) := E{y(t) | y^{t−1}, u^t} is given by

ŷ(t|t−1) = H⁻¹(q)G(q)u(t) + [1 − H⁻¹(q)]y(t)

At the same time:

y(t) = ŷ(t|t−1) + e(t)

The one-step-ahead predictor of a dynamic model predicts the output of the model up to the white-noise term e(t), called the innovation process.


One-step-ahead prediction is going to be used as a way to measure the deviation/distance between a data set {(y(t), u(t))}, t = 1, …, N, and a predictor model (G(q), H(q)). If the data has been generated by (G, H), then

y(t) − ŷ(t|t−1) = e(t)

If the data comes from another system, but we calculate the prediction on the basis of (G, H), then

y(t) − ŷ(t|t−1) = ε(t)

the one-step-ahead prediction error.

ε(t) = H⁻¹(q)[y(t) − G(q)u(t)]
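As a quick numerical illustration (not from the slides; all signals and coefficients are invented), take a first-order ARX model with G(q) = b1 q⁻¹/(1 + a1 q⁻¹) and H(q) = 1/(1 + a1 q⁻¹). Then the formula reduces to ε(t) = y(t) + a1 y(t−1) − b1 u(t−1), and ε recovers the innovation e exactly when the true (G, H) is used:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
a1, b1 = -0.7, 0.5            # example coefficients (stable: |a1| < 1)
u = rng.standard_normal(N)    # input signal
e = rng.standard_normal(N)    # white-noise innovation

# Data-generating ARX system: y(t) = -a1*y(t-1) + b1*u(t-1) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

# One-step-ahead prediction with the true model:
# yhat(t|t-1) = H^{-1} G u + (1 - H^{-1}) y = -a1*y(t-1) + b1*u(t-1)
yhat = np.zeros(N)
yhat[1:] = -a1 * y[:-1] + b1 * u[:-1]

eps = y - yhat                # one-step-ahead prediction error
```

When the predictor matches the data-generating system, eps coincides with the innovation e (apart from the initial sample).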


[Block diagram: the presumed data-generating system y = G u + v with v = H e, followed by the predictor model built from G and H⁻¹; when the data actually come from (G, H), the output of the scheme is e(t).]


[Same block diagram, but with data generated by (G0, H0) while the predictor uses (G, H); the output of the scheme is now ε(t).]


ε(t) is going to serve as a basis for the identification criterion.


Black box model structures and model sets

Predictor model: (G(q), H(q))
Model set: M = {(G(q, θ), H(q, θ)), θ ∈ Θ ⊂ ℝ^d}

A parametrization is used to represent models by real-valued coefficients.

Examples of parametrizations:
• Coefficients in polynomial fractions, e.g. θ = (b1 a1 c1 d1)ᵀ with

G(q, θ) = b1 q⁻¹ / (1 + a1 q⁻¹);   H(q, θ) = (1 + c1 q⁻¹) / (1 + d1 q⁻¹)

• Coefficients in series expansions
• Matrix coefficients in state-space models


ARX model structure

G(q, θ) = B(q⁻¹, θ) / A(q⁻¹, θ);   H(q, θ) = 1 / A(q⁻¹, θ)

with

B(q⁻¹, θ) = q^{−nk} {b0 + b1 q⁻¹ + ··· + b_{nb−1} q^{−nb+1}}
A(q⁻¹, θ) = 1 + a1 q⁻¹ + ··· + a_{na} q^{−na}
θ = [a1 a2 ··· a_{na} b0 b1 ··· b_{nb−1}]ᵀ.

na and nb are the numbers of parameters in the A and B polynomials; nk is the number of time delays.

Predictor:

ŷ(t|t−1; θ) = B(q⁻¹, θ)u(t) + [1 − A(q⁻¹, θ)]y(t)


Model structures

Model structure       G(q, θ)                H(q, θ)
ARX                   B(q⁻¹,θ)/A(q⁻¹,θ)      1/A(q⁻¹,θ)
ARMAX                 B(q⁻¹,θ)/A(q⁻¹,θ)      C(q⁻¹,θ)/A(q⁻¹,θ)
OE - Output Error     B(q⁻¹,θ)/F(q⁻¹,θ)      1
FIR                   B(q⁻¹,θ)               1
BJ - Box-Jenkins      B(q⁻¹,θ)/F(q⁻¹,θ)      C(q⁻¹,θ)/D(q⁻¹,θ)


Properties of model structures

• Linearity-in-the-parameters (ARX):

ŷ(t|t−1; θ) = B(q⁻¹)u(t) + (1 − A(q⁻¹))y(t) = φᵀ(t)θ

is a linear function of θ. ⇒ Important computational advantages.

• Independent parametrization of G(q, θ) and H(q, θ): there are no common parameters in G and H. ⇒ Advantages for independent identification of G and H.

Both properties to be utilized later on.


System in the model set

Data-generating system: S : [G0, H0]
Model set: M : {[G(q, θ), H(q, θ)], θ ∈ Θ ⊂ ℝ^d}.

S ∈ M denotes that the data generating system can exactly be represented within M, i.e. ∃θ0 ∈ Θ such that

G(q, θ0) = G0(q)

H(q, θ0) = H0(q)

The notion G0 ∈ G, with G = {G(q, θ), θ ∈ Θ ⊂ ℝ^d}, denotes that only G0 can be represented exactly, i.e.

G(q, θ0) = G0(q)


Identification criterion

Consider the data-generating system:

y(t) = G0(q)u(t) + H0(q)e(t)

and the parametrized model (G(q, θ), H(q, θ)).

Denote V̄(θ) := Ē ε²(t, θ). Then V̄(θ) ≥ σe², with equality for θ̂ if

G(q, θˆ) = G0(q)

H(q, θˆ) = H0(q)

Uniqueness of this solution requires some more conditions, to be specified later.


Reasoning:

ε(t, θ) = [G0(q) − G(q, θ)]/H(q, θ) · u(t) + H0(q)/H(q, θ) · e(t)

u and e are uncorrelated; the first term becomes 0 for G(q, θ̂) = G0(q).

H0(q)/H(q, θ) · e(t) = [1 + γ1(θ)q⁻¹ + γ2(θ)q⁻² + ···] e(t)

Ē ε²(t, θ) = [1 + γ1(θ)² + γ2(θ)² + ···] σe²

is minimal for γ1(θ̂) = γ2(θ̂) = ··· = 0.
⇒ The cost function minimum σe² is achieved for

G(q, θ̂) = G0 and H(q, θ̂) = H0.


Identification criterion

The power of the prediction error is estimated from a data sequence through the quadratic criterion:

V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε²(t, θ)

Parameter estimation through minimizing V_N:

θ̂N = arg min_θ V_N(θ, Z^N)

This optimization problem may or may not be convex.


Optimization: convex vs. non-convex

[Figure: two sketches of V_N(θ) versus θ: a convex criterion with a single minimum, and a non-convex criterion with local minima.]

The optimization is convex if the criterion is quadratic in θ. In that situation, the solution can be found analytically.


Linear regression

If the predictor is linear-in-the-parameters (ARX, FIR):

ŷ(t|t−1; θ) = φᵀ(t)θ

then

V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} [y(t) − φᵀ(t)θ]²

and an analytical solution is obtained:

θ̂N = [ (1/N) Σ_{t=1}^{N} φ(t)φᵀ(t) ]⁻¹ · (1/N) Σ_{t=1}^{N} φ(t)y(t)
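A minimal numerical sketch of this analytical solution on simulated first-order ARX data (coefficients and noise level invented for the example); the regressor here is φ(t) = [−y(t−1) u(t−1)]ᵀ:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
a1, b1 = -0.7, 0.5                        # true parameters theta0 = [a1, b1]
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)

# Data-generating system: y(t) = -a1*y(t-1) + b1*u(t-1) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

# Regressor phi(t) = [-y(t-1), u(t-1)]^T so that yhat(t|t-1) = phi(t)^T theta
Phi = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]

# theta_N = [1/N sum phi phi^T]^{-1} * [1/N sum phi y]
theta_hat = np.linalg.solve(Phi.T @ Phi / N, Phi.T @ Y / N)
```

With this much data and little noise, theta_hat lands very close to the true [a1, b1].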


Assumptions for analysis:
• Data has been generated by a linear system, possibly operating in a (stabilized) feedback situation, with bounded excitation signals, and v having bounded moments of order 4;
• M is uniformly stable (→ the predictor filters and their derivatives w.r.t. θ are uniformly stable).

Convergence result
Under the above assumptions, θ̂N = arg min_θ V_N(θ, Z^N) converges for N → ∞ with probability 1 to the minimizing argument θ* of V̄(θ) := Ē ε²(t, θ).


Differently stated:

θ̂N → θ* := arg min_θ Ē ε²(t, θ)   w.p. 1 for N → ∞.

The estimate converges to the best possible predictor (in the sense of Ē ε²) within the model set M, and becomes independent of the particular noise sequence (its variance goes to 0).


Persistence of excitation

Definition - persistently exciting input
A quasi-stationary signal u is persistently exciting of order n if the spectral density Φu(ω) is different from 0 in at least n points in the interval (−π, π].

Example
The signal u(t) = sin(ω0 t) is persistently exciting of order 2 (Φu has contributions at ±ω0; 2 degrees of freedom: amplitude and phase).


Equivalent formulation

Proposition
A quasi-stationary signal u is persistently exciting of order n if and only if the (Toeplitz) matrix

R̄n := [ Ru(0)     Ru(1)   ···   Ru(n−1)
        Ru(1)     Ru(0)   ···   Ru(n−2)
          ⋮         ⋮      ⋱       ⋮
        Ru(n−1)    ···    Ru(1)  Ru(0)  ]

with Ru(τ) := Ē{u(t)u(t−τ)}, is non-singular.

Example: A white noise process (Ru(τ) = δ(τ)) is persistently exciting of any order. Note: R̄n = In.
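This rank test is easy to run on sampled signals. Below is a small sketch (the helper pe_order and its tolerance are my own choices, not from the slides) that estimates R̄n from data and returns the largest order for which it is numerically non-singular:

```python
import numpy as np

def pe_order(u, n_max, tol=1e-3):
    """Largest n <= n_max for which the sample Toeplitz matrix R̄_n is
    numerically non-singular. tol is an absolute singular-value threshold,
    chosen here for roughly unit-variance signals."""
    N = len(u)
    # Sample autocovariances R_u(0), ..., R_u(n_max - 1)
    R = np.array([np.mean(u[tau:] * u[:N - tau]) for tau in range(n_max)])
    order = 0
    for n in range(1, n_max + 1):
        Rbar = np.array([[R[abs(i - j)] for j in range(n)] for i in range(n)])
        if np.linalg.matrix_rank(Rbar, tol=tol) < n:
            break
        order = n
    return order

rng = np.random.default_rng(2)
white = rng.standard_normal(20000)       # white noise: p.e. of any order
sine = np.sin(0.5 * np.arange(20000))    # single sinusoid: p.e. of order 2
```

On these signals, pe_order(white, 6) reaches the tested maximum 6, while pe_order(sine, 6) stops at 2, matching the sinusoid example above.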

Example: block signal

[Figure: periodic block signal alternating between +1 and −1 every three samples, plotted for t = 0, …, 16.]

Ru(0) = 1,   Ru(1) = 1/3,   Ru(2) = −1/3,   Ru(3) = −1,
Ru(4) = −1/3,   Ru(5) = 1/3,   Ru(6) = 1,   etcetera.


 1 1  1 3 − 3 −1 1 1 1 ¯  3 1 3 − 3  R4 =  1 1 1   − 3 3 1 3  1 1 −1 − 3 3 1

R¯3 is regular, R¯4 is singular so u is p.e. of order 3
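These numbers are easy to verify numerically using one period u = (1, 1, 1, −1, −1, −1) of the block signal and cyclic autocorrelations (a sketch, not part of the slides):

```python
import numpy as np

u = np.array([1., 1., 1., -1., -1., -1.])   # one period of the block signal
P = len(u)

# Cyclic autocorrelation R_u(tau), averaged over one period
R = np.array([np.mean(u * np.roll(u, tau)) for tau in range(P)])
# R reproduces the slide's values: 1, 1/3, -1/3, -1, -1/3, 1/3

Rbar3 = np.array([[R[abs(i - j)] for j in range(3)] for i in range(3)])
Rbar4 = np.array([[R[abs(i - j)] for j in range(4)] for i in range(4)])
```

matrix_rank(Rbar3) is 3 (regular) while matrix_rank(Rbar4) is also 3, i.e. R̄4 is singular (its last row is minus its first), confirming p.e. of order 3.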


A step signal:

u(t) = 0 for t < 0;   u(t) = c for t ≥ 0

is actually a transient signal, and therefore does not fit well into the "persistent signals" framework. Although a step response reveals all dynamics of an LTI system, a step signal is p.e. of order 1, having a spectral contribution only at ω = 0. So it persistently excites only the frequency 0.


Consistency result for S ∈ M

Let data be generated by a system (G0, H0), and consider a model set M with G(q, θ) parametrized according to

G(q, θ) = q^{−nk} · (b0 + b1 q⁻¹ + ··· + b_{nb−1} q^{−nb+1}) / (1 + f1 q⁻¹ + ··· + f_{nf} q^{−nf}).

If S ∈ M and u is persistently exciting of order ≥ nf + nb, then

G(e^{iω}, θ*) = G0(e^{iω})
H(e^{iω}, θ*) = H0(e^{iω}),   −π ≤ ω ≤ π.

Note that nb and nf are the numbers of parameters in the numerator and denominator of G(q, θ).


Consistency result for G0 ∈ G

Consider the same situation as on the previous slide. If

• G0 ∈ G, and
• G and H are independently parametrized in M, and
• u is persistently exciting of order ≥ nf + nb,

then

G(e^{iω}, θ*) = G0(e^{iω}),   −π ≤ ω ≤ π.

G0 can be estimated consistently, irrespective of whether H0 can be modeled exactly.

Independent parametrization

Model structure    G(q, θ)    H(q, θ)
OE                 B/F        1
FIR                B          1
BJ                 B/F        C/D

Not ARX or ARMAX, because of the common denominator in G and H.


Asymptotic distribution

The asymptotic pdf of the estimate θ̂N is specified as follows:

Asymptotic distribution
Consider the data assumptions as present in the convergence result. Then for N → ∞,

√N (θ̂N − θ*) → N(0, Pθ)

where

Pθ = [V̄″(θ*)]⁻¹ Q [V̄″(θ*)]⁻¹
Q = lim_{N→∞} N · Ē{[V′_N(θ*, Z^N)] [V′_N(θ*, Z^N)]ᵀ}

while (·)′ and (·)″ respectively denote first and second derivative with respect to θ.


Justification of normal distribution
As an example: a linear regression structure with S ∈ M, leading to

θ̂N − θ0 = [ (1/N) Σ_t φ(t)φᵀ(t) ]⁻¹ · (1/N) Σ_t φ(t)e(t)

with e a white noise process. Then, for N → ∞, the above expression becomes a weighted sum of an infinite number of random variables with a fixed distribution. According to the central limit theorem, this weighted sum will get a Gaussian pdf.


Pθ/N has the interpretation of the covariance matrix of the estimate:

Pθ/N = E[(θ̂N − θ0)(θ̂N − θ0)ᵀ],

quantifying the variability of the estimate.

Covariance matrix in case of S ∈ M
For the situation θ* = θ0 (S ∈ M) it holds that

Pθ = σe² · [ Ē ψ(t, θ0)ψᵀ(t, θ0) ]⁻¹

ψ(t, θ0) = ∂ŷ(t|t−1; θ)/∂θ |_{θ=θ0} = −∂ε(t, θ)/∂θ |_{θ=θ0}


Justification of the expression
In the linear regression situation, S ∈ M, we have

θ̂N − θ0 = [ (1/N) Σ_t φ(t)φᵀ(t) ]⁻¹ · (1/N) Σ_t φ(t)e(t)

so that Ē θ̃θ̃ᵀ can be written as

Ē{ [ (1/N) Σ_t φ(t)φᵀ(t) ]⁻¹ · (1/N) Σ_t φ(t)e(t) · (1/N) Σ_t φᵀ(t)e(t) · [ (1/N) Σ_t φ(t)φᵀ(t) ]⁻¹ }


With φ(t) deterministic and E e²(t) = σe², this gives

Pθ = σe² · [ (1/N) Σ_t φ(t)φᵀ(t) ]⁻¹,   so that Ē θ̃θ̃ᵀ = Pθ/N.

In linear regression: ŷ(t|t−1; θ) = φᵀ(t)θ, so that

φ(t) = ∂ŷ(t|t−1; θ)/∂θ.

In general model structures, the role of φ(t) is taken over by

ψ(t, θ0) = ∂ŷ(t|t−1; θ)/∂θ |_{θ=θ0}.

(end of justification)


Pθ/N = E(θ̂N − θ0)(θ̂N − θ0)ᵀ can be estimated from the data and θ̂N as:

P̂θ = σ̂e² · [ (1/N) Σ_{t=1}^{N} ψ(t, θ̂N)ψᵀ(t, θ̂N) ]⁻¹
σ̂e² = (1/N) Σ_{t=1}^{N} ε²(t, θ̂N)

with σ̂e² an estimate of σe².
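In the linear-regression (ARX) case, ψ(t, θ̂N) is simply the regressor φ(t), so the estimate takes a few lines. A sketch on a hypothetical first-order ARX data set (all values invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
theta0 = np.array([-0.7, 0.5])              # true [a1, b1]
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)            # sigma_e = 0.2

y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta0[0] * y[t - 1] + theta0[1] * u[t - 1] + e[t]

Phi = np.column_stack([-y[:-1], u[:-1]])    # psi(t) = phi(t) for ARX
Y = y[1:]
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# Residual variance and covariance estimate, per the formulas above
eps = Y - Phi @ theta_hat
M = len(Y)
sigma2_hat = np.mean(eps**2)
P_hat = sigma2_hat * np.linalg.inv(Phi.T @ Phi / M)   # estimate of P_theta
cov_hat = P_hat / M                                   # ~ covariance of theta_hat
```

sigma2_hat lands near the true noise variance 0.04, and the square roots of the diagonal of cov_hat give the standard deviations of the parameter estimates.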


Covariance matrix in case of G0 ∈ G
Let M = {(G(q, ρ), H(q, η)), θ = (ρᵀ ηᵀ)ᵀ ∈ Θ} and θ* = (ρ0ᵀ η*ᵀ)ᵀ. Then

√N (ρ̂N − ρ0) ∈ As N(0, Pρ)

with

Pρ = σe² [Ē ψρ(t)ψρᵀ(t)]⁻¹ [Ē ψ̃(t)ψ̃ᵀ(t)] [Ē ψρ(t)ψρᵀ(t)]⁻¹

ψρ(t) = H⁻¹(q, η*) · ∂G(q, ρ)/∂ρ |_{ρ=ρ0} · u(t)

ψ̃(t) = Σ_{i=0}^{∞} f_i ψρ(t + i);   F(z) := H0(z)/H(z, η*) = Σ_{i=0}^{∞} f_i z⁻ⁱ.


Parameter uncertainty regions

The asymptotic normal distribution can be used to quantify parameter uncertainty regions. Start with:

√N (θ̂N − θ0) → N(0, Pθ)

Let Pθ⁻¹ be decomposed as Pθ⁻¹ = Rθᵀ Rθ; then

√N Rθ (θ̂N − θ0) → N(0, In)

i.e. a vector of n standard (independent) normally distributed random variables.


If x ∈ N(0, In), then for the sum of the squared variables:

xᵀx ∈ χ²(n)

i.e. it follows a (scalar-valued) χ² distribution. This implies that the sum of squared variables

N (θ̂N − θ0)ᵀ Pθ⁻¹ (θ̂N − θ0) → χ²(n).

[Figure: example χ²(n) probability density function; cχ(α, n) is the value for which the area under the density over [0, cχ(α, n)] equals α.]


Note that the contour lines

N (θ̂N − θ0)ᵀ Pθ⁻¹ (θ̂N − θ0) = c

are ellipsoidal contour lines of the multivariable Gaussian distribution:

[Figure: ellipse centered at (θ0(1), θ0(2)), with principal axes along the eigenvectors w1, w2 of Pθ and axis lengths proportional to √(c λi); Pθ has eigenvalues λi and eigenvectors wi.]

while the level of probability related to the event

N (θ − θ0)ᵀ Pθ⁻¹ (θ − θ0) < c

is determined by the χ²(n)-distribution.


Denote with cχ(α, n) the value that satisfies

Pr(x ≤ cχ(α, n)) = α   for x ∈ χ²(n).

Then θ̂N ∈ D_{θ0} with probability α, with

D_{θ0} = {θ | (θ − θ0)ᵀ Pθ⁻¹ (θ − θ0) < cχ(α, n)/N}

an ellipsoid centered at θ0.



Alternatively, denote:

D_{θ̂N} = {θ | (θ − θ̂N)ᵀ Pθ⁻¹ (θ − θ̂N) < cχ(α, n)/N}

an ellipsoid centered at θ̂N. Then it can simply be verified that

θ̂N ∈ D_{θ0} ⇔ θ0 ∈ D_{θ̂N}

so that θ0 ∈ D_{θ̂N} with probability α.


Uncertainty regions for frequency responses of estimated models

The identified parameter vector θ̂N is a random variable distributed as θ̂N ∼ As N(θ0, Pθ/N) ⇒ the identified models (frequency responses) G(e^{iω}, θ̂N) (and H(e^{iω}, θ̂N)) are also random variables:
• G(e^{iω}, θ̂N) is an (asymptotically) unbiased estimate of G(e^{iω}, θ0);
• the variance of G(e^{iω}, θ̂N) is defined as:

cov(G(e^{iω}, θ̂N)) := E{ |G(e^{iω}, θ̂N) − G(e^{iω}, θ0)|² }


Covariance of the frequency response

cov{G(e^{iω}, θ̂N)} = E{ |G(e^{iω}, θ̂N) − G(e^{iω}, θ0)|² } is obtained by

cov{G(e^{iω}, θ̂N)} ∼ (∂G(e^{iω}, θ)/∂θ |_{θ0})* · (Pθ/N) · (∂G(e^{iω}, θ)/∂θ |_{θ0})

which is a first-order Taylor approximation that is exact for models that are linear in the parameters ((·)* is the complex conjugate transpose).

cov{G(e^{iω}, θ̂N)} can be estimated by replacing Pθ by P̂θ, and θ0 by θ̂N.
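For a model that is linear in the parameters the expression is exact; a sketch for a hypothetical 2-tap FIR model G(q, θ) = b0 + b1 q⁻¹, whose gradient is ΛG(e^{iω}, θ) = [1, e^{−iω}]ᵀ (covariance values invented):

```python
import numpy as np

# Invented parameter covariance for theta = [b0, b1] of an FIR model
P_theta = np.array([[0.04, 0.01],
                    [0.01, 0.03]])
N = 500
w = 0.3                                   # frequency in rad/sample

# Gradient of G(e^{iw}, theta) = b0 + b1*e^{-iw} with respect to theta
Lam = np.array([1.0, np.exp(-1j * w)])

# cov{G} = Lam^* (P_theta / N) Lam : a real, nonnegative scalar
cov_G = np.real(np.conj(Lam) @ (P_theta / N) @ Lam)
```

Here cov_G equals (P00 + P11 + 2 P01 cos ω)/N, so the quadratic form can be cross-checked by hand.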


Proof of the expression for cov(G(e^{iω}, θ̂N))
First-order (Taylor) approximation:

G(e^{iω}, θ̂N) ≈ G(e^{iω}, θ0) + (θ̂N − θ0)ᵀ ΛG(e^{iω}, θ0)

with ΛG(e^{iω}, θ0) = ∂G(e^{iω}, θ)/∂θ |_{θ0}.

Consequently:

|G(e^{iω}, θ̂N) − G(e^{iω}, θ0)|² = (G(e^{iω}, θ̂N) − G(e^{iω}, θ0))* (G(e^{iω}, θ̂N) − G(e^{iω}, θ0))
≈ ΛG(e^{iω}, θ0)* (θ̂N − θ0)(θ̂N − θ0)ᵀ ΛG(e^{iω}, θ0)

E(·) = ΛG(e^{iω}, θ0)* · (Pθ/N) · ΛG(e^{iω}, θ0)


Observation
The larger N and/or the larger the power of u(t), the smaller cov(G(e^{iω}, θ̂N)): a direct consequence of the fact that Pθ/N has this property.


Separate bounds for amplitude and phase of G(e^{iω}, θ̂N)

Define

f_{a,ω}(θ) = |G(e^{iω}, θ)|,
f_{p,ω}(θ) = arg{G(e^{iω}, θ)};

then covariance information on f_{a,ω} and f_{p,ω} is obtained from the first-order approximations

Cov{f_{a,ω}(θ̂N)} ≈ (∂f_{a,ω}(θ)/∂θ)ᵀ (Pθ/N) (∂f_{a,ω}(θ)/∂θ) |_{θ=θ̂N}
Cov{f_{p,ω}(θ̂N)} ≈ (∂f_{p,ω}(θ)/∂θ)ᵀ (Pθ/N) (∂f_{p,ω}(θ)/∂θ) |_{θ=θ̂N}

This is implemented in Matlab's System Identification Toolbox (3σ bounds).


[Figure: Bode plot (amplitude and phase versus frequency) of an estimated frequency response with uncertainty bounds.]


Remarks
• Estimated uncertainty bounds are reliable only if the models are correct (validated), i.e. S ∈ M or G0 ∈ G.
• If the uncertainty bound for G(e^{iω}, θ̂N) is too large in a particular frequency range: (a) redo the experiment with increased Φu(ω) at those frequencies, or (b) increase the number of data (length of the experiment).
• A related analysis can be developed for H(e^{iω}, θ̂N).


Asymptotic-in-order covariance of frequency response
Consider the situation as in the convergence result. Let S ∈ M, and let G(q, θ) and H(q, θ) have McMillan degree n. Denote T(q, θ) := [G(q, θ) H(q, θ)]. If n, N → ∞ and n/N → 0, then

Cov(T̂N(e^{iω})) ∼ (n/N) · Φv(ω) [Φχ(ω)]⁻¹

with χ(t) = [u(t) e(t)]ᵀ.


If the system operates in open loop, u and e are uncorrelated, and the result simplifies to

Cov(ĜN(e^{iω})) ∼ (n/N) · Φv(ω)/Φu(ω)
Cov(ĤN(e^{iω})) ∼ (n/N) · Φv(ω)/σe² = (n/N) · |H0(e^{iω})|²

But note the double limit, n, N → ∞.


Summary - discussion

• Comprehensive PE theory
• Asymptotic results in N: convergence, consistency, covariance matrix
• Variance expressions leading to uncertainty sets
• Principal line of reasoning: find the "best" model in a given structured model set (of particular order)
• Relation with Maximum Likelihood estimation to be shown (next), as well as model order selection
