Design of experiments in nonlinear models

Luc Pronzato

Université Côte d’Azur, CNRS, I3S, France

Outline

1 DoE: objectives & examples

2 DoE based on asymptotic normality

3 Construction of (locally) optimal designs

4 Problems with nonlinear models

5 Small-sample properties

6 Nonlocal optimum design

Luc Pronzato (CNRS), Design of experiments in nonlinear models, La Réunion, nov. 2016

1 DoE: objectives & examples

A/ Parameter estimation

Ex1: Weighing with a two-pan balance
Determine the weights of 8 objects, with masses m_i, i = 1,…,8, and i.i.d. errors ε_i ∼ N(0, σ²)

Method a: weigh each object successively → y_i = m_i + ε_i, i = 1,…,8
→ estimated weights: m̂_i = y_i ∼ N(m_i, σ²)
Repeat 8 times and average the results: m̂_i ∼ N(m_i, σ²/8) (with 64 observations...)

Method b: more sophisticated...

y_1 = m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8 + ε_1
y_2 = m_1 + m_2 + m_3 − m_4 − m_5 − m_6 − m_7 + m_8 + ε_2
y_3 = m_1 − m_2 − m_3 + m_4 + m_5 − m_6 − m_7 + m_8 + ε_3
y_4 = m_1 − m_2 − m_3 − m_4 − m_5 + m_6 + m_7 + m_8 + ε_4
y_5 = −m_1 + m_2 − m_3 + m_4 − m_5 + m_6 − m_7 + m_8 + ε_5
y_6 = −m_1 + m_2 − m_3 − m_4 + m_5 − m_6 + m_7 + m_8 + ε_6
y_7 = −m_1 − m_2 + m_3 + m_4 − m_5 − m_6 + m_7 + m_8 + ε_7
y_8 = −m_1 − m_2 + m_3 − m_4 + m_5 + m_6 − m_7 + m_8 + ε_8

→ m̂_8 = (1/8) Σ_{i=1}^{8} y_i = m_8 + (ε_1 + ε_2 + ε_3 + ε_4 + ε_5 + ε_6 + ε_7 + ε_8)/8 ∼ N(m_8, σ²/8) (idem for m_j, j ≤ 7)

« 8 observations only, against 64 with method a!

Here, selection of a good design = a combinatorial problem:

y_k = Σ_{i=1}^{8} f_{ki} m_i + ε_k = f_k^T m + ε_k
(e.g., in Method b, f_2 = [1 1 1 −1 −1 −1 −1 1]^T)

y = F m + ε, with
method a: F_a = I_8
method b: F_b = 8×8 Hadamard matrix, F_b^T F_b = 8 I_8

LS estimator: m̂ = arg min_m Σ_{k=1}^{n} [y_k − f_k^T m]² = (Σ_{k=1}^{n} f_k f_k^T)^{-1} Σ_{k=1}^{n} y_k f_k = (F^T F)^{-1} F^T y

⇒ choose the f_k's such that M_n = (1/n) Σ_{k=1}^{n} f_k f_k^T = (1/n) F^T F is nonsingular

E{m̂} = m (no bias)
E{(m̂ − m)(m̂ − m)^T} = (σ²/n) M_n^{-1}

⇒ minimize a scalar function of M_n^{-1}

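The two weighing schemes can be compared directly from Var(m̂) = σ²(F^T F)^{-1}. A minimal numpy sketch, using a Sylvester-built 8×8 Hadamard matrix rather than the exact matrix of Method b (any 8×8 Hadamard design achieves the same variances; the variable names are illustrative):

```python
import numpy as np

# Sylvester construction of an 8x8 Hadamard matrix (entries +/-1).
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
F_b = np.kron(np.kron(H2, H2), H2)       # satisfies F_b^T F_b = 8 I_8
F_a = np.eye(8)                          # method a: one object per weighing

sigma2 = 1.0                             # error variance (arbitrary units)

def ls_covariance(F, sigma2):
    """Covariance of the LS estimator m_hat = (F^T F)^{-1} F^T y."""
    return sigma2 * np.linalg.inv(F.T @ F)

var_a = np.diag(ls_covariance(F_a, sigma2))   # sigma^2 per object (single weighings)
var_b = np.diag(ls_covariance(F_b, sigma2))   # sigma^2 / 8 per object, from only 8 weighings
```

With equal numbers of observations, the Hadamard design divides every variance by 8, which is exactly the slide's point.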

In this particular situation: combinatorial problem (since f_{ki} ∈ {−1, 0, 1}) [Fisher 1925 …]

More generally, when the design variables (inputs) are real numbers, optimum design for parameter estimation is obtained by optimizing a scalar function of the (asymptotic) covariance matrix of the estimator

Ex2 [D’Argenio 1981]: two-compartment model in pharmacokinetics
A product is injected in blood (→ input u(t)); x_C(t) (product in blood) moves to another tissue → x_P(t)

→ Linear differential equations:

dx_C(t)/dt = −(K_EL + K_CP) x_C(t) + K_PC x_P(t) + u(t)
dx_P(t)/dt = K_CP x_C(t) − K_PC x_P(t)

We observe the concentration of the product in blood: y(t_i) = x_C(t_i)/V + ε(t_i), where the errors ε(t_i) are i.i.d. N(0, σ²) with σ = 0.2 µg/ml

There are 4 unknown parameters θ = (K_CP, K_PC, K_EL, V)
The profile of the input u(t) is given (fast infusion of 75 mg/min for 1 min, then 1.45 mg/min)
Simulated experiments with «true» parameter values θ̄ = (0.066 min^{-1}, 0.038 min^{-1}, 0.0242 min^{-1}, 30 l)
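A sketch of the simulated experiment: integrating the compartment ODEs at θ̄ with a hand-rolled fixed-step RK4 (the integrator and step size are arbitrary choices, not those of [D’Argenio 1981]) gives the noise-free response η(t, θ̄) = x_C(t)/V:

```python
import numpy as np

# Nominal («true») parameter values of the slide.
K_CP, K_PC, K_EL, V = 0.066, 0.038, 0.0242, 30.0     # min^-1, min^-1, min^-1, litres

def u(t):
    """Infusion profile: 75 mg/min during the first minute, then 1.45 mg/min."""
    return 75.0 if t < 1.0 else 1.45

def rhs(t, x):
    """Right-hand side of the two-compartment ODEs."""
    xC, xP = x
    return np.array([-(K_EL + K_CP) * xC + K_PC * xP + u(t),
                     K_CP * xC - K_PC * xP])

def rk4(x0, t_end, dt):
    """Fixed-step 4th-order Runge-Kutta integration on [0, t_end]."""
    ts = np.arange(0.0, t_end + dt, dt)
    xs = np.empty((ts.size, 2))
    x = np.array(x0, dtype=float)
    for i, t in enumerate(ts):
        xs[i] = x
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1)
        k3 = rhs(t + dt / 2, x + dt / 2 * k2)
        k4 = rhs(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return ts, xs

ts, xs = rk4([0.0, 0.0], t_end=720.0, dt=0.05)
conc = xs[:, 0] / V        # noise-free response eta(t) = x_C(t)/V, in mg/l
```

Adding i.i.d. N(0, 0.2²) noise at the chosen sampling times then reproduces one simulated data set.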

Experimental variables = sampling times t_i, 1 ≤ t_i ≤ 720 min

• «conventional» design: t = (5, 10, 30, 60, 120, 180, 360, 720) (in min)

• «optimal» design (for θ̄): t = (1, 1, 10, 10, 74, 74, 720, 720) (in min)
(assumes that independent measurements at the same time are possible)

→ 400 simulations → 400 sets of 8 observations each, for each design → 400 parameter estimates (LS) for each design → histograms of θ̂_i (and approximated marginals [Pázman & P 1996])

« The «optimal» design gives more precise estimation

[Figure: histograms of the 400 LS estimates under each design, for K̂_EL (K̄_EL = 0.0242), K̂_CP (K̄_CP = 0.066), K̂_PC (K̄_PC = 0.038) and V̂ (V̄ = 30).]


B/ Model discrimination

Ex3 [Box & Hill 1967]: chemical reaction A → B
2 design variables: x = (time t, temperature T); reaction of 1st, 2nd, 3rd or 4th order?
→ 4 candidate model structures:

η^(1)(x, θ_1) = exp[−θ_11 t exp(−θ_12/T)]
η^(2)(x, θ_2) = 1 / [1 + θ_21 t exp(−θ_22/T)]
η^(3)(x, θ_3) = 1 / [1 + 2 θ_31 t exp(−θ_32/T)]^{1/2}
η^(4)(x, θ_4) = 1 / [1 + 3 θ_41 t exp(−θ_42/T)]^{1/3}

Simulated experiment
Observations with the 2nd structure («true»): y(x_j) = η^(2)(x_j, θ̄_2) + ε_j, with
® θ̄_2 = (400, 5000)^T the «true» value (unknown) of the parameters in model 2
® ε_j i.i.d. N(0, σ²), σ = 0.05
Admissible experimental domain: 0 ≤ t ≤ 150, 450 ≤ T ≤ 600

Sequential design: after the observation of y(x_j), j = 1,…,k,
• estimate θ̂_i^k (LS) for i = 1, 2, 3, 4
• compute the posterior probability π_i(k) that model i is correct, for i = 1, 2, 3, 4

Initialization: π_i(0) = 1/4, i = 1,…,4, and x_1,…,x_4 are given

[Figure: posterior probabilities π_i(k) versus k, and design points x_k in the (t, T) domain; π_2 approaches 1, correctly identifying η^(2), and the design points concentrate on a few repeated locations.]

Design for discrimination is not considered in the following.

A simple sequential method for discriminating between two structures η^(1)(x, θ_1) and η^(2)(x, θ_2) [Atkinson & Fedorov 1975]:
• after observation of y(x_1),…,y(x_k), estimate θ̂_1^k and θ̂_2^k for both models
• place the next point x_{k+1} where [η^(1)(x, θ̂_1^k) − η^(2)(x, θ̂_2^k)]² is maximum
• k → k+1, repeat

If more than two models: estimate θ̂_i^k for all of them, place the next point using the two models with the best and second-best fit (see [Atkinson & Cox 1974; Hill 1978] for surveys)

2 DoE based on asymptotic normality

A/ Regression models

y_i = y(x_i) = η(x_i, θ) + ε_i

System: physical experimental device, experimental conditions x_i, i = 1, 2, …, n
Model(θ): mathematical equations, parameters θ = (θ_1, …, θ_p)^T (response η(x, θ) known explicitly, or the result of simulating ODEs or PDEs)
Criterion: similarity between y_i = y(x_i) and η(x_i, θ), i = 1, 2, …, n, e.g., J(θ) = (1/n) Σ_{i=1}^{n} [y_i − η(x_i, θ)]² (LS)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 14 / 73 Criterion: other criteria than LS can be used, e.g., 1 Pn J(θ) = n i=1 |yi − η(xi , θ)| (→ robust estimation), including Maximum-Likelihood (ML) estimation in more general settings 1 Pn (J(θ) = n i=1 log π(yi |θ) → max!) We always assume independent observations y(xi ) (independent εi in regression) — often much more difficult otherwise! Most of the following can be found in [P & Pázman: Design of Experiments in Nonlinear Models, Springer, 2013]

2 DoE based on asymptotic normality A/ Regression models

Remarks: Model(θ) also provides derivatives > ∂η(x, θ)/∂θ = (∂η(x, θ)/∂θ1, . . . , ∂η(x, θ)/∂θp) — plus higher-order derivatives if necessary— via simulation of sensitivity functions, or automatic differentiation (adjoint code)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 15 / 73 We always assume independent observations y(xi ) (independent εi in regression) — often much more difficult otherwise! Most of the following can be found in [P & Pázman: Design of Experiments in Nonlinear Models, Springer, 2013]

2 DoE based on asymptotic normality A/ Regression models

Remarks: Model(θ) also provides derivatives > ∂η(x, θ)/∂θ = (∂η(x, θ)/∂θ1, . . . , ∂η(x, θ)/∂θp) — plus higher-order derivatives if necessary— via simulation of sensitivity functions, or automatic differentiation (adjoint code) Criterion: other criteria than LS can be used, e.g., 1 Pn J(θ) = n i=1 |yi − η(xi , θ)| (→ robust estimation), including Maximum-Likelihood (ML) estimation in more general settings 1 Pn (J(θ) = n i=1 log π(yi |θ) → max!)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 15 / 73 Most of the following can be found in [P & Pázman: Design of Experiments in Nonlinear Models, Springer, 2013]

2 DoE based on asymptotic normality A/ Regression models

Remarks: Model(θ) also provides derivatives > ∂η(x, θ)/∂θ = (∂η(x, θ)/∂θ1, . . . , ∂η(x, θ)/∂θp) — plus higher-order derivatives if necessary— via simulation of sensitivity functions, or automatic differentiation (adjoint code) Criterion: other criteria than LS can be used, e.g., 1 Pn J(θ) = n i=1 |yi − η(xi , θ)| (→ robust estimation), including Maximum-Likelihood (ML) estimation in more general settings 1 Pn (J(θ) = n i=1 log π(yi |θ) → max!) We always assume independent observations y(xi ) (independent εi in regression) — often much more difficult otherwise!

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 15 / 73 2 DoE based on asymptotic normality A/ Regression models

Remarks:
• Model(θ) also provides the derivatives ∂η(x, θ)/∂θ = (∂η(x, θ)/∂θ_1, …, ∂η(x, θ)/∂θ_p)^T (plus higher-order derivatives if necessary), via simulation of sensitivity functions, or automatic differentiation (adjoint code)
• Other criteria than LS can be used, e.g., J(θ) = (1/n) Σ_{i=1}^{n} |y_i − η(x_i, θ)| (→ robust estimation), including Maximum-Likelihood (ML) estimation in more general settings (J(θ) = (1/n) Σ_{i=1}^{n} log π(y_i|θ) → max!)
• We always assume independent observations y(x_i) (independent ε_i in regression): things are often much more difficult otherwise!
• Most of the following can be found in [P & Pázman, Design of Experiments in Nonlinear Models, Springer, 2013]
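When sensitivity equations or an adjoint code are not available, the derivatives ∂η/∂θ can be approximated by central finite differences. A generic sketch with a hypothetical response (the model and all names are illustrative, not from the slides):

```python
import numpy as np

def eta(x, theta):
    """Hypothetical response for illustration: theta1 * exp(-theta2 * x)."""
    return theta[0] * np.exp(-theta[1] * x)

def sensitivities(eta, x, theta, h=1e-6):
    """Central finite-difference approximation of d eta / d theta at (x, theta)."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (eta(x, theta + e) - eta(x, theta - e)) / (2 * h)
    return g

g = sensitivities(eta, x=2.0, theta=[1.0, 0.5])
# exact gradient at this point: [exp(-1), -2 exp(-1)]
```

Central differences are O(h²)-accurate, which is usually enough for building information matrices, though sensitivity equations or automatic differentiation remain preferable.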


B/ LS estimation

θ̂^n = arg min_θ (1/n) Σ_{i=1}^{n} [y_i − η(x_i, θ)]²

Linear model: η(x, θ) = f^T(x)θ → θ̂^n = (F^T F)^{-1} F^T y, with y = (y_1,…,y_n)^T and F = (f(x_1),…,f(x_n))^T

⇒ choose the x_i such that M_n = (1/n) Σ_{k=1}^{n} f(x_k) f^T(x_k) = (1/n) F^T F has full rank (M_n = normalized information matrix)

Since y_i = f^T(x_i)θ̄ + ε_i for some θ̄ and E{ε_i} = 0 for all i, E{y} = Fθ̄ and E{θ̂^n} = θ̄
Also, Var(θ̂^n) = E{(θ̂^n − θ̄)(θ̂^n − θ̄)^T} = σ²(F^T F)^{-1} = (σ²/n) M_n^{-1} when the ε_i are i.i.d. with finite variance σ²
⇒ choose the x_i to minimize a scalar function of M_n^{-1} (see Example 1: weighing with a two-pan balance)


Nonlinear model: η(x, θ)

Under «standard» assumptions (θ ∈ Θ compact, η(x, θ) continuous in θ for all x, …) and for a suitable sequence (x_i):
θ̂^n → θ̄ a.s. as n → ∞ (strong consistency)

Moreover, under «standard» regularity assumptions (η(x, θ) twice continuously differentiable in θ for all x, …), for i.i.d. errors ε_i with finite variance σ² and a suitable sequence (x_i):
√n (θ̂^n − θ̄) →d N(0, σ² M^{-1}) as n → ∞ (asymptotic normality)

with M = lim_{n→∞} (1/n) Σ_{i=1}^{n} [∂η(x_i, θ)/∂θ]_θ̄ [∂η(x_i, θ)/∂θ^T]_θ̄ (information matrix)

⇒ choose the x_i (design) to minimize a scalar function of M^{-1}, or maximize a function Φ(M)
= the classical approach to DoE (see Example 2 with the two-compartment model)
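For a finite design, the limiting matrix M above is approximated by the average of the outer products of the sensitivity vectors. A minimal numpy sketch with a hypothetical exponential-saturation model and invented nominal values (none of this is from the slides):

```python
import numpy as np

def eta(x, theta):
    """Hypothetical nonlinear response: theta1 * (1 - exp(-theta2 * x))."""
    return theta[0] * (1.0 - np.exp(-theta[1] * x))

def grad_eta(x, theta):
    """Analytic sensitivities d eta(x, theta) / d theta."""
    a, b = theta
    return np.array([1.0 - np.exp(-b * x), a * x * np.exp(-b * x)])

def info_matrix(design, theta):
    """M = (1/n) sum_i g_i g_i^T, with g_i the sensitivity vector at x_i."""
    G = np.array([grad_eta(x, theta) for x in design])
    return G.T @ G / len(design)

theta0 = np.array([1.0, 0.7])        # nominal value used for the local linearization
design = [0.5, 1.0, 2.0, 4.0]
M = info_matrix(design, theta0)

sigma2 = 0.01
asympt_cov = sigma2 / len(design) * np.linalg.inv(M)   # approximate Var(theta_hat)
```

Because M depends on θ0, so does any criterion Φ(M): this is exactly the local-design issue raised later.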


Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 18 / 73 2 DoE based on asymptotic normality B/ LS estimation

Remarks: Weighted LS: suppose heteroscedastic errors 2 2 2 var{εi } = E{εi } = E{ε (xi )} = σ (xi ) ˆn 1 Pn 2 Weighted LS estimator θWLS minimizes JWLS (θ) = n i=1 w(xi )[ yi − η(xi , θ)] √ ˆn ¯ d Strong consistency and asymptotic normality n(θWLS − θ) → N (0, C) as n → ∞, where −1 −1 C = Ma MbMa and 1 Pn ∂η(xi ,θ) ∂η(xi ,θ) Ma = limn→∞ n i=1 w(xi ) ∂θ θ¯ ∂θ> θ¯ 1 Pn 2 2 ∂η(xi ,θ) ∂η(xi ,θ) Mb = limn→∞ n i=1 w (xi ) σ (xi ) ∂θ θ¯ ∂θ> θ¯ « C  M−1 1 Pn 1 ∂η(xi ,θ) ∂η(xi ,θ) M = limn→∞ 2 > n i=1 σ (xi ) ∂θ θ¯ ∂θ θ¯

−1 −2 « C = M when w(x) ∝ σ (x) =⇒ choose the best estimator, then the best design
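The sandwich formula C = M_a^{-1} M_b M_a^{-1} and the bound C ⪰ M^{-1} can be checked numerically. A sketch on a simple linear response f(x) = (1, x)^T with made-up heteroscedastic variances (all constants and names are illustrative):

```python
import numpy as np

def f(x):
    """Regressor of a simple linear response f(x)^T theta = theta1 + theta2 * x."""
    return np.array([1.0, x])

xs = np.linspace(0.1, 2.0, 20)
var = 0.05 * (1.0 + xs**2)                  # made-up variances sigma^2(x_i)

def sandwich_cov(w):
    """C = Ma^{-1} Mb Ma^{-1}, with Ma and Mb normalized by 1/n as on the slide."""
    Ma = sum(wi * np.outer(f(x), f(x)) for wi, x in zip(w, xs)) / len(xs)
    Mb = sum(wi**2 * s2 * np.outer(f(x), f(x)) for wi, s2, x in zip(w, var, xs)) / len(xs)
    Mai = np.linalg.inv(Ma)
    return Mai @ Mb @ Mai

C_ols = sandwich_cov(np.ones_like(xs))       # w(x) = 1 (ordinary LS)
C_opt = sandwich_cov(1.0 / var)              # w(x) proportional to sigma^{-2}(x)

M_bound = sum(np.outer(f(x), f(x)) / s2 for s2, x in zip(var, xs)) / len(xs)
# C_opt coincides with M_bound^{-1}, and C_ols - C_opt is positive semidefinite
```

With the optimal weights the sandwich collapses to M^{-1}, and the loss of ordinary LS shows up as a positive semidefinite gap C_ols − C_opt.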


Remarks (continued):
• One may also consider the case var{ε_i} = σ²(x_i, θ) (errors with parameterized variance)
« use two-stage LS: 1/ use w(x) ≡ 1 → θ̂_(1)^n; 2/ use w(x) = σ^{-2}(x, θ̂_(1)^n); or use iteratively-reweighted LS (i.e., go on with more stages), or penalized LS…
• Similar asymptotic results hold for ML estimation: √n (θ̂_ML^n − θ̄) →d N(0, σ² M_F^{-1}) as n → ∞, with M_F = Fisher information matrix
• Model(θ) = linear ODE, experimental design = system input u(t)
« (≈ simple) analytic expression for M
« optimal input design ⇔ optimal control problem (frequency domain → optimal combination of sinusoidal signals) [Goodwin & Payne 1977; Zarrop 1979; Ljung 1987; Walter & P 1994, 1997]
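A minimal sketch of the two-stage idea, on a linear mean with a variance parameterized through the same θ (the model, σ²(x, θ) = (0.1 f(x)^T θ)², and all constants are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear mean F theta, with error sd equal to 10% of the mean (illustrative choice).
xs = np.linspace(0.5, 3.0, 200)
F = np.column_stack([np.ones_like(xs), xs])
theta_bar = np.array([1.0, 2.0])
mean = F @ theta_bar
y = mean + 0.1 * mean * rng.standard_normal(xs.size)

def wls(F, y, w):
    """Weighted LS for a linear model: minimizes sum_i w_i (y_i - f_i^T theta)^2."""
    return np.linalg.solve(F.T @ (w[:, None] * F), F.T @ (w * y))

theta_1 = wls(F, y, np.ones_like(xs))            # stage 1: ordinary LS (w = 1)
w_2 = 1.0 / (0.1 * (F @ theta_1))**2             # plug-in weights sigma^{-2}(x, theta_1)
theta_2 = wls(F, y, w_2)                         # stage 2: weighted LS
```

Iterating the weight update a few more times gives iteratively-reweighted LS; here a single reweighting already uses approximately optimal weights.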


C/ Design based on the information matrix

Maximize Φ(M), but which Φ(·)? « There are many possibilities!

LS estimation in linear regression with i.i.d. errors N(0, σ²):

R(θ̂^n, α) = {θ ∈ R^p : (θ − θ̂^n)^T M_n (θ − θ̂^n) ≤ (σ²/n) χ²_p(1 − α)}

= confidence region (ellipsoid) at level α: Prob{θ̄ ∈ R(θ̂^n, α)} = α (asymptotically true in nonlinear situations, e.g., nonlinear regression)

« Most criteria can be related to geometrical properties of R(θ̂^n, α)

« Nonlinear model ⇒ M = M(θ) depends on the θ where η(x, θ) is linearized: for the moment, use a nominal value θ0

« locally optimum design
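The coverage statement can be verified by Monte Carlo in a linear model with known σ², where the quadratic form is exactly χ²_p-distributed; with p = 2 the χ² quantile has a closed form, so no statistics library is needed (all constants below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Linear regression y = F theta + eps with p = 2 parameters and known sigma.
n, sigma, level = 50, 0.3, 0.95
xs = np.linspace(0.0, 1.0, n)
F = np.column_stack([np.ones(n), xs])
Mn = F.T @ F / n                          # normalized information matrix
theta_bar = np.array([0.5, -1.0])

q = -2.0 * np.log(1.0 - level)            # chi^2_2 quantile at level 0.95 (valid for p = 2 only)

N, hits = 2000, 0
for _ in range(N):
    y = F @ theta_bar + sigma * rng.standard_normal(n)
    th = np.linalg.solve(F.T @ F, F.T @ y)            # LS estimate
    d = th - theta_bar
    # theta_bar lies in R(theta_hat, level) iff the quadratic form is <= (sigma^2/n) q
    hits += d @ Mn @ d <= sigma**2 / n * q
coverage = hits / N
```

The empirical coverage should land close to 0.95, which is what makes the ellipsoid geometry a legitimate basis for design criteria.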


A few choices for Φ(·)

A-optimality: maximize −trace[M−1] ⇔ maximize 1/trace[M−1] ⇔ minimize the sum of lengthes2 of axes of (asymptotic) confidence ellipsoids R(θˆn, α)

E-optimality: maximize λmin(M) ⇔ minimize the longest axis of R(θˆn, α) D-optimality: maximize log det M √ ⇔ minimize volume of R(θˆn, α) (proportional to 1/ det M) Very much used: a D-optimum design is invariant by reparameterization  ∂β  det M0(β(θ)) = det M(θ) det−2 ∂θ> often leads to repeat the same experimental conditions (replications) (remember Ex2: dim(θ) = 4 → 4 different sampling times, several observations at each)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 21 / 73 « Useful for model discrimination: (2) (1) if η (x, θ2) = η (x, θ1) + δ(x, θ2\1) (nested models), (2) (1) (2) estimate θ2\1 in η to decide whether η or η is more appropriate, see [Atkinson & Cox 1974]
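As a quick numerical illustration (the 2 × 2 matrix and the linear reparameterization below are arbitrary choices, not taken from the slides), the three criteria and the D-invariance identity can be checked with NumPy:

```python
import numpy as np

# An arbitrary 2x2 information matrix, for illustration only
M = np.array([[4.0, 1.0],
              [1.0, 2.0]])

phi_D = np.log(np.linalg.det(M))          # D-criterion: log det M
phi_A = -np.trace(np.linalg.inv(M))       # A-criterion: -trace(M^-1)
phi_E = np.linalg.eigvalsh(M).min()       # E-criterion: lambda_min(M)

# Invariance of D-optimality under a linear reparameterization beta = T theta:
# gradients transform as T^{-T} g, so M' = T^{-T} M T^{-1} and
# det M' = det M * det(T)^{-2}, i.e. all designs are ranked identically.
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
Ti = np.linalg.inv(T)
M_beta = Ti.T @ M @ Ti
```

The constant factor det(T)⁻² does not depend on the design, which is why D-optimum designs are unaffected by the parameterization.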

Ds-optimality: only s < p parameters of interest (and p − s «nuisance» parameters)
→ θ = (θ1ᵀ, θ2ᵀ)ᵀ, with θ1 the vector of the s parameters of interest

M(θ) = [ M11  M12 ]      M⁻¹(θ) = [ A11  A12 ]
       [ M21  M22 ],              [ A21  A22 ]

with
A11 = [M11 − M12 M22⁻¹ M21]⁻¹
A12 = −[M11 − M12 M22⁻¹ M21]⁻¹ M12 M22⁻¹
A21 = −M22⁻¹ M21 [M11 − M12 M22⁻¹ M21]⁻¹
A22 = M22⁻¹ + M22⁻¹ M21 [M11 − M12 M22⁻¹ M21]⁻¹ M12 M22⁻¹

→ maximize ΦDs[M] = det[M11 − M12 M22⁻¹ M21]

« Useful for model discrimination:
if η⁽²⁾(x, θ2) = η⁽¹⁾(x, θ1) + δ(x, θ2\1) (nested models),
estimate θ2\1 in η⁽²⁾ to decide whether η⁽¹⁾ or η⁽²⁾ is more appropriate,
see [Atkinson & Cox 1974]
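The Ds-criterion is simply the determinant of a Schur complement; a minimal NumPy sketch (the 3 × 3 test matrix and s = 2 are arbitrary illustrations):

```python
import numpy as np

def phi_Ds(M, s):
    """Ds-criterion: det[M11 - M12 M22^{-1} M21], where M11 is the
    s x s block for the s parameters of interest."""
    M11, M12 = M[:s, :s], M[:s, s:]
    M21, M22 = M[s:, :s], M[s:, s:]
    return np.linalg.det(M11 - M12 @ np.linalg.solve(M22, M21))

M = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.0],
              [0.5, 0.0, 2.0]])
value = phi_Ds(M, s=2)
```

The identity det M = det M22 · det[M11 − M12 M22⁻¹ M21] gives a quick consistency check, and shows that Ds-optimality with s = p reduces to D-optimality.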

3 Construction of (locally) optimal designs

A/ Exact design

n observations at Xn = (x1, ..., xn) in a regression model (for simplicity)
Each design point xi can be anything, e.g. a point in a subset X of ℝ^d

maximize Φ(Mn) w.r.t. Xn, with Mn = M(Xn, θ⁰) = (1/n) Σ_{i=1}^n (∂η(xi, θ)/∂θ)|_{θ⁰} (∂η(xi, θ)/∂θᵀ)|_{θ⁰}

• If the problem dimension n × d is not too large → standard optimization algorithm (but with constraints, local optima...)
• Otherwise, use an algorithm that takes the particular form of the problem into account

Exchange methods: at iteration k, exchange one support point xj of Xn^k = (x1, ..., xj, ..., xn) with a better point x* in X (the design space) — better in the sense of Φ(·)

[Fedorov 1972]: consider all n possible exchanges successively, each time starting from Xn^k, and retain the «best» one among these n → Xn^{k+1}

Xn^k = ( x1, ..., xj, ..., xn )
          ↕        ↕       ↕
         x1*      xj*     xn*

One iteration → n optimizations of dimension d, followed by ranking n criterion values
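A minimal sketch of such an exchange method for exact D-optimal design, restricted to a finite candidate set so that each exchange is a simple scan (the linear model and grid below are illustrative assumptions, not from the slides):

```python
import numpy as np

def fedorov_exchange(F, n, seed=0):
    """Exchange algorithm for an exact n-point D-optimal design on a finite
    candidate set. F[i] is the gradient of eta w.r.t. theta at candidate i.
    Returns candidate indices (replications allowed) and log det M."""
    rng = np.random.default_rng(seed)
    N, p = F.shape
    idx = list(rng.choice(N, size=n, replace=True))

    def logdet(ix):
        sign, ld = np.linalg.slogdet(F[ix].T @ F[ix] / n)
        return ld if sign > 0 else -np.inf

    best, improved = logdet(idx), True
    while improved:                        # stop at a (local!) optimum
        improved = False
        for j in range(n):                 # try to exchange each design point
            for c in range(N):             # against every candidate in X
                trial = idx[:j] + [c] + idx[j + 1:]
                val = logdet(trial)
                if val > best + 1e-12:
                    idx, best, improved = trial, val, True
    return idx, best

# Illustration: linear model eta(x, theta) = theta_1 + theta_2 x on a grid;
# the exact 2-point D-optimal design is {-1, +1}
X = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
F = np.stack([np.ones_like(X), X], axis=1)
idx, _ = fedorov_exchange(F, n=2)
```

Each accepted exchange strictly increases log det M, so the loop terminates, but only at a local optimum, as the slides warn.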

[Mitchell, 1974]: DETMAX algorithm
If one additional observation were allowed → optimal choice:

Xn^{k+} = (x1, ..., xj, ..., xn, x*_{n+1})

Then remove one support point to return to an n-point design: consider all n + 1 possible cancellations, and retain the least penalizing one in the sense of Φ(Mn)

→ globally, some xj has been exchanged with x*_{n+1} [= excursion of length 1; longer excursions are possible...]
One iteration → 1 optimization of dimension d, followed by ranking n + 1 criterion values
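One DETMAX excursion of length 1 can be sketched as follows, again on a finite candidate set for simplicity (the model, candidates, and starting design are illustrative assumptions):

```python
import numpy as np

def detmax_step(F, idx):
    """One DETMAX excursion of length 1 for D-optimality: add the best
    (n+1)-th observation, then cancel the least penalizing point."""
    def logdet(ix):
        sign, ld = np.linalg.slogdet(F[ix].T @ F[ix])
        return ld if sign > 0 else -np.inf
    # augmentation: the optimal additional observation x*_{n+1}
    add = max(range(len(F)), key=lambda c: logdet(idx + [c]))
    aug = idx + [add]
    # cancellation: remove the point whose deletion hurts det the least
    keep = max(range(len(aug)), key=lambda j: logdet(aug[:j] + aug[j + 1:]))
    return aug[:keep] + aug[keep + 1:]

X = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # candidates, model (1, x)
F = np.stack([np.ones_like(X), X], axis=1)
idx = [2, 2]                                  # poor start: x = 0 twice
for _ in range(4):
    idx = detmax_step(F, idx)
```

On this tiny example the excursions reach the optimal support {−1, +1}; a dead end occurs exactly when the cancellation step removes the point just added.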

DETMAX has simpler iterations than Fedorov's method, but usually requires more iterations

Dead ends are possible:
• DETMAX: the point to be removed is x*_{n+1} itself
• Fedorov: no possible improvement when optimizing one xi at a time
→ both give local optima only!

Other methods:
• Branch and bound: guaranteed convergence, but complicated [Welch 1982]
• Rounding an optimal design measure (support points xi and associated weights wi*, i = 1, ..., m, presented next in B/): choose n integers ri (ri = number of replications of observations at xi) such that Σ_{i=1}^m ri = n and ri/n ≈ wi*
(e.g., maximize min_{i=1,...,m} ri/wi* = Adams apportionment, see [Pukelsheim & Rieder 1992])
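The rounding step can be sketched as a greedy divisor method (my rendering of the idea, assuming each support point must receive at least one observation):

```python
import numpy as np

def adams_round(w, n):
    """Round design weights w (summing to 1) to replication counts r with
    sum(r) = n: every support point starts with one observation, and each
    further observation goes to the point with the largest ratio w_i / r_i."""
    w = np.asarray(w, dtype=float)
    r = np.ones(len(w), dtype=int)
    while r.sum() < n:
        r[np.argmax(w / r)] += 1
    return r

r = adams_round([0.5, 0.3, 0.2], n=10)   # -> [5, 3, 2]
```

Note that small weights are never rounded to zero: adams_round([0.7, 0.2, 0.1], n=5) gives [3, 1, 1], keeping every support point in the exact design.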

B/ Design measures: approximate design theory

[Chernoff 1953; Kiefer & Wolfowitz 1960; Fedorov 1972; Silvey 1980; Pázman 1986; Pukelsheim 1993; Fedorov & Leonov 2014...]

(nonlinear) regression, n observations at Xn = (x1, ..., xn) with i.i.d. errors:

M(Xn, θ⁰) = (1/n) Σ_{i=1}^n (∂η(xi, θ)/∂θ)|_{θ⁰} (∂η(xi, θ)/∂θᵀ)|_{θ⁰}

(the additive form is essential — related to the independence of observations)

Suppose that several xi's coincide (replications): only m < n different xi's

M(Xn, θ⁰) = Σ_{i=1}^m (ri/n) (∂η(xi, θ)/∂θ)|_{θ⁰} (∂η(xi, θ)/∂θᵀ)|_{θ⁰}

ri/n = proportion of observations collected at xi
     = «percentage of experimental effort» at xi
     = weight wi of support point xi

M(Xn, θ⁰) = Σ_{i=1}^m wi (∂η(xi, θ)/∂θ)|_{θ⁰} (∂η(xi, θ)/∂θᵀ)|_{θ⁰}

design Xn ⇔ ( x1 ··· xm )    with Σ_{i=1}^m wi = 1
            ( w1 ··· wm )

= normalized discrete distribution on the xi, with constraints ri/n = wi

« Release the constraints: use any wi ≥ 0 with Σ_{i=1}^m wi = 1
ξ = discrete probability measure on X (= design space),
support points xi and associated weights wi: an «approximate design»

More general expression: ξ = any probability measure on X,

M(ξ, θ⁰) = ∫_X (∂η(x, θ)/∂θ)|_{θ⁰} (∂η(x, θ)/∂θᵀ)|_{θ⁰} ξ(dx),   ∫_X ξ(dx) = 1
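For a discrete ξ, M(ξ, θ⁰) is just a weighted sum of rank-one matrices; a short sketch for the model η(x, θ) = θ1 exp(−θ2 x) that appears later in the slides (the two-point design below is only an illustration):

```python
import numpy as np

def grad_eta(x, theta):
    """Gradient of eta(x, theta) = theta_1 exp(-theta_2 x) w.r.t. theta."""
    t1, t2 = theta
    e = np.exp(-t2 * x)
    return np.array([e, -t1 * x * e])

def info_matrix(xs, ws, theta):
    """M(xi, theta0): weighted sum of rank-one matrices g g^T."""
    return sum(w * np.outer(grad_eta(x, theta), grad_eta(x, theta))
               for x, w in zip(xs, ws))

theta0 = (1.0, 2.0)
M = info_matrix([0.0, 0.5], [0.5, 0.5], theta0)   # two-point design
```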

M(ξ) ∈ convex closure of M = set of rank-one matrices M(δx) = (∂η(x, θ)/∂θ)|_{θ⁰} (∂η(x, θ)/∂θᵀ)|_{θ⁰}

M(ξ) is symmetric p × p: it lives in a space of dimension q = p(p+1)/2

Ex: ξ = w1 δ_{x1} + w2 δ_{x2} + w3 δ_{x3} (3 points are enough for q = 2)

Carathéodory Theorem: M(ξ) can be written as a linear combination of at most q + 1 elements of M:

M(ξ) = Σ_{i=1}^m wi (∂η(xi, θ)/∂θ)|_{θ⁰} (∂η(xi, θ)/∂θᵀ)|_{θ⁰},   m ≤ p(p+1)/2 + 1

⇒ it is enough to consider discrete probability measures with at most p(p+1)/2 + 1 support points (true in particular for the optimum design!)

[Even better: for many criteria Φ(·), if ξ* is optimal (maximizes Φ[M(ξ)]) then M(ξ*) lies on the boundary of the convex closure of M, and p(p+1)/2 support points are enough]

Suppose we have found an optimal ξ* = Σ_{i=1}^m wi* δ_{xi}:
for a given n, choose the ri so that ri/n ≈ wi* → rounding of an approximate optimum design

Sometimes ξ* can be implemented without any approximation:
ξ = power spectral density of an input signal → design of an optimal input for an ODE model in the frequency domain

Why are design measures interesting? How do they simplify the optimization problem?

C/ Optimal design measures

« Maximize Φ(·), concave w.r.t. M(ξ), over a convex set

Ex: D-optimality: ∀ M1 ≻ O, M2 ⪰ O with M2 not proportional to M1, ∀ α ∈ (0, 1),
log det[(1 − α)M1 + αM2] > (1 − α) log det M1 + α log det M2
⇒ log det[·] is (strictly) concave

convex set + concave criterion ⇒ one unique optimum!

ξ* is optimal ⇔ the directional derivative is ≤ 0 in all directions
⇒ «Equivalence Theorem» [Kiefer & Wolfowitz 1960]

Ξ = set of probability measures on X, Φ(·) concave, φ(ξ) = Φ[M(ξ)]

Fφ(ξ; ν) = lim_{α→0⁺} {φ[(1 − α)ξ + αν] − φ(ξ)}/α
= directional derivative of φ(·) at ξ in the direction ν

Equivalence Theorem: ξ* maximizes φ(ξ) ⇔ max_{ν∈Ξ} Fφ(ξ*; ν) ≤ 0

This takes a simple form when Φ(·) is differentiable:
ξ* maximizes φ(ξ) ⇔ max_{x∈X} Fφ(ξ*; δx) ≤ 0
→ check the optimality of ξ* by plotting x ↦ Fφ(ξ*; δx)

Ex: D-optimal design
ξD* maximizes log det[M(ξ)] w.r.t. ξ ∈ Ξ
⇔ max_{x∈X} d(ξD*, x) ≤ p
⇔ ξD* minimizes max_{x∈X} d(ξ, x) w.r.t. ξ ∈ Ξ,
where d(ξ, x) = (∂η(x, θ)/∂θᵀ)|_{θ⁰} M⁻¹(ξ) (∂η(x, θ)/∂θ)|_{θ⁰}

Moreover, d(ξD*, xi) = p = dim(θ) at every support point xi of ξD*

Ex: η(x, θ) = θ1 exp(−θ2 x) (p = 2), i.i.d. errors, X = ℝ⁺ [θ2⁰ = 2]
→ d(ξ, x) as a function of x, for

ξ = ( 0.01  0.75 )     and     ξD* = (  0   1/θ2 = 0.5 )
    ( 1/2   1/2  )                   ( 1/2      1/2    )

[Figure: d(ξ, x) on [0, 2] for each design (vertical scale 0 to 2.5); for ξD*, d(ξD*, x) ≤ 2 everywhere, with equality at the two support points]
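This optimality check is easy to reproduce numerically. Under the same assumptions (θ⁰ = (1, 2), support {0, 1/θ2} with weights 1/2), the variance function d(ξD*, x) stays below p = 2 on a grid and equals 2 at the support points:

```python
import numpy as np

theta0 = (1.0, 2.0)

def grad_eta(x):
    """Gradient of eta(x, theta) = theta_1 exp(-theta_2 x) at theta0."""
    t1, t2 = theta0
    e = np.exp(-t2 * x)
    return np.array([e, -t1 * x * e])

def d_var(x, xs, ws):
    """d(xi, x) = g(x)^T M^{-1}(xi) g(x)."""
    M = sum(w * np.outer(grad_eta(u), grad_eta(u)) for u, w in zip(xs, ws))
    g = grad_eta(x)
    return float(g @ np.linalg.solve(M, g))

xs, ws = [0.0, 1.0 / theta0[1]], [0.5, 0.5]     # candidate D-optimal design
d_max = max(d_var(x, xs, ws) for x in np.linspace(0.0, 2.0, 2001))
```

By the Equivalence Theorem, d_max = p = 2, attained exactly at the support points 0 and 0.5.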

The KW Equivalence Theorem relates optimality in the θ space to optimality in the y space (i.i.d. errors):

n var[η(x, θ̂^n)] → σ² (∂η(x, θ)/∂θᵀ)|_{θ̄} M⁻¹(ξ, θ̄) (∂η(x, θ)/∂θ)|_{θ̄} = σ² d(ξ, x)|_{θ̄},   n → ∞

D-optimality ⇔ G-optimality
« ξD* minimizes the maximum of the prediction variance over X

[Figure: η(x, θ̄) and the band η(x, θ̄) ± 2 st.d. on [0, 2]]
→ put the next observation where d(ξ, x) is large

Remark: the Equivalence Theorem = stationarity condition = necessary and sufficient condition for optimality ≠ a duality property!

Dual problem to D-optimum design:
define S = {(∂η(x, θ)/∂θ)|_{θ⁰}, x ∈ X} [S ∪ −S = Elfving's set]
E* = minimum-volume ellipsoid centered at 0 that contains S
Lagrangian theory ⇒ E* = {z ∈ ℝ^p : zᵀ M⁻¹(ξD*) z ≤ p}, where ξD* is D-optimum
support points of ξD* = contact points between E* and S

In general there are few contact points → repeat observations at the same place (see [Yang 2010; Dette & Melas 2011])

There exist dual problems for other criteria Φ(·) (= one of the main topics in [Pukelsheim 1993])

Ex: η(x, θ) = θ1/(θ1 − θ2) [exp(−θ2 x) − exp(−θ1 x)], θ = (1, 5), X = ℝ⁺

[Figure: E* and S in the plane; the minimum-volume ellipsoid E* touches S at two points]

⇒ the D-optimum design ξD* is supported on two points

Step-length αk ? d(ξk ,x∗)−p « k+1 k αk = arg max φ(ξ ) [= p[d(ξk ,x∗)−1] for D-optimal design [Fedorov 1972]] k monotone convergence P∞ « αk > 0 , limk→∞ αk = 0 , i=1 αk = ∞ [[Wynn 1970] for D-optimal design]

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

D/ Construction of an optimal design measure

Ξ = set of probability measures on X , Φ(·) concave and differentiable, φ(ξ) = Φ[M(ξ)] ∗ ∗ Concavity =⇒ for any ξ ∈ Ξ, φ(ξ ) ≤ φ(ξ) + maxx∈X Fφ(ξ ; δx )

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 37 / 73 Step-length αk ? d(ξk ,x∗)−p « k+1 k αk = arg max φ(ξ ) [= p[d(ξk ,x∗)−1] for D-optimal design [Fedorov 1972]] k monotone convergence P∞ « αk > 0 , limk→∞ αk = 0 , i=1 αk = ∞ [[Wynn 1970] for D-optimal design]

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

D/ Construction of an optimal design measure

Ξ = set of probability measures on X , Φ(·) concave and differentiable, φ(ξ) = Φ[M(ξ)] ∗ ∗ Concavity =⇒ for any ξ ∈ Ξ, φ(ξ ) ≤ φ(ξ) + maxx∈X Fφ(ξ ; δx )

Fedorov–Wynn Algorithm: sort of steepest ascent 1 : Choose ξ1 not degenerate (det M(ξ1) > 0) ∗ k 2 : Compute xk = arg maxX Fφ(ξ ; δx ) k k If Fφ(ξ ; δ + ) < , stop: ξ is -optimal xk k+1 k ∗ 3 : ξ = (1 − α )ξ + α δ ∗ (delta measure at x ) k k xk k [Vertex Direction] k → k + 1, return to Step 2

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 37 / 73 P∞ « αk > 0 , limk→∞ αk = 0 , i=1 αk = ∞ [[Wynn 1970] for D-optimal design]

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

D/ Construction of an optimal design measure

Ξ = set of probability measures on X , Φ(·) concave and differentiable, φ(ξ) = Φ[M(ξ)] ∗ ∗ Concavity =⇒ for any ξ ∈ Ξ, φ(ξ ) ≤ φ(ξ) + maxx∈X Fφ(ξ ; δx )

Fedorov–Wynn Algorithm: sort of steepest ascent 1 : Choose ξ1 not degenerate (det M(ξ1) > 0) ∗ k 2 : Compute xk = arg maxX Fφ(ξ ; δx ) k k If Fφ(ξ ; δ + ) < , stop: ξ is -optimal xk k+1 k ∗ 3 : ξ = (1 − α )ξ + α δ ∗ (delta measure at x ) k k xk k [Vertex Direction] k → k + 1, return to Step 2

Step-length αk ? d(ξk ,x∗)−p « k+1 k αk = arg max φ(ξ ) [= p[d(ξk ,x∗)−1] for D-optimal design [Fedorov 1972]] k monotone convergence

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 37 / 73 3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

D/ Construction of an optimal design measure

Ξ = set of probability measures on X , Φ(·) concave and differentiable, φ(ξ) = Φ[M(ξ)] ∗ ∗ Concavity =⇒ for any ξ ∈ Ξ, φ(ξ ) ≤ φ(ξ) + maxx∈X Fφ(ξ ; δx )

Fedorov–Wynn algorithm: a sort of steepest ascent
1: Choose ξ^1 not degenerate (det M(ξ^1) > 0)
2: Compute x*_k = arg max_{x∈X} Fφ(ξ^k; δ_x); if Fφ(ξ^k; δ_{x*_k}) < ε, stop: ξ^k is ε-optimal
3: ξ^{k+1} = (1 − α_k) ξ^k + α_k δ_{x*_k} (delta measure at x*_k) [vertex direction]; k ← k + 1, return to Step 2

Step-length α_k?
« α_k = arg max_α φ(ξ^{k+1}) [= (d(ξ^k, x*_k) − p) / (p [d(ξ^k, x*_k) − 1]) for D-optimal design [Fedorov 1972]]: monotone convergence
« α_k > 0, lim_{k→∞} α_k = 0, Σ_{k=1}^∞ α_k = ∞ [[Wynn 1970] for D-optimal design]
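The vertex-direction step can be illustrated with a minimal numpy sketch for D-optimal design, where Φ(M) = log det M and Fφ(ξ; δ_x) = d(ξ, x) − p with d(ξ, x) = f^T(x) M^−1(ξ) f(x); the quadratic-regression example, grid and tolerances are illustrative assumptions, not from the slides:

```python
import numpy as np

def fedorov_wynn_D(F, eps=1e-3, max_iter=20000):
    """Vertex-direction (Fedorov-Wynn) algorithm for D-optimal design.
    F: (l, p) array whose row i is the regressor f(x_i) on a discretized X_l;
    returns the weight vector w of the design measure on the l candidates."""
    l, p = F.shape
    w = np.full(l, 1.0 / l)                 # non-degenerate initial design
    for _ in range(max_iter):
        M = F.T @ (w[:, None] * F)          # information matrix M(xi)
        d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)   # d(xi, x)
        k = int(np.argmax(d))               # x*_k = arg max F_phi(xi; delta_x)
        if d[k] - p < eps:                  # equivalence theorem: eps-optimal
            break
        alpha = (d[k] - p) / (p * (d[k] - 1.0))   # Fedorov's optimal step
        w *= 1.0 - alpha
        w[k] += alpha
    return w

# Quadratic regression f(x) = (1, x, x^2) on [-1, 1]: the D-optimal design
# puts weight 1/3 at each of -1, 0, 1 (det M* = 4/27)
X = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(X), X, X**2])
w = fedorov_wynn_D(F)
```

The stopping rule is exactly the ε-optimality certificate given by the concavity bound above: max_x d(ξ, x) − p bounds the criterion gap.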

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

Remarks:

Consider sequential design, where one x_i at a time enters M(X):
M(X_{k+1}) = [k/(k+1)] M(X_k) + [1/(k+1)] [∂η(x_{k+1}, θ)/∂θ]|_{θ0} [∂η(x_{k+1}, θ)/∂θ^T]|_{θ0}
with x_{k+1} = arg max_{x∈X} Fφ(ξ^k; δ_x) ⇔ Wynn algorithm with α_k = 1/(k+1)
Guaranteed convergence to the optimum
Faster methods exist:
remove support points from ξ^k (≈ allow α_k to be < 0) [Atwood 1973; Böhning 1985, 1986]
combine with gradient projection (or a second-order method) [Wu 1978]
use a multiplicative algorithm [Titterington 1976; Torsney 1983–2009; Yu 2010] [for D- or A-optimal design, far from the optimum]
combine different methods [Yu 2011]
Still an active topic...
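A sketch of this sequential rule (Wynn's algorithm with α_k = 1/(k + 1)), using the rank-one update of M displayed above; the model, grid and deliberately poor initial design are illustrative assumptions:

```python
import numpy as np

def sequential_design(F, init_idx, n_steps):
    """Sequential design: at each step add the candidate x maximizing
    f(x)^T M^{-1} f(x), i.e. Wynn's algorithm with alpha_k = 1/(k+1)."""
    idx = list(init_idx)                    # initial non-degenerate design
    M = sum(np.outer(F[j], F[j]) for j in idx) / len(idx)
    for k in range(len(idx), len(idx) + n_steps):
        d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)
        j = int(np.argmax(d))
        idx.append(j)
        # rank-one update: M_{k+1} = k/(k+1) M_k + 1/(k+1) f f^T
        M = (k / (k + 1)) * M + np.outer(F[j], F[j]) / (k + 1)
    return idx, M

# quadratic regression on [-1, 1], starting from a poor 3-point design
X = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(X), X, X**2])
idx, M = sequential_design(F, init_idx=[0, 1, 2], n_steps=3000)
```

With α_k = 1/(k + 1) the final design is just the empirical measure of the selected points, which slowly approaches the D-optimal measure (det M* = 4/27 for this example).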

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

Remarks: Usually, X = a compact subset of R^d (e.g., the probability simplex for mixture experiments)
→ discretized into X_ℓ with ℓ elements (a grid — or better, a low-discrepancy sequence, see [Niederreiter 1992])
the algorithms above may be slow when ℓ is large

« Combine continuous search for support points in X with optimization of a design measure with few support points, say m ≪ ℓ
« Exploit guaranteed (and fast) convergence of algorithms for small m + use the Equivalence Th. to check optimality [Yang et al., 2013; P & Zhigljavsky, 2014]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 39 / 73 ∗ Additive model [Schwabe 1995]: ξD = tensor product of optimal designs for (1) (1) (1) β0 + β1 exp(−β2 x1) and (2) (2) (2) (2) (2) (2) β0 + β1 [exp(−β2 x2) − exp(−β1 x2)]/(β1 − β2 )

∗ Use the Equivalence Th. to construct ξD (with arbitrary precision — Maple) [weight 1/9 at (0, 0.46268527927, 2) ⊗ (0, 1.22947139883, 6.85768905493)]

« 7 iterations of the algorithm in [P & Zhigljavsky, 2014] yield ξ such that −5 maxx∈X Fφ(ξ; δx ) < 10

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

Ex: D-optimal design for

θ3 η(x, θ) = θ0 + θ1 exp(−θ2x1) + [exp(−θ4x2) − exp(−θ3x2)] θ3 − θ4

0 0 0 with x = (x1, x2) ∈ X = [0, 2] × [0, 10] (and p = 5, θ2 = 2, θ3 = 0.7, θ4 = 0.2)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 40 / 73 ∗ Use the Equivalence Th. to construct ξD (with arbitrary precision — Maple) [weight 1/9 at (0, 0.46268527927, 2) ⊗ (0, 1.22947139883, 6.85768905493)]

« 7 iterations of the algorithm in [P & Zhigljavsky, 2014] yield ξ such that −5 maxx∈X Fφ(ξ; δx ) < 10

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

Ex: D-optimal design for

θ3 η(x, θ) = θ0 + θ1 exp(−θ2x1) + [exp(−θ4x2) − exp(−θ3x2)] θ3 − θ4

0 0 0 with x = (x1, x2) ∈ X = [0, 2] × [0, 10] (and p = 5, θ2 = 2, θ3 = 0.7, θ4 = 0.2)

∗ Additive model [Schwabe 1995]: ξD = tensor product of optimal designs for (1) (1) (1) β0 + β1 exp(−β2 x1) and (2) (2) (2) (2) (2) (2) β0 + β1 [exp(−β2 x2) − exp(−β1 x2)]/(β1 − β2 )

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

Ex: D-optimal design for

η(x, θ) = θ0 + θ1 exp(−θ2 x1) + [θ3/(θ3 − θ4)] [exp(−θ4 x2) − exp(−θ3 x2)]

with x = (x1, x2) ∈ X = [0, 2] × [0, 10] (and p = 5, θ2^0 = 2, θ3^0 = 0.7, θ4^0 = 0.2)

Additive model [Schwabe 1995]: ξ*_D = the tensor product of the optimal designs for
β0^(1) + β1^(1) exp(−β2^(1) x1) and β0^(2) + β1^(2) [exp(−β2^(2) x2) − exp(−β1^(2) x2)] / (β1^(2) − β2^(2))

Use the Equivalence Th. to construct ξ*_D (with arbitrary precision — Maple)
[weight 1/9 at (0, 0.46268527927, 2) ⊗ (0, 1.22947139883, 6.85768905493)]

« 7 iterations of the algorithm in [P & Zhigljavsky, 2014] yield ξ such that max_{x∈X} Fφ(ξ; δ_x) < 10^−5
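The equivalence-theorem check on this example is easy to reproduce numerically: with weight 1/9 on the 3 × 3 product support, d(x) = g^T(x) M^−1 g(x) (g the gradient of η w.r.t. θ) should not exceed p = 5 on X, with equality at the support points. A sketch, setting θ0 = 0 and θ1 = 1 (harmless, since a D-optimal design does not depend on the parameters entering linearly); the evaluation grids are illustrative:

```python
import numpy as np

t2, t3, t4 = 2.0, 0.7, 0.2          # nominal theta2, theta3, theta4

def grad(x1, x2):
    """Gradient of eta w.r.t. (theta0, ..., theta4), at theta1 = 1."""
    e2, e3, e4 = np.exp(-t2 * x1), np.exp(-t3 * x2), np.exp(-t4 * x2)
    c = t3 / (t3 - t4)
    return np.array([np.ones_like(x1),
                     e2,
                     -x1 * e2,
                     -t4 / (t3 - t4)**2 * (e4 - e3) + c * x2 * e3,
                     t3 / (t3 - t4)**2 * (e4 - e3) - c * x2 * e4])

s1 = np.array([0.0, 0.46268527927, 2.0])
s2 = np.array([0.0, 1.22947139883, 6.85768905493])
X1, X2 = np.meshgrid(s1, s2)                      # 3 x 3 product support
G = grad(X1.ravel(), X2.ravel())                  # (5, 9) gradient matrix
M = (G @ G.T) / 9.0                               # weights 1/9
Minv = np.linalg.inv(M)

g1, g2 = np.meshgrid(np.linspace(0, 2, 201), np.linspace(0, 10, 501))
Gg = grad(g1.ravel(), g2.ravel())
d = np.einsum('ik,ij,jk->k', Gg, Minv, Gg)        # variance function on a grid
```

If the design is D-optimal, d(x) = 5 at all nine support points and d(x) ≤ 5 elsewhere, which the grid evaluation confirms.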

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 40 / 73 subgradients (↔ directional derivatives) general method for non-differentiable optimization (cutting plane method [Kelley 1960], level method [Nesterov 2004]), see Chap. 9 of [P & Pázman 2013]

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

What if Φ(·) not differentiable? (e.g., maximize Φ(M) = λmin(M)) Φ(·) concave, X discretized into X`, ` not too large optimal design ⇐⇒ optimal vector of weights w ∈ R` P` wi ≥ 0, i=1 wi = 1

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 41 / 73 general method for non-differentiable optimization (cutting plane method [Kelley 1960], level method [Nesterov 2004]), see Chap. 9 of [P & Pázman 2013]

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

What if Φ(·) not differentiable? (e.g., maximize Φ(M) = λmin(M)) Φ(·) concave, X discretized into X`, ` not too large optimal design ⇐⇒ optimal vector of weights w ∈ R` P` wi ≥ 0, i=1 wi = 1 subgradients (↔ directional derivatives)

3 Construction of (locally) optimal designs D/ Construction of an optimal design measure

What if Φ(·) is not differentiable? (e.g., maximize Φ(M) = λmin(M))
Φ(·) concave, X discretized into X_ℓ, ℓ not too large
optimal design ⇔ optimal vector of weights w ∈ R^ℓ, with w_i ≥ 0 and Σ_{i=1}^ℓ w_i = 1
subgradients (↔ directional derivatives)
general method for non-differentiable optimization (cutting-plane method [Kelley 1960], level method [Nesterov 2004]), see Chap. 9 of [P & Pázman 2013]
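For Φ(M) = λmin(M), a subgradient with respect to the weights is available from an eigenvector u of λmin: ∂λmin/∂w_i = (f_i^T u)² when λmin is simple. A minimal projected-subgradient sketch (the cutting-plane and level methods cited above are the serious tools; this only illustrates the idea), on the textbook example of E-optimal design for straight-line regression on [−1, 1], where the optimum puts weight 1/2 at ±1 and λmin = 1:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum_i w_i = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def e_optimal_subgradient(F, n_iter=2000):
    """Projected subgradient ascent of Phi(w) = lambda_min(M(w))."""
    l, p = F.shape
    w = np.full(l, 1.0 / l)
    best_w, best_val = w, np.linalg.eigvalsh(F.T @ (w[:, None] * F))[0]
    for k in range(1, n_iter + 1):
        M = F.T @ (w[:, None] * F)
        vals, vecs = np.linalg.eigh(M)
        if vals[0] > best_val:
            best_val, best_w = vals[0], w.copy()
        g = (F @ vecs[:, 0]) ** 2                 # subgradient of lambda_min
        w = project_simplex(w + g / np.sqrt(k))   # diminishing step size
    return best_w, best_val

X = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(X), X])         # f(x) = (1, x)
w, val = e_optimal_subgradient(F)
```

Keeping the best iterate matters here: subgradient methods are not monotone, and near the optimum λmin is a multiple eigenvalue, so the iterates oscillate while the best value converges.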

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 41 / 73 4 Problems with nonlinear models 4 Problems with nonlinear models

Linear regression: easy

The expectation surface S_η = {η(θ) = (η(x1, θ), ..., η(xn, θ))^T : θ ∈ R^p} is flat and linearly parameterized

4 Problems with nonlinear models

Nonlinear regression: may be a bit tricky

S_η is curved (intrinsic curvature) and nonlinearly parameterized (parametric curvature) [Bates & Watts 1980]

4 Problems with nonlinear models

Ex: η(x, θ) = θ1^3 {x}1 + θ1 (1 − {x}1) + θ2^2 {x}2 + θ2 (1 − {x}2)
X = (x1, x2, x3), x1 = (0 1), x2 = (1 0), x3 = (1 1), θ ∈ [−3, 4] × [−2, 2]

[3-D plot of the expectation surface S_η, with axes η1, η2, η3]

4 Problems with nonlinear models

Two important difficulties:
① Asymptotically (n → ∞) — or if σ² is small enough — all seems fine (use linear approximations), but the distribution of θ̂^n may be far from normal for small n (or for large σ²) « small-sample properties

② Everything is local (depends on θ): if we linearize, where do we linearize? (choice of a nominal value θ0) « nonlocal optimum design

5 Small-sample properties A/ A classification of regression models

A/ A classification of regression models

Suppose that y_i = y(x_i) = η(x_i, θ̄) + ε_i, with E{ε_i} = 0 and E{ε_i²} = σ²(x_i) for all i
Dividing y_i and η(x_i, θ̄) by σ(x_i), one may suppose that σ²(x) = σ² for all x
Denote y = (y1, ..., yn)^T, η(θ) = (η(x1, θ), ..., η(xn, θ))^T and ε = (ε1, ..., εn)^T, so that E{ε} = 0 and Var(ε) = σ² I_n
We suppose η(x, θ) twice continuously differentiable w.r.t. θ for any x

® Expectation surface: S_η = {η(θ) : θ ∈ R^p}
® Orthogonal projector onto the tangent space to S_η at η(θ):
P_θ = (1/n) [∂η(θ)/∂θ^T] M^−1(X, θ) [∂η(θ)/∂θ] (an n × n matrix)
(both depend on X)
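In matrix form, with J = ∂η(θ)/∂θ^T the n × p Jacobian, P_θ = J (J^T J)^{−1} J^T = (1/n) J M^{−1}(X, θ) J^T; a quick numerical check on an illustrative two-parameter model (model and nominal values are assumptions):

```python
import numpy as np

# projector P_theta for eta(x, theta) = exp(-theta1 x) + theta2 x
x = np.array([0.5, 1.0, 1.5, 2.0])
theta = np.array([1.0, 0.3])
n = len(x)

J = np.column_stack([-x * np.exp(-theta[0] * x), x])   # n x p Jacobian
M = J.T @ J / n                                        # information matrix (sigma = 1)
P = J @ np.linalg.inv(M) @ J.T / n                     # P_theta = (1/n) J M^{-1} J^T
```

P is symmetric, idempotent, leaves the tangent directions (the columns of J) invariant, and has trace p.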

5 Small-sample properties A/ A classification of regression models

A classification of regression models [Pázman 1993]

5 Small-sample properties A/ A classification of regression models

Intrinsically linear models
® The expectation surface S_η = {η(θ) : θ ∈ R^p} is flat (a plane) — intrinsic curvature ≡ 0
® A (continuously differentiable) reparameterization exists that makes the model linear
® P_θ H_ij(θ) = H_ij(θ), where H_ij(θ) = ∂²η(θ)/∂θ_i ∂θ_j

Observing at p different xi only (replications) makes the model intrinsically linear
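This is easy to verify numerically: with replications at p distinct points, every H_ij(θ) has entries constant within replicate groups, hence lies in the same p-dimensional subspace as the columns of the Jacobian, and P_θ H_ij = H_ij. A sketch with an illustrative two-parameter model and nominal values:

```python
import numpy as np

# Replications at p = 2 distinct points make eta(x, theta) = theta2 exp(-theta1 x)
# intrinsically linear: P_theta H_ij = H_ij for all second derivatives H_ij.
x = np.array([0.5, 0.5, 0.5, 2.0, 2.0, 2.0])    # only 2 distinct x, replicated
t1, t2 = 1.2, 0.8
e = np.exp(-t1 * x)

J = np.column_stack([-t2 * x * e, e])            # d eta / d theta
P = J @ np.linalg.inv(J.T @ J) @ J.T             # projector onto the tangent space

H11 = t2 * x**2 * e                              # second derivatives of eta
H12 = -x * e
H22 = np.zeros_like(x)
```

All three H vectors are left unchanged by the projector, confirming that the intrinsic curvature vanishes.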

5 Small-sample properties A/ A classification of regression models

Parametrically linear models
® M(X, θ) = constant
® P_θ H_ij(θ) = 0 — parametric curvature ≡ 0

Linear models
® η(x, θ) = f^T(x) θ + c(x)
® the model is intrinsically and parametrically linear

Flat models
® A reparameterization exists that makes the information matrix constant
® Riemannian curvature tensor ≡ 0: R_hijk(θ) = T_hjik(θ) − T_hkij(θ) ≡ 0, where T_hjik(θ) = [H_hj(θ)]^T [I_n − P_θ] H_ik(θ)
® If all parameters but one appear linearly, then the model is flat
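The last statement can be checked numerically: for η(x, θ) = θ1 exp(−θ3 x) + θ2 x only θ3 enters nonlinearly, so the model is flat (R_hijk ≡ 0) even though its intrinsic curvature does not vanish; the design points and nominal values below are illustrative assumptions:

```python
import numpy as np

x = np.linspace(0.2, 2.0, 6)
t1, t3 = 0.9, 1.4
e = np.exp(-t3 * x)

J = np.column_stack([e, x, -t1 * x * e])          # d eta / d theta
P = J @ np.linalg.inv(J.T @ J) @ J.T
Q = np.eye(len(x)) - P                            # projector onto the normal space

H = np.zeros((3, 3, len(x)))                      # H[i, j] = d^2 eta / dth_i dth_j
H[0, 2] = H[2, 0] = -x * e
H[2, 2] = t1 * x**2 * e

# T_hjik = H_hj^T (I - P) H_ik, then R_hijk = T_hjik - T_hkij
T = np.einsum('hjn,nm,ikm->hjik', H, Q, H)
R = np.zeros((3, 3, 3, 3))
for h in range(3):
    for i in range(3):
        for j in range(3):
            for k in range(3):
                R[h, i, j, k] = T[h, j, i, k] - T[h, k, i, j]
```

Here H_13 is proportional to a Jacobian column, so its normal component vanishes; only H_33 has a normal component, and all R entries cancel, while ‖(I − P) H_33‖ > 0 shows the model is not intrinsically linear.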

5 Small-sample properties A/ A classification of regression models

A classification of regression models [Pázman 1993] (bis)

5 Small-sample properties B/ Density of the LS estimator

B/ Density of the LS estimator

Suppose ε ∼ N(0, σ² I_n). For intrinsically linear models (in particular, with repetitions at p points):
→ exact distribution: θ̂^n ∼ q(θ|θ̄) = [n^{p/2} det^{1/2} M(X, θ) / ((2π)^{p/2} σ^p)] exp(−‖η(θ) − η(θ̄)‖² / (2σ²))

Ex: η(x, θ) = exp(−θx), θ̄ = 2, 15 observations at the same x = 1/2 (σ² = 1)

Ex: η(x, θ) = x θ^3, θ̄ = 0, all observations at the same x ≠ 0

[Plot of the exact density q(θ|θ̄) for θ ∈ [−2, 2]: bimodal, vanishing at θ = 0]
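For this one-parameter example the exact density is explicit: M(X, θ) = 9x²θ⁴ and ‖η(θ) − η(θ̄)‖² = n x² (θ³ − θ̄³)², so with θ̄ = 0, q(θ|0) = √n · 3|x| θ² / (√(2π) σ) · exp(−n x² θ⁶ / (2σ²)), which vanishes at θ = 0 and is bimodal. A Monte-Carlo sketch (the values of n, x and σ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x, n, sigma = 1.0, 10, 1.0                  # all n observations at the same x

# exact density of theta_hat for eta(x, theta) = x theta^3, theta_bar = 0
t = np.linspace(-2.0, 2.0, 4001)
q = np.sqrt(n) * 3.0 * np.abs(x) * t**2 / (np.sqrt(2 * np.pi) * sigma) \
    * np.exp(-n * x**2 * t**6 / (2 * sigma**2))
dt = t[1] - t[0]

# Monte-Carlo check: the LS estimator is theta_hat = cbrt(ybar / x),
# with ybar ~ N(0, sigma^2 / n)
ybar = rng.normal(0.0, sigma / np.sqrt(n), 200_000)
theta_hat = np.cbrt(ybar / x)
```

The simulated estimator matches the exact (clearly non-normal) density, even though n = 10 observations are used.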

5 Small-sample properties B/ Density of the LS estimator

Flat models: approximate density of θ̂^n:
q(θ|θ̄) = det[Q(θ, θ̄)] / [(2π)^{p/2} σ^p n^{p/2} det^{1/2} M(X, θ)] exp(−‖P_θ [η(θ) − η(θ̄)]‖² / (2σ²))
where {Q(θ, θ̄)}_ij = {n M(X, θ)}_ij + [η(θ) − η(θ̄)]^T [I_n − P_θ] H_ij(θ)

Remarks:

Other approximations (more complicated) exist for models with R_hijk(θ) ≢ 0 (non-flat)
Approximation of the density of the penalized LS estimator arg min_{θ∈Θ} {‖y − η(θ)‖² + 2w(θ)} (which includes the case of Bayesian estimation)
We also know the (approximate) marginal densities of the LS estimator θ̂^n [Pázman & P 1996]

5 Small-sample properties C/ Confidence regions

C/ Confidence regions

Suppose ε ∼ N(0, σ² I_n) and define e(θ) = y − η(θ). Then
e^T(θ̄) P_θ̄ e(θ̄) / σ² ∼ χ²_p
e^T(θ̄) [I_n − P_θ̄] e(θ̄) / σ² ∼ χ²_{n−p}
and they are independent

« exact confidence regions at level α:
{θ ∈ R^p : e^T(θ) P_θ e(θ) / σ² < χ²_p[1 − α]} (if σ² is known)
{θ ∈ R^p : [(n − p)/p] e^T(θ) P_θ e(θ) / (e^T(θ) [I_n − P_θ] e(θ)) < F_{p,n−p}[1 − α]} (if σ² is unknown)
(but they are not of minimum volume and may be composed of disconnected subsets...)

« approximate confidence regions based on the likelihood ratio (usually connected):
{θ ∈ R^p : ‖e(θ)‖² − ‖e(θ̂)‖² < σ² χ²_p[1 − α]} (if σ² is known)
{θ ∈ R^p : ‖e(θ)‖² / ‖e(θ̂)‖² < 1 + [p/(n − p)] F_{p,n−p}[1 − α]} (if σ² is unknown)
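The two quadratic forms behind these regions are easy to check by simulation: at θ = θ̄ one has e(θ̄) = ε, so the projections of ε onto the tangent space and onto its orthogonal complement give independent χ²_p and χ²_{n−p} variables (model, design and nominal value below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.2, 2.0, 8)
theta_bar, sigma = 1.3, 0.5
n, p = len(x), 1

J = (-x * np.exp(-theta_bar * x))[:, None]     # Jacobian of eta(x, th) = exp(-th x)
P = J @ np.linalg.inv(J.T @ J) @ J.T           # projector P_theta_bar

eps = rng.normal(0.0, sigma, (100_000, n))     # e(theta_bar) = y - eta(theta_bar) = eps
q1 = np.einsum('ki,ij,kj->k', eps, P, eps) / sigma**2              # ~ chi2_p
q2 = np.einsum('ki,ij,kj->k', eps, np.eye(n) - P, eps) / sigma**2  # ~ chi2_{n-p}
```

The simulated means match p and n − p, and the two statistics are uncorrelated, as independence requires.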

5 Small-sample properties D/ Design based on small-sample properties

D/ Design based on small-sample properties

3 main ideas (exact design only), based on:
a) the (approximate) volume of (approximate) confidence regions (not necessarily of minimum volume) [Hamilton & Watts 1985; Vila 1990; Vila & Gauchi 2007] (ellipsoidal approximation → D-optimality)
b) the (approximate or exact) density of θ̂^n: e.g., minimize ∫ ‖θ − θ̄‖² q(θ|θ̄) dθ w.r.t. X (using stochastic approximation, [Pázman & P 1992; Gauchi & Pázman 2006])
c) a higher-order approximation of optimality criteria, using ϕ(y|X, θ̄) = N(η(θ̄), σ² I_n):
minimize the MSE ∫ ‖θ̂^n(y) − θ̄‖² ϕ(y|X, θ̄) dy [Clarke 1980]
minimize the entropy −∫ log[q(θ̂^n(y)|θ̄)] ϕ(y|X, θ̄) dy [P & Pázman 1994] (the usual normal approximation → D-optimality)
→ explicit (but rather complicated) expressions (they depend on 3rd-order derivatives of η(x, θ) w.r.t. θ)

5 Small-sample properties E/ One additional difficulty

E/ One additional difficulty

Overlapping of S_η, local minimizers...

[Plot: a curved expectation surface folding back on itself, with three marked points A, B, C]

Important and difficult problem, often neglected!

5 Small-sample properties E/ One additional difficulty

What can we do at the design stage? « extensions of usual optimality criteria, e.g.

maximize φ_eE(X) = min_θ ‖η(θ) − η(θ0)‖² / ‖θ − θ0‖²
or
maximize φ_eE(ξ) = min_θ ∫ [η(x, θ) − η(x, θ0)]² ξ(dx) / ‖θ − θ0‖²

this corresponds to E-optimal design if the model is linear (maximize λmin[M(ξ)]), see Chap. 7 of [P & Pázman 2013], [Pázman & P, 2014]
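For a linear model the claim is immediate, since ‖η(θ) − η(θ0)‖² = (θ − θ0)^T F^T F (θ − θ0) for the design matrix F; a quick numerical confirmation with a random F standing in for a design matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.normal(size=(20, 3))                   # design matrix of a linear model, p = 3
lam_min = np.linalg.eigvalsh(F.T @ F)[0]

# min over directions u = (theta - theta0) / ||theta - theta0|| of u^T F^T F u
U = rng.normal(size=(100_000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)
ratios = np.einsum('ki,ij,kj->k', U, F.T @ F, U)
```

The minimum of the ratio over random directions approaches λmin(F^T F) from above, which is the E-optimality criterion.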

All approaches presented so far are local (the optimal design depends on θ̄, which is unknown and replaced by a nominal θ0)

6 Nonlocal optimum design

Ex: η(x, θ) = exp(−θx), y_i = η(x_i, θ̄) + ε_i, θ > 0, x ∈ X = [0, ∞)
M(ξ, θ0) = ∫_X x² exp(−2θ0 x) ξ(dx) « ξ*_D = ξ*_A = ... = δ_{1/θ0}
Objective: remove the dependence on the nominal value θ0
3 main classes of methods (related):
① Average optimum design: maximize E_θ{φ(X, θ)} (or E_θ{φ(ξ, θ)})
② Maximin optimum design: maximize min_θ φ(X, θ) (or min_θ φ(ξ, θ))
« Between ① and ②: regularized maximin criteria, quantile and probability-level criteria
③ Sequential design

6 Nonlocal optimum design A/ Average Optimum design

A/ Average Optimum design
Nothing special: take a probability measure µ(dθ) on Θ ⊆ R^p:
φ(·, θ0) → φ_AO(·) = ∫_Θ φ(·, θ) µ(dθ)
No difficulty if Θ = {θ^(1), ..., θ^(M)} is finite and µ = Σ_{i=1}^M α_i δ_{θ^(i)} (the integral becomes a finite sum)

Approximate design theory: φ_AO(·) is concave when each φ(·, θ) is concave « same properties and same algorithms as in Section 3 for design measures

Exact design: same algorithms as in Section 3 (for continuous distributions µ use stochastic approximation to avoid evaluations of integrals [P & Walter 1985])
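A sketch for the slide's example η(x, θ) = exp(−θx), where p = 1 and M(ξ, θ) = ∫ x² e^{−2θx} ξ(dx): average-D-optimal design for a finite prior, using the vertex-direction step of Section 3 with a crude line search that keeps the ascent monotone (the prior support, design grid and tolerances are illustrative):

```python
import numpy as np

thetas = np.array([1.0, 2.0, 5.0])            # finite prior support, equal weights
xs = np.linspace(0.01, 3.0, 300)              # discretized design space X_l
m = xs[None, :]**2 * np.exp(-2.0 * thetas[:, None] * xs[None, :])  # M(delta_x, theta_j)

w = np.full(len(xs), 1.0 / len(xs))
alphas = np.concatenate(([0.0], np.logspace(-4, 0, 60)))
for _ in range(10000):
    Mj = m @ w                                # M(xi, theta_j), one scalar per theta_j
    d = (m / Mj[:, None]).mean(axis=0)        # directional derivative is d(x) - 1
    k = int(np.argmax(d))
    if d[k] - 1.0 < 1e-4:                     # equivalence-theorem stopping rule
        break
    vals = np.log((1.0 - alphas[:, None]) * Mj[None, :]
                  + alphas[:, None] * m[:, k][None, :]).mean(axis=1)
    a = alphas[int(np.argmax(vals))]          # crude line search, monotone ascent
    if a == 0.0:
        break
    w *= 1.0 - a
    w[k] += a
```

Unlike the locally optimal design δ_{1/θ0}, the average-optimal measure generally spreads its weight over several support points, one reason the averaged criterion is more robust.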

6 Nonlocal optimum design A/ Average Optimum design

A Bayesian interpretation: suppose µ = a prior distribution with density π(θ)
→ entropy −∫_Θ π(θ) log[π(θ)] dθ

Posterior distribution of θ: π(θ|X, y) = ϕ(y|X, θ) π(θ) / ϕ(y|X)
Gain in information = decrease of entropy
The entropy may increase, but the expected gain in information I(X) is always positive [Lindley 1956]:
« I(X) = E_y{∫_Θ (π(θ|X, y) log[π(θ|X, y)] − π(θ) log[π(θ)]) dθ}, where the expectation E_y{·} is for the marginal ϕ(y|X)
If the experiment is informative enough (σ² small, n large enough): maximizing I(X) is approximately equivalent to maximizing ∫ log det M(X, θ) π(θ) dθ

6 Nonlocal optimum design A/ Average Optimum design

Which prior π(θ)? The expected gain in information is maximum when π(·) = the noninformative (Jeffreys) prior
π∗(θ) = det^{1/2} M(ξ, θ) / ∫_Θ det^{1/2} M(ξ, θ) dθ « maximize ∫_Θ det^{1/2} M(ξ, θ) dθ
πν(θ) = det^{1/2} M(ν, θ) / ∫_Θ det^{1/2} M(ν, θ) dθ → uniform distribution of the responses η(·, θ) (for the metric defined by ν) [Bornkamp 2011]

Ex: η(x, θ) = exp(−θx), α-quantiles of η(x, θ) for different π
[Figure: α-quantiles of η(x, θ) as functions of x ∈ [0, 2], for π uniform on Θ = [1, 10] (left) and for πν with ν uniform on X = [0, 2] (right)]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 60 / 73

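The average criterion ∫ log det M(X, θ) π(θ) dθ above is easy to evaluate numerically. A minimal sketch (illustrative, not from the slides), assuming the one-parameter model η(x, θ) = exp(−θx), a one-point design δx (so M(δx, θ) = x² e^(−2θx)), θ uniform on [1, 10], and arbitrary grid and Monte Carlo sizes:

```python
import math
import random

# Average-optimal one-point design for eta(x, theta) = exp(-theta * x):
# maximize E_theta[ log M(delta_x, theta) ] with theta ~ uniform on [1, 10].
def log_info(x, theta):
    # log M(delta_x, theta) = log(x^2 * exp(-2 * theta * x))
    return 2.0 * math.log(x) - 2.0 * theta * x

random.seed(0)
thetas = [random.uniform(1.0, 10.0) for _ in range(2000)]  # sample from pi

def avg_criterion(x):
    return sum(log_info(x, t) for t in thetas) / len(thetas)

xs = [0.002 * i for i in range(1, 1001)]   # grid on (0, 2]
x_star = max(xs, key=avg_criterion)
print(x_star)   # for this model the maximizer is 1 / E[theta] = 1/5.5, about 0.182
```

Since only E[θ] enters the criterion here, the average-optimal point is 1/E[θ], between the locally optimal points 1/θ for the extreme θ values.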
6 Nonlocal optimum design B/ Maximin Optimum design

B/ Maximin Optimum design

φ(·, θ0) → φMmO(·) = min_{θ∈Θ} φ(·, θ)
Exact design:
Θ finite → same algorithms as in Section 3
Θ a compact subset of R^p → relaxation method: solve a sequence of maximin problems with finite (and growing) sets Θ(k) [P & Walter 1988]
Approximate design: φMmO(·) is concave when each φ(·, θ) is concave, L but φMmO(·) is non-differentiable!
« maximize φMmO(ξ) using a specific algorithm for concave non-differentiable maximization (cutting planes, level method. . . see Section 3)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 61 / 73

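The relaxation method of [P & Walter 1988] can be sketched as follows (an illustrative Python implementation, not the authors' code): maximin over a finite set Θ(k) of active parameter values, grown with the worst-case θ of the current design. Assumptions here: one-point designs δx, φ(ξ, θ) = D-efficiency of δx for η(x, θ) = exp(−θx), Θ = [1, 10], and grids in place of the continuous maximizations:

```python
import math

# D-efficiency of the one-point design at x for eta(x, theta) = exp(-theta*x);
# the locally optimal point is 1/theta, with information exp(-2)/theta^2.
def eff(x, theta):
    return (x * theta) ** 2 * math.exp(2 - 2 * theta * x)

X_GRID = [0.001 * i for i in range(1, 2001)]   # design grid on (0, 2]
TH_GRID = [1 + 0.01 * i for i in range(901)]   # Theta = [1, 10] grid

# Relaxation: maximin over a finite, growing set Theta_k.
Theta_k = [5.5]                                # initial guess
for _ in range(20):
    # 1) maximin design for the current finite set
    x_k = max(X_GRID, key=lambda x: min(eff(x, t) for t in Theta_k))
    # 2) worst-case theta over the full grid for this design
    t_worst = min(TH_GRID, key=lambda t: eff(x_k, t))
    if t_worst in Theta_k:                     # nothing new: stop
        break
    Theta_k.append(t_worst)

print(x_k, sorted(Theta_k))
# the active set ends up at the boundary {1, 10}; analytically the
# maximin point solves eff(x, 1) = eff(x, 10), i.e. x = ln(100)/18
```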
6 Nonlocal optimum design B/ Maximin Optimum design

How to check the optimality of ξ∗?
φ(·, θ0) differentiable: max_{x∈X} Fφ(ξ∗; δx, θ0) ≤ 0 ?
→ plot Fφ(ξ∗; δx, θ0) as a function of x

φMmO(·) not differentiable: max_{ν∈Ξ} FφMmO(ξ∗; ν) ≤ 0 cannot be exploited directly

Equivalence Theorem:
ξ∗ maximizes φMmO(ξ) ⇔ max_{ν∈Ξ} FφMmO(ξ∗; ν) ≤ 0
⇔ max_{ν∈Ξ} min_{θ∈Θ(ξ∗)} Fφ(ξ∗; ν, θ) ≤ 0, with Θ(ξ) = {θ : φ(ξ, θ) = φMmO(ξ)}
⇔ max_{x∈X} ∫_{Θ(ξ∗)} Fφ(ξ∗; δx, θ) µ∗(dθ) ≤ 0 for some probability measure µ∗ on Θ(ξ∗)

Once ξ∗ is determined, solve an LP problem:
µ∗ on Θ(ξ∗) minimizes max_{x∈X} ∫_{Θ(ξ∗)} Fφ(ξ∗; δx, θ) µ(dθ)
« plot ∫_{Θ(ξ∗)} Fφ(ξ∗; δx, θ) µ∗(dθ) as a function of x (should be ≤ 0)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 62 / 73

6 Nonlocal optimum design B/ Maximin Optimum design

Ex: η(x, θ) = θ1 exp(−θ2 x), p = 2, X = [0, 2], θ2 ∈ [0, θ2max]
φ(ξ, θ) = det^{1/p} M(ξ, θ) / det^{1/p} M(ξ∗D, θ) (D-efficiency, ∈ [0, 1])

[Figure: ∫_{Θ(ξ∗)} Fφ(ξ∗; δx, θ) µ∗(dθ) as a function of x, for θ2max = 2 (solid line, 2 support points) and θ2max = 20 (dashed line, 4 support points); both curves remain ≤ 0]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 63 / 73

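The locally D-optimal design ξ∗D(θ) entering the efficiency above can be checked by brute force. An illustrative sketch (equal weights and a coarse grid are assumed; the function names are mine): for this model the optimal support can be shown to be {0, 1/θ2}, since det M factorizes as (x2 − x1)² e^(−2θ2(x1+x2)) up to constants:

```python
import math

# Two-parameter model eta(x, theta) = theta1 * exp(-theta2 * x) on X = [0, 2].
def f(x, th1, th2):
    """Sensitivity vector d eta / d theta at x."""
    e = math.exp(-th2 * x)
    return (e, -th1 * x * e)

def det_M(x1, x2, th1, th2):
    """det M for the two-point design {x1, x2} with weights 1/2:
    det M = (1/4) * det([f(x1), f(x2)])^2."""
    a = f(x1, th1, th2)
    b = f(x2, th1, th2)
    return 0.25 * (a[0] * b[1] - a[1] * b[0]) ** 2

th1, th2 = 1.0, 2.0                       # nominal theta
grid = [0.005 * i for i in range(401)]    # grid on [0, 2]
best = max(((x1, x2) for x1 in grid for x2 in grid if x1 < x2),
           key=lambda p: det_M(p[0], p[1], th1, th2))
print(best)   # optimal support for this model: {0, 1/theta2} = (0, 0.5)
```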
6 Nonlocal optimum design C/ Regularized Maximin Optimum design

C/ Regularized Maximin Optimum design
Suppose φ(·, θ) > 0 for all θ; µ a probability measure on Θ ⊂ R^p compact
φMmO(ξ) = min_{θ∈Θ} φ(ξ, θ) ≤ φq(ξ) = [∫_Θ φ^{−q}(ξ, θ) µ(dθ)]^{−1/q} (differentiable)
with φ_{−1}(ξ) = φAO(ξ), φ0(ξ) = exp{∫_Θ log[φ(ξ, θ)] µ(dθ)}, and
φq(ξ) → φMmO(ξ) as q → ∞ (and φq(·) concave for q ≥ −1)
Moreover, Θ = {θ(1), . . . , θ(M)}, µ = (1/M) Σi δθ(i) =⇒ φMmO(ξ∗q)/φ∗MmO ≥ M^{−1/q}

Ex: η = exp(−θx), φ(ξ, θ) = M(ξ, θ)/M(ξ∗, θ) (= efficiency)
[Figure: φq(δx) as a function of x for q = −0.8, 0.4 and 40]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 64 / 73

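The regularized criterion φq is straightforward to compute for a finite Θ. An illustrative sketch (µ uniform on a grid, φ = efficiency for the one-parameter exponential model; all names are mine): a log-sum-exp form avoids overflow of φ^(−q) for large q, and the computation shows φMmO(ξ) ≤ φq(ξ) with convergence as q grows:

```python
import math

# phi(delta_x, theta) = efficiency for eta(x, theta) = exp(-theta * x).
def eff(x, theta):
    return (x * theta) ** 2 * math.exp(2 - 2 * theta * x)

Theta = [1.0 + 0.5 * i for i in range(19)]   # {1, 1.5, ..., 10}, M = 19

def phi_mmo(x):
    return min(eff(x, t) for t in Theta)

def phi_q(x, q):
    """phi_q = [ (1/M) * sum_i eff^(-q) ]^(-1/q), in log-sum-exp form."""
    L = [math.log(eff(x, t)) for t in Theta]
    a = max(-q * l for l in L)                       # shift for stability
    s = sum(math.exp(-q * l - a) for l in L)
    return math.exp(-(a + math.log(s / len(L))) / q)

x = 0.25
for q in (1, 10, 100, 1000):
    print(q, phi_q(x, q))    # decreases towards phi_mmo(x) as q grows
print(phi_mmo(x))
```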
6 Nonlocal optimum design D/ Quantiles and probability level criteria

D/ Quantiles and probability level criteria
A/ ξ∗AO good (on average) for µ(dθ) on Θ, but may be bad for some θ
L for ψ(·) increasing, maximizing ψ[∫_Θ φ(·, θ) µ(dθ)] (AO-optimality) is different from maximizing ∫_Θ ψ[φ(·, θ)] µ(dθ)
B/ ξ∗MmO often depends on the boundary of Θ
( we often simply replace the dependence on θ0 by a dependence on θmax)

u given → Pu(ξ) = µ{φ(ξ, θ) ≥ u}
α ∈ (0, 1) given → Qα(ξ) = max{u : Pu(ξ) ≥ 1 − α}
[Figure: p.d.f. of φ(ξ; θ), with mass Pu(ξ) = 1 − α to the right of u = Qα(ξ) and mass α to its left]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 65 / 73

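Pu(ξ) and Qα(ξ) are easy to approximate by Monte Carlo over µ. An illustrative sketch (one-point designs δx, φ = efficiency for η(x, θ) = exp(−θx), µ uniform on [1, 10]; the sample size is an arbitrary choice):

```python
import math
import random

def eff(x, theta):
    # efficiency of the one-point design at x for eta(x, theta) = exp(-theta*x)
    return (x * theta) ** 2 * math.exp(2 - 2 * theta * x)

random.seed(1)
thetas = [random.uniform(1.0, 10.0) for _ in range(20000)]   # sample from mu

def P_u(x, u):
    """P_u(xi) = mu{ phi(xi, theta) >= u } for xi = delta_x."""
    return sum(eff(x, t) >= u for t in thetas) / len(thetas)

def Q_alpha(x, alpha):
    """Q_alpha(xi) = max{ u : P_u(xi) >= 1 - alpha }: empirical alpha-quantile."""
    vals = sorted(eff(x, t) for t in thetas)
    return vals[int(alpha * len(vals))]

x = 0.25
print(P_u(x, 0.5), Q_alpha(x, 0.05), Q_alpha(x, 0.01))
# Q_alpha decreases towards min_theta eff(x, theta) as alpha -> 0
```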
6 Nonlocal optimum design D/ Quantiles and probability level criteria

® maximizing Pu(ξ) = µ{φ(ξ, θ) ≥ u} is well adapted to φ(ξ, θ) = efficiency ∈ (0, 1)
® Qα(ξ) → φMmO as α → 0
® for ψ(·) increasing, using ψ[φ(ξ, θ)] does not change Pu(ξ) and Qα(ξ)
L Pu(ξ) and Qα(ξ) are generally not concave! (but we can compute directional derivatives and maximize)

Ex: η = exp(−θx), φ(ξ, θ) = M(ξ, θ)
[Figure: Qα(δx) as a function of x for α = 0.5, 0.1 and 0.05, together with φAO(δx) and φMmO(δx)]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 66 / 73

6 Nonlocal optimum design E/ Sequential design

E/ Sequential design
θ0 → design: X1 = arg max_X φ(X, θ0) → observe: y1 = y1(X1)
→ estimate: θˆ1 = arg min_θ J(θ; y1, X1)
→ design: X2 = arg max_X φ({X1, X}, θˆ1) → observe: y2 = y2(X2)
→ estimate: θˆ2 = arg min_θ J(θ; {y1, y2}, {X1, X2}) (growing data set)
→ design: X3 = arg max_X φ({X1, X2, X}, θˆ2) . . . etc.

Replace the unknown θ by the best current guess θˆk (there exist variants with Bayesian estimation and average optimality)

L Consistency of θˆn? Asymptotic normality (for a design based on M)? (Xk depends on y1, . . . , yk−1 =⇒ independence is lost)

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 67 / 73

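The design/observe/estimate loop above can be sketched as a toy simulation (illustrative, not from the slides): batches of size 1 for the one-parameter model η(x, θ) = exp(−θx), least squares by grid search to keep the sketch dependency-free, and the next point chosen as the locally D-optimal x = 1/θˆ. The true θ, noise level and grid are arbitrary assumptions:

```python
import math
import random

random.seed(2)
theta_true, sigma = 3.0, 0.05
TH_GRID = [0.01 * i for i in range(100, 1001)]   # candidate theta in [1, 10]

def observe(x):
    # y = eta(x, theta_true) + epsilon, epsilon ~ N(0, sigma^2)
    return math.exp(-theta_true * x) + random.gauss(0.0, sigma)

def ls_estimate(X, Y):
    """theta_hat = arg min_theta J(theta; y, X), J = sum of squared residuals."""
    return min(TH_GRID,
               key=lambda t: sum((y - math.exp(-t * x)) ** 2
                                 for x, y in zip(X, Y)))

theta_hat = 5.0                                  # initial guess theta^0
X, Y = [], []
for k in range(30):
    x_next = 1.0 / theta_hat                     # maximizes M(x, theta_hat)
    X.append(x_next)
    Y.append(observe(x_next))
    theta_hat = ls_estimate(X, Y)                # re-estimate on growing data
print(theta_hat)                                 # should approach theta_true
```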
6 Nonlocal optimum design E/ Sequential design

« No problem if each Xi has size ≥ p = dim(θ) (batch sequential design)
If n observations in total and two stages only: size of the first batch? → should be proportional to √n (which does not say much. . . )

« Full sequential design: Xk = {xk} (batches of size 1) → convergence properties difficult to investigate
M(Xk+1, θˆk) = [k/(k + 1)] M(Xk, θˆk) + [1/(k + 1)] (∂η(xk+1, θ)/∂θ)(∂η(xk+1, θ)/∂θ>)|θ=θˆk
with xk+1 = arg max_X Fφ(ξk; δx | θˆk) ⇔ Wynn's algorithm [1970] with αk = 1/(k + 1)
® some CV results for Bayesian estimation [Hu 1998]
® no general CV results for LS and ML estimation; some results when X is finite (X = {x(1), . . . , x(`)}) [P 2009, 2010]

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 68 / 73 References ReferencesI
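The information-matrix update above with αk = 1/(k + 1) is exactly a step of Wynn's algorithm [1970]; it is easiest to see in the non-adaptive case, with θˆk frozen at a nominal θ0. An illustrative sketch (my own names and grids) for η(x, θ) = θ1 exp(−θ2 x) at θ0 = (1, 2) on X = [0, 2], where the added points accumulate on the locally D-optimal support {0, 1/θ2}:

```python
import math

th1, th2 = 1.0, 2.0   # nominal theta^0

def f(x):
    """Sensitivity vector d eta / d theta at x (2 components)."""
    e = math.exp(-th2 * x)
    return (e, -th1 * x * e)

grid = [0.01 * i for i in range(201)]
F = [(x, f(x)) for x in grid]

def variance(uv, M):
    """d(x, xi) = f(x)^T M^{-1} f(x) for a symmetric 2x2 matrix M."""
    (a, b), (c, d) = M
    u, v = uv
    return (d * u * u - (b + c) * u * v + a * v * v) / (a * d - b * c)

# start from a non-degenerate two-point design with weights 1/2
M = [[0.0, 0.0], [0.0, 0.0]]
for x0 in (0.1, 1.0):
    u, v = f(x0)
    M[0][0] += 0.5 * u * u; M[0][1] += 0.5 * u * v
    M[1][0] += 0.5 * u * v; M[1][1] += 0.5 * v * v

support = {}
for k in range(2, 2000):
    x_best, (u, v) = max(F, key=lambda p: variance(p[1], M))
    a_k = 1.0 / (k + 1)                          # Wynn step size alpha_k
    M = [[(1 - a_k) * M[0][0] + a_k * u * u, (1 - a_k) * M[0][1] + a_k * u * v],
         [(1 - a_k) * M[1][0] + a_k * u * v, (1 - a_k) * M[1][1] + a_k * v * v]]
    support[x_best] = support.get(x_best, 0) + 1

print(sorted(support))   # mass accumulates near {0, 1/theta2} = {0, 0.5}
```

By the Kiefer-Wolfowitz equivalence theorem, max_x f(x)ᵀM⁻¹f(x) should approach p = 2 as the design approaches D-optimality.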

Atkinson, A., Cox, D., 1974. Planning experiments for discriminating between models (with discussion). Journal of the Royal Statistical Society B36, 321–348.
Atkinson, A., Fedorov, V., 1975. The design of experiments for discriminating between two rival models. Biometrika 62 (1), 57–70.
Atwood, C., 1973. Sequences converging to D-optimal designs of experiments. Annals of Statistics 1 (2), 342–352.
Bates, D., Watts, D., 1980. Relative curvature measures of nonlinearity. Journal of the Royal Statistical Society B42, 1–25.
Böhning, D., 1985. Numerical estimation of a probability measure. Journal of Statistical Planning and Inference 11, 57–69.
Böhning, D., 1986. A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347.
Box, G., Hill, W., 1967. Discrimination among mechanistic models. Technometrics 9 (1), 57–71.
Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Math. Stat. 24, 586–602.
Clarke, G., 1980. Moments of the least-squares estimators in a non-linear regression model. Journal of the Royal Statistical Society B42, 227–237.
D’Argenio, D., 1981. Optimal sampling times for pharmacokinetic experiments. Journal of Pharmacokinetics and Biopharmaceutics 9 (6), 739–756.
Dette, H., Melas, V., 2011. A note on the de la Garza phenomenon for locally optimal designs. Annals of Statistics 39 (2), 1266–1281.
Fedorov, V., 1972. Theory of Optimal Experiments. Academic Press, New York.
Fedorov, V., Leonov, S., 2014. Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton.

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 69 / 73

References II

Fisher, R., 1925. Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh.
Gauchi, J.-P., Pázman, A., 2006. Designs in nonlinear regression by stochastic minimization of functionals of the square error matrix. Journal of Statistical Planning and Inference 136, 1135–1152.
Goodwin, G., Payne, R., 1977. Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, New York.
Hamilton, D., Watts, D., 1985. A quadratic design criterion for precise estimation in nonlinear regression models. Technometrics 27, 241–250.
Hill, P., 1978. A review of experimental design procedures for regression model discrimination. Technometrics 20, 15–21.
Hu, I., 1998. On sequential designs in nonlinear problems. Biometrika 85 (2), 496–503.
Kelley, J., 1960. The cutting plane method for solving convex programs. SIAM Journal 8, 703–712.
Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics 12, 363–366.
Ljung, L., 1987. System Identification, Theory for the User. Prentice-Hall, Englewood Cliffs.
Mitchell, T., 1974. An algorithm for the construction of “D-optimal” experimental designs. Technometrics 16, 203–210.
Nesterov, Y., 2004. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht.
Niederreiter, H., 1992. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia.
Pázman, A., 1986. Foundations of Optimum Experimental Design. Reidel (Kluwer group), Dordrecht (co-pub. VEDA, Bratislava).
Pázman, A., 1993. Nonlinear Statistical Models. Kluwer, Dordrecht.

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 70 / 73

References III

Pázman, A., Pronzato, L., 1992. Nonlinear experimental design based on the distribution of estimators. Journal of Statistical Planning and Inference 33, 385–402.
Pázman, A., Pronzato, L., 1996. A Dirac function method for densities of nonlinear statistics and for marginal densities in nonlinear regression. Statistics & Probability Letters 26, 159–167.
Pázman, A., Pronzato, L., 2014. Optimum design accounting for the global nonlinear behavior of the model. Annals of Statistics 42 (4), 1426–1451.
Pronzato, L., 2009. Asymptotic properties of nonlinear estimates in stochastic models with finite design space. Statistics & Probability Letters 79, 2307–2313.
Pronzato, L., 2010. One-step ahead adaptive D-optimal design on a finite design space is asymptotically optimal. Metrika 71 (2), 219–238 (DOI: 10.1007/s00184-008-0227-y).
Pronzato, L., Pázman, A., 1994. Second-order approximation of the entropy in nonlinear least-squares estimation. Kybernetika 30 (2), 187–198; Erratum 32 (1):104, 1996.
Pronzato, L., Pázman, A., 2013. Design of Experiments in Nonlinear Models. Asymptotic Normality, Optimality Criteria and Small-Sample Properties. Springer, LNS 212, New York.
Pronzato, L., Walter, E., 1985. Robust experiment design via stochastic approximation. Mathematical Biosciences 75, 103–120.
Pronzato, L., Walter, E., 1988. Robust experiment design via maximin optimization. Mathematical Biosciences 89, 161–176.
Pronzato, L., Zhigljavsky, A., 2014. Algorithmic construction of optimal designs on compact sets for concave and differentiable criteria. Journal of Statistical Planning and Inference 154, 141–155.
Pukelsheim, F., 1993. Optimal Experimental Design. Wiley, New York.
Pukelsheim, F., Rieder, S., 1992. Efficient rounding of approximate designs. Biometrika 79 (4), 763–770.

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 71 / 73

References IV

Schwabe, R., 1995. Designing experiments for additive nonlinear models. In: Kitsos, C., Müller, W. (Eds.), MODA4 – Advances in Model-Oriented Data Analysis, Spetses (Greece), June 1995. Physica Verlag, Heidelberg, pp. 77–85.
Silvey, S., 1980. Optimal Design. Chapman & Hall, London.
Titterington, D., 1976. Algorithms for computing D-optimal designs on a finite design space. In: Proc. of the 1976 Conference on Information Science and Systems. Dept. of Electronic Engineering, Johns Hopkins University, Baltimore, pp. 213–216.
Torsney, B., 1983. A moment inequality and monotonicity of an algorithm. In: Kortanek, K., Fiacco, A. (Eds.), Proc. Int. Symp. on Semi-infinite Programming and Applications. Springer, Heidelberg, pp. 249–260.
Torsney, B., 2009. W-iterations and ripples therefrom. In: Pronzato, L., Zhigljavsky, A. (Eds.), Optimal Design and Related Areas in Optimization and Statistics. Springer, Ch. 1, pp. 1–12.
Vila, J.-P., 1990. Exact experimental designs via stochastic optimization for nonlinear regression models. In: Proc. Compstat, Int. Assoc. for Statistical Computing. Physica Verlag, Heidelberg, pp. 291–296.
Vila, J.-P., Gauchi, J.-P., 2007. Optimal designs based on exact confidence regions for parameter estimation of a nonlinear regression model. Journal of Statistical Planning and Inference 137, 2935–2953.
Walter, E., Pronzato, L., 1994. Identification de Modèles Paramétriques à Partir de Données Expérimentales. Masson, Paris, 371 pages.
Walter, E., Pronzato, L., 1997. Identification of Parametric Models from Experimental Data. Springer, Heidelberg.
Welch, W., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria. Technometrics 24 (1), 41–48.

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 72 / 73

References V

Wu, C., 1978. Some algorithmic aspects of the theory of optimal designs. Annals of Statistics 6 (6), 1286–1301.
Wynn, H., 1970. The sequential generation of D-optimum experimental designs. Annals of Math. Stat. 41, 1655–1664.
Yang, M., 2010. On the de la Garza phenomenon. Annals of Statistics 38 (4), 2499–2524.
Yang, M., Biedermann, S., Tang, E., 2013. On optimal designs for nonlinear models: a general and efficient algorithm. Journal of the American Statistical Association 108 (504), 1411–1420.
Yu, Y., 2010. Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimal designs. Comput. Statist. Data Anal. 54, 1419–1425.
Yu, Y., 2011. D-optimal designs via a cocktail algorithm. Stat. Comput. 21, 475–481.
Zarrop, M., 1979. Optimal Experiment Design for Dynamic System Identification. Springer, Heidelberg.

Luc Pronzato (CNRS) Design of experiments in nonlinear models La Réunion, nov. 2016 73 / 73