
OPTIMAL DESIGN FOR MULTIPLE RESPONSES WITH

VARIANCE DEPENDING ON UNKNOWN PARAMETERS

Valerii Fedorov, Robert Gagnon, and Sergei Leonov

GSK BDS Technical Report 2001-03, August 2001

This paper was reviewed and recommended for publication by

Anthony C. Atkinson

London School of Economics

London, U.K.

John Peterson

Biomedical Sciences

GlaxoSmithKline Pharmaceuticals

Upper Merion, PA, U.S.A.

William F. Rosenberger

Department of Mathematics and Statistics

University of Maryland, Baltimore County

Baltimore, MD, U.S.A.

© Copyright 2001 by GlaxoSmithKline Pharmaceuticals

Biomedical Data Sciences

GlaxoSmithKline Pharmaceuticals

1250 South Collegeville Road, PO Box 5089

Collegeville, PA 19426-0989

Optimal Design for Multiple Responses with

Variance Depending on Unknown Parameters

Valerii FEDOROV, Robert GAGNON, and Sergei LEONOV

Biomedical Data Sciences

GlaxoSmithKline Pharmaceuticals

Abstract

We discuss optimal design for multiresponse models with a variance that depends on unknown parameters. The approach relies on optimization of convex functions of the Fisher information matrix. We propose iterated estimators which are asymptotically equivalent to maximum likelihood estimators. Combining these estimators with convex design theory leads to optimal design methods which can be used in the local optimality setting. A model with experimental costs is introduced which is studied within the normalized design paradigm and can be applied, for example, to the analysis of clinical trials with multiple endpoints.

Contents

1 Introduction

2 Regression Models and Maximum Likelihood Estimation

3 Iterated Estimators and Combined Least Squares

3.1 Multivariate linear regression with unknown but constant covariance matrix

4 Optimal Design of Experiments

4.1 Dose response model

5 Optimal Designs Under Cost Constraints

5.1 Two response functions with cost constraints

5.2 Linear regression with random parameters

6 Discussion

7 Appendix


1 Introduction

In many areas of research, including biomedical studies, investigators are faced with multiresponse models in which variation of the response is dependent upon unknown model parameters. This is a common issue, for example, in pharmacokinetics, dose response, repeated measures, random coefficients, and heteroscedastic regression models. Many estimation methods have been proposed for these situations, see for example Beal and Sheiner (1988), Davidian and Carroll (1987), Jennrich (1969), and Lindstrom and Bates (1990). In these models, as in all others, optimal allocation of resources through experimental design is essential. Optimal designs provide not only statistically optimal estimates of model parameters, but also ensure that investments of time and money are utilized to their fullest. In many cases, investigators must design studies in which they are subject to some type of constraint. One example is a cost constraint, in which the total budget for conducting the study is limited, and the study design must be adjusted not only to ensure that the budget is realized, but also to ensure that optimal estimation of parameters is achieved.

In this paper, we introduce an iterated estimator which is asymptotically equivalent to the maximum likelihood estimator (MLE). This iterated estimator is a natural generalization of the traditional iteratively reweighted least squares algorithms. It includes not only the squared deviations of the predicted responses from the observations, but also the squared deviations of the predicted dispersion matrix from observed residual matrices. In this way, our combined iterated estimator allows us to construct a natural extension from least squares estimation to the MLE. We show how to exploit classic optimal design methods and algorithms, and provide the reader with several examples which include a popular nonlinear dose response model and a linear random effects model. Finally, a model with experimental costs is introduced and studied within the framework of normalized designs. Among potential applications of this model is the analysis of clinical trials with multiple endpoints.

The paper is organized as follows. In Section 2, we introduce the model of observations and discuss classic results of maximum likelihood theory. In Section 3, the iterated estimator is introduced. Section 4 concentrates on optimal design problems. In Section 5, the model with experimental costs is studied within the normalized design paradigm. We conclude the paper with the Discussion. The Appendix contains proofs of some technical results.

2 Regression Models and Maximum Likelihood Estimation

In this section, we introduce the multiresponse regression model, with variance matrix depending upon unknown model parameters. Models of this type include repeated measures, random coefficients, and heteroscedastic regression, among others. We also present a brief review of maximum likelihood estimation, concluding with the asymptotic normality of the MLE. Note that the MLE for the regression models described herein does not yield closed form solutions, except in the simplest of cases. It is necessary, therefore, to resort to iterative procedures, and to rely on the convergence and asymptotic properties of these procedures for estimation and inference.

Let the observed $k \times 1$ vector $y$ have a normal distribution and

$$ E[y|x] = \eta(x,\theta), \qquad \mathrm{Var}[y|x] = S(x,\theta), \qquad (1) $$

where $\eta(x,\theta) = (\eta_1(x,\theta), \ldots, \eta_k(x,\theta))^T$, $S(x,\theta)$ is a $k \times k$ matrix, $x$ are independent variables (predictors), and $\theta \in \Theta \subset R^m$ are unknown parameters. In this case the score function of a single observation $y$ is given by

$$ R(y|x,\theta) = -\frac{1}{2}\,\frac{\partial}{\partial\theta} \left\{ \log|S(x,\theta)| + [y - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[y - \eta(x,\theta)] \right\}, $$

and the corresponding information matrix is (cf. Magnus and Neudecker (1988, Ch. 6), or Muirhead (1982, Ch. 1))

$$ \mu(x,\theta) = \mu(x,\theta;\eta,S) = \mathrm{Var}[R(y|x,\theta)] = \left[ \mu_{\alpha\beta}(x,\theta) \right]_{\alpha,\beta=1}^{m}, \qquad (2) $$

$$ \mu_{\alpha\beta}(x,\theta) = \frac{\partial\eta^T(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\, \frac{\partial\eta(x,\theta)}{\partial\theta_\beta} + \frac{1}{2}\, \mathrm{tr}\left[ S^{-1}(x,\theta)\, \frac{\partial S(x,\theta)}{\partial\theta_\alpha}\, S^{-1}(x,\theta)\, \frac{\partial S(x,\theta)}{\partial\theta_\beta} \right]. $$

In general, the dimension and structure of $y$, $\eta$, and $S$ can vary for different $x$. To indicate this, we should introduce a subscript $k_i$ for every $x_i$, but we do not use it, retaining the traditional notation $y_i$, $\eta(x_i,\theta)$ and $S(x_i,\theta)$ if it does not cause confusion. The log-likelihood function $L_N$ for $N$ independent observations $y_1, \ldots, y_N$ can be written as

$$ L_N(\theta) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ \log|S(x_i,\theta)| + [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i - \eta(x_i,\theta)] \right\}, \qquad (3) $$

and the information matrix is additive in this case, i.e.

$$ \mu_N(\theta) = \sum_{i=1}^{N} \mu(x_i,\theta). $$

Any vector $\hat{\theta}_N$ which maximizes the log-likelihood function $L_N(\theta)$,

$$ \hat{\theta}_N = \arg\max_{\theta \in \Theta} L_N(\theta), \qquad (4) $$

is called a maximum likelihood estimator (MLE). We introduce the following assumptions:

Assumption 1. The set $\Theta$ is compact; $x_i \in \mathcal{X}$ where $\mathcal{X}$ is compact, and all components of $\eta(x,\theta)$ and $S(x,\theta)$ are continuous with respect to $\theta$ uniformly in $x$, with $S(x,\theta) \geq S_0$, where $S_0$ is a positive definite matrix. The true vector of unknown parameters $\theta^*$ is an internal point of $\Theta$.

Assumption 2. The sum $\sum_i f(x_i,\theta,\theta^*)/N$ converges uniformly in $\Theta$ to a continuous function $\nu(\theta,\theta^*)$,

$$ \lim_{N\to\infty} N^{-1} \sum_{i=1}^{N} f(x_i,\theta,\theta^*) = \nu(\theta,\theta^*), \qquad (5) $$


where

$$ f(x,\theta;\theta^*) = \log|S(x,\theta)| + \mathrm{tr}\left[ S^{-1}(x,\theta)\, S(x,\theta^*) \right] + [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*) - \eta(x,\theta)], $$

and the function $\nu(\theta,\theta^*)$ attains its unique minimum at $\theta = \theta^*$.

Following Jennrich (1969, Theorem 6), it can be shown that under Assumptions 1 and 2, the MLE is a measurable function of observations and is strongly consistent; see also Fedorov (1974), Heyde (1997, Ch. 12), Pazman (1993), Wu (1981). Condition (5) is based on the fact that

$$ E\left\{ [y - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[y - \eta(x,\theta)] \right\} = [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta)\,[\eta(x,\theta^*) - \eta(x,\theta)] + \mathrm{tr}\left[ S^{-1}(x,\theta)\, S(x,\theta^*) \right], $$

and the Kolmogorov law of large numbers; see Rao (1973, Ch. 2c.3).

If in addition to the above assumptions all components of $\eta(x,\theta)$ and $S(x,\theta)$ are twice differentiable with respect to $\theta$ for all $\theta \in \Theta$, and the limit matrix

$$ M(\theta^*) = \lim_{N\to\infty} N^{-1} \sum_{i=1}^{N} \mu(x_i,\theta^*) \qquad (6) $$

exists and is regular, then $\hat{\theta}_N$ is asymptotically normally distributed, i.e.

$$ \sqrt{N}\,(\hat{\theta}_N - \theta^*) \sim N\!\left(0,\ M^{-1}(\theta^*)\right). \qquad (7) $$

Note that the selection of the series $\{x_i\}$ is crucial for the consistency and precision of $\hat{\theta}_N$.

Remark 1. Given $N$ and $\{x_i\}_1^N$, a design measure can be defined as

$$ \xi_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x), \qquad \delta_x(z) = \{1 \text{ if } z = x, \text{ and } 0 \text{ otherwise}\}. $$

If the sequence $\{\xi_N(x)\}$ weakly converges to $\xi(x)$, then the limiting function $\nu(\theta,\theta^*)$ in the "identifiability" Assumption 2 can be presented as

$$ \nu(\theta,\theta^*) = \int f(x,\theta;\theta^*)\, d\xi(x), $$

cf. Malyutov (1988). Most often, within the optimal design paradigm, the limit design $\xi$ is a discrete measure, i.e. a collection of support points $\{x_j,\ j = 1,\ldots,n\}$ with weights $p_j$, such that $\sum_j p_j = 1$; see Section 4.

j j


3 Iterated Estimators and Combined Least Squares

If the dispersion matrices of the observed $y_i$ are known, i.e. $S(x_i,\theta) \equiv S(x_i)$, then (3) leads to what is usually called the generalized least squares estimator:

$$ \tilde{\theta}_N = \arg\min_{\theta} \sum_{i=1}^{N} [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i)\,[y_i - \eta(x_i,\theta)], \qquad (8) $$

which is well studied. When the dispersion matrix $S$ depends on $\theta$, it is tempting to replace (8) by

$$ \tilde{\theta}_N = \arg\min_{\theta} \sum_{i=1}^{N} [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta)\,[y_i - \eta(x_i,\theta)], \qquad (9) $$

which in general is not consistent; cf. Fedorov (1974), Muller (1998).

Resorting to iteratively reweighted least squares (IRLS), see Beal and Sheiner (1988), Vonesh and Chinchilli (1997),

$$ \tilde{\theta}_N = \lim_{t\to\infty} \theta_t, \qquad (10) $$

$$ \theta_t = \arg\min_{\theta} \sum_{i=1}^{N} [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta_{t-1})\,[y_i - \eta(x_i,\theta)], $$

leads to a strongly consistent estimator with an asymptotic dispersion matrix

$$ \mathrm{Var}(\tilde{\theta}_N) \simeq \left[ \sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta}\, S^{-1}(x_i,\theta)\, \frac{\partial\eta(x_i,\theta)}{\partial\theta^T} \right]^{-1}_{\theta = \tilde{\theta}_N}, $$

which is bigger, in terms of non-negative definite matrix ordering, than the corresponding matrix for the MLE, defined in (6). For a discussion of related issues, see Jobson and Fuller (1980), Malyutov (1982).
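The iteration (10) is easy to code. The following minimal sketch illustrates its structure for a hypothetical single-response model; the response function eta, the variance model S, the simulated data, and the use of a general-purpose optimizer are assumptions of the sketch, not part of this report.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical single-response example: eta(x, theta) = theta0 * exp(-theta1 * x),
# with variance S(x, theta) = sigma2 * eta(x, theta)**2 (both illustrative choices).
def eta(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

def S(x, theta, sigma2=0.05):
    return sigma2 * eta(x, theta) ** 2

def irls(x, y, theta0, n_iter=20):
    """Iteratively reweighted least squares, cf. (10): at step t the weights
    S^{-1}(x_i, theta_{t-1}) are frozen and the weighted sum of squares is
    minimized over theta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        w = 1.0 / S(x, theta)                         # weights from the previous iterate
        obj = lambda th: np.sum(w * (y - eta(x, th)) ** 2)
        theta = minimize(obj, theta, method="Nelder-Mead").x
    return theta

rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 40)
theta_true = np.array([10.0, 0.7])
y = eta(x, theta_true) + rng.normal(scale=np.sqrt(S(x, theta_true)))
print(irls(x, y, theta0=[5.0, 1.0]))                  # should be close to theta_true
```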

The natural step after (10) is the introduction of the combined iterated least squares estimator, which includes squared deviations of the predicted dispersion matrix $S(x,\theta)$ from observed residual matrices:

$$ \hat{\theta}_N = \lim_{t\to\infty} \theta_t, \qquad (11) $$

where

$$ \theta_t = \arg\min_{\theta \in \Theta} v_N^2(\theta, \theta_{t-1}), $$

$$ v_N^2(\theta,\theta^0) = \sum_{i=1}^{N} [y_i - \eta(x_i,\theta)]^T S^{-1}(x_i,\theta^0)\,[y_i - \eta(x_i,\theta)] $$

$$ + \frac{1}{2} \sum_{i=1}^{N} \mathrm{tr}\left[ \left\{ [y_i - \eta(x_i,\theta^0)][y_i - \eta(x_i,\theta^0)]^T - \Delta(x_i,\theta,\theta^0) - S(x_i,\theta) \right\} S^{-1}(x_i,\theta^0) \right]^2, \qquad (12) $$

where

$$ \Delta(x_i,\theta,\theta^0) = [\eta(x_i,\theta) - \eta(x_i,\theta^0)][\eta(x_i,\theta) - \eta(x_i,\theta^0)]^T. $$


To prove the convergence of the combined iterated estimator, together with Assumption 1, we need the following:

Assumption 3. The variance function satisfies $S(x,\theta) \leq S_1$ for all $x \in \mathcal{X}$ and $\theta \in \Theta$, where $S_1$ is a positive definite matrix. Design measures $\xi_N(x)$ converge weakly to $\xi(x)$, and the function

$$ \tilde{\nu}(\theta,\theta^*) = \int \tilde{f}(x,\theta;\theta^*)\, d\xi(x) $$

is continuous with respect to $\theta$, and attains its unique minimum at $\theta = \theta^*$, where

$$ \tilde{f}(x,\theta;\theta^*) = [\eta(x,\theta^*) - \eta(x,\theta)]^T S_1^{-1}\,[\eta(x,\theta^*) - \eta(x,\theta)] + \mathrm{tr}\left[ \{S(x,\theta^*) - S(x,\theta)\}\, S_1^{-1} \right]^2. $$

The following theorem establishes the asymptotic equivalence of the combined iterated estimator (11), (12) and the MLE (4). The introduction of stationary points of the log-likelihood function in the statement of the theorem is similar to Cramer's (1946) definition of the MLE.

Theorem 1. Under the regularity Assumptions 1 and 3,

$$ \lim_{N\to\infty} P\{\hat{\theta}_N \in \Theta_N\} = 1, $$

where $\Theta_N$ is the set of stationary points of the log-likelihood function $L_N(\theta)$,

$$ \Theta_N = \left\{ \theta : \frac{\partial L_N(\theta)}{\partial\theta_j} = 0,\ j = 1,\ldots,m \right\}. $$

The proof of the theorem is postponed to the Appendix.

Remark 2. The introduction of the term $\Delta(x_i,\theta,\theta^0)$ in (12) together with Assumption 3 guarantees that for any $\theta^0$, the unique minimum of $\lim_{N\to\infty} E\left[v_N^2(\theta,\theta^0)\right]/N$ with respect to $\theta$ is attained at $\theta = \theta^*$; see Lemma 1 in the Appendix for details. Note that if the iterated estimator (11), (12) converges, then $\Delta(x_i,\theta_t,\theta_{t-1}) \to 0$ as $t \to \infty$. Therefore, in some situations this term can be omitted if a starting point $\theta_0$ is close enough to the true value; see Section 3.1 for an example.

Remark 3. If the function $\tilde{\nu}(\theta,\theta^*)$ defined in Assumption 3 attains its unique minimum at $\theta = \theta^*$, then so does the function $\nu(\theta,\theta^*) = \int f(x,\theta;\theta^*)\, d\xi(x)$, see Remark 1. To verify this, note that if $S$ and $S_*$ are positive definite matrices, then the matrix function

$$ g(S) = \log|S| + \mathrm{tr}\left[ S^{-1} S_* \right] $$

attains its unique minimum at $S = S_*$; see Seber (1984, Appendix A7).

^

2

^

 = arg min v (; );

N

N

 2

3 ITERATED ESTIMATORS AND COMBINED LEAST SQUARES 9

then using the approach describ ed in Fedorov (1974), it can veri ed that this \non-reweighted"

^

^

estimator  is, in general, inconsistent.

N

3.1 Multivariate linear regression with unknown but constant covariance matrix

Let $y$ be a $k \times 1$ normally distributed vector and $E[y|x] = F^T(x)\gamma$, $\mathrm{Var}[y|x] = S$, i.e.

$$ \theta^T = \left( \gamma^T;\ S_{11}, S_{21}, S_{22}, S_{31}, S_{32}, \ldots, S_{k1}, S_{k2}, \ldots, S_{kk} \right) = \left( \gamma^T,\ \mathrm{vech}^T(S) \right), $$

with $F(x) = [f_1(x), \ldots, f_k(x)]$. We follow Harville (1997, Ch. 16.4) in using the notation vech for the element-wise "vectorization" of a $k \times k$ symmetric matrix. The simplest estimator of the regression parameters is given by

$$ \tilde{\gamma}_N = \left( \sum_{i=1}^{N} F(x_i) F^T(x_i) \right)^{-1} \sum_{i=1}^{N} F(x_i)\, y_i, $$

which is unbiased (though not efficient) and thus provides a choice of $\theta^0$ which allows for dropping the term $\Delta(x_i,\theta,\theta^0)$ on the right-hand side of (12); see Remark 2. In this case, the modified function $v_N^2(\theta,\theta^0)$ can be presented as

$$ v_N^2(\theta,\theta^0) = \sum_{i=1}^{N} \left[ y_i - F^T(x_i)\gamma \right]^T (S^0)^{-1} \left[ y_i - F^T(x_i)\gamma \right] $$

$$ + \frac{1}{2} \sum_{i=1}^{N} \mathrm{tr}\left[ \left\{ [y_i - F^T(x_i)\gamma^0][y_i - F^T(x_i)\gamma^0]^T - S \right\} (S^0)^{-1} \right]^2. \qquad (13) $$

Parameters $\gamma$ and $S$ in (13) are nicely partitioned and if $k_i \equiv k$, then each step of the iterative procedure (11) can be presented in closed form,

$$ \gamma_t = M_{t-1}^{-1} Y_{t-1} \quad \text{and} \quad S_t = N^{-1} \sum_{i=1}^{N} \left[ y_i - F^T(x_i)\gamma_{t-1} \right]\left[ y_i - F^T(x_i)\gamma_{t-1} \right]^T, \qquad (14) $$

where

$$ M_t = \sum_{i=1}^{N} F(x_i)\, S_t^{-1} F^T(x_i), \qquad Y_t = \sum_{i=1}^{N} F(x_i)\, S_t^{-1} y_i, $$

cf. Fedorov (1977). Consequently,

$$ \hat{\gamma}_N = \lim_{t\to\infty} \gamma_t \quad \text{and} \quad \hat{S}_N = \lim_{t\to\infty} S_t. $$

The information matrix of $\hat{\theta}_N = \left( \hat{\gamma}_N^T,\ \mathrm{vech}^T(\hat{S}_N) \right)^T$ is blockwise with respect to $\gamma$ and $\mathrm{vech}(S)$, therefore the asymptotic dispersion matrices can be computed separately. For instance,

$$ \mathrm{Var}(\hat{\gamma}_N) \simeq \left[ \sum_{i=1}^{N} F(x_i)\, \hat{S}_N^{-1} F^T(x_i) \right]^{-1}. $$

Note that the second formula on the right-hand side of (14) is valid only if all $k$ components are measured at all points $x_i$. Otherwise, $S_t$ cannot be presented in closed form for any nondiagonal case.
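The closed-form recursion (14) is straightforward to implement. Below is a minimal sketch on simulated data; the regressors in F(x), the true values of gamma and S, and the fixed number of iterations are illustrative assumptions, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(1)
k, N = 2, 200                                          # responses and observations

def F(x):
    # Hypothetical 3 x 2 regressor matrix, so that E[y | x] = F^T(x) gamma:
    # response 1 is linear in x, response 2 is quadratic; shared intercept.
    return np.array([[1.0, 1.0],
                     [x,   0.0],
                     [0.0, x**2]])

gamma_true = np.array([1.0, 2.0, -0.5])
S_true = np.array([[1.0, 0.3],
                   [0.3, 0.5]])

xs = rng.uniform(-1, 1, size=N)
ys = np.array([F(x).T @ gamma_true +
               rng.multivariate_normal(np.zeros(k), S_true) for x in xs])

# Start from ordinary least squares (unbiased, see the text) and iterate (14).
M0 = sum(F(x) @ F(x).T for x in xs)
gamma = np.linalg.solve(M0, sum(F(x) @ y for x, y in zip(xs, ys)))
S = np.eye(k)
for _ in range(50):
    resid = np.array([y - F(x).T @ gamma for x, y in zip(xs, ys)])
    S = resid.T @ resid / N                            # second formula in (14)
    Sinv = np.linalg.inv(S)
    M = sum(F(x) @ Sinv @ F(x).T for x in xs)
    Y = sum(F(x) @ Sinv @ y for x, y in zip(xs, ys))
    gamma = np.linalg.solve(M, Y)                      # first formula in (14)

print(gamma, S, sep="\n")                              # close to gamma_true and S_true
```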


4 Optimal Design of Experiments

As soon as the information matrix of a single measurement is defined, it is a rather straightforward task to construct numerical algorithms and derive their properties in the optimal design theory setting. Let $n_i$ measurements be taken at point $x_i$, and let $\sum_{i=1}^{n} n_i = N$, where $n$ is the number of distinct $x_i$.

Let the design $\xi_N$ be defined as

$$ \xi_N = \left\{ (n_i, x_i)_1^n;\ \sum_{i=1}^{n} n_i = N,\ x_i \in \mathcal{X} \right\}, $$

where $\mathcal{X}$ is a design region. Each design generates the information matrix

$$ M_N(\theta) = \sum_{i=1}^{n} n_i\, \mu(x_i,\theta) = N \sum_{i=1}^{n} p_i\, \mu(x_i,\theta) = N M(\xi,\theta), \quad \text{where } M(\xi,\theta) = \sum_{i=1}^{n} p_i\, \mu(x_i,\theta), $$

weights $p_i = n_i/N$, $M(\xi,\theta)$ is a normalized information matrix, and $\xi = \{x_i, p_i\}$ is a normalized (continuous, or approximate) design. In this setting $N$ may be viewed as a resource available to a practitioner; see Section 5 for a different normalization. In convex design theory, it is standard to allow the weights $p_i$ to vary continuously. The goal is to minimize various functionals $\Psi$ depending on the normalized variance matrix $M^{-1}(\xi,\theta)$:

$$ \xi^* = \arg\min_{\xi} \Psi\left[ M^{-1}(\xi,\theta) \right]. $$

Among popular optimality criteria are:

• D-criterion, $\Psi = \log|M^{-1}(\xi,\theta)|$, which is often called a generalized variance and is related to the volume of the confidence ellipsoid;

• A-criterion (linear), $\Psi = \mathrm{tr}\left[ A M^{-1}(\xi,\theta) \right]$, where $A$ is an $m \times m$ non-negative definite matrix and $m$ is the number of unknown parameters in the model.

Using classic optimal design techniques, one can establish an analog of the generalized equivalence theorem. A necessary and sufficient condition for a design $\xi^*$ to be locally optimal is the inequality

$$ \psi(x,\xi^*,\theta) = \mathrm{tr}\left[ \mu(x,\theta)\, M^{-1}(\xi^*,\theta) \right] \leq m, \quad m = \dim\theta, \qquad (15) $$

in the case of the D-criterion, and

$$ \psi(x,\xi^*,\theta) = \mathrm{tr}\left[ \mu(x,\theta)\, M^{-1}(\xi^*,\theta)\, A\, M^{-1}(\xi^*,\theta) \right] \leq \mathrm{tr}\left[ A M^{-1}(\xi^*,\theta) \right] \qquad (16) $$

in the case of the linear criterion. Equality in (15), or (16), is attained at the support points of the optimal design $\xi^*$. The function $\psi(x,\xi,\theta)$ is often called a sensitivity function of the corresponding criterion; see Fedorov and Hackl (1997, Ch. 2).

For a single response function, $k = 1$, the sensitivity function of the D-criterion can be rewritten as

$$ \psi(x,\xi,\theta) = \frac{d_1(x,\xi,\theta)}{S(x,\theta)} + \frac{d_2(x,\xi,\theta)}{2 S^2(x,\theta)}, \qquad (17) $$

where

$$ d_1(x,\xi,\theta) = \frac{\partial\eta(x,\theta)}{\partial\theta^T}\, M^{-1}(\xi,\theta)\, \frac{\partial\eta(x,\theta)}{\partial\theta}, \qquad d_2(x,\xi,\theta) = \frac{\partial S(x,\theta)}{\partial\theta^T}\, M^{-1}(\xi,\theta)\, \frac{\partial S(x,\theta)}{\partial\theta}, \qquad (18) $$

see Downing et al. (2001). The scalar case was extensively discussed by Atkinson and Cook (1995) for various partitioning schemes of the parameter vector $\theta$, including separate and overlapping parameters in the variance function.

Example 3.1 (continued). As mentioned above, in this example the information matrix is blockwise with respect to $\gamma$ and $S$, i.e.

$$ \mu(x,\theta) = \begin{pmatrix} \mu_\gamma(x) & 0 \\ 0 & \mu_{SS} \end{pmatrix}, \quad \text{and} \quad M(\xi,\theta) = \begin{pmatrix} M_\gamma(\xi,\theta) & 0 \\ 0 & \mu_{SS} \end{pmatrix}, $$

where $\mu_\gamma(x) = F(x)\, S^{-1} F^T(x)$, $M_\gamma(\xi,\theta) = \sum_i p_i\, \mu_\gamma(x_i)$, the $0$'s are zero matrices of proper size, and $\mu_{SS}$ does not depend on $x$. Therefore, (15) admits the following presentation:

$$ \mathrm{tr}\left[ S^{-1} F^T(x)\, M_\gamma^{-1}(\xi^*,\theta)\, F(x) \right] \leq \dim\gamma. \qquad (19) $$

The matrix $d_1 = F^T(x)\, M_\gamma^{-1}(\xi^*,\theta)\, F(x)$ is the asymptotic dispersion matrix of the predicted response vector $F^T(x)\hat{\gamma}$ at point $x$.

Note that in general the design $\xi^*$ may depend on $S$, i.e. (15) or (16) lead to a locally optimal design. Formulas like (15) and (16) provide a basis for first order numerical algorithms similar to those discussed in major texts on experimental design; cf. Atkinson and Donev (1992), Fedorov and Hackl (1997). While not discussing these algorithms in general, we provide some special cases in the examples.
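To make the use of (15) concrete, here is a minimal sketch of a first-order algorithm of the type referenced above, for D-optimality on a finite candidate grid; the information-matrix function, the candidate grid, the step-size rule and the stopping tolerance are illustrative assumptions rather than the specific algorithm used in this report.

```python
import numpy as np

def d_optimal_weights(mus, n_iter=2000, tol=1e-6):
    """First-order algorithm for D-optimality on a finite grid.

    mus: list of m x m single-point information matrices mu(x_j, theta).
    At each step the design is shifted towards the candidate point with the
    largest sensitivity psi(x, xi, theta) = tr[mu(x) M^{-1}(xi)], cf. (15);
    at the optimum max_x psi <= m (generalized equivalence theorem).
    """
    n, m = len(mus), mus[0].shape[0]
    w = np.full(n, 1.0 / n)                       # start from the uniform design
    for t in range(1, n_iter + 1):
        M = sum(wj * mu for wj, mu in zip(w, mus))
        Minv = np.linalg.inv(M)
        psi = np.array([np.trace(mu @ Minv) for mu in mus])
        j = int(np.argmax(psi))
        if psi[j] <= m + tol:                     # equivalence-theorem check
            break
        alpha = 1.0 / (t + 1)                     # simple diminishing step
        w = (1 - alpha) * w
        w[j] += alpha
    return w

# Illustrative example: single response eta = theta1 + theta2*x + theta3*x^2,
# constant variance, so mu(x) = f(x) f(x)^T with f(x) = (1, x, x^2)^T.
grid = np.linspace(-1, 1, 41)
mus = [np.outer([1, x, x**2], [1, x, x**2]) for x in grid]
w = d_optimal_weights(mus)
print([(round(x, 2), round(wj, 3)) for x, wj in zip(grid, w) if wj > 1e-3])
# mass concentrates near x = -1, 0, 1 with weights of roughly 1/3 each
```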

4.1 Dose response model

In dose response studies, the response is often described by a logistic function

$$ \eta(x,\theta) = \eta(x,\gamma) = \gamma_2 + \frac{\gamma_1 - \gamma_2}{1 + (x/\gamma_3)^{\gamma_4}}, \qquad (20) $$

where $x$ is a given dose. The power model is a popular choice for the variance function,

$$ S(x,\theta) = \sigma^2\, \eta^{\delta}(x,\gamma), \qquad \theta^T = (\gamma^T, \sigma^2, \delta). \qquad (21) $$

To illustrate D-optimal design for this model, we use data from a study on the ability of a compound to inhibit the proliferation of bone marrow erythroleukemia cells in a cell-based assay; see Downing et al. (2001). The vector of unknown parameters was estimated by fitting the data collected from a two-fold dilution design covering a range of concentrations from 0.98 to 500 ng/ml. Thus the design region was set to [-0.02, 6.21] on the log-scale. Fig. 1 presents the locally optimal design and the variance of prediction for the model (20), (21), where $\hat{\theta} = (616,\ 1646,\ 75.2,\ 1.34,\ 0.33,\ 0.90)^T$.
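For the scalar-response case, the information matrix (2) reduces to $\mu(x,\theta) = S^{-1}(\partial\eta/\partial\theta)(\partial\eta/\partial\theta)^T + (2S^2)^{-1}(\partial S/\partial\theta)(\partial S/\partial\theta)^T$, cf. (17)-(18). The sketch below evaluates it for the model (20), (21) as reconstructed above, using numerical derivatives; the finite-difference step and the example dose are arbitrary choices for illustration.

```python
import numpy as np

def eta(x, theta):
    g1, g2, g3, g4, sigma2, delta = theta
    return g2 + (g1 - g2) / (1.0 + (x / g3) ** g4)     # logistic response (20)

def S(x, theta):
    sigma2, delta = theta[4], theta[5]
    return sigma2 * eta(x, theta) ** delta             # power variance model (21)

def grad(f, x, theta, h=1e-5):
    """Central finite-difference gradient of f(x, theta) with respect to theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for j in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += h; tm[j] -= h
        g[j] = (f(x, tp) - f(x, tm)) / (2 * h)
    return g

def mu(x, theta):
    """Single-observation information matrix (2) for a scalar response."""
    de, dS, s = grad(eta, x, theta), grad(S, x, theta), S(x, theta)
    return np.outer(de, de) / s + np.outer(dS, dS) / (2 * s ** 2)

theta_hat = np.array([616.0, 1646.0, 75.2, 1.34, 0.33, 0.90])  # estimates cited in the text
print(mu(50.0, theta_hat).shape)                                # 6 x 6 information matrix
```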

[Figure 1. Upper left: response. Upper right: normalized variance for uniform design. Lower right: normalized variance for optimal design. Lower left: unnormalized variance (triangles - optimal design, circles - serial dilution).]

The two subplots on the right side of Fig. 1 present the normalized variance of prediction $\psi(x,\xi,\theta)$ defined in (17), for a serial dilution design $\xi_0$ (i.e. uniform on the log-scale, upper

right) and the D-optimal design $\xi^*$ (lower right). The solid lines show the function $\psi(x,\xi,\theta)$ while the dashed and dotted lines display the 1st and 2nd terms on the right-hand side of (17), respectively. The unnormalized variance of prediction $d_1(x,\xi,\theta)$ defined in (18) is given in the lower-left subplot. It is worth noting that the optimal design in our example is supported at just four points, which is less than the number of estimated parameters. We also remark that the weights of the support points are not equal; in our example $p = \{0.28,\ 0.22,\ 0.22,\ 0.28\}$.

5 Optimal Designs Under Cost Constraints

Traditionally when normalized designs are discussed, the normalization factor is equal to the number of experiments $N$; see Section 3. Now let each measurement at point $x_i$ be associated with a cost $c(x_i)$, and let there exist a restriction on the total cost,

$$ \sum_{i=1}^{n} n_i\, c(x_i) \leq C. \qquad (22) $$

In this case it is quite natural to normalize the information matrix by the total cost $C$ and introduce

$$ M_C(\xi,\theta) = \frac{M_N(\theta)}{C} = \sum_i w_i\, \tilde{\mu}(x_i,\theta), \quad \text{with } w_i = \frac{n_i\, c(x_i)}{C}, \quad \tilde{\mu}(x,\theta) = \frac{\mu(x,\theta)}{c(x)}. \qquad (23) $$

Note that the case considered here should not be confused with the case when, in addition to (23), one also imposes that $\sum_i n_i \leq N$. The corresponding design problem is more complicated and must be addressed as discussed in Cook and Fedorov (1995).

As soon as the cost function $c(x)$ is defined, one can use the well elaborated techniques of constructing continuous designs for various criteria of optimality,

$$ \Psi\left[ M_C^{-1}(\xi,\theta) \right] \to \min_{\xi}, \quad \text{where } \xi = \{w_i, x_i\}. $$

As usual, to obtain frequencies $n_i$, the values $\tilde{n}_i = w_i C / c(x_i)$ have to be rounded to the nearest integers $n_i$ subject to $\sum_i n_i\, c(x_i) \leq C$; for details on rounding, see Pukelsheim (1993, Ch. 12).
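A minimal sketch of the conversion from cost-normalized weights to integer frequencies is given below; the simple floor-then-greedy rounding rule is only one possible heuristic, not the procedure of Pukelsheim (1993) referenced above.

```python
import numpy as np

def round_to_frequencies(w, c, C):
    """Convert cost-normalized design weights w_i into integer counts n_i.

    Target values are n~_i = w_i * C / c(x_i); round down, then greedily add
    measurements (largest unmet remainder first) while the total cost
    sum_i n_i c(x_i) stays within the budget C.
    """
    w, c = np.asarray(w, float), np.asarray(c, float)
    target = w * C / c
    n = np.floor(target).astype(int)
    for j in np.argsort(-(target - n)):           # spend the remaining budget
        if (n * c).sum() + c[j] <= C:
            n[j] += 1
    return n

# Illustrative numbers: three support points, weights from a continuous design.
print(round_to_frequencies(w=[0.5, 0.3, 0.2], c=[2.0, 1.0, 1.0], C=40.0))
```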

To illustrate the potential of this approach, we construct D-optimal designs for the two-dimensional response function $\eta(x,\theta) = [\eta_1(x,\theta), \eta_2(x,\theta)]^T$, with the variance matrix

$$ S(x,\theta) = \begin{pmatrix} S_{11}(x,\theta) & S_{12}(x,\theta) \\ S_{12}(x,\theta) & S_{22}(x,\theta) \end{pmatrix}. \qquad (24) $$

Let a single measurement of function $\eta_i(x,\theta)$ cost $c_i(x)$, $i = 1, 2$. Additionally, we impose a cost $c_v(x)$ on any single or pair of measurements. The rationale behind this model comes from considering a hypothetical visit of a patient to the clinic to participate in a clinical trial. It is assumed that each visit costs $c_v(x)$, where $x$ denotes a patient (or more appropriately, some patient's characteristics). There are three options for each patient:

(1) Take test $t_1$ which by itself costs $c_1(x)$; the total cost of this option is $C_1(x) = c_v(x) + c_1(x)$.

(2) Take test $t_2$ which costs $c_2(x)$; the total cost is $C_2(x) = c_v(x) + c_2(x)$.

(3) Take both tests $t_1$ and $t_2$; in this case the cost is $C_3(x) = c_v(x) + c_1(x) + c_2(x)$.

Another interpretation could be measuring a pharmacokinetic profile (blood concentration) at one or two time points.

To consider this example within the traditional framework, introduce binary variables $x_1$ and $x_2$, $x_i = \{0 \text{ or } 1\}$, $i = 1, 2$. Let $X = (x, x_1, x_2)$, where $x$ belongs to a "traditional" design region $\mathcal{X}$, and the pair $(x_1, x_2)$ belongs to

$$ \mathcal{X}_{12} = \left\{ (x_1, x_2) : x_i = 0 \text{ or } 1,\ \max(x_1, x_2) = 1 \right\}. $$

Define

$$ \eta(X,\theta) = I_{x_1,x_2}\, \eta(x,\theta), \qquad S(X,\theta) = I_{x_1,x_2}\, S(x,\theta)\, I_{x_1,x_2}, \qquad (25) $$

where

$$ I_{x_1,x_2} = \begin{pmatrix} x_1 & 0 \\ 0 & x_2 \end{pmatrix}. $$


Now introduce the "extended" design region $\mathbb{X}$,

$$ \mathbb{X} = \mathcal{X} \times \mathcal{X}_{12} = Z_1 \cup Z_2 \cup Z_3, \qquad (26) $$

where

$$ Z_1 = \{(x, x_1, x_2) : x \in \mathcal{X},\ x_1 = 1,\ x_2 = 0\}, \qquad Z_2 = \{(x, x_1, x_2) : x \in \mathcal{X},\ x_1 = 0,\ x_2 = 1\}, $$

$$ Z_3 = \{(x, x_1, x_2) : x \in \mathcal{X},\ x_1 = x_2 = 1\}, \quad \text{and} \quad Z_1 \cap Z_2 \cap Z_3 = \emptyset. $$

The normalized information matrix $M_C(\xi,\theta)$ and the design $\xi$ are defined as

$$ M_C(\xi,\theta) = \sum_{i=1}^{n} w_i\, \tilde{\mu}(X_i,\theta), \qquad \sum_{i=1}^{n} w_i = 1, \qquad \xi = \{X_i, w_i\}, $$

where $\tilde{\mu}(X,\theta) = \mu(X,\theta)/C_i(x)$ if $X \in Z_i$, $i = 1, 2, 3$, with $\mu(X,\theta)$ defined in (2), and $\eta(X,\theta)$, $S(X,\theta)$ introduced in (25).

Note that the generalization to $k > 2$ is straightforward; one has to introduce $k$ binary variables $x_i$ and the matrix $I_{x_1,\ldots,x_k} = \mathrm{diag}(x_i)$. The number of subregions $Z_i$ in this case is equal to $2^k - 1$.

The formulation (26) will be used in the example below to demonstrate the performance of various designs with respect to the sensitivity function $\psi(X,\xi,\theta)$, $X \in Z_i$.

[Figure 2. Response functions $\eta_1(x,\theta) = \gamma_1 + \gamma_2 x + \gamma_3 x^2 + \gamma_4 x^3$ and $\eta_2(x,\theta) = \gamma_1 + \gamma_5 x + \gamma_6 x_+$; parameter $\gamma = (1, 2, 3, 1, 2, 1.5)^T$.]


5.1 Two response functions with cost constraints

For the example, we selected the functions

$$ \eta_1(x,\theta) = \gamma_1 + \gamma_2 x + \gamma_3 x^2 + \gamma_4 x^3 = F_1^T(x)\gamma, \qquad \eta_2(x,\theta) = \gamma_1 + \gamma_5 x + \gamma_6 x_+ = F_2^T(x)\gamma, $$

where $F_1(x) = (1, x, x^2, x^3, 0, 0)^T$, $F_2(x) = (1, 0, 0, 0, x, x_+)^T$, $x \in \mathcal{X} = [-1, 1]$, and $x_+ = \{x \text{ if } x \geq 0, \text{ and } 0 \text{ otherwise}\}$; see Fig. 2. The cost functions are selected as constants $c_v, c_1, c_2 \in [0, 1]$ and do not depend on $x$. Similarly, the variance matrix $S(x,\theta)$ is constant,

$$ S(x,\theta) = \begin{pmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{pmatrix}. $$

In our computations, we take $S_{11} = S_{22} = 1$, $S_{12} = \rho$, $0 \leq \rho \leq 1$, thus changing the value of $S_{12}$ only. Note that $\theta = (\gamma_1, \gamma_2, \ldots, \gamma_6, S_{11}, S_{12}, S_{22})^T$. The functions $\eta_1, \eta_2$ are linear with respect to the unknown parameters $\gamma$, thus optimal designs do not depend on their values. On the contrary, in this example optimal designs do depend on the values of the variance parameters $S_{ij}$, i.e. we construct locally optimal designs with respect to their values (compare Figures 4 and 5). We considered a rather simple example to illustrate the approach. Nevertheless, it allows us to demonstrate how the change in cost functions and variance parameter $\rho$ affects the selection of design points.
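The sketch below shows one way the extended design points $X = (x, x_1, x_2)$ of (25)-(26) might be turned into cost-normalized information matrices for this example. It assembles only the mean-parameter ($\gamma$) block, treats the single-test subregions through the observed component alone, and uses the costs and correlation of the first run discussed below; all of these choices are assumptions of the sketch.

```python
import numpy as np

# Regressors for the two response functions of Section 5.1.
F1 = lambda x: np.array([1.0, x, x**2, x**3, 0.0, 0.0])
F2 = lambda x: np.array([1.0, 0.0, 0.0, 0.0, x, max(x, 0.0)])

S = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # S11 = S22 = 1, S12 = rho = 0 (first run)
cv, c1, c2 = 1.0, 0.0, 0.0          # visit and per-test costs of the first run

def mu_gamma(x, region):
    """Cost-normalized gamma-block of the information matrix at X = (x, x1, x2):
    region = 1 (test 1 only), 2 (test 2 only), 3 (both tests)."""
    if region == 1:
        m, cost = np.outer(F1(x), F1(x)) / S[0, 0], cv + c1
    elif region == 2:
        m, cost = np.outer(F2(x), F2(x)) / S[1, 1], cv + c2
    else:
        Fx = np.column_stack([F1(x), F2(x)])              # 6 x 2
        m, cost = Fx @ np.linalg.inv(S) @ Fx.T, cv + c1 + c2
    return m / cost

# Candidate extended design points: a grid on [-1, 1] in each subregion Z1, Z2, Z3.
grid = np.linspace(-1, 1, 21)
candidates = [(x, r) for r in (1, 2, 3) for x in grid]
mus = [mu_gamma(x, r) for x, r in candidates]
print(len(mus), mus[0].shape)       # 63 candidate matrices, each 6 x 6
```

These candidate matrices could then be fed to a first-order weight-optimization routine such as the one sketched in Section 4.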

[Figure 3. Sensitivity function $\psi(X,\xi^*,\theta)$ and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0$, $\rho = 0$. Panels (top to bottom): 1st function only (cost 1.0 + 0.0), 2nd function only (cost 1.0 + 0.0), both functions (cost 1.0 + 0.0 + 0.0); the support points lie in $Z_3$ with weights 0.32, 0.16, 0.04, 0.16, 0.32.]

For the first run, we choose $c_v = 1$, $c_1 = c_2 = 0$, $\rho = 0$; see Fig. 3, which shows the sensitivity function $\psi(X,\xi^*,\theta) = \mathrm{tr}\left[ \tilde{\mu}(X,\theta)\, M_C^{-1}(\xi^*,\theta) \right]$ for $X \in Z_j$, $j = 1, 2, 3$. Not surprisingly, in this case the selected design points lie in subregion $Z_3$. Indeed, since individual measurements cost nothing, it is beneficial to take two measurements instead of a single one to gain more information and to decrease the variability of parameter estimates. The weights of the support points are shown in the plot, which illustrates the generalized equivalence theorem: the sensitivity function hits the reference line $m = 9$ at the support points of the D-optimal design; recall that $\dim(\theta) = 9$.

If we introduce positive costs for the individual measurements, then the weights are redistributed. The case $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0$ is presented in Fig. 4. Compared to Fig. 3, the design weights in the middle of subregion $Z_3$ shift to subregion $Z_1$, where two new points appear: $x_{4,5} = \pm 0.45$ with weights $w_{4,5} = 0.18$. It is interesting that in this case no support points appear in subregion $Z_2$ (i.e. measuring the 2nd function only).

[Figure 4. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0$. Support points: $x = \pm 0.45$ in $Z_1$ with weights 0.18 each, and points in $Z_3$ with weights 0.31, 0.02, 0.31.]

The next case deals with positive correlation $\rho = 0.3$ and $c_v = 1$, $c_1 = c_2 = 0.5$; see Fig. 5. Now there are just 4 support points in the design: two of them are at the boundaries of subregion $Z_3$ with weights $w_{1,2} = 0.33$, and the other two are in the middle of subregion $Z_1$, $x_{3,4} = \pm 0.45$, with weights $w_{3,4} = 0.17$.

So far, subregion $Z_2$ has not been represented in the optimal designs. Fig. 6 illustrates a case when support points appear in this subregion. For this, we take $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\rho = 0.2$. A design point $x_5 = 0$ appears in the center of $Z_2$ with weight $w_5 = 0.1$. Not surprisingly, subregion $Z_1$ has no support points in this example since the cost of measuring function $\eta_1$ is much higher than for function $\eta_2$.


[Figure 5. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = c_2 = 0.5$, $\rho = 0.3$. Support points: $x = \pm 0.45$ in $Z_1$ with weights 0.17 each, and the two boundary points of $Z_3$ with weights 0.33 each.]

[Figure 6. Sensitivity function and D-optimal design for $c_v = 1$, $c_1 = 1$, $c_2 = 0.1$, $\rho = 0.2$. Support points: $x = 0$ in $Z_2$ with weight 0.10, and points in $Z_3$ with weights 0.31, 0.14, 0.14, 0.31.]


5.2 Linear regression with random parameters

Let

$$ E(y|\gamma, x) = f^T(x)\gamma, \qquad \mathrm{Var}(y|\gamma, x) = \sigma^2. \qquad (27) $$

We assume that given $\gamma$, all observations are independent. Parameters $\gamma \in R^m$ are independently sampled from a normal population with

$$ E(\gamma) = \gamma_0, \qquad \mathrm{Var}(\gamma) = \Lambda, \qquad (28) $$

where $\gamma_0$ and $\Lambda$ are often referred to as "population", or "global", parameters. Let

$$ f(x_{ij}) = [f_1(x_{ij}), \ldots, f_m(x_{ij})]^T, \quad i = 1,\ldots,k_j; \qquad y_j = (y_{1j}, \ldots, y_{k_j,j})^T, \qquad x_j = (x_{1j}, \ldots, x_{k_j,j}), $$

and $F(x_j) = \left[ f(x_{1,j}), \ldots, f(x_{k_j,j}) \right]$. Then the model (27) can be represented as

$$ E(y_j | \gamma_j, x_j) = F^T(x_j)\,\gamma_j. \qquad (29) $$

We emphasize that different numbers of measurements $k_j$ can be obtained for different $j$'s. The predictor $x_{ij}$ is a $q$-dimensional vector; for example, if a patient receives a $q$-drug treatment, $x_{ij}^u$ denotes the dose level of drug $u$ administered to individual $j$ in experiment $i$, $u = 1,\ldots,q$.

From (28) and (29) it follows that

$$ E(y_j | x_j) = \eta(x_j, \gamma_0) = F^T(x_j)\,\gamma_0, \qquad \mathrm{Var}(y_j | x_j) = S(\Lambda, \sigma^2, x_j) = F^T(x_j)\,\Lambda\, F(x_j) + \sigma^2 I_{k_j}. \qquad (30) $$

We first assume that $\Lambda$ is diagonal, i.e. $\Lambda = \mathrm{diag}(\lambda_\alpha)$, $\alpha = 1,\ldots,m$. For a discussion of how to tackle the general case of a non-diagonal matrix $\Lambda$, see Remark 5 in the Appendix.
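A small sketch of the marginal moments (30) for one subject is given below; the basis functions, sampling times, and parameter values are hypothetical placeholders, not taken from this report.

```python
import numpy as np

def F_matrix(times, basis):
    """m x k matrix F(x_j) whose columns are f(x_{ij}) at the sampling times."""
    return np.column_stack([basis(t) for t in times])

basis = lambda t: np.array([1.0, t, t**2])        # f(x) = (1, x, x^2)^T, m = 3
times = np.array([0.5, 1.0, 2.0, 4.0])            # k_j = 4 sampling times for subject j

gamma0 = np.array([10.0, -1.5, 0.1])              # population mean of gamma
Lam = np.diag([1.0, 0.25, 0.01])                  # Var(gamma) = Lambda, diagonal case
sigma2 = 0.5                                      # residual variance

Fj = F_matrix(times, basis)                       # shape (3, 4)
mean_yj = Fj.T @ gamma0                           # E(y_j | x_j)  = F^T(x_j) gamma_0
var_yj = Fj.T @ Lam @ Fj + sigma2 * np.eye(len(times))   # Var(y_j | x_j), cf. (30)
print(mean_yj, var_yj, sep="\n")
```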

Straightforward exercises in matrix algebra lead from (2) to the following representation of the information matrix:

$$ \mu_N = \mu_N(\gamma_0,\Lambda,\sigma^2) = \sum_{j=1}^{N} \mu(x_j,\theta) = \sum_{j=1}^{N} \begin{pmatrix} \mu_{\gamma,j} & 0_{m,m} & 0_{m,1} \\ 0_{m,m} & \mu_{\lambda\lambda,j} & \mu_{\lambda\sigma,j} \\ 0_{1,m} & \mu_{\lambda\sigma,j}^T & \mu_{\sigma\sigma,j} \end{pmatrix} = \begin{pmatrix} \mu_{\gamma,N} & 0_{m,m} & 0_{m,1} \\ 0_{m,m} & \mu_{\lambda\lambda,N} & \mu_{\lambda\sigma,N} \\ 0_{1,m} & \mu_{\lambda\sigma,N}^T & \mu_{\sigma\sigma,N} \end{pmatrix}, \qquad (31) $$

where

$$ \mu_{\gamma,j} = F(x_j)\, S_j^{-1} F^T(x_j), \qquad S_j = S(\Lambda,\sigma^2,x_j); $$

$$ \{\mu_{\lambda\lambda,j}\}_{\alpha\beta} = \frac{1}{2} \left[ F_\alpha(x_j)\, S_j^{-1} F_\beta^T(x_j) \right]^2, \quad \alpha,\beta = 1,\ldots,m; $$

$$ \{\mu_{\lambda\sigma,j}\}_\alpha = \frac{1}{2}\, F_\alpha(x_j)\, S_j^{-2} F_\alpha^T(x_j), \quad \alpha = 1,\ldots,m; \qquad \mu_{\sigma\sigma,j} = \frac{1}{2}\, \mathrm{tr}\left[ S_j^{-2} \right]; $$

$$ F_\alpha(x_j) = \left( f_\alpha(x_{1j}),\, f_\alpha(x_{2j}),\, \ldots,\, f_\alpha(x_{k_j,j}) \right), \quad \alpha = 1,\ldots,m, \quad \text{and } 0_{a,b} \text{ is an } (a \times b) \text{ matrix of zeros}. $$

Thus the sets of parameters $\gamma_0$ and $\{\lambda, \sigma^2\}$ are mutually orthogonal. This makes optimal design and estimation problems computationally more affordable. Iterated estimators, similar to the example in Section 3.1, can be written

$$ \hat{\gamma}_N = \lim_{t\to\infty} \gamma_t, \qquad \hat{\lambda}_N = \lim_{t\to\infty} \lambda_t, \qquad \hat{\sigma}^2_N = \lim_{t\to\infty} \sigma^2_t, $$

where

$$ \gamma_{t+1} = M_{\gamma,t}^{-1} Y_{\gamma,t}, \quad M_{\gamma,t} = \sum_{j=1}^{N} F(x_j)\, S_{tj}^{-1} F^T(x_j), \quad Y_{\gamma,t} = \sum_{j=1}^{N} F(x_j)\, S_{tj}^{-1} y_j, \quad S_{tj} = S(\lambda_t, \sigma^2_t, x_j); $$

$$ \begin{pmatrix} \lambda_{t+1} \\ \sigma^2_{t+1} \end{pmatrix} = \begin{pmatrix} M_{\lambda\lambda,t} & M_{\lambda\sigma,t} \\ M_{\lambda\sigma,t}^T & M_{\sigma\sigma,t} \end{pmatrix}^{-1} \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, \qquad (32) $$

where $\lambda_t = (\lambda_{t1}, \ldots, \lambda_{tm})^T$; $M_{\lambda\lambda,t}$ is an $(m \times m)$ matrix; $M_{\lambda\sigma,t}$ and $y_{t,1}$ are $(m \times 1)$ vectors; $M_{\lambda\lambda,t}$, $M_{\lambda\sigma,t}$, and $M_{\sigma\sigma,t}$ are the same as $\mu_{\lambda\lambda,N}$, $\mu_{\lambda\sigma,N}$, and $\mu_{\sigma\sigma,N}$, respectively, except that $S_{tj}$ should be substituted for $S_j$,

$$ \{y_{t,1}\}_\alpha = \frac{1}{2} \sum_{j=1}^{N} \left[ F_\alpha(x_j)\, S_{tj}^{-1} \left( y_j - F^T(x_j)\gamma_t \right) \right]^2, \quad \alpha = 1,\ldots,m, $$

$$ y_{t,2} = \frac{1}{2} \sum_{j=1}^{N} \left[ y_j - F^T(x_j)\gamma_t \right]^T S_{tj}^{-2} \left[ y_j - F^T(x_j)\gamma_t \right]. $$

The proof of (32) is postponed to the Appendix.

Now, we show how to introduce cost constraints for this example; for other methods of generating optimal designs with constraints in random effects models, see Mentre et al. (1997). The hypothetical model is similar to that of the example in Section 5.1, where patients visit a clinic to participate in a clinical trial. Here, we assume that the patients will undergo serial measurements over time, for example blood sampling, within a specified time interval $[0, T]$. Each patient who participates in the trial may have a different number of samples taken, up to a maximum of $q$. Following the example of Section 5.1, we impose a cost $c_v$ for each visit, and we assign a cost $c_s$ for each of the individual samples. Therefore, if $k$ samples are taken, the total cost per patient will be described as

$$ C_k = c_v + k\, c_s, \qquad k = 1,\ldots,q, $$

with the restriction (22) on the total cost.

Since samples are taken over time, there is a natural ordering corresponding to the timing of the samples. The design region $\mathcal{X}$ for a patient depends upon the number and timing of samples taken from that patient. For example, if a patient visits the clinic for a single sample only, the design region $\mathcal{X}_1$ will consist of a single point $x$, $0 \leq x \leq T$. For a patient having two samples taken, the design region $\mathcal{X}_2$ will consist of vectors of length 2,

$$ \mathcal{X}_2 = \{ X = (x_1, x_2),\ 0 \leq x_1 < x_2 \leq T \}, $$

etc. If $q$ samples are taken from a patient, then

$$ \mathcal{X}_q = \{ X = (x_1, \ldots, x_q),\ 0 \leq x_1 < \cdots < x_q \leq T \}. $$

Finally, the normalized information matrix can be defined as

$$ M_C(\xi,\theta) = \sum_{i=1}^{n} w_i\, \tilde{\mu}(X_i,\theta), \qquad \sum_{i=1}^{n} w_i = 1, $$

where $\tilde{\mu}(X,\theta) = \mu(X,\theta)/C_k$ if $\dim(X) = k$ and $X \in \mathcal{X}_k$, and the information matrix $\mu(X,\theta)$ is defined in (31).
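To show how the cost-normalized candidate set might be enumerated in practice, here is a small sketch; the time grid, the visit and per-sample costs, and the maximum number of samples are illustrative assumptions.

```python
from itertools import combinations

T, q = 24.0, 3                       # sampling window [0, T] and max samples per patient
cv, cs = 1.0, 0.25                   # visit cost and per-sample cost
grid = [0.5, 1, 2, 4, 8, 12, 24]     # candidate sampling times (hours), illustrative

# Candidate extended design points: ordered k-tuples of times, k = 1..q,
# each carrying the per-patient cost C_k = cv + k*cs used to normalize mu(X, theta).
candidates = [(times, cv + k * cs)
              for k in range(1, q + 1)
              for times in combinations(grid, k)]

print(len(candidates), candidates[:3])
```

Each candidate pair (X, C_k) would then be mapped to $\tilde{\mu}(X,\theta) = \mu(X,\theta)/C_k$ using (31) and passed to a first-order algorithm of the kind sketched in Section 4.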

6 Discussion

Multiresponse regression models with a variance matrix depending upon unknown parameters form a common basis for modeling in many areas of research. Well known examples exist in biomedical research, such as in pharmacokinetics and dose response models. Further examples are prevalent in econometrics and agricultural field studies, among others. A rich literature exists in which methods for parameter estimation are derived and compared. Optimal design for these types of models has also been studied, but the literature is not as well developed.

In this paper, we propose an iterated estimator, which is shown to be a least squares estimator combining the usual generalized least squares with squared deviations of the predicted dispersion matrix from the observed residual matrices. This estimator is shown to be asymptotically equivalent to the MLE. We provide closed form solutions for the proposed estimator for a linear model with random parameters. For the case of a single response, the combined iterated estimator is similar to the iterated estimator proposed by Davidian and Carroll (1987). However, they partition the parameter vector into two subvectors, with the second one appearing only in the variance function, and perform iterations on each term separately. Moreover, for cases where parameters in the expectation and variance functions coincide, the second term disappears completely from the iterations and hence their iterated estimator does not lead to the MLE.

Optimal experimental design is a critical aspect of research, and well known algorithms can be utilized to generate such designs. We combine the proposed iterated estimator with convex design theory to generate locally optimal designs. In a specific example, a dose response study with the response modeled by a logistic function and the variance modeled by a two-parameter power function, a first order numerical algorithm was utilized to generate a D-optimal design. The optimal design was compared to a two-fold dilution design with 10 dilutions (a standard dose response design). We underline that the optimal design for this model is supported at four design points, which is less than the total of 6 parameters in the model. Therefore, the combined application of the estimation and design algorithms leads to an optimal design which, if implemented, requires fewer resources than the standard design (4 points instead of 10). Certainly, for nonlinear models one constructs locally optimal designs, which requires preliminary estimates of unknown parameters. From our experience with the logistic model, the locally optimal designs are quite robust to a reasonable variation of parameter estimates.

Finally, we introduce cost functions and demonstrate the application of cost constraints to normalized designs. This provides experimenters with a basis for incorporating a total cost, and allows them to achieve a statistically optimal design in light of these costs. Such a design offers several benefits, not the least of which is to enable the experimenter to conduct the study within budget while obtaining reliable parameter estimates. In our example with two response functions, we demonstrate the impact of cost constraints. With a fixed overall cost only (no individual costs, see Fig. 3), support points are allocated within the design region $Z_3$, which is not surprising: it is beneficial to take two measurements, rather than one, to increase information content and reduce parameter variability. Introduction of positive costs and correlation between the response functions shifts the design, as demonstrated in the examples; see Figs. 4-6.

In conclusion, this paper describes an iterated estimator for finding parameter estimates for multiresponse models with variance depending upon unknown parameters, combines this method with convex design theory, and introduces designs with cost constraints. These concepts can be a valuable tool for experimenters, enabling efficient parameter estimation and optimal allocation of resources.

Acknowledgements

The authors are grateful to the referees for useful comments that helped to improve the presentation of the results.


7 Appendix

In this section, we use a few standard formulas from matrix differential calculus; see Harville (1997, Ch. 15).

(1) If $S$ is a symmetric matrix which depends on a parameter $\alpha$, and if $u$ is a scalar function of $S$, then

$$ \frac{du}{d\alpha} = \mathrm{tr}\left[ \frac{\partial u}{\partial S}\, \frac{\partial S}{\partial\alpha} \right], \qquad (33) $$

$$ \frac{\partial \log|S|}{\partial S} = S^{-1}, \qquad \frac{d S^{-1}}{d\alpha} = -S^{-1}\, \frac{dS}{d\alpha}\, S^{-1}. \qquad (34) $$

(2) If $A$, $S$, and $B$ are symmetric matrices of proper dimension, then

$$ \frac{\partial}{\partial S}\, \mathrm{tr}\left[ \{(A - S) B\}^2 \right] = 2 B (S - A) B. \qquad (35) $$
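These identities are easy to spot-check numerically. The sketch below verifies the two parts of (34) by central finite differences along a one-parameter family of symmetric matrices; the matrices, the direction, and the step size are arbitrary test choices.

```python
import numpy as np

rng = np.random.default_rng(2)
k, h = 3, 1e-6

def sym(M):
    return (M + M.T) / 2

A = rng.normal(size=(k, k))
S0 = A @ A.T + np.eye(k)                            # positive definite base matrix
D = sym(rng.normal(size=(k, k)))                    # direction dS/dalpha
S = lambda a: S0 + a * D                            # one-parameter family S(alpha)

# (34), second identity: d S^{-1}/dalpha = -S^{-1} (dS/dalpha) S^{-1}
num = (np.linalg.inv(S(h)) - np.linalg.inv(S(-h))) / (2 * h)
ana = -np.linalg.inv(S0) @ D @ np.linalg.inv(S0)
print(np.allclose(num, ana, atol=1e-5))

# (34), first identity combined with (33): d log|S|/dalpha = tr[S^{-1} dS/dalpha]
num2 = (np.linalg.slogdet(S(h))[1] - np.linalg.slogdet(S(-h))[1]) / (2 * h)
print(np.isclose(num2, np.trace(np.linalg.inv(S0) @ D), atol=1e-6))
```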

The following lemma is used in the proof of Theorem 1.

Lemma 1. Let

$$ S_0 \leq S(x,\theta) \leq S_1 \quad \text{for any } x \in \mathcal{X} \text{ and } \theta \in \Theta, \qquad (36) $$

where $S_0$ and $S_1$ are positive definite matrices. Let

$$ R_1(x,\theta,\theta^*,\theta^0) = [\eta(x,\theta^*) - \eta(x,\theta)]^T S^{-1}(x,\theta^0)\,[\eta(x,\theta^*) - \eta(x,\theta)], $$

$$ R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\, \mathrm{tr}\left\{ \left[ R_{22}(x,\theta,\theta^*,\theta^0) + S(x,\theta^*) - S(x,\theta) \right] S^{-1}(x,\theta^0) \right\}^2, $$

$$ R_{22} = [\eta(x,\theta^*) - \eta(x,\theta^0)][\eta(x,\theta^*) - \eta(x,\theta^0)]^T - [\eta(x,\theta) - \eta(x,\theta^0)][\eta(x,\theta) - \eta(x,\theta^0)]^T, $$

and for a given $\theta$,

$$ [\eta(x,\theta^*) - \eta(x,\theta)]^T S_1^{-1}\,[\eta(x,\theta^*) - \eta(x,\theta)] + \frac{1}{2}\, \mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\, S_1^{-1} \right\}^2 > 0. \qquad (37) $$

Then

$$ R_1(x,\theta,\theta^*,\theta^0) + R_2(x,\theta,\theta^*,\theta^0) > 0. $$

Proof. It is obvious that for any $x$, $\theta^0$, and $\theta$,

$$ R_1(x,\theta,\theta^*,\theta^0) \geq 0, \qquad R_2(x,\theta,\theta^*,\theta^0) \geq 0, $$

and

$$ R_1(x,\theta,\theta^*,\theta^0) = R_2(x,\theta,\theta^*,\theta^0) = 0 \quad \text{if } \theta = \theta^*. $$

Next, the term $R_{22}$ can be represented as

$$ R_{22} = [\eta(x,\theta^*) - \eta(x,\theta)][\eta(x,\theta^*) - \eta(x,\theta^0)]^T + [\eta(x,\theta) - \eta(x,\theta^0)][\eta(x,\theta^*) - \eta(x,\theta)]^T. \qquad (38) $$

Since both terms on the left-hand side of (37) are non-negative, at least one of them is positive. If the first term is positive, then the lemma follows from (36) and the definition of $R_1$. Next, if the first term is equal to zero, then $\eta(x,\theta^*) = \eta(x,\theta)$, and (38) implies that

$$ R_2(x,\theta,\theta^*,\theta^0) = \frac{1}{2}\, \mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\, S^{-1}(x,\theta^0) \right\}^2 \geq \frac{1}{2}\, \mathrm{tr}\left\{ [S(x,\theta^*) - S(x,\theta)]\, S_1^{-1} \right\}^2 > 0, $$

since in this case the second term on the left-hand side of (37) is positive. This proves the lemma.

Proof of Theorem 1. First, compute the partial derivatives of the log-likelihood function $L_N(\theta)$ introduced in (3). Using (33) and (34), and letting $z_i(\theta) = y_i - \eta(x_i,\theta)$, one gets:

$$ -2\, \frac{\partial L_N(\theta)}{\partial\theta_j} = \sum_{i=1}^{N} \mathrm{tr}\left[ S^{-1}(x_i,\theta)\, \frac{\partial S(x_i,\theta)}{\partial\theta_j} \right] - 2 \sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta) $$

$$ - \sum_{i=1}^{N} z_i^T(\theta)\, S^{-1}(x_i,\theta)\, \frac{\partial S(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta). \qquad (39) $$

Next, use (33) and (35) to compute the partial derivatives of $v_N^2(\theta,\theta^0)$ with respect to $\theta_j$, which leads to

$$ \frac{\partial v_N^2(\theta,\theta^0)}{\partial\theta_j} = -2 \sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta^0)\, z_i(\theta) $$

$$ + \sum_{i=1}^{N} \mathrm{tr}\left[ \left\{ S(x_i,\theta) + \Delta(x_i,\theta,\theta^0) - z_i(\theta^0)\, z_i^T(\theta^0) \right\} S^{-1}(x_i,\theta^0) \left( \frac{\partial S(x_i,\theta)}{\partial\theta_j} + \frac{\partial \Delta(x_i,\theta,\theta^0)}{\partial\theta_j} \right) S^{-1}(x_i,\theta^0) \right]. $$

From the identity $\mathrm{tr}[AB] = \mathrm{tr}[BA]$, and since $\Delta(x_i,\theta,\theta^0)$ and its derivatives vanish at $\theta = \theta^0$, it follows that

$$ \left. \frac{\partial v_N^2(\theta,\theta^0)}{\partial\theta_j} \right|_{\theta^0 = \theta} = -2 \sum_{i=1}^{N} \frac{\partial\eta^T(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta) $$

$$ + \left[ \sum_{i=1}^{N} \mathrm{tr}\left( S^{-1}(x_i,\theta)\, \frac{\partial S(x_i,\theta)}{\partial\theta_j} \right) - \sum_{i=1}^{N} z_i^T(\theta)\, S^{-1}(x_i,\theta)\, \frac{\partial S(x_i,\theta)}{\partial\theta_j}\, S^{-1}(x_i,\theta)\, z_i(\theta) \right], $$

which coincides with (39).

Note that if the algorithm (11) converges, then under the introduced assumptions

$$ \lim_{t\to\infty} \left. \frac{\partial v_N^2(\theta,\theta_{t-1})}{\partial\theta} \right|_{\theta = \theta_t} = \left. \frac{\partial v_N^2(\theta,\hat{\theta}_N)}{\partial\theta} \right|_{\theta = \hat{\theta}_N} = 0, $$

which implies that $\hat{\theta}_N \in \Theta_N$. To prove the convergence of (11), introduce

$$ A_N(\theta^0) = \arg\min_{\theta \in \Theta} v_N^2(\theta,\theta^0). $$

Then (11) can be presented as the recursion $\theta_t = A_N(\theta_{t-1})$ for the fixed point problem $\theta = A_N(\theta)$. Convergence of the recursion is guaranteed if for any $\theta_1, \theta_2 \in \Theta$ there exists a constant $K$, $0 < K < 1$, such that

$$ [A_N(\theta_1) - A_N(\theta_2)]^T [A_N(\theta_1) - A_N(\theta_2)] \leq K\, (\theta_1 - \theta_2)^T (\theta_1 - \theta_2), $$

cf. Saaty and Bram (1964), Ch. 1.10.

Remark that $A_N(\theta)$ is simply a generalized version of the least squares estimator with predetermined weights. Straightforward calculations show that the expectation of the $i$th summand on the right-hand side of (12) is equal to

$$ R(x_i,\theta,\theta^*,\theta^0) = R_1(x_i,\theta,\theta^*,\theta^0) + R_2(x_i,\theta,\theta^*,\theta^0) + R_3(x_i,\theta^*,\theta^0), $$

where the terms $R_1$ and $R_2$ are introduced in Lemma 1, and the term $R_3$ does not depend on $\theta$. Lemma 1 together with Assumption 3 guarantees that for any $\theta^0$, the limiting function

$$ \lim_{N\to\infty} \frac{1}{N} \sum_i R(x_i,\theta,\theta^*,\theta^0) $$

has a unique minimum with respect to $\theta$ at $\theta = \theta^*$. Using the strong law of large numbers, and following Jennrich (1969), one can show that $A_N(\theta)$ is strongly consistent, i.e. converges almost surely to $\theta^*$; cf. Rao (1973, Ch. 2c). From this fact and the compactness of $\Theta$, it follows that the probability

$$ P\left\{ [A_N(\theta_1) - A_N(\theta_2)]^T [A_N(\theta_1) - A_N(\theta_2)] \leq K\, (\theta_1 - \theta_2)^T (\theta_1 - \theta_2) \right\} $$

tends to 1 as $N \to \infty$ uniformly over $\theta_1, \theta_2 \in \Theta$ for any fixed $0 < K < 1$. Thus, for large $N$, with probability close to 1 the limit (11) exists and, consequently, $\hat{\theta}_N \in \Theta_N$, which proves the theorem.

Proof of (32). Introduce $z_j = y_j - F^T(x_j)\gamma_t$, $B_j = (S_{tj})^{-1}$, and recall that

$$ S_j = F^T(x_j)\,\Lambda\, F(x_j) + \sigma^2 I_{k_j}. $$

It is straightforward to show that

$$ F^T(x_j)\,\Lambda\, F(x_j) = \sum_{\alpha=1}^{m} \lambda_\alpha\, F_\alpha^T(x_j)\, F_\alpha(x_j), $$

and therefore

$$ \frac{\partial S_j}{\partial\lambda_\alpha} = F_\alpha^T(x_j)\, F_\alpha(x_j), \qquad \frac{\partial S_j}{\partial\sigma^2} = I_{k_j}. \qquad (40) $$

The analogue of the second term on the right-hand side of (13) can be written as

$$ v_2 = \frac{1}{2} \sum_{j=1}^{N} \mathrm{tr}\left[ \left( z_j z_j^T - S_j \right) B_j \right]^2. $$

Then using (33), (35), (40), and the identity $\mathrm{tr}[AB] = \mathrm{tr}[BA]$, one gets:

$$ \frac{\partial v_2}{\partial\lambda_\alpha} = \sum_{j=1}^{N} \mathrm{tr}\left[ \left\{ F^T(x_j)\Lambda F(x_j) + \sigma^2 I_{k_j} - z_j z_j^T \right\} B_j\, F_\alpha^T(x_j)\, F_\alpha(x_j)\, B_j \right] $$

$$ = \sum_{j=1}^{N} F_\alpha(x_j)\, B_j \left\{ \sum_{\beta=1}^{m} \lambda_\beta\, F_\beta^T(x_j) F_\beta(x_j) + \sigma^2 I_{k_j} - z_j z_j^T \right\} B_j\, F_\alpha^T(x_j) $$

$$ = 2 \sum_{\beta=1}^{m} \lambda_\beta\, \{M_{\lambda\lambda,t}\}_{\alpha\beta} + 2\sigma^2\, \{M_{\lambda\sigma,t}\}_\alpha - 2\, \{y_{t,1}\}_\alpha. $$

In a similar fashion, taking the partial derivative with respect to $\sigma^2$ leads to

$$ \frac{\partial v_2}{\partial\sigma^2} = \sum_{j=1}^{N} \mathrm{tr}\left[ B_j \left\{ F^T(x_j)\Lambda F(x_j) + \sigma^2 I_{k_j} - z_j z_j^T \right\} B_j \right] $$

$$ = \sum_{\alpha=1}^{m} \lambda_\alpha \sum_{j=1}^{N} \mathrm{tr}\left[ B_j F_\alpha^T(x_j) F_\alpha(x_j) B_j \right] + \sigma^2 \sum_{j=1}^{N} \mathrm{tr}\left[ B_j^2 \right] - \sum_{j=1}^{N} z_j^T B_j^2 z_j $$

$$ = 2 \sum_{\alpha=1}^{m} \lambda_\alpha\, \{M_{\lambda\sigma,t}\}_\alpha + 2\sigma^2 M_{\sigma\sigma,t} - 2\, y_{t,2}. $$

Finally, equating the expressions for the partial derivatives to zero entails

$$ \begin{pmatrix} M_{\lambda\lambda,t} & M_{\lambda\sigma,t} \\ M_{\lambda\sigma,t}^T & M_{\sigma\sigma,t} \end{pmatrix} \begin{pmatrix} \lambda \\ \sigma^2 \end{pmatrix} = \begin{pmatrix} y_{t,1} \\ y_{t,2} \end{pmatrix}, $$

which proves (32).

Remark 5. To establish the analog of (32) in the case of a non-diagonal symmetric matrix $\Lambda = (\lambda_{\alpha\beta})$, one has to exploit the identity

$$ F^T(x_j)\,\Lambda\, F(x_j) = \sum_{r,q=1}^{m} \lambda_{rq}\, F_r^T(x_j) F_q(x_j) = \sum_{r=1}^{m} \lambda_{rr}\, F_r^T(x_j) F_r(x_j) + \sum_{r>q} \lambda_{rq} \left[ F_r^T(x_j) F_q(x_j) + F_q^T(x_j) F_r(x_j) \right]. \qquad (41) $$

First, the formula (31) should be modified according to (41). The information matrix $\mu_{\lambda\lambda,N}$ is now an $[m(m+1)/2] \times [m(m+1)/2]$ matrix corresponding to the parameter $\mathrm{vech}(\Lambda)$ (cf. Section 3.1 for the notation vech). Let

$$ W_{j,\alpha\beta} = F_\alpha(x_j)\, S_j^{-1} F_\beta^T(x_j). $$

Then the elements of $\mu_{\lambda\lambda,N}$ are defined by $\mu_{N;\alpha\beta,rq}$,

$$ \mu_{N;\alpha\alpha,rr} = \frac{1}{2} \sum_{j=1}^{N} W_{j,\alpha r}^2, \qquad \mu_{N;\alpha\beta,rr} = \sum_{j=1}^{N} W_{j,\alpha r}\, W_{j,\beta r}, \qquad \mu_{N;\alpha\beta,rq} = \sum_{j=1}^{N} \left[ W_{j,\alpha r}\, W_{j,\beta q} + W_{j,\alpha q}\, W_{j,\beta r} \right], \quad \alpha > \beta,\ r > q, $$

cf. Jennrich and Schluchter (1986). The $m(m+1)/2$ vector $\mu_{\lambda\sigma,N}$ has elements $\mu_{N;\alpha\beta,\sigma}$,

$$ \mu_{N;\alpha\alpha,\sigma} = \frac{1}{2} \sum_{j=1}^{N} F_\alpha(x_j)\, S_j^{-2} F_\alpha^T(x_j), \qquad \mu_{N;\alpha\beta,\sigma} = \sum_{j=1}^{N} F_\alpha(x_j)\, S_j^{-2} F_\beta^T(x_j), \quad \alpha > \beta, $$

and, finally, $\mu_{\sigma\sigma,N}$ does not change.

Using now formula (41), taking partial derivatives with respect to $\lambda_{\alpha\beta}$ and $\sigma^2$, and equating them to zero leads to a system of $[m(m+1)/2 + 1]$ linear equations, the solution of which is given by

$$ \begin{pmatrix} \mathrm{vech}(\Lambda_{t+1}) \\ \sigma^2_{t+1} \end{pmatrix} = \begin{pmatrix} \tilde{M}_{\lambda\lambda,t} & \tilde{M}_{\lambda\sigma,t} \\ \tilde{M}_{\lambda\sigma,t}^T & M_{\sigma\sigma,t} \end{pmatrix}^{-1} \begin{pmatrix} \tilde{y}_{t,1} \\ \tilde{y}_{t,2} \end{pmatrix}, $$

where $\tilde{M}_{\lambda\lambda,t}$ and $\tilde{M}_{\lambda\sigma,t}$ are introduced similarly to $\mu_{\lambda\lambda,N}$ and $\mu_{\lambda\sigma,N}$, with $S_{tj}$ substituting for $S_j$,

$$ \{\tilde{y}_{t,1}\}_{\alpha\alpha} = \frac{1}{2} \sum_{j=1}^{N} \left[ F_\alpha(x_j)\, S_{tj}^{-1} z_j \right]^2, \qquad \{\tilde{y}_{t,1}\}_{\alpha\beta} = \sum_{j=1}^{N} \left[ F_\alpha(x_j)\, S_{tj}^{-1} z_j \right] \left[ F_\beta(x_j)\, S_{tj}^{-1} z_j \right], \quad \alpha > \beta, $$

$$ \tilde{y}_{t,2} = \frac{1}{2} \sum_{j=1}^{N} z_j^T\, S_{tj}^{-2}\, z_j. $$


References

[1] Atkinson, A.C., and Cook, R.D. (1995), D-optimum designs for heteroscedastic linear models, JASA, 90 (429), 204-212.

[2] Atkinson, A.C., and Donev, A. (1992), Optimum Experimental Designs, Clarendon Press, Oxford.

[3] Beal, S.L., and Sheiner, L.B. (1988), Heteroscedastic nonlinear regression, Technometrics, 30 (3), 327-338.

[4] Cook, R.D., and Fedorov, V.V. (1995), Constrained optimization of experimental design, Statistics, 26, 129-178.

[5] Cramer, H. (1946), Mathematical Methods of Statistics, Princeton University Press.

[6] Davidian, M., and Carroll, R.J. (1987), Variance function estimation, JASA, 82 (400), 1079-1091.

[7] Downing, D.J., Fedorov, V.V., Leonov, S.L. (2001), Extracting information from the variance function: optimal design. In: Atkinson, A.C., Hackl, P., Muller, W.G. (eds.), MODA6 - Advances in Model-Oriented Design and Analysis, Heidelberg, Physica-Verlag, 45-52.

[8] Fedorov, V.V. (1974), Regression problems with controllable variables subject to error, Biometrika, 61, 49-56.

[9] Fedorov, V.V. (1977), Parameter estimation for multivariate regression. In: Nalimov, V. (Ed.), Regression Experiments (Design and Analysis), Moscow State University, Moscow, 112-122 (in Russian).

[10] Fedorov, V.V., and Hackl, P. (1997), Model-Oriented Design of Experiments, Springer-Verlag, New York.

[11] Harville, D.A. (1997), Matrix Algebra from a Statistician's Perspective, Springer-Verlag, New York.

[12] Heyde, C.C. (1997), Quasi-Likelihood and Its Applications, Springer-Verlag, New York.

[13] Jennrich, R.I. (1969), Asymptotic properties of nonlinear least squares estimators, Ann. Math. Stat., 40, 633-643.

[14] Jennrich, R.I., and Schluchter, M.D. (1986), Unbalanced repeated-measures models with structured covariance matrices, Biometrics, 42, 805-820.

[15] Jobson, J.D., and Fuller, W.A. (1980), Least squares estimation when the covariance matrix and parameter vector are functionally related, JASA, 75 (369), 176-181.


[16] Lindstrom, M.J., and Bates, D.M. (1990), Nonlinear mixed effects models for repeated measures data, Biometrics, 46, 673-687.

[17] Magnus, J.R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York.

[18] Malyutov, M.B. (1982), On asymptotic properties and application of IRGNA-estimates for parameters of generalized regression models. In: Stoch. Processes and Appl., Moscow, 144-165 (in Russian).

[19] Malyutov, M.B. (1988), Design and analysis in generalized regression model F. In: Fedorov, V.V., Lauter, H. (Eds.), Model-Oriented Data Analysis, Springer-Verlag, Berlin, 72-76.

[20] Mentre, F., Mallet, A., Baccar, D. (1997), Optimal design in random-effects regression models, Biometrika, 84 (2), 429-442.

[21] Muirhead, R. (1982), Aspects of Multivariate Statistical Theory, Wiley, New York.

[22] Muller, W.G. (1998), Collecting Spatial Data, Springer-Verlag, New York.

[23] Pukelsheim, F. (1993), Optimal Design of Experiments, Wiley, New York.

[24] Pazman, A. (1993), Nonlinear Statistical Models, Kluwer, Dordrecht.

[25] Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd Ed., Wiley, New York.

[26] Saaty, T.L., and Bram, J. (1964), Nonlinear Mathematics, McGraw-Hill, New York.

[27] Seber, G.A.F. (1984), Multivariate Observations, Wiley, New York.

[28] Vonesh, E.F., and Chinchilli, V.M. (1997), Linear and Nonlinear Models for the Analysis of Repeated Measurements, Marcel Dekker, New York.

[29] Wu, C.F. (1981), Asymptotic theory of nonlinear least squares estimation, Ann. Stat., 9 (3), 501-513.