Statistica Sinica 5(1995), 421-440

ON EFFICIENT DESIGNING OF NONLINEAR EXPERIMENTS

Probal Chaudhuri and Per A. Mykland

Indian Statistical Institute and University of Chicago

Abstract: Adaptive designs that optimize the Fisher information associated with a nonlinear experiment are considered. Asymptotic properties of the maximum likelihood estimate and related statistics based on dependent observations generated by sequentially designed adaptive nonlinear experiments are explored. Conditions on the experimental designs that ensure first order efficiency of the maximum likelihood estimate when the parametrization of the nonlinear model is sufficiently smooth and regular are derived. A few interesting open questions that arise naturally in the course of the investigation are mentioned and briefly discussed.

Key words and phrases: Adaptive sequential designs, first order efficiency, Fisher information, local $\phi$-optimality, martingales, maximum likelihood, nonlinear models.

1. Introduction: Adaptive Sequential Design of Nonlinear Experiments

In a typical nonlinear set up involving a response $Y$ and a regressor $X$, the dependence of the response on the regressor is modeled using a family of probability distributions, which involve an unknown Euclidean parameter (to be estimated from the data) in such a way that the parametrization is smooth and regular, and the associated Fisher information happens to be a nonlinear function of that parameter of interest. Standard examples of such nonlinear models include the usual nonlinear regression models (see e.g. Gallant (1987), Bates and Watts (1988), Seber and Wild (1989)), various generalized linear models (see e.g. McCullagh and Nelder (1989)), and many popular heteroscedastic regression models (see e.g. Box and Hill (1974), Bickel (1978), Jobson and Fuller (1980), Carroll and Ruppert (1988)). An awkward situation arises when one needs to design an experiment in order to ensure maximum efficiency of parameter estimates in such a nonlinear problem. Since the Fisher information associated with any such problem depends on the unknown parameter in a nonlinear way, an efficient designing of the experiment to guarantee optimal performance of the nonlinear least squares estimate or the maximum likelihood estimate will require knowledge of that unknown parameter! (See Cochran (1973, pp. 771-772), Bates and Watts (1988, p. 129) for various interesting remarks in connection with this). An extensive discussion on possible resolutions of this problem together with several examples of nonlinear experiments can be found in Ford, Titterington and Kitsos (1989), Myers, Khuri and Carter (1989) and Chaudhuri and Mykland (1993). A few more interesting examples are discussed below.

Example 1.1. Treloar (1974) investigated an enzymatic reaction in which the number of counts per minute of a radioactive product from the reaction was measured, and from these counts the initial velocity ($Y$) of the reaction was calculated. This velocity is related to the substrate concentration ($X$) chosen by the experimenter through the nonlinear regression equation (Michaelis-Menten model) $Y = \theta_0 X (\theta_1 + X)^{-1} + e$. Here $\theta = (\theta_0, \theta_1)$ is the unknown parameter of interest, and $e$ is the random noise with zero mean as usual. The experiment was conducted once with the enzyme (Galactosyltransferase of Golgi Membranes) treated with Puromycin and once with the enzyme untreated. The main objective of the study was to investigate the effect of the introduction of Puromycin on the ultimate velocity parameter $\theta_0$ and the half velocity parameter $\theta_1$.

Example 1.2. Cox and Snell (1989, pp. 10-11), while discussing binary response regression models, mentioned an industrial experiment where the number of ingots that were not ready for rolling was counted out of a fixed number of ingots tested. The number counted depends on two covariates, which are heating time and soaking time determined by the experimenter at the time of processing. This leads to a binomial regression model, and the main interest is in getting insights into how variations in heating and soaking times influence the number of ingots that turn out to be not ready for rolling.

Example 1.3. Powsner (1935) collected data from an experiment that was conducted to determine the effect of experimental temperature on the developmental stages (i.e. embryonic, egg-larval, larval and pupal stages) of the fruit fly Drosophila Melanogaster. McCullagh and Nelder (1989) discussed the data and demonstrated how it can be analyzed using appropriate gamma regression models with the experimental temperature as the covariate.

Chaudhuri and Mykland (1993) investigated an adaptive sequential scheme for designing nonlinear experiments and established asymptotic D-optimality of the design as well as $n^{1/2}$-consistency, asymptotic normality and first order efficiency of the maximum likelihood estimate based on observations generated by their proposed scheme in a nonlinear set up satisfying suitable regularity conditions. Adaptive sequential and batch sequential designs for certain very specific types of nonlinear experiments related to some special models were explored earlier by Box and Hunter (1965), Ford and Silvey (1980), Abdelbasit and Plackett (1983), Ford, Titterington and Wu (1985), Wu (1985), Minkin (1987) and McCormick, Mallik and Reeves (1988) among others. Some of these authors made attempts to study the asymptotic behavior of their respective design schemes and resulting parameter estimates. The key idea lying at the root of such design strategies is to divide available resources into several groups and to split the entire experiment into a number of steps. At each step, the experiment is conducted using only a single portion of the divided resources. At the end of each step, an analysis is carried out and parameter estimates are updated using the available data. The results emerging from this data analysis are used in efficient designing of the experiment at subsequent steps. As pointed out by several authors (see e.g. Chernoff (1975), Fedorov (1972), Silvey (1980)), the most attractive feature of adaptive sequential experiments is their ability to optimally utilize the dynamics of the learning process associated with experimentation, data analysis and inference. At the beginning, the scientist does not have much information, and hence an initial experiment is bound to be somewhat tentative in nature. As the experiment continues and more and more observations are obtained, the scientist is able to form a more precise impression of the underlying theory, and this more definite perception is used to design a more informative experiment in the future that is expected to give rise to more relevant and useful data.

The main purpose of this article is to explore some fundamental theoretical issues that are of critical importance in studying the asymptotics of sequentially designed adaptive nonlinear experiments. In Section 2, we will explore the asymptotic performance of maximum likelihood estimates based on dependent observations generated by adaptive and sequentially designed experiments in a very general nonlinear set up, which amply covers situations occurring in practice and models studied in the literature. Unlike Chaudhuri and Mykland (1993), who considered a very specific adaptive sequential scheme, our focus here will be on general adaptive procedures for sequentially designing experiments and some related asymptotics. In Section 3, we will discuss the limiting behavior of some adaptive design procedures and work out some useful conditions that guarantee their large sample optimality with respect to a broad class of optimal design criteria. Once again, unlike Chaudhuri and Mykland (1993), who concentrated on D-optimal designs only, we will consider a general family of optimal design criteria that includes D-optimality and many other optimality criteria as special cases. In the course of the development of the principal theoretical results in Sections 2 and 3, we observe certain intriguing facts, and some interesting questions, which arise naturally, remain unanswered at this point. All technical proofs are presented in the Appendix.


2. Maximum Likelihood and Related Statistical Inference

Suppose now that the conditional distribution of the response $Y$ given $X = x$ has a probability density function or probability mass function $f(y\,|\,\theta, x)$. Here the form of the function $f$ is assumed to be known, and $\theta$ is an unknown $d$-dimensional Euclidean parameter of interest that takes its values in $\Theta$ (the parameter space), which will be assumed to be a convex and open subset of $R^d$. The value of the regressor $X$ is determined by the experimenter, who chooses it from a set $\mathcal{X}$ (the experiment space) before each trial of the experiment. In applications, $\mathcal{X}$ can be a finite set (e.g. factorial experiments with each factor having a finite number of levels) or some appropriate compact (i.e. closed and bounded) interval or region in an Euclidean space (e.g. when the regressor $X$ is continuous in nature) or a mixture of the two (e.g. analysis of covariance type problems). As pointed out in Chaudhuri and Mykland (1993), the conditional distribution of $Y$ given $X = x$ may involve an unknown nuisance parameter (e.g. the unknown error variance in a homoscedastic normal regression model) in addition to the parameter of principal interest $\theta$. However, as argued in Chaudhuri and Mykland (1993, p. 539), for a number of models frequently used in practice and extensively discussed in the literature, we can ignore the presence of that nuisance parameter (and thereby keep our notations simple) as long as we are interested in the large sample behavior of the maximum likelihood estimate of $\theta$ and optimal designing of the experiment to ensure greatest efficiency in asymptotic inference based on that estimate. We assume that the parametrization in the model $f(y\,|\,\theta, x)$ is smooth and regular in the sense that the following Cramer type conditions will be satisfied. From now on all vectors in this article will be column vectors, the superscript $T$ will be used to denote the transpose, and $|\cdot|$ will be used to denote the Euclidean norm of a vector or a matrix.

Condition 2.1. Suppose that the response $Y$ takes its values in a set $R$ (the response space, which can be finite, countably infinite, an interval or a region in an Euclidean space depending on the situation), which is equipped with a $\sigma$-field on it, and $\mu$ is a $\sigma$-finite measure (which can be the usual counting measure or the Lebesgue measure depending on the situation) on $R$ such that $\int_R f(y\,|\,\theta, x)\,\mu(dy) = 1$. Then the support of $f(y\,|\,\theta, x)$ is $R$ for all possible values of $\theta$ and $x$. Further, for every fixed $x \in \mathcal{X}$ and $y \in R$, $\log\{f(y\,|\,\theta, x)\}$ is thrice continuously differentiable in $\theta$ at any $\theta \in \Theta$.

Condition 2.2. Let $\nabla \log\{f(y\,|\,\theta, x)\} = G(y, \theta, x)$ be the $d$-dimensional gradient vector obtained by computing the first order partial derivatives of $\log\{f(y\,|\,\theta, x)\}$ with respect to $\theta$. Then $\int_R G(y, \theta, x)\, f(y\,|\,\theta, x)\,\mu(dy) = 0$ and $\sup_{x \in \mathcal{X}} \int_R |G(y, \theta, x)|^{2+t}\, f(y\,|\,\theta, x)\,\mu(dy) < \infty$ for some $t > 0$.

Condition 2.3. Let $H(y, \theta, x)$ denote the $d \times d$ Hessian matrix of $\log\{f(y\,|\,\theta, x)\}$ obtained by computing its second order partial derivatives with respect to $\theta$. Then we have
$$\int_R H(y, \theta, x)\, f(y\,|\,\theta, x)\,\mu(dy) = -\int_R \{G(y, \theta, x)\}\{G(y, \theta, x)\}^T f(y\,|\,\theta, x)\,\mu(dy) = -I(\theta, x),$$
where $I(\theta, x)$ is the $d \times d$ Fisher information matrix associated with the model $f(y\,|\,\theta, x)$. Also, $\sup_{x \in \mathcal{X}} \int_R |H(y, \theta, x)|^2\, f(y\,|\,\theta, x)\,\mu(dy) < \infty$.

Condition 2.4. For every $\theta \in \Theta$, there exist an open neighborhood $N(\theta) \subseteq \Theta$ of $\theta$ and a non-negative function $K(y, \theta, x)$ that satisfies $\sup_{x \in \mathcal{X}} \int_R K(y, \theta, x)\, f(y\,|\,\theta, x)\,\mu(dy) < \infty$, and each of the third order partial derivatives of $\log\{f(y\,|\,\theta', x)\}$ with respect to $\theta'$ is dominated by $K(y, \theta, x)$ for all $\theta' \in N(\theta)$.

It is obvious that for a typical model in the class of generalized linear models, Conditions 2.1 through 2.4 will hold. As a matter of fact, these regularity conditions are satisfied whenever the conditional distribution of the response given the regressor is modeled using standard exponential families (see e.g. Lehmann (1983)) or various curved exponential families (see Efron (1975, 1978)). Many important heteroscedastic models, where the response variable follows a normal distribution with a variance that is assumed to be a nonlinear function of the mean (see Box and Hill (1974), Bickel (1978), Jobson and Fuller (1980), Carroll and Ruppert (1988), etc.) with a fixed and known form, are curved exponential models satisfying Conditions 2.1 through 2.4. For a standard nonlinear regression model with the error having a normal distribution with zero mean and a fixed (but possibly unknown) variance, Conditions 2.1 through 2.4 translate into some regularity conditions on the regression function that depends on the unknown parameter of interest in a smooth and nonlinear way. It is easy to see that such regularity conditions on the regression function are closely related to the conditions used in the literature (see e.g. Jenrich (1969), Wu (1981), Gallant (1987), Seber and Wild (1989)) in order to establish $n^{1/2}$-consistency and asymptotic normality of the nonlinear least squares estimate based on independent observations.

Consider next an adaptive sequential experiment in which the $i$th design point $X_i$ is allowed to depend on the past data $(Y_1, X_1), \ldots, (Y_{i-1}, X_{i-1})$ for $i > 1$. Clearly, when data will be generated from such an experiment, we will no longer have a sequence of independent observations, and the standard asymptotics for maximum likelihood estimates based on independent observations will not be applicable. However, the dependence (if any) of $Y_i$ on $(Y_1, X_1), \ldots, (Y_{i-1}, X_{i-1})$ will be through $X_i$. As a result, the likelihood based on $(Y_1, X_1), \ldots, (Y_i, X_i)$ will continue to remain in a product form $\prod_{r=1}^{i} f(Y_r\,|\,\theta, X_r)$ for any $i \geq 1$. It is then obvious that the log-likelihood and its various derivatives (with respect to the parameter $\theta$) with suitable adjustments will give rise to sums of martingale difference sequences, which form the rows of certain triangular arrays (see also Chaudhuri and Mykland (1993)). As usual, define the maximum likelihood estimate based on $(Y_1, X_1), \ldots, (Y_n, X_n)$ as
$$\hat{\theta}_n = \arg\max_{\theta \in \Theta} \prod_{r=1}^{n} f(Y_r\,|\,\theta, X_r).$$
Alternatively, $\hat{\theta}_n$ can be viewed as a root of the likelihood equation
$$\sum_{r=1}^{n} \nabla \log\{f(Y_r\,|\,\theta, X_r)\} = \sum_{r=1}^{n} G(Y_r, \theta, X_r) = 0.$$
In some cases, the likelihood may have multiple maxima, and in those cases the likelihood equation will have multiple roots. However, for generalized linear models, the concavity of the log-likelihood will guarantee the uniqueness of the maximum likelihood estimate for suitable sample sizes. For a standard nonlinear regression problem, appropriate conditions on the regression function will ensure the uniqueness (at least in large samples) of the nonlinear least squares estimate. The following results yield very general sufficient conditions, which are to be satisfied by the sequence of design points, for consistency, $n^{1/2}$ convergence rate and asymptotic normality of the maximum likelihood estimate, when it is based on dependent data arising from an adaptive sequential experiment.
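As a concrete instance of solving the likelihood equation above, the following sketch fits the two-parameter logistic model (the same model used in the simulations of Section 3) by Newton-Raphson. This is a standard illustration, not part of the paper's theory; the product form of the likelihood is unaffected by the design points having been chosen adaptively.

```python
import numpy as np

def logistic_mle(y, x, theta0=(0.0, 0.0), tol=1e-8, max_iter=50):
    """Solve sum_r G(Y_r, theta, X_r) = 0 by Newton-Raphson for the model
    Pr(Y = 1 | X) = exp(b0 + b1 X) / {1 + exp(b0 + b1 X)}."""
    theta = np.asarray(theta0, dtype=float)
    Z = np.column_stack([np.ones_like(x), x])        # rows (1, X_r)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ theta))         # fitted probabilities
        score = Z.T @ (y - p)                        # sum of gradients G
        hess = -Z.T @ (Z * (p * (1 - p))[:, None])   # sum of Hessians H
        step = np.linalg.solve(hess, score)
        theta = theta - step                         # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

For generalized linear models such as this one, the log-likelihood is concave, so the root found by Newton-Raphson is the unique maximum likelihood estimate whenever it exists.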

Result 2.5. Assume that Conditions 2.1 through 2.4 hold, and the sequence of observations $(Y_1, X_1), \ldots, (Y_n, X_n)$ is generated from an adaptive sequential experiment. Let $\lambda_n$ denote the smallest eigenvalue of the average Fisher information matrix $n^{-1} \sum_{r=1}^{n} I(\theta, X_r)$ up to the $n$th trial. Suppose that the design scheme is such that for some positive constant $\alpha < 1/4$, $n^{\alpha} \lambda_n$ remains bounded away from zero in probability as $n$ tends to infinity. Then there exists a choice of the maximum likelihood estimate (i.e. a root of the likelihood equation) which will be weakly consistent.

A proof of Result 2.5, which makes use of certain asymptotic properties of the martingales associated with the likelihood and its derivatives, will be given in the Appendix. A natural question that arises at this point is how to ensure that $n^{\alpha} \lambda_n$ remains bounded away from zero in probability for some positive $\alpha < 1/4$ as $n$ tends to infinity. We will now exhibit a simple way of designing the experiment so that this condition holds. First observe that for several models used in practice, there exists a probability measure $\mu_0$ on $\mathcal{X}$ such that $\int_{\mathcal{X}} I(\theta, x)\,\mu_0(dx)$ is a positive definite matrix for all $\theta \in \Theta$, and frequently $\mu_0$ can be taken to be the uniform probability measure (discrete or continuous depending on the nature of $\mathcal{X}$) on $\mathcal{X}$. Let $k_1 < \cdots < k_m < \cdots$ be an increasing sequence of positive integers such that $m^{-1/(1-\alpha)} k_m$ tends to one as $m$ tends to infinity. Then choose the design points $X_{k_1}, \ldots, X_{k_m}, \ldots$ as independent and identically distributed random elements in $\mathcal{X}$ with $\mu_0$ as their common distribution. Note that these design points are chosen in a non-adaptive manner. Further, these design points can be chosen without depending on the rest of the design points (i.e. the $X_i$'s for which $i \neq k_m$ for any $m$), the latter ones being chosen in some appropriate adaptive way. It is straightforward to verify that such a choice of the design points will guarantee the condition assumed about $\lambda_n$ in the statement of Result 2.5.
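The interleaving just described can be sketched as follows. The concrete experiment space $[0, 1]$ with $\mu_0$ uniform, and the spacing $k_m \approx m^{1/(1-\alpha)}$, are illustrative assumptions on our part; any sequence with $m^{-1/(1-\alpha)} k_m \to 1$ would do, and the adaptive rule filling the remaining trials is left abstract here.

```python
import numpy as np

def nonadaptive_indices(n, alpha=0.2):
    """Trial indices k_m ~ m**(1/(1-alpha)) at which the design point is
    drawn i.i.d. from mu_0; roughly n**(1-alpha) of the first n trials are
    forced, so the average information decays no faster than n**(-alpha)."""
    ks, m = [], 1
    while True:
        k = int(round(m ** (1.0 / (1.0 - alpha))))
        if k > n:
            break
        ks.append(k)
        m += 1
    return sorted(set(ks))

def choose_design(n, alpha=0.2, rng=None):
    """Hypothetical experiment space: X uniform on [0, 1] plays mu_0.
    Non-forced trials are returned as NaN placeholders, to be filled by
    whatever adaptive rule the experimenter prefers."""
    rng = rng or np.random.default_rng()
    forced = set(nonadaptive_indices(n, alpha))
    design = np.empty(n)
    for i in range(1, n + 1):
        design[i - 1] = rng.uniform(0.0, 1.0) if i in forced else np.nan
    return design, forced
```

Since the forced indices thin out polynomially, the fraction of non-adaptive trials tends to zero while still keeping $n^{\alpha}\lambda_n$ bounded away from zero.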

n

Note at this p oint that the condition assumed in Result 2.5 on the minimum

eigenvalue of the average Fisher information matrix is weaker than asymptotic

p ositive de niteness of the average information. However, it is sucientforweak

consistency of the maximum likeliho o d estimate. In the next couple of results, we

1=2

will consider sucient conditions for n -consistency and asymptotic normality

of the maximum likeliho o d estimate.

Result 2.6. Assume that Conditions 2.1 through 2.4 hold, and the observations $(Y_1, X_1), \ldots, (Y_n, X_n)$ are generated by a sequence of adaptive sequential trials. Suppose that the design scheme is such that the average Fisher information up to the $n$th trial $n^{-1} \sum_{r=1}^{n} I(\theta, X_r)$ converges in probability to a nonrandom positive definite matrix $A$ as $n$ tends to infinity. Then there exists a choice of the maximum likelihood estimate $\hat{\theta}_n$ such that as $n$ tends to infinity, $n^{1/2}(\hat{\theta}_n - \theta)$ converges weakly to a $d$-dimensional normal random vector with zero mean and $A^{-1}$ as the dispersion matrix.

A proof of this result, which utilizes a standard martingale central limit theorem, will be given in the Appendix. The following is an easy consequence of Result 2.6, and is quite useful in statistical applications.

Result 2.7. Assume that $I(\theta, x)$ is a continuous function of both of its arguments, where $\theta$ varies in $\Theta$ and $x$ varies in the compact space $\mathcal{X}$. Then under the conditions assumed in Result 2.6, $n^{-1} \sum_{r=1}^{n} I(\hat{\theta}_n, X_r)$ converges to $A$ in probability as $n$ tends to infinity for an appropriate choice of the maximum likelihood estimate $\hat{\theta}_n$, and consequently the weak limit of $\{\sum_{r=1}^{n} I(\hat{\theta}_n, X_r)\}^{1/2} (\hat{\theta}_n - \theta)$ will be $d$-dimensional normal with zero mean and the $d \times d$ identity matrix as the dispersion matrix. As a result, the confidence ellipsoid for the parameter $\theta$, which can be constructed using the $\chi^2$ distribution with $d$ degrees of freedom and based on the maximum likelihood estimate $\hat{\theta}_n$ together with the estimated total Fisher information up to the $n$th trial $\sum_{r=1}^{n} I(\hat{\theta}_n, X_r)$, will asymptotically have the right coverage probability.
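The ellipsoid check implied by Result 2.7 can be sketched directly. The chi-square critical value below is hard-coded for $d = 2$ at level $0.95$, and the inputs are hypothetical; in practice `total_info` is the estimated total Fisher information $\sum_{r=1}^n I(\hat{\theta}_n, X_r)$ from the experiment.

```python
import numpy as np

CHI2_95_DF2 = 5.991  # 0.95 quantile of the chi-square distribution, d = 2

def in_confidence_ellipsoid(theta0, theta_hat, total_info, crit=CHI2_95_DF2):
    """Membership test for the asymptotic confidence ellipsoid
    { theta : (theta_hat - theta)^T [sum_r I(theta_hat, X_r)] (theta_hat - theta) <= crit },
    which has the stated coverage even though the design was adaptive."""
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta0, dtype=float)
    q = float(diff @ np.asarray(total_info) @ diff)
    return q <= crit
```

The point of the result is precisely that this familiar fixed-design recipe needs no adjustment for the sequential selection of the design points.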


A particular adaptive procedure for designing nonlinear experiments was proposed in Chaudhuri and Mykland (1993), and it has been shown by these authors to satisfy the limiting condition on the design sequence assumed in Results 2.6 and 2.7. Results 2.5 through 2.7 appear to be the most general sufficient conditions that are available at this moment for consistency, $n^{1/2}$ convergence rate and asymptotic normality of the maximum likelihood estimate constructed from dependent observations generated by a sequentially designed adaptive experiment in a nonlinear set up. The fundamental implication of Results 2.6 and 2.7 is that as long as the weak limit of the average Fisher information, which happens to be a random object in finite samples in view of the adaptive sequential nature of the design, is degenerate and positive definite, valid asymptotic inference is possible based on the maximum likelihood estimate just like the large sample theory in fixed design problems, where successive observations are independent. In other words, the adaptive sequential nature of the experiment can essentially be ignored for making large sample inference even though there will be randomness in selected design points and dependence in the sequence of observations in such an experiment. It has been established in the literature (see e.g. Johnson (1970), Johnson and Ladalla (1979), LeCam (1986), Prakasa Rao (1987)) that under some standard smoothness conditions (including some Cramer type conditions) on the likelihood and the prior distribution (i.e. a probability measure on the parameter space $\Theta$) many desirable asymptotic properties (in the frequentist sense) hold for the Bayes estimate and related posterior inference based on independent data (e.g. $n^{1/2}$-consistency and asymptotic normality of the Bayes estimate, consistency of the posterior and its asymptotic expansion, asymptotic accuracy of the Bayesian credible region constructed through highest posterior density, etc.). A natural question, which arises at this point and is currently being investigated by the authors, is to what extent analogous asymptotic results will hold if Bayesian techniques are applied to dependent data arising from adaptive sequential trials. Recently Woodroofe (1989) has explored some issues closely related to this in the context of adaptive linear models that arise in some sequentially designed experiments.

The weak convergence of the average Fisher information to a degenerate positive definite limit is a crucial ergodicity condition related to the growth of the martingale processes intrinsically associated with the likelihood (see also Hall and Heyde (1980), Sweeting (1980, 1983), Prakasa Rao (1987), who considered maximum likelihood estimation in various dependent processes). It is desirable that the chosen design ensures optimality of that weak limit with respect to some suitable optimal design criterion. Then Results 2.6 and 2.7 together with such an asymptotic optimality (if it holds) of the generated design will guarantee first order efficiency of likelihood based parameter estimates and related inference in sequentially designed adaptive experiments. In the following section, we consider limiting properties of design sequences generated by some adaptive procedures in terms of the asymptotic behavior of the average Fisher information matrix $n^{-1} \sum_{r=1}^{n} I(\theta, X_r)$ and establish the large sample optimality of those procedures with respect to a broad and useful class of optimal design criteria.

3. Asymptotic Optimality of Designs

Consider the following real valued functions defined on the space of $d \times d$ positive definite matrices. They are frequently used to define standard optimal design criteria based on the Fisher information matrix associated with an experiment.

(1) $\phi(A) = -\log\{\det(A)\}$ (D-optimality criterion).

(2) $\phi(A) = \mathrm{trace}(A^{-1})$ (A-optimality criterion).

(3) $\phi(A) = \mathrm{trace}(A^{-p})$ (Kiefer's $\phi_p$-optimality criterion with $p$ being a fixed positive integer).

An important property of each of these functions is that it is continuous, strictly convex and remains bounded below as $A$ varies over any convex and bounded subset of the space of $d \times d$ positive definite matrices (see e.g. Kiefer (1974)).
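As a quick illustration, the three criterion functions can be evaluated directly on an information matrix; in each case smaller values indicate a better design. This is a sketch for orientation only.

```python
import numpy as np

def d_criterion(A):
    """phi(A) = -log det(A): the D-optimality criterion."""
    return -np.log(np.linalg.det(A))

def a_criterion(A):
    """phi(A) = trace(A^{-1}): the A-optimality criterion."""
    return np.trace(np.linalg.inv(A))

def phi_p_criterion(A, p):
    """phi(A) = trace(A^{-p}): Kiefer's phi_p family; p = 1 recovers
    the A-criterion."""
    return np.trace(np.linalg.matrix_power(np.linalg.inv(A), p))
```

Convexity of each $\phi$ in $A$ is what makes the minimization over the design space below well behaved.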

Suppose now that $\mathcal{X}$ is a compact metric space, and $I(\theta, x)$ is a continuous function of both of its arguments. Following Kiefer (1959, 1961) and Kiefer and Wolfowitz (1959), let the design space $D(\mathcal{X})$ be defined as the collection of all possible probability measures on $\mathcal{X}$. We will assume that for every $\theta \in \Theta$, there exists a $\xi \in D(\mathcal{X})$ such that the matrix $\int_{\mathcal{X}} I(\theta, x)\,\xi(dx)$ is positive definite. Next we define a locally $\phi$-optimal design (see also Chernoff (1953)) $\xi^* \in D(\mathcal{X})$ at $\theta$ as
$$\phi\left\{\int_{\mathcal{X}} I(\theta, x)\,\xi^*(dx)\right\} = \min_{\xi \in D(\mathcal{X})} \phi\left\{\int_{\mathcal{X}} I(\theta, x)\,\xi(dx)\right\},$$
where $\phi$ is as defined in (1) or (2) or (3) at the beginning of this section. Then, in view of the compactness of the metric space $\mathcal{X}$ and the continuity of $I(\theta, x)$, such a locally $\phi$-optimal $\xi^*$, which depends on $\theta$, will always exist though it may not be unique. However, the locally $\phi$-optimal Fisher information $\int_{\mathcal{X}} I(\theta, x)\,\xi^*(dx)$ will be a uniquely defined positive definite matrix in view of the strict convexity of $\phi$. Therefore, it is meaningful to look for adaptive procedures generating the design sequence $X_1, \ldots, X_n$ in such a way that the average Fisher information $n^{-1} \sum_{r=1}^{n} I(\theta, X_r)$ converges weakly to $\int_{\mathcal{X}} I(\theta, x)\,\xi^*(dx)$ as $n$ tends to infinity. From now on we will assume that the following condition holds.

Condition 3.1. $\mathcal{X}$ is a compact metric space, and $I(\theta, x)$ has the form $I(\theta, x) = \{V(\theta, x)\}\{V(\theta, x)\}^T$ for all $\theta \in \Theta$ and $x \in \mathcal{X}$, where $V(\theta, x)$ is a continuous $R^d$-valued function of both of its arguments.

Observe that this condition will be trivially satisfied for typical experiment spaces used in practice (e.g. when $\mathcal{X}$ is a finite set or a compact set in an Euclidean space or some kind of a mixture of the two). Also, for Fisher information matrices associated with frequently occurring nonlinear models (e.g. standard nonlinear regression models, generalized linear models), it is easy to see that this condition holds (see also some of the remarks in Chaudhuri and Mykland (1993, p. 640)). In Section 3.1 that follows, we will explore the asymptotic performance of adaptive schemes for $\phi$-optimal sequential designs, which generalize the D-optimal design considered by Chaudhuri and Mykland (1993). We will establish some desirable asymptotic properties of the generated design sequence under suitable regularity conditions.

3.1. Adaptive schemes for regular optimal design criteria

We now concentrate on experiments where the value of $n$ is determined by the available resources, and is known to the experimenter at the beginning. We will carry out $n_1$ of these $n$ trials at the initial stage when either very little or almost no information on the parameter of interest is available. Let us denote the first $n_1$ design points by $X_1, \ldots, X_{n_1}$, which are to be chosen in a static (i.e. non-dynamic or non-adaptive) fashion. In the absence of any prior knowledge about the parameter $\theta$, these initial design points can be selected systematically to make them evenly distributed in $\mathcal{X}$. Alternatively, the uniform distribution on $\mathcal{X}$ can be used to generate $n_1$ i.i.d random points in $\mathcal{X}$. In some situations, reliable and adequate prior information on $\theta$ may be available. In those cases, numerous theoretical and empirical results available in the literature (see e.g. Atkinson and Hunter (1968), Box (1968, 1970), Rasch (1990), Ford, Torsney and Wu (1992), Haines (1993)) on locally optimal designs for nonlinear experiments can provide useful guidelines for selecting the initial design points incorporating such prior information. Next, for each $i$ such that $n_1 < i \leq n$, the $i$th design point $X_i$ is to be chosen in a dynamic fashion using the adaptive sequential procedure that minimizes the quadratic form
$$\{V(\theta^*_i, X_i)\}^T \left[\nabla\phi\left\{(i-1)^{-1} \sum_{r=1}^{i-1} I(\theta^*_i, X_r)\right\}\right] \{V(\theta^*_i, X_i)\}.$$
Here $\theta^*_i$ is an estimate of $\theta$ based on data available up to the current stage (i.e. the observations $(Y_1, X_1), \ldots, (Y_{i-1}, X_{i-1})$ obtained at the $(i-1)$th trial and before), and $\nabla\phi$ denotes the derivative matrix of $\phi$ as defined and discussed in Wu and Wynn (1978, pp. 1274-1275). For illuminating discussions on some related algorithms for sequential generation of design points for special as well as general optimality criteria, the reader is referred to Wynn (1970, 1972), Fedorov (1972), Atwood (1973, 1976), Tsay (1976), Pazman (1986), Kitsos (1989), Robertazzi and Schwartz (1989), etc. Clearly, the selection rule for $X_i$, where $n_1 < i \leq n$, is myopic in nature as it looks only one step into the future beyond the present (note that since we are selecting only one design point at any particular stage of selection, the implementation of the algorithm becomes quite convenient). Nevertheless, the following result, which generalizes an earlier D-optimality result in Chaudhuri and Mykland (1993, Theorem 3.5), establishes a set of simple sufficient conditions that ensure asymptotic $\phi$-optimality of such a myopic scheme. A proof of this result, which utilizes some key technical results developed in Wu and Wynn (1978) in the course of their exploration of the convergence properties of general step-length algorithms for regular optimal design criteria, will be given in the Appendix.
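For the D-criterion $\phi(A) = -\log\det(A)$, the derivative matrix is $\nabla\phi(A) = -A^{-1}$, so minimizing the quadratic form above amounts to maximizing the familiar variance function $V^T A^{-1} V$. A minimal sketch of one myopic step follows; the finite candidate grid standing in for $\mathcal{X}$ is our simplification for illustration.

```python
import numpy as np

def next_design_point(candidates, past_info, V):
    """One myopic step of the adaptive rule under the D-criterion.
    past_info : average Fisher information (i-1)^{-1} sum_r I(theta*, X_r)
    V         : function x -> V(theta*, x), where I(theta*, x) = V V^T
    candidates: finite grid standing in for the experiment space."""
    M_inv = np.linalg.inv(past_info)
    # maximizing V^T M^{-1} V == minimizing V^T (-M^{-1}) V
    scores = [float(V(x) @ M_inv @ V(x)) for x in candidates]
    return candidates[int(np.argmax(scores))]
```

For the two-parameter logistic model of the simulations below, $V(\theta, x) = \{p(x)(1 - p(x))\}^{1/2}\,(1, x)^T$ with $p(x)$ the success probability, since $I(\theta, x) = p(1-p)\,(1, x)^T(1, x)$.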

Result 3.2. Assume that Condition 3.1 holds, and the initial static experiment is chosen in such a way that $n_1$ tends to infinity, $n_1/n$ tends to zero, and the smallest eigenvalue of the matrix $n_1^{-1} \sum_{r=1}^{n_1} I(\theta, X_r)$ remains bounded away from zero as $n_1$ goes to infinity. Further, assume that the estimates $\theta^*_i$'s ($n_1 < i \leq n$) used in the adaptive sequential stage of the experiment satisfy the following consistency and stability conditions.

(a) $\max_{n_1 < i \leq n} \Pr\{|\theta^*_i - \theta| > \epsilon\}$ tends to zero as $n$ tends to infinity for any $\epsilon > 0$.

(b) The sum
$$\sum_{i=n_1+1}^{n-1} \left|\, \phi\left\{ i^{-1} \sum_{r=1}^{i} I(\theta^*_{i+1}, X_r) \right\} - \phi\left\{ i^{-1} \sum_{r=1}^{i} I(\theta^*_i, X_r) \right\} \right|$$
remains bounded in probability as $n$ tends to infinity.

Then, if the design points $X_i$'s with $n_1 < i \leq n$ are chosen by the adaptive sequential procedure described above, the average Fisher information $n^{-1} \sum_{r=1}^{n} I(\theta, X_r)$ converges in probability to the locally $\phi$-optimal (at $\theta$, the true value of the unknown parameter) Fisher information $\int_{\mathcal{X}} I(\theta, x)\,\xi^*(dx)$ as $n$ tends to infinity.

It will be appropriate to note here that in view of some of the remarks in Chaudhuri and Mykland (1993, p. 543), it is not difficult to choose the initial design sequence properly so that the conditions mentioned at the beginning of the statement of Result 3.2 will be satisfied. Also, it is possible to show that the estimates $\theta^*_i$'s ($n_1 < i \leq n$) will satisfy conditions (a) and (b) if we follow the principal ideas behind some of the explicit constructions given by Chaudhuri and Mykland (1993, p. 543), who restricted their attention to D-optimal designs and used some similar conditions. For an insight into some of the consequences and implications of these two technical conditions, the reader is referred to Chaudhuri and Mykland (1993, Section 3). An important issue that remains unresolved at this moment is the effect of the choices of the initial static experiment and the parameter estimates $\theta^*_i$'s on the finite sample accuracy of the maximum likelihood estimate $\hat{\theta}_n$. The authors are currently looking into certain martingale expansions related to the log-likelihood and its derivatives in an attempt to explore beyond the asymptotics that is achievable through a martingale central limit theorem. It is hoped that such expansions will provide deeper insights into the problem leading towards more explicit guidelines for choosing the initial static experiment and a more effective construction of the estimates $\theta^*_i$'s.

i

Recently McLeish and Tosh (1990) have reported an extensive simulation study on some specific sequential designs in bioassay. To investigate the finite sample performance of our adaptive sequential designs, we ran some simulations with the two parameter simple linear logistic model (i.e. $\Pr(Y = 1 \mid X) = \exp(\theta_0 + \theta_1 X)\{1 + \exp(\theta_0 + \theta_1 X)\}^{-1}$). The real valued covariate $X$ was allowed to vary in the unit interval $[0, 1]$. It is possible to show in this case (see e.g. Minkin (1987)) that the D-optimal design is uniform and supported on a two point subset of $[0, 1]$. This two point subset varies depending on the parameter vector $\theta = (\theta_0, \theta_1)$. Specifically, when $\theta = (0.0, 0.0)$, the optimal design is the uniform distribution on $\{0.0, 1.0\}$, while for $\theta = (3.0, 5.0)$, it is uniform on $\{0.0, 0.41\}$. We decided to take $n = 55$, and the size of the initial static design was chosen to be $n_1 = 15$ with the first 15 design points evenly placed in the interval $[0, 1]$. After that, for $n_1 < i \le n$, the design points $X_i$'s were determined one by one using the adaptive D-optimality criterion as described at the beginning of this Section 3.1. We chose $\bar\theta_i$ = the maximum likelihood estimate based on $(Y_1, X_1), \ldots, (Y_j, X_j)$ if $j < i \le j + 10$ for $j = 15, 25, 35, 45$; in other words, the estimate was updated after the 15th, the 25th, the 35th and the 45th trials. In the case $\theta = (0.0, 0.0)$, the values of the sequentially selected design points $X_{16}$ through $X_{55}$ were observed to alternate between 0.0 and 1.0, while for $\theta = (3.0, 5.0)$ they were alternating between 0.0 and 0.41. Even with several other choices of $\theta$, we always observed the sequentially chosen values of $X_{16}$ through $X_{55}$ oscillating between two supporting points of the true optimal design. Even though the maximum likelihood estimates computed at various stages happened to be somewhat away from the true parameters due to variations caused by the randomness in the data, the adaptive sequential procedure did not miss the optimal design in any of the simulations that we ran.
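The simulation just described can be sketched in code. The following is an illustrative reconstruction, not the authors' program: the candidate grid, the random seed, the Newton-Raphson details and all function names are our own assumptions. The adaptive step greedily maximizes the determinant of the accumulated Fisher information evaluated at the current estimate.

```python
import math
import random

def prob(theta, x):
    """Pr(Y = 1 | x) under the two-parameter logistic model."""
    eta = max(-30.0, min(30.0, theta[0] + theta[1] * x))  # guard against overflow
    return 1.0 / (1.0 + math.exp(-eta))

def info(theta, x):
    """Single-trial Fisher information matrix I(theta, x)."""
    w = prob(theta, x) * (1.0 - prob(theta, x))
    return [[w, w * x], [w * x, w * x * x]]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mle(data, start=(0.0, 0.0), steps=25):
    """Newton-Raphson for the two-parameter logistic likelihood."""
    t0, t1 = start
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, x in data:
            p = prob((t0, t1), x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w; h01 += w * x; h11 += w * x * x
        d = h00 * h11 - h01 * h01
        if d < 1e-12:            # nearly singular information: stop updating
            break
        t0 += (h11 * g0 - h01 * g1) / d
        t1 += (h00 * g1 - h01 * g0) / d
    return (t0, t1)

def adaptive_design(theta_true, n=55, n1=15, seed=1):
    """One simulated experiment; returns the n - n1 adaptively chosen x's."""
    rng = random.Random(seed)
    draw = lambda x: int(rng.random() < prob(theta_true, x))
    # static design: n1 points evenly placed in [0, 1]
    data = [(draw(k / (n1 - 1.0)), k / (n1 - 1.0)) for k in range(n1)]
    grid = [k / 100.0 for k in range(101)]      # candidate design points
    theta_bar = mle(data)
    for i in range(n1 + 1, n + 1):
        if i - 1 in (15, 25, 35, 45):           # update the estimate every 10 trials
            theta_bar = mle(data, theta_bar)
        m = [[0.0, 0.0], [0.0, 0.0]]
        for _, x in data:
            m = mat_add(m, info(theta_bar, x))
        # greedy D-optimality: maximize det of the augmented information
        x_next = max(grid, key=lambda x: det(mat_add(m, info(theta_bar, x))))
        data.append((draw(x_next), x_next))
    return [x for _, x in data[n1:]]
```

With `theta_true = (0.0, 0.0)` the adaptively chosen points tend to pile up on the two support points of the locally D-optimal design, mirroring the oscillation between 0.0 and 1.0 reported above.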

3.2. Some technical issues and concluding remarks

Result 3.2 asserts the existence of asymptotically $\Phi$-optimal designs that will guarantee asymptotic normality as well as first order efficiency of the maximum likelihood estimate based on dependent data arising from adaptive and sequentially designed nonlinear experiments. However, it does not give the freedom to do it in an arbitrary way. All of the $n_1$ static design points have to be at the beginning of the experiment, and in the general case one may have to use a batch sequential scheme rather than a fully sequential approach (see also some of the remarks in Chaudhuri and Mykland (1993, Section 3, p. 543)). This raises the question whether it is possible to have fully adaptive and sequential schemes that will lead to asymptotically $\Phi$-optimal designs. A partial answer to this question is provided by the following Result.


Result 3.3. Assume that the parameter space $\Theta$ is a compact metric space, $I(\theta, x)$ is a continuous function of both of its arguments, and there exists a design $\xi_0$ on the design space such that $\int I(\theta, x)\,\xi_0(dx)$ is positive definite for all $\theta \in \Theta$. With the exception of the static design points $X_{k_1}, \ldots, X_{k_m}, \ldots$ that are chosen following the idea described in the remark immediately after Result 2.5, choose the design point $X_i$ ($i \ne k_m$ for any $m \ge 1$) using the probability distribution $\xi^*_i$ on the design space. Here $\xi^*_i$ denotes a locally $\Phi$-optimal design at $\bar\theta_i$, and the $\bar\theta_i$'s form a sequence of consistent estimates of $\theta$ such that $\bar\theta_i$ is based on the data $(Y_1, X_1), \ldots, (Y_{i-1}, X_{i-1})$ obtained up to the $(i-1)$th trial. Then the average Fisher information $n^{-1}\sum_{r=1}^{n} I(\theta, X_r)$ converges in probability to the locally $\Phi$-optimal (at $\theta$, the true value of the unknown parameter) Fisher information $\int I(\theta, x)\,\xi^*(dx)$ as $n$ tends to infinity.



Note that in view of Result 2.5 and the remark following it, $\bar\theta_i$ can be taken to be the maximum likelihood estimate based on data up to the $(i-1)$th trial. However, it is not at all clear how good or useful the procedure outlined in Result 3.3 is. In practice, it may be difficult to determine $\xi^*_i$ (i.e. a locally $\Phi$-optimal design at $\bar\theta_i$) explicitly, and generating a random element from the design space using the probability measure $\xi^*_i$ may be even harder. We therefore recommend the scheme described in Result 3.2 whenever possible. Nevertheless, it is of substantial theoretical interest that an asymptotically $\Phi$-optimal adaptive sequential design as described in Result 3.3 can be established under such minimal conditions. Note that for Result 3.3, we only need the continuity of the information matrix $I(\theta, x)$, and no special structure (like the one assumed in Condition 3.1) is necessary. In particular, this result is applicable to many heteroscedastic regression models, where the conditional distribution of the response given the regressor is assumed to be normal with a variance that depends on the mean in a nonlinear way. Result 3.3 does make one wonder if it is possible to find more convenient, fully adaptive and sequential designs that will ensure first order efficiency of the maximum likelihood estimate of a parameter in a smooth and regular nonlinear model.
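To make the locally $\Phi$-optimal designs of Result 3.3 concrete in at least one simple case, here is a small grid search (our own illustration; the function names and the grid resolution are assumptions) for the locally D-optimal two-point design in the logistic model of Section 3.1. It uses the standard fact that, for a two-parameter model, an optimal two-point design puts mass 1/2 on each support point, so maximizing the determinant of the average information reduces to maximizing $p_1(1-p_1)\,p_2(1-p_2)\,(x_1-x_2)^2$.

```python
import math
from itertools import combinations

def prob(theta, x):
    """Success probability of the logistic model at x."""
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * x)))

def d_optimal_two_point(theta, npts=101):
    """Grid search for the locally D-optimal two-point design on [0, 1].

    With mass 1/2 on each support point, the determinant of the average
    information is, up to a constant, p1*(1-p1) * p2*(1-p2) * (x1 - x2)**2.
    """
    grid = [k / (npts - 1.0) for k in range(npts)]
    def crit(pair):
        x1, x2 = pair
        w1 = prob(theta, x1) * (1.0 - prob(theta, x1))
        w2 = prob(theta, x2) * (1.0 - prob(theta, x2))
        return w1 * w2 * (x1 - x2) ** 2
    return max(combinations(grid, 2), key=crit)
```

`d_optimal_two_point((0.0, 0.0))` recovers the support $\{0.0, 1.0\}$, and `d_optimal_two_point((3.0, 5.0))` returns a pair close to $\{0.0, 0.41\}$, matching the supports from Minkin (1987) quoted in Section 3.1.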

Finally, we would like to close this section by pointing out that the sequence of non-adaptive design points $X_{k_1}, \ldots, X_{k_m}, \ldots$ is introduced here to ensure consistency of the maximum likelihood estimate. While this is a convenient strategy that can be used to achieve consistency in a very general set up, one should be able to avoid it if necessary, and use more natural and easier methods in specific situations. Note that in view of the fact that $k_m m^{-1/(1-\gamma)}$ tends to one as $m$ tends to infinity (here $\gamma$ denotes the constant used in the construction of the $k_m$'s in the remark following Result 2.5), the sequence $X_{k_1}, \ldots, X_{k_m}, \ldots$ does not affect the first order asymptotic properties of the $n^{1/2}$-consistent maximum likelihood estimate $\hat\theta_n$.

Acknowledgements

The research of Probal Chaudhuri was partially supported by a grant from the Indian Statistical Institute. The research of Per A. Mykland was partially supported by NSF grant DMS 93-05601. The authors thank the Referee and the Associate Editor for several helpful comments.

Appendix: The Proofs

Proof of Result 2.5. Let us introduce an increasing sequence of $\sigma$-fields $\mathcal{F}_{ni}$'s with $1 \le i \le n$ such that $\mathcal{F}_{ni}$ is generated by $Y_1, \ldots, Y_i$ (in other words, $\sigma(Y_1, \ldots, Y_i) = \mathcal{F}_{ni}$). Then it follows from Condition 2.2 that $\{\sum_{r=1}^{i} G(Y_r, \theta, X_r), \mathcal{F}_{ni}\}_{1 \le i \le n}$ is a square integrable martingale. Further, we must have

$$E\Big|\sum_{r=1}^{n} G(Y_r, \theta, X_r)\Big|^2 = \sum_{r=1}^{n} E|G(Y_r, \theta, X_r)|^2 = O(n) \quad \text{as } n \to \infty,$$

so that

$$\sum_{r=1}^{n} G(Y_r, \theta, X_r) = O_P(n^{1/2}) \quad \text{as } n \to \infty. \tag{A.1}$$

Next observe that $\{\sum_{r=1}^{i} [H(Y_r, \theta, X_r) + I(\theta, X_r)], \mathcal{F}_{ni}\}_{1 \le i \le n}$ is another square integrable martingale, and therefore a similar argument as above implies that

$$\sum_{r=1}^{n} [H(Y_r, \theta, X_r) + I(\theta, X_r)] = O_P(n^{1/2}) \quad \text{as } n \to \infty. \tag{A.2}$$

For any $\delta > 0$, let $N_\delta(\theta)$ denote the neighborhood with $\theta$ as the center and $\delta$ as the radius. It is now easy to see, using Condition 2.4 and the condition assumed on $\alpha$ in the statement of the result, that if we choose $\delta_n = n^{-\eta}$, where $\alpha < \eta < 1/2$, the smallest eigenvalue of the Hessian matrix of $n^{-1}\sum_{r=1}^{n} \log\{f(Y_r \mid \theta', X_r)\}$ will remain negative and bounded away from zero in probability as $n$ tends to infinity for all $\theta' \in N_{\delta_n}(\theta)$. In other words, we will have the following:

$$\lim_{n \to \infty} \Pr\Big\{\sum_{r=1}^{n} \log\{f(Y_r \mid \theta', X_r)\} \text{ is concave for } \theta' \in N_{\delta_n}(\theta)\Big\} = 1. \tag{A.3}$$

Consider next the following third order Taylor expansion of the log-likelihood around the true parameter $\theta$:

$$\sum_{r=1}^{n} \log\{f(Y_r \mid \theta', X_r)\} = \sum_{r=1}^{n} \log\{f(Y_r \mid \theta, X_r)\} + (\theta' - \theta)^T \sum_{r=1}^{n} G(Y_r, \theta, X_r) + 2^{-1}(\theta' - \theta)^T \Big\{\sum_{r=1}^{n} H(Y_r, \theta, X_r)\Big\}(\theta' - \theta) + R_n(\theta', \theta). \tag{A.4}$$

Here, in view of Condition 2.4, the remainder term in (A.4) will satisfy

$$\sup_{\theta' : |\theta' - \theta| \le \delta_n} |R_n(\theta', \theta)| = O_P(n\delta_n^3) \quad \text{as } n \to \infty. \tag{A.5}$$

Results (A.1) through (A.5) now imply that the probability of the event that the likelihood equation has a root in the interior of $N_{\delta_n}(\theta)$ will tend to one as $n$ tends to infinity. This completes the proof.
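The order comparison behind the last step can be written out (our own unpacking; $c > 0$ is a generic constant, and we use $\delta_n = n^{-\eta}$ with $\eta < 1/2$ as in the proof). On the boundary $|\theta' - \theta| = \delta_n$:

```latex
(\theta'-\theta)^T \sum_{r=1}^{n} G(Y_r,\theta,X_r) = O_P(n^{1/2-\eta}), \qquad
R_n(\theta',\theta) = O_P(n^{1-3\eta}),
```

```latex
2^{-1}(\theta'-\theta)^T \Big\{\sum_{r=1}^{n} H(Y_r,\theta,X_r)\Big\}(\theta'-\theta)
\le -c\, n^{1-2\eta} \quad \text{with probability tending to one.}
```

Since $1-2\eta$ exceeds both $1/2-\eta$ and $1-3\eta$ whenever $0 < \eta < 1/2$, the negative quadratic term dominates, so the log-likelihood is smaller everywhere on the boundary of $N_{\delta_n}(\theta)$ than at $\theta$; combined with the concavity in (A.3), this forces a root of the likelihood equation in the interior of the neighborhood.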

Proof of Result 2.6. Applying Result 2.5, let $\hat\theta_n$ be a weakly consistent solution of the likelihood equation. Consider the following first order Taylor expansion of $n^{-1}\sum_{r=1}^{n} G(Y_r, \hat\theta_n, X_r)$ around the true parameter $\theta$:

$$0 = n^{-1}\sum_{r=1}^{n} G(Y_r, \hat\theta_n, X_r) = n^{-1}\sum_{r=1}^{n} G(Y_r, \theta, X_r) + \Big\{n^{-1}\sum_{r=1}^{n} H(Y_r, \theta, X_r) + \Delta_n(\omega)\Big\}(\hat\theta_n - \theta), \tag{A.6}$$

where $\Delta_n(\omega)$ is a $d \times d$ random matrix such that $|\Delta_n(\omega)|$ tends to zero in probability as $n$ tends to infinity, in view of Condition 2.4 and the weak consistency of $\hat\theta_n$. Further, in view of Conditions 2.2 and 2.3, the design condition in the statement of the result and the martingale central limit result stated in Corollary 3.1 in Hall and Heyde (1980, pp. 58-59), we conclude that $n^{-1/2}\sum_{r=1}^{n} G(Y_r, \theta, X_r)$ converges weakly to a $d$-dimensional normal random vector with zero mean and $A$ as the dispersion matrix. Note that Corollary 3.1 in Hall and Heyde (1980, pp. 58-59) is stated for real valued martingales, and we need to argue here via the well known Cramer-Wold device. The conditional Lindeberg condition that is needed for applying this Corollary is satisfied because of the moment restriction imposed in Condition 2.2. Also, the condition on the conditional variance process assumed in that corollary can be easily verified using Condition 2.3 and the design condition assumed in the statement of the result. Finally, using (A.2) we get that $n^{-1}\sum_{r=1}^{n} H(Y_r, \theta, X_r)$ must converge in probability to $-A$, where $A$ is the non-random positive definite matrix above. The proof is now complete using (A.6).
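To spell out the final step (our own unpacking, in the notation of the proof): the matrix in braces in (A.6) converges in probability to $-A$, while $n^{-1/2}\sum_{r=1}^{n} G(Y_r,\theta,X_r)$ converges weakly to $N_d(0, A)$, so

```latex
n^{1/2}(\hat\theta_n - \theta)
= -\Big\{n^{-1}\sum_{r=1}^{n} H(Y_r,\theta,X_r) + o_P(1)\Big\}^{-1}
   n^{-1/2}\sum_{r=1}^{n} G(Y_r,\theta,X_r)
\;\xrightarrow{d}\; A^{-1} N_d(0, A) = N_d(0, A^{-1}),
```

i.e. the maximum likelihood estimate is asymptotically normal with dispersion matrix $A^{-1}$, the inverse of the limiting average Fisher information.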

Proof of Result 2.7. The weak consistency of $\hat\theta_n$ implies that for any $\delta > 0$, the probability of the event $\{\hat\theta_n \in N_\delta(\theta)\}$ will tend to one as $n$ tends to infinity. The continuity of the Fisher information matrix in both of its arguments ensures its uniform continuity as the two arguments vary in compact subsets contained in their respective domains. In particular, we must have

$$\max_{1 \le r \le n} \big| I(\hat\theta_n, X_r) - I(\theta, X_r) \big| \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{A.7}$$

The weak convergence of the estimated average information $n^{-1}\sum_{r=1}^{n} I(\hat\theta_n, X_r)$ to $A$ and subsequent assertions in the statement of the result now follow immediately from (A.7) and Result 2.6.

Proof of Result 3.2. We begin by introducing some notations. Suppose that the design points $X_i$'s are generated following the strategy described before the statement of the result. For any $\theta' \in \Theta$ and $1 \le m \le n$, let us write $J_m(\theta') = m^{-1}\sum_{r=1}^{m} I(\theta', X_r)$, and set $J^* = \int I(\theta, x)\,\xi^*(dx)$, where $\theta$ is the true parameter and $\xi^*$ is a locally $\Phi$-optimal design at $\theta$. Also, following Wu and Wynn (1978), we define

$$\nabla(M_1, M_2) = \lim_{\alpha \to 0+} \frac{\partial}{\partial \alpha} \Phi\{(1-\alpha)M_1 + \alpha M_2\}$$

and

$$\nabla^2(M_1, M_2) = \lim_{\alpha \to 0+} \frac{\partial^2}{\partial \alpha^2} \Phi\{(1-\alpha)M_1 + \alpha M_2\},$$

where $M_1$ and $M_2$ are positive definite matrices. Since each of the functions defined in (1), (2) and (3) at the beginning of Section 3 is nicely differentiable, we will have $\nabla(M_1, M_2) = \mathrm{trace}\{\nabla\Phi(M_1)(M_2 - M_1)\}$. Here, for $M = (m_{ij})$, the derivative $\nabla\Phi(M)$ of $\Phi$ evaluated at $M$ is defined as $\nabla\Phi(M) = \big(\partial \Phi(M)/\partial m_{ij}\big)$. A second order Taylor expansion now yields the following for $n_1 \le i \le n-1$:

$$\Phi\{J_{i+1}(\bar\theta_{i+1})\} = \Phi\{J_i(\bar\theta_{i+1})\} + (i+1)^{-1}\nabla\{J_i(\bar\theta_{i+1}), I(\bar\theta_{i+1}, X_{i+1})\} + 2^{-1}(i+1)^{-2}\nabla^2\{(1-\beta_i)J_i(\bar\theta_{i+1}) + \beta_i I(\bar\theta_{i+1}, X_{i+1}), I(\bar\theta_{i+1}, X_{i+1})\}, \tag{A.8}$$

where $0 \le \beta_i \le (i+1)^{-1}$. Next, it follows from some crucial observations in Wu and Wynn (1978, Section 3) and the conditions assumed in the statement of the result that $\Phi\{J_i(\bar\theta_{i+1})\}$ remains bounded (uniformly in $i$) in probability as $n$ tends to infinity and $i$ varies between $n_1$ and $n-1$. Now, fix $\epsilon > 0$ and suppose that $\Phi\{J_i(\bar\theta_{i+1})\} > \Phi(J^*) + \epsilon$ for all $i$ with $n_1 \le i < n$. Then, using the convexity of $\Phi$ and the method of construction of $X_{i+1}$, we get (see also the arguments used in connection with (2.2) in Wu and Wynn (1978))

$$0 < \epsilon < \Phi\{J_i(\bar\theta_{i+1})\} - \Phi(J^*) \le -\nabla\{J_i(\bar\theta_{i+1}), J^*\} \le -\nabla\{J_i(\bar\theta_{i+1}), I(\bar\theta_{i+1}, X_{i+1})\}. \tag{A.9}$$

On the other hand, the assumptions in the statement of the result ensure that the $\nabla^2$ term (i.e. the third term on the right) in (A.8) remains bounded (again uniformly in $i$) in probability as $n$ tends to infinity and $i$ varies between $n_1$ and $n-1$. Therefore, using the condition that $n_1$ tends to infinity as $n$ tends to infinity and (A.9), we have

$$\lim_{n \to \infty} \min_{n_1 \le i \le n-1} \Pr\big[\Phi\{J_{i+1}(\bar\theta_{i+1})\} \le \Phi\{J_i(\bar\theta_{i+1})\} - (i+1)^{-1}(\epsilon/2)\big] = 1. \tag{A.10}$$


Note at this point that $\sum_{i=n_1}^{n-1}(i+1)^{-1}$ tends to infinity as $n$ goes to infinity in view of the condition that $n_1/n$ tends to zero as $n$ tends to infinity. Condition (b) assumed in the statement of the result now implies the following:

$$\lim_{n \to \infty} \Pr\big[\text{there exists } i \text{ such that } n_1 \le i < n \text{ and } \Phi\{J_i(\bar\theta_{i+1})\} \le \Phi(J^*) + \epsilon\big] = 1. \tag{A.11}$$

Further, it follows from (A.8) that

$$\lim_{n \to \infty} \min_{n_1 \le i \le n-1} \Pr\big[\Phi\{J_{i+1}(\bar\theta_{i+1})\} \le \Phi\{J_i(\bar\theta_{i+1})\} + \epsilon\big] = 1. \tag{A.12}$$

One can conclude from (A.8) and the arguments used in connection with (A.9) and (A.10) (see also the arguments used in connection with (2.5) in Wu and Wynn (1978)) that

$$\lim_{n \to \infty} \min_{n_1 \le i \le n-1} \Pr\big[\Phi\{J_{i+1}(\bar\theta_{i+1})\} \le \Phi\{J_i(\bar\theta_{i+1})\} \text{ whenever } \Phi\{J_i(\bar\theta_{i+1})\} > \Phi(J^*) + \epsilon\big] = 1. \tag{A.13}$$

Finally, (A.11), (A.12), (A.13) and Condition (a) in the statement of the result yield

$$\lim_{n \to \infty} \Pr\big[\Phi\{J_n(\theta)\} \le \Phi(J^*) + 4\epsilon\big] = 1.$$

This completes the proof in view of the continuity and strict convexity of $\Phi$.
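As a concrete instance of the machinery used in this proof (our own illustration, not part of the original argument): for the D-optimality criterion $\Phi(M) = -\log\det M$ on $d \times d$ matrices,

```latex
\nabla\Phi(M) = -M^{-1}, \qquad
\nabla(M_1, M_2) = \operatorname{trace}\{\nabla\Phi(M_1)(M_2 - M_1)\}
               = d - \operatorname{trace}(M_1^{-1} M_2).
```

So if, as in the adaptive step of Section 3.1, $X_{i+1}$ is chosen to minimize $\nabla\{J_i(\bar\theta_{i+1}), I(\bar\theta_{i+1}, x)\}$ over $x$, this amounts to maximizing $\operatorname{trace}\{J_i(\bar\theta_{i+1})^{-1} I(\bar\theta_{i+1}, x)\}$, the familiar sensitivity-function step of Wynn-Fedorov type algorithms.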

Proof of Result 3.3. For $n \ge 1$, define the set of positive integers $S_n$ as $S_n = \{r : 1 \le r \le n \text{ and } r \ne k_m \text{ for any } m \ge 1\}$. Then, in view of the way the sequence of positive integers $k_m$'s is constructed, $n^{-1}\{\#(S_n)\}$ must tend to one as $n$ tends to infinity. Recall now that $\Phi$ is a continuous and strictly convex function. It is easy to show, using the continuity of $I(\theta, x)$, the compactness of $\Theta$, the weak consistency of the sequence of estimates $\bar\theta_i$'s and straightforward modifications of some of the arguments in the proof of Lemma A.2 in Chaudhuri and Mykland (1993), that

$$\{\#(S_n)\}^{-1} \sum_{r \in S_n} \int I(\theta, x)\,\xi^*_r(dx) \xrightarrow{P} \int I(\theta, x)\,\xi^*(dx) \quad \text{as } n \to \infty. \tag{A.14}$$

Observe also that $\big\{\sum_{r \in S_t} \big[I(\theta, X_r) - \int I(\theta, x)\,\xi^*_r(dx)\big], \mathcal{F}_t\big\}_{t \in S_n}$ is a square integrable martingale, where $\mathcal{F}_t$ is the $\sigma$-field generated by $(Y_1, X_1), \ldots, (Y_t, X_t)$ (in other words, $\sigma\{(Y_1, X_1), \ldots, (Y_t, X_t)\} = \mathcal{F}_t$). Hence, we must have

$$\{\#(S_n)\}^{-1} \sum_{r \in S_n} \Big[I(\theta, X_r) - \int I(\theta, x)\,\xi^*_r(dx)\Big] \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{A.15}$$

The desired result is now immediate from (A.14) and (A.15).


References

Abdelbasit, K. M. and Plackett, R. L. (1983). Experimental design for binary data. J. Amer. Statist. Assoc. 78, 90-98.

Atkinson, A. C. and Hunter, W. G. (1968). The design of experiments for parameter estimation. Technometrics 10, 271-289.

Atwood, C. L. (1973). Sequences converging to D-optimal designs of experiments. Ann. Statist. 1, 342-352.

Atwood, C. L. (1976). Convergent design sequences, for sufficiently regular optimality criteria. Ann. Statist. 4, 1124-1138.

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression: Analysis and Its Applications. John Wiley, New York.

Bickel, P. J. (1978). Using residuals robustly I: tests for heteroscedasticity, nonlinearity. Ann. Statist. 6, 266-291.

Box, G. E. P. and Hill, W. J. (1974). Correcting inhomogeneity of variance with power transformation weighting. Technometrics 16, 385-389.

Box, G. E. P. and Hunter, W. G. (1965). Sequential design of experiments for nonlinear models. In Proceedings of the IBM Scientific Computing Symposium on Statistics, 113-137, October 21-23, 1963.

Box, M. J. (1968). The occurrence of replications in optimal designs of experiments to estimate parameters in non-linear models. J. Roy. Statist. Soc. Ser. B 30, 290-302.

Box, M. J. (1970). Some experiences with a nonlinear experimental design criterion. Technometrics 12, 569-589.

Carroll, R. J. and Ruppert, D. (1988). Transformation and Weighting in Regression. Chapman and Hall, London.

Chaudhuri, P. and Mykland, P. A. (1993). Nonlinear experiments: Optimal design and inference based on likelihood. J. Amer. Statist. Assoc. 88, 538-546.

Chernoff, H. (1953). Locally optimal designs for estimating parameters. Ann. Math. Statist. 24, 586-602.

Chernoff, H. (1975). Approaches in sequential design of experiments. In A Survey of Statistical Design and Linear Models (Edited by J. N. Srivastava), 67-90, North Holland, New York.

Cochran, W. G. (1973). Experiments for nonlinear functions (R. A. Fisher Memorial Lecture). J. Amer. Statist. Assoc. 68, 771-781.

Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data. Chapman and Hall, London.

Efron, B. (1975). Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Statist. 3, 1189-1242.

Efron, B. (1978). The geometry of exponential families. Ann. Statist. 6, 362-376.

Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York.

Ford, I. and Silvey, S. D. (1980). A sequentially constructed design for estimating a nonlinear parametric function. Biometrika 67, 381-388.

Ford, I., Torsney, B. and Wu, C. F. J. (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. J. Roy. Statist. Soc. Ser. B 54, 569-583.

Ford, I., Titterington, D. M. and Kitsos, C. P. (1989). Recent advances in nonlinear experimental design. Technometrics 31, 49-60.

Ford, I., Titterington, D. M. and Wu, C. F. J. (1985). Inference and sequential design. Biometrika 72, 545-551.

Gallant, A. R. (1987). Nonlinear Statistical Models. John Wiley, New York.

Haines, L. M. (1993). Optimal design for nonlinear regression models. Comm. Statist. Theory Methods 22, 1613-1627.

Hall, P. G. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.

Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40, 633-643.

Jobson, J. D. and Fuller, W. A. (1980). Least squares estimation when the covariance matrix and parameter vector are functionally related. J. Amer. Statist. Assoc. 75, 176-181.

Johnson, R. A. (1970). Asymptotic expansions associated with posterior distributions. Ann. Math. Statist. 41, 851-864.

Johnson, R. A. and Ladalla, J. N. (1979). The large sample behavior of posterior distributions when sampling from multiparameter exponential family models, and allied results. Sankhya Ser. B 41, 196-215.

Kiefer, J. (1959). Optimum experimental designs (with discussion). J. Roy. Statist. Soc. Ser. B 21, 272-319.

Kiefer, J. (1961). Optimum designs in regression problems, II. Ann. Math. Statist. 32, 298-325.

Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849-879.

Kiefer, J. and Wolfowitz, J. (1959). Optimum designs in regression problems. Ann. Math. Statist. 30, 271-294.

Kitsos, C. P. (1989). Fully-sequential procedures in nonlinear design problems. Comput. Statist. Data Anal. 8, 13-19.

Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer Verlag, New York.

Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley, New York.

McCormick, W. P., Mallik, A. K. and Reeves, J. H. (1988). Strong consistency of the MLE for sequential design problems. Statist. Probab. Lett. 6, 441-446.

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman and Hall, London.

McLeish, D. L. and Tosh, D. (1990). Sequential designs in bioassay. Biometrics 46, 103-116.

Minkin, S. (1987). Optimal designs for binary data. J. Amer. Statist. Assoc. 82, 1098-1103.

Myers, R. H., Khuri, A. I. and Carter, W. H. (1989). Response surface methodology: 1966-1988. Technometrics 31, 137-157.

Pazman, A. (1986). Foundations of Optimum Experimental Design. D. Reidel, Boston.

Powsner, L. (1935). The effects of temperature on the durations of the developmental stages of Drosophila melanogaster. Physiological Zoology 8, 474-520.

Prakasa Rao, B. L. S. (1987). Asymptotic Theory of Statistical Inference. John Wiley, New York.

Rasch, D. (1990). Optimum experimental design in nonlinear regression. Comm. Statist. Theory Methods 19, 4789-4806.

Robertazzi, T. G. and Schwartz, S. C. (1989). An accelerated sequential algorithm for producing D-optimal designs. SIAM J. Sci. Statist. Comput. 10, 341-358.

Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. John Wiley, New York.

Silvey, S. D. (1980). Optimal Design. Chapman and Hall, London.

Sweeting, T. J. (1980). Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist. 8, 1375-1381.

Sweeting, T. J. (1983). On estimator efficiency in stochastic processes. Stochastic Process. Appl. 15, 93-98.

Treloar, M. A. (1974). Effects of Puromycin on Galactosyltransferase of Golgi Membranes. Unpublished Master's Thesis, University of Toronto.

Tsay, J. Y. (1976). On the sequential construction of D-optimal designs. J. Amer. Statist. Assoc. 71, 671-674.

Woodroofe, M. (1989). Very weak expansions for sequentially designed experiments: Linear models. Ann. Statist. 17, 1087-1102.

Wu, C. F. J. (1981). Asymptotic theory of nonlinear least squares estimation. Ann. Statist. 9, 501-513.

Wu, C. F. J. (1985). Asymptotic inference from sequential design in a nonlinear situation. Biometrika 72, 553-558.

Wu, C. F. J. and Wynn, H. P. (1978). The convergence of general step-length algorithms for regular optimum design criteria. Ann. Statist. 6, 1273-1285.

Wynn, H. P. (1970). The sequential generation of D-optimum experimental designs. Ann. Math. Statist. 41, 1655-1664.

Wynn, H. P. (1972). Results in the theory and construction of D-optimum experimental designs. J. Roy. Statist. Soc. Ser. B 34, 133-147.

Indian Statistical Institute, Calcutta, India.

Department of Statistics, University of Chicago, IL 60637, U.S.A.

(Received December 1993; accepted March 1995)