Sequential Statistics

Zakkula Govindarajulu
University of Kentucky, USA

Sequential Statistics

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI

Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

SEQUENTIAL STATISTICS Copyright © 2004 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-238-905-9

Printed in Singapore.

“Education without wisdom and wisdom without humility are husks without grain.”

Sri Sathya Sai Baba

Dedicated to the memory of my parents

Preface

Sequential statistics is concerned with the treatment of data when the number of observations is not fixed in advance. Since Wald (1947) wrote his celebrated book, the subject has grown considerably, especially in areas such as sequential estimation. The author's book THE SEQUENTIAL STATISTICAL ANALYSIS OF HYPOTHESIS TESTING, POINT AND INTERVAL ESTIMATION, AND DECISION THEORY (available from American Sciences Press, Inc., 20 Cross Road, Syracuse, New York 13224-2104, U.S.A.), ISBN 0-935950-17-6, is a comprehensive 680-page reference to the field of sequential analysis; everyone reading the present new work will likely want to see that there is at least a copy of that comprehensive book in their institution's library, and many serious researchers will want a copy in their personal libraries. Of that previous book, reviewers said "There are plenty of examples and problems" and "The presentation is clear and to the point." In contrast, the present new book is designed for a semester's course and is thus less than half the length of the previous book. Other books, by Ghosh (1970), Siegmund (1985), Wetherill and Glazebrook (1986) and Ghosh, Mukhopadhyay and Sen (1997), are either too theoretical or limited in scope. It is easy for an instructor to get side-tracked, become bogged down in details, and run out of time to cover interesting topics. In this new version, I have tried to select only those topics that can be covered in a semester's course. Still, the instructor may not be able to cover all the topics in the book in one semester, and thus has some flexibility in the choice of topics. Straightforward and elementary proofs are provided, and for more details the reader is referred to the author's earlier book. Thus, the mathematical and statistical level of the book is maintained at an elementary level. This book is geared to seniors and first-year graduate students who have had a semester's course in each of advanced calculus, probability and statistics.
A semester's course can be based on chapter 1, chapter 2 (excluding section 2.7), chapter 3 (excluding sections 3.7 and 3.8), chapter 4 (excluding section 4.9) and chapter 5 (excluding sections 5.5 and 5.6). The instructor might devote three 50-minute lectures to chapter 1, ten lectures to chapter 2, nine lectures to each of chapters 3 and 4, and five lectures to chapter 5, with the remaining lectures devoted to sections of his/her and the students' interests.


The chapter on applications to biostatistics is new, and a supplement containing computer programs for certain selected sequential procedures is also provided. Useful illustrations and numerical tables are provided wherever possible. Problems, identified by the section to which they pertain, are given at the ends of all chapters. An extensive list of the references cited in the book is given at the end. This list of references is by no means complete.

April 2004

Z. Govindarajulu
Professor of Statistics
University of Kentucky
Lexington, KY

Acknowledgments

I have been inspired by the celebrated book on this topic by the late Abraham Wald, and I am grateful to Dover Publications for their kind permission for my use of certain parts of Wald's book as a source. I am equally grateful to the American Sciences Press for permission to use several sections of my earlier book as a source for the present book. I am grateful to Dr. Hokwon Cho of the University of Nevada, Las Vegas for typesetting the entire manuscript in LaTeX and cheerfully making all the necessary changes in the subsequent revisions. I am thankful to the Department of Statistics for its support and other help. I thank Professors Rasul Khan of Cleveland State University and Derek S. Coad of the University of Sussex (England) for their useful comments on an earlier draft of the manuscript. I also wish to express my thanks to Ms. Yubing Zhai and Ms. Tan Rok Ting, editors of World Scientific Publishing Co., for their encouragement, cooperation and help. My thanks go to the American Statistical Association for its kind permission to reproduce table 3.9.3 and tables 3.9.1 & 3.9.2 from their publications, namely, the Journal of the American Statistical Association, Vol. 65, and the American Statistician, Vol. 25, respectively. To Blackwell Publishing Ltd. for its kind permission to reproduce tables 2.7.1 and 2.7.2 from the Journal of the Royal Statistical Society, Series B, Vol. 20, and tables 3.2.1 & 3.2.2 and table 3.3.1 from the Australian Journal of Statistics, Vols. 31 and 36, respectively. To the University of Chicago Press for their kind permission to reproduce a data set from Olson & Miller, Morphological Integration, p. 317, and to have a brief excerpt from Kemperman, J. H. B. (1961), The Passage Problem for a Stationary Markov Chain. To Springer-Verlag GmbH & Co. for permission to use Theorem 8.25 and Corollary 8.33 of Siegmund, D. O. (1985), Sequential Analysis, as a source for Theorem 3.4.4 of the present book.
To Professor Erich Lehmann for his kind permission to use his book Testing Statistical Hypotheses (1959) as a source for the proof of Theorem 2.9.1 and the

statement of Theorem 2.9.2. To Oxford University Press, on behalf of the Biometrika Trustees, for permission to reproduce Table 1 of Lindley and Barnett (1965), Biometrika, Vol. 52, p. 523. To Professor Donald Darling and Mrs. Carol Robbins for their kind permission to reproduce Table 3.8.1 from Proceedings of the National Academy of Sciences, Vol. 60. To Professor Thomas Ferguson for his kind permission to use his distribution as Problem 2.1.6. To CRC Press for its kind permission to use Sections 1.2 and 1.3 of Wetherill (1975) as a source for sections 1.1 and 1.2 of this book. To the Institute of Mathematical Statistics for its kind permission to reproduce Tables 2.6.1, 2.10.1, 2.10.2 and 3.8.1 from the Annals of Mathematical Statistics and the Annals of Statistics. To the Taylor & Francis Group for their kind permission to reproduce tables 5.4.1 and 5.4.2 from Statistics, Vol. 33. To John Wiley & Sons for their kind permission to use sections 3.7 and 3.8 of Whitehead (1983) as a source for section 5.6 of this book.

Contents

Preface ix

Acknowledgments xi

1 Preliminaries 1
1.1 Introduction to Sequential Procedures ...... 1
1.2 Sampling Inspection Plans ...... 3
1.2.1 Sample Size Distribution ...... 3
1.3 Stein's Two-stage Procedure ...... 6
1.3.1 The Procedure ...... 7

2 The Sequential Probability Ratio Test 11
2.1 The Sequential Probability Ratio Test (SPRT) ...... 11
2.2 SPRT: Its Finite Termination and Bounds ...... 13
2.3 The Operating Characteristic Function ...... 19
2.4 The Average Sample Number ...... 21
2.5 Wald's Fundamental Identity ...... 29
2.5.1 Applications of the Fundamental Identity ...... 30
2.6 Bounds for the Average Sample Number ...... 33
2.7 Improvements to OC and ASN Functions ...... 36
2.7.1 The OC Function ...... 36
2.7.2 The Average Sample Number ...... 38
2.8 Truncated SPRT ...... 40
2.9 Optimal Properties of the SPRT ...... 45
2.10 The Restricted SPRT ...... 47
2.11 Large-Sample Properties of the SPRT ...... 51
2.12 Problems ...... 54

3 Tests for Composite Hypotheses 59
3.1 Method of Weight Functions ...... 59
3.1.1 Applications of the Method of Weight Functions ...... 60
3.2 Sequential t and t2 Tests ...... 61


3.2.1 Uniform Asymptotic Expansion and Inversion for an Integral ...... 63
3.2.2 Barnard's Versions of Sequential t- and t2-tests ...... 65
3.2.3 Simulation Studies ...... 65
3.2.4 Asymptotic Normality of the T ...... 66
3.2.5 Finite Sure Termination of Sequential t- and t2-tests ...... 69
3.2.6 Sequential t2-test (or t-test for Two-sided Alternatives) ...... 71
3.2.7 The Sequential Test T ...... 73
3.2.8 An Alternative Sequential Test T' ...... 74
3.3 Sequential F-test ...... 75
3.3.1 Inversion Formula ...... 77
3.4 Likelihood Ratio Test Procedures ...... 79
3.4.1 Generalized Likelihood Ratio Tests for Koopman-Darmois Families ...... 86
3.5 Testing Three Hypotheses about a Normal Mean ...... 90
3.5.1 Armitage-Sobel- ...... 90
3.5.2 Choice of the Stopping Bounds ...... 93
3.5.3 Bounds for ASN ...... 94
3.5.4 Testing Two-sided Alternatives for Normal Mean ...... 96
3.6 The Efficiency of the SPRT ...... 99
3.6.1 Efficiency of the SPRT Relative to the Fixed-Sample Size Procedure at the Hypotheses ...... 99
3.6.2 Relative Efficiency at θ ≠ θ0, θ1 ...... 102
3.6.3 Limiting Relative Efficiency of the SPRT ...... 105
3.7 Bayes Sequential Procedures ...... 106
3.7.1 Bayes Sequential Binomial SPRT ...... 106
3.7.2 Dynamic Programming Method for the Binomial Case ...... 111
3.7.3 The Dynamic Programming Equation ...... 112
3.7.4 Bayes Sequential Procedures for the Normal Mean ...... 114
3.8 Small Error Probability and Power One Test ...... 117
3.9 Sequential Rank Test Procedures ...... 123
3.9.1 Kolmogorov-Smirnov Tests with Power One ...... 123
3.9.2 Sequential ...... 124
3.9.3 Rank Order SPRT's Based on Lehmann Alternatives: Two-Sample Case ...... 127
3.9.4 One-Sample Rank Order SPRT's for Symmetry ...... 130
3.10 Appendix: A Useful Lemma ...... 138
3.11 Problems ...... 139

4 Sequential Estimation 143
4.1 Basic Concepts ...... 143
4.2 Sufficiency and Completeness ...... 144
4.3 Cramér-Rao Lower Bound ...... 152

4.4 Two-Stage Procedures ...... 158
4.4.1 Stein's Procedure for Estimating the Mean of a Normal Distribution with Unknown Variance ...... 158
4.4.2 A Procedure for Estimating the Difference of Two Means ...... 162
4.4.3 Procedures for Estimating the Common Mean ...... 164
4.4.4 Double-Sampling Estimation Procedures ...... 167
4.4.5 Fixed Length Confidence Intervals Based on SPRT ...... 173
4.5 Large-Sample Theory for Estimators ...... 182
4.6 Determination of Fixed-width Intervals ...... 191
4.7 Interval and Point Estimates for the Mean ...... 196
4.7.1 for the Mean ...... 196
4.7.2 Risk-efficient Estimation of the Mean ...... 201
4.8 Estimation of Regression Coefficients ...... 203
4.8.1 Fixed-Size Confidence Bounds ...... 203
4.9 Confidence Intervals for P(X < Y)

5 Applications to Biostatistics 227
5.1 The Robbins-Monro Procedure ...... 227
5.2 Parametric Estimation ...... 228
5.3 Up and Down Rule ...... 230
5.4 Spearman-Karber (S-K) Estimator ...... 231
5.5 Repeated Significance Tests ...... 236
5.6 Test Statistics Useful in ...... 238
5.7 Sample Size Re-estimation Procedures ...... 246
5.7.1 Normal Responses ...... 246
5.7.2 Formulation of the Problem ...... 247
5.7.3 Binary Response ...... 254
5.8 Problems ...... 262

6 Matlab Programs in Sequential Analysis 265
6.1 Introduction ...... 265
6.2 Sequential Procedures ...... 269
6.2.1 Sequential Probability Ratio Test (SPRT) ...... 269
6.2.2 Restricted SPRT (Anderson's Test) ...... 271
6.2.3 Rushton's Sequential t-test ...... 273
6.2.4 Sequential t-test ...... 275
6.2.5 Sequential t2-test ...... 277

6.2.6 Hall's Sequential Test ...... 279
6.2.7 Stein's Two-Stage Procedure () ...... 281
6.2.8 Stein's Two-Stage Test ...... 283
6.2.9 Robbins' Power One Test ...... 285
6.2.10 Rank Order SPRT ...... 287
6.2.11 Cox's Sequential Estimation Procedure ...... 289
6.3 Distribution ...... 290

Referenced Journals 293

Bibliography 295

Subject Index 311

Chapter 1
Preliminaries

1.1 Introduction to Sequential Procedures

Sequential procedures differ from other statistical procedures in that the sample size is not fixed in advance. The experimenter has the option of looking at a sequence of observations one (or a fixed number) at a time and deciding whether to stop sampling and take a decision, or to continue sampling and make a decision some time later. The order of the sequence of observations which the experimenter will take is specified in advance. Decision problems in which the experimenter may sequentially vary the treatments are of a higher order of difficulty; this is called the sequential design problem. For example, consider the following problem.

Problem 1.1.1 If we wish to compare several drugs or treatments (as in sequential screening of cancer drugs), then it should be possible to drop some drugs out of the trials at an early stage if the results from these are very poor when compared with the others.

Thus, an essential feature of a sequential procedure is that the number of observations required to terminate the experiment is a random variable, since it depends on the outcome of the observations. Sequential procedures are of interest because they are economical, in the sense that we may reach a decision earlier via a sequential procedure than via a fixed-sample size procedure. In a sequential experiment we need to specify:

1. the initial sample size

2. a rule for termination of the experiment

3. the additional number of observations to take if the experiment is to be continued; and


4. a terminal decision rule.

Notice that (2) and (3) can be combined into a single rule. Experiments in which only the number of observations is sequentially dependent require simpler theory, and are of more general applicability, than the sequential design problem in which not only the number of trials but also the number of treatments is sequentially dependent. If the experiment has been continued until we observe X1, X2, ..., Xm, a sequential test is completely defined by specifying the disjoint subsets R_m^0, R_m^1 and R_m^c of m-dimensional Euclidean space R_m, for m = 1, 2, .... If (X1, X2, ..., Xm) belongs to R_m^0 we accept the hypothesis H, we reject H when it belongs to R_m^1, and we continue sampling if it falls within the region R_m^c. Since these sets are mutually exclusive and have union R_m, it suffices to specify any two of the three sets. The basic problem is a suitable choice of these sets. The criteria for the choice of these sets will be dictated by the operating characteristic (OC) and average sample number (ASN) functions, which will be elaborated in the following. Suppose that the underlying distribution function is indexed by a real-valued parameter θ, and suppose that the statistician has to choose between two hypotheses, H0 and H1. The OC function, OC(θ), is defined as the probability of accepting H0 when θ is the value of the parameter. It is desirable that the OC function be high for values of θ that are consistent with H0 and low for values of θ that are consistent with H1. For instance, one may require OC(θ) ≥ 1 − α for all θ in H0 and OC(θ) ≤ β for θ in H1, where α and β denote the error probabilities. A sequential test S is said to be admissible if its OC function meets the above criteria. As noted earlier, the number of observations required by a sequential procedure is a random variable, and of much interest is its expected value when θ is the true value of the parameter. This expected value is typically a function of θ, and is called the ASN function.
It is desirable to have a small ASN function for given α and β. We also desire the expected sample size to be smaller than that required by the fixed-sample size procedure. Let ν(θ|D) denote the expected sample size of procedure D when θ is the true value of the parameter. If D0 is admissible and if ν(θ|D0) = min_D ν(θ|D), then D0 is considered to be a "uniformly best" test. However, in general, no uniformly best test exists. It is possible to find an optimal sequential procedure when H0 and H1 are simple hypotheses. Wald's sequential probability ratio test (SPRT) gives the minimum ASN at both H0 and H1. The efficiency of a procedure D at θ is defined as the ratio of the minimum expected sample number at θ to the expected sample number of D at θ. Wald's SPRT has efficiency equal to 1 at both H0 and H1.

1.2 Sampling Inspection Plans

The earliest sequential procedure is the double sampling plan of Dodge and Romig (1929) for sampling inspection. A single sampling plan takes a sample of n items from the lot and rejects (accepts) the lot if the number of defectives in the sample is ≥ (<) c. The drawback of this scheme is that we might have had c or more defectives earlier than sample size n. An alternative scheme is: sample one item at a time, reject the lot as soon as the number of defectives in the sample is ≥ c, and accept the lot as soon as the number of effective items in the sample is ≥ n − c + 1. The required sample size is at least c and is at most n. This scheme is called curtailed inspection.
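The curtailed inspection rule just described is easily sketched in code. The following Python fragment is an illustrative sketch only (the book's computer supplement is in Matlab; the function name and interface here are ours): it inspects items one at a time, with θ the probability that an item is defective, and stops at the first decision.

```python
import random

def curtailed_inspection(theta, n, c, rng):
    """Curtailed version of the single sampling plan (n, c):
    reject as soon as c defectives are seen; accept as soon as
    n - c + 1 effective items are seen.  Returns (decision, sample_size)."""
    defectives = effectives = 0
    for m in range(1, n + 1):
        if rng.random() < theta:
            defectives += 1
        else:
            effectives += 1
        if defectives >= c:
            return "reject", m
        if effectives >= n - c + 1:
            return "accept", m
    # unreachable: by item n one of the two stopping bounds must be hit,
    # since defectives + effectives = n

rng = random.Random(0)
decision, size = curtailed_inspection(0.1, 20, 2, rng)
```

Every run stops with a decision by item n, and the sample size is never below min(c, n − c + 1), in agreement with the text.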

1.2.1 Sample Size Distribution

Let N denote the random sample size required to terminate the experiment. Then

P_θ(N = c and reject H0) = θ^c,   (1.2.1)

P_θ(N = c + r and reject H0) = C(c+r−1, r) θ^c (1−θ)^r,   r = 1, 2, ..., n − c,   (1.2.2)

where C(a, b) denotes the binomial coefficient "a choose b",

P_θ(N = n − c + 1 + s and accept H0) = C(n−c+s, s) θ^s (1−θ)^{n−c+1},   s = 0, 1, ..., c − 1.   (1.2.3)

Now

E_θ(N) = Σ_{m=1}^{n} m p_m,

where p_m denotes the probability that a decision is reached at the mth trial. (Note that P(N = m | reject H0) = 0 for m < c, and P(N = m | accept H0) = 0 for m < n − c + 1.) Further

p_m = P(reject at stage m) + P(accept at stage m),   m ≥ c.   (1.2.4)

Hence

E_θ(N) = Σ_{r=0}^{n−c} (c + r) C(c+r−1, r) θ^c (1−θ)^r   (1.2.5)
       + Σ_{s=0}^{c−1} (n − c + 1 + s) C(n−c+s, s) θ^s (1−θ)^{n−c+1}.   (1.2.6)

One should prefer the curtailed sampling plan to an equivalent single sampling plan, because E(N|θ) for the former lies below the sample size of the single sampling plan. Consider the case c = 1. Then

E(N|θ) = θ Σ_{r=0}^{n−1} (r + 1)(1 − θ)^r + n(1 − θ)^n
       = Σ_{r=0}^{n−1} (1 − θ)^r = [1 − (1 − θ)^n]/θ,   (1.2.7)

which is increasing in q = 1 − θ. Hence E(N|θ) is decreasing in θ when c = 1. However, this is not true for c > 1 (see Table 1.2.1 and the case c = 4).

Table 1.2.1 E(N|θ) for various values of n, c and θ

            c = 1                  c = 2                  c = 4
θ      n=10  n=20  n=25      n=10  n=20  n=25      n=10  n=20  n=25
.01    9.56 18.20 22.22      9.07 19.06 24.01      7.07 17.17 22.22
.10    6.51  8.78  9.28      8.76 14.73 16.49      7.74 18.10 22.58
.20    4.46  4.94  4.98      7.45  9.58  9.84      8.34 16.15 18.02
.30    3.24  3.33  3.33      6.03  6.64  6.66      8.50 12.77 13.17
.40    2.48  2.50  2.50      4.86  5.00  5.00      8.13  9.94  9.99
.50    2.00  2.00  2.00      3.97  4.00  4.00      7.39  8.00  8.00
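The stopping-time distribution (1.2.1)-(1.2.3) makes E(N|θ) directly computable by summing the two branches. The following Python sketch (our own illustration; the function name is ours) can be checked against Table 1.2.1, including the non-monotonicity in θ for c = 4.

```python
from math import comb

def expected_sample_size(theta, n, c):
    """E(N | theta) for curtailed inspection, summed from the stopping-time
    distribution (1.2.1)-(1.2.3):
      reject at N = c + r     with prob C(c+r-1, r) theta^c (1-theta)^r,
      accept at N = n-c+1+s   with prob C(n-c+s, s) theta^s (1-theta)^(n-c+1)."""
    e = 0.0
    for r in range(n - c + 1):          # rejection branch, r = 0, ..., n - c
        e += (c + r) * comb(c + r - 1, r) * theta**c * (1 - theta)**r
    for s in range(c):                  # acceptance branch, s = 0, ..., c - 1
        e += (n - c + 1 + s) * comb(n - c + s, s) * theta**s * (1 - theta)**(n - c + 1)
    return e
```

For instance, it reproduces the tabled values E(N|.01) = 9.56 for (n, c) = (10, 1) and E(N|.5) = 3.97 for (n, c) = (10, 2), and shows E(N|θ) rising from θ = .01 to θ = .10 when (n, c) = (25, 4).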

Let

P1(θ) = P(accept lot using the fixed-sample procedure | θ)
      = Σ_{k=0}^{c−1} C(n, k) θ^k (1 − θ)^{n−k}   (1.2.8)

and

P2(θ) = P(accept lot using the sequential rule | θ)
      = Σ_{m=n−c+1}^{n} P(accept lot and N = m | θ)
      = Σ_{r=0}^{c−1} C(n−c+r, r) θ^r (1 − θ)^{n−c+1}.   (1.2.9)

Then we have the following lemma.

Lemma 1.2.1 P1(θ) = P2(θ) for all n and c.

Proof. For c = 1, P1(θ) = P2(θ) = (1 − θ)^n. For c = 2, P1(θ) = P2(θ) = (1 − θ)^n + nθ(1 − θ)^{n−1}. Now assume it is true for any c and consider the case c + 1. That is, assume

Σ_{k=0}^{c−1} C(n, k) θ^k (1 − θ)^{n−k} = Σ_{r=0}^{c−1} C(n−c+r, r) θ^r (1 − θ)^{n−c+1},   (1.2.10)

and we wish to show that

Σ_{k=0}^{c} C(n, k) θ^k (1 − θ)^{n−k} = Σ_{r=0}^{c} C(n−c−1+r, r) θ^r (1 − θ)^{n−c}.   (1.2.11)

Subtracting (1.2.10) from (1.2.11), and cancelling the common factor (1 − θ)^{n−c}, it suffices to show that

C(n, c) θ^c = Σ_{r=0}^{c} C(n−c−1+r, r) θ^r − (1 − θ) Σ_{r=0}^{c−1} C(n−c+r, r) θ^r.

Applying the recursion C(n−c+r, r) = C(n−c−1+r, r) + C(n−c−1+r, r−1) to the second sum on the right, all terms of degree less than c cancel, and the right side reduces to [C(n−1, c) + C(n−1, c−1)] θ^c = C(n, c) θ^c, which is the left side. ∎

Remark 1.2.1 Lemma 1.2.1 can also be established by showing that all the sample paths leading to accepting the lot are exactly the same in both the sampling schemes.
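Lemma 1.2.1 can also be checked numerically. The short Python sketch below (our own illustration; the function names are ours) computes the two acceptance probabilities, from the binomial fixed-sample rule and from the curtailed stopping rule (1.2.9), and they agree to machine precision over a grid of n, c and θ.

```python
from math import comb

def p_accept_fixed(theta, n, c):
    """P1(theta): accept iff at most c - 1 defectives among all n items."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(c))

def p_accept_curtailed(theta, n, c):
    """P2(theta): accept at N = n - c + 1 + s having seen s defectives,
    s = 0, ..., c - 1, as in (1.2.9)."""
    return sum(comb(n - c + s, s) * theta**s * (1 - theta)**(n - c + 1)
               for s in range(c))
```

This is the computational counterpart of Remark 1.2.1: the sample paths leading to acceptance are the same under both schemes, so the probabilities must coincide.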

1.3 Stein’s Two-stage Procedure

In this section we present a certain hypothesis-testing problem for which no meaningful fixed-sample procedure exists. However, a two-stage procedure has been given for this problem. Consider the following problem. Let X be distributed as normal with mean θ and variance σ², where θ and σ² are both unknown. We wish to test H0 : θ = θ0 against the alternative hypothesis H1 : θ > θ0; this is known as Student's hypothesis.

It is well known that, given a random sample X1, X2, ..., Xn, the uniformly most powerful unbiased test of H0 against H1 rejects H0 when

T = (X̄ − θ0)√n / s > t_{n−1,1−α},   (1.3.1)

where X̄ and s denote the mean and standard deviation of the observed Xi's, and t_{n−1,1−α} denotes the 100(1−α)th percentile of the t-distribution with n − 1 degrees of freedom. If 1 − π(θ, σ) denotes the power of the test in (1.3.1), then π(θ0, σ) = 1 − α, irrespective of the value of σ. However, when one is planning an experiment, one is interested in knowing the probability with which the statistical test will detect a difference in the mean when it actually exists. The power function of "Student's" test depends on σ, which is unknown. Hence it is of interest to devise a test of H0 versus H1 whose power does not depend on σ. However, Dantzig (1940) has shown the nonexistence of meaningful fixed-sample test procedures for this problem. Stein (1945) proposed a two-sample (or two-stage) test having the above desirable property, where the size of the second sample depends on the outcome of the first.

1.3.1 The Procedure

A random sample of n0 observations X1, X2, ..., X_{n0} is taken, and the variance σ² is then estimated by

s² = Σ_{i=1}^{n0} (Xi − X̄_{n0})² / (n0 − 1).   (1.3.2)

Then calculate n as

n = max{ [s²/z] + 1, n0 + 1 },   (1.3.3)

where z is a previously specified constant and [y] denotes the largest integer less than y, and draw additional observations X_{n0+1}, X_{n0+2}, ..., X_n. Evaluate, according to any specified rule that depends only on s², real numbers a_i (i = 1, 2, ..., n) such that

Σ_{i=1}^{n} a_i = 1,   a_1 = a_2 = ... = a_{n0},   s² Σ_{i=1}^{n} a_i² = z.   (1.3.4)

This is possible since

min Σ_{i=1}^{n} a_i² = 1/n ≤ z/s²   (1.3.5)

by (1.3.3), the minimum being taken subject to the conditions a_1 + a_2 + ... + a_n = 1, a_1 = a_2 = ... = a_{n0}. Define T1 by

T1 = (Σ_{i=1}^{n} a_i X_i − θ0) / √z.   (1.3.6)

Then

U = (Σ_{i=1}^{n} a_i X_i − θ) / √z

is such that, given s², U is normally distributed. Also, it is well known that V = (n0 − 1)s²/σ² is distributed as central chi-square with n0 − 1 degrees of freedom. Since s² Σ a_i² = z,

U | s² ~ normal(0, σ²/s²),

and hence

(Us/σ) | s² ~ normal(0, 1).

Since the conditional distribution of Us/σ given s² does not involve s², we infer that Us/σ is unconditionally distributed as normal(0, 1). Consequently

U = (Us/σ) / (s/σ) = (Us/σ) / √(V/(n0 − 1)).   (1.3.7)

If f(x, y) denotes the joint density of Us/σ and s², then f(x, y) = g(x)h(y), where g(x) is the density of Us/σ and h(y) is the density of s², because the conditional density of Us/σ given s² does not depend on s². So, Us/σ and s² are stochastically independent. Hence, by (1.3.7), U is the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom; i.e., U has the t-distribution with n0 − 1 degrees of freedom, irrespective of the value of σ. Hence the test based on T1 is unbiased and has power free of σ. Then, in order to test against the one-sided alternative θ > θ0, the critical region of size α is defined by

T1 > t_{n0−1,1−α}.   (1.3.8)

The power function is then

1 − π(θ, σ) = P(t_{n0−1} > t_{n0−1,1−α} − (θ − θ0)/√z),   (1.3.9)

which does not depend on σ. An analogous critical region, with a similar power function independent of σ, holds for the two-sided alternative θ ≠ θ0. As mentioned earlier, the above test is not used in practice. However, a simpler, and slightly more powerful, version of the test is available, as we now show. (Intuitively, Stein's test wastes information in order to make the power of the test strictly independent of the variance.) Instead of (1.3.3), take a total of

n = max{ [s²/z] + 1, n0 }   (1.3.10)

observations and define

(1.3.11)

One can easily establish that U1 has a t-distribution with n0 − 1 degrees of freedom. Since n > s²/z, we have |(θ − θ0)√n/s| > |(θ − θ0)/√z|. So, if we employ the critical region T1' > t_{n0−1,1−α} instead of (1.3.8), the power of the test will always be increased. Also, the number of observations will be reduced by 1 or left the same. Suppose we want the power to be 1 − β when θ = θ0 + δ, where δ is specified. Then the power at θ0 + δ is

P((X̄ − θ0)√n/s > t_{n0−1,1−α} | θ = θ0 + δ) = P(t_{n0−1} > t_{n0−1,1−α} − δ√n/s) = 1 − β,

provided t_{n0−1,1−α} − δ√n/s = −t_{n0−1,1−β}, where X̄ denotes the sample mean of all n observations. Now solving for n we obtain

n = s² (t_{n0−1,1−α} + t_{n0−1,1−β})² / δ².   (1.3.12)
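The simpler version of Stein's test can be explored by simulation. The Python sketch below is our own illustration, not the book's procedure verbatim: the design constants (n0 = 15, α = .05, target power .90) are assumptions, the t percentiles for 14 degrees of freedom are taken from standard tables, and the sample size rule n = max(⌈s²(t_a + t_b)²/δ²⌉, n0) is used in the spirit of (1.3.12), with the ceiling and the cap at n0 being our choices. It exploits the fact that, given s², the final sample mean is normal with variance σ²/n, so the second stage need not be generated observation by observation.

```python
import math
import random

N0 = 15          # assumed first-stage sample size
T_ALPHA = 1.761  # t_{14, 0.95}, from standard t-tables
T_BETA = 1.345   # t_{14, 0.90}, from standard t-tables

def stein_reject_prob(theta, theta0, delta, sigma, reps, seed):
    """Monte Carlo rejection probability of the simpler two-stage test:
    n = max(ceil(s^2 (t_a + t_b)^2 / delta^2), n0), reject when
    (xbar_n - theta0) sqrt(n) / s > t_{n0-1, 1-alpha}."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # first-stage variance estimate on n0 - 1 degrees of freedom
        s2 = sigma**2 * sum(rng.gauss(0, 1)**2 for _ in range(N0 - 1)) / (N0 - 1)
        n = max(math.ceil(s2 * (T_ALPHA + T_BETA)**2 / delta**2), N0)
        # given s^2, xbar_n is normal(theta, sigma^2 / n)
        xbar = rng.gauss(theta, sigma / math.sqrt(n))
        if (xbar - theta0) * math.sqrt(n) / math.sqrt(s2) > T_ALPHA:
            rejections += 1
    return rejections / reps
```

The simulated size stays near .05 whether σ = 1 or σ = 10, and the power at θ0 + δ stays at or above the target, illustrating that both are free of σ.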

Similarly, in the two-sample case, let X ~ normal(μ1, σ²) and Y ~ normal(μ2, σ²), with X and Y independent. Suppose we wish to test H0 : μ1 = μ2 versus H1 : μ2 > μ1, with error probability α when H0 is true and power 1 − β when μ2 − μ1 = δ. In the first stage we observe (X1, X2, ..., X_{n0}) and (Y1, Y2, ..., Y_{n0}) and compute

s² = [ Σ_{i=1}^{n0} (Xi − X̄)² + Σ_{i=1}^{n0} (Yi − Ȳ)² ] / [2(n0 − 1)].   (1.3.13)

Then the total sample size to be drawn from each population is

n = max(n', n0),

where

n' = [ 2s² (t_{2(n0−1),1−α} + t_{2(n0−1),1−β})² / δ² ].   (1.3.14)

Moshman (1958) has investigated the proper choice of the initial sample size n0 in Stein's two-stage procedure, and believes that an upper percentage point of the distribution of the total sample size n, used in conjunction with the expectation of the sample size, is a rapidly computable guide to an efficient choice of the size of the first sample. However, the optimum initial sample size that maximizes a given criterion involves an arbitrary parameter which has to be specified by the experimenter from non-statistical considerations. If the initial sample size is chosen poorly in relation to the unknown σ², the expected sample size of Stein's procedure can be large in comparison to the sample size which would be used if σ² were known (which it is not). For example, this can occur if σ² is very small; then (if σ² were known) a small total sample size would suffice, but one may use n0 much larger (hence being inefficient). However, this problem is not of practical significance.

Chapter 2
The Sequential Probability Ratio Test

2.1 The Sequential Probability Ratio Test (SPRT)

During World War II, Abraham Wald and others began working on sequential procedures and developed what is called the sequential probability ratio test, which can be motivated as follows. Neyman and Pearson (1933) provided a method of constructing a most powerful test for a simple versus simple hypothesis-testing problem. Suppose X has p.d.f. f(x; θ) and we wish to test H0 : θ = θ0 against H1 : θ = θ1.

Lemma 2.1.1 (Neyman and Pearson, 1933). Let X1, X2, ..., Xn be a random sample and let

Λn = Π_{i=1}^{n} [f(Xi; θ1) / f(Xi; θ0)].

Then the most powerful test of H0 against H1 is obtained by rejecting H0 if Λn ≥ K, and accepting H0 if Λn < K, where K is determined by the level of significance.

Wald proposed the following sequential probability ratio test, which was obviously motivated by Lemma 2.1.1: choose two constants A and B such that 0 < B < A < ∞; when the experiment has proceeded up to stage n (n = 1, 2, ...), accept H0 if Λn ≤ B, reject H0 if Λn ≥ A, and continue sampling if B < Λn < A.
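On the logarithmic scale the test statistic is a cumulative sum of log-likelihood ratios, and the stopping rule above takes only a few lines. The following Python sketch is an illustration (the function name and interface are ours, not Wald's notation); it returns the decision and the stage at which sampling stopped.

```python
import math

def sprt(observations, log_lik_ratio, A, B):
    """Wald's SPRT on the log scale: accumulate
    log Lambda_n = sum_i log[f1(X_i)/f0(X_i)] and stop the first time
    the sum leaves the interval (log B, log A)."""
    log_A, log_B = math.log(A), math.log(B)
    s = 0.0
    for n, x in enumerate(observations, start=1):
        s += log_lik_ratio(x)
        if s >= log_A:
            return "reject H0", n
        if s <= log_B:
            return "accept H0", n
    # data exhausted without crossing either boundary
    return "continue", len(observations)
```

For instance, testing θ0 = 0 against θ1 = 1 for normal(θ, 1) observations, the per-observation log ratio is x − 1/2, and (anticipating Section 2.2) with α = β = .05 Wald's approximate boundaries are A = 19, B = 1/19.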

Example 2.1.1 Consider the one-parameter Koopman-Darmois (exponential) family

f(x; θ) = exp{Q(θ) x − D(θ) + S(x)},

where Q(θ) is monotonically increasing in θ.

11 12 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

For this family the graph is as shown in Figure 2.1.1.a. However, at stage n the rule is equivalent to continue sampling if

C0 + nD < Σ_{i=1}^{n} Xi < C1 + nD,

for suitable constants C0, C1 and D determined by A, B and the family.

Figure 2.1.1 The Process of Sampling

As special cases of Example 2.1.1 consider:

Example 2.1.2 (Exponential distribution) Let

f(x; θ) = θ^{−1} exp(−x/θ),   x > 0, θ > 0.

We wish to test H0 : θ = θ0 versus H1 : θ = θ1 (θ1 > θ0). Then

Λn = (θ0/θ1)^n exp{ (1/θ0 − 1/θ1) Σ_{i=1}^{n} Xi }.

The continue-sampling inequality after taking the nth observation is

[θ0θ1/(θ1 − θ0)] [ln B + n ln(θ1/θ0)] < Σ_{i=1}^{n} Xi < [θ0θ1/(θ1 − θ0)] [ln A + n ln(θ1/θ0)].

Example 2.1.3 For the binomial distribution, a SPRT for H0 : θ = θ0 versus H1 : θ = θ1 (θ1 > θ0) is defined by two constants B and A. After n observations, we continue sampling if

B < (θ1/θ0)^m [(1 − θ1)/(1 − θ0)]^{n−m} < A,

where m is the number of defectives or successes (Xi = 1) among the n observations. Alternatively, at stage n the continue-sampling region is:

c0 + sn < m < c1 + sn,

where

c0 = ln B / ln K,   c1 = ln A / ln K,   s = ln[(1 − θ0)/(1 − θ1)] / ln K,   K = θ1(1 − θ0) / [θ0(1 − θ1)].

In the plane of n and m, the continue-sampling region lies between two lines having common slope s and intercepts c0 and c1. Each sample point (n, m), when plotted in this plane, has integer-valued coordinates. Two procedures, defined by pairs of intercepts (c0, c1) and (c0', c1'), are equivalent if there is no point (n, m), n ≥ m ≥ 0, between the lines y = c0 + sx and y = c0' + sx, or between the lines y = c1 + sx and y = c1' + sx. Anderson and Friedman (1960) have shown that if the slope is rational there is a denumerable number of SPRT's, and if the slope is irrational, there is a nondenumerable number. Let s = M/R where M and R are relatively prime integers. Then a point (n, m) is on the line y = c + sx for the value c = (mR − nM)/R, which is rational. All the lines required for defining SPRT's in this case have intercepts of the form shown above. There is a denumerable number of such lines, and hence a denumerable number of pairs of such lines.

2.2 SPRT: Its Finite Termination and Bounds

The reason we resort to a sequential procedure is that we may be able to terminate the experiment earlier than with a fixed-sample size procedure. We should then ensure that the sequential procedure terminates finitely with probability one. Towards this we have the results of Stein (1946) and Wald (1947).

Theorem 2.2.1 Let Z = ln[f(X; θ1)/f(X; θ0)], where we are interested in testing H0 : f(x) = f0(x) versus H1 : f(x) = f1(x). Then Wald's SPRT terminates finitely with probability one provided P(Z = 0) < 1.

We will omit part of the proof because it is somewhat technical. When we are dealing with a family of densities indexed by a parameter θ, Z = ln{f(x; θ1)/f(x; θ0)}, where f(x; θ0) and f(x; θ1) are the hypothesized density functions under H0 and H1 respectively. In general, it can be shown that if N is the stopping time of the SPRT, then P(N > kr) ≤ (1 − q^r)^k, where k and r are positive integers. Since P(Z = 0) < 1, there exist d > 0 and q > 0 such that either P(Z > d) ≥ q or P(Z < −d) ≥ q. In the former case choose the integer r such that rd > ln(A/B); in the latter case choose the integer r such that −rd < −ln(A/B). Now

{N = ∞} = ∩_{n=1}^{∞} {N > n}, where {N > n}

is monotone decreasing. Hence

P(N is not finite) = lim_{n→∞} P(N > n) = lim_{k→∞} P(N > kr).

Note that {P(N > n)} is a monotone decreasing sequence of probabilities bounded below, and hence has a limit. This limit is also the limit of any subsequence, in particular, the sequence consisting of every rth element of the original sequence. Thus,

lim_{k→∞} P(N > kr) ≤ lim_{k→∞} (1 − q^r)^k = 0.

Remark 2.2.1 Wald (1947, pp. 157-158) has established Theorem 2.2.1 under the assumption that var(Z) is positive.

Next, we shall explore whether it is possible to solve for A and B explicitly for specified α and β. We have

α = P(reject H0 | H0) = Σ_{i=1}^{∞} P_{H0}(B < R_j < A, j = 1, 2, ..., i − 1, and R_i ≥ A),

β = P(accept H0 | H1) = Σ_{i=1}^{∞} P_{H1}(B < R_j < A, j = 1, 2, ..., i − 1, and R_i ≤ B).

However, these expressions do not easily lend themselves to the evaluation of A and B in general, where R_i = Π_{j=1}^{i} [f(X_j; θ1)/f(X_j; θ0)] denotes the probability ratio at stage i.

Theorem 2.2.2 For Wald's SPRT, A ≤ (1 − β)/α and B ≥ β/(1 − α).

Proof. Let X = (X1, X2, ..., Xk) and let E_k be the set of all points (in k-dimensional Euclidean space R_k) for which we reject H0 using the SPRT. Also, let F_k be the set of all points for which we accept H0. Notice that {E_k, k = 1, 2, ...} are mutually disjoint and {F_k, k = 1, 2, ...} are also mutually disjoint (draw pictures in R_1 and R_2). Assume, without practical loss, that P_{Hi}({∪E_k} ∪ {∪F_k}) = 1, i = 0 and 1. That is, P(N = ∞) = 0, which is satisfied when P(Z = 0) < 1 (see Theorem 2.2.1). Notice that Z will be identically zero if and only if f1(x) and f0(x) agree at each point x which can occur. The mild condition P(Z = 0) < 1 will be satisfied provided the random variable X is not concentrated on the set of points x for which f1(x) = f0(x). Then

1 − β = P_{H1}(reject H0) = Σ_{k=1}^{∞} P_{H1}(E_k).

Since f1(x) ≥ A f0(x) holds at every point x ∈ E_k, we obtain

1 − β = Σ_{k=1}^{∞} P_{H1}(E_k) ≥ A Σ_{k=1}^{∞} P_{H0}(E_k) = Aα.   (2.2.1)

Hence

1/A ≥ α/(1 − β).   (2.2.2)

Similarly,

β = Σ_{k=1}^{∞} P_{H1}(F_k) ≤ B Σ_{k=1}^{∞} P_{H0}(F_k) = B(1 − α),

since 1 − α = P_{H0}(accept H0) = Σ_{k=1}^{∞} P_{H0}(F_k). Consequently

B ≥ β/(1 − α).   (2.2.3)

Corollary 2.2.1 A = (1 - p) /a and B = p/ (1 - 0) imply that a = (1 - B)/ (A- B) and p = B (A- 1) / (A- B).

Remark 2.2.2 In obtaining Wald’s bounds for A and B it is assumed that the SPRT terminates finitely with probability one. However, when pHi({u&} U {UFk}) 5 1, i = 0 and 1, the last equality in (2.2.1) can be replaced by the inequality 5. Hence, the inequalities relating A and B and the error probabilities constitute approximate bounds irrespective of whether termination is certain or not. Also B = p and A A l/a can be reasonable bounds.

The inequalities obtained here are almost equalities since A, does not usually attain either a value far above A or a value far below B. So, suppose we take A = A‘ f (1 - p> /a and B = B‘ -= p/ (1 - a).When we use the approximate values A’ and B‘ for A and B respectively, we may not have the same probabilities of Type I and Type I1 errors, namely a and p. Let the effective probabilities of error be a’ and p’. Consider the SPRT based on (B,A’). Then there may be some sequences which call for rejection of HO by the test based on (B,A)and for acceptance of HO by the test (B,A’). So a‘ 5 a,p’ 2 p for (B,A’). Similarly a‘ 2 a, p‘ 5 ,O for the SPRT (B’,A) . However, if the SPRT (B’,A’) is used instead of (B,A) it is not obvious whether the error probabilities are decreased or increased. However

i.e.,

a(1-p’) 2 a’(1-P) p(1-a’) L: P’(1 -a) - Adding these two we obtain

a + p 2 a’ + P’.

That is, at most one of the error probabilities could be larger than the nominal error probability. Further,

a’ 5 a’ <-- a -%(l+p> (1 - p’) - (1 -PI

p’ 5 -I--P’ :P(l+a), (1 -a‘) (1 -a) 2.2. SPRT: IT’S FINITE TERMINATION AND BOUNDS 17 hence any increase in error size in a’ [p’] is not beyond a factor of 1 + p [1+a]. These factors are close to unity when a and ,8 are small. If a = p = .05, then a‘ = p‘ 5 .0525. If both a’ < a and p’ < p, it would usually mean the (B’,A’) test required substantially more observations than the (B,A) test. Since B 2 p/ (1 - a) and A 5 (1 - a)/p, we have increased the continue- sampling region. There are several reasons for believing that the increase in the necessary number of observations caused by the approximations to be only slight. First, the sequential process may terminate at the nth stage if f1 (x)/fo (x) 2 A or f1 (x)/fo (x) 5 B. If at the final stage fl/fo 2 A were exactly equal to A or B, then the inequalities for A and B would be exact equalities. A possible excess of fl/fo beyond the boundaries A and B at termination of the test procedure is caused by the discontinuity of the number of observations. If n were continuous then f1/ fo would be continuous in n and the ratio could exactly achieve A and B at the time of the termination. Wald (1947, Section 3.9) has shown that increase in the expected sample number using the inequalities is slight. A nice feature of the SPRT is that the approximations to A and B are only functions of a and /3 and can be computed once and for all free of f;whereas the critical values in Neyman and Pearson formulations of fixed-sample procedures depend on f and a. So, in the SPRT no distributional problems are involved except where one is interested in finding the distribution of the number of trials required to terminate that experiment. However this is of secondary importance if we know that the sequential test on the average leads to a saving in the number of observations. Note that when B = ,8/ (1 - a) and A = (1 - a)lp, it is trivial to show that B < 1 < A (since a + ,8 cannot exceed unity).

Example 2.2.1 Let 8 be the probability of an item being defective. At the nth stage: take one more observation if

that is, if or

In other words, the continue-sampling inequality is of the form

nC + D’ < r < nC t D, 18 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST where r denotes the number of defectives and C, D and D' are functions of A, B, 80 and 81.

Example 2.2.2 Let 80 = 1/2, 81 = 0.8 in Example 2.2.1. Then (23)' ( .2)n-r An = (.5y ( .5)n-r Letting a = 0.2 and p = 0.2, (8/5) An, if (n+ ,)st trial results in a defective, (2/5) An, if (n+ ,)st trial results in a non-defective, B = 2/8 = 1/4, A = 0.8/0.2 = 4. Suppose we observe D G G D D D D D D, where D denotes a defective item and G denotes a non-defective (or good) item. The continue-sampling region is nln(5/2) -1 < < n In (5/2) 21n2 21n2 + 1, 0.65~~- 1 < r < 0.65n + 1. Hence, we reject HO on the ninth observation. Example 2.2.3 (Fixed-sample size procedures for Example 2.2.2) If a fixed- sample size n were used above in Example 2.2.2, specifications are P(r > klOo = 1/2) = 0.2 P(r 5 kI& = 0.8) = 0.2, i.e.,

@ A 0.8 = @ (0.84)

@ IC - 0'8n] = 0.2 = @ (-0.84) [ ( .16n)1/2 i.e. , Ic - n/2 = 0.84&/2 and k - 0.8n = -0.84&(0.4) , or 0.3n = 0.846(0.9), Jn = 0.84(3) = 2.52, that is, n A 7 and k = 5.32. The exact values, using binomial tables are n = 10, k = 6. 2.3. THE OPERATING CHARACTERISTIC FUNCTION 19

2.3 The Operating Characteristic Function

Wald (1947) devised the following ingenious method of obtaining the operating characteristic (OC) (probability of accepting Ho) of an SPRT. Consider an SPRT defined by given constants A and B, with B < 1 < A, in order to test HO : f = fo (x) = f (.;go) against HI : f = fi (x) = f (x;81). If 80 and 81 are the only two states of nature, then there is no point in considering the operating characteristic function (OC Function). However, if the above hypothesis-testing is a simplification of, for example, HO: 8 5 8* versus HI : 8 > 8*, then one would be interested in the OC(8) for all possible values of 8. Let 8 be fixed and determine as a function of that 8 a value of h (other than 0) for which

This expectation is 1 when h = 0 but there is one other value of h for which it is also 1. For example, h = 1 if 8 = 80 and h = -1 if 8 = 81. The above formula can be written as

Define the density function

Consider the auxiliary problem of testing

H : f = f (x;6) vs. H* : f = f* (x;6) which are simple hypotheses for fixed h and 8. So, one continues sampling (in testing H vs. H*) if

After taking the l/hth power (assuming h > 0) throughout, we obtain the same inequality that was used for continuing sampling in testing Ho against HI. Hence Po (accept Ho) = Po (accept H)= PH (accept H) = 1-a* where a* is the size of the type I error for the auxiliary problem. However solving the equations 20 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST we find that a* (1 - Bh)/ (Ah - Bh). Hence OC(0) (Ah - 1) / (Ah - Bh), which is a function of h. If h < 0 we set B* = Ah and A* = Bh. Then Pe (accept Ho) = Pe (reject H) = PH (reject H)= a* where

yielding the same expression for OC(0) as in the case of h > 0. However, h is a function of 6' and these two relations define the operating characteristic curve OC(0) parametrically. Each value of h determines a 8 and a value of Pe (accept Ho), a point on the OC curve. (For exponential models, one obtains an explicit expression for 0 in terms of h.) The equation relating h and 8 does not provide a well-defined value of 8 when h = 0 since the relation is satisfied by all 0. However, one can evaluate the limit of OC(0) as h -+ 0 by using 1'Hospital's rule. Thus

In A lim OC (0) = h+O 1nA - 1nB' We know OC(B0) = 1 - a,OC(B1) = p,

lim OC (0) = 1, and lim OC (0) = 0, h+oo h+-W since B < 1 < A. Thus we obtain the following table of approximate values

h -00 -1 0 loo h e 81 60 - In A 1-0 1 Ah - 1 OC 1nA-1nB Ah - Bh

Example 2.3.1 Consider the problem of testing 0 = 80 vs. 0 = 81 > 80 in a Bernoulli population. Here

Setting this equal to 1 and solving for 0 we obtain 2.4. THE AVERAGE SAMPLE NUMBER 21 as h --+ 0 this becomes

Also one can easily see that

lim 8 = 0, and lim 0 = 1. h-tw h+-w

If 80 = 0.5, 81 = 0.8, and a = ,O = 0.01, we obtain e= 1 - (2/5)h - 5h - 2h 1.6h - (2/5)h - 8h - 2h' and the table

h 1-00 -1 0 1 00 1 .8 .661 .5 0 oc I 0 .01 .5 .99 1 2.4 The Average Sample Number

The sample size needed to reach a decision in a sequential or a multiple sampling plan is a random variable N. The distribution of this random variable depends on the true distribution of the observations during the sampling process. In particular, we are interested in evaluating E (N), the average sample number (ASN). In Section 2.2 it was shown that for the SPRT, N is finite with probability one. Thus, N can take on values 1,2,3,... with probabilities pl,p2, ..., where

cpa= 1.

The moments of N cannot be explicitly computed. However, one can show that (assuming P (2 = 0) < 1) E (Ni)< 00 for all i. Towards this end, consider

Now, r r 22 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST etc. follow from the inequality obtained in Section 2.2, namely that P (N > kr) -< (1 - qr)k, k = 1,2, ...) since

T T

j=l j=1 etc. Consequently,

Now, the series in brackets can be shown to be convergent by using the ratio test for 0 < g 5 1. The ratio of the (n+ l)thterm to the nth term in the above series is (%)'(1 - 4') the limit of which is less than unity. Hence, E (Ni)< 00. In fact one can show that (assuming P (2 = 0) < 1) the moment-generating function of N is finite, Towards this end consider

MN(~)= E(eNt)

provided ert (1 - qT) < 1. That is, t E (-m, 0). If a decision is reached at the nth stage, A, is approximately distributed as a Bernoulli variable with values B and A and

E (1nRN) (1nB) P(1nhN = 1nB) + (1nA) P (1nRN = 1nA)

= 1nB - P (accept Ho) + 1nA - P (reject Ho) ) where the expectation and the probabilities are with respect to the true distrib- ution. So Eeo (1nAN) (1nB) (1 - a)+ (1nA) a and Edl (1nhN) = (1nB) p + (1nA) (1 - p) . However In AN = 21 + 22+ - - - + ZN, 2.4. THE AVERAGE SAMPLE NUMBER 23 a random sum of i.i.d. random variables, where

Now, using the simple method of Wolfowitz (1947) and Johnson (1959)) we will show that E (1nAN) = E (N) E (2).Let 2, Z1,Z2, ... be a sequence of independent, identically distributed random variables and N a random variable with values 1,2,... such that the event (N 2 i) is independent of Zi,&+I, .... Let yZ be zero if N < i and 1 if N 2 i. Then the event (N = i) is determined by the constraints on 21,Z2 ..., Zi and hence is independent of &+I, ..., for i = 1,2, .... Also {N 2 i} = Uiz; {N = j})' is independent of Zi,Zi+l .... Thus

/00 \00 E(Z1+Z2+..- \i=1 i=l 00 00

i=l i=l since yZ depends only on 21, 22,..., Zi-1 and hence is independent of Zi provided the interchange of infinite summation and expectation is justified, and

00 00 0000

This completes the proof of the assertion that

E(lnAN)=E(Z)E(N). (2.4.1)

The interchange of summation and expectation is valid if the series is ab- solutely convergent. Consider

provided E (121)and E (N)are finite. Thus it follows as an application of the last result to the sequence 24 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST that

Hence . alnA+(l-a)lnB Eeo (N)= 7 (2.4.2) Eeo (2) and (2.4.3)

Example 2.4.1 Let X be normal with mean 8 and variance 1. Let 80 = 0 and 81 = 1 and a = ,B = 0.01. Then

A = 99 = 1/B, 1nA = 4.595,

So Eo (1nAN) = - (1 - 2a) In 99 = -4.5031.

. 4.5031 . Ee,(N) = -- - 9 and Eel (N)29. 112 For a fixed-sample size procedure n = 22 is needed. The expected sample size can be computed for states of nature other than HO and HI via

(2.4.4) where 7i- (8) = PO(accept Ho) = OC(8).

Example 2.4.2 Let X be a random variable distributed uniformly on [O, 6' + 21. We wish to test HO : 8 = 0 (density fo) against HI : 8 = 1 (density f1). We will obtain Wald's SPRT, the exact average sample number, and the error probabili- ties. Let I (a,b) = 1, if a 5 b and zero otherwise. Then

fl (Xi) I(X(,)>3)I(l,X(I)) AN = i=l'II [ml= I (X(+ 2) I (o,x(l)) where X(,) and X(l) respectively denote the largest and smallest observations in the sample. Hence the rule is: at stage n accept Ho if X(1) < 1 and X(n) < 2 (then A, = 0), at stage n take one more observation if 1 5 Xi 5 2 (i = 1,2, ..., n) 2.4. THE AVERAGE SAMPLE NUMBER 25

(then An = l),at stage n reject Ho if X(1) > 1 and X(n)> 2 (then An = co). If N denotes the random sample size required. Then

Similarly,

Pl(4 = P (N = nlH1) = P (N = n, reject HolH1) + P (N = n, accept HoIH1) = P (1 5 Xi 5 2,i 5 n - 1,X(1)> 1 and X(n)> 21H1) +P (1 5 Xi 5 2,i 5 n - l,X(1) < 1 and X(n)< 2(H1) = P (1 5 Xi 5 2,i 5 n - l,X(n) > 21H1) +P (1 5 xi 5 2,i 5 n - 1, X(n)< 11~1) = (1/2)n-1 (0) + (1/2)n-1 (1/2) = (1/2)n.

00 E (NIHo)= En (1/2)n = 2, and E (NIHI)= 2, n=l (because C,"=lnOn = non-' = 8 (a/aO) c,"=l On = O/ (1 - 8)2) and P(Type I error)

Similarly ,B = 0.

Higher moments of randomly stopped sums have been derived by Chow, Rob- bins and Teicher (1965). In the following we shall state their results for the second . 26 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

Let (R, F, P) be a probability space and let 21,22, ... be a sequence of random variables on R. A stopping variable (of the sequence 21,22, ...) is a random vari- able N with positive integer values such that the event {N = n} depends only on (Zl,22, ..., Zn) , for every n 2 1. Let

then N

i= 1 is a randomly stopped sum. We shall assume that

Wald’s (1947) Theorem states that for independent and identically distributed (i.i.d.) Zi with E(Zi) = 0, E(N) < 00 implies that E(SN)= 0. We have the following results of Chow, Robbins and Teicher (1965).

Theorem 2.4.1 Let 21,22, ... be independent with E(2n)= 0, E lZnl = an, E(Z:) = oi < 00 (n2 1) and let Sn = C:., Zi. Then if N is a stopping variable, either of the two relations

(2.4.6) implies that E(,S”) = 0 and

(2.4.7)

If CT~= o2 < 00, then E(N)< 00 implies that

(2.4.8)

Corollary 2.4.1.1 If E(Zn) = 0, E(N)= E(Sg)/E(Z2),which is known as Wald’s second equation.

One disturbing feature of the expected sample size of the SPRT is the fol- lowing: if one is really interested in testing H,* : 8 5 8* against the alternative 2.4. THE AVERAGE SAMPLE NUMBER 27

H;" : 8 > 8*, then one would set UP HO : 8 = 80 (00 E H;) and with HI : 8 = 81 with 81 2 8*: the zone between 80 and 81 being the %zdzflerence zone." If the population is indexed by a 0 belonging to this indifference zone, that is near 8*, E (N) tends to be largest. Thus the test tends to take larger stopping time to reach a decision when 8 is near 8* where there is hardly any concern as to which decision is made. This is clear intuitively also, because it should not take very long to discover that a population is overwhelmingly of one kind or the other as it does to discover which kind it is when it is near the borderline. What is annoying then, is that wrong decisions are less likely to be costly in the borderline case, whereas it is precisely at that situation that a large sample is likely to be required in order to reach a decision.

Example 2.4.3 For the Bernoulli problem of 80 against 81, we have shown that

1 - eo

~(0)= OC(8) = (Ah- 1) /(Ah - Bh) .

NOWusing, 80 = .5, el = .9, a! = p = .05, one obtains

e = (5h-1) /(gh-i)

and

= 8ln9-ln5.

As h -+ 0, 9 and n(8) tend to be indeterminate. Also both E (2) and E (1nRN) tend to zero, but their ratio can be computed by evaluating its limit as h --+ 0. We have Table 2.4.1. from Example 2.3.1. So, let us find limh,o E (NIB).

Table 2.4.1 OC Function and Expected Sample Size for the Bernoulli Problem

h -00 -1 0 1 00 e 1 -9 .732 .5 0 7r (0) 0 .05 -5 .95 1 E(zle> ln1.8 0.9ln9 - In5 0 In 0.6 -In5 = .91n9-ln5.9 In 19 = -.9 In 19 - 5.2 E (NIB) I # 5.01 7.2 9.16 ln.6 - In 5 = 1.83 28 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

Therefore,

(In 19) [ (1 - 19-~) lim E (NIB) = lim h-+O h-to { [l - (.2)h] / [(1.8)h- (.2)h]} (log9) - log5 ’

Using the expansion

ah-bh = ,hlna - ,hlnb = h (lna - lnb) + (h2/2)[(lna)2 - (lnl~)~]+ - - -, we have

(1 - 19-h) - (19h - 1) - -h2 (In 19)2 - $ (In 19)* - - - . 19h - 19-h -[2 hln19 + !$ (1n1q3 + - -1 1 + $ (In 19)~- - - - = -(hln19) 2 [1 + (In 19)~+ - -1

Similarly

Hence the denominator = 5 (In 5) [In (5/9)] + - - -. Thus

- (In 19)~ - (In 19)~ lim E(Nl0) = lim h+O h+O (In 5) In (5/9) [’ * (h)l = (In 5) In (5/9) - (2.9444)2 - = 9.16. (1.6094) (0.5878)

Alternatively since Ee(2) = 0 when h = 0, we use Wald’s second equation and obtain (lnA)2P(SN 2 1nA) + (lnB)2P(SN 5 1nB) Ee(N)= 7 Ee (Z2) where B = 0.732. We note that a = ,O = 0.05 implies that A = B-l = 19, and 7r (0) = (19h - 1)/(19h - 19-h). 2.5. WALD'S FUNDAMENTAL IDENTITY 29

Here 2 = Xln9-In5 = (X - 0) In9 + (Oh9 - ins) , ~~(2~)= e (1 - e) (ln9)2 = 0.9448, and the numerator is Ee (N)becomes ( 1-Bh ) (1nA)2 + ($h--jh) (1nB>2 Ah - Bh = (lnA)2 = 8.6697, since A = B-' = 19. Thus (In 19)2 (2.9444)2 Ee (N(h= 0) = -- = 9.16, 0.9448 0.9448 which agrees with the value in Table 2.4.1. Although the SPRT terminates finitely, in any single experiment N could be very large. Hence, one might establish a bound no and terminate sampling at no. If no decision is reached by the nkh stage; sampling is stopped anyway (and HO accepted if Ano < 1 and rejected if A, > 1). The truncation of the procedure would certainly affect the error sizes of the test and this aspect will be studied in Section 2.8 and the effect is slight if no is fairly large.

2.5 Wald's Fundamental Identity

In this section we shall give an identity of Wald (1947) which plays a fundamental role in deriving the moments of the sample size required to terminate the SPRT, for testing HO : 8 = 80 against HI : 0 = 81, where X has the probability density function given by f (x;0).

Theorem 2.5.1 (WaZd, 1947) Let 2 = ln[f(X;81)/f(X;Oo)] and let P(Z = 0) < 1. Then E { eSNt[C (t)]-N} = 1 for every t in D where N SN = c22, c (t)= E (P) i=l and D is the set of points in the complex plane such that C(t) is finite and C(t)2 1. Under some mild regularity assumptions, the above identity can be differen- tiated under the expectation sign any number of times with respect to t at any real value t such that C(t)2 1. 30 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

2.5.1 Applications of the Fundamental Identity

Differentiating with respect to t and setting t = 0, we obtain

E [SN- E (N)C‘ (O)] = 0.

This is E (SN)= E (N)E (2). (2.5.1) Differentiating twice and setting t = 0, we obtain

= 0) t=O i.e., E { [SN- NC’(0)I2- NC”(0) + N [C’(O)I2}= 0.

That is, war(SN) = E (N)war(2). If E(2)= C’(0) = 0 then

E (S:) = E (N)E (Z2).

Hence E (N)= E (5’5) /E (Z2) (2.5.2) which is known as Wald’s second equation, where

P (SN 2 In A) A (1 - Bh)/ (Ah - Bh) , and P(SN

Lemma 2.5.1 If E(2)= 0 then h = 0, provided P(2= 0) < 1 (that is, 2 is not a trivial random variable almost surely), and that diflerentiation of E (eZh) underneath the integral sign is permissible.

Proof. Under the assumption of the lemma, 2.5. WALD’S FUNDAMENTAL IDENTITY 31

This together with E(2)= 0 implies that

E [(e”” - 1) Z]= 0.

Now, using the mean value theorem, we obtain hE 0Z2eyZh = 0 for some 0 < y(Z) < 1. Since P(Z = 0) < 1, we infer that Z2eyzh > 0 with positive probability. That is E (Z2erzh)> 0. Hence h = 0. This completes the proof. H

We also have 1-Bh In B lim - - h-to Ah - Bh 1nA - 1nB and Ah - 1 In A lim - h-to Ah - Bh 1nA - 1nB’ Thus

. - (lnA)21nB+ (lnB)21nA E (5’;) = 1nA - 1nB = -1nAlnB.

Hence - 1nAlnB E(N)= E(z2) , when E(2)= 0. (2.5.3)

d Example 2.5.1 Let X = normal(0,l). We wish to test HO : 8 = 0 against HI : 8 = 1. Also set a = p = 0.05. Then 1 A A 19 and B - 19 * Hence In A = 2.9444 = - In B. Computations yield that Z = X - 0.5. Suppose we are interested in Eo.5(N). Since Eo.s(Z)= 0, we infer from Lemma 2.5.1 that h = 0. Hence (2.9444) E0.5(N) ~~~~(22)= (2.9444)2 9.

Also in Example 2.4.3

Z = X (h9) - in5 = ln9[X - (in5) / (lng)] , 32 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST and

E(Z2) = (In 9)28 (1 - 8) , where 8 = In 5/ In 9 = 0.7325 = (4.8278) (0.7325) (0.2675) = 0.9460.

Hence (In 19)2 - (2.9444)2 E(NIh = 0) = -- = 9.16, 0.9460 0.9460 which agrees with the value in Table 2.4.1.

Wald (1947, Appendix A4) obtained exact formulas for the OC and ASN functions when 2 = In [f(X;81)/f(X; OO)] takes on as values only a finite numbers of integral multiples of a constant d. Ferguson (1967) was able to obtain exact expression for the error probabilities and the expected stopping time when X has the following density:

and we are interested in testing HO : 8 = -112 vs. HI : 8 = 112. Consider the exponential density for X given by

ifx>O f(x;e) = { if x < 0, suppose we are interested in testing HO : 8 = 80 against HI : 8 = 81, with 81 > 80. Kemperman (1961, pp. 70-71) obtained upper and lower bounds for the error probabilities given by

l-&B A- E~B and 1 - A-l 1 - EA-' B-1 - &A-l -

(1 - EB)(A - E~B)/B - = A. (A- E~B)(1 - EB)/AB 2.6. BOUNDS FOR THE AVERAGE SAMPLE NUMBER 33

Also considering upper bounds for P/ (1 - a) we have

(1 - &Av1)(B-l - E~A-’) 1 - (1 - EB)(A - eB)-’ B (A- E) (A - EB) I (A- 1) (A - E~B) B(A-4 =B+ (1 - E) B L A-1 A-1 These results suggest that we modify Wald’s approximation to the boundary values A and B as follows:

(2.5.4)

2.6 Bounds for the Average Sample Number

When both In A and In B are finite, Stein (1946) has shown that E (N)is bounded. However, when one of the boundaries (say 1nA) is infinite, we cannot show that E (N)is bounded. When E (N)exists, the formula E(N) = E(,S”)/E(Z)will hold and be meaningful. M. N. Ghosh (1960) has shown that E (N)is bounded if a certain condition is satisfied.

Theorem 2.6.1 Let the random variable Z be such that E(Z) < 0. Then E (N)is bounded if

1: zdG(2) /l: dG(2) = E(Z1.Z < -x) 2 -Z - c (2.6.1) for some c and k so that x > k > 0, c > 0 where G(z) denotes the distribution function of Z.

Special Case 1 : If Z is normal with mean p and variance 02,then we can take c = (2/3)0 and k = 20 - p. Special Case 2: If Z has a standard double exponential distribution, then we can take c = 1 and k = 1. Next we consider lower bounds for the ASN required by an arbitrary sequential test. Let X1,X2, ... be a sequence of i.i.d. random variables having the density or probability function f(z;8) where 8 E SZ. Suppose we wish to test HO : 8 E wo versus HI : 8 E u1 where wo and w1 are two disjoint subsets of 32. Let D denote an arbitrary (possibly randomized) sequential test of HOvs. HI.Let 0 < a,,6 < 1 such that Pe(D accepts HI) 5 a, if 8 E WO, 34 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST and Pe(D accepts Ho) 5 p, if 8 E w1. (2.6.2) Then Hoeffding (1953, pp. 128-130) obtained:

(2.6.3) and

where

Notice that inequalities (2.6.2) and (2.6.3) were obtained by Wald (1947, Ap- pendix A.7), (See also (2.4.2) and (2.4.3)) when HOand H1 are simple hypotheses.

Special Case: Suppose that

Then

Hence and !-~)~/2, ifOc-6 el (8) = { if 8 2 6. Further if a = p,

Hoeffding (1960) had derived improved lower bounds and we will provide one of them. Let XI,X2, ... be a sequence of i.i.d. random variables having the density (or probability function) f (which could be indexed by a parameter 0). Consider sequential tests for which a(P) denotes the error probability when f = fo(f1). Let N denote the stopping time of the sequential test. 2.6. BOUNDS FOR THE AVERAGE SAMPLE NUMBER 35

Theorem 2.6.2 Let the sequential test terminate with probability one under each of fo, f1 and f2. Also assume that E2(N) < 00 when E2(N) denotes the expected stopping time when f = f2. Further let a -k p < 1. Then

where

9 = m+90,91), gi = S f2(4 In [f2(.)/fi(Z)] d., i = 0,l (2.6.6) and

Special Case: Let fo, f1 and fi be normal densities with variance 1 and respective means -<, 5,O (< > 0). Then

and (2.6.5) takes the form of

(2.6.8) when a = P. Note that when a is small

which can be obtained by first squaring and then using the inequality:

(-2 In a)1/2- (1 - 2 In 2)’12 5 [I - 2 In (2a)l1l25 - (2 In + (1 - 2 In 2)ll2 .

Next, consider the SPRT which stops as soon as

2clS~l> lnA(> 0) since Zi = 2[Xi, where

and A = (1 - a)/.. 36 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

Hence (2.6.10)

Table 2.6.1 Values of E2 (N) and of the Lower Bound in (2.6.8) for E = 0.1

a! .01 .05 0.1 0.2 0.3 Fixed-sample size 541.2 270.6 164.3 70.8 27.5 SPRT 527.9 216.7 120.7 48.0 17.9 Lower Bound (2.6.8) 388.3 187.0 111.1 46.6 17.8

When a is close to its upper bound 1/2 and 5 is small, the lower bound in (2.6.8) is nearly achieved by the SPRT.

2.7 Improvements to OC and ASN Functions Page (1954) and Kemp (1958) have improved the Wald’s approximations for the OC and ASN functions. In the following we will give Kemp’s (1958) results which are better than Page’s (1954).

2.7.1 The OC Function Wald’s approximate formula (see Section 2.3) for the operating characteristic of a SPRT is equivalent to the probability that a particle performing a linear random walk between two absorbing barriers is absorbed by the lower barrier. This formula is valid if the starting points of the test are not near either boundary and if the mean paths are inclined to the. boundaries at not more than a small angle so that the overshoot beyond the boundary at the end of the test is negligible. Page (1954) derived expressions for the OC function and the ASN of a SPRT that are closer to the true values. Kemp (1958) obtains even better approximations by using the same method of Page (1954) but different assumptions and we will present Kemp’s results. Suppose that a Wald’s SPRT is to be carried out on a population for which the scores, Zi assigned to the observations are independent having a continuous density function g (z). Note that in our case z~=log[f (x; 01) 1’2=1,2, ... f (X;00) and we assume that we take at least one observation (that is n 2 1). Consider a sequential testing procedure with horizontal decision lines distance w apart. Take the lower line as the line of reference. Also let P(z) be the 2.7. IMPROVEMENTS TO OC AND ASN FUNCTIONS 37 probability that a sequential test starting at a point x from the line of reference will end on or below the lower boundary. Then Kemp (1958) says that P (2) satisfies

--z (2.7.1)

If P(x)A 1 when x 5 0 and if either P(x)= 0 or

P(2;> w - x) = 0,

Equation (2.7.1) can be approximately written as

00 P(z)g(z- z)dz. (2.7.2)

Then P(x) satisfying (2.7.2) is of the form (2.7.3)

where h is the solution of the equation

(2.7.4) J-00 Also C and D can be solved for in terms of P(0) and P(w) and obtain

and P(O)- P(w)ewh D= 1 - ewh Now, substituting (2.7.3) into (2.7.2) and carrying out the integration, we obtain the simultaneous equations to solve for P(0) and P(w) by setting x = 0 and x = w.

Special Case: If 2; is normally distributed with mean 0 and variance one and w 2 3, then h = -28. Kemp (1958) obtains

where

K2 = [l - a (W + 0) - @ (W - Q) - 2@ (Q)] [l - e-2we]-' . (2.7.7) 38 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

Also note that qop) = 1 - P(~J- e) (2.7.8) when 0 = 0, the limiting form of the operating characteristic is P(2)= P(0)+ [l- 2P(O)] z/w,

z = 2.5 z = 5.0 e Wald Kemp True Wald Kemp True -1.00 1.00 1.00 1.00 1.00 1.00 1.00 -0.50 0.99 0.9998 0.9997 0.993 0.997 0.996 0.0 0.75 0.716 0.721 0.500 0.500 0.500 0.125 0.494 0.406 0.428 0.223 0.190 0.199 1.250 0.282 0.190 0.211 0.076 0.052 0.058

2.7.2 The Average Sample Number If the sequential procedure starts at a point distance z from the line of reference then n(z) the expected sample number satisfies the equation

W n(z) = 1 + 1 n(x)g(x - z)dx. (2.7.9)

If the probabilities in (-00, z) and (w - z, 00) are negligible then one can approx- imately write (2.7.9) as

00 n(z) 1+ J_, n(x)g(x - z)dx. (2.7.10) which is satisfied by the solution [(C*+ D*ezh)- z] n(z) = (2.7.11) where h is defined by (2.7.1).

Special Case: If Zi is normal with mean 0 and unit variance and w 2 3, then 2.7. IMPROVEMENTS TO OC AND ASN FUNCTIONS 39 and

{ [n(w) - n(o)]6 - w} (I - e-2wz) n(z) = n(0) - (2/6) + (2.7.12) e (1 - e-2we)

Substituting (2.7.10) into (2.7.9) integrating and setting z = 0 and z = w one can obtain K2w Kln(0) + K2n(w) = 1 - iP (LJ - 0) + iP (-6) - -e

(2.7.14) where K1 and K2 are given by (2.7.5) and (2.7.6). Also note that it is necessary to calculate n(0) and n(w) only for positive or negative since n(Ol6) = n(wl - 6). For 6 = 0, the limiting forms are

(w - n(z) = n(0) + -, 4 z

n(0) = l+(-$=){[;-iP(w)]}-'.

Table 2.7.2 Comparison of the values n(z) when w = 10 z = 2.5 z = 5.0 e Wald Kemp Tme Wald Kemp Tme -1.00 2.5 3.8 3.4 5.0 6.3 5.9 -0.50 5.0 7.0 6.4 9.9 12.0 11.4 0.0 18.8 27.7 25.2 25.0 34.0 31.4 0.5 12.3 16.2 15.4 9.9 12.0 11.4 1.0 9.3 8.8 8.4 5.0 6.3 5.9

Note that the true values in Tables 2.7.1 and 2.7.2 are obtained by solving the exact equations for P(z) and n(z). Tallis and Vagholkar (1965) also obtain improvements to the OC and ASN approximations which are comparable to those of Kemp (1958). However, they are too complicated to be presented here. 40 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

2.8 Truncated SPRT

Although SPRT's enjoy the property of finitely terminating with probability one, often due to limitations of cost or available number of experimental units we may set a definite upper limit say no on the number of observations to be taken. This may be achieved by truncating the SPRT at n = no. Thus, Wald's (1947) trun- cated SPRT is formulated as follows: If the sampling procedure has progressed to nth stage (n 5 no) n reject HO if C Zi 2 1nA , and i=l n. accept HI if 2 zi 5 ln~, i=l and take one more observation if In B < C:!,Zi < In A. If the SPRT does not lead to terminal decision for n < no, no reject HO if 0 < C Zi < 1nA , and i=l nO accept HI if 1nB < C Zi < 0, i= 1

By truncating the sequential process at the nhh stage, we change the probabil- ities of type I and type I1 errors. The following theorem provides upper bounds for the modified probabilities of errors.

Theorem 2.8.1 Let (u and p be the normal probabilities of errors of first and second kinds for the SPRT. Let a (no) and P (no) respectively denote the modajied a and P for the truncated SPRT at no. Then In A (2.8.1) and B)]Jo ~WY) (2.8.2) 1nB where I Y) \id and Pj denotes the probability computed when Hj is true (j = 0,l). When no is suficiently large, we have

(2.8.3) 2.8. TRUNCATED SPRT 41 and

(2.8.4) where pj = EH,(Z),a; = WU?-(ZIHj), j = 0,l.

Proof. Let po (no) denote the probability under HO of obtaining a sample such that the SPRT does not lead to a terminal decision for n 5 no and the truncated process leads to rejection of Ho, while sampling beyond no leads to acceptance of Ho. Let C1, C2, C3 respectively denote the sets of sample points the probability contents of which when fo is true are a(no),po (no)and a. Also let C4 denote the set of outcomes for which one continues sampling forever if he/she does not make a decision at no. Then C1 c C2 U C3 U C4 because any sample point belonging to C1 is also in C2 U C3, and the sample point belonging to C3 (and hence to C2 U C3) for which n0 1nB < C zi < 0 i=l does not belong to C1. Hence the strict inequality and consequently (since Poi (C4) = 0, i =O,1)

a (no) L a + po (no)* (2.8.5) Next we derive an upper bound for po (no),which is the probability under HO that the sequence of observations 21,22, ... satisfy the following three conditions:

(i) InB < Cy==lZi < 1nA for n = 1,2, ..., no - 1, (ii) 0 < Czl Zi < In A, (iii) when the sequential process is continued beyond no, it terminates with the acceptance of Ho.

Now

n Zi < In A and Zi 5 1nB for some n > no i=l i=l 42 CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST

Thus, since the 2’s are i.i.d. random variables,

= I,, [l - a* (Be-Y,Ae-Y)] dGo(y), where a* (I3e-Y’Ae-Y) denotes the type I error probability of the SPRT hav- ing stopping bounds (Be-Y,Ae-Y). Using corollary 2.2.1, we have 1 - a* = (A - ey) / (A - B). Thus

(2.8.7) since eY 2 1. Analogously, one can obtain that

where

(iv) In B < Cyll Zi < 0, and (v) when the sampling process is continued beyond no, it terminates with the rejection of Ho.

Hence

no n 1nB <: XZi< 0 and XZi2 1nA for some n > no i=l i=l

(2.8.8) 2.8. TRUNCATED SPRT 43 since the 2’s are i.i.d. random variables and where ,8* (Be-Y,Ae-9) denotes the type I1 error probability of the SPRT having stopping bounds (Be-y, Ae-y). Again from approximation (2.2.1) and (2.2.2)we obtain

1 − β*(Be^{−y}, Ae^{−y}) ≈ A(A − B)^{−1}(1 − Be^{−y}) with y < 0.

Thus (2.8.9)

Next, let us consider the case where n₀ is sufficiently large. Consider, for some c,

where μⱼ = E_{Hⱼ}(Z), σⱼ² = var(Z|Hⱼ), j = 0, 1.

Remark 2.8.1 Wald’s (1947, p. 64) upper bounds for p₀(n₀) and p₁(n₀) are given by P₀[(ii)] and P₁[(iv)] respectively, using normal approximations.

Example 2.8.1 Let fⱼ(x) = φ(x − θⱼ), j = 0, 1, with θ₁ > θ₀. Then Z = δ[X − (θ₀ + θ₁)/2], δ = θ₁ − θ₀, and hence μ₀ = −δ²/2, μ₁ = δ²/2 and σ² = δ². Hence, from (2.8.3) and (2.8.4) we have, for all n₀,

and

Special Case: α = β = .05, n₀ = 25, θ₀ = −1/2 and θ₁ = 1/2. Then

p₀(n₀) ≤ 0.95 [Φ(3.08) − Φ(2.5)] ≈ 0.005; also p₁(n₀) = p₀(n₀) ≤ 0.005.
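As a numerical check of this special case, the bound 0.95[Φ(3.08) − Φ(2.5)] can be evaluated directly: the factor 0.95 equals (A − 1)/(A − B) with A = 19, B = 1/19, and the bracketed probability is P₀[0 ≤ S₂₅ < ln A] under the normal law of S₂₅. The sketch below (plain Python; the function names are ours, not from the text) reproduces it:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p0_bound(alpha, beta, n0, mu0, sigma0):
    """Upper bound for p0(n0): the factor (A-1)/(A-B) times
    P0[0 <= S_{n0} < ln A], with S_{n0} treated as normal with
    mean n0*mu0 and standard deviation sigma0*sqrt(n0)."""
    A = (1.0 - beta) / alpha
    B = beta / (1.0 - alpha)
    sd = sigma0 * math.sqrt(n0)
    prob = norm_cdf((math.log(A) - n0 * mu0) / sd) - norm_cdf(-n0 * mu0 / sd)
    return (A - 1.0) / (A - B) * prob

# Special case of Example 2.8.1: alpha = beta = .05, n0 = 25,
# theta0 = -1/2, theta1 = 1/2, so delta = 1, mu0 = -1/2, sigma = 1.
print(round(p0_bound(0.05, 0.05, 25, -0.5, 1.0), 4))   # close to 0.005
```

The same function gives the companion bound for p₁(n₀) by symmetry of the example.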

Example 2.8.2 Let X₁, X₂, … be an i.i.d. sequence of random variables having the probability density function fⱼ(x) = θⱼ^{−1} exp(−x/θⱼ), x, θⱼ > 0, j = 0, 1. Then Z = δX − ln(θ₁/θ₀), where δ = (θ₁ − θ₀)/(θ₀θ₁), and without loss of generality let us assume that θ₁ > θ₀. Hence μⱼ = E_{Hⱼ}(Z) = δθⱼ − ln(θ₁/θ₀) and σⱼ² = var(Z|Hⱼ) = δ²θⱼ². Now, note that 2Σ_{i=1}^{n₀} Xᵢ/θⱼ is, under Hⱼ, distributed as chi-square with 2n₀ degrees of freedom. Also let

Then straightforward computations yield (for some c)

(2.8.10)

Now, using (2.8.10) with c = ln A and j = 0 in (2.8.1), one obtains an upper bound for α(n₀) for the exponential distribution. Also, using (2.8.10) with c = ln B and j = 1 in (2.8.2), we get an upper bound for β(n₀) for the exponential case. If n₀ is large, we obtain the upper bounds by substituting the relevant quantities for μⱼ and σⱼ in (2.8.3) and (2.8.4).
Aroian (1968) proposes a direct method for evaluating the OC and ASN functions of any truncated sequential test procedure once the acceptance, rejection and continuation regions are specified for each stage. His method involves repeated convolution and numerical integration. For instance,

OC(θ) = Σ_{i=1}^{n₀} pᵢ₀(θ),

where n₀ is the truncation point and pᵢ₀(θ) [pᵢ₁(θ)] denotes the probability of accepting (rejecting) H₀ at the i-th stage (i = 1, 2, …, n₀). His method is amenable to testing simple hypotheses about parameters in exponential families (since then the SPRT can be reduced to a random walk). Aroian’s (1968) method is especially promising when the underlying distributions are discrete. Aroian (1968) illustrates his method by evaluating the OC and ASN functions of Wald’s SPRT for the normal mean with known variance with n₀ = 7 and 14. For the binomial case, by choosing an arbitrary continuation region, he obtains exact expressions for the OC and ASN functions.
So far we have some idea about the performance of Wald’s SPRT. Now we would like to ask whether the SPRT has any optimal properties.
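Aroian's direct method can be sketched for Bernoulli observations, where the repeated convolution reduces to propagating the exact distribution of the success count over the continuation region stage by stage. The code below is a minimal illustration, not Aroian's own implementation; the truncation rule (reject when S_{n₀} ≥ 0) and the example θ₀ = .5, θ₁ = .75 are our choices:

```python
import math

def truncated_sprt_oc_asn(theta, theta0, theta1, alpha, beta, n0):
    """Direct (Aroian-style) evaluation for Bernoulli observations: carry the
    exact distribution of the success count m over the continuation region
    stage by stage, accumulating the acceptance / rejection probabilities and
    the ASN under the true success probability theta."""
    lnA = math.log((1 - beta) / alpha)
    lnB = math.log(beta / (1 - alpha))
    z1 = math.log(theta1 / theta0)               # log-LR increment, success
    z0 = math.log((1 - theta1) / (1 - theta0))   # log-LR increment, failure
    cont = {0: 1.0}                  # P(no decision yet, m successes so far)
    p_acc = p_rej = asn = 0.0
    for n in range(1, n0 + 1):
        nxt = {}
        for m, p in cont.items():
            for x, px in ((1, theta), (0, 1.0 - theta)):
                mm, q = m + x, p * px
                s = mm * z1 + (n - mm) * z0      # S_n for mm successes
                if s >= lnA:
                    p_rej += q; asn += n * q
                elif s <= lnB:
                    p_acc += q; asn += n * q
                elif n == n0:                    # truncation: sign of S_{n0}
                    if s >= 0.0: p_rej += q
                    else:        p_acc += q
                    asn += n * q
                else:
                    nxt[mm] = nxt.get(mm, 0.0) + q
        cont = nxt
    return p_acc, p_rej, asn

acc, rej, asn = truncated_sprt_oc_asn(0.5, 0.5, 0.75, 0.05, 0.05, 25)
print(round(acc, 3), round(rej, 3), round(asn, 1))
```

Evaluating the function on a grid of θ values traces out the OC and ASN curves of the truncated test exactly, which is the point of Aroian's approach for discrete distributions.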

2.9 Optimal Properties of the SPRT

The sequential probability ratio test (SPRT) for testing a simple hypothesis against a simple alternative was first proved to be optimal in a certain sense by Wald and Wolfowitz (1948) (see Wolfowitz, 1966, for additional details). Another proof has been given by LeCam, which appears in Lehmann (1959). Matthes (1963) has given a proof which relies on a mapping theorem.
Let X₁, X₂, … be an i.i.d. sequence of random variables having the density function f(x; θ). We wish to test H₀ : θ = θ₀ versus H₁ : θ = θ₁. Let N denote the stopping time of Wald’s SPRT for testing H₀ against H₁. Then we have

Theorem 2.9.1 (Wald and Wolfowitz, 1948). Among all tests (fixed-sample or sequential) for which P(reject H₀|θ₀) ≤ α, P(accept H₀|θ₁) ≤ β and for which E(N|θᵢ) < ∞, i = 0, 1, the SPRT with error probabilities α and β minimizes both E(N|θ₀) and E(N|θ₁).

Proof. The main part of the proof of Theorem 2.9.1 consists of finding the solution to the following auxiliary problem. Let wᵢ denote the loss resulting from a wrong decision under Hᵢ (i = 0, 1), and let c denote the cost of each observation. Then the risk (expected loss) of a sequential procedure is

αw₀ + cE(N|θ₀)

when H₀ is true, and βw₁ + cE(N|θ₁) when H₁ is true, where α, β are the error probabilities. If the state of nature θ is a random variable such that P(θ = θ₀) = π and P(θ = θ₁) = 1 − π, then the total average risk of a procedure δ is

r(π, w₀, w₁, δ) = π[αw₀ + cE(N|θ₀)] + (1 − π)[βw₁ + cE(N|θ₁)].

The proof of Theorem 2.9.1 consists of determining the Bayes procedure for this problem, that is, the procedure which minimizes r(π, w₀, w₁, δ), and showing that Wald’s SPRT is Bayes in the following sense: given any SPRT and any π with 0 < π < 1, there exist positive constants c and w such that Wald’s SPRT is Bayes relative to π, c, w₀ = 1 − w, w₁ = w. (It is important to note that π can be chosen arbitrarily.) From the Bayes character of Wald’s SPRT, one can show its optimum property as follows. Let δ* be any other competing procedure having error probabilities α* ≤ α, β* ≤ β, and expected sample sizes E(N*|θᵢ) < ∞ (i = 0, 1). Since Wald’s SPRT minimizes the Bayes risk, it satisfies

π[αw₀ + cE(N|θ₀)] + (1 − π)[βw₁ + cE(N|θ₁)]
≤ π[α*w₀ + cE(N*|θ₀)] + (1 − π)[β*w₁ + cE(N*|θ₁)],

and hence, since α* ≤ α and β* ≤ β,

πE(N|θ₀) + (1 − π)E(N|θ₁) ≤ πE(N*|θ₀) + (1 − π)E(N*|θ₁).

Since the inequality is valid for all 0 < π < 1, it implies

E(N|θ₀) ≤ E(N*|θ₀) and E(N|θ₁) ≤ E(N*|θ₁),

which establishes the optimum property of Wald’s SPRT.
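The optimum property is easy to see numerically. A minimal Monte Carlo sketch of our own (not from the text), for the normal mean test θ₀ = 0 vs. θ₁ = 1 with α = β = .05, for which the comparable fixed-sample test needs about n = (1.645 + 1.645)² ≈ 11 observations:

```python
import math, random

def sprt_run(theta, theta0=0.0, theta1=1.0, alpha=0.05, beta=0.05):
    """One SPRT sample path for the normal(theta, 1) mean test;
    returns (sample size N, decision)."""
    lnA = math.log((1 - beta) / alpha)
    lnB = math.log(beta / (1 - alpha))
    s, n = 0.0, 0
    while lnB < s < lnA:
        n += 1
        x = random.gauss(theta, 1.0)
        # log-likelihood-ratio increment Z = (theta1-theta0)[x - (theta0+theta1)/2]
        s += (theta1 - theta0) * (x - 0.5 * (theta0 + theta1))
    return n, ("reject H0" if s >= lnA else "accept H0")

random.seed(1)
runs = [sprt_run(0.0) for _ in range(4000)]
mean_n = sum(n for n, _ in runs) / len(runs)
err = sum(d == "reject H0" for _, d in runs) / len(runs)
# Wald's approximation gives E(N|H0) near 5.3, roughly half the
# fixed-sample size of about 11 with the same error probabilities.
print(round(mean_n, 2), round(err, 3))
```

Running the same loop under θ = 1 shows the analogous saving for E(N|θ₁), which is what Theorem 2.9.1 guarantees simultaneously at both hypothesized values.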

Next we consider the monotonicity property (Property M).

Definition 2.9.1 An SPRT is said to have Property M if at least one of the error probabilities decreases when the upper stopping bound of the SPRT is increased and the lower stopping bound is decreased, unless the new test and the old test are equivalent, in which case the error probabilities are unchanged. (Two tests are said to be equivalent if their sample paths differ on a set of probability zero under both hypotheses.)

We have the following result regarding the uniqueness of an SPRT.

Theorem 2.9.2 There is at most one sequential probability ratio test for testing H₀ : f(x) = f₀(x) vs. H₁ : f(x) = f₁(x) that achieves a given α and β, provided one of the following conditions holds:

(i) f₁(X)/f₀(X) has a continuous distribution with positive probability on every interval in (0, ∞),
(ii) the SPRT has stopping bounds which satisfy 0 < B < 1 < A,
(iii) the SPRT has the monotonicity Property M.

For (i) see Weiss (1956) and for (ii) see Anderson and Friedman (1960). Wijsman (1960) has shown that the SPRT has Property M.

Definition 2.9.2 A density or probability function f(x; θ), where θ is real, is said to have monotone likelihood ratio (MLR) in t(x) if the distributions indexed by different θ’s are distinct and if f(x; θ)/f(x; θ′) is a nondecreasing function of t(x) for θ > θ′.

The following result (see Lehmann, 1959, p. 101) pertains to the monotonicity of the power function (or the OC function).

Theorem 2.9.3 Let X₁, X₂, … be i.i.d. random variables with density f(x; θ) which has MLR in T(x). Then the SPRT for testing H₀ : θ = θ₀ vs. H₁ : θ = θ₁ (θ₀ < θ₁) has a nondecreasing power function.

Corollary 2.9.3.1 If f(x; θ) has MLR in T(x), then the SPRT is unbiased [that is, OC(θ₀) > OC(θ₁)].

2.10 The Restricted SPRT

Although the SPRT has the optimum property, in general its expected sample size is relatively large when the parameter lies between the two values specified by the null and alternative hypotheses (that is, a large number of observations is expected precisely in the cases where it does not make much difference which decision is taken). One can therefore ask whether there are other sequential procedures which reduce the expected number of observations for parameter values in the middle of the range without increasing it much at the hypothesized values of the parameter.
Another difficulty with the SPRT is that in most cases its number of observations (which is random) is unbounded and has a positive probability of being greater than any given constant. Since it is not feasible to take an arbitrarily large number of observations, the SPRT is often truncated. A truncated SPRT with the same error probabilities may have a considerably increased expected sample size at the hypothesized values of the parameter.
As an alternative to the SPRT, Armitage (1957) proposed certain restricted SPRT’s, leading to closed boundaries, for testing hypotheses regarding the mean of a normal population. He converted the boundaries to the Wiener process; however, he only approximated the error probabilities and the expected time of the procedure based on the Wiener process. Donnelly (1957) proposed straight-line boundaries that meet, converted them to the Wiener process and obtained certain results. Anderson (1960) also considered a modified SPRT for testing hypotheses about the mean of a normal population with known variance, and derived approximations to the operating characteristic (or power) function and the average sample number. We now present Anderson’s procedure, which is similar to Armitage’s and Donnelly’s procedures. Anderson’s (1960) method consists of replacing the parallel straight-line boundaries of the SPRT by straight-line boundaries that converge as the sample size grows.
The method is easily applicable to the exponential family of distributions, because the SPRT can be based on a sum of i.i.d. random variables, each of which has a distribution belonging to the exponential family. Assume that observations are drawn sequentially from a normal distribution with mean θ and known variance σ². We wish to test H₀ : θ = θ₀ vs. H₁ : θ = θ₁ (θ₁ > θ₀) with a procedure which either minimizes E_θ(N) at θ = (θ₀ + θ₁)/2 or (alternatively) minimizes the maximum of E_θ(N). Replacing the observation X by the transformed observation [X − (θ₀ + θ₁)/2]/σ and calling θ* = (θ₁ − θ₀)/2σ, the hypotheses become H₀ : θ = −θ* and H₁ : θ = θ* (θ* > 0), where sampling is now from a normal population having mean θ and variance 1.

Restricted SPRT Procedure: Let c₁ > 0 > c₂. Take (transformed) observations X₁, X₂, … sequentially. At the nth stage: reject H₀ if

Σ_{i=1}^{n} Xᵢ ≥ c₁ + d₁n,

accept H₀ if

Σ_{i=1}^{n} Xᵢ ≤ c₂ + d₂n,

and take one more observation if neither of the above occurs. If n₀ observations are drawn without reaching a decision, stop sampling, and reject H₀ if

Σ_{i=1}^{n₀} Xᵢ ≥ k,

and accept H₀ if

Σ_{i=1}^{n₀} Xᵢ < k.

To avoid intersection of the lines before the truncation point, one requires

c₂ + d₂(n₀ − 1) < c₁ + d₁(n₀ − 1).

Also, since we wish the lines to converge, we require d₁ < 0 < d₂. Because of the symmetry of the hypotheses about θ = 0, consider the case where the error probabilities are equal. Since the problem is symmetric, it is then reasonable to consider only symmetric procedures, that is, procedures with c₁ = −c₂ = c, −d₁ = d₂ = d and k = 0. To calculate the probabilities and expected values that are of interest is complicated. However, one can calculate such quantities if

Σ_{i=1}^{n} Xᵢ

is replaced by an analogous X(t) (0 ≤ t < ∞), the Wiener process with E[X(t)] = θt and var[X(t)] = t. Anderson (1960) derived expressions for the probability of rejecting H₀ as a function of θ, and for the expected length of time. That is, he proposed to obtain a specified significance level at −θ* and a specified power at θ*, with some minimization of the expected time. The OC and expected time so obtained are approximations to the OC and expected sample number when observations are taken discretely. One might hope that the approximations are as good as the corresponding quantities for the SPRT, which is the special case with d₁ = d₂ = 0 and n₀ = ∞, T = ∞, where T is the truncation point of the Wiener process considered here.

Anderson (1960) obtains the probabilities and expected times as infinite series of terms involving Mill’s ratio. Subject to the condition that the error probabilities are the same, the constants c and d are varied so as to obtain the smallest expected observation time at θ = 0. The line x = c − dt has intercept c at t = 0 and c − dT at t = T. When c − dT = 0, the two lines converge to a point. For each of several values of the ratio of these two intercepts [(c − dT)/c = 0, 0.1, 0.2], Table 2.10.1 gives the c and T (and hence d) that approximately minimize the expected observation time at θ = 0.

Table 2.10.1  α = β = .05 (.01), θ = −.1 and .1

Condition      c             T               Expected time    Expected time
                                             at θ = 0         at θ = −.1, .1
Fixed size     —             270.6 (541.2)   270.6 (541.2)    270.6 (541.2)
SPRT           14.7 (23.0)   ∞ (∞)           216.7 (528.0)    132.5 (225.0)
c − dT = 0     19.9 (35.5)   600.3 (870.3)   192.2 (402.2)    139.2 (249.4)
c − dT = .1c   20.1 (35.5)   529.0 (783.2)   192.2 (402.2)    139.3 (249.4)
c − dT = .2c   20.3 (35.5)   457.1 (700.0)   192.2 (402.8)    139.8 (249.8)

In Table 2.10.1 the values inside parentheses correspond to error probabilities α = β = .01. These computations suggest that the convergent-line procedures show a considerable improvement over the SPRT at θ = 0 with a moderate decrease in efficiency at θ = −.1 and θ = .1. When the error probabilities are each 0.05, the expected time at θ = 0 is 24.5 less than for the SPRT and 6.7 more at θ = ±0.1 (a ratio of 3.7 to 1); at the 1% levels it is 125.7 less at θ = 0 and 24.2 more at θ = ±0.1 (a ratio of 5.2 to 1). Thus, operating at the 5% levels, we are better off with the modified SPRT if intermediate values of θ occur at least 1/4 of the time, and at the 1% levels if intermediate values occur at least 1/6 of the time. The difference in the expected times at θ = 0 when the ratios of the intercepts are 0 and 0.1 is not significant, because in the latter case the probability of reaching a decision at t = T is almost zero.
Bartlett (1946) obtained the probability of absorption before time n₀ (which is approximately equal to the probability of crossing the upper boundary with not more than n₀ transformed observations):

P₀(θ, n₀) = 1 − Φ[(c − (θ − d)n₀)/√n₀] + exp[2c(θ − d)] Φ[(−c − (θ − d)n₀)/√n₀],     (2.10.1)

with P₀(−θ*, n₀) = α, where Φ denotes the standard normal distribution function and P₀ denotes probability under H₀. Armitage (1957) suggested using

c = Δ^{−1} ln[(1 − α)/α],  d = Δ/2,     (2.10.2)

where Δ is the solution of the equation:

2 2

Let p(θ, d, c) denote the probability of accepting H₀ (for the transformed observations). One may think of p(θ, d, c) for θ ≥ 0 as the probability of an incorrect decision. Analogously, let q(θ, d, c) denote the probability that the continuous Brownian motion X(t) on [0, ∞) exits I(t) : (−c + dt, c − dt) through the lower boundary. The quantity q(θ, d, c) is an approximation to p(θ, d, c); Anderson (1960, p. 170) remarks that it is actually an upper bound for p(θ, d, c) when θ ≥ 0. Fabian (1974) derived a simple and explicit expression for q(θ, d, c) when θ and d are such that θ/d is an even integer. Interpolation methods can be used for other values of θ/d. Also, one can choose d in some optimal sense. In general, when testing that the normal mean is θ = −θ* vs. θ = θ* (θ* > 0) at a prescribed level, the asymptotically optimal value of d (that d which makes the expected sample size at θ = 0 minimal) is d = θ*/2 (which is easy to see by using the strong law of large numbers). We now present the results of Fabian (1974) and Lawing and David (1966).

Theorem 2.10.1 We have

(2.10.3) and

(2.10.4)

where

(2.10.5)

and δᵢⱼ is Kronecker’s delta.

Proof. See Theorem 2.2 of Fabian (1974). ∎

Often, with a preassigned γ and given θ, we wish to determine d and c so that

q(θ, d, c) = γ.     (2.10.6)

If we also specify d, then the value of c satisfying (2.10.6) is uniquely determined and is given by

c = ln Ψ^{−1} / [2(θ − d)],     (2.10.7)

where Ψ is the solution of q(θ, d, c) = γ, with q(θ, d, c) given by Equation (2.10.4). (Notice that Paulson’s (1964) bound, q(θ, d, c) ≤ exp[−2c(θ − d)], with Ψ = γ yields the c given by (2.10.7).) Fabian (1974) has computed the values of Ψ for given values of γ and θ/d, and these are given in Table 2.10.2.

Table 2.10.2  Values of Ψ for which q(θ, d, c) = γ

θ/d     γ = .1    .05       .01       .005      .001
2       .2        .1        .02       .010      .002
4       .13443    .06237    .01126    .00548    .00105
6       .12367    .05742    .01055    .00518    .00101
8       .11957    .05571    .01035    .00511    .00101
10      .11745    .05487    .01027    .00508    .00100
12      .11617    .05439    .01023    .00506    .00100
∞       .11111    .05263    .01010    .00502    .00100
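A small simulation of the symmetric restricted procedure with the c − dT = 0 constants of Table 2.10.1 illustrates the section. This is a sketch under our own discretization (one N(θ, 1) observation per unit of time), so it only roughly reproduces the Wiener-process values:

```python
import random

def restricted_sprt(theta, c=19.9, T=600.3):
    """One path of the symmetric restricted SPRT with converging straight-line
    boundaries +-(c - d*n), d = c/T (the c - dT = 0 row of Table 2.10.1),
    taking one N(theta, 1) observation per unit of time.
    Returns (stopping time n, True if H0 is rejected)."""
    d = c / T
    s, n = 0.0, 0
    while True:
        n += 1
        s += random.gauss(theta, 1.0)
        if s >= c - d * n:        # upper line crossed: reject H0
            return n, True
        if s <= -(c - d * n):     # lower line crossed: accept H0
            return n, False

random.seed(7)
t0 = sum(restricted_sprt(0.0)[0] for _ in range(400)) / 400
err = sum(restricted_sprt(-0.1)[1] for _ in range(400)) / 400
print(round(t0, 1), round(err, 3))  # compare 192.2 and alpha = .05 in Table 2.10.1
```

Because the lines meet at t = T, every path terminates by the truncation point, in contrast to the open-ended SPRT.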

2.11 Large-Sample Properties of the SPRT

In this section we consider the asymptotic behavior of the error rates and the ASN of Wald’s SPRT, which were studied by Berk (1973). Assume that X, X₁, … are i.i.d. with common pdf fᵢ under hypothesis Hᵢ, i = 0, 1. Wald’s SPRT of H₀ vs. H₁ uses the stopping time

N = inf{n : Sₙ ∉ (b, a)},     (2.11.1)

where Sₙ = Σ_{j=1}^{n} Zⱼ, Zⱼ = ln[f₁(Xⱼ)/f₀(Xⱼ)], and b and a are two numbers. (Assume that Z is finite with probability one (wp1).) The error rates are denoted by α and β, i.e. α = P₀(S_N ≥ a) and β = P₁(S_N ≤ b). Sometimes (α, β) is called the strength of the test. Wald’s (1947) inequalities for (α, β) (see Theorem 2.2.2, p. 12) may be written as

Note that these inequalities are general and do not depend on the common distribution of the Xᵢ. Further, it was shown in Section 2.4 that if X, X₁, … are i.i.d. and P(Z = 0) < 1, then E{exp(tN)} < ∞ for some t > 0. (Here P and E refer to the true distribution of X, which need not be either f₀ or f₁.) In particular, EN < ∞. We assume throughout this section that P(Z = 0) < 1.
Suppose that c = min(−b, a) → ∞, and write lim_c for lim_{c→∞}. Then lim_c α = lim_c β = 0 and, wp1, lim_c N = ∞; consequently lim_c E(N) = ∞. The following theorem states precisely the asymptotic behavior of N and E(N).

Theorem 2.11.1 (Berk, 1973, and Govindarajulu, 1968). Suppose that X, X₁, … are i.i.d. with finite μ = E(Z). Then if μ > 0, wp1

(i) lim_c I{S_N ≥ a} = lim_c P(S_N ≥ a) = 1, and
(ii) lim_c N/a = lim_c E(N)/a = 1/μ.

If μ < 0, then wp1

(i′) lim_c I{S_N ≤ b} = lim_c P(S_N ≤ b) = 1, and
(ii′) lim_c N/b = lim_c E(N)/b = 1/μ.

Lemma 2.11.1 For μ > 0, let

τ = τ(c) = first n ≥ 1 such that Sₙ > cg(n), and τ = ∞ if no such n exists.

Let λ = λ(c) be the solution of the equation

μλ = cg(λ).

Assume that g(n) = o(n) and that λ(c) is unique for sufficiently large c. Also, for some δ ∈ (0, 1) and L slowly varying (that is, lim_{t→∞}[L(xt)/L(t)] = 1 for every x ∈ (0, ∞)), we assume that g(n) ~ n^δ L(n). Then

lim_{c→∞} λ^{−1} E(τ) = 1.

Proof of Theorem 2.11.1. Consider the case μ > 0. Since limₙ Sₙ/n = μ wp1, limₙ Sₙ = ∞ wp1. Thus S* = minₙ Sₙ is finite wp1. We then have

I{S_N ≤ b} ≤ I{S* ≤ b} → 0 wp1 as c → ∞.

Thus lim_c I{S_N ≥ a} = 1 and, by the dominated convergence theorem, lim_c P(S_N ≥ a) = 1. Since wp1 lim_c N = ∞, lim_c S_N/N = μ wp1. By the definition of N,

S_{N−1} I{S_N ≥ a} < a I{S_N ≥ a} ≤ S_N I{S_N ≥ a}.

On dividing throughout by N and letting c → ∞, the extreme terms both approach μ wp1; thus wp1 lim_c a/N = μ, or lim_c N/a = 1/μ. By Fatou’s lemma, lim inf_c E(N)/a ≥ 1/μ. Now let τ = inf{n : Sₙ ≥ a}. Clearly N ≤ τ. From Lemma 2.11.1 it follows that lim_c E(τ)/a = 1/μ. Hence we have lim sup_c E(N)/a ≤ 1/μ. This completes the proof for the case μ > 0; the proof for μ < 0 is analogous. ∎

Theorem 2.11.1 shows that Wald’s approximation to the ASN is asymptotically correct. This approximation (see Wald, 1947, p. 53) applies when 0 < |μ| < ∞ and may be written as

E(N) = [bP(S_N ≤ b) + aP(S_N ≥ a)]/μ.     (2.11.3)

According to Theorem 2.11.1, the ratio of the two sides of (2.11.3) approaches one as c → ∞. From Wald's inequalities (2.11.2) one can obtain the cruder inequalities

α ≤ exp(−a),  β ≤ exp(b).     (2.11.4)

The next theorem shows that, asymptotically, the inequalities in (2.11.4) are in some sense equalities.

Theorem 2.11.2 (Berk, 1973). Let X₁, X₂, … be i.i.d. and Eᵢ|Z| < ∞ for i = 0, 1. Then

lim_c a^{−1} ln(1/α) = 1 = lim_c (−b)^{−1} ln(1/β).

Proof. See Berk (1973) or Govindarajulu (1987, pp. 123-124).

Remark 2.11.1 This result also shows that Wald's approximations for the error rates [obtained by treating the relations in (2.11.2) as equalities and solving for (α, β)] are asymptotically correct in the sense of the theorem.
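Theorem 2.11.2 can be illustrated by simulation: for the normal mean test with Z = X − 1/2 under H₀ and symmetric bounds (−a, a), the ratio a⁻¹ ln(1/α̂) drifts toward 1 as a grows. A rough Monte Carlo sketch of our own (the bounds and replication counts are arbitrary choices):

```python
import math, random

def sprt_error_rate(a, reps=20000):
    """Monte Carlo type-I error of the SPRT with symmetric bounds (-a, a)
    for the normal mean test H0: theta = 0 vs. H1: theta = 1 (Z = X - 1/2)."""
    rej = 0
    for _ in range(reps):
        s = 0.0
        while -a < s < a:
            s += random.gauss(0.0, 1.0) - 0.5
        rej += s >= a
    return rej / reps

random.seed(3)
rates = {a: sprt_error_rate(a) for a in (2.0, 4.0, 6.0)}
for a, alpha_hat in rates.items():
    # Theorem 2.11.2: a^{-1} ln(1/alpha) tends to 1 as the bounds widen
    print(a, round(alpha_hat, 4), round(math.log(1.0 / alpha_hat) / a, 2))
```

The simulated error rates fall below the crude bound exp(−a) of (2.11.4), as expected, while the logarithmic ratio approaches one.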

An approximation for the power curve of the SPRT is obtained (see Sec. 2.3) under the additional assumption that, for some (necessarily unique) real number h ≠ 0, E{exp(hZ)} = 1. In the above notation the approximation to the power may be written as

Theorem 2.11.3, together with Wald's method of considering the auxiliary SPRT generated by S̃ₙ = hSₙ with stopping boundaries

(b′, a′) = (hb, ha) if h > 0,  (b′, a′) = (ha, hb) if h < 0,

establishes the following corollary.

Corollary 2.11.3.1 Let X, X₁, X₂, … be i.i.d. with Eᵢ|Z| < ∞ and (for some h ≠ 0) E(exp hZ) = 1. Then

lim_c (−ha)^{−1} ln P(S_N ≥ a) = 1 if h > 0,  lim_c (−hb)^{−1} ln P(S_N ≤ b) = 1 if h < 0.

Proof. The power of the SPRT is equal to the probability of type I error for the auxiliary SPRT. That is,

P(S_N ≥ a) ≈ e^{−ha}  (h > 0).

Similarly, the power of the SPRT = 1 − P(S_N ≤ b) ≈ 1 − e^{−hb} (h < 0), after using Theorem 2.11.2, since

P(S_N ≤ b) ≈ (1 − e^{ha})/(e^{hb} − e^{ha}) ≈ e^{−hb}  when h < 0 and c is large.

When h = 0, the limiting power is -b/(a - b).

Example 2.11.1 Let X be normally distributed with mean θ and variance 1. We wish to test H₀ : θ = 0 versus H₁ : θ = 1. Let α = β = 0.01. Then

a = −b = ln 99,  Z = X − 1/2,  E(Z) = θ − 1/2 = μ,

and hence h = 1 − 2θ. Then

E_θ(N) ≈ ln 99/(θ − 1/2)  if θ > 1/2,  and  E_θ(N) ≈ −ln 99/(θ − 1/2)  if θ < 1/2.

The power at θ is P_θ(S_N ≥ a) ≈ exp(−ha) = 99^{−h}, where h = 1 − 2θ.
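The asymptotic expressions of Example 2.11.1 are simple to evaluate. The helper below (our own naming) returns the approximate ASN and power at a given θ, using E(N) ≈ ln 99/|θ − 1/2| away from the indifference point and the Wald power approximation (1 − 99^{−h})/(99^h − 99^{−h}) with h = 1 − 2θ:

```python
import math

A = math.log(99.0)   # a = -b = ln 99 for alpha = beta = .01

def asn_power(theta):
    """Asymptotic ASN and power for Example 2.11.1 (normal mean,
    theta0 = 0, theta1 = 1, alpha = beta = .01, Z = X - 1/2, h = 1 - 2*theta)."""
    mu = theta - 0.5
    h = 1.0 - 2.0 * theta
    if h == 0.0:                 # theta = 1/2: limiting power is -b/(a-b) = 1/2
        return A * A, 0.5        # E(N) ~ a^2/var(Z) when mu = 0 (var(Z) = 1)
    power = (1.0 - 99.0 ** (-h)) / (99.0 ** h - 99.0 ** (-h))
    return A / abs(mu), power

print(asn_power(0.75))   # ASN about 18.4, power about 0.91
```

At the hypothesized values the formulas recover the nominal strengths: the power is about 0.01 at θ = 0 and about 0.99 at θ = 1.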

2.12 Problems

2.1-1 Let the random variable X have the density function

We wish to test H₀ : θ = 1 versus H₁ : θ = 2. Construct Wald’s sequential probability ratio test.

2.1-2 Let X have density function θ exp(−θx), x > 0. We wish to test H₀ : θ = 2 versus H₁ : θ = 1 with α = .05 and β = .10. Construct Wald’s SPRT.

2.1-3 Let .7, .8, .9, .9, .85 denote the squares of observations drawn randomly from the density (0 < x < 1). We wish to test H₀ : θ = 1 versus H₁ : θ = 3 with α = β = .10. Carry out the SPRT for the above data and see whether you accept H₀ or H₁.

2.1-4 Let X be distributed uniformly on [θ, θ + 2]. We wish to test H₀ : θ = 0 vs. H₁ : θ = 1. Let 1.2, 1.5, 1.8, 1.4, .9 denote a random sample drawn sequentially from the above density. Carry out Wald’s SPRT.

2.1-5 Let X be a non-negative random variable having the probability function

f(x; θ) = 1 − θa(1 − a)^{−1}  for x = 0,
f(x; θ) = θaˣ  for x = 1, 2, …,

where a is a known constant (1/2 < a < 1). We wish to test H₀ : θ = θ₀ versus H₁ : θ = θ₁ (0 < θ₀ < θ₁ < (1 − a)/a). Construct Wald’s SPRT using α = β = .05. [Hint: Let m denote the number of zero observations in a random sample of size n and, without loss of generality, set x₁ = x₂ = ⋯ = x_m = 0, m ≤ n.]

2.1-6 Let the common distribution of X₁, X₂, … have a density

Construct the sequential probability ratio test of the hypothesis H₀ : θ = −1/2 versus H₁ : θ = 1/2.

2.3-1 Obtain the relation between θ and h in the SPRT for θ₀ = 0 vs. θ₁ = 1 in a normal population with unit variance. Plot the OC function of the test with α = β = .01.

2.3-2 Show that, in testing θ = θ₀ vs. θ = θ₁ in a Poisson population, the relation between θ and h is

θ = h(θ₁ − θ₀) / [(θ₁/θ₀)^h − 1].

2.3-3 Obtain the graph of OC(θ) in the SPRT for testing the density θ₀e^{−θ₀x} vs. θ₁e^{−θ₁x} (x > 0) using θ₀ = 2, θ₁ = 1, α = .05, and β = .01.

2.3-4 Let X be normally distributed with mean θ and known variance σ². We are interested in testing H₀ : θ = θ₀ vs. H₁ : θ = θ₁. Show that

h(θ) = (θ₁ + θ₀ − 2θ)/(θ₁ − θ₀).

2.3-5 Show that for Wald's SPRT, B = 1/A when α = β.

2.3-6 Obtain the relation between θ and h in the SPRT for testing H₀ : θ = θ₀ vs. H₁ : θ = θ₁ for the exponential family given by

f(x; θ) = B(θ) exp[θR(x)] h(x).

[Hint]: Note that

1 = B(θ) ∫ exp[θR(x)] h(x) dx,

1 = E_θ[e^{hZ}] = [B(θ₁)/B(θ₀)]^h B(θ) ∫ exp[(θ + h(θ₁ − θ₀))R(x)] h(x) dx.

2.4-1 Let f(x; a) = aˣ(1 − a), x = 0, 1, …, 0 < a < 1. We wish to test H₀ : a = a₀ vs. H₁ : a = a₁. Tabulate some values of the OC function with a₀ = 1/2, a₁ = 3/4, α = β = .05. Obtain an expression for the OC function. Also find E_a(N).

2.4-2 Let {Xᵢ} be independent and identically distributed according to the Pareto density θaᶿ/x^{θ+1} for x ≥ a. Here a is known and we wish to test H₀ : θ = θ₀ vs. H₁ : θ = θ₁ (0 < θ₀ < θ₁). Construct the SPRT and obtain its ASN and OC(θ).

2.4-3 Let X have the probability density function

We wish to test H₀ : θ = 1 vs. H₁ : θ = 2. Construct Wald's SPRT and find its ASN and OC(θ).

2.4-4 Find the ASN curve for testing H₀ : f(x; θ₀) = θ₀e^{−xθ₀} vs. H₁ : f(x; θ₁) = θ₁e^{−xθ₁} (x > 0) using θ₀ = 2, θ₁ = 1, α = .05 and β = .10. Notice that for one of the five points used, E(N) = 0. In view of the fact that N ≥ 1, this result must be wrong. Explain.

2.4-5 Find the ASN of the SPRT for testing θ = θ₀ vs. θ = θ₁ in a normal population with unit variance (use α = β = .01).

2.4-6 Let Z₁, Z₂, … be an i.i.d. sequence of random variables and let Wᵢ = |Zᵢ| (i = 1, 2, …) and S_N = Σ_{i=1}^{N} Zᵢ, where N is any stopping variable such that the event {N ≥ i} is independent of Zᵢ, Z_{i+1}, …. Show that E(Wᵢᵏ) ≤ δₖ < ∞ (i = 1, 2, …) and E(Nᵏ) < ∞ imply E(|S_N|ᵏ) < ∞. [Hint: Ericson (1966, Theorem 1).]

2.5-1 Let the common pdf of X₁, X₂, … be (1/2)(1 − θ²) exp(−|x| + θx) for |θ| < 1, −∞ < x < ∞. Evaluate E(N) for the SPRT of H₀ : θ = −1/2 vs. H₁ : θ = 1/2 with α = β = .01.

2.5-2 For the pdf considered in Example 2.5.1, derive the exact upper and lower bounds for the OC and ASN functions.

2.5-3 Let X take on the values −1, 0, 2 with probabilities θ₁, 1 − θ₁ − θ₂, and θ₂, respectively. We wish to test

against H₁ : θ₁ = θ₂ = 1/6.

Using α = β = 0.05, find the exact values of OC(θ) and E(N), where θ = θ₁ + θ₂.

2.5-4 Let X take on the values −2, −1, 1, 2 with probabilities θ₁, θ₂, 1 − 2θ₁ − θ₂, and θ₁, respectively. We wish to test

against H₁ : θ₁ = θ₂ = 1/6.

Using α = β = 0.05, find the exact values of OC(θ) and E(N), where θ = (θ₁, θ₂). [Hint: Z(X) takes on the values ln 2, 0, −ln 2 with probabilities θ₂, 1 − 2θ₁ − θ₂, and 2θ₁, respectively.]

2.8-1 Let f(x; θ) = θˣ(1 − θ)^{1−x}, x = 0, 1 and 0 < θ < 1. We wish to test H₀ : θ = 0.5 versus H₁ : θ = 0.75. Using α = β = .05 and n₀ = 25, compute the bounds for the error probabilities when Wald's truncated SPRT is employed.

2.8-2 In Problem 2.8-1 show that, for the SPRT truncated at n = n₀ = 5, α(n₀) ≤ 0.2281 and β(n₀) ≤ 0.3979.

2.8-3 Let f(x; θ) = e^{−θ}θˣ/x!, x = 0, 1, … and θ > 0. We wish to test H₀ : θ = 1 versus H₁ : θ = 2. Using α = β = .05 and n₀ = 25, evaluate the bounds for the error probabilities of the truncated SPRT.

2.8-4 Let f(x; θ) = θ(1 − θ)ˣ, x = 0, 1, … and 0 < θ < 1. We wish to test H₀ : θ = 0.5 versus H₁ : θ = 0.75. Using α = β = .05 and n₀ = 30, compute the bounds for the error probabilities of the truncated SPRT.

2.8-5 Let X be distributed as normal (θ, 1). We wish to test H₀ : θ = −0.5 versus H₁ : θ = 0.5 with α = β = .05. Using n₀ = 25, find upper bounds on the effective error probabilities when Wald's truncated SPRT is used.

2.8-6 Let X be Bernoulli with θ denoting the probability of a success. Suppose we wish to test H₀ : θ = 0.5 versus H₁ : θ = 0.9 with α = β = .05. Using n₀ = 25, find upper bounds on the effective error probabilities when Wald's truncated SPRT is used.

2.10.1 Let θ denote the probability of obtaining a head in a single toss of a coin. Suppose we wish to test H₀ : θ = 1/2 vs. H₁ : θ = 3/4. Can you obtain a restricted SPRT for this binomial problem? [Hint: The binomial tends to the normal distribution when suitably standardized. Also see Armitage (1957).]

2.11.1 Let X be a normal (θ, 1) variable. We wish to test H₀ : θ = −1 versus H₁ : θ = 1. Using α = β = .01, find the asymptotic expressions for E(N|Hᵢ), i = 0, 1, and the power function.

2.11.2 Let X have the logistic distribution function

We wish to test H₀ : θ = −0.5 versus H₁ : θ = 0.5. Using α = β = .01, find the asymptotic expressions for E(N|Hᵢ), i = 0, 1, and the power function.

2.11.3 Let the random variable X have the density function

f(x; σ) = σ²x e^{−xσ},  x, σ > 0, and 0 elsewhere.

We wish to test H₀ : σ = 1 versus H₁ : σ = 2. Using α = β = .01, find the asymptotic expressions for E(N|Hᵢ), i = 0, 1, and the power function.

Chapter 3

Tests for Composite Hypotheses

3.1 Method of Weight Functions

In Chapter 2 we considered SPRT’s for testing a simple hypothesis against a simple alternative. However, in practical situations the simple null hypothesis is only a representative of a set of hypotheses, and the same can be said about the simple alternative. Thus we are faced with the problem of testing a composite hypothesis against a composite alternative. The compositeness of the hypotheses can arise in two situations: (i) the composite hypotheses concern the parameters of interest and there are no nuisance parameters, or (ii) the hypotheses may be simple or composite, but one or more nuisance parameters are present.
Let f(x; θ) denote the probability function (or probability density function) of X, indexed by the unknown parameter θ (which may be vector-valued). In general, we wish to test the composite hypothesis H₀ : θ ∈ ω₀ against the composite alternative H₁ : θ ∈ ω₁. Let S₁ denote the boundary of ω₁. Wald (1947) proposed a method of “weight functions” (prior distributions) as a means of constructing an optimum SPRT. Assume that it is possible to construct two weight functions g₀(θ) and g₁(θ) such that

∫_{ω₀} g₀(θ) dθ = 1,  ∫_{S₁} g₁(θ) dS = 1,     (3.1.1)

where dS denotes the infinitesimal surface element. Then the SPRT is based on the ratio

λₙ = ∫_{S₁} g₁(θ) Π_{i=1}^{n} f(xᵢ; θ) dS / ∫_{ω₀} g₀(θ) Π_{i=1}^{n} f(xᵢ; θ) dθ,     (3.1.2)

and satisfies the conditions:


(i) the probability of type I error, α(θ), is constant on ω₀;
(ii) the probability of type II error, β(θ), is constant over S₁; and
(iii) for any point θ in the interior of ω₁ the value of β(θ) does not exceed its constant value on S₁.

Wald (1947, Section A.9) claims that the weight functions gᵢ(θ) (i = 0, 1) are optimal in the sense that, for any other weight functions h₀(θ) and h₁(θ), the associated error probabilities α*(θ), β*(θ) satisfying (as good approximations)

(3.1.3)

are such that:

max_{θ∈ω₀} α*(θ) ≥ (1 − B)/(A − B) = max_{θ∈ω₀} α(θ)     (3.1.4)

and

max_{θ∈ω₁} β*(θ) ≥ B(A − 1)/(A − B) = max_{θ∈ω₁} β(θ).     (3.1.5)

3.1.1 Applications of the Method of Weight Functions

(a) Sequential Binomial Test

Let X take the value 1 or 0 with probability θ and 1 − θ, respectively. We wish to test H₀ : θ = 1/2 against the two-sided alternative H₁ : |θ − 1/2| ≥ δ > 0. So let g₀(1/2) = 1 and g₀(θ) = 0 for θ ≠ 1/2, and g₁(θ₁) = g₁(1 − θ₁) = 1/2 and g₁(θ) = 0 otherwise, where θ₁ = 1/2 + δ. Then, if m denotes the number of positive observations in a total sample of size n, the continuation region of the SPRT is given by

B < 2^{n−1}[θ₁^m (1 − θ₁)^{n−m} + (1 − θ₁)^m θ₁^{n−m}] < A.     (3.1.6)

Note that the sequential binomial test given by (3.1.6) is not optimal in the sense of (3.1.5), since it may not satisfy (iii).
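The continuation region (3.1.6) is easy to run on data. A minimal sketch (our own function names; δ = 1/4, so θ₁ = 3/4, with Wald's bounds A = (1 − β)/α and B = β/(1 − α)):

```python
def weight_ratio(m, n, delta=0.25):
    """Weighted likelihood ratio of (3.1.6) for H0: theta = 1/2 against the
    two-sided H1: |theta - 1/2| >= delta, with theta1 = 1/2 + delta and the
    symmetric two-point weight g1(theta1) = g1(1 - theta1) = 1/2."""
    t1 = 0.5 + delta
    return 2.0 ** (n - 1) * (t1 ** m * (1 - t1) ** (n - m)
                             + (1 - t1) ** m * t1 ** (n - m))

def binomial_weight_sprt(xs, alpha=0.05, beta=0.05, delta=0.25):
    """Run the sequential test on a 0/1 sequence; returns (n, decision)."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    m = 0
    for n, x in enumerate(xs, 1):
        m += x
        lam = weight_ratio(m, n, delta)
        if lam >= A:
            return n, "reject H0"
        if lam <= B:
            return n, "accept H0"
    return len(xs), "continue sampling"

print(binomial_weight_sprt([1] * 30))       # a long run of successes rejects H0
print(binomial_weight_sprt([1, 0] * 15))    # balanced data eventually accepts H0
```

Because the weight is symmetric about 1/2, a long run of failures triggers rejection just as quickly as a long run of successes.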

(b) Sequential Chi-square Test

Let the Xᵢ be independent and normally distributed with unknown mean θ and variance σ². We wish to test H₀ : σ = σ₀ against H₁ : σ = σ₁ > σ₀. Here choose g(θ) = 1/2c for −c ≤ θ ≤ c, and zero otherwise. The ratio of the modified likelihoods under H₁ and H₀ tends to (as c → ∞)

(3.1.7)

It should be noted that the ratio in (3.1.7) indicates that the problem of testing H₀ against H₁ with θ as a nuisance parameter is equivalent to the problem of testing a simple hypothesis about σ against a simple alternative, with known mean zero, from a sample of size n − 1. This can be established using Helmert’s transformation:

Yⱼ = [X₁ + ⋯ + X_{j−1} − (j − 1)Xⱼ] / [j(j − 1)]^{1/2},  j = 2, 3, …, n.     (3.1.8)

Hence

Thus the properties of the SPRT based on the ratio (3.1.7) can be studied via the properties of Wald’s SPRT considered in Chapter 2, and it is optimal in the sense of (3.1.5) (apply Theorem 2.9.1).
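Helmert's transformation (3.1.8) is easy to verify numerically: the Y's carry all of the mean-corrected variation, since Σⱼ Yⱼ² = Σᵢ(Xᵢ − X̄)². A small sketch:

```python
import math

def helmert(xs):
    """Helmert transformation (3.1.8): from X1..Xn produce Y2..Yn, which are
    i.i.d. N(0, sigma^2) when the X's are i.i.d. N(theta, sigma^2) and are
    free of the nuisance mean theta."""
    ys = []
    for j in range(2, len(xs) + 1):
        num = sum(xs[: j - 1]) - (j - 1) * xs[j - 1]
        ys.append(num / math.sqrt(j * (j - 1)))
    return ys

xs = [2.0, 4.0, 1.0, 5.0]
ys = helmert(xs)
xbar = sum(xs) / len(xs)
# The sum of squared Y's equals the mean-corrected sum of squares of the X's.
print(round(sum(y * y for y in ys), 6),
      round(sum((x - xbar) ** 2 for x in xs), 6))
```

This identity is exactly why the nuisance mean drops out and the sequential chi-square test can be run on the n − 1 transformed observations.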

3.2 Sequential t and t2 Tests

Here we describe a practical situation in which the following hypothesis-testing problem naturally arises. Given a random variable X and a given number M, we are often interested in knowing whether P(X < M) is equal to p or p′, where p and p′ are specified. For instance, X might be the tensile strength of a steel rod and M a lower limit below which the tensile strength should not fall; a rod is classified as defective if its tensile strength is less than M, and then P(X < M) would be the proportion of defective rods in a large batch. We might wish to know whether this proportion of defective rods is equal to a low value p or to a relatively high value p′. Since the tensile strength can reasonably be assumed to be normally distributed with mean μ and variance σ², P(X < M) = Φ[(M − μ)/σ]. Since we can shift the origin of measurements to M, we can, without loss of generality, set M = 0, and then P(X < 0) = Φ(−γ) where γ = μ/σ. If σ is known, one can easily set up a sequential test.
Let X be normally distributed with mean θ and unknown variance σ². We wish to test H₀ : θ = θ₀ vs. H₁ : |θ − θ₀| ≥ δσ, where δ > 0. Then the boundary S₁ consists of all points (θ, σ) for which |θ − θ₀| = δσ, i.e., it consists of two points for each fixed σ. Define g₀(θ, σ) = 1/c if 0 ≤ σ ≤ c, θ = θ₀ (and zero elsewhere), and g₁(θ, σ) = 1/2c if 0 ≤ σ ≤ c, θ = θ₀ ± δσ (and zero elsewhere). One can easily obtain

where

    S = Σ_{i=1}^{n} (X_i − θ0)² = (n − 1)s_n² + n(X̄_n − θ0)²

and X̄_n and s_n² (respectively) denote the sample mean and sample variance. By letting S/2σ² = v, one can show that the integral in the denominator is equal to 2^{(n−3)/2} × Γ[(n − 1)/2] S^{−(n−1)/2}. Also, by letting S/2σ² = v and (X̄_n − θ0)/S^{1/2} = T, we have

    ψ(T; δ, n) = lim_{c→∞} f_{1n}/f_{0n} = (e^{−nδ²/2} / Γ[(n − 1)/2]) ∫_0^∞ 2 v^{(n−3)/2} e^{−v} cosh(δT*√(2v)) dv,

where T* = nT. Thus, the limit of the modified likelihood ratio is a function of T only. Also, since ψ(−T) = ψ(T), it is a function of |T|. Furthermore, ψ(T) is a single-valued function of |X̄_n − θ0|/s_n. Now, since the joint distribution of {|X̄_n − θ0|/s_n, n = 2, 3, ...} depends only on |η| = |θ − θ0|/σ, it follows that (i) α(θ, σ) is constant on ω0 and (ii) β(θ, σ) is a function of |η| = |θ − θ0|/σ. Analogously, for the sequential t-test, by taking g1(θ, σ) = 1/c if 0 ≤ σ ≤ c and θ = θ0 + δσ (and zero elsewhere), we obtain the limit of the modified likelihood ratio to be

Thus the sequential procedures can be based on t_n, where t_n = √n(X̄_n − θ0)/s_n, with X̄_n denoting the sample mean and s_n the sample standard deviation based on n observations. That is, the sequential t (or t²) test of H0 : θ = θ0 vs. the alternative H1 : θ − θ0 ≥ δσ (|θ − θ0| ≥ δσ) can be described as follows: if the experiment has proceeded to the nth stage, the sampling continuation region is given by

where the constants B_n and A_n (B*_n and A*_n) are obtained by inverting the inequality B < ψ1(T; δ, n) < A [B < ψ(T; δ, n) < A] in terms of t_n [|t_n|]. David and Kruskal (1956) obtain an asymptotic expression for ψ1(T; δ, n) and, appealing to the asymptotic normality of T when suitably standardized, show that the sequential t-test terminates finitely with probability one.

Rushton (1950) has obtained an asymptotic approximation to ln ψ1. Let

where t_n denotes the Student t-statistic based on a random sample of size n. If the sampling continuation region for the sequential t-test is (approximately) of the form (3.2.3), then Rushton (1950) has obtained, for large n,

From a numerical study, Rushton (1950) concludes that one can use the approximation (3.2.4) with confidence, and that one should add the terms [4(n − 1)]⁻¹ and δ²T*²[24(n − 1)]⁻¹ only when one is about to reach a decision. Rushton's approximation to Wald's t²-test is

(3.2.5)

3.2.1 Uniform Asymptotic Expansion and Inversion for an Integral

Let

    L1 = (e^{−δ²n/2} / Γ[(n − 1)/2]) ∫_0^∞ x^{(n−3)/2} e^{−x + δT*√(2x)} dx

and a second, analogous equation defining L2, where T* = nT and δ, L1 and L2 are given positive numbers and n is large. We want to solve these equations for T* as a function of n, L_i and δ, that is, T* = T*(n, L_i, δ). We will show that the solution of the second equation is closely related to the solution of the first equation. Toward the solution of the first equation, let p² = 2n − 3 and pt√2 = δT*. Now we solve for t. Employing some uniform asymptotic expansions, Govindarajulu and Howard (1989) obtain, after writing

    t = t0 + t1 p⁻² + t2 p⁻⁴ + ⋯,   (3.2.6)

where t0 (t0 < 0) is the solution of the transcendental equation

t1 is given by

and

    t2 = t2(t0, t1) = {[(1 + t0²)^{1/2} − t0] t1² − t0 t1 + 2u1(t0)/(1 + t0²)} / {2[t0 − (1 + t0²)^{1/2}](1 + t0²)},

where u1(t) = −(t³ + 6t)/24. In the following we provide a table of values of t0 corresponding to various choices of δ.

Table 3.2.1 Negative Root −t0(δ)

     δ     −t0(δ)          δ     −t0(δ)
    0.1    2.59 × 10⁻³    1.1    0.26
    0.2    9.95 × 10⁻³    1.2    0.31
    0.3    2.22 × 10⁻²    1.3    0.35
    0.4    3.92 × 10⁻²    1.4    0.40
    0.5    6.06 × 10⁻²    1.5    0.45
    0.6    8.62 × 10⁻²    1.6    0.50
    0.7    0.116          1.7    0.55
    0.8    0.148          1.8    0.60
    0.9    0.184          1.9    0.65
    1.0    0.223          2.0    0.70

Quadratic interpolation in Table 3.2.1 will yield at least two significant figures of accuracy. If the sampling continuation region for the sequential test is given by

with the constants A_{n,1} and B_{n,1} given by

where t0 = t0(δ), t1 = t1(t0, δ) and t2 = t2(t0, t1), and A_{n,1} is given by the same formal expression as B_{n,1} except that A replaces B everywhere. For the sequential t²-test, if the continuation region is

    B_{n,2} < |T*| < A_{n,2},

where B_{n,2} = B_{n,1} except that 2B replaces B, and A_{n,2} = A_{n,1} except that 2A replaces A.

3.2.2 Barnard's Versions of Sequential t- and t²-tests

The test criteria are given by

    W1(T*, δ, n) = exp(−nδ²/2) { F(n/2, 1/2; δ²T*²/2)
                 + √2 (δT*) [Γ((n + 1)/2)/Γ(n/2)] F((n + 1)/2, 3/2; δ²T*²/2) }

and

    W2(T*, δ, n) = exp(−nδ²/2) F(n/2, 1/2; δ²T*²/2),

where F(a, c; x) denotes the confluent hypergeometric function. For the sequential t, Govindarajulu and Howard (1989) and Rushton (1952, p. 304) show that

    ψ1(T*, δ, n) = exp(−nδ²/2) { F((n − 1)/2, 1/2; δ²T*²/2)
                 + √2 (δT*) [Γ(n/2)/Γ((n − 1)/2)] F(n/2, 3/2; δ²T*²/2) }

and

    ψ2(T*, δ, n) = exp(−nδ²/2) F((n − 1)/2, 1/2; δ²T*²/2).

That is, Barnard's criteria use the parameter n in the first argument of the F function and in the gamma functions, whereas Wald's criteria use the parameter n − 1 in the same places.
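Both families of criteria can be evaluated with a standard confluent-hypergeometric routine. The sketch below assumes the parameterization just described, in which Wald's criterion uses (n − 1)/2 and Barnard's uses n/2 in the F and gamma arguments; function names are mine, not from the text.

```python
import numpy as np
from scipy.special import hyp1f1, gammaln

def wald_t_criterion(T_star, delta, n):
    """Wald's sequential-t criterion psi1(T*, delta, n), with z = delta^2 T*^2 / 2:
    exp(-n delta^2/2) [ F((n-1)/2, 1/2; z)
        + sqrt(2) delta T* Gamma(n/2)/Gamma((n-1)/2) F(n/2, 3/2; z) ]."""
    z = 0.5 * (delta * T_star) ** 2
    g = np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))  # stable gamma ratio
    return np.exp(-n * delta ** 2 / 2) * (
        hyp1f1((n - 1) / 2, 0.5, z)
        + np.sqrt(2) * delta * T_star * g * hyp1f1(n / 2, 1.5, z))

def barnard_t_criterion(T_star, delta, n):
    """Barnard's version: n replaces n - 1 in the F and gamma arguments,
    while the factor exp(-n delta^2/2) is unchanged."""
    return np.exp(delta ** 2 / 2) * wald_t_criterion(T_star, delta, n + 1)

# continue sampling while B < criterion < A
print(wald_t_criterion(0.0, 0.5, 10), np.exp(-10 * 0.5 ** 2 / 2))  # equal at T* = 0
```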

3.2.3 Simulation Studies

Govindarajulu and Howard (1989) carry out simulation studies in order to compare their approximations and Rushton's (1950, 1952) approximations for the sequential t- and t²-tests using Wald's versions of the test criteria. The following results were obtained, based on 500 replications with α = β = 0.05.

Table 3.2.2 Mean Stopping Times

    Test     δ      R        G&H      d = R − G&H   S.E. of d
    t-test   0.25   100.11    97.63    2.48          0.78
             0.50    28.18    27.44    0.74          0.24
             1.0     10.55    10.16    0.39          0.06
             1.50     7.82     7.06    0.77          0.06
             1.75     7.53     6.42    1.11          0.07
    t²-test  0.25   114.86   115.03   −0.17          0.04
             0.50    32.62    33.02   −0.40          0.10
             1.0     11.59    11.80   −0.21          0.03
             1.5      8.35     8.26    0.09          0.06
             1.75     7.85     7.37    0.48          0.04

On the basis of these simulation studies, we note that the average stopping time based on Govindarajulu and Howard's approximation for the sequential t-test is consistently smaller than the one based on Rushton's (1950) approximation. In the case of the t²-test, Rushton's (1952) approximation is slightly better than Govindarajulu and Howard's for small δ (δ ≤ 1), while the latter is slightly better than Rushton's for large δ (δ > 1).

Remark 3.2.3.1 The current approximations to Barnard's versions of the sequential t- and t²-test criteria can be obtained by making the following changes:

(i) set p² = 2n − 1; (ii) change A to A exp(−δ²/2); (iii) change B to B exp(−δ²/2); in the expressions for A_{n,i} and B_{n,i} (i = 1, 2).

3.2.4 Asymptotic Normality of the Statistic T

Toward this we need the following lemma.

Lemma 3.2.1 Let χ²_ν be a chi-square variable with ν degrees of freedom. Then (2χ²_ν)^{1/2} − (2ν)^{1/2} is approximately standard normal for large ν.

Proof.

    P{(2χ²_ν)^{1/2} − (2ν)^{1/2} ≤ x} = P{χ²_ν ≤ [x + (2ν)^{1/2}]²/2}
        = P{(χ²_ν − ν)/(2ν)^{1/2} ≤ x + x²/[2(2ν)^{1/2}]} → Φ(x) as ν → ∞,

since (χ²_ν − ν)/(2ν)^{1/2} is asymptotically standard normal by the central limit theorem. ∎

Corollary 3.2.1.1 If s_n² denotes the sample variance in a random sample of size n, then √n(s_n/σ) − √n is approximately normal with mean 0 and variance 1/2.
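Lemma 3.2.1 is easy to confirm by simulation; a minimal sketch (the degrees of freedom and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 200  # large degrees of freedom, per Lemma 3.2.1
chi2 = rng.chisquare(nu, size=200_000)

# (2 chi2)^(1/2) - (2 nu)^(1/2) should be approximately N(0, 1)
w = np.sqrt(2 * chi2) - np.sqrt(2 * nu)
print(round(float(w.mean()), 2), round(float(w.std()), 2))  # near 0 and 1
```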

Lemma 3.2.2 Let t_n = √n(X̄_n − θ0)/s_n. Then, for sufficiently large n, t_n − √n η is approximately normal with mean 0 and variance 1 + η²/2, where η = (θ − θ0)/σ when θ is the true value of the parameter.

Proof. One can write

in distribution, since s_n/σ converges to one in probability. Thus, when θ is the true value, for large n the distribution of t_n − √n η is normal with mean 0 and variance 1 + η²/2, after using the above corollary. ∎
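A similar Monte Carlo check of Lemma 3.2.2 (parameter values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, theta0, sigma = 400, 1.2, 1.0, 2.0
eta = (theta - theta0) / sigma  # = 0.1

x = rng.normal(theta, sigma, size=(20_000, n))
tn = np.sqrt(n) * (x.mean(axis=1) - theta0) / x.std(axis=1, ddof=1)

# Lemma 3.2.2: t_n - sqrt(n) eta is approximately N(0, 1 + eta^2/2)
d = tn - np.sqrt(n) * eta
print(round(float(d.mean()), 2), round(float(d.var()), 2))  # near 0 and 1.005
```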

Next, one can write

    T* = √n t_n (n − 1 + t_n²)^{−1/2} = t_n [1 + (t_n² − 1)/n]^{−1/2}.

Now, for y > 0

Hence

Then

Thus

Thus it follows that T* − √n η (1 + η²)^{−1/2} is, for large n, asymptotically normal with mean zero and variance (1 + η²/2)(1 + η²)^{−3} when θ is the true value, since −y − √n η (1 + η²)^{1/2} tends to −∞ for large n. Alternatively, we can use the delta method as follows. Let

Then we can write T* as

Since

    g(x) = x (1 + x²/n)^{−1/2},

Hence

Thus or

However,

Hence

That is

when θ is the true value and g(√n η) = √n η (1 + η²)^{−1/2}.

3.2.5 Finite Sure Termination of Sequential t- and t²-tests

We have shown [see also David and Kruskal (1956) or Cramér (1946, Section 28.4)] that if η = (θ − θ0)/σ, then

    T* ≈ Zσ* + n^{1/2} η (1 + η²)^{−1/2}   (in distribution),
    σ*² = (1 + η²/2)(1 + η²)^{−3},

where Z has asymptotically a standard normal distribution. Let N (N*) denote the stopping time for the sequential t- (t²-) test. Then

    P(N = ∞) = lim_{n→∞} P(N > n),

    P(N > n) = P(B_{k,1} < T* < A_{k,1} for all k ≤ n) ≤ P(B_{n,1} < T* < A_{n,1})

after substituting the asymptotic expressions for A_{n,1} and B_{n,1}, and noting that t0 is free of A or B. Thus P(N = ∞) = 0. In order to show the finite sure termination of the sequential t²-test, consider

and show that each probability on the right side tends to zero by proceeding as in the case of the sequential t. The sequential t has the following properties.

Property I: The (one-sided) sequential t-test terminates finitely with probability one.

Let β_T(θ, σ) denote the probability of Type II error when the true parameter values are θ and σ. If it depends only on a single function τ(θ, σ) of the parameters, we shall denote it by β_T(τ). The same convention is adopted for α_T(θ, σ).

Property II: For the sequential t-test T of H0 against one-sided H1, β_T(η) is a decreasing function of η = (θ − θ0)/σ > 0.

Let C_{α,β} denote the class of tests with average error of Type I equal to α and average error of Type II equal to β, the two averages being with respect to any two weight (prior density) functions defined over the regions θ = θ0 and θ − θ0 ≥ δσ, respectively. Let C*_{α,β} be the class of weighted probability ratio tests belonging to C_{α,β}. Further, let C_{A,B} be the class of weighted probability ratio tests with boundaries A, B. Then we have the following theorem pertaining to the optimum property of the sequential t-test.

Theorem 3.2.1 (J. K. Ghosh, 1960). Let T be any sequential test of H0 : θ = θ0 against H1 : (θ − θ0) ≥ δσ > 0 with error probabilities α_T(θ0) = α and β with respect to θ − θ0 = δσ. Then T has the double minimax property in the class C_{α,β}, namely, for T′ ∈ C_{α,β}:

Corollary 3.2.1.1 If Wald's approximations to the boundaries of the probability ratio in terms of the error probabilities are allowed, T has the double minimax property in the class C_{A,B}, where A = (1 − β)/α and B = β/(1 − α).

3.2.6 Sequential t²-test (or t-test for Two-sided Alternatives)

Let us consider the sequential t-procedure as given in (3.2.1) and study its properties.

Property I: The sequential t2-test terminates finitely with probability one.

Property II: The sequential t²-test for H0 : θ = θ0 vs. H1 : |θ − θ0| = δσ has constant error probabilities α and β, where A ≤ (1 − β)/α and B ≥ β/(1 − α).

Theorem 3.2.2 (J. K. Ghosh, 1960). For H0 : θ = θ0 vs. H1 : |θ − θ0| = δσ, the sequential t²-test has the double minimax property, which in C_{α,β} is obtained by replacing T by T², T′ by T′² and (θ − θ0) ≥ δσ by |θ − θ0| = δσ in (3.2.8), provided A and B are so chosen that T² has error probabilities α and β. Moreover, if A = (1 − β)/α and B = β/(1 − α), then T² is double minimax in C_{A,B}.

Proof. The proof is analogous to the proof of Theorem 3.2.1 and one should use Property II. ∎

Sacks (1965) proposed a sequential t-test for one-sided alternatives with possible extensions to the two-sided situation. His procedure is as follows. Let H0 : θ = 0 vs. H1 : θ = σ, σ > 0. Let

and

Then take an (n + 1)st observation if

and if

stop and accept H0 if (3.2.10) is violated, and stop and accept H1 if (3.2.9) is violated. Notice that the procedure is symmetric with symmetric bounds A and A⁻¹ (that is, α = β). We also have the following result.

Theorem 3.2.3 (Sacks, 1965). For the above procedure, let N denote the stopping time. Then

(3.2.11)

Remarks 3.2.3.1 Similar ideas would work for H1 : θ = δσ or |θ| = δσ. Since the distribution of N depends only on θ/σ, (3.2.11) is valid for any θ and σ with θ = σ (that is, for any point in H1). Sacks (1965) points out that the moment-generating function of N is finite for t in some neighborhood of zero. The sequential test has bad asymptotic properties when θ = σ/2. When θ/σ = 0 or 1, the author claims that the error probability is o(ln A/A).

Remarks 3.2.3.2 Sacks' (1965) t-test can be obtained by employing the following class of weight functions:

    g0(θ, σ) = (2σ ln c)⁻¹ if θ = 0 and c⁻¹ < σ < c, and zero elsewhere in ω0,   (3.2.12)

and

    g1(θ, σ) = (2σ ln c)⁻¹ if θ = σ and c⁻¹ < σ < c, and zero elsewhere in ω1.   (3.2.13)

Note that

    = 2^{(n−2)/2} S^{−n/2} Γ(n/2),

where S = Σ X_i², after substituting v = S/2σ² and carrying out the integration. Also,

where T = X̄_n/S^{1/2}. Thus Sacks' (1965) criterion is

which coincides with Barnard's criterion when δ = 1, and with Wald's criterion when we set δ = 1, B = A⁻¹, β = α and replace n by n + 1 in the exponent of v and in the gamma function occurring in Wald's criterion. The Govindarajulu-Howard approximation to Sacks' version of the sequential t can be obtained by setting p² = 2n − 1, B = 1/A and δ = 1, and changing A to A exp(−δ²/2).

Hall (1962) has given two analogues of Stein's two-stage test procedure for testing hypotheses about the mean of a normal population with specified bounds on the error probabilities when the variance is unknown. These procedures are modifications of Baker's (1950) procedure. They provide alternatives to the sequential normal test (variance known) or the sequential t-test. The performance of these tests does not depend on the validity of any assumption regarding the variance. Moreover, unlike the t-test, these procedures do not require that we specify the alternative hypothesis in standard deviation units.

3.2.7 The Sequential Test T

Let X1, X2, ... be an i.i.d. normal (θ, σ²) sequence of random variables with −∞ < θ < ∞, 0 < σ < ∞. Let m (m ≥ 2) be a specified integer. We wish to test H0 : θ = 0 versus H1 : θ = δ (σ unknown) based on X_m, X_{m+1}, ..., having terminal boundaries A_m and B_m and with σ replaced by s_m, where

    X̄_m = (1/m) Σ_{i=1}^{m} X_i,   s_m² = (1/ν) Σ_{i=1}^{m} (X_i − X̄_m)²,   ν = m − 1,

and

(3.2.14)

and similarly

    b_m = ln B_m

(3.2.15)

The first test of Hall (1962) will be called test T. Note that A_m > A ≡ 1/α and B_m < B ≡ β, with approximate equalities instead of inequalities if m is large, where A, B denote Wald's conservative stopping bounds that would be appropriate if σ were known. Let

(3.2.16)

Then we can describe the procedure T as follows. Take observations X1, X2, ..., X_m. Then successively observe X_{m+1}, X_{m+2}, ...; and for each n ≥ m, after observing X_n:

(i) stop and take decision d0 (accept H0) if r_n(s_m) ≤ b_m; (ii) stop and take decision d1 (accept H1) if r_n(s_m) ≥ a_m; (iii) take one more observation if b_m < r_n(s_m) < a_m.

It can be shown that

    P(d1 using T | θ, σ) < α for all θ ≤ 0, σ > 0, and
    P(d0 using T | θ, σ) < β for all θ ≥ δ, σ > 0.   (3.2.17)

That is, T has strength (α, β). Further, since the SPRT T(s_m, σ) terminates finitely with certainty for every fixed value of s_m, the test T also terminates with certainty. Hall (1962) has obtained expansions for the OC and ASN functions of T. Further, Hall (1962) has made certain comparisons of the approximate power and ASN functions of the sequential test T, Stein's two-stage test, Wald's SPRT and the fixed-sample-size test (FSST), the latter two assuming σ known, for α = β = .05 and .01. He surmises that substantial savings may be possible using T, at least if one of the hypotheses is correct. The comparison between T and Stein's test is analogous to the comparison between the SPRT and the FSST of the same strength.

3.2.8 An Alternative Sequential Test T′

For the symmetric (α = β) one-sided case, a minimum probability ratio test (MPRT) which has converging straight-line boundaries can be adapted. The

MPRT is equivalent to one of Anderson's (1960) tests discussed in Section 2.9. The test T′ is given by: for n ≥ m, stop sampling as soon as

and choose d1 or d0 according as Σ_{i=1}^{n} (X_i − δ/2) is > or < 0, where

    c_ν = ν[(2α)^{−2/ν} − 1] = −2 ln(2α) + o(ν⁻¹).   (3.2.18)

If N′ is the stopping time associated with the test T′, then a lower bound for E(N′) at θ = δ/2 is

(3.2.19)

3.3 Sequential F-test

In this section we are concerned with testing for location in several normal populations in the presence of nuisance parameters. P. G. Hoel (1955) has applied Wald's method of weight functions to the general linear hypothesis-testing problem. In the following we take a special case of Hoel's (1955) formulation of the problem. Let X1, X2, ..., X_s be independent normal variables with means θ1, θ2, ..., θ_s (θ_i = 0 for i = p + 1, ..., s) and common variance σ². We wish to test

    H0 : θ1 = ⋯ = θ_p = 0   (σ > 0)   (3.3.1)

against the alternative

where λ0 is a specified constant. Since the likelihood is a function of Σ_i θ_i²/σ², it is reasonable to consider such alternative hypotheses. Notice that σ is a nuisance parameter, which makes H0 a composite hypothesis. Let S1 denote the boundary of the region

(3.3.2)

The following is a special case of Hoel's (1955) normalizing weight functions:

    g0(σ) = a1 σ^{b1} if 0 ≤ σ ≤ c, and 0 otherwise;   (3.3.3)
    g1(σ) = a2 σ^{b2} if 0 ≤ σ ≤ c, and 0 otherwise,

where the a_i and b_i are certain related constants with b1 ≤ 0 and b2 = b1 + 1 − p. Let (X_{i1}, X_{i2}, ..., X_{in}) be a random sample on X_i (i = 1, 2, ..., s). Let f_{1nc} (f_{0nc}) denote the density obtained after integrating the joint density of the X_{ij} (i = 1, 2, ..., s; j = 1, 2, ..., n) with respect to the prior density g1(σ) (g0(σ)) on the region S_{1c} (ω_{0c}), which is that part of the surface S1 (ω0) with 0 ≤ σ ≤ c, where ω0 is the region in the parameter space characterized by the null hypothesis. Notice that S_{1c} and ω_{0c} are truncations of S1 and ω0, which imply the existence of the necessary integrals. After forming the ratio f_{1nc}/f_{0nc} and letting c → ∞, the special case of Hoel's (1955) expression is given by

    f_{1n}/f_{0n} = lim_{c→∞} f_{1nc}/f_{0nc}   (3.3.4)

where

s= (3.3.5)

M(a, b, u) denotes the confluent hypergeometric function, and

    X̄_i = (1/n) Σ_{j=1}^{n} X_{ij}.

Note that if b1 = −1 and s = p, the test criterion (3.3.4) reduces to that obtained by Johnson (1953) by the application of invariance considerations proposed by Cox (1952a). If b1 = 0 and s = p = 1, the test criterion (3.3.4) reduces to the sequential t²-test criterion (see ψ(T; δ, n) in Section 3.2). If s = p = 1 and b1 = −1, the test criterion (3.3.4) yields a version of the sequential t²-test given independently by Barnard (1952) and Rushton (1952). In the following we assume that s = p and b1 = −1. Letting b = p/2, λ = np/2 and T = λ0S/p (i.e., λ0nS/2 = λT), (3.3.4) takes the form

(3.3.6)

If α and β denote the Type I and Type II error probabilities, then, setting A = (1 − β)/α and B = β/(1 − α), the sequential F-test can be described as follows. If the experiment reaches the nth stage, the sampling continuation region is given by

    B < M(λ, b, λT) e^{−λλ0/p} < A.   (3.3.7)

If we invert the inequality (3.3.7), then the sampling continuation inequality becomes

    B_x < λT < A_x.   (3.3.8)

Before we discuss this inversion, it is useful to consider the asymptotic distribution of nS. Toward this, Govindarajulu and Howard (1994) (abbreviated below as G-H (1994)) have shown that

in distribution as n → ∞, where Z has a standard normal distribution and δ = Σ_i θ_i²/σ².

Thus S tends in probability to δ/(p + δ) as n → ∞. Hence S remains bounded in probability as n increases, and this is useful in the forthcoming expansions. Using the inversion formula (3.3.8), G-H (1994) show that the sequential procedure terminates finitely with probability one. Also note that Ray (1957a), using the monotonicity properties of the logarithm of the M function, established almost sure finite termination of the sequential F-test.
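The continuation criterion (3.3.7) can be computed directly from a library implementation of M; a minimal sketch (parameter values are illustrative, not from the text):

```python
import numpy as np
from scipy.special import hyp1f1

def f_test_criterion(T, n, p, lam0):
    """Sequential F-test criterion (3.3.7): M(lam, b, lam*T) exp(-lam*lam0/p),
    with b = p/2 and lam = n p/2; continue sampling while B < value < A."""
    b, lam = p / 2, n * p / 2
    return hyp1f1(lam, b, lam * T) * np.exp(-lam * lam0 / p)

A, B = 19.0, 1.0 / 19.0  # alpha = beta = 0.05
val = f_test_criterion(T=0.02, n=5, p=2, lam0=0.5)
print(B < val < A)  # still inside the continuation region
```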

3.3.1 Inversion Formula

We are concerned with solving equations of the form

    M(λ, p/2, x) = A e^{λλ0/p}   (3.3.10)

for x > 0, where 0 < λ0/p ≤ 1 with A and p given (n large, p = 2, 3, 4, ...). We assume that p/2 is a positive integer, so p is an even positive integer; if p is an odd positive integer, then the analysis becomes complicated by the appearance of "extra terms". However, the formula valid for even p also gives satisfactory results when used for odd p, so it suffices to provide the results for even p. Let k = (n − 1)p/2, so that k + b = λ. Then we have

(3.3.11)

where the C_k are the usual binomial coefficients. Using the leading term in the Euler-Maclaurin sum formula, G-H (1994) replace the sum involving the binomial coefficients by an integral and then, using Laplace's method, obtain an asymptotic expansion for the integral. Thus they obtain

    ⋯ + [−4 + 6b + x² − 3b² + 6bx] / [12(xk)^{1/2}]
      − [1 − 6b + 6x + 3(x − b)² + 12xb²] / (48xk) + O(k^{−3/2}).   (3.3.12)

Solving (3.3.10) for x is equivalent to solving for x from

    G(x) = ln M(k + b, b, x) − ln A − (k + b)λ0/(2b) = 0,   (3.3.13)

where G(x) can be obtained from (3.3.12), and

    xG′(x) ≈ (xk)^{1/2} − (1/4)(−1 − 2x + 2b) + [3 − 8b + 4(x + b)²] / [32(xk)^{1/2}]
           + [9 + 14x − 30b − 24x² + 24b²] / (384xk) + O(k^{−3/2}).   (3.3.14)

Now use the Newton-Raphson algorithm, which yields the sequence {x_n} (n = 0, 1, 2, ...) where

    x_{n+1} = x_n − G(x_n)/G′(x_n),   (3.3.15)

where equations (3.3.12) and (3.3.14) can be used to evaluate the quotient G/G′ in (3.3.15). From a convexity argument we take x0 = 0 in (3.3.15) and obtain

    x1 = [b ln A + (k + b)λ0/2] / (k + b).   (3.3.16)

Further approximations to the root x of (3.3.10) are obtained by using (3.3.15), (3.3.12) and (3.3.14) to generate x2, x3, etc. Thus we have the algorithm.

Algorithm. To find an approximation to the solution of (3.3.10) in x > 0, use the Newton-Raphson sequence generated by (3.3.15), with G and G′ given respectively by (3.3.12) and (3.3.14), starting from x1 given by (3.3.16).
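For moderate k one can also run the Newton-Raphson iteration on the exact function rather than the expansions (3.3.12) and (3.3.14), using the standard identity d/dx M(a, b, x) = (a/b) M(a + 1, b + 1, x). The following sketch does this; it is a numerical alternative to, not a transcription of, the G-H algorithm.

```python
import numpy as np
from scipy.special import hyp1f1

def invert_boundary(A, n, p, lam0, iters=20):
    """Solve M(lam, b, x) = A exp(lam*lam0/p) for x > 0 (cf. (3.3.10)) by
    Newton-Raphson on G(x) = ln M(lam, b, x) - ln A - lam*lam0/p."""
    b, lam = p / 2, n * p / 2
    rhs = np.log(A) + lam * lam0 / p
    x = b * rhs / lam  # starting value in the spirit of (3.3.16)
    for _ in range(iters):
        m = hyp1f1(lam, b, x)
        dm = (lam / b) * hyp1f1(lam + 1, b + 1, x)  # d/dx M(lam, b, x)
        x = max(x - (np.log(m) - rhs) * m / dm, 1e-12)
    return x

x = invert_boundary(A=19.0, n=5, p=2, lam0=0.5)
print(abs(hyp1f1(5.0, 1.0, x) - 19.0 * np.exp(1.25)) < 1e-6)
```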

Note that unless the right-side member of (3.3.10) is at least one, this equation has no nonnegative solutions for x, since

    M(λ, p/2, x) ≥ 1 for λ > 0, p > 0, x ≥ 0.

Recall that the sampling continuation region at the nth stage is given by (3.3.7). The algorithm above can now be used to give an equivalent sampling continuation region of the form (3.3.8), where A_x is the root x of (3.3.10) and B_x is the root x of (3.3.10) with A replaced by B. These roots are obtained using (3.3.15), (3.3.12) and (3.3.14). It is found that 4 or 5 iterations of the algorithm are needed to obtain A_x and B_x with an error of 1%. G-H (1994) carry out a simulation study (with 500 replications for each case) with α = β = 0.05 (and hence A = 1/B = 19) and θ_i = (λ0/p)^{1/2}. Some of their values are presented in Table 3.3.1.

Table 3.3.1 Stopping Times for Some Parameter Configurations

    λ0/p   p   Average Stopping Time (AST)   S.E. of the AST
    0.5    1   17.2                          0.35
    1.5    1    8.4                          0.14
    0.5    3    7.8                          0.13
    1.5    3    4.4                          0.04
    0.5    5    5.10                         0.06
    1.5    5    4.16                         0.25
    0.5    7    4.58                         0.04
    1.5    7    3.96                         0.01
    0.5    9    4.29                         0.03
    1.5    9    3.90                         0.01

3.4 Likelihood Ratio Test Procedures

If it is not feasible to apply the method of weight functions of Wald (1947), and if invariance considerations do not apply, then one should look for a procedure in which sample estimates replace the true values of the nuisance parameters. Estimates like the BAN (best asymptotically normal) estimates will suffice. In particular, we confine ourselves to maximum likelihood estimates, which have some well-known desirable large-sample properties. In this section we implicitly assume that we are testing hypotheses that are "close" to each other; hence the need for large sample sizes to terminate the experiments. Large-sample properties of maximum likelihood estimators (mle's) in fixed-sample-size cases have been studied by Cramér (1946); the regularity conditions have been relaxed to some extent by LeCam (1970).

Let f(x; θ, η) be the common probability density (probability function) of a sequence of i.i.d. random variables {X_i}, where θ ∈ Ω, η ∈ Λ, and let Θ = Ω × Λ. Then we have the following well-known result.

Lemma 3.4.1 For each (θ, η) ∈ Θ, assume that

(i) the first partial derivatives of f with respect to θ and η exist and are bounded absolutely by some integrable functions of x, and

(ii) the second partial derivatives of f with respect to θ and η exist, are absolutely bounded by some integrable functions of x and have finite expectations.

If

then,

    n⁻¹ ∂²l(θ, η)/∂θ² → −I_{θθ} in probability as n → ∞,
    n⁻¹ ∂²l(θ, η)/∂θ∂η → −I_{θη} in probability as n → ∞,   (3.4.1)

    n⁻¹ ∂²l(θ, η)/∂η² → −I_{ηη} in probability as n → ∞,

where l(θ, η) = Σ_{i=1}^{n} ln f(X_i; θ, η),

    var(∂ ln f/∂θ) = −E(∂² ln f/∂θ²) = I_{θθ},   (3.4.2)

    var(∂ ln f/∂η) = −E(∂² ln f/∂η²) = I_{ηη}.

Proof. Assumptions (i) and (ii) enable us to perform differentiation underneath the integral sign and imply the existence of I_{θθ}, I_{θη} and I_{ηη}. Also, l(θ, η) is the sum of i.i.d. random variables, since the X's are. Hence application of Khintchine's theorem, pertaining to the convergence of the average of an i.i.d. sequence of random variables to their mean, yields (3.4.1). ∎

Lemma 3.4.2 (Cramér, 1946; LeCam, 1970). If the assumptions of Lemma 3.4.1 are satisfied and if

(3.4.3)

and analogous results hold for the mixed second derivative and the second partial derivative of log f with respect to η, then the mle's θ̂_n, η̂_n based on a random sample of size n have an asymptotic bivariate normal distribution with means θ and η, variances n⁻¹Ī_{θθ}, n⁻¹Ī_{ηη} and covariance n⁻¹Ī_{θη}, where

    Ī_{θθ} = [I_{θθ} − I²_{θη}(I_{ηη})⁻¹]⁻¹ = [(I_{θθ}I_{ηη} − I²_{θη})/I_{ηη}]⁻¹,
    Ī_{ηη} = [I_{ηη} − I²_{θη}(I_{θθ})⁻¹]⁻¹.

Remark 3.4.1 Uniform continuity of the second partial derivatives of l(θ, η) with respect to θ and η, or the existence and absolute boundedness by integrable functions in x of the third-order partial derivatives, could replace the assumptions in (3.4.3).

Maximum likelihood SPRT's have been proposed by Bartlett (1946) and D. R. Cox (1963). First we shall give Cox's procedure, which is a slight modification of Bartlett's procedure. Suppose we are interested in testing H0 : θ = θ0 against H1 : θ = θ1, where η is the nuisance parameter, no prior information about which is available. We further assume that θ1 − θ and θ0 − θ are of the order n^{−1/2}, where θ denotes the true value. Let θ̂_n and η̂_n denote the mle's of θ and η based on (X1, X2, ..., X_n). Then consider the sequential procedure based on the log-likelihood ratio

    Z′_n = l(θ1, η̂_n) − l(θ0, η̂_n).   (3.4.4)

The method of Bartlett (1946) involves the use of two mles of η, one assuming that θ = θ1 and the other that θ = θ0. His method will be described later on. Notice that exp(Z′_n) is no longer a ratio of probability density functions, and hence the boundary values expressed in terms of error probabilities cannot be used without proper justification. Taylor expansions of l(θ_i, η̂_n) for i = 0, 1 about the true (θ, η) yield

(3.4.5)

where the remainder term involves the differences of the second-order derivatives, and it converges to zero in probability when |θ_i − θ|, i = 0, 1, are sufficiently small and the second derivatives are smooth (see Remark 3.4.1). Next, expanding the following functions of θ̂_n and η̂_n (which are the mles)

about (θ, η), we obtain

and

where ε_{n,1} and ε_{n,2} converge to zero in probability and hence can be neglected for sufficiently large n. Substituting (3.4.7) in (3.4.5) and ignoring terms of order o(1), we obtain Theorem 3.4.1.

Theorem 3.4.1 (Cox, 1963). When |θ_i − θ| is sufficiently small (i = 0, 1) and the second partial derivatives are smooth, we have

    Z′_n = (θ1 − θ0) I_{θθ} n [θ̂*_n − (θ0 + θ1)/2],

where n θ̂*_n is asymptotically normal with mean nθ and variance n Ī_{θθ}. Also, using (3.4.1) in (3.4.7) and (3.4.8), we see that for large n, n(θ̂*_n − θ) is the sum of i.i.d. random variables {V_i}, i = 1, 2, ..., n, where

Also, note that var V = Ī_{θθ} = I_{ηη}(I_{θθ}I_{ηη} − I²_{θη})⁻¹. The asymptotic representation of Z′_n suggests that the test should be based on

where E(T_n) = n{θ − (θ1 + θ0)/2} and var(T_n) = n Ī_{θθ}. Thus, for large n, T_n is a random walk where the mean increment per step is θ − (θ1 + θ0)/2 and the variance per step is Ī_{θθ}. Further, Ī_{θθ} can be estimated by substituting θ = (θ1 + θ0)/2 and η = η̂_n. We can therefore use the theory for normally distributed observations. (The continue-sampling inequality for testing H0 : θ = θ0 versus H1 : θ = θ1 when σ is known is

    σ² ln B < (θ1 − θ0)[Σ_{i=1}^{n} X_i − n(θ0 + θ1)/2] < σ² ln A.)

Thus, the continue-sampling inequality becomes

where

    b = Ī_{θθ} ln B ≈ ĉ_n ln[β/(1 − α)]   and   a = Ī_{θθ} ln A ≈ ĉ_n ln[(1 − β)/α],   (3.4.9)

It should be pointed out that the above procedure can be carried out even if η is a vector; appropriate modifications should be made in the justification. Now we can use Wald's (1947) approximations for the boundary values in terms of error probabilities. Furthermore, the expressions for the OC function and the ASN are also valid for sufficiently large n.

Example 3.4.1 (Cox, 1963). Suppose that the X_i are i.i.d. normal with mean μ and variance σ². We wish to test H0 : θ = μ/σ = θ0 against H1 : θ = θ1 (θ0 < θ1). Here θ̂_n = X̄_n/s_n and η̂_n = s_n. Thus

    Z′_n = n(θ1 − θ0)[X̄_n(s_n)⁻¹ − (θ0 + θ1)/2],   n = 2, 3, ...,
    I_{θθ} = 1,   I_{ηη} = (2 + θ²)/σ²,   I_{θη} = θ/σ.

Hence we take ĉ_n = 1 + X̄_n²/(2s_n²) and carry out the procedure. It is of interest to note that the above procedure is asymptotically equivalent to the sequential t-test (i.e., to the asymptotic form of Rushton (1950); see Eq. (3.2.4)). One can obtain sequential likelihood ratio tests in order to test H0 : μ = μ0 against H1 : μ = μ1 (μ1 > μ0) when σ is unknown. Here the principle of invariance does not apply. The sequential procedure is obtained by replacing σ by s in Wald's

SPRT with σ known. Since s and X̄ are independent, there is asymptotically no change in the properties of the sequential procedure from that when σ is known. Furthermore, one can also derive a sequential likelihood ratio procedure for the Behrens-Fisher problem (in which we wish to test the equality of the means of two normal populations having unequal and unknown variances). There might be some situations where it is easier to calculate the mle's of η when θ = θ0 and θ = θ1 than to compute the joint mle's θ̂_n and η̂_n. Bartlett (1946) proposed that the sequential likelihood ratio procedure be based on Z″_n = l(θ1, η̂1) − l(θ0, η̂0), where η̂1 [η̂0] denotes the mle of η when θ = θ1 [θ = θ0]. Notice that the subscript n on η̂ is suppressed for the sake of simplicity.
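Cox's procedure for Example 3.4.1 can be sketched as follows. The boundary constants follow the form (3.4.9); the function name, data and parameter values are illustrative assumptions, not from the text.

```python
import numpy as np

def cox_sequential_t(xs, theta0, theta1, alpha=0.05, beta=0.05):
    """Cox's (1963) mle-based sequential test of H0: theta = mu/sigma = theta0
    vs H1: theta = theta1.  At stage n, Z'_n = n (theta1 - theta0)
    (xbar/s - (theta0 + theta1)/2) is compared with c_n ln[(1-beta)/alpha]
    and c_n ln[beta/(1-alpha)], where c_n = 1 + xbar^2/(2 s^2)."""
    up = np.log((1 - beta) / alpha)
    low = np.log(beta / (1 - alpha))
    for n in range(2, len(xs) + 1):
        xbar, s = xs[:n].mean(), xs[:n].std(ddof=1)
        z = n * (theta1 - theta0) * (xbar / s - (theta0 + theta1) / 2)
        cn = 1 + xbar ** 2 / (2 * s ** 2)
        if z >= cn * up:
            return n, "accept H1"
        if z <= cn * low:
            return n, "accept H0"
    return len(xs), "no decision"

rng = np.random.default_rng(3)
print(cox_sequential_t(rng.normal(1.0, 1.0, 500), theta0=0.0, theta1=1.0))
```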

Theorem 3.4.2 (Bartlett, 1946). Under the regularity assumptions of Theorem 3.4.1, we have

Proof. Let θ1 be the true value of θ. Expanding l(θ0, η̂0) − l(θ1, η̂1) in a Taylor series about (θ1, η̂1) and using (3.4.1), we obtain

    Z″_n = (θ1 − θ0)(∂l/∂θ)_1 + (η̂1 − η̂0)(∂l/∂η)_1 + ⋯,   (3.4.11)

where the subscript i denotes that the particular derivative is evaluated at θ = θ_i and η = η̂_i (i = 0, 1). Also

Furthermore, since (∂l/∂η)_1 = (∂l/∂η)_0 = 0 (the η̂_i being mles), we have

    (∂l/∂η)_{θ=θ1} − (∂l/∂η)_{θ=θ0} = ⋯ = −n(η − η̂0)I_{ηη} − n(η̂1 − η)I_{ηη} = n(η̂0 − η̂1)I_{ηη}.   (3.4.13)

Hence, from (3.4.12) and (3.4.13), we obtain

(3.4.14)

Also

after using (3.4.14). Now, using (3.4.13), (3.4.14) and (3.4.15) in (3.4.11), we obtain

which is a sum of i.i.d. random variables after using (3.4.14); hence Wald's approximations to the boundary values, the OC function and the ASN are applicable. ∎

Remark 3.4.2 Cox (1963) has applied the sequential likelihood ratio procedure in order to test certain hypotheses about two binomial proportions. If p1 and p2 are the two proportions, let θ = p2 − p1 and η = p1, or θ = ln[p2/(1 − p2)] − ln[p1/(1 − p1)] and η = p1/(1 − p1). We will be interested in testing a simple hypothesis about θ against a simple alternative.

For the normal problem (see Example 3.4.1), Joanes (1972, 1975) has compared the procedures of Bartlett (1946), Cox (1963), Barnard (1952) and Wald (1947) (the test of Barnard (1952) [Wald (1947)] is a sequential t-test obtained by using the weight function g(σ) = σ⁻¹ (0 < σ < ∞) [g(σ) = 1/c (0 ≤ σ ≤ c)]). Using asymptotic expansions similar to those obtained by Rushton (1950), Joanes (1972) surmises that Bartlett's (1946) test is closer to that of Barnard than to that of Wald (1947). Cox's (1963) test procedure with the modified bounds given by (3.4.9) is asymptotically equivalent to that of Bartlett. All these statistics, when asymptotically expanded, differ only in the O(n^{−1/2}) term. Breslow (1969) provides a general theory of large-sample sequential analysis via a weak convergence approach which explicitly justifies Cox's (1963) approach. He applies this general theory to the comparison of two binomial populations (see Remark 3.4.2) and the comparison of two exponential survival curves; the latter problem can be described as follows. Let 2n0 denote the number of patients that enter the study in the time interval (0, T) (that is, the total number of observations available to the experimenter), n0 to be placed on treatment A and n0 on treatment B. Let the entry times of the former [latter] group be denoted by H_i [J_i] (i = 1, 2, ..., n0). Let X1, X2, ... [Y1, Y2, ...] denote the survival times of the patients in group A [group B]. Let

F^{(1)}(x) = P(X ≤ x) = 1 − exp(−xλ_A), x ≥ 0, λ_A > 0,
and F^{(2)}(x) = P(Y ≤ x) = 1 − exp(−xλ_B), x ≥ 0, λ_B > 0. We wish to test H0 : θ = (λ_A/λ_B) = 1 against H_{n0} : θ = θ_{n0} = 1 + 2η/n0^{1/2}. Breslow (1969) proposes a large-sample sequential test of H0 (as n0 → ∞) based on the mle of θ.

3.4.1 Generalized Likelihood Ratio Tests for Koopman-Darmois Families

Before we present these open-ended tests for Koopman-Darmois families we need the following result of Lorden (1970) pertaining to the excess over the boundary. In SPRTs cumulative sums play an important role. Let S_n = X1 + X2 + ⋯ + X_n denote the sum of n independent random variables with common mean μ > 0. The stopping time is given by N(t) = inf{n : S_n > t}. Wald's (1947) equation states μ E N(t) = E S_{N(t)} (see (2.4.10)) whenever sup_n E|X_n| and E N(t) are finite, and can be rewritten as μ E N(t) = t + E R_t, where R_t = S_{N(t)} − t.
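Wald's identity lends itself to a quick numerical check. The sketch below is an added illustration, not from the text; the Uniform(0, 2) increment distribution (for which μ = 1), the threshold t and the seed are arbitrary choices. It simulates N(t) and the excess R_t and compares μ E N(t) with t + E R_t.

```python
import random

random.seed(1)

def excess_run(t):
    """Run S_n = X_1 + X_2 + ... until S_n > t; return (N(t), R_t)."""
    s, n = 0.0, 0
    while s <= t:
        s += random.uniform(0.0, 2.0)  # X_i ~ Uniform(0, 2), so mu = 1
        n += 1
    return n, s - t

t, reps = 25.0, 20000
results = [excess_run(t) for _ in range(reps)]
mean_N = sum(n for n, _ in results) / reps
mean_R = sum(r for _, r in results) / reps

# Wald's equation: mu * E N(t) = t + E R_t (here mu = 1)
print(mean_N, t + mean_R)
```

The two printed numbers agree up to simulation error, and the average excess is visibly non-negligible relative to a unit increment.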

Definition 3.4.1 Suppose a random walk {S_n}_{n=0}^∞, having positive drift and starting at the origin, is stopped the first time S_n > t ≥ 0. Let N(t) = inf{n : S_n > t}. Then R_t = S_{N(t)} − t is called the “excess over the boundary.”

The excess over the boundary is often assumed to be negligible. Wald (1947) gave an upper bound for sup_{t>0} E R_t in the case of i.i.d. X's, namely sup_r E(X − r | X > r) (see Wald, 1947, p. 172, Eq. (A.73)). Wald's bound, which is large by at most a factor [1 − F(0)]^{-1}, may be difficult to calculate. Lorden (1970) provides an upper bound for the excess over the boundary which is intuitively clear if the X's are nonnegative random variables. Towards this, we have the following theorem, which we state without proof.

Theorem 3.4.3 (Lorden, 1970). Let X1, X2, ... be independent and identically distributed random variables with EX = μ > 0 and E(X⁺)² < ∞, where X⁺ = X if X > 0 and X⁺ = 0 if X ≤ 0. Let S_n = X1 + X2 + ⋯ + X_n, N(t) = inf{n : S_n > t} and R(t) = S_{N(t)} − t. Then

E R(t) ≤ E(X⁺)²/μ for all t ≥ 0. (3.4.16)

Corollary 3.4.3.1 Under the hypothesis of Theorem 3.4.3, if b < 0 < u and N* = inf{n : S_n ∉ (b, u)}, then

(3.4.17)
where β = P(S_{N*} < b).

Taking b = −∞ implies that β = 0 and N = inf{n ≥ 1 : S_n > u}. For this case Siegmund (1985, pp. 170 and 176) obtained an asymptotic expression for EN.

Theorem 3.4.4 (Siegmund, 1985). (i) If P(X1 > 0) = 1, μ = EX1 and EX1² < ∞, then

lim_{u→∞} E(S_N − u) = EX1²/(2μ) if X1 is non-arithmetic,
lim_{u→∞} E(S_N − u) = EX1²/(2μ) + h/2 if X1 is arithmetic¹ with span h.

(ii) If μ > 0 and EX1² < ∞, we have

lim_{u→∞} E(S_N − u) = E S_{N⁺}² / (2 E S_{N⁺}) if X1 is non-arithmetic,
lim_{u→∞} E(S_N − u) = E S_{N⁺}² / (2 E S_{N⁺}) + h/2 if X1 is arithmetic with span h,

where N⁺ = inf{n ≥ 1 : S_n > 0} and

E S_{N⁺}² / (2 E S_{N⁺}) = EX1²/(2μ) − Σ_{n=1}^∞ E S_n⁻ / n,

where x⁻ = −min(x, 0).

Proof. See Siegmund (1985, Chapter 8).

One can use Theorem 3.4.4 in the expression for the average stopping time given by
E N = (1/μ)[u + E(S_N − u)].

Remark 3.4.3 If the X's themselves are non-negative, then each E S_n⁻ vanishes, so the sum Σ_{n=1}^∞ E S_n⁻ / n is identically equal to zero.
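For nonnegative increments the renewal-theoretic limit lim_{u→∞} E(S_N − u) = EX1²/(2μ) (non-arithmetic case) is easy to check numerically. The sketch below is an added illustration, not from the text; the rate λ = 2 and threshold u are arbitrary. For exponential increments μ = 1/λ and EX² = 2/λ², so the limiting overshoot EX²/(2μ) = 1/λ; by the memoryless property the overshoot is in fact exactly exponential for every u, and E N = (u + 1/λ)/μ by Wald's equation.

```python
import random

random.seed(2)
lam = 2.0           # X ~ Exponential(rate 2): mu = 1/2, EX^2 = 1/2
u, reps = 5.0, 20000

overshoots, stop_times = [], []
for _ in range(reps):
    s, n = 0.0, 0
    while s <= u:
        s += random.expovariate(lam)
        n += 1
    overshoots.append(s - u)
    stop_times.append(n)

mean_over = sum(overshoots) / reps   # should be near 1/lam = 0.5
mean_N = sum(stop_times) / reps      # should be near (u + 0.5)/0.5 = 11
print(mean_over, mean_N)
```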

¹A distribution on the real line is said to be arithmetic if it is concentrated on a set of points of the form 0, ±h, ±2h, .... The largest h with this property is called the span of the distribution.

Now we shall consider the generalized likelihood ratio test procedures. H. Robbins and his collaborators have proposed “open-ended tests” which, like the one-sided SPRTs, continue sampling indefinitely (with prescribed probability) when the null hypothesis is true and stop only if the alternative is to be accepted. Their methods are most effective in the case of testing a normal mean. Lorden (1973) has investigated the generalized likelihood ratio approach to the problem of open-ended tests for Koopman-Darmois families. Schwarz (1962) has shown that the approach leads to easily computable procedures. Since the tests are equivalent to simultaneous one-sided SPRTs, it is easy to obtain upper bounds on expected sample sizes. The main task is to bound the error probabilities, since the simple Wald approach is not applicable. For the one-parameter Koopman-Darmois family, Wong (1968) has shown that the error probabilities tend to zero faster than c ln c^{-1} (as c → 0) for simultaneous one-sided SPRTs, with c (c < 1) being the cost of each observation. Lorden (1973) obtains an explicit bound which is of the order of c ln c^{-1}. Let X1, X2, ... denote independent and identically distributed random variables having the density

f(x; θ) = exp{θx − b(θ)} (3.4.18)
with respect to a non-degenerate σ-finite measure. The stopping time N (possibly randomized) satisfies

P(N < ∞ | H0) ≤ γ for some 0 < γ < 1/3, (3.4.19)
where H0 : θ = θ0 and H1 : θ > θ0. (Reparametrize the density if necessary to shift the boundary point between the null and alternative hypotheses to zero, that is, θ0 = 0. Also, without loss of generality assume that b(0) = 0.) Let S_n = X1 + X2 + ⋯ + X_n, n = 1, 2, ..., and note that

ln [Π_{i=1}^n f(X_i; θ) / f(X_i; 0)] = θ S_n − n b(θ), (3.4.20)
so that one-sided SPRTs of f(x; 0) against f(x; θ), θ > 0, are given by: stop as soon as
S_n > [ln a^{-1} + n b(θ)] / θ (3.4.21)
for specified type I error probability a (0 < a < 1). (Notice that here we set β = 0 in A = (1 − β)/a.) The function b(·) is necessarily convex and infinitely differentiable in the interior of the interval on which it is finite, which need not be the entire natural parameter space² of the family of densities considered here. One can easily show that E_θ(X) = b′(θ) and Var_θ(X) = b″(θ), where the primes denote derivatives. It is easy to show that the information number E_θ{ln[f(X; θ)/f(X; 0)]} is the I(θ) given by

²The set of parameter points θ for which f(x; θ) is a probability density function is called the natural parameter space.

I(θ) = θ b′(θ) − b(θ), (3.4.22)
and the variance of ln{f(X; θ)/f(X; 0)} under θ is θ² b″(θ). Define a likelihood ratio open-ended test of H0 : θ = 0 against H1 : 0 < θ1 ≤ θ < θ̄ as a stopping time N(θ1, a): the smallest n ≥ 1 (or ∞ if there is no such n) such that
S_n > inf_{θ1 ≤ θ < θ̄} [ln a^{-1} + n b(θ)] / θ. (3.4.23)

For the alternative θ ≤ θ2 < 0, N(θ2, a) can be defined similarly. Although the infimum in (3.4.23) is computable in some cases, for example the case of the normal mean, it is simpler in many cases, such as the normal and negative exponential distributions, to formulate the continue-sampling inequality in terms of X̄_n = S_n/n and n, as in Schwarz (1962). First note that (3.4.23) is equivalent to
sup_{θ1 ≤ θ < θ̄} [θ S_n − n b(θ)] > ln a^{-1}. (3.4.24)
If b′(θ1) ≤ X̄_n < b′(θ̄), the supremum is attained at q(X̄_n), where q is the inverse of the increasing function b′. Then (3.4.24) is equivalent to

n I(q(X̄_n)) > ln a^{-1}. (3.4.25)

If X̄_n < b′(θ1), the supremum is achieved at θ1; and if X̄_n > b′(θ̄), the supremum is approached as θ → θ̄ (and attained at θ̄ if the latter belongs to the natural parameter space). Then we have the following theorem, which we state without proof.

Theorem 3.4.5 (Lorden, 1973). If N = N(θ1, a) with

then N satisfies (3.4.19) and

E_θ(N) ≤ [ln γ^{-1} + ln ln γ^{-1}] / I(θ) + 2 ln{⋯} / I(θ) + θ² b″(θ) / {I(θ)}² + 1 (3.4.26)

for all θ in [θ1, θ̄]. Further, if N satisfies (3.4.21), then

(3.4.27)

Example 3.4.2 Let X be distributed normally with mean θ and variance 1. Then b(θ) = θ²/2 and q(y) = y. Hence (3.4.25) is equivalent to

X̄_n² > 2 ln(1/a) / n.
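This continue-sampling inequality is simple enough to simulate. The sketch below is an added illustration (the choices θ1 = 0.25, a = 0.05, the alternative θ = 0.5 and the cap nmax are arbitrary). It implements the one-sided open-ended test for a normal mean, evaluating the supremum in (3.4.24) piecewise: at θ = X̄_n when X̄_n ≥ θ1, and at θ1 otherwise.

```python
import math, random

random.seed(3)

def glr_stop_time(theta_true, theta1, alpha, nmax):
    """One-sided open-ended GLR test for a Normal(theta, 1) mean,
    H0: theta = 0 versus H1: theta >= theta1 > 0.  Stop when
    sup_{theta >= theta1} [theta*S_n - n*theta^2/2] > ln(1/alpha)."""
    thresh = math.log(1.0 / alpha)
    s = 0.0
    for n in range(1, nmax + 1):
        s += random.gauss(theta_true, 1.0)
        xbar = s / n
        if xbar >= theta1:
            stat = n * xbar * xbar / 2.0           # supremum at theta = xbar
        else:
            stat = theta1 * s - n * theta1**2 / 2.0  # supremum at theta1
        if stat > thresh:
            return n
    return None  # never stopped within nmax

# Under theta = 0.5 the test stops after roughly ln(1/alpha)/I(theta)
# observations, with I(theta) = theta^2/2.
times = [glr_stop_time(0.5, 0.25, 0.05, 5000) for _ in range(200)]
mean_n = sum(times) / len(times)
print(mean_n)
```

Under θ = 0 the same loop would, with probability at least 1 − a, run forever, which is the open-ended behavior described above.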

3.5 Testing Three Hypotheses about Normal Mean

In the preceding sections we have developed sequential procedures that are appropriate for testing two hypotheses (simple or composite). However, there are applications that require a choice among three or more courses of action; hence the theory developed earlier is not adequate. For example, the tolerance limits set on a machine may be too high, too low or acceptable. As another example, let X, Y denote the breaking strengths of two yarns. We may be interested in testing whether X and Y are approximately equal or whether one is stochastically larger than the other. One can formulate the above problems in terms of sequential testing among three hypotheses. If the three hypotheses can be ordered in terms of an unknown parameter, a sequential test may be devised by performing two SPRTs simultaneously, one between each pair of neighboring hypotheses after they have been ordered. Armitage (1947, 1950) and Sobel and Wald (1949) obtained sequential tests which satisfy certain conditions on the error probabilities.

3.5.1 Armitage-Sobel-Wald Test

Let a random variable X be normally distributed having mean θ and known variance which, without loss of generality, can be set to unity. We are interested in accepting one of
H0 : θ = θ0, H1 : θ = θ1, H2 : θ = θ2, with θ0 < θ1 < θ2, (3.5.1)
on the basis of an i.i.d. sequence {X_n} (n = 1, 2, ...). Notice that Armitage (1947) has considered the above formulation, whereas Sobel and Wald (1949) consider H0 : θ = θ0, H1 : θ1 ≤ θ ≤ θ2, H2 : θ = θ3. Thus we are considering the special case of θ2 = θ1. Since T_n = X1 + ⋯ + X_n is sufficient for θ, the fixed-sample size procedure would be:
accept H0 if T_n ≤ t0,
accept H1 if t0 < T_n < t1, (3.5.2)
accept H2 if T_n ≥ t1,
where t0 and t1 are chosen subject to
P(reject H0 | H0) ≤ γ0, P(reject H1 | H1) ≤ γ1, P(reject H2 | H2) ≤ γ2. (3.5.3)
The sequential procedure is given by: Let R1 denote the SPRT for H0 versus H1 and R2 denote the SPRT for H1 versus H2. Then both R1 and R2 are carried out at each stage until

- either: one of the procedures leads to a decision to stop before the other; then the former is stopped and the latter is continued until it leads to a decision to stop,
- or: both R1 and R2 lead to a decision to stop at the same stage, in which case no further experimentation is conducted.
The final decision rule is:
Accept H0 if R1 accepts H0 and R2 accepts H1,
Accept H1 if R1 accepts H1 and R2 accepts H1,
Accept H2 if R1 accepts H1 and R2 accepts H2.
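The combined rule can be sketched in a few lines. The code below is an added illustration, not the authors' implementation; the hypothesized means θ0 = −1, θ1 = 0, θ2 = 1 and the common Wald bounds A = 19, B = 1/19 (roughly error probabilities .05) are arbitrary choices. It runs R1 and R2 on one data stream, lets the undecided SPRT continue after the other stops, and then applies the final decision table.

```python
import math, random

random.seed(4)

def sobel_wald(theta_true, thetas=(-1.0, 0.0, 1.0), A=19.0, B=1.0 / 19.0):
    """Run both SPRTs (R1: H0 vs H1, R2: H1 vs H2) on one Normal(theta, 1)
    stream until both have stopped; return index of accepted hypothesis."""
    t0, t1, t2 = thetas
    lam1 = lam2 = 0.0          # log likelihood ratios for R1 and R2
    dec1 = dec2 = None         # 'lower' = accept the lower hypothesis
    while dec1 is None or dec2 is None:
        x = random.gauss(theta_true, 1.0)
        if dec1 is None:
            lam1 += (t1 - t0) * (x - (t0 + t1) / 2.0)
            if lam1 >= math.log(A):
                dec1 = 'upper'                     # R1 accepts H1
            elif lam1 <= math.log(B):
                dec1 = 'lower'                     # R1 accepts H0
        if dec2 is None:
            lam2 += (t2 - t1) * (x - (t1 + t2) / 2.0)
            if lam2 >= math.log(A):
                dec2 = 'upper'                     # R2 accepts H2
            elif lam2 <= math.log(B):
                dec2 = 'lower'                     # R2 accepts H1
    if dec1 == 'lower' and dec2 == 'lower':
        return 0
    if dec1 == 'upper' and dec2 == 'lower':
        return 1
    if dec1 == 'upper' and dec2 == 'upper':
        return 2
    return 1  # (lower, upper) is impossible here with equal bounds; guard anyway

decisions = [sobel_wald(-1.0) for _ in range(500)]
frac_H0 = decisions.count(0) / len(decisions)
print(frac_H0)
```

With the true mean at θ0 the combined procedure accepts H0 in the overwhelming majority of runs, consistent with the error bounds of the two component SPRTs.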

Let the SPRT Rj be given by the stopping bounds (Bj, Aj) (j = 1, 2). Also let Δj = θj − θ_{j−1} and dj = (θj + θ_{j−1})/2 for j = 1, 2. Then at stage n the continue-sampling inequality for Rj is given by

n dj + (ln Bj)/Δj < Σ_{i=1}^n X_i < n dj + (ln Aj)/Δj. (3.5.4)

Now, we will show that acceptance of HO and H2 is impossible if

(B1, A1) = (B2, A2) = (B, A) and Δ1 = Δ2 = Δ. If H0 is accepted at the nth stage, then
Σ_{i=1}^n X_i ≤ n d1 + (ln B)/Δ.
However, since n d1 + (ln B)/Δ < n d2 + (ln B)/Δ, H2 is rejected at the nth stage. Next, suppose that H2 is accepted at the nth stage. That is,
Σ_{i=1}^n X_i ≥ n d2 + (ln A)/Δ.
However, since n d2 + (ln A)/Δ > n d1 + (ln A)/Δ, H0 is rejected at the nth stage. Thus, H0 and H2 cannot both be accepted. A geometrical representation of the rule R is given in Figure 3.5.1. The combined rule R can be stated as follows: continue sampling until an acceptance region (shaded area) is reached or both dashed lines are crossed. In the former case, stop and accept the relevant hypothesis as shown in Figure 3.5.1; in the latter case, stop and accept H1.

Lemma 3.5.1 (Sobel and Wald, 1949). Let Δ1 = Δ2 = Δ. A sufficient condition for the impossibility of accepting both H0 and H2 is

ln(A1/A2) ≤ Δ² and ln(B1/B2) ≤ Δ². (3.5.5)

Proof. Acceptance of H0 and H2 is impossible if and only if the rejection number of H0 for R1 is at most the rejection number of H2 for R2, and the acceptance number of H0 for R1 is at most the acceptance number of H2 for R2. That is, for every n ≥ 1,

n d1 + (ln A1)/Δ ≤ n d2 + (ln A2)/Δ and n d1 + (ln B1)/Δ ≤ n d2 + (ln B2)/Δ.
That is, for every n ≥ 1,

and this completes the proof. ■


Figure 3.5.1 The Sobel-Wald Procedure

Remark 3.5.1.1 Since n(θ2 − θ0)Δ/2 = nΔ² > 0, the inequalities (3.5.5) are surely satisfied when A1/A2 ≤ 1 and B1/B2 ≤ 1. Hereafter, unless otherwise stated, we assume that the stopping bounds of the two SPRTs are such that it is impossible to accept both H0 and H2. Let Nj [N] denote the stopping time associated with Rj [R] (j = 1, 2). Then one can easily see that N = max(N1, N2).

P(N > n) = P(N1 > n or N2 > n) ≤ P(N1 > n) + P(N2 > n) ≤ 2ρ^n → 0
as n tends to infinity, for some 0 < ρ < 1. Thus, the combined SPRT terminates finitely with probability one.

3.5.2 Choice of the Stopping Bounds

Here we indicate how to choose the constants (Bj, Aj), j = 1, 2, subject to (3.5.3). Let L(Hi | θj, R) denote the probability of accepting Hi when Hj is true, using R (i, j = 0, 1, 2).

(3.5.6)

(3.5.7)

where η = (1 − B2) / (A2 − B2).

(3.5.8)

Equations (3.5.6) and (3.5.7) yield

Also, the expressions for η and γ2 yield

Thus, by specifying η (besides γ0, γ1 and γ2), one can determine the stopping bounds A1, A2, B1 and B2. Clearly 0 < η ≤ min(γ1, 1 − γ2), provided A2 ≥ 1. If B1 ≤ 1 then η ≥ max(0, γ0 + γ1 − 1). Also, the sufficient conditions for the impossibility of accepting H0 and H2 will lead to meaningful lower and upper bounds for η.

OC Function

Let Lj(θ|R) denote the probability of accepting Hj using R when θ is the true value. Then we have L0(θ|R) = L0(θ|R1), since this is the probability of a path starting at the origin and leaving the lower boundary of R1 without ever touching the upper boundary of R. Similarly,

L2(θ|R) = L2(θ|R2). (3.5.9)

Also, because the combined procedure R terminates with probability one, we have

However, we know

where and

where

3.5.3 Bounds for ASN

Let R1* be the SPRT having stopping bounds (B1, ∞) and R2* be the SPRT associated with the stopping bounds (0, A2). That is, R1* says continue taking observations until R1 accepts H0, and R2* says continue sampling until R2 accepts H2. Thus, R must terminate not later than R1* or R2*. If N1* (N2*) denotes the stopping time associated with R1* (R2*), then N ≤ N1* and N ≤ N2*, and hence we have

Furthermore, since N = max (N1,N2) , we have

because

Similarly, the other inequality can be obtained and consequently

(3.5.12)

Neglecting the excess over the boundary and from (2.4.4) we have

(3.5.13)
since the probability of accepting H1 with R1* is zero. Similarly,
(3.5.14)
since the probability of accepting H1 with R2* is zero. Sobel and Wald (1949) numerically evaluate the preceding upper and lower bounds for various values of θ. Several remarks are in order.

Remark 3.5.1 Although the Sobel-Wald procedure is not an optimum procedure (in the sense that the terminal decision is not, in every case, a function of only the sufficient statistic, namely the mean of all the observations), it is simple to apply, the OC function is known (after neglecting the excess over the boundary) and bounds for the ASN are available. Sobel and Wald (1949) claim that their procedure is not far from being optimum and that, when compared with a fixed-sample size procedure having the same maximum probability of making a wrong decision, the sequential procedure requires, on the average, substantially fewer observations to reach a final decision.

Remark 3.5.2 Due to the monotonicity of the OC function, the Sobel-Wald procedure is applicable for testing

H0 : θ ≤ θ0′, H1 : θ0′ < θ < θ1′, H2 : θ ≥ θ1′, where

Remark 3.5.3 Although we have formulated the procedure for the normal density, one can easily set up the procedure for an arbitrary probability density f(x; θ). However, it may not be easy to evaluate the OC function for an arbitrary f(x; θ). Furthermore, all these considerations can be extended to k-hypotheses testing problems (k > 3). Simons (1967) developed a sequential procedure for testing (3.5.1) in the particular case when {Xi} are i.i.d. normal with unknown mean θ and known variance σ². Numerical computations indicate that his procedure is as efficient as the Sobel-Wald procedure. Although Simons' (1967) procedure is a little more flexible, it would be somewhat difficult to extend it to non-normal populations. Armitage (1950), independently of Sobel and Wald (1949), has proposed a sequential procedure for the k-hypotheses testing problem, which is related to classification procedures. When specialized to three hypotheses his procedure is as follows:

Armitage's Procedure

Let Λ_{ij,n} = f(X_n; θi) / f(X_n; θj), where f(X_n; θ) denotes the joint probability or density function of X_n = (X1, X2, ..., Xn). Also, let A_{ij} (i, j = 0, 1, 2) be some positive constants. At stage n,

accept H_i if Λ_{ij,n} ≥ A_{ij} for all j ≠ i;
otherwise take one more observation. It should be fairly easy to see (via the strong law of large numbers, or since the procedure is a combination of several SPRTs which surely terminate finitely) that Armitage's procedure also terminates finitely with probability one. Armitage (1950) provides some crude bounds for the error probabilities. Let L_i(θj) denote the probability of accepting H_i when H_j is true. By considering the total probability of all sample points which call for accepting H_i, we see that

1 > L_i(θi) ≥ A_{ij} L_i(θj), (i, j = 0, 1, 2, i ≠ j)
(proceed as in Theorem 2.2 for obtaining Wald's bounds). That is,
L_i(θj) < 1/A_{ij}. (3.5.15)
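Armitage's stopping rule is easy to sketch for simple normal-mean hypotheses. The code below is an added illustration (not Armitage's own computation); the means and the common constant A_{ij} = 19 are arbitrary choices, so by (3.5.15) each individual error probability is bounded by 1/19.

```python
import math, random

random.seed(5)

def armitage(theta_true, thetas=(-1.0, 0.0, 1.0), logA=math.log(19.0)):
    """Accept H_i at the first stage n at which log Lambda_{ij,n} >= log A_{ij}
    for every j != i (all A_{ij} equal here); Normal(theta, 1) observations."""
    s, n = 0.0, 0
    while True:
        s += random.gauss(theta_true, 1.0)
        n += 1
        def llr(i, j):
            # log Lambda_{ij,n} = (theta_i - theta_j)*S_n - n*(theta_i^2 - theta_j^2)/2
            ti, tj = thetas[i], thetas[j]
            return (ti - tj) * s - n * (ti * ti - tj * tj) / 2.0
        for i in range(3):
            if all(llr(i, j) >= logA for j in range(3) if j != i):
                return i

decisions = [armitage(-1.0) for _ in range(500)]
frac_H0 = decisions.count(0) / len(decisions)
print(frac_H0)
```

Note the structural difference from the Sobel-Wald rule: sampling stops only at a stage where one hypothesis dominates all the others simultaneously.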

Also, if the procedure is closed (that is, L0(θ) + L1(θ) + L2(θ) = 1),

Not much is known about the ASN for this procedure, although the Sobel-Wald bounds for the ASN might still hold. Armitage (1950) applies his procedure to a binomial testing problem.

Remark 3.5.4 Armitage’s (1950) procedure has some nice features compared to the Sobel-Wald procedure. The difficulty in the dashed area (see Figure 3.5.1) is avoided by performing an SPRT of Ho versus H2 in this region. Also, sampling terminates only when all the SPRTs terminate simultaneously.

3.5.4 Testing Two-sided Alternatives for Normal Mean

Interest in testing a two-sided alternative hypothesis arises naturally in the following industrial setting. Suppose a measuring device has a zero setting which is liable to shift, so we will be interested in resetting the device or not on the basis of reported measurements on the “standard”. If an appreciable bias, expressed as a standardized departure from the known true reading in either direction, is indicated, then the instrument is taken out of service and reset. Let X be normally distributed with unknown mean θ and known variance σ². Suppose we are interested in testing H0 : θ = θ0 against H1 : |θ − θ0| = δσ for specified δ. We can run the sequential t²-test discussed in Section 3.2. Alternatively, as suggested by Armitage (1947), one can carry out an SPRT R+ of H0 against H+ : θ = θ0 + δσ with error probabilities α/2 and β, and also carry out an SPRT R− of H0 against H− : θ = θ0 − δσ with error probabilities α/2 and β. Then the combined decision rule is:

accept H0 if R+ accepts H0 and R− accepts H0,
accept H+ if R+ accepts H+ and R− accepts H0,
accept H− if R+ accepts H0 and R− accepts H−.

Note that the acceptance regions of H+ and H− will be symmetrically placed. (Draw a diagram with n on the horizontal axis and ΣXi on the vertical axis.) Also, acceptance of both H+ and H− will be impossible provided β + α/2 < 1. One exception to the above terminal rule is: accept H0 immediately after the path leaves the region common to both the continue-sampling regions of R+ and R−, instead of waiting till the path reaches one of the acceptance regions stated above. This test has P(rejecting H0 | H0) = α and P(accepting H0 | H+) = P(accepting H0 | H−) = β. This procedure is suitable if one is concerned not only about the magnitude of the shift but also its direction. Another situation where we will be interested in testing three hypotheses is the following. Let X be normally distributed with unknown mean μ and known variance σ². Suppose we wish to test H0 : μ = μ0 against the alternative hypothesis H1 : μ ≠ μ0. The composite hypothesis is usually replaced in practice by two simple hypotheses Hi : μ = μi (i = −1, 1), where μ−1 < μ0 < μ1. A test procedure is sought such that P(accepting H0 | H0) = 1 − α and P(rejecting H0 | μ = μi) = 1 − β for i = ±1. Billard and Vagholkar (1969) have proposed a sequential procedure for the above hypothesis-testing problem, which is defined as follows (see Figure 3.5.2). Let Sn = Σ_{i=1}^n Xi denote the sum of the n observations at the nth stage.


Figure 3.5.2 The Test Procedure of Billard and Vagholkar

First let an initial sample of size n0 ≥ 2 be taken. Then continue taking observations until the sample path traced out by the point (n, Sn) crosses either of the boundary lines LA, LA′ (when H1 is accepted), or either of the lines BC, BQ and CR (when H0 is accepted), or either of the lines MD and DS (when H−1 is accepted). This procedure will terminate finitely with probability one since N = min(N1, N−1), where Ni is the stopping time of the SPRT for H0 versus Hi (i = ±1), and we know that the Ni are finite with probability one. Note that the procedure is completely specified in terms of the geometric parameters (n0, a, b, c, d, φ, ψ), which are determined so as to minimize the ASN function subject to certain constraints on the OC function. In general, there are seven geometrical parameters which completely define the sequential test procedure. However, in the symmetrical case, that is, when μ0 = 0, −μ−1 = μ1 = μ and σ² = 1, we have c = −b, d = −a and ψ = −φ. Hence, there remain only four parameters, namely (n0, a, b, φ), which need to be specified. The optimum set of values of these parameters, which minimize the ASN function Eμ(N) for some specified μ subject to the following constraints on L(μ), was obtained by Billard and Vagholkar (1969):

L(μ0) ≥ 1 − α, L(μi) ≤ β, i = ±1,
where α and β are the preassigned error probabilities and L(μ) = P(accepting H0 | μ). Further, Billard and Vagholkar (1969) obtain explicit expressions for L(μ) and Eμ(N) and, on the basis of some simulation study, claim the superiority of their test over that of Sobel and Wald. They attribute this to the fact that the B-V procedure is based on the sufficient statistic ΣXi.

3.6 The Efficiency of the SPRT

In this section we will study the efficiency of the SPRT not only at the hypothesized values of the parameter but also at other values of the parameter. The behavior of the relative efficiency of the SPRT when compared with the fixed-sample size procedure in the case of normal populations is studied when the error probabilities are related in a certain way.

3.6.1 Efficiency of the SPRT Relative to the Fixed-Sample Size Procedure at the Hypotheses

Let {Xi} be i.i.d. having f(x; θ) for the common probability (density) function, with θ ∈ Θ. For the sake of simplicity assume that θ is real and Θ is a part of the real line. We are interested in testing H0 : θ = θ0 against H1 : θ = θ1, where θ0 ≠ θ1, subject to the prescribed error probabilities α and β. By Theorem 2.8.1 the SPRT minimizes the expected sample sizes at both the hypotheses. Given any competing procedure D one can study the efficiency of the SPRT relative to D at θ = θ0, θ1. We shall, in particular, study the amount of saving achieved by the SPRT relative to the corresponding optimum fixed-sample test for the same hypotheses. If the optimum fixed-sample size is n(α, β), the relative efficiency of the SPRT at θ ∈ Θ is defined by

Re(θ) = n(α, β) / E_θ(N), (3.6.1)
where N is the stopping time of the SPRT. We note that 100[Re(θ) − 1]/Re(θ) is the average percentage saving achieved by the SPRT over the optimum fixed-sample size test when θ is the true value of the parameter. In particular, let X be normal with mean μ and known variance σ², where we shall also assume that |μ1 − μ0| is small so that the approximations for E_μi(N) (i = 0, 1) in (2.4.2) and (2.4.3) are reasonable. For the fixed-sample size case one can easily verify that if we reject H0 when Σ_{i=1}^n X_i > c,

n(α, β) = [σ²(z_{1−α} + z_{1−β})² / (μ1 − μ0)²] + 1, (3.6.2)
provided α + β < 1, where z_γ is defined by Φ(z_γ) = γ and [x] denotes the largest integer contained in x. Thus, it follows from (2.4.2), (2.4.3) and (2.10.2) that
Re(μ0) = (z_{1−α} + z_{1−β})² / {2[(1 − α) ln((1 − α)/β) − α ln((1 − β)/α)]}
and
Re(μ1) = (z_{1−α} + z_{1−β})² / {2[(1 − β) ln((1 − β)/α) − β ln((1 − α)/β)]}. (3.6.3)

In particular, when α = β,
Re(μ0) = Re(μ1) = 2z²_{1−α} / [(1 − 2α) ln((1 − α)/α)]. (3.6.4)

Table 3.6.1 below shows approximate values of the efficiency of the SPRT relative to the fixed-sample size procedure for the normal mean when σ is known. Note that (Re − 1)100/Re indicates the percentage of saving by the SPRT.

Table 3.6.1
α                .005    .01     .05     .10
Re               2.540   2.411   2.042   1.864
(Re − 1)100/Re   60.6    58.5    51.0    46.4
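The α = β entries can be recomputed from Wald's approximations: substituting the ASN approximation at the hypotheses and the fixed sample size into (3.6.1), neglecting the excess over the boundary, gives Re ≈ 2z²_{1−α}/[(1 − 2α) ln((1 − α)/α)]. The sketch below is an added illustration using the standard library's normal quantile; it reproduces the table to within rounding.

```python
import math
from statistics import NormalDist

def re_equal_errors(alpha):
    """Approximate relative efficiency Re of the SPRT versus the optimal
    fixed-sample test of a normal mean (sigma known) when alpha = beta,
    using Wald's ASN approximation (overshoot neglected)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 2.0 * z * z / ((1.0 - 2.0 * alpha) * math.log((1.0 - alpha) / alpha))

for a in (0.005, 0.01, 0.05, 0.10):
    re = re_equal_errors(a)
    saving = (re - 1.0) * 100.0 / re
    print(a, round(re, 3), round(saving, 1))
```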

From Table 3.6.1 one can reasonably guess that the percentage saving would be at least 40 for all combinations of α and β. It should be noted that the expressions in (3.6.3) serve as upper bounds for Re, because we have used expressions for E_μ0(N) and E_μ1(N) which are essentially lower bounds (for instance, see Theorem 2.6.3). Under certain regularity conditions on the probability or density function f(x; θ), Paulson (1947) has shown that, when θ1 is close to θ0, the efficiency of the SPRT relative to the optimum fixed-sample size procedure is free of the particular form of f(x; θ) and the particular values θ0 and θ1. It should be noted that θ could be vector-valued.

Theorem 3.6.1 (Paulson, 1947). Let {Xi} be a sequence of i.i.d. random variables having f(x; θ) for the common probability or density function, where θ is assumed to be scalar. For θ in the neighborhood of θ0 and θ1, assume that E_θ(1) = ∫ f(x; θ) dx can be differentiated twice with respect to θ under the integral sign, and that E_θ[∂² ln f(X; θ)/∂θ²] is uniformly continuous in θ. Then the formulae for Re(θ0) and Re(θ1) of the SPRT of θ = θ0 against θ = θ1 relative to the most powerful fixed-sample procedure are respectively given by (3.6.3) and (3.6.4) when Δ = θ1 − θ0 tends to zero.

Proof. We shall show that Re(θ0) tends to the expression in (3.6.3); an analogous argument leads one to assert that Re(θ1) tends to the expression in (3.6.4).

Let us denote the information contained in X about 8j by

Expanding in Taylor series we have

where

and

Since we have
E_θ0(z) = −(1/2) Δ² I(θ0) + o(Δ²),
using the first expansion, and

Var_θ0(z) = Δ² I(θ0) + o(Δ²),
using the second expansion, because the first and second derivatives of ∂ ln f(x; θ)/∂θ are uniformly continuous in θ in the neighborhood of θ0. Hence from (2.4.2) we have

Notice that o(1) → 0 as Δ → 0. Next, let n(α, β) denote the sample size required by the most powerful non-sequential test, which will be based on S_n = Σ_{i=1}^n z_i. Now, n(α, β) = n is determined by

Since |Δ| is small, n will be large and hence the central limit theorem is applicable. Thus
[c − n E_θ0(z)] / [n Var_θ0(z)]^{1/2} = z_{1−α}
and

Solving for n (a,p), we obtain

(3.6.6)

Now, as Δ → 0, I(θ1) → I(θ0). Thus, considering n(α, β)/E_θ0(N), we obtain (3.6.3). The expression for Re(θ1) can be obtained by interchanging the roles of α and β in (3.6.3) (because interchanging the roles of θ0 and θ1 is equivalent to interchanging the roles of α and β). Thus, the percentage savings indicated in Table 3.6.1 will also apply to the SPRT of θ = θ0 against θ = θ1 for arbitrary f(x; θ), provided |θ1 − θ0| is small.

3.6.2 Relative Efficiency at θ ≠ θ0, θ1

Although the null and alternative hypotheses are simple, it is conceivable that the unknown value of θ is different from θ0 and θ1. The OC function of the SPRT at values of θ lying between θ0 and θ1 is usually not of much interest, since one is indifferent as to whether θ = θ0 is accepted or rejected. The performance of the ASN function at such values, however, is of considerable interest, since the optimum property of the SPRT holds only at θ = θ0 or θ1. If E_θ(N) is a continuous function of θ, one would expect that the results of the preceding subsection should hold for θ in the neighborhood of θ0 and θ1. In general, if α and β are not too small, then sup_θ E_θ(N) < n(α, β). Whether the maximum ASN of the SPRT is less than n(α, β) can be easily verified in a given situation by using Wald's approximations. We note that it is quite possible that E_θ(N) > n(α, β) for all θ ∈ Θ′, where Θ′ ⊂ Θ. In this case, if Θ ⊂ R then Θ′ is typically an interval of values located between θ0 and θ1. We illustrate this feature by Wald's SPRT and the optimum fixed-sample size test of the normal mean μ with known variance σ², namely H0 : μ ≤ μ0 against H1 : μ ≥ μ1, −∞ < μ0 < μ1 < ∞. Let α = β < 1/2. The monotonicity and the symmetry of the OC function yield
E_μ(N) < E_μ0(N) = E_μ1(N) < n(α, α) for μ < μ0 or μ > μ1,
and sup_μ E_μ(N) = E_μ̄(N), μ̄ = (μ0 + μ1)/2. From (3.6.2) and the relations

z = (μ1 − μ0)(2X − μ1 − μ0) / (2σ²),
where z is normally distributed with mean (μ1 − μ0)(2μ − μ1 − μ0)/(2σ²) and variance (μ1 − μ0)²/σ², we obtain

(3.6.7)

(after noting that A = (1 − α)/α, h(μ̄) = 0 and using (2.5.8)). Consequently,
inf_μ Re(μ) = n(α, α) / sup_μ E_μ(N) = [2 z_{1−α} / ln((1 − α)/α)]² = ψ(α), for 0 < α < 1/2. (3.6.8)

We present in Table 3.6.2 the values of ψ(α) for some values of α. From Table 3.6.2 we guess that inf_μ Re(μ) is monotonically increasing. It is easy to verify (noting that dz/dα = −1/φ(z) and using l'Hospital's rule) that

lim_{α→0} inf_μ Re(μ) = 0 and lim_{α→1/2} inf_μ Re(μ) = π/2.

Table 3.6.2 Approximate Values of inf_μ Re(μ) When Testing the Normal Mean (σ Known)
α              .005    .01     .05     .10
inf_μ Re(μ)    .950    1.028   1.250   1.357

In order to establish the monotonicity of inf_μ Re(μ), it is sufficient to examine the derivative of (1/2)[ψ(α)]^{1/2}. Hence, it suffices to show that h(α) > 0 for 0 < α < 1/2, where

h(α) = z φ(z) − α(1 − α) ln((1 − α)/α), with z = z_{1−α}.

Set 104 CHAPTER 3. TESTS FOR COMPOSITE HYPOTHESES

Then we can rewrite h(α) as

using the mean value theorem and the fact that g(0) = 0, where 0 ≤ z* ≤ z. This is possible because g(z) is continuous and differentiable. Thus, in order to show that h(α) ≥ 0, it suffices to show that

Also, since φ(z) > 0 and ln((1 − α)/α) > 0 for all positive finite z, it suffices equivalently to show that

Notice that H(0) = 0 and H(∞) = 0, and
H′(z) = Φ(z)[1 − Φ(z)] − 2φ²(z).

Since H′(0) < 0, H(z) is decreasing near z = 0. Also, for large z, since

we have (3.6.9)

Also, H(z0) = 0 implies that
H′(z0) > 0, (3.6.10)
because H(z0) = 0 implies that z0 Φ(z0)[1 − Φ(z0)] = φ(z0)[2Φ(z0) − 1]. Hence

by applying the mean value theorem and noting that 0 < z* < z0. Now H(z) cannot intersect the z-axis exactly once, because then H′(z) would have to be negative for large z, which contradicts (3.6.9). Also, H(z) cannot intersect the z-axis more than once, because H′(z) would have to be negative at the second intersection, and this cannot happen because of (3.6.10). In Table 3.6.3 we give some values of h(α). By using Newton's method, it is easily seen that the root of the equation ψ(α) = 1 lies between α = .0080 and α = .0081. Consequently, for α ≤ .008 Wald's SPRT is less efficient than the corresponding fixed-sample size procedure.

Table 3.6.3 Values of h(α)
α      .01     .05     .10     .20     .25     .30     .40     .45     .49
h(α)   1.60 ×  2.99 ×  2.73 ×  3.69 ×  5.71 ×  4.80 ×  4.46 ×  2.73 ×  1.69 ×
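The location of the root of ψ(α) = 1 can be confirmed numerically. The sketch below is an added illustration; it takes ψ(α) = [2z_{1−α}/ln((1 − α)/α)]², which reproduces Table 3.6.2 to within rounding, and brackets the root by simple bisection instead of Newton's method.

```python
import math
from statistics import NormalDist

def psi(alpha):
    """inf_mu Re(mu) for the SPRT of a normal mean with alpha = beta:
    psi(alpha) = [2 z_{1-alpha} / ln((1-alpha)/alpha)]^2."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return (2.0 * z / math.log((1.0 - alpha) / alpha)) ** 2

for a in (0.005, 0.01, 0.05, 0.10):
    print(a, round(psi(a), 3))

# Bisection for the root of psi(alpha) = 1 on (0.001, 0.1),
# where psi is increasing.
lo, hi = 0.001, 0.1
for _ in range(60):
    mid = (lo + hi) / 2.0
    if psi(mid) < 1.0:
        lo = mid
    else:
        hi = mid
print(lo)  # the root; the text locates it between .0080 and .0081
```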

3.6.3 Limiting Relative Efficiency of the SPRT

The limiting relative efficiency of the SPRT has also been studied by Chernoff (1956), Aivazian (1959) and Bechhofer (1960). The statement “the SPRT often results in an average saving of 50 percent in sample size” needs to be qualified. Bechhofer (1960) has studied the limiting relative efficiency of the SPRT for the normal case when the error probabilities are related to each other and both tend to zero. He brings out some surprises that are never anticipated. In the following we shall present his results. Let X1, X2, ... be an i.i.d. sequence of normal variables having unknown mean μ and known variance σ². We wish to test the hypothesis H0 : μ = μ0 against the alternative H1 : μ = μ1 (μ1 > μ0). Let α and β be the bounds on the error probabilities such that 0 < α, β < 1 and α + β < 1. Bechhofer (1960) studied the efficiency (measured in terms of the ratio of average sample size to the fixed sample size) of the SPRT relative to the best competing fixed-sample procedure as α, β approach zero in a specified manner. Denote this efficiency by Re(α, β, δ, δ*), where δ = [2μ − (μ0 + μ1)]/2σ and δ* = (μ1 − μ0)/2σ. Hence μ = μ0 and μ = μ1 are equivalent to δ = −δ* and δ = δ* respectively. Then

When d = 1, that is, β = cα,
lim_{α→0} Re(α, cα, δ, δ*) = δ*/(4|δ|) = (μ1 − μ0) / (4|μ0 + μ1 − 2μ|). (3.6.12)
Thus in the limit the relative efficiency given by (3.6.12) tends to zero as μ → ±∞, and is equal to 1/4 when μ = μ0 or μ = μ1. It is greater than unity when (5μ0 + 3μ1)/8 < μ < (3μ0 + 5μ1)/8, and it is infinite if μ = (μ0 + μ1)/2. The relative efficiency of 1/4 when μ = μ0 or μ1 was previously noted by Chernoff (1956, p. 19), and the result in (3.6.11) for δ = δ* has been obtained by Aivazian (1959). Both Chernoff and Aivazian considered the general problem of testing a simple hypothesis versus a simple alternative and studied the limiting relative efficiency as the two hypotheses approach each other.

3.7 Bayes Sequential Procedures

In this section we will give Bayes sequential procedures for the binomial propor- tion, normal mean and some asymptotic Bayes sequential procedures.

3.7.1 Bayes Sequential Binomial SPRT Suppose that batches of items are available for inspection, and the items must be classified as either effective or defective. Let p denote the fraction of defective items in a batch. Assume that there exists a known critical fraction defective p0 such that a batch with fraction defective p0 may, without loss, be accepted or rejected. Batches with p > p0 are considered bad and should be rejected. Let P(p) denote the prior distribution for the fraction of defective items in batches available for inspection, and assume that it is of the following type:

Let W21 (W12) denote the loss incurred if a batch with fraction defective p1 (p2) is rejected (accepted). Let c denote the cost of inspecting a single item. Then we wish to determine the most economical sampling inspection plan. Vagholkar and Wetherill (1960) give a method, based on the basic theory developed by Barnard (1954), which will be presented below. Because of the special form of the prior distribution, the problem of acceptance sampling is reduced to that of testing two simple hypotheses H_i (i = 1, 2), where H_i means that a batch comes from a population with fraction defective p_i (i = 1, 2). We accept a batch if H1 is accepted, and we reject the batch if H2 is accepted. In the sampling plan, one item is inspected at a time, and inspection is stopped as soon as sufficient evidence is accumulated in favor of either of the hypotheses. If the cost of the inspection depends merely on the total number of items inspected and no extra cost is involved due to sequential sampling, a sequential plan will be the most economical one. Also, because of the optimum property of the SPRT, the latter will be the optimum test procedure when the cost of inspection is linear in the number of items inspected. The optimum procedure is given by:

(i) continue inspection as long as

λ2 < λ = (a1/a2) l(X, Y) < λ1,  (3.7.2)

(ii) stop inspection and accept the batch as soon as λ ≥ λ1,

(iii) stop inspection and reject the batch as soon as λ ≤ λ2,

where X and Y denote, respectively, the numbers of effectives and defectives obtained at any stage, λ represents the ratio of the weighted likelihoods, and l(X, Y) is the likelihood ratio given by (p1/p2)^Y (q1/q2)^X, q_i = 1 − p_i (i = 1, 2). It is more convenient to write Inequality (3.7.2) as

ln(λ2 a2/a1) < Y ln(p1/p2) + X ln(q1/q2) < ln(λ1 a2/a1).  (3.7.3)

Thus, when λ = λ1, the posterior probability that H1 is true is λ1/(1 + λ1) and the posterior probability that H2 is true is 1/(1 + λ1). Then we accept H1, and thereby incur a loss W12 if H2 is true. Thus, the expected loss due to an immediate decision is W12/(1 + λ1). On the other hand, if we inspect one more item, incurring a cost c, and continue the procedure thereafter, we accept the batch if the next item is an effective one, since λ1 q1/q2 > λ1. The cost W12 will be incurred if H2 is true, so that the expected loss is

W12 P(H2 is true) P(next item is effective | H2 is true) = W12 q2/(1 + λ1).

If the next item is a defective, the likelihood ratio will be λ1 p1/p2, and we shall continue with a (new) SPRT starting at this point. If the numbers of effective and defective items obtained with this new test are (X′, Y′), then we continue sampling if

Clearly, if λ2/λ1 ≥ p1/p2 we reject the batch, so that sampling continues only if λ2/λ1 < p1/p2. The cost due to this continuation is

if H1 is true and ρ < δ, where δ = p1/p2 and ρ = λ2/λ1. If H2 is true and ρ < δ, the cost due to continuation of sampling is

If ρ > δ, we reject the batch, and the cost is W21 p1 λ1/(1 + λ1). Now, equating the expected loss of an immediate decision to the expected loss if at least one more item is inspected, we have, for ρ < δ,

(3.7.4)

and, for ρ > δ,

W12/(1 + λ1) = c + W21 p1 λ1/(1 + λ1) + W12 q2/(1 + λ1).  (3.7.5)

Solving for λ1 from (3.7.4), we get, for ρ < δ,

(3.7.6)

and, for ρ > δ, from (3.7.5) we have

λ1 = (W12 p2 − c)/(W21 p1 + c).  (3.7.7)

A similar argument leads to the following equation for λ2:

λ2 = [W12 q2 L(·; p2) + c + q2 c L(·; p2)] / [W21 q1 L(·; p1) − c − q1 c L(·; p1)].  (3.7.8)

Dividing (3.7.8) by (3.7.6), we have, for ρ < p1/p2,

and dividing (3.7.8) by (3.7.7), we have, for ρ > p1/p2,

Using Equations (3.7.9) and (3.7.10), one can solve for ρ by the method of iteration. The usual way of determining the boundaries of an SPRT is to use Wald's approximate formulae, which assume the error probabilities to be small; this is not always true when sampling inspection plans are designed on a minimum-cost basis. Hence, we would prefer some type of exact formulae that are useful from a practical point of view. Burman (1946) provided such formulae, which will be discussed next.

The SPRT for a binomial population as defined by (3.7.3) can be reduced (without much loss in accuracy) to a scoring procedure given by

(3.7.11)

where b = ln(p2/p1)/ln(q1/q2). If b, M1 and M2 are rounded off to the nearest integers, the SPRT reduces to the following scoring scheme. Start with a score M2; add one to the score for each effective item found, and subtract b from the score for each defective item found. Reject the batch if the score is zero or less, and accept the batch if the score reaches 2M = M1 + M2. Formulae for the ASN and OC functions of such a scheme have been given by Burman (1946); they are exact if b, M1 and M2 are integers. The error involved in rounding them off is small if b exceeds ten, a condition often satisfied in practice. One can express h(ρ) and h1(ρ), given by Equations (3.7.9) and (3.7.10), in terms of the score notation by replacing

and n(δ, ρb; p_j) by n(b, 2M − b; p_j), for j = 1, 2, where b and 2M = −ln ρ/ln(q1/q2) are rounded off to the nearest integers.
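Burman's scoring scheme is easy to simulate directly. The sketch below is a Monte Carlo illustration (not Burman's exact formulae) that estimates the OC function, i.e., the probability of accepting the batch, using the rounded scores b = 24 and M1 = M2 = 34 that arise in Example 3.7.1 below.

```python
import random

def score_run(p, b=24, m2=34, m_total=68, rng=None):
    # One inspection run of the scoring scheme: start at score m2, add 1 for
    # each effective item, subtract b for each defective item; reject the
    # batch at score <= 0, accept at score >= m_total = M1 + M2.
    rng = rng or random.Random()
    score, n = m2, 0
    while 0 < score < m_total:
        n += 1
        score += -b if rng.random() < p else 1
    return score >= m_total, n      # (batch accepted?, items inspected)

def oc(p, reps=2000, seed=7):
    # Monte Carlo estimate of the OC function (acceptance probability).
    rng = random.Random(seed)
    return sum(score_run(p, rng=rng)[0] for _ in range(reps)) / reps

oc_good, oc_bad = oc(0.01), oc(0.10)
print(oc_good, oc_bad)   # high acceptance at p1 = .01, low at p2 = .10
```

The same simulation, averaging the second component of score_run, would give a Monte Carlo estimate of the ASN function.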

The Methods of Calculation

In practice, one fixes the values of a1, a2, p1, p2, W12, W21 and c, and then computes the value of b. In order to solve Equations (3.7.9) and (3.7.10) (which are expressed in terms of the score notation), we start with some guessed value for ρ and then iterate until we get the same value of ρ. In the right-hand-side expressions of the modified versions of (3.7.9) and (3.7.10), ρ enters through 2M, which is an integer; so when we obtain a ρ which gives rise to the same value of 2M that was used in the previous iteration, we stop and take that as our final iterated solution. Once ρ is obtained, λ1 and λ2 can be computed from (3.7.6) and (3.7.8), or (3.7.7) and (3.7.8), as the case may be. The value of ρ is always less than one, and a lower bound for ρ has been derived by Vagholkar (1955), which is given by

(3.7.12)

The lower bound, or any number a little higher than the lower bound, can serve as a good first guess for ρ in order to start the iteration.

Example 3.7.1 (Vagholkar and Wetherill, 1960). Let a1 = 5/9, a2 = 4/9, p1 = .01, p2 = .10, W21 = 400, W12 = 500 and c = 1. We get b = 24, and by (3.7.12) we have .00065 < ρ < 1. Starting with a guessed value of .001 for ρ, the successive values of ρ obtained are

ρ:   .001    →  .001607  →  .001600
2M:  72         68          68

This gives λ1 = 30.5, λ2 = .0489, and the optimum test procedure (3.7.11) is given by M2 = 34, 2M = M1 + M2 = 68, b = 24.
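The rounded constants of this example can be checked directly. The two lines below assume the sign convention b = ln(p2/p1)/ln(q1/q2) (the text's formula, with the sign chosen so that b > 0) and use the final iterated value ρ ≈ .0016.

```python
import math

p1, p2 = 0.01, 0.10
q1, q2 = 1 - p1, 1 - p2
rho = 0.0016                      # final iterated value from the example

b = math.log(p2 / p1) / math.log(q1 / q2)   # score per defective item
two_m = -math.log(rho) / math.log(q1 / q2)  # total score span 2M

print(round(b), round(two_m))     # prints 24 68, as in the example
```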

Remark 3.7.1 It has been assumed that the cost of inspection is proportional to the number of items inspected. Also, a difficulty in carrying out an SPRT arises in that a decision is required after every inspection of an item as to whether to continue sampling or to pass judgement on the batch. This can be taken care of in the cost function by adding an extra term, namely Dd, where D denotes the number of times the decision to continue sampling or pass judgement on the batch has to be made, and d denotes the cost associated with stopping the sampling in order to make this decision. Further, one can also add on the cost of processing the items sampled. This may be equal to Tt, where T denotes the number of times the processing is carried out, and t denotes the cost associated with each processing. Thus, the total cost becomes

cost = nc + Dd + Tt.  (3.7.13)

In this case the optimum test will be an SPRT with items sampled in groups (not necessarily of a constant size). If the constant group size is k, then (3.7.13) becomes

cost = nc + (n/k)d + (n/k)t = nc′,  c′ = c + d/k + t/k,  (3.7.14)

which is linear in n. Two particular results apply to group sequential sampling, which follow from the results of Wald and Wolfowitz (1948). These are stated as lemmas.

Lemma 3.7.1 The risk associated with a group sequential sampling scheme is greater than or equal to the risk for a unit-step sequential scheme, for given prior probability and loss functions.

Lemma 3.7.2 The optimum boundaries for group sequential sampling are within the optimum boundaries for the unit-step sequential sampling having the same prior probability and loss functions.

Remark 3.7.2 The above lemmas are useful in practice, because for group sequential sampling we can derive the optimum boundaries from unit-step sequential sampling using c′, the artificial cost per item given in (3.7.14). These boundaries will contain the true optimum boundaries. If the group size is large, and if we can replace the two-point binomial distribution by an equivalent two-point normal distribution, a more exact solution is possible.

Remark 3.7.3 The three point binomial distribution is difficult to handle mathematically. However, Vagholkar and Wetherill (1960) and Wetherill (1957) provide a result pertaining to the three point binomial distribution.

3.7.2 Dynamic Programming Method for the Binomial Case In the previous work of Vagholkar and Wetherill (1960), the set of values of p is replaced by two special values of p, namely p1 and p2, such that one decision is clearly appropriate for p1 and the second decision is appropriate for p2. This approach is somewhat unsatisfactory and is an oversimplification of the problem. For the binomial problem, Lindley and Barnett (1965) have given an optimal Bayes sequential procedure that can be implemented numerically by the backward induction techniques of dynamic programming. This will be described in the following. Without loss of generality, assume that the losses are given by

L1(p) = k_r,  L2(p) = p,  (3.7.15)

where, without affecting the problem, the scales of the losses have been changed, and L_i(p) is the loss associated with the terminal decision d_i (i = 1, 2). We shall also assume that 0 < k_r < 1, otherwise the problem is trivial (if k_r lies outside this range, then one decision is optimum for all p and no sampling is necessary). We call k_r the critical value or the break-even value of p. If p < k_r, d2 is the optimum decision, and if p > k_r, d1 is optimum, where for convenience we let d1 be the decision to reject and d2 the decision to accept. This agrees with the industrial application to sampling inspection where X_i = 1 if the ith item in the batch is defective. Let c be the constant cost of sampling an item. Notice that we cannot scale the cost of sampling since we have already scaled the loss functions. The prior distribution is the conjugate prior, namely the beta family given by

[(a + b − 1)!/((a − 1)!(b − 1)!)] p^(a−1) (1 − p)^(b−1),  a, b > 0.  (3.7.16)

If the optimum scheme is tabulated for a = b = 1 (that is, when the prior density of p is uniform), then the optimum scheme is known for any beta prior distribution with positive integral a and b, for the following reason. The tabulated scheme for a = b = 1 tells us what to do if n observations are taken, out of which r are found to be defective. Since the likelihood of p is proportional to p^r (1 − p)^(n−r), the posterior distribution of p is proportional to the likelihood of p. Hence the situation is the same as that of starting with a = r + 1, b = n − r + 1, and no further tabulation for this prior distribution is necessary. Therefore, the tables depend only on the parameters k_r and c.
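The reduction to the uniform-prior tables rests on conjugate beta updating: starting from a = b = 1, each defective raises a by one and each effective raises b by one. A minimal sketch, with hypothetical inspection outcomes:

```python
def beta_update(a, b, defective):
    # Conjugate beta update for (3.7.16): a defective item raises a by one,
    # an effective item raises b by one.
    return (a + 1, b) if defective else (a, b + 1)

a, b = 1, 1                       # uniform prior on p
for x in [1, 0, 0, 1, 0]:         # hypothetical outcomes: 1 = defective
    a, b = beta_update(a, b, x == 1)

post_mean = a / (a + b)
print(a, b, post_mean)            # a = 3, b = 4 after 2 defectives in 5
```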

3.7.3 The Dynamic Programming Equation If the current distribution of p is the beta given by (3.7.16), then the expected loss is k_r if d1 is taken, and is equal to E(p) = a/(a + b) if d2 is taken. Hence, if only the two terminal decisions are considered, we reject (take d1) if k_r < a/(a + b) and otherwise accept (take d2). If the prior distribution has a = b = 1, then this current distribution is obtained by observing r = (a − 1) defectives (X_i = 1) and n − r = (b − 1) effectives (X_i = 0). In this case we reject if k_r < (r + 1)/(n + 2). One can plot the schemes on a diagram with (a + b) along the horizontal axis and a along the vertical axis. The line k_r = a/(a + b) is called the critical line. Above it the optimum decision among all the terminal decisions is to reject; below it, it is better to accept. Then, the loss incurred due to this terminal decision is given by

D(a, b) = min{k_r, a/(a + b)}.  (3.7.17)

One of the other possibilities, besides making a terminal decision, is to take one observation and then choose among the terminal decisions. Amongst all the possibilities there is one procedure that has the smallest expected loss, and this will be called best. Let B(a, b) be the expected loss of the best possible procedure when the prior distribution has values (a, b). If B(a, b) = min[k_r, a/(a + b)], then the best procedure is to stop and take the terminal decision with the smaller loss. If B(a, b) < min[k_r, a/(a + b)], take at least one more observation and then proceed as follows. Let B*(a, b) denote the expected loss if one further observation is made, followed by the optimum procedure. If the observation is defective, then a and (a + b) will each increase by 1; hence the expected loss obtained by adopting the optimum procedure after the observation will be B(a + 1, b). If the observation is an effective one, the expected loss will be B(a, b + 1). Consequently, if one observation is taken when the prior state of knowledge is (a, b), we have

B*(a, b) = c + [a/(a + b)] B(a + 1, b) + [b/(a + b)] B(a, b + 1).  (3.7.18)

Once we know D(a, b) and B*(a, b), the equation for B(a, b) is, since stopping or taking one more observation are the only possibilities,

B(a,b) = min [D(a,b), B*(a, b)] . (3.7.19)

If B(a, b) is known for all a, b with a + b = z0 (say), then (3.7.19) enables one to find B(a, b) for all a, b with a + b = z0 − 1. Consequently, B(a, b) is known for all a, b with z0 − a − b equal to a nonnegative integer. Once B(a, b) is known, the optimum procedure at (a, b) is easily found: accept if B(a, b) is equal to a/(a + b), reject if it is equal to k_r, and otherwise take one more observation. Thus, each point can be labelled as acceptance, rejection or continuation. According to Lindley and Barnett (1965), it can be shown that for fixed (a + b) the continuation region is an interval (possibly empty) between the two boundaries, with the acceptance region below and the rejection region above, and that for all sufficiently large (a + b) the interval of the continuation region is empty. Therefore, there is a least upper bound to the values of a + b in the continuation region, at which the rejection and continuation boundaries meet on the critical line. This meeting point, call it (ā + b̄, ā), will satisfy the relation

k_r = c + [ā/(ā + b̄)] k_r + [b̄/(ā + b̄)] · ā/(ā + b̄ + 1).  (3.7.20)

Letting z̄ = ā + b̄ and solving, we obtain

z̄ = k_r(1 − k_r)/c − 1.  (3.7.21)

Equation (3.7.21) gives the upper bound on a + b beyond which it is never worth taking further observations. From a practical point of view, it is sufficient to start from the highest reachable point. The authors discuss how to find the highest reachable point. They also provide a computational method, and include a discussion of the OC and ASN functions of their procedure.
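The backward induction (3.7.17)-(3.7.19) is short to implement. The sketch below uses illustrative values of k_r and c (not from Lindley and Barnett's tables), truncates at the upper bound of (3.7.21), memoizes B(a, b), and labels each state as acceptance, rejection or continuation:

```python
from functools import lru_cache

K_R, C = 0.2, 0.001        # illustrative break-even value and sampling cost
Z_MAX = int(K_R * (1 - K_R) / C) - 1   # horizon from (3.7.21): continuation
                                       # is never optimal beyond about here

@lru_cache(maxsize=None)
def B(a, b):
    # Expected loss of the best procedure at prior state (a, b): either stop
    # with the terminal loss D(a, b) of (3.7.17), or pay C and observe once.
    stop = min(K_R, a / (a + b))
    if a + b > Z_MAX:
        return stop
    go = C + a / (a + b) * B(a + 1, b) + b / (a + b) * B(a, b + 1)  # (3.7.18)
    return min(stop, go)                                            # (3.7.19)

def action(a, b):
    v = B(a, b)
    if v == min(K_R, a / (a + b)):
        return "reject" if K_R < a / (a + b) else "accept"
    return "continue"

efficiency = 2 * (K_R - B(1, 1)) / K_R ** 2   # the EVSI/EVPI measure (3.7.22)
print(B(1, 1), action(1, 1), efficiency)
```

Tabulating action(a, b) over the (a + b, a) diagram reproduces the acceptance, rejection and continuation regions described above.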

The Efficiency of Their Procedure If c were equal to zero, we could sample indefinitely until p were known. Then the value of B(1, 1) is

B(1, 1) = ∫_0^{k_r} p dp + k_r(1 − k_r) = k_r − k_r²/2.

The initial expected loss when a = b = 1 is k_r (if k_r ≤ 0.5), obtained by rejecting. The difference between these two equals k_r²/2, which is the expected value of perfect information (EVPI) in the sense of Raiffa and Schlaifer (1961). Similarly, for any other value of c, k_r − B(1, 1) denotes the expected value of sample information (EVSI). Thus a measure of the efficiency of the optimum scheme is the ratio of the EVSI to the EVPI, given by

E = 2[k_r − B(1, 1)]/k_r².  (3.7.22)

This measure is a better criterion for the performance of the scheme than B(1, 1), which depends for its interpretation on k_r and c. The following table lists the efficiencies for various values of k_r and c.

Table 3.7.1 Values of E = 2[k_r − B(1, 1)]/k_r²

k_r \ c   .05    .01    .001   .0001  .00001  .000001
 .01      .00    .59    .90
 .1       .00    .66    .91    .98    1.00
 .2       .00    .52    .86    .97    .99     1.00
 .3       .00    .15    .72    .93    .98     1.00
 .4       .00    .45    .82    .95    .99     1.00
 .5       .27    .61    .88    .97    .99     1.00

For fixed k_r, the efficiency naturally increases as the sampling cost c decreases. The limiting value of 1 is approached much more slowly for small k_r than for values of k_r near 1/2. Lindley and Barnett (1965) provide normal approximations to the boundaries, which are consistent with the results of Moriguti and Robbins (1962) and Chernoff (1960). Lindley and Barnett (1965, Section 16) also consider the normal case of testing H0: μ > 0 against H1: μ ≤ 0 with known variance, where μ has a normal prior distribution. The problem of sequential sampling has been considered in great generality for the exponential family by Mikhalevich (1956); in particular, he investigates the circumstances under which the optimum schemes terminate.

3.7.4 Bayes Sequential Procedures for the Normal Mean This problem has been considered by Chernoff in a series of papers. The review paper by Chernoff (1968) summarizes the results pertaining to the Bayes sequential testing procedure for the mean of a normal population. We present these results; for details the reader is referred to the references given at the end of the review paper by Chernoff (1968). The problem can be formulated as follows. Let X be a normal random variable having unknown mean μ and variance σ². We wish to test H0: μ ≥ 0 against H1: μ < 0, with the cost of an incorrect decision being k|μ|, k > 0. Let c denote the cost per observation. The total cost is cn if the decision is correct and cn + k|μ| if it is wrong, where n is the number of observations taken. Thus, the total cost is a random variable whose distribution depends on the unknown μ and the sequential procedure used. The problem is to select an optimal sequential procedure. After much sampling, one is either reasonably certain of the sign of μ, or |μ| is so small that the loss of a wrong decision is less than the cost of another observation. Here, one expects the proper procedure to be such that one stops and makes a decision when the current estimate of μ is sufficiently large, and continues sampling otherwise. The required largeness of the estimate depends on, and should decrease with, the number of observations, or equivalently the precision of the estimate. It can be shown that after a certain sample size, it pays to stop irrespective of the current estimate of |μ|. For given values of the constants, this problem can be solved numerically by the backward induction techniques of dynamic programming employed by Lindley and Barnett (1965).
However, care must be taken to initiate the backward induction at a sample size n sufficiently large that, no matter what the estimate of μ is, the optimal procedure will lead to a decision rather than to additional sampling. The technique of backward induction can be summarized by the equation (3.7.23), where ρ_n(ξ_n) is the expected cost of an optimal procedure given the history ξ_n up to stage n, and ξ_{n+1}(δ_n, ξ_n) describes the history up to stage n + 1, which may be random, with distribution depending on ξ_n and the action δ_n taken at stage n. It is possible to show that ξ_n is adequately summarized by the mean and variance of the posterior distribution of μ. The ease of evaluating posterior distributions when dealing with normal random variables and normal priors enables us to treat the problem without much difficulty. If it is desired to have an overall view of how the solutions depend on the various parameters, the simple though extensive numerical calculations of the backward induction are not adequate. A natural approach, relevant to large-sample theory, is that of replacing the discrete time random variables by analogous continuous time stochastic processes. The use of the Wiener process converts the problem to one in which the analytic methods of partial differential equations can fruitfully be used. So let us assume that the data consist of a Wiener process X(t) with unknown drift μ and known variance σ² per unit time. Also assume that the unknown value of μ has a prior normal distribution with mean μ0 and variance σ0². One can easily verify (or see Lemma 4.1 of Chernoff (1968)) that the posterior distribution of μ is again normal, with mean Y(s) and variance s, where

Y(s) = [μ0 σ0^{−2} + X(t) σ^{−2}] / [σ0^{−2} + t σ^{−2}],  (3.7.24)

s = 1/(σ0^{−2} + t σ^{−2}),  (3.7.25)

and Y(s) is a Wiener process in the −s scale originating at (y0, s0) = (μ0, σ0²), that is,

E[dY(s)] = 0,  var[dY(s)] = −ds.  (3.7.26)

Notice that s decreases from s0 = σ0² as information accumulates. Since the X process can be recovered from the Y process, it suffices to deal with the latter, which measures the current estimate of μ and which is easier to analyze. The posterior expected cost associated with deciding in favor of H0 at time t (when Y(s) = y) is

k√s ψ+(y/√s),  (3.7.27)

where ψ+(u) = φ(u) − u[1 − Φ(u)]. Similarly, the posterior expected cost associated with deciding μ < 0 is k√s ψ−(y/√s), where ψ−(u) = φ(u) + uΦ(u). It is easy to see that if sampling is stopped at Y(s) = y, the decision should be made on the basis of the sign of y, and the expected cost of deciding plus the cost of sampling is given by

d(y, s) = k√s ψ(y/√s) + cσ²(s^{−1} − s0^{−1}),  (3.7.28)

where

ψ(u) = φ(|u|) − |u|[1 − Φ(|u|)].  (3.7.29)

Thus the continuous time version of the problem can be viewed as the following stopping problem: the Wiener process Y(s) is observed; the statistician may stop at any value of s > 0 and pay d{Y(s), s}. Find the stopping procedure which minimizes the expected cost. In this version, which uses the posterior Bayes estimate Y(s), the statistical aspects involving the unknown parameter μ have been abstracted away. The original discrete time problem can be described in terms of this stopping problem provided the permissible stopping values of s are restricted to s0, s1, s2, ..., where s_n = (σ0^{−2} + nσ^{−2})^{−1}. Now it should be straightforward to see that the discrete version can be treated numerically by backward induction in terms of the Y(s) process, starting from any s_n ≤ c²/[k²ψ²(0)] = 2πc²/k².
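The functions ψ+ and ψ− and the constant 2πc²/k² can be checked numerically; the sketch below uses only the standard normal density and cdf (the function names are mine, not Chernoff's notation).

```python
import math

def phi(u):   # standard normal density
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def Phi(u):   # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def psi_plus(u):    # cost factor for deciding mu >= 0 when Y(s)/sqrt(s) = u
    return phi(u) - u * (1 - Phi(u))

def psi_minus(u):   # cost factor for deciding mu < 0
    return phi(u) + u * Phi(u)

# The two decisions are mirror images: psi_plus(u) = psi_minus(-u).  Also
# psi(0) = phi(0) = 1/sqrt(2*pi), which yields the stopping threshold
# c^2/(k^2 psi^2(0)) = 2*pi*c^2/k^2 quoted in the text.
sym_gap = abs(psi_plus(1.0) - psi_minus(-1.0))
threshold_factor = psi_plus(0.0) ** -2        # equals 2*pi
print(sym_gap, threshold_factor)
```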

3.8 Small Error Probability and Power One Test

Before we consider power one tests, we give a useful lemma.

Lemma 3.8.1 (Ville (1939) and Wald (1947, p. 146)). For each n ≥ 1, let f_n(x1, ..., x_n) [f′_n(x1, ..., x_n)] denote the joint probability density function of X1, X2, ..., X_n under H0 [H1]. Also let P_i(E) denote the probability of E computed under the assumption that H_i is true (i = 0, 1). Also, let λ_n = f′_n/f_n when f_n > 0. Then, for any ε > 1,

P_0(λ_n ≥ ε for some n ≥ 1) ≤ 1/ε.  (3.8.1)

Application. Let the X's be i.i.d. normal (θ, 1), let f_{θ,n}(x1, x2, ..., x_n) denote the joint density of X1, X2, ..., X_n, and let G(θ) denote a prior distribution of θ. Now, set

f′_n(x1, x2, ..., x_n) = ∫_{−∞}^{∞} f_{θ,n}(x1, x2, ..., x_n) dG(θ)

= f_{0,n}(x1, x2, ..., x_n) ∫_{−∞}^{∞} exp(θS_n − nθ²/2) dG(θ),  (3.8.2)

where S_n = X1 + X2 + ··· + X_n. Let

g(x, t) = ∫_{−∞}^{∞} exp(θx − θ²t/2) dG(θ),  (3.8.3)

and replace G(θ) by G(θ√m), where m is an arbitrary positive constant. Then (3.8.2) takes the form

∫_{−∞}^{∞} exp(θS_n − nθ²/2) dG(θ√m) = ∫_{−∞}^{∞} exp(θS_n/√m − nθ²/(2m)) dG(θ) = g(S_n/√m, n/m).  (3.8.4)

Thus, for i.i.d. normal (0, 1) variables,

P_0(g(S_n/√m, n/m) ≥ ε for some n ≥ 1) ≤ 1/ε.  (3.8.5)

In order to understand the implication of (3.8.5), let G(θ) = 0 for θ < 0, so that g(x, t) is increasing in x. If A(t, ε) is the positive solution of the equation g(x, t) = ε, then

g(x, t) ≥ ε if and only if x ≥ A(t, ε).  (3.8.6)

Hence (3.8.5) becomes

P_0(S_n ≥ √m A(n/m, ε) for some n ≥ 1) ≤ 1/ε,  (m > 0, ε > 1).  (3.8.7)

Remark 3.8.1 It was shown in Robbins (1970) that (3.8.7) is valid for an arbitrary i.i.d. sequence of random variables provided

E[exp(θX1)] ≤ exp(θ²/2) for all θ ≥ 0.  (3.8.8)

Example 3.8.1 Let P(X = 1) = P(X = −1) = 1/2. Then E[exp(θX)] = cosh θ ≤ exp(θ²/2) for all θ, so that (3.8.8) holds.

Robbins (1970) provides some examples where it is possible to give an explicit form for the function A(t, ε).

Example 3.8.2 Let G(θ) be degenerate at θ = 2a > 0. Then

g(x, t) = exp(2ax − 2a²t) ≥ ε if and only if x ≥ at + (ln ε)/(2a).

Hence (3.8.7) gives, with d = (ln ε)/(2a),

P_0(S_n ≥ an/√m + d√m for some n ≥ 1) ≤ exp(−2ad),  (a, d, m > 0).
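The linear one-sided boundary of Example 3.8.2 is easy to check by simulation. The sketch below (with m = 1 and illustrative values a = d = 1, my choices) estimates the probability under H0 of ever crossing the line an + d within a finite horizon; the estimate should respect the bound exp(−2ad) ≈ .135.

```python
import math, random

def crossing_freq(a=1.0, d=1.0, horizon=200, reps=3000, seed=3):
    # Under H0 (i.i.d. standard normal increments with mean 0), estimate the
    # probability that S_n ever exceeds the line a*n + d within the horizon.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = 0.0
        for n in range(1, horizon + 1):
            s += rng.gauss(0.0, 1.0)
            if s >= a * n + d:
                hits += 1
                break
    return hits / reps

est = crossing_freq()
bound = math.exp(-2 * 1.0 * 1.0)   # exp(-2ad) with a = d = 1
print(est, bound)
```

The empirical frequency is typically well below the bound, since the discrete walk overshoots the boundary.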

Example 3.8.3 If G is the folded standard normal distribution, then Robbins (1970) shows that

P_0(S_n ≥ {(n + m)[a² + ln((n + m)/m)]}^{1/2} for some n ≥ 1) ≤ exp(−a²/2)/[2Φ(a)],  (a, m > 0).  (3.8.9)

Tests with small error probability Let X1, X2, ... be an i.i.d. sequence of normal (θ, 1) variables, where θ is an unknown parameter (−∞ < θ < ∞). Suppose we wish to test H−: θ < 0 against H+: θ > 0 (θ = 0 is excluded). Then define the stopping time N = inf{n : |S_n| ≥ c_n}, and accept H+ [H−] when S_N ≥ c_N [S_N ≤ −c_N], where

c_n = {(n + m)[a² + ln((n + m)/m)]}^{1/2}.  (3.8.10)

(Here h(x) = x², and note that c_n/n ~ [(ln n)/n]^{1/2} → 0 as n → ∞.) Now

P_θ(N = ∞) = lim_{n→∞} P_θ(N > n) = lim_{n→∞} P_θ(|S_j| < c_j, j ≤ n) = 0,

since S_n/n → θ ≠ 0 under H− or H+. Hence P_θ(N < ∞) = 1 for θ ≠ 0, whereas for θ > 0

P_θ(accept H−) = P_θ(S_n ≤ −c_n before S_n ≥ c_n) ≤ P_0(S_n ≤ −c_n before S_n ≥ c_n) = (1/2) P_0(|S_n| ≥ c_n for some n ≥ 1)

≤ (1/2) exp(−a²/2), after using (3.8.9) with h(x) = x². Similarly (from symmetry considerations) we have P_θ(accept H+) ≤ (1/2) exp(−a²/2) for θ < 0. Thus the error probability of this test is uniformly bounded by (1/2) exp(−a²/2) for all θ ≠ 0. Hence

P_0(N < ∞) = P_0(|S_n| ≥ c_n for some n ≥ 1) = 2 P_0(S_n ≥ c_n for some n ≥ 1) ≤ exp(−a²/2)/Φ(a).

So the test will rarely terminate when θ = 0. However, E_θ(N) is finite for every θ ≠ 0; it approaches ∞ as θ → 0 and 1 as |θ| → ∞, because

E_θ(N) = Σ_{n=1}^∞ n P_θ(N = n) = Σ_{n=0}^∞ P_θ(N > n) = 1 + Σ_{n=1}^∞ P_θ(N > n),

and all the terms in the summation except the one for n = 0 will be equal to zero in the limit as |θ| → ∞.
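The two-sided test is straightforward to simulate. The sketch below assumes for (3.8.10) the Darling-Robbins-type boundary c_n = {(n + m)[a² + ln((n + m)/m)]}^{1/2}, here with m = 1 and a² = 9, and estimates E_θ(N) at θ = 1, which should fall between the bounds of Table 3.8.1 below (about 10.4 and 15 at θ = 1).

```python
import math, random

def c_bound(n, a2=9.0, m=1.0):
    # Boundary c_n = sqrt((n+m)(a^2 + ln((n+m)/m))), the form assumed here
    # for (3.8.10); note c_n/n -> 0 as n -> infinity.
    return math.sqrt((n + m) * (a2 + math.log((n + m) / m)))

def stopping_time(theta, cap=100000, rng=None):
    # Sample until |S_n| >= c_n; decide by the sign of S_N.
    rng = rng or random.Random()
    s = 0.0
    for n in range(1, cap + 1):
        s += rng.gauss(theta, 1.0)
        if abs(s) >= c_bound(n):
            return n, (1 if s > 0 else -1)
    return cap, 0

rng = random.Random(11)
runs = [stopping_time(1.0, rng=rng) for _ in range(1000)]
mean_n = sum(n for n, _ in runs) / len(runs)
wrong = sum(1 for _, dec in runs if dec < 0)   # accepted H- although theta > 0
print(mean_n, wrong)
```

With a² = 9 the error probability is bounded by roughly .0056, so essentially no wrong decisions should appear in a thousand runs.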

Remark 3.8.2 In a similar fashion one can construct an SPRT with uniformly small error probability when the X_i are i.i.d. Bernoulli variables with P(X1 = 1) = p = 1 − P(X1 = −1), and we set θ = 2p − 1, H−: θ < 0 and H+: θ > 0.

Tests with Power One Let X1, X2, ... be i.i.d. normal (θ, 1), and suppose we wish to test H0: θ ≤ 0 against H1: θ > 0. Let the stopping time be defined as

N = smallest n such that S_n ≥ c_n, and N = ∞ if no such n occurs,  (3.8.11)

and when N < ∞, stop sampling with X_N and reject H0 in favor of H1; when N = ∞, continue sampling indefinitely and do not reject H0, where S_n and c_n are as defined in (3.8.10). For θ ≤ 0,

P_θ(reject H0) = P_θ(N < ∞) ≤ P_0(N < ∞) ≤ (1/2) exp(−a²/2) = α, say,  (3.8.12)

while

P_θ(not reject H0) = P_θ(S_n < c_n for all n ≥ 1) = 0 for all θ > 0, because c_n/n → 0 and S_n/n → θ (> 0) as n → ∞. Thus the test has power one against H1. Clearly, the test rarely terminates when θ ≤ 0. Towards E_θ(N) for θ > 0, we have (after using (2.6.12) with μ = 0)

E_θ(N) ≥ −2 ln P_0(N < ∞)/θ², for every θ > 0.  (3.8.13)

For example, if N is such that P_0(N < ∞) = 0.05, then E_θ(N) ≥ 6/θ². However, no such N will minimize E_θ(N) uniformly for all θ > 0. For N given by (3.8.11) and the c_n selected here (c_t is concave in t for t ≥ 1), Darling and Robbins (1968a) have shown that

(3.8.14)

For our choice of c_n, with m = 1 and a² = 9, we obtain from (3.8.13) and (3.8.14) the following table.

Table 3.8.1. Upper and Lower Bounds for E_θ(N) with m = 1 and a² = 9

θ                   .1     1      2
Equation (3.8.13)   1040   10.4   2.6
Equation (3.8.14)   1800   15     5

Monte Carlo studies will usually yield more precise estimates of E_θ(N). However, they are not directly applicable to estimating the type I error, for which the upper bound (3.8.12) becomes 0.0056 when a² = 9. Thus the proposed test has type I error probability ≤ α uniformly and type II error probability = 0. Also, the sample size N is finite with probability one when H1 is true, and N = ∞ with probability ≥ 1 − α when H0 is true. The latter property is not usually acceptable in statistical inference, since Wald's SPRT was originally designed for problems in acceptance sampling, where nonterminating tests are not of much use. Darling and Robbins (1968a) provide a practical situation, in which a physician has to decide whether to switch over to a new drug, where a test of power one makes sense. Barnard (1969), independently of Darling and Robbins (1967a, 1967b, 1968a), has proposed tests of power one for the following Bernoulli problem. Suppose we wish to test H0: p < p0 versus H1: p ≥ p0, where p denotes the probability of a certain component being defective. Let S_n denote the number of defectives in n components. In practical applications, p0 is small. The stopping time N is defined as:

N = smallest n for which S_n ≥ np0 + z_{1−α}[np0(1 − p0)]^{1/2}, and N = ∞ if no such n occurs,

where z_{1−α} denotes the (1 − α)100% point of the standard normal distribution. After we stop, we reject H0. The above sequential procedure has power one uniformly for all p ∈ H1, and the type I error probability can be made arbitrarily small by choosing α small. In order to see this, let us consider the standardized variables:

Z_n = (S_n − np)/(npq)^{1/2}, where q = 1 − p. Hence, we stop as soon as

Z_N ≥ z_{1−α}[p0 q0/(pq)]^{1/2} + N^{1/2}(p0 − p)/(pq)^{1/2},  q0 = 1 − p0.  (3.8.15)

Also, recall that the law of the iterated logarithm says that

lim sup_{n→∞} S_n/(2n ln ln n)^{1/2} = 1 with probability one,  (3.8.16)

for any S_n = X1 + X2 + ··· + X_n where the X's are i.i.d. having mean zero and variance one. In our case, the successive maxima of the sequence Z_n increase like (2 ln ln n)^{1/2} with probability one. If p > p0, then with probability one we eventually stop the sampling process. If p = p0, then the right-hand side of (3.8.15) is a constant and, again, Z_N will exceed that constant sooner or later. However, when p < p0, the coefficient of N^{1/2} is positive, and hence the right-hand side of (3.8.15) grows faster than (2 ln ln N)^{1/2}; thus there is a positive probability that the inequality in (3.8.15) will be violated for all N. This latter probability can be made arbitrarily close to 1, for any given value of p < p0, by choosing α arbitrarily small. Another possible practical application of the above formulation is the supervision of weights and measures. Sugar is packaged and sold to consumers. The weight of each packet of sugar should not fall below W0; otherwise the manufacturer will be prosecuted. So the weight W of sugar in a package follows a distribution having mean W0 and a variance small enough that P(W < W0) is very small when E(W) = W0. Suppose we wish to test H0: E(W) ≥ W0 against H1: E(W) < W0. Note that the difference W − W0 constitutes a bonus from the manufacturer to the consumers. The Weights and Measures Inspectorate should record the deviations

d_i = W0 − W_i,  i = 1, 2, ...,  and let T_n = Σ_{i=1}^n d_i.

They should stop experimentation as soon as T_n > kn^{1/2} and then prosecute the manufacturer, where k is chosen keeping in mind the standard deviation of W_i and the amount of risk of being prosecuted the manufacturer is willing to take. In this way, a manufacturer who consistently gave below the legal weight would eventually be caught and prosecuted, while honest manufacturers would not have to spend unnecessary funds on unduly elaborate weighing equipment; they could pass on some of the benefits of the resultant cost reduction to the consumers.
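The inspection rule just described can be sketched as follows (the nominal weight, the constant k, and the weight variance are all hypothetical choices of mine):

```python
import random

def inspect(mean_w, w0=1000.0, sd=2.0, k=10.0, horizon=2000, rng=None):
    # Record deviations d_i = W0 - W_i and prosecute once T_n > k * sqrt(n).
    rng = rng or random.Random()
    t = 0.0
    for n in range(1, horizon + 1):
        t += w0 - rng.gauss(mean_w, sd)
        if t > k * n ** 0.5:
            return n               # stopping time: manufacturer prosecuted
    return None                    # no prosecution within the horizon

rng = random.Random(5)
cheat = [inspect(999.0, rng=rng) for _ in range(200)]    # mean weight < W0
honest = [inspect(1001.0, rng=rng) for _ in range(200)]  # mean weight > W0
caught_cheat = sum(r is not None for r in cheat)
caught_honest = sum(r is not None for r in honest)
print(caught_cheat, caught_honest)
```

An underweight manufacturer is caught with probability one (the drift of T_n beats k√n), while for an honest one the crossing probability is negligible.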

3.9 Sequential Rank Test Procedures

In this section we shall consider certain nonparametric test procedures having power one (which were proposed by Darling and Robbins, 1968b, and simplified by Robbins, 1970), and rank order test procedures for Lehmann alternatives. First we consider the test procedures proposed by Darling and Robbins (1968b) and Robbins (1970).

3.9.1 Kolmogorov-Smirnov Tests with Power One

Let X [Y] have distribution function F(x) [G(y)]. Also, let D⁺(F, G) = sup_x [F(x) − G(x)], D(F, G) = sup_x |F(x) − G(x)|; assume that X and Y are independent.

(a) We wish to test H1 : F(x) ≤ G(x) for all x, against H1a : F(x) > G(x) for some x. Let Fn(x) [Gn(x)] denote the empirical distribution function based on a random sample X1, X2, ..., Xn [Y1, Y2, ..., Yn]. In order to test H1 define

N = first n ≥ m such that D⁺(Fn, Gn) ≥ g(n)/n,   (3.9.1)

where g(n) is some positive sequence such that g(n)/n → 0 as n → ∞. Let h(x) denote the inverse function of g(x)/x. If H1 is false and D⁺(F, G) = d > 0, then by the Glivenko-Cantelli theorem, D⁺(Fn, Gn) → d with probability one while g(n)/n → 0 as n → ∞, so that P(N < ∞) = 1. Then, if we agree to reject H1 as soon as we observe that N < ∞, while if N = ∞ we do not reject H1, the test certainly has power 1 when H1 is false. It remains to consider the probability of type I error. Towards this we have the following result.

Result 3.9.1 (Darling and Robbins, 1968b). Let the stopping time be defined by (3.9.1). Also let (i) g(x)/x < 1, (ii) g(x) be concave. Then

P(reject H1 | H1 is true) ≤ α   (3.9.2)

where α ≥ Σ_{n=m}^∞ exp[−g²(n)/(n + 1)]. If H1 is false and D⁺(F, G) = d > 0 with d ≤ g(m)/m, then
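As an illustration (a sketch of my own, not from the book), one concrete choice is g(n) = [(n + 1)(a + 2 ln(n + 1))]^{1/2}, for which exp[−g²(n)/(n + 1)] = e^{−a}(n + 1)^{−2} is summable, so the series bounding the type I error is finite; under an alternative with D⁺(F, G) = d > 0, the rule (3.9.1) stops in finite time. (The regularity conditions of Result 3.9.1 are ignored for small n here.)

```python
import bisect
import math
import random

def d_plus(xs, ys):
    """sup_z [F_n(z) - G_n(z)] evaluated over the pooled sample points."""
    xs_s, ys_s = sorted(xs), sorted(ys)
    best = 0.0
    for z in xs_s + ys_s:
        fn = bisect.bisect_right(xs_s, z) / len(xs_s)
        gn = bisect.bisect_right(ys_s, z) / len(ys_s)
        best = max(best, fn - gn)
    return best

def g(n, a):
    return math.sqrt((n + 1) * (a + 2 * math.log(n + 1)))

def alpha_bound(m, a, terms=100000):
    """Numerical value of sum_{n>=m} exp(-g(n)^2/(n+1)) = e^{-a} sum (n+1)^{-2}."""
    return sum(math.exp(-g(n, a) ** 2 / (n + 1)) for n in range(m, m + terms))

a, m = 3.0, 10
print(alpha_bound(m, a))  # a small numerical bound on the type I error

random.seed(0)
xs, ys, n = [], [], 0
while True:  # sample under an alternative with F(z) > G(z) for some z
    n += 1
    xs.append(random.gauss(-1.0, 1.0))
    ys.append(random.gauss(0.0, 1.0))
    if n >= m and d_plus(xs, ys) >= g(n, a) / n:
        break
print(n)  # the stopping time N, finite with probability one
```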

(3.9.3)

(b) Suppose we wish to test the hypothesis H2 : F(t) = G(t) for all t, against H2a : F(t) ≠ G(t) for some t. Define N as

N = smallest n ≥ m such that D(Fn, Gn) ≥ g(n)/n,

and we reject H2 if and only if N < ∞. For the error probabilities we have

Result 3.9.2 (Darling and Robbins, 1968b). For the above stopping time,

P(reject H2 | H2 is true) ≤ 2α,   P(reject H2 | H2 is false) = 1.   (3.9.4)

If H2 is false, and D(F, G) = d > 0 with d ≤ g(m)/m, then (3.9.3) is valid, where α is as defined above.

(c) Let F0 be any specified d.f. and consider testing the hypothesis H3 : F(t) ≤ F0(t) for every −∞ < t < ∞, against H3a : F(t) > F0(t) for some t.

Result 3.9.3 (Darling and Robbins, 1968b). Define N as the smallest n ≥ m such that D⁺(Fn, F0) ≥ g(n)/n, and reject H3 if and only if N < ∞. Then

P(reject H3 | H3 is true) ≤ 2α,   (3.9.5)
P(reject H3 | H3 is false) = 1.

If H3 is false, and D⁺(F, F0) = d > 0 with d ≤ g(m)/m, then (3.9.3) holds.

3.9.2 Sequential Sign Test

Assume that X has d.f. F(x) and Y has d.f. G(y). We wish to test H0 : F = G, against the alternative H1 : G(x) = F²(x) for all x. We observe pairs of observations (X1, Y1), (X2, Y2), ... sequentially. Assume that p = P(X < Y) is constant from stage to stage. Then the hypotheses can be replaced by H0′ : p = 1/2 versus H1′ : p = 2/3. The sequential sign test procedure has been proposed by Hall (1965, p. 40). Reduce the data to the signs of the differences Xi − Yi. This may be justified by invariance under monotone transformations gn applied to each of the observations in stage n. (See Section 1.8 of Hall, Wijsman and Ghosh, 1965.) The reduced data constitute a Bernoulli sequence with success (positive sign) probability p. The simple SPRT for Bernoulli data (see Example 2.1.2) is applicable, the likelihood ratio at the nth stage being that of a binomial random variable with parameters n and p.
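Why p = 2/3 under H1: P(X < Y) = ∫ F dG = ∫ 2F dF = 2/3. The sketch below is mine, not the book's computation; it checks the value by simulation (using uniform F, so G = F² is the maximum of two uniforms) and runs the resulting Bernoulli SPRT on the signs, with Wald's approximate boundaries A = (1 − β)/α and B = β/(1 − α):

```python
import math
import random

random.seed(42)

def simulate_pair():
    """X ~ F = U(0,1); Y ~ G = F^2, i.e. the max of two U(0,1) draws."""
    return random.random(), max(random.random(), random.random())

# Monte Carlo check that p = P(X < Y) = 2/3 under the Lehmann alternative.
trials = 200000
hits = 0
for _ in range(trials):
    x, y = simulate_pair()
    hits += x < y
p_hat = hits / trials
print(p_hat)  # close to 2/3

def sign_sprt(pairs, alpha=0.05, beta=0.05, p0=0.5, p1=2 / 3):
    """Bernoulli SPRT on the signs (success = {X_i < Y_i})."""
    lo = math.log(beta / (1 - alpha))
    hi = math.log((1 - beta) / alpha)
    llr = 0.0
    for n, (x, y) in enumerate(pairs, start=1):
        llr += math.log(p1 / p0) if x < y else math.log((1 - p1) / (1 - p0))
        if llr >= hi:
            return "H1", n
        if llr <= lo:
            return "H0", n
    return "continue", len(pairs)

decision, n_used = sign_sprt([simulate_pair() for _ in range(2000)])
print(decision, n_used)
```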

Groeneveld (1971) has proposed a sequential sign test for the symmetry problem considered by Miller (1970). Groeneveld assumes that the density f(x; θ) = f(x − θ), where f(x) > 0 for all x, f(x) = f(−x) and f′(x) < 0 for x > 0. In order to test H0 : θ = θ0 against H1 : θ = θ1 > θ0, one can carry out an SPRT by computing

Groeneveld's (1971) sequential test of the same hypotheses is based on the number of times the ratio Z = ln[f(X; θ1)/f(X; θ0)] is positive. Let this statistic be denoted by Tn. It is clear that this ratio is positive if and only if

X > (θ0 + θ1)/2 = θ̄.

(Note that f(x − θ0) = f(x − θ1) when x = θ̄.) In this sequential test we are testing two alternative values of a binomial parameter p, H0 : p = p0 against H1 : p = p1, where

p0 = P(X > θ̄ | θ0) = P(X − θ0 > Δ)  and  p1 = P(X > θ̄ | θ1) = P(X − θ1 > −Δ) = ∫_{−Δ}^∞ f(x) dx,

with Δ = (θ1 − θ0)/2. It is easy to see that p0 + p1 = 1. It is of interest to compare the efficiency of the sequential sign test (SST) relative to the SPRT. When both tests have the same error probabilities α and β, the relative efficiency is e = E(N)_SPRT / E(N)_SST. Using the approximate formula for E(N) given by Eqs. (2.4.2) and (2.4.3) and noting that the random variable Z for the SST is given by Z = ln{(p1/p0)^Y [(1 − p1)/(1 − p0)]^{1−Y}} = (2Y − 1) ln(p1/p0) since p0 + p1 = 1, where Y takes on the value 1 with probability p and 0 with probability 1 − p, one can obtain the relative efficiency under H0 as

where Z = ln[f(Xi; θ1)/f(Xi; θ0)] and E_{θ0}(Z) = ∫_{−∞}^∞ ln[f(y − 2Δ)/f(y)] f(y) dy. The same value results under H1. Table 3.9.1 gives values of these efficiencies when X is normal or Laplace (double exponential). The parameter Δ is in terms of standard deviations.

Table 3.9.1⁴ Efficiency of SST Relative to SPRT

2Δ     Normal   Laplace
.1     .634     .978
.2     .635     .959
.4     .634     .927
.5     .632     .914
.6     .630     .902
.8     .624     .880
1.0    .618     .862

Both the expressions in the numerator and denominator of e are functions of A and the density function f(x). Under additional regularity conditions on f(x), both can be expanded in a Taylor series. The resulting expression for e is

Hence

lim_{Δ→0} e = 4f²(0) / ∫_{−∞}^∞ [f′(x)]²/f(x) dx,

which is also the efficiency of the sample median as an estimator of θ relative to the best unbiased estimator of θ [see Fisz, 1963, Chapter 13], and is also the asymptotic efficiency of the sign test relative to the most powerful test for the hypothesis of symmetry if we consider competing tests based on a fixed sample size [see Hájek and Šidák, 1967, Chapter 7]. Table 3.9.2 gives the limiting value for several distributions.

Table 3.9.2⁴ Limiting Efficiencies of SST

Normal        Laplace   Logistic   Cauchy
2/π = .637    1         .75        8/π² = .810
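The table entries can be reproduced numerically from the classical expression 4f²(0)/I(f), with I(f) the Fisher information of f (this closed form is my reading of the limiting efficiency; the f(0) and I(f) values below are the standard ones for each standardized density):

```python
import math

def limiting_efficiency(f0, fisher_info):
    """ARE of the sign test: 4 f(0)^2 / I(f)."""
    return 4 * f0 ** 2 / fisher_info

cases = {
    "Normal":   (1 / math.sqrt(2 * math.pi), 1.0),  # (f(0), I(f))
    "Laplace":  (0.5, 1.0),
    "Logistic": (0.25, 1.0 / 3.0),
    "Cauchy":   (1 / math.pi, 0.5),
}
for name, (f0, info) in cases.items():
    print(name, round(limiting_efficiency(f0, info), 3))
# Normal 2/pi = .637, Laplace 1, Logistic .75,
# Cauchy 8/pi^2 = .8106 (printed as .810 in Table 3.9.2).
```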

One would prefer this SST of Groeneveld (1971) because the calculation of the statistic Tn does not require the specific form of f(x). But the value p0 depends on the distribution of X. However, the test can be carried out by choosing a value for p0 (say .4), and hence p1 = .6, and the values of A and B (Wald's approximate boundary values) so that the sequential binomial test has errors α and β. If Δ is measured in standard deviation units, then by the improved Chebyshev inequality for continuous unimodal distributions [see Cramér, 1946, p. 183], P(X − θ0 > Δ) ≤ 2/(9Δ²). Hence p0 = .4 implies that Δ² ≤ 5/9, or Δ ≤ .745. That is, the SST corresponds to an SPRT with error sizes (α, β) in which θ1 (θ0) differs in absolute value from θ̄ by at most .745 standard deviations, whatever the distribution of X, provided σ² is finite.

⁴Reproduced from The American Statistician Vol. 25, copyright (1971) by the American Statistical Association. All rights reserved.
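The arithmetic behind the bound Δ ≤ .745, as a one-line check (a sketch, not from the book):

```python
# Gauss-type bound for unimodal X: P(X - theta0 > Delta) <= 2/(9*Delta^2),
# with Delta in standard-deviation units. Setting p0 = .4 forces
# Delta^2 <= 2/(9 * 0.4) = 5/9, i.e. Delta <= .745.
p0 = 0.4
delta_sq_max = 2 / (9 * p0)
delta_max = delta_sq_max ** 0.5
print(delta_sq_max, delta_max)
```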

3.9.3 Rank Order SPRT's Based on Lehmann Alternatives: Two-Sample Case

Nonparametric alternatives known as Lehmann (1953) alternatives are given in terms of powers of the d.f. under the null case. For example, in the two-sample case if H0 : F = G, then the Lehmann alternative is given by H1 : (i) G = F^A or (ii) G = 1 − (1 − F)^A (A > 0) or (iii) G = h(F) such that h(0) = 0, h(1) = 1 and h(·) is nondecreasing. Let (X1, ..., Xn1) denote a random sample from F and (Y1, ..., Yn2) denote a random sample from G, where the X's and Y's are mutually independent, and F and G are assumed to be continuous. Let s1 < s2 < ⋯ < sn2 be the ranks of Y1, Y2, ..., Yn2 in the combined sample of size n = n1 + n2. Then S = (s1, s2, ..., sn2) is called the rank order. One can derive an explicit expression for the probability of a rank order under Lehmann alternatives. (For instance, see Lehmann, 1953.) Using this result, Wilcoxon, Rhodes and Bradley (1963) (see also Bradley, 1967) have developed two sequential two-sample grouped rank tests, called the sequential configural rank test (SCR-test) and the rank sum test. Bradley, Merchant and Wilcoxon (1966) provide a modified version of the configural group rank test proposed earlier, which is based on rerankings of observations as new groups of observations are obtained sequentially. Monte Carlo studies carried out on the modified sequential configural rank test (MSCR) indicate a formal superiority of the MSCR-test over the SCR-test. Let us briefly describe the MSCR-test procedure. Suppose that X and Y observations are taken in groups of m and n and that no group or block effects are present. Then at the tth stage of such a process, mt X-observations and nt Y-observations are ranked in a joint array. Then the likelihood ratio is the ratio of the probabilities of the ranks s_j^{(t)}, j = 1, 2, ..., nt, which constitute the ranks of the nt Y's in the joint reranking at stage t. The likelihood ratio at the tth stage is given by

(3.9.6)

By invariant sufficiency the likelihood ratio for only the last reranking is equivalent to the joint likelihood ratio for all rerankings up to stage t. Hall, Wijsman and Ghosh (1965) discuss the MSCR-test and note that its finite termination with probability one under H0 and H1 follows from the work of Wirjosudirdjo (1961). Next we shall turn to the work of Savage and Sethuraman (1966). Let (X1, Y1), (X2, Y2), ... be independent and identically distributed bivariate random variables with a joint distribution H(·,·) which has continuous marginal distributions F and G. We wish to test H0 : X and Y are independent and G = F, against the alternative hypothesis H1 : X, Y are independent and G = F^A, where A > 0, A ≠ 1 is a known constant. At the nth stage of experimentation the available information is the ranks of (Y1, ..., Yn) among (X1, ..., Xn, Y1, ..., Yn). Let the combined sample be denoted by (W1, W2, ..., W2n) and the ordered combined sample by Wn,1, Wn,2, ..., Wn,2n. Let Fn (Gn) denote the empirical distribution function of X1, ..., Xn (Y1, ..., Yn). Let s1 < s2 < ⋯ < sn be the ordered ranks of Y1, Y2, ..., Yn in the combined sample. Notice that the statistic (s1, s2, ..., sn) is equivalent to (Gn(Wn,i), i = 1, ..., 2n), which in turn is equivalent to (Fn(Wn,i), Gn(Wn,i), i = 1, ..., 2n).

Lemma 3.9.1 With the above notation we have

Clearly, P_{H0}(Z = z) = P_{H0}(s1, ..., sn) = 1/(2n choose n), which follows by setting A = 1 in (3.9.7), or from the fact that each rank order z is equally likely under H0 and (2n choose n) is the total number of distinct rank orders. Let

(3.9.8)

It should be remarked that, since the product on the right side of (3.9.8) is symmetric in the Wn,i, one can replace the Wn,i by the Wi. Then the SPRT based on ranks for testing H0 against H1 is given by:

(i) Take one more observation if B < Λn(A, Fn, Gn) < A, (ii) accept H0 if Λn ≤ B, and (iii) reject H0 if Λn ≥ A, n = 1, 2, .... Let N be the stopping variable for the above SPRT and let (3.9.9)
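The three-part rule (i)-(iii) is the generic Wald SPRT recursion. A minimal sketch (mine, not the book's, using Wald's approximate boundaries A = (1 − β)/α and B = β/(1 − α)) that works for any stream of log-likelihood-ratio increments:

```python
import math

def sprt(llr_increments, alpha=0.05, beta=0.05):
    """Generic Wald SPRT: feed log-likelihood-ratio increments ln[f1/f0](x_n).
    Continue while log B < S_n < log A; stop at the first crossing."""
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    s = 0.0
    for n, z in enumerate(llr_increments, start=1):
        s += z
        if s >= log_a:
            return "reject H0", n
        if s <= log_b:
            return "accept H0", n
    return "continue", len(llr_increments)

# A stream drifting upward crosses the upper boundary, and vice versa:
print(sprt([0.5] * 20))
print(sprt([-0.5] * 20))
```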

Then, from (3.9.7) we have

Sn = ln(4A) − 2 − Tn + O(ln n / n),   (3.9.10)

where Tn = n⁻¹ Σ_{i=1}^{2n} ln[Fn(Wi) + A Gn(Wi)]. Also, let

S(A, F, G) = ln(4A) − 2 − ∫_{−∞}^∞ ln[F(x) + A G(x)] {dF(x) + dG(x)}.   (3.9.11)

Also let

I(x; z) = 1 if x ≤ z, and 0 if x > z,   W(z) = F(z) + A G(z),

W(x, y, z) = I(x; z) + A I(y; z),
V(X, Y) = ln[W(X) W(Y)] + ln(4A) − 2.   (3.9.12)

Then combining the results of Savage and Sethuraman (1966) and Sethuraman (1970) we have

Result 3.9.4 The rank order SPRT based on Lehmann alternatives terminates finitely and has a finite moment-generating function provided P{V(X, Y) = 0} ≠ 1, where V(X, Y) is given by (3.9.12).

Corollary 3.9.4.1 If X and Y are independent, then P{V(X, Y) = 0} < 1.

Next, towards the asymptotic normality of Sn we have the following result of Govindarajulu (1968a) and Sethuraman (1970). Let {Rk} denote a sequence of stopping rules and {Nk} with N1 ≤ N2 ≤ ⋯ denote the sequence of stopping times associated with {Rk}. Also assume that there exists a nonstochastic sequence of positive numbers {nk}, n1 ≤ n2 ≤ ⋯, with nk → ∞ as k → ∞ and Nk/nk → 1 in probability as k → ∞.

Result 3.9.5 If X and Y are independent and the sequences {Rk}, {nk} and {Nk} are as defined above, then

(3.9.13) for every z where

(3.9.14)

Note that σ²(F, F) = 2(1 − A)²(1 + A)⁻².   (3.9.15)

3.9.4 One-Sample Rank Order SPRT's for Symmetry

Let V1, V2, ... be independent and identically distributed random variables observed sequentially and having a continuous d.f. F. We wish to test the hypothesis

H0 : F(w) + F(−w) = 1, for all w,   (3.9.16)

that is, F is symmetric about zero. Let H(w) = P(V ≤ w | V ≥ 0) = [F(w) − F(0)]/[1 − F(0)] and G(w) = P(|V| ≤ w | V < 0) = [F(0) − F(−w)]/F(0) for w ≥ 0, and H(w) = G(w) = 0 for w < 0. Thus F(w) = F(0)[1 − G(−w)] for w < 0 and F(w) = H(w) + F(0)[1 − H(w)] for w ≥ 0. Then we can rewrite (3.9.16) as

H0 : H(w) = G(w) for all w, and F(0) = 1/2,

and take H1 : H(w) ≠ G(w) for some w. However, probabilities of the desired rank orders cannot be derived explicitly under H1. Assuming that Ha : H(w) = 1 − [1 − G(w)]^A, w ≥ 0, with F(0) = A/(1 + A), sampling in groups of fixed size at each stage and ranking the absolute values of the observations within each group, Weed and Bradley (1971) have proposed two sequential procedures, one based on the within-group configuration of signed ranks and the other based on the within-group sums of positive signed ranks. They carried out some Monte Carlo studies in order to assess how well the model approximates data generated from normal populations. The choice of F(0) is obtained by forcing F(w) not to have a jump discontinuity at the origin. Consider the alternatives:

Model II : HaII : H(w) = G^A(w), for all w, A > 0, A ≠ 1,   (3.9.17)

where A is specified, F(0) = λ0, λ0 specified. Weed (1968) has considered the alternative

Model I : HaI : H(w) = 1 − [1 − G(w)]^A, for all w, A > 0, A ≠ 1,   (3.9.18)

where A is specified, F(0) = A/(1 + A).

Notice that if X = −V for V < 0 and Y = V for V ≥ 0, and if X and Y have conditional d.f.'s (G and H respectively) satisfying (3.9.17), then the conditional d.f.'s of −X and −Y would satisfy (3.9.18) provided λ0 = A/(1 + A); the converse statement is also true. Thus, the proof for finite termination would essentially be the same for both models. (See Weed, Bradley and Govindarajulu (1974).) In the following we shall present the main results for Model II. Let X1, ..., Xm denote the absolute values of those V's that are negative and let Y1, ..., Yn denote the nonnegative V's, m + n = t. Notice that m is binomially distributed with parameters t and λ = F(0), 0 < λ < 1. Let the combined sample of X's and Y's be denoted by W1, ..., Wt and the ordered combined sample by Wt1 < ⋯ < Wtt. Let Gm and Hn respectively denote the empirical d.f.'s of X1, ..., Xm and Y1, ..., Yn. Further, following Savage (1959), let Z = (Z1, ..., Zt), where Zi = 1 or 0 according as Wti corresponds to a negative or nonnegative V. Also, let

(3.9.19)

where z denotes a specified value for the rank order. The SPRT for testing H0 against HaI or HaII is given by:

(i) Take one more observation if B < Λt < A, (ii) accept H0 if Λt ≤ B, and (iii) reject H0 if Λt ≥ A, t = 1, 2, ..., where 0 < B < 1 < A are suitable constants (independent of t). Let T denote the stopping variable for the above SPRT.

Lemma 3.9.2 With the above notation and assumptions we have

Λt = 2^t t! λ0^m (1 − λ0)^n A^n ∏_{i=1}^t [m Gm(Wi) + A n Hn(Wi)]⁻¹   for Model II,

Λt = 2^t t! [A/(1 + A)]^t ∏_{i=1}^t [m{1 − Gm(Wi)} + A n{1 − Hn(Wi)}]⁻¹   for Model I.   (3.9.20)

Towards the finiteness of the stopping time we have the following result. Let I(x; z) be as defined in the two-sample case (i.e., I(x; z) = 1 if x ≤ z and zero elsewhere),

W(z) = λ G(z) + A (1 − λ) H(z),   λ = F(0),   (3.9.21)

and

U(V) = ln 2 − 1 + ln[A(1 − λ0)] − ln[λ + A(1 − λ)] + ln W(X) + ln W(Y) − ln W(|V|)
     − ∫ ln W(z) d{λG + (1 − λ)H}(z) − (1 − λ)(A − 1) ∫_0^∞ [λ G(z)/W(z)] dH(z).   (3.9.22)

Under H0, U(V) simplifies to (since λ = 1/2 and G = H)

−2 ln(1 + A) + ln[ · ].   (3.9.23)

Then combining the main results of Weed, Bradley and Govindarajulu (1974) and Govindarajulu (1984) we have

Result 3.9.6 If P(U(V) = 0) ≠ 1, then the rank order SPRT terminates finitely with probability one and the stopping time has a finite moment-generating function.

Remark 3.9.1 Suppose λ0 is not specified and we estimate it by λ̂t = m/t and base the sequential procedure on the resulting likelihood ratio. This modified rank order SPRT will also have the property given in Result 3.9.6.

Miller (1970) has proposed a sequential signed rank test for symmetry. Let X1, X2, ... be a sequence of i.i.d. random variables having density f(x; θ) which is symmetric about θ. We wish to test H0 : θ = 0 against the alternative H1 : θ ≠ 0. Notice that the hypothesis concerns the location of symmetry, and symmetry is part of the assumption under both H0 and H1. Thus tests of this hypothesis are different from the test procedures developed earlier in Section 3.9.4. Let Ri,n denote the rank of |Xi| among |X1|, ..., |Xn|, and let Sn denote the sum of the ranks of the positive X's. That is,

Sn = Σ_{i=1}^n I(Xi) Ri,n,

where I(x) = 1 if x > 0 and zero otherwise. Miller's (1970) procedure is as follows: Continue sampling as long as

(i) |Sn − n(n + 1)/4| < y(α, n0) [n(n + 1)(2n + 1)/24]^{1/2}   and   (ii) n < n0.

Stop sampling as soon as (i) or (ii) is violated. If (i) is violated, decide in favor of H1. If (ii) is violated and not (i), decide in favor of H0. Here n0 and α are selected by the investigator. These determine y(α, n0) as follows. Let

For the rejection boundary on Sn − n(n + 1)/4 defined by ±y(α, n0) [n(n + 1)(2n + 1)/24]^{1/2}, the test will decide in favor of H1 if and only if Y*_{n0} ≥ y(α, n0). Thus, y(α, n0) should be the upper α-percentile of the distribution of Y*_{n0}, that is, P{Y*_{n0} ≥ y(α, n0)} = α. The behavior of the sequence {Sn} for n = 1, 2, ..., n0 can be approximated by the Wiener process. In particular, P{Y*_{n0} ≥ y} can be approximated by the probability that a Wiener process crosses a square root barrier. Miller (1970) has carried out Monte Carlo studies on the distribution of Y*_{n0} for various values of n0 and has obtained the values of y(α, n0). These are presented in Table 3.9.3, where K denotes the number of replications for each n0.

Table 3.9.3⁵ Values of y(α, n0)

α\n0    10     15     20     25     30     40     50     60
.10    2.02   2.16   2.20   2.22   2.28   2.33   2.37   2.39
.05    2.20   2.39   2.40   2.46   2.55   2.55   2.62   2.62
.01    2.55   2.75   2.83   2.91   2.93   3.03   3.03   3.07
K/10³   2      1      3      1      3      2      3      6
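Miller's stopping rule is easy to code from the boundary above. The following sketch is my own, not the book's; the simulated normal data are for illustration, and the critical value y(.01, 30) = 2.55 is taken from Table 3.9.3. It recomputes the signed-rank sum at each stage:

```python
import random

def miller_test(xs, y, n0):
    """Miller's truncated sequential signed rank test: decide 'H1' if
    |S_n - n(n+1)/4| >= y * [n(n+1)(2n+1)/24]^{1/2} for some n <= n0,
    otherwise 'H0' at the truncation point n0."""
    data = []
    for n, x in enumerate(xs, start=1):
        data.append(x)
        order = sorted(range(n), key=lambda i: abs(data[i]))  # ranks of |X_i|
        s = sum(r + 1 for r, i in enumerate(order) if data[i] > 0)
        centre = n * (n + 1) / 4
        spread = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
        if abs(s - centre) >= y * spread:
            return "H1", n
        if n >= n0:
            return "H0", n
    return "H0", len(data)

random.seed(7)
y_crit, n0 = 2.55, 30  # y(.01, 30) from Table 3.9.3
shifted = [random.gauss(1.5, 1.0) for _ in range(n0)]   # clear positive shift
null_data = [random.gauss(0.0, 1.0) for _ in range(n0)]  # symmetric about 0
print(miller_test(shifted, y_crit, n0))
print(miller_test(null_data, y_crit, n0))
```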

The truncation point n0, especially in medical trials, is dictated by limitations on time, money, number of patients, etc. Since Sn denotes the sum of the ranks of the positive X's among the absolute X's,

Miller (1970), by means of Monte Carlo studies, tabulates the power and average sample number of his procedure for the translation alternatives of the double exponential distribution and concludes that its power is reasonable and the ASN is less than or equal to n0 (equal to n0 for large n0 and small values of the shifts in location). Miller (1970) also discusses the "inner acceptance boundary."

⁵Reproduced from the Journal of the American Statistical Association Vol. 65, copyright (1970) by the American Statistical Association. All rights reserved.

Let sgn(x) = 1 if x ≥ 0 and = −1 if x < 0. Also, let Si = sgn(Xi) Rii, where Rji denotes the rank of |Xj| among |X1|, ..., |Xi|. Then Reynolds (1975) has shown that the following statements are equivalent:

(i) S1, S2, S3, ... are mutually independent,

(ii) P(Sn = sn) = 1/(2n) for all n ≥ 1, where sn is a non-zero integer in [−n, n],

(iii) F(−x)[1 − F(0)] = F(0)[1 − F(x)], x ≥ 0,
(iv) |X1| and sgn(X1) are independent, and
(v) Rnn and sgn(Xn) are independent for all n ≥ 1.
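Under symmetry about zero the sequential signed ranks Si = sgn(Xi)Rii are independent, with Sn uniform on the nonzero integers of [−n, n]. A simulation sketch (mine, not from the book) checks the uniformity for n = 3:

```python
import random
from collections import Counter

def sequential_ranks(xs):
    """S_i = sgn(X_i) * (rank of |X_i| among |X_1|, ..., |X_i|)."""
    out = []
    for i in range(len(xs)):
        r = sum(1 for j in range(i + 1) if abs(xs[j]) <= abs(xs[i]))
        out.append(r if xs[i] >= 0 else -r)
    return out

random.seed(3)
reps = 60000
counts = Counter()
for _ in range(reps):
    s3 = sequential_ranks([random.gauss(0, 1) for _ in range(3)])[-1]
    counts[s3] += 1
# Under symmetry about zero, S_3 is uniform on {-3, -2, -1, 1, 2, 3}.
for k in sorted(counts):
    print(k, round(counts[k] / reps, 3))
```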

Reynolds (1975) proposes a sequential procedure for testing symmetry that is based on the test statistic

The signed Wilcoxon rank statistic is

Sn = Σ_{i=1}^n I(Xi) Ri,n,

on which Miller’s (1970) procedure is based. Writing

where

One can easily compute

θ = E(φjj) = 1 − 2F(0)

where τ² = (1/3) + 6γ − 5θ². If f denotes the density of F, then we will be interested in testing H0 : f is symmetric about zero against H : f is symmetric about some δ ≠ 0. Then if n0 is the upper bound on the sample size, Reynolds' (1975) procedure is as follows: If the experiment has proceeded up to stage n, reject H0 if Zn ∉ (b, a) with b < 0 < a; if Zn ∈ (b, a) and n < n0, take one more observation. If n = n0 and Zn ∈ (b, a), stop sampling and accept H0. In case we are interested in one-sided alternatives, set either b = −∞ or a = ∞ according as δ > 0 or δ < 0. If the value of the test statistic is close to zero in the two-sided test, then as n approaches n0, a point is reached from which it is not possible to reach the rejection boundary irrespective of the values of the remaining Si's. This leads us to the use of an inner acceptance boundary that enables us to accept H0 at an early stage. At any stage n1 < n0 the maximum amount that Zn can increase or decrease while taking the remaining n0 − n1 observations is

the sum of the corresponding weights over i = n1 + 1, ..., n0. Thus, if for any n ≤ n0, Zn lies above b plus this maximum and below a minus this maximum, then the rejection boundary can no longer be reached, and sampling may stop with acceptance of H0.

Asymptotic Results

Reynolds (1975) shows that asymptotically the test statistic behaves like a Brownian motion process and derives asymptotic expressions for the power and ASN functions that are based on the Brownian motion having truncated linear barriers.

One-sided Alternatives. We have

α = P(reject H0 | H0) ≈ 2Φ(c),   c = −a/(τ n0^{1/2}),

P(reject H0 | δ) ≈ exp(2aδ/τ²) Φ((−δn0 − a)/(τ√n0)) + Φ((δn0 − a)/(τ√n0)).

Let N denote the stopping time of the sequential procedure. Then

Two-sided Alternatives

Using Anderson's (1960) results and considering only the first term in the infinite series expressions for the power and the ASN functions of the continuous-time sequential procedure, Reynolds (1975) obtains, after setting b = −a,

Reynolds (1975), via simulation methods, compares his procedure, for double exponential shift alternatives, with the Wilcoxon signed rank sequential procedure given by

SRn = Σ_{i=1}^n sgn(Xi) Ri,n,

which is not quite Miller's (1970) test statistic, since Miller does not include the ranks of the observations that are equal to zero. Reynolds surmises that the procedure based on SRn and his procedure are equivalent for reasonable values of n0 and alternatives that are not too far from the null hypothesis. However, the SSR statistic (i.e., Reynolds' statistic) is easier to use, and its null distribution is easier to generate, because the Si (sequential ranks) are independent under the null hypothesis. Furthermore, the Brownian motion approximation seems to over-estimate the probability of rejecting H0 for all values of n0.

Remark 3.9.2 If the underlying density f is symmetric about some positive constant, or if f has median zero but is skewed to the right, then E(Zn) is positive. Hence the sequential procedure based on Zn could also be used to test the hypothesis that f is symmetric about zero against the alternative that the distribution is symmetric with shifted mean, or is skewed with zero median.

3.10 Appendix: A Useful Lemma

Lemma 3.A.1

where

s: = - no - 1 n i=l i=no+l Proof.

i=l


3.11 Problems

3.2-1 For the following data carry out the sequential t-test for H0 : θ = 5.0 vs. H1 : θ ≥ 5.0 + 0.2σ (σ unknown) with α = β = 0.05:

5.4,5.3,5.2,5.0,5.4,5.9,5.4,5.1,5.4,5.2,5.7,5.9,5.0,5.0

Hint:

i=l i=l where

3.2-2 For the following data carry out the sequential t²-test for H0 : θ = 5.0 vs. H1 : |θ − 5.0| > 0.2σ (σ unknown) with α = β = 0.05.

5.4, 5.3, 5.2, 4.5, 5.0, 5.4, 3.8, 5.9, 5.4, 5.1, 5.4, 4.1, 5.2, 4.8, 4.6, 5.7, 5.9, 5.8, 5.0, 5.0.

3.2-3 For the data in Problem 3.2-1, using Hall's procedure test H0 : θ = 5.0 vs. H1 : θ = 5.2 with m = 9, α = β = 0.05.

3.2-4 Let X be normally distributed with mean θ and unknown variance σ². We wish to test H0 : θ = 0 versus H1 : θ > δσ where δ > 0. The following is a sequence of independent observations from the above population: −1.3, .34, −.41, −.06, .94, 1.44, −.22, −.34. Carry out the sequential t-test (see Equation (3.2.4)) for δ = 1/4 and for 1/2, starting with n = 2, 3, etc., using α = β = 0.1.

3.3-1 For the following data with p = 4, carry out a sequential F-test with λ0 = 0.25 and α = β = 0.05. The following data from Olson and Miller (1958) (or see Sokal and James (1969))⁶ are measurements of four random samples of domestic pigeons. The measurement (in millimeters) is the length from the anterior end of the narial opening to the tip of the bony beak.

⁶Olson, E. C. and Miller, R. L. (1958), Morphological Integration, University of Chicago Press, p. 317. Sokal, R. R. and James, R. F. (1969), Biometry, W. H. Freeman and Co., San Francisco, p. 251, Problem 9.5.

Samples
 1    2    3    4
5.4  5.4  5.2  5.1
5.5  5.1  5.1  4.8
5.3  4.1  5.1  5.7
4.7  4.5  4.6  4.9
5.2  5.2  4.7  5.1
4.8  5.3  5.4  6.0
4.5  4.8  5.0  4.7
4.9  4.8  5.5  4.8
5.0  4.6  5.9  6.5
5.9  5.3  5.2  5.7
5.4  5.7  5.3  5.1
5.2  5.4  5.0  5.5
3.8  5.9  6.0  5.4
4.8  4.9  4.8  5.8
5.9  5.8  5.2  5.8
4.9  4.7  5.1  5.6
5.4  5.0  6.6  5.8
6.4  4.8  4.4  5.5
5.1  5.0  5.6  5.9
5.1  5.0  6.5  5.0

For each of the following problems carry out a sequential likelihood ratio test procedure.

3.4-1 Let 6, 15, 3, 12, 6, 21, 15, 18, 12, ... denote a sequence of independent observations from the normal population having unknown mean μ and variance σ². We wish to test sequentially H0 : μ = 8 against H1 : μ = 14.

3.4-2 As part of a learning experiment twelve subjects recited a series of digits. The number of correct responses out of 180 is given below for two groups of subjects, one group having had 20 practice trials and the other group 30 practice trials.*

Group 1: 169  97  16 113  61  77
Group 2: 100 141 151 169 100 166

Assuming that the proportion of correct responses is approximately normal and that the two groups have the same variability, test sequentially H0 : p1 = p2 = 1/2 against H1 : p1 = 1/2, p2 = 3/4.

3.4-3 The following data* give the hemoglobin content in milligrams in six patients having pernicious anemia before and after three months' treatment with vitamin B12.

Patient   1     2     3     4     5     6
Before  12.2  11.3  14.7  11.4  11.5  12.7
After   13.0  13.4  16.0  13.6  14.0  13.8

*The data in Problems 3.4-2 and 3.4-3 are taken from Kurtz, Thomas E. (1963), Basic Statistics, Prentice Hall, Englewood Cliffs, N.J., p. 257. The data in 3.4-2 were provided to Dr. Kurtz by Professor Donald C. Butler, and the data in 3.4-3 were taken from the Southern Medical Journal, Vol. 43 (1950), p. 679. Reprinted by permission of Prentice Hall Inc., Englewood Cliffs, N.J. and the Southern Medical Journal.

Assume that the data are normal and let μd denote the expectation of the difference between "before" and "after" hemoglobin content. We wish to test sequentially H0 : μd = 0 against H1 : μd = −1.

3.4-4 Let 40, 47, 35, 60, 54, 42, 66, 51 denote independent observations from the exponential density σ⁻¹ exp[−(x − θ)/σ], x ≥ θ. Test sequentially H0 : σ = 5 against H1 : σ = 8.

[Hint: Note that n^{1/2}(X(1) − θ) converges to zero in probability as n → ∞, where X(1) = min(X1, ..., Xn).]

3.4-5 Derive an asymptotic expression for the OC function of Cox’s likelihood ratio test procedure.

3.4-6 Derive an asymptotic expression for the ASN function of Cox’s likelihood ratio test procedure.

3.4-7 Let X1, X2, ... be an i.i.d. sequence of normal variables having mean μ and unknown variance σ². We wish to test H0 : μ = μ0 against H1 : μ = μ1 with σ unknown. Obtain Bartlett's sequential likelihood ratio test procedure for testing H0 against H1.

3.4-8 Let θ1 and θ2 denote two binomial probabilities. Let γ = θ2 − θ1 and δ = θ1. Set up Cox's likelihood ratio test procedure for testing H0 : γ = γ0 versus H1 : γ = γ1, where γ0 and γ1 are some specified constants.

3.4-9 With the notation of Problem 3.4-8, let γ = ln[θ2/(1 − θ2)] − ln[θ1/(1 − θ1)] and δ = θ1/(1 − θ1). Set up Cox's likelihood ratio test procedure for testing H0 : γ = γ0 versus H1 : γ = γ1, where γ0 and γ1 are some specified constants.

3.4-10 Let F(x; θ, σ) = 1 − exp[−(x − θ)²/σ], x ≥ θ. Set up the sequential likelihood ratio procedure for testing H0 : σ = σ0 against H1 : σ = σ1 (with σ1 > σ0) when θ is unknown.

Using γ1 = γ2 = γ3 = .05 and q = .01, carry out the Sobel-Wald sequential procedure in Problems 3.5-1, 3.5-2 and 3.5-3.

3.5-1 The binomial distribution with H0 : θ = 1/4, H1 : θ = 1/2, and H2 : θ = 3/4.

3.5-2 The Poisson distribution with H0 : θ = 1, H1 : θ = 2, and H2 : θ = 3.

3.5-3 The negative exponential distribution having density f(x; θ) = θ⁻¹ e^{−x/θ} for x > 0, with H0 : θ = 1, H1 : θ = 2, and H2 : θ = 3.

3.8-1 Carry out the Darling-Robbins power-one test of H0 : θ ≤ 4.5 versus H1 : θ > 4.5, with δ = 0.5, m = 1 and σ² = 9, for the following normal data:

5.4, 5.3, 5.2, 5.0, 5.4, 5.9, 5.4, 5.1, 5.4, 5.2, 5.7, 5.9, 5.0, 5.0

3.8-2 Carry out Barnard's power-one test for the Bernoulli problem with H0 : p < 0.4 and H1 : p ≥ 0.4 and α = .05 for the data (X1, X2, ...) = (1, 1, 1, 0, 1, 1, 0, 1, 1).

3.9-1 For the following pairs of data (Xi, Yi) carry out a sequential sign test of H0 : G(x) = F(x) for all x versus H1 : G = F², using α = β = .05:

(5.4, 5.2) (5.3, 4.7) (5.2, 4.8) (4.5, 4.9) (5.0, 5.9) (5.4, 5.2) (3.8, 4.8) (5.9, 4.9) (5.1, 5.0) (5.4, 5.1) (4.1, 4.5) (5.7, 5.4) (5.9, 4.9) (5.8, 4.7) (5.0, 4.8)

[Hint: Let p = P(X < Y). Then the problem of testing H0 versus H1 is equivalent to testing H0′ : p = 1/2 versus H1′ : p = 2/3.]

3.9-2 Carry out a two-sample sequential rank order SPRT with α = β = .05 for the data in Problem 3.9-1.

Chapter 4

Sequential Estimation

4.1 Basic Concepts

In some applications, formulation of a problem as a hypothesis-testing one would be artificial. In some of these instances, estimation seems more appropriate. In the fixed-sample-size situation there is a close connection between acceptance regions and confidence regions, whereas that analogy does not hold in the sequential situation. Hence, there is a need for a theory of sequential estimation. The stopping rules in sequential testing may not be meaningful in sequential estimation. In this section we formulate (1) the general problem involved in sequential estimation and (2) certain stopping rules. Let X1, X2, ... be a sequence of independent random variables having common pdf f(x; θ). Let r[δ(x); θ] denote the loss resulting from making a terminal decision δ(x) when θ is the true value of the parameter. We should add to the loss the cost of experimentation, namely C(N), the cost of taking N observations (where N is a random variable). The statistician's task lies in choosing a stopping rule and a terminal decision rule (an estimate for θ). Then according to Lehmann (1950, Section 2.4) the statistician might be faced with the following situations:

(i) Limited resources (forcing a bound on the expectation of the total cost of experimentation). He then seeks to minimize the risk function (the expectation of the loss function) subject to an upper bound n0 on the expectation of C(N).

(ii) Limit on accuracy (a bound on the risk function). He then seeks to minimize the expectation of C(N) subject to an upper bound ρ on the risk function.

(iii) Both losses are economically important. Here he seeks to choose a terminal decision that will minimize the weighted sum of the risk function and E[C(N)].


In general, there does not seem to be a sequential procedure that satisfies (iii) uniformly in θ unless the criterion is modified. Such a modification is the Bayesian criterion of optimality. Regarding (i), the Cramér-Rao inequality implies that an optimal sequential procedure is one based on a fixed sample size n0 whenever the variance of its uniformly minimum variance unbiased estimator equals the Cramér-Rao lower bound. (This is the case whenever the unbiased estimator in a fixed-sample-size procedure has a density belonging to the exponential family.) If the criterion is (ii), there is a justification for a sequential procedure even in the case of exponential-type distributions. This has been shown by DeGroot (1959) and Wasan (1964) in the binomial case. We take this up in Section 4.2.

4.2 Sufficiency and Completeness

Let X₁, X₂, ... be a sequence of i.i.d. random variables having common pdf f(x; θ). We wish to estimate θ by some function δ(X₁, ..., X_i), while using a stopping rule which is closed (that is, for every θ, P(N ≤ n) → 1 as n → ∞, although not necessarily uniformly in θ). The sample space is E₁ + E₂ + ···, where E_i is contained in R^i and consists of those points (x₁, ..., x_i) which serve as stopping points. Again, N denotes the random number of observations taken. Throughout, we assume that the relevant conditional probabilities exist. Let T_n = T(X₁, ..., X_n) be a sufficient statistic for the joint density of X₁, ..., X_n.

Definition 4.2.1 The sequence (TI,T2, ...) is called a sufficient sequence for the sequential model. Then we have the following result of E. Fay (see Lehmann (1950)).

Result 4.2.1 (E. Fay). If, for each n, T_n = T(X₁, ..., X_n) is a sufficient statistic for θ in the fixed sample X₁, ..., X_n, then (T_N, N) is a sufficient statistic for θ in the sequential case.

Proof. This theorem was proved by Blackwell (1947) under the assumption that the stopping rule depends only on the T_N's. Here, we assume only the existence of conditional probabilities. Let E be any measurable set in ∪_{i=1}^∞ E_i. Then write E = ∪_{i=1}^∞ A_i with A_i ⊂ E_i, i = 1, 2, ..., where E_i = {N = i} = set of stopping points in R^i. One can also look upon the E_i as cylindrical sets in R^∞. Consider

P(E | N = n, T_n = t)

(4.2.1)

Now E ∩ E_n and E_n are sets in R^n. Since T_n is sufficient for θ on the basis of X₁, ..., X_n, both the numerator and the denominator of the above expression are free of θ. Hence P(E | N = n, T_n = t) is free of θ. Thus (T_N, N) is sufficient for θ in the sequential case. ∎

Let {T_n} be a sufficient sequence for the sequence of random variables X₁, X₂, .... Then the SPRT at stage n depends only on the observed value of T_n. Even in sequential estimation, it is reasonable to base our decision on T_n if experimentation is stopped at the nth stage. Thus we are led to the definition of transitivity. A sequence of sufficient statistics {T_n} is said to be a transitive sequence if for every n ≥ 1 and for each θ ∈ Θ, the conditional distribution of T_{n+1} given (X₁, ..., X_n) = (x₁, ..., x_n) is equal (in distribution) to the conditional distribution of T_{n+1} given that T_n(X₁, ..., X_n) = t_n(x₁, ..., x_n). In other words, transitivity of {T_n} implies that all the information concerning T_{n+1} contained in X^n = (X₁, ..., X_n) is also contained in the function T_n(X^n). Bahadur (1954) showed that if a sufficient and transitive sequence {T_n} exists, then any closed sequential procedure based on {X_n} is equivalent to a procedure which at stage n is based only on T_n. In the case of i.i.d. random variables, {T_n} is transitive if T_{n+1}(X^{n+1}) = ψ_n{T_n(X^n), X_{n+1}} for every n ≥ 1. The exponential family has this property. Next, we shall consider completeness of (T_N, N). Assume that T_m is complete for every fixed m.

Definition 4.2.2 The family of distributions of (T_N, N) is said to be complete if E_θ[g(T_N, N)] = 0 for all θ implies that g(t_n, n) = 0 almost everywhere for all n ≥ 1.

Definition 4.2.3 The family of distributions of (T_N, N) is said to be boundedly complete if for every bounded g(t_n, n), E_θ[g(T_N, N)] = 0 for all θ implies that g(t_n, n) = 0 almost everywhere for all n ≥ 1.

Notice that bounded completeness is clearly a weaker property than completeness. Lehmann and Stein (1950) have found a general necessary condition for completeness of the statistic (N, T_N). It is of interest to explore the stopping rules for which (N, T_N) is complete. Lehmann and Stein (1950) have examined this question where the X's are normal (θ, 1), Poisson, or rectangular on (0, θ). In the case of normal (θ, 1), (N, T_N) is complete if N = m (here T_m = Σ_{i=1}^m X_i). In the binomial and Poisson cases, T_m = Σ_{i=1}^m X_i. Let S_m be the set of values of T_m for which we stop at the mth observation. A necessary and sufficient condition for (N, T_N) to be complete is that the S_i's be disjoint intervals, each lying immediately above the preceding one. For example, if the stopping rule is to continue sampling until T_m exceeds c (a given value), then (N, T_N) is not complete.

Similarly one can obtain a necessary and sufficient condition for the rectangular case (see Lehmann and Stein, 1950, Example 2).

Example 4.2.1 (Binomial Case). Let P(X_i = 1) = θ and P(X_i = 0) = 1 − θ for some 0 < θ < 1. Then T_m = X₁ + X₂ + ··· + X_m is sufficient for θ when m is fixed. Suppose we are given a closed stopping rule which depends only on the T_m's. For a given stopping rule, each point (m, T_m) can be categorized as (i) a stopping point, (ii) a continuation point, or (iii) an impossible point. The sample space of points (N, T_N) consists of all stopping points. Since we have a closed stopping rule, a continuation point is not a value that can be assumed by the sufficient statistic (N, T_N). We are interested in estimating θ unbiasedly. Since X₁ is unbiased for θ, applying the Rao-Blackwell Theorem¹, we find that

is unbiased for θ and that var(Y) ≤ var(X₁).

Each sample (X₁, X₂, ..., X_n) can be viewed as a path starting at the origin O = (0, 0) and ending at a stopping point (n, T_n). The ith step of such a path is either to the point immediately on the right if X_i = 0 or to the point immediately above that if X_i = 1. Obviously, a path cannot pass through a stopping point before reaching (n, T_n). The probability of any single path from O to the stopping point (m, t) is θ^t (1 − θ)^{m−t}. Let π(m, t) denote the number of paths leading to (m, t) starting at O, and π*(m, t) be the number of paths leading to (m, t) starting at O* = (1, 1). It would be helpful if the reader draws diagrams. Then, we have

That is,

Y = π*(N, T_N) / π(N, T_N).    (4.2.2)

Let us consider some special stopping rules:

(i) Sample of fixed size m.

¹The Rao-Blackwell Theorem states that if T is sufficient for θ and U is any unbiased estimator for θ, then V(t) = E(U|T = t) is unbiased for θ and var(V) ≤ var(U).

Then

and consequently y = t/m.

(ii) Continue sampling until c successes are obtained. Then

Hence, y = (c - 1) / (m - 1).

(iii) Any stopping rule with O* = (1, 1) as a stopping point. Then

Y(m, t) = 1 if (m, t) = O*, and 0 otherwise.

That is, Y = X₁. Notice that rule (ii) with c = 1 reduces to rule (iii).
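The ratio in (4.2.2) can be checked numerically for these rules. Below is a minimal sketch (the function name and the dynamic-programming scheme are mine, not from the text): it counts admissible paths to a stopping point from O and from O* = (1, 1), and reproduces Y = t/m for rule (i) and Y = (c − 1)/(m − 1) for rule (ii).

```python
def unbiased_estimate(m, t, is_stop):
    """Y = pi*(m,t) / pi(m,t) at the stopping point (m, t), cf. (4.2.2).

    pi counts sample paths from (0, 0) to (m, t) that hit no earlier
    stopping point; pi* counts those that additionally pass through
    O* = (1, 1), i.e. begin with a success.  is_stop(n, s) says whether
    (n, s) is a stopping point of the given closed stopping rule.
    """
    def count(n0, s0):
        paths = {s0: 1}                   # paths[s] = # admissible paths at (n, s)
        for n in range(n0 + 1, m + 1):
            new = {}
            for s, cnt in paths.items():
                if is_stop(n - 1, s):     # the path already stopped here
                    continue
                for s2 in (s, s + 1):     # next trial: failure / success
                    new[s2] = new.get(s2, 0) + cnt
            paths = new
        return paths.get(t, 0)

    return count(1, 1) / count(0, 0)

# rule (i): fixed sample size m = 5, observed t = 2  ->  Y = t/m = 0.4
y_fixed = unbiased_estimate(5, 2, lambda n, s: n == 5)
# rule (ii): sample until c = 3 successes, stopping at m = 7  ->  Y = 2/6
y_inverse = unbiased_estimate(7, 3, lambda n, s: s == 3)
```

For rule (ii) the counts reduce to π = C(m−1, c−1) and π* = C(m−2, c−2), whose ratio is (c − 1)/(m − 1), in agreement with the text.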

(iv) Curtailed simple sampling. As in Section 1.2, we accept a lot if fewer than c defectives appear in a sample of size s, and we reject the lot if c or more defectives are discovered. Thus the full sample of size s need not be taken, since we can stop as soon as c defectives or s − c + 1 non-defectives are observed. It is customary, however, to inspect all the items in the sample even if the final decision to either reject or accept the lot is made before all sample items are inspected. One reason for this is that an unbiased estimate for θ cannot be found if a complete inspection of the sample is not taken. The best unbiased estimate of θ was provided by Girshick, Mosteller and Savage (1946):

Y(x, c) = (c − 1)/(c + x − 1),

where x is the number of non-defective items examined; this is the unique unbiased estimate along the line corresponding to rejection with c > 1 defectives. Further, the unique unbiased estimate along the line corresponding to acceptance for c > 1 is

where m is the number of defectives observed. Thus, the unique unbiased estimate is equal to the number of defectives observed divided by one less than the number of observations.

Girshick, Mosteller and Savage (1946) give an example of a general curtailed double-sampling plan. They also provide necessary and sufficient conditions for the existence of a unique unbiased estimate of θ. Sometimes we may be interested in estimating unbounded functions of parameters like 1/θ. Hence, completeness is more relevant than bounded completeness. DeGroot (1959) has considered unbiased estimation of a function of θ, namely h(θ). From the Cramér-Rao inequality (see Eq. 4.3.7) we have

(4.2.3)

where g denotes an unbiased estimator of h(θ).

Definition 4.2.4 (DeGroot, 1959). A sampling plan S and an estimator g are said to be optimal at θ = θ₀ if, among all procedures with average sample size at θ₀ no larger than that of S, there does not exist an unbiased estimator with smaller variance at θ₀ than that of g.

If a particular estimator for a given sampling plan attains the lower bound in (4.2.3) for its variance at θ₀, then it is concluded that the estimator and the sampling plan are optimal at θ₀, and the estimator is said to be efficient at θ₀. DeGroot (1959) has shown that the (fixed) single-sample plans and the inverse binomial sampling plans are the only ones that admit an estimator that is efficient at all values of θ. For the inverse sampling plan, DeGroot (1959) has given an explicit expression for the unique unbiased estimator of h(1 − θ). The stopping points of an inverse sampling plan are the totality of {γ | T(γ) = c}, where T denotes the number of defectives at the point γ. Then, for each nonnegative integer k there exists a unique stopping point γ_k such that N(γ_k) = c + k. Since

P(N = c + k) = (k+c−1 choose k) θ^c q^k,  q = 1 − θ.    (4.2.4)

Then, for any estimator g

(4.2.5)
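As a numerical sanity check (my own illustration, not from the text), one can verify that the probabilities (4.2.4) sum to one, so the inverse sampling plan is closed, and that g(γ_k) = (c − 1)/(k + c − 1) is unbiased for θ:

```python
from math import comb

c, theta = 3, 0.4
q = 1.0 - theta
# P(N = c + k) = C(k+c-1, k) * theta^c * q^k, k = 0, 1, ...   (4.2.4)
# (the series is truncated; the tail is geometrically small)
p = [comb(k + c - 1, k) * theta**c * q**k for k in range(2000)]

total = sum(p)                                     # should be ~ 1 (closed plan)
Eg = sum((c - 1) / (k + c - 1) * pk                # E[g], g(gamma_k) = (c-1)/(k+c-1)
         for k, pk in enumerate(p))                # should be ~ theta
```

The same loop with other integrands evaluates the expectation (4.2.5) of any estimator g over the stopping points γ_k.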

Result 4.2.2 (DeGroot, 1959). A function h(q) is estimable unbiasedly if and only if it can be expanded in a Taylor series in the interval |q| < 1. If h(q) is estimable, then its unique unbiased estimator is given by

(4.2.6)

Proof. h(q) can be expanded in a Taylor series in the given interval if and only if h(q)/(1 − q)^c can be expanded. Then, suppose that

That is,

h(q) = θ^c Σ_{k=0}^∞ b_k q^k,

and taking

yields an estimator g with E(g) = h(q). Suppose now that h(q) is estimable unbiasedly. Then, there exists an estimator g such that

or

Hence

g(γ_k) = b_k / (k+c−1 choose k).

The uniqueness of g(γ_k) follows from the uniqueness of the Taylor series expansion, which is the basis of the completeness of this sampling plan. This completes the proof. ∎

It is often possible to find the expectation of a given estimator in closed form by using the fact that if the series

f(q) = Σ_{k=0}^∞ a_k q^k    (4.2.7)

is differentiated m times within its interval of convergence, then

d^m f/dq^m = Σ_{k=m}^∞ [k!/(k − m)!] a_k q^{k−m}.    (4.2.8)

As an illustration of the technique involved, the variance of an unbiased estimator of θ and the moment-generating function of N will be determined.

Result 4.2.3 (DeGroot, 1959). Let g(γ_k) = (c − 1)/(k + c − 1), which is an unbiased estimator of θ for c ≥ 2. Then we have

E(g²) = [(c − 1)θ^c / q^{c−1}] [(−1)^{c−1} ln θ + Σ_{i=1}^{c−2} ( ··· )],    (4.2.9)

where q = 1 − θ, and

E(e^{tN}) = (θe^t)^c (1 − qe^t)^{−c},  t < ln(1/q).    (4.2.10)

Proof.

E(g²) = θ^c (c − 1)² Σ_{k=0}^∞ ··· = θ^c q^{2−c} [1/(c − 1)!] (d^{c−1}/dq^{c−1}) [Σ_k κ² q^k],

after using (4.2.8). Note that the constant term in the last series on the right side is taken to be zero; its value can be assigned arbitrarily since it does not appear in the derived series. Consequently,

(d^{c−1}/dq^{c−1}) [Σ_k κ² q^k] = (d^{c−2}/dq^{c−2}) [ ··· ln(1 − q) ··· ],

where

Using this completes the proof of (4.2.9). For t < ln(1/q),

E(e^{tN}) = (θe^t)^c (1 − qe^t)^{−c}.

This completes the proof for (4.2.10).

Remark 4.2.1.1 Haldane (1945) gives E(g²) in the form of an integral which, after repeated integration by parts, would yield (4.2.9).

Corollary 4.2.3.1 E(N) = c/θ, and var(N) = c(1 − θ)/θ², achieving the Cramér-Rao lower bound (4.2.3). Thus N is an efficient estimator of its expected value. Notice that g(γ_k) = (c − 1)/(k + c − 1) is not efficient in this sense. The efficient unbiased estimator of E(N) (with h(q) = c/(1 − q) in (4.2.6)) is given by

[c!/(k + c − 1)!] (c − k + 2)(c − k + 3) ··· (c + 1).
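The moments in Corollary 4.2.3.1 and the m.g.f. (4.2.10) can be checked numerically by summing the pmf (4.2.4) directly (the helper functions below are my own sketch):

```python
from math import comb, exp

def nb_mean_var(c, theta, kmax=2000):
    # E(N) and var(N) under P(N = c+k) = C(k+c-1, k) theta^c q^k (truncated sum)
    q = 1.0 - theta
    p = [comb(k + c - 1, k) * theta**c * q**k for k in range(kmax)]
    EN = sum((c + k) * pk for k, pk in enumerate(p))
    EN2 = sum((c + k) ** 2 * pk for k, pk in enumerate(p))
    return EN, EN2 - EN ** 2

def nb_mgf(c, theta, t, kmax=2000):
    # E(e^{tN}); valid for t < ln(1/q).  Summing (q e^t)^k keeps terms bounded.
    q = 1.0 - theta
    qt = q * exp(t)
    return (theta * exp(t)) ** c * sum(comb(k + c - 1, k) * qt ** k
                                       for k in range(kmax))

c, theta = 4, 0.3
EN, varN = nb_mean_var(c, theta)   # expect c/theta and c(1-theta)/theta^2
mgf = nb_mgf(c, theta, 0.1)        # expect (theta e^t)^c (1 - q e^t)^(-c)
```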

In the following we shall state DeGroot's (1959) theorem pertaining to the optimality of the single sample plans and the inverse binomial sampling plan.

Result 4.2.4 (DeGroot, 1959). For all stopping points γ such that N(γ) = n, any non-constant function of the form a + bT is an efficient estimator of a + bnθ, and these are the only efficient estimators. For γ such that T(γ) = c, any non-constant function of the form a + bN is an efficient estimator of a + bc(1/θ), and these are the only efficient estimators, where T denotes the number of defective items.

For the problem (ii) posed in Section 4.1 (that is, to choose a procedure δ(X) which minimizes E(N) subject to E[δ(X) − θ]² ≤ ρ), Wasan (1964) has shown that the fixed-sample procedure is admissible and minimax [see Wasan (1964, Theorem 1, p. 261)]. Consider the following symmetric curtailed sample procedure S*(k) whose stopping points are (k + i, k) and (k + i, i), i = 0, 1, ..., k − 1. Here

P(N = k + i, T_N = k) = (k+i−1 choose k−1) θ^k (1 − θ)^i,  i = 0, 1, ..., k − 1,    (4.2.11)

and

P(N = k + i, T_N = i) = (k+i−1 choose k−1) θ^i (1 − θ)^k,  i = 0, 1, ..., k − 1.    (4.2.12)

Then the unbiased estimate of 8 (proposed by Girshick et al., 1946) is given by

for i = 0, 1, ..., k − 1. Wasan (1964) studied the asymptotic optimality of g(γ_k) for large k. This is given in the following theorem.

Theorem 4.2.5 (Wasan, 1964). We have

(4.2.14) and

Hence g(γ_k) is asymptotically uniformly better than the fixed-sample-size procedure. Wasan (1964) also demonstrates that the fixed-sample-size procedure with m = 1/(2√c) has the smallest total risk, equal to √c = max_{0<θ<1} [cm + θ(1 − θ)/m], among all procedures having bounded E_θ(N). Thus, the fixed-sample procedure with m = 1/(2√c) is minimax and admissible for problem (iii) posed in Section 4.1.
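The minimax value is elementary calculus: with cost c per observation, the fixed-sample total risk is r(m) = cm + max_θ θ(1 − θ)/m = cm + 1/(4m), minimized at m = 1/(2√c) with value √c. A small numeric sketch (my own, treating m as continuous as in the text's asymptotics):

```python
from math import sqrt

def total_risk(m, c):
    # c*m plus the worst-case estimation risk; max over theta of theta(1-theta) is 1/4
    return c * m + 0.25 / m

c = 0.01
m_star = 1 / (2 * sqrt(c))                                  # = 5.0 here
best = total_risk(m_star, c)                                # should equal sqrt(c) = 0.1
grid = min(total_risk(m / 100, c) for m in range(1, 2001))  # crude grid search over m
```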

4.3 Cramér-Rao Lower Bound

Suppose the statistician is interested in solving problem (i) posed in Section 4.1, that the terminal decision is an estimate for θ, and that r(δ(x); θ) = [δ(x) − θ]². In this case, if we restrict ourselves to unbiased estimates of θ, we would be interested in lower bounds for the variance of such estimates. The Cramér-Rao inequality was extended to the sequential case by Wolfowitz (1947). Towards this we need the following lemma pertaining to a random sum of i.i.d. random variables.

Lemma 4.3.1. Let S_N = X₁ + X₂ + ··· + X_N, where the X_i are i.i.d.

(i) If E(N) < ∞ and E|X| < ∞, we have E(S_N) = E(N)E(X).

(ii) If E(X) = 0, E(X²) < ∞, and E(N) < ∞, then E(S_N²) = E(N)E(X²).

(iii) If E[g(X)] = E[h(X)] = 0, E[g²(X)] < ∞, E[h²(X)] < ∞ and P(X = 0) < 1, then

E[(Σ_{i=1}^N g(X_i))(Σ_{i=1}^N h(X_i))] = E(N) E[g(X)h(X)].

Proof. Notice that (i) and (ii) follow from Theorem 2.4.2. Part (iii) has been established by Lehmann (1950) under a different set of sufficient conditions. Consider

where Y_i = 1 if N ≥ i and zero otherwise. So, the left-hand expression

= Σ_{i=1}^∞ ··· + Σ_{i<j} ··· + Σ_{i>j} ··· ,

since for i < j, Y_j = 1 implies that Y_i = 1.

In order to interchange expectation and infinite summation, the series should be absolutely integrable. That is,

after applying the Cauchy-Schwarz inequality and using the independence of X_i and Y_i. From Theorem 2.2.1, we infer that there exists a ρ (0 < ρ < 1) such that

provided P (X = 0) < 1. Hence

Σ_{i=1}^∞ ··· < ∞. ∎
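Lemma 4.3.1(i), Wald's equation, is easy to illustrate by simulation (the random-walk example and function names are mine): for a ±1 walk with P(X = 1) = p, stopped at the first n with |S_n| ≥ b, the sample averages of S_N and of E(X)·N agree.

```python
import random

random.seed(0)

def walk_until_barrier(p, b):
    """Run S_n = X_1 + ... + X_n with X_i = ±1, P(X_i = +1) = p,
    stopping at the first n with |S_n| >= b; return (S_N, N)."""
    s = n = 0
    while abs(s) < b:
        s += 1 if random.random() < p else -1
        n += 1
    return s, n

p, b, reps = 0.6, 5, 100_000
sims = [walk_until_barrier(p, b) for _ in range(reps)]
ESN = sum(s for s, _ in sims) / reps     # Monte Carlo estimate of E(S_N)
EN = sum(n for _, n in sims) / reps      # Monte Carlo estimate of E(N)
EX = 2 * p - 1                           # E(X) = 0.2
# Wald's equation: ESN should be close to EN * EX
```

Note that {N = n} depends only on X₁, ..., X_n, so N is a legitimate stopping rule for the lemma.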

Wolfowitz (1947) has given a generalized version of Wald's equation [Lemma 4.3.1(i)] which is valid even if the underlying variables are dependent.

Lemma 4.3.2 (Wolfowitz, 1947). Let v_i = E(X_i | N ≥ i) exist for all i, where i runs over the positive integers for which P(N ≥ i) ≠ 0. Write v_i' = E(|X_i − v_i| | N ≥ i), i = 1, 2, ..., and assume that the series Σ_{i=1}^∞ (v₁' + ··· + v_i') P(N = i) converges. Then

E(S_N) = Σ_{i=1}^∞ v_i P(N ≥ i).    (4.3.2)

Proof. Consider E[Σ_{i=1}^∞ (X_i − v_i) Y_i], where Y_i = 1 if N ≥ i and zero elsewhere. Then

Σ_{i=1}^∞ E[(X_i − v_i) Y_i] = 0.

The interchange of integration and infinite summation is justified because

Σ_{i=1}^∞ ··· < ∞. ∎

Corollary 4.3.3.1 (Wolfowitz, 1947). Let the X_i be independent r.v.'s having different distributions. Suppose that all the v_i are equal, except perhaps for those i for which P(N ≥ i) = 0. Further assume that E(N) exists. Then (4.3.2) holds.

Notice that Corollary 4.3.3.1 is a special case of Lemma 4.3.2 and is a generalization of Lemma 4.3.1(i). If the random variables X_i are i.i.d., then v = E(X_i) and v' = E|X_i − v|, i = 1, 2, .... In the sequential sample space, let E₁ be the set of values of X₁ for which we stop after taking one observation. Let E_i be the set of values of X₁, X₂, ..., X_i for which we stop after taking i observations. Notice that (x₁, x₂, ..., x_i) ∈ E_i implies that (x₁, x₂, ..., x_j) ∉ E_j, j = 1, 2, ..., i − 1 (i = 1, 2, ...). The totality of stopping points is E₁ ∪ E₂ ∪ ···. Recall that N denotes the random number of observations taken. Then, if we define g(x₁, x₂, ..., x_N) = g(x₁, x₂, ..., x_i) whenever (x₁, x₂, ..., x_i) ∈ E_i (i = 1, 2, ...), we have

We assume that P_θ(∪_{i=1}^∞ E_i) = 1.

Theorem 4.3.1 (Wolfowitz, 1947). Let T_N = δ(X₁, X₂, ..., X_N) be an estimate of θ such that E_θ(T_N) = θ + b(θ). Suppose that differentiation underneath the summation and integral signs is permissible in E_θ(1) = 1 and

E_θ(T_N) = θ + b(θ), where b'(θ) exists. Then

var_θ(T_N) ≥ [1 + b'(θ)]² / { E_θ(N) E_θ[(∂ ln f(X; θ)/∂θ)²] }.    (4.3.4)

Proof. We have


That is, E_θ(S_N) = 0, where

S_N = Σ_{i=1}^N ∂ ln f(X_i; θ)/∂θ,    (4.3.5)

provided ∫(∂f/∂θ) dx = 0 and E_θ|∂ ln f/∂θ| < ∞. Differentiating E_θ(T_N) = θ + b(θ) similarly yields

However, from Lemma 4.3.1(ii) we have

provided E_θ[(∂ ln f/∂θ)²] < ∞. ∎

Remark 4.3.1.1 (Lehmann, 1950). If we restrict ourselves to sequential estimation procedures for which E_θ(N) ≤ n₀, and the regularity conditions of Theorem 4.3.1 hold for all such estimates, then for every unbiased estimate T_N of θ we have

var_θ(T_N) ≥ [n₀ E_θ{(∂ ln f(X; θ)/∂θ)²}]^{−1}.    (4.3.6)

In the case of a normal distribution with mean θ and variance unity, and in the binomial and Poisson cases, one can establish the validity of (4.3.6) for all unbiased estimation procedures with E_θ(N) ≤ n₀ provided there is an M such that P(N ≤ M) = 1. This additional restriction to bounded procedures is inconvenient from the theoretical point of view, although in practice it is no restriction at all since M can be taken fairly large.

Remark 4.3.1.2 If T_N is an unbiased estimator of h(θ), then one can analogously show [see, for instance, Wolfowitz, 1947] that

var_θ(T_N) ≥ [h'(θ)]² / { E_θ(N) E_θ[(∂ ln f(X; θ)/∂θ)²] }.    (4.3.7)

If θ = (θ₁, θ₂, ..., θ_r)' and if we are interested in unbiased estimation of a single component, say θ₁, then Wolfowitz has also obtained

(4.3.8)

where T_{N,1} denotes any unbiased estimate of θ₁.

Example 4.3.1 Let X be distributed as normal with mean θ₁ and variance θ₂. Let T_{N,1} be an unbiased estimator of θ₁. Then (4.3.8) gives

var_θ(T_{N,1}) ≥ θ₂ / E_θ(N).

Blackwell and Girshick (1947) have shown that the lower bound given by (4.3.4) for the variance of an unbiased estimate of θ is attained only for the sequential procedure for which P(N = n) = 1, if the probability density function f(x; θ) of X is such that E(X) = θ and X₁ + X₂ + ··· + X_m is sufficient for θ for all integral values of m, where X₁, X₂, ..., X_m are independent observations on the random variable X. Seth (1949) has extended Bhattacharyya's (1946) bounds to the sequential case, which in some respects are more general than those of Wolfowitz (1947). In the following we shall give his result specialized to unbiased estimates of θ on the basis of i.i.d. observations.

Theorem 4.3.2 (Seth, 1949). Let T_N = T(X₁, X₂, ..., X_N) be an unbiased estimate of θ having a finite variance, and let θ lie in an open interval I. Suppose the derivatives ∂^i f(x; θ)/∂θ^i (i = 1, 2, ..., k) exist for all θ in I and almost all x. Let

and ((λ^{ij})) = ((λ_{ij}))^{−1}.    (4.3.10)

Further assume that E_θ(1) and E_θ(T_N) are differentiable underneath the infinite summation and the integration at least k times. Then we have

var(T_N) ≥ λ^{11}.    (4.3.11)

Corollary 4.3.2.1 If k = 1, Theorem 4.3.2 reduces to Theorem 4.3.1.

Seth (1949) has also obtained conditions under which the inequality in (4.3.11) (and hence in (4.3.4)) becomes an equality.

4.4 Two-Stage Procedures

4.4.1 Stein's Procedure for Estimating the Mean of a Normal Distribution with Unknown Variance

It is known that there does not exist a fixed-sample-size procedure for estimating the mean of a normal population (when the variance is unknown) with a confidence interval of fixed width and specified confidence coefficient. Stein (1945) presented a two-sample procedure, in which the size of the second sample depends upon the result of the first sample, for the problem of determining confidence intervals of preassigned length and confidence coefficient for the mean of a normal population with unknown variance. In order to make the length of the confidence interval free of the variance, it seems necessary to "waste" a small portion of the information contained in the sample. Thus, in practical applications one would, if possible, modify this procedure while still preserving this property: use an interval of the same length whose confidence coefficient (although a function of σ) is always greater than the desired value, while at the same time reducing the expected number of observations by a small amount. The two-sample procedure is a special case of sequential estimation. It is further shown by Stein (1945) that if the variance and the initial sample size are sufficiently large, the expected number of observations differs only slightly from the number of observations required by a single-sample interval estimation procedure when the variance is known. Let X_i (i = 1, 2, ...) be independent normal variables having mean θ and variance σ² (unknown). We wish to estimate θ by a confidence interval of specified length 2d and specified confidence coefficient 1 − α. Take a sample of n₀ observations X₁, X₂, ..., X_{n₀} and compute the sample variance given by

s² = [1/(n₀ − 1)] Σ_{i=1}^{n₀} (X_i − X̄_{n₀})².    (4.4.1)

Then, define n by

n = max{ [s²/z] + 1, n₀ + 1 },    (4.4.2)

where z is a previously specified positive constant and [·] denotes the largest integer less than (·). Now, take additional observations X_{n₀+1}, ..., X_n. Also, choose real numbers a_i (i = 1, 2, ..., n) such that

(i) Σ_{i=1}^n a_i = 1, a₁ = a₂ = ··· = a_{n₀}, and

(ii) s² Σ_{i=1}^n a_i² = z.

This is possible since

min Σ a_i² = 1/n ≤ z/s²  by (4.4.2),    (4.4.3)

the minimum being taken subject to condition (i). Then define the statistic U by

U = (Σ_{i=1}^n a_i X_i − θ) / √z.    (4.4.4)

Σ_{i=1}^{n₀} a_i X_i (a₁ = a₂ = ··· = a_{n₀}) is independent of s² because the samples are normal, and Σ_{i=n₀+1}^n a_i X_i is independent of s² because they are based on two mutually independent sets of observations. Write

For given s, U is distributed as normal with mean 0 and variance σ²/s². It was shown in Section 1.4 that U is distributed as Student's t with n₀ − 1 degrees of freedom. A confidence interval for θ of specified length 2d and confidence coefficient 1 − α is then given by

Σ_{i=1}^n a_i X_i − d ≤ θ ≤ Σ_{i=1}^n a_i X_i + d,    (4.4.5)

where

z = (d / t_{n₀−1, 1−α/2})²,    (4.4.6)

and t_{n₀−1, 1−α/2} is the 100(1 − α/2) percentage point of the t distribution with n₀ − 1 degrees of freedom. The distribution of n, the sample size, is given by

where y = (n₀² − 1)z/σ²; for v > n₀ + 1, P(n = v) = P((v − 1)z ≤ s² < vz), all other values of v being impossible. Hence

E(n) = (n₀ + 1) P(n = n₀ + 1) + Σ_{v=n₀+2}^∞ v P(n = v)
     = (n₀ + 1) P(χ²_{n₀−1} < y) + Σ_{v=n₀+2}^∞ v ∫ f_{n₀−1}(u) du,

where f_{n₀−1}(u) denotes the chi-square density with n₀ − 1 degrees of freedom and, for each v, the integral runs over the u-interval corresponding to n = v. Thus, after interchanging the order of summation and integration, we have

By replacing the integrand v by the upper and lower limits of integration on v, one can get the bounds for E(n) obtained by Stein (1945). However, one can obtain an exact expression for E(n). By performing the integration on v,

Thus,

E(n) = 1/2 + (n₀ + 1/2) P(χ²_{n₀−1} < y) + (σ²/z) P(χ²_{n₀+1} > y).    (4.4.7)

For moderately large n₀, one can use Fisher's approximation to the chi-square, namely P(√(2χ²_ν) − √(2ν − 1) ≤ t) ≈ Φ(t), or P(χ²_ν ≤ s) ≈ Φ(√(2s) − √(2ν − 1)). Then (4.4.7) becomes

(4.4.8)

Thus E(n) is a function of σ² and can be evaluated from chi-square or normal tables.

As mentioned earlier, in practical applications, instead of (4.4.2) we take a total of

n = max{ [s²/z] + 1, n₀ }    (4.4.9)

observations and define

U' = n^{1/2} (X̄_n − θ)/s,    (4.4.10)

where U' has the t distribution with n₀ − 1 degrees of freedom. By (4.4.9), n > s²/z, so that although dn^{1/2}/s is random,

d n^{1/2}/s > d/√z = t_{n₀−1, 1−α/2}.    (4.4.11)

Thus

provided z is defined by (4.4.6). Thus the interval

(X̄_n − d, X̄_n + d)    (4.4.12)

has length 2d, and the probability that it covers the true parameter is a function of σ but is always greater than 1 − α, and differs only slightly from 1 − α provided σ² > n₀z. Also, E(n) is reduced from that in (4.4.7) by P(χ²_{n₀−1} < y). Thus (4.4.12) can be used instead of the confidence interval (4.4.5). From (4.4.7) it follows that

Thus, the approximation E(n) ≈ σ²/z is fair provided σ² > zn₀. The length of the confidence interval is given by

When σ² is known, the length of the single-sample confidence interval of confidence coefficient 1 − α obtained on the basis of n observations is given by

Hence, if n₀ is moderately large (say ≥ 30), the expected number of observations for a confidence interval of given length and confidence coefficient is only slightly larger than the fixed number of observations required in the single-sample case when the variance is known, provided the variance is moderately large.
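The simplified variant (4.4.9)-(4.4.12) is easy to simulate. The sketch below (function name, parameter values, and the hard-coded t quantile are my own choices) draws the first n₀ observations, sets n = max([s²/z] + 1, n₀), and checks that the interval X̄_n ± d covers θ at least 100(1 − α)% of the time:

```python
import random
import statistics

def stein_interval_covers(theta, sigma, d, n0, t_quant, rng):
    """One run of the simplified two-stage rule (4.4.9)-(4.4.12):
    z = (d / t_quant)^2, n = max([s^2/z] + 1, n0), interval = xbar_n ± d."""
    z = (d / t_quant) ** 2
    sample = [rng.gauss(theta, sigma) for _ in range(n0)]
    s2 = statistics.variance(sample)           # sample variance, divisor n0 - 1
    n = max(int(s2 / z) + 1, n0)
    sample += [rng.gauss(theta, sigma) for _ in range(n - n0)]
    xbar = sum(sample) / n
    return xbar - d <= theta <= xbar + d

rng = random.Random(1)
t_quant = 2.1448                  # t_{14, 0.975}, i.e. n0 = 15, alpha = 0.05
coverage = sum(stein_interval_covers(0.0, 3.0, 0.5, 15, t_quant, rng)
               for _ in range(5000)) / 5000
# coverage depends on sigma but should never fall below 1 - alpha = 0.95
```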

4.4.2 A Procedure for Estimating the Difference of Two Means

Putter (1951) considered the problem of estimating the mean of a population composed of a known number of normally distributed strata whose relative proportions are known [see also Robbins (1952, p. 528)]. Ghurye and Robbins (1954) considered the problem of estimating the difference of two means by a two-stage procedure, and in the following we shall present their results. Let Π_i be a population with unknown mean θ_i and variance σ_i² (i = 1, 2); we wish to estimate the difference θ₁ − θ₂. Let X̄_i(n) be the mean (X_{i1} + X_{i2} + ··· + X_{in})/n of a sample of size n from Π_i. Then X̄₁(n₁) − X̄₂(n₂) is an unbiased estimate of θ₁ − θ₂, with variance Σ_i (σ_i²/n_i). Let the cost of sampling be a known linear function of the number of observations: the cost of taking a sample of size n₁ from Π₁ and a sample of size n₂ from Π₂ is a₁n₁ + a₂n₂ + a₃. If there is a prescribed upper bound A₀ on the cost of sampling, n₁ and n₂ are subject to the restriction

a₁n₁ + a₂n₂ ≤ A = A₀ − a₃.    (4.4.14)

The quantity Σ_i (σ_i²/n_i), which is equal to the variance of X̄₁(n₁) − X̄₂(n₂) for integral values of the n_i, is minimized for continuous n_i > 0 subject to (4.4.14) by taking n_i = n_i⁰, where

n_i⁰ = A σ_i / [√a_i (σ₁√a₁ + σ₂√a₂)],    (4.4.15)

the minimum value being given by

(σ₁√a₁ + σ₂√a₂)² / A.    (4.4.16)
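The constrained minimization behind (4.4.15)-(4.4.16) is a Lagrange-multiplier exercise, giving n_i proportional to σ_i/√a_i. A short sketch (function name and numbers are mine) computes the allocation and confirms the budget and minimum-variance identities:

```python
from math import sqrt

def optimal_allocation(sigma, cost, A):
    """Minimize sum_i sigma_i^2 / n_i subject to sum_i cost_i * n_i = A,
    over continuous n_i > 0.  Lagrange: sigma_i^2 / n_i^2 = lam * cost_i,
    so n_i is proportional to sigma_i / sqrt(cost_i)."""
    S = sum(s * sqrt(a) for s, a in zip(sigma, cost))
    n = [A * s / (sqrt(a) * S) for s, a in zip(sigma, cost)]
    vmin = S ** 2 / A            # the minimized variance, cf. (4.4.16)
    return n, vmin

sigma, cost, A = [2.0, 5.0], [1.0, 1.5], 100.0
n_opt, vmin = optimal_allocation(sigma, cost, A)
budget = sum(a * ni for a, ni in zip(cost, n_opt))          # should equal A
variance = sum(s ** 2 / ni for s, ni in zip(sigma, n_opt))  # should equal vmin
```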

When the ratio ρ = σ₂/σ₁, on which the optimum value (4.4.16) depends, is unknown, one can use a two-stage procedure for estimating θ₁ − θ₂: first take a sample of m₁ + m₂ observations, m_i from Π_i, and then use estimates of σ_i obtained from this preliminary sample to distribute the remaining observations between the Π_i. We shall investigate the performance of this estimation procedure when the Π_i are normal.

Normal Populations

When the Π_i are known to be normal, choose positive integers m_i such that a₁m₁ + a₂m₂ < A and take m_i observations from Π_i. Let

(4.4.17)

(4.4.18)

(4.4.19)

(4.4.20)

n̂_i = [n_i⁰], (i = 1, 2),    (4.4.21)

where [x] denotes the largest integer contained in x. Having computed n̂_i, we take (n̂_i − m_i) additional observations (X_{ij}, j = m_i + 1, ..., n̂_i) from Π_i, and estimate θ₁ − θ₂ by

(4.4.22)

Let V(A) = var{X̄₁(n̂₁) − X̄₂(n̂₂)}. Now, one can write

where

Since the n̂_i depend only on the s_i(m_i), for fixed s_i the random variables X̄₁(m₁), X̄₂(m₂), X̄₁(n̂₁ − m₁), X̄₂(n̂₂ − m₂) are mutually independent, and the conditional distributions of X̄_i(m_i) and X̄_i(n̂_i − m_i) are respectively normal (θ_i, σ_i²/m_i) and normal (θ_i, σ_i²/(n̂_i − m_i)). Hence, for fixed s_i, the conditional distribution of X̄₁(n̂₁) − X̄₂(n̂₂) is normal (θ₁ − θ₂, Σ_i (σ_i²/n̂_i)). Hence

E[X̄₁(n̂₁) − X̄₂(n̂₂)] = θ₁ − θ₂ and V(A) = E[Σ_i σ_i²/n̂_i].    (4.4.23)

Ghurye and Robbins obtained an explicit expression for V(A) and also an approximation to it, namely V*(A), obtained when the [·]'s [the largest integers contained in (·)'s] are replaced by (·)'s. They tabulate the ratio V*/V₀ for selected values of the parameters ρ = σ₂/σ₁, ma/A and A/a, where a = a₁ = a₂ and m = m₁ = m₂. Based on these computations, they infer that the two-stage procedure provides considerable improvement over the usual one-stage procedure for values of ρ away from 1; and the performance seems to be best for ma/A in the neighborhood of σ₁/(σ₁ + σ₂) if σ₁ < σ₂.

4.4.3 Procedures for Estimating the Common Mean

Let Π₁, Π₂ be two populations having the common mean θ and variances σ₁², σ₂². We wish to estimate θ using a fixed total number of observations. If the population variances are known, the efficient procedure would be to take all n observations from the population having the smallest variance. When prior information about the variances is not available or is too vague to be quantified, it is natural to consider the procedure which consists of taking a preliminary sample of size m from each population, computing estimates of the variances, and then taking the remaining n − 2m observations from the population having the smaller estimated variance. If m is too large or too small, the advantage of the two-stage sampling scheme over the procedure of simply taking n/2 observations from each population will be lost. Hence it is of interest to determine, for some good estimator, an optimum choice of m as a function of n, not dependent on the unknown variances. As an example, suppose that there are two devices for measuring a physical constant, that each measurement is expensive or time consuming so that their total number is limited, and that we wish to estimate the constant as accurately as possible. Richter (1960) considered this problem and his results will be given below. Let X_{i1}, X_{i2}, ..., X_{im} be a random sample from Π_i (i = 1, 2) and let η = σ₂²/σ₁². Also, let

(4.4.24)

so that 1/R is the usual estimator of η based on 2m observations. Then take observations X_{1,m+1}, X_{1,m+2}, ..., X_{1,n−m} if R < 1, or take observations X_{2,m+1}, X_{2,m+2}, ..., X_{2,n−m} otherwise. Write X̄_{i,N_i} = Σ_{j=1}^{N_i} X_{ij}/N_i, i = 1, 2. We will consider estimators θ̂ of θ which are of the form

(4.4.25)

where N₁, N₂, A₁ and A₂ are random variables such that N₁ = n − m if R < 1, N₁ = m if R ≥ 1, N₁ + N₂ = n, 0 ≤ A_i ≤ 1 (i = 1, 2) and A₁ + A₂ = 1 with probability one; and, besides, A₁ and A₂ are such that

(4.4.26)

for all l and k, where E_H[·] = E[·|H] and H = (N₁, N₂, A₁, A₂). If the X_{ij} are assumed to be normally distributed, then the sample mean and sample variance are independent. Hence, the above assumption may be replaced by the assumption that A₁ and A₂ are functions of the sample variances only. Estimators of the form

(4.4.26) EH [ytk]= E [x;'], (i = 1,2) , for all 1 and k where EH [.] = E [.lH] and H = (N17N2,A1,A2).If the Xij are assumed to be normally distributed, then, the sample mean and sample variance are inde- pendent. Hence, the above assumption may be replaced by the assumption that A1 and A2 are functions of the sample variances only. Estimators of the form 4.4. TWO-STAGE PROCEDURES 165

θ̂ seem reasonable since, if observations are available on normal variables X_{ij}, j = 1, 2, ..., n_i, i = 1, 2, and η is known, a₁X̄_{1,n₁} + a₂X̄_{2,n₂} is the uniformly minimum variance unbiased estimator of θ, where a₁ = n₁η/(n₁η + n₂) and a₂ = 1 − a₁. Next, let V₀ = (1/n) min(σ₁², σ₂²), which is the variance of the standard estimator of θ for the case when sgn(σ₁² − σ₂²) is known beforehand, and define R_n(m; η) = V₀^{−1} E(θ̂ − θ)² to be the risk function associated with the estimator θ̂. The first part of the following theorem implies that R_n(m; η) = V₀^{−1} var(θ̂).

Theorem 4.4.1 (Richter, 1960). For any estimator of the form θ̂,

Proof. Since ··· . Next,

which proves (ii), since R_n(m; η) = V₀^{−1} E[E_H(θ̂ − θ)²]. Finally, A₁²/N₁ + ηA₂²/N₂ has a unique minimum with respect to A₁ = 1 − A₂ at A₁ = N₁η/(N₂ + N₁η), so that E[A₁²/N₁ + ηA₂²/N₂] ≥ ηE[(N₂ + N₁η)^{−1}], which proves the left-hand inequality of (iii); since N₂ + N₁η ≤ n max(1, η), the proof is complete. ∎

Let us examine the risk function for the usual one-stage experiment for estimating θ, which would be to observe n/2 of the X_{1j} and n/2 of the X_{2j}. If we confine ourselves to unbiased estimators θ', and assume the variables to be normally distributed, then var(θ') ≥ 2σ₂²/[n(1 + η)], since (ηX̄₁ + X̄₂)/(1 + η) is the minimum variance unbiased estimator, with variance 2σ₂²/[n(1 + η)], when the variances are known. Then

since max(1, η) = 1/min(1, 1/η), and

with equality if and only if η = 1. Hence, for each fixed η ≠ 1, the risk function is bounded away from unity independently of the sample size. One would hope that the risk function for the two-stage scheme would prove to be smaller, and we shall show that this is so, for large samples at least, provided m is suitably chosen. For the two-stage experiment, it is clear that once an estimator is specified, the only variable left at the statistician's disposal is the quantity m. Then, given an estimator of the form θ̂, we may say that any real-valued function m(n) such that 4 ≤ 2m(n) < n for all n ≥ 5 is a solution to the problem. With respect to an estimator of the form θ̂, m(n) will be called a uniformly consistent solution (UCS) if sup_η R_n[m(n); η] → 1 as n → ∞. We shall restrict attention to such solutions if they exist. Further, if sup_η R_n(m; η) < ∞, a solution which minimizes sup_η R_n(m; η) will be called a minimax solution (MMS). If there exists a UCS, then an MMS is UC too. Hence the minimax principle provides a means of selecting one solution from the class of UC solutions.

A Simpler Estimator
In the following, we shall derive an asymptotic minimax solution for a particular unbiased estimator. For the subsequent considerations, we shall assume that the X_{ij} are normally distributed. Hence ηR = s_1^2 σ_2^2/(s_2^2 σ_1^2) has the F-distribution with (m − 1, m − 1) degrees of freedom, where R = s_1^2/s_2^2 is the ratio of the first-stage sample variances, and we write

K(m; η) = P(R > 1).

Now define θ̂_1 = A_1 X̄_{1,N_1} + A_2 X̄_{2,N_2}, where A_1 = 1 or 0 according as R < 1 or R > 1, and A_2 = 1 − A_1. This estimator has the form of θ̂ and, by Theorem 4.4.1, θ̂_1 is unbiased; denote its risk function by R_{1n}(m; η). (4.4.27)

It is easy to show that R_{1n}(m; η) = R_{1n}(m; 1/η) by using the fact that K(m; η) = 1 − K(m; 1/η); thus, when considering sup_η R_{1n}(m; η), we can assume that η ≥ 1. Thus, we have the following result towards the MMS for θ̂_1.

Theorem 4.4.2 (Richter, 1960). The minimax solution for θ̂_1 is m(n) = (cn/2)^{2/3} + O(n^{1/3}), and
min_m max_η R_{1n}(m; η) = 1 + 3(c/2)^{2/3} n^{−1/3} + O(n^{−2/3}),
where c = 2r′Φ(−r′) and r′ is the solution of the equation:

Φ(−r) − rφ(r) = 0.

Proof. See Richter (1960, Theorem 2). Note that 0.7 ≤ r′ ≤ 0.8. ∎
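Richter's scheme is easy to examine by simulation. The sketch below is not from Richter (1960); the function names are ours. It estimates the risk R_n(m; η) of θ̂_1 by Monte Carlo: m first-stage observations are drawn from each population, the population with the smaller first-stage sample variance is selected (A_1 = 1 exactly when R = s_1^2/s_2^2 < 1), and the remaining n − 2m observations are spent on it.

```python
import random
import statistics

def richter_two_stage(n, m, theta, sigma1, sigma2, rng):
    """One run of the scheme behind theta_hat_1: m first-stage observations
    from each population; the remaining n - 2m go to the population with the
    smaller first-stage sample variance (A_1 = 1 iff R = s1^2/s2^2 < 1)."""
    x1 = [rng.gauss(theta, sigma1) for _ in range(m)]
    x2 = [rng.gauss(theta, sigma2) for _ in range(m)]
    if statistics.variance(x1) < statistics.variance(x2):   # R < 1
        x1 += [rng.gauss(theta, sigma1) for _ in range(n - 2 * m)]
        return statistics.fmean(x1)
    x2 += [rng.gauss(theta, sigma2) for _ in range(n - 2 * m)]
    return statistics.fmean(x2)

def estimated_risk(n, m, theta, sigma1, sigma2, reps=2000, seed=1):
    """Monte Carlo estimate of R_n(m; eta) = E(theta_hat_1 - theta)^2 / V."""
    rng = random.Random(seed)
    v = min(sigma1 ** 2, sigma2 ** 2) / n     # V = (1/n) min(sigma_i^2)
    mse = statistics.fmean(
        (richter_two_stage(n, m, theta, sigma1, sigma2, rng) - theta) ** 2
        for _ in range(reps))
    return mse / v

if __name__ == "__main__":
    # eta = 4, so population 1 is almost always chosen; risk near 200/180
    print(estimated_risk(200, 20, theta=5.0, sigma1=1.0, sigma2=2.0))
```

With n = 200, m = 20 and η = 4 the wrong population is selected only rarely, so the estimated risk is close to n/(n − m) ≈ 1.11, illustrating the first-stage cost 2m/n that the minimax choice of m(n) balances.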

A Class of Estimators
One may ask whether better (in the sense of smaller risk) estimators exist and, if they do, whether results like Theorem 4.4.2 can be found for such estimators. Richter (1960) provides an affirmative answer to both questions. Let
θ̂_2 = (N_1 η X̄_{1,N_1} + N_2 X̄_{2,N_2})/(N_1 η + N_2),

whose risk is R_{2n}(m; η) = n max(1, η) E[1/(N_1 η + N_2)]; then R_{2n}(m; η) is a lower bound for the risk of all estimators of the form θ̂ by Theorem 4.4.1 (iii). However, when η is unknown, one can replace it by η̂, where η̂ → η in probability. It is mathematically convenient to use an η̂ based on the first stage only; we take η̂ = 1/R and define
θ̂_3 = (N_1 η̂ X̄_{1,N_1} + N_2 X̄_{2,N_2})/(N_1 η̂ + N_2).

Another motivation for θ̂_3 is as follows. When η is known, for a one-stage experiment, the uniformly minimum variance unbiased (UMVU) estimator is (n_1 η x̄_{1,n_1} + n_2 x̄_{2,n_2})/(n_1 η + n_2). However, when n_1, n_2 and η are unknown, taking η̂ to be the usual estimator of η based on 2 min(n_1, n_2) observations, and replacing n_1, n_2 by the random variables N_1, N_2, we obtain θ̂_3. Another estimator which might be considered is θ̂_4, the grand mean of all the observations: θ̂_4 = (N_1 X̄_{1,N_1} + N_2 X̄_{2,N_2})/n. For θ̂_4 there exist no UC solutions and no nontrivial MM solutions. For θ̂_2, Richter (1960) obtains a theorem similar to Theorem 4.4.2.

4.4.4 Double-Sampling Estimation Procedures
Suppose that we are interested in estimating an unknown parameter θ with specified accuracy, using as small a sample size as possible. The accuracy could be in terms of a variance a(θ), some given function of θ. Another problem of interest is to estimate θ by a confidence interval having a specified width and a specified confidence coefficient γ. Since it is not possible to construct an estimate meeting the specifications on the basis of a sample of fixed size, one has to resort to some kind of sequential sampling. Cox (1952b) proposed a double sampling procedure for the above problem. The basic idea is to draw a preliminary sample of observations which determines how large the total sample size should be. Stein's (1945) two-stage procedure for the normal mean is a special case of the double sampling method in which the underlying distribution is known. Furthermore, the double sampling methods of Cox (1952b) differ from those used in industrial inspection because, in the latter case, the second sample is of fixed size. Although the theory of double sampling developed by Cox (1952b) is primarily for large sample sizes, one hopes that it remains reasonable for small sample sizes. In the following, we present an estimate of θ having bias O(n_0^{−2}) and variance a(θ)[1 + O(n_0^{−2})], where n_0 is the preliminary sample size and a(θ) the specified variance.

Estimation with Given Variance: Single Unknown Parameter
Let θ be the unknown parameter we wish to estimate with a specified variance equal to a given function a(θ) of θ, which is small. Assume the following: for a fixed sample size m, one can construct an estimate T^{(m)} of θ such that

(i) T^{(m)} is unbiased for θ with variance v(θ)/m;

(ii) the coefficient of skewness of T^{(m)} is of order γ_1(θ) m^{−1/2} and the kurtosis of T^{(m)} is O(m^{−1}) as m becomes large;

(iii) asymptotic means and standard errors can be derived for a(T^{(m)}) and v(T^{(m)}) by expansion in series.

Procedure:

(a) Take a preliminary sample of size n_0 and let T_1 be the estimate of θ from this sample.

(b) Let
ñ(T_1) = n(T_1)[1 + b(T_1)], (4.4.28)
where b(θ) is a correction of order O(a) + O(n_0^{−1}), constructed by Cox (1952b) from γ_1(θ) and the first two derivatives of m(θ) = 1/n(θ).

(c) Take a second sample of size max[0, ñ(T_1) − n_0] and let T_2 be the estimate of θ from the second sample.

(d) Define
T = [n_0 T_1 + {ñ(T_1) − n_0} T_2]/ñ(T_1) (4.4.30)
and
T′ = T − m′(T) v(T) if n_0 ≤ ñ(T_1); T′ = T_1 if n_0 > ñ(T_1). (4.4.31)

(iv) If n_0 < n(θ) and the distribution of ñ(T_1) is such that the event ñ(T_1) < n_0 may be neglected, then T′ has bias and variance of the desired orders.

Example 4.4.1 Suppose we wish to estimate the normal mean θ with fractional standard error a^{1/2}, the variance σ^2 being known. Here a(θ) = aθ^2, n(θ) = σ^2/(aθ^2) and b(θ) = 8a + σ^2/(n_0 θ^2). Thus the total sample size is
ñ(T_1) = n(T_1)[1 + b(T_1)] = σ^2/(a T_1^2) + 8σ^2/T_1^2 + σ^4/(a n_0 T_1^4),
where T_1 is the mean of the initial sample, and T′ = T(1 − 2a), where T is the mean of the combined sample. We should choose n_0 moderately large, yet small compared with σ^2/(aθ^2).
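The formulas of Example 4.4.1 are easy to check numerically. The sketch below is our own illustration (the names n_total and cox_estimate are ours); it assumes σ known, as in the example, and verifies that the final estimate T′ attains fractional standard error close to a^{1/2}.

```python
import math
import random
import statistics

def n_total(t1, sigma, a, n0):
    """Total sample size of Example 4.4.1:
    sigma^2/(a t1^2) + 8 sigma^2/t1^2 + sigma^4/(a n0 t1^4)."""
    return (sigma ** 2 / (a * t1 ** 2) + 8 * sigma ** 2 / t1 ** 2
            + sigma ** 4 / (a * n0 * t1 ** 4))

def cox_estimate(theta, sigma, a, n0, rng):
    """One run of the double-sampling procedure; returns T' = T(1 - 2a)."""
    first = [rng.gauss(theta, sigma) for _ in range(n0)]
    t1 = statistics.fmean(first)
    n = max(n0, math.ceil(n_total(t1, sigma, a, n0)))
    combined = first + [rng.gauss(theta, sigma) for _ in range(n - n0)]
    return statistics.fmean(combined) * (1 - 2 * a)

if __name__ == "__main__":
    rng = random.Random(7)
    est = [cox_estimate(10.0, 2.0, 0.0004, 25, rng) for _ in range(2000)]
    # fractional standard error should be near sqrt(a) = 0.02
    print(statistics.stdev(est) / statistics.fmean(est))
```

For θ = 10, σ = 2 and a = 0.0004 the fixed-sample requirement is n(θ) = 100, and the correction terms add fewer than one observation, consistent with the claim that the penalty for the preliminary stage is small.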

Example 4.4.2 Suppose we wish to estimate the binomial proportion θ with specified variance V. Here we set a(θ) = V, v(θ) = θ(1 − θ) and γ_1(θ) = (1 − 2θ)[θ(1 − θ)]^{−1/2}. Then

m′(θ) = V(2θ − 1)/[θ^2 (1 − θ)^2] and m″(θ) = 2V[1 − 3θ(1 − θ)]/[θ^3 (1 − θ)^3].
Thus
n(θ)b(θ) = 3 − 8θ(1 − θ) + [1 − 3θ(1 − θ)]/(V n_0).
Consequently, the total sample size n is
n = θ̂_1(1 − θ̂_1)/V + 3 − 8θ̂_1(1 − θ̂_1) + [1 − 3θ̂_1(1 − θ̂_1)]/(V n_0),
where n_0 is the initial sample size and θ̂_1 is the estimator of θ based on the preliminary sample.
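A one-line helper (the function name is ours) makes the sample-size formula just displayed concrete; note that for small V the leading term θ̂_1(1 − θ̂_1)/V dominates.

```python
def binomial_n_total(theta1, V, n0):
    """Total sample size for Example 4.4.2 evaluated at the preliminary
    estimate theta1: w/V + 3 - 8w + (1 - 3w)/(V n0), with w = theta1(1-theta1)."""
    w = theta1 * (1 - theta1)
    return w / V + 3 - 8 * w + (1 - 3 * w) / (V * n0)

print(binomial_n_total(0.3, 0.001, 30))  # about 223.7
```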

Example 4.4.3 (Estimation of the binomial θ with specified coefficient of variation c^{1/2}). Here a(θ) = θ^2 c; v and γ_1 are as in Example 4.4.2. Thus
m(θ) = cθ/(1 − θ).
Hence
m′(θ) = c/(1 − θ)^2 and m″(θ) = 2c/(1 − θ)^3.

Then computations yield
b(θ) = 3c/(1 − θ)^2 + 1/[n_0(1 − θ)]
and
n = (1 − θ̂_1)/(cθ̂_1) + 3/[θ̂_1(1 − θ̂_1)] + 1/(cθ̂_1 n_0),
where θ̂_1 is the estimate of θ based on the preliminary sample of size n_0.

Estimation in the Presence of a Nuisance Parameter
We now suppose that, in addition to the unknown parameter θ, which is to be estimated with a small specified variance a(θ), there is an unknown nuisance parameter ψ. Assume that in samples of any fixed size m we can find estimates T^{(m)} and C^{(m)} of θ and ψ such that

(i) T^{(m)} and C^{(m)} are unbiased estimates and have variances ψ/m and τψ^2/m, where τ is asymptotically constant;

(ii) if m is large, asymptotic means and standard errors can be developed for combinations of T^{(m)}, C^{(m)} and a(T^{(m)}) by expanding them in Taylor series.

Procedure: Take a preliminary sample of size n_0 and let T_1, C_1 be the estimates of θ and ψ based on the initial sample. Set
ñ(T_1, C_1) = n(T_1, C_1)[1 + b(T_1, C_1)], (4.4.32)
where n(θ, ψ) = ψ/a(θ) and the correction b contains, in particular, the term
τ/n_0 (4.4.33)
arising from the estimation of ψ. Take a second sample of size max[0, ñ(T_1, C_1) − n_0] and let T_2 be the estimate of θ from it. Set
T = [n_0 T_1 + {ñ(T_1, C_1) − n_0} T_2]/ñ(T_1, C_1) (4.4.34)
and let
T′ = the estimate T corrected for bias, as in (4.4.31). (4.4.35)

Then, assuming as before that

(iii) the possibility that n_0 > ñ(T_1, C_1) can be neglected,

one can show that T′ has bias O(n_0^{−2}) and variance a(θ)[1 + O(n_0^{−2})].

Example 4.4.4 (Estimation of a normal mean with given standard error a^{1/2}). Let the method be based on the sample mean. Then ψ is the unknown population variance σ^2 and is estimated by the usual sample variance; that is, take C_1 = s_1^2. Then τ = 2, a(θ) = a and
ñ(T_1, C_1) = (s_1^2/a)(1 + 2/n_0), (4.4.36)
and the final estimate is the pooled sample mean T, which can easily be shown to be unbiased. The expected sample size is σ^2(1 + 2/n_0)/a. Thus, ignorance of σ increases the sample size by the factor (1 + 2/n_0). Cox (1952b, Section 4) shows that, except when the preliminary sample size is small, the expected sample size of the best double-sampling procedure is only slightly larger than that of the best sequential procedure.
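A small simulation (ours, not Cox's) confirms the expected sample size σ^2(1 + 2/n_0)/a of Example 4.4.4:

```python
import math
import random
import statistics

def double_sample_size(a, n0, sigma, theta, rng):
    """One realization of the total sample size (s1^2/a)(1 + 2/n0) of
    Example 4.4.4 (rounded up, and never below n0)."""
    prelim = [rng.gauss(theta, sigma) for _ in range(n0)]
    s2 = statistics.variance(prelim)          # C1 = s1^2
    return max(n0, math.ceil((s2 / a) * (1 + 2 / n0)))

if __name__ == "__main__":
    rng = random.Random(11)
    sizes = [double_sample_size(0.01, 20, 1.0, 0.0, rng) for _ in range(1000)]
    # expected sample size sigma^2 (1 + 2/n0)/a = 1 * 1.1 / 0.01 = 110
    print(statistics.fmean(sizes))
```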

Example 4.4.5 (Estimation of a normal mean with specified coefficient of variation c^{1/2}). Here set a(θ) = θ^2 c, ψ = σ^2, T_1 = x̄_1 and C_1 = s_1^2, and obtain the total sample size as a function of x̄_1 and s_1^2, where x̄_1 and s_1^2 respectively denote the sample mean and sample variance based on the preliminary sample of size n_0.

Confidence Intervals
Suppose that we want to estimate θ by a confidence interval of predetermined form with confidence coefficient γ = 1 − 2α. Further, suppose that we have obtained an estimate T′ after a sampling procedure designed to give a variance a(θ). If we want a (1 − 2α)100% confidence interval for θ, we let x_{1−α} denote the (1 − α)th quantile of the standard normal distribution. Then define θ_−, θ_+ by the equations
θ_− + x_{1−α} a(θ_−)^{1/2} = T′ = θ_+ − x_{1−α} a(θ_+)^{1/2}. (4.4.37)
(θ_−, θ_+) will be the required confidence interval. If an explicit solution is impossible, Equation (4.4.37) is solved by the successive-approximation method given by Bartlett (1937).

Example 4.4.6 Suppose that T′ is the estimate obtained after the procedure of Example 4.4.1 for estimating a normal mean with given fractional standard error a^{1/2}. Then a(θ) = aθ^2 and (4.4.37) yields for the 95% confidence interval

θ_− = T′/(1 + 1.96 a^{1/2}), θ_+ = T′/(1 − 1.96 a^{1/2}).

Formula (4.4.37) assumes that T′ is normally distributed. A refinement of the method depends on evaluating the skewness γ_1 and kurtosis γ_2 of T′ and making a correction for these based on the Cornish-Fisher expansion. We shall illustrate the method by the following example.

Example 4.4.7 (Confidence interval of given width for a normal mean, variance unknown). Suppose we wish to construct a confidence interval after the procedure of Example 4.4.4 for estimating a normal mean θ with given standard error a^{1/2}. If T′ is the final estimate (in this case, the sample mean), the (1 − 2α)100% confidence interval is, from (4.4.37), (T′ − x_{1−α} a^{1/2}, T′ + x_{1−α} a^{1/2}). Now, it can be shown that for the distribution of T′, γ_1 is zero and γ_2 is 6/n_0. Thus, the Edgeworth expansion for the distribution of the standardized T′ is given by
F(x) ≅ Φ(x) − (1/(4n_0))(x^3 − 3x) φ(x).

Now, if we set F(x*) = 1 − α, we must solve for x*. As a first approximation, x* ≈ Φ^{−1}(1 − α) = x, say. Thus we wish to find a refinement of x, namely x*, from the equation
Φ(x*) − (1/(4n_0))(x*^3 − 3x*) φ(x*) = Φ(x),
or
Φ(x*) = Φ(x) + (1/(4n_0))(x^3 − 3x) φ(x) + ... .
Now, expanding Φ^{−1}(·) around Φ(x), we obtain
x* = Φ^{−1}[Φ(x)] + (1/(4n_0))(x^3 − 3x) φ(x)/φ{Φ^{−1}[Φ(x)]} + ...
   = x + (x^3 − 3x)/(4n_0).

Note that the above can also be obtained from the Cornish-Fisher inversion of the Edgeworth expansion. Thus, the normal multiplier should be replaced by x_{1−α} + (x_{1−α}^3 − 3x_{1−α})/(4n_0) = x*_{1−α}, say. If we use the normal multiplier, the width of the confidence interval is 2x_{1−α} a^{1/2}, and if we use the corrected multiplier, the width is 2x*_{1−α} a^{1/2}. To solve Stein's problem so that the confidence interval is of width Δ, we take a^{1/2} = Δ/(2x_{1−α}) or Δ/(2x*_{1−α}). The corresponding sample size functions are, from (4.4.36),
ñ = 4 s_1^2 x_{1−α}^2 Δ^{−2} (1 + 2/n_0), (4.4.38)
or, in the second case,
ñ = 4 s_1^2 (x*_{1−α})^2 Δ^{−2} (1 + 2/n_0). (4.4.39)

In Stein's exact solution the corresponding sample size is 4 s_1^2 Δ^{−2} t_{2α,n_0−1}^2, where t_{2α,n_0−1} is the two-sided 200α% point of the t distribution with n_0 − 1 degrees of freedom. From the exact solution we can compute the percentage error in the approximate formulae (4.4.38) and (4.4.39). Cox (1952b) finds that for .01 ≤ α ≤ .10 the percentage error based on Formula (4.4.38) is fairly small even when n_0 is as small as 10, provided that α is less than .025. The correction for kurtosis yields a significant improvement. These results indicate that Cox's (1952b) approximate formulae given in this subsection will be reasonably accurate for all n_0 unless n_0 is very small.

Remark 4.4.1 There are two situations in which sequential methods are useful. In the first, observations become available only at infrequent intervals and must be interpreted as soon as they are obtained. An example is the study of accident rates, which may be obtainable only at weekly or monthly intervals. Double sampling procedures are not useful in such situations. The second type of situation is one where the number of observations is under the experimenter's control, but observations are expensive, so that the smaller the required sample size, the better. Double sampling is appropriate for this type of problem.

4.4.5 Fixed Length Confidence Intervals Based on SPRT
Franzén (2003) has given a procedure, based on Wald's SPRT, for obtaining confidence intervals for an unknown parameter θ. Let X have probability density or mass function f(x; θ) which has monotone likelihood ratio in x. Assume that we observe X_1, X_2, ... sequentially. Let x_n = (x_1, x_2, ..., x_n) and θ ∈ Ω ⊆ R. The generalized probability ratio test (GPRT) defined by Lehmann (1955) is a test of H_0: θ = θ_0 against H_1: θ = θ_1 (θ_0 < θ_1), with boundaries which may vary with n, that continues sampling as long as
A_n < λ(x_n; θ_1, θ_0) < B_n, (4.4.40)
where λ(x_n; θ_1, θ_0) denotes the likelihood ratio of the observations.

Then we have the following lemma.

Lemma 4.4.1 Let X1,X2, ... be a sequence of random variables with monotone likelihood ratio. Then the power function of any generalized probability ratio test is nondecreasing.

Proof. This lemma is analogous to a result of Lehmann (1959, p. 101).

Obviously the SPRT is a member of the class of GPRTs. From the above lemma it follows that the SPRT of H_0: θ = θ_0 against H_1: θ = θ_1 with error probabilities α and β will have type I error rate less than or equal to α for any parameter in the hypothesis H_0: θ ≤ θ_0, and type II error rate less than or equal to β for any parameter belonging to H_1: θ ≥ θ_1; consequently the SPRT of H_0: θ = θ_0 versus H_1: θ = θ_1 can be used as a test of H_0: θ ≤ θ_0 versus H_1: θ ≥ θ_1. Next, for fixed θ_0, define the two types of hypotheses H_{θ_0}^+: θ ≥ θ_0 and H_{θ_0}^−: θ < θ_0. Let H^+ = {H_θ^+: θ ∈ Ω} and H^− = {H_θ^−: θ ∈ Ω}. For fixed Δ > 0, at each step, we test at level α/2 which elements H_θ^+ in H^+ can be rejected or accepted against the corresponding elements H_{θ−Δ}^− in H^−, and which elements H_θ^− in H^− can be rejected or accepted against H_{θ+Δ}^+ in H^+. Whenever a decision is reached concerning a pair of hypotheses in H^+ and H^−, these hypotheses will not be considered any more. The use of composite hypotheses, enabling us to make a decision regarding a hypothesis H_θ^− against H_{θ+Δ}^+, is made possible only by the monotone likelihood ratio property.

Let
R^+(x_n, Δ) = {θ : H_θ^+ is rejected against H_{θ−Δ}^− at or before time n}
and
R^−(x_n, Δ) = {θ : H_θ^− is rejected against H_{θ+Δ}^+ at or before time n}
be the sets of parameters corresponding to hypotheses that have been rejected against their alternatives when observing x_1, x_2, ..., x_n. Let
U(x_n, Δ) = inf{θ : θ ∈ R^+(x_n, Δ)} = the smallest parameter θ for which H_θ^+ is rejected against H_{θ−Δ}^−
and
L(x_n, Δ) = sup{θ : θ ∈ R^−(x_n, Δ)} = the largest parameter θ for which H_θ^− is rejected against H_{θ+Δ}^+
when x_n is observed. Now we are ready to define the SPRT(Δ) confidence interval. We construct a sequence of temporary confidence intervals. Assume, for the time being (this will be established later), that Ω \ {R^+(x_n, Δ) ∪ R^−(x_n, Δ)} is an interval. Since we have a fixed-length confidence interval in mind, we call the confidence intervals produced at each step temporary confidence intervals. In this terminology, the event that there are no pairs left to test corresponds to the event that the length of the temporary confidence interval is less than or equal to Δ, and when this happens, the process is stopped.

First step

Observe x_1 and construct R^+(x_1, Δ) and R^−(x_1, Δ). Based on these we can compute U(x_1, Δ) and L(x_1, Δ). If U(x_1, Δ) − L(x_1, Δ) ≤ Δ, stop and declare that no confidence interval was found. If U(x_1, Δ) − L(x_1, Δ) > Δ, declare [L(x_1, Δ), U(x_1, Δ)] a 1 − α temporary confidence interval and take one more observation.

kth step

In the kth step, x_{k−1} = (x_1, x_2, ..., x_{k−1}) has already been observed, and the hypotheses corresponding to parameters in R^+(x_{k−1}, Δ) and R^−(x_{k−1}, Δ), which yield the present temporary confidence interval [L(x_{k−1}, Δ), U(x_{k−1}, Δ)], have been rejected. Observing x_k enables us to reject the hypotheses corresponding to parameters in R^+(x_k, Δ) and R^−(x_k, Δ). If U(x_k, Δ) − L(x_k, Δ) ≤ Δ, there are no pairs of hypotheses left to test, and hence we declare [L(x_{k−1}, Δ), U(x_{k−1}, Δ)] to be the smallest confidence interval one can get based on the observations x_k using Δ as the interval parameter. However, if U(x_k, Δ) − L(x_k, Δ) > Δ, declare [L(x_k, Δ), U(x_k, Δ)] a 1 − α temporary confidence interval and take one more observation. The SPRT(Δ) confidence interval is then denoted by S(x_n, Δ) = [L(x_n, Δ), U(x_n, Δ)], where L(x_n, Δ) and U(x_n, Δ) are constructed as described above. The sequence {S(x_i, Δ), i = 1, 2, ...} will be a sequence of temporary confidence intervals. Inherent in the construction is the property
R^+(x_n, Δ) ⊆ R^+(x_{n+1}, Δ) and R^−(x_n, Δ) ⊆ R^−(x_{n+1}, Δ),
and consequently S(x_{n+1}, Δ) ⊆ S(x_n, Δ). Next, we need to be certain that the set Ω \ {R^+(x_n, Δ) ∪ R^−(x_n, Δ)} of parameters corresponding to hypotheses which have not been rejected against their alternatives while observing x_n is indeed an interval, and that the coverage probability of this interval is at least 1 − α. This is assured by the following theorem of Franzén (2003).

Theorem 4.4.2 Let f(x; θ) have monotone likelihood ratio, suppose that (d^2/dθ^2) ln f(x; θ) < 0, and let both error rates be α/2. Then the set Ω \ {R^+(x_n, Δ) ∪ R^−(x_n, Δ)} is an interval equal to S(x_n, Δ), with coverage probability at least 1 − α. That is, P_θ{θ ∈ S(x_n, Δ)} ≥ 1 − α.

Proof. First let us show that if θ′ ∈ R^−(x_n, Δ), then θ″ ∈ R^−(x_n, Δ) for every θ″ < θ′. Now, if θ′ ∈ R^−(x_n, Δ), then for some sample size m ≤ n the hypothesis H_{θ′}^− was rejected against the alternative H_{θ′+Δ}^+. This means that for that sample size we have
a < ln{λ(x_m; θ′ + Δ, θ′)} = ln f(x_m; θ′ + Δ) − ln f(x_m; θ′) < ln f(x_m; θ″ + Δ) − ln f(x_m; θ″),
since (d^2/dθ^2) ln f(x; θ) < 0 implies that the first derivative of ln f(x; θ) is decreasing. Hence the hypothesis H_{θ″}^− must have been rejected against the alternative H_{θ″+Δ}^+ at or before sample size m. Because the error rates are equal, acceptance of the null hypothesis in the SPRT is equivalent to rejecting the hypothesis used as the alternative. Consequently, no hypothesis corresponding to a parameter smaller than L(x_n, Δ) or larger than U(x_n, Δ) has ever been accepted, since that would require U(x_n, Δ) − L(x_n, Δ) < Δ, which is not permissible in the construction. This completes the proof of the assertion that Ω \ {R^+(x_n, Δ) ∪ R^−(x_n, Δ)} is an interval. Now, the coverage probability of the confidence interval can be decomposed as
P_θ{θ ∉ S(x_n, Δ)} = P_θ{θ ≤ L(x_n, Δ)} + P_θ{θ ≥ U(x_n, Δ)}.

Assume that the event θ ≤ L(x_n, Δ) has happened. This implies that every hypothesis H_{θ′}^−, where θ′ ≤ L(x_n, Δ), has been rejected against its alternative H_{θ′+Δ}^+ for some sample size less than or equal to n.

In particular, at some stage k, H_θ^− was falsely rejected against H_{θ+Δ}^+, with probability at most α/2, since each test has level α/2. Thus

P_θ{θ ≤ L(x_n, Δ)} = P_θ({reject all H_{θ″}^− with θ″ < θ} ∩ {reject H_θ^−}) ≤ P_θ(reject H_θ^− against H_{θ+Δ}^+) ≤ α/2.

We can apply an analogous argument to assert that P_θ{θ ≥ U(x_n, Δ)} ≤ α/2. ∎

It remains to be shown that the length of S(x_n, Δ) does depend on Δ. Franzén (2003) was able to show this for the Bernoulli case via a simulation study.

Applications
Consider the exponential family given by
f(x; θ) = c(θ) exp{θT(x)} h(x), (4.4.42)
where θ is the natural parameter. One can easily show that
E_θ[T(X)] = −(d/dθ) ln c(θ),
since differentiating
∫ c(θ) exp{θT(x)} h(x) dx = 1
with respect to θ gives
c′(θ)/c(θ) + E_θ[T(X)] = 0,
that is, E_θ[T(X)] = −(d/dθ) ln c(θ). By differentiating once more, we can easily show that
var[T(X)] = −(d^2/dθ^2) ln c(θ).
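These two identities can be verified numerically. In the sketch below (our own illustration) the Poisson family is written in its natural parametrization θ = ln λ, for which T(x) = x and ln c(θ) = −e^θ, and the mean and variance of T(X) are recovered by central differences:

```python
import math

def mean_var_from_logc(logc, theta, h=1e-4):
    """E[T] = -(d/dtheta) ln c(theta) and var[T] = -(d^2/dtheta^2) ln c(theta),
    evaluated by central differences of ln c."""
    d1 = (logc(theta + h) - logc(theta - h)) / (2 * h)
    d2 = (logc(theta + h) - 2 * logc(theta) + logc(theta - h)) / h ** 2
    return -d1, -d2

# Poisson(lambda) in natural form: theta = ln(lambda), c(theta) = exp(-e^theta),
# so ln c(theta) = -exp(theta); mean and variance of X both equal lambda.
m, v = mean_var_from_logc(lambda t: -math.exp(t), math.log(3.0))
print(m, v)  # both close to lambda = 3
```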

Note that the Bernoulli, Poisson and normal distributions belong to the exponential family. Also note that if
f(x; θ) = 1/{π[1 + (x − θ)^2]},
then
(∂^2/∂θ^2) ln f(x; θ) = −2[1 − (x − θ)^2]/[1 + (x − θ)^2]^2, (4.4.43)
which fails to be negative when |x − θ| > 1, and thus Theorem 4.4.2 does not hold in the case of the Cauchy density with a translation parameter. The fixed-length SPRT(Δ) confidence interval of length at most D can be constructed by simply stopping at the smallest n for which U(x_n, Δ) − L(x_n, Δ) ≤ D. This will always work for all D > Δ. According to Franzén (2003), there seems to exist an optimal value of Δ that yields the smallest average number of observations. The optimal value of Δ may depend on both the true value of the parameter and on D. In the Bernoulli case, the optimal choice for Δ seems to be in the interval [D/2, D); however, the simulations carried out by Franzén indicate that the exact choice of Δ is not critical.

Example 4.4.8 Let X have the Bernoulli mass function given by

f(x; θ) = θ^x (1 − θ)^{1−x}, x = 0, 1.

Assume that we have n observations x_1, x_2, ..., x_n on X, and that the error probabilities are equal to α/2. In order to determine the lower limit of the confidence interval, we find the largest value of θ_0 such that the hypothesis H_0: θ ≤ θ_0 can be rejected against H_1: θ ≥ θ_0 + Δ using an SPRT. If (B, A) are Wald's bounds, then set a = ln A and b = ln B. The SPRT rejects H_0 when ln λ(x_n; θ_0 + Δ, θ_0) ≥ a.

That is, when
s(n) ln{(θ_0 + Δ)/θ_0} + [n − s(n)] ln{(1 − θ_0 − Δ)/(1 − θ_0)} ≥ a, (4.4.45)
where s(n) = x_1 + x_2 + ... + x_n. Using Wald's approximations to the boundary values in terms of the error probabilities, we have a = ln[(2 − α)/α] and b = ln[α/(2 − α)], and hence, to find the largest value of θ_0 that satisfies (4.4.45), we solve
s(n) ln{(θ_0 + Δ)/θ_0} + [n − s(n)] ln{(1 − θ_0 − Δ)/(1 − θ_0)} = ln[(2 − α)/α]. (4.4.46)

Using a similar argument, a candidate for the upper confidence limit is given by the solution to (the smallest value of θ_0 such that H_0: θ ≥ θ_0 is rejected against H_1: θ ≤ θ_0 − Δ)
s(n) ln{(θ_0 − Δ)/θ_0} + [n − s(n)] ln{(1 − θ_0 + Δ)/(1 − θ_0)} = ln[(2 − α)/α]. (4.4.47)

Note that replacing the strict inequality with an equality in (4.4.46) and (4.4.47) will be of little consequence, since the parameter space is continuous. Equations (4.4.46) and (4.4.47), being nonlinear, need to be solved numerically for θ_0 for given n, s(n), Δ and α. Note that until the first response (namely, unity for x) is observed, Equation (4.4.46) does not have a solution, and the lower confidence limit is set equal to zero. Similarly, Equation (4.4.47) has no solution until the first nonresponse (namely, zero) is observed, and hence until then the upper confidence limit is set equal to unity. The candidate for the lower confidence limit obtained at the nth step is compared with the lower confidence limit from the previous step, and the larger of these two values is used as the current lower limit of the confidence interval. The upper confidence limit is adjusted in a similar fashion. The process continues until the length of the temporary confidence interval is less than D.
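Equations (4.4.46) and (4.4.47) are easily solved by bisection, since their left-hand sides are monotone in θ_0. The following sketch is ours (it parallels the Maple program in the Appendix to this section); it returns None when no root exists, corresponding to the limits 0 and 1 discussed above.

```python
import math

def sprt_root(n, s, delta, alpha, lower=True, tol=1e-10):
    """Solve (4.4.46) (lower=True) or (4.4.47) (lower=False) for theta0
    by bisection; returns None when no root exists."""
    a = math.log((2 - alpha) / alpha)
    d = delta if lower else -delta

    def f(th):
        return (s * math.log((th + d) / th)
                + (n - s) * math.log((1 - th - d) / (1 - th)) - a)

    if lower:                      # f is decreasing on (0, 1 - delta)
        lo, hi = 1e-9, 1 - delta - 1e-9
    else:                          # f is increasing on (delta, 1)
        lo, hi = delta + 1e-9, 1 - 1e-9
    flo = f(lo)
    if flo * f(hi) > 0:            # no sign change: no root yet
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    # reproduce the roots of Table 4.4.1 (delta = 0.25, alpha = 0.1)
    obs = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
    s = 0
    for n, x in enumerate(obs, 1):
        s += x
        print(n, sprt_root(n, s, 0.25, 0.1, True),
              sprt_root(n, s, 0.25, 0.1, False))
```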

Special Case
Let Δ = 0.25 and D = 0.5, and suppose that we want to construct a 90% SPRT fixed-width confidence interval for the binomial θ. Let the first 17 observations be
0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0.

Setting α = 0.1, we obtained the following results.

Table 4.4.1 90% Confidence Interval for binomial θ

 n   Lower root   Lower CL   Upper root   Upper CL
 1   0            0          0.986111     0.9861
 2   0            0          0.925600     0.9256
 3   0            0          0.850200     0.8502
 4   0.005644     0.0056     0.874088     0.8502
 5   0.042637     0.0426     0.892344     0.8502
 6   0.035993     0.0426     0.832855     0.8328
 7   0.035355     0.0426     0.774014     0.7740
 8   0.026006     0.0426     0.717304     0.7173
 9   0.022218     0.0426     0.663609     0.6636
10   0.055929     0.0559     0.702070     0.6636
11   0.049770     0.0559     0.657106     0.6571
12   0.084698     0.0847     0.690889     0.6571
13   0.076936     0.0847     0.652432     0.6524
14   0.070081     0.0847     0.616872     0.6169
15   0.063990     0.0847     0.584188     0.5842  ← stop
16   0.058550     0.0847     0.554296     0.5542
17   0.053669     0.0847     0.527067     0.5271

The source program² used for this result is given in the Appendix to this section.

Appendix: Computer Program for Evaluating the Confidence Interval for Binomial Parameter

> read sol;
sol := proc(xctr, nctr, deltactr, alphactr)
local j, k, trigger, crit1, crit;
global eq;
trigger := deltactr;
if 0 < trigger then
    for j to xctr do
        for k to nctr do
            if j <= k then
                eq := x*ln(1 + delta/theta) + (n - x)*ln(1 - delta/(1 - theta))
                      = ln((2 - alphactr)/alphactr);
                crit1 := subs(x = j, n = k, delta = deltactr, eq);
                crit := fsolve(crit1, theta, 0 .. 3/2);
                lprint(j, k, crit)
            end if
        end do;
        print(` `)
    end do
end if;
if trigger < 0 then
    for j to xctr do
        for k to nctr do
            if j = k then
                lprint(j, k, `no pos sol for this case`)
            end if;
            if j < k then
                eq := x*ln(1 + delta/theta) + (n - x)*ln(1 - delta/(1 - theta))
                      = ln((2 - alphactr)/alphactr);
                crit1 := subs(x = j, n = k, delta = deltactr, eq);
                crit := fsolve(crit1, theta, 0 .. 1);
                lprint(j, k, crit)
            end if
        end do;
        print(` `)
    end do
end if
end proc

²I thank Professor Henry Howard of the University of Kentucky for helping me to prepare this computer program.

4.5 Large-Sample Theory for Estimators

Anscombe (1949) provided a large-sample theory for sequential estimators when there is only one unknown parameter. He showed, using a heuristic argument, that an estimation formula valid for fixed sample size remains valid when the sample size is determined by a sequential stopping rule. An alternative proof was given by Cox (1952a), which suggests that fixed-sample-size formulas might be valid quite generally for sequential sampling, provided the sample size is large. Anscombe (1952) simplified his previous work by introducing the concept of "uniform continuity in probability" of the statistic employed. Towards this, assume that for the sequence of statistics {Y_n} there exist a real number θ, a sequence of positive numbers {w_n}, and a distribution function G(x), such that the following conditions are satisfied:

(C1) Convergence of {Y_n}: For any x such that G(x) is continuous (a continuity point of G(x)),
P{Y_n − θ ≤ x w_n} → G(x) as n → ∞.

(C2) Uniform continuity in probability of {Y_n}: Given any small ε and η, there exist a large ν and a small positive c such that, for any n > ν,
P{|Y_{n′} − Y_n| < ε w_n simultaneously for all integers n′ such that |n′ − n| < cn} > 1 − η. (4.5.1)

Note that, as n → ∞, Y_n → θ in probability if w_n → 0.

In most applications, G(x) is continuous, and usually it is the normal distribution function; w_n is a linear measure of the dispersion of Y_n, for example the standard deviation or the quartile range. The term "uniform continuity" is used to describe condition (C2), since a property analogous to ordinary uniform continuity is implied. Given any realization of the sequence {Y_n}, let the functions Y_n and w_n be defined for non-integer values of n by linear interpolation between the adjacent integer values. Then, if ln w_n is uniformly continuous with respect to ln n for large n, and if (C1) is satisfied, it is easy to see that (C2) implies, in a probabilistic sense, the uniform continuity of (Y_n − θ)/w_n with respect to ln n.

Theorem 4.5.1 (Anscombe, 1952). Let {n_r} be an increasing sequence of positive integers tending to infinity, and let {N_r} be a sequence of random variables taking positive integer values such that N_r/n_r → 1 in probability as r → ∞. Then, for a sequence of random variables {Y_n} satisfying conditions (C1) and (C2), we have
P{Y_{N_r} − θ ≤ x w_{n_r}} → G(x) as r → ∞. (4.5.2)

Remark 4.5.1 Notice that we have not assumed that the distributions of N_r and Y_n are independent.

Application 4.5.1 We will apply this result to the sequential estimation of an unknown parameter θ. Let X_1, X_2, ... denote a sequence of observations, Y_n an estimate of θ calculated from the first n observations, and Z_n an estimate of the dispersion w_n of Y_n. In order to estimate θ with given small dispersion a, we use the sequential stopping rule: sample until, for the first time, Z_n ≤ a, and then calculate Y_n. To show that Y_n is an estimate of θ with dispersion asymptotically equal to a when a is small, we consider not a single stopping rule, but a sequence of possible stopping rules in which the values of a tend to zero. The above situation can be described in probabilistic terms as follows. Let {X_n} (n = 1, 2, ...) denote a sequence of random variables, not necessarily independent. For each n, let Y_n and Z_n be functions of X_1, X_2, ..., X_n. Assume that {Y_n} satisfies (C1) and (C2) above. Let {a_r} (r = 1, 2, ...) be a decreasing sequence of positive numbers tending to zero. Let {N_r} be a sequence of stopping times defined by the condition: N_r is the least integer n such that Z_n ≤ a_r; and let {n_r} be a sequence of integers defined by the condition: n_r is the least n such that w_n ≤ a_r. We assume that the following further conditions are satisfied.
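The stopping rule just described can be tried out directly. The sketch below is our own illustration, using exponential data and Y_n equal to the sample mean with Z_n = s_n/√n; sampling stops the first time Z_n ≤ a, and the resulting Y_N indeed has dispersion close to a.

```python
import random
import statistics

def sequential_mean(a, rng, n_min=10):
    """Sample until the estimated dispersion Z_n = s_n/sqrt(n) of the sample
    mean Y_n first satisfies Z_n <= a; return (Y_N, N)."""
    n, s, ss = 0, 0.0, 0.0
    while True:
        x = rng.expovariate(1.0)
        n += 1
        s += x
        ss += x * x
        if n >= n_min:
            var = (ss - s * s / n) / (n - 1)   # running sample variance
            if var / n <= a * a:               # i.e. s_n/sqrt(n) <= a
                return s / n, n

if __name__ == "__main__":
    rng = random.Random(3)
    ests = [sequential_mean(0.05, rng)[0] for _ in range(300)]
    # dispersion of Y_N should be close to a = 0.05; mean close to 1
    print(statistics.fmean(ests), statistics.stdev(ests))
```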

(C3) Convergence of {w_n}: {w_n} is decreasing, tends to zero, and, as n → ∞,
w_{n+1}/w_n → 1. (4.5.3)

(C4) Convergence of {N_r}: N_r is a well-defined random variable for all r, and, as r → ∞,
N_r/n_r → 1 in probability. (4.5.4)

Then we have the following theorem.

Theorem 4.5.2 (Anscombe, 1952). If conditions (C1)-(C4) are satisfied, then
P{Y_{N_r} − θ ≤ x a_r} → G(x) as r → ∞, (4.5.5)
at all continuity points x of G(x).

Proof. (C3) implies that w_{n_r}/a_r → 1 as r → ∞. Now (4.5.5) readily follows from Theorem 4.5.1. ∎

Remark 4.5.2 Rényi (1957) gives a direct proof of Theorem 4.5.2 when
Y_n − θ = n^{−1} Σ_{i=1}^{n} Z_i,
where the Z_i are i.i.d. having mean 0 and finite variance.

In applying this theorem, it will usually be obvious that (C1) and (C3) are satisfied. The following theorems show that (C2) is satisfied for a wide class of statistics Y_n. (C4) is a weak condition and is usually easy to verify. Although these conditions are sufficient for the conclusion of Theorem 4.5.2, it will be shown that they are not necessary.

Particular Forms of Y_n: Let us assume the following form for Y_n, which is satisfied in most applications:
Y_n − θ = n^{−1} Σ_{i=1}^{n} Z_i + R_n, (4.5.6)
where the Z_i are independent with E(Z_i) = 0 and var(Z_i) ≤ b < ∞, and n^{1/2} R_n = o(1) almost surely (a.s.). Then we will show that Y_n satisfies (C2). Thus, we are led to the following result.

Theorem 4.5.3³ Let a statistic Y_n have the representation
Y_n − θ = n^{−1} Σ_{i=1}^{n} Z_i + R_n, (4.5.7)
where the Z_i are independent with E(Z_i) = 0 and var(Z_i) ≤ b < ∞ (i = 1, 2, ...), and the remainder term R_n = o(n^{−1/2}) a.s. Then Y_n − θ satisfies (C2).

Special Cases (1) Sample Quantiles. Let F be a distribution function and ξ be a fixed point such that F(ξ) = p (0 < p < 1). Assume that F has at least two derivatives in the neighborhood of ξ, that F″ is bounded in that neighborhood, and that F′(ξ) = f(ξ) > 0. Let X_1, X_2, ..., X_n be a random sample from F, let Y_n be the pth sample quantile (i.e., Y_n = X_{[np]+1,n}, where the X_{i,n} are the ordered X's), and let S_n denote the number of X's exceeding ξ. Then Bahadur's (1966) representation gives
Y_n = ξ + [S_n/n − (1 − p)]/f(ξ) + R_n,
where R_n = O(n^{−3/4} ln n) as n → ∞ with probability one. Applying Theorem 4.5.3, we surmise that a sample quantile, or a linear combination of sample quantiles, satisfies (C2).

³I thank Professor David Mason of the University of Delaware for a useful discussion regarding this result.

(2) Maximum likelihood estimate (mle). When the probability function or the density function satisfies certain regularity conditions, we shall show that the mle has an asymptotic normal distribution when based on a random sample size. First, we shall give the strong consistency property of the mle, which was established by Wald (1949).

Theorem 4.5.4 (Wald, 1949). Let f(x; θ) denote the probability function or the probability density function of a random variable X, where θ could be a vector. Let
f(x; θ, ρ) = sup_{|θ′−θ|≤ρ} f(x; θ′) and φ(x; r) = sup_{|θ|>r} f(x; θ),
and let
f*(x; θ, ρ) = f(x; θ, ρ) when f(x; θ, ρ) > 1, and = 1 otherwise;
φ*(x; r) is analogously defined. Assume the following regularity conditions:

(i) For sufficiently small ρ and for sufficiently large r,
E_{θ_0}[ln f*(X; θ, ρ)] < ∞ and E_{θ_0}[ln φ*(X; r)] < ∞,
where θ_0 denotes the true value of the parameter;

(ii) If {θ_i} is a sequence converging to θ, then lim_{i→∞} f(x; θ_i) = f(x; θ) for all x except perhaps on a set which may depend on the limit point θ (but not on the sequence {θ_i}) and whose probability measure is zero with respect to the distribution associated with the parameter θ_0;

(iii) Distinct parameters index distinct distribution functions,

(iv) If lim_{i→∞} |θ_i| = ∞, then lim_{i→∞} f(x; θ_i) = 0 for any x except perhaps on a fixed set (not depending on the sequence {θ_i}) whose probability is zero with respect to the distribution associated with θ_0;

(vi) The parameter space is a closed subset of a finite-dimensional Euclidean space; and (vii) f(x; θ, ρ) is a measurable function of x for all θ and ρ. Let θ̂_n denote the maximum likelihood estimate of θ based on a random sample of size n from f(x; θ_0). If the regularity assumptions (i)-(vii) hold, then θ̂_n converges to θ_0 as n → ∞ with probability one.

Proof. See Wald (1949, pp. 599-600).

Remark 4.5.3 In the discrete case assumption (vii) is unnecessary. We may replace f*(x; θ, ρ) by f~(x; θ, ρ), where f~(x; θ, ρ) = f(x; θ, ρ) when f(x; θ0) > 0, and = 1 when f(x; θ0) = 0. Since f(x; θ0) is positive at at most countably many values of x, f~ is obviously a measurable function of x. Huber (1967) gives an alternative set of conditions for the strong consistency of the mle. Next we shall consider the asymptotic normality of the mle based on a random sample size. Let X1, X2, ... denote an i.i.d. sequence having the probability function or the density function of X1. Let

B_n(θ) = n^{-1} Σ_{i=1}^n (d/dθ) ln f(X_i; θ).

We assume the following regularity conditions:

(a) (d^j/dθ^j) ln f(x; θ) exists for j = 1, 2, 3,

(b) 0 < I(θ) = E_θ{[(d/dθ) ln f(X; θ)]²} < ∞,

(c) |(d³/dθ³) ln f(x; θ)| ≤ H(x), where E_θ[H(X)] < M and M is free of θ.

Under the assumptions (a)-(c), the mle converges to θ0 (the true value of the parameter) in probability (see, for instance, Rao (1965, p. 300)). We shall state without proof the following result of the author (1987, pp. 426-427).

Theorem 4.5.5 Let N be the random sample size. Then, under the assumptions (a)-(c), N^{1/2}(θ̂_N − θ0) tends in distribution to a normal (0, I^{-1}(θ0)) random variable.

Remarks 4.5.4 As a by-product, we can establish the law of the iterated logarithm for the mle. That is,

lim sup_{n→∞} n^{1/2} |θ̂_n − θ0| [2 ln ln n]^{-1/2} = I^{-1/2}(θ0) a.s., (4.5.8)

since lim sup_{n→∞} n^{1/2} |B_n(θ0)| [2 ln ln n]^{-1/2} = I^{1/2}(θ0) a.s.

Furthermore, if for some δ > 0, E_{θ0} |(d/dθ) ln f(X; θ0)|^{2+δ} < ∞, then using a result of Loève (1977, Vol. 1, pp. 254-255) we can establish that

θ̂_n − θ0 = I^{-1}(θ0) B_n(θ0) + o(n^{-1/2}) a.s. (see Problem 4.5.7),

which coincides with the strong representation in (4.5.7). However, it seems that such a strong representation (although sufficient for the statistic to have the property (C2)) is not necessary, as has been demonstrated in the case of the mle. A type of statistic to which Theorem 4.5.3 does not apply, but which still satisfies (C2), is given by (i) Y_n = X_{n,n}; (ii) Y_n = X_{1,n}; (iii) Y_n = X_{n,n} − X_{1,n}, because, for n < n′ ≤ n(1 + c),

P(Y_{n′} = Y_n in case (i)) = n/n′ ≥ (1 + c)^{-1}.

Thus, P(Y_{n′} > Y_n) ≤ 1 − (1 + c)^{-1} ≤ c. That is, the probability that Y_{n′} differs from Y_n for any n′ with n < n′ ≤ (1 + c)n is less than c in cases (i) and (ii), and less than 2c in case (iii), and is therefore small if c is small. Statistics of this type often consist of one of the three expressions listed above multiplied by a factor u_n not depending on the observations. In that case, if c is small, the probability is close to 1 that Y_{n′} − Y_n = (u_{n′}/u_n − 1)Y_n, and this must with high probability be small compared with w_n for large n and |n′ − n| ≤ cn. Thus a condition is imposed on the "continuity" of u_n relative to w_n.

Theorem 4.5.6 (Anscombe, 1952). (C2) is satisfied if X1, X2, ... are independent and identically distributed and Y_n is an extremum or the range of X1, X2, ..., X_n multiplied by a factor u_n (provided that u_n, if not a constant, satisfies the above condition).

Example 4.5.1 Suppose that we wish to estimate, with given small standard error a, the mean θ of a normal distribution whose variance σ² is unknown. If X1, X2, ... denote independent observations, we consider the statistic Y_n = n^{-1} Σ_{i=1}^n X_i as an estimate of θ; for fixed n this has standard error w_n = σ/n^{1/2}, estimated (for n ≥ 2) by Z_n = {[n(n − 1)]^{-1} Σ_{i=1}^n (X_i − Y_n)²}^{1/2}. Conditions (C1) and (C3) are satisfied, and so is (C2). Therefore (C4) holds, and Theorem 4.5.2 implies that (Y_N − θ)/a is asymptotically normal with mean 0 and unit variance if N is the least n for which

Z_n ≤ a. (4.5.9)
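Example 4.5.1 can be checked by simulation. The sketch below is our illustration (not from the book); it assumes NumPy is available, and it imposes a small lower limit on N to avoid spurious early stopping, in the spirit of the advice given at the end of this discussion. All parameter values are our own choices.

```python
# Monte Carlo sketch of Example 4.5.1 (illustration only, not from the book):
# stop at the first n >= 10 with Z_n <= a; then N should be near sigma^2/a^2
# and (Y_N - theta)/a should be roughly standard normal.
import numpy as np

rng = np.random.default_rng(0)

def stop_and_estimate(theta, sigma, a, n_min=10):
    """Return (N, Y_N) for the rule: N = least n >= n_min with Z_n <= a."""
    s = ss = 0.0
    n = 0
    while True:
        x = rng.normal(theta, sigma)
        n += 1
        s += x
        ss += x * x
        if n >= n_min:
            ybar = s / n
            zn2 = (ss - n * ybar * ybar) / (n * (n - 1))  # Z_n^2
            if zn2 <= a * a:
                return n, ybar

theta, sigma, a = 5.0, 2.0, 0.15
res = [stop_and_estimate(theta, sigma, a) for _ in range(1000)]
Ns = np.array([r[0] for r in res])
pivot = np.array([(r[1] - theta) / a for r in res])
print("mean N:", Ns.mean(), " sigma^2/a^2 =", sigma**2 / a**2)
print("pivot mean:", pivot.mean(), " pivot sd:", pivot.std())
```

The mean stopping time should be close to σ²/a² ≈ 178, and the standardized error should look approximately standard normal, as Theorem 4.5.2 asserts.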

Now, (4.5.9) is equivalent to (n − 1)^{-1} Σ_{i=1}^{n-1} ξ_i² ≤ na², where the ξ_i are independent normal (0, σ²) variables derived from the X_i by a Helmert transformation: ξ_i = (X_{i+1} − i^{-1} Σ_{j=1}^i X_j)[i/(i + 1)]^{1/2}. By the strong law of large numbers, given ε, η > 0, there is a ν such that

P( |(n − 1)^{-1} Σ_{i=1}^{n-1} ξ_i² − σ²| < εσ² for all n ≥ ν ) > 1 − η. (4.5.10)

If a is small enough, the probability exceeds 1 − η that (4.5.9) is not satisfied for any n in the range 2 ≤ n ≤ ν; and then, given that N > ν, (4.5.10) implies that the probability exceeds 1 − η that |Na²/σ² − 1| < ε. Hence (C4) holds as a → 0. To obtain a better approximation to the asymptotic situation when a is not infinitesimally small, it is advisable to impose a lower limit on the value of N and not to consider whether (4.5.9) is satisfied until that lower limit has been passed. One might, for example, specify that N ≥ a^{-δ}, where 0 < δ < 2; if δ < 2, (C4) and Theorem 4.5.2 would apply as before.

Example 4.5.2 Suppose that X1, X2, ... have independent uniform distributions on the interval (0, θ), and that we desire to estimate θ with given small standard error a. We may take

Y_n = (n + 1)X_{n,n}/n, with w_n = θ/n estimated by Z_n = Y_n/n.

As n → ∞, P(Y_n − θ ≤ xw_n) → G(x) = exp(x − 1) (for x ≤ 1), and w_n is asymptotically the standard deviation of Y_n. (C2) is satisfied by Theorem 4.5.6. Hence, by Theorem 4.5.2, the required sample size N is the least n for which

Z_n ≤ a.

Similar considerations apply regarding (C4) as in the previous example. If in the definition of Y_n we omit the factor (n + 1)/n, this gives G(x) = exp(x) (for x ≤ 0) and a stopping rule equivalent to the above for large n.

Example 4.5.3 To see what may happen if (C2) is not satisfied, consider independent observations X1, X2, ... from a normal distribution with unknown mean θ and variance σ², and let us take

Y_n = n^{-1/2} X_n + (1 − n^{-1/2}) X̄_n, where X̄_n = n^{-1} Σ_{i=1}^n X_i. (4.5.11)

Y_n is normally distributed with mean θ and asymptotic variance 2σ²/n for large n. The correlation between Y_n and Y_{n+1} tends to 1/2, and (C2) is not satisfied. Suppose that we wish to estimate θ with given small standard error a. Then we take N = 2σ²/a² (or the next integer above), a fixed value. The conclusion of Theorem 4.5.2 is valid. Now suppose that we wish to estimate θ (assumed to be positive) with given small coefficient of variation a. The coefficient of variation of Y_n, and a suitable estimate of it, are w_n = (σ/θ)(2/n)^{1/2} and Z_n = (σ/Y_n)(2/n)^{1/2}. When Theorem 4.5.2 is applied, it yields the stopping rule: stop at N, where

N = inf{ n : nY_n² ≥ 2σ²/a² }. (4.5.12)

When n = N, the second member on the right-hand side of (4.5.11) is asymptotically normal with mean θ(1 − N^{-1/2}); however, turning to the first member, as a → 0 the probability tends to 1 that X_N > θ + kσ, where k is any positive number. Hence (Y_N − θ)/a → ∞ in probability, and so does not have the limit distribution G(x) = Φ(x/θ) which Theorem 4.5.2 gives, where Φ denotes the standard normal distribution function. It is easy to verify that (C4) is satisfied by the rule in (4.5.12). Thus, when (C2) is not satisfied, the conclusion of Theorem 4.5.2 may hold if N satisfies a stronger condition than (C4), such as being constant. However, (C2) seems to be necessary if no condition other than (C4) is imposed on N, in particular if the distributions of N and Y_n are not assumed to be independent.

Remark 4.5.5 If Y_n is the average of independent random variables, one can easily show (see, for instance, Laha and Rohatgi (1979, Lemma 5.4.1, pp. 322-323)) that if Y_n satisfies (C1) then it also satisfies (C3). Since in most practical applications we will be concerned with random sums of i.i.d. random variables, the following result supplements the main result of Anscombe (1952).

Theorem 4.5.7 (Rényi, 1957, and Wittenberg, 1964). Let X1, X2, ... be independent and identically distributed random variables having mean 0 and variance 1, and define S_n = X1 + X2 + ... + X_n. If N1, N2, ... is a sequence of positive integer-valued random variables (defined on the same probability space) such that N_n/n converges in probability to a positive constant c, then S_{N_n}/(nc)^{1/2} converges in law to a standard normal variable as n → ∞.

Proof. This is a special case of Theorem 4.1 of Wittenberg (1964, p. 15). ∎
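The random-index central limit theorem above is easy to see numerically. The following sketch is our illustration (not from the book; NumPy assumed): the index N_n is random but N_n/n concentrates near c, and the normalized random sum still behaves like a standard normal variable.

```python
# Simulation sketch of Theorem 4.5.7 (illustration only, not from the book):
# with N_n/n -> c in probability, S_{N_n}/(n c)^{1/2} is approximately
# standard normal even though the index N_n is random.
import numpy as np

rng = np.random.default_rng(1)
n, c, reps = 2000, 0.5, 3000
spread = int(0.05 * n)                      # N_n/n stays within 5% of c
vals = np.empty(reps)
for r in range(reps):
    N = int(c * n) + int(rng.integers(-spread, spread + 1))
    x = rng.standard_normal(N)              # i.i.d., mean 0, variance 1
    vals[r] = x.sum() / np.sqrt(n * c)
print("mean:", vals.mean(), " sd:", vals.std())
```

The sample mean and standard deviation of the normalized sums should be near 0 and 1 respectively.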

Bhattacharya and Mallik (1973) employ Theorem 4.5.7 in order to establish the asymptotic normality of the stopping time of Robbins' (1959) procedure for estimating the normal mean μ when the variance σ² is unknown, with (μ̂ − μ)² + cn as the loss function, where c is proportional to the cost per observation, μ̂ = n^{-1} Σ_{i=1}^n X_i and the X_i are i.i.d. normal (μ, σ²). In other words, they show that

(N − c^{-1/2}σ) / (½c^{-1/2}σ)^{1/2} tends in distribution to a standard normal variable as c → 0. (4.5.13)

They use Lemma 1 of Chow and Robbins (1965) (see Lemma 4.7.1) in order to assert that c^{1/2}N → σ almost surely as c → 0. Notice here that the stopping time N should be indexed by c, which we suppress for the sake of simplicity. Next, we shall consider a result of Siegmund (1968). Let X1, X2, ... be i.i.d. random variables having mean μ > 0 and finite variance σ², and let T_n = X1 + X2 + ... + X_n. Let N(= N_c) denote the smallest n for which T_n ≥ c^{-1}n^δ, where 0 ≤ δ < 1. Such stopping rules commonly arise in sequential estimation (for instance, see Chow and Robbins (1965) and Darling and Robbins (1967b)). Then we have

Theorem 4.5.8 (Siegmund, 1968). Let X1, X2, ... be i.i.d. random variables with E(X_i) = μ > 0 and var(X_i) = σ², and let T_n = X1 + X2 + ... + X_n. If N is the smallest n for which T_n ≥ c^{-1}n^δ, 0 ≤ δ < 1, then as c → 0,

(N − λ_c) / [σμ^{-1}(1 − δ)^{-1} λ_c^{1/2}] tends in distribution to a standard normal variable,

where λ_c = (cμ)^{1/(δ-1)}.

Proof. Bhattacharya and Mallik (1973, Theorem 4) provide a simpler proof of Theorem 4.5.8 that is based on Theorem 4.5.7.
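Siegmund's stopping time can be explored by simulation. The sketch below is our illustration (not from the book; NumPy assumed); the parameter values are our own choices. The stopping time concentrates near λ_c = (cμ)^{1/(δ−1)} with roughly Gaussian fluctuations of order λ_c^{1/2}.

```python
# Monte Carlo sketch of Theorem 4.5.8 (illustration only, not from the book):
# N = first n with T_n >= c^{-1} n^delta concentrates near
# lambda_c = (c mu)^{1/(delta-1)}, with fluctuations of order
# sigma mu^{-1} (1 - delta)^{-1} lambda_c^{1/2}.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, delta, c = 2.0, 1.0, 0.5, 0.02
lam = (c * mu) ** (1.0 / (delta - 1.0))     # here lam = 625

def stopping_time():
    t, n = 0.0, 0
    while True:
        n += 1
        t += rng.normal(mu, sigma)
        if t >= n ** delta / c:
            return n

Ns = np.array([stopping_time() for _ in range(800)])
z = (Ns - lam) * mu * (1.0 - delta) / (sigma * np.sqrt(lam))
print("lambda_c:", lam, " mean N:", Ns.mean())
print("standardized mean:", z.mean(), " sd:", z.std())
```

The standardized stopping times should have mean near 0 and standard deviation near 1, consistent with the stated limit.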

They also conjecture that Theorem 4.5.8 holds for all nonnegative δ. Woodroofe (1977) considered the stopping time of a sequential estimation procedure given by N = inf{ n ≥ n0 : T_n < cn^δ L(n) }, where T_n is as defined earlier, δ > 1, c is a positive parameter and L(n) is a convergent sequence such that

L(x) = 1 + L0 x^{-1} + o(x^{-1}) (x → ∞) with |L0| < ∞.

Then Woodroofe (1977) establishes the asymptotic normality of N (suitably centered and scaled) and obtains an asymptotic expansion for E(N).

4.6 Determination of Fixed-width Intervals

Khan (1969) has given a method for determining stopping rules in order to obtain fixed-width confidence intervals of prescribed coverage probability for an unknown parameter of a distribution possibly involving some unknown nuisance parameters. The results are only asymptotic, and rely on the asymptotics of Chow and Robbins (1965). Below we present Khan's (1969) results. Let p(x; θ1, θ2) denote the probability density function of a random variable X (for convenience, with respect to Lebesgue measure) with real-valued parameters θ1 and θ2, where θ2 is considered to be a nuisance parameter. For the sake of simplicity we assume that there is a single nuisance parameter, since the case of several nuisance parameters would be analogous. We wish to determine a confidence interval of fixed width 2d (d > 0) for θ1 when both θ1 and θ2 are unknown, with preassigned coverage probability 1 − α (0 < α < 1).

Assumption We assume that all the regularity conditions of maximum likelihood estimation are satisfied [see, for instance, LeCam (1970)]. Also assume the regularity conditions of Theorem 4.5.4.

Let N denote a bona fide stopping variable (that is, N is a positive integer-valued random variable such that the stopping event {N = n} is a member of the σ-algebra of subsets generated by X^{(n)} = (X1, X2, ..., X_n)′, and P(N < ∞) = 1). Let n denote a fixed value assumed by N. Also, let Fisher's information matrix [see, for instance, Rao (1965, p. 270)] be denoted by I(θ) = ((I_ij)), i, j = 1, 2, where θ = (θ1, θ2)′ and

I_ij = −E{ ∂² ln p(X; θ1, θ2) / ∂θ_i ∂θ_j }.

We assume that ((I_ij)) is positive definite, and let ((I_ij))^{-1} = ((λ_ij)) = Λ, that is, I^{-1}(θ) = Λ. θ̂1(n) and θ̂2(n) will denote the maximum likelihood estimators (mle's) of θ1 and θ2, respectively, based on a random sample of size n. It should be noted that θ̂1(n) is asymptotically normal with mean θ1 and variance λ11/n, where λ11 = λ11(θ1, θ2), since in general the I_ij are functions of θ1 and θ2. Let {u_n, n ≥ 1} be a sequence of positive constants converging to a constant u where Φ(u) = 1 − (α/2). Let J_n = [θ̂1(n) − d, θ̂1(n) + d] and

n0 = smallest integer n ≥ u²λ11(θ1, θ2)/d². (4.6.1)

From (4.6.1) it follows that

lim_{d→0} n0 = ∞ and lim_{d→0} [ d²n0 / (u²λ11(θ1, θ2)) ] ≥ 1.

Hence,

lim_{d→0} P(θ1 ∈ J_{n0}) = lim_{d→0} P( n0^{1/2} |θ̂1(n0) − θ1| λ11^{-1/2} ≤ d n0^{1/2} λ11^{-1/2} ) = 2Φ(u′) − 1 ≥ 1 − α,

since u′ = lim_{d→0} d n0^{1/2} λ11^{-1/2} ≥ u.

We will treat n0 as the optimum sample size if θ1 and θ2 were known; it will serve as a standard of comparison for the stopping time of the sequential procedure to be adopted. In some cases n0 will turn out to be optimum if only θ2 were known, namely when λ11(θ1, θ2) depends only on θ2, for example, in the case of a normal distribution with θ1 as the mean and θ2 as the variance. When θ1 and θ2 are unknown, no fixed n will be available to guarantee fixed width 2d and coverage probability 1 − α. So we adopt the following sequential rule. For a fixed positive integer m, let

N = smallest integer n ≥ m such that n ≥ u_n² λ̂11(n)/d², (4.6.2)

where λ̂11(n) = λ11(θ̂1(n), θ̂2(n)).

Lemma 4.6.1 (Khan, 1969). If λ11(θ1, θ2) < ∞, then the sequential procedure terminates finitely with probability 1.

Proof. Under the regularity assumptions, λ̂11(n) → λ11(θ1, θ2) with probability 1 [see Theorem 4.5.4]. Thus the right-hand member of (4.6.2) tends to n0 with probability one. Hence

P(N = ∞) = lim_{n→∞} P(N > n) ≤ lim_{n→∞} P( n < u_n² λ̂11(n)/d² ) = 0. ∎

Then we have the following first-order asymptotic result.

Theorem 4.6.1 (Khan, 1969). If the assumptions of Theorem 4.5.4 are satisfied and if

E[ sup_n λ̂11(n) ] < ∞, (4.6.3)

then we have

(i) lim_{d→0} N/n0 = 1 almost surely (a.s.) (asymptotic optimality),

(ii) lim_{d→0} P(θ1 ∈ J_N) = 1 − α (asymptotic consistency),

(iii) lim_{d→0} E(N)/n0 = 1 (asymptotic efficiency).

Proof. To prove (i), let Y_n = λ̂11(n)/λ11, f(n) = nu²/u_n² and t = u²λ11(θ1, θ2)/d² = n0. Then the conditions of Lemma 4.7.1 [Chow and Robbins (1965, Lemma 1)] are satisfied, and hence

lim_{t→∞} N/t = lim_{d→0} N/n0 = 1 a.s.

To prove (ii), we observe that N(t)/t → 1 a.s. as t → ∞ and hence N(t)/n_t → 1 a.s. as t → ∞, where n_t = [t] = largest integer ≤ t. It follows from Theorem 4.5.5 that [N(t)/λ11(θ1, θ2)]^{1/2} {θ̂1(N(t)) − θ1} tends to a standard normal variable in distribution as t → ∞. Also, from (i) it follows that d(N/λ11)^{1/2} → u a.s. as d → 0. Hence

lim_{d→0} P(θ1 ∈ J_N) = Φ(u) − Φ(−u) = 1 − α.

Finally, (iii) follows from Lemma 4.7.2 [Chow and Robbins (1965, Lemma 2)]. ∎

Remark 4.6.1 It should be noted that Assumption (4.6.3) is required only for the validity of (iii). However, in some cases it might be possible to establish (iii) without (4.6.3), for instance by using Lemma 4.7.3 [Chow and Robbins (1965, Lemma 3)].

Example 4.6.1

(a) Consider the normal population having mean μ and variance σ² (0 < σ² < ∞). Take θ1 = μ, θ2 = σ². Then

I(θ) = diag(σ^{-2}, (2σ⁴)^{-1}) and λ11(θ1, θ2) = σ².

The mle's of θ1 and θ2 are θ̂1(n) = X̄_n = n^{-1} Σ_{i=1}^n X_i and θ̂2(n) = n^{-1} Σ_{i=1}^n (X_i − X̄_n)². Instead of θ̂2(n), we can use s_n² = nθ̂2(n)/(n − 1), which is an unbiased and consistent estimator for θ2. Hence the following stopping rule is obtained:

N = smallest n ≥ 2 such that n ≥ u_n² s_n²/d², and n0 = u²σ²/d².
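The stopping rule of example (a) is easy to simulate. The sketch below is our illustration (not from the book; NumPy assumed); for simplicity we take u_n = u throughout and start at n = 10 rather than n = 2, both of which are our own choices.

```python
# Simulation sketch of the rule in Example 4.6.1(a) (illustration only, not
# from the book): N = least n with n >= u^2 s_n^2 / d^2, compared with
# n0 = u^2 sigma^2 / d^2, and the coverage of [Xbar_N - d, Xbar_N + d].
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, d, u, m = 0.0, 3.0, 0.5, 1.96, 10   # u ~ Phi^{-1}(0.975)
n0 = u * u * sigma * sigma / (d * d)

def run_once():
    s = ss = 0.0
    n = 0
    while True:
        x = rng.normal(mu, sigma)
        n += 1
        s += x
        ss += x * x
        if n >= m:
            s2 = (ss - s * s / n) / (n - 1)    # unbiased sample variance s_n^2
            if n >= u * u * s2 / (d * d):
                return n, s / n

res = [run_once() for _ in range(1000)]
Ns = np.array([r[0] for r in res])
cover = np.mean([abs(r[1] - mu) <= d for r in res])
print("n0:", n0, " mean N:", Ns.mean(), " coverage:", cover)
```

The mean stopping time should be close to n0 ≈ 138, with coverage near the nominal 0.95 (typically a little below it for moderate n0, which is the small-sample effect mentioned in Remark 4.7.2 below).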

(b) For the normal population of example (a), let θ1 = σ², θ2 = μ. Then

λ11(θ1, θ2) = 2σ⁴.

Thus

N = smallest n ≥ 2 such that n ≥ 2u_n² θ̂1²(n)/d², and n0 = 2u²σ⁴/d².

Graybill and Connell (1964a) have given a two-stage procedure for estimating σ² (see Problem 4.4.5).

(c) Let p(x; θ) = θ exp(−θx), x ≥ 0, 0 < θ < ∞. Then

i(θ) = θ^{-2} and λ11(θ) = θ²,

and

N = smallest n ≥ m such that n ≥ u_n² θ̂_n²/d², with n0 = u²θ²/d², where θ̂_n = 1/X̄_n.

The validity of Assumption (4.6.3) in (a) and (b) follows from the following lemma, which is proved using Wiener's (1939) theorem. However, it is not true in (c), and hence (iii) cannot be concluded from Lemma 4.7.2. We first state Wiener's (1939) theorem without proof.

Theorem 4.6.2 (Wiener, 1939). Let {X_n, n ≥ 1} be a sequence of i.i.d. random variables with E|X_n|^r < ∞ or E[|X_n| ln⁺|X_n|] < ∞ according as r > 1 or r = 1, where ln⁺ u = max(0, ln u). Then

E[ sup_{n≥1} | n^{-1} Σ_{i=1}^n X_i |^r ] < ∞.

Lemma 4.6.2 (Khan, 1969). Under the assumption 0 < σ² < ∞, for sampling from a normal population we have

E( sup_{n≥2} s_n^q ) < ∞ for q ≥ 2,

where s_n² is the sample variance (unbiased version).

Proof. For q = 2, since

s_n² = (n − 1)^{-1} Σ_{i=1}^n (X_i − X̄_n)² ≤ (n − 1)^{-1} Σ_{i=1}^n X_i² ≤ X_1² + (n − 1)^{-1} Σ_{i=2}^n X_i²,

we have

sup_{n≥2} s_n² ≤ X_1² + sup_{n≥2} (n − 1)^{-1} Σ_{i=2}^n X_i².

Consequently, using Theorem 4.6.2 (applied with r = 1 to the i.i.d. sequence X_2², X_3², ...),

E( sup_{n≥2} s_n² ) < ∞ if E(X² ln⁺ X²) < ∞ and E(X²) < ∞.

However, E(X² ln⁺ X²) ≤ E(X⁴) < ∞ and E(X²) < ∞ hold for a normal distribution with finite variance. Now let q > 2. Then, raising the preceding bound for s_n² to the power q/2 and using the inequality |a + b|^p ≤ 2^p{ |a|^p + |b|^p }, p ≥ 1, we obtain

sup_{n≥2} s_n^q ≤ 2^{q/2} { |X_1|^q + sup_{n≥2} [ (n − 1)^{-1} Σ_{i=2}^n X_i² ]^{q/2} }.

Applying Theorem 4.6.2 with r = q/2 > 1 to the sequence X_2², X_3², ..., we conclude that E(sup_{n≥2} s_n^q) < ∞ if E|X_1|^q < ∞, which is true for a normal distribution with finite variance. This completes the proof of the lemma. ∎

Remark 4.6.2 In the case of a single-parameter family of distributions, the stopping variable N takes the form

N = smallest integer n ≥ m such that n ≥ u_n² / [ d² i(θ̂_n) ],

where i(θ) = −E{ ∂² ln p(X; θ)/∂θ² } and θ̂_n is the mle of θ. However, if i(θ) is independent of θ, no sequential procedure is required, since bounded-length confidence intervals of given coverage probability can be based on normal theory. More generally, no sequential procedure is required when λ11(θ1, θ2) depends only on θ2 and θ2 is known. As an example, consider a normal distribution with unknown mean and known variance.

4.7 Interval and Point Estimates for the Mean

4.7.1 Interval Estimation for the Mean

In Section 4.6 we discussed fixed-width confidence intervals for a general parameter; in this section we shall present the large-sample fixed-width sequential confidence intervals for the population mean. The main results can be followed by assuming certain convergence theorems, the understanding of which requires some knowledge of measure theory. The basic asymptotics of Chow and Robbins (1965) will be given as lemmas, and their main results and Nádas' (1969) results will be stated as theorems.

Lemma 4.7.1 (Chow and Robbins, 1965). Let Y_n (n = 1, 2, ...) be any sequence of random variables such that Y_n > 0 a.s. (almost surely) and lim_{n→∞} Y_n = 1 a.s.; let f(n) be any sequence of constants such that

f(n) > 0, lim_{n→∞} f(n) = ∞, lim_{n→∞} f(n)/f(n − 1) = 1,

and for each t > 0 define

N = N(t) = smallest k ≥ 1 such that Y_k ≤ f(k)/t. (4.7.1)

Then N is well-defined and non-decreasing as a function of t,

lim_{t→∞} N = ∞ a.s., lim_{t→∞} E(N) = ∞, (4.7.2)

and

lim_{t→∞} f(N)/t = 1 a.s. (4.7.3)

Proof. (4.7.2) can easily be verified. In order to prove (4.7.3), we observe that for N > 1, Y_N ≤ f(N)/t < [f(N)/f(N − 1)]Y_{N-1}, from which (4.7.3) follows as t → ∞. ∎

Lemma 4.7.2 (Chow and Robbins, 1965). If the assumptions of Lemma 4.7.1 are satisfied and if E(sup_n Y_n) < ∞, then

lim_{t→∞} E[f(N)]/t = 1. (4.7.4)

Proof. Let Z = sup_n Y_n; then E(Z) < ∞. Choose m such that f(n)/f(n − 1) ≤ 2 for n > m. Then, for N > m,

f(N)/t < [f(N)/f(N − 1)]Y_{N-1} ≤ 2Z.

Hence, for all t ≥ 1,

f(N)/t ≤ f(m) + 2Z. (4.7.5)

Now, (4.7.4) follows from (4.7.3), (4.7.5) and the dominated convergence theorem.

Let X1, X2, ... be a sequence of independent observations from some population. We wish to set up a confidence interval having specified width 2d and coverage probability 1 − α for the unknown mean μ of the population. If the variance σ² of the population is known, and if d is small compared to σ², this can be constructed as follows. For any n ≥ 1 define

X̄_n = n^{-1} Σ_{i=1}^n X_i and I_n = [X̄_n − d, X̄_n + d],

and let u_α (to be written as u hereafter) denote the (1 − α/2)th fractile of the standard normal distribution. Then, for a sample of size n determined by

n = smallest integer ≥ u²σ²/d², (4.7.6)

the interval I_n has coverage probability

P(μ ∈ I_n) = P( n^{1/2}|X̄_n − μ|/σ ≤ dn^{1/2}/σ ).

Since (4.7.6) implies that lim_{d→0} (d²n)/(u²σ²) = 1, it follows from the central limit theorem that lim_{d→0} P(μ ∈ I_n) = 1 − α.
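The fixed-sample size (4.7.6) is a one-line computation. The following sketch is ours (not from the book), with illustrative parameter values.

```python
# Numeric check of (4.7.6) (illustration only, not from the book): with sigma
# known, the required fixed sample size is the smallest integer
# n >= u^2 sigma^2 / d^2.
import math

u, sigma, d = 1.96, 2.0, 0.25      # u ~ Phi^{-1}(0.975), i.e. alpha ~ 0.05
n = math.ceil(u * u * sigma * sigma / (d * d))
print("required n:", n)            # -> 246
```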

Quite often we will be concerned with the situation where the nature of the population, and hence σ², is unknown, so that no fixed-sample-size method is available. Let

V_n² = n^{-1} [ Σ_{i=1}^n (X_i − X̄_n)² + 1 ], (4.7.7)

let u1, u2, ... be any sequence of positive constants such that lim_{n→∞} u_n = u, and define

N = smallest integer k ≥ 1 such that V_k² ≤ d²k/u_k². (4.7.8)

Then we have the following theorem.

Theorem 4.7.1 (Chow and Robbins, 1965). If 0 < σ² < ∞, then we have

(i) lim_{d→0} (d²N)/(u²σ²) = 1 a.s. (asymptotic "optimality"), (4.7.9)

(ii) lim_{d→0} P(μ ∈ I_N) = 1 − α (asymptotic "consistency"), (4.7.10)

(iii) lim_{d→0} (d²E(N))/(u²σ²) = 1 (asymptotic "efficiency"). (4.7.11)

Proof. In Lemma 4.7.1, set

Y_n = V_n²/σ², f(n) = nu²/u_n² and t = u²σ²/d²; (4.7.12)

then (4.7.8) can be written as N = N(t) = smallest k ≥ 1 such that Y_k ≤ f(k)/t. Applying Lemma 4.7.1, we have

lim_{d→0} (d²N)/(u²σ²) = lim_{t→∞} N/t = 1 a.s., (4.7.13)

which proves (4.7.9). Next, by (4.7.13), dN^{1/2}/σ → u and N/t → 1 in probability as t → ∞; it follows from Theorem 4.5.1 that as t → ∞, (X1 + X2 + ... + X_N − Nμ)/(σN^{1/2}) behaves like a standard normal variable. Hence lim_{t→∞} P(μ ∈ I_N) = 1 − α, which proves (4.7.10). Now (4.7.11) immediately follows from Lemma 4.7.2 whenever the distribution of the X_i is such that

E[ sup_n { n^{-1} Σ_{i=1}^n (X_i − X̄_n)² } ] < ∞. (4.7.14)

The justification is that

sup_n Y_n ≤ σ^{-2} [ 1 + sup_n n^{-1} Σ_{i=1}^n (X_i − X̄_n)² ], (4.7.15)

and, since the function f(n) defined by (4.7.12) is n + o(n), it follows from (4.7.15) and Lemma 4.7.2 that lim_{d→0} (d²E(N))/(u²σ²) = 1. For (4.7.14) to hold, finiteness of the fourth moment of the X_i would suffice; however, the following lemma shows that (4.7.11) is valid without such a restriction.

Lemma 4.7.3 (Chow and Robbins, 1965). Suppose the conditions of Lemma 4.7.1 are satisfied, lim_{n→∞} [f(n)/n] = 1, and, for N defined by (4.7.1),

E(N) < ∞ (all t > 0), lim sup_{t→∞} E(NY_N)/E(N) ≤ 1. (4.7.16)

If, in addition, there exists a sequence of constants g(n) as in Chow and Robbins (1965, Lemma 3), then

lim_{t→∞} E(N)/t = 1. (4.7.17)

Proof. The proof is somewhat technical and hence is not given here. For details see Chow and Robbins (1965, Lemma 3, p. 459). ∎

Remark 4.7.1. If the random variables X_i are continuous, the definition of V_n² in (4.7.7) can be modified to V_n² = n^{-1} Σ_{i=1}^n (X_i − X̄_n)². The term n^{-1} is added in (4.7.7) in order to ensure that Y_n = V_n²/σ² > 0 a.s., a fact used in the proof of Lemma 4.7.1 to guarantee that N → ∞ a.s. as t → ∞. Also, as is evident from the proofs, N in (4.7.8) could be defined as the smallest (or the smallest odd, etc.) integer ≥ n0 such that the indicated inequality holds, where n0 is any fixed positive integer.
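Since Theorem 4.7.1 is distribution-free, it is instructive to try the rule (4.7.8) on a non-normal population. The sketch below is our illustration (not from the book; NumPy assumed), using exponential data with mean 1 (so σ = 1 as well) and u_n = u.

```python
# Monte Carlo sketch of Theorem 4.7.1 (illustration only, not from the book):
# the Chow-Robbins rule with V_n^2 = n^{-1}[sum (X_i - Xbar_n)^2 + 1],
# applied to exponential(1) data, where mu = sigma = 1.
import numpy as np

rng = np.random.default_rng(5)
mu, d, u = 1.0, 0.1, 1.96
target = u * u / (d * d)                 # u^2 sigma^2 / d^2 with sigma = 1

def run_once():
    s = ss = 0.0
    n = 0
    while True:
        x = rng.exponential(mu)
        n += 1
        s += x
        ss += x * x
        vn2 = (ss - s * s / n + 1.0) / n  # the V_n^2 of (4.7.7)
        if vn2 <= d * d * n / (u * u):
            return n, s / n

res = [run_once() for _ in range(800)]
Ns = np.array([r[0] for r in res])
cover = np.mean([abs(r[1] - mu) <= d for r in res])
print("target:", target, " mean N:", Ns.mean(), " coverage:", cover)
```

Note how the added n^{-1} term in (4.7.7) automatically prevents very early stopping: the rule cannot trigger before d²n/u² exceeds 1/n. The mean stopping time should be near u²σ²/d² ≈ 384, with coverage near 0.95.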

Remark 4.7.2. Theorem 4.7.1 has been established when the X_i are normal with mean μ and variance σ² by Stein (1949), Anscombe (1952, 1953), and Gleser, Robbins and Starr (1964). Extensive numerical computations of Ray (1957b) and Starr (1966a) for the normal case indicate that, for example, when 1 − α = 0.95, the lower bound over all d > 0 of P(X̄_N − d ≤ μ ≤ X̄_N + d), where N is the smallest odd integer k ≥ 3 such that (k − 1)^{-1} Σ_{i=1}^k (X_i − X̄_k)² ≤ (d²k)/u_k², is about 0.929 if the values u_k are taken from the t-distribution with k − 1 degrees of freedom (see Table 5.1.1 in Govindarajulu (1987) or Ray (1957b)). Nádas (1969) has extended Theorem 4.7.1 so as to take care of other specified "accuracies". We speak of "absolute accuracy" when estimating μ by

I_n : |X̄_n − μ| ≤ d (d > 0); (4.7.18)

and if μ ≠ 0, we speak of "proportional accuracy" when estimating μ by

J_n : |X̄_n − μ| ≤ p|μ| (0 < p < 1). (4.7.19)

Denote by ρ the coefficient of variation σ/|μ| and define

n(d) = smallest integer ≥ u²σ²/d², (4.7.20)

m(p) = smallest integer ≥ u²ρ²/p², (4.7.21)

where, as before, u denotes the (1 − α/2)th fractile of the standard normal variable. Then n(d) and m(p) increase without bound as the arguments tend to zero. Hence, for small arguments one can (at least approximately) achieve the required probability of coverage 1 − α by taking the 'sample size' n no smaller than n(d) (for absolute accuracy) or m(p) (for proportional accuracy). If, however, σ² (or ρ²) is unknown, then n(d) or m(p) is not available. On the other hand, if we let V_n² be given by (4.7.7), then the stopping rules

N = min{ n : V_n² ≤ d²n/u_n² } (4.7.22)

and

M = min{ n : V_n² ≤ p²X̄_n²n/u_n² } (4.7.23)

are well-defined. In the event that ρ² is known (but not σ²) and one insists on absolute accuracy, or if σ² is known (but not ρ²) in the proportional case, then one has

N* = min{ n : ρ²X̄_n² ≤ d²n/u_n² } (4.7.24)

and

M* = min{ n : σ² ≤ p²X̄_n²n/u_n² } (4.7.25)

as the sequential analogues of (4.7.22) and (4.7.23). Denote by K any one of the stopping times given by (4.7.22)-(4.7.25), let k be the corresponding 'sample size' (4.7.20) or (4.7.21), and let H_K be the corresponding interval estimate (4.7.18) or (4.7.19). Then we have the following theorem.

Theorem 4.7.2 (Nádas, 1969). With the above notation, as d → 0 (or p → 0) we have

(i) lim K/k = 1 a.s. (asymptotic optimality),

(ii) lim P(μ ∈ H_K) = 1 − α (asymptotic consistency),

(iii) lim E(K)/k = 1 (asymptotic efficiency).
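The proportional-accuracy rule (4.7.23) can be simulated in the same way as the absolute-accuracy rule. The sketch below is our illustration (not from the book; NumPy assumed), with u_n = u and illustrative parameter values.

```python
# Sketch of the proportional-accuracy rule (4.7.23) (illustration only, not
# from the book): stop when V_n^2 <= p^2 Xbar_n^2 n / u^2, then check that
# |Xbar_K - mu| <= p|mu| with probability about 1 - alpha.
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, p, u = 4.0, 2.0, 0.05, 1.96
rho = sigma / abs(mu)                      # coefficient of variation
m_p = u * u * rho * rho / (p * p)          # the "sample size" (4.7.21)

def run_once():
    s = ss = 0.0
    n = 0
    while True:
        x = rng.normal(mu, sigma)
        n += 1
        s += x
        ss += x * x
        xbar = s / n
        vn2 = (ss - s * s / n + 1.0) / n   # the V_n^2 of (4.7.7)
        if vn2 <= (p * xbar) ** 2 * n / (u * u):
            return n, xbar

res = [run_once() for _ in range(800)]
Ks = np.array([r[0] for r in res])
cover = np.mean([abs(r[1] - mu) <= p * abs(mu) for r in res])
print("m(p):", m_p, " mean K:", Ks.mean(), " coverage:", cover)
```

The mean stopping time should be near m(p) ≈ 384 and the coverage near the nominal 0.95, in line with Theorem 4.7.2.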

4.7.2 Risk-efficient Estimation of the Mean

Let X1, X2, ... be i.i.d. random variables having mean μ and variance σ². We wish to estimate the unknown mean μ by X̄_n, with loss function

L_n = σ^{2δ-2}(X̄_n − μ)² + λn,

where X̄_n = n^{-1} Σ_{i=1}^n X_i, δ > 0, λ > 0 and n denotes the sample size. The risk

R_n = E(L_n) = σ^{2δ}/n + λn

is minimized by taking a sample of size n0 = λ^{-1/2}σ^δ, incurring the risk R_{n0} = 2λ^{1/2}σ^δ. Since σ is unknown, there is no fixed-sample-size procedure that will achieve the minimum risk. With δ = 1, Robbins (1959) proposed to replace σ² by its estimator

V_n² = n^{-1} Σ_{i=1}^n (X_i − X̄_n)²

and define the stopping time by

N′ = inf{ n ≥ m : n ≥ λ^{-1/2} V_n },

where m ≥ 2, and then the point estimate of μ is X̄_{N′}.

Let R_{N′} = E(L_{N′}). The performance of N′ is usually measured by

(i) the risk-efficiency R_{n0}/R_{N′}, and

(ii) the regret R_{N′} − R_{n0}.

For observations drawn from a normal population, with δ = 1 and the loss function above, Robbins (1959) obtained some numerical and Monte Carlo results which suggested the boundedness of the regret. Starr (1966b) has shown that

R_{n0}/R_{N′} → 1 as λ → 0

(i.e., N′ is asymptotically risk-efficient) if and only if m ≥ 3. Starr and Woodroofe (1969) have shown that the regret is bounded if and only if m ≥ 3. Woodroofe (1977) has obtained the second-order asymptotics: when m ≥ 4,

E(N′) = λ^{-1/2}σ + (1/2)σ^{-2}ν − 3/4 + o(1),

R_{N′} = 2λ^{1/2}σ + λ/2 + o(λ),

where ν is a computable constant. The preceding results indicate that the sequential procedure performs well for normal samples. Chow and Yu (1981) ask how good the procedure is in general. Consider the following counterexample. Let P(X_i = 1) = p = 1 − P(X_i = 0), 0 < p < 1, and δ = 1. Then, for m ≥ 2,

E[ (X̄_{N′} − p)² + λN′ ] ≥ ∫_{{X1=1,...,Xm=1}} (X̄_m − p)² dP ≥ (1 − p)²p^m > 0,

since on the event {X1 = 1, ..., Xm = 1} we have V_m = 0 and hence N′ = m; also R_{n0} = 2[λp(1 − p)]^{1/2} → 0 as λ → 0. Hence lim_{λ→0} R_{n0}/R_{N′} = 0, and thus N′ is not asymptotically risk-efficient. To remedy the situation, Chow and Robbins (1965) proposed the stopping time which is a special case of the following with δ = β = 1:

T = inf{ n ≥ n_λ : n^{2/δ} ≥ λ^{-1/δ}(V_n² + n^{-β}) },

where β > 0 and the term n^{-β} is added to V_n².

Chow and Yu (1981) obtain the following theorem, which we will state without proof.

Theorem 4.7.3 (Chow and Yu, 1981). Let X1, X2, ... be i.i.d. random variables with E(X) = μ and var(X) = σ² ∈ (0, ∞). For δ > 0, let a_n and b_n be sequences of constants such that

a_n n^{-2/δ} → 1 and 0 < b_n → 0 as n → ∞.

For λ > 0 and n_λ ≥ 1, define

T = inf{ n ≥ n_λ : a_n ≥ λ^{-1/δ}(V_n² + b_n) }.

Then we have

(i) if n_λ = o(λ^{-1/2}) as λ → 0, then λ^{1/2}T → σ^δ a.s.,

(ii) lim_{λ→0} E(λ^{1/2}T) = σ^δ, and

(iii) if E|X|^{2p} < ∞ for some p > 1 and −k log λ ≤ n_λ = o(λ^{-1/2}) for some k > k_{δ,p}, then as λ → 0,

R_T/R_{n0} = E[ σ^{2δ-2}(X̄_T − μ)² + λT ] / (2λ^{1/2}σ^δ) → 1.

Note that p > 1. In our particular case, we can set b_n = n^{-1} and a_n = n^{2/δ}.

4.8 Estimation of Regression Coefficients

4.8.1 Fixed-Size Confidence Bounds

Gleser (1965) has extended Chow and Robbins' (1965) results to the linear regression problem. Let y1, y2, ... be a sequence of independent observations with

y_i = β′x^{(i)} + ε_i, i = 1, 2, ..., n, (4.8.1)

where β′ is an unknown 1 × p vector, x^{(i)} a known p × 1 column vector, and ε_i a random error having an unknown distribution function F with mean 0 and finite, but unknown, variance σ². We wish to find a region W in p-dimensional Euclidean space such that P(β ∈ W) = 1 − α and such that the interval cut off on the β_i-axis by W has width ≤ 2d, i = 1, 2, ..., p. As has already been noted for p = 1, no fixed-sample-size procedure meeting the requirements exists. Hence we are led to consider sequential procedures.

When σ is Known

Since the least-squares (Gauss-Markov) estimate of β has, component-wise, uniformly minimum variance among all linear unbiased estimates of β, has good asymptotic properties (such as consistency) and performs reasonably well against non-linear unbiased estimates, the least-squares estimate of β would be a natural candidate to use in the construction of our confidence region. It is well known that the least-squares estimate of β is given by

β̂(n) = (X_n X_n′)^{-1} X_n Y_n,

where Y_n′ = (y1, y2, ..., y_n), X_n = (x^{(1)}, x^{(2)}, ..., x^{(n)}) is a p × n (p ≤ n) matrix, and where we assume that X_n is of full rank. [This can usually be achieved in practice; if not, sample until p linearly independent x^{(i)} are found, start with the p corresponding y_i's, and save the remainder for future use in the sequential procedure. Such a procedure does not bias the results and is equivalent to starting after a fixed number of observations n0.] Since the covariance matrix of β̂(n) is σ²(X_n X_n′)^{-1}, construct the confidence region

{β̂(n) − β}′(X_n X_n′){β̂(n) − β} ≤ d²,

which would have probability of coverage equal to P{σ²χ_p² ≤ d²} if F is normal (and asymptotically for any F). To find a confidence interval of width 2d for any one of the β_i, we could [as in Chow and Robbins (1965)] use the interval β̂_i(n) ± d. Also, for any linear combination a′β, a′ : 1 × p, of the β_i, i = 1, 2, ..., p, we could use the confidence interval a′β̂(n) ± d. Now, a region W_n that would be contained in all of these confidence intervals is

W_n = { z : [z − β̂(n)]′[z − β̂(n)] ≤ d² }, (4.8.3)

since for any a such that a′a = 1 and any z ∈ W_n,

{a′[z − β̂(n)]}² ≤ sup_{a′a=1} {a′[z − β̂(n)]}² = [z − β̂(n)]′[z − β̂(n)] ≤ d².

This region can be adapted for the confidence procedure.
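The known-σ ellipsoidal region has exact normal-theory coverage, which is easy to verify numerically. The sketch below is our illustration (not from the book; NumPy assumed, and the design and parameter values are our own choices).

```python
# Quick check of the known-sigma construction (illustration only, not from
# the book): under normal errors, (bhat - beta)'(X_n X_n')(bhat - beta) is
# exactly sigma^2 times a chi-square with p degrees of freedom.
import numpy as np

rng = np.random.default_rng(8)
p, n, sigma = 2, 60, 0.5
beta = np.array([1.0, -2.0])
Xn = rng.normal(size=(p, n))                 # columns are the x^{(i)}
d2 = sigma ** 2 * 5.991                      # 5.991 ~ 0.95-quantile of chi^2_2

cover = 0
reps = 2000
for _ in range(reps):
    y = beta @ Xn + rng.normal(0.0, sigma, size=n)
    bhat = np.linalg.solve(Xn @ Xn.T, Xn @ y)
    q = (bhat - beta) @ (Xn @ Xn.T) @ (bhat - beta)
    cover += q <= d2
print("empirical coverage:", cover / reps)   # nominal 0.95
```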

When σ is Unknown

The least-squares estimate of σ² is

σ̂²(n) = (n − p)^{-1} [Y_n − X_n′β̂(n)]′[Y_n − X_n′β̂(n)]. (4.8.4)

Before presenting the class of sequential procedures C, we shall consider some asymptotic properties of β̂(n) and σ̂²(n), which will be relevant to the discussion of the asymptotic properties of the class C.

Lemma 4.8.1 Let U_n = (X_n X_n′)^{-1/2} X_n = ((U_{n,i,j})). If

(A) max_{i,j} |U_{n,i,j}| → 0 as n → ∞,

then

(i) (X_n X_n′)^{1/2}[β̂(n) − β] → normal(0, σ²I_p) in distribution, and

(ii) σ̂²(n) → σ² a.s. (as n → ∞).

A sufficient condition for Condition (A) to hold is the following set of assumptions:

A1: There exists a p × p positive definite matrix C such that lim_{n→∞} n^{-1}(X_n X_n′) = C;

A2: n^{-1} max_{1≤i≤n} x^{(i)′}x^{(i)} → 0 as n → ∞.

Under these assumptions, we can find the asymptotic probability of coverage of the region W_n.

Lemma 4.8.2 If assumptions A1 and A2 hold, then

P(β ∈ W_n) − P{ T(λ1, λ2, ..., λp) ≤ nd²/σ² } → 0 as n → ∞,

where λ1, λ2, ..., λp are the characteristic roots of C^{-1} and T(λ1, λ2, ..., λp) has the distribution of a weighted sum of p independent chi-square variables with one degree of freedom, the λ_i's being the weights.

Asymptotic Properties of the Class C

Given d and α, and for a fixed sequence of x-vectors x^{(1)}, x^{(2)}, ... arranged so that X_n is of full rank for n ≥ p and so that assumptions A1 and A2 are satisfied, let {u_n} be any sequence of constants converging to the number u satisfying

P{ T(λ1, λ2, ..., λp) ≤ u } = 1 − α. (4.8.5)

Then this sequence {u_n} determines a member of the class C of sequential procedures as follows:

Then, this sequence {Un} determines a member of the class C of sequential pro- cedures as follows: 206 CHAPTER 4. SEQUENTIAL ESTIMATION

Start by taking n0 ≥ p observations y1, y2, ..., y_{n0}. Then sample term by term, stopping at N, where

N = smallest k ≥ n0 such that [σ̂²(k) + k^{-1}]/k ≤ d²/u_k; (4.8.6)

when sampling is stopped at N = n, construct the region W_n described in (4.8.3). The procedures in the class C are then asymptotically "consistent" and "efficient" as d → 0, as stated in Theorem 4.8.1.

Theorem 4.8.1 (Gleser, 1965). Under the assumption that 0 < σ² < ∞,

(i) lim_{d→0} [ d²N/(uσ²) ] = 1 a.s., (4.8.7)

(ii) lim_{d→0} P(β ∈ W_N) = 1 − α, (4.8.8)

and

(iii) lim_{d→0} [ d²E(N)/(uσ²) ] = 1. (4.8.9)

Remark 4.8.1. The addition of k^{-1} to σ̂²(k) in (4.8.6) is unnecessary if F is continuous. N could be defined as the smallest odd, even, etc., integer ≥ n0 such that (4.8.6) holds, and Theorem 4.8.1 will still be valid. Very little is known about the properties of any member of the class C for moderate values of σ²/d².
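The class-C rule can be simulated in a simple design. The sketch below is our illustration (not from the book; NumPy assumed): we draw i.i.d. standard-normal covariates, so n^{-1} X_n X_n′ → I_p, the roots λ_i are all 1, T is an ordinary χ²_p, and u solves P(χ²_p ≤ u) = 1 − α.

```python
# Simulation sketch of the class-C rule (4.8.6) (illustration only, not from
# the book): stop at the first k >= n0 with [sigma2hat(k) + 1/k]/k <= d^2/u,
# then check the spherical region W_N of (4.8.3).
import numpy as np

rng = np.random.default_rng(9)
p, sigma, d, u = 2, 1.0, 0.25, 5.991     # 5.991 ~ 0.95-quantile of chi^2_2
beta = np.array([0.5, 1.5])

def run_once(n0=10):
    xs, ys = [], []
    k = 0
    while True:
        k += 1
        xk = rng.standard_normal(p)
        xs.append(xk)
        ys.append(float(beta @ xk + rng.normal(0.0, sigma)))
        if k >= n0:
            X = np.array(xs).T           # p x k design matrix
            y = np.array(ys)
            bhat = np.linalg.solve(X @ X.T, X @ y)
            resid = y - bhat @ X
            s2 = resid @ resid / (k - p) # sigma2hat(k) of (4.8.4)
            if (s2 + 1.0 / k) / k <= d * d / u:
                inside = float((bhat - beta) @ (bhat - beta) <= d * d)
                return k, inside

res = [run_once() for _ in range(300)]
Ns = np.array([r[0] for r in res])
cover = np.mean([r[1] for r in res])
print("u sigma^2/d^2 =", u / (d * d), " mean N:", Ns.mean(), " coverage:", cover)
```

The mean stopping time should be near uσ²/d² ≈ 96 and the coverage near the nominal 0.95, in line with Theorem 4.8.1.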

Gleser's assumptions (A1) and (A2) have been found to be strong, and they have been weakened by Albert (1966) and Srivastava (1967, 1971). Also, the latter authors obtain spherical confidence regions for the regression parameters. Consider the following example.

Example4.8.1 Let E=a+pi+~i,i=1,2)..., nwhere~iarei.i.d. having mean 0 and variance 02, so, p = 2 and let

Hence, n n (n 1) /2 XLXn = + [ n(n+1)/2 n(n+1)(2n+1)/6 1 It is clear that n-l (X,X’,) does not tend to a positive definite matrix as n --+ 00. Thus Assumption (Al) is not satisfied. The characteristic roots of XnX’, are given by 4.8. ESTIMATION OF REGRESSION COEFFICIENTS 207 where

Hence one of the assumptions of Albert (1966) is not satisfied (the limit should go to unity). Srivastava (1971) has given alternative sufficient conditions that are weaker than those of Albert (1966) and Gleser (1965). It is known (see, for example, Roy, 1957) that X_n can be written as X_n = T_n L_n, where T_n is a p × p triangular matrix with positive diagonal elements (hence unique) and L_n is a p × n semi-orthogonal matrix; L_n L_n′ = I_p, where I_p is the p × p identity matrix. Hence T_n^{-1} X_n = L_n = [l_(1)(n), ..., l_(p)(n)]′ = ((l_ij(n))). Let λ_n = λ_min(X_n X_n′). Srivastava (1971) has shown that the basic result of Gleser (namely Theorem 4.8.1) holds under the weaker conditions:

where n* = [n(1 + c)] + 1, [·] denotes here the largest integer in (·), and ||B|| = [λ_max(B)]^{1/2}.

Example 4.8.2 Let

This satisfies the above conditions (i)-(iii).

Srivastava (1971) gives the following sequential rule for constructing confidence regions whose limiting coverage probability is 1 − α and whose maximum diameter is at most 2d.

Procedure. Start by taking n_0 ≥ p observations y_1, y_2, ..., y_{n_0}. Then sample one observation at a time and stop when

N = smallest k ≥ n_0 such that [S²(k) + k^{-1}] ≤ d²λ_k/c_k,

where λ_k is the smallest characteristic root of (X_k X_k′), P(χ²_p < u) = 1 − α, and c_k → u. When sampling is stopped at N = n, construct the region W_n defined above.

4.9 Confidence Intervals for P(X < Y)

In reliability problems the parameter p, the probability that one random variable is stochastically larger than the other, is of much interest. A sequential confidence interval for p = P(X < Y) has been considered by Govindarajulu (1974). Let (X, Y) have a bivariate normal distribution with an unknown mean vector and an unknown covariance matrix. Assume that we observe pairs of observations (X_i, Y_i), i = 1, 2, .... Let D_i = Y_i − X_i, D̄_n = Ȳ_n − X̄_n and s²_{D,n} = (n − 1)^{-1} Σ_{i=1}^n (D_i − D̄_n)², where Ȳ_n and X̄_n denote means of samples of size n. Then it is well known that p = Φ(μ_D/σ_D), where μ_D = E(Y − X) and σ²_D denotes the variance of Y − X. Whatever be the covariance structure of X and Y, a reasonable estimate of p is p̂_n = Φ(D̄_n/s_{D,n}). Then we have the following result. For the sake of simplicity, we shall suppress all the subscripts in p, σ, s and p̂.

Theorem 4.9.1 Let a² = 1 + μ²/(2σ²) and â² = 1 + D̄²_n/(2s²_{D,n}). Then we have

where the subscript n in D̄_n is suppressed for the sake of simplicity.

Proof. Let p̂* = Φ(D̄/σ). First we will show that

√n(p̂* − p) →_d normal(0, φ²(μ/σ)) as n → ∞.

Toward this, write p̂* − p = σ^{-1}(D̄ − μ)φ(t/σ), where t lies between D̄ and μ. However, since D̄ converges to μ in probability (in fact a.s.), φ(t/σ) converges to φ(μ/σ) in probability. Thus √n(p̂* − p)/φ(μ/σ) is asymptotically equivalent in distribution to √n(D̄ − μ)/σ. Now write

where

where w lies between D̄/s and D̄/σ, and φ(w) converges to φ(μ/σ) in probability, since D̄ converges to μ and s converges to σ in probability.

Now, using the independence of D̄ and s, we have √n(p̂ − p)/{aφ(μ/σ)} →_d normal(0, a^{-2}[1 + μ²/(2σ²)]) = normal(0, 1). The proof of the theorem is complete upon noting that â converges to a and φ(D̄/s) converges to φ(μ/σ) in probability. ∎
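The limiting variance in this proof can be checked numerically. The following Monte Carlo sketch is an illustration, not part of the book; the parameter values μ = 0.5, σ = 1, n = 400 and the function names are arbitrary choices. It compares the empirical variance of √n(p̂_n − p) with a²φ²(μ/σ), where a² = 1 + μ²/(2σ²):

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(7)
mu, sigma, n, reps = 0.5, 1.0, 400, 2000
p = Phi(mu / sigma)

vals = []
for _ in range(reps):
    d = [random.gauss(mu, sigma) for _ in range(n)]
    dbar = sum(d) / n
    s2 = sum((v - dbar) ** 2 for v in d) / (n - 1)
    vals.append(math.sqrt(n) * (Phi(dbar / math.sqrt(s2)) - p))

emp_var = sum(v * v for v in vals) / reps - (sum(vals) / reps) ** 2
theory = (1.0 + mu ** 2 / (2.0 * sigma ** 2)) * phi(mu / sigma) ** 2
```

Here `theory` is a²φ²(μ/σ) ≈ 0.14, and the empirical variance should land close to it for moderate n.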

Suppose we wish to set up a confidence interval of prescribed coverage probability 1 − α for the unknown parameter p. Define, for n ≥ 2, I_n = (p̂ − d, p̂ + d) and assume that d is small compared to φ(μ/σ). Then, for

n = smallest integer ≥ u²_α a² φ²(μ/σ)/d², (4.9.1)

the interval I_n has coverage probability

as d → 0, after applying Theorem 4.9.1. Notice that n → ∞ as d → 0 and (4.9.1) implies that lim_{d→0} {d²n/[u²_α a² φ²(μ/σ)]} = 1. However, since μ and σ, and consequently φ(μ/σ), are unknown, no optimal fixed-sample size procedure exists. An inefficient fixed-sample size procedure can be obtained by replacing φ(μ/σ) in (4.9.1) by φ(0) = (2π)^{-1/2}, which was proposed by Govindarajulu (1967).

Sequential Procedure

Let {u_k} be a sequence of constants tending to u_α. Then stop at

N = smallest n ≥ 2 for which n ≥ u²_n â²_n φ²(D̄_n/s_n)/d², (4.9.2)

and then give the confidence interval I_N = (p̂_N − d, p̂_N + d), where p̂_N = Φ(D̄_N/s_N). It is worthwhile to note that N is a genuine stopping variable; that is,

P(N < ∞) = 1, since

P(N = ∞) = lim_{n→∞} P(N > n)

= lim_{n→∞} P(d²n/u²_n < â²_n φ²(D̄_n/s_n)) = 0.
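The rule is easy to simulate. The sketch below is illustrative only (it is not from the book): it freezes u_n at u_α ≈ 1.96 for α = 0.05, draws the differences D_i = Y_i − X_i directly, and stops at the first n ≥ 2 with n ≥ u²_n â²_n φ²(D̄_n/s_n)/d²:

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sequential_p_interval(sample_d, d=0.05, u_alpha=1.959963984540054):
    """Stop at the first n >= 2 with n >= u^2 a_n^2 phi^2(Dbar/s)/d^2,
    where a_n^2 = 1 + Dbar^2/(2 s^2); report (p_hat - d, p_hat + d)."""
    data = [sample_d(), sample_d()]
    while True:
        n = len(data)
        dbar = sum(data) / n
        s2 = sum((v - dbar) ** 2 for v in data) / (n - 1)
        s = math.sqrt(s2)
        a2 = 1.0 + dbar ** 2 / (2.0 * s2)
        if n >= u_alpha ** 2 * a2 * phi(dbar / s) ** 2 / d ** 2:
            p_hat = Phi(dbar / s)
            return n, (p_hat - d, p_hat + d)
        data.append(sample_d())

random.seed(1)
# Differences D = Y - X ~ normal(0.5, 1), so p = Phi(0.5) (illustrative)
N, (lo, hi) = sequential_p_interval(lambda: random.gauss(0.5, 1.0))
```

The returned interval always has the prescribed width 2d; only its location and the sample size N are random.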

Regarding the properties of the above sequential procedure, we have the following theorem.

Theorem 4.9.2 Under the assumption that σ² > 0 (which implies that φ(μ/σ) > 0), we have

Proof. In Lemma 4.7.1 set Y_n = [â φ(D̄/s)/{a φ(μ/σ)}]², f(n) = nu²_α/u²_n and t = u²_α a² φ²(μ/σ)/d². Since lim_{t→∞} f(N)/t = 1 a.s. and √N(p̂_N − p)/{a φ(μ/σ)} behaves like a standard normal variable as t → ∞, (i) and (ii) follow. If we define N* = smallest n such that n ≥ u²_n â²_n φ²(0)/d², then N* ≥ N. Now, using Nadas' theorem (Theorem 4.7.2, for the stopping time N*), (iii) and (iv) follow. ∎

4.10 Nonparametric Confidence Intervals

4.10.1 Confidence Intervals for the p-point of a Distribution Function

Farrell (1966a, 1966b) has given two sequential procedures for setting up bounded-width confidence intervals for the p-point of a distribution function that are based on the i.i.d. sequence of random variables {X_n, n ≥ 1}. For further details, the reader is referred to Govindarajulu (1987, Section 5.11.1).

4.10.2 Confidence Intervals for Other Population Parameters

Geertsema (1970a) has applied the methods of Section 4.7 to construct a sequential nonparametric confidence interval procedure for certain population parameters. Notice that the methods of Section 4.6 will not apply here since the functional form of the density is unknown.

A General Method for Constructing Bounded-Length Confidence Intervals

Let X_1, X_2, ..., X_n be a fixed random sample of size n from a population having F for its cumulative distribution function (cdf) and let θ be a parameter of the population. We are interested in constructing, for θ, a confidence interval of length not larger than 2d. For each positive integer n, consider two statistics L_n and U_n (not depending on d) based on the first n observations, such that L_n < U_n a.s. and lim_{n→∞} P(L_n ≤ θ ≤ U_n) = 1 − α (so that, for n large, (L_n, U_n) is a confidence interval for θ with coverage probability approximately 1 − α). Define a stopping variable N to be the first integer n ≥ n_0 such that U_n − L_n ≤ 2d, where n_0 is a positive integer. Take as confidence interval (L_N, U_N). Then one could ask:

(i) What is the coverage probability of the procedure? (ii) What is the expected sample size?

These questions can, under the following regularity assumptions, be answered asymptotically as d → 0.

A1: L_n < U_n a.s. (L_n and U_n are independent of d).

A2: √n(U_n − L_n) → 2K_α/A a.s. as n → ∞, where A > 0 and Φ(K_α) = 1 − α/2, Φ denoting the standard normal cdf.

A3: √n(L_n − θ) = Z_n/A − K_α/A + o(1) a.s. as n → ∞, where Z_n is a standardized average of i.i.d. random variables having finite second moment.

A4: The set {Nd²}_{d>0} is uniformly integrable.

Then we have the following result.

Theorem 4.10.1 (Geertsema, 1970a). Under the assumptions (Al)-(A4)

(i) N is well-defined, E(N) < ∞ for all d > 0, N (= N(d)) is a function of d which is nondecreasing as d decreases, lim_{d→0} N = ∞ a.s., and lim_{d→0} E(N) = ∞.

A Procedure Based on the Sign Test

Let X_1, X_2, ..., X_n be i.i.d. random variables having a unique median γ. For testing the hypothesis γ = 0, one uses the sign test based on the statistic

Σ_{i=1}^{n} I(X_i > 0),

where I(B) denotes the indicator function of the set B. In the case of a fixed sample of size n, a confidence interval for γ can be derived from the sign test in a standard way. This confidence interval is of the form (X_{n,b(n)}, X_{n,a(n)}), where X_{n,1} ≤ X_{n,2} ≤ ... ≤ X_{n,n} are the ordered X's and where a(n) and b(n) are integers depending on n. The limiting coverage probability as n → ∞ of such a confidence interval is 1 − α if

a(n) ~ n/2 + K_α√n/2 and b(n) ~ n/2 − K_α√n/2. (4.10.1)

From this confidence interval one can thus obtain a sequential procedure as follows: Let N be the first integer n ≥ n_0 for which X_{n,a(n)} − X_{n,b(n)} ≤ 2d, and choose as resulting confidence interval (X_{N,b(N)}, X_{N,a(N)}), where {a(n)} and {b(n)} are sequences of positive integers satisfying Assumption (A5), which is given below, and n_0 is some integer. This procedure is similar to Farrell's (1966b) procedure discussed in Govindarajulu (1987, Section 5.11). The following assumption will be needed.

A5: Let X_1, X_2, ... be a sequence of i.i.d. random variables with common cdf F(x − γ), where F(x) is symmetric about zero. F has two derivatives in a neighborhood of zero and the second derivative is bounded in the neighborhood, so that γ is the unique median of the X's. The sequences {a_n} and {b_n} are defined by b_n = max[1, [(n/2) − K_α n^{1/2}/2]], a_n = n − b_n + 1, where [x] denotes the largest integer contained in x.
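As a concrete illustration, the sign-test rule is easy to simulate. In the sketch below (not from the book; K_α is fixed at 1.96, and n_0 = 10 and d = 0.2 are arbitrary choices), sampling stops as soon as the interval (X_{n,b(n)}, X_{n,a(n)}) is no wider than 2d:

```python
import math
import random

def sign_test_interval(xs, k_alpha=1.959963984540054):
    """Fixed-sample interval for the median from the sign test:
    b(n) = max(1, [n/2 - K_a sqrt(n)/2]), a(n) = n - b(n) + 1 (1-based)."""
    n = len(xs)
    b = max(1, int(n / 2.0 - k_alpha * math.sqrt(n) / 2.0))
    a = n - b + 1
    xs = sorted(xs)
    return xs[b - 1], xs[a - 1]

def sequential_median_interval(draw, d=0.2, n0=10):
    """Stop at the first n >= n0 with U_n - L_n <= 2d."""
    data = [draw() for _ in range(n0)]
    while True:
        lo, hi = sign_test_interval(data)
        if hi - lo <= 2 * d:
            return len(data), (lo, hi)
        data.append(draw())

random.seed(2)
# standard normal population, true median 0 (illustrative)
N, (lo, hi) = sequential_median_interval(lambda: random.gauss(0.0, 1.0))
```

For the standard normal, Theorem 4.10.2 below predicts E(N) ≈ K²_α/(4f²(0)d²), about 150 observations for these choices.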

We shall now show that the above procedure satisfies (A1)-(A4). Without loss of generality we can assume that γ = 0. Strong use is made of the following representation of sample quantiles due to Bahadur (1966). Under Assumption (A5),

X_{n,k(n)} = ξ + [k(n)/n − F_n(ξ)]/f(ξ) + O(n^{-3/4} ln n) a.s., (4.10.2)

where {k(n)} is a sequence of positive integers satisfying k(n) = np + o(√n ln n), 0 < p < 1, F(ξ) = p, F′(ξ) = f(ξ), and F_n is the empirical cdf of the X's. Then we have the following lemma.

Lemma 4.10.1 (Geertsema, 1970a). We have

(iii) The set {Nd2}d,o is uniformly integrable. 4.10. NONPARAMETRIC CONFIDENCE INTERVALS 213

Proof. (i) It readily follows from (4.10.2) that

(ii) also readily follows from (4.10.2). For the proof of (iii), see Geertsema (1970a); it consists of using a result of Yahav and Bickel (1968) and Hoeffding (1963, Theorem 1). ∎

The following theorem is a direct consequence of Theorem 4.10.1 and Lemma 4.10.1.

Theorem 4.10.2 (Geertsema, 1970a). The confidence interval procedure based on the sign test has asymptotic coverage probability 1 − α as d → 0. The stopping variable N satisfies lim_{d→0} E(Nd²) = K²_α/4f²(0).

Remark 4.10.1 Unlike in Theorem 4.7.1, no assumption of finiteness of the second moment of F is made in Theorems 4.10.1 and 4.10.2.

A Procedure Based on the Wilcoxon One-Sample Test

The Wilcoxon one-sample test procedure is based on the statistic

Σ_{1≤i≤j≤n} I((X_i + X_j)/2 > 0),

where X_1, X_2, ..., X_n is a random sample from a distribution symmetric about γ. The test procedure is used to test the hypothesis γ = 0 against shift alternatives. A confidence interval for γ based on a fixed sample of size n is of the form (Z_{n,b(n)}, Z_{n,a(n)}), where Z_{n,1} ≤ Z_{n,2} ≤ ... ≤ Z_{n,n(n+1)/2} are the ordered averages (X_i + X_j)/2 for i, j = 1, 2, ..., n and i ≤ j. The limiting coverage probability of such an interval is 1 − α if

(4.10.3)

The Sequential Procedure

Let N be the first integer n ≥ n_0 for which Z_{n,a(n)} − Z_{n,b(n)} ≤ 2d, and choose as resulting confidence interval (Z_{N,b(N)}, Z_{N,a(N)}). Here {a(n)} and {b(n)} are sequences of positive integers satisfying (4.10.3) and n_0 is some positive integer.

The asymptotic analysis of this procedure is somewhat complicated because it is based on ordered dependent random variables, namely the Z_{n,k}, k = 1, 2, ..., n(n+1)/2. Fortunately, the theory of U-statistics [see Hoeffding, 1948, 1963] can be applied. The statistic

C(n, 2)^{-1} Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} I((X_i + X_j)/2 > 0) (4.10.4)

is a one-sample U-statistic and the test based on it is asymptotically equivalent to the Wilcoxon one-sample test.

The Modified Sequential Procedure

Let N be the first integer n ≥ n_0 for which W_{n,a(n)} − W_{n,b(n)} ≤ 2d, and choose as resulting confidence interval (W_{N,b(N)}, W_{N,a(N)}). Here {a(n)} and {b(n)} are sequences of positive integers satisfying (4.10.3), n_0 is some positive integer, and W_{n,1} ≤ W_{n,2} ≤ ... ≤ W_{n,n(n−1)/2} are the ordered averages (X_i + X_j)/2 for i < j, i, j = 1, 2, ..., n. So let us confine ourselves to the modified sequential procedure. We need the following assumption.
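The modified procedure can be sketched in the same way; the Walsh averages (X_i + X_j)/2, i < j, are recomputed at each stage. Everything below is illustrative: the index sequences use the standard large-sample Wilcoxon indices m/2 ± K_α√{n(n+1)(2n+1)/24}, m = n(n−1)/2, as a stand-in for the sequences in (4.10.3), and n_0 = 10 and d = 0.2 are arbitrary.

```python
import math
import random

def walsh_averages(xs):
    """All averages (x_i + x_j)/2 for i < j, sorted; these are the W_{n,k}."""
    n = len(xs)
    w = [(xs[i] + xs[j]) / 2.0 for i in range(n) for j in range(i + 1, n)]
    w.sort()
    return w

def wilcoxon_interval(xs, k_alpha=1.959963984540054):
    """Interval (W_{n,b(n)}, W_{n,a(n)}) from the one-sample Wilcoxon test,
    with a large-sample choice of a(n), b(n) around m/2."""
    n = len(xs)
    m = n * (n - 1) // 2
    w = walsh_averages(xs)
    half = k_alpha * math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    b = max(1, int(m / 2.0 - half))
    a = min(m, int(m / 2.0 + half) + 1)
    return w[b - 1], w[a - 1]

def sequential_wilcoxon(draw, d=0.2, n0=10):
    """Stop at the first n >= n0 with W_{n,a(n)} - W_{n,b(n)} <= 2d."""
    data = [draw() for _ in range(n0)]
    while True:
        lo, hi = wilcoxon_interval(data)
        if hi - lo <= 2 * d:
            return len(data), (lo, hi)
        data.append(draw())

random.seed(3)
N, (lo, hi) = sequential_wilcoxon(lambda: random.gauss(0.0, 1.0))
```

By Theorem 4.10.3 below, for the standard normal the expected sample size here is roughly K²_α/{12(∫f²)²d²}, near 100 for these choices.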

A6: Let X_1, X_2, ... be a sequence of i.i.d. random variables having common cdf F(x − γ), where F is symmetric about zero. F has density f which satisfies ∫f²(x)dx < ∞. G(x − γ) denotes the cdf of (X_1 + X_2)/2, and G has a second derivative in some neighborhood of zero with G″ bounded in the neighborhood. G′ is denoted by g when it exists. {a(n)} and {b(n)} are sequences of positive integers defined by

The following facts can easily be established.

(i) The assumptions on F guarantee the existence of a derivative for G.

(ii) If f has a Radon-Nikodym derivative f′ satisfying ∫|f′| < ∞ and ∫(f′)² < ∞, then the assumptions on G are satisfied.

(iii) Assumption (A6) implies that G′(0) > 0, since G′(0) = 2∫f²(x)dx.

Without loss of generality we can set γ = 0. We also have the following result.

Theorem 4.10.3 (Geertsema, 1970a). Both confidence interval procedures based on the Wilcoxon one-sample test have asymptotic coverage probability 1 − α as d → 0. The stopping variable N satisfies:

lim_{d→0} E(Nd²) = K²_α / {12 [∫f²(x)dx]²}.

Asymptotic Efficiencies of the Procedures

Consider two bounded-length confidence interval procedures T and S for estimating the mean of a symmetric population by means of an interval of prescribed length 2d. Denote by N_T and N_S the stopping variables and by p_T and p_S the coverage probabilities associated with the procedures T and S, respectively.

Definition 4.10.1 The asymptotic efficiency as d → 0 of procedure T relative to S is e(T, S) = lim_{d→0} E(N_S)/E(N_T), provided lim_{d→0} p_T = lim_{d→0} p_S and all the limits exist.

Denote by M the procedure of Chow and Robbins (1965) [see Equation (4.7.8)] and by S and W the procedures based on the sign test and the Wilcoxon test, respectively. Then it follows from Theorems 4.7.1, 4.10.2 and 4.10.3 (under Assumptions (A5) and (A6), and σ² < ∞) that

e(S, M) = 4σ²f²(0), (4.10.5)

e(W, M) = 12σ² [∫f²(x)dx]².

If one regards the procedures M, S and W as based on the t-test, the sign test and the Wilcoxon one-sample test, respectively, one sees that the above efficiencies are the same as the Pitman efficiencies of the respective (fixed-sample size) tests relative to each other.
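For the standard normal population these efficiencies reduce to the familiar Pitman constants: f(0) = (2π)^{-1/2} and ∫f²(x)dx = (2√π)^{-1}, so e(S, M) = 2/π ≈ 0.637 and e(W, M) = 3/π ≈ 0.955. A quick numerical check:

```python
import math

# Efficiencies of the sequential procedures for a standard normal
# population (sigma^2 = 1)
sigma2 = 1.0
f0 = 1.0 / math.sqrt(2.0 * math.pi)         # f(0) for the normal density
int_f2 = 1.0 / (2.0 * math.sqrt(math.pi))   # integral of f(x)^2 dx

e_SM = 4.0 * sigma2 * f0 ** 2               # sign test vs. Chow-Robbins
e_WM = 12.0 * sigma2 * int_f2 ** 2          # Wilcoxon vs. Chow-Robbins
```

So, for normal data, the Wilcoxon-based sequential interval costs only about 5% more observations than the t-based one, while the sign-test interval costs about 57% more.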

Monte Carlo Studies

Geertsema (1970b) has performed a series of Monte Carlo studies for different values of d and for a few symmetric populations (normal, contaminated normal, uniform, and double exponential) to compare the behavior of the procedures with the asymptotic results. He surmises that the actual coverage probability is quite close to the asymptotic coverage probability, and that the coverage of the two procedures based on rank tests is higher than that of the procedure based on the t-test. The results also suggest that E(N) ≤ K²_α/{3[G(d) − 1/2]²} + C in the case of the W procedure, where C is a constant. The results illustrate the upper bound E(N) ≤ K²_α σ²/d² + n_0 for the procedure based on the t-test [see Simons, 1968].

4.10.3 Fixed-width Confidence Intervals for P(X < Y)

If σ² is known, and if d is small compared to σ², the fixed-sample size procedure is given as follows: For any n ≥ 1 define I_n = (p̂_n − d, p̂_n + d) and let K_α denote the (1 − α/2)-fractile of the standard normal distribution. Then, for a sample of size n determined by

n = smallest integer ≥ K²_α σ²/d², (4.10.7)

the interval I_n has coverage probability

and (4.10.7) implies that lim_{d→0} (d²n)/(K²_α σ²) = 1. It follows from the asymptotic normality of p̂_n [see Theorem 2.1 of Govindarajulu (1968b)], which could easily be extended to the situation where F and G are purely discrete (since F and G can be made continuous by the continuization process described by Govindarajulu, LeCam and Raghavachari (1967)), that lim_{d→0} P(p ∈ I_n) = 1 − α. However, in real situations F and G, and consequently σ², are unknown, so that no fixed-sample size method is available.

Sequential Fixed-width Procedure

Let

(4.10.8)

and let {K_n} be any sequence of positive constants tending to K_α as n → ∞. Then define

N = smallest n ≥ 1 such that V_n ≤ d²n/K²_n. (4.10.9)

Thus, we have the following theorem establishing the asymptotic properties of the above procedure.

Theorem 4.10.4 If 0 < σ² < ∞, we have

(i) lim_{d→0} (d²N)/(K²_α σ²) = 1 a.s. (asymptotic optimality), (4.10.10)

(ii) lim_{d→0} P(p ∈ I_N) = 1 − α (asymptotic consistency), (4.10.11)

(iii) lim_{d→0} (d²E(N))/(K²_α σ²) = 1 (asymptotic efficiency). (4.10.12)

Proof. In Lemma 4.7.1 set Y_n = V_n/σ², f(n) = nK²_α/K²_n and t = K²_α σ²/d². Then (4.10.9) can be rewritten as N = N(t) = smallest k ≥ 1 such that Y_k ≤ f(k)/t. By Lemma 4.10.1,

1 = lim_{t→∞} f(N)/t = lim_{d→0} d²N/(K²_α σ²) a.s., (4.10.13)

which proves (4.10.10). Now

By (4.10.13), d√N/σ → K_α and N/t → 1 in probability as t → ∞. Writing (p̂_n − p) = ∫(F_n − F)dG − ∫(G_n − G)dF + R_n, where R_n = ∫(F_n − F)d(G_n − G), and using Kolmogorov's inequality for reverse martingales, it can be shown that √N R_N = o_p(1).⁴ Hence, from Theorem 4.5.4 we infer that as t → ∞, √N(p̂_N − p)/σ behaves like a standard normal variable. Hence

⁴I thank Dr. Paul Janssen of Limburgs Universitair Centrum for pointing this out to me.

which establishes (4.10.11). Now the property of asymptotic efficiency (4.10.12) follows immediately from Lemma 4.7.2 whenever the distributions of the X_i and Y_i are such that

However, this is trivially true since V_n ≤ 2 and σ² > 0. This completes the proof of Theorem 4.10.4. ∎

It also follows from Theorem 4.10.2 that

E(N) − K²_α σ²/d² = O(1).

Case 2. Let X and Y have an unknown bivariate distribution. Let Z = X − Y and let H(z) denote the distribution of Z. Then Z_i = X_i − Y_i (i = 1, 2, ..., n) constitute a random sample of size n from H. Also, let H_n(z) denote the empirical distribution function based on Z_1, Z_2, ..., Z_n. Let p = H(0) and p̂_n = H_n(0). Then it is well known from the asymptotic normality of the binomial variable and Cramér's (1946, p. 254) theorem that

Then, a fixed-width confidence interval procedure is as follows: The stopping variable N is given by

N = smallest integer n ≥ 2 such that H_n(0)[1 − H_n(0)] ≤ d²n/K²_n, (4.10.14)

and we give the confidence interval I_N = (H_N(0) − d, H_N(0) + d). Then the asymptotic properties of optimality, consistency and efficiency also hold for the above sequential procedure.
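Rule (4.10.14) is simple to simulate. The sketch below is illustrative only: K_n is frozen at K_α ≈ 1.96, and a starting size n_0 = 10 is imposed as a safeguard against a degenerate early stop when H_n(0)[1 − H_n(0)] = 0 for very small n.

```python
import random

def sequential_pxy(draw_pair, d=0.05, n0=10, k_alpha=1.959963984540054):
    """Stop at the first n >= n0 with H_n(0)[1 - H_n(0)] <= d^2 n / K^2,
    where H_n is the empirical cdf of Z_i = X_i - Y_i; report
    (H_N(0) - d, H_N(0) + d)."""
    z = [x - y for (x, y) in (draw_pair(), draw_pair())]
    while True:
        n = len(z)
        h0 = sum(1 for v in z if v <= 0) / n   # H_n(0): proportion of Z_i <= 0
        if n >= n0 and h0 * (1.0 - h0) <= d * d * n / (k_alpha * k_alpha):
            return n, (h0 - d, h0 + d)
        x, y = draw_pair()
        z.append(x - y)

random.seed(4)
# X ~ N(0,1), Y ~ N(0.5,1) independent; p = P(X < Y) = Phi(0.5/sqrt(2))
N, (lo, hi) = sequential_pxy(
    lambda: (random.gauss(0.0, 1.0), random.gauss(0.5, 1.0)))
```

Since h0(1 − h0) is at most 1/4, the rule certainly stops by the time n reaches K²_α/(4d²), here about 385.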

Asymptotic Relative Efficiencies (ARE)

We would like to compare the nonparametric procedures for estimating P(X < Y) with the parametric competitors discussed in Section 4.9, assuming that they have the same prescribed width and coverage probability. Thus the asymptotic efficiency of the procedure given by (4.10.9) relative to (4.9.2) (given by the ratio of the reciprocals of the expected sample sizes) is

(4.10.15)

where a² = 1 + (μ²/2σ²), μ = E(Y − X), σ² = var(Y − X) = var(X) + var(Y), and σ*² is given by (4.10.6). The asymptotic efficiency of the procedure in (4.10.14) relative to (4.9.2) is

Weiss and Wolfowitz (1972) give an optimal fixed-length confidence interval for the location parameter of an unknown distribution. They also compare its performance with the means procedure of Chow and Robbins (1965) and the scores procedure of Sen and Ghosh (1971). Sproule (1969) has extended the results of Chow and Robbins (1965) to the class of U-statistics.

4.11 Problems

4.2-1 In the binomial problem, assume that at least two observations are taken. Write 0** for the point (2, 1). Estimate θ(1 − θ) unbiasedly. [Hint: we have P(0**) = 2θ(1 − θ); let Y = (1/2)P(0**|N, T′) = T**(n, t′)/{2T(n, t′)}, where T**(n, t) denotes the number of paths from 0** to (n, t). Then Y is an unbiased estimate of θ(1 − θ).]

4.2-2 In the binomial case, check whether removal of (i) 0**, (ii) (2, 0) destroys closure. Find all unbiased estimates of θ which depend on (N, T′) only. Among them, there is only one bounded estimate.

4.2-3 Verify that the sequence of sufficient statistics for the exponential family of densities is transitive.

4.2-4 If X is normal (θ, σ²), show that {X̄_n, s_n} is transitive.

4.3-1 Show that the exponential family of densities satisfies the regularity conditions of Theorem 4.3.2 on f(x; θ).

4.4-1 Is it possible to have a two-stage procedure for estimating the parameter of the exponential density where (i) f(x; θ) = exp{−(x − θ)}, x ≥ θ, (ii) f(x; θ) = θ^{-1} exp(−x/θ), x > 0?

4.4-2 (Two-stage procedure for the binomial parameter). Let X_1, X_2, ... be i.i.d. Bernoulli variables with P(X_1 = 1) = p and P(X_1 = 0) = 1 − p for some 0 < p < 1. Let T be an unbiased estimator for p such that var(T) ≤ B for all p and specified B. For each integer m, let T_m = Σ_{i=1}^m X_i/m and C = (4m)^{-1}. Consider the estimator T̃_m where T̃_m = (B/C)T_m + (1 − BC^{-1})T_{m̃}, where m̃ = (1 − BC^{-1})N_m and N_m satisfies

B* and --- B* -B B*+C-C’

Show that T̃_m is an unbiased estimator of p with var(T̃_m) ≤ B for all p. [Hint: See Samuel (1966, pp. 222-223) and note that the expected saving in observations is (B/C)E(N_m) when compared with the estimator T_{m̃}, which is based on the second sample above, is also unbiased, and has variance bounded by B.]

4.4-3 (Two-stage procedure for the Poisson parameter). Let

P(X = x) = exp(−θ)θ^x/x!, x = 0, 1, 2, ... (θ > 0).

Let T be an unbiased estimator for θ such that var(T) ≤ B for all θ, where B is a specified bound. Let S_m = X_1 + X_2 + ... + X_m and N_m = (1 + S_m)/(mB). Define g(S_m) = 1 if S_m = 0, and g(S_m) = S_m otherwise. Take N_m − m additional observations and consider

(i) Show that E_θ(Z_m) = θ and var(Z_m) = B + B[h(λ) − (ρ − 1)e^{-λ}]/ρ, where λ = mθ, m = (ρB)^{-1/2}, (ρ ≥ 1). (ii) Assuming that sup_λ h(λ) = 0.1249, make the necessary modifications in the choices of m, N_m and Z_m so that the estimator has the desired properties. Also compare the expected sample sizes of this estimator and the estimator of Birnbaum and Healy (1960). [Hint: See Samuel (1966, pp. 225-226).]

4.4-4 (Two-stage procedure for estimating the variance of a normal distribution). Let X_1, X_2, ... denote a sequence of i.i.d. variables distributed normal(μ, σ²), where μ and σ² are unknown. Set up a two-stage procedure for given d and α, where 2d denotes the fixed width and 1 − α the confidence coefficient of the interval estimate for σ². [Hint: Let s²_{n_0} denote the unbiased version of the sample variance based on a preliminary sample of size n_0 taken from the normal density. It is desired to determine n, on the basis of the preliminary sample, such that

P(|s²_{n+1} − σ²| ≤ d) ≥ 1 − α,

where

and X_1, X_2, ..., X_{n+1} is an additional random sample of size n + 1. The above probability statement is equivalent to

where E denotes expectation with respect to n, a = d/σ², V = s²_{n+1}/σ², and f_1(·|n) is the density of a chi-square variable divided by n, its degrees of freedom. Connell and Graybill (1964) have shown that

Hence, if a were known, we would set n = 1 + π(ln α)²/a². Since a is unknown, let n = 1 + π(ln α)² k² s⁴_{n_0}/d², where k is some constant independent of a such that the coverage probability is at least 1 − α; this determines the value of k, given by

k = (n_0 − 1)[(1/α)^{2/(n_0−1)} − 1] / {2 ln(1/α)}.

For further details, see Graybill and Connell (1964a) or Govindarajulu (1987, pp. 375-377).]

4.4-5 (Two-stage procedure for estimating the parameter of a uniform density). Let f(u) = 1/θ for 0 < u < θ. Determine the sample size n, based on a preliminary sample of size n_0 (specified) from f(u), such that

where d and α are specified and θ̂_n is an estimator of θ. [Hint: The maximum likelihood estimator of θ is Y_{(n)}, the largest observation in the sample. Let d/θ = a. Then

P(|Y_{(n)} − θ| ≤ d) ≥ 1 − α.

If θ were known, we would let n′ = |ln α|/|ln(1 − a)|, 0 < a < 1. Since θ is unknown, we replace a with the estimator a = bd/Y_{(n_0)} and determine b from the inequality

E[1 − (1 − bd/Y_{(n_0)})^{n′}] ≥ 1 − α.

For further details, see Graybill and Connell (196413) or Govindarajulu (1987, p. 378).]

4.4-6 Using a double sampling scheme, estimate the Poisson mean θ with given fractional standard error u^{1/2} and also set up a (1 − 2α)100% confidence interval for θ.

4.4-7 Estimate the exponential mean θ by the double sampling method with given fractional standard error u^{1/2} and also set up a (1 − 2α)100% confidence interval for θ.

4.4-8 Estimate the binomial proportion θ with given fractional standard error u^{1/2} and also set up a (1 − 2α)100% confidence interval for θ.

4.4-9 Let θ denote the difference between the means of two normal populations and σ² be twice the individual population variances, which are assumed to be equal. Devise a double sampling procedure for estimating θ with specified standard error a^{1/2}. [Hint: Proceed as in Example 4.4.4.]

4.5-1 Let X_1, X_2, ... be a sequence of independent Poisson random variables having mean θ. Estimate θ (via large-sample theory) with given small standard error a.

4.5-2 Let X_1, X_2, ... be a sequence of independent random variables having the density exp{−(x − θ)} for x ≥ θ. Estimate θ with given small standard error a.

4.5-3 Proceed as in 4.5-2 if θ is the parameter of the Bernoulli distribution.

4.5-4 Let {X_n} be a sequence of random variables such that X_n → θ a.s., and let N_n denote a stopping variable such that N_n/n → 1 in probability. Then X_{N_n} → θ in probability. [Hint: Write P(|X_{N_n} − θ| > ε) = Σ_{m=1}^∞ P(|X_m − θ| > ε, N_n = m), partition the summation into the sets (i) |m/n − 1| > δ and (ii) |m/n − 1| ≤ δ, and use Lemma 4.11.1 in Govindarajulu (1987, p. 421).]

4.5-5 Let s²_n = n^{-1} Σ_{i=1}^n (X_i − X̄_n)², namely the sample variance. Using the result in Problem 4.5-4, show that s²_n → σ² in probability.

4.5-6 Let X_n be uniformly continuous in probability and let X_n → θ a.s. Let g be a function such that g′ is continuous in a neighborhood of θ. Then g(X_n) is uniformly continuous in probability. [Hint: g(X_n) − g(θ) = (X_n − θ)g′(X*_n); note that g′(X*_n) → g′(θ) a.s. and use Problem 4.5-4.]

4.5-7 Let θ̂_n denote the mle of θ based on a random sample of size n, where f(x; θ) denotes the probability (or density) function of the random variable X. Assuming the regularity assumptions (a)-(c) of Theorem 4.5.5 and

show that

[Hint: Starting from Equation (4.11.9) in Govindarajulu (1987, p. 427), one can obtain

and for some ε > 0

Now, using Loève's (1977, Vol. 1, pp. 254-255) lemma, we have

n^{-1/2} B_n(θ) = o(1).

Consequently, n^{1/2}(θ̂_n − θ)B*_n(θ) = o(1),

from which it follows that θ̂_n − θ = o(1). Again, using Loève's (1977) lemma, we have

n^ε |B_n(θ) + I(θ)| → 0 a.s., with ε = δ/(1 + δ).

Now, using these in Equation (i), we obtain the desired result.]

4.6-1 Set up a large-sample fixed-width confidence interval for θ_2 when θ_1 is unknown, where f(x; θ_1, θ_2) = θ_2^{-1} exp{−(x − θ_1)/θ_2}, x ≥ θ_1. [Hint: Note that n^{1/2}[X_{(1)} − θ_1] tends to zero in probability as n → ∞, where X_{(1)} = min(X_1, X_2, ..., X_n).]

4.6-2 Assume that the underlying population has the distribution function F(x; θ_1, θ_2) = 1 − exp[−{(x − θ_1)/θ_2}²] for x ≥ θ_1. Set up a large-sample fixed-width confidence interval for θ_2 when θ_1 is unknown.

4.6-3 Let X be distributed as Poisson with parameter θ. Set up a large-sample fixed-width confidence interval for θ.

4.6-4 Let (X_1, X_2) have the probability function:

where θ_1, θ_2 ≥ 0, θ_1 + θ_2 ≤ 1 and x, y = 0, 1, ..., n. Set up a large-sample fixed-width confidence interval for (i) θ_1 and (ii) θ_2, assuming that both θ_1 and θ_2 are unknown.

4.6-5 Let

Set up a fixed-width confidence interval for (i) θ when σ is unknown and for (ii) σ² when θ is unknown. [Hint: For alternative procedures, see Zacks (1966), who shows that the procedure for θ based on the sample mean is inefficient when compared with the procedure based on the maximum likelihood estimator of θ.]

4.6-6 Let X be distributed as normal(μ, σ²), where μ and σ² are unknown. Set up a large-sample fixed-width confidence interval for σ².

4.6-7 Let X be distributed uniformly on (0, θ). Set up a large-sample fixed-width confidence interval for θ. [Hint: Let Y_{(n)} denote the maximum in a random sample of size n. Use θ̂_n = (n + 1)Y_{(n)}/n as the unbiased estimate of θ based on the maximum likelihood estimate.]

4.7-1 Let 89, 102, 108, 92, 98, 110, 88, 96, 94, 105, 107, 87, 112, 95, 99, ... constitute a sequence of independent observations from a normal population having an unknown mean μ and variance σ². Estimate μ by a confidence interval of given width 6 and confidence coefficient 0.90. Also find the expected sample size assuming that σ² = 10. [Hint: Use n_0 = 2.]

4.7-2 For the above data, estimate μ with prescribed standard error a = 1.

4.9-1 Let X and Y have a bivariate normal distribution with unknown mean vector and variance-covariance matrix. Also, let (X_i, Y_i), i = 1, 2, ..., denote a sequence of independent observations from this population. Set up a fixed-width confidence interval for p = P(X < Y) and study its asymptotic properties when the width is small. [Hint: See Section 4.9.]

Chapter 5 Applications to Biostatistics

In this chapter we will study some sequential procedures that are germane to biostatistics.

5.1 The Robbins-Monro Procedure

Let Y(x) denote the response to a stimulus or dose level x, and assume that Y(x) takes the value 0 or 1 with E[Y(x)] = P{Y(x) = 1} = M(x), where M(x) is unknown. We wish to estimate θ such that M(θ) = α, where α is specified (0 < α < 1). The Robbins-Monro (1951) procedure is as follows: Guess an initial value x_1 and let y_n(x_n) denote the response at x_n. Then choose x_{n+1} by the recursion formula

x_{n+1} = x_n − a_n[y_n(x_n) − α], (5.1.1)

where a_n, n = 1, 2, ..., is a decreasing sequence of positive constants tending to 0 as n tends to infinity. If we stop after n iterations, x_{n+1} will be the estimate of θ. Without loss of generality we can set α = 0. Then (5.1.1) becomes

x_{n+1} = x_n − a_n y_n(x_n).

A suitable choice for a_n is c/n, where c is chosen optimally in some sense. Further, it is not unreasonable to assume that M(x) > 0 for all x > 0. With a_n = c/n, Sacks (1958) has shown that √n(x_n − θ) is approximately normal with mean 0 and variance σ²c²/(2cα_1 − 1), where α_1 = M′(θ) and σ² = var(Y(x)|x), provided cα_1 > 1/2. Robbins and Monro (1951) proved that x_n converges to θ in probability under general assumptions on the sequence {a_n} and on the distribution function H(y|x) = P{Y(x) ≤ y|x}. When a_n = c/n this result becomes
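The recursion with a_n = c/n takes only a few lines to implement. The sketch below is illustrative, not from the book: the logistic response curve M with θ = 2 is an arbitrary choice, and c = 4 is chosen so that cα_1 = cM′(θ) = 1 > 1/2, as Sacks' normality result requires.

```python
import math
import random

def robbins_monro(response, x1, c, n_steps, alpha):
    """Run recursion (5.1.1) with a_n = c/n:
    x_{n+1} = x_n - (c/n)[y_n(x_n) - alpha]."""
    x = x1
    for n in range(1, n_steps + 1):
        x = x - (c / n) * (response(x) - alpha)
    return x

random.seed(5)
# Quantal responses with M(x) the logistic cdf centered at theta = 2,
# so M(theta) = 0.5 and M'(theta) = 1/4
M = lambda x: 1.0 / (1.0 + math.exp(-(x - 2.0)))
est = robbins_monro(lambda x: 1 if random.random() < M(x) else 0,
                    x1=0.0, c=4.0, n_steps=20000, alpha=0.5)
```

With these choices the asymptotic standard deviation of x_n is c·σ/√{n(2cα_1 − 1)} ≈ 0.014 at n = 20000, so `est` sits very close to θ = 2.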

Theorem 5.1.1 If (i) Y(x) is a bounded random variable and (ii) for some δ > 0, M(x) ≤ α − δ for x < θ and M(x) ≥ α + δ for x > θ, then x_n

converges in quadratic mean, and hence in probability, to θ; i.e., lim_{n→∞} b_n = lim_{n→∞} E(x_n − θ)² = 0 and x_n → θ in probability. Blum (1954) established the strong convergence of x_n to θ. Recall that the asymptotic variance of √n(x_n − θ) is σ²c²/(2cα_1 − 1); this is minimized at c = 1/α_1, the minimum value being σ²/α_1². This choice of c was recommended by Hodges and Lehmann (1956), and it suffices to require that c > (2α_1)^{-1}.

Remark 5.1.1 If we can assume that M(x) = α + α_1(x − θ), then one can derive explicit expressions for E(x_{n+1}) and E(x²_{n+1}). See, for instance, Govindarajulu (2001, Section 7.2.3).

5.2 Parametric Estimation

Although the Robbins-Monro (R-M) procedure is nonparametric, since it does not assume any form for M(x) or for H(y|x) = P(Y ≤ y|x), in several cases, especially in quantal response situations, H(y|x) is known to be Bernoulli except for the value of a real parameter γ. We can reparameterize so that γ becomes the parameter to be estimated. Let E[Y(x)] = M_γ(x) and var[Y(x)] = V_γ(x). Since γ determines the model, θ is a function of α and γ. Further, assume that there is a one-to-one correspondence between θ and γ, so that there exists a function h such that γ = h_α(θ). Then we may use x_n as the R-M estimate of θ and obtain the estimate of γ as h_α(x_n). Now the problem is choosing a_n in order to minimize the asymptotic variance of the estimate of γ. Using the delta method, one can easily show that √n[h_α(x_n) − γ] is asymptotically normal with mean 0 and variance [h′_α(θ)]²σ²c²/(2cα_1 − 1). In our quantal response problem, Y(x) = 1 with probability M_γ(x) and

V_γ(x) = M_γ(x)[1 − M_γ(x)]. Then the R-M estimate x_n of θ is such that √n(x_n − θ) →_d normal(0, σ²(θ)c²/(2cα_1 − 1)). Let σ²(θ) = V_γ(θ).

For given a the best value of c = [M; (0)I-l. With this c the asymptotic variance of Jnh, (xn) is [h', p)12a2 (e) - a (1 - a) [M.@)I2 [q(0)12 5.2. PARAMETRIC ESTIMATION 229 since o2 (8) = V, (8) = a (1 - a) and by differentiating the identity (8) = a with respect to 8 we obtain h; (0) = -M. (8)/M, (0). Now, the value of a which minimizes a (1 - a)/ [M; (O)] will be independent of y provided that M; (0) factors into a function of 8 and a function of y [like M, (8) = r {s (7)) t (O)]. For example, we can take P {Y (x)= 1) = M, (2) = F [x - y + F-' (p)] for some 0 < ,O < 1, where F is a distribution function. Now, with that representation, y can be interpreted as the dose of x for which the probability of response is ,O; i.e., y = LDloop (lethal dose loop). Then the formula for the asymptotic variance takes the form of a (1 - a) since M, (8) = a implies F [8 - y + F-' (p)]= a which in turns implies F-' (a)= 8 - y + F-l (a) and M; (8) = -F' [O - y + F-l (p)]. The asymptotic variance is independent of p since the problem is invariant under translation shifts. Now the value of a that minimizes the asymptotic variance is a = 1/2 when F is normal or logistic. [Note that the derivative is fe4{(l- 2a) f2 (F-' (a))- 2a (1 - a) x f' (F-l(Q.))Lf = f (F-l(Q.>)l. If we want to estimate y = LDloop, then we do not need the parametric model since we can set a = p and y = 0 and thus estimate y directly from x, via the R-M method. The advantage of this method is that it assumes very little about the form of F, the disadvantage may be a significant loss of efficiency, especially when ,O is not close to 1/2.
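The claim that α = 1/2 is optimal for symmetric F is easy to check numerically, since the factor α(1 − α)/f²(F⁻¹(α)) can be minimized on a grid; a sketch for the standard logistic F, whose F⁻¹ and f have closed forms:

```python
import math

def asy_var_factor(alpha):
    """alpha(1-alpha)/f^2(F^{-1}(alpha)) for the standard logistic F."""
    q = math.log(alpha / (1 - alpha))            # F^{-1}(alpha)
    f = math.exp(-q) / (1 + math.exp(-q)) ** 2   # logistic density at q
    return alpha * (1 - alpha) / f ** 2

grid = [i / 100 for i in range(1, 100)]
best_alpha = min(grid, key=asy_var_factor)
```

For the logistic, f(F⁻¹(α)) = α(1 − α), so the factor reduces to 1/[α(1 − α)], which is transparently minimized at α = 1/2.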

Remark 5.2.1 Stopping rules for the R-M procedure are available in the literature. For a survey of these, see Govindarajulu (1995).

Example 5.2.1 Suppose we wish to estimate the mean bacterial density γ of a liquid by the dilution method. For a specified volume x of the liquid, let

Y(x) = 1, if the number of bacteria in the sample is ≥ 1,
       0, otherwise.

Then P{Y(x) = 1} = M_γ(x) = 1 − e^{−γx} under the Poisson model for the number of bacteria in a volume x. Hence

M′_γ(θ) = γe^{−γθ} = γ(1 − α) = −(1 − α)ln(1 − α)/θ,

since 1 − e^{−γθ} = α. Consequently, the asymptotic variance becomes αγ²/[(1 − α)ln²(1 − α)], and whatever be γ, this is minimized by minimizing the factor α/[(1 − α)ln²(1 − α)] with respect to α. Hence the best α is the solution of the equation

2α = −ln(1 − α), i.e., α ≈ 0.797.

Thus, the recommended procedure is to carry out the R-M procedure with α = 0.797 and a_n = 4.93/(nγ̂_0) [since 1/c = α_1 = M′_γ(θ) = γ(1 − α) = 0.203γ], where γ̂_0 is our prior estimate of γ. Our estimate of γ after n steps is

2α/x_{n+1} = 1.594/x_{n+1}, since γ = h_α(θ) = −ln(1 − α)/θ,

and the asymptotic variance is γ²/[4α(1 − α)] = 1.544γ², since 2α = −ln(1 − α).
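The optimal α above solves 2α = −ln(1 − α); a quick bisection (a sketch) reproduces both the root 0.797 and the variance constant 1.544:

```python
import math

def g(a):
    """g(a) = 2a + ln(1 - a); its root in (0, 1) gives the optimal alpha."""
    return 2 * a + math.log(1 - a)

lo, hi = 0.5, 0.99   # g(0.5) > 0 > g(0.99); g decreases through the root
for _ in range(60):
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
alpha_opt = (lo + hi) / 2
var_const = 1 / (4 * alpha_opt * (1 - alpha_opt))   # multiplier of gamma^2
```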

5.3 Up and Down Rule

Dixon and Mood (1948) proposed an up and down method for estimating LD50 which is simpler than the R-M procedure. In the latter the dose levels are random and hence cannot be specified ahead of time. The Dixon-Mood method chooses a series of equally spaced dose levels x_i. Let h denote the dose span. Typically the dose levels are logs of the true dose levels. The response Y(x) is observed only at these levels. The first observation is taken at the best initial guess of LD50. If the response is positive, the next observation is made at the immediately preceding lower dose level. If the response is zero, the next trial is made at the immediately higher dose level. If a positive response is coded as 1 and no response is coded as 0, then the data may look like those in Figure 5.3.1.

Figure 5.3.1 A sequence of up and down trials plotted against dose level (dose levels −3 to 3)

The data in Figure 5.3.1 can be explained as follows. We start at dose level 0, at which the response is 1; then we go to dose −1 and the response is 0; then we go to dose 0, and the response is 1; then to dose −1, at which the response is 0; then to dose 0, and the response is 1; then to dose −1, at which the response is 0; then to dose 0, at which the response is 0; then to dose 1, and suppose the response is 1; then to dose 0, where the response is 0; then to dose 1, where the response is 0; then to dose 2, where the response is 1.

The main advantage of this up and down method is that testing hovers around the mean. Also there is an increase in the accuracy of the estimate. The saving in the number of observations may be 30-40%. Further, the statistical analysis is fairly simple. One disadvantage may be that it requires that each specimen be tested separately, which may not be economical, especially in tests of insecticides. This method is not appropriate for estimating dose levels other than LD50. Also, the sample size is assumed to be large and one must have a rough idea of the standard deviation in advance. We set the dose span equal to the standard deviation. The up and down procedure stops when the nominal sample size is reached. The nominal sample size N* is a count of the number of trials, beginning with the first pair of responses that are unlike. For example, in the sequence of trial results 000101, N* = 4. Dixon (1970, p. 253) provides some tables to facilitate the estimation of LD50 for 1 < N* ≤ 6; the estimate is given by x_f + kh,

where x_f denotes the final dose level in an up and down sequence and k is read from Table 2 of Dixon (1970). For instance, if the response sequence is 011010, x_f = 0.6 and h = 0.3, then N* = 6 and the estimate of LD50 is 0.6 + 0.831(0.3) = 0.85. For nominal sample sizes greater than 6, the estimate of LD50 is (Σx_i + hA*)/N*, where the x_i are the dose levels among the N* nominal sample trials and A* is obtained from Table 3 of Dixon (1970, p. 254). Note that A* depends on the number of initial like responses and on the difference between the cumulative numbers of ones and zeros in the nominal sample of size N*.
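The count N* is mechanical: find the first adjacent unlike pair of responses and count the trials from the first member of that pair onward. A sketch (the multiplier 0.831 is the value read from Dixon's Table 2 in the example above; the function itself is illustrative):

```python
def nominal_sample_size(results):
    """N*: number of trials from the first pair of unlike responses onward."""
    for i in range(len(results) - 1):
        if results[i] != results[i + 1]:
            return len(results) - i
    return 0  # no reversal yet, so the nominal sample has not started

x_final, h, k_table = 0.6, 0.3, 0.831   # k_table as read from Dixon (1970)
ld50_estimate = x_final + k_table * h
```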

5.4 Spearman-Karber (S-K) Estimator

The Spearman-Karber estimator has several desirable merits (especially from the theoretical point of view). For extensive literature on the estimator the reader is referred to Govindarajulu (2001). Nanthakumar and Govindarajulu (N-G) (1994, 1999) derive fixed-width and risk-efficient sequential rules for estimating the mean of the response function. Govindarajulu and Nanthakumar (2000) (G-N (2000)) have shown that the MLEs of LD50 and the scale parameter in the logistic case are equivalent to Spearman-Karber type estimators. They also derive simple expressions for the bias and the variance of the S-K estimator of LD50. Using these they obtain sequential rules that are simple to carry out. These will be presented below.

Let x_{−k}, x_{−k+1}, ..., x_0, x_1, ..., x_{k−1}, x_k denote the 2k + 1 dose levels, with x_i = x_0 + ih, i = −k, ..., 0, ..., k, where x_0 is chosen at random between 0 and h. We subject n experimental units to each dose level and record the responses as 1 or 0 according as the experimental unit responds to the dose or not. Let P_j = P(x_j) denote the probability of a positive response at x_j = x_0 + jh. By definition μ, the mean of the tolerance distribution, is given by

(5.4.1)

Then the S-K estimator is given by

(5.4.2)

where p̂_j = r_j/n denotes the sample proportion of positive responses at x_j. In particular, if P(x) = [1 + exp{−(x − θ)/σ}]⁻¹, then the S-K estimator of θ is

Also, the S-K type estimator of σ is given by

G-N (2000) have shown that the MLEs of θ and σ coincide with (5.4.3) and (5.4.4). First let us give simple expressions for B, the bias of θ̂_k, and for the variance of θ̂_k. G-N (2000) obtain

B ≈ (2θ − h) exp{−(kh + h/2)/σ}  (5.4.5)

and

σ_{θ̂_k} ≈ (hσ/n)^{1/2} [1 − exp{−(kh + h/2)/σ}].  (5.4.6)
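Equation (5.4.2) admits several algebraically equivalent forms; one common trapezoidal version (stated here as an assumption about the exact form intended in the text) is μ̂ = Σ_j (p̂_{j+1} − p̂_j)(x_j + x_{j+1})/2, after forcing the empirical response curve to 0 and 1 beyond the extreme doses. A sketch:

```python
def spearman_karber(doses, p_hat):
    """Spearman-Karber estimate of the tolerance mean from increasing
    dose levels and observed response proportions (trapezoidal form)."""
    # extend the grid one step at each end so the curve runs from 0 to 1
    d = [2 * doses[0] - doses[1]] + list(doses) + [2 * doses[-1] - doses[-2]]
    p = [0.0] + list(p_hat) + [1.0]
    return sum((p[j + 1] - p[j]) * (d[j] + d[j + 1]) / 2
               for j in range(len(p) - 1))
```

For proportions symmetric about a dose level the estimate returns that level, as it should for a symmetric tolerance distribution.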

Fixed-Width Sequential Rule
Let 2D be the specified width and γ the specified confidence coefficient. Then we wish to determine k such that

P(|θ̂_k − θ| ≤ D) ≥ γ.  (5.4.7)

Using the asymptotic normality of θ̂_k, it can be shown that (5.4.7) is implied by (5.4.8), where z = Φ⁻¹[(1 + γ)/2]. So, using (5.4.5) and (5.4.6) in (5.4.8) when θ and σ are known, the optimal number of dose levels is 2k* + 1, where

exp{(k*h + h/2)/σ} = [|2θ − h| − (z²hσ/n)^{1/2}] / [D − (z²hσ/n)^{1/2}].  (5.4.9)

Since θ and σ are unknown, we obtain the following adaptive rule: Stop at dose level 2K + 1, where

K = inf{k ≥ k_0 : kh + h/2 ≥ σ̂_k ln([|2θ̂_k − h| − (z²hσ̂_k/n)^{1/2}] / [D − (z²hσ̂_k/n)^{1/2}])}  (5.4.10)

(the ratio in the log being set to 1 when it is ≤ 1), where 2k_0 + 1 denotes the initial number of dose levels.

Example 5.4.1 Let h = 0.2, D = 0.62, γ = 0.90, n = 3, θ = 1.25 and σ = 1. If we choose x_0 = 0.05, the rule (5.4.9) yields k* = 11. If the data are

... x_{−1} x_0 x_1 ...
[counts of positive responses, out of n = 3, at successive dose levels]

then K = 14, with θ̂_14 = 1.35.

Asymptotic Properties of the Sequential Rules G-N (2000) obtained the following properties of the sequential rule (5.4.10).

(i) The sequential procedure terminates finitely with probability one.

(ii) E[K(h)/k*(h)] → 1 when h = h_0/m → 0 as m → ∞ for some h_0 > 0, where k*(h) is given by (5.4.9).

(iii) P(|θ̂_{K(h)} − θ| ≤ D) / P(|θ̂_{k*(h)} − θ| ≤ D) → 1 as h → 0, when h is proportional to D².

In the following table we provide some simulated values, based on 100 simulations, with γ = 0.95 and n = 3.

Table 5.4.1¹ Simulated Results for Fixed-Width Estimation

 h     D     θ    Average K   k*   Coverage Probability
0.1   0.37    1     52.4      48        1.00
0.2   0.53    1     18.8      20        1.00
0.1   0.37   −1     57.4      50        1.00
0.2   0.53   −1     24.1      21        1.00
0.1   0.37    2     57.6      57        0.91
0.2   0.53    2     27.6      25        0.94
0.1   0.37   −2     54.5      57        1.00
0.2   0.53   −2     24.8      25        0.92

Point Estimation
Let c denote the cost of each experimental unit. Then we want to select the stopping stage k that minimizes

R = Risk + Cost = var(θ̂_k) + B² + (2k + 1)cn,  (5.4.11)

where B is given by (5.4.5).

Using the approximations given earlier, one can obtain

R ≈ (hσ/n)(1 − 2e^{−lh/2σ}) + (2θ − h)² e^{−lh/σ} + lcn,  (5.4.12)

where l = 2k + 1. If θ and σ are known, the optimum l (to be denoted by l*) is given by

2n²cσ e^{l*h/2σ} = [4n³cσh(2θ − h)² + h⁴σ²]^{1/2} − h²σ.  (5.4.13)

¹Reproduced with the permission of Taylor and Francis Ltd. The website for Statistics is http://www.tandf.co.uk/journals/titles/0233/888.html

Since θ and σ are unknown, we have the following adaptive rule: Stop when the number of dose levels is L, where

L = inf{l ≥ 2k_0 + 1 : 2n²cσ̂_l e^{lh/2σ̂_l} ≥ [4n³cσ̂_l h(2θ̂_l − h)² + h⁴σ̂_l²]^{1/2} − h²σ̂_l};  (5.4.14)

or approximately we can take

L = inf{l ≥ 2k_0 + 1 : lh/2σ̂_l ≥ ln([h/(ncσ̂_l)]^{1/2} |2θ̂_l − h|)},  (5.4.15)

provided c = O(h^{1+η}) for some η > 0, where θ̂_l and σ̂_l are based on l dose levels.

Example 5.4.2 Let θ = 0.625, σ = 0.5, h = 0.2, n = 3, c = 0.00055 and k_0 = 5. Computations yield

l* = 13, and hence k* = 6, for (5.4.14);
l* = 15, and hence k* = 7, for (5.4.15).

For the following data (generated for the above parameter configuration)

we stop at L = 15 with rule (5.4.15). 100 simulations were carried out with n = 3 and k_0 = 5.

Table 5.4.2² Simulated Values of Stopping Time Using (5.4.15)

 h     c       θ    σ   Average K   k*   R_K/R_{k*} = risk ratio
0.1   0.002    1    1     18.1      24        0.976
0.2   0.008    1    1      6.3      10        0.968
0.1   0.002   −1    1     18.4      25        0.904
0.2   0.008   −1    1      7.7      11        0.834
0.1   0.002    2    1     22.6      31        1.112
0.2   0.008    2    1      0.9      14        1.115
0.1   0.002   −2    1     23.7      32        1.104
0.2   0.008   −2    1     10.4      14        1.026

Asymptotic Properties of the Point Estimate

(i) E[L(h)/l*(h)] → 1 when h = h_0/m → 0 as m → ∞ for some h_0 > 0.

(ii) R_L/R_{l*} → 1 as h → 0 (risk-efficiency).

²Reproduced with the permission of Taylor and Francis Ltd. The website for Statistics is http://www.tandf.co.uk/journals/titles/0233/888.html

5.5 Repeated Significance Tests

In some situations data accumulate over time. Several tests may be conducted on the accumulating data until either one of the tests yields a significant result or nonsignificance is accepted at the end. This is called repeated significance testing. In this procedure a significance at level 0.05 according to the final test will not be the level of significance relative to the trial as a whole; invariably, the true significance level will be larger than that of the final test. Because of this unstructured and unplanned behavior, it is preferable to adopt a sequential procedure from the beginning. This formal method of repeated significance testing is a partially sequential procedure constructed by repeatedly applying fixed-sample-size procedures in a systematic manner. Let θ index the density function f(x; θ). Suppose we wish to test H_0: θ = 0 against the alternative H_1: θ ≠ 0. Given a sample of fixed size, say X_1, X_2, ..., X_n, the likelihood ratio test with critical level A = e^a rejects H_0 if and only if l_n > a, where l_n denotes the logarithm of the likelihood ratio. Then the probability of a type I error is

α = P_0(l_n > a),  (5.5.1)

which may be estimated by using the chi-squared approximation to the null distribution of 2l_n. One may test H_0 repeatedly over time. If m and N are integers such that 1 ≤ m < N and 0 < b ≤ a, then the repeated significance test rejects H_0 if and only if l_n > a for some n (m ≤ n < N) or l_N > b. Thus, letting

t = t_a = inf{n ≥ m : l_n > a},

the stopping time is T = min(N, t_a), and the test rejects H_0 if and only if either t_a < N or l_N > b. The probability of a type I error is given by

(5.5.2)

which is typically much larger than α. Similarly, for testing H_0: θ ≤ 0 against H_1: θ > 0 we have the following procedure. Let m ≥ 1 be the initial sample size, N ≥ m the maximum sample size, and let the logarithmic critical bounds be a and b with 0 < b ≤ a. Then we reject H_0 if either l_n > a and θ̂_n > 0 for some n ∈ [m, N] or l_N > b and θ̂_N > 0. Hence, if l_n⁺ = l_n I(θ̂_n > 0) and

t⁺ = t_a⁺ = inf{n ≥ m : l_n⁺ > a},  (5.5.3)

the stopping time of the test is T⁺ = min(t_a⁺, N). Note that θ̂_n is the mle of θ. Woodroofe (1982, Section 7.2) gives asymptotic expressions (as a → ∞) for the error probabilities and expected sample sizes of the two repeated significance likelihood ratio test procedures when the underlying distribution is the exponential family of densities given by

f(x; θ) = exp[θx − ψ(θ)]

with respect to some sigma-finite measure over (−∞, ∞). Then θ̂_n becomes X̄_n = (X_1 + X_2 + ⋯ + X_n)/n.
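The inflation of the overall type I error under unplanned repeated looks is easy to demonstrate by simulation; a sketch with normal(θ, 1) observations, testing H_0: θ = 0 at the nominal two-sided 5% level after each new observation:

```python
import math
import random

def any_interim_rejection(max_n, crit=1.96):
    """True if |S_n / sqrt(n)| > crit at any interim look n = 2..max_n."""
    s = 0.0
    for n in range(1, max_n + 1):
        s += random.gauss(0.0, 1.0)
        if n >= 2 and abs(s / math.sqrt(n)) > crit:
            return True
    return False

random.seed(1)
reps = 2000
overall_alpha = sum(any_interim_rejection(50) for _ in range(reps)) / reps
```

With 50 looks the overall rejection rate is several times the nominal 0.05, which is precisely why the planned sequential procedures of this section are preferred.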

Mantel-Haenszel Test
For reference, see Miller (1981). Suppose there are two populations, of sizes n_1 and n_2, whose members are categorized as dead or alive.

                Dead   Alive   Sample size
Population I     a      b         n_1
Population II    c      d         n_2
Total            m_1    m_2       n

Let p_1 = P(the patient dies | he belongs to population I) and p_2 = P(the patient dies | he belongs to population II). Suppose we wish to test H_0: p_1 = p_2 versus H_1: p_1 ≠ p_2 and use the statistic

χ² = n(ad − bc)² / (n_1 n_2 m_1 m_2),

where p̂_1 = a/n_1, p̂_2 = c/n_2, p̂ = m_1/n. With the correction for continuity,

χ_c² = n(|ad − bc| − n/2)² / (n_1 n_2 m_1 m_2) ~ χ_1².

Now, for given n_1, n_2, m_1 and m_2, the cell (1,1) frequency A has the hypergeometric distribution, with

E_0(A) = n_1 m_1 / n and var_0(A) = n_1 n_2 m_1 m_2 / [n²(n − 1)].

Hence

If we have a sequence of 2 × 2 tables (for instance, from various hospitals) and we wish to test H_0: p_11 = p_12, ..., p_k1 = p_k2, where p_i1 = P(death | treatment 1 at hospital i) and p_i2 = P(death | treatment 2 at hospital i), then the Mantel-Haenszel (MH) statistic is given by

and with the continuity correction, it becomes

Then we reject H_0 for large values of MH_c.
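For a single 2×2 table these quantities are direct to compute; a sketch (the multi-table MH statistic then sums A − E_0(A) and var_0(A) over tables before squaring):

```python
def mh_components(a, b, c, d):
    """E0 and var0 of the (1,1) cell A under H0 (hypergeometric), plus the
    continuity-corrected chi-square for one 2x2 table."""
    n1, n2 = a + b, c + d          # row totals
    m1, m2 = a + c, b + d          # column totals
    n = n1 + n2
    e0 = n1 * m1 / n
    v0 = n1 * n2 * m1 * m2 / (n ** 2 * (n - 1))
    chi2_c = n * (abs(a * d - b * c) - n / 2) ** 2 / (n1 * n2 * m1 * m2)
    return e0, v0, chi2_c
```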

5.6 Test Statistics Useful in Survival Analysis

The log rank statistic plays an important role in survival analysis³; it is a special case of the general statistics Z and V which will be described below. Consider two treatments, the standard (S) and the experimental (N). Let G_S(t) [G_N(t)] denote the probability that the survival time of a patient on treatment S [N] will exceed t. Suppose we are interested in testing

H_0: there is no difference between the two treatments versus H_1: the experimental treatment provides longer survival. Suppose the treatments are related such that

G_N(t) = [G_S(t)]^Δ,  Δ = e^{−θ}.  (5.6.1)

Then we can rewrite H_0 and H_1 as

H_0: θ = 0 (i.e., G_N(t) = G_S(t) for all t) and H_1: θ > 0 (i.e., G_N(t) > G_S(t)).

³Whitehead (1983, Sections 3.7 and 3.8) served as the source for this subsection.

If X [Y] denotes the survival time of a patient having the new [standard] treatment, then under (5.6.1),

p = P(X > Y) = ∫_0^1 u^Δ du = 1/(1 + Δ).

Thus

p = e^θ/(1 + e^θ), or θ = ln[p/(1 − p)],  (5.6.2)

and θ = 0 corresponds to p = 1/2. A reference improvement value for θ, namely θ_R, can be chosen by selecting the corresponding value for p, namely p_R. Alternatively, if time is measured in units of years and G_S(1) = 0.65 and G_N(1) = 0.75, then θ_R is the solution of 0.75 = (0.65)^{exp(−θ_R)}, namely θ_R = 0.404, which corresponds to p_R = 0.6.

Test Statistics for the Sequential Case
Suppose that, previous to a particular point in time, m patients taking the new treatment have died and n patients on the standard treatment have died. Assume that the times of death of these patients are known. If the progression of the disease is the response of interest rather than the death of a patient, and detection is possible only at one of a series of monthly examinations, all recorded progression times will be multiples of months (i.e., integers); hence ties can occur with positive probability. Let d_1 < d_2 < ⋯ < d_k denote the distinct (uncensored) survival times and o_i the frequency of d_i (i = 1, 2, ..., k). Let

r_i = number of survival times ≥ d_i.

Of these r_i, let r_iN be those on the new treatment and r_iS those on the standard treatment (r_iN + r_iS = r_i). Let A_iN = r_iN/r_i = the proportion of patients with the new treatment surviving d_i or longer; similarly A_iS = r_iS/r_i. Then let

(5.6.3)

and

(5.6.4)

Z is called the log rank statistic. (It is related to the Mantel-Haenszel test; see Miller (1981, pp. 94-98).) An equivalent form of Z is


If m and n are large,

V ≈ mn / (m e^{−θ/2} + n e^{θ/2}).

If t denotes the total number of deaths, then V = t/4 when m = n = t/2.
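One standard observed-minus-expected form of the log rank computation (taken as an assumption here, since (5.6.3)-(5.6.4) are referenced only by number) sums, over the distinct death times, the deaths on the new treatment minus o_i·r_iN/r_i, with the matching hypergeometric variances accumulating into V. A sketch for uncensored data:

```python
def logrank_zv(deaths_new, deaths_std):
    """Log rank score Z = sum(O_iN - E_iN) and its variance V, computed
    over the distinct uncensored death times of the two samples."""
    times = sorted(set(deaths_new) | set(deaths_std))
    Z = V = 0.0
    for t in times:
        r_new = sum(1 for x in deaths_new if x >= t)   # at risk, new
        r_std = sum(1 for x in deaths_std if x >= t)   # at risk, standard
        r = r_new + r_std
        o = deaths_new.count(t) + deaths_std.count(t)  # deaths at time t
        Z += deaths_new.count(t) - o * r_new / r
        if r > 1:
            V += o * (r_new / r) * (r_std / r) * (r - o) / (r - 1)
    return Z, V
```

For identical samples the score is exactly zero, as the symmetry of the two risk sets requires.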

General Forms for the Statistics Z and V
Starting from the likelihood function one can derive general forms for Z and V. Denote the observed data by X = (X_1, X_2, ..., X_n), the parameter of interest by θ and the nuisance parameter by Ψ, which could be a vector. For the sake of simplicity, let us take Ψ to be scalar. Further, let L(θ, Ψ; X) denote the likelihood, l(θ, Ψ; X) the log likelihood, and Ψ̂(θ) the maximum likelihood estimate (mle) of Ψ for a given value of θ. From the consistency property of the mle we infer that, in large samples, l(θ, Ψ̂(θ)) will be close to l(θ, Ψ), with the additional advantage that it depends only on θ. This enables us to obtain an expansion of l(θ, Ψ̂(θ)) in powers of θ and then identify the statistics Z and V from the expansion:

l(θ, Ψ̂(θ)) = constant + θZ − (1/2)θ²V + O(θ³),  (5.6.5)

where Z is called the 'efficient score' for θ and V is the Fisher information about θ contained in Z.

Example 5.5.1 Let X = (X_1, X_2, ..., X_n)′ be a random sample from a normal(μ, 1) population. Then, with S_n = Σ_{i=1}^n X_i,

l(μ) = constant + μS_n − (1/2)μ²n,

so that Z = S_n and V = n. Note that S_n ~ normal(nμ, n). If there is no nuisance parameter,

Z = l_θ(0),  V = −l_θθ(0),  (5.6.6)

where

l_θ(θ) = (d/dθ) l(θ) and l_θθ(θ) = (d²/dθ²) l(θ).  (5.6.7)

In the presence of a single nuisance parameter Ψ, let

Ψ̂(0) = Ψ*,

and expand Ψ̂(θ) about θ = 0 as follows:

Ψ̂(θ) = Ψ* + θΨ̂′(0) + O(θ²),  (5.6.8)

where Ψ̂′(θ) = (d/dθ)Ψ̂(θ). Further, since Ψ̂(θ) is an mle,

l_Ψ{θ, Ψ̂(θ)} = 0  (5.6.9)

for all θ. Differentiating (5.6.9) with respect to θ gives

Ψ̂′(0) = −l_θΨ(0, Ψ*) / l_ΨΨ(0, Ψ*).  (5.6.10)

Using (5.6.10) in (5.6.8) we have (after ignoring O(θ²))

Ψ̂(θ) − Ψ* = −θ l_θΨ(0, Ψ*) / l_ΨΨ(0, Ψ*).  (5.6.11)

Now, expanding l(θ, Ψ) about (0, Ψ*), we have

l(θ, Ψ) = l(0, Ψ*) + θ l_θ(0, Ψ*) + (1/2)θ² l_θθ(0, Ψ*) + θ(Ψ − Ψ*) l_θΨ(0, Ψ*) + (1/2)(Ψ − Ψ*)² l_ΨΨ(0, Ψ*),  (5.6.12)

where the term l_Ψ(0, Ψ*) is omitted because its value is zero. Now substitute Ψ̂(θ) for Ψ, use Ψ̂(θ) − Ψ* from (5.6.11), and compare with (5.6.5). Hence Z = l_θ(0, Ψ*) and

V = −1/l^{θθ}(0, Ψ*),

where l^{θθ}(0, Ψ*) is the leading element of the inverse of the matrix

[ l_θθ  l_θΨ ]
[ l_θΨ  l_ΨΨ ]

the arguments of the second derivatives being 0 and Ψ*.

Example 5.5.2 Let X_1, X_2, ..., X_n be a random sample from normal(μ, σ²). Let θ = μ and Ψ = σ⁻². If S_n = X_1 + X_2 + ⋯ + X_n, then

l(θ, Ψ) = −(n/2) ln(2π) + (n/2) ln Ψ − (Ψ/2) Σ_{i=1}^n (X_i − θ)².

Then one can easily obtain

l_θ(0, Ψ*) = Ψ*S_n,  l_θΨ(0, Ψ*) = S_n.

Hence

1/l^{θθ}(0, Ψ*) = −nΨ* + 2Ψ*²S_n²/n.

Thus

Z = Ψ*S_n and V = nΨ* − 2Z²/n.

Example 5.5.3 If p_1 and p_2 denote the proportions of cures by two different treatments, we will be interested in testing H_0: p_1 = p_2 versus H_1: p_1 > p_2. We can reparameterize and set

θ = ln[p_1(1 − p_2) / {p_2(1 − p_1)}],

with Ψ a suitable log-odds nuisance parameter. Then H_0 corresponds to θ = 0 and H_1 corresponds to θ > 0.

Asymptotic Distribution of Z
When θ is small, Whitehead (1983, p. 56) asserts that the approximate distribution of Z is normal with mean θV and variance V. This result is extensively used in constructing triangular and double triangular sequential tests of hypotheses about θ. Suppose we wish to test H_0: θ = 0 versus H_1: θ > 0. Then we plot Z on the y-axis and V along the x-axis, and the triangular test can be depicted as in Figure 5.6.1. The continuation region is Z ∈ (−c + λ_1 V, c + λ_2 V), where λ_2 = −λ_1 yields the symmetrical case; then the lines meet on the V-axis.

Figure 5.6.1 A Plot of Z against V for the Triangular Test (axis intercepts c and −c; the region above the upper line is "Reject H_0")

Double Triangular Case

Suppose we are interested in testing H_0: θ = 0 versus H_1: θ ≠ 0. Then we run two triangular tests:

R⁺: H_0 vs. H_1⁺: θ > 0
R⁻: H_0 vs. H_1⁻: θ < 0

Figure 5.6.2 The Double Triangular Test

As we see in Figure 5.6.2, we accept H_0 when the outcome stays within the region bounded by the lines z = c + λ_2 V and z = −c + λ_1 V. Note that we will have P(reject H_0 | θ = 0) = 2α. If I denotes the inspection interval, then V is usually an integral multiple of I. The choice of the length of the inspection interval is specified by the organizers of the trial. For the triangular test, according to Whitehead (1983, p. 72),

c = (2/θ_R) ln(1/2α) and V_max = 4c/θ_R,

where θ_R needs to be specified by the problem on hand, and P(reject H_0 | θ = 0) = α and P(accept H_0 | θ = θ_R) = α. For further details on the theory of the triangular test the reader is referred to Whitehead (1983, Section 4.9). The quantity 0.583√I is called the correction for overshoot. The rejection rule for the composite test R in the double triangular test is

R⁺            R⁻            R
rejects H_0   accepts H_0   rejects H_0 in favor of H_1⁺
accepts H_0   accepts H_0   accepts H_0
accepts H_0   rejects H_0   rejects H_0 in favor of H_1⁻

The duration of R = max(duration of R⁺, duration of R⁻), since both component tests must stop before R does. The constants are computed along the lines of the triangular test. We have

where c and V_max are as given before. Whitehead (1983, Equation 4.9.7) gives an explicit expression for E(V*|θ), where V* is the stopping value of V.

Repeated Significance Tests Using the Statistics Z and V
Suppose we wish to test H_0: θ = 0 vs. the alternative H_1: θ ≠ 0. Let the inspection interval be I. A maximum number N of inspections is specified. Carry out a fixed-sample-size test at each inspection. The form of the test is to reject H_0 when Z falls outside the interval (−k_n, k_n) and to accept H_0 otherwise. The value of k is determined from normal distribution tables and corresponds to a 'nominal significance level' 2α′; that is, each of these tests is conducted at level 2α′. If any one of these tests rejects H_0, the overall procedure terminates with the rejection of H_0. If all N of them accept H_0, the overall procedure proceeds to the Nth inspection and then accepts H_0. The sequential trial is planned so that P(accept H_0 | θ = 0) = 1 − 2α and P(accept H_0 | θ = θ_R) = P(accept H_0 | θ = −θ_R) = β. Thus 2α is the overall significance level of the test. For given values of I, θ_R, α, β and N one can compute 2α′ and k. Armitage (1975, Tables 5.5 and 5.6) prepared tables which enable one to carry out the procedure; Whitehead (1983, Table XI) reproduces them. If we use the approximate normality of Z with mean θV and variance V, we have, for α = 0.025 and β = 0.05,

P{Z ∈ (−c, c) | θ = 0} = 1 − 2Φ(−c/√V) = 0.95,

which implies that c = 1.96√V. Further,

P{Z < c | θ = θ_R} = 0.05.

Hence c − θ_R V = −1.645√V, which implies that

V = [(1.96 + 1.645)/θ_R]².

For instance, θ_R = 0.5 yields V ≈ 52, and then c = 1.96√V ≈ 14.1.

Whitehead (1983, Equation 6.2.6) gives an explicit expression for E(V*|θ), where V* denotes the stopping value of V.

Sequential Probability Ratio Test Based on Z and V
Suppose we wish to test H_0: θ = 0 versus H_1: θ = θ_R with α = β. Then, using the bounds A = (1 − α)/α = 1/B, one can easily compute the continuation region to be

Z ∈ (−c + sV, c + sV), where s = θ_R/2 and c = (1/θ_R) log((1 − α)/α).

In order to correct for the overshoot one can take

c = (1/θ_R) log((1 − α)/α) − 0.583√I,

where I denotes the inspection interval (see Whitehead (1983, p. 84)).
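Monitoring this region at each inspection is then a one-line comparison; a sketch (function and parameter names are illustrative):

```python
import math

def sprt_decision(Z, V, theta_R, alpha, I=None):
    """SPRT continuation check for Z ~ N(theta*V, V): continue while
    -c + s*V < Z < c + s*V, with s = theta_R/2."""
    s = theta_R / 2
    c = math.log((1 - alpha) / alpha) / theta_R
    if I is not None:
        c -= 0.583 * math.sqrt(I)   # overshoot correction
    if Z >= c + s * V:
        return "reject H0"
    if Z <= -c + s * V:
        return "accept H0"
    return "continue"
```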

Estimation of θ
If we reject H_0 via the SPRT, triangular, double triangular or repeated significance test procedure, it is of interest to estimate θ (pointwise or via a fixed-width confidence interval). If for given d and γ we want to find V such that P(|θ̂ − θ| ≤ d) = γ, then, using the normality of Z with mean θV and variance V, one can easily determine the required V to be

V = (z_{(1+γ)/2}/d)².

On the other hand, if we wish to minimize the risk function

R(θ) = A E(θ̂ − θ)² + V,

then the optimal V = A^{1/2}. If we wish to estimate θ after we stop in a sequential trial (say at V*), then θ̂ = Z*/V* is no longer unbiased, since the stopping rule depends on the evidence accumulated. Armitage (1957), Siegmund (1978) and Whitehead and Jones (1979) have provided estimates of θ after a sequential test procedure has been terminated; Whitehead (1983, Table V) has provided tables for computing a confidence interval for θ.

5.7 Sample Size Re-estimation Procedures

5.7.1 Normal Responses

Estimation of the required sample size is an important issue in most clinical trials. Fixed-sample designs use previous data or guessed values of the parameters, which can be unreliable. The classical sequential designs are limited to situations where outcome assessment can be made soon after patients are enrolled in the trial. Group sequential designs have also been used in clinical trials; however, the type I error rate at each analysis stage needs to be adjusted so as to control the overall type I error probability at a specified level. In several clinical trials, especially those dealing with nonfatal ailments, investigators would like to have a procedure at an interim stage in order to obtain updated information on the adequacy of the planned initial sample size. This often takes place when the natural history of the ailment is not well known or the treatment under study is new. In those cases, investigators are often unsure of the assumed values of the parameters that were initially used for calculating the sample size at the planning stage. Note that the initial parameter values are obtained, invariably, from various studies conducted on different populations of patients, with different diagnostic criteria, etc. Thus, the initial sample size does not guarantee either the width of the confidence interval in estimation or the desired power in the hypothesis-testing setup. Hence, it is desirable to monitor the clinical trial so as to ensure that the basic assumptions of the design are reasonably satisfied and to construct procedures for estimating the sample size using the data available at the interim stage. Shih (1992) makes a compelling case for not unblinding the treatment codes at the interim stage, so that the integrity of the trial is maintained and no conscious or unconscious bias is introduced.
If the goal of the trial is to re-estimate the required sample size, the only decision to be taken is the determination of how many additional observations, if any, are needed beyond those planned earlier. If no further observations are needed, the planned sample size is sufficient and the trial will be carried out as designed. In a two-treatment double-blind clinical experiment, one is interested in testing the null hypothesis of equality of the means against a one-sided alternative when the common variance σ² is unknown. We wish to determine the required total sample size when the error probabilities α and β are specified at a predetermined alternative. Assuming normal responses, Shih (1992) provided a two-stage procedure which is an extension of Stein's (1945) one-sample procedure. He estimates σ² by the method of maximum likelihood via the E-M algorithm and carries out a simulation study in order to evaluate the effective level of significance and the power. For further references on normal responses, the reader is referred to Shih (1992). Govindarajulu (2002) proposed a closed-form estimator for σ² and showed analytically that the difference between the effective and nominal levels of significance is negligible and that the power exceeds 1 − β when the initial sample size is large. Govindarajulu (2003) extended these results to the case where the responses are from arbitrary distributions. In the following we present these results, which are valid for responses from an arbitrary distribution.

5.7.2 Formulation of the Problem

Suppose the two treatment responses X and Y have unknown means μ_1 and μ_2 and unknown common variance σ². We further assume that σ² is not functionally related to μ_1 and μ_2. We wish to test H_0: μ_1 = μ_2 against the alternative H_1: μ_1 < μ_2, with specified error probabilities α and β at μ_2 = μ_1 + δ*, where δ* is specified. Since the clinical trial is double-blind, we do not know to which treatment the response belongs. If U denotes the response,

U = X, if the observation is on a patient assigned to treatment 1,
    Y, if the observation is on a patient assigned to treatment 2.

If n_i denotes the number of patients assigned to treatment i (i = 1, 2), we take n_1 = n_2 = n/2, where n denotes the total number of patients. Since equal numbers of patients are allocated to each treatment,

P(U = X) = P(U = Y) = 1/2.  (5.7.1)

Consequently one can easily obtain

E(U) = (μ_1 + μ_2)/2 and var U = σ² + (μ_2 − μ_1)²/4,  (5.7.2)

where E(X) = μ_1, E(Y) = μ_2 and var(X) = var(Y) = σ². If D_n = Σ_{i=1}^n U_i²/n, and X̄_{n_1} and Ȳ_{n_2} denote the sample means, we have the identity

(5.7.3)

If α and β denote the error probabilities at the alternative μ_2 = μ_1 + δ*, then Shih (1992) obtains

n_1 = n_2 = 2(z_α + z_β)² (σ*/δ*)²,  (5.7.4)

where σ* is an initially guessed value of σ and z_a = Φ⁻¹(1 − a), Φ denoting the standard normal distribution function. Let σ̂_n denote an estimate of σ and let

M = n_1 (σ̂_n/σ*)².  (5.7.5)

M = n1 (5.7.5) Then, N, the total number of observations on each treatment is defined as

(5.7.6)

Draw N - n1 additional observations from each treatment. Then the decision rule is: Reject HO when (5.7.7) and accept HO otherwise. We take as in Govindarajulu (2002), n (5.7.8)

Now one can ask (assuming that the responses follow arbitrary distributions):

(i) What is the effective level of significance of this procedure?

(ii) What is the effective power at the specified alternative?

In the following we will provide answers to these questions.
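Before doing so, the two-stage arithmetic of (5.7.4)-(5.7.6) can be sketched as follows (the ceiling rounding and the form M = n_1(σ̂_n/σ*)² follow the reconstruction used here and should be treated as assumptions):

```python
import math

def norm_ppf(p):
    """Standard normal quantile by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def per_arm_sizes(alpha, beta, delta_star, sigma_star, sigma_hat):
    """First-stage size n1 from the guessed sigma*, re-estimated size M from
    the blinded interim estimate, and final per-arm size N = max(n1, M)."""
    za, zb = norm_ppf(1 - alpha), norm_ppf(1 - beta)
    n1 = math.ceil(2 * (za + zb) ** 2 * (sigma_star / delta_star) ** 2)
    M = math.ceil(2 * (za + zb) ** 2 * (sigma_hat / delta_star) ** 2)
    return n1, max(n1, M)
```

If the interim estimate σ̂_n does not exceed the initial guess σ*, no additional observations are required, which is the case θ ≥ 1 analyzed below.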

The Effective Level of Significance The following lemmas are needed.

Lemma 5.7.1 Let

(5.7.9)

Then as n becomes large

Z²/(n − 1) → 0, when H_0 is true,
Z²/(n − 1) → δ*²/(4σ²), when μ_2 − μ_1 = δ*,

in probability.

Proof. When H_0 is true, Z has a standard normal distribution as n becomes large. When μ_2 − μ_1 = δ*, one can write Z in terms of a statistic Z̃ which has (asymptotically) a standard normal distribution. Thus Z/(n − 1)^{1/2} tends to zero (in probability) when H_0 holds and to δ*/2σ when μ_2 − μ_1 = δ*. ∎

Lemma 5.7.2 As n becomes large,

σ̂_n²/σ² → 1, when H_0 holds,
σ̂_n²/σ² → 1 + δ*²/(4σ²), when μ_2 − μ_1 = δ*,  (5.7.10)

in probability.

Proof. Let x_i = (X_i − μ_1)/σ and y_j = (Y_j − μ_2)/σ, i, j = 1, 2, ..., n_1, and write σ̂_n² in terms of these standardized variables. By the weak law of large numbers, n_1⁻¹ Σ x_i² and n_1⁻¹ Σ y_j² tend to 1 in probability. Using Lemma 5.7.1 completes the proof of Lemma 5.7.2. ∎

Lemma 5.7.3 Let θ = (σ*/σ)². Then, as n gets large, in probability,

Mθ/n1 → 1, when H0 is true;  → 1 + δ*²/4σ², when μ2 − μ1 = δ*.   (5.7.11)

Proof. Readily follows upon noting that Mθ/n1 = σ̂n²/σ². ∎

Now let us consider the effective level of significance, which will be denoted by α*. Also, for the sake of simplicity, let z = zα. Then,

α* = P0{Ȳ_N − X̄_N > z σ̂n (2/N)^{1/2}},   (5.7.12)

where P0 denotes the probability computed when H0 is true. We can write

α* = P0{Ȳ_N − X̄_N > z σ̂n (2/N)^{1/2}, σ̂n ≤ σ*} + P0{Ȳ_N − X̄_N > z σ̂n (2/N)^{1/2}, σ̂n > σ*}
   = T1 + T2, respectively.   (5.7.13)

For sufficiently large n, when H0 is true, the event σ̂n ≤ σ* becomes θ ≥ 1, where θ = (σ*/σ)². Thus, using Lemma 5.7.2, we have

T1 ≈ P0(Z > z, θ ≥ 1) = α, if θ ≥ 1;  0, otherwise.

Next, T2 involves the re-estimated sample size M. Using Lemma 5.7.3 and Anscombe's theorem (1952), we obtain

T2 ≈ α, if θ < 1;  0, if θ ≥ 1.

Thus α* = T1 + T2 ≈ α for all values of θ. ∎

Effective Power at the Specified Alternative

Let β* = P*{Ȳ_N − X̄_N ≤ σ̂n (2/N)^{1/2} z}. Then we have the following result.

Result 5.7.1 We have

β* ≈ Φ( zα(1 + δ*²/4σ²)^{1/2} − (zα + zβ)θ^{1/2} ),  when θ ≥ 1 + δ*²/4σ²;
β* ≈ Φ( q(1 + δ*²/4σ²)^{1/2} ),  when θ < 1 + δ*²/4σ²,   (5.7.14)

where Φ(q) = β.

Proof. We can express β* as

β* = P*{Ȳ_N − X̄_N ≤ σ̂n (2/N)^{1/2} z, σ̂n ≤ σ*} + P*{Ȳ_N − X̄_N ≤ σ̂n (2/N)^{1/2} z, σ̂n > σ*}
   = T1* + T2*, respectively.

Note that since (σ̂n/σ)² → 1 + δ*²/4σ² in probability,

T1* ≈ 0, when θ < 1 + δ*²/4σ²;  and  T2* ≈ 0, when θ ≥ 1 + δ*²/4σ².

Thus, when θ ≥ 1 + δ*²/4σ², we can write

T1* ≈ P{Z̃ ≤ z σ̂n/σ − δ*(n1/2σ²)^{1/2}},

where Z̃ = (Ȳn1 − X̄n1 − δ*)(n1/2σ²)^{1/2} is asymptotically normal(0, 1) for large n1. Also, since n1 = 2(zα + zβ)²(σ*/δ*)², we obtain

T1* ≈ Φ( zα(1 + δ*²/4σ²)^{1/2} − (zα + zβ)θ^{1/2} ).

Next, when θ < 1 + δ*²/4σ², using Lemma 5.7.3 and Anscombe's theorem (1952) on the asymptotic standard normality of

(Ȳ_M − X̄_M − δ*) { M / [2σ²(1 + δ*²/4σ²)] }^{1/2},

we obtain

T2* ≈ Φ( q(1 + δ*²/4σ²)^{1/2} ),  when θ < 1 + δ*²/4σ²;  T2* ≈ 0, when θ ≥ 1 + δ*²/4σ²,

where Φ(q) = β. This completes the proof of Result 5.7.1. ∎

Govindarajulu (2002, 2003) has tabulated the values of (1 − β*)/(1 − β), as a percentage, for selected values of the parameters when θ ≥ 1 + δ*²/4σ². The following conclusions can be drawn from those tables.

(i) For fixed δ* and σ*, both δ*/σ and the gain in power increase as σ decreases. When σ*² − σ² ≥ δ*²/4, the gain in power is higher than when σ*² − σ² < δ*²/4. The values 0.2, 0.35 and 0.50 for δ*/σ are considered to be of interest in clinical trials.

(ii) The percentage gain in power is non-negative and is less than 3 percent for all practical values of δ*/σ.

Fixed-width Confidence Interval Estimation

Suppose we wish to estimate η = μ2 − μ1 with a confidence interval having width 2d and confidence coefficient γ. Let z = z_{(1+γ)/2} be such that 2Φ(z) − 1 = γ and, as before, let σ* be a preliminary estimate of σ. Then the number of patients to be assigned to each treatment is given by

n1 = 2(zσ*/d)².   (5.7.15)

Let σ̂n (where n = 2n1) denote an estimate of σ based on the blinded responses U1, U2, ..., Un. Then, according to the two-stage procedure, we stop at n1 if

σ̂n ≤ σ*;   (5.7.16)

otherwise we allocate M − n1 additional patients to each treatment, where M = n1(σ̂n/σ*)². Note that M/n1 = (σ̂n/σ*)². In other words, the total number of patients on each treatment is

N = n1, if σ̂n ≤ σ*;  M, if σ̂n > σ*.   (5.7.17)

Assume that n1 is sufficiently large, say > 30. After we stop, the confidence interval for η = μ2 − μ1 is (Ȳ_N − X̄_N) ± d. Further, we assume that after total experimentation the clinical trial is unblinded, so that we know which are the X and which are the Y observations. Of much interest is the effective coverage probability γ* of the resultant confidence interval. Towards this we have the following result.

Result 5.7.2 For sufficiently large n1, we have

γ* ≈ 2Φ[z(1 + η²/4σ²)^{1/2}] − 1, when θ < 1 + η²/4σ²;
γ* ≈ 2Φ(zθ^{1/2}) − 1, when θ ≥ 1 + η²/4σ².   (5.7.18)

Proof. We have

γ* = P{|Ȳ_N − X̄_N − η| ≤ d, σ̂n ≤ σ*} + P{|Ȳ_N − X̄_N − η| ≤ d, σ̂n > σ*} = T1 + T2.

Proceeding as in the proof of Result 5.7.1, one can show that

T1 ≈ 2Φ(zθ^{1/2}) − 1, when θ ≥ 1 + η²/4σ²;  0, otherwise,

and

T2 ≈ 2Φ[z(1 + η²/4σ²)^{1/2}] − 1, when θ < 1 + η²/4σ²;  0, otherwise,

where θ = (σ*/σ)². ∎

From (5.7.18) one notes that γ* > γ for all θ when n1 is large.

5.7.3 Binary Response

Shih and Zhao (1997) propose a design for sample size re-estimation with interim binary data for double-blind clinical trials. Based on a simulation study, they infer that the effect on the type I error and the nominal power is only slight. Govindarajulu (2004) derives closed-form expressions for the effective type I error probability and the power at the specified alternative. In the following we give the results of Govindarajulu (2004).

The Design and the Preliminaries

The following is the randomized design proposed by Shih and Zhao (1997). Assume that an interim analysis is conducted when the clinical trial is halfway completed from what was originally planned (i.e., when outcome data are available from a total of n* patients). Let n* be a positive integer which is a multiple of 2/π(1 − π), where 0 < π < 1 and π ≠ 1/2. Allocate at random n*/2 patients to stratum A and the rest to stratum B. In stratum A allocate πn*/2 patients to treatment 1 and the rest to treatment 2. In stratum B allocate (1 − π)n*/2 patients to treatment 1 and the rest to treatment 2. Note that the last allocation to treatments is double-blind. Let

nA,1 respond to treatment 1 in stratum A,
nA,2 respond to treatment 2 in stratum A,
nB,1 respond to treatment 1 in stratum B,
nB,2 respond to treatment 2 in stratum B.

Let pi be the probability that a patient responds to treatment i (i = 1, 2). Let θ1 denote the probability that a patient responds in stratum A and θ2 the probability that a patient responds in stratum B. Due to the double-blindness of the experiment, only nA,1 + nA,2 and nB,1 + nB,2 are observable. Also,

θ1 = πp1 + (1 − π)p2  and  θ2 = (1 − π)p1 + πp2.   (5.7.19)

Now solving for p1 and p2 and estimating, we obtain

p̂1 = [πθ̂1 − (1 − π)θ̂2]/(2π − 1)  and  p̂2 = [πθ̂2 − (1 − π)θ̂1]/(2π − 1),   (5.7.20)

where π ≠ 1/2 and θ̂1 and θ̂2 (the observed response proportions in strata A and B) are independent.
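The inversion in (5.7.20) can be sketched as follows (Python; the function name is ours):

```python
def unblinded_p_estimates(theta1_hat, theta2_hat, pi):
    """Invert the blinded mixture relations (5.7.19) as in (5.7.20);
    requires pi != 1/2."""
    if abs(2 * pi - 1) < 1e-12:
        raise ValueError("pi = 1/2 makes (p1, p2) non-identifiable")
    p1 = (pi * theta1_hat - (1 - pi) * theta2_hat) / (2 * pi - 1)
    p2 = (pi * theta2_hat - (1 - pi) * theta1_hat) / (2 * pi - 1)
    return p1, p2
```

With π = 0.2 and true (p1, p2) = (0.5, 0.3), the stratum response rates are θ1 = 0.34 and θ2 = 0.46, and the sketch recovers (0.5, 0.3).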

One can easily see that p̂1 and p̂2 are unbiased for p1 and p2, respectively, since θ̂1 and θ̂2 are unbiased for θ1 and θ2, respectively. Further, the variances and the covariance of (p̂1, p̂2) follow from those of (θ̂1, θ̂2). From this we obtain p̂1 + p̂2 = θ̂1 + θ̂2, and hence, with p̂ = (p̂1 + p̂2)/2,

var(p̂) = (var θ̂1 + var θ̂2)/4.

We are interested in testing

H0 : p1 = p2 versus H1 : p1 ≠ p2.

Let α denote the type I error probability and 1 − β the power at p1 = p1* and p2 = p2*, where p1* and p2* are specified. Let n* denote the required number of patients on each treatment, which is assumed to be reasonably large. We are given that

(5.7.21)

where p̄ = (p1 + p2)/2, p̂ = (p̂1 + p̂2)/2, z = z_{α/2}, and P0 denotes the probability computed when H0 is true. Also given is

1 − β = power at (p1*, p2*),   (5.7.22)

where P* denotes the probability computed when (p1, p2) = (p1*, p2*). Note that when n*/2 is large, p̂1 and p̂2 being consistent estimators of p1 and p2, respectively, we can replace p̂(1 − p̂) by p̄(1 − p̄) in (5.7.21) and by p*(1 − p*) in (5.7.22), where p* = (p1* + p2*)/2. Also, let η = p1 − p2. Then one can easily establish that

n* = 2(z_{α/2} + zβ)² p*(1 − p*)/η*².   (5.7.23)

Note that (5.7.23) is known as Lachin's (1977) formula, an elementary proof of which is in Govindarajulu (2004, Result 2.1). Now, use p̂1 and p̂2 in order to update the sample size. Let

ñ = 2(z_{α/2} + zβ)² p̂(1 − p̂)/η̂²,  η̂ = p̂1 − p̂2.   (5.7.24)

Then we have the following rule:

If ñ > n*, increase the sample size on each treatment to ω1n* (typically 1 < ω1 ≤ 4.1). If ñ < n*, decrease the sample size on each treatment to ω2n* (0.6 ≤ ω2 < 1). After the sample size re-estimation, the trial will be conducted according to the newly estimated sample size (without stratification). The treatment groups are unblinded and compared at the final stage using all the patients' data. Typically, π is set to 0.2 or 0.8 (and not near 0.5). Next we will study the effect of the sample size re-estimation on the level of significance and the power at the specified alternative. Let N denote the selected sample size per treatment.
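Lachin's formula (5.7.23) and the ω1/ω2 update rule can be sketched in Python (the ceilings are our convention). As a check, the formula reproduces the values n* = 127 and n* = 205 used in Examples 5.7.3 and 5.7.4 below.

```python
import math
from statistics import NormalDist

def lachin_n(alpha, beta, p_star, eta_star):
    """Lachin's (1977) per-treatment sample size, Eq. (5.7.23)."""
    z = NormalDist().inv_cdf
    n = (2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2
         * p_star * (1 - p_star) / eta_star ** 2)
    return math.ceil(n)

def reestimated_size(n_tilde, n_star, w1, w2):
    """Shih-Zhao update: w1*n* if the interim estimate exceeds n*,
    w2*n* if it falls short, n* otherwise."""
    if n_tilde > n_star:
        return math.ceil(w1 * n_star)
    if n_tilde < n_star:
        return math.ceil(w2 * n_star)
    return n_star
```

For example, lachin_n(0.05, 0.10, 0.4, 0.2) returns 127, and with ω1 = 4.1 an interim ñ above n* inflates the trial to 4.1 × 127 patients per treatment.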

The Effective Level of Significance

Let

γ(p1, p2) = (p1 + p2)(2 − p1 − p2)/(p1 − p2)².   (5.7.25)

When H0 is true, i.e., when p1 = p2, γ(p̂1, p̂2) tends to infinity as n* becomes large. Hence the probability measure of the set D = {ñ > n*} tends to one. Let ᾱ be the effective level of significance of the test procedure. Then

ᾱ = P0{reject H0, D} + P0{reject H0, Dᶜ} = T1 + T2, respectively,

where Dᶜ denotes the complement of the event D, and the estimates p̂1N, p̂2N and p̂N of p1, p2 and p̄ = (p1 + p2)/2, respectively, are based on the unblinded data (i.e., after the treatment codes are broken). Also, due to the binomial and independent nature of the random variables nA,1, nA,2, nB,1 and nB,2, and the fact that n*/2 is large, we have the following representations (in distribution):

nA,1 ≈ (πn*/2)p1 + Z1[(πn*/2)p1q1]^{1/2},
nA,2 ≈ [(1 − π)n*/2]p2 + Z2[(1 − π)n* p2q2/2]^{1/2},

and similarly for nB,1 and nB,2 with Z1* and Z2*,   (5.7.26)

where Z1, Z2, Z1* and Z2* are mutually independent standard normal variables. Now, using (5.7.19) and (5.7.26) in (5.7.20) and simplifying, Govindarajulu (2004, Eq. (3.5)) obtains

(5.7.27)

Letting

U1 = √π Z1 + √(1 − π) Z2  and  U2 = √(1 − π) Z1* − √π Z2*,   (5.7.28)

when p1 = p2, Govindarajulu (2004, Eq. (3.7)) obtains

(5.7.29)

Also, one can readily get

(5.7.30)

Thus, when p1 = p2, we can write the set D as

(5.7.31)

Recall that in the second stage, ω1n* − n*/2 patients are given treatment 1 and the same number of patients are assigned to treatment 2. Let X1 be the number of patients responding to treatment 1 out of the (ω1 − 1/2)n* patients. Then

p̂1N = p1 + (1/ω1)[U1/√2 + (ω1 − 1/2)^{1/2} Z3](p1q1/n*)^{1/2},   (5.7.32)

after using representations (5.7.26), (5.7.28) and

X1 ≈ (ω1 − 1/2)n* p1 + Z3[(ω1 − 1/2)n* p1q1]^{1/2}.

Similarly, letting X2 denote the number of patients responding to treatment 2 out of (ω1 − 1/2)n* patients, we obtain

p̂2N = p2 + (1/ω1)[U2/√2 + (ω1 − 1/2)^{1/2} Z4](p2q2/n*)^{1/2},   (5.7.33)

after using representations (5.7.26), (5.7.28) and

X2 ≈ (ω1 − 1/2)n* p2 + Z4[(ω1 − 1/2)n* p2q2]^{1/2}.

Note that Z3 and Z4 are independent standard normal and independent of U1 and U2. Thus, from (5.7.32) and (5.7.33), when H0 is true (i.e., p1 = p2 = p, with q = 1 − p), letting L1 = (U1 − U2)/√2 and L2 = (Z3 − Z4)/√2 and simplifying, we have

p̂1N − p̂2N = (1/ω1)(pq/n*)^{1/2}[L1 + (2ω1 − 1)^{1/2} L2],

where L1 and L2 are independent and approximately standard normal variables. Proceeding in an analogous manner, we obtain

|L1 + (2ω2 − 1)^{1/2} L2| > z(2ω2)^{1/2},  |L1| < c0,   (5.7.36)

where c0 is the cut-off determined by the set D (c0 = 1.3746 for the parameter values used in the examples below). Hence, for sufficiently large n* we have

(5.7.37)

where Yi = {L1 + (2ωi − 1)^{1/2} L2}/(2ωi)^{1/2}, i = 1, 2, and Y3 = L1. Note that (Yi, Y3) is standard bivariate normal (for i = 1, 2) with

cov(Y1, Y3) = ρ1 = E(Y1Y3) = E(L1²)/(2ω1)^{1/2} = (2ω1)^{−1/2}   (5.7.38)

and

cov(Y2, Y3) = ρ2 = E(Y2Y3) = E(L1²)/(2ω2)^{1/2} = (2ω2)^{−1/2}.   (5.7.39)

From (5.7.37) we obtain

(5.7.41)

(see Govindarajulu (2004, Lemma A.1)).

Example 5.7.1 Let π = 0.2 or 0.8, ω1 = 4.1, ω2 = 0.6, α = 0.05 and β = 0.10. Then ρ1 = (2ω1)^{−1/2} = 0.3492, ρ2 = (2ω2)^{−1/2} = 0.9129, and computations yield

ᾱ − α ≈ −2 ∫_{1.3746}^∞ φ(x)[Φ(−2.0917 + 0.3727x) − Φ(−2.0917 − 0.3727x)] dx
      + 2 ∫_{1.3746}^∞ φ(x)[Φ(−4.8010 + 2.236x) − Φ(−4.8010 − 2.236x)] dx
    ≈ −2 ∫_{1.3746}^∞ φ(x)Φ(−2.0917 + 0.3727x) dx + 2 ∫_{1.3746}^∞ φ(x)Φ(−4.8010 + 2.236x) dx
    = 2(−0.006932 + 0.023525) = 0.0332.
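The two surviving integrals can be checked numerically. The Python sketch below (Simpson's rule with the upper limit truncated at 8, which is harmless since the integrand is negligible there) reproduces the value ᾱ − α = 0.0332 reported in this example, using our reading of the two integrals:

```python
import math

def Phi(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tail_integral(a, b, lo=1.3746, hi=8.0, steps=4000):
    """Simpson's rule for the integral of phi(x)*Phi(a + b*x) over (lo, hi)."""
    h = (hi - lo) / steps
    f = lambda x: phi(x) * Phi(a + b * x)
    s = f(lo) + f(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3.0

excess = 2.0 * (-tail_integral(-2.0917, 0.3727) + tail_integral(-4.8010, 2.236))
# excess is the inflation alpha_bar - alpha, roughly 0.033
```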

Example 5.7.2 Let π = 0.2 or 0.8, ω1 = 2.54, ω2 = 0.6, α = 0.05 and β = 0.10. Then ρ1 = 0.4436, ρ2 = 0.9129 and

ᾱ − α ≈ −2 ∫_{1.3746}^∞ φ(x)[Φ(−2.187 + 0.495x) − Φ(−2.187 − 0.495x)] dx
      + 2 ∫_{1.3746}^∞ φ(x)[Φ(−4.801 + 2.236x) − Φ(−4.801 − 2.236x)] dx
    ≈ −2 ∫_{1.3746}^∞ φ(x)Φ(−2.187 + 0.495x) dx + 2 ∫_{1.3746}^∞ φ(x)Φ(−4.801 + 2.236x) dx
    = 2(−0.008869 + 0.023525) = 0.0293.

Remark 5.7.1 Examples 5.7.1 and 5.7.2 indicate that there is about a 60% increase in the nominal level of significance, whereas Shih and Zhao (1997) claim that the increase is only slight. It is recommended that the nominal α be small, say 0.01.

Effective Power at the Specified Alternative

We wish to obtain explicit expressions for the effective power ξ = ξ(p1*, p2*) at the alternative pi = pi* (i = 1, 2). Note that the nominal power at the specified alternative is 1 − β. By definition,

ξ = P*{reject H0},

where P* denotes the probability evaluated when (p1, p2) = (p1*, p2*). Instead of torturing the reader with all the technical details, we simply give the final result as given in Govindarajulu (2004, Eq. (4.18)):

where

B = [p*(1 − p*)]^{1/2},  p* = (p1* + p2*)/2,
Δi² = pi* qi*, i = 1, 2,
σ1² = [πΔ1² + (1 − π)Δ2²][πC1 − (1 − π)C2]² + [(1 − π)Δ1² + πΔ2²][πC2 − (1 − π)C1]²,
σ2² = Δ1² + Δ2²,
ρi = (σ1σ2)⁻¹(2π − 1)(C1Δ1² − C2Δ2²)(2ωi)^{−1/2}, i = 1, 2,
C1 = (1/2)η*⁻³[η*(1 − 2p*) − 4(p* − p*²)],
C2 = (1/2)η*⁻³[η*(1 − 2p*) + 4(p* − p*²)],
η* = p1* − p2*.

Let us consider some numerical examples.

Example 5.7.3 Let π = 0.2, p1* = 0.5, p2* = 0.3 (yielding η* = 0.2, p* = 0.4), ω1 = 4.1, ω2 = 0.6, n* = 127, α = 0.05 and 1 − β = 0.90. Then computations yield

Δ1² = 0.25, Δ2² = 0.21, B = 0.4899,
C1 = −57.5, C2 = 62.5,
σ1 = 40.6536, σ2 = (0.46)^{1/2} = 0.6782,
ρi = 0.5984/(2ωi)^{1/2}, i = 1, 2, i.e., ρ1 = 0.2090, ρ2 = 0.5463.

Hence

ξ ≈ 1 − ∫_0^∞ φ(x)[Φ(−4.834 − 0.2137x) − Φ(−8.929 − 0.2137x)] dx
   − ∫_0^∞ φ(x)[Φ(−0.6832 + 0.6522x) − Φ(−5.4635 + 0.6522x)] dx
  ≈ 1 − 0.2183 = 0.782,

which is much lower than the nominal power, whereas Shih and Zhao (1997) obtain 0.9430 based on 500 simulations.
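The constants quoted in this example can be reproduced from the where-list of Eq. (4.18); the small Python check below codes our reading of that list:

```python
import math

# Parameters of Example 5.7.3: pi = 0.2, p1* = 0.5, p2* = 0.3, w1 = 4.1, w2 = 0.6
pi_, p1, p2, w1, w2 = 0.2, 0.5, 0.3, 4.1, 0.6
eta = p1 - p2                                 # eta* = 0.2
p_bar = (p1 + p2) / 2.0                       # p* = 0.4
D1sq, D2sq = p1 * (1 - p1), p2 * (1 - p2)     # Delta_i^2 = p_i* q_i*
B = math.sqrt(p_bar * (1 - p_bar))
C1 = 0.5 * eta ** -3 * (eta * (1 - 2 * p_bar) - 4 * (p_bar - p_bar ** 2))
C2 = 0.5 * eta ** -3 * (eta * (1 - 2 * p_bar) + 4 * (p_bar - p_bar ** 2))
s1 = math.sqrt((pi_ * D1sq + (1 - pi_) * D2sq) * (pi_ * C1 - (1 - pi_) * C2) ** 2
               + ((1 - pi_) * D1sq + pi_ * D2sq) * (pi_ * C2 - (1 - pi_) * C1) ** 2)
s2 = math.sqrt(D1sq + D2sq)
rho_common = (2 * pi_ - 1) * (C1 * D1sq - C2 * D2sq) / (s1 * s2)
rho1 = rho_common / math.sqrt(2 * w1)
rho2 = rho_common / math.sqrt(2 * w2)
```

Running it gives C1 = −57.5, C2 = 62.5, σ1 ≈ 40.6536, σ2 ≈ 0.6782, ρ1 ≈ 0.2090 and ρ2 ≈ 0.5463, in agreement with the values quoted in the example.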

Example 5.7.4 Let π = 0.2, p1* = 0.4, p2* = 0.25 (yielding η* = 0.15, p* = 0.325), ω1 = 2.54, ω2 = 0.6, n* = 205, α = 0.05 and 1 − β = 0.90. Then

Δ1² = 0.24, Δ2² = 0.1875, B = 0.4684,
C1 = −122.222, C2 = 137.778,
σ1 = 84.8283, σ2 = 0.6538,
ρi = 0.5968/(2ωi)^{1/2}, i = 1, 2.

Thus ρ1 = 0.2648, ρ2 = 0.5448 and

ξ ≈ 1 − ∫_0^∞ φ(x)[Φ(−3.3698 − 0.2746x) − Φ(−7.4884 − 0.2746x)] dx
   − ∫_0^∞ φ(x)[Φ(−0.6662 + 0.6496x) − Φ(−5.4022 + 0.6496x)] dx
  ≈ 1 − 0.2210 = 0.779.

Shih and Zhao (1997) obtain 0.9400 for the power based on 500 simulation runs. Thus the effective power is much lower than the specified power at the alternative. In the totally unblinded case, Govindarajulu (2004) shows that the type I error probability is under control and the power increases slightly. Thus it seems that the randomized response model adopted in the blinded case is not robust. So one should abandon the creation of strata A and B, while still retaining the blindedness of the trial.

5.8 Problems

5.1-1 In the Robbins–Monro process set an = 1/n and a = 0. Further assume that Yn(xn) = xn² − 2. Start with x1 = 1 and iterate using the recurrence xn+1 = xn + (1/n)(2 − xn²), stopping the first time two successive values of xn coincide. [Hint: you should stop when xn is close to √2 = 1.414.]

5.1-2 Assume that M(x) = α + α1(x − θ), where, without loss of generality, we can set α = 0 and θ = 0. That is, M(x) = α1x and var{Y(x)} = σ². Now, since xn+1 = xn − an Yn(xn), taking the conditional expectation on both sides for given xn and iterating the resultant expression,

(a) show that

E(xn+1) = x1 ∏_{i=1}^n (1 − ai α1),

where x1 denotes the initial value.

(b) Also, squaring xn+1, taking conditional expectations for given xn and iterating, obtain the corresponding expression for E(x²n+1), and hence var(xn+1).

Further, if an = c/n, E(xn+1) and E(x²n+1) take on much simpler expressions. For details, see Govindarajulu (2001, p. 133).
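The iteration of Problem 5.1-1 is easy to run numerically; a Python sketch (the stopping tolerance below is our own choice in place of "two values coincide"):

```python
def robbins_monro(x1=1.0, tol=1e-6, max_iter=100000):
    """Problem 5.1-1: iterate x_{n+1} = x_n + (1/n)*(2 - x_n^2) until two
    successive iterates agree to within tol."""
    x, n = x1, 1
    while n < max_iter:
        x_next = x + (2.0 - x * x) / n
        if abs(x_next - x) < tol:
            return x_next
        x, n = x_next, n + 1
    return x
```

Starting from x1 = 1 the iterates pass through 2, 1, 1.33, ... and settle near √2 ≈ 1.414, as the hint suggests.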

5.3-1 For the following sequences of trial results, obtain estimates of LD50 (using the tables of Dixon (1970)):

(a) 0001010  (b) 01101011  (c) 001101010

5.4-1 Let h = 0.2, D = 0.62, γ = 0.90, n = 3, δ = 1.25 and σ = 1. If x0 = 0.05, rule (5.4.9) yields k* = 11. Suppose the data is

... x−1 x0 x1 ...
0 0 0 0 0 0 0 0 1 1 2 0 1 1 1 2 3 3 3 3 3 3 3 3 3 3

Carry out the sequential Spearman–Karber estimation procedure and see whether you stop with K = 12. If you stop, provide an estimate of θ.

5.4-2 Carry out the sequential risk-efficient procedure for the following data (assume that θ = 0.625, σ = 0.5, h = 0.2, n = 3 and c = 0.0006), and if you stop, obtain the estimate of θ.

5.5-1 Let X have a probability density function with parameter θ. Carry out a repeated significance test for H0 : θ ≤ 0 versus H1 : θ > 0 using the following data

0.8, −1.2, 1.7, 2.1, −0.6, 1.4, 1.9, −0.4, 1.5, 2.7

with N = 20, exp(a) = 10, exp(b) = 8.

5.5-2 Suppose for Lucky Hospital we have the following data for a certain disease:

                 Dead   Alive   Sample size
Population I      20      80       100
Population II     10      70        80
Total             30     150       180

Suppose we wish to test H0 : p1 = p2 versus H1 : p1 ≠ p2; carry out the Mantel–Haenszel test for the above data.
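For a single 2×2 table such as this, the Mantel–Haenszel statistic essentially reduces to the large-sample comparison of two proportions; here is a Python sketch of that simpler comparison (not the full stratified Mantel–Haenszel machinery):

```python
import math

def two_prop_z(d1, n1, d2, n2):
    """Large-sample z statistic for H0: p1 = p2 from one 2x2 table,
    using the pooled proportion in the standard error."""
    p1_hat, p2_hat = d1 / n1, d2 / n2
    p_pool = (d1 + d2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1.0 / n1 + 1.0 / n2))
    return (p1_hat - p2_hat) / se

z = two_prop_z(20, 100, 10, 80)   # Lucky Hospital data: 20/100 vs 10/80 dead
```

Here z ≈ 1.34, which is below 1.96, so H0 would not be rejected at the 5% level by this approximation.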

5.5-3 In Example 5.5.3, after reparameterization, obtain explicit expressions for the statistics Z and V.

Chapter 6

Matlab Programs in Sequential Analysis

6.1 Introduction

The primary purpose of this supplement¹ is to help users carry out sequential procedures with real data using a minimum of long-hand calculation, and to obtain decent numerical and graphical summaries of the sequential procedures. The manual contains a series of programs in Matlab (one of the most frequently used programming languages on university campuses), implementing the most well-known and widely utilized procedures of sequential analysis. Each program is essentially a sample that can be (and therefore should be) changed to fit the user's needs. The programs are accompanied by a short description of the procedure and a list of arguments, such as the values of the parameter under H0 and H1, the significance level of the test, the coverage probability of the confidence interval, etc. The following is a list of the procedures and the names of the corresponding Matlab functions (sorted in the order of their appearance in the textbook):

• Sequential probability ratio test (SPRT), sprt
• Restricted SPRT (Anderson's triangular test), restsprt
• Rushton's sequential t-test, rttest
• Sequential t-test, ttest
• Sequential t²-test, tsqtest
• Hall's sequential test, hall
• Rank order SPRT, rankordersprt
• Stein's two-stage procedure (confidence interval), steinci
• Stein's two-stage test, steint
• Robbins' power one test, robbins
• Cox's sequential estimation procedure, cox

¹Dr. Alex Dmitrienko, while he was a graduate student in the Department of Statistics, University of Kentucky, helped me in preparing these computer programs in Matlab, for which I am very thankful to him.

Each of these functions is saved in the file named functionname.m, where functionname is simply the name of the function (for example, sprt is saved in the file sprt.m). The source code of the functions, with descriptions and comments, is given in Section 2. In case you do not wish to type them in manually, you are welcome to download the functions from http://www.ms.uky.edu/~alexei/matlab. Furthermore, you can either prepare by yourself (see Section 3) or download a library of frequently used probability density functions (p.d.f.'s) and probability functions (p.f.'s). This is a list of the p.d.f.'s and p.f.'s available and their Matlab names:

• Bernoulli, bernd
• Beta, betad
• Binomial, bind
• Cauchy, cauchyd
• Double exponential, doubled
• Exponential, expd
• Gamma, gammad
• Normal, normald
• Poisson, poissond
• t-distribution, td
• Uniform, uniformd
• Weibull, weibulld

What follows is the list of Matlab files used by the functions described in this manual. You need to place them in your directory.

• decision.m
• output.m
• ttestbound.m

The source code (in case you decide to change these files):

decision.m:

function p=decision(cont);
% Produces a caption for the graph
if cont==-1
    p='Accept the hypothesis at stage %2.0f\n';
elseif cont==1
    p='Reject the hypothesis at stage %2.0f\n';
else
    p='The procedure didn''t stop';
end;

output.m:

function p=output(c,str,title,a,b,filename);
% Saves the results (matrix c) in the file "filename"
[l,m]=size(c);
fid=fopen(filename,'w');
title1=[title '\n\n'];
fprintf(fid,title1);
fprintf(title1);
if l==2
    fprintf(fid,'k=%2.0f s=%6.3f\n',c);
    fprintf('k=%2.0f s=%6.3f\n',c);
elseif l==3
    fprintf(fid,'k=%2.0f q=%6.3f r=%6.3f\n',c);
    fprintf('k=%2.0f q=%6.3f r=%6.3f\n',c);
else
    fprintf(fid,'k=%2.0f s=%6.3f c1=%6.3f c2=%6.3f\n',c);
    fprintf('k=%2.0f s=%6.3f c1=%6.3f c2=%6.3f\n',c);
end;
if (a~=0)&(b~=0)
    fprintf(fid,'\n\na=%6.3f b=%6.3f\n',a,b);
    fprintf('\n\na=%6.3f b=%6.3f\n',a,b);
end;
fclose(fid);

ttestbound.m:

function p=ttestbound(delta,a,n);
% Bounds for sequential t- and t-square-tests
mu=sqrt(2*n-3);
d_table=linspace(.1,1,10); % Grid of delta values for t0_table
t0_table=[.002497 .00995 .02225 .03922 .060625 ...
          .08618 .1156 .1484 .1844 .2232];
if delta>1
    fprintf('The value of delta will be rounded down to 1');
    delta=1;
end;
i=floor(delta*10);
t0=t0_table(i);
c=1+t0^2;
t1=((delta)^2+log(a)+0.25*log(c))/(t0-sqrt(c));
u=-((t0)^3-6*t0)/24;
t2=((sqrt(1-t0^2)-t0)*t0^2-t0*t1+2*u/c)/(2*sqrt(c)*(t0-sqrt(c)));
p=-sqrt(2)*(t0*mu+t1/mu+t2/mu^3)/delta;

6.2 Sequential Procedures

6.2.1 Sequential Probability Ratio Test (SPRT)

Problem. Given a sequence of independent identically distributed observations X1, X2, ..., having p.d.f. (p.f.) f(x), we wish to test H0 : f(x) = f0(x) vs. H1 : f(x) = f1(x).

Procedure. Let

Sn = Σ_{i=1}^n [ln f1(Xi) − ln f0(Xi)],  n ≥ 1.

At the nth stage, accept H0 if Sn ≤ b, reject H0 if Sn ≥ a, and continue sampling if b < Sn < a, where b = ln(β/(1 − α)), a = ln((1 − β)/α), and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, the name of the output file filename.

Example. Assume that we have Bernoulli data with parameter p and we wish to test H0 : p = 1/3 vs. H1 : p = 1/2. The following Matlab function carries out the SPRT for the data set x.

% SPRT
% Arguments
x=[1 1 1 0 0 1 0 1 1 1 0 1 1 1 1]; % Observations
alpha=.1; % Error probabilities
beta=.1;
filename='sprt.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha);
b=log(beta/(1-alpha)); % Upper and lower bounds
s(1)=log(bernoulli(x(1),1/3))-log(bernoulli(x(1),0.5));
i=2; cont=0;

% SPRT
while (i<=n)&(~cont),
    s(i)=s(i-1)+log(bernoulli(x(i),1/3))-log(bernoulli(x(i),0.5));
    if s(i)<=b
        cont=-1;
    elseif s(i)>=a
        cont=1;
    end;
    i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s];
output(results,str,'SPRT',a,b,filename);

6.2.2 Restricted SPRT (Anderson's Test)

Problem. Same as in the SPRT.

Procedure is similar to that in the SPRT; the only difference is that Anderson's (1960) test uses convergent bounds. Let Sn be defined as in Procedure 1. Then at the nth stage, accept H0 if Sn ≤ −c − dn, reject H0 if Sn ≥ c + dn, and continue sampling if −c − dn < Sn < c + dn, where c and d are respectively the intercept and the slope of the convergent bounds.

Arguments. The data set x, intercept and slope of the bounds c, d, the name of the output file filename.

Example. Assume that we have normal data with mean θ and variance 1. We wish to test H0 : θ = −1 vs. H1 : θ = 1. The following function carries out the restricted SPRT for the data set x.

% Restricted SPRT (Normal(-1,1) vs. Normal(1,1))
% Arguments
x=[-0.5 1.2 0.7 -1.4 0.7 0.4 -0.9 1.1 1.5]; % Observations
c=4; % Intercept
d=-0.3; % Slope
filename='restsprt.txt';
n=length(x); % Number of observations
s(1)=log(normal(x(1),-1,1))-log(normal(x(1),1,1));
i=2; cont=0;

% SPRT
while (i<=n)&(~cont),
    s(i)=s(i-1)+log(normal(x(i),-1,1))-log(normal(x(i),1,1));
    if s(i)<=-c-d*i
        cont=-1;
    elseif s(i)>=c+d*i
        cont=1;
    end;
    i=i+1;
end;

% Plotting the path and the convergent bounds
k=1:(i-1);
c1=-c-d*k; % Lower bound
c2=c+d*k; % Upper bound
str=sprintf(decision(cont),i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Normal(-1,1) vs. Normal(1,1): Restricted SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s;c1;c2];
output(results,str,'Restricted SPRT',0,0,filename);

6.2.3 Rushton's Sequential t-test

Problem. Assume that X1, X2, ..., are normal(θ, σ²), both θ and σ² are unknown, and we want to test H0 : θ = θ0 vs. H1 : θ − θ0 ≥ δσ, where δ is a specified number.

Procedure. Rushton (1950) proposed to use the following algorithm. With Tn denoting the sequential t-statistic, let

qn = (δTn)²/4 + δTn √(n − 1),
rn = (δTn)²/4 + δTn √(n − 1) [1 − 1/(4(n − 1)) + (δTn)²/(24(n − 1))].

Then at stage n, stop if qn ≤ b or qn ≥ a. If rn satisfies the same inequality, make the appropriate decision (i.e., accept H0 if qn ≤ b or reject H0 if qn ≥ a); otherwise take one more observation. Here b = ln(β/(1 − α)), a = ln((1 − β)/α), and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : θ − 5 ≥ 0.2σ. The following function carries out Rushton's t-test for the data set x.

% Rushton's sequential t-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='rttest.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha); % Upper and lower bounds
b=log(beta/(1-alpha));
i=2; cont=0; q(1)=0; r(1)=0;
sum=x(1);
sumsq=(x(1)-theta)^2;

% Sequential t-test
while (i<=n)&(~cont),
    sum=sum+x(i);
    sumsq=sumsq+(x(i)-theta)^2;
    T=(sum-i*theta)/sqrt(sumsq); % T-statistic

    % Rushton's statistics
    q(i)=0.25*(delta*T)^2+delta*T*sqrt(i-1);
    r(i)=0.25*(delta*T)^2+delta*T*sqrt(i-1)*(1-1/(4*(i-1))+(delta*T)^2/(24*(i-1)));
    if (q(i)<=b)&(r(i)<=b)
        cont=-1;
    elseif (q(i)>=a)&(r(i)>=a)
        cont=1;
    end;
    i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,(i-1));
plot(k,q,k,r,k,c1,'--',k,c2,'--');
title('Rushton''s sequential t-test');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;q;r];
output(results,str,'Rushton''s sequential t-test',a,b,filename);

6.2.4 Sequential t-test

Problem. Same as in Rushton's sequential t-test (Subsection 2.3).

Procedure. Govindarajulu and Howard (1989) obtained the following modification of the sequential t-test. Let Tn be defined as in Subsection 2.3. At stage n, accept H0 if Tn ≤ Bn or reject H0 if Tn ≥ An. Here the bound Bn is

Bn = −√2 (t0μ + t1/μ + t2/μ³)/δ,  μ = √(2n − 3),

where t0 is the unique solution of an equation in δ (its values are tabulated in ttestbound.m), and

t1 = (3δ²/4 + ln B + (ln c)/4)/(t0 − √c),
t2 = [t0²(√(1 − t0²) − t0) − t0t1 + 2u(t0)/c] / [2√c (t0 − √c)],

with c = 1 + t0² and u(t) = −(t³ − 6t)/24. The formula for An is the same, except that A replaces B. Further, B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : θ − 5 ≥ 0.2σ. The following Matlab function carries out the sequential t-test for the data set x.

% Sequential t-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='ttest.txt';
n=length(x); % Number of observations
a=(1-beta)/alpha; % Upper and lower bounds
b=beta/(1-alpha);
i=2; t(1)=0; cont=0;
sum=x(1);
s=(x(1)-theta)^2;

% Sequential t-test
while (i<=n)&(~cont),
    sum=sum+x(i);
    s=s+(x(i)-theta)^2;
    t(i)=(sum-i*theta)/sqrt(s); % T-statistic
    c1(i)=ttestbound(delta,b,i); % Lower bound
    c2(i)=ttestbound(delta,a,i); % Upper bound
    if t(i)<c1(i)
        cont=-1;
    elseif t(i)>c2(i)
        cont=1;
    end;
    i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,(i-1));
plot(k,t,k,c1,'--',k,c2,'--');
title('Sequential t-test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;t;c1;c2];
output(results,str,'Sequential t-test',0,0,filename);

6.2.5 Sequential t²-test

Problem. Assume, as in Rushton's t-test (Subsection 2.3), that X1, X2, ..., are normal(θ, σ²) and both θ and σ² are unknown. Now we wish to test H0 : θ = θ0 versus the two-sided alternative H1 : |θ − θ0| ≥ δσ, where δ is a specified number.

Procedure. Govindarajulu and Howard (1989) also proposed a modification of the sequential t²-test. With Tn defined as before in Subsection 2.3, at the nth stage accept H0 if |Tn| ≤ Bn* or reject H0 if |Tn| ≥ An*, where the bounds Bn* and An* are defined exactly as Bn and An in Subsection 2.4, except that 2B replaces B and 2A replaces A. Again, B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : |θ − 5| ≥ 0.2σ. The following function carries out the sequential t²-test for the data set x.

% Sequential tsquare-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='tsqtest.txt';
n=length(x); % Number of observations
a=(1-beta)/alpha; % Upper and lower bounds
b=beta/(1-alpha);
i=2; t(1)=0; cont=0;
sum=x(1);
s=(x(1)-theta)^2;

% Sequential tsquare-test
while (i<=n)&(~cont),
    sum=sum+x(i);
    s=s+(x(i)-theta)^2;
    t(i)=abs((sum-i*theta)/sqrt(s)); % T-statistic
    c1(i)=ttestbound(delta,2*b,i); % Lower bound
    c2(i)=ttestbound(delta,2*a,i); % Upper bound
    if t(i)<c1(i)
        cont=-1;
    elseif t(i)>c2(i)
        cont=1;
    end;
    i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,(i-1));
plot(k,t,k,c1,'--',k,c2,'--');
title('Sequential tsquare-test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;t;c1;c2];
output(results,str,'Sequential tsquare-test',0,0,filename);

6.2.6 Hall's Sequential Test

Problem. Assume again that XI,X2,. . ., are normal (0,a2) and both 8 and a2 are unknown. We want to test HO: 8 = 0 against HI : 8 = S (S is specified). Procedure. Hall (1962) suggested to make use of the following two-stage pro- cedure. Take a preliminary sample of size m (m> 2) and compute the sample mean xmand the sample variance s& from this sample. Then define

am = -1na + (Ina)2/(rn - 11, bm = In@- (In,0)2/(rn - 1). Now for all n 2 m + 1, let rn = nS(Xn - S/2)/sk and at stage n 2 m + 1, accept Ho if rn 5 b, or reject Ho if rn > am. Here a and p are error probabilities. Arguments. The data set x, error probabilities alpha,beta,delta=S, the size of the preliminary sample mO, the name of the output file filename. Example. Assume that we have normal data with mean 8 and unknown variance 02.We wish to test HO : 9 = 0 versus HI : B = 0.2. The following function carries out Hall's sequential t-test for the data set x. % Hall's sequential test

% Arguments x=[0.4 0.3 0.2 0.0 0.4 0.9 0.4 0.1 0.4 0.2 0.7 0.9 0.0 0.01; % Observations alpha=.05; % Error probabilities beta=.05; delta=O.2; m0=9; % Size of the pilot sample filename='hall.txt'; n=length(x) ; % Number of observations s=std(x(l:mO)) ; % Sample variance r (mO) =sum (x( I :mO) ) -mO*delta/2 ; j=mO+ I ; cont=O ; a=s*(-log(alpha)+(log(alpha) )-2/(mO-l))/delta; % Upper and lower bounds b=s* (log(beta) - (log(beta) ) ^2/(mO-I)) /delta;

% Second sample while (i<=n)&("cont), r (i) =r (i-I) +y (i) -delta/2 ; 280 CHAPTER 6. MATLAB PROGRAMS IN SEQUENTIAL ANALYSIS

if (r(i)a) cont=i ; end ; i=i+i ; end ; r (mO) =O ; d=decision(cont) ;

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,r,k,c1,'--',k,c2,'--');
title('Hall''s sequential test');
xlabel(str);
axis([1 n -3 3]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;r];
output(results,str,'Hall''s sequential test',a,b,filename);
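Hall's stopping bounds are simple closed forms, so they are easy to check outside Matlab. The following Python sketch (an illustration added here, not part of the book's Matlab listings; the function name hall_bounds is ours) computes the unscaled boundaries a_m and b_m of the procedure above; the Matlab program multiplies them by s/δ before comparing with r_n.

```python
import math

def hall_bounds(alpha, beta, m):
    # Hall's two-stage boundaries (before scaling by s/delta):
    # a_m = -ln(alpha) + (ln alpha)^2/(m-1),  b_m = ln(beta) - (ln beta)^2/(m-1)
    la, lb = math.log(alpha), math.log(beta)
    a = -la + la ** 2 / (m - 1)
    b = lb - lb ** 2 / (m - 1)
    return a, b
```

With alpha = beta = .05 and a pilot sample of size m = 9 this gives a_m ≈ 4.118 and b_m ≈ −4.118; the boundaries are symmetric whenever alpha = beta.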

6.2.7 Stein's Two-Stage Procedure (Confidence Interval)

Problem. Assume that X1, X2, ..., are normal (θ, σ²) with both θ and σ² being unknown. The goal is to estimate θ by a fixed-width confidence interval having the specified coverage probability 1 − α.
Procedure. Stein (1945) proposed the following two-stage procedure for constructing the fixed-width confidence interval for θ. Take a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s² from this sample. Let t(n0 − 1, 1 − α/2) be the (1 − α/2)-quantile of the t-distribution with n0 − 1 degrees of freedom. Then set z = [d/t(n0 − 1, 1 − α/2)]², where d is the half-width of the confidence interval, and n = max([s²/z] + 1, n0). If n > n0, draw an additional sample of n − n0 observations and estimate θ by the interval (x̄_n − d, x̄_n + d).
Arguments. The data set x, the value alpha (one minus the coverage probability), the half-width d, the size of the preliminary sample n0, the (1 − α/2)-quantile of the t-distribution with n0 − 1 degrees of freedom quan, the name of the output file filename.
Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to construct a 90% confidence interval of width 6. The following Matlab function carries out Stein's procedure for the data set x.

% Stein's two-stage procedure

% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3]; % Observations
alpha=.1; % 1-Coverage probability
d=3; % Half-width of the confidence interval
n0=10; % Size of the pilot sample
quan=1.8125;
filename='steinci.txt';
l=length(x); % Number of observations
z=(d/quan)^2; % Parameter
y=x(1:n0);
s=var(y); % Sample variance
n=max(floor(s/z)+1,n0); % Total sample size
if n>l
   disp('Stein''s procedure did not stop.');
else
   xbar=mean(x(1:n));

% Saving the result in a file
fid=fopen(filename,'w');
fprintf(fid,'Stein''s two-stage procedure\n\n');
fprintf(fid,'Confidence interval=(%6.3f, %6.3f)\n',xbar-d,xbar+d);
fprintf(fid,'Variance=%6.3f\n',s);
fclose(fid);
fprintf('Stein''s two-stage procedure\n\n');
fprintf('Confidence interval=(%6.3f, %6.3f)\n',xbar-d,xbar+d);
fprintf('Variance=%6.3f\n',s);
end;
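The sample-size rule n = max([s²/z] + 1, n0) is the heart of Stein's procedure, and it can be verified independently of the Matlab listing. The sketch below (in Python, for illustration; the function name stein_total_n is ours) reproduces the computation for a given pilot sample and t-quantile quan.

```python
import math

def stein_total_n(pilot, d, quan, n0):
    # z = (d/quan)^2; n = max(floor(s^2/z) + 1, n0), s^2 = pilot-sample variance
    mean = sum(pilot) / len(pilot)
    s2 = sum((v - mean) ** 2 for v in pilot) / (len(pilot) - 1)
    z = (d / quan) ** 2
    return max(math.floor(s2 / z) + 1, n0)
```

For the data of the example (pilot of size 10, d = 3, quan = 1.8125) the rule returns n = 10, i.e. the pilot sample already suffices; shrinking the half-width to d = 1 raises the requirement to n = 67.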

6.2.8 Stein’s Two-Stage Test

Problem. Under the assumptions of Subsection 2.7, we want to test H0 : θ < θ0 vs. H1 : θ ≥ θ0.
Procedure. The sequential test proposed by Stein (1945) for testing H0 is similar to the procedure described in Subsection 2.7. First, take a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s². Then, let t(n0 − 1, 1 − α) be the (1 − α)-quantile of the t-distribution with n0 − 1 degrees of freedom. Finally, for any positive z, define n = max([s²/z] + 1, n0). If n > n0, draw an additional sample of n − n0 observations and accept H0 if T < t(n0 − 1, 1 − α) or reject H0 if T ≥ t(n0 − 1, 1 − α), where T = √n(x̄_n − θ0)/s.
Arguments. The data set x, the probability of type I error alpha, theta=θ0, the size of the preliminary sample n0, the parameter z, the (1 − α)-quantile of the t-distribution with n0 − 1 degrees of freedom quan, the name of the output file filename.
Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ < 20 vs. H1 : θ ≥ 20. The following Matlab function carries out Stein's sequential test for the data set x.

% Stein's sequential test

% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3];
alpha=.1;
theta=20;
n0=10; % Size of the pilot sample
z=1;
quan=1.8125;
filename='steint.txt';
l=length(x); % Number of observations
y=x(1:n0);
s=var(y); % Sample variance
n=max(floor(s/z)+1,n0); % Total sample size
if n>l
   disp('Stein''s test did not stop.');
else
   xbar=mean(x(1:n));
   T=sqrt(n/s)*(xbar-theta); % Test statistic
   if T<quan
      str=sprintf('Accept the null hypothesis after %d observations\n',n);
   else
      str=sprintf('Reject the null hypothesis after %d observations\n',n);
   end;

% Saving the result in a file
fid=fopen(filename,'w');
fprintf(fid,'Stein''s sequential test\n\n');
fprintf(fid,str);
fprintf(fid,'Variance=%6.3f\n',s);
fclose(fid);
fprintf('Stein''s sequential test\n\n');
fprintf(str);
fprintf('Variance=%6.3f\n',s);
end;
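The test statistic T = √n(x̄_n − θ0)/s can likewise be computed directly. A Python illustration (the helper name stein_t_statistic is ours, not from the book):

```python
import math

def stein_t_statistic(sample, theta0):
    # T = sqrt(n) * (xbar - theta0) / s, with s the sample standard deviation
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
    return math.sqrt(n) * (xbar - theta0) / math.sqrt(s2)
```

For the first ten observations of the example and θ0 = 20 this gives T ≈ 0.457, well below the quantile 1.8125, so H0 would be accepted.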

6.2.9 Robbins' Power One Test

Problem. Suppose that X1, X2, ..., are normal (θ, 1). We wish to test H0 : θ < 0 against H1 : θ > 0.
Procedure. Darling and Robbins (1967a) suggested carrying out the following sequential test. Let

S_n = Σ_{i=1}^{n} X_i,  c_n = {(n + m)[a² + ln(1 + n/m)]}^{1/2},

where a² and m are positive constants, and at the nth stage accept H0 if S_n ≤ −c_n, reject H0 if S_n ≥ c_n, and continue sampling otherwise.
Arguments. The data set x, constants asq=a², m=m, the name of the output file filename.
Example. Assume that we have normal data with mean θ and variance 1. The following Matlab function carries out Robbins' sequential test for the data set x with a² = 9 and m = 1.

% Robbins' power one sequential test

% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1]; % Observations
y=(x-4.5)/0.5; % Standardized observations
asq=9;
m=1;
filename='robbins.txt';
n=length(x); % Number of observations
s(1)=y(1);
c1(1)=-sqrt((1+m)*(asq+log(1+1/m)));
c2(1)=-c1(1);
i=2;
cont=0;

% Robbins' test
while (i<=n)&(~cont),
c1(i)=-sqrt((i+m)*(asq+log(1+i/m)));
c2(i)=-c1(i);
s(i)=s(i-1)+y(i);

if s(i)<c1(i)
   cont=-1;
elseif s(i)>c2(i)
   cont=1;
end;
i=i+1;
end;
d=decision(cont);

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Robbins'' sequential test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s;c1;c2];
output(results,str,'Robbins'' sequential test',0,0,filename);
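The boundary c_n = {(n + m)[a² + ln(1 + n/m)]}^{1/2} grows only slowly with n, so a drifting sum S_n eventually crosses it. A Python sketch of the decision rule (an illustration added here; the function names are ours):

```python
import math

def robbins_bound(n, asq, m):
    # c_n = sqrt((n + m) * (asq + ln(1 + n/m)))
    return math.sqrt((n + m) * (asq + math.log(1 + n / m)))

def robbins_test(y, asq, m):
    # Returns (1, n) on rejection, (-1, n) on acceptance,
    # (0, len(y)) if the data run out without a decision
    s = 0.0
    for n, obs in enumerate(y, start=1):
        s += obs
        c = robbins_bound(n, asq, m)
        if s <= -c:
            return -1, n
        if s >= c:
            return 1, n
    return 0, len(y)
```

With a² = 9 and m = 1 the first boundary value is c_1 ≈ 4.403, so a single standardized observation of ±5 already produces a decision.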

6.2.10 Rank Order SPRT

Problem. Let (X1, Y1), (X2, Y2), ..., be independent identically distributed bivariate variables with marginal distributions F and G. We wish to test H0 : F(z) = G(z) for all z vs. H1 : F^δ(z) = G(z) for all z and some δ ≠ 1.
Procedure. Savage and Sethuraman (1966) constructed a rank order SPRT for this problem. At stage n, let W1, ..., W2n denote the combined sample of X1, ..., Xn and Y1, ..., Yn. Also, let F_n(z) and G_n(z) be respectively the empirical distribution functions of X1, ..., Xn and Y1, ..., Yn. Then

S_n = δ^n (2n)! / Π_{j=1}^{2n} [nF_n(W_j) + δ nG_n(W_j)]

is the likelihood ratio for H1 and H0. Therefore, we can now carry out the SPRT and at stage n accept H0 if S_n ≤ B, reject H0 if S_n ≥ A and take one more observation if neither of these inequalities is satisfied. Here B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.
Arguments. The two data sets x, y, error probabilities alpha, beta, the name of the output file filename.
Example. The following Matlab function carries out the rank order SPRT for the data sets x, y with α = .05, β = .05 and δ = 2.

% Rank order SPRT

% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.1 5.4 4.1 5.7 5.9 5.8 5.0];
y=[5.2 4.7 4.8 4.9 5.9 5.2 4.8 4.9 5.0 5.1 4.5 5.4 4.9 4.7 4.8];
alpha=.05;
beta=.05;
filename='ranksprt.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha); % Upper and lower bounds
b=log(beta/(1-alpha));

i=1; cont=0;

% SPRT
while (i<=n)&(~cont),
w=[x(1:i) y(1:i)]; % Combined sample
for j=1:(2*i),
   f(j)=0; g(j)=0; % Empirical distribution functions
   for k=1:i,
      if x(k)<=w(j) f(j)=f(j)+1; end;
      if y(k)<=w(j) g(j)=g(j)+1; end;
   end;
end;
s(i)=log((2^i)*prod(1:(2*i))/prod(f+2*g)); % Log-likelihood ratio
if s(i)<b
   cont=-1;
elseif s(i)>a
   cont=1;
end;
i=i+1;
end;
d=decision(cont);

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Rank order SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s];
output(results,str,'Rank order SPRT',a,b,filename);
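For δ = 2, the case implemented in the Matlab function, the likelihood ratio at stage n is 2^n (2n)! / Π_j [nF_n(W_j) + 2nG_n(W_j)]. The following Python sketch (added for illustration; the function name rank_order_lr is ours) computes it directly from the two samples:

```python
import math

def rank_order_lr(x, y):
    # Savage-Sethuraman likelihood ratio at stage n for delta = 2:
    # L_n = 2^n (2n)! / prod_j (n F_n(W_j) + 2 n G_n(W_j)),
    # where the product runs over the combined sample W_1, ..., W_2n
    n = len(x)
    combined = list(x) + list(y)
    denom = 1.0
    for wj in combined:
        f = sum(1 for xi in x if xi <= wj)  # n * F_n(wj)
        g = sum(1 for yi in y if yi <= wj)  # n * G_n(wj)
        denom *= f + 2 * g
    return 2 ** n * math.factorial(2 * n) / denom
```

For example, with x = [1] and y = [2] the ratio is 2·2!/(1·3) = 4/3; comparing log L_n with log B and log A then gives the SPRT decision.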

6.2.11 Cox's Sequential Estimation Procedure

Problem. Suppose that X_k, k ≥ 1, are normal with mean θ and variance σ². We wish to estimate θ with a given standard error a.
Procedure. The two-stage estimation procedure outlined below was proposed by Cox (1952b). Draw a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s². Then, let n = [s²(1 + 2/n0)/a²]. If n > n0, take an additional sample of n − n0 observations and estimate θ by x̄_n. If n is smaller than n0, estimate θ by x̄_{n0}.
Arguments. The data set x, the standard error a, the size of the preliminary sample n0, the name of the output file filename.
Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to estimate θ with standard error a = 0.5. The following Matlab function carries out Cox's sequential procedure for the data set x.

% Cox's sequential procedure

% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3];
a=0.5;
n0=10; % Size of the pilot sample
filename='cox.txt';
l=length(x); % Number of observations
y=x(1:n0);
s=var(y); % Sample variance
n=floor(s*(1+2/n0)/a^2); % Total sample size
if n>n0
   xbar=mean(x(1:n));
else
   xbar=mean(x(1:n0));
end;

% Saving the result in a file
fid=fopen(filename,'w');
fprintf(fid,'Cox''s sequential procedure\n\n');
fprintf(fid,'Estimate=%6.3f\n',xbar);
fprintf(fid,'Variance=%6.3f\n',s);
fclose(fid);
fprintf('Cox''s sequential procedure\n\n');
fprintf('Estimate=%6.3f\n',xbar);
fprintf('Variance=%6.3f\n',s);
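Cox's rule n = [s²(1 + 2/n0)/a²] is again easy to check by hand. A Python illustration (the function name cox_total_n is ours); it returns n0 when the formula gives a smaller value, matching the procedure above:

```python
import math

def cox_total_n(pilot, a, n0):
    # n = floor(s^2 * (1 + 2/n0) / a^2), with s^2 the pilot-sample variance;
    # if this falls below n0, the pilot sample alone is used
    mean = sum(pilot) / len(pilot)
    s2 = sum((v - mean) ** 2 for v in pilot) / (len(pilot) - 1)
    return max(math.floor(s2 * (1 + 2 / n0) / a ** 2), n0)
```

For the pilot data of the example (s² ≈ 20.23, n0 = 10, a = 0.5) the rule asks for n = 97 observations, far more than the 11 available, so the listed data set cannot actually deliver the requested standard error.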

6.3 Distribution

In this section you can find the source code of the Matlab functions generating the p.d.f.'s or p.f.'s listed in the introduction. Notice that each of these functions must be saved in an m-extension file whose name coincides with the name of the function, e.g. the function bernd must be placed in bernd.m.

function a=bernd(x,theta); % Bernoulli p.f.
if x==1
   a=theta;
else
   a=1-theta;
end;

function a=betad(x,r,s); % Beta p.d.f.
a=gamma(s+r)*exp((r-1)*log(x)+(s-1)*log(1-x))/...
   (gamma(r)*gamma(s));

function a=bind(x,p,n); % Binomial p.f.
a=gamma(n+1)*exp(x*log(p)+(n-x)*log(1-p))/...
   (gamma(x+1)*gamma(n-x+1));

function a=cauchyd(x,theta); % Cauchy p.d.f.
a=theta/(pi*(theta^2+x^2));

function a=doubled(x); % Double exponential p.d.f.
a=exp(-abs(x))/2;

function a=expd(x,theta); % Exponential p.d.f.
a=theta*exp(-theta*x);

function a=gammad(x,p); % Gamma p.d.f.
a=exp(-x+(p-1)*log(x))/gamma(p);

function a=normald(x,theta,sigmasq); % Normal p.d.f.
a=exp(-(x-theta)^2/(2*sigmasq))/sqrt(2*pi*sigmasq);

function a=poissond(x,lambda);

% Poisson p.f.
a=exp(-lambda+x*log(lambda))/gamma(x+1);

function a=td(x,n); % t-distribution p.d.f.
a=gamma((n+1)/2)/(sqrt(pi*n)*gamma(n/2)*...
   exp(((n+1)/2)*log(1+x^2/n)));

function a=uniformd(x,l,u); % Uniform p.d.f.
if (l<=x)&(x<=u)
   a=1/(u-l);
else
   a=0;
end;

function a=weibulld(x,m); % Weibull p.d.f.
a=m*exp((m-1)*log(x)-exp(m*log(x)));

Referenced Journals

The following is a list of the references that are cited in the text. The relevant page numbers on which the references are cited are given at the end of each reference in brackets. The references are arranged alphabetically according to the authors' names. Multiple-authored articles are listed according to the names of the primary authors only. The following abbreviations of journal titles are used.

List of the Abbreviated Journal Titles

Acta Math. Acad. Sci. Hungar. - Acta Mathematica Academiae Scientiarum Hungaricae

Ann. Inst. Statist. Math. - Annals of the Institute of Statistical Mathematics (Tokyo)

Ann. Math. Statist. - The Annals of Mathematical Statistics

Ann. Statist. - The Annals of Statistics

Austr. J. Statist. - Australian Journal of Statistics

Bull. Amer. Math. Soc. - Bulletin of the American Mathematical Society

Bull. Intern. Statist. Inst. - Bulletin of the International Statistical Institute

Bell Syst. Tech. J. - Bell System Technical Journal

Calcutta Statist. Assoc. Bull. - Calcutta Statistical Association Bulletin (Calcutta)

Commun. Statist. Theor. Method - Communications in Statistics - Theory and Methods

Duke Math. J. - Duke Mathematical Journal

J. Amer. Statist. Assoc. - Journal of the American Statistical Association

J. Appl. Prob. - Journal of Applied Probability


J. Austral. Math. Soc. - The Journal of the Australian Mathematical Society

J. Ind. Statist. Assoc. - Journal of Indian Statistical Association

J. Roy. Statist. Soc. Ser. A (or B) - Journal of the Royal Statistical Society, Series A (or B)

Philos. Trans. Roy. Soc. Ser. A - Philosophical Transactions of the Royal Society, Series A (London)

Proc. Cambridge Philos. Soc. - Proceedings of the Cambridge Philosophical Society (Cambridge, England)

Proc. Berkeley Symp. Math. Stat. Prob. nth (n = 1, 2, ..., 6) - Proceedings of the nth (n = 1, 2, ..., 6) Berkeley Symposium on Mathematical Statistics and Probability (Berkeley, California)

Proc. Nat. Acad. Sci. U.S.A. - Proceedings of the National Academy of Sciences of the United States of America

Rep. Statist. Appl. Res. Un. Japan Sci. Engrs. - Reports of Statistical Application Research, Union of Japanese Scientists and Engineers (Tokyo)

Rev. Inst. Intern. Statist. - Review of the International Statistical Institute (The Hague)

Sankhya Ser. A (or B) - Sankhyā, The Indian Journal of Statistics, Series A (or B)

Soc. Indust. and Appl. Math. - Society for Industrial and Applied Mathematics

South African Statist. J. - South African Statistical Journal

Statis. Med. - Statistics in Medicine

Theor. Prob. Appl. - Theory of Probability and Applications

Z. Wahrscheinlichkeitstheorie und Verw. Gebiete - Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete (Berlin)

Bibliography

[1] Aivazyan, S. A. (1959). A comparison of the optimal properties of the Neyman-Pearson and the Wald sequential probability ratio test. Theor. Probability Appl. 4 86-93. [105]

[2] Albert, A. (1966). Fixed size confidence ellipsoids for linear regression parameters. Ann. Math. Statist. 37 1602-1630. [206, 207]

[3] Anderson, T. W. (1960). A modification of the sequential probability ratio test to reduce the sample size. Ann. Math. Statist. 31 165-197. [47, 48, 49, 75, 271]

[4] Anderson, T. W. and Friedman, M. (1960). A limitation of the optimum property of the sequential probability ratio test. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (Ed. Olkin et al.). Stanford University Press No. 6 57-59. [46, 136]

[5] Anscombe, F. J. (1949). Large-sample theory of sequential estimation. Biometrika 36 455-458. [182]

[6] Anscombe, F. J. (1952). Large-sample theory of sequential estimation. Proc. Cambridge Philos. Soc. 48 600-607. [182, 183, 187, 189, 200, 250, 252]

[7] Anscombe, F. J. (1953). Sequential estimation. J. Roy. Statist. Soc. Ser. B 15 1-29. [200]

[8] Armitage, P. (1947). Some sequential tests of Student's hypothesis. J. Roy. Statist. Soc. Suppl. 9 250-263. [90]

[9] Armitage, P. (1950). Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. J. Roy. Statist. Soc. Ser. B 12 137-144. [90, 95, 96]

[10] Armitage, P. (1957). Restricted sequential procedures. Biometrika 44 9-26. [47, 49, 58, 246]


[11] Armitage, P. (1975). Sequential Medical Trials, 2nd Ed., Oxford, Blackwell Scientific Publications. [245]

[12] Aroian, L. A. (1968). Sequential analysis - direct method. Technometrics 10 125-132. [44]

[13] Bahadur, R. R. (1954). Sufficiency and statistical decision functions. Ann. Math. Statist. 25 423-462. [145]

[14] Bahadur, R. R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37 557-580. [184, 212]

[15] Baker, A. G. (1950). Properties of some tests in sequential analysis. Biometrika 37 334-346. [73]

[16] Barnard, G. A. (1952). The frequency justification of certain sequential tests. Biometrika 39 144-150. [65, 76, 85]

[17] Barnard, G. A. (1954). Sampling inspection and statistical decisions. J. Roy. Statist. Soc. Ser. B 16 151-174. [106, 107]

[18] Barnard, G. A. (1969). Practical applications of tests with power one. Bull. Intern. Statist. Inst. 43, parts 1 and 2, 389-393. [121]

[19] Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Statist. Soc. Suppl. 4 137-170. [171]

[20] Bartlett, M. S. (1946). The large sample theory of sequential tests. Proc. Cambridge Philos. Soc. 42 239-244. [49, 81, 84, 85]

[21] Bechhofer, R. (1960). A note on the limiting relative efficiency of the Wald sequential probability test. J. Amer. Statist. Assoc. 55 660-663. [105]

[22] Berk, R. H. (1973). Some asymptotic aspects of sequential analysis. Ann. Statist. 1 1126-1138. [51, 53]

[23] Bhattacharyya, A. (1946). On some analogues of the amount of information and their use in statistical estimation. Sankhya 8 1-14. [157]

[24] Bhattacharya, P. K. and Mallik, A. (1973). Asymptotic normality of the stopping times of some sequential procedures. Ann. Statist. 1 1203-1211. [189]

[25] Billard, L. and Vagholkar, M. D. (1969). A sequential procedure for testing a null hypothesis against a two-sided alternative hypothesis. J. Roy. Statist. Soc. Ser. B 31 285-294. [97, 98]

[26] Birnbaum, Z. W. (1956). On a use of the Mann-Whitney statistics. Proc. Third Berkeley Symp. Math. Stat. Prob. Univ. of California Press 13-17. [216]

[27] Birnbaum, A. and Healy, W. C. Jr. (1960). Estimates with prescribed variance based on two-stage sampling. Ann. Math. Statist. 31 662-676. [220]

[28] Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. Ann. Math. Statist. 18 105-110. [144]

[29] Blackwell, D. and Girshick, M. A. (1947). A lower bound for the variance of some unbiased sequential estimates. Ann. Math. Statist. 18 277-280. [157]

[30] Blum, J. (1954). Multidimensional stochastic approximation methods. Ann. Math. Statist. 25 737-744. [228]

[31] Bradley, R. A. (1967). Topics in rank order statistics. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 593-607, University of California Press. [127]

[32] Bradley, R. A., Merchant, S. D. and Wilcoxon, F. (1966). Sequential rank tests II; modified two-sample procedure. Technometrics 8 615-623. [127]

[33] Breslow, N. (1969). On large sample sequential analysis with applications to survivorship data. J. Appl. Prob. 6 261-274. [85, 86]

[34] Burman, J. P. (1946). Sequential sampling formulae for a binomial population. J. Roy. Statist. Soc. Suppl. 8 98-103. [109]

[35] Chernoff, H. (1956). Large sample theory: parametric case. Ann. Math. Statist. 27 1-22. [105]

[36] Chernoff, H. (1960). Sequential test for the mean of a normal distribution. Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1 79-92, University of California Press. [114]

[37] Chernoff, H. (1968). Optimal stochastic control. Sankhya Ser. A 30 221-252. [114, 115, 116]

[38] Chow, Y. S. and Robbins, H. (1965). On the asymptotic theory of fixed width sequential confidence intervals for the mean. Ann. Math. Statist. 36 457-462. [190, 193, 196, 197, 198, 199, 202, 204, 215, 219]

[39] Chow, Y. S. and Yu, K. F. (1981). The performance of a sequential procedure for the estimation of the mean. Ann. Statist. 9 184-189. [202, 203]

[40] Chow, Y. S., Robbins, H., and Teicher, H. (1965). Moments of randomly stopped sums. Ann. Math. Statist. 36 789-799. [25, 26]

[41] Connell, T. L. and Graybill, F. A. (1964). A Tchebycheff type inequality for chi-square. Unpublished manuscript. [221]

[42] Cox, D. R. (1952a). A note on the sequential estimation of means. Proc. Cambridge Philos. Soc. 48 447-450. [76, 182]

[43] Cox, D. R. (1952b). Estimation by double sampling. Biometrika 39 217-227. [167, 168, 171, 173, 289]

[44] Cox, D. R. (1963). Large sample sequential tests for composite hypotheses. Sankhya Ser. A 25 5-12. [81, 82, 83, 85]

[45] Cramér, H. (1946). Mathematical Methods of Statistics, Princeton University Press, Princeton. [69, 79, 81, 126, 218]

[46] Dantzig, G. B. (1940). On the non-existence of tests of "Student's" hypothesis having power functions independent of σ. Ann. Math. Statist. 11 186-192. [7]

[47] Darling, D. A. and Robbins, H. (1967a). Inequalities for the sequence of sample means. Proc. Nat. Acad. Sci. U.S.A. 57 1157-1180. [121]

[48] Darling, D. A. and Robbins, H. (1967b). Iterated logarithm inequalities. Proc. Nat. Acad. Sci. U.S.A. 57 1188-1192. [121]

[49] Darling, D. A. and Robbins, H. (1968a). Some further remarks on inequalities for sample sums. Proc. Nat. Acad. Sci. U.S.A. 60 1175-1182. [121]

[50] Darling, D. A. and Robbins, H. (1968b). Some nonparametric sequential tests with power one. Proc. Nat. Acad. Sci. U.S.A. 61 804-809. [123, 124]

[51] David, H. T. and Kruskal, W. H. (1956). The WAGR sequential t-test reaches a decision with probability one. Ann. Math. Statist. 27 797-805. [62, 69]

[52] DeGroot, M. H. (1959). Unbiased binomial sequential estimation. Ann. Math. Statist. 30 80-101. [144, 148, 150, 151]

[53] Dixon, W. J. (1970). Quantal response variable experimentation: the up and down method. In McArthur and Colton (Eds.), Statistics in Endocrinology 251-267 (MIT Press, Cambridge). [231]

[54] Dixon, W. J. and Mood, A. M. (1948). A method for obtaining and analysing sensitivity data. J. Amer. Statist. Assoc. 43 109-126. [230]

[55] Dodge, H. F. and Romig, H. G. (1929). A method of sampling inspection. Bell Syst. Tech. J. 8 613-631. [3]

[56] Donnelly, T. G. (1957). A Family of Truncated Sequential Tests. Doctoral Dissertation, Univ. of North Carolina. [47]

[57] Ericson, R. (1966). On moments of cumulative sums. Ann. Math. Statist. 37 1803-1805. [57]

[58] Fabian, V. (1974). Note on Anderson's sequential procedures. Ann. Statist. 2 170-176. [50, 51]

[59] Farrell, R. H. (1966a). Bounded length confidence intervals for the p-point of a distribution function, II. Ann. Math. Statist. 37 581-585. [210]

[60] Farrell, R. H. (1966b). Bounded length confidence intervals for the p-point of a distribution function, III. Ann. Math. Statist. 37 586-592. [210, 212]

[61] Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York. 378-383. [32]

[62] Fisz, M. (1963). Probability Theory and Mathematical Statistics. Wiley, New York. [126]

[63] Franzh, S. (2003). SPRT fixed-length confidence intervals. Submitted to Comm. Statist. Theor. Meth. [173, 176]

[64] Geertsema, J. C. (1970a). Sequential confidence intervals based on rank tests. Ann. Math. Statist. 41 1016-1026. [210, 211, 212, 213, 215]

[65] Geertsema, J. C. (1970b). A Monte Carlo study of sequential confidence intervals based on rank tests. South African Statist. J. 4 25-31. [215]

[66] Ghosh, B. K. (1970). Sequential Tests of Statistical Hypotheses. Addison-Wesley, Reading (Mass.) [v]

[67] Ghosh, J. K. (1960). On some properties of sequential t-test. Calcutta Statist. Assoc. Bull. 9 77-86. [70, 71]

[68] Ghosh, M. N. (1960). Bounds for the expected sample size in a sequential probability ratio test. J. Roy. Statist. Soc. Ser. B 22 360-367. [33]

[69] Ghosh, M., Mukhopadhyay, N. and Sen, P. K. (1997). Sequential Estimation. Wiley, New York. [v]

[70] Ghurye, S. G. and Robbins, H. (1954). Two-stage procedures for estimating the difference between means. Biometrika 41 146-152. [162, 163]

[71] Girshick, M. A., Mosteller, F., and Savage, L. J. (1946). Unbiased estimates for certain binomial sampling problems with applications. Ann. Math. Statist. 17 13-23. [147, 148]

[72] Gleser, L. J. (1965). On the asymptotic theory of fixed-size sequential confidence bounds for linear regression parameters. Ann. Math. Statist. 36 463-467. (Correction note (1966) Ann. Math. Statist. 37 1053-1055). [203, 206, 207]

[73] Gleser, L. J., Robbins, H., Starr, N. (1964). Some asymptotic properties of fixed-width sequential confidence intervals for the mean of a normal population with unknown variance. Tech. Report. Dept. of Math. Statist. Columbia University. [200]

[74] Govindarajulu, Z. (1967). Two-sided confidence limits for P(X < Y) based on normal samples of X and Y. Sankhyā Ser. B 29 35-40. [209]

[75] Govindarajulu, Z. (1968a). Asymptotic normality of two-sample rank order sequential probability ratio test based on Lehmann alternatives. Unpublished manuscript. [51, 129]

[76] Govindarajulu, Z. (1968b). Distribution-free confidence bounds for P(X < Y). Ann. Inst. Statist. Math. 20 229-238. [216]

[77] Govindarajulu, Z. (1974). Fixed-width confidence intervals for P(X < Y).

[78] Govindarajulu, Z. (1984). Stopping times of one-sample rank order sequential probability ratio tests. J. Statist. Planning and Inference 9 305-320. [132]

[79] Govindarajulu, Z. (1987). The Sequential Statistical Analysis of Hypothesis Testing, Point, and Interval Estimation, and Decision Theory (Corr. Ed.). Amer. Sciences Press, Inc., 20 Cross Road, Syracuse, New York 13224-2104. [v, 53, 186, 200, 210, 212, 221, 222]

[80] Govindarajulu, Z. (1995). Certain sequential adaptive design problems. Adaptive Designs (IMS Lecture Notes - Monograph Series No. 25) (Eds. N. Flournoy and W. F. Rosenberger) 197-212. [229]

[81] Govindarajulu, Z. (2001). Statistical Techniques in Bioassay, 2nd, revised and enlarged Ed., Karger Publishers, New York. [228, 231, 263]

[82] Govindarajulu, Z. (2002). Robustness of a sample size re-estimation procedure in clinical trials. Advances in Methodological and Applied Aspects of Prob. and Statist. (Ed. N. Balakrishnan) Taylor and Francis, New York. 383-398. [247, 248, 252]

[83] Govindarajulu, Z. (2003). Robustness of a sample size re-estimation procedure in clinical trials (Arbitrary populations). Statist. Med. 22 1819-1828. Correction, Vol. 23 to appear. [247, 252]

[84] Govindarajulu, Z. (2004). Robustness of a sample size re-estimation with interim binary data for double-blind clinical trials. J. Statist. Planning and Inference (to appear). [254, 256, 257, 259, 261, 262]

[85] Govindarajulu, Z. and Howard, H. C. (1989). Uniform asymptotic expansions applied to sequential t and t2 tests. Austr. J. Statist. 31 No. 1 95-104. [63, 65, 66, 275, 277]

[86] Govindarajulu, Z. and Howard, H. C. (1994). Asymptotic expansions applied to sequential F-test criteria. Austr. J. Statist. 36 No. 1 101-113. [77, 78, 79]

[87] Govindarajulu, Z. and Nanthakumar, A. (2000). Sequential estimation of the mean logistic response function. Statistics 33 309-33. [231, 232, 233]

[88] Govindarajulu, Z., LeCam, L., and Raghavachari, M. (1967). Generalizations of the theorems of Chernoff and Savage on the asymptotic normality of test statistics. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 608-638, University of California Press. [216]

[89] Graybill, F. A. and Connell, T. L. (1964a). Sample size estimating the variance within d units of the true value. Ann. Math. Statist. 35 438-440. [194, 221]

[90] Graybill, F. A. and Connell, T. L. (1964b). Sample size required to estimate the parameter in the uniform density within d units. J. Amer. Statist. Assoc. 59 550-556. [222]

[91] Groeneveld, R. A. (1971). A note on the sequential sign test. American Statistician 25 (2) 15-16. [125, 126]

[92] Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York. [126]

[93] Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika 33 222-225. [151]

[94] Hall, W. J. (1962). Some sequential analogs of Stein's two-stage test. Biometrika 49 367-378. [73, 74, 279]

[95] Hall, W. J. (1965). Methods of sequentially testing composite hypotheses with special reference to the two-sample problem. Univ. of North Carolina, Inst. of Statistics Mimeo Series No. 441, 40-41. [124]

[96] Hall, W. J., Wijsman, R. A., and Ghosh, J. K. (1965). The relationship between sufficiency and invariance with applications in sequential analysis. Ann. Math. Statist. 36 575-614. [124, 127]

[97] Hodges, J. L. and Lehmann, E. L. (1956). Two approximations to the Robbins-Monro process. Proc. Third Berkeley Symp. Math. Statist. Prob. Vol. 1 95-104. Univ. of California Press, Berkeley. [228]

[98] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 293-325. [214]

[99] Hoeffding, W. (1953). A lower bound for the average sample number of a sequential test. Ann. Math. Statist. 24 127-130. [34]

[100] Hoeffding, W. (1960). Lower bounds for the expected sample size and the average risk of a sequential procedure. Ann. Math. Statist. 31 352-368. [34]

[101] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-20. [213, 214]

[102] Hoel, P. G. (1955). On a sequential test for the general linear hypothesis. Ann. Math. Statist. 26 136-139. [75, 76]

[103] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 221-233, University of California Press at Berkeley and Los Angeles. [186]

[104] Joanes, D. N. (1972). Sequential tests of composite hypothesis. Biometrika 59 633-637. Correction: Biometrika 62 (1975) 221. [85]

[105] Johnson, N. L. (1953). Some notes on the application of sequential test for the general linear hypothesis. Ann. Math. Statist. 24 614-623. [76]

[106] Johnson, N. L. (1959). A proof of Wald's theorem on cumulative sums. Ann. Math. Statist. 30 1245-1247. [23]

[107] Kemp, K. W. (1958). Formulae for calculating the operating characteristic and the average sample number of some sequential tests. J. Roy. Statist. Soc. Ser. B 20 379-386. [36, 37, 39]

[108] Kemperman, J. H. B. (1961). The Passage Problem for a Stationary Markov Chain. Univ. of Chicago Press, Chicago, Illinois. [32]

[109] Khan, R. A. (1969). A general method of determining fixed-width confidence intervals. Ann. Math. Statist. 40 704-709. [191, 192, 194]

[110] Lachin, J. M. (1977). Sample size determination for r x c comparative trials. Biometrics 33 315-324. [256]

[111] Laha, R. G. and Rohatgi, V. K. (1979). Probability Theory. Wiley, New York. [189]

[112] Lawing, W. D. and David, H. T. (1966). Likelihood ratio computation of operating characteristics. Ann. Math. Statist. 37 1704-1716. [50]

[113] LeCam, L. (1970). On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41 802-828. [79, 191]

[114] Lehmann, E. L. (1950). Notes on the Theory of Estimation. Lecture notes recorded by Colin Blyth. Associated Students' Store, Univ. of California, Berkeley. [144, 153, 156]

[115] Lehmann, E. L. (1953). The power of rank tests. Ann. Math. Statist. 24 23-43. [127]

[116] Lehmann, E. L. (1955). Ordered families of distributions. Ann. Math. Statist. 26 399-419. [173]

[117] Lehmann, E. L. (1959). Testing Statistical Hypotheses. 104-110 Wiley, New York. [vii, 45, 46, 174]

[118] Lehmann, E. L. and Stein, C. (1950). Completeness in the sequential case. Ann. Math. Statist. 21 376-385. [145, 146]

[119] Lindley, D. V. and Barnett, B. N. (1965). Sequential sampling: two decision problems with linear losses for binomial and normal random variables. Biometrika 52 507-532. [111, 113, 114, 115]

[120] Loève, M. (1977). Probability Theory Vol. 1. Van Nostrand, Princeton. [187, 223]

[121] Lorden, G. (1970). On excess over the boundary. Ann. Math. Statist. 41 520-527. [86]

[122] Lorden, G. (1973). Open-ended tests for Koopman-Darmois families. Ann. Statist. 1 633-643. [88, 89]

[123] Matthes, T. K. (1963). On the optimality of sequential probability ratio tests. Ann. Math. Statist. 34 18-21. [45]

[124] Mikhalevich, V. S. (1956). Sequential Bayes solutions and optimal methods of statistical acceptance control. Theor. Prob. Appl. 1 395-421. [114]

[125] Miller, R. G. (1970). Sequential signed rank test. J. Amer. Statist. Assoc. 65 1554-1561. [132, 133, 134, 136]

[126] Miller, R. G. (1981). Survival Analysis. John Wiley and Sons, New York. [237, 240]

[127] Moriguti, S. and Robbins, H. (1962). A Bayes test of p ≤ 1/2 against p > 1/2. Rep. Statist. Appl. Res. Un. Japan Sci. Engrs. 9 39-60. [114]

[128] Moshman, J. (1958). A method for selecting the size of the initial sample in Stein's two-sample procedure. Ann. Math. Statist. 29 1271-1275. [10]

[129] Nadas, A. (1969). An extension of a theorem of Chow and Robbins on sequential confidence intervals for the mean. Ann. Math. Statist. 40 667-671. [196, 200, 201]

[130] Nanthakumar, A. and Govindarajulu, Z. (1994). Risk-efficient estimation of the mean of logistic response function using Spearman-Karber estimator. Statistica Sinica 4 305-324. [231]

[131] Nanthakumar, A. and Govindarajulu, Z. (1999). Fixed-width estimation of the mean of logistic response function using Spearman-Karber estimator. Biometrical Journal 41 No. 4, 445-456. [231]

[132] Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. Ser. A 231 289-337. [11]

[133] Page, E. S. (1954). An improvement to Wald's approximation for some properties of sequential tests. J. Roy. Statist. Soc. Ser. B 16 136-139. [36]

[134] Paulson, E. (1947). A note on the efficiency of the Wald sequential test. Ann. Math. Statist. 19 447-450. [100]

[135] Paulson, E. (1964). A sequential procedure for selecting the population with the largest mean from k normal populations. Ann. Math. Statist. 35 174-180. [51]

[136] Putter, J. (1951). Sur une méthode de double échantillonnage pour estimer la moyenne d'une population laplacienne stratifiée. Rev. Inst. Intern. Statist. 19 231-238. [162]

[137] Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Harvard University Press, Cambridge, Massachusetts. [114]

[138] Rao, C. R. (1965). Linear Statistical Inference and its Applications. Wiley, New York. [186, 191]

[139] Ray, W. D. A. (1957a). A proof that the sequential probability ratio test (SPRT) of the general linear hypothesis terminates with probability unity. Ann. Math. Statist. 28 521-522. [77]

[140] Ray, W. D. A. (1957b). Sequential confidence intervals for the mean of a normal population with unknown variance. J. Roy. Statist. Soc. Ser. B. 19 133-143. [200]

[141] Rényi, A. (1957). On the asymptotic distribution of the sum of a random number of random variables. Acta Math. Acad. Sci. Hungar. 8 193-199. [189]

[142] Reynolds, M. R. Jr. (1975). A sequential signed-rank test for symmetry. Ann. Statist. 3 382-400. [134, 135, 136]

[143] Richter, D. (1960). Two stage experiments for estimating a common mean. Ann. Math. Statist. 31 1164-1173. [164, 165, 166, 167]

[144] Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527-535. [162]

[145] Robbins, H. (1959). Sequential estimation of the mean of a normal population. Probability and Statistics - The Harald Cramér Volume, 235-245. Almquist and Wiksell, Uppsala, Sweden. [189, 201, 202]

[146] Robbins, H. (1970). Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41 1397-1409. [118, 123]

[147] Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22 400-407. [227]

[148] Roy, S. N. (1957). Some Aspects of Multivariate Analysis. Wiley, New York. [207]

[149] Rushton, S. (1950). On a sequential t-test. Biometrika 37 326-333. [63, 65, 66, 83, 273]

[150] Rushton, S. (1952). On a two-sided sequential t-test. Biometrika 39 302-308. Correction: 41 286. [65, 66, 76]

[151] Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29 373-405. [227]

[152] Sacks, J. (1965). A note on the sequential t-test. Ann. Math. Statist. 36 1867-1869. [71, 72]

[153] Samuel, E. (1966). Estimators with prescribed bound on the variance for the parameter in the binomial and Poisson distributions based on two-stage sampling. J. Amer. Statist. Assoc. 61 220-227. [220]

[154] Savage, I. R. (1959). Contributions to the theory of rank order statistics - the one-sample case. Ann. Math. Statist. 30 1018-1023. [131]

[155] Savage, I. R. and Sethuraman, J. (1966). Stopping time of a rank order sequential probability ratio test based on Lehmann alternatives. Ann. Math. Statist. 37 1154-1160. Correction: 38 (1967) 1309. [127, 129, 287]

[156] Schwarz, G. (1962). Asymptotic shapes of Bayes sequential testing regions. Ann. Math. Statist. 33 224-236. [88, 89]

[157] Sen, P. K. and Ghosh, M. (1971). On bounded length sequential confidence intervals based on one-sample rank-order statistics. Ann. Math. Statist. 42 189-203. [219]

[158] Seth, G. R. (1949). On the variance of estimates. Ann. Math. Statist. 20 1-27. [157, 158]

[159] Sethuraman, J. (1970). Stopping time of a rank order sequential probability ratio test based on Lehmann alternatives II. Ann. Math. Statist. 41 1322-1333. [129]

[160] Shih, W. J. (1992). Sample size re-estimation in clinical trials. Biopharmaceutical Sequential Applications (Ed. K. E. Peace), pp. 285-301. Marcel Dekker Inc., New York. [247, 248]

[161] Shih, W. J. and Zhao, P. L. (1997). Design for sample size re-estimation with interim data for double-blind clinical trials with binary outcomes. Statistics in Medicine 16 1913-1923. [254, 260, 262]

[162] Siegmund, D. O. (1967). Some one-sided stopping rules. Ann. Math. Statist. 38 1641-1646. [52]

[163] Siegmund, D. O. (1968). On the asymptotic normality of one-sided stopping rules. Ann. Math. Statist. 39 1493-1497. [190]

[164] Siegmund, D. O. (1978). Estimation following sequential tests. Biometrika 65 341-349. [246]

[165] Siegmund, D. O. (1985). Sequential Analysis, Tests and Confidence Intervals. Springer-Verlag, New York. [v, vii, 87]

[166] Simons, G. (1967). Lower bounds for average sample number of sequential multi-hypothesis tests. Ann. Math. Statist. 38 1343-1364. [95]

[167] Simons, G. (1968). On the cost of not knowing the variance when making a fixed-width confidence interval for the mean. Ann. Math. Statist. 39 1946-1952. [215]

[168] Sobel, M. and Wald, A. (1949). A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution. Ann. Math. Statist. 20 502-522. [90, 91, 95]

[169] Sproule, R. N. (1969). A Sequential Fixed-Width Confidence Interval for the Mean of a U-statistic. Ph.D. dissertation, University of North Carolina. [219]

[170] Srivastava, M. S. (1967). On fixed-width confidence bounds for regression parameters and mean vector. J. Roy. Statist. Soc. Ser. B. 29 132-140. [206]

[171] Srivastava, M. S. (1971). On fixed-width confidence bounds for regression parameters. Ann. Math. Statist. 42 1403-1411. [206, 207]

[172] Starr, N. (1966a). The performance of a sequential procedure for fixed-width interval estimate. Ann. Math. Statist. 36 36-50. [200]

[173] Starr, N. (1966b). On the asymptotic efficiency of a sequential procedure for estimating the mean. Ann. Math. Statist. 37 1173-1185. [202]

[174] Starr, N. and Woodroofe, M. (1969). Remarks on sequential point estimation. Proceedings National Academy Sciences, USA 63 285-288. [202]

[175] Stein, C. (1945). A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16 243-258. [7, 158, 160, 167, 247, 281, 283]

[176] Stein, C. (1946). A note on cumulative sums. Ann. Math. Statist. 17 498-499. [13, 33]

[177] Stein, C. (1949). Some problems in sequential estimation. Econometrica 17 77-78. [200]

[178] Tallis, G. M. and Vagholkar, M. K. (1965). Formulas to improve Wald's approximation for some properties of sequential tests. J. Roy. Statist. Soc. Ser. B. 27 74-81. [40]

[179] Vagholkar, M. K. (1955). Application of Statistical Decision Theory to Sampling Inspection Schemes. Ph.D. thesis, University of London. [110]

[180] Vagholkar, M. K. and Wetherill, G. B. (1960). The most economical binomial sequential probability ratio test. Biometrika 47 103-109. [106, 110, 111]

[181] Ville, J. (1939). Étude critique de la notion de collectif. Gauthier-Villars, Paris. [117]

[182] Wald, A. (1947). Sequential Analysis. Wiley, New York. Reprinted by Dover Publications Inc. (1973). [v, 13, 14, 17, 19, 26, 29, 32, 34, 38, 39, 40, 43, 52, 59, 60, 79, 83, 85, 86, 117]

[183] Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 595-601. [185]

[184] Wald, A. and Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Ann. Math. Statist. 19 326-339. [48, 111]

[185] Wasan, M. T. (1964). Sequential optimum procedures for unbiased estimation of a binomial parameter. Technometrics 6 259-272. [144, 151, 152]

[186] Weed, H. D., Jr. (1968). Sequential one-sample grouped signed rank tests for symmetry. Ph.D. dissertation, Florida State University. [130]

[187] Weed, H. D., Jr. and Bradley, R. A. (1971). Sequential one-sample grouped signed rank tests for symmetry. J. Amer. Statist. Assoc. 66 321-326. [130]

[188] Weed, H. D., Jr., Bradley, R. A. and Govindarajulu, Z. (1974). Stopping time of two rank order sequential probability ratio tests for symmetry based on Lehmann alternatives. Ann. Statist. 2 1314-1322. [131, 132]

[189] Weiss, L. (1956). On the uniqueness of Wald's sequential test. Ann. Math. Statist. 27 1178-1181. [46]

[190] Weiss, L. and Wolfowitz, J. (1972). Optimal, fixed-length non-parametric sequential confidence limits for a translation parameter. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 25 203-209. [219]

[191] Wetherill, G. B. (1957). Application of the Theory of Decision Functions to Sampling Inspection with Special Reference to Cost of Inspection. Ph.D. Thesis, Univ. of London. [iii]

[192] Wetherill, G. B. (1975). Sequential Methods in Statistics. Chapman and Hall, London. [vii]

[193] Wetherill, G. B. and Glazebrook, K. D. (1986). Sequential Methods in Statistics, 3rd ed. Chapman and Hall, New York. [v]

[194] Whitehead, J. (1983). The Design and Analysis of Sequential Clinical Trials. Ellis Horwood Ltd. Publishers, Chichester. [viii, 238, 242, 244, 245, 246]

[195] Whitehead, J. and Jones, D. R. (1979). The analysis of sequential clinical trials. Biometrika 66 443-452. [246]

[196] Wiener, N. (1939). The ergodic theorem. Duke Math. J. 5 1-18. [194]

[197] Wijsman, R. A. (1960). A monotonicity property of the sequential probability ratio test. Ann. Math. Statist. 31 677-684. Correction: ibid. 3 (1975) 796. [46]

[198] Wilcoxon, F., Rhodes, L. J. and Bradley, R. A. (1963). Two sequential two-sample grouped rank tests with applications to screening experiments. Biometrics 19 58-84. [127]

[199] Wirjosudirdjo, S. (1961). Limiting Behavior of a Sequence of Density Ratios. Ph.D. Thesis, Univ. of Illinois, Urbana. Abstract: Ann. Math. Statist. 33 (1962) 296-297. [127]

[200] Wittenberg, H. (1964). Limiting distributions of random sums of independent random variables. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3 7-18. [189]

[201] Wolfowitz, J. (1947). Efficiency of sequential estimates and Wald's equation for sequential processes. Ann. Math. Statist. 18 215-230. [23, 152, 154, 155, 157]

[202] Wolfowitz, J. (1966). Remarks on the optimum character of the sequential probability ratio test. Ann. Math. Statist. 37 726-727. [45]

[203] Wong, S. P. (1968). Asymptotically optimal properties of certain sequential tests. Ann. Math. Statist. 39 1244-1263. [88]

[204] Woodroofe, M. (1977). Second order approximations for sequential point and interval estimation. Ann. Statist. 5 984-995. [190, 202]

[205] Woodroofe, M. (1982). Nonlinear Renewal Theory in Sequential Analysis. Soc. for Indust. and Appl. Math., Philadelphia, Pennsylvania. [236]

[206] Yahav, J. A. and Bickel, P. J. (1968). Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. 39 442-456. [213]

[207] Zacks, S. (1966). Sequential estimation of the mean of a lognormal distribution having a prescribed proportional closeness. Ann. Math. Statist. 37 1688-1696. [224]

Subject Index

Absolute accuracy, 200
Absorbing barriers, 36
Acceptance rejection, 44
Acceptance sampling, 106
Adaptive rule, 233
Admissible, 2, 152
Anderson's test, 75
Approximate bounds, 16
Approximations to OC and ASN functions, 36
Arbitrary continuation region, 44
Arbitrary distributions, 247
Arbitrary sequential test, 33
Arithmetic distribution, 87
Armitage's procedure, 96
ASN curve, 56
ASN function, 2, 32, 36, 44, 83, 85, 102
Asymptotic behavior of the stopping time, 51
Asymptotic consistency, 193, 198, 201, 217
Asymptotic coverage probability, 213
Asymptotic efficiency, 193, 198, 201, 215, 217, 219
    sign test, 126
Asymptotic normality, 189
Asymptotic normality of the statistic T, 66
Asymptotic optimality, 152, 192, 198, 201, 217
Asymptotic probability of coverage, 205
Asymptotic properties, 206, 235
Asymptotic variance, 228, 229
Asymptotically risk-efficient, 202
Auxiliary problem, 19, 45
Auxiliary SPRT, 53
Average risk, 45
Average sample number (ASN), 2, 21, 24, 38, 47

B-V procedure, 98
Backward induction, 111, 115, 116
Bacterial density, 229
BAN (best asymptotically normal) estimates, 79
Barnard's criteria, 65, 73
Barnard's sequential t and t²-tests, 65
Barnard's versions, 66
Bartlett's procedure, 81
Bayes binomial SPRT, 106
Bayes risk, 45
Bayes sequential procedure, 106
    normal mean, 115
Bayes Theorem, 107
Bayesian criterion of optimality, 144
Behrens-Fisher problem, 84
Bernoulli case, 177
Bernoulli data, 269
Bernoulli population, 20
Bernoulli variable, 22
Best sequential procedure, 171
Best unbiased estimate, 147
Beta family, 112
Binary response, 254
Binomial distribution, 13


Binomial proportions, 85
Bivariate normal distribution, 81, 208
Boundary values, 83
Bounded length confidence intervals, 196
Bounds for ASN, 33
Brownian motion, 50
Brownian motion process, 135

Cauchy-Schwarz inequality, 154
Central chi-square, 8
Central limit theorem, 96
Characteristic roots, 205, 206
Chebyshev's inequality, 126
Class of weight functions, 72
Closed boundaries, 47
Coefficient of variation, 171, 200
Completeness, 145, 148, 149
    bounded, 145, 148
Composite hypothesis, 59, 75
Composite test, 244
Computer program, 180
Concave function, 52
Confidence coefficient, 158
Confidence interval, 173
Confidence region, 204
Confluent hypergeometric function, 65, 76
Conjugate prior, 112
Consistency property of MLE, 240
Continuation point, 146
Continuation region, 44, 60, 113, 242, 245
Continue sampling indefinitely, 120
Continue sampling region, 17
Continue-sampling inequality, 17, 83, 89
Convergent bounds, 271
Cornish-Fisher inversion, 173
Correction for overshoot, 244
Cost of each observation, 45
Cost of experimentation, 143
Covariance matrix, 204
Covariance structure, 208
Coverage probability, 176, 191, 196, 211
Cox's procedure, 81
Cox's sequential estimation procedure, 289
Cramér-Rao inequality, 144, 148, 152
    lower bound, 144
Cumulative sums, 86
Curtailed inspection, 3
Curtailed sampling
    double, 148
    simple, 147
    symmetric, 151

Delta method, 228
Discrete time problem, 116
Dominated convergence theorem, 197
Double minimax property, 71
Double sampling, 173
    plan, 3
    procedure, 167
Double triangular tests, 242, 243
Double-blind clinical trial, 248, 254
Dynamic programming, 115
    equation, 112
    method, 111

E-M algorithm, 247
Edgeworth expansion, 172
Effective coverage probability, 253
Effective level of significance, 247, 249, 250, 256
Effective power, 249, 251, 260
Efficiency of the SPRT, 99, 100
Efficient score, 240
Empirical distribution function, 123, 128, 131
Empirical distribution functions, 216, 287
Error probabilities, 2, 9, 16, 24, 40, 45, 46, 48, 49, 58, 71, 73, 83, 88, 90, 97, 99, 124, 125, 174, 247
Error rates (see error probabilities), 51
Estimable, 149
Euler-Maclaurin sum formula, 78
Excess over the boundary, 86, 94
Expected sample number, 17
Expected sample size, 27, 99, 211
    see also Average sample number
    (see Average sample number), 2
Expected stopping time, 35
Expected time, 49
Exponential distribution, 44
Exponential family, 11, 144, 145, 177
Exponential family of densities, 237
Exponential survival curves, 85
Extremum, 187

Family of densities, 14
Finite sure termination, 70
Fisher information, 240
Fixed-width confidence intervals, 191
    large-sample, 196
Fixed-sample procedure, 5
Fixed-sample procedures, 17, 18
Fixed-sample size procedure, 90, 99, 152, 158, 201
Fixed-sample size test (FSST), 74
Fixed-size confidence bounds, 203
Fixed-width confidence interval, 246, 252
Fixed-width estimation, 234
Fixed-width interval, 158
Fundamental identity, 30

Generalized likelihood ratio tests, 86, 88
Generalized probability ratio test (GPRT), 173
Generating distributions, 290
Glivenko-Cantelli theorem, 123
Govindarajulu-Howard approximation, 73
Group sequential sampling, 111
Grouped rank tests, 127

Hall's sequential test, 279
Helmert's transformation, 61, 188
Highest reachable point, 113
Hypergeometric distribution, 237
Hypothesis of symmetry, 126
Hypothesis-testing problem, 11

Impossible point, 146
Indifference zone, 27
Inequalities for error probabilities, 16
Infinitesimal surface element, 59
Initial sample size, 1
Inner acceptance boundary, 134, 135
Intercept, 271
Intercepts, 13
Interchange of integration and summation, 155
Interchange of summation and expectation, 23
Interim stage, 246
Interval of convergence, 149
Invariance considerations, 79
Invariant sufficiency, 127
Inverse binomial sampling, 151
Inverse function, 123
Inversion formula
    for an integral, 78
Iterations, 227

Jump discontinuity, 130

Khintchine's theorem, 80
Koopman-Darmois families, 86, 88
Kronecker's delta, 50

L'Hospital's rule, 20
Lachin's formula, 256
Laplace distribution, 125
Laplace's method, 78

Large-sample properties, 79
Large-sample sequential test, 85
Law of the iterated logarithm, 122
Least squares estimate, 204
Lehmann alternatives, 127
Lethal dose, 229
Level of significance, 11
Likelihood ratio, 127
Likelihood ratio open-ended test, 89
Linear hypothesis-testing problem, 75
Log likelihood, 240
Log rank statistic, 240
Logistic distribution function, 58
Lower boundary, 93
Lower bounds for the ASN, 33
Lower confidence limit, 179
Lower stopping bound, 46

m-dimensional Euclidean space, 2
Mann-Whitney statistic, 216
Mantel-Haenszel statistic, 238
Mantel-Haenszel test, 240, 264
Matlab, 265
Maximum ASN, 102
Maximum likelihood estimate (mle), 185, 240
Maximum likelihood estimation, 191
Maximum likelihood SPRT, 81
Mean, 6
Means procedure, 219
Measure theory, 196
Median, 126
Method of maximum likelihood, 247
Method of weight functions, 59, 79
Mill's ratio, 49
Minimax, 152
Minimax solution (MMS), 166
Minimum probability ratio test (MPRT), 74
Minimum variance unbiased estimate, 216
MLE, 232
Modified likelihood ratio, 62
Modified likelihoods, 60
Modified rank order SPRT, 132
Modified SPRT, 49
Moment-generating function, 22, 149
Moments of N, 21
Monotone likelihood ratio (MLR), 46, 173, 176
Monotonicity of the power function, 46
Monotonicity property, 46
Monte Carlo results, 202
Monte Carlo studies, 121, 127, 215
Most powerful fixed-sample procedure, 100
Most powerful test, 11, 126

Natural parameter space, 89
Newton-Raphson algorithm, 78
Nominal error probability, 16
Nominal level of significance, 260
Nominal power, 260
Non-linear unbiased estimates, 204
Nonparametric confidence intervals
    p-point of a distribution function, 210
Normal response, 247
Nuisance parameter, 59, 75, 79, 81, 170, 191, 240

OC and ASN functions of T, 74
OC function, 32, 36, 44, 55, 56, 83, 85, 93, 98
    monotonicity, 103
    symmetry, 103
One-sided alternative, 8
Open-ended tests, 86, 88
Operating characteristic (OC), 2
    curve, 20
    function, 19, 47
Optimal properties of SPRT, 44
Optimal sequential procedure, 2

Optimum fixed-sample size test, 99, 102, 105
Optimum property of the SPRT, 102, 106
Optimum property of Wald's SPRT, 46
Optimum SPRT, 59
Optimum test procedure, 110
Overshoot, 246

Percentage error, 173
Perfect information, 114
Pitman-efficiencies, 215
Positive drift, 86
Posterior probability, 107
Power curve of the SPRT, 53
Power function, 58
Power of the SPRT, 53
Power of the test, 6
Predetermined alternative, 247
Prior distribution, 59
Probability of absorption, 49
Probability of type I error, 53
Properties of sequential t-test, 70
Properties of the SPRT, 61
Property M, 46
Proportional accuracy, 200

Radon-Nikodym derivative, 214
Random sample size, 25
Random walk, 36, 44, 82
Randomized design, 254
Range, 187
Rank order, 127, 130
Rank order SPRT, 287
Rank sum test, 127
Rank test procedure, 123
Rao-Blackwell Theorem, 146
Regression coefficients, 203
Regularity assumptions, 211
Rejection boundary, 133
Rejection rule, 244
Reparameterize, 228
Repeated significance tests, 236, 244
Response, 230
Restricted SPRT, 47, 271
Risk, 45
Risk function, 143
Risk-efficient estimation, 201
Robbins' power one test, 285
Robbins-Monro procedure, 227
Robbins-Monro process, 262
Rushton's approximation, 63
Rushton's t-test, 273

Sacks' criterion, 73
Sacks' version of the sequential t, 73
Sample information, 114
Sample paths, 46
Sample quantiles, 184
Sample size functions, 173
Sampling continuation region, 62, 77, 79
Sampling inspection plan, 3
    most economical, 106
Sampling plan, 148
    curtailed, 4
    double, 3
    inverse, 148
    single, 4, 148
Sampling process, 42
Scoring procedure, 109
Scoring scheme, 109
SCR-test
    modified (MSCR), 127
Semi-orthogonal matrix, 207
Sequential binomial test, 60
Sequential chi-square test, 60
Sequential configural rank test (SCR-test), 127
Sequential design problem, 1
Sequential estimation, 143, 190
Sequential estimation procedures, 156
Sequential F-test, 75, 77

Sequential likelihood ratio procedure, 84, 85
Sequential model, 144
Sequential probability ratio test
    see also SPRT, 11
    truncated, 40
Sequential procedure, 13, 144
Sequential procedures, 1, 11
Sequential rank test
    Kolmogorov-Smirnov test, 123
Sequential ranks, 137
Sequential rule, 5, 192
Sequential sampling
    group, 111
    unit-step, 111
Sequential sign test, 124
Sequential sign test (SST), 125
    efficiency, 126
Sequential signed rank test
    for symmetry, 132
Sequential Spearman-Karber procedure, 263
Sequential stopping rule, 183
Sequential t and t²-test, 61, 65
Sequential t-test, 62, 71, 275
Sequential t²-test, 71, 277
Sequential test, 37
Sequential test T, 73
Sign test, 211, 215
Simple alternative, 45
Simple exponential distribution, 12
Simple hypothesis, 45
Simulation studies, 65
Simulation study, 177
Single parameter family of distributions, 196
Single sampling plan, 4
Slope, 271
Slowly varying function, 52
Spearman-Karber estimator, 231
Specified alternative, 260
SPRT, 13, 36, 55, 59, 109, 125, 127, 128, 131, 246, 269
    difficulty, 47
    limiting relative efficiency, 105
    restricted, 47
    truncated, 40
SPRT confidence interval
    fixed length, 178
SPRT fixed-width confidence interval, 179
SPRT confidence interval, 175
SPRTs
    one-sided, 88
    simultaneous, 88
SSR statistic, 137
SST, 127
Standard deviation, 6
Standard normal distribution, 249
Standardized average, 211
State of nature, 45
Stein's test, 8, 74
Stein's two stage procedure, 10
Stein's two-stage procedure, 6, 73, 281
Stein's two-stage test, 283
Stochastically larger, 208
Stopping bounds, 91
Stopping point, 146, 147
Stopping points, 144, 155
Stopping rule, 143, 145
Stopping rules, 190
Stopping time, 51, 87, 119, 189
Stopping time of the SPRT, 99
Stopping times, 129
Stopping variable, 26, 128, 196, 209
Straight line boundaries, 47
Strong law of large numbers, 50, 188
Strong representation, 187
Student's hypothesis, 6
Student's t, 159
Successive approximation method, 171
Sufficient sequence, 144
Sufficient statistics, 98

Survival analysis, 238
Survival time, 238
Symmetric procedures, 48

t-distribution, 6, 9
t-test, 215
Taylor's series, 149
Temporary confidence interval, 179
Temporary confidence intervals, 175
Terminal decision rule, 2, 143
Termination of the experiment, 1
Test with power one, 120
Tests for three hypotheses, 90
Tests of small error probability, 119
    of power one, 117
Three point binomial, 111
Tolerance distribution, 232
Total sample size, 10
Transcendental equation, 64
Transitive sequence, 145
Transitivity, 145
Translation shifts, 229
Triangular matrix, 207
Triangular test, 242
Truncated process, 41
Truncated sequential test procedure, 44
Truncated SPRT, 40, 58
Truncation of the procedure, 29
Truncation point, 48
Two point binomial, 111
Two-sample (or two-stage) test, 7
Two-sample case, 9
Two-sided alternative, 8, 60, 96
Two-sided test, 135
Two-stage procedure, 162, 167
    for common mean, 164
    Stein's, 158
Two-stage sampling scheme, 164
Type I error probability, 254

U-statistics, 214, 219
    one-sample, 214
Uniform asymptotic expansion, 63
Uniform continuity in probability, 182
Uniformly best test, 2
Uniformly consistent solution (UCS), 166
Uniformly integrable, 212
Uniformly minimum variance unbiased estimator (UMVU), 165, 167
Uniformly most powerful unbiased test, 6
Unimodal distributions, 126
Uniqueness, 149
Uniqueness of a SPRT, 46
Unit-step sequential sampling, 111
Up and down method, 230
Upper boundary, 93
Upper confidence limit, 179
Upper percentage point, 10
Upper stopping bound, 46

Wald approximation
    to the boundary values, 33, 178
Wald's approximation
    to OC, 36
    to the ASN, 52
Wald's bounds, 178
Wald's criteria, 65, 73
Wald's equation, 154
Wald's fundamental identity, 29
Wald's inequalities, 53
Wald's method of weight functions, 75
Wald's second equation, 28
Wald's SPRT, 2, 13, 15, 24, 44, 45, 54, 55, 61, 74, 84, 121, 173
    asymptotic behavior of error rate and ASN, 51
    limiting relative efficiency, 105
Wald's stopping bounds, 74
Wald's Theorem, 26
Weak law of large numbers, 250

Wiener process, 47, 48, 115, 133
Wilcoxon test
    one-sample, 213, 214
    signed rank sequential procedure, 136
    signed rank statistic, 134