
A PARTITIONING APPROACH FOR THE SELECTION OF THE BEST TREATMENT

Yong Lin

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2013

Committee:

John T. Chen, Advisor

Arjun K. Gupta

Wei Ning

Haowen Xi, Graduate Faculty Representative

ABSTRACT

John T. Chen, Advisor

To select the best treatment among several treatments is essentially a multiple comparisons problem. Traditionally, when dealing with multiple comparisons, there is one main argument: whether or not to adjust for multiplicity. If a multiplicity adjustment such as the Bonferroni method is made, the simultaneous inference becomes too conservative. Moreover, with the conventional methods of multiple comparisons, such as Tukey's all-pairwise multiple comparisons, simultaneous confidence intervals can be obtained, but the best treatment cannot be distinguished efficiently. Therefore, in this dissertation, we propose several novel procedures using the partitioning principle to develop more efficient simultaneous confidence sets for selecting the best treatment. A partitioning-principle method for efficacy and toxicity of ordered treatments can be found in Hsu and Berger (1999). In this dissertation, by integrating the Bonferroni inequality, the partition approach is applied to unordered treatments for inference on the best one. Among the multiple comparison methodologies introduced, we mainly focus on all-pairwise multiple comparisons, because all treatments should be compared when we select the best treatment. These procedures can be used with different data forms. Chapter 2 shows how to apply the procedure to dichotomous outcomes and the analysis of contingency tables, especially with Fisher's exact test. Chapter 3 discusses the procedures in the nonparametric setting; with the Mann-Whitney test, the procedures become more robust. Chapter 4 addresses the procedures for continuous data under normality. In Chapter 5 we apply the procedures to analyze a prostate cancer study.

ACKNOWLEDGMENTS

In retrospect, many people come to mind; without their help and support, this dissertation could never have come into existence. First and foremost, I would like to thank my advisor, Dr. John Chen, for his constant support, great guidance, and many suggestions throughout this research. Dr. John Chen is not only a mentor but also a friend who gives me advice, encouragement, and the experience he shares about life in general. I would also like to thank my dissertation committee members, Dr. Arjun Gupta, Dr. Wei Ning, and Dr. Haowen Xi, for their precious time, generous support, and suggestions on my dissertation. I am also grateful to the Department of Mathematics and Statistics for providing a wonderful study, teaching, and research environment. I especially wish to thank our staff, Marcia Seubert, Mary Busdeker, Cyndi Patterson, and Barbara Berta, for all their assistance. I appreciate the graduate coordinator Dr. Tong Sun for his generous support, and the excellent courses taught by Dr. John Chen, Dr. Arjun Gupta, Dr. Jim Albert, Dr. Craig Zirbel, Dr. Wei Ning, Dr. Hanfeng Chen, Dr. Maria Rizzo, Dr. Junfeng Shang, and other faculty. I am thankful to my friends from Bowling Green, Chen, Wenren, Songzi, Lihua, Jet, and all the others, for their fun and help. Finally, I thank my beloved parents, Wenle and Guifeng, and my girlfriend Simeng for their full support, love, and encouragement.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Types of Multiple Comparisons
1.2.1 All Contrast Comparisons
1.2.2 All-Pairwise Comparisons
1.2.3 Multiple Comparisons with the Best
1.2.4 Multiple Comparisons with a Control
1.3 Studentized Maximum Modulus
1.3.1 Inferences for Studentized Maximum Modulus
1.3.2 Example: a Crystalline Drug Substance
1.4 Tukey's Method
1.4.1 Inference for Tukey's Method
1.4.2 Example: Tukey's Method for Crystalline Drug Substance
1.5 Scheffé's Method
1.5.1 Inference for Scheffé's Method
1.5.2 Example: Scheffé's Method for Crystalline Drug Substance
1.6 Bonferroni Method
1.6.1 Inference for Bonferroni Method
1.6.2 Example: Bonferroni Method for Crystalline Drug Substance
1.7 Nonparametric Approach
1.8 Fisher's Exact Test
1.8.1 Inference Using Fisher's Exact Test
1.8.2 Example: Python Eggs

CHAPTER 2: IDENTIFYING THE BEST TREATMENT USING FISHER'S EXACT TEST
2.1 Binary Data
2.2 Odds Ratio
2.3 Introduction to Partition
2.4 Main Results
2.5 Procedures
2.5.1 Procedure for Theorem 2.1
2.5.2 Procedure for Theorem 2.2
2.6 Simulation

CHAPTER 3: IDENTIFYING THE BEST TREATMENT USING MANN-WHITNEY TEST
3.1 Simultaneous Inference with Mann-Whitney Test
3.2 Large-Sample Approximation
3.2.1 Example
3.3 Main Results
3.4 Procedures
3.4.1 Procedure for Theorem 3.1
3.4.2 Procedure for Theorem 3.2
3.5 Simulation

CHAPTER 4: IDENTIFYING THE BEST TREATMENT UNDER NORMALITY
4.1 Multivariate Normal Distribution
4.2 t-test with Welch Correction
4.3 Simultaneous Inference
4.3.1 Main Results
4.3.2 Procedure for Theorem 4.1
4.3.3 Procedure for Theorem 4.2
4.4 Simulation

CHAPTER 5: APPLICATIONS IN A PROSTATE CANCER STUDY
5.1 Data Background
5.2 Main Results
5.2.1 Theorem Results
5.2.2 Analysis Results

BIBLIOGRAPHY

APPENDIX: SELECTED R AND SAS PROGRAMS
.1 Simulation for the Procedures with Fisher's Exact Test
.1.1 Pre-specified Best Treatment
.1.2 Selecting the Best Treatment from Unknown
.2 Simulation for the Procedures with Wilcoxon Mann-Whitney Test
.2.1 Pre-specified Best Treatment
.2.2 Selecting the Best Treatment from Unknown
.3 Simulation for the Procedures with Normality
.3.1 Pre-specified Best Treatment
.3.2 Selecting the Best Treatment from Unknown
.4 Applications in a Prostate Cancer Study
.4.1 Using SAS to Deal with the Original Dataset
.4.2 Using the New Procedure with Wilcoxon Mann-Whitney Test
.4.3 Using the New Procedure under Normality

LIST OF FIGURES

2.1 Partition of a Set S

3.1 Procedure for Theorem 3.1
3.2 Procedure for Theorem 3.2 at Stage 1
3.3 Procedure for Theorem 3.2 at Stage 2
3.4 Procedure for Theorem 3.2 from Stage 3 to Stage k

4.1 Procedure for Theorem 4.1
4.2 Procedure for Theorem 4.2 at Stage 1
4.3 Procedure for Theorem 4.2 at Stage 2
4.4 Procedure for Theorem 4.2 from Stage 3 to Stage k

5.1 Flow Chart for the Prostate Cancer Study
5.2 Frequency Table for the Prostate Cancer Data
5.3 QQ-plot for Treatments
5.4 Median and Standard Deviation for 12 Treatments

LIST OF TABLES

1.1 Impurities of Product under One Dose of Irradiation
1.2 Simultaneous Confidence Interval by Studentized Maximum Modulus
1.3 Simultaneous Confidence Interval by Tukey's Method
1.4 Simultaneous Confidence Interval by Scheffé's Method
1.5 Simultaneous Confidence Interval by Bonferroni Method
1.6 2 × 2 Table of Outcomes
1.7 Hatched Eggs

2.1 2 × 2 Table of Patients
2.2 2 × k Table for k Treatments
2.3 Coverage Probability with C.L. = .95 and Different Trial Numbers
2.4 Coverage Probability with Trial Number n = 90 and Different Orders
2.5 Coverage Probability with Trial Number n = 100 and Different Response Shapes

3.1 Mirror Therapy
3.2 Coverage Probability with C.L. = .95 and Different Sample Sizes
3.3 Coverage Probability with C.L. = .95 and Sample Size n = 30
3.4 Coverage Probability with Sample Size n = 30 and Different Median Shapes

4.1 Coverage Probability with Different Orders under Normality
4.2 Coverage Probability with Different Sample Sizes under Normality
4.3 Coverage Probability with Different Variances under Normality
4.4 Coverage Probability with Different Mean Shapes under Normality

CHAPTER 1

INTRODUCTION

1.1 Background

When estimating a population parameter, one may use point estimators or interval estimators. In practice, interval estimators are preferred because they convey the reliability of the estimate, and the confidence interval, a particular kind of interval estimate, is the most commonly used. For example, suppose $Y_1, Y_2, \ldots, Y_n$ is a simple random sample from a normal distribution $N(\mu, \sigma^2)$ with known $\sigma$. Then the $100(1-\alpha)\%$ confidence interval for $\mu$ is $(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n})$, where $z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal distribution. Similarly, if more than one parameter is estimated at the same time, simultaneous confidence intervals should be applied.

Normally, simultaneous confidence intervals constitute a confidence region that estimates a multivariate parameter. Here, we take the case of two means as an example. Let $Y_{i1}, Y_{i2}, \ldots, Y_{in}$ be a simple random sample from a normal distribution with mean $\mu_i$ and known variance $\sigma_i^2$, for $i = 1, 2$. If $\mu_1$ and $\mu_2$ are to be estimated simultaneously, simultaneous confidence intervals are applicable. By the Bonferroni method, the $100(1-\alpha)\%$ simultaneous confidence intervals for the $\mu_i$ are $(\hat{Y}_i - t_{\frac{\alpha}{2\times 2},\,n-1}\,\sigma_i/\sqrt{n},\ \hat{Y}_i + t_{\frac{\alpha}{2\times 2},\,n-1}\,\sigma_i/\sqrt{n})$, where $t_{\frac{\alpha}{2\times 2},\,n-1}$ is the upper $\alpha/4$ critical value of the t-distribution with $n-1$ degrees of freedom and $\hat{Y}_i$ is the point estimate of the mean $\mu_i$.

The simultaneous confidence interval is one kind of simultaneous statistical inference. Traditionally, simultaneous statistical inference is inference on several parameters at once, and there are several techniques for it, especially in confidence estimation: the studentized maximum modulus, the Tukey-Kramer method, Scheffé's method, the Bonferroni method, Duncan's multiple range tests, and so on. To some extent, simultaneous statistical inference is also multiple comparisons. Multiple comparisons are frequently encountered in industry, clinical trials, and social research, among other fields. Multiple comparisons compare two or more treatments. If we are concerned with inference on k treatment means, denoted $\mu_1, \mu_2, \ldots, \mu_k$, the contrasts of $\mu_1, \mu_2, \ldots, \mu_k$ are the parameters of interest. A contrast of the k treatment means is a linear combination of them with coefficients adding up to zero: symbolically, a contrast of the treatment means $\mu_i$ is

$$c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k = \sum_{j=1}^{k} c_j\mu_j \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0.$$
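As a quick illustration, the following R sketch computes these Bonferroni simultaneous intervals for two means. The data are made up and the known standard deviations are hypothetical; the formula follows the text above.

# Bonferroni simultaneous 95% intervals for two normal means,
# using the critical value t_{alpha/4, n-1} as in the text.
set.seed(1)
n <- 20; sigma <- c(1.5, 2.0)                  # hypothetical known SDs
y1 <- rnorm(n, 5, sigma[1]); y2 <- rnorm(n, 6, sigma[2])
alpha <- 0.05
tcrit <- qt(1 - alpha / 4, df = n - 1)         # upper alpha/(2*2) point
rbind(mu1 = mean(y1) + c(-1, 1) * tcrit * sigma[1] / sqrt(n),
      mu2 = mean(y2) + c(-1, 1) * tcrit * sigma[2] / sqrt(n))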

1.2 Types of Multiple Comparisons

Multiple comparisons can be categorized by the parameters of primary interest, where the four most common types are: 1. all-contrast comparisons; 2. all-pairwise comparisons; 3. multiple comparisons with the best; 4. multiple comparisons with a control.

Here, to better understand these four types of multiple comparisons, we consider the one-way model as an illustrative example.

Suppose there are k treatments and, under the ith treatment, a random sample $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ is taken with mean $\mu_i$ and unknown common variance $\sigma^2$, for $i = 1, 2, \ldots, k$. These random samples are independent among the k treatments. Then, the one-way model can be formulated as

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, n_i,$$

where the $\epsilon_{ij}$ are independent and identically distributed from a normal distribution with mean 0 and unknown variance $\sigma^2$, written $\epsilon_{ij} \sim N(0, \sigma^2)$.

1.2.1 All Contrast Comparisons

In all-contrast comparisons, as the name suggests, the parameters of primary interest are all contrasts. Symbolically, the primary interests are

$$\sum_{i=1}^{k} c_i\mu_i \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0,$$

where $c_i$ is the coefficient of $\mu_i$. For example, $\theta = \mu_1 - \frac{\mu_2 + \mu_3}{2}$ is one of the contrasts, with $c_1 = 1$, $c_2 = c_3 = -\frac{1}{2}$, and $c_4 = c_5 = \cdots = c_k = 0$, so that $\sum_{i=1}^{k} c_i = 0$. Here, $\theta$ compares the effect of treatment 1 with the mean effect of treatments 2 and 3. Since there are infinitely many linear combinations of the $\mu_i$ with coefficients $c_i$ satisfying the constraint $\sum_{i=1}^{k} c_i = 0$, there is an infinite number of contrasts in all-contrast comparisons.

1.2.2 All-Pairwise Comparisons

Pairwise comparisons compare treatments in pairs, and all-pairwise comparisons comprise all such comparisons without replication. Thus, in total, there are $\binom{k}{2} = k(k-1)/2$ pairwise differences of treatment means in all-pairwise comparisons, which can be expressed as

$$\mu_i - \mu_j, \quad \text{for all } i < j.$$

The pairwise differences of treatment means are the parameters of primary interest. For example, if k = 4 in the one-way model, the number of pairwise differences of treatment means is $\binom{4}{2} = 6$; in this case, the parameters of interest are $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, $\mu_3 - \mu_4$.

1.2.3 Multiple Comparisons with the Best

Generally, the parameters of primary interest for multiple comparisons with the best are the differences between each treatment and the best of the other treatments. For instance, in the one-way model, if a larger treatment effect value implies a better treatment, the parameters of primary interest are

$$\theta_i = \mu_i - \max_{j \neq i} \mu_j, \quad i = 1, \ldots, k.$$

If $\theta_i < 0$, then treatment i is not the best; otherwise, if there exists an integer i such that $\theta_i > 0$, then treatment i is the best. For a simple example in the one-way model, suppose k = 3 and $\mu_1 = 1$, $\mu_2 = 3$, $\mu_3 = 2$; then $\theta_1 < 0$, $\theta_2 > 0$, $\theta_3 < 0$, so the second treatment is the best. Similarly, if a smaller treatment effect value implies a better treatment, the parameters of primary interest become

$$\mu_i - \min_{j \neq i} \mu_j, \quad i = 1, \ldots, k.$$

In fact, whether the smallest or the largest treatment effect value indicates the best treatment, once the best treatment is known the problem becomes one of multiple comparisons with a control, introduced in the following section.

1.2.4 Multiple Comparisons with a Control

Multiple comparisons with a control compare each treatment with a designated control. Suppose treatment k is the control; then the parameters of primary interest are

$$\mu_i - \mu_k, \quad i = 1, \ldots, k - 1.$$

When the underlying model is normal, the best-known method for multiple comparisons with a control is Dunnett's method. In terms of methodology, multiple comparison procedures can be sorted into two classes: stepwise and single-step approaches. For example, the Holm-Bonferroni method is a stepwise method, while single-step approaches include Tukey's method, Scheffé's method, the Bonferroni method, and so on.

1.3 Studentized Maximum Modulus

1.3.1 Inferences for Studentized Maximum Modulus

The studentized maximum modulus, introduced by Tukey, Roy, and Bose, is based on the one-way model. Suppose there are k treatments and, for each treatment i in 1, ..., k, a simple random sample $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ is normally distributed with mean $\mu_i$ and variance $\sigma^2$. Then the one-way model is

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, \ldots, k, \; j = 1, \ldots, n_i, \qquad (1.1)$$

where $\epsilon_{11}, \epsilon_{12}, \ldots, \epsilon_{kn_k}$ are independent and identically distributed normally with mean 0 and unknown variance $\sigma^2$, and $n_i$ stands for the sample size of treatment i. Moreover, the sample means $\hat\mu_i$ and the pooled sample variance $S^2 = \hat\sigma^2$ are

$$\hat\mu_i = \bar{Y}_i = \sum_{j=1}^{n_i} Y_{ij}/n_i, \qquad (1.2)$$

$$S^2 = \hat\sigma^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \Big/ \sum_{i=1}^{k} (n_i - 1). \qquad (1.3)$$

Then, the studentized maximum modulus statistic is

$$|m|_{k,\nu} = \max_{1 \le i \le k} \frac{|\hat\mu_i - \mu_i|}{s/\sqrt{n_i}}. \qquad (1.4)$$

Let $\nu = \sum_{i=1}^{k}(n_i - 1)$. Then

$$\nu\hat\sigma^2/\sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_i)^2}{\sigma^2},$$

which has a $\chi^2$ distribution with $\nu$ degrees of freedom. Then the exact $100(1-\alpha)\%$ simultaneous confidence intervals for the $\mu_i$ are

$$\mu_i \in \hat\mu_i \pm |m|^{\alpha}_{k,\nu}\,\hat\sigma/\sqrt{n_i}, \quad i = 1, \ldots, k, \qquad (1.5)$$

where $|m|^{\alpha}_{k,\nu}$ is the upper $\alpha$ point of the studentized maximum modulus distribution with parameters k and $\nu$. Since

$$
P\{\mu_i \in \hat\mu_i \pm |m|^{\alpha}_{k,\nu}\,\hat\sigma/\sqrt{n_i},\ i = 1, \ldots, k\}
= E_{\hat\sigma/\sigma}\big[P\{\sqrt{n_i}\,|\hat\mu_i - \mu_i|/\sigma \le |m|^{\alpha}_{k,\nu}(\hat\sigma/\sigma) \text{ for } i = 1, \ldots, k\} \,\big|\, \hat\sigma/\sigma\big]
= \int_0^{\infty} [\Phi(|m|^{\alpha}_{k,\nu}\,t) - \Phi(-|m|^{\alpha}_{k,\nu}\,t)]^k \,\gamma_\nu(t)\,dt
= 1 - \alpha,
$$

where $\Phi$ is the standard normal distribution function and $\gamma_\nu$ is the density of $s/\sigma$, the critical value $|m|^{\alpha}_{k,\nu}$ can be computed as the solution of the equation

$$\int_0^{\infty} [\Phi(|m|^{\alpha}_{k,\nu}\,t) - \Phi(-|m|^{\alpha}_{k,\nu}\,t)]^k \,\gamma_\nu(t)\,dt = 1 - \alpha. \qquad (1.6)$$

1.3.2 Example: a Crystalline Drug Substance

In Mario Parra, Pilar Rodriguez-Loaiza and Salvador Namur (2003), there is a case study on the development of a crystalline drug substance, referred to as the product. The product contains four major organic impurities: A, B, C, D, whose values are measured in ppm (parts per million). During evaluation of the product, different doses of gamma ray irradiation are used to sterilize it, so the impurity levels of the product change. Part of the data is shown in Table 1.1. From the data, the mean estimators of

Table 1.1: Impurities of Product under One Dose of Irradiation

Impurity   Baseline   1 dose   Difference
A          513        590      77
A          380        703      323
A          717        843      126
A          673        770      97
B          87         613      526
B          67         540      473
B          133        497      364
B          143        577      434
C          27         680      653
C          10         633      623
C          47         430      383
C          60         430      370
D          23         43       20
D          77         120      43
D          110        143      33
D          137        137      0

the difference for A, B, C, D are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24,$$

and the pooled standard deviation of the sample is $\hat\sigma = 226.1564$. The degrees of freedom for the studentized maximum modulus are k = 4 and $\nu = 12$. Therefore, for the significance level $\alpha = 0.05$, $|m|^{0.05}_{4,12} = 3.095$, and

$$|m|^{0.05}_{4,12}\,\hat\sigma/\sqrt{n_i} = \frac{3.095 \times 226.1564}{2} \approx 349.98.$$

Thus, the 95% studentized maximum modulus simultaneous confidence intervals for the four kinds of impurity are given as follows.

Table 1.2: Simultaneous Confidence Interval by Studentized Maximum Modulus

Impurity   Simultaneous C.I.
A          155.75 ± 349.98
B          449.25 ± 349.98
C          507.25 ± 349.98
D          24 ± 349.98

From Table 1.2, at the 95% overall confidence level, the changes in organic impurities A and D are not significant under one dose of gamma ray irradiation, while the changes in organic impurities B and C are significant. Moreover, under one dose of gamma ray irradiation, the impurity values of B and C have increased.
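As a check, equation (1.6) can be solved numerically. Below is a minimal R sketch; the helper name smm_quantile is ours, not from the dissertation's appendix, and gamma_nu is derived from $\nu s^2/\sigma^2 \sim \chi^2_\nu$.

# Solve equation (1.6) for the studentized maximum modulus point |m|_{k,nu}^alpha.
smm_quantile <- function(k, nu, alpha = 0.05) {
  gamma_nu <- function(t) 2 * nu * t * dchisq(nu * t^2, df = nu)  # density of s/sigma
  coverage <- function(m)
    integrate(function(t) (pnorm(m * t) - pnorm(-m * t))^k * gamma_nu(t),
              lower = 0, upper = Inf)$value
  uniroot(function(m) coverage(m) - (1 - alpha), c(1, 10))$root
}
m_crit <- smm_quantile(k = 4, nu = 12)   # should be close to 3.095
m_crit * 226.1564 / sqrt(4)              # margin, close to 349.98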

1.4 Tukey’s Method

1.4.1 Inference for Tukey’s Method

Tukey's method, also called the studentized range method, is a single-step multiple comparison procedure and one of the techniques for all-pairwise multiple comparisons. Thus, if there are $\binom{k}{2} = k(k-1)/2$ pairwise differences of k treatment means in a multiple comparison problem, Tukey's method is applicable.

For all-pairwise multiple comparisons, the parameters of primary interest are $\mu_i - \mu_j$ for all $i \neq j$. This kind of problem can be formulated as the hypothesis test

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$

or equivalently

$$H_0: \mu_i - \mu_j = 0 \quad \text{for all } i \neq j.$$

Let $\{Y_{ij};\ i = 1, \ldots, k,\ j = 1, \ldots, n\}$ be k independent samples with the same sample size n, independently and normally distributed with common variance $\sigma^2$ and means $\mu_i$. This is the one-way model of Section 1.2 with an equal number of observations in each treatment, that is, $n_i = n$ for $i = 1, \ldots, k$. The sample means and the pooled sample variance become

$$\hat\mu_i = \bar{Y}_i = \sum_{j=1}^{n} Y_{ij}/n,$$

$$\hat\sigma^2 = MSE = \sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_i)^2 / [k(n-1)].$$

Then, the simultaneous confidence intervals for all-pairwise differences in Tukey's method are

$$\mu_i - \mu_j \in \hat\mu_i - \hat\mu_j \pm |q^*|\,\hat\sigma\sqrt{2/n} \quad \text{for all } i \neq j, \qquad (1.7)$$

where $|q^*| = q^{\alpha}_{k,k(n-1)}$ is the upper $100\alpha$ percent point of the studentized range distribution with parameters k and $\nu = k(n-1)$, and is the solution to the equation

$$P\left\{\frac{|\hat\mu_i - \mu_i - (\hat\mu_j - \mu_j)|}{\hat\sigma\sqrt{2/n}} \le |q^*| \text{ for all } i > j\right\} = 1 - \alpha. \qquad (1.8)$$

Numerically, |q∗| is the solution of the equation:

$$k\int_0^{\infty}\int_{-\infty}^{\infty} [\Phi(z) - \Phi(z - \sqrt{2}\,|q^*|s)]^{k-1}\,d\Phi(z)\,\gamma(s)\,ds = 1 - \alpha, \qquad (1.9)$$

since:

$$
\begin{aligned}
&P\left\{\frac{|\hat\mu_i - \mu_i - (\hat\mu_j - \mu_j)|}{\hat\sigma\sqrt{2/n}} \le |q^*| \text{ for all } i > j\right\} \\
&\quad= \sum_{i=1}^{k} P\{-|q^*|\hat\sigma\sqrt{2/n} < \hat\mu_i - \hat\mu_j - (\mu_i - \mu_j) < |q^*|\hat\sigma\sqrt{2/n} \\
&\qquad\qquad \text{for all } j \neq i, \text{ and } \hat\mu_i - \mu_i = \max_{j=1,2,\ldots,k}(\hat\mu_j - \mu_j)\} \\
&\quad= \sum_{i=1}^{k} P\{0 < \hat\mu_i - \hat\mu_j - (\mu_i - \mu_j) < |q^*|\hat\sigma\sqrt{2/n} \text{ for all } j \neq i\} \\
&\quad= \sum_{i=1}^{k} P\{0 < \sqrt{n}(\hat\mu_i - \mu_i)/\sigma - \sqrt{n}(\hat\mu_j - \mu_j)/\sigma < \sqrt{2}\,|q^*|\hat\sigma/\sigma \text{ for all } j \neq i\} \\
&\quad= k\,P\{0 < \sqrt{n}(\hat\mu_1 - \mu_1)/\sigma - \sqrt{n}(\hat\mu_j - \mu_j)/\sigma < \sqrt{2}\,|q^*|\hat\sigma/\sigma \text{ for } j = 2, 3, \ldots, k\} \\
&\quad= k\int_0^{\infty}\int_{-\infty}^{\infty} [\Phi(z) - \Phi(z - \sqrt{2}\,|q^*|s)]^{k-1}\,d\Phi(z)\,\gamma(s)\,ds \\
&\quad= 1 - \alpha.
\end{aligned}
$$

Tukey's method can also be applied with unequal sample sizes to obtain simultaneous confidence intervals, in which case it is known as the Tukey-Kramer method. However, when the number of treatments is large, Tukey's method loses some power, and other methods such as the studentized maximum modulus can be used instead.

1.4.2 Example: Tukey’s Method for Crystalline Drug Substance

From the studentized maximum modulus method, simultaneous confidence intervals for the crystalline drug substance were obtained in Section 1.3.2. Tukey's method can also be applied to this case study. If the hypotheses are $H_0: \mu_i - \mu_j = 0$ versus $H_1: \mu_i - \mu_j \neq 0$ for all i, j = A, B, C, D with $i \neq j$, then from Table 1.1, the estimates are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad \hat\sigma = 226.1564.$$

The degrees of freedom for Tukey's method are k = 4 and N − k = 12. For the significance level $\alpha = 0.05$, $|q^*| = q_{0.05,4,12} \approx 4.20$, and

$$|q^*|\,\hat\sigma\sqrt{2/n} = 4.20 \times 226.1564 \times \sqrt{2/4} \approx 671.65.$$

Then the 95% simultaneous confidence intervals for the differences between the effects of one dose of gamma ray irradiation on the different impurities of the crystalline drug substance, by Tukey's method, are presented in Table 1.3.

Table 1.3: Simultaneous Confidence Interval by Tukey’s method

Impurity   Simultaneous C.I.
A-B        −293.5 ± 671.65
A-C        −351.5 ± 671.65
A-D        131.75 ± 671.65
B-C        −58 ± 671.65
B-D        425.25 ± 671.65
C-D        483.25 ± 671.65

From Table 1.3, at the overall confidence level .95, there is no significantly different effect between any two impurities under one dose of gamma ray irradiation.
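The Table 1.3 margin can be reproduced in R with the studentized range quantile; a short sketch following the text's computation for equation (1.7):

# Tukey margin for k = 4 treatments, nu = 12, n = 4 observations per group.
qstar <- qtukey(0.95, nmeans = 4, df = 12)     # approximately 4.20
margin <- qstar * 226.1564 * sqrt(2 / 4)       # approximately 671.65
muhat <- c(A = 155.75, B = 449.25, C = 507.25, D = 24)
diffs <- outer(muhat, muhat, "-")              # pairwise differences mu_i - mu_j
diffs[upper.tri(diffs)]                        # compare each against +/- margin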

1.5 Scheffé's Method

1.5.1 Inference for Scheffé's Method

Scheffé's method is a single-step multiple comparison procedure which constructs a simultaneous confidence band for all contrasts, and it is often used in the analysis of variance (ANOVA). Furthermore, Scheffé's method can be extended to arbitrary linear spaces. Suppose $Y = (Y_1, \ldots, Y_n)$ is a vector of n components with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)$; for each component, there is a linear model

$$Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i \in \{1, 2, \ldots, n\}, \qquad (1.10)$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$ follows $N(0, \sigma^2 I_n)$. Let $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$; then

$$E(Y) = \mu = X\beta \qquad (1.11)$$

with the independent variable matrix

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}. \qquad (1.12)$$

From the general linear model, the least squares and maximum likelihood estimator $\hat\beta$ of $\beta$ and the unbiased estimator $s^2$ of $\sigma^2$ (if $(X^TX)^{-1}$ exists) are

$$\hat\beta = (X^TX)^{-1}X^TY, \qquad s^2 = Y^T(I - X(X^TX)^{-1}X^T)Y/(n - p).$$

$\hat\beta$ and $s^2$ have the following properties:

$$\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1}), \qquad \frac{(n-p)s^2}{\sigma^2} \sim \chi^2_{n-p}, \qquad \hat\beta \text{ and } s^2 \text{ are independent}.$$

Let $L_d$ be any d-dimensional linear space with $d \le p$. For any $l \in L_d$, the corresponding linear combination is $l^T\beta = \sum_{i=1}^{p} l_i\beta_i$, and $l^T\hat\beta$ follows $N(l^T\beta, \sigma^2\,l^T(X^TX)^{-1}l)$. Thus,

$$\max_{l \in L_d} \frac{(l^T\beta - l^T\hat\beta)^2}{s^2\,l^T(X^TX)^{-1}l} \sim d\,F_{d,n-p}.$$

Therefore, the $100(1-\alpha)\%$ simultaneous confidence interval for $l^T\beta$, for any $l \in L_d$, is

$$l^T\beta \in l^T\hat\beta \pm (d\,F^{\alpha}_{d,n-p})^{1/2}\,s\,(l^T(X^TX)^{-1}l)^{1/2}, \qquad (1.13)$$

where $F^{\alpha}_{d,n-p}$ is the upper $100\alpha$ percent point of the F distribution with d degrees of freedom in the numerator and $n - p$ in the denominator. For any contrast $\sum_{i=1}^{k} c_i\mu_i$ with $c_1 + c_2 + \cdots + c_k = 0$, let $l_i = c_i$ and $\beta_i = \mu_i$ for all i in $1, \ldots, k$; thus $d = k - 1$ here. Then Scheffé's $100(1-\alpha)\%$ simultaneous confidence intervals for all contrasts of $\mu_1, \ldots, \mu_k$ are

$$\sum_{i=1}^{k} c_i\mu_i \in \sum_{i=1}^{k} c_i\hat\mu_i \pm \sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{\sum_{i=1}^{k} c_i^2/n}. \qquad (1.14)$$

Therefore, pairwise comparisons can be deduced from Scheffé's confidence set by specializing to $c_i = 1$, $c_j = -1$, and 0 for the other c's. As a result, for all-pairwise comparisons in the one-way model with equal sample sizes, the $100(1-\alpha)\%$ simultaneous confidence intervals for the pairwise differences $\mu_i - \mu_j$ are

$$\mu_i - \mu_j \in \hat\mu_i - \hat\mu_j \pm \sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{2/n}, \quad \text{for all } i \neq j. \qquad (1.15)$$

Given that, if different values of $c_i$ are selected, Scheffé's method provides the corresponding simultaneous confidence interval. For all-pairwise comparisons in the balanced one-way model, Scheffé's method is too conservative; however, if the contrasts are not restricted to pairwise differences, Scheffé's method is suggested. Furthermore, since Scheffé's method covers all possible comparisons, the Scheffé multiple comparison procedure is more conservative relative to other methods.
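A generic R sketch of interval (1.14) for an arbitrary contrast follows; the helper scheffe_ci and its inputs are hypothetical, assuming a balanced one-way layout with k groups of size n.

# Scheffe 100(1 - alpha)% interval for a contrast sum(c_i * mu_i),
# with nu = k(n - 1) pooled degrees of freedom, per equation (1.14).
scheffe_ci <- function(muhat, cvec, sigma_hat, n, alpha = 0.05) {
  k <- length(muhat); nu <- k * (n - 1)
  margin <- sqrt((k - 1) * qf(1 - alpha, k - 1, nu)) *
    sigma_hat * sqrt(sum(cvec^2) / n)
  sum(cvec * muhat) + c(-1, 1) * margin
}
# The pairwise contrast c = (1, -1, 0, 0) recovers the intervals of (1.15).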

1.5.2 Example: Scheffé's Method for Crystalline Drug Substance

As in the example of Section 1.3.2, Scheffé's method can also be used to obtain simultaneous confidence intervals for the differences in impurity after gamma ray irradiation. From Table 1.1, the mean estimators of the differences for A, B, C, D and the estimated pooled standard deviation are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad \hat\sigma = 226.1564.$$

The degrees of freedom for Scheffé's method are $k - 1 = 3$ and $\nu = 12$. Therefore, at significance level $\alpha = 0.05$, for the hypotheses $H_0: \mu_i - \mu_j = 0$ versus $H_1: \mu_i - \mu_j \neq 0$, $(k-1)F_{\alpha,k-1,\nu} = 3F_{.05,3,12} \approx 10.47$ and

$$\sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{2/n} = \frac{10.47 \times 226.1564}{\sqrt{2/4}} = 3348.656.$$

For the hypotheses $H_0: \mu_i = 0$ versus $H_1: \mu_i \neq 0$,

$$\sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{1/n} = \frac{10.47 \times 226.1564}{\sqrt{1/4}} = 4735.715.$$

Then the 95% simultaneous confidence intervals for the impurity of the crystalline drug substance by Scheffé's method are shown in Table 1.4.

Table 1.4: Simultaneous Confidence Interval by Scheff´e’sMethod

Impurity   Simultaneous C.I.
A          155.75 ± 4735.715
B          449.25 ± 4735.715
C          507.25 ± 4735.715
D          24 ± 4735.715
A-B        −293.5 ± 3348.656
A-C        −351.5 ± 3348.656
A-D        131.75 ± 3348.656
B-C        −58 ± 3348.656
B-D        425.25 ± 3348.656
C-D        483.25 ± 3348.656

From Table 1.4, at confidence level 0.95, there is no significant effect for any impurity under one dose of gamma ray irradiation, and there is also no significantly different effect between any two impurities. Compared with the studentized maximum modulus and Tukey simultaneous confidence intervals, the margin of error is largest for Scheffé's method.

1.6 Bonferroni Method

1.6.1 Inference for Bonferroni Method

The Bonferroni inequality is the basic theorem behind the Bonferroni adjustment. Moreover, the Bonferroni inequality can be obtained from Boole's inequality, so it is necessary to introduce Boole's inequality concisely.

Boole's inequality: if P is a probability function, then for any sets $A_1, A_2, \ldots$,

$$P(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} P(A_i), \quad \text{if } \sum_{i=1}^{\infty} P(A_i) < \infty.$$

For the one-way model

$$Y_i = \mu_i + \varepsilon_i, \quad i = 1, 2, \ldots, k,$$

where $\varepsilon_i$ follows $N(0, \sigma_i^2)$, suppose $\hat\mu_i$ and $s_i^2$ are the estimators of $\mu_i$ and $\sigma_i^2$, respectively. Then, under the assumption that $Y_i$ is independent of $s_i^2$, $i = 1, \ldots, k$, we have the t statistic

$$T_i = \frac{\hat\mu_i - \mu_i}{s_i} \sim t_{\nu_i}, \quad i = 1, \ldots, k, \qquad (1.16)$$

which follows a t distribution with $\nu_i$ degrees of freedom. Thus, for an individual $\mu_i$, the $100(1-\alpha)\%$ confidence interval is

$$\mu_i \in \hat\mu_i \pm t_{\alpha/2,\,\nu_i}\,s_i.$$

To test the hypotheses $H_{0i}: \mu_i = 0$ versus $H_{1i}: \mu_i \neq 0$, for i = 1, 2, ..., k, there are k t-tests. For each hypothesis, controlling the significance level at $\alpha$, we have

$$P(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true}) = \alpha.$$

If all k hypotheses are tested simultaneously, the familywise error rate (the probability that at least one true $H_{0i}$ is rejected) can exceed $\alpha$. Thus, the significance level for each individual test should be adjusted. Suppose the significance level for each test is $\alpha^{\star}$, and let $E_i^c$ denote the event $(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true})$. Then

$$P(\cup_{i=1}^{k} (T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true})) = P(\cup_{i=1}^{k} E_i^c) \le \sum_{i=1}^{k} P(E_i^c) = kP(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true}) = k\alpha^{\star}, \qquad (1.17)$$

where the inequality in (1.17) is from Boole's inequality. Setting $k\alpha^{\star} = \alpha$ gives $\alpha^{\star} = \frac{\alpha}{k}$.

Therefore, a set of conservative 100(1−α)% simultaneous confidence intervals for µ1, ..., µk is:

$$\mu_i \in \hat\mu_i \pm t_{\alpha/2k,\,\nu_i}\,s_i, \quad i = 1, \ldots, k. \qquad (1.18)$$

If the variances $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$, and the variables

$$T_i = \frac{|\hat\mu_i - \mu_i|}{s_i}, \quad i = 1, \ldots, k, \qquad (1.19)$$

are independent, where $s_1 = s_2 = \cdots = s_k = s$, then the product inequality method is applicable. In that case, $100(1-\alpha)\%$ simultaneous confidence intervals for $\mu_1, \ldots, \mu_k$ are $\mu_i \in \hat\mu_i \pm t_{[1-(1-\alpha)^{1/k}]/2,\,\nu_i}\,s$ for i = 1, ..., k, where $s^2$ is the estimator of $\sigma^2$. Compared with the product inequality confidence intervals, the Bonferroni inequality confidence intervals are always more conservative, because

$$(1-\alpha)^{1/k} < 1 - \alpha/k \qquad (1.20)$$

for all $\alpha$ and $k > 1$. However, both are too conservative if there are too many treatments to compare. For example, for 95% simultaneous confidence intervals for $\mu_1, \ldots, \mu_{100}$ with $\nu = 10$ degrees of freedom, the critical value of the t-statistic in the Bonferroni method is $t_{.05/200,10}$, which makes the confidence intervals too wide and the estimates too conservative. In this situation, Scheffé's or Tukey's method may be applied when appropriate.
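A short R comparison of the two critical values illustrates inequality (1.20); the values k = 6 and nu = 3 anticipate the setting of the next example.

# Bonferroni vs. product-inequality critical values for k simultaneous t intervals.
k <- 6; nu <- 3; alpha <- 0.05
t_bonf <- qt(1 - alpha / (2 * k), df = nu)                # about 6.23
t_prod <- qt(1 - (1 - (1 - alpha)^(1 / k)) / 2, df = nu)  # slightly smaller
c(bonferroni = t_bonf, product = t_prod)                  # Bonferroni is wider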

1.6.2 Example: Bonferroni Method for Crystalline Drug Substance

In Section 1.3.2, there are four groups A, B, C, D. Under the hypotheses $H_0: \mu_i = \mu_j$ versus $H_1: \mu_i \neq \mu_j$ for i, j = A, B, C, D and $i \neq j$, the Bonferroni simultaneous confidence intervals for $\mu_i - \mu_j$ are

$$\hat\mu_i - \hat\mu_j \pm t_{\alpha/2k,\,\nu}\,s_i\sqrt{2/n}.$$

From the data in Table 1.1, the mean and pooled standard deviation estimators are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad s_i = \hat\sigma = 226.1564.$$

The number of degrees of freedom for the Bonferroni method is $\nu = 3$ and there are $\binom{4}{2} = 6$ simultaneous null hypotheses. Therefore, for overall significance level $\alpha = 0.05$,

$$t_{\alpha/2k,\,\nu} = t_{.05/(2\times 6),\,3} \approx 6.23, \qquad t_{\alpha/2k,\,\nu}\,s_i\sqrt{2/n} = 6.23 \times 226.1564 \times \sqrt{2/4} \approx 996.28.$$

Then, the 95% simultaneous confidence intervals for $\mu_i - \mu_j$ by the Bonferroni method are given in Table 1.5.

Table 1.5: Simultaneous Confidence Interval by Bonferroni Method

Impurity   Simultaneous C.I.
A-B        −293.5 ± 996.28
A-C        −351.5 ± 996.28
A-D        131.75 ± 996.28
B-C        −58 ± 996.28
B-D        425.25 ± 996.28
C-D        483.25 ± 996.28

At confidence level 0.95, there are no significantly different mean effects between any two impurities under one dose of gamma ray irradiation. Both the Bonferroni and Tukey methods deal with all pairwise comparisons; however, compared with Tukey's method in these six comparisons, the margin of error of the Bonferroni method is larger. That is, the Tukey simultaneous confidence intervals are narrower than the Bonferroni ones, because Tukey's method attains the exact overall confidence level. Thus, for all pairwise comparisons, Tukey's method is preferred.

1.7 Nonparametric Approach

The Mann-Whitney test (Wilcoxon rank sum test) is a distribution-free rank sum test. In the two-sample location problem, $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are random samples from two different continuous populations. If the X's and Y's satisfy the following independence assumptions, the Mann-Whitney test can be used:
a) the X's are independent and identically distributed, and the Y's are independent and identically distributed;
b) the X's and Y's are mutually independent.
Let M(X) and M(Y) denote the medians of populations X and Y, respectively, and let

$$\Delta = M(Y) - M(X)$$

be the difference between the population medians. The null hypothesis $H_0$ can then be expressed as

$$H_0: \Delta = 0.$$

The hypothesis asserts that the population medians are equal; equivalently, in treatment comparisons, it means the treatment has no effect. For the two-sided hypotheses $H_0: \Delta = 0$ versus $H_1: \Delta \neq 0$, the Wilcoxon rank sum statistic W is used. To obtain W, we first combine the values of the X's and Y's and rank them from smallest to largest in a joint ordering of all m + n observations. Let $R_j$ denote the rank of $Y_j$ in this joint ordering of the X's and Y's. As a result,

$$W = \sum_{j=1}^{n} R_j. \qquad (1.21)$$

The decision rule is: if $W \ge w_{\alpha/2}$ or $W \le n(m+n+1) - w_{\alpha/2}$, then the null hypothesis is rejected at significance level $\alpha$, where $w_{\alpha/2}$ is the upper $\alpha/2$ percentile critical value of the null distribution of W. Similarly, for the one-sided upper-tail hypothesis test $H_0: \Delta = 0$ versus $H_1: \Delta > 0$, the rejection region is $[w_{\alpha}, \infty)$. Mann and Whitney (1947) proposed the U statistic

$$U = \sum_{i=1}^{m}\sum_{j=1}^{n} \phi(X_i, Y_j), \qquad (1.22)$$

where

$$\phi(X_i, Y_j) = \begin{cases} 1, & \text{if } X_i < Y_j, \\ 0, & \text{otherwise.} \end{cases}$$

In effect, the U statistic is computed by comparing each pair of values $X_i$ and $Y_j$: if the $X_i$ value is smaller, score one for that pair; otherwise, score zero. The sum of the ones is the Mann-Whitney U statistic. Since the Mann-Whitney test is nonparametric, the confidence bound for $\Delta = M(Y) - M(X)$ differs from parametric confidence sets. The mn differences $Y_j - X_i$ can be ordered from least to greatest and denoted $U^{(1)} \le U^{(2)} \le \cdots \le U^{(mn)}$. For the one-sided upper-tail hypotheses $H_0: \Delta = 0$ versus $H_1: \Delta > 0$, the $100(1-\alpha)\%$ confidence interval for $\Delta$ is

$$(U^{(C_\alpha)}, \infty),$$

where $C_\alpha = \frac{n(2m+n+1)}{2} + 1 - w_{\alpha}$ and $U^{(C_\alpha)}$ is the value in position $C_\alpha$ of $(U^{(1)}, U^{(2)}, \ldots, U^{(mn)})$.
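In R, wilcox.test() returns both the test and this one-sided confidence bound. A small sketch with made-up samples (the x and y values are hypothetical):

# One-sided Mann-Whitney test of H0: Delta = 0 vs H1: Delta > 0, with the
# corresponding 95% lower confidence bound for Delta = M(Y) - M(X).
x <- c(4.1, 5.2, 3.8, 6.0, 5.5, 4.7)
y <- c(5.9, 6.4, 7.1, 5.8, 6.6, 7.3)
wilcox.test(y, x, alternative = "greater", conf.int = TRUE, conf.level = 0.95)
# The reported interval has the form (lower bound, Inf), matching (U^{C_alpha}, Inf).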

1.8 Fisher’s Exact Test

1.8.1 Inference Using Fisher’s Exact Test

In the analysis of contingency tables, Fisher's exact test is used as a statistical significance test to determine whether there is an association between categorical variables, especially when the sample sizes are small. Suppose there are two samples; for sample i (i = 1, 2), the number of successes observed in $m_i$ independent Bernoulli trials with success probability $p_i$ is $o_{i1}$. Let $o_{i2}$ stand for the number of failures observed in sample i, $n_1$ for the total number of successes in both samples, $n_2$ for the total number of failures in both samples, and n for the total size of both samples. Then $n_1 = o_{11} + o_{21}$, $n_2 = o_{12} + o_{22}$, and $n = n_1 + n_2 = m_1 + m_2$. This is the typical contingency table, as shown in Table 1.6.

Table 1.6: 2 × 2 Table of Outcomes

           Successes   Failures   Totals
Sample 1   o11         o12        m1
Sample 2   o21         o22        m2
Totals:    n1          n2         n

Fisher's exact test is based on the conditional distribution of $o_{11}$ given the row and column sums $m_1, m_2, n_1, n_2$. The conditional distribution of $o_{11}$ is

$$P(o_{11} = x \mid m_1, m_2, n_1, n_2) = \frac{\binom{m_1}{x}\binom{m_2}{n_1 - x}}{\binom{n}{n_1}}. \qquad (1.23)$$

The conditional probability distribution defined by equation (1.23) is a member of the family of hypergeometric distributions. Note that there is a restriction on the range of x: $\max(0, m_1 + n_1 - n) \le x \le \min(m_1, n_1)$. Equivalently, equation (1.23) can be simplified as

$$P(o_{11} = x \mid m_1, m_2, n_1, n_2) = \frac{m_1!\,m_2!\,n_1!\,n_2!}{n!\,o_{11}!\,o_{12}!\,o_{21}!\,o_{22}!}. \qquad (1.24)$$

Fisher's exact test decides whether $o_{11}$ is significantly small or large with respect to the conditional distribution in equation (1.23). Specifically, for the one-sided upper-tail test $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$, the decision rule is: if $o_{11} \ge \gamma_\alpha$, the null hypothesis is rejected; otherwise, there is not sufficient evidence to reject the null hypothesis. Here $\gamma_\alpha$ is chosen from the conditional distribution so that it satisfies

$$P(o_{11} \ge \gamma_\alpha \mid m_1, m_2, n_1, n_2) = \sum_{x = \gamma_\alpha}^{\min(m_1, n_1)} \frac{\binom{m_1}{x}\binom{m_2}{n_1 - x}}{\binom{n}{n_1}} = \alpha.$$

1.8.2 Example: Python Eggs

Shine et al. (1997) indicate that nest site and maternal care can influence incubation temperatures. To investigate the effect of temperature on whether eggs hatch, they experimentally simulated three temperatures, hot, neutral, and cold, and counted the hatched eggs at each temperature. The main concern is whether the percentages of hatched eggs are the same at different temperatures. Part of the results is given in Table 1.7.

Table 1.7: Hatched Eggs

          Hatched   Not Hatched   Totals
Cold      16        11            27
Neutral   38        18            56
Totals:   54        29            83

Then $n_1 = 54$, $n_2 = 29$, $m_1 = 27$, $m_2 = 56$, $n = 83$. To test whether the probabilities of hatched eggs at the cold and neutral temperatures are the same, the one-sided upper-tail hypotheses $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$ are applied, where $p_1$ stands for the probability of a hatched egg at the cold temperature and $p_2$ at the neutral temperature. Since $\min(m_1, n_1) = \min(27, 54) = 27$ and $\max(0, m_1 + n_1 - n) = \max(0, 27 + 54 - 83) = 0$, for the significance level $\alpha = 0.05$, $\gamma_{0.05}$ should satisfy

$$P(o_{11} \ge \gamma_{0.05} \mid 27, 56, 54, 29) = \sum_{x = \gamma_{0.05}}^{27} \frac{\binom{27}{x}\binom{56}{54 - x}}{\binom{83}{54}} = 0.05.$$

By the decision rule, since the observed $o_{11} = 16$ falls below $\gamma_{0.05}$ (indeed, 16 is below the conditional mean $27 \times 54/83 \approx 17.6$), there is not enough evidence to reject the null hypothesis. That is, the hypothesis that the probabilities of hatched eggs at the cold and neutral temperatures are the same cannot be rejected at the 0.05 significance level.
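In R, the one-sided exact p-value can be computed from the hypergeometric tail, or with fisher.test() directly; a sketch for the data of Table 1.7:

# One-sided upper-tail Fisher's exact test for the python egg data.
eggs <- matrix(c(16, 11, 38, 18), nrow = 2, byrow = TRUE,
               dimnames = list(c("Cold", "Neutral"), c("Hatched", "Not")))
# P(o11 >= 16 | margins): 54 hatched, 29 not hatched, 27 cold eggs drawn.
1 - phyper(16 - 1, m = 54, n = 29, k = 27)
fisher.test(eggs, alternative = "greater")   # same one-sided p-value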

CHAPTER 2

IDENTIFYING THE BEST TREATMENT USING FISHER’S EXACT TEST

2.1 Binary Data

In social science and clinical trials, there are many kinds of categorical data. Specifically, data with only two categorical outcomes are called binary data. For example, if we are only interested in whether a patient has a disease or not, the information on every patient is either healthy or sick; this kind of data is binary. Suppose there are two treatment groups and the probability of being cured of the disease under the ith treatment is $\pi_i$. Then each treatment group can be considered as $n_i$ (the number of people in the ith treatment group) independent repeated Bernoulli trials with success probability $\pi_i$, where i = 1, 2. Since a binomial random variable is the sum of independent Bernoulli variables, $x_{1i}$, the total number of people cured of the disease in group i, follows binomial($n_i, \pi_i$), with $n_i$ fixed. The possible outcomes are summarized in the 2 × 2 contingency Table 2.1. When $\pi_1, \pi_2$ are of interest, the

joint distribution of $(x_{11}, x_{12})$ is needed.

Table 2.1: 2 × 2 Table of Patients

                               Treatment 1   Treatment 2   Totals
Recovery from the disease      x11           x12           m1
No recovery from the disease   x21           x22           m2
Totals:                        n1            n2            n

In practice, whether $\pi_1$ and $\pi_2$ are equal is always of primary interest. That is why Fisher's exact test is used for the hypotheses $H_0: \pi_1 = \pi_2 = \pi$ versus $H_1: \pi_1 > \pi_2$.

The joint probability mass function of (x11, x12) under the null hypothesis is

$$
f(x_{11}, x_{12} \mid \pi) = \binom{n_1}{x_{11}}\pi^{x_{11}}(1-\pi)^{n_1 - x_{11}}\binom{n_2}{x_{12}}\pi^{x_{12}}(1-\pi)^{n_2 - x_{12}}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\pi^{x_{11}+x_{12}}(1-\pi)^{n_1+n_2-(x_{11}+x_{12})}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\pi^{m_1}(1-\pi)^{n-m_1}. \qquad (2.1)
$$

From the joint probability (2.1), $m_1 = x_{11} + x_{12}$ is a sufficient statistic under the null hypothesis. Given $m_1$, the conditional distribution of $x_{11}$ is hypergeometric($n, n_1, m_1$):

$$P(x_{11} = x \mid n, n_1, m_1) = \frac{\binom{n_1}{x}\binom{n - n_1}{m_1 - x}}{\binom{n}{m_1}}. \qquad (2.2)$$

By the marginal totals, $x_{11} \in [\max(0, m_1 - n_2), \min(m_1, n_1)]$, as in Section 1.8.1.

2.2 Odds Ratio

The odds ratio $\varphi$ is a measure that assesses the strength of association in binary data. It is the ratio of the odds of the outcome of primary interest in one group to the odds in another group. The odds ratio for group 1 and group 2 is

$$\varphi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}. \qquad (2.3)$$

Since the hypotheses $H_0: \pi_1 = \pi_2$ versus $H_1: \pi_1 > \pi_2$ are equivalent to $H_0: \varphi = 1$ versus $H_1: \varphi > 1$, the joint distribution of $(x_{11}, x_{12})$ can be rewritten, for given $m_1$, as

$$
f(x_{11}, x_{12} \mid \pi_1, \pi_2, n_1, n_2) = \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\left(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\right)^{x_{11}}(1-\pi_1)^{n_1}\,\pi_2^{m_1}\,(1-\pi_2)^{n_2 - m_1}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\,\varphi^{x_{11}}\,(1-\pi_1)^{n_1}\,\pi_2^{m_1}\,(1-\pi_2)^{n_2 - m_1}, \qquad (2.4)
$$

and the conditional (noncentral hypergeometric) likelihood is

$$P(x_{11} \mid n_1, m_1, n, \varphi) = \frac{f(x_{11}, m_1 \mid \pi_1, \pi_2, n_1, n_2)}{f(m_1 \mid \pi_1, \pi_2, n_1, n_2)} = \frac{\binom{n_1}{x_{11}}\binom{n - n_1}{m_1 - x_{11}}\,\varphi^{x_{11}}}{\sum_{i = x_l}^{x_u}\binom{n_1}{i}\binom{n - n_1}{m_1 - i}\,\varphi^{i}}, \qquad (2.5)$$

where

$$f(m_1 \mid \pi_1, \pi_2, n_1, n_2) = \sum_{i = x_l}^{x_u}\binom{n_1}{i}\binom{n - n_1}{m_1 - i}\,\varphi^{i} \qquad (2.6)$$

is the summation over all possible values of $x_{11}$, determined by the marginal totals $n_1, n_2, m_1$. Thus, $x_l$ and $x_u$ are

$$x_l = \max(0, m_1 - n_2), \qquad (2.7)$$

$$x_u = \min(m_1, n_1). \qquad (2.8)$$

Therefore, for $H_0: \pi_1 = \pi_2 = \pi$ versus $H_1: \pi_1 > \pi_2$, the exact p value is

$$p_L = \sum_{x = x_{11}}^{x_u} P(x \mid n_1, m_1, n, \varphi). \qquad (2.9)$$

To test the hypotheses at significance level $\alpha$, set

$$\alpha = \sum_{x = x_{11}}^{x_u} P(x \mid n_1, m_1, n, \hat\varphi_L), \qquad (2.10)$$

where $\hat\varphi_L$ is the $100(1-\alpha)\%$ lower confidence bound for $\varphi$. Therefore, the $100(1-\alpha)\%$ confidence interval for $\varphi$ is

$$(\hat\varphi_L, \infty).$$
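A minimal R sketch of this inversion follows; the helper names nchg_upper_tail and phi_lower are ours, not from the dissertation's appendix, and the weights are computed on the log scale for numerical stability.

# Lower 100(1 - alpha)% confidence bound phi_hat_L for the odds ratio,
# obtained by solving (2.10): the upper tail of the noncentral
# hypergeometric distribution (2.5), evaluated at the observed x11.
nchg_upper_tail <- function(phi, x11, n1, n2, m1) {
  supp <- max(0, m1 - n2):min(m1, n1)          # support (2.7)-(2.8)
  lw <- lchoose(n1, supp) + lchoose(n2, m1 - supp) + supp * log(phi)
  w <- exp(lw - max(lw))                       # normalized weights, eq (2.5)
  sum(w[supp >= x11]) / sum(w)                 # P(X >= x11 | phi)
}
phi_lower <- function(x11, n1, n2, m1, alpha = 0.05) {
  # The tail probability increases in phi (assumes x11 above the lower
  # end of the support), so a single root exists.
  uniroot(function(phi) nchg_upper_tail(phi, x11, n1, n2, m1) - alpha,
          lower = 1e-8, upper = 1e8)$root
}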

Now, if there are k treatment groups and two different responses (such as positive and negative), the data are still binary and can be expressed as a 2 × k table as follows:

Table 2.2: 2 × k Table for k Treatments

           Treatment 1   Treatment 2   ···   Treatment k
Positive   x11           x12           ···   x1k
Negative   x21           x22           ···   x2k
Totals:    n1            n2            ···   nk

Here $x_{ij}$, where i = 1, 2 and j = 1, 2, ..., k, stands for the number of positive (i = 1) or negative (i = 2) responses in the jth treatment group. As in the 2 × 2 table, the number of positive responses $x_{1j}$ is distributed as binomial($n_j, \pi_j$), where $n_j$ is fixed and $n_j = x_{1j} + x_{2j}$.

Let $m_{1ij}$ denote the sum of the numbers of positive responses in groups i and j, where i, j = 1, 2, ..., k and $i \neq j$:

$$x_{1i} + x_{1j} = m_{1ij}. \qquad (2.11)$$

Let $m_{2ij}$ denote the sum of the numbers of negative responses in groups i and j, where i, j = 1, 2, ..., k and $i \neq j$:

$$x_{2i} + x_{2j} = m_{2ij}. \qquad (2.12)$$

Notation: $n_i$ is the total number in group i, $n_i = x_{1i} + x_{2i}$; $N_{ij}$ is the total number in groups i and j, $N_{ij} = n_i + n_j = m_{1ij} + m_{2ij}$. Denote by $x^u_{1i}$ and $x^l_{1i}$ the largest and smallest possible numbers of positive responses in group i, respectively; then

$$x^u_{1i} = \min(m_{1ij}, n_i), \qquad x^l_{1i} = \max(0, m_{1ij} - n_j).$$

Testing the hypotheses $H_0: \pi_j = \pi_i = \pi$ versus $H_1: \pi_j > \pi_i$ is equivalent to testing $H_0: \varphi_{ji} = 1$ versus $H_1: \varphi_{ji} > 1$. As in the 2 × 2 case, the $100(1-\alpha)\%$ confidence interval for $\varphi_{ji}$ is

$$(\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ is the lower bound of the confidence interval and satisfies equation (2.13):

$$\alpha = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji}), \qquad (2.13)$$

where

$$P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji}) = \frac{\binom{n_i}{x}\binom{N_{ij} - n_i}{m_{1ij} - x}\,\hat\varphi_{ji}^{\,x}}{\sum_{t = x^l_{1i}}^{x^u_{1i}}\binom{n_i}{t}\binom{N_{ij} - n_i}{m_{1ij} - t}\,\hat\varphi_{ji}^{\,t}}. \qquad (2.14)$$

where niNij −ni ϕˆx x m1ij −x ji P (x|ni, m1ij,Nij, ϕˆji) = u (2.14) Px1i niNij −ni i l ϕˆji i=x1i i m1ij −i 29 2.3 Introduction to Partition

Partition is a common tool in mathematics and statistics, with the following properties: suppose S is a non-empty set and $\theta$ is a partition of S; then the elements of $\theta$ are pairwise disjoint and the union of all elements of $\theta$ equals S, that is, $\bigcup_{A \in \theta} A = S$. For example, suppose $\theta = \{A_1, A_2, A_3, A_4\}$ and $S \neq \phi$, where $\phi$ is the empty set, and

1. $A_i \cap A_j = \phi$ for $i \neq j$, i, j = 1, 2, 3, 4;
2. $\cup_{i=1}^{4} A_i = S$.

Then $\theta$ is a partition of S. In particular, Figure 2.1 illustrates the idea of a partition.

Figure 2.1: Partition of a Set S


Another useful piece of terminology in multiple comparisons is that of a confidence set directed towards a subset of the parameter space, proposed by Hsu and Berger (1999).

Definition (directed towards a set): Let the data Y have a distribution determined by the parameter $\theta$, where $\theta \in \Theta$ and $\Theta$ is the parameter space. A confidence set C(Y) for $\theta$ is directed towards $\Theta^*$, a subset of $\Theta$, if $\Theta^* \subset C(Y)$ or $C(Y) \subset \Theta^*$ for every sample point y.

Since finding the best treatment requires comparing every two treatments, one-sided inference on significant differences is of interest. For example, in the one-way model, a set of interest is $\Theta^* = \{\mu_i - \mu_1 > \delta\}$, where $\delta$ is a pre-defined practical significant difference. Therefore, from the definition above, a confidence interval C(Y) for $\theta = \mu_i - \mu_1$ directed towards $\Theta^*$ has the form $(L(Y), \infty)$, where L(Y), a function of the data Y, is determined by the confidence level $\alpha$. Let D(Y) be any $100(1-\alpha)\%$ confidence set for $\theta$; then a $100(1-\alpha)\%$ confidence set C(Y) for $\theta$ directed towards $\Theta^*$ can be constructed as

$$C(Y) = \begin{cases} D(Y), & \text{if } D(Y) \subset \Theta^*, \\ D(Y) \cup \Theta^*, & \text{otherwise.} \end{cases}$$

Here, since we are only interested in which treatment is better, the pre-defined $\delta$ is set to zero.

2.4 Main Results

Without loss of generality, suppose a higher probability of positive response indicates a better treatment. We do not consider the extreme situations in which a success probability is 0 or 1, since whether a treatment is the best is obvious in those cases. Thus, $\pi_i \neq 0$ and $\pi_i \neq 1$ for all i = 1, 2, ..., k in the following theorems.

Theorem 2.1. For a 2 × k table, if the number of positive responses $x_{1j}$ follows a binomial($n_j, \pi_j$) distribution, let $\theta = (\pi_1, \pi_2, \ldots, \pi_k)$ be the vector of probabilities of positive response, and let $\varphi_{ji}$ be the odds ratio of the positive response in group j versus group i. The $100(1-\alpha)\%$ confidence interval for $\varphi_{ji}$ is

$$(\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ satisfies

$$\alpha = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji})$$

and $x^u_{1j} = \min(m_{1ij}, n_j)$. For an integer J, let M be the largest integer i such that $\hat\varphi_{Ji} \le 1$, if such an integer M exists; otherwise, M = 0. If M = 0, then

$$P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) \ge 1 - \alpha.$$

This means that J is the best treatment.

Proof. First, for any fixed J, the hypotheses $H_0: \pi_J \le \pi_i$ versus $H_1: \pi_J > \pi_i$ are equivalent to $H_0: \varphi_{Ji} \le 1$ versus $H_1: \varphi_{Ji} > 1$. This is because $\pi_i \neq 0$ and $\pi_i \neq 1$ for all i = 1, 2, ..., k and

$$\pi_J \le \pi_i \iff \pi_J - \pi_J\pi_i \le \pi_i - \pi_J\pi_i \iff \pi_J(1-\pi_i) \le \pi_i(1-\pi_J) \iff \frac{\pi_J(1-\pi_i)}{\pi_i(1-\pi_J)} \le 1.$$

If the null hypothesis $H_0$ is rejected in favor of $H_1$, then $\pi_J > \pi_i$. Let $Q = (0, \infty)$; the space $Q^{k-1}$ can be partitioned as follows:

$$
\begin{aligned}
S_k &= (1, \infty)^{k-1} \\
S_{k-1} &= (0, 1] \\
S_{k-2} &= (1, \infty) \times (0, 1] \\
&\;\;\vdots \\
S_i &= (1, \infty)^{k-1-i} \times (0, 1] \\
&\;\;\vdots \\
S_1 &= (1, \infty)^{k-2} \times (0, 1].
\end{aligned}
$$

Then $S_1, S_2, \ldots, S_k$ partition the $Q^{k-1}$ space. Denote

$$\hat\varphi_{J1}, \ldots, \hat\varphi_{J(J-1)}, \hat\varphi_{J(J+1)}, \ldots, \hat\varphi_{Jk}$$

as

$$\hat x_1, \ldots, \hat x_{J-1}, \hat x_J, \ldots, \hat x_{k-1}, \quad \text{and let } \hat x_k = 0,$$

where $\hat\varphi_{J1} = \hat x_1$, $\hat\varphi_{J2} = \hat x_2$, ..., $\hat\varphi_{J(J-1)} = \hat x_{J-1}$, $\hat\varphi_{J(J+1)} = \hat x_J$, ..., $\hat\varphi_{Jk} = \hat x_{k-1}$. Let $C_J(X) = \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i)$; then $C_J(X)$ is a $100(1-\alpha)\%$ confidence set for $\theta$, since if $\theta \in S_i$, then

$$P_\theta(\theta \in C_J(X)) = P_\theta\Big(\theta \in \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i)\Big) = P_\theta(\theta \in (\hat x_i, \infty) \cap S_i) = P_\theta(\theta \in (\hat x_i, \infty)) \ge 1 - \alpha.$$

If M = 0, then $\hat\varphi_{Jt} > 1$ for all t = 1, 2, ..., J−1, J+1, ..., k. Therefore, for all i = 1, 2, ..., k−1, $(\hat x_i, \infty) \cap S_i = \phi$, since $\hat x_i > 1$ for all $i \in \{1, 2, \ldots, k-1\}$ and $(\hat x_i, \infty) \cap (0, 1] = \phi$. Thus,

$$
\begin{aligned}
C_J(X) &= \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i) \\
&= \Big\{\bigcup_{i=1}^{k-1}\big((\hat x_i, \infty) \cap (1, \infty)^{k-i-1} \times (0, 1]\big)\Big\} \cup \{(\hat x_k, \infty) \cap S_k\} \\
&= (\hat x_k, \infty) \cap S_k \qquad (2.15) \\
&= S_k = (1, \infty)^{k-1}. \qquad (2.16)
\end{aligned}
$$

Then,

$$
\begin{aligned}
P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) &= P(\omega : \pi_J(\omega) \ge \pi_j(\omega) \text{ for all } j \text{ in } 1, 2, \ldots, k) \\
&= P(\omega : \pi_J(\omega) \ge \pi_1(\omega), \ldots, \pi_J(\omega) \ge \pi_{J-1}(\omega), \pi_J(\omega) \ge \pi_{J+1}(\omega), \ldots, \pi_J(\omega) \ge \pi_k(\omega)) \\
&\ge P(\omega : \varphi_{J1}(\omega) > 1, \ldots, \varphi_{J(J-1)}(\omega) > 1, \varphi_{J(J+1)}(\omega) > 1, \ldots, \varphi_{Jk}(\omega) > 1) \\
&= P(\omega : \varphi^* \in (1, \infty)^{k-1}) \\
&\ge 1 - \alpha, \qquad (2.17)
\end{aligned}
$$

where $\varphi^* = (\varphi_{J1}, \ldots, \varphi_{J(J-1)}, \varphi_{J(J+1)}, \ldots, \varphi_{Jk})^T$. From the result (2.16), we have

$$P(\omega : \varphi^* \in (1, \infty)^{k-1}) \ge P(X : \theta \in C_J(X)) \ge 1 - \alpha,$$

so the inequality (2.17) holds. Thus,

$$P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) \ge 1 - \alpha.$$

That is, J is the best treatment.

Theorem 2.2. For a 2 × k table, assume the number of positive responses $x_{1j}$ follows a binomial($n_j, \pi_j$) model. Denote by $\theta = (\pi_1, \pi_2, \ldots, \pi_k)$ the vector of probabilities of positive response, and let $\varphi_{ji}$ be the odds ratio of the positive response in group j versus group i. The $100(1-\alpha/k)\%$ confidence interval for $\varphi_{ji}$ is

$$\varphi_{ji} \in (\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ satisfies

$$\alpha/k = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji})$$

and $x^u_{1j} = \min(m_{1ij}, n_j)$. For each J, let M be the largest integer i such that $\hat\varphi_{Ji} \le 1$, if such an M exists; otherwise, M = 0. Let $I_j(Y) = (1, \infty)^{k-M-1} \cap (\hat\varphi_{jM}, \infty)$. Then

$$P(Y : \theta \in \cap_{j=1}^{k} I_j(Y)) \ge 1 - \alpha.$$

Proof. From the definition of M, M is a function of J. To prove this theorem, we first prove that $P(Y : \theta \in I_j(Y)) \ge 1 - \alpha/k$ for any j in 1, 2, ..., k. The $Q^{k-1}$ space can be partitioned as before:

$$
\begin{aligned}
S_k &= (1, \infty)^{k-1} \\
S_{k-1} &= (0, 1] \\
S_{k-2} &= (1, \infty) \times (0, 1] \\
&\;\;\vdots \\
S_i &= (1, \infty)^{k-1-i} \times (0, 1] \\
&\;\;\vdots \\
S_1 &= (1, \infty)^{k-2} \times (0, 1].
\end{aligned}
$$

Then $S_1, \ldots, S_k$ partition the $Q^{k-1}$ space. Let $C_j(Y) = \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i)$; then $C_j(Y)$ is a $100(1-\alpha/k)\%$ confidence set for $\theta$, since if $\theta \in S_i$, then

$$P_\theta(\theta \in C_j(Y)) = P_\theta\Big(\theta \in \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i)\Big) = P_\theta(\theta \in (\hat\varphi_{ji}, \infty) \cap S_i) = P_\theta(\theta \in (\hat\varphi_{ji}, \infty)) \ge 1 - \alpha/k.$$

Notice that the random variable M has three properties here:
a) if i > M (if such i exists), then $(\hat\varphi_{ji}, \infty) \cap S_i = \phi$, since $(\hat\varphi_{ji}, \infty) \cap (0, 1] = \phi$;
b) if i < M (if such i exists), then $S_i \subset (1, \infty)^{k-M}$, since $(1, \infty)^{k-i-1} \times (0, 1] \subset (1, \infty)^{k-M}$;
c) $(\hat\varphi_{jM}, \infty) \supset (1, \infty)$, since $\hat\varphi_{jM} \le 1$.

Therefore,

$$
\begin{aligned}
C_j(Y) &= \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i) \\
&= \bigcup_{i=1}^{M}((\hat\varphi_{ji}, \infty) \cap S_i) \qquad (2.18) \\
&\subseteq \{(\hat\varphi_{jM}, \infty) \cap S_M\} \cup (1, \infty)^{k-M} \qquad (2.19) \\
&\subseteq \{(\hat\varphi_{jM}, \infty) \cap (1, \infty)^{k-M-1} \times (0, 1]\} \cup \{(1, \infty)^{k-M} \cap (\hat\varphi_{jM}, \infty)\} \qquad (2.20) \\
&= (\hat\varphi_{jM}, \infty) \cap (1, \infty)^{k-M-1} \\
&= I_j(Y).
\end{aligned}
$$

Here, equation (2.18) comes from property a), and the inclusions (2.19) and (2.20) result from properties b) and c), respectively. Thus,

$$P(Y : \theta \in I_j(Y)) \ge P(Y : \theta \in C_j(Y)) \ge 1 - \frac{\alpha}{k}$$

for any j in 1, 2, ..., k. Then, letting

$$I_j^c(Y) \triangleq (I_j(Y))^c,$$

we have

$$\sup_\theta P_\theta(Y : I_j^c(Y)) = 1 - \inf_\theta P_\theta(Y : I_j(Y)) \le 1 - \Big(1 - \frac{\alpha}{k}\Big) = \frac{\alpha}{k}.$$

By the Bonferroni inequality,

$$P_\theta(\cup_{j=1}^{k} I_j^c(Y)) \le \sum_{j=1}^{k} P_\theta(I_j^c(Y)) = k\cdot\frac{\alpha}{k} = \alpha.$$

Thus,

$$P(Y : \theta \in \cap_{j=1}^{k} I_j(Y)) = 1 - P(Y : \theta \in \cup_{j=1}^{k} I_j^c(Y)) \ge 1 - \alpha.$$

The procedure for Theorem 2.2 is given in Section 2.5.2. If the screening procedure stops at stage T (T ≤ k) where M = 0, then treatment J = k − T + 1 is the best one, since

$$P_\theta(\cup_{j=1}^{T} I_j^c(Y)) \le P_\theta(\cup_{j=1}^{k} I_j^c(Y)) \le \sum_{j=1}^{k} P_\theta(I_j^c(Y)) = k\cdot\frac{\alpha}{k} = \alpha.$$

Thus,

$$P(Y : \theta \in \cap_{j=1}^{T} I_j(Y)) = 1 - P(Y : \theta \in \cup_{j=1}^{T} I_j^c(Y)) \ge 1 - \alpha.$$

2.5 Procedures

2.5.1 Procedure for Theorem 2.1

Suppose the better treatment has the larger probability of positive response. For a 2 × k table, the procedure obtained from applying Theorem 2.1 has k steps, since M = 0. For a fixed index J, the procedure reads:

Step 1

If $\hat\varphi_{Jk} > 1$, where $\hat\varphi_{Jk}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_k, m_{1kJ}, N_{kJ}, \hat\varphi_{Jk}),$$

then claim $\pi_J > \pi_k$ and go to Step 2. Else, claim that treatment J is not the best one and stop.

Step 2

If $\hat\varphi_{J(k-1)} > 1$, where $\hat\varphi_{J(k-1)}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_{k-1}, m_{1(k-1)J}, N_{(k-1)J}, \hat\varphi_{J(k-1)}),$$

then claim $\pi_J > \pi_{k-1}$ and go to Step 3. Else, claim that treatment J is not the best one and stop.

...

Step k-M

If $\hat\varphi_{J(k-M)} > 1$, where $\hat\varphi_{J(k-M)}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_{k-M}, m_{1(k-M)J}, N_{(k-M)J}, \hat\varphi_{J(k-M)}),$$

then claim $\pi_J > \pi_{k-M}$ and go to Step k-M+1. Else, claim that treatment J is not the best one and stop.

...

Step k-1

If $\hat\varphi_{J1} > 1$, where $\hat\varphi_{J1}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_1, m_{11J}, N_{1J}, \hat\varphi_{J1}),$$

then claim $\pi_J > \pi_1$ and go to Step k. Else, claim that treatment J is not the best one and stop.

Step k

Claim that treatment J is the best one.

Here, if M ≠ 0, the number of steps is less than k and the procedure does not conclude that J is the best treatment. However, Theorem 2.1 still controls the familywise type I error of the whole procedure within $\alpha$ for every value of M.
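A compact R sketch of this k-step screening for a fixed J follows, reusing the hypothetical phi_lower() helper from Section 2.2 (both helper names are ours).

# Screen candidate J against every other treatment at level alpha;
# x1 = vector of positive counts, n = vector of group sizes.
best_is_J <- function(J, x1, n, alpha = 0.05) {
  k <- length(x1)
  for (i in setdiff(k:1, J)) {           # Steps 1, 2, ..., k-1
    m1 <- x1[J] + x1[i]                  # m_{1iJ}, eq (2.11)
    if (phi_lower(x1[J], n[J], n[i], m1, alpha) <= 1)
      return(FALSE)                      # claim J is not the best; stop
  }
  TRUE                                   # Step k: claim J is the best
}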

2.5.2 Procedure for Theorem 2.2

The following k + 1 stages describe the procedure of Theorem 2.2. Note that treatments could be screened in random order; for convenience, we screen in order from treatment k down to treatment 1 as an example here.

Stage 1

Step 1

If $\hat\varphi_{k(k-1)} > 1$, where $\hat\varphi_{k(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_{k-1}, m_{1(k-1)k}, N_{(k-1)k}, \hat\varphi_{k(k-1)}),$$

then claim $\pi_k > \pi_{k-1}$ and go to Step 2. Else, claim that treatment k is not the best one and go to Stage 2.

Step 2

If $\hat\varphi_{k(k-2)} > 1$, where $\hat\varphi_{k(k-2)}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_{k-2}, m_{1(k-2)k}, N_{(k-2)k}, \hat\varphi_{k(k-2)}),$$

then claim $\pi_k > \pi_{k-2}$ and go to Step 3. Else, claim that treatment k is not the best one and go to Stage 2.

...

Step k-M

If $\hat\varphi_{kM} > 1$, where $\hat\varphi_{kM}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_M, m_{1Mk}, N_{Mk}, \hat\varphi_{kM}),$$

then claim $\pi_k > \pi_M$ and go to Step k-M+1. Else, claim that treatment k is not the best one and go to Stage 2.

...

Step k-1

If $\hat\varphi_{k1} > 1$, where $\hat\varphi_{k1}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_1, m_{11k}, N_{1k}, \hat\varphi_{k1}),$$

then claim $\pi_k > \pi_1$ and go to Step k. Else, claim that treatment k is not the best one and go to Stage 2.

Step k

Claim that treatment k is the best one and stop.

Stage 2

Step 1

If $\hat\varphi_{(k-1)k} > 1$, where $\hat\varphi_{(k-1)k}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_k, m_{1k(k-1)}, N_{k(k-1)}, \hat\varphi_{(k-1)k}),$$

then claim $\pi_{k-1} > \pi_k$ and go to Step 2. Else, claim that treatment k-1 is not the best one and go to Stage 3.

Step 2

If $\hat\varphi_{(k-1)(k-2)} > 1$, where $\hat\varphi_{(k-1)(k-2)}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_{k-2}, m_{1(k-2)(k-1)}, N_{(k-2)(k-1)}, \hat\varphi_{(k-1)(k-2)}),$$

then claim $\pi_{k-1} > \pi_{k-2}$ and go to Step 3. Else, claim that treatment k-1 is not the best one and go to Stage 3.

...

Step k-M

If $\hat\varphi_{(k-1)M} > 1$, where $\hat\varphi_{(k-1)M}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_M, m_{1M(k-1)}, N_{M(k-1)}, \hat\varphi_{(k-1)M}),$$

then claim $\pi_{k-1} > \pi_M$ and go to Step k-M+1. Else, claim that treatment k-1 is not the best one and go to Stage 3.

...

Step k-1

If $\hat\varphi_{(k-1)1} > 1$, where $\hat\varphi_{(k-1)1}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_1, m_{11(k-1)}, N_{1(k-1)}, \hat\varphi_{(k-1)1}),$$

then claim $\pi_{k-1} > \pi_1$ and go to Step k. Else, claim that treatment k-1 is not the best one and go to Stage 3.

Step k

Claim that treatment k-1 is the best one and stop.

...

Stage k-1

Step 1

If $\hat\varphi_{2k} > 1$, where $\hat\varphi_{2k}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_k, m_{1k2}, N_{k2}, \hat\varphi_{2k}),$$

then claim $\pi_2 > \pi_k$ and go to Step 2. Else, claim that treatment 2 is not the best one and go to Stage k.

Step 2

If $\hat\varphi_{2(k-1)} > 1$, where $\hat\varphi_{2(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_{k-1}, m_{1(k-1)2}, N_{(k-1)2}, \hat\varphi_{2(k-1)}),$$

then claim $\pi_2 > \pi_{k-1}$ and go to Step 3. Else, claim that treatment 2 is not the best one and go to Stage k.

...

Step k-M

If $\hat\varphi_{2M} > 1$, where $\hat\varphi_{2M}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_M, m_{1M2}, N_{M2}, \hat\varphi_{2M}),$$

then claim $\pi_2 > \pi_M$ and go to Step k-M+1. Else, claim that treatment 2 is not the best one and go to Stage k.

...

Step k-1

If $\hat\varphi_{21} > 1$, where $\hat\varphi_{21}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_1, m_{112}, N_{12}, \hat\varphi_{21}),$$

then claim $\pi_2 > \pi_1$ and go to Step k. Else, claim that treatment 2 is not the best one and go to Stage k.

Step k

Claim that treatment 2 is the best one and stop.

Stage k

Step 1

If $\hat\varphi_{1k} > 1$, where $\hat\varphi_{1k}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_k, m_{1k1}, N_{k1}, \hat\varphi_{1k}),$$

then claim $\pi_1 > \pi_k$ and go to Step 2. Else, claim that treatment 1 is not the best one and go to Stage k+1.

Step 2

If $\hat\varphi_{1(k-1)} > 1$, where $\hat\varphi_{1(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_{k-1}, m_{1(k-1)1}, N_{(k-1)1}, \hat\varphi_{1(k-1)}),$$

then claim $\pi_1 > \pi_{k-1}$ and go to Step 3. Else, claim that treatment 1 is not the best one and go to Stage k+1.

...

Step k-M

If $\hat\varphi_{1M} > 1$, where $\hat\varphi_{1M}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_M, m_{1M1}, N_{M1}, \hat\varphi_{1M}),$$

then claim $\pi_1 > \pi_M$ and go to Step k-M+1. Else, claim that treatment 1 is not the best one and go to Stage k+1.

...

Step k-1

If $\hat\varphi_{12} > 1$, where $\hat\varphi_{12}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_2, m_{121}, N_{21}, \hat\varphi_{12}),$$

then claim $\pi_1 > \pi_2$ and go to Step k. Else, claim that treatment 1 is not the best one and go to Stage k+1.

Step k

Claim that treatment 1 is the best one and stop.

Stage k+1

Claim that there is no best treatment.

To summarize the k + 1 stages: if the procedure stops at stage T, where 1 ≤ T ≤ k, then treatment k + 1 − T is the best treatment; if T = k + 1, there is no best treatment.
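Under the same assumptions as before, the staged screening of Theorem 2.2 can be sketched in R by running the previous check at the adjusted level $\alpha/k$ for J = k, k−1, ..., 1 (reusing the hypothetical best_is_J() helper above).

# Stages 1 through k: screen treatment k first, then k-1, and so on,
# each comparison at level alpha/k; stage k+1 returns NA.
select_best <- function(x1, n, alpha = 0.05) {
  k <- length(x1)
  for (J in k:1) {
    if (best_is_J(J, x1, n, alpha / k)) return(J)
  }
  NA                                     # no best treatment found
}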

2.6 Simulation

In this section, we use simulation to confirm the coverage probability of the new selection procedures with Fisher's exact test. Without loss of generality, suppose the higher the response probability, the better the treatment. First, for Theorem 2.1, we want to test whether the first treatment is the best one. To generate the 2 × k table, random binomial data are generated. Here, let k = 5 and use different numbers of trials n to confirm that the procedures depend on the sample size. The success probabilities for the five treatments are 0.53, 0.1, 0.2, 0.3, and 0.4, respectively. The number of iterations is set to 10,000. At the significance level α = 0.05, the overall coverage probability of the new selection procedure is:

Table 2.3: Coverage Probability with C.L.=.95 and Different Trail Numbers

P = (P1,P2,P3,P4,P5) Coverage Probability Trail number n (0.53,0.1,0.2,0.3,0.4) 0.9521 90 (0.53,0.1,0.2,0.3,0.4) 0.9567 95 (0.53,0.1,0.2,0.3,0.4) 0.9611 100 (0.53,0.1,0.2,0.3,0.4) 0.9697 105

As Table 2.3 shows, all the coverage probabilities of the best treatment are greater than 1 − α = 1 − .05 = 0.95 which agrees with Theorem 2.1. Moreover, as the number of trials for all the binomial distribution treatment increases, the coverage probability also increases. That is, the Fisher’s exact test for the best selection of treatment is very sensitive to the trial numbers. The next, considering different position of pre-specified best treatment and there are only four treatments in total, the result is shown on the Table 2.4 as the other conditions remain the same. Table 2.4: Coverage Probability with Trial Number n = 90 and Different Orders

P = (P1,P2,P3,P4) Coverage Probability Best Treatment Position (.53,.2,.3,.4) 0.9515 1st (.2,.53,.3,.4) 0.9534 2nd (.2,.3,.53,.4) 0.9519 3rd (.2,.3,.4,.53) 0.957 4th

From Table 2.4, no matter what position is the pre-specified best treatment, the coverage probabilities are close to .95. Therefore, the procedure could always get the best treatment with the confidence level 0.95. For Theorem 2.2, the best treatment is unknown. To select the best treatment, with- out loss of generality, let k = 6 and the overall confidence interval 100(1 − α)% = 95%. 48 Then the individual confidence level for odds ratio should be 1 − α/k = 0.9917. The number of trials and iterations are set as 100 and 10000, respectively. For linearly de- creasing response probability P = (P1,P2,P3,P4,P5,P6) = (.53,.4,.35,.3,.2,.1), the best treatment is the first one and coverage probability is 0.9581. For U-shape response prob- ability P = (P1,P2,P3,P4,P5,P6) = (.7,.45,.3,.12,.32,.58), the best treatment is the first one and coverage probability is 0.9545. As for inverted-U shape response probability P =

(P1,P2,P3,P4,P5) = (.1,.2,.3,.5,.63,.4), the best treatment is the fifth one and coverage probability is 0.9491. For logarithmic response probability P = (P1,P2,P3,P4,P5,P6) = (0, 0.301, 0.477, 0.602, 0.7, 0.81) the best treatment is the sixth one and coverage probability is 0.9581. In conclusion, the result can be shown as in Table 2.5.

Table 2.5: Coverage Probability with Trial Number n = 100 and Different Response Shapes

P = (P1,P2,P3,P4,P5,P6) Coverage Probability Best Treatment (.53,.4,.35,.3,.2,.1) 0.9581 1st (.7,.45,.3,.12,.32,.58) 0.9545 1st (.1,.2,.3,.5,.63,.4) 0.9491 5th (0,0.301,0.477,0.602,0.7,0.81) 0.9581 6th 49

CHAPTER 3

IDENTIFYING THE BEST TREATMENT USING MANN-WHITNEY TEST

3.1 Simultaneous Inference with Mann-Whitney Test

For distribution free data, there are no assumptions for the probability distributions of variables. Furthermore, there are many different kinds of nonparametric methods in data analysis. For example, Kaplan-Meier method is used in life-time data, Kruskal-Wallis method tests whether the samples from the same distribution or not by ranks, Mann-Whitney test is also a popular test in nonparametric statistics. As stated in the introduction, there are several assumptions for Mann-Whitney test, like the independent observations. Another important assumption is location-shift. To test two distributions the same or not, if the alternative hypothesis is location shift, then this assumption is called location assumption in Mann-Whitney test. Take the one-sided test

for example, there are (xi) and (yj) independent samples drawn from two populations X and Y with distribution F1(x) and F2(y), i=1,2,...,n and j=1,2,...,m. The null hypothesis is: 50

H0 : F1(t) = F2(t) for all t, the alternative hypothesis is: H1 : F1(t) = F2(t + ∆), for every t, where ∆ is the location parameter. Therefore, the null hypothesis is that Y has the same distribution as X, and the alternative hypothesis is that Y has the same distribution as X with location shift ∆. Then the inference median can be used to substitute the distribution. Suppose M(X) and M(Y) are the medians of X and Y, respectively, then the location

parameter will be ∆ = M(Y )−M(X). Thus, the hypotheses are: H0 : ∆ = 0 vs H1 : ∆ > 0. In other words, if two samples have the same distribution, then the equality of median follows.

To obtain the 100(1 − α)% confidence interval for ∆ in the hypotheses H0 : ∆ = 0 vs

H1 : ∆ > 0, the lower confidence bound for ∆ should be obtained first. Therefore, the first step is to order the values of yj − xi from the smallest to the largest, which are denoted as U 1 ≤ U 2 ≤ · · · ≤ U mn. The second step is to get the position of the lower bound, which is:

n(2m + n + 1) C = + 1 − w (3.1) α 2 α

where the wα is the upper α percentile of the null distribution of the Wilcoxon Rank Sum statistic. Therefore, the 100(1 − α)% confidence interval for ∆ is:

(U Cα , ∞)

Cα where U is the Cαth position in the list of mn ordered difference yj − xi.

3.2 Large-Sample Approximation

From Mann and Whitney (1947), under the null hypothesis assumption H0: F1(t) = F2(t) for

all t, if n and m are large enough, then the integer Cα is approximately normally distributed. The standardized z-score here is 51

C − µ Z = α cα σcα where

mn µ = is the mean and cα 2 r mn(m + n + 1) σ = is the standard deviation. cα 12

C − mn That is, α 2 is asymptotically distributed to standardized normal distribution q mn(m+n+1) 12 N(0, 1). Thus, for the one-sided Wilcoxon rank sum test with H0 : ∆ = 0 against the alternative H1 : ∆ > 0, if sample sizes m,n are large, the integer Cα can be approximately by the normal distribution: q mn mn(m+n+1) Cα ≈ 2 − zα 12 .

Here, the location shift parameter ∆ is very important. Lehmann (1963c) proposed ∆U −∆L 2Z α 2 ˆ as an estimator for the asymptotic standard deviation of ∆, where (∆L, ∆U ) is the two-sided

cα mn+1−cα confidence interval for ∆ with confidence level 1 − α and ∆L = U , ∆U = U , and

n(2m+n+1) Cα = 2 + 1 − wα. Furthermore, Lehmann (1963b) has made the conclusion that the estimator for asymptotically distribution-free confidence interval for ∆ is the one centered at

ˆ U (cα)+U mn+1−cα ∆. Generally, the midpoint of the interval (∆L, ∆U ) which is 2 is a reasonable estimator of ∆.

3.2.1 Example

Dohle et al. (2009), evaluated the effect of mirror therapy(MT) in severe hemiparesis which is the weakness on the limbs on one side of the body. In the results, motor activities of daily living (motor ADL) is also measured to test whether the mirror therapy is effect or not. The motor ADL data is given in Table 3.1. 52 Table 3.1: Mirror Therapy

Initial(Before MT) Final(after MT) 43.9 60.8 48.3 66.6 42.3 58.4 47.5 65.8 36.7 50.1 41.3 54.8

Let the motor activities of daily living before mirror therapy be X and the motor activities of daily living after mirror therapy be Y. The confidence interval for location shift parameter ∆ is the confidence interval for the median difference between X and Y. Let α = 0.047 so

that the confidence level is 0.953. With m=n=6, the upper 0.047 percentile point w0.047 of the null distribution of W is 50. Then we obtain

6(2×6+6+1) C0.047 = 2 + 1 − w0.047 = 8.

8 Thus, the lower confidence bound ∆L = U = 10.1, where U’s are the 36 order numbers of the difference Y-X. As a result, the 95.3% confidence interval for ∆ is (10.1, +∞).

Since ∆L > 0, that is, zero is not included in this confidence interval. Therefore, there is sufficient evidence to support that the mirror therapy is effective with .047 significance level. Next, use the large approximate method to analyze this example.

r mn mn(m + n + 1) C ≈ − z α 2 α 12 √ = 18 − 1.674665 39

= 7.54

≈ 8.

8 Then, the lower confidence bound ∆L = U = 10.1 and the 95.3% confidence interval for ∆ is (10.1, +∞), which are the same as the exact Wilcoxon rank sum test. 53 To sum up, the Wilcoxon rank sum test is useful when the data is distribution-free and sample size is small. For the mirror therapy(MT) in severe hemiparesis example, if the paired-t test is used, it causes some mistakes, since the sample size is not large enough to satisfy the normal assumption. From Shapiro-Wilk Normality Test for the initial data, the p-value is 0.7633. Therefore, t test here is not quite as appropriate as the Wilcoxon rank sum test.

3.3 Main Results

Lemma 3.1. Let θ be the population center of symmetry (such as median, mean). If zi

zi+zj i = 1, 2, ..., n, are the data, then W = 2 for all 1 ≤ i ≤ j ≤ n are the Walsh averages. For any significance level α, we have

P (W(cα) < θ < W(t α )) = 1 − α 2

where

+ T : Wilcoxon signed rank (the sum of the positive signed ranks for zi);

+ α t α is chosen so that P (T ≥ t α ) = ; 2 2 2 n(n+1) c = − t α + 1. α 2 2

− Proof. Let T be the sum of the negative signed ranks for zi. Since the sum of the positive ranks equals the number of Walsh averages greater than the hypothesized θ, and the sum of the negative ranks equals the number of Walsh averages less than the hypothesized θ, then 54

P (W(cα) ≥ θ) = P (there are at most cα − 1 positive ranks) (3.2)

+ = P (T ≤ cα − 1)

+ n(n + 1) = P (T ≤ − t α ) 2 2 + = P (T ≥ t α ) (3.3) 2 α = 2

α P (W(t α ) ≤ θ) = P ( at least t negative ranks) (3.4) 2 2 − = P (T ≥ t α ) 2

+ = P (T ≥ t α ) (3.5) 2

+ n(n + 1) = P (T ≤ − t α ) (3.6) 2 2 α = 2

Therefore,

P (W(cα) < θ < W(t α )) = 1 − P (W(cα) ≥ θ) − P (W(t α ) ≤ θ) (3.7) 2 2 α α = 1 − − 2 2 = 1 − α.

Lemma 3.2. Given two sets of data (xi) and (yj), where i=1,2,...,n. j=1,2,...,m, the values

1 2 mn of yj − xi from the smallest to the largest are denotes as U ≤ U ≤ · · · ≤ U . Let θx, θy be the medians of X and Y. Then given any significance level α,

cα P (U < θy − θx) = 1 − α 55 where

n(n+1) cα = nm + 2 + 1 − wα and under Mann-Whitney statistic U, statistic wα satisfies

P (U > wα) = α

Proof. Based on the Lemma 3.1, T + is the sum of positive ranks, then

cα P (U < θy − θx) = P (cα < T+)

= 1 − P (T+ ≤ cα − 1) n(n + 1) = 1 − P (T ≤ mn − (w − )) + α 2 n(n + 1) = 1 − P (T > w − ) + α 2 n(n + 1) = 1 − P (T + > w ) (3.8) + 2 α

= 1 − P (U > wα) (3.9)

= 1 − α.

Lemma 3.3. For any positive constant c and independent random variables zi, i = 1, 2, ..., k

Pk P (|zi − zj| < c, i < j, i, j = 1, 2, ..., k) = i=1 P (0 < zt − zj < c, j 6= t, j = 1, 2, ..., k).

Proof. Let At = zt > zj, j 6= t for t = 1, 2, ..., k. Then A1,A2, ..., At constitute a partition of 56 the sample space.

P (|zi − zj| < c, i < j, i, j = 1, 2, ..., k) k [ \ = P ( ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.10) i=1 k X \ = P ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.11) i=1 \ = kP ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.12)

= kP ({0 < zt − zj < c, j 6= t, j = 1, 2, ..., k}) (3.13)

= kP ({0 < z1 − zj < c, j = 2, ..., k}) (3.14)

T To show the equation (3.13), let event A be {|zi −zj| < c, i < j, i, j = 1, 2, ..., k} At and event B be {0 < zt − zj < c, j 6= t, j = 1, 2, ..., k}. Here, we will show that P (A) = P (B).

Firstly, show A ⊂ B. For any element w ∈ A, that is, w ∈ {|zi − zj| < c, i < j, i, j = T 1, 2, ..., k} At), since At implies that zt is the largest among z1, z2, ..., zk,then zt − zj > 0 for j 6= t. Therefore, w ∈ B. Thus, A ⊂ B.

Next, show B ⊂ A. For any element w ∈ B, that is, w ∈ {0 < zt − zj < c, j 6= t, j =

1, 2, ..., k}, implies that zt is the largest one among z1, ..., zk (w ∈ At). Meanwhile, for any i, j, we have 0 < zt − zj < c and 0 < zt − zi < c. Thus, −c < zi − zj < c. Then, |zi − zj| < c. T Therefore, w ∈ {|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At). That is, w ∈ A. Thus, B ⊂ A. Then A = B and P (A) = P (B).

Theorem 3.1. For independent and distribution-free data (xij), where i=1,2,...,k and j=1,2,...,ni, let Mi be the median of the ith treatment, θ = (M1,M2, ..., Mk) be the vector of median of the k treatments. The hypotheses: H0 : Mj = Mi versus H1 : Mj > Mi, are equivalent to 57

H0 : ∆ji = 0 versus H1 : ∆ji > 0, where ∆ji = Mj − Mi. The 100(1 − α)% confidence

Cji nj (2ni+nj +1) for ∆ji is (U , ∞), where Cji = 2 + 1 − wα and wα is the upper α percentile of

Cji the null distribution of the Wilcoxon Rank Sum statistic. Let U be the Cjith position in

1 2 nj ni the list of njni increasing ordered difference xjh − xir ∈ {U ≤ U ≤ · · · ≤ U } where

(h = 1, 2, ..., nj and r = 1, 2, ..., ni). For any J, let T be the largest integer i (1 ≤ i ≤ k) such

that U CJi ≤ 0. Otherwise, T=0. If for a fixed J, T=0, then

P (ω : MJ = max Mj) ≥ 1 − α j=1,2,...,k

That is, J is the best treatment with confidence level 1 − α

Proof. The Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k−1 Then, S1,S2, ..., Sk partition the R space. Denote

U CJ1 ,U CJ2 , ..., U CJ(J−1) ,U CJ(J+1) , ..., U CJk as

xˆ1, ..., xˆJ−1, xˆJ , ..., xˆk−1 and letx ˆk = −∞,

CJ1 CJ2 C C CJk where U =x ˆ1,U =x ˆ2, ··· ,U J(J−1) =x ˆJ−1,U J(J+1) =x ˆJ , ··· ,U =x ˆk−1. 58 Sk Let CJ (X) = i=1((ˆxi, ∞) ∩ Si), then CJ (X) is 100(1 − α)% confidence set for θ. Since if θ ∈ Si, then

k [ Pθ(θ ∈ CJ (X)) = Pθ(θ ∈ ((ˆxi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆxi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆxi, ∞))

≥ 1 − α

If T=0, then for all 1 ≤ i ≤ k − 1, (ˆxi, ∞) ∩ Si = φ, sincex ˆi > 0 and (ˆxi, ∞) ∩ (−∞, 0] = φ. Thus,

k [ CJ (X) = ((ˆxi, ∞) ∩ Si) i=1 k−1 [ k−i−1 [ = { ((ˆxi, ∞) ∩ (1, ∞) × (−∞, 0])} {(ˆxk, ∞) ∩ Sk} i=1

= (ˆxk, ∞) ∩ Sk (3.15)

= Sk

= (0, ∞)k−1. (3.16) 59 Therefore,

P (ω : MJ = max Mj) = P (ω : MJ = Mj for some j in 1, 2, ..., k) j=1,...,k

= P (ω : MJ ≥ M1,MJ ≥ M2, ...,

MJ ≥ MJ−1,MJ ≥ MJ+1, ..., MJ ≥ Mk)

= P (ω : ∆J1 ≥ 0, ∆J2 ≥ 0, ..., ∆J(J−1) ≥ 0,

∆J(J+1) ≥ 0, ..., ∆Jk ≥ 0)

≥ P (ω : ∆∗ ∈ (0, ∞)k−1) e ≥ 1 − α, (3.17)

∗ T where ∆ = (∆J1, ..., ∆J(J−1), ∆J(J+1), ..., ∆Jk) . e It is a result from the (3.16):

∗ k−1 P (ω : ∆ ∈ (0, ∞) ) ≥ P (X : θ ∈ CJ (X)) e ≥ 1 − α.

The inequality (3.17) holds. Thus,

P (ω : MJ = max Mj) ≥ 1 − α. j=1,...,k

That means that the treatment J is the best one.

With the same setting of data (xij) and hypotheses, we now turn to a result involving any treatment J, when J is not fixed.

Cji Theorem 3.2. The 100(1 − α/k)% confidence interval for ∆ji is (U , ∞), where Cji =

nj (2ni+nj +1) 2 + 1 − wα/k and wα/k is the upper α/k percentile of the null distribution of the

Cji Wilcoxon Rank Sum statistic. Then U is the Cjith position in the list of njni increasing 60

1 2 nj ni 1 2 nj ni ordered difference xjh − xir ∈ {U ,U , ··· ,U }, where {U ≤ U ≤ · · · ≤ U }, h =

1, 2, ..., nj and r = 1, 2, ..., ni. For any J, let T be the largest integer i (1 ≤ i ≤ k) such that

U CJi ≤ 0. Otherwise, T=0. Screening J in (1, 2, ..., k), if exists J such that T=0, then J is the best treatment with confidence level 1 − α.

Proof. As the procedure in Section 3.4.2, if the screening procedure stop at stage H (H ≤ k), where T = 0, then the treatment J = k-H+1 is the best one. Meanwhile, every stage is the

CJT k−T −1 procedure in Theorem 3.1. Then, from Theorem 3.1, IJ (Y ) = (U , ∞)∩(0, ∞) is the confidence set for θ = (M1,M2, ..., Mk) at the stage corresponding with screening treatment J,

J ∈ (1, 2, ..., k). Thus, first, we need to prove P (Y : θ ∈ IJ (Y )) ≥ 1−α/k for any J in 1,2,...,k.

T Then, adjust with the Bonferroni inequality to indicate that P (Y : θ ∈ ∩j=1Ij(Y )) ≥ 1 − α. The Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k k−1 k−1 Since ∪i=1Si = R and Si ∩ Sj = φ for i 6= j, then S1, ..., Sk partition the R space. Denote

U CJ1 ,U CJ2 , ..., U CJ(J−1) ,U CJ(J+1) , ..., U CJk as

xˆ1, ..., xˆJ−1, xˆJ , ..., xˆk−1 and letx ˆk = −∞. 61

CJ1 CJ2 C C CJk where U =x ˆ1,U =x ˆ2, ··· ,U J(J−1) =x ˆJ−1,U J(J+1) =x ˆJ , ··· ,U =x ˆk−1. Sk Let CJ (Y ) = i=0((ˆxi, ∞) ∩ Si), then CJ (Y ) is 100(1 − α/k)% confidence set for θ. Since if θ ∈ Si, then

k [ Pθ(θ ∈ CJ (Y )) = Pθ(θ ∈ ((ˆxi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆxi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆxi, ∞))

≥ 1 − α/k.

Notice that there are three properties here:

a) if i > T ( if such i exists), then (ˆxi, ∞) ∩ Si = φ ,since (ˆxi, ∞) ∩ (−∞, 1] = φ

k−T k−i−1 k−T b) if i < T (if such i exists), then Si ⊂ (0, ∞) since (0, ∞) × (−∞, 0] ⊂ (0, ∞) .

c) (ˆxT , ∞) ⊃ (0, ∞). Therefore,

k [ CJ (Y ) = ((ˆxi, ∞) ∩ Si) i=1 T [ = ((ˆxi, ∞) ∩ Si) (3.18) i=1 [ k−T ⊆ {(ˆxT , ∞) ∩ ST } (0, ∞) (3.19)

k−T −1 [ k−T ⊆ {(ˆxT , ∞) ∩ (0, ∞) × (−∞, 0]} {(0, ∞) ∩ (ˆxT , ∞)} (3.20)

k−T −1 = (ˆxT , ∞) ∩ (0, ∞)

= IJ (x)

Here, the equation (3.18) comes from the property a); The subsets (3.19) and (3.20) result 62 from property b) and c), respectively. Thus,

P (Y : θ ∈ IJ (x)) ≥ P (Y : θ ∈ CJ (Y )) α ≥ 1 − k

for any J in 1,2,...,k. Let

c c IJ (Y ) , (IJ (Y )) , then, we have:

c sup Pθ(Y : θ ∈ IJ (Y )) = 1 − inf Pθ(Y : θ ∈ IJ (Y )) θ θ α ≤ 1 − (1 − ) k α = . k

By Bonferroni inequality,

k k c X c P (Y : θ ∈ ∪J=1IJ (Y )) ≤ P (Y : θ ∈ IJ (Y )) J=1 α = k k = α.

Thus,

k k c P (Y : θ ∈ ∩J=1IJ (Y )) = 1 − P (Y : θ ∈ ∪J=1IJ (Y ))

≥ 1 − α. 63 Since H ≤ k, then:

H c k c P (Y : θ ∈ ∪J=1IJ (Y )) ≤ P (Y : θ ∈ ∪J=1IJ (Y )) k X c ≤ P (Y : θ ∈ IJ (Y )) J=1 α = k k = α.

Thus,

H H c P (Y : θ ∈ ∩J=1IJ (Y )) = 1 − P (Y : θ ∈ ∪J=1IJ (Y ))

≥ 1 − α.

3.4 Procedures

3.4.1 Procedure for Theorem 3.1

Suppose the better treatment has larger larger median, for data (xij) where i=1,2,...,k stands for k treatments with ni size for ith treatment, applying Theorem 3.1 to produce the proce- dure which will have k steps, since T=0. And the k steps are in the Figure 3.1.

3.4.2 Procedure for Theorem 3.2

The procedure for Theorem 3.2 is at most k stage which repeat k-1 steps in procedure for Theorem 3.1 for each J=1,2,...,k. Without loss of generality, we screen from treatment k till treatment 1, although the screening order can be random. It would stop at the stage where 64 T=0. Figures 3.2, 3.3 and 3.4 show the whole procedure of Theorem 3.2. 65 3.5 Simulation

Wilcoxon Mann-Whitney test is a nonparametric method to test the difference of population medians. Still, without loss of generality, suppose the higher response median, the better treatment. Here, we will simulate the new procedure to verify the coverage probability with Wilcoxon Mann-Whitney test. As for the Theorem 3.1, let the first treatment be the largest median response. That is, the first treatment is the best one. Suppose there are four treatments (k=4). The number of iterations is set to 10,000. Considering that the normal distribution has the same mean and median, the multivariate normal data with correlation equals zero and variance

equals one, which is N4(µ, Σ = I), is generated. With different sample sizes n, the overall e coverage probability of the new procedure with the first treatment as the best at significance level α = 0.05 is shown on the Table 3.2. To be consistent, let the medians be M =

(M1,M2,M3, ..., Mk), where k is number of populations. Table 3.2: Coverage Probability with C.L.=.95 and Different Sample Sizes

M = (M1,M2,M3,M4) Coverage Probability Sample size n (3.9,3,2,1) 0.9555 30 (3.9,3,2,1) 0.964 32 (3.9,3,2,1) 0.976 34 (3.9,3,2,1) 0.981 35

As seen from Table 3.2, the new procedure with Wilcoxon Mann-Whitney test is very sensitive to sample size. The coverage probability increases as the sample size increases. Meanwhile, all of the coverage probabilities are at the confidence level of 0.95. The next, considering different position of pre-specified best treatment and there are only four treatments in total. With the other conditions remain the same, the result is presented on the Table 3.3. As the pre-specified treatment is known, the coverage probability will keep the same confidence level at 0.95 no matter what position order it is in the overall treatments. 66 Table 3.3: Coverage Probability with C.L.=.95 and Sample Size n=30

M = (M1,M2,M3,M4) Coverage Probability Best Treatment Position (3.9,3,2,1) 0.9555 1st (3,3.9,2,1) 0.952 2nd (3,2,3.9,1) 0.954 3rd (3,2,1,3.9) 0.959 4th

When it comes to the Theorem 3.2, there is no pre-specified best treatment. To verify the new procedure’s overall coverage in selecting the best one, we use several different shape data to simulate. Here, let k =5 and the number of iterations is set to 10,000. Similarly, the multivariate distribution is generated. With the confidence level 0.95, we consider the cov- erage probability for the linearity decreasing median response M = (M1,M2,M3,M4,M5) = (5.1, 4, 3, 2, 1). The new procedure shows that the best treatment is the first one and cov- erage probability is 0.953. For the U-shape median response M = (M1,M2,M3,M4,M5) = (7.1, 3.5, 2, 4, 6), the best treatment is the first one and the coverage probability is 0.957.

As for the inverted U-shape median response M = (M1,M2,M3,M4,M5) = (2, 3.5, 7.1, 4, 6), the coverage probability from new procedure with Wilcoxon Mann-Whitney test is 0.958 and the best treatment is the third one. From the logarithm median response M =

(M1,M2,M3,M4,M5) = (1.10, 2.05, 2.64, 3.1, 4.1), the coverage probability is 0.959 and the best treatment is the last one. These results are summarized in Table 3.4.

Table 3.4: Coverage Probability with Sample Size n=30 and Different Median Shapes

M = (M1,M2,M3,M4,M5) Coverage Probability Best Treatment (5.1,4,3,2,1) 0.953 1st one (7.1,3.5,2,4,6) 0.957 1st one (2,3.5,7.1,4,6) 0.958 3rd one (1.10,2.05,2.64,3.1,4.2) 0.959 4th one

From Table 3.4, the new procedure selects the best treatment at the confidence level 0.95, which corresponds to the procedure proved in Theorem 3.2. 67 Step 1

?

If U CJk > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ nk

nJ (2nk+nJ +1) and CJk = 2 + 1 − wα Yes No ? ? CJk Assert ∆Jk > U and J Assert ∆Jk > 0 and goes to Step 2 is the not best treatment; Stop

? Step 2

? If U CJk−1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ nk−1

nJ (2nk−1+nJ +1) and CJk−1 = 2 + 1 − wα Yes No ? ? CJk−1 Assert ∆Jk−1 > U and J Assert ∆Jk > 0 and goes to Step 3 is not the best treatment; Stop

? . .

? Step k-1

? If U CJ1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ n1

nJ (2n1+nJ +1) and CJ1 = 2 + 1 − wα Yes No ? ? CJ1 Assert ∆J1 > U and J Assert ∆J1 > 0 and goes to Step k is not the best treatment; Stop

? Step k - Assert treatment J is the best

Figure 3.1: Procedure for Theorem 3.1 Stage 1 (J=k) 68 Step 1

?

If U Ckk−1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nknk−1

nk(2nk−1+nk+1) and Ckk−1 = 2 + 1 − wα Yes No ? ?

Ck(k−1) Assert ∆k(k−1) > U and k is Assert ∆k(k−1) > 0 and goes to Step 2 the not best treatment; Go to Stage 2

? Step 2

? If U Ckk−2 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nknk−2

nk(2nk−2+nk+1) and Ckk−2 = 2 + 1 − wα Yes No ? ?

Ck(k−2) Assert ∆k(k−2) > U and k is Assert ∆k(k−2) > 0 and goes to Step 3 not the best treatment; Go to Stage 2

? . .

? Step k-1

? If U Ck1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nkn1

nk(2n1+nk+1) and Ck1 = 2 + 1 − wα Yes No ? ? Ck1 Assert ∆k1 > U and k is Assert ∆k1 > 0 and goes to Step k not the best treatment; Goes to Stage 2

? Step k - Assert treatment k is the best

Figure 3.2: Procedure for Theorem 3.2 at Stage 1 Stage 2 (J=k-1) 69 Step 1

?

If U C(k−1)k > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nk−1nk

nk−1(2nk+nk−1+1) and C(k−1)k = 2 + 1 − wα Yes No ? ?

C(k−1)k Assert ∆(k−1)k > U and k is Assert ∆(k−1)k > 0 and goes to Step 2 the not best treatment; Go to Stage 3

? Step 2

? If U C(k−1)(k−2) > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nk−1nk−2

nk−1(2nk−2+nk−1+1) and C(k−1)(k−2) = 2 + 1 − wα Yes No ? ?

C(k−1)(k−2) Assert ∆(k−1)(k−2) > U and k-1 Assert ∆(k−1)(k−2) > 0 and goes to Step 3 is not the best treatment; Go to Stage 3

? . .

? Step k-1

? If U C(k−1)1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n(k−1)n1

nk−1(2n1+nk−1+1) and C(k−1)1 = 2 + 1 − wα Yes No ? ?

C(k−1)1 Assert ∆(k−1)1 > U and k-1 is Assert ∆(k−1)1 > 0 and goes to Step k not the best treatment; Goes to Stage 3

? Step k - Assert treatment k-1 is the best

Figure 3.3: Procedure for Theorem 3.2 at Stage 2 Stage 3 (J=k-2) 70

. . .

Stage k (J=1) Step 1

? If U C1k > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n1nk n1(2nk+n1+1) and C1k = 2 + 1 − wα Yes No ? ? C1k Assert ∆1k > U and 1 is Assert ∆1k > 0 and goes to Step 2 the not best treatment; Stop

? Step 2

? If U C1(k−1) > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n1nk−1 n1(2nk−1+n1+1) and C1(k−1) = 2 + 1 − wα Yes No ? ? C1(k−1) Assert ∆1(k−1) > U and 1 Assert ∆1(k−1) > 0 and goes to Step 3 is not the best treatment; Stop

? . .

? Step k-1

? If U C12 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n(1)n2 n1(2n2+n1+1) and C12 = 2 + 1 − wα Yes No ? ? C12 Assert ∆12 > U and 1 is Assert ∆12 > 0 and goes to Step k not the best treatment; Stop

? Step k -Assert treatment 1 is the best

Figure 3.4: Procedure for Theorem 3.2 from Stage 3 to Stage k 71

CHAPTER 4

INDENTIFYING THE BEST TREATMENT UNDER NORMALITY

4.1 Multivariate Normal Distribution

For normal distributions, if there is only one-dimension, it is the univariate normal case, where there are two parameters, population mean µ and variance σ2. If there are k- dimensions, it is the multivariate normal version, where µ is a vector of length k and variance becomes a k × k symmetric and positive definite covariance matrix Σ.

A k-dimensional random vector X = (X1,X2, ..., Xk) which has a multivariate normal distribution can be denoted as Nk(µ, Σ), where the k dimensional mean is

µ = [E(X1),E(X2), ..., E(Xk)], and variance-covariance matrix is

Σ = [Cov[Xi,Xj]] i, j = 1, 2, ..., k. 72 The multivariate normal distribution has its density:

1 1 f (x , x , ..., x ) = exp− (x − µ)T Σ−1(x − µ), (4.1) X 1 2 k (2π)k/2|Σ|1/2 2 where |Σ| is the determinant of Σ, Σ−1 is the inverse of Σ and (x − µ)T is the transpose of vector (x − µ). In particular, when k=2, that is, there are only two dimensions for the normal distri- bution, it is called a bivariate normal distribution. Let a two-dimensional random variable

(X,Y) follow N2(µ, Σ). Then   µ  x  µ =   µy   2 σx ρσxσy Σ =    2  ρσxσy σy where ρ is the correlation between X and Y.

2 2 σx and σy are the variance for X and Y, respectively.

Therefore, the density of bivariate normal distribution is :

2 2 1 1 (x − µx) 2ρ(x − µx)(y − µy) (y − µy) fX,Y (x, y) = exp(− [ − + ]). p 2 2 2 2 2πσxσy 1 − ρ 2(1 − ρ ) σx σxσy σy (4.2)

4.2 t-test with Welch Correction

We start with a brief review for the one sample t-test. The t distribution comes from normal

2 and chi-square distributions. If two independent random variables z and χn are standard normal and chi-square with n degrees of freedom, respectively, then the random variable 73 z tn = q 2 χn n follows the t distribution with n degrees of freedom. The probability density function (p.d.f.) of t is

Γ( n+1 ) √ 2 1 fT (t) = n 2 n+1 nπΓ( 2 ) t 2 [( n +1)]

2 where −∞ < t < ∞. Since z and χn are independent, the density of tn is the marginal p.d.f.

2 2 from the joint p.d.f. of z and χn. For convenience, let v denote χn. Then the joint p.d.f. of z and v is

1 z2 1 n v − 2 2 −1 − 2 fZ,V (z, v) = 1/2 e n n v e (4.3) (2π) 2 Γ( 2 )2 where −∞ < z < ∞ and 0 < v < ∞. When testing whether the mean of a normal distributed population with unknown vari- ance has a specified value or not, t test is applicable. Take a simple one sample as an

example, x1, x2, ..., xn is a random sample from normal distribution with mean µx and un-

2 2 known variance σx. Letx ¯ and S be the sample mean and variance, respectively. Then Pn 2 2 i=1(xi−x¯) S = n−1 . Under the null hypothesis: H0 : µ ≤ µ0 versus alternative hypothesis:

x¯−√µ0 H1 : µ > µ0, the test statistic is: T = S n , where T follows a t-distribution with degrees of freedom n-1 when the null hypothesis is true. The above review shows the connection between the variability and the mean of the population. In two populations with the equal variance, the two sample t-test is used. For example,

2 2 x1, x2, ..., xn follow a normal distribution with mean µx and variance σx wherex ¯ and Sx are

sample mean and variance estimators, respectively; y1, y2, ..., ym follow a normal distribution

2 2 with mean µy and variance σy wherey ¯ and Sy are sample mean and variance estimators,

respectively; the hypotheses are H0 : µx ≤ µy versus H1 : µx > µy, then with equal variance

2 2 σx = σy, t-statistic is:

x¯−y¯√−(µx−µy) T = 1 1 S n + m 74 q 2 2 2 (n−1)Sx+(m−1)Sy with degrees of freedom n+m-2, where S = n+m−2 is the estimator of pool variance of the two samples.

The 100(1 − α)% one-sided confidence interval for µx − µy is:

q 1 1 µx − µy > x¯ − y¯ − S n + m tα,n+m−2.

q 1 1 Ifx ¯ − y¯ − S n + m tα,n+m−2 > 0, there is sufficient evidence to reject the null hypothesis

in favor of the alternative hypothesis. That is, we conclude that µx > µy at α significance level. We have discussed the two-sample t-test with the same variance. Now, it is necessary to cover the two-sample t-test with unequal variances. Specifically, in normally distributed populations with different variances, the problem concerning the difference between two population means with unequal variances is called Behrens-Fisher problem. Welch (1947) proposed the Welch-t-test which is considered as an approximate solu- tion to the Behrens-Fisher problem. The main difference between Student’s t-test in equal variance and Welch-t-test in unequal variances is the degrees of freedom. In Welch-t-test, Welch-Satterthwaite equation is used to calculate the effective degrees of freedom. From

2 Satterthwaite (1946), for N samples with estimated variances Si , degrees of freedom vi, and

the linear combination L with fixed values a1, a2, ..., aN :

PN 2 L = i=1 aiSi the distribution of L can be approximated by a chi-square distribution with the number of degrees of freedom v:

PN 2 2 ( i=1 aiSi ) v = (PN a S2)2 . PN i=1 i i i=1 vi

2 2 In two populations with unequal variances σx 6= σy, Welch’s t test is used. Assume that

2 2 (x1, x2, ..., xn) ∼ N(µx, σx) with sample meanx ¯ and sample variance Sx,(y1, y2, ..., ym) ∼

2 2 N(µy, σy) with sample meany ¯ and sample variance Sy . 75 1 1 Thus, a1 = n , a2 = m , a3 = a4 = ... = aN = 0 and v1 = n − 1, v2 = m − 1. Then 2 2 Sx Sy L = n + m and the number of degrees of freedom v is:

2 2 Sx Sy 2 ( n + m ) v = 4 4 . (4.4) Sx Sy n2(n−1) + m2(m−1)

Therefore, the Welch’s t test is:

x¯−y¯−(µx−µy) t = r 2 2 Sx Sy n + m with degrees of freedom v in equation (4.4).

4.3 Simultaneous Inference

4.3.1 Main Results

Without loss of generality, assume that the larger positive response means the better treat-

ment. Let X1 = (x11, x12, ..., x1n1 ), X2 = (x21, x22, ..., x2n2 ), ··· , Xk = (xk1, xk2, ..., xknk ) be the random samples from the mutually independent population 1,2,...,k, respectively. In

addition, the random variable xij follows a normal distribution with mean µi and variance

2 σi , where i = 1, 2, ..., k and j = 1, 2, ..., ni. Specifically, for any two random variables xit and

xjs with i 6= j,(xit, xjs) follows a bivariate normal distribution N2(µ, Σ), where mean vector   2 σi 0 µ = (µi, µj) and variance-covariance matrix Σ =  .  2  0 σj Here, we only discuss new procedures under the assumption that the populations are independent. If the populations are not independent, especially covariances are not zero, the inference for confidence interval becomes complicated. The new procedure for dealing with dependent populations is under future investigation.

Theorem 4.1. Suppose there are k mutually independent populations. Let Xi = (xi1, xi2, ..., xini )

2 be a random sample from the ith population, where xit follows N(µi, σi ), t = 1, 2, ..., ni. With 76

hypotheses H0 : µj ≤ µi vs H1 : µj > µi, Cj,i is the lower bound of 100(1 − α)% one-sided

confidence interval for the mean difference µj − µi. For pre-specified treatment J in 1,2,...,k,

if CJ,i > 0 for all i 6= J, then

P (X : µJ = max µj) ≥ 1 − α. j=1,2,...,k

That is, with confidence level 1 − α, J is the best treatment.

2 Proof. Letx ¯i, Si and ni be the sample mean, sample variance, and sample size for the random

sample Xi, respectively. Since the 100(1−α)% lower bound for one-sided confidence interval

q 2 2 SJ Si for the mean difference µJ − µi, i 6= j, CJ,i =x ¯J − x¯i − tα,v + where tα,v is the upper nJ ni α percentile of t distribution with degrees of freedom v, is compared with zero, then Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k−1 Therefore, S1,S2, ..., Sk partition the R space. Denote

CJ1,CJ2, ..., CJ(J−1),CJ(J+1), ..., CJk as

yˆ1, ..., yˆJ−1, yˆJ , ..., yˆk−1 andy ˆk = −∞

where CJ1 =y ˆ1,CJ2 =y ˆ1, ··· ,CJ(J−1) =y ˆJ−1,CJ(J+1) =y ˆJ , ··· ,CJk =y ˆk−1. Sk Let DJ (X) = i=1((ˆyi, ∞) ∩ Si), then DJ (X) is 100(1 − α)% confidence set for θ = 77

(µJ − µ1, µJ − µ2, ..., µJ − µJ−1, µJ − µJ+1, ..., µJ − µk). Since if θ ∈ Si, then

k [ Pθ(θ ∈ DJ (X)) = Pθ(θ ∈ ((ˆyi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆyi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆyi, ∞))

≥ 1 − α.

Ify ˆi > 0 for all i in 1,2,...k-1, then (ˆyi, ∞) ∩ (−∞, 0] = φ for i in 1,2,...,k-1. Thus,

k [ DJ (X) = ((ˆyi, ∞) ∩ Si) i=1 k−1 [ k−i−1 [ = { ((ˆyi, ∞) ∩ (0, ∞) × (−∞, 0])} {(ˆyk, ∞) ∩ Sk} i=1

= (ˆyk, ∞) ∩ Sk (4.5)

= Sk

= (0, ∞)k−1. (4.6)

Let Λji = µj − µi for all i 6= j and i, j = 1, 2, ..., k, then for the pre-specified J:

P (X : µJ = max µj) = P (X : µJ ≥ µ1, µJ ≥ µ2, ..., j=1,...,k

µJ ≥ µJ−1, µJ ≥ µJ+1, ..., µJ ≥ µk)

= P (X :ΛJ1 ≥ 0, ΛJ2 ≥ 0, ..., ΛJ(J−1) ≥ 0,

ΛJ(J+1) ≥ 0, ..., ΛJk ≥ 0)

≥ P (X :Λ∗ ∈ (0, ∞)k−1) e ≥ 1 − α. (4.7)

∗ T where Λ = (ΛJ1, ..., ΛJ(J−1), ΛJ(J+1), ..., ΛJk) . e 78 From the result (4.6), we have:

∗ k−1 P (X :Λ ∈ (0, ∞) ) = Pθ(θ ∈ DJ (X)) e ≥ 1 − α.

Then the inequality (4.7) holds.

P (X : µJ = max µj) ≥ 1 − α. j=1,2,...,k

That is, with confidence level 1 − α, J is the best treatment.

q 2 2 SJ Si For the critical value tα,v in the lower bound CJ,i =x ¯J − x¯i − tα,v + , if variances nJ ni 2 2 are the same σJ = σi , then the degrees of freedom v equal nJ + ni − 2. If the variances are different, then Welch’s t statistic with degrees of freedom v in equation (4.4) is appropriate.

Similarly, in Theorem 4.2, these rules can be applied in the critical value tα/k,v.

Theorem 4.2. Suppose there are k mutually independent populations. Let Xi = (xi1, xi2, ..., xini )

2 be a random sample from the ith population, where xit follows N(µi, σi ), t = 1, 2, ..., ni. With hypotheses H0 : µj ≤ µi vs H1 : µj > µi, Cj,i is the lower bound of 100(1 − α/k)% one-sided confidence set for the mean difference µj − µi. Screening for J ∈ {1, 2, ..., k}, if CJ,i > 0 for all i 6= J, J is the best treatment with confidence level 1 − α.

q 2 2 SJ Si Proof. Similar to the proof of Theorem 4.1, the lower bound CJ,i =x ¯J −x¯i −tα/k,v + , nJ ni 79 is compared with zero, then Rk−1 space can be partitioned as follows:

k−1 Ak = (0, ∞)

Ak−1 = (−∞, 0]

Ak−2 = (0, ∞) × (−∞, 0] . .

k−1−i Ai = (0, ∞) × (−∞, 0] . .

k−2 A1 = (0, ∞) × (−∞, 0]

k−1 Therefore, A1,A2, ..., Ak partition the R space. Denote

CJ1,CJ2, ..., CJ(J−1),CJ(J+1), ..., CJk as

yˆ1, ..., yˆJ−1, yˆJ , ..., yˆk−1 andy ˆk = −∞ where CJ1 =y ˆ1,CJ2 =y ˆ1, ··· ,CJ(J−1) =y ˆJ−1,CJ(J+1) =y ˆJ , ··· ,CJk =y ˆk−1. As shown on Section 4.3.3, if the procedure stop at stage H where the treatment is J

= k-H+1 and CJ,i > 0 for all i 6= J, then treatment J is the best one. Let T be the largest integer i such that CJi ≤ 0, if T exists. Otherwise, T = 0. Then T is a function of

k−T −1 J. Moreover, IJ (X) = (ˆyJ , ∞) ∩ (0, ∞) is the confidence set for θ = (µJ − µ1, µJ −

µ2, ..., µJ − µJ−1, µJ − µJ+1, ..., µJ − µk) at the stage corresponding with treatment J, J = 1,2,...,k. Sk Let DJ (X) = i=1((ˆyi, ∞)∩Si), then DJ (X) is 100(1−α/k)% confidence set for θ. Since 80 if θ ∈ Si, then

k [ Pθ(θ ∈ DJ (X)) = Pθ(θ ∈ ((ˆyi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆyi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆyi, ∞))

≥ 1 − α/k.

Similar to proof of Theorem 3.2, DJ (X) = IJ (X). Therefore, for any J in 1,2,...,k,

Pθ(θ ∈ IJ (X)) = Pθ(θ ∈ DJ (X))

≥ 1 − α/k.

Let

c c IJ (X) , (IJ (X)) , then, we have:

c sup Pθ(X : θ ∈ IJ (X)) = 1 − inf Pθ(X : θ ∈ IJ (X)) θ θ α ≤ 1 − (1 − ) k α = . k 81 By Bonferroni inequality,

k k c X c P (X : θ ∈ ∪J=1IJ (X)) ≤ P (X : θ ∈ IJ (X)) J=1 α = k k = α.

Thus,

k k c P (X : θ ∈ ∩J=1IJ (X)) = 1 − P (X : θ ∈ ∪J=1IJ (X))

≥ 1 − α.

Since H ≤ k, then:

H c k c P (X : θ ∈ ∪J=1IJ (X)) ≤ P (X : θ ∈ ∪J=1IJ (X)) k X c ≤ P (X : θ ∈ IJ (X)) J=1 α = k k = α.

Thus,

H H c P (X : θ ∈ ∩J=1IJ (X)) = 1 − P (X : θ ∈ ∪J=1IJ (X))

≥ 1 − α.

That is, with confidence level 1 − α, J=k-H+1 is the best treatment. 82 4.3.2 Procedure for Theorem 4.1

Suppose that the better treatment has larger mean, for data (xij) where i=1,2,...,k stands for k treatments with sample size n for the ith treatment. Applying Theorem 4.1, if the pre- specified treatment J is the best treatment with 1 − α confidence level, then the procedure will have k steps. And the k steps with the case of equal variance and sample size are stated in Figure 4.1.

4.3.3 Procedure for Theorem 4.2

The procedure for Theorem 4.2 is at most k stages which repeat at most k steps in each stage for each J in 1,2,...,k. Without any loss of generality, the procedure here uses the equal variance in k populations and equal sample size. The Figures 4.2, 4.3 and 4.4 show the whole procedure of Theorem 4.2.

4.4 Simulation

If there are sufficient large of independent random samples, some statistic will be approx- imately normally distributed. In this section, we will use simulation to verify the cov- erage probability of new procedures work under the normality assumption. Still, with- out loss of generality, suppose the greater mean stands for a better treatment and let

2 2 2 2 σ1 = σ2 = ... = σk = σ , n1 = n2 = ... = nk = n. In this design of simulation study, for parameter configurations, we set the population means at several different levels which represent the efficacy of treatments. The number of iterations is set as 10,000. Variance σ2, the scale parameter, is set at different levels to see the effect of the sample variability on the outcome of the theoretical results. All the simulated runs are at a confidence level of 0.95. Moreover, different mean vectors are set to compare the overall coverage probability of the procedure under different scenarios. 83 In what follows, when the pre-specified treatment for comparison is available and k=5, Table 4.1 confirms that the coverage probabilities of the new procedure are close to the nominal coverage probability as shown in Theorem 4.1.

Table 4.1: Coverage Probability with Different Orders under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Best Treatment position (5.9,5,4.9,3,2) 0.9561 1st (5,5.9,4.9,3,2) 0.9567 2nd (5,4.9,5.9,3,2) 0.9529 3rd (5,4.9,3,5.9,2) 0.9519 4th (5,4.9,3,2,5.9) 0.9534 5th

As shown on Table 4.1, the order of pre-specified treatments for comparison will not affect in the procedure. The coverage probability of the procedure is very stable at the 0.95 confidence level.

q 2 Since the endpoint isx ¯i −x¯j −tα,2n−2 n , the coverage probability is highly related to the sample size n. In Table 4.2, we will simulate the coverage probability with different sample sizes as the pre-specified treatment is the first one.

Table 4.2: Coverage Probability with Different Sample Sizes under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Sample Size (5.9,5,4.9,3,2) 0.9561 30 (5,5.9,4.9,3,2) 0.9679 32 (5,4.9,5.9,3,2) 0.9747 34 (5,4.9,3,5.9,2) 0.9781 35

As the sample size increases, the lower bound increases. Thus, the coverage probabilities increase, as shown in Table 4.2. Next, consider the variance. Increasing the variance will counteract the effect of increasing sample size. In other words, if the data to be analyzed has large variability, increasing the sample size is a way to solve this problem when applying the new procedure with normality. 84 In Table 4.3, the coverage probability keeps at the same level as when we increases both of the sample size and the variability simultaneously.

Table 4.3: Coverage Probability with Different Variances under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Variance Sample Size (5.9,5,4.9,3,2) 0.9561 30 1 (5.9,5,4.9,3,2) 0.9548 85 2 (5.9,5,4.9,3,2) 0.9508 250 4 (5.9,5,4.9,3,2) 0.9555 800 8

The ratio between variance and sample size does not remain the same since the critical

value tα,2n−2 changes as the sample size changes. Now, for Theorem 4.2, different shapes of mean responses are considered. The confidence level is still set at 0.95, and the sample size and variance are 30 and 1, respectively. As for k = 5, the individual confidence level is 1 − α/k = 0.99. The simulation results include

the linear mean response µ = (µ1, µ2, µ3, µ4, µ5) = (5.1, 4, 3, 2, 1), U shape mean response

µ = (µ1, µ2, µ3, µ4, µ5) = (7.05, 3.51, 2.78, 4.23, 6.00). inverted-U shape mean response µ =

(µ1, µ2, µ3, µ4, µ5) = (2.78, 3.51, 7.05, 4.23, 6.00), and logarithm shape mean response µ =

(µ1, µ2, µ3, µ4, µ5) = (1.10, 2.05, 2.64, 3.1, 4.15). These are summarized in Table 4.4.

Table 4.4: Coverage Probability with Different Mean Shapes under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Best Treatment (5.1,4,3,2,1) 0.956 1st (7.05,3.51,2.78,4.23,6.00) 0.9521 1st (2.78,3.51,7.05,4.23,6.00) 0.9508 3rd (1.10,2.05,2.64,3.1,4.15) 0.9517 5th

Table 4.4 shows the corresponding coverage probabilities of Theorem 4.2 in different mean response settings. The coverage probability is close to the confidence level 0.95. This verifies the theoretical derivation in the proof of Theorem 4.2. Compared with the Bonferroni proce- dure in all situations, the multiplicity adjustment for the new procedure here is reduced from 85 k 2 to k. Thus, the power of the new procedure is significantly higher than the Bonferroni procedure. 86 Step 1

? q 2 If CJk > 0 where CJk =x ¯J − x¯k − tα,2n−2SJk n

and SJk is the pool variance between treatment J and k Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than k; Goes to Step 2 J k Jk is the not best treatment; Stop

? Step 2

? q 2 If CJk−1 > 0 where CJk =x ¯J − x¯k − tα,2n−2SJk−1 n

and and SJk−1 is the pool variance between treatment J and k-1 Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than k-1;Goes to Step 3 J k−1 Jk−1 is not the best treatment; Stop

? . .

? Step k-1

? q 2 If CJ1 > 0 where CJk =x ¯J − x¯1 − tα,2n−2SJ1 n

and SJ1 is the pool variance between treatment J and 1 Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than 1;Goes to Step k J 1 J1 is not the best treatment; Stop

? Step k - Assert treatment J is the best

Figure 4.1: Procedure for Theorem 4.1 Stage 1 (J=k) 87 Step 1

? q 2 If Ck,k−1 > 0 where Ckk−1 =x ¯k − x¯k−1 − tα,2n−2Sk,k−1 n

and Sk.k−1 is the pool variance between treatment k and k-1 Yes No ? ?

Assert treatment k is better than k-1; Assert µk − µk−1 > Ck,k−1 and k is Go to Step 2 the not best treatment; Go to Stage 2

? Step 2

? q 2 If Ck,k−2 > 0 where Ck,k−2 =x ¯k − x¯k−2 − tα,2n−2Sk,k−2 n

and Sk,k−2 is the pool variance between treatment k and k-2 Yes No ? ?

Assert treatment k is better than k-1; Assert µk − µk−2 > Ck,k−2 and k is Go to Step 3 not the best treatment;Go to Stage 2

? . .

? Step k-1

? q 2 If Ck,1 > 0 where Ck,1 =x ¯k − x¯1 − tα,2n−2Sk,1 n

and Sk,1 is the pool variance between treatment k and 1 Yes No ? ?

Assert treatment k is better than 1; Assert µk − µ1 > Ck,1 and k is Go to Step k not the best treatment; Go to Stage 2

? Step k - Assert treatment k is the best

Figure 4.2: Procedure for Theorem 4.2 at Stage 1 Stage 2 (J=k-1) 88 Step 1

? q 2 If Ck−1,k > 0 where Ck,k−1 =x ¯k−1 − x¯k − tα,2n−2Sk−1,k n

and Sk−1,k is the pool variance between treatment k-1 and k Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µk > Ck−1,k and k-1 is Go to Step 2 the not best treatment;Go to Stage 3

? Step 2

? q 2 If Ck−1,k−2 > 0 where Ck−1,k−2 =x ¯k−1 − x¯k−2 − tα,2n−2Sk−1,k−2 n

and Sk−1,k−2 is the pool variance between treatment k-1 and k-2 Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µk−2 > Ck−1,k−2 and k-1 Go to Step 3 is not the best treatment;Go to Stage 3

? . .

? Step k-1

? q 2 If Ck−1,1 > 0 where Ck−1,1 =x ¯k−1 − x¯1 − tα,2n−2Sk−1,1 n

and Sk−1,1 is the pool variance between treatment k-1 and 1 Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µ1 > Ck−1,1 and k-1 is Go to Step k not the best treatment; Go to Stage 3

? Step k - Assert treatment k-1 is the best

Figure 4.3: Procedure for Theorem 4.2 at Stage 2 Stage 3 (J=k-2) 89

. . .

Stage k (J=1) Step 1

? q 2 If C1,k > 0 where C1,k =x ¯1 − x¯k − tα,2n−2S1,k n and S1,k is the pool variance between treatment 1 and k

?Yes? No Assert treatment 1 is better than k; Assert µ1 − µk > C1,k and 1 is Go to Step 2 the not best treatment;Stop

? Step 2

? q 2 If C1,k−1 > 0 where C1,k =x ¯1 − x¯k−1 − tα,2n−2S1,k−1 n and S1,k−1 is the pool variance between treatment 1 and k-1 Yes No ? ? Assert treatment 1 is better than k-1; Assert µ1 − µk−1 > C1(k−1) and 1 Go to Step 3 is not the best treatment;Stop

? . .

? Step k-1

? q 2 If C1,2 > 0 where C1,k =x ¯1 − x¯2 − tα,2n−2S1,2 n and S1,2 is the pool variance between treatment 1 and 2 Yes No ? ? Assert treatment 1 is better than 2; Assert µ1 − µ2 > C12 and 1 is Go to Step k not the best treatment;Stop

? Step k - Assert treatment 1 is the best

Figure 4.4: Procedure for Theorem 4.2 from Stage 3 to Stage k 90

CHAPTER 5

APPLICATIONS IN A PROSTATE CANCER STUDY

5.1 Data Background

Clinical trial data of advanced prostate cancer conducted at M.D. Anderson Cancer Center from December 1998 to January 2006 in Wang et al. (2011) will be used to illustrate practicality of the theorems. Prostate cancer, which is a form of cancer that develops in tissues of the prostate, usually occurs in older men. Thall et al. (2007) indicates that every year there are around 40,000 men suffering from prostate cancer in the United States developing clinically noticeable metastases. That means the cancer cells spread from the prostate to other parts of human’s body. Meanwhile, prostate cancer is the second leading cause of death from cancer in men and is one of the most common cancers in developed countries. Moreover, the rate of prostate cancer is increasing in the developing countries. Thus, the effective therapy for curing prostate cancer is significantly meaningful. To detect prostate cancer, we could use symptoms, physical examination, prostate specific antigen (PSA), or biopsy. In this prostate cancer clinical trial study from M.D. Anderson 91 Cancer Center, prostate specific antigen is applied to indicate the status of prostate cancer. In the traditional dose response studies for clinical trials, patients are randomly assigned to treatments. However, due to the change of health status or toxicity reaction, a sequen- tially randomized trial, which is the selection of best combination of treatments instead of one treatment, is preferred. The dynamic treatment regimes are such a sequentially random- ized trial that consists of different treatments in multiple stages when involving of human subjects. In an advanced prostate cancer study, when a patient is treated with a chemother- apy, symptoms of responses from the chemotherapy such as the change of tumor and adverse reaction play an important role on the decision of the follow-up treatment. As Murphy et al. (2001) pointed out, the dynamic treatment regimes choose the effective treatment for particular patient based on the patient’s need such as medical history and body characters. In two-stage dynamic treatment regimes of advanced prostate cancer, there are four different combinations of treatments in chemotherapy, which are (1) cyclophosphamide, vincristine, and dexamethasone (CVD); (2) ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); (3) weekly paclitaxel, estramustine, and carboplatin (TEC); (4) paclitaxel,estramustine, and etoposide (TEE). The treatment in the follow-up (second) stage is based on the responses of patients on the first chemotherapy treatment. These responses include growing back of solid tumors and metastasize to other body sites and so on. After evaluating the patient’s response to the first treatment, if the response is significantly effective (the success in initial treatment such as the drop of more than 40% prostate specific antigen (PSA) in prostate cancer trails), a higher dose of the same regime will be given. On the other hand, if the dose response is insignificant or severe toxicity occurs, an alternative regimen is proposed. If there are two consecutive successes (the success in initial treatment followed by the second success of 80% drop in PSA) or a total of two failures on a patient, the patient is removed from the study. The flow chart for this prostate cancer study is presented in Figure 5.1. 
Actually, in this case study, the follow-up treatment of the regime is crucial, since the 92 time of treatment for patient is limited and it is not possible to have every treatment on each patient. Therefore, under this scenario, it is desirable to detect the discernible effect of regimen among all permissible regimens. Considering the prostate cancer trial example, the four different combination of treat-

2 ments (CVD, KA/VE, TEE, and TEC) in chemotherapy in two-stage become P4 = 4 × 3 = 12 different regimens constituted by the combinations of different chemotherapies corre- sponding to the order and switches of treatments. These 12 regimens are (CVD,KA/VE), (CVD,TEC), (CVD,TEE), (KA/VE,CVD), (KA/VE,TEC), (KA/VE,TEE), (TEC,CVD), (TEC,KA/VE), (TEC,TEE),(TEE,CVD), (TEE,KA/VE), (TEE,TEC). Notice that the or- der of treatment is a significant key in advanced prostate cancer treatment. Patients with positive responses with the regimen of (KA/VE,CVD) may have different treatment out- comes when the the regimen of (CVD, KA/VE) is applied. Because these are totally dif- ferent in chemotherapies, the efficacy of one regimen (KA/VE,CVD) may not be the same as the other regime (CVD,KA/VE). For example, in (CVD,KA/VE) procedure, as patient gets treatment CVD first, if the patient’s PSA does not drop more than 40%, the treatment KA/VE replaces of CVD. However, in (KA/VE, CVD) procedure, first, it treats the patient with KA/VE and then with CVD. That is, with the first treatment KA/VE, the PSA drops less than 40% and then treatment CVD is substituted. Therefore, it is necessary to identify the best treatment regimen among these 12 reg- imens. The comparison of the effects of the 12 treatment regimes requires the method of simultaneous inference in multiple comparisons. If just considering the family of comparisons consisting of 12 pairs of comparisons, the multiple comparisons with the best or pairwise multiple comparisons are probably useful. However, those methods cannot be applicable, since they do not involve multiple stages that there are different efficacy between the orders. For example, difference of treatment means from CVD to KA/VE in (CVD,KA/VE) may not be the same as the difference of treatment means from KA/VE to CVD in (KA/VE,CVD), because there are sequential effects in chemotherapy. 93 Due to the generalized feature of the Bonferroni correction, directly applying the Bonfer- roni adjustment to all the 12 treatment regimes yields insignificant conclusion on differences of treatment regimens, as shown in Wang et al. (2011). From previous chapters, there are conclusions that the power of new procedure is significantly higher than those of the Bonfer- roni procedure in all situations, since the multiplicity adjustment for Bonferroni procedure

k is 2 and for new procedure is k. In this case, the multiplicity adjustment for Bonferroni is 12 2 = 66. However, the multiplicity adjustment for new procedure is only 12. For the AL prostate cancer trial, there are two limitations on the observational data. One is the relatively small sample sizes in each regime, another is the conservativeness of simultaneous confidence intervals after multiplicity adjustments. Toward the first difficulty, the pseudo-samples from bootstrap method to handle the small sample size and adjust for the unbiasedness of the estimator are used in Wang et al. (2011). For the problem on conservativeness of simultaneous inference, we use partitioning principle to develop more ef- ficient simultaneous confidence sets to cast significant effects in the comparisons of treatment regimes. The measuring scales innovated by Wang et al. (2011) for efficacy and toxicity according to the characterization of patient status after each treatment in each of the multiple stages are binary scores, ordinal score, and log-survival time. The binary score assigns the value one to the patient when the two consecutive per-course responses are favorable, and zero otherwise. The ordinal score essentially the same as the binary score except the value of 0.5 is assigned to the patient when the corresponding therapy achieves one successful course, or two non-consecutive successful courses. The expert score is defined as the mean of the per- course scores while the patient was on a chemotherapy under investigation. In this section, to verify the new procedure without the influence from pseudo-samples by resampling, the pseudo-samples are not generated, although the sample size is small. Therefore, we will use the difference of prostate specific antigen between two consecutive treatments as response instead of the binary score, ordinal score, or expert score. 94 5.2 Main Results

In this section, we will present three results associated with the new procedures in previous Chapters. One is for simultaneous inference based on the knowledge of the treatments used in the chemotherapy in the AL (Androgen-Independent) prostate cancer trail. The other two are theoretical results which improves the conventional approach of all comparisons with the best in multiple comparisons. Based on drug pathology, it is desirable to evaluate the effect of the treatment start- ing with TEC and following with CVD (the regimen of (TEC, CVD)) compared with the treatment regimens starting with CVD (which consist of regimens (CVD, KA/VE), (CVD, TEE), and (CVD, TEC)), although the comparisons of the efficacy of all 12 regimens are of general interest. In what follows, we first describe a confidence set that takes care of the comparisons of a subset of populations in a fashion governed by prior drug information on the treatments.

5.2.1 Theorem Results

Theorem 5.1. Let C1, ..., Ck−1 be the (1 − α)100% confidence intervals for the endpoint

∗ differences µ − µi between one regimen (such as (TEC,CVD)) and other k-1 regimens (such

+ as (CVD,TEE), (CVD,KA/VE), (CVD,TEC)). If Ci ⊂ R for i = 1, 2, ..., k − 1, then, with confidence level 1 − α the specific regimen is the best among all the regimens associated with the index i = 1, 2, ..., k.

∗ Proof. Consider the following set of parameters for comparisons, ηi = µ −µi for i = 1, .., k−1

c with confidence intervals C1,C2,...,Ck−1. Denote Θi = (−∞, 0], Θi = (0, ∞). Also, let c ∗ c T c T T c T c Θk = Θ, Θk = ∅,Ck = (−∞, ∞), and Θi = Θ1 Θ2 ··· Θi−1 Θi for i = 2, ..., k − 1, k

∗ ∗ ∗ ∗ with Θ1 = Θ1, then Θ1, ··· , Θk−1, Θk−1 constitute a partition of the parameter space Θ. Sk T ∗ Let D(Y ) = i=1(Ci(Y ) Θi ), D is a 100(1 − α)% confidence set for θ. This is because ∗ T ∗ if θ ∈ Θi , then Pθ(θ ∈ D(Y )) = Pθ(θ ∈ Ci(Y ) Θi ) = Pθ(θ ∈ Ci(Y )) = 1 − α. 95 Now, starting from i = 1 and then screening up toward i = k in a step-by-step manner,

c c let M be the first integer satisfying Ci(Y ) ⊂ Θi for all i < M, and CM (Y ) * ΘM . For

+ example if Ci ∈ R for all i = 1, ..., k − 1, we have M = k. In this setting, the unionized confidence set can be decomposed as follows.

k [ ∗ D(Y ) = (Ci(Y ) ∩ Θi ) i=1 M−1 k [ ∗ [ [ ∗ = { (Ci(Y ) ∩ Θi )} { (Ci(Y ) ∩ Θi )} (5.1) i=1 i=M k [ ∗ = (Ci(Y ) ∩ Θi ) (5.2) i=M k c c c [ [ c c c c = (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) { (Θ1Θ2 ∩ ... ∩ ΘM ... Θi−1ΘiCi)}(5.3) i=M+1 c c c [ c c c ⊂ (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) (Θ1Θ2 ∩ ... ∩ ΘM ) (5.4)

c c c ⊂ Θ1Θ2 ∩ ... ∩ ΘM−1 (5.5)

The equation (5.1) follows by the decomposition of the union over two different index

+ T − sets. Equation (5.2) is due to the fact that for all i < M, Ci ⊂ Ri so Ci Ri = ∅. The

∗ substitution of all the definition of Θi leads to equation (5.3). Equation (5.4) follows from the c c c Sk c c c c fact that the set Θ1Θ2∩...∩ΘM contains the set of i=M+1(Θ1Θ2∩...∩ΘM ... Θi−1ΘiCi(Y )). Finally, equation (5.5) is due to

c c c c c c (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) ⊂ (Θ1Θ2 ∩ ... ∩ ΘM−1)

Now, noticing that $M$ is actually a random variable depending on the observations, summarizing the above discussion yields

$$P_\theta\left(\theta \in \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right) \geq P_\theta(\theta \in D(Y)) = 1 - \alpha. \tag{5.6}$$

Now, when $M = k$, on the basis of equation (5.6), we get

\begin{align*}
P\Big(Y: \text{asserting } \mu^* \geq \max_{j=1,\ldots,k-1} \mu_j\Big) &= P\left(Y: \text{asserting } \mu^* \geq \mu_j \text{ for all } j = 1, \ldots, k-1\right) \\
&\geq P\left(Y: \text{asserting } \mu^* > \mu_1, \ldots, \mu^* > \mu_{k-1}\right) \\
&= P\left(Y: \text{asserting } \theta \in \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{k-1}^c\right) \\
&\geq 1 - \alpha.
\end{align*}
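To make the screening step in the proof concrete, the following minimal R sketch walks up from $i = 1$ and stops at the first interval that is not contained in $R^+$; the vector lb of lower bounds is introduced only for illustration (here it holds the expert-score bounds reported in Section 5.2.2).

# Step-down screening of Theorem 5.1 on the lower bounds of the
# one-sided (1 - alpha) intervals for mu* - mu_i, i = 1, ..., k-1.
lb <- c(0.0017, 0.0117, 0.0779)
M <- which(lb <= 0)[1]   # first interval not contained in R+ = (0, Inf)
if (is.na(M)) {
  cat("All intervals lie in R+; assert the specified regimen is the best\n")
} else {
  cat("Screening stops at comparison", M, "and no assertion is made\n")
}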

Theorem 5.1 guarantees the overall coverage probability when making multiple comparisons for $k$ treatment regimens. By taking the drug pathology into consideration, the adjustment for multiplicity is exempted.

The first approach in extending Theorem 5.1 is to generalize the result to test the hypothesis of whether any pre-specified regimen is the best. In this prostate cancer study, we want to test whether the (TEC, CVD) regimen is the best. In this regard, we obtain Theorem 5.2.

Theorem 5.2. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the vector of mean effects of $k$ treatments associated with a set of observations $Y$, and let $D_{i,j}(Y)$ be the $(1-\alpha)100\%$ one-sided confidence set for the mean difference $\theta_i - \theta_j$. To test the hypothesis that a pre-specified treatment $j^*$ is the best among the $k$ treatments: if $D_{j^*,i} \subset R^+$ for all $i = 1, \ldots, j^*-1, j^*+1, \ldots, k$, then $j^*$ is the best treatment with the familywise error rate strongly controlled at level $\alpha$.

Proof. For any $j \in \{1, \ldots, k\}$, the set of parameters for comparisons becomes $\{\eta_i = \mu_j - \mu_i,\ i = 1, \ldots, j-1, j+1, \ldots, k\}$, which contains $k-1$ elements, with confidence intervals $D_{j,1}, \ldots, D_{j,j-1}, D_{j,j+1}, \ldots, D_{j,k}$ for the $k-1$ elements, respectively. Denote $\Theta_t = (-\infty, 0]$ and $\Theta_t^c = (0, \infty)$ for $t = 1, \ldots, k-1$, and let $D_t(Y) = D_{j,t}(Y)$ for $t = 1, \ldots, j-1$ and $D_t(Y) = D_{j,t+1}(Y)$ for $t = j, \ldots, k-1$. Also, let $\Theta_k = \Theta$, $\Theta_k^c = \emptyset$, $D_k = (-\infty, \infty)$, and $\Theta_t^* = \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t$ for $t = 2, \ldots, k-1, k$, with $\Theta_1^* = \Theta_1$. Similar to the proof of Theorem 5.1, $\Theta_1^*, \ldots, \Theta_{k-1}^*, \Theta_k^*$ constitute a partition of the parameter space $\Theta$.

Notice that for the given integer $j$, letting $D(Y) = \bigcup_{t=1}^{k} \left(D_t(Y) \cap \Theta_t^*\right)$, $D$ is a $100(1-\alpha)\%$ confidence set for $\theta$. This is because if the parameter vector $\theta \in \Theta_t^*$ for some $t$, then $P_\theta(\theta \in D(Y)) = P_\theta\left(\theta \in D_t(Y) \cap \Theta_t^*\right) = P_\theta(\theta \in D_t(Y)) \geq 1 - \alpha$.

Starting from $t = 1$ and screening up toward $t = k$ in a step-by-step manner, let $M$ be the first integer satisfying $D_t(Y) \subset \Theta_t^c$ for all $t < M$ and $D_M(Y) \not\subset \Theta_M^c$. For example, if $D_t \subset R^+$ for all $t = 1, \ldots, k-1$, we have $M = k$. In this setting, the unionized confidence set can be decomposed as follows:

\begin{align}
D(Y) &= \bigcup_{t=1}^{k} \left(D_t(Y) \cap \Theta_t^*\right) \nonumber \\
&= \Big\{\bigcup_{t=1}^{M-1} \left(D_t(Y) \cap \Theta_t^*\right)\Big\} \cup \Big\{\bigcup_{t=M}^{k} \left(D_t(Y) \cap \Theta_t^*\right)\Big\} \tag{5.7} \\
&= \bigcup_{t=M}^{k} \left(D_t(Y) \cap \Theta_t^*\right) \tag{5.8} \\
&= \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \cup \Big\{\bigcup_{t=M+1}^{k} \left(\Theta_1^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t \cap D_t\right)\Big\} \tag{5.9} \\
&\subset \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \cup \left(\Theta_1^c \cap \cdots \cap \Theta_M^c\right) \tag{5.10} \\
&\subset \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c. \tag{5.11}
\end{align}

Equation (5.7) follows from the decomposition of the union over two different index sets. Equation (5.8) is due to the fact that for all $t < M$, $D_t \subset \Theta_t^c = R^+$, so $D_t \cap \Theta_t = \emptyset$ and hence $D_t(Y) \cap \Theta_t^* = \emptyset$. The substitution of the definition of each $\Theta_t^*$ leads to equation (5.9). Equation (5.10) follows from the fact that the set $\Theta_1^c \cap \cdots \cap \Theta_M^c$ contains the set $\bigcup_{t=M+1}^{k} \left(\Theta_1^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t \cap D_t(Y)\right)$. Finally, equation (5.11) is due to

$$\left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \subset \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right).$$

Now, similar to the proof of Theorem 5.1, noticing that $M$ is essentially a random variable depending on the observations, we have

$$P_\theta\left(\theta \in \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right) \geq P_\theta(\theta \in D(Y)) = 1 - \alpha. \tag{5.12}$$

Note that when $M = k$, equation (5.12) leads to

\begin{align*}
P\Big(Y: \text{asserting } \mu_j \geq \max_{i=1,\ldots,k} \mu_i\Big) &= P\left(Y: \text{asserting } \mu_j \geq \mu_i \text{ for all } i = 1, \ldots, k\right) \\
&\geq P\left(Y: \text{asserting } \mu_j \geq \mu_1, \ldots, \mu_j \geq \mu_{j-1}, \mu_j \geq \mu_{j+1}, \ldots, \mu_j \geq \mu_k\right) \\
&= P\left(Y: \text{asserting } \theta \in \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{k-1}^c\right) \\
&\geq 1 - \alpha.
\end{align*}

Notice that the selection of the best treatment regimen is tantamount to multiple comparisons with the best, and the construction of simultaneous confidence intervals is equivalent to the problem of all-pairwise multiple comparisons, in which the studentized-range test is normally applied. However, Theorems 5.1 and 5.2 are grounded on the assumption that either prior information or a specific conjecture on the best regimen is available. When such information is not available, the screening for the optimal regimen becomes a challenging task. The number of pairs is $\binom{k}{2}$ for all-pairwise comparisons. Thus, the multiplicity adjustment for the comparison of so many hypotheses makes conventional approaches too conservative. Theorem 5.3 unveils an approach that controls the familywise error rate and avoids the conservativeness of adjusting for all $\binom{k}{2}$ pairs of hypotheses. By selecting an appropriate comparison strategy, we are able to attain the overall coverage of $1-\alpha$ while only adjusting the pairwise comparisons to the level $1-\alpha/k$.

Theorem 5.3. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the vector of mean effects of $k$ treatments associated with a set of observations $Y$, and let $D_{i,j}(Y)$ be the $(1-\alpha/k)100\%$ one-sided confidence set for the mean difference $\theta_i - \theta_j$. To seek the best treatment among the $k$ treatments: if there exists a treatment $j^*$ such that $D_{j^*,i} \subset R^+$ for all $i = 1, \ldots, j^*-1, j^*+1, \ldots, k$, then $j^*$ is the best treatment in terms of the endpoint $Y$ with confidence level $1-\alpha$.

Notice that when the step-down searching algorithm is applied for a single integer $j$, the resulting simultaneous confidence set has level $1-\alpha$. Now, if each one of the integers in the set $\{1, \ldots, k\}$ is tested and the best treatment regimen is sought, an adjustment for multiplicity is needed to maintain the overall significance level at $1-\alpha$. The application of the Bonferroni adjustment leads to the validity of Theorem 5.3.

Although the selection procedures for Theorems 5.1 and 5.2 are self-evident, the selecting process for Theorem 5.3 is relatively more complicated. For ease of application, we describe a selection procedure based on Theorem 5.3 as follows. The procedure examines each one of the treatment regimens in turn as a control group. First, examine the efficacy differences between the first treatment regimen and each one of the remaining regimens. If the first regimen can be claimed the best, the screening stops; if it cannot, move on to examine the efficacy differences between the second regimen and each one of the remaining regimens, and so on. In this way, the procedure screens all the treatment regimens that may serve as the optimal one. Since each group of comparisons has confidence level $1-\alpha/k$, by the Bonferroni adjustment the overall coverage probability is $1-\alpha$.
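A minimal R sketch of this screening is given below; it is a simplified version of the appendix programs, and the function lower.bound is a hypothetical placeholder for any routine returning the lower limit of a one-sided $(1-\alpha/k)100\%$ confidence interval, for example one extracted from wilcox.exact.

# Screening of Theorem 5.3 at the Bonferroni-adjusted level alpha/k.
# x: a matrix with one column of responses per treatment regimen;
# lower.bound(a, b, level): placeholder returning the lower limit of a
# one-sided confidence interval for the effect difference of a over b.
select.best <- function(x, alpha, lower.bound) {
  k <- ncol(x)
  for (j in 1:k) {
    claimed <- TRUE
    for (i in setdiff(1:k, j)) {
      if (lower.bound(x[, j], x[, i], 1 - alpha/k) <= 0) {
        claimed <- FALSE
        break              # regimen j cannot be claimed the best
      }
    }
    if (claimed) return(j) # j is asserted best at overall level 1 - alpha
  }
  NA                       # no regimen can be asserted the best
}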

5.2.2 Analysis Results

Based on the pathology of the drugs, when the experimenter is interested in the expert-score difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD, the 95% one-sided confidence intervals for $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(0.0017, \infty)$, $(0.0117, \infty)$, and $(0.0779, \infty)$, respectively. They are all subsets of $R^+ = (0, \infty)$. Thus, by Theorem 5.1, we assert that the regimen (TEC,CVD) is superior to any regimen starting with the drug CVD (namely the regimens (CVD, KA/VE), (CVD, TEC), (CVD, TEE)) at the 95% confidence level.

When the binary score and the ordinal score are examined for the efficacy significance of the regimen (TEC,CVD) over the regimens starting with the drug CVD, the one-sided confidence intervals for $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(-0.2799, \infty)$, $(0.0089, \infty)$, and $(0.1076, \infty)$, respectively, for the binary-score difference, and $(-0.0989, \infty)$, $(0.0065, \infty)$, and $(0.1496, \infty)$, respectively, for the ordinal-score difference. In this case, we are unable to claim any significant efficacy difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD. This conclusion coincides with the result of the analysis in Wang et al. (2011) regarding the comparisons of the three endpoints: binary score, ordinal score, and expert score.

Theorem 5.3 is more powerful than the approach of all pairwise comparisons because it reduces the number of comparisons from $\binom{12}{2} = 66$ to $k = 12$. However, due to the variability of the data, the bootstrap-based confidence intervals are unable to identify the regimen with the best efficacy. More powerful methodology grounded on exact inference is under investigation.

Next, for illustrative purposes only, the new procedures under normality and with the Wilcoxon Mann-Whitney test are applied to this prostate cancer study by using the prostate cancer index only. The response value is then the difference in prostate specific antigen (PSA) between two consecutive treatments. From a SAS 9.2 program, the frequency table is presented in Figure 5.2. As the frequency table shows, the sample size for each treatment regimen is very small. Since the response value is not discrete, the new procedure with Fisher's exact test is not considered in this example. We consider only the treatment regimen TEC followed by CVD and the three treatment regimens starting with CVD and followed by KA/VE, TEE, and TEC, respectively. That is, the four treatment regimens here are (TEC,CVD), (CVD,KA/VE), (CVD,TEE), and (CVD,TEC).

For illustrative purposes only, we first use the new procedure under normality to analyze this prostate cancer study. Under the significance level 0.18, the 82% confidence intervals for the mean differences $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(10.35561, \infty)$, $(2.174858, \infty)$, and $(-11.90655, \infty)$, respectively. Not all of the lower endpoints are greater than zero, so we cannot assert that the regimen (TEC,CVD) is superior to the other three treatments at significance level $\alpha = 0.18$.
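For reference, these intervals have the form $(\ell_i, \infty)$, where, matching the appendix program (which uses the standard deviation $s$ of the combined sample in place of a pooled estimate), the lower bound is the one-sided two-sample $t$ bound

$$\ell_i = \bar{y}^* - \bar{y}_i - t_{1-\alpha,\, n^*+n_i-2}\; s \sqrt{\frac{1}{n^*} + \frac{1}{n_i}},$$

and the regimen (TEC,CVD) is asserted superior to regimen $i$ only when $\ell_i > 0$.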

As shown in the QQ-plots for the PSA change under these four treatment regimens in Figure 5.3, the normality assumption is barely satisfied. Thus, the new procedure under normality is not appropriate here, and the new procedure with the Wilcoxon Mann-Whitney test is preferred in this example.

Since the sample size is small for every treatment regimen, if we choose a small significance level such as 0.05, we cannot reach any conclusion. The 95% confidence intervals for the median differences $M^* - M_i$, $i = 1, 2, 3$, are $(-12.9, \infty)$, $(-19.1, \infty)$, and $(-19.4, \infty)$, respectively, for the PSA difference. In this case, we are unable to claim any significant efficacy difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD at the significance level 0.05. To see at what confidence level the regimen (TEC,CVD) can be claimed the best, a larger significance level is chosen. Here, for illustration purposes, we consider the significance level 0.18, that is, the confidence level 0.82.

The 82% confidence intervals for the median differences $M^* - M_i$, $i = 1, 2, 3$, are $(8, \infty)$, $(0.7, \infty)$, and $(1, \infty)$, respectively. The cutoff values 8, 0.7, and 1 are all greater than 0; in other words, the confidence intervals are all subsets of $R^+ = (0, \infty)$. Thus, by Theorem 5.1, we conclude that the regimen (TEC,CVD) is better than the other three regimens (CVD,KA/VE), (CVD,TEC), and (CVD,TEE) at the 82% confidence level.

Specified to this situation, Theorem 5.3 is more powerful than the approach of all pairwise comparisons because it reduces the number of comparisons from $\binom{4}{2} = 6$ to 4. However, due to the variability of the data, which is shown in Figure 5.4, the new procedure with the Wilcoxon Mann-Whitney test is not able to identify the regimen with the best efficacy by screening. In Figure 5.4, trt1 to trt12 stand for (CVD,KA/VE), (CVD,TEC), (CVD,TEE), (KA/VE,CVD), (KA/VE,TEC), (KA/VE,TEE), (TEC,CVD), (TEC,KA/VE), (TEC,TEE), (TEE,CVD), (TEE,KA/VE), and (TEE,TEC), respectively. Moreover, the sample size is small here, which is another reason why we cannot distinguish the best efficacy by screening as in Theorem 5.3.

Figure 5.1: Flow Chart for the Prostate Cancer Study

[Flow chart: a subject receives Treatment A; the per-course response is classified by whether the PSA drops by more than 40% or by at most 40%; one branch leads to more doses and the other to a switch to Treatment B, with the subsequent responses classified by PSA drops of 40% and 80%. These transitions mark the primary interests in the prostate cancer study.]

Figure 5.2: Frequency Table for the Prostate Cancer Data

Output of the SAS FREQ procedure (counts of tr1 by tr2; percentages omitted):

tr1 \ tr2    CVD   KA/VE   TEC   TEE   Total
CVD            0      10     7     7      24
KA/VE          7       0     8     6      21
TEC            5       4     0     7      16
TEE            4       4     6     0      14
Total         16      18    21    20      75

Figure 5.3: QQ-plot for Treatments

[Four panels: normal Q-Q plots of the PSA change for the regimens (CVD,KA/VE), (CVD,TEC), (CVD,TEE), and (TEC,CVD), plotting theoretical quantiles against sample quantiles.]

Figure 5.4: Median and Standard Deviation for 12 Treatments

Output of the SAS MEANS procedure (analysis variable: effect, the PSA difference):

trt      N Obs    Median          Std Dev
trt1       10   -36.0000000    101.8249920
trt2        7   -70.0000000    158.1141194
trt3        7   -20.5000000     40.9436140
trt4        7     1.0000000     27.8955450
trt5        8    -0.5000000     23.5917510
trt6        6     7.9000000     38.5980181
trt7        5     0.4000000     49.5840700
trt8        4     9.5000000     22.3029707
trt9        7     0.6000000     95.7891409
trt10       4    -4.1500000    115.1850215
trt11       4     1.5500000     18.6086897
trt12       6    -4.0000000    343.5024017

BIBLIOGRAPHY

[1] Agresti, A. (2012). Categorical data analysis. John Wiley & Sons.

[2] Baade, P. D., Youlden, D. R. and Krnjacki, L. J. (2009). International epidemiology of prostate cancer: geographical distribution and secular trends. Molecular Nutrition & Food Research, 53(2), 171-184.

[3] Chen, J.T. (2008a). Inference on the minimum effective dose using binary data. Communications in Statistics - Theory and Methods, 37, 2124-2135.

[4] Chen, J. T. (2008b). A two-stage stepwise estimation procedure. Biometrics, 64(2), 406-412.

[5] Dohle, C., Püllen, J., Nakaten, A., et al. (2009). Mirror therapy promotes recovery from severe hemiparesis: a randomized controlled trial. Neurorehabilitation and Neural Repair, 23(3), 209-217.

[6] Cohen, A., Sackrowitz, H. B. (2005). Decision theory results for one-sided multiple comparison procedures. The Annals of Statistics, 33(1), 126-144.

[7] Dmitrienko, A., Tamhane, A. and Bretz, F. (2009). Multiple testing problems in pharmaceutical statistics. Chapman and Hall/CRC Press, Boca Raton, Florida.

[8] Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50(272), 1096-1121.

[9] Edwards, D. G., Hsu, J. C. (1983). Multiple comparisons with the best treatment. Journal of the American Statistical Association, 78(384), 965-971.

[10] Fay, M. P. (2010). Confidence intervals that match Fisher's exact or Blaker's exact tests. Biostatistics, 11(2), 373-374.

[11] Finner, H., Strassburger, K. (2002). The partitioning principle: a powerful tool in multiple decision theory. The Annals of Statistics, 30, 1194-1213.

[12] Fujikoshi, Y., Ulyanov, V.V. and Shimizu, R. (2010). Multivariate statistics. John Wiley & Sons.

[13] Gabriel, K. R. (1978). A simple method of multiple comparisons of means. Journal of the American Statistical Association, 73(364), 724-729.

[14] Genizi, A., Hochberg, Y. (1978). On improved extensions of the T-method of multiple comparisons for unbalanced designs. Journal of the American Statistical Association, 73(364), 879-884.

[15] Glantz, S.A. (2001). Primer of biostatistics. McGraw-Hill/Appleton & Lange.

[16] Hollander, M., Wolfe, D.A. (1999). Nonparametric statistical methods. John Wiley & Sons, New York.

[17] Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6 (2), 65-70.

[18] Hothorn, L. A. (2007). How to deal with multiple treatment or dose groups in randomized clinical trials. Fundamental & Clinical Pharmacology, 21(2), 137-154.

[19] Hsu, J.C., Berger, R.L. (1999). Stepwise confidence intervals without multiplicity adjustment for dose-response and toxicity studies. Journal of the American Statistical Association, 94, 468-482.

[20] Hsu, J.C. (1996). Multiple comparisons: Theory and methods. Chapman and Hall/CRC.

[21] Klockars, J.A., Sax, G. (1986). Multiple comparisons. SAGE.

[22] Toothaker, L.E. (1992). Multiple comparison procedures. SAGE.

[23] Lehmann, E. L. (1963a). Robust estimation in analysis of variance. The Annals of Mathematical Statistics, 34(3), 957-966.

[24] Lehmann, E. L. (1963b). Asymptotically nonparametric inference: An alternative approach to linear models. The Annals of Mathematical Statistics, 34, 1494-1506.

[25] Lehmann, E. L. (1963c). Nonparametric confidence intervals for a shift parameter. The Annals of Mathematical Statistics, 34, 1507-1512.

[26] Lin, Y., Wang, L., Chen, J.T. (2013). A simultaneous approach detecting efficacy of treatment regimes. Submitted.

[27] Mann, H. B., Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50-60.

[28] Marcus, R., Peritz, E., and Gabriel, K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.

[29] Moses, L.E. (1952). A two-sample test. Psychometrika, 17, 234-247.

[30] Parra, M., Rodriguez-Loaiza, P., Namur, S. (2003). Application of the repeated measures (split-plot) design to the development of processes for drug substances. Quality Engineering, 16(2), 321-328.

[31] Roy, S. N., Bose, R. C. (1953). Simultaneous confidence interval estimation. The Annals of Mathematical Statistics, 24(4), 513-536.

[32] Shine, R., Madsen, T. R., Elphick, M. J. and Harlow, P. S. (1997). The influence of nest temperatures and maternal brooding on hatchling phenotypes in water pythons. Ecology, 78(6), 1713-1721.

[33] Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance com- ponents. Biometrics Bulletin, 2(6), 110-114.

[34] Sawilowsky, S. (2002). Fermat, Schubert, Einstein, and Behrens-Fisher: The probable difference between two means when $\sigma_1^2 \neq \sigma_2^2$. Journal of Modern Applied Statistical Methods, 1(2), 461-472.

[35] Stefansson, G., Kim, W.C. and Hsu, J.C. (1988). On confidence sets in multiple comparisons. Statistical Decision Theory and Related Topics IV, 2, 89-104.

[36] Stoline, M. R., Ury, H. K. (1979). Tables of the studentized maximum modulus distri- bution and an application to multiple comparisons among means. Technometrics, 21(1), 87-93.

[37] Tamhane, A. C., Logan, B. R. (2002). Multiple test procedures for identifying the minimum effective and maximum safe doses of a drug. Journal of the American Statistical Association, 97(457), 293-301.

[38] Thall, P. F., Logothetis, C., Pagliaro, L. C., Wen, S., Brown, M. A., Williams, D. and Millikan, R. E. (2007). Adaptive therapy for androgen-independent prostate cancer: A randomized selection trial of four regimens. Journal of the National Cancer Institute, 99(21), 1613-1622.

[39] Tukey, J.W.(1949). The simplest signed-rank tests. Memo Rept. 17, Statistical Research Group, Princeton University.

[40] Ury, H. K. (1976). Comparison of four procedures for multiple comparisons among means (pairwise contrasts) for arbitrary sample sizes. Technometrics, 18(1), 89-97.

[41] Wang, L., Rotnitzky, A., Lin, X., Millikan, R., Thall, P. (2011). Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. Journal of the American Statistical Association, 107(498), 493-508.

[42] Welch, B.L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350-362.

[43] Welch, B. L. (1947). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34, 28-35.

APPENDIX: SELECTED R AND SAS PROGRAMS

.1 Simulation for the Procedures with Fisher's Exact Test

.1.1 Pre-specifying the Best Treatment

library(exact2x2)

#####################################
## Function assuming the pre-specified
## best treatment is the first one
#####################################
terror5 <- function(n, p, alpha){
  bt <- rep(NA, asim)
  for (k in 1:asim) {
    lp <- length(p)
    x <- rbinom(lp, n, p)
    y <- n - x
    data <- cbind(x, y)  # combine as contingency tables
    rank <- seq(1, lp)   # control the subscripts on j
    M <- rep(NA, lp - 1)
    rank2 <- rank[-1]
    for (j in rank2){
      ### If the odds ratio is greater than 1, set M[j-1] = 0 and continue;
      ### otherwise stop and set M[j-1] = 1.
      if (fisher.exact(data[c(1, j), ], alternative = "greater",
                       conf.level = 1 - alpha)$estimate[[1]] > 1)
        {M[j-1] = 0; next} else {M[j-1] = 1; break}
    }
    ### If treatment one is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[1:(lp-1)] == 0)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################################
n = 90        # sample size
asim = 10000  # number of iterations
p <- c(.53, .1, .2, .3, .4)  # success probabilities
alpha <- 0.05 # the alpha level
terror5(n, p, alpha)

###################################
n = 95   # sample size (asim, p, alpha as above)
terror5(n, p, alpha)

###################################
n = 100  # sample size
terror5(n, p, alpha)

###################################
n = 105  # sample size
terror5(n, p, alpha)

####################################
# Best treatment in a different position
####################################
terror <- function(trt, n, p, alpha){
  bt <- rep(NA, asim)
  for (k in 1:asim)
  {
    lp <- length(p)
    x <- rbinom(lp, n, p)
    y <- n - x
    data <- cbind(x, y)
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If the odds ratio is greater than 1, set M[j,trt] = 1 and continue;
      ### otherwise stop and set M[j,trt] = 3.
      if (fisher.exact(data[c(trt, j), ], alternative = "greater",
                       conf.level = 1 - alpha)$estimate[[1]] > 1)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################################
trt <- 1       # best treatment is 1st
n <- 90        # sample size
asim <- 10000  # number of iterations
p <- c(.53, .2, .3, .4)  # success probabilities
alpha <- 0.05  # the alpha level
terror(trt, n, p, alpha)

###################################
trt <- 2       # best treatment is 2nd
p <- c(.2, .53, .3, .4)
terror(trt, n, p, alpha)

###################################
trt <- 3       # best treatment is 3rd
p <- c(.2, .3, .53, .4)
terror(trt, n, p, alpha)

###################################
trt <- 4       # best treatment is 4th
p <- c(.2, .3, .4, .53)
terror(trt, n, p, alpha)

.1.2 Selecting the Best Treatment When It Is Unknown

library(exact2x2)

##################################
## When the best treatment is unknown,
## select it with the new procedure
##################################
terror2 <- function(n, p, alpha){
  lp <- length(p)
  ind <- matrix(0, lp, asim)  # records which treatment is selected per iteration
  for (k in 1:asim)
  {
    x <- rbinom(lp, n, p)  # generate random data from the binomial distribution
    y <- n - x
    data <- cbind(x, y)
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If the odds ratio of the Fisher exact test is greater than 1,
        ### set M[j,i] = 1 and continue; otherwise set M[j,i] = 3 and stop:
        ### treatment i is not the best one (the null hypothesis is not rejected).
        if (fisher.exact(data[c(i, j), ], alternative = "greater",
                         conf.level = 1 - alpha/lp)$estimate[[1]] > 1)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and break;
      ### otherwise set ind[i,k] = 0 and continue to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  ### Get the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

############ linear shape ###################
n = 100       # sample size
asim = 10000  # number of iterations
alpha <- 0.05 # the alpha level
p <- c(.53, .4, .35, .3, .2, .1)  # success probabilities
terror2(n, p, alpha)

############ U shape ####################
p <- c(.7, .45, .3, .12, .32, .58)
terror2(n, p, alpha)

############ inverted U shape ################
p <- c(.1, .2, .3, .51, .63, .4)
terror2(n, p, alpha)

########### logarithm ########################
p <- c(0, 0.301, 0.477, 0.602, 0.7, 0.81)
terror2(n, p, alpha)

.2 Simulation for the Procedures with Wilcoxon Mann-Whitney Test

.2.1 Pre-specifying the Best Treatment

library(exactRankTests)
library(MASS)

###########################################
### First treatment is the pre-specified one
###########################################
coverage1 <- function(n, mu, alpha){
  lp <- length(mu)    # number of treatments
  sigma <- diag(lp)   # variance matrix for the data
  rank <- seq(1, lp)  # control the subscripts on j
  rank2 <- rank[-1]
  bt <- rep(NA, asim)
  for (k in 1:asim){
    x1 <- mvrnorm(n, mu, sigma)
    M <- rep(NA, lp - 1)
    for (j in rank2){
      ### If treatment one is better than j, set M[j-1] = 0 and continue
      ### to the next step; otherwise set M[j-1] = 1 and break.
      if (wilcox.exact(x1[, 1], x1[, j], alternative = "greater",
                       conf.int = TRUE, conf.level = 1 - alpha)$conf.int[[1]] > 0)
        {M[j-1] = 0; next} else {M[j-1] = 1; break}
    }
    ### If treatment one is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[1:(lp-1)] == 0)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################
## sample size
###################
asim <- 10000  # number of iterations
mu <- c(3.9, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage1(n, mu, alpha)

###################
n <- 32
coverage1(n, mu, alpha)

###################
n <- 34
coverage1(n, mu, alpha)

###################
n <- 35
coverage1(n, mu, alpha)

########################################
### The pre-specified one in a different location
########################################
library(exactRankTests)
library(MASS)

coveraget <- function(n, mu, trt, alpha){
  lp <- length(mu)
  bt <- rep(NA, asim)
  for (k in 1:asim)
  {
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    sigma <- diag(lp)
    x1 <- mvrnorm(n, mu, sigma)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If treatment trt is better than j, set M[j,trt] = 1 and continue
      ### to the next step; otherwise set M[j,trt] = 3 and break.
      if (wilcox.exact(x1[, trt], x1[, j], alternative = "greater",
                       conf.int = TRUE, conf.level = 1 - alpha)$conf.int[[1]] > 0)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################
asim <- 10000  # number of iterations
mu <- c(3, 3.9, 2, 1)
trt <- 2
alpha <- 0.05
n <- 30
coveraget(n, mu, trt, alpha)

###################
mu <- c(3, 2, 3.9, 1)
trt <- 3
coveraget(n, mu, trt, alpha)

###################
mu <- c(3, 2, 1, 3.9)
trt <- 4
coveraget(n, mu, trt, alpha)

.2.2 Selecting the Best Treatment When It Is Unknown

library(exactRankTests)
library(MASS)

#######################################
### Function with unknown best treatment
#######################################
coverage <- function(n, mu, v, alpha){
  lp <- length(mu)
  ind <- matrix(0, lp, asim)
  for (k in 1:asim)
  {
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    sigma <- diag(lp) * v
    x1 <- mvrnorm(n, mu, sigma)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If treatment i is better than j, set M[j,i] = 1 and continue
        ### to the next step; otherwise stop and set M[j,i] = 3.
        if (wilcox.exact(x1[, i], x1[, j], alternative = "greater",
                         conf.int = TRUE, conf.level = 1 - alpha/lp)$conf.int[[1]] > 0)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and stop;
      ### otherwise set ind[i,k] = 0 and continue to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  # the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

################### linear shape ####
asim <- 10000  # number of iterations
v <- 1         # variance
mu <- c(5.1, 4, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage(n, mu, v, alpha)

################### U-shape ########
mu <- c(7.1, 3.5, 2, 4, 6)
coverage(n, mu, v, alpha)

############## inverted U-shape ########
mu <- c(2, 3.5, 7.1, 4, 6)
coverage(n, mu, v, alpha)

########### logarithmic #################
mu <- c(1.10, 2.05, 2.64, 3.1, 4.2)
coverage(n, mu, v, alpha)

.3 Simulation for the Procedures with Normality

.3.1 Pre-specifying the Best Treatment

library(exactRankTests)
library(MASS)

coverage_n <- function(n, mu, trt, v, alpha){
  bt <- rep(NA, asim)
  lp <- length(mu)       # number of treatments
  M <- diag(lp)
  sigma <- diag(lp) * v  # variance matrix for the data
  for (k in 1:asim){
    rank <- seq(1, lp)   # control the subscripts on j
    x1 <- mvrnorm(n, mu, sigma)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If treatment trt is better than j, set M[j,trt] = 1 and continue
      ### to the next step; otherwise break and set M[j,trt] = 3.
      if (mean(x1[, trt]) - mean(x1[, j])
          - qt(1 - alpha, 2*n - 2) * v * sqrt(2/n) > 0)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

#####################
# Different positions
#####################
asim <- 10000  # number of iterations
mu <- c(5.9, 5, 4.9, 3, 2)
trt <- 1
alpha <- 0.05
n <- 30
v <- 1
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 5.9, 4.9, 3, 2); trt <- 2
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 5.9, 3, 2); trt <- 3
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 3, 5.9, 2); trt <- 4
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 3, 2, 5.9); trt <- 5
coverage_n(n, mu, trt, v, alpha)

############################
## Sample size
############################
mu <- c(5.9, 5, 4.9, 3, 2); trt <- 1
n <- 32
coverage_n(n, mu, trt, v, alpha)

############################
n <- 34
coverage_n(n, mu, trt, v, alpha)

############################
n <- 35
coverage_n(n, mu, trt, v, alpha)

#################################
### Different variance
#################################
n <- 85;  v <- 2
coverage_n(n, mu, trt, v, alpha)

################################
n <- 250; v <- 4
coverage_n(n, mu, trt, v, alpha)

################################
n <- 800; v <- 8
coverage_n(n, mu, trt, v, alpha)

.3.2 Selecting the Best Treatment When It Is Unknown

library(MASS)

coverage_n2 <- function(n, mu, v, alpha){
  lp <- length(mu)  # number of treatments
  ind <- matrix(0, lp, asim)
  for (k in 1:asim){
    M <- diag(lp)
    sigma <- diag(lp) * v  # variance matrix for the data
    rank <- seq(1, lp)     # control the subscripts on j
    x1 <- mvrnorm(n, mu, sigma)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If treatment i is better than j, set M[j,i] = 1 and continue
        ### to the next step; otherwise break and set M[j,i] = 3.
        if (mean(x1[, i]) - mean(x1[, j])
            - qt(1 - alpha/lp, 2*n - 2) * v * sqrt(2/n) > 0)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and stop;
      ### otherwise set ind[i,k] = 0 and go to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  # the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

###### linear shape ####
asim <- 10000  # number of iterations
v <- 1         # variance
mu <- c(5.1, 4, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage_n2(n, mu, v, alpha)

###### U-shape ########
mu <- c(7.05, 3.51, 2.78, 4.23, 6.00)
coverage_n2(n, mu, v, alpha)

##### inverted U-shape ######
mu <- c(2.78, 3.51, 7.05, 4.23, 6.00)
coverage_n2(n, mu, v, alpha)

#### logarithmic ###########
mu <- c(1.10, 2.05, 2.64, 3.1, 4.15)
coverage_n2(n, mu, v, alpha)

.4 Applications in a Prostate Cancer Study

.4.1 Using SAS to Process the Original Dataset

***************************************************
 Import original data
***************************************************;
PROC IMPORT OUT= WORK.cancer
     DATAFILE= "e:\prostate cancer original data.txt"
     DBMS=TAB REPLACE;
     GETNAMES=YES;
     DATAROW=2;
RUN;

*********************************************************
 KEEP DATA WITH TWO DIFFERENT TREATMENTS IN ONE REGIMEN
*********************************************************;
data pc (keep=binaryscore tr1 tr2 PSA08 PSA16);
  set cancer;
  if tr2 ^='';
run;

********************************************************
 GET FREQ TABLE IN PDF FILE
********************************************************;
ods pdf;
proc freq data=pc;
  table tr1*tr2;
run;
ods pdf close;

********************************************************
 ASSIGN CATEGORY
********************************************************;
data pc_trt(keep = trt effect);
  length trt $5;
  set pc;
  /* if (PSA16-PSA08)/PSA08 >= .8 then delete; */
  if upcase(tr1)='CVD' and upcase(tr2)='KA/VE' then trt='trt1';
  if upcase(tr1)='CVD' and upcase(tr2)='TEC' then trt='trt2';
  if upcase(tr1)='CVD' and upcase(tr2)='TEE' then trt='trt3';
  if upcase(tr1)='KA/VE' and upcase(tr2)='CVD' then trt='trt4';
  if upcase(tr1)='KA/VE' and upcase(tr2)='TEC' then trt='trt5';
  if upcase(tr1)='KA/VE' and upcase(tr2)='TEE' then trt='trt6';
  if upcase(tr1)='TEC' and upcase(tr2)='CVD' then trt='trt7';
  if upcase(tr1)='TEC' and upcase(tr2)='KA/VE' then trt='trt8';
  if upcase(tr1)='TEC' and upcase(tr2)='TEE' then trt='trt9';
  if upcase(tr1)='TEE' and upcase(tr2)='CVD' then trt='trt10';
  if upcase(tr1)='TEE' and upcase(tr2)='KA/VE' then trt='trt11';
  if upcase(tr1)='TEE' and upcase(tr2)='TEC' then trt='trt12';
  effect = PSA16-PSA08;
proc sort data=pc_trt;
  by trt;
run;

*********************************************************
 QQPLOT FOR TREATMENTS (THE R CODE PRODUCES THE SAME PLOT)
*********************************************************;
ods pdf file='e:\qqplot.pdf';
proc univariate data=pc_trt noprint;
  class trt;
  qqplot effect;
run;
ods pdf close;

********************************************************
 MEDIAN AND STD FOR 12 TREATMENTS
********************************************************;
ods pdf file='e:\std.pdf';
proc means data=pc_trt median std;
  class trt;
run;
ods pdf close;

*******************************************************
 EXPORT DATA FOR FUTURE USE (R CODE)
*******************************************************;
proc export data=pc_trt outfile='e:\trt1.csv'
  dbms=csv replace;
run;

.4.2 Using the New Procedure with Wilcoxon Mann-Whitney Test

########################
# read data
########################
data <- read.csv("e:/trt1.csv", h=T)
x1 = data[which(data$trt=='trt1'),]
x2 = data[which(data$trt=='trt2'),]
x3 = data[which(data$trt=='trt3'),]
x4 = data[which(data$trt=='trt4'),]
x5 = data[which(data$trt=='trt5'),]
x6 = data[which(data$trt=='trt6'),]
x7 = data[which(data$trt=='trt7'),]
x8 = data[which(data$trt=='trt8'),]
x9 = data[which(data$trt=='trt9'),]
x10 = data[which(data$trt=='trt10'),]
x11 = data[which(data$trt=='trt11'),]
x12 = data[which(data$trt=='trt12'),]

#######################################
## Function to combine vectors of
## different lengths into a matrix
#######################################
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

#########################################
## Combine the 4 treatments
#########################################
x <- cbind.fill(x1[,2], x2[,2], x3[,2], x7[,2])

##########################################
## Function that checks, at level alpha, whether
## treatment 'trt' can be selected as the best
##########################################
library(exactRankTests)
coverage <- function(alpha, trt){
  lp <- length(x[1,])
  rank <- seq(1, lp)  # control the subscripts on j
  M <- diag(lp)
  rank2 <- rank[-trt]
  y1 <- x[, trt]
  z1 <- y1[!is.na(y1)]
  for (j in rank2){
    y2 <- x[, j]
    z2 <- y2[!is.na(y2)]
    #### M[j,trt] = 3 means treatment trt is not the best one
    #### (the null hypothesis is not rejected)
    if (wilcox.exact(z1, z2, alternative = "greater", conf.int=T,
                     conf.level = 1-alpha)$conf.int[[1]] > 0)
      {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
  }
  #################################################
  ### Print whether treatment trt is the best at level alpha
  #################################################
  if (all(M[, trt] == 1)) {print(c(trt, "is the best treatment"))}
  else {print(c(trt, "is not the best at", alpha))}
}

#########
## alpha
#########
alpha <- .18
trt <- 4
coverage(alpha, trt)

################################
### QQ-PLOT
################################
par(mfrow=c(2,2))
qqnorm(x1[,2], main = "Normal Q-Q Plot for (CVD,KA/VE)")
qqnorm(x2[,2], main = "Normal Q-Q Plot for (CVD,TEC)")
qqnorm(x3[,2], main = "Normal Q-Q Plot for (CVD,TEE)")
qqnorm(x7[,2], main = "Normal Q-Q Plot for (TEC,CVD)")

#####################################
## Confidence intervals of 95%
alpha <- 0.05
wilcox.exact(x7[,2], x1[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x2[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x3[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)

#####################################
## Confidence intervals of 82%
alpha <- 0.18
wilcox.exact(x7[,2], x1[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x2[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x3[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)

.4.3 Using the New Procedure under Normality

########################
# read data
########################
data <- read.csv("e:/trt1.csv", h=T)
x1 = data[which(data$trt=='trt1'),]
x2 = data[which(data$trt=='trt2'),]
x3 = data[which(data$trt=='trt3'),]
x4 = data[which(data$trt=='trt4'),]
x5 = data[which(data$trt=='trt5'),]
x6 = data[which(data$trt=='trt6'),]
x7 = data[which(data$trt=='trt7'),]
x8 = data[which(data$trt=='trt8'),]
x9 = data[which(data$trt=='trt9'),]
x10 = data[which(data$trt=='trt10'),]
x11 = data[which(data$trt=='trt11'),]
x12 = data[which(data$trt=='trt12'),]

#######################################
## Function to combine vectors of
## different lengths into a matrix
#######################################
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

##########################################
## Combine the 4 treatments
##########################################
x <- cbind.fill(x1[,2], x2[,2], x3[,2], x7[,2])

###########################################
## Function that checks, at level alpha, whether
## treatment 'trt' can be selected as the best
###########################################
library(MASS)
coverage <- function(alpha, trt){
  lp <- length(x[1,])
  rank <- seq(1, lp)  # control the subscripts on j
  M <- diag(lp)
  rank2 <- rank[-trt]
  y1 <- x[, trt]
  z1 <- y1[!is.na(y1)]
  n <- length(z1)
  for (j in rank2){
    y2 <- x[, j]
    z2 <- y2[!is.na(y2)]
    m <- length(z2)
    ### If treatment trt is better than j, set M[j,trt] = 1 and continue
    ### to the next step; otherwise set M[j,trt] = 3 and break.
    if ((mean(z1) - mean(z2))
        - qt(1-alpha, m+n-2) * sd(c(z1, z2)) * sqrt(1/n + 1/m) > 0)
      {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
  }
  ### Print whether treatment trt is the best or not
  if (all(M[, trt] == 1)) {print(c(trt, "is the best treatment"))}
  else {print(c(trt, "is not the best at", alpha))}
}

################################################
## Confidence intervals at the 0.18 significance level
alpha <- 0.18
(mean(x7[,2])-mean(x1[,2])) - qt(1-alpha, 10+5-2)*sd(c(x7[,2], x1[,2]))*sqrt(1/5+1/10)
(mean(x7[,2])-mean(x2[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x2[,2]))*sqrt(1/5+1/7)
(mean(x7[,2])-mean(x3[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x3[,2]))*sqrt(1/5+1/7)

#########
## alpha
#########
alpha <- .32
trt <- 4
coverage(alpha, trt)

#########
alpha <- .31
trt <- 4
coverage(alpha, trt)

## Confidence intervals at the 0.32 significance level
alpha <- 0.32
(mean(x7[,2])-mean(x1[,2])) - qt(1-alpha, 10+5-2)*sd(c(x7[,2], x1[,2]))*sqrt(1/5+1/10)
(mean(x7[,2])-mean(x2[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x2[,2]))*sqrt(1/5+1/7)
(mean(x7[,2])-mean(x3[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x3[,2]))*sqrt(1/5+1/7)