A PARTITIONING APPROACH FOR THE SELECTION OF THE BEST TREATMENT
Yong Lin
A Dissertation
Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
August 2013
Committee:
John T. Chen, Advisor
Arjun K. Gupta
Wei Ning
Haowen Xi, Graduate Faculty Representative
ABSTRACT
John T. Chen, Advisor
Selecting the best treatment among several treatments is essentially a multiple comparisons problem. Traditionally, the main question in multiple comparisons is whether or not to adjust for multiplicity. If a multiplicity adjustment such as the Bonferroni method is made, the simultaneous inference becomes too conservative. Moreover, conventional methods of multiple comparisons, such as Tukey’s all-pairwise multiple comparisons, yield simultaneous confidence intervals but cannot distinguish the best treatment efficiently. Therefore, in this dissertation, we propose several novel procedures that use the partitioning principle to develop more efficient simultaneous confidence sets for selecting the best treatment. The partitioning principle for efficacy and toxicity with ordered treatments can be found in Hsu and Berger (1999). In this dissertation, by integrating the Bonferroni inequality, the partitioning approach is applied to unordered treatments for inference on the best one. In introducing multiple comparison methodologies, we focus mainly on all-pairwise multiple comparisons, because all the treatments should be compared when we select the best treatment. These procedures can be used with different forms of data. Chapter 2 describes how to use the procedure with dichotomous outcomes and in the analysis of contingency tables, especially with Fisher’s exact test. Chapter 3 discusses the procedures in the nonparametric setting; with the Mann-Whitney test, these procedures become more robust. Chapter 4 addresses the procedures for continuous data under normality. In Chapter 5 we apply the procedures to analyze a prostate cancer study.
ACKNOWLEDGMENTS
In retrospect, so many people come to mind. Without their help and support, this dissertation could never have come into existence. First and foremost, I would like to thank my advisor, Dr. John Chen, for his constant support, great guidance, and many suggestions throughout this research. Dr. John Chen is not only a mentor but also a friend who has shared advice, encouragement, and his experience of life in general. I would also like to thank my dissertation committee members, Dr. Arjun Gupta, Dr. Wei Ning, and Dr. Haowen Xi, for their precious time, generous support, and suggestions on my dissertation. I am also grateful to the Department of Mathematics and Statistics for providing a wonderful study, teaching, and research environment. I especially wish to thank our staff, Marcia Seubert, Mary Busdeker, Cyndi Patterson, and Barbara Berta, for all their assistance. I appreciate the graduate coordinator, Dr. Tong Sun, for his generous support, and the excellent courses taught by Dr. John Chen, Dr. Arjun Gupta, Dr. Jim Albert, Dr. Craig Zirbel, Dr. Wei Ning, Dr. Hanfeng Chen, Dr. Maria Rizzo, Dr. Junfeng Shang, and other faculty. I am thankful to my friends from Bowling Green, Chen, Wenren, Songzi, Lihua, Jet, and all others, for their fun and help. Finally, I thank my beloved parents, Wenle and Guifeng, and my girlfriend, Simeng, for their full support, love, and encouragement.
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
  1.1 Background
  1.2 Types of Multiple Comparisons
    1.2.1 All Contrast Comparisons
    1.2.2 All-Pairwise Comparisons
    1.2.3 Multiple Comparisons with the Best
    1.2.4 Multiple Comparisons with a Control
  1.3 Studentized Maximum Modulus
    1.3.1 Inferences for Studentized Maximum Modulus
    1.3.2 Example: a Crystalline Drug Substance
  1.4 Tukey’s Method
    1.4.1 Inference for Tukey’s Method
    1.4.2 Example: Tukey’s Method for Crystalline Drug Substance
  1.5 Scheffé’s Method
    1.5.1 Inference for Scheffé’s Method
    1.5.2 Example: Scheffé’s Method for Crystalline Drug Substance
  1.6 Bonferroni Method
    1.6.1 Inference for Bonferroni Method
    1.6.2 Example: Bonferroni Method for Crystalline Drug Substance
  1.7 Nonparametric Approach
  1.8 Fisher’s Exact Test
    1.8.1 Inference Using Fisher’s Exact Test
    1.8.2 Example: Python Eggs
CHAPTER 2: IDENTIFYING THE BEST TREATMENT USING FISHER’S EXACT TEST
  2.1 Binary Data
  2.2 Odds Ratio
  2.3 Introduction to Partition
  2.4 Main Results
  2.5 Procedures
    2.5.1 Procedure for Theorem 2.1
    2.5.2 Procedure for Theorem 2.2
  2.6 Simulation
CHAPTER 3: IDENTIFYING THE BEST TREATMENT USING MANN-WHITNEY TEST
  3.1 Simultaneous Inference with Mann-Whitney Test
  3.2 Large-Sample Approximation
    3.2.1 Example
  3.3 Main Results
  3.4 Procedures
    3.4.1 Procedure for Theorem 3.1
    3.4.2 Procedure for Theorem 3.2
  3.5 Simulation
CHAPTER 4: IDENTIFYING THE BEST TREATMENT UNDER NORMALITY
  4.1 Multivariate Normal Distribution
  4.2 t-test with Welch Correction
  4.3 Simultaneous Inference
    4.3.1 Main Results
    4.3.2 Procedure for Theorem 4.1
    4.3.3 Procedure for Theorem 4.2
  4.4 Simulation
CHAPTER 5: APPLICATIONS IN A PROSTATE CANCER STUDY
  5.1 Data Background
  5.2 Main Results
    5.2.1 Theorem Results
    5.2.2 Analysis Results
BIBLIOGRAPHY
APPENDIX: SELECTED R AND SAS PROGRAMS
  .1 Simulation for the Procedures with Fisher’s Exact Test
    .1.1 Pre-specified the Best Treatment
    .1.2 Select the Best Treatment from Unknown
  .2 Simulation for the Procedures with Wilcoxon Mann-Whitney Test
    .2.1 Pre-specified the Best Treatment
    .2.2 Select the Best Treatment from Unknown
  .3 Simulation for the Procedures with Normality
    .3.1 Pre-specified the Best Treatment
    .3.2 Select the Best Treatment from Unknown
  .4 Applications in a Prostate Cancer Study
    .4.1 Using SAS to Deal with Original Dataset
    .4.2 Using the New Procedure with Wilcoxon Mann-Whitney Test
    .4.3 Using the New Procedure under Normality
LIST OF FIGURES
2.1 Partition of a Set S
3.1 Procedure for Theorem 3.1
3.2 Procedure for Theorem 3.2 at Stage 1
3.3 Procedure for Theorem 3.2 at Stage 2
3.4 Procedure for Theorem 3.2 from Stage 3 to Stage k
4.1 Procedure for Theorem 4.1
4.2 Procedure for Theorem 4.2 at Stage 1
4.3 Procedure for Theorem 4.2 at Stage 2
4.4 Procedure for Theorem 4.2 from Stage 3 to Stage k
5.1 Flow Chart for the Prostate Cancer Study
5.2 Frequency Table for the Prostate Cancer Data
5.3 QQ-plot for Treatments
5.4 Median and Standard Deviation for 12 Treatments
LIST OF TABLES
1.1 Impurities of Product under One Dose of Irradiation
1.2 Simultaneous Confidence Interval by Studentized Maximum Modulus
1.3 Simultaneous Confidence Interval by Tukey’s Method
1.4 Simultaneous Confidence Interval by Scheffé’s Method
1.5 Simultaneous Confidence Interval by Bonferroni Method
1.6 2 × 2 Table of Outcomes
1.7 Hatched Eggs
2.1 2 × 2 Table of Patients
2.2 2 × k Table for k Treatments
2.3 Coverage Probability with C.L.=.95 and Different Trial Numbers
2.4 Coverage Probability with Trial Number n = 90 and Different Orders
2.5 Coverage Probability with Trial Number n = 100 and Different Response Shapes
3.1 Mirror Therapy
3.2 Coverage Probability with C.L.=.95 and Different Sample Sizes
3.3 Coverage Probability with C.L.=.95 and Sample Size n=30
3.4 Coverage Probability with Sample Size n=30 and Different Median Shapes
4.1 Coverage Probability with Different Orders under Normality
4.2 Coverage Probability with Different Sample Sizes under Normality
4.3 Coverage Probability with Different Variances under Normality
4.4 Coverage Probability with Different Mean Shapes under Normality
CHAPTER 1
INTRODUCTION
1.1 Background
When estimating a population parameter, one can use point estimators or interval estimators. In practice, interval estimators are often preferred because they convey the reliability of the estimate. The confidence interval, which is a particular kind of interval estimate, is commonly used. For example, suppose $Y_1, Y_2, \dots, Y_n$ is a simple random sample from a normal distribution $N(\mu, \sigma^2)$ with parameters $\mu$ and $\sigma$, where $\sigma$ is known. Then the $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\hat{Y} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \hat{Y} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right),$$

where $\hat{Y}$ is the point estimate of $\mu$ and $Z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal distribution. Similarly, if more than one parameter is estimated at the same time, simultaneous confidence intervals should be applied.

Normally, simultaneous confidence intervals constitute a confidence region that estimates a multivariate parameter. Here, we take the case of two means as an example. Let $Y_{i1}, Y_{i2}, \dots, Y_{in}$ be a simple random sample from a normal distribution with mean $\mu_i$ and known variance $\sigma_i^2$, for $i = 1, 2$. If $\mu_1$ and $\mu_2$ are to be estimated simultaneously, simultaneous confidence intervals are applicable. By the Bonferroni method, the $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_i$ is

$$\left(\hat{Y}_i - t_{\frac{\alpha}{2\times 2},\,n-1}\frac{\sigma_i}{\sqrt{n}},\ \hat{Y}_i + t_{\frac{\alpha}{2\times 2},\,n-1}\frac{\sigma_i}{\sqrt{n}}\right),$$

where $t_{\frac{\alpha}{2\times 2},\,n-1}$ is the upper $\alpha/4$ critical value of the t-distribution with $n-1$ degrees of freedom and $\hat{Y}_i$ is the point estimate of the mean $\mu_i$.

The simultaneous confidence interval is one kind of simultaneous statistical inference. Traditionally, simultaneous statistical inference is inference on several parameters at once, and several techniques exist for it, especially in confidence estimation. These include the studentized maximum modulus, the Tukey-Kramer method, Scheffé’s method, the Bonferroni method, and Duncan’s multiple range test, among others.

To some extent, simultaneous statistical inference is also a form of multiple comparisons. Multiple comparisons are frequently encountered in industry, clinical trials, and social research, among other fields. Multiple comparisons compare two or more treatments. If we are concerned with inference on $k$ treatment means, denoted $\mu_1, \mu_2, \dots, \mu_k$, then functions of contrasts of $\mu_1, \mu_2, \dots, \mu_k$ are the parameters of interest. A contrast of the $k$ treatment means is a linear combination of the means whose coefficients sum to zero. Symbolically, a contrast of the $k$ treatment means $\mu_i$ is

$$c_1\mu_1 + c_2\mu_2 + \dots + c_k\mu_k = \sum_{i=1}^{k} c_i\mu_i \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0.$$
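The two interval formulas in this section can be checked numerically. The following is a minimal sketch, not taken from the dissertation: the data values, the function name `z_interval`, and the use of the sample mean as the point estimate are all assumptions made for illustration. Because the variances are treated as known, the sketch uses the standard normal critical value for the Bonferroni case as well (the upper $\alpha/4$ point for two means) in place of the t critical value quoted in the text.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(sample, sigma, alpha=0.05, n_params=1):
    """Two-sided interval for a normal mean with known sigma.

    With n_params > 1 the error rate is split by the Bonferroni
    method, so each interval uses the upper alpha / (2 * n_params)
    critical value of the standard normal distribution.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_params))
    center = mean(sample)  # assumed point estimate: the sample mean
    half_width = z * sigma / sqrt(n)
    return center - half_width, center + half_width


# Hypothetical samples for two treatments (illustrative values only).
y1 = [4.9, 5.1, 5.3, 4.7, 5.0]
y2 = [6.2, 5.8, 6.1, 6.0, 5.9]

# Single 95% interval for one mean.
single = z_interval(y1, sigma=0.2)

# Bonferroni-adjusted simultaneous 95% intervals for the two means.
joint = [z_interval(y, sigma=0.2, n_params=2) for y in (y1, y2)]

print(single)
print(joint)
```

Each Bonferroni-adjusted interval is wider than the corresponding single interval, which is the price paid for joint coverage of both means.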
1.2 Types of Multiple Comparisons
Multiple comparisons can be categorized by the parameters of primary interest. The four most common types of multiple comparisons are:
1. All-contrast comparisons;
2. All-pairwise comparisons;
3. Multiple comparisons with the best;
4. Multiple comparisons with the control.
To better understand these four types of multiple comparisons, we consider the one-way model as an illustrative example.
Suppose there are $k$ treatments, and a random sample $Y_{i1}, Y_{i2}, \dots, Y_{in_i}$ is taken under the $i$th treatment, with mean $\mu_i$ and unknown common variance $\sigma^2$, for $i = 1, 2, \dots, k$. These random samples are independent across the $k$ treatments. Then the one-way model can be formulated as

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \dots, k, \quad j = 1, 2, \dots, n_i,$$

where $\epsilon_{11}, \epsilon_{12}, \dots, \epsilon_{k n_k}$ are independent and identically distributed normal random variables with mean 0 and unknown variance $\sigma^2$, which can be written as $\epsilon_{ij} \sim N(0, \sigma^2)$.
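To make the one-way model concrete, the following minimal sketch draws data from $Y_{ij} = \mu_i + \epsilon_{ij}$. The treatment means, sample sizes, value of $\sigma$, and the function name `simulate_one_way` are hypothetical and chosen only for illustration.

```python
import random


def simulate_one_way(mu, n, sigma, seed=0):
    """Draw Y_ij = mu_i + eps_ij with eps_ij ~ N(0, sigma^2), i.i.d.

    mu    : list of k treatment means
    n     : list of k sample sizes n_i
    sigma : common standard deviation of the errors
    """
    rng = random.Random(seed)  # local generator for reproducibility
    return [
        [m + rng.gauss(0.0, sigma) for _ in range(n_i)]
        for m, n_i in zip(mu, n)
    ]


# Hypothetical setting: k = 3 treatments with unequal sample sizes.
mu = [10.0, 12.0, 11.0]
n = [5, 4, 6]
samples = simulate_one_way(mu, n, sigma=1.0)

# The sample mean of each treatment estimates the corresponding mu_i.
sample_means = [sum(y) / len(y) for y in samples]
print(sample_means)
```

Unequal sample sizes are allowed because the model indexes $j$ up to $n_i$ separately for each treatment.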
1.2.1 All Contrast Comparisons
In all-contrast comparisons, as the name suggests, the parameters of primary interest are all contrasts. Symbolically, the primary interests are

$$\sum_{i=1}^{k} c_i\mu_i \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0,$$

where $c_i$ is the coefficient of $\mu_i$. For example, $\theta = \mu_1 - \frac{\mu_2 + \mu_3}{2}$ is one such contrast, with $c_1 = 1$, $c_2 = -\frac{1}{2}$, $c_3 = -\frac{1}{2}$, and $c_4 = c_5 = \dots = c_k = 0$, so that $\sum_{i=1}^{k} c_i = 0$. Here, $\theta$ compares the effect of treatment 1 with the mean effect of treatments 2 and 3.

Since infinitely many coefficient vectors $(c_1, \dots, c_k)$ satisfy the constraint $\sum_{i=1}^{k} c_i = 0$, there are infinitely many contrasts in all-contrast comparisons.
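The example contrast above can be evaluated with a few lines of code. This is a minimal sketch under assumed values: the treatment means are hypothetical, and the helper name `contrast` is introduced here only for illustration.

```python
from math import isclose


def contrast(coeffs, means):
    """Return sum_i c_i * mu_i after checking that the c_i sum to zero."""
    if len(coeffs) != len(means):
        raise ValueError("coeffs and means must have the same length")
    if not isclose(sum(coeffs), 0.0, abs_tol=1e-12):
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coeffs, means))


# Hypothetical means for k = 4 treatments.
mu = [10.0, 12.0, 11.0, 9.0]

# theta = mu_1 - (mu_2 + mu_3) / 2, i.e. c = (1, -1/2, -1/2, 0).
c = [1.0, -0.5, -0.5, 0.0]
theta = contrast(c, mu)
print(theta)  # -1.5
```

A coefficient vector such as (1, 1, 0, 0) is rejected by the check, since its entries do not sum to zero and it therefore does not define a contrast.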
1.2.2 All-Pairwise Comparisons
Pairwise comparison compares treatments in pairs. Then, all pairwise comparisons are all