
A PARTITIONING APPROACH FOR THE SELECTION OF THE BEST TREATMENT

Yong Lin

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2013

Committee:

John T. Chen, Advisor

Arjun K. Gupta

Wei Ning

Haowen Xi, Graduate Faculty Representative

ABSTRACT

John T. Chen, Advisor

To select the best treatment among several treatments is essentially a multiple comparisons problem. Traditionally, when dealing with multiple comparisons, there is one main argument: whether or not to adjust for multiplicity. If a multiplicity adjustment such as the Bonferroni method is made, the simultaneous inference becomes too conservative. Moreover, with the conventional methods of multiple comparisons, such as Tukey's all-pairwise multiple comparisons, simultaneous confidence intervals can be obtained, but the best treatment cannot be distinguished efficiently. Therefore, in this dissertation, we propose several novel procedures using the partitioning principle to develop more efficient simultaneous confidence sets for selecting the best treatment. A partitioning-principle method for efficacy and toxicity of ordered treatments can be found in Hsu and Berger (1999). In this dissertation, by integrating the Bonferroni inequality, the partition approach is applied to unordered treatments for inference on the best one. Among the multiple comparison methodologies introduced, we mainly focus on all-pairwise multiple comparisons, because all treatments should be compared when we select the best treatment. These procedures can be used with different data forms. Chapter 2 shows how to apply the procedure to dichotomous outcomes and the analysis of contingency tables, especially with Fisher's exact test. Chapter 3 discusses the procedures in the nonparametric setting; with the Mann-Whitney test, the procedures become more robust. Chapter 4 addresses the procedures for continuous data under normality. In Chapter 5 we apply the procedures to analyze a prostate cancer study.

ACKNOWLEDGMENTS

In retrospect, many people come to mind; without their help and support, this dissertation could never have come into existence. First and foremost, I would like to thank my advisor, Dr. John Chen, for his constant support, great guidance, and many suggestions throughout this research. Dr. John Chen is not only a mentor but also a friend who gives me advice, encouragement, and the experience he shares about life in general. I would also like to thank my dissertation committee members, Dr. Arjun Gupta, Dr. Wei Ning, and Dr. Haowen Xi, for their precious time, generous support, and suggestions on my dissertation. I am also grateful to the Department of Mathematics and Statistics for providing a wonderful study, teaching, and research environment. I especially wish to thank our staff, Marcia Seubert, Mary Busdeker, Cyndi Patterson, and Barbara Berta, for all their assistance. I appreciate the graduate coordinator Dr. Tong Sun for his generous support, and the excellent courses taught by Dr. John Chen, Dr. Arjun Gupta, Dr. Jim Albert, Dr. Craig Zirbel, Dr. Wei Ning, Dr. Hanfeng Chen, Dr. Maria Rizzo, Dr. Junfeng Shang, and other faculty. I am thankful to my friends from Bowling Green, Chen, Wenren, Songzi, Lihua, Jet, and all the others, for their fun and help. Finally, I thank my beloved parents, Wenle and Guifeng, and my girlfriend Simeng for their full support, love, and encouragement.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Types of Multiple Comparisons
1.2.1 All Contrast Comparisons
1.2.2 All-Pairwise Comparisons
1.2.3 Multiple Comparisons with the Best
1.2.4 Multiple Comparisons with a Control
1.3 Studentized Maximum Modulus
1.3.1 Inferences for Studentized Maximum Modulus
1.3.2 Example: a Crystalline Drug Substance
1.4 Tukey's Method
1.4.1 Inference for Tukey's Method
1.4.2 Example: Tukey's Method for Crystalline Drug Substance
1.5 Scheffé's Method
1.5.1 Inference for Scheffé's Method
1.5.2 Example: Scheffé's Method for Crystalline Drug Substance
1.6 Bonferroni Method
1.6.1 Inference for Bonferroni Method
1.6.2 Example: Bonferroni Method for Crystalline Drug Substance
1.7 Nonparametric Approach
1.8 Fisher's Exact Test
1.8.1 Inference Using Fisher's Exact Test
1.8.2 Example: Python Eggs

CHAPTER 2: IDENTIFYING THE BEST TREATMENT USING FISHER'S EXACT TEST
2.1 Binary Data
2.2 Odds Ratio
2.3 Introduction to Partition
2.4 Main Results
2.5 Procedures
2.5.1 Procedure for Theorem 2.1
2.5.2 Procedure for Theorem 2.2
2.6 Simulation

CHAPTER 3: IDENTIFYING THE BEST TREATMENT USING MANN-WHITNEY TEST
3.1 Simultaneous Inference with Mann-Whitney Test
3.2 Large-Sample Approximation
3.2.1 Example
3.3 Main Results
3.4 Procedures
3.4.1 Procedure for Theorem 3.1
3.4.2 Procedure for Theorem 3.2
3.5 Simulation

CHAPTER 4: IDENTIFYING THE BEST TREATMENT UNDER NORMALITY
4.1 Multivariate Normal Distribution
4.2 t-test with Welch Correction
4.3 Simultaneous Inference
4.3.1 Main Results
4.3.2 Procedure for Theorem 4.1
4.3.3 Procedure for Theorem 4.2
4.4 Simulation

CHAPTER 5: APPLICATIONS IN A PROSTATE CANCER STUDY
5.1 Data Background
5.2 Main Results
5.2.1 Theorem Results
5.2.2 Analysis Results

BIBLIOGRAPHY

APPENDIX: SELECTED R AND SAS PROGRAMS
.1 Simulation for the Procedures with Fisher's Exact Test
.1.1 Pre-specified Best Treatment
.1.2 Selecting the Best Treatment from Unknown
.2 Simulation for the Procedures with Wilcoxon Mann-Whitney Test
.2.1 Pre-specified Best Treatment
.2.2 Selecting the Best Treatment from Unknown
.3 Simulation for the Procedures with Normality
.3.1 Pre-specified Best Treatment
.3.2 Selecting the Best Treatment from Unknown
.4 Applications in a Prostate Cancer Study
.4.1 Using SAS to Deal with the Original Dataset
.4.2 Using the New Procedure with Wilcoxon Mann-Whitney Test
.4.3 Using the New Procedure under Normality

LIST OF FIGURES

2.1 Partition of a Set S

3.1 Procedure for Theorem 3.1
3.2 Procedure for Theorem 3.2 at Stage 1
3.3 Procedure for Theorem 3.2 at Stage 2
3.4 Procedure for Theorem 3.2 from Stage 3 to Stage k

4.1 Procedure for Theorem 4.1
4.2 Procedure for Theorem 4.2 at Stage 1
4.3 Procedure for Theorem 4.2 at Stage 2
4.4 Procedure for Theorem 4.2 from Stage 3 to Stage k

5.1 Flow Chart for the Prostate Cancer Study
5.2 Frequency Table for the Prostate Cancer Data
5.3 QQ-plot for Treatments
5.4 Median and Standard Deviation for 12 Treatments

LIST OF TABLES

1.1 Impurities of Product under One Dose of Irradiation
1.2 Simultaneous Confidence Interval by Studentized Maximum Modulus
1.3 Simultaneous Confidence Interval by Tukey's Method
1.4 Simultaneous Confidence Interval by Scheffé's Method
1.5 Simultaneous Confidence Interval by Bonferroni Method
1.6 2 × 2 Table of Outcomes
1.7 Hatched Eggs

2.1 2 × 2 Table of Patients
2.2 2 × k Table for k Treatments
2.3 Coverage Probability with C.L. = .95 and Different Trial Numbers
2.4 Coverage Probability with Trial Number n = 90 and Different Orders
2.5 Coverage Probability with Trial Number n = 100 and Different Response Shapes

3.1 Mirror Therapy
3.2 Coverage Probability with C.L. = .95 and Different Sample Sizes
3.3 Coverage Probability with C.L. = .95 and Sample Size n = 30
3.4 Coverage Probability with Sample Size n = 30 and Different Median Shapes

4.1 Coverage Probability with Different Orders under Normality
4.2 Coverage Probability with Different Sample Sizes under Normality
4.3 Coverage Probability with Different Variances under Normality
4.4 Coverage Probability with Different Mean Shapes under Normality

CHAPTER 1

INTRODUCTION

1.1 Background

When estimating a population parameter, one may use point estimators or interval estimators. In practice, interval estimators are preferred because they convey the reliability of the estimate, and the confidence interval, a particular kind of interval estimate, is the most commonly used. For example, suppose $Y_1, Y_2, \ldots, Y_n$ is a simple random sample from a normal distribution $N(\mu, \sigma^2)$ with known $\sigma$. Then the $100(1-\alpha)\%$ confidence interval for $\mu$ is $(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n})$, where $z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal distribution. Similarly, if more than one parameter is estimated at the same time, simultaneous confidence intervals should be applied.

Normally, simultaneous confidence intervals constitute a confidence region that estimates a multivariate parameter. Here, we take the case of two means as an example. Let $Y_{i1}, Y_{i2}, \ldots, Y_{in}$ be a simple random sample from a normal distribution with mean $\mu_i$ and known variance $\sigma_i^2$, for $i = 1, 2$. If $\mu_1$ and $\mu_2$ are to be estimated simultaneously, simultaneous confidence intervals are applicable. By the Bonferroni method, the $100(1-\alpha)\%$ simultaneous confidence intervals for the $\mu_i$ are $(\hat{Y}_i - t_{\frac{\alpha}{2\times 2},\,n-1}\,\sigma_i/\sqrt{n},\ \hat{Y}_i + t_{\frac{\alpha}{2\times 2},\,n-1}\,\sigma_i/\sqrt{n})$, where $t_{\frac{\alpha}{2\times 2},\,n-1}$ is the upper $\alpha/4$ critical value of the t-distribution with $n-1$ degrees of freedom and $\hat{Y}_i$ is the point estimate of the mean $\mu_i$.

The simultaneous confidence interval is one kind of simultaneous statistical inference. Traditionally, simultaneous statistical inference is inference on several parameters at once, and there are several techniques for it, especially in confidence estimation: the studentized maximum modulus, the Tukey-Kramer method, Scheffé's method, the Bonferroni method, Duncan's multiple range tests, and so on. To some extent, simultaneous statistical inference is also multiple comparisons. Multiple comparisons are frequently encountered in industry, clinical trials, and social research, among other fields. Multiple comparisons compare two or more treatments. If we are concerned with inference on k treatment means, denoted $\mu_1, \mu_2, \ldots, \mu_k$, the contrasts of $\mu_1, \mu_2, \ldots, \mu_k$ are the parameters of interest. A contrast of the k treatment means is a linear combination of them with coefficients adding up to zero: symbolically, a contrast of the treatment means $\mu_i$ is

$$c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k = \sum_{j=1}^{k} c_j\mu_j \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0.$$
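As a quick illustration, the following R sketch computes these Bonferroni simultaneous intervals for two means. The data are made up and the known standard deviations are hypothetical; the formula follows the text above.

# Bonferroni simultaneous 95% intervals for two normal means,
# using the critical value t_{alpha/4, n-1} as in the text.
set.seed(1)
n <- 20; sigma <- c(1.5, 2.0)                  # hypothetical known SDs
y1 <- rnorm(n, 5, sigma[1]); y2 <- rnorm(n, 6, sigma[2])
alpha <- 0.05
tcrit <- qt(1 - alpha / 4, df = n - 1)         # upper alpha/(2*2) point
rbind(mu1 = mean(y1) + c(-1, 1) * tcrit * sigma[1] / sqrt(n),
      mu2 = mean(y2) + c(-1, 1) * tcrit * sigma[2] / sqrt(n))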

1.2 Types of Multiple Comparisons

Multiple comparisons can be categorized by the parameters of primary interest, where the four most common types are: 1. all-contrast comparisons; 2. all-pairwise comparisons; 3. multiple comparisons with the best; 4. multiple comparisons with a control.

Here, to better understand these four types of multiple comparisons, we consider the one-way model as an illustrative example.

Suppose there are k treatments and, under the ith treatment, a random sample $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ is taken with mean $\mu_i$ and unknown common variance $\sigma^2$, for $i = 1, 2, \ldots, k$. These random samples are independent among the k treatments. Then, the one-way model can be formulated as

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, n_i,$$

where the $\epsilon_{ij}$ are independent and identically distributed from a normal distribution with mean 0 and unknown variance $\sigma^2$, written $\epsilon_{ij} \sim N(0, \sigma^2)$.

1.2.1 All Contrast Comparisons

In all-contrast comparisons, as the name suggests, the parameters of primary interest are all contrasts. Symbolically, the primary interests are

$$\sum_{i=1}^{k} c_i\mu_i \quad \text{with} \quad \sum_{i=1}^{k} c_i = 0,$$

where $c_i$ is the coefficient of $\mu_i$. For example, $\theta = \mu_1 - \frac{\mu_2 + \mu_3}{2}$ is one of the contrasts, with $c_1 = 1$, $c_2 = c_3 = -\frac{1}{2}$, and $c_4 = c_5 = \cdots = c_k = 0$, so that $\sum_{i=1}^{k} c_i = 0$. Here, $\theta$ compares the effect of treatment 1 with the mean effect of treatments 2 and 3. Since there are infinitely many linear combinations of the $\mu_i$ with coefficients $c_i$ satisfying the constraint $\sum_{i=1}^{k} c_i = 0$, there is an infinite number of contrasts in all-contrast comparisons.

1.2.2 All-Pairwise Comparisons

Pairwise comparisons compare treatments in pairs, and all-pairwise comparisons comprise all such comparisons without replication. Thus, in total, there are $\binom{k}{2} = k(k-1)/2$ pairwise differences of treatment means in all-pairwise comparisons, which can be expressed as

$$\mu_i - \mu_j, \quad \text{for all } i < j.$$

The pairwise differences of treatment means are the parameters of primary interest. For example, if k = 4 in the one-way model, the number of pairwise differences of treatment means is $\binom{4}{2} = 6$; in this case, the parameters of interest are $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, $\mu_3 - \mu_4$.

1.2.3 Multiple Comparisons with the Best

Generally, the parameters of primary interest for multiple comparisons with the best are the differences between each treatment and the best of the other treatments. For instance, in the one-way model, if a larger treatment effect value implies a better treatment, the parameters of primary interest are

$$\theta_i = \mu_i - \max_{j \neq i} \mu_j, \quad i = 1, \ldots, k.$$

If $\theta_i < 0$, then treatment i is not the best; otherwise, if there exists an integer i such that $\theta_i > 0$, then treatment i is the best. For a simple example in the one-way model, suppose k = 3 and $\mu_1 = 1$, $\mu_2 = 3$, $\mu_3 = 2$; then $\theta_1 < 0$, $\theta_2 > 0$, $\theta_3 < 0$, so the second treatment is the best. Similarly, if a smaller treatment effect value implies a better treatment, the parameters of primary interest become

$$\mu_i - \min_{j \neq i} \mu_j, \quad i = 1, \ldots, k.$$

In fact, whether the smallest or the largest treatment effect value indicates the best treatment, once the best treatment is known the problem becomes one of multiple comparisons with a control, introduced in the following section.

1.2.4 Multiple Comparisons with a Control

Multiple comparisons with a control compare each treatment with a designated control. Suppose treatment k is the control; then the parameters of primary interest are

$$\mu_i - \mu_k, \quad i = 1, \ldots, k - 1.$$

When the underlying model is normal, the best-known method for multiple comparisons with a control is Dunnett's method. In terms of methodology, multiple comparison procedures can be sorted into two classes: stepwise and single-step approaches. For example, the Holm-Bonferroni method is a stepwise method, while single-step approaches include Tukey's method, Scheffé's method, the Bonferroni method, and so on.

1.3 Studentized Maximum Modulus

1.3.1 Inferences for Studentized Maximum Modulus

The studentized maximum modulus, introduced by Tukey, Roy, and Bose, is based on the one-way model. Suppose there are k treatments and, for each treatment i in 1, ..., k, a simple random sample $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ is normally distributed with mean $\mu_i$ and variance $\sigma^2$. Then the one-way model is

$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, \ldots, k, \; j = 1, \ldots, n_i, \qquad (1.1)$$

where $\epsilon_{11}, \epsilon_{12}, \ldots, \epsilon_{kn_k}$ are independent and identically distributed normally with mean 0 and unknown variance $\sigma^2$, and $n_i$ stands for the sample size of treatment i. Moreover, the sample means $\hat\mu_i$ and the pooled sample variance $S^2 = \hat\sigma^2$ are

$$\hat\mu_i = \bar{Y}_i = \sum_{j=1}^{n_i} Y_{ij}/n_i, \qquad (1.2)$$

$$S^2 = \hat\sigma^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \Big/ \sum_{i=1}^{k} (n_i - 1). \qquad (1.3)$$

Then, the studentized maximum modulus statistic is

$$|m|_{k,\nu} = \max_{1 \le i \le k} \frac{|\hat\mu_i - \mu_i|}{s/\sqrt{n_i}}. \qquad (1.4)$$

Let $\nu = \sum_{i=1}^{k}(n_i - 1)$. Then

$$\nu\hat\sigma^2/\sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_i)^2}{\sigma^2},$$

which has a $\chi^2$ distribution with $\nu$ degrees of freedom. Then the exact $100(1-\alpha)\%$ simultaneous confidence intervals for the $\mu_i$ are

$$\mu_i \in \hat\mu_i \pm |m|^{\alpha}_{k,\nu}\,\hat\sigma/\sqrt{n_i}, \quad i = 1, \ldots, k, \qquad (1.5)$$

where $|m|^{\alpha}_{k,\nu}$ is the upper $\alpha$ point of the studentized maximum modulus distribution with parameters k and $\nu$. Since

$$
P\{\mu_i \in \hat\mu_i \pm |m|^{\alpha}_{k,\nu}\,\hat\sigma/\sqrt{n_i},\ i = 1, \ldots, k\}
= E_{\hat\sigma/\sigma}\big[P\{\sqrt{n_i}\,|\hat\mu_i - \mu_i|/\sigma \le |m|^{\alpha}_{k,\nu}(\hat\sigma/\sigma) \text{ for } i = 1, \ldots, k\} \,\big|\, \hat\sigma/\sigma\big]
= \int_0^{\infty} [\Phi(|m|^{\alpha}_{k,\nu}\,t) - \Phi(-|m|^{\alpha}_{k,\nu}\,t)]^k \,\gamma_\nu(t)\,dt
= 1 - \alpha,
$$

where $\Phi$ is the standard normal distribution function and $\gamma_\nu$ is the density of $s/\sigma$, the critical value $|m|^{\alpha}_{k,\nu}$ can be computed as the solution of the equation

$$\int_0^{\infty} [\Phi(|m|^{\alpha}_{k,\nu}\,t) - \Phi(-|m|^{\alpha}_{k,\nu}\,t)]^k \,\gamma_\nu(t)\,dt = 1 - \alpha. \qquad (1.6)$$

1.3.2 Example: a Crystalline Drug Substance

In Mario Parra, Pilar Rodriguez-Loaiza and Salvador Namur (2003), there is a case study on the development of a crystalline drug substance, referred to as the product. The product contains four major organic impurities: A, B, C, D, whose values are measured in ppm (parts per million). During evaluation of the product, different doses of gamma ray irradiation are used to sterilize it, so the impurity levels of the product change. Part of the data is shown in Table 1.1. From the data, the mean estimators of

Table 1.1: Impurities of Product under One Dose of Irradiation

Impurity   Baseline   1 dose   Difference
A          513        590      77
A          380        703      323
A          717        843      126
A          673        770      97
B          87         613      526
B          67         540      473
B          133        497      364
B          143        577      434
C          27         680      653
C          10         633      623
C          47         430      383
C          60         430      370
D          23         43       20
D          77         120      43
D          110        143      33
D          137        137      0

the difference for A, B, C, D are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24,$$

and the pooled standard deviation of the sample is $\hat\sigma = 226.1564$. The degrees of freedom for the studentized maximum modulus are k = 4 and $\nu = 12$. Therefore, for the significance level $\alpha = 0.05$, $|m|^{0.05}_{4,12} = 3.095$, and

$$|m|^{0.05}_{4,12}\,\hat\sigma/\sqrt{n_i} = \frac{3.095 \times 226.1564}{2} \approx 349.98.$$

Thus, the 95% studentized maximum modulus simultaneous confidence intervals for the four kinds of impurity are given as follows.

Table 1.2: Simultaneous Confidence Interval by Studentized Maximum Modulus

Impurity   Simultaneous C.I.
A          155.75 ± 349.98
B          449.25 ± 349.98
C          507.25 ± 349.98
D          24 ± 349.98

From Table 1.2, at the 95% overall confidence level, the changes in organic impurities A and D are not significant under one dose of gamma ray irradiation, while the changes in organic impurities B and C are significant. Moreover, under one dose of gamma ray irradiation, the impurity values of B and C have increased.
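As a check, equation (1.6) can be solved numerically. Below is a minimal R sketch; the helper name smm_quantile is ours, not from the dissertation's appendix, and gamma_nu is derived from $\nu s^2/\sigma^2 \sim \chi^2_\nu$.

# Solve equation (1.6) for the studentized maximum modulus point |m|_{k,nu}^alpha.
smm_quantile <- function(k, nu, alpha = 0.05) {
  gamma_nu <- function(t) 2 * nu * t * dchisq(nu * t^2, df = nu)  # density of s/sigma
  coverage <- function(m)
    integrate(function(t) (pnorm(m * t) - pnorm(-m * t))^k * gamma_nu(t),
              lower = 0, upper = Inf)$value
  uniroot(function(m) coverage(m) - (1 - alpha), c(1, 10))$root
}
m_crit <- smm_quantile(k = 4, nu = 12)   # should be close to 3.095
m_crit * 226.1564 / sqrt(4)              # margin, close to 349.98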

1.4 Tukey’s Method

1.4.1 Inference for Tukey’s Method

Tukey's method, also called the studentized range method, is a single-step multiple comparison procedure and one of the techniques for all-pairwise multiple comparisons. Thus, if there are $\binom{k}{2} = k(k-1)/2$ pairwise differences of k treatment means in a multiple comparison problem, Tukey's method is applicable.

For all-pairwise multiple comparisons, the parameters of primary interest are $\mu_i - \mu_j$ for all $i \neq j$. This kind of problem can be formulated as the hypothesis test

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$

or equivalently

$$H_0: \mu_i - \mu_j = 0 \quad \text{for all } i \neq j.$$

Let $\{Y_{ij};\ i = 1, \ldots, k,\ j = 1, \ldots, n\}$ be k independent samples with the same sample size n, independently and normally distributed with common variance $\sigma^2$ and means $\mu_i$. This is the one-way model of Section 1.2 with an equal number of observations in each treatment, that is, $n_i = n$ for $i = 1, \ldots, k$. The sample means and the pooled sample variance become

$$\hat\mu_i = \bar{Y}_i = \sum_{j=1}^{n} Y_{ij}/n,$$

$$\hat\sigma^2 = MSE = \sum_{i=1}^{k}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_i)^2 / [k(n-1)].$$

Then, the simultaneous confidence intervals for all-pairwise differences in Tukey's method are

$$\mu_i - \mu_j \in \hat\mu_i - \hat\mu_j \pm |q^*|\,\hat\sigma\sqrt{2/n} \quad \text{for all } i \neq j, \qquad (1.7)$$

where $|q^*| = q^{\alpha}_{k,k(n-1)}$ is the upper $100\alpha$ percent point of the studentized range distribution with parameters k and $\nu = k(n-1)$, and is the solution to the equation

$$P\left\{\frac{|\hat\mu_i - \mu_i - (\hat\mu_j - \mu_j)|}{\hat\sigma\sqrt{2/n}} \le |q^*| \text{ for all } i > j\right\} = 1 - \alpha. \qquad (1.8)$$

Numerically, |q∗| is the solution of the equation:

$$k\int_0^{\infty}\int_{-\infty}^{\infty} [\Phi(z) - \Phi(z - \sqrt{2}\,|q^*|s)]^{k-1}\,d\Phi(z)\,\gamma(s)\,ds = 1 - \alpha, \qquad (1.9)$$

since:

$$
\begin{aligned}
&P\left\{\frac{|\hat\mu_i - \mu_i - (\hat\mu_j - \mu_j)|}{\hat\sigma\sqrt{2/n}} \le |q^*| \text{ for all } i > j\right\} \\
&\quad= \sum_{i=1}^{k} P\{-|q^*|\hat\sigma\sqrt{2/n} < \hat\mu_i - \hat\mu_j - (\mu_i - \mu_j) < |q^*|\hat\sigma\sqrt{2/n} \\
&\qquad\qquad \text{for all } j \neq i, \text{ and } \hat\mu_i - \mu_i = \max_{j=1,2,\ldots,k}(\hat\mu_j - \mu_j)\} \\
&\quad= \sum_{i=1}^{k} P\{0 < \hat\mu_i - \hat\mu_j - (\mu_i - \mu_j) < |q^*|\hat\sigma\sqrt{2/n} \text{ for all } j \neq i\} \\
&\quad= \sum_{i=1}^{k} P\{0 < \sqrt{n}(\hat\mu_i - \mu_i)/\sigma - \sqrt{n}(\hat\mu_j - \mu_j)/\sigma < \sqrt{2}\,|q^*|\hat\sigma/\sigma \text{ for all } j \neq i\} \\
&\quad= k\,P\{0 < \sqrt{n}(\hat\mu_1 - \mu_1)/\sigma - \sqrt{n}(\hat\mu_j - \mu_j)/\sigma < \sqrt{2}\,|q^*|\hat\sigma/\sigma \text{ for } j = 2, 3, \ldots, k\} \\
&\quad= k\int_0^{\infty}\int_{-\infty}^{\infty} [\Phi(z) - \Phi(z - \sqrt{2}\,|q^*|s)]^{k-1}\,d\Phi(z)\,\gamma(s)\,ds \\
&\quad= 1 - \alpha.
\end{aligned}
$$

Tukey's method can also be applied with unequal sample sizes to obtain simultaneous confidence intervals, in which case it is known as the Tukey-Kramer method. However, when the number of treatments is large, Tukey's method loses some power, and other methods such as the studentized maximum modulus can be used instead.

1.4.2 Example: Tukey’s Method for Crystalline Drug Substance

From the studentized maximum modulus method, simultaneous confidence intervals for the crystalline drug substance were obtained in Section 1.3.2. Tukey's method can also be applied to this case study. If the hypotheses are $H_0: \mu_i - \mu_j = 0$ versus $H_1: \mu_i - \mu_j \neq 0$ for all i, j = A, B, C, D with $i \neq j$, then from Table 1.1, the estimates are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad \hat\sigma = 226.1564.$$

The degrees of freedom for Tukey's method are k = 4 and N − k = 12. For the significance level $\alpha = 0.05$, $|q^*| = q_{0.05,4,12} \approx 4.20$, and

$$|q^*|\,\hat\sigma\sqrt{2/n} = 4.20 \times 226.1564 \times \sqrt{2/4} \approx 671.65.$$

Then the 95% simultaneous confidence intervals for the differences between the effects of one dose of gamma ray irradiation on the different impurities of the crystalline drug substance, by Tukey's method, are presented in Table 1.3.

Table 1.3: Simultaneous Confidence Interval by Tukey’s method

Impurity   Simultaneous C.I.
A-B        −293.5 ± 671.65
A-C        −351.5 ± 671.65
A-D        131.75 ± 671.65
B-C        −58 ± 671.65
B-D        425.25 ± 671.65
C-D        483.25 ± 671.65

From Table 1.3, at the overall confidence level .95, there is no significantly different effect between any two impurities under one dose of gamma ray irradiation.
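The Table 1.3 margin can be reproduced in R with the studentized range quantile; a short sketch following the text's computation for equation (1.7):

# Tukey margin for k = 4 treatments, nu = 12, n = 4 observations per group.
qstar <- qtukey(0.95, nmeans = 4, df = 12)     # approximately 4.20
margin <- qstar * 226.1564 * sqrt(2 / 4)       # approximately 671.65
muhat <- c(A = 155.75, B = 449.25, C = 507.25, D = 24)
diffs <- outer(muhat, muhat, "-")              # pairwise differences mu_i - mu_j
diffs[upper.tri(diffs)]                        # compare each against +/- margin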

1.5 Scheffé's Method

1.5.1 Inference for Scheffé's Method

Scheffé's method is a single-step multiple comparison procedure which constructs a simultaneous confidence band for all contrasts, and it is often used in the analysis of variance (ANOVA). Furthermore, Scheffé's method can be extended to arbitrary linear spaces. Suppose $Y = (Y_1, \ldots, Y_n)$ is a vector of n components with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)$; for each component, there is a linear model

$$Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i \in \{1, 2, \ldots, n\}, \qquad (1.10)$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$ follows $N(0, \sigma^2 I_n)$. Let $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$; then

$$E(Y) = \mu = X\beta \qquad (1.11)$$

with the independent variable matrix

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}. \qquad (1.12)$$

From the general linear model, the least squares and maximum likelihood estimator $\hat\beta$ of $\beta$ and the unbiased estimator $s^2$ of $\sigma^2$ (if $(X^TX)^{-1}$ exists) are

$$\hat\beta = (X^TX)^{-1}X^TY, \qquad s^2 = Y^T(I - X(X^TX)^{-1}X^T)Y/(n - p).$$

$\hat\beta$ and $s^2$ have the following properties:

$$\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1}), \qquad \frac{(n-p)s^2}{\sigma^2} \sim \chi^2_{n-p}, \qquad \hat\beta \text{ and } s^2 \text{ are independent}.$$

Let $L_d$ be any d-dimensional linear space with $d \le p$. For any $l \in L_d$, the corresponding linear combination is $l^T\beta = \sum_{i=1}^{p} l_i\beta_i$, and $l^T\hat\beta$ follows $N(l^T\beta, \sigma^2\,l^T(X^TX)^{-1}l)$. Thus,

$$\max_{l \in L_d} \frac{(l^T\beta - l^T\hat\beta)^2}{s^2\,l^T(X^TX)^{-1}l} \sim d\,F_{d,n-p}.$$

Therefore, the $100(1-\alpha)\%$ simultaneous confidence interval for $l^T\beta$, for any $l \in L_d$, is

$$l^T\beta \in l^T\hat\beta \pm (d\,F^{\alpha}_{d,n-p})^{1/2}\,s\,(l^T(X^TX)^{-1}l)^{1/2}, \qquad (1.13)$$

where $F^{\alpha}_{d,n-p}$ is the upper $100\alpha$ percent point of the F distribution with d degrees of freedom in the numerator and $n - p$ in the denominator. For any contrast $\sum_{i=1}^{k} c_i\mu_i$ with $c_1 + c_2 + \cdots + c_k = 0$, let $l_i = c_i$ and $\beta_i = \mu_i$ for all i in $1, \ldots, k$; thus $d = k - 1$ here. Then Scheffé's $100(1-\alpha)\%$ simultaneous confidence intervals for all contrasts of $\mu_1, \ldots, \mu_k$ are

$$\sum_{i=1}^{k} c_i\mu_i \in \sum_{i=1}^{k} c_i\hat\mu_i \pm \sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{\sum_{i=1}^{k} c_i^2/n}. \qquad (1.14)$$

Therefore, pairwise comparisons can be deduced from Scheffé's confidence set by specializing to $c_i = 1$, $c_j = -1$, and 0 for the other c's. As a result, for all-pairwise comparisons in the one-way model with equal sample sizes, the $100(1-\alpha)\%$ simultaneous confidence intervals for the pairwise differences $\mu_i - \mu_j$ are

$$\mu_i - \mu_j \in \hat\mu_i - \hat\mu_j \pm \sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{2/n}, \quad \text{for all } i \neq j. \qquad (1.15)$$

Given that, if different values of $c_i$ are selected, Scheffé's method provides the corresponding simultaneous confidence interval. For all-pairwise comparisons in the balanced one-way model, Scheffé's method is too conservative; however, if the contrasts are not restricted to pairwise differences, Scheffé's method is suggested. Furthermore, since Scheffé's method covers all possible comparisons, the Scheffé multiple comparison procedure is more conservative relative to other methods.
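A generic R sketch of interval (1.14) for an arbitrary contrast follows; the helper scheffe_ci and its inputs are hypothetical, assuming a balanced one-way layout with k groups of size n.

# Scheffe 100(1 - alpha)% interval for a contrast sum(c_i * mu_i),
# with nu = k(n - 1) pooled degrees of freedom, per equation (1.14).
scheffe_ci <- function(muhat, cvec, sigma_hat, n, alpha = 0.05) {
  k <- length(muhat); nu <- k * (n - 1)
  margin <- sqrt((k - 1) * qf(1 - alpha, k - 1, nu)) *
    sigma_hat * sqrt(sum(cvec^2) / n)
  sum(cvec * muhat) + c(-1, 1) * margin
}
# The pairwise contrast c = (1, -1, 0, 0) recovers the intervals of (1.15).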

1.5.2 Example: Scheffé's Method for Crystalline Drug Substance

As in the example of Section 1.3.2, Scheffé's method can also be used to obtain simultaneous confidence intervals for the differences in impurity after gamma ray irradiation. From Table 1.1, the mean estimators of the differences for A, B, C, D and the estimated pooled standard deviation are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad \hat\sigma = 226.1564.$$

The degrees of freedom for Scheffé's method are $k - 1 = 3$ and $\nu = 12$. Therefore, at significance level $\alpha = 0.05$, for the hypotheses $H_0: \mu_i - \mu_j = 0$ versus $H_1: \mu_i - \mu_j \neq 0$, $(k-1)F_{\alpha,k-1,\nu} = 3F_{.05,3,12} \approx 10.47$ and

$$\sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{2/n} = \frac{10.47 \times 226.1564}{\sqrt{2/4}} = 3348.656.$$

For the hypotheses $H_0: \mu_i = 0$ versus $H_1: \mu_i \neq 0$,

$$\sqrt{(k-1)F_{\alpha,k-1,\nu}}\;\hat\sigma\sqrt{1/n} = \frac{10.47 \times 226.1564}{\sqrt{1/4}} = 4735.715.$$

Then the 95% simultaneous confidence intervals for the impurity of the crystalline drug substance by Scheffé's method are shown in Table 1.4.

Table 1.4: Simultaneous Confidence Interval by Scheff´e’sMethod

Impurity   Simultaneous C.I.
A          155.75 ± 4735.715
B          449.25 ± 4735.715
C          507.25 ± 4735.715
D          24 ± 4735.715
A-B        −293.5 ± 3348.656
A-C        −351.5 ± 3348.656
A-D        131.75 ± 3348.656
B-C        −58 ± 3348.656
B-D        425.25 ± 3348.656
C-D        483.25 ± 3348.656

From Table 1.4, at confidence level 0.95, there is no significant effect for any impurity under one dose of gamma ray irradiation, and there is also no significantly different effect between any two impurities. Compared with the studentized maximum modulus and Tukey simultaneous confidence intervals, the margin of error is largest for Scheffé's method.

1.6 Bonferroni Method

1.6.1 Inference for Bonferroni Method

The Bonferroni inequality is the basic theorem behind the Bonferroni adjustment. Moreover, the Bonferroni inequality can be obtained from Boole's inequality, so it is necessary to introduce Boole's inequality concisely.

Boole's inequality: if P is a probability function, then for any sets $A_1, A_2, \ldots$,

$$P(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} P(A_i), \quad \text{if } \sum_{i=1}^{\infty} P(A_i) < \infty.$$

For the one-way model

$$Y_i = \mu_i + \varepsilon_i, \quad i = 1, 2, \ldots, k,$$

where $\varepsilon_i$ follows $N(0, \sigma_i^2)$, suppose $\hat\mu_i$ and $s_i^2$ are the estimators of $\mu_i$ and $\sigma_i^2$, respectively. Then, under the assumption that $Y_i$ is independent of $s_i^2$, $i = 1, \ldots, k$, we have the t statistic

$$T_i = \frac{\hat\mu_i - \mu_i}{s_i} \sim t_{\nu_i}, \quad i = 1, \ldots, k, \qquad (1.16)$$

which follows a t distribution with $\nu_i$ degrees of freedom. Thus, for an individual $\mu_i$, the $100(1-\alpha)\%$ confidence interval is

$$\mu_i \in \hat\mu_i \pm t_{\alpha/2,\,\nu_i}\,s_i.$$

To test the hypotheses $H_{0i}: \mu_i = 0$ versus $H_{1i}: \mu_i \neq 0$, for i = 1, 2, ..., k, there are k t-tests. For each hypothesis, controlling the significance level at $\alpha$, we have

$$P(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true}) = \alpha.$$

If all k hypotheses are tested simultaneously, the familywise error rate (the probability that at least one true $H_{0i}$ is rejected) can exceed $\alpha$. Thus, the significance level for each individual test should be adjusted. Suppose the significance level for each test is $\alpha^{\star}$, and let $E_i^c$ denote the event $(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true})$. Then

$$P(\cup_{i=1}^{k} (T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true})) = P(\cup_{i=1}^{k} E_i^c) \le \sum_{i=1}^{k} P(E_i^c) = kP(T_i \text{ rejects } H_{0i} \mid H_{0i} \text{ is true}) = k\alpha^{\star}, \qquad (1.17)$$

where the inequality in (1.17) is from Boole's inequality. Setting $k\alpha^{\star} = \alpha$ gives $\alpha^{\star} = \frac{\alpha}{k}$.

Therefore, a set of conservative 100(1−α)% simultaneous confidence intervals for µ1, ..., µk is:

$$\mu_i \in \hat\mu_i \pm t_{\alpha/2k,\,\nu_i}\,s_i, \quad i = 1, \ldots, k. \qquad (1.18)$$

If the variances $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$, and the variables

$$T_i = \frac{|\hat\mu_i - \mu_i|}{s_i}, \quad i = 1, \ldots, k, \qquad (1.19)$$

are independent, where $s_1 = s_2 = \cdots = s_k = s$, then the product inequality method is applicable. In that case, $100(1-\alpha)\%$ simultaneous confidence intervals for $\mu_1, \ldots, \mu_k$ are $\mu_i \in \hat\mu_i \pm t_{[1-(1-\alpha)^{1/k}]/2,\,\nu_i}\,s$ for i = 1, ..., k, where $s^2$ is the estimator of $\sigma^2$. Compared with the product inequality confidence intervals, the Bonferroni inequality confidence intervals are always more conservative, because

$$(1-\alpha)^{1/k} < 1 - \alpha/k \qquad (1.20)$$

for all $\alpha$ and $k > 1$. However, both are too conservative if there are too many treatments to compare. For example, for 95% simultaneous confidence intervals for $\mu_1, \ldots, \mu_{100}$ with $\nu = 10$ degrees of freedom, the critical value of the t-statistic in the Bonferroni method is $t_{.05/200,10}$, which makes the confidence intervals too wide and the estimates too conservative. In this situation, Scheffé's or Tukey's method may be applied when appropriate.
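A short R comparison of the two critical values illustrates inequality (1.20); the values k = 6 and nu = 3 anticipate the setting of the next example.

# Bonferroni vs. product-inequality critical values for k simultaneous t intervals.
k <- 6; nu <- 3; alpha <- 0.05
t_bonf <- qt(1 - alpha / (2 * k), df = nu)                # about 6.23
t_prod <- qt(1 - (1 - (1 - alpha)^(1 / k)) / 2, df = nu)  # slightly smaller
c(bonferroni = t_bonf, product = t_prod)                  # Bonferroni is wider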

1.6.2 Example: Bonferroni Method for Crystalline Drug Substance

In Section 1.3.2, there are four groups A, B, C, D. Under the hypotheses $H_0: \mu_i = \mu_j$ versus $H_1: \mu_i \neq \mu_j$ for i, j = A, B, C, D and $i \neq j$, the Bonferroni simultaneous confidence intervals for $\mu_i - \mu_j$ are

$$\hat\mu_i - \hat\mu_j \pm t_{\alpha/2k,\,\nu}\,s_i\sqrt{2/n}.$$

From the data in Table 1.1, the mean and pooled standard deviation estimators are

$$\hat\mu_A = 155.75, \quad \hat\mu_B = 449.25, \quad \hat\mu_C = 507.25, \quad \hat\mu_D = 24, \quad s_i = \hat\sigma = 226.1564.$$

The number of degrees of freedom for the Bonferroni method is $\nu = 3$ and there are $\binom{4}{2} = 6$ simultaneous null hypotheses. Therefore, for overall significance level $\alpha = 0.05$,

$$t_{\alpha/2k,\,\nu} = t_{.05/(2\times 6),\,3} \approx 6.23, \qquad t_{\alpha/2k,\,\nu}\,s_i\sqrt{2/n} = 6.23 \times 226.1564 \times \sqrt{2/4} \approx 996.28.$$

Then, the 95% simultaneous confidence intervals for $\mu_i - \mu_j$ by the Bonferroni method are given in Table 1.5.

Table 1.5: Simultaneous Confidence Interval by Bonferroni Method

Impurity   Simultaneous C.I.
A-B        −293.5 ± 996.28
A-C        −351.5 ± 996.28
A-D        131.75 ± 996.28
B-C        −58 ± 996.28
B-D        425.25 ± 996.28
C-D        483.25 ± 996.28

At confidence level 0.95, there are no significantly different mean effects between any two impurities under one dose of gamma ray irradiation. Both the Bonferroni and Tukey methods deal with all pairwise comparisons; however, compared with Tukey's method in these six comparisons, the margin of error of the Bonferroni method is larger. That is, the Tukey simultaneous confidence intervals are narrower than the Bonferroni ones, because Tukey's method attains the exact overall confidence level. Thus, for all pairwise comparisons, Tukey's method is preferred.

1.7 Nonparametric Approach

The Mann-Whitney test (Wilcoxon rank sum test) is a distribution-free rank sum test. In the two-sample location problem, $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are random samples from two different continuous populations. If the X's and Y's satisfy the following independence assumptions, the Mann-Whitney test can be used:
a) the X's are independent and identically distributed, and the Y's are independent and identically distributed;
b) the X's and Y's are mutually independent.
Let M(X) and M(Y) denote the medians of populations X and Y, respectively, and let

$$\Delta = M(Y) - M(X)$$

be the difference between the population medians. The null hypothesis $H_0$ can then be expressed as

$$H_0: \Delta = 0.$$

The hypothesis asserts that the population medians are equal; equivalently, in treatment comparisons, it means the treatment has no effect. For the two-sided hypotheses $H_0: \Delta = 0$ versus $H_1: \Delta \neq 0$, the Wilcoxon rank sum statistic W is used. To obtain W, we first combine the values of the X's and Y's and rank them from smallest to largest in a joint ordering of all m + n observations. Let $R_j$ denote the rank of $Y_j$ in this joint ordering of the X's and Y's. As a result,

$$W = \sum_{j=1}^{n} R_j. \qquad (1.21)$$

The decision rule is: if $W \ge w_{\alpha/2}$ or $W \le n(m+n+1) - w_{\alpha/2}$, then the null hypothesis is rejected at significance level $\alpha$, where $w_{\alpha/2}$ is the upper $\alpha/2$ percentile critical value of the null distribution of W. Similarly, for the one-sided upper-tail hypothesis test $H_0: \Delta = 0$ versus $H_1: \Delta > 0$, the rejection region is $[w_{\alpha}, \infty)$. Mann and Whitney (1947) proposed the U statistic

$$U = \sum_{i=1}^{m}\sum_{j=1}^{n} \phi(X_i, Y_j), \qquad (1.22)$$

where

$$\phi(X_i, Y_j) = \begin{cases} 1, & \text{if } X_i < Y_j, \\ 0, & \text{otherwise.} \end{cases}$$

In effect, the U statistic is computed by comparing each pair of values $X_i$ and $Y_j$: if the $X_i$ value is smaller, score one for that pair; otherwise, score zero. The sum of the ones is the Mann-Whitney U statistic. Since the Mann-Whitney test is nonparametric, the confidence bound for $\Delta = M(Y) - M(X)$ differs from parametric confidence sets. The mn differences $Y_j - X_i$ can be ordered from least to greatest and denoted $U^{(1)} \le U^{(2)} \le \cdots \le U^{(mn)}$. For the one-sided upper-tail hypotheses $H_0: \Delta = 0$ versus $H_1: \Delta > 0$, the $100(1-\alpha)\%$ confidence interval for $\Delta$ is

$$(U^{(C_\alpha)}, \infty),$$

where $C_\alpha = \frac{n(2m+n+1)}{2} + 1 - w_{\alpha}$ and $U^{(C_\alpha)}$ is the value in position $C_\alpha$ of $(U^{(1)}, U^{(2)}, \ldots, U^{(mn)})$.
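In R, wilcox.test() returns both the test and this one-sided confidence bound. A small sketch with made-up samples (the x and y values are hypothetical):

# One-sided Mann-Whitney test of H0: Delta = 0 vs H1: Delta > 0, with the
# corresponding 95% lower confidence bound for Delta = M(Y) - M(X).
x <- c(4.1, 5.2, 3.8, 6.0, 5.5, 4.7)
y <- c(5.9, 6.4, 7.1, 5.8, 6.6, 7.3)
wilcox.test(y, x, alternative = "greater", conf.int = TRUE, conf.level = 0.95)
# The reported interval has the form (lower bound, Inf), matching (U^{C_alpha}, Inf).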

1.8 Fisher’s Exact Test

1.8.1 Inference Using Fisher’s Exact Test

In the analysis of contingency tables, Fisher's exact test is used as a statistical significance test to determine whether there is an association between categorical variables, especially when the sample sizes are small. Suppose there are two samples; for sample i (i = 1, 2), the number of successes observed in $m_i$ independent Bernoulli trials with success probability $p_i$ is $o_{i1}$. Let $o_{i2}$ stand for the number of failures observed in sample i, $n_1$ for the total number of successes in both samples, $n_2$ for the total number of failures in both samples, and n for the total size of both samples. Then $n_1 = o_{11} + o_{21}$, $n_2 = o_{12} + o_{22}$, and $n = n_1 + n_2 = m_1 + m_2$. This is the typical contingency table, as shown in Table 1.6.

Table 1.6: 2 × 2 Table of Outcomes

           Successes   Failures   Totals
Sample 1   o11         o12        m1
Sample 2   o21         o22        m2
Totals:    n1          n2         n

Fisher's exact test is based on the conditional distribution of $o_{11}$ given the row and column sums $m_1, m_2, n_1, n_2$. The conditional distribution of $o_{11}$ is

$$P(o_{11} = x \mid m_1, m_2, n_1, n_2) = \frac{\binom{m_1}{x}\binom{m_2}{n_1 - x}}{\binom{n}{n_1}}. \qquad (1.23)$$

The conditional probability distribution defined by equation (1.23) is a member of the family of hypergeometric distributions. Note that there is a restriction on the range of x: $\max(0, m_1 + n_1 - n) \le x \le \min(m_1, n_1)$. Equivalently, equation (1.23) can be simplified as

$$P(o_{11} = x \mid m_1, m_2, n_1, n_2) = \frac{m_1!\,m_2!\,n_1!\,n_2!}{n!\,o_{11}!\,o_{12}!\,o_{21}!\,o_{22}!}. \qquad (1.24)$$

Fisher's exact test decides whether $o_{11}$ is significantly small or large with respect to the conditional distribution in equation (1.23). Specifically, for the one-sided upper-tail test $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$, the decision rule is: if $o_{11} \ge \gamma_\alpha$, the null hypothesis is rejected; otherwise, there is not sufficient evidence to reject the null hypothesis. Here $\gamma_\alpha$ is chosen from the conditional distribution so that it satisfies

$$P(o_{11} \ge \gamma_\alpha \mid m_1, m_2, n_1, n_2) = \sum_{x = \gamma_\alpha}^{\min(m_1, n_1)} \frac{\binom{m_1}{x}\binom{m_2}{n_1 - x}}{\binom{n}{n_1}} = \alpha.$$

1.8.2 Example: Python Eggs

Shine et al. (1997) indicate that nest site and maternal care can influence incubation temperatures. To investigate the effect of temperature on whether eggs hatch, they experimentally simulated three temperatures, hot, neutral, and cold, and counted the hatched eggs at each temperature. The main concern is whether the percentages of hatched eggs are the same at different temperatures. Part of the results is given in Table 1.7.

Table 1.7: Hatched Eggs

          Hatched   Not Hatched   Totals
Cold      16        11            27
Neutral   38        18            56
Totals:   54        29            83

Then $n_1 = 54$, $n_2 = 29$, $m_1 = 27$, $m_2 = 56$, $n = 83$. To test whether the probabilities of hatched eggs at the cold and neutral temperatures are the same, the one-sided upper-tail hypotheses $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$ are applied, where $p_1$ stands for the probability of a hatched egg at the cold temperature and $p_2$ at the neutral temperature. Since $\min(m_1, n_1) = \min(27, 54) = 27$ and $\max(0, m_1 + n_1 - n) = \max(0, 27 + 54 - 83) = 0$, for the significance level $\alpha = 0.05$, $\gamma_{0.05}$ should satisfy

$$P(o_{11} \ge \gamma_{0.05} \mid 27, 56, 54, 29) = \sum_{x = \gamma_{0.05}}^{27} \frac{\binom{27}{x}\binom{56}{54 - x}}{\binom{83}{54}} = 0.05.$$

By the decision rule, since the observed $o_{11} = 16$ falls below $\gamma_{0.05}$ (indeed, 16 is below the conditional mean $27 \times 54/83 \approx 17.6$), there is not enough evidence to reject the null hypothesis. That is, the hypothesis that the probabilities of hatched eggs at the cold and neutral temperatures are the same cannot be rejected at the 0.05 significance level.
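In R, the one-sided exact p-value can be computed from the hypergeometric tail, or with fisher.test() directly; a sketch for the data of Table 1.7:

# One-sided upper-tail Fisher's exact test for the python egg data.
eggs <- matrix(c(16, 11, 38, 18), nrow = 2, byrow = TRUE,
               dimnames = list(c("Cold", "Neutral"), c("Hatched", "Not")))
# P(o11 >= 16 | margins): 54 hatched, 29 not hatched, 27 cold eggs drawn.
1 - phyper(16 - 1, m = 54, n = 29, k = 27)
fisher.test(eggs, alternative = "greater")   # same one-sided p-value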

CHAPTER 2

IDENTIFYING THE BEST TREATMENT USING FISHER’S EXACT TEST

2.1 Binary Data

In social science and clinical trials, there are many kinds of categorical data. Specifically, data with only two categorical outcomes are called binary data. For example, if we are only interested in whether a patient has a disease or not, the information on every patient is either healthy or sick; this kind of data is binary. Suppose there are two treatment groups and the probability of being cured of the disease under the ith treatment is $\pi_i$. Then each treatment group can be considered as $n_i$ (the number of people in the ith treatment group) independent repeated Bernoulli trials with success probability $\pi_i$, where i = 1, 2. Since a binomial random variable is the sum of independent Bernoulli variables, $x_{1i}$, the total number of people cured of the disease in group i, follows binomial($n_i, \pi_i$), with $n_i$ fixed. The possible outcomes are summarized in the 2 × 2 contingency Table 2.1. When $\pi_1, \pi_2$ are of interest, the

joint distribution of $(x_{11}, x_{12})$ is needed.

Table 2.1: 2 × 2 Table of Patients

                               Treatment 1   Treatment 2   Totals
Recovery from the disease      x11           x12           m1
No recovery from the disease   x21           x22           m2
Totals:                        n1            n2            n

In practice, whether $\pi_1$ and $\pi_2$ are equal is always of primary interest. That is why Fisher's exact test is used for the hypotheses $H_0: \pi_1 = \pi_2 = \pi$ versus $H_1: \pi_1 > \pi_2$.

The joint probability mass function of (x11, x12) under the null hypothesis is

$$
f(x_{11}, x_{12} \mid \pi) = \binom{n_1}{x_{11}}\pi^{x_{11}}(1-\pi)^{n_1 - x_{11}}\binom{n_2}{x_{12}}\pi^{x_{12}}(1-\pi)^{n_2 - x_{12}}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\pi^{x_{11}+x_{12}}(1-\pi)^{n_1+n_2-(x_{11}+x_{12})}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\pi^{m_1}(1-\pi)^{n-m_1}. \qquad (2.1)
$$

From the joint probability (2.1), $m_1 = x_{11} + x_{12}$ is a sufficient statistic under the null hypothesis. Given $m_1$, the conditional distribution of $x_{11}$ is hypergeometric($n, n_1, m_1$):

$$P(x_{11} = x \mid n, n_1, m_1) = \frac{\binom{n_1}{x}\binom{n - n_1}{m_1 - x}}{\binom{n}{m_1}}. \qquad (2.2)$$

By the marginal totals, $x_{11} \in [\max(0, m_1 - n_2), \min(m_1, n_1)]$, as in Section 1.8.1.

2.2 Odds Ratio

The odds ratio $\varphi$ is a measure that assesses the strength of association in binary data. It is the ratio of the odds of the outcome of primary interest in one group to the odds in another group. The odds ratio for group 1 and group 2 is

$$\varphi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}. \qquad (2.3)$$

Since the hypotheses $H_0: \pi_1 = \pi_2$ versus $H_1: \pi_1 > \pi_2$ are equivalent to $H_0: \varphi = 1$ versus $H_1: \varphi > 1$, the joint distribution of $(x_{11}, x_{12})$ can be rewritten, for given $m_1$, as

$$
f(x_{11}, x_{12} \mid \pi_1, \pi_2, n_1, n_2) = \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\left(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\right)^{x_{11}}(1-\pi_1)^{n_1}\,\pi_2^{m_1}\,(1-\pi_2)^{n_2 - m_1}
= \binom{n_1}{x_{11}}\binom{n_2}{x_{12}}\,\varphi^{x_{11}}\,(1-\pi_1)^{n_1}\,\pi_2^{m_1}\,(1-\pi_2)^{n_2 - m_1}, \qquad (2.4)
$$

and the conditional (noncentral hypergeometric) likelihood is

$$P(x_{11} \mid n_1, m_1, n, \varphi) = \frac{f(x_{11}, m_1 \mid \pi_1, \pi_2, n_1, n_2)}{f(m_1 \mid \pi_1, \pi_2, n_1, n_2)} = \frac{\binom{n_1}{x_{11}}\binom{n - n_1}{m_1 - x_{11}}\,\varphi^{x_{11}}}{\sum_{i = x_l}^{x_u}\binom{n_1}{i}\binom{n - n_1}{m_1 - i}\,\varphi^{i}}, \qquad (2.5)$$

where

$$f(m_1 \mid \pi_1, \pi_2, n_1, n_2) = \sum_{i = x_l}^{x_u}\binom{n_1}{i}\binom{n - n_1}{m_1 - i}\,\varphi^{i} \qquad (2.6)$$

is the summation over all possible values of $x_{11}$, determined by the marginal totals $n_1, n_2, m_1$. Thus, $x_l$ and $x_u$ are

$$x_l = \max(0, m_1 - n_2), \qquad (2.7)$$

$$x_u = \min(m_1, n_1). \qquad (2.8)$$

Therefore, for $H_0: \pi_1 = \pi_2 = \pi$ versus $H_1: \pi_1 > \pi_2$, the exact p value is

$$p_L = \sum_{x = x_{11}}^{x_u} P(x \mid n_1, m_1, n, \varphi). \qquad (2.9)$$

To test the hypotheses at significance level $\alpha$, set

$$\alpha = \sum_{x = x_{11}}^{x_u} P(x \mid n_1, m_1, n, \hat\varphi_L), \qquad (2.10)$$

where $\hat\varphi_L$ is the $100(1-\alpha)\%$ lower confidence bound for $\varphi$. Therefore, the $100(1-\alpha)\%$ confidence interval for $\varphi$ is

$$(\hat\varphi_L, \infty).$$
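A minimal R sketch of this inversion follows; the helper names nchg_upper_tail and phi_lower are ours, not from the dissertation's appendix, and the weights are computed on the log scale for numerical stability.

# Lower 100(1 - alpha)% confidence bound phi_hat_L for the odds ratio,
# obtained by solving (2.10): the upper tail of the noncentral
# hypergeometric distribution (2.5), evaluated at the observed x11.
nchg_upper_tail <- function(phi, x11, n1, n2, m1) {
  supp <- max(0, m1 - n2):min(m1, n1)          # support (2.7)-(2.8)
  lw <- lchoose(n1, supp) + lchoose(n2, m1 - supp) + supp * log(phi)
  w <- exp(lw - max(lw))                       # normalized weights, eq (2.5)
  sum(w[supp >= x11]) / sum(w)                 # P(X >= x11 | phi)
}
phi_lower <- function(x11, n1, n2, m1, alpha = 0.05) {
  # The tail probability increases in phi (assumes x11 above the lower
  # end of the support), so a single root exists.
  uniroot(function(phi) nchg_upper_tail(phi, x11, n1, n2, m1) - alpha,
          lower = 1e-8, upper = 1e8)$root
}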

Now, if there are k treatment groups and two different responses (such as positive and negative), the data are still binary and can be expressed as a 2 × k table as follows:

Table 2.2: 2 × k Table for k Treatments

           Treatment 1   Treatment 2   ···   Treatment k
Positive   x11           x12           ···   x1k
Negative   x21           x22           ···   x2k
Totals:    n1            n2            ···   nk

Here $x_{ij}$, where i = 1, 2 and j = 1, 2, ..., k, stands for the number of positive (i = 1) or negative (i = 2) responses in the jth treatment group. As in the 2 × 2 table, the number of positive responses $x_{1j}$ is distributed as binomial($n_j, \pi_j$), where $n_j$ is fixed and $n_j = x_{1j} + x_{2j}$.

Let $m_{1ij}$ denote the sum of the numbers of positive responses in groups i and j, where i, j = 1, 2, ..., k and $i \neq j$:

$$x_{1i} + x_{1j} = m_{1ij}. \qquad (2.11)$$

Let $m_{2ij}$ denote the sum of the numbers of negative responses in groups i and j, where i, j = 1, 2, ..., k and $i \neq j$:

$$x_{2i} + x_{2j} = m_{2ij}. \qquad (2.12)$$

Notation: $n_i$ is the total number in group i, $n_i = x_{1i} + x_{2i}$; $N_{ij}$ is the total number in groups i and j, $N_{ij} = n_i + n_j = m_{1ij} + m_{2ij}$. Denote by $x^u_{1i}$ and $x^l_{1i}$ the largest and smallest possible numbers of positive responses in group i, respectively; then

$$x^u_{1i} = \min(m_{1ij}, n_i), \qquad x^l_{1i} = \max(0, m_{1ij} - n_j).$$

Testing the hypotheses $H_0: \pi_j = \pi_i = \pi$ versus $H_1: \pi_j > \pi_i$ is equivalent to testing $H_0: \varphi_{ji} = 1$ versus $H_1: \varphi_{ji} > 1$. As in the 2 × 2 case, the $100(1-\alpha)\%$ confidence interval for $\varphi_{ji}$ is

$$(\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ is the lower bound of the confidence interval and satisfies equation (2.13):

$$\alpha = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji}), \qquad (2.13)$$

where

$$P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji}) = \frac{\binom{n_i}{x}\binom{N_{ij} - n_i}{m_{1ij} - x}\,\hat\varphi_{ji}^{\,x}}{\sum_{t = x^l_{1i}}^{x^u_{1i}}\binom{n_i}{t}\binom{N_{ij} - n_i}{m_{1ij} - t}\,\hat\varphi_{ji}^{\,t}}. \qquad (2.14)$$

where niNij −ni ϕˆx x m1ij −x ji P (x|ni, m1ij,Nij, ϕˆji) = u (2.14) Px1i niNij −ni i l ϕˆji i=x1i i m1ij −i 29 2.3 Introduction to Partition

Partition is a common tool in mathematics and statistics, with the following properties: suppose S is a non-empty set and $\theta$ is a partition of S; then the elements of $\theta$ are pairwise disjoint and the union of all elements of $\theta$ equals S, that is, $\bigcup_{A \in \theta} A = S$. For example, suppose $\theta = \{A_1, A_2, A_3, A_4\}$ and $S \neq \phi$, where $\phi$ is the empty set, and

1. $A_i \cap A_j = \phi$ for $i \neq j$, i, j = 1, 2, 3, 4;
2. $\cup_{i=1}^{4} A_i = S$.

Then $\theta$ is a partition of S. In particular, Figure 2.1 illustrates the idea of a partition.

Figure 2.1: Partition of a Set S


Another useful piece of terminology in multiple comparisons is that of a confidence set directed towards a subset of the parameter space, proposed by Hsu and Berger (1999).

Definition (directed towards a set): Let the data Y have a distribution determined by the parameter $\theta$, where $\theta \in \Theta$ and $\Theta$ is the parameter space. A confidence set C(Y) for $\theta$ is directed towards $\Theta^*$, a subset of $\Theta$, if $\Theta^* \subset C(Y)$ or $C(Y) \subset \Theta^*$ for every sample point y.

Since finding the best treatment requires comparing every two treatments, one-sided inference on significant differences is of interest. For example, in the one-way model, a set of interest is $\Theta^* = \{\mu_i - \mu_1 > \delta\}$, where $\delta$ is a pre-defined practical significant difference. Therefore, from the definition above, a confidence interval C(Y) for $\theta = \mu_i - \mu_1$ directed towards $\Theta^*$ has the form $(L(Y), \infty)$, where L(Y), a function of the data Y, is determined by the confidence level $\alpha$. Let D(Y) be any $100(1-\alpha)\%$ confidence set for $\theta$; then a $100(1-\alpha)\%$ confidence set C(Y) for $\theta$ directed towards $\Theta^*$ can be constructed as

$$C(Y) = \begin{cases} D(Y), & \text{if } D(Y) \subset \Theta^*, \\ D(Y) \cup \Theta^*, & \text{otherwise.} \end{cases}$$

Here, since we are only interested in which treatment is better, the pre-defined $\delta$ is set to zero.

2.4 Main Results

Without loss of generality, suppose a higher probability of positive response indicates a better treatment. We do not consider the extreme situations in which a success probability is 0 or 1, since whether a treatment is the best is obvious in those cases. Thus, $\pi_i \neq 0$ and $\pi_i \neq 1$ for all i = 1, 2, ..., k in the following theorems.

Theorem 2.1. For a 2 × k table, if the number of positive responses $x_{1j}$ follows a binomial($n_j, \pi_j$) distribution, let $\theta = (\pi_1, \pi_2, \ldots, \pi_k)$ be the vector of probabilities of positive response, and let $\varphi_{ji}$ be the odds ratio of the positive response in group j versus group i. The $100(1-\alpha)\%$ confidence interval for $\varphi_{ji}$ is

$$(\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ satisfies

$$\alpha = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji})$$

and $x^u_{1j} = \min(m_{1ij}, n_j)$. For an integer J, let M be the largest integer i such that $\hat\varphi_{Ji} \le 1$, if such an integer M exists; otherwise, M = 0. If M = 0, then

$$P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) \ge 1 - \alpha.$$

This means that J is the best treatment.

Proof. First, for any fixed J, the hypotheses $H_0: \pi_J \le \pi_i$ versus $H_1: \pi_J > \pi_i$ are equivalent to $H_0: \varphi_{Ji} \le 1$ versus $H_1: \varphi_{Ji} > 1$. This is because $\pi_i \neq 0$ and $\pi_i \neq 1$ for all i = 1, 2, ..., k and

$$\pi_J \le \pi_i \iff \pi_J - \pi_J\pi_i \le \pi_i - \pi_J\pi_i \iff \pi_J(1-\pi_i) \le \pi_i(1-\pi_J) \iff \frac{\pi_J(1-\pi_i)}{\pi_i(1-\pi_J)} \le 1.$$

If the null hypothesis $H_0$ is rejected in favor of $H_1$, then $\pi_J > \pi_i$. Let $Q = (0, \infty)$; the space $Q^{k-1}$ can be partitioned as follows:

$$
\begin{aligned}
S_k &= (1, \infty)^{k-1} \\
S_{k-1} &= (0, 1] \\
S_{k-2} &= (1, \infty) \times (0, 1] \\
&\;\;\vdots \\
S_i &= (1, \infty)^{k-1-i} \times (0, 1] \\
&\;\;\vdots \\
S_1 &= (1, \infty)^{k-2} \times (0, 1].
\end{aligned}
$$

Then $S_1, S_2, \ldots, S_k$ partition the $Q^{k-1}$ space. Denote

$$\hat\varphi_{J1}, \ldots, \hat\varphi_{J(J-1)}, \hat\varphi_{J(J+1)}, \ldots, \hat\varphi_{Jk}$$

as

$$\hat x_1, \ldots, \hat x_{J-1}, \hat x_J, \ldots, \hat x_{k-1}, \quad \text{and let } \hat x_k = 0,$$

where $\hat\varphi_{J1} = \hat x_1$, $\hat\varphi_{J2} = \hat x_2$, ..., $\hat\varphi_{J(J-1)} = \hat x_{J-1}$, $\hat\varphi_{J(J+1)} = \hat x_J$, ..., $\hat\varphi_{Jk} = \hat x_{k-1}$. Let $C_J(X) = \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i)$; then $C_J(X)$ is a $100(1-\alpha)\%$ confidence set for $\theta$, since if $\theta \in S_i$, then

$$P_\theta(\theta \in C_J(X)) = P_\theta\Big(\theta \in \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i)\Big) = P_\theta(\theta \in (\hat x_i, \infty) \cap S_i) = P_\theta(\theta \in (\hat x_i, \infty)) \ge 1 - \alpha.$$

If M = 0, then $\hat\varphi_{Jt} > 1$ for all t = 1, 2, ..., J−1, J+1, ..., k. Therefore, for all i = 1, 2, ..., k−1, $(\hat x_i, \infty) \cap S_i = \phi$, since $\hat x_i > 1$ for all $i \in \{1, 2, \ldots, k-1\}$ and $(\hat x_i, \infty) \cap (0, 1] = \phi$. Thus,

$$
\begin{aligned}
C_J(X) &= \bigcup_{i=1}^{k}((\hat x_i, \infty) \cap S_i) \\
&= \Big\{\bigcup_{i=1}^{k-1}\big((\hat x_i, \infty) \cap (1, \infty)^{k-i-1} \times (0, 1]\big)\Big\} \cup \{(\hat x_k, \infty) \cap S_k\} \\
&= (\hat x_k, \infty) \cap S_k \qquad (2.15) \\
&= S_k = (1, \infty)^{k-1}. \qquad (2.16)
\end{aligned}
$$

Then,

$$
\begin{aligned}
P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) &= P(\omega : \pi_J(\omega) \ge \pi_j(\omega) \text{ for all } j \text{ in } 1, 2, \ldots, k) \\
&= P(\omega : \pi_J(\omega) \ge \pi_1(\omega), \ldots, \pi_J(\omega) \ge \pi_{J-1}(\omega), \pi_J(\omega) \ge \pi_{J+1}(\omega), \ldots, \pi_J(\omega) \ge \pi_k(\omega)) \\
&\ge P(\omega : \varphi_{J1}(\omega) > 1, \ldots, \varphi_{J(J-1)}(\omega) > 1, \varphi_{J(J+1)}(\omega) > 1, \ldots, \varphi_{Jk}(\omega) > 1) \\
&= P(\omega : \varphi^* \in (1, \infty)^{k-1}) \\
&\ge 1 - \alpha, \qquad (2.17)
\end{aligned}
$$

where $\varphi^* = (\varphi_{J1}, \ldots, \varphi_{J(J-1)}, \varphi_{J(J+1)}, \ldots, \varphi_{Jk})^T$. From the result (2.16), we have

$$P(\omega : \varphi^* \in (1, \infty)^{k-1}) \ge P(X : \theta \in C_J(X)) \ge 1 - \alpha,$$

so the inequality (2.17) holds. Thus,

$$P(\omega : \pi_J(\omega) = \max_{j=1,\ldots,k}\pi_j(\omega)) \ge 1 - \alpha.$$

That is, J is the best treatment.

Theorem 2.2. For a 2 × k table, assume the number of positive responses $x_{1j}$ follows a binomial($n_j, \pi_j$) model. Denote by $\theta = (\pi_1, \pi_2, \ldots, \pi_k)$ the vector of probabilities of positive response, and let $\varphi_{ji}$ be the odds ratio of the positive response in group j versus group i. The $100(1-\alpha/k)\%$ confidence interval for $\varphi_{ji}$ is

$$\varphi_{ji} \in (\hat\varphi_{ji}, \infty),$$

where $\hat\varphi_{ji}$ satisfies

$$\alpha/k = \sum_{x = x_{1j}}^{x^u_{1j}} P(x \mid n_i, m_{1ij}, N_{ij}, \hat\varphi_{ji})$$

and $x^u_{1j} = \min(m_{1ij}, n_j)$. For each J, let M be the largest integer i such that $\hat\varphi_{Ji} \le 1$, if such an M exists; otherwise, M = 0. Let $I_j(Y) = (1, \infty)^{k-M-1} \cap (\hat\varphi_{jM}, \infty)$. Then

$$P(Y : \theta \in \cap_{j=1}^{k} I_j(Y)) \ge 1 - \alpha.$$

Proof. From the definition of M, M is a function of J. To prove this theorem, we first prove that $P(Y : \theta \in I_j(Y)) \ge 1 - \alpha/k$ for any j in 1, 2, ..., k. The $Q^{k-1}$ space can be partitioned as before:

$$
\begin{aligned}
S_k &= (1, \infty)^{k-1} \\
S_{k-1} &= (0, 1] \\
S_{k-2} &= (1, \infty) \times (0, 1] \\
&\;\;\vdots \\
S_i &= (1, \infty)^{k-1-i} \times (0, 1] \\
&\;\;\vdots \\
S_1 &= (1, \infty)^{k-2} \times (0, 1].
\end{aligned}
$$

Then $S_1, \ldots, S_k$ partition the $Q^{k-1}$ space. Let $C_j(Y) = \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i)$; then $C_j(Y)$ is a $100(1-\alpha/k)\%$ confidence set for $\theta$, since if $\theta \in S_i$, then

$$P_\theta(\theta \in C_j(Y)) = P_\theta\Big(\theta \in \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i)\Big) = P_\theta(\theta \in (\hat\varphi_{ji}, \infty) \cap S_i) = P_\theta(\theta \in (\hat\varphi_{ji}, \infty)) \ge 1 - \alpha/k.$$

Notice that the random variable M has three properties here:
a) if i > M (if such i exists), then $(\hat\varphi_{ji}, \infty) \cap S_i = \phi$, since $(\hat\varphi_{ji}, \infty) \cap (0, 1] = \phi$;
b) if i < M (if such i exists), then $S_i \subset (1, \infty)^{k-M}$, since $(1, \infty)^{k-i-1} \times (0, 1] \subset (1, \infty)^{k-M}$;
c) $(\hat\varphi_{jM}, \infty) \supset (1, \infty)$, since $\hat\varphi_{jM} \le 1$.

Therefore,

$$
\begin{aligned}
C_j(Y) &= \bigcup_{i=1}^{k}((\hat\varphi_{ji}, \infty) \cap S_i) \\
&= \bigcup_{i=1}^{M}((\hat\varphi_{ji}, \infty) \cap S_i) \qquad (2.18) \\
&\subseteq \{(\hat\varphi_{jM}, \infty) \cap S_M\} \cup (1, \infty)^{k-M} \qquad (2.19) \\
&\subseteq \{(\hat\varphi_{jM}, \infty) \cap (1, \infty)^{k-M-1} \times (0, 1]\} \cup \{(1, \infty)^{k-M} \cap (\hat\varphi_{jM}, \infty)\} \qquad (2.20) \\
&= (\hat\varphi_{jM}, \infty) \cap (1, \infty)^{k-M-1} \\
&= I_j(Y).
\end{aligned}
$$

Here, equation (2.18) comes from property a), and the inclusions (2.19) and (2.20) result from properties b) and c), respectively. Thus,

$$P(Y : \theta \in I_j(Y)) \ge P(Y : \theta \in C_j(Y)) \ge 1 - \frac{\alpha}{k}$$

for any j in 1, 2, ..., k. Then, letting

$$I_j^c(Y) \triangleq (I_j(Y))^c,$$

we have

$$\sup_\theta P_\theta(Y : I_j^c(Y)) = 1 - \inf_\theta P_\theta(Y : I_j(Y)) \le 1 - \Big(1 - \frac{\alpha}{k}\Big) = \frac{\alpha}{k}.$$

By the Bonferroni inequality,

$$P_\theta(\cup_{j=1}^{k} I_j^c(Y)) \le \sum_{j=1}^{k} P_\theta(I_j^c(Y)) = k\cdot\frac{\alpha}{k} = \alpha.$$

Thus,

$$P(Y : \theta \in \cap_{j=1}^{k} I_j(Y)) = 1 - P(Y : \theta \in \cup_{j=1}^{k} I_j^c(Y)) \ge 1 - \alpha.$$

The procedure for Theorem 2.2 is given in Section 2.5.2. If the screening procedure stops at stage T (T ≤ k) where M = 0, then treatment J = k − T + 1 is the best one, since

$$P_\theta(\cup_{j=1}^{T} I_j^c(Y)) \le P_\theta(\cup_{j=1}^{k} I_j^c(Y)) \le \sum_{j=1}^{k} P_\theta(I_j^c(Y)) = k\cdot\frac{\alpha}{k} = \alpha.$$

Thus,

$$P(Y : \theta \in \cap_{j=1}^{T} I_j(Y)) = 1 - P(Y : \theta \in \cup_{j=1}^{T} I_j^c(Y)) \ge 1 - \alpha.$$

2.5 Procedures

2.5.1 Procedure for Theorem 2.1

Suppose the better treatment has the larger probability of positive response. For a 2 × k table, the procedure obtained from applying Theorem 2.1 has k steps, since M = 0. For a fixed index J, the procedure reads:

Step 1

If $\hat\varphi_{Jk} > 1$, where $\hat\varphi_{Jk}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_k, m_{1kJ}, N_{kJ}, \hat\varphi_{Jk}),$$

then claim $\pi_J > \pi_k$ and go to Step 2. Else, claim that treatment J is not the best one and stop.

Step 2

If $\hat\varphi_{J(k-1)} > 1$, where $\hat\varphi_{J(k-1)}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_{k-1}, m_{1(k-1)J}, N_{(k-1)J}, \hat\varphi_{J(k-1)}),$$

then claim $\pi_J > \pi_{k-1}$ and go to Step 3. Else, claim that treatment J is not the best one and stop.

...

Step k-M

If $\hat\varphi_{J(k-M)} > 1$, where $\hat\varphi_{J(k-M)}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_{k-M}, m_{1(k-M)J}, N_{(k-M)J}, \hat\varphi_{J(k-M)}),$$

then claim $\pi_J > \pi_{k-M}$ and go to Step k-M+1. Else, claim that treatment J is not the best one and stop.

...

Step k-1

If $\hat\varphi_{J1} > 1$, where $\hat\varphi_{J1}$ satisfies

$$\alpha = \sum_{x = x_{1J}}^{x^u_{1J}} P(x \mid n_1, m_{11J}, N_{1J}, \hat\varphi_{J1}),$$

then claim $\pi_J > \pi_1$ and go to Step k. Else, claim that treatment J is not the best one and stop.

Step k

Claim that treatment J is the best one.

Here, if M ≠ 0, the number of steps is less than k and the procedure does not conclude that J is the best treatment. However, Theorem 2.1 still controls the familywise type I error of the whole procedure within $\alpha$ for every value of M.
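A compact R sketch of this k-step screening for a fixed J follows, reusing the hypothetical phi_lower() helper from Section 2.2 (both helper names are ours).

# Screen candidate J against every other treatment at level alpha;
# x1 = vector of positive counts, n = vector of group sizes.
best_is_J <- function(J, x1, n, alpha = 0.05) {
  k <- length(x1)
  for (i in setdiff(k:1, J)) {           # Steps 1, 2, ..., k-1
    m1 <- x1[J] + x1[i]                  # m_{1iJ}, eq (2.11)
    if (phi_lower(x1[J], n[J], n[i], m1, alpha) <= 1)
      return(FALSE)                      # claim J is not the best; stop
  }
  TRUE                                   # Step k: claim J is the best
}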

2.5.2 Procedure for Theorem 2.2

The following k + 1 stages describe the procedure of Theorem 2.2. Note that treatments could be screened in random order; for convenience, we screen in order from treatment k down to treatment 1 as an example here.

Stage 1

Step 1

If $\hat\varphi_{k(k-1)} > 1$, where $\hat\varphi_{k(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_{k-1}, m_{1(k-1)k}, N_{(k-1)k}, \hat\varphi_{k(k-1)}),$$

then claim $\pi_k > \pi_{k-1}$ and go to Step 2. Else, claim that treatment k is not the best one and go to Stage 2.

Step 2

If $\hat\varphi_{k(k-2)} > 1$, where $\hat\varphi_{k(k-2)}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_{k-2}, m_{1(k-2)k}, N_{(k-2)k}, \hat\varphi_{k(k-2)}),$$

then claim $\pi_k > \pi_{k-2}$ and go to Step 3. Else, claim that treatment k is not the best one and go to Stage 2.

...

Step k-M

If $\hat\varphi_{kM} > 1$, where $\hat\varphi_{kM}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_M, m_{1Mk}, N_{Mk}, \hat\varphi_{kM}),$$

then claim $\pi_k > \pi_M$ and go to Step k-M+1. Else, claim that treatment k is not the best one and go to Stage 2.

...

Step k-1

If $\hat\varphi_{k1} > 1$, where $\hat\varphi_{k1}$ satisfies

$$\alpha/k = \sum_{x = x_{1k}}^{x^u_{1k}} P(x \mid n_1, m_{11k}, N_{1k}, \hat\varphi_{k1}),$$

then claim $\pi_k > \pi_1$ and go to Step k. Else, claim that treatment k is not the best one and go to Stage 2.

Step k

Claim that treatment k is the best one and stop.

Stage 2

Step 1

If $\hat\varphi_{(k-1)k} > 1$, where $\hat\varphi_{(k-1)k}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_k, m_{1k(k-1)}, N_{k(k-1)}, \hat\varphi_{(k-1)k}),$$

then claim $\pi_{k-1} > \pi_k$ and go to Step 2. Else, claim that treatment k-1 is not the best one and go to Stage 3.

Step 2

If $\hat\varphi_{(k-1)(k-2)} > 1$, where $\hat\varphi_{(k-1)(k-2)}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_{k-2}, m_{1(k-2)(k-1)}, N_{(k-2)(k-1)}, \hat\varphi_{(k-1)(k-2)}),$$

then claim $\pi_{k-1} > \pi_{k-2}$ and go to Step 3. Else, claim that treatment k-1 is not the best one and go to Stage 3.

...

Step k-M

If $\hat\varphi_{(k-1)M} > 1$, where $\hat\varphi_{(k-1)M}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_M, m_{1M(k-1)}, N_{M(k-1)}, \hat\varphi_{(k-1)M}),$$

then claim $\pi_{k-1} > \pi_M$ and go to Step k-M+1. Else, claim that treatment k-1 is not the best one and go to Stage 3.

...

Step k-1

If $\hat\varphi_{(k-1)1} > 1$, where $\hat\varphi_{(k-1)1}$ satisfies

$$\alpha/k = \sum_{x = x_{1(k-1)}}^{x^u_{1(k-1)}} P(x \mid n_1, m_{11(k-1)}, N_{1(k-1)}, \hat\varphi_{(k-1)1}),$$

then claim $\pi_{k-1} > \pi_1$ and go to Step k. Else, claim that treatment k-1 is not the best one and go to Stage 3.

Step k

Claim that treatment k-1 is the best one and stop.

...

Stage k-1

Step 1

If $\hat\varphi_{2k} > 1$, where $\hat\varphi_{2k}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_k, m_{1k2}, N_{k2}, \hat\varphi_{2k}),$$

then claim $\pi_2 > \pi_k$ and go to Step 2. Else, claim that treatment 2 is not the best one and go to Stage k.

Step 2

If $\hat\varphi_{2(k-1)} > 1$, where $\hat\varphi_{2(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_{k-1}, m_{1(k-1)2}, N_{(k-1)2}, \hat\varphi_{2(k-1)}),$$

then claim $\pi_2 > \pi_{k-1}$ and go to Step 3. Else, claim that treatment 2 is not the best one and go to Stage k.

...

Step k-M

If $\hat\varphi_{2M} > 1$, where $\hat\varphi_{2M}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_M, m_{1M2}, N_{M2}, \hat\varphi_{2M}),$$

then claim $\pi_2 > \pi_M$ and go to Step k-M+1. Else, claim that treatment 2 is not the best one and go to Stage k.

...

Step k-1

If $\hat\varphi_{21} > 1$, where $\hat\varphi_{21}$ satisfies

$$\alpha/k = \sum_{x = x_{12}}^{x^u_{12}} P(x \mid n_1, m_{112}, N_{12}, \hat\varphi_{21}),$$

then claim $\pi_2 > \pi_1$ and go to Step k. Else, claim that treatment 2 is not the best one and go to Stage k.

Step k

Claim that treatment 2 is the best one and stop.

Stage k

Step 1

If $\hat\varphi_{1k} > 1$, where $\hat\varphi_{1k}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_k, m_{1k1}, N_{k1}, \hat\varphi_{1k}),$$

then claim $\pi_1 > \pi_k$ and go to Step 2. Else, claim that treatment 1 is not the best one and go to Stage k+1.

Step 2

If $\hat\varphi_{1(k-1)} > 1$, where $\hat\varphi_{1(k-1)}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_{k-1}, m_{1(k-1)1}, N_{(k-1)1}, \hat\varphi_{1(k-1)}),$$

then claim $\pi_1 > \pi_{k-1}$ and go to Step 3. Else, claim that treatment 1 is not the best one and go to Stage k+1.

...

Step k-M

If $\hat\varphi_{1M} > 1$, where $\hat\varphi_{1M}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_M, m_{1M1}, N_{M1}, \hat\varphi_{1M}),$$

then claim $\pi_1 > \pi_M$ and go to Step k-M+1. Else, claim that treatment 1 is not the best one and go to Stage k+1.

...

Step k-1

If $\hat\varphi_{12} > 1$, where $\hat\varphi_{12}$ satisfies

$$\alpha/k = \sum_{x = x_{11}}^{x^u_{11}} P(x \mid n_2, m_{121}, N_{21}, \hat\varphi_{12}),$$

then claim $\pi_1 > \pi_2$ and go to Step k. Else, claim that treatment 1 is not the best one and go to Stage k+1.

Step k

Claim that treatment 1 is the best one and stop.

Stage k+1

Claim that there is no best treatment.

To summarize the k + 1 stages: if the procedure stops at stage T, where 1 ≤ T ≤ k, then treatment k + 1 − T is the best treatment; if T = k + 1, there is no best treatment.
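Under the same assumptions as before, the staged screening of Theorem 2.2 can be sketched in R by running the previous check at the adjusted level $\alpha/k$ for J = k, k−1, ..., 1 (reusing the hypothetical best_is_J() helper above).

# Stages 1 through k: screen treatment k first, then k-1, and so on,
# each comparison at level alpha/k; stage k+1 returns NA.
select_best <- function(x1, n, alpha = 0.05) {
  k <- length(x1)
  for (J in k:1) {
    if (best_is_J(J, x1, n, alpha / k)) return(J)
  }
  NA                                     # no best treatment found
}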

2.6 Simulation

In this section, we use simulation to confirm the coverage probability of the new selection procedures with Fisher's exact test. Without loss of generality, suppose the higher the response probability, the better the treatment. First, for Theorem 2.1, we want to test whether the first treatment is the best one. To generate the 2 × k table, random binomial data are generated. Here, let k = 5 and use different numbers of trials n to confirm that the procedures depend on the sample size. The success probabilities for the five treatments are 0.53, 0.1, 0.2, 0.3, and 0.4, respectively. The number of iterations is set to 10,000. At the significance level α = 0.05, the overall coverage probability of the new selection procedure is:

Table 2.3: Coverage Probability with C.L.=.95 and Different Trail Numbers

P = (P1,P2,P3,P4,P5) Coverage Probability Trail number n (0.53,0.1,0.2,0.3,0.4) 0.9521 90 (0.53,0.1,0.2,0.3,0.4) 0.9567 95 (0.53,0.1,0.2,0.3,0.4) 0.9611 100 (0.53,0.1,0.2,0.3,0.4) 0.9697 105

As Table 2.3 shows, all the coverage probabilities of the best treatment are greater than 1 − α = 1 − .05 = 0.95 which agrees with Theorem 2.1. Moreover, as the number of trials for all the binomial distribution treatment increases, the coverage probability also increases. That is, the Fisher’s exact test for the best selection of treatment is very sensitive to the trial numbers. The next, considering different position of pre-specified best treatment and there are only four treatments in total, the result is shown on the Table 2.4 as the other conditions remain the same. Table 2.4: Coverage Probability with Trial Number n = 90 and Different Orders

P = (P1,P2,P3,P4) Coverage Probability Best Treatment Position (.53,.2,.3,.4) 0.9515 1st (.2,.53,.3,.4) 0.9534 2nd (.2,.3,.53,.4) 0.9519 3rd (.2,.3,.4,.53) 0.957 4th

From Table 2.4, no matter what position is the pre-specified best treatment, the coverage probabilities are close to .95. Therefore, the procedure could always get the best treatment with the confidence level 0.95. For Theorem 2.2, the best treatment is unknown. To select the best treatment, with- out loss of generality, let k = 6 and the overall confidence interval 100(1 − α)% = 95%. 48 Then the individual confidence level for odds ratio should be 1 − α/k = 0.9917. The number of trials and iterations are set as 100 and 10000, respectively. For linearly de- creasing response probability P = (P1,P2,P3,P4,P5,P6) = (.53,.4,.35,.3,.2,.1), the best treatment is the first one and coverage probability is 0.9581. For U-shape response prob- ability P = (P1,P2,P3,P4,P5,P6) = (.7,.45,.3,.12,.32,.58), the best treatment is the first one and coverage probability is 0.9545. As for inverted-U shape response probability P =

(P1,P2,P3,P4,P5) = (.1,.2,.3,.5,.63,.4), the best treatment is the fifth one and coverage probability is 0.9491. For logarithmic response probability P = (P1,P2,P3,P4,P5,P6) = (0, 0.301, 0.477, 0.602, 0.7, 0.81) the best treatment is the sixth one and coverage probability is 0.9581. In conclusion, the result can be shown as in Table 2.5.

Table 2.5: Coverage Probability with Trial Number n = 100 and Different Response Shapes

P = (P1,P2,P3,P4,P5,P6) Coverage Probability Best Treatment (.53,.4,.35,.3,.2,.1) 0.9581 1st (.7,.45,.3,.12,.32,.58) 0.9545 1st (.1,.2,.3,.5,.63,.4) 0.9491 5th (0,0.301,0.477,0.602,0.7,0.81) 0.9581 6th 49

CHAPTER 3

IDENTIFYING THE BEST TREATMENT USING MANN-WHITNEY TEST

3.1 Simultaneous Inference with Mann-Whitney Test

For distribution free data, there are no assumptions for the probability distributions of variables. Furthermore, there are many different kinds of nonparametric methods in data analysis. For example, Kaplan-Meier method is used in life-time data, Kruskal-Wallis method tests whether the samples from the same distribution or not by ranks, Mann-Whitney test is also a popular test in nonparametric statistics. As stated in the introduction, there are several assumptions for Mann-Whitney test, like the independent observations. Another important assumption is location-shift. To test two distributions the same or not, if the alternative hypothesis is location shift, then this assumption is called location assumption in Mann-Whitney test. Take the one-sided test

for example, there are (xi) and (yj) independent samples drawn from two populations X and Y with distribution F1(x) and F2(y), i=1,2,...,n and j=1,2,...,m. The null hypothesis is: 50

H0 : F1(t) = F2(t) for all t, the alternative hypothesis is: H1 : F1(t) = F2(t + ∆), for every t, where ∆ is the location parameter. Therefore, the null hypothesis is that Y has the same distribution as X, and the alternative hypothesis is that Y has the same distribution as X with location shift ∆. Then the inference median can be used to substitute the distribution. Suppose M(X) and M(Y) are the medians of X and Y, respectively, then the location

parameter will be ∆ = M(Y )−M(X). Thus, the hypotheses are: H0 : ∆ = 0 vs H1 : ∆ > 0. In other words, if two samples have the same distribution, then the equality of median follows.

To obtain the 100(1 − α)% confidence interval for ∆ in the hypotheses H0 : ∆ = 0 vs

H1 : ∆ > 0, the lower confidence bound for ∆ should be obtained first. Therefore, the first step is to order the values of yj − xi from the smallest to the largest, which are denoted as U 1 ≤ U 2 ≤ · · · ≤ U mn. The second step is to get the position of the lower bound, which is:

n(2m + n + 1) C = + 1 − w (3.1) α 2 α

where the wα is the upper α percentile of the null distribution of the Wilcoxon Rank Sum statistic. Therefore, the 100(1 − α)% confidence interval for ∆ is:

(U Cα , ∞)

Cα where U is the Cαth position in the list of mn ordered difference yj − xi.

3.2 Large-Sample Approximation

From Mann and Whitney (1947), under the null hypothesis assumption H0: F1(t) = F2(t) for

all t, if n and m are large enough, then the integer Cα is approximately normally distributed. The standardized z-score here is 51

C − µ Z = α cα σcα where

mn µ = is the mean and cα 2 r mn(m + n + 1) σ = is the standard deviation. cα 12

C − mn That is, α 2 is asymptotically distributed to standardized normal distribution q mn(m+n+1) 12 N(0, 1). Thus, for the one-sided Wilcoxon rank sum test with H0 : ∆ = 0 against the alternative H1 : ∆ > 0, if sample sizes m,n are large, the integer Cα can be approximately by the normal distribution: q mn mn(m+n+1) Cα ≈ 2 − zα 12 .

Here, the location shift parameter ∆ is very important. Lehmann (1963c) proposed ∆U −∆L 2Z α 2 ˆ as an estimator for the asymptotic standard deviation of ∆, where (∆L, ∆U ) is the two-sided

cα mn+1−cα confidence interval for ∆ with confidence level 1 − α and ∆L = U , ∆U = U , and

n(2m+n+1) Cα = 2 + 1 − wα. Furthermore, Lehmann (1963b) has made the conclusion that the estimator for asymptotically distribution-free confidence interval for ∆ is the one centered at

ˆ U (cα)+U mn+1−cα ∆. Generally, the midpoint of the interval (∆L, ∆U ) which is 2 is a reasonable estimator of ∆.

3.2.1 Example

Dohle et al. (2009), evaluated the effect of mirror therapy(MT) in severe hemiparesis which is the weakness on the limbs on one side of the body. In the results, motor activities of daily living (motor ADL) is also measured to test whether the mirror therapy is effect or not. The motor ADL data is given in Table 3.1. 52 Table 3.1: Mirror Therapy

Initial(Before MT) Final(after MT) 43.9 60.8 48.3 66.6 42.3 58.4 47.5 65.8 36.7 50.1 41.3 54.8

Let the motor activities of daily living before mirror therapy be X and the motor activities of daily living after mirror therapy be Y. The confidence interval for location shift parameter ∆ is the confidence interval for the median difference between X and Y. Let α = 0.047 so

that the confidence level is 0.953. With m=n=6, the upper 0.047 percentile point w0.047 of the null distribution of W is 50. Then we obtain

6(2×6+6+1) C0.047 = 2 + 1 − w0.047 = 8.

8 Thus, the lower confidence bound ∆L = U = 10.1, where U’s are the 36 order numbers of the difference Y-X. As a result, the 95.3% confidence interval for ∆ is (10.1, +∞).

Since ∆L > 0, that is, zero is not included in this confidence interval. Therefore, there is sufficient evidence to support that the mirror therapy is effective with .047 significance level. Next, use the large approximate method to analyze this example.

r mn mn(m + n + 1) C ≈ − z α 2 α 12 √ = 18 − 1.674665 39

= 7.54

≈ 8.

8 Then, the lower confidence bound ∆L = U = 10.1 and the 95.3% confidence interval for ∆ is (10.1, +∞), which are the same as the exact Wilcoxon rank sum test. 53 To sum up, the Wilcoxon rank sum test is useful when the data is distribution-free and sample size is small. For the mirror therapy(MT) in severe hemiparesis example, if the paired-t test is used, it causes some mistakes, since the sample size is not large enough to satisfy the normal assumption. From Shapiro-Wilk Normality Test for the initial data, the p-value is 0.7633. Therefore, t test here is not quite as appropriate as the Wilcoxon rank sum test.

3.3 Main Results

Lemma 3.1. Let θ be the population center of symmetry (such as median, mean). If zi

zi+zj i = 1, 2, ..., n, are the data, then W = 2 for all 1 ≤ i ≤ j ≤ n are the Walsh averages. For any significance level α, we have

P (W(cα) < θ < W(t α )) = 1 − α 2

where

+ T : Wilcoxon signed rank (the sum of the positive signed ranks for zi);

+ α t α is chosen so that P (T ≥ t α ) = ; 2 2 2 n(n+1) c = − t α + 1. α 2 2

− Proof. Let T be the sum of the negative signed ranks for zi. Since the sum of the positive ranks equals the number of Walsh averages greater than the hypothesized θ, and the sum of the negative ranks equals the number of Walsh averages less than the hypothesized θ, then 54

P (W(cα) ≥ θ) = P (there are at most cα − 1 positive ranks) (3.2)

+ = P (T ≤ cα − 1)

+ n(n + 1) = P (T ≤ − t α ) 2 2 + = P (T ≥ t α ) (3.3) 2 α = 2

α P (W(t α ) ≤ θ) = P ( at least t negative ranks) (3.4) 2 2 − = P (T ≥ t α ) 2

+ = P (T ≥ t α ) (3.5) 2

+ n(n + 1) = P (T ≤ − t α ) (3.6) 2 2 α = 2

Therefore,

P (W(cα) < θ < W(t α )) = 1 − P (W(cα) ≥ θ) − P (W(t α ) ≤ θ) (3.7) 2 2 α α = 1 − − 2 2 = 1 − α.

Lemma 3.2. Given two sets of data (xi) and (yj), where i=1,2,...,n. j=1,2,...,m, the values

1 2 mn of yj − xi from the smallest to the largest are denotes as U ≤ U ≤ · · · ≤ U . Let θx, θy be the medians of X and Y. Then given any significance level α,

cα P (U < θy − θx) = 1 − α 55 where

n(n+1) cα = nm + 2 + 1 − wα and under Mann-Whitney statistic U, statistic wα satisfies

P (U > wα) = α

Proof. Based on the Lemma 3.1, T + is the sum of positive ranks, then

cα P (U < θy − θx) = P (cα < T+)

= 1 − P (T+ ≤ cα − 1) n(n + 1) = 1 − P (T ≤ mn − (w − )) + α 2 n(n + 1) = 1 − P (T > w − ) + α 2 n(n + 1) = 1 − P (T + > w ) (3.8) + 2 α

= 1 − P (U > wα) (3.9)

= 1 − α.

Lemma 3.3. For any positive constant c and independent random variables zi, i = 1, 2, ..., k

Pk P (|zi − zj| < c, i < j, i, j = 1, 2, ..., k) = i=1 P (0 < zt − zj < c, j 6= t, j = 1, 2, ..., k).

Proof. Let At = zt > zj, j 6= t for t = 1, 2, ..., k. Then A1,A2, ..., At constitute a partition of 56 the sample space.

P (|zi − zj| < c, i < j, i, j = 1, 2, ..., k) k [ \ = P ( ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.10) i=1 k X \ = P ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.11) i=1 \ = kP ({|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At)) (3.12)

= kP ({0 < zt − zj < c, j 6= t, j = 1, 2, ..., k}) (3.13)

= kP ({0 < z1 − zj < c, j = 2, ..., k}) (3.14)

T To show the equation (3.13), let event A be {|zi −zj| < c, i < j, i, j = 1, 2, ..., k} At and event B be {0 < zt − zj < c, j 6= t, j = 1, 2, ..., k}. Here, we will show that P (A) = P (B).

Firstly, show A ⊂ B. For any element w ∈ A, that is, w ∈ {|zi − zj| < c, i < j, i, j = T 1, 2, ..., k} At), since At implies that zt is the largest among z1, z2, ..., zk,then zt − zj > 0 for j 6= t. Therefore, w ∈ B. Thus, A ⊂ B.

Next, show B ⊂ A. For any element w ∈ B, that is, w ∈ {0 < zt − zj < c, j 6= t, j =

1, 2, ..., k}, implies that zt is the largest one among z1, ..., zk (w ∈ At). Meanwhile, for any i, j, we have 0 < zt − zj < c and 0 < zt − zi < c. Thus, −c < zi − zj < c. Then, |zi − zj| < c. T Therefore, w ∈ {|zi − zj| < c, i < j, i, j = 1, 2, ..., k} At). That is, w ∈ A. Thus, B ⊂ A. Then A = B and P (A) = P (B).

Theorem 3.1. For independent and distribution-free data (xij), where i=1,2,...,k and j=1,2,...,ni, let Mi be the median of the ith treatment, θ = (M1,M2, ..., Mk) be the vector of median of the k treatments. The hypotheses: H0 : Mj = Mi versus H1 : Mj > Mi, are equivalent to 57

H0 : ∆ji = 0 versus H1 : ∆ji > 0, where ∆ji = Mj − Mi. The 100(1 − α)% confidence

Cji nj (2ni+nj +1) for ∆ji is (U , ∞), where Cji = 2 + 1 − wα and wα is the upper α percentile of

Cji the null distribution of the Wilcoxon Rank Sum statistic. Let U be the Cjith position in

1 2 nj ni the list of njni increasing ordered difference xjh − xir ∈ {U ≤ U ≤ · · · ≤ U } where

(h = 1, 2, ..., nj and r = 1, 2, ..., ni). For any J, let T be the largest integer i (1 ≤ i ≤ k) such

that U CJi ≤ 0. Otherwise, T=0. If for a fixed J, T=0, then

P (ω : MJ = max Mj) ≥ 1 − α j=1,2,...,k

That is, J is the best treatment with confidence level 1 − α

Proof. The Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k−1 Then, S1,S2, ..., Sk partition the R space. Denote

U CJ1 ,U CJ2 , ..., U CJ(J−1) ,U CJ(J+1) , ..., U CJk as

xˆ1, ..., xˆJ−1, xˆJ , ..., xˆk−1 and letx ˆk = −∞,

CJ1 CJ2 C C CJk where U =x ˆ1,U =x ˆ2, ··· ,U J(J−1) =x ˆJ−1,U J(J+1) =x ˆJ , ··· ,U =x ˆk−1. 58 Sk Let CJ (X) = i=1((ˆxi, ∞) ∩ Si), then CJ (X) is 100(1 − α)% confidence set for θ. Since if θ ∈ Si, then

k [ Pθ(θ ∈ CJ (X)) = Pθ(θ ∈ ((ˆxi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆxi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆxi, ∞))

≥ 1 − α

If T=0, then for all 1 ≤ i ≤ k − 1, (ˆxi, ∞) ∩ Si = φ, sincex ˆi > 0 and (ˆxi, ∞) ∩ (−∞, 0] = φ. Thus,

k [ CJ (X) = ((ˆxi, ∞) ∩ Si) i=1 k−1 [ k−i−1 [ = { ((ˆxi, ∞) ∩ (1, ∞) × (−∞, 0])} {(ˆxk, ∞) ∩ Sk} i=1

= (ˆxk, ∞) ∩ Sk (3.15)

= Sk

= (0, ∞)k−1. (3.16) 59 Therefore,

P (ω : MJ = max Mj) = P (ω : MJ = Mj for some j in 1, 2, ..., k) j=1,...,k

= P (ω : MJ ≥ M1,MJ ≥ M2, ...,

MJ ≥ MJ−1,MJ ≥ MJ+1, ..., MJ ≥ Mk)

= P (ω : ∆J1 ≥ 0, ∆J2 ≥ 0, ..., ∆J(J−1) ≥ 0,

∆J(J+1) ≥ 0, ..., ∆Jk ≥ 0)

≥ P (ω : ∆∗ ∈ (0, ∞)k−1) e ≥ 1 − α, (3.17)

∗ T where ∆ = (∆J1, ..., ∆J(J−1), ∆J(J+1), ..., ∆Jk) . e It is a result from the (3.16):

∗ k−1 P (ω : ∆ ∈ (0, ∞) ) ≥ P (X : θ ∈ CJ (X)) e ≥ 1 − α.

The inequality (3.17) holds. Thus,

P (ω : MJ = max Mj) ≥ 1 − α. j=1,...,k

That means that the treatment J is the best one.

With the same setting of data (xij) and hypotheses, we now turn to a result involving any treatment J, when J is not fixed.

Cji Theorem 3.2. The 100(1 − α/k)% confidence interval for ∆ji is (U , ∞), where Cji =

nj (2ni+nj +1) 2 + 1 − wα/k and wα/k is the upper α/k percentile of the null distribution of the

Cji Wilcoxon Rank Sum statistic. Then U is the Cjith position in the list of njni increasing 60

1 2 nj ni 1 2 nj ni ordered difference xjh − xir ∈ {U ,U , ··· ,U }, where {U ≤ U ≤ · · · ≤ U }, h =

1, 2, ..., nj and r = 1, 2, ..., ni. For any J, let T be the largest integer i (1 ≤ i ≤ k) such that

U CJi ≤ 0. Otherwise, T=0. Screening J in (1, 2, ..., k), if exists J such that T=0, then J is the best treatment with confidence level 1 − α.

Proof. As the procedure in Section 3.4.2, if the screening procedure stop at stage H (H ≤ k), where T = 0, then the treatment J = k-H+1 is the best one. Meanwhile, every stage is the

CJT k−T −1 procedure in Theorem 3.1. Then, from Theorem 3.1, IJ (Y ) = (U , ∞)∩(0, ∞) is the confidence set for θ = (M1,M2, ..., Mk) at the stage corresponding with screening treatment J,

J ∈ (1, 2, ..., k). Thus, first, we need to prove P (Y : θ ∈ IJ (Y )) ≥ 1−α/k for any J in 1,2,...,k.

T Then, adjust with the Bonferroni inequality to indicate that P (Y : θ ∈ ∩j=1Ij(Y )) ≥ 1 − α. The Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k k−1 k−1 Since ∪i=1Si = R and Si ∩ Sj = φ for i 6= j, then S1, ..., Sk partition the R space. Denote

U CJ1 ,U CJ2 , ..., U CJ(J−1) ,U CJ(J+1) , ..., U CJk as

xˆ1, ..., xˆJ−1, xˆJ , ..., xˆk−1 and letx ˆk = −∞. 61

CJ1 CJ2 C C CJk where U =x ˆ1,U =x ˆ2, ··· ,U J(J−1) =x ˆJ−1,U J(J+1) =x ˆJ , ··· ,U =x ˆk−1. Sk Let CJ (Y ) = i=0((ˆxi, ∞) ∩ Si), then CJ (Y ) is 100(1 − α/k)% confidence set for θ. Since if θ ∈ Si, then

k [ Pθ(θ ∈ CJ (Y )) = Pθ(θ ∈ ((ˆxi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆxi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆxi, ∞))

≥ 1 − α/k.

Notice that there are three properties here:

a) if i > T ( if such i exists), then (ˆxi, ∞) ∩ Si = φ ,since (ˆxi, ∞) ∩ (−∞, 1] = φ

k−T k−i−1 k−T b) if i < T (if such i exists), then Si ⊂ (0, ∞) since (0, ∞) × (−∞, 0] ⊂ (0, ∞) .

c) (ˆxT , ∞) ⊃ (0, ∞). Therefore,

k [ CJ (Y ) = ((ˆxi, ∞) ∩ Si) i=1 T [ = ((ˆxi, ∞) ∩ Si) (3.18) i=1 [ k−T ⊆ {(ˆxT , ∞) ∩ ST } (0, ∞) (3.19)

k−T −1 [ k−T ⊆ {(ˆxT , ∞) ∩ (0, ∞) × (−∞, 0]} {(0, ∞) ∩ (ˆxT , ∞)} (3.20)

k−T −1 = (ˆxT , ∞) ∩ (0, ∞)

= IJ (x)

Here, the equation (3.18) comes from the property a); The subsets (3.19) and (3.20) result 62 from property b) and c), respectively. Thus,

P (Y : θ ∈ IJ (x)) ≥ P (Y : θ ∈ CJ (Y )) α ≥ 1 − k

for any J in 1,2,...,k. Let

c c IJ (Y ) , (IJ (Y )) , then, we have:

c sup Pθ(Y : θ ∈ IJ (Y )) = 1 − inf Pθ(Y : θ ∈ IJ (Y )) θ θ α ≤ 1 − (1 − ) k α = . k

By Bonferroni inequality,

k k c X c P (Y : θ ∈ ∪J=1IJ (Y )) ≤ P (Y : θ ∈ IJ (Y )) J=1 α = k k = α.

Thus,

k k c P (Y : θ ∈ ∩J=1IJ (Y )) = 1 − P (Y : θ ∈ ∪J=1IJ (Y ))

≥ 1 − α. 63 Since H ≤ k, then:

H c k c P (Y : θ ∈ ∪J=1IJ (Y )) ≤ P (Y : θ ∈ ∪J=1IJ (Y )) k X c ≤ P (Y : θ ∈ IJ (Y )) J=1 α = k k = α.

Thus,

H H c P (Y : θ ∈ ∩J=1IJ (Y )) = 1 − P (Y : θ ∈ ∪J=1IJ (Y ))

≥ 1 − α.

3.4 Procedures

3.4.1 Procedure for Theorem 3.1

Suppose the better treatment has larger larger median, for data (xij) where i=1,2,...,k stands for k treatments with ni size for ith treatment, applying Theorem 3.1 to produce the proce- dure which will have k steps, since T=0. And the k steps are in the Figure 3.1.

3.4.2 Procedure for Theorem 3.2

The procedure for Theorem 3.2 is at most k stage which repeat k-1 steps in procedure for Theorem 3.1 for each J=1,2,...,k. Without loss of generality, we screen from treatment k till treatment 1, although the screening order can be random. It would stop at the stage where 64 T=0. Figures 3.2, 3.3 and 3.4 show the whole procedure of Theorem 3.2. 65 3.5 Simulation

Wilcoxon Mann-Whitney test is a nonparametric method to test the difference of population medians. Still, without loss of generality, suppose the higher response median, the better treatment. Here, we will simulate the new procedure to verify the coverage probability with Wilcoxon Mann-Whitney test. As for the Theorem 3.1, let the first treatment be the largest median response. That is, the first treatment is the best one. Suppose there are four treatments (k=4). The number of iterations is set to 10,000. Considering that the normal distribution has the same mean and median, the multivariate normal data with correlation equals zero and variance

equals one, which is N4(µ, Σ = I), is generated. With different sample sizes n, the overall e coverage probability of the new procedure with the first treatment as the best at significance level α = 0.05 is shown on the Table 3.2. To be consistent, let the medians be M =

(M1,M2,M3, ..., Mk), where k is number of populations. Table 3.2: Coverage Probability with C.L.=.95 and Different Sample Sizes

M = (M1,M2,M3,M4) Coverage Probability Sample size n (3.9,3,2,1) 0.9555 30 (3.9,3,2,1) 0.964 32 (3.9,3,2,1) 0.976 34 (3.9,3,2,1) 0.981 35

As seen from Table 3.2, the new procedure with Wilcoxon Mann-Whitney test is very sensitive to sample size. The coverage probability increases as the sample size increases. Meanwhile, all of the coverage probabilities are at the confidence level of 0.95. The next, considering different position of pre-specified best treatment and there are only four treatments in total. With the other conditions remain the same, the result is presented on the Table 3.3. As the pre-specified treatment is known, the coverage probability will keep the same confidence level at 0.95 no matter what position order it is in the overall treatments. 66 Table 3.3: Coverage Probability with C.L.=.95 and Sample Size n=30

M = (M1,M2,M3,M4) Coverage Probability Best Treatment Position (3.9,3,2,1) 0.9555 1st (3,3.9,2,1) 0.952 2nd (3,2,3.9,1) 0.954 3rd (3,2,1,3.9) 0.959 4th

When it comes to the Theorem 3.2, there is no pre-specified best treatment. To verify the new procedure’s overall coverage in selecting the best one, we use several different shape data to simulate. Here, let k =5 and the number of iterations is set to 10,000. Similarly, the multivariate distribution is generated. With the confidence level 0.95, we consider the cov- erage probability for the linearity decreasing median response M = (M1,M2,M3,M4,M5) = (5.1, 4, 3, 2, 1). The new procedure shows that the best treatment is the first one and cov- erage probability is 0.953. For the U-shape median response M = (M1,M2,M3,M4,M5) = (7.1, 3.5, 2, 4, 6), the best treatment is the first one and the coverage probability is 0.957.

As for the inverted U-shape median response M = (M1,M2,M3,M4,M5) = (2, 3.5, 7.1, 4, 6), the coverage probability from new procedure with Wilcoxon Mann-Whitney test is 0.958 and the best treatment is the third one. From the logarithm median response M =

(M1,M2,M3,M4,M5) = (1.10, 2.05, 2.64, 3.1, 4.1), the coverage probability is 0.959 and the best treatment is the last one. These results are summarized in Table 3.4.

Table 3.4: Coverage Probability with Sample Size n=30 and Different Median Shapes

M = (M1,M2,M3,M4,M5) Coverage Probability Best Treatment (5.1,4,3,2,1) 0.953 1st one (7.1,3.5,2,4,6) 0.957 1st one (2,3.5,7.1,4,6) 0.958 3rd one (1.10,2.05,2.64,3.1,4.2) 0.959 4th one

From Table 3.4, the new procedure selects the best treatment at the confidence level 0.95, which corresponds to the procedure proved in Theorem 3.2. 67 Step 1

?

If U CJk > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ nk

nJ (2nk+nJ +1) and CJk = 2 + 1 − wα Yes No ? ? CJk Assert ∆Jk > U and J Assert ∆Jk > 0 and goes to Step 2 is the not best treatment; Stop

? Step 2

? If U CJk−1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ nk−1

nJ (2nk−1+nJ +1) and CJk−1 = 2 + 1 − wα Yes No ? ? CJk−1 Assert ∆Jk−1 > U and J Assert ∆Jk > 0 and goes to Step 3 is not the best treatment; Stop

? . .

? Step k-1

? If U CJ1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nJ n1

nJ (2n1+nJ +1) and CJ1 = 2 + 1 − wα Yes No ? ? CJ1 Assert ∆J1 > U and J Assert ∆J1 > 0 and goes to Step k is not the best treatment; Stop

? Step k - Assert treatment J is the best

Figure 3.1: Procedure for Theorem 3.1 Stage 1 (J=k) 68 Step 1

?

If U Ckk−1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nknk−1

nk(2nk−1+nk+1) and Ckk−1 = 2 + 1 − wα Yes No ? ?

Ck(k−1) Assert ∆k(k−1) > U and k is Assert ∆k(k−1) > 0 and goes to Step 2 the not best treatment; Go to Stage 2

? Step 2

? If U Ckk−2 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nknk−2

nk(2nk−2+nk+1) and Ckk−2 = 2 + 1 − wα Yes No ? ?

Ck(k−2) Assert ∆k(k−2) > U and k is Assert ∆k(k−2) > 0 and goes to Step 3 not the best treatment; Go to Stage 2

? . .

? Step k-1

? If U Ck1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nkn1

nk(2n1+nk+1) and Ck1 = 2 + 1 − wα Yes No ? ? Ck1 Assert ∆k1 > U and k is Assert ∆k1 > 0 and goes to Step k not the best treatment; Goes to Stage 2

? Step k - Assert treatment k is the best

Figure 3.2: Procedure for Theorem 3.2 at Stage 1 Stage 2 (J=k-1) 69 Step 1

?

If U C(k−1)k > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nk−1nk

nk−1(2nk+nk−1+1) and C(k−1)k = 2 + 1 − wα Yes No ? ?

C(k−1)k Assert ∆(k−1)k > U and k is Assert ∆(k−1)k > 0 and goes to Step 2 the not best treatment; Go to Stage 3

? Step 2

? If U C(k−1)(k−2) > 0 where U 1 ≤ U 2 ≤ · · · ≤ U nk−1nk−2

nk−1(2nk−2+nk−1+1) and C(k−1)(k−2) = 2 + 1 − wα Yes No ? ?

C(k−1)(k−2) Assert ∆(k−1)(k−2) > U and k-1 Assert ∆(k−1)(k−2) > 0 and goes to Step 3 is not the best treatment; Go to Stage 3

? . .

? Step k-1

? If U C(k−1)1 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n(k−1)n1

nk−1(2n1+nk−1+1) and C(k−1)1 = 2 + 1 − wα Yes No ? ?

C(k−1)1 Assert ∆(k−1)1 > U and k-1 is Assert ∆(k−1)1 > 0 and goes to Step k not the best treatment; Goes to Stage 3

? Step k - Assert treatment k-1 is the best

Figure 3.3: Procedure for Theorem 3.2 at Stage 2 Stage 3 (J=k-2) 70

. . .

Stage k (J=1) Step 1

? If U C1k > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n1nk n1(2nk+n1+1) and C1k = 2 + 1 − wα Yes No ? ? C1k Assert ∆1k > U and 1 is Assert ∆1k > 0 and goes to Step 2 the not best treatment; Stop

? Step 2

? If U C1(k−1) > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n1nk−1 n1(2nk−1+n1+1) and C1(k−1) = 2 + 1 − wα Yes No ? ? C1(k−1) Assert ∆1(k−1) > U and 1 Assert ∆1(k−1) > 0 and goes to Step 3 is not the best treatment; Stop

? . .

? Step k-1

? If U C12 > 0 where U 1 ≤ U 2 ≤ · · · ≤ U n(1)n2 n1(2n2+n1+1) and C12 = 2 + 1 − wα Yes No ? ? C12 Assert ∆12 > U and 1 is Assert ∆12 > 0 and goes to Step k not the best treatment; Stop

? Step k -Assert treatment 1 is the best

Figure 3.4: Procedure for Theorem 3.2 from Stage 3 to Stage k 71

CHAPTER 4

INDENTIFYING THE BEST TREATMENT UNDER NORMALITY

4.1 Multivariate Normal Distribution

For normal distributions, if there is only one-dimension, it is the univariate normal case, where there are two parameters, population mean µ and variance σ2. If there are k- dimensions, it is the multivariate normal version, where µ is a vector of length k and variance becomes a k × k symmetric and positive definite covariance matrix Σ.

A k-dimensional random vector X = (X1,X2, ..., Xk) which has a multivariate normal distribution can be denoted as Nk(µ, Σ), where the k dimensional mean is

µ = [E(X1),E(X2), ..., E(Xk)], and variance-covariance matrix is

Σ = [Cov[Xi,Xj]] i, j = 1, 2, ..., k. 72 The multivariate normal distribution has its density:

1 1 f (x , x , ..., x ) = exp− (x − µ)T Σ−1(x − µ), (4.1) X 1 2 k (2π)k/2|Σ|1/2 2 where |Σ| is the determinant of Σ, Σ−1 is the inverse of Σ and (x − µ)T is the transpose of vector (x − µ). In particular, when k=2, that is, there are only two dimensions for the normal distri- bution, it is called a bivariate normal distribution. Let a two-dimensional random variable

(X,Y) follow N2(µ, Σ). Then   µ  x  µ =   µy   2 σx ρσxσy Σ =    2  ρσxσy σy where ρ is the correlation between X and Y.

2 2 σx and σy are the variance for X and Y, respectively.

Therefore, the density of bivariate normal distribution is :

2 2 1 1 (x − µx) 2ρ(x − µx)(y − µy) (y − µy) fX,Y (x, y) = exp(− [ − + ]). p 2 2 2 2 2πσxσy 1 − ρ 2(1 − ρ ) σx σxσy σy (4.2)

4.2 t-test with Welch Correction

We start with a brief review for the one sample t-test. The t distribution comes from normal

2 and chi-square distributions. If two independent random variables z and χn are standard normal and chi-square with n degrees of freedom, respectively, then the random variable 73 z tn = q 2 χn n follows the t distribution with n degrees of freedom. The probability density function (p.d.f.) of t is

Γ( n+1 ) √ 2 1 fT (t) = n 2 n+1 nπΓ( 2 ) t 2 [( n +1)]

2 where −∞ < t < ∞. Since z and χn are independent, the density of tn is the marginal p.d.f.

2 2 from the joint p.d.f. of z and χn. For convenience, let v denote χn. Then the joint p.d.f. of z and v is

1 z2 1 n v − 2 2 −1 − 2 fZ,V (z, v) = 1/2 e n n v e (4.3) (2π) 2 Γ( 2 )2 where −∞ < z < ∞ and 0 < v < ∞. When testing whether the mean of a normal distributed population with unknown vari- ance has a specified value or not, t test is applicable. Take a simple one sample as an

example, x1, x2, ..., xn is a random sample from normal distribution with mean µx and un-

2 2 known variance σx. Letx ¯ and S be the sample mean and variance, respectively. Then Pn 2 2 i=1(xi−x¯) S = n−1 . Under the null hypothesis: H0 : µ ≤ µ0 versus alternative hypothesis:

x¯−√µ0 H1 : µ > µ0, the test statistic is: T = S n , where T follows a t-distribution with degrees of freedom n-1 when the null hypothesis is true. The above review shows the connection between the variability and the mean of the population. In two populations with the equal variance, the two sample t-test is used. For example,

2 2 x1, x2, ..., xn follow a normal distribution with mean µx and variance σx wherex ¯ and Sx are

sample mean and variance estimators, respectively; y1, y2, ..., ym follow a normal distribution

2 2 with mean µy and variance σy wherey ¯ and Sy are sample mean and variance estimators,

respectively; the hypotheses are H0 : µx ≤ µy versus H1 : µx > µy, then with equal variance

2 2 σx = σy, t-statistic is:

x¯−y¯√−(µx−µy) T = 1 1 S n + m 74 q 2 2 2 (n−1)Sx+(m−1)Sy with degrees of freedom n+m-2, where S = n+m−2 is the estimator of pool variance of the two samples.

The 100(1 − α)% one-sided confidence interval for µx − µy is:

q 1 1 µx − µy > x¯ − y¯ − S n + m tα,n+m−2.

q 1 1 Ifx ¯ − y¯ − S n + m tα,n+m−2 > 0, there is sufficient evidence to reject the null hypothesis

in favor of the alternative hypothesis. That is, we conclude that µx > µy at α significance level. We have discussed the two-sample t-test with the same variance. Now, it is necessary to cover the two-sample t-test with unequal variances. Specifically, in normally distributed populations with different variances, the problem concerning the difference between two population means with unequal variances is called Behrens-Fisher problem. Welch (1947) proposed the Welch-t-test which is considered as an approximate solu- tion to the Behrens-Fisher problem. The main difference between Student’s t-test in equal variance and Welch-t-test in unequal variances is the degrees of freedom. In Welch-t-test, Welch-Satterthwaite equation is used to calculate the effective degrees of freedom. From

2 Satterthwaite (1946), for N samples with estimated variances Si , degrees of freedom vi, and

the linear combination L with fixed values a1, a2, ..., aN :

PN 2 L = i=1 aiSi the distribution of L can be approximated by a chi-square distribution with the number of degrees of freedom v:

PN 2 2 ( i=1 aiSi ) v = (PN a S2)2 . PN i=1 i i i=1 vi

2 2 In two populations with unequal variances σx 6= σy, Welch’s t test is used. Assume that

2 2 (x1, x2, ..., xn) ∼ N(µx, σx) with sample meanx ¯ and sample variance Sx,(y1, y2, ..., ym) ∼

2 2 N(µy, σy) with sample meany ¯ and sample variance Sy . 75 1 1 Thus, a1 = n , a2 = m , a3 = a4 = ... = aN = 0 and v1 = n − 1, v2 = m − 1. Then 2 2 Sx Sy L = n + m and the number of degrees of freedom v is:

2 2 Sx Sy 2 ( n + m ) v = 4 4 . (4.4) Sx Sy n2(n−1) + m2(m−1)

Therefore, the Welch’s t test is:

x¯−y¯−(µx−µy) t = r 2 2 Sx Sy n + m with degrees of freedom v in equation (4.4).

4.3 Simultaneous Inference

4.3.1 Main Results

Without loss of generality, assume that the larger positive response means the better treat-

ment. Let X1 = (x11, x12, ..., x1n1 ), X2 = (x21, x22, ..., x2n2 ), ··· , Xk = (xk1, xk2, ..., xknk ) be the random samples from the mutually independent population 1,2,...,k, respectively. In

addition, the random variable xij follows a normal distribution with mean µi and variance

2 σi , where i = 1, 2, ..., k and j = 1, 2, ..., ni. Specifically, for any two random variables xit and

xjs with i 6= j,(xit, xjs) follows a bivariate normal distribution N2(µ, Σ), where mean vector   2 σi 0 µ = (µi, µj) and variance-covariance matrix Σ =  .  2  0 σj Here, we only discuss new procedures under the assumption that the populations are independent. If the populations are not independent, especially covariances are not zero, the inference for confidence interval becomes complicated. The new procedure for dealing with dependent populations is under future investigation.

Theorem 4.1. Suppose there are k mutually independent populations. Let Xi = (xi1, xi2, ..., xini )

2 be a random sample from the ith population, where xit follows N(µi, σi ), t = 1, 2, ..., ni. With 76

hypotheses H0 : µj ≤ µi vs H1 : µj > µi, Cj,i is the lower bound of 100(1 − α)% one-sided

confidence interval for the mean difference µj − µi. For pre-specified treatment J in 1,2,...,k,

if CJ,i > 0 for all i 6= J, then

P (X : µJ = max µj) ≥ 1 − α. j=1,2,...,k

That is, with confidence level 1 − α, J is the best treatment.

2 Proof. Letx ¯i, Si and ni be the sample mean, sample variance, and sample size for the random

sample Xi, respectively. Since the 100(1−α)% lower bound for one-sided confidence interval

q 2 2 SJ Si for the mean difference µJ − µi, i 6= j, CJ,i =x ¯J − x¯i − tα,v + where tα,v is the upper nJ ni α percentile of t distribution with degrees of freedom v, is compared with zero, then Rk−1 space can be partitioned as follows:

k−1 Sk = (0, ∞)

Sk−1 = (−∞, 0]

Sk−2 = (0, ∞) × (−∞, 0] . .

k−1−i Si = (0, ∞) × (−∞, 0] . .

k−2 S1 = (0, ∞) × (−∞, 0]

k−1 Therefore, S1,S2, ..., Sk partition the R space. Denote

CJ1,CJ2, ..., CJ(J−1),CJ(J+1), ..., CJk as

yˆ1, ..., yˆJ−1, yˆJ , ..., yˆk−1 andy ˆk = −∞

where CJ1 =y ˆ1,CJ2 =y ˆ1, ··· ,CJ(J−1) =y ˆJ−1,CJ(J+1) =y ˆJ , ··· ,CJk =y ˆk−1. Sk Let DJ (X) = i=1((ˆyi, ∞) ∩ Si), then DJ (X) is 100(1 − α)% confidence set for θ = 77

(µJ − µ1, µJ − µ2, ..., µJ − µJ−1, µJ − µJ+1, ..., µJ − µk). Since if θ ∈ Si, then

k [ Pθ(θ ∈ DJ (X)) = Pθ(θ ∈ ((ˆyi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆyi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆyi, ∞))

≥ 1 − α.

Ify ˆi > 0 for all i in 1,2,...k-1, then (ˆyi, ∞) ∩ (−∞, 0] = φ for i in 1,2,...,k-1. Thus,

k [ DJ (X) = ((ˆyi, ∞) ∩ Si) i=1 k−1 [ k−i−1 [ = { ((ˆyi, ∞) ∩ (0, ∞) × (−∞, 0])} {(ˆyk, ∞) ∩ Sk} i=1

= (ˆyk, ∞) ∩ Sk (4.5)

= Sk

= (0, ∞)k−1. (4.6)

Let Λji = µj − µi for all i 6= j and i, j = 1, 2, ..., k, then for the pre-specified J:

P (X : µJ = max µj) = P (X : µJ ≥ µ1, µJ ≥ µ2, ..., j=1,...,k

µJ ≥ µJ−1, µJ ≥ µJ+1, ..., µJ ≥ µk)

= P (X :ΛJ1 ≥ 0, ΛJ2 ≥ 0, ..., ΛJ(J−1) ≥ 0,

ΛJ(J+1) ≥ 0, ..., ΛJk ≥ 0)

≥ P (X :Λ∗ ∈ (0, ∞)k−1) e ≥ 1 − α. (4.7)

∗ T where Λ = (ΛJ1, ..., ΛJ(J−1), ΛJ(J+1), ..., ΛJk) . e 78 From the result (4.6), we have:

∗ k−1 P (X :Λ ∈ (0, ∞) ) = Pθ(θ ∈ DJ (X)) e ≥ 1 − α.

Then the inequality (4.7) holds.

P (X : µJ = max µj) ≥ 1 − α. j=1,2,...,k

That is, with confidence level 1 − α, J is the best treatment.

q 2 2 SJ Si For the critical value tα,v in the lower bound CJ,i =x ¯J − x¯i − tα,v + , if variances nJ ni 2 2 are the same σJ = σi , then the degrees of freedom v equal nJ + ni − 2. If the variances are different, then Welch’s t statistic with degrees of freedom v in equation (4.4) is appropriate.

Similarly, in Theorem 4.2, these rules can be applied in the critical value tα/k,v.

Theorem 4.2. Suppose there are k mutually independent populations. Let Xi = (xi1, xi2, ..., xini )

2 be a random sample from the ith population, where xit follows N(µi, σi ), t = 1, 2, ..., ni. With hypotheses H0 : µj ≤ µi vs H1 : µj > µi, Cj,i is the lower bound of 100(1 − α/k)% one-sided confidence set for the mean difference µj − µi. Screening for J ∈ {1, 2, ..., k}, if CJ,i > 0 for all i 6= J, J is the best treatment with confidence level 1 − α.

q 2 2 SJ Si Proof. Similar to the proof of Theorem 4.1, the lower bound CJ,i =x ¯J −x¯i −tα/k,v + , nJ ni 79 is compared with zero, then Rk−1 space can be partitioned as follows:

k−1 Ak = (0, ∞)

Ak−1 = (−∞, 0]

Ak−2 = (0, ∞) × (−∞, 0] . .

k−1−i Ai = (0, ∞) × (−∞, 0] . .

k−2 A1 = (0, ∞) × (−∞, 0]

k−1 Therefore, A1,A2, ..., Ak partition the R space. Denote

CJ1,CJ2, ..., CJ(J−1),CJ(J+1), ..., CJk as

yˆ1, ..., yˆJ−1, yˆJ , ..., yˆk−1 andy ˆk = −∞ where CJ1 =y ˆ1,CJ2 =y ˆ1, ··· ,CJ(J−1) =y ˆJ−1,CJ(J+1) =y ˆJ , ··· ,CJk =y ˆk−1. As shown on Section 4.3.3, if the procedure stop at stage H where the treatment is J

= k-H+1 and CJ,i > 0 for all i 6= J, then treatment J is the best one. Let T be the largest integer i such that CJi ≤ 0, if T exists. Otherwise, T = 0. Then T is a function of

k−T −1 J. Moreover, IJ (X) = (ˆyJ , ∞) ∩ (0, ∞) is the confidence set for θ = (µJ − µ1, µJ −

µ2, ..., µJ − µJ−1, µJ − µJ+1, ..., µJ − µk) at the stage corresponding with treatment J, J = 1,2,...,k. Sk Let DJ (X) = i=1((ˆyi, ∞)∩Si), then DJ (X) is 100(1−α/k)% confidence set for θ. Since 80 if θ ∈ Si, then

k [ Pθ(θ ∈ DJ (X)) = Pθ(θ ∈ ((ˆyi, ∞) ∩ Si)) i=1

= Pθ(θ ∈ ((ˆyi, ∞) ∩ Si))

= Pθ(θ ∈ (ˆyi, ∞))

≥ 1 − α/k.

Similar to proof of Theorem 3.2, DJ (X) = IJ (X). Therefore, for any J in 1,2,...,k,

Pθ(θ ∈ IJ (X)) = Pθ(θ ∈ DJ (X))

≥ 1 − α/k.

Let

c c IJ (X) , (IJ (X)) , then, we have:

c sup Pθ(X : θ ∈ IJ (X)) = 1 − inf Pθ(X : θ ∈ IJ (X)) θ θ α ≤ 1 − (1 − ) k α = . k 81 By Bonferroni inequality,

k k c X c P (X : θ ∈ ∪J=1IJ (X)) ≤ P (X : θ ∈ IJ (X)) J=1 α = k k = α.

Thus,

k k c P (X : θ ∈ ∩J=1IJ (X)) = 1 − P (X : θ ∈ ∪J=1IJ (X))

≥ 1 − α.

Since H ≤ k, then:

H c k c P (X : θ ∈ ∪J=1IJ (X)) ≤ P (X : θ ∈ ∪J=1IJ (X)) k X c ≤ P (X : θ ∈ IJ (X)) J=1 α = k k = α.

Thus,

H H c P (X : θ ∈ ∩J=1IJ (X)) = 1 − P (X : θ ∈ ∪J=1IJ (X))

≥ 1 − α.

That is, with confidence level 1 − α, J=k-H+1 is the best treatment. 82 4.3.2 Procedure for Theorem 4.1

Suppose that the better treatment has larger mean, for data (xij) where i=1,2,...,k stands for k treatments with sample size n for the ith treatment. Applying Theorem 4.1, if the pre- specified treatment J is the best treatment with 1 − α confidence level, then the procedure will have k steps. And the k steps with the case of equal variance and sample size are stated in Figure 4.1.

4.3.3 Procedure for Theorem 4.2

The procedure for Theorem 4.2 is at most k stages which repeat at most k steps in each stage for each J in 1,2,...,k. Without any loss of generality, the procedure here uses the equal variance in k populations and equal sample size. The Figures 4.2, 4.3 and 4.4 show the whole procedure of Theorem 4.2.

4.4 Simulation

If there are sufficient large of independent random samples, some statistic will be approx- imately normally distributed. In this section, we will use simulation to verify the cov- erage probability of new procedures work under the normality assumption. Still, with- out loss of generality, suppose the greater mean stands for a better treatment and let

2 2 2 2 σ1 = σ2 = ... = σk = σ , n1 = n2 = ... = nk = n. In this design of simulation study, for parameter configurations, we set the population means at several different levels which represent the efficacy of treatments. The number of iterations is set as 10,000. Variance σ2, the scale parameter, is set at different levels to see the effect of the sample variability on the outcome of the theoretical results. All the simulated runs are at a confidence level of 0.95. Moreover, different mean vectors are set to compare the overall coverage probability of the procedure under different scenarios. 83 In what follows, when the pre-specified treatment for comparison is available and k=5, Table 4.1 confirms that the coverage probabilities of the new procedure are close to the nominal coverage probability as shown in Theorem 4.1.

Table 4.1: Coverage Probability with Different Orders under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Best Treatment position (5.9,5,4.9,3,2) 0.9561 1st (5,5.9,4.9,3,2) 0.9567 2nd (5,4.9,5.9,3,2) 0.9529 3rd (5,4.9,3,5.9,2) 0.9519 4th (5,4.9,3,2,5.9) 0.9534 5th

As shown on Table 4.1, the order of pre-specified treatments for comparison will not affect in the procedure. The coverage probability of the procedure is very stable at the 0.95 confidence level.

q 2 Since the endpoint isx ¯i −x¯j −tα,2n−2 n , the coverage probability is highly related to the sample size n. In Table 4.2, we will simulate the coverage probability with different sample sizes as the pre-specified treatment is the first one.

Table 4.2: Coverage Probability with Different Sample Sizes under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Sample Size (5.9,5,4.9,3,2) 0.9561 30 (5,5.9,4.9,3,2) 0.9679 32 (5,4.9,5.9,3,2) 0.9747 34 (5,4.9,3,5.9,2) 0.9781 35

As the sample size increases, the lower bound increases. Thus, the coverage probabilities increase, as shown in Table 4.2. Next, consider the variance. Increasing the variance will counteract the effect of increasing sample size. In other words, if the data to be analyzed has large variability, increasing the sample size is a way to solve this problem when applying the new procedure with normality. 84 In Table 4.3, the coverage probability keeps at the same level as when we increases both of the sample size and the variability simultaneously.

Table 4.3: Coverage Probability with Different Variances under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Variance Sample Size (5.9,5,4.9,3,2) 0.9561 30 1 (5.9,5,4.9,3,2) 0.9548 85 2 (5.9,5,4.9,3,2) 0.9508 250 4 (5.9,5,4.9,3,2) 0.9555 800 8

The ratio between variance and sample size does not remain the same since the critical

value tα,2n−2 changes as the sample size changes. Now, for Theorem 4.2, different shapes of mean responses are considered. The confidence level is still set at 0.95, and the sample size and variance are 30 and 1, respectively. As for k = 5, the individual confidence level is 1 − α/k = 0.99. The simulation results include

the linear mean response µ = (µ1, µ2, µ3, µ4, µ5) = (5.1, 4, 3, 2, 1), U shape mean response

µ = (µ1, µ2, µ3, µ4, µ5) = (7.05, 3.51, 2.78, 4.23, 6.00). inverted-U shape mean response µ =

(µ1, µ2, µ3, µ4, µ5) = (2.78, 3.51, 7.05, 4.23, 6.00), and logarithm shape mean response µ =

(µ1, µ2, µ3, µ4, µ5) = (1.10, 2.05, 2.64, 3.1, 4.15). These are summarized in Table 4.4.

Table 4.4: Coverage Probability with Different Mean Shapes under Normality

µ = (µ1, µ2, µ3, µ4, µ5) Coverage Probability Best Treatment (5.1,4,3,2,1) 0.956 1st (7.05,3.51,2.78,4.23,6.00) 0.9521 1st (2.78,3.51,7.05,4.23,6.00) 0.9508 3rd (1.10,2.05,2.64,3.1,4.15) 0.9517 5th

Table 4.4 shows the corresponding coverage probabilities of Theorem 4.2 in different mean response settings. The coverage probability is close to the confidence level 0.95. This verifies the theoretical derivation in the proof of Theorem 4.2. Compared with the Bonferroni proce- dure in all situations, the multiplicity adjustment for the new procedure here is reduced from 85 k 2 to k. Thus, the power of the new procedure is significantly higher than the Bonferroni procedure. 86 Step 1

? q 2 If CJk > 0 where CJk =x ¯J − x¯k − tα,2n−2SJk n

and SJk is the pool variance between treatment J and k Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than k; Goes to Step 2 J k Jk is the not best treatment; Stop

? Step 2

? q 2 If CJk−1 > 0 where CJk =x ¯J − x¯k − tα,2n−2SJk−1 n

and and SJk−1 is the pool variance between treatment J and k-1 Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than k-1;Goes to Step 3 J k−1 Jk−1 is not the best treatment; Stop

? . .

? Step k-1

? q 2 If CJ1 > 0 where CJk =x ¯J − x¯1 − tα,2n−2SJ1 n

and SJ1 is the pool variance between treatment J and 1 Yes No ? ? Assert µ − µ > C and J Assert treatment J is better than 1;Goes to Step k J 1 J1 is not the best treatment; Stop

? Step k - Assert treatment J is the best

Figure 4.1: Procedure for Theorem 4.1 Stage 1 (J=k) 87 Step 1

? q 2 If Ck,k−1 > 0 where Ckk−1 =x ¯k − x¯k−1 − tα,2n−2Sk,k−1 n

and Sk.k−1 is the pool variance between treatment k and k-1 Yes No ? ?

Assert treatment k is better than k-1; Assert µk − µk−1 > Ck,k−1 and k is Go to Step 2 the not best treatment; Go to Stage 2

? Step 2

? q 2 If Ck,k−2 > 0 where Ck,k−2 =x ¯k − x¯k−2 − tα,2n−2Sk,k−2 n

and Sk,k−2 is the pool variance between treatment k and k-2 Yes No ? ?

Assert treatment k is better than k-1; Assert µk − µk−2 > Ck,k−2 and k is Go to Step 3 not the best treatment;Go to Stage 2

? . .

? Step k-1

? q 2 If Ck,1 > 0 where Ck,1 =x ¯k − x¯1 − tα,2n−2Sk,1 n

and Sk,1 is the pool variance between treatment k and 1 Yes No ? ?

Assert treatment k is better than 1; Assert µk − µ1 > Ck,1 and k is Go to Step k not the best treatment; Go to Stage 2

? Step k - Assert treatment k is the best

Figure 4.2: Procedure for Theorem 4.2 at Stage 1 Stage 2 (J=k-1) 88 Step 1

? q 2 If Ck−1,k > 0 where Ck,k−1 =x ¯k−1 − x¯k − tα,2n−2Sk−1,k n

and Sk−1,k is the pool variance between treatment k-1 and k Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µk > Ck−1,k and k-1 is Go to Step 2 the not best treatment;Go to Stage 3

? Step 2

? q 2 If Ck−1,k−2 > 0 where Ck−1,k−2 =x ¯k−1 − x¯k−2 − tα,2n−2Sk−1,k−2 n

and Sk−1,k−2 is the pool variance between treatment k-1 and k-2 Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µk−2 > Ck−1,k−2 and k-1 Go to Step 3 is not the best treatment;Go to Stage 3

? . .

? Step k-1

? q 2 If Ck−1,1 > 0 where Ck−1,1 =x ¯k−1 − x¯1 − tα,2n−2Sk−1,1 n

and Sk−1,1 is the pool variance between treatment k-1 and 1 Yes No ? ?

Assert treatment k-1 is better than k; Assert µk−1 − µ1 > Ck−1,1 and k-1 is Go to Step k not the best treatment; Go to Stage 3

? Step k - Assert treatment k-1 is the best

Figure 4.3: Procedure for Theorem 4.2 at Stage 2 Stage 3 (J=k-2) 89

. . .

Stage k (J=1) Step 1

? q 2 If C1,k > 0 where C1,k =x ¯1 − x¯k − tα,2n−2S1,k n and S1,k is the pool variance between treatment 1 and k

?Yes? No Assert treatment 1 is better than k; Assert µ1 − µk > C1,k and 1 is Go to Step 2 the not best treatment;Stop

? Step 2

? q 2 If C1,k−1 > 0 where C1,k =x ¯1 − x¯k−1 − tα,2n−2S1,k−1 n and S1,k−1 is the pool variance between treatment 1 and k-1 Yes No ? ? Assert treatment 1 is better than k-1; Assert µ1 − µk−1 > C1(k−1) and 1 Go to Step 3 is not the best treatment;Stop

? . .

? Step k-1

? q 2 If C1,2 > 0 where C1,k =x ¯1 − x¯2 − tα,2n−2S1,2 n and S1,2 is the pool variance between treatment 1 and 2 Yes No ? ? Assert treatment 1 is better than 2; Assert µ1 − µ2 > C12 and 1 is Go to Step k not the best treatment;Stop

? Step k - Assert treatment 1 is the best

Figure 4.4: Procedure for Theorem 4.2 from Stage 3 to Stage k 90

CHAPTER 5

APPLICATIONS IN A PROSTATE CANCER STUDY

5.1 Data Background

Clinical trial data of advanced prostate cancer conducted at M.D. Anderson Cancer Center from December 1998 to January 2006 in Wang et al. (2011) will be used to illustrate practicality of the theorems. Prostate cancer, which is a form of cancer that develops in tissues of the prostate, usually occurs in older men. Thall et al. (2007) indicates that every year there are around 40,000 men suffering from prostate cancer in the United States developing clinically noticeable metastases. That means the cancer cells spread from the prostate to other parts of human’s body. Meanwhile, prostate cancer is the second leading cause of death from cancer in men and is one of the most common cancers in developed countries. Moreover, the rate of prostate cancer is increasing in the developing countries. Thus, the effective therapy for curing prostate cancer is significantly meaningful. To detect prostate cancer, we could use symptoms, physical examination, prostate specific antigen (PSA), or biopsy. In this prostate cancer clinical trial study from M.D. Anderson 91 Cancer Center, prostate specific antigen is applied to indicate the status of prostate cancer. In the traditional dose response studies for clinical trials, patients are randomly assigned to treatments. However, due to the change of health status or toxicity reaction, a sequen- tially randomized trial, which is the selection of best combination of treatments instead of one treatment, is preferred. The dynamic treatment regimes are such a sequentially random- ized trial that consists of different treatments in multiple stages when involving of human subjects. In an advanced prostate cancer study, when a patient is treated with a chemother- apy, symptoms of responses from the chemotherapy such as the change of tumor and adverse reaction play an important role on the decision of the follow-up treatment. As Murphy et al. (2001) pointed out, the dynamic treatment regimes choose the effective treatment for particular patient based on the patient’s need such as medical history and body characters. In two-stage dynamic treatment regimes of advanced prostate cancer, there are four different combinations of treatments in chemotherapy, which are (1) cyclophosphamide, vincristine, and dexamethasone (CVD); (2) ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); (3) weekly paclitaxel, estramustine, and carboplatin (TEC); (4) paclitaxel,estramustine, and etoposide (TEE). The treatment in the follow-up (second) stage is based on the responses of patients on the first chemotherapy treatment. These responses include growing back of solid tumors and metastasize to other body sites and so on. After evaluating the patient’s response to the first treatment, if the response is significantly effective (the success in initial treatment such as the drop of more than 40% prostate specific antigen (PSA) in prostate cancer trails), a higher dose of the same regime will be given. On the other hand, if the dose response is insignificant or severe toxicity occurs, an alternative regimen is proposed. If there are two consecutive successes (the success in initial treatment followed by the second success of 80% drop in PSA) or a total of two failures on a patient, the patient is removed from the study. The flow chart for this prostate cancer study is presented in Figure 5.1. 
Actually, in this case study, the follow-up treatment of the regime is crucial, since the 92 time of treatment for patient is limited and it is not possible to have every treatment on each patient. Therefore, under this scenario, it is desirable to detect the discernible effect of regimen among all permissible regimens. Considering the prostate cancer trial example, the four different combination of treat-

2 ments (CVD, KA/VE, TEE, and TEC) in chemotherapy in two-stage become P4 = 4 × 3 = 12 different regimens constituted by the combinations of different chemotherapies corre- sponding to the order and switches of treatments. These 12 regimens are (CVD,KA/VE), (CVD,TEC), (CVD,TEE), (KA/VE,CVD), (KA/VE,TEC), (KA/VE,TEE), (TEC,CVD), (TEC,KA/VE), (TEC,TEE),(TEE,CVD), (TEE,KA/VE), (TEE,TEC). Notice that the or- der of treatment is a significant key in advanced prostate cancer treatment. Patients with positive responses with the regimen of (KA/VE,CVD) may have different treatment out- comes when the the regimen of (CVD, KA/VE) is applied. Because these are totally dif- ferent in chemotherapies, the efficacy of one regimen (KA/VE,CVD) may not be the same as the other regime (CVD,KA/VE). For example, in (CVD,KA/VE) procedure, as patient gets treatment CVD first, if the patient’s PSA does not drop more than 40%, the treatment KA/VE replaces of CVD. However, in (KA/VE, CVD) procedure, first, it treats the patient with KA/VE and then with CVD. That is, with the first treatment KA/VE, the PSA drops less than 40% and then treatment CVD is substituted. Therefore, it is necessary to identify the best treatment regimen among these 12 reg- imens. The comparison of the effects of the 12 treatment regimes requires the method of simultaneous inference in multiple comparisons. If just considering the family of comparisons consisting of 12 pairs of comparisons, the multiple comparisons with the best or pairwise multiple comparisons are probably useful. However, those methods cannot be applicable, since they do not involve multiple stages that there are different efficacy between the orders. For example, difference of treatment means from CVD to KA/VE in (CVD,KA/VE) may not be the same as the difference of treatment means from KA/VE to CVD in (KA/VE,CVD), because there are sequential effects in chemotherapy. 93 Due to the generalized feature of the Bonferroni correction, directly applying the Bonfer- roni adjustment to all the 12 treatment regimes yields insignificant conclusion on differences of treatment regimens, as shown in Wang et al. (2011). From previous chapters, there are conclusions that the power of new procedure is significantly higher than those of the Bonfer- roni procedure in all situations, since the multiplicity adjustment for Bonferroni procedure

k is 2 and for new procedure is k. In this case, the multiplicity adjustment for Bonferroni is 12 2 = 66. However, the multiplicity adjustment for new procedure is only 12. For the AL prostate cancer trial, there are two limitations on the observational data. One is the relatively small sample sizes in each regime, another is the conservativeness of simultaneous confidence intervals after multiplicity adjustments. Toward the first difficulty, the pseudo-samples from bootstrap method to handle the small sample size and adjust for the unbiasedness of the estimator are used in Wang et al. (2011). For the problem on conservativeness of simultaneous inference, we use partitioning principle to develop more ef- ficient simultaneous confidence sets to cast significant effects in the comparisons of treatment regimes. The measuring scales innovated by Wang et al. (2011) for efficacy and toxicity according to the characterization of patient status after each treatment in each of the multiple stages are binary scores, ordinal score, and log-survival time. The binary score assigns the value one to the patient when the two consecutive per-course responses are favorable, and zero otherwise. The ordinal score essentially the same as the binary score except the value of 0.5 is assigned to the patient when the corresponding therapy achieves one successful course, or two non-consecutive successful courses. The expert score is defined as the mean of the per- course scores while the patient was on a chemotherapy under investigation. In this section, to verify the new procedure without the influence from pseudo-samples by resampling, the pseudo-samples are not generated, although the sample size is small. Therefore, we will use the difference of prostate specific antigen between two consecutive treatments as response instead of the binary score, ordinal score, or expert score. 94 5.2 Main Results

In this section, we will present three results associated with the new procedures in previous Chapters. One is for simultaneous inference based on the knowledge of the treatments used in the chemotherapy in the AL (Androgen-Independent) prostate cancer trail. The other two are theoretical results which improves the conventional approach of all comparisons with the best in multiple comparisons. Based on drug pathology, it is desirable to evaluate the effect of the treatment start- ing with TEC and following with CVD (the regimen of (TEC, CVD)) compared with the treatment regimens starting with CVD (which consist of regimens (CVD, KA/VE), (CVD, TEE), and (CVD, TEC)), although the comparisons of the efficacy of all 12 regimens are of general interest. In what follows, we first describe a confidence set that takes care of the comparisons of a subset of populations in a fashion governed by prior drug information on the treatments.

5.2.1 Theorem Results

Theorem 5.1. Let C1, ..., Ck−1 be the (1 − α)100% confidence intervals for the endpoint

∗ differences µ − µi between one regimen (such as (TEC,CVD)) and other k-1 regimens (such

+ as (CVD,TEE), (CVD,KA/VE), (CVD,TEC)). If Ci ⊂ R for i = 1, 2, ..., k − 1, then, with confidence level 1 − α the specific regimen is the best among all the regimens associated with the index i = 1, 2, ..., k.

∗ Proof. Consider the following set of parameters for comparisons, ηi = µ −µi for i = 1, .., k−1

c with confidence intervals C1,C2,...,Ck−1. Denote Θi = (−∞, 0], Θi = (0, ∞). Also, let c ∗ c T c T T c T c Θk = Θ, Θk = ∅,Ck = (−∞, ∞), and Θi = Θ1 Θ2 ··· Θi−1 Θi for i = 2, ..., k − 1, k

∗ ∗ ∗ ∗ with Θ1 = Θ1, then Θ1, ··· , Θk−1, Θk−1 constitute a partition of the parameter space Θ. Sk T ∗ Let D(Y ) = i=1(Ci(Y ) Θi ), D is a 100(1 − α)% confidence set for θ. This is because ∗ T ∗ if θ ∈ Θi , then Pθ(θ ∈ D(Y )) = Pθ(θ ∈ Ci(Y ) Θi ) = Pθ(θ ∈ Ci(Y )) = 1 − α. 95 Now, starting from i = 1 and then screening up toward i = k in a step-by-step manner,

c c let M be the first integer satisfying Ci(Y ) ⊂ Θi for all i < M, and CM (Y ) * ΘM . For

+ example if Ci ∈ R for all i = 1, ..., k − 1, we have M = k. In this setting, the unionized confidence set can be decomposed as follows.

k [ ∗ D(Y ) = (Ci(Y ) ∩ Θi ) i=1 M−1 k [ ∗ [ [ ∗ = { (Ci(Y ) ∩ Θi )} { (Ci(Y ) ∩ Θi )} (5.1) i=1 i=M k [ ∗ = (Ci(Y ) ∩ Θi ) (5.2) i=M k c c c [ [ c c c c = (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) { (Θ1Θ2 ∩ ... ∩ ΘM ... Θi−1ΘiCi)}(5.3) i=M+1 c c c [ c c c ⊂ (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) (Θ1Θ2 ∩ ... ∩ ΘM ) (5.4)

c c c ⊂ Θ1Θ2 ∩ ... ∩ ΘM−1 (5.5)

The equation (5.1) follows by the decomposition of the union over two different index

+ T − sets. Equation (5.2) is due to the fact that for all i < M, Ci ⊂ Ri so Ci Ri = ∅. The

∗ substitution of all the definition of Θi leads to equation (5.3). Equation (5.4) follows from the c c c Sk c c c c fact that the set Θ1Θ2∩...∩ΘM contains the set of i=M+1(Θ1Θ2∩...∩ΘM ... Θi−1ΘiCi(Y )). Finally, equation (5.5) is due to

c c c c c c (Θ1Θ2 ∩ ... ∩ ΘM−1ΘM CM ) ⊂ (Θ1Θ2 ∩ ... ∩ ΘM−1)

Now, noticing that $M$ is actually a random variable depending on the observations, summarizing the above discussion yields

$$P_\theta\left(\theta \in \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right) \geq P_\theta(\theta \in D(Y)) = 1 - \alpha. \tag{5.6}$$

Now, when $M = k$, on the basis of equation (5.6), we get

\begin{align*}
P\Big(Y: \text{asserting } \mu^* \geq \max_{j=1,\ldots,k-1} \mu_j\Big) &= P\left(Y: \text{asserting } \mu^* \geq \mu_j \text{ for all } j = 1, \ldots, k-1\right) \\
&\geq P\left(Y: \text{asserting } \mu^* > \mu_1, \ldots, \mu^* > \mu_{k-1}\right) \\
&= P\left(Y: \text{asserting } \theta \in \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{k-1}^c\right) \\
&\geq 1 - \alpha.
\end{align*}
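To make the screening step in the proof concrete, the following minimal R sketch walks up from $i = 1$ and stops at the first interval that is not contained in $R^+$; the vector lb of lower bounds is introduced only for illustration (here it holds the expert-score bounds reported in Section 5.2.2).

# Step-down screening of Theorem 5.1 on the lower bounds of the
# one-sided (1 - alpha) intervals for mu* - mu_i, i = 1, ..., k-1.
lb <- c(0.0017, 0.0117, 0.0779)
M <- which(lb <= 0)[1]   # first interval not contained in R+ = (0, Inf)
if (is.na(M)) {
  cat("All intervals lie in R+; assert the specified regimen is the best\n")
} else {
  cat("Screening stops at comparison", M, "and no assertion is made\n")
}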

Theorem 5.1 guarantees the overall coverage probability when making multiple comparisons for $k$ treatment regimens. By taking the drug pathology into consideration, the adjustment for multiplicity is exempted.

The first approach in extending Theorem 5.1 is to generalize the result to test the hypothesis of whether any pre-specified regimen is the best. In this prostate cancer study, we want to test whether the (TEC, CVD) regimen is the best. In this regard, we obtain Theorem 5.2.

Theorem 5.2. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the vector of mean effects of $k$ treatments associated with a set of observations $Y$, and let $D_{i,j}(Y)$ be the $(1-\alpha)100\%$ one-sided confidence set for the mean difference $\theta_i - \theta_j$. To test the hypothesis that a pre-specified treatment $j^*$ is the best among the $k$ treatments: if $D_{j^*,i} \subset R^+$ for all $i = 1, \ldots, j^*-1, j^*+1, \ldots, k$, then $j^*$ is the best treatment with the familywise error rate strongly controlled at level $\alpha$.

Proof. For any $j \in \{1, \ldots, k\}$, the set of parameters for comparisons becomes $\{\eta_i = \mu_j - \mu_i,\ i = 1, \ldots, j-1, j+1, \ldots, k\}$, which contains $k-1$ elements, with confidence intervals $D_{j,1}, \ldots, D_{j,j-1}, D_{j,j+1}, \ldots, D_{j,k}$ for the $k-1$ elements, respectively. Denote $\Theta_t = (-\infty, 0]$ and $\Theta_t^c = (0, \infty)$ for $t = 1, \ldots, k-1$, and let $D_t(Y) = D_{j,t}(Y)$ for $t = 1, \ldots, j-1$ and $D_t(Y) = D_{j,t+1}(Y)$ for $t = j, \ldots, k-1$. Also, let $\Theta_k = \Theta$, $\Theta_k^c = \emptyset$, $D_k = (-\infty, \infty)$, and $\Theta_t^* = \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t$ for $t = 2, \ldots, k-1, k$, with $\Theta_1^* = \Theta_1$. Similar to the proof of Theorem 5.1, $\Theta_1^*, \ldots, \Theta_{k-1}^*, \Theta_k^*$ constitute a partition of the parameter space $\Theta$.

Notice that for the given integer $j$, letting $D(Y) = \bigcup_{t=1}^{k} \left(D_t(Y) \cap \Theta_t^*\right)$, $D$ is a $100(1-\alpha)\%$ confidence set for $\theta$. This is because if the parameter vector $\theta \in \Theta_t^*$ for some $t$, then $P_\theta(\theta \in D(Y)) = P_\theta\left(\theta \in D_t(Y) \cap \Theta_t^*\right) = P_\theta(\theta \in D_t(Y)) \geq 1 - \alpha$.

Starting from $t = 1$ and screening up toward $t = k$ in a step-by-step manner, let $M$ be the first integer satisfying $D_t(Y) \subset \Theta_t^c$ for all $t < M$ and $D_M(Y) \not\subset \Theta_M^c$. For example, if $D_t \subset R^+$ for all $t = 1, \ldots, k-1$, we have $M = k$. In this setting, the unionized confidence set can be decomposed as follows:

\begin{align}
D(Y) &= \bigcup_{t=1}^{k} \left(D_t(Y) \cap \Theta_t^*\right) \nonumber \\
&= \Big\{\bigcup_{t=1}^{M-1} \left(D_t(Y) \cap \Theta_t^*\right)\Big\} \cup \Big\{\bigcup_{t=M}^{k} \left(D_t(Y) \cap \Theta_t^*\right)\Big\} \tag{5.7} \\
&= \bigcup_{t=M}^{k} \left(D_t(Y) \cap \Theta_t^*\right) \tag{5.8} \\
&= \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \cup \Big\{\bigcup_{t=M+1}^{k} \left(\Theta_1^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t \cap D_t\right)\Big\} \tag{5.9} \\
&\subset \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \cup \left(\Theta_1^c \cap \cdots \cap \Theta_M^c\right) \tag{5.10} \\
&\subset \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c. \tag{5.11}
\end{align}

Equation (5.7) follows from the decomposition of the union over two different index sets. Equation (5.8) is due to the fact that for all $t < M$, $D_t \subset \Theta_t^c = R^+$, so $D_t \cap \Theta_t = \emptyset$ and hence $D_t(Y) \cap \Theta_t^* = \emptyset$. The substitution of the definition of each $\Theta_t^*$ leads to equation (5.9). Equation (5.10) follows from the fact that the set $\Theta_1^c \cap \cdots \cap \Theta_M^c$ contains the set $\bigcup_{t=M+1}^{k} \left(\Theta_1^c \cap \cdots \cap \Theta_{t-1}^c \cap \Theta_t \cap D_t(Y)\right)$. Finally, equation (5.11) is due to

$$\left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c \cap \Theta_M \cap D_M\right) \subset \left(\Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right).$$

Now, similar to the proof of Theorem 5.1, noticing that $M$ is essentially a random variable depending on the observations, we have

$$P_\theta\left(\theta \in \Theta_1^c \cap \cdots \cap \Theta_{M-1}^c\right) \geq P_\theta(\theta \in D(Y)) = 1 - \alpha. \tag{5.12}$$

Note that when $M = k$, equation (5.12) leads to

\begin{align*}
P\Big(Y: \text{asserting } \mu_j \geq \max_{i=1,\ldots,k} \mu_i\Big) &= P\left(Y: \text{asserting } \mu_j \geq \mu_i \text{ for all } i = 1, \ldots, k\right) \\
&\geq P\left(Y: \text{asserting } \mu_j \geq \mu_1, \ldots, \mu_j \geq \mu_{j-1}, \mu_j \geq \mu_{j+1}, \ldots, \mu_j \geq \mu_k\right) \\
&= P\left(Y: \text{asserting } \theta \in \Theta_1^c \cap \Theta_2^c \cap \cdots \cap \Theta_{k-1}^c\right) \\
&\geq 1 - \alpha.
\end{align*}

Notice that the selection of the best treatment regimen is tantamount to multiple comparisons with the best, and the construction of simultaneous confidence intervals is equivalent to the problem of all-pairwise multiple comparisons, in which the studentized-range test is normally applied. However, Theorems 5.1 and 5.2 are grounded on the assumption that either prior information or a specific conjecture on the best regimen is available. When such information is not available, the screening for the optimal regimen becomes a challenging task. The number of pairs is $\binom{k}{2}$ for all-pairwise comparisons. Thus, the multiplicity adjustment for the comparison of so many hypotheses makes conventional approaches too conservative. Theorem 5.3 unveils an approach that controls the familywise error rate and avoids the conservativeness of adjusting for all $\binom{k}{2}$ pairs of hypotheses. By selecting an appropriate comparison strategy, we are able to attain the overall coverage of $1-\alpha$ while only adjusting the pairwise comparisons to the level $1-\alpha/k$.

Theorem 5.3. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the vector of mean effects of $k$ treatments associated with a set of observations $Y$, and let $D_{i,j}(Y)$ be the $(1-\alpha/k)100\%$ one-sided confidence set for the mean difference $\theta_i - \theta_j$. To seek the best treatment among the $k$ treatments: if there exists a treatment $j^*$ such that $D_{j^*,i} \subset R^+$ for all $i = 1, \ldots, j^*-1, j^*+1, \ldots, k$, then $j^*$ is the best treatment in terms of the endpoint $Y$ with confidence level $1-\alpha$.

Notice that when the step-down searching algorithm is applied for a single integer $j$, the resulting simultaneous confidence set has level $1-\alpha$. Now, if each one of the integers in the set $\{1, \ldots, k\}$ is tested and the best treatment regimen is sought, an adjustment for multiplicity is needed to maintain the overall significance level at $1-\alpha$. The application of the Bonferroni adjustment leads to the validity of Theorem 5.3.

Although the selection procedures for Theorems 5.1 and 5.2 are self-evident, the selecting process for Theorem 5.3 is relatively more complicated. For ease of application, we describe a selection procedure based on Theorem 5.3 as follows. The procedure examines each one of the treatment regimens in turn as a control group. First, examine the efficacy differences between the first treatment regimen and each one of the remaining regimens. If the first regimen can be claimed the best, the screening stops; if it cannot, move on to examine the efficacy differences between the second regimen and each one of the remaining regimens, and so on. In this way, the procedure screens all the treatment regimens that may serve as the optimal one. Since each group of comparisons has confidence level $1-\alpha/k$, by the Bonferroni adjustment the overall coverage probability is $1-\alpha$.
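A minimal R sketch of this screening is given below; it is a simplified version of the appendix programs, and the function lower.bound is a hypothetical placeholder for any routine returning the lower limit of a one-sided $(1-\alpha/k)100\%$ confidence interval, for example one extracted from wilcox.exact.

# Screening of Theorem 5.3 at the Bonferroni-adjusted level alpha/k.
# x: a matrix with one column of responses per treatment regimen;
# lower.bound(a, b, level): placeholder returning the lower limit of a
# one-sided confidence interval for the effect difference of a over b.
select.best <- function(x, alpha, lower.bound) {
  k <- ncol(x)
  for (j in 1:k) {
    claimed <- TRUE
    for (i in setdiff(1:k, j)) {
      if (lower.bound(x[, j], x[, i], 1 - alpha/k) <= 0) {
        claimed <- FALSE
        break              # regimen j cannot be claimed the best
      }
    }
    if (claimed) return(j) # j is asserted best at overall level 1 - alpha
  }
  NA                       # no regimen can be asserted the best
}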

5.2.2 Analysis Results

Based on the pathology of the drugs, when the experimenter is interested in the expert-score difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD, the 95% one-sided confidence intervals for $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(0.0017, \infty)$, $(0.0117, \infty)$, and $(0.0779, \infty)$, respectively. They are all subsets of $R^+ = (0, \infty)$. Thus, by Theorem 5.1, we assert that the regimen (TEC,CVD) is superior to any regimen starting with the drug CVD (namely the regimens (CVD, KA/VE), (CVD, TEC), (CVD, TEE)) at the 95% confidence level.

When the binary score and the ordinal score are examined for the efficacy significance of the regimen (TEC,CVD) over the regimens starting with the drug CVD, the one-sided confidence intervals for $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(-0.2799, \infty)$, $(0.0089, \infty)$, and $(0.1076, \infty)$, respectively, for the binary-score difference, and $(-0.0989, \infty)$, $(0.0065, \infty)$, and $(0.1496, \infty)$, respectively, for the ordinal-score difference. In this case, we are unable to claim any significant efficacy difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD. This conclusion coincides with the result of the analysis in Wang et al. (2011) regarding the comparisons of the three endpoints: binary score, ordinal score, and expert score.

Theorem 5.3 is more powerful than the approach of all pairwise comparisons because it reduces the number of comparisons from $\binom{12}{2} = 66$ to $k = 12$. However, due to the variability of the data, the bootstrap-based confidence intervals are unable to identify the regimen with the best efficacy. More powerful methodology grounded on exact inference is under investigation.

Next, for illustrative purposes only, the new procedures under normality and with the Wilcoxon Mann-Whitney test are applied to this prostate cancer study by using the prostate cancer index only. The response value is then the difference in prostate specific antigen (PSA) between two consecutive treatments. From a SAS 9.2 program, the frequency table is presented in Figure 5.2. As the frequency table shows, the sample size for each treatment regimen is very small. Since the response value is not discrete, the new procedure with Fisher's exact test is not considered in this example. We consider only the treatment regimen TEC followed by CVD and the three treatment regimens starting with CVD and followed by KA/VE, TEE, and TEC, respectively. That is, the four treatment regimens here are (TEC,CVD), (CVD,KA/VE), (CVD,TEE), and (CVD,TEC).

For illustrative purposes only, we first use the new procedure under normality to analyze this prostate cancer study. Under the significance level 0.18, the 82% confidence intervals for the mean differences $\mu^* - \mu_i$, $i = 1, 2, 3$, are $(10.35561, \infty)$, $(2.174858, \infty)$, and $(-11.90655, \infty)$, respectively. Not all of the lower endpoints are greater than zero, so we cannot assert that the regimen (TEC,CVD) is superior to the other three treatments at significance level $\alpha = 0.18$.
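For reference, these intervals have the form $(\ell_i, \infty)$, where, matching the appendix program (which uses the standard deviation $s$ of the combined sample in place of a pooled estimate), the lower bound is the one-sided two-sample $t$ bound

$$\ell_i = \bar{y}^* - \bar{y}_i - t_{1-\alpha,\, n^*+n_i-2}\; s \sqrt{\frac{1}{n^*} + \frac{1}{n_i}},$$

and the regimen (TEC,CVD) is asserted superior to regimen $i$ only when $\ell_i > 0$.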

As shown in the QQ-plots for the PSA change under these four treatment regimens in Figure 5.3, the normality assumption is barely satisfied. Thus, the new procedure under normality is not appropriate here, and the new procedure with the Wilcoxon Mann-Whitney test is preferred in this example.

Since the sample size is small for every treatment regimen, if we choose a small significance level such as 0.05, we cannot reach any conclusion. The 95% confidence intervals for the median differences $M^* - M_i$, $i = 1, 2, 3$, are $(-12.9, \infty)$, $(-19.1, \infty)$, and $(-19.4, \infty)$, respectively, for the PSA difference. In this case, we are unable to claim any significant efficacy difference between the regimen (TEC,CVD) and the regimens starting with the drug CVD at the significance level 0.05. To see at what confidence level the regimen (TEC,CVD) can be claimed the best, a larger significance level is chosen. Here, for illustration purposes, we consider the significance level 0.18, that is, the confidence level 0.82.

The 82% confidence intervals for the median differences $M^* - M_i$, $i = 1, 2, 3$, are $(8, \infty)$, $(0.7, \infty)$, and $(1, \infty)$, respectively. The cutoff values 8, 0.7, and 1 are all greater than 0; in other words, the confidence intervals are all subsets of $R^+ = (0, \infty)$. Thus, by Theorem 5.1, we conclude that the regimen (TEC,CVD) is better than the other three regimens (CVD,KA/VE), (CVD,TEC), and (CVD,TEE) at the 82% confidence level.

Specified to this situation, Theorem 5.3 is more powerful than the approach of all pairwise comparisons because it reduces the number of comparisons from $\binom{4}{2} = 6$ to 4. However, due to the variability of the data, which is shown in Figure 5.4, the new procedure with the Wilcoxon Mann-Whitney test is not able to identify the regimen with the best efficacy by screening. In Figure 5.4, trt1 to trt12 stand for (CVD,KA/VE), (CVD,TEC), (CVD,TEE), (KA/VE,CVD), (KA/VE,TEC), (KA/VE,TEE), (TEC,CVD), (TEC,KA/VE), (TEC,TEE), (TEE,CVD), (TEE,KA/VE), and (TEE,TEC), respectively. Moreover, the sample size is small here, which is another reason why we cannot distinguish the best efficacy by screening as in Theorem 5.3.

Figure 5.1: Flow Chart for the Prostate Cancer Study

[Flow chart: a subject receives Treatment A; the per-course response is classified by whether the PSA drops by more than 40% or by at most 40%; one branch leads to more doses and the other to a switch to Treatment B, with the subsequent responses classified by PSA drops of 40% and 80%. These transitions mark the primary interests in the prostate cancer study.]

Figure 5.2: Frequency Table for the Prostate Cancer Data

Output of the SAS FREQ procedure (counts of tr1 by tr2; percentages omitted):

tr1 \ tr2    CVD   KA/VE   TEC   TEE   Total
CVD            0      10     7     7      24
KA/VE          7       0     8     6      21
TEC            5       4     0     7      16
TEE            4       4     6     0      14
Total         16      18    21    20      75

Figure 5.3: QQ-plot for Treatments

[Four panels: normal Q-Q plots of the PSA change for the regimens (CVD,KA/VE), (CVD,TEC), (CVD,TEE), and (TEC,CVD), plotting theoretical quantiles against sample quantiles.]

Figure 5.4: Median and Standard Deviation for 12 Treatments

Output of the SAS MEANS procedure (analysis variable: effect, the PSA difference):

trt      N Obs    Median          Std Dev
trt1       10   -36.0000000    101.8249920
trt2        7   -70.0000000    158.1141194
trt3        7   -20.5000000     40.9436140
trt4        7     1.0000000     27.8955450
trt5        8    -0.5000000     23.5917510
trt6        6     7.9000000     38.5980181
trt7        5     0.4000000     49.5840700
trt8        4     9.5000000     22.3029707
trt9        7     0.6000000     95.7891409
trt10       4    -4.1500000    115.1850215
trt11       4     1.5500000     18.6086897
trt12       6    -4.0000000    343.5024017

BIBLIOGRAPHY

[1] Agresti, A. (2012). Categorical data analysis. John Wiley & Sons.

[2] Baade, P. D., Youlden, D. R. and Krnjacki, L. J. (2009). International epidemiology of prostate cancer: geographical distribution and secular trends. Molecular Nutrition & Food Research, 53(2), 171-184.

[3] Chen, J.T. (2008a). Inference on the minimum effective dose using binary data. Communications in Statistics - Theory and Methods, 37, 2124-2135.

[4] Chen, J. T. (2008b). A two-stage stepwise estimation procedure. Biometrics, 64(2), 406-412.

[5] Dohle, C., Püllen, J., Nakaten, A., et al. (2009). Mirror therapy promotes recovery from severe hemiparesis: a randomized controlled trial. Neurorehabilitation and Neural Repair, 23(3), 209-217.

[6] Cohen, A., Sackrowitz, H. B. (2005). Decision theory results for one-sided multiple comparison procedures. The Annals of Statistics, 33(1), 126-144.

[7] Dmitrienko, A., Tamhane, A. and Bretz, F. (2009). Multiple testing problems in pharmaceutical statistics. Chapman and Hall/CRC Press, Boca Raton, Florida.

[8] Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50(272), 1096-1121.

[9] Edwards, D. G., Hsu, J. C. (1983). Multiple comparisons with the best treatment. Journal of the American Statistical Association, 78(384), 965-971.

[10] Fay, M. P. (2010). Confidence intervals that match Fisher's exact or Blaker's exact tests. Biostatistics, 11(2), 373-374.

[11] Finner, H., Strassburger, K. (2002). The partitioning principle: a powerful tool in multiple decision theory. The Annals of Statistics, 30, 1194-1213.

[12] Fujikoshi, Y., Ulyanov, V.V. and Shimizu, R. (2010). Multivariate statistics. John Wiley & Sons.

[13] Gabriel, K. R. (1978). A simple method of multiple comparisons of means. Journal of the American Statistical Association, 73(364), 724-729.

[14] Genizi, A., Hochberg, Y. (1978). On improved extensions of the T-method of multiple comparisons for unbalanced designs. Journal of the American Statistical Association, 73(364), 879-884.

[15] Glantz, S.A. (2001). Primer of biostatistics. McGraw-Hill/Appleton & Lange.

[16] Hollander, M., Wolfe, D.A. (1999). Nonparametric statistical methods. John Wiley & Sons, New York.

[17] Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6 (2), 65-70.

[18] Hothorn, L. A. (2007). How to deal with multiple treatment or dose groups in randomized clinical trials. Fundamental & Clinical Pharmacology, 21(2), 137-154.

[19] Hsu, J.C., Berger, R.L. (1999). Stepwise confidence intervals without multiplicity adjustment for dose-response and toxicity studies. Journal of the American Statistical Association, 94, 468-482.

[20] Hsu, J.C. (1996). Multiple comparisons: Theory and methods. Chapman and Hall/CRC.

[21] Klockars, J.A., Sax, G. (1986). Multiple comparisons. SAGE.

[22] Toothaker, L.E. (1992). Multiple comparison procedures. SAGE.

[23] Lehmann, E. L. (1963a). Robust estimation in analysis of variance. The Annals of Mathematical Statistics, 34(3), 957-966.

[24] Lehmann, E. L. (1963b). Asymptotically nonparametric inference: An alternative approach to linear models. The Annals of Mathematical Statistics, 34, 1494-1506.

[25] Lehmann, E. L. (1963c). Nonparametric confidence intervals for a shift parameter. The Annals of Mathematical Statistics, 34, 1507-1512.

[26] Lin, Y., Wang, L., Chen, J.T. (2013). A simultaneous approach detecting efficacy of treatment regimes. Submitted.

[27] Mann, H. B., Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50-60.

[28] Marcus, R., Peritz, E., and Gabriel, K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.

[29] Moses, L.E. (1952). A two-sample test. Psychometrika, 17, 234-247.

[30] Parra, M., Rodriguez-Loaiza, P., Namur, S. (2003). Application of the repeated measures (split-plot) design to the development of processes for drug substances. Quality Engineering, 16(2), 321-328.

[31] Roy, S. N., Bose, R. C. (1953). Simultaneous confidence interval estimation. The Annals of Mathematical Statistics, 24(4), 513-536.

[32] Shine, R., Madsen, T. R., Elphick, M. J. and Harlow, P. S. (1997). The influence of nest temperatures and maternal brooding on hatchling phenotypes in water pythons. Ecology, 78(6), 1713-1721.

[33] Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance com- ponents. Biometrics Bulletin, 2(6), 110-114.

[34] Sawilowsky, S. (2002). Fermat, Schubert, Einstein, and Behrens-Fisher: The probable difference between two means when $\sigma_1^2 \neq \sigma_2^2$. Journal of Modern Applied Statistical Methods, 1(2), 461-472.

[35] Stefansson, G., Kim, W.C. and Hsu, J.C. (1988). On confidence sets in multiple comparisons. Statistical Decision Theory and Related Topics IV, 2, 89-104.

[36] Stoline, M. R., Ury, H. K. (1979). Tables of the studentized maximum modulus distri- bution and an application to multiple comparisons among means. Technometrics, 21(1), 87-93.

[37] Tamhane, A. C., Logan, B. R. (2002). Multiple test procedures for identifying the minimum effective and maximum safe doses of a drug. Journal of the American Statistical Association, 97(457), 293-301.

[38] Thall, P. F., Logothetis, C., Pagliaro, L. C., Wen, S., Brown, M. A., Williams, D. and Millikan, R. E. (2007). Adaptive therapy for androgen-independent prostate cancer: A randomized selection trial of four regimens. Journal of the National Cancer Institute, 99(21), 1613-1622.

[39] Tukey, J.W.(1949). The simplest signed-rank tests. Memo Rept. 17, Statistical Research Group, Princeton University.

[40] Ury, H. K. (1976). Comparison of four procedures for multiple comparisons among means (pairwise contrasts) for arbitrary sample sizes. Technometrics, 18(1), 89-97.

[41] Wang, L., Rotnitzky, A., Lin, X., Millikan, R., Thall, P. (2011). Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. Journal of the American Statistical Association, 107(498), 493-508.

[42] Welch, B.L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350-362.

[43] Welch, B. L. (1947). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34, 28-35.

APPENDIX: SELECTED R AND SAS PROGRAMS

.1 Simulation for the Procedures with Fisher's Exact Test

.1.1 Pre-specifying the Best Treatment

library(exact2x2)

#####################################
## Function assuming the pre-specified
## best treatment is the first one
#####################################
terror5 <- function(n, p, alpha){
  bt <- rep(NA, asim)
  for (k in 1:asim) {
    lp <- length(p)
    x <- rbinom(lp, n, p)
    y <- n - x
    data <- cbind(x, y)  # combine as contingency tables
    rank <- seq(1, lp)   # control the subscripts on j
    M <- rep(NA, lp - 1)
    rank2 <- rank[-1]
    for (j in rank2){
      ### If the odds ratio is greater than 1, set M[j-1] = 0 and continue;
      ### otherwise stop and set M[j-1] = 1.
      if (fisher.exact(data[c(1, j), ], alternative = "greater",
                       conf.level = 1 - alpha)$estimate[[1]] > 1)
        {M[j-1] = 0; next} else {M[j-1] = 1; break}
    }
    ### If treatment one is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[1:(lp-1)] == 0)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################################
n = 90        # sample size
asim = 10000  # number of iterations
p <- c(.53, .1, .2, .3, .4)  # success probabilities
alpha <- 0.05 # the alpha level
terror5(n, p, alpha)

###################################
n = 95   # sample size (asim, p, alpha as above)
terror5(n, p, alpha)

###################################
n = 100  # sample size
terror5(n, p, alpha)

###################################
n = 105  # sample size
terror5(n, p, alpha)

####################################
# Best treatment in a different position
####################################
terror <- function(trt, n, p, alpha){
  bt <- rep(NA, asim)
  for (k in 1:asim)
  {
    lp <- length(p)
    x <- rbinom(lp, n, p)
    y <- n - x
    data <- cbind(x, y)
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If the odds ratio is greater than 1, set M[j,trt] = 1 and continue;
      ### otherwise stop and set M[j,trt] = 3.
      if (fisher.exact(data[c(trt, j), ], alternative = "greater",
                       conf.level = 1 - alpha)$estimate[[1]] > 1)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################################
trt <- 1       # best treatment is 1st
n <- 90        # sample size
asim <- 10000  # number of iterations
p <- c(.53, .2, .3, .4)  # success probabilities
alpha <- 0.05  # the alpha level
terror(trt, n, p, alpha)

###################################
trt <- 2       # best treatment is 2nd
p <- c(.2, .53, .3, .4)
terror(trt, n, p, alpha)

###################################
trt <- 3       # best treatment is 3rd
p <- c(.2, .3, .53, .4)
terror(trt, n, p, alpha)

###################################
trt <- 4       # best treatment is 4th
p <- c(.2, .3, .4, .53)
terror(trt, n, p, alpha)

.1.2 Selecting the Best Treatment When It Is Unknown

library(exact2x2)

##################################
## When the best treatment is unknown,
## select it with the new procedure
##################################
terror2 <- function(n, p, alpha){
  lp <- length(p)
  ind <- matrix(0, lp, asim)  # records which treatment is selected per iteration
  for (k in 1:asim)
  {
    x <- rbinom(lp, n, p)  # generate random data from the binomial distribution
    y <- n - x
    data <- cbind(x, y)
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If the odds ratio of the Fisher exact test is greater than 1,
        ### set M[j,i] = 1 and continue; otherwise set M[j,i] = 3 and stop:
        ### treatment i is not the best one (the null hypothesis is not rejected).
        if (fisher.exact(data[c(i, j), ], alternative = "greater",
                         conf.level = 1 - alpha/lp)$estimate[[1]] > 1)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and break;
      ### otherwise set ind[i,k] = 0 and continue to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  ### Get the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

############ linear shape ###################
n = 100       # sample size
asim = 10000  # number of iterations
alpha <- 0.05 # the alpha level
p <- c(.53, .4, .35, .3, .2, .1)  # success probabilities
terror2(n, p, alpha)

############ U shape ####################
p <- c(.7, .45, .3, .12, .32, .58)
terror2(n, p, alpha)

############ inverted U shape ################
p <- c(.1, .2, .3, .51, .63, .4)
terror2(n, p, alpha)

########### logarithm ########################
p <- c(0, 0.301, 0.477, 0.602, 0.7, 0.81)
terror2(n, p, alpha)

.2 Simulation for the Procedures with Wilcoxon Mann-Whitney Test

.2.1 Pre-specifying the Best Treatment

library(exactRankTests)
library(MASS)

###########################################
### First treatment is the pre-specified one
###########################################
coverage1 <- function(n, mu, alpha){
  lp <- length(mu)    # number of treatments
  sigma <- diag(lp)   # variance matrix for the data
  rank <- seq(1, lp)  # control the subscripts on j
  rank2 <- rank[-1]
  bt <- rep(NA, asim)
  for (k in 1:asim){
    x1 <- mvrnorm(n, mu, sigma)
    M <- rep(NA, lp - 1)
    for (j in rank2){
      ### If treatment one is better than j, set M[j-1] = 0 and continue
      ### to the next step; otherwise set M[j-1] = 1 and break.
      if (wilcox.exact(x1[, 1], x1[, j], alternative = "greater",
                       conf.int = TRUE, conf.level = 1 - alpha)$conf.int[[1]] > 0)
        {M[j-1] = 0; next} else {M[j-1] = 1; break}
    }
    ### If treatment one is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[1:(lp-1)] == 0)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################
## sample size
###################
asim <- 10000  # number of iterations
mu <- c(3.9, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage1(n, mu, alpha)

###################
n <- 32
coverage1(n, mu, alpha)

###################
n <- 34
coverage1(n, mu, alpha)

###################
n <- 35
coverage1(n, mu, alpha)

########################################
### The pre-specified one in a different location
########################################
library(exactRankTests)
library(MASS)

coveraget <- function(n, mu, trt, alpha){
  lp <- length(mu)
  bt <- rep(NA, asim)
  for (k in 1:asim)
  {
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    sigma <- diag(lp)
    x1 <- mvrnorm(n, mu, sigma)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If treatment trt is better than j, set M[j,trt] = 1 and continue
      ### to the next step; otherwise set M[j,trt] = 3 and break.
      if (wilcox.exact(x1[, trt], x1[, j], alternative = "greater",
                       conf.int = TRUE, conf.level = 1 - alpha)$conf.int[[1]] > 0)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

###################
asim <- 10000  # number of iterations
mu <- c(3, 3.9, 2, 1)
trt <- 2
alpha <- 0.05
n <- 30
coveraget(n, mu, trt, alpha)

###################
mu <- c(3, 2, 3.9, 1)
trt <- 3
coveraget(n, mu, trt, alpha)

###################
mu <- c(3, 2, 1, 3.9)
trt <- 4
coveraget(n, mu, trt, alpha)

.2.2 Selecting the Best Treatment When It Is Unknown

library(exactRankTests)
library(MASS)

#######################################
### Function with unknown best treatment
#######################################
coverage <- function(n, mu, v, alpha){
  lp <- length(mu)
  ind <- matrix(0, lp, asim)
  for (k in 1:asim)
  {
    rank <- seq(1, lp)  # control the subscripts on j
    M <- diag(lp)
    sigma <- diag(lp) * v
    x1 <- mvrnorm(n, mu, sigma)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If treatment i is better than j, set M[j,i] = 1 and continue
        ### to the next step; otherwise stop and set M[j,i] = 3.
        if (wilcox.exact(x1[, i], x1[, j], alternative = "greater",
                         conf.int = TRUE, conf.level = 1 - alpha/lp)$conf.int[[1]] > 0)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and stop;
      ### otherwise set ind[i,k] = 0 and continue to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  # the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

################### linear shape ####
asim <- 10000  # number of iterations
v <- 1         # variance
mu <- c(5.1, 4, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage(n, mu, v, alpha)

################### U-shape ########
mu <- c(7.1, 3.5, 2, 4, 6)
coverage(n, mu, v, alpha)

############## inverted U-shape ########
mu <- c(2, 3.5, 7.1, 4, 6)
coverage(n, mu, v, alpha)

########### logarithmic #################
mu <- c(1.10, 2.05, 2.64, 3.1, 4.2)
coverage(n, mu, v, alpha)

.3 Simulation for the Procedures with Normality

.3.1 Pre-specifying the Best Treatment

library(exactRankTests)
library(MASS)

coverage_n <- function(n, mu, trt, v, alpha){
  bt <- rep(NA, asim)
  lp <- length(mu)       # number of treatments
  M <- diag(lp)
  sigma <- diag(lp) * v  # variance matrix for the data
  for (k in 1:asim){
    rank <- seq(1, lp)   # control the subscripts on j
    x1 <- mvrnorm(n, mu, sigma)
    rank2 <- rank[-trt]
    for (j in rank2){
      ### If treatment trt is better than j, set M[j,trt] = 1 and continue
      ### to the next step; otherwise break and set M[j,trt] = 3.
      if (mean(x1[, trt]) - mean(x1[, j])
          - qt(1 - alpha, 2*n - 2) * v * sqrt(2/n) > 0)
        {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
    }
    ### If treatment trt is better than all others, set bt[k] = 1;
    ### otherwise bt[k] = 0.
    if (all(M[, trt] == 1)) {bt[k] = 1} else {bt[k] = 0}
  }
  mean(bt)
}

#####################
# Different positions
#####################
asim <- 10000  # number of iterations
mu <- c(5.9, 5, 4.9, 3, 2)
trt <- 1
alpha <- 0.05
n <- 30
v <- 1
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 5.9, 4.9, 3, 2); trt <- 2
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 5.9, 3, 2); trt <- 3
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 3, 5.9, 2); trt <- 4
coverage_n(n, mu, trt, v, alpha)

###################
mu <- c(5, 4.9, 3, 2, 5.9); trt <- 5
coverage_n(n, mu, trt, v, alpha)

############################
## Sample size
############################
mu <- c(5.9, 5, 4.9, 3, 2); trt <- 1
n <- 32
coverage_n(n, mu, trt, v, alpha)

############################
n <- 34
coverage_n(n, mu, trt, v, alpha)

############################
n <- 35
coverage_n(n, mu, trt, v, alpha)

#################################
### Different variance
#################################
n <- 85;  v <- 2
coverage_n(n, mu, trt, v, alpha)

################################
n <- 250; v <- 4
coverage_n(n, mu, trt, v, alpha)

################################
n <- 800; v <- 8
coverage_n(n, mu, trt, v, alpha)

.3.2 Selecting the Best Treatment When It Is Unknown

library(MASS)

coverage_n2 <- function(n, mu, v, alpha){
  lp <- length(mu)  # number of treatments
  ind <- matrix(0, lp, asim)
  for (k in 1:asim){
    M <- diag(lp)
    sigma <- diag(lp) * v  # variance matrix for the data
    rank <- seq(1, lp)     # control the subscripts on j
    x1 <- mvrnorm(n, mu, sigma)
    for (i in 1:lp){
      rank2 <- rank[-i]
      for (j in rank2){
        ### If treatment i is better than j, set M[j,i] = 1 and continue
        ### to the next step; otherwise break and set M[j,i] = 3.
        if (mean(x1[, i]) - mean(x1[, j])
            - qt(1 - alpha/lp, 2*n - 2) * v * sqrt(2/n) > 0)
          {M[j, i] = 1; next} else {M[j, i] = 3; break}
      }
      ### If treatment i is better than all others, set ind[i,k] = 1 and stop;
      ### otherwise set ind[i,k] = 0 and go to the next stage.
      if (all(M[, i] == 1)) {ind[i, k] = 1; break} else {ind[i, k] = 0; next}
    }
  }
  # the best treatment
  btreat <- match(max(rowSums(ind)), rowSums(ind))
  ### Return the best treatment number and the coverage probability
  c(btreat, mean(ind[btreat, ]))
}

###### linear shape ####
asim <- 10000  # number of iterations
v <- 1         # variance
mu <- c(5.1, 4, 3, 2, 1)
alpha <- 0.05
n <- 30
coverage_n2(n, mu, v, alpha)

###### U-shape ########
mu <- c(7.05, 3.51, 2.78, 4.23, 6.00)
coverage_n2(n, mu, v, alpha)

##### inverted U-shape ######
mu <- c(2.78, 3.51, 7.05, 4.23, 6.00)
coverage_n2(n, mu, v, alpha)

#### logarithmic ###########
mu <- c(1.10, 2.05, 2.64, 3.1, 4.15)
coverage_n2(n, mu, v, alpha)

.4 Applications in a Prostate Cancer Study

.4.1 Using SAS to Process the Original Dataset

***************************************************
 Import original data
***************************************************;
PROC IMPORT OUT= WORK.cancer
     DATAFILE= "e:\prostate cancer original data.txt"
     DBMS=TAB REPLACE;
     GETNAMES=YES;
     DATAROW=2;
RUN;

*********************************************************
 KEEP DATA WITH TWO DIFFERENT TREATMENTS IN ONE REGIMEN
*********************************************************;
data pc (keep=binaryscore tr1 tr2 PSA08 PSA16);
  set cancer;
  if tr2 ^='';
run;

********************************************************
 GET FREQ TABLE IN PDF FILE
********************************************************;
ods pdf;
proc freq data=pc;
  table tr1*tr2;
run;
ods pdf close;

********************************************************
 ASSIGN CATEGORY
********************************************************;
data pc_trt(keep = trt effect);
  length trt $5;
  set pc;
  /* if (PSA16-PSA08)/PSA08 >= .8 then delete; */
  if upcase(tr1)='CVD' and upcase(tr2)='KA/VE' then trt='trt1';
  if upcase(tr1)='CVD' and upcase(tr2)='TEC' then trt='trt2';
  if upcase(tr1)='CVD' and upcase(tr2)='TEE' then trt='trt3';
  if upcase(tr1)='KA/VE' and upcase(tr2)='CVD' then trt='trt4';
  if upcase(tr1)='KA/VE' and upcase(tr2)='TEC' then trt='trt5';
  if upcase(tr1)='KA/VE' and upcase(tr2)='TEE' then trt='trt6';
  if upcase(tr1)='TEC' and upcase(tr2)='CVD' then trt='trt7';
  if upcase(tr1)='TEC' and upcase(tr2)='KA/VE' then trt='trt8';
  if upcase(tr1)='TEC' and upcase(tr2)='TEE' then trt='trt9';
  if upcase(tr1)='TEE' and upcase(tr2)='CVD' then trt='trt10';
  if upcase(tr1)='TEE' and upcase(tr2)='KA/VE' then trt='trt11';
  if upcase(tr1)='TEE' and upcase(tr2)='TEC' then trt='trt12';
  effect = PSA16-PSA08;
proc sort data=pc_trt;
  by trt;
run;

*********************************************************
 QQPLOT FOR TREATMENTS (THE R CODE PRODUCES THE SAME PLOT)
*********************************************************;
ods pdf file='e:\qqplot.pdf';
proc univariate data=pc_trt noprint;
  class trt;
  qqplot effect;
run;
ods pdf close;

********************************************************
 MEDIAN AND STD FOR 12 TREATMENTS
********************************************************;
ods pdf file='e:\std.pdf';
proc means data=pc_trt median std;
  class trt;
run;
ods pdf close;

*******************************************************
 EXPORT DATA FOR FUTURE USE (R CODE)
*******************************************************;
proc export data=pc_trt outfile='e:\trt1.csv'
  dbms=csv replace;
run;

.4.2 Using the New Procedure with Wilcoxon Mann-Whitney Test

########################
# read data
########################
data <- read.csv("e:/trt1.csv", h=T)
x1 = data[which(data$trt=='trt1'),]
x2 = data[which(data$trt=='trt2'),]
x3 = data[which(data$trt=='trt3'),]
x4 = data[which(data$trt=='trt4'),]
x5 = data[which(data$trt=='trt5'),]
x6 = data[which(data$trt=='trt6'),]
x7 = data[which(data$trt=='trt7'),]
x8 = data[which(data$trt=='trt8'),]
x9 = data[which(data$trt=='trt9'),]
x10 = data[which(data$trt=='trt10'),]
x11 = data[which(data$trt=='trt11'),]
x12 = data[which(data$trt=='trt12'),]

#######################################
## Function to combine vectors of
## different lengths into a matrix
#######################################
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

#########################################
## Combine the 4 treatments
#########################################
x <- cbind.fill(x1[,2], x2[,2], x3[,2], x7[,2])

##########################################
## Function that checks, at level alpha, whether
## treatment 'trt' can be selected as the best
##########################################
library(exactRankTests)
coverage <- function(alpha, trt){
  lp <- length(x[1,])
  rank <- seq(1, lp)  # control the subscripts on j
  M <- diag(lp)
  rank2 <- rank[-trt]
  y1 <- x[, trt]
  z1 <- y1[!is.na(y1)]
  for (j in rank2){
    y2 <- x[, j]
    z2 <- y2[!is.na(y2)]
    #### M[j,trt] = 3 means treatment trt is not the best one
    #### (the null hypothesis is not rejected)
    if (wilcox.exact(z1, z2, alternative = "greater", conf.int=T,
                     conf.level = 1-alpha)$conf.int[[1]] > 0)
      {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
  }
  #################################################
  ### Print whether treatment trt is the best at level alpha
  #################################################
  if (all(M[, trt] == 1)) {print(c(trt, "is the best treatment"))}
  else {print(c(trt, "is not the best at", alpha))}
}

#########
## alpha
#########
alpha <- .18
trt <- 4
coverage(alpha, trt)

################################
### QQ-PLOT
################################
par(mfrow=c(2,2))
qqnorm(x1[,2], main = "Normal Q-Q Plot for (CVD,KA/VE)")
qqnorm(x2[,2], main = "Normal Q-Q Plot for (CVD,TEC)")
qqnorm(x3[,2], main = "Normal Q-Q Plot for (CVD,TEE)")
qqnorm(x7[,2], main = "Normal Q-Q Plot for (TEC,CVD)")

#####################################
## Confidence intervals of 95%
alpha <- 0.05
wilcox.exact(x7[,2], x1[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x2[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x3[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)

#####################################
## Confidence intervals of 82%
alpha <- 0.18
wilcox.exact(x7[,2], x1[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x2[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)
wilcox.exact(x7[,2], x3[,2], alternative = "greater", conf.int=T, conf.level = 1-alpha)

.4.3 Using the New Procedure under Normality

########################
# read data
########################
data <- read.csv("e:/trt1.csv", h=T)
x1 = data[which(data$trt=='trt1'),]
x2 = data[which(data$trt=='trt2'),]
x3 = data[which(data$trt=='trt3'),]
x4 = data[which(data$trt=='trt4'),]
x5 = data[which(data$trt=='trt5'),]
x6 = data[which(data$trt=='trt6'),]
x7 = data[which(data$trt=='trt7'),]
x8 = data[which(data$trt=='trt8'),]
x9 = data[which(data$trt=='trt9'),]
x10 = data[which(data$trt=='trt10'),]
x11 = data[which(data$trt=='trt11'),]
x12 = data[which(data$trt=='trt12'),]

#######################################
## Function to combine vectors of
## different lengths into a matrix
#######################################
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

##########################################
## Combine the 4 treatments
##########################################
x <- cbind.fill(x1[,2], x2[,2], x3[,2], x7[,2])

###########################################
## Function that checks, at level alpha, whether
## treatment 'trt' can be selected as the best
###########################################
library(MASS)
coverage <- function(alpha, trt){
  lp <- length(x[1,])
  rank <- seq(1, lp)  # control the subscripts on j
  M <- diag(lp)
  rank2 <- rank[-trt]
  y1 <- x[, trt]
  z1 <- y1[!is.na(y1)]
  n <- length(z1)
  for (j in rank2){
    y2 <- x[, j]
    z2 <- y2[!is.na(y2)]
    m <- length(z2)
    ### If treatment trt is better than j, set M[j,trt] = 1 and continue
    ### to the next step; otherwise set M[j,trt] = 3 and break.
    if ((mean(z1) - mean(z2))
        - qt(1-alpha, m+n-2) * sd(c(z1, z2)) * sqrt(1/n + 1/m) > 0)
      {M[j, trt] = 1; next} else {M[j, trt] = 3; break}
  }
  ### Print whether treatment trt is the best or not
  if (all(M[, trt] == 1)) {print(c(trt, "is the best treatment"))}
  else {print(c(trt, "is not the best at", alpha))}
}

################################################
## Confidence intervals at the 0.18 significance level
alpha <- 0.18
(mean(x7[,2])-mean(x1[,2])) - qt(1-alpha, 10+5-2)*sd(c(x7[,2], x1[,2]))*sqrt(1/5+1/10)
(mean(x7[,2])-mean(x2[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x2[,2]))*sqrt(1/5+1/7)
(mean(x7[,2])-mean(x3[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x3[,2]))*sqrt(1/5+1/7)

#########
## alpha
#########
alpha <- .32
trt <- 4
coverage(alpha, trt)

#########
alpha <- .31
trt <- 4
coverage(alpha, trt)

## Confidence intervals at the 0.32 significance level
alpha <- 0.32
(mean(x7[,2])-mean(x1[,2])) - qt(1-alpha, 10+5-2)*sd(c(x7[,2], x1[,2]))*sqrt(1/5+1/10)
(mean(x7[,2])-mean(x2[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x2[,2]))*sqrt(1/5+1/7)
(mean(x7[,2])-mean(x3[,2])) - qt(1-alpha, 7+5-2)*sd(c(x7[,2], x3[,2]))*sqrt(1/5+1/7)