Sequential Multiple Hypothesis Testing with Type I Error Control

Total Pages: 16

File Type: PDF, Size: 1020 KB

Sequential Multiple Hypothesis Testing with Type I Error Control

Alan Malek, MIT
Sumeet Katariya, University of Wisconsin-Madison
Yinlam Chow, Stanford University
Mohammad Ghavamzadeh, Adobe Research

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54. Copyright 2017 by the author(s).

Abstract

This work studies multiple hypothesis testing in the setting where we obtain data sequentially and may choose when to stop sampling. We summarize the notion of a sequential p-value (one that can be continually updated and still maintain a type I error guarantee) and provide several examples from the literature. This tool allows us to convert fixed-horizon step-up or step-down multiple hypothesis testing procedures (which include Benjamini-Hochberg, Holm, and Bonferroni) into a sequential version that allows the statistician to reject a hypothesis as soon as the sequential p-value reaches a threshold while maintaining type I error control. We show that if the original procedure has a type I error guarantee in a certain family (including FDR and FWER), then the sequential conversion inherits an analogous guarantee. The conversion also allows for allocating samples in a data-dependent way, and we provide simulated experiments demonstrating an increased number of rejections when compared to the fixed-horizon setting.

1 Introduction

Hypothesis testing (HT) is a statistically rigorous procedure that makes decisions about the truth of a hypothesis in a way that allows the probability of error to be carefully controlled. Specifically, given a null hypothesis and an alternative hypothesis, the statistician decides when to reject the null hypothesis such that the type I error, the probability of falsely rejecting the null hypothesis, is bounded. A second goal is to minimize the probability of type II error, which occurs when the statistician erroneously fails to reject the null (when the alternative is indeed true). HT is a fundamental problem in statistical inference that arises in numerous fields. Examples include online marketing [6] and pharmaceutical studies [20].

HT has traditionally been studied in the fixed-horizon setting. The desired parameters, such as type I error and power, are used to calculate a lower bound on the necessary sample size, the requisite number of samples is collected, and then a decision is made. In particular, one must wait until the minimum sample size is reached or the type I error guarantee is forgone. While this approach provides rigorous statistical guarantees in accuracy and robustness, it usually takes a long time to conclude a test, and its offline nature also poses severe limitations for emerging applications, such as digital marketing, where data streams in a sequential fashion and a data-efficient approach is crucial to allow real-time adaptation to users' preferences.

Multiple hypothesis testing (MHT) is a natural extension of HT to control error across multiple hypotheses. While type I and type II errors are straightforward in single tests, there are many ways one could penalize incorrect decisions in MHT. The two most widely used notions are family-wise error rate (FWER), which is the probability of at least one false rejection, and false discovery rate (FDR), which is the expected proportion of the rejections that are false. Generally, the statistician would like to reject as many tests as possible while still controlling the number of false positives.

Recent advances in the scale and rate of data collection have increased the need for sequential hypothesis testing; in this framework, the data arrives sequentially and the statistician may make decisions after observing partial data, such as deciding when to stop collecting data or from which populations to collect data. Large clinical trials and hypothesis testing in social networks, mobile applications, and large internet systems are examples where sequential HT is desirable. In contrast, in the fixed-horizon setting all the data are collected a priori.

Of course, such data-dependent decisions introduce dependencies between future data and past data, and the statistician must be more careful in designing procedures if type I error control is desired. For example, a common goal in sequential hypothesis testing is to observe the data sequence and stop at the statistician's choosing. However, if the statistician were to follow the evolution of a statistic in the traditional fixed-horizon HT and reject the null hypothesis as soon as it crosses the fixed-horizon decision boundary, the type I error would be greatly inflated [18] and many erroneous rejections would be made. We survey some alternative hypothesis testing procedures that provide a type I error guarantee in the sequential setting, but our work will use the existence of such sequential tests as a starting point. Our goal will be to use sequential hypothesis tests to adapt multiple hypothesis testing procedures into sequential variants that can enjoy the benefits of data-dependent sampling and early stopping.

1.1 Related Work

Sequential hypothesis testing has a long history [30, 20, 24], but its extension to multiple hypotheses and tests is more recent. Several recent papers study the mechanics of statistics that allow for uniform type I error guarantees for any stopping time [1, 15, 28, 29]. Bartroff and co-authors have established several scenarios where sequential MHT controls type I error in the cases of all simple hypotheses, with FWER [2] and FDR control [3]. Similarly, De and Baron [10] proposed a sequential MHT algorithm that guarantees the overall type I error. However, since this algorithm is based on the Bonferroni correction [9], its power to reject alternative hypotheses has been empirically shown to be low. Fellouris and Tartakovsky [14] also studied the sequential testing problem of a simple null hypothesis vs. a composite alternative hypothesis, for which the decision boundaries were derived using both mixture-based and weighted-generalized likelihood ratio statistics. While they have shown asymptotic optimality of such a sequential testing procedure, their proposed method requires high-dimensional hyper-parameter tuning and cannot easily be extended to multiple hypothesis testing. Zehetmayer and collaborators [31, 32] looked at controlling FDR when the p-values are modeled as Gaussian. Similar to our results, the approach of [18] relies on sequential p-values. Our work extends their results by allowing each test to be stopped individually. This dynamic allocation property allows us to conclude more tests (than their static/uniform allocation scenario) at each time-step during the lifespan of the procedure. Moreover, our framework is more flexible and allows control of FDR, FWER, and a host of other metrics, in contrast to only controlling FDR.

1.2 Our Contributions

We first describe a framework that fits most fixed-horizon hypothesis test procedures, including all step-up procedures (e.g., Hochberg [16] and Benjamini-Hochberg [7]) and step-down procedures (e.g., Holm's [17]), with an accompanying family of type I errors that includes FWER and FDR. By leveraging the powerful tool of sequential p-values, we propose a method for converting any fixed-horizon MHT into a sequential MHT and show that the fixed-horizon guarantees generalize to the sequential setting in Section 5.2. Further, our sequential MHT can stop tests early in a data-dependent manner, and therefore stops easier hypotheses first, potentially allowing more samples for difficult hypotheses. We discuss this dynamic allocation property and demonstrate that it does lead to more rejected tests on synthetic data in Section 7. Additionally, Section 6 discusses controlling type II errors and provides a sequential calculator for estimating a minimum sample size to control both FDR and false non-discovery rate (FNR) in sequential MHT.

2 Hypothesis Testing

In hypothesis testing (HT), the statistician tests a null hypothesis H0 against an alternative hypothesis H1. The goal is to design a procedure that controls the type I error α, the probability of erroneously rejecting H0, while simultaneously minimizing the type II error β, the probability of failing to reject H0 when it is false. A prototypical example is A/B testing with baseline option A and alternative option B; the null and alternative hypotheses are H0: µA = µB and H1: µA ≠ µB, where µA and µB are the mean values (e.g., conversion rates or click-through rates).

Multiple hypothesis testing (MHT) is an extension of HT to a finite number of tests m. Examples are a base option A against a set of m alternative options A1, ..., Am, a single null hypothesis against a set of alternative hypotheses [23, 5], and a test with a set of nested hypotheses [19]. When the test stops, the MHT procedure rejects R tests, out of which V are rejected by mistake, and does not reject m − R tests, out of which W should have been rejected. Table 1 summarizes these quantities. The decision of the MHT procedure to reject/not-reject a test is a random quantity, because it depends on the test sample, which is random. Thus, all …

              not-rejected   rejected   total
  H0 true     U              V          m0
  H0 false    W              S          m − m0
  total       m − R          R          m

Table 1: MHT Error Quantities.

The most common procedure for fixed-horizon multiple hypothesis testing is to pick sample sizes, gather the samples and compute the p-values of the m tests, then determine, as a whole, which null hypotheses to reject. The most common MHT procedure to control …
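To make the fixed-horizon-to-sequential conversion sketched above concrete, here is a minimal illustration in Python. It is not the paper's exact algorithm: it simply re-runs the Benjamini-Hochberg step-up rule each time the sequential p-values are updated and never un-rejects a hypothesis, which is the behavior the conversion relies on (a sequential p-value remains valid at any data-dependent stopping time). The function names and the toy p-value sequences are assumptions for illustration.

```python
import numpy as np

def bh_thresholds(m, alpha):
    """Benjamini-Hochberg step-up thresholds alpha * k / m for k = 1..m."""
    return alpha * np.arange(1, m + 1) / m

def sequential_bh_step(seq_pvalues, alpha, already_rejected):
    """One pass of a sequential step-up rule (illustrative sketch).

    seq_pvalues      : current sequential p-values, one per hypothesis
                       (valid uniformly over time, so peeking is allowed)
    already_rejected : boolean array of hypotheses rejected at earlier times
    Returns an updated boolean rejection vector.
    """
    m = len(seq_pvalues)
    order = np.argsort(seq_pvalues)            # smallest p-value first
    sorted_p = np.asarray(seq_pvalues)[order]
    below = sorted_p <= bh_thresholds(m, alpha)  # step-up comparison
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = already_rejected.copy()
    rejected[order[:k]] = True                 # rejections are never undone
    return rejected

# toy usage: three hypotheses monitored over time
alpha = 0.1
rejected = np.zeros(3, dtype=bool)
for seq_p in ([0.40, 0.20, 0.90], [0.08, 0.03, 0.70], [0.02, 0.01, 0.65]):
    rejected = sequential_bh_step(seq_p, alpha, rejected)
print(rejected)   # hypotheses stopped early stay rejected
```

The key design point in the sketch is that a hypothesis is closed as soon as its sequential p-value crosses its threshold, freeing future samples for the remaining, harder hypotheses.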
Recommended publications
  • Sequential Analysis Tests and Confidence Intervals
    D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer Series in Statistics, 1985, XI, 274 p.
    The modern theory of sequential analysis came into existence simultaneously in the United States and Great Britain in response to demands for more efficient sampling inspection procedures during World War II. The developments were admirably summarized by their principal architect, A. Wald, in his book Sequential Analysis (1947). In spite of the extraordinary accomplishments of this period, there remained some dissatisfaction with the sequential probability ratio test and Wald's analysis of it. (i) The open-ended continuation region, with the concomitant possibility of taking an arbitrarily large number of observations, seems intolerable in practice. (ii) Wald's elegant approximations based on "neglecting the excess" of the log likelihood ratio over the stopping boundaries are not especially accurate and do not allow one to study the effect of taking observations in groups rather than one at a time. (iii) The beautiful optimality property of the sequential probability ratio test applies only to the artificial problem of testing a simple hypothesis against a simple alternative. In response to these issues, and to new motivation from the direction of controlled clinical trials, numerous modifications of the sequential probability ratio test were proposed and their properties studied, often by simulation or lengthy numerical computation. (A notable exception is Anderson, 1960; see III.7.) In the past decade it has become possible to give a more complete theoretical analysis of many of the proposals and hence to understand them better.
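    The blurb above centers on Wald's sequential probability ratio test; a compact sketch may help readers unfamiliar with it. This is a generic textbook-style SPRT for a Bernoulli parameter using Wald's approximate boundaries, which, as point (ii) above notes, neglect the overshoot of the log-likelihood ratio over the boundary. It is illustrative only and not taken from Siegmund's book.

```python
import math

def sprt_bernoulli(data, p0, p1, alpha=0.05, beta=0.10):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a Bernoulli stream.

    Uses Wald's approximate boundaries A ~ (1 - beta)/alpha and
    B ~ beta/(1 - alpha), which neglect the overshoot ("excess") of the
    log-likelihood ratio over the boundary.
    """
    log_A = math.log((1 - beta) / alpha)   # reject H0 when llr >= log_A
    log_B = math.log(beta / (1 - alpha))   # accept H0 when llr <= log_B
    llr = 0.0
    for n, x in enumerate(data, start=1):
        # log-likelihood ratio increment for one Bernoulli observation
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= log_A:
            return "reject H0", n
        if llr <= log_B:
            return "accept H0", n
    return "continue sampling", len(data)

print(sprt_bernoulli([1, 1, 0, 1, 1, 1, 1, 1], p0=0.5, p1=0.8))
```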
  • Trial Sequential Analysis: Novel Approach for Meta-Analysis
    Hyun Kang, Department of Anesthesiology and Pain Medicine, Chung-Ang University College of Medicine, Seoul, Korea. Anesth Pain Med 2021;16:138-150. https://doi.org/10.17085/apm.21038. Review; pISSN 1975-5171, eISSN 2383-7977. Received April 19, 2021; accepted April 25, 2021. Corresponding author: Hyun Kang, M.D., Ph.D., Department of Anesthesiology and Pain Medicine, Chung-Ang University College of Medicine, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea. Tel: 82-2-6299-2586, Fax: 82-2-6299-2585, E-mail: [email protected]
    Systematic reviews and meta-analyses rank the highest in the evidence hierarchy. However, they still have the risk of spurious results because they include too few studies and participants. The use of trial sequential analysis (TSA) has increased recently, providing more information on the precision and uncertainty of meta-analysis results. This makes it a powerful tool for clinicians to assess the conclusiveness of a meta-analysis. TSA provides monitoring boundaries or futility boundaries, helping clinicians prevent unnecessary trials. The use and interpretation of TSA should be based on an understanding of the principles and assumptions behind TSA, which may provide more accurate, precise, and unbiased information to clinicians, patients, and policymakers. In this article, the history, background, principles, and assumptions behind TSA are described, which would lead to its better understanding, implementation, and interpretation.
    Keywords: Interim analysis; Meta-analysis; Statistics; Trial sequential analysis.
    INTRODUCTION: A systematic review is a research method that attempts to collect all empirical evidence according to predefined inclusion and exclusion criteria to answer specific and focused questions [3]. … predetermined number of patients was reached [2]. Sequential analysis is a statistical method in which the final number of patients analyzed is not predetermined, but sampling or enrollment of patients is decided by a prede…
  • Detecting Reinforcement Patterns in the Stream of Naturalistic Observations of Social Interactions
    James Lamar DeLaney 3rd, Portland State University, 7-14-2020. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Psychology. Thesis Committee: Thomas Kindermann (Chair), Jason Newsom, Ellen A. Skinner.
    Recommended Citation: DeLaney 3rd, James Lamar, "Detecting Reinforcement Patterns in the Stream of Naturalistic Observations of Social Interactions" (2020). Dissertations and Theses, Portland State University. Paper 5553. https://doi.org/10.15760/etd.7427 (https://pdxscholar.library.pdx.edu/open_access_etds)
    Abstract: How do consequences affect future behaviors in real-world social interactions? The term positive reinforcer refers to those consequences that are associated with an increase in probability of an antecedent behavior (Skinner, 1938). To explore whether reinforcement occurs under naturally occurring conditions, many studies use sequential analysis methods to detect contingency patterns (see Quera & Bakeman, 1998). This study argues that these methods do not look at behavior change following putative reinforcers, and thus are not sufficient for declaring reinforcement effects arising in naturally occurring interactions, according to Skinner's (1938) operational definition of reinforcers.
  • ©2018 Oxford University Press
    PART E: Challenges in Statistics — CHAPTER 22: Multiple Comparisons Concepts (©2018 Oxford University Press)
    "If you torture your data long enough, they will tell you whatever you want to hear." — MILLS (1993)
    Coping with multiple comparisons is one of the biggest challenges in data analysis. If you calculate many P values, some are likely to be small just by random chance. Therefore, it is impossible to interpret small P values without knowing how many comparisons were made. This chapter explains three approaches to coping with multiple comparisons. Chapter 23 will show that the problem of multiple comparisons is pervasive, and Chapter 40 will explain special strategies for dealing with multiple comparisons after ANOVA.
    THE PROBLEM OF MULTIPLE COMPARISONS
    The problem of multiple comparisons is easy to understand. If you make two independent comparisons, what is the chance that one or both comparisons will result in a statistically significant conclusion just by chance? It is easier to answer the opposite question. Assuming both null hypotheses are true, what is the chance that both comparisons will not be statistically significant? The answer is the chance that the first comparison will not be statistically significant (0.95) times the chance that the second one will not be statistically significant (also 0.95), which equals 0.9025, or about 90%. That leaves about a 10% chance of obtaining at least one statistically significant conclusion by chance. It is easy to generalize that logic to more comparisons.
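    The 0.95 × 0.95 = 0.9025 calculation above generalizes directly to any number of independent comparisons; the following few lines work it out for several values of k, assuming, as the chapter does, independent comparisons each tested at α = 0.05.

```python
# Probability of at least one false positive among k independent
# comparisons, each tested at level alpha with all nulls true:
# FWER = 1 - (1 - alpha)**k, the generalization of the 0.9025 example.
alpha = 0.05
for k in (1, 2, 5, 10, 20, 100):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:>3} comparisons: P(at least one significant by chance) = {fwer:.3f}")
# k = 2 gives 1 - 0.9025 = 0.0975, matching the chapter's "about 10%".
```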
  • The Problem of Multiple Testing and Its Solutions for Genom-Wide Studies
    How to cite: Gyorffy B, Gyorffy A, Tulassay Z. The problem of multiple testing and its solutions for genome-wide studies. Orv Hetil, 2005;146(12):559-563.
    ABSTRACT: Even if there is no real change, the traditional p = 0.05 threshold causes 5% of the investigated tests to be reported as significant. Multiple testing corrections have been developed to solve this problem. Here the authors describe the one-step (Bonferroni), multi-step (step-down and step-up) and graphical methods. However, sometimes a correction for multiple testing creates more problems than it solves: the universal null hypothesis is of little interest, the exact number of investigations to be adjusted for cannot be determined, and the probability of type II error increases. For these reasons the authors suggest not performing multiple testing corrections routinely. The calculation of the false discovery rate is a new method for genome-wide studies. Here the p value is substituted by the q value, which also shows the level of significance. The q value belonging to a measurement is the proportion of false positive measurements when we accept it as significant. The authors propose using the q value instead of the p value in genome-wide studies.
    Free keywords: multiple testing, Bonferroni correction, one-step, multi-step, false discovery rate, q-value
    List of abbreviations: FDR: false discovery rate; FWER: familywise error rate; SNP: Single Nucleotide Polymorphism; PFP: proportion of false positives
    INTRODUCTION: A common feature in all of the 'omics studies is the inspection of a large number of simultaneous measurements in a small number of samples.
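    As an illustration of the q-value idea described above, the sketch below computes Benjamini-Hochberg adjusted p-values, one common way to obtain q-values; Storey's q-value, which the authors may also have in mind, additionally estimates the proportion of true nulls and is omitted here.

```python
import numpy as np

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (a simple form of q-value).

    q_(i) = min over j >= i of (m * p_(j) / j) for the i-th smallest p-value,
    so calling a test significant when q <= 0.05 keeps the expected proportion
    of false discoveries at or below 5%.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working back from the largest p-value
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0, 1)
    return q

print(bh_qvalues([0.001, 0.008, 0.039, 0.041, 0.042, 0.60, 0.74, 0.92]))
```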
  • Sequential Analysis Exercises
    Sequential analysis — Exercises:
    1. What is the practical idea at the origin of sequential analysis?
    2. In which cases is sequential analysis appropriate?
    3. What are the supposed advantages of sequential analysis?
    4. What are the values from the three-step sequential analysis in the following table used for?
       Grains examined   Accept   Refuse
       20                0        5
       40                2        7
       60                6        7
    5. Do I have alpha and beta risks when I use sequential analysis?
    (S. Grégoire, 5th ISTA seminar on statistics, August 1999)
    PRINCIPLE OF THE SEQUENTIAL ANALYSIS METHOD: The method originates from a practical observation. Some objects are so good or so bad that with a quick look we are able to accept or reject them. For other objects we need more information to know if they are good enough or not. In other words, we do not need to put the same effort into all the objects we control. As a result, we may save time or money if we decide on some objects at an early stage. The general basis of sequential analysis is to define, before the studies, sampling schemes which permit taking consistent decisions at different stages of examination. The aim is to compare the results of examinations made on an object to decisional limits in order to take a decision. At each stage of examination the decision is to accept the object if it is good enough (with a chosen probability), to reject the object if it is too bad (at a chosen probability), or to continue the study if we need more information before falling into one of the two categories above.
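    One way to read the three-step table in exercise 4 is as a staged acceptance-sampling plan: after 20, 40, and 60 grains, accept the lot if the cumulative number of defective grains is at or below the "accept" number, refuse it if the count is at or above the "refuse" number, and otherwise continue examining. The sketch below implements that reading; the interpretation of the columns is an assumption based on the table alone.

```python
# Three-stage plan from the exercise table:
#   grains examined | accept if defects <= | refuse if defects >=
#              20   |          0           |          5
#              40   |          2           |          7
#              60   |          6           |          7
# (at 60 grains the two numbers are adjacent, so a decision is forced)
PLAN = [(20, 0, 5), (40, 2, 7), (60, 6, 7)]

def inspect(grain_is_defective):
    """Apply the staged plan to a sequence of per-grain results (True = defective)."""
    defects = 0
    examined = 0
    for n_stage, accept_no, refuse_no in PLAN:
        while examined < n_stage:
            defects += bool(grain_is_defective[examined])
            examined += 1
        if defects <= accept_no:
            return "accept", examined
        if defects >= refuse_no:
            return "refuse", examined
    return "undecided", examined   # cannot happen with this table

print(inspect([False] * 60))                  # clean lot: accept after 20 grains
print(inspect([True] * 6 + [False] * 54))     # bad lot: refuse after 20 grains
```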
  • Simultaneous Confidence Intervals for Comparing Margins of Multivariate
    Bernhard Klingenberg (Department of Mathematics and Statistics, Williams College, Williamstown, MA, United States) and Ville Satopää (Department of Statistics, The Wharton School of the University of Pennsylvania, Philadelphia, PA, United States). Computational Statistics and Data Analysis 64 (2013) 87–98. Received 17 August 2012; received in revised form 12 February 2013; accepted 13 February 2013; available online 6 March 2013. Keywords: Correlated binary responses; Familywise error rate; Marginal homogeneity; Multiple comparisons.
    Abstract: In many applications two groups are compared simultaneously on several correlated binary variables for a more comprehensive assessment of group differences. Although the response is multivariate, the main interest is in comparing the marginal probabilities between the groups. Estimating the size of these differences under strong error control allows for a better evaluation of effects than can be provided by multiplicity adjusted P-values. Simultaneous confidence intervals for the differences in marginal probabilities are developed through inverting the maximum of correlated Wald, score or quasi-score statistics. Taking advantage of the available correlation information leads to improvements in the joint coverage probability and power compared to straightforward Bonferroni adjustments. Estimating the correlation under the null is also explored. While computationally complex even in small dimensions, it does not result in marked improvements. Based on extensive simulation results, a simple approach that uses univariate score statistics together with their estimated correlation is proposed and recommended.
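    The abstract above compares its intervals to "straightforward Bonferroni adjustments"; for orientation, here is a minimal sketch of that baseline only: simultaneous Wald intervals for the k differences in marginal proportions, each built at level 1 − α/k. It is not the authors' proposed correlated score-statistic method.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_wald_cis(x1, n1, x2, n2, alpha=0.05):
    """Simultaneous Wald CIs for differences of k marginal proportions.

    x1, x2 : successes per binary variable in groups 1 and 2
    n1, n2 : sample sizes of the two groups
    Each of the k intervals uses level 1 - alpha/k (Bonferroni), the
    baseline the paper improves upon by exploiting correlations.
    """
    p1, p2 = np.asarray(x1) / n1, np.asarray(x2) / n2
    k = p1.size
    z = norm.ppf(1 - alpha / (2 * k))
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return np.column_stack([diff - z * se, diff + z * se])

# three correlated binary endpoints, 100 subjects per group
print(bonferroni_wald_cis([30, 45, 12], 100, [20, 40, 25], 100))
```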
  • Sequential and Multiple Testing for Dose-Response Analysis
    WALTER LEHMACHER, PhD (Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, Germany); MEINHARD KIESER, PhD (Department of Biometry, Dr. Willmar Schwabe Pharmaceuticals, Karlsruhe, Germany); LUDWIG HOTHORN, PhD (Department of Bioinformatics, University of Hannover, Germany). Drug Information Journal, Vol. 34, pp. 591–597, 2000. Copyright 2000 Drug Information Association Inc.
    For the analysis of dose-response relationships under the assumption of ordered alternatives, several global trend tests are available. Furthermore, there are multiple test procedures which can identify doses as effective or even as minimally effective. In this paper it is shown that the principles of multiple comparisons and interim analyses can be combined in flexible and adaptive strategies for dose-response analyses; these procedures control the experimentwise error rate.
    Key Words: Dose-response; Interim analysis; Multiple testing
    INTRODUCTION: THE FIRST QUESTION in dose-response analysis is whether a monotone trend across dose groups exists. This leads to global trend tests, where the global level (familywise error rate in a weak sense) is controlled. In clinical research, one is additionally interested in which doses are effective, which minimal dose is effective, or which dose steps are effective, in order to substantiate … rate in a strong sense) are necessary. The global and multiplicity aspects are also reflected by the actual International Conference for Harmonization Guidelines (1) for dose-response analysis. Appropriate order restricted tests can be confirmatory for demonstrating a global trend and suitable multiple procedures can identify a recommended dose. For the global hypothesis, there are several suitable trend tests.
  • P and Q Values in RNA Seq the Q-Value Is an Adjusted P-Value, Taking in to Account the False Discovery Rate (FDR)
    The q-value is an adjusted p-value, taking into account the false discovery rate (FDR). Applying an FDR correction becomes necessary when we're measuring thousands of variables (e.g. gene expression levels) from a small sample set (e.g. a couple of individuals). A p-value of 0.05 implies that we are willing to accept that 5% of all tests will be false positives. An FDR-adjusted p-value (aka a q-value) of 0.05 implies that we are willing to accept that 5% of the tests found to be statistically significant (e.g. by p-value) will be false positives. Such an adjustment is necessary when we're making multiple tests on the same sample. See, for example, http://www.totallab.com/products/samespots/support/faq/pq-values.aspx. -HomeBrew-
    What are p-values? The object of differential 2D expression analysis is to find those spots which show an expression difference between groups, thereby signifying that they may be involved in some biological process of interest to the researcher. Due to chance, there will always be some difference in expression between groups. However, it is the size of this difference in comparison to the variance (i.e. the range over which expression values fall) that will tell us if this expression difference is significant or not. Thus, if the difference is large but the variance is also large, then the difference may not be significant. On the other hand, a small difference coupled with a very small variance could be significant.
  • Dataset Decay and the Problem of Sequential Analyses on Open Datasets
    FEATURE ARTICLE, META-RESEARCH: Dataset decay and the problem of sequential analyses on open datasets. WILLIAM HEDLEY THOMPSON, JESSEY WRIGHT, PATRICK G BISSETT AND RUSSELL A POLDRACK.
    Abstract: Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
    Introduction: In recent years, there has been a push to make the scientific datasets associated with published papers openly available to other researchers (Nosek et al., 2015). Making data open will allow other researchers to both reproduce published analyses and ask new questions of existing datasets (Molloy, 2011; Pisani et al., 2016). The ability to explore pre-existing datasets in new ways should make research more efficient and has the potential to yield new discoveries (Weston et al., 2019). The availability of open datasets will increase over time as funders mandate and reward data …
    … sequential procedures correct for the latest in a non-simultaneous series of tests. There are several proposed solutions to address multiple sequential analyses, namely α-spending and α-investing procedures (Aharoni and Rosset, 2014; Foster and Stine, 2008), which strictly control the false positive rate. Here we will also propose a third, α-debt, which does not maintain a constant false positive rate but allows it to grow controllably. Sequential correction procedures are harder to implement than simultaneous procedures, as they require keeping track of the total number of tests that have been performed by others.
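    As a minimal illustration of the α-spending idea mentioned above, the sketch below allocates a geometrically shrinking share of the overall α to each successive analysis of a shared dataset, so the total never exceeds the budget no matter how many re-analyses accumulate. This is one simple spending sequence for illustration, not the α-debt procedure the article proposes.

```python
def alpha_spending_levels(alpha_total=0.05, n_tests=8):
    """Geometric alpha-spending: the k-th analysis on a shared dataset is
    tested at alpha_total / 2**k, so the summed levels never exceed
    alpha_total however many re-analyses accumulate."""
    return [alpha_total / 2 ** k for k in range(1, n_tests + 1)]

levels = alpha_spending_levels()
print(levels)               # [0.025, 0.0125, 0.00625, ...]
print(sum(levels) < 0.05)   # budget is respected: True
```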
  • A Partitioning Approach for the Selection of the Best Treatment
    A PARTITIONING APPROACH FOR THE SELECTION OF THE BEST TREATMENT. Yong Lin. A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, August 2013. Committee: John T. Chen (Advisor), Arjun K. Gupta, Wei Ning, Haowen Xi (Graduate Faculty Representative).
    ABSTRACT: To select the best treatment among several treatments is essentially a multiple comparisons problem. Traditionally, when dealing with multiple comparisons, there is one main argument: with multiplicity adjustment or without adjustment. If a multiplicity adjustment is made, such as the Bonferroni method, the simultaneous inference becomes too conservative. Moreover, in the conventional methods of multiple comparisons, such as Tukey's all pairwise multiple comparisons, although the simultaneous confidence intervals can be obtained, the best treatment cannot be distinguished efficiently. Therefore, in this dissertation, we propose several novel procedures using the partitioning principle to develop more efficient simultaneous confidence sets to select the best treatment. The method of partitioning principle for efficacy and toxicity for ordered treatments can be found in Hsu and Berger (1999). In this dissertation, by integrating the Bonferroni inequality, the partition approach is applied to unordered treatments for the inference of the best one. With the introduction of multiple comparison methodologies, we mainly focus on all pairwise multiple comparisons, because all the treatments should be compared when we select the best treatment. These procedures can be used with different data forms. Chapter 2 discusses how to utilize the procedure for dichotomous outcomes and the analysis of contingency tables, especially with Fisher's Exact Test.
  • Group-Sequential Analysis for Two Proportions
    NCSS Statistical Software, Chapter 710: Group-Sequential Analysis for Two Proportions. The corresponding sample size procedure, found in PASS Sample Size software, is Group-Sequential Tests for Two Proportions (Simulation).
    Introduction: This procedure is used to test the difference of two proportions in stages (sometimes called looks or interim analyses) using group-sequential methods. Unless the stage boundaries are entered directly, the stage boundaries are defined using a specified spending function. One- or two-sided tests may be performed, with the option of binding or non-binding futility boundaries. Futility boundaries are specified through a beta-spending function. Sample size re-estimation, based on current-stage sample sizes and parameter estimates, may also be obtained in this procedure. The spending functions available in this procedure are the O'Brien-Fleming analog, the Pocock analog, the Hwang-Shih-DeCani gamma family, and the power family. At each stage, the current and future boundaries are calculated based on the accumulated information proportion. Conditional and predictive power for future stages is also given. A group-sequential boundary and analysis graph is produced in this procedure. At each stage, stage-adjusted difference estimates, confidence intervals, and p-values are available. The probabilities of crossing future boundaries may also be assessed, using simulation. The format of the data for use in this procedure is three columns: one column for the response values, one column defining the two groups, and a third column defining the stage.
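    For readers unfamiliar with spending functions, the sketch below evaluates the cumulative type I error allocated at each interim look by the two most common functions named above, using the standard Lan-DeMets forms of the O'Brien-Fleming and Pocock analogs; NCSS's exact implementation may differ in detail.

```python
import math
from scipy.stats import norm

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2*Phi(z_{1-alpha/2} / sqrt(t))."""
    return 2 - 2 * norm.cdf(norm.ppf(1 - alpha / 2) / math.sqrt(t))

def pocock_spend(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1 + (math.e - 1) * t)

# cumulative alpha spent at three equally spaced looks; both reach 0.05 at t = 1
for t in (1 / 3, 2 / 3, 1.0):
    print(f"info fraction {t:.2f}:  OBF {obrien_fleming_spend(t):.4f}   Pocock {pocock_spend(t):.4f}")
```

    The O'Brien-Fleming analog spends almost no α early and saves it for the final look, while the Pocock analog spends it much more evenly, which is the practical difference between the two families listed in the chapter.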