Looking at the Other Side of Bonferroni

Caitlin McHugh
Department of Biostatistics, University of Washington
10 April 2012

The Paper

- Gordon et al. set out to prove that the Bonferroni testing procedure is not conservative if we simply look at it from a different perspective [2].

[Slide image: first page of the paper, transcribed here.]

The Annals of Applied Statistics 2007, Vol. 1, No. 1, 179–190. DOI: 10.1214/07-AOAS102. © Institute of Mathematical Statistics, 2007.

CONTROL OF THE MEAN NUMBER OF FALSE DISCOVERIES, BONFERRONI AND STABILITY OF MULTIPLE TESTING
By Alexander Gordon, Galina Glazko, Xing Qiu and Andrei Yakovlev¹
University of Rochester and University of North Carolina at Charlotte, University of Rochester, University of Rochester and University of Rochester

The Bonferroni multiple testing procedure is commonly perceived as being overly conservative in large-scale simultaneous testing situations such as those that arise in microarray data analysis. The objective of the present study is to show that this popular belief is due to overly stringent requirements that are typically imposed on the procedure rather than to its conservative nature. To get over its notorious conservatism, we advocate using the Bonferroni selection rule as a procedure that controls the per family error rate (PFER). The present paper reports the first study of stability properties of the Bonferroni and Benjamini–Hochberg procedures. The Bonferroni procedure shows a superior stability in terms of the variance of both the number of true discoveries and the total number of discoveries, a property that is especially important in the presence of correlations between individual p-values. Its stability and the ability to provide strong control of the PFER make the Bonferroni procedure an attractive choice in microarray studies.

1. Introduction.
A recent explosion of statistical publications dealing with multiple significance tests has been triggered by the needs of new high throughput technologies in biology such as gene expression microarrays [Dudoit, Shaffer and Boldrick (2003)]. This voluminous literature has been focused on various alternatives to the family-wise error rate (FWER) controlling procedures such as the classical Bonferroni method, the latter having been considered as too conservative for practical purposes.

The Bonferroni method was improved by Holm (1979) who proposed a step-down multiple testing procedure (MTP) that has more power but still controls the FWER at the same level. Furthermore, the Holm procedure is known to have strong optimality properties [Lehmann and Romano (2005a), Chapter 9]. Another attempt to gain more power by utilizing the dependence between test-statistics is due to Westfall and Young (1993) who designed a step-down resampling algorithm that provides strong control of the FWER and is consistent (i.e., the FWER approaches

[The first page of the paper ends here.]

Received November 2006; revised January 2007.
¹ Supported in part by NIH/NIGMS Grant GM075299 and Johnson & Johnson Discovery Concept Award.
Supplementary material available at http://imstat.org/aoas/supplements
Key words and phrases. Multiple testing, stability, Bonferroni method, microarray data.

Multiple Testing: Control the Type I Error Rate

- When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis tests.
- In categorical data analysis, one may want to test all pairwise combinations.
- How do we ensure we are properly controlling the number of false rejections?

2.5 Million Hypothesis Tests

[Figure]

The Family-Wise Error Rate (FWER)

- For a test, we choose a significance level, $\alpha$.
- We define the family-wise error rate as
  $$\mathrm{FWER} = P(\#\,\text{false rej} \ge 1).$$
- This is the probability of one or more false rejections of the null hypothesis, that is, the probability that there is at least one type I error.

The Per Family Error Rate (PFER)

- We define the per family error rate as
  $$\mathrm{PFER} = E(\#\,\text{false rej}).$$
- This is the expected number of false rejections of the null hypothesis.
- It can be thought of as the expected number of type I errors, or the expected number of false positives.
- $\mathrm{FWER} \le \mathrm{PFER}$: the probability of one or more false rejections is less than or equal to the expected number of false rejections (Markov's inequality applied to the number of false rejections).

Bonferroni Correction

- Say we perform $m$ hypothesis tests.
- To adjust our overall significance level, $\alpha$, simply divide by the number of tests performed, $m$, so our significance level becomes $\alpha^* = \alpha/m$.
- When the p-values are highly correlated, this adjustment is conservative when thinking about the FWER (the probability of at least one false positive).
- Note, however, that this adjustment is not conservative when thinking about the PFER (the expected number of false positives).
- This controls the probability of one or more false rejections (the FWER) at level $\alpha$.

Bonferroni does control the PFER!

- Bonferroni correction controls $E(\#\,\text{false rej}) = \mathrm{PFER}$ as well as $P(\#\,\text{false rej} \ge 1) = \mathrm{FWER}$ [3].
- When controlling the FWER, we have an extremely small per-test threshold (think 0.05/1,000,000).
- When considering the PFER, the level $\gamma$ is somewhere between 0 and $m$, the number of tests we are performing, and we reject when $p_i \le \gamma/m$.
- Call $\mathrm{Bonf}^{\alpha}$ the classical Bonferroni procedure, and denote by $\mathrm{Bonf}^{\gamma}$ the Bonferroni procedure that controls the PFER, $0 \le \gamma \le m$.

What is $\mathrm{Bonf}^{\gamma}$?

- $\mathrm{Bonf}^{\gamma}$ controls the expected number of false positives at level $\gamma$.
- $\mathrm{PFER} \le (m_0/m)\gamma$, where $m_0$ is the number of true null hypotheses among the $m$ hypotheses considered.
- We calculate
  $$\begin{aligned}
  \mathrm{PFER} &= E(\#\,\text{false rej}) \\
  &= E\Big(\sum_{i \in T} I\{p_i \le \gamma/m\}\Big) \\
  &= \sum_{i \in T} P(p_i \le \gamma/m) \\
  &\le (m_0/m)\gamma \le \gamma,
  \end{aligned}$$
  where $T$ is the set of indices of the true null hypotheses and $p_i$ are the observed p-values. The third line follows from linearity of expectation, and the first inequality uses $P(p_i \le t) \le t$ for every true null.

Is $\mathrm{Bonf}^{\gamma}$ really that cool?

- If the p-values of the true null hypotheses are uniformly distributed on [0,1], $\mathrm{Bonf}^{\gamma}$ controls the PFER at level $(m_0/m)\gamma$, since then $\mathrm{PFER} = (m_0/m)\gamma$ exactly.
- Since $\mathrm{FWER} \le \mathrm{PFER}$, $\mathrm{Bonf}^{\gamma}$ controls the FWER at level $(m_0/m)\gamma$ also.
- This controls the probability of one or more false rejections.
- What if we controlled the expected proportion of false rejections, known as the false discovery rate (FDR)?

Benjamini-Hochberg (BH) Correction

- To adjust our overall significance level, $\alpha$, simply divide by the number of tests performed and multiply by the rank of the p-value, so our significance level becomes $\alpha i/m$ for $p_{(i)}$, the $i$-th smallest p-value.
- From the ordered p-values, we start at the largest and work down until we find $i$ such that $p_{(i)} \le \alpha i/m$; call this particular $i$ value $k$.
- We then reject all $H_{(i)}$, $i \le k$.
- This controls the expected proportion of false rejections, the FDR [1].

A simple example

- Assume we produce p-values for 5 different tests: 0.0004, 0.0015, 0.0095, 0.0254, 0.0450.
- $\alpha = 0.05$.
- Using Bonferroni correction, $\alpha^* = \alpha/5 = 0.05/5 = 0.01$.
- Using the BH correction, $\alpha_{(i)} = 0.05\,i/5 = 0.01\,i$, so the largest p-value that satisfies the constraint $p_{(i)} \le 0.01\,i$ is $p_{(5)} = 0.0450 \le 0.01 \cdot 5 = 0.05$.
- No multiple-testing correction: all significant.
- Using Bonferroni correction: the 3 smallest p-values are significant.
- Using the BH correction: all significant.
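
The corrections in the example above, and the PFER calculation from the earlier slides, can be sketched in a few lines of Python. This is only an illustrative sketch and not code from the talk or the paper: the function names `bonferroni_reject`, `bh_reject` and `mean_false_rejections` are hypothetical, and the simulation settings (m = 100, m0 = 80, γ = 2, independent uniform null p-values) are assumptions chosen for illustration.

```python
import random


def bonferroni_reject(pvals, gamma):
    """Reject H_i when p_i <= gamma / m.

    gamma = alpha gives the classical Bonferroni rule; any 0 <= gamma <= m
    gives the PFER-controlling rule the slides call Bonf^gamma.
    """
    m = len(pvals)
    return [p <= gamma / m for p in pvals]


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank i with p_(i) <= alpha * i / m (k = 0 rejects nothing)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k = rank  # remember the largest rank that passes
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


def mean_false_rejections(m, m0, gamma, reps, seed=0):
    """Monte Carlo estimate of the PFER of Bonf^gamma, assuming the m0
    true-null p-values are independent Uniform(0, 1). Only the true nulls
    can produce false rejections, so only they are simulated."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        total += sum(rng.random() <= gamma / m for _ in range(m0))
    return total / reps


pvals = [0.0004, 0.0015, 0.0095, 0.0254, 0.0450]
print(sum(bonferroni_reject(pvals, 0.05)))  # 3 rejections
print(sum(bh_reject(pvals, 0.05)))          # 5 rejections
# Should land near (m0 / m) * gamma = 1.6 under the stated assumptions.
print(mean_false_rejections(m=100, m0=80, gamma=2.0, reps=2000))
```

Passing `gamma = 0.05` to `bonferroni_reject` reproduces the classical rule, while a larger `gamma` (for example 1 or 2) trades a higher expected number of false positives for more rejections.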
Conclusions

Via simulation, I will show that the $\mathrm{Bonf}^{\gamma}$ procedure is just as powerful as the BH procedure, when $\gamma$ is chosen appropriately.

References

[1] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57(1):289–300, 1995.
[2] Alexander Gordon, Galina Glazko, Xing Qiu, and Andrei Yakovlev. Control of the mean number of false discoveries, Bonferroni and stability of multiple testing. The Annals of Applied Statistics, 1(1):179–190, 2007.
[3] Mei-Ling Ting Lee. Analysis of Microarray Gene Expression Data. Kluwer Academic, Boston, Massachusetts, 2004.
