Discovering the False Discovery Rate

Yoav Benjamini, Tel Aviv University
www.math.tau.ac.il/~ybenja
October 2005, Louvain

Outline of Lectures
1. Discovering the False Discovery Rate (FDR)
2. FDR "testimation" and model selection
3. Seminar Talk: FDR confidence intervals
4. Multiplicity issues in genetic research

Outline of this Talk
• The multiplicity problem
• The FDR criterion and variations
• The linear step-up procedure
• Other procedures
• Open problems

The multiplicity problem - pairwise comparisons
A standard example: Erdman (1946), Steel and Torrie (1960). The data consist of six groups of five measurements each of the nitrogen content of red clover plants, the groups inoculated with different strains of Rhizobium bacteria. Used to explain ANOVA in the SAS/Stat manuals (1985, 1988), Holland (1987), and Rom and Holland (1995).
Number of comparisons in the example: 6*(6-1)/2 = 15.
Traditional approach: protect against the possible inflation of the type I error rate by controlling the probability of making even one error, i.e. control the familywise error rate (FWE) (Tukey '53).

The multiplicity problem - formulation
1. The null hypotheses tested are H1, H2, …, Hm. Let hi = 0 indicate that Hi is true and hi = 1 that it is not. Thus m0 = Σ(1 − hi) of the m hypotheses are true; we do not know which ones are true, or even their number.
2. The result of any testing procedure is Ri, i = 1, 2, …, m: Ri = 1 if Hi is rejected, 0 if not. Let Vi = 1 if Ri = 1 but Hi is true (a type I error was made), and 0 otherwise. Then R = ΣRi is the number of hypotheses rejected, and V = ΣVi the number rejected in error.
3. The multiplicity problem: testing each hypothesis at level α gives Prob(Vi ≥ 1) = α for each i, yet FWE = Prob(V ≥ 1) may be much larger.

Old and trusted solutions
If we test each hypothesis separately at level α_BON, then
E(V) = E(ΣVi) = ΣE(Vi) ≤ m0·α_BON ≤ m·α_BON.
So to assure E(V) ≤ α we may use α_BON = α/m.
This also assures P(V ≥ 1) ≤ α, because
E(V) = 0·Pr(V=0) + 1·Pr(V=1) + 2·Pr(V=2) + … + m·Pr(V=m)
     ≥ 1·Pr(V=1) + 1·Pr(V=2) + … + 1·Pr(V=m) = Pr(V ≥ 1).
So, when using α_BON = α/m for the individual tests, FWE = Prob(V ≥ 1) ≤ E(V) ≤ α. (Are any conditions needed?)
(1) This is the Bonferroni multiple testing procedure.

More old and trusted solutions
If the test statistics are independent and we test each hypothesis separately at level α_SID, then
Prob(V ≥ 1) = 1 − Prob(V = 0) = 1 − (1 − α_SID)^m0 ≤ 1 − (1 − α_SID)^m ≤ α.
So to assure Prob(V ≥ 1) ≤ α we may use α_SID = 1 − (1 − α)^(1/m).
(2) This is Sidak's multiple testing procedure.
Note: if m0 = m the inequalities become equalities.

More old and trusted solutions
Idea: use the dependency structure to get a better test. How much better?
α_SID = 1 − (1 − α)^(1/m) ≈ 1 − (1 − α/m − α²/(2m)) = α_BON + α²/(2m).
Even for small m (= 10) the gain is very little: .005116 instead of .005.
(3) Tukey's procedure for pairwise comparisons: same idea, but a larger gain.

Newer solutions
Stepwise procedures that make use of the individual observed p-values.
(4) Holm's procedure: let Pi be the observed p-value of the test for Hi, and order the p-values P(1) ≤ P(2) ≤ … ≤ P(m).
If P(1) ≤ α/m, reject H(1);
if P(2) ≤ α/(m−1), reject H(2);
continue this way, until for the first time P(k) > α/(m−(k−1)); then stop and reject no more.
Always: FWE ≤ α.
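The three classical FWE-controlling procedures above translate directly into code. Below is a minimal Python sketch (not part of the original slides); the p-values used in the example are arbitrary illustrative numbers:

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (Bonferroni)."""
    m = len(pvals)
    return np.asarray(pvals) <= alpha / m

def sidak_reject(pvals, alpha=0.05):
    """Reject H_i when p_i <= 1 - (1 - alpha)**(1/m) (Sidak, independent tests)."""
    m = len(pvals)
    return np.asarray(pvals) <= 1 - (1 - alpha) ** (1 / m)

def holm_reject(pvals, alpha=0.05):
    """Holm's step-down: compare P(k) with alpha/(m - (k-1)) and stop at the first failure."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)            # indices of p-values from smallest to largest
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha / (m - (k - 1)):
            reject[idx] = True
        else:
            break                        # stop and reject no more
    return reject

pvals = np.array([0.001, 0.008, 0.02, 0.04, 0.30])   # illustrative values only
print(bonferroni_reject(pvals))
print(sidak_reject(pvals))
print(holm_reject(pvals))
```

Holm's step-down procedure rejects at least as many hypotheses as Bonferroni while still guaranteeing FWE ≤ α.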
The multiplicity problem - its status
• Pairwise comparisons: controlling the FWE is the standard post hoc analysis (Tukey/Scheffé)
• Clinical trials: multiple arms, multiple looks, multiple endpoints (primary, secondary)
• Medical research (NEJM (Friday), other journals)
• Biostatistical research, epidemiology (Ottenbacher '98)
• Genetic research (next week)
• Science at large
• Why?

Behavior genetics
• Study the genetics of behavioral traits: hearing, sight, smell, alcoholism, locomotion, fear, exploratory behavior
• These are complex traits
• Compare inbred strains, crosses, knockouts, …

Example: exploratory behavior
NIH: Phenotyping Mouse Behavior - high-throughput screening of mutant mice.
Behavior tracking yields stops, movement segments, and a velocity profile.
Comparison between 8 inbred strains of mice.
Dr. Ilan Golani (TAU), Dr. Elmer (MPRC), Dr. Kafkafi (NIDA).

Significance of 8-strain differences

Behavioral Endpoint               Mixed-model Prob. (p-value)
Lingering Time                    0.0029
# Progression segments            0.0068
Median Turn Radius (scaled)       0.0092
Time away from wall               0.0108
Distance traveled                 0.0144
Acceleration                      0.0146
# Excursions                      0.0178
Time to half max speed            0.0204
Max speed wall segments           0.0257
Median Turn rate                  0.0320
Spatial spread                    0.0388
Lingering mean speed              0.0588
Homebase occupancy                0.0712
# stops per excursion             0.1202
Stop diversity                    0.1489
Length of progression segments    0.5150
Activity decrease                 0.8875

The slide marks two cutoffs on this table: the Bonferroni cutoff .05/17 = .0029 and the unadjusted cutoff .05.
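As an aside (not on the slides), the two cutoffs marked on the table can be checked directly from the 17 listed p-values; a minimal Python sketch:

```python
# p-values of the 17 behavioral endpoints, copied from the table above
pvals = [0.0029, 0.0068, 0.0092, 0.0108, 0.0144, 0.0146, 0.0178, 0.0204,
         0.0257, 0.0320, 0.0388, 0.0588, 0.0712, 0.1202, 0.1489, 0.5150, 0.8875]

alpha = 0.05
m = len(pvals)

unadjusted = sum(p <= alpha for p in pvals)       # per-comparison testing at .05
bonferroni = sum(p <= alpha / m for p in pvals)   # FWE control: cutoff .05/17 = .0029

print("Significant without adjustment:", unadjusted)   # 11
print("Significant after Bonferroni:  ", bonferroni)   # 1
```

Unadjusted testing flags 11 of the 17 endpoints, while the Bonferroni cutoff keeps only the first; this gap is what the FDR criterion introduced later in the talk aims to bridge.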
Current multiplicity problems - large problems
Revisiting Erdman (1946): two such groups of 6 strains plus one "general" control; number of comparisons in the study: 13*(13-1)/2 = 78.
Williams, Jones and Tukey (1999): results of educational progress assessed by testing; pairwise comparisons between 35 states in the US; number of comparisons: 35*(35-1)/2 = 595.
What should be reported?

High throughput screening of chemical compounds (proteomics) (with Frank Bretz)
• Purpose: at early stages of drug development, screen a large number of potential chemical compounds in order to find any interaction with a given class of compounds (a "hit").
• The classes may be substructures of libraries of compounds involving up to 10^5 members.
• Each potential compound's interaction with a class member is tested once and only once.

High throughput screening with microtiter plates
[Slide figure: layout of a microtiter plate. Plates i = 1, …, 74; rows j = 1, …, 8; columns k = 2, …, 11 hold the 10 x 8 potential compounds, with negative-control and positive-control columns on each plate.]

High throughput screening
• Step 1: Analyze the negative-control data (74 plates x 8 rows); get comparison values per plate and their s.e.
• Step 2: Conduct the individual comparisons, 74 plates x 80 potential compounds: each observation X_plate,row,col is compared with its plate's control values (C−_plate, C+_plate). Note the positive dependency within a plate, because the comparisons share the plate's control values.

The dilemma
• Not controlling for multiplicity, working at .05, we expect 74 plates x 80 compounds x .05 = 296 (statistical) discoveries possibly due just to noise.
• Controlling for multiplicity at .05, a single comparison has to be significant at .05 / (74 x 80) = 0.000008 to make it onto the list of discoveries.

Outline
• The multiplicity problem
• The FDR criterion and variations
• The linear step-up procedure
• Other procedures
• FDR testimation with confidence
• Open problems

The False Discovery Rate (FDR) criterion - Benjamini and Hochberg (1995)
R = # rejected hypotheses = # discoveries; V of these may be in error = # false discoveries.
The type I error in the entire study is measured by
Q = V/R if R > 0, and Q = 0 if R = 0,
i.e. the proportion of false discoveries among the discoveries (0 if none are found).
FDR = E(Q).
Does it make sense?

Does it make sense?
• Inspecting 100 features: 2 false ones among 50 discovered is bearable; 2 false ones among 4 discovered is unbearable. So this error rate is adaptive.
• The same argument holds when inspecting 10,000 features. So this error rate is scalable.
• If nothing is "real", controlling the FDR at level q guarantees Prob(V ≥ 1) = E(V/R) = FDR ≤ q.
• But otherwise Prob(V ≥ 1) ≥ FDR, so there is room for improving detection power.

Extensions
• Directional FDR: an error can also be that of declaring a negative parameter to be positive - a directional error. Let V+ be the number of directional errors and V0 the number of errors of rejecting zero values; then D-FDR = E((V+ + V0)/R).
• Weighted FDR: associated with each hypothesis i is a weight wi; W-FDR = E(Σ wi·Vi / Σ wi·Ri). The weights capture importance/price.

False Non-Discovery Rate (FNR) - Sarkar; Genovese and Wasserman
FNR = E(T / (m − R)) = E((m − m0 − (R − V)) / (m − R)),
where T = m − m0 − (R − V) is the number of false hypotheses that were not rejected.
How about: minimize FNR s.t. FDR ≤ q?
Why not: minimize FDR s.t. FNR ≤ q?

Other versions of false discovery rates
• Genovese and Wasserman emphasize the sample quantity V/R, the False Discovery Proportion (FDP), and, using a fixed rejection rule at threshold a, FDP(a) = V(a)/R(a).
• Storey emphasizes E(V/R | R > 0), the positive FDR (pFDR), and, using a fixed rejection rule at threshold a, pFDR(a) = E(V(a)/R(a) | R(a) > 0).
Neither can be controlled when nothing is real, yet they give additional perspectives on the FDR.

Recent interest in the tail probability of the FDP
Genovese and Wasserman, Lehmann and Romano, and van der Laan et al. also emphasize the tail probability of the False Discovery Proportion, Prob(V/R ≥ q).
This is an attractive criterion, but obviously, if we want Prob(V/R ≤ q) > 1 − β for some small β, it is a stricter criterion than controlling the expectation below q. We'll see what it means as we gain experience.

Historical Perspective (I)
FDR control in BH was motivated by a paper of Sorić (1987). In the direction in which we went:
• Prof. Victor (1997) brought to my attention his note from 1982 with independent previous efforts:
1. …
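To make the FDR criterion defined above concrete, here is a minimal simulation sketch (not from the lecture); the number of hypotheses, the share of true nulls, and the effect size are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def false_discovery_proportion(m=100, m0=80, effect=3.0, z_crit=1.96):
    """One simulated study: test m hypotheses, reject when |z| > z_crit, return Q = V/R (0 if R = 0)."""
    is_true_null = np.arange(m) < m0                    # the first m0 hypotheses are true nulls
    z = rng.normal(size=m) + np.where(is_true_null, 0.0, effect)
    reject = np.abs(z) > z_crit                         # unadjusted two-sided testing at level .05
    R = reject.sum()                                    # number of discoveries
    V = (reject & is_true_null).sum()                   # number of false discoveries
    return V / R if R > 0 else 0.0

# FDR = E(Q), estimated by averaging Q over many simulated studies
fdr_estimate = np.mean([false_discovery_proportion() for _ in range(2000)])
print(f"Estimated FDR of unadjusted testing at .05: {fdr_estimate:.3f}")
```

Setting m0 = m in the sketch reproduces the global-null case: Q then equals 1 whenever R > 0, so the average estimates Prob(V ≥ 1), which is the equality noted on the "Does it make sense?" slide.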