Discovering the False Discovery Rate
Discovering the False Discovery Rate
Yoav Benjamini, Tel Aviv University
www.math.tau.ac.il/~ybenja
October 2005, Louvain

Outline of Lectures
1. Discovering the False Discovery Rate (FDR)
2. FDR "testimation" and model selection
3. Seminar talk: FDR confidence intervals
4. Multiplicity issues in genetic research

Outline of this Talk
• The multiplicity problem
• The FDR criterion and variations
• The linear step-up procedure
• Other procedures
• Open problem

The multiplicity problem - pairwise comparisons
A standard example: Erdman (1946), Steel and Torrie (1960). The data consist of six groups of five measurements each of the nitrogen content of red clover plants, the groups inoculated with different strains of Rhizobium bacteria. Used to explain ANOVA in the SAS/STAT 1985 and 1988 manuals, Holland (1987), Rom and Holland (1995).
# of comparisons in the example: 6·(6−1)/2 = 15.
Traditional approach: protect against the possible inflation of the type I error rate by controlling the probability of making even one error, i.e. control the Family Wise Error-rate (FWE) (Tukey '53).

The multiplicity problem - formulation
1. The null hypotheses tested are H1, H2, …, Hm. Let hi = 0 indicate that Hi is true and hi = 1 that it is not. m0 = Σ(1 − hi) of the m hypotheses are true; we do not know which ones are true, or even their number.
2. The result of any testing procedure is Ri, i = 1, 2, …, m: Ri = 1 if Hi is rejected, 0 if not. Let Vi = 1 if Ri = 1 but Hi is true (a type I error was made), and 0 otherwise. R = ΣRi is the number of hypotheses rejected; V = ΣVi is the number of hypotheses rejected in error.
3. The multiplicity problem: Prob(Vi = 1) = α for each true hypothesis tested separately at level α, yet FWE = Prob(V ≥ 1) may be much larger.

Old and trusted solutions
If we test each hypothesis separately at level αBON,
E(V) = E(ΣVi) = ΣE(Vi) ≤ m0·αBON ≤ m·αBON.
So to assure E(V) ≤ α we may use αBON = α/m.
This also assures P(V ≥ 1) ≤ α, because
E(V) = 0·Pr(V=0) + 1·Pr(V=1) + 2·Pr(V=2) + … + m·Pr(V=m)
     ≥ 0 + 1·Pr(V=1) + 1·Pr(V=2) + … + 1·Pr(V=m) = Pr(V ≥ 1).
So, when using αBON = α/m for the individual tests,
FWE = Prob(V ≥ 1) ≤ E(V) ≤ α. (Are any conditions needed?)
(1) This is the Bonferroni multiple testing procedure.

More old and trusted solutions
If the test statistics are independent, and we test each hypothesis separately at level αSID,
Prob(V ≥ 1) = 1 − Prob(V = 0) = 1 − (1 − αSID)^m0 ≤ 1 − (1 − αSID)^m ≤ α.
So to assure Prob(V ≥ 1) ≤ α we may use αSID = 1 − (1 − α)^(1/m).
(2) This is Sidak's multiple testing procedure.
Note: if m0 = m the inequalities become equalities.

More old and trusted solutions
Idea: use the dependency structure to get a better test. How much better?
αSID = 1 − (1 − α)^(1/m) ≈ 1 − (1 − α/m − α²/(2m)) = αBON + α²/(2m).
Even for small m (= 10) there is very little gain: .005116 instead of .005.
(3) Tukey's procedure for pairwise comparisons: the same idea, but a larger gain.

Newer solutions
Stepwise procedures that make use of the individual observed p-values.
(4) Holm's procedure: let Pi be the observed p-value of the test of Hi. Order the p-values P(1) ≤ P(2) ≤ … ≤ P(m).
If P(1) ≤ α/m, reject H(1);
if P(2) ≤ α/(m−1), reject H(2);
continue in this way
until, for the first time, P(k) > α/(m−(k−1)); then stop and reject no more.
Always: FWE ≤ α.

The multiplicity problem - its status
• Pairwise comparisons: controlling the FWE is the standard post-hoc analysis (Tukey/Scheffé)
• Clinical trials: multiple arms, multiple looks, multiple endpoints (primary, secondary)
• Medical research (NEJM, other journals)
• Biostatistical research, epidemiology (Ottenbacher '98)
• Genetic research (next week)
• Science at large
• Why?

Behavior genetics
• Study the genetics of behavioral traits: hearing, sight, smell, alcoholism, locomotion, fear, exploratory behavior
• These are complex traits
• Compare inbred strains, crosses, knockouts, …

Example: exploratory behavior
NIH: Phenotyping Mouse Behavior - high-throughput screening of mutant mice. Behavior tracking yields stops, movement segments and a velocity profile. Comparing between 8 inbred strains of mice.
Dr. Ilan Golani, TAU; Dr. Elmer, MPRC; Dr. Kafkafi, NIDA.

Significance of the 8-strain differences (mixed-model p-values):

Behavioral endpoint              p-value
Prop. lingering time             0.0029
# Progression segments           0.0068
Median turn radius (scaled)      0.0092
Time away from wall              0.0108
Distance traveled                0.0144
Acceleration                     0.0146
# Excursions                     0.0178
Time to half-max speed           0.0204
Max speed, wall segments         0.0257
Median turn rate                 0.0320
Spatial spread                   0.0388
Lingering mean speed             0.0588
Homebase occupancy               0.0712
# Stops per excursion            0.1202
Stop diversity                   0.1489
Length of progression segments   0.5150
Activity decrease                0.8875

Cutoffs marked on the slide: Bonferroni, .05/17 = .0029; unadjusted, .05.

Current multiplicity problems - large problems
Revisiting Erdman (1946): two such groups of six, plus one "general" control (13 treatments in all).
# of comparisons in the study: 13·(13−1)/2 = 78.
Williams, Jones and Tukey (1999): results of educational progress assessed by testing; pairwise comparisons between 35 states in the US.
# of comparisons: 35·(35−1)/2 = 595.
What should be reported?
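The single-step and step-down procedures above can be made concrete by applying them to the 17 behavioral-endpoint p-values from the 8-strain comparison. This is a minimal sketch (variable and function names are illustrative, not from the slides):

```python
# Bonferroni, Sidak, and Holm applied to the 17 behavioral-endpoint
# p-values from the 8-strain mouse comparison.

pvals = [0.0029, 0.0068, 0.0092, 0.0108, 0.0144, 0.0146, 0.0178,
         0.0204, 0.0257, 0.0320, 0.0388, 0.0588, 0.0712, 0.1202,
         0.1489, 0.5150, 0.8875]
alpha = 0.05
m = len(pvals)

# Single-step cutoffs.
bonferroni_cutoff = alpha / m               # 0.05/17 ~ 0.0029
sidak_cutoff = 1 - (1 - alpha) ** (1 / m)   # only barely larger than alpha/m

n_unadjusted = sum(p <= alpha for p in pvals)
n_bonferroni = sum(p <= bonferroni_cutoff for p in pvals)

def holm(pvalues, alpha):
    """Holm's step-down procedure: compare P(k) with alpha/(m-k+1)
    for the sorted p-values and stop at the first failure."""
    n = len(pvalues)
    n_rejected = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p > alpha / (n - k + 1):
            break
        n_rejected += 1
    return n_rejected

print(n_unadjusted, n_bonferroni, holm(pvals, alpha))
```

On these data, 11 endpoints pass the unadjusted .05 cutoff, but only the top one survives Bonferroni; Holm's refinement rejects the same single endpoint, and the Sidak cutoff is barely wider than Bonferroni's, illustrating the "very little gain" point above.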
High-throughput screening of chemical compounds (proteomics) (with Frank Bretz)
• Purpose: at early stages of drug development, screen a large number of potential chemical compounds in order to find any interaction with a given class of compounds (a "hit").
• The classes may be substructures of libraries of compounds involving up to 10^5 members.
• Each potential compound's interaction with a class member is tested once and only once.

High-throughput screening with microtiter plates
[Figure: a microtiter plate layout; plates i = 1, …, 74, rows j = 1, …, 8, columns k = 2, …, 11 holding the 10×8 potential compounds, with the negative and positive controls in the outer columns.]

High-throughput screening
• Step 1: analyze the negative-control data (74 plates × 8 rows) to get comparison values per plate and their standard errors.
• Step 2: conduct the individual comparisons, 74 plates × 80 potential compounds, comparing each well value X(plate, row, col) with its plate's control values (C−, C+).
Note the positive dependency within a plate, because all comparisons on a plate share the same control values.

The dilemma
• Not controlling for multiplicity, working at .05, we expect 74 plates × 80 compounds × .05 = 296 (statistical) discoveries possibly due to noise alone.
• Controlling for multiplicity, working at .05, a single comparison has to be significant at .05/(74 × 80) ≈ 0.000008 to make it onto the list of discoveries.

Outline
• The multiplicity problem
• The FDR criterion and variations
• The linear step-up procedure
• Other procedures
• FDR testimation with confidence
• Open problem

The False Discovery Rate (FDR) criterion (Benjamini and Hochberg, 1995)
R = # rejected hypotheses = # discoveries; V of these may be in error = # false discoveries.
The type I error in the entire study is measured by
Q = V/R if R > 0; Q = 0 if R = 0,
i.e. the proportion of false discoveries among the discoveries (0 if none are found).
FDR = E(Q).
Does it make sense?
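The arithmetic of the screening dilemma, and the quantity Q just defined, can be sketched in a few lines (the counts are from the slides; the function name `fdp` is illustrative):

```python
# The screening dilemma in numbers, and the sample quantity Q = V/R.
n_tests = 74 * 80          # 5920 individual comparisons
alpha = 0.05

# Without multiplicity control: expected number of false discoveries
# under the complete null is n_tests * alpha.
expected_noise_hits = n_tests * alpha      # 296.0

# With Bonferroni control of the FWE at 0.05: per-test cutoff ~ 8e-06.
bonferroni_cutoff = alpha / n_tests

def fdp(n_false_discoveries, n_discoveries):
    """Q = V/R if R > 0, and Q = 0 if R = 0; the FDR is E(Q)."""
    if n_discoveries > 0:
        return n_false_discoveries / n_discoveries
    return 0.0

print(expected_noise_hits, bonferroni_cutoff, fdp(2, 50))
```

The two cutoffs bracket the dilemma: at .05 per test, noise alone is expected to produce 296 "discoveries"; at the Bonferroni level, hardly anything can clear the bar.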
Does it make sense?
• Inspecting 100 features: 2 false ones among 50 discovered - bearable; 2 false ones among 4 discovered - unbearable. So this error rate is adaptive.
• The same argument holds when inspecting 10,000. So this error rate is scalable.
• If nothing is "real", controlling the FDR at level q guarantees Prob(V ≥ 1) = E(V/R) = FDR ≤ q.
• But otherwise Prob(V ≥ 1) ≥ FDR, so there is room for improving detection power.

Extensions
• Directional FDR: an error can also be that of declaring a negative parameter to be positive - a directional error. V+ of the errors may be directional errors; V0 of the errors may be errors of rejecting zero values.
D-FDR = E((V+ + V0)/R).
• Weighted FDR: associated with each hypothesis i is a weight wi; the weights capture importance/price.
W-FDR = E(Σ wiVi / Σ wiRi).

False Non-discovery Rate (FNR) (Sarkar; Genovese and Wasserman)
FNR = E(T/(m − R)) = E({m − m0 − (R − V)}/(m − R)).
How about: minimize FNR subject to FDR ≤ q? Why not: minimize FDR subject to FNR ≤ q?

Other versions of false discovery rates:
• Genovese and Wasserman emphasize the sample quantity V/R, the False Discovery Proportion (FDP), and, using a fixed rejection rule at a, FDP(a) = V(a)/R(a).
• Storey emphasizes E(V/R | R > 0), the positive FDR (pFDR), and, using a fixed rejection rule at a, pFDR(a) = E(V(a)/R(a) | R(a) > 0).
Both cannot be controlled when nothing is real, yet they give more perspectives on the FDR.

Recent interest in the tail probability of the FDP
• Genovese and Wasserman, Lehmann and Romano, van der Laan et al. also emphasize the tail probability of the False Discovery Proportion, Prob(V/R ≥ q).
This is an attractive criterion. But obviously, if we want Prob(V/R ≤ q) > 1 − β for some small β, it is a stricter criterion than controlling the expectation below q.
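The sample-level analogues of these variations reduce to simple ratios of counts. A sketch, assuming hypothetical function names and the definitions above (D-FDR, W-FDR, FNR):

```python
# Sample-level analogues of the FDR variations; all names illustrative.

def d_fdp(v_directional, v_zero, r):
    """Directional FDP: (V+ + V0)/R, i.e. directional errors plus
    erroneous rejections of true zero values, per rejection."""
    return (v_directional + v_zero) / r if r > 0 else 0.0

def w_fdp(weights, false_rejections, rejections):
    """Weighted FDP: sum(w_i V_i) / sum(w_i R_i); the weights
    capture the importance (or price) of each hypothesis."""
    num = sum(w * v for w, v in zip(weights, false_rejections))
    den = sum(w * r for w, r in zip(weights, rejections))
    return num / den if den > 0 else 0.0

def fnp(m, m0, r, v):
    """False non-discovery proportion: T/(m - R), where
    T = (m - m0) - (R - V) counts false nulls left unrejected."""
    t = (m - m0) - (r - v)
    return t / (m - r) if m > r else 0.0

# Example counts: m = 100 hypotheses, m0 = 90 true nulls,
# R = 12 rejections of which V = 5 are erroneous.
print(fnp(100, 90, 12, 5))   # T = 10 - 7 = 3 missed, out of 88 kept
```
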
We'll see what it means as we gain experience.

Historical Perspective (I)
FDR control in BH was motivated by a paper of Sorić (1987). In the direction in which we went:
• Prof. Victor (1997) brought to my attention his note from 1982 with independent previous efforts:
1.