Testing Multiple Hypotheses and False Discovery Rate
Models, Inference, and Algorithms Primer

Manuel A. Rivas, Broad Institute

Suppose we wish to examine the association between a response and m different covariates. When m tests are performed, the aim is to decide which of the null hypotheses should be rejected.

                        Not flagged   Flagged
H0 (true null)               A           B        m0
H1 (true alternative)        C           D        m1
                           m - K         K         m

This table shows the possibilities when m tests are performed and K are flagged as requiring further attention: m0 is the number of true nulls, m1 is the number of true alternatives, B is the number of type I errors, and C is the number of type II errors.

Each of these quantities is unknown. The aim is to select a rule on the basis of some criterion, and this in turn determines K. To illustrate the multiple testing problem we focus on genome-wide association studies (GWAS), where for each genetic variant we typically test the null hypothesis

H_0 : \beta = 0,

i.e. the effect of the genetic variant is 0. In a single-test situation the historical emphasis has been on control of the type I error rate (false positives). In a multiple testing situation there are a variety of criteria that may be considered:

Frequentist analysis
1. Bonferroni method
2. Sidák correction
3. Benjamini and Hochberg (FDR)
4. Storey (FDR)

Bayesian analysis
1. Bayesian Bonferroni-type correction
2. Mixture models
3. Matthew Stephens’ FDR approach

Frequentist analysis

Family-wise error rate (FWER): the probability of making at least one type I error,

P(B \ge 1 \mid H_1 = 0, \ldots, H_m = 0).

Bonferroni method

Let B_i be the event that the ith null is incorrectly rejected, so that the event that at least one null is incorrectly rejected is the union \bigcup_{i=1}^{m} B_i. With a common level \alpha^* for each test, the family-wise error rate is

\alpha_F = P(B \ge 1 \mid H_1 = 0, \ldots, H_m = 0)
         = P\Big(\bigcup_{i=1}^{m} B_i \,\Big|\, H_1 = 0, \ldots, H_m = 0\Big)
         \le \sum_{i=1}^{m} P(B_i \mid H_1 = 0, \ldots, H_m = 0)
         = m \alpha^*.

The Bonferroni method takes

\alpha^* = \alpha_F / m

to give FWER \le \alpha_F.

This is the preferred approach for GWAS: to control the FWER at alpha_F = 0.05 with m = 1,000,000 tests, we would take

\alpha^* = 0.05 / 1{,}000{,}000 = 5 \times 10^{-8}.
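As a minimal sketch (not from the slides; the function names are mine), the per-test Bonferroni level and Bonferroni-adjusted p-values can be computed directly:

```python
import numpy as np

def bonferroni_threshold(alpha_f: float, m: int) -> float:
    """Per-test significance level that controls the FWER at alpha_f."""
    return alpha_f / m

def bonferroni_adjust(pvalues: np.ndarray) -> np.ndarray:
    """Bonferroni-adjusted p-values: multiply by m and cap at 1."""
    return np.minimum(pvalues * len(pvalues), 1.0)

# GWAS example from the text: alpha_F = 0.05 and m = 1,000,000 tests.
print(bonferroni_threshold(0.05, 1_000_000))  # 5e-08
```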

Sidák correction

This overcomes the conservatism introduced by the inequality above. If the tests are independent,

P(B \ge 1) = 1 - P(B = 0)
           = 1 - P\Big(\bigcap_{i=1}^{m} B_i^c\Big)
           = 1 - \prod_{i=1}^{m} P(B_i^c)
           = 1 - (1 - \alpha^*)^m.

Setting this equal to \alpha_F and solving gives

\alpha^* = 1 - (1 - \alpha_F)^{1/m}.

In GWAS, assuming the 1,000,000 tests were independent, this changes the p-value threshold slightly, to 5.13 \times 10^{-8}.
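A quick numerical check of the Sidák threshold, a sketch using the same assumed alpha_F and m as above:

```python
def sidak_threshold(alpha_f: float, m: int) -> float:
    """Per-test level under independence: 1 - (1 - alpha_F)^(1/m)."""
    return 1.0 - (1.0 - alpha_f) ** (1.0 / m)

print(sidak_threshold(0.05, 1_000_000))  # ~5.129e-08, i.e. the 5.13e-8 quoted above
```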

False Discovery Rate (FDR)

A simple way to overcome the conservative nature of FWER control is to increase \alpha_F. One measure used to calibrate a procedure is the expected number of false discoveries:

\text{EFD} = m_0 \times \alpha^* \le m \times \alpha^*,

where, recall, m_0 is the number of true nulls. For example, we could specify \alpha^* such that EFD \le 1 by choosing

\alpha^* = 1/m,

i.e. \alpha^* = 1 \times 10^{-6} for a GWAS with 1,000,000 markers.

We introduce the false discovery proportion (FDP), the proportion of incorrect rejections:

\text{FDP} = B / K,

where B is the number of type I errors and K is the number of tests flagged for additional attention. The false discovery rate (FDR) is the expected proportion of rejected nulls that are actually true:

\text{FDR} = E[\text{FDP}] = E[B/K \mid K > 0] \, P(K > 0).

Benjamini and Hochberg (1995) procedure

For independent p-values, each of which is uniform under the null:

1. Let P(1) ≤ P(2) ≤ ... ≤ P(m) denote the ordered p-values.
2. Assume we would like FDR control at alpha = 0.05.
3. Let l_i = iα/m and R = max{ i : P(i) < l_i }; reject the nulls corresponding to P(1), ..., P(R).

GWAS example:

1. Assume m = 1,000,000 independent tests.
2. Assume P(10) = 4.5e-7 and P(11) = 5.7e-7. Then 10 × 0.05/1,000,000 = 5e-7 and P(10) < 5e-7, whereas 11 × 0.05/1,000,000 = 5.5e-7 and P(11) > 5.5e-7.
3. Use the p-value threshold at P(10), i.e. reject the ten smallest p-values.
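A minimal sketch of this step-up rule in code (not from the slides; variable names are mine, and the comparison uses the usual non-strict inequality P(i) ≤ iα/m):

```python
import numpy as np

def benjamini_hochberg(pvalues: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Boolean mask of rejected nulls under BH FDR control at level alpha."""
    m = len(pvalues)
    order = np.argsort(pvalues)                 # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvalues[order] <= thresholds        # compare P_(i) with i*alpha/m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        r = np.max(np.where(below)[0])          # largest i with P_(i) <= i*alpha/m
        reject[order[: r + 1]] = True           # reject the r+1 smallest p-values
    return reject
```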

If the procedure is applied, then regardless of how many nulls are true (m_0) and regardless of the distribution of the p-values when the null is false,

\text{FDR} \le \frac{m_0}{m}\,\alpha \le \alpha,

so the FDR is controlled at \alpha.

[Figure: ordered p-values with the rejection thresholds implied by Bonferroni control at 5%, FDR control at 5%, and EFD = 1.]

Storey (2002)

Introduced the q-value,

q(t) = P(H = 0 \mid T > t).

For each observed statistic we can obtain an associated q-value, which tells us the proportion of false positives incurred at that thresholded statistic.
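A rough sketch of how q-values can be estimated from a vector of p-values; the lambda tuning parameter and the simple pi0 estimator are illustrative choices, not something prescribed by the slides:

```python
import numpy as np

def storey_qvalues(pvalues: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Crude q-value estimates in the spirit of Storey (2002)."""
    m = len(pvalues)
    # Estimate the proportion of true nulls from the p-values above lambda.
    pi0 = min(1.0, np.mean(pvalues > lam) / (1.0 - lam))
    order = np.argsort(pvalues)
    ranked = pvalues[order]
    # BH-style quantity scaled by pi0, made monotone from the largest p-value down.
    q = pi0 * m * ranked / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out
```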

Bayesian analysis

Bayes Factors

Defined as ratios of marginal likelihoods of the data under two models:

\text{Bayes Factor}_i = P(\text{Data} \mid M_0) \,/\, P(\text{Data} \mid M_1),

where model M_0 can be the null model. We apply the same procedure m times (to m genetic variants, for instance).

We combine the Bayes factor with prior probabilities:

\text{Posterior Odds}_i = \text{Bayes Factor}_i \times \text{Prior Odds}_i,

where

\text{Prior Odds}_i = \pi_{0i} / (1 - \pi_{0i}).
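For concreteness, a tiny sketch (with made-up numbers) that converts a Bayes factor and a prior probability of the null into a posterior probability of the null:

```python
def posterior_prob_null(bayes_factor: float, pi0: float) -> float:
    """Posterior P(H0) from BF = P(Data|M0)/P(Data|M1) and prior P(H0) = pi0."""
    prior_odds = pi0 / (1.0 - pi0)
    post_odds = bayes_factor * prior_odds
    return post_odds / (1.0 + post_odds)

# Data favouring the alternative 100:1 (BF = 0.01) against a sceptical prior pi0 = 0.99.
print(posterior_prob_null(0.01, 0.99))  # ~0.497
```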

Bayesian Bonferroni-type correction

If the nulls are independent a priori with \pi_{0i} = \pi_0 for i = 1, ..., m, then the prior probability that all nulls are true is

\Pi_0 = P(H_1 = 0, \ldots, H_m = 0) = \pi_0^m.

Suppose that we wish to fix the prior probability that all of the nulls are true at \Pi_0. We can then fix \pi_{0i} = \Pi_0^{1/m}.
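A quick numeric illustration of this correction; the choice Pi0 = 0.5 is just an example:

```python
# Fix the prior probability that ALL m nulls are true at Pi0.
m = 1_000_000
Pi0 = 0.5
pi0_per_test = Pi0 ** (1.0 / m)
print(pi0_per_test)  # ~0.9999993: each individual null needs a prior probability very close to 1
```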

Mixture model

Fit a mixture model to the test results in order to estimate common parameters, such as the proportion of null tests, for example using a Gibbs sampler.
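A toy two-group Gibbs sampler in this spirit; the specific model (z-scores, a fixed alternative scale tau, and a Beta prior on the null proportion) is my assumption for illustration, not something specified in the slides:

```python
import numpy as np
from scipy.stats import norm

def gibbs_pi0(z, tau=3.0, n_iter=2000, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for the proportion of nulls, pi0, in a two-group mixture of z-scores.

    Assumed model: z_j ~ N(0, 1) under the null, z_j ~ N(0, 1 + tau^2) otherwise,
    with pi0 ~ Beta(a, b).
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    f0 = norm.pdf(z, scale=1.0)                    # null density at each z_j
    f1 = norm.pdf(z, scale=np.sqrt(1.0 + tau**2))  # alternative density
    pi0, draws = 0.5, np.empty(n_iter)
    for it in range(n_iter):
        # Sample latent indicators (True = non-null) given the current pi0.
        p_null = pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
        non_null = rng.random(len(z)) > p_null
        # Conjugate Beta update for pi0 given the indicators.
        n1 = non_null.sum()
        pi0 = rng.beta(a + len(z) - n1, b + n1)
        draws[it] = pi0
    return draws
```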

Matthew Stephens’ FDR approach

Open source R package: ashr, http://github.com/stephens999/ashr

Three key ideas:

1. Assumes the distribution of effects is unimodal, with a mode at 0.
2. Takes as input two numbers per test: i) the effect size estimate and ii) its corresponding standard error.
3. Reports the local false sign rate, the probability of getting the sign of the effect wrong.

Model outline

The data correspond to effect size estimates and their (estimated) standard errors,

\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_m), \qquad \hat{s} = (\hat{s}_1, \ldots, \hat{s}_m).

The goal is to compute a posterior distribution

p(\beta \mid \hat{\beta}, \hat{s}) \propto p(\beta \mid \hat{s}) \, p(\hat{\beta} \mid \beta, \hat{s}),

where the first factor is the prior and the second is the likelihood.

For the prior p(\beta \mid \hat{s}), the key assumption is that the \beta_j are independent draws from a unimodal distribution (the “unimodal assumption”). It is modelled as a mixture of a point mass at 0 and zero-centered normal distributions,

p(\beta \mid \hat{s}, \pi) = \pi_0\, \delta_0(\cdot) + \sum_{k=1}^{K} \pi_k\, N(\cdot \,; 0, \sigma_k^2),

where the mixture proportions \pi_k are estimated and the mixture component standard deviations \sigma_k lie on a fixed grid.

For the likelihood,

p(\hat{\beta} \mid \beta, \hat{s}) = \prod_{j=1}^{m} N(\hat{\beta}_j \,;\, \beta_j, \hat{s}_j^2),

so the measurement precision of each estimate enters the likelihood through \hat{s}_j.


Local false discovery rate (“local FDR”):

\text{lfdr}_j := P(\beta_j = 0 \mid \hat{\beta}, \hat{s}, \hat{\pi}),

the probability, given the observed data, that effect j would be a false discovery if we were to declare it a discovery.

Some statisticians argue that the lfdr is inappropriate because the point null hypothesis \beta_j = 0 is often implausible.

Averaging the lfdr over a set of tests \Gamma gives an estimate of the average error rate over that subset of observations, for example if all tests in \Gamma were declared significant:

\widehat{\text{FDR}}(\Gamma) := \frac{1}{|\Gamma|} \sum_{j \in \Gamma} \text{lfdr}_j.
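To make this concrete, here is a small sketch of the lfdr and the averaged FDR estimate under the point-mass plus zero-centered-normal mixture prior and normal likelihood above, assuming the mixture weights pi and grid standard deviations sigma have already been fitted (an illustration, not the ashr implementation):

```python
import numpy as np
from scipy.stats import norm

def lfdr(bhat, shat, pi, sigma):
    """Posterior P(beta_j = 0 | data) under a fitted point-mass + normal mixture prior.

    pi    : mixture weights, with pi[0] the weight of the point mass at zero
    sigma : component standard deviations, with sigma[0] = 0 for the point mass
    """
    bhat, shat = np.asarray(bhat, float), np.asarray(shat, float)
    # Marginal density of bhat_j under component k is N(0, shat_j^2 + sigma_k^2).
    sd = np.sqrt(shat[:, None] ** 2 + np.asarray(sigma)[None, :] ** 2)
    post = np.asarray(pi)[None, :] * norm.pdf(bhat[:, None], loc=0.0, scale=sd)
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]

def fdr_estimate(lfdr_values):
    """Estimated FDR if every test in the given set were declared a discovery."""
    return float(np.mean(lfdr_values))
```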

Tukey stated:

All we know about the world teaches us that the effects of A and B are always different, in some decimal place, for any A and B.

Tukey therefore suggested asking: is the evidence strong enough to support a belief that the observed difference has the correct sign? This motivates the local false sign rate, the probability of getting the sign of the effect wrong:

\text{lfsr}_j := \min\Big[\, P(\beta_j \ge 0 \mid \hat{\pi}, \hat{\beta}, \hat{s}),\; P(\beta_j \le 0 \mid \hat{\pi}, \hat{\beta}, \hat{s}) \,\Big].

Gelman proposed focusing on “type S errors”, errors in sign, rather than traditional type I errors.
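A companion sketch for the lfsr under the same assumed fitted mixture; within each nonzero normal component the posterior for beta_j is itself normal, which gives the two sign probabilities directly:

```python
import numpy as np
from scipy.stats import norm

def lfsr(bhat, shat, pi, sigma):
    """Local false sign rate under a fitted point-mass + normal mixture prior.

    Assumes sigma[0] = 0 (point mass) and sigma[k] > 0 for k >= 1.
    """
    bhat, shat = np.asarray(bhat, float), np.asarray(shat, float)
    pi, sigma = np.asarray(pi, float), np.asarray(sigma, float)
    # Posterior component weights, as in the lfdr sketch above.
    sd_marg = np.sqrt(shat[:, None] ** 2 + sigma[None, :] ** 2)
    w = pi[None, :] * norm.pdf(bhat[:, None], loc=0.0, scale=sd_marg)
    w /= w.sum(axis=1, keepdims=True)
    # Within nonzero component k the posterior for beta_j is normal with
    # variance v = 1/(1/shat^2 + 1/sigma_k^2) and mean v * bhat / shat^2.
    v = 1.0 / (1.0 / shat[:, None] ** 2 + 1.0 / sigma[None, 1:] ** 2)
    mu = v * (bhat[:, None] / shat[:, None] ** 2)
    p_neg = norm.cdf(0.0, loc=mu, scale=np.sqrt(v))            # P(beta_j <= 0 | component k)
    p_le0 = w[:, 0] + np.sum(w[:, 1:] * p_neg, axis=1)         # point mass counts toward both signs
    p_ge0 = w[:, 0] + np.sum(w[:, 1:] * (1.0 - p_neg), axis=1)
    return np.minimum(p_le0, p_ge0)
```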

Other results and observations were covered, but not in this primer:
1. Computation/implementation details
2. Comparisons to other approaches

Multiple tests of the same null hypothesis

In genetics we may be interested in applying 1) an additive model, 2) a dominant model, and 3) a recessive model to the same variant. How should we correct?

Bonferroni? Too conservative, since the three tests of the same null are highly correlated.

Minimum p-value? Not a valid p-value, since it is not uniform under the null (the null distribution can be modified, though).

Permutation: permuting the phenotype yields the null distribution of the minimum p-value across the three models, which gives a valid corrected p-value (a sketch follows below).
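A minimal sketch of the permutation approach for a single variant; the use of Pearson correlation tests for the three codings and all function names are illustrative assumptions (and each coding is assumed to be non-constant in the sample):

```python
import numpy as np
from scipy.stats import pearsonr

def min_p(genotype, phenotype):
    """Smallest p-value over additive (0/1/2), dominant and recessive codings."""
    codings = [genotype.astype(float),
               (genotype > 0).astype(float),
               (genotype == 2).astype(float)]
    return min(pearsonr(c, phenotype)[1] for c in codings)

def permutation_pvalue(genotype, phenotype, n_perm=1000, seed=0):
    """Corrected p-value for the minimum p-value, via phenotype permutation."""
    rng = np.random.default_rng(seed)
    observed = min_p(genotype, phenotype)
    null = [min_p(genotype, rng.permutation(phenotype)) for _ in range(n_perm)]
    return (1 + sum(p <= observed for p in null)) / (n_perm + 1)
```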

Bayesian model averaging is another option.

Hoeting, Madigan, Raftery and Volinsky, Statistical Science, 1999.