Optimal False Discovery Rate Control for Large Scale
Multiple Testing with Auxiliary Information
Hongyuan Cao1, Jun Chen2 and Xianyang Zhang3
Abstract
Large-scale multiple testing is a fundamental problem in high dimensional statistical
inference. It is increasingly common that various types of auxiliary information, reflect-
ing the structural relationship among the hypotheses, are available. Exploiting such
auxiliary information can boost statistical power. To this end, we propose a framework
based on a two-group mixture model with varying probabilities of being null for dif-
ferent hypotheses a priori, where a shape-constrained relationship is imposed between
the auxiliary information and the prior probabilities of being null. An optimal rejection
rule is designed to maximize the expected number of true positives when average false
discovery rate is controlled. Focusing on the ordered structure, we develop a robust
EM algorithm to estimate the prior probabilities of being null and the distribution of
p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
arXiv:2103.15311v1 [stat.ME] 29 Mar 2021
1 Department of Statistics, Florida State University, Tallahassee, FL, U.S.A. 2 Division of Biomedical Statistics, Mayo Clinic, Rochester, MN, U.S.A. 3 Department of Statistics, Texas A&M University, College Station, TX, U.S.A. The authors are ordered alphabetically. Correspondence should be addressed to Xianyang Zhang ([email protected]).
Keywords: EM algorithm, False discovery rate, Isotonic regression, Local false discovery rate, Multiple testing, Pool-Adjacent-Violators algorithm.
1 Introduction
Large-scale multiple testing refers to the simultaneous testing of many hypotheses. Given a pre-specified significance level, family-wise error rate (FWER) control bounds the probability of making one or more false rejections, which can be unduly conservative in many applications. The false discovery rate (FDR) is the expected value of the false discovery proportion, defined as the number of false rejections divided by the total number of rejections.
Benjamini and Hochberg (BH) [5] proposed an FDR control procedure that sets adaptive thresholds for the p-values. It turns out that the actual FDR level of the BH procedure is the product of the proportion of null hypotheses and the pre-specified significance level. Therefore, the BH procedure can be overly conservative when the proportion of null hypotheses is far from one. To address this issue, [43] proposed a two-stage procedure (ST), which first estimates the proportion of null hypotheses and then uses the estimated proportion to adjust the threshold in the BH procedure at the second stage. From an empirical Bayes perspective, [17] proposed the notion of local FDR (Lfdr) based on the two-group mixture model. [45] developed a step-up procedure based on Lfdr and demonstrated its optimality from the compound decision viewpoint.
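As a concrete reference point, the BH step-up rule and its adaptive two-stage (Storey-type) variant described above can be sketched in a few lines. This is an illustrative implementation in our own notation (the function names `bh_procedure` and `storey_pi0` are ours), not code from the paper.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Estimate the null proportion from the p-values above the cutoff lam."""
    pvals = np.asarray(pvals, float)
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

def bh_procedure(pvals, alpha=0.05, pi0=1.0):
    """BH step-up procedure; with an estimated pi0 < 1 plugged in, this is
    the adaptive two-stage variant described in the text."""
    pvals = np.asarray(pvals, float)
    m = len(pvals)
    order = np.argsort(pvals)
    # step-up: reject the k smallest p-values for the largest k with
    # p_(k) <= k * alpha / (m * pi0)
    thresh = np.arange(1, m + 1) * alpha / (m * pi0)
    below = np.nonzero(pvals[order] <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below.max() + 1]] = True
    return reject
```

With `pi0=storey_pi0(pvals)` plugged into `bh_procedure`, the threshold line is raised by the factor 1/pi0-hat, recovering the adaptive behavior described above.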
The aforementioned methods are based on the premise that the hypotheses are exchange- able. However, in many scientific applications, particularly in genomics, auxiliary information regarding the pattern of signals is available. For instance, in differential expression analysis of
RNA-seq data, which tests for difference in the mean expression of the genes between condi- tions, the sum of read counts per gene across all samples could be the auxiliary data since it is informative of the statistical power [35]. In differential abundance analysis of microbiome
sequencing data, which tests for difference in the mean abundance of the detected bacterial species between conditions, the genetic divergence among species is important auxiliary information, since closely-related species usually have similar physical characteristics and tend to covary with the condition of interest [50]. In genome-wide association studies, the major objective is to test for association between the genetic variants and a phenotype of interest.
The minor allele frequency and the pathogenicity score of the genetic variants, which are informative of the statistical power and the prior null probability, respectively, are potential auxiliary data, which could be leveraged to improve the statistical power as well as enhance interpretability of the results.
Accommodating auxiliary information in multiple testing has recently been a very active research area. Many methods have been developed adapting to different types of structure among the hypotheses. The basic idea is to relax the p-value thresholds for hypotheses that are more likely to be alternative and tighten the thresholds for the other hypotheses so that the overall FDR level can be controlled. For example, [19] proposed to weight the p-values with different weights, and then apply the BH procedure to the weighted p-values. [23] developed a group BH procedure by estimating the proportion of null hypotheses for each group separately.
[34] generalized this idea by using the censored p-values (i.e., the p-values that are greater than a pre-specified threshold) to adaptively estimate the weights that can be designed to reflect any structure believed to be present. [25, 26] proposed the independent hypothesis weighting
(IHW) for multiple testing with covariate information. The idea is to use cross-weighting to achieve finite-sample FDR control. Note that the binning in IHW is only to operationalize the procedure and it can be replaced by the proposed EM algorithm below.
The above procedures can be viewed to some extent as different variants of the weighted-BH procedure. Another closely related method was proposed in [30], which iteratively estimates the p-value threshold using partially masked p-values. It can be viewed as a type of Knockoff procedure [? ] that uses the symmetry of the null distribution to estimate the false discovery proportion.
Along a separate line, Lfdr-based approaches have been developed to accommodate var- ious forms of auxiliary information. For example, [9] considered multiple testing of grouped hypotheses. The authors proposed an optimal data-driven procedure that uniformly improves the pooled and separate analyses. [44] developed an Lfdr-based method to incorporate spa- tial information. [40, 47] proposed EM-type algorithms to estimate the Lfdr by taking into account covariate and spatial information, respectively.
Other related works include [18], which considers two-group mixture models with side information; [13], which develops a method for estimating the constrained optimal weights for Bonferroni multiple testing; and [7], which proposes an FDR-controlling procedure based on covariate-dependent null probabilities.
In this paper, we develop a new method along the line of research on Lfdr-based approaches by adaptively estimating the prior probabilities of being null in Lfdr that reflect auxiliary information in multiple testing. The proposed Lfdr-based procedure is built on the optimal rejection rule as shown in Section 2.1 and thus is expected to be more powerful than the weighted-BH procedure when the underlying two-group mixture model is correctly specified.
Compared to existing work on Lfdr-based methods, our contributions are three-fold. (i) We outline a general framework for incorporating various forms of auxiliary information. This is achieved by allowing the prior probabilities of being null to vary across different hypotheses.
We propose a data-adaptive step-up procedure and show that it provides asymptotic FDR control when relevant consistent estimates are available. (ii) Focusing on the ordered structure, where auxiliary information generates a ranked list of hypotheses, we develop a new EM-type algorithm [12] to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. Under monotone constraint on the density function of p-values under the alternative hypothesis, we utilize the Pool-Adjacent-Violators
Algorithm (PAVA) to estimate both the prior probabilities of being null and the density function of p-values under the alternative hypothesis (see [20] for early work on this kind of problem). Due to the efficiency of PAVA, our method is scalable to large datasets arising in genomic studies. (iii) We prove asymptotic FDR control for our procedure and obtain some consistency results for the estimates of the prior probabilities of being null and the alternative density, which are of independent theoretical interest. Finally, to allow users to conveniently implement our method and reproduce the numerical results reported in Sections 6-7, we make our code publicly available at https://github.com/jchen1981/OrderShapeEM.
The problem we consider is related to but different from the one in [21, 33], where the authors seek the largest cutoff k such that one rejects the first k hypotheses while accepting the remaining ones. Their method thus always rejects an initial block of hypotheses. In contrast, our procedure allows researchers to reject the kth hypothesis while accepting the (k − 1)th hypothesis in the ranked list. In other words, we do not follow the order restriction strictly. Such flexibility can result in a substantial power increase when the order information is weak or only moderately informative, as observed in our numerical studies. Also see the discussions on monotonicity in
Section 1.1 of [40].
To account for potential mistakes in the ranked list, or to improve power by incorporating external covariates, alternative methods have been proposed in the literature. For example, [36] extends the fixed sequence method to allow more than one acceptance before stopping. [32] modifies AdaPT in [30] by giving analysts the power to enforce the ordered constraint on the final rejection set. Though aimed at addressing a similar issue, our method is motivated from the empirical Bayes perspective, and it is built on the two-group mixture model that allows the prior probabilities of being null to vary across different hypotheses. The implementation and theoretical analysis of our method are also quite different from those in
[32, 36].
Finally, it is also worth highlighting the differences with respect to the recent work [11], which is indeed closely related to ours. First of all, our Theorem 3.3 concerns two-group mixture models with a decreasing alternative density, while Theorem 3.1 in [11] focuses on a mixture of Gaussians. We generalize the arguments in [48] by considering a transformed class of functions to relax the boundedness assumption on the class of decreasing densities.
A careful inspection of the proof of Theorem 3.3 reveals that the techniques we develop are quite different from those in [11]. Second, we provide a more detailed empirical and theoretical analysis of the FDR-controlling procedure. In particular, we prove that the step-up procedure based on our Lfdr estimates asymptotically controls the FDR and provide the corresponding power analysis. We also conduct extensive simulation studies to evaluate the finite sample performance of the proposed Lfdr-based procedure.
The rest of the paper proceeds as follows. Section 2 proposes a general multiple testing procedure that incorporates auxiliary information to improve statistical power, and establishes its asymptotic FDR control property. In Section 3, we introduce a new EM-type algorithm to estimate the unknowns and study the theoretical properties of the estimators. We discuss two extensions in Section 5. Section 6 and Section 7 are devoted respectively to simulation studies and data analysis. We conclude the paper in Section 8. All the proofs of the main theorems and technical lemmas are collected in the Appendix.
2 Covariate-adjusted multiple testing
In this section, we describe a covariate-adjusted multiple testing procedure based on Lfdr.
2.1 Optimal rejection rule
Consider simultaneous testing of m hypotheses H_i for i = 1, \ldots, m based on m p-values x_1, \ldots, x_m, where x_i is the p-value corresponding to the ith hypothesis H_i. Let \theta_i, i = 1, \ldots, m indicate the underlying truth of the ith hypothesis; that is, \theta_i = 1 if H_i is non-null/alternative and \theta_i = 0 if H_i is null. We allow the probability that \theta_i = 0 to vary across i. In this way, auxiliary information can be incorporated through

P(\theta_i = 0) = \pi_{0i}, \quad i = 1, \ldots, m. \qquad (2.1)
Consider the two-group model for the p-values (see e.g., [15] and Chapter 2 of [16]):
x_i \mid \theta_i \sim (1 - \theta_i) f_0 + \theta_i f_1, \quad i = 1, \ldots, m, \qquad (2.2)
where f0 is the density function of the p-values under the null hypothesis and f1 is the density function of the p-values under the alternative hypothesis. The marginal probability density function of xi is equal to
f^i(x) = \pi_{0i} f_0(x) + (1 - \pi_{0i}) f_1(x). \qquad (2.3)
We briefly discuss the identifiability of the above model. Suppose f0 is known and bounded away from zero and infinity. Consider the following class of functions:
\mathcal{F}_m = \left\{ \tilde{f} = (\tilde{f}^1, \ldots, \tilde{f}^m) \text{ with } \tilde{f}^i = \tilde{\pi}_i f_0 + (1 - \tilde{\pi}_i)\tilde{f}_1 \; : \; \min_{x \in [0,1]} \tilde{f}_1(x) = 0, \; 0 \le \tilde{\pi}_i \le 1, \; \min_i \tilde{\pi}_i < 1 \right\}.

Suppose \tilde{f}, \breve{f} \in \mathcal{F}_m, where the ith components of \tilde{f} and \breve{f} are given by \tilde{f}^i = \tilde{\pi}_i f_0 + (1 - \tilde{\pi}_i)\tilde{f}_1 and \breve{f}^i = \breve{\pi}_i f_0 + (1 - \breve{\pi}_i)\breve{f}_1, respectively. We show that if \tilde{f}^i(x) = \breve{f}^i(x) for all x and i, then \tilde{f}_1(x) = \breve{f}_1(x) and \tilde{\pi}_i = \breve{\pi}_i for all x and i. Suppose \tilde{f}_1(x') = 0 for some x' \in [0, 1]. If \tilde{\pi}_i < \breve{\pi}_i for some i, then we have

0 = \frac{\tilde{f}_1(x')}{f_0(x')} = \frac{\breve{\pi}_i - \tilde{\pi}_i}{1 - \tilde{\pi}_i} + \frac{(1 - \breve{\pi}_i)\breve{f}_1(x')}{(1 - \tilde{\pi}_i) f_0(x')} > 0, \qquad (2.4)

which is a contradiction. Similarly, we get a contradiction when \tilde{\pi}_i > \breve{\pi}_i for some i. Thus we have \tilde{\pi}_i = \breve{\pi}_i for all i. As there exists an i such that 1 - \tilde{\pi}_i = 1 - \breve{\pi}_i > 0, it is clear that \tilde{f}^i(x) = \breve{f}^i(x) implies that \tilde{f}_1(x) = \breve{f}_1(x).
In statistical and scientific applications, the goal is to separate the alternative cases (θi = 1) from the null cases (θi = 0). This can be formulated as a multiple testing problem, with
solutions represented by a decision rule \delta = (\delta_1, \ldots, \delta_m) \in \{0, 1\}^m. It turns out that the optimal decision rule is closely related to the Lfdr, defined as

\mathrm{Lfdr}_i(x) := P(\theta_i = 0 \mid x_i = x) = \frac{\pi_{0i} f_0(x)}{\pi_{0i} f_0(x) + (1 - \pi_{0i}) f_1(x)} = \frac{\pi_{0i} f_0(x)}{f^i(x)}.
In other words, Lfdri(x) is the posterior probability that a case is null given the corresponding p-value is equal to x. It combines the auxiliary information (π0i) and data from the current experiment. Information across tests is used in forming f0(·) and f1(·).
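A minimal numerical sketch of the Lfdr under model (2.2)-(2.3), assuming a Uniform(0, 1) null density and, purely for illustration, a decreasing Beta(0.3, 1) alternative density (neither choice comes from the paper):

```python
import numpy as np

def lfdr(x, pi0, f0, f1):
    """Posterior probability of being null given the p-value x; pi0 may be
    a vector of hypothesis-specific prior null probabilities, which is how
    auxiliary information enters."""
    x, pi0 = np.asarray(x, float), np.asarray(pi0, float)
    num = pi0 * f0(x)
    return num / (num + (1.0 - pi0) * f1(x))

# Illustrative (assumed) densities: uniform null, and the decreasing
# Beta(0.3, 1) alternative density f1(x) = 0.3 * x**(-0.7).
f0 = lambda x: np.ones_like(x)
f1 = lambda x: 0.3 * x ** (0.3 - 1.0)

vals = lfdr([0.001, 0.5], pi0=[0.9, 0.9], f0=f0, f1=f1)
```

With these decreasing-f1 choices the Lfdr is increasing in the p-value, consistent with the monotone likelihood ratio assumption introduced below.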
The optimal decision rule under the mixture model has been extensively studied in the literature; see, e.g., [3, 30, 46]. For completeness, we present the derivations below and remark that they follow somewhat directly from existing results. Consider the expected number of false positives (EFP) and true positives (ETP) of a decision rule. Suppose that x_i follows the mixture model (2.2) and we intend to reject the ith null hypothesis if x_i \le c_i. The size and power of the ith test are given respectively by
\alpha_i(c_i) = \int_0^{c_i} f_0(t)\,dt \quad \text{and} \quad \beta_i(c_i) = \int_0^{c_i} f_1(t)\,dt.
It follows that

\mathrm{EFP}(\mathbf{c}) = \sum_{i=1}^m \pi_{0i}\,\alpha_i(c_i) \quad \text{and} \quad \mathrm{ETP}(\mathbf{c}) = \sum_{i=1}^m (1 - \pi_{0i})\,\beta_i(c_i),
where c = (c1, . . . , cm). We wish to maximize ETP for a given value of the marginal FDR
(mFDR), defined as

\mathrm{mFDR}(\mathbf{c}) = \frac{\mathrm{EFP}(\mathbf{c})}{\mathrm{ETP}(\mathbf{c}) + \mathrm{EFP}(\mathbf{c})}, \qquad (2.5)

by an optimal choice of the cutoff values \mathbf{c}. Formally, consider the problem

\max_{\mathbf{c}} \; \mathrm{ETP}(\mathbf{c}) \quad \text{subject to} \quad \mathrm{mFDR}(\mathbf{c}) \le \alpha. \qquad (2.6)
A standard Lagrange multiplier argument gives the following result which motivates our choice of thresholds.
Proposition 1. Assume that f_1 is continuous and non-increasing, and that f_0 is continuous, non-decreasing, and uniformly bounded from above. Further assume that for a pre-specified \alpha > 0,

\min_i \frac{(1 - \pi_{0i}) f_1(0)}{\pi_{0i} f_0(0)} > \frac{1 - \alpha}{\alpha}. \qquad (2.7)

Then (2.6) has at least one solution, and every solution (\tilde{c}_1, \ldots, \tilde{c}_m) satisfies

\mathrm{Lfdr}_i(\tilde{c}_i) = \tilde{\lambda}

for some \tilde{\lambda} that is independent of i.
The proof of Proposition 1 is similar to that of Theorem 2 in [30] and we omit the details.
Under the monotone likelihood ratio assumption [10, 45]:
f_1(x)/f_0(x) \;\; \text{is decreasing in} \;\; x, \qquad (2.8)

we obtain that \mathrm{Lfdr}_i(x) is monotonically increasing in x. Therefore, we may restrict our attention to rejection rules of the form I\{x_i \le c_i\}, i.e.,

\delta_i = I\{\mathrm{Lfdr}_i(x_i) \le \lambda\} \qquad (2.9)

for a constant \lambda to be determined later.
2.2 Asymptotic FDR control
To fully understand the proposed method, we gradually investigate its theoretical properties through several steps, starting with an oracle procedure which provides key insights into the
problem. Assume that \{\pi_{0i}\}_{i=1}^m, f_0(\cdot) and f_1(\cdot) are known. The proposed method utilizes auxiliary information through \{\pi_{0i}\}_{i=1}^m and information from the alternative through f_1(\cdot), in addition to information from the null, upon which conventional approaches are based. In view of (2.9), the number of false rejections equals

V_m(\lambda) = \sum_{i=1}^m I\{\mathrm{Lfdr}_i(x_i) \le \lambda\}(1 - \theta_i),

and the total number of rejections is given by

D_{m,0}(\lambda) = \sum_{i=1}^m I\{\mathrm{Lfdr}_i(x_i) \le \lambda\}.
Write a ∨ b = max{a, b} and a ∧ b = min{a, b}. We aim to find the critical value λ in
(2.9) that controls the FDR, defined as \mathrm{FDR}_m(\lambda) = E\{V_m(\lambda)/(D_{m,0}(\lambda) \vee 1)\}, at a pre-specified significance level \alpha. Note that

E[V_m(\lambda)] = \sum_{i=1}^m \pi_{0i} P(\mathrm{Lfdr}_i(x_i) \le \lambda \mid \theta_i = 0) = \sum_{i=1}^m E[\mathrm{Lfdr}_i(x_i) I\{\mathrm{Lfdr}_i(x_i) \le \lambda\}]. \qquad (2.10)

An estimate of \mathrm{FDR}_m(\lambda) is given by

\widehat{\mathrm{FDR}}_m(\lambda) = \frac{\sum_{i=1}^m \mathrm{Lfdr}_i(x_i) I\{\mathrm{Lfdr}_i(x_i) \le \lambda\}}{\sum_{i=1}^m I\{\mathrm{Lfdr}_i(x_i) \le \lambda\}}.

Let \lambda_m = \sup\{\lambda \in [0, 1] : \widehat{\mathrm{FDR}}_m(\lambda) \le \alpha\}. Then reject H_i if \mathrm{Lfdr}_i(x_i) \le \lambda_m. Below we show that the above (oracle) step-up procedure provides asymptotic control of the FDR under the following assumptions.
(C1) Assume that for any \lambda \in [0, 1],

\frac{1}{m} \sum_{i=1}^m I\{\mathrm{Lfdr}_i(x_i) \le \lambda\} \stackrel{p}{\to} D_0(\lambda), \qquad \frac{1}{m} \sum_{i=1}^m \mathrm{Lfdr}_i(x_i) I\{\mathrm{Lfdr}_i(x_i) \le \lambda\} \stackrel{p}{\to} D_1(\lambda),

and

\frac{1}{m} V_m(\lambda) \stackrel{p}{\to} D_1(\lambda), \qquad (2.11)

where D_0 and D_1 are both continuous functions over [0, 1].
(C2) Write R(\lambda) = D_1(\lambda)/D_0(\lambda), where D_0 and D_1 are defined in (C1). There exists a \lambda_\infty \in (0, 1] such that R(\lambda_\infty) < \alpha.
We remark that (C1) is similar to the conditions for Theorem 4 in [42]. In view of (2.10), (2.11) follows from the weak law of large numbers. Note that (C1) allows certain forms of dependence, such as m-dependence, ergodic dependence, and certain mixing-type dependence. (C2) ensures the existence of the critical value \lambda_m that asymptotically controls the FDR at level \alpha. The following proposition shows that the oracle step-up procedure provides asymptotic FDR control.
Proposition 2. Under Conditions (C1)-(C2),

\lim_{m \to \infty} \mathrm{FDR}_m(\lambda_m) \le \alpha.
The proof of Proposition 2 is relegated to the Appendix. In the following, we mimic the operation of the oracle procedure and provide an adaptive procedure. In the inference problems that we are interested in, the p-value distribution under the null hypothesis is assumed to be known (e.g., the uniform distribution on [0, 1]), or can be obtained from the distributional theory of the test statistic in question. Below we assume f_0 is known and remark that our
result still holds provided that f_0 can be consistently estimated. In practice, f_1 and \{\pi_{0i}\}_{i=1}^m are often unknown and replaced by their sample counterparts. Let \hat{f}_1(\cdot) and \{\hat{\pi}_{0i}\}_{i=1}^m be the estimators of f_1(\cdot) and \{\pi_{0i}\}_{i=1}^m, respectively. Define

\widehat{\mathrm{Lfdr}}_i(x) = \frac{\hat{\pi}_{0i} f_0(x)}{\hat{\pi}_{0i} f_0(x) + (1 - \hat{\pi}_{0i})\hat{f}_1(x)} = \frac{\hat{\pi}_{0i} f_0(x)}{\hat{f}^i(x)},

where \hat{f}^i(x) = \hat{\pi}_{0i} f_0(x) + (1 - \hat{\pi}_{0i})\hat{f}_1(x). A natural estimate of \lambda_m can be obtained through

\hat{\lambda}_m = \sup\left\{ \lambda \in [0, 1] : \frac{\sum_{i=1}^m \widehat{\mathrm{Lfdr}}_i(x_i) I\{\widehat{\mathrm{Lfdr}}_i(x_i) \le \lambda\}}{\sum_{i=1}^m I\{\widehat{\mathrm{Lfdr}}_i(x_i) \le \lambda\}} \le \alpha \right\}.

Reject the ith hypothesis if \widehat{\mathrm{Lfdr}}_i(x_i) \le \hat{\lambda}_m. This is equivalent to the following step-up procedure, originally proposed in [45]. Let \widehat{\mathrm{Lfdr}}_{(1)} \le \cdots \le \widehat{\mathrm{Lfdr}}_{(m)} be the order statistics of \{\widehat{\mathrm{Lfdr}}_1(x_1), \ldots, \widehat{\mathrm{Lfdr}}_m(x_m)\} and denote by H^{(1)}, \ldots, H^{(m)} the corresponding ordered hypotheses. Define

\hat{k} := \max\left\{ 1 \le i \le m : \frac{1}{i} \sum_{j=1}^i \widehat{\mathrm{Lfdr}}_{(j)} \le \alpha \right\};

then reject all H^{(i)} for i = 1, \ldots, \hat{k}.
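The adaptive step-up procedure amounts to sorting the estimated Lfdr values and rejecting while their running average stays below alpha. A sketch (our own implementation, not the paper's code):

```python
import numpy as np

def lfdr_step_up(lfdr_values, alpha=0.05):
    """Step-up rule: reject the hypotheses with the k smallest (estimated)
    Lfdr values, where k is the largest index at which the running average
    of the sorted Lfdr values is still <= alpha."""
    lfdr_values = np.asarray(lfdr_values, float)
    order = np.argsort(lfdr_values)
    running_avg = np.cumsum(lfdr_values[order]) / np.arange(1, len(order) + 1)
    below = np.nonzero(running_avg <= alpha)[0]
    reject = np.zeros(len(order), dtype=bool)
    if below.size:
        reject[order[: below.max() + 1]] = True
    return reject
```

The running average of the i smallest Lfdr values estimates the FDR incurred by rejecting exactly those i hypotheses, which is why the rule stops at the largest i where this average is still below alpha.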
We show that this step-up procedure provides asymptotic control of the FDR. To facilitate the derivation, we make the following additional assumption.
(C3) Assume that

\frac{1}{m} \sum_{i=1}^m \left| \widehat{\mathrm{Lfdr}}_i(x_i) - \mathrm{Lfdr}_i(x_i) \right| \stackrel{p}{\to} 0.
Remark 1. (C4) imposes uniform (weak) consistency on the estimators, and Condition (C5) requires the joint empirical distribution function of \mathrm{Lfdr}_i(x_i) and x_i to converge to a continuous function. Both conditions are useful in showing that the empirical distribution functions based on \{\mathrm{Lfdr}_i(x_i)\} and \{\widehat{\mathrm{Lfdr}}_i(x_i)\} are uniformly close, which is a key step in the proof of Theorem 1. Notice that we do not require the consistency of the estimators at the boundaries. Such a relaxation is important when \hat{\pi}_{0i} and \hat{f}_1 are estimated using the shape-restricted approach. For example, [14] showed uniform consistency for Grenander-type estimators on the interval [\beta_m, 1 - \beta_m] with \beta_m \to 0.
(C3) requires the Lfdr estimators to be consistent in terms of the empirical L1 norm. We shall justify Condition (C3) in Section 3.3.
Theorem 1. Under Conditions (C1)-(C3),

\lim_{m \to \infty} \mathrm{FDR}_m(\hat{\lambda}_m) \le \alpha.
Theorem 1 indicates that we can obtain asymptotic control of the FDR using the data-adaptive procedure when relevant consistent estimates are available. A similar algorithm was obtained in [45], where it is assumed that the hypotheses are exchangeable in the sense that \pi_{01} = \cdots = \pi_{0m}.
3 Estimating the unknowns
3.1 The density function f1(·) is known
We first consider the case where f_0(\cdot) and f_1(\cdot) are both known. Under this setup, we need to estimate m unknown parameters \pi_{0i}, i = 1, \ldots, m, which is prohibitive without additional constraints. One constraint that makes the problem solvable is the monotone constraint. In statistical genetics and genomics, investigators can use auxiliary information (e.g., p-values from previous or related studies) to generate a ranked list of hypotheses H_1, \ldots, H_m even before performing the experiment, where H_1 is the hypothesis that the investigator believes to be most likely to correspond to a true signal, while H_m is the one believed to be least likely.
Specifically, let \Pi_0 = (\pi_{01}, \ldots, \pi_{0m}) \in (0, 1)^m. Define the convex set

\mathcal{M} = \left\{ \Pi = (\pi_1, \ldots, \pi_m) \in (0, 1)^m : 0 \le \pi_1 \le \cdots \le \pi_m \le 1 \right\}.
We illustrate the motivation for the monotone constraint with an example.
Example 3.1. Suppose that we are given data consisting of a pair of values (xi1, xi2), where xi1 represents the p-value, xi2 represents auxiliary information and they are independent conditional on the hidden true state θi for i = 1, . . . , m. Suppose
x_{ij} \mid \theta_i \stackrel{ind}{\sim} (1 - \theta_i) f_{0,j}(x_{ij}) + \theta_i f_{1,j}(x_{ij}), \quad i = 1, \ldots, m, \; j = 1, 2, \qquad (3.12)

where \theta_i = 1 if H_i is alternative and \theta_i = 0 if H_i is null, f_{0,j}(\cdot) is the density function of the p-values or auxiliary variables under the null hypothesis, and f_{1,j}(\cdot) is the density function of the p-values or auxiliary variables under the alternative hypothesis. Suppose P(\theta_i = 0) = \tau_0 for all i = 1, \ldots, m. Using the Bayes rule and the independence between x_{i1} and x_{i2} given
\theta_i, i = 1, \ldots, m, we have the conditional distribution of x_{i1} given x_{i2} as follows:

f(x_{i1} \mid x_{i2}) = \frac{f(x_{i1}, x_{i2} \mid \theta_i = 0)\,\tau_0 + f(x_{i1}, x_{i2} \mid \theta_i = 1)\,(1 - \tau_0)}{f(x_{i2} \mid \theta_i = 0)\,\tau_0 + f(x_{i2} \mid \theta_i = 1)\,(1 - \tau_0)}
= \frac{f(x_{i1} \mid \theta_i = 0) f(x_{i2} \mid \theta_i = 0)\,\tau_0 + f(x_{i1} \mid \theta_i = 1) f(x_{i2} \mid \theta_i = 1)\,(1 - \tau_0)}{f(x_{i2} \mid \theta_i = 0)\,\tau_0 + f(x_{i2} \mid \theta_i = 1)\,(1 - \tau_0)}
= \frac{f_{0,1}(x_{i1}) f_{0,2}(x_{i2})\,\tau_0 + f_{1,1}(x_{i1}) f_{1,2}(x_{i2})\,(1 - \tau_0)}{f_{0,2}(x_{i2})\,\tau_0 + f_{1,2}(x_{i2})\,(1 - \tau_0)}
= f_{0,1}(x_{i1})\,\gamma_0(x_{i2}) + f_{1,1}(x_{i1})\,(1 - \gamma_0(x_{i2})),

where

\gamma_0(x) = \frac{f_{0,2}(x)\,\tau_0}{f_{0,2}(x)\,\tau_0 + f_{1,2}(x)\,(1 - \tau_0)} = \frac{\tau_0}{\tau_0 + (1 - \tau_0)\,f_{1,2}(x)/f_{0,2}(x)}.

If f_{1,2}(x)/f_{0,2}(x) is a monotonic function, so is \gamma_0(x). Therefore, the order of x_{i2} generates a ranked list of the hypotheses H_1, \ldots, H_m through the conditional prior probability \gamma_0(x).
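A small numerical check of Example 3.1 with made-up densities: take the null covariate density f_{0,2} uniform and the alternative covariate density f_{1,2}(x) = 2x, so the likelihood ratio f_{1,2}/f_{0,2} = 2x is increasing and \gamma_0 must be monotonically decreasing; hypotheses sorted by x_{i2} are then ordered by their conditional prior null probability.

```python
import numpy as np

# Illustrative (assumed) covariate densities, not taken from the paper.
tau0 = 0.8
f02 = lambda x: np.ones_like(x)   # null covariate density: Uniform(0,1)
f12 = lambda x: 2.0 * x           # alternative covariate density: Beta(2,1)

def gamma0(x):
    """Conditional prior null probability given the auxiliary covariate."""
    x = np.asarray(x, float)
    return tau0 * f02(x) / (tau0 * f02(x) + (1 - tau0) * f12(x))

grid = np.linspace(0.01, 0.99, 99)
g = gamma0(grid)
```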
We estimate Π0 by solving the following maximum likelihood problem:
\hat{\Pi}_0 = (\hat{\pi}_{01}, \ldots, \hat{\pi}_{0m}) = \arg\max_{\Pi = (\pi_1, \ldots, \pi_m) \in \mathcal{M}} l_m(\Pi), \quad l_m(\Pi) := \sum_{i=1}^m \log\left\{ \pi_i f_0(x_i) + (1 - \pi_i) f_1(x_i) \right\}. \qquad (3.13)
It is easy to see that (3.13) is a convex optimization problem. Let \phi(x, a) = a f_0(x) + (1 - a) f_1(x). To facilitate the derivations, we shall assume that f_0(x_i) \ne f_1(x_i) for all i, which is a relatively mild requirement. Under this assumption, it is straightforward to see that for any 1 \le k \le l \le m, \sum_{i=k}^l \log \phi(x_i, a) is a strictly concave function for 0 < a < 1. Let \hat{a}_{kl} = \arg\max_{a \in [0,1]} \sum_{i=k}^l \log \phi(x_i, a) be the unique maximizer. According to Theorem 3.1 of [37], we have

\hat{\pi}_{0i} = \max_{1 \le k \le i} \; \min_{i \le l \le m} \; \hat{a}_{kl}. \qquad (3.14)
However, this formula is not practically useful due to the computational burden when m is very large. Below we suggest a more efficient way to solve problem (3.13); a general algorithm for the case where f_1 is unknown is provided in the next subsection. The main computational tools are the EM algorithm for the two-group mixture model and the Pool-Adjacent-Violators Algorithm (PAVA) from isotonic regression, which handles the monotone constraint on the prior probabilities of being null, \pi_{0i}, i = 1, \ldots, m [12, 38]. Our procedure is tuning-parameter free and can be easily implemented in practice. The EM algorithm treats the hidden states \theta_i, i = 1, \ldots, m as missing data. The isotonic regression problem is to

\text{minimize} \;\; \sum_{i=1}^m (a_i - z_i)^2 w_i \quad \text{subject to} \quad z_1 \le z_2 \le \cdots \le z_m, \qquad (3.15)
where w_i > 0 and a_i, i = 1, \ldots, m are given. By [22], the solution to (3.15) can be written as

\hat{z}_i = \max_{a \le i} \; \min_{b \ge i} \; \frac{\sum_{j=a}^b a_j w_j}{\sum_{j=a}^b w_j}. \qquad (3.16)
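The max-min formula (3.16) is exactly what the Pool-Adjacent-Violators Algorithm computes in O(m) time. A compact sketch (our own implementation, maintaining blocks of pooled values):

```python
import numpy as np

def pava(a, w=None):
    """Weighted isotonic regression: minimize sum_i w_i * (a_i - z_i)**2
    subject to z_1 <= ... <= z_m; returns the max-min solution (3.16)."""
    a = np.asarray(a, float)
    w = np.ones_like(a) if w is None else np.asarray(w, float)
    blocks = []  # each block: [weighted mean, total weight, length]
    for ai, wi in zip(a, w):
        blocks.append([ai, wi, 1])
        # pool adjacent violators: merge while the fit would decrease
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, v) for v, _, n in blocks])
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean 2.5, giving [1, 2.5, 2.5, 4].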
We need a key result from [4] which we present below for completeness.
Proposition 3 (Theorem 3.1 in [4]). Let G be a proper convex function on \mathbb{R}, and g its derivative. Denote the m-dimensional vectors a = (a_1, \ldots, a_m) and w = (w_1, \ldots, w_m). We call the problem

\text{minimize}_{z_1 \le \cdots \le z_m} \;\; \sum_{i=1}^m \left\{ G(z_i) - a_i z_i \right\} w_i \qquad (3.17)

the generalized isotonic regression problem. Then

z_i^{\circ} = g^{-1}(a_i^{*}), \quad i = 1, \ldots, m, \qquad (3.18)

where

a_i^{*} = \max_{a \le i} \; \min_{b \ge i} \; \frac{\sum_{j=a}^b a_j w_j}{\sum_{j=a}^b w_j}, \qquad (3.19)

solves the generalized isotonic regression problem (3.17). The minimizer is unique if G is strictly convex.
Observe that (3.16) and (3.19) have the same expression. In practice, we implement the Pool-Adjacent-Violators Algorithm by solving (3.15) to obtain (3.19).
Let \Pi^{(t)} = (\hat{\pi}_{01}^{(t)}, \ldots, \hat{\pi}_{0m}^{(t)}) be the solution at the tth iteration. Define

Q_j^{(t)} := Q_j^{(t)}(\hat{\pi}_{0j}^{(t)}) = \frac{\hat{\pi}_{0j}^{(t)} f_0(x_j)}{\hat{\pi}_{0j}^{(t)} f_0(x_j) + (1 - \hat{\pi}_{0j}^{(t)}) f_1(x_j)}, \qquad Q(\Pi \mid \Pi^{(t)}) = \sum_{j=1}^m \left\{ Q_j^{(t)} \log(\pi_j) + (1 - Q_j^{(t)}) \log(1 - \pi_j) \right\}.
At the (t + 1)th iteration, we solve the following problem:

\Pi^{(t+1)} = \arg\max_{\Pi = (\pi_1, \ldots, \pi_m) \in \mathcal{M}} Q(\Pi \mid \Pi^{(t)}). \qquad (3.20)
To use Proposition 3, we first do a change of variables by letting z_i = \log \pi_i, i = 1, \ldots, m. We proceed by solving

\arg\min_{z_1 \le z_2 \le \cdots \le z_m} \sum_{i=1}^m (1 - Q_i^{(t)}) \left\{ -\log(1 - e^{z_i}) - \frac{Q_i^{(t)}}{1 - Q_i^{(t)}}\, z_i \right\},

which, in view of Proposition 3, reduces to the weighted isotonic regression problem

\arg\min_{z_1 \le z_2 \le \cdots \le z_m} \sum_{i=1}^m \left\{ \frac{Q_i^{(t)}}{1 - Q_i^{(t)}} - z_i \right\}^2 (1 - Q_i^{(t)}).
We write its solution as

\hat{z}_i = \max_{a \le i} \; \min_{b \ge i} \; \frac{\sum_{j=a}^b Q_j^{(t)}}{\sum_{j=a}^b (1 - Q_j^{(t)})} = \frac{\max_{a \le i} \min_{b \ge i} \frac{1}{b-a+1} \sum_{j=a}^b Q_j^{(t)}}{1 - \max_{a \le i} \min_{b \ge i} \frac{1}{b-a+1} \sum_{j=a}^b Q_j^{(t)}}, \quad i = 1, \ldots, m.
By Proposition 3, we have

\hat{\pi}_{0i}^{(t+1)} = \frac{\hat{z}_i}{\hat{z}_i + 1} = \max_{a \le i} \; \min_{b \ge i} \; \frac{1}{b - a + 1} \sum_{j=a}^b Q_j^{(t)}, \quad i = 1, \ldots, m.
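The update above can be verified directly from the max-min expression. The sketch below (a hypothetical helper we call `m_step_pi`) evaluates it by brute force for clarity; the PAVA-based step returns the same values in O(m) time:

```python
import numpy as np

def m_step_pi(Q):
    """Brute-force evaluation of the M-step update
    pi_{0i}^{(t+1)} = max_{a<=i} min_{b>=i} (1/(b-a+1)) * sum_{j=a..b} Q_j."""
    Q = np.asarray(Q, float)
    m = len(Q)
    csum = np.concatenate([[0.0], np.cumsum(Q)])
    mean = lambda a, b: (csum[b + 1] - csum[a]) / (b - a + 1)
    return np.array([
        max(min(mean(a, b) for b in range(i, m)) for a in range(i + 1))
        for i in range(m)
    ])
```

For a decreasing sequence of posterior null probabilities the update pools everything to the overall mean, matching the remark following (3.21).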
In practice, we obtain \hat{\pi}_{0i}^{(t+1)}, i = 1, \ldots, m through the Pool-Adjacent-Violators Algorithm (PAVA) [38] by solving the following problem:

\Pi^{(t+1)} = \arg\min_{\Pi = (\pi_1, \ldots, \pi_m) \in \mathcal{M}} \sum_{j=1}^m \left\{ Q_j^{(t)} - \pi_j \right\}^2. \qquad (3.21)

Note that if Q_1^{(t)} \ge Q_2^{(t)} \ge \cdots \ge Q_m^{(t)}, then the solution to (3.21) is simply given by \hat{\pi}_{0i}^{(t+1)} = \sum_{j=1}^m Q_j^{(t)}/m for all 1 \le i \le m. As the EM algorithm is a hill-climbing algorithm, it is not
hard to show that l_m(\Pi^{(t)}) is a non-decreasing function of t.

We study the asymptotic consistency of the true maximum likelihood estimator \hat{\Pi}_0, which can be represented as (3.14). To this end, consider the model

x_i \stackrel{i.i.d.}{\sim} \pi_{0i} f_0 + (1 - \pi_{0i}) f_1, \quad \pi_{0i} = \pi_0(i/m),^{1}

for some non-decreasing function \pi_0 : [0, 1] \to [0, 1]. Our first result concerns the point-wise consistency of each \hat{\pi}_{0i}. For a set A, denote by \mathrm{card}(A) its cardinality.
Theorem 2. Assume that \int (\log f_i(x))^2 f_j(x)\,dx < \infty for i, j = 0, 1, and P(f_0(x_i) = f_1(x_i)) = 0. Suppose 0 < \pi_0(0) \le \pi_0(1) < 1. For any \epsilon > 0, let 0 \le t' < i_0/m < t'' \le 1 be such that |\pi_0(t') - \pi_0(i_0/m)| \vee |\pi_0(t'') - \pi_0(i_0/m)| < \epsilon/2. Denote A_1 = \{i : t' \le i/m \le i_0/m\} and A_2 = \{i : i_0/m \le i/m \le t''\}. For \mathrm{card}(A_1) \wedge \mathrm{card}(A_2) \ge N, we have

P(|\hat{\pi}_{0,i_0} - \pi_{0,i_0}| < \epsilon) \ge 1 - O\left( \frac{1}{\epsilon^2 N} \right).
The condition on the cardinalities of A_1 and A_2 guarantees that there are sufficient observations around i_0/m, which allows us to borrow information to estimate \pi_{0,i_0} consistently. The assumption P(f_0(x_i) = f_1(x_i)) = 0 ensures that the maximizer \hat{a}_{kl} is unique for 1 \le k \le l \le m. It is fulfilled if the set \{x \in [0, 1] : f_0(x) = f_1(x)\} has zero Lebesgue measure. As a direct consequence of Theorem 2, we have the following uniform consistency result for \hat{\Pi}_0. Due to the monotonicity, the uniform convergence follows from the pointwise convergence.
Corollary 1. For \epsilon > 0, suppose there exists a set of indices i_1 < i_2 < \cdots < i_l, where each i_k satisfies the assumption on i_0 in Theorem 2, and that \max_{2 \le k \le l}(\pi_{0,i_k} - \pi_{0,i_{k-1}}) < \epsilon. Then we have

P\left( \max_{i_1 \le i \le i_l} |\hat{\pi}_{0,i} - \pi_{0,i}| < \epsilon \right) \ge 1 - O\left( \frac{l}{\epsilon^2 N} \right).
Remark 3.1. Suppose \pi_0 is Lipschitz continuous with Lipschitz constant K. Then we can set t'' = (i_0 - 1)/m + \epsilon/(2K), t' = (i_0 + 1)/m - \epsilon/(2K), and thus N = \lfloor m\epsilon/(2K) \rfloor. Our result suggests that

P(|\hat{\pi}_{0,i_0} - \pi_{0,i_0}| < \epsilon) \ge 1 - O\left( \frac{K}{\epsilon^3 m} \right),

which implies that |\hat{\pi}_{0,i_0} - \pi_{0,i_0}| = O_p(m^{-1/3}).

^{1} For ease of presentation, we suppress the dependence on m in \pi_{0i}.
3.2 The density function f1(·) is unknown
In practice, f1 and Π0 are both unknown. We propose to estimate f1 and Π0 by maximizing the likelihood, i.e.,
(\hat{\Pi}_0, \hat{f}_1) = \arg\max_{\Pi \in \mathcal{M},\, \tilde{f}_1 \in \mathcal{H}} \sum_{i=1}^m \log\left\{ \pi_i f_0(x_i) + (1 - \pi_i)\tilde{f}_1(x_i) \right\}, \qquad (3.22)

where \mathcal{H} is a pre-specified class of density functions. In (3.22), \mathcal{H} might be the class of beta mixtures or the class of decreasing density functions. Problem (3.22) can be solved by
Algorithm 1. A derivation of Algorithm 1 from the full data likelihood that has access to latent variables is provided in the Appendix. Our algorithm is quite general in the sense that it allows users to specify their own updating scheme for the density components in (3.24).
Both parametric and non-parametric methods can be used to estimate f_1.

Algorithm 1:

0. Input the initial values (\Pi^{(0)}, f_1^{(0)}).

1. E-step: Given (\hat{\Pi}^{(t)}, \hat{f}_1^{(t)}), let

Q_i^{(t)} = \frac{\hat{\pi}_{0i}^{(t)} f_0(x_i)}{\hat{\pi}_{0i}^{(t)} f_0(x_i) + (1 - \hat{\pi}_{0i}^{(t)}) \hat{f}_1^{(t)}(x_i)}.

2. M-step: Given Q_i^{(t)}, update (\Pi, f_1) through

(\hat{\pi}_{01}^{(t+1)}, \ldots, \hat{\pi}_{0m}^{(t+1)}) = \arg\min_{\Pi = (\pi_1, \ldots, \pi_m) \in \mathcal{M}} \sum_{i=1}^m \left( Q_i^{(t)} - \pi_i \right)^2, \qquad (3.23)

and

\hat{f}_1^{(t+1)} = \arg\max_{\tilde{f}_1 \in \mathcal{H}} \sum_{i=1}^m (1 - Q_i^{(t)}) \log \tilde{f}_1(x_i). \qquad (3.24)

3. Repeat the above E-step and M-step until the algorithm converges.
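For concreteness, the following sketch assembles Algorithm 1 for the decreasing-density choice of H, assuming a Uniform(0, 1) null (so f0 is identically 1). All names are ours rather than the paper's, and the f1 update implements a weighted Grenander-type construction via PAVA in the spirit of the steps described in this subsection; treat it as an illustration of the scheme, not the reference implementation (which is available in the OrderShapeEM repository).

```python
import numpy as np

def pava_increasing(a, w):
    """Weighted isotonic (nondecreasing) regression via PAVA."""
    blocks = []  # each block: [weighted mean, total weight, length]
    for ai, wi in zip(a, w):
        blocks.append([ai, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, v) for v, _, n in blocks])

def fit_decreasing_density(x_sorted, weights):
    """Weighted Grenander-type estimate of a decreasing density,
    evaluated at the sorted p-values."""
    gaps = np.diff(np.concatenate([[0.0], x_sorted]))
    a = -weights.sum() * gaps / weights        # targets of the isotonic fit
    u = -pava_increasing(-a, weights)          # antitonic fit u_1 >= ... >= u_m
    u = np.minimum(u, -1e-12)                  # guard against ties / division by 0
    return -1.0 / u                            # y_i = -1 / u_i

def order_shape_em(pvals, n_iter=50):
    """Sketch of Algorithm 1: hypotheses are assumed pre-ordered so that
    pi_{01} <= ... <= pi_{0m}, and the null density is Uniform(0, 1)."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    pi0 = np.full(m, 0.5)                      # initial prior null probabilities
    f1 = np.ones(m)                            # f1 evaluated at each p-value
    for _ in range(n_iter):
        Q = pi0 / (pi0 + (1 - pi0) * f1)       # E-step: posterior null prob.
        # M-step for pi0: isotonic regression of Q, cf. (3.23)
        pi0 = np.clip(pava_increasing(Q, np.ones(m)), 1e-6, 1 - 1e-6)
        # M-step for f1: weighted decreasing-density fit, cf. (3.24)
        w = np.clip(1 - Q, 1e-12, None)
        f1_sorted = fit_decreasing_density(p[order], w[order])
        f1 = np.empty(m)
        f1[order] = f1_sorted
    lfdr = pi0 / (pi0 + (1 - pi0) * f1)
    return pi0, f1, lfdr
```

The returned lfdr values can then be passed to the step-up rule of Section 2.2. Both M-steps run in O(m) after sorting, which is what makes the scheme scalable.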
In the multiple testing literature, it is common to assume that f1 is a decreasing density function (e.g., smaller p-values imply stronger evidence against the null), see e.g. [29]. As an example of the general algorithm, let H denote the class of decreasing density functions. We shall discuss how (3.24) can be solved using the PAVA. The key recipe is to use Proposition
3 in obtaining f_1 evaluated at the observed p-values. Specifically, this can be accomplished by the series of steps outlined below. Define the order statistics of \{x_i\} as x_{(1)} \le x_{(2)} \le \cdots \le x_{(m)}, and let Q_{(i)}^{(t)} be the Q_i^{(t)} associated with x_{(i)}.

Step 1: The objective function in (3.24) depends on f_1 only through its values at the x_{(i)}. The objective function increases if f_1(x_{(i)}) increases, and the value of f_1 on (x_{(i-1)}, x_{(i)}) has no impact on the objective function (where x_{(0)} = 0). Therefore, among the maximizers of the objective function there is a solution that is constant on each interval (x_{(i-1)}, x_{(i)}].
Step 2: Let y_i = f_1(x_{(i)}). We only need to find the y_i that maximize

\sum_{i=1}^m (1 - Q_{(i)}^{(t)}) \log(y_i),

subject to y_1 \ge y_2 \ge \cdots \ge y_m \ge 0 and \sum_{i=1}^m y_i (x_{(i)} - x_{(i-1)}) = 1. This can be formulated as a tractable convex programming problem. In Steps 3 and 4 below, we further translate it into an isotonic regression problem.
Step 3: Write $Q^{(t)} = \sum_{i=1}^m (1 - Q_{(i)}^{(t)})$. Consider the problem:
$$\min \sum_{i=1}^m \Big\{ -(1 - Q_{(i)}^{(t)}) \log y_i + Q^{(t)} y_i (x_{(i)} - x_{(i-1)}) \Big\}.$$
The solution is given by
$$\hat{y}_i = \frac{1 - Q_{(i)}^{(t)}}{Q^{(t)} (x_{(i)} - x_{(i-1)})},$$
which satisfies the constraint $\sum_{i=1}^m y_i (x_{(i)} - x_{(i-1)}) = 1$ in Step 2.

Step 4: Rewrite the problem in Step 3 as
$$\min \sum_{i=1}^m (1 - Q_{(i)}^{(t)}) \Big\{ -\log y_i - \frac{-Q^{(t)} (x_{(i)} - x_{(i-1)})}{1 - Q_{(i)}^{(t)}} \, y_i \Big\}.$$
This is the generalized isotonic regression problem considered in Proposition 3. We use (3.16) to obtain (3.19) as follows. Let
$$(\hat{u}_1, \dots, \hat{u}_m) = \arg\min \sum_{i=1}^m (1 - Q_{(i)}^{(t)}) \left( \frac{-Q^{(t)} (x_{(i)} - x_{(i-1)})}{1 - Q_{(i)}^{(t)}} - u_i \right)^2,$$
subject to u1 ≥ u2 ≥ · · · ≥ um. The solution is given by the max-min formula
$$\hat{u}_i = \max_{b \ge i} \min_{a \le i} \frac{-Q^{(t)} \sum_{j=a}^b (x_{(j)} - x_{(j-1)})}{\sum_{j=a}^b (1 - Q_{(j)}^{(t)})},$$
which can be obtained using the PAVA. By Proposition 3, we arrive at the solution to the original problem (3.24) by letting $\tilde{y}_i = -1/\hat{u}_i$. Therefore, in the EM algorithm, one can employ the PAVA to estimate both the prior probabilities of being null and the p-value density function under the alternative hypothesis. Because of this, our algorithm is fast, tuning-parameter free, and easy to implement in practice.
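The four steps above can be sketched directly in code; `update_f1` and `pava_decreasing` are our illustrative names, and the inputs are the sorted p-values together with their posterior null probabilities $Q_{(i)}^{(t)}$.

```python
import numpy as np

def pava_decreasing(z, w):
    """Weighted least-squares isotonic fit under a nonincreasing constraint."""
    vals, wts, sizes = [], [], []
    for zi, wi in zip(z, w):
        vals.append(float(zi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:   # pool adjacent violators
            wtot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wtot
            wts[-2] = wtot
            sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)

def update_f1(x_sorted, Q_sorted):
    """Steps 1-4: values of the updated decreasing density f1 at the order
    statistics x_(1) <= ... <= x_(m), given posterior null probs Q_(i)."""
    w = 1.0 - Q_sorted                               # weights 1 - Q_(i)
    Qt = w.sum()                                     # Q^(t)
    dx = np.diff(np.concatenate(([0.0], x_sorted)))  # x_(i) - x_(i-1)
    z = -Qt * dx / w                                 # isotonic targets
    u_hat = pava_decreasing(z, w)                    # Step 4 (max-min formula)
    return -1.0 / u_hat                              # y~_i = -1 / u^_i

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(size=100))
Q = rng.uniform(0.1, 0.9, size=100)
y = update_f1(x, Q)
dx = np.diff(np.concatenate(([0.0], x)))
```

The two properties guaranteed by the derivation can be checked on the output: the fitted values are nonincreasing, and the implied step density integrates to one.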
3.3 Asymptotic convergence and verification of Condition (C3)
In this subsection, we present some convergence results regarding the proposed estimators in
Section 3.2. Furthermore, we propose a refined estimator for π0, and justify Condition (C3) for the corresponding Lfdr estimator. Throughout the following discussions, we assume that
$$x_i \sim f^i = \pi_0(i/m) f_0 + (1 - \pi_0(i/m)) f_1$$
independently for 1 ≤ i ≤ m and π0 : [0, 1] → [0, 1] with π0(i/m) = π0i. Let F be the class of densities defined on [0, 1]. For f, g ∈ F, we define the squared Hellinger-distance as
$$H^2(f, g) = \frac{1}{2} \int_0^1 \big( \sqrt{f(x)} - \sqrt{g(x)} \big)^2 \, dx = 1 - \int_0^1 \sqrt{f(x) g(x)} \, dx.$$
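As a quick numeric sanity check of the identity $H^2(f,g) = 1 - \int_0^1 \sqrt{f(x)g(x)}\,dx$, the snippet below compares a midpoint-rule evaluation against a closed form for the illustrative pair $f_0 \equiv 1$ and $f_1(x) = (1-\gamma)x^{-\gamma}$ (our choice of example, not one mandated by the theory).

```python
import numpy as np

def h2(f, g, n=1_000_000):
    """Squared Hellinger distance 1 - integral of sqrt(f g) over [0, 1],
    evaluated by the midpoint rule."""
    x = (np.arange(n) + 0.5) / n      # midpoints avoid the singularity at 0
    return 1.0 - np.mean(np.sqrt(f(x) * g(x)))

gamma = 0.5
f0 = lambda x: np.ones_like(x)
f1 = lambda x: (1.0 - gamma) * x ** (-gamma)
# closed form: integral of sqrt(f0 f1) = sqrt(1 - gamma) / (1 - gamma / 2)
exact = 1.0 - np.sqrt(1.0 - gamma) / (1.0 - gamma / 2.0)
```

Here `h2(f0, f1)` agrees with `exact` to well within `1e-3`, and `h2(f0, f0)` is zero, as the identity requires.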
Suppose the true alternative density $f_1$ belongs to a class of decreasing density functions $\mathcal{H} \subset \mathcal{F}$. Let
$$\Xi = \{\pi : [0, 1] \to [0, 1] \mid 0 < \varepsilon < \pi(0) \le \pi(1) < 1 - \varepsilon < 1, \ \pi(\cdot) \text{ is nondecreasing}\}$$
and assume that $\pi_0 \in \Xi$. Consider $\tilde{f}^i = \tilde{\pi}(i/m) f_0 + (1 - \tilde{\pi}(i/m)) \tilde{f}_1$ and $\breve{f}^i = \breve{\pi}(i/m) f_0 + (1 - \breve{\pi}(i/m)) \breve{f}_1$ for $1 \le i \le m$, where $\tilde{f}_1, \breve{f}_1 \in \mathcal{H}$ and $\tilde{\pi}, \breve{\pi} \in \Xi$. Define the average squared Hellinger distance between $(\tilde{\pi}, \tilde{f}_1)$ and $(\breve{\pi}, \breve{f}_1)$ as
$$H_m^2\big( (\tilde{\pi}, \tilde{f}_1), (\breve{\pi}, \breve{f}_1) \big) = \frac{1}{m} \sum_{i=1}^m H^2(\tilde{f}^i, \breve{f}^i).$$
Suppose $(\hat{\pi}_0, \hat{f}_1)$ is an estimator of $(\pi_0, f_1)$ such that
$$\sum_{i=1}^m \log\left( \frac{2 \hat{f}^i(x_i)}{\hat{f}^i(x_i) + f^i(x_i)} \right) \ge 0,$$
where $\hat{f}^i(x) = \hat{\pi}_0(i/m) f_0(x) + (1 - \hat{\pi}_0(i/m)) \hat{f}_1(x)$. Note that we do not require $(\hat{\pi}_0, \hat{f}_1)$ to be the global maximizer of the likelihood. We have the following result concerning the convergence of $(\hat{\pi}_0, \hat{f}_1)$ to $(\pi_0, f_1)$ in terms of the average squared Hellinger distance.
Theorem 3. Suppose $\pi_0 \in \Xi$, $f_0 \equiv 1$, and $f_1 \in \mathcal{H}$. Under the assumption that $\int_0^1 f_1^{1+a}(x) \, dx < \infty$ for some $0 < a \le 1$, we have
$$P\Big( H_m\big( (\pi_0, f_1), (\hat{\pi}_0, \hat{f}_1) \big) > M m^{-1/3} \Big) \le M_1 \exp(-M_2 m^{1/3}),$$
for some $M, M_1, M_2 > 0$. We remark that $f_1(x) = (1 - \gamma) x^{-\gamma}$ with $0 < \gamma < 1$ satisfies $\int_0^1 f_1^{1+a}(x) \, dx < \infty$ for $0 < a < (1/\gamma - 1) \wedge 1$.
Theorem 3 follows from an application of Theorem 8.14 in [48]. By the Cauchy-Schwarz inequality, it is known that
$$\int_0^1 |f(x) - g(x)| \, dx \le 2 H(f, g) \sqrt{2 - H^2(f, g)}.$$
Under the conditions in Theorem 3, we have
$$\frac{1}{m} \sum_{i=1}^m \int_0^1 |\hat{f}^i(x) - f^i(x)| \, dx = O_p(m^{-1/3}). \qquad (3.25)$$
However, π0 and f1 are generally unidentifiable without extra conditions. Below we focus on the case f0 ≡ 1. The model is identifiable in this case if there exists an a0 ≤ 1 such that f1(a0) = 0. If f1 is decreasing, then f1(x) = 0 for x ∈ [a0, 1]. Suppose a0 < 1. For a sequence
$b_m \in (0, 1)$ such that
$$\frac{\int_{b_m}^1 f_1(x) \, dx}{1 - b_m} = o(1), \qquad \frac{m^{-1/3}}{1 - b_m} = o(1), \qquad (3.26)$$
as m → +∞, we define the refined estimator for π0(i/m) as
$$\breve{\pi}_0(i/m) = \frac{1}{1 - b_m} \int_{b_m}^1 \hat{f}^i(x) \, dx = \hat{\pi}_0(i/m) + (1 - \hat{\pi}_0(i/m)) \frac{\int_{b_m}^1 \hat{f}_1(x) \, dx}{1 - b_m}.$$
Under (3.26), we have
$$\frac{1}{m} \sum_{i=1}^m |\breve{\pi}_0(i/m) - \pi_0(i/m)| = \frac{1}{m(1 - b_m)} \sum_{i=1}^m \left| \int_{b_m}^1 \hat{f}^i(x) \, dx - \int_{b_m}^1 f^i(x) \, dx \right| + o_p(1) \qquad (3.27)$$
$$\le \frac{1}{m(1 - b_m)} \sum_{i=1}^m \int_0^1 |\hat{f}^i(x) - f^i(x)| \, dx + o_p(1) = o_p(1).$$
Given the refined estimator $\breve{\pi}_0$, the Lfdr can be estimated by
$$\widehat{\mathrm{Lfdr}}_i(x_i) = \frac{\breve{\pi}_0(i/m)}{\hat{f}^i(x_i)}.$$
Since $\hat{\pi}_0, \pi_0 \in \Xi$ and thus are bounded from below, by (3.25) and (3.27), it is not hard to show that
$$\frac{1}{m} \sum_{i=1}^m \int_0^1 \big| \widehat{\mathrm{Lfdr}}_i(x) - \mathrm{Lfdr}_i(x) \big| \, dx = o_p(1). \qquad (3.28)$$
Moreover, we have the following result which justifies Condition (C3).
Corollary 2. Suppose π0 ∈ Ξ, f0 ≡ 1, and f1 ∈ H. Further assume D0 in Condition (C1) is continuous at zero and (3.26) holds. Then Condition (C3) is fulfilled.
Remark 3.2. Although $b_m$ needs to satisfy (3.26) theoretically, the rate condition is of little use in selecting $b_m$ in practice. We use a simple ad hoc procedure that performs reasonably well in our simulations. To motivate our procedure, we let $\theta$ indicate the underlying truth of
a randomly selected hypothesis from $\{H_i\}_{i=1}^m$. Then we have
$$P(\theta = 0) = \frac{1}{m} \sum_{i=1}^m P(\theta_i = 0) = \frac{1}{m} \sum_{i=1}^m \pi_0(i/m) := \bar{\pi}_m.$$
Without knowing the order information, the p-values follow the mixture model $\bar{\pi}_m f_0(x) + (1 - \bar{\pi}_m) f_1(x)$. The overall null proportion $\bar{\pi}_m$ can be estimated by classical methods, e.g., [41] (in practice, we use the maximum of Storey's two global null proportion estimates in the qvalue package for greater conservativeness). Denote the corresponding estimator by $\hat{\pi}$. Also denote $\breve{\pi} = m^{-1} \sum_{i=1}^m \breve{\pi}_0(i/m)$, where $\breve{\pi}_0(i/m) = \hat{\pi}_0(i/m) + \delta(1 - \hat{\pi}_0(i/m))$ is the calibrated null probability and $\delta$ is the amount of calibration, which is a function of $b_m$. It then makes sense to choose $b_m \in [0, 1]$ such that the difference $|\breve{\pi} - \hat{\pi}|$ is minimized. This results in the following procedure: if the mean of the $\hat{\pi}_0(i/m)$ from the EM algorithm (denoted $\tilde{\pi}$) is greater than the global estimate $\hat{\pi}$, set $\breve{\pi}_0(i/m) = \hat{\pi}_0(i/m)$; if it is less than $\hat{\pi}$, set $\breve{\pi}_0(i/m) = \hat{\pi}_0(i/m) + \delta(1 - \hat{\pi}_0(i/m))$ with $\delta = (\hat{\pi} - \tilde{\pi})/(1 - \tilde{\pi})$.
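A minimal sketch of this calibration rule (the function name `calibrate_pi0` is ours): choosing $\delta = (\hat{\pi} - \tilde{\pi})/(1 - \tilde{\pi})$ makes the mean of the calibrated probabilities match the global estimate exactly, so $|\breve{\pi} - \hat{\pi}|$ is driven to zero whenever $\tilde{\pi} < \hat{\pi}$.

```python
import numpy as np

def calibrate_pi0(pi0_em, pi_global):
    """Ad hoc calibration of Remark 3.2: shift the EM estimates so that
    their mean matches a conservative global null-proportion estimate
    pi_global (e.g. Storey's estimator from the qvalue package)."""
    pi_tilde = pi0_em.mean()
    if pi_tilde >= pi_global:                 # EM output already conservative
        return pi0_em.copy()
    delta = (pi_global - pi_tilde) / (1.0 - pi_tilde)
    return pi0_em + delta * (1.0 - pi0_em)

pi0_em = np.linspace(0.3, 0.8, 100)           # toy nondecreasing EM output
out = calibrate_pi0(pi0_em, 0.7)
```

Since the shift is an increasing affine map, the calibrated probabilities inherit the monotone ordering of the EM output, and their mean equals `pi_global` by construction.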
4 A general rejection rule
Given the insights from Section 2, we introduce a general rejection rule and also discuss its connection with the recent accumulation tests in the literature, see e.g. [2, 21, 33]. Recall that our (oracle) rejection rule is {Lfdri(xi) ≤ λ}, which can be written equivalently as
$$\frac{f_1(x_i)}{f_0(x_i)} \ge \frac{(1 - \lambda) \pi_{0i}}{\lambda (1 - \pi_{0i})}.$$
Motivated by the above rejection rule, one can consider a more general procedure as follows. Let $h : [0, 1] \to [0, +\infty]$ be a decreasing non-negative function such that $\int_0^1 h(x) f_0(x) \, dx = 1$. The general rejection rule is then defined as
$$h(x_i) \ge w_i(\lambda) := \frac{(1 - \lambda) \tilde{\pi}_{0i}}{\lambda (1 - \tilde{\pi}_{0i})} \qquad (4.29)$$
for $0 \le \lambda, \tilde{\pi}_{0i} \le 1$. Here $h$ serves as a surrogate for the likelihood ratio $f_1/f_0$. Set $\tilde{h}(x) = h(1 - x)$. Under the assumption that $f_0$ is symmetric about 0.5 (i.e., $f_0(x) = f_0(1 - x)$), it is easy to verify that $E[\tilde{h}(x_i) \mid \theta_i = 0] = \int_0^1 \tilde{h}(x) f_0(x) \, dx = 1$. We note that the false discovery proportion (FDP) for the general rejection rule is equal to
$$\mathrm{FDP}_m(\lambda) := \frac{\sum_{i=1}^m 1\{h(x_i) \ge w_i(\lambda)\} (1 - \theta_i)}{\sum_{i=1}^m 1\{h(x_i) \ge w_i(\lambda)\}} = \frac{\sum_{i=1}^m E[\tilde{h}(x_i) \mid \theta_i = 0] \, 1\{h(x_i) \ge w_i(\lambda)\} (1 - \theta_i)}{\sum_{i=1}^m 1\{h(x_i) \ge w_i(\lambda)\}}.$$
If we setπ ˜0i = 0 for 1 ≤ i ≤ k andπ ˜0i = 1 for k + 1 ≤ i ≤ m, then wi(λ) = 0 for 1 ≤ i ≤ k and wi(λ) = ∞ for k + 1 ≤ i ≤ m. Therefore, we have
$$\mathrm{FDP}_m(\lambda) = \frac{\sum_{i=1}^k E[\tilde{h}(x_i) \mid \theta_i = 0] (1 - \theta_i)}{k} \approx \frac{\sum_{i=1}^k \tilde{h}(x_i) (1 - \theta_i)}{k} \le \frac{\sum_{i=1}^k \tilde{h}(x_i)}{k},$$
where the approximation is due to the law of large numbers. With this choice of $\tilde{\pi}_{0i}$, one intends to follow the prior order restriction strictly. As suggested in [2, 21], a natural choice of $k$ is given by
$$\tilde{k} = \max\left\{ 1 \le j \le m : \frac{\sum_{i=1}^j \tilde{h}(x_i)}{j} \le \alpha \right\},$$
which is designed to control the (asymptotic) upper bound of the FDP.$^2$ Some common choices of $\tilde{h}$ are given by
$$\tilde{h}_{\mathrm{ForwardStop}}(\lambda) = \log \frac{1}{1 - \lambda}, \qquad \tilde{h}_{\mathrm{SeqStep}}(\lambda) = C \, 1\{\lambda > 1 - 1/C\}, \qquad \tilde{h}_{\mathrm{HingeExp}}(\lambda) = C \log \frac{1}{C(1 - \lambda)} \, 1\{\lambda > 1 - 1/C\},$$
for $C > 0$, which correspond to the ForwardStop, SeqStep and HingeExp procedures, respectively. Note that all three procedures are special cases of the accumulation tests proposed in [33]. Different from the accumulation tests, we suggest using $\tilde{h}(x) = f_1(1 - x)/f_0(1 - x)$ and $\tilde{\pi}_{0i} = \pi_{0i}$. Our procedure is conceptually sound, as it is better motivated from the Bayesian perspective, and it avoids the subjective choice of accumulation functions.
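The cutoff $\tilde{k}$ and the three accumulation functions translate directly into code; the helper names below are ours, and the toy p-value sequence is chosen so that the answer can be checked by hand.

```python
import numpy as np

def accumulation_k(p, h, alpha):
    """Largest j with (1/j) * sum_{i<=j} h(p_i) <= alpha, and 0 if none;
    p is ordered by the prior information, most promising first."""
    avg = np.cumsum(h(p)) / np.arange(1, len(p) + 1)
    ok = np.flatnonzero(avg <= alpha)
    return 0 if ok.size == 0 else int(ok[-1]) + 1

h_forward_stop = lambda lam: np.log(1.0 / (1.0 - lam))
h_seq_step = lambda lam, C=2.0: C * (lam > 1.0 - 1.0 / C)
h_hinge_exp = lambda lam, C=2.0: C * np.log(1.0 / (C * (1.0 - lam))) * (lam > 1.0 - 1.0 / C)

# ten promising p-values followed by ten unpromising ones
p = np.array([0.01] * 10 + [0.9] * 10)
k_fs = accumulation_k(p, h_forward_stop, alpha=0.1)
k_ss = accumulation_k(p, h_seq_step, alpha=0.1)
```

For this sequence both ForwardStop and SeqStep (with C = 2) stop at k = 10: the running average of $\tilde{h}$ stays below 0.1 through the first ten hypotheses and jumps above it as soon as a large p-value enters.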
Remark 4.1. Our setup is different from the one in [21], where the authors seek the largest cutoff $k$ such that one rejects the first $k$ hypotheses and accepts the remaining ones. In contrast, our procedure allows researchers to reject the $k$th hypothesis while accepting the $(k-1)$th hypothesis. In other words, we do not follow the order restriction strictly. Such flexibility can result in a substantial power increase when the order information is not very strong or even weak, as observed in our numerical studies.
Below we conduct a power comparison with the accumulation tests in [33], which include the ForwardStop procedure in [21] and the SeqStep procedure in [2] as special cases. Let $\tilde{h}$ be a nonnegative function with $\int_0^1 \tilde{h}(x) \, dx = 1$ and $\nu = \int \tilde{h}(x) f_1(x) \, dx$. Define
$$s_0 = \max\left\{ s \in [0, 1] : \frac{1}{s} \int_0^s (1 - \pi_0(x)) \, dx \ge \frac{1 - \alpha}{1 - \nu} \right\},$$
where $s_0 = 0$ if the above set is empty. (Footnote 2: finite-sample FDR control has been proved for this procedure; see e.g. [33].) The asymptotic power of the accumulation test in [33] is given by
$$\mathrm{Power}_{\mathrm{AT}} = \frac{\int_0^{s_0} (1 - \pi_0(x)) \, dx}{\int_0^1 (1 - \pi_0(x)) \, dx}.$$
Notice that the accumulation test rejects the leading fraction $s_0$ of hypotheses in the ordered list, which is equivalent to setting the threshold $s_i = 1$ for $i/m \le s_0$ and $s_i = 0$ otherwise. Suppose $s_0$ satisfies
$$\frac{1}{s_0} \int_0^{s_0} (1 - \pi_0(x)) \, dx = \frac{1 - \alpha}{1 - \nu}.$$
Then after some re-arrangement, we have
$$\frac{\int_0^{s_0} \pi_0(x) \, dx}{s_0} \le \frac{\int_0^{s_0} \pi_0(x) \, dx}{s_0} + \frac{\nu \int_0^{s_0} (1 - \pi_0(x)) \, dx}{s_0} = \alpha,$$
which suggests that the accumulation test controls the mFDR defined in (2.5) at level $\alpha$. By the discussion in Section 2.1, the optimal thresholds are the level surfaces of the Lfdr. Therefore the proposed procedure is asymptotically more powerful than the accumulation test.
4.1 Asymptotic power analysis
We provide asymptotic power analysis for the proposed method. In particular, we have the following result concerning the asymptotic power of the Lfdr procedure in Section 2.2.
Theorem 4. Suppose Conditions (C1)-(C3) hold and additionally assume that
$$\frac{1}{m} \sum_{i=1}^m 1\{\theta_i = 0\} \to \kappa_0, \qquad \frac{1}{m} \sum_{i=1}^m 1\{\theta_i = 1, \mathrm{Lfdr}_i(x_i) \le \lambda\} \stackrel{p}{\to} D_2(\lambda),$$
for a continuous function $D_2$ of $\lambda$ on $[0, 1]$. Let $\lambda_0$ be the largest $\lambda \in [0, 1]$ such that $R(\lambda) \le \alpha$ and, for any small enough $\epsilon > 0$, $R(\lambda_0 - \epsilon) < \alpha$. Then we have
$$\mathrm{Power}_{\mathrm{Lfdr}} := \frac{\sum_{i=1}^m 1\{\theta_i = 1, \widehat{\mathrm{Lfdr}}_i(x_i) \le \hat{\lambda}_m\}}{\sum_{i=1}^m 1\{\theta_i = 1\} \vee 1} \stackrel{p}{\to} \frac{D_2(\lambda_0)}{1 - \kappa_0}.$$
Recall that in Section 2.1, we showed that the step-up procedure has the highest expected number of true positives among all $\alpha$-level FDR rules. This result thus sheds some light on the asymptotically optimal power among all $\alpha$-level FDR rules when the number of hypothesis tests goes to infinity.
Remark 4.2. Under the two-group mixture model (2.1)-(2.2) with $\pi_{0i} = \pi_0(i/m)$ for some non-decreasing function $\pi_0$, we have $m^{-1} \sum_{i=1}^m P(\theta_i = 0) = m^{-1} \sum_{i=1}^m \pi_0(i/m) \to \int_0^1 \pi_0(x) \, dx$, as monotonic functions are Riemann integrable. Thus $\kappa_0 = \int_0^1 \pi_0(x) \, dx$. Define $g(x) = \sup\{t \in [0, 1] : f_1(t)/f_0(t) \ge x\}$ and $w(\lambda, x) = \frac{\pi_0(x)(1 - \lambda)}{(1 - \pi_0(x)) \lambda}$. Denote by $F_1$ the distribution function of $f_1$. Then we have
$$\frac{1}{m} \sum_{i=1}^m P(\theta_i = 1, \mathrm{Lfdr}_i(x_i) \le \lambda) = \frac{1}{m} \sum_{i=1}^m P(\theta_i = 1) P(\mathrm{Lfdr}_i(x_i) \le \lambda \mid \theta_i = 1) = \frac{1}{m} \sum_{i=1}^m (1 - \pi_0(i/m)) \, F_1 \circ g \circ w(\lambda, i/m) \to \int_0^1 (1 - \pi_0(x)) \, F_1 \circ g \circ w(\lambda, x) \, dx,$$
where "$\circ$" denotes the composition of two functions, and we have used the fact that $F_1 \circ g \circ w$ is monotonic and thus Riemann integrable. So $D_2(\lambda) = \int_0^1 (1 - \pi_0(x)) \, F_1 \circ g \circ w(\lambda, x) \, dx$.
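The limits $\kappa_0$ and $D_2(\lambda)$ can be evaluated numerically once $\pi_0$ and $f_1$ are specified; the linear $\pi_0$ and Beta-type $f_1$ below are illustrative choices of ours, not quantities fixed by the theory.

```python
import numpy as np

gamma = 0.5
pi0 = lambda x: 0.5 + 0.4 * x                  # nondecreasing prior null prob
F1 = lambda t: t ** (1.0 - gamma)              # cdf of f1(x) = (1-gamma)x^(-gamma)
g = lambda y: np.minimum(1.0, ((1.0 - gamma) / y) ** (1.0 / gamma))  # sup{t: f1(t) >= y}
w = lambda lam, x: pi0(x) * (1.0 - lam) / ((1.0 - pi0(x)) * lam)

x = (np.arange(100_000) + 0.5) / 100_000       # midpoint grid on [0, 1]
kappa0 = np.mean(pi0(x))                       # kappa_0 = int_0^1 pi0(x) dx
lam = 0.3
D2 = np.mean((1.0 - pi0(x)) * F1(g(w(lam, x))))  # D_2(lambda) at lambda = 0.3
```

For this choice, `kappa0` equals 0.7 and `D2` lies strictly between 0 and `1 - kappa0`, so the asymptotic power $D_2(\lambda_0)/(1 - \kappa_0)$ in Theorem 4 is a well-defined fraction.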
5 Two extensions
5.1 Grouped hypotheses with ordering
Our idea can be extended to the case where the hypotheses can be divided into $d \ge 2$ groups, within which there is no explicit ordering but between which there is an ordering. One can simply modify (3.23) by considering the problem$^3$
$$(\hat{\pi}_{01}^{(t+1)}, \dots, \hat{\pi}_{0d}^{(t+1)}) = \arg\min \sum_{j=1}^m \big( \tilde{Q}_j^{(t)} - \pi_{s(j)} \big)^2, \qquad (5.30)$$
subject to $0 \le \pi_1 \le \cdots \le \pi_d \le 1$, where $s(j) \in \{1, 2, \dots, d\}$ is the group index of the $j$th hypothesis. A particular example concerns using the sign of a test statistic to improve power while controlling the FDR. Consider a two-sided test where the null distribution is symmetric and the test statistic is the absolute value of the symmetric statistic. The sign of the statistic is independent of the p-value under the null. If we have an a priori belief that, among the alternatives, more hypotheses have true positive effect sizes than negative ones (or vice versa), then the sign can be used to divide the hypotheses into two groups such that $\pi_1 \le \pi_2$ (or $\pi_1 \ge \pi_2$).
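A sketch of the grouped update: average the posterior null probabilities within each group and then run a weighted PAVA across the $d$ group means, in the spirit of footnote 3 (the function names are ours).

```python
import numpy as np

def pava_increasing(y, w):
    """Weighted isotonic (nondecreasing) fit via pool-adjacent-violators."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wtot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wtot
            wts[-2] = wtot
            sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)

def grouped_ordered_pi0(Q, group):
    """Solve (5.30): fit d ordered group-level null probabilities
    pi_1 <= ... <= pi_d to the posterior null probabilities Q, where
    group[j] in {0, ..., d-1} indexes the group of hypothesis j."""
    d = int(group.max()) + 1
    sizes = np.bincount(group, minlength=d).astype(float)
    means = np.bincount(group, weights=Q, minlength=d) / sizes
    pi = np.clip(pava_increasing(means, sizes), 0.0, 1.0)
    return pi[group]              # expand back to one value per hypothesis

# two groups, e.g. split by the sign of a symmetric test statistic
Q = np.array([0.2, 0.4, 0.3, 0.9, 0.7, 0.8])
group = np.array([0, 0, 0, 1, 1, 1])
pi0 = grouped_ordered_pi0(Q, group)
```

When the group means already respect the ordering, as here, the PAVA leaves them untouched and each hypothesis simply receives its group mean.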
5.2 Varying alternative distributions
In model (2.1), we assume that the null probabilities $\pi_{0i}$, $i = 1, \dots, m$, vary with $i$ while $F_1$ is independent of $i$. This assumption is reasonable in some applications but can be restrictive in other cases. We illustrate this point via a simple example described below.
(Footnote 3: this optimization problem can be solved by slightly modifying the PAVA, averaging the estimators within each group.)

Example 5.2. For $1 \le i \le m$, let $\{x_{ik}\}_{k=1}^{n_i}$ be $n_i$ observations generated independently from $N(\mu_i, 1)$. Consider the one-sided z-test $Z_i = \sqrt{n_i} \bar{x}_i$ with $\bar{x}_i = n_i^{-1} \sum_{k=1}^{n_i} x_{ik}$ for testing
Hi0 : µi = 0 vs Hia : µi < 0.
The p-value is equal to $p_i = \Phi(\sqrt{n_i} \bar{x}_i)$ and the p-value distribution under the alternative hypothesis is given by