
Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery

Matt Malloy, Electrical and Computer Engineering, University of Wisconsin-Madison, Email: [email protected]
Robert Nowak, Electrical and Computer Engineering, University of Wisconsin-Madison, Email: [email protected]

arXiv:1103.5991v1 [math.ST] 30 Mar 2011

Abstract: This paper studies the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. In this setting, the probability of error is a function of the dimension of the problem. A simple sequential testing procedure for this problem is proposed. We derive necessary conditions for reliable recovery in the non-sequential setting and contrast them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially more sensitive to the difference between the null and alternative distributions (in terms of the dependence on dimension), implying that subtle cases can be much more reliably determined using sequential methods.

I. INTRODUCTION

High dimensional testing and sparse recovery problems arise in a broad range of scientific and engineering applications. The basic problem is summarized as follows. Let $\theta \in \mathbb{R}^n$ denote a parameter vector. The dimension $n$ may be very large (thousands or millions or more), but $\theta$ is sparse in the sense that most of its elements are equal to a baseline/null value denoted by $\theta_0$ (e.g., $\theta_0 = 0$). The support of the sparse subset of elements that deviate from the baseline is denoted by $\mathcal{S}$. The parameter $\theta$ is observed stochastically according to

$$ y_i \sim \begin{cases} f(y_i \mid \theta_0), & i \notin \mathcal{S} \\ f(y_i \mid \theta_1), & i \in \mathcal{S} \end{cases} \tag{1} $$

where $f(\cdot \mid \theta)$ is a parametric family of densities indexed by a scalar parameter $\theta \in \mathbb{R}$. The goal of the high-dimensional testing and sparse recovery problem is to identify $\mathcal{S}$ from observations of this form. This problem has attracted attention lately due to its importance in the biological sciences. It is also relevant in communications problems including spectrum sensing in cognitive radio, one of the motivations for our work.

The conventional theoretical treatment of this problem assumes that a set of observations is collected prior to data analysis. Typically, in what we refer to as the non-sequential setting, each of the $n$ components is measured (one or more times) according to the model above and then component-wise tests are performed to estimate $\mathcal{S}$.

This paper investigates the high-dimensional testing problem from the perspective of sequential analysis. In this setting, observations are gathered sequentially and adaptively, based on information gleaned from previous observations. This allows the observation process to focus sensing resources on certain components at the expense of ignoring others. For example, the process might first measure each component once, then focus on a reduced subset of 'interesting' components in a second pass.

To compare sequential and non-sequential methods we impose a budget on the total number of measurements that can be made. The main results show sequential methods can be dramatically more sensitive to small differences between the baseline/null $\theta_0$ and the alternative value $\theta_1$. Our approach is similar to the so-called distilled sensing method proposed in [1] [2]; however, there are two main distinctions. First, the results in this paper are applicable to a large class of problems characterized by one-sided tests; the distilled sensing approach is specific to the Gaussian setting. Second, here we are concerned with the probability of error in identifying $\mathcal{S}$, whereas distilled sensing controls the false discovery and non-discovery rates, which is less demanding than error rate control. The probability of error is more natural and appropriate in applications such as spectrum sensing.

To give a sense of the main results, consider the case in which $f(\cdot \mid \theta)$ is a Gaussian with mean $\theta$ and variance 1. If $\theta_0 = 0$ and the alternative is $\theta_1 > 0$, then reliable detection (probability of error tending to zero as $n \to \infty$) is possible using non-sequential methods if and only if $\theta_1 > \sqrt{2 \log n}$. In contrast, a sequential method that we will demonstrate is reliable as long as $\theta_1 > \sqrt{4 \log |\mathcal{S}|}$, where $|\mathcal{S}|$ is the cardinality of the support set. This shows that the sequential method is more sensitive whenever $|\mathcal{S}| < n^{1/2}$; i.e., the sparse setting. The improvement is especially remarkable when $\theta$ is very sparse; e.g., if $|\mathcal{S}| \sim \log n$, then sequential methods succeed as long as $\theta_1$ is larger than a constant multiple of $\sqrt{\log \log n}$. The gains provided by the sequential method are even more pronounced for certain one-sided distributions. Under the Gamma distribution model (which arises in spectrum sensing), for constants $c$ and $C$, if $\theta_0 \geq C \log(|\mathcal{S}| \log_2 n)$ then the sequential method is reliable, but any non-sequential thresholding procedure is unreliable if $\theta_0 \leq c\, n^{1/2}$. To dramatize this result, if $|\mathcal{S}| \sim \log n$, then the gap between these conditions is doubly exponential in $n$.

II. PROBLEM STATEMENT

We begin by stating a main assumption about the family $f(\cdot \mid \theta)$. Let $y_1, \ldots, y_m$ be i.i.d. random variables with common distribution $f(\cdot \mid \theta)$, for some $\theta \in \mathbb{R}$. Let $y = (y_1, \ldots, y_m)$ and define the likelihood ratio

$$ \Gamma(y) := \prod_{j=1}^{m} \frac{f(y_j \mid \theta_1)}{f(y_j \mid \theta_0)}. $$

Assumption A1. $\Gamma(y)$ is a monotone non-decreasing function for $\theta_1 \geq \theta_0$.

We will state our main results with this monotonicity assumption. However, in certain applications we consider it is more natural to take $\theta_1 \leq \theta_0$ and assume that the likelihood ratio is a monotone non-increasing function. The main results carry over to this setting with appropriate modification. Define $T_m$ as the (log) likelihood ratio test statistic, which is a function of $y$. The test statistic depends on the number of independent observations, and so this is indicated by the subscript $m$. If A1 holds, then the test at threshold $\tau \in \mathbb{R}$

$$ T_m \underset{\theta_0}{\overset{\theta_1}{\gtrless}} \tau, $$

is the uniformly most powerful (UMP) test of $\theta \leq \theta_0$ versus $\theta > \theta_0$. The monotonicity of the likelihood ratio is satisfied by a large number of distributions in the exponential family (including Gaussian, Poisson and exponential distributions).

A. Measurement Budget

To compare sequential and non-sequential methods we impose a budget on the total number of measurements. The total number of measurements $N$ must satisfy $N \leq 2mn$, where $m \geq 1$ is an integer and $n$ is the dimension of $\theta$.

B. Non-Sequential Testing

The non-sequential approach distributes the measurement budget uniformly over the $n$ components, making $2m$ i.i.d. observations of each. Let $y_{i,1}, \ldots, y_{i,2m}$ denote the $2m$ observations of component $i$, and let $T_{i,2m}$ denote the corresponding test statistic. The UMP test takes the form

$$ T_{i,2m} \underset{\theta_0}{\overset{\theta_1}{\gtrless}} \tau. \tag{2} $$

The estimated support set at threshold $\tau$ is defined to be

$$ \mathcal{S}_\tau := \{ i : T_{i,2m} > \tau \}. $$

This estimator is optimal among all (non-sequential) component-wise procedures because each test is UMP.

C. Sequential Thresholding

The sequential method we propose is based on the following simple bisection idea. Instead of aiming to identify the components in $\mathcal{S}$ (those with $\theta = \theta_1$), at each step of the sequential procedure we aim to eliminate about 1/2 of the components that follow the null. At each step, every remaining component is measured $m$ times, and the test statistic for each is $T_{i,m} := T(y_{i,1}, \ldots, y_{i,m})$. Assume $\theta_0$ is known and let $T_m \mid \theta_0$ denote the random variable whose distribution is that of the test statistic under the null, $\theta = \theta_0$. Consider the threshold test

$$ T_{i,m} > \mathrm{median}(T_m \mid \theta_0). $$

For $i \notin \mathcal{S}$, the test statistic $T_{i,m}$ falls below $\mathrm{median}(T_m \mid \theta_0)$ with probability 1/2. The threshold test above thus eliminates approximately 1/2 of the components that follow the null. We can next use a portion of our remaining budget of $mn$ to repeat the same measurement and thresholding procedure on the remaining components. Since approximately $n/2$ components remain, this will require $mn/2$ of the remaining budget. Repeating this process for sufficiently many iterations will remove, with high probability, all of the null components. We call this process sequential thresholding and give a formal algorithm below. The output of the procedure, $\mathcal{S}_K$, is the estimated support set. Notice that sequential thresholding does not require prior knowledge of the size of the support set.

Sequential Thresholding
  input: $K > 0$ steps, $\gamma_0 := \mathrm{median}(T_m \mid \theta_0)$
  initialize: $\mathcal{S}_0 = \{1, \ldots, n\}$
  for $k = 1, \ldots, K$ do
    for $i \in \mathcal{S}_{k-1}$ do
      measure: $\{y_{i,j}^{(k)}\}_{j=1}^{m} \overset{\text{iid}}{\sim} \begin{cases} \prod_{j=1}^{m} f(y_{i,j}^{(k)} \mid \theta_0), & i \notin \mathcal{S} \\ \prod_{j=1}^{m} f(y_{i,j}^{(k)} \mid \theta_1), & i \in \mathcal{S} \end{cases}$
      threshold: $\mathcal{S}_k := \{ i \in \mathcal{S}_{k-1} : T_{i,m}^{(k)} > \gamma_0 \}$
    end for
  end for
  output: $\mathcal{S}_K$

D. Sequential Thresholding Satisfies Budget

The number of measurements used by sequential thresholding satisfies the overall measurement budget $N \leq 2mn$ in expectation. Let $s = |\mathcal{S}|$, the cardinality of the support set. The expected number of measurements is

$$ \mathbb{E}\left[ \sum_{k=0}^{K-1} m\,|\mathcal{S}_k| \right] \leq \sum_{k=0}^{K-1} \left( \frac{m(n-s)}{2^k} + ms \right) \leq 2m(n-s) + msK. $$

Our interest is in high-dimensional limits of $n$ and $s$ (and possibly $K$). Suppose that $sK$ grows sublinearly with $n$. Then for any $\epsilon > 0$ there exists an $N_\epsilon$ such that $\mathbb{E}\left[ \sum_{k=0}^{K-1} m\,|\mathcal{S}_k| \right] \leq 2(1+\epsilon)mn$ for every $n > N_\epsilon$. For ease of exposition, we suppress the factor $1+\epsilon$ as we proceed; it does not affect the main results and conclusions of the paper, as allowing the non-sequential method $2(1+\epsilon)mn$ observations is inconsequential.