arXiv:1103.5991v1 [math.ST] 30 Mar 2011

Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery

Matt Malloy
Electrical and Computer Engineering
University of Wisconsin-Madison
Email: [email protected]

Robert Nowak
Electrical and Computer Engineering
University of Wisconsin-Madison
Email: [email protected]

Abstract. This paper studies the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. In this setting, the probability of error is a function of the dimension of the problem. A simple sequential testing procedure is proposed for this problem. We derive necessary conditions for reliable recovery in the non-sequential setting and contrast them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially more sensitive to the difference between the null and alternative distributions (in terms of dependence on dimension). This implies that subtle cases can be determined much more reliably using sequential methods.

I. INTRODUCTION
High dimensional testing and sparse recovery problems arise in a broad range of scientific and engineering applications. The basic problem is summarized as follows. Let $\theta \in \mathbb{R}^n$ denote a parameter vector. The dimension $n$ may be very large (thousands or millions or more), but $\theta$ is sparse in the sense that most of its elements are equal to a baseline/null value $\theta_0$ (e.g., $\theta_0 = 0$). The support of the sparse subset of elements that deviate from the baseline is denoted by $\mathcal{S}$. The goal of the high-dimensional testing and sparse recovery problem is to identify $\mathcal{S}$. Each component $\theta_i$ is observed stochastically according to
$$y_i \sim \begin{cases} f(y_i|\theta_0), & i \notin \mathcal{S} \\ f(y_i|\theta_1), & i \in \mathcal{S} \end{cases} \tag{1}$$
where $f(\cdot|\theta)$ is a parametric family of densities indexed by a scalar parameter $\theta \in \mathbb{R}$.

The conventional theoretical treatment of this problem assumes that a set of observations is collected prior to data analysis. In what we refer to as the non-sequential setting, each of the $n$ components is typically measured (one or more times), and then component-wise tests are performed according to the model above to estimate $\mathcal{S}$. This problem has recently attracted attention because of its importance in the biological sciences. It is also relevant in communications problems, including spectrum sensing in cognitive radio, which is one of the motivations for our work.

This paper investigates the high-dimensional testing problem from the perspective of sequential analysis. In this setting, observations are gathered sequentially and adaptively, based on information gleaned from previous observations. This allows the observation process to focus sensing resources on certain components at the expense of ignoring others. For example, the process might first measure each component once, then focus on a reduced subset of 'interesting' components in a second pass. To compare sequential and non-sequential methods, we impose a budget on the total number of measurements that can be made. The main results show that sequential methods can be dramatically more sensitive to small differences between the baseline/null and alternative values of $\theta$.

To give a sense of the main results, consider the case in which $f(\cdot|\theta)$ is Gaussian with mean $\theta$ and unit variance. Suppose $\theta_0 = 0$ and the alternative value is $\theta_1 > 0$. Then reliable detection (probability of error tending to zero as $n \to \infty$) is possible using non-sequential methods if and only if $\theta_1$ exceeds $\sqrt{2\log n}$, to leading order. In contrast, a sequential method is reliable as long as $\theta_1 > \sqrt{4\log|\mathcal{S}|}$, where $|\mathcal{S}|$ is the cardinality of the support set. The sequential method is therefore more sensitive whenever $\log n$ is larger than a constant multiple of $\log|\mathcal{S}|$. The improvement is especially remarkable when $\mathcal{S}$ is very sparse. For example, if $|\mathcal{S}| \sim \log n$, then there are constants $C, c > 0$ such that the sequential thresholding method is reliable if $\theta_1 > C\sqrt{\log\log n}$, whereas any non-sequential procedure is unreliable if $\theta_1 < c\sqrt{\log n}$. We will demonstrate even more pronounced gains for certain one-sided distributions. In the Gamma distribution model, which arises in spectrum sensing, the gap between these conditions is doubly exponential in $n$.

The sequential procedure proposed here is similar in spirit to the so-called distilled sensing of [1], [2]. However, the results in this paper differ in two main respects. First, the approach applies to a large class of one-sided tests, whereas distilled sensing is specific to the Gaussian setting. Second, here we are concerned with the probability of error in identifying $\mathcal{S}$, whereas distilled sensing controls the false-discovery and non-discovery rates. Controlling the probability of error is more demanding than controlling the false-discovery rate. It is also the more natural criterion in applications such as spectrum sensing.

II. PROBLEM STATEMENT
We begin by stating a main assumption about the family $f(\cdot|\theta)$. Let $y_1, \dots, y_m$ be i.i.d. random variables with common distribution $f(\cdot|\theta)$, for some $\theta \in \mathbb{R}$. Let $\mathbf{y} = (y_1, \dots, y_m)$, and for each $\theta_1$ define the likelihood ratio
$$\Gamma(\mathbf{y}) := \prod_{j=1}^{m} \frac{f(y_j|\theta_1)}{f(y_j|\theta_0)}.$$

Assumption A1. $\Gamma(\mathbf{y})$ is a monotone non-decreasing function for $\theta_1 \ge \theta_0$.

We state our main results under this monotonicity assumption. In certain applications we consider, however, it is more natural to take $\theta_1 \le \theta_0$ and assume that the likelihood ratio is a monotone non-increasing function. The main results carry over to this setting with appropriate modification.

Define $T_m$ as the (log) likelihood ratio test statistic, which is a function of $\mathbf{y}$. The test statistic depends on the number of independent observations, as indicated by the subscript $m$. If A1 holds, then the test at threshold $\tau \in \mathbb{R}$,
$$T_m \underset{\theta_0}{\overset{\theta_1}{\gtrless}} \tau,$$
is the uniformly most powerful (UMP) test of $\theta \le \theta_0$ versus $\theta > \theta_0$. A large number of distributions in the exponential family satisfy the monotonicity of the likelihood ratio, including the Gaussian, Poisson and exponential distributions.

A. Measurement Budget
To compare sequential and non-sequential methods, we impose a budget on the total number of measurements: $N \le 2mn$, where $m \ge 1$ is an integer and $n$ is the dimension of $\theta$.

B. Non-Sequential Testing
The non-sequential approach distributes the measurement budget uniformly over the $n$ components, making $2m$ i.i.d. observations of each. Let $y_{i,1}, \dots, y_{i,2m}$ denote the $2m$ observations of component $i$, and let $T_{i,2m}$ denote the corresponding test statistic. The UMP test takes the form
$$T_{i,2m} \underset{\theta_0}{\overset{\theta_1}{\gtrless}} \tau. \tag{2}$$
The estimated support set at threshold $\tau$ is defined to be
$$\hat{\mathcal{S}}_\tau := \{ i : T_{i,2m} > \tau \}.$$
This estimator is optimal among all (non-sequential) component-wise procedures because each test is UMP.

C. Sequential Thresholding
The sequential method we propose is based on the following simple bisection idea. Instead of trying to identify the components in $\mathcal{S}$ (those with $\theta = \theta_1$) directly, each step of the sequential procedure aims to eliminate about half of the components that follow the null, $\theta = \theta_0$, from further consideration. The components that remain after $K$ such steps are our estimate of the set $\mathcal{S}$.

Suppose we begin by using half of our measurement budget to collect $m$ observations of each component. The test statistic is $T_{i,m} := T(y_{i,1}, \dots, y_{i,m})$. Assume $\theta_0$ is known, and let $T_m|\theta_0$ denote the random variable whose distribution is that of the test statistic under the null, $\theta = \theta_0$. Consider the threshold test
$$T_{i,m} > \mathrm{median}(T_m|\theta_0).$$
For $i \notin \mathcal{S}$, the test statistic $T_{i,m}$ falls below $\mathrm{median}(T_m|\theta_0)$ with probability $1/2$, so the threshold test eliminates approximately half of the components that follow the null. We can then use a portion of the remaining budget of $mn$ to repeat the same measurement and thresholding procedure on the remaining components. Since approximately $n/2$ components remain, this requires $mn/2$ of the remaining budget. Repeating this process for sufficiently many iterations removes all of the null components with high probability. We call this process sequential thresholding; a formal algorithm is given below. The output of the procedure, $\mathcal{S}_K$, is the estimated support set. Sequential thresholding does not require prior knowledge of the size of the support set.

Sequential Thresholding
input: $K > 0$ steps, $\gamma_0 := \mathrm{median}(T_m|\theta_0)$
initialize: $\mathcal{S}_0 = \{1, \dots, n\}$
for $k = 1, \dots, K$ do
    for $i \in \mathcal{S}_{k-1}$ do
        measure: $\{y^{(k)}_{i,j}\}_{j=1}^{m} \overset{\text{iid}}{\sim} f(\cdot|\theta_0)$ if $i \notin \mathcal{S}$, or $f(\cdot|\theta_1)$ if $i \in \mathcal{S}$
    end for
    threshold: $\mathcal{S}_k := \{ i \in \mathcal{S}_{k-1} : T^{(k)}_{i,m} > \gamma_0 \}$
end for
output: $\mathcal{S}_K$

D. Sequential Thresholding Satisfies Budget
The number of measurements used by sequential thresholding satisfies the overall measurement budget $N \le 2mn$ in expectation. Let $s = |\mathcal{S}|$, the cardinality of the support set. The expected number of measurements is
$$\mathbb{E}\Big[ m \sum_{k=0}^{K-1} |\mathcal{S}_k| \Big] \le m(n-s) \sum_{k=0}^{K-1} 2^{-k} + m s \sum_{k=0}^{K-1} 1 \le 2m(n-s) + msK.$$
Our interest is in high-dimensional limits of $n$ and $s$ (and possibly $K$). Suppose that $sK$ grows sublinearly with $n$. Then for any $\epsilon > 0$ there exists an $N$ such that $\mathbb{E}\big[ m \sum_{k=0}^{K-1} |\mathcal{S}_k| \big] \le 2(1+\epsilon)mn$ for every $n > N$. For ease of exposition, we suppress the factor $1+\epsilon$ in what follows. It does not affect the main results and conclusions of the paper, since allowing the non-sequential method $2(1+\epsilon)mn$ observations is inconsequential.

E. Implementations
There are two possible implementations of sequential thresholding, which we refer to as parallel and scanning.

parallel: The parallel implementation measures and tests all $n$ components in parallel according to the procedure.

scanning: The scanning implementation measures and tests the $n$ components in a sequence, which can be arbitrary. For example, the scanning implementation can begin with component $i = 1$ and repeatedly measure and threshold its observations up to $K$ times. If the test statistic falls below the threshold at any point, the scanning procedure immediately moves on to the next component. If $K$ batches of observations are made without the statistic falling below the threshold, the component is added to the set $\mathcal{S}_K$. The expected number of observations obeys the same bound as derived above.

The two implementations are equivalent from a theoretical perspective. The parallel implementation may be more natural for large-scale experimental designs (e.g., in the biological sciences), whereas the scanning implementation is more appropriate in communications applications such as spectrum sensing. The scanning implementation also reveals natural connections between sequential thresholding and sequential probability ratio tests.

F. Connection to Sequential Probability Ratio Tests
As we show in the following section, in the high-dimensional limit ($n \to \infty$) sequential thresholding can drive the probability of error to zero if the divergence between the null and alternative distributions is $\log|\mathcal{S}|$ times a small constant. In the Gaussian setting, this specializes to the requirement that the difference between the means is at least $\sqrt{4\log|\mathcal{S}|}$. This compares favorably to the requirement that the difference exceed $\sqrt{2\log n}$ for non-sequential methods. In fact, the $\log|\mathcal{S}|$ dependence of sequential thresholding is optimal up to constant factors, as follows from well-known results in sequential testing.

Let $\hat{\mathcal{S}}$ denote the result of any testing procedure based on $n$ local (component-wise) tests of the form $H_0: i \notin \mathcal{S}$ against $H_1: i \in \mathcal{S}$. Each test is based on the sequential observations $y_{i,1}, y_{i,2}, \dots, y_{i,N}$, and the stopping time of the test is the value of $N$ (possibly random) at which the decision is made.

Suppose that each individual test has false-positive and false-negative error probabilities less than $\alpha := \epsilon/(n - |\mathcal{S}|)$ and $\beta := \epsilon/|\mathcal{S}|$, respectively. Then the expected total number of errors is $\mathbb{E}|\hat{\mathcal{S}} \cap \mathcal{S}^c| + \mathbb{E}|\hat{\mathcal{S}}^c \cap \mathcal{S}| \le 2\epsilon$. This expected number must tend to zero in order for the probability of error, $\mathbb{P}(\hat{\mathcal{S}} \ne \mathcal{S})$, to tend to zero. With the above specifications for the two types of error, it is possible to design a sequential probability ratio test (SPRT) for each component.

The SPRT computes a sequence of likelihood ratios $\ell_{i,t}$, where $\ell_{i,t}$ is the likelihood ratio of $y_{i,1}, \dots, y_{i,t}$, $t \ge 1$. The SPRT terminates when $\ell_{i,t} \ge B$ or $\ell_{i,t} \le A$, where the thresholds $A$ and $B$ are determined by the equations $\alpha = B^{-1}(1-\beta)$ and $\beta = A(1-\alpha)$ (see [3], p. 11). Unlike sequential thresholding, the SPRT requires knowledge of both distributions as well as the level of sparsity. Such information is usually unavailable in applications. We therefore advocate sequential thresholding instead, since it requires only crude knowledge of the null and nothing about the alternative or the sparsity level.

From the Wald equation, the expected stopping time of the SPRT per index is approximately [3]
$$\mathbb{E}_0[N'] \cong \mu_0^{-1}\Big[ \alpha \log\frac{1-\beta}{\alpha} + (1-\alpha)\log\frac{\beta}{1-\alpha} \Big],$$
$$\mathbb{E}_1[N'] \cong \mu_1^{-1}\Big[ (1-\beta) \log\frac{1-\beta}{\alpha} + \beta\log\frac{\beta}{1-\alpha} \Big],$$
where $\mathbb{E}_i$ denotes the expectation under $f(\cdot|\theta_i)$ and $\mu_i := \mathbb{E}_i\big[\log \frac{f(y|\theta_1)}{f(y|\theta_0)}\big]$, $i = 0, 1$. In our case $\alpha = \epsilon/(n - |\mathcal{S}|)$ and $\beta = \epsilon/|\mathcal{S}|$, and as $\epsilon \to 0$ we have
$$\mathbb{E}_0[N'] \cong \mu_0^{-1}\log\frac{\epsilon}{|\mathcal{S}|}, \qquad \mathbb{E}_1[N'] \cong \mu_1^{-1}\log\frac{n - |\mathcal{S}|}{\epsilon}.$$
If $|\mathcal{S}| \ll n$, then the expected total number of measurements made by all $n$ SPRTs is
$$\mathbb{E}[N] = (n - |\mathcal{S}|)\,\mathbb{E}_0[N'] + |\mathcal{S}|\,\mathbb{E}_1[N'] \cong \frac{n}{\mu_0}\log\frac{\epsilon}{|\mathcal{S}|}.$$
Note that $\mu_0 = -D_0$, where $D_0 := D(f(\cdot|\theta_0)\,\|\,f(\cdot|\theta_1))$ is the Kullback-Leibler divergence between $f(\cdot|\theta_0)$ and $f(\cdot|\theta_1)$. The expected total number of observations made by the $n$ SPRTs is therefore
$$\mathbb{E}[N] \cong \frac{n}{D_0}\log\frac{|\mathcal{S}|}{\epsilon}.$$
By the optimality of the SPRT, no other component-wise testing procedure with error rate $\epsilon$ requires fewer observations. Constraining this expected total to be at most $2mn$ yields a necessary condition for controlling the probability of error of any sequential test:
$$D(f(\cdot|\theta_0)\,\|\,f(\cdot|\theta_1)) \gtrsim \frac{1}{2m}\log\frac{|\mathcal{S}|}{\epsilon}.$$

III. MAIN RESULTS
The main results rely on the extremal properties of the test statistic. We say that a testing procedure is reliable if it drives the probability of error to zero in the high-dimensional limit. More formally, consider a sequence of multiple testing problems indexed by the dimension $n$. Let $\mathcal{S}(n)$ denote the true support set, and let $\hat{\mathcal{S}}(n) = \hat{\mathcal{S}}_\tau$ (non-sequential procedure) or $\hat{\mathcal{S}}(n) = \mathcal{S}_K$ (sequential procedure). We define a notion of reliability as follows.

Definition III.1 (Reliability). Let $\mathcal{E}$ denote the error event $\{\hat{\mathcal{S}}(n) \ne \mathcal{S}(n)\}$. We say that the support set estimator $\hat{\mathcal{S}}(n)$ is reliable if $\lim_{n\to\infty}\mathbb{P}(\mathcal{E}) = 0$, i.e., if the probability of error tends to zero as $n$ grows.

To keep the notation simple, we do not explicitly indicate the dependence of the statistics on $n$ in what follows. We show that the non-sequential testing procedure in (2) is unreliable at every threshold level $\tau$ if
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\max_{i\notin\mathcal{S}} T_{i,2m}}{\mathrm{median}(T_{2m}|\theta_1)} \ge 1 \right) = 1. \tag{3}$$
Sequential testing according to sequential thresholding is reliable if
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\min_{k=1,\dots,K}\min_{i\in\mathcal{S}} T^{(k)}_{i,m}}{\mathrm{median}(T_m|\theta_0)} \le 1 \right) = 0 \tag{4}$$
and $K = (1+\epsilon)\log_2 n$, for any $\epsilon > 0$. We are interested in ranges of $\theta_1 > \theta_0$ that satisfy the conditions above. In many cases of interest, (3) and (4) hold simultaneously for a wide range of parameter values. There are therefore many regimes in which sequential methods are reliable but non-sequential methods are not.

For example, we show in Section IV-A that if the component distributions are unit-variance Gaussian with means $\theta_0 = 0$ and $\theta_1 > 0$, then the non-sequential procedure (2) is unreliable if $\theta_1 < \sqrt{\tfrac{1}{m}\log n}$. In contrast, sequential thresholding is reliable if $\theta_1 \ge \sqrt{\tfrac{2}{m}\log(|\mathcal{S}|\log_2 n)}$. The size of the sparse support $|\mathcal{S}|$ is typically much smaller than the overall dimension $n$, so there are many cases in which the sequential method is reliable but the non-sequential method is not. The gap between the two conditions can be exponentially large in the dimension $n$. As a specific example, if $|\mathcal{S}| \sim \log n$, then there are constants $C, c > 0$ such that the sequential method is reliable if $\theta_1 \ge C\sqrt{\log\log n}$, and the non-sequential method is not reliable if $\theta_1 \le c\sqrt{\log n}$.

A. Limitation of Non-Sequential Testing
Theorem III.2. If (3) holds, then the non-sequential procedure in (2) is unreliable. Specifically, if $\mathcal{E}_\tau$ is the error event $\mathcal{E}_\tau = \{\hat{\mathcal{S}}_\tau \ne \mathcal{S}\}$, then for every $\tau$
$$\lim_{n\to\infty}\mathbb{P}(\mathcal{E}_\tau) \ge \frac{1}{2}.$$

Proof: The non-sequential testing procedure accepts the null hypothesis if the test statistic $T_{i,2m}$ is less than some threshold $\tau$. Conversely, it rejects the null hypothesis if $T_{i,2m} \ge \tau$. The probability of error at threshold level $\tau$ is
$$\mathbb{P}(\mathcal{E}_\tau) = \mathbb{P}\Big( \bigcup_{i\notin\mathcal{S}}\{T_{i,2m} \ge \tau\} \;\cup\; \bigcup_{i\in\mathcal{S}}\{T_{i,2m} < \tau\} \Big),$$
and the minimum probability of error is $\min_\tau \mathbb{P}(\mathcal{E}_\tau)$. Consider the median value of the test statistic under the alternative, $\mathrm{median}(T_{2m}|\theta_1)$. If $\tau \ge \mathrm{median}(T_{2m}|\theta_1)$, each component in $\mathcal{S}$ is missed with probability at least $1/2$, so the overall probability of error is at least $1/2$. If $\tau < \mathrm{median}(T_{2m}|\theta_1)$, a false positive occurs whenever some null statistic exceeds $\mathrm{median}(T_{2m}|\theta_1)$. It follows that the minimum probability of error is bounded from below by
$$\min_\tau\mathbb{P}(\mathcal{E}_\tau) \ge \min\Big\{ \frac{1}{2},\; \mathbb{P}\Big( \bigcup_{i\notin\mathcal{S}}\{T_{i,2m} \ge \mathrm{median}(T_{2m}|\theta_1)\} \Big) \Big\}.$$
According to (3), the second argument above tends to 1 as $n \to \infty$, which completes the proof.

B. Capability of Sequential Thresholding
Theorem III.3. If (4) holds, then sequential thresholding is reliable if $K = (1+\epsilon)\log_2 n$, for $\epsilon > 0$. Specifically, if $\mathcal{E}'$ is the error event $\mathcal{E}' = \{\mathcal{S}_K \ne \mathcal{S}\}$, then for any $\epsilon > 0$
$$\lim_{n\to\infty}\mathbb{P}(\mathcal{E}') = 0.$$

Proof: The probability of error is
$$\mathbb{P}(\mathcal{E}') := \mathbb{P}(\mathcal{S}_K \ne \mathcal{S}) = \mathbb{P}\big( \{\mathcal{S}\cap\mathcal{S}_K^c \ne \emptyset\} \cup \{\mathcal{S}^c\cap\mathcal{S}_K \ne \emptyset\} \big) \le \mathbb{P}(\mathcal{S}\cap\mathcal{S}_K^c \ne \emptyset) + \mathbb{P}(\mathcal{S}^c\cap\mathcal{S}_K \ne \emptyset), \tag{5}$$
where the superscript $c$ denotes the complement of a set. The upper bound on the probability of error consists of two terms, the false-negative and false-positive probabilities. The false-positive probability (the second term in (5)) can be bounded as follows. Because we threshold at the median of the null distribution, approximately half of the null components survive each step:
$$\mathbb{P}(\mathcal{S}^c\cap\mathcal{S}_K \ne \emptyset) = \mathbb{P}\Big( \bigcup_{i\notin\mathcal{S}}\bigcap_{k=1}^{K}\{T^{(k)}_{i,m} > \mathrm{median}(T_m|\theta_0)\} \Big) \le \sum_{i\notin\mathcal{S}} \mathbb{P}\big(T^{(1)}_{i,m} > \mathrm{median}(T_m|\theta_0)\big)^{K} = \frac{n - |\mathcal{S}|}{2^K}.$$
Since $K = (1+\epsilon)\log_2 n$ with $\epsilon > 0$, we have $\lim_{n\to\infty}\mathbb{P}(\mathcal{S}^c\cap\mathcal{S}_K \ne \emptyset) = 0$.

Bounding the false-negative probability (the first term in (5)) depends on the distribution of the test statistic under the alternative $\theta_1$:
$$\mathbb{P}(\mathcal{S}\cap\mathcal{S}_K^c \ne \emptyset) \le \mathbb{P}\Big( \bigcup_{k=1}^{K}\bigcup_{i\in\mathcal{S}}\{T^{(k)}_{i,m} \le \mathrm{median}(T_m|\theta_0)\} \Big) = \mathbb{P}\Big( \min_{k=1,\dots,K}\min_{i\in\mathcal{S}} T^{(k)}_{i,m} \le \mathrm{median}(T_m|\theta_0) \Big).$$
By (4), this goes to zero in the limit, which completes the proof.

IV. APPLICATIONS
To illustrate the main results, we consider three canonical settings arising in high-dimensional multiple testing. We again have in mind a sequence of problems and consider behavior in the high-dimensional limit. Thus, when we write $\theta \le g(n)$ (or $\theta \ge g(n)$), we mean that the parameter $\theta$ may (must) grow with the dimension $n$ no faster (slower) than the function $g(n)$. Throughout this section we let $s := |\mathcal{S}|$, the cardinality of the support set, which may also be considered a function of $n$.
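Before specializing to particular models, it may help to see the procedure of Section II-C written out. The Python sketch below is an illustration only, not the authors' implementation. The helper name `sequential_thresholding`, the use of Python's `random` module, and the Gaussian setting with $\theta_0 = 0$ (so that $\gamma_0 = \mathrm{median}(T_m|\theta_0) = 0$) are assumptions chosen to match Section IV-A. The sketch draws each batch statistic $T^{(k)}_{i,m}$ directly from $\mathcal{N}(\theta, 1/m)$, which has the same distribution as the average of $m$ unit-variance observations, and it follows the parallel implementation.

```python
import math
import random


def sequential_thresholding(n, support, theta1, m, K, rng):
    """Hypothetical sketch of (parallel) sequential thresholding, Gaussian model.

    Assumes theta0 = 0 and unit-variance observations. The statistic for a
    batch of m fresh observations is their sample mean, T ~ N(theta, 1/m),
    which is sampled directly. Returns (estimated support S_K, measurements used).
    """
    gamma0 = 0.0                  # median(T_m | theta0) for N(0, 1/m)
    sigma = 1.0 / math.sqrt(m)    # standard deviation of a mean of m draws
    active = set(range(n))        # S_0: every component starts as a candidate
    used = 0                      # running count of individual measurements
    for _ in range(K):
        survivors = set()
        for i in active:
            theta = theta1 if i in support else 0.0
            t = rng.gauss(theta, sigma)   # T_{i,m}^{(k)} from m new observations
            used += m
            if t > gamma0:                # keep components above the null median
                survivors.add(i)
        active = survivors                # S_k
    return active, used


# Example run (illustrative parameters): n = 1024, |S| = 8, K = (1 + eps) log2 n.
rng = random.Random(1)
n, m, eps = 1024, 1, 2.0
support = set(range(8))
K = math.ceil((1 + eps) * math.log2(n))
est, used = sequential_thresholding(n, support, theta1=6.0, m=m, K=K, rng=rng)
bound = 2 * m * (n - len(support)) + m * len(support) * K   # Section II-D bound
print(f"K={K}, recovered={est == support}, measurements={used}, expected-value bound={bound}")
```

With these illustrative settings, $\theta_1 = 6$ comfortably exceeds the Corollary IV.2 level $\sqrt{(2/m)\log(Ks)} \approx 3.3$. The choice $\epsilon = 2$ makes the expected number of surviving null components, $(n-s)2^{-K}$, about $10^{-6}$. A smaller $\epsilon$ suffices asymptotically but leaves a visible false-positive rate at $n = 1024$. The printed measurement count can be compared with the bound $2m(n-s) + msK$, which holds in expectation rather than for every run.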
A. Gaussian Model
Gaussian noise models are commonly assumed in multiple testing problems arising in the biological sciences (e.g., testing which of many genes or proteins are involved in a certain process or function). For example, [4] used a multistage testing procedure similar in spirit to sequential thresholding to determine genes important for virus replication. Consider a high dimensional hypothesis test in additive Gaussian noise, where the parameter $\theta$ represents the mean of the distribution. We assume that the null hypothesis follows zero-mean ($\theta_0 = 0$), unit-variance Gaussian statistics, and the alternative hypothesis has mean $\theta_1 > 0$ and unit variance:
$$y_i \overset{\text{iid}}{\sim} \begin{cases} \mathcal{N}(0, 1), & i \notin \mathcal{S} \\ \mathcal{N}(\theta_1, 1), & i \in \mathcal{S}. \end{cases}$$

1) Non-Sequential Testing: We make $2m$ measurements of each element of $\theta$. The test statistic again follows a normal distribution:
$$T_{i,2m} = \frac{1}{2m}\sum_{j=1}^{2m} y_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathcal{N}(0, \frac{1}{2m}), & i \notin \mathcal{S} \\ \mathcal{N}(\theta_1, \frac{1}{2m}), & i \in \mathcal{S}. \end{cases} \tag{6}$$

Corollary IV.1. If $\theta_1 < \sqrt{\frac{\log(n-s)}{m}}$, then the non-sequential testing procedure in (2) is unreliable, i.e., $\min_\tau\mathbb{P}(\mathcal{E}_\tau) \ge 1/2$.

Proof: For the test statistic in (6), condition (3) is satisfied provided $\mathrm{median}(T_{i,2m}|\theta_1) < \sqrt{\frac{\log(n-s)}{m}}$ (see, for example, [5]). Since $\mathrm{median}(T_{i,2m}|\theta_1) = \theta_1$, Theorem III.2 implies that if
$$\theta_1 < \sqrt{\frac{\log(n-s)}{m}},$$
then non-sequential thresholding is unreliable.

2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $\mathcal{S}_k$ at each step. The test statistic follows a normal distribution:
$$T^{(k)}_{i,m} = \frac{1}{m}\sum_{j=1}^{m} y^{(k)}_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathcal{N}(0, \frac{1}{m}), & i \notin \mathcal{S} \\ \mathcal{N}(\theta_1, \frac{1}{m}), & i \in \mathcal{S}. \end{cases} \tag{7}$$

Corollary IV.2. If $\theta_1 > \sqrt{\frac{2}{m}\log(s\log_2 n)}$, then sequential thresholding reliably recovers $\mathcal{S}$.

Proof: In this case, (4) is satisfied provided $\mathrm{median}(T_m|\theta_0) \le \theta_1 - \sqrt{\frac{2}{m}\log Ks}$ (see, for example, [5]). Since $\mathrm{median}(T_m|\theta_0) = 0$, Theorem III.3 shows that if
$$\theta_1 \ge \sqrt{\frac{2}{m}\log Ks} \tag{8}$$
with $K = (1+\epsilon)\log_2 n$, then we reliably recover $\mathcal{S}$.

B. Gamma Model: Spectrum Sensing
The objective of spectrum sensing, often termed hole detection, is to identify unoccupied communication bands in the electromagnetic spectrum. Most of the bands are occupied by primary users. These users may come and go, leaving certain bands momentarily open and available for secondary users. Recent work in spectrum sensing has given considerable attention to such scenarios, including some work employing adaptive sensing methods (see, for example, [6], [7]).

Following the notation used throughout this paper, channel occupation is parameterized by $\theta$. Here $\theta_0$ denotes the signal-plus-noise power in the occupied bands, and $\theta_1$ denotes the noise-only power in the unoccupied bands. Without loss of generality, we let $\theta_1 = 1$. A single measurement follows a complex Gaussian distribution, $y_i \sim \mathcal{CN}(0, \theta)$. From Urkowitz's seminal work [8], if we make $m$ measurements of each index, the likelihood ratio test statistic follows a Gamma distribution:
$$T^{(k)}_{i,m} = \sum_{j=1}^{m} |y^{(k)}_{i,j}|^2 \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Gamma}(m, \theta_0), & i \notin \mathcal{S} \\ \mathrm{Gamma}(m, 1), & i \in \mathcal{S}. \end{cases} \tag{9}$$
Remarkably, for this problem there exist constants $C, c > 0$ such that the sequential testing procedure is reliable if $\theta_0 \ge C\log(s\log_2 n)$, whereas the non-sequential testing procedure is unreliable if $\theta_0 \le c\,(n-s)^{\frac{1}{2m}}$. To highlight this effect: if $s \sim \log n$, then the gap between these conditions is doubly exponential in $n$.

Since we are interested in detecting the sparse set of vacancies in the spectrum, our hypothesis test is reversed. We reject the null hypothesis (occupied component) if the test statistic falls below, rather than above, a certain threshold. In this case, the likelihood ratio is monotone non-increasing for $\theta_1 \le \theta_0$, so the inequalities in the key conditions (3) and (4) are reversed. Specifically, the non-sequential thresholding procedure is unreliable if
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{\mathrm{median}(T_{2m}|\theta_1)} \le 1 \right) = 1, \tag{10}$$
and sequential thresholding is reliable if
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m}}{\mathrm{median}(T_m|\theta_0)} \ge 1 \right) = 0. \tag{11}$$

1) Non-Sequential Testing: In the non-sequential procedure (2), we make $2m$ measurements per index. The test statistic follows a Gamma distribution with shape parameter $2m$.

Corollary IV.3. If $\theta_0 < 2(m-1)(n-s)^{\frac{1}{2m}}$, then the non-sequential procedure in (2) is unreliable.

Proof: Because the hypothesis test is reversed, we aim to satisfy (10). Since $\mathrm{median}(T_{2m}|\theta_1) \ge 2(m-1)$, we have
$$\mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{\mathrm{median}(T_{2m}|\theta_1)} \le 1 \right) \ge \mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{2(m-1)} \le 1 \right).$$
If $2(m-1) > \frac{\theta_0}{(n-s)^{1/(2m)}}$, we show in Appendix A that the right-hand side above goes to 1 as $n$ grows large. Together with Theorem III.2, this implies that the non-sequential procedure is unreliable if $\theta_0 < 2(m-1)(n-s)^{\frac{1}{2m}}$.

2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $\mathcal{S}_k$ at each step. The test statistic follows the Gamma distributions in (9).

Corollary IV.4. If $\theta_0 > \frac{\log(s\log_2 n)}{m-1}$, then sequential thresholding is reliable.

Proof: It suffices to show that (11) is satisfied. For all $m$ and $\theta_0$, we have $\mathrm{median}(T_m|\theta_0) \ge \theta_0(m-1)$. We therefore upper bound the probability in (11) by
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m}}{\theta_0(m-1)} \ge 1 \right),$$
which goes to zero in the limit provided $\theta_0(m-1) > \log Ks$ (see Appendix B). Together with Theorem III.3, if
$$\theta_0 > \frac{\log Ks}{m-1} \tag{12}$$
with $K = (1+\epsilon)\log_2 n$, then sequential thresholding is reliable.

C. Poisson Model: Photon-based Detection
Lastly, we consider a situation in which the component distributions are Poisson. This model arises naturally in testing problems involving photon counting (e.g., optical communications, or biological applications using fluorescent markers). We let the (sparse) alternative follow a Poisson distribution with fixed rate $\theta_1$, and the null hypothesis a Poisson distribution with rate $\theta_0 > \theta_1$:
$$y_i \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Poisson}(\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(\theta_1), & i \in \mathcal{S}. \end{cases}$$
Since $\theta_0 > \theta_1$, our hypothesis test is reversed, as in the spectrum sensing example (and equations (10) and (11)). The sufficient statistic for the likelihood ratio test is the sum of the individual measurements, which again follows a Poisson distribution. In this setting, the gap between sequential and non-sequential testing is similar to that of the Gaussian case. The proofs are given in Appendices C and D.

Corollary IV.5. For any fixed $\theta_1$, if $\theta_0 < \frac{\log(n-s)}{2m}$, non-sequential thresholding is unreliable.

Corollary IV.6. For any fixed $\theta_1$, if $\theta_0 > \frac{\log(s\log_2 n) + 1}{m}$, sequential thresholding is reliable.

V. CONCLUSION
This paper studied the problem of high-dimensional testing and sparse recovery from the perspective of sequential analysis. The gap between the null parameter $\theta_0$ and the alternative $\theta_1$ plays a crucial role in this problem. We derived necessary conditions for reliable recovery in the non-sequential setting and contrasted them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially (in the dimension $n$) more sensitive to the difference between the null and alternative distributions. This implies that subtle cases can be determined much more reliably using sequential methods.

REFERENCES
[1] J. Haupt, R. Castro, and R. Nowak, "Distilled sensing: Selective sampling for sparse signal recovery," http://arxiv.org/abs/1001.5311.
[2] J. Haupt, R. Castro, and R. Nowak, "Improved bounds for sparse recovery from adaptive measurements," in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010, pp. 1563-1567.
[3] D. Siegmund, Sequential Analysis. New York, NY, USA: Springer-Verlag, 2010.
[4] L. Hao, A. Sakurai, T. Watanabe, E. Sorensen, C. Nidom, M. Newton, P. Ahlquist, and Y. Kawaoka, "Drosophila RNAi screen identifies host genes important for influenza virus replication," Nature, pp. 890-3, 2008.
[5] M. R. Leadbetter, G. Lindgren, and H. Rootzen, Extremes and Related Properties of Random Sequences and Processes. Berlin: Springer, 1983.
[6] A. Tajer, R. Castro, and X. Wang, "Adaptive spectrum sensing for agile cognitive radios," in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, 2010, pp. 2966-2969.
[7] W. Zhang, A. Sadek, C. Shen, and S. Shellhammer, "Adaptive spectrum sensing," in Information Theory and Applications Workshop (ITA), 2010.
[8] H. Urkowitz, "Energy detection of unknown deterministic signals," Proceedings of the IEEE, vol. 55, no. 4, pp. 523-531, 1967.
[9] J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineers. New York, NY, USA: Cambridge University Press, 2006.

APPENDIX

A. Gamma Non-Sequential
The cumulative distribution function of $\mathrm{Gamma}(2m, \theta_0)$ is
$$F(\gamma) = 1 - e^{-\gamma/\theta_0}\sum_{\ell=0}^{2m-1}\frac{1}{\ell!}\left(\frac{\gamma}{\theta_0}\right)^{\ell},$$
hence
$$\mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{\gamma} \le 1 \right) = 1 - \left( e^{-\gamma/\theta_0}\sum_{\ell=0}^{2m-1}\frac{1}{\ell!}\left(\frac{\gamma}{\theta_0}\right)^{\ell} \right)^{n-s}.$$
Letting $\gamma = \frac{\theta_0}{(n-s)^{1/(2m)}}$ and taking the limit, it can be shown that
$$\lim_{n\to\infty} 1 - \left( e^{-(n-s)^{-\frac{1}{2m}}}\sum_{\ell=0}^{2m-1}\frac{(n-s)^{-\frac{\ell}{2m}}}{\ell!} \right)^{n-s} = 1 - e^{-\frac{1}{(2m)!}}.$$
If $\gamma > \frac{\theta_0}{(n-s)^{1/(2m)}}$, then
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{\gamma} \le 1 \right) = 1.$$

B. Gamma Sequential
The cumulative distribution function of $\mathrm{Gamma}(m, 1)$ is
$$F(\gamma) = 1 - e^{-\gamma}\sum_{\ell=0}^{m-1}\frac{\gamma^{\ell}}{\ell!},$$
hence
$$\mathbb{P}\left( \max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m} \ge \gamma \right) = 1 - \left( 1 - e^{-\gamma}\sum_{\ell=0}^{m-1}\frac{\gamma^{\ell}}{\ell!} \right)^{Ks}.$$
Letting $\gamma = (1+\epsilon)\log Ks$ for some $\epsilon > 0$, we have
$$\lim_{n\to\infty} 1 - \left( 1 - \frac{1}{(Ks)^{1+\epsilon}}\sum_{\ell=0}^{m-1}\frac{((1+\epsilon)\log Ks)^{\ell}}{\ell!} \right)^{Ks} = 0.$$

C. Poisson Non-Sequential
The likelihood ratio statistic is distributed as
$$T_{i,2m} = \sum_{j=1}^{2m} y_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Poisson}(2m\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(2m\theta_1), & i \in \mathcal{S}. \end{cases}$$
It suffices to show that
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\min_{i\notin\mathcal{S}} T_{i,2m}}{\mathrm{median}(T_{2m}|\theta_1)} \le 1 \right) = 1$$
for any $\theta_0 < \frac{\log(n-s)}{2m}$. The bound we derive is loose, but it is sufficient to show that the adaptive scheme is superior. First, we assume that $\mathrm{median}(T_{2m}|\theta_1) > 0$. Next, we have
$$\mathbb{P}\left( \min_{i\notin\mathcal{S}} T_{i,2m} \le \mathrm{median}(T_{2m}|\theta_1) \right) \ge \mathbb{P}\left( \min_{i\notin\mathcal{S}} T_{i,2m} = 0 \right) = 1 - \left(1 - e^{-2m\theta_0}\right)^{n-s}.$$
If $2m\theta_0 < \log(n-s)$, or equivalently $\theta_0 < \frac{\log(n-s)}{2m}$, then
$$\lim_{n\to\infty} 1 - \left(1 - e^{-2m\theta_0}\right)^{n-s} = 1,$$
which concludes the proof.

D. Poisson Sequential
In sequential thresholding, for each $i \in \mathcal{S}_k$,
$$T^{(k)}_{i,m} = \sum_{j=1}^{m} y^{(k)}_{i,j} \overset{\text{iid}}{\sim} \begin{cases} \mathrm{Poisson}(m\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(m\theta_1), & i \in \mathcal{S}. \end{cases}$$
We need to show, for the test statistic above, that
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m}}{\mathrm{median}(T_m|\theta_0)} \ge 1 \right) = 0.$$
First, we note that $\mathrm{median}(T_m|\theta_0) \ge m\theta_0 - 1$. Hence,
$$\mathbb{P}\left( \max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m} \ge \mathrm{median}(T_m|\theta_0) \right) \le \mathbb{P}\left( \max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m} \ge m\theta_0 - 1 \right).$$
We can bound the probability of a single event by Chernoff's bound ([9], p. 166). For $T_{i,m} \sim \mathrm{Poisson}(m\theta_1)$, we have
$$\mathbb{P}(T_{i,m} \ge \gamma) \le e^{-m\theta_1 - \gamma\left(\log\frac{\gamma}{m\theta_1} - 1\right)} \le e^{-\gamma\left(\log\frac{\gamma}{m\theta_1} - 1\right)},$$
which implies
$$\mathbb{P}\left( \max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m} \ge \gamma \right) \le 1 - \left( 1 - e^{-\gamma\left(\log\frac{\gamma}{m\theta_1} - 1\right)} \right)^{Ks}.$$
Letting $\gamma = \log Ks$ and taking the limit of the expression above as $n \to \infty$, for any fixed $\theta_1$, we conclude that
$$\lim_{n\to\infty}\mathbb{P}\left( \frac{\max_{k=1,\dots,K}\max_{i\in\mathcal{S}} T^{(k)}_{i,m}}{\log Ks} \ge 1 \right) = 0.$$
Thus, if $\log Ks \le m\theta_0 - 1$, or equivalently
$$\theta_0 \ge \frac{\log Ks + 1}{m},$$
sequential thresholding is reliable.
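The Chernoff step in Appendix D can be checked numerically. The Python sketch below is an illustration with hypothetical helper names. It compares the exact Poisson upper tail with the (tighter, first) form of the bound, $e^{-\lambda - \gamma(\log(\gamma/\lambda) - 1)}$ with $\lambda = m\theta_1$. It then evaluates the union bound $Ks \cdot e^{-\lambda - \gamma(\log(\gamma/\lambda) - 1)}$ at $\gamma = \log Ks$; this quantity dominates $1 - (1-p)^{Ks}$. The exact tail is computed by truncating the series after a fixed number of terms, an assumption that is adequate only for moderate $\lambda$.

```python
import math


def poisson_upper_tail(lam, g, terms=400):
    """P(X >= g) for X ~ Poisson(lam), integer g >= 0, via a truncated upper-tail sum."""
    return sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(g, g + terms)
    )


def chernoff_bound(lam, g):
    """Chernoff bound P(X >= g) <= exp(-lam - g*(log(g/lam) - 1)), valid for g > lam."""
    return math.exp(-lam - g * (math.log(g / lam) - 1.0))


def union_bound(lam, Ks):
    """Ks times the Chernoff bound at gamma = log(Ks), as in Appendix D."""
    return Ks * chernoff_bound(lam, math.log(Ks))


# The bound dominates the exact tail, e.g. for lam = m * theta1 = 2:
for g in range(3, 15):
    print(g, poisson_upper_tail(2.0, g), chernoff_bound(2.0, g))

# The union bound over Ks statistics shrinks as Ks grows (lam = 1):
for Ks in (10**4, 10**6, 10**9):
    print(Ks, union_bound(1.0, Ks))
```

For $\lambda = 1$, the union bound falls from roughly $5 \times 10^{-2}$ at $Ks = 10^4$ to about $2 \times 10^{-10}$ at $Ks = 10^9$, consistent with the limit in Appendix D. The factor $Ks$ is overcome only once $\log\log Ks > 2$, i.e., $Ks > e^{e^2} \approx 1.6 \times 10^3$, so the convergence is slow.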