arXiv:1103.5991v1 [math.ST] 30 Mar 2011

Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery

Matt Malloy
Electrical and Computer Engineering
University of Wisconsin-Madison
Email: [email protected]

Robert Nowak
Electrical and Computer Engineering
University of Wisconsin-Madison
Email: [email protected]

Abstract: This paper studies the problem of high-dimensional multiple testing and sparse recovery from the perspective of sequential analysis. In this setting, the probability of error is a function of the dimension of the problem. A simple sequential testing procedure for this problem is proposed. We derive necessary conditions for reliable recovery in the non-sequential setting and contrast them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially more sensitive to the difference between the null and alternative distributions (in terms of the dependence on dimension), implying that subtle cases can be much more reliably determined using sequential methods.

I. INTRODUCTION

High dimensional testing and sparse recovery problems arise in a broad range of scientific and engineering applications. The basic problem is summarized as follows. Let $\theta \in \mathbb{R}^n$ denote a parameter vector. The dimension $n$ may be very large (thousands or millions or more), but $\theta$ is sparse in the sense that most of its elements are equal to a baseline/null value $\theta_0$ (e.g., $\theta_0 = 0$). The support of the sparse subset of elements that deviate from the baseline is denoted by $\mathcal{S}$. The parameter $\theta$ is observed stochastically according to

\[
y_i \sim \begin{cases} f(y_i \mid \theta_0), & i \notin \mathcal{S} \\ f(y_i \mid \theta_1), & i \in \mathcal{S} \end{cases} \tag{1}
\]

where $f(\cdot \mid \theta)$ is a parametric family of densities indexed by a scalar parameter $\theta \in \mathbb{R}$. The goal of the high-dimensional testing and sparse recovery problem is to identify $\mathcal{S}$ from observations of this form. This problem has attracted attention lately due to its importance in the biological sciences. It is also relevant in communications problems including spectrum sensing in cognitive radio, one of the motivations for our work.

The conventional theoretical treatment of this problem assumes that a set of observations is collected prior to data analysis. Typically, in what we refer to as the non-sequential setting, each of the $n$ components is measured (one or more times) according to the model above, and then component-wise tests are performed to estimate $\mathcal{S}$.

This paper investigates the high-dimensional testing problem from the perspective of sequential analysis. In this setting, observations are gathered sequentially and adaptively, based on information gleaned from previous observations. This allows the observation process to focus sensing resources on certain components at the expense of ignoring others. For example, the process might first measure each component once, then focus on a reduced subset of 'interesting' components in a second pass.

To compare sequential and non-sequential methods we impose a budget on the total number of measurements that can be made. The main results show that sequential methods can be dramatically more sensitive to small differences between the baseline/null $\theta_0$ and the alternative value $\theta_1$.

To give a sense of the main results, consider the case in which $f(\cdot \mid \theta)$ is a Gaussian with mean $\theta$, the baseline is $\theta_0 = 0$, and the alternative is $\theta_1 > 0$. Then reliable detection (probability of error tending to zero as $n \to \infty$) is possible using non-sequential methods if and only if $\theta_1$ is larger than a constant multiple of $\sqrt{\log n}$. In contrast, a sequential method is reliable as long as $\theta_1$ is larger than a constant multiple of $\sqrt{\log(|\mathcal{S}| \log n)}$ (see Corollary IV.2). This shows that the sequential method is more sensitive whenever the cardinality of the support set is sufficiently small relative to $n$. The improvement is especially remarkable when $\theta$ is very sparse; e.g., if $|\mathcal{S}| \sim \log n$, then sequential methods succeed as long as $\theta_1$ exceeds a constant multiple of $\sqrt{\log \log n}$.

The gains provided by the sequential method are even more pronounced for certain one-sided distributions. For the Gamma distribution model (which arises in spectrum sensing), there exist constants $C, c > 0$ such that the sequential method is reliable if $\theta_0 \ge C \log(|\mathcal{S}| \log_2 n)$, but any non-sequential thresholding procedure is unreliable if $\theta_0 \le c\,(n - |\mathcal{S}|)^{1/2m}$ (see Section IV-B). If $|\mathcal{S}| \sim \log n$, then the gap between these conditions is doubly exponential in $n$.

The sequential method is similar to the so-called distilled sensing in [1], [2]; however, there are two main distinctions. First, the results in this paper are applicable to a large class of problems characterized by one-sided tests, whereas the distilled sensing approach is specific to the Gaussian setting. Second, here we are concerned with the probability of error in identifying $\mathcal{S}$, whereas distilled sensing controls the false discovery and non-discovery rates, which is less demanding than error rate control. The probability of error is more natural and appropriate in applications such as spectrum sensing.

II. PROBLEM STATEMENT

We begin by stating a main assumption about the family $f(\cdot \mid \theta)$. Let $y_1, \dots, y_m$ be i.i.d. random variables with common [...]
Theorem III.2. If (3) holds, then the non-sequential procedure in (2) is unreliable. Specifically, if $\mathcal{E}_\tau = \{\mathcal{S}_\tau \neq \mathcal{S}\}$ is the error event, then for every $\tau$

\[
\lim_{n\to\infty} P(\mathcal{E}_\tau) \ge \frac{1}{2}.
\]

Proof: The non-sequential testing procedure accepts the null hypothesis if the test statistic $T_{i,2m}$ is less than some threshold, $\tau$, and conversely, rejects the null hypothesis if $T_{i,2m} \ge \tau$. The probability of error at threshold level $\tau$ is

\[
P(\mathcal{E}_\tau) = P\Big( \bigcup_{i \notin \mathcal{S}} \{T_{i,2m} \ge \tau\} \;\cup\; \bigcup_{i \in \mathcal{S}} \{T_{i,2m} < \tau\} \Big),
\]

and the minimum probability of error is $\min_\tau P(\mathcal{E}_\tau)$. Now suppose we take $\tau = \operatorname{median}(T_{2m} \mid \theta_1)$, the median value of the test statistic under the alternative. At this threshold level, the false-negative rate would be 1/2, and so the overall probability of error would be at least 1/2. It follows that the minimum probability of error can be bounded from below by

\[
\min_\tau P(\mathcal{E}_\tau) \ge \min\Big( \frac{1}{2},\; P\Big( \bigcup_{i \notin \mathcal{S}} \{T_{i,2m} \ge \operatorname{median}(T_{2m} \mid \theta_1)\} \Big) \Big).
\]

According to (3) the second argument above tends to 1 as $n \to \infty$, which completes the proof.

[...]

Since $K = (1 + \epsilon) \log_2 n$, with $\epsilon > 0$, we have

\[
\lim_{n\to\infty} P(\mathcal{S}^c \cap \mathcal{S}_K \neq \emptyset) = 0.
\]

Bounding the false-negative probability (first term in (5)) depends on the distribution of the test statistic under the alternative $\theta_1$:

\[
P(\mathcal{S} \cap \mathcal{S}_K^c \neq \emptyset)
= P\Big( \bigcup_{k=1}^{K} \bigcup_{i \in \mathcal{S}} \big\{ T^{(k)}_{i,m} \le \operatorname{median}(T_m \mid \theta_0) \big\} \Big)
= P\Big( \min_{k=1,\dots,K} \, \min_{i \in \mathcal{S}} T^{(k)}_{i,m} \le \operatorname{median}(T_m \mid \theta_0) \Big),
\]

which, from (4), goes to zero in the limit, which completes the proof.

IV. APPLICATIONS

To illustrate the main results we consider three canonical settings arising in high-dimensional multiple testing. We again have in mind a sequence of problems and consider behavior in the high-dimensional limit. Thus, when we write $\theta \le g(n)$ (or $\theta \ge g(n)$) we mean that the parameter $\theta$ may (must) grow with dimension $n$ no faster (slower) than the function $g(n)$. Throughout this section we let $s := |\mathcal{S}|$, the cardinality of the support set (which may also be considered to be a function of $n$).

A. Gaussian Model

Gaussian noise models are commonly assumed in multiple testing problems arising in the biological sciences (e.g., testing which of many genes or proteins are involved in a certain process or function). For example, a multistage testing procedure similar in spirit to sequential thresholding was used to determine genes important for virus replication in [4]. Consider a high dimensional hypothesis test in additive Gaussian noise where the parameter $\theta$ represents the mean of the distribution. We assume the null hypothesis follows zero mean ($\theta_0 = 0$), unit variance Gaussian statistics; the alternate hypothesis, mean $\theta_1 > 0$, unit variance:

\[
y_i \overset{iid}{\sim} \begin{cases} \mathcal{N}(0, 1), & i \notin \mathcal{S} \\ \mathcal{N}(\theta_1, 1), & i \in \mathcal{S}. \end{cases}
\]

1) Non-Sequential Testing: We make $2m$ measurements of each element of $\theta$. The test statistic again follows a normal distribution:

\[
T_{i,2m} = \frac{1}{2m} \sum_{j=1}^{2m} y_{i,j} \overset{iid}{\sim} \begin{cases} \mathcal{N}\big(0, \tfrac{1}{2m}\big), & i \notin \mathcal{S} \\ \mathcal{N}\big(\theta_1, \tfrac{1}{2m}\big), & i \in \mathcal{S}. \end{cases} \tag{6}
\]

Corollary IV.1. If $\theta_1 < \sqrt{\frac{\log(n-s)}{m}}$, then the non-sequential testing procedure in (2) is unreliable, i.e., $\min_\tau P(\mathcal{E}_\tau) \ge 1/2$.

Proof: For the test statistic in equation (6), we satisfy (3) provided $\operatorname{median}(T_{i,2m} \mid \theta_1) \le \sqrt{\frac{\log(n-s)}{m}}$ (see, for example [5]). By Theorem III.2 and since $\operatorname{median}(T_{i,2m} \mid \theta_1) = \theta_1$, if

\[
\theta_1 \le \sqrt{\frac{\log(n-s)}{m}}
\]

then non-sequential thresholding is unreliable.

2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $\mathcal{S}_k$ at each step. The test statistic follows a normal distribution:

\[
T^{(k)}_{i,m} = \frac{1}{m} \sum_{j=1}^{m} y_{i,j} \overset{iid}{\sim} \begin{cases} \mathcal{N}\big(0, \tfrac{1}{m}\big), & i \notin \mathcal{S} \\ \mathcal{N}\big(\theta_1, \tfrac{1}{m}\big), & i \in \mathcal{S}. \end{cases} \tag{7}
\]

Corollary IV.2. If $\theta_1 > \sqrt{\frac{2}{m} \log(s \log_2 n)}$, then sequential thresholding will reliably recover $\mathcal{S}$.

Proof: In this case, equation (4) is satisfied provided $\operatorname{median}(T_m \mid \theta_0) \le \theta_1 - \sqrt{\frac{2}{m} \log Ks}$ (see for example [5]). Since $\operatorname{median}(T_m \mid \theta_0) = 0$, Theorem III.3 tells us that provided

\[
\theta_1 \ge \sqrt{\frac{2}{m} \log Ks} \tag{8}
\]

with $K = (1 + \epsilon) \log_2 n$, we reliably recover $\mathcal{S}$.

B. Gamma Model: Spectrum Sensing

Often termed hole detection, the objective of spectrum sensing is to identify unoccupied communication bands in the electromagnetic spectrum. Most of the bands will be occupied by primary users, but these users may come and go, leaving certain bands momentarily open and available for secondary users. Recent work in spectrum sensing has given considerable attention to such scenarios, including some work employing adaptive sensing methods (see, for example [6], [7]).

Following the notation throughout this paper, channel occupation is parameterized by $\theta$, with $\theta_0$ denoting the signal plus noise power in the occupied bands, and $\theta_1$ representing the noise only power in the unoccupied bands. Without loss of generality, we let $\theta_1 = 1$. The statistics of a single measurement follow a complex Gaussian distribution: $y_i \overset{iid}{\sim} \mathcal{CN}(0, \theta)$. From Urkowitz's seminal work [8], making $m$ measurements of each index, the likelihood ratio test statistic follows a Gamma distribution:

\[
T^{(k)}_{i,m} = \sum_{\ell=1}^{m} |y_{i,\ell}|^2 \overset{iid}{\sim} \begin{cases} \mathrm{Gamma}(m, \theta_0), & i \notin \mathcal{S} \\ \mathrm{Gamma}(m, 1), & i \in \mathcal{S}. \end{cases} \tag{9}
\]

Remarkably, for this problem there exist constants $C, c > 0$ such that the sequential testing procedure is reliable if $\theta_0 \ge C \log(s \log_2 n)$, but the non-sequential testing procedure is unreliable if $\theta_0 \le c\,(n-s)^{1/2m}$. To highlight this effect, if $s \sim \log n$, then the gap between these conditions is doubly exponential in $n$.

Since we are interested in detecting the sparse set of vacancies in the spectrum, our hypothesis test is reversed. We reject the null hypothesis (occupied component) if the test statistic falls below (rather than above) a certain threshold. In this case, the likelihood ratio is monotone non-increasing for $\theta_1 \le \theta_0$, and so the inequalities in the key conditions (3) and (4) are reversed: specifically, the non-sequential thresholding procedure is unreliable if

\[
\lim_{n\to\infty} P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{\operatorname{median}(T_{2m} \mid \theta_1)} \le 1 \right) = 1 \tag{10}
\]

and sequential thresholding is reliable if

\[
\lim_{n\to\infty} P\left( \frac{\max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T^{(k)}_{i,m}}{\operatorname{median}(T_m \mid \theta_0)} \ge 1 \right) = 0. \tag{11}
\]

1) Non-Sequential Testing: In the non-sequential procedure (2), we make $2m$ measurements per index. The distribution of the test statistic follows a gamma distribution with shape parameter $2m$.

Corollary IV.3. If $\theta_0 < 2(m-1)(n-s)^{1/2m}$, then the non-sequential procedure in (2) is unreliable.

Proof: In this case, because the hypothesis test is reversed, we aim to satisfy (10). Since $\operatorname{median}(T_{2m} \mid \theta_1 = 1) \ge 2(m-1)$, we have

\[
P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{\operatorname{median}(T_{2m} \mid \theta_1)} \le 1 \right) \ge P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{2(m-1)} \le 1 \right).
\]

If $2(m-1) > \frac{\theta_0}{(n-s)^{1/2m}}$, we show in Appendix A that the right hand side above goes to 1 as $n$ grows large. Together with Theorem III.2 this implies that if $\theta_0 < 2(m-1)(n-s)^{1/2m}$ then the non-sequential procedure is unreliable.

2) Sequential Testing: Sequential thresholding makes $m$ measurements of each component in the set $\mathcal{S}_k$ at each step. The test statistic follows the Gamma distributions in (9).

Corollary IV.4. If $\theta_0 > \frac{\log(s \log_2 n)}{m}$, then sequential thresholding is reliable.

Proof: It suffices to show (11) is satisfied. For all $m$ and $\theta_0$, we have $\operatorname{median}(T_m \mid \theta_0) \ge \theta_0 (m-1)$. We upper bound (11) by

\[
\lim_{n\to\infty} P\left( \frac{\max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T^{(k)}_{i,m}}{\theta_0 (m-1)} \ge 1 \right)
\]

which goes to zero in the limit provided $\theta_0 (m-1) > \log Ks$ (see Appendix B). Together with Theorem III.3, if

\[
\theta_0 > \frac{\log Ks}{m-1} \tag{12}
\]

with $K = (1 + \epsilon) \log_2 n$, then sequential thresholding is reliable.

C. Poisson Model: Photon-based Detection

Lastly we consider a situation in which the component distributions are Poisson. This model arises naturally in testing problems involving photon counting (e.g., optical communications or biological applications using fluorescent markers). We let the (sparse) alternative follow a Poisson with fixed rate $\theta_1$, and the null hypothesis a rate $\theta_0$, $\theta_0 > \theta_1$:

\[
y_i \overset{iid}{\sim} \begin{cases} \mathrm{Poisson}(\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(\theta_1), & i \in \mathcal{S}. \end{cases}
\]

Note that as $\theta_0 > \theta_1$, our hypothesis test is reversed as in the spectrum sensing example (and equations (10) and (11)). The sufficient statistic for the likelihood ratio test is a sum of the individual measurements, again following a Poisson distribution. In this setting, the gap between sequential and non-sequential testing is similar to that of the Gaussian case. Proofs are left to Appendices C and D.

Corollary IV.5. For any fixed $\theta_1$, if $\theta_0 < \frac{\log(n-s)}{2m}$, non-sequential thresholding is unreliable.

Corollary IV.6. For any fixed $\theta_1$, if $\theta_0 > \frac{\log(s \log_2 n) + 1}{m}$, sequential thresholding is reliable.

V. CONCLUSION

This paper studied the problem of high-dimensional testing and sparse recovery from the perspective of sequential analysis. The gap between the null parameter $\theta_0$ and the alternative $\theta_1$ plays a crucial role in this problem. We derived necessary conditions for reliable recovery in the non-sequential setting and contrasted them with sufficient conditions for reliable recovery using the proposed sequential testing procedure. Applications of the main results to several commonly encountered models show that sequential testing can be exponentially (in dimension $n$) more sensitive to the difference between the null and alternative distributions, implying that subtle cases can be much more reliably determined using sequential methods.

REFERENCES

[1] J. Haupt, R. Castro, and R. Nowak, "Distilled sensing: Selective sampling for sparse signal recovery," http://arxiv.org/abs/1001.5311.
[2] J. Haupt, R. Castro, and R. Nowak, "Improved bounds for sparse recovery from adaptive measurements," in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010, pp. 1563–1567.
[3] D. Siegmund, Sequential Analysis. New York, NY, USA: Springer-Verlag, 2010.
[4] L. Hao, A. Sakurai, T. Watanabe, E. Sorensen, C. Nidom, M. Newton, P. Ahlquist, and Y. Kawaoka, "Drosophila RNAi screen identifies host genes important for influenza virus replication," Nature, pp. 890–3, 2008.
[5] M. R. Leadbetter, G. Lindgren, and H. Rootzen, Extremes and Related Properties of Random Sequences and Processes. Berlin: Springer, 1983.
[6] A. Tajer, R. Castro, and X. Wang, "Adaptive spectrum sensing for agile cognitive radios," in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, 2010, pp. 2966–2969.
[7] W. Zhang, A. Sadek, C. Shen, and S. Shellhammer, "Adaptive spectrum sensing," in Information Theory and Applications Workshop (ITA), 2010, 31 2010.
[8] H. Urkowitz, "Energy detection of unknown deterministic signals," Proceedings of the IEEE, vol. 55, no. 4, pp. 523–531, 1967.
[9] J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineers. New York, NY, USA: Cambridge University Press, 2006.

APPENDIX

A. Gamma Non-Sequential

The cumulative distribution function of $\mathrm{Gamma}(2m, \theta_0)$ is given as

\[
F(\gamma) = 1 - \sum_{\ell=0}^{2m-1} e^{-\gamma/\theta_0} \left( \frac{\gamma}{\theta_0} \right)^{\ell} \frac{1}{\ell!},
\]

hence,

\[
P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{\gamma} \le 1 \right) = 1 - \left( \sum_{\ell=0}^{2m-1} e^{-\gamma/\theta_0} \left( \frac{\gamma}{\theta_0} \right)^{\ell} \frac{1}{\ell!} \right)^{n-s}.
\]

Letting $\gamma = \frac{\theta_0}{(n-s)^{1/2m}}$ and taking the limit, it can be shown that

\[
\lim_{n\to\infty} \; 1 - \left( \sum_{\ell=0}^{2m-1} e^{-(n-s)^{-1/2m}} \, \frac{(n-s)^{-\ell/2m}}{\ell!} \right)^{n-s} = 1 - e^{-\frac{1}{(2m)!}}.
\]

If $\gamma > \frac{\theta_0}{(n-s)^{1/2m}}$, then

\[
\lim_{n\to\infty} P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{\gamma} \le 1 \right) = 1.
\]

B. Gamma Sequential

The cumulative distribution function of $\mathrm{Gamma}(m, 1)$ is given as

\[
F(\gamma) = 1 - \sum_{\ell=0}^{m-1} e^{-\gamma} \frac{\gamma^{\ell}}{\ell!},
\]

hence,

\[
P\left( \max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T^{(k)}_{i,m} \ge \gamma \right) = 1 - \left( 1 - \sum_{\ell=0}^{m-1} e^{-\gamma} \frac{\gamma^{\ell}}{\ell!} \right)^{Ks}.
\]

Letting $\gamma = (1 + \epsilon) \log Ks$, for some $\epsilon > 0$, we have

\[
\lim_{n\to\infty} \; 1 - \left( 1 - \sum_{\ell=0}^{m-1} \frac{1}{(Ks)^{1+\epsilon}} \, \frac{((1+\epsilon) \log Ks)^{\ell}}{\ell!} \right)^{Ks} = 0.
\]

C. Poisson Non-Sequential

The likelihood ratio statistic is distributed as

\[
T_{i,2m} = \sum_{j=1}^{2m} y_{i,j} \overset{iid}{\sim} \begin{cases} \mathrm{Poisson}(2m\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(2m\theta_1), & i \in \mathcal{S}. \end{cases}
\]

It suffices to show

\[
\lim_{n\to\infty} P\left( \frac{\min_{i \notin \mathcal{S}} T_{i,2m}}{\operatorname{median}(T_{2m} \mid \theta_1)} \le 1 \right) = 1
\]

for any $\theta_0 < \frac{\log(n-s)}{2m}$. The bound we derive is loose, but sufficient to show the adaptive scheme is superior. First, we assume that $\operatorname{median}(T_{2m} \mid \theta_1) > 0$. Next we have

\[
P\left( \min_{i \notin \mathcal{S}} T_{i,2m} \le \operatorname{median}(T_{2m} \mid \theta_1) \right) \ge P\left( \min_{i \notin \mathcal{S}} T_{i,2m} = 0 \right) = 1 - \left( 1 - e^{-2m\theta_0} \right)^{n-s}.
\]

If $2m\theta_0 < \log(n-s)$, then

\[
\lim_{n\to\infty} \; 1 - \left( 1 - e^{-2m\theta_0} \right)^{n-s} = 1,
\]

which is also true provided $\theta_0 < \frac{\log(n-s)}{2m}$ and concludes the proof.

D. Poisson Sequential

In sequential thresholding, for each $i \in \mathcal{S}_k$,

\[
T^{(k)}_{i,m} = \sum_{j=1}^{m} y_{i,j} \overset{iid}{\sim} \begin{cases} \mathrm{Poisson}(m\theta_0), & i \notin \mathcal{S} \\ \mathrm{Poisson}(m\theta_1), & i \in \mathcal{S}. \end{cases}
\]

We need to show, for the test statistic above,

\[
\lim_{n\to\infty} P\left( \frac{\max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T_{i,m}}{\operatorname{median}(T_m \mid \theta_0)} \ge 1 \right) = 0.
\]

First, we note $\operatorname{median}(T_m \mid \theta_0) \ge m\theta_0 - 1$. Hence,

\[
P\left( \max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T_{i,m} \ge \operatorname{median}(T_m \mid \theta_0) \right) \le P\left( \max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T_{i,m} \ge m\theta_0 - 1 \right).
\]

We can bound the probability of a single event by Chernoff's bound [9], p. 166. For $T_{i,m} \sim \mathrm{Poisson}(m\theta_1)$ we have:

\[
P(T_{i,m} \ge \gamma) \le e^{-m\theta_1 - \gamma \left( \log\left( \frac{\gamma}{m\theta_1} \right) - 1 \right)} \le e^{-\gamma \left( \log\left( \frac{\gamma}{m\theta_1} \right) - 1 \right)},
\]

which implies

\[
P\left( \max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T_{i,m} \ge \gamma \right) \le 1 - \left( 1 - e^{-\gamma \left( \log\left( \frac{\gamma}{m\theta_1} \right) - 1 \right)} \right)^{Ks}.
\]

Letting $\gamma = \log Ks$ and taking the limit as $n \to \infty$ of the expression above for any fixed $\theta_1$, we conclude

\[
\lim_{n\to\infty} P\left( \frac{\max_{k=1,\dots,K} \max_{i \in \mathcal{S}} T_{i,m}}{\log Ks} \ge 1 \right) = 0.
\]

Thus, if $\log Ks \le m\theta_0 - 1$, or equivalently

\[
\theta_0 \ge \frac{\log Ks + 1}{m},
\]

sequential thresholding is reliable.
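The Gaussian comparison of Section IV-A can be explored numerically with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the sequential procedure keeps a component at each pass when its statistic exceeds the null median (0 in the Gaussian case) and runs K = ceil((1 + eps) log2 n) passes, as the proofs above suggest. The function names, the threshold grid, and the parameter values are hypothetical choices made for this example. The mean of m unit-variance measurements is drawn directly as a single N(theta_i, 1/m) sample, which has the same distribution.

```python
import math
import random


def sequential_thresholding(theta, m, eps, rng):
    """Gaussian sequential thresholding (illustrative sketch).

    theta : true means (0.0 for null components, theta_1 on the support)
    m     : measurements per surviving component per pass
    eps   : slack in the number of passes, K = ceil((1 + eps) * log2(n))
    Returns (estimated support, total number of measurements used).
    """
    n = len(theta)
    num_passes = math.ceil((1 + eps) * math.log2(n))
    sigma = 1.0 / math.sqrt(m)   # std. dev. of the mean of m unit-variance draws
    active = list(range(n))      # first pass measures every component
    used = 0
    for _ in range(num_passes):
        used += m * len(active)
        # Keep a component only if its statistic exceeds the null median (0).
        active = [i for i in active if rng.gauss(theta[i], sigma) > 0.0]
    return set(active), used


def non_sequential_thresholding(theta, m, tau, rng):
    """Single-pass test: 2m measurements per component, threshold tau."""
    n = len(theta)
    sigma = 1.0 / math.sqrt(2 * m)
    estimate = {i for i in range(n) if rng.gauss(theta[i], sigma) >= tau}
    return estimate, 2 * m * n


def error_rate(procedure, theta, support, trials, rng):
    """Fraction of trials in which the estimate differs from the support."""
    errors = sum(procedure(theta, rng)[0] != support for _ in range(trials))
    return errors / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, s, m, theta1 = 2048, 8, 2, 3.0   # hypothetical problem sizes
    support = set(range(s))
    theta = [theta1 if i in support else 0.0 for i in range(n)]

    seq = lambda th, r: sequential_thresholding(th, m, 1.0, r)
    print("sequential error rate:", error_rate(seq, theta, support, 40, rng))

    # Report the best threshold on a small grid, mimicking the min over tau.
    best = min(
        error_rate(lambda th, r, t=tau: non_sequential_thresholding(th, m, t, r),
                   theta, support, 40, rng)
        for tau in (1.5, 1.75, 2.0, 2.25, 2.5)
    )
    print("best non-sequential error rate on grid:", best)
```

Because roughly half of the null components are discarded at each pass, the expected number of measurements used by the sequential routine stays close to 2mn, the budget of the non-sequential routine, so the two error rates are compared at a similar sensing cost.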