Non-adaptive Group Testing: Explicit Bounds and Novel Algorithms

Chun Lam Chan, Sidharth Jaggi, Venkatesh Saligrama, Senior Member, IEEE, and Samar Agnihotri, Member, IEEE

Abstract—We consider some computationally efficient and provably correct algorithms with near-optimal sample complexity for the problem of noisy non-adaptive group testing. Group testing involves grouping arbitrary subsets of items into pools. Each pool is then tested to identify the defective items, which are usually assumed to be sparse. We consider non-adaptive randomly pooling measurements, where pools are selected randomly and independently of the test outcomes. We also consider a model where noisy measurements allow for both some false negative and some false positive test outcomes (and also allow for asymmetric noise, and activation noise). We consider three classes of algorithms for the group testing problem: the coupon collector algorithm, the column matching algorithms, and the LP decoding algorithms. The last two classes of algorithms (versions of some of which had been considered before in the literature) were inspired by corresponding algorithms in the compressive sensing literature. The second and third of these algorithms have several flavors, dealing separately with the noiseless and noisy measurement scenarios. Our contribution is novel analysis to derive explicit sample-complexity bounds (with all constants expressly computed) for these algorithms as a function of the desired error probability, the noise parameters, the number of items, and the size of the defective set (or an upper bound on it). We also compare the bounds to information-theoretic lower bounds for sample complexity based on Fano's inequality, and show that the upper and lower bounds are equal up to an explicitly computable universal constant factor (independent of problem parameters).

Index Terms—Non-adaptive group testing, noisy measurements, coupon collector's problem, compressive sensing, LP-decoding.

I. INTRODUCTION

The goal of group testing is to identify a small unknown subset D of defective items embedded in a much larger set N (usually in the setting where d = |D| is much smaller than n = |N|, i.e., d is o(n)). This problem was first considered by Dorfman [3] in scenarios where multiple items in a group can be simultaneously tested, with a positive or negative output depending on whether or not a "defective" item is present in the group being tested. In general, the goal of group testing algorithms is to identify the defective set with as few measurements as possible. As demonstrated in [3] and later work (see [4] for a comprehensive survey of many group-testing algorithms and bounds), with judicious grouping and testing, far fewer than the trivial upper bound of n tests may be required to identify the set of defective items.

We consider the problem of non-adaptive group testing in this paper (see, for example, [4]). In non-adaptive group testing, the set of items being tested in each test is required to be independent of the outcome of every other test. This restriction is often useful in practice, since it enables parallelization of the testing process. It also allows for an automated testing process using "off-the-shelf" components. In contrast, the procedures and hardware required for adaptive group testing may be significantly more complex.

In this paper we describe computationally efficient algorithms with near-optimal performance for "noiseless" and "noisy" non-adaptive group testing problems. We now describe different aspects of this work in some detail.

"Noisy" measurements: In addition to the noiseless group-testing problem (i.e., wherein each test outcome is "true"), we consider the noisy variant of the problem. In this noisy variant the result of each test may differ from the true result (in an independent and identically distributed manner) with a certain pre-specified probability q. This leads to both false positive test outcomes and false negative test outcomes.

Much of the existing work either considers one-sided noise, namely false positive tests but no false negative tests or vice-versa (e.g., [5]), or an activation noise model wherein a defective item in a test may behave as a non-defective item with a certain probability, leading to a false negative test (e.g., [5]), or a "worst-case" noise model [6] wherein the total number of false positive and negative test outcomes is assumed to be bounded from above.¹

¹ For instance, [6] considers group-testing algorithms that are resilient to all noise patterns wherein at most a fraction q of the results differ from their true values, rather than the probabilistic guarantee we give against most fraction-q errors. This is analogous to the difference between combinatorial coding-theoretic error-correcting codes (for instance Gilbert-Varshamov codes [7]) and probabilistic information-theoretic codes (for instance [8]). In this work we concern ourselves only with the latter, though it is possible that some of our techniques can also be used to analyze the former.

Since the measurements are noisy, the problem of estimating the set of defective items is more challenging.² In this work we focus primarily on noise models wherein a positive test outcome is a false positive with the same probability as a negative test outcome being a noisy positive. We do this primarily for ease of analysis (as, for instance, the Binary Symmetric Channel noise model is much more extensively studied in the literature than other, less symmetric, models), though our techniques also work for asymmetric error probabilities, and for activation noise. We briefly show in Theorems 7 and 8 that our LP decoding algorithms also have order-optimal sample complexity for these noise models.

Computationally efficient and near-optimal algorithms: Most algorithms in the literature focus on optimizing the number of measurements required; in some cases, this leads to algorithms that may not be computationally efficient to implement (e.g., [5]). In this paper we consider algorithms that are not only computationally efficient but are also near-optimal in the number of measurements required. Variants of some of these algorithms have been considered before in the rich and vast literature on group testing. When we later discuss the specific algorithms considered in this work we also cite related algorithms from the existing literature.

We reprise lower bounds on group-testing algorithms based on information-theoretic analysis (similar bounds were known in the literature, for instance [9], [10]; we reproduce short proofs here for completeness, since they help us compare upper and lower bounds and show that they match up to a universal constant factor).

For the upper bounds we analyze three different types of algorithms. Some of these algorithms also have structural relationships to those described in the compressive sensing literature (see Section II-B, and our descriptions of the algorithms in Section II-C).

The first algorithm is based on the classical Coupon Collector's Problem [11]. The second is based on an iterative "Column Matching" algorithm. Hence we call these two algorithms respectively the CoCo algorithm (for Coupon Collector) and the CoMa algorithms (for Column Matching algorithms). Neither CoCo nor CoMa is entirely new; variants of both have been previously considered in the group-testing literature (under different names) for both noiseless and noisy scenarios (see, for instance, [12] for algorithms based on the idea of identifying all the non-defectives, as in the Coupon Collector's algorithm, and [13]-[15] for examples of Column Matching algorithms).

Our third class of algorithms is related to linear programming relaxations used in the compressive sensing literature. In compressive sensing the l0 norm minimization is relaxed to an l1 norm minimization. In the noise-free case this relaxation results in a linear program since the measurements are linear. In contrast, in group testing, the measurements are non-linear and boolean. Nonetheless, our contribution here is to show (via novel "perturbation analysis") that a variety of relaxations result in a class of LP decoding algorithms that also have information-theoretically order-optimal sample complexity.

In our LP relaxation, in the noise-free case the measurements take the value one if some defective item is in the pool and zero if no defective item is part of the pool. Furthermore, noise in the group testing scenario is also boolean, unlike the additive noise usually considered in the compressive sensing literature. For these reasons we also need to relax our boolean measurement equations. We do so by using a novel combination of inequality and positivity constraints, resulting in a class of Linear Programs.

We consider the case when the number of defectives d is known exactly. Our LP formulation and analysis is related to error-correction [16], where one uses a "minimum distance" decoding criterion based on perturbation analysis. The idea is to decode to a vector pair consisting of the defective items, x, and the error vector, η, such that the error-vector η is as "small" as possible. We call this algorithm the No-LiPo- algorithm (for Noisy LP). Using novel "perturbation analysis", certain structural properties of our LP, convexity arguments, and standard concentration results, we show that the solution to our LP decoding algorithm recovers the true defective items with high probability. Based on this analysis, we can directly derive the performance of two other LP-based decoding algorithms. In particular, LiPo considers the noiseless measurement scenario.

Furthermore, we demonstrate that each of these three types of algorithms achieves near-optimal performance, in the sense that the sample complexity of our algorithms matches information-theoretic lower bounds within a constant factor, where the constant is independent of the number of items n and the defective set size d (and, at the cost of additional slackness, can even be made independent of the noise parameter q and the error probability ε). Our sample-complexity bounds here are expressly calculated, and all of the constants involved are made precise.

"Small-error" probability ε: Existing work for the non-adaptive group testing problem has considered both deterministic and random pooling designs [4]. In this context both deterministic and probabilistic sample complexity bounds for the number of measurements T that lead to exact identification of the defective items, with high probability (e.g., [5], [17]) or with probability 1 (e.g., [18], [19]), have been derived. These sample complexity bounds in the literature are often asymptotic in nature, and describe the scaling of the number of measurements T with respect to the number of items n and the number of defectives d, so as to ensure that the error probability approaches zero. To gain new insights into the constants involved in the sample-complexity bounds we admit a small but fixed error probability, ε.³

² We wish to highlight the difference between noise and errors. We use the former term to refer to noise in the outcomes of the group-test, regardless of the group-testing algorithm used. The latter term is used to refer to the error due to the estimation process of the group-testing algorithm. Our models allow for noise to occur in measurements, and also allow for (arbitrarily small) probability of error; indeed, the latter is a design parameter of our algorithms.

³ As shown in [18], [20], [21], the number of measurements required to guarantee zero-error reconstruction even in the noiseless non-adaptive group-testing setting scales as Θ(d² log(n)/log(d)). Hence our relaxation to the "small-error" setting, where it is known in the literature that a number of tests that scales as O(d log(n)) suffices.
With this perspective we can compare upper and lower bounds for computationally efficient algorithms that hold not only in an order-wise sense, but also where the constants involved in these order-wise bounds can be made explicit, for all values of d and n.

Explicit Sample Complexity Bounds: Our sample complexity bounds are of the form T ≥ β(q, ε) d log(n). The function β(q, ε) is an explicitly computed function of the noise parameter q and the admissible error probability ε. In the literature, order-optimal upper and lower bounds on the number of tests required are known for the problems we consider (for instance [5], [22]). In both the noiseless and noisy variants, the number of measurements required to identify the set of defective items is known to be T = Θ(d log(n)); here n = |N| is the total number of items and d = |D| is the size of the defective subset. In fact, if only D, an upper bound on d, is known, then T = Θ(D log(n)) measurements are also known to be necessary and sufficient. In most of our algorithms we explicitly demonstrate that we require only a knowledge of D rather than the exact value of d (in the LP-based algorithms, for ease of exposition we first consider the case when d is known, and then demonstrate in the algorithm No-LiPo- that this knowledge is not needed). Furthermore, in the noisy variant, we show that the number of tests required is in general a constant factor larger than in the noiseless case (where this constant β is independent of both n and d, but may depend on the noise parameter q and the allowable error-probability ε of the algorithm; at the cost of additional slackness in the constant even these dependencies can be removed).⁴

A. Our Contributions

We summarize our contributions in this work as follows:

• Our contribution for the CoCo algorithm is that the analysis draws a simple but interesting connection to the classical coupon-collector problem. As far as we are aware this connection is novel, and hence interesting in its own right.

• Our contribution for the CoMa algorithms is an explicit analysis of the sample complexity, in comparison to the previous literature for such algorithms, which typically focused on order-optimal analysis. Some elegant work in the Russian literature also has information-theoretically optimal analysis for high-complexity decoding for related algorithms [10], [17], and good bounds [14], [15] for lower-complexity variants (called the "Separate Testing of Inputs" algorithms) of these CoMa algorithms. However, the types of statistical decoding rules used therein are structurally different from the ones we analyze, which can be implemented as dot-products of vectors of real numbers, and hence are particularly computationally efficient in modern computer architectures.

• Our contribution for our suite of novel⁵ LP algorithms is a novel perturbation analysis demonstrating that they are indeed order-optimal in sample-complexity. We also demonstrate the versatility of these algorithms, and of our techniques for analyzing them, by demonstrating that there are multiple LP relaxations for the underlying non-linear group-testing problem, each with interesting properties. For instance, we demonstrate via the No-LiPo- decoding algorithm that the negative test outcomes alone carry enough information to reconstruct (with high probability) the input x with order-optimal sample-complexity, even in the presence of measurement noise. In the scenario with no noise, the LiPo decoding algorithm takes a particularly simple form, where it reduces to solving just a feasibility problem. These different relaxations might be of interest in different scenarios. In addition, our analytical techniques directly generalize to other noise models, such as asymmetric noise and activation noise.

• The upper bounds we present differ from information-theoretic lower bounds by a constant factor that is independent of problem parameters.

• Also, many of the outer bounds we present are in the main universal. That is, most of the classes of algorithms we consider (except for those based on Linear Programming; universal versions of these algorithms are a subject of ongoing research) do not need to know the exact value of d: as long as an outer bound D is known, this suffices.⁶

• We consciously draw connections between the compressive sensing and the group-testing literature, to show that despite the differences in models (linear versus non-linear measurements, real-valued versus boolean measurements, etc.) algorithms that work for one setting may well work in another. Such a connection has indeed been noted before; for instance, [27] used group-testing ideas to construct compressive sensing algorithms. Also, [23]-[25] were motivated by CS-type LP-decoding algorithms to consider corresponding decoding algorithms for the group-testing problem.

This paper is organized as follows. In Section II, we introduce the model and corresponding notation (frequently used notation is summarized in Table I). In separate subsections of this section we give background on some compressive sensing algorithms, describe the algorithms analyzed in this work, and discuss the relationships between these algorithms for these two disparate problems. We also reprise known information-theoretic lower bounds on the number of tests required by any group-testing algorithm (with short proof sketches in the Appendix). In Section III, we describe the main results of this work, and Section IV contains the analysis of the group-testing algorithms considered.

⁴ In fact, while our analysis can be done for all values of d and n, in situations where preserving such generality makes the bounds excessively unwieldy we sometimes use the fact that D = o(n), or that d = ω(1).

⁵ In the noiseless setting prior work by [23], [24] had considered a similar LP approach (though no detailed analysis was presented). For the scenario with noisy measurements, in parallel with the conference version of our work presented at [2], a similar algorithm was considered in [25]; however, the proof sketch presented therein requires disjunct measurement matrices, which are known not to be order-optimal in sample-complexity in the "small-error" case [9], [10].

⁶ Prior work, for instance by [26], had also considered group testing that was similarly universal in D.

II. BACKGROUND

A. Model and Notation

A set N contains n items, of which an unknown subset D are said to be "defective". The size of D is denoted by d. We assume that D, an upper bound on the true value of d, is known a priori.⁷ The goal of group-testing is to correctly identify the set of defective items via a minimal number of "group tests", as defined below (see Fig. 1 for a graphical representation).

Each row of a T × n binary group-testing matrix M corresponds to a distinct test, and each column corresponds to a distinct item. Hence the items that comprise the group being tested in the ith test are exactly those corresponding to columns containing a 1 in the ith location. The method of generating such a matrix M is part of the design of the group test; this and the other part, that of estimating the set D, is described in Section II-C.

The length-n binary input vector x represents the set N, and contains 1s exactly in the locations corresponding to the items of D. The locations with ones are said to be defective; the other locations are said to be non-defective. We use these terms interchangeably.

The outcomes of the noiseless tests correspond to the length-T binary noiseless result vector y, with a 1 in the ith location if and only if the ith test contains at least one defective item.

The observed vector of test outcomes in the noisy scenario is denoted by the length-T binary noisy result vector ŷ; the probability that each entry y_i of y differs from the corresponding entry ŷ_i of ŷ is q, where q is the noise parameter. The locations where the noiseless and the noisy result vectors differ are denoted by the length-T binary noise vector ν, with 1s in the locations where they differ. Tests (if any) in which the noise perturbs a negative to a positive outcome are called false positives, and tests (if any) in which the noise perturbs a positive to a negative outcome are called false negatives.

Two alternate noise models are also briefly considered in Theorems 7 and 8. Firstly, the asymmetric bit-flip model generalizes the BSC(q) model described in the previous paragraph by allowing the probabilities of false positive test outcomes and false negative test outcomes (denoted respectively by q_0 and q_1) to be different. Secondly, in the activation noise model individual defective items in tests have a non-activation probability u. That is, defective items in a test have an i.i.d. probability of behaving like non-defectives in that test. This leads to false negative test outcomes. In addition, in the activation noise model we also allow false positive test outcomes to occur in an i.i.d. manner (with probability q_0). This is a generalization of the "usual" activation noise model (for instance [5]).

The estimate of the locations of the defective items is encoded in the length-n binary estimate vector x̂, with 1s in the locations where the group-testing algorithms described in Section II-C estimate the defective items to be. Items (if any) in which non-defective items are (incorrectly) decoded as defective items are called false defective estimates, and items (if any) in which defective items are (incorrectly) decoded as non-defective items are called false non-defective estimates. The probability of error of any group-testing algorithm is defined as the probability (over the input vector x and noise vector ν) that the estimated vector differs from the input vector at all.⁸

Fig. 1. An example demonstrating a typical non-adaptive group-testing setup. The T × n binary group-testing matrix represents the items being tested in each test; the length-n binary input vector x is a weight-d vector encoding the locations of the d defective items in D; the length-T binary vector y denotes the outcomes of the group tests in the absence of noise; the length-T binary noisy result vector ŷ denotes the actually observed noisy outcomes of the group tests, as the result of the noiseless result vector being perturbed by the length-T binary noise vector ν. The length-n binary estimate vector x̂ represents the estimated locations of the defective items.

⁷ It is common (see for example [28]) to assume that the number d of defective items in D is known, or at least that a good upper bound D on d is known a priori. If not, other work (for example [29]) considers non-adaptive algorithms with low query complexity that help estimate d. However, in this work we explicitly consider algorithms that do not require such foreknowledge of d; rather, our algorithms have "good" performance with O(D log(n)) measurements.

⁸ However, as is common in information-theoretic proofs, we instead calculate the probability of error for any fixed x averaged over the randomness in the noise vector ν and over the choice of measurement matrices M (which are chosen from a random ensemble). Standard averaging arguments then allow us to transition to the definition of probability of error stated above. Since this step is common to each of our proofs, we do not explicitly state it henceforth.
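As an illustration of the model just described, the following sketch (ours, not part of the original paper; all parameter values are arbitrary) generates a Bernoulli(p) group-testing matrix, a weight-d input vector, and the noiseless and noisy result vectors under the BSC(q) noise model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T, q = 1000, 10, 400, 0.05       # illustrative values only
p = 1.0 / d                            # group-selection parameter

# T x n binary group-testing matrix M: each entry is 1 w.p. p, i.i.d.
M = rng.random((T, n)) < p

# Length-n input vector x with d defective locations.
x = np.zeros(n, dtype=bool)
x[rng.choice(n, size=d, replace=False)] = True

# Noiseless result vector y: test i is positive iff it pools a defective.
y = np.any(M & x, axis=1)

# Noise vector nu flips each outcome i.i.d. w.p. q; y_hat is observed.
nu = rng.random(T) < q
y_hat = y ^ nu
```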

TABLE I
NOTATION USED FREQUENTLY IN THE PAPER. THE ASYMMETRIC NOISE AND ACTIVATION NOISE MODELS ARE ONLY BRIEFLY CONSIDERED IN THEOREMS 7 AND 8. OF THE LAST FIVE ITEMS, THE LAST FOUR ARE INTERNAL VARIABLES SPECIFICALLY IN OUR LP-BASED ALGORITHMS, AND THE OTHER (g) IS AN INTERNAL VARIABLE SPECIFICALLY IN OUR COUPON-COLLECTOR MODEL. ALL OTHER NOTATION IS "UNIVERSAL" (APPLIES TO THE ENTIRETY OF THE PAPER).

B. Compressive Sensing

Compressive sensing has received significant attention over the last decade. We describe the version most related to the topic of this paper [30], [31]. This version considers the following problem. Let x be an exactly d-sparse vector in R^n, i.e., a vector with at most d non-zero components (in general, in the situations of interest, d = o(n)).⁹

Let z correspond to a noise vector added to the measurement Mx. One is given a set of "compressed noisy measurements" of x as

  y = Mx + z,   (1)
  ||z||_2 ≤ c_2,   (2)
  ||x||_0 ≤ d.   (3)

Here the constraint (2) corresponds to a guarantee that the noise is not "too large", and the (non-linear) constraint (3) corresponds to the prior knowledge that x is d-sparse. The T × n matrix M is designed by choosing each entry i.i.d. from a suitable probability distribution (for instance, the set of zero-mean, 1/n-variance Gaussian random variables). The decoder must use the resulting noisy measurement vector y ∈ R^T and the matrix M to computationally efficiently estimate the underlying vector x. The challenge is to do so with as few measurements as possible, i.e., with the number of rows T of M being as small as possible.

We now outline two well-studied decoding algorithms for this problem, each of which motivated one class of algorithms we analyze in this work.

1) Orthogonal Matching Pursuit: We note that it is enough for the decoder to computationally efficiently estimate the support D of x (the set of indices on which x is non-zero) correctly. This is because the decoder can then estimate x as (M_D^t M_D)^{-1} M_D^t y, which is the minimum mean-square error estimate of x. (Here M_D equals the T × d sub-matrix of M whose column numbers correspond to the indices in D, and T is a design parameter chosen to optimize performance.)

One popular method of efficient estimation of D is that of Orthogonal Matching Pursuit (OMP) [32].¹⁰ The intuition is that if the columns of the matrix M are "almost orthogonal" (every pair of columns has a "relatively small" dot-product) then decoding can proceed in a greedy manner. In particular, the OMP algorithm computes the dot-product between y and each column m_i of M, and declares D to be the set of d indices for which this dot-product has largest absolute value. One can show [32] that there exists a universal constant c_3 such that if z = 0 then with probability at least 1 − c_3/d (over the choice of M, which is assumed to be independent of the vector x) this procedure correctly reconstructs x with T ≤ c_3 d log(n) measurements. Similar results can also be shown with z ≠ 0, though the form of the result is more intricate.

2) Basis Pursuit: An alternate decoding procedure proceeds by relaxing the compressive sensing problem (in particular the non-linear constraint (3)) into the convex optimization problem called Basis Pursuit (BP):

  x* = arg min ||x||_1   (4)
  subject to ||y − Mx||_2 ≤ c_2.   (5)

It can be shown (for instance [30], [31]) that there exist constants c_4, c_5 and c_6 such that with T = c_4 d log(n), with probability at least 1 − 2e^{−c_5 n}, the solution x* to BP satisfies ||x* − x||_2 ≤ c_6 ||z||_2.

⁹ As opposed to an approximately d-sparse vector, i.e., a vector such that "most" of its energy is confined to d indices of x. One way of characterizing such vectors is to say that ||x − x_d||_1 ≤ c_1 ||x||_1 for some suitably small 0 < c_1 < 1. Here x_d is defined as the vector matching x on the d components that are largest in absolute value, and zero elsewhere. The results of [30], [31] also apply in this more general setting. However, in this work we are primarily concerned with the problem of group testing rather than that of compressive sensing, and present the work on compressive sensing merely by way of analogy. Hence we focus on the simplest scenarios in which we can draw appropriate correspondences between the two problems.

¹⁰ In fact, there are many techniques that have a similar "column matching" flavour; see, for instance, [33]-[35]. We focus on OMP just for its simplicity.
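The single-pass selection rule described above is easy to state in code. The sketch below is our paraphrase of that rule (ranking the dot-products and least-squares-fitting on the selected support); it is not the full iterative OMP of [32]:

```python
import numpy as np

def omp_support(M: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Declare the support to be the d columns of M whose dot-product
    with y has the largest absolute value."""
    scores = np.abs(M.T @ y)
    return np.sort(np.argsort(scores)[-d:])

def support_ls_estimate(M: np.ndarray, y: np.ndarray,
                        support: np.ndarray) -> np.ndarray:
    """Least-squares estimate of x restricted to the estimated support,
    i.e. (M_D^t M_D)^{-1} M_D^t y in the notation of the text."""
    x_hat = np.zeros(M.shape[1])
    sol, *_ = np.linalg.lstsq(M[:, support], y, rcond=None)
    x_hat[support] = sol
    return x_hat
```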

C. Group-Testing Algorithms in This Work

We now describe three classes of algorithms (in the latter two cases, in both the noiseless and noisy settings). The algorithms are specified by the choices of encoding matrices and decoding algorithms. Their performance is stated in Section III and the corresponding proofs are presented in Section IV.

1) Coupon Collector Algorithm (CoCo): We first consider an algorithm that considers rows of the measurement matrix M, and tries to correlate these with the observations y.

The T × n group-testing matrix M is defined as follows. A group sampling parameter g is chosen (the exact values of T and g are code-design parameters to be specified later). Then, the ith row of M is specified by sampling with replacement¹¹ from the set [1,...,n] exactly g times, and setting the (i, j) location to be one if j is sampled at least once during this process, and zero otherwise.¹²

The decoding algorithm proceeds by using only the tests which have a negative outcome to identify all the non-defective items, and declaring all other items to be defective. If M is chosen to have enough rows (tests), each non-defective item should, with significant probability, appear in at least one negative test, and hence will be appropriately accounted for. Errors (false defective estimates) occur when at least one non-defective item is not tested, or occurs only in positive tests (i.e., every test it occurs in has at least one defective item). The analysis of this type of algorithm comprises estimating the trade-off between the number of tests and the probability of error.

More formally, for all tests i whose measurement outcome y_i is a zero, let m_i denote the corresponding ith row of M. The decoder outputs x̂ as the length-n binary vector which has 0s in exactly those locations where there is a 1 in at least one such m_i. An example of this algorithm is given in Fig. 2.

Fig. 2. An example demonstrating the CoCo algorithm. Based only on the outcome of the negative tests (those with output zero), the decoder estimates the set of non-defective items, and "guesses" that the remaining items are defective.

¹¹ Sampling without replacement is a more natural way to build tests, but the analysis is trickier. However, the performance of such a group-testing scheme can be shown to be no worse than the one analyzed here [36]. Also see Footnote 12.

¹² Note that this process of sampling each item in each test with replacement results in a slightly different distribution than if the group-size of each test were fixed a priori and hence the sampling were "without replacement" in each test. (For instance, in the process we define, each test may, with some probability, test fewer than g items.) The primary advantage of analyzing the "with replacement" sampling is that in the resulting group-testing matrix every entry is then chosen i.i.d..
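A minimal sketch of the CoCo decoder just described (our illustration; M and y are boolean arrays as in the model of Section II-A):

```python
import numpy as np

def coco_decode(M: np.ndarray, y: np.ndarray) -> np.ndarray:
    """CoCo decoding: any item that appears in at least one negative test
    is declared non-defective; all remaining items are declared defective.
    M is the T x n boolean testing matrix, y the boolean result vector."""
    cleared = M[~y].any(axis=0)   # items appearing in some negative test
    return ~cleared               # x_hat: defective iff never cleared
```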
2) "Column Matching" Algorithms: We next consider algorithms that try to correlate columns of the measurement matrix M with the vector of observations ŷ. We consider two scenarios: the first when the observations are noiseless, and the second when they are noisy. In both cases we draw the analogy with a corresponding compressive sensing algorithm.

Column Matching algorithm (CoMa): The T × n group-testing matrix M is defined as follows. A group selection parameter p is chosen (the exact values of T and p are code-design parameters to be specified later). Then, i.i.d. for each (i, j), m_{i,j} (the (i, j)th element of M) is set to be one with probability p, and zero otherwise.

The decoding algorithm works "column-wise" on M; it attempts to match the columns of M with the result vector y. That is, if a particular column j of M has the property that all locations i where it has ones also correspond to ones (y_i) in the result vector, then the jth item (x_j) is declared to be defective. All other items are declared to be non-defective.¹³ Note that this decoding algorithm never outputs false non-defective estimates, only false defective estimates. A false defective estimate occurs when all locations with ones in the jth column of M (corresponding to a non-defective item j) are "hidden" by the ones of other columns corresponding to defective items. That is, let column j and some other columns j_1,...,j_k of matrix M be such that for each i with m_{i,j} = 1, there exists an index j' in {j_1,...,j_k} for which m_{i,j'} also equals 1. Then, if each of the {j_1,...,j_k}th items is defective, the jth item will also always be declared defective by the CoMa decoder, regardless of whether or not it actually is. The probability of this event happening becomes smaller as the number of tests T becomes larger.

The rough correspondence between this algorithm and Orthogonal Matching Pursuit ([32]) arises from the fact that, as in Orthogonal Matching Pursuit, the decoder attempts to match the columns of the group-testing matrix with the result vector. An example of this algorithm is given in Fig. 3.

Fig. 3. An example demonstrating the CoMa algorithm. The algorithm matches columns of M to the result vector. As in (b) in the figure, since the result vector "contains" the 7th column, the decoder declares that item to be defective. Conversely, as in (c), since there is no such containment of the last column, the decoder declares that item to be non-defective. However, sometimes, as in (a), an item that is truly non-defective is "hidden" by some other columns corresponding to defective items, leading to a false defective estimate.

¹³ Note the similarity between this algorithm and OMP. Just as OMP tries to select columns of the measurement matrix that have "small angle" (large dot-product) with the vector of observations, similarly this algorithm tries to detect columns of M with maximal "overlap" with the result vector y. Indeed, even though the measurement process in the group-testing problem is fundamentally non-linear, in current computer architectures where linear algebraic operations often have low computational complexities, a computationally efficient means of implementing such a test is to check whether the dot-product between y and each column of M equals the number of ones in that column. Hence the name CoMa.
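Correspondingly, a sketch of the CoMa decoder, implemented via the dot-product test mentioned in Footnote 13 (our illustration):

```python
import numpy as np

def coma_decode(M: np.ndarray, y: np.ndarray) -> np.ndarray:
    """CoMa decoding: declare item j defective iff every test containing
    item j is positive, i.e. <y, m_j> equals the weight of column m_j."""
    overlap = y.astype(int) @ M.astype(int)   # <y, m_j> for each column j
    weight = M.sum(axis=0)                    # number of ones per column
    return overlap == weight
```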

Noisy Column Matching algorithm (No-CoMa): In the No-CoMa algorithm, we relax the sharp-threshold requirement in the original CoMa algorithm that the set of locations of ones in any column of M corresponding to a defective item be entirely contained in the set of locations of ones in the result vector. Instead, we allow for a certain number of "mismatches"; this number of mismatches depends on both the number of ones in each column, and also on the noise parameter q.

Let p and τ be design parameters to be specified later. To generate the M for the No-CoMa algorithm, each element of M is selected i.i.d. with probability p to be 1.

The decoder proceeds as follows. For each column j, we define the indicator set T_j as the set of indices i in that column where m_{i,j} = 1. We also define the matching set S_j as the set of indices i where both ŷ_i = 1 (corresponding to the noisy result vector) and m_{i,j} = 1. Then the decoder uses the following "relaxed" thresholding rule: if |S_j| ≥ |T_j|(1 − q(1 + τ)), the decoder declares the jth item to be defective; else it declares it to be non-defective.¹⁴

Fig. 4. An example demonstrating the No-CoMa algorithm. The algorithm matches columns of M to the result vector up to a certain number of mismatches governed by a threshold. In this example, the threshold is set so that the number of mismatches must be less than the number of matches. For instance, in (b) above, the 1s in the third column of the matrix match the 1s in the result vector in two locations (the 5th and 7th rows), but fail to match in only one location (the 4th row); locations wherein there are 0s in the matrix columns but 1s in the result vector do not count as mismatches. Hence the decoder declares that item to be defective, which is the correct decision. However, consider the false non-defective estimate generated for the item in (c). This corresponds to the 7th item. The noise in the 2nd, 3rd and 4th rows of ν means that there is only one match (in the 7th row) and two mismatches (2nd and 4th rows); hence the decoder declares that item to be non-defective. Also, sometimes, as in (a), an item that is truly non-defective has a sufficient number of measurement errors that the number of mismatches is reduced to below the threshold, leading to a false defective estimate.

¹⁴ As in Footnote 13, this "relaxed" thresholding rule can be viewed as analogous to corresponding relaxations in OMP-type algorithms when the observations are noisy.
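In code, the relaxed rule reads as follows (a sketch under the same conventions as before; q and tau are the noise and threshold parameters):

```python
import numpy as np

def no_coma_decode(M: np.ndarray, y_hat: np.ndarray,
                   q: float, tau: float) -> np.ndarray:
    """No-CoMa decoding: declare item j defective iff
    |S_j| >= |T_j| * (1 - q*(1 + tau)), where |T_j| is the number of tests
    containing item j and |S_j| the number of those observed positive."""
    T_sizes = M.sum(axis=0)                       # |T_j| for each column j
    S_sizes = y_hat.astype(int) @ M.astype(int)   # |S_j| for each column j
    return S_sizes >= T_sizes * (1.0 - q * (1.0 + tau))
```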
3) LP-Decoding Algorithms: We now consider LP-decoding algorithms in both the noiseless and noisy settings. The algorithms are specified by the choices of encoding matrices and decoding algorithms. The T × n group-testing matrix M is defined by randomly selecting each entry in it in an i.i.d. manner to equal 1 with probability p = 1/D, and 0 otherwise.

Noisy Linear Programming decoding (No-LiPo-): A linear relaxation of the group-testing problem with noisy measurements leads "naturally" to the linear program No-LiPo- given in (6)-(10) in Fig. 5. In particular, each x_j is relaxed to satisfy 0 ≤ x_j ≤ 1, and the non-linear measurements are linearized in (7). Also, we define "slack" variables η_i for all i ∈ {1,...,T} to account for errors in the test outcomes. For a particular test i, this η_i is defined to be zero if the test result is correct, and positive (and at least 1) if the test result is incorrect. Of course, the decoder does not know a priori which scenario a particular test outcome falls under, and hence has to also decode η.

Fig. 5. No-LiPo-. Constraint (9) relaxes the constraint that each x_j ∈ {0, 1}, and constraint (8) indicates that there are exactly d̄ defective items in the d̄th iteration of the LP. The variables η_i are "slack variables" in the equations (7); for instance, if test i is truly negative, then all the variables in an equation of the form (7) are zero. However, if the test is a false negative, then the variable η_i is set to equal the number of defective items in test i. Note that η_i is bounded above by d in the case of (false) negatives. Note also that this linear program only uses negative test outcomes. In principle, one could also have defined constraints corresponding to positive test outcomes. However, we are unable to analyze the performance of the resulting LP (though simulations indicate that such an LP does indeed perform well).
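The LP (6)-(10) itself appears only in Fig. 5, which is not reproduced here, so the following sketch encodes our reading of the caption's description: slack variables η_i for the negative tests, the objective Σ η_i, the weight constraint Σ_j x_j = d̄, and box constraints 0 ≤ x_j ≤ 1. Because each η_i simply equals Σ_j m_{ij} x̄_j on a negative test, the slacks can be eliminated, leaving an LP in x alone. The rounding step at the end is one simple choice of ours, not necessarily the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def no_lipo_decode(M: np.ndarray, y_hat: np.ndarray, d: int) -> np.ndarray:
    """Sketch of No-LiPo- decoding (LP form inferred from the Fig. 5
    caption). Minimizes the total slack over negative tests, i.e.
    sum over {i : y_hat_i = 0} of sum_j m_ij * x_j, subject to
    sum_j x_j = d and 0 <= x_j <= 1."""
    c = M[~y_hat].astype(float).sum(axis=0)   # objective coefficients
    A_eq = np.ones((1, M.shape[1]))           # sum_j x_j = d
    res = linprog(c, A_eq=A_eq, b_eq=[float(d)],
                  bounds=(0.0, 1.0), method="highs")
    x_hat = np.zeros(M.shape[1], dtype=bool)  # round: keep d largest x_j
    x_hat[np.argsort(res.x)[-d:]] = True
    return x_hat
```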

Nonetheless, as is common in the field of error-correction [16], using a "minimum distance" decoding criterion (decoding to a vector pair (x, η) such that the error-vector η is as "small" as possible) often leads to good decoding performance. Our LP decoder attempts to do so. To be precise, the No-LiPo- decoder outputs the (x̂, η̂) that minimize the LP given in (6)-(10). Note that we assume that the value of d is known precisely in this algorithm.

Linear Programming decoding (LiPo): LiPo, which addresses the scenario with noiseless measurements, is a special case of No-LiPo- with each η_i variable set to zero. Hence it reduces to the problem of finding any feasible point in the constraint set (11)-(13) in Fig. 6.

Fig. 6. LiPo. This LP simply attempts to find any feasible solution for any value of d̄ ∈ {0,...,D}.

D. Lower Bounds on the Number of Tests Required

We first reprise statements of information-theoretic lower bounds (versions of which were considered by [9], [10]) on the number of tests required by any group-testing algorithm. For the sake of completeness, and so that we can benchmark our analysis of the algorithms we present later, we state these lower bounds here and prove them in the Appendix. (Here l_D denotes ln(D)/ln(n), as also defined in Section III.)

Theorem 1 ([9], [10]): Any group-testing algorithm with noiseless measurements that has a probability of error of at most n^{−δ} requires at least (1 − n^{−δ}) D log(n^{1−l_D}) = (1 − n^{−δ})(1 − l_D) D log(n) tests.

In fact, the corresponding lower bounds can be extended to the scenario with noisy measurements as well.

Theorem 2 ([9], [10]): Any group-testing algorithm that has measurements that are noisy i.i.d. with probability q and that has a probability of error of at most n^{−δ} requires at least (1 − n^{−δ}) D log(n^{1−l_D}) / (1 − H(q)) = (1 − n^{−δ})(1 − l_D) D log(n) / (1 − H(q)) tests.¹⁵

Note: Our assumption that D = o(n) implies that the bounds in Theorems 1 and 2 are Ω(D log(n)).

III. MAIN RESULTS

All logarithms in this work are assumed to be binary, except where explicitly identified otherwise (in some cases we explicitly denote the natural logarithm as ln(·)). We define l_D as ln(D)/ln(n).

A. Upper Bounds on the Number of Tests Required

The main contributions of this work are explicit computations of the number of tests required to give a desired probability of error via computationally efficient algorithms. Some of these algorithms are essentially new (the LP-based decoding algorithms in Theorems 6-9), though versions of these algorithms were stated with minimal analysis in [23]-[25]. Others have tighter analysis than previously available in the literature (CoMa and No-CoMa), or novel analysis (the coupon-collector-based analysis for CoCo, and the perturbation analysis for the LP-based algorithms). In each case, to the best of our knowledge ours is the first work to explicitly compute the tradeoff between the number of tests required and the desired probability of error, rather than giving order-of-magnitude estimates of the number of tests required for a "reasonable" probability of success.

First, the CoCo algorithm has a novel connection to the Coupon Collector's problem.

Theorem 3: CoCo with error probability at most n^{−δ} requires no more than 2eD(1 + δ) ln(n) tests.

Next, we consider the "column matching" algorithms.

Theorem 4: CoMa with error probability at most n^{−δ} requires no more than eD(1 + δ) ln(n) tests.

Translating CoMa into the noisy observation scenario is non-trivial. A more careful analysis for the thresholded scheme in No-CoMa leads to the following result. We define γ as (l_D + δ)/(1 + δ) (recall that l_D was defined as ln(D)/ln(n), and note that l_D lies in the interval [0, 1) and γ in the interval (δ/(δ + 1), 1]). Let the internal parameters τ and p for the remaining algorithms be defined as

  τ = (1 − 2q) / (4q(1 + γ^{−1/2}))  and  p = 1/D.

Theorem 5: No-CoMa with error probability at most n^{−δ} requires no more than

  [16(1 + √γ)²(1 + δ) ln 2 / ((1 − e^{−2})(1 − 2q)²)] D log n

tests.

Finally, we consider our LP-based algorithms. The main LP-based algorithm, whose analysis dominates the latter half of this work, is No-LiPo-, which considers the same noisy-measurement scenario that No-CoMa considers.¹⁶

Theorem 6: No-LiPo- with error probability at most n^{−δ} requires no more than β_LP D log n tests, with β_LP defined as

  (δ + 1 + l_D) ln(2) · 2e^{2w} (w + (1 − 2q)/3) / (1 − 2q)²,  with w = 1 + (1 − 2q)/D.

¹⁵ Here H(·) denotes the binary entropy function.

¹⁶ The analysis of the constants in the LP-based algorithms is not optimized (doing so is analytically very cumbersome); the constants are given to demonstrate the functional dependence on δ and q.
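To make these explicit constants concrete, the following snippet (our illustration, with arbitrary parameter values, using the bounds of Theorems 2-6 as reconstructed above) evaluates the promised test counts side by side:

```python
import numpy as np

n, D, delta, q = 10**6, 100, 1.0, 0.05

ell_D = np.log(D) / np.log(n)
gamma = (ell_D + delta) / (1 + delta)
w = 1 + (1 - 2 * q) / D

T_coco = 2 * np.e * D * (1 + delta) * np.log(n)            # Theorem 3
T_coma = np.e * D * (1 + delta) * np.log(n)                # Theorem 4
beta_nocoma = (16 * (1 + np.sqrt(gamma))**2 * (1 + delta) * np.log(2)
               / ((1 - np.exp(-2)) * (1 - 2 * q)**2))
T_nocoma = beta_nocoma * D * np.log2(n)                    # Theorem 5
beta_lp = ((delta + 1 + ell_D) * np.log(2) * 2 * np.exp(2 * w)
           * (w + (1 - 2 * q) / 3) / (1 - 2 * q)**2)
T_nolipo = beta_lp * D * np.log2(n)                        # Theorem 6

H = -q * np.log2(q) - (1 - q) * np.log2(1 - q)             # binary entropy
T_lower = (1 - n**-delta) * (1 - ell_D) * D * np.log2(n) / (1 - H)  # Thm 2

for name, t in [("CoCo", T_coco), ("CoMa", T_coma),
                ("No-CoMa", T_nocoma), ("No-LiPo-", T_nolipo),
                ("lower bound", T_lower)]:
    print(f"{name:>12}: {t:,.0f} tests")
```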

In fact, this algorithm is robust to other types of measurement noise as well. For instance, in the asymmetric noise model, we can make the following statement.

Theorem 7: In the asymmetric noise model, No-LiPo- with error probability at most n^{−δ} requires no more than β_ASY-LP D log n tests, with β_ASY-LP defined as

  (δ + 1 + l_D) ln(2) · 2e^{2w} (w + (1 − q_0 − q_1)/3) / (1 − q_0 − q_1)²,  with w = 1 + (1 − q_0 − q_1)/D.

Similarly, in the activation noise model, we can make the following statement.

Theorem 8: In the activation noise model, No-LiPo- with error probability at most n^{−δ} requires no more than β_ACT-LP D log n tests, with β_ACT-LP defined as

  (δ + 1 + l_D) · 2 ln(2) e^{2w} (3w(1 + u − q_0) − (1 − u − q_0)) / (3(1 − u − q_0)²),  with w = 1 + (1 − u − q_0)/D.

For the same noiseless observation scenario as CoCo and CoMa, the LP-based decoding algorithm LiPo (whose performance is implied by the stronger analysis, for the noisy observation scenario, of No-LiPo- in Theorem 6) has the following performance guarantees.

Theorem 9: LiPo with error probability at most n^{−δ} requires no more than β D log n tests, with β set as

  (δ + 1 + l_D) ln(2) · 2e^{2w} (w + 1/3),  with w = 1 + 1/D.

Note: Our achievability schemes in Theorems 4-9 are commensurate (equal up to a constant factor) with the lower bounds in Theorems 1 and 2. For instance, the bound on the number of tests in Theorem 5 differs from the corresponding lower bound in Theorem 2 by a factor that is at most 12.83(1 + √γ)²(1 + δ)(1 − 2q)^{−2}, which is a function only of δ (which is any positive number for a polynomially decaying probability of error), γ (which depends on the scaling between D and n, and lies between δ/(δ + 1) and 1), and q. However, taking the Taylor series expansion of 1 − H(q) about 1/2, for all q ∈ (0, 1/2), 1 − H(q) is always between (1 − 2q)² and (1 − 2q)²/(2 ln(2)), taking those extreme values for extreme values of q ∈ (0, 1/2). Hence for all values of q, there is a finite (and explicitly computable) constant-factor gap between our information-theoretic lower bounds and the upper bounds achieved by our algorithms.

IV. PROOF OF THE PERFORMANCE OF ALGORITHMS IN THEOREMS 3-9

A. Coupon Collector Algorithm

We first consider CoCo, whose analysis is based on a novel use of the Coupon Collector Problem [11].

Proof of Theorem 3: The Coupon Collector's Problem (CCP) is a classical problem that considers the following scenario. Suppose there are n types of coupons, each of which is equiprobable. A collector selects coupons (with replacement) until he has a coupon of each type. What is the distribution of his stopping time? It is well-known ([11]) that the expected stopping time is n ln n + Θ(n). Also, reasonable bounds on the tail of the distribution are known; for instance, it is known that the probability that the stopping time is more than χ n ln n is at most n^{−χ+1}.

Analogously to the above, we view the group-testing procedure of CoCo as a Coupon Collector Problem. Consider the following thought experiment. Suppose we consider any test as a length-g test-vector¹⁷ whose entries index the items being tested in that test (repeated entries are allowed in this vector, hence there might be fewer than g distinct items in this vector). Due to the design of our group-testing procedure in CoCo, the probability that any item occurs in any location of such a vector is uniform and independent. In fact this property (uniformity and independence of the value of each entry of each test) also holds across tests. Hence, the items in any subsequence of k tests may be viewed as the outcome of a process of selecting a single chain of gk coupons. This is still true even if we restrict ourselves solely to the tests that have a negative outcome. The goal of CoCo may now be viewed as the task of collecting all the non-defective items. This can be summarized in the following equation:

  Tg((n − d)/n)^g ≥ (n − d) ln(n − d).   (14)

The left-hand side of Equation (14) is the expected number of (possibly repeated) items in negative tests (since there are a total of T tests, each containing g (possibly repeated) items, and the probability of a test being negative equals ((n − d)/n)^g). The right-hand side of (14) is the expected stopping-time of the underlying CCP. We thus optimize (14) w.r.t. g to obtain an optimal value of g equaling 1/ln(n/(n − d)). However, since the exact value of d is not known, but only the upper bound D, we set g to equal 1/ln(n/(n − D)). Taking the appropriate limit of n going to infinity, and noting D = o(n), enables us to determine that, in expectation over the testing process and the location of the defective items, (14) implies that T ≥ eD ln n.

However, (14) only holds in expectation. For us to design a testing procedure for which we can demonstrate that the probability of error decays to zero as n^{−δ}, we need to modify (14) to obtain the corresponding tail bound on T. This takes a bit more work.

The right-hand side is modified to χ(n − d) ln(n − d); this corresponds to the event that not all types of coupons have been collected after χ(n − d) ln(n − d) total coupons have been collected, an event of probability at most (n − d)^{−χ+1}. The left-hand side is multiplied by (1 − ρ), where ρ is a design parameter to be specified, by the Chernoff bound on the probability that the actual number of items in the negative tests is smaller than (1 − ρ) times the expected number; by the Chernoff bound this probability is at most exp(−ρ²T((n − d)/n)^g). Taking the union bound over these two low-probability events gives us that the probability that

  (1 − ρ)Tg((n − d)/n)^g ≥ χ(n − d) ln(n − d)   (15)

does not hold is at most

  exp(−ρ²T((n − d)/n)^g) + (n − d)^{−χ+1}.   (16)

So, we again optimize for g in (14) and substitute g* = 1/ln(n/(n − D)) into (15). We note that since both D and d are o(n), ((n − d)/n)^{g*} converges to e^{−1}. Hence we have, for large n,

  T ≥ χ(n − d) ln(n − d) / ((1 − ρ) g* ((n − d)/n)^{g*}) ≈ χ(n − d) ln(n − d) ln(n/(n − D)) / ((1 − ρ) e^{−1}).   (17)

Using the inequality ln(1 + x) ≥ x − x²/2 with x as D/(n − D) simplifies the RHS of (17) to

  T ≥ (χ/(1 − ρ)) e (D − D²/(2(n − d))) ln(n − d).   (18)

Choosing T to be greater than the bound in (18) can only reduce the probability of error; hence choosing

  T ≥ (χ/(1 − ρ)) eD ln(n − d)

still implies a probability of error at most as large as in (16). Choosing ρ = 1/2, noting that D ≥ d, and substituting (18) into (16) implies, for large enough d, that the probability of error P_e satisfies

  P_e ≤ e^{−(ρ²χ/(1−ρ)) d ln(n−d)} + (n − d)^{−χ+1} = (n − d)^{−ρ²χd/(1−ρ)} + (n − d)^{−χ+1} ≤ 2(n − d)^{−χ+1}.

Taking 2(n − d)^{−χ+1} = n^{−δ}, we have χ = δ log n / log(n − d) + log 2 / log(n − d) + 1. For large n, χ approaches δ + 1. Therefore the probability of error is at most n^{−δ}, and for sufficiently large n the following number of tests suffices to satisfy the probability of error condition stated in the theorem:

  T ≥ 2(1 + δ) eD ln n.

¹⁷ Note that this test-vector is different from the binary length-n vectors that specify tests in the group-testing matrix, though there is indeed a natural bijection between them.

δ2χ Proof of Theorem 5: − d ln(n−d) −χ+1 Pe ≤ e 1−δ + (n − d) Due to the presence of noise leading to both false positive 2 − δ χd −χ+ and false negative test outcomes, both false defective estimates = (n − d) 1−δ + (n − d) 1 and false non-defective estimates may occur in the No-CoMa −χ+1 ≤ 2(n − d) . algorithm Ð the overall probability of error is the sum of the −χ+ −δ log n probability of false defective estimates and that of false non- Taking 2(n − d) 1 = n ,wehaveχ = δ + = log(n − d) defective estimates. As in the previous algorithm, we set p 1 1/D and T = β D log n. We first calculate the probability + 1. For large n, χ approaches δ + 1. log(n − d) of false non-defective estimates by computing the probability Therefore, the probability of error is at most n−δ, with that more than the expected number of ones get flipped to zero sufficiently large n, the following number of tests suffice to in the result vector in locations corresponding to ones in the satisfy the probability of error condition stated in the theorem. column indexing the defective item. This can be computed as ≥ ( + δ) . d T 2 1 eDln n − = |T |= |S | < |T |( − ( + τ)) Pe P j t Pr j j 1 q 1  i=1 T T − ≤ d pt (1 − p)T t (22) B. Column Matching Algorithms t t=0 We next consider Column Matching algorithms. The CoMa t t − and No-CoMa algorithms respectively deal with the noiseless qr (1 − q)t r r and noisy observation scenarios. r=t−t(1−q(1+τ)) Proof of Theorem 4: T T − − ( τ)2 As noted in the discussion on CoMa in Section II-C, the ≤ d pt (1 − p)T t e 2t q (23) error-events for the algorithm correspond to false positives, = t t 0 when a column of M corresponding to a non-defective item 2 T = − + −2(qτ) is “hidden” by other columns corresponding to defective d 1 p pe (24) items. To calculate this probability, recall that each entry of β D log n 1 1 −2(qτ)2 M equals one with probability p,i.i.d.Let j index a column = d 1 − + e (25) D D of M corresponding to a non-defective item, and let j1,..., jd −2(qτ)2 index the columns of M corresponding to defective items. ≤ d exp −β log n 1 − e (26) Then the probability that mi, j equals one, and at least one of ≤ −β ( − −2)( τ)2 ,..., ( − ( − )d ) d exp log n 1 e q (27) mi, j1 mi, jd also equals one is p 1 1 p . Hence the probability that the jth column is hidden by a col- Here, as in Section II-C, T j denotes the locations of ones T umn corresponding to a defective item is 1 − p(1 − p)d . in the jth column of M. Inequality (22) follows from the CHAN et al.: NON-ADAPTIVE GROUP TESTING 3029 union bound over the possible errors for each of the defective in a similar manner to that of false non-defective estimates as items, with the first summation accounting for the different in (22)Ð(27). T possible sizes of j , and the second summation accounting n−d + = |T |= |S |≥|T |( − ( +τ)) for the error events corresponding to the number of one-to- Pe P j t P j j 1 q 1 zero flips exceeding the threshold chosen by the algorithm. i=1 Inequality (23) follows from the Chernoff bound. Equality (24) T T − comes from the binomial theorem. Equality (25) comes from ≤ (n − d) pt (1 − p)T t t substituting in the values of p and T . Inequality (26) follows t=0 from the leading terms of the Taylor series of the exponential t t − function. Inequality (27) follows from bounding the concave ar (1 − a)t r − − r function 1 − e 2x by the linear function (1 − e 2)x for x > 0. 
r=t(1−q(1+τ)) T The requirement that the probability of false non-defective ≤ ( − ) − + −2((1−q(1+τ))−a)2 P− n−δ β− n d 1 p pe (33) estimates e to be at most implies that (the bound β T on due to this restriction) satisfies ≤ ( − ) − + −2((1−2q)/4−qτ)2 n d 1 p pe (34) −2 2 ln d exp −β(1 − e )(qτ) log n < −δ ln n − (( − )/ − τ)2 ≤ (n − d) exp −(1 − e 2 1 2q 4 q )β log n (35) β( − −2)( τ)2 ⇒ − 1 e q < −δ ln d ln n ln n ≤ ( − ) −β ( − −2) (( − )/ − τ)2) ln 2 n d exp log n 1 e 1 2q 4 q (36) ln d + δ ln 2 ⇒ β− > ln n Note that for the Chernoff bound to be applicable in (33), − (28) (1 − e 2)(qτ)2 1 − q(1 + τ) > a, which implies that τ<(1 − 2q)/(4q). We now focus on the probability of false defective estimates. Equation (34) follows from substituting the bound derived on = In the noiseless CoMa algorithm, the only way a false a in (32) into (33), and (35) follows by substituting p / defective estimate could occur was if all the ones in a column 1 D into the previous equation. Inequality (36) follows from − −2x are hidden by ones in columns corresponding to defective bounding the concave function 1 e by the linear function ( − −2) > items. In the No-CoMa algorithm this still happens, but in 1 e x for x 0. addition noise could also lead to a similar masking effect. The requirement that the probability of false defective + −δ β+ That is, even in the 1 locations of a column corresponding to estimates Pe be at most n implies that (the bound on β due to this restriction) be at least a non-defective not hidden by other columns corresponding to defective items, measurement noise may flip enough zeroes ln(n − d) + δ ln 2 to ones so that the decoding threshold is exceeded, and the + ln n β > . (37) decoder hence incorrectly declares that particular item to be (1 − e−2)((1 − 2q)/4 − qτ)2 defective. See Fig. 4(a) for an example. Note that β must be at least as large as max{β−,β+} so Hence we define a new quantity a, which denotes the that both (28) and (37) are satisfied. ( , ) probability for any i j th location in M that a 1 in that When the threshold in the No-CoMa algorithm is high location is “hidden by other columns or by noise”. It equals (i.e., τ is small) then the probability of false negatives τ a = 1 −[(1 − q)(1 − p)d + q(1 − (1 − p)d )] increases; conversely, the threshold being low ( being large) increases the probability of false defective estimates. Alge- 1 d = 1 − q − 1 − (1 − 2q) (29) braically, this expresses as the condition that τ>0(elsethe D probability of false non-defective estimates is significant), and conversely to the condition that 1 − q(1 + τ) > a (so that the We set D ≥ 2 (the case D = 1 can be handled separately by Chernoff bound can be used in (33)) Ð combined with (32) the same analysis, but setting p = 1/(D + 1) = 1/2rather this implies that τ ≤ (1 − 2q)/4q. Each of (28) and (37) as a than 1/D = 1), and note that by definition d ≤ D. function of τ is a reciprocal of a parabola, with a pole at the We then bound a from above as − corresponding extremal value of τ.Furthermore,β is strictly + 1 d increasing and β is strictly decreasing in the region of valid max a = max 1 − q − 1 − (1 − 2q) τ ( ,( − )/( )) D≥2,d≤D D≥2,d≤D D in 0 1 2q 4q . Hence the corresponding curves on the right-hand sides of (28) and (37) intersect within the region of d 1 valid τ, and a good choice for β is at the τ where these two = 1 − q−(1−2q) min 1 − (30) D≥2,d≤D D curves intersect. To find this β, we make another simplifying substitution. 
Let $\gamma$ be defined as^18

$$\gamma = \frac{\ln D + \delta\ln n}{\ln(n-d) + \delta\ln n}, \quad (38)$$

and let $\Lambda = \ln D/\ln n$. Hence for large $n$, $\gamma$ essentially equals

$$\frac{\Lambda+\delta}{1+\delta}. \quad (39)$$

(Note that since $\Lambda$ lies in the interval $[0,1)$, $\gamma$ lies in the interval $[\delta/(\delta+1),1)$.)

Footnote 18: Note that we replace $d$ with $D$ in (28) and (37), since this still leaves the inequalities true but converts $\beta^-$ and $\beta^+$ into quantities that are independent of $d$. This is necessary since $\beta^-$ and $\beta^+$ are (internal) code design parameters, and as such must be independent of the true value of $d$ (which might be unknown) and may depend only on the upper bound $D$.

Then equating the right-hand sides of (28) and (37) implies that the optimal $\tau^*$ satisfies

$$\frac{\ln 2}{(1-e^{-2})\left((1-2q)/4-q\tau^*\right)^2} = \frac{\gamma\ln 2}{(1-e^{-2})(q\tau^*)^2}. \quad (40)$$

Simplifying (40) gives us that

$$\tau^* = \frac{1-2q}{4q\left(1+\gamma^{-1/2}\right)}. \quad (41)$$

Substituting these values of $\gamma$ and $\tau^*$ into (28) gives us the explicit bound, for large $n$,

$$\beta^* = \frac{16\left(1+\gamma^{-1/2}\right)^2(\Lambda+\delta)\ln 2}{(1-e^{-2})(1-2q)^2}. \quad (42)$$

Using (39) to simplify (42) gives

$$\beta^* = \frac{16(1+\sqrt{\gamma})^2(1+\delta)\ln 2}{(1-e^{-2})(1-2q)^2} \approx \frac{12.83(1+\sqrt{\gamma})^2(1+\delta)}{(1-2q)^2}.$$
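The constants above are directly computable. The sketch below, with illustrative parameter values of our choosing, evaluates (38), (41) and (42) to obtain the threshold $\tau^*$, the constant $\beta^*$, and the resulting number of tests $T = \beta^* D\log_2 n$.

```python
import numpy as np

def no_coma_parameters(n, D, q, delta):
    """Evaluate (38), (41), (42): gamma, the optimal threshold tau*, the
    explicit constant beta*, and the test count T = beta* D log2(n)."""
    gamma = (np.log(D) + delta * np.log(n)) / (np.log(n - D) + delta * np.log(n))
    tau_star = (1 - 2 * q) / (4 * q * (1 + gamma ** -0.5))
    Lam = np.log(D) / np.log(n)                      # Lambda = ln D / ln n
    beta_star = (16 * (1 + gamma ** -0.5) ** 2 * (Lam + delta) * np.log(2)
                 / ((1 - np.exp(-2)) * (1 - 2 * q) ** 2))
    return gamma, tau_star, beta_star, beta_star * D * np.log2(n)

# Illustrative values (ours): n = 10^6 items, number of defectives at most D = 100.
print(no_coma_parameters(n=10**6, D=100, q=0.05, delta=2))
```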
C. LP-Decoding Algorithms

For the LP-based algorithms, we first prove Theorem 6, and then derive Theorems 7, 8 and 9 as direct corollaries.

Proof of Theorem 6:

Proof sketch: For the purpose of exposition we break the main ideas into four steps below.

(1) Finite Set of Perturbation Vectors: For the known-$d$ case we define a finite set $\Phi$ (that depends on the true $x$) containing so-called "perturbation vectors".^19 We demonstrate in Claim 10 that any $\bar{x}$ in the feasible set of the constraint set of No-LiPo can be written as the true $x$ plus a non-negative linear combination of perturbation vectors from this set. The physical intuition behind the proof is that the vectors from $\Phi$ correspond to a "mass-conserving" perturbation of $x$. The property of non-negativity of the linear combinations arises from a physical argument demonstrating that there is a path from $x$ to any point in the feasible set using these perturbations, over which one never has to "back-track". The linear combination property is important, since it enables us to characterize the directions in which a vector can be perturbed from $x$ to another vector satisfying the constraints of No-LiPo in a "finite" manner (instead of having to consider the uncountably infinite number of directions in which $x$ could be perturbed). The non-negativity of the linear combination is also crucial since, as we explain below, this property ensures that the objective function of the LP can only increase when $x$ is perturbed in a convex combination of the directions in $\Phi$.

Footnote 19: This set is defined just for the purpose of this proof; the encoder/decoder do not need to know this set.

(2) Expected Perturbation Cost: The heart of our argument lies in Claims 11 and 12, where we characterize via an exhaustive case-analysis the expected change (over randomness in the matrix $M$ and noise $\nu$) in the value of each slack variable $\eta_i$ when $x$ is perturbed to some $x'$ by a vector in $\Phi$. In particular, we demonstrate that for each such individual perturbation vector, the expected change in the value of each slack variable $\eta_i$ is strictly positive with high probability.^20 The actual proof follows from a case-analysis similar to the one performed in the example in Table II.

Footnote 20: Actually, for each fixed feasible $x$, due to equality (7) there may be a range of $\eta_i$ variables such that $(x,\eta)$ are simultaneously feasible in No-LiPo. Here and in the rest of this paper we abuse notation by using $\eta_i$ to denote specifically those variables that meet all constraints in No-LiPo with equality, and hence correspond to the smallest possible value of the objective function of No-LiPo for that fixed $x$.

(3) Concentration and Union Bounding: Next, in Claim 14, with slightly careful use of standard concentration inequalities (specifically, we need both the additive and multiplicative forms of the Chernoff bound, reprised in Claim 13), we show that the probability distributions derived in Claim 12 are sufficiently concentrated. We then take the union bound over all vectors in $\Phi$ (in fact, there are a total of $d(n-d)$ such vectors in $\Phi$) and show that, with sufficiently high probability, the expected change in the value of the objective function (which equals the weighted sum of the changes in the values of the slack variables $\eta_i$) for each perturbation vector in $\Phi$ is also strictly positive.

(4) Generalization Based on Convexity: We finally note in Claim 14 that the set of feasible $(\bar{x},\eta)$ of No-LiPo for a fixed value of $d$ forms a convex set. Hence if the value of the LP objective function strictly increases along every direction in $\Phi$, then in fact the value of the LP objective function strictly increases when the true $x$ is perturbed in any direction (since, as noted before, any vector in the feasible set can be written as $x$ plus a non-negative linear combination of vectors in $\Phi$). Hence the true $x$ must be the solution to No-LiPo. This completes our proof.

Proof details: We now proceed by proving the sequence of claims that, when strung together, formalize the above argument.

Without loss of generality, let $x$ be the vector with $1$s in the first $d$ locations and $0$s in the last $n-d$ locations.^21 Choose $\Phi = \{\phi_k\}_{k=1}^{d(n-d)}$ as the set of $d(n-d)$ vectors, each with a single $-1$ in the support of $x$, a single $1$ outside the support of $x$, and zeroes everywhere else. For instance, the first $\phi$ in the set equals $(-1,0,\ldots,0,1,0,\ldots,0)$, where the $1$ is in the $(d+1)$th location.

Footnote 21: As can be verified, our analysis is agnostic to the actual choice of $x$, as long as it is a vector in $\{0,1\}^n$ of weight any $d \le D$.

Then Claim 10 below gives a "nice" characterization of the set of $\bar{x}$ in the feasible set of No-LiPo.

Claim 10: Any vector $\bar{x}$ with the same weight as $x$ (i.e., with $\bar{d} = d$) that satisfies the constraints (7)-(10) in NCBP-LP can be written as

$$\bar{x} = x + \sum_{k=1}^{d}\sum_{k'=d+1}^{n} c_{k,k'}\,\phi_{k,k'}, \quad\text{with all } c_{k,k'} \ge 0. \quad (43)$$

Proof of Claim 10: The intuition behind this claim is "conservation of mass", so to speak. A good analogy is the following physical process. Imagine that $x$ corresponds to $n$ bottles of water, each with capacity one litre, with the first $d$ bottles full and the remaining empty. Imagine $\bar{x}$ as another state of these bottles with $\bar{d} = d$ litres of water. For each bottle $j$ among the first $d$ bottles that has more water remaining than the corresponding bottle in the final state $\bar{x}$, we use its water to increase the water level of bottles among the last $n-d$ bottles (taking care not to overshoot, i.e., not to put more than the desired water level $\bar{x}_i$ in any such bottle). The fact that this is doable follows from conservation of mass. This corresponds to using non-negative linear combinations of perturbation vectors from the set $\Phi$ (non-negativity arises from the fact that we took care not to overshoot).

In fact, amongst the various ways to do this, the following specific choice of $c_{k,k'}$ also works. Define $C$ as $\sum_{k'=d+1}^{n}\bar{x}_{k'}$, i.e., the amount of water transferred to the empty bottles. We then set $c_{k,k'} = (1-\bar{x}_k)\bar{x}_{k'}/C$. That is, first consider the amount of water transferred out of the $k$th bottle; this equals $(x_k-\bar{x}_k)$, which equals $1-\bar{x}_k$ since $x_k = 1$ by assumption. We then apportion this water in proportion to the water required in each of the $k'$th bottles.
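The specific coefficients in the proof of Claim 10 can be checked mechanically. The following sketch, with an illustrative $\bar{x}$ of our choosing, verifies that $c_{k,k'} = (1-\bar{x}_k)\bar{x}_{k'}/C$ is non-negative and reconstructs $\bar{x}$ exactly.

```python
import numpy as np

# Check the coefficients c_{k,k'} = (1 - xbar_k) * xbar_{k'} / C from the
# proof of Claim 10: they are non-negative and x + sum c_{k,k'} phi_{k,k'}
# reconstructs xbar. The vector xbar below is an illustrative feasible point
# of the same weight as x (it is ours, not from the paper).
n, d = 6, 3
x = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
xbar = np.array([0.5, 1.0, 0.8, 0.3, 0.2, 0.2])    # same weight: both sum to 3
C = xbar[d:].sum()                                  # mass moved into empty "bottles"
recon = x.copy()
for k in range(d):                                  # phi_{k,k'}: -1 at k, +1 at k'
    for kp in range(d, n):
        c = (1 - xbar[k]) * xbar[kp] / C
        assert c >= 0
        recon[k] -= c
        recon[kp] += c
print(np.allclose(recon, xbar))                     # True
```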
Next, Claim 12 below computes the expected change in the value of the slack variable $\eta_i$ as $x$ is perturbed by $\phi$. A small example in Table II (with $n = 3$, $d = 2$) demonstrates the calculations in the proof of Claim 12 explicitly.

For any fixed $x \in \{0,1\}^n$ of weight $d \le D$, let $x' = x + \phi$. Over the randomness in $m_i$ and the noise $\nu_i$ generating the testing outcome $\hat{y}_i = y_i + \nu_i$, we define the cost perturbation random variables

$$\Delta_{0,i} = \left(\eta_i(x') - \eta_i(x)\right), \text{ conditioned on } \hat{y}_i = 0. \quad (44)$$

Claim 11: The cost perturbation random variables defined in (44) all take values only in $\{-1,0,1\}$.

Proof of Claim 11: We analyze the case when $\hat{y}_i = 0$. By (7), $\eta_i(x) = m_i\cdot x$. Hence $\Delta_{0,i} = \eta_i(x') - \eta_i(x) = m_i\cdot(x'-x) = m_i\cdot\phi$. But $\phi$ has exactly one component equaling $-1$ and one equaling $1$, and by definition $m_i$ is a $0/1$ vector. Hence $\Delta_{0,i}$ takes values only in $\{-1,0,1\}$.

The next claim forms the heart of our proof. It does an exhaustive^22 case analysis that computes the probabilities that the cost perturbation random variables take the values $1$ or $-1$ (the case that they equal zero can be derived from these calculations in a straightforward manner too, but since these values turn out not to matter for our analysis, we omit the details).

Footnote 22: And exhausting!

We define the objective value perturbation $\Delta^T$ as

$$\Delta^T = \sum_{i=1}^{T}\left[\mathbb{1}(\Delta_{0,i}=1) - \mathbb{1}(\Delta_{0,i}=-1)\right], \quad (45)$$

and the number of perturbed noise variables $(\#\Delta)^T$ as

$$(\#\Delta)^T = \sum_{i=1}^{T}\left[\mathbb{1}(\Delta_{0,i}=1) + \mathbb{1}(\Delta_{0,i}=-1)\right]. \quad (46)$$

Here $\mathbb{1}(\cdot)$ denotes the indicator function of an event.

Claim 12:

$$P(\Delta_{0,i}=1) = p(1-p)\left[(1-2q)(1-p)^{d-1}+q\right], \qquad P(\Delta_{0,i}=-1) = p(1-p)q. \quad (47)$$

Hence the expected value of the objective value perturbation $\Delta^T$ equals $Tp(1-p)^d(1-2q)$, and the expected value of the number of perturbed variables $(\#\Delta)^T$ equals $Tp\left[(1-p)^d(1-2q)+2(1-p)q\right]$.

Proof of Claim 12: The proof follows from a case-analysis similar to the one performed in the example in Table II. The reader is strongly encouraged to read that example before looking at the following case analysis, which can appear quite intricate.

Equation (47) analyzes $\Delta_{0,i}$, the change in $\eta_i$ when $x$ is perturbed by a vector from $\Phi$ and $\hat{y}_i = 0$. By (7), $\eta_i(x) = m_i\cdot x$, hence $\Delta_{0,i} = \eta_i(x') - \eta_i(x) = m_i\cdot\phi$.

We first analyze the case when $\Delta_{0,i} = -1$. This occurs only when $m_i$ equals $1$ in the location where $\phi = -1$ (hence there is a non-zero intersection with the support of $x$, and so $y_i = 1$), and further $m_i$ equals $0$ in the location where $\phi = 1$. Thus only two indices of $m_i$ matter for this scenario, which occurs with probability $q(1-p)p$.

Analogously, the only scenarios in which $\Delta_{0,i} = 1$ occur when $m_i$ equals $1$ in the location where $\phi = 1$ and $m_i$ equals $0$ in the location where $\phi = -1$. This can happen in two ways. It could be that the support of $m_i$ is entirely outside the support of $x$ (hence $y_i = 0$), and further $m_i$ equals $1$ in the location where $\phi = 1$ (this happens with probability $(1-q)(1-p)^d p$). Or it could be that $m_i$ equals $0$ where $\phi = -1$, $m_i$ equals $1$ in at least one of the other $d-1$ locations in the support of $x$ (which ensures that $y_i = 1$), and further $m_i$ equals $1$ where $\phi = 1$ (the probability of such a scenario is $q(1-p)(1-(1-p)^{d-1})p$). Adding together the probabilities corresponding to these two scenarios gives the desired result.

Substituting the four terms obtained in (47) into (45) and (46) gives the required result for $\Delta^T$ and $(\#\Delta)^T$.
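Claim 12 is easy to validate empirically. The sketch below, with illustrative parameters of our choosing, estimates the two probabilities in (47) (jointly with the event $\hat{y}_i = 0$) by Monte Carlo simulation of a single random row $m_i$ and BSC($q$) noise, and compares them with the closed forms.

```python
import numpy as np

# Monte Carlo check of (47): empirical frequencies of Delta_{0,i} = +1 / -1
# (jointly with the event y_hat_i = 0) against the closed forms of Claim 12.
# All parameter values are illustrative.
rng = np.random.default_rng(1)
n, d, p, q, trials = 10, 4, 0.25, 0.1, 200_000
x = np.zeros(n); x[:d] = 1.0
phi = np.zeros(n); phi[0], phi[d] = -1.0, 1.0       # one perturbation vector
plus = minus = 0
for _ in range(trials):
    m = (rng.random(n) < p).astype(float)           # row m_i of M
    y = float(m @ x > 0)                            # noiseless outcome
    y_hat = y if rng.random() >= q else 1.0 - y     # BSC(q)
    if y_hat == 0:                                  # the conditioning in (44)
        delta = m @ phi                             # eta_i(x') - eta_i(x), by (7)
        plus += delta == 1
        minus += delta == -1
print(plus / trials, p * (1 - p) * ((1 - 2 * q) * (1 - p) ** (d - 1) + q))
print(minus / trials, p * (1 - p) * q)
```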
We now recall Bernstein's inequality, a classical concentration inequality that we use to make a statement about the concentration of the objective function value of No-LiPo about its expectation.

Claim 13 (Bernstein's inequality [11]): Let $\{W_i\}_{i=1}^{T}$ be a sequence of independent zero-mean random variables, and suppose $|W_i| \le w$ for all $i$. Then for all negative $\sigma$,

$$P\left(\sum_{i=1}^{T}W_i < \sigma\right) \le \exp\left(-\frac{\sigma^2/2}{\sum_{i=1}^{T}\mathbb{E}W_i^2 - w\sigma/3}\right). \quad (48)$$
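As a quick numeric illustration of Claim 13, the following sketch compares the empirical left tail of $\sum_i W_i$ with the right-hand side of (48), for an illustrative bounded zero-mean distribution of our choosing.

```python
import numpy as np

# Numeric illustration of Claim 13: the empirical lower tail of sum(W_i) for
# a bounded zero-mean distribution stays below the bound (48).
rng = np.random.default_rng(2)
T, w, trials = 200, 1.0, 20_000
# W_i in {-1, 0, +1} with P(+1) = P(-1) = 0.1, so E[W_i] = 0, E[W_i^2] = 0.2.
W = rng.choice([-1.0, 0.0, 1.0], size=(trials, T), p=[0.1, 0.8, 0.1])
sums = W.sum(axis=1)
for sigma in (-20.0, -30.0, -40.0):
    empirical = (sums < sigma).mean()
    bernstein = np.exp(-(sigma ** 2 / 2) / (T * 0.2 - w * sigma / 3))
    print(sigma, empirical, bernstein)              # empirical <= bernstein
```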

TABLE II (table body omitted; caption follows). Suppose $x = (1,1,0)$. Choose some $x' \ne x$ (in this example, $x' = x + \phi$, where $\phi = (-1,0,1)$ is a perturbation vector). This example analyzes the expectation (over the randomness in the particular row $m_i$ of the measurement matrix $M$) of the difference in value of the corresponding slack variables $\eta_i(x)$ and $\eta_i(x')$ in Column 8(b). To compute these, we consider the columns of the table sequentially from left to right. Column 1 considers the two possible values of the observed outcome $\hat{y}_i$. Column 2 gives the corresponding values of the slack variables for the $i$th test, as returned by the constraints (7) of No-LiPo. Column 3 indexes the possibilities of the (noiseless) test outcomes $y_i$, and Column 4 enumerates possible values for $m_i$, the $i$th row of $M$, that could have generated the values of $y_i$ in Column 3, given that $x = (1,1,0)$. Column 5 computes the probability of a particular observation $\hat{y}_i$ and a row $m_i$, given that the noiseless output $y_i$ equaled a particular value. Column 6 computes the function in Column 2, given that $m_i$ equals the value given in Column 4. Columns 7 and 8(a) respectively explicitly compute the value of the function in Column 6 for the vectors $x$ and $x'$; the red entries in Column 8(a) index those locations where $\eta(x')$, the slack variable for the perturbed vector, is less than $\eta(x)$, and the green cells indicate those locations where the situation is reversed. Column 8(b) then computes the product of Column 5 with the difference of the entries in Column 7 from those of Column 8(a), i.e., the expected change in the value of the slack variable $\eta_i(\cdot)$. The value $(1-2q)(1-p)^2 p$ in blue at the bottom represents the expected change, averaged over all possible tuples $(y_i, m_i, \hat{y}_i)$.
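Because $n = 3$ and $d = 2$, the Table II computation can be reproduced by exhaustive enumeration. The sketch below averages the change in the slack variable over all rows $m_i$ and noise realizations with $\hat{y}_i = 0$, using $\eta_i(x) = m_i\cdot x$ from (7), and recovers $(1-2q)(1-p)^2 p$; this equals the per-test expectation $p(1-p)^d(1-2q)$ of Claim 12, used again in (54) below.

```python
import numpy as np
from itertools import product

# Exhaustive version of the Table II computation (n = 3, d = 2): average the
# change in the slack variable eta_i over all rows m_i and noise realizations
# with y_hat_i = 0, where (7) gives eta_i(x) = m_i . x on such tests.
p, q = 0.3, 0.1
x = np.array([1, 1, 0]); x2 = np.array([0, 1, 1])   # x2 = x + phi, phi = (-1, 0, 1)
expected = 0.0
for m in product((0, 1), repeat=3):
    m = np.array(m)
    pm = np.prod(np.where(m == 1, p, 1 - p))        # P(row m_i = m)
    y = int(m @ x > 0)                              # noiseless outcome
    p_yhat0 = q if y == 1 else 1 - q                # P(y_hat_i = 0 | y_i)
    expected += pm * p_yhat0 * (m @ x2 - m @ x)     # change in eta_i
print(expected, p * (1 - p) ** 2 * (1 - 2 * q))     # both 0.1176
```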

Next, Claim 14 below demonstrates that if $x$ is perturbed in any direction from the set $\Phi$, then as long as it remains within the feasible set of No-LiPo, with high probability over $M$ and the noise vector $\nu$ the value of the objective function of No-LiPo increases.

Claim 14: Choose $T$ as $\beta_{LP}D\log n$, with $\beta_{LP}$ as given in Theorem 6. Then, except with probability at most $n^{-\delta}$, the objective function of No-LiPo has a unique optimum at $\hat{x} = x$.

Proof of Claim 14: Since each of the $m_i$ vectors and the noise vector $\nu$ are all chosen independently, the random variables corresponding to each type of cost perturbation variable in (44) are distributed i.i.d. according to (47). For each $i \in \{1,\ldots,T\}$ we set

$$W_i = \left(\eta_i(x')-\eta_i(x)\right) - \mathbb{E}\left(\eta_i(x')-\eta_i(x)\right), \quad (49)$$

hence each $W_i$ is zero-mean. The maximum value $w$ of $|W_i|$ can be bounded as

$$w = \left|\eta_i(x')-\eta_i(x)-\mathbb{E}\left(\eta_i(x')-\eta_i(x)\right)\right| \le \left|\eta_i(x')-\eta_i(x)\right| + \left|\mathbb{E}\left(\eta_i(x')-\eta_i(x)\right)\right|$$
$$= 1 + p(1-p)^d(1-2q) \quad (50)$$
$$< 1 + (1-2q)/D. \quad (51)$$

Here (50) follows from Claim 11, which demonstrates that $\eta_i(x')-\eta_i(x)$ takes values only in $\{-1,0,1\}$, and from the computation of $\mathbb{E}(\eta_i(x')-\eta_i(x))$ in the last part of Claim 12. Inequality (51) uses the definition of $p$ as $1/D$ and the bound $(1-p)^d < 1$.^23

Footnote 23: As always, the case of $D = 1$ is handled separately by choosing $p = 1/2$.

Next, we bound $\sum_{i=1}^{T}\mathbb{E}W_i^2$ from above by

$$\sum_{i=1}^{T}\mathbb{E}W_i^2 \le w^2\,\mathbb{E}(\#\Delta)^T = w^2\,Tp\left[(1-p)^d(1-2q)+2(1-p)q\right] \quad (52)$$
$$< w^2\beta_{LP}\log n. \quad (53)$$

Here (52) follows from Claim 12, and (53) by using $p = 1/D$, $T = \beta_{LP}D\log n$, and $(1-p)^d < 1$.

We now use Claim 13 to concentrate around the expected value given in Claim 12. In particular, we set

$$\sigma = -T\,\mathbb{E}\left(\eta_i(x')-\eta_i(x)\right) = -\beta_{LP}\log n\,(1-p)^d(1-2q), \quad (54)$$

which is negative, as required. Using $p = 1/D$, $T = \beta_{LP}D\log n$ and $1/e < (1-1/D)^d < 1$ gives us

$$\beta_{LP}\log n\,(1-2q)/e < |\sigma| < \beta_{LP}\log n\,(1-2q). \quad (55)$$

(For the numerator of the exponent in Claim 13 we use the lower bound, and for the denominator the upper bound.)

Substituting the value of $\sigma$ from (54) and $W_i$ from (49), noting that $\sigma$ is negative, and using Claim 13 together with the bounds on $w$, $\sum_i\mathbb{E}W_i^2$ and $\sigma$ from (51), (53) and (55) respectively, gives us

$$P\left(\sum_{i=1}^{T}\left(\eta_i(x')-\eta_i(x)\right) < 0\right) \quad (56)$$
$$= P\left(\sum_{i=1}^{T}\left[\left(\eta_i(x')-\eta_i(x)\right)-\mathbb{E}\left(\eta_i(x')-\eta_i(x)\right)\right] < \sigma\right) = P\left(\sum_{i=1}^{T}W_i < \sigma\right)$$
$$\le \exp\left(-\frac{\sigma^2/2}{\sum_{i=1}^{T}\mathbb{E}W_i^2 - w\sigma/3}\right) < \exp\left(-\frac{\left(\beta_{LP}\log n\,(1-2q)/e\right)^2/2}{w^2\beta_{LP}\log n + w\beta_{LP}\log n\,(1-2q)/3}\right)$$
$$= \exp\left(-\frac{\beta_{LP}\log n\,(1-2q)^2}{2e^2 w\left(w+(1-2q)/3\right)}\right), \quad\text{with } w = 1+\frac{1-2q}{D}. \quad (57)$$

We now note that (57) gives an upper bound on the probability that a single perturbation vector in $\Phi$ causes a non-positive perturbation in the optimal value of the objective function of No-LiPo. But there are $d(n-d)$ vectors in $\Phi$. We take a union bound over all of these vectors by multiplying the terms in (57) by $d(n-d)$.
Hence the overall bound on the probability that any vector from the set $\Phi$ causes a non-positive perturbation in the optimal value of the objective function of No-LiPo is given by

$$d(n-d)\,n^{-\frac{\beta_{LP}(1-2q)^2}{\ln(2)\,2e^2 w\left(w+(1-2q)/3\right)}} < n^{1+\Lambda-\frac{\beta_{LP}(1-2q)^2}{\ln(2)\,2e^2 w\left(w+(1-2q)/3\right)}}. \quad (58)$$

The requirement that this probability be at most $n^{-\delta}$ then allows us to bound $\beta_{LP}$ as

$$\beta_{LP} \ge \frac{(\delta+1+\Lambda)\ln(2)\,2e^2 w\left(w+(1-2q)/3\right)}{(1-2q)^2}, \quad\text{with } w = 1+\frac{1-2q}{D}, \quad (59)$$

which completes the proof of Theorem 6.
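For readers who want to experiment with LP decoding, the sketch below sets up one plausible concrete instantiation. The constraint set (7)-(10) is not reproduced in this section, so the program below is an assumption on our part, modeled on the boolean-compressed-sensing LP relaxation of [25] and on the facts used above ($\eta_i(x) = m_i\cdot x$ on tests with $\hat{y}_i = 0$, per (7), and $\bar{x}$ of weight $d$ in the known-$d$ case): minimize $\sum_i\eta_i$ subject to $0 \le \bar{x} \le 1$, $\eta \ge 0$, $m_i\cdot\bar{x} = \eta_i$ on negative tests, $m_i\cdot\bar{x} + \eta_i \ge 1$ on positive tests, and $\sum_j\bar{x}_j = d$. Read it as a sketch, not as the paper's exact No-LiPo program.

```python
import numpy as np
from scipy.optimize import linprog

def lp_decode(M, y_hat, d):
    """Hypothetical LP decoder for the known-d case (see caveats in the text
    above): minimize sum(eta) over 0 <= xbar <= 1, eta >= 0, subject to
    m_i . xbar = eta_i on negative tests, m_i . xbar + eta_i >= 1 on positive
    tests, and sum(xbar) = d. Returns the indices of the d largest entries."""
    T, n = M.shape
    c = np.concatenate([np.zeros(n), np.ones(T)])        # objective: sum of slacks
    neg, pos = np.flatnonzero(y_hat == 0), np.flatnonzero(y_hat == 1)
    A_eq = np.zeros((len(neg) + 1, n + T))
    b_eq = np.zeros(len(neg) + 1)
    for r, i in enumerate(neg):                          # m_i . xbar - eta_i = 0
        A_eq[r, :n], A_eq[r, n + i] = M[i], -1.0
    A_eq[-1, :n], b_eq[-1] = 1.0, d                      # known weight: sum(xbar) = d
    A_ub = np.zeros((len(pos), n + T))
    b_ub = -np.ones(len(pos))
    for r, i in enumerate(pos):                          # -(m_i . xbar) - eta_i <= -1
        A_ub[r, :n], A_ub[r, n + i] = -M[i], -1.0
    bounds = [(0, 1)] * n + [(0, None)] * T
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return np.argsort(res.x[:n])[-d:]
```

Rounding the relaxed solution (here, by taking its $d$ largest coordinates) is one of several reasonable decision rules; the theorems concern the exact program analyzed above, not this particular sketch.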

Proof of Theorem 7: In the asymmetric noise model, with $q_0$ and $q_1$ denoting the probabilities of false positive and false negative test outcomes respectively, the proof is essentially the same as the proof for BSC($q$) noise; equations (47)-(59) change as follows. Equation (47) becomes

$$P(\Delta_{0,i}=1) = p(1-p)\left[(1-q_0-q_1)(1-p)^{d-1}+q_1\right], \qquad P(\Delta_{0,i}=-1) = p(1-p)q_1. \quad (60)$$

Hence the expected value of the objective value perturbation $\Delta^T$ equals $Tp(1-p)^d(1-q_0-q_1)$, and the expected value of the number of perturbed variables $(\#\Delta)^T$ equals $Tp\left[(1-p)^d(1-q_0-q_1)+2(1-p)q_1\right]$. These differences lead to changes in the parameters we use in Bernstein's inequality. Specifically, (51) is now bounded as

$$|W_i| = w \le \left|\eta_i(x')-\eta_i(x)\right|+\left|\mathbb{E}\left(\eta_i(x')-\eta_i(x)\right)\right| = 1+p(1-p)^d(1-q_0-q_1) < 1+\frac{1-q_0-q_1}{D}. \quad (61)$$

Next, we bound $\sum_{i=1}^{T}\mathbb{E}W_i^2$ from above by

$$\sum_{i=1}^{T}\mathbb{E}W_i^2 \le w^2\,\mathbb{E}(\#\Delta)^T = w^2\,Tp\left[(1-p)^d(1-q_0-q_1)+2(1-p)q_1\right] < w^2\beta_{ASY\text{-}LP}\log n.$$

Hence the overall bound on the probability that any vector from the set $\Phi$ causes a non-positive perturbation in the optimal value of the objective function of No-LiPo is given by

$$d(n-d)\,n^{-\frac{\beta_{ASY\text{-}LP}(1-q_0-q_1)^2}{\ln(2)\,2e^2w\left(w+(1-q_0-q_1)/3\right)}} < n^{1+\Lambda-\frac{\beta_{ASY\text{-}LP}(1-q_0-q_1)^2}{\ln(2)\,2e^2w\left(w+(1-q_0-q_1)/3\right)}}, \quad\text{with } w = 1+\frac{1-q_0-q_1}{D}, \quad (65)$$

where we recall $\Lambda$ is defined as $\ln(D)/\ln(n)$. The quantity in (65) bounds from above the probability that No-LiPo "behaves badly" (i.e., that some vector from $\Phi$ causes a non-positive perturbation in the optimal value of the objective function of No-LiPo). We can thus bound $\beta_{ASY\text{-}LP}$ as

$$\beta_{ASY\text{-}LP} \ge \frac{(\delta+1+\Lambda)\ln(2)\,2e^2w\left(w+(1-q_0-q_1)/3\right)}{(1-q_0-q_1)^2}, \quad\text{with } w = 1+\frac{1-q_0-q_1}{D}. \quad (66)$$

Proof of Theorem 8: Recall that in the activation noise model we denote by $u$ the probability of activation noise, i.e., the probability that a (single) defective item shows up as non-defective in a test. Correspondingly, the probability of observing a false negative test outcome, given that the test involves say $v$ defective items, is $u^v$. Also, we use $q_0$ to denote the probability of a false positive test outcome. In this noise model, No-LiPo decoding still achieves good performance: a "small" probability of error with an order-optimal number of measurements. Again, as in the asymmetric noise model, the proof is essentially the same as the proof for BSC($q$) noise. Equations (47)-(59) change as follows. Equation (47) becomes

$$P(\Delta_{0,i}=1) = p(1-p)^d(1-q_0) + p(1-p)\sum_{i=1}^{d-1}\binom{d-1}{i}(1-p)^{d-1-i}(pu)^i = p(1-p)\left[(1-p+pu)^{d-1}-(1-p)^{d-1}q_0\right],$$
$$P(\Delta_{0,i}=-1) = pu(1-p)\sum_{i=0}^{d-1}\binom{d-1}{i}(1-p)^{d-1-i}(pu)^i = p(1-p)(1-p+pu)^{d-1}u.$$

Hence the expected value of the objective value perturbation $\Delta^T$ equals

$$Tp(1-p)\left[(1-p+pu)^{d-1}(1-u)-(1-p)^{d-1}q_0\right],$$

and the expected value of the number of perturbed variables $(\#\Delta)^T$ equals

$$Tp(1-p)\left[(1-p+pu)^{d-1}(1+u)-(1-p)^{d-1}q_0\right].$$

Substituting into Bernstein's inequality, we can bound $\beta_{ACT\text{-}LP}$ as

$$\beta_{ACT\text{-}LP} \ge \frac{(\delta+1+\Lambda)\ln(2)\,2e^2w\left(3w(1+u-q_0)-(1-u-q_0)\right)}{3(1-u-q_0)^2}, \quad\text{with } w = 1+\frac{1-u-q_0}{D}. \quad (67)$$

Proof of Theorem 9: We substitute $q = 0$ in (59) to obtain the corresponding $\beta$ as

$$(\delta+1+\Lambda)\ln(2)\,2e^2w\left(w+1/3\right), \quad\text{with } w = 1+\frac{1}{D}, \quad (68)$$

where we recall $\Lambda$ is defined as $\ln(D)/\ln(n)$.
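The explicit constants in (66)-(68) can be evaluated numerically; the helper below does so for illustrative parameters of our choosing. The activation-noise constant follows the form given in (67) and is the least certain of the three, so treat its output as indicative.

```python
import numpy as np

def beta_lp(D, delta, n, q0=0.0, q1=0.0):
    """Evaluate (66); q0 = q1 = 0 recovers (68)."""
    Lam = np.log(D) / np.log(n)
    w = 1 + (1 - q0 - q1) / D
    return ((delta + 1 + Lam) * np.log(2) * 2 * np.e ** 2
            * w * (w + (1 - q0 - q1) / 3) / (1 - q0 - q1) ** 2)

def beta_act_lp(D, delta, n, q0, u):
    """Evaluate (67) as written above."""
    Lam = np.log(D) / np.log(n)
    w = 1 + (1 - u - q0) / D
    return ((delta + 1 + Lam) * np.log(2) * 2 * np.e ** 2
            * w * (3 * w * (1 + u - q0) - (1 - u - q0)) / (3 * (1 - u - q0) ** 2))

print(beta_lp(D=100, delta=2, n=10**6))                      # (68), noiseless
print(beta_lp(D=100, delta=2, n=10**6, q0=0.05, q1=0.05))    # (66)
print(beta_act_lp(D=100, delta=2, n=10**6, q0=0.05, u=0.1))  # (67)
```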
APPENDIX

Many "usual" proofs of lower bounds for group testing are combinatorial. Information-theoretic proofs are needed to incorporate the allowed probability of error $\epsilon$ into the lower bounds we provide. Such proofs were given in, for instance, [10]. For completeness, we reprise simple versions of these proofs here.

We begin by noting that $X \to Y \to \hat{Y} \to \hat{X}$ (i.e., the input vector, noiseless result vector, noisy result vector, and the estimate vector) forms a Markov chain. By standard information-theoretic definitions we have

$$H(X) = H(X|\hat{X}) + I(X;\hat{X}).$$

Since $X$ is uniformly distributed over all length-$n$, $D$-sparse data vectors (since $d$ could be as large as $D$), $H(X) = \log|\mathcal{X}| = \log\binom{n}{D}$. By Fano's inequality, $H(X|\hat{X}) \le 1 + \epsilon\log\binom{n}{D}$. Also, we have $I(X;\hat{X}) \le I(Y;\hat{Y})$ by the data-processing inequality. Finally, note that

$$I(Y;\hat{Y}) \le \sum_{i=1}^{T}\left(H(\hat{Y}_i) - H(\hat{Y}_i|Y_i)\right),$$

since the first term is maximized when each of the $\hat{Y}_i$ are independent, and because the measurement noise is memoryless. For the BSC($q$) noise we consider in this work, this summation is at most $T(1-H(q))$ by standard arguments.^24 Combining the above inequalities, we obtain

$$(1-\epsilon)\log\binom{n}{D} \le 1 + T(1-H(q)).$$

Also, by standard arguments via Stirling's approximation [37], $\log\binom{n}{D}$ is at least $D\log(n/D)$. Substituting this gives us the desired result:

$$T \ge \frac{1-\epsilon}{1-H(q)}\log\binom{n}{D} \ge \frac{1-\epsilon}{1-H(q)}\,D\log\frac{n}{D}.$$

Footnote 24: This technique also holds for more general types of discrete memoryless noise, such as the asymmetric noise model considered in Theorem 7; for ease of presentation, in this work we focus on the simple case of the binary symmetric channel.
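To see how this lower bound compares with the achievability results, the sketch below evaluates $T \ge \frac{1-\epsilon}{1-H(q)}D\log_2(n/D)$ alongside the No-CoMa sample complexity $\beta^* D\log_2 n$ from (42), for illustrative parameters of our choosing; the two differ only by an explicitly computable constant factor.

```python
import numpy as np

def fano_lower_bound(n, D, q, eps):
    """Appendix lower bound: T >= (1 - eps)/(1 - H(q)) * D * log2(n/D)."""
    Hq = 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)
    return (1 - eps) / (1 - Hq) * D * np.log2(n / D)

# Compare with the No-CoMa achievability T = beta* D log2(n) from (42).
# Parameter values are illustrative.
n, D, q, eps, delta = 10**6, 100, 0.05, 0.1, 2
gamma = (np.log(D) + delta * np.log(n)) / (np.log(n - D) + delta * np.log(n))
Lam = np.log(D) / np.log(n)
beta_star = (16 * (1 + gamma ** -0.5) ** 2 * (Lam + delta) * np.log(2)
             / ((1 - np.exp(-2)) * (1 - 2 * q) ** 2))
lower, upper = fano_lower_bound(n, D, q, eps), beta_star * D * np.log2(n)
print(lower, upper, upper / lower)   # the ratio is a constant independent of n
```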

ACKNOWLEDGMENT

The authors wish to acknowledge useful discussions with Pak Hou Che, who contributed to an early version of this work. We would also like to thank the anonymous reviewers, who suggested new results, pointed out related work, and in general significantly improved the quality of this work.

REFERENCES

[1] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, "Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2011, pp. 1832-1839.
[2] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, "Non-adaptive group testing: Explicit bounds and novel algorithms," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2012, pp. 1837-1841.
[3] R. Dorfman, "The detection of defective members of large populations," Ann. Math. Statist., vol. 14, no. 4, pp. 436-440, 1943.
[4] D.-Z. Du and F. K. Hwang, Combinatorial Group Testing and Its Applications, 2nd ed. Singapore: World Scientific, 2000.
[5] G. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880-1901, Mar. 2012.
[6] A. J. Macula, "Error-correcting nonadaptive group testing with d^e-disjunct matrices," Discrete Appl. Math., vol. 80, nos. 2-3, pp. 217-222, 1997.
[7] E. N. Gilbert, "A comparison of signalling alphabets," Bell Syst. Tech. J., vol. 31, no. 3, pp. 504-522, 1952.
[8] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, Jul. 1948.
[9] M. B. Malyutov, "Separating property of random matrices," Mat. Zametki, vol. 23, no. 1, pp. 155-167, 1978.
[10] A. D'yachkov, "Lectures on designing screening experiments," Ph.D. dissertation, Combinatorial Comput. Math. Center, Pohang Univ. Sci. Technol., Gyeongsangbuk-do, South Korea, Feb. 2004.
[11] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[12] H.-B. Chen and F. K. Hwang, "A survey on nonadaptive group testing algorithms through the angle of decoding," J. Combinat. Optim., vol. 15, no. 1, pp. 49-59, 2008.
[13] D.-Z. Du and F. K. Hwang, Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing. Singapore: World Scientific, 2006.
[14] M. B. Malyutov and P. S. Mateev, "Screening designs for non-symmetric response function," Mat. Zametki, vol. 27, no. 1, pp. 109-127, 1980.
[15] M. B. Malyutov and H. Sadaka, "Jaynes principle in testing active variables of linear model," Random Oper. Stochastic Equ., vol. 6, no. 4, pp. 311-330, 1998.
[16] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1977.
[17] A. D'yachkov and A. Rashad, "Universal decoding for random design of screening experiments," Microelectron. Rel., vol. 29, no. 6, pp. 965-971, 1989.
[18] A. G. D'yachkov and V. V. Rykov, "Bounds on the length of disjunctive codes," Problemy Peredachi Inf., vol. 18, no. 3, pp. 7-13, 1982.
[19] P. L. Erdős, P. Frankl, and Z. Füredi, "Families of finite sets in which no set is covered by the union of two others," J. Combinat. Theory Ser. A, vol. 33, no. 1, pp. 158-166, 1982.
[20] A. G. D'yachkov, V. V. Rykov, and A. M. Rashad, "Superimposed distance codes," Problems Control Inf. Theory, vol. 18, no. 4, pp. 237-250, 1989.
[21] Z. Füredi, "On r-cover-free families," J. Combinat. Theory Ser. A, vol. 73, no. 1, pp. 172-173, Jan. 1996.
[22] D. Sejdinovic and O. Johnson, "Note on noisy group testing: Asymptotic bounds and belief propagation reconstruction," in Proc. 48th Annu. Allerton Conf. Commun., Control, Comput., Oct. 2010, pp. 998-1003.
[23] M. B. Malyutov and H. Sadaka, "Capacity of screening under linear programming analysis," in Proc. 6th Int. Workshop Simul., 2009, pp. 1041-1045.
[24] M. B. Malyutov, "Recovery of sparse active inputs in general systems: A review," in Proc. Int. Conf. Comput. Technol. Electr. Electron. Eng., vol. 1, Jul. 2010, pp. 15-22.
[25] D. M. Malioutov and M. B. Malyutov, "Boolean compressed sensing: LP relaxation for group testing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Kyoto, Japan, Mar. 2012, pp. 3305-3308.
[26] A. Gilbert, M. Iwen, and M. Strauss, "Group testing and sparse signal recovery," in Proc. 42nd Asilomar Conf. Signals, Syst., Comput., Oct. 2008, pp. 1059-1063.
[27] G. Cormode and S. Muthukrishnan, "Combinatorial algorithms for compressed sensing," in Proc. 40th Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2006, pp. 198-201.
[28] A. Macula, "Probabilistic nonadaptive group testing in the presence of errors and DNA library screening," Ann. Combinat., vol. 3, no. 1, pp. 61-69, 1999.
[29] M. Sobel and R. M. Elashoff, "Group testing with a new goal, estimation," Biometrika, vol. 62, no. 1, pp. 181-193, 1975.
[30] E. J. Candès, "The restricted isometry property and its implications for compressed sensing," C. R. Math., vol. 346, nos. 9-10, pp. 589-592, 2008.
[31] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constr. Approx., vol. 28, no. 3, pp. 253-263, Dec. 2008.
[32] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007.
[33] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1094-1121, Feb. 2012.
[34] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230-2249, May 2009.
[35] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Commun. ACM, vol. 53, no. 12, pp. 93-100, 2010.
[36] J. A. Rice, Mathematical Statistics and Data Analysis, 3rd ed. Boston, MA, USA: Duxbury, 2006.
[37] T. Cover and J. Thomas, Elements of Information Theory. New York, NY, USA: Wiley, 1991.

Chun Lam Chan received a B.Eng. (2013) in Information Engineering from The Chinese University of Hong Kong (CUHK). He is now an M.Phil. student in the Department of Information Engineering at CUHK. His research interest is displayed by the name of the team he is in, the CAN-DO-IT team (Codes, Algorithms, Networks: Design and Optimization for Information Theory).

Sidharth Jaggi received a B.Tech. ('00) in Electrical Engineering from IIT Bombay and M.S./Ph.D. ('05) degrees in Electrical Engineering from Caltech, and was a Postdoctoral Associate ('06) at LIDS, MIT. He is currently an Associate Professor in the Department of Information Engineering, The Chinese University of Hong Kong.

Venkatesh Saligrama (SM'07) is a faculty member in the Electrical and Computer Engineering Department at Boston University. He holds a Ph.D. from MIT. His research interests are in statistical signal processing, statistical learning, video analysis, and information and decision theory. He is currently serving as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY. He has edited a book on networked sensing, information and control. He has served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and on the Technical Program Committees of several IEEE conferences. He is the recipient of numerous awards, including the Presidential Early Career Award for Scientists and Engineers (PECASE), the ONR Young Investigator Award, and the NSF CAREER Award.

Samar Agnihotri (S'07-M'10) is a faculty member in the School of Computing and Electrical Engineering at IIT Mandi. He holds M.Sc. (Engg.) and Ph.D. degrees in Electrical Sciences from IISc Bangalore. During 2010-12 he was a postdoctoral fellow in the Department of Information Engineering, The Chinese University of Hong Kong. His research interests are in the areas of communication and information theory.