arXiv:1701.07477v1 [cs.IT] 25 Jan 2017 floor au fconstants of value ftedfcieieswt ihpoaiiy(..)ie,w i.e., (w.h.p) probability high with items approaching probability defective the of m trtv eoigagrtmhsasblna computation sub-linear a has of complexity decoding iterative qa o1frvrosvle of values various for 1 to equal eso htfrayabtaiysmall arbitrarily any for de that iterative simple show a We and codes sparse-graph right-regular of testing edfcie epooeatsigshm ae nleft- on only based requires scheme testing a propose We defective. be h etn ceesc httettlnme ftests of number total the that such scheme testing the eti osbe h upto h etis test the of single a output for The together possible. items is multiple test grouping where people) n os esoso h rbe sln stenme of number the as of long number as total the problem nois with i.e., both the sub-linearly scale for of items valid defective versions are noisy results These and w.h.p. items defective rue tm r o-eetv res h uptis output when the scenario else the or In non-defective are items grouped of population large on based scheme testing the o to number the sparse-graphs. compared left-regular in when improvement required substantial tests a showing by results o h Tpolm h rbblt frcvr o n given any for recovery of desig probability the combinatorial problem, the GT the In for approximate. combinator and considered: probabilistic been have guarantees construction [8]. f [7], be [2], can in probabilistic, and combinatorial test both group structures[5] , on survey data comprehensive A learning[4], processing[6]. signal machine non-line fi biology[3], screening, and [2] wide like individu- library etc.., in clone communication soldier application multi-access like found optimization, each problems has test of testing soldiers variety to group the then having testing Since for without ally. II War World for during [1] Dorfman by minimized. is performed = K Abstract h rbe fGopTsig(T eest etn a testing to refers (GT) Testing Group of problem The nteltrtr nGopTsig he id fre- of kinds three Testing, Group on literature the In of field the to introduced first was problem This c ǫ 2 K = n constant and log N o ( N W osdrtepolmo o-dpiegroup non-adaptive of problem the consider —We ru etn sn left-and-right-regular using Testing Group K tm u fwhich of out items O m ) h iuainrslsvldt h theoretical the validate results simulation The . ( log K = c log c K N ǫ ǫ K N K and .I I. et u ceercvr the recovers scheme our tests c K N 1 log 1 tm for items ) ≪ = NTRODUCTION smttclyin asymptotically hc skont eotml lofor Also optimal. be to known is which ℓ c N 1 K r ucino h eie error desired the of function a are c N ℓ vns e,NgrjT aaiaa,KihaR Narayanan R. Krishna Janakiraman, T. Nagaraj Vem, Avinash ǫ h betv fG st design to is GT of objective the , et orecover to tests osre ob approximately be to (observed K eateto lcrcladCmue Engineering Computer and Electrical of Department K rls tm r nw to known are items less or ǫ .Mr motnl the importantly More ). eetv tm o sick (or items defective { pregahcodes sparse-graph vemavinash,tjnagaraj,krn > ǫ N negative and (1 0 K − u scheme our ea & University A&M Texas whole hr the where , ǫ ) falthe all if positive. m fraction coder. items, e of set obe to ound eless and elds and ing ial, ith ns ar al f - h opttoa opeiyo h rpsdpeigbased peeling only proposed is the decoder of complexity computational the fG)whp h rcs au fconstant of value precise The w.h.p. GT) of twssoni 7,[]ta h ubro et necessary tests of number the that [8] [7], in is shown was it time. fterqie ro floor error required the of (1 hr l h eetv tm r urnedt eoe usin [13] recover al., to et guaranteed are Indyk items m by defective the proposed all was where decoding efficient with g ouet h uhr rvdta sn AFO scheme SAFFRON using that m proved authors The document. 1] 1] oto hs eut eebsdo algorithms on computation based high atleast were a of have complexity results thus searches these exhaustive on of relying Most [12]. [11], hra h etkonaheaiiybudis bound achievability known best the whereas on audr[4 rpsdacntuto hthsan has with that probability achievability construction error known a decaying best proposed asymptotically the [14] regarding Mazumdar bound And literature. the h o-dpiegoptsigpolm hyrfrt the to refer They SAFFRON( for problem. the [16], as testing on coding scheme channel group based in non-adaptive decoder tools the popular iterative are simple which on decoder, a based and scheme codes testing a posed known. bound tightest the is this knowledge our of et.Frteapoiaevrini a hw 8 htthe that [8] as shown scale was tests it of version number required approximate the For tests. ob rae hno qa to equal requ or is recovery than of greater probability be the to that is version bilistic ntenme ftssrqie is required tests of number the on w.h.p. items) defective of set on whole in interested is a one recovering version recovery approximate the For ihhg rbblt whp .. ihpoaiiyappro probability with 1 i.e., (w.h.p) probability high with eetv e hudb qa to equal be should set defective eso n sitrse nrecovering in interested is one version RO smttclyin asymptotically = − = Ω( ptesti up fw osdrtepoaiitcvrino h problem, the of version probabilistic the consider we If n[5 uhr e,PdraiadRmhnrnpro- Ramchandran and Pedarsani Lee, authors [15] In o h obntra Ttebs nw oe bound lower known best the GT combinatorial the For c ǫ O K ) ǫ K ( } rcino eetv tm teapoiaeversion approximate (the items defective of fraction K log @tamu.edu log 2 N log K N (1 N ) eeec hc ewl olwtruhthis through follow will we which reference a g), ) N − ubro et r nuht dniyatleast identify to enough are tests of number hc stebs nw oe on in bound lower known best the is which O ) ǫ ( et npoly in tests ) N K O rcino h eetv tm ntthe (not items defective the of fraction log ( and S K parse-gr 2 N ǫ N K ) sas ie.Mr importantly More given. also is hyas hwdta with that showed also They . nte ain fteproba- the of variant Another . log ( 1 (1 A K O N hra nteprobabilistic the in whereas hcodes ph ) − ( ) etrglrsparse-graph left-regular K O · nyrcnl scheme a recently Only . ε Ω( all log ) ( m o given a for K h eetv items defective the N log 2 ) log log F c ǫ 2 n otebest the to and ramewrok O K N m safunction a as O ( ) + ) K ( K 9,[10] [9], 2 > ε peeling O log log aching log ( 2 m ired F K N N 2 or ly 0 al g ) ) ) . m = c · K log K log N tests i.e. with an additional log K III. REVIEW: SAFFRON factor, the whole defective set (the probabilistic version of GT) can be recovered with an asymptotically high probability of As mentioned earlier the SAFFRON scheme [15] is 1 − O(K−α). based on left-regular sparse graph codes and is applied for non-adaptive group testing problem. In this section we will Our Contributions briefly review their testing scheme, iterative decoding scheme (reconstruction of x given y) and their main results. The In this work, we propose a non-adaptive GT scheme SAFFRON testing scheme consists of two stages: the first that is similar to the SAFFRON but we employ left-and- stage is based on a left-regular sparse graph code which pools right-regular sparse-graph codes instead of the left-regular the N items into M non-disjoint bins where each item belongs sparse-graph codes and show that we only require c K log Nℓ ǫ cǫK to exactly ℓ bins. The second stage comprises of producing h number of tests for an error floor of ǫ in the approximate testing outputs at each bin where the h different combinations version of the GT problem. Although the testing complexity of the pooled items (from the first stage) at the respective of our scheme has the same asymptotic order O(K log N) as bin are defined according to a universal signature matrix. For that of [15], which as far as we are aware is the best known the first stage the authors consider a bipartite graph with N order result for the required number of tests in the approximate variable nodes (corresponding to the N items) and M bin N GT, it provides a better explicit upper bound of Θ(K log K ) nodes. Each variable node is connected to ℓ bin nodes chosen N with optimal computational complexity O(K log K ) and also uniformly at random from the M available bin nodes. All a significant improvement in the required number of tests for the variable nodes (historically depicted on the left side of finite values of K,N. Following the approach in [15] we the graph in ) have a degree ℓ, hence the left- extend our proposed scheme with the singleton-only variant regular, whereas the degree of a bin node on the right is a of the decoder to tackle the probabilistic version of the GT random variable in the range [0 : N]. problem. In Sec. V we show that for m = c · K log K log N K Definition 1 (Left-regular sparse graph ensemble). Let tests i.e. with an additional log K factor the whole defective set be the ensemble of left-regular bipartite graphs can be recovered w.h.p. Note that the testing complexity of our Gℓ(N,M) where for each variable node the ℓ right node connections scheme is only log K factor away from the best known lower N are chosen uniformly at random from the M right nodes. bound of Ω(K log K ) [7] for the probabilistic GT problem. We also extend our scheme to the noisy GT problem, where Let T ∈ {0, 1}M×N be the adjacency matrix corre- the test results are corrupted by noise, using an error-correcting G sponding to a graph G ∈ G (N,M) i.e., each column in T code similar to the approach taken in [15]. We demonstrate the ℓ G corresponds to a variable node and has exactly ℓ ones. Let the improvement in the required number of tests due to left-and- rows in matrix T be given by T = [tT , tT ,..., tT ]T . For right-regular graphs for finite values of K,N via simulations. G G 1 2 M the second stage let the universal signature matrix defining the h tests at each bin be U ∈{0, 1}h×N . Then the overall testing II. PROBLEM STATEMENT T T T matrix A := [A1 ,..., AM ] where Ai = U diag(ti) of size th Formally the group testing problem can be stated as h × N defines the h tests at i bin. Thus the total number of following. Given a total number of N items out of which tests is m = M × h. K are defective, the objective is to perform m different tests The signature matrix U in a more general setting with and identify the location of the K defective items from the parameters r and p can be given by test outputs. For now we consider only the noiseless group testing problem i.e., the result of each test is exactly equal to b1 b2 ··· br the boolean OR of all the items participating in the test.  b1 b2 ··· br  b 1 b 1 ··· bπ1 Let the support vector x ∈ {0, 1}N denote the list of  π1 π2 r   bπ1 bπ1 ··· bπ1  items in which the indices with non-zero values correspond to Ur,p =  1 2 r  (1)  .  the defective items. A non-adaptive testing scheme consisting  ··· .  m×N   of m tests can be represented by a matrix A ∈ {0, 1} b p−1 b p−1 ··· b p−1   π1 π2 πr  where each row a corresponds to a test. The non-zero indices   i b p−1 b p−1 ··· b p−1 th  π1 π2 πr  in row ai correspond to the items that participate in i test.   ⌈log r⌉ The output corresponding to vector x and the testing scheme where bi ∈{0, 1} 2 is the binary expansion vector for i A and can be expressed in matrix form as: k k k k and bi is the complement of bi. π = [π1 , π2 ,...,πr ] denotes a permutation chosen at random from symmetric group S . y = A ⊙ x r Henceforth Ur,p will refer to either the ensemble of matrices where ⊙ is the usual matrix multiplication in which the generated over the choices of the permutations πk for k ∈ [1 : arithmetic multiplications are replaced by the boolean AND p − 1] or a matrix picked uniformly at the random from the operation and the arithmetic additions are replaced by the said ensemble. The reference should be sufficiently clear from boolean OR operation. the context. In the SAFFRON scheme the authors employed a signature matrix from Ur,p with r = N and p = 3 thus [15] that enabled the authors to show that their SAFFRON resulting in a U of size h × N with h = 6 log2 N. scheme with the described peeling decoder solves the group testing problem with c · K log N tests and O(K log N) com- Decoding putational complexity. Lemma 3 (Bin decoder analysis). For a signature matrix Before describing the decoding process let us review Ur,p as described in (1), the bin decoder successfully detects some terminology. A bin is referred to as a singleton if there and resolves if the bin is either a singleton or a resolvable is exactly one non-zero variable node connected to the bin and double-ton. In the case of the bin being a multi-ton, the bin similarly referred to as a double-ton in case of two non-zero decoder declares a wrong hypothesis of either a singleton or a variable nodes. In the case where we know the identity of one 1 resolvable double-ton with a probability no greater than p 1 . of them leaving the decoder to decode the identity of the other r − one, the bin is referred to as a resolvable double-ton. And if Proof: This result was proved in [15] for the choice of the bin has more than two non-zero variable nodes attached parameters r = N and p = 3. The extension of the result to we refer to it as a multi-ton. First part of the decoder which general r, p parameters is straight forward. is referred to as bin decoder will be able to detect and decode exactly the identity of the non-zero variable nodes connected For convenience the performance of the peeling decoder to the bin if and only if the bin is a singleton or a resolvable is analyzed independently of the bin decoder i.e., a peeling double-ton. If the bin is a multi-ton the bin decoder will detect decoder is considered which assumes that the bin decoder it neither as a singleton nor a resolvable double-ton with high is working accurately which will be referred to as oracle probability. The second part of the decoder which is commonly based peeling decoder. Another simplification is that a pruned referred to as peeling decoder [17], when given the identities graph is considered where all the zero variable nodes and their of some of the non-zero variable nodes by the bin decoder, respective edges are removed from the graph. Also the oracle identifies the bins connected to the recovered variable nodes based peeling decoder is assumed to decode a variable node and looks for newly uncovered resolvable double-ton in these if it is connected to a bin node with degree one or degree bins. This process of recovering new non-zero variable nodes two with one of them already decoded, in an iterative fashion. from already discovered non-zero variable nodes proceeds in Any right node with more than degree two is untouched by an iterative manner (referred to as peeling off from the graph this oracle based peeling decoder. It is easy to verify that the historically). For details of the decoder we refer the reader to original decoder with accurate bin decoding is equivalent to [15]. this simplified oracle based peeling decoder on a pruned graph. The overall group testing decoder comprises of these two Definition 4 (Pruned graph ensemble). Let the pruned graph ˜ decoders working in conjunction as follows. In the first and ensemble Gl(N,K,M) be the set of all bipartite graphs foremost step, given the m tests output, the bin decoder is obtained from removing a random N − K subset of variable applied on the M bins and the set of variable nodes that are nodes from a graph from the ensemble Gℓ(N,M). Note that connected to singletons are decoded and output. We denote graphs from the pruned ensemble have K variable nodes. the decoded set of non-zero variable nodes as D. Now in an iterative manner, at each iteration, a variable node from Before we analyze the pruned graph ensemble let us D is considered and the bin decoder is applied on the bins define the right-node degree distribution (d.d) of an ensemble i connected to this variable node. The main idea is that if one of as R(x)= i Rix where Ri is the probability that a right- these bins is detected as a resolvable double-ton thus resulting node in anyP graph from the ensemble has degree i. Similarly i−1 in decoding a new non-zero variable node. The considered the edge d.d ρ(x) = i ρix is defined where ρi is the variable node in the previous iteration is moved from D to probability that a randomP edge in the graph is connected to a a set of peeled off variable nodes P and the newly decoded right-node of degree i. Note that the left-degree distribution is ℓ non-zero variable node in the previous iteration, if any, will be regular (i.e. L(x) = x ) even for the pruned graph ensemble placed in set D and continue to the next iteration. The decoder and hence is not specifically discussed. is terminated when D is empty and is declared successful if Lemma 5 (Edge d.d of Pruned graph). For the pruned ensem- the set P equals the set of defective items. ble G˜ℓ(N,K,M), it was shown that in the limit K,N → ∞, −λ −λ Remark 2. Note that we are not literally peeling off the ρ1 = e and ρ2 = λe where λ = ℓ/cǫ for M = cǫK for decoded nodes from the graph because of the non-linear OR any constant cǫ. operation on the non-zero variable nodes at each bin thus Lemma 6. For the pruned graph ensemble G˜ℓ(N,K,M) the preventing us in subtracting the effect of the non-zero node oracle-based peeling decoder fails to peel off atleast (1 − ǫ) from the measurements of the bin node unlike in the problems fraction of the variable nodes with exponentially decaying of compressed sensing or LDPC codes on binary erasure probability if M ≥ cǫK where the required cǫ and ℓ for channel. various values of ǫ are given in Table. I.

Now we state the series of lemmas and theorems from −3 −4 −5 −6 −7 −8 −9 ǫ 10 10 10 10 10 10 10 Note that A is defined by placing the r columns of U at the c1(ǫ) 6.13 7.88 9.63 11.36 13.10 14.84 16.57 i ℓ 7 9 10 12 14 15 17 r non-zero indices of ti and the remaining are padded with zero columns. We can observe that the total number of tests TABLE I for this scheme is m = M × h where h =2p log r. CONSTANTS FOR VARIOUS ERROR FLOOR VALUES 2 Example 9. Let us look at an example for (N,M) = (6, 3) and (ℓ, r)=(2, 4). Then the adjacency matrix TG of a graph G ∈ G (6, 3) and a signature matrix U ∈ {0, 1}4×3 for Proof: Instead of reworking the whole proof here from 2,4 p =1 and log r =2 are given by [15], we will list the main steps involved in the proof which 2 we will use further along. Let p be the probability that a 0 0 1 1 j 110101 random defective item is not identified at iteration j of the 0 1 0 1 TG = 011110 U = . decoder, in the limit N and K → ∞. Then one can write the 1 1 0 0 101011   density evolution (DE) equations relating p to p as   1 0 1 0 j+1 j   ℓ−1 pj+1 = [1 − (ρ1 + ρ2(1 − pj ))] . Then, the measurement matrix A with m = 2pM⌈log2 r⌉ = 12 tests is given by For this DE, we can see that 0 is not a fixed point and hence 000101 pj 9 0 as j → ∞. Therefore numerically optimizing the 010001 values of cǫ and ℓ such that limj→∞ pj ≤ ǫ gives the optimal   110000 values for cǫ and ℓ given in Table. I. It was also shown [15],   100100 [16] that for such sparse graph systems the actual fraction   000110 of the undecoded variable nodes deviates from the average   001010 undecoded fraction of the variable nodes given by the DE A =   011000 with exponentially low probability.   010100   Combining the lemmas and remarks above, the main 000011   result from [15] can be summarized as below. 001001   101000 Theorem 7. A random testing matrix from the SAFFRON   100010 scheme with m = 6cǫK log2 N tests recovers atleast (1 − ǫ)   K   fraction of the defective items w.h.p of atleast 1 − O( N 2 ). Definition 10 (Regular-SAFFRON). Let the ensemble of The computational complexity of the decoding scheme is testing matrices be Gℓ,r(N,M) × Ur,p where a graph G from . The constant is given in Table. I for some O(K log N) cǫ Gℓ,r(N,M) and a signature matrix U from Ur,p are chosen at values of ǫ. random and the testing matrix A is defined according to Eq.

(2). Note that the total number of tests is 2pM log2 r where IV. PROPOSED SCHEME Nℓ r = M . The main difference between the SAFFRON scheme For the regular-SAFFRON testing ensemble defined in described in Sec. III and our proposed scheme is that we Def. 10, we employ the iterative decoder described in Sec. III. use a left-and-right-regular sparse-graph instead of left-regular Similar to the SAFFRON scheme we will analyze the peeling sparse-graph in the first stage for the binning operation. decoder and the bin decoder separately and union bound the Definition 8 (Left-and-right-regular sparse graph ensemble). total error probability of the decoding scheme. As we have We define Gℓ,r(N,M) to be the ensemble of left-and-right- already mentioned the analysis of just the peeling decoder part regular graphs where the Nℓ edge connections from the left can be carried out by considering a simplified oracle-based and Mr(= Nℓ) edge connections from the right are paired up peeling decoder on a pruned graph with only the non-zero according to a permutation π chosen at random from SNℓ. variable nodes remaining.

M×N Definition 11 (Pruned graph ensemble). We will define the Let TG ∈ {0, 1} be the adjacency matrix corre- pruned graph ensemble G˜ℓ,r(N,K,M) as the set of all graphs sponding to a graph G ∈ Gℓ,r(N,M) i.e., each column in obtained from removing a random N − K subset of variable TG corresponding to a variable node has exactly ℓ ones and each row corresponding to a bin node has exactly r ones. nodes from a graph from left-and-right-regular sparse-graph And let the universal signature matrix be U ∈ {0, 1}h×r ensemble Gℓ,r(N,M). chosen from the U ensemble. Then the overall testing r,p Note that graphs from the pruned ensemble have K matrix A := [AT ,..., AT ]T where A ∈{0, 1}h×N defining 1 M i variable nodes with a degree ℓ whereas the right degree is the h tests at ith bin is given by not regular anymore. A = [0,..., 0, u , 0,..., u , 0,..., u ], where (2) i 1 2 r Lemma 12 (Edge d.d of pruned graph). For the pruned graph ti = [0,..., 0, 1, 0,..., 1, 0,..., 1]. ensemble G˜ℓ,r(N,K,M) it can be shown in the limit K,N → ℓ ℓ r = Nℓ r ≤ N c1K

. c1K bins . c1K bins N vars . π N vars . π . . . . log N tests log r tests

U ∈ {0, 1}c log N×N U ∈ {0, 1}c log r×r

N M = Θ(K log N) M = Θ(K log K )

Fig. 1. Illustration of the main differences between SAFFRON [15] on the left and our regular-SAFFRON scheme on the right. In both the schemes the peeling decoder on sparse graph requires Θ(K) bins. But for the bin decoder part, in SAFFRON scheme the right degree is a random variable with a maximum N value of N and thus requires Θ(log N) tests at each bin. Whereas our scheme based on right-regular sparse graph has a constant right degree of Θ( K ) and N N thus requires only Θ(log K ) tests at each bin. Thus we can improve the number of tests from Θ(K log N) to Θ(K log K ).

∞ and K = o(N) that the edge d.d coefficients approach ǫ) fraction of the variable nodes with exponentially decaying −λ −λ ρ1 = e and ρ2 = λe where λ = ℓ/c for the choice of probability for M = cǫK where ℓ,cǫ for various ǫ is given in M = cK, c being some constant. Table. I.

Proof: We will first derive R(x) for the pruned graph Proof: We showed in Lemma. 12 that, in the limit of ′ R (x) K,N → ∞, the edge degree distribution coefficients ρ and ensemble and then use the relation ρ(x)= R′(1) [16] to derive 1 the edge d.d. Note that all the bin nodes have a uniform degree ρ2 approach the same values as in the SAFFRON scheme (see r before pruning. In the pruning operation we are removing Lem. 5). Now we follow the exact same approach as that of −λ −λ a N − K subset of variable nodes at random which means Lem. 6 where the limiting values of ρ1 = e and ρ2 = λe from the bin node perspective, in an asymptotic sense, this is are used in the DE equations to show that for the given values equivalent to removing each connected edge with a probability of ℓ and cǫ limj→∞ pj ≤ ǫ. 1 − β where β := K . Under this process the right-node d.d N Theorem 14. Let p ∈ Z such that K = o(N 1−1/p). A can be written as random testing matrix from the proposed regular SAFFRON r−1 c2N R1 = rβ(1 − β) , and similarly (3) ensemble Gℓ, Nℓ (N,cǫK) × U Nℓ ,p with m = c · K log2 cǫK cǫK K r tests recovers atleast (1 − ǫ) fraction of the defective items R = βi(1 − β)r−i ∀i<= r i i w.h.p. The computational complexity of the decoding scheme N ℓ is O(K log ). The constants are c =2pcǫ,c2 = where ℓ thus giving us R(x) = (βx + (1 − β))r. This gives us K cǫ and cǫ for various values of ǫ are given in Table. I. rβ(βx + (1 − β))r−1 ρ(x)= rβ Proof: It remains to be shown that for the proposed reg- ular SAFFRON scheme the total probability of error vanishes = (βx + (1 − β))r−1. asymptotically in K and N. Let E1 be the event of oracle- r−1 Thus we can compute that ρ1 = (1 − β) and ρ2 = (r − based peeling decoder terminating without recovering atleast r−2 1)β(1 − β) . For M = cK we evaluate these quantities in (1−ǫ)K variable nodes. Let E2 be the event of the bin decoder the limit K,N → ∞ as making an error during the entirety of the peeling process and Nℓ −1 Ebin be the event of one instance of bin decoder making an K cK lim ρ1 = lim 1 − error. The total probability of error Pe can be upper bounded K,N→∞ K,N→∞  N  by ℓ = e−λ where λ = c Pe ≤ Pr(E1)+ Pr(E2) −λ Similarly we can show limK,N→∞ ρ2 = λe . ≤ Pr(E1)+ Kℓ Pr(Ebin) Kp Note that even if our initial ensemble is left-and-right- ∈ O regular the pruned graph ensemble has asymptotically the N p−1  same degree distribution as in the SAFFRON scheme where where the second inequality is due to the union bound over the initial ensemble is left-regular. a maximum of Kℓ (number of edges in the pruned graph) Lemma 13. For the pruned graph ensemble G˜ℓ,r(N,K,M) instances of bin decoding. The third line is due to the fact the oracle-based peeling decoder fails to peel off atleast (1 − that Pr(E1) is exponentially decaying in K (see Lemma. 13) cǫK p−1 and Pr(Ebin) = ( Nℓ ) (see Lemma. 3 and Def. 10) VI. ROBUST GROUP TESTING

V. TOTAL RECOVERY: SINGLETON-ONLY VARIANT In this section we extend our scheme to the group testing problem where the test results can be noisy. Formally, the In this section we will look at the proposed regular- signal model can be described as SAFFRON scheme but with a decoder that uses only the singleton bins. To elaborate, the only difference is in the y = A ⊙ x + w, decoder which is not iterative in this framework and recovers the variable nodes connected to only the singleton bin nodes N and terminates. We will refer to this scheme as singleton-only where w ∈ {0, 1} is an i.i.d. noise vector distributed according to Bernoulli distribution with parameter 1 regular-SAFFRON scheme. The trade-off is that we can now 0 0. In Eqn. 4, q :=1 − q. By using union bound over all the K variable nodes in Even though the computational complexity of using random the pruned graph, the probability Pe that the singleton-only decoder fails to recover a defective item can be bounded by codes is exponential in block length of the code since the N block length for our application is O(log K ) and hence we ℓ Pe ≤ K(1 − R1) have an overall computational complexity of O(N). But in practice one can use any of the popular error-correcting = O K(1 − e−1)e(1+α) log K   codes such as spatially-coupled LDPC codes or polar codes −1 = O Ke−e e(1+α) log K which are known to be capacity achieving [18], [19] whose   computational complexity is linear in block length. = O K−α . ′ The modified signature matrix Ur,p can be described via  −x In third line we used (1 − x) ≤ e . Ur,p given in Eq. (1) and encoding function f for Cn where p n = ⌈log2 r⌉ as follows: a singleton with probability no greater than rκ . The robust bin decoder wrongly declares a singleton with probability no f(b1) f(b2) ··· f(br) 1 greater than rpκ+p−1 .  f(b1) f(b2) ··· f(br)  f(b 1 ) f(b 1 ) ··· f(bπ1 )  π1 π2 r  Proof: Let Ei be the event that the error-control decoder ′  b 1 b 1 b 1  f( π ) f( π ) ··· f( πr ) g(yi1) commits an error at section i. From Eqn. (4) we know Ur,p :=  1 2  (5)  .  that Pr(E )=2−κ log r = r−κ. The robust bin decoder misses  ··· .  i   a singleton if the error-control decoder g(yi1) commits an f(b p−1 ) f(b p−1 ) ··· f(b p−1 )  π π πr   1 2  error at any one section. Thus the probability of missing a f(b p−1 ) f(b p−1 ) ··· f(b p−1 )  π π πr  singleton can be upper bounded by applying union bound over  1 2  Then the overall testing matrix A is defined in identical all the sections i ∈ [0 : p − 1] giving the required result. fashion to the definition in Sec. III for the case of noiseless Consider a singleton bin and let the event where the case except that U will be replaced by U′ in Eqn. (5). robust bin decoder outputs a singleton hypothesis but the Formally it can be defined as A : AT AT T where = [ 1 ,..., M1 ] wrong index be Ebin. This event happens when the error- ′ Ai = U diag(ti) where the binary vectors ti are defined in control decoder commits an error and outputs the exact same Sec. III. wrong index at each and every section. We assume that given the error-control decoder makes an error, the output Decoding is uniformly random among all the remaining indices. Thus 1 1 p−1 Pr(Ebin) can be upper bounded by rκ ( r1+κ ) which upon The decoding scheme for the robust group testing, similar simplification gives us the required result. to the case of noiseless case, has two parts with the peeling part The fraction of missed singletons can be compensated by of the decoder identical to that of the noiseless case whereas p using M(1 + rκ ) instead of M such that the total number of the bin detection part differs slightly with an extra step of p p decoding for the error control code involved. singletons decoded will be M(1 + rκ )(1 − rκ ) ≈ M. Z 1−1/p Given the test output vector at a bin y = Theorem 17. Let p ∈ such that K = o N . T T T T T The proposed robust regular SAFFRON scheme using m = [y01, y02, y11,..., y(p−1)2] , the bin detection for the noisy c · K log Nℓ tests recovers atleast (1 − ǫ) fraction of the case can be summarized as following: The decoder ∀i ∈ [0 : 2 cǫK p − 1] applies the decoding function g(·) to the first segment defective items w.h.p. where c =2pβ(q)cǫ and β(q)=1/R. yi1 in each section i and obtains the location li whose binary expansion is equal to the error-correcting decoder output Proof: Similar to the noiseless case the total probability of error Pe is dominated by the performance of bin decoder. g(yi1). The decoder then declares the bin as a singleton if i π = li ∀i. l0 Pe ≤ Pr(E1)+ Kℓ Pr(Ebin) Similarly given that one of the variable nodes connected Kp+pκ = Pr(E )+ O to the bin is already decoded to be non-zero, the resolvable 1 N p−1+pκ)  double-ton decoding can be summarized as following. Let the N (p−1)(1+κ) location of the already recovered variable node in the bin = O p−1+pκ) (originally a double-ton) be l then the test output can be  N  0 −κ given as ∈ O(N )

f(bl ) f(bl ) where the second line is due to Lem. 16 and the third line y01 0 1 w01 is due to the fact that Pr(E1) is exponentially decaying in K y  f(bl0 )   f(bl1 )  w  02  02 and K ≤ N (p−1)/p for large enough K,N. y f(b 1 ) f(b 1 ) 11 u u w πl πl w11   = l0 ∨ l1 + =  0  ∨  1  +    .   .   .   .   .   .   .   .  VII. SIMULATION RESULTS         y  p   p   p2 f(bπ ) f(bπ ) wp2    l0   l1    In this section we will evaluate the performance of the where the location of the second non-zero variable node l1 proposed regular-SAFFRON scheme via Monte Carlo simula- needs to be recovered. Given y = ul0 ∨ ul1 + w and ul0 , the tions and compare it with the results of SAFFRON scheme

first segments of each section in ul1 + w can be recovered provided in [15] for both the noiseless and noisy models. since for each segment of ul0 either the vector f(bπk ) or it’s l0 complement is available. Once the first section f(bπi )+ w Noiseless Group Testing l1 of each segment i is recovered, we apply singleton decoding As per Thm. 14 the proposed regular SAFFRON scheme procedure and rules as described above. requires only 6c K log Nℓ tests as opposed to 6c K log N ǫ cǫK ǫ Lemma 16 (Robust Bin Decoder Analysis). For a signature tests of SAFFRON scheme to recover (1 − ǫ) fraction of ′ matrix U r,p as described in (5), the robust bin decoder misses defective items with a high probability. We demonstrate this 0 10 100

−1 10−1 10

−2 − 10 10 2

− 10 3 10−3

− 10 4 10−4 ℓ =3 ℓ =5 − 10 5 ℓ =7 q=0.03 −5 q=0.04 Fraction of unidentified10 defectives ℓ =3, regular q=0.05 ℓ =5, regular Fraction of unidentified defectives −6 10 q=0.03, regular ℓ =7,regular q=0.04, regular −6 10 q=0.05, regular − 100 200 300 400 500 600 700 10 7 Number of tests per defective item (m/K) 4 6 8 10 12 Number of tests (×105) Fig. 2. MonteCarlo simulations for K = 100,N = 216. We compare the SAFFRON scheme [15] with the proposed regular SAFFRON scheme for various left degrees ℓ ∈ {3, 5, 7}. The plots in blue indicate the SAFFRON 32 scheme and the plots in red indicate our regular SAFFRON scheme based on Fig. 3. MonteCarlo simulations for K = 128,N = 2 . We compare the left-and-right-regular bipartite graphs. SAFFRON scheme with the proposed regular-SAFFRON scheme for a left degree ℓ = 12. We fix the number of bins and vary the rate of the error control code used. The plots in blue indicate the SAFFRON scheme[15] and the plots in red indicate the regular-SAFFRON scheme based on left-and- by simulating the performance for the system parameters right-regular bipartite graphs. summarized below.

16 • We fix N =2 and K = 100 of 4 × 7(4+2e) bits at each bin and the total number of tests • For ℓ ∈{3, 5, 7} we vary the number of bins M = cK. m = 28M(4+2e). • In Eqn. 1 the parameter p =2 is chosen for matrix U Nℓ • Thus the bin detection size is h = 6 log2 cK VIII. CONCLUSION Nℓ • Hence the total number of tests m =6cK log2 cK  We addressed the Group Testing problem for K defective The results are shown in Fig. 2. We observe that there is items out of N items. Using left-and-right-regular sparse- clear improvement in performance for the proposed regular graph codes we propose a new construction for the testing SAFFRON scheme when compared to the SAFFRON scheme matrix based on that of Lee et al., [15]. We show that this for each ℓ ∈{3, 5, 7}. improves the testing complexity upon the previous results for the approximate version of the Group Testing problem and Noisy Group Testing achieves asymptotically vanishing error probability under sub- Similar to the noiseless group testing problem we simu- linear time, order optimal, computational complexity. We also late the performance of our robust regular-SAFFRON scheme show that the proposed scheme with a variant of the original and compare it with that of the SAFFRON scheme. For decoder has a testing complexity that is only log K factor convenience of comparison we choose our system parameters away from the lower bound for the probabilistic version of identical to the choices in [15]. The system parameters are the Group Testing problem with order optimal computational summarized below: complexity.

32 7 • N =2 ,K =2 . We fix ℓ = 12,M = 11.36K REFERENCES • BSC noise parameter q ∈{0.03, 0.04, 0.05} • In Eqn. 1 the parameter p =1 is chosen for matrix U [1] R. Dorfman, “The detection of defective members of large populations,” Nℓ The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 436–440, 1943. • Thus the bin detection size is h = 4 log2 M [2] D.-Z. Du and F. K. Hwang, Combinatorial group testing and its applications, vol. 12. World Scientific, 1999. The results are shown in Fig. 3. Note that for the above set [3] H.-B. Chen and F. K. Hwang, “A survey on nonadaptive group testing Nℓ algorithms through the angle of decoding,” Journal of Combinatorial of parameters the right degree r = M ≈ 26. We choose to operate in field GF (27) thus giving us a message length of 4 Optimization, vol. 15, no. 1, pp. 49–59, 2008. [4] D. M. Malioutov and K. R. Varshney, “Exact rule learning via boolean symbols. For the choice of code we use a (4+2e, 4) Reed- compressed sensing.,” in Proc. Int. Conf. on ., Solomon code for e ∈ [0 : 8] thus giving us a column length pp. 765–773, 2013. [5] M. T. Goodrich, M. J. Atallah, and R. Tamassia, “Indexing information for data forensics,” in International Conference on Applied Cryptogra- phy and Network Security, pp. 206–221, Springer, 2005. [6] A. Emad and O. Milenkovic, “Poisson group testing: A probabilistic model for nonadaptive streaming boolean compressed sensing,” in Int. Conf. on Acoustics, Speech & Signal Proc., pp. 3335–3339, IEEE, 2014. [7] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds and novel algorithms,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 3019–3035, 2014. [8] G. K. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901, 2012. [9] A. G. D’yachkov and V. V. Rykov, “Bounds on the length of disjunctive codes,” Problemy Peredachi Informatsii, vol. 18, no. 3, pp. 7–13, 1982. [10] P. Erd¨os, P. Frankl, and Z. F¨uredi, “Families of finite sets in which no set is covered by the union ofr others,” Israel Journal of Mathematics, vol. 51, no. 1, pp. 79–89, 1985. [11] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Transactions on , vol. 10, no. 4, pp. 363–377, 1964. [12] E. Porat and A. Rothschild, “Explicit nonadaptive combinatorial group testing schemes,” IEEE Transactions on Information Theory, vol. 57, no. 12, pp. 7982–7989, 2011. [13] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non- adaptive group testing,” in Proceedings of the twenty-first annual ACM- SIAM symposium on Discrete Algorithms, pp. 1126–1142, Society for Industrial and Applied Mathematics, 2010. [14] A. Mazumdar, “Nonadaptive group testing with random set of defectives via constant-weight codes,” arXiv preprint arXiv:1503.03597, 2015. [15] K. Lee, R. Pedarsani, and K. Ramchandran, “Saffron: A fast, efficient, and robust framework for group testing based on sparse-graph codes,” arXiv preprint arXiv:1508.04485, 2015. [16] T. Richardson and R. Urbanke, Modern coding theory. Cambridge University Press, 2008. [17] X. Li, S. Pawar, and K. Ramchandran, “Sub-linear time compressed sensing using sparse-graph codes,” in Proc. IEEE Int. Symp. Information Theory, pp. 1645–1649, 2015. [18] S. Kumar, A. J. Young, N. Macris, and H. D. Pfister, “Threshold saturation for spatially coupled ldpc and ldgm codes on bms channels,” IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7389–7415, 2014. [19] S. Kudekar, T. Richardson, and R. L. Urbanke, “Spatially coupled ensembles universally achieve capacity under belief propagation,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, 2013.