Arxiv:2007.08111V5 [Cs.IT] 17 Mar 2021 Fected Members in a Population

Group testing for connected communities Pavlos Nikolopoulosy Sundara Rajan Srinivasavaradhanz Tao Guoz Christina Fragouliz Suhas Diggaviz yEPFL, Switzerland zUniversity of California Los Angeles, USA Abstract history of several decades dating back to R. Dorfman in 1943 and a number of variations and setups have been examined in the literature [Dorfman, 1943, Du In this paper, we propose algorithms that and Hwang, 1993, Aldridge et al., 2019, Yaakov Mali- leverage a known community structure to novsky, 2016]. make group testing more efficient. We consider a population organized in disjoint com- The observation we make in this paper is that we can munities: each individual participates in a leverage a known community structure to make group community, and its infection probability de- testing more efficient. The work in group testing we pends on the community (s)he participates know of, assumes \independent" infections, and ig- in. Use cases include families, students who nores that an infection may be governed by community participate in several classes, and workers spread; we argue that taking into account the commu- who share common spaces. Group testing nity structure can lead to significant savings. As a use reduces the number of tests needed to iden- case, consider an apartment building consisting of F tify the infected individuals by pooling di- families that have practiced social distancing; clearly agnostic samples and testing them together. there is a strong correlation on whether members of We show that if we design the testing strat- the same family are infected or not. Assume that the egy taking into account the community struc- building management would like to test all members ture, we can significantly reduce the number to enable access to common facilities. We ask, what is of tests needed for adaptive and non-adaptive the most test-efficient way to do so. group testing, and can improve the reliability Our approach enlarges the regime where group test- in cases where tests are noisy. ing can offer benefits over individual testing. Indeed, a limitation of group testing is that it offers fewer or no benefits when k grows linearly with n [Riccio and 1 Introduction Colbourn, 2000,Hu et al., 1981,Ungar, 1960,Aldridge, 2019, Aldridge et al., 2019]. Taking into account the Group testing pools together diagnostic samples to community structure allows to identify and remove reduce the number of tests needed to identify in- from the population large groups of infected members, arXiv:2007.08111v5 [cs.IT] 17 Mar 2021 fected members in a population. In particular, if in thus reducing their proportion and converting a lin- a population of n members we have a small frac- ear to a sparse regime identification. Essentially, the tion infected (say k n members), we can iden- community structure can guide us on when to use in- n tify the infected members using as low as O(k log( k )) dividual, and when group testing. group tests, as opposed to n individual tests [Du and Hwang, 1993, Aldridge et al., 2019, Kucirka et al., Our main results are as follows. Assume that n popu- 2020]. Triggered by the need of widespread testing, lation members are partitioned into F groups that we such techniques are already being explored in the con- call families, out of which kf families have at least one text of Covid-19 [Gollier and Gossner, 2020,Broadfoot, infected member. 2020,Ellenberg, 2020,Verdun et al., 2020,Ghosh et al., • We derive a lower bound on the number of tests, 2020, Kucirka et al., 2020]. Group testing has a rich which for some regimes increases (almost) linearly with kf (the number of infected families) as opposed to k (the number of infected members). This work was accepted at the Proceedings of the • We propose an adaptive algorithm that achieves the 24th International Conference on Artificial Intelligence and lower bound in some parameter regimes. Statistics (AISTATS) 2021, San Diego, California, USA. • We propose a nonadaptive algorithm that accounts Group testing for connected communities for the community structure to reduce the number of ber, and the non-zero elements determine the set δτ . tests when some false positive errors can be tolerated. Although adaptive testing uses less tests than non- • We propose a new decoder based on loopy belief adaptive, non-adaptive testing is more practical as all propagation that is generic enough to accommodate tests can be executed in parallel. any community structure and can be combined with • Scaling regimes of operation: assume k = Θ(nα), we any test matrix (encoder) to achieve low error rates. say we operate in the linear regime if α = 1; in the • We numerically show that leveraging the community sparse regime if 0 ≤ α < 1; in the very sparse regime structure can offer benefits both when the tests used if k is constant. have perfect accuracy and when they are noisy. Known results. The following are well estab- We present our models in Section 2, the lower bound in lished results (see [Johnson, 2017, Du and Hwang, Section 3, our algorithms for the noiseless case in Sec- 1993, Aldridge et al., 2019] and references therein): tion 4, and loopy belief propagation (LBP) decoding • In the combinatorial model, since T tests allow to in Section 5. Numerical results are in Section 6. distinguish among 2T combinations of test outputs, then to identify all k infected members without error, Note: The proofs of our theoretical results (in Sec- we need: 2T ≥ n , T ≥ log n. This is known as tions 3{4) are in the Appendix, along with an extended k 2 k the counting bound [Johnson, 2017,Du and Hwang, explanation of the rationale behind our algorithms. 1993,Aldridge et al., 2019] and implies that we cannot n use less than T = O(k log k ) tests. In the probabilistic 2 Background and notation model, a similar bound has been derived for the number of tests needed on average: T ≥ nh2 (p), where h is the binary entropy function. 2.1 Traditional group testing 2 • Noiseless adaptive testing can achieve the counting α Our work extends traditional group testing to infection bound for k = Θ(n ) and α 2 [0; 1); for non-adaptive models that are based on community spread. For this testing, this is also true of α 2 [0; 0:409], if we allow reason, we review here known results from prior work. a vanishing (with n) error [Aldridge et al., 2019,Coja- Oghlan et al., 2020, Coja-Oghlan et al., 2020]. Traditional group testing typically assumes a popu- • In the linear regime (α = 1), group testing offers lit- lation of n members out of which some are infected. tle benefits over individual testing. In particular, if the Two infection models are considered: (i) in the combi- infection rate k=n is more than 0:38, group testing does natorial model, a fixed number of infected members k, not use fewer tests than 1-by-1 (individual) testing un- are selected uniformly at random among all sets of size less high identification-error rates are acceptable [Ric- k; (ii) in the probabilistic model, each item is infected cio and Colbourn, 2000, Hu et al., 1981, Ungar, 1960]. independently of all others with probability p, so that ¯ the expected number of infected members is k = np. 2.2 Community and infection models A group test τ takes as input samples from nτ individuals, pools them together and outputs a single value: In this paper, we additionally assume a known commu- positive if any one of the samples is infected, and neg- nity structure: the population can be decomposed in ative if none is infected. More precisely, let Ui = 1 F disjoint groups of individuals that we call families. PF when individual i is infected and 0 otherwise. Then Each family j has Mj members, so that n = j =1 Mj . the traditional group testing output Yτ takes a binary In the symmetric case, Mj = M for all j and n = FM . value calculated as Y = W U , where W stands for Note, that the term \families" is not limited to real τ i2δτ i the OR operator (disjunction) and δτ is the group of families|we use the same term for any group of peo- people participating in the test. ple that happen to interact, so that they get infected according to some common infection principle. The performance of a group testing algorithm is mea- sured by the number of group tests T = T (n) needed We consider the following infection models, that par- to identify the infected members (for the probabilistic allel the ones in the traditional setup: model, the expected number of tests needed). Setups • Combinatorial Model (I). kf of the families are that have been explored in the literature include: infected|namely they have at least one infected mem- • Adaptive vs. non-adaptive testing: In adaptive test- ber. The rest of the families have no infected members. j ing, we use the outcome of previous tests to decide In each infected family j , there exist km infected mem- j what tests to perform next. An example of adaptive bers, with 0 ≤ km ≤ Mj . The infected families (resp. testing is binary splitting, which implements a form of infected family members) are chosen uniformly at ran- binary search. Non-adaptive testing constructs, in ad- dom out of all families (resp. members of the same vance, a test matrix G 2 f0; 1gT×n where each row family). For our analysis, we sometimes consider only j corresponds to one test, each column to one mem- the symmetric case, where km = km for each family j .

Arxiv:2007.08111V5 [Cs.IT] 17 Mar 2021 Fected Members in a Population

Nber Working Paper Series Group Testing in a Pandemic

Group Testing

The Illusion of Group Testing Teddy Furon

Group Testing-Based Robust Algorithm for Diagnosis of COVID-19

On the Cut-Off Point for Combinatorial Group Testing

Group Testing Schemes from Codes and Designs

A Maxsat-Based Framework for Group Testing

Neural Group Testing to Accelerate Deep Learning

A Survey on Combinatorial Group Testing Algorithms with Applications to DNA Library Screening

Group Testing: an Information Theory Perspective

Group Testing on a Network

Group Testing Using Left-And-Right-Regular Sparse-Graph