Shannon and group testing: Finding needles in haystacks

Oliver Johnson, @BristOliver

School of Mathematics, University of Bristol

Inaugural Lecture, 1st November 2018

Oliver Johnson @BristOliver Shannon and group testing 1st November 2018

Outline of the talk

1 Catering conundrum

2 How did I get here?

3 Shannon and information theory

4 Group testing

Section 1: Toxic talk treat teaser

Professor J is giving his inaugural lecture at Blistor University. 7 plates of delicious post-lecture snacks. Professor J’s evil nemesis, Dr X, has poisoned one of them. Whoever eats that snack will fall asleep for 24 hours. How to find the poisoned food, as efficiently as possible? Can pay any PhD student £10 to eat what we tell them.

How to solve the mystery?

One idea: pay 7 PhD students to eat one snack each. One will fall asleep. Will cost us £70. Better idea: use the following strategy (only costs £30).

        Olives  Nuts  Breadsticks  Crisps  Dip  Cheese straws  Jelly
PhD 1     ✓      ×        ✓          ×      ✓        ×           ✓
PhD 2     ✓      ✓        ×          ×      ✓        ✓           ×
PhD 3     ✓      ✓        ✓          ✓      ×        ×           ×

Outcome

        Olives  Nuts  Breadsticks  Crisps  Dip  Cheese straws  Jelly
PhD 1     ✓      ×        ✓          ×      ✓        ×           ✓
PhD 2     ✓      ✓        ×          ×      ✓        ✓           ×
PhD 3     ✓      ✓        ✓          ✓      ×        ×           ×

Solution: breadsticks were poisoned. This strategy would always work. But what if more than one snack poisoned?
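The decoding step can be sketched in a few lines of Python. This is only an illustration: it reads the table as a 0/1 matrix (1 meaning "this student eats this snack"), and it assumes the outcome was that PhD 1 and PhD 3 fell asleep while PhD 2 stayed awake, which is the pattern consistent with the stated solution. The function name `find_poisoned` is hypothetical.

```python
# Rows are students, columns are snacks; 1 means "this student eats this snack".
SNACKS = ["Olives", "Nuts", "Breadsticks", "Crisps", "Dip", "Cheese straws", "Jelly"]
DESIGN = [
    [1, 0, 1, 0, 1, 0, 1],  # PhD 1
    [1, 1, 0, 0, 1, 1, 0],  # PhD 2
    [1, 1, 1, 1, 0, 0, 0],  # PhD 3
]


def find_poisoned(asleep):
    """Return the unique snack whose column equals the observed sleep pattern."""
    matches = [
        snack
        for j, snack in enumerate(SNACKS)
        if [row[j] for row in DESIGN] == list(asleep)
    ]
    if len(matches) != 1:
        raise ValueError("pattern does not identify exactly one snack")
    return matches[0]


# Assumed outcome: PhD 1 and PhD 3 fall asleep, PhD 2 stays awake.
print(find_poisoned([1, 0, 1]))
```

The strategy always works for a single poisoned snack because the seven columns are the seven distinct non-zero patterns of three bits, so every sleep pattern points at exactly one snack.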

More interesting problems

What if you had 500 snacks, with 10 poisoned? How many students would we need? What should we get them to eat? How should we find the poisoned snacks? What if some students are immune to poison . . . or fall asleep anyway? This is group testing.

Section 2: Brief history of me

Academic careers are a matter of luck. Successes are always team efforts. Survivorship bias and impostor syndrome. Will briefly say how I got here, via Birmingham and Cambridge.

Birmingham

Attended King Edward’s School. Lucky to be encouraged towards maths . . . and Maths Olympiads in particular. UK team (Sweden 1991, Russia 1992).

Learnt a lot of tricks . . . also that I can’t do geometry.

Cambridge

Queens’ College Cambridge as undergrad and PhD student. Took diverse range of courses from General Relativity through to Galois Theory . . . including Information Theory. Realised I wasn’t smart enough to be a number theorist.

Cambridge (cont.)

PhD (Entropy and Limit Theorems) with Yuri Suhov. Lucky to be exposed to Russian School. Applied for lots of jobs . . . got the last one (JRF at Christ’s College). After 4 years at Christ’s, job at Bristol came up . . . I didn’t get it! 3 more postdoc years later, second time lucky.

Bristol

2006, moved to Bristol as Lecturer in Statistics. Fantastic city, great department. Found interesting new directions. Including collaborating with biologists (a bit) and engineers (a lot!). Lucky to have some great PhD students: Matt, Leo, Dan, Vaia, Tom, Jennifer, Zichen, Chrys.

Section 3: Information theory

Claude Shannon (1916–2001)

It’s rare to talk about maths and science as opportunities to revel in discovery. We speak, instead, about their practical benefits – to society, the economy, our prospects for employment. STEM courses are the means to job security, not joy. Studying them becomes the academic equivalent of eating your vegetables – something valuable, and state sanctioned, but vaguely distasteful.

– from A Mind at Play (2017) by Jimmy Soni and Rob Goodman

Shannon’s 1948 paper

One of the most influential scientific papers ever. 110k citations and counting. Impact comparable with e.g. Einstein’s work on relativity?

What did Shannon do?

from A Mathematical Theory of Communication (1948)

Boole had introduced binary arithmetic (0s and 1s). Shannon realised any information can be represented as series of these ‘bits’. Understood we can compress information down to a limit (entropy). Think about “amount of stuff” a message contains. Remove redundancy. Key Idea: Predictable messages are compressible.
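To make "predictable messages are compressible" concrete, here is a minimal sketch of the entropy of a source of independent bits, assuming each bit is 1 with probability p. The function name `binary_entropy` is illustrative and not from the talk.

```python
from math import log2


def binary_entropy(p):
    """Entropy, in bits per symbol, of a source emitting 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a fully predictable source carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)


for p in (0.5, 0.1, 0.01):
    print(f"p = {p}: about {binary_entropy(p):.3f} bits per symbol")
```

A fair coin (p = 0.5) needs a full bit per symbol, while a source that is almost always 0 needs far less: the more predictable the message, the lower the compression limit.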

We all live in Shannon’s world

Phones

Hard Drives

Broadband

Channels and noise

Shannon also modelled noisy (imperfect) ‘communications channels’. Imagine we send message X :

X = (0 0 1 0 1 0 0 1 ...)

Measurements inaccurate, environment has interference etc. Through noisy channel may receive

Y = (0 1 1 0 1 0 0 0 ...) ≠ X

Could model this as Y = X + Z, where Z is noise (randomly flip bits).

Coping with noise

Shannon realised noise needn’t be a problem. Can make messages longer (deliberately introduce redundancy). e.g. Naively just repeat message 3 times – call this ‘rate 1/3’. Even with some errors know what was (probably) sent. In fact, better strategies available. In general rate is ‘proportion of bits that are message’. How big can rate be?
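The naive rate 1/3 scheme can be sketched as follows. This is an illustration under stated assumptions: the noise model is taken to be independent bit flips (so Y = X + Z becomes an XOR with random noise bits), decoding is by majority vote within each block of three, and the function names are hypothetical.

```python
import random


def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (Y = X xor Z)."""
    return [b ^ (rng.random() < flip_prob) for b in bits]


def encode(bits):
    """Repeat every message bit three times (rate 1/3)."""
    return [b for b in bits for _ in range(3)]


def decode(received):
    """Majority vote over each block of three received bits."""
    return [
        int(sum(received[i:i + 3]) >= 2)
        for i in range(0, len(received), 3)
    ]


rng = random.Random(2018)  # fixed seed so the run is repeatable
message = [0, 0, 1, 0, 1, 0, 0, 1]
received = noisy_channel(encode(message), flip_prob=0.1, rng=rng)
print("sent:   ", message)
print("decoded:", decode(received))
```

A single flipped bit in any block of three is corrected; two or more flips in the same block cause a decoding error, which is why this only works "probably".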

Shannon capacity

Shannon introduced ‘capacity’ C of a channel, gave general formula. Think about “width of a pipe”. Roughly speaking:
1 [Achievability] For any rate < C, there is a strategy so the message gets through.
2 [Converse] For any rate > C, no such strategy exists.
Key Idea: Some problems can’t be solved.
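As one concrete instance of the general formula, consider a channel that flips each bit independently with probability p. This binary symmetric channel is an assumption chosen to match the bit-flip picture of noise; the talk does not single out a particular channel. Its capacity is the standard expression

```latex
\[
  C = 1 - h(p), \qquad h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p).
\]
```

For p = 0.1 this gives C ≈ 0.53, so in principle rates well above the 1/3 of the repetition scheme are achievable.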

Section 4: Group testing

1942/3, US wanted to test all men joining army for disease; potentially very expensive. Disease rare: test outcomes known with high probability (predictable). Dorfman’s idea: pool blood from a group of people, test it together. If disease present in any blood sample, test is positive. If no disease present, test is negative.

Standard noiseless group testing

         Items 1 to 8       Outcome
Test 1   1 1 1 1 0 0 0 0    Positive
Test 2   0 0 0 0 1 1 1 1    Positive
Test 3   1 1 0 0 0 0 0 0    Negative
Test 4   0 0 1 0 0 0 0 0    Positive
Test 5   0 0 0 0 1 1 0 0    Positive
Test 6   0 0 0 0 1 0 0 0    Positive

Represent pooling strategy via binary test matrix. Rows are tests, columns are people (‘items’). Put a 1 if item is in test. Red denotes having the disease (‘being defective’) – rare.

In practice

         Items 1 to 8       Outcome
Test 1   1 1 1 1 0 0 0 0    Positive
Test 2   0 0 0 0 1 1 1 1    Positive
Test 3   1 1 0 0 0 0 0 0    Negative
Test 4   0 0 1 0 0 0 0 0    Positive
Test 5   0 0 0 0 1 1 0 0    Positive
Test 6   0 0 0 0 1 0 0 0    Positive

Want to find defective items in as few tests as possible.
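The noiseless testing rule (a test is positive exactly when it contains at least one defective item) can be sketched as below. The defective set {3, 5} is an assumption: the colouring that marked defectives on the slide is lost in this text, and {3, 5} is the smallest set consistent with the six listed outcomes.

```python
# Rows are tests, columns are items 1 to 8; 1 means "item is in the test".
TESTS = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
DEFECTIVE = {3, 5}  # assumed defective items, numbered from 1


def outcomes(matrix, defective):
    """Return True (positive) for each test containing a defective item."""
    return [any(row[item - 1] for item in defective) for row in matrix]


for number, positive in enumerate(outcomes(TESTS, DEFECTIVE), start=1):
    print(f"Test {number}: {'Positive' if positive else 'Negative'}")
```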

Why should we care?

It’s fun! Applications in DNA testing for rare genetic conditions, counting disease prevalence, cognitive radio, data forensics, database management, and many more.

My work on group testing algorithms

Been thinking how to find defective items. Describe work with Aldridge, Baldassini, Scarlett. Give computationally feasible algorithm (recipe) . . . with provable performance guarantee. Find defective items from test matrix and outcomes only. Negative test . . . all items in it are non-defective. Positive test with one item in . . . that item is defective.

How well can we do in general?

Suppose N items with K defective, often consider $K \sim N^{\theta}$. How well can we do with T tests?

Theorem (BJA 2013 – ‘strong counting bound’) Any matrix and any algorithm has success probability satisfying

\[ \mathbb{P}(\mathrm{suc}) \le \frac{2^{T}}{\binom{N}{K}}. \]

Hence (folklore) need $T \gtrsim \log_2 \binom{N}{K}$ tests (‘magic number’). Information theory argument gives this too. Need this many tests to learn $\log_2 \binom{N}{K}$ bits of information.
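The counting bound is easy to evaluate numerically. The sketch below uses hypothetical helper names and applies the bound to the two examples from earlier in the talk: 7 snacks with 1 poisoned, and 500 snacks with 10 poisoned.

```python
from math import ceil, comb, log2


def counting_bound_tests(n_items, n_defective):
    """Smallest T for which 2^T is at least the number of defective sets."""
    return ceil(log2(comb(n_items, n_defective)))


def success_upper_bound(n_items, n_defective, n_tests):
    """Counting bound on success probability: 2^T / C(N, K), capped at 1."""
    return min(1.0, 2 ** n_tests / comb(n_items, n_defective))


print(counting_bound_tests(7, 1))     # the snack puzzle
print(counting_bound_tests(500, 10))  # the larger problem
print(success_upper_bound(7, 1, 2))   # two students cannot always succeed
```

For the snack puzzle the bound gives 3 tests, matching the three students used; for 500 snacks with 10 poisoned it gives 68, far fewer than testing each snack separately.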

DD algorithm (ABJ 2014): Example Stage 1

         Items 1 to 7     Outcome
Test 1   1 0 1 0 0 1 0    Negative
Test 2   1 1 0 1 0 0 1    Positive
Test 3   1 0 0 0 1 0 0    Negative
Test 4   0 1 1 0 1 1 0    Positive
Test 5   1 0 1 1 0 1 0    Positive

First, look at negative tests. Test 1 is negative, so items 1,3,6 are non-defective. Test 3 is negative, so items 1,5 are non-defective. Hence items 2,4,7 are possible defectives (PDs). This is the (pre-existing) COMP algorithm.
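Stage 1 can be sketched directly from the example: every item that appears in a negative test is cleared, and whatever remains is the set of possible defectives. The function name `possible_defectives` is illustrative.

```python
# Example test matrix: rows are tests 1 to 5, columns are items 1 to 7.
MATRIX = [
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 0],
]
OUTCOMES = [False, True, False, True, True]  # True means a positive test


def possible_defectives(matrix, outcomes):
    """Items (numbered from 1) that never appear in a negative test."""
    cleared = {
        j
        for row, positive in zip(matrix, outcomes)
        if not positive
        for j, in_test in enumerate(row)
        if in_test
    }
    return [j + 1 for j in range(len(matrix[0])) if j not in cleared]


print(possible_defectives(MATRIX, OUTCOMES))
```

COMP stops here and declares every possible defective to be defective, so it can never miss a defective item but may raise false alarms.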

DD example: Stage 2

         Item 2  Item 4  Item 7   Outcome
Test 2     1       1       1      Positive
Test 4     1       0       0      Positive
Test 5     0       1       0      Positive

Restrict to submatrix corresponding to the PD set. Test 4 is positive with one PD item in, so item 2 is defective. Test 5 is positive with one PD item in, so item 4 is defective. Up to this point, we are definitely correct.

DD example: Stage 3

         Item 2  Item 4  Item 7   Outcome
Test 2     1       1       1      Positive
Test 4     1       0       0      Positive
Test 5     0       1       0      Positive

Don’t know about item 7. Item 7 masked by defective items. Arbitrarily, make it non-defective (sparsity grounds). Probably the obvious algorithm.
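Putting the three stages together gives the following minimal sketch of DD, run on the example matrix. Names are illustrative, and this is a straightforward reading of the three stages described in the example rather than the authors' reference implementation.

```python
def dd_decode(matrix, outcomes):
    """Return the items (numbered from 1) declared defective by DD."""
    n_items = len(matrix[0])

    # Stage 1: anything in a negative test is definitely non-defective.
    possible = set(range(n_items))
    for row, positive in zip(matrix, outcomes):
        if not positive:
            possible -= {j for j, in_test in enumerate(row) if in_test}

    # Stage 2: a positive test containing exactly one possible defective
    # proves that item is defective.
    definite = set()
    for row, positive in zip(matrix, outcomes):
        if positive:
            candidates = [j for j in possible if row[j]]
            if len(candidates) == 1:
                definite.add(candidates[0])

    # Stage 3: every remaining item is declared non-defective.
    return sorted(j + 1 for j in definite)


MATRIX = [
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 0],
]
OUTCOMES = [False, True, False, True, True]  # True means a positive test

print(dd_decode(MATRIX, OUTCOMES))
```

On the example this returns items 2 and 4. Item 7 appears only in test 2, alongside both definite defectives, so it is masked and falls into Stage 3. DD can therefore miss a defective item but never raises a false alarm.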

How to choose the matrix?

DD should work for any matrix, given enough tests. Don’t really know how to choose good test matrix . . . in fact a sensible random matrix performs well on average. With K defectives, Bernoulli(1/K) matrix does well (ABJ 2014). Constant column weight even better (AJS 2018).
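Both random designs can be generated in a few lines. This is a sketch under assumptions: the Bernoulli design places each item in each test independently with probability 1/K, and the constant column weight design here puts each item in a fixed number of distinct tests chosen uniformly at random. The `weight` parameter and the sampling without replacement are illustrative choices, not details taken from the talk.

```python
import random


def bernoulli_matrix(n_tests, n_items, n_defective, rng):
    """Each entry is 1 independently with probability 1 / n_defective."""
    p = 1 / n_defective
    return [
        [int(rng.random() < p) for _ in range(n_items)]
        for _ in range(n_tests)
    ]


def constant_column_weight_matrix(n_tests, n_items, weight, rng):
    """Each item (column) is placed in exactly `weight` distinct tests."""
    matrix = [[0] * n_items for _ in range(n_tests)]
    for j in range(n_items):
        for t in rng.sample(range(n_tests), weight):
            matrix[t][j] = 1
    return matrix


rng = random.Random(2018)  # fixed seed so the run is repeatable
for row in constant_column_weight_matrix(n_tests=6, n_items=8, weight=2, rng=rng):
    print(*row)
```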

Empirical (simulation) performance – from AJS 2018

Theoretical performance – capacity and rate

Want theoretical understanding of this. Analogy with Shannon’s channel work applies. Think of ‘set of defectives’ as message we want to recover. Shannon tells us when we can’t do this (not enough tests).

Definition
Rate of algorithm = $\log_2 \binom{N}{K} / T$ (bits learned per test). Constant C is group testing capacity if:
1 [Achievability] For any rate < C there is a matrix and an algorithm with $\mathbb{P}(\mathrm{suc}) \to 1$,
2 [Converse] For any rate > C all matrices and all algorithms have $\mathbb{P}(\mathrm{suc}) \not\to 1$.

Want a formula like Shannon’s to tell us the capacity.

Theoretical performance – lower bound

Theorem
With $K = N^{\theta}$, the DD algorithm has $\mathbb{P}(\mathrm{suc}) \to 1$ using
1 [ABJ 2014] Bernoulli(1/K) matrix when

\[ \text{rate} < 0.53 \min\left\{ 1, \frac{1-\theta}{\theta} \right\}. \]

2 [AJS 2018] Constant column weight matrix when

\[ \text{rate} < 0.69 \min\left\{ 1, \frac{1-\theta}{\theta} \right\}. \]

Hence capacity must be at least this big.
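The rate and the achievable-rate bounds can be evaluated as in the sketch below. Function names are illustrative, the constants 0.53 and 0.69 are the rounded values quoted above, and the bounds are asymptotic statements, so the numbers are only indicative for finite problems.

```python
from math import comb, log2


def rate(n_items, n_defective, n_tests):
    """Bits learned per test: log2 C(N, K) / T."""
    return log2(comb(n_items, n_defective)) / n_tests


def dd_achievable_rate(theta, constant):
    """DD succeeds below constant * min(1, (1 - theta) / theta)."""
    return constant * min(1.0, (1 - theta) / theta)


print(f"snack puzzle rate: {rate(7, 1, 3):.3f}")
for theta in (0.25, 0.5, 0.75):
    bernoulli = dd_achievable_rate(theta, 0.53)
    constant_weight = dd_achievable_rate(theta, 0.69)
    print(f"theta = {theta}: Bernoulli {bernoulli:.3f}, constant weight {constant_weight:.3f}")
```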

Theoretical performance – upper bound

Theorem
With $K = N^{\theta}$, the best possible algorithm has $\mathbb{P}(\mathrm{suc}) \not\to 1$ using
1 [ABJ 2014] Bernoulli(1/K) matrix when

\[ \text{rate} > 0.53 \left( \frac{1-\theta}{\theta} \right). \]

2 [AJS 2018] Constant column weight matrix when

\[ \text{rate} > 0.69 \left( \frac{1-\theta}{\theta} \right). \]

Hence (for these matrices) DD is essentially optimal for θ > 1/2.
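A quick numerical check shows why the two theorems pin down DD for θ > 1/2: in that range (1 − θ)/θ is below 1, so the minimum in the lower bound is inactive and the two expressions coincide. The sketch uses illustrative function names and the rounded constant 0.69 for the constant column weight design.

```python
def dd_lower_bound(theta, constant):
    """Rate below which DD succeeds."""
    return constant * min(1.0, (1 - theta) / theta)


def converse_upper_bound(theta, constant):
    """Rate above which no algorithm succeeds with this matrix design."""
    return constant * ((1 - theta) / theta)


for theta in (0.25, 0.5, 0.75, 0.9):
    lower = dd_lower_bound(theta, 0.69)
    upper = converse_upper_bound(theta, 0.69)
    status = "bounds match" if lower == upper else "gap remains"
    print(f"theta = {theta}: lower {lower:.3f}, upper {upper:.3f} ({status})")
```

For θ < 1/2 the upper bound exceeds the lower bound, so whether a better algorithm exists for these matrices is not settled by these two results alone.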

Rate bounds (from AJS 2018 paper)

Big open questions: Is there a better matrix? Better practical algorithms? What about noise? (JS 2018). See forthcoming monograph, AJS 2019 (?).

Thank you for listening. Enjoy the nibbles.
