Human or Cylon? Group testing on the Battlestar Galactica

Christopher R. Bilder
Department of Statistics
University of Nebraska-Lincoln
www.chrisbilder.com
[email protected]

Outline
• BSG
• Basics
• Estimation
• Identification
• Covariates
• NIH grant


Battlestar Galactica
• Statistics and Battlestar Galactica
• The story so far…
  – Video
• Cylons
  – Centurion
  – Humanoid form (new)
• How can you distinguish a human from a Cylon?
• Dr. Gaius Baltar
  – Asked to develop a Cylon detector
• Season 1's "Bastille Day" episode
  – Number of Cylons in the fleet is expected to be small
  – 47,905 individuals to test!

Battlestar Galactica
• Dr. Gaius Baltar (continued)
  – Season 1's "Tigh Me Up, Tigh Me Down" episode
  – Video
  – (47,905 blood tests) × (11 hours each) = 526,955 hours ≈ 21,956 days
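The arithmetic behind the 21,956-day figure can be checked in a few lines of Python. This is only an illustrative sketch: the function name `individual_testing_days` is made up for this example, and it assumes the tests are run strictly one after another.

```python
def individual_testing_days(n_individuals: int, hours_per_test: float) -> float:
    """Total time, in days, to test everyone one at a time (no parallel tests)."""
    total_hours = n_individuals * hours_per_test
    return total_hours / 24


days = individual_testing_days(47_905, 11)
print(f"{days:,.0f} days, or roughly {days / 365.25:.0f} years")
```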

Battlestar Galactica
• Individual testing
  [Figure: each individual receives a separate test result, "+ or -"]
• Problems:
  – Time
  – Limited resources
• Group testing
  [Figure: individuals are pooled and each group receives one test result, "+ or -"]
• If a GROUP is negative, then all 4 individuals are not Cylons
• If the GROUP is positive, then at least ONE of the 4 individuals is a Cylon
  – "Retesting" can be done to determine who is a Cylon
• Group testing (continued)
  – Time savings
  – Save resources
  – Strategy works well when prevalence of a "trait" is small
    • If prevalence is large, all groups may test positive

Other examples
• Screening blood donations
  – American Red Cross uses groups of size 16
  – HIV, Hepatitis B, Hepatitis C, …
  – Screen about 6 million a year
    • Source: Roger Dodd, Executive Director of Blood Services R & D at ARC
    • See Dodd et al. (Transfusion, 2002)
• Drug discovery experiments
  – Screen hundreds of thousands of chemical compounds to look for potentially good ones
  – Remlinger et al. (Technometrics, 2006)
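The earlier point that group testing only pays off when prevalence is small can be illustrated numerically. The sketch below assumes individuals are independent, share a common probability p of having the trait, and are tested without error, so a group of size I tests positive with probability 1 – (1 – p)^I. The function name and the prevalence values are illustrative choices; the group sizes 4 and 16 come from the Cylon and Red Cross examples.

```python
def prob_group_positive(p: float, group_size: int) -> float:
    """P(at least one of `group_size` independent individuals is positive)."""
    return 1.0 - (1.0 - p) ** group_size


for p in (0.001, 0.01, 0.05, 0.20, 0.50):
    small = prob_group_positive(p, 4)
    large = prob_group_positive(p, 16)
    print(f"p = {p:<5}  I = 4: {small:.3f}   I = 16: {large:.3f}")
```

As p grows, nearly every group tests positive and almost everyone has to be retested, which removes the savings.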


Other examples
• Multiple vector transfer design experiments
  – Estimate the probability that an insect vector transfers a pathogen to a plant
  – Swallow (Phytopathology, 1985, 1987)
• Veterinary
  – Bovine viral diarrhea in cattle (Peck, Beef, 2006)
  – Avian pneumovirus (APV) in turkeys (Maherchandani et al., J. Veterinary Diagnostic Investigation, 2004)
• Public health studies
  – Prevalence of HCV (Liu et al., Transfusion, 1997)
  – Prevalence of HIV (Verstraeten et al., Trop. Med. & International Health, 2000)

Notation
• Individual responses
  – Y_ik = 1 if the i-th item in the k-th group has the "trait" (positive) and Y_ik = 0 otherwise (negative), for i = 1, …, I and k = 1, …, K
  – Y_ik are independent Bernoulli(p) random variables
    • p = P(Y_ik = 1)
    • Homogeneous population
    • p can be thought of as the "individual probability" or "prevalence in a population"
  – Y_ik's are not directly observed (at least initially)

Notation
• Group responses
  – Z_k = 1 denotes a positive response and Z_k = 0 denotes a negative response for the k-th group
  – Z_k are independent Bernoulli(θ) random variables
    • θ = P(Z_k = 1)
• Individual and group response relationship
  – Z_k = 1 if and only if Σ_i Y_ik > 0 (at least one item in the group is positive)
  – Z_k = 0 if and only if Σ_i Y_ik = 0 (every item in the group is negative)
• Example random variables
  [Figure: three groups of four individuals; each individual and each group is labeled "+ or −"]


Notation
• Example observed values
  – Group 1: y_11 = 0, y_21 = 0, y_31 = 0, y_41 = 0, so z_1 = 0 (−)
  – Group 2: y_12 = 0, y_22 = 1, y_32 = 0, y_42 = 0, so z_2 = 1 (+)
  – Group 3: y_13 = 0, y_23 = 0, y_33 = 0, y_43 = 1, so z_3 = 1 (+)
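The observed values above follow the rule that a group is positive exactly when at least one of its members is positive. A minimal Python sketch, assuming no testing error and storing each group's individual responses in a list (the names `y` and `group_responses` are illustrative):

```python
# Individual responses y_ik, one inner list per group k (values from the example).
y = [
    [0, 0, 0, 0],  # group 1: y_11, y_21, y_31, y_41
    [0, 1, 0, 0],  # group 2: y_12, y_22, y_32, y_42
    [0, 0, 0, 1],  # group 3: y_13, y_23, y_33, y_43
]


def group_responses(y_by_group):
    """Return z_k = 1 if any individual in group k is positive, else 0."""
    return [int(any(group)) for group in y_by_group]


z = group_responses(y)
print(z)  # [0, 1, 1], matching z_1 = 0, z_2 = 1, z_3 = 1
```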

Purpose
• Prevalence of a trait in a population (estimation problem)
• Which items are positive (identification problem)

Estimate p
• How can we estimate p = P(Y_ik = 1)?
  – We observe information about the groups, not individuals!
  – θ = 1 – P(Y_ik = 0, ∀i) = 1 – (1 – p)^I
  – Then p = 1 – (1 – θ)^(1/I)
  – MLE for p: p̂ = 1 – (1 – θ̂)^(1/I), where θ̂ is the observed proportion of positive groups
• Unequal group sizes
  – Likelihood function: L(p) = ∏_k θ_k^(z_k) (1 – θ_k)^(1 – z_k), where
    θ_k = 1 – (1 – p)^(I_k) = positive probability for group k
    I_k = size of group k
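The two estimation cases can be sketched in Python. With equal group sizes the estimate has the closed form 1 – (1 – θ̂)^(1/I). With unequal sizes the likelihood is maximized numerically. The golden-section search used here is simply one convenient stdlib-only choice for this illustration, not a method taken from the slides, and all function names are hypothetical. Testing error is ignored.

```python
import math


def mle_equal_groups(z, group_size):
    """Closed-form MLE of p when every group has the same size."""
    theta_hat = sum(z) / len(z)  # observed proportion of positive groups
    return 1.0 - (1.0 - theta_hat) ** (1.0 / group_size)


def log_likelihood(p, z, sizes):
    """Sum over groups of z_k*log(theta_k) + (1 - z_k)*log(1 - theta_k)."""
    total = 0.0
    for z_k, size in zip(z, sizes):
        log_all_negative = size * math.log1p(-p)  # log (1 - p)^I_k
        if z_k:
            total += math.log(-math.expm1(log_all_negative))  # log theta_k
        else:
            total += log_all_negative
    return total


def mle_unequal_groups(z, sizes, tol=1e-10):
    """Maximize the log-likelihood over p in (0, 1) by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if log_likelihood(a, z, sizes) < log_likelihood(b, z, sizes):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    return (lo + hi) / 2


z = [0, 1, 1]  # group responses from the earlier example
print(mle_equal_groups(z, 4))
print(mle_unequal_groups(z, [4, 4, 4]))   # should agree with the closed form
print(mle_unequal_groups(z, [2, 4, 6]))   # hypothetical unequal group sizes
```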

Testing error
• What if there is testing error?
  – Can incorporate sensitivity (η) and specificity (δ)

Identification
• Dorfman (Annals of Mathematical Statistics, 1943)
  – Retest all items in a positive group
  – Often credited for the very first use of group testing
• Sterrett (Annals of Mathematical Statistics, 1957)
  – Individually retest until first positive is found
  – Re-group remaining items
    • If group is negative, STOP
    • If group is positive, repeat
  – Expected number of retests is smaller than with Dorfman
• Gupta and Malina (Statistics in Medicine, 1999) provide a summary
  [Figure: group 2 from the earlier example, with y_12 = 0, y_22 = 1, y_32 = 0, y_42 = 0 and z_2 = 1]

Infertility Prevention Program
• U.S. national program funded by Centers for Disease Control and Prevention
  – Assess and reduce prevalence of chlamydia and gonorrhea
• Nebraska
  – Swab or urine specimens are sent to the Nebraska Public Health Laboratory at U. of Nebraska Medical Center
  – NATs
  – About 30,000 individual tests done per year
• Group testing!
• Lindan et al. (J. Clinical Microbiology, 2005)
  – Estimates that 12% of the laboratories in the U.S. are already using group testing
  – Group testing has allowed "laboratories to achieve a significant increase in specimen loads."
• Quarter #1 of 2006, chlamydia testing
  – Urine specimens – 1,384 total
  – Ignore sensitivity and specificity here
  – Individual data: 111/1,384 = 0.0802
  – Group testing:
    • Randomly put known individual responses into groups of size I = 2
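The pooling exercise for the Quarter #1 data can be mimicked in a short simulation. The sketch below rebuilds the 1,384 individual responses from the reported count of 111 positives, shuffles them into groups of size I = 2, and computes the group testing estimate 1 – (1 – θ̂)^(1/I). It also counts how many tests Dorfman-style retesting of every positive group would need. The seed, the function name, and the use of Python's `random` module are choices made for this illustration rather than part of the original analysis, so the printed numbers will not reproduce the slide's results exactly. Testing error is ignored, as on the slide.

```python
import random


def pool_and_estimate(y, group_size, rng):
    """Randomly pool known 0/1 responses, then estimate p from the group results.

    Assumes len(y) is a multiple of group_size, so all groups have equal size.
    """
    shuffled = y[:]
    rng.shuffle(shuffled)
    groups = [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]
    z = [int(any(group)) for group in groups]
    theta_hat = sum(z) / len(z)
    p_hat = 1.0 - (1.0 - theta_hat) ** (1.0 / group_size)
    # Dorfman: one test per group, plus one retest per member of each positive group.
    dorfman_tests = len(groups) + group_size * sum(z)
    return p_hat, dorfman_tests


y = [1] * 111 + [0] * (1384 - 111)
p_hat, n_tests = pool_and_estimate(y, 2, random.Random(2006))
print(f"individual estimate: {111 / 1384:.4f}")
print(f"group testing estimate: {p_hat:.4f}")
print(f"tests needed with Dorfman retesting: {n_tests} (versus 1,384 individual tests)")
```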


Infertility Prevention Program
• Quarter #1 of 2006 (continued)
  – Individual data: 111/1,384 = 0.0802
  – Group testing:
  – Approximate cost per test
    • $16 for urine
    • $11 for swab

Heterogeneous populations
• Individual responses
  – Y_ik are independent Bernoulli(p_ik) random variables
  – p_ik = P(Y_ik = 1) for item i in group k
• Group responses
  – Z_k are independent Bernoulli(θ_k) random variables
  – θ_k = P(Z_k = 1) for group k
• Covariates
  – x_ik1, x_ik2, …, x_ikp for the i-th item in the k-th group
  – Incorporate factors which influence trait status
  – Not really done until recently in group testing!

Kenyan pregnant women study
• Part of the data from Vansteelandt et al. (Biometrics, 2000)
  [Figure: data excerpt showing group responses z_1 = 1 and z_2 = 1]

Heterogeneous populations
• Model
  – logit(p_ik) = β_0 + β_1 x_ik1 + … + β_p x_ikp
• Estimation of β_0, β_1, β_2, …, β_p
  – Note that Y_ik are not directly observed
  – Vansteelandt et al. (Biometrics, 2000)
    • Likelihood function is written in terms of the Z_k
  – Xie (Statistics in Medicine, 2001)
    • Likelihood function is written in terms of the Y_ik
    • EM algorithm used
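To make the idea of a likelihood "in terms of the Z_k" concrete, here is a minimal Python sketch. It assumes no testing error and independent individuals, so that θ_k = 1 – ∏_i (1 – p_ik) with logit(p_ik) given by the model above. The function names, the toy covariate values, and the coefficient values are illustrative assumptions; this is not the code or the exact formulation used by Vansteelandt et al.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def individual_prob(x, beta):
    """p_ik from logit(p_ik) = beta[0] + beta[1]*x[0] + ... for one individual."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return expit(eta)


def group_positive_prob(x_group, beta):
    """theta_k = 1 - prod_i (1 - p_ik) for one group of covariate tuples."""
    prob_all_negative = 1.0
    for x in x_group:
        prob_all_negative *= 1.0 - individual_prob(x, beta)
    return 1.0 - prob_all_negative


def group_log_likelihood(beta, groups_x, z):
    """Log-likelihood of the observed group responses z_k only."""
    total = 0.0
    for x_group, z_k in zip(groups_x, z):
        theta_k = group_positive_prob(x_group, beta)
        total += math.log(theta_k) if z_k == 1 else math.log(1.0 - theta_k)
    return total


# Toy data: two groups of three individuals, one covariate each (made-up values).
groups_x = [[(20.0,), (22.0,), (25.0,)], [(31.0,), (35.0,), (38.0,)]]
z = [1, 0]
print(group_log_likelihood((-1.97, -0.024), groups_x, z))
```

Maximizing this function over β with a general-purpose optimizer would give estimates based on the group responses alone.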

Forming groups
• Alike
  – Individuals with "similar" covariates are put into pools
  – Smallest variability in parameter estimates
  – How to implement?
    • One covariate: Sort by covariate, then assign successive individuals to pools
    • Multiple covariates: ?
  – Usually requires one to have all individual testing specimens up front and available for testing at the same time
• Random
  – Individuals are assigned to pools at random
  – Emulates chronological grouping if there is no dependence over time
• Different
  – Pool individuals with covariates as different as possible
  – Emulates "worst-case scenario" (?)
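The three pooling strategies can be sketched for a single covariate. The "alike" version follows the sort-then-assign recipe stated above, and the "random" version shuffles. The slides do not say how "different" pools are built, so the round-robin dealing used below is purely an assumption for illustration, as are all function names. Each function returns lists of individual indices.

```python
import math
import random


def _chunks(order, group_size):
    """Split an ordering of individuals into successive pools."""
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


def form_groups_alike(x, group_size):
    """Sort by covariate, then assign successive individuals to pools."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    return _chunks(order, group_size)


def form_groups_random(x, group_size, rng):
    """Assign individuals to pools at random."""
    order = list(range(len(x)))
    rng.shuffle(order)
    return _chunks(order, group_size)


def form_groups_different(x, group_size):
    """ASSUMED scheme: deal sorted individuals round-robin across pools,
    so that each pool spans the covariate range."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n_groups = math.ceil(len(order) / group_size)
    return [order[g::n_groups] for g in range(n_groups)]


x = [30, 18, 25, 40, 22, 35]  # made-up covariate values
print(form_groups_alike(x, 2))      # [[1, 4], [2, 0], [5, 3]]
print(form_groups_different(x, 2))  # [[1, 0], [4, 5], [2, 3]]
print(form_groups_random(x, 2, random.Random(0)))
```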

Forming groups
• Simulate data from model fitted to the individual observations in Vansteelandt et al. (Biometrics, 2000)
  – logit(p_ik) = β_0 + β_1 x_ik = –1.97 – 0.024 x_ik
  – Simulate the individual and group responses
    • I = 7 subjects per group
    • K = 100 groups
    • Overall sample size is I × K = 700
• One simulated data set
  [Figure: estimated probability (0.00 to 0.25) versus covariate (15 to 40); curves: True, Individual estimated, Group estimated (alike), Group estimated (random), Group estimated (different)]
• Relative efficiency
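One simulated data set under these settings could be generated along the following lines. The model coefficients, I = 7, and K = 100 come from the slide. The covariate distribution is not stated, so the sketch assumes a Uniform(15, 40) covariate purely because that is the range shown on the figure's axis. The function name, seed, and random (unsorted) grouping are also illustrative choices.

```python
import math
import random


def simulate_group_data(beta0, beta1, n_groups, group_size, rng,
                        x_low=15.0, x_high=40.0):
    """Simulate covariates, individual responses y, and group responses z.

    ASSUMPTION: covariates are Uniform(x_low, x_high); groups are formed in
    the order individuals are generated (i.e. at random).
    """
    groups = []
    for _ in range(n_groups):
        x = [rng.uniform(x_low, x_high) for _ in range(group_size)]
        p = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * x_i))) for x_i in x]
        y = [int(rng.random() < p_i) for p_i in p]
        groups.append({"x": x, "y": y, "z": int(any(y))})
    return groups


data = simulate_group_data(-1.97, -0.024, n_groups=100, group_size=7,
                           rng=random.Random(2000))
n_positive_individuals = sum(sum(g["y"]) for g in data)
n_positive_groups = sum(g["z"] for g in data)
print(f"{n_positive_individuals} of 700 individuals positive")
print(f"{n_positive_groups} of 100 groups positive")
```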


Forming groups
• 100 simulated data sets
• Pearson correlations:
• Last slide examined a fixed I × K
• What if we fix the number of groups (tests), K, instead?
  – Settings
    • logit(p_ik) = –2 + 0.6931 x_ik
    • x_ik ~ Uniform(–7.079, 1.663)
    • 0.001 < p_ik < 0.3
    • Average value of p_ik is 0.02
    • 500 simulated data sets for each simulation
  – Relative efficiency (K = 500):

                 I = 2   I = 5   I = 10
    Alike         2.20    4.62     6.72
    Random        1.61    1.79     1.50
    Different     1.16    0.51     0.22

NIH Grant

• Content removed
