Human or Cylon? Group testing on the Battlestar Galactica

Christopher R. Bilder
Department of Statistics
University of Nebraska-Lincoln
www.chrisbilder.com
[email protected]

Outline: BSG, Basics, Estimation, Identification, Covariates, NIH grant

Slide 1 of 37

Battlestar Galactica
• Statistics and Battlestar Galactica
• The story so far…
  – Video

Slide 2 of 37
Battlestar Galactica
• Statistics and Battlestar Galactica
• The story so far…
  – Video
• Cylons
  – Centurion
  – Humanoid form (new)
• How can you distinguish a human from a Cylon?

Slide 3 of 37

Battlestar Galactica
• Dr. Gaius Baltar
  – Asked to develop a Cylon detector
• Season 1's "Bastille Day" episode
  – # of Cylons in fleet is expected to be small
  – 47,905 individuals to test!

Slide 4 of 37

Battlestar Galactica
• Dr. Gaius Baltar (continued)
  – Season 1's "Tigh Me Up, Tigh Me Down"
  – Video
  – (47,905 blood tests) × (11 hours each) = 21,956 days

Slides 5 and 6 of 37
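As a quick check of the arithmetic on the slide, the steps can be written out as below. This assumes the tests are run one after another around the clock, which the slide does not state explicitly; the conversion to years is an added illustration.

```latex
\begin{align*}
47{,}905 \text{ tests} \times 11 \text{ hours per test} &= 526{,}955 \text{ hours} \\
526{,}955 \text{ hours} \div 24 \text{ hours per day} &\approx 21{,}956 \text{ days} \\
21{,}956 \text{ days} \div 365.25 \text{ days per year} &\approx 60 \text{ years}
\end{align*}
```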
Battlestar Galactica
• Individual testing
  [Figure: each individual is tested separately, with result "+ or −"]
• Problems:
  – Time
  – Limited resources

Slide 7 of 37

Battlestar Galactica
• Group testing
  [Figure: individuals are pooled into groups and each group is tested once, with result "+ or −"]
• If a GROUP is negative, then none of the 4 individuals is a Cylon
• If the GROUP is positive, then at least ONE of the 4 individuals is a Cylon
  – "Retesting" can be done to determine who is a Cylon

Slide 8 of 37

Battlestar Galactica
• Group testing (continued)
  – Time savings
  – Save resources
  – Strategy works well when prevalence of a "trait" is small
    • If prevalence is large, all groups may test positive

Slide 9 of 37

Other examples
• Screening blood donations
  – American Red Cross uses groups of size 16
  – HIV, Hepatitis B, Hepatitis C, …
  – Screen about 6 million a year
    • Source: Roger Dodd, Executive Director of Blood Services R & D at ARC
    • See Dodd et al. (Transfusion, 2002)
• Drug discovery experiments
  – Screen hundreds of thousands of chemical compounds to look for potentially good ones
  – Remlinger et al. (Technometrics, 2006)

Slide 10 of 37
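To illustrate why group testing pays off mainly when prevalence is small, the sketch below computes the probability that a group tests positive and the expected number of tests per individual when every member of a positive group is retested. It assumes a perfect test, independent individuals with a common prevalence p, and equal group sizes. The function names are illustrative and do not come from the talk.

```python
def prob_group_positive(p: float, group_size: int) -> float:
    """P(group tests positive) when each member is positive with probability p."""
    return 1.0 - (1.0 - p) ** group_size


def expected_tests_per_individual(p: float, group_size: int) -> float:
    """Expected tests per individual: one group test shared by the group,
    plus an individual retest of every member whenever the group is positive."""
    return 1.0 / group_size + prob_group_positive(p, group_size)


if __name__ == "__main__":
    # Hypothetical prevalences, groups of size 4 as in the Cylon example
    for p in (0.01, 0.05, 0.30):
        theta = prob_group_positive(p, 4)
        cost = expected_tests_per_individual(p, 4)
        print(f"p={p:.2f}  P(group +)={theta:.3f}  tests per individual={cost:.3f}")
```

A value below 1 means fewer tests than individual testing on average; a value above 1 means group testing with retesting would need more tests.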
Other examples
• Multiple vector transfer design experiments
  – Estimate probability an insect vector transfers a pathogen to a plant
  – Swallow (Phytopathology, 1985, 1987)
• Veterinary
  – Bovine viral diarrhea in cattle (Peck, Beef, 2006)
  – Avian pneumovirus (APV) in turkeys (Maherchandani et al., J. Veterinary Diagnostic Investigation, 2004)
• Public health studies
  – Prevalence of HCV (Liu et al., Transfusion, 1997)
  – Prevalence of HIV (Verstraeten et al., Trop. Med. & International Health, 2000)

Slide 11 of 37

Notation
• Individual responses
  – $Y_{ik} = 1$ if the $i$th item in the $k$th group has the "trait" (positive) and $Y_{ik} = 0$ otherwise (negative), for $i = 1, \ldots, I$ and $k = 1, \ldots, K$
  – $Y_{ik}$ are independent Bernoulli($p$) random variables
    • $p = P(Y_{ik} = 1)$
    • Homogeneous population
    • $p$ can be thought of as the "individual probability" or "prevalence in a population"
  – The $Y_{ik}$ are not directly observed (at least initially)

Slide 12 of 37

Notation
• Group responses
  – $Z_k = 1$ denotes a positive response and $Z_k = 0$ denotes a negative response for the $k$th group
  – $Z_k$ are independent Bernoulli($\theta$) random variables
    • $\theta = P(Z_k = 1)$
• Individual and group response relationship
  – $Z_k = 1$ if and only if $\sum_{i=1}^{I} Y_{ik} > 0$
  – $Z_k = 0$ if and only if $\sum_{i=1}^{I} Y_{ik} = 0$

Slide 13 of 37

Notation
• Example random variables
  [Figure: groups of individuals, where each individual and each group has an unknown "+ or −" response]

Slide 14 of 37
Notation
• Example observed values

  Group k   y1k   y2k   y3k   y4k   zk
  1         0     0     0     0     0 (−)
  2         0     1     0     0     1 (+)
  3         0     0     0     1     1 (+)

Slides 15 and 16 of 37

Purpose
• Prevalence of a trait in a population (estimation problem)
• Which items are positive (identification problem)

Slide 17 of 37

Estimate p
• How can we estimate $p = P(Y_{ik} = 1)$?
  – We observe information about the groups, not individuals!
  – $\theta = 1 - P(Y_{ik} = 0, \forall i) = 1 - (1 - p)^I$
  – Then $p = 1 - (1 - \theta)^{1/I}$
  – MLE for $p$: $\hat{p} = 1 - (1 - \hat{\theta})^{1/I}$, where $\hat{\theta} = \frac{1}{K} \sum_{k=1}^{K} z_k$
• Unequal group sizes
  – Likelihood function: $L(p) = \prod_{k=1}^{K} \theta_k^{z_k} (1 - \theta_k)^{1 - z_k}$, where
    $\theta_k = 1 - (1 - p)^{I_k}$ = positive probability for group $k$
    $I_k$ = size of group $k$

Slide 18 of 37
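The estimation steps above can be sketched in a few lines. The sketch assumes a perfect test and the usual Bernoulli likelihood for the group responses, with each group's positive probability equal to 1 − (1 − p) raised to the group size. The function names, the grid search, and the unequal group sizes in the demo are illustrative assumptions, not the speaker's method.

```python
import math


def mle_equal_sizes(z, group_size):
    """Closed-form MLE of p when every group has the same size."""
    theta_hat = sum(z) / len(z)  # proportion of positive groups
    return 1.0 - (1.0 - theta_hat) ** (1.0 / group_size)


def log_likelihood(p, z, sizes):
    """Log-likelihood of p given 0/1 group responses z and group sizes."""
    total = 0.0
    for z_k, size_k in zip(z, sizes):
        if z_k == 1:
            # log(theta_k) with theta_k = 1 - (1 - p)^size_k
            total += math.log1p(-((1.0 - p) ** size_k))
        else:
            # log(1 - theta_k) = size_k * log(1 - p)
            total += size_k * math.log(1.0 - p)
    return total


def mle_grid(z, sizes, steps=10000):
    """Crude grid search over (0, 1) for the maximizer; for illustration only."""
    grid = [j / steps for j in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, z, sizes))


if __name__ == "__main__":
    z = [0, 1, 1]  # group responses from the "Example observed values" slide
    print(mle_equal_sizes(z, 4))
    print(mle_grid(z, [4, 4, 4]))  # should agree closely with the closed form
    print(mle_grid(z, [4, 3, 5]))  # hypothetical unequal group sizes
```

With equal group sizes the grid search and the closed form should agree to within the grid spacing; a proper analysis would replace the grid with a numerical optimizer.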
Testing error
• What if there is testing error?
  – Can incorporate sensitivity ($\eta$) and specificity ($\delta$)

Slide 19 of 37

Identification
• Dorfman (Annals of Mathematical Statistics, 1943)
  – Retest all items in a positive group
  – Often credited with the very first use of group testing
  [Figure: positive group with z2 = 1 and members y12 = 0, y22 = 1, y32 = 0, y42 = 0]
• Sterrett (Annals of Mathematical Statistics, 1957)
  – Individually retest until first positive is found
  – Re-group remaining items
    • If group is negative, STOP
    • If group is positive, repeat
  – Expected number of retests is smaller than for Dorfman
• Gupta and Malina (Statistics in Medicine, 1999) provide a summary

Slide 20 of 37

Infertility Prevention Program
• U.S. national program funded by Centers for Disease Control and Prevention
  – Assess and reduce prevalence of chlamydia and gonorrhea
• Nebraska
  – Swab or urine specimens are sent to the Nebraska Public Health Laboratory at U. of Nebraska Medical Center
  – NATs
  – About 30,000 individual tests done per year
• Group testing!

Slide 21 of 37

Infertility Prevention Program
• Lindan et al. (J. Clinical Microbiology, 2005)
  – Estimates that 12% of the laboratories in the U.S. are already using group testing
  – Group testing has allowed "laboratories to achieve a significant increase in specimen loads."
• Quarter #1 of 2006, chlamydia testing
  – Urine specimens: 1,384 total
  – Ignore sensitivity and specificity here
  – Individual data: 111/1,384 = 0.0802
  – Group testing:
    • Randomly put known individual responses into groups of size I = 2

Slide 22 of 37
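The random-grouping exercise described above can be mimicked as follows. This is a minimal sketch that assumes a perfect test and uses only the two counts given on the slide (1,384 specimens, 111 positive). The function name and the seed are hypothetical, and the estimate depends on the random grouping, so it will not necessarily match the figure reported in the talk.

```python
import random


def estimate_from_random_groups(n_total, n_positive, group_size, seed=0):
    """Randomly pool known 0/1 responses and estimate prevalence from the pools."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    y = [1] * n_positive + [0] * (n_total - n_positive)
    rng.shuffle(y)
    groups = [y[j:j + group_size] for j in range(0, n_total, group_size)]
    # A group is positive when at least one member is positive
    z = [1 if sum(g) > 0 else 0 for g in groups]
    theta_hat = sum(z) / len(z)
    p_hat = 1.0 - (1.0 - theta_hat) ** (1.0 / group_size)
    return len(z), p_hat


if __name__ == "__main__":
    n_groups, p_hat = estimate_from_random_groups(1384, 111, 2)
    print(n_groups, round(p_hat, 4))
    print(round(111 / 1384, 4))  # individual-data estimate for comparison
```

Only half as many tests are needed for the grouped estimate, which is the point of the comparison with the individual-data estimate.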
Infertility Prevention Program
• Quarter #1 of 2006 (continued)
  – Individual data: 111/1,384 = 0.0802
  – Group testing:
  – Approximate cost per test
    • $16 for urine
    • $11 for swab

Slide 23 of 37

Heterogeneous populations
• Individual responses
  – $Y_{ik}$ are independent Bernoulli($p_{ik}$) random variables
  – $p_{ik} = P(Y_{ik} = 1)$ for item $i$ in group $k$
• Group responses
  – $Z_k$ are independent Bernoulli($\theta_k$) random variables
  – $\theta_k = P(Z_k = 1)$ for group $k$
• Covariates
  – $x_{ik1}, x_{ik2}, \ldots, x_{ikp}$ for the $i$th item in the $k$th group
  – Incorporate factors which influence trait status
  – Not really done until recently in group testing!

Slide 24 of 37

Kenyan pregnant women study
• Part of the data from Vansteelandt et al. (Biometrics, 2000)
  [Figure: excerpt of the data, showing group responses z1 = 1 and z2 = 1]

Slide 25 of 37

Heterogeneous populations
• Model
  – logit($p_{ik}$) = $\beta_0 + \beta_1 x_{ik1} + \cdots + \beta_p x_{ikp}$
• Estimation of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$
  – Note that $Y_{ik}$ are not directly observed
  – Vansteelandt et al. (Biometrics, 2000)
    • Likelihood function is written in terms of the $Z_k$
  – Xie (Statistics in Medicine, 2001)
    • Likelihood function is written in terms of the $Y_{ik}$
    • EM algorithm used

Slide 26 of 37
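A likelihood written in terms of the group responses can be sketched as below. It assumes a perfect test and independent individuals, so that a group is negative only when every member is negative. This is an illustration of the general idea, not the code or exact formulation of either cited paper; the function names and the demo data are hypothetical, and maximizing over the coefficients is left to a numerical optimizer.

```python
import math


def individual_prob(beta, x):
    """Inverse logit of beta[0] + beta[1]*x[0] + ... for one individual."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def group_log_likelihood(beta, groups, z):
    """Log-likelihood of beta in terms of the 0/1 group responses z.

    groups[k] is a list of covariate vectors, one per member of group k.
    """
    total = 0.0
    for members, z_k in zip(groups, z):
        # log P(every member negative) = sum over members of log(1 - p_ik)
        log_all_neg = sum(math.log(1.0 - individual_prob(beta, x)) for x in members)
        if z_k == 1:
            # log(theta_k) = log(1 - P(every member negative))
            total += math.log(-math.expm1(log_all_neg))
        else:
            total += log_all_neg
    return total


if __name__ == "__main__":
    # Hypothetical data: two groups of two individuals, one covariate each
    groups = [[(20.0,), (25.0,)], [(30.0,), (35.0,)]]
    z = [1, 0]
    print(group_log_likelihood((-2.0, 0.01), groups, z))
```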
Forming groups
• Alike
  – Individuals with "similar" covariates are put into pools
  – Smallest variability in parameter estimates
  – How to implement?
    • One covariate: Sort by covariate, then assign successive individuals to pools
    • Multiple covariates: ?
  – Usually requires one to have all individual testing specimens up front and available for testing at the same time

Slide 27 of 37

Forming groups
• Random
  – Individuals are assigned to pools at random
  – Emulates chronological pooling if there is no dependence over time
• Different
  – Pool individuals with covariates as different as possible
  – Emulates "worst case scenario" (?)

Slide 28 of 37

Forming groups
• Simulate data from model fitted to the individual observations in Vansteelandt et al. (Biometrics, 2000)
  – logit($p_{ik}$) = $\beta_0 + \beta_1 x_{ik} = -1.97 - 0.024 x_{ik}$
  – Simulate the individual and group responses
    • $I = 7$ subjects per group
    • $K = 100$ groups
    • Overall sample size is $I \times K = 700$

Slide 29 of 37

Forming groups
• One simulated data set
• Relative efficiency
  [Figure: estimated probability (0.00 to 0.25) versus covariate (15 to 40); curves: True, Individual estimated, Group estimated (alike), Group estimated (random), Group estimated (different)]

Slide 30 of 37
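The simulation setup and the three grouping strategies can be sketched as follows. Several pieces are assumptions made for illustration: the covariate is drawn uniformly on 15 to 40 (the range of the plot axis), the "different" rule deals sorted individuals out to the groups in turn (one possible reading of "as different as possible"), the test is perfect, and all function names and the seed are hypothetical. The sample size is assumed to be a multiple of the group size.

```python
import math
import random


def simulate_individuals(n, beta0, beta1, rng, x_low=15.0, x_high=40.0):
    """Draw (x, y) pairs with logit(P(y = 1)) = beta0 + beta1 * x.

    The Uniform(x_low, x_high) covariate distribution is an assumption.
    """
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


def form_groups(data, group_size, method, rng):
    """Split (x, y) pairs into groups: 'alike', 'random', or 'different'."""
    n_groups = len(data) // group_size
    if method == "random":
        items = data[:]
        rng.shuffle(items)
        return [items[k * group_size:(k + 1) * group_size] for k in range(n_groups)]
    items = sorted(data, key=lambda pair: pair[0])
    if method == "alike":
        # Successive individuals in covariate order share a group
        return [items[k * group_size:(k + 1) * group_size] for k in range(n_groups)]
    if method == "different":
        # Deal sorted individuals out in turn so each group spans the range
        return [items[k::n_groups] for k in range(n_groups)]
    raise ValueError(f"unknown method: {method}")


def group_responses(groups):
    """A group is positive when at least one member is positive."""
    return [1 if any(y for _, y in g) else 0 for g in groups]


if __name__ == "__main__":
    rng = random.Random(2000)  # arbitrary seed
    data = simulate_individuals(700, -1.97, -0.024, rng)
    for method in ("alike", "random", "different"):
        z = group_responses(form_groups(data, 7, method, rng))
        print(method, len(z), sum(z))
```

Fitting the model to each set of group responses and comparing the estimates would be the next step; that part is not shown here.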
Forming groups
• 100 simulated data sets
• Pearson correlations:

Slide 31 of 37

Forming groups
• Last slide examined a fixed $I \times K$
• What if we fix the number of groups (tests), $K$, instead?
  – Settings
    • logit($p_{ik}$) = $-2 + 0.6931 x_{ik}$
    • $x_{ik} \sim$ Uniform(−7.079, 1.663)
    • $0.001 < p_{ik} < 0.3$
    • Average value of $p_{ik}$ is 0.02
    • 500 simulated data sets for each simulation
  – Relative efficiency (K = 500):

    Grouping    I = 2   I = 5   I = 10
    Alike       2.20    4.62    6.72
    Random      1.61    1.79    1.50
    Different   1.16    0.51    0.22

Slide 32 of 37

NIH Grant
• Content removed

Slides 33 to 36 of 37

Human or Cylon? Group testing on the Battlestar Galactica

Slide 37 of 37