BOSTON UNIVERSITY
SCHOOL OF MEDICINE
Thesis
CHARACTERIZATION OF ERROR TRADEOFFS IN
HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY
THRESHOLD FOR DNA MIXTURE INTERPRETATION
By
JACOB SAMUEL GORDON
A.B., Harvard University, 2005
Submitted in partial fulfillment of the
requirements for the degree of
Master of Science
2012
© Copyright by JACOB SAMUEL GORDON 2012
Approved by
First Reader: Catherine M. Grgicak, M.S.F.S., Ph.D., Instructor, Biomedical Forensic Sciences
Second Reader: Robin W. Cotton, Ph.D., Associate Professor, Biomedical Forensic Sciences
ACKNOWLEDGEMENTS
Catherine, you made it your personal mission to ensure that this thesis was a
success. You were willing to take me on as your mentee with a compressed timeline,
introduced me to an engaging problem that I enjoyed attacking, and accommodated many
post-work conferences. Thank you for helping me finish and for providing me with
direction throughout the last couple months.
Dr. Cotton, I have received nothing short of endless support since you became my academic adviser. You helped steer me through the program and are always thinking of interesting people with whom I should speak and churning away at ideas to help guide my future. Thank you for helping me refine my thesis thoughts down the stretch.
Mom and Dad, I truly would have nothing without you. You have given me everything of yours and sacrificed throughout my life so that I never missed out on anything. I have learned every important skill and life strategy from you and have not achieved anything without your endless support, calming patience, guidance, and fajitas.
Melissa and Amy, you have always been blindly loyal to me through triumphs and setbacks, and the only times you haven’t been in my corner were when you were battling me yourselves to ensure that I was imbued with just the right amount of pervasive humbling. You are my best friends.
Because of you four, I am the happiest guy I know.
CHARACTERIZATION OF ERROR TRADEOFFS IN
HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY
THRESHOLD FOR DNA MIXTURE INTERPRETATION
JACOB SAMUEL GORDON
Boston University School of Medicine, 2012
Major Professor: Catherine M. Grgicak, M.S.F.S., Ph.D., Instructor, Biomedical Forensic Sciences
ABSTRACT
DNA analysts considering a forensic evidence sample and a reference sample
(e.g., from a suspect) have three options when rendering a decision regarding the consistency between the samples: exclusion, inclusion, or inconclusive. Complicating this determination is the reality that DNA profiles originating from forensic evidence may not be fully observed due to allelic drop-out and/or the presence of overlapping alleles.
Different analyst inclinations and laboratory standards exist for informing an analyst’s decision; typically—and particularly for samples demonstrating some degree of allelic drop-out—reference samples exhibiting less than exactly 100% allelic overlap are not automatically precluded from inclusion in an evidence sample. In tolerating some measure of absence of a reference sample’s alleles in an evidence sample, the potential for two kinds of errors exists: In a case in which an individual could not have contributed to an evidence sample, there is the potential for false inclusion; in a case in which an individual could have contributed, there is the potential for false exclusion. In selecting a particular decision criterion to inform determinations of inclusion or exclusion, a tradeoff
between these antagonistic errors exists. A lax decision criterion minimizes false exclusions at the expense of false inclusions while a strict criterion eschews false inclusions at the expense of greater numbers of false exclusions. The relevance of a decision criterion is greatest for low-template samples and for samples that are mixtures of multiple contributors since both are likely to experience allelic drop-out and thus to occupy a potential gray area between certain exclusion and likely inclusion.
In this study, databases of simulated mixtures and laboratory mixtures are compared with databases of simulated excluded and included individuals. In order to generate credible genetic profiles, the phenomena of allelic drop-out and profile mixing are modeled. Given this framework, the universe of possible decision criteria is explored.
Receiver Operating Characteristic curves, a type of analysis originally applied to assessing World War II radar performance, are adopted as a paradigm for summarizing the tradeoff of both types of errors. The a priori balancing of these errors as specified by a laboratory’s standard operating procedures defines a “complexity threshold” that will determine, before the process of sample interpretation is undertaken or inclusion/exclusion statistics are calculated, whether a mixture or low copy number sample ultimately holds any evidentiary value. When a sample does in fact hold evidentiary value, the complexity threshold will have specified a predetermined decision point to inform determinations of exclusion versus inclusion.
TABLE OF CONTENTS
TITLE PAGE ...... i
COPYRIGHT PAGE ...... ii
READER APPROVAL PAGE ...... iii
ACKNOWLEDGEMENTS...... iv
ABSTRACT...... v
TABLE OF CONTENTS...... vii
LIST OF TABLES...... x
LIST OF FIGURES ...... xi
1 INTRODUCTION ...... 1
1.1 Comparing Forensic Evidence Samples and Reference Samples...... 3
1.1.1 DNA Profile Comparison ...... 3
1.1.2 Conclusions Arising from Profile Comparisons...... 5
1.1.3 Inclusion Statistics...... 8
1.2 Consideration of Error Rates ...... 15
1.2.1 Error Rate Analysis Using the Paradigm of Receiver Operating
Characteristics...... 15
2 METHODS ...... 19
2.1 Overview...... 19
2.2 Error Analysis Study Using Simulated Mixture Data ...... 20
2.2.1 Simulation Materials...... 20
2.2.2 Simulation Model...... 20
2.2.2.1 Modeling Allele Drop-out...... 28
2.2.3 Generating Populations for Comparison...... 29
2.2.3.1 Simulating Mixtures...... 29
2.2.3.2 Simulating Excluded Individuals...... 32
2.2.3.3 Simulating Included Individuals...... 33
2.2.4 Comparing Populations and Counting Allelic Discrepancies...... 36
2.3 Validation Study Using Laboratory Mixture Data ...... 36
2.3.1 Profile Typing Materials and Methods...... 37
2.4 Data Interpretation Framework...... 37
2.4.1 Organizing Comparison Results...... 37
2.4.2 Making Determinations of Exclusion or Inclusion...... 40
3 RESULTS AND DISCUSSION...... 44
3.1 Error Analysis Study Using Simulated Data ...... 44
3.1.1 Comparing Simulated Mixtures to Simulated Excluded Individuals ...... 44
3.1.2 Comparing Simulated Mixtures to Simulated Included Individuals...... 50
3.2 Validation Study Using Laboratory Mixture Data ...... 53
3.2.1 Comparing Laboratory Mixtures to Simulated Excluded Individuals...... 53
3.2.2 Comparing Laboratory Mixtures to Simulated Included Individuals...... 55
3.3 Impact of Analyst Decision Threshold on Expected Errors ...... 56
3.3.1 Receiver Operating Characteristic (ROC) Analysis ...... 57
3.3.1.1 Rollup ROC Results for Simulated Mixture Data ...... 60
3.3.1.2 Rollup ROC Results for Laboratory Mixture Data...... 68
4 CONCLUSION...... 70
5 REFERENCES ...... 73
6 VITA...... 79
LIST OF TABLES
Table 1: Example Profile Comparison between Three Reference Samples and an
Evidence Sample...... 3
Table 2: Inclusion/Exclusion Criteria from The German Stain Commission:
Recommendations for the Interpretation of Mixed Stains (reproduced from [7])...... 5
Table 3: Steps to Mixture Interpretation from the DNA Commission of the International
Society of Forensic Genetics (reproduced from [10]) ...... 7
Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM
(“Characteristics” below quoted from [7]) ...... 8
Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21]) ...... 12
Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality
...... 17
Table 7: Example of Excluded-Individual-to-Mixture Comparison...... 49
Table 8: Notional Table of False Positive & True Positive Rates for Different Decision
Thresholds...... 59
LIST OF FIGURES
Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles...... 21
Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile .... 24
Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile... 25
Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person 2...... 26
Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and
Person 2...... 27
Figure 6: Flow Describing Pristine Mixture Profile Generation ...... 31
Figure 7: Flow Used to Generate Perturbed Mixture Profiles ...... 31
Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded
Individuals...... 32
Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture
Profiles)...... 35
Figure 10: Example Histogram Tabulating Discrepancies Between 10,000 References and
1 Mixture...... 38
Figure 11: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies. 39
Figure 12: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies
(two-color) ...... 40
Figure 13: Summary histograms: Excluded-Individual-to-Mixture Comparisons (No
Drop-Out)...... 46
Figure 14: Results Comparing Excluded Individuals to Simulated Mixtures (No Drop-
Out) ...... 47
Figure 15: Results Comparing Excluded Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 ...... 48
Figure 16: Results Comparing Included Individuals to Simulated Mixtures (No Drop-
Out) ...... 51
Figure 17: Results Comparing Included Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 ...... 52
Figure 18: Results Comparing Excluded Individuals to 1:1 Laboratory Mixtures...... 54
Figure 19: Results Comparing Included Individuals to Laboratory Mixtures...... 55
Figure 20: Example Simultaneous Visualization of Exclusion & Inclusion Comparisons
...... 58
Figure 21: Example ROC Plot...... 59
Figure 22: Compilation of Exclusion & Inclusion Data Using Simulated Mixtures...... 61
Figure 23: Error Analysis Results: ROC Rollup Using Simulated Mixtures ...... 62
Figure 24: Notional Complexity Threshold: Simulated Mixture Results...... 63
Figure 25: More Realistic Complexity Threshold: Simulated Mixture Results ...... 65
Figure 26: Complexity Threshold Based on Blackstone’s Ratio: Simulated Mixture
Results...... 67
Figure 27: Compilation of Exclusion & Inclusion Data Using Laboratory Mixtures ...... 68
Figure 28: Complexity Threshold: Laboratory Mixture Results ...... 69
1 INTRODUCTION
The goal of an analyst when considering DNA evidence is to determine whether a reference sample could have contributed to the genetic profile expressed in an evidence sample. The final determination as to whether the two samples could have originated from the same source may have significant implications in an ongoing investigation or for how compelling a jury finds a prosecutor’s theory of a crime (or, alternatively, how compelling they find a suspect’s defense).
At its core, identifying individuals through DNA profiling is accomplished by examining particular regions (i.e., loci) that are part of an individual’s genetic makeup and are highly variable (i.e., polymorphic) among individuals. These loci are contained on two sets of structures (i.e., chromosomes) containing a person’s DNA, with one set inherited from each parent. A person’s DNA sequence is defined by a collection of four nucleotides—adenine, cytosine, guanine, and thymine, conventionally abbreviated as A, C, G, and T, respectively.
The use of DNA as a “genetic fingerprint” was introduced in 1985 by Alec Jeffreys et al. [1], who described a method of simultaneously detecting “hypervariable ‘minisatellite’ regions” in human DNA—in other words, a collection of exploitable polymorphic loci—that could collectively be of forensic utility to individuate a person [2]. Around the same time, Kary Mullis et al. [3] published their work on the polymerase chain reaction (PCR), a method that allowed for the amplification and quantification of trace amounts of DNA, which are often encountered in forensic work. More modern methods have introduced increased automation [4] and efficiency [5] to the process, but the seminal works of Jeffreys et al. and Mullis et al. formed the foundation on which DNA profiling has grown. The introduction of PCR technology into forensic laboratories was accompanied by a shift from using large minisatellite regions, consisting of thousands to tens of thousands of bases, to short tandem repeats (STRs), regions that typically range from ~70 to ~500 base pairs. STRs are similar to variable number of tandem repeat (VNTR) regions in that both consist of tandemly repeating units of DNA, where the number of repeat units (i.e., alleles) is variable in the population. However, STRs are shorter, which allows for efficient amplification of these regions and for increased sensitivity in DNA profiling.
At its heart, successfully “typing” or “profiling” an individual amounts to a signal
detection problem. First, biological evidence is collected from a crime scene, evidence
sample, or person of interest. Sources of biological evidence that can yield DNA samples
include—but are not limited to—blood, semen, saliva, urine, feces, teeth, bone, hair, skin
cells, or other biological tissue samples [6]. DNA is then extracted from the biological
sample and quantified using quantitative polymerase chain reaction (qPCR). Next, a
separate round of PCR amplifies the STR loci of interest. Detection is accomplished by
separating amplicons based on allele size and by the use of fluorescent dyes. The
identification of which alleles are expressed in the sample is determined by applying a
threshold to distinguish a collection of fluorescently-detected electronic signals—allele
peak heights/areas—from a background of noise in an electropherogram. The results are
then interpreted and a comparison between an unknown (i.e., the evidence) and a known
(i.e., the standard) is made.
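The allele-calling step described above reduces, at its simplest, to filtering peaks by a height threshold. The sketch below illustrates this with an invented analytical threshold and invented peak heights; real thresholds are validated per laboratory and instrument.

```python
# Allele calling by analytical threshold: keep only peaks whose height
# rises above the noise cutoff. All values here are hypothetical.
ANALYTICAL_THRESHOLD_RFU = 50  # illustrative cutoff, relative fluorescence units

def call_alleles(peaks):
    """peaks: dict mapping allele designation -> peak height (RFU).
    Returns the set of alleles whose peaks exceed the threshold."""
    return {allele for allele, height in peaks.items()
            if height >= ANALYTICAL_THRESHOLD_RFU}

locus_peaks = {"9": 412, "10": 371, "12": 23}  # the 23 RFU peak reads as noise
print(sorted(call_alleles(locus_peaks)))  # ['10', '9']
```

In practice a second, higher stochastic threshold is also applied before a locus is used statistically; only the basic noise filter is shown here.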
Although getting to the point of producing a DNA profile from a sample involves
many nuanced steps—from collecting and storing an evidence sample to extracting and
amplifying the evidentiary sample’s genetic profile—the DNA profile that is ultimately
produced consists of the STR alleles at each locus tested, where 13 – 16 STR loci are
generally typed during forensic DNA testing.
1.1 Comparing Forensic Evidence Samples and Reference Samples
1.1.1 DNA Profile Comparison
The first step before comparing evidence and reference samples involves
amplifying the evidence sample’s loci of interest and interpreting which alleles are
present. This process leads to an evidence profile, which consists of a list of alleles
observed to be present at each amplified locus within the evidence sample. The reference
sample is also amplified and its observed alleles noted. This leads to a reference profile.
The comparison of the two involves determining which of the reference alleles are
present in the evidence sample.
Table 1 provides an example comparison between the alleles detected at four loci
for three reference samples and those detected for an evidence sample.
Table 1: Example Profile Comparison between Three Reference Samples and an Evidence Sample
The alleles present at each locus in the Evidence Sample are compared with those present at the same locus for a reference sample. Each reference sample is considered in turn.

                  Locus 1    Locus 2       Locus 3      Locus 4
Evidence Sample   5, 10      5, 6, 7, 8    9, 10, 11    4, 9, 13
Reference 1       5, 10      5, 8          10, 11       9, 9
Reference 2       5, 10      7, 8          9, 11        4, 13
Reference 3       5, 10      6, 6          10, 10       8, 9
Assuming all alleles are detected in the evidence sample, a comparison between the alleles at each of the loci from each reference sample would lead an analyst to a conclusion for that reference sample. For example, the alleles present in Reference 1 are also present at every locus in the Evidence Sample. If all alleles are detected and peak height and contributor ratios are not considered, Reference 1 would be “included” as a potential contributor to the DNA mixture in the Evidence Sample. Similarly, Reference 2 would also be included as a potential contributor since all alleles present in the individual’s genotype are also present in the evidence profile. In contrast, not all of
Reference 3’s alleles are present in the Evidence Sample; while all of the alleles at Loci
1, 2, and 3 are present, Allele 8 at Locus 4 is not present in the Evidence Sample. A strict insistence that every reference allele be present in the evidence would lead an analyst to
“exclude” Reference 3. However, if allelic drop-out is suspected due to low-level amplification conditions, Reference 3 may be included as a potential contributor despite the allelic discrepancy at Locus 4.
Under ideal circumstances, the inclusion or exclusion of an individual as a contributor to an item of evidence would be straightforward: If—and only if—100% of the reference alleles are detected in the evidence sample can a reference be included as a contributor to the evidence; otherwise, the reference is excluded. In reality, such determinations are not so straightforward due to allelic drop-out during low-template
DNA amplification, as evidenced by the example contained in Table 1.
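The comparison illustrated in Table 1 can be sketched as set arithmetic over per-locus allele sets (homozygous genotypes such as “9, 9” collapse to a single unique allele). This is a minimal illustration of the counting logic, not any laboratory’s comparison software.

```python
# Counting allelic discrepancies between reference and evidence profiles,
# using the Table 1 data. A "discrepancy" is a reference allele that is
# not observed at the same locus in the evidence sample.
evidence = {
    "Locus 1": {5, 10}, "Locus 2": {5, 6, 7, 8},
    "Locus 3": {9, 10, 11}, "Locus 4": {4, 9, 13},
}
references = {
    "Reference 1": {"Locus 1": {5, 10}, "Locus 2": {5, 8},
                    "Locus 3": {10, 11}, "Locus 4": {9}},      # 9, 9 -> {9}
    "Reference 2": {"Locus 1": {5, 10}, "Locus 2": {7, 8},
                    "Locus 3": {9, 11}, "Locus 4": {4, 13}},
    "Reference 3": {"Locus 1": {5, 10}, "Locus 2": {6},
                    "Locus 3": {10}, "Locus 4": {8, 9}},
}

def count_discrepancies(reference, evidence):
    """Sum, over loci, the reference alleles missing from the evidence."""
    return sum(len(alleles - evidence[locus])
               for locus, alleles in reference.items())

for name, profile in references.items():
    print(name, count_discrepancies(profile, evidence))
# References 1 and 2 show zero discrepancies (included under a strict
# criterion); Reference 3 shows one (allele 8 at Locus 4).
```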
1.1.2 Conclusions Arising from Profile Comparisons
Given profiles generated from evidence and reference samples, a DNA analyst
compares the two and attempts to make a determination as to whether the contributor of
the reference sample could have contributed to the evidence sample. Thoughtful
comparison of the two samples leads an analyst to one of three conclusions, whose
criteria have been delineated by The German Stain Commission [7] and are contained in
Table 2.
Table 2: Inclusion/Exclusion Criteria from The German Stain Commission: Recommendations for the Interpretation of Mixed Stains (reproduced from [7])

Inclusion: If all alleles of a person in question are uniformly present in a mixed stain, the person shall be considered a possible contributor to the stain.

Exclusion: If alleles of a person in question are not present in a mixed stain, the person shall not be considered as a possible contributor to the stain.

Gray Area between Inclusion and Exclusion: The following effects may occur in [mixtures with no major component(s) and evidence of stochastic effects] due to imbalances between the mixture components and may cause difficulties in reaching an unambiguous decision about inclusion or exclusion across all analyzed DNA systems:
- Locus drop-out and allelic drop-out (e.g., caused by the sensitivity of the amplification system, as well as by stochastic effects).
- Allelic drop-out is more likely to occur for longer than for shorter alleles, and in particular for DNA systems with long amplicon sizes.
If the samples do not amplify efficiently, there are too many contributors present in an evidence stain, and/or there is too little starting template, no reliable comparison can be made between reference and evidence profiles. The ability of an analyst to accurately
conclude whether a reference sample should be included as a possible contributor to an evidence sample is dependent upon the quality or “complexity” of the sample. Currently no standard to determine such a “quality factor” is offered in the literature. Assuming both profiles amplify and produce readable electropherograms that are not overwhelmingly diluted in discriminatory power due to the presence of many contributors, assessing allelic commonality is trivial in the case where there are no common alleles between samples: The individual could not have contributed to the evidence sample. The assessment is equally trivial in the case of complete allelic overlap: The individual almost certainly contributed. The case where more-than-zero and less-than-all of the alleles are in common is the case of interest and the case most relevant for considering forensic evidence profiles, which are often subject to some combination of complicating factors.
One metric that describes the “degree of exclusion” of two profiles is the number of allelic discrepancies between them; here, “degree of exclusion” is defined as the extent to which sets of alleles from different samples fail to overlap. As the probability of drop-out increases, the number of discrepancies between a true contributor and an evidence sample increases. Work by Tvedebrink et al. [8] has attempted to determine the probability of drop-out with respect to an allele’s average electropherogram peak height (corrected for diploidy); the peak heights are taken to be robust indicators of the quantity of DNA contributed [9]. However, little work has been published defining the implications for an analyst’s ability to accurately include or exclude a contributor.
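Drop-out models of the kind referenced above are often formulated as a logistic function of peak height. The sketch below uses that general shape with invented coefficients; they are not fitted values from Tvedebrink et al. [8].

```python
import math

# Illustrative logistic model of drop-out probability as a function of
# average peak height. Coefficients are hypothetical, chosen only to show
# the qualitative behavior: drop-out becomes more likely as signal weakens.
BETA_0, BETA_1 = 8.0, -2.0  # hypothetical intercept and slope

def prob_dropout(mean_peak_height_rfu):
    """Pr(drop-out) = logistic(BETA_0 + BETA_1 * log(peak height))."""
    logit = BETA_0 + BETA_1 * math.log(mean_peak_height_rfu)
    return 1.0 / (1.0 + math.exp(-logit))

# Probability decreases monotonically with signal strength:
for h in (50, 150, 500):
    print(h, prob_dropout(h))
```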
The steps to interpreting a mixture—according to the DNA Commission of the
International Society of Forensic Genetics [10]—are contained in Table 3.
Table 3: Steps to Mixture Interpretation from the DNA Commission of the International Society of Forensic Genetics (reproduced from [10])

Step 1: Identify the presence of a mixture
Step 2: Designation of allelic peaks
Step 3: Identify the number of contributors in the mixture
Step 4: Estimation of the mixture proportion or ratio of the individuals contributing to the mixture
Step 5: Consideration of all possible genotype combinations
Step 6: Compare reference samples
However, there is a decision that is actually or implicitly made prior to Step 3: Will this mixture sample lead to a credible interpretation? In other words, is the mixture of sufficient “quality”—i.e., “not too complex”—to be confidently interpreted? Mixtures have also generally been classified into three main types. An example classification scheme—according to the German Stain Commission [7] and the Scientific Working Group on DNA Analysis Methods (SWGDAM) Mixture Interpretation Guidelines [11]—is contained in Table 4.
Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM (“Characteristics” below quoted from [7])

Type A Mixtures / Indistinguishable Mixtures: No obvious major contributor with no evidence of stochastic effects

Type B Mixtures / Distinguishable Mixtures: Clearly distinguishable major and minor DNA components; consistent peak height ratios of approximately 4:1 (major to minor component) across all heterozygous systems, and no evidence of stochastic effects

Type C Mixtures / Uninterpretable Mixtures: No major component(s) and evidence of stochastic effects
1.1.3 Inclusion Statistics
Any legal declaration of consistency between a reference standard and an evidentiary profile, dubbed inclusion in the forensics community, must be contextualized statistically [11]. In the realm of DNA analysis, this statistic—however calculated—aims at informing the jury of the probability that a randomly selected individual could not be eliminated as a contributor to a given stain or of how much more likely the prosecution’s hypothesis is than the defense’s hypothesis. The first kind of statistic is calculated via the
“Random Man Not Excluded” (RMNE) method [12] while the latter is calculated via the “Likelihood Ratio” (LR) method [13].
Calculation of the RMNE statistic—or, synonymously, of the “Combined Probability of Inclusion” (CPI)—is performed by considering all combinations of feasible genotypes that could have contributed to the evidence sample. Only evidence sample loci where alleles are above the stochastic threshold are considered in the
computation. Notably, when “unrestricted” CPI is invoked as the statistical method, quantitative information such as peak height or area is ignored. Although the number of contributors theoretically has no bearing on whether CPI could be used as the statistic, it
does have a significant effect on the assumption that all alleles have been detected.
Despite this consideration, the RMNE approach may be considered appropriate when dealing with an indistinguishable evidence mixture where drop-out of alleles from all contributors is deemed unlikely [14,15].
For a sample consisting of m loci, with each locus L containing n alleles {α_{L,1}, α_{L,2}, …, α_{L,n}} with associated probabilities of occurrence p(α_{L,1}), p(α_{L,2}), …, p(α_{L,n}), the Random Man Not Excluded (RMNE) statistic is given by Equation 1.
$$ RMNE = CPI = \prod_{L=1}^{m} PI_{L} = \prod_{L=1}^{m} \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2} $$

Equation 1: Random Man Not Excluded (RMNE) Statistic
m ≡ total number of loci contained in sample
n ≡ number of alleles contained at a particular locus
α_{L,A} ≡ a particular allele A at a particular locus L
p(α_{L,A}) ≡ probability of α_{L,A} occurring in population
PI_L ≡ probability of inclusion at locus L
CPI ≡ combined probability of inclusion
Accepting the stipulations of Recommendation 4.1 of the Second National Research
Council Report (NRC-II) [16], which requires an assumption of within-locus independence based on Hardy-Weinberg equilibrium and the associated inbreeding coefficient θ, yields Equation 2. Unlike the RMNE statistic, the LR must assume a number of mixture contributors to the evidence sample and employs knowledge of a reference sample’s genotype.
$$ RMNE = \prod_{L=1}^{m} \left[ \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2} + \theta \cdot \sum_{A=1}^{n} p(\alpha_{L,A}) \cdot \left( 1 - p(\alpha_{L,A}) \right) \right] $$

Equation 2: RMNE Using NRC-II Recommendation 4.1
m ≡ total number of loci contained in sample
n ≡ number of alleles contained at a particular locus
α_{L,A} ≡ a particular allele A at a particular locus L
p(α_{L,A}) ≡ probability of α_{L,A} occurring in population
θ ≡ inbreeding coefficient
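Equations 1 and 2 translate directly into a short computation. The allele frequencies and θ value below are invented for illustration.

```python
# Sketch of the RMNE / CPI computation from Equations 1 and 2.
def rmne(locus_freqs, theta=0.0):
    """locus_freqs: list of lists; each inner list holds the population
    frequencies p(alpha_{L,A}) of the alleles observed at one locus.
    theta: inbreeding coefficient (NRC-II Recommendation 4.1);
    theta = 0 reduces to Equation 1."""
    cpi = 1.0
    for freqs in locus_freqs:
        s = sum(freqs)
        pi_l = s * s + theta * sum(p * (1.0 - p) for p in freqs)
        cpi *= pi_l  # combine across loci by the product rule
    return cpi

loci = [[0.10, 0.20], [0.05, 0.15, 0.25]]  # hypothetical frequencies
print(rmne(loci))             # ≈ 0.018225 (product of squared per-locus sums)
print(rmne(loci, theta=0.01)) # slightly larger with the theta correction
```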
An LR can be calculated either with or without taking peak height information into
account as well as with or without taking the probability of allelic drop-out into account.
The likelihood ratio gives the ratio of the conditional probabilities of encountering an
evidence profile given competing prosecution and defense hypotheses. While the
likelihood ratio makes use of more information (e.g., a reference sample’s genotype) in a
more robust manner, it is both harder to calculate and fundamentally reliant on
assumptions (e.g., number of mixture contributors, defense hypotheses) that can be
difficult to justify. In the most complex scenarios, multiple conclusions for various numbers of contributors may need to be stated for a single item of evidence. This would result in two or more LRs with no clear indication of which LR is the best estimate [15,17-19]. For a given evidentiary profile E and two competing hypotheses—i.e., the prosecutor’s hypothesis H_P and the defense’s hypothesis H_D—the likelihood ratio LR is given by Equation 3.
$$ LR = \frac{\Pr(E \mid H_{P})}{\Pr(E \mid H_{D})} $$

Equation 3: Likelihood Ratio (LR) Statistic
Pr(E|H_P) ≡ probability of encountering an evidence profile E given prosecution hypothesis H_P
Pr(E|H_D) ≡ probability of encountering an evidence profile E given defense hypothesis H_D
Equation 4 is based on the general formulation by Weir et al. [20] for computing the likelihood ratio for x unknown contributors carrying a set of alleles U that are included in evidence sample E under a given hypothesis H.
$$ LR = \frac{\Pr_{x}(U_{H_{P}} \mid E)}{\Pr_{x}(U_{H_{D}} \mid E)} $$

Equation 4: General Formulation of the Likelihood Ratio (LR)
E ≡ evidence sample containing a set of alleles
H_P ≡ prosecution hypothesis
H_D ≡ defense hypothesis
U_{H|E} ≡ set of alleles contained in evidence sample E but not contributed by contributors specified by hypothesis H
Pr_x(U_{H|E}) ≡ probability, given x contributors, of observing allele set U given evidence sample E and contributors specified by hypothesis H
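For intuition, the likelihood-ratio idea is easiest to see in the simplest possible case: a single-source stain with a matching suspect, where Pr(E|H_P) = 1 and Pr(E|H_D) is the Hardy-Weinberg genotype frequency of a random person. This is a far simpler case than the mixture formulation of Equation 4, and the allele frequencies below are invented.

```python
# Minimal single-source likelihood-ratio sketch. Under Hp the suspect is
# the source, so Pr(E|Hp) = 1 for a true match; under Hd a random person
# is the source, so Pr(E|Hd) is the product of genotype frequencies.
def genotype_freq(p, q=None):
    """Hardy-Weinberg frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2.0 * p * q

def single_source_lr(locus_genotype_freqs):
    pr_e_given_hd = 1.0
    for f in locus_genotype_freqs:
        pr_e_given_hd *= f
    return 1.0 / pr_e_given_hd  # Pr(E|Hp) = 1 in this idealized case

freqs = [genotype_freq(0.1, 0.2), genotype_freq(0.05)]  # two hypothetical loci
print(round(single_source_lr(freqs)))  # 10000, i.e. 1 / (0.04 * 0.0025)
```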
The mathematical rigor involved in computing a statistic as well as the apparent precision of the numerical result belie the complexity associated with asking analysts, attorneys, judges, and juries to actually interpret the number that is ultimately calculated.
If the evidence is neither overwhelmingly strong nor underwhelmingly weak, deciding what weight to assign to the evidence can be subjective. Evett and Weir [21] furnish a framework that attempts to guide interpretation of the statistical result by linking a quantitative likelihood ratio with a qualitative indication of the degree to which the evidence backs the prosecution’s hypothesis; the prescriptions are found in Table 5. This
scheme is helpful in eliminating some subjectivity from the characterization of evidentiary support but fails to connect qualitative notions of evidence strength with quantitative notions of error incidence. “Limited” support, for instance, may equate to error rates in the determination of a reference sample’s inclusion that are unacceptably high.
Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21])

Likelihood Ratio Range    Degree of Support Provided by Evidence for Prosecution’s Hypothesis
1 to 10                   Limited
10 to 100                 Moderate
100 to 1000               Strong
1000 and greater          Very Strong
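The verbal scale of Table 5 amounts to a threshold lookup, sketched below; the handling of boundary values such as exactly 10 is a convention choice not specified by the table.

```python
# Mapping a numeric likelihood ratio to the Evett and Weir verbal scale.
def verbal_support(lr):
    """Return the Table 5 category for a likelihood ratio lr >= 1."""
    if lr < 10:
        return "Limited"
    if lr < 100:
        return "Moderate"
    if lr < 1000:
        return "Strong"
    return "Very Strong"

print(verbal_support(42))     # Moderate
print(verbal_support(10000))  # Very Strong
```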
Word notes, “The statement of ‘inclusion’ under these scenarios may have little meaning… ‘Inconclusive’ or ‘insufficient for comparison purposes’ may be the more appropriate conclusion in some cases” [22].
Despite the interpretive difficulties involved in using an inclusion statistic, the literature is filled with an abundance of scholarly work dedicated to reporting inclusion probabilities and likelihood ratios for multifarious scenarios—when only qualitative allelic information is considered [23], when the number of contributors is ambiguous
[24,25], when multiple hypotheses are entertained [26], when a reference sample is identified via a database search [27], and when reporting prior odds affects interpretation of the LR [28,29]. By contrast, little has been published on “exclusion criteria” or
“complexity criteria.” Although the LR approach attempts to handle allelic loss through
incorporation of the probability of drop-out in the computation [30], a minimal amount of
work has been published regarding a criterion that may be used by analysts to decide when a profile contains usable information. That is, is there a point when an evidentiary
profile contains too many contributors and/or too much drop-out to be considered a
reliable item of evidence for comparison purposes? Blackstone’s ratio, a central moral
and legal principle, asserts: “better that ten guilty persons escape than one innocent
suffer” [31]. This is bolstered by the notion: “The DNA tests that we routinely use in our
laboratories are designed to be exclusionary tests. That is, testing is performed under the
premise that an individual who is not the source of the DNA with a single-source profile
or who is not one of the sources in a mixture of DNA is expected to be excluded from the
DNA sample” [22]. The allure of increased detection of criminals must be tempered by consideration of the spurious detection of innocents.
Invoking this reverse perspective seems pedantic until one considers the
technological advancements employed in DNA testing as well as the implications of
DNA testing to criminal justice policy and practice. Advances include the emerging
abilities to detect DNA in minute quantities of sample—for example, from fingerprints
[32], latex gloves [33], fingernail remains [34], skin cells [35], single hairs [36], clothing
[37], and cigarettes [38]—as well as the ability to discern and analyze mixture samples
[20,23,39-42]. The extraction of potentially individualizing information from degraded
or partial profiles and/or from individual profiles whose characteristics are potentially
masked by the presence of another’s profile is being pursued more often with the
increasing commercialization of new, more sensitive amplification chemistries [43,44].
Although “touch” and otherwise low-level samples are now more routinely submitted for DNA typing, a number of issues are associated with testing these types of low-template samples. By its nature, the attempted detection of scant quantities of an evidentiary sample carries with it the risk of not fully observing all the alleles that truthfully comprise the sample under investigation; this is the phenomenon of allelic drop-out [13]. At the same time, because the detection threshold must be relaxed and extra amplification cycles might need to be added to produce a discernible signal, concurrent risk exists for “detecting” spurious evidentiary signal(s) due to contamination, stochastic variation, or adventitious deposition [45,46]; this is the phenomenon of allelic drop-in.
Issues relating to heterozygous peak-height balance and the increased prevalence of stutter products also need to be considered [47].
The ramifications for evidence interpretation given this mélange of complications¹ are significant. For example, if a laboratory uses the CPI statistic, loci with alleles below the stochastic threshold cannot be used for inclusion. If every locus of an evidence profile exhibited alleles below the stochastic threshold, then the evidence theoretically would not be used for comparison, and the likely outcome would be a determination that a reliable comparison cannot be made. Similarly, incorporating the probability of drop-out (Pr(D)) into the LR would decrease the weight of the evidence since more random people could be included.
¹ Error rates associated with sample collection, extraction, and the amplification process itself have been well documented [62-64]. These errors, which range in pervasiveness from the mislabeling of laboratory sample tubes to the contamination of an evidence collection kit by a crime scene investigator, would preface the types of profile interpretation errors that are assessed in Sections 2 and 3 and are not considered in this analysis.
1.2 Consideration of Error Rates
Despite the growing literature on determination of the LR, little attention has been paid to whether a profile should even be analyzed; concomitantly missing is a treatment of the Type I and Type II errors associated with a given likelihood ratio. Further, models that use linear mixture analysis [41,48], Markov chain Monte Carlo [49], or least-squares deconvolution [50] have been proposed, but in each case, the fact that the profile is interpretable has been presumed. The decision of whether to embark on the analysis path towards sample comparison needs to be made prior to comparison to a reference sample (whether qualitatively by the analyst or quantitatively via computational software); furthermore, how to contextualize an inclusion statistic—either RMNE or LR—within the framework of error rates is critical.
Accordingly, an investigation of the error rates associated with interpreting
successfully amplified DNA profiles—with a focus on uncovering a complexity threshold
to help constrain the conditions under which interpretation of a low-copy or multiple-
contributor profile is even attempted—is necessary. A method to accomplish these goals
is proposed and detailed in subsequent sections by employing Receiver Operating
Characteristic analysis.
1.2.1 Error Rate Analysis Using the Paradigm of Receiver Operating Characteristics
Receiver Operating Characteristic (ROC) analysis originated in World War II as a
means for radar operators to set their detection thresholds in a manner that optimized the
tradeoff between false alarms (i.e., spurious target detections) and leakage (i.e., undetected targets) [51].
The ROC parameter space of a classification scheme can be organized in a contingency table or confusion matrix that maps data instances into one of four classes relative to the actual and determined classes of those instances. Within the realm of DNA analysis, the classification scheme involves including or excluding a reference profile from an evidentiary profile, and the data points consist of reference-to-evidence comparisons. The true or actual classes depend on whether a reference sample really is included in an evidentiary sample while the determined classes represent the conclusions of an analyst.
This is distinct from the issue of whether a reference sample ought to be included
analytically. As outlined in Section 2.4.2, this study models an analyst’s determination of
exclusion or inclusion deterministically. In doing so, different determinations, which are
based on specific decision criteria, will specify when an analyst ought to include or
exclude a reference as a contributor to an evidence sample. The agreements and
deviations between those prescriptions (i.e., “ought to be included” versus “ought to be excluded”) and reality (i.e., “possible contributor” versus “non-contributor”) form the
basis of this study. Considering whether a standard is a possible contributor versus an
actual contributor is a subtle but important distinction. For example, if the mixture
sample contains alleles 14, 15, 16 at a particular locus, then individuals who ought to be
included could have any of the following allele pairings at that locus: (14,14); (14,15);
(14,16); (15,15); (15,16); or (16,16). This is different than confining the error analysis to
those individuals that actually did contribute to a particular mixture, since an analyst cannot usually know the exact genotype of a particular contributor among the universe of possible included genotype combinations. Thus, the focus is on the decision criteria themselves; no attempt is made to model potential errors committed through an analyst’s actions independent of the prescriptions of his laboratory’s standard operating procedures.
Table 6 shows the four possible outcomes coupled with the consequences of an
analyst’s decision for the comparison of a given reference sample with an unknown.
Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality H0: The mixture profile does not contain the individual’s profile (i.e., the individual is excluded). H1: The mixture profile contains the individual’s profile (i.e., the individual is included). A false negative occurs when an included individual is improperly excluded as a contributor to a mixture, while a true positive occurs when an included individual is correctly included as a mixture contributor. A true negative occurs when an excluded individual is correctly excluded as a mixture contributor, while a false positive occurs when an excluded individual is improperly included as a contributor to a mixture.
                              Analyst Action
Reality                       Fail to Reject H0 (Accept H0)      Reject H0 (Accept H1)
H0 false; H1 true             False Negative (Type II Error)     True Positive
H0 true; H1 false             True Negative                      False Positive (Type I Error)
Comparisons between an evidence sample and an individual who ought to have been excluded can result either in a correctly excluded individual (i.e., a true negative) or in an incorrectly included individual (i.e., a false positive). Thus, all comparisons that fall within the bottom row of Table 6 come from comparisons involving individuals who could not have contributed to the mixtures to which they are being compared.
Comparisons between an evidence sample and an included individual (i.e., possible contributor) can result either in a correctly included individual (i.e., a true
positive) or in an incorrectly excluded individual (i.e., a false negative). Thus, all comparisons that fall within the top row of Table 6 come from comparisons involving individuals who could have contributed to the mixtures to which they are being compared.
Within the realm of error analysis, a decision that results in a truthfully negative sample being judged to be positive (i.e., a false positive) is known as a Type I error or
“error of the first kind.” The rate at which false positives occur, which is typically denoted by α, is the conditional probability of including an individual as a contributor to a mixture given that the individual ought to be excluded. The sensitivity of a test, also known as hit rate or recall, is equivalent to the true positive rate and is given by 1 − β. A decision that results in a truthfully positive sample being judged to be negative (i.e., a false negative) is known as a Type II error or “error of the second kind.” The rate at which false negatives occur, which is typically denoted by β, is the conditional probability of excluding an individual as a contributor to a mixture given that the individual ought to be included. The specificity of a test is equivalent to the true negative rate and is given by 1 − α.
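These relationships can be illustrated numerically. The sketch below is in Python rather than the MATLAB used in this study, and the four outcome counts are invented for illustration only:

```python
# Illustrative computation of the four ROC rates from confusion-matrix counts.
# The counts are hypothetical, not data from this study.
tp, fn = 9400, 600    # comparisons involving individuals who ought to be included
tn, fp = 9900, 100    # comparisons involving individuals who ought to be excluded

alpha = fp / (fp + tn)  # Type I error rate: P(include | ought to be excluded)
beta = fn / (fn + tp)   # Type II error rate: P(exclude | ought to be included)

sensitivity = tp / (tp + fn)  # true positive rate (hit rate, recall) = 1 - beta
specificity = tn / (tn + fp)  # true negative rate = 1 - alpha
```

Note that sensitivity and specificity are complements of β and α, respectively, so reporting either pair fully characterizes the classifier at a fixed decision threshold.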
Initially applied to radar detection thresholds, ROC analysis has found utility in
general problems involving signal detection across multiple disciplines, including speech
recognition and music detection [52], face detection and recognition [53], and vibration-
based structural health monitoring of bolt loosening [54], among many others. It has also
been successfully applied to clinical settings involving disease diagnosis [55] and
provides a useful, reductive lens through which to process the allelic discrepancy data contained in Sections 3.1 and 3.2.
Given databases of mixtures, excluded individuals, and included individuals—
which will originate from simulation (Section 2.2) and from the laboratory (Section
2.3)—assessments will be made as to the relative error rates resulting from varying
decision criteria for declaring a reference profile as either excluded or included from a
given evidence mixture sample, where the decision criterion will be dependent upon the
number of discrepant alleles between the two samples.
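The decision criterion just described—declare a reference “included” when its number of discrepant alleles falls at or below a threshold, and sweep that threshold—can be sketched as follows. This is a Python illustration (the study itself used MATLAB), and the discrepancy counts are invented:

```python
# Sweep the allowed number of discrepant alleles (0..30 over 15 loci) and
# record (false positive rate, true positive rate) at each threshold.
# Discrepancy counts below are invented for illustration.
excluded_discrepancies = [12, 15, 9, 18, 14, 11, 16, 13]  # truly excluded refs
included_discrepancies = [0, 1, 0, 2, 4, 0, 1, 3]         # truly included refs

roc = []  # one (FPR, TPR) point per threshold
for threshold in range(0, 31):
    fp = sum(d <= threshold for d in excluded_discrepancies)
    tp = sum(d <= threshold for d in included_discrepancies)
    roc.append((fp / len(excluded_discrepancies),
                tp / len(included_discrepancies)))
```

A permissive threshold (30) includes everyone, giving the (1, 1) corner; a strict threshold (0) excludes nearly everyone, giving points near (0, 0); the useful operating points lie between.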
2 METHODS
2.1 Overview
Two studies were conducted: an error analysis study that considered a simulated
mixture database (Section 2.2) and a validation study that considered a laboratory
mixture database (Section 2.3). Results for each of these studies (Sections 3.1 and 3.2,
respectively) were generated by comparing these mixture databases to two different
simulated individual databases: one consisting of individuals verified to be excluded as
mixture contributors; the other consisting of individuals verified to be included as
mixture contributors.
In this study, the mixture profiles were taken to be the forensic evidence samples,
and the individual profiles represented the reference samples.
2.2 Error Analysis Study Using Simulated Mixture Data
For the error analysis study using simulated mixture data, the population under
study was a database of simulated mixtures, which were compared to a database of
simulated excluded individuals and a database of simulated included individuals.
2.2.1 Simulation Materials
All simulated genetic profiles and subsequent analyses were accomplished using
MATLAB Version 7.5.0.342 (R2007b) (MathWorks, Natick, Massachusetts).
2.2.2 Simulation Model
Within the simulation framework, profiles were represented as a collection of
alleles determined to be present at each of the 15 autosomal loci contained in the
AmpFℓSTR® Identifiler® Amplification Kit (Applied Biosystems, Foster City, CA): D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA [43]. Amelogenin, the
gender-determining locus, was not considered since it is not a hypervariable locus
providing discriminatory power.
The alleles observed and tabulated by Butler et al. [56] in their 2003 population study using Identifiler® were taken as the universe of realizable alleles for the purposes of simulating profiles. Butler et al. observed 59 distinct allele calls over all autosomal
STR loci: 5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14, 14.2, 15, 15.2, 16,
16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21, 21.2, 22, 22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2,
26, 27, 28, 29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2, 35, 36, 37, 38,
39. The collection of alleles observed at each particular locus, along with each allele’s associated subpopulation frequency among Caucasians, is diagrammed in Figure 1.
Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles The genetic model employed when simulating profiles was based on Butler et al.’s 2003 subpopulation study of Caucasians using the Identifiler® kit [56]. All observed alleles (i.e., the common alleles) at each of the 15 autosomal STR loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
Profiles of individuals were generated by randomly selecting two (not necessarily
distinct) alleles for each of the 15 autosomal loci in the Identifiler® kit. For any given
locus, a list of alleles was constructed that consisted of all alleles with non-zero
subpopulation frequencies with respect to the 302 Caucasians observed by Butler et al.
Two alleles were selected at random from this locus-specific allele list according to the
subpopulation frequencies observed by Butler et al. (and represented in Figure 1). For
instance, for the CSF1PO locus, the following alleles were observed (with the associated subpopulation frequencies in Caucasians): allele 8 (with a frequency of 0.00497); allele 9
(0.01159); allele 10 (0.21689); allele 11 (0.30132); allele 12 (0.36093); allele 13
(0.09603); and allele 14 (0.00828). The simulation selects a 9 allele 1.159% of the time and a 12 allele 36.093% of the time. A genotype consisting of alleles 8 and 13 at the
CSF1PO locus would be selected 2 × 0.00497 × 0.09603 = 0.0009545, or 0.09545%, of the time,² whereas a homozygous CSF1PO locus consisting of the alleles 11 and 11 would be selected 0.30132 × 0.30132 = 0.09079, or 9.079%, of the time. This random selection is repeated for each of the 15 autosomal loci in the Identifiler® kit according to Butler et al.’s observed allele frequencies for each allele at each locus.
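The locus-level sampling step can be sketched as follows. This is a Python illustration (the study used MATLAB), the helper name `simulate_genotype` is hypothetical, and the CSF1PO frequencies are those quoted above from Butler et al.:

```python
# Sketch of simulating one locus genotype by weighted allele draws.
import random

# CSF1PO allele frequencies (Caucasians), as quoted in the text above.
csf1po = {8: 0.00497, 9: 0.01159, 10: 0.21689, 11: 0.30132,
          12: 0.36093, 13: 0.09603, 14: 0.00828}

def simulate_genotype(freqs, rng):
    """Draw two alleles (not necessarily distinct) according to frequencies."""
    alleles = list(freqs)
    weights = list(freqs.values())
    return tuple(rng.choices(alleles, weights=weights, k=2))

rng = random.Random(0)
genotype = simulate_genotype(csf1po, rng)

# Expected genotype probabilities, matching the worked arithmetic above:
p_het_8_13 = 2 * csf1po[8] * csf1po[13]  # heterozygous (8,13)
p_hom_11 = csf1po[11] ** 2               # homozygous (11,11)
```

The factor of 2 in the heterozygous case accounts for the two draw orders, exactly as in the footnoted arithmetic.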
Each individual profile, then, consists of a collection of two alleles at each of the
fifteen loci. Recasting this information in the form of a matrix served the dual purpose of
neatly summarizing an individual’s profile in a logical manner as well as setting up the
data in a computationally efficient manner. Matrix representation of profiles leveraged
MATLAB’s vectorized analysis environment to facilitate fast data manipulation in
performing essential operations, such as “summing” individuals’ profiles to simulate
mixtures and comparing two profiles for the presence of common alleles.
Thus, an amplification result for an individual was represented as a 59 × 15
matrix, with the 59 rows corresponding to the universe of possible alleles at all loci and
the 15 columns corresponding to those particular loci. (Row 1 corresponded to allele 5
² The factor of 2 is included in the product of heterozygous allele frequencies to account for the combinatorial fact that a genotype consisting of allele 8 and allele 13 (in that order) is equivalent to a genotype consisting of allele 13 and allele 8 (in that order).
and proceeded in a monotonically increasing fashion through row 59, which corresponded to allele 39.) The loci order corresponded to the ordering listed above, with column 1 corresponding to the D8S1179 locus and column 15 corresponding to the FGA locus.
The matrix entries represented relative allele prevalences for that profile. The relative presence of an allele allows for a simple model of allele expression—as either absent, present heterozygously, or present homozygously for single-source profiles—while not taking into account signal intensity, the number of contributors, or relative contributor ratios for mixed profiles. For example, a given single-source profile matrix consisted of entries of relative prevalences of 0, 1, or 2. An entry of zero corresponded to the absence of that allele for that particular locus (e.g., a zero in the 8th row and 1st column indicates that the reference did not have a 10 allele at the D8S1179 locus); an entry of unity corresponded to the presence of a heterozygous allele at a particular locus (e.g., a one in the 42nd row and 2nd column indicates that the reference’s D21S11 locus is heterozygous and that one—and only one—of the two alleles possessed at this locus is a 29); an entry of 2 represents a homozygous allele at the specified locus (e.g., a two in the 11th row and 3rd column indicates that the reference has two 12 alleles at the D7S820 locus). The
profiles of all individuals were assumed to have exactly two alleles at a given locus.
For an individual profile I with 15 loci L, each consisting of two alleles αL,1 and
αL,2 , Equation 5 models the resulting profile.
I = \sum_{L=1}^{15} \left( I^{\alpha_{L,1}} + I^{\alpha_{L,2}} \right) = \sum_{L=1}^{15} \sum_{A=1}^{2} I^{\alpha_{L,A}}

Equation 5: Model of Individual Profile
I^{α_{L,A}} ≡ allele A contained at locus L for individual I
I ≡ (complete) individual profile
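The matrix layout just described can be sketched as follows, using Python/NumPy in place of the MATLAB used in the study; the helper name `profile_matrix` is hypothetical, while the allele and locus lists are those given above:

```python
# Build the 59 x 15 relative-prevalence matrix for a single-source profile.
import numpy as np

# Universe of 59 allele designations observed by Butler et al. (quoted above).
ALLELES = [5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14,
           14.2, 15, 15.2, 16, 16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21,
           21.2, 22, 22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2, 26, 27, 28,
           29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2,
           35, 36, 37, 38, 39]
LOCI = ["D8S1179", "D21S11", "D7S820", "CSF1PO", "D3S1358", "TH01", "D13S317",
        "D16S539", "D2S1338", "D19S433", "vWA", "TPOX", "D18S51", "D5S818", "FGA"]
ROW = {a: i for i, a in enumerate(ALLELES)}   # allele value -> row index
COL = {l: i for i, l in enumerate(LOCI)}      # locus name -> column index

def profile_matrix(genotypes):
    """Build a 59 x 15 relative-prevalence matrix from {locus: (a1, a2)}."""
    m = np.zeros((59, 15), dtype=int)
    for locus, (a1, a2) in genotypes.items():
        m[ROW[a1], COL[locus]] += 1
        m[ROW[a2], COL[locus]] += 1
    return m

# A heterozygous locus yields two entries of 1; a homozygous locus one entry of 2.
m = profile_matrix({"D21S11": (29, 30), "D7S820": (12, 12)})
```

Consistent with the text, allele 29 occupies row 42 (index 41) and a homozygous (12,12) D7S820 locus produces a single entry of 2.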
Figure 2 shows a graphic representation of an example single-source profile using the
Dirac delta function.
Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile The alleles at each of the 15 autosomal loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
This example individual’s profile is equivalently represented in Figure 3 as a matrix.
24
Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye colors in the Identifiler® kit (except in the case of the black font, which corresponds to a dye color of yellow). Because of space limitations, only the first and last allele values are identified on the y-axis. In place of potentially illegible numbers, a relative allele prevalence of 0 is represented as white space, while a red box indicates a relative prevalence of 1, and a blue box indicates a relative prevalence of 2.
Mixtures were generated by summing the matrices of a given number of contributors. A mixture of two people could have matrix entries corresponding to relative allele prevalence in the range 0 – 4 depending on the degree of allelic overlap between contributors. (For example, if two contributors were homozygous for the same allele at a particular locus, the resulting mixture matrix entry for that locus’s allele would be 4.)
Therefore, in general, for a mixture M₁ created by contributions from two individuals I₁ and I₂, each typed at 15 loci L with two alleles α_{L,1} and α_{L,2} per locus, the resulting alleles expressed at each locus in the mixture profile are modeled by the simple sum shown in Equation 6.

M_1 = I_1 + I_2 = \sum_{c=1}^{2} \sum_{L=1}^{15} \left( S_c^{\alpha_{L,1}} + S_c^{\alpha_{L,2}} \right) = \sum_{c=1}^{2} \sum_{L=1}^{15} \sum_{A=1}^{2} S_c^{\alpha_{L,A}}

Equation 6: Model of a Two-Person Mixture Profile
α_{L,A} ≡ allele A contained at locus L
S_c ≡ single-source profile for individual c
M_1 ≡ (complete) profile for mixture 1
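The matrix addition of Equation 6 can be sketched directly. For brevity this Python illustration (the study used MATLAB) uses tiny 3 × 2 toy matrices rather than the full 59 × 15 layout:

```python
# Mixture formation as element-wise matrix addition of contributor profiles.
import numpy as np

person1 = np.array([[2, 0],    # homozygous allele at locus 1
                    [0, 1],
                    [0, 1]])
person2 = np.array([[2, 0],    # shares the same homozygous allele at locus 1
                    [0, 2],
                    [0, 0]])

mixture = person1 + person2
# Shared homozygous allele -> relative prevalence 4 in the mixture,
# matching the 0-4 range described for two-person mixtures.
```

The entry of 4 arises exactly as in the text's example: both contributors homozygous for the same allele at the same locus.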
Figure 4 and Figure 5 show graphic representations of an example mixture of Person 1 and Person 2’s profiles as a collection of Dirac delta function plots and as a matrix, respectively.
Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person 2 The alleles at each of the 15 autosomal loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and Person 2 The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye color in the Identifiler® kit (except in the case of the black font, which corresponds to a dye color of yellow). The first and last common alleles are identified on the y-axis, where a relative allele prevalence of 0 is represented as white-space; a red box indicates a relative prevalence of 1; a blue box indicates a relative prevalence of 2; a green box indicates a relative prevalence of 3; and a magenta box indicates a relative prevalence of 4.
In this scenario, allele detection has effectively been reduced to a binary system such that each allele is deterministically either present or absent; this ultimately manifests itself as a relative prevalence number instead of a peak height or area. Modeling different mixture ratios between contributors would be required to fully encompass the potential effects of allelic drop-out; since the drop-out model employed in this study operates on relative allele prevalence alone, all results assume a 1:1 mixture ratio.
2.2.2.1 Modeling Allele Drop-out
If no drop-out is assumed, summation of the single-source matrices would result in
a mixture profile with all contributed alleles detected; this was considered a “pristine
mixture” profile and is akin to instances in casework in which the mixture proportion
ratio is 1:1 with a total DNA mass input of greater than 0.5 ng into the amplification
process [57]. To account for instances where lower targets of DNA are amplified, allele
drop-out needed to be modeled. To accomplish this, pristine mixtures were perturbed for
varying proportions of drop-out for a heterozygous allele from 0 to 0.9 in increments of
0.1. Here, a drop-out level of 0 means that all of the alleles were detected; in this case,
the “perturbed mixture” is identical to the “pristine mixture.” A non-zero level of drop-
out corresponded to the proportion of time a heterozygously-present allele (i.e., an allele
with a relative prevalence of 1) was not detected. For example, for a
sample with a drop-out proportion of 0.1, each allele with a relative prevalence of 1 stood
a 10% chance of not being detected. Random numbers drawn separately for each allele at
every locus according to the specified proportion of drop-out determined whether a given
allele actually dropped out.
For alleles within a profile that were contributed multiple times (i.e., had a
relative prevalence greater than unity)—either from an individual contributor being
homozygous at that locus or from overlapping alleles between contributors—their
increased prevalence diminished the probability that that particular allele would drop-out.
Therefore, an allele that is twice as prevalent in a mixture is half as likely to completely
drop-out while an allele that is four times as prevalent is one-quarter as likely to drop-out.
Thus, for a particular mixture allele α_{L,A} at a particular locus L with a relative prevalence φ, the expression used to describe the probability of drop-out Pr(D)_{φ}^{α_{L,A}} for that particular mixture allele, given a specified probability of drop-out for a heterozygous allele Pr(D)_{φ=1}, is given by Equation 7.

\Pr(D)_{\varphi}^{\alpha_{L,A}} = \frac{\Pr(D)_{\varphi=1}}{\varphi}

Equation 7: Probability of Allele Drop-out
α_{L,A} ≡ allele A at locus L
φ ≡ relative prevalence of allele (e.g., 0, 1, 2, 3, 4)
Pr(D)_{φ=1} ≡ specified probability of drop-out for a heterozygous allele
Pr(D)_{φ}^{α_{L,A}} ≡ realized probability of drop-out for allele A at locus L

Whether an allele actually dropped out or remained observable was determined through a random number draw weighted with the appropriate probability of drop-out Pr(D)_{φ}^{α_{L,A}}.
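The perturbation step of Equation 7 can be sketched as follows, here in Python rather than the study's MATLAB, with the helper name `perturb` being hypothetical:

```python
# Apply Equation 7's drop-out model to a list of relative prevalences:
# each nonzero entry phi drops out with probability Pr(D | phi=1) / phi.
import random

def perturb(mixture, p_het_dropout, rng):
    """Return a copy with each allele dropped (set to 0) per Equation 7."""
    out = []
    for phi in mixture:
        if phi > 0 and rng.random() < p_het_dropout / phi:
            out.append(0)   # allele not detected at all
        else:
            out.append(phi)
    return out

rng = random.Random(1)
pristine = [1, 2, 4, 0, 1]               # relative prevalences at one locus slice
perturbed = perturb(pristine, 0.0, rng)  # drop-out level 0 preserves the pristine profile
```

A drop-out level of 0 reproduces the pristine mixture, and a heterozygous allele (φ = 1) with a level of 1.0 always drops out, matching the limiting cases described in the text.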
2.2.3 Generating Populations for Comparison
2.2.3.1 Simulating Mixtures
First, profiles for 100,000 single-source samples were simulated to serve as
possible mixture contributors. Random profiles from this set were selected two-at-a-time
and their profiles combined into a mixture. A quality check was employed to ensure that
an individual’s—and potential mixture contributor’s—profile was not selected multiple
times, which would have led to a single person contributing twice to a single “mixture.”
As described in Section 2.2.2, the combination of the two profiles was achieved by
summing the matrix profiles from each contributor to arrive at a matrix profile that represented the pristine mixture under the condition of no allele drop-out. This selection of mixture contributors and summing of contributor profiles was repeated 10,000 times.
The final collection of pristine mixture profiles comprised part of the simulated mixture database; these 10,000 mixture profiles (with no drop-out) collectively represented one mixture set. Perturbations of the pristine mixtures, generated by applying varying levels of allelic drop-out (i.e., 0.10 to 0.90 in increments of 0.10), contributed the rest of the mixture sets, resulting in a total of 10,000 mixtures/set × 10 sets = 100,000 simulated mixtures. These simulated mixtures were later used for comparison to simulated excluded and included single-source profiles.
Figure 6 provides a flow chart representing the process of simulating pristine mixture profiles, along with the associated inputs that defined the parameters of this particular study.
Figure 6: Flow Describing Pristine Mixture Profile Generation Results contained in Sections 3.1 & 3.3 are for npop ≡ number of simulated potential contributor individuals = 100,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
From this master set of pristine mixture profiles, perturbed mixtures were generated, each modeling a different level of allelic drop-out. This flow is depicted in Figure 7.
Figure 7: Flow Used to Generate Perturbed Mixture Profiles Results included in Sections 3.1 & 3.3 incorporate a range of allele drop-out rates increasing from 0% to 90% in increments of 10%. Pr(D)_{φ=1} ≡ specified probability of drop-out for a heterozygous allele
2.2.3.2 Simulating Excluded Individuals
For the exclusion component of the simulation study, 10,000 possible individual profiles were simulated in the same manner as they were for the mixture contributors to create a database.
Subsequently, when comparisons were being made between these excluded types and a particular mixture, a quality check ensured that a given reference profile deemed to be excluded did not in fact contain a collection of alleles that overlapped 100% with the mixture profile under consideration; in other words, the check confirmed that the true relationship between each reference and each mixture was in fact exclusion.
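This quality check can be sketched as a simple containment test. The Python sketch below (the study used MATLAB) models profiles as {locus: set-of-alleles} dictionaries for brevity, an assumption of this illustration rather than the study's matrix layout:

```python
# A reference is truly excluded unless every one of its alleles appears in
# the mixture; otherwise it would qualify as a possible contributor.
def fully_contained(reference, mixture):
    """True if 100% of the reference's alleles appear in the mixture."""
    return all(alleles <= mixture.get(locus, set())
               for locus, alleles in reference.items())

mixture = {"TH01": {6, 9.3}, "TPOX": {8, 11, 12}}
ref_in = {"TH01": {6, 9.3}, "TPOX": {8, 11}}   # all alleles present: possible contributor
ref_out = {"TH01": {6, 7}, "TPOX": {8, 11}}    # allele 7 absent: truly excluded
```

Any reference failing `fully_contained` carries at least one allelic discrepancy and therefore genuinely belongs in the excluded database.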
Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded Individuals Results provided in Sections 3.1 and 3.3 are for npop ≡ number of simulated potential contributor individuals = 100,000, nindivs ≡ number of simulated included individuals = 10,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
Figure 8 demonstrates the expanded simulation methodology as well as the
relationship between the mixture contributor profiles and excluded reference profiles,
along with the relevant simulation inputs.
2.2.3.3 Simulating Included Individuals
For the inclusion component of the simulation study, the same number of comparisons between single-source and mixture profiles was desired as that which took place for the simulated excluded individuals. In the exclusion component of the simulated mixture study, 10,000 excluded individuals were compared recursively to
10,000 mixtures, resulting in a total of 10,000 individuals × 10,000 mixtures = 1 × 10⁸ comparisons. Unlike the exclusion component of the study, in which a single population of individuals was simultaneously excluded from all mixtures, separate populations of included individuals are needed for every individual mixture; in other words, for 10,000 mixtures, 10,000 distinct sets (each containing 10,000 individuals) are needed, with each set appropriate for comparison to a single mixture. Thus, a total of 10,000 included individuals per mixture × 10,000 mixtures = 1 × 10⁸ total included individuals is needed to arrive at an equivalent number of comparisons.
To generate the set of included individuals for a given mixture, that mixture’s matrix profile was considered. In the case of no allelic drop-out, the universe of possible alleles considered for reference profile generation—previously set to mirror Butler et al.’s observed subpopulation frequencies—was collapsed to include only those alleles that were represented in the mixture. Once all of the alleles for each locus had been
identified, the frequencies of allele incidence at a given locus were renormalized to 1 using Butler et al.’s published frequencies [56]. This ensured that the relative subpopulation frequencies between alleles were maintained while limiting the pool of possible alleles. In general, at locus L for a mixture containing n alleles {α_{L,1}, α_{L,2}, …, α_{L,n}} with corresponding subpopulation frequencies {f_{L,1}, f_{L,2}, …, f_{L,n}}, the resulting renormalized frequency R_{L,m} of an allele α_{L,m} is given by Equation 8.

R_{L,m} = \frac{f_{L,m}}{\sum_{i=1}^{n} f_{L,A_i}}

Equation 8: Renormalized Allele Frequency for Generation of Included Individuals
A ≡ collection of alleles at locus L with non-zero allele frequencies
A_i ≡ ith allele in set A
n ≡ number of alleles in A
f_{L,m} ≡ subpopulation frequency of allele m at locus L
R_{L,m} ≡ renormalized subpopulation frequency of allele m at locus L
For the D16S539 locus, for example, the observed alleles (along with their
subpopulation frequencies) were alleles 8 (0.01821), 9 (0.11258), 10 (0.05629), 11
(0.32119), 12 (0.32616), 13 (0.14570), 14 (0.01987). If a given mixture only contained
alleles 9, 11, and 14, then the renormalized frequencies for the generation of individuals would be allele 9 (0.24817), 11 (0.70803), and 14 (0.04380), and all included individuals would contain one of the following genotypes (α₁,α₂): (9,9), (9,11), (9,14), (11,11), (11,14), or (14,14).³
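Equation 8 and the worked D16S539 values can be reproduced directly. This Python sketch (the study used MATLAB) takes the mixture to contain alleles 9, 11, and 14, the combination consistent with the renormalized values quoted above; the helper name `renormalize` is hypothetical:

```python
# Restrict a locus's allele frequencies to the mixture alleles and rescale
# so the retained frequencies sum to 1 (Equation 8).
# D16S539 frequencies are those quoted above from Butler et al.
d16s539 = {8: 0.01821, 9: 0.11258, 10: 0.05629, 11: 0.32119,
           12: 0.32616, 13: 0.14570, 14: 0.01987}

def renormalize(freqs, mixture_alleles):
    """Keep only mixture alleles, preserving their relative frequencies."""
    total = sum(freqs[a] for a in mixture_alleles)
    return {a: freqs[a] / total for a in mixture_alleles}

renorm = renormalize(d16s539, [9, 11, 14])
```

The rescaling preserves the ratios among the retained alleles while forcing every simulated included individual to draw only mixture alleles.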
It should be noted that restricting the included individuals only to include the
mixture contributors that actually contributed to these simulated mixtures provides an
³ (α₁,α₂) is genotypically equivalent to (α₂,α₁).
insufficiently large population (i.e., consisting of only two individuals) from which to make comparisons. The actual mixture contributors represent only a subset of all individuals that could have contributed to a given profile. That is, just because the combination of C₁ and C₂’s profiles resulted in a collection of alleles in mixture M₁ does
not necessarily exclude the possibility that the combination of C₃ and C₄’s profiles could produce that same combination of alleles. In fact, the likelihood that any given person could reasonably have contributed to a mixture increases as the number of contributors—and thus the total collection of mixture alleles—increases. To ease the computational burden of simulating extraordinary quantities of individuals until one was serendipitously included in a given mixture, the prescribed methodology constrains the simulation space to produce included individuals in a more efficient manner. Such individuals that are forced—by simulation—to have profiles included in a given mixture should be determined to be potential contributors by an analyst.
Figure 9 depicts the simulation process for generating included individuals, along with the simulation inputs used in this study.
Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture Profiles) Results provided in Sections 3.1 and 3.3 are for nindivs ≡ number of simulated included individuals = 10,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
2.2.4 Comparing Populations and Counting Allelic Discrepancies
As previously described, profiles within each of the databases (mixtures, excluded individuals, included individuals) existed as 59 × 15 matrices. To compare two profiles, the single-source matrix was first considered to see which alleles were present at each locus. The mixture profile matrix was then considered to see how many of those single-source alleles it contained.
In accordance with the “relative prevalence” model, allele detection for mixtures was assessed in a binary fashion: for the purposes of comparison, each allele was either considered detected or not detected, without regard to the “strength” of presence, which might be interpreted through peak height or area. Accordingly, each reference allele from a heterozygous locus not contained in the mixture counted as one allelic discrepancy, and the reference alleles from a homozygous locus not contained in the mixture counted as two allelic discrepancies. Each individual locus could have zero, one, or two allelic discrepancies with a corresponding mixture locus. This discrepancy tallying was completed for all alleles at all loci in the single-source matrix. Thus, since the Identifiler® kit tests 15 autosomal loci, the range of total possible discrepancies between a given reference profile and a given mixture profile included integers ranging from 0 to 30.
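The tallying rule just described can be sketched over the matrix representation. This Python/NumPy illustration stands in for the study's MATLAB, and the helper name `count_discrepancies` is hypothetical:

```python
# Count allelic discrepancies between a reference and a mixture matrix.
# Binary detection model: a mixture allele is either present (>0) or absent.
# A missing heterozygous reference allele adds 1; a missing homozygous
# reference allele adds 2 (its full relative prevalence).
import numpy as np

def count_discrepancies(reference, mixture):
    """Total allelic discrepancies between reference and mixture matrices."""
    detected = mixture > 0
    # Where the mixture lacks a reference allele entirely, the reference's
    # prevalence (1 or 2) at that cell is counted as discrepant.
    return int(np.where(detected, 0, reference).sum())

reference = np.zeros((59, 15), dtype=int)
mixture = np.zeros((59, 15), dtype=int)
reference[0, 0] = 2   # homozygous allele absent from mixture -> 2 discrepancies
reference[1, 1] = 1   # heterozygous allele present in mixture -> 0 discrepancies
mixture[1, 1] = 3
reference[2, 2] = 1   # heterozygous allele absent -> 1 discrepancy
```

With 15 loci and at most two discrepancies per locus, the total for any comparison falls between 0 and 30, as stated above.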
2.3 Validation Study Using Laboratory Mixture Data
For the error analysis study using laboratory mixture data, the population under
study consisted of actual, amplified mixture samples, which were compared to the
already-generated database of simulated excluded individuals and a separately-generated database of simulated included individuals.
2.3.1 Profile Typing Materials and Methods
Full details of sample collection, preparation, extraction, amplification, and analysis for these laboratory mixtures are described elsewhere [58]. Two mixtures consisting of two individuals each (for a total of four individuals) mixed in a 1:1 ratio were considered for six starting masses of DNA template: 2.0 ng, 1.0 ng, 0.5 ng, 0.25 ng,
0.125 ng, and 0.0625 ng.
2.4 Data Interpretation Framework
2.4.1 Organizing Comparison Results
The comparison of all single-source genotypes to a particular mixture within a mixture set yielded a vector of allelic discrepancy values. The total compilation of all comparisons between all individuals and all mixtures within a mixture set yielded a collection of allelic discrepancy values that were plotted as a histogram. In all, 10,000 individuals were compared to 10,000 mixtures, resulting in a total of 1×10⁸ comparisons represented in such a histogram.
Each histogram bin τδ contained a count ρδ of the number of comparisons between reference and mixture profiles out of the total number ψ of comparisons that had exactly
δ allelic discrepancies between reference and mixture profiles, where δ was an integer
ranging from 0 to 30.
An example histogram organizing a collection of discrepancy values for 10,000 individuals compared to a single mixture is shown in Figure 10. For example, reading the bin corresponding to 15 allelic discrepancies (τδ=15) for the histogram yielded the following information: exactly 15 discrepancies (δ = 15) were observed in 1060 reference-to-mixture comparisons (ρ15 = 1060) out of the total of 10,000 reference-to-mixture comparisons (ψ = 10,000) comprising this histogram.
Figure 10: Example Histogram Tabulating Discrepancies Between 10,000 References and 1 Mixture The height of the bar for a given histogram bin τδ represents the number of reference-to- mixture comparisons that resulted in δ allelic discrepancies, where an allelic discrepancy is defined as a reference allele that is not present in the mixture profile.
Within the context of this study, this histogram represents the collection of
discrepancies resulting from the comparison of a particular mixture set with a particular
single-source database (i.e., a set of either excluded individuals or included individuals).
Completing comparisons for multiple mixture sets yielded multiple histograms, each
representing the results from comparing a database of individuals with a set of mixtures
exhibiting a given rate of allelic drop-out. Here, just one histogram is considered.
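The tallying that produces such a histogram can be sketched as follows (an illustrative helper, not the study's actual script; bin counts and inputs are toy values):

```python
from collections import Counter

def discrepancy_histogram(discrepancy_values, max_delta=30):
    """Tally comparisons by discrepancy count.

    Returns rho, where rho[delta] is the number of comparisons that produced
    exactly delta allelic discrepancies (delta = 0..30 for 15 autosomal loci).
    """
    counts = Counter(discrepancy_values)
    return [counts.get(delta, 0) for delta in range(max_delta + 1)]

# Toy collection of discrepancy values from seven hypothetical comparisons.
rho = discrepancy_histogram([0, 2, 2, 15, 15, 15, 30])
print(rho[2], rho[15], sum(rho))  # 2 3 7
```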
Using the same data, the histogram depicted in Figure 10 was recast as the
cumulative normalized histogram shown in Figure 11. Each cumulative normalized
histogram bin represented the fraction of the total number of comparisons between
reference and mixture profiles that had less than or equal to δ allelic discrepancies (i.e., $\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$). For example, ~0.831 or ~83.1% of all comparisons resulted in 15 or fewer discrepancies (i.e., $\frac{1}{\psi}\sum_{i=1}^{15}\rho_i = 0.831$).
Figure 11: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies The height of the bar for a given cumulative normalized histogram bin τδ represents the number of reference-to-mixture comparisons that resulted in δ or less allelic discrepancies.
Since each bin’s magenta bar corresponded to the fraction $\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$ of comparisons with discrepancies less than or equal to δ, the proportion of total comparisons with discrepancies greater than δ was $1-\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$. These complementary quantities are included in Figure 12 as gray bars.
(To facilitate the discussion of later analysis, the magenta bar in each bin was stacked on
top of the complementary gray bar in that bin; the data represented by the magenta bars
are identical between Figure 11 and Figure 12.) Another way to read the plot is to
consider an allelic discrepancy bin, e.g., τ15. The magenta bar in that bin represents the proportion of individual-to-mixture comparisons resulting in δ or fewer discrepancies, while the gray bar in that same bin represents the proportion of individual-to-mixture comparisons resulting in more than δ discrepancies. As an example, for τ15, 83.1% of all reference-to-mixture comparisons yielded 15 or fewer allelic discrepancies while 16.9% of comparisons yielded 16 or more allelic discrepancies.
Figure 12: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies (two-color) Each magenta bar corresponds to the number of individual-to-mixture comparisons exhibiting δ or less discrepancies. Each gray bar corresponds to the number of individual-to-mixture comparisons exhibiting more than δ discrepancies.
All of the data comparing single-source profiles to mixture profiles—whether
involving individuals that are excluded or included, or mixtures that originate from
simulation or from the laboratory—were cast in these summary histograms.
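The recasting into a cumulative normalized histogram amounts to a running sum of the bin counts divided by the total number of comparisons ψ. A minimal sketch (illustrative, not the study's actual code):

```python
def cumulative_normalized(rho):
    """Fraction of comparisons with delta-or-fewer discrepancies, per bin.

    rho[delta] is the count of comparisons with exactly delta discrepancies.
    """
    psi = sum(rho)            # total number of comparisons
    running = 0
    fractions = []
    for count in rho:
        running += count
        fractions.append(running / psi)
    return fractions

# Toy histogram with psi = 10 comparisons.
print(cumulative_normalized([1, 2, 3, 4]))  # [0.1, 0.3, 0.6, 1.0]
```

The last bin always equals 1.0, since every comparison has 30 or fewer discrepancies.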
2.4.2 Making Determinations of Exclusion or Inclusion
The next analysis step involves considering how a DNA analyst, when presented
with a reference profile and a mixture profile, would make a determination of inclusion
versus exclusion when comparing the two profiles. In reality, making a determination of inclusion versus exclusion is not always straightforward, and is in fact rarely so when low template quantities or mixture data are being considered. The comparison and resulting decision involve nuanced consideration of many complex aspects of profile analysis and interpretation. A decision need not be binary—i.e., the individual is either included or excluded—and an analyst must offer statistics and/or likelihood metrics to contextualize their determination [10,11].
A simplistic yet not irrelevant means of modeling an analyst’s decision is to enforce a decision threshold on the number of allelic discrepancies tolerated while still concluding that an individual is included in a mixture. Under such a scheme, the strictest
decision threshold would equate to tolerating zero allelic discrepancies, which
corresponds to an analyst insisting that every allele present in the reference profile appear
in the mixture profile in order to conclude that the reference is a potential contributor to
that mixture. Alternatively, a laboratory may choose to establish a complexity threshold
such that, given a particular set of circumstances—e.g., too much allelic drop-out—one
may decide that the inherent likelihood of error is too high to render an accurate
conclusion; accordingly, the interpretation of the mixture would cease (i.e., before Step 3
in Table 3).
Enforcing a decision threshold allows for the deterministic modeling of the
analyst decision process. Under such a paradigm, an analyst operating under a particular
laboratory protocol might, for example, tolerate the absence of up to six reference alleles
(presumably due to allelic drop-out) in the mixture profile and still declare the reference
to be included in that mixture. In other words, six of the reference alleles (i.e., two at
three loci, one at six loci, or other combinations summing to six) are “missing” from the
mixture profile; one possible reason for the “missing alleles” could be presumed drop-out
in the mixture profile. Accordingly, for a given profile, the analyst would accept any
mixture profile that contained 0 to 6 allelic discrepancies with the reference profile as supporting a conclusion of inclusion. In this case, the analyst’s decision threshold is τδ=6 .
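This thresholded decision model reduces to a one-line rule (the function name is hypothetical):

```python
def analyst_call(delta, threshold):
    """Deterministic decision model: declare inclusion if and only if the
    number of allelic discrepancies delta does not exceed the tolerated
    decision threshold tau."""
    return "inclusion" if delta <= threshold else "exclusion"

# With a tolerance of tau = 6, six missing reference alleles are still
# accepted; a seventh tips the call to exclusion.
print(analyst_call(6, 6))  # inclusion
print(analyst_call(7, 6))  # exclusion
```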
Consider the example cumulative normalized histogram in Figure 12, with the
reference-to-mixture comparisons assumed to involve individuals that truthfully did not
contribute to the mixtures; that is, the individuals really ought to be excluded. In the
context of Table 6, this corresponds to “reality: H0 true; H 1 false,” which confines
the analyst decision consequences to the bottom row of the contingency table: True
negative or false positive (Type I Error). The analyst will make the correct decision (i.e.,
true negative) if truthfully excluded individuals (i.e., non-contributors) are excluded;
conversely, if an analyst were to include truthfully excluded individuals, it would be in
error (i.e., false positive). For example, if the decision threshold was set at 6, the magenta bar at τδ=6 represents the fraction of comparisons for which false positive errors were made. The false positive rate at a decision threshold of τδ=6, then, is the quotient of the total false positives and total negatives (i.e., the total number of excluded individuals): $\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$. The gray bar at τδ=6 represents the fraction of comparisons for which correct conclusions are made. The true negative rate is the quotient of total true negatives and total negatives (i.e., the total number of excluded individuals): $1-\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$.
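Assuming every comparison in the histogram involves a truly excluded individual, the two rates described above follow directly from the bin counts (hypothetical helper names):

```python
def false_positive_rate(rho, threshold):
    """Fraction of truly-excluded comparisons wrongly included at tau_threshold:
    the cumulative count of comparisons with <= threshold discrepancies,
    divided by the total number of comparisons psi."""
    psi = sum(rho)
    return sum(rho[:threshold + 1]) / psi

def true_negative_rate(rho, threshold):
    """Complementary fraction of comparisons correctly excluded."""
    return 1.0 - false_positive_rate(rho, threshold)

# Toy histogram: one comparison each at delta = 0..3, 96 at delta = 4.
rho = [1, 1, 1, 1, 96]
print(false_positive_rate(rho, 2))  # 0.03
print(true_negative_rate(rho, 2))   # 0.97
```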
Alternatively, consider a separate example. In this case, consider the example histogram in Figure 12, with the individual-to-mixture comparisons now instead assumed to involve individuals that could have contributed to the mixtures; that is, the individuals are, in reality, included. In the context of Table 6, this corresponds to “reality: H0 false; H1 true,” which confines the analyst decision consequences to the top row of the contingency table: false negative (Type II Error) or true positive. The analyst will make the correct decision (i.e., true positive) if individuals who truthfully ought to have been included (i.e., possible contributors) are included; conversely, if an analyst were to exclude truthfully included individuals, it would be in error (i.e., false negative).
Any time a decision threshold is set at a level that excludes any of the magenta bars, some errors are being made. If the decision threshold was set at τδ=6, the gray bar at τδ=6 represents the fraction of comparisons for which false negative errors were made. The false negative rate represented by a decision threshold at τδ=6, then, is the quotient of the total false negatives and total positives (i.e., the total number of included individuals): $1-\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$. The magenta bar at τδ=6 represents the fraction of comparisons for which correct calls are made. The true positive rate is the quotient of total true positives and total positives (i.e., the total number of included individuals): $\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$; in the literature, another term for true positive rate is recall [59].
In the literature, the quotient computed by dividing the number of true positives by the sum of true positives and false positives is termed precision. The quotient computed by dividing the sum of true positives and true negatives by the sum of total positives and total negatives is termed accuracy [59].
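These definitions can be collected into one small helper (the confusion-matrix counts below are illustrative, not results from this study):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard rates from confusion-matrix counts:
    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives."""
    recall = tp / (tp + fn)                    # true positive rate
    precision = tp / (tp + fp)                 # fraction of inclusions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, accuracy

# Hypothetical tallies: 80 TP, 10 FP, 90 TN, 20 FN.
r, p, a = classification_metrics(80, 10, 90, 20)
print(round(r, 3), round(p, 3), round(a, 3))  # 0.8 0.889 0.85
```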
3 RESULTS AND DISCUSSION
Although preferable, all of a standard’s alleles need not be present in order to
conclude that an individual should or should not be considered a possible contributor to a
mixture. Making determinations in the face of incomplete information, though, requires
consideration of countervailing implications of decision thresholding as well as
concomitant error rates. If the number of tolerated allelic discrepancies is set too low,
then individuals that ought to have been included are not; conversely, if the number of
tolerated discrepancies is too high, then individuals that ought to have been excluded are
not. In other words, a tradeoff exists between properly including contributors and
improperly excluding non-contributors.
For the simulated error analysis study (Section 3.1), simulation methods were employed to probe the relationship between correct and incorrect determinations of inclusion and exclusion as they varied with analyst laxity in match criteria and with probability of allelic drop-out. The tradeoff between false positive and false negative errors was explored as a function of drop-out. The results from this error analysis study using simulated profiles were then compared with laboratory mixture data (Section 3.2).
3.1 Error Analysis Study Using Simulated Data
3.1.1 Comparing Simulated Mixtures to Simulated Excluded Individuals
In the case of comparisons between the simulated mixtures and the simulated excluded standards, all of the individuals under consideration should, in truth, be
excluded by a DNA analyst. If an analyst were to conclude otherwise, it would be in error. Given a decision criterion, an analyst can either fail to reject H0, which corresponds to excluding the individual from the mixture, or reject H0, which corresponds to including the individual in the mixture (see Table 6). Since all of these simulated
individuals really ought to be excluded, an analyst determination of exclusion properly
identifies the individual as a true negative. Conversely, an incorrect analyst
determination of inclusion represents a false positive.
Tallying the discrepancies observed in all of the comparisons between individuals and 2-contributor mixtures exhibiting no drop-out (i.e., Pr(D)|φ=1 = 0.00) yields the three histograms contained in Figure 13.
Figure 13: Summary histograms: Excluded-Individual-to-Mixture Comparisons (No Drop-Out) The top histogram counts how many allelic discrepancies were found in all reference-to- mixture comparisons. The middle histogram cumulatively bins those results. The bottom histogram (Figure 13c) normalizes those cumulative bin results by the total number of comparisons made.
Using the results depicted in the cumulative normalized histogram in Figure 13c and mapping the two possible analyst actions (i.e., determinations) to the corresponding colors from Table 6 yields Figure 14.
Figure 14: Results Comparing Excluded Individuals to Simulated Mixtures (No Drop-Out) This figure presents the same data depicted in Figure 13c.
Discrepancies were approximately normally distributed with a mean of approximately 13 discrepancies and a standard deviation of about 3 discrepancies. For a given analyst allelic discrepancy tolerance δ, the green bar ($1-\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$) at the associated bin τδ corresponds to the total proportion of excluded-individual-to-mixture comparisons resulting in a correct exclusion while the complementary red bar ($\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$) at that same bin corresponds to the proportion of comparisons resulting in incorrect inclusions.
As an example, consider the case in which a laboratory is given a population of
reference profiles that truthfully should be excluded from a population of mixture
profiles. In that case, if a laboratory assessed individual-to-mixture “matches” while
allowing for up-to-six allelic discrepancies, their analysts would make correct exclusions
99.1% of the time while incorrectly being unable to exclude (i.e., including) 0.9% of the
time. As the number of allowed discrepancies increases, the probability of incorrectly including a truthfully excluded individual increases and is greater than 5% at a decision threshold that tolerates 8 discrepancies.
Equivalent analyses were performed for varying levels of drop-out, and the results for excluded-individuals-to-mixtures comparisons are shown in Figure 15.
Figure 15: Results Comparing Excluded Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 Green bars correspond to Pr(Correct Exclusion), i.e., true negatives. Red bars correspond to Pr(Incorrect Inclusion), i.e., false positives.
For a given decision threshold τδ, Figure 15 shows that as Pr(D) increases, so does
the chance of correctly excluding a standard that truly should have been excluded. A
higher rate of drop-out results in less allelic information being available in the mixture
profile, which in turn increases the number of discrepancies between reference and mixture profiles. An example of this counterintuitive trend is illustrated in Table 7.
Table 7: Example of Excluded-Individual-to-Mixture Comparison The chance of correctly excluding a truthfully excluded individual increases as the probability of drop-out increases.

Non-Contributor Individual      Locus 1       Locus 2       Locus 3
Alleles of Non-Contributor      7, 11         8, 9          13, 15

Pr(D) = 0.00
  Detected Mixture Alleles      7, 8, 9, 11   5, 8, 9, 14   11, 13, 14
  Discrepant Allele(s)          ∅             ∅             15
  Decision Threshold τ0 ……      Excluded since δ=1 > τ0
  Decision Threshold τ1 ……      Included since δ=1 ≤ τ1
  Decision Threshold τ2 ……      Included since δ=1 ≤ τ2

Pr(D) = 0.20
  Detected Mixture Alleles      7, 8, 9       8, 9, 14      11, 13, 14
  Discrepant Allele(s)          11            ∅             15
  Decision Threshold τ0 ……      Excluded since δ=2 > τ0
  Decision Threshold τ1 ……      Excluded since δ=2 > τ1
  Decision Threshold τ2 ……      Included since δ=2 ≤ τ2

Pr(D) = 0.50
  Detected Mixture Alleles      8             9, 14         11, 14
  Discrepant Allele(s)          7, 11         8             13, 15
  Decision Threshold τ0 ……      Excluded since δ=5 > τ0
  Decision Threshold τ1 ……      Excluded since δ=5 > τ1
  Decision Threshold τ2 ……      Excluded since δ=5 > τ2
Collapsing the individual-to-mixture comparison to three loci in Table 7 for simplicity yields an individual for whom five of six alleles (i.e., Alleles 7 & 11 at Locus 1; Alleles 8 & 9 at Locus 2; and Allele 13 at Locus 3) are included in the mixture for Pr(D) = 0.00, leaving a single discrepancy (Allele 15 at Locus 3). Since the individual is truly a non-contributor, an inclusion under any tolerance of δ ≥ 1 would be adventitious (i.e., a false positive). The detection of fewer mixture alleles with increasing Pr(D) corresponds to an increase in the number of observed allelic discrepancies. In this example, two discrepancies (Allele 11 at Locus 1; Allele 15 at Locus 3) are evident for Pr(D) = 0.20, while five discrepancies (Alleles 7 & 11 at Locus 1; Allele 8 at Locus 2; Alleles 13 & 15 at Locus 3) exist for Pr(D) = 0.50. Since the individual considered in this example is truthfully a non-contributor to the mixture, the incidence of more allelic discrepancies increases the likelihood that the correct determination of exclusion (i.e., true negative) is reached. Had the individual truthfully contributed to the mixture, then that same phenomenon of diminished mixture allele detection would have resulted in an increased likelihood of incorrectly excluding the included individual (i.e., false negative).
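The drop-out effect illustrated in Table 7 can be mimicked by deleting each detected mixture allele independently with probability Pr(D). This is a simplified per-detected-allele sketch with illustrative names, not the study's simulation code:

```python
import random

def apply_dropout(mixture, pr_d, rng):
    """Remove each detected mixture allele independently with probability pr_d.

    mixture: dict mapping locus name -> set of detected mixture alleles
    rng:     a random.Random instance (seeded for reproducibility)
    """
    return {locus: {a for a in alleles if rng.random() >= pr_d}
            for locus, alleles in mixture.items()}

# Toy mixture loosely modeled on Table 7's loci.
mix = {"Locus 1": {7, 8, 9, 11}, "Locus 2": {5, 8, 9, 14}, "Locus 3": {11, 13, 14}}
print(apply_dropout(mix, 0.0, random.Random(0)) == mix)  # True: nothing drops
print(all(not v for v in apply_dropout(mix, 1.0, random.Random(0)).values()))  # True
```

As Pr(D) grows, fewer mixture alleles survive, so any reference profile accrues more discrepancies, which helps truly excluded individuals but hurts truly included ones.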
3.1.2 Comparing Simulated Mixtures to Simulated Included Individuals
In the case of comparisons between the simulated mixtures and the simulated included genotypes, all of the standards under consideration should, in truth, be included.
A contrary conclusion of exclusion would be in error. Considering Table 6, individual-to-mixture comparisons that involve the simulated population of included individuals correspond to the case when “reality: H0 false; H1 true,” which corresponds to the top row in the table. Given a decision criterion, an analyst can either fail to reject H0, which corresponds to excluding the individual from the mixture, or reject H0, which corresponds to including the individual in the mixture. Since all of these simulated individuals really ought to be included, a determination of inclusion properly identifies the individual as a true positive. Conversely, an incorrect determination of exclusion represents a false negative.
Tabulating all of the comparisons between individuals and 2-contributor mixtures exhibiting no drop-out (i.e., Pr(D)|φ=1 = 0.00) and mapping the two possible analyst conclusions to the corresponding colors from Table 6 yields Figure 16.
Figure 16: Results Comparing Included Individuals to Simulated Mixtures (No Drop-Out)
For reference-to-mixture comparisons in which the individuals are deliberately simulated to have viable mixture-contributor profiles for a collection of respective mixtures whose profiles are fully observed (i.e., Pr(D)|φ=1 = 0.00), Figure 16 bears out the observation that those individuals will always be assessed to be included in the mixtures. This serves as a quality check that the simulation scripts generating individuals that ought to be included are, in fact, operating as intended. The only way a truthfully included individual could be judged to have discrepancies with a mixture to which he contributed is if part of the mixture profile is unobservable; in other words, discrepancies in included-individual-to-mixture comparisons will only occur when the mixtures experience drop-out (i.e., Pr(D)|φ=1 ≠ 0.00).
Results for included-individuals-to-mixtures comparisons for different levels of drop-out are shown in Figure 17.
Figure 17: Results Comparing Included Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 Yellow bars correspond to Pr(Incorrect Exclusion), i.e., false negatives. Blue bars correspond to Pr(Correct Inclusion), i.e., true positives.
For a given level of allelic discrepancy tolerance, as Pr(D) increases, mixture profiles are more likely to be missing alleles. Mixtures missing alleles, due to drop-out, are less likely to have commonality with a given reference profile; in other words, more allelic discrepancies will be observed. This is because the mixture profile has a diminished “allelic collection” from which to possibly match an individual’s alleles.
Thus, as Pr(D)|φ=1 increases, the incidence of false negatives (Type II errors) increases.
3.2 Validation Study Using Laboratory Mixture Data
An analogous investigation was conducted into the error rates associated with analyst calls of exclusion and inclusion as a function of analyst decision criteria and allelic drop-out using laboratory mixture data. Though a priori probabilities of allelic drop-out are not known for the laboratory mixtures, the realized level of drop-out is observed to increase as the pre-amplification DNA template quantity diminishes.
3.2.1 Comparing Laboratory Mixtures to Simulated Excluded Individuals
In the case of comparisons between the two laboratory mixtures and the simulated excluded standards, all of the individuals under consideration should, in truth, be excluded. Since all of these simulated standards really ought to be excluded, an analyst determination of exclusion properly identifies the individual as a true negative.
Conversely, an incorrect determination of inclusion represents a false positive.
In reality, the a priori probability of drop-out is not known for forensic evidence samples. Instead, the likelihood of drop-out is understood to be related to the quantity of the sample’s DNA template before PCR is performed. The availability of more DNA starting material leads to increased electropherogram peak heights, which in turn leads to a greater likelihood of observing more of a profile and less allelic drop-out [45,60].
The range of starting DNA template constituting different amplification runs in this study was 2 ng, 1 ng, 0.5 ng, 0.25 ng, 0.125 ng, and 0.0625 ng. Since all mixtures considered were combined in a 1:1 ratio between contributors, those mixture DNA template
quantities in turn correspond to individual contributor DNA template quantities of 1 ng,
0.5 ng, 0.25 ng, 0.125 ng, 0.0625 ng, and 0.03125 ng, respectively.
Tallying all of the comparisons between excluded individuals and the dilution series of 2-contributor mixtures and mapping the two possible analyst actions to different colors yields the summary plots in Figure 18.
Figure 18: Results Comparing Excluded Individuals to 1:1 Laboratory Mixtures Green bars correspond to Pr(Correct Exclusion) , i.e., true negatives . Red bars correspond to Pr(Incorrect Inclusion) , i.e., false positives .
For laboratory mixture amplifications with starting template masses of 2 ng, 1 ng,
0.5 ng, and 0.25 ng, no allelic drop-out is observed. Accordingly, their distributions of true negatives and false positives are statistically indistinguishable. For the 0.125 ng and 0.0625 ng cases, some drop-out of mixture alleles occurred. This leads to an increased probability of correctly excluding a truly excluded individual with a
concomitant decrease in the probability of incorrect inclusion. This result is consistent with the trend demonstrated in Figure 15 and Table 7.
3.2.2 Comparing Laboratory Mixtures to Simulated Included Individuals
In the case of comparisons between the laboratory mixtures and the simulated
included individuals, all of the individuals under consideration should, in truth, be
included by a DNA analyst. Since all of these simulated individuals really ought to be
included, an analyst determination of inclusion properly identifies the individual as a true
positive. Conversely, an incorrect determination of exclusion represents a false negative.
Tabulating all of the comparisons between included individuals and 2-contributor
mixtures and mapping the two possible analyst actions to different colors yields the
summary plots in Figure 19.
Figure 19: Results Comparing Included Individuals to Laboratory Mixtures Yellow bars correspond to Pr(Incorrect Exclusion), i.e., false negatives. Blue bars correspond to Pr(Correct Inclusion), i.e., true positives.
For laboratory mixture amplifications with starting template masses of 2 ng, 1 ng,
0.5 ng, or 0.25 ng, no allelic drop-out is observed. Since all mixture alleles are observed, there is no chance of incorrectly excluding a truthfully included individual. This condition of Pr(D) = 0 matches the simulated result depicted in Figure 16. For the 0.125 ng and 0.0625 ng cases, some drop-out of mixture alleles occurred. This led to an increased probability of false exclusion with a concomitant decrease in the probability of correct inclusion. This result is generally consistent with the trend demonstrated in Figure 17. Some deviation of this trend from a strictly increasing error with increasing drop-out is likely attributable to variability arising from the small sample size; for these laboratory-amplified mixtures, only two 1:1 mixtures (at each level of drop-out) were available for analysis.
3.3 Impact of Analyst Decision Threshold on Expected Errors
In establishing an acceptable decision criterion, there is a tradeoff between minimizing false positives and minimizing false negatives. In the cases in which an individual should be excluded, the tolerance of discrepant alleles should be minimized in order to avoid false inclusions; this, in turn, maximizes the number of “true negative” determinations. In the cases in which an individual really should be included, tolerating an increased number of discrepancies may be necessary in order to avoid false exclusions; this, in turn, maximizes the number of “true positive” determinations. A
germane analysis paradigm employed in other disciplines to perform error tradeoff analysis is called ROC analysis.
3.3.1 Receiver Operating Characteristic (ROC) Analysis
Typical ROC analysis plots true positive rate versus false positive rate for a range
of possible decision thresholds. In the context of reference-to-mixture comparisons, this
amounts to simultaneously imposing a decision threshold (of allelic discrepancies) on
each of the two-color analyst histograms—the ones for excluded individuals (e.g.,
Section 3.1.1 for simulated mixtures; Section 3.2.1 for laboratory mixtures) and the ones
for included individuals (Section 3.1.2 for simulated mixtures; Section 3.2.2 for
laboratory mixtures)—for a given probability of drop-out. An example visualization
scheme for combining excluded-individual and included-individual data for
Pr(D)|φ=1 = 0.80 using the simulated mixture data is shown in Figure 20. The excluded-individual-to-mixture comparison data occupies the bottom “row” of the figure, while the included-individual-to-mixture comparison data is stacked on top of it. Each “row” of
data has a separate y-axis ranging from 0 to 1, which represents the cumulative
normalized probability for analyst calls of exclusion and inclusion.
Figure 20: Example Simultaneous Visualization of Exclusion & Inclusion Comparisons Data on the bottom portion of the plot (contained in green and red bars) represent comparisons involving truthfully excluded individuals, while truthfully included individuals are represented on the top portion of the plot (by yellow and blue bars). Data shown are for simulated data with Pr(D)=0.80.
As an example, the τδ=21 bin is boxed to represent an analyst decision threshold to tolerate δ=21 discrepancies while still declaring a reference to be included in a mixture.
For τδ=21, the excluded-individual-to-mixture comparisons yield a false positive error (i.e.,
incorrect inclusion; the red bar at the τδ=21 bin) approximately 18.1% of the time, while
the included-individual-to-mixture comparisons yield a true positive rate (i.e., correct
inclusion; the blue bar at the τδ=21 bin) of approximately 82.3%.
Varying the decision threshold across the full range of possible allelic discrepancy
values and recording the associated false positive and true positive rates populates a table
of the form shown in Table 8. Once the table is completely populated, plotting the false positive and true positive rates as the abscissa and ordinate quantities, respectively, results in the ROC plot depicted in Figure 21.
Table 8: Notional Table of False Positive & True Positive Rates for Different Decision Thresholds

Decision Threshold    τδ=0  τδ=1  τδ=2  τδ=3  τδ=4  …  τδ=21  …  τδ=30
False Positive Rate                                      0.181
True Positive Rate                                       0.823
Figure 21: Example ROC Plot Each possible decision threshold results in an ordered pair (false positive, true positive ) that corresponds to performance expectations for the associated, allelic-discrepancy decision criterion specified by τδ. Optimal performance, which corresponds to a false positive rate of 0 and a true positive rate of 1, occurs at the point (0,1 ) in the upper-left of the chart. The dotted line corresponding to y = x represents random performance, equivalent to a scheme that guesses “inclusion” and “exclusion” a certain percentage of the time without regard to any profile information. The complementary scales to false positive rate and true positive rate—true negative rate and false negative rate, respectively—are also shown. Data shown are for simulated data with Pr(D)=0.80.
Each point in the ROC space is associated with a particular decision threshold τδ.
Perfect performance is represented by the point (0,1), which corresponds to a false positive rate of 0 and a true positive rate of 1. Though perfect performance is not
expected in the presence of drop-out, performance (i.e., accuracy) increases as one approaches the upper-left corner of the ROC space. Any decision criterion that results in an ROC performance curve lying along the 45° diagonal (i.e., true positive rate = false
positive rate) represents the case in which the decision criterion is unable to provide any
marginal discrimination over a random guess. In other words, if an analyst guesses
inclusion 80% of the time, the analyst will correctly diagnose 80% of the truthful
inclusions as inclusions but also incorrectly diagnose 80% of truthful exclusions as
inclusions; if an analyst guesses inclusion 50% of the time, the analyst will correctly
diagnose 50% of the truthful inclusions as inclusions but also incorrectly diagnose 50%
of truthful exclusions as inclusions.
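The threshold sweep that pairs each τδ with a (false positive rate, true positive rate) point, as in Table 8 and Figure 21, can be sketched as follows (a hypothetical helper operating on the two discrepancy histograms):

```python
def roc_points(rho_excluded, rho_included):
    """Sweep every decision threshold tau_delta and record the resulting
    (false positive rate, true positive rate) ordered pairs.

    rho_excluded: per-delta counts from comparisons against truly excluded individuals
    rho_included: per-delta counts from comparisons against truly included individuals
    """
    psi_ex, psi_in = sum(rho_excluded), sum(rho_included)
    fp = tp = 0
    points = []
    for delta in range(len(rho_excluded)):
        fp += rho_excluded[delta]   # comparisons now (wrongly) included
        tp += rho_included[delta]   # comparisons now (correctly) included
        points.append((fp / psi_ex, tp / psi_in))
    return points

# Toy histograms over delta = 0..3 (10 comparisons each).
pts = roc_points([0, 1, 4, 5], [6, 2, 1, 1])
print(pts[0])   # (0.0, 0.6)
print(pts[-1])  # (1.0, 1.0) -- the most permissive threshold includes everyone
```

The most tolerant threshold always lands at (1, 1), since every comparison is then called an inclusion; the strictest thresholds sit near the lower left.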
3.3.1.1 Rollup ROC Results for Simulated Mixture Data
The data from Section 3.1 is reproduced in Figure 22 using the visualization
scheme introduced in Section 3.3.1 to facilitate ROC curve generation.
Figure 22: Compilation of Exclusion & Inclusion Data Using Simulated Mixtures As in Section 3.3.1, the bars represent cumulative, normalized fractions, with the red and green bars for a given bin summing to 1 and—separately—the blue and yellow bars for a given bin summing to 1. Green represents Pr(Correct Exclusion); red, Pr(Incorrect Inclusion); yellow, Pr(Incorrect Exclusion); blue, Pr(Correct Inclusion).
Compiling the ROC results for all levels of drop-out analyzed with the simulated mixture data produces Figure 23.
Figure 23: Error Analysis Results: ROC Rollup Using Simulated Mixtures Decision threshold labels have been omitted for legibility.
Figure 23 shows the correct inclusion rate (calculated from the inclusion data)
versus the incorrect inclusion rate (calculated from the exclusion data) at various levels of
drop-out for the full range of possible discrepancy decision criteria. Figure 23 shows that
as the tolerated number of allelic discrepancies (i.e., the decision threshold) increases,
both the correct inclusion and incorrect inclusion fractions increase (while the incorrect
exclusion and correct exclusion fractions experience complementary decreases), albeit at
different rates. Further, as the level of drop-out increases for a given decision threshold,
both the correct inclusion and incorrect inclusion fractions decrease.
ROC analysis may therefore be used to determine a complexity threshold. That is, a laboratory may choose to specify particular levels of error that are to be tolerated, thus
bounding the false positive and false negative error rates and establishing a “complexity threshold.” This amounts to “zooming in” on a region in the ROC space that accords with the tolerable error levels. An example is shown in Figure 24.
Figure 24: Notional Complexity Threshold: Simulated Mixture Results This plot “zooms in” on the upper left of Figure 23 in a manner consistent with error bounds specified by an imaginary laboratory’s Standard Operating Procedure. (The same color scheme is employed as that specified in Figure 23’s legend, where the Pr(D) increases from blue to red or from top-left to bottom-right in increments of 0.1. Also, the first couple of decision thresholds (i.e., τ1 – τ4) for the darkest blue line, corresponding to no drop-out, are so close that they cannot be distinguished from one another.) In this case, the error bounds provide for no more than a 10% chance of incorrect inclusion of a reference sample while insisting on at least 30% correct inclusion determinations. Any points (determined by error rates and drop-out levels) that lie outside of this space fail the complexity threshold specified by the SOP and should not be interpreted.
Complexity thresholds, which are defined in ROC space by pre-specified, laboratory error bounds, may be utilized in two ways: 1) to determine when the Pr(D) is too high to allow for reliable evidence profile interpretation; 2) to determine an a priori
ceiling on the number of allelic discrepancies tolerated (while still allowing for a determination of reference exclusion as a contributor to the evidence sample) before false inclusion rates become intolerable.
Even if a determination of drop-out level can be made, certain error limitations may sufficiently bound the problem of establishing a complexity threshold. For instance, if a laboratory demanded that false inclusion occur less than 1% of the time and that true inclusions occur at least 85% of the time (Figure 25), then samples where the probability of drop-out is greater than 0.2 should not be analyzed since none of the ROC curve is represented in the bounded space; that is, the complexity threshold, which is based on pre-determined, tolerable error rates, is not met.
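The bounding procedure just described amounts to a membership test in ROC space. The sketch below is hypothetical: the operating points are invented for illustration, and only the default bounds mirror the 1% false-inclusion / 85% true-inclusion example from the text.

```python
# Sketch: screening drop-out levels against laboratory error bounds.
# Operating points (fpr, tpr) are hypothetical, not the thesis's data.

def passes_complexity_threshold(points, max_fpr=0.01, min_tpr=0.85):
    """A drop-out level meets the complexity threshold only if at least one
    decision threshold yields an operating point inside the acceptable
    region (fpr <= max_fpr and tpr >= min_tpr)."""
    return any(fpr <= max_fpr and tpr >= min_tpr for fpr, tpr in points)

# Hypothetical ROC operating points for two drop-out levels:
low_dropout = [(0.000, 0.60), (0.005, 0.90), (0.020, 0.97)]
high_dropout = [(0.000, 0.30), (0.008, 0.70), (0.050, 0.88)]

print(passes_complexity_threshold(low_dropout))   # passes: (0.005, 0.90)
print(passes_complexity_threshold(high_dropout))  # fails: no point qualifies
```

A sample whose ROC curve never enters the bounded region, as with the high-drop-out points here, would be deemed uninterpretable under the laboratory's error specification.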
Figure 25: More Realistic Complexity Threshold: Simulated Mixture Results This plot “zooms in” on the upper left of Figure 24 in a manner consistent with error bounds specified by an imaginary laboratory. (The same color scheme is employed as that specified in Figure 23’s legend, which has been left off here for space considerations. Also, the first couple of decision thresholds (i.e., τ1 – τ2) for the darkest blue line, corresponding to no drop-out, are so close that they cannot be distinguished from one another.) A laboratory’s Standard Operating Procedure specifies tolerable error bounds that will define the complexity threshold. In this case, false inclusions are limited to occurring less than 1% of the time while false negatives are allowed 15% of the time. The highest number of allelic discrepancies (representing possible decision thresholds) for any level of drop-out that falls within this space is 8 (for Pr(D) = 0.30).
If these are the bounds selected by the laboratory, only those samples with a corresponding Pr(D) = 0.00 – 0.20 should be evaluated. It should be noted that the present study simulated two-person mixtures in a contributor ratio of 1:1. Supplementary studies that simulate mixtures with varying ratios and numbers of contributors are necessary to further explore the parameter space. If a higher probability of drop-out is
expected due to low template quantities, the sample should not be analyzed; it would be classified as a Type C [7] or uninterpretable mixture [11].
Previous work suggests that Pr(D) increases with decreasing DNA input levels and
can be characterized via peak heights [9]. Gill et al. [61] showed that Pr(D) ≈ 0.20 when allele peak heights are between approximately 50 and 100 RFU for the AmpFℓSTR® SGM
Plus® PCR Amplification Kit. Therefore, for the aforementioned bounds, if it is suspected that less than ~0.1 ng was amplified for a given contributor—as evidenced by peak heights less than ~100 RFU at an analytical threshold of 50 RFU—then the laboratory would deem the sample indeterminable and would not use it for comparison purposes. The decision would be made before comparison to a reference sample.
Next, the ROC plot can be used to determine the number of “allowed” discrepancies based on a given Pr(D). For example, considering Figure 25, if a laboratory has settled upon acceptable rates of incorrect inclusion and correct inclusion at
≤1% and ≥85%, respectively, with Pr(D) ≈0.2, then no more than eight allelic
discrepancies should be allowed by the laboratory’s analysts. That is, if an analyst were
to include reference samples as potential contributors to evidence stains despite observing
9 or more allelic discrepancies, error rates greater than the laboratory’s specified
tolerances would be encountered.
Alternatively, given Blackstone’s Ratio [31], a laboratory might solely prioritize the minimization of spurious inculpations at the expense of identifying every true positive, in which case only false inclusions are considered in the constitution of a laboratory’s acceptable error bounds. An example is shown in Figure 26.
Figure 26: Complexity Threshold Based on Blackstone’s Ratio: Simulated Mixture Results This complexity threshold focuses solely on diminishing the incidence of false positives without specifying a bound on false negatives. Only selected thresholds are labeled.
In this example, any rate of incorrect exclusion is allowed and only the rate of incorrect inclusion is bounded. Here, all samples, regardless of drop-out rate, may be interpreted, but given a particular Pr(D), a specified number of allowable allelic discrepancies may be defined that increases with increasing Pr(D). This increase in the number of allowable allelic discrepancies with increasing Pr(D) is again the result of obtaining less allelic information with increasing drop-out, which leads to an increased chance of correctly excluding a reference who ought to have been excluded. (Refer to
Figure 15 and Table 7 for detailed discussions of this point.)
3.3.1.2 Rollup ROC Results for Laboratory Mixture Data
The data from Section 3.2 is reproduced in Figure 27 using the visualization
scheme introduced in Section 3.3.1 to facilitate ROC curve generation.
Figure 27: Compilation of Exclusion & Inclusion Data Using Laboratory Mixtures Green represents Pr(Correct Exclusion) ; red, Pr(Incorrect Inclusion) ; yellow, Pr(Correct Inclusion) ; blue, Pr(Incorrect Exclusion)
Compiling the ROC results for all quantities of starting DNA template with the
laboratory mixture data produces Figure 28. When the mass of starting DNA template decreases below 0.25 ng, allelic drop-out occurs and a greater incidence of errors is observed, as predicted by the simulation studies.
Figure 28: Complexity Threshold: Laboratory Mixture Results False negative errors for template quantities greater than 0.125 ng of DNA are nonexistent, and the data points for the respective curves lie on top of one another. The dotted line represents the line y = x of random performance.
If the error specification is such that only a 1% rate of incorrect inclusion is prescribed, the laboratory-amplified data show “operating points” that pass the complexity threshold for all levels of starting template mass analyzed here. A pre-established maximum of seven allelic discrepancies would be given as this laboratory’s tolerance for two-contributor mixtures mixed in a 1:1 proportion; making determinations of inclusion for mixtures containing a higher number of discrepant alleles would not allow for long-term compliance with the indicated 1% ceiling on the incorrect inclusion rate.
For this laboratory, then, allowing eight allelic discrepancies would be too lax a criterion given the error requirements. If there were eight or more discrepancies
between the evidence and reference samples (regardless of locus), then exclusion may be appropriate. A more stringent criterion, such as a 0.1% ceiling on the incorrect inclusion rate, would decrease the number of allowed allelic discrepancies to five.
Although beyond the scope of this work, comparisons between error analyses and the resultant RMNE or LR statistics would be required to assess the relationship between these inclusion statistics and ROC-determined error rates. Completing this analysis would allow for a meaningful conclusion as to the “strength” of an evidentiary sample and for lay contextualization of the relative strength of the RMNE or LR statistic. Additionally, error tradeoffs for complex mixtures with more than two contributors and unequal mixing ratios need to be studied to assess the general viability of this analytic method.
4 CONCLUSION
In the analysis context of allelic drop-out—and when dealing with samples
consisting of low template DNA quantities or of mixtures—cognizance of the error rates
associated with DNA profile interpretation is paramount. Awareness of the incidence of
error is crucial to responsible determinations of a reference’s exclusion or inclusion in an
evidence sample. Depending on a laboratory’s Standard Operating Procedure decision
criterion with regard to allelic discrepancies, this potential for error exists independently
of the additional, ever-present pitfalls that can accompany sample collection, storage,
extraction, amplification, and profiling. The various incarnations of results in Section 3
represent a characterization of the countervailing errors that lurk in the fundamental nature of comparing allelic lists between samples.
The error characterizations demonstrated are useful in a number of ways. As an
analysis methodology, the ROC paradigm represents a powerful tool that allows for the
characterization of errors across the discipline of forensics, including DNA profile
comparisons. The analysis can be extended, as more sophisticated mathematical
treatment will yield further insight. For example, the slope of the ROC curve at a given
decision threshold can be used to calculate a likelihood ratio of conditional error
expectations that is of essential relevance in contemporary DNA analysis and courtroom
testimony.
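The slope-to-likelihood-ratio relationship mentioned above can be approximated numerically from adjacent empirical operating points, since the local slope dTPR/dFPR estimates Pr(result | true inclusion) / Pr(result | true exclusion) at that threshold. The operating points in this sketch are hypothetical, chosen only to make the arithmetic visible.

```python
# Sketch: local slope of an empirical ROC curve as a likelihood ratio.
# points[tau] = (fpr, tpr); the values here are hypothetical.

def local_likelihood_ratio(points, tau):
    """Slope between consecutive operating points, approximating
    LR = dTPR/dFPR at decision threshold tau."""
    (fpr0, tpr0), (fpr1, tpr1) = points[tau], points[tau + 1]
    d_fpr = fpr1 - fpr0
    if d_fpr == 0:
        return float("inf")  # TPR gained at no FPR cost
    return (tpr1 - tpr0) / d_fpr

points = [(0.000, 0.50), (0.002, 0.80), (0.010, 0.92), (0.060, 0.97)]
print(local_likelihood_ratio(points, 1))  # (0.92-0.80)/(0.010-0.002) ≈ 15
```

A steep segment (LR well above 1) corresponds to thresholds where an inclusion call is far more probable for a true contributor than for a non-contributor; as the curve flattens toward the diagonal, the local LR approaches 1 and the call carries little evidential weight.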
The simulated results can be used by a laboratory to inform the establishment of a preferred operating point by selectively optimizing between false positives and false negatives to accord with prudence. Alternatively, after empirically determining a level of drop-out associated with a particular laboratory or with evidence samples of varying starting DNA template, an informed decision can be made regarding tolerance of allelic discrepancies.
The specification of error bounds can also designate an operating region, outside
of which the interpretation of an evidence profile cannot be made with the required
accuracy. Whether a given evidence profile is a candidate for interpretation is a function
of its associated level of drop-out. Evidence profiles shown to lie outside of the
acceptable error bounds due to their level of allelic drop-out are said to fail to meet a
complexity threshold for determinations of inclusion or exclusion. No statistics should
be calculated for such samples, and the only responsible determination with respect to reference inclusion/exclusion is “inconclusive.” For evidence profiles possessing levels of drop-out that are deemed interpretable, this same complexity threshold can be employed to establish a laboratory’s decision criteria with respect to tolerating allelic discrepancies; the resulting prescription for determining that a reference is included as a contributor to an evidentiary stain conforms with premeditated, laboratory-selected error rates.
5 REFERENCES
List of Abbreviated Publication Titles (In Alpha Order)
Cold Spring Harb. Symp. Quant. Biol. … Cold Spring Harbor Symposium in Quantitative Biology
Croat. Med. J. … Croatian Medical Journal
Forensic Sci. Int. … Forensic Science International
Forensic Sci. Int. Genet. … Forensic Science International: Genetics
HP Labs. Tech. Report … Hewlett-Packard Laboratories Technical Report
Hum. Reprod. … Human Reproduction
IEEE Trans. Med. Imaging … Institute of Electrical and Electronics Engineers Transactions on Medical Imaging
IMAC-XXV … International Modal Analysis Conference
Int. J. Comput. Vis. … International Journal of Computer Vision
Int. J. Legal Med. … International Journal of Legal Medicine
Invest. Radiol. … Investigative Radiology
J. Forensic Sci. … Journal of Forensic Sciences
J. Forensic Sci. Soc. … Journal of Forensic Science Society
J. R. Stat. Soc. Ser. C Appl. Stat. … Journal of the Royal Statistical Society: Series C (Applied Statistics)
Jurimetrics J. … Jurimetrics Journal
Law Probab. Risk. … Law, Probability and Risk
Nat. Genet. … Nature Genetics
NCJ … National Criminal Justice
Stat. Methods Med. Res. … Statistical Methods in Medical Research
Theor. Popul. Biol. … Theoretical Population Biology
Univ. Colorado Law Rev. … University of Colorado Law Review
[1] A.J. Jeffreys, V. Wilson, S.L. Thein, Hypervariable 'minisatellite' regions in human DNA, Nature. 314 (1985) 67-73.
[2] P. Gill, A.J. Jeffreys, D.J. Werrett, Forensic application of DNA 'fingerprints', Nature. 318 (1985) 577-579.
[3] K. Mullis, F. Faloona, S. Scharf, R. Saiki, G. Horn, H. Erlich, Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction, Cold Spring Harb. Symp. Quant. Biol. 51 (1986) 263-273.
[4] S.A. Greenspoon, K.L.V. Sykes, J.D. Ban, A. Pollard, M. Baisden, M. Farr, et al., Automated PCR setup for forensic casework samples using the Normalization Wizard and PCR Setup robotic methods, Forensic Sci. Int. 164 (2006) 240-248.
[5] P.M. Vallone, C.R. Hill, J.M. Butler, Demonstration of rapid multiplex PCR amplification involving 16 genetic loci, Forensic Sci. Int. Genet. 3 (2008) 42-45.
[6] National Institute of Justice, Postconviction DNA Testing: Recommendations for Handling Requests, NCJ 177626 (1999).
[7] P.M. Schneider, R. Fimmers, W. Keil, G. Molsberger, D. Patzelt, W. Pflug, et al., The German Stain Commission: recommendations for the interpretation of mixed stains, Int. J. Legal Med. 123 (2009) 1-5.
[8] T. Tvedebrink, P.S. Eriksen, H.S. Mogensen, N. Morling, Estimating the probability of allelic drop-out of STR alleles in forensic genetics, Forensic Sci. Int. Genet. 3 (2009) 222-226.
[9] T. Tvedebrink, P.S. Eriksen, H.S. Mogensen, N. Morling, Evaluating the weight of evidence by using quantitative short tandem repeat data in DNA mixtures, J. R. Stat. Soc. Ser. C Appl. Stat. 59 (2010) 855-874.
[10] P. Gill, C.H. Brenner, J.S. Buckleton, A. Carracedo, M. Krawczak, W.R. Mayr, et al., DNA commission of the International Society of Forensic Genetics: recommendations on the interpretation of mixtures, Forensic Sci. Int. 160 (2006) 90-101.
[11] Scientific Working Group on DNA Analysis Methods (SWGDAM), SWGDAM interpretation guidelines for autosomal STR typing by forensic DNA testing laboratories, http://www.fbi.gov/about-us/lab/codis/swgdam-interpretation-guidelines. (2010).
[12] J.M. Curran, J.S. Buckleton, Inclusion probabilities and dropout, J. Forensic Sci. 55 (2010) 1171-1173.
[13] D.J. Balding, J.S. Buckleton, Interpreting low template DNA profiles, Forensic Sci. Int. Genet. 4 (2009) 1-10.
[14] B. Devlin, Forensic inference from genetic markers, Stat. Methods Med. Res. 2 (1992) 241-262.
[15] J.S. Buckleton, J.M. Curran, A discussion of the merits of random man not excluded and likelihood ratios, Forensic Sci. Int. Genet. 2 (2008) 343-348.
[16] National Research Council (NRC-II), The Evaluation of Forensic DNA Evidence, National Academy Press. Washington, D.C. (1996).
[17] D. Jarjoura, J. Jamison, S. Androulakakis, Likelihood ratios for deoxyribonucleic acid (DNA) typing in criminal cases, J. Forensic Sci. 39 (1994) 64-73.
[18] B.S. Weir, DNA statistics in the Simpson matter, Nat. Genet. 11 (1995) 365-368.
[19] J.J. Koehler, On conveying the probative value of DNA evidence: frequencies, likelihood ratios and error rates, Univ. Colorado Law Rev. 67 (1996) 859-886.
[20] B.S. Weir, C.M. Triggs, L. Starling, L.L. Stowell, K.A.J. Walsh, J.S. Buckleton, Interpreting DNA mixtures, J. Forensic Sci. 42 (1997) 213-222.
[21] I.W. Evett, B.S. Weir, Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists, Sinauer Associates, Sunderland, MA, 1998.
[22] C.J. Word, Mixture interpretation: why is it sometimes so hard?, Profiles in DNA, Promega Corporation, http://www.promega.com/resources/articles/profiles-in-dna/2011/mixture-interpretation-why-is-it-sometimes-so-hard/ (2011).
[23] I.W. Evett, C. Buffery, G. Willott, D. Stoney, A guide to interpreting single locus profiles of DNA mixtures in forensic cases, J. Forensic Sci. Soc. 31 (1991) 41-47.
[24] C.H. Brenner, R. Fimmers, M.P. Baur, Likelihood ratios for mixed stains when the number of donors cannot be agreed, Int. J. Legal Med. 109 (1996) 218-219.
[25] J. Mortera, A.P. Dawid, S.L. Lauritzen, Probabilistic expert systems for DNA mixture profiling, Theor. Popul. Biol. 63 (2003) 191-205.
[26] J.S. Buckleton, I.W. Evett, B.S. Weir, Setting bounds for the likelihood ratio when multiple hypotheses are postulated, Science & Justice. 38 (1998) 23-26.
[27] A. Stockmarr, Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search, Biometrics. 55 (1999) 671-677.
[28] A.P. Dawid, Which likelihood ratio, Law Probab. Risk. 3 (2004) 65-71.
[29] R. Meester, M. Sjerps, Why the effect of prior odds should accompany the likelihood ratio when reporting DNA evidence, Law Probab. Risk. 3 (2004) 51-62.
[30] P. Gill, J. Whitaker, C. Flaxman, N. Brown, J.S. Buckleton, An investigation of the rigor of interpretation rules for STRs derived from less than 100 pg of DNA, Forensic Sci. Int. 112 (2000) 17-40.
[31] W. Blackstone, Of Trial, and Conviction, Commentaries on the Laws of England, Book The Fourth: Of Public Wrongs, Clarendon Press, Oxford, 1765-1769, pp. 358.
[32] R.A. Van Oorschot, M. Jones, DNA fingerprints from fingerprints, Nature. 387 (1997) 767.
[33] M. Pizzamiglio, F. Donato, T. Floris, C. Bellino, P. Cappiello, G. Lago, et al., DNA typing on latex gloves, in: Sensabaugh G.F., Lincoln P.J., Olaisen B. (Eds.), Progress in Forensic Genetics, Elsevier, Amsterdam, 2000, pp. 504-507.
[34] P. Wiegand, T. Bajanowski, B. Brinkmann, DNA typing of debris from fingernails, Int. J. Legal Med. 106 (1993) 81-83.
[35] P. Wiegand, M. Kleiber, DNA typing of epithelial cells after strangulation, Int. J. Legal Med. 110 (1997) 181-183.
[36] R. Uchihi, K. Tamaki, T. Kojima, T. Yamamoto, Y. Katsumata, Deoxyribonucleic acid (DNA) typing of human leukocyte antigen (HLA)-DQA1 from single hairs in Japanese, J. Forensic Sci. 37 (1992) 853-859.
[37] J. Bright, S.F. Petricevic, Recovery of trace DNA and its application to DNA profiling of shoe insoles, Forensic Sci. Int. 145 (2004) 7-12.
[38] Y. Watanabe, T. Takayama, K. Hirata, S. Yamada, A. Nagai, I. Nakamura, et al., DNA typing from cigarette butts, Leg. Med. 5, Supplement (2003) S177-S179.
[39] P. Gill, B. Sparkes, J.S. Buckleton, Interpretation of simple mixtures when artefacts such as stutters are present - with special reference to multiplex STRs used by the Forensic Science Service, Forensic Sci. Int. 95 (1998) 213-224.
[40] J.M. Curran, C.M. Triggs, J.S. Buckleton, B.S. Weir, Interpreting DNA mixtures in structured populations, J. Forensic Sci. 44 (1999) 987-995.
[41] M.W. Perlin, B. Szabady, Linear mixture analysis: a mathematical approach to resolving mixed DNA samples, J. Forensic Sci. 46 (2001) 1372-1378.
[42] Y. Torres, I. Flores, V. Prieto, M. López-Soto, M.J. Farfán, A. Carracedo, et al., DNA mixtures in forensic casework: a 4-year retrospective study, Forensic Sci. Int. 134 (2003) 180-186.
[43] Applied Biosystems, AmpFℓSTR® Identifiler® Plus PCR Amplification Kit, User's Manual. Part Number 4323291 Rev. F (2011).
[44] Promega Corp., PowerPlex® 16 HS System, User's Manual. Part# TMD022 (2011) 1-73.
[45] P. Gill, Amplification of low copy number DNA profiling, Croat. Med. J. 42 (2001) 229-232.
[46] P. Gill, Role of short tandem repeat DNA in forensic casework in the UK--past, present, and future perspectives, BioTechniques. 32 (2002) 366-385.
[47] B. Budowle, A.J. Eisenberg, A. van Daal, Validity of low copy number typing and applications to forensic science, Croat. Med. J. 50 (2009) 207-217.
[48] M.W. Perlin, A. Sinelnikov, An information gap in DNA evidence interpretation, PLoS ONE. 4 (2009) e8327 1-12.
[49] J.M. Curran, A MCMC method for resolving two person mixtures, Science & Justice. 48 (2008) 168-177.
[50] T. Wang, N. Xue, J.D. Birdwell, Least-square deconvolution: a framework for interpreting short tandem repeat mixtures, J. Forensic Sci. 51 (2006) 1284-1297.
[51] C.E. Metz, ROC methodology in radiologic imaging, Invest. Radiol. 21 (1986) 720-733.
[52] M. Alnadabi, S. Johnstone, Speech/music discrimination by detection: assessment of time series events using ROC graphs, 6th International Multi-Conference on Systems, Signals and Devices (SSD '09), (2009) 1-5.
[53] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2004) 137-154.
[54] J.M. Nichols, M. Seaver, S.T. Trickey, S.R. Motley, E. Eisner, Detecting bolt loosening under strong temperature fluctuations using ambient vibrations, IMAC-XXV. XXV (2007).
[55] S. Gefen, O.J. Tretiak, C.W. Piccoli, K.D. Donohue, A.P. Petropulu, P.M. Shankar, et al., ROC analysis of ultrasound tissue characterization classifiers for breast cancer diagnosis, IEEE Trans. Med. Imaging. 22 (2003) 170-177.
[56] J.M. Butler, R. Schoske, P.M. Vallone, J.W. Redman, M.C. Kline, Allele frequencies for 15 autosomal STR loci on U.S. Caucasian, African American, and Hispanic populations, J. Forensic Sci. 48 (2003) 908-911.
[57] E.R. Coronado, Amplification reproducibility and the effects on DNA mixture interpretation on profiles generated via traditional and mini-STR amplification, MSc Thesis, Boston University School of Medicine. (2011) 1-77.
[58] J. Bregu, D. Conklin, E. Coronado, M. Terrill, R.W. Cotton, C.M. Grgicak, Analytical thresholds: determination of minimum distinguishable signals, J. Forensic Sci. ([in press]).
[59] T. Fawcett, ROC graphs: notes and practical considerations for researchers, HP Labs. Tech. Report. No. HPL-2003-4 (2004) 1-38.
[60] D.T. Chung, J. Drabek, K.L. Opel, J.M. Butler, B.R. McCord, A study on the effects of degradation and template concentration on the amplification efficiency of the STR miniplex primer sets, J. Forensic Sci. 49 (2004) 733-740.
[61] P. Gill, R. Puch-Solis, J.M. Curran, The low-template-DNA (stochastic) threshold— its determination relative to risk analysis for national DNA databases, Forensic Sci. Int. Genet. 3 (2009) 104-111.
[62] W.C. Thompson, Subjective interpretation, laboratory error and the value of forensic DNA evidence: three case studies, Genetica. 96 (1995) 153-168.
[63] W.C. Thompson, Accepting lower standards: the National Research Council's Second Report on Forensic DNA Evidence, Jurimetrics J. 37 (1997) 405-424.
[64] B. Scheck, P. Neufeld, F. Dwyer, Actual Innocence, Doubleday, New York, 2000.
6 VITA
Jacob Samuel Gordon
MIT Lincoln Laboratory: 244 Wood Street, Lexington, MA 02420-9108; [email protected]; work phone: (781) 981-4373
Permanent address: 52 Church Avenue, Islip, NY 11751-3902; [email protected]; cell phone: (516) 721-7234
Year of Birth: 1983
EDUCATION
BOSTON UNIVERSITY SCHOOL OF MEDICINE Boston, MA M.S. Degree in Biomedical Forensic Sciences, expected 1/12. Thesis: Characterization of Error Tradeoffs in Human Identity Comparisons: Determining a Complexity Threshold for DNA Mixture Interpretation . [9/06 – present]
HARVARD UNIVERSITY Cambridge, MA A.B. Degree with Honors in Physics, 2005. Dean’s List, Harvard College Scholarship. [9/01 – 6/05]
UNIVERSITY OF CAPE TOWN Cape Town, South Africa Semester study abroad. [2/04 – 6/04]
ISLIP HIGH SCHOOL Islip, NY [9/97 – 6/01]
EXPERIENCE
MIT LINCOLN LABORATORY Hanscom Air Force Base, Lexington, MA Developing and evaluating large-scale ballistic missile defense systems to advance laboratory’s fundamental mission to apply science and advanced technology to critical problems of national security. Analysis includes data fusion and interpretation, system performance characterizations, phenomenology studies, and leveraging of high-level data analysis software. Lead author of papers presented at National Fire Control Symposium (San Diego, 8/2010), Missile Defense Sensors, Environments, and Algorithms (Orlando, 2010). Continuing coursework: Radar Systems, Parameter Estimation for Dynamic Systems, Information Fusion for Decision Support, Cryptography and Cyber Security, Advanced MATLAB Programming, Programming in JAVA. [10/05 – present]
NASA GODDARD SPACE FLIGHT CENTER Greenbelt, MD Investigated use of lasers to remotely sense atmospheric conditions and variables. [7/04 – 8/04]
PUBLIC DEFENDER SERVICE FOR THE DISTRICT OF COLUMBIA Washington, D.C. Selected as one of two members of personal investigative team for felony attorney serving the indigent community in D.C. Responsibilities: interviewing witnesses, analyzing crime scenes, taking sworn statements, writing and serving subpoenas, developing case strategy, and visiting clients in D.C. jail. Attended forensics conference hosted by Dr. Henry Lee. [6/03 – 8/03]
DEPT. OF ENERGY UNDERGRADUATE LABORATORY FELLOWSHIP Upton, NY Recruited by high-energy particle physics group at Brookhaven National Laboratory. PHENIX is one of the two biggest experiments conducted using BNL’s Relativistic Heavy-Ion Collider. Research earned poster presentation slot at American Physical Society’s 2002 Division of Nuclear Physics Conference. [6/02 – 8/02]
HARVARD UNIVERSITY COURSE ASSISTANT Cambridge, MA Assisted professors in first- and second-year calculus courses. Responsibilities: teaching weekly, 90- minute, problem sessions, generating course review materials, grading assignments, preparing students for exams, and writing student recommendations. [9/03 – 1/04, 9/04 – 1/05]
PEDIATRIC HEMATOLOGY/ONCOLOGY WARD Massachusetts General Hospital, Boston, MA Child Life Volunteer providing support for patients and to nursing staff. [6/06 – 9/06]