BOSTON UNIVERSITY
SCHOOL OF MEDICINE
Thesis
CHARACTERIZATION OF ERROR TRADEOFFS IN
HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY
THRESHOLD FOR DNA MIXTURE INTERPRETATION
By
JACOB SAMUEL GORDON
A.B., Harvard University, 2005
Submitted in partial fulfillment of the
requirements for the degree of
Master of Science
2012
© Copyright by JACOB SAMUEL GORDON 2012
Approved by
First Reader: Catherine M. Grgicak, M.S.F.S., Ph.D., Instructor, Biomedical Forensic Sciences
Second Reader: Robin W. Cotton, Ph.D., Associate Professor, Biomedical Forensic Sciences
ACKNOWLEDGEMENTS
Catherine, you made it your personal mission to ensure that this thesis was a
success. You were willing to take me on as your mentee with a compressed timeline,
introduced me to an engaging problem that I enjoyed attacking, and accommodated many
post-work conferences. Thank you for helping me finish and for providing me with
direction throughout the last couple months.
Dr. Cotton, I have received nothing short of endless support since you became my academic adviser. You helped steer me through the program and are always thinking of interesting people with whom I should speak and churning away at ideas to help guide my future. Thank you for helping me refine my thesis thoughts down the stretch.
Mom and Dad, I truly would have nothing without you. You have given me everything of yours and sacrificed throughout my life so that I never missed out on anything. I have learned every important skill and life strategy from you and have not achieved anything without your endless support, calming patience, guidance, and fajitas.
Melissa and Amy, you have always been blindly loyal to me through triumphs and setbacks, and the only times you haven’t been in my corner were when you were battling me yourselves to ensure that I was imbued with just the right amount of pervasive humbling. You are my best friends.
Because of you four, I am the happiest guy I know.
CHARACTERIZATION OF ERROR TRADEOFFS IN
HUMAN IDENTITY COMPARISONS: DETERMINING A COMPLEXITY
THRESHOLD FOR DNA MIXTURE INTERPRETATION
JACOB SAMUEL GORDON
Boston University School of Medicine, 2012
Major Professor: Catherine M. Grgicak, M.S.F.S., Ph.D., Instructor, Biomedical Forensic Sciences
ABSTRACT
DNA analysts considering a forensic evidence sample and a reference sample
(e.g., from a suspect) have three options when rendering a decision regarding the consistency between the samples: exclusion, inclusion, or inconclusive. Complicating this determination is the reality that DNA profiles originating from forensic evidence may not be fully observed due to allelic drop-out and/or the presence of overlapping alleles.
Different analyst inclinations and laboratory standards exist for informing an analyst’s decision; typically—and particularly for samples demonstrating some degree of allelic drop-out—reference samples exhibiting less than exactly 100% allelic overlap are not automatically precluded from inclusion in an evidence sample. In tolerating some measure of absence of a reference sample’s alleles in an evidence sample, the potential for two kinds of errors exists: In a case in which an individual could not have contributed to an evidence sample, there is the potential for false inclusion; in a case in which an individual could have contributed, there is the potential for false exclusion. In selecting a particular decision criterion to inform determinations of inclusion or exclusion, a tradeoff
between these antagonistic errors exists. A lax decision criterion minimizes false exclusions at the expense of false inclusions while a strict criterion eschews false inclusions at the expense of greater numbers of false exclusions. The relevance of a decision criterion is greatest for low-template samples and for samples that are mixtures of multiple contributors since both are likely to experience allelic drop-out and thus to occupy a potential gray area between certain exclusion and likely inclusion.
In this study, databases of simulated mixtures and laboratory mixtures are compared with databases of simulated excluded and included individuals. In order to generate credible genetic profiles, the phenomena of allelic drop-out and profile mixing are modeled. Given this framework, the universe of possible decision criteria is explored.
Receiver Operating Characteristic curves, a type of analysis originally applied to assessing World War II radar performance, are adopted as a paradigm for summarizing the tradeoff of both types of errors. The a priori balancing of these errors as specified by a laboratory’s standard operating procedures defines a “complexity threshold” that will determine, before the process of sample interpretation is undertaken or inclusion/exclusion statistics are calculated, whether a mixture or low copy number sample ultimately holds any evidentiary value. When a sample does in fact hold evidentiary value, the complexity threshold will have specified a predetermined decision point to inform determinations of exclusion versus inclusion.
TABLE OF CONTENTS
TITLE PAGE ...... i
COPYRIGHT PAGE ...... ii
READER APPROVAL PAGE ...... iii
ACKNOWLEDGEMENTS...... iv
ABSTRACT...... v
TABLE OF CONTENTS...... vii
LIST OF TABLES...... x
LIST OF FIGURES ...... xi
1 INTRODUCTION ...... 1
1.1 Comparing Forensic Evidence Samples and Reference Samples...... 3
1.1.1 DNA Profile Comparison ...... 3
1.1.2 Conclusions Arising from Profile Comparisons...... 5
1.1.3 Inclusion Statistics...... 8
1.2 Consideration of Error Rates ...... 15
1.2.1 Error Rate Analysis Using the Paradigm of Receiver Operating
Characteristics...... 15
2 METHODS ...... 19
2.1 Overview...... 19
2.2 Error Analysis Study Using Simulated Mixture Data ...... 20
2.2.1 Simulation Materials...... 20
2.2.2 Simulation Model...... 20
2.2.2.1 Modeling Allele Drop-out...... 28
2.2.3 Generating Populations for Comparison...... 29
2.2.3.1 Simulating Mixtures...... 29
2.2.3.2 Simulating Excluded Individuals...... 32
2.2.3.3 Simulating Included Individuals...... 33
2.2.4 Comparing Populations and Counting Allelic Discrepancies...... 36
2.3 Validation Study Using Laboratory Mixture Data ...... 36
2.3.1 Profile Typing Materials and Methods...... 37
2.4 Data Interpretation Framework...... 37
2.4.1 Organizing Comparison Results...... 37
2.4.2 Making Determinations of Exclusion or Inclusion...... 40
3 RESULTS AND DISCUSSION...... 44
3.1 Error Analysis Study Using Simulated Data ...... 44
3.1.1 Comparing Simulated Mixtures to Simulated Excluded Individuals ...... 44
3.1.2 Comparing Simulated Mixtures to Simulated Included Individuals...... 50
3.2 Validation Study Using Laboratory Mixture Data ...... 53
3.2.1 Comparing Laboratory Mixtures to Simulated Excluded Individuals...... 53
3.2.2 Comparing Laboratory Mixtures to Simulated Included Individuals...... 55
3.3 Impact of Analyst Decision Threshold on Expected Errors ...... 56
3.3.1 Receiver Operating Characteristic (ROC) Analysis ...... 57
3.3.1.1 Rollup ROC Results for Simulated Mixture Data ...... 60
3.3.1.2 Rollup ROC Results for Laboratory Mixture Data...... 68
4 CONCLUSION...... 70
5 REFERENCES ...... 73
6 VITA...... 79
LIST OF TABLES
Table 1: Example Profile Comparison between Three Reference Samples and an
Evidence Sample...... 3
Table 2: Inclusion/Exclusion Criteria from The German Stain Commission:
Recommendations for the Interpretation of Mixed Stains (reproduced from [7])...... 5
Table 3: Steps to Mixture Interpretation from the DNA Commission of the International
Society of Forensic Genetics (reproduced from [10]) ...... 7
Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM
(“Characteristics” below quoted from [7]) ...... 8
Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21]) ...... 12
Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality
...... 17
Table 7: Example of Excluded-Individual-to-Mixture Comparison...... 49
Table 8: Notional Table of False Positive & True Positive Rates for Different Decision
Thresholds...... 59
LIST OF FIGURES
Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles...... 21
Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile .... 24
Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile... 25
Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person 2...... 26
Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and
Person 2...... 27
Figure 6: Flow Describing Pristine Mixture Profile Generation ...... 31
Figure 7: Flow Used to Generate Perturbed Mixture Profiles ...... 31
Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded
Individuals...... 32
Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture
Profiles)...... 35
Figure 10: Example Histogram Tabulating Discrepancies Between 10,000 References and
1 Mixture...... 38
Figure 11: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies. 39
Figure 12: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies
(two-color) ...... 40
Figure 13: Summary histograms: Excluded-Individual-to-Mixture Comparisons (No
Drop-Out)...... 46
Figure 14: Results Comparing Excluded Individuals to Simulated Mixtures (No Drop-
Out) ...... 47
Figure 15: Results Comparing Excluded Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 ...... 48
Figure 16: Results Comparing Included Individuals to Simulated Mixtures (No Drop-
Out) ...... 51
Figure 17: Results Comparing Included Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 ...... 52
Figure 18: Results Comparing Excluded Individuals to 1:1 Laboratory Mixtures...... 54
Figure 19: Results Comparing Included Individuals to Laboratory Mixtures...... 55
Figure 20: Example Simultaneous Visualization of Exclusion & Inclusion Comparisons
...... 58
Figure 21: Example ROC Plot...... 59
Figure 22: Compilation of Exclusion & Inclusion Data Using Simulated Mixtures...... 61
Figure 23: Error Analysis Results: ROC Rollup Using Simulated Mixtures ...... 62
Figure 24: Notional Complexity Threshold: Simulated Mixture Results...... 63
Figure 25: More Realistic Complexity Threshold: Simulated Mixture Results ...... 65
Figure 26: Complexity Threshold Based on Blackstone’s Ratio: Simulated Mixture
Results...... 67
Figure 27: Compilation of Exclusion & Inclusion Data Using Laboratory Mixtures ...... 68
Figure 28: Complexity Threshold: Laboratory Mixture Results ...... 69
1 INTRODUCTION
The goal of an analyst when considering DNA evidence is to determine whether a reference sample could have contributed to the genetic profile expressed in an evidence sample. The final determination as to whether the two samples could have originated from the same source may have significant implications in an ongoing investigation or for how compelling a jury finds a prosecutor’s theory of a crime (or, alternatively, how compelling they find a suspect’s defense).
At its core, identifying individuals through DNA profiling is accomplished by examining particular regions (i.e., loci) that are part of an individual’s genetic makeup and are highly variable (i.e., polymorphic) among individuals. These loci are contained on two sets of structures (i.e., chromosomes) containing a person’s DNA, with one set inherited from each parent. A person’s DNA sequence is defined by a collection of four nucleotides—adenine, cytosine, guanine, and thymine, conventionally abbreviated as A, C, G, and T, respectively.
The use of DNA as a “genetic fingerprint” was introduced in 1985 by Alec Jeffreys et al. [1], who described a method of simultaneously detecting “hypervariable ‘minisatellite’ regions” in human DNA—in other words, a collection of exploitable polymorphic loci—that could collectively be of forensic utility to individuate a person [2]. Around the same time, Kary Mullis et al. [3] published their work on the polymerase chain reaction (PCR), a method that allowed for the amplification and quantification of trace amounts of DNA, which are often encountered in forensic work. More modern methods have introduced increased automation [4] and efficiency [5] to the process, but the seminal works of Jeffreys et al. and Mullis et al. formed the foundation on which DNA profiling has grown. The introduction of PCR technology into forensic laboratories was accompanied by a shift from using large minisatellite regions, consisting of thousands to tens of thousands of bases, to short tandem repeats (STRs), regions that typically range from ~70 to ~500 base pairs. STRs are similar to variable number of tandem repeat (VNTR) regions in that both consist of tandemly repeating units of DNA, where the number of repeat units (i.e., alleles) is variable in the population. However, STRs are shorter, which allows for efficient amplification of these regions and for increased sensitivity in DNA profiling.
At its heart, successfully “typing” or “profiling” an individual amounts to a signal
detection problem. First, biological evidence is collected from a crime scene, evidence
sample, or person of interest. Sources of biological evidence that can yield DNA samples
include—but are not limited to—blood, semen, saliva, urine, feces, teeth, bone, hair, skin
cells, or other biological tissue samples [6]. DNA is then extracted from the biological
sample and quantified using quantitative polymerase chain reaction (qPCR). Next, a
separate round of PCR amplifies the STR loci of interest. Detection is accomplished by
separating amplicons based on allele size and by the use of fluorescent dyes. The
identification of which alleles are expressed in the sample is determined by applying a
threshold to distinguish a collection of fluorescently-detected electronic signals—allele
peak heights/areas—from a background of noise in an electropherogram. The results are
then interpreted and a comparison between an unknown (i.e., the evidence) and a known
(i.e., the standard) is made.
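The allele-calling step described above reduces, at its simplest, to filtering peaks by a height threshold. The sketch below illustrates this with an invented analytical threshold and invented peak heights; real thresholds are validated per laboratory and instrument.

```python
# Allele calling by analytical threshold: keep only peaks whose height
# rises above the noise cutoff. All values here are hypothetical.
ANALYTICAL_THRESHOLD_RFU = 50  # illustrative cutoff, relative fluorescence units

def call_alleles(peaks):
    """peaks: dict mapping allele designation -> peak height (RFU).
    Returns the set of alleles whose peaks exceed the threshold."""
    return {allele for allele, height in peaks.items()
            if height >= ANALYTICAL_THRESHOLD_RFU}

locus_peaks = {"9": 412, "10": 371, "12": 23}  # the 23 RFU peak reads as noise
print(sorted(call_alleles(locus_peaks)))  # ['10', '9']
```

In practice a second, higher stochastic threshold is also applied before a locus is used statistically; only the basic noise filter is shown here.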
Although getting to the point of producing a DNA profile from a sample involves
many nuanced steps—from collecting and storing an evidence sample to extracting and
amplifying the evidentiary sample’s genetic profile—the DNA profile that is ultimately
produced consists of the STR alleles at each locus tested, where 13 – 16 STR loci are
generally typed during forensic DNA testing.
1.1 Comparing Forensic Evidence Samples and Reference Samples
1.1.1 DNA Profile Comparison
The first step before comparing evidence and reference samples involves
amplifying the evidence sample’s loci of interest and interpreting which alleles are
present. This process leads to an evidence profile, which consists of a list of alleles
observed to be present at each amplified locus within the evidence sample. The reference
sample is also amplified and its observed alleles noted. This leads to a reference profile.
The comparison of the two involves determining which of the reference alleles are
present in the evidence sample.
Table 1 provides an example comparison between the alleles detected at four loci
for three reference samples and those detected for an evidence sample.
Table 1: Example Profile Comparison between Three Reference Samples and an Evidence Sample
The alleles present at each locus in the Evidence Sample are compared with those present at the same locus for a reference sample. Each reference sample is considered in turn.

                  Locus 1    Locus 2       Locus 3      Locus 4
Evidence Sample   5, 10      5, 6, 7, 8    9, 10, 11    4, 9, 13
Reference 1       5, 10      5, 8          10, 11       9, 9
Reference 2       5, 10      7, 8          9, 11        4, 13
Reference 3       5, 10      6, 6          10, 10       8, 9
Assuming all alleles are detected in the evidence sample, a comparison between the alleles at each of the loci from each reference sample would lead an analyst to a conclusion for that reference sample. For example, the alleles present in Reference 1 are also present at every locus in the Evidence Sample. If all alleles are detected and peak height and contributor ratios are not considered, Reference 1 would be “included” as a potential contributor to the DNA mixture in the Evidence Sample. Similarly, Reference 2 would also be included as a potential contributor since all alleles present in the individual’s genotype are also present in the evidence profile. In contrast, not all of
Reference 3’s alleles are present in the Evidence Sample; while all of the alleles at Loci
1, 2, and 3 are present, Allele 8 at Locus 4 is not present in the Evidence Sample. A strict insistence that every reference allele be present in the evidence would lead an analyst to
“exclude” Reference 3. However, if allelic drop-out is suspected due to low-level amplification conditions, Reference 3 may be included as a potential contributor despite the allelic discrepancy at Locus 4.
Under ideal circumstances, the inclusion or exclusion of an individual as a contributor to an item of evidence would be straightforward: If—and only if—100% of the reference alleles are detected in the evidence sample can a reference be included as a contributor to the evidence; otherwise, the reference is excluded. In reality, such determinations are not so straightforward due to allelic drop-out during low-template
DNA amplification, as evidenced by the example contained in Table 1.
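The comparison illustrated in Table 1 can be sketched as set arithmetic over per-locus allele sets (homozygous genotypes such as “9, 9” collapse to a single unique allele). This is a minimal illustration of the counting logic, not any laboratory’s comparison software.

```python
# Counting allelic discrepancies between reference and evidence profiles,
# using the Table 1 data. A "discrepancy" is a reference allele that is
# not observed at the same locus in the evidence sample.
evidence = {
    "Locus 1": {5, 10}, "Locus 2": {5, 6, 7, 8},
    "Locus 3": {9, 10, 11}, "Locus 4": {4, 9, 13},
}
references = {
    "Reference 1": {"Locus 1": {5, 10}, "Locus 2": {5, 8},
                    "Locus 3": {10, 11}, "Locus 4": {9}},      # 9, 9 -> {9}
    "Reference 2": {"Locus 1": {5, 10}, "Locus 2": {7, 8},
                    "Locus 3": {9, 11}, "Locus 4": {4, 13}},
    "Reference 3": {"Locus 1": {5, 10}, "Locus 2": {6},
                    "Locus 3": {10}, "Locus 4": {8, 9}},
}

def count_discrepancies(reference, evidence):
    """Sum, over loci, the reference alleles missing from the evidence."""
    return sum(len(alleles - evidence[locus])
               for locus, alleles in reference.items())

for name, profile in references.items():
    print(name, count_discrepancies(profile, evidence))
# References 1 and 2 show zero discrepancies (included under a strict
# criterion); Reference 3 shows one (allele 8 at Locus 4).
```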
1.1.2 Conclusions Arising from Profile Comparisons
Given profiles generated from evidence and reference samples, a DNA analyst
compares the two and attempts to make a determination as to whether the contributor of
the reference sample could have contributed to the evidence sample. Thoughtful
comparison of the two samples leads an analyst to one of three conclusions, whose
criteria have been delineated by The German Stain Commission [7] and are contained in
Table 2.
Table 2: Inclusion/Exclusion Criteria from The German Stain Commission: Recommendations for the Interpretation of Mixed Stains (reproduced from [7])

Inclusion: If all alleles of a person in question are uniformly present in a mixed stain, the person shall be considered a possible contributor to the stain.

Exclusion: If alleles of a person in question are not present in a mixed stain, the person shall not be considered as a possible contributor to the stain.

Gray Area between Inclusion and Exclusion: The following effects may occur in [mixtures with no major component(s) and evidence of stochastic effects] due to imbalances between the mixture components and may cause difficulties in reaching an unambiguous decision about inclusion or exclusion across all analyzed DNA systems:
- Locus drop-out and allelic drop-out (e.g., caused by the sensitivity of the amplification system, as well as by stochastic effects).
- Allelic drop-out is more likely to occur for longer than for shorter alleles, and in particular for DNA systems with long amplicon sizes.
If the samples do not amplify efficiently, there are too many contributors present in an evidence stain, and/or there is too little starting template, no reliable comparison can be made between reference and evidence profiles. The ability of an analyst to accurately
conclude whether a reference sample should be included as a possible contributor to an evidence sample is dependent upon the quality or “complexity” of the sample. Currently no standard to determine such a “quality factor” is offered in the literature. Assuming both profiles amplify and produce readable electropherograms that are not overwhelmingly diluted in discriminatory power due to the presence of many contributors, assessing allelic commonality is trivial in the case where there are no common alleles between samples: The individual could not have contributed to the evidence sample. The assessment is equally trivial in the case of complete allelic overlap: The individual almost certainly contributed. The case where more-than-zero and less-than-all of the alleles are in common is the case of interest and the case most relevant for considering forensic evidence profiles, which are often subject to some combination of complicating factors.
One metric that describes the “degree of exclusion” of two profiles is the number of allelic discrepancies between them; here, “degree of exclusion” is defined as the extent to which sets of alleles from different samples fail to overlap. As the probability of drop-out increases, the number of discrepancies between a true contributor and an evidence sample increases. Work by Tvedebrink et al. [8] has attempted to determine the probability of drop-out with respect to an allele’s average electropherogram peak height (corrected for diploidy); the peak heights are taken to be robust indicators of the quantity of DNA contributed [9]. However, little work has been published defining the implications for an analyst’s ability to accurately include or exclude a contributor.
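Drop-out models of the kind referenced above are often formulated as a logistic function of peak height. The sketch below uses that general shape with invented coefficients; they are not fitted values from Tvedebrink et al. [8].

```python
import math

# Illustrative logistic model of drop-out probability as a function of
# average peak height. Coefficients are hypothetical, chosen only to show
# the qualitative behavior: drop-out becomes more likely as signal weakens.
BETA_0, BETA_1 = 8.0, -2.0  # hypothetical intercept and slope

def prob_dropout(mean_peak_height_rfu):
    """Pr(drop-out) = logistic(BETA_0 + BETA_1 * log(peak height))."""
    logit = BETA_0 + BETA_1 * math.log(mean_peak_height_rfu)
    return 1.0 / (1.0 + math.exp(-logit))

# Probability decreases monotonically with signal strength:
for h in (50, 150, 500):
    print(h, prob_dropout(h))
```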
The steps to interpreting a mixture—according to the DNA Commission of the
International Society of Forensic Genetics [10]—are contained in Table 3.
Table 3: Steps to Mixture Interpretation from the DNA Commission of the International Society of Forensic Genetics (reproduced from [10])

Step 1: Identify the presence of a mixture
Step 2: Designation of allelic peaks
Step 3: Identify the number of contributors in the mixture
Step 4: Estimation of the mixture proportion or ratio of the individuals contributing to the mixture
Step 5: Consideration of all possible genotype combinations
Step 6: Compare reference samples
However, there is a decision that is actually or implicitly made prior to Step 3: Will this mixture sample lead to a credible interpretation? In other words, is the mixture of sufficient “quality”—i.e., “not too complex”—to be confidently interpreted? Mixtures have also generally been classified into three main types. An example classification scheme—according to the German Stain Commission [7] and the Scientific Working Group on DNA Analysis Methods (SWGDAM) Mixture Interpretation Guidelines [11]—is contained in Table 4.
Table 4: Mixture Classification Scheme from German Stain Commission and SWGDAM (“Characteristics” below quoted from [7])

Type A Mixtures / Indistinguishable Mixtures: No obvious major contributor with no evidence of stochastic effects

Type B Mixtures / Distinguishable Mixtures: Clearly distinguishable major and minor DNA components; consistent peak height ratios of approximately 4:1 (major to minor component) across all heterozygous systems, and no evidence of stochastic effects

Type C Mixtures / Uninterpretable Mixtures: No major component(s) and evidence of stochastic effects
1.1.3 Inclusion Statistics
Any legal declaration of consistency between a reference standard and an evidentiary profile, dubbed inclusion in the forensics community, must be contextualized statistically [11]. In the realm of DNA analysis, this statistic—however calculated—aims at informing the jury of the probability that a randomly selected individual could not be eliminated as a contributor to a given stain or of how much more likely the prosecution’s hypothesis is than the defense’s hypothesis. The first kind of statistic is calculated via the
“Random Man Not Excluded” (RMNE) method [12] while the latter is calculated via the “Likelihood Ratio” (LR) method [13].
Calculation of the RMNE statistic—or, synonymously, of the “Combined Probability of Inclusion” (CPI)—is performed by considering all combinations of feasible genotypes that could have contributed to the evidence sample. Only evidence sample loci where alleles are above the stochastic threshold are considered in the
computation. Notably, when “unrestricted” CPI is invoked as the statistical method, quantitative information such as peak height or area is ignored. Although the number of contributors theoretically has no bearing on whether CPI could be used as the statistic, it
does have a significant effect on the assumption that all alleles have been detected.
Despite this consideration, the RMNE approach may be considered appropriate when dealing with an indistinguishable evidence mixture where drop-out of alleles from all contributors is deemed unlikely [14,15].
For a sample consisting of m loci, with each locus L containing n alleles {α_{L,1}, α_{L,2}, …, α_{L,n}} with associated probabilities of occurrence p(α_{L,1}), p(α_{L,2}), …, p(α_{L,n}), the Random Man Not Excluded (RMNE) statistic is given by Equation 1.
$$ RMNE = CPI = \prod_{L=1}^{m} PI_{L} = \prod_{L=1}^{m} \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2} $$

Equation 1: Random Man Not Excluded (RMNE) Statistic
m ≡ total number of loci contained in sample
n ≡ number of alleles contained at a particular locus
α_{L,A} ≡ a particular allele A at a particular locus L
p(α_{L,A}) ≡ probability of α_{L,A} occurring in population
PI_L ≡ probability of inclusion at locus L
CPI ≡ combined probability of inclusion
Accepting the stipulations of Recommendation 4.1 of the Second National Research
Council Report (NRC-II) [16], which requires an assumption of within-locus independence based on Hardy-Weinberg equilibrium and the associated inbreeding coefficient θ, yields Equation 2. Unlike the RMNE statistic, the LR must assume a number of mixture contributors to the evidence sample and employs knowledge of a reference sample’s genotype.
$$ RMNE = \prod_{L=1}^{m} \left[ \left( \sum_{A=1}^{n} p(\alpha_{L,A}) \right)^{2} + \theta \cdot \sum_{A=1}^{n} p(\alpha_{L,A}) \cdot \left( 1 - p(\alpha_{L,A}) \right) \right] $$

Equation 2: RMNE Using NRC-II Recommendation 4.1
m ≡ total number of loci contained in sample
n ≡ number of alleles contained at a particular locus
α_{L,A} ≡ a particular allele A at a particular locus L
p(α_{L,A}) ≡ probability of α_{L,A} occurring in population
θ ≡ inbreeding coefficient
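Equations 1 and 2 translate directly into a short computation. The allele frequencies and θ value below are invented for illustration.

```python
# Sketch of the RMNE / CPI computation from Equations 1 and 2.
def rmne(locus_freqs, theta=0.0):
    """locus_freqs: list of lists; each inner list holds the population
    frequencies p(alpha_{L,A}) of the alleles observed at one locus.
    theta: inbreeding coefficient (NRC-II Recommendation 4.1);
    theta = 0 reduces to Equation 1."""
    cpi = 1.0
    for freqs in locus_freqs:
        s = sum(freqs)
        pi_l = s * s + theta * sum(p * (1.0 - p) for p in freqs)
        cpi *= pi_l  # combine across loci by the product rule
    return cpi

loci = [[0.10, 0.20], [0.05, 0.15, 0.25]]  # hypothetical frequencies
print(rmne(loci))             # ≈ 0.018225 (product of squared per-locus sums)
print(rmne(loci, theta=0.01)) # slightly larger with the theta correction
```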
An LR can be calculated either with or without taking peak height information into
account as well as with or without taking the probability of allelic drop-out into account.
The likelihood ratio gives the ratio of the conditional probabilities of encountering an
evidence profile given competing prosecution and defense hypotheses. While the
likelihood ratio makes use of more information (e.g., a reference sample’s genotype) in a
more robust manner, it is both harder to calculate and fundamentally reliant on
assumptions (e.g., number of mixture contributors, defense hypotheses) that can be
difficult to justify. In the most complex scenarios, multiple conclusions for various numbers of contributors may need to be stated for a single item of evidence. This would result in two or more LRs with no clear indication of which LR is the best estimate [15,17-19]. For a given evidentiary profile E and two competing hypotheses—i.e., the prosecutor’s hypothesis H_P and the defense’s hypothesis H_D—the likelihood ratio LR is given by Equation 3.
$$ LR = \frac{\Pr(E \mid H_{P})}{\Pr(E \mid H_{D})} $$

Equation 3: Likelihood Ratio (LR) Statistic
Pr(E|H_P) ≡ probability of encountering an evidence profile E given prosecution hypothesis H_P
Pr(E|H_D) ≡ probability of encountering an evidence profile E given defense hypothesis H_D
Equation 4 is based on the general formulation by Weir et al. [20] for computing the likelihood ratio for x unknown contributors carrying a set of alleles U that are included in evidence sample E under a given hypothesis H.
$$ LR = \frac{\Pr_{x}(U_{H_{P}} \mid E)}{\Pr_{x}(U_{H_{D}} \mid E)} $$

Equation 4: General Formulation of the Likelihood Ratio (LR)
E ≡ evidence sample containing a set of alleles
H_P ≡ prosecution hypothesis
H_D ≡ defense hypothesis
U_{H|E} ≡ set of alleles contained in evidence sample E but not contributed by contributors specified by hypothesis H
Pr_x(U_{H|E}) ≡ probability, given x contributors, of observing allele set U given evidence sample E and contributors specified by hypothesis H
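For intuition, the likelihood-ratio idea is easiest to see in the simplest possible case: a single-source stain with a matching suspect, where Pr(E|H_P) = 1 and Pr(E|H_D) is the Hardy-Weinberg genotype frequency of a random person. This is a far simpler case than the mixture formulation of Equation 4, and the allele frequencies below are invented.

```python
# Minimal single-source likelihood-ratio sketch. Under Hp the suspect is
# the source, so Pr(E|Hp) = 1 for a true match; under Hd a random person
# is the source, so Pr(E|Hd) is the product of genotype frequencies.
def genotype_freq(p, q=None):
    """Hardy-Weinberg frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2.0 * p * q

def single_source_lr(locus_genotype_freqs):
    pr_e_given_hd = 1.0
    for f in locus_genotype_freqs:
        pr_e_given_hd *= f
    return 1.0 / pr_e_given_hd  # Pr(E|Hp) = 1 in this idealized case

freqs = [genotype_freq(0.1, 0.2), genotype_freq(0.05)]  # two hypothetical loci
print(round(single_source_lr(freqs)))  # 10000, i.e. 1 / (0.04 * 0.0025)
```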
The mathematical rigor involved in computing a statistic as well as the apparent precision of the numerical result belie the complexity associated with asking analysts, attorneys, judges, and juries to actually interpret the number that is ultimately calculated.
If the evidence is neither overwhelmingly strong nor underwhelmingly weak, deciding what weight to assign to the evidence can be subjective. Evett and Weir [21] furnish a framework that attempts to guide interpretation of the statistical result by linking a quantitative likelihood ratio with a qualitative indication of the degree to which the evidence backs the prosecution’s hypothesis; the prescriptions are found in Table 5. This
scheme is helpful in eliminating some subjectivity from the characterization of evidentiary support but fails to connect qualitative notions of evidence strength with quantitative notions of error incidence. “Limited” support, for instance, may equate to error rates in the determination of a reference sample’s inclusion that are unacceptably high.
Table 5: Guidelines for Interpretation of Likelihood Ratios (reproduced from [21])

Likelihood Ratio Range    Degree of Support Provided by Evidence for Prosecution’s Hypothesis
1 to 10                   Limited
10 to 100                 Moderate
100 to 1000               Strong
1000 and greater          Very Strong
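The verbal scale of Table 5 amounts to a threshold lookup, sketched below; the handling of boundary values such as exactly 10 is a convention choice not specified by the table.

```python
# Mapping a numeric likelihood ratio to the Evett and Weir verbal scale.
def verbal_support(lr):
    """Return the Table 5 category for a likelihood ratio lr >= 1."""
    if lr < 10:
        return "Limited"
    if lr < 100:
        return "Moderate"
    if lr < 1000:
        return "Strong"
    return "Very Strong"

print(verbal_support(42))     # Moderate
print(verbal_support(10000))  # Very Strong
```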
Word notes, “The statement of ‘inclusion’ under these scenarios may have little meaning… ‘Inconclusive’ or ‘insufficient for comparison purposes’ may be the more appropriate conclusion in some cases” [22].
Despite the interpretive difficulties involved in using an inclusion statistic, the literature is filled with an abundance of scholarly work dedicated to reporting inclusion probabilities and likelihood ratios for multifarious scenarios—when only qualitative allelic information is considered [23], when the number of contributors is ambiguous
[24,25], when multiple hypotheses are entertained [26], when a reference sample is identified via a database search [27], and when reporting prior odds affects interpretation of the LR [28,29]. By contrast, little has been published on “exclusion criteria” or
“complexity criteria.” Although the LR approach attempts to handle allelic loss through
incorporation of the probability of drop-out in the computation [30], a minimal amount of
work has been published regarding a criterion that may be used by analysts to decide when a profile contains usable information. That is, is there a point when an evidentiary
profile contains too many contributors and/or too much drop-out to be considered a
reliable item of evidence for comparison purposes? Blackstone’s ratio, a central moral
and legal principle, asserts: “better that ten guilty persons escape than one innocent
suffer” [31]. This is bolstered by the notion: “The DNA tests that we routinely use in our
laboratories are designed to be exclusionary tests. That is, testing is performed under the
premise that an individual who is not the source of the DNA with a single-source profile
or who is not one of the sources in a mixture of DNA is expected to be excluded from the
DNA sample” [22]. The allure of increased detection of criminals must be tempered by consideration of the spurious detection of innocents.
Invoking this reverse perspective seems pedantic until one considers the
technological advancements employed in DNA testing as well as the implications of
DNA testing to criminal justice policy and practice. Advances include the emerging
abilities to detect DNA in minute quantities of sample—for example, from fingerprints
[32], latex gloves [33], fingernail remains [34], skin cells [35], single hairs [36], clothing
[37], and cigarettes [38]—as well as the ability to discern and analyze mixture samples
[20,23,39-42]. The extraction of potentially individualizing information from degraded
or partial profiles and/or from individual profiles whose characteristics are potentially
masked by the presence of another’s profile is being pursued more often with the
increasing commercialization of new, more sensitive amplification chemistries [43,44].
Although “touch” and otherwise low-level samples are now more routinely submitted for DNA typing, a number of issues are associated with testing these types of low-template samples. By its nature, the attempted detection of scant quantities of an evidentiary sample carries with it the risk of not fully observing all the alleles that truthfully comprise the sample under investigation; this is the phenomenon of allelic drop-out [13]. At the same time, because the detection threshold must be relaxed and extra amplification cycles might need to be added to produce a discernible signal, concurrent risk exists for “detecting” spurious evidentiary signal(s) due to contamination, stochastic variation, or adventitious deposition [45,46]; this is the phenomenon of allelic drop-in.
Issues relating to heterozygous peak-height balance and the increased prevalence of stutter products also need to be considered [47].
The ramifications for evidence interpretation given this mélange of complications¹ are significant. For example, if a laboratory uses the CPI statistic, loci with alleles below the stochastic threshold cannot be used for inclusion. If every locus of an evidence profile exhibited alleles below the stochastic threshold, then the evidence theoretically would not be used for comparison, and the likely outcome would be a determination that a reliable comparison cannot be made. Similarly, incorporating the probability of drop-out (Pr(D)) into the LR would decrease the weight of the evidence since more random people could be included.
¹ Error rates associated with sample collection, extraction, and the amplification process itself have been well documented [62-64]. These errors, which range in pervasiveness from the mislabeling of laboratory sample tubes to the contamination of an evidence collection kit by a crime scene investigator, would preface the types of profile interpretation errors that are assessed in Sections 2 and 3 and are not considered in this analysis.
1.2 Consideration of Error Rates
Despite the growing literature on determination of the LR, little attention has been paid to whether a profile should even be analyzed; concomitantly missing is a treatment of the Type I and Type II errors associated with a given likelihood ratio. Further, models that use linear mixture analysis [41,48], Markov chain Monte Carlo [49], or least-squares deconvolution [50] have been proposed, but in each case, the fact that the profile is interpretable has been presumed. The decision of whether to embark on the analysis path towards sample comparison needs to be made prior to comparison to a reference sample (whether qualitatively by the analyst or quantitatively via computational software); furthermore, how to contextualize an inclusion statistic—either RMNE or LR—within the framework of error rates is critical.
Accordingly, an investigation of the error rates associated with interpreting
successfully amplified DNA profiles—with a focus on uncovering a complexity threshold
to help constrain the conditions under which interpretation of a low-copy or multiple-
contributor profile is even attempted—is necessary. A method to accomplish these goals
is proposed and detailed in subsequent sections by employing Receiver Operating
Characteristic analysis.
1.2.1 Error Rate Analysis Using the Paradigm of Receiver Operating Characteristics
Receiver Operating Characteristic (ROC) analysis originated in World War II as a
means for radar operators to set their detection thresholds in a manner that optimized the
tradeoff between false alarms (i.e., spurious target detections) and leakage (i.e., undetected targets) [51].
The ROC parameter space of a classification scheme can be organized in a contingency table or confusion matrix that maps data instances into one of four classes relative to the actual and determined classes of those instances. Within the realm of DNA analysis, the classification scheme involves including or excluding a reference profile from an evidentiary profile, and the data points consist of reference-to-evidence comparisons. The true or actual classes depend on whether a reference sample really is included in an evidentiary sample while the determined classes represent the conclusions of an analyst.
This is distinct from the issue of whether a reference sample ought to be included
analytically. As outlined in Section 2.4.2, this study models an analyst’s determination of
exclusion or inclusion deterministically. In doing so, different determinations, which are
based on specific decision criteria, will specify when an analyst ought to include or
exclude a reference as a contributor to an evidence sample. The agreements and
deviations between those prescriptions (i.e., “ought to be included” versus “ought to be excluded”) and reality (i.e., “possible contributor” versus “non-contributor”) form the
basis of this study. Considering whether a standard is a possible contributor versus an
actual contributor is a subtle but important distinction. For example, if the mixture
sample contains alleles 14, 15, 16 at a particular locus, then individuals who ought to be
included could have any of the following allele pairings at that locus: (14,14); (14,15);
(14,16); (15,15); (15,16); or (16,16). This is different than confining the error analysis to
those individuals that actually did contribute to a particular mixture, since an analyst cannot usually know the exact genotype of a particular contributor among the universe of possible included genotype combinations. Thus, the focus is on the decision criteria themselves; no attempt is made to model potential errors committed through an analyst’s actions independent of the prescriptions of his laboratory’s standard operating procedures.
Table 6 shows the four possible outcomes coupled with the consequences of an
analyst’s decision for the comparison of a given reference sample with an unknown.
Table 6: Contingency Matrix Correlating Analyst Decision with the Underlying Reality H0: The mixture profile does not contain the individual’s profile (i.e., the individual is excluded). H1: The mixture profile contains the individual’s profile (i.e., the individual is included). A false negative occurs when an included individual is improperly excluded as a contributor to a mixture, while a true positive occurs when an included individual is correctly included as a mixture contributor. A true negative occurs when an excluded individual is correctly excluded as a mixture contributor, while a false positive occurs when an excluded individual is improperly included as a contributor to a mixture.
                              Analyst Action
Reality                       Fail to Reject H0 (Accept H0)      Reject H0 (Accept H1)
H0 false; H1 true             False Negative (Type II Error)     True Positive
H0 true; H1 false             True Negative                      False Positive (Type I Error)
Comparisons between an evidence sample and an individual who ought to have been excluded can result either in a correctly excluded individual (i.e., a true negative) or in an incorrectly included individual (i.e., a false positive). Thus, all comparisons that fall within the bottom row of Table 6 come from comparisons involving individuals who could not have contributed to the mixtures to which they are being compared.
Comparisons between an evidence sample and an included individual (i.e., possible contributor) can result either in a correctly included individual (i.e., a true
positive) or in an incorrectly excluded individual (i.e., a false negative). Thus, all comparisons that fall within the top row of Table 6 come from comparisons involving individuals who could have contributed to the mixtures to which they are being compared.
Within the realm of error analysis, a decision that results in a truthfully negative sample being judged to be positive (i.e., a false positive) is known as a Type I error or
“error of the first kind.” The rate at which false positives occur, which is typically denoted by α, is the conditional probability of including an individual as a contributor to a mixture given that the individual ought to be excluded. The sensitivity of a test, also known as hit rate or recall, is equivalent to the true positive rate and is given by 1 − β. A decision that results in a truthfully positive sample being judged to be negative (i.e., a false negative) is known as a Type II error or “error of the second kind.” The rate at which false negatives occur, which is typically denoted by β, is the conditional probability of excluding an individual as a contributor to a mixture given that the individual ought to be included. The specificity of a test is equivalent to the true negative rate and is given by 1 − α.
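These relationships can be illustrated numerically. The sketch below is in Python rather than the MATLAB used in this study, and the four outcome counts are invented for illustration only:

```python
# Illustrative computation of the four ROC rates from confusion-matrix counts.
# The counts are hypothetical, not data from this study.
tp, fn = 9400, 600    # comparisons involving individuals who ought to be included
tn, fp = 9900, 100    # comparisons involving individuals who ought to be excluded

alpha = fp / (fp + tn)  # Type I error rate: P(include | ought to be excluded)
beta = fn / (fn + tp)   # Type II error rate: P(exclude | ought to be included)

sensitivity = tp / (tp + fn)  # true positive rate (hit rate, recall) = 1 - beta
specificity = tn / (tn + fp)  # true negative rate = 1 - alpha
```

Note that sensitivity and specificity are complements of β and α, respectively, so reporting either pair fully characterizes the classifier at a fixed decision threshold.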
Initially applied to radar detection thresholds, ROC analysis has found utility in
general problems involving signal detection across multiple disciplines, including speech
recognition and music detection [52], face detection and recognition [53], and vibration-
based structural health monitoring of bolt loosening [54], among many others. It has also
been successfully applied to clinical settings involving disease diagnosis [55] and
provides a useful, reductive lens through which to process the allelic discrepancy data contained in Sections 3.1 and 3.2.
Given databases of mixtures, excluded individuals, and included individuals—
which will originate from simulation (Section 2.2) and from the laboratory (Section
2.3)—assessments will be made as to the relative error rates resulting from varying
decision criteria for declaring a reference profile as either excluded or included from a
given evidence mixture sample, where the decision criterion will be dependent upon the
number of discrepant alleles between the two samples.
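The decision criterion just described—declare a reference “included” when its number of discrepant alleles falls at or below a threshold, and sweep that threshold—can be sketched as follows. This is a Python illustration (the study itself used MATLAB), and the discrepancy counts are invented:

```python
# Sweep the allowed number of discrepant alleles (0..30 over 15 loci) and
# record (false positive rate, true positive rate) at each threshold.
# Discrepancy counts below are invented for illustration.
excluded_discrepancies = [12, 15, 9, 18, 14, 11, 16, 13]  # truly excluded refs
included_discrepancies = [0, 1, 0, 2, 4, 0, 1, 3]         # truly included refs

roc = []  # one (FPR, TPR) point per threshold
for threshold in range(0, 31):
    fp = sum(d <= threshold for d in excluded_discrepancies)
    tp = sum(d <= threshold for d in included_discrepancies)
    roc.append((fp / len(excluded_discrepancies),
                tp / len(included_discrepancies)))
```

A permissive threshold (30) includes everyone, giving the (1, 1) corner; a strict threshold (0) excludes nearly everyone, giving points near (0, 0); the useful operating points lie between.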
2 METHODS
2.1 Overview
Two studies were conducted: an error analysis study that considered a simulated
mixture database (Section 2.2) and a validation study that considered a laboratory
mixture database (Section 2.3). Results for each of these studies (Sections 3.1 and 3.2,
respectively) were generated by comparing these mixture databases to two different
simulated individual databases: one consisting of individuals verified to be excluded as
mixture contributors; the other consisting of individuals verified to be included as
mixture contributors.
In this study, the mixture profiles were taken to be the forensic evidence samples,
and the individual profiles represented the reference samples.
2.2 Error Analysis Study Using Simulated Mixture Data
For the error analysis study using simulated mixture data, the population under
study was a database of simulated mixtures, which were compared to a database of
simulated excluded individuals and a database of simulated included individuals.
2.2.1 Simulation Materials
All simulated genetic profiles and subsequent analyses were accomplished using
MATLAB Version 7.5.0.342 (R2007b) (MathWorks, Natick, Massachusetts).
2.2.2 Simulation Model
Within the simulation framework, profiles were represented as a collection of
alleles determined to be present at each of the 15 autosomal loci contained in the
AmpFℓSTR® Identifiler® Amplification Kit (Applied Biosystems, Foster City, CA): D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA [43]. Amelogenin, the
gender-determining locus, was not considered since it is not a hypervariable locus
providing discriminatory power.
The alleles observed and tabulated by Butler et al. [56] in their 2003 population study using Identifiler® were taken as the universe of realizable alleles for the purposes of simulating profiles. Butler et al. observed 59 distinct allele calls over all autosomal
STR loci: 5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14, 14.2, 15, 15.2, 16,
16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21, 21.2, 22, 22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2,
26, 27, 28, 29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2, 35, 36, 37, 38,
39. The collection of alleles observed at each particular locus, along with each allele’s associated subpopulation frequency among Caucasians, is diagrammed in Figure 1.
Figure 1: Dirac Delta Function Plots of Genetic Model Used in Simulating Profiles The genetic model employed when simulating profiles was based on Butler et al.’s 2003 subpopulation study of Caucasians using the Identifiler® kit [56]. All observed alleles (i.e., the common alleles) at each of the 15 autosomal STR loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
Profiles of individuals were generated by randomly selecting two (not necessarily
distinct) alleles for each of the 15 autosomal loci in the Identifiler® kit. For any given
locus, a list of alleles was constructed that consisted of all alleles with non-zero
subpopulation frequencies with respect to the 302 Caucasians observed by Butler et al.
Two alleles were selected at random from this locus-specific allele list according to the
subpopulation frequencies observed by Butler et al. (and represented in Figure 1). For
instance, for the CSF1PO locus, the following alleles were observed (with the associated subpopulation frequencies in Caucasians): allele 8 (with a frequency of 0.00497); allele 9
(0.01159); allele 10 (0.21689); allele 11 (0.30132); allele 12 (0.36093); allele 13
(0.09603); and allele 14 (0.00828). The simulation selects a 9 allele 1.159% of the time and a 12 allele 36.093% of the time. A genotype consisting of alleles 8 and 13 at the
CSF1PO locus would be selected 2 × 0.00497 × 0.09603 = 0.0009545, or 0.09545%, of the time,² whereas a homozygous CSF1PO locus consisting of the alleles 11 and 11 would be selected 0.30132 × 0.30132 = 0.09079, or 9.079%, of the time. This random selection is repeated for each of the 15 autosomal loci in the Identifiler® kit according to Butler et al.’s observed allele frequencies for each allele at each locus.
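The locus-level sampling step can be sketched as follows. This is a Python illustration (the study used MATLAB), the helper name `simulate_genotype` is hypothetical, and the CSF1PO frequencies are those quoted above from Butler et al.:

```python
# Sketch of simulating one locus genotype by weighted allele draws.
import random

# CSF1PO allele frequencies (Caucasians), as quoted in the text above.
csf1po = {8: 0.00497, 9: 0.01159, 10: 0.21689, 11: 0.30132,
          12: 0.36093, 13: 0.09603, 14: 0.00828}

def simulate_genotype(freqs, rng):
    """Draw two alleles (not necessarily distinct) according to frequencies."""
    alleles = list(freqs)
    weights = list(freqs.values())
    return tuple(rng.choices(alleles, weights=weights, k=2))

rng = random.Random(0)
genotype = simulate_genotype(csf1po, rng)

# Expected genotype probabilities, matching the worked arithmetic above:
p_het_8_13 = 2 * csf1po[8] * csf1po[13]  # heterozygous (8,13)
p_hom_11 = csf1po[11] ** 2               # homozygous (11,11)
```

The factor of 2 in the heterozygous case accounts for the two draw orders, exactly as in the footnoted arithmetic.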
Each individual profile, then, consists of a collection of two alleles at each of the
fifteen loci. Recasting this information in the form of a matrix served the dual purpose of
neatly summarizing an individual’s profile in a logical manner as well as setting up the
data in a computationally efficient manner. Matrix representation of profiles leveraged
MATLAB’s vectorized analysis environment to facilitate fast data manipulation in
performing essential operations, such as “summing” individuals’ profiles to simulate
mixtures and comparing two profiles for the presence of common alleles.
Thus, an amplification result for an individual was represented as a 59 × 15
matrix, with the 59 rows corresponding to the universe of possible alleles at all loci and
the 15 columns corresponding to those particular loci. (Row 1 corresponded to allele 5
² The factor of 2 is included in the product of heterozygous allele frequencies to account for the combinatorial fact that a genotype consisting of allele 8 and allele 13 (in that order) is equivalent to a genotype consisting of allele 13 and allele 8 (in that order).
and proceeded in a monotonically increasing fashion through row 59, which corresponded to allele 39.) The loci order corresponded to the ordering listed above, with column 1 corresponding to the D8S1179 locus and column 15 corresponding to the FGA locus.
The matrix entries represented relative allele prevalences for that profile. The relative presence of an allele allows for a simple model of allele expression—as either absent, present heterozygously, or present homozygously for single-source profiles—while not taking into account signal intensity, the number of contributors, or relative contributor ratios for mixed profiles. For example, a given single-source profile matrix consisted of entries of relative prevalences of 0, 1, or 2. An entry of zero corresponded to the absence of that allele for that particular locus (e.g., a zero in the 8th row and 1st column indicates that the reference did not have a 10 allele at the D8S1179 locus); an entry of unity corresponded to the presence of a heterozygous allele at a particular locus (e.g., a one in the 42nd row and 2nd column indicates that the reference’s D21S11 locus is heterozygous and that one—and only one—of the two alleles possessed at this locus is a 29); an entry of 2 represents a homozygous allele at the specified locus (e.g., a two in the 11th row and 3rd column indicates that the reference has two 12 alleles at the D7S820 locus). The
profiles of all individuals were assumed to have exactly two alleles at a given locus.
For an individual profile I with 15 loci L, each consisting of two alleles αL,1 and
αL,2 , Equation 5 models the resulting profile.
I = \sum_{L=1}^{15} \left( I^{\alpha_{L,1}} + I^{\alpha_{L,2}} \right) = \sum_{L=1}^{15} \sum_{A=1}^{2} I^{\alpha_{L,A}}

Equation 5: Model of Individual Profile
I^{α_{L,A}} ≡ allele A contained at locus L for individual I
I ≡ (complete) individual profile
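The matrix layout just described can be sketched as follows, using Python/NumPy in place of the MATLAB used in the study; the helper name `profile_matrix` is hypothetical, while the allele and locus lists are those given above:

```python
# Build the 59 x 15 relative-prevalence matrix for a single-source profile.
import numpy as np

# Universe of 59 allele designations observed by Butler et al. (quoted above).
ALLELES = [5, 6, 7, 8, 8.1, 9, 9.3, 10, 10.3, 11, 12, 12.2, 13, 13.2, 14,
           14.2, 15, 15.2, 16, 16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 21,
           21.2, 22, 22.2, 22.3, 23, 23.2, 24, 24.2, 25, 25.2, 26, 27, 28,
           29, 29.2, 30, 30.2, 31, 31.2, 32, 32.2, 33, 33.1, 33.2, 34, 34.2,
           35, 36, 37, 38, 39]
LOCI = ["D8S1179", "D21S11", "D7S820", "CSF1PO", "D3S1358", "TH01", "D13S317",
        "D16S539", "D2S1338", "D19S433", "vWA", "TPOX", "D18S51", "D5S818", "FGA"]
ROW = {a: i for i, a in enumerate(ALLELES)}   # allele value -> row index
COL = {l: i for i, l in enumerate(LOCI)}      # locus name -> column index

def profile_matrix(genotypes):
    """Build a 59 x 15 relative-prevalence matrix from {locus: (a1, a2)}."""
    m = np.zeros((59, 15), dtype=int)
    for locus, (a1, a2) in genotypes.items():
        m[ROW[a1], COL[locus]] += 1
        m[ROW[a2], COL[locus]] += 1
    return m

# A heterozygous locus yields two entries of 1; a homozygous locus one entry of 2.
m = profile_matrix({"D21S11": (29, 30), "D7S820": (12, 12)})
```

Consistent with the text, allele 29 occupies row 42 (index 41) and a homozygous (12,12) D7S820 locus produces a single entry of 2.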
Figure 2 shows a graphic representation of an example single-source profile using the
Dirac delta function.
Figure 2: Dirac Delta Function Representation of an Example Single-Source Profile The alleles at each of the 15 autosomal loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
This example individual’s profile is equivalently represented in Figure 3 as a matrix.
24
Figure 3: Graphical Matrix Representation of a Representative Single-Source Profile The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye colors in the Identifiler® kit (except in the case of the black font, which corresponds to a dye color of yellow). Because of space limitations, only the first and last allele values are identified on the y-axis. In place of potentially illegible numbers, a relative allele prevalence of 0 is represented as white space, while a red box indicates a relative prevalence of 1, and a blue box indicates a relative prevalence of 2.
Mixtures were generated by summing the matrices of a given number of contributors. A mixture of two people could have matrix entries corresponding to relative allele prevalence in the range 0 – 4 depending on the degree of allelic overlap between contributors. (For example, if two contributors were homozygous for the same allele at a particular locus, the resulting mixture matrix entry for that locus’s allele would be 4.)
Therefore, in general, for a mixture M₁ created by contributions from two individuals I₁ and I₂, each typed at 15 loci L with two alleles α_{L,1} and α_{L,2} per locus, the resulting alleles expressed at each locus in the mixture profile are modeled by the simple sum shown in Equation 6.

M_1 = I_1 + I_2 = \sum_{c=1}^{2} \sum_{L=1}^{15} \left( S_c^{\alpha_{L,1}} + S_c^{\alpha_{L,2}} \right) = \sum_{c=1}^{2} \sum_{L=1}^{15} \sum_{A=1}^{2} S_c^{\alpha_{L,A}}

Equation 6: Model of a Two-Person Mixture Profile
α_{L,A} ≡ allele A contained at locus L
S_c ≡ single-source profile for individual c
M_1 ≡ (complete) profile for mixture 1
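The matrix addition of Equation 6 can be sketched directly. For brevity this Python illustration (the study used MATLAB) uses tiny 3 × 2 toy matrices rather than the full 59 × 15 layout:

```python
# Mixture formation as element-wise matrix addition of contributor profiles.
import numpy as np

person1 = np.array([[2, 0],    # homozygous allele at locus 1
                    [0, 1],
                    [0, 1]])
person2 = np.array([[2, 0],    # shares the same homozygous allele at locus 1
                    [0, 2],
                    [0, 0]])

mixture = person1 + person2
# Shared homozygous allele -> relative prevalence 4 in the mixture,
# matching the 0-4 range described for two-person mixtures.
```

The entry of 4 arises exactly as in the text's example: both contributors homozygous for the same allele at the same locus.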
Figure 4 and Figure 5 show graphic representations of an example mixture of Person 1 and Person 2’s profiles as a collection of Dirac delta function plots and as a matrix, respectively.
Figure 4: Dirac Delta Function Representation of Example Mixture Profile from Person 1 and Person 2 The alleles at each of the 15 autosomal loci are represented by the value topping each vertical line, and each allele’s associated subpopulation frequency is represented by the height of its respective vertical line.
Figure 5: Graphical Matrix Representation of Example Mixture Profile from Person 1 and Person 2 The loci names on the x-axis have been abbreviated. The colors of the abbreviated loci names correspond to their respective fluorescent dye color in the Identifiler® kit (except in the case of the black font, which corresponds to a dye color of yellow). The first and last common alleles are identified on the y-axis, where a relative allele prevalence of 0 is represented as white-space; a red box indicates a relative prevalence of 1; a blue box indicates a relative prevalence of 2; a green box indicates a relative prevalence of 3; and a magenta box indicates a relative prevalence of 4.
In this scenario, allele detection has effectively been reduced to a binary system such that each allele is deterministically either present or absent; this ultimately manifests itself as a relative prevalence number instead of a peak height or area. Modeling different mixture ratios between contributors would be required to fully encompass the potential effects of allelic drop-out; since the drop-out model employed in this study operates on relative allele prevalence alone, all results assume a 1:1 mixture ratio.
2.2.2.1 Modeling Allele Drop-out
If no drop-out is assumed, summation of the single-source matrices would result in
a mixture profile with all contributed alleles detected; this was considered a “pristine
mixture” profile and is akin to instances in casework in which the mixture proportion
ratio is 1:1 with a total DNA mass input of greater than 0.5 ng into the amplification
process [57]. To account for instances where lower targets of DNA are amplified, allele
drop-out needed to be modeled. To accomplish this, pristine mixtures were perturbed for
varying proportions of drop-out for a heterozygous allele from 0 to 0.9 in increments of
0.1. Here, a drop-out level of 0 means that all of the alleles were detected; in this case,
the “perturbed mixture” is identical to the “pristine mixture.” A non-zero level of drop-
out corresponded to the proportion of time a heterozygously-present allele (i.e., an allele
with a relative prevalence of 1) was not detected. For example, for a
sample with a drop-out proportion of 0.1, each allele with a relative prevalence of 1 stood
a 10% chance of not being detected. Random numbers drawn separately for each allele at
every locus according to the specified proportion of drop-out determined whether a given
allele actually dropped out.
For alleles within a profile that were contributed multiple times (i.e., had a
relative prevalence greater than unity)—either from an individual contributor being
homozygous at that locus or from overlapping alleles between contributors—their
increased prevalence diminished the probability that that particular allele would drop-out.
Therefore, an allele that is twice as prevalent in a mixture is half as likely to completely
drop-out while an allele that is four times as prevalent is one-quarter as likely to drop-out.
Thus, for a particular mixture allele α_{L,A} at a particular locus L with a relative prevalence φ, the expression used to describe the probability of drop-out Pr(D)_{φ}^{α_{L,A}} for that particular mixture allele, given a specified probability of drop-out for a heterozygous allele Pr(D)_{φ=1}, is given by Equation 7.

\Pr(D)_{\varphi}^{\alpha_{L,A}} = \frac{\Pr(D)_{\varphi=1}}{\varphi}

Equation 7: Probability of Allele Drop-out
α_{L,A} ≡ allele A at locus L
φ ≡ relative prevalence of allele (e.g., 0, 1, 2, 3, 4)
Pr(D)_{φ=1} ≡ specified probability of drop-out for a heterozygous allele
Pr(D)_{φ}^{α_{L,A}} ≡ realized probability of drop-out for allele A at locus L

Whether an allele actually dropped out or remained observable was determined through a random number draw weighted with the appropriate probability of drop-out Pr(D)_{φ}^{α_{L,A}}.
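The perturbation step of Equation 7 can be sketched as follows, here in Python rather than the study's MATLAB, with the helper name `perturb` being hypothetical:

```python
# Apply Equation 7's drop-out model to a list of relative prevalences:
# each nonzero entry phi drops out with probability Pr(D | phi=1) / phi.
import random

def perturb(mixture, p_het_dropout, rng):
    """Return a copy with each allele dropped (set to 0) per Equation 7."""
    out = []
    for phi in mixture:
        if phi > 0 and rng.random() < p_het_dropout / phi:
            out.append(0)   # allele not detected at all
        else:
            out.append(phi)
    return out

rng = random.Random(1)
pristine = [1, 2, 4, 0, 1]               # relative prevalences at one locus slice
perturbed = perturb(pristine, 0.0, rng)  # drop-out level 0 preserves the pristine profile
```

A drop-out level of 0 reproduces the pristine mixture, and a heterozygous allele (φ = 1) with a level of 1.0 always drops out, matching the limiting cases described in the text.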
2.2.3 Generating Populations for Comparison
2.2.3.1 Simulating Mixtures
First, profiles for 100,000 single-source samples were simulated to serve as
possible mixture contributors. Random profiles from this set were selected two-at-a-time
and their profiles combined into a mixture. A quality check was employed to ensure that
an individual’s—and potential mixture contributor’s—profile was not selected multiple
times, which would have led to a single person contributing twice to a single “mixture.”
As described in Section 2.2.2, the combination of the two profiles was achieved by
summing the matrix profiles from each contributor to arrive at a matrix profile that represented the pristine mixture under the condition of no allele drop-out. This selection of mixture contributors and summing of contributor profiles was repeated 10,000 times.
The final collection of pristine mixture profiles comprised part of the simulated mixture database; these 10,000 mixture profiles (with no drop-out) collectively represented one mixture set. Perturbations of the pristine mixtures, generated by applying varying levels of allelic drop-out (i.e., 0.10 to 0.90 in increments of 0.10), contributed the rest of the mixture sets, resulting in a total of 10,000 mixtures/set × 10 sets = 100,000 simulated mixtures. These simulated mixtures were later used for comparison to simulated excluded and included single-source profiles.
Figure 6 provides a flow chart representing the process of simulating pristine mixture profiles, along with the associated inputs that defined the parameters of this particular study.
Figure 6: Flow Describing Pristine Mixture Profile Generation Results contained in Sections 3.1 & 3.3 are for npop ≡ number of simulated potential contributor individuals = 100,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
From this master set of pristine mixture profiles, perturbed mixtures were generated, each modeling a different level of allelic drop-out. This flow is depicted in Figure 7.
Figure 7: Flow Used to Generate Perturbed Mixture Profiles Results included in Sections 3.1 & 3.3 incorporate a range of allele drop-out rates increasing from 0% to 90% in increments of 10%. Pr(D)_{φ=1} ≡ specified probability of drop-out for a heterozygous allele
2.2.3.2 Simulating Excluded Individuals
For the exclusion component of the simulation study, 10,000 possible individual profiles were simulated in the same manner as they were for the mixture contributors to create a database.
Subsequently, when comparisons were being made between these excluded types and a particular mixture, a quality check ensured that a given reference profile deemed to be excluded did not in fact contain a collection of alleles that overlapped 100% with the mixture profile under consideration; in other words, the check confirmed that the true relationship between each reference and each mixture was in fact exclusion.
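This quality check can be sketched as a simple containment test. The Python sketch below (the study used MATLAB) models profiles as {locus: set-of-alleles} dictionaries for brevity, an assumption of this illustration rather than the study's matrix layout:

```python
# A reference is truly excluded unless every one of its alleles appears in
# the mixture; otherwise it would qualify as a possible contributor.
def fully_contained(reference, mixture):
    """True if 100% of the reference's alleles appear in the mixture."""
    return all(alleles <= mixture.get(locus, set())
               for locus, alleles in reference.items())

mixture = {"TH01": {6, 9.3}, "TPOX": {8, 11, 12}}
ref_in = {"TH01": {6, 9.3}, "TPOX": {8, 11}}   # all alleles present: possible contributor
ref_out = {"TH01": {6, 7}, "TPOX": {8, 11}}    # allele 7 absent: truly excluded
```

Any reference failing `fully_contained` carries at least one allelic discrepancy and therefore genuinely belongs in the excluded database.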
Figure 8: Integrated Flow Describing Generation of Pristine Mixture Profiles & Excluded Individuals Results provided in Sections 3.1 and 3.3 are for npop ≡ number of simulated potential contributor individuals = 100,000, nindivs ≡ number of simulated included individuals = 10,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
Figure 8 demonstrates the expanded simulation methodology as well as the
relationship between the mixture contributor profiles and excluded reference profiles,
along with the relevant simulation inputs.
2.2.3.3 Simulating Included Individuals
For the inclusion component of the simulation study, the same number of comparisons between single-source and mixture profiles was desired as that which took place for the simulated excluded individuals. In the exclusion component of the simulated mixture study, 10,000 excluded individuals were compared recursively to
10,000 mixtures, resulting in a total of 10,000 individuals × 10,000 mixtures = 1 × 10⁸ comparisons. Unlike the exclusion component of the study, in which a single population of individuals was simultaneously excluded from all mixtures, separate populations of included individuals are needed for every individual mixture; in other words, for 10,000 mixtures, 10,000 distinct sets (each containing 10,000 individuals) are needed, with each set appropriate for comparison to a single mixture. Thus, a total of 10,000 included individuals per mixture × 10,000 mixtures = 1 × 10⁸ total included individuals is needed to arrive at an equivalent number of comparisons.
To generate the set of included individuals for a given mixture, that mixture’s matrix profile was considered. In the case of no allelic drop-out, the universe of possible alleles considered for reference profile generation—previously set to mirror Butler et al.’s observed subpopulation frequencies—was collapsed to include only those alleles that were represented in the mixture. Once all of the alleles for each locus had been
identified, the frequencies of allele incidence at a given locus were renormalized to 1 using Butler et al.’s published frequencies [56]. This ensured that the relative subpopulation frequencies between alleles were maintained while limiting the pool of possible alleles. In general, at locus L for a mixture containing n alleles {α_{L,1}, α_{L,2}, …, α_{L,n}} with corresponding subpopulation frequencies {f_{L,1}, f_{L,2}, …, f_{L,n}}, the resulting renormalized frequency R_{L,m} of an allele α_{L,m} is given by Equation 8.

R_{L,m} = \frac{f_{L,m}}{\sum_{i=1}^{n} f_{L,A_i}}

Equation 8: Renormalized Allele Frequency for Generation of Included Individuals
A ≡ collection of alleles at locus L with non-zero allele frequencies
A_i ≡ ith allele in set A
n ≡ number of alleles in A
f_{L,m} ≡ subpopulation frequency of allele m at locus L
R_{L,m} ≡ renormalized subpopulation frequency of allele m at locus L
For the D16S539 locus, for example, the observed alleles (along with their
subpopulation frequencies) were alleles 8 (0.01821), 9 (0.11258), 10 (0.05629), 11
(0.32119), 12 (0.32616), 13 (0.14570), 14 (0.01987). If a given mixture only contained
alleles 9, 11, and 14, then the renormalized frequencies for the generation of individuals would be allele 9 (0.24817), 11 (0.70803), and 14 (0.04380), and all included individuals would contain one of the following genotypes (α₁,α₂): (9,9), (9,11), (9,14), (11,11), (11,14), or (14,14).³
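Equation 8 and the worked D16S539 values can be reproduced directly. This Python sketch (the study used MATLAB) takes the mixture to contain alleles 9, 11, and 14, the combination consistent with the renormalized values quoted above; the helper name `renormalize` is hypothetical:

```python
# Restrict a locus's allele frequencies to the mixture alleles and rescale
# so the retained frequencies sum to 1 (Equation 8).
# D16S539 frequencies are those quoted above from Butler et al.
d16s539 = {8: 0.01821, 9: 0.11258, 10: 0.05629, 11: 0.32119,
           12: 0.32616, 13: 0.14570, 14: 0.01987}

def renormalize(freqs, mixture_alleles):
    """Keep only mixture alleles, preserving their relative frequencies."""
    total = sum(freqs[a] for a in mixture_alleles)
    return {a: freqs[a] / total for a in mixture_alleles}

renorm = renormalize(d16s539, [9, 11, 14])
```

The rescaling preserves the ratios among the retained alleles while forcing every simulated included individual to draw only mixture alleles.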
It should be noted that restricting the included individuals only to include the
mixture contributors that actually contributed to these simulated mixtures provides an
³ (α₁,α₂) is genotypically equivalent to (α₂,α₁).
insufficiently large population (i.e., consisting of only two individuals) from which to make comparisons. The actual mixture contributors represent only a subset of all individuals that could have contributed to a given profile. That is, just because the combination of C₁ and C₂’s profiles resulted in a collection of alleles in mixture M₁ does
not necessarily exclude the possibility that the combination of C₃ and C₄’s profiles could produce that same combination of alleles. In fact, the likelihood that any given person could reasonably have contributed to a mixture increases as the number of contributors—and thus the total collection of mixture alleles—increases. To ease the computational burden of simulating extraordinary quantities of individuals until one was serendipitously included in a given mixture, the prescribed methodology constrains the simulation space to produce included individuals in a more efficient manner. Such individuals that are forced—by simulation—to have profiles included in a given mixture should be determined to be potential contributors by an analyst.
Figure 9 depicts the simulation process for generating included individuals, along with the simulation inputs used in this study.
Figure 9: Flow Describing Generation of Included Individuals (From Simulated Mixture Profiles) Results provided in Sections 3.1 and 3.3 are for nindivs ≡ number of simulated included individuals = 10,000, ncontribs ≡ number of mixture contributors = 2, & nmix ≡ number of mixtures to simulate = 10,000.
2.2.4 Comparing Populations and Counting Allelic Discrepancies
As previously described, profiles within each of the databases (mixtures, excluded individuals, included individuals) existed as 59 × 15 matrices. To compare two profiles, the single-source matrix was first considered to see which alleles were present at each locus. The mixture profile matrix was then considered to see how many of those single-source alleles it contained.
In accordance with the “relative prevalence” model, allele detection for mixtures was assessed in a binary fashion: for the purposes of comparison, each allele was either considered detected or not detected, without regard to the “strength” of presence, which might be interpreted through peak height or area. Accordingly, each reference allele from a heterozygous locus not contained in the mixture counted as one allelic discrepancy, and the reference alleles from a homozygous locus not contained in the mixture counted as two allelic discrepancies. Each individual locus could have zero, one, or two allelic discrepancies with a corresponding mixture locus. This discrepancy tallying was completed for all alleles at all loci in the single-source matrix. Thus, since the Identifiler® kit tests 15 autosomal loci, the range of total possible discrepancies between a given reference profile and a given mixture profile included integers ranging from 0 to 30.
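The tallying rule just described can be sketched over the matrix representation. This Python/NumPy illustration stands in for the study's MATLAB, and the helper name `count_discrepancies` is hypothetical:

```python
# Count allelic discrepancies between a reference and a mixture matrix.
# Binary detection model: a mixture allele is either present (>0) or absent.
# A missing heterozygous reference allele adds 1; a missing homozygous
# reference allele adds 2 (its full relative prevalence).
import numpy as np

def count_discrepancies(reference, mixture):
    """Total allelic discrepancies between reference and mixture matrices."""
    detected = mixture > 0
    # Where the mixture lacks a reference allele entirely, the reference's
    # prevalence (1 or 2) at that cell is counted as discrepant.
    return int(np.where(detected, 0, reference).sum())

reference = np.zeros((59, 15), dtype=int)
mixture = np.zeros((59, 15), dtype=int)
reference[0, 0] = 2   # homozygous allele absent from mixture -> 2 discrepancies
reference[1, 1] = 1   # heterozygous allele present in mixture -> 0 discrepancies
mixture[1, 1] = 3
reference[2, 2] = 1   # heterozygous allele absent -> 1 discrepancy
```

With 15 loci and at most two discrepancies per locus, the total for any comparison falls between 0 and 30, as stated above.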
2.3 Validation Study Using Laboratory Mixture Data
For the error analysis study using laboratory mixture data, the population under
study consisted of actual, amplified mixture samples, which were compared to the
already-generated database of simulated excluded individuals and a separately-generated database of simulated included individuals.
2.3.1 Profile Typing Materials and Methods
Full details of sample collection, preparation, extraction, amplification, and analysis for these laboratory mixtures are described elsewhere [58]. Two mixtures consisting of two individuals each (for a total of four individuals) mixed in a 1:1 ratio were considered for six starting masses of DNA template: 2.0 ng, 1.0 ng, 0.5 ng, 0.25 ng,
0.125 ng, and 0.0625 ng.
2.4 Data Interpretation Framework
2.4.1 Organizing Comparison Results
The comparison of all single-source genotypes to a particular mixture within a mixture set yielded a vector of allelic discrepancy values. The total compilation of all comparisons between all individuals and all mixtures within a mixture set yielded a collection of allelic discrepancy values that were plotted as a histogram. In all, 10,000 individuals were compared to 10,000 mixtures, resulting in a total of 1×10⁸ comparisons represented in such a histogram.
Each histogram bin τδ contained a count ρδ of the number of comparisons between reference and mixture profiles out of the total number ψ of comparisons that had exactly
δ allelic discrepancies between reference and mixture profiles, where δ was an integer
ranging from 0 to 30.
An example histogram organizing a collection of discrepancy values for 10,000 individuals compared to a single mixture is shown in Figure 10. For example, reading the bin corresponding to 15 allelic discrepancies (τδ=15) for the histogram yielded the following information: exactly 15 discrepancies (δ = 15) were observed in 1060 reference-to-mixture comparisons (ρ15 = 1060) out of the total of 10,000 reference-to-mixture comparisons (ψ = 10,000) comprising this histogram.
Figure 10: Example Histogram Tabulating Discrepancies Between 10,000 References and 1 Mixture The height of the bar for a given histogram bin τδ represents the number of reference-to- mixture comparisons that resulted in δ allelic discrepancies, where an allelic discrepancy is defined as a reference allele that is not present in the mixture profile.
Within the context of this study, this histogram represents the collection of
discrepancies resulting from the comparison of a particular mixture set with a particular
single-source database (i.e., a set of either excluded individuals or included individuals).
Completing comparisons for multiple mixture sets yielded multiple histograms, each
representing the results from comparing a database of individuals with a set of mixtures
exhibiting a given rate of allelic drop-out. Here, just one histogram is considered.
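The tallying that produces such a histogram can be sketched as follows (an illustrative helper, not the study's actual script; bin counts and inputs are toy values):

```python
from collections import Counter

def discrepancy_histogram(discrepancy_values, max_delta=30):
    """Tally comparisons by discrepancy count.

    Returns rho, where rho[delta] is the number of comparisons that produced
    exactly delta allelic discrepancies (delta = 0..30 for 15 autosomal loci).
    """
    counts = Counter(discrepancy_values)
    return [counts.get(delta, 0) for delta in range(max_delta + 1)]

# Toy collection of discrepancy values from seven hypothetical comparisons.
rho = discrepancy_histogram([0, 2, 2, 15, 15, 15, 30])
print(rho[2], rho[15], sum(rho))  # 2 3 7
```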
Using the same data, the histogram depicted in Figure 10 was recast as the
cumulative normalized histogram shown in Figure 11. Each cumulative normalized
histogram bin represented the fraction of the total number of comparisons between
reference and mixture profiles that had less than or equal to δ allelic discrepancies (i.e., $\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$). For example, ~0.831 or ~83.1% of all comparisons resulted in 15 or fewer discrepancies (i.e., $\frac{1}{\psi}\sum_{i=1}^{15}\rho_i = 0.831$).
Figure 11: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies The height of the bar for a given cumulative normalized histogram bin τδ represents the number of reference-to-mixture comparisons that resulted in δ or less allelic discrepancies.
Since each bin’s magenta bar corresponded to the fraction $\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$ of comparisons with discrepancies less than or equal to δ, the proportion of total comparisons with discrepancies greater than δ was $1-\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$. These complementary quantities are included in Figure 12 as gray bars.
(To facilitate the discussion of later analysis, the magenta bar in each bin was stacked on
top of the complementary gray bar in that bin; the data represented by the magenta bars
are identical between Figure 11 and Figure 12.) Another way to read the plot is to
consider an allelic discrepancy bin, e.g., τ15. The magenta bar in that bin represents the proportion of individual-to-mixture comparisons resulting in δ or fewer discrepancies, while the gray bar in that same bin represents the proportion of individual-to-mixture comparisons resulting in more than δ discrepancies. As an example, for τ15, 83.1% of all reference-to-mixture comparisons yielded 15 or fewer allelic discrepancies while 16.9% of comparisons yielded 16 or more allelic discrepancies.
Figure 12: Cumulative Normalized Histogram of Reference-to-Mixture Discrepancies (two-color) Each magenta bar corresponds to the number of individual-to-mixture comparisons exhibiting δ or less discrepancies. Each gray bar corresponds to the number of individual-to-mixture comparisons exhibiting more than δ discrepancies.
All of the data comparing single-source profiles to mixture profiles—whether
involving individuals that are excluded or included, or mixtures that originate from
simulation or from the laboratory—were cast in these summary histograms.
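The recasting into a cumulative normalized histogram amounts to a running sum of the bin counts divided by the total number of comparisons ψ. A minimal sketch (illustrative, not the study's actual code):

```python
def cumulative_normalized(rho):
    """Fraction of comparisons with delta-or-fewer discrepancies, per bin.

    rho[delta] is the count of comparisons with exactly delta discrepancies.
    """
    psi = sum(rho)            # total number of comparisons
    running = 0
    fractions = []
    for count in rho:
        running += count
        fractions.append(running / psi)
    return fractions

# Toy histogram with psi = 10 comparisons.
print(cumulative_normalized([1, 2, 3, 4]))  # [0.1, 0.3, 0.6, 1.0]
```

The last bin always equals 1.0, since every comparison has 30 or fewer discrepancies.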
2.4.2 Making Determinations of Exclusion or Inclusion
The next analysis step involves considering how a DNA analyst, when presented
with a reference profile and a mixture profile, would make a determination of inclusion
versus exclusion when comparing the two profiles. In reality, making a determination of inclusion versus exclusion is not always straightforward, and is in fact rarely so when low template quantities or mixture data are being considered. The comparison and resulting decision involve nuanced consideration of many complex aspects of profile analysis and interpretation. A decision need not be binary—i.e., the individual is either included or excluded—and an analyst must offer statistics and/or likelihood metrics to contextualize their determination [10,11].
A simplistic yet not irrelevant means of modeling an analyst’s decision is to enforce a decision threshold on the number of allelic discrepancies tolerated while still concluding that an individual is included in a mixture. Under such a scheme, the strictest
decision threshold would equate to tolerating zero allelic discrepancies, which
corresponds to an analyst insisting that every allele present in the reference profile appear
in the mixture profile in order to conclude that the reference is a potential contributor to
that mixture. Alternatively, a laboratory may choose to establish a complexity threshold
such that, given a particular set of circumstances—e.g., too much allelic drop-out—one
may decide that the inherent likelihood of error is too high to render an accurate
conclusion; accordingly, the interpretation of the mixture would cease (i.e., before Step 3
in Table 3).
Enforcing a decision threshold allows for the deterministic modeling of the
analyst decision process. Under such a paradigm, an analyst operating under a particular
laboratory protocol might, for example, tolerate the absence of up to six reference alleles
(presumably due to allelic drop-out) in the mixture profile and still declare the reference
to be included in that mixture. In other words, six of the reference alleles (i.e., two at
three loci, one at six loci, or other combinations summing to six) are “missing” from the
mixture profile; one possible reason for the “missing alleles” could be presumed drop-out
in the mixture profile. Accordingly, for a given profile, the analyst would accept any
mixture profile that contained 0 to 6 allelic discrepancies with the reference profile as supporting a conclusion of inclusion. In this case, the analyst’s decision threshold is τδ=6 .
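This thresholded decision model reduces to a one-line rule (the function name is hypothetical):

```python
def analyst_call(delta, threshold):
    """Deterministic decision model: declare inclusion if and only if the
    number of allelic discrepancies delta does not exceed the tolerated
    decision threshold tau."""
    return "inclusion" if delta <= threshold else "exclusion"

# With a tolerance of tau = 6, six missing reference alleles are still
# accepted; a seventh tips the call to exclusion.
print(analyst_call(6, 6))  # inclusion
print(analyst_call(7, 6))  # exclusion
```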
Consider the example cumulative normalized histogram in Figure 12, with the
reference-to-mixture comparisons assumed to involve individuals that truthfully did not
contribute to the mixtures; that is, the individuals really ought to be excluded. In the
context of Table 6, this corresponds to “reality: H0 true; H 1 false,” which confines
the analyst decision consequences to the bottom row of the contingency table: True
negative or false positive (Type I Error). The analyst will make the correct decision (i.e.,
true negative) if truthfully excluded individuals (i.e., non-contributors) are excluded;
conversely, if an analyst were to include truthfully excluded individuals, it would be in
error (i.e., false positive). For example, if the decision threshold was set at 6, the magenta bar at τδ=6 represents the fraction of comparisons for which false positive errors were made. The false positive rate at a decision threshold of τδ=6, then, is the quotient of the total false positives and total negatives (i.e., the total number of excluded individuals): $\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$. The gray bar at τδ=6 represents the fraction of comparisons for which correct conclusions are made. The true negative rate is the quotient of total true negatives and total negatives (i.e., the total number of excluded individuals): $1-\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$.
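Assuming every comparison in the histogram involves a truly excluded individual, the two rates described above follow directly from the bin counts (hypothetical helper names):

```python
def false_positive_rate(rho, threshold):
    """Fraction of truly-excluded comparisons wrongly included at tau_threshold:
    the cumulative count of comparisons with <= threshold discrepancies,
    divided by the total number of comparisons psi."""
    psi = sum(rho)
    return sum(rho[:threshold + 1]) / psi

def true_negative_rate(rho, threshold):
    """Complementary fraction of comparisons correctly excluded."""
    return 1.0 - false_positive_rate(rho, threshold)

# Toy histogram: one comparison each at delta = 0..3, 96 at delta = 4.
rho = [1, 1, 1, 1, 96]
print(false_positive_rate(rho, 2))  # 0.03
print(true_negative_rate(rho, 2))   # 0.97
```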
Alternatively, consider a separate example. In this case, consider the example histogram in Figure 12, with the individual-to-mixture comparisons now instead assumed to involve individuals that could have contributed to the mixtures; that is, the individuals are, in reality, included. In the context of Table 6, this corresponds to “reality: H0 false; H1 true,” which confines the analyst decision consequences to the top row of the contingency table: false negative (Type II Error) or true positive. The analyst will make the correct decision (i.e., true positive) if individuals who truthfully ought to have been included (i.e., possible contributors) are included; conversely, if an analyst were to exclude truthfully included individuals, it would be in error (i.e., false negative).
Any time a decision threshold is set at a level that excludes any of the magenta bars, some errors are being made. If the decision threshold was set at τδ=6, the gray bar at τδ=6 represents the fraction of comparisons for which false negative errors were made. The false negative rate represented by a decision threshold at τδ=6, then, is the quotient of the total false negatives and total positives (i.e., the total number of included individuals): $1-\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$. The magenta bar at τδ=6 represents the fraction of comparisons for which correct calls are made. The true positive rate is the quotient of total true positives and total positives (i.e., the total number of included individuals): $\frac{1}{\psi}\sum_{i=1}^{6}\rho_i$; in the literature, another term for true positive rate is recall [59].
In the literature, the quotient computed by dividing the number of true positives by the sum of true positives and false positives is termed precision. The quotient computed by dividing the sum of true positives and true negatives by the sum of total positives and total negatives is termed accuracy [59].
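These definitions can be collected into one small helper (the confusion-matrix counts below are illustrative, not results from this study):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard rates from confusion-matrix counts:
    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives."""
    recall = tp / (tp + fn)                    # true positive rate
    precision = tp / (tp + fp)                 # fraction of inclusions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, accuracy

# Hypothetical tallies: 80 TP, 10 FP, 90 TN, 20 FN.
r, p, a = classification_metrics(80, 10, 90, 20)
print(round(r, 3), round(p, 3), round(a, 3))  # 0.8 0.889 0.85
```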
3 RESULTS AND DISCUSSION
Although preferable, all of a standard’s alleles need not be present in order to
conclude that an individual should or should not be considered a possible contributor to a
mixture. Making determinations in the face of incomplete information, though, requires
consideration of countervailing implications of decision thresholding as well as
concomitant error rates. If the number of tolerated allelic discrepancies is set too low,
then individuals that ought to have been included are not; conversely, if the number of
tolerated discrepancies is too high, then individuals that ought to have been excluded are
not. In other words, a tradeoff exists between properly including contributors and
improperly excluding non-contributors.
For the simulated error analysis study (Section 3.1), simulation methods were employed to probe the relationship between correct and incorrect determinations of inclusion and exclusion as they varied with analyst laxity in match criteria and with probability of allelic drop-out. The tradeoff between false positive and false negative errors was explored as a function of drop-out. The results from this error analysis study using simulated profiles were then compared with laboratory mixture data (Section 3.2).
3.1 Error Analysis Study Using Simulated Data
3.1.1 Comparing Simulated Mixtures to Simulated Excluded Individuals
In the case of comparisons between the simulated mixtures and the simulated excluded standards, all of the individuals under consideration should, in truth, be
excluded by a DNA analyst. If an analyst were to conclude otherwise, it would be in error. Given a decision criterion, an analyst can either fail to reject H0, which corresponds to excluding the individual from the mixture, or reject H0, which corresponds to including the individual in the mixture (see Table 6). Since all of these simulated
individuals really ought to be excluded, an analyst determination of exclusion properly
identifies the individual as a true negative. Conversely, an incorrect analyst
determination of inclusion represents a false positive.
Tallying the discrepancies observed in all of the comparisons between individuals and 2-contributor mixtures exhibiting no drop-out (i.e., Pr(D)|φ=1 = 0.00) yields the three histograms contained in Figure 13.
Figure 13: Summary histograms: Excluded-Individual-to-Mixture Comparisons (No Drop-Out) The top histogram counts how many allelic discrepancies were found in all reference-to- mixture comparisons. The middle histogram cumulatively bins those results. The bottom histogram (Figure 13c) normalizes those cumulative bin results by the total number of comparisons made.
Using the results depicted in the cumulative normalized histogram in Figure 13c and mapping the two possible analyst actions (i.e., determinations) to the corresponding colors from Table 6 yields Figure 14.
Figure 14: Results Comparing Excluded Individuals to Simulated Mixtures (No Drop-Out) This figure presents the same data depicted in Figure 13c.
Discrepancies were approximately normally distributed with a mean of approximately 13 discrepancies and a standard deviation of about 3 discrepancies. For a given analyst allelic discrepancy tolerance δ, the green bar ($1-\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$) at the associated bin τδ corresponds to the total proportion of excluded-individual-to-mixture comparisons resulting in a correct exclusion while the complementary red bar ($\frac{1}{\psi}\sum_{i=1}^{\delta}\rho_i$) at that same bin corresponds to the proportion of comparisons resulting in incorrect inclusions.
As an example, consider the case in which a laboratory is given a population of
reference profiles that truthfully should be excluded from a population of mixture
profiles. In that case, if a laboratory assessed individual-to-mixture “matches” while
allowing for up-to-six allelic discrepancies, their analysts would make correct exclusions
99.1% of the time while incorrectly being unable to exclude (i.e., including) 0.9% of the
time. As the number of allowed discrepancies increases, the probability of incorrectly including a truthfully excluded individual increases and is greater than 5% at a decision threshold that tolerates 8 discrepancies.
Equivalent analyses were performed for varying levels of drop-out, and the results for excluded-individuals-to-mixtures comparisons are shown in Figure 15.
Figure 15: Results Comparing Excluded Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 Green bars correspond to Pr(Correct Exclusion), i.e., true negatives. Red bars correspond to Pr(Incorrect Inclusion), i.e., false positives.
For a given decision threshold τδ, Figure 15 shows that as Pr(D) increases, so does
the chance of correctly excluding a standard that truly should have been excluded. A
higher rate of drop-out results in less allelic information being available in the mixture
profile, which in turn increases the number of discrepancies between reference and mixture profiles. An example of this counterintuitive trend is illustrated in Table 7.
Table 7: Example of Excluded-Individual-to-Mixture Comparison The chance of correctly excluding a truthfully excluded individual increases as the probability of drop-out increases.

Non-Contributor Individual      Locus 1       Locus 2       Locus 3
Alleles of Non-Contributor      7, 11         8, 9          13, 15

Pr(D) = 0.00
  Detected Mixture Alleles      7, 8, 9, 11   5, 8, 9, 14   11, 13, 14
  Discrepant Allele(s)          ∅             ∅             15
  Decision Threshold τ0 ……      Excluded since δ=1 > τ0
  Decision Threshold τ1 ……      Included since δ=1 ≤ τ1
  Decision Threshold τ2 ……      Included since δ=1 ≤ τ2

Pr(D) = 0.20
  Detected Mixture Alleles      7, 8, 9       8, 9, 14      11, 13, 14
  Discrepant Allele(s)          11            ∅             15
  Decision Threshold τ0 ……      Excluded since δ=2 > τ0
  Decision Threshold τ1 ……      Excluded since δ=2 > τ1
  Decision Threshold τ2 ……      Included since δ=2 ≤ τ2

Pr(D) = 0.50
  Detected Mixture Alleles      8             9, 14         11, 14
  Discrepant Allele(s)          7, 11         8             13, 15
  Decision Threshold τ0 ……      Excluded since δ=5 > τ0
  Decision Threshold τ1 ……      Excluded since δ=5 > τ1
  Decision Threshold τ2 ……      Excluded since δ=5 > τ2
Collapsing the individual-to-mixture comparison to three loci in Table 7 for simplicity yields an individual for whom five of six alleles (i.e., Alleles 7 & 11 at Locus 1; Alleles 8 & 9 at Locus 2; and Allele 13 at Locus 3) are included in the mixture for Pr(D) = 0.00, leaving a single discrepancy (Allele 15 at Locus 3). Since the individual is truly a non-contributor, an inclusion under any tolerance of δ ≥ 1 would be adventitious (i.e., a false positive). The detection of fewer mixture alleles with increasing Pr(D) corresponds to an increase in the number of observed allelic discrepancies. In this example, two discrepancies (Allele 11 at Locus 1; Allele 15 at Locus 3) are evident for Pr(D) = 0.20, while five discrepancies (Alleles 7 & 11 at Locus 1; Allele 8 at Locus 2; Alleles 13 & 15 at Locus 3) exist for Pr(D) = 0.50. Since the individual considered in this example is truthfully a non-contributor to the mixture, the incidence of more allelic discrepancies increases the likelihood that the correct determination of exclusion (i.e., true negative) is reached. Had the individual truthfully contributed to the mixture, then that same phenomenon of diminished mixture allele detection would have resulted in an increased likelihood of incorrectly excluding the included individual (i.e., false negative).
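The drop-out effect illustrated in Table 7 can be mimicked by deleting each detected mixture allele independently with probability Pr(D). This is a simplified per-detected-allele sketch with illustrative names, not the study's simulation code:

```python
import random

def apply_dropout(mixture, pr_d, rng):
    """Remove each detected mixture allele independently with probability pr_d.

    mixture: dict mapping locus name -> set of detected mixture alleles
    rng:     a random.Random instance (seeded for reproducibility)
    """
    return {locus: {a for a in alleles if rng.random() >= pr_d}
            for locus, alleles in mixture.items()}

# Toy mixture loosely modeled on Table 7's loci.
mix = {"Locus 1": {7, 8, 9, 11}, "Locus 2": {5, 8, 9, 14}, "Locus 3": {11, 13, 14}}
print(apply_dropout(mix, 0.0, random.Random(0)) == mix)  # True: nothing drops
print(all(not v for v in apply_dropout(mix, 1.0, random.Random(0)).values()))  # True
```

As Pr(D) grows, fewer mixture alleles survive, so any reference profile accrues more discrepancies, which helps truly excluded individuals but hurts truly included ones.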
3.1.2 Comparing Simulated Mixtures to Simulated Included Individuals
In the case of comparisons between the simulated mixtures and the simulated included genotypes, all of the standards under consideration should, in truth, be included.
A contrary conclusion of exclusion would be in error. Considering Table 6, individual-to-mixture comparisons that involve the simulated population of included individuals correspond to the case when “reality: H0 false; H1 true,” which corresponds to the top row in the table. Given a decision criterion, an analyst can either fail to reject H0, which corresponds to excluding the individual from the mixture, or reject H0, which corresponds to including the individual in the mixture. Since all of these simulated individuals really ought to be included, a determination of inclusion properly identifies the individual as a true positive. Conversely, an incorrect determination of exclusion represents a false negative.
Tabulating all of the comparisons between individuals and 2-contributor mixtures exhibiting no drop-out (i.e., Pr(D)|φ=1 = 0.00) and mapping the two possible analyst conclusions to the corresponding colors from Table 6 yields Figure 16.
Figure 16: Results Comparing Included Individuals to Simulated Mixtures (No Drop-Out)
For reference-to-mixture comparisons in which the individuals are deliberately simulated to have viable mixture-contributor profiles for a collection of respective mixtures whose profiles are fully observed (i.e., Pr(D)|φ=1 = 0.00), Figure 16 bears out the observation that those individuals will always be assessed to be included in the mixtures. This serves as a quality check that the simulation scripts generating individuals that ought to be included are, in fact, operating as intended. The only way a truthfully included individual could be judged to have discrepancies with a mixture to which he contributed is if part of the mixture profile is unobservable; in other words, discrepancies in included-individual-to-mixture comparisons will only occur when the mixtures experience drop-out (i.e., Pr(D)|φ=1 ≠ 0.00).
Results for included-individuals-to-mixtures comparisons for different levels of drop-out are shown in Figure 17.
Figure 17: Results Comparing Included Individuals to Simulated Mixtures, Pr(D)|φ=1 ≠ 0.00 Yellow bars correspond to Pr(Incorrect Exclusion), i.e., false negatives. Blue bars correspond to Pr(Correct Inclusion), i.e., true positives.
For a given level of allelic discrepancy tolerance, as Pr(D) increases, mixture profiles are more likely to be missing alleles. Mixtures missing alleles, due to drop-out, are less likely to have commonality with a given reference profile; in other words, more allelic discrepancies will be observed. This is because the mixture profile has a diminished “allelic collection” from which to possibly match an individual’s alleles.
Thus, as Pr(D)|φ=1 increases, the incidence of false negatives (Type II errors) increases.
3.2 Validation Study Using Laboratory Mixture Data
An analogous investigation was conducted into the error rates associated with analyst calls of exclusion and inclusion as a function of analyst decision criteria and allelic drop-out using laboratory mixture data. Though a priori probabilities of allelic drop-out are not known for the laboratory mixtures, the realized level of drop-out is observed to increase as the pre-amplification DNA template quantity diminishes.
3.2.1 Comparing Laboratory Mixtures to Simulated Excluded Individuals
In the case of comparisons between the two laboratory mixtures and the simulated excluded standards, all of the individuals under consideration should, in truth, be excluded. Since all of these simulated standards really ought to be excluded, an analyst determination of exclusion properly identifies the individual as a true negative.
Conversely, an incorrect determination of inclusion represents a false positive.
In reality, the a priori probability of drop-out is not known for forensic evidence samples. Instead, the likelihood of drop-out is understood to be related to the quantity of the sample’s DNA template before PCR is performed. The availability of more DNA starting material leads to increased electropherogram peak heights, which in turn leads to a greater likelihood of observing more of a profile and less allelic drop-out [45,60].
The range of starting DNA template constituting different amplification runs in this study was 2 ng, 1 ng, 0.5 ng, 0.25 ng, 0.125 ng, and 0.0625 ng. Since all mixtures considered were combined in a 1:1 ratio between contributors, those mixture DNA template
quantities in turn correspond to individual contributor DNA template quantities of 1 ng,
0.5 ng, 0.25 ng, 0.125 ng, 0.0625 ng, and 0.03125 ng, respectively.
Tallying all of the comparisons between excluded individuals and the dilution series of 2-contributor mixtures and mapping the two possible analyst actions to different colors yields the summary plots in Figure 18.
Figure 18: Results Comparing Excluded Individuals to 1:1 Laboratory Mixtures Green bars correspond to Pr(Correct Exclusion) , i.e., true negatives . Red bars correspond to Pr(Incorrect Inclusion) , i.e., false positives .
For laboratory mixture amplifications with starting template masses of 2 ng, 1 ng,
0.5 ng, and 0.25 ng, no allelic drop-out is observed. Accordingly, their distributions of true negatives and false positives are statistically indistinguishable. For the 0.125 ng and 0.0625 ng cases, some drop-out of mixture alleles occurred. This leads to an increased probability of correctly excluding a truly excluded individual with a
concomitant decrease in the probability of incorrect inclusion. This result is consistent with the trend demonstrated in Figure 15 and Table 7.
3.2.2 Comparing Laboratory Mixtures to Simulated Included Individuals
In the case of comparisons between the laboratory mixtures and the simulated
included individuals, all of the individuals under consideration should, in truth, be
included by a DNA analyst. Since all of these simulated individuals really ought to be
included, an analyst determination of inclusion properly identifies the individual as a true
positive. Conversely, an incorrect determination of exclusion represents a false negative.
Tabulating all of the comparisons between included individuals and 2-contributor
mixtures and mapping the two possible analyst actions to different colors yields the
summary plots in Figure 19.
Figure 19: Results Comparing Included Individuals to Laboratory Mixtures Yellow bars correspond to Pr(Incorrect Exclusion), i.e., false negatives. Blue bars correspond to Pr(Correct Inclusion), i.e., true positives.
For laboratory mixture amplifications with starting template masses of 2 ng, 1 ng,
0.5 ng, or 0.25 ng, no allelic drop-out is observed. Since all mixture alleles are observed, there is no chance of incorrectly excluding a truthfully included individual. This condition of Pr(D) = 0 matches the simulated result depicted in Figure 16. For the 0.125 ng and 0.0625 ng cases, some drop-out of mixture alleles occurred. This led to an increased probability of false exclusion with a concomitant decrease in the probability of correct inclusion. This result is generally consistent with the trend demonstrated in Figure 17. Some deviation of this trend from a strictly increasing error with increasing drop-out is likely attributable to variability arising from the small sample size; for these laboratory-amplified mixtures, only two 1:1 mixtures (at each level of drop-out) were available for analysis.
3.3 Impact of Analyst Decision Threshold on Expected Errors
In establishing an acceptable decision criterion, there is a tradeoff between minimizing false positives and minimizing false negatives. In the cases in which an individual should be excluded, the tolerance of discrepant alleles should be minimized in order to avoid false inclusions; this, in turn, maximizes the number of “true negative” determinations. In the cases in which an individual really should be included, tolerating an increased number of discrepancies may be necessary in order to avoid false exclusions; this, in turn, maximizes the number of “true positive” determinations. A
germane analysis paradigm employed in other disciplines to perform error tradeoff analysis is called ROC analysis.
3.3.1 Receiver Operating Characteristic (ROC) Analysis
Typical ROC analysis plots true positive rate versus false positive rate for a range
of possible decision thresholds. In the context of reference-to-mixture comparisons, this
amounts to simultaneously imposing a decision threshold (of allelic discrepancies) on
each of the two-color analyst histograms—the ones for excluded individuals (e.g.,
Section 3.1.1 for simulated mixtures; Section 3.2.1 for laboratory mixtures) and the ones
for included individuals (Section 3.1.2 for simulated mixtures; Section 3.2.2 for
laboratory mixtures)—for a given probability of drop-out. An example visualization
scheme for combining excluded-individual and included-individual data for
Pr(D)|φ=1 = 0.80 using the simulated mixture data is shown in Figure 20. The excluded-individual-to-mixture comparison data occupies the bottom “row” of the figure, while the included-individual-to-mixture comparison data is stacked on top of it. Each “row” of
data has a separate y-axis ranging from 0 to 1, which represents the cumulative
normalized probability for analyst calls of exclusion and inclusion.
Figure 20: Example Simultaneous Visualization of Exclusion & Inclusion Comparisons Data on the bottom portion of the plot (contained in green and red bars) represent comparisons involving truthfully excluded individuals, while truthfully included individuals are represented on the top portion of the plot (by yellow and blue bars). Data shown are for simulated data with Pr(D)=0.80.
As an example, the τδ=21 bin is boxed to represent an analyst decision threshold to tolerate δ=21 discrepancies while still declaring a reference to be included in a mixture.
For τδ=21, the excluded-individual-to-mixture comparisons yield a false positive error (i.e.,
incorrect inclusion; the red bar at the τδ=21 bin) approximately 18.1% of the time, while
the included-individual-to-mixture comparisons yield a true positive rate (i.e., correct
inclusion; the blue bar at the τδ=21 bin) of approximately 82.3%.
Varying the decision threshold across the full range of possible allelic discrepancy
values and recording the associated false positive and true positive rates populates a table
of the form shown in Table 8. Once the table is completely populated, plotting the false positive and true positive rates as the abscissa and ordinate quantities, respectively, results in the ROC plot depicted in Figure 21.
Table 8: Notional Table of False Positive & True Positive Rates for Different Decision Thresholds

Decision Threshold    τδ=0  τδ=1  τδ=2  τδ=3  τδ=4  …  τδ=21  …  τδ=30
False Positive Rate                                      0.181
True Positive Rate                                       0.823
Figure 21: Example ROC Plot Each possible decision threshold results in an ordered pair (false positive, true positive ) that corresponds to performance expectations for the associated, allelic-discrepancy decision criterion specified by τδ. Optimal performance, which corresponds to a false positive rate of 0 and a true positive rate of 1, occurs at the point (0,1 ) in the upper-left of the chart. The dotted line corresponding to y = x represents random performance, equivalent to a scheme that guesses “inclusion” and “exclusion” a certain percentage of the time without regard to any profile information. The complementary scales to false positive rate and true positive rate—true negative rate and false negative rate, respectively—are also shown. Data shown are for simulated data with Pr(D)=0.80.
Each point in the ROC space is associated with a particular decision threshold τδ.
Perfect performance is represented by the point (0,1), which corresponds to a false positive rate of 0 and a true positive rate of 1. Though perfect performance is not
expected in the presence of drop-out, performance (i.e., accuracy) increases as one approaches the upper-left corner of the ROC space. Any decision criterion that results in an ROC performance curve lying along the 45° diagonal (i.e., true positive rate = false
positive rate) represents the case in which the decision criterion is unable to provide any
marginal discrimination over a random guess. In other words, if an analyst guesses
inclusion 80% of the time, the analyst will correctly diagnose 80% of the truthful
inclusions as inclusions but also incorrectly diagnose 80% of truthful exclusions as
inclusions; if an analyst guesses inclusion 50% of the time, the analyst will correctly
diagnose 50% of the truthful inclusions as inclusions but also incorrectly diagnose 50%
of truthful exclusions as inclusions.
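The threshold sweep that pairs each τδ with a (false positive rate, true positive rate) point, as in Table 8 and Figure 21, can be sketched as follows (a hypothetical helper operating on the two discrepancy histograms):

```python
def roc_points(rho_excluded, rho_included):
    """Sweep every decision threshold tau_delta and record the resulting
    (false positive rate, true positive rate) ordered pairs.

    rho_excluded: per-delta counts from comparisons against truly excluded individuals
    rho_included: per-delta counts from comparisons against truly included individuals
    """
    psi_ex, psi_in = sum(rho_excluded), sum(rho_included)
    fp = tp = 0
    points = []
    for delta in range(len(rho_excluded)):
        fp += rho_excluded[delta]   # comparisons now (wrongly) included
        tp += rho_included[delta]   # comparisons now (correctly) included
        points.append((fp / psi_ex, tp / psi_in))
    return points

# Toy histograms over delta = 0..3 (10 comparisons each).
pts = roc_points([0, 1, 4, 5], [6, 2, 1, 1])
print(pts[0])   # (0.0, 0.6)
print(pts[-1])  # (1.0, 1.0) -- the most permissive threshold includes everyone
```

The most tolerant threshold always lands at (1, 1), since every comparison is then called an inclusion; the strictest thresholds sit near the lower left.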
3.3.1.1 Rollup ROC Results for Simulated Mixture Data
The data from Section 3.1 is reproduced in Figure 22 using the visualization
scheme introduced in Section 3.3.1 to facilitate ROC curve generation.
Figure 22: Compilation of Exclusion & Inclusion Data Using Simulated Mixtures As in Section 3.3.1, the bars represent cumulative, normalized fractions, with the red and green bars for a given bin summing to 1 and—separately—the blue and yellow bars for a given bin summing to 1. Green represents Pr(Correct Exclusion); red, Pr(Incorrect Inclusion); yellow, Pr(Incorrect Exclusion); blue, Pr(Correct Inclusion).
Compiling the ROC results for all levels of drop-out analyzed with the simulated mixture data produces Figure 23.
Figure 23: Error Analysis Results: ROC Rollup Using Simulated Mixtures Decision threshold labels have been omitted for legibility.
Figure 23 shows the correct inclusion rate (calculated from the inclusion data)
versus the incorrect inclusion rate (calculated from the exclusion data) at various levels of
drop-out for the full range of possible discrepancy decision criteria. Figure 23 shows that
as the tolerated number of allelic discrepancies (i.e., the decision threshold) increases,
both the correct inclusion and incorrect inclusion fractions increase (while the incorrect
exclusion and correct exclusion fractions experience complementary decreases), albeit at
different rates. Further, as the level of drop-out increases for a given decision threshold,
both the correct inclusion and incorrect inclusion fractions decrease.
ROC analysis may therefore be used to determine a complexity threshold. That is, a laboratory may choose to specify particular levels of error that are to be tolerated, thus
bounding the false positive and false negative error rates and establishing a “complexity threshold.” This amounts to “zooming in” on a region in the ROC space that accords with the tolerable error levels. An example is shown in Figure 24.
Figure 24: Notional Complexity Threshold: Simulated Mixture Results This plot “zooms in” on the upper left of Figure 23 in a manner consistent with error bounds specified by an imaginary laboratory’s Standard Operating Procedure. (The same color scheme is employed as that specified in Figure 23’s legend, where the Pr(D) increases from blue to red or from top-left to bottom-right in increments of 0.1. Also, the first couple of decision thresholds (i.e., τ1 – τ4) for the darkest blue line, corresponding to no drop-out, are so close that they cannot be distinguished from one another.) In this case, the error bounds provide for no more than a 10% chance of incorrect inclusion of a reference sample while insisting on at least 30% correct inclusion determinations. Any points (determined by error rates and drop-out levels) that lie outside of this space fail the complexity threshold specified by the SOP and should not be interpreted.
Complexity thresholds, which are defined in ROC space by pre-specified, laboratory error bounds, may be utilized in two ways: 1) to determine when the Pr(D) is too high to allow for reliable evidence profile interpretation; 2) to determine an a priori
ceiling on the number of allelic discrepancies tolerated (while still allowing for a determination of reference exclusion as a contributor to the evidence sample) before false inclusion rates become intolerable.
Even if a determination of drop-out level can be made, certain error limitations may sufficiently bound the problem of establishing a complexity threshold. For instance, if a laboratory demanded that false inclusion occur less than 1% of the time and that true inclusions occur at least 85% of the time (Figure 25), then samples where the probability of drop-out is greater than 0.2 should not be analyzed since none of the ROC curve is represented in the bounded space; that is, the complexity threshold, which is based on pre-determined, tolerable error rates, is not met.
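The bounding procedure just described amounts to a membership test in ROC space. The sketch below is hypothetical: the operating points are invented for illustration, and only the default bounds mirror the 1% false-inclusion / 85% true-inclusion example from the text.

```python
# Sketch: screening drop-out levels against laboratory error bounds.
# Operating points (fpr, tpr) are hypothetical, not the thesis's data.

def passes_complexity_threshold(points, max_fpr=0.01, min_tpr=0.85):
    """A drop-out level meets the complexity threshold only if at least one
    decision threshold yields an operating point inside the acceptable
    region (fpr <= max_fpr and tpr >= min_tpr)."""
    return any(fpr <= max_fpr and tpr >= min_tpr for fpr, tpr in points)

# Hypothetical ROC operating points for two drop-out levels:
low_dropout = [(0.000, 0.60), (0.005, 0.90), (0.020, 0.97)]
high_dropout = [(0.000, 0.30), (0.008, 0.70), (0.050, 0.88)]

print(passes_complexity_threshold(low_dropout))   # passes: (0.005, 0.90)
print(passes_complexity_threshold(high_dropout))  # fails: no point qualifies
```

A sample whose ROC curve never enters the bounded region, as with the high-drop-out points here, would be deemed uninterpretable under the laboratory's error specification.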
Figure 25: More Realistic Complexity Threshold: Simulated Mixture Results This plot “zooms in” on the upper left of Figure 24 in a manner consistent with error bounds specified by an imaginary laboratory. (The same color scheme is employed as that specified in Figure 23’s legend, which has been left off here for space considerations. Also, the first couple of decision thresholds (i.e., τ1 – τ2) for the darkest blue line, corresponding to no drop-out, are so close that they cannot be distinguished from one another.) A laboratory’s Standard Operating Procedure specifies tolerable error bounds that will define the complexity threshold. In this case, false inclusions are limited to occurring less than 1% of the time while false negatives are allowed 15% of the time. The highest number of allelic discrepancies (representing possible decision thresholds) for any level of drop-out that falls within this space is 8 (for Pr(D) = 0.30).
If these are the bounds selected by the laboratory, only those samples with a corresponding Pr(D) = 0.00 – 0.20 should be evaluated. It should be noted that the present study simulated two-person mixtures in a contributor ratio of 1:1. Supplementary studies that simulate mixtures with varying ratios and numbers of contributors are necessary to further explore the parameter space. If a higher probability of drop-out is
expected due to low template quantities, the sample should not be analyzed; it would be classified as a Type C [7] or uninterpretable mixture [11].
Previous work suggests that Pr(D) increases with decreasing DNA input levels and
can be characterized via peak heights [9]. Gill et al. [61] showed that Pr(D) ≈ 0.20 when allele peak heights are between approximately 50 and 100 RFU for the AmpFℓSTR® SGM
Plus® PCR Amplification Kit. Therefore, for the aforementioned bounds, if it is suspected that less than ~0.1 ng was amplified for a given contributor—as evidenced by peak heights less than ~100 RFU at an analytical threshold of 50 RFU—then the laboratory would deem the sample indeterminable and would not use it for comparison purposes. The decision would be made before comparison to a reference sample.
Next, the ROC plot can be used to determine the number of “allowed” discrepancies based on a given Pr(D). For example, considering Figure 25, if a laboratory has settled upon acceptable rates of incorrect inclusion and correct inclusion at
≤1% and ≥85%, respectively, with Pr(D) ≈0.2, then no more than eight allelic
discrepancies should be allowed by the laboratory’s analysts. That is, if an analyst were
to include reference samples as potential contributors to evidence stains despite observing
9 or more allelic discrepancies, error rates greater than the laboratory’s specified
tolerances would be encountered.
Alternatively, given Blackstone’s Ratio [31], a laboratory might solely prioritize the minimization of spurious inculpations at the expense of identifying every true positive, in which case only false inclusions are considered in the constitution of a laboratory’s acceptable error bounds. An example is shown in Figure 26.
Figure 26: Complexity Threshold Based on Blackstone’s Ratio: Simulated Mixture Results This complexity threshold focuses solely on diminishing the incidence of false positives without specifying a bound on false negatives. Only selected thresholds are labeled.
In this example, any rate of incorrect exclusion is allowed and only the rate of incorrect inclusion is bounded. Here, all samples, regardless of drop-out rate, may be interpreted, but given a particular Pr(D), a specified number of allowable allelic discrepancies may be defined that increases with increasing Pr(D). This increase in the number of allowable allelic discrepancies with increasing Pr(D) is again the result of obtaining less allelic information with increasing drop-out, which leads to an increased chance of correctly excluding a reference who ought to have been excluded. (Refer to
Figure 15 and Table 7 for detailed discussions of this point.)
3.3.1.2 Rollup ROC Results for Laboratory Mixture Data
The data from Section 3.2 is reproduced in Figure 27 using the visualization
scheme introduced in Section 3.3.1 to facilitate ROC curve generation.
Figure 27: Compilation of Exclusion & Inclusion Data Using Laboratory Mixtures Green represents Pr(Correct Exclusion) ; red, Pr(Incorrect Inclusion) ; yellow, Pr(Correct Inclusion) ; blue, Pr(Incorrect Exclusion)
Compiling the ROC results for all quantities of starting DNA template with the
laboratory mixture data produces Figure 28. When the mass of starting DNA template decreases below 0.25 ng, allelic drop-out occurs and a greater incidence of errors is observed, as predicted by the simulation studies.
Figure 28: Complexity Threshold: Laboratory Mixture Results False negative errors for template quantities greater than 0.125 ng of DNA are nonexistent, and the data points for the respective curves lie on top of one another. The dotted line represents the line y = x of random performance.
If the error specification is such that only a 1% rate of incorrect inclusion is prescribed, the laboratory-amplified data show “operating points” that pass the complexity threshold for all levels of starting template mass analyzed here. A pre-established maximum of seven allelic discrepancies would be given as this laboratory’s tolerance for two-contributor mixtures mixed in a 1:1 proportion; making determinations of inclusion for mixtures containing a higher number of discrepant alleles would not allow for long-term compliance with the indicated 1% ceiling on the incorrect inclusion rate.
For this laboratory, then, allowing eight allelic discrepancies would be too lax a criterion given the error requirements. If there were eight or more discrepancies
between the evidence and reference samples (regardless of locus), then exclusion may be appropriate. A more stringent criterion, such as a 0.1% ceiling on the incorrect inclusion rate, would decrease the number of allowed allelic discrepancies to five.
Although beyond the scope of this work, comparisons between error analyses and the resultant RMNE or LR statistics would be required to assess the relationship between these inclusion statistics and ROC-determined error rates. Completing this analysis would allow for a meaningful conclusion as to the “strength” of an evidentiary sample and for lay contextualization of the relative strength of the RMNE or LR statistic. Additionally, error tradeoffs for complex mixtures with more than two contributors and unequal mixing ratios need to be studied to assess the general viability of this analytic method.
4 CONCLUSION
In the analysis context of allelic drop-out—and when dealing with samples
consisting of low template DNA quantities or of mixtures—cognizance of the error rates
associated with DNA profile interpretation is paramount. Awareness of the incidence of
error is crucial to responsible determinations of a reference’s exclusion or inclusion in an
evidence sample. Depending on a laboratory’s Standard Operating Procedure decision
criterion with regard to allelic discrepancies, this potential for error exists independently
of the additional, ever-present pitfalls that can accompany sample collection, storage,
extraction, amplification, and profiling. The various incarnations of results in Section 3
represent a characterization of the countervailing errors that lurk in the fundamental nature of comparing allelic lists between samples.
The error characterizations demonstrated are useful in a number of ways. As an
analysis methodology, the ROC paradigm represents a powerful tool that allows for the
characterization of errors across the discipline of forensics, including DNA profile
comparisons. The analysis can be extended, as more sophisticated mathematical
treatment will yield further insight. For example, the slope of the ROC curve at a given
decision threshold can be used to calculate a likelihood ratio of conditional error
expectations that is of essential relevance in contemporary DNA analysis and courtroom
testimony.
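The slope-to-likelihood-ratio relationship mentioned above can be approximated numerically from adjacent empirical operating points, since the local slope dTPR/dFPR estimates Pr(result | true inclusion) / Pr(result | true exclusion) at that threshold. The operating points in this sketch are hypothetical, chosen only to make the arithmetic visible.

```python
# Sketch: local slope of an empirical ROC curve as a likelihood ratio.
# points[tau] = (fpr, tpr); the values here are hypothetical.

def local_likelihood_ratio(points, tau):
    """Slope between consecutive operating points, approximating
    LR = dTPR/dFPR at decision threshold tau."""
    (fpr0, tpr0), (fpr1, tpr1) = points[tau], points[tau + 1]
    d_fpr = fpr1 - fpr0
    if d_fpr == 0:
        return float("inf")  # TPR gained at no FPR cost
    return (tpr1 - tpr0) / d_fpr

points = [(0.000, 0.50), (0.002, 0.80), (0.010, 0.92), (0.060, 0.97)]
print(local_likelihood_ratio(points, 1))  # (0.92-0.80)/(0.010-0.002) ≈ 15
```

A steep segment (LR well above 1) corresponds to thresholds where an inclusion call is far more probable for a true contributor than for a non-contributor; as the curve flattens toward the diagonal, the local LR approaches 1 and the call carries little evidential weight.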
The simulated results can be used by a laboratory to inform the establishment of a preferred operating point by selectively optimizing between false positives and false negatives to accord with prudence. Alternatively, after empirically determining a level of drop-out associated with a particular laboratory or with evidence samples of varying starting DNA template, an informed decision can be made regarding tolerance of allelic discrepancies.
The specification of error bounds can also designate an operating region, outside
of which the interpretation of an evidence profile cannot be made with the required
accuracy. Whether a given evidence profile is a candidate for interpretation is a function
of its associated level of drop-out. Evidence profiles shown to lie outside of the
acceptable error bounds due to their level of allelic drop-out are said to fail to meet a
complexity threshold for determinations of inclusion or exclusion. No statistics should
be calculated for such samples, and the only responsible determination with respect to reference inclusion/exclusion is “inconclusive.” For evidence profiles possessing levels of drop-out that are deemed interpretable, this same complexity threshold can be employed to establish a laboratory’s decision criteria with respect to tolerating allelic discrepancies; the resulting prescription for determining that a reference is included as a contributor to an evidentiary stain conforms with premeditated, laboratory-selected error rates.
5 REFERENCES
List of Abbreviated Publication Titles (In Alpha Order)
Cold Spring Harb. Symp. Quant. Biol. … Cold Spring Harbor Symposium in Quantitative Biology
Croat. Med. J. … Croatian Medical Journal
Forensic Sci. Int. … Forensic Science International
Forensic Sci. Int. Genet. … Forensic Science International: Genetics
HP Labs. Tech. Report … Hewlett-Packard Laboratories Technical Report
Hum. Reprod. … Human Reproduction
IEEE Trans. Med. Imaging … Institute of Electrical and Electronics Engineers Transactions on Medical Imaging
IMAC-XXV … International Modal Analysis Conference
Int. J. Comput. Vis. … International Journal of Computer Vision
Int. J. Legal Med. … International Journal of Legal Medicine
Invest. Radiol. … Investigative Radiology
J. Forensic Sci. … Journal of Forensic Sciences
J. Forensic Sci. Soc. … Journal of Forensic Science Society
J. R. Stat. Soc. Ser. C Appl. Stat. … Journal of the Royal Statistical Society: Series C (Applied Statistics)
Jurimetrics J. … Jurimetrics Journal
Law Probab. Risk. … Law, Probability and Risk
Nat. Genet. … Nature Genetics
NCJ … National Criminal Justice
Stat. Methods Med. Res. … Statistical Methods in Medical Research
Theor. Popul. Biol. … Theoretical Population Biology
Univ. Colorado Law Rev. … University of Colorado Law Review
[1] A.J. Jeffreys, V. Wilson, S.L. Thein, Hypervariable 'minisatellite' regions in human DNA, Nature. 314 (1985) 67-73.
[2] P. Gill, A.J. Jeffreys, D.J. Werrett, Forensic application of DNA 'fingerprints', Nature. 318 (1985) 577-579.
[3] K. Mullis, F. Faloona, S. Scharf, R. Saiki, G. Horn, H. Erlich, Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction, Cold Spring Harb. Symp. Quant. Biol. 51 (1986) 263-273.
[4] S.A. Greenspoon, K.L.V. Sykes, J.D. Ban, A. Pollard, M. Baisden, M. Farr, et al., Automated PCR setup for forensic casework samples using the Normalization Wizard and PCR Setup robotic methods, Forensic Sci. Int. 164 (2006) 240-248.
[5] P.M. Vallone, C.R. Hill, J.M. Butler, Demonstration of rapid multiplex PCR amplification involving 16 genetic loci, Forensic Sci. Int. Genet. 3 (2008) 42-45.
[6] National Institute of Justice, Postconviction DNA Testing: Recommendations for Handling Requests, NCJ 177626 (1999).
[7] P.M. Schneider, R. Fimmers, W. Keil, G. Molsberger, D. Patzelt, W. Pflug, et al., The German Stain Commission: recommendations for the interpretation of mixed stains, Int. J. Legal Med. 123 (2009) 1-5.
[8] T. Tvedebrink, P.S. Eriksen, H.S. Mogensen, N. Morling, Estimating the probability of allelic drop-out of STR alleles in forensic genetics, Forensic Sci. Int. Genet. 3 (2009) 222-226.
[9] T. Tvedebrink, P.S. Eriksen, H.S. Mogensen, N. Morling, Evaluating the weight of evidence by using quantitative short tandem repeat data in DNA mixtures, J. R. Stat. Soc. Ser. C Appl. Stat. 59 (2010) 855-874.
[10] P. Gill, C.H. Brenner, J.S. Buckleton, A. Carracedo, M. Krawczak, W.R. Mayr, et al., DNA commission of the International Society of Forensic Genetics: recommendations on the interpretation of mixtures, Forensic Sci. Int. 160 (2006) 90-101.
[11] Scientific Working Group on DNA Analysis Methods (SWGDAM), SWGDAM interpretation guidelines for autosomal STR typing by forensic DNA testing laboratories, http://www.fbi.gov/about-us/lab/codis/swgdam-interpretation-guidelines. (2010).
[12] J.M. Curran, J.S. Buckleton, Inclusion probabilities and dropout, J. Forensic Sci. 55 (2010) 1171-1173.
[13] D.J. Balding, J.S. Buckleton, Interpreting low template DNA profiles, Forensic Sci. Int. Genet. 4 (2009) 1-10.
[14] B. Devlin, Forensic inference from genetic markers, Stat. Methods Med. Res. 2 (1992) 241-262.
[15] J.S. Buckleton, J.M. Curran, A discussion of the merits of random man not excluded and likelihood ratios, Forensic Sci. Int. Genet. 2 (2008) 343-348.
[16] National Research Council (NRC-II), The Evaluation of Forensic DNA Evidence, National Academy Press. Washington, D.C. (1996).
[17] D. Jarjoura, J. Jamison, S. Androulakakis, Likelihood ratios for deoxyribonucleic acid (DNA) typing in criminal cases, J. Forensic Sci. 39 (1994) 64-73.
[18] B.S. Weir, DNA statistics in the Simpson matter, Nat. Genet. 11 (1995) 365-368.
[19] J.J. Koehler, On conveying the probative value of DNA evidence: frequencies, likelihood ratios and error rates, Univ. Colorado Law Rev. 67 (1996) 859-886.
[20] B.S. Weir, C.M. Triggs, L. Starling, L.L. Stowell, K.A.J. Walsh, J.S. Buckleton, Interpreting DNA mixtures, J. Forensic Sci. 42 (1997) 213-222.
[21] I.W. Evett, B.S. Weir, Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists, Sinauer Associates, Sunderland, MA, 1998.
[22] C.J. Word, Mixture interpretation: why is it sometimes so hard?, Profiles in DNA, Promega Corporation, http://www.promega.com/resources/articles/profiles-in-dna/2011/mixture-interpretation-why-is-it-sometimes-so-hard/ (2011).
[23] I.W. Evett, C. Buffery, G. Willott, D. Stoney, A guide to interpreting single locus profiles of DNA mixtures in forensic cases, J. Forensic Sci. Soc. 31 (1991) 41-47.
[24] C.H. Brenner, R. Fimmers, M.P. Baur, Likelihood ratios for mixed stains when the number of donors cannot be agreed, Int. J. Legal Med. 109 (1996) 218-219.
[25] J. Mortera, A.P. Dawid, S.L. Lauritzen, Probabilistic expert systems for DNA mixture profiling, Theor. Popul. Biol. 63 (2003) 191-205.
[26] J.S. Buckleton, I.W. Evett, B.S. Weir, Setting bounds for the likelihood ratio when multiple hypotheses are postulated, Science & Justice. 38 (1998) 23-26.
[27] A. Stockmarr, Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search, Biometrics. 55 (1999) 671-677.
[28] A.P. Dawid, Which likelihood ratio, Law Probab. Risk. 3 (2004) 65-71.
[29] R. Meester, M. Sjerps, Why the effect of prior odds should accompany the likelihood ratio when reporting DNA evidence, Law Probab. Risk. 3 (2004) 51-62.
[30] P. Gill, J. Whitaker, C. Flaxman, N. Brown, J.S. Buckleton, An investigation of the rigor of interpretation rules for STRs derived from less than 100 pg of DNA, Forensic Sci. Int. 112 (2000) 17-40.
[31] W. Blackstone, Of Trial, and Conviction, Commentaries on the Laws of England, Book The Fourth: Of Public Wrongs, Clarendon Press, Oxford, 1765-1769, pp. 358.
[32] R.A. Van Oorschot, M. Jones, DNA fingerprints from fingerprints, Nature. 387 (1997) 767.
[33] M. Pizzamiglio, F. Donato, T. Floris, C. Bellino, P. Cappiello, G. Lago, et al., DNA typing on latex gloves, in: Sensabaugh G.F., Lincoln P.J., Olaisen B. (Eds.), Progress in Forensic Genetics, Elsevier, Amsterdam, 2000, pp. 504-507.
[34] P. Wiegand, T. Bajanowski, B. Brinkmann, DNA typing of debris from fingernails, Int. J. Legal Med. 106 (1993) 81-83.
[35] P. Wiegand, M. Kleiber, DNA typing of epithelial cells after strangulation, Int. J. Legal Med. 110 (1997) 181-183.
[36] R. Uchihi, K. Tamaki, T. Kojima, T. Yamamoto, Y. Katsumata, Deoxyribonucleic acid (DNA) typing of human leukocyte antigen (HLA)-DQA1 from single hairs in Japanese, J. Forensic Sci. 37 (1992) 853-859.
[37] J. Bright, S.F. Petricevic, Recovery of trace DNA and its application to DNA profiling of shoe insoles, Forensic Sci. Int. 145 (2004) 7-12.
[38] Y. Watanabe, T. Takayama, K. Hirata, S. Yamada, A. Nagai, I. Nakamura, et al., DNA typing from cigarette butts, Leg. Med. 5, Supplement (2003) S177-S179.
[39] P. Gill, B. Sparkes, J.S. Buckleton, Interpretation of simple mixtures when artefacts such as stutters are present - with special reference to multiplex STRs used by the Forensic Science Service, Forensic Sci. Int. 95 (1998) 213-224.
[40] J.M. Curran, C.M. Triggs, J.S. Buckleton, B.S. Weir, Interpreting DNA mixtures in structured populations, J. Forensic Sci. 44 (1999) 987-995.
[41] M.W. Perlin, B. Szabady, Linear mixture analysis: a mathematical approach to resolving mixed DNA samples, J. Forensic Sci. 46 (2001) 1372-1378.
[42] Y. Torres, I. Flores, V. Prieto, M. López-Soto, M.J. Farfán, A. Carracedo, et al., DNA mixtures in forensic casework: a 4-year retrospective study, Forensic Sci. Int. 134 (2003) 180-186.
[43] Applied Biosystems, AmpFℓSTR® Identifiler® Plus PCR Amplification Kit, User's Manual. Part Number 4323291 Rev. F (2011).
[44] Promega Corp., PowerPlex® 16 HS System, User's Manual. Part# TMD022 (2011) 1-73.
[45] P. Gill, Amplification of low copy number DNA profiling, Croat. Med. J. 42 (2001) 229-232.
[46] P. Gill, Role of short tandem repeat DNA in forensic casework in the UK--past, present, and future perspectives, BioTechniques. 32 (2002) 366-385.
[47] B. Budowle, A.J. Eisenberg, A. van Daal, Validity of low copy number typing and applications to forensic science, Croat. Med. J. 50 (2009) 207-217.
[48] M.W. Perlin, A. Sinelnikov, An information gap in DNA evidence interpretation, PLoS ONE. 4 (2009) e8327 1-12.
[49] J.M. Curran, A MCMC method for resolving two person mixtures, Science & Justice. 48 (2008) 168-177.
[50] T. Wang, N. Xue, J.D. Birdwell, Least-square deconvolution: a framework for interpreting short tandem repeat mixtures, J. Forensic Sci. 51 (2006) 1284-1297.
[51] C.E. Metz, ROC methodology in radiologic imaging, Invest. Radiol. 21 (1986) 720-733.
[52] M. Alnadabi, S. Johnstone, Speech/music discrimination by detection: assessment of time series events using ROC graphs, 6th International Multi-Conference on Systems, Signals and Devices (SSD '09), (2009) 1-5.
[53] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2004) 137-154.
[54] J.M. Nichols, M. Seaver, S.T. Trickey, S.R. Motley, E. Eisner, Detecting bolt loosening under strong temperature fluctuations using ambient vibrations, IMAC-XXV. XXV (2007).
[55] S. Gefen, O.J. Tretiak, C.W. Piccoli, K.D. Donohue, A.P. Petropulu, P.M. Shankar, et al., ROC analysis of ultrasound tissue characterization classifiers for breast cancer diagnosis, IEEE Trans. Med. Imaging. 22 (2003) 170-177.
[56] J.M. Butler, R. Schoske, P.M. Vallone, J.W. Redman, M.C. Kline, Allele frequencies for 15 autosomal STR loci on U.S. Caucasian, African American, and Hispanic populations, J. Forensic Sci. 48 (2003) 908-911.
[57] E.R. Coronado, Amplification reproducibility and the effects on DNA mixture interpretation on profiles generated via traditional and mini-STR amplification, MSc Thesis, Boston University School of Medicine. (2011) 1-77.
[58] J. Bregu, D. Conklin, E. Coronado, M. Terrill, R.W. Cotton, C.M. Grgicak, Analytical thresholds: determination of minimum distinguishable signals, J. Forensic Sci. ([in press]).
[59] T. Fawcett, ROC graphs: notes and practical considerations for researchers, HP Labs. Tech. Report. No. HPL-2003-4 (2004) 1-38.
[60] D.T. Chung, J. Drabek, K.L. Opel, J.M. Butler, B.R. McCord, A study on the effects of degradation and template concentration on the amplification efficiency of the STR miniplex primer sets, J. Forensic Sci. 49 (2004) 733-740.
[61] P. Gill, R. Puch-Solis, J.M. Curran, The low-template-DNA (stochastic) threshold— its determination relative to risk analysis for national DNA databases, Forensic Sci. Int. Genet. 3 (2009) 104-111.
[62] W.C. Thompson, Subjective interpretation, laboratory error and the value of forensic DNA evidence: three case studies, Genetica. 96 (1995) 153-168.
[63] W.C. Thompson, Accepting lower standards: the National Research Council's Second Report on Forensic DNA Evidence, Jurimetrics J. 37 (1997) 405-424.
[64] B. Scheck, P. Neufeld, F. Dwyer, Actual Innocence, Doubleday, New York, 2000.
6 VITA
Jacob Samuel Gordon
MIT Lincoln Laboratory: 244 Wood Street, Lexington, MA 02420-9108; [email protected]; work phone: (781) 981-4373
Permanent address: 52 Church Avenue, Islip, NY 11751-3902; [email protected]; cell phone: (516) 721-7234
Year of Birth: 1983
EDUCATION
BOSTON UNIVERSITY SCHOOL OF MEDICINE Boston, MA M.S. Degree in Biomedical Forensic Sciences, expected 1/12. Thesis: Characterization of Error Tradeoffs in Human Identity Comparisons: Determining a Complexity Threshold for DNA Mixture Interpretation . [9/06 – present]
HARVARD UNIVERSITY Cambridge, MA A.B. Degree with Honors in Physics, 2005. Dean’s List, Harvard College Scholarship. [9/01 – 6/05]
UNIVERSITY OF CAPE TOWN Cape Town, South Africa Semester study abroad. [2/04 – 6/04]
ISLIP HIGH SCHOOL Islip, NY [9/97 – 6/01]
EXPERIENCE
MIT LINCOLN LABORATORY Hanscom Air Force Base, Lexington, MA Developing and evaluating large-scale ballistic missile defense systems to advance laboratory’s fundamental mission to apply science and advanced technology to critical problems of national security. Analysis includes data fusion and interpretation, system performance characterizations, phenomenology studies, and leveraging of high-level data analysis software. Lead author of papers presented at National Fire Control Symposium (San Diego, 8/2010), Missile Defense Sensors, Environments, and Algorithms (Orlando, 2010). Continuing coursework: Radar Systems, Parameter Estimation for Dynamic Systems, Information Fusion for Decision Support, Cryptography and Cyber Security, Advanced MATLAB Programming, Programming in JAVA. [10/05 – present]
NASA GODDARD SPACE FLIGHT CENTER Greenbelt, MD Investigated use of lasers to remotely sense atmospheric conditions and variables. [7/04 – 8/04]
PUBLIC DEFENDER SERVICE FOR THE DISTRICT OF COLUMBIA Washington, D.C. Selected as one of two members of personal investigative team for felony attorney serving the indigent community in D.C. Responsibilities: interviewing witnesses, analyzing crime scenes, taking sworn statements, writing and serving subpoenas, developing case strategy, and visiting clients in D.C. jail. Attended forensics conference hosted by Dr. Henry Lee. [6/03 – 8/03]
DEPT. OF ENERGY UNDERGRADUATE LABORATORY FELLOWSHIP Upton, NY Recruited by high-energy particle physics group at Brookhaven National Laboratory. PHENIX is one of the two biggest experiments conducted using BNL’s Relativistic Heavy-Ion Collider. Research earned poster presentation slot at American Physical Society’s 2002 Division of Nuclear Physics Conference. [6/02 – 8/02]
HARVARD UNIVERSITY COURSE ASSISTANT Cambridge, MA Assisted professors in first- and second-year calculus courses. Responsibilities: teaching weekly, 90- minute, problem sessions, generating course review materials, grading assignments, preparing students for exams, and writing student recommendations. [9/03 – 1/04, 9/04 – 1/05]
PEDIATRIC HEMATOLOGY/ONCOLOGY WARD Massachusetts General Hospital, Boston, MA Child Life Volunteer providing support for patients and to nursing staff. [6/06 – 9/06]