Accuracy and Reliability of Forensic Latent Fingerprint Decisions
Total Page:16
File Type:pdf, Size:1020Kb
Accuracy and reliability of forensic latent fingerprint decisions Bradford T. Ulerya, R. Austin Hicklina, JoAnn Buscagliab,1, and Maria Antonia Robertsc aNoblis, 3150 Fairview Park Drive, Falls Church, VA 22042; bCounterterrorism and Forensic Science Research Unit, Federal Bureau of Investigation Laboratory Division, 2501 Investigation Parkway, Quantico, VA 22135; and cLatent Print Support Unit, Federal Bureau of Investigation Laboratory Division, 2501 Investigation Parkway, Quantico, VA 22135 Edited by Stephen E. Fienberg, Carnegie Mellon University, Pittsburgh, PA, and approved March 31, 2011 (received for review December 16, 2010) The interpretation of forensic fingerprint evidence relies on the tion content is sufficient to make a decision. Latent print exam- expertise of latent print examiners. The National Research Council ination can be complex because latents are often small, unclear, of the National Academies and the legal and forensic sciences com- distorted, smudged, or contain few features; can overlap with munities have called for research to measure the accuracy and other prints or appear on complex backgrounds; and can contain reliability of latent print examiners’ decisions, a challenging and artifacts from the collection process. Because of this complexity, complex problem in need of systematic analysis. Our research is experts must be trained in working with the various difficult focused on the development of empirical approaches to studying attributes of latents. this problem. Here, we report on the first large-scale study of the During examination, a latent is compared against one or more accuracy and reliability of latent print examiners’ decisions, in exemplars. These are generally collected from persons of interest which 169 latent print examiners each compared approximately in a particular case, persons with legitimate access to a crime 100 pairs of latent and exemplar fingerprints from a pool of 744 scene, or obtained by searching the latent against an Automated pairs. The fingerprints were selected to include a range of attri- Fingerprint Identification System (AFIS), which is designed to butes and quality encountered in forensic casework, and to be select from a large database those exemplars that are most similar BIOLOGICAL comparable to searches of an automated fingerprint identification to the latent being searched. For latent searches, an AFIS only system containing more than 58 million subjects. This study eval- provides a list of candidate exemplars; comparison decisions must SCIENCES uated examiners on key decision points in the fingerprint exami- be made by a latent print examiner. Exemplars selected by an APPLIED nation process; procedures used operationally include additional AFIS are far more likely to be similar to the latent than exemplars safeguards designed to minimize errors. Five examiners made false selected by other means, potentially increasing the risk of exam- positive errors for an overall false positive rate of 0.1%. Eighty-five iner error (18). percent of examiners made at least one false negative error for an The prevailing method for latent print examination is known overall false negative rate of 7.5%. Independent examination of as analysis, comparison, evaluation, and verification (ACE-V) the same comparisons by different participants (analogous to blind (19, 20). The ACE portion of the process results in one of four verification) was found to detect all false positive errors and the decisions: the analysis decision of no value (unsuitable for com- majority of false negative errors in this study. Examiners frequently parison); or the comparison/evaluation decisions of individualiza- differed on whether fingerprints were suitable for reaching a tion (from the same source), exclusion (from different sources), conclusion. or inconclusive. The Scientific Working Group on Friction Ridge Analysis, Study and Technology guidelines for operational proce- he interpretation of forensic fingerprint evidence relies on the dures (21) require verification for individualization decisions, but verification is optional for exclusion or inconclusive decisions. Texpertise of latent print examiners. The accuracy of decisions ’ made by latent print examiners has not been ascertained in a Verification may be blind to the initial examiner s decision, in large-scale study, despite over one hundred years of the forensic which case all types of decisions would need to be verified. use of fingerprints. Previous studies (1–4) are surveyed in ref. 5. ACE-V has come under criticism by some as being a general Recently, there has been increased scrutiny of the discipline approach that is underspecified (e.g., refs. 14 and 15). resulting from publicized errors (6) and a series of court admis- Latent-exemplar image pairs collected under controlled con- ditions for research are known to be mated (from the same sibility challenges to the scientific basis of fingerprint evidence source) or nonmated (from different sources). An individualiza- (e.g., 7–9). In response to the misidentification of a latent print tion decision based on mated prints is a true positive, but if based in the 2004 Madrid bombing (10), a Federal Bureau of Investiga- on nonmated prints, it is a false positive (error); an exclusion tion (FBI) Laboratory review committee evaluated the scientific decision based on mated prints is a false negative (error), but basis of friction ridge examination. That committee recom- is a true negative if based on nonmated prints. The term “error” mended research, including the study described in this report: is used in this paper only in reference to false positive and false a test of the performance of latent print examiners (11). The need negative conclusions when they contradict known ground truth. for evaluations of the accuracy of fingerprint examination deci- No such absolute criteria exist for judging whether the evidence is sions has also been underscored in critiques of the forensic sufficient to reach a conclusion as opposed to making an incon- sciences by the National Research Council (NRC, ref. 12) and clusive or no-value decision. The best information we have to others (e.g., refs. 13–16). Background Author contributions: B.T.U., R.A.H., J.B., and M.A.R. designed research; B.T.U., R.A.H., J.B., Latent prints (“latents”) are friction ridge impressions (finger- and M.A.R. performed research; B.T.U. and R.A.H. contributed new analytic tools; B.T.U., prints, palmprints, or footprints) left unintentionally on items R.A.H., J.B., and M.A.R. analyzed data; and B.T.U., R.A.H., J.B., and M.A.R. wrote the paper. such as those found at crime scenes (SI Appendix, Glossary). The authors declare no conflict of interest. Exemplar prints (“exemplars”), generally of higher quality, are This article is a PNAS Direct Submission. collected under controlled conditions from a known subject using Freely available online through the PNAS open access option. ink on paper or digitally with a livescan device (17). Latent print 1To whom correspondence should be addressed. E-mail: [email protected]. examiners compare latents to exemplars, using their expertise This article contains supporting information online at www.pnas.org/lookup/suppl/ rather than a quantitative standard to determine if the informa- doi:10.1073/pnas.1018707108/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1018707108 PNAS Early Edition ∣ 1of6 evaluate the appropriateness of reaching a conclusion is the col- required multiple examiners for each image pair, while incorpor- lective judgments of the experts. Various approaches have been ating a broad range of fingerprints for measuring image-specific proposed to define sufficiency in terms of objective minimum effects required a large number of images. criteria (e.g., ref. 22), and research is ongoing in this area (e.g., We sought diversity in fingerprint data, within a range typical ref. 23). Our study is based on a black box approach, evaluating of casework. Subject matter experts selected the latents and the examiners’ accuracy and consensus in making decisions rather mated exemplars from a much larger pool of images to include than attempting to determine or dictate how those decisions are a broad range of attributes and quality. Latents of low quality made (11, 24). were included in the study to evaluate the consensus among examiners in making value decisions about difficult latents. The Study Description exemplar data included a larger proportion of poor-quality exem- This study is part of a larger research effort to understand the plars than would be representative of exemplars from the FBI’s accuracy of examiner conclusions, the level of consensus among Integrated AFIS (IAFIS) (SI Appendix, Table S4). Image pairs examiners on decisions, and how the quantity and quality of image features relate to these outcomes. Key objectives of this were selected to be challenging: Mated pairs were randomly study were to determine the frequency of false positive and false selected from the multiple latents and exemplars available for negative errors, the extent of consensus among examiners, and each finger position; nonmated pairs were based on difficult factors contributing to variability in results. We designed the comparisons resulting from searches of IAFIS, which includes study to enable additional exploratory analyses and gain insight exemplars from over 58 million persons with criminal records, or SI Appendix in support of the larger research effort. 580 million distinct fingers ( , section 1.3). Participants There is substantial variability in the attributes of latent prints, were surveyed, and a large majority of the respondents agreed in the capabilities of latent print examiners, in the types of that the data were representative of casework (SI Appendix, casework received by agencies, and the procedures used among Table S3). agencies. Average measures of performance across this heteroge- Noblis developed custom software for this study in consulta- neous population are of limited value (25)—but do provide in- tion with latent print examiners, who also assessed the software sight necessary to understand the problem and scope future work.