CSAFE Annual Progress Report

This document was produced as a part of a NIST Center of Excellence in cooperative agreement (officially referred to as the Center for Statistics and Applications in Forensic Evidence (CSAFE)).

Cooperative Agreement: 70NANB15H176

Reporting Period: June 1, 2016 – June 30, 2017

Submission Date: July 30, 2017

Project Period: June 2015 – May 2020

NIST Project Manager: Susan Ballou, Program Manager for Forensic Sciences, Law Enforcement Standards Office, NIST, [email protected], (301) 975-8750

Investigators:

Alicia Carriquiry – Iowa State University, [email protected], (515) 294-3440

William F. Eddy – Carnegie Mellon University

Hal Stern – University of California, Irvine

Karen Kafadar – University of Virginia

DUNS and EIN Numbers: 005309844 and 42-6004224

Website: www.forensicstats.org

Table of Contents

CSAFE Annual Progress Report
Executive Summary
Selected Accomplishments

STATISTICAL FOUNDATIONS
Project K - Improving the Statistical Validity of Forensic Science Databases – size, relevance, representativeness and utility
Project AA - Inverse Problems in Forensic Science – A statistical approach
Project BB - Blind Proficiency Testing: Designing a methodology for forensic laboratories
Project DD - Probabilistic Framework for the Evaluation of Complex Forensic Inference: Accounting for uncertainty, multiple data sources and expert judgement
Project HH - Statistical Modeling Framework for Pattern Evidence

FIREARMS AND TOOLMARKS
Project O - Developing Methods for Comparison of Cartridge Breechface Images
Project CC – Statistical and Algorithmic Approaches to Matching Bullets

SHOE PRINTS AND TREADMARKS
Project II - Understanding and Modeling the Probability Distribution of Accidental/Randomly Acquired Characteristics (RACs) for Shoeprint Matching
Project EE – Statistical and Algorithmic Approaches to Shoeprint Analysis
Project A - Statistical Models for the Generation and Interpretation of Shoeprint Evidence

HANDWRITING
Project G – Towards a Score-Based Likelihood Ratio for Handwriting Evaluation

Project Q - Developing a Statistical Foundation for Latent Print Comparison
Project V - Latent Proficiency Testing
Project X - Quality Metrics for Latent Fingerprints

Project D - StegoDB: An Image Dataset for Benchmarking Steganalysis Algorithms
Project J - Mobile App Forensic Analysis
Project S - Statistical Methods for Change Detection Over Time in Digital Forensics Data

BLOOD PATTERN ANALYSIS
Project P – Combining Fluid Dynamics, Statistics and Pattern Recognition in Bloodstain Pattern Analysis, to Quantify Spatial Uncertainty and Remove Human Bias

HUMAN FACTORS
Project E - Analysis of Forensic Testimony and Reports
Project I - Evaluating Lay Perceptions of Forensic Evidence and Forensic Statistics
Project M - Human Factors in Visual Identification: A cross-cutting research proposal
Project T - Forensic Processing and Human Factors at Crime Laboratories
Project U - Research on Lawyers, Jurors, and the Evaluation of Forensic Evidence
Project W - Legal Education and Forensic Evidence

TRAINING AND EDUCATION
Project F - Education of Legal Professionals on the Role of Statistics in Forensics and the Law
Project H – Probability and Statistics Education for Forensic Science Practitioners
Project L - Training of Forensic Practitioners in Uncertainty and Measurement Error and the Statistical Presentation of Forensic Evidence
Project N - Summer Undergraduate Research Experience (SURE) in Forensic Science and Statistics
Project Y - Training Statisticians in Forensic Science

CENTER ADMINISTRATION
Advisory Boards
Facilities and Personnel
Meetings and Events
Internal Evaluation

APPENDIX
Exhibit 1
Exhibit 2

Executive Summary

Despite many advances in scientific equipment and the computing technology used to collect valuable pattern and digital evidence for forensic investigations, the probative value of such evidence remains limited due to a lack of statistical foundation for its use. Forensic DNA analysis has an underlying biological foundation, a well-defined measurement process, statistically based error rates, and clear and repeatable standards for analysis, interpretation, and reporting; many other forms of forensic analysis lack the same degree of statistical rigor. To address this issue, the Center for Statistics and Applications in Forensic Evidence (CSAFE), a NIST Center of Excellence, is working alongside the National Institute of Standards and Technology through a $20 million, 5-year cooperative agreement.

A collaboration between Iowa State University, Carnegie Mellon University, University of Virginia, and University of California, Irvine, CSAFE marshals the breadth and strength of the statistical community in the U.S. and abroad to collaborate with forensic and statistical scientists at NIST, the forensic community, and state and local crime laboratories to ensure the quantification of uncertainty associated with various pattern and digital evidence analyses. Our Center develops inferential procedures with error rates, sets in place education and training in these new methods, and transfers the new tools to forensic practitioners and other stakeholders. Through its activities, CSAFE plays a central role in the advancement of the statistical and probabilistic bases of analyses for common forms of pattern evidence and digital evidence, thereby fulfilling the goal of placing forensic science on a firm scientific foundation.

Key Focus Areas

• CSAFE creates statistical foundations for digital and pattern evidence that can be used to analyze and determine the strength of evidence and its forensic interpretation.

• CSAFE develops education and outreach for judges, lawyers, law enforcement and forensic science investigators and practitioners to help decipher and communicate forensic science results comprehensibly and accurately.

• CSAFE collaborates with forensic science agencies at the federal, state and local levels to create objective, repeatable standards and methods to reduce human factors, including bias, risk and error.

The following report outlines activities for project year 2. It includes a review of the activities of CSAFE for the time period June 1, 2016 – June 30, 2017. Based on the exceptional progress, CSAFE is well positioned to expand the knowledge base for researchers, practitioners and lay audiences.

CSAFE Annual Report 06.2017 Executive Summary 1

Selected Accomplishments

CSAFE has made significant progress toward each of the objectives discussed below. CSAFE members have cultivated partnerships with state and federal crime laboratories to evaluate current practices, to collect and analyze forensic data, and to aid in the understanding and processing of evidence. At least two novel forensic techniques (a 3D-based algorithm to compare striations on bullets, and a metric to quantify the quality of latent prints) are approaching the level of development required to initiate testing in the field.

Objective 1—Bring together the leading experts in statistics and other relevant sciences and provide them with the resources needed to conduct robust research

CSAFE has built an exceptional team of over 60 distinguished statisticians, scientists, and practitioners. CSAFE researchers are advancing the field of forensic science by actively engaging in 29 projects that span diverse areas of human factors, pattern evidence, and digital evidence. To support continued advancement, CSAFE is committed to ongoing efforts to expand the team and secure additional resources.

CSAFE Leadership Team

• Alicia Carriquiry – Iowa State University

• William F. Eddy – Carnegie Mellon University

• Hal Stern – University of California, Irvine

• Karen Kafadar – University of Virginia

CSAFE’s commitment to identifying, engaging, and learning from key experts in the field of forensic science led to the development of three advisory boards: senior, technical, and practitioner. These advisory boards encompass a distinguished body of professionals from diverse forensic science backgrounds and were brought together to provide additional leadership and strategic direction to CSAFE operations. Advisory board members have been engaged to comprehensively evaluate current CSAFE policies and strategic approaches to development and growth. They assess current performance against CSAFE goals, identify opportunities to connect CSAFE to key stakeholder groups, review CSAFE materials, and provide feedback to CSAFE leadership.

In addition to increasing partnerships with internal team members, growing the infrastructure, and designing a system for enhanced feedback, CSAFE is aggressively addressing industry needs related to technology, advanced data collection, and information storage capacity. To address deficiencies promptly, CSAFE invested in advanced computing equipment with storage capacity in the hundreds of terabytes (expandable to the petabyte level when needed), a Sensofar 3D confocal microscope for surface metrology, and an EverOS shoe sole scanner. Much of this equipment was purchased with funds provided by Iowa State University or through a generous grant from the Roy J. Carver Charitable Trust.

Objective 2—Build relationships in the practitioner community and disseminate information to the community

The center has made it a priority to cultivate relationships with key individuals and organizations in the forensic science community to evaluate current practices and to aid in the understanding and processing of evidence.

As a first step, CSAFE researchers in all partner institutions have reached out to specific forensic science organizations to discuss ongoing research projects and receive feedback.

Key examples include:

• The CMU research team holds weekly meetings with the Allegheny County Medical Examiner’s Office and has embarked on research projects that were proposed by the forensics collaborators.

• UCI researchers are collecting information about handwriting in cooperation with the Los Angeles Police Department.

• Researchers at UVA are interacting with the Houston Lab on projects in at least two areas: contextual bias in forensic examinations and assessment of the quality of latent fingerprints.

• ISU has focused on engaging the community of firearm and toolmark examiners. In addition to the joint project with the Defense Forensic Science Center, collaborations are ongoing with the Association of Firearm and Tool Mark Examiners (AFTE), and with examiners in the Houston Forensic Science Center, the Los Angeles Police Department (LAPD), the Denver Crime Lab, and the State of Virginia Crime Lab.

• ISU is also working with the Iowa Division of Criminal Investigation and with the Story County Sheriff’s Office.

As a means to strengthen communication and collaboration with NIST, CSAFE faculty and students have visited NIST, and NIST team members have come to CSAFE facilities for the purpose of increased knowledge sharing. The two teams have identified productive ways in which resources can be combined to advance understanding and discovery in forensic sciences.

CSAFE has increased its visibility in the forensic community by participating in and presenting research results at professional conferences and national events. Select examples include the annual meeting of the Association of Firearm and Tool Mark Examiners and [email protected]. Participation in these events has been an effective way to interact with the forensic community and to communicate our interest in establishing collaborative relationships with practitioners.

It is important to demonstrate that CSAFE researchers do novel and high-quality work by submitting papers to top-tier journals in the relevant areas. In the past year, the work of CSAFE researchers has appeared (or has been accepted) in the Annals of Applied Statistics, the Journal of Forensic Sciences, and other notable publications.

Hosting meetings, workshops and other events to bring together various audiences is also key to CSAFE’s mission. This year, CSAFE established a Center-Wide Webinar series, sponsored a Digital Evidence Workshop, and held an All Hands Meeting.

Objective 3—Provide funding for high-quality research in statistical foundations (as well as the basic sciences on which the methodologies will build) to permit the quantification of error rates and uncertainty in pattern and digital evidence.

In the last year, CSAFE has made significant advancements in addressing fundamental concerns in forensic science. Key accomplishments from CSAFE’s exceptional research in pattern and digital evidence are listed below. The information presented here represents a small section of the high-quality research in a variety of areas that is sponsored by CSAFE.

Statistical Foundations for Modeling Evidence

In collaboration with NIST researchers and others, statisticians at CSAFE are exploring fundamental questions related to the construction of useful databases for research purposes, the potential of likelihood ratio-like approaches to evaluate pattern and digital evidence, and the statistical properties of model-free methods based on the construction of “scores”. For most forensic disciplines, useful data are either nonexistent or not widely available. This is problematic in the short term, but the absence of such data also creates an opportunity to think carefully about how to collect information. CSAFE has made data collection a priority, and is currently funding projects to create databases of 3D images of bullets, breech faces and firing pin impressions, shoe outsoles, stego images, and smartphone applications. Once these data have been assembled in a searchable and curated database, they will be made publicly available, so that a wide community of scientists will be motivated to conduct research useful in forensic practice.

Additionally, researchers at CSAFE have completed a conceptual inferential framework that clarifies the contributions of databases to statistical inference. The framework identifies the parameters of the probabilistic model that arises from pattern data after they have been transformed into mathematical vectors via "metrics," such as those that quantify features of minutiae in fingerprints. More importantly, the framework identifies the type of information that is needed to provide suitable prior knowledge for performing inference on those parameters, and hence for calculating a meaningful likelihood ratio. These initiatives help clarify and simplify the presentation of forensic evidence in terms of statistical summaries.
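For readers less familiar with the term, a likelihood ratio in this setting can be written generically as below. This is a textbook form offered only as orientation, not the framework's own notation: the symbols E (feature vector from the evidence), H_p and H_d (competing source propositions), θ (model parameters), and D (background database) are illustrative assumptions, as is the simplification that the prior on θ given D is the same under both propositions.

```latex
% Generic likelihood ratio with parameters integrated out.
% All symbols are illustrative; they do not come from the CSAFE framework itself.
\[
\mathrm{LR}
  = \frac{p(E \mid H_p, D)}{p(E \mid H_d, D)}
  = \frac{\int p(E \mid \theta, H_p)\, \pi(\theta \mid D)\, d\theta}
         {\int p(E \mid \theta, H_d)\, \pi(\theta \mid D)\, d\theta}
\]
```

Written this way, the role of the database is visible: it enters through the prior π(θ | D), which is why the type of information it contains matters for the resulting ratio.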

Firearms & Toolmarks

CSAFE researchers at ISU have developed an algorithm to compare bullets that will enable firearm examiners to attach a “degree of certainty” to the results of their examinations. This is a potentially transformational contribution, which can empower examiners who at present rely only on subjective assessments. To aid in this effort, CSAFE researchers are collaborating with firearm examiners from AFTE (Association of Firearm and Tool Mark Examiners), the Houston Crime Lab, the Denver Crime Lab, the Story County Sheriff’s Office, the Iowa Division of Criminal Investigation, the Virginia Crime Lab, and the Defense Forensic Science Center.

In parallel, CSAFE researchers at CMU have proposed a statistical model that uses circular basis functions to represent 2D breech face images. The approach appears to have good statistical properties, and to effectively distinguish mated from non-mated pairs of cartridge cases. The Defense Forensic Science Center (DFSC) has shared about 18,000 cartridge cases that will be sub-sampled at a 10% rate and imaged using a 3D confocal microscope. These cartridge cases were used in a 2014 black-box study with firearms examiners. The 3D images to be obtained will help the CMU team to validate the 2D model and extend it to permit comparison of 3D images. The complete set of images will be added to NIST’s collection of publicly available ballistics images.

Shoe prints & Tread marks

CSAFE researchers are investigating the spatial distribution and uniqueness of tread pattern features of shoe outsoles using a large database of shoe/brand types. Researchers have grown the dataset to about 70,000, and work is proceeding on implementing a web-based search interface for this dataset, which would allow exploratory browsing, search by metadata (e.g., brand, category), and search-by-image functionality using image descriptors. In collaboration with the Israeli Police, researchers have made use of a newly acquired dataset consisting of 600 shoe impressions marked with 20,000 accidentals. The additional dataset will advance efforts to improve the existing statistical models of shoe print analysis by using image-based appearance models to report the appearance consistency of accidentals. Researchers are also making advancements in the development of 3D shoe print scanning technology. Expansions in the use of this new technology will transform the process of shoe print database construction and modeling, allowing for increased confidence in the determination that a shoe print matches a putative shoe.

Handwriting

CSAFE researchers are developing a statistical approach to model the complexity of handwriting samples in relation to examiner assessment. In partnership with the Los Angeles Police Department and the Los Angeles County Sheriff's Department (LASD), researchers have analyzed data from five different examiners who were presented with 101 signature pages. Researchers assessed the consistency of examiner judgements across signatures. A linear statistical model was used to estimate the variability in scores due to examiners, signatures, and random variation. From these measurements, it is possible to estimate the reliability and variability of the complexity assessment process by forensic examiners. The signature complexity measures under development will have meaningful applications in score-based likelihood ratio analysis of handwritten evidence that will be useful to the forensic science community.
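To make the decomposition of score variability concrete, the sketch below estimates variance components for a crossed examiner-by-signature layout. It is a minimal illustration under stated assumptions, not the project's actual model: it assumes one complexity score per examiner and signature page, an additive random-effects model, and method-of-moments (ANOVA) estimators. The function name `variance_components` and the simulated scores are hypothetical.

```python
import numpy as np


def variance_components(scores):
    """Method-of-moments variance components for a two-way crossed layout.

    scores: 2D array-like, rows = examiners, columns = signature pages,
    with exactly one complexity score per (examiner, signature) cell.
    """
    scores = np.asarray(scores, dtype=float)
    n_exam, n_sig = scores.shape
    grand_mean = scores.mean()
    exam_means = scores.mean(axis=1)   # one mean per examiner
    sig_means = scores.mean(axis=0)    # one mean per signature page

    # Mean squares from the two-way ANOVA table without replication.
    ms_exam = n_sig * np.sum((exam_means - grand_mean) ** 2) / (n_exam - 1)
    ms_sig = n_exam * np.sum((sig_means - grand_mean) ** 2) / (n_sig - 1)
    resid = scores - exam_means[:, None] - sig_means[None, :] + grand_mean
    ms_resid = np.sum(resid ** 2) / ((n_exam - 1) * (n_sig - 1))

    # Equate mean squares to their expectations; truncate negatives at zero.
    return {
        "examiner": max((ms_exam - ms_resid) / n_sig, 0.0),
        "signature": max((ms_sig - ms_resid) / n_exam, 0.0),
        "residual": ms_resid,
    }


# Simulated scores only (5 examiners x 101 signature pages); not project data.
rng = np.random.default_rng(0)
simulated = (
    3.0
    + rng.normal(0.0, 0.5, size=(5, 1))      # examiner effects
    + rng.normal(0.0, 1.0, size=(1, 101))    # signature effects
    + rng.normal(0.0, 0.3, size=(5, 101))    # random variation
)
components = variance_components(simulated)
print(components)
```

The share of total variance attributable to signatures, relative to examiners and residual noise, is one simple way to summarize how reliably examiners agree on complexity.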

Fingerprints

CSAFE researchers have proposed a quality metric, a new measurement to quantify the quality of individual features in a latent fingerprint, which can reduce the subjectivity of fingerprint examinations and allow for more scientifically grounded confidence in results. Latent fingerprint analysis is second only to DNA analysis in terms of public confidence. Yet comparison of latent prints with prints obtained from a suspect is largely subjective. There are ongoing efforts in the forensic community to develop quantitative methods to compare prints. The quality metric proposed at CSAFE will enable practitioners to up-weight or down-weight the value of a match between two prints.

In addition, an ongoing partnership with the Defense Forensic Science Center (DFSC) will for the first time compare two models for latent print proficiency testing and assess the number and types of errors on a test composed of realistic case samples as compared with an existing commercial test offered by Collaborative Testing Services, Inc. (CTS). Investigators also plan to compare the performance of practicing latent print analysts to participants at lower levels of training (i.e., forensic science trainees), or even those without training (i.e., university undergraduates). This will help the forensic community better understand (and eventually improve) the ecological validity of proficiency testing efforts, and eventually better calibrate latent fingerprint proficiency testing.

Digital Forensics

Three digital forensics projects are ongoing at CSAFE. These projects focus on the collection, construction, and analysis of various types of digital evidence. In particular, the steganalysis project has made significant progress in assembling a large database of stego images using reverse-engineering of mobile phone apps. So far, over fifty thousand stego images have been collected. This allows the application of deep learning algorithms such as deconvolutional neural nets to determine whether a file is hiding a payload. Researchers are also developing investigation tools new to the steganalysis community via the use of mobile phone cameras in place of still cameras. Researchers have shown that variable camera settings such as ISO and exposure time do affect the error rates, indicating that false positives and false negatives can be quantified for a specific phone model rather than for a specific device. In practice, this would allow examiners to calibrate a steg detection algorithm using mobile phones available in the laboratory, and then apply the algorithm to a phone of the same model that is part of an investigation.

Other researchers have been exploring the problem of determining whether two event streams were generated by a single source or by two different sources, and have developed a statistical methodology using score-based likelihood ratios. The method is based on a statistical methodology known as marked point processes, and the particular technique used to construct likelihood ratios is based on the fraction of an event's neighbors that are of the same type or of a different type. Using real-world, student-generated event data from laptops and phones, researchers demonstrated hold-out true positive rates in the range of 85% to 95% and false positive rates of 3% to 10% (depending on which specific methodology was used).
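The general idea of a neighbor-based score turned into a score-based likelihood ratio can be sketched as follows. This is an illustrative simplification, not the researchers' method: the score here is simply the fraction of events whose nearest neighbor in time belongs to the other stream, and the two score densities are estimated with a hand-written Gaussian kernel density estimate. The function names, the bandwidth value, and the reference scores are all hypothetical.

```python
import numpy as np


def mingling_score(stream_a, stream_b):
    """Fraction of events whose nearest neighbor in time is from the other stream."""
    times = np.concatenate([np.asarray(stream_a, dtype=float),
                            np.asarray(stream_b, dtype=float)])
    labels = np.concatenate([np.zeros(len(stream_a), dtype=int),
                             np.ones(len(stream_b), dtype=int)])
    if times.size < 2:
        raise ValueError("need at least two events in total")
    order = np.argsort(times, kind="stable")
    times, labels = times[order], labels[order]

    different = 0
    for i in range(times.size):
        candidates = []  # (time gap, label) for the left and right neighbors
        if i > 0:
            candidates.append((times[i] - times[i - 1], labels[i - 1]))
        if i < times.size - 1:
            candidates.append((times[i + 1] - times[i], labels[i + 1]))
        nearest_label = min(candidates, key=lambda c: c[0])[1]
        different += int(nearest_label != labels[i])
    return different / times.size


def kde_pdf(x, sample, bandwidth):
    """Gaussian kernel density estimate of the sample, evaluated at x."""
    z = (x - np.asarray(sample, dtype=float)) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi))


def score_based_lr(score, same_source_scores, diff_source_scores, bandwidth=0.05):
    """Ratio of estimated score densities: same-source over different-source."""
    numerator = kde_pdf(score, same_source_scores, bandwidth)
    denominator = kde_pdf(score, diff_source_scores, bandwidth)
    return numerator / denominator


# Hypothetical reference scores from pairs with known ground truth.
same_ref = [0.45, 0.50, 0.55]
diff_ref = [0.05, 0.10, 0.15]
observed = mingling_score([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])
print(observed, score_based_lr(0.5, same_ref, diff_ref))
```

In a real evaluation the reference scores would come from held-out pairs of streams with known sources, and true and false positive rates would be obtained by thresholding the resulting ratios.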

Blood Pattern Analysis

Blood pattern analysis has long relied on unsophisticated methods. CSAFE researchers are working on more precise algorithms to increase analysis accuracy through the use of fluid dynamics theory and inverse methods. Discussions with project collaborators resulted in increased understanding that the field lacks the data and simulations needed to address this issue more effectively. Researchers responded by creating a database of high-quality blood spatter videos (13 gunshot backspatters and 47 beating spatters), which can be used to compare various methods of crime scene reconstruction (tangent method, method of strings, Hemospat or other software, and the 2D+ code).

In addition, researchers are developing software tools that will be used for studying the trajectory of a blood drop. CSAFE tools break down a segment of blood spatter video into frames and give a mathematical description to both the drop of blood and to its location in the image. This representation allows researchers to separate the motion of the blood drop from its change of shape throughout its flight. Currently, CSAFE work has demonstrated the ability to use videos to measure, model, and infer blood dynamics, which is a very innovative approach that has led to remarkable progress.
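One simple way to give a drop a mathematical description that separates where it is from what shape it has is through image moments. The sketch below is an assumption-laden illustration, not the CSAFE software: it assumes each video frame has already been segmented into a binary mask of drop pixels, uses the centroid as the location, and uses the centered second-moment matrix as the shape summary. The function names are hypothetical.

```python
import numpy as np


def describe_drop(mask):
    """Return (centroid, shape_matrix) for a binary mask of drop pixels.

    centroid is (x, y) in pixel units; shape_matrix is the 2x2 matrix of
    second moments about the centroid, so it does not depend on location.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no drop pixels")
    coords = np.stack([cols, rows], axis=1).astype(float)  # (x, y) pairs
    centroid = coords.mean(axis=0)
    centered = coords - centroid
    shape_matrix = centered.T @ centered / coords.shape[0]
    return centroid, shape_matrix


def elongation(shape_matrix):
    """Ratio of the longest to the shortest principal axis (1.0 = round)."""
    small, large = np.linalg.eigvalsh(shape_matrix)  # ascending order
    return float(np.sqrt(large / max(small, 1e-12)))


def track_drop(frames):
    """Separate motion (centroid path) from shape change (elongation per frame)."""
    described = [describe_drop(frame) for frame in frames]
    centroids = np.array([centroid for centroid, _ in described])
    displacements = np.diff(centroids, axis=0)  # frame-to-frame motion
    elongations = np.array([elongation(shape) for _, shape in described])
    return centroids, displacements, elongations


# Two synthetic frames: a 3x3 pixel blob that moves 2 pixels right and 1 down.
frame1 = np.zeros((10, 12), dtype=bool)
frame1[2:5, 2:5] = True
frame2 = np.zeros((10, 12), dtype=bool)
frame2[3:6, 4:7] = True
centroids, displacements, elongations = track_drop([frame1, frame2])
print(centroids, displacements, elongations)
```

Because the shape matrix is computed about the centroid, a drop that moves without deforming yields changing centroids but constant elongation, which is the separation of motion from shape that the paragraph above describes.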

Objective 4—Support high-quality research that seeks to create objective, repeatable standards and methods to understand and reduce human factors, including bias, risk, and error.

CSAFE investigators are interested in how forensic evidence is presented in court through testimony. We collect and analyze forensic testimony in order to increase understanding of the quality of testimony presented in real cases. Researchers have completed data collection for court transcripts for the three disciplines (latent prints, tool marks, questioned documents). Results of preliminary analysis show that firearm and tool mark examiners exhibit greater consistency in identification reporting than latent print examiners. CSAFE research continues to inform and guide discussions in the other forensic disciplines regarding testimony best practices in comparison to actual testimony provided in cases. By measuring the degree of penetration of current disciplinary reporting standards, the study can help inform forensic and legal discussions about promulgating new reporting standards.

Also related to presenting evidence in court, CSAFE researchers have devoted significant effort to the evaluation of lay perceptions of forensic evidence and forensic statistics. Several studies have been conducted to investigate a lay person’s judgment of the relative strength of various conclusions that a forensic scientist might present after comparing two pieces of evidence (e.g., fingerprints) to determine whether they have a common source. These comparisons provide insight into whether particular statements will be perceived in the manner intended when used in testimony.

CSAFE researchers have also been working closely with forensic practitioners to understand forensic processing and human factors at crime laboratories. We worked with our partners at the Houston Forensic Science Center (HFSC) to gather the data. Preliminary data analyses revealed wide variation across cases in the number of latent prints available and suggested that exclusions, verifications and consultations occur in a meaningful, non-trivial number of cases. Research of this kind, involving collaboration with crime laboratories to study the processing of real cases, has not previously been done on a large scale and contributes greatly to the field.

In order to help law students understand human factors, CSAFE researchers created a curriculum focused on new methods of instruction for law students regarding forensics, scientific evidence, and statistics, and implemented two courses at the University of Virginia School of Law: a flagship forensic science seminar, where visiting scholars, researchers, and practitioners can discuss cutting-edge issues in forensics, and a forensics litigation course. The litigation course is designed to be replicated at other universities and in training programs to teach law students and practicing lawyers how to litigate expert evidence questions in a mock-trial setting. Researchers have generated a great deal of interest and engagement across multiple university disciplines and are seeing increased participation in the professional development opportunities being offered.

Objective 5—Offer training and education that promotes quantitative literacy not only among forensic scientists, but also in the judicial community.

Workshops for Practitioners

CSAFE researchers are working to address the critical need to improve statistical literacy among forensic science practitioners. Ongoing discussion in the forensic community regarding appropriate standards for reporting and testimony involves statistical concepts such as hypotheses, error rates, and likelihood ratios; however, many practitioners lack adequate background knowledge in these areas. Since 2015, CSAFE has provided statistics courses for approximately 200 forensic practitioners nationwide, providing context and training on the applications of probability and statistics to forensic science. Following the success of these courses, CSAFE is responding to the increased demand for training by developing online training materials to distribute resources efficiently to a wider audience.

Lawyer Training Videos

The training videos on forensic statistics were prepared and presented to participants in the National Forensics College at Cardozo Law School, a training conference for lawyers. Participants were required to view the videos before attending the conference and to complete a test on their comprehension and ability to apply the covered concepts. The test results were reviewed and then a follow-up lecture was conducted at the Forensics College designed to correct errors and misconceptions and reinforce concepts. Experience and knowledge gained through this process will be used to improve future training presentations on forensic statistics. The videos will be posted on the CSAFE website.

Undergraduates & High School Students

During summer 2017, CSAFE partnered with multiple minority-serving colleges and universities to invite minority and non-traditional undergraduate students for a research internship. Students increased their knowledge of statistical foundations in forensic science through educational sessions, and were given the opportunity to apply this information in hands-on research projects in pattern and digital evidence. In addition, CSAFE sponsored activities for talented and gifted low-income high school students between the ages of 14 and 18. Students were exposed to forensic science in action through participation in mock crime scene evidence collection, analysis, and court testimony. CSAFE has received positive feedback from students about the experience, and plans to increase student outreach efforts.

Looking Forward

As CSAFE moves into year 3, the team plans to dedicate significant efforts to translating groundbreaking advancements in research into practical applications for real-world forensic science investigations. With the strongest and largest group of statisticians working in forensic sciences and dedicated and tremendously accomplished faculty, staff, students, and collaborators, CSAFE is confident it will succeed in this mission. Additional partnerships continue to develop rapidly, allowing greater potential for future progress. New facilities at Iowa State University, the lead institution, will allow for additional visits from personnel key to research advancements, such as long-term collaborative visits from NIST researchers. CSAFE looks to increase staffing opportunities and events to multiply noteworthy accomplishments relevant to the everyday practice of forensic sciences.

STATISTICAL FOUNDATIONS

Project K - Improving the Statistical Validity of Forensic Science Databases – size, relevance, representativeness and utility

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Anjali Mazumder (CMU faculty)

Other Investigators: Stephen Fienberg (CMU), Jay Kadane (CMU faculty), Alicia Carriquiry (ISU faculty), Max G’Sell (CMU faculty), Robin K. Mejia (CMU faculty), Hal Stern (UCI faculty)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT?

a) Identify and address the statistical issues that concern the use of forensic databases in criminal investigations, including size, representativeness, bias, etc.

b) Apply or develop statistical methods to quantify and adjust for such uncertainty in the absence of data or databases for inference.

c) Quantify the possible weight of evidence from a database search and/or develop associated methods and frameworks to determine a probabilistic match, if possible.

d) Propose and advise on the design and collection of evidence and the use of methods relating to forensic databases.

e) Develop statistical methods, which may involve search algorithms suitable for comparison of partial samples and big data, depending on the status and availability of data.

f) Develop a coherent probabilistic framework for the evaluation and interpretation of any calculation made on searches and its use in court.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS?

CMU-CSAFE researchers have continued to meet and narrow the focus of the research questions related to databases. The death of Professor Stephen Fienberg and other circumstances have slowed progress. However, we made more progress in the last quarter of FY17, and expect progress to increase over the next year.

• We drafted an outline, assigned co-authors to forensic domains, and are working towards a draft of an initial paper focusing on the statistical challenges associated with forensic databases. We expect to make further progress in the next two quarters.
• CMU-CSAFE researchers Anjali Mazumder, William (Bill) Eddy, Jay Kadane, Robin Mejia and Max G’Sell (along with Alicia Carriquiry) continue to meet bi-weekly to discuss a framework for addressing the different types of forensic science databases and the associated statistical evidence and issues. Late in the project year we identified Hal Stern as a possible collaborator and asked him to join us semi-regularly for meetings via Skype/phone.
• The SURE 2016 program in forensic science and statistics, held in summer 2016 (Project N), provided an opportunity to engage undergraduate research students in exploring the use of forensic science databases and thinking critically to identify statistical issues. Anjali Mazumder assigned the SURE students a mini-project on forensic databases related to either shoeprints or bullet cartridges. They were expected to search for existing databases in the respective area, identify their uses and administration, and identify statistical issues in any comparison or ‘identification’ made using such databases. They presented this work in an oral presentation to CSAFE members on the last day of the program and delivered a poster presentation to the wider CMU community.
• We initiated conversations with NIST researchers William Guthrie, Shannan Williams and Melissa Taylor regarding a possible follow-up survey or communication with crime labs to determine the existence and utility of forensic databases. We are continuing to engage with forensic practitioners and researchers to identify possible sources and utility of databases.
• Anjali Mazumder supervised two undergraduate research students who surveyed and gathered examples of databases for different forensic domains.
• We have been focused on narrowing research questions and identifying key statistical issues to be addressed for each forensic evidence type/domain, including issues with respect to samples, reference populations, and calculations (such as likelihood ratios) from databases. This works towards milestone/activity 1, year 2: addressing statistical issues related to databases, including bias.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? This project has provided the opportunity for undergraduate research projects. Additionally, it has provided a great opportunity for collaboration across CSAFE institutions.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? As noted above, initial work done by the SURE 2016 students resulted in a poster which will be posted on the CSAFE website. We also submitted an abstract to the 10th International Conference on Forensic Inference and Statistics. The abstract was accepted, and we will present at the conference in September 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS?
• Continue to gather examples of databases and create a summary of databases and their key features, highlighting the statistical issues associated with them. This will be an ongoing activity through the course of the project.
• Make targeted efforts to collaborate with labs, forensic science providers, lawyers and NIST researchers on databases, their use, examples, and identification of statistical issues. This will be an ongoing activity through the course of the project.
• Draft a paper based on the research questions in the working document described above, particularly with respect to samples, reference populations and issues related to calculations (such as likelihood ratios) from databases.
• Take advantage of a visit from Jennifer Newman (CSAFE - ISU) in quarter one of FY18, which will provide an opportunity for collaboration on databases regarding steganography and other digital images.
• Present an initial draft paper on the statistical challenges at the upcoming International Conference on Forensic Inference and Statistics.
• Jointly with efforts of Project DD, we are proposing a mini-workshop at CMU on databases and complex forensic inference in mid-September. This project links with Project DD in two ways: (a) in the absence of databases and data, how to address forensic inference questions, and (b) how to include and evaluate the weight of evidence from a database search when presenting the joint or combined evidence.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS. SURE students made poster presentations on shoe prints and bullet cartridges. These will be made available on the CSAFE website.

Abstract submitted and accepted to the 10th International Conference on Forensic Inference and Statistics.

WEBSITE(S) OR OTHER INTERNET SITE(S) Posters from SURE students have been provided and will be posted on the CSAFE website (forensicstats.org).


Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Anjali Mazumder and Robin Mejia have been drafting working documents and continue to lead the gathering of examples of databases. They are working with William Eddy, Jay Kadane, Max G’Sell and Alicia Carriquiry on summarizing the features of such databases, their use, and the statistical issues associated with them.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Nothing to report at this time.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Current forensic databases have been developed in response to law enforcement needs and requests, usually without careful consideration of the statistical and scientific principles associated with the use of such a database. Databases and their search algorithms are often held as government or commercial proprietary knowledge, with little to no published material on the methods. The matching process often produces a list of candidates based on some score but with no known statistical interpretation. News articles and court judgements have often reported decisions being overturned because of search criteria involving partial or poor quality samples and diminutive metrics.

By addressing statistical issues that govern forensic databases and applying or developing statistical methods to address existing issues regarding the design and characteristics of a database, we will be able to assess the bias and usefulness of such database search algorithms, including the effect of uploading poor quality samples. Development of statistical methods that provide a coherent and interpretable framework for suitable comparison of partial matches will improve the statistical validity of calculations and identification of ‘matches’ for the courts, and ultimately increase confidence in law enforcement and the science behind the black box, so that courts, jurors and scientists alike have greater confidence in results.

This project links with Project DD, in two ways: (a) in the absence of databases and data, how to address forensic inference questions, and (b) how to include and evaluate the weight of evidence from a database search when presenting the joint or combined evidence.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? This may have wider implications for administrative/government databases beyond forensic science.


WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? The project aims to develop confidence in the calculations produced and in the quantification of identity from a database search.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? Zero.

Changes/Problems The death of Professor Stephen Fienberg and other circumstances have resulted in slow progress. However, we made more progress in the last quarter of FY17, and expect progress to increase over the next year.

Task 1 - The summary of databases referenced in task 1 in the original project proposal - has been delayed because the collection of databases is taking longer than initially planned. This will also involve continuous updating; however, we plan to pull together a draft by summer 2018.

This project is not focused on the collection of databases but on an understanding of what databases exist, what is in them and how they are used. It is not our intention to reconstruct databases, but to address statistical issues that arise in the use of such databases without an understanding of features, prevalence of features in a population, relevant populations, etc.

Task 2 - Meeting with national and international collaborators regarding database needs, and associated report - was due in June 2017. This is a primary focus and has been delayed in part due to Steve’s death and expected travel last Fall and early summer this year. We are engaged with some collaborators and are in the process of planning for a series of workshops over Year 2 to address these issues, with the first one being held in September 2017.

Task 3a – Draft of statistical issues relating to forensic databases - there should be a clarification, as there are different types of databases: (a) those created by law enforcement, officially and ad hoc, and (b) those created for scientific purposes. Both are needed to calculate weight of evidence, and the concern is the use of (a) without knowledge of (b). This should be in a shareable draft by December 2017. Task 3a is an addition/amendment to task 3, which has been split into two deliverables.

Task 4 - Will submit abstract to one or more of IAI, AFTE, AAFS, ICFIS conferences to conduct workshop/present - This is already underway. An abstract was submitted to ICFIS and is expected to be presented in September 2017.


REVISED SCHEDULE

Year | Task Number | Investigator | CSAFE Deliverable(s) | Delivery Schedule
2016 | Task 1 & 2 | Mazumder | Summary of databases and statistical issues for key pattern evidence. | May 2018
2017 | Task 2 | Mazumder, Fienberg, Kadane, Carriquiry, Ramaman | Meeting with national and international collaborators, regarding database needs, and associated report. | Summer 2018
2017 | Task 3a | Mazumder, Kadane, Eddy, Mejia and G’Sell | Draft of statistical issues relating to forensic databases | December 2017
2017 | Task 4 | Mazumder, Fienberg, Kadane, Eddy, Carriquiry | Will submit abstract to one or more of IAI, AFTE, AAFS, ICFIS conferences to conduct workshop/present | Spring/Fall 2017
2018 | Task 3b | Mazumder, Fienberg, Kadane, Eddy, Carriquiry | Report and develop statistical methods that aim to address and adjust for bias involving inadequate size, reference population and unobserved features. | Fall 2018
2018 | Task 6 | Mazumder, Fienberg, Kadane, Eddy, Carriquiry | Submit abstract to one or more of IAI, AFTE, AAFS conferences to conduct workshop | Fall 2018
2019 | Task 5 | Mazumder, Fienberg, Kadane, Eddy, Carriquiry | Report, publish and develop a probabilistic framework for database matches and searches, particularly involving partial matches and big data. | April 2019
2019 | Task 7 | Mazumder, Fienberg, Kadane, Eddy, Carriquiry | Publish standards on databases and associated statistical issues | April 2019


Project AA - Inverse Problems in Forensic Science – A statistical approach

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PIs: William (Bill) Eddy (CMU faculty)

Other Investigators: Stephen Fienberg (CMU faculty), Anjali Mazumder (CMU faculty), Lotem Kaplan (CMU Postdoc), Daniel Attinger (ISU faculty)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? An inverse problem is the process of calculating, from a set of observations, the causal factors that produced them. It is called an inverse problem because it starts with the results and then calculates the causes; it is the inverse of a forward problem, which starts with the causes and then calculates the results. In science and engineering, the inverse problem is the process of retrieving the physical properties of an object or activity from the observations. Forensic science problems often involve identifying the source of a sample, for example, the source of a biological sample left at the scene of a crime. Moreover, in forensic science the observed evidence collected at a crime scene is the result of some action, so reconstructing that action is an inverse problem. One example is retrieving the action (physical properties) that led to a bloodstain pattern (the observed evidence or result). A similar example is determining the (backward) trajectory of a bullet lodged in a wall. Digital images can be easily created, altered and manipulated without leaving obvious traces. Retrieving the history of such manipulation and processing is critical for many digital forensic cases, for example, those involving images of child pornography, and is an inverse problem. A large-scale inverse problem is the forensic investigation and reconstruction of a plane lost in the ocean, such as flight MH370 from Kuala Lumpur to Beijing on 8 March 2014. Other forensic inverse problems include estimating the postmortem interval from entomological data and identifying the source of an electronic transmission, fire or incendiary device.

Typically, inverse problems are ill-posed; that is, there are infinitely many forward models that would produce the actual observations. For forensic inverse problems, such as bloodstain pattern analysis, the forward model is not precisely known and must include an unknown error term. Our objective is to determine the likelihood surface of the problems described and to develop statistical methods that allow the combination of data sources and the quantification of uncertainty, which current calculations for forensic inverse problems lack.


By utilizing different components of blood and blood spatter that are currently either not used or used separately, we aim to combine multiple pieces of evidence (e.g. biological, chemical and physical components) in the determination of the source region. This approach lends itself to larger-scale forensic inverse problems, such as determining the region of an ocean crash site, which can also involve multiple data sources (e.g. radar, satellite pings, ocean currents, hydrophones, geophones, passenger reports). The task at hand is then to combine multiple likelihoods, which can be non-trivial.

For each of the examples or situations described above, the objective is to determine the likelihood surface, quantify uncertainty and provide a coherent framework for the interpretation of the results. In each of the examples, multiple inverse solutions may be derived from different sources of data. The project involves careful consideration of how best to combine the solutions from different data sources, and whether weights should be assigned in the combination.
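The combination of likelihoods described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each data source yields a Gaussian log-likelihood over a one-dimensional grid of candidate source locations, that the sources are conditionally independent given the true location (so their log-likelihoods add), and that optional weights temper each source. The function names, grid and numbers are hypothetical and are not taken from the project.

```python
import math


def gaussian_loglik(grid, observed, sigma):
    """Log-likelihood of each candidate location given one noisy observation."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return [-0.5 * ((x - observed) / sigma) ** 2 - norm for x in grid]


def combine_logliks(logliks, weights=None):
    """Weighted sum of log-likelihood surfaces (independence is assumed)."""
    if weights is None:
        weights = [1.0] * len(logliks)
    n = len(logliks[0])
    return [sum(w * ll[i] for w, ll in zip(weights, logliks)) for i in range(n)]


def normalize(loglik):
    """Rescale a log-likelihood surface so its values sum to one (flat prior)."""
    peak = max(loglik)
    unnormalized = [math.exp(v - peak) for v in loglik]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical example: two sources that disagree about the location.
grid = [i * 0.1 for i in range(101)]              # candidate locations 0.0 to 10.0
source_a = gaussian_loglik(grid, observed=4.0, sigma=1.0)   # less precise source
source_b = gaussian_loglik(grid, observed=5.0, sigma=0.5)   # more precise source
combined = normalize(combine_logliks([source_a, source_b]))
best = grid[max(range(len(grid)), key=combined.__getitem__)]
```

With equal weights, the combined surface peaks closer to the more precise source. Whether unequal weights are appropriate is exactly the open question raised in the text; the `weights` argument only shows where such a choice would enter.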

Goals as listed in the original project proposal:

• Use the inverse problem framework to solve complex forensic pattern, acoustic and digital evidence problems such as ,
• Develop statistical methods and a framework for combining multiple likelihood solutions derived from different sources of data, allowing for and quantifying uncertainty.
• Incorporate the information derived from multiple sources of data to determine a source or time of action, thus increasing confidence in the results of analysis.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? We have made a great deal of progress in our primary focus of modeling blood spatter. We have studied different approaches to blood spatter analysis, including current forensic blood pattern analysis and fluid dynamics. We gathered and examined blood pattern studies, including blood pattern videos collected by Bart Epstein and available from the MFC crime lab, and those conducted by the Allegheny County Medical Examiner’s Office. To aid in understanding, Lotem Kaplan also attended a basic bloodstain pattern recognition course and received certification.

Additionally, we examined the geometric relation between the shape of the stain and the angle at which a drop hits a surface to infer the origin of the blood, and developed a preliminary mathematical framework for modelling the shape of the stain.
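As a point of reference for the geometric relation mentioned above, the sketch below uses the textbook bloodstain pattern analysis approximation in which an elliptical stain of width w and length l corresponds to an impact angle α with sin α = w / l. This is an assumption made for illustration only; the report does not state which relation the project's framework uses, and the function name is hypothetical.

```python
import math


def impact_angle_degrees(width, length):
    """Impact angle (degrees from the surface) under sin(alpha) = width / length."""
    if width <= 0 or length <= 0:
        raise ValueError("width and length must be positive")
    if width > length:
        raise ValueError("width cannot exceed length for an elliptical stain")
    return math.degrees(math.asin(width / length))


# A circular stain (width == length) corresponds to a perpendicular impact,
# while an elongated stain corresponds to a shallower angle.
perpendicular = impact_angle_degrees(10.0, 10.0)
shallow = impact_angle_degrees(5.0, 10.0)
```

The relation treats the drop as a sphere travelling in a straight line at impact, so it carries no statement of uncertainty by itself.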

Regarding objectives listed in our original research plan:

• Task 1a - Design and conduct examples of data sources used for different forms of forensic pattern evidence requiring a determination of the origin, such as those described for blood pattern analysis. This task is about 73% complete with respect to drawing together and gathering examples on blood spatter. We have made good progress, but it will be an ongoing effort throughout the project.
• Task 2 - Meet with our collaborators to outline statistical issues, assumptions and data sources associated with relevant case examples, including Allegheny County Medical Examiner’s Office and other labs. This task is about 55% complete. We have met with our collaborators at Iowa State University, mainly Daniel Attinger, and discussed the limitations of the currently available data sources of videos of experiments, as well as the work that is being done on creating additional data sources for images of bloodstains. We have also worked with Allegheny County Medical Examiner’s Office to identify experiments and cases, which have been reviewed; Karl Williams (Chief ME) presented blood spatter cases, and we are working to acquire real cases involving blood spatter and other forensic evidence domains.
• Task 1b - Conduct experiments and construct examples of data sources used for different forms of forensic pattern evidence requiring a determination of the origin, such as those described for blood pattern analysis, computer/digital tomography, and other pattern, trace and digital evidence used in forensic investigations. Our focus at this time is on using available data sources and on modeling blood spatter rather than conducting experiments. We extended our analysis to more videos from the Ames Laboratory collection. This task is about 5% complete.
• Task 3 - Develop statistical methods that aim to address the inverse problem of bloodstain pattern analysis, and quantify the uncertainty. We modeled the trajectory, velocities and stain formation for a specific type of falling drop and used these models for inference and prediction. This task is about 15% complete.

We have been focusing on further developing our software tools for extracting data of blood stain motion, followed by analysis and inference from the data.

Ames Laboratory holds a collection of short video clips, each a few seconds long. The videos contain, for example, shots of blood dripping from different objects onto different surfaces from different heights, blood swiped and wiped by different objects or body parts, cast-off patterns from swinging objects, and impact spatter from different forces and different objects.

The primary focus of the video analysis is to review the videos in the Ames Laboratory collection that show drops of blood falling on different surfaces from different heights. We observe that the motion in these videos has two separate parts. The first part is the flight of the drop through the air. The second part is the impact of the drop on the surface and the formation of the stain.

We have been developing software tools that will be used for studying the trajectory of a blood drop. Our software processes a video that captures the travel of a drop of blood. A video can be viewed as a series of consecutive frames. Our software disassembles the video into frames and processes each of those frames to extract the drop of blood from the rest of the image. We then give a mathematical description of both the drop of blood and its location in the image. This representation allows us to separate the motion of the blood drop from its change of shape throughout its flight, so that each can be used separately in further analysis.
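The per-frame extraction step described above can be sketched as follows. This is not the project's software, which the report does not describe at this level of detail. The sketch assumes that each frame has already been decoded into a grayscale array (a list of rows of pixel intensities), that the drop is darker than the background, and that a fixed intensity threshold separates the two; the function name and threshold value are hypothetical.

```python
def extract_drop(frame, threshold=100):
    """Locate the dark region in one grayscale frame.

    Returns the centroid (row, col), bounding box (min_row, min_col,
    max_row, max_col) and pixel area of all pixels darker than the
    threshold, or None if no such pixels exist.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value < threshold
    ]
    if not pixels:
        return None
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return {
        "centroid": (sum(rows) / len(pixels), sum(cols) / len(pixels)),
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
        "area": len(pixels),
    }


# Hypothetical 5x5 frame: light background (255) with a 2x2 dark drop (20).
frame = [[255] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (2, 3):
        frame[r][c] = 20

result = extract_drop(frame)
```

In this simplified picture the centroid stands in for the drop's location and the bounding box for its shape, which mirrors the separation of motion from shape described in the text. Real footage would likely need background subtraction and noise handling that are omitted here.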

With regard to the nature of flight, we extracted the location of the drop throughout its fall and inferred its velocity. We modeled the velocity for the two different heights that were available in the video collection. We also modeled the stain formation by looking at drops that fall on the same surface and measuring the width of the stain over time as the stain is formed.

Using these models, we later infer and sample the values for a drop that would fall from a new height that is in between the two heights that were available in the videos. We produced a video of an experiment that never took place, of a drop falling from a new height.
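The two steps described above, inferring velocity from tracked positions and predicting a quantity at an unobserved intermediate height, can be sketched in simplified form. The sketch assumes drop positions in metres sampled at a known frame rate, estimates velocity by forward differences, and uses plain linear interpolation between the two observed heights as a stand-in for the project's statistical model, which the report does not specify. All names and numbers are hypothetical.

```python
def velocities(positions, fps):
    """Forward-difference velocity estimates from positions sampled at fps."""
    dt = 1.0 / fps
    return [(positions[i + 1] - positions[i]) / dt
            for i in range(len(positions) - 1)]


def interpolate(height, h_low, value_low, h_high, value_high):
    """Linearly interpolate a modeled value at a height between two observed heights."""
    if not h_low <= height <= h_high:
        raise ValueError("height must lie between the two observed heights")
    t = (height - h_low) / (h_high - h_low)
    return value_low + t * (value_high - value_low)


# Hypothetical tracked vertical positions (metres) at 10 frames per second.
speeds = velocities([0.0, 0.1, 0.3], fps=10)

# Hypothetical impact speeds (m/s) modeled at the two observed heights,
# interpolated at a new height halfway between them.
predicted = interpolate(0.75, h_low=0.5, value_low=3.0, h_high=1.0, value_high=4.0)
```

Linear interpolation is only a placeholder. In free fall without drag, impact speed grows with the square root of the height, so a physically motivated or fitted model would be preferable, and sampling from a fitted model (as the text describes) would also convey uncertainty.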

This work shows tremendous progress in the ability to model blood spatter and inference of blood dynamics from the models. However, this research is still at preliminary stages. Currently, our work has demonstrated the ability to use videos to measure, model and infer blood dynamics. Additional development of such models highly depends on the availability of good quality videos.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Lotem Kaplan attended a basic bloodstain pattern recognition course and was certified.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Based on the current progress, a presentation was given to the CMU CSAFE group.

Poster and presentation at the CSAFE All-Hands Meeting in Ames, Iowa.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? We plan to further develop the data extraction process to apply it to more complicated videos from the Ames Laboratory collection to gather data that will be used for developing an extended statistical model.

Further progress on Task 1a - Design and draw together examples of data sources used for different forms of forensic pattern evidence suited for inverse problems such as those described for blood pattern analysis, ocean crash sites, and computer/digital tomography.

Continue to meet with our collaborators to outline statistical issues, assumptions and data sources associated with relevant case examples, including Allegheny County Medical Examiner’s Office and other labs (task 2).


Design experiments and construct examples of data sources used for different forms of forensic pattern evidence requiring a study of pattern and dynamics such as those described for blood pattern analysis, computer/digital tomography, and other pattern, trace and digital evidence used in forensic investigations (task 1b)

Develop statistical methods that aim to address the inverse problem of bloodstain pattern analysis, and quantify the uncertainty (task 3)

Determine the appropriate statistical approach to combine multiple likelihoods derived from different data sources (task 4)

Refine the mathematical framework for modelling the shape and dynamics of blood. Gain more input from collaborators and share results and insights with the relevant.

Involve collaborators from NIST by identifying a specific collaborator.

Products Preliminary development of a mathematical framework for modelling the shape of the stain. More input from collaborators is needed before making this available to the public.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Lotem Kaplan (CMU postdoc), along with William Eddy (CMU faculty), is actively researching inverse problems with applications to bloodstain pattern analysis for the project and developing the software tools and mathematical framework.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Nothing to report at this time.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The determination of the source (e.g. angle, object, etc.) of a blood spatter is similar to the inverse problem of determining the search region or crash site of a plane, where we may assume that the site is located completely within the boundary of a large ocean. Water covers a large surface area of the Earth and has the added challenge of being a moving surface subject to climate and other geomorphical effects. Current approaches often involve a single source of data at a time, similar to that of blood spatter. For instance, inverting a physical model of the ocean currents could lead to an inverse solution. However, there are other possible sources of data, such as hydrophones: if they were sufficiently sensitive to record sound waves from a crash into the ocean, these data could be used to provide a different inverse solution.

A recent study on the reliability of classification of bloodstain patterns by experienced bloodstain pattern analysts showed that where a classification was made, either by choosing a single pattern or by nominating more than one, 13.1% of these classifications did not include the correct pattern type for rigid surfaces and 23.4% for fabric surfaces. Law enforcement and forensic examiners continue to assume that blood drops travel in straight lines, thereby severely hindering their ability to determine the region of origin, and preventing any rational estimation of the uncertainty. The opportunity for reconstruction techniques based on stain inspection is expressed in a recent report by the OSAC Bloodstain Pattern Analysis Subcommittee: “There is unexploited potential for extracting more information than is currently available in BPA by studying the mechanisms of pattern and stain formation and developing predictive and interpretive models to connect observable characteristics to stain and pattern forming mechanisms.” Bayesian methods have proved useful in the determination of lost planes. However, the use of combined multiple sources of data to solve the inverse problem is still lacking, and is non-trivial. Similar problems exist in the field of digital forensics, in which there is a need to combine multiple inverse solutions, particularly in cases relating to organized crime and terrorism in which (manipulated) digital and acoustic evidence is a primary item of interest in an investigation. There are several more instances in forensic science in which such inverse solutions can be applied and can provide a quantification of the uncertainty in any calculation that is ‘back estimated’.

The 2009 NAS report [1] stated that “The uncertainties associated with bloodstain pattern analysis are enormous”. Related court arguments are difficult to resolve objectively because (i) quantifying the “enormous” uncertainties is still a topic of current research [1-3], and (ii) the interpretation still relies on human comparison between the patterns observed at the crime scene and those learned in BPA classes [4]. Recent findings [5-7] on the fluid dynamics behind bloodstain pattern analysis by one of the PIs, in research sponsored by USDOJ and the US Department of Defense, provide a greater understanding of the associated atomization, drop trajectories, drop impact and wicking phenomena.

CSAFE research towards this end is building on these advances in physical understanding and incorporating other components such as biological and chemical understanding of blood. This work adds a quantification of uncertainty and of its propagation using an inverse or backward formulation of the problem, e.g. from uncertainties in the measurement of individual blood stains to the related uncertainty in the determination of the region of origin of a blood spatter. Furthermore, this research is facilitating the use of multiple sources of data to assist in forensic investigations of crimes, crash sites and deaths, focusing on pattern, acoustic and digital evidence. The use of multiple and different sources of data to determine the source or time of an action will lead to a more coherent determination of the region, source or time, increasing confidence in results.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None.

Changes/Problems Nothing to report.


Project BB - Blind Proficiency Testing: Designing a methodology for forensic laboratories

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Robin Mejia (PI Change Requested)

Other Investigators: William Eddy, CMU (faculty), Cleotilde Gonzalez (CMU faculty), Maria Cuellar (CMU PhD student), Lotem Kaplan (CMU Postdoc)

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The goal of the project is to develop an innovative model for proficiency testing of forensic analysts. In order to quantify error rates and accurately assess level of expertise, the model focuses on blinding the analysts. This new statistical approach would ensure that analysts are unaware that they are being tested, restrict the information they receive to task-relevant information, and use samples in forms indistinguishable from other samples seen in the everyday workflow of the laboratory. The aim is to reduce associated bias and human error in the forensic disciplines of fingerprints, toxicology, controlled substances, firearms, and shoeprints, with the ultimate goal of availability in all areas.

This new type of testing would:

i. Blind the analysts so they are kept from the fact that they are being tested.
ii. Restrict the information given to the analysts to task-relevant information.
iii. Be implemented by the laboratory management.
iv. Involve samples that are indistinguishable from other samples in the everyday laboratory workflow.
v. Use potentially new forms of experimental design.

The main objective of this project is to develop a manual for forensic laboratories so that management has the tools to implement blind testing as an alternative or complement to proficiency testing. Researchers plan to collaborate with laboratories during the development of the procedure in order to accurately assess current practices, and provide applicable solutions. Specifically, we will work closely with the Allegheny County Medical Examiner’s Office to learn about the specific procedures and idiosyncrasies of the laboratories, and how they could be incorporated into a blinding procedure. Researchers intend to summarize the efforts of the Houston Forensic Science Center to implement blind testing in order to fully comprehend the challenges faced.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? A PI change is being requested. Robin Mejia, a faculty member at the Carnegie Mellon University statistics department is interested in taking the lead from Cleotilde Gonzalez.


Mejia demonstrated tremendous enthusiasm for the project. She has already begun as a valuable collaborator, and is well positioned to lead the project. Cuellar brought Mejia up to speed by providing the previous quarterly reports, proposal, and other memorandums that were written in the past, and they met several times to discuss the project’s directions.

1) Major activities;

a. Early in the project year, the investigators met with Karl Williams, the Medical Examiner of Allegheny County, and presented the ideas of blinding and task-relevant vs. task-irrelevant information. They discussed plans to meet with the forensic analysts, better understand the flow of information in the laboratory, and learn about the LIMS content management system.

b. Maria Cuellar then gave a presentation to the forensic analysts at the Office of the Medical Examiner (ME) of Allegheny County, covering why blinding is important in terminology that a non-statistician could understand. She also met with Raymond Everett and Deborah Tator from the Allegheny County Medical Examiner’s Office (ACME). They discussed the flow of information in the laboratory, and Cuellar learned about the NIBIN system.

c. Maria Cuellar (CMU) and William Thompson (UC Irvine) met and discussed the opportunities of working with two forensic laboratories: the Houston Forensic Science Center and the Allegheny County Medical Examiner’s Office forensic laboratory.

d. The researchers have started to write a report on the flow of information at ACME (currently 10% complete). This could clarify the specific needs of the local laboratory for blinding, as well as the particularities of the ACME laboratory relative to the Houston laboratory.

e. At the suggestion of Karl Williams and Ashton Ennis from ACME, Robin Mejia and Maria Cuellar wrote an email to Blythe Toma to schedule a meeting at ACME. At this meeting, Mejia and Cuellar will study how ACME currently performs proficiency testing of its forensic examiners. Knowing the specific procedures will be the first step in developing a system for blind testing. The process for implementing blind testing at a forensic laboratory will be gradual, as the CSAFE researchers will need to become very familiar with the forensic procedures at ACME, as well as the current testing that ACME implements. ACME is enthusiastic about working on this project with CSAFE, despite the time constraints it faces.

f. Maria Cuellar and Robin Mejia assessed the challenges of implementing blind testing “from the ground up” in a new laboratory, and which components of the Houston Forensic Science Center experience are transferable vs. which issues still need to be addressed.

CSAFE Annual Report 06.2017 STATISTICAL FOUNDATIONS 26

g. Robin Mejia and Maria Cuellar discussed study design, including implementing blind testing in multiple phases to address laboratory concerns.

h. Mid-year, Maria Cuellar drafted a manual outline; as a result of the above discussions, the outline was significantly restructured. Please see the included draft.

Mejia and Cuellar will study how ACME currently performs proficiency testing of its forensic examiners. Knowing the specific procedures will be the first step in developing a system for blind testing, and thereby addressing the objectives of the project. The project is currently on track with respect to the design and planning stage of the study.

Ongoing conversations emphasized the importance of considering each laboratory’s unique history and situations, and raised the concern that implementing a fully blind study design in a new laboratory may present challenges. The researchers concluded that implementing blinding in stages may be a preferable option in some cases and revised plans to take this into account. See revision of manual outline.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Following completion of the blind testing model manual, CSAFE researchers will be able to communicate the resulting best practices for implementing blind testing procedures to forensic scientists. Researchers plan to provide training for staff at multiple forensic science laboratories.

Thanks to Cuellar’s meetings and presentation at the Allegheny County Medical Examiner’s office, forensic analysts, the medical examiner, and the department heads have a better understanding of blinding and contextual bias principles.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Presentation at the CMU CSAFE meeting to the CMU CSAFE group, the Medical Examiner and the Outreach Coordinator of the Office of the Medical Examiner.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS?

• Conduct an analysis of LIMS systems. Write a report about the findings.

• Identify key individuals at ACME to pursue the study of how to implement blind testing at ACME, and how it might differ from other laboratories, such as the Houston Forensic Science Center.

• Revise the manual outline and start writing some of the sections.

• Focus specifically on providing the opportunity for laboratories to implement changes in phases.

• Consider collaborators from NIST and try to identify a specific collaborator interested in this project.

• Explore current proficiency tests at ACME and compare them to those at the Houston Forensic Science Center.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS A revised manual outline is in progress.

A report on the flow of information at ACME is underway. This could clarify the specific needs of the local laboratory for blinding, as well as the particularities of the ACME laboratory relative to the Houston laboratory.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Robin Mejia, CMU (proposed PI, collaborator)

Cleotilde Gonzalez (PI, temporarily)

Maria Cuellar, CMU (PhD student, collaborator)

William Eddy, CMU (faculty, collaborator)

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? The Office of the Medical Examiner of Allegheny County remains interested in becoming a partner with CSAFE on this project.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Blind testing, in which analysts are not told that they are being tested, is missing from forensic analysis. This is problematic because without blinding, the proficiency tests taken by most forensic analysts might be providing misleading results. Forensic analysts rely on commercial proficiency tests to demonstrate and document their skills, but commercial test materials may differ in important ways from routine, “real world” evidence. As a result, analysts can often tell that they are being tested, and this awareness can lead them to analyze samples differently than they would in routine casework. Therefore, the results of a proficiency test taken with the analyst’s awareness can be heavily biased, and they do not necessarily represent the expertise of the analyst in everyday work. Instead, they only represent the expertise of the analyst on a specific test. Because an analyst’s performance on proficiency tests is often cited as evidence of true proficiency in the field, forensic analysis requires a new type of test that is not biased by awareness. Blind testing, in which information about the test is masked or kept from the participant, could provide a solution to the bias arising from the analysts’ awareness of the test. Although widespread in fields like medicine, biology, and psychology, blind testing is not often used in forensic laboratories. With blind testing, laboratory managers could estimate the error rates of the analysts and the laboratory as a whole in real world routine scenarios.

Some laboratories have already noticed that current proficiency tests are not satisfactory to evaluate analysts’ performance because analysts are not “blinded” during the test (that is, they know they are being tested so they might perform differently than they would during their everyday work). A notable example is the Houston Forensic Science Center (HFSC), a forensic laboratory that became independent from the Houston Police Department in 2013. The HFSC has recently started implementing blind testing in three of its disciplines: controlled substances, toxicology, and firearms analysis. Additionally, it has started using a context management system to decrease contextual bias and facilitate blind testing. Although HFSC has made some progress in the right direction, it still faces many challenges. For instance, it has encountered obstacles in acquiring samples to make the blind tests, designing the test samples so they look like the rest of the samples, and changing the information management systems currently available for forensic laboratories so they allow for restricting information from analysts.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? Once the manual is finished, it will be very useful to managers because it will provide them with guidance about how to implement blind testing in their laboratory.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? This project could have an impact on the LIMS used by most forensic laboratories. So far, the LIMS does not allow for blinding or sequential unmasking. Thus, the analyst sees all the information on each sample, which could lead to various types of contextual bias. It would be very useful to design a LIMS that does allow for blinding or sequential unmasking.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? This study could be important to the forensic community because it will allow laboratories to evaluate their analysts and the laboratory using the blinding mechanisms that are widespread in other scientific fields. It could also be important for the legal community because it will provide a different standard for evaluating the quality of a forensic analysis that is presented in court. However, specifically how the results of the blind test should be presented is a challenge that the investigators are considering in the study. This project could also be of interest to academic statistics because it can present an important new application of experimental design in the real world. Therefore, this study could be important for forensics, the law, and academia.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? Zero.

Changes/Problems

Following the death of Professor Stephen Fienberg, the project was temporarily transferred to Cleotilde Gonzalez. A permanent PI change is being requested. Robin Mejia, a faculty member at the Carnegie Mellon University Statistics Department, is interested in taking the lead from Cleotilde Gonzalez. Mejia demonstrated tremendous enthusiasm for the project. She has already begun as a valuable collaborator, and is well positioned to lead the project. Cuellar brought Mejia up to speed by providing the previous quarterly reports, proposal, and other memorandums that were written in the past, and they met several times to discuss the project’s directions.

Progress on the report on the flow of information at ACME (currently 10%) continues to be delayed. We focused on other key aspects including revising the manual outline and meeting collaborators, and plan to complete the report by September 2017.

Project DD - Probabilistic Framework for the Evaluation of Complex Forensic Inference: Accounting for uncertainty, multiple data sources and expert judgement

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Anjali Mazumder (CMU faculty)

Other Investigators: Jay Kadane (CMU faculty), Cleotilde Gonzalez (CMU faculty); Alicia Carriquiry (ISU faculty), Amanda Luby (CMU PhD student), Maria Cuellar (CMU PhD student), Lotem Kaplan (CMU postdoc)

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The overarching goal of this project is to develop a coherent, systematic and probabilistic framework for planning, inference and interpretation of complex cases that accounts for multiple types of evidence, addresses different proposition levels, possible causal and conditional independence relationships and incorporates expert judgement and multiple sources of uncertainty. This requires careful consideration of case circumstances and collaboration with forensic scientists and lawyers. The initial goal was to use cases of sexual assault and utilize a graphical model structure and develop a probabilistic framework for evaluating cases at the activity level of propositions and where data/studies may not be available and proxies or expert judgement may need to be incorporated.

Objectives from Project Request:

• Develop a coherent, systematic, and probabilistic framework for planning, inference, and interpretation of complex cases that accounts for multiple types of evidence, addresses different proposition levels, and incorporates expert judgement and multiple sources of uncertainty.

• Develop a methodological framework for eliciting expert probabilities from forensic scientists in addressing activity level propositions.

• Develop statistical methods (including probabilistic graph structures) to address the computational complexities of combining different graph modular substructures, model evidence conflict (model selection), and facilitate the case circumstances and time or sequence of events.

• Provide a probabilistic graphical structure for evaluation and interpretation of any complex scenario and demonstrate the complex relationship between variables, logical statements and uncertainties for use in court.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS?

• We continued to explore and expand the literature review to consider additional types of graphical structures used to evaluate, represent and communicate forensic evidence, drawing upon the forensic science, evidence and legal literature. Efforts to understand the aims of each of these frameworks and the relationships between them are ongoing. Determining equivalences is required and is expected to offer both practical and methodological opportunities. Some preliminary investigation into the causal nature of addressing activity level propositions has also begun, and will add to the Bayesian network and chain event graph methods currently being proposed.

• This work has recently focused on the translation of Bayesian networks (BNs) into formalized arguments. Arguments are logical statements that are often qualitative, and interpretation of arguments involves inferential or deductive reasoning, which is more intuitive for non-statisticians than the conditional dependence interpretation in BNs. We are working on ways to extract arguments from an existing BN in a way that both preserves the statistical properties of the variables of interest and qualifies the chain of reasoning. This includes simple, intuitive arguments such as “Latent print X matching reference print Y on N minutiae is evidence for reference print Y being the source of fingerprint X”, as well as more complex arguments such as “The combination of the victim’s blood on suspect’s sweater, the shape of the bloodstain on sweater, and the victim’s testimony support the proposition that the suspect punched the guard”. Through evaluation of the probabilities found in the BNs, we aim to identify which piece of evidence has the largest impact on a proposition and which observed values of evidence will lead to a change in the predicted value of the proposition at hand. This could be useful in a forensics lab to determine which piece of evidence should be given priority for analysis in different cases.

• Survey of literature including Bayesian networks and formal argumentation systems in forensic science (60% complete).

• Faculty member Brian Junker has begun engaging with Anjali Mazumder and Amanda Luby on critically reviewing the different graphical and evidential reasoning structures and their relationships (35% complete).

• Application of selected frameworks to casework provided by the Allegheny County Office of the Medical Examiner (ACMEO), the Metropolitan Police (London, UK) and the former UK Forensic Science Service (5% complete).

• Continued collaboration with the Allegheny County Medical Examiner’s Office (Ashton Ennis) and work with lawyers at University of West Virginia (Valena Beatty) are actively underway to gather cases and case circumstances to utilize in the development of a probabilistic framework. This involves CSAFE researchers Anjali Mazumder and Amanda Luby. Nine cases have been identified with ACMEO and more are underway through Valena Beatty.

2) Specific Objectives:

• The paper on “modelling activity level propositions using chain event graphs” contributes to the overarching objective of developing a coherent, systematic and probabilistic framework for complex forensic inference, as well as objectives 2 and 4 of the project proposal on addressing issues of uncertainty and expert judgement, and providing a graph modular structure for interpretation. However, these issues are not yet fully addressed; further case circumstances can illuminate them and lead to statistical methods and the development of a more comprehensive framework. Little progress has been made on this paper in this quarter; however, significant effort has been made to survey the literature on graphical model structures for evaluating forensic evidence and applying them to cases.

• Cases and case circumstances being made available through the collaborations will aid in the overarching development of a coherent, systematic and probabilistic framework for complex forensic inference and statistical methods to account for uncertainty, expert judgement, etc.
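The evidence-ranking idea described under the accomplishments above (identifying which piece of evidence has the largest impact on a proposition) can be illustrated with a minimal sketch. It is not the investigators' model: it assumes the simplest possible network, in which a single binary proposition H is the only parent of every evidence node, and all probabilities are invented for illustration. The evidence names are borrowed from the example argument in the text.

```python
# Hypothetical prior for the proposition H (e.g. "the suspect punched the guard").
prior_h = 0.5

# Hypothetical conditional probabilities, invented for illustration only:
# name -> (P(evidence observed | H), P(evidence observed | not H))
evidence = {
    "blood_on_sweater": (0.70, 0.05),
    "stain_shape": (0.60, 0.30),
    "testimony": (0.80, 0.40),
}


def posterior(observations):
    """P(H | observations), assuming evidence items are independent given H."""
    odds = prior_h / (1 - prior_h)
    for name, seen in observations.items():
        p_h, p_not_h = evidence[name]
        if seen:
            odds *= p_h / p_not_h
        else:
            odds *= (1 - p_h) / (1 - p_not_h)
    return odds / (1 + odds)


# Baseline: every piece of evidence is observed.
observed = {name: True for name in evidence}
baseline = posterior(observed)

# Impact of an item = drop in the posterior if that item had NOT been observed.
impact = {}
for name in evidence:
    flipped = dict(observed, **{name: False})
    impact[name] = baseline - posterior(flipped)

ranked = sorted(impact, key=impact.get, reverse=True)
print("posterior with all evidence:", round(baseline, 3))
for name in ranked:
    print(name, round(impact[name], 3))
```

In a real Bayesian network the evidence nodes would generally not be conditionally independent, and the same sensitivity calculation would be done with a BN inference engine rather than by multiplying odds directly.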

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED?

• Learning opportunities for Amanda Luby include research, literature review and presenting on a new topic of statistics as it interfaces with forensic science and evidence argumentation.

• Anjali Mazumder continues to lead in this area and expands the opportunity for collaboration and presenting.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST?

• Presentations at the Isaac Newton Institute in Cambridge, UK, program on Statistics and Probability in Forensic Science

• Poster and presentation planned at Joint Statistical Meetings in Baltimore

• Two abstracts accepted for oral presentation to International Conference of Forensic Inference and Statistics 2017

• A brief presentation was given by A. Mazumder at the CSAFE All Hands Meeting.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS?

• Resume the draft of a paper (by Anjali Mazumder) proposing the use of chain event graphs in modelling asymmetric evidence, which often arises in cases addressing activity level propositions. Limited progress was made on the draft in this quarter, but it is expected to take priority in the coming two quarters.

• Future work includes applying these different frameworks to casework provided by the Allegheny County Office of the Medical Examiner (ACMEO) and to other cases provided by international collaborators, and identifying strengths and weaknesses of these methods in real casework. As we become familiar with common evidential reasoning that arises in casework, we will move to the construction of BNs, including the elicitation of probabilities from experts. We believe that including a qualitative argument framework in the elicitation process will lead to more success in elicitation than a quantitative framework alone, particularly when addressing activity-level propositions.

• Jointly with Project K on Databases, we have proposed to deliver a mini-workshop at CMU on databases and complex forensic inference. These two projects connect in various ways, for instance: (a) in the absence of databases and data, how to address forensic inference questions, and (b) how to include and evaluate the weight of evidence from a database search when presenting the joint or combined evidence.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS

• The Isaac Newton Institute in Cambridge, UK, ran a program on Statistics and Probability in Forensic Science from July to December 2016. Three CSAFE researchers were invited to speak at the second workshop, entitled Bayesian Networks and Evidence Argumentation Analysis. These presentations are available online:

o Modelling Activity Level Propositions Using Chain Event Graphs (Anjali Mazumder)

o Cause and Effects Framework and Bias in the Evaluation of Evidence (Maria Cuellar)

o A Graphical Model Approach to Complex Forensic Inference: Accounting for Uncertainty with Eye-witness Evidence (Amanda Luby)

• Poster and presentation at Joint Statistical Meetings in Baltimore, MD (July 29 - Aug 3); abstract accepted.

• Two abstracts accepted for oral presentation to International Conference of Forensic Inference and Statistics 2017: one by A. Luby on a review and comparative assessment of the different frameworks, and one by A. Mazumder on addressing activity level propositions using chain event graphs.

• A brief presentation was given by A. Mazumder at the CSAFE All Hands Meeting regarding graphical model approaches to addressing complex forensic inference, including activity level propositions.

WEBSITE(S) OR OTHER INTERNET SITE(S) Videos and presentations from the workshop are available at: http://www.newton.ac.uk/event/fosw02/timetable

TECHNOLOGIES OR TECHNIQUES The presentation on the use of chain event graphs for modelling asymmetric evidence which often arises in activity level propositions or when combining evidence resulted in an interest to have a half-day training seminar in the area as it was seen as a logical and plausible technique in the evaluation for such complex cases. This is currently being proposed for early November.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT?

• Anjali Mazumder has focused on objectives 1 and 2 of the proposal, developing a probabilistic framework for addressing activity level propositions, and collaborating with external partners to gather cases and case circumstances to better inform a probabilistic framework for complex forensic inference to support evidence evaluation in the courts.

• Brian Junker (faculty) has been working with Anjali Mazumder and Amanda Luby since January with a focus on understanding the different graphical and evidential reasoning frameworks and their relationships.

• Amanda Luby focused on exploring cases for combining evidence, surveying the literature regarding graphical model structures and accounting for uncertainty using a graph modular structure approach.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS?

• Professor Jim Smith (University of Warwick) is a collaborator on the paper and work on modelling activity level propositions using chain event graphs. He is a vital collaborator as he has done the foundational theoretical work in chain event graphs.

• Dr Ashton Ennis (Allegheny County Medical Examiner’s Office) is a forensic pathologist who attends our weekly CSAFE meetings and also has additional regular meetings with Anjali Mazumder. He is gathering relevant cases and case circumstances to support the development of a coherent, systematic and probabilistic framework for complex forensic inference.

• Valena Beatty at the University of West Virginia is Deputy Director of the Clinical Law program there and Chair of the West Virginia Innocence Project. She is working closely with Anjali Mazumder, gathering cases and identifying issues relating to the evaluation of scientific and statistical evidence and expert judgement to aid in the development of a coherent, systematic and probabilistic framework.

• CSAFE has not provided any financial support to any of these collaborators/partners.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The probabilistic and graphical structure being utilized supports logical statements and provides a mechanism for communicating complex probabilistic statements among forensic scientists, lawyers, judges and juries. Such a tool can be useful during a criminal investigation and in court.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? This is likely to have an impact not just on forensic science but also on the broader evaluation of evidence in the law (criminal and civil) to serve justice.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? This provided CSAFE researchers with exposure to the international forensic science, statistics and law communities, which was very valuable to PhD students and postdocs as they develop their careers.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? This work is still in its infancy; however, there is enthusiasm, as witnessed by the request to have a training workshop in the use of one of these techniques. This may have a broader impact for supporting statistics and the law.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? Zero.

Changes/Problems

There have been delays due to various reasons, including Stephen Fienberg’s untimely death as well as changes in my availability. Much of the international collaboration and engagement has been delayed. The expectation is that this will now resume over the next year, and conversations are taking place about visits and opportunities to advance Tasks 2, 4, 5 and 6, which involve international collaborators including academics and practitioners. For Task 1, as noted above, we have gathered 9 case files so far but have yet to summarise these and draw out the associated statistical issues. This is expected by April 2018, with some initial work being completed by December 2017. With respect to Task 3, the initial framework was presented at the workshop in Cambridge and will be presented at ICFIS in September. It is expected that a draft of this paper will be submitted by December 2017. Task 2 is the focus of much of Amanda Luby’s PhD work and will be the focus over the next 18 months.

We propose the updated time table below.

CSAFE Year | Task Number | Investigator(s) | Deliverable(s) | Delivery Schedule

2015 | | | |

2016 | Task 1 | Mazumder, Fienberg | Meet and document summary of case circumstances and associated statistical issues. | November 2016 (to be continuously updated in 2017 & 2018)

2016 | Task 3 | Mazumder | Present initial framework for activity level | September 2016 and into 2017

2017 | Task 4 | Mazumder | Engage with forensic scientists and lawyers on statistical issues queries regarding the evaluation, interpretation and probative value of evidence arising in court questions and enquiries. | March 2017 to late summer 2018

2017 | Task 2 | Mazumder, Luby | Draft and develop a framework for modelling complex probabilistic forensic inference | December 2017, and continue into late 2018

2017 | Task 5 | Mazumder, Fienberg, Luby, Cuellar | Meeting with national and international collaborators, regarding case scenarios, associated statistical issues and publish case scenarios with probabilistic framework. | August 2018

2017 | Task 6 | Mazumder, Fienberg, Kadane, Luby, Carriquiry | Develop coherent framework for planning/decision-making, inference and interpretation of complex cases | Summer 2018

2018 | Task 7 | Mazumder, Fienberg, Kadane, Luby, Gonzalez | Develop and report on proposed methodological framework for eliciting expert probabilities and addressing complex activity level propositions | December 2018

2018 | Task 9 | Mazumder, Luby, Fienberg, etc. | Provide training to forensic scientists and lawyers on guidance and probabilistic framework for evaluation of complex forensic inference and decision-making | September 2018 and continue into 2019

2019 | Task 8 | Mazumder, Luby, Fienberg | Report, publish and develop a computational approach to combining likelihoods and sub-models. | April 2019

2019 | Task 10 | Mazumder, Fienberg, Kadane, Carriquiry, Gonzalez, Luby, Cuellar | Publish research methods | April 2019

Project HH - Statistical Modeling Framework for Pattern Evidence

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Dan Spitzner (UVA)

Other Investigators: Maria Tackett (UVA)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The project team identified an expanded and refined set of goals relative to their collaborative work, completely anchored to the project's originally stated goals. The originally stated goals are as follows: The objectives of the project are to develop Bayesian clustering methods with the idea to apply them to forensic data that come in the form of “categories” (e.g., (i) latent print feature type 1 is loop, whorl, or arch; (ii) latent print feature type 2 is ridge ending (RE), bifurcation, dot, or other). While developing such methods, the investigators work from the point of view of “robust” Bayesian inference with the objective to develop guidelines for the production of posterior probabilities with certain desirable properties, including those that improve the interpretability of reporting evidence.

As one expansion of these goals, the investigators have softened their focus on categorical data (i.e., data that come in the form of categories) to broadly consider all types of available pattern data, categorical or measured in precise numerical values. As another expansion, the investigators have found it worthwhile to set their sights beyond clustering toward general multiple testing in forensic inference, among which clustering is one type. Such expansion is largely semantic, for the investigators have come to regard categorical data as one type of data among a rich landscape of data-types, and clustering as one inferential goal among many relevant goals that involve multiple testing. They know how to incorporate these particular concepts and no longer wish to single them out as they develop the general methodology.

As for refinements of the originally stated goals, the investigators have identified two precise goals indicated above: One refined goal is to develop guidelines for designing and improving databases so they are sufficiently informative for inference, both in the construction of error models and in the elicitation of prior information. A second refined goal is to delineate the dependency between evidence reports when there are multiple inputs contributing data to an investigation.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Dan Spitzner and Maria Tackett have continued their investigation of an inferential framework for pattern data, such as fingerprint and shoe print data. Among the objectives are to understand the limits of meaningful information that is provided by databases, and the relevance of sophisticated constructions of the likelihood ratio that take into account the dependencies in evidence that come about when there are multiple inputs to an investigation.

We have completed a "gold standard" conceptual inferential framework that clarifies the contributions of databases to statistical inference. It identifies the parameters of the probabilistic model that arise from pattern data that are transformed into mathematical vectors via any of various "metrics," such as those that quantify features of minutiae in fingerprints. More importantly, it identifies what type of information is needed in databases in order to provide suitable prior knowledge for performing inference on those parameters, and hence for calculating a meaningful likelihood ratio.

The framework has thus far been described in the abstract, in terms of "instruments" measuring "pattern sources," the former referring abstractly to the process of measuring a pattern mark (e.g., "dusting" for fingerprints), and the latter referring to the object that made the mark (e.g., the ridges on the pad of a fingertip). It has been learned that rigorous inference requires database information not only on populations of pattern sources that would be measured by a given instrument, but also information on the populations of instruments that would measure the marks.

Regarding their explorations of likelihood ratio constructions that take into account multiple inputs to an investigation, the investigators have carried out a number of numerical explorations in order to determine the potential impact of dependencies in evidence. For example, one question is whether evidence arising from a secondary person of interest would impact the weight of evidence for a different individual's involvement in a crime. Numerically, we have learned that it would.
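A toy calculation, offered only as an illustration and not as the investigators' own numerical exploration, shows how such a dependency can arise. The sketch assumes exactly one of three hypothetical people (A, B, C) is the source, equal prior probabilities, and two evidence items that are independent given the true source; every number is invented.

```python
from fractions import Fraction

# Hypothetical setup: exactly one of three people left the mark.
people = ["A", "B", "C"]
prior = {p: Fraction(1, 3) for p in people}

# Hypothetical likelihoods P(evidence | true source), invented for illustration.
# e_a is evidence pointing toward A; e_b is evidence pointing toward B.
lik_e_a = {"A": Fraction(9, 10), "B": Fraction(1, 10), "C": Fraction(1, 10)}
lik_e_b = {"A": Fraction(1, 20), "B": Fraction(4, 5), "C": Fraction(1, 20)}


def likelihood_ratio(target, likelihoods):
    """LR for 'target is the source' versus 'someone else is the source'.

    Evidence items are assumed conditionally independent given the true source.
    """
    def joint(person):
        prob = Fraction(1)
        for lik in likelihoods:
            prob *= lik[person]
        return prob

    others = [p for p in people if p != target]
    weight = sum(prior[p] for p in others)
    # Alternative hypothesis: average over the other people, weighted by prior.
    denominator = sum(prior[p] * joint(p) for p in others) / weight
    return joint(target) / denominator


lr_alone = likelihood_ratio("A", [lik_e_a])          # evidence about A only
lr_both = likelihood_ratio("A", [lik_e_a, lik_e_b])  # plus evidence about B
print(float(lr_alone), float(lr_both))
```

Under these assumptions the evidence concerning B substantially reduces the likelihood ratio for A's involvement, because the alternative "someone other than A" includes B. The size and direction of the effect depend entirely on the invented probabilities.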

The next steps are to pull the framework and related observations out of their abstract context and toward the concrete forensic contexts in which they would be used, such as fingerprints, shoe marks, and possibly others. There is no expectation that the formulated "gold standard" is met in any of these contexts, but it would nevertheless be illuminating and useful to identify where precisely current inferential approaches, and associated databases, fall short. An intriguing question that arises is how to define an "instrument," which is easy to answer in the abstract, but is less clear in concrete contexts. Regarding the impact of multiple inputs in a forensic investigation, the investigators are working to understand the relevance of their observations made in the abstract to specific concrete contexts.

In the meantime, Dan Spitzner is completing his manuscript on concepts in robust Bayesian hypothesis testing, which will be presented at a conference in early August.

We completed two tasks and provided deliverables listed in the original project proposal:

• Task 1 - A clustering formulation preliminary report was submitted on 7/14. This journal entry was the basis of the presentation and poster at the CSAFE All Hands meeting in June.
• Task 2 - As for the robust Bayes formulation preliminary report, two draft manuscripts (sent on 1/30) were submitted as the preliminary draft.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Maria Tackett, a student from UVA, has gained significant professional experience and expertise through involvement in these activities. Maria has been involved in understanding specific forensic contexts (e.g., fingermarks, shoe prints, etc) and mapping concepts formulated in the abstract to the concrete.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Dan Spitzner participated in the CSAFE All Hands Meeting in June, 2017. He presented a poster, gave a short presentation, and led a discussion during that meeting.

Dan Spitzner's abstract has been accepted for poster and lecture presentation to the 10th International Conference on Forensic Inference and Statistics (ICFIS), scheduled for September 2017 in Minneapolis.

Dan Spitzner's abstract has been accepted for lecture presentation to the 2017 Joint Statistical Meetings, scheduled for August 2017 in Baltimore.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? Dan Spitzner and Maria Tackett will pursue two specific goals: to identify precisely where current inferential approaches fall short of the "gold standard" inferential framework, and to understand the relevance of dependencies in evidence. This will require a deeper understanding of existing techniques for processing and analyzing pattern data in specific forensic contexts.

We will continue to make progress on two deliverables listed in the original project proposal:

• Clustering assessment draft of final report (due 8/31/2017) - We are scheduled to present a paper at the upcoming ICFIS meeting on September 5-8, and will present results from this portion of the project at that meeting. Conference materials will be submitted as a draft of the final report.

• Robust Bayes experimentation report on final interpretation (8/31/2017) – Results are being presented at the JSM conference. A manuscript is nearly complete and will be submitted for publication in a peer-reviewed journal.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS
• Dan Spitzner participated in the CSAFE All Hands Meeting in June 2017. He presented a poster, gave a short presentation, and led a discussion during that meeting.
• An abstract has been submitted and accepted to the 2017 Joint Statistical Meetings, scheduled for August 2017 in Baltimore. The results of Spitzner's work on a robust Bayes approach to calculating posterior probabilities will be presented at that conference.
• Dan Spitzner's abstract has been accepted for poster and lecture presentation to the 10th International Conference on Forensic Inference and Statistics (ICFIS), scheduled for September 2017 in Minneapolis.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Dan Spitzner (principal investigator) – Oversees the work described above.

Maria Tackett (graduate student) – Focuses on understanding specific forensic contexts (e.g., fingermarks, shoe prints, etc.) and mapping concepts formulated in the abstract to the concrete.

No international collaborators.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The methodology is under formulation and has not yet had a significant impact. If successful, the project will clarify and improve the toolbox of clustering methods available to the forensic scientist. The project will provide insight into the advantages and disadvantages of existing clustering methodology, understood both in terms of interpretation and statistical performance, while at the same time offering guidance for the use and development of methodology for potentially atypical scenarios. Such understanding of interpretation and performance will be of special value to applied forensic scientists working with latent fingerprint feature data and other categorical data, for which accurate, precise, and interpretable output is a primary concern. (For example, the 100 ‘top-scoring AFIS matches’ to a given latent print from a crime scene might be clustered into ‘highly likely’, ‘very likely’, ‘likely’, ‘possibly likely’, ‘not very likely.’) The academic community will benefit from insights into the development of methodology. The project’s focus on interpretable, “robust” posterior probabilities will clarify and simplify the presentation of forensic evidence in terms of statistical summaries, thus benefitting the forensic science and legal communities in particular, while offering interesting and useful new concepts for the academic community to discuss and refine.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The methodology is under formulation and has not yet had a significant impact.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None of the budget is being spent in foreign countries.

Changes/Problems Nothing to report

FIREARMS AND TOOLMARKS

Project O - Developing Methods for Comparison of Cartridge Breechface Images

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: William F. Eddy

Other Investigators: Xiao Hui Tai (PhD student), Xiaoyu Alan Zheng (NIST)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? Major goals as stated in approved application:

- Create an open-source method for computing numerical signatures from cartridge breechface images
- Develop statistical methods for comparing signatures
- Address statistical modeling not addressed by NIST's research on identifying characteristics

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? The table below outlines our progress thus far:

Year | Probabilistic Topic Addressing | Delivery Schedule | Actual Completion
2016 | Preliminary fit of circularly symmetric basis functions to a subset of the available data | April | April
2016 | Check the residuals and refine the model/method | September | October
2016 | Finalize signature/comparison algorithm | December | November
2017 | Establish methods to determine matches and quantify error rates | February | November 2016
2017 | Make software package available online | March | February
2017 | Prepare 2D material for journal submission | May | May
2016-2017 | Explore 3D data | December | 5%
2017 | Explore lower-dimensional representations of the data or search space for faster comparisons | December | 5%
2018 | Adapt all methods for use on 3D topographies | June |
2018 | Prepare for another journal submission | December |
2018 | Educating forensic scientists and/or manufacturers, for example by presenting at AFTE meetings/conferences | December | 20%

Early in the project, we presented preliminary results at NIST and received good feedback and suggestions. One of them was to cut out the firing pin impression and outer regions and focus on the breechface, which we have done. This was the first step in a fully automated algorithm that we have developed.

We tested all the methods on the NBIDE data set, which was collected by NIST as part of their 2007 evaluation of the feasibility of a National Ballistics Imaging Database. We analyzed 108 of the images, taken using 12 guns, each fired 9 times (3 times with each of 3 different brands of ammunition). We ran our comparison algorithm on these images, computing similarity scores for all pairwise comparisons. For each of these comparisons, we also computed the probability of obtaining a higher score by chance given a known database, based on the methodology that we proposed.
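The size of this comparison task follows directly from the design described above (12 guns, 9 images each). The sketch below counts the pairwise comparisons and shows one simple way a "probability of a higher score by chance" could be estimated from a reference set of known non-match scores. The empirical-fraction estimator and the function name `prob_higher_score` are assumptions made for illustration, not the method proposed in the project.

```python
from itertools import combinations

N_GUNS = 12
IMAGES_PER_GUN = 9  # 3 cartridges from each of 3 ammunition brands

# Gun label for each of the 108 images.
labels = [gun for gun in range(N_GUNS) for _ in range(IMAGES_PER_GUN)]

# All unordered pairs of distinct images.
pairs = list(combinations(range(len(labels)), 2))
n_match = sum(labels[i] == labels[j] for i, j in pairs)
n_nonmatch = len(pairs) - n_match

print(f"images: {len(labels)}, pairs: {len(pairs)}")
print(f"same-gun pairs: {n_match}, different-gun pairs: {n_nonmatch}")


def prob_higher_score(score, reference_scores):
    """Fraction of reference (known non-match) scores strictly above `score`.

    Assumed estimator: a plain empirical tail fraction.
    """
    return sum(s > score for s in reference_scores) / len(reference_scores)
```

With this design there are 5,778 pairs in total, of which 432 come from the same gun, so known non-matches outnumber known matches by more than twelve to one.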

As part of the analysis of the results, we looked at the distributions of similarity scores for true matches and non-matches, and observed that there was not a good separation between the distributions. We also compared the results to those obtained using previously published methodology, as well as results that were published in NIST's 2007 report. We found that our method improved performance compared to previous work using 2D images, but performed more poorly than methods using 3D topographies.

Using the maximum cross-correlation value as a similarity metric, we demonstrated that our methodological enhancements increase the number of matches in top 10 lists compared to previously published research on the NBIDE data set. We also see improvements compared to methods involving a manual selection of breechface marks, and to methods without the removal of circular symmetry.
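The "matches in top 10 lists" criterion can be sketched as follows: for each query image, rank all other images by similarity and count how many of the highest-ranked ones come from the same gun. This is an illustrative reading of the criterion, assuming a symmetric similarity matrix and known gun labels; the function name `matches_in_top_k` is hypothetical.

```python
import numpy as np


def matches_in_top_k(similarity, labels, k=10):
    """For each image, count same-gun images among its k most similar others."""
    sim = np.array(similarity, dtype=float)  # copy, so the input is untouched
    labels = np.asarray(labels)
    np.fill_diagonal(sim, -np.inf)  # an image is never compared with itself
    counts = []
    for i in range(len(labels)):
        top = np.argsort(-sim[i])[:k]  # indices of the k highest scores
        counts.append(int(np.sum(labels[top] == labels[i])))
    return counts
```

With 9 images per gun, as in the NBIDE subset described above, each query can have at most 8 true matches in its top 10 list.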

Our algorithm consists of the following steps:

1. Automatically select breechface marks
2. Level image
3. Remove circular symmetry
4. Outlier removal and filtering
5. Maximize correlation by translations and rotations
6. Compute probability of obtaining a higher score by chance

This builds on published methodology, and steps 1, 3 and 6 are improvements. In particular, this algorithm is fully automated and quantifies the uncertainty by computing the probability of obtaining a higher similarity score. We demonstrate that removing circular symmetry in step 3 reduces the similarity score for non-matches.
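Step 5 can be illustrated with a brute-force search over small translations. The sketch below is a simplification under stated assumptions: it uses circular shifts via `np.roll`, searches integer translations only, omits rotations entirely, and is not the implementation in the cartridges package. The function name and the `max_shift` parameter are hypothetical.

```python
import numpy as np


def max_cross_correlation(a, b, max_shift=3):
    """Largest Pearson correlation between `a` and circularly shifted `b`.

    Returns (best_correlation, (row_shift, col_shift)), where the shift is
    the one applied to `b`. Rotations are not searched in this sketch.
    """
    best_r, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))
            r = np.corrcoef(a.ravel(), shifted.ravel())[0, 1]
            if r > best_r:
                best_r, best_shift = r, (dy, dx)
    return best_r, best_shift
```

A full registration would also search over rotation angles and would handle image borders explicitly rather than wrapping them around.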

We organized all of the available code and produced an open-source software package that implements all the steps of our proposed method to compare cartridge breechface images. This software package is available at https://github.com/xhtai/cartridges. Users can download the software and example images, and perform comparisons according to our method.

We also ran comparisons on all the data available in NIST’s database. Briefly, there is again no cutoff that separates the similarity scores for matches from those for non-matches, meaning that a purely threshold-based approach for classification would not be particularly successful. In addition, researchers compared results by individual data sets, finding that those involving consecutively manufactured slides have very good performance, while those involving multiple copies of the same gun have the poorest performance. These results are comparable to what has been published by other groups (e.g. Vorburger et al. 2007; Roth et al. 2015).

The paper that we submitted to the Journal of Forensic Sciences was accepted and will be published in March 2018. In terms of educating forensic scientists, we also presented our work at AFTE 2017, on 5/16.

We have continued work on a second manuscript describing the results on all available data, with a heavier focus on the statistical details. We have also started exploring the use of wavelets for a lower-dimensional representation of the data.

During talks that were given about this project, a question was raised about dependence between pairwise comparisons. We have completed additional analysis to further explore this concern and have found that the removal of dependence does not qualitatively affect the results.

We visited Iowa State University twice to learn about 3D microscopy. On the first visit (1/17-18) Bill Henderson from Sensofar provided an overview and demonstrated the system. On the second visit (3/8-9) Alan Zheng and Mike Stocker from NIST went over calibration and standardization. We also shared our latest results and got useful feedback on the project and future directions.

We have been in discussions with the group at Iowa State University about collecting new data using their Sensofar instrument. Specifically, we have suggested a sampling scheme for the cartridges that they recently obtained from the black-box study conducted by the FBI and Ames Lab. These new data would allow us to compare results obtained using algorithms with those from human examiners.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? In early 2017, we mentored two groups of summer undergraduate research students, who completed their projects which made use of data from NIST's Ballistics Toolmark Research Database. They were tasked to investigate both 2D and 3D breechface data. For 2D data, they fit simple statistical models and investigated the effectiveness of using residual sums of squares as a similarity metric. In terms of 3D data, some of the early work involved investigating how the methods that we have developed for 2D might transfer to 3D.

Additionally, Xiao Hui Tai, a PhD student, has learned a great deal through involvement in this project. It has provided an excellent professional development opportunity.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We presented our work at AFTE 2017 on 5/16, and at the CSAFE All Hands meeting on 6/8. Our manuscript has been accepted by the Journal of Forensic Sciences and will be published in March 2018.

The 2016 SURE undergraduate researchers presented posters at a poster session which was open to all members of the CMU community.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? Research to meet the stated objectives will continue. We are continuing work revising a second manuscript based on the latest results from analysis of all the data in NIST’s database, which we plan to submit to a statistics journal.

We have started exploratory work on finding a lower dimensional representation of the images, and this work will continue.

We will work to adapt a method for use on 3D topographies. We have been in discussions with the CSAFE group at Iowa State University about collecting new data using their Sensofar instrument. Specifically, we have suggested a sampling scheme for the cartridges that they recently obtained from the black-box study conducted by the FBI and Ames Lab. These new data would allow us to compare results obtained using algorithms with those from human examiners. This collaboration also has a component about repeatability of measurements, and NIST has agreed to make the same measurements using their instrument. We will help with the analysis of results, if and where appropriate.

We will also continue to seek feedback from researchers at NIST as our work continues.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Journal publication - Xiao Hui Tai and William F. Eddy. A Fully Automatic Method for Comparing Cartridge Case Images. Journal of Forensic Sciences. Forthcoming 2018.

OTHER PUBLICATIONS, CONFERENCE PAPERS AND PRESENTATIONS.
• Machine Learning Lunch Seminar at CMU on 10/31. The seminar was open to all members of the CMU community and was attended in particular by people interested in machine learning and statistics.
• Forensics@NIST 2016. We gave a talk to the broader forensics community at the Forensics@NIST conference, held on 11/8-9.
• Statistics department presentation at CMU on 11/29.
• AFTE 2017 on 5/16.
• CSAFE All Hands Meeting on 6/8.
• Allegheny County Medical Examiner’s office presentation on 1/13. Those in attendance included firearms examiners.

WEBSITES Our package is available at https://github.com/xhtai/cartridges. Users can download the software and example images, and perform comparisons according to our method.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Bill Eddy (PI) and Xiao Hui Tai are the lead researchers conducting the research, analysis and disseminating results as reported.

Xiaoyu Alan Zheng: scientific advisor. Alan provides advice relating to the NIST database and data acquisition, and NIST research on ballistics identification.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? NIST in Gaithersburg, Maryland: John Song and Chen Zhe at NIST are leading researchers in ballistic identification algorithms. They are interested in the methodology that we are developing and we have sent residuals from the model to be tested with matching algorithms that they have developed. Thus far, researchers have had productive discussions that will aid in the advancement of the project, and additional conversations will ensue as the research progresses.

Additional contacts include Joseph Roth from Michigan State University, who kindly shared his implementation of published NIST techniques for comparisons, in particular pre-processing steps, registering and computing the cross-correlation function as a similarity metric.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Current ATF technology for automated matching of 2-dimensional cartridge breechface images requires the use of an unknown proprietary technology. The matching process produces a score that has no known interpretation apart from allowing the ranking of matched images. In court proceedings, examiners are unable to quantify the weight of the evidence as there are no established methods to describe the error rates of such matching algorithms.

By developing open-source methods, this research allows the assessment of the practicality of such automated matching algorithms. We have improved on available matching algorithms and developed a fully automated method that attaches a measure of uncertainty to the results. This work enables fast and reliable matching of cartridge images, thus reducing the reliance on subjective opinions of human examiners as well as increasing the time and resources available to examiners. The ultimate aim of this project is to use the developed statistical model for improved accuracy in court testimony.

We also hope to free up the time and resources that these firearms examiners currently spend searching through images and results returned by the current NIBIN system.

Changes/Problems Nothing to report.

Project CC – Statistical and Algorithmic Approaches to Matching Bullets

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Heike Hofmann (ISU)

Other Investigators: Eric Hare (ISU), Haley Jeppson (ISU)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT?
1. Develop computer assisted algorithms to extract cross-sectional bullet signatures using 3D images.
2. Identify signature features or combination of features that carry most information and enable discrimination across samples.
3. Construct a signature-based score that provides a quantitative summary of the difference in striation patterns of two bullets.
4. Explore the behavior of the comparison score among known matching bullets and known nonmatching bullets, and determine a threshold to classify bullets into the match/non-match groups that results in maximum sensitivity and specificity.
5. Investigate whether it may be possible to extract a signature from a smaller area to speed up the comparison across samples.
6. In collaboration with Alan Zheng at NIST, assemble a large database of bullet, casing and breech face 3D images that can be used for research purposes by the scientific community. This objective is conditional on ISU’s success in raising funds for the purchase of an infinite focus, 3D microscope.
7. Investigate whether any algorithm developed using high quality images is robust to image degradation.
8. Extend the model so that a 2D signature can be extracted from the entire surface of a bullet land (rather than from a cross-section of the land).
9. Define metrics that can be adopted to store and retrieve bullet images from a large database. Given a set of acceptable metrics, propose a clustering approach (perhaps model-based) to group images in the database so that searches can be expedited.
10. Write user-friendly code with easy to learn and intuitive interfaces that is accessible to the forensics community and that is transparent (no black boxes).
11. Design a training module to introduce practitioners to the algorithms and software.
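Goal 4 asks for a threshold that balances sensitivity and specificity. One common way to formalize that trade-off is Youden's J statistic (sensitivity + specificity - 1), maximized over candidate thresholds. The sketch below assumes this criterion for illustration; the report does not state which criterion the project uses, and the function name `best_threshold` is hypothetical.

```python
def best_threshold(match_scores, nonmatch_scores):
    """Pick the score cutoff that maximizes Youden's J.

    A pair is classified as a match when its score is >= the cutoff.
    Returns (cutoff, J, sensitivity, specificity). Ties keep the lowest cutoff.
    """
    candidates = sorted(set(match_scores) | set(nonmatch_scores))
    best = None
    for t in candidates:
        sensitivity = sum(s >= t for s in match_scores) / len(match_scores)
        specificity = sum(s < t for s in nonmatch_scores) / len(nonmatch_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (t, j, sensitivity, specificity)
    return best
```

Other criteria, such as fixing a maximum false positive rate first, would generally select a different cutoff from the same scores.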

WHAT WAS ACCOMPLISHED UNDER THESE GOALS?
1. Investigate whether it may be possible to match two signatures when one is degraded: matches of striation marks based on full land engraved areas (LEAs) are unrealistic in the sense that evidence collected from a crime scene likely cannot be recovered completely or suffers from other degradation. To investigate how much degradation of evidence affects the matching algorithm, we examined the performance of the random forest based matching algorithm when one of the sources is shortened (in steps of 25%, down to only one quarter of a land engraved area) and compared to the database of full land engravings. The findings suggest that the algorithm suffers in terms of both sensitivity and specificity (Type-1 and Type-2 error). However, losses in accuracy were small if at least 50% of a land was used for a match. Results from this investigation are summarized in Paper 2 and were presented in part at the AFTE meeting in Denver.
2. Database of processed bullet lands: a detailed discussion of the database and some usage examples can be found in Paper 3. We are continuing to expand the database with locally collected data.
3. Design a large scale experiment to be carried out in cooperation with the Story County Sheriff’s Office and the Defense Forensic Science Center (DFSC) to build a database of bullet and breech face images that will also permit exploring the important question of persistence. We have started a collaboration with the DCI Ankeny and the Story County Sheriff’s Office to execute a persistence study using ten barrels each from two types of guns (collected by the DCI Ankeny from crime-related confiscations). The plan is to use a Kevlar tube to collect a series of three bullets at every 50 shots (i.e. shots 1, 2, 3, 50, 51, 52, 100, 101, 102, …, 2000, 2001, 2002) from each of these barrels. The collection will start as soon as the materials ordered have arrived, i.e. some time after August 2017.
4. Continue refining the matching algorithm and begin validation of its performance.
   Using different datasets: (1) Following the AFTE presentation in June, we have gained access to several different sets of bullet data, including scans done by Bill Henderson from Sensofar. These scans were of bullets collected by Tyler Klep from Phoenix PD. Using the random forest algorithm, we were able to match all bullet lands correctly. This provides a first successful validation of the algorithm completely out of scope, i.e. using a different machine, different operator, different ammunition and barrels. (2) We have been working with Melissa Nally and Kasi Kirksey from the Houston Forensic Science Institute on three test sets of bullets to be used in blinded proficiency tests. We have completed the scanning of all three sets and run the automated matching analysis on the first set.
   Refining the matching algorithm: Working with the new datasets has brought to light some implicit assumptions in the matching algorithm that we needed to address to be able to match successfully.
5. Development of open source software package bulletr: Ganesh Krishnan has added read and write support for x3p files.
6. Continue developing a tool to dynamically visualize images on the computer screen. An online web application for finding a matching score for two LEAs is publicly available at https://isu-csafe.stat.iastate.edu/shiny/bulletr/. The application is implemented in the form of a step-by-step walk-through of the automated system that can at each step be informed by manual settings. We have also developed a web application for interactively choosing groove locations in a bullet land using the x3p format.
7. Investigate the performance of the striation matching algorithm on shear marks of cartridges. We have investigated this possibility using the NIST ballistics database. We did not find 3D topographic images at a high enough resolution to get the algorithm to work. We are planning to continue our investigation using 3D scans from the Sensofar machine at ISU.
8. Begin collection of 3D images at ISU. Multiple data collection projects are currently under way:
   a) Hamby set 259: following the presentation at AFTE, Professor James Hamby sent us set 259 from his study for scanning. (in process)
   b) Hamby set 44: Alan Zheng provided us with Hamby set 44 for re-scanning. (in process)
   c) Houston FSI: three test sets of 23 bullets each were scanned. (completed)

Several other opportunities have been arranged: Alan Zheng is in the process of arranging ~2400 bullets collected by Srinivasan Rathinam from the LA PD to be shipped to Iowa State for processing. We are also getting ready to collect data (bullets and cartridges) from a persistence study run locally in collaboration with the DCI and the Story County Sheriff’s office. We are planning to upload the x3p files to the NIST ballistics toolmarks database.
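The sampling plan for the persistence study described in item 3 above (three consecutive bullets at shot 1 and then at every 50th shot up to 2000) can be written out explicitly to see how many bullets it implies. The sketch below only restates that schedule; the function name and its parameters are hypothetical, and the count of 20 barrels comes from the "ten barrels each from two types of guns" in the text.

```python
def persistence_schedule(last_block=2000, step=50, per_block=3):
    """Shot numbers to collect: 1, 2, 3, then 50, 51, 52, ..., 2000, 2001, 2002."""
    block_starts = [1] + list(range(step, last_block + 1, step))
    return [start + k for start in block_starts for k in range(per_block)]


schedule = persistence_schedule()
n_barrels = 10 * 2  # ten barrels each from two types of guns

print(f"first shots collected: {schedule[:6]}")
print(f"bullets collected per barrel: {len(schedule)}")
print(f"bullets collected in total: {len(schedule) * n_barrels}")
```

Under this reading of the plan, each barrel contributes 123 bullets, for 2,460 bullets to scan across the 20 barrels.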

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? On January 17, 2017, the Sensofar Confocal Light Microscope was installed by Bill Henderson in Room 0213L in the Molecular Biology Building (MBB), which is part of the High Resolution Microscopy Facility. Bill Henderson provided training on the microscope on January 17 and 18, 2017 to several individuals associated with the project (Stan Bajic, Eric Hare, Haley Jeppson, Jason Saporta, Xiao Hui Tai, Heike Hofmann, Daniel Attinger, John Plansky and Alicia Carriquiry).

On March 3rd, 2017, lab manager Curtis Mosher provided additional training on the microscope (and the general use of the microscope facility) to Heike Hofmann, Haley Jeppson and Eric Hare. On March 8th and 9th, 2017, Alan Zheng and Mike Stocker from NIST visited the microscope facility and interacted with the ISU toolmarks group: Heike Hofmann, Eric Hare, Haley Jeppson, Jason Saporta and lab manager Curtis Mosher. Bill Henderson from Sensofar was able to coordinate a visit with ISU for Wednesday, March 8th, 2017. These discussions were very fruitful and hugely informative (which is why they are listed under “training”) and introduced the ISU group to the quality control and general benchmarking of the Sensofar CL microscope. We also started a discussion on standards for bullet land and cartridge data collection: the ISU side is advised to wait until the collaboration led by NIST and the FBI has established standards. However, we hope that our further investigation of the operator effects between datasets will help inform this process.

Student-Mentor training: Haley Jeppson was trained by Heike Hofmann and Eric Hare to use the (internal) database of processed bullet land engraved areas.

Professional development: Three members of the ISU toolmark group (Heike Hofmann, Haley Jeppson, Jason Saporta) are registered to attend AFTE in Denver.

Training activities: Starting May 1 2017 Curtis Mosher provided training to Allison Mark, junior in Criminal Justice and Anthropology at Iowa State, at the Sensofar CLM. Since then, Allison has been working for 40 hrs/week to get bullet scans.

Professional development:

Eric Hare successfully defended his PhD thesis “Statistical methods for bullet matching” on April 3. Dr. Hare was awarded the CyStarter award from the ISU foundation. This award consists of a 10 week program to explore the possibility of commercializing the PhD work.

Two members of the toolmarks group attended the AFTE conference in Denver (Heike Hofmann, Haley Jeppson).

The NIST firearm and tool marks group in turn had an opportunity to work with the CSAFE-developed matching algorithm and compare it to existing tools.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST?

For now, results have been disseminated mostly through examples embedded in the statistics and probability training materials, but as mentioned in the last quarterly report, we are in the process of producing a training module dedicated to examination of striae in bullets, breech faces and firing pin impressions. Researchers presented their research at the Forensics@NIST conference in November of 2016.

Presentation “Automatic Matching of Full and Degraded Bullet Lands” at the AFTE meeting in Denver (slides available at https://csafe-isu.github.io/talks/AFTE%202017/slides-AFTE.html).

Presentation “Statistical and Algorithmic Approaches to Matching Bullets” at the All-Hands meeting of CSAFE at Iowa State (slides available at https://csafe-isu.github.io/talks/All-Hands%202017/five-minute.html).

Two abstracts submitted to ICFIS 2017 were accepted. Researchers will present on (a) results from degraded bullets and (b) a web viewer for x3p images.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS?
1. Expand on the database of processed bullet land engravings.
2. Investigate the performance of the striation matching algorithm on shear marks of cartridges.
3. Advance the work on the large scale experiment to be carried out in cooperation with the Story County Sheriff’s Office to collect bullet and breech face images that permit exploring the important question of persistence.
4. Continue refining the matching algorithm with feature extraction and algorithmic changes. In particular, we are interested in investigating whether it is possible to adapt methods developed for large, flat toolmarks, such as the Chumbley score and the modified Chumbley score, to be used for matching of land-land scans.
5. Continue collection of 3D images at ISU; in the near future we will be able to add a second Sensofar CLM.
6. Continue developing a tool to dynamically visualize images on the computer screen: aside from the matching web application, we want to investigate the implementation of an online x3p viewer that allows elementary interactions with these files.
7. Continue development of the open source R package bulletr.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS
• Journal publications.
1. Hare E., Hofmann H., Carriquiry A. (accepted). Automated Matching of Bullet Lands, Annals of Applied Statistics, acknowledgment of federal support: yes.
2. Hare E., Hofmann H. (in preparation). Algorithmic approaches to bullet matching with an emphasis on the degraded land case, acknowledgment of federal support: yes.
3. Hare E., Hofmann H. (in preparation). A modern bullet matching database and web application, acknowledgment of federal support: yes.

WEBSITE(S) OR OTHER INTERNET SITE(S) An online web application for finding a matching score for two LEAs is publicly available at https://isu-csafe.stat.iastate.edu/shiny/bulletr/. The application is implemented in the form of a step-by-step walk-through of the automated system that can at each step be informed by manual settings.

TECHNOLOGIES OR TECHNIQUES All algorithms and supporting structure are part of the open-access R package bulletr. The package is available from the github repository erichare/bulletr.

OTHER PRODUCTS Database: internal-use MySQL database of processed land engraved areas and their signatures. All features extracted from pairwise comparison needed for the algorithmic matching are stored in this database.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Heike Hofmann (PI), 1 month of summer salary.

Eric Hare until May 15 2017, 20hrs/week by CSAFE, setup of the internal database, investigation of matching under degradation.

Haley Jeppson, 20hrs/week by CSAFE, shiny app for interactive groove identification in a web viewer.

Kiegan Rice, 20hrs/week, models for automatic groove detection.

Ganesh Krishnan, since May 15 2017, 30hrs/week, support for x3p read/write in R package bulletr.

Allison Mark, undergraduate in Criminal Justice and Anthropology, 40 hrs/week for operating the Sensofar CLM.

Alicia Carriquiry – ISU

William Eddy - CMU

Xiao Hui Tai – CMU

No international collaboration.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS?

• Collaborations with NIST are through technical advisors Alan Zheng and Mike Stocker. Further details are given in the “training” section.

• Sensofar (Bill Henderson, Cristina Cadevall) provided copies of electronic scans from bullets collected by Tyler Klep from the Phoenix PD.

• Melissa Nally and Kasi Kirksey from the Houston Forensics Science Institute came for a site visit to Iowa State June 18-20. They provided us with three sets of 9mm Ruger bullets (from ten consecutively manufactured Luger LCP barrels) to be used in a blinded proficiency test.

• We have started to talk to the Cystarter project (headed by Eric Hare) and OmniAnalytics (a local consulting company led by ISU alum Lawrence Mosley) about a collaborative agreement for screening bullet lands for their suitability for inclusion in the random forest algorithm.

• We are in discussions with the Story County Sheriff’s Office and the Defense Forensic Science Center (DFSC).

IMPACT

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The research will provide firearms examiners with an objective approach to quantify the strength of a match and will enable the profession to establish clear thresholds to determine whether two sets of striae are indistinguishable. Furthermore, while researchers initially propose to develop a score-based metric, the ultimate goal is to derive a likelihood-ratio (or similar) statistic to assess the weight of the ballistic evidence. Matching bullets to a common source is an accepted and established forensic science practice. By using an automatic matching algorithm, forensic examiners obtain a matching score that quantifies the strength of a match, which reduces the subjectivity involved in declaring a match. The overall benefit for the forensic community is that examiners will have an objective, quantifiable, standard approach to describe the degree of similarity observed in two bullets. Transparency in algorithms and databases will benefit the legal and scientific communities. At present, a suspect accused of a crime that involves firearms has no opportunity to review the algorithms or the data that were used by the prosecution’s experts beyond the 2D images of crime scene and question samples. CSAFE researchers propose to break with tradition and publish algorithms and data that are accessible to everyone, and produce software that is friendly enough so that experts on both sides in a trial can carry out their own analyses of the data if they so desire.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Matching algorithms based on striation patterns are likely applicable in other disciplines. However, we do not have any concrete examples at this point.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems Nothing to report

SHOE PRINTS AND TREADMARKS

Project II - Understanding and Modeling the Probability Distribution of Accidental/Randomly Acquired Characteristics (RACs) for Shoeprint Matching

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Jared Murray (CMU)

Other Investigators: Neil Spencer (CMU), Francesca Matano (CMU), Deva Ramanan (CMU)

With Collaborators in Israel: Sarena Wiesner, Micha Mandel, Yoram Yekutieli, Yaron Shor

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? Specific goals of the project:

• To assess independence and uniformity assumptions currently in use for quantifying the strength of shoeprint evidence based on acquired characteristics

• To develop new, more appropriate probability models for quantifying uncertainty in matching shoes to prints using accidental/randomly acquired characteristics

• To investigate how accidental/randomly acquired characteristics should be represented and compared during matching, which will influence the development of probability models

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? This year was devoted primarily to research and development. Spencer continued his work on probability models for randomly acquired characteristics, under the supervision of Murray and Fienberg. Along with the rest of the CMU CSAFE team, Murray and Spencer were engaged with the summer undergraduate program, including two student groups who worked on shoeprint-related topics.

Spencer completed his work on probability models for randomly acquired characteristics and on computational algorithms for estimating the probability distribution of accidentals over a shoe sole, taking into account features such as tread pattern and overall levels of wear. This work was done under the supervision of Murray.

We completed work on a probability model for the distribution of accidentals, finding evidence against the assumption of a uniform distribution.
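As a rough illustration of how a uniformity assumption can be checked (this is not the project's model, which also accounts for tread pattern and wear), the sketch below bins accidental locations into a grid and computes Pearson's chi-square statistic against equal expected counts per cell. The function name, the unit-square coordinate system, the grid size, and the simulated points are all assumptions made for this example.

```python
import random

def chi_square_uniformity(points, n_rows, n_cols):
    """Pearson chi-square statistic for points in the unit square,
    tested against a uniform distribution over an n_rows x n_cols grid."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        r = min(int(y * n_rows), n_rows - 1)
        c = min(int(x * n_cols), n_cols - 1)
        counts[r][c] += 1
    expected = len(points) / (n_rows * n_cols)
    stat = sum((obs - expected) ** 2 / expected for row in counts for obs in row)
    dof = n_rows * n_cols - 1
    return stat, dof

rng = random.Random(0)
# Simulated accidentals spread uniformly over the (hypothetical) sole area.
uniform_pts = [(rng.random(), rng.random()) for _ in range(2000)]
# Simulated accidentals concentrated toward one end of the sole (small y).
clustered_pts = [(rng.random(), rng.random() ** 3) for _ in range(2000)]

stat_u, dof = chi_square_uniformity(uniform_pts, 4, 4)
stat_c, _ = chi_square_uniformity(clustered_pts, 4, 4)
print(dof, round(stat_u, 1), round(stat_c, 1))
```

A statistic that is large relative to its degrees of freedom suggests a departure from uniformity. A real outsole is not a rectangle, so in practice the expected counts would have to be proportional to the contact area that falls within each cell.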

Spencer completed a report for NIST and is working toward completing a software package to generate synthetic accidentals. We are closing out this project effective 7/1.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Spencer completed his work on modeling the spatial distribution of RACs in the context of his Advanced Data Analysis project (a required component of his Ph.D. program) under the supervision of Murray. Spencer drafted a final report to satisfy the written component of the requirement.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Spencer will present his work at the International Conference on Forensic Inference and Statistics in September in Minneapolis.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? We have completed the project, and expect to submit papers within 6 months.

Products A report was submitted to NIST. Software will be made available soon.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT?

Jared Murray (CMU faculty) - Research on the distribution of RACs. Supervision and support for Spencer. Collaborated with Israeli researchers, no international travel.

Neil Spencer (CMU Ph.D. student) - Research on the distribution of RACs. Collaborated with Israeli researchers, no international travel.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Israeli Police, Israel – Contributed database of shoeprints, and related software and collaborative research.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? We have developed a new statistical model and computational algorithms that are applicable to a range of problems beyond shoeprints.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Forensic scientists have a new tool for modeling shoeprints. The statistical model and computational algorithms could be applicable to a wider range of problems.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? Spencer completed the applied data analysis requirement of his Ph.D. program and interacted extensively with researchers at NIST and in Israel, enhancing his academic and professional development.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? Spencer will provide code implementing his methods to NIST.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? In the end, the purpose of this work is to strengthen our system of justice, both in solving crimes and in preventing wrong or exaggerated evidence from being presented in court, which can lead to wrongful convictions. This work provides a tool to support forensic scientists in the modelling of shoe prints.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None related to this project.

Changes/Problems We are closing out this project effective 7/1/17.

Project EE – Statistical and Algorithmic Approaches to Shoeprint Analysis

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Alicia Carriquiry (ISU)

Other Investigators: Soyoung Park (ISU), Guillermo Basulto (ISU), Jared Murray (CMU), Neil Spencer (CMU), Martin Silerio-Vazquez (ISU).

Other collaborators: Eric Hare (Omni Analytics), Sarena Wiesner, Yoram Yekutieli, Yaron Shor (Israel Police), Marty Herman (NIST), Steven Lund (NIST), Hari Iyer (NIST), Orit Moradov, Micha Mandel (Hebrew University)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? Based on what we have learned so far, and on input from collaborators and technical advisors (Lesley Hamer, Paul Kisch) we propose to expand this project to pursue additional objectives. The major goals of this project (expanded goals in italics) include:

1. Become familiar with the existing literature relevant for shoeprint analysis as well as with some existing shoe print databases.

2. Create a database of similar shoes (in terms of size and sole patterns) that can be used for both research and black-box experiments.

3. For the shoes in the database, collect repeated measurements during a year, to explore wear process and the robustness of algorithms to identify a shoe from prints that are changing over time.

4. Create a database of traces from controlled experiments mimicking crime scenes, which can be used in black-box experiments.

5. Define a score to characterize a shoeprint using 2D images that combines global (sole pattern) and local (wear pattern and RACs) attributes.

6. Investigate whether RACs are uniformly distributed in shoe soles, or whether they follow patterns that can be used for comparisons.

7. Understand the effect of different sources of variability in the process by which RACs are acquired.

8. Test and validate the approach using existing shoeprint images (e.g., from commercial vendors or from collaborators including the Israeli Police).

9. Explore the robustness of the algorithms when images are degraded.

10. Carry out black-box studies with shoe print examiners.

11. Think of efficient mechanisms to store and search shoeprint images.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Researchers have identified features that can be extracted from a shoeprint and that will permit computing a numerical summary of the print. In coordination with researchers from NIST (Hari Iyer, Steve Lund, Simone Gittelson), researchers have developed an algorithm based on graph-theoretic ideas (for the front end) and machine learning ideas (in the back end) to align prints and extract features that can be used to construct a score. Researchers defined a score that can be used to quantify the similarity between two shoe sole prints and that is based on the concept of a maximum clique, from graph theory. A maximum clique is the largest set of points on two images that is determined to show the same geometrical pattern. The current version of the score is constructed by combining two features: the size of the maximum clique, and the proportion of points in the two images that overlap. (Goal 2) Researchers worked to improve on the feature selection approach.
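To make the maximum-clique idea concrete, here is a minimal sketch under stated assumptions; it is not the project's implementation. Each candidate pairing of a point in image A with a point in image B becomes a graph node, two pairings are linked when the distance between the two A points agrees with the distance between the two B points, and a Bron-Kerbosch search returns the largest mutually consistent set of pairings. The function names, the tolerance, the sample coordinates, and the simple overlap proportion are hypothetical. Because raw distances are compared, this sketch is insensitive to rotation and translation only, not to scale.

```python
from itertools import product
from math import dist

def correspondence_graph(pts_a, pts_b, tol=0.05):
    """Nodes are candidate pairings (i, j). Two pairings are linked when the
    distance between the two points in A matches the distance in B."""
    nodes = list(product(range(len(pts_a)), range(len(pts_b))))
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in product(nodes, nodes):
        if i != k and j != l:
            d_a = dist(pts_a[i], pts_a[k])
            d_b = dist(pts_b[j], pts_b[l])
            if abs(d_a - d_b) <= tol:
                adj[(i, j)].add((k, l))
    return adj

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Image B holds the first four points of A rotated by 90 degrees and shifted;
# the last point in each image is unrelated clutter.
pts_a = [(0, 0), (4, 0), (4, 3), (1, 5), (9, 9)]
pts_b = [(10, 2), (10, 6), (7, 6), (5, 3), (20, 20)]

clique = sorted(max_clique(correspondence_graph(pts_a, pts_b)))
overlap = len(clique) / min(len(pts_a), len(pts_b))  # illustrative proportion
print(clique, overlap)
```

The number of nodes grows with the product of the two point counts, and maximum clique search is expensive in the worst case, which is consistent with the computational burden reported for this approach.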

The most recent version of the feature selection method still relies on maximum clique calculations, but expands beyond the initial approach in two ways: (1) researchers first use Hu moments to screen out obviously non-mated shoes, and (2) researchers implemented a principled sub-sampling process to reduce the number of pixels that are considered by the clique functions to about 10% of the total. In principle, these two added steps can significantly reduce computation time when comparing two outsole images, but researchers have not yet tested whether this is true. (Goals 5 and 8). The current version of the score is invariant to rotation or scale differences between two images. Further, it does not rely on having a complete question shoe sole print, so it appears to be well suited (but much During this reporting period, we have added features to the discriminating score. Very initial results are promising in that the features that make up the score appear to discriminate well between mated and non-mated pairs of shoes (Goal 4).
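The two added steps can be sketched as follows. This is a hedged illustration rather than the researchers' code: it computes only the first two of the seven Hu moment invariants for a binary image, uses a hypothetical tolerance to decide whether a pair is worth passing to the clique step, and stands in simple random sampling for the sub-sampling procedure, which the report does not describe in detail.

```python
import random

def hu_first_two(img):
    """First two Hu moment invariants of a binary image (rows of 0/1)."""
    pix = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pix)
    cx = sum(x for x, _ in pix) / m00
    cy = sum(y for _, y in pix) / m00

    def eta(p, q):
        # Central moment, normalized so the result does not depend on scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pix)
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2

def may_be_mated(img_a, img_b, tol=0.05):
    """Cheap screen: keep the pair only if the invariants are close.
    The tolerance is a hypothetical value chosen for this example."""
    ha, hb = hu_first_two(img_a), hu_first_two(img_b)
    return max(abs(a - b) for a, b in zip(ha, hb)) <= tol

def subsample_pixels(img, frac=0.10, seed=0):
    """Keep about `frac` of the foreground pixels (simple random sampling)."""
    pix = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    k = max(1, round(frac * len(pix)))
    return random.Random(seed).sample(pix, k)

def blank(n):
    return [[0] * n for _ in range(n)]

bar = blank(10)
for x in range(1, 9):          # horizontal bar
    bar[2][x] = 1
bar_rot = blank(10)
for y in range(0, 8):          # same bar, rotated 90 degrees and shifted
    bar_rot[y][5] = 1
square = blank(10)
for y in range(3, 6):          # a 3 x 3 block, a clearly different shape
    for x in range(3, 6):
        square[y][x] = 1
full = [[1] * 10 for _ in range(10)]

print(may_be_mated(bar, bar_rot), may_be_mated(bar, square))
print(len(subsample_pixels(full)))
```

Hu invariants are unchanged by translation, rotation, and scaling, so a large difference between two images is a cheap signal that the expensive clique comparison can be skipped.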

Researchers have continued to update the literature review by expanding its focus to include graph theory and allocation algorithms. (Goal 1) Code to implement the algorithm has been written and researchers are now devoting effort to improving run time and efficiency of the program. Researchers have begun a data collection activity in cooperation with NIST, CMU and the Israeli Police. Evaluation of the performance of the algorithm using shoe prints collected by NIST (Lund, Iyer) with an Everspry 2D scanner has begun. The computational burden is significant, so researchers have redirected some of their efforts to improving computing efficiency to permit more extensive testing. During this period, researchers parallelized the code, which reduced computation time from about one hour for a comparison of two images to about 5 minutes for the same comparison. Researchers are now working with a start-up company (Omni Analytics) to optimize the code as much as possible. The goal is to be able to compare two outsoles in under one minute (Goal 3).

Researchers are in the process of designing a study to collect longitudinal images of shoe outsoles. The study will include between 120 and 150 participants who will receive a pair of new athletic shoes to wear during the academic year. We restrict participation to persons (of either sex) who wear shoe sizes 8, 8.5, 10.5 or 11, and will also include two different outsole patterns in the study. Shoes will be imaged using fingerprint powder, 2D scanning, high-resolution photography, and if possible, 3D scanning. In addition to the baseline measurements, participants will be asked to return every six weeks or so to have their shoes imaged. At the end of the study, researchers hope to have five sets of measurements on each pair of shoes. All measurements will be replicated three times. The design of the study was distributed in draft form to collaborators from NIST, the Israel Police, Hebrew University, UCI and CMU. We received several suggestions and incorporated them in a revised version of the protocol. The actual data collection will be carried out during the first three quarters of Y3. (Goals 2, 3, 4, and 7). The proposed protocol for the study is attached to this report. Researchers installed, calibrated, and began using the 2D scanner for outsoles that was purchased during the last reporting period. Researchers now have a small server connected to the scanner that is dedicated to collecting and storing the shoe print images.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? The project includes graduate research assistants and a post-doctoral researcher, who have been exposed to new technologies and learning opportunities.

Orit Moradov, a graduate student at Hebrew University, visited CSAFE for a month and actively participated in the design of the data collection activity.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Not during this period. We will present initial results at the International Conference on Forensic Statistics, in Minneapolis, during the first week of September. The presentation slides will be attached to the next quarterly report.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS?

• We will continue working on improving the computational efficiency of the current algorithm so that we can evaluate its performance on a large number of different pairs of shoeprints. We have contracted with a small private company that can help us optimize the code.

• We will seek IRB approval for the longitudinal data collection activity described earlier, so that we can begin recruiting participants, purchasing test shoes, designing a searchable database for storing images, and hiring undergraduate students to help with the actual data collection.

• In cooperation with Steve Lund and Hari Iyer, we will begin testing the comparison algorithms on a larger set of “pristine” images and on a set of degraded images obtained from the same shoes.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Presentation at CSAFE All Hands Meeting in June 2017

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Alicia Carriquiry – ISU faculty

Soyoung Park – ISU graduate research assistant

Guillermo Basulto Elias – ISU post-doctoral research associate

Hari Iyer – NIST scientist

Steve Lund – NIST scientist

Orit Moradov – Hebrew University visiting graduate student

Sarena Wiesner, Yoram Yekutieli, Yaron Shor – Israeli Police

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Israeli Police, Israel – Contributes the database of shoeprints, related software, and collaborative research.

Carnegie Mellon University and University of California Irvine have participated in the design of the data collection study to ensure that the data will be useful in their research work as well.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The most direct benefit to the forensic community will be the development of a score that will permit objective comparison of two outsole prints. If successful, the research will also result in a measure of weight of evidence, to assess the probative value of a match between two prints. In addition to the fundamental research questions, we plan on developing practical computer applications to aid crime scene investigators in the capture, transmission and quick comparison of shoeprints. As an example, we plan on developing a phone app that will assess the quality and completeness of photographs of prints captured in the field and that will upload those images seamlessly to a database in the lab.

Changes/Problems Based on what we have learned so far and on input from collaborators and technical advisors (Lesley Hamer, Paul Kisch) we propose to expand this project to pursue additional objectives. The major goals of this project (expanded goals in italics) include:

1. Become familiar with the existing literature relevant for shoeprint analysis as well as with some existing shoe print databases.

2. Create a database of similar shoes (in terms of size and sole patterns) that can be used for both research and black-box experiments.

3. For the shoes in the database, collect repeated measurements during a year, to explore wear process and the robustness of algorithms to identify a shoe from prints that are changing over time.

4. Create a database of traces from controlled experiments mimicking crime scenes, which can be used in black-box experiments.

5. Define a score to characterize a shoeprint using 2D images that combines global (sole pattern) and local (wear pattern and RACs) attributes.

6. Investigate whether RACs are uniformly distributed in shoe soles, or whether they follow patterns that can be used for comparisons.

7. Understand the effect of different sources of variability in the process by which RACs are acquired.

8. Test and validate the approach using existing shoeprint images (e.g., from commercial vendors or from collaborators including the Israeli Police).

9. Explore the robustness of the algorithms when images are degraded.

10. Carry out black-box studies with shoe print examiners.

11. Think of efficient mechanisms to store and search shoeprint images.

Project A - Statistical Models for the Generation and Interpretation of Shoeprint Evidence

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Charless Fowlkes (UCI)

Other Investigators: Deva Ramanan (CMU)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The statistical interpretation of shoeprint evidence includes two components: an observational model that captures how likely it is that particular impression evidence collected at a crime scene would be observed given that it was created by a specific shoe, and a prior model that captures the probability that particular shoes (either suspect or incidental) could have been at the crime scene. The observation term isolates the quality of the specific evidence itself from prior-term factors such as the frequency with which other (i.e., non-suspect) individuals wearing the same shoe type might have walked through the same area.
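The separation of the two components can be illustrated with a small numerical sketch. All names and probabilities below are made up for the example and do not come from the project; the point is only that the posterior is proportional to the observation term times the prior term, so evidence that fits the suspect shoe well can still leave substantial probability on other wearers of a common shoe type.

```python
def posterior(likelihood, prior):
    """Combine an observation model P(evidence | source) with a prior P(source)."""
    unnorm = {s: likelihood[s] * prior[s] for s in prior}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

# Hypothetical observation term: how well the impression fits each source.
likelihood = {
    "suspect_shoe": 0.60,
    "same_model_other_wearer": 0.20,
    "other_model": 0.01,
}
# Hypothetical prior term: how plausible each source is before seeing the print.
prior = {
    "suspect_shoe": 0.01,
    "same_model_other_wearer": 0.09,
    "other_model": 0.90,
}

post = posterior(likelihood, prior)
# Likelihood ratio of the suspect shoe against another shoe of the same model;
# it depends on the observation term only.
lr = likelihood["suspect_shoe"] / likelihood["same_model_other_wearer"]
print({s: round(p, 3) for s, p in post.items()}, round(lr, 2))
```

In this made-up example the likelihood ratio favors the suspect shoe, yet the posterior for another wearer of the same model is larger because that shoe type is assumed to be common in the area.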

Our goal is to develop an observational model that takes into account partial or obscured prints. As less and less of the tread pattern is left behind, the confidence with which that impression can be matched to a specific candidate decreases. Match confidence also depends on the specifics of the tread pattern itself (e.g., is it unique to one brand of shoe or common across many). Thus, we are characterizing the spatial distribution and uniqueness of tread pattern features across a large database of shoe/brand types in order to provide statistical characterizations of match confidence as a function of how much and what part of a given print is visible.

Our initial explorations suggest the importance of 3D reasoning in estimating matching confidences. While many acquired characteristics are punctate in nature and well characterized by their 2D location in an impression, wear patterns tend to be graded and not well represented by a patent print image. Technologies for acquiring 3D shape of impressions in soft surfaces at the crime scene and for recording 3D shape of suspect outsoles have been developed by NIST (see e.g., Bauman et al, 2014) and others, but are not yet common in traditional workflows. The component of this project on relating 3D tread shape to crime-scene impressions will provide a basis for recommendations on how to ultimately incorporate such shape analysis tools into the forensic workflow and characterize their potential investigative value. As 3D model acquisition of suspect outsoles becomes more commonplace, methods developed in this project for estimating contact surfaces could be used immediately by examiners to evaluate which acquired characteristics are a priori likely to be present in a crime-scene impression.

Beyond the collection of data, we plan to incorporate computational and statistical models developed in this research into the workflow of footwear examiners in several ways. Modeling the distribution of local tread patterns and acquired characteristics will provide a basis for developing guidelines on the statistical reliability of matching small numbers of such characteristics. These guidelines would be supplemented by a tread pattern database and software used to provide case specific statistics. For example, automated class-level retrieval based on crime scene impressions would allow for examiners to quickly assemble a “lineup” of candidate matches, which, in combination with statistics about the prevalence of shoe types in a locale, would provide estimates of probabilities needed for evaluating false-match probabilities. Additional data on distribution of acquired characteristics and wear patterns could also be provided on a per-class basis.

The goal of the project is to develop statistical models for the interpretation of impressions and marks left by shoe outsoles. Our specific objectives are to:

1) develop a statistical model for reasoning about partial or obscured prints,

2) build a computational implementation of this model for matching and retrieval that provides calibrated probability estimates for category level (brand, size) identification from partial prints,

3) develop a simple generative physical model for markings based on 3D outsole shape and surface properties,

4) develop methods for visualizing distinctiveness of partial print matches that can serve as a training or investigative tool.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Our specific efforts include (1) collection of datasets for understanding shoe outsole statistics, (2) an evaluation of state-of-the-art image descriptors for the purposes of shoeprint matches under stressed conditions (such as poor print extraction and occlusions), and (3) building a system for 3D shoeprint acquisition. During this project year we have focused our efforts on evaluation of 2D features and have completed our proposed milestone of implementing and evaluating feature-matching baselines for matching class characteristics across images.

(1) Data collection

The starting point for all our activities involves data collection and modeling of shoeprint data. Toward this end, we are pursuing multiple modes of data acquisition.

Our first effort is a dataset of shoe outsole images, the UCI Shoe Outsole Database (UCI-SHOD), collected for purposes of analyzing the statistical power of tread pattern shape measurements in making statements about category level (shoe make/model/size) identity or distinction. We have continued to grow this dataset (now ~213,000 images of ~79,600 tread IDs) and have also been collecting side images of shoes, which may ultimately be useful to pair with tread matches for investigative purposes. We have also developed infrastructure to continue growing this dataset and to collect basic estimates of the rate at which new shoes appear in the US consumer market over time.

We have begun work on implementing a web-based search interface to this dataset that will allow for exploratory browsing, search by metadata (e.g., brand, category) and provide search-by-image functionality using image descriptors described below.

Our second effort makes use of the recently acquired data set obtained from Israeli collaborators. This data set consists of 600 shoe impressions marked with 20,000 accidentals, classified by type (scratch, hole, rift, foreign object, etc.).

Existing statistical models for scoring the confidence of a print match tend to rely exclusively on the location of accidentals. We are exploring the use of image-based appearance models to report the appearance consistency of the accidentals (which may, for example, capture the similarity of shape of two matched scratches). Toward this end, we are exploring several classic appearance descriptors based on edge matching and gradient histograms, as well as recent descriptors based on deep features. We have annotated the Israeli data with tread pattern identifiers, and are using these annotations to evaluate several patch descriptors (in terms of their ability to retrieve matches from prints of shoes with similar tread patterns). Importantly, some of our image matching descriptors make use of a learning component, and so require a large labeled dataset for training. We are making use of UCI-SHOD for precisely this purpose.

Finally, a third effort involves the collection of 3D shoeprint scans. Our goal is to devise a scanning pipeline with high statistical repeatability, thereby allowing for reliable variance estimates when computing match likelihood ratios. Low-cost, portable, three-dimensional capture represents a potentially transformative approach to both acquisition and modeling of latent prints, since the 3D structure of the outsole can be used to infer geometrically-valid contact surface areas for both the general tread patterns and specific accidentals of interest. We are developing a custom acquisition device inspired by GelSight technology [Johnson & Adelson, 2011], which allows for high-accuracy 3D scanning of handheld objects at micron resolution. We have constructed a prototype scanner device and have been characterizing its accuracy and refining the design.

(2) Pattern descriptor evaluation

Our specific efforts include an evaluation of state-of-the-art image descriptors for the purposes of shoeprint matches under stressed conditions (such as poor print extraction and occlusions). We have developed a pipeline for matching crime scene impressions to a database of test impressions that searches over alignment (rotation and translation of the latent print). Using this pipeline, we have evaluated the performance of several image descriptors in terms of their ability to retrieve matches from prints of shoes with similar tread patterns. These descriptors include baseline appearance matching models (raw pixels, Chamfer Matching, Histograms of Oriented Gradients) and deep convolutional neural network (DCNN) features.

These features have been evaluated on the Israeli dataset in order to determine the best image resolution and tune feature extraction parameters. We have also tested them on the publicly available FID-300 benchmark (http://fid.cs.unibas.ch/). Our experiments show that DCNN features matched using per-channel normalized cross correlation yield the best retrieval performance. Our system outperforms existing published methods on the FID-300 benchmark. This is particularly significant since the FID-300 data was not used at all during the design of the system, so it represents truly held-out test data. The figure below shows the cumulative match curve on FID-300 for our current implementation (two variants denoted with [.,.]) alongside curves for recently published results reported by Kortylewski et al.
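For readers unfamiliar with the two quantities named above, the following sketch shows one plausible way to compute a per-channel normalized cross correlation score and a cumulative match curve. It is an illustration under assumptions, not the project's pipeline: feature maps are tiny hand-made lists rather than DCNN activations, the search over rotation and translation is omitted, and all function names and data are hypothetical.

```python
from math import sqrt

def ncc_channel(a, b, eps=1e-8):
    """Normalized cross correlation of two equally sized, flattened channels."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db)) + eps
    return num / den

def per_channel_ncc(feat_a, feat_b):
    """Normalize each channel separately, then average the channel scores."""
    return sum(ncc_channel(a, b) for a, b in zip(feat_a, feat_b)) / len(feat_a)

def cumulative_match_curve(scores, true_ids, gallery_ids):
    """scores[q][g] is the similarity of query q to gallery item g. Entry k-1
    is the fraction of queries whose true match is ranked in the top k."""
    ranks = []
    for row, true_id in zip(scores, true_ids):
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        ranks.append([gallery_ids[g] for g in order].index(true_id) + 1)
    return [sum(r <= k for r in ranks) / len(ranks)
            for k in range(1, len(gallery_ids) + 1)]

# Hypothetical gallery: three tread patterns, two channels of four values each.
gallery = {
    "A": [[1, 2, 3, 4], [4, 3, 2, 1]],
    "B": [[1, 3, 2, 4], [2, 2, 5, 1]],
    "C": [[4, 1, 3, 2], [1, 5, 1, 5]],
}
gallery_ids = list(gallery)

# Query 0: pattern B with a change of gain and offset in every channel.
# Query 1: pattern C with inverted contrast, a deliberately hard case.
queries = [
    [[2 * v + 3 for v in ch] for ch in gallery["B"]],
    [[-v for v in ch] for ch in gallery["C"]],
]
true_ids = ["B", "C"]

scores = [[per_channel_ncc(q, gallery[g]) for g in gallery_ids] for q in queries]
cmc = cumulative_match_curve(scores, true_ids, gallery_ids)
print(cmc)
```

Normalizing each channel separately makes the score insensitive to per-channel gain and offset, which is why the first query still matches its source exactly. The inverted query shows how one hard case pulls down the low-rank end of the curve.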

We have extended this approach to allow for end-to-end training of the system so that the DCNN features can be optimized directly to improve matching performance. In particular, we developed a so-called “Siamese network” architecture in which each branch of the network has a final independent transformation step that allows for learning an alignment of features extracted from different modalities (e.g. crime-scene lift vs. test impression vs. tread photograph). This can be viewed as an extension of traditional canonical correlation analysis (CCA) but with non-linear feature extraction and fitting based on empirical risk minimization criteria.

A manuscript describing this architecture and results has been accepted for publication and is enclosed.

(3) Building a system for 3D shoeprint acquisition

During this project year we have focused our efforts on evaluation of 2D features. We intend to start exploring 3D features in the next 6-9 months. Our initial explorations suggest the importance of 3D reasoning in estimating matching confidences. While many acquired characteristics are punctate in nature and well characterized by their 2D location in an impression, wear patterns tend to be graded and not well represented by a patent print image. Technologies for acquiring 3D shape of impressions in soft surfaces at the crime scene and for recording 3D shape of suspect outsoles have been developed by NIST (see e.g., Bauman et al, 2014) and others, but are not yet common in traditional workflows. The component of this project on relating 3D tread shape to crime-scene impressions will provide a basis for recommendations on how to ultimately incorporate such shape analysis tools into the forensic workflow and characterize their potential investigative value. As 3D model acquisition of suspect outsoles becomes more commonplace, methods developed in this project for estimating contact surfaces could be used immediately by examiners to evaluate which acquired characteristics are a priori likely to be present in a crime-scene impression.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? The project has supported two graduate students who have been closely involved in the research. The students have also presented at weekly CSAFE meetings held at both UCI and CMU and one attended the CSAFE All-Hands Meeting hosted in Ames in June 2017.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? The PI has participated in discussions with NIST collaborators (Martin Herman and others) and Israeli Police collaborators (Sarena Weisner and Yaron Shor) on protocols for building new datasets of test impressions and simulated crime scene evidence. The PI was also invited to join an FBI working group studying the feasibility of assembling a national footwear database in the fall. The PI and graduate students participated in a tour of the Orange County Crime Lab and established initial contact with footwear examiners there. The PI will present a poster on the work at the International Conference on Forensic Inference and Statistics (ICFIS 2017) in September. Finally, a manuscript describing the feature matching architecture has been accepted for publication in the Proceedings of the British Machine Vision Conference (BMVC 2017) and will be presented there by one of the PhD students working on the project.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? We will finish a prototype web-based interface to UCI-SHOD which supports searching by tread pattern similarity. We will begin investigating methods for performing accelerated matching over large datasets so that a user can interactively upload a query image and search for similar tread patterns in SHOD.

We will finalize the design of our 3D acquisition device and begin collection of physical shoe scans. Our initial plan is to focus on common shoe brands, allowing us to explore the interplay of make/model and instance-level tread variation.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Bailey Kong, James Supancic, Deva Ramanan, Charless Fowlkes, “Cross-Domain Forensic Shoeprint Matching”, Proceedings of the British Machine Vision Conference, to appear, London, September 2017.

The PI was also invited to join an FBI working group studying the feasibility of assembling a national footwear database in the fall.

The PI will present a poster on the work at the International Conference on Forensic Inference and Statistics (ICFIS 2017) in September.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Charless Fowlkes (PI) led efforts on feature matching and 3D scanner design, and oversaw graduate student efforts at UCI.

Deva Ramanan (co-PI) oversaw graduate students at UCI and CMU, fostering collaborations between CMU and UCI shoeprint teams.

Bailey Kong (graduate student) developed models for shoeprint matching with partially visible prints.

James Supancic (graduate student) developed models for large-scale collection of online shoeprint data including make/model annotations.

We have recently recruited two UCI undergraduates (Liyan Chen & Yoon Jung) who will assist in developing the SHOD website interface over the next 6 months.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? In the near future, we plan to collaborate with shoe impression analysts in the Impression Evidence section of the Orange County Crime Lab in order to get feedback and guidance on what tools will be of practical significance. We are also coordinating with others at NIST to integrate our database collection and shoeprint generation methods with the general SHOECALC framework being developed at NIST by Marty Herman, Hariharan Iyer and others.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? A strong statistical basis for understanding the reliability of partial prints would help guide practitioners and better calibrate their analysis and reporting. In addition to reporting our results in academic papers and presentations, we also intend to produce informational resources in the form of a comprehensive database of print patterns that can be searched by partial matches. This will produce visualizations that show reliability of partial matches as a function of the location, extent and tread pattern present which we envision could be used as a training tool.

Our analysis of contemporary image-matching algorithms will allow the integration of recent developments in image understanding into the problem of estimating match reliability in footwear impression evidence. Additionally, our development of 3D shoeprint scanning technology has the ability to transform the process of shoeprint database construction and modeling.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Our results will likely have an impact on the matching of other latent print and impression evidence (e.g. tire treads).

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? In addition to serving directly in analysis, we expect interactive search and visualization of the datasets under development will provide a useful training tool for examiners to develop expertise applicable to existing workflows. We will work with the general outreach components of CSAFE to develop educational modules where appropriate that describe principles for statistical understanding of impression evidence and convey specific findings about the reliability of acquired characteristics.

WHAT IS THE IMPACT ON PHYSICAL, INSTITUTIONAL, AND INFORMATION RESOURCES THAT FORM INFRASTRUCTURE? Our UCI-SHOD database constitutes one of the larger available collections of shoe outsole images, allowing for exploration of big-data approaches to shoeprint matching. Our portable, low-cost 3D acquisition device also allows for the development of novel 3D print databases.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? We expect our results on image matching and 3D model acquisition will provide an intuitive, rigorous, and statistically sound basis for expert forensic testimony related to footwear impression evidence.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? No amount is spent in foreign countries.

Changes/Problems Nothing to report.

HANDWRITING

Project G – Towards a Score-Based Likelihood Ratio for Handwriting Evaluation

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Hal Stern, UC Irvine

Other Investigators: Eric Lai (graduate student), UC Irvine

Accomplishments

MAJOR GOALS The objectives of the research project are:
1) Provide statistical support for the LAPD/LASD study of handwriting complexity (assessing subjective complexity, relating it to objective dynamic measures of the writing taken while the person is signing, and determining examiner performance as a function of complexity).
2) Develop statistical methods for measuring the complexity of a static handwriting sample. Note that a method for static samples is required to make this approach practical.
3) Evaluate the use of complexity measures for developing score-based likelihood ratios for handwriting evidence.

For this project year, the stated targets/deliverables are as follows:
- Survey of available data for developing measures of image complexity
- Literature search on complexity/compression measures

ACCOMPLISHED DURING THIS PERIOD Objective 1 – Our collaborator, Miriam Angel of the LAPD, shared data that she had collected regarding examiner assessments of handwriting complexity. Five different examiners were presented with 101 signature pages (each page contained five signatures from an individual). Measures of complexity (on a 3-point scale and on a 5-point scale) were obtained for each signature page from each examiner. We assessed the consistency of examiner judgements across signatures. This was done by computing a number of summary measures including the frequency of agreement for pairs of examiners, the correlation of the ratings of each pair of examiners, and the nature of disagreements among examiners. A linear statistical model was used to estimate the variability in scores due to examiners, due to signatures and due to random variation. From these variance estimates it is possible to estimate the reliability of the complexity assessment process. We estimate that the correlation of complexity scores across different examiners is approximately 0.60 for both the 3-point and 5-point scales. This is somewhat lower than might be expected. We can also estimate the correlation that would be expected in a hypothetical study where the same examiner was used to re-assess the complexity of the signatures; this is sometimes called reproducibility and the correlation was estimated at 0.67.
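The kind of calculation described above can be sketched as follows. This is a minimal illustration assuming a two-way random-effects model with one rating per page-examiner cell, fit by the method of moments; the report does not state the exact model or estimation method used, so the function `variance_components` and the simulated ratings are hypothetical. The simulated variances are chosen only so that the two correlations land near the reported 0.60 and 0.67; they are not the study data.

```python
import numpy as np

def variance_components(ratings):
    """Method-of-moments estimates for y_ij = mu + page_i + examiner_j + error_ij.

    ratings: array of shape (n_pages, n_examiners), one rating per cell.
    Returns (var_page, var_examiner, var_residual).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    page_means = y.mean(axis=1)
    exam_means = y.mean(axis=0)
    ms_page = k * np.sum((page_means - grand) ** 2) / (n - 1)
    ms_exam = n * np.sum((exam_means - grand) ** 2) / (k - 1)
    resid = y - page_means[:, None] - exam_means[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    var_page = max((ms_page - ms_resid) / k, 0.0)   # truncate negatives at 0
    var_exam = max((ms_exam - ms_resid) / n, 0.0)
    return var_page, var_exam, ms_resid

# Simulated stand-in for 101 signature pages rated by 5 examiners (assumption).
rng = np.random.default_rng(1)
n_pages, n_examiners = 101, 5
sim = (rng.normal(0.0, np.sqrt(0.60), (n_pages, 1))          # page effect
       + rng.normal(0.0, np.sqrt(0.07), (1, n_examiners))     # examiner effect
       + rng.normal(0.0, np.sqrt(0.33), (n_pages, n_examiners)))  # noise

var_page, var_exam, var_resid = variance_components(sim)
total = var_page + var_exam + var_resid
inter_examiner = var_page / total              # different examiners, same page
same_examiner = (var_page + var_exam) / total  # same examiner re-rating a page
```

Under this model the same-examiner correlation can never fall below the between-examiner correlation, since the examiner variance is added to the numerator. With a single rating per cell, any examiner-by-page interaction is absorbed into the residual term.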

A technical report was prepared and shared with our collaborators. This report (copy included) provided the basis for the CSAFE All-Hands Meeting presentation on the project. Ms. Angel has recently provided an updated dataset that includes additional signatures and also includes the results of a small reproducibility study (the same examiners were asked to re-rate a subset of the signature pages). We will analyze these data and continue to consult on the next phase of the LAPD/LASD study in the coming year.

Objective 2 – We continued to explore data that are available for developing measures of image complexity and we have continued to review the literature related to measuring image complexity and using complexity to assess signature evidence. In terms of the deliverables:

Survey of available data for developing measures of image complexity
- Ultimately we will make use of the signature data collected by LAPD/LASD for their study of examiner performance. This will include genuine and simulated signatures. These data are not yet available.
- ICFHR 2012 Signature Verification Competition (4NSigComp2012) Data: This is the primary dataset we have worked with so far. For the test set, signature samples were provided by the Forensic Expertise Profiling Laboratory (FEPL) of La Trobe University. It contained signature samples from three specimen writers/authors, ’A1’, ’A2’, and ’A3’ respectively. The questioned samples were a mixture of genuine signatures, disguised signatures and skilled forgeries as given in tables 1 and 2. All signatures were written using the same make of ball-point pen and the same make of paper. The questioned samples were numbered randomly, scanned and inkjet or laser printed into a booklet. The signatures were collected under supervision of Bryan Found and Doug Rogers in the years 2001, 2002, 2004, 2005 and 2006, respectively. The images were scanned at 600dpi/300dpi resolution and cropped at the Netherlands Forensic Institute for the purpose of this competition. (Note: These data are the “test data”. We also have access to the “training data”, which are similar in form, from the ICFHR 2010 Signature Verification Competition (4NSigComp 2010).)
- We have recently been made aware of an additional source of data. The International Unipen Foundation website (http://www.unipen.org/products.html) contains datasets that have been used in other papers (including Firemaker data used by Schomaker and Vuurpijl in some of their work). We are investigating.

Literature search on complexity/compression measures – Graduate student Eric Lai has produced an annotated bibliography of papers relevant to our approach. These include papers on approaches to assessing complexity of signatures and other handwriting, papers on related work from other disciplines, and foundational computer science and statistics papers. The annotated bibliography is included with this report. As often happens in such cases we continue to find new papers and thus will continue to update this bibliography.

Feedback from CSAFE Advisory Board – We sought comments from a member of the CSAFE Advisory Board during this period. The review was complimentary of the project, noting that:
• Overall it is a well-designed project with attainable outcomes
• The application of statistics to handwriting evaluations has merit
• The incorporation of forensic document examiners in this project is a key part
• Looking forward to the completion of this project as the information should be useful to the forensic science community

TRAINING AND DEVELOPMENT PI meets regularly with graduate student Lai to discuss progress and work plans. In addition, PI and graduate student Lai meet with other UCI investigators once per month to share progress and ideas. At each meeting one of the UCI projects presents its results. Mr. Lai presented a summary of the reliability analysis of signature complexity results. A new student, Maozhu Dai, has begun to read background material. She will likely join the project in Fall 2017.

HOW HAVE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST Stern presented an update on the project at the CSAFE-wide All-Hands Meeting June 2017. In addition, Lai prepared and presented a poster describing our results on the reliability of complexity judgements at the All-Hands Meeting.

An abstract was submitted to the 2017 International Conference on Forensic Inference and Statistics to present our latest results. The abstract was accepted and there will be a presentation at the meeting in Minneapolis, MN, September 2017.

PLANS FOR NEXT YEAR PI and graduate students will focus on two aspects of this project. First, they will update the analysis of subjective judgements of signature complexity to take advantage of additional data collected by our collaborators with the LAPD/LASD. Second, we will continue to develop quantitative image summaries that might serve as features to distinguish genuine and simulated signatures.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS A technical report “Forensic Document Examiner's Handwriting Complexity Score Reliability Analysis” was prepared and shared with our collaborators at LAPD/LASD. A copy is included. Poster “Assessing the Reliability of Forensic Document Examiner Signature Complexity Ratings” presented at the CSAFE All-Hands Meeting. A copy is included.

An abstract was submitted to the 2017 International Conference on Forensic Inference and Statistics to present our latest results. The abstract was accepted and there will be a presentation at the meeting in Minneapolis, MN, September 2017.

Participants and Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THIS PROJECT Primary individuals involved are Hal Stern and statistics graduate student Eric Lai. PI Stern worked approximately one person-month on this project during the year. Graduate student Lai worked approximately 6.0 person-months. He received 50% funding support.

A new student, Maozhu Dai, has begun to read background material. She will likely join the project in Fall 2017.

WHAT OTHER ORGANIZATIONS HAVE WORKED ON THIS PROJECT Collaborators include Miriam Angel, forensic document examiner in the Los Angeles Police Department, and Mel Cavanaugh, forensic document examiner in the Los Angeles County Sheriff’s Department. Linton Mohammed, a consulting forensic document examiner, is also collaborating on the project now.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT The project will develop statistical methods that can be used to characterize the complexity of handwriting samples. The complexity can then be related to performance. This provides various ways of evaluating the performance of examiners in different types of cases. In addition, the complexity measure is a potential input to a score-based likelihood ratio analysis.
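For readers unfamiliar with the term, a score-based likelihood ratio compares how probable an observed comparison score is among same-source pairs versus different-source pairs. The sketch below is a generic illustration, not the project's method: it assumes normal densities fit to reference scores, and the function names and toy scores are hypothetical. How a complexity measure would enter such a score is not specified in this report.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and std. deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def score_based_lr(score, same_scores, diff_scores):
    """Ratio of fitted score densities: same-source over different-source.

    Assumes (for illustration only) that both score populations are
    approximately normal; a kernel density estimate is a common alternative.
    """
    num = normal_pdf(score, np.mean(same_scores), np.std(same_scores, ddof=1))
    den = normal_pdf(score, np.mean(diff_scores), np.std(diff_scores, ddof=1))
    return num / den

# Toy reference scores (hypothetical): higher means more similar.
same_ref = np.array([0.80, 0.90, 1.00, 0.85, 0.95])
diff_ref = np.array([0.10, 0.20, 0.30, 0.15, 0.25])

slr_high = score_based_lr(0.90, same_ref, diff_ref)  # supports same source
slr_low = score_based_lr(0.20, same_ref, diff_ref)   # supports different sources
```

Values above 1 favor the same-source proposition and values below 1 favor the different-source proposition; the magnitude depends heavily on how the reference score populations are chosen and modeled.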

Changes/Problems Actual or anticipated problems or delays and actions or plans to resolve them
1) Progress on this project has been slower than hoped for. This is because graduate student Lai has been working on two different CSAFE projects – Project G (handwriting) and Project I (lay assessments of strength of forensic evidence statements – Thompson). Over the last 3-6 months the second has taken priority so that we can help produce manuscripts on that work. A second statistics graduate student, Maozhu Dai, has begun to read the background papers on handwriting analysis with the anticipation of joining the project in Fall 2017.
2) Deliverables for Fall 2018 include a report analyzing the relationship of complexity measures in static and dynamic handwriting samples. It has become apparent that this is not going to be of value to the document examination community or our LAPD/LASD collaborators. The dynamic features (summaries of velocity, pressure, etc.) are not at all analogous to the types of features we are generating from static signatures. Thus we propose to remove this deliverable from the project plan.

FINGERPRINTS

Project Q - Developing a Statistical Foundation for Latent Print Comparison

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Joseph B. Kadane (CMU)

Other Investigators: Stephen Fienberg (CMU), Amanda Luby (CMU), Anjali Mazumdar (CMU)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? To contribute to the understanding of both the fundamentals and the practice of the forensic use of fingerprints from collection to presentation in court. This project focuses both on the nature of fingerprint databases and their use in identification via automated systems such as AFIS as well as formal statistical modeling so that when a forensic expert testifies as to a possible match, formal statistical statements can be made about the error associated with the matching process and the uncertainty associated with the forensic judgment. The project currently involves three studies, each aimed at exploring and shedding light on a certain aspect of fingerprinting.

• The first study aims at better understanding how lay people (not necessarily exposed to science generally or forensic science in particular) respond to various ways in which fingerprint analyses might be reported in court.
• The second study analyzes the data from a proficiency study of fingerprint experts using item response theory.
• The third study addresses an operational problem experienced by fingerprint experts in immediate practice: whether it would be wise to use ninhydrin to enhance the visibility of fingerprints found on bags after drug seizures.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? 1) Kadane’s abstract “Fingerprint Science” was accepted for presentation to the Jurisprudence Section of the American Academy of Forensic Sciences for its annual meeting in New Orleans in February. The abstract follows the technical arguments in the project proposal regarding use of the theory of statistically equivalent blocks to investigate what can (and can't) be learned from analysis of forensic databases using a similarity metric.

2) A much deeper understanding of the strengths and weaknesses of fingerprint analyses through???.

3) Additional Related Research as Part of this Project: Jonathan ("Jay") Koehler and J.B. Kadane have been invited to contribute to a special issue of the journal Daedalus (the publication of the American Academy of Arts and Sciences) on Science and the Legal System. In particular, they were asked to write about "Uses and Misuses of probability and certainty in the courtroom". To narrow the topic to something tractable, they intend to write about the various wordings suggested to fingerprint analysts to report on their findings, particularly positive ones. They intend to do empirical research to see whether different verbal formulations ("to the exclusion of all others", "to a reasonable degree of scientific certainty", etc.) are heard by fact-finders as substantially different or as basically the same. They are in the very early stages of this component of the project, starting to design a questionnaire.

4. The paper “Fingerprint Science” was revised on the basis of comments received at the October 18 CSAFE meeting at CMU and the November 8-9 NIST symposium. The “statistically equivalent blocks” idea turned out to be incorrect, and a correct argument was substituted. The paper is now submitted to the Journal of .

5. Kadane has begun preliminary discussions with Jonathan (“Jay”) Koehler of the Northwestern University School of Law about collecting data to examine whether and to what extent the public is sensitive to different verbal formulations of results of fingerprint analysis.

6. The report of the American Association for the Advancement of Science Working Group on Latent Fingerprint Analysis (John Black, Anil Jain, Jay Kadane and Bill Thompson) is now in AAAS review.

7. Three new studies have been initiated. In each study, data has been secured and preliminary analyses conducted. Each is aimed at exploring and shedding light on a certain aspect of fingerprinting. The first study aims at better understanding how lay people (not necessarily exposed to science generally or forensic science in particular) respond to various ways in which fingerprint analyses might be reported in court. Jurors and judges are the ultimate consumers of these reports. Therefore, the various ways in which fingerprint analyses are reported have to be understood from the perspective of such non-experts.

The second study analyzes the data from a proficiency study of fingerprint experts. Using methods from Item Response Theory, results allow comparison of the relative abilities of these experts and the relative difficulty of the questions posed. It also bears on how to understand the false-positive and false-negative error rates reported from such studies.
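To make the item response idea concrete, the sketch below uses the one-parameter (Rasch) model, in which the probability of a correct answer depends only on the difference between an examiner's ability and a question's difficulty. The choice of the Rasch model, the simple penalized gradient-ascent fit, and the simulated responses are all assumptions for illustration; the report does not say which item response model or software the study uses.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) matrix for examiner abilities theta and item difficulties b."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def fit_rasch(responses, n_iter=500, lr=0.05, ridge=0.01):
    """Joint gradient ascent on a ridge-penalized Bernoulli log-likelihood.

    responses: 0/1 array of shape (n_examiners, n_items).
    The small ridge penalty keeps estimates finite for examiners or items
    with all-correct or all-incorrect responses.
    """
    x = np.asarray(responses, dtype=float)
    theta = np.zeros(x.shape[0])
    b = np.zeros(x.shape[1])
    for _ in range(n_iter):
        err = x - rasch_prob(theta, b)  # observed minus expected
        theta += lr * (err.sum(axis=1) - ridge * theta)
        b += lr * (-err.sum(axis=0) - ridge * b)
        b -= b.mean()  # fix the origin of the scale
    return theta, b

# Simulated proficiency test (hypothetical): 50 examiners, 20 questions.
rng = np.random.default_rng(2)
true_theta = rng.normal(0.0, 1.0, size=50)
true_b = np.linspace(-2.0, 2.0, 20)
sim = (rng.random((50, 20)) < rasch_prob(true_theta, true_b)).astype(int)
theta_hat, b_hat = fit_rasch(sim)
```

Because abilities and difficulties sit on a common scale, a raw error rate from such a test can be read in light of how hard the particular questions were, which is the point made above about interpreting reported false-positive and false-negative rates.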

The third study addresses an operational problem experienced by fingerprint experts in immediate practice: whether it would be wise to use ninhydrin to enhance the visibility of fingerprints found on bags after drug seizures. This study is being pursued in collaboration with the Allegheny County Medical Examiner's Office, and reflects their wish for a scientifically (and legally) defensible practice in this regard.

There has been major progress in the first study, which aims to better understand how lay people (not necessarily exposed to science in general or forensic science in particular) respond to various ways in which fingerprint analyses might be reported in court. The analysis has been completed. A paper has been drafted, and a first round of comments has been received and acted upon. This work is joint with Professor J. Koehler at Northwestern Law School. The original objective was to have data. This objective has been met, and the task of delivering a paper is 50% complete.

This study has findings that bear on proposals made by the Department of Justice to bar federal employees from including in their fingerprint reports and testimony language such as “to the exclusion of all others in the world” and “to a high degree of scientific certainty”. Our finding is that lay people interpret the fingerprint results as equally strong with or without that language. Hence the proposed reform does not accomplish its purpose of communicating the less-than-certain nature of fingerprint identifications.

The second study, an analysis of a proficiency exam, has been accepted for presentation at the Minneapolis meeting on forensic statistics. The aim is to set an example of how modern statistical methods can shed light on fingerprint proficiency, and possibly to make suggestions of how such examinations might be improved in their design and analysis. This study is at the preliminary stages and researchers expect to make significant progress in the next six months. This work is joint with graduate student Amanda Luby.

The third study concerns the use of ninhydrin as an agent on bags containing street drugs. The motivation is to find a method of handling the bags that permits both recovery of fingerprints and a reliable estimate of the weight of the drugs, in a way that minimizes the possible exposure of personnel to fentanyl. We now have pristine drug bags that do not contain fingerprints with which to experiment. Kadane expects to be able to finalize the experimental design in the near future, in collaboration with the Allegheny County Medical Examiner’s Office. This work is a joint effort with them (both their fingerprint and drug analysts) and with CMU graduate student Neil Spencer. This study and the tasks to meet its objective are moving smoothly and are 25% complete.

With respect to Study 3, investigators had a negative finding. The additional data investigators wanted was collected and analyzed. However, the analysis showed such unexpected variability that partners at the Allegheny County Medical Examiner’s Office think there may be issues with their weighing techniques. Although this was an unexpected issue, it is at the same time an important contribution to the efforts of ACMEO to do accurate and excellent work.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Kadane attended the annual meeting of the American Academy of Forensic Sciences (AAFS) in New Orleans, and learned more about how extensive the forensic sciences are. This was an opportunity to meet and converse with several scholars working on fingerprint science matters.

Kadane has worked with two graduate students, Amanda Luby and Neil Spencer, under this cooperative agreement. In each case, the goal is to contribute to their education by giving guidance where it will help them grow and mature professionally. Additionally, Kadane gave two lectures to the undergraduate students in our summer program, one on experimental design and one on his experiences as an expert witness.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Kadane gave a talk ("Fingerprint Science") at the AAFS meeting mentioned above.

Studies 1 to 3 have been presented as part of lecture material and projects to the summer program involving forensic science and statistics students (Project N).

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? Study 1 will be presented at a meeting of the American Academy of Arts and Sciences in Cambridge in July. Kadane expects to get feedback which will occasion further modification of the paper. He expects that Study 2 will be a major focus of effort in the upcoming year, to prepare for the September meeting. Study 3 will proceed as rapidly as the availability of our partners at ACMEO allows, given their operational responsibilities.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Kadane’s abstract “Fingerprint Science” was accepted for presentation to the Jurisprudence Section of the American Academy of Forensic Sciences for its annual meeting in New Orleans in February 2017.

The report of the American Association for the Advancement of Science Working Group on Latent Fingerprint Analysis (John Black, Anil Jain, Jay Kadane and Bill Thompson) is now in AAAS review.

“Fingerprint Science” presentation at Forensics@NIST2016 held November 8-9

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT?
1. Principal Investigator Joseph B. (“Jay”) Kadane, Leonard J. Savage University Professor of Statistics and Social Sciences, Emeritus. Kadane is working on, and is responsible for, each of the studies described in this report. Kadane is paid one month’s support per year for the work on this project.
2. Stephen Fienberg. Research Collaborator: background research collaboration, data acquisition; CSAFE funding; not collaborating internationally.
3. Statistics Graduate Student Amanda Luby is a collaborator and coauthor on the study of the proficiency examination. She is supported by the Cooperative Agreement for her work on this and additional projects.
4. Graduate Student Neil Spencer is a collaborator on the work on ninhydrin.
5. Professor Jonathan (“Jay”) Koehler is a Professor at Northwestern University School of Law. He is not supported under this agreement.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Allegheny County Medical Examiner’s Office (1520 Penn. Avenue, Pittsburgh, Pa, 15222). The Medical Examiner and his deputy regularly attend our weekly CSAFE meetings. Study 3 in this report is in collaboration with personnel from the Medical Examiner’s Office, which is the crime lab for the county and city. Study 3 involves ACMEO experts in fingerprints and drugs. Their staff are running experiments and providing the data for analysis, with joint effort on the careful design of experiments. There are no other contributors to the project at this time, international or domestic.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Latent fingerprints are perhaps the most common form of forensic pattern evidence. Comparisons of fingerprints (matching) represent a primary focus of most forensic labs. They play crucial roles in criminal investigations for both inclusionary and exclusionary purposes, and such comparisons are often described in court proceedings. Advances in the statistical assessment and quantification of fingerprint comparisons will impact all communities interested in forensic science.

The first impact is on the graduate students involved in the project. They are being exposed to the possibilities of serious statistical work in the forensic sciences, which they may choose to pursue after their graduate work is completed. Additionally, this work may attract other statisticians, particularly those in the early stages of their careers, to the opportunities in this field.

The second impact is on the undergraduates in the summer program at CMU. One of the data analysis projects (with four students assigned) uses data from our collaboration with the ACMEO, related to study 3. This gave them experience using statistical techniques to find out what information was concealed in the forensic data provided.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The forensic experts at the Allegheny County Medical Examiner's Office may not have been exposed to statisticians before, and may not have considered the possibilities of joint research. As the work progresses, investigators may find other opportunities for collaboration.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? The first impact is on graduate students and the undergraduates in the summer program. The research creates opportunities to discuss students’ interest in the STEM fields. For example, Kadane has been approached regarding addressing a high-school class about forensic statistics, but nothing has been scheduled yet. Kadane was visited by a student from West Virginia University who is a joint forensic-science and statistics major. Kadane expressed interest in future collaborations, and the possibility of a visit in the fall of 2017 was discussed.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? The investigators’ work with ACMEO is all about technology transfer to them, and through them to other similar laboratories. Kadane regards this as a major opportunity, and consequently is willing to assist them by whatever means possible.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? In the end, the purpose of this work is to strengthen the system of justice, both in solving crimes and in preventing wrong or exaggerated evidence from being presented in court, which can have the impact of wrongful convictions. This is crucial to maintaining and strengthening the social and governmental system, so that it is both fair and just, and is seen to be fair and just.

CSAFE Annual Report 06.2017 FINGERPRINTS 86

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None related to this project.

Changes/Problems Actual or anticipated problems or delays and actions or plans to resolve them: The death of Kadane’s colleague Stephen Fienberg has led to a reorganization and refocusing of this project. Kadane agreed with the memorandum from NIST concerning the impact of the unavailability of the Yoon-Jain data on the feasibility of the previous formulation of the project. It was not clear until recently that the Yoon-Jain data was going to be unavailable. We have been exploring new opportunities as they presented themselves and have pursued a revised research focus.

A future concern is that if Study 3 is successful, we will have shown how to process the bags for fingerprints without destroying the ability of ACMEO to estimate the weight of drugs seized. The next task in helping ACMEO change the way it handles drug bags seized by the police will be to study how to estimate the weight of drugs seized while opening a minimum number of bags. While such a study is a legitimate outgrowth of our fingerprint project, and is legitimately an application of statistics, Kadane will need guidance on whether to include such an effort as part of Project Q, to seek approval of a separate project to do that work, or whether it is judged to be out of scope for NIST support. This question is hypothetical at this point, as it depends on the success of Study 3, but perhaps it is not too soon to start thinking about it. We will contact NIST prior to undertaking any of this work to gain clarity.


Project V - Latent Fingerprint Proficiency Testing

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Dan Murrie (UVA)

Other Investigators: Sharon Kelley (UVA), Brandon Garrett (UVA)

Henry Swofford, DFSC (collaborator).

Christopher J. Czyryca, Collaborative Testing Services, Inc. (collaborator); Samantha Heise, Collaborative Testing Services (collaborator). Note: Swofford, Czyryca, and Heise received no CSAFE funds.

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The primary goals for this broader project have been to better understand (and eventually improve) the ecological validity of proficiency testing efforts, and eventually better calibrate latent fingerprint proficiency testing. Fingerprint analysts rely on commercial proficiency testing to demonstrate their skills, and they (or the attorneys that call them) often cite results of their proficiency testing as evidence that their work in a real case is accurate (Nichols, 2007). The project goal arises out of concerns that commercial test materials may differ in important ways from routine, “real world” evidence (e.g., Koehler, 2008; Pato & Millett, 2010). Therefore, there may be important ways in which performance on a proficiency test does not provide a true metric of competence or proficiency in the field. Without such metrics, both internal oversight (e.g., by lab managers) and external oversight (e.g., by accrediting bodies) may become much more complicated and much less meaningful.

Ultimately, managers of crime laboratories should have access to proficiency tests that meaningfully evaluate the competence and quality of analysts along multiple dimensions. Similarly, accrediting bodies need to be able to rely on the results of proficiency testing to inform accreditation decisions. Recently, some have argued (e.g., Kohler, 2016) that these tasks call for different types of tests. However, the state of the field is such that only a handful of commercial options exist, many of which have been criticized for 1) being readily identifiable as proficiency tests (and labs often do not attempt to conduct proficiency testing in a “blind” fashion), and 2) being simpler than real-world case work. Thus, there are strong reasons to wonder how well performance on proficiency tests truly predicts proficiency in real casework. There are also potential implications of this work outside of informal and formal oversight. Given emerging findings from Garrett’s research with mock jurors that lay people can and do attend to proficiency testing information, and calibrate their judgments about forensic analysts accordingly, proficiency testing may have more relevance in criminal trials than previously realized.

A complicating factor in this overarching goal of improving latent fingerprint proficiency testing is the difficulty of calibrating tests when no objective measure of difficulty or complexity currently exists. Thus, at present, one can only make an inference about print difficulty based on the outcome of a proficiency test (i.e., through item analysis), instead of also being able to create tests that deliberately include items of varying degrees of difficulty, or being able to create tests at different levels of difficulty. Though the possibility exists to use quality metrics in the near future, there are certainly other aspects of the “difficulty” construct worth measuring.

Given these challenges, one of the study’s preliminary goals has been to develop a sense of the difficulty of existing commercial proficiency tests. Thus, initial stages of this research have involved pursuing different studies that will provide a gross estimate of difficulty that can then be refined and sharpened in subsequent work. Thus far, this research has involved collaboration with the Defense Forensic Science Center (DFSC), given their interest in developing a more ecologically valid proficiency test from their own database of latent prints.

This research, involving collaboration with the Defense Laboratory, will for the first time compare two models for proficiency testing and assess the number and types of errors on a test composed of realistic case samples as compared with an existing commercial test offered by Collaborative Testing Services, Inc. (CTS). Investigators also plan to compare the performance of practicing latent print analysts to participants at lower levels of training (i.e., forensic science trainees), or even those without training (i.e., university undergraduates).

WHAT WAS ACCOMPLISHED UNDER THESE GOALS, WHAT WAS DONE? WHAT WAS LEARNED? Following the on-site meeting at DFSC in Year 1 Quarter 3, major activities included collaborating with DFSC on development of test materials. Investigators submitted a proposal (jointly with DFSC) to the annual conference of the American Academy of Forensic Sciences to share preliminary results of this project in February 2017. Investigators proposed working directly with Collaborative Testing Services (CTS), a leading proficiency testing company, as another strategy to explore directly their popular, widespread proficiency tests. CTS materials arrived at the end of August 2016. Since their arrival, major activities have involved developing specific plans for study design (though general procedures were identified prior to arrival of CTS materials, specific plans could not be developed until the materials arrived). Investigators then began the process of seeking approval from the University of Virginia Psychology Department and Law School to recruit participants. Major activities have included adapting CTS materials for the study and developing and collecting other materials (e.g., appropriate magnifying devices). As preparations for the study were underway, it quickly became apparent that our first round of participants (i.e., college students) would require a brief introduction to latent print comparison to understand and have a reasonable opportunity to engage with proficiency test materials. Therefore, investigators requested training materials from colleagues at the Defense Forensic Science Center (DFSC) and searched for appropriate Internet materials. Once a suitable training video was identified and vetted by DFSC, investigators completed a small pilot study. Once CTS published the “Manufacturer’s Information” for the test that was ordered (i.e., the print sources for each latent in the proficiency test), investigators were able to determine results of this small pilot. Given this development and the modest results of the pilot study (i.e., an average accuracy rate of 6/12), investigators met with consulting colleague Greg Mitchell (UVa Law) to discuss the direction of this project and more carefully scrutinize the CTS terms of use. As a result of this meeting, investigators concluded that they would need to seek permission from CTS to use its materials for this particular study (our initial review of CTS policies led us to conclude otherwise).

Investigators met with the President of CTS, Christopher Czyryca, who expressed interest in collaborating on research to understand and improve proficiency testing in the forensic sciences by initiating a larger program of study in latent fingerprint proficiency testing. These conversations raised issues regarding the difficulties associated with designing and developing proficiency tests in the absence of objective, readily identifiable metrics of item difficulty, and given significant variability in how individual analysts perceive test difficulty. As a result, a new study strategy was implemented. Specifically, investigators collaborated with CTS to develop questions addressing print difficulty and print quality to add to their large-scale proficiency testing, collecting data from their large pool of examiner participants. Though perceived complexity is a crude and imperfect measure of difficulty, it will provide useful baseline data to evaluate perceptions of difficulty across a large number of respondents.

The first collaborative study will occur with the second latent fingerprint proficiency testing of 2017, scheduled for distribution in August 2017. For this first study, CTS will administer our brief series of questions to all participants completing the CTS latent print proficiency testing. That is, participants will receive specific questions about the latent prints they are examining, and the proficiency test overall. Briefly, these questions include:

Regarding each of the latent prints:

1. On a scale of 1-10, how would you rate the challenge level of this questioned latent print? (1 – no challenge; 10 – extremely challenging)
2. On a scale of 1-10, what is your level of confidence regarding the correctness of your decision for this print? (1 – no confidence; 10 – extremely confident)

Regarding the overall proficiency test:

1. On a scale of 1-10, how would you rate this test for overall difficulty? (1 – extremely easy; 10 – extremely difficult)
2. On a scale of 1-10, how closely do the latent print images in this test compare to latent prints received in your casework? (1 – nothing like casework; 10 – exactly like casework)
3. Which latent print did you find to be the least challenging?
4. Which latent print did you find to be the most challenging?
4b. What characteristic(s) of this latent print led to it being the most challenging? (select all that apply)
- limited points to compare
- distortion
- overdeveloped or underdeveloped ridge detail
- image quality
- other (please explain)

Contributing to our broader project goals, these simple questions will provide initial quantitative data regarding the perceived difficulty of the prints in these popular proficiency tests, as well as examiner impressions regarding the features of prints that are most challenging. The brief questions regarding examiner confidence in their conclusions will allow us to examine the relation between confidence and accuracy (an empirical question not central to this project, but relevant to other concerns in forensic science).¹

Investigators were also recently made aware through Samantha Heise of CTS that the OSAC Friction Ridge Subcommittee is similarly interested in evaluating the complexity/difficulty of CTS latent print proficiency tests. Ms. Heise advised them of the pending CSAFE research so that, ideally, investigators can collect data from CTS consumers and members of the OSAC subcommittee on the same test. Investigators anticipate that this recent development will allow the collection of more meaningful data and better assist the forensic science community. Given the stage of this project, investigators anticipate that the collaboration developed with CTS will be a significant asset for further research on proficiency testing that dovetails with the interests and needs of the OSAC Friction Ridge Subcommittee.

Because this study simply involves adding questions to CTS proficiency testing materials, all preparation with CTS has been completed.
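One simple way to examine the relation between confidence and accuracy is a point-biserial correlation between the 1-10 confidence rating and a 0/1 indicator of whether the decision was correct. The sketch below is illustrative only: the function name and the response data are hypothetical, and this report does not state which analysis the investigators will use.

```python
from math import sqrt


def point_biserial(confidence, correct):
    """Pearson correlation between a numeric rating and a 0/1 outcome.

    confidence: ratings on the 1-10 scale (hypothetical survey responses)
    correct:    1 if the examiner's decision was correct, 0 otherwise
    """
    n = len(confidence)
    if n != len(correct) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_c = sum(confidence) / n
    mean_y = sum(correct) / n
    cov = sum((c - mean_c) * (y - mean_y) for c, y in zip(confidence, correct))
    var_c = sum((c - mean_c) ** 2 for c in confidence)
    var_y = sum((y - mean_y) ** 2 for y in correct)
    if var_c == 0 or var_y == 0:
        # Correlation is undefined when either variable is constant.
        return float("nan")
    return cov / sqrt(var_c * var_y)


# Hypothetical responses for eight prints (not data from this project).
confidence = [9, 8, 10, 4, 7, 3, 9, 5]
correct = [1, 1, 1, 0, 1, 0, 0, 1]
r = point_biserial(confidence, correct)
print(f"confidence-accuracy correlation: {r:.2f}")
```

A value near zero would be consistent with confidence being largely unrelated to accuracy; a value near one would indicate that confident decisions tend to be correct ones.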

¹ Latent print examiners and other forensic scientists often testify regarding their confidence in their conclusions, but research from experts in other disciplines suggests that confidence is often unrelated to accuracy.


WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Given the status of this project, it has not yet provided opportunities for training or professional development. We anticipate that, after data have been collected, analyzed, and shared, the project will provide meaningful opportunities for trainings within crime labs and professional development in the form of conference presentations and outreach to crime labs. Indeed, in the longer term, this project may lead to more rigorous, or better operationalized, means of training and assessing proficiency.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Once data have been collected and analyzed, investigators plan to publish the results in relevant academic journals in disciplines with a potential interest in these findings (e.g., forensic science, law). Investigators believe that the legal community, in particular, will be interested in the results of this study as it could shape the way that they attempt to access or use results of proficiency testing in criminal cases.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? During the next year, CTS will distribute their proficiency testing materials, and thus participant responses will begin to become available, following which investigators will begin preliminary analysis of results. As plans for this project move forward, investigators will also continue discussions about possibilities for using CTS resources (e.g., access to participants, their existing dataset of latent prints) to further examine difficulty and ecological validity of proficiency testing. Additionally, investigators will continue work with Henry Swofford (DFSC) on their lab’s development of a more realistic/ecologically valid proficiency test, and how their test compares to commercially available tests.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS A proposal was submitted (jointly with DFSC) to the annual conference of the American Academy of Forensic Sciences.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Dan Murrie (principal investigator) - funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories/CTS.

Sharon Kelley, researcher – funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories/CTS.


Brandon Garrett (UVA): funding support in Year 2. Planned and participated in design of future experiments and collaborations with crime laboratories/CTS.

Garrett, Murrie, and Kelley each spent at least one person-month of time on this and their other CSAFE projects (combined).

Henry Swofford, DFSC (collaborator): advised on study design. Receives no CSAFE funding.

Christopher J. Czyryca, Collaborative Testing Services, Inc. (collaborator). Advised on study design. Receives no CSAFE funding.

Samantha Heise, Collaborative Testing Services (collaborator). Advised on study design. Receives no CSAFE funding.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Defense Forensic Science Center -- U.S. Army Criminal Investigation Laboratory Forest Park, Georgia

DFSC is supplying materials from which the more realistic, representative alternative proficiency test is being created. DFSC staff, specifically Henry Swofford and Anthony Koertner of the U.S. Army Criminal Investigation Laboratory, have often referenced the flaws in existing latent print proficiency testing and advocated for improved tests that better simulate the demands of real-world casework. Thus, they have been enthusiastic collaborators in this project since its inception and volunteered the use of DFSC prints and their time to create a more realistic proficiency test that can be used for this research. They have also made clear that they are invested in using the results of this study to inform future work designed to improve proficiency testing.

Collaborative Testing Services, Inc (CTS) Sterling, Virginia Collaborative research is underway. Christopher Czyryca and Samantha Heise have collaborated on study design. Mr. Czyryca and Ms. Heise have expressed interest in improving proficiency testing in the forensic sciences with the ultimate aim of making these tests more ecologically valid and therefore useful. They are enthusiastic about pursuing research in this area, and, as a first step, have agreed to include the questions described above (about item and test difficulty) in the next round of latent print proficiency tests. Researchers acknowledge that this project would not be possible without their willingness to distribute our questions and collect data as part of their widely-used commercial proficiency testing program.


Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? At a foundational level, investigators anticipate that results from this research will reveal whether existing proficiency testing actually corresponds with performance on a sample of real-world case work, and therefore inform whether proficiency testing should be modified to reflect a wider range of difficulty. Should results suggest changes, patterns of errors across participants will inform the types of changes that need to be made so that proficiency testing better represents performance on actual cases. Collectively, these results will eventually allow crime labs to: a) have more faith in the results of proficiency tests as a metric of employee performance, b) use results of proficiency tests to track employee performance and identify improvements that result from training and mentorship opportunities. Further, although the results of this preliminary work will be limited to latent print proficiency tests, the method used to study and improve proficiency tests could be readily adapted to other forensic science disciplines.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Results of this research can be used to inform legal audiences (e.g., lawyers, judges, and jurors) when analyzing and weighing proficiency testing evidence. For instance, Garrett’s research suggests that, when presented with proficiency data, jurors do not entirely disregard forensic evidence. Rather, jurors appeared to be calibrated in their response to negative information about proficiency test performance.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems At the earliest stage of this project, we experienced a delay due to the timing of the shipment of CTS materials. Though the order was placed on February 2, 2016, materials did not arrive until the end of August 2016. While waiting for materials, investigators used the time to prepare a proposal for AAFS and DFSC. In addition, modest delays were experienced due to the inability to examine results of the small pilot study until CTS published the Manufacturer’s Information report. Investigators also re-evaluated the initial study design after meeting the CTS president and have opted to both seek permission to alternately use CTS materials in the preliminary study and attempt to engage CTS in a larger program of research to improve proficiency testing. As a result, a new small study of proficiency test difficulty was begun in collaboration with CTS which capitalizes on a new opportunity consistent with project goals. This latter project, in collaboration with CTS, is our immediate focus for the coming months.


Project X - Quality Metrics for Latent Fingerprints

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Karen Kafadar (UVA)

Other Investigators: Henry Swofford, DFSC; Heidi Eldridge, RTI; Alicia Rairden, HFSC; Karen Pan (graduate student in statistics at UVa)

Note: Swofford, Eldridge, Rairden receive no CSAFE funds. Kafadar received no salary during this period.

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The broad objective of this project is to develop an objective metric that can be calculated quickly on any latent fingerprint image, and to correlate that metric with accuracy of the call. The metric may involve a combination of “quality assessments” for individual features (minutiae) as well as “global quality” metrics that capture the information from an entire print. We expect that a successful classification algorithm will involve combinations of algorithms. We will calibrate the final information metric (IM) with the number of minutiae and accuracy based on calls from latent print examiners (LPEs). Because we expect that no single metric (among the four algorithms under study) will be uniformly best in all situations, we made the decision (based on advice from our field experts, Swofford, Rairden, and Eldridge) to write our own program to call the quality metric algorithms and obtain the results, so we can identify which algorithms are best for which features of a print.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Our funded graduate student, Karen Pan (KP), has now enhanced her computer program to read the fingerprint images and calculate two quality metrics: the QM scores from the Peskin-Kafadar approach (see manuscript submitted with Y2Q2 report) and the scores from Noblis’ LQAS algorithm. She has written the code so that the results from each algorithm are easily viewable and can be stored in a format that will enable future analysis and combination. We presented the output from that program at the CSAFE all-hands meeting (poster and presentation).

We are still waiting for two further algorithms. The algorithm from Stephan Huckeman and his student Robin Richter (University of Gottingen) uses a wavelet decomposition on the image (see https://www.samsi.info/wp-content/uploads/2016/03/Robin-Richter-SAMSI_Presentation_fingerprint_quality_measure.pdf); they will send the algorithm when it is ready for distribution and are enthusiastic about participating in this study.


A fourth metric to be included in the study is the one from Henry Swofford (DFSC) for his DFIQI metric. Henry has now prepared a manuscript for his “score-based likelihood method” for matching fingerprints (FRStat), so he expects to be able to give the algorithm for his DFIQI metric when it is completed, probably in late summer or early Fall 2017. (As a courtesy, Kafadar reviewed his FRStat manuscript and provided comments to him.) Kafadar also worked with students in her Stat 4995 course, “Statistical Consulting for Undergraduates”, in the analysis of data presented by “clients” Sharon Kelley (see Project T) and Brandon Garrett & Gregory Mitchell (see Project U) on effect of proficiency test information on jurors’ assessments. Kafadar also contacted Professor Keith Inman, Cal State East Bay, who will use students in his class this fall to provide “known ground truth pairs” of matching prints.

We assisted in the analysis of data collected by Dan Murrie & Sharon Kelley (Project T) and by Brandon Garrett and Gregory Mitchell (Project U).

Karen Pan and Kafadar met with Dan Spitzner and his graduate student, Maria Tackett, on their Project HH, and arranged to have Maria meet latent print examiners (LPEs) at the Virginia Department of Forensic Science (VaDFS).

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Our statistics graduate student, Karen Pan, continues to learn about the latent fingerprint identification process. Another one of our graduate students, Alice Liu, spent Fall 2016 semester at DFSC and is a co-author on the DFSC FRStat manuscript. No training of forensic practitioners is needed on this project until the metric(s) are fully developed, implemented, and validated.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Our QM algorithms are being discussed at presentations and trainings for forensic practitioners, but we have not yet disseminated any of the results. Alice Liu is working with Henry Swofford to inform forensic practitioners of FRStat.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? Consistent with the “Revised timeline” provided in the Project X report for Year 2, Quarter 3 (and repeated below), we plan to (a) implement the Huckeman-Richter metric and Swofford’s DFIQI; (b) solicit from Professor Keith Inman (California State East Bay) his “ground truth” pairs of prints that will be used in the study; (c) calculate our quality metrics on both the “ground truth” pairs and on “highly similar non-matching” pairs; (d) contact forensic and crime laboratories to solicit consent from latent print examiners (LPEs) to participate; and (e) distribute test data sets.


Note that a possible shortcoming of our work is that we must rely on LPEs who are willing to participate, not on a genuinely representative random sample of practicing LPEs. Indeed, not all LPEs are certified, and even the proportion of LPEs who are certified is unknown, so a representative random sample cannot be drawn even from a list of certified LPEs. Thus, our “calibration curve” (between accuracy of call and our final quality metric, which will involve the number of minutiae as well as aspects of all four algorithms) will be biased by the LPEs who agree to participate, who may be those who are highly skilled, highly confident, or more experienced. We expect that, with additional use over time by a wider class of LPEs, we will gain experience with this metric and modify the “calibration curve” as needed.

Kafadar submitted an abstract to the Forensic Science International Conference on Error Management in Gaithersburg, July 24-28, 2017. The abstract was accepted, and, although the topic is not related directly to this CSAFE project, the conference will provide opportunities to discuss the Quality Metric project with forensic scientists in attendance.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Our statistics Ph.D. students, Karen Pan and Alice Liu, are learning about quality metrics for fingerprints.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS “Quantifying weight of pattern evidence: General concepts,” Presentation to the 11th meeting of the National Commission on Forensic Science, 13 September 2016.

“Quality metrics for pattern evidence: Development and evaluation,” Forensic Science Workshop 3, Isaac Newton Institute, Cambridge U.K., 11 November 2016.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Henry Swofford, DFSC (collaborator): advise on study design. Receives no CSAFE funding.

Heidi Eldridge, RTI (collaborator): advise on study design. Receives no CSAFE funding.

Alicia Rairden, HFSC (collaborator): advise on study design. Receives no CSAFE funding.

Prof Keith Inman, Cal State-East Bay: obtain “ground-truth” matched print pairs. Receives no CSAFE funding for his time; has been promised a modest contribution ($1500) for materials needed for fingerprint data collection and capture.


Karen Pan, UVA (graduate student, statistics): working on platform for comparing metrics. Receives graduate student stipend.

Alice Liu, UVA (graduate student, statistics): working on FRStat with Henry Swofford. (She received hourly wages from CSAFE for her work on Project Y with Jeff Holt.)

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? DFSC (Swofford), HFSC (Rairden), Cal-State East Bay (Inman). None of them accepts CSAFE funds; only Inman has been promised a modest sum ($1500) for print collection and capture.

These are domestic partnerships only; see (2) above. (Note: the authors of the “Huckeman-Richter” algorithm are from University of Gottingen, but the University of Gottingen is not involved except as Huckeman’s and Richter’s stated affiliation.)

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The forensic science community has recognized the relationship between “accuracy of call” and both the number of features and their “quality”; see SWGFAST “sufficiency chart” (Figure 1). This chart was not based on data; it was based on “expert opinion.” The importance of this study will be in the development of a metric, or a set of metrics, for selected features, that, together, will result in an estimated probability of an accurate call. If the metrics have low scores that are associated with low probabilities of correct calls, then the crime lab director may choose to use its human resources on other evidence which is judged more likely to provide useful, accurate information. This study will be important to the legal community also: an examiner can state an average probability (with standard error) for a correct call, given that the “information content” from an objective algorithm was a given value. Finally, the study will be of academic interest to researchers on combining information metrics in optimal ways. Thus, the project has real academic, scientific, and practical value.
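As a rough illustration of how an estimated probability of a correct call (with standard error) could be tabulated against a quality score, the sketch below bins hypothetical scores and reports the proportion of correct calls and its binomial standard error in each bin. The function name, the 0-1 score range, the bin edges, and the data are all assumptions made for illustration; the project's actual metric and calibration method are not specified in this report.

```python
from math import sqrt


def calibration_by_bin(scores, correct, edges):
    """Tabulate accuracy by quality-score bin.

    scores:  hypothetical quality scores, one per examiner call
    correct: 1 if the call was correct, 0 otherwise
    edges:   increasing bin edges; each bin is half-open, [low, high)

    Returns one tuple per bin: (low, high, n, proportion, std_error).
    Empty bins report None for the proportion and the standard error.
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        outcomes = [y for s, y in zip(scores, correct) if low <= s < high]
        n = len(outcomes)
        if n == 0:
            rows.append((low, high, 0, None, None))
            continue
        p = sum(outcomes) / n
        se = sqrt(p * (1 - p) / n)  # binomial standard error of a proportion
        rows.append((low, high, n, p, se))
    return rows


# Hypothetical scores and outcomes (not data from this project).
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
correct = [0, 0, 1, 1, 1, 0, 1]
rows = calibration_by_bin(scores, correct, [0.0, 0.5, 1.0])
for low, high, n, p, se in rows:
    print(f"[{low}, {high}): n={n}, p={p}, se={se}")
```

With few calls per bin the standard errors are large, which is one reason a calibration of this kind would need a substantial number of examiner calls at each quality level.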

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The methods used in developing the quality metric for latent prints (e.g., gradients of image pixel density, contrast, wavelet approximations, …) likely can be applied to the development of quality metrics for other pattern evidence, such as bullet cartridges, tool marks, and shoe prints.


WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? As described, if the image has a low quality score, and the low score can be shown to be associated with a low probability of a correct call, the manager may decide to allocate human resources to other forensic tasks that have higher probabilities of accuracy.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? The quality metric algorithm will be disseminated to forensic laboratories.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? Having an estimate of the predicted accuracy of the call based on an objective (vs subjective) metric will add great credibility to examiners’ assessments.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? N/A

Changes/Problems Actual or anticipated problems or delays and actions or plans to resolve them:

Huckeman and Richter have not yet submitted the code for their algorithm, but we hope they will do so next quarter (Y2Q4). Swofford is delayed in submitting the code for his DFIQI algorithm; he expects it will be ready sometime in summer or fall 2017. Via email on 27 Mar 2017, Susan Ballou approved our request to revise our timeline for deliverables on this project so DFIQI can be included in our comparison. The new timeline is given below.

Revised timeline

2017 | Kafadar | Implementation | Consent to Participate | Aug 31, 2017 | Yr 3
2017 | Kafadar | Recruit Labs to participate | Consent to Participate | Dec 31, 2017 | Yr 3
2018 | Kafadar | Distribute Test prints | Distributed Data Sets | May-Jun, 2018 | Yr 3
2018 | Kafadar | Follow-up with Labs | Communication Transmissions | Sep 30, 2018 | Yr 4
2018 | Kafadar | Analyze Results | Report of Analysis | Jun 30, 2018 | Yr 4
2018 | Kafadar | Analyze Results | Report of Test Results | Dec 31, 2018 | Yr 5
2019 | Kafadar | Analyze Results | Report of Test Results | Dec 31, 2019 | Yr 5

DIGITAL FORENSICS

Project D - StegoDB: An Image Dataset for Benchmarking Steganalysis Algorithms

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Jennifer Newman

Other Investigators: Yong Guan (Iowa State University), Min Wu (University of Maryland)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The broad goal of this project is to design and create a database that provides a standard, authenticated image dataset for benchmarking steganalysis tools used by academic and forensic scientists. In quarters three and four, we added image data to the database; continued our steganalysis experiments to exemplify the use of our collected data; worked with Alan Dotts to develop a website that will make our StegoDB database publicly available; and submitted abstracts to, and attended, conferences. Finally, we ran a reverse-engineered, computer-equivalent version of the mobile phone stego app PixelKnot to generate thousands of cover and stego images automatically, which allowed us to develop general steg detection algorithms using machine learning classifiers and to run steganalysis experiments on the resulting data. We applied StegoHunt and StegDetect to several thousand images and compared their steg detection results with those of the machine learning classifier. Neither StegoHunt nor StegDetect detected a single stego image produced by PixelKnot. The machine learning classifier, however, detected a large percentage of the stego images, with correct classification rates ranging from 81.7% to 99.2% depending on the size of the embedded message. We are now examining a second mobile app.

The year 2016-2017 focused on 12 tasks:

Task Description                                                        Date Due
Install software to run database                                        9/30/2016
Identify fields for image records                                       9/30/2016
Identify image parameters to process for database fields (noise, etc.)  9/30/2016
Identify statistical measures to use for database evaluation            9/30/2016
Begin code implementation for parameters processing on data             12/31/2016
Begin code implementation for statistical measures                      12/31/2016
Use social media, photography phone apps for cover images               12/31/2016
Begin process to identify crime labs to collaborate with                12/31/2016
Create cover, stego images, side info, enter into database – 50 images  12/31/2016
Identify and implement academic and other steganalysis algorithms       3/31/2016
Write code for web site request, email with link to download            3/31/2017
Write manuals, final reports, submit manuscripts                        5/31/2017
Populate database with 500 images                                       5/31/2017

EXECUTIVE SUMMARY OF THE PROJECT. All twelve tasks during this first year of our project have been accomplished. Our main goals were to (1) create a set of data that allows benchmarking of steg detection algorithms in a reliable and repeatable manner; (2) populate the database with data; and (3) evaluate and validate the database. Our decision to collect only mobile phone photos makes our database unique among image forensics databases. We have collected almost 50,000 original photos, compared with a target of 500 in our proposal. No such dataset existed at the start of our project. By focusing on mobile phone photos, we provide forensic practitioners and academic steganalyzers with a collection of authenticated photos from mobile phones, which are now a part of many forensic investigations. Additionally, we worked with IT personnel in ISU’s College of Liberal Arts and Sciences to help propose a public data portal that CSAFE will use as a repository for our and others’ forensic data. To provide additional support for image forensic practitioners, we pursued a second focus on mobile phone steganography apps. Using images we generated with mobile steganography apps, we performed initial benchmarks of commercially available steg detection software (StegoHunt) and software developed in-house by DCCI (StegDetect). Our team reverse-engineered the stego app PixelKnot and created a computing environment that emulates PixelKnot running on a mobile phone. This enabled us to create many thousands of stego images in large batches on the computer, as if they had been generated using PixelKnot on the phone. With these stego images, we then trained machine learning classifiers whose steg detection accuracy is higher than that of the only other software available for detecting app-generated stego images (StegoHunt and DCCI StegDetect).
In addition, we are investigating a direct method for detecting stego images produced by PixelKnot, rather than relying on machine learning classification. We are currently extending our reverse-engineering efforts to other apps. In summary, our effort to produce a dataset of mobile phone images that is useful for benchmarking steg detection software has been very successful.

Our dissemination efforts include a number of internal presentations on our work and presentations at some CSAFE-NIST conferences in the past year. Abstracts have been accepted at two larger forensic conferences, where the PI will present our work: the joint NIST-FBI second International Symposium on Forensic Science Error Management, held in Gaithersburg, MD, July 24-27, 2017; and the Tenth International Conference on Forensic Inference and Statistics, held in Minneapolis, MN, Sep. 5-8, 2017. We will submit a conference paper by August 15 to Electronic Imaging 2018: Media Watermarking, Security, and Forensics 2018, and are also preparing a journal manuscript.

TASK DETAILS.

First Quarter. The first quarter was spent on designing the data collection process and the dataset storage framework. We chose a NoSQL database, MongoDB, as it provides flexibility in creating additional data fields to accommodate the evolving data collection process in future years. We installed an integrated development environment (IDE) called PyCharm to make the coding tasks easier to implement. We selected ISO and exposure time as the camera settings that encompass statistical variability in the photo collection process. We identified the fields for our image data and the parameters that we will extract from each image. The statistical measures for evaluation of the database were identified, and we performed an analysis of the few current forensic databases. From this analysis, we drew insights to update our guiding principles for design of the database. The guiding principles we created follow good practices for forensic purposes, for statistical quantization of information on the photos, and for scientific repeatability of the experiments.

Four guiding principles for constructing the database for StegoDB:

1. Data are copyright-free.
2. Authentication of the origin/pedigree of each image.
3. Collection of a minimum list of specific information on each image in the database, including statistically relevant factors.
4. Collect data in such a way that experiments using StegoDB image data are repeatable. Specifically, if two different sets of statistically similar data are given to two different users to perform an experiment, the results of each experiment can be expected to be similar, within some standard deviation range.

We identified fields to store in our database. These are:

1. Original file name, represented as a unique path name to a specific image
2. Cover file name, represented as a unique path name to a specific image used as a cover image
3. Stego file name, represented as a unique path name to a specific image used as a stego image
4. Image file format (JPEG, DNG, PNG, etc.)
5. App name and file of stego app itself (unique path), for a stego image
6. Version number of app used, if stego image
7. Payload itself, if text; or unique path name to image if data embedded is an image
8. Passwords/keys if any needed for embedding
9. Phone name (unique)
10. Phone model
11. Meta data itself, stored with subfields for its information
12. Image dimensions (# of rows, # of columns)
13. ISO and exposure time settings.
14. Photo app if used and any associated information (default settings, etc.)
15. Pixel-value saturation parameters.
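
The fifteen fields listed above map naturally onto a single document in a schema-free store such as MongoDB. The following is a minimal sketch only: the key names, the ImageRecord class, the to_document helper, and the example values are all illustrative assumptions, since the report lists the fields but not the keys actually used in StegoDB.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ImageRecord:
    """One StegoDB-style image record; all key names are hypothetical."""
    original_path: str                              # 1. unique path to original image
    cover_path: Optional[str] = None                # 2. unique path to cover image
    stego_path: Optional[str] = None                # 3. unique path to stego image
    file_format: str = "JPEG"                       # 4. JPEG, DNG, PNG, etc.
    stego_app_path: Optional[str] = None            # 5. stego app name/file
    stego_app_version: Optional[str] = None         # 6. app version, if stego
    payload: Optional[str] = None                   # 7. text payload or path to image
    password: Optional[str] = None                  # 8. password/key, if any
    phone_name: str = ""                            # 9. unique phone name
    phone_model: str = ""                           # 10. phone model
    metadata: dict = field(default_factory=dict)    # 11. EXIF-style subfields
    rows: int = 0                                   # 12. image dimensions
    cols: int = 0
    iso: Optional[int] = None                       # 13. ISO and exposure time
    exposure_time_s: Optional[float] = None
    photo_app: Optional[str] = None                 # 14. photo app, if used
    saturation: dict = field(default_factory=dict)  # 15. saturation parameters


def to_document(record: ImageRecord) -> dict:
    """Convert a record to a plain dict, dropping optional fields left unset."""
    return {k: v for k, v in asdict(record).items() if v is not None}


# Made-up example values for an original photo with no stego version yet.
record = ImageRecord(
    original_path="originals/phone01/IMG_0001.dng",
    file_format="DNG",
    phone_name="phone01",
    phone_model="ExampleModel",
    rows=3000,
    cols=4000,
    iso=100,
    exposure_time_s=1 / 60,
)
doc = to_document(record)
```

Because a document store does not require every record to carry every field, an original photo can omit the stego-related keys entirely, which is in line with the flexibility cited above as the reason for choosing a NoSQL database.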

Second quarter. During the second quarter, we finished code implementation for the parameters that we process and store for images in the database. These include importing EXIF data for each original image stored in StegoDB; calculation of the saturation values for both the high and low ends of an image’s intensity; size or dimension of each image; image format (TIFF, RAW, PNG, JPEG); scene information (outdoor or indoor); and separate fields for ISO and exposure time settings. We investigated methods for calculating or estimating the noise level of an image, expecting to use such an algorithm to estimate inherent camera noise for original images in StegoDB. However, every noise estimation algorithm we were able to find in the literature performs well only when the amount of noise is much higher than the known inherent camera noise. The algorithms and models are very poor at estimating noise when it is a small fraction of the image magnitudes, as is the case for inherent camera noise [1] or noise introduced by steganography. These noise algorithms may be better suited to other purposes, such as estimating noise in image data after heavy compression or social media use (Facebook, for example, downsamples images stored on its servers). Ultimately, although we wrote code to extract a “noise” feature from an image, we decided not to include a measure of “image noise” in our database.
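
As an illustration of the saturation parameters mentioned above, the following minimal sketch computes the fraction of pixels at the low and high ends of an 8-bit grayscale image. The thresholds (0 and 255), the function name, and the tiny example image are assumptions; the report does not state how StegoDB defines saturation.

```python
def saturation_fractions(pixels, low=0, high=255):
    """Return (fraction of pixels <= low, fraction of pixels >= high).

    `pixels` is a list of rows of 8-bit grayscale intensities.
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        raise ValueError("image has no pixels")
    n_low = sum(1 for row in pixels for p in row if p <= low)
    n_high = sum(1 for row in pixels for p in row if p >= high)
    return n_low / total, n_high / total


# Tiny made-up 2x4 image: one fully dark pixel and two fully bright pixels.
image = [[0, 17, 128, 255],
         [64, 200, 255, 33]]
low_frac, high_frac = saturation_fractions(image)
print(low_frac, high_frac)  # 0.125 0.25
```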

Standard, state-of-the-art steg embedding and detection algorithms used in academia were implemented in code during Quarters 2 and 3 for our database experiments. We chose the machine learning classifier known as the “ensemble classifier,” developed by J. Kodovsky of Dr. Jessica Fridrich’s group [??], and three top embedding algorithms [???2]. These algorithms are used consistently by authors publishing steganalysis work in the top research journals. Our goal is not to show an improved steg classifier, but to explore other nonconventional factors that may be useful for benchmarking steg detection algorithms.

Near the end of Quarter 2, we received our first computer and all twelve mobile phones, and spent time making them all operational. Later, in Quarter 3, we purchased a second computer, as we had three students working on the project and only one computer. For the mobile phones, this entailed updating the operating system software on each phone, creating Iowa State University netids (email accounts) for each one so that wifi was available, and establishing Google and/or Apple accounts for each of the twelve phones. We then downloaded and investigated camera apps and stego apps for Android and Apple OS, and decided which ones were appropriate to use. We started the data-taking process and began populating the database with cell phone images. We also began designing and implementing an experiment that uses our database in a way different from any published work in steganalysis, to show how such a database can be used to benchmark steg detection performance. In addition, at the end of Quarter 2, we requested and received additional funding for one graduate student for six months to reverse-engineer several Android stego apps and get their code executing on a computer. Our goal was to be able to batch-generate stego images on a computer, rather than use the tedious process of generating stego images using the app on the phone. By the end of Quarter 4, we were able to demonstrate that our procedure for reverse-engineering the app was successful, allowing us to generate many thousands of stego images on a computer, much faster than we ever could by hand.

Also in Quarter 2, the database framework was completed and we started populating the database with trial data. Image data were uploaded to the database by writing a Python script, and appropriate fields were created. The “toy” database was used to develop proof of concept results during the past several months in advance of our first experiment. Computer programs were written to write parameter values into the database.

In Quarter 2, the PI began the process of identifying crime labs for collaboration and feedback about steg detection in the wild. Progress in this area was finally made only near the end of Quarter 4, when we made contact with Mr. Bill Eber, manager of the DCCI Crime Lab. He supplied our team with a copy of StegDetect, the steg detection software developed and used in his lab. We call this software program “D-StegDetect.” Together with the commercially available StegoHunt by Wetstone, it makes up the non-academic forensic software that we have tested for its ability to perform steg detection. When we ran several thousand stego and cover images through StegoHunt and D-StegDetect, neither program detected any of the stego images produced in our lab by various methods of embedding.

StegoHunt detected zero stego files out of 2063 cover images and 1883 stego images. Of the 2063 cover images, 1250 were flagged for “carrier anomalies-file of interest shows inconsistent data structure”. This was likely due to their PNG file format, which may have differed from the file signature for PNG stored by StegoHunt. The remaining cover images and all 1883 stego images had no flags at all.

D-StegDetect detected zero suspected files out of 1323 cover images and 1740 stego images. This result is unexpected because the stego images were created with the PixelKnot app, which uses F5 and which D-StegDetect claims to be able to detect.

Neither StegoHunt nor D-StegDetect made any false positive (FP) identifications (an FP is a cover labeled as a stego), so the FP rate (FPR) was 0%. However, both programs labeled every stego image as a cover; that is, neither program identified any stego image as a stego. Such an error is called a false negative (FN), so both programs have a false negative rate (FNR) of 100%. Finally, we acquired from the Internet a very old software program, also called StegDetect, created by Dr. Niels Provos around the year 2000, that claims to detect stego embedding in JPEG image files. We call this program P-StegDetect. (D-StegDetect, by contrast, can be applied to any file, including image files of any format, and covers many old steganography embedding codes.) When we applied P-StegDetect to 723 cover images (in jpg format) and 1883 stego images (also in jpg format), the results showed a 26% FPR (26% of covers were misclassified as stego) and a 74% FNR (74% of stegos were misclassified as cover). These rates are, of course, unacceptably high in practical situations.
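
The two error rates defined above can be computed directly from the image counts. The short sketch below uses the StegoHunt and D-StegDetect counts reported in this section; the function name and argument names are our own for illustration.

```python
def error_rates(n_cover, n_stego, false_positives, true_positives):
    """Return (FPR, FNR) as fractions.

    FPR = covers labeled as stego / all covers
    FNR = stegos labeled as cover / all stegos
    """
    fpr = false_positives / n_cover
    fnr = (n_stego - true_positives) / n_stego
    return fpr, fnr


# Counts reported above: neither tool labeled any image as stego.
stegohunt = error_rates(n_cover=2063, n_stego=1883,
                        false_positives=0, true_positives=0)
d_stegdetect = error_rates(n_cover=1323, n_stego=1740,
                           false_positives=0, true_positives=0)
print(stegohunt)     # (0.0, 1.0)
print(d_stegdetect)  # (0.0, 1.0)
```

A detector that labels every image as a cover trivially achieves a 0% FPR, which is why the FPR is only informative when reported together with the FNR.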

During Quarter 3 of the project, we focused on acquiring images for the image database, designing and carrying out initial experiments to demonstrate the usefulness of the steganalysis dataset, and creating images to populate the database. We spent much effort on designing the data acquisition process and developed steganalysis experiments that would highlight the use of statistical aspects of the dataset. In particular, we wanted to test the hypothesis that specific camera settings that typically introduce noise and variability into an image would impact the error rates of classifying an image as stego or innocent. This hypothesis had not previously been tested in the steganalysis community, because authenticated data gathered specifically for this kind of testing did not exist. Our goal was to collect such data for our database and use it to test our hypotheses. Previous databases in the stego community have collected data in an ad hoc manner, using the “auto” setting of the cameras, using rented still digital cameras or cameras owned by research team members, or for purposes other than steganalysis. Our image collection uses mobile phone cameras exclusively, with no still cameras; the mobile phones were purchased solely for the use of CSAFE projects; and the collection includes photos at specific ISO and exposure time settings, among other differences from previous datasets. Our novel experiments show how error rates are linked to the ISO and exposure settings of the images, and this information may be of practical use in developing future digital camera forensic applications. For example, our results show that false positive rates and false negative rates for steg detection can be reliably quantified in the context of a specific model, not a specific device. This may lead to specific tests, including error rates, for a phone that is part of a criminal investigation and used as criminal evidence. For instance, a steg detection algorithm could be created for a particular mobile phone model using phones available in a criminal laboratory setting, and that detection algorithm could then be used on a phone of the same model that is part of a criminal investigation.
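
To make the idea of error rates linked to camera settings concrete, the sketch below groups classifier outcomes by (ISO, exposure) pair and reports the FPR and FNR for each group. It is illustrative only: the function name, the tuple layout, and the six toy records are invented for this example and are not results from our experiments.

```python
from collections import defaultdict


def rates_by_setting(results):
    """Group outcomes by (ISO, exposure) and return FPR/FNR per group.

    `results` is an iterable of (iso, exposure, is_stego, predicted_stego).
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "cover": 0, "stego": 0})
    for iso, exposure, is_stego, predicted_stego in results:
        c = counts[(iso, exposure)]
        if is_stego:
            c["stego"] += 1
            c["fn"] += int(not predicted_stego)  # stego missed by classifier
        else:
            c["cover"] += 1
            c["fp"] += int(predicted_stego)      # cover flagged as stego
    return {
        key: {
            "fpr": c["fp"] / c["cover"] if c["cover"] else None,
            "fnr": c["fn"] / c["stego"] if c["stego"] else None,
        }
        for key, c in counts.items()
    }


# Toy, made-up outcomes (NOT experimental data).
toy = [
    (100, "1/60", False, False),
    (100, "1/60", False, True),
    (100, "1/60", True, True),
    (100, "1/60", True, True),
    (800, "1/10", False, False),
    (800, "1/10", True, False),
]
rates = rates_by_setting(toy)
```

The same grouping could be keyed on phone model instead of camera settings to compare model-level and device-level error rates.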

See the individual quarterly reports for more details on the accomplishments during each quarter.

In Quarter 4, we produced a document that describes the database structure and data fields contained in the NoSQL database we use (MongoDB), as well as a description of the data we have collected from our mobile phones and the data that we have produced by running steg detection algorithms on the original photo data. This document, called The Dataset and Database for StegoDB Manual, is included as Appendix Exhibit 2 for StegoDB (Project D). The dataset itself will be available later this year, through the CSAFE Forensics Portal.

We continue to run experiments using the data we have gathered. We will present our work related to the database and phone apps at two conferences: the joint NIST-FBI second International Symposium on Forensic Science Error Management, held in Gaithersburg, MD, July 24-27, 2017; and the Tenth International Conference on Forensic Inference and Statistics, held in Minneapolis, MN, Sep. 5-8, 2017. We will submit a conference paper by August 15 to Electronic Imaging 2018: Media Watermarking, Security, and Forensics 2018, focusing on our database and steganalysis experiments.

The PI also participated in the CSAFE Digital Evidence Workshop - May 8-9, 2017 at Virginia Tech Executive Briefing Center, Arlington, VA, and the CSAFE All Hands Meeting in Ames, IA, June 8-9, 2017.

Database and population of the database. We have been working with Alan Dotts and his team members at Iowa State University on organizing and presenting our database through the CSAFE Forensics Portal. They will use the MongoDB database framework we have created and populate the database with data we select for public access. They will create the website login through the Portal so that verified members of the public can download our dataset appropriately. We expect the database to be available to the public later this year.

During the fourth quarter, we acquired tens of thousands of additional original photographs using our 12 mobile phones. They were taken using the camera apps ProCam (iOS) and Manual Camera (Android), which allow photos to be saved in both a RAW format and jpg format. We trained a group of around 10 undergraduate students in our photo collection procedure, to ensure that a large share of the photos acquired were appropriate for our experiments. Some photos were deemed unacceptable due to factors such as improper orientation, cropped image sizes, or content that was too bright or too dark. The population of our database, now at around 50,000 original images, continues to grow well beyond the proposal’s initial estimate of 500, due in large part to the undergraduate students we hired.

In addition to the original images taken by the undergraduates, we have created other types of images. Starting from the original photos, we created images that are used for our experiments. Some of these images will be available in the database, and some will not. For example, to run our comparative steganalysis experiments, we created cover images of size 512×512 by cropping five subimages from each original image (see Figure 1). We use these smaller images (512×512) rather than the originals (3000×4000) because processing the full-size originals carries a much higher computational load.
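
The cropping step can be sketched as follows. The report does not state where in the original the five subimages are taken from, so the four-corners-plus-center layout below is purely an assumption for illustration, as is the function name.

```python
def five_crop_boxes(height, width, size=512):
    """Return five (top, left, bottom, right) crop boxes of size x size.

    Layout assumed for illustration: four corners plus the center.
    """
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top_c = (height - size) // 2
    left_c = (width - size) // 2
    origins = [
        (0, 0),                        # top-left corner
        (0, width - size),             # top-right corner
        (height - size, 0),            # bottom-left corner
        (height - size, width - size), # bottom-right corner
        (top_c, left_c),               # center
    ]
    return [(t, l, t + size, l + size) for t, l in origins]


# Crop boxes for a 3000-row by 4000-column original.
boxes = five_crop_boxes(3000, 4000)
print(boxes[4])  # (1244, 1744, 1756, 2256)
```

Each 512×512 crop holds 262,144 pixels, compared with 12,000,000 in a 3000×4000 original, or roughly 46 times fewer pixels per image to process.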

Figure 1. Example of five subimages cropped out from an original image and changed from color to grayscale.

A selected subset of the cover images we used to create our experiments will be available in the public database. Stego images created for the academic algorithm experiments, for example, will not be available. This is standard operating procedure in the steganalysis community, as experimenters create their own stego images from the cover images available. A representative set of the stego images created using the reverse-engineered mobile phone app, however, will be available in the public database. This is because the creation of stego images using phone apps is slow and tedious, and no such collection of stego images exists. EXIF, or metadata, and other parameters of the data will also be available in the public database.

Reverse-engineering of mobile phone apps. The fourth quarter also focused on reverse-engineering mobile phone apps for the purpose of generating stego images in large quantities. Wenhao Chen, a PhD candidate in Computer Engineering, and Yangxiao Wang, an undergraduate student in Computer Engineering, joined our team to perform these tasks. They were able to reverse-engineer one app, PixelKnot, and are working on a second. Their work has proven critically helpful in advancing our ability to develop advanced machine learning classifiers to detect stego images produced by mobile phone apps. Since we were successful in automatically generating thousands of stego images, we were able to apply machine learning classification to these images, with excellent results. Steg detection for mobile app stego images using these artificial-intelligence-based approaches has not been reported in any literature we have been able to find. Mobile phone apps are one way to send an image that has a hidden message, and apart from some commercially available software such as StegoHunt, or federal-lab-developed software such as D-StegDetect, there appear to be no steg detection capabilities available. Our findings show that academic machine learning algorithms are successful at identifying stego images produced by a mobile phone app. We also compared steg detection using academic classifiers with StegoHunt and StegDetect; see the results discussed above.

WHAT WAS ACCOMPLISHED? Significant Results

(1) Our experiments continue to show that using the camera settings of exposure and ISO value can not only drastically reduce the error rates of standard academic steganalysis classifiers, but also allow classification of stego images taken with other phones of the same model;
(2) We continue to explore algorithm-blind detection using our database, showing instances where classifiers that use one embedding algorithm can detect stego images embedded using a different algorithm, at an error rate significantly lower than random;
(3) We continue to increase the data in our database by tens of thousands of original photos from our 12 cell phones with specific settings including ISO and exposure time;
(4) We created a successful machine-learning algorithm using many thousands of stego images created by a phone app;
(5) We successfully reverse-engineered one stego phone app (PixelKnot) and have ideas for detecting stego images embedded using PixelKnot by direct inspection of an image, without using a machine learning algorithm;
(6) We are investigating a second stego app, DaVinci, to develop a steg detection algorithm and direct-inspection method similar to those for PixelKnot;
(7) We established a contact, Bill Eber at DCCI, who provided our team with a copy of the software his lab uses for steg detection, called StegDetect; we ran several thousand images through this software, including stego images, and no stego images were detected by DCCI’s StegDetect.
(8) One other commercial software package, StegoHunt, was also tested for steg detection and detected no stego images, although approximately half of the images were stego images.
(9) A free steg detection program available on the Internet, Provos’ StegDetect, had very high false positive and false negative rates, and could be applied only to jpeg image files.

Separately, the database is in the process of being moved to a platform where the public has access to well-vetted portions of our image data. This is being implemented by our StegoDB team and by Alan Dotts and his team from the College of Liberal Arts and Sciences (LAS). LAS will provide IT support for all databases hosted at that site, including creation of the web-access pages. This has allowed us to focus on more process-oriented construction of our database. We now have access to the CSAFE servers and are beginning to use them for data processing and data storage.

2) Key outcomes or other achievements. The specific objectives of each project task were met. We are very excited to submit our results not only to the academic community, where we believe they will offer new directions for steganalysis research, but also to the forensic community, where exciting opportunities lie ahead for practical applications. These initial results provide inspiration for our future experiments.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Training and professional development activities include (1) weekly meetings of the faculty and students on the project, where research guidance and other professional development activities were provided to the students, including discussion of background material for the project, how to conduct effective publication searches, and the use of cell phone apps for producing stego images; (2) individual one-on-one weekly meetings with the two PhD students (conducted by Jennifer Newman), where Li Lin and Stephanie Reinders discussed weekly progress and received direction for future research activities, including research perspective, manuscript preparation, discussion of networking, and course and programmatic issues; (3) individual meetings with the Computer Engineering students, Ph.D. candidate Wenhao Chen and undergraduate Yangxiao Wang, to provide direction that integrates efficiently with the rest of the project’s goals; (4) meetings with the undergraduate students taking pictures, who learned about steganography, steganalysis, and image acquisition.

Other activities include several seminar presentations on the Iowa State University campus; a visit by the PI to Carnegie Mellon University in July 2016; and a visit by Dr. Min Wu to the Iowa State University campus in April 2017 to present a colloquium and to collaborate with the PI and other collaborators on the StegoDB project. The PI attended the NIST-CSAFE conferences in November 2016, May 2017, and June 2017 as well.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We have shared work and results with scholars, CSAFE members, and at NIST-sponsored conferences where other interested persons attended. Our abstract was accepted for presentation at the joint NIST-FBI second International Symposium on Forensic Science Error Management, held in Gaithersburg, MD, July 24-27, 2017. Our submitted abstract was also accepted at the Tenth International Conference on Forensic Inference and Statistics, held in Minneapolis, MN, Sep. 5-8, 2017. The PI will be giving presentations at both conferences. Through Steve Watson, a member of the Scientific Working Group on Digital Evidence (SWGDE), Jennifer Newman reached out to Bill Eber at the CSAFE Digital Evidence Workshop, May 8-9, 2017, at the Virginia Tech Executive Briefing Center. This resulted in the presentation of the steg detection rates of DCCI’s StegDetect and of StegoHunt at the All-Hands CSAFE Meeting on June 8-9, 2017. Other updates on our project’s results were also presented at both of these meetings.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? For the next year starting June 1, 2017, we will continue to expand our steganalysis experiments and gather results using our database. This includes continuing the experiments that use our ISO-exposure image data to explore the limitations of model-independent steg detection and the effects of algorithm-mismatch steg detection. We will analyze our experiments and will submit a manuscript by mid-August to the Electronic Imaging 2018: Media Watermarking, Security, and Forensics 2018 conference. We are preparing a manuscript for the IEEE Transactions on Information Forensics and Security. We anticipate a submission to the Digital Forensic Research Workshop (DFRWS) in January 2018, to reach the forensic community with the database construction and population, and we anticipate submitting a paper detailing the reverse-engineering results to the mobile device community. With new phone purchases, we will continue to add photos from diverse phone models to our database and use them to extend our experiments. We are also planning to collect more data on one specific model to test our hypothesis that steg detection can be model-independent and still accurate. We will make a thoroughly vetted, limited database of images available to the public later this year, for which LAS will create a webpage interface for querying and downloading images from StegoDB. We will continue to reverse-engineer mobile stego apps and construct steg detection procedures for them, including academic machine learning classifiers and targeted detection where the circumstances of the app allow. We will attempt to write an Android app that collects 10 images with one button press according to our desired image acquisition process. We will also continue our outreach to forensic practitioners so that we can direct our project renewal efforts in part towards the practicing forensic community, and we will write manuals and final reports. Finally, we will write a manual for StegoDB to facilitate its use by the forensic community.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Journal publications.

Li Lin, Jennifer Newman, Stephanie Reinders, Yong Guan. Factors correlated with improving steganalysis error rates: a database study, work in progress, to be submitted to the IEEE Transactions on Information Forensics and Security.

Stephanie Reinders, Jennifer Newman, Li Lin, Yong Guan. StegoDB: An image dataset for the practical digital image forensic community. To be submitted as a research paper to the Digital Forensic Research Workshop 2018.

Other publications, conference papers and presentations.

J. Newman, L. Lin, S. Reinders, W. Chen, Y. Guan, Y. Wang, M. Wu. StegoDB: A Dataset for Detecting Mobile Phone Steganography, accepted for presentation at the Forensic Science Error Management, International Symposium, joint with NIST, Gaithersburg, MD, July 24 - 27, 2017 (presentation only).

J. Newman, L. Lin, S. Reinders, W. Chen, Y. Wang, Y. Guan, M. Wu. StegoDB: A dataset for detecting mobile phone steganography, accepted for presentation at the Tenth International Conference on Forensic Inference and Statistics, Minneapolis, MN, Sep. 5-8, 2017 (presentation only).

Li Lin, Jennifer Newman, Stephanie Reinders, Yong Guan, Min Wu. Detecting steganography using factor-limited data, work in progress to be submitted August 15, 2017 to Electronic Imaging 2018: Media Watermarking, Security, and Forensics 2018 conference. (presentation and conference paper).

Wenhao Chen, Yangxiao Zhang, Li Lin, Yong Guan, Jennifer Newman. Extraction of core code from Android apps for generation of stego images on computer. In preparation to be submitted to a computer engineering mobile app conference.

CSAFE Seminar, Carnegie Mellon University, July 26, 2016. (Newman)

Presentations were made at Iowa State University:

CSAFE Seminar, March 6, 2017. (Newman)

Math Club, ISU, Sunday April 9, 2017. (Newman)

Mathematics Graduate Research Seminar, Feb. 22, 2017. (Newman)

Research Experience for Undergraduate Steganography Project, Seminar, June 20, 2017. (Newman)

Research Experience for Undergraduate Steganography Project: an introduction to Matlab, Seminar, June 20, 2017. (Reinders)

WEBSITE(S) OR OTHER INTERNET SITE(S) Currently we have no internet sites for dissemination. We will list the database site on the CSAFE Data Portal once that becomes operative.

TECHNOLOGIES OR TECHNIQUES Techniques for using the images in the image database will be disseminated through the papers once they are submitted.

OTHER PRODUCTS A data set of images for use in steganalysis is being constructed and the data increases each month. Once the data is organized and cleaned of extraneous effects, including images not appropriate for inclusion in the database, it will be ready for public use later in the year of 2017. The LAS server will host this database. It will be searchable through a query and a set of images can be downloaded for use in research activities.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT?

• Dr. Jennifer Newman, Project PI, has worked one person-month on this project for the year June 1, 2016-May 31, 2017. She is the main leader of the research, and works closely with Dr. Yong Guan and Dr. Min Wu. Dr. Newman is considered the expert on steganalysis and steganography for the research team. She is responsible for producing written reports. She is not collaborating internationally. She received salary support during the summer 2017.

• Dr. Yong Guan, Project Co-PI, has worked one-half of a month on this project for the year June 1, 2016-May 31, 2017. He provides expertise in cyber forensics and mobile apps, especially outside of steganalysis. His unique perspective on outside disciplines contributes to novel ideas within the scope of our research topics. This person is not collaborating internationally. He received salary support for his work.

• Li Lin is a PhD candidate in the Department of Mathematics at Iowa State University and is one of the three students supported by this project’s funding. His role in the project is to provide solutions for the StegoDB design, evaluation and validation processes and measures. He has a solid background in statistics. He is not collaborating internationally. Li has received graduate research assistant salary during this time.

• Stephanie Reinders is a PhD candidate in the Department of Mathematics at Iowa State University and is one of three students supported by this project’s funding. Stephanie’s role in the project is to provide solutions for the StegoDB relating to the construction of the database and its software code, and to create and organize the usage procedure for data collection using the mobile phones. She is also responsible for conducting experiments on algorithm-mismatch hypotheses. She is not collaborating internationally. Stephanie has received graduate research assistant salary during this time.

• Wenhao Chen is a PhD candidate in the Computer Engineering program at the Department of Electrical and Computer Engineering at Iowa State University. He is one of three graduate students supported by this project, since January 2017. His role in the project is to provide knowledge and experience in the code aspects of stego apps particularly in reverse-engineering of apps and emulating their code execution on computers to provide our group with knowledge of the apps’ algorithm as relates to producing stego images. He provides expertise in developing automated tools that include the core functions of the stego apps. He and Yangxiao Wang have experience in using Dex2Jar and APKTool that may help us convert Dexcode format of the stego apps into Smali and/or Java formats. In this format, the code can be run on a conventional computer. He is also responsible for keeping the computers maintained and updated for the specific performances we require for our database and steg detection algorithms.

• Dr. Min Wu has worked one-half of a month on this project for the year June 1, 2016-May 31, 2017. Dr. Wu’s role is to provide expertise in the area of micro-signal modeling in image data, provide ideas to further innovation in our project, and collaborate in manuscript preparation and conference direction with Dr. Newman and Dr. Guan.

• Yangxiao Wang is an undergraduate student in the Computer Engineering program at the Department of Electrical and Computer Engineering at Iowa State University. He will be a senior this upcoming year. His role is to provide knowledge of app code and Dex2Jar, and help write phone apps should we need them (expected in this upcoming year’s tasks).

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? We have exchanged information with CSAFE researchers at Carnegie Mellon University. We have also been in contact with professionals at SWGDE and DCCI. None of these collaborations has involved any financial or in-kind support.

Iowa State University has awarded Jennifer Newman a Faculty Professional Development Assignment (sabbatical) for the last six months of 2017 at Carnegie Mellon University to provide more time to dedicate to her research.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The need for large-scale standardized corpora (datasets) in forensic science was established by the National Academy of Sciences in its 2009 report [1]. No dataset is currently established as a standard, authenticated image dataset for benchmarking steganalysis tools by forensic scientists. Our goal is to provide an image dataset useful for both crime lab forensic researchers and the academic community (“forensic scientists”) so that the reliability and reproducibility of steganalysis software can be evaluated. This may address Daubert’s requirement that an expert’s testimony pertain to “scientific knowledge” and help establish a standard of “evidentiary reliability” (Rule 702, Testimony by Expert Witnesses, Daubert v. Merrell Dow Pharmaceuticals, Inc. 1993) for steganalysis of image data.

The images that we have collected for our StegoDB have unique properties, including:

1. We used cell phones to collect photos.

2. The photos are in RAW format, as we used phone apps that allowed RAW image types to be collected, whereas no other database of this type has specifically collected any cell phone data.

3. We have authenticated information on each image: EXIF data and indoor/outdoor scene. In particular, much of our data has been collected using specific ISO and exposure times that were purposefully set using the phone app. We also collected photos using the “auto” setting of the camera app, but most of our data was taken using pre-determined ISO and exposure settings. No other database has images taken under these conditions. All data in other publicly available image databases were taken using the “auto” setting, using still cameras.

4. We have almost 45,000 original images and approximately 70,000 other images. We will vet these images and select a subset to be uploaded to the server database for public access later this year.

5. We now have a number of stego images created by one stego app (PixelKnot) and are in the process of adding more stego images from a second app. The PI has not been able to find any other published research where stego app data has been used in steg detection applications.

By taking images with specific ISO and exposure times, we have been able to show that these factors statistically affect (by reducing) the error rate. This would not have been possible without the use of the data we have collected. Our experiments using algorithm cross-detection are also showing some surprising results, again due to the nature of the data that we have in our database. We have collected the data in a well-planned and meticulous manner, so that steganalysis experiments can be conducted by varying only a few parameters and determining their impact on error rates. The experimental procedures we use are well-documented and should be reproducible by others in the steg community. We hope our results may lead the steganalysis community to explore other avenues of research not yet conceived.
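As a rough illustration of how results from such an experiment might be tabulated, the sketch below groups detector decisions by ISO setting and computes one error rate per group. The record layout, the function name, and the choice of the equal-prior error (the average of the false-alarm and missed-detection rates) are assumptions made for this example; the report does not state which error measure was used.

```python
from collections import defaultdict


def error_rate_by_iso(records):
    """Average detection error per ISO setting.

    records: iterable of (iso, is_stego, predicted_stego) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    counts = defaultdict(lambda: {"fa": 0, "cover": 0, "md": 0, "stego": 0})
    for iso, is_stego, predicted in records:
        c = counts[iso]
        if is_stego:
            c["stego"] += 1
            c["md"] += int(not predicted)  # stego image missed by the detector
        else:
            c["cover"] += 1
            c["fa"] += int(predicted)      # cover image flagged as stego
    rates = {}
    for iso, c in counts.items():
        if c["cover"] == 0 or c["stego"] == 0:
            continue  # both classes are needed to estimate the error
        p_fa = c["fa"] / c["cover"]
        p_md = c["md"] / c["stego"]
        rates[iso] = 0.5 * (p_fa + p_md)   # assumed equal-prior error measure
    return rates
```

Comparing the resulting per-ISO rates is only a first step; deciding whether a difference is statistically meaningful requires a separate test.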

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Within the area of steg detection, it may be possible to construct model-specific steg detectors that could help determine whether a particular form of mobile phone steganography is present in a photo. Outside the area of steganalysis, forensic image analysis could be impacted due to the controlled nature of the photos in our database. Those working on camera identification, forgery detection, and other image forensic topics may be interested in using the data in this database for research purposes. In addition, it may impact forensic practitioners or commercial software companies, who could use this database for benchmarking commercial and/or internal software programs for steg detection.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? This database is the first of its kind in this field, and the importance of such databases to the field of steganalysis and forensic imaging will be incorporated into the digital image forensics class taught at Iowa State University.

WHAT IS THE IMPACT ON PHYSICAL, INSTITUTIONAL, AND INFORMATION RESOURCES THAT FORM INFRASTRUCTURE? This database will be part of the reference datasets that CSAFE will manage through its Data Portal, and it will be available to the public for research, teaching, and other uses.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? This research does not provide new technology, but could be utilized to benchmark new commercial software programs that use steganalysis.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? We hope that this research will provide a valuable resource for the digital image forensic community and show that it is important to consider details of photo acquisition as part of the database design.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? No money has been spent in foreign countries.

Changes/Problems Nothing to Report.

References

1. Liu, Xinhao, Masayuki Tanaka, and Masatoshi Okutomi. "Practical signal-dependent noise parameter estimation from a single noisy image." IEEE Transactions on Image Processing 23.10 (2014): 4361-4371.

Project J - Mobile App Forensic Analysis

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Yong Guan (ISU)

Other Investigators: Neil Gong (ISU)

Graduate Students: Chris Cheng (Ph.D. student ISU), Le Zhang (Ph.D. student ISU), Zhen Xu (MS student)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? We are building an automated Android app analysis tool to discover all the possible evidence that an app generates, in the form of files in local storage, SQLite databases, and information sent to remote third-party server(s). With it, we will work with NIST to establish a dictionary-like database that includes apps and all the possible evidential data (type, location, and data format) that the apps generate and store on mobile devices or remote servers.

ACCOMPLISHMENTS: WHAT WAS DONE? WHAT WAS LEARNED?

1. Requirement analysis, problem formulation, and initial solution designs

We have completed the initial requirement analysis and prototyped a working solution (static app analysis) that allows us to complete a small-scale set of evaluations. From the experimental evaluations, we have identified several critical challenges to be addressed to make the tool more effective and to cover most apps. These problems have been formulated and described in detail in our Year 3 proposal.

2. A software tool based on static and dynamic analysis approaches for Android mobile app analysis;

We have completed the static app analysis tool and are working on extending it to cover a larger set and more types of apps. A dynamic analysis tool with limited functionality is being built; so far, the taint tag propagation has been completed. We have performed evaluations using 5 selected large apps.

3. Technical report including use, evaluation and validation.

We have completed an initial set of evaluations and included them in the technical reports. We are refining the reports into papers to be submitted to the IFIP 11.9 Digital Forensic Conference and the IEEE Symposium on Security and Privacy.

4. Manuscripts to journals; conference presentations

We have presented the project and some initial results at IEEE CNS Workshop. Two papers are being prepared to be submitted in August/September.

5. Consult NIST & crime labs for effective use of mobile app analysis tool in their labs.

We have been working closely with NIST technical collaborators since the start of the project. Once the tools are mature, we will work with crime labs on further evaluations and on use in actual casework.

In Year 2, we prototyped a static program analysis tool that can analyze Android apps that are neither too large nor too complex, such as Yahoo Finance, Facebook Lite, and MyShift. We have manually evaluated our static tool using about 2,000 apps from the 110,000 apps downloaded from the Google Play store as well as a couple of other app mirror sites. The following figure shows the big picture of our static program analysis tool:

We analyzed the experimental results to answer two questions: what errors are possible, and how complete the results are. We have implemented the tool using two methodologies: backward tracking and forward monitoring. The backward approach is efficient, but has limitations in handling aliases (or call by reference, such as passing objects as method arguments). The issue is that we have difficulty discovering the data types of the evidence because of the uncertainty in the app’s callback sequences. The forward approach handles such situations better, but its performance is not as good as that of the backward approach because it searches some unnecessary program paths. Also, the completeness of the sink and source method lists is critical both for reducing or eliminating errors and for guaranteeing that all possible evidence from the apps being investigated is discovered. Thirdly, the use of 3rd-party libraries is common in many Android apps. After compilation, the code from a 3rd-party library is merged with the app’s own code. Analyzing apps that share the same 3rd-party library repeats the same analysis again and again, which increases the running time and degrades the performance of our static program toolkit. Lastly, our tool is based on Soot, which has inherent limitations in handling some apps. In Year 3, we plan to continue developing solutions that tackle these problems, as well as automating the process of extracting information about apps and writing it into the app database.

Meanwhile, we are working on the ART-based dynamic program analysis tool with the goal of analyzing large apps like Snapchat, Instagram, Wechat, Twitter, Facebook, etc. We have finished the taint propagation part of the tool. Our dynamic tool works as illustrated in the following figure:

In our dynamic analysis toolkit, adb logcat serves as the center of the analysis of apps and communicates the tainted data and tags with emulators and real devices. Our tracking component dumps the information to logcat, and that information is written to a file for later off-line inspection. The prototype of taint propagation has been completed. Our plan for the dynamic analysis tool in Year 3 is to add more types of flags to trace data types from source to sink, and to improve the performance of app analysis by creating separate analysis components that run within the Android emulator and moving part of the functions outside the emulator, on the same analysis machine, to speed up the analysis process.
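One common way to represent several taint tag types is a bit mask in which each tag occupies one bit and propagation takes the union of the operands' tags. The sketch below illustrates that idea only; the tag names and helper functions are hypothetical, and it does not describe how the ART-based tool stores or propagates its tags.

```python
from enum import IntFlag


class TaintTag(IntFlag):
    """Hypothetical tag types; each tag occupies one bit."""
    NONE = 0
    FILE = 1
    NETWORK = 2
    DEVICE_ID = 4
    LOCATION = 8  # example of a tag type that could be added later


def propagate(*operand_tags):
    """The result of an operation carries the union of its operands' tags."""
    result = TaintTag.NONE
    for tag in operand_tags:
        result |= tag
    return result


def reaches_sink(value_tag, sink_mask):
    """True if the value carries at least one tag the sink is monitored for."""
    return bool(value_tag & sink_mask)
```

With this representation, adding a tag type costs one more bit per tracked value, which is one reason the number of tag types and the memory available for tags are linked.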

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? PI Guan gave an invited talk on the mobile app analysis project at the Network Forensics Workshop of the IEEE Communication and Network Security Conference (CNS 2016). The supported RA student Chris Cheng will present our preliminary results at the Forensic Science Error Management International Forensics Symposium.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? PI Guan gave an invited talk at the Network Forensics Workshop of the IEEE Communication and Network Security Conference (CNS 2016). The supported RA Chris Cheng will present our preliminary results at the Forensic Science Error Management International Forensics Symposium, 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? For Year 3, we will build on our initial Year 2 success with both the static and dynamic app analysis tools, and we plan to conduct the following research tasks to tackle the challenges and problems. Task 1 focuses on the static app analysis tool, while Task 2 focuses on the ART-based dynamic analysis tool.

Task 1: Static App Analysis

Task 1.a. Source and Sink Methods Classification and Categorization: The coverage of sink methods largely determines the completeness of the results, and the coverage of source methods determines the precision of the reported evidence data types and syntax. We plan to classify source and sink methods so that we have an (almost) complete set of both. The results from this task will benefit both the static (Task 1) and dynamic (Task 2) analysis of the project.

Task 1.b. Call Graph and Event Sequence Modeling: We plan to extend the support of Fragment in Android OS, trim down the call graph to improve the efficiency in static app analysis, and include inter-app communication model in the call graph.

Task 1.c. 3rd Party Library Analysis: We plan to perform research on 3rd party library parsing, summarization and isolation. The challenge is that the code from a 3rd party library is often mixed with the app’s own program. If this is not addressed, 3rd party libraries such as Ads often waste significant time and resources in our app analysis. Also, Soot usually fails to retrieve the 3rd party library and treats it as a phantom class. We plan to create our own summary for each 3rd party library in order to improve the completeness and efficiency of our app analysis.
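The benefit of per-library summaries can be sketched as a cache: a library is analyzed once, and every later app that contains the same library reuses the stored summary. The class below is a minimal illustration under a strong assumption, namely that a library's code can be isolated and identified by a hash of its bytes; in practice library code is merged with app code and isolating it is part of the research problem. The class name and the `analyze` callback are hypothetical.

```python
import hashlib


class LibrarySummaryCache:
    """Reuse one analysis summary per distinct library (illustrative only)."""

    def __init__(self, analyze):
        self._analyze = analyze   # expensive analysis function, supplied by caller
        self._summaries = {}      # content digest -> summary
        self.misses = 0           # number of times the analysis actually ran

    def summary_for(self, library_bytes):
        # Assumption: identical bytes mean the same library version.
        digest = hashlib.sha256(library_bytes).hexdigest()
        if digest not in self._summaries:
            self.misses += 1
            self._summaries[digest] = self._analyze(library_bytes)
        return self._summaries[digest]
```

Under this scheme, the analysis cost for a shared library is paid once across the whole app collection rather than once per app.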

Task 2: ART-based Dynamic Analysis

Task 2.a. Enhancing Taint Propagation by Adding Types of Tags: We prototyped the taint propagation with a set of identified taint types in Year 2. To make the tool more complete, we plan to add more taint tag types and work out a solution that allows the additional tag types to be added to the heap space of the process.

Task 2.b. Dynamic Analysis Using Real Device: We plan to use a real phone with full features granted to perform dynamic analysis on certain apps. This will allow the app being tested to access real sensors on the device. Some applications can detect whether they are running on an emulator or a real phone. If on an emulator, the app may attempt to hide its true functional features. A limitation is that with one phone, only one app can be analyzed at a time.

Task 2.c. Dynamic Analysis Using Emulator: We plan to carefully re-design the ART-based app analysis tool by keeping the analysis components that run within the Android emulator and moving other functions outside to speed up the analysis process. The advantage is that running the dynamic app analysis on an emulator costs much less. In theory, one desktop can run many emulators at nearly no cost, while one phone can only do one task at a time.

Tasks 2.b and 2.c will be used in combination to cross-check whether apps behave differently in terms of the possible evidence they generate.
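The cross-check between the two environments amounts to comparing two sets of findings. A minimal sketch, assuming each run yields a collection of (kind, location) pairs; that format and the function name are hypothetical:

```python
def cross_check(emulator_evidence, device_evidence):
    """Compare evidence found in an emulator run and a real-device run.

    Each argument is an iterable of (kind, location) tuples; this format
    is an assumption made for the example.
    """
    emulator_evidence = set(emulator_evidence)
    device_evidence = set(device_evidence)
    return {
        "both": emulator_evidence & device_evidence,
        "device_only": device_evidence - emulator_evidence,
        "emulator_only": emulator_evidence - device_evidence,
    }
```

A non-empty "device_only" set would be a hint, though not proof, that the app behaves differently when it detects an emulator.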

Year | Task Number | CSAFE Investigators (Yong Guan and Neil Gong) | Deliverable(s) | Delivery Schedule

2018 | Task 1 | Chris Cheng will lead Task 1, in particular Task 1.b and 1.c. One more student to be added for Task 1.a. | Working prototype of Static Analysis tool. A report details that evaluation results are done over a large number of apps. A report on classification of source and sink methods, and as input for both Tasks 1 and 2. | June 2018

2018 | Task 2 | Le Zhang will lead Task 2. The third new student will help Task 2.a. | Working prototype of ART-based Dynamic Analysis tool. A report details the results from the analysis of large apps such as Snapchat, Facebook, Instagram, Wechat, Line, etc. | June 2018

We will work with NIST collaborators to perform analysis on the apps collected by NIST.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS We submitted a paper to DFRWS 2017, but it was rejected because of the small scale of the experimental evaluation, which used 20 apps from DroidBench. The reviewers suggested that we conduct experiments using at least hundreds or thousands of apps.

TECHNOLOGIES OR TECHNIQUES Our project ultimately aims to develop a new path-, context-, and flow-sensitive Android application analysis tool that reports a complete list of forensically relevant information (evidential data) generated by the apps on a mobile device, by parsing the mobile application installation package. To determine what information is sent out or stored in local storage, we start the analysis from the sink methods. By tracking and trying to recover the arguments of the sink methods, we are able to report the source and syntax of the evidential data. With the help of our tool, forensic investigators will improve their casework in terms of efficiency, precision, and completeness, compared with current mobile device forensic practice, which relies (to some extent) on a manual process using Cellebrite and other mobile forensic tools.

The basic approach of this project is to perform backward and forward tracking on the argument(s) of the sink methods that lead to the generation and storage of evidence data. In the last several months, we improved the tracking coverage and performance of our prototype analysis tool. Through experimental evaluation of the app analysis results and a review of the working mechanism inside the tool, we found several limitations. One is that a Java field instance used as a global variable usually loses its tracking information when the analysis reaches a callback method. Since static program analysis is an off-line analysis, without the runtime environment, field instance analysis is a challenge.
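The backward direction can be illustrated on a toy, straight-line program: starting from the argument of a sink call, walk the statements in reverse and replace each tracked variable by whatever defines it, until only constants and source-method calls remain. The statement format and function below are invented for this sketch; the real tool works on Soot's intermediate representation and must also handle branches, aliases, and callbacks, which this example ignores.

```python
def backward_track(statements, sink_index, sink_arg):
    """Trace what defines `sink_arg` before the sink call (toy model).

    statements: list of (target, sources) pairs in program order, where
    `sources` is a tuple of variable names, quoted constants, or calls
    written as "name()". Straight-line code only.
    Returns (origins, unresolved).
    """
    wanted = {sink_arg}
    origins = set()
    for target, sources in reversed(statements[:sink_index]):
        if target not in wanted:
            continue
        wanted.discard(target)          # this definition explains the variable
        for src in sources:
            if src.startswith('"') or src.endswith("()"):
                origins.add(src)        # constant or source-method call
            else:
                wanted.add(src)         # keep tracking this variable
    return origins, wanted
```

When the walk reaches the top of a method with a field still in the unresolved set, the tracking has nowhere to go within that method, which is the situation described above for field instances and callbacks.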

Also, to handle cases that need either user or system event information to proceed with the tracking process, we initialize the analysis by summarizing the side effects of each variable in the method. Thereby, when a certain tracking stops at the top of a callback method, the method summary can give us hints about the evidence data of interest when another user or system callback method is invoked. Under the Android framework, we are able to import the result summaries from other reachable callback methods. Thus, the field instance tracking can proceed with further steps using the information in the summaries.
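A simplified picture of how such summaries could be used: record, for each callback, which fields it writes and with what values, then look up a field in the summaries of the other reachable callbacks when tracking stalls. The data layout, function names, and callback names below are hypothetical and stand in for the tool's actual side-effect summaries.

```python
def summarize(methods):
    """Build per-method side-effect summaries.

    methods: {method_name: [(field, value), ...]} listing the field writes
    of each callback. This toy layout is an assumption for the example.
    """
    summary = {}
    for name, writes in methods.items():
        effects = {}
        for field, value in writes:
            effects.setdefault(field, set()).add(value)
        summary[name] = effects
    return summary


def resolve_field(field, current_method, summary, reachable):
    """Possible values of `field` written by other reachable callbacks."""
    values = set()
    for name in reachable:
        if name == current_method:
            continue
        values |= summary.get(name, {}).get(field, set())
    return values
```

For example, if tracking stops on `this.userId` at the top of `onClick`, the summaries of `onCreate` and `onResume` supply the candidate values, and the backward tracking can continue from each of them.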

After parsing the targeted argument, we face a set of challenges due to the diversity in the tracking results. For instance, when an app writes evidence data to a file, the data type is usually either a String or a byte array. Both of these can be reconstructed or partially updated; thus, the recovery process requires a special step to parse the partial (set of) values and their syntax information. We currently combine side-effect analysis and String analysis to recover all possible syntax of the String or the byte array. We will evaluate the effectiveness of this approach and improve it as one of our next steps.
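To make the idea of recovering "all possible syntax" concrete, the sketch below models each string fragment as a set of alternatives and a concatenation as every combination of them, with a bracketed placeholder for values known only at run time. This is a deliberately small stand-in for a real String analysis; the helper names are hypothetical.

```python
from itertools import product


def concat(*parts):
    """Each part is a set of possible fragments; return every concatenation."""
    return {"".join(combo) for combo in product(*parts)}


def unknown(label):
    """Placeholder for a run-time value, written in brackets, e.g. [randomUUID]."""
    return {"[" + label + "]"}
```

For instance, `concat({"/data/data/[package.name]/files/"}, {"journal", "journal.tmp"})` yields two candidate paths, and replacing the second argument with `unknown("randomUUID")` yields a single path template ending in `[randomUUID]`.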

INVENTIONS, PATENT APPLICATIONS, AND/OR LICENSES Our new path-, context-, flow-sensitive Android application analysis techniques may be patentable. We will work with ISU Tech Transfer Office on the patent application, when we finish a comprehensive.

OTHER PRODUCTS With the tool, we will build a database of apps that includes details about the types of evidence, their locations, and data syntax. The database can serve as a dictionary for law enforcement practitioners to look up what types of data can be discovered, where they are, and how they are formatted.
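The dictionary-style lookup could be organized along the following lines. This is a sketch only: the record fields, class names, and the example app name used in the usage note are assumptions, not the schema of the planned database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """One piece of evidential data an app can produce (hypothetical schema)."""
    app: str
    data_type: str
    location: str      # file path, network URL, or database path
    syntax: str = ""
    table: str = ""    # SQLite table name, if any


class EvidenceDictionary:
    """Look up evidence records by app name and, optionally, data type."""

    def __init__(self):
        self._by_app = defaultdict(list)

    def add(self, record):
        self._by_app[record.app].append(record)

    def lookup(self, app, data_type=None):
        records = self._by_app.get(app, [])
        if data_type is None:
            return list(records)
        return [r for r in records if r.data_type == data_type]
```

A practitioner-facing query would then be a call such as `lookup("com.example.app", "Device ID")`, returning the locations and syntax of every matching record.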

By considering the general Java file standard APIs as sink methods, we have used our static analysis tool to analyze 2,000 real-world Android applications. The table below shows our analysis results for several example apps.

App Name, Write Format, Data Type, Data Syntax, FilePath or Network URL, SQLite Table

/data/data/[package.name]/files/mqtt_analytics

File Buffer String Journal libcore.io.DiskLruCache... /data/data/[package.name]/files/journal.tmp

File Buffer String Journal libcore.io.DiskLruCache... /data/data/[package.name]/files/journal

File Buffer String Journal libcore.io.DiskLruCache... /data/data/[package.name]/files/journal.bkp

String Device ID [java.util.UUID] /data/data/[package.name]/files/mqtt_analytics.FBNS

String Unknown /data/data/[package.name]/files/[randomUUID]

String Device ID [randomUUID()] /data/data/[package.name]/ACRA-INSTALLATION

com.graceful3715.myshift-50.apk byte[] UnKnown [android.graphics.Bitmap getDrawingCache()] /mnt/sdcard/[package.name]/calendar/shift.png

SQLite Entry UnKnown /data/data/[package.name]/databases/appc.db appc_regist_cpi

SQLite /data/data/[package.name]/databases/appc.db appc_regist_cpi

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg1

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg1

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg1

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg1

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

SQLite Entry UnKnown /data/data/[package.name]/databases/myshift.db ctg2

ca.mcgill.CL2Go-2.apk SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

SQLite Entry Network /data/data/[package.name]/databases/CL2Go.db Contacts

com.coreapps.android.followme.springtimeexpo2011-80.apk byte[] DeviceID {\"friends\":[]} http://m.core-apps.com/[2131165217]/android/friends?device_id=[android_id]&ts=1

SQLite /data/data/[package.name]/databases/user.sqlite3 s

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

SQLite Entry UnKnown /data/data/[package.name]/databases/user.sqlite3 UserScheduleItems

We also used our dynamic analysis tool to analyze several apps. The table below shows the analysis results.

App Name     Results
Facebook     Detected file-to-file flows (20+)
Snapchat     Cannot run due to the absence of Google Play Services
WeChat       Detected file-to-file flows (67)
Twitter      No catch
Instagram    Detected file-to-file flows (459)

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? PI Guan leads the group, which consists of two faculty members and three graduate research assistants. He oversees the major research activities on static taint analysis and dynamic analysis on the Dalvik/ART platforms. He contributes one person-month to the project.

Co-PI Gong works on the dynamic analysis and develops the experimental evaluation plans. He contributes one person-month to the project.

RA Chris Cheng is a Ph.D. student who joined the group in August 2016. He is supported by the departmental TA and NIST CSAFE funds in the Spring. He focuses on static taint analysis. He contributed 10 hours a week to the project (equivalent to 1.8 person-months).

RA Le Zhang is a Ph.D. student who started the work in May 2016. He is a half-time RA supported by CSAFE funds. He focuses on dynamic analysis using Dex code. He contributed 20 hours a week to the project (equivalent to 3.5 person-months).

RA Zhen Xu is an MS student who started the work in May 2016. He is a half-time RA supported by CSAFE funds. He focuses on dynamic analysis on the ART platform. He contributed 20 hours a week to the project (equivalent to 3.5 person-months).

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? NIST Division, Barbara Guttman and Jim Lyle.

Eoghan Casey (University of Lausanne), formerly at DC3, offered insightful suggestions at the early stage of the project.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The outcome of this research will help digital forensics practitioners speed up case investigations by reducing complexity and providing completeness guarantees when searching for and discovering evidence on mobile devices, thereby delivering timely investigative results and reducing backlogs at crime labs. Specifically, the toolkit can take any Android app as input and return the set of possible evidence, its location (local or remote), and its data format on the mobile device being investigated.

With the tools, we will create a database of app-generated evidence. DF practitioners can use the database much as they would use Wikipedia or a dictionary, looking up expert knowledge about the apps on the device being investigated instead of manually searching for evidence, which is tedious and time-consuming.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? The techniques and database will help train digital forensics practitioners as well as college students who plan to join the digital forensics workforce after graduation. The materials will be integrated into the CprE 536 Computer and Network Forensics course and will be shared with DF educators through the planned training workshop.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None.

Changes/Problems None.

Project S - Statistical Methods for Change Detection Over Time in Digital Forensics Data

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Padhraic Smyth, UCI

Other Investigators: Chris Galbraith (PhD student), UCI

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The overall goal of this project is the development of statistical methodology for the analysis of individual-level event data in a digital forensics context, where the data are obtained in native form from devices such as computers and mobile phones. Our focus is on event time-series data, typically of the form , where action types can include Web navigation actions, Web searches, sending of emails or text messages, social media posts, files edited, and so on. The project focuses primarily on the development of statistical methods for (1) quantification of dependence between event streams and (2) detection of significant changes over time in event streams.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Activity 1, Research on Statistical Methodology: In FY17 the focus of our work in Q1 was on the investigation of statistical methods for change detection in event time-series data. We developed a statistical method for change detection based on hidden Markov models with Poisson-distributed observations. We implemented this approach in software and evaluated the method on multiple event data sets. Our primary finding was that these models appear to provide a useful and robust baseline against which to compare more complex methods. However, the lack of ground truth is a challenge in evaluating the accuracy of change detection techniques for individual event data.
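To illustrate the general idea of change detection with a Poisson-observation hidden Markov model, the following is a minimal sketch rather than the project's software. It assumes event counts per time bin, two hidden activity states whose Poisson rates are fixed by hand (in practice they would be estimated from data), and a hypothetical `stay_prob` parameter controlling how reluctant the model is to switch states. The function names and the toy counts are illustrative only.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson(lam) distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson_hmm(counts, rates, stay_prob=0.95):
    """Most likely hidden-state sequence for a Poisson-emission HMM.

    rates: assumed Poisson rate for each hidden state (hand-picked here).
    stay_prob: assumed probability of remaining in the same state per bin.
    """
    n_states = len(rates)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch) for j in range(n_states)]
        for i in range(n_states)
    ]
    # Uniform initial distribution over states.
    score = [math.log(1.0 / n_states) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + poisson_logpmf(k, rates[j]))
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def change_points(path):
    """Indices of bins where the decoded state differs from the previous bin."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Toy hourly event counts: low activity followed by high activity.
counts = [2, 1, 3, 2, 2, 1, 2, 12, 14, 11, 13, 12, 15, 10]
path = viterbi_poisson_hmm(counts, rates=[2.0, 12.0])
print(change_points(path))  # [7]
```

A decoded state change marks a candidate change point. As noted above, without ground truth such output is best treated as a baseline for comparison rather than a validated detection.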

As a result of our ongoing research (and of feedback we obtained while attending the NIST Forensics Conference in November 2016), in Q2 we switched our primary focus from (1) the problem of statistical change detection (for a single event stream) to (2) the problem of detecting consistency or dependency among multiple event streams. (These two problems, change detection and data consistency, were listed as the two primary areas of investigation for our project in the multi-year plan approved by NIST in Summer 2016.) We realized that the latter problem was amenable to the application of statistical analysis methods and that the development of such methods had the potential to be a useful tool in a digital forensics context. In particular, in Q2, our research focused on the problem of determining whether two event streams were generated by a single source or by two different sources, and we developed a statistical methodology using score-based likelihood ratios. The method is based on a statistical framework known as marked point processes, and the particular technique we used for constructing likelihood ratios was based on the fraction of neighbors of events that are of the same type or of a different type. Using real-world student-generated event data from laptops and phones, we demonstrated hold-out true positive rates in the range of 85 to 95% and false positive rates of 3 to 10% (depending on which specific methodology we used).
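The neighbor-fraction idea and the resulting score-based likelihood ratio can be sketched as follows. This is an illustrative reading of the description above, not the published method: the score is a simple mingling-style index (the average fraction of each event's k nearest neighbors in time that belong to the other stream), the likelihood ratio is the ratio of two Gaussian kernel density estimates, and the reference scores, bandwidth, and function names are all hypothetical. It also assumes, purely for illustration, that same-source pairs tend to produce higher scores.

```python
import math


def mingling_index(times_a, times_b, k=1):
    """Mean fraction of each event's k nearest neighbours (in time) with the other mark."""
    events = [(t, "A") for t in times_a] + [(t, "B") for t in times_b]
    total = 0.0
    for idx, (t, mark) in enumerate(events):
        # Time distance to every other event, closest first
        # (ties are broken by mark label, which is arbitrary).
        others = sorted(
            (abs(t - u), m) for j, (u, m) in enumerate(events) if j != idx
        )
        nearest = others[:k]
        total += sum(1 for _, m in nearest if m != mark) / len(nearest)
    return total / len(events)


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from reference scores."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def score_based_lr(score, same_scores, diff_scores, bandwidth=0.1):
    """Ratio of the score's density under same-source vs. different-source references."""
    return gaussian_kde(same_scores, bandwidth)(score) / gaussian_kde(diff_scores, bandwidth)(score)


# Interleaved streams mingle fully; well-separated streams do not mingle at all.
print(mingling_index([0, 10, 20, 30], [1, 11, 21, 31]))   # 1.0
print(mingling_index([0, 1, 2, 3], [100, 101, 102, 103]))  # 0.0

# Hypothetical reference scores from pairs with known ground truth.
same_ref = [0.70, 0.75, 0.80, 0.85, 0.90]
diff_ref = [0.20, 0.25, 0.30, 0.35, 0.40]
lr_high = score_based_lr(0.8, same_ref, diff_ref)
lr_low = score_based_lr(0.3, same_ref, diff_ref)
```

A ratio above 1 favors the same-source hypothesis and a ratio below 1 favors the different-source hypothesis. The quality of the ratio depends on how representative the reference population is.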

During Q3 we focused our efforts on writing a paper for submission to the Digital Forensics DFRWS 2017 conference and conducting the relevant research. The paper was subsequently accepted for presentation at the conference.

In Q4 2017 we extended the general methodology that we had developed in Q2 and Q3 to take into account the distribution of inter-event times between events of type A and their nearest neighbor (in time) of type B. In experiments on a data set of Web browsing events from student laptops and phones, we found that using inter-event times to discriminate between same-source and different-source event streams leads to accuracy similar to that of one of the near-neighbor methods of our prior work (the segregation index, with a 94% true positive rate and a 3% false positive rate) and is significantly more accurate than the other near-neighbor method in our prior work (the mingling index, which had an 86% true positive rate and an 8% false positive rate). The main finding was that inter-event times provide a useful basis for distinguishing same-source from different-source event streams.
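A small sketch may help make the inter-event-time quantity and the reported rates concrete. It computes, for each type-A event, the time to the nearest type-B event, and then shows how true and false positive rates follow from thresholding a summary score. The summary (the median gap), the threshold, the toy numbers, and the assumption that smaller gaps indicate a common source are all hypothetical choices for illustration, not the procedure used in the project.

```python
from bisect import bisect_left
from statistics import median


def nearest_neighbor_gaps(times_a, times_b):
    """For each A event, the absolute time gap to the closest B event."""
    b = sorted(times_b)
    gaps = []
    for t in times_a:
        i = bisect_left(b, t)
        candidates = []
        if i < len(b):
            candidates.append(b[i] - t)      # closest B at or after t
        if i > 0:
            candidates.append(t - b[i - 1])  # closest B before t
        gaps.append(min(candidates))
    return gaps


def tpr_fpr(same_scores, diff_scores, threshold):
    """Rates when a pair is declared same-source if its score is <= threshold."""
    tpr = sum(s <= threshold for s in same_scores) / len(same_scores)
    fpr = sum(s <= threshold for s in diff_scores) / len(diff_scores)
    return tpr, fpr


gaps = nearest_neighbor_gaps([5, 20, 42], [3, 25, 40, 60])
print(gaps)          # [2, 5, 2]
print(median(gaps))  # 2

# Hypothetical median-gap scores for pairs with known ground truth.
print(tpr_fpr([2, 3, 4, 10], [8, 15, 20, 30], threshold=5))  # (0.75, 0.0)
```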

In summary, over the past year of work on this project we have developed a new and general statistical methodology for quantifying the likelihood that two time-stamped event streams were generated by the same source (individual) or by different sources (individuals). Experimental results, using both simulated and real user data, indicate that the proposed approach has significant discriminative power on these data sets and shows promise for applications in digital forensics.

Activity 2, Publications and Presentations:

• Wrote and submitted a 10-page paper on the work described above to the DFRWS 2017 US Conference. The paper was peer-reviewed by 6 reviewers and was accepted for presentation by the conference (the overall acceptance rate for submitted papers was less than 30%). The paper was revised based on reviewer comments and will be published in a special issue (of papers from the conference) of the journal Digital Investigation.
• Participated in the CSAFE Digital Forensics Workshop in Arlington, VA, in May 2017.
• Chris Galbraith and Padhraic Smyth submitted an abstract (accepted for presentation) on the research in this project, for the 10th International Conference on Forensic Inference and Statistics, due to be held in Minneapolis in September 2017.

Activity 3, Data Collection:

During Q2 we determined that we did not need UCI IRB approval to work with an existing NSF-UCI data set of user events since the data had already been de-identified. This data consists of time-stamped events recorded on computers and mobile phones of approximately 120 different individuals over a period of about 1 week, with an average of 3000 different events for each individual. In January 2017 we received approval from the Director of the Human Subjects Protection Office at NIST (Anne Andrews) that we could use this data, without a NIST IRB review, given that the data is de-identified. This data set proved to be very useful as a testbed research data set in the research described under Activity 1, where we used it to develop and evaluate statistical methods that can determine how likely it is that multiple different event streams (e.g., from different devices or different accounts) are related to each other.

We continued to work on development of the documentation for submission to the UCI IRB to seek approval for a study to collect additional data from student laptops and mobile phones at UCI. The IRB documentation is not yet ready for submission.

In terms of the status of data deliverables in Year 2:

• Progress report on initial testbed data collection; provide to NIST collaborators. A report in the form of a set of PowerPoint slides and a verbal update by conference call was provided to Dr. Barbara Guttman, James Lyle, and Alex Nelson (all from NIST) on June 29, 2017.
• Provide copy of initial testbed data set to NIST collaborators. The data has not yet been requested by NIST collaborators but is available on request.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Under the mentorship of Dr. Smyth, PhD student Chris Galbraith has gained expertise in (a) the broad field of digital forensics, (b) specific data analysis questions and challenges within the field of digital forensics, and (c) general-purpose statistical models and methodologies for analyzing event time-series data.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST?
• We presented a summary of our CSAFE-sponsored research work, on statistical methods for event data analysis, at the Forensics@NIST Conference held in November 2016.
• Padhraic Smyth participated in the CSAFE Digital Forensics Workshop in Arlington, VA, in May 2017, organized by NIST, and presented a talk describing the work on this project and its potential benefits to practitioners in digital forensics. As part of this workshop he engaged with digital forensics practitioners, CSAFE researchers, and NIST personnel in general discussions about the role and applicability of statistics in the field of digital forensics.
• Chris Galbraith attended the CSAFE Annual All-Hands Meeting at Iowa State University in June and presented a brief talk and a poster about this project.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? Activity 1, Research on Statistical Methodology: Next year we will continue to investigate and develop our statistical methodology for computing score-based likelihood ratios for determining whether two event streams were generated by a single source or two different sources. In particular, we will extend the methodology using inter-event times, which we expect will provide more discriminative information relative to our earlier approach which relies on counts of neighboring event types. We also plan to investigate the use of randomization techniques for the problem of detecting whether two event streams were generated by the same source or by different sources – in particular we will investigate the development of statistical methodologies that do not require a large population sample to create a reference model (unlike likelihood-ratio methods) and that instead rely on randomization methods given only two event streams to analyze.
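As a rough indication of what a randomization approach using only two event streams might look like, the sketch below applies a circular-shift test. Stream B is shifted by random offsets within the observation window, and the observed mean nearest-neighbor gap is compared with the gaps obtained under shifting. This particular scheme, the test statistic, and all names and numbers are assumptions made for illustration. The planned methodology may differ.

```python
import random
from bisect import bisect_left


def mean_nearest_gap(times_a, times_b):
    """Average time gap from each A event to its closest B event."""
    b = sorted(times_b)
    total = 0.0
    for t in times_a:
        i = bisect_left(b, t)
        candidates = []
        if i < len(b):
            candidates.append(b[i] - t)
        if i > 0:
            candidates.append(t - b[i - 1])
        total += min(candidates)
    return total / len(times_a)


def circular_shift_p_value(times_a, times_b, window, n_shifts=999, seed=0):
    """One-sided randomization p-value for 'A and B are unusually close in time'.

    Each random circular shift of B within [0, window) breaks any fine-scale
    alignment with A while preserving B's own internal timing structure.
    Gaps are measured linearly, so events near the window edges are treated
    only approximately.
    """
    rng = random.Random(seed)
    observed = mean_nearest_gap(times_a, times_b)
    hits = 0
    for _ in range(n_shifts):
        offset = rng.uniform(0.0, window)
        shifted = [(t + offset) % window for t in times_b]
        if mean_nearest_gap(times_a, shifted) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (1 + hits) / (1 + n_shifts)


# Toy example: every B event follows an A event by one time unit.
stream_a = [13, 97, 210, 388, 402, 577, 731, 864, 990]
stream_b = [t + 1 for t in stream_a]
p_value = circular_shift_p_value(stream_a, stream_b, window=1000)
print(p_value)
```

A small p-value suggests the two streams are more tightly coupled in time than random alignment would explain. Unlike the likelihood-ratio approach, no reference population is needed, but the result is a significance statement rather than a likelihood ratio.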

Activity 2, Publications and Presentations: We plan to present our research results at the International Conference on Forensic Inference and Statistics, in Minneapolis in September 2017. We will also begin work on a journal paper describing our research on using inter-event times for detecting whether two event streams were generated by the same source or by different sources. In addition, we hope to submit results on new research work in Year 3 to a digital forensics conference in 2018, such as DFRWS-18.

Activity 3, Data Collection: We plan to complete the documentation and submit an IRB application at UCI for collection of a new event data set of time-stamped event data from student subjects. We plan to conduct a study where we will gather data from approximately 100 students. This new data set will span periods of multiple months per user, compared to the current data set we are using which only spans 1 week per user. The longer time-span will allow us to investigate more powerful statistical techniques and to evaluate them in more realistic scenarios in the context of the practice of digital forensics. If the IRB application is approved at UC Irvine we will submit it to the NIST IRB for approval.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Analyzing user-event data using score-based likelihood ratios with marked point processes, Christopher Galbraith and Padhraic Smyth, Digital Investigation, to appear. (Note: this is the same paper as our DFRWS 2017 conference paper, updated from our original submission following reviewer comments; DFRWS papers are published in the journal Digital Investigation.)

TECHNOLOGIES OR TECHNIQUES As reported under Activity 1 accomplishments (above), we developed a new statistical methodology for quantifying the likelihood that two time-stamped event streams were generated by the same source (individual) or by different sources, based on statistics computed from inter-event times. The methodology is based on a modeling framework in statistics known as marked point processes and can be used to generate score-based likelihood ratios that indicate how likely it is that the event streams were generated by the same source or by two different sources.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Padhraic Smyth, Professor, Departments of Computer Science and Statistics, UC Irvine, is supported at the level of one academic month per year on the project. Dr. Smyth manages the overall project, conducts research related to the project, supervises the graduate student participating in the project, interacts with relevant CSAFE investigators and NIST staff when needed, and produces reports, papers, documentation, slide presentations, etc., of relevance to the project.

Chris Galbraith, PhD student, Department of Statistics, UC Irvine, is supported at 50% time for 9 months per year of the project. He conducts research for the project, including literature reviews, development of statistical models and methodologies, development and validation of software for fitting models to data, documentation of software and experimental results, and general support for reports, papers, presentations, as needed.

Hal Stern, Professor, Department of Statistics, UC Irvine (and UCI PI for the CSAFE Project) has provided technical input and advice on statistical aspects of this project.

None of the individuals listed above have any international collaborations as part of this project.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Barbara Guttman at NIST is the technical contact at NIST for this project. Dr. Guttman has provided feedback (by phone, by email, and in person) on relevant aspects of current digital forensic research and practice, providing context for this project and providing feedback on current and future research topics being investigated under the project.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? This research has the potential both (a) to help forensic investigators improve the quality of information extracted from devices during digital forensic investigations and (b) to provide more quantitative statistical support for evidence and arguments presented in court related to digital traces of user behavior on a device.

The methodological aspects of our proposed research may lead to a broader awareness of statistical issues in digital forensics and potentially bring new researchers into the field. Subject to privacy restrictions, this project has the potential to provide testbed data sets that can be used by other researchers in the digital forensics area.

More specifically, in terms of transitioning our research work into the practice of digital forensics, we plan in the future to share our techniques and software with the open-source digital forensics software community, e.g., by speaking about our work at conferences such as the Digital Forensics DFRWS Conference, by publishing in journals such as Digital Investigation, and by making our software publicly available as open-source code for others in the digital forensics community to use.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The statistical analysis techniques we are developing could in principle be applied more broadly to the analysis of multiple time-series of event data, with potential applications in areas such as computer security, Web data analysis, medicine, and social science.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None.

Changes/Problems As described earlier in Section A, we switched our primary focus on this project from our initial focus (in Q1 2016) on the problem of statistical change detection (for a single event stream) to a new focus (from Q2 onwards) on the problem of detecting consistency or dependency among multiple event streams. These two problems, change detection and data consistency, were listed as the two primary areas of investigation for our project in the multi-year plan approved by NIST in Summer 2016. In effect, upon realizing that the data consistency (or “multiple source”) problem was both (a) amenable to the application of statistical analysis methods and (b) likely to be of interest and practical use to practitioners in a digital forensics context, we switched from the change detection problem to the data consistency problem.

BLOOD PATTERN ANALYSIS

Project P – Combining Fluid Dynamics, Statistics and Pattern Recognition in Bloodstain Pattern Analysis, to Quantify Spatial Uncertainty and Remove Human Bias

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Daniel Attinger (ISU)

Other Investigators: Kris De Brabanter (ISU), Shih-Fu Chang (Columbia University) and Greg Gillen

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The objectives of the proposed study are four-fold:

a) Assemble a world-class interdisciplinary team of experts in statistical learning, fluid dynamics and pattern recognition.

b) Quantify the spatial uncertainty in bloodstain pattern analysis (BPA). Current crime scene methods neglect key physics such as gravity and drag, and do not allow rational estimation of the uncertainty. Very recently, physics-based algorithms have been developed to predict the region of origin of the stains by inspection of a blood spatter. This is a significant advance which makes it possible to estimate and propagate uncertainties in the determination of the region of origin. The team will therefore develop algorithms that propagate the uncertainty and represent its spatial magnitude in a 3-D space.

c) Develop robust pattern recognition algorithms to identify the generation mechanism of an unknown bloodstain pattern. The classification will rely on a database of spatters produced under known conditions, including gunshot, cast-off, dripping blood and transfer stains. This will advance the state of the art, where classification is done by the human investigator. Here again the classification algorithm will provide automated interpretation and uncertainty quantification.

d) Compare the accuracy and uncertainty between physics-based trajectory reconstruction methods (this project) and mainly statistical methods (project AA). To this end, every single blood spatter generated under known conditions for this project will be provided to team AA, together with an extensive description of the physics and fluid dynamics of BPA. However, the backward reconstruction code specifically developed in this project will not be provided to team AA.

ACCOMPLISHMENTS: WHAT WAS DONE? WHAT WAS LEARNED? We assembled an interdisciplinary team of experts in statistical learning, fluid dynamics and pattern recognition. We have also continued our literature study.

A series of gunshot bloodstain experiments was performed for two different bullet types (XM193 FMJ & HP BEE), with stain patterns collected for distances of 0.1 m, 0.3 m, 0.5 m and 1.0 m. The stains were collected in the vertical and horizontal configurations. The 2D+ approach to trajectory reconstruction was taught to the new postdoc hired in the project. The method is based on statistical analysis of all the possible trajectories from a predetermined set of stains, to identify the most probable region of origin of a blood spatter. The trajectory reconstruction includes drag and gravity forces, an improvement over current methods that assume that drops travel in straight lines.
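The effect of including drag and gravity can be illustrated with a small numerical sketch. This is not the 2D+ code. It integrates a single spherical drop in a vertical plane with a fourth-order Runge-Kutta scheme, then integrates backward in time from the impact state to recover the origin, and compares that with a straight-line extrapolation along the impact direction. The constant drag coefficient, air and blood densities, drop diameter, launch conditions, and all function names are assumed values chosen for illustration.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
RHO_AIR = 1.2       # assumed air density, kg/m^3
RHO_BLOOD = 1060.0  # assumed blood density, kg/m^3
CD = 0.47           # assumed constant drag coefficient of a sphere


def derivative(state, diameter):
    """Time derivative of (x, z, vx, vz) for a sphere with quadratic drag."""
    _, _, vx, vz = state
    # Drag acceleration is k * |v| * v, with k from the sphere's area-to-mass ratio.
    k = 0.75 * RHO_AIR * CD / (RHO_BLOOD * diameter)
    speed = math.hypot(vx, vz)
    return (vx, vz, -k * speed * vx, -G - k * speed * vz)


def rk4_step(state, dt, diameter):
    """One classical Runge-Kutta step; dt may be negative for backward integration."""
    def shifted(s, d, h):
        return tuple(a + h * b for a, b in zip(s, d))

    k1 = derivative(state, diameter)
    k2 = derivative(shifted(state, k1, dt / 2), diameter)
    k3 = derivative(shifted(state, k2, dt / 2), diameter)
    k4 = derivative(shifted(state, k3, dt), diameter)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, duration, diameter, steps=3000):
    """Advance the state by `duration` seconds (negative duration goes back in time)."""
    dt = duration / steps
    for _ in range(steps):
        state = rk4_step(state, dt, diameter)
    return state


# Hypothetical drop: 1 mm diameter, launched from 1 m height at (8, 2) m/s.
origin = (0.0, 1.0, 8.0, 2.0)
flight_time = 0.3
impact = integrate(origin, flight_time, diameter=1e-3)

# Backward reconstruction from the impact state recovers the origin.
recovered = integrate(impact, -flight_time, diameter=1e-3)

# Straight-line extrapolation from the impact point back to x = 0.
x_i, z_i, vx_i, vz_i = impact
z_straight = z_i - (vz_i / vx_i) * x_i

print(recovered[:2])  # close to (0.0, 1.0)
print(z_straight)     # higher than the true origin height of 1.0 m
```

Because gravity bends the trajectory downward, the tangent line at impact lies above the true path, so the straight-line estimate of the origin height is biased upward. In practice the flight time and impact velocity are not observed directly and must be inferred from stain features, which is where the statistical analysis over many candidate trajectories comes in.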

A robust Matlab code has been written to perform backward trajectory reconstruction by 2D inspection of stain features (2D+ approach). This 2D+ method is being compared to the currently used commercial software, HemoSpat. Of the previously conducted backspatter experiments, about 10 spatter datasets (each with about 20,000 stains) have been digitized in high resolution, assembled and stored temporarily in an online repository. We are also running a consistency check of the 2D+ code. Because of the promising results using the 2D+ approach, in terms of its ability to reconstruct the region of origin, we have postponed the production of the database of errors associated with trajectory reconstruction based on 3D measurements. The rationale is that if a 2D method works (simpler to implement, and more likely to be transitioned to real crime scenes), we should take that simpler route. Once the 2D+ method is established, a database of errors based on stain features will be developed.

The meeting with NIST counterpart Greg Gillen has not yet taken place, for the same reason as above: we are now reconstructing trajectories based on 2D rather than 3D measurements. However, we had significant interactions with other NIST counterparts, e.g. Sue Ballou on practical testimony aspects of crime scene analysis, and Will Guthrie on advanced visual documentation of blood spatters.

We considered the estimation of the density of a multivariate response that is not observed directly but only through measurements contaminated by additive error. We focused on the realistic sampling case of bivariate panel data (repeated contaminated bivariate measurements on each sample unit) with an unknown error distribution. Several factors can affect the performance of kernel deconvolution density estimators, including the choice of the kernel and the approach used to estimate the unknown error distribution. We show that the choice of the kernel function is critically important and that the class of flat-top kernels has advantages over more commonly implemented alternatives. We describe different approaches for density estimation with multivariate panel responses, and investigate their performance through simulation.

We examined competing kernel functions and described a flat-top kernel that has not been used in deconvolution problems. Moreover, we studied several nonparametric options for estimating the unknown error distribution. Finally, we also provide guidelines for the numerical implementation of kernel deconvolution in higher sampling dimensions. We developed an open source R package called “fourierin”, intended to numerically compute Fourier-type integrals of functions of one or two variables with finite support at several points.
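For readers unfamiliar with kernel deconvolution, the standard form of the estimator can be written out as follows. The notation is introduced here for illustration and is not taken from the project's paper. The error estimate shown assumes errors that are symmetric about zero with two replicates per unit, and the trapezoidal kernel is only one common member of the flat-top family, not necessarily the kernel studied in this project.

```latex
% Measurement model: unit j, replicate r (illustrative notation)
\[
  Y_{jr} = X_j + \varepsilon_{jr}, \qquad j = 1,\dots,n, \quad r = 1, 2 .
\]

% Kernel deconvolution estimator of the density of X in d dimensions
% (d = 2 for bivariate panel data), with bandwidth h and kernel Fourier transform K^ft
\[
  \hat f_X(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d}
     e^{-i t^\top x}\, K^{\mathrm{ft}}(h t)\,
     \frac{\hat\varphi_Y(t)}{\hat\varphi_\varepsilon(t)}\, dt ,
  \qquad
  \hat\varphi_Y(t) = \frac{1}{n} \sum_{j=1}^{n} e^{\, i t^\top Y_{j1}} .
\]

% Error characteristic function estimated from within-unit differences,
% since Y_{j1} - Y_{j2} = \varepsilon_{j1} - \varepsilon_{j2} does not involve X_j
\[
  \hat\varphi_\varepsilon(t) =
  \left| \frac{1}{n} \sum_{j=1}^{n} \cos\!\bigl( t^\top (Y_{j1} - Y_{j2}) \bigr) \right|^{1/2} .
\]

% A trapezoidal flat-top kernel, specified through its Fourier transform (0 < c < 1);
% in d dimensions one may take the product over coordinates
\[
  K^{\mathrm{ft}}(s) =
  \begin{cases}
    1, & |s| \le c, \\[2pt]
    \dfrac{1 - |s|}{1 - c}, & c < |s| \le 1, \\[6pt]
    0, & |s| > 1 .
  \end{cases}
\]
```

The integral defining the estimator is a Fourier-type integral over a bounded region, because the kernel's Fourier transform has finite support. This is the kind of computation that a numerical Fourier integration routine is designed to handle.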

A database of blood spatters has been produced and made available to CSAFE members. The database was presented at the CSAFE all-hands meeting in early June 2017. The team has scanned and assembled a spatter database of 60 cases, scanned at 600 DPI. The database includes 13 gunshot backspatters and 47 beating spatters. We took great care to make the beating spatters relevant to real criminal cases by using a variety of contact mechanisms between the blood pool and the beating instrument. As one of the reviewers of this proposal noted, it is important that the spatters used for research be as close as possible to real cases. Regarding the gunshot spatters [1], where the fluid dynamics is more complicated because of the interaction of muzzle gases and atomized drops, we took a different route: we suppressed the muzzle gases [1] to simplify the physics and allow comparison with our model (which, while not considering muzzle gases, is still the first sound model of atomization and trajectory reconstruction during gunshot). Only then will we consider the additional effect of the muzzle gases. It is standard engineering research practice to start by describing a simpler problem, validate the method, and then move to the more complex problem. The spatter database is available for comparing various methods of crime scene reconstruction (tangent method, method of strings, HemoSpat or other software, and the 2D+ code).

The database contains 60 high-resolution scans (600 DPI) of blood spatters. See Figure 1 for an example of the level of detail on one specific gunshot backspatter.

Figure 1: Example of one of the 60 blood spatters produced and scanned in this CSAFE project. Insets are from the same blood backspatter, at different levels of detail. The resolution is 600 DPI.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? None yet.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? The database was presented at the CSAFE all-hands meeting.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR REPORTING PERIOD TO ACCOMPLISH THE GOALS?

1) 2D+ methodology
a. Quantify bias in the area of convergence, explain the bias by fluid dynamics principles, and propose an algebraic correction for the bias.
b. Publish this result in a forensic journal, as a new method to calculate the area of convergence.
c. Integrate automated image processing with 2D+ code stain data collection.
d. Meet with Chang at Columbia University to design an advanced pattern recognition approach and evaluate techniques related to the characterization of stain shapes.
e. Contact Sue Ballou and Will Guthrie to assess the relevance of the 2D+ method for crime scene reconstruction.

2) Spatter database
a. Write a journal article describing and advertising the database, so that it is available to the entire scientific community (in preparation, for Forensic Science International).

3) Event classification
a. Construct suitable features and apply machine learning techniques (random forest) to identify event types (i.e. gunshot, beating, transfer).
b. Write a publication presenting a classification algorithm to determine the distance between shooter and target in backspatter cases, following up on the recent publication, where CSAFE is acknowledged [1].

Products

JOURNAL PUBLICATIONS. Comiskey, P.M., A.L. Yarin, and D. Attinger, Hydrodynamics of back spatter by blunt bullet gunshot with a link to bloodstain pattern analysis. Physical Review Fluids, 2017. 2(7): p. 073906.

OTHER PUBLICATIONS, CONFERENCE PAPERS AND PRESENTATIONS. The above paper and general BPA advances will be presented by Attinger at the major BPA conference, the 2017 IABPA conference (September 2017, Redondo Beach, California).

TECHNOLOGIES OR TECHNIQUES 1) 2D+ methodology 2) Automated image processing and stain data collection

OTHER PRODUCTS Databases: Collection of digital spatters (gunshots and beating events)

Software or NetWare: 2D+ code for trajectory reconstruction

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT?

John Polansky: Postdoctoral researcher
• 2D+ code development
o Improved code reliability, performance and efficiency
o Implemented new criteria model
o Drafted code documentation
• Database creation
o Designed and built a device to produce beating spatters
o Assembled images into single case images
o Created experimental test condition documents
o Uploaded to ISU OneDrive account temporarily

Yu Liu: Graduate student
- Worked on the classification and clustering algorithms

Daniel Attinger and Kris De Brabanter:
- Project supervision
- Training of graduate students and postdoc

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Iowa State University Police Services, Ames, IA - Provided materials (ammunition, components, access to firearm and suppressor, access to indoor shooting range).

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The high-resolution blood spatter database that we have built provides a consistent set of data from controlled and carefully documented experiments. This database will be shared with researchers worldwide so that every group can test their backward trajectory reconstruction methods, both to identify the region of origin of the blood spatter and to determine the blood pattern generation mechanism. This work will be the basis for the creation of machine learning algorithms intended to identify the generation mechanism.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The work performed thus far on the study of bloodstains has focused on collecting data to build a database of controlled data sets. In trying to solve for the geometric origin of the blood spatter, the reconstruction methods have revealed problems associated with point densities and their bias. This problem is of statistical interest, since statistical methods must be applied to remove biased data in favor of data that better determine the true origin.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? The spatter database offers a means for researchers to calibrate and test their BPA models. The spatter data will be integrated into the resources of the forensics arm of NIST. Pending the 2D+ developments, the code may be integrated into a commercially viable software package.

CSAFE Annual Report 06.2017 BLOOD PATTERN ANALYSIS 142

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? Current forensic understanding and impressions are being driven by creative media such as television shows and movies. These influences have led to misconceptions regarding the accuracy with which investigators can reconstruct the events of a crime.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems None to this date.

Bibliography Comiskey, P.M., A.L. Yarin, and D. Attinger, Hydrodynamics of back spatter by blunt bullet gunshot with a link to bloodstain pattern analysis. Physical Review Fluids, 2017. 2(7): p. 073906.


HUMAN FACTORS

Project E - Analysis of Forensic Testimony and Reports

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Simon Cole (UCI)

Other Investigators: Matt Barno (UCI)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The reporting of forensic results is a topic of crucial importance and increasing interest. The purpose of this project is not to address how forensic results should or could be reported. Rather, it is to understand empirically how forensic results are actually reported in American trials today.

Key questions of interest include: (1) whether reports are consistent with published disciplinary standards, if such standards exist; (2) whether reports are probabilistic in nature and, if so, how probability is characterized; and (3) whether probabilistic testimony is elicited through direct examination or through cross examination, and to what extent cross examination is effectively eliciting and/or scrutinizing probabilistic (and non-probabilistic) testimonial claims.

With regard to key question (1): Over 8 years ago the NRC Report highlighted the lack of standards for reporting evidence in forensic science, yet progress has been slow. The goal of this research is to demonstrate empirically the lack of standards, consistency, and shared understanding of the meaning behind conclusions. The aim is to increase understanding of the importance of implementing standard procedures, as well as of providing the necessary oversight for their application.

With regard to key questions (2) and (3): CSAFE focuses on promoting the development of probabilistic testimony and reporting, thus equipping those in practice to increase confidence in results. This research seeks to discover the current state of probabilistic testimony used in real cases. As a result, this research will: (1) help us explore the advantages and disadvantages of various "types" of probabilistic testimony (e.g., verbal, quantitative); and (2) establish a baseline so we can determine whether the incidence of probabilistic testimony is actually increasing at the ground level of practice which will guide the progress of our research so that improved practices can be implemented.

CSAFE Annual Report 06.2017 HUMAN FACTORS 144

YEAR | CSAFE INVESTIGATOR | TASK | DELIVERABLES | DELIVERY SCHEDULE | PERCENTAGE COMPLETE
2016 | Simon Cole | IV | Survey of existing disciplinary reporting standards; Preliminary results for single discipline; Data collection for single discipline; Results delivered to Projects I and U | 12/31/16 | 100%
2017 | Simon Cole | IV | Preliminary results for all disciplines; Final results on single discipline; Data collection for multiple disciplines; Results delivered to Projects I and U; Presentation at International Conference on Forensic Inference and Statistics | 12/31/17 | 50%
2018 | Simon Cole | IV | Second report on all disciplines; Data collection for all disciplines; Results delivered to Projects I and U; Presentation at American Academy of Forensic Science | 12/31/18 |
2019 | Simon Cole | IV | Final results for all disciplines; Results delivered to Projects I and U | 12/31/19 |

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Data collection was completed for the first three disciplines (latent prints, tool marks, questioned documents). A sample of 94 fingerprint case transcripts had previously been collected. A sample of 32 firearm and toolmark case transcripts was collected. We have collected 37 handwriting reports and transcripts from Westlaw Expert Documents.

We completed a report detailing our findings with regard to toolmarks. Firearm and tool mark (F/T) examiners showed greater consistency in identification reporting than latent print examiners. Almost all of the “Identification” reports included in our sample were classified by us as “Source Attributions.” Source Attributions are generally consistent with the conceptual framework outlined in the AFTE’s “Theory of Identification” and “Range of Conclusions,” suggesting that the AFTE is having significant influence on the analyses conducted by actual examiners. Many reports included language pulled directly from the AFTE standards. However, like the latent print reports, none of the F/T reports in the sample quantified the likelihood that a given conclusion was true relative to an alternative hypothesis. When experts did discuss probability, they were more likely to claim that their reports were nonprobabilistic than they were to provide an actual probability associated with a given outcome. Moreover, in the few inconclusive cases where the examiners attempted to assign a percentage to the probability of a match, their probability statements were both imprecise and not based on specific studies or data.

We presented the data on all three disciplines at the 2017 CSAFE All-Hands Meeting. In order to streamline and refine our coding scheme and increase consistency across disciplines, we have obtained, and are considering using, a coding procedure designed by Neumann and Ausdemore for the Midwest Innocence Project. In addition, in response to concerns about the age of the transcript data, we propose below adding a survey component to capture a more recent snapshot of probabilistic reporting of evidence.

In addition to the above data collection, it was proposed to add an analysis of data from the National Registry of Exonerations. The Registry is the largest and most authoritative collection of data about all known exonerations of innocent criminal defendants in the United States, from 1989 to the present. It is newly located at the University of California, Irvine. The PI was recently named Director and Associate Editor. False or misleading forensic evidence is the fourth leading cause of wrongful conviction in the Registry’s data set. Over four hundred cases are currently coded for this cause. It is proposed to add an analysis of the reporting of forensic evidence in these cases to this Project. (It may also be useful to analyze the reporting of forensic evidence in cases in which forensic evidence was not coded as contributing to the wrongful conviction.) This analysis requires: (a) revising the currently inadequate coding scheme for forensic evidence in the Registry; (b) recoding cases; and (c) analyzing the data. It is understood that some of the forensic evidence in the Registry cases will derive from out-of-scope disciplines (e.g., serology, microscopic hair comparison). It is believed that the analysis will nonetheless be informative because it may reveal patterns in the reporting of forensic evidence in wrongful convictions that may be useful in reforming and placing on a statistical foundation the reporting of evidence in the in-scope disciplines. Researchers devised a coding system on the reporting of evidence and completed coding of all cases in the National Registry of Exonerations involving fingerprint evidence (> 135 cases).

Researchers have communicated relevant results of this study to the principal investigators of Projects I and U, who adopt psychological judgment and decision-making approaches to lay understanding of forensic statistical evidence. This project will also apply the results to a science communication approach to lay understanding of forensic statistical evidence. A science communication approach will complement the psychological approaches that are already being pursued within CSAFE by increasing comprehension of scientific uncertainty among audiences outside the forensic science discipline.


WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Graduate student Alyse Bertenthal has obtained training by working with the PI coding trial transcripts and reports. Extensive work was involved in obtaining consistent coding across coders. Her work with CSAFE has ended.

A new graduate student researcher, Matt Barno, a doctoral student with a law degree from Harvard University, has joined the project. Barno has augmented his law school knowledge base through a concentrated effort to familiarize himself with the foundational principles of CSAFE research. He has thus acquired significant comprehension of firearm and toolmark identification procedures through his work on our report in that discipline. He submitted an abstract to the annual conference of the Association of Firearm and Toolmark Examiners.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We presented our work at the CSAFE All-Hands Meeting 2017.

We submitted an abstract to the annual conference of the Association of Firearm and Toolmark Examiners. It was not accepted.

We will be presenting our work at the International Conference on Forensic Inference and Statistics in Minneapolis in September, 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? We plan to continue our progress - Data collection; coding; data analysis.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS University of Amsterdam, Doing the Individual and the Collective in Forensic Genetics: Governance, Race and Restitution, “Forensics in Black and White: Individual and Collective Identification in Contemporary Friction Ridge, Hair, and Microbiome Forensics,” May 11, 2017.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Simon A. Cole, Principal Investigator, 0.125 summer month; supervising all projects, data collection, data analysis, background research; CSAFE funding; not collaborating internationally


Matt Barno, Graduate Student Researcher, 0.75 calendar month; data collection, data analysis, background research; CSAFE funding; not collaborating internationally

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? None

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? The study will inform discussions in the other forensic disciplines about how analysts should testify with knowledge about how they actually do testify. By measuring the degree of penetration of current disciplinary reporting standards, the study will inform forensic and legal discussions about promulgating new reporting standards.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Provided training for graduate students working in the area of criminology, law and society.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? The study will draw renewed attention to the issue of expert courtroom testimony and expert “reporting” more generally (e.g., science advisory committees) and the importance of precision in language in such testimony and reporting. The study will also highlight the importance of probabilistic testimony and reporting and quantifying uncertainty for all issues concerning science and technology in modern society.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None.

Changes/Problems

Changes in approach and reasons for change

Because of concerns expressed about the age of transcripts, given our collection methods, we will consider an alternative approach of surveying laboratories directly for transcript samples.


Project I - Evaluating Lay Perceptions of Forensic Evidence and Forensic Statistics

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: William C. Thompson

Other Investigators: Rebecca Grady (Ph.D. candidate in Psychology and Social Behavior); Eric Lai (Ph.D. candidate in Statistics); Hal Stern

Accomplishments

MAJOR GOALS OF THE PROJECT
1. Improved understanding of how lay people perceive (and potentially misperceive) the probative value of forensic conclusions regarding pattern evidence in the areas of latent print analysis, ballistics, tire marks, footwear, handwriting, bloodstain pattern, and tool marks.
2. Improved understanding of how lay people interpret various statements (both quantitative and qualitative) regarding the weight of these types of forensic science evidence.
3. Improved understanding of how the presentation of forensic science evidence affects: (a) lay sensitivity to the strength of the forensic evidence; (b) lay susceptibility to fallacious interpretations of forensic evidence; and (c) the logical coherence of lay judgments made about or on the basis of forensic science evidence.
4. Improved understanding of how to communicate statistical concepts relevant to forensic science to lay audiences, including lawyers.

ACCOMPLISHED DURING THIS PERIOD Psychometric Studies--Work continued on preparing a journal article reporting the three key studies in this new paradigm. A draft was prepared of a technical report on three of the studies. A poster describing the research was prepared and presented at the CSAFE All-Hands meeting at Iowa State University. In June 2017, the PI discussed this work in a presentation at the NIST Technical Colloquium on Quantifying the Weight of Forensic Evidence. An abstract describing the work was submitted and accepted for presentation at the International Conference on Forensic Inference and Statistics (ICFIS) in September 2017 in Minneapolis. Data collection was also completed for a fourth study in our psychometric paradigm. These studies examined lay perceptions of the relative strength of various conclusions that a forensic scientist might present after comparing two items (e.g., fingerprints) to determine whether they have a common source. The studies asked participants to make a series of judgments about which of two conclusions seemed stronger for proving the items have a common source. The data are then fitted to Thurstone-Mosteller paired comparison models to obtain a rank-ordered list of various statements and an indication of the perceived differences among them. The results allow calibration of verbal statements regarding weight of evidence (e.g., “extremely strong support for same source”) relative to source probability statements (e.g., “highly probable same source”) and random match probabilities (e.g., RMP = 1 in 100,000). These comparisons in turn provide insight into whether particular statements will be perceived in the manner intended and help identify statements (e.g., “match”) that are perceived in a manner that may be unintended. We undertook and have largely completed a re-analysis of all the data collected in the four studies. Lai, under the direction of Hal Stern, performed the re-analysis using better methods than had been employed earlier.
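As a rough illustration of how paired comparison judgments can be turned into a rank-ordered scale, the sketch below applies the classical Thurstone Case V solution (Mosteller's least-squares estimate) to a small table of counts. It is not the project's analysis code: the statement labels, the counts, the clipping bounds, and the function name `thurstone_case_v` are all hypothetical and chosen only for demonstration.

```python
from statistics import NormalDist


def thurstone_case_v(wins, labels, clip=0.01):
    """Estimate Thurstone Case V scale values from paired-comparison counts.

    wins[i][j] is the number of times statement i was judged stronger than
    statement j. Every pair is assumed to have been compared at least once.
    Returns (label, scale value) pairs sorted from strongest to weakest.
    """
    n = len(labels)
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            # Clip extreme proportions so the quantile stays finite.
            p = min(max(p, clip), 1.0 - clip)
            z_sum += inv_cdf(p)
        # Least-squares scale value: row mean of z-scores (diagonal counts as 0).
        scores.append(z_sum / n)
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for three placeholder statements (100 judgments per pair).
labels = ["A", "B", "C"]
wins = [
    [0, 70, 90],
    [30, 0, 75],
    [10, 25, 0],
]
ranking = thurstone_case_v(wins, labels)
for label, value in ranking:
    print(f"{label}: {value:+.3f}")
```

The distances between the printed scale values, not just their order, are what would indicate how far apart two statements are perceived to be.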

Mock Jury Studies—Work continued on preparing for publication an article reporting two mock jury studies concerning lay people’s understanding of likelihood ratios. Participants watched videos of the simulated testimony of a forensic scientist who presented testimony about likelihood ratios (of varying strength) with and without the use of graphical illustrations.

Replication and Extension of Mock Jury Research—CSAFE researchers at Iowa State University (Tyner, Hofmann, Carriquiry) approached the PI about cooperating on a replication and extension of a mock jury study (Thompson, W.C. & Newman, E.J. Lay understanding of forensic statistics: Evaluation of random match probabilities, likelihood ratios, and verbal equivalents. Law & Human Behavior. 39(4): 332-349, 2015). Replication is considered very important in the social sciences. The PI cooperated fully with this effort by supplying copies of the stimulus materials, information about online administration of the study, and suggestions for extensions of the experimental design. The Iowa State researchers plan to carry out the replication/extension next quarter. They are conducting and administering the research independently of the PI, but will share the data once it is collected.

Lawyer Training Videos--The training videos on forensic statistics that were prepared last quarter by the PI were presented to participants in the National Forensics College at Cardozo Law School, a training conference for lawyers. Participants were required to view the videos before attending the conference and to complete a test on their comprehension and ability to apply the concepts covered. The PI reviewed the test results and presented a follow-up lecture at the Forensics College designed to correct errors and misconceptions and reinforce concepts. Experience and knowledge gained through this process will be used to improve future training presentations on forensic statistics.

Law Review Article—The PI prepared a draft of an article for law review publication that provides a broad overview of social science research on the question of how forensic scientists should communicate source conclusions to lay audiences, such as jurors. The draft will be presented and discussed at a conference/workshop at Seton Hall University School of Law in October. It is one of several papers that will be examined and critiqued by reviewers as part of the conference discussions. The draft has been circulated to a set of distinguished commentators who will review it and then present their reactions and critiques at the October conference.


WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED?

UC Irvine graduate student Rebecca Grady (Ph.D. candidate in Psychology and Social Behavior) was involved in the project and assisted with preparing stimulus materials and collecting data.

HOW HAVE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? The PI made presentations on the results of this project at:

• The Forensic Inference Group, University of Lausanne, Lausanne, Switzerland, August 15, 2016 (colloquium presentations)
• Programme on Probability and Statistics in Forensic Science, University of Cambridge, Cambridge, England, United Kingdom, August 31, 2016 (invited workshop presentation, available online at https://www.newton.ac.uk/seminar/20160831093010201)
• National Commission on Forensic Science, Washington, DC, January 10, 2017 (invited presentation)
• Training conferences sponsored by National Association for Criminal Defense Lawyers and the Innocence Project, March 23, 2017, San Diego, CA (invited presentation)
• The National Forensics College, Cardozo Law School, New York, June 4, 2017
• The CSAFE All-Hands Meeting in Ames, Iowa, June 8, 2017 (poster presented with graduate students Rebecca Grady and Eric Lai and with Hal Stern)
• The NIST Technical Colloquium on Weight of Forensic Evidence, Gaithersburg, MD, June 29, 2017

A draft law review article was circulated to legal scholars for comment.

PLANS FOR NEXT YEAR 1. With regard to the psychometric studies, we plan to release a technical report and submit an article for peer-reviewed journal publication. We will then undertake a new round of research, using the same methods to explore additional issues and refine our understanding of lay reactions to forensic science testimony. We have been discussing the design and details of additional studies. We will be drafting stimulus materials and will begin data collection for a new round of research to follow up on our initial findings.

2. With regard to the mock jury studies, we plan to continue working on a write-up of two studies on perception of likelihood ratios, which will likely be submitted for publication after September 30. We expect that the Iowa State researchers will complete data collection for their replication/extension of the Thompson & Newman study. We are planning another study involving people recruited from a county jury pool concerning the way in which perceptions of the weight of forensic evidence are affected by forensic scientists’ use of (or failure to use) context management procedures. The study will explore lay understanding of the potential for bias in experts’ probabilistic judgments and how that might affect the weight lay people assign to experts’ conclusions.

3. The PI has begun discussions with Swiss colleagues at the University of Lausanne about preparing an article for judges that could discuss and compare European and American approaches to presenting source conclusions in pattern matching disciplines. These discussions may lead to submission of an article to a journal such as Judicature that would be based, in part, on research on lay reactions to various presentation formats for forensic pattern matching evidence.

Products

William Thompson, Rebecca Grady, Hal Stern & Eric Lai, Perceived strength of forensic science reporting statements. Poster presented at CSAFE All-Hands Meeting, June 8, 2017.

William C. Thompson, How Should Forensic Scientists Present Source Conclusions? Working Paper. June 2016.

William Thompson, Training Videos for Lawyers:
Part I: Quantitation: http://tinyurl.com/Quantitation
Part II: Classification: http://tinyurl.com/lz73pfm
Part III: Comparison/Identification: http://tinyurl.com/mp8w4kw

Thompson, W.C., Scurich, N., Dioso-Villa, R., & Velazquez, B. Evaluating negative forensic evidence: When do jurors treat absence of evidence as evidence of absence? Journal of Empirical Legal Studies, in press.

Participants and Collaborators

INDIVIDUALS WHO WORKED ON THE PROJECT William Thompson (UCI), led the project

Hal Stern (UCI), advised on design and analysis

Rebecca Grady (UCI), managed data collection and preliminary analysis

Eric Lai (UCI)

Reva Schwartz (NIST) and Elham Tabassi (NIST) held discussions with Thompson about an expansion of this project that would entail data collection from members of jury venires in Maryland.


ORGANIZATIONS INVOLVED Thompson has had discussions with various individuals in OSAC about the issues this project should address

Thompson has engaged in discussions with NIST scientists Reva Schwartz and Elham Tabassi

Impact

As forensic scientists improve their ability to incorporate measurement uncertainty and probabilistic assessment into forensic analysis of pattern matching evidence, there may be changes in the ways in which they characterize their findings in reports and testimony. Rather than simply reporting categorical conclusions (e.g., identification, exclusion, inconclusive), it will be possible to provide more nuanced statements regarding the weight of evidence, such as likelihood ratios and match probabilities, and to provide quantitative estimates of the sensitivity and specificity of analytic techniques. These new capabilities will raise important questions for the forensic science, legal and academic communities about the best ways to present such findings to lay people, such as jurors, in order for them to understand and make proper use of forensic science conclusions. Research on the ways in which lay people understand (and misunderstand) the kinds of weight of evidence statements that might be used in presenting pattern matching evidence will help forensic scientists understand the strengths and limitations of various possible statements, and thus help them assess how best to present their conclusions in reports and testimony. This research thus has the potential to improve communication between forensic scientists and the lay audiences (lawyers, judges and jurors) who rely on their reports and testimony.

Changes/Problems

Actual or anticipated problems or delays and actions or plans to resolve them

1. We were not able to complete the write-up of the additional study using the jury paradigm during this project year. This write-up was given lower priority than the psychometric studies. It should be completed and ready for submission by December 31, 2017.

2. Regarding the technical report, it was delayed for the same reason as the publication of the first two studies--the decision to collect additional data before reporting. I expect to complete the technical report in conjunction with preparing an article for publication and to have the technical report ready by September 2017. A draft of the technical report is nearly complete, but not ready to release.


3. We had anticipated beginning a collaboration with Reva Schwartz and Elham Tabassi of NIST on studies involving jurors in Maryland courts. We understand that Reva is no longer at NIST, which may affect whether this portion of the project can go forward. In any event, we have not had time to pursue this portion of the project.

Proposed change in approach and reason

As mentioned above, Tyner, Hofmann, and Carriquiry plan to replicate and expand on the study “Lay Understanding of Forensic Statistics: Evaluation of Random Match Probabilities, Likelihood Ratios, and Verbal Equivalents” by William Thompson and Eryn Newman of UCI. It will not be a complete replication, but we will be able to compare their results on random match probabilities (RMPs) to some of our results. In this study, there will be fiber, fingerprint, shoeprint, and DNA evidence conditions and two different evidence strengths, using RMPs of 1 in 100 for moderate and 1 in 1 million for very strong evidence, crossed for a total of 8 experimental conditions. The goal of the study is to determine how mock jurors will change their probability of guilt for different evidence types, strengths, and a combination of the two.
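The crossed design described above (four evidence types by two evidence strengths) can be enumerated in a few lines. The sketch below is only illustrative: the balanced random assignment, the function name `assign_conditions`, and the seed are assumptions made here, not a description of how the Iowa State researchers will administer the study.

```python
import random
from collections import Counter
from itertools import product

# Factors taken from the study description.
EVIDENCE_TYPES = ["fiber", "fingerprint", "shoeprint", "DNA"]
RMP_BY_STRENGTH = {"moderate": 1 / 100, "very strong": 1 / 1_000_000}

# Fully crossed design: every evidence type paired with every strength.
conditions = list(product(EVIDENCE_TYPES, RMP_BY_STRENGTH))


def assign_conditions(n_participants, seed=0):
    """Hypothetical balanced assignment: cycle through conditions, then shuffle."""
    rng = random.Random(seed)
    slots = [conditions[i % len(conditions)] for i in range(n_participants)]
    rng.shuffle(slots)
    return slots


for evidence, strength in conditions:
    print(f"{evidence:12s} {strength:12s} RMP = {RMP_BY_STRENGTH[strength]:g}")

# With a participant count divisible by 8, each cell gets the same number.
cell_counts = Counter(assign_conditions(16))
```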

Probability is a notoriously difficult concept for lay people to intuit. In the context of a criminal trial, the jurors’ understanding of probabilistic evidence could be the difference between justice and injustice. Forensic evidence presentations to juries vary greatly, and without a standard of presentation there is no way to be certain that the correct information is being presented to and understood by the members of the jury. By performing this research, we hope to better understand how different types of evidence are understood by mock jurors, with the goal of guiding forensic experts and judges on the best way to present random match probability evidence to jurors.

We have received approval from Iowa State University’s Institutional Review Board (IRB) to administer a survey through the Amazon Mechanical Turk service to 600 mock jurors. We expect the data collection to take 1-2 weeks. Once we have administered the survey and collected the data, we will perform a Bayesian analysis like that of Thompson and Newman, which will require some guidance from Thompson. We will prepare our results for publication, with the goal of submitting a paper in the fourth quarter of 2017. If we have significant findings, we will also prepare guiding materials for the legal community on how to better educate jurors on evidence presented using random match probabilities.
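For readers unfamiliar with how a random match probability relates to a Bayesian benchmark, the sketch below shows the standard odds form of Bayes' rule. It is a textbook calculation under simplifying assumptions stated here, not the analysis plan of this study or of Thompson and Newman: a true source is assumed always to produce a reported match, laboratory error is ignored, and the prior value and the function name `posterior_probability` are hypothetical.

```python
def posterior_probability(prior, rmp):
    """Posterior probability of a common source after a reported match.

    Assumes P(match | same source) = 1 and P(match | different source) = rmp,
    so the likelihood ratio is 1 / rmp. Laboratory error is ignored.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if not 0.0 < rmp <= 1.0:
        raise ValueError("rmp must be in (0, 1]")
    likelihood_ratio = 1.0 / rmp
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical prior of 0.10 combined with the two RMPs used in the design.
for label, rmp in [("moderate", 1 / 100), ("very strong", 1 / 1_000_000)]:
    print(label, posterior_probability(0.10, rmp))
```

Comparing a participant's stated posterior with a benchmark of this kind, computed from that participant's own prior, is one way to ask whether the evidence was over- or under-weighted.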

A deliverables table is included on the next page.


Suggested deliverables table for Project I expansion.

Year | Task Number | CSAFE Investigator | Deliverable(s) | Delivery Schedule
2017 | 1 | Heike Hofmann | Survey results | Q3
2017 | 2 | Samantha Tyner | Share anonymized results with UC-Irvine collaborator | Q3
2017 | 3 | Heike Hofmann, Samantha Tyner | Paper Submission | Q4
2017 | 4 | Samantha Tyner | Prepare guiding materials for legal groups on our findings | Q4


Project M - Human Factors in Visual Identification: A cross-cutting research proposal

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Cleotilde Gonzalez (CMU, Social and Decision Sciences)

Other Investigators: Maria Cuellar (Ph.D. student, CMU), Nalyn Sriwattanakomen (RA, CMU)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The most immediate goal is to perform basic empirical investigation of the human factors that may shape identification decisions. Our goal is to develop a scientific foundation of behavioral effects of identification decisions to inform the generation of computational behavioral models. We have started by concentrating on firearm examination, as there is very little understanding of the effects of human factors in this domain. Through experiments in the firearm identification domain, we will be able to quantify the variability and reliability of human identification decisions. These measures will help in the development of models that may be integrated with statistical models.

A long-term research goal of this program is to build actionable models of human decision making that could be integrated with the statistical models of image identification being developed under CSAFE. Our vision is to support the successful identification and interpretation of forensic evidence by accounting for the socio-cognitive factors that may influence human decision making. These novel models will need to be informed by behavioral phenomena that identify the processes by which humans make identification decisions across forensic domains.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? From a literature review of the available work on human factors, psychology, and decision making in identification decisions, and from our visits to forensic laboratories (see FY2-Q2 report for summary of literature review), we have identified major human factors issues in forensic science that need research. We have started the design of a series of laboratory experiments with non-experts (i.e., laypeople from online communities) at our university to advance basic science and uncover cognitive factors relevant to identification decisions based on physical evidence.

Given the heavy reliance on memory, experience, and similarities for identification decisions in forensic science, we propose to manipulate and measure memory factors (incentives, similarity, frequency, variability, feedback, sequence) using procedures similar to those developed in past research for luggage screening weapon identification (e.g., Gonzalez & Madhavan, 2011; Madhavan & Gonzalez, 2010). The experiments we have designed are divided into two groups:

(1) experiments aimed at elucidating the effects of human memory processes on the formation of expectations and beliefs in the human mind, which then lead to variations in identification decisions; and

(2) experiments aimed at evaluating new interventions to mitigate the factors that influence the accuracy, reliability and consistency of identification decisions.

Elucidating the effects of memory processes in identification decisions. In this set of experiments, we will examine the effects of cognitive factors on the accuracy and confidence of identification decisions. We will manipulate six factors during the learning phase: incentives, similarity, frequency, variability of experiences, feedback, and sequence of experiences.

Evaluation of Interventions and Mitigation of Human Error. In this set of experiments, we will determine the kinds of interventions that may help to mitigate the errors that emerge from experience. Experiments will be conducted using image databases and cases from bullet cartridge examination. We will select from the six learning phase factors those that significantly worsened learning and decision making in the transfer phase and use them in the learning phase of these experiments to induce error.

One of the priorities on which we will concentrate our initial experimental efforts is the investigation of the effects of memory and similarity on the accuracy and confidence of decisions in firearm analysis and comparison. During Quarter 3 we submitted initial experimental protocols for internal IRB review; these were approved, and the approval letter was submitted with our report.

In Quarter 4, we conducted informal interviews and observations with firearm examination experts at the Allegheny County Medical Examiner’s Forensic Lab (ACMEO). We then designed a study in the domain of firearm analysis to investigate the effects of the reference images used in examination and how they influence identification decisions in this domain.

During this year we have also held many discussion meetings with Dr. Itiel Dror, with whom we are collaborating given his experience in fingerprint analysis. Through these discussions we decided to pursue questions regarding the effects of the reference images in the process of detection. We have designed and implemented an experimental protocol in Qualtrics, and generated the images to be used in the experiment from the images in the NIST Ballistic Identification Designed Experiment (NBIDE).

Our participation in the All Hands Meeting of CSAFE was critical for learning more about the challenges that many of the researchers raise, understanding the work that others are performing within CSAFE, and discussing possible collaborations.

We have learned how the firearm examination process works. In particular, we learned the ACMEO procedure for firearm examination based on the NIBIN database images, how the analyst determines the relevant regions to be analyzed, and what features within these regions are considered.

In summary, from our interviews and observations at ACMEO we learned about the firearm examiner process, summarized as follows.

The examiner receives either a police report containing a cartridge image or an actual cartridge from the scene of the crime as an evidentiary sample. The sample is uploaded to NIBIN or another ballistic image database. The database will automatically pull up a list of potential hits, listed in rank order. The examiner considers only the 20 best-matching images. These images typically feature only the breech face, but they may also include the sides of the cartridge case. In NIBIN, the images consist of (a) the primer, (b) the firing pin, and (c) the full headstamp. Within the database console, the examiner first looks for similarities between the images, which are displayed side by side. Identification decisions are rarely made from 2D images alone.

In descending order of importance, the features compared during identification include:

(1) Firing pin impression marks, drag marks (from when the firing pin drags across the primer during firing) and shear marks on the primer.

(2) Chamber marks (marks on the sides of the cartridge case). The case expands as the bullet leaves the gun, and this expansion leaves toolmarks on the case.

A given gun will NOT always make the same drag marks. The absence or presence of a drag mark is not itself a disqualifying feature. The brightness/contrast of markings can be very misleading. Small differences in characteristics can be caused by many environmental factors (e.g., the gun that fired the cartridge found at the crime scene may have been treated poorly, which could cause rusting and distortion of the cartridge upon firing). Furthermore, the metal type may produce higher or lower extrusion from the firing pin, which translates to higher or lower. To help adjust for brightness/contrast issues, the primer will often be coated with different metals to generate images with different levels of shininess, contrast, etc.

The NIBIN console allows the examiner to move, rescale, and rotate images. Most importantly, the console features a movable line between the two images that the examiner can slide back and forth to overlay the images of the breech faces for better comparison.

The relevant regions for identification are typically the firing pin mark, the drag marks, and the shearing marks. The examiner considers class characteristics (e.g., the shapes of the ejector, extractor, and firing pin marks) as opposed to gross characteristics or the exact positioning of the marks. So, for example, even if one image has the firing pin mark 6mm away from the center of the primer whereas the other image has the firing pin mark only 2mm away, they may still be a match if the firing pin marks are the same. The examiner will not exclude the image on the basis of the difference in distance.

Given the high variability of toolmarks that a given gun may produce, examiners may request guns from out of state to fire test shots with the suspected firearm make, or take a firearm that is believed to match the toolmarks on the evidentiary sample and make multiple test fires. These test fires are shot through water tanks. The examiner then compares the cartridge images from these multiple test fires under a microscope to assess the level of agreement between the toolmarks. The examiner then applies that standard (i.e., level of agreement) as the threshold for the level of agreement between the test-fire image and the evidentiary image. For example, if the level of agreement between 10 test fires is found to be r = .65, then the level of agreement between the test fire and the evidentiary fire should not be lower than r = .65 if they are a match.
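The threshold rule described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the report does not say how examiners summarize agreement across test fires, so the mean of the pairwise agreement values is used here as a stand-in, and the function names and the example numbers are hypothetical.

```python
from statistics import mean


def agreement_threshold(test_fire_agreements):
    """Summarize agreement among test fires from the same gun.

    Assumption: the standard is taken to be the mean of the pairwise
    agreement values; the report does not specify the summary used.
    """
    if not test_fire_agreements:
        raise ValueError("at least one test-fire agreement value is required")
    return mean(test_fire_agreements)


def consistent_with_match(evidence_agreement, test_fire_agreements):
    """Return True if the evidence-to-test-fire agreement is not lower
    than the agreement observed among the test fires themselves."""
    return evidence_agreement >= agreement_threshold(test_fire_agreements)


if __name__ == "__main__":
    # Hypothetical pairwise agreement values among test fires.
    test_fires = [0.60, 0.65, 0.70]
    print(consistent_with_match(0.72, test_fires))
    print(consistent_with_match(0.40, test_fires))
```

A more conservative variant could use the minimum rather than the mean of the test-fire agreement values; which summary reflects examiner practice is not stated in the report.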

To design a study in the firearm analysis domain, we implemented the study in Qualtrics and then obtained feedback from the firearm analysts at ACMEO. The study aims to investigate how the reference image influences identification decisions. We selected images from different guns/bullets and modified each in minor ways (5% of each image was modified) using Photoshop. For example, we changed contrast, added dark/light spots, and added or removed scratch marks. We asked experienced firearm analysts to evaluate our modifications. The task requires people to look at pairs of images (a reference and an evidence image) and determine whether the pair is a match or not. During a “training” phase we provide feedback about the correctness of the decision, and why the decision was correct or incorrect. During the identification phase we ask the participant to analyze pairs of images and decide whether each pair is a match or not. We will also ask the participant to rate the level of confidence, and choose the features used in making the identification decision. Finally, we ask participants to complete some brief questionnaires.

We obtained feedback from the firearm analysts at ACMEO, and we also received feedback on our poster presentation of this work at the All Hands Meeting in Iowa. Currently, we are integrating the feedback to improve the design of our study.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Training and mentorship activities are provided to Ph.D. student Maria Cuellar and research assistant Nalyn Sriwattanakomen. We conduct regular meetings with collaborators.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We have presented our work and participated in the CSAFE-CMU weekly meetings, the CSAFE meeting in Iowa in 2016, and the All Hands Meeting of CSAFE in 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? We expect to address the feedback on our experimental study, reimplement the study in Qualtrics, and pilot-test the full study.

Products

TECHNOLOGIES OR TECHNIQUES Experimental protocols in Qualtrics and database of breech face bullet cartridges to be used in experimentation.

OTHER PRODUCTS We have 143 images in a Dropbox database, which we acquired from the NIST publicly available database of images of the breechface of bullet cartridges. The database is called the NIST Ballistic Identification Designed Experiment (NBIDE), and it is available online at https://tsapps.nist.gov/NRBTD/Studies/Search (last accessed February 26, 2017). We also have the pairwise similarity scores between all the pairs of images, thanks to the NIST-improved methodology developed by Tai (2017). We also have the Qualtrics implementation of our first experiment.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Cleotilde Gonzalez (CMU), Principal Investigator, beginning in Year 2. Gonzalez leads the project and its intellectual direction. She provides guidance in the design and implementation of experimental materials. She is in charge of submitting protocols for IRB approval. She receives one month of salary support.

Maria Cuellar (CMU), Ph.D. student, beginning in Q3, Year 2. Cuellar helps with implementation of experimental protocols, storyboards and materials. She documents methods and results.

Nalyn Sriwattanakomen (CMU), research staff, beginning in Q3, Year 2. Sriwattanakomen helps with Qualtrics implementation, literature review, and IRB administrative submission requirements.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Itiel Dror (UCL). Dror has served as an advisor to our project. We have conducted regular online meetings with him since January 2017. Dror does not receive any monetary compensation from CSAFE.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? If our hypotheses are correct, the results would imply that it is essential to analyze the evidence from the crime scene before the reference materials are presented for comparison, to maximize the objectivity of the analysis. This can be achieved by using the Linear Sequential Unmasking (LSU) procedure, a technique developed by Dror et al. (2015). The LSU procedure requires that bullet cartridges from the crime scene be analyzed and characterized prior to exposure to the bullet cartridge from the reference gun. This result could be used to generate suggestions for best practices in firearm analysis in forensic laboratories. It could also suggest that this type of bias might be present in other types of evidence in which it has not been studied.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? For Psychology, it will provide support to theories of learning from experience and the effect of factors such as similarity and frequency.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? The project will help expand the connections among human error, cognitive biases, and, more generally, the cognitive aspects of decision making in the fields of forensic science and psychology.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? Our research seeks to advance forensic science by identifying for the first time some of the basic cognitive and decision science factors that influence the emergence of decision errors from experience. This knowledge will lead to a better understanding of how to prevent such errors from occurring and improve the accuracy and reliability of decisions made across forensic science domains.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems

Originally, Stephen E. Fienberg (CMU, Statistics) was the PI of this proposal and Gonzalez was a Co-PI. The loss of Stephen led to several changes, including the reorganization of the project and the team. Amanda Luby (CMU, PhD student) was originally involved but was replaced by Maria Cuellar, given that Maria’s interests are closer to this research.

Given the changes above, we were requested to rewrite the proposal for this project. We are awaiting news regarding funding for FY3 and beyond.

Project T - Forensic Processing and Human Factors at Crime Laboratories

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Daniel Murrie (UVA)

Other Investigators: Brandon L. Garrett (UVA), Sharon Kelley (UVA), Karen Kafadar (UVA), Alicia Rairden (HFSC), Amy Castillo (HFSC)

Collaborators: Houston Forensic Science Center (HFSC); Summer Farrar (Midwest Innocence Project; MIP)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The primary goal is to collaborate with crime laboratories to study case processing of latent fingerprints, particularly the potential role of human factors and cognitive bias in current and changed laboratory procedures, including blind or masked procedures. Examining case processing to gauge the basic reliability of latent print examination is a crucial step in understanding and improving the statistical foundations for pattern evidence, and allows us to better measure effective laboratory performance, the effects of altering case management procedures, examiner differences, and assessment of examiner-specific error rates.

Thus, this baseline case processing data will tell investigators far more about the mixture of cases that analysts review and verify, and it will permit further assessment of any future changes to the procedures used at the laboratory, including if, for example, blind verifications are adopted in the future.

Investigators are pursuing three interrelated lines of inquiry: 1) collection and analysis of case processing data, 2) collection and analysis of evidence submission forms, and 3) reviewing prior cases handled by the Midwest Innocence Project (MIP) for indicators of cognitive bias in collection, analysis and presentation of pattern evidence. Goals for each line of inquiry are a

2. Evidence Submission Forms

Concerns about contextual effects are clearly consistent with a rich body of research in cognitive and social psychology (Saks, Risinger, Rosenthal, & Thompson, 2003). Moreover, Dror and colleagues (2006; 2010; 2011) conducted several seminal studies specifically addressing contextual effects among forensic science procedures and raised concerns throughout the forensic science community. Although limited, this body of research has substantial implications for policy and justice; many advocates have already urged substantial reforms (e.g., Dror et al., 2015).

Given the potential for evidence submission forms to solicit biasing and potentially task-irrelevant information, we are collecting submission forms from crime laboratories across the United States. We will then analyze similarities and differences in the information they request and the potential for such information to bias the analysis of evidence. We have started with blank forms, and will later expand to collect and code completed forms from different laboratories. Determinations about task-relevant vs. task-irrelevant information will be guided, at least in part, by principles articulated by the OSAC Friction Ridge Subcommittee.

3. Cognitive Bias/Pattern Evidence Audit

Through the Midwest Innocence Project’s (MIP) diligent collection of records in hundreds of cases, investigators have an opportunity to evaluate how different procedures (e.g., related to evidence collection, processing of evidence, conversations between analysts and police or attorneys) have the potential to bias case processing and outcomes, and devise a coding scheme to more systematically quantify how frequently key events occur. A main goal is to expand work on case processing and cognitive bias through a review of decades of casework through a partnership with MIP.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? This research, involving collaboration with crime laboratories to study processing of real cases, has not been done in any large-scale way in the past. Data collection was completed by Houston Forensic Science Center (HFSC) personnel during September 2016. At that point, preliminary data analyses revealed a wide range in the number of latent prints available across cases and suggested that exclusions, verifications and consultations occur in a meaningful, non-trivial number of cases. Investigators anticipated that the frequency of verifications would allow the examination of questions specific to the verification process and the factors that result in a change in the original examiner’s conclusion.

Significant results from the HFSC data are based on case processing data from over two years, representing 2,536 cases and 12,363 prints. These data cover preliminary decisions examiners make about prints, such as the relative proportion of value (44%) and no value (56%) prints, and decisions made about prints of value, i.e., identifications (60%), exclusions (28%) and inconclusive (12%). Results also revealed great variability in the number of prints submitted to the lab based on offense type and in the number of prints in each case (range: 1-153; mean: 8.50; median: 4.00; mode: 1). Of the 2,536 original cases, 36% were verified, 8% had only identifications verified, and 56% were not verified. A small subset of cases (n = 82, or 3% of all cases) proceeded to consultation (i.e., the process that occurs when the original and verifying examiners disagree). In the consultation stage, the final decisions were typically those of the verifier (72%) as opposed to the original examiner (28%). Post-consultation decisions often involved meaningful changes, though they represented a very small proportion of the overall caseload. For instance, a determination of “latent of no value” was changed to “latent of value” on 18 occasions, and on 22 occasions an exclusion was changed to an identification. Therefore, these results suggest that although consultation occurred relatively infrequently, the consultation process often led to meaningful changes about individual latent prints. The same was true for the conflict resolution process (i.e., when consultation did not lead to a final, mutually agreeable decision). Although occurring rarely (8 cases total), the process sometimes resulted in important changes to the decisions made about particular latent prints. Finally, preliminary results reveal significant examiner differences based on the number/overall proportion of cases that proceed to consultation (1-11%) and the proportion of original decisions that were changed during consultation (33-95%).
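As a quick arithmetic check on the figures quoted above, the following Python sketch recomputes the consultation share and confirms that each reported percentage breakdown is internally consistent. It uses only numbers stated in the paragraph; the "not verified" figure is read as a percentage, which is an assumption based on the three verification values then summing to 100.

```python
# Figures quoted in the paragraph above (HFSC case processing data).
TOTAL_CASES = 2536
CONSULTATION_CASES = 82

# Percentage breakdowns as reported. Reading "not verified" as 56% is an
# assumption; it makes the verification breakdown sum to 100.
PRINT_VALUE = {"value": 44, "no value": 56}
VALUE_DECISIONS = {"identification": 60, "exclusion": 28, "inconclusive": 12}
VERIFICATION = {"verified": 36, "identifications only": 8, "not verified": 56}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def sums_to_100(breakdown):
    """Check that a percentage breakdown accounts for all items."""
    return sum(breakdown.values()) == 100


consultation_pct = share(CONSULTATION_CASES, TOTAL_CASES)

if __name__ == "__main__":
    print(f"Consultation share: {consultation_pct:.1f}% of cases")
    for name, breakdown in [
        ("print value", PRINT_VALUE),
        ("decisions on prints of value", VALUE_DECISIONS),
        ("verification", VERIFICATION),
    ]:
        print(name, "sums to 100:", sums_to_100(breakdown))
```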

The most significant results from these analyses indicate 1) significant variability in the number of latents across cases, 2) the baseline of analyst agreement (i.e., reliability) during verification (93%), against which to determine whether agreement changes when blind verification is implemented, 3) the range of outcomes in consultation and conflict resolution, and 4) the presence of meaningful differences among examiners in the frequency with which their cases go to consultation or conflict resolution and the frequency with which their decisions are overturned.

For instance, these data suggest a high level of agreement between examiners when verification is not blind, and future work can evaluate changes (or lack thereof) in overall agreement after the lab implements blind verification. Preliminary results also suggest the potential for significant individual differences, which underscores the importance of subsequent research designed to better understand the systems and processes that result in these differences (e.g., human factors, print quality). The key outcome of this project lies in the fact that investigators now have a complete dataset reflecting two years of casework at a major crime laboratory and a process for data collection that can be replicated at that site and others. Investigators are in the preliminary stages of conversation with other crime labs to begin similar data collection procedures.

Time spent conducting analyses revealed weaknesses in the original database used for the first round of data collection last summer. Investigators have been, and continue to be, constructing a more useful database for subsequent rounds of data collection that will better allow them to 1) track processing not just by case, but by individual latent print, 2) collect more nuanced information about why particular prints reach consultation or conflict resolution, 3) understand the amount of time required to complete cases, and 4) save time in data analyses by reducing free text data entry.
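One way to picture the redesigned database is a record kept per latent print rather than per case, with enumerated choices in place of free text. The Python sketch below is purely illustrative: every class, field name, and category is hypothetical and is not taken from the investigators' actual database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    """Enumerated decisions, replacing free-text entry (hypothetical list)."""
    NO_VALUE = "no value"
    IDENTIFICATION = "identification"
    EXCLUSION = "exclusion"
    INCONCLUSIVE = "inconclusive"


class Stage(Enum):
    """Stage at which the final decision on a print was reached."""
    ORIGINAL = "original examination"
    VERIFICATION = "verification"
    CONSULTATION = "consultation"
    CONFLICT_RESOLUTION = "conflict resolution"


@dataclass
class PrintRecord:
    """One row per latent print, linked to its case by case_id."""
    case_id: str
    print_id: str
    original_decision: Decision
    verifier_decision: Optional[Decision] = None
    final_decision: Optional[Decision] = None
    final_stage: Stage = Stage.ORIGINAL
    days_to_complete: Optional[int] = None

    def decision_changed(self) -> bool:
        """True if a final decision exists and differs from the original."""
        return (
            self.final_decision is not None
            and self.final_decision != self.original_decision
        )


if __name__ == "__main__":
    # Hypothetical example record, not real case data.
    record = PrintRecord(
        case_id="C-001",
        print_id="L-01",
        original_decision=Decision.EXCLUSION,
        verifier_decision=Decision.IDENTIFICATION,
        final_decision=Decision.IDENTIFICATION,
        final_stage=Stage.CONSULTATION,
        days_to_complete=12,
    )
    print(record.decision_changed())
```

Keeping decisions and stages as enumerations rather than free text is what would allow counts such as "exclusion changed to identification" to be tallied without manual recoding.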

At this stage, investigators are working to synthesize lessons learned from the initial round of case processing data collection and conversations with different crime labs. Investigators plan to present this synthesis at the upcoming International Symposium on Forensic Science Error Management at NIST. The presentation will include the strong rationale for collecting case processing data, such as quantifying and describing case flow, documenting rates of examiner agreement, the types of prints that are more likely to produce disagreement, and providing a baseline for comparison if labs change procedures. We will also provide recommendations, or a basic primer, on how to collect such data. Our goal is to encourage labs to routinely conduct this type of “in house” research (perhaps collaborating with CSAFE scholars).

In addition to work with case processing data, investigators have also begun collecting latent print evidence submission forms. Given Project T’s focus on how human factors and cognitive bias have the potential to affect case processing procedures and outcomes, we are seeking to clarify the nature and quantity of potentially biasing or task-irrelevant information that is routinely requested before latent print analyses are conducted in forensic laboratories. Data collection is ongoing (current n > 80 submission forms from a nationwide sample); however, preliminary results indicate that two-thirds request a description of the case and most include prompts requesting the race, sex, and age of the suspect or victim. Some types of potentially biasing or task-irrelevant information appear more prevalent than other types. For example, almost all submission forms request that the offense type be specified whereas only four forms ask whether the suspect or victim is deceased. Moreover, while some task-irrelevant prompts appear to have practical purposes (e.g., one quarter of forms request information regarding the suspect’s criminal history and most request the suspect’s name), others have no clear practical purpose (e.g., victim and suspect race). Future analyses will provide additional descriptive information regarding the type of potentially biasing information being requested. We will also examine differences in forms used by different types of forensic laboratories (e.g., state-funded, local agency, federal). Already, though, we are beginning to see patterns in the types of information that submission forms routinely request that at least have the potential to solicit biasing information. Investigators are scheduled to present results at the upcoming International Symposium on Forensic Science Error Management at NIST.
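The kind of tally behind statements such as "two-thirds request a description of the case" can be sketched in a few lines of Python. This is an assumed coding approach, not the investigators' procedure: each form is represented as the set of prompts it contains, and the prompt labels and toy data below are invented for illustration.

```python
from collections import Counter


def prompt_frequencies(forms):
    """Return the share of forms that contain each prompt.

    `forms` is a list of iterables of prompt labels, one per submission
    form. A prompt is counted at most once per form.
    """
    if not forms:
        return {}
    counts = Counter(prompt for form in forms for prompt in set(form))
    return {prompt: count / len(forms) for prompt, count in counts.items()}


if __name__ == "__main__":
    # Invented toy data, not the collected submission forms.
    coded_forms = [
        ["offense type", "case description", "suspect name"],
        ["offense type", "suspect race"],
        ["offense type", "case description"],
    ]
    for prompt, freq in sorted(prompt_frequencies(coded_forms).items()):
        print(f"{prompt}: {freq:.0%}")
```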

Researchers also began a meta-analysis of the studies of contextual/biasing effects in latent print processing and began drafting a manuscript for publication. In addition, in partnership with HFSC, investigators presented results at the American Academy of Forensic Sciences’ 2017 annual conference.

Investigators presented HFSC data at the “Forensics @ NIST” Conference on November 9, 2016; at the American Academy of Forensic Sciences’ 2017 annual conference; at a small, internal “Quantitative Collaborative” meeting of statisticians and methodologists at the University of Virginia, to get critical feedback on data analyses; and in Karen Kafadar’s course on statistics consulting. Additionally, based on new contacts from the Forensics @ NIST conference, investigators began engaging in preliminary conversations with additional labs to replicate and extend the HFSC project in the California and Arizona systems.

Other primary accomplishments included further presentations of data collected from the Houston Forensic Science Center (HFSC), including a CSAFE webinar in May 2017 and a presentation at the CSAFE annual meeting in June 2017. As a result of these presentations, investigators have made useful contacts in the latent print field (e.g., JoAnn Buscaglia of the Federal Bureau of Investigation Laboratory Division and Michelle Triplett of King County Regional AFIS). Conversations with these contacts have been fruitful in two primary ways. First, they have pointed investigators toward areas that may be ripe for future case processing data collection. For instance, conversations with Ms. Buscaglia have led us to consider how we might incorporate research findings on examiner differences at each stage of the ACE-V process into data collection (e.g., finding ways to more specifically code reasons for disagreement during conflict resolution processes). Second, they have provided concrete examples of how case processing practices can vary between labs. This line of conversations has underscored the need to enlist more labs in collecting case processing data, both for internal (i.e., quality assurance and monitoring) and external (i.e., comparison with other labs) purposes. Thus, we continue to engage with other labs about data collection. Labs continue to be receptive, at least in theory, to collecting case processing data, and we continue to pursue additional collaborative relationships.

Following conversations with contacts in the field of latent print examination, investigators have also revised their manuscript (based on HFSC data) to better place findings in the context of what is known about sources of examiner disagreement and are currently finalizing this manuscript for publication.

Finally, investigators have expanded their work on human factors and case processing through a partnership with the Midwest Innocence Project (MIP) to conduct a “cognitive bias audit” of hundreds of cases. The impetus for this review was initially the FBI’s identification of testimonial errors in microscopic hair comparison cases. As the MIP began an audit to determine whether these types of errors occurred in their state cases, they identified a need for a broader review. This broader audit would be designed to identify sources, or potential sources, of cognitive bias in forensic science case processing procedures. Thus, we have been reviewing available documents in these cases (e.g., police reports, bench notes, forensic reports, documentation of conversations between forensic science examiners and police or attorneys, trial transcripts, and depositions) to develop a coding scheme and process that would capture opportunities for cognitive bias to influence case processing of and conclusions about common forms of pattern evidence.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Results were presented at AAFS and focused on the consultation and conflict resolution methods used during the verification phase of latent fingerprint analysis, exploring ways in which examiner differences influenced consultation and resolution procedures. The presentation was well received and, investigators hope, will inspire similar data collection efforts in other labs. Case processing results were presented during a CSAFE webinar and at the CSAFE annual meeting (both formally and informally). The rationale and broad framework for the cognitive bias audit were presented at the annual Innocence Network Conference (using funding from mechanisms outside CSAFE). The audience consisted of lawyers, forensic scientists, and other scholars interested in conducting similar audits.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Results have been disseminated through formal presentations (e.g., AAFS, Forensics @ NIST, CSAFE Webinar, Innocence Network Conference), as well as through informal presentations at the University of Virginia (e.g., Karen Kafadar’s course) and conversations with colleagues/contacts at other labs (e.g., VaDFS). Importantly, as a result of these presentations, investigators now have contacts with two other state crime laboratory systems (California and Arizona) that have expressed interest in replicating and extending case processing research in their labs. Although these conversations are in the preliminary stages, new contacts have expressed interest in collecting case processing data both to contribute to a larger scientific process and to engage in more critical self-monitoring of their own labs’ processes.

In addition, regarding the smaller literature review that preceded this project, investigators summarized their meta-analysis of contextual/biasing effects in latent print analyses, and disseminated this through a formal, peer-reviewed presentation at the annual meeting of the American Psychology-Law Society. Investigators are also summarizing this review in manuscript format and anticipate submitting it for publication in 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? During the next year, investigators will finalize and submit the manuscript that is being prepared for HFSC. With our HFSC collaborators, we plan to circulate an early version of the manuscript (prior to submitting for publication) to contacts at other laboratories to 1) receive critical input on the manuscript and overall study and 2) prompt similar investigations at other laboratories. In this way, we plan to follow up on preliminary conversations with contacts from other labs to pursue data collection. Thus, next year we will follow up with contacts from crime labs in Arizona; California; Seattle, Washington; Virginia; and Washington, D.C. to explore the feasibility of collecting case processing data and ways that we could facilitate the process (e.g., sending researchers and/or students to sites to help code and enter data). We are hopeful that this year we will collect case processing data from at least one more lab such that we can conduct comparisons with the HFSC data set.

HFSC case processing data will be presented at the International Association for Identification in August 2017. We will also present a primer on in-house case processing research at the International Symposium on Forensic Science Error Management at NIST at the end of July, 2017. Like other NIST and CSAFE presentations, we hope this venue will allow us to reach communities of interest.

We plan to complete data collection of blank evidence submission forms, present findings at the International Symposium on Forensic Science Error Management, and reach out to labs (beginning with some of the labs listed above) to begin data collection of completed evidence submission forms.

Finally, we will continue reviewing MIP cases and finalize a coding scheme for the cognitive bias audit.

To ensure dissemination to communities of interest, we will propose to present work on these projects at 2018 conferences such as AAFS and IAI.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Q1- In partnership with HFSC, we submitted a proposal to the American Academy of Forensic Sciences for their 2017 annual conference. The presentation will focus on the consultation and conflict resolution methods used during the verification phase of latent fingerprint analysis, exploring any ways in which examiner differences (in experience, training, and seniority status) influence consultation and resolution procedures.

We summarized our meta-analysis of contextual/biasing effects in latent print analyses, and disseminated this through a formal, peer-reviewed presentation at the annual meeting of the American Psychology-Law Society. We are also summarizing this review in manuscript format and anticipate submitting it for publication in 2017.

Q2- Presentations at the Forensics@NIST conference in 2016

Q3- Rairden, A., Garrett, B. L., Murrie, D., Kelley, S., & Castillo, A. Resolving Latent Conflict: What Happens When Latent Print Examiners Enter the Cage? Paper presented at the annual meeting of the American Academy of Forensic Sciences, February 2017.

CSAFE Annual Report 06.2017 HUMAN FACTORS 170

Q4- Murrie, D. & Kelley, S. Latent Print Processing. Webinar presented through CSAFE, May 4, 2017.

TECHNOLOGIES OR TECHNIQUES None at this time. However, investigators hope that technology related to guiding coders through case documents in a predetermined order, which will potentially incorporate a data entry system as well, will ultimately be shared with communities and researchers who want to conduct similar audits.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Dan Murrie (principal investigator) - funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories.

Sharon Kelley, researcher – funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories.

Brandon Garrett (UVA): funding support in Year 2. Planned and participated in design of future experiments and collaborations with crime laboratories.

Garrett, Murrie, and Kelley each spent at least one person-month of time on this and their other CSAFE projects (combined).

Alicia Rairden (HFSC collaborator): advised on study design and oversaw first round of data collection. Receives no CSAFE funding.

Amy Castillo (HFSC collaborator): advised on study design and oversaw first round of data collection. Receives no CSAFE funding.

Summer Farrar (MIP collaborator): collaborating on development of a cognitive bias audit protocol. Receives no CSAFE funding.

Brett Gardner (UVA): Postdoctoral fellow at UVA (with Murrie and Kelley), but receives no CSAFE funding (volunteers on CSAFE projects)

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? Houston Forensic Science Center (HFSC; Houston, Texas) has been the primary collaborator. Their staff have collaborated in sharing archival case records, designing the study, and even providing intern labor to code archival case files. HFSC is the site of the initial study (i.e., the data collection underway) and the source of all study data. HFSC staff member Alicia Rairden, MS, CLPE, has been the lead collaborator. Thus HFSC has contributed time, personnel, and data.

Defense Forensic Science Center (DFSC; Forest Park, Georgia) has been a second collaborator. Their staff, particularly Henry Swofford, have hosted a site visit in order to orient us (Murrie, Kelley, Garrett) to DFSC procedures and develop research plans. While their procedures currently do not allow for the kind of archival research we are conducting at HFSC, we are planning to study case processing as they implement new verification procedures. Thus, at this point, their primary contribution has been staff time in terms of education and planning, though we anticipate later contributions of staff time, lab data, and other collaboration.

For this, and other UVA-based projects, investigators have had helpful, collaborative input from law professors John Monahan and Greg Mitchell, both national experts on scientific evidence in court. Neither receives any CSAFE financial support, but both contribute time and expertise in routine meetings and consultation.

Most recently, we have begun collaborating with the Midwest Innocence Project (MIP; Kansas City, Missouri), specifically Summer Farrar, on the cognitive bias audit. Ms. Farrar has collected and provided case materials for review and begun to develop a platform for giving coders access to case materials in a manner that shields them from biasing information as much as possible.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Examining case processing to gauge the basic reliability of latent print examination is a crucial first step in understanding and improving the statistical foundations for fingerprint evidence (which should allow for better measurements of effective laboratory performance), the effects of altering case management procedures, and assessment of examiner-specific error rates.

No prior work has examined case flow within crime laboratories generally, or within latent fingerprint processing specifically. Likewise, we know of no basic descriptive research on the use and outcomes of verifications. This study will provide this simple first step in knowledge. Authorities have recommended expanding the use of verifications (e.g., to include inconclusive results) or the use of blind verifications, but little is known regarding what impact that may have on the efficiency of the work, or whether any costs and benefits change depending on the experience of the examiners or the difficulty of the work (OIG, USDOJ, 2006; NIJ and NIST, 2012; Stoel et al., 2015). The research on human factors and cognitive bias has focused on experiments involving fairly small numbers of participants and cases (e.g., Dror & Charlton, 2006; Dror et al., 2005; 2006), and our research, once complete, will likely be the first involving a larger volume of actual casework over extended time periods. In order to assess error rates and arrive at statistically sound models for pattern evidence, one must also accurately measure how human factors contribute to error rates.

Careful review of evidence submission forms and the process of collecting, reviewing, and presenting forensic evidence also has the potential to affect practices in the discipline. Investigators anticipate that findings from this research can be used to inform case management procedures (such that the examiners evaluating evidence are blind to biasing, task-irrelevant information) and more thoughtful policies and procedures surrounding data collection procedures, conversations among examiners and between examiners and law enforcement, and even data management systems that might inadvertently expose examiners to biasing and task-irrelevant information.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Though specific to latent fingerprint examination, this research has implications for other forms of pattern evidence as well. Regarding case processing, results may underscore the value of verification procedures, provide some indication of the labor costs of verification procedures, and shed light on the nature of staff status or relationships in verification (to take just one example, we will examine whether verification outcomes differ depending on status differences among staff involved in the procedures). The collection of biasing, and potentially even task-irrelevant, information on evidence submission forms is similarly relevant not only to latent print examination but to any type of forensic evidence. Finally, understanding where in case processing cognitive bias has the greatest potential effects is also broadly applicable across forensic science disciplines.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? At this stage, the project has not made a meaningful impact on human resources, though we anticipate some impact upon project completion and dissemination.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems Nothing to Report

Project U - Research on Lawyers, Jurors, and the Evaluation of Forensic Evidence

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Brandon L. Garrett (UVA)

Other Investigators: Gregory Mitchell (UVA), John Monahan (UVA), Dan Murrie (UVA), Sharon Kelley (UVA), Nicholas Scurich (UCI); VADFS, WVDFS, HFSC, DFSC

Note: Law school faculty Gregory Mitchell and John Monahan are not receiving any salary as part of CSAFE, but they advise on and collaborate on research projects. John Monahan co-taught a course on forensics.

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The broad goal of the project is to better understand how to convey forensic information in the courtroom in a way that is accurate and comprehensible not only to jurors but also to lawyers and judges. We focused initially on proficiency and error rate information, as our initial studies suggested such information is highly salient to laypeople. We are now examining new ways of conveying forensic information as the field moves towards the use of quantitative conclusions. As noted, we hope that this research will be not only of great academic interest but also of real practical use for latent fingerprint examiners, who will seek out guidance for how to best convey new methods for presenting forensic evidence to lawyers and jurors.

We believe that our new Year 2 work will make useful contributions. We expect that both of our first two large-scale empirical studies of juror attitudes will make a real impact on the forensics and the psychological literature. Leading policymakers from the Department of Justice, the NCFS, the OSAC, and individual crime labs are re-thinking how forensic conclusions should best be expressed in reports and testimony. It is an exciting time to be examining how jurors evaluate the information presented by forensic examiners and in forensic reports. We are discussing our research outcomes with Bill Thompson (UCI: CSAFE Project I) who is working on a related project but with different criteria and outcomes. Garrett and Mitchell have begun to analyze results for our second study of FRStat conclusions in collaboration with Nicholas Scurich at UCI.

Garrett and Mitchell’s “The Proficiency of Experts” article is a contribution to the legal literature and it is primarily written to reach an audience of lawyers and judges. However, forensic practitioners may be surprised to learn that state and federal judges appear not to know what to make of proficiency data. Judges rely on experience or credentials in qualifying experts, but not actual measured performance. Judges have highlighted both high proficiency rates and low proficiency rates to inform rulings. Other judges do not consider proficiency. Some courts allow parties to obtain discovery on proficiency data, while other courts have not. With no prior work analyzing these rulings and no one having previously argued that the legal standards should focus on performance as expertise, we believe this will be an important piece of scholarship and useful to lawyers and judges, as well as forensic scientists. The Article comprehensively surveys judicial decisions that discuss proficiency evidence, in civil and criminal cases. That collection of judicial opinions should itself be highly useful to lawyers. The Article also reports aggregated proficiency results by commercial providers describing error rates in latent fingerprint reports from CTS from 1995-2016. The aggregation of fingerprint proficiency test results from 1995-2015 will prove useful to situate what it is that commercial proficiency providers currently report and the variation in their results from test to test and year to year (we separately describe in Project T our plan to collaborate with CTS to collect additional information in future proficiency tests). Since this law review publication will primarily reach a legal audience, we plan to write in the future a short version of this piece, perhaps also reporting the results of the jury study regarding proficiency, for a publication geared towards forensic practitioners.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? In Year 2, we published several articles, including the results of our Year 1 study regarding how lawyers assess forensic evidence versus laypeople. Two new jury studies have been conducted, with papers drafted, shared with colleagues, and soon to be submitted for publication. Additional work is in the journal editing process. Taken together, this work has focused on how legal actors, including lawyers, judges, and jurors, can better evaluate forensic evidence. One piece focuses on how judges can and should use proficiency information in litigation, particularly in testimony by experts. We set out when and how judges should find such information relevant to fact-finders, and how legal actors should carefully examine what proficiency data is available and whether it provides sound and realistic data concerning performance in the field. Relatedly, we study how jurors assess such proficiency information, and find that jurors can do so in a calibrated way. We assess how practicing lawyers assess the strength of forensic evidence. We assess how jurors evaluate new forms of quantitative forensic evidence. Each of these studies and articles helps to inform legal actors in their efforts to understand forensic evidence and explain forensic evidence in the courtroom. We have been presenting this research in conferences both for researchers and for practicing lawyers, and as the findings continue to result in practical recommendations, we will disseminate them to lawyers and practitioners.

Recently we have been analyzing and preparing for publication two new studies. First, we have analyzed the data from a study, “The Impact of Proficiency Testing Information on the Weight Given to Fingerprint Evidence,” with over 1,400 participants, focusing on how jurors evaluate proficiency test results in the context of fingerprint evidence. We believe that the results will be highly informative to forensic practitioners and to policymakers. The Fall 2016 PCAST report emphasized the importance of collecting and reporting proficiency information in forensics, in order to assess the quality of an examiner’s work. However, no prior studies have examined how jurors assess such individual-level proficiency information for an expert witness.

Using a nationally representative sample, we examined the impact of proficiency testing information on the weight given by potential jurors to a fingerprint examiner’s opinion that a defendant’s fingerprints matched latent prints recovered from a crime scene. The examiner’s level of performance on a proficiency test (high, medium, low, or very low), but not the type of errors committed on the test (false positive identifications, false negative identifications, or a mix of both types of errors), affected the weight given to the examiner’s identification opinion, which in turn affected judgments of the defendant’s guilt. Those with stronger aversions to false acquittals than false convictions gave greater weight to the fingerprint evidence, but all groups were consistently sensitive to information about the examiner’s proficiency level. Finally, our results suggest that jurors assume that fingerprint examiners are highly proficient but not perfect: evidence showing that an examiner’s proficiency level falls below 90% is likely to inform how jurors evaluate the examiner’s testimony. Thus, these mock jurors appear to be calibrated in their response to negative information about proficiency test performance across a wide range of conditions. This suggests proficiency information can play a valuable role in legal settings and it can inform jurors in a calibrated manner. We received useful feedback and analysis of these data from students in a UVA Statistics Department course in April and May 2017, we presented the findings at the National Forensic College and at the CSAFE conference in June 2017, and in this quarter we completed a draft describing the results. We have shared a more finalized draft with colleagues for their feedback and will next submit the piece for publication.

The second new study relates to quantitative conclusions in forensics. The Defense Forensic Science Center (DFSC) and U.S. Army Criminal Investigation Laboratory (USACIL) contacted us in early 2017 and described their preliminary use of a quantitative method (FRStat) to present fingerprint conclusions. They shared their approved language for setting out such conclusions and we designed and administered a first-time study to assess how jurors evaluate that type of conclusion language. We worked closely with the U.S. Army Criminal Investigation Laboratory to design a study that will both make an important academic contribution and be of practical use to the DFSC, as well as other labs that are presently considering the use of such quantitative methods in fingerprint examination. We have obtained initial results of the study and will be analyzing the data further and preparing a draft publication during this coming quarter. We are excited to be working on the first study to evaluate a new method for presenting latent fingerprint conclusions. The preliminary results look to be of real practical use for the U.S. Army Lab and other latent fingerprint units that are considering the use of quantitative methods, but are uncertain how and whether factfinders will comprehend this different method for presenting information about a latent fingerprint comparison. The laypeople who took the survey made certain relevant distinctions based on the degree of probability expressed in the quantitative fingerprint conclusions. The results suggest that laypeople can rationally process such quantitative conclusions, that they do not over- or undervalue them, and that the type of probability information presented by methods like FRStat can inform jurors. We have written a draft paper describing these results to share with colleagues for feedback and to submit for publication.

Brandon Garrett and Gregory Mitchell are engaged in the journal editing process for their piece, “The Proficiency of Experts,” forthcoming in the University of Pennsylvania Law Review, examining the legal uses of the proficiency of experts. The paper is described in Project W in more detail. The piece explores how proficiency information can inform the legal standard for the qualification of experts. The publication of this article in one of the highest ranked law reviews will enable the work to reach a broad audience of lawyers and judges that might be otherwise largely unfamiliar with forensic literature on proficiency testing, or with the significance of measuring an expert’s performance. We show how judges often rely on poor proxies when qualifying experts, such as credentials or experience, rather than focusing on the expert’s proficiency. We also present an analysis of twenty years of CTS proficiency tests in the area of latent fingerprint examination as a case study on why consistent annual proficiency testing is useful for a field of practice. We describe how state and federal judges have struggled with how and whether to consider proficiency-related information. We argue that, just as scientists do not take the view that credentials or experience alone makes one an “expert,” lawyers and judges need to appreciate that expertise can and should be empirically assessed using measures of performance, or proficiency.

Previously, in Year 2, we published our “Forensics and Fallibility” paper in the West Virginia Law Review, which examined how lawyers and jurors perceive fingerprint evidence as compared to DNA evidence. Garrett also published his piece, “Constitutional Regulation of Forensic Evidence,” in the Washington & Lee Law Review, describing recent changes in constitutional criminal procedure rulings focusing on the obligations of lawyers to investigate and assess forensic evidence in criminal cases. That piece was recently selected as one of a few pieces most useful to practicing lawyers to be highlighted in the past issue of the National Association of Criminal Defense Lawyers’ Champion magazine.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? The research was presented at forensic science and legal conferences (including the National Forensic College, the CSAFE event at Iowa State in quarter 4, and law faculty workshops at the University of Chicago School of Law and the University of Virginia School of Law) and it will continue to be presented in future conferences. Articles and reports will be published in the appropriate law and/or science journals. This research has already been shared, and will continue to be shared, directly with CSAFE collaborators and crime labs to guide crime laboratories in training analysts to testify in court and in drafting forensic reports.

The research has improved training and professional development for the participants. We have shared the draft with, and received feedback from, colleagues in CSAFE and other scholars and practitioners as we design research projects. We shared and presented the data from the proficiency study to a course that Karen Kafadar teaches in the UVA Statistics Department in which students themselves were the “consultants” and evaluated the best statistical approaches for analyzing these datasets. We continue to discuss possible collaborations with additional CSAFE researchers.

We have included some of this research in the seminar that Monahan and Garrett are teaching at the law school in Fall 2017, which will include law students and visiting researchers. We at UVA have also met as a group, and as a subgroup of the persons working on law-related projects, to regularly discuss research design. We have had regular conference calls with crime laboratories to share research ideas as well.

Garrett presented work at an American Bar Association conference in Washington D.C. in November 2016. Garrett presented general work on wrongful convictions, and discussed forensic science research specifically, at a series of workshops and seminars in China in December 2016 (arranged by the New York University U.S. Asia Law Institute). Garrett also attended a science and law panel, which included Simon Cole of CSAFE, on teaching science in law schools at the Association of American Law Schools conference in January 2017.

We also have discussed the preliminary results with crime lab professionals, including those at Houston Forensic Science Center (HFSC), Defense Forensic Science Center (DFSC) and the Virginia Department of Forensic Science (VADFS). We will continue to involve additional laboratories as possible collaborators. As noted, discussions about this research on how jurors evaluate forensic evidence with the USACIL and DFSC led to the new project studying how jurors evaluate quantitative presentation of latent fingerprint evidence. DFSC is extremely interested in the results, as they inform the lab on how its new FRStat approach will be evaluated by jurors. Further, DFSC has plans to share FRStat software with other labs, who may in turn also be interested in learning more about how to best communicate this type of evidence in the courtroom.

Garrett and Kelley presented some of this research at the Innocence Network Conference in March 2017, in a panel generally on cognitive bias and how to collaborate with and assist laboratories conducting forensic audits. The panel included Summer Farrar, of the Midwest Innocence Project (and formerly of the Kansas City Crime Laboratory), and Amelia Maxwell, of the Maryland Public Defender. Lawyers working with crime labs in multiple states on audits of various types were in attendance. Simon Cole also attended that panel, and the conference was an opportunity to discuss our various research projects with Cole generally. Garrett was also able to meet with Dean Jennifer Mnookin and discuss CSAFE research projects during a non-CSAFE related visit to give talks at UCLA in March 2017.

Garrett also presented both the proficiency and FRStat-related jury study results to the National Forensic College on June 7. The audience, consisting largely of defense lawyers, was extremely surprised to hear as a preliminary matter that jurors placed so much weight on fingerprint evidence, including as much or more weight than they place on DNA evidence. They were surprised to hear that even when given information about unusually low proficiency (which hopefully would not be realistic in practice), with a hypothetical examiner making false positive errors in 40 out of 100 tests, laypeople still placed a great deal of weight on the examiner’s conclusions. They also were surprised to hear that nearly one-half of our nationally demographically representative pool of laypeople placed equal weight on errors that lead to a guilty person not being convicted as on errors that lead to convicting the innocent. People have strong priors, on average, that it is of great importance that guilty people be held to account and that forensic evidence is generally quite reliable, even if no examiner may be perfectly proficient. We found that these results provided an important reality check for defense lawyers: they reported finding the presentation very valuable. These defense lawyers also had questions about how to explain quantitative evidence to jurors and how to question an examiner presenting, in part, evidence based on an algorithm in the courtroom. The responses to Garrett’s presentation, and Henry Swofford’s (DFSC) presentation, illustrated the importance of explaining how quantitative evidence is analyzed to a legal audience.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We have shared work and research plans with scholars, CSAFE members and crime labs, and work from this project and others will be or has been presented at the AAFS, IAI, the Innocence Network Conference, the CSAFE conference at Iowa State, the National Forensic College at Cardozo Law, and in law conferences and workshops, including at the University of Chicago and the University of Virginia. We at UVA have also met as a group, and as a subgroup of the persons working on law-related projects, to regularly discuss research design. We have continued regular conference calls with crime laboratories to share research ideas as well.

As described elsewhere, these studies fit well with the larger projects across CSAFE that are examining how forensics are used in the courtroom. We will continue to communicate and collaborate with Projects F and I (Simon Cole and William Thompson) on selecting conclusion terms used, so that connections among the projects lead to consistent and useful guidance and comparisons. As new guidance emerges on new recommended or standard language for forensic testimony, we will collaborate with others in CSAFE, including William Thompson, Hal Stern, Simon Cole and others at UCI, as well as those at UVA such as Dan Murrie, Greg Mitchell, Sharon Kelley, and John Monahan, to conduct mock jury experiments assessing that proposed, recommended, or adopted standard language. Our new work in progress will involve Nicholas Scurich at UC Irvine as a collaborator. Such studies will provide ready and immediate use to crime laboratories and to legal practitioners.

As described in Project W, Garrett also drafted an amicus brief describing legal and scientific issues concerning toolmark evidence for a Colorado case. The brief was filed in early 2017. Roughly 25 scientists and legal academics have agreed to sign on to the brief, including several of our CSAFE colleagues. Garrett has been editing the brief to reflect the input of these scientists and law professors. The hope is that the brief will encourage courts to look more carefully at what conclusions can and cannot be reached when conducting toolmark comparisons. The brief will be published as an article when the litigation is concluded.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? In the next year, we will submit for publication the large-scale Year 1 proficiency study relating to latent fingerprint comparisons. We will also submit for publication the second study relating to FRStat quantitative fingerprint conclusions. Greg Mitchell and Brandon Garrett at UVA Law, with Nicholas Scurich at University of California, Irvine, are currently completing a draft reporting the results of this study. We have received invaluable help from Henry Swofford at DFSC, who has shared the new technical report language about conclusions.

Garrett serves as an associate reporter on the American Law Institute (ALI) Principles of Policing project. While the research is not supported by CSAFE (the ALI has already provided research assistant support for that work), Garrett has begun drafting principles relevant to such subjects to guide policing agencies. While largely geared towards policy, these principles will be relevant to forensic practitioners as well, and as they move towards completion, it may be useful to write about them in a publication for forensic practitioners.

Later in Year 2, we considered follow-up studies, such as an additional proficiency-related study, and will begin to design studies to administer in Year 3. As noted in prior reports, one possibility might involve additional forensic disciplines, and another would involve a more detailed and realistic criminal trial scenario, or a focus on cross-examination, jury instructions, or other portions of a trial. For this study as well, we will collaborate with Project I.

Additional Year 3 work will focus on the editing and publication process for “The Proficiency of Experts,” the Article co-authored by Garrett and Mitchell, in the University of Pennsylvania Law Review. Very few pieces on evidence, much less scientific or forensic evidence, have been published in a top law review. We are thrilled with the placement and look forward to working with the law review editors, who plan to begin their editing and production process at the end of this next quarter. As described, we argue that expertise should be assessed on the basis of demonstrated performance and not on proxies such as credentials and years in practice. The Article explores the standard for expert qualification and how proficiency is litigated and handled by the courts. The empirical results of the separate jury studies may inform the analysis, but the analysis is legal. We found that judges have struggled with how and whether to consider proficiency-related information at the lab and examiner level. Some judges, state and federal, have variously highlighted both high and low proficiency rates to inform their rulings. Other judges have failed to consider proficiency at all. Some courts have permitted lawyers to obtain discovery on the question of proficiency, while other courts have found the subject not relevant.

Given the confusion across the legal landscape, we believe that the roadmap we provide will be an important piece of scholarship and useful to lawyers and judges, as well as to forensic scientists. Our article also reports aggregated proficiency results by CTS, which will provide a further useful resource. Finally, the article reports comparative research, including description of proficiency testing schemes in Canada, Germany, and the U.K., based on Garrett’s communications with forensic scientists and researchers in each of those countries. CSAFE collaborators suggested that we prepare a shorter version of this analysis and the results of the jury study on proficiency for a forensic practitioners’ journal. We plan to do so closer to the time of publication of both this law paper and the proficiency study. We anticipate that the results will be highly relevant for forensic practitioners.

Garrett presented the “Proficiency of Experts” article at the University of Chicago School of Law in May and at the University of Virginia School of Law in July. Garrett presented the two jury studies in progress at the National Forensic College at Cardozo Law School in June as well as at the CSAFE conference in Ames, Iowa, in June.

Products

JOURNAL PUBLICATIONS

Greg Mitchell and Brandon L. Garrett, The Impact of Proficiency Testing Information on the Weight Given to Fingerprint Evidence (work in progress)

Brandon L. Garrett, Greg Mitchell, and Nicholas Scurich, How Jurors Assess Quantitative Fingerprint Evidence (work in progress)

Brandon L. Garrett and Greg Mitchell, The Proficiency of Experts, 163 UNIVERSITY OF PENNSYLVANIA LAW REVIEW __ (forthcoming 2018)

Brandon L. Garrett, The Crime Lab in the Age of the Genetic Panopticon, 114 MICHIGAN LAW REVIEW 979 (2017)

Brandon L. Garrett and Greg Mitchell, Forensics and Fallibility: Comparing the Views of Lawyers and Jurors, 119 West Virginia Law Review 621 (2016)

Brandon L. Garrett, Constitutional Regulation of Forensic Evidence, 73 Washington & Lee Law Review 1147 (2016)

Brief of Amici Curiae Brandon L. Garrett and 35 Scientists, Statisticians, Law and Science Scholars, and Practitioners, People v. Genrich, No. 2016CA651 (Co. Ct. App. February 9, 2017).

OTHER PUBLICATIONS, CONFERENCE PAPERS AND PRESENTATIONS. Garrett discussed “The Proficiency of Experts” at the University of Chicago School of Law in May and at the University of Virginia School of Law in July. Garrett presented “The Impact of Proficiency Testing Information on the Weight Given to Fingerprint Evidence” and “Forensics and Fallibility” at the Innocence Network Conference in San Diego, CA, as well as at the National Forensic College and the CSAFE conference at Iowa State University, where he also discussed the preliminary findings of “How Jurors Assess Quantitative Fingerprint Evidence.”

Garrett also presented general work on wrongful convictions and discussed forensic science research at a series of workshops and seminars in China in December 2016, arranged by the New York University U.S. Asia Law Institute. The venues were the East China University of Political Science and Law (ECUPL) in Shanghai, the Southwest University for Nationalities in Chengdu, and the China University of Political Science and Law (CUPL) in Beijing.

WEBSITE(S) OR OTHER INTERNET SITE(S) The forensics forum blog (https://forensicsforum.net/). The two law student research assistants who helped with legal research on forensics projects also spent a half hour to an hour each week posting items to that blog, in the interest of educating and bringing together the forensics community and disseminating information on news, research, and legal rulings. Book reviews and other features, posted by other academics, also appeared on the blog. The law students have prepared research material, for example, on litigation of forensic proficiency and on tool mark evidence, which may be posted on the blog or result in student publications.

TECHNOLOGIES OR TECHNIQUES N/A. The techniques used to study jurors and judges and lawyers are well-established. This research will not provide new technology, but rather improved practices, as described above.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Brandon L. Garrett (UVA), Principal Investigator. Beginning in Year 2, Garrett will receive three teaching credits (one lecture course) in total CSAFE support. Dan Murrie (principal investigator) – funding support in Years 1-2. Planned and participated in design of future experiments and collaborations described separately with crime laboratories. Sharon Kelley, researcher – funding support in Years 1-2. Planned and participated in design of future experiments and collaborations described separately with crime laboratories. Garrett, Murrie, and Kelley each spent at least one person-month of time on this and their other CSAFE projects (combined). Gregory Mitchell – Receives no salary from CSAFE but does receive a modest amount of CSAFE support for the jury survey research materials. Worked on jury surveys and law review articles. John Monahan – Receives no salary from CSAFE. Participated in teaching and hosting guest speakers for the Fall 2016 course. Law student research assistants performed small amounts of hourly work, assisting with research projects related to CSAFE, helping with course materials, and with education and outreach projects. Students in Kafadar’s undergraduate statistical consulting course also provided assistance with data analysis (no charge to CSAFE).

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? We have exchanged information and research ideas with CSAFE researchers at UCI. As noted, we have had ongoing and very valuable discussions with professionals at several crime labs. None of those collaborations involved any financial or in-kind support. So far, for this project, the collaborations have involved discussions only. However, the U.S. Army Crime Lab (USACIL) has shared their new approach towards latent fingerprint conclusions and materials that we plan to incorporate into a future study. Further, as part of a different project we have received data and plan to publish an article co-authored with professionals at the Houston Forensic Science Center. We continue to reach out to other possible partners; Garrett recently spoke to David Yokum, a researcher at Lab@DC, which works on evidence-based projects with the Washington D.C. government, including the crime laboratory. We plan to explore possible research collaborations in this coming quarter.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? This research will provide guidance to crime laboratories and forensic practitioners, so that they better understand what jurors hear and understand when they testify in court. Doing so is critically important during a time in which forensic technology and standards are changing. This research will also provide guidance to policymakers, such as the OSAC when it formulates guidelines and standards for the presentation of forensic science findings.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Outside the legal and forensics communities, this research will also impact psychologists more broadly interested in understanding how people evaluate complex evidence, including scientific evidence. The research also will be of interest to legal scholars in the broader evidence law and criminal law fields, beyond those who study forensics.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? This research will continue to be incorporated into the law school teaching materials described in project W.

WHAT IS THE IMPACT ON PHYSICAL, INSTITUTIONAL, AND INFORMATION RESOURCES THAT FORM INFRASTRUCTURE? This research is designed to inform the policies and procedures used in crime laboratories and in the courts, that is, the regulation, including the legal regulation, of forensic practice.

WHAT IS THE IMPACT ON TECHNOLOGY TRANSFER? This research will not provide new technology, but it will provide improved practices, as described above.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? We hope that this research will help to better identify how jurors, judges and lawyers assess forensics and guide future outreach and education efforts. Already, we have a stronger sense of what judges know about forensics and how to reach them using quantitative information, in particular. We know far more about how jurors assess proficiency information, and not just verbal forensic conclusions. We now have a better sense of how jurors assess quantified information about disciplines that had not previously had conclusions expressed in that form.

The proposed research has already been and will continue to be published in top law reviews and interdisciplinary journals read by members of the legal community. The research has already been, and will continue to be, presented at training conferences and academic conferences for lawyers and law professors and also communicated with forensic practitioners. The research will be incorporated into law curricula as well.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? N/A

Changes/Problems

Changes in approach and reasons for change

Nothing to report, except the good news that we plan to prioritize work on one new study over another, focusing on the quantified latent fingerprint conclusion study over other follow-ups to Year 1 studies. As noted, we had already planned to do that work when the subject was ripe for research, and we are excited that it now is.

Significant changes in use or care of human subjects, vertebrate animals, and/or biohazards

N/A – all survey work was approved by UVA and NIST’s IRBs.

Project W - Legal Education and Forensic Evidence

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Brandon L. Garrett (UVA)

Other Investigators: Gregory Mitchell (UVA), John Monahan (UVA), Dan Murrie, Sharon Kelley (UVA)

Note: Law school faculty Gregory Mitchell and John Monahan are not receiving any salary as part of CSAFE, but they have advised on and collaborated on research projects. John Monahan is co-teaching a course on forensics.

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The broad objective of this project is to educate future lawyers, practicing lawyers, judges, and the forensic community concerning the connections between forensics research and the law.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? At UVA Law we have focused on two courses: first, a flagship forensic science seminar, with visiting scholars, researchers and practitioners discussing cutting-edge issues in forensics; and second, a forensics litigation course designed to be replicated in other Universities and trainings to teach law students and practicing lawyers how to litigate expert evidence questions in a mock-trial setting. The courses have allowed us to develop new ways of teaching about forensics, scientific evidence, and statistics to law students. The professional development opportunities flowing from bringing scholars and practitioners to speak at UVA Law, together with faculty from a range of disciplines around the University, have been enormous, and we could not be more pleased with the ways in which CSAFE has enhanced the curriculum for law students and expanded opportunities for faculty.

The course evaluations for both the Fall 2016 Current Issues in Forensic Science Seminar and the Forensics Litigation course were excellent. Both had waiting lists among law students and the practicing lawyers who could also participate in the litigation course. Students were enthusiastic and they offered a few useful suggestions that we will incorporate in future courses. The UVA Law Dean’s offices have approved our requests to teach both the Seminar and the January Term course next year; both courses will be offered in 2017-2018, and we are well underway in our planning for the Fall 2017 forensic seminar. We are extremely grateful for this ongoing support from UVA Law, as the great bulk of the cost of the courses is borne by the law school; only modest amounts of CSAFE funds have been spent on outside speakers and the co-instructor.

The forensic science seminar, offered in the Fall, involves guest speakers from UVA and from other Universities, including CSAFE collaborators, presenting research related to forensic science. We have incorporated background concerning statistical concepts relevant to forensics and will have a class explicitly dedicated to such issues next Fall. Scholars from the Statistics Department, the Psychology Department, the Public Policy school, and the Institute for Law, Psychiatry, and Public Policy at UVA have all attended the seminar, in addition to colleagues at the law school and the students enrolled. The law students enrolled in the seminar must write detailed papers commenting on the scholarly work and then participate in discussions with the visiting scholars. We have taken our forensics seminar students to the Virginia Department of Forensic Science in each semester for a tour of their Richmond facility and to meet and discuss law, crime lab management, and forensic science with their general counsel. We will do that again in November 2017. The course has been extremely successful in our view, not only in introducing students to key issues in forensics, scientific evidence, and statistics and law, but also in teaching them how to read forensic science scholarship and carefully consider how that work may affect practitioners and the legal system. The visiting scholars, in their interactions with researchers at UVA, have prompted new collaborations and research ideas as well. CSAFE researchers such as Simon Cole (UCI) and William Thompson (UCI) have been among the invitees. We have also discussed incorporating material developed in this course into the Quantitative Methods course that a colleague, Josh Fischman, has begun offering again at UVA Law (that course is not supported by CSAFE).

We are pleased to report that the Forensics Litigation course taught in the January Term was highly successful. We believe that the Forensics Litigation course was one of the first of its kind in the country. It was an entire mock trial centering on forensic evidence. Co-instructor Kate Philpott acted the role of the latent fingerprint examiner. The Virginia Department of Forensic Science generously shared a case file from a latent fingerprint case, and we tailored that file to our case, including an AFIS hit and using just one of the prints from that file. We have a wait list of lawyers for future offerings of this course. We applied for and received Virginia Continuing Legal Education Credit for the practicing lawyers who attended the course. They could attend for free, but they had to apply to participate. We vetted their applications to gauge level of interest, and to select a diverse pool of lawyers, both criminal defense lawyers and prosecutors, and from as many different jurisdictions and types of offices (or solo practitioners) as possible. With a small number of participants receiving this high-level training, we wanted to extend its impact as broadly as possible, so each person could go back to their jurisdiction and office and teach others how to litigate forensic evidence in a careful and informed way. We assigned the students and the lawyers substantial reading, ranging from the PCAST and NAS Reports to studies regarding cognitive bias and background descriptions of the ACE-V method. We highlighted the statistical analysis of studies concerning latent fingerprint examination in the PCAST report. We also gave the students transcripts from criminal trials of latent fingerprint examiners so that they could see how such testimony develops at a trial and prepare their own line of questioning. We also gave the students a mock C.V. for this fictional fingerprint examiner.

The students began by interviewing the fingerprint examiner in preparation for trial and reviewing all of the documents and casework with her. The law students prepared extremely hard, and we were impressed throughout at how their performance compared with that of the practicing lawyers. (The lawyers were generous in giving advice to the students and were similarly impressed. In fact, our local Commonwealth Attorney asked one student to deliver the final closing argument in his place, with him doing just the rebuttal, because she was so eager and impressively prepared.) The participants next argued motions in limine. We might consider adding written motion drafting to the course in the future. They then examined the expert to qualify her. Once the examiner was qualified as an expert, the direct examination by the Commonwealth began (we treated this as a Virginia case, although we told the participants to refer to Daubert and more general standards for expert admissibility). The bulk of the day was occupied by cross-examination, which we broke down into separate stages by subject matter. There was some overlap, which we had worried would add redundancy. We were surprised to find that it was actually a great benefit, because different students and lawyers approached subjects quite differently in their questioning, and comparing those approaches was a very useful teaching tool.

Judge Stephanie Merritt, Chief Judge of the 9th Judicial District, was a wonderful participant, providing thoughtful advice at each stage, to students and lawyers alike. Judge Merritt was generous in contributing an entire day to preside over this mock trial and was similarly generous in offering to do so again next time we teach the course. Judge Merritt had previously served as general counsel to the Virginia Department of Forensic Science, so she was deeply informed on forensic issues and law.

Students and lawyers reported that they benefited enormously from the hands-on experience, and even the attorneys with the most prior experience commented that they understood the material with far greater sophistication than ever before. The course evaluations were very positive, with an overall score of 4.33 out of 5, which is quite good and well above average at the law school. The only score that was lower (the overall score would have been still higher otherwise) was the score regarding the readings and the coursepack. The students felt the readings were too long and could have been better tailored to the classroom discussion. We agree with their assessment, given that it was a new course with complex material, and we have a good plan for how to simplify the coursepack and the readings while still providing the same preparation and information to the students. Both the prosecutors and defense lawyers commented that they had never realized how much one needs to know to truly understand what forensic examiners do and how to properly litigate their evidence in court.

We were careful to select no more than one lawyer for any given office, to try to give as many offices access to this training as possible. The format with no more than 20-24 participants allowed us to provide a simulated trial experience and in-depth feedback. In the future, we hope that more and more students and lawyers can take the class. We hope that this course will be extremely useful as a model for teaching forensic skills to lawyers. After we revise and improve our course materials, we will share them online with interested colleagues, who we hope will teach their own courses. Colleagues at other universities have already expressed strong interest in using those materials, as have individuals organizing trainings for practicing lawyers. We think that the mock trial design runs extremely well; even on a first trial run it went very smoothly. The Virginia Department of Forensic Science is presently creating a mock case file that is appropriately redacted so that it can be freely shared with other instructors. Once we incorporate the case file that DFS is generously providing, we can make our course materials publicly available.

A small CSAFE contribution continues to fund the minimal costs for the domain name and WordPress registration for the forensics forum blog (https://forensicsforum.net/). The law student research assistants who help with legal research on forensics projects also spend about an hour each week posting items to that blog, in the interest of educating and bringing together the forensics community and disseminating information on news and research, legal rulings, book reviews and other features posted by other academics.

Related to the education work, we continue to write publications on forensics for a legal audience. In Year 2, Garrett published “Constitutional Regulation of Forensic Evidence.” That piece was recently selected as one of a few pieces most useful to practicing lawyers to be highlighted in the upcoming issue of the National Association of Criminal Defense Lawyers’ Champion magazine, which is widely read by criminal defense lawyers. Garrett recently approved the short summary of his article that will be published in the Champion. The West Virginia Law Review also published the article by Garrett and Mitchell, “Forensics and Fallibility: Comparing the Views of Lawyers and Jurors,” which analyzes how lawyers and jurors perceive fingerprint evidence compared to DNA evidence.

Garrett and Mitchell substantially revised, submitted for publication, and received an offer of publication for their article, “The Proficiency of Experts,” in the University of Pennsylvania Law Review. As described in the Project U report, this article examines what the standard should be for the qualification of an expert. Scientists treat expertise as a question of performance. The legal system, however, has traditionally qualified experts almost solely based on credentials and experience, which may not always be good proxies for performance. We suggest that proficiency should instead inform legal analysis. We examine data from fingerprint proficiency testing showing the variability in error rates over two decades of testing. We explore how the state and federal courts, in both civil and criminal cases, remain divided on: (1) whether information about proficiency is relevant to questions of admissibility of scientific evidence; (2) how that information is relevant; and (3) whether proficiency information that does exist must be shared between the parties. Garrett and Mitchell recommend realistic blind proficiency testing. But the legal recommendation that we focus on is that judges should treat expert qualification as a question of proficiency and not rely solely on credentials and experience. We recommend changes to the interpretation of the Supreme Court’s Daubert ruling and Federal Rule of Evidence 702, as well as state court analogues, and argue that for any expert that is qualified, courts should make proficiency data available to the parties. Garrett and Mitchell explore regulatory approaches towards proficiency adopted for clinical laboratories in the U.S. and in several foreign countries for forensic laboratories. Garrett presented that work at the Innocence Network Conference in March 2017. Garrett presented “The Proficiency of Experts” at a workshop at the University of Chicago School of Law in May 2017 and, with Greg Mitchell, at the University of Virginia School of Law in July 2017, and we received helpful feedback. A research assistant has also assisted in reviewing the coding of proficiency test results that are summarized in a table in the article.

Garrett also discussed the findings in that paper as well as two studies that are part of Project U at the National Forensics College at Cardozo Law on June 7, 2017. At that training, Garrett presented following Henry Swofford of the Defense Forensic Science Center, U.S. Army Criminal Investigation Laboratory, and spoke to lawyers about the importance of new quantitative approaches towards latent fingerprint evidence and also understanding how lay jurors will evaluate it.

Garrett also drafted, and counsel filed, an amicus brief describing legal and scientific issues concerning toolmark evidence in a Colorado criminal case from the early 1990s, a time when there were not the same standards for presenting such evidence in court. Thirty-five leading scientists and legal academics signed on to the brief, including several CSAFE colleagues. Garrett has been editing the brief to reflect the input of these scientists and law professors. The hope is that the brief will encourage courts to look more carefully at what conclusions can and cannot be reached when conducting toolmark comparisons. Once the litigation is complete, Garrett plans to redraft the brief in a form suitable for an academic publication, with added discussion of recent statistical work supported by NIST in the area, and perhaps with the same signatories added as co-authors if they are interested in joining.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Garrett and Monahan both have developed greater familiarity with the work of a range of scholars through the teaching of their seminar on forensic science. Garrett and Philpott have learned how to teach litigation skills in forensics, something that trial advocacy courses traditionally have never done, and this novel expertise is something we hope to share broadly as the course is taught. The professional development opportunities flowing from bringing people like Simon Cole in to speak at UVA Law, together with faculty from a range of disciplines around the University, have been enormous. The discussions have sparked new ideas for research and they will continue to do so, perhaps even more so than if it were a conference with many visitors at once: we have an opportunity to spend an extended amount of time with a single scholar, reading her work carefully and discussing it before meeting with the visitor. The law writing and research has itself been highly beneficial and advanced the understanding of Garrett and Mitchell, particularly improving our understanding of how proficiency evidence is litigated. Our research assistants who have helped to work on these projects have themselves been excited to learn more about forensics and have greatly contributed to the projects.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? The proposed legal research has already been and will continue to be published in top law reviews and interdisciplinary journals read by members of the legal community. The research has already been, and will continue to be, presented at training conferences and academic conferences for lawyers and law professors, as well as forensics conferences. Garrett presented at an American Bar Association conference. In addition, the Washington & Lee Law Review solicited response papers to Garrett’s “Constitutional Regulation of Forensics” article for an online feature. In the Spring, work was presented at the AAFS and the Innocence Network conferences, at the University of Chicago School of Law, at the University of Virginia School of Law, at the June 7-9 CSAFE conference at Iowa State University, and at the June 7 National Forensics College at Cardozo Law School.

The research will be incorporated into the law courses and curricula shared with CSAFE participants and with other Universities; we have already contacted faculty at law schools who currently teach or might be interested in teaching forensic science related courses and might benefit from these materials. Several have expressed interest in offering a course using the materials from the forensic science litigation course taught in January 2017; law schools are generally more interested in offering practical or “experiential” courses, so we are hopeful that there will be broad interest.

These course materials, PowerPoint slides, and curricula can be freely shared, with no restrictions, but we will want to “road test” them, and receive course evaluations and feedback from participants, before disseminating them online and through contacts with individual academics.

In addition, Garrett continues to participate as an Associate Reporter in the American Law Institute’s (ALI) project on policing, with a substantial section concerning the role of law enforcement with respect to forensic evidence to be drafted in 2017. Initial sections regarding collection of evidence generally have already been drafted. That work is supported solely by the ALI, but it is closely related to and synergistic with outreach work to the legal community. Representatives from policing organizations, leading judges, and leading lawyers all participate as Advisers in the project.

We continue to explore other educational and outreach opportunities to the legal community. We will explore adding the participation of additional law school faculty in the courses being offered and encouraging faculty at other law schools to use the materials developed. We continue to discuss possible training efforts with local and national criminal defense and prosecutors’ organizations, as well as with several crime labs. We invited the Virginia Department of Forensic Science to participate in person at our forensic litigation course, and while the schedules of their examiners did not permit it, we will continue to offer the invitation. We also sent invitations regarding the course to the Virginia Indigent Defense Association and the Virginia Commonwealth Attorney’s Association, both of which circulated the information to their members, accounting for the high level of demand that we experienced. Garrett and Philpott have discussed possible methods for incorporating their interactive course into the National Forensic College curriculum in the future. We have taken our forensics seminar students to the Virginia Department of Forensic Science in each semester for a tour of their Richmond facility and to meet and discuss law, crime lab management, and forensic science with their general counsel. We will do that again in November 2017. We are reaching out to the Department of Forensic Sciences in Washington D.C. as well. We will continue to work hard to develop new opportunities to provide forensics and statistical training to lawyers and judges.

WHAT DO YOU PLAN TO DO DURING THE NEXT YEAR TO ACCOMPLISH THE GOALS? In the next year, we will continue to plan for teaching the two courses, the seminar and forensics litigation course, again in the 2017-2018 school year. We will continue work editing law review articles for publication and will continue to develop new projects.

All data relevant to such scholarship will be made publicly available. We will continue to promote forensics dialogue and research on the forensics blog.

We will continue to use course evaluations to assess curricula and course materials, to improve them, and to then distribute them to other faculty interested in teaching such courses for law students or in trainings for practicing lawyers. We will continue to work with outside groups of practicing lawyers, including by preparing materials for the ALI Principles of Policing project, for which principles relating to evidence gathering have been drafted, and principles relating to forensic evidence are underway.

Products

JOURNAL PUBLICATIONS

Brandon L. Garrett and Greg Mitchell, The Proficiency of Experts, 163 UNIVERSITY OF PENNSYLVANIA LAW REVIEW __ (forthcoming 2018)

Brandon L. Garrett, The Crime Lab in the Age of the Genetic Panopticon, 114 MICHIGAN LAW REVIEW 979 (2017)

Brandon L. Garrett, Constitutional Regulation of Forensic Evidence, 73 Washington & Lee Law Review 1147 (2016)

Brandon L. Garrett and Greg Mitchell, Forensics and Fallibility: Comparing the Views of Lawyers and Jurors, 119 West Virginia Law Review 621 (2016)

OTHER PUBLICATIONS, CONFERENCE PAPERS AND PRESENTATIONS

Garrett presented “The Proficiency of Experts” paper at an American Bar Association conference in November in Washington D.C.

Brief of Amici Curiae Brandon L. Garrett and 35 Scientists, Statisticians, Law and Science Scholars, and Practitioners, People v. Genrich, No. 2016CA651 (Co. Ct. App. February 9, 2017).

Garrett discussed “The Proficiency of Experts” at the University of Chicago School of Law and at the University of Virginia School of Law in July. Garrett presented “The Impact of Proficiency Testing Information on the Weight Given to Fingerprint Evidence” and “Forensics and Fallibility” at the Innocence Network Conference in San Diego, CA, as well as at the National Forensic College and the CSAFE conference at Iowa State University, also discussing the preliminary findings of “How Jurors Assess Quantitative Fingerprint Evidence.” Other work was presented at the AAFS.

Garrett presented general work on wrongful convictions and discussed forensic science research specifically at a series of workshops and seminars in China in December 2016, arranged by the New York University U.S. Asia Law Institute, and specifically at: the East China University of Political Science and Law (ECUPL) in Shanghai, the Southwest University for Nationalities in Chengdu, and the China University of Political Science and Law (CUPL) in Beijing.

WEBSITE(S) OR OTHER INTERNET SITE(S) The forensics forum blog: https://forensicsforum.net/. The two law student research assistants who helped with legal research on forensics projects also spent about an hour each week posting items to that blog, in the interest of educating and bringing together the forensics community and disseminating news, research, and legal rulings. Book reviews and other features, posted by other academics, also appeared on the blog. The law students have also prepared research material, for example on litigation of forensic proficiency and on litigation under new post-conviction statutes in Texas and California that permit reopening closed cases based on changed scientific research, in order to evaluate how those statutes have been interpreted in the courts. Ultimately, that work may be posted on the blog or result in student publications.

OTHER PRODUCTS Each year revised curricula, PowerPoint slides, and course materials from the law school courses will be disseminated to other law schools and universities (there is no restriction on freely sharing such materials), as well as shared with legal education groups like the American Bar Association and the National Association of Criminal Defense Lawyers.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Brandon L. Garrett (UVA), Principal Investigator. Beginning in Year 2, Garrett will receive three teaching credits (one lecture course) in total CSAFE support.

Dan Murrie (principal investigator) - funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories described in separate Projects.

Sharon Kelley, researcher – funding support in Years 1-2. Planned and participated in design of future experiments and collaborations with crime laboratories described in separate Projects.

Garrett, Murrie, and Kelley each spent at least one person-month of time on this and their other CSAFE projects (combined).

Gregory Mitchell – Receives no salary from CSAFE but does receive a modest amount of CSAFE support for the jury survey research materials. Worked on jury surveys and law review articles.

John Monahan – Receives no salary from CSAFE. Participated in teaching and hosting guest speakers for the Fall 2016 course.

Kate Philpott was hired as a contractor to co-teach the Forensics Litigation course in January, and prior to that received a small amount of hourly pay to help develop teaching materials.

Law student research assistants performed small amounts of hourly work, assisting with research projects related to CSAFE, helping with course materials, and supporting education and outreach projects.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? We have exchanged information and research ideas with CSAFE researchers at UC-Irvine and have invited as speakers researchers from several universities (including Bill Thompson of UC-Irvine for Fall 2017). As noted, we have had ongoing discussions with professionals at several crime labs. None of those collaborations has involved any financial or in-kind support. So far, for this project, the collaborations have involved discussions only. We brought criminal defense lawyers and prosecutors from a range of organizations and jurisdictions, as well as a sitting state court judge, to participate in the January Term 2017 class.

For our other UVA projects we are working with crime labs including the Houston Forensic Science Center, the Virginia Department of Forensic Science, and the Defense Forensic Science Center, and we continue to explore work with additional collaborators.

Provide the following information for each partnership: See (2) above.

Also describe any significant:

As described, we have invited and had the participation of colleagues in Statistics, Psychology, and Public Policy in the educational events that we have organized, the Virginia Department of Forensic Science contributed a case file for our Forensics Litigation course, and we taught a course not only to students but also to practicing lawyers and judges.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? There has been a longstanding concern that the legal community lacks expertise on the uses and limitations of scientific evidence. The courses developed, with some CSAFE support but chiefly supported by the University of Virginia School of Law, focus on pattern evidence, and particularly fingerprint evidence, which was addressed by most of our visiting leading academic researchers. For the second course, a skills course, we developed a curriculum that can be broadly shared and replicated and that can provide high-level litigation skills to lawyers so that they understand how to prepare, defend, and challenge forensic evidence in the courtroom, with sensitivity to the boundaries of the research and the roles of judges and jurors. The practical simulations in this skills course exclusively involved fingerprint evidence. Both courses also focus on statistical thinking and interpretation in research, in litigation, and in informing crime lab work. We will continue to bring forensic practitioners into the classroom and, conversely, to bring law students to the crime lab. We are producing legal scholarship that explains forensics concepts and the legal regulation of forensics to lawyers and the forensics community, and the publications and any underlying data will all be made publicly available. We are teaching courses on new developments in forensics to assist in the NIST and CSAFE mission to foster a community of scholars and practitioners engaged in such issues.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The focus is on legal education and research, but psychologists and forensics professionals will also benefit. The proposed legal research has already been, and will continue to be, published in top law reviews and interdisciplinary journals read by members of the legal community. The research has already been, and will continue to be, presented at training conferences and academic conferences for lawyers and law professors. The research will be incorporated into the law courses and curricula shared with CSAFE participants and with other universities; we have already begun a canvass of which faculty at law schools currently teach, or might be interested in teaching, forensic science related courses and might benefit from these materials. These course materials, PowerPoint slides, and curricula can be freely shared, with no restrictions, but we will want to teach from them once, and receive course evaluations and feedback from participants, before disseminating them.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? As described, the entire focus is on education, for young lawyers in particular, but also practicing lawyers seeking skills in forensics litigation. We will be developing course materials, etc., that as described above will be shared to encourage similar teaching efforts at other universities.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? The law students and lawyers trained will be introduced, in class, to forensic practitioners and will even hear about the work of lawyers who directly represent crime labs. They will be exposed to forensic practice in ways not previously done in law schools. In turn, the research and publications developed to guide lawyers will be shared directly with CSAFE collaborators and crime labs in order to inform them about how lawyers approach litigation of scientific evidence and forensics.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? N/A (although one speaker will be traveling from the U.K. and will have travel expenses paid)

Changes/Problems Nothing to Report

TRAINING AND EDUCATION

Project F - Education of Legal Professionals on the Role of Statistics in Forensics and the Law

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Simon Cole

Other Investigators: Matt Barno

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? Develop an educational module for legal professionals, framed within the discipline of history of science, to complement existing educational modules and materials framed within the discipline of statistics.

YEAR, CSAFE INVESTIGATOR, TASK: DELIVERABLES (with DELIVERY SCHEDULE and COMPLETION DATE where stated)

2016, Simon Cole, V.1
• Pilot lecture (delivery schedule 12/31/16; completion date 4/14/16)
• Develop educational module "Forensic Pattern Recognition Evidence" that elucidates the role of science in decision-making for professional school students, with a particular emphasis on scientific and statistical methods of inference, published by The National Academies of Sciences, Engineering, Medicine Committee on Science, Technology and Law (6/30/16)

2017, Simon Cole, V.1
• Improved lecture (delivery schedule 12/31/17)
• Assessment of educational module
• Draft written materials
• Present material at regional judicial and legal workshops
• Revamp undergraduate forensic science course

2018, Simon Cole, V.1
• Recorded lecture (delivery schedule 12/31/18)
• Assessment of educational module
• Present material at national judicial and legal workshops

2019, Simon Cole, V.1
• Final written materials (delivery schedule 12/31/19)

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? Presented the educational module "Forensic Pattern Recognition Evidence" at the annual meeting of the Association of American Law Schools, San Francisco, California. The module, published by The National Academies of Sciences, Engineering, and Medicine Committee on Science, Technology, and Law, elucidates the role of science in decision-making for professional school students, with a particular emphasis on scientific and statistical methods of inference. Audience: law faculty.

Engaged in education of legal professionals through delivery of two lectures and one brainstorming workshop at national and statewide conferences for attorneys. Audience: attorneys. Organized and sponsored symposium in conjunction with launch of National Registry of Exonerations at University of California, Irvine. Symposium was also livecast on the internet. Audience: Attorneys, faculty, graduate students, undergraduate students, community members.

Attended Innocence Network annual conference.

Collaborated with CSAFE Co-Investigator Brandon Garrett on legal education matters. Conference paper submission accepted at annual meeting of Society for Social Studies of Science, September, 2017.

Used educational module "Forensic Pattern Recognition Evidence" for teaching in undergraduate course Forensic Science, Law & Society. Audience: > 100 Undergraduate students.

Completely revamped the undergraduate course Forensic Science, Law & Society. The course is now an active-learning course centered around the validity and admissibility of forensic evidence. Students engage in active learning exercises, such as writing sample briefs, judicial opinions, and post-conviction motions dealing with different problems concerning forensic evidence. Audience: > 100 Undergraduate students.

Virtually attended NIST 2017 Technical Colloquium on the Weight of Evidence: Quantifying the Weight of Forensic Evidence, June 27-29, 2017. This colloquium provided vital information about the state of statistical thinking in forensic science which is crucial to the educational mission of this project.

Participated in CSAFE-sponsored legal education program, special course, Current Issues in Forensic Science at University of Virginia School of Law taught by Professor Brandon Garrett and Professor John Monahan.

Virtually attended New Paradigm for Fingerprint Reporting W/O Individualization Webinar I and II.

Continued discussions with J.H. Pate Skene, Visiting Fellow, Federal Judicial Center regarding a possible conference on the replication crisis in science and forensic science.

Arranged to give presentation to undergraduate program in Forensic Science at National Autonomous University of Mexico, October 2017.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? The Graduate Student Researcher worked under the supervision of the PI to assist in the production of the educational module "Forensic Pattern Recognition Evidence."

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We presented our work at the CSAFE All-Hands Meeting 2017.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? Prepare additional presentation for Tri-Divisional Education Conference, Arizona.

Consider ways to reach more prosecutors.

Work with Defense Forensic Science Center to integrate a forensic practitioner into the project team.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Presented the educational module "Forensic Pattern Recognition Evidence" at the annual meeting of the Association of American Law Schools, San Francisco, California. The module, published by The National Academies of Sciences, Engineering, and Medicine Committee on Science, Technology, and Law, elucidates the role of science in decision-making for professional school students, with a particular emphasis on scientific and statistical methods of inference. Audience: law faculty.

Gave two lectures and one brainstorming workshop at national and statewide conferences for attorneys. Audience: attorneys. Organized and sponsored symposium in conjunction with launch of National Registry of Exonerations at University of California, Irvine. Symposium was also livecast on the internet. Audience: Attorneys, faculty, graduate students, undergraduate students, community members.

Collaborated with CSAFE Co-Investigator Brandon Garrett on legal education matters. Conference paper submission accepted at annual meeting of Society for Social Studies of Science, September, 2017.

Used the educational module "Forensic Pattern Recognition Evidence" for teaching in the undergraduate course Forensic Science, Law & Society. The module, published by The National Academies of Sciences, Engineering, and Medicine Committee on Science, Technology, and Law, elucidates the role of science in decision-making for professional school students, with a particular emphasis on scientific and statistical methods of inference. Audience: > 100 Undergraduate students.

Gave a presentation in the CSAFE-sponsored legal education program, special course, Current Issues in Forensic Science at University of Virginia School of Law taught by Professor Brandon Garrett and Professor John Monahan.

Arranged to give presentation to undergraduate program in Forensic Science at National Autonomous University of Mexico, October 2017.

WEBSITE(S) OR OTHER INTERNET SITE(S) http://sites.nationalacademies.org/pga/scipol_ed_modules/index.htm

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Simon A. Cole, Principal Investigator, 0.125 summer month; supervising all projects, data collection, data analysis, background research; CSAFE funding; not collaborating internationally

Matt Barno, Graduate Student Researcher, 0.75 calendar month; data collection, data analysis, background research; CSAFE funding; not collaborating internationally

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? University of California, Los Angeles School of Law

Los Angeles, California

Law student Jaclyn Seegaly collaborated on educational module for National Academies of Sciences, Engineering, Medicine.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? If successful, the module will enhance understanding of statistical approaches to forensic evidence among legal professionals. This will partially relieve forensic scientists of the task of explaining the statistical approach to legal professionals.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? The reception of the module will be of interest to academics studying the understanding of scientific and statistical information by non-scientific audiences.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? The National Academies educational module is freely available, providing wide opportunities for teaching issues of statistical reasoning and decision-making, specifically related to pattern evidence in forensic science, especially for professional students.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? The project can potentially serve as a model for enhancing public understanding of statistical reasoning and quantifying uncertainty.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None.

Changes/Problems

Changes in approach and reasons for change

No changes in approach were made during the reporting period. One change in approach is proposed below for prior written approval from the awarding agency grants official. No changes will be undertaken without prior written approval from the awarding agency grants official.

Upon creating the pilot lecture, it was realized that more background information about the history and sociology of forensic statistics is necessary. For example, the PI’s virtual attendance at the two-day NIST Technical Colloquium on Quantifying the Weight of Forensic Evidence revealed that the pilot lecture overstates the degree to which forensic statistics may reasonably be called a “paradigm.” The premise of the educational module is that a history of science approach will be helpful to legal professionals; it is, therefore, essential that the underlying history of science be accurate, nuanced, and complete. Accordingly, it is proposed to spend more time analyzing the history and sociology of forensic statistics as a scientific research program or discipline. Attendance at the two-day NIST Technical Colloquium on Quantifying the Weight of Forensic Evidence was a valuable source of data, but the methods of history and sociology of science can be leveraged to provide much more useful data. Among the research questions that will be addressed are:

• What is a “paradigm” and can forensic statistics reasonably be called one?
• What is a discipline (or a sub-discipline) and can forensic statistics reasonably be called one?
• On what fundamental questions do forensic statisticians agree and disagree?
• Given all of the above, how can and should lay “consumers” of forensic statistics treat information, guidance, and tools emerging from this discipline?

This background research will produce richer and, we hope, more useful educational materials.

Project H – Probability and Statistics Education for Forensic Science Practitioners

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Hal Stern (UCI)

Other Investigators: Alicia Carriquiry, Iowa State University; Colin Beck (graduate student), Iowa State University; Eric Lai (graduate student), UC Irvine

Accomplishments

MAJOR GOALS The objectives of the project are as follows:

1) Develop training materials for a short workshop on basic tools and terminology of probability and statistics.

2) Develop training materials for a short workshop on applications of probability and statistics for forensic science practitioners.

3) Develop web-based instructional material (perhaps videos) covering the material mentioned above.

The relevant task/deliverable for the current period (year 2) to be achieved by May 2017 is to revise the course based on feedback from the pilot offering, offer additional workshops and use these additional workshops to continue refining the material.

ACCOMPLISHED UNDER THESE GOALS After the presentation of the workshops during the previous review period (March 2016), the presentation slides were revised: more forensic science examples were added and some difficult mathematical concepts were removed. The workshops were presented in September 2016 to the Tri-County Regional Forensic Laboratory in Anoka County, Minnesota. The participants were very happy with the presentation and said they would be interested in participating in subject-matter-specific online or in-person training, where they would have an opportunity to do hands-on work on a specific topic (e.g., latent prints, firearms and toolmarks). At the request of participants, CSAFE began issuing Certificates of Completion or Certificates of Participation to workshop attendees.

It was reported in the March 31, 2017 report that the material for the two workshops (review of probability and statistics, applications of probability and statistics to forensic science) was revised based on feedback from the earlier (March 2016, September 2016) offerings. During the most recent quarterly period the revised material was presented to two different groups.

On April 17, 2017, a training session (including both the probability and statistics material and the applications to forensic science material) was provided to a group of 24 firearms examiners from the Virginia Department of Forensic Science. The training session was held in Roanoke, VA, with several examiners participating remotely from regional laboratories throughout Virginia. On April 24, 2017, a training session (again including the material from both workshops) was provided to a group of 40 forensic practitioners (almost all were latent print examiners) in Bellevue, WA. Attendees included examiners from King County, WA, Washington State Police, Seattle Police Department, Bellevue Police Department, Tacoma Police Department, and several other jurisdictions.

Participants in the two training sessions completed evaluation forms that rated the pace, mathematical level and organization of the workshop. Their results are summarized in the table below. The low evaluation response rate in WA is due to the fact that the instructor did not collect the information on-site; data were collected via email after the workshop. Questions regarding pace and mathematical level continue to suggest that some participants find the material challenging, with more identifying the pace as too fast than too slow and the mathematical level as too high rather than too low. In each case roughly 40-60% of respondents indicate that the level is just right. Participants complimented the organization of the workshop. In addition, written comments from several participants (about 10-15% of each workshop) express the view that the mathematical level and complexity make the workshop difficult to follow. Another important piece of feedback is that examiners would like the material customized to their specialty (firearms, latent prints).

Workshop evaluation results. VA = VA Dept of Forensic Science (24 attendees, 24 responses); WA = Bellevue, WA (40 attendees, 24 responses).

| Item | VA Average | VA % (rating 4-5) | VA % (rating 1-2) | WA Average | WA % (rating 4-5) | WA % (rating 1-2) |
| --- | --- | --- | --- | --- | --- | --- |
| Probability and Statistics | | | | | | |
| Pace (1=too slow, 3=just right, 5=too fast) | 3.23 | 31% | 8% | 3.54 | 54% | 4% |
| Math Level (1=too low, 3=just right, 5=too high) | 3.25 | 33% | 4% | 3.46 | 42% | 0% |
| Organization (1-5 with high score better) | 4.10 | 79% | 2% | 4.04 | 75% | 4% |
| Application to Forensic Science | | | | | | |
| Pace | 3.33 | 29% | 0% | 3.33 | 38% | 4% |
| Math Level | 3.17 | 25% | 4% | 3.46 | 42% | 0% |
| Organization | 4.25 | 88% | 0% | 4.00 | 71% | 4% |
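As an illustration only, the summary columns in the evaluation table (average rating, percentage of ratings of 4-5, percentage of ratings of 1-2) and the response rate could be computed from the raw evaluation forms along the following lines. This is a minimal sketch in Python: the function name `summarize_ratings` and the ratings in `example_ratings` are hypothetical and are not the workshop data.

```python
from statistics import mean


def summarize_ratings(ratings, attendees):
    """Summarize 1-5 ratings for one evaluation question.

    ratings   -- list of integer ratings from returned forms
    attendees -- number of people who attended the session
    """
    n = len(ratings)
    return {
        "average": round(mean(ratings), 2),
        # Share of respondents choosing 4 or 5 (e.g. pace "too fast")
        "pct_high": round(100 * sum(r >= 4 for r in ratings) / n),
        # Share of respondents choosing 1 or 2 (e.g. pace "too slow")
        "pct_low": round(100 * sum(r <= 2 for r in ratings) / n),
        # Returned forms as a percentage of attendees
        "response_rate": round(100 * n / attendees),
    }


# Hypothetical ratings invented for this sketch (not the CSAFE data)
example_ratings = [3, 3, 4, 3, 5, 2, 3, 4, 3, 3]
summary = summarize_ratings(example_ratings, attendees=20)
print(summary)
```

Applied to the figures reported for the Bellevue session, 24 responses from 40 attendees, the same response-rate calculation gives 60%.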

In addition to the workshops identified above, the material was also used in an orientation for undergraduate students completing a summer program at Iowa State University.

Finally, our proposal to offer a 4-hour workshop at the IAI Meeting in Atlanta, Georgia in August 2017 was accepted. Note that this is one of the deliverables expected for 2017-2018. The evaluations above will be used to further refine the material in the probability and statistics and application to forensic science workshops before the IAI Meeting.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? This is intended as a professional development activity. The primary benefits accrue to the participants in the workshop. CSAFE graduate students and postdoctoral researchers benefit via their participation in the workshops and in helping to revise the material.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? We have provided workshops for forensic scientists.

PLANS FOR NEXT PERIOD (2017-2018) During the next year we anticipate an additional revision of the training material based on the feedback provided above. In particular, we anticipate revising the discussion of probability and statistics to enhance the presentation of this foundational material. Our goals for these revisions are:
• More information on the relevance of sampling and experimental design to forensic science
• Clearer and more graphical introduction to probability distributions
• More complete discussion of sensitivity, specificity and error rates
• Reduce overlap between the foundational material on statistical inference and the specific applications to forensic science

Sam Tyner (graduate student at Iowa State) has joined this project. She will be developing a prototype workbook to accompany the workshop. This is intended to help keep workshop participants engaged during the oral presentation. The idea is to provide questions that participants can complete as they go through the course and places where active learning (e.g., small group discussion) can be implemented. We hope to have a version of this available to test at the IAI meeting.

We will schedule more workshops and training for practitioners in the coming year and continue to improve the training materials (as described above). Additional priorities include:

• Involve practitioners to help better focus the training on the needs of forensic science examiners
• Begin to develop online training materials to broaden our impact beyond the in-person workshops taught by CSAFE personnel

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Statistical Thinking for Forensic Practitioners: A 4-hour course for forensic practitioners that includes foundational material on probability and statistics and a discussion of applications to forensic science. Materials include a slide deck (about 120 slides).

WEBSITE(S) OR OTHER INTERNET SITE(S) Course materials are posted on the CSAFE website (forensicstats.org).

Participants

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Primary individuals involved are Hal Stern and Alicia Carriquiry who had primary responsibility for creating the products.

OTHER ORGANIZATIONS We are grateful to the organizations that hosted our workshops. Michele Triplett (King County, WA) and Brian Orr (Bellevue, WA) played a critical role in organizing and hosting the Washington workshop. Sabrina Cilessen of the VA Department of Forensic Science organized and hosted the VA workshop.

Impact The project addresses the critical need to improve statistical literacy among forensic science practitioners. Ongoing discussion in the forensic community regarding appropriate standards for reporting and testimony involves statistical concepts such as hypotheses, error rates, and likelihood ratios. The goal for this project is to provide the background that will allow examiners to understand and participate in these discussions.

Changes/Problems Nothing to report.

Project L - Training of Forensic Practitioners in Uncertainty and Measurement Error and the Statistical Presentation of Forensic Evidence

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Anjali Mazumder (CMU faculty)

Other Investigators: William (Bill) Eddy (CMU faculty), Maria Cuellar (CMU PhD student), Amanda Luby (CMU PhD student), Ciaran Evans (CMU PhD student), Karl Williams (Allegheny County Medical Examiner), and Ashton Ennis (Allegheny County Medical Examiner)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? Most forensic laboratory practitioners have limited scientific training beyond that associated with the relatively narrow scope of their day-to-day laboratory activities. In this project we hope to:

• Maintain and strengthen our relationship with the Allegheny County Medical Examiner’s Office and build relationships with other forensic labs and agencies that provide forensic science services or engage with the evaluation and interpretation of forensic science evidence.
• Develop teaching materials on uncertainty and measurement error, rooted in core principles of probability and statistics, for forensic practitioners and those individuals who also digest and use forensic science evidence within the criminal justice system.
• Test these teaching materials through a series of half-day courses on probability and statistics for forensic examiners through the training facility of the forensic laboratory overseen by Karl Williams, Allegheny County Medical Examiner.
• Share our materials more broadly, initially with those in the other CSAFE universities, and later with NIST and by posting of materials.

We hope to accomplish this by collaborating with our local County lab to (a) engage in research development, (b) contribute to the CSAFE initiative of training forensic scientists in statistics and probability in the context of forensic science, and (c) improve scientific and critical thinking and the use of more substantive statistical methods in the evaluation of evidence for the courts. We also hope to extend our reach to other labs and agencies through our efforts to build relationships and strengthen our joint understanding and interests.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? The CMU CSAFE research group has been successful in developing relationships with the local county crime lab, the Allegheny County Medical Examiner’s Office (ACMEO). This includes the regular attendance of senior ACMEO members at our meetings and joint sandpit-style sessions, which have led to further collaborations and proposed training sessions. This project, and the relationship with ACMEO, has further extended its reach to engage with crime labs and forensic science services provided through local police forces, leading to a follow-up mini-colloquium on digital forensics and to engagement with individuals specializing in related areas.

1) Major Activities and Objectives:

• The chief medical examiner and a senior pathologist (Karl Williams and Ashton Ennis) from the Allegheny County Medical Examiner’s Office (ACMEO) continue to attend our CMU-CSAFE weekly meetings on Tuesday afternoons.
• Dr Ashton Ennis (Forensic Pathologist at ACMEO) and Anjali Mazumder (CMU faculty) are engaged in bi-weekly meetings to discuss collaborations, areas of statistical support, and training of forensic scientists. In this quarter, this has led to new contacts/collaborations and proposed training sessions.
• We held a half-day sandpit-style workshop at ACMEO’s new training facility on Friday January 13. In attendance were senior scientists of ACMEO and CMU-CSAFE researchers (6 faculty and 5 PhD students) for training, exchange and discussion. The workshop involved an overview of the CSAFE mission and activities, and a brief introduction to statistics, probability, and forensic evidence. ACMEO scientists provided a brief overview of their work and a tour of the lab so that CMU-CSAFE researchers could observe the scientists at work. After lunch, small break-out groups of CMU-CSAFE researchers and ACMEO scientists were formed to better learn and exchange knowledge on potential problem areas and opportunities for collaboration in specific domain areas.
• Held a 2.5-hour mini-colloquium on digital forensics on 2 May at CMU in collaboration with CMU-CSAFE researchers, ACMEO scientists and the Allegheny and City of Pittsburgh police forces to discuss key areas of scientific need, problems, challenges and opportunities for collaboration and training, particularly in the area of digital forensics.
• A series of two half-day training sessions was held on 15 and 19 June, beginning with an introduction to probability and statistics for forensic scientists, demonstrating its role in different domain areas. The introductory probability and statistics training was provided by Anjali Mazumder (faculty) and Amanda Luby (PhD student).

CSAFE Annual Report 06.2017 TRAINING AND EDUCATION 210

3) Significant Results and Key Outcomes/Achievements:

• Continued development of strong links with the ACMEO crime lab, which is responsible for conducting and delivering forensic science services, has led to new collaborative projects, training and new relationships with police force crime labs.
• Development of research collaboration in the form of data and cases for CMU-CSAFE to work on, guidance and advice on reporting scientific and statistical evidence for the courts, and the opportunity to evaluate forensic evidence as it relates to real casework. In particular, 9 criminal cases to learn from have been identified, along with at least one short-term collaborative project and potentially one longer-term collaborative project.
• The CMU-CSAFE mini-symposium led to opportunities for new collaborations in digital forensics, as well as new training sessions.
• Development of training modules and sessions began in June, and these will be shared amongst CSAFE trainers. This will improve scientific and critical thinking and the coherent and consistent use of, and appreciation of the value of, substantive statistical methods in the evaluation of forensic science evidence.
• With respect to the two deliverables for the year: (1) we have held three mini-symposia/workshops so far, and the fourth is expected in August as a follow-up on digital forensics; and (2) the schedule of training has been set and development of course modules is under way, with initial lectures delivered in June and three scheduled in July 2017 (approx. 25%).

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED?
• For Professor Anjali Mazumder, this project provides networking skills, project management, curriculum development and teaching/training skills. It will also provide training for at least two additional CMU CSAFE researchers, particularly PhD students, allowing them to develop training/teaching skills, particularly with respect to training scientists of mixed scientific backgrounds.
• This provides continued professional development to ACMEO scientists in statistics and probability as it relates to forensic science evidence and the courts. This contributes to improving scientific and statistical reporting and the use of substantive methods in evaluating evidence for the courts.
• Individual CMU-CSAFE research visits to the ACMEO lab in October and November to meet scientists, facilitate learning and exchange knowledge toward the development of proposed research in other CSAFE projects, specifically BB and M.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? • A CSAFE training listserv has been proposed. There is a shared folder to share and develop training material with CSAFE partner institutions and make it available online in due course.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS?
• In the next reporting period, we are planning to (a) hold three half-day sessions on likelihood ratios and their interpretation, two planned in July with a follow-up in September/October; and (b) hold another mini-colloquium with members on digital evidence.
• Continue to collaborate with ACMEO and other labs, to learn from each other and support their service delivery, and hold mini-symposiums to engage practitioners and researchers.
• Update standards for presenting forensic evidence, and develop training material that can be shared.

Products

Introduction to Statistics and Probability materials are being drafted, and over the course of Year 3 we expect them to become more available and accessible.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? The main people involved in Year 2 have been Professors Steve Fienberg and Anjali Mazumder. Other CMU researchers, as listed above, will be engaged in the initial sandpit style workshop and in the delivery of training as modules are developed.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? This project mainly involves the collaboration with ACMEO. In partnership with ACMEO we are looking to extend the invitation to attend training workshops to other crime lab scientists and state lawyers.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT?
• The research development and associated training will improve and increase scientific and critical thinking and the use of more substantive statistical methods in the evaluation of evidence for the courts.
• The delivery of training and training materials will be tailored to the needs of forensic scientists, lawyers and judges, to support a better understanding of statistics, probability, evidence and its context in the law.
• With CMU-CSAFE’s close link and relationship with the ACMEO crime lab and new connections to City, County and State police forces and crime labs, there will be a greater chance for research to have a direct impact on practice.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Many of the teaching materials will have broader uses, e.g., for use in our minority partner institutions for their training of students in core materials and methods of statistics, and also in law schools.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? This project has provided opportunities for research and teaching on the scientific improvement of, and the appropriate and substantive use of statistical methods in, the evaluation and interpretation of forensic science evidence for the courts. This project will lead to the development and dissemination of new educational materials.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? This project should impact the knowledge, skills, abilities and attitudes of forensic scientists regarding the value of statistics and statistical issues in the forensic science context. Down the line, it should also impact the judicial sector as lawyers and judges become aware of the statistical issues in forensic science.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? Nothing to report.

Changes/Problems

Actual or anticipated problems or delays and actions or plans to resolve them

Firstly, the death of Steve Fienberg affected us and the progress of this project. The expectation to deliver forensic science modules within a year was overly ambitious.

We have just delivered our first set of training to the biologists at ACMEO. It is clear, as rightly pointed out by a CSAFE Advisory Board member, that there is variability in the background of the scientists; this is true within a lab and within a unit as well. We had to readjust our training and spread it out over three half-days rather than two days. In each instance, we are providing follow-up. We have also offered to help in the drafting of SOPs. At the moment, we are working with individual groups of scientists, each requiring a slightly different level of training and a different set of skills and understanding. This effort takes time. It has also been a challenge for the scientists to make time for training, without a shared understanding of the need, when there are time pressures from service delivery/casework. However, we are working as closely as we can with our county crime lab to support and improve training as best as possible.

There is no cost to the lab, per se. But as stated above, there is a time cost. That said, the hope is that effort invested now will improve understanding and the use of statistical thinking and methods, and so improve the scientific validity of their work.

CSAFE as a whole is discussing ways to deliver training and share materials, and will explore suggestions below. Our relationship with ACMEO has also given us the opportunity to use their facilities so that other forensic practitioners can attend the training. As we work, train and build our relationships with other facilities, our expectation is to deliver training more widely.

Project N - Summer Undergraduate Research Experience (SURE) in Forensic Science and Statistics

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Anjali Mazumder (Original PI Stephen Fienberg, until his death 12/14/2016.)

Other Investigators: William Eddy (CMU faculty), Stephen Fienberg (CMU faculty), Amanda Luby (CMU PhD student), Neil Spencer (CMU PhD student), Xiao-Hui Tai (CMU PhD student), Ciaran Evans (CMU PhD student)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The primary objective of the project is to enhance the training of undergraduate students at the interface of statistics and forensic science, and in particular engage minority students from CSAFE partner institutions in statistics to enhance their long-term employment prospects and engagement in the science underlying forensic science. With a dual population of students in statistics and in forensic science, the aim was to motivate a new generation of statisticians in forensic science problems in both subtle and complex data-rich areas of statistics, and encourage a new generation of forensic scientists to think critically of the statistical issues that arise and appropriate statistical methods to use in the evaluation of forensic science evidence. The project also aimed to develop students’ communication and presentation skills to define, explain and solve a problem in both oral and written mediums to audiences of different technical knowledge bases.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? The CSAFE Year 2 report covers the last half of the 2016 CMU-CSAFE SURE program, which ran from 30 May to 21 July 2016, and the first half of the 2017 CMU-CSAFE SURE program in Forensic Science and Statistics, which is currently running from 29 May to 20 July 2017. For this reporting period we focus on the first half of SURE 2017, which was organised with the intention of learning and fun!

1) Major Activities:
a. Anjali Mazumder, lead researcher/instructor with respect to statistics and forensic science, continued to work closely with three members of the Eberly Teaching Center to explore (a) techniques to better assess students’ gained knowledge as a result of the program, (b) techniques to better assess their knowledge and ability before starting the program, (c) techniques to engage and teach mixed academic populations, to be implemented in future years, and (d) a structure of mixed learning opportunities to support the dichotomous group.
b. A survey was developed (in conjunction with the University’s Eberly Teaching Center) but the data have yet to be collated and analysed.
c. Discussions with the Allegheny County Medical Examiner’s Office (ACMEO) have resulted in collaborative summer projects, a tour for the students in 2017, and resources and expertise to conduct experiments in relation to the summer projects.
d. Recruitment of 6 forensic science students and 6 statistics students was completed, primarily from minority partner institutions and/or underrepresented groups. Students arrived for the summer program on 29 May.
e. Summer projects have been proposed, and the curriculum and schedule were developed.
f. The summer program began and has been filled with challenges and successes with respect to student learning of statistics and group work with dichotomous populations.
g. The program was organized such that students were taught in the mornings and spent the afternoons focused on research and further professional development.
h. Throughout June, the students attended morning lectures on regression methods delivered by a faculty member of the Statistics department (Chad Schafer). These lectures were followed by training workshops (delivered by PhD students in the Statistics Department) in the statistical programming language R, which they were expected to use to analyze the data for their research projects.
i. In June, the students received supplementary lectures during the first week on the principles and foundations of statistics and probability, delivered by Amanda Luby (PhD student).
j. The students were assigned to groups, each to deliver two different research projects by the end of July relating to (a) fingerprints on stamp bags, (b) typing and handwriting, and (c) digital forensics. Each project had a data analysis aspect involving regression and an experimental design aspect. That is, each student delivered a group project analyzing data and answering a research question involving regression relating to (a), (b) and (c), learning not only how to analyse data for a given research question but also how to develop a research question and design an appropriate study. A PhD student who also led tutorials was assigned to each project group as an adviser. The PhD student advisers met with the groups as often as two to three times a week to support them in the delivery of their research projects. Students spent the afternoons working on their research projects.
k. Throughout June, the students attended weekly seminars (Wednesday afternoon) led by Anjali Mazumder (faculty) on forensic science and statistics. The weekly sessions involved (a) oral presentations from the students on their research projects; and (b) lectures on forensic statistics covering different domain areas and the calculation of the weight of evidence in both evaluative and investigative cases.
l. This year, additional activities/sessions were included:
i. Anjali Mazumder provided general lectures and organized activities to develop transferable skills, including interpersonal, teamwork, leadership and science communication skills.
ii. Anjali Mazumder also held alternating bi-weekly 1-1 and project group meetings with each student and/or each project group to engage their interest, learning and enthusiasm and to identify opportunities to support them individually or as a group.
iii. Faculty seminars were organized twice a week (Monday and Thursday) over lunch to give students broader exposure to statistical methods and thinking in different areas.

2) Specific Objectives:

• The main objective was to develop a structure for the summer program to better support the learning of the dual population, and to train both statistics and forensic science students in statistical thinking and critical thinking about the statistical issues that arise in forensic science.
• Students are expected to learn and/or gain a stronger foundation in the principles of statistics and probability (through the supplementary lectures delivered by Amanda Luby) and in regression methods (delivered by faculty member Chad Schafer), and to apply these so as to identify and describe statistical issues that arise in the context of forensic science during their weekly discussion sessions with Anjali Mazumder and as they work on their research projects.
• Students are expected to develop skills to answer research questions by analyzing data using the statistical methods taught during the program. These skills are evaluated through the delivery of the research projects, presented on the final day as an oral presentation and poster session.

3) Significant Results and Key Outcomes/Achievements:

• Projects and curriculum structure have been designed to support training and development for both populations (forensic science and statistics).
• Statistics students have expressed interest in forensic science problems, and forensic science students have expressed the value of statistics as it applies to their different domain areas, as well as its broader applicability and role.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED?
• For one faculty member, Anjali Mazumder, this project provided the opportunity to learn new techniques to assess and evaluate students and to engage a dual student population. It has also provided the opportunity to develop/modify a training curriculum in the field of forensic statistics.
• For PhD students Ciaran Evans, Nic Dalmasso and Neil Spencer, this project has provided the opportunity to design research projects which involve (a) design of experiments; and (b) regression methods. It was also an opportunity to lead the delivery of a project and to develop their mentoring skills. Furthermore, for Amanda Luby, it has provided the opportunity to develop a curriculum for teaching design of experiments to a dual population.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? Internal communications both within CMU and CSAFE have taken place. We have shared ideas and materials with the ISU REU team. Nothing else to report at this time.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS?
• Continue to deliver SURE 2017
• Administer the survey developed to the students once they leave SURE 2017
• Continue to follow previous SURE students in their academic and professional careers
• Plan for SURE 2018, making adjustments to the recruitment and delivery of the program as best suits the program delivery.

Products

Nothing to report at this time.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Professor Anjali Mazumder has focused on learning from the SURE 2016 program, developing a structure to improve the implementation and learning outcomes for SURE 2017, and planning for SURE 2017 and beyond.

PhD students Amanda Luby, Ciaran Evans, Neil Spencer and Nic Dalmasso have been working tirelessly to prepare the projects, work with the groups, and develop and deliver R tutorials and additional support.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? The Eberly Teaching Center has provided invaluable advice on the development of survey questions and on techniques to assess, evaluate and calibrate students from two different populations. The Allegheny County Medical Examiner’s Office (ACMEO) has been involved in providing real data to motivate the students in the learning process, providing opportunities for projects, giving the students a tour of its facilities, and providing insight into working in a forensic science lab.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF THE PRINCIPAL DISCIPLINE(S) OF THE PROJECT? Promote and increase the understanding of the principles of statistics and probability, and the utility in the application of sound statistical methods to the evaluation and interpretation of evidence, promote the interface of statistics and forensic science to students who would otherwise not have the opportunity to learn about such a field, and engage a new generation of students in the field of forensic statistics.

WHAT IS THE IMPACT ON OTHER DISCIPLINES? Improve the teaching and communication skills of both the students and the individuals involved as they engage with a diverse group of people.

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? Provided new opportunities for research and teaching in the field of pattern evidence, statistics and forensic science.

Developed and will continue to develop new material and teaching examples for statistics, drawing upon forensic science, as well as key examples for training forensic science practitioners in the interface of probability, statistics, forensic science and the law.

WHAT IS THE IMPACT ON PHYSICAL, INSTITUTIONAL, AND INFORMATION RESOURCES THAT FORM INFRASTRUCTURE? Promoting engagement and collaboration with local county crime lab and CMU’s teaching centre.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? This project is motivating a new generation of statistics undergraduate students to become aware of the data-rich problems in forensic science, for them to consider in postgraduate work or careers, and a new generation of forensic scientists to identify and think critically about the potential statistical issues that arise in the context of forensic science, as well as to gain a stronger foundation for applying statistical methods to forensic science data problems. Initial impact has been observed, with students telling Anjali Mazumder how much they enjoyed the program.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)?

Zero.

Changes/Problems

There are no changes/problems to report.

Project Y - Training Statisticians in Forensic Science

Project Reporting Period: Project Year 2 (June 1, 2016 – June 30, 2017)

Project PI: Jeffrey J. Holt, Karen Kafadar (UVA)

Other Investigators: Maria Tackett (UVA)

Accomplishments

WHAT ARE THE MAJOR GOALS OF THE PROJECT? The major goal of this project is to educate students on the statistics of forensic science, and to attract students who are interested in statistics to the field of forensic science. This will be done through an annual summer workshop for undergraduate students recruited nationwide, and the development of materials designed to teach the statistics of forensic science.

To train students in statistics as it relates to forensic science, Jeffrey Holt will take the lead in organizing summer workshops at UVA on Statistics and Forensic Science. Specifically, we will:

• Coordinate the promotion of the workshops to potential audiences nationwide;
• Coordinate the application process for prospective participants, as well as the selection process;
• Coordinate the workshop arrangements, including lodging, travel, workshop location, and external speakers;
• Work with Tackett and Kafadar to develop the materials to be presented at the workshops as well as the workshop program;
• Work with Kafadar and Tackett as instructors to deliver the workshops to students;
• Develop, coordinate, and report on the evaluation of the workshops, in terms of their achieving the workshops' objectives;
• Refine workshop content as needed and develop modules that can be used for the wider forensic science and judicial communities.

WHAT WAS ACCOMPLISHED UNDER THESE GOALS? An initial workshop was held in May 2016. During that workshop, a small group of students worked in teams collecting, organizing, and synthesizing materials on statistics in forensic science. In the current reporting period, a second workshop was run at UVA in May 2017. During this workshop, a small number of students working in teams took the materials collected in 2016 and expanded them into the first drafts of educational modules. It is planned to use these modules in a pilot undergraduate course at UVA in Spring 2018, and in future summer workshops (2018-2020) for undergraduate students from around Virginia.

Holt and Kafadar served as the 2017 workshop facilitators. There were six participants. All worked over a two-and-a-half-week period (May 22 – June 7) on developing materials. The working groups met regularly to discuss progress on the materials, with informal presentations as part of the meetings. All workshop activities took place at UVA.

WHAT OPPORTUNITIES FOR TRAINING AND PROFESSIONAL DEVELOPMENT HAS THE PROJECT PROVIDED? Those involved with working on educational modules have learned more about forensic science and statistics. Kafadar and Holt attended the CSAFE All Hands Meeting in Ames in June, and gave presentations about their projects.

HOW HAVE THE RESULTS BEEN DISSEMINATED TO COMMUNITIES OF INTEREST? The educational materials are still in revision and not ready for broad dissemination. Development is continuing, with initial class testing at UVA planned for the 2017-18 academic year. Dissemination will begin once revisions suggested by student testing are complete.

WHAT DO YOU PLAN TO DO DURING THE NEXT REPORTING PERIOD TO ACCOMPLISH THE GOALS? During the next reporting period, we will continue to refine the educational modules, and start plans for classroom testing at UVA.

Products

PUBLICATIONS, CONFERENCE PAPERS, AND PRESENTATIONS Presentation at the “All Hands” meeting in Ames in June.

OTHER PRODUCTS The educational materials are still in revision and not ready for broad dissemination. Development is continuing, with initial class testing at UVA planned for the upcoming year. Dissemination will begin once revisions suggested by student testing are complete.

Participants & Collaborators

WHAT INDIVIDUALS HAVE WORKED ON THE PROJECT? Jeffrey Holt (principal investigator; 1.5 months NIST support) – Organized 2017 summer workshop, was one of several workshop facilitators, worked on developing educational materials.

Karen Kafadar (principal investigator; 1.5 months NIST support) - Workshop facilitator, UVA project coordinator

Maria Tackett (graduate student; 0.75 months NIST support) – Workshop facilitator, worked on developing educational materials

There were no international collaborations.

WHAT OTHER ORGANIZATIONS HAVE BEEN INVOLVED AS PARTNERS? No other organizations have been involved.

Impact

WHAT IS THE IMPACT ON THE DEVELOPMENT OF HUMAN RESOURCES? This project will expand the pool of students trained in both statistics and forensic science, and hopefully some of these students will pursue careers related to forensic science. The education materials produced will be used by workshop participants, and are expected to be used in classes offered to students at UVA. It is also hoped that the forensic science community will find some of the materials of interest.

WHAT IS THE IMPACT ON SOCIETY BEYOND SCIENCE AND TECHNOLOGY? We hope that those students who participate in summer workshops and academic year courses who do not pursue careers in forensic science nonetheless find their experience to be positive and take with them appreciation for forensic science.

WHAT DOLLAR AMOUNT OF THE AWARD’S BUDGET IS BEING SPENT IN FOREIGN COUNTRY(IES)? None

Changes/Problems Nothing to report.

CSAFE Annual Report 06.2017 TRAINING AND EDUCATION 223

CENTER ADMINISTRATION

Advisory Boards

CSAFE SENIOR ADVISORY BOARD

The CSAFE Senior Advisory Board (SAB) is a distinguished body of experts in the field brought together to provide guidance and strategic direction on CSAFE activities. Members are broadly representative of CSAFE stakeholders. The SAB is comprised of at least eight members, balanced, to the extent possible, by background, location and area of expertise. At least one member must be from NIST.

This year, the SAB met in Arlington, Virginia on January 24, 2017. The group continued its practice of providing valuable verbal feedback on project activities. Members were also invited to join the CSAFE All Hands Meeting in Ames, Iowa in June 2017. CSAFE continues to engage members via email when considering new initiatives and partnership opportunities, and when recruiting new team members.

Description of Duties

• Assess the soundness of CSAFE plans and strategies
• Assess current performance against CSAFE goals
• Identify opportunities to connect CSAFE to key stakeholder groups
• Serve on ad hoc subcommittees as needed

Member List

• Dr. Richard Cavanagh – Director, Special Programs Office, NIST
• Dr. Edward Derrick – Chief Program Director, AAAS Center of Science, Policy & Society Programs (CSPSP) (resigned spring 2017; will be replaced)
• Prof. Jules Epstein – Director of Advocacy Programs, Temple Beasley School of Law
• Honorable Barbara Hervey – Judge, Court of Criminal Appeals, Texas
• Dr. Sallie Keller – Director & Professor of Statistics, Social and Decision Analytics Laboratory, Virginia Bioinformatics Institute
• Dr. Jeff Salyards – Director, Defense Forensic Science Center
• Mr. Barry Scheck – Co-Director, The Innocence Project
• Sir Bernard Silverman – Chief Science Advisor, United Kingdom Home Office
• Dr. Reinout Woittiez – Director, Netherlands Forensic Institute (resigned spring 2017; will be replaced)

CSAFE Annual Report 06.2017 CENTER ADMINISTRATION 224

CSAFE TECHNICAL ADVISORY BOARD

The Technical Advisory Board (TAB) provides valuable guidance on the technical aspects of CSAFE projects and, most importantly, will serve as a conduit for feedback and exchange of ideas between the statistical, legal, and forensic communities as well as a bridge to the private, state, and federal sectors. This year the TAB met at NIST on November 7, 2016. Prior to the meeting, members reviewed CSAFE project information, and discussion during the meeting provided needed input on project goals and progress.

In spring of 2017, TAB members were also asked to review three projects each, and provide written feedback on the significance, approach, innovation and impact. A full copy of the review form can be found as Exhibit 1 in the appendix of this report. The feedback received was compiled and shared with all CSAFE project leaders.

Members were invited to join the CSAFE All Hands Meeting in Ames, Iowa in June 2017. CSAFE also engages members via email when considering new initiatives and partnership opportunities, and when recruiting new team members.

Description of Duties

• Review and inform CSAFE leadership on the validity and quality of scientific research
• Foster ties between CSAFE and key academics, professionals and policymakers
• Promote the integration of research topics with NIST directives
• Review CSAFE materials and provide written feedback on a yearly basis to CSAFE leadership.

Member List

• Joanne Buscaglia – Federal Bureau of Investigation
• A. Philip Dawid – University College, London
• Constantine Gatsonis – Brown University
• Julia Mortera – University of Rome
• Peter Neufeld – The Innocence Project
• Mark Pollitt – Federal Bureau of Investigation (retired)
• Rich Tontarski – Defense Forensics and Biometrics Agency
• Dr. Will Guthrie – NIST
• Dr. James Lyle – NIST
• Dr. Glinda Cooper – The Innocence Project (newly added member)

CSAFE Annual Report 06.2017 CENTER ADMINISTRATION 225

PRACTITIONER ADVISORY BOARD

The Practitioner Advisory Board (PAB) informs the Center’s leadership on implementation and adoption issues and on barriers to and avenues for training and technology transfer. This year the PAB met at NIST on November 7, 2016. Prior to the meeting, members reviewed CSAFE project information, and discussion during the meeting provided needed input on project goals and progress.

In spring of 2017, PAB members were also asked to review three projects each, and provide written feedback on the significance, approach, innovation and impact. A full copy of the review form can be found as Exhibit 1 in the appendix of this report. The feedback received was compiled and shared with all CSAFE project leaders.

Members were invited to join the CSAFE All Hands Meeting in Ames, Iowa in June 2017. CSAFE also engages members via email when considering new initiatives and partnership opportunities, and when recruiting new team members.

Description of Duties

• Review and inform CSAFE leadership on issues associated with translation of CSAFE research into practice
• Foster ties between CSAFE and key academics, professionals and policymakers
• Serve on ad hoc subcommittees as needed
• Review CSAFE materials and provide written feedback on the applications of CSAFE research and technologies on a yearly basis to the leadership team.

Member List

• Robert Thompson – NIST
• Michelle Triplett – Latent Print Examiner
• Paul Kisch – Forensic Consultant
• Christine Funk – Attorney
• Matt Redle – Sheridan County Attorney
• Leslie Hammer – Hammer Forensics

Facilities and Personnel

CSAFE completed a renovation of an approximately 2,500 square foot space this year, and coordinated a move to a permanent space in Durham Center on the campus of Iowa State University. Snedecor Hall, the home of the Department of Statistics, is located very close by. The Office of the Vice President for Research at Iowa State University covered much of the cost associated with renovation of the space, and the support has been gratefully received. Growth in research interest and collaborator visits has highlighted the need for additional space. As a result, CSAFE has initiated conversations to pursue expansion.

In order to expand its research efforts and continue increasing the Center’s level of expertise, CSAFE has emphasized the recruitment of additional faculty, staff, and students. At Iowa State University, the lead institution, CSAFE has recruited for two tenure-track faculty positions (one filled) and one or two additional research faculty positions, along with two additional staff members and many students. Other CSAFE institutions continued to recruit collaborators and students as well.

Name              Title                                    Start Date
Stacy Renfro      Program Manager                          September 2016
Marc Peterson     Research Administrator                   May 2017
Sarah Carraher    Communications and Outreach Coordinator  March 2017
Mary Jane McCunn  Administrative Specialist                November 2015

Despite suffering a significant loss due to the death of CSAFE founder and Carnegie Mellon lead researcher Stephen E. Fienberg, CSAFE continues to build on the foundation of statistical rigor he developed. William F. Eddy was selected as Steve’s successor at Carnegie Mellon University. Dr. Eddy’s longstanding success in research leadership, his advanced expertise in statistics, machine learning and computing, and his commitment to advancing the mission of CSAFE have positioned the center to accomplish significant achievements under his direction.

CSAFE contracted with the Center for Survey Statistics and Methodology (CSSM) for information technology services, including server hosting and maintenance, hosting and curating of forensic databases, and other services. CSSM has a large number of information technology professionals with diverse areas of expertise; by contracting with CSSM for these services, we are able to access experts with a broad array of interests.

Meetings and Events

Leadership Team Meetings

The Co-Project directors representing each institution continue to meet monthly with Alicia Carriquiry and Stacy Renfro via online Zoom meetings. The virtual meeting format allows documents to be viewed by all participants, enhancing communications and dialogue among participants. Our NIST Program Director, Susan Ballou, was invited to attend quarterly.

Center Wide Meetings

In order to expand the reach and impact of its research, CSAFE implemented a new educational initiative in January 2017. Center Wide Webinars focus on a variety of topics relevant to the diverse forensic and applied forensic science communities. Recordings of the webinars are published on the CSAFE website.

Four webinars have been held thus far, with more scheduled for the coming year. CSAFE researchers and collaborators have presented on the following topics:

• “Thinking About Likelihood Ratios for Pattern Evidence” - Hal Stern, University of California, Irvine. January 19, 2017.
• “A Generative Approach to Forensic Shoeprint Recognition” - Adam Kortylewski, University of Basel, Switzerland. February 10, 2017.
• “Statistical and Algorithmic Approaches to Matching Bullets” - Eric Hare, Iowa State University. April 14, 2017.
• “Case Processing and Human Factors at Crime Laboratories” - Daniel Murrie and Sharon Kelley, University of Virginia. May 4, 2017.

As a result, individuals in the forensic science community worldwide have taken advantage of these learning opportunities, thus improving their ability to interpret evidence analysis results and communicate their findings effectively.

2017 CSAFE All Hands Meeting

The 2017 CSAFE All Hands Meeting, held at Iowa State University on June 7-9, provided community members across multiple universities and countries with an opportunity to come together to discuss a targeted approach for advancing the mission of the organization. Over 80 research team members, collaborators, practitioners, and students participated in the event, which underscored the need to increase engagement and further relationship building. A copy of the agenda can be found as Exhibit 4 in the appendix.

Research updates from each project lead enabled attendees to identify research progress and key examples of project partnerships. Breakout sessions offered conference participants a more in-depth look at each type of CSAFE research. Groups had the opportunity for candid round-table discussion on the current state of practice in each area, and participants brainstormed opportunities for CSAFE to meet needs for improvement.

Keynote speaker Judge Barbara Hervey of the Texas Court of Criminal Appeals provided the team with significant insights into ways to translate CSAFE research into effective methods for judicial use. CSAFE Director Alicia Carriquiry emphasized that CSAFE’s goal is to convey what the center is doing by improving communication with community partners. Carriquiry encouraged the CSAFE team to strengthen its commitment to transparency between researchers and the greater forensic community.

CSAFE sent out a post-meeting questionnaire. Of the conference participants, 28 percent were very satisfied with the meeting, and the remainder were mostly satisfied. Attendees reported that the information on current research was interesting, that participants were highly engaged, and that there were many opportunities to exchange ideas further and explore new collaborations.

Digital Evidence Workshop

On May 8 and 9, 2017, individuals from all CSAFE partner institutions as well as practitioners from NIST, the FBI and other collaborating groups gathered in Arlington, Virginia for a Digital Evidence Workshop. Practitioner Advisory Board member Mark Pollitt assisted with workshop objectives and implementation.

The workshop focused on providing an open forum of discussion to expose statisticians, digital forensic practitioners, and academics to the current state of practice in the field as well as the latest research that is being developed. Attendees explored how CSAFE can be at the forefront of improving the statistical foundations of digital forensic evidence, and developed new opportunities for collaboration. A copy of the agenda is included as appendix exhibit 3.

Workshop Participant list

• Susan Ballou - National Institute of Standards and Technology
• Nick Berry - Iowa State University
• Alicia Carriquiry - Iowa State University
• William (Bill) Eber - DoD Cyber Crime Center (DC3)
• William F. Eddy - Carnegie Mellon University
• Barbara Guttman - National Institute of Standards and Technology
• Lotem Kaplan - Carnegie Mellon University
• Roy Maxion - Carnegie Mellon University
• Jennifer Newman - Iowa State University
• Lam Nguyen - Member, Scientific Area Committee (OSAC) on Digital Evidence
• Mark Pollitt - Former FBI
• Henry (Dick) Reeve - FBI Rocky Mountain Regional Computer Forensic Laboratory
• Stacy Renfro - Iowa State University
• Lora Sims - Ideal Innovations
• Padhraic Smyth - University of California, Irvine
• Peter Stephenson - The Center for Digital Forensic Studies

Research Experiences for Undergraduates at ISU

In summer 2017, from May 31 to August 3, Iowa State University and Carnegie Mellon University held a Research Experience for Undergraduates internship program. Sixteen students from seven institutions participated: Iowa State University, University of Iowa, Upper Iowa University, Nebraska Wesleyan University, and the three CSAFE minority-serving partner institutions (Eastern New Mexico University, Albany State University, and Fayetteville State University). CSAFE provided students with training in statistical foundations of forensic evidence through the following workshops:

• Statistics Workshop - 6/5 and 6/6
• R Workshop - 6/12-6/15
• Creating an Effective Poster - 6/21
• Matlab Workshop - 6/23
• Writing an Effective Personal Statement - 7/14
• Research Symposium - 8/3

Students also participated in university-wide professional development educational sessions, and visited the Iowa Division of Criminal Investigation Laboratory. Throughout the program, students engaged in hands-on research in fields such as shoeprint analysis, bullet matching, and steganalysis. Students gathered and analyzed data, and regularly communicated their results in group presentations. At the conclusion of the internship, students create a poster presentation to summarize their work, and will present their findings at a university-wide research symposium on August 3, 2017.

High School Workshop at ISU

CSAFE coordinated and sponsored a weeklong course on forensic sciences, titled Discovering Forensics - From the Crime Scene to the Court Room, for 15 underprivileged, talented and gifted high school students from July 17-21, 2017. Students were from Denison, Marshalltown, and Ames, Iowa and were all part of the Science Bound pre-collegiate program designed to empower Iowa students of color to pursue degrees and careers in the STEM fields.

In this interactive class, students learned how statistics and forensic science work together to ensure the right person is brought to justice. Students enhanced team-building skills through hands-on experience in collecting evidence and analyzing and interpreting the results. The students were able to see forensic science in action through a field trip to the State of Iowa Division of Criminal Investigation Crime Laboratory. Guest speakers including the CSAFE Director, a police officer, and a lawyer from a CSAFE partner institution met with students to highlight case studies and show the progression of a criminal investigation from the crime scene to the courtroom.

Outreach activities such as this help CSAFE engage the future STEM workforce.

STEMversity Participation

STEMversity's summer forensic academies for teachers and students (FAST) are designed to train, educate, and develop the future STEM workforce. CSAFE presented a one-day experience for students involved in the FAST Summer Academy held at Central Georgia Technical College in Milledgeville on July 21, 2017. About 25 students attended, the majority of them high schoolers (grades 9-12). Students took part in hands-on learning experiences with shoeprint analysis, bullet matching, and questioned documents.

Science, technology, engineering, and math (STEM) and forensic science are exciting and growing disciplines throughout the United States and globally. Students in secondary and postsecondary education are interested in getting the necessary training to enter these fields.

Internal Evaluation

We conducted an annual independent survey of the various communities of stakeholders to gather information about the center. Prof. Mack Shelley from Iowa State University serves as an independent evaluator of the Center’s activities and processes.

APPENDIX

Exhibit 1

CSAFE Project Technical Review

April 2017

The CSAFE Leadership Team would sincerely value your input on our research projects. At your convenience, we would greatly appreciate any feedback you have on the following three projects, specifically related to assessing the significance, approach, innovation and potential impact. Guiding questions are included below for reflection purposes only and it is not necessary to address every question listed.

Evaluation Guidance

Significance: Does this project address an important problem? Are the deliverables useful to the appropriate community? Is there evidence that this work advances the state of science? Is the proposed work for the future technically relevant?

Approach: Is the proposed work plan likely to result in successful achievement of the objectives? Does the team acknowledge potential problem areas and consider alternative tactics? Is the involvement with collaborators sufficient?

Innovation: Is the project original and innovative? For example, does the project challenge existing paradigms or practice; address an innovative hypothesis or critical barrier to progress in the field? Does the project develop or employ novel concepts, approaches or methodologies, tools, or technologies for this area?

Impact: What is the impact so far of the research products and papers in this field and in other disciplines? Does the work have the potential for large impact in the future? Are publications being accepted into the appropriate journals and are presentations being made to the appropriate audiences?

Space for recording feedback on projects is included on the next page.

The following space is included for responses to the guiding questions. Please feel free to expand the response area if needed.

Project AA - Inverse Problems in Forensic Science – A statistical approach

Notes:

Project DD - Probabilistic Framework for the Evaluation of Complex Forensic Inference: Accounting for uncertainty, multiple data sources and expert judgement

Notes:

Project II - Understanding and Modeling the Probability Distribution of Accidental/Randomly Acquired Characteristics for Shoeprint Matching

Notes:

Exhibit 2

Project D - Database Description

Choosing a Database

The majority of current databases can be grouped into two categories: SQL and NoSQL. After studying both types of databases, we concluded that a NoSQL database would be a better choice for our project.

Traditional databases are called relational or SQL databases. The term relational refers to the schema, or structure, of the database. Data is stored in tables and the database creator must define the relationships between tables. The structure of the database needs to be set up before it can be populated with data. The advent of big data brought to light major limitations of relational databases. Because of the database structure, read and write operations become very slow when many thousands of users attempt to access a relational database at the same time. Sites like Amazon need to process hundreds of thousands of transactions per second, significantly more than relational databases can efficiently handle.

The need to process very large amounts of data quickly led to the creation of NoSQL databases. These databases are called schemaless, referring to the fact that the structure of the database doesn’t need to be predefined. Data isn’t stored in neatly organized tables. Instead, the data is essentially stored in one very large table with extremely long rows. NoSQL databases have other features that distinguish them from their SQL counterparts, but the lack of predefined structure is the most relevant to our purposes.

While we don’t anticipate that our database will need to accommodate thousands of transactions per second, we still chose to use a NoSQL database because of the flexibility and adaptability the schemaless design offers.

Example of a Relational Database Structure

This toy example shows how we might store some simple steganography data in a relational database across three tables: Phone, Cover Image, and Stego Image. The Phone table stores information on each phone used to produce image data; the Cover Image table stores information on each image captured or produced by a specific phone; and the Stego Image table lists stego images created from cover images with stego apps. See Figure 1 for a description of this example. The arrows indicate the relationship between the tables. For example, if we wanted to know which model of phone was used to take the cover image used to create stego3.jpg, we would first look at the stego3.jpg record in the Stego Image table. According to the table, stego3.jpg came from cover image cover2.jpg. The Cover Image table tells us that cover2.jpg was taken on phone1. From the Phone table, we learn that phone1 is an iPhone 6s.
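The three-table lookup described above can be sketched with Python's built-in sqlite3 module. This is an illustration only, not the project's implementation: the table and column names are assumptions chosen to mirror the Figure 1 headings, and the data are the toy values from that figure.

```python
import sqlite3

# In-memory database holding the three toy tables from Figure 1.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phone (phone TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE cover_image (
        cover_image TEXT PRIMARY KEY,
        phone TEXT REFERENCES phone(phone),
        setting TEXT
    );
    CREATE TABLE stego_image (
        stego_image TEXT PRIMARY KEY,
        cover_image TEXT REFERENCES cover_image(cover_image),
        stego_app TEXT,
        payload TEXT
    );
""")
conn.executemany("INSERT INTO phone VALUES (?, ?)", [
    ("phone1", "iPhone 6s"), ("phone2", "iPhone 6s"),
    ("phone3", "iPhone 6"), ("phone4", "Galaxy 7"),
])
conn.executemany("INSERT INTO cover_image VALUES (?, ?, ?)", [
    ("cover1.jpg", "phone1", "Indoor"), ("cover2.jpg", "phone1", "Outdoor"),
    ("cover3.jpg", "phone2", "Indoor"), ("cover4.jpg", "phone2", "Outdoor"),
])
conn.executemany("INSERT INTO stego_image VALUES (?, ?, ?, ?)", [
    ("stego1.jpg", "cover1.jpg", "WeHide", "Hello World"),
    ("stego2.jpg", "cover1.jpg", "WeHide", "Insert joke here"),
    ("stego3.jpg", "cover2.jpg", "Cloak", "Hello World"),
    ("stego4.jpg", "cover2.jpg", "Cloak", "Insert joke here"),
])


def phone_model_for_stego(connection, stego_name):
    """Follow stego image -> cover image -> phone and return the phone type."""
    row = connection.execute(
        """
        SELECT p.type
        FROM stego_image AS s
        JOIN cover_image AS c ON c.cover_image = s.cover_image
        JOIN phone AS p ON p.phone = c.phone
        WHERE s.stego_image = ?
        """,
        (stego_name,),
    ).fetchone()
    return row[0] if row else None


print(phone_model_for_stego(conn, "stego3.jpg"))  # iPhone 6s
```

The two JOIN clauses correspond to the two arrows followed in the text: one from the Stego Image table to the Cover Image table, and one from the Cover Image table to the Phone table.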

Example of a NoSQL Database Structure

Conceptually, one could think of the data in a NoSQL database as being stored in long rows containing field names and field values. In the NoSQL database example in Figure 2, we give a toy example that contains the same information as the SQL example. Here, the first row contains three field names: Image, Phone, and Type. Each field name has a corresponding field value. All of the essential data pertaining to a particular image is contained in a single row instead of being spread across three rows in three different tables as in the SQL example. Suppose we want to use the NoSQL database to find which model of phone was used to take the cover image used to create stego3.jpg. The seventh row tells us that stego3.jpg came from a cover image taken on an iPhone6s. We didn’t need to consult multiple tables. For this reason, querying (searching) is faster in a NoSQL database than in a SQL database.
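The single-row lookup can be sketched in the same way. In this hedged illustration, plain Python dictionaries stand in for the rows of a NoSQL database; no MongoDB server is involved. The helper `find_one` is hypothetical, and the field values follow the relational toy example, in which cover2.jpg was taken on phone1.

```python
# Each dictionary plays the role of one "long row" (document).
# Cover rows and stego rows carry different sets of fields, which is
# what the schemaless design permits.
documents = [
    {"Image": "cover1.jpg", "Phone": "phone1", "Type": "iPhone6s"},
    {"Image": "cover2.jpg", "Phone": "phone1", "Type": "iPhone6s"},
    {"Image": "cover3.jpg", "Phone": "phone2", "Type": "iPhone6s"},
    {"Image": "cover4.jpg", "Phone": "phone2", "Type": "iPhone6s"},
    {"Image": "stego1.jpg", "Phone": "phone1", "Type": "iPhone6s",
     "Cover": "cover1.jpg", "App": "WeHide"},
    {"Image": "stego2.jpg", "Phone": "phone1", "Type": "iPhone6s",
     "Cover": "cover1.jpg", "App": "WeHide"},
    {"Image": "stego3.jpg", "Phone": "phone1", "Type": "iPhone6s",
     "Cover": "cover2.jpg", "App": "Cloak"},
    {"Image": "stego4.jpg", "Phone": "phone1", "Type": "iPhone6s",
     "Cover": "cover2.jpg", "App": "Cloak"},
]


def find_one(docs, **criteria):
    """Return the first document whose fields match every criterion, else None."""
    for doc in docs:
        if all(doc.get(field) == value for field, value in criteria.items()):
            return doc
    return None


record = find_one(documents, Image="stego3.jpg")
print(record["Type"])  # iPhone6s
```

One record answers the question, at the cost of repeating the phone information in every row that refers to that phone.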

Figure 1. Tables representing a relational database example, with three tables: Phone table, Cover Image table, and Stego Image table. The arrows indicate the relationship between the tables, pointing between specific fields that connect two different tables.

Phone table

Phone   Type
phone1  iPhone 6s
phone2  iPhone 6s
phone3  iPhone 6
phone4  Galaxy 7

Cover Image table

Cover Image  Phone   Setting
cover1.jpg   phone1  Indoor
cover2.jpg   phone1  Outdoor
cover3.jpg   phone2  Indoor
cover4.jpg   phone2  Outdoor

Stego Image table

Stego Image  Cover Image  Stego App  Payload
stego1.jpg   cover1.jpg   WeHide     Hello World
stego2.jpg   cover1.jpg   WeHide     Insert joke here
stego3.jpg   cover2.jpg   Cloak      Hello World
stego4.jpg   cover2.jpg   Cloak      Insert joke here

Figure 2. The one table for the NoSQL database example, corresponding to the same information in the SQL database example with three tables.

1st row  Image: cover1.jpg  Phone: phone1  Type: iPhone6s
2nd row  Image: cover2.jpg  Phone: phone1  Type: iPhone6s
3rd row  Image: cover3.jpg  Phone: phone2  Type: iPhone6s
4th row  Image: cover4.jpg  Phone: phone2  Type: iPhone6s
5th row  Image: stego1.jpg  Phone: phone1  Type: iPhone6s  Cover: cover1.jpg  App: WeHide
6th row  Image: stego2.jpg  Phone: phone1  Type: iPhone6s  Cover: cover1.jpg  App: WeHide
7th row  Image: stego3.jpg  Phone: phone1  Type: iPhone6s  Cover: cover2.jpg  App: Cloak
8th row  Image: stego4.jpg  Phone: phone1  Type: iPhone6s  Cover: cover2.jpg  App: Cloak

Choosing a NoSQL database: MongoDB

We considered a variety of NoSQL databases, including Cassandra, Bigtable, and MongoDB. Most of the databases we looked at offered similar features. We chose MongoDB because it is open source, is widely used, and has readily available tutorials and documentation online. These characteristics matter for our project because we want a widely available, widely used and easy-to-access database program with good documentation. The focus of our project is to collect and store image data in a straightforward manner, and have it readily accessible for our purposes. MongoDB has these qualities.

In addition, we wanted a database with supporting software that makes it easier for non-specialists to run. MongoDB does not come with a graphical user interface, but it can be accessed from Python. A bare-bones installation of MongoDB uses JavaScript-like commands in the terminal window to query and update the database. We installed PyMongo, the Python driver provided by MongoDB, which allows access to the database using the Python programming language. We also installed an integrated development environment (IDE) called PyCharm to edit and run the Python code. This makes the programming environment for MongoDB easier for us to use.

In Figure 3 we show a diagram of our programming environment for entering and searching data in our database. The outermost layer, PyCharm, is the level where we write code that allows us to enter, change, and search the database contents. The arrows indicate that code written in the PyCharm environment is sent to PyMongo, which in turn sends commands to Mongo Shell, which updates or searches the database. The database users (our graduate students for now) working in PyCharm do not see the layers beneath PyCharm, as PyCharm allows the users to do all their work in PyCharm.

Figure 3. A diagram of our programming environment. We program exclusively in PyCharm; that program sends commands in turn to PyMongo and Mongo Shell, the latter of which executes the instructions.

PyCharm (User works here)
• PyCharm writes scripts (list of commands) in Python
• PyCharm sends to PyMongo
• Easier script management
• Easier code completion and compiling
• Easier debugging

PyMongo
• Receives commands to update and query the database written in Python from PyCharm
• Then sends to Mongo Shell
• Easier to edit compared to Mongo Shell

Mongo Shell
• Receives commands from PyMongo to update and search the database
• Commands are normally entered in a terminal window but here commands are received from PyMongo

Mongo Database
• The data

Image information to be stored in MongoDB

Each digital image has what is called EXIF data. The EXIF data records details about the image such as the time it was taken, the GPS location, the ISO and exposure time settings, whether a flash was used, and the phone model. We will store all of the EXIF data in our database. We will also store whether an image is a full-size original image, a cover image or a stego image.

Each cover image record in the database will store the name of the original image from which the cover image was created and any pre-processing done to it, such as conversion to grayscale.

Each stego image record will store the name of the cover image from which the stego image was created. The record will also list which stego algorithm was used, the embedding rate, the password if one was used, the hidden message, and the stego app used to embed the message.
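As a rough sketch of what such a stego image record might look like, the code below builds one as a Python dictionary and shows how it could be passed to PyMongo. The field names, the database name `stegodb`, the collection name `images`, the connection URI, and the example values are all assumptions made for illustration; they are not taken from the project's actual schema. The storage function needs PyMongo installed and a MongoDB server running, so it is defined here but not called.

```python
def make_stego_record(image_name, cover_name, algorithm, embedding_rate,
                      message, stego_app, password=None, exif=None):
    """Build a stego image record with the fields described in the text.

    All field names are hypothetical.
    """
    record = {
        "image": image_name,
        "kind": "stego",  # other kinds would be "original" or "cover"
        "cover_image": cover_name,
        "stego_algorithm": algorithm,
        "embedding_rate": embedding_rate,
        "hidden_message": message,
        "stego_app": stego_app,
        "exif": exif or {},
    }
    # The password is stored only if one was used.
    if password is not None:
        record["password"] = password
    return record


def store_record(record, uri="mongodb://localhost:27017"):
    """Insert one record into MongoDB and return its generated id.

    Requires PyMongo and a running MongoDB server at the assumed URI.
    """
    from pymongo import MongoClient  # imported here so the sketch loads without PyMongo

    client = MongoClient(uri)
    try:
        result = client["stegodb"]["images"].insert_one(record)
        return result.inserted_id
    finally:
        client.close()


# Illustrative values only.
example = make_stego_record(
    image_name="stego3.png",
    cover_name="cover2.png",
    algorithm="WOW",
    embedding_rate=0.4,
    message="Hello World",
    stego_app="Cloak",
)
print(example)
```

Because the database is schemaless, cover image records can carry a different set of fields (for example, the original image name and pre-processing steps) without any change to the collection.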

Image Data Collection

Our team of undergraduate students has collected around 35,000 digital images. The images were taken on a variety of cell phone models. We used two iPhone 6s phones, two iPhone 6s Plus phones, two iPhone 7 phones, two Google Pixels, two HTC phones and two Samsung Galaxy S7 phones.

We chose to use camera apps instead of the phones’ built-in camera software so that we could save images in file formats other than JPEG as well as adjust some of the camera settings such as ISO and exposure time. The iPhones used the app Raw Camera and saved images as TIFF files. The Android phones used the app Manual Camera, which allowed the photographers to save DNG and PNG files.

Many of the images were taken in sets of ten. The photographer would pick a particular scene, such as a bookshelf or vase, and take ten pictures of that scene with different ISO and exposure settings as listed in Table 1. The images were taken indoors, and students took care to choose scenes that wouldn’t produce all white images or all black images at the extreme ISO and exposure settings. We also have images that were taken with the automatic ISO and exposure settings.

We also have a variety of image sizes. The photographers took full-size images; depending on the phone, a full-size image is around 4000x3000 pixels. The iPhones’ full-size images are 4032x3024. The Google Pixel’s JPEG format has dimensions 4048x3036, while its DNG format has dimensions 4048x3044. The HTC’s full-size images are 4000x3000. The Samsung Galaxy’s full-size images are 4032x3024. Depending on the phone model, some of the phones saved thumbnail images in addition to the full-size images. Table 3 shows the full list of image dimensions by phone.

Table 1: ISO and Exposure Time Settings

          ISO   Exposure Setting
image 1   auto  auto
image 2   100   1/10
image 3   100   1/50
image 4   100   1/200
image 5   200   1/10
image 6   200   1/50
image 7   200   1/200
image 8   1000  1/10
image 9   1000  1/50
image 10  1000  1/200
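The ten settings in Table 1 are one automatic capture followed by every combination of three ISO values and three exposure times. A minimal sketch that generates the same list in the same order (the function and constant names are illustrative):

```python
from itertools import product

ISO_VALUES = [100, 200, 1000]
EXPOSURE_TIMES = ["1/10", "1/50", "1/200"]


def capture_settings():
    """Return the (ISO, exposure) pairs for one set of ten images."""
    settings = [("auto", "auto")]  # image 1 uses the automatic settings
    # product() varies the exposure time fastest, matching the table order.
    settings.extend(product(ISO_VALUES, EXPOSURE_TIMES))
    return settings


for number, (iso, exposure) in enumerate(capture_settings(), start=1):
    print(f"image {number}: ISO {iso}, exposure {exposure}")
```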

Table 2: Original Images

Phone           Num.     dng    jpg     jpg thumbnails  png    raw   tif     Images
Google Pixel    Phone 1  3176   3179    2757                                 9112
Google Pixel    Phone 2         2042                                         2042
HTC             Phone 1  899    1329    1538            752    178           4696
HTC             Phone 2  187    1280    8               358    9             1842
Samsung         Phone 1  2597   1840                                         4437
Samsung         Phone 2  1049   163                                          1212
iPhone 6s       Phone 1                                              3171    3171
iPhone 6s       Phone 2  10                                          3131    3141
iPhone 6s Plus  Phone 1         200                                  3024    3224
iPhone 6s Plus  Phone 2  1      2                                    2171    2174
iPhone 7        Phone 1                                              3977    3977
iPhone 7        Phone 2                                              1397    1397
Total                    7919   10035                   1110   187   16871   40425

Table 3: Original Image Dimensions

Phone           Num.     File Format  Image Dimension  Number
Google Pixel    Phone 1  DNG          4048x3044        3176
                         JPG          190x253          2719
                         JPG          252x188          27
                         JPG          253x190          1
                         JPG          3036x4048        3171
                         JPG          4048x3036        18
Google Pixel    Phone 2  JPG          3036x4048        17
                         JPG          4048x3036        2025
HTC             Phone 1  DNG                           899
                         JPG          187x250          606
                         JPG          249x186          22
                         JPG          250x187          910
                         JPG          3000x4000        459
                         JPG          4000x3000        870
                         PNG          3000x4000        739
                         PNG          4000x3000        13
                         RAW          3000x4000        178
HTC             Phone 2  DNG          4104x3046        187
                         JPG          180x320          3
                         JPG          187x250          4
                         JPG          250x187          1
                         JPG          3000x4000        348
                         JPG          4000x3000        932
                         PNG          3000x4000        314
                         PNG          4000x3000        44
                         RAW                           9
Samsung         Phone 1  DNG          4032x3024        2597
                         JPG          1398x3264        2
                         JPG          1728x4032        2
                         JPG          3024x4032        577
                         JPG          4032x3024        1255
                         JPG          3264x1398        2
                         JPG          4032x1728        2
Samsung         Phone 2  DNG          4032x3024        157
                         JPG          3024x4032        11
                         JPG          4032x3024        1044
iPhone6s        Phone 1  TIF          3024x4032        51
                         TIF          4032x3024        3120
iPhone6s        Phone 2  JPG          3024x4032        10
                         TIF          4032x3024        3131
iPhone6s Plus   Phone 1  JPG          3024x4032        5
                         JPG          4032x3024        195
                         TIF          960x960          1
                         TIF          3024x4032        11
                         TIF          4032x3024        3012
iPhone6s Plus   Phone 2  DNG          640x852          1
                         JPG          3024x4032        2
                         TIF          4032x3024        1561
iPhone7         1        TIF          4032x3024        3977
iPhone7         2        TIF          4032x3024        1397

Cover images

We created PNG cover images from the original TIFF images from all six iPhones. The cover images were produced by cropping 512x512 sections from the four corners and the center of the original. Each original image yields five cover images.
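A sketch of how the five crop regions could be computed for an image of a given size. The exact placement is an assumption: the corner crops are taken flush against the image edges and the center crop is centered to the nearest pixel, which the text does not specify. Boxes are given as (left, upper, right, lower), the convention used by common imaging libraries such as Pillow; the function name is illustrative.

```python
def cover_crop_boxes(width, height, size=512):
    """Return five size x size crop boxes: four corners and the center."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    right = width - size    # left edge of the right-hand crops
    bottom = height - size  # top edge of the bottom crops
    origins = {
        "top_left": (0, 0),
        "top_right": (right, 0),
        "bottom_left": (0, bottom),
        "bottom_right": (right, bottom),
        "center": (right // 2, bottom // 2),
    }
    return {
        name: (left, top, left + size, top + size)
        for name, (left, top) in origins.items()
    }


# Crop boxes for a 4032x3024 iPhone original.
for name, box in cover_crop_boxes(4032, 3024).items():
    print(name, box)
```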

Stego Images

We created stego images from the cover images using three embedding algorithms – MiPOD, S-UNIWARD and WOW – and three embedding rates – 0.1, 0.2 and 0.4. Thus, nine stego images were created from each cover image: one stego for each combination of algorithm and embedding rate.
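The nine combinations can be enumerated directly. The sketch below only lists the (cover, algorithm, rate) jobs; it does not perform any embedding, and the function name and file name are illustrative.

```python
from itertools import product

ALGORITHMS = ["MiPOD", "S-UNIWARD", "WOW"]
EMBEDDING_RATES = [0.1, 0.2, 0.4]
COVERS_PER_ORIGINAL = 5  # four corner crops plus the center crop


def stego_jobs(cover_name):
    """List every (cover, algorithm, embedding rate) combination for one cover."""
    return [
        (cover_name, algorithm, rate)
        for algorithm, rate in product(ALGORITHMS, EMBEDDING_RATES)
    ]


jobs = stego_jobs("cover1.png")
print(len(jobs))                        # 9 stego images per cover image
print(COVERS_PER_ORIGINAL * len(jobs))  # 45 stego images per original image
```

With five cover images per original and nine stego images per cover, each original image leads to 45 stego images.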

Exhibit 3

Center for Statistics and Applications in Forensic Evidence (CSAFE) Digital Evidence Workshop

Virginia Tech Research Center
Arlington, VA
May 8 and 9, 2017

May 8, 2017

8:30 AM Registration
8:45 AM Opening and Introductions (Alicia Carriquiry and Susan Ballou)
9:00 AM Goals and Objectives (Alicia Carriquiry)
9:10 AM Digital and Multimedia Forensics - a Primer for Statisticians (Mark Pollitt)

During this presentation, the workshop participants will be exposed to some of the foundational concepts of digital forensics and some definitions, and introduced to some of the unique challenges in these forensic disciplines.

9:40 AM The Role of Statistics in Forensics (Alicia Carriquiry)
10:10 AM BREAK
10:30 AM Evidence, Science, and Rules of Evidence: Not by the numbers (Henry R. (Dick) Reeve)
10:50 AM The Science of Digital Forensics (Lam Nguyen)

Presentation will be based on the concept of digital evidence serving a dual role in both the investigative and forensic process, making its output difficult to classify.

11:10 AM What is Facial Identification? (Lora Sims)

This presentation will give an overview of the discipline of Facial Identification, including, but not limited to, the difference between facial identification and facial recognition, the questions we answer, how we do our comparisons, the conclusions we reach, and how those conclusions are used in the forensic, investigative, and intelligence communities.

11:30 AM Behaviors and Network Forensics – Using Analytics (Peter Stephenson)

This talk will focus on:
• Finding the needle in a pile of needles
• Criminological subtypes - narrow the field of possible suspects
• Link analysis
• Partial tool set
• Ashleigh Love Murder – Analytics
• Satan RaaS Ransomware – Analytics
• Goldendumps payment card dump shop – Analytics

CSAFE – Iowa State University – 195 Durham Center – 515 294 5634

11:50 AM Complexity Algorithm (William P. Eber) DC3 has developed an algorithm which attempts to categorize digital forensic exams into four levels of complexity based on weighted variables known at the time of intake. These levels allow for managing the expectations of laboratory customers with regard to timeliness and accuracy of reported results. We are also working on an algorithm to predict those exams which are most likely to wind up in court.

12:10 PM Presentation TBD (Greg Kesden)
12:30 PM LUNCH AND BREAK
12:50 PM Lunch and Open Discussion (Mark Pollitt)
1:30 PM Statistical Analysis of User-Event Data in a Digital Forensics Context (Padhraic Smyth)

Event histories of user activities are routinely logged on devices such as computers and mobile phones. These logs are typically composed of a list of events, each consisting of a user ID, timestamp, and associated metadata. As digital devices become more prevalent, these types of user-event histories are encountered with increasing regularity during forensic investigations. In this talk I will discuss recent work by my research group for analysis of this type of data in the context of digital forensics. In particular, I will briefly review the framework of marked point processes, discuss how this framework can address whether two time-stamped event streams were generated by the same source or by different sources, and illustrate the application of the approach to real-world data sets.

1:50 PM Using Active Learning for Digital Evidence Investigation (Lotem Kaplan)

One of the main issues that digital evidence investigators face after gaining access to digital devices is the amount of data that they contain. Machine learning techniques allow for automatic classification of big data to focus the detectives’ work on the data of interest. Unfortunately, training a classifier requires an expert to label a set of data. Usually this data is chosen randomly to represent the domain, resulting in a suboptimal use of the expert’s time. The active learning approach appreciates that an expert’s time is an expensive and valuable resource. Therefore, it strives to intelligently pick the training data to better utilize the expert’s time. In this talk I will describe an active learning approach using Support Vector Machines for data classification with applications to digital forensics.

2:10 PM Online Marketplaces on the Dark Web (William F. Eddy)

I will present a brief description of a large dataset concerning sales on the dark web. The data cover a two-year period. We are currently studying a four-month subset of the data with the intent of identifying unique sellers using record matching methods which have been widely used to match respondent records by the US Census Bureau and death records to identify victims of the civil war in Syria.

2:30 PM StegoDB: An Image Dataset for Benchmarking Steganalysis Algorithms (Jennifer Newman)

This presentation focuses on the need to create an authenticated database of image data for forensic-related steganalysis. In this talk, I describe our work to create a standardized image dataset for steganography detection. I present preliminary results of proof-of-concept experiments of the utility of our database, and discuss another experiment to detect stego images produced by the Android stego app PixelKnot. We are particularly interested in discovering challenges encountered by image forensic practitioners working on steganography detection in their workplace, and what issues our group at CSAFE can help them address in a more practical sense.

2:50 PM A Dynamic Bayesian Network Approach to Authentication in Mobile Devices (Nick Berry) This presentation will center on statistical computing methodologies in clustering, as well as working with Bayesian Networks, especially Dynamic BN, to create a biometric authentication system for mobile devices. Additionally, I will share thoughts on how information from many sources can be aggregated to make a single decision about a set of evidence.

3:10 PM NSRL, CFTT and CFReDS (Barbara Guttman)

The talk will provide a brief overview of these programs with an emphasis on how NIST programs meet community needs, data that can be used more broadly, and some potential needs for statistics.

3:30 PM BREAK

3:50 PM Discussion

4:30 PM End of day 1

May 9, 2017

8:30 AM Group Pairings and Overview of Charge

9:30 AM BREAK

10:00 AM Group Reporting

12:00 PM Lunch and Open Discussion

1:00 PM Research topics and the future of Digital Evidence

4:00 PM Conclusion

Exhibit 4

AGENDA 2017 All Hands Meeting

June 7-9, 2017 • Ames, IA

Wednesday, June 7, 2017

All day Airport shuttles from the Des Moines airport to Gateway Hotel can be booked through the registration or 10 days in advance of the meeting by emailing [email protected]

6:00 PM Welcome Reception South Prairie

Thursday, June 8, 2017

7:00 AM Registration, Check-In and Breakfast North/Central Prairie

8:00 AM Opening Remarks from Dr. Alicia Carriquiry North/Central Prairie

8:30 AM Project Reports- Statistical Foundations and Human Factors North/Central Prairie

8:30 - Anjali Mazumder (K, DD)
8:40 - Maria Cuellar (BB)
8:45 - William F. Eddy (AA)
8:50 - Dan Spitzner (HH)
8:55 - Cleotilde Gonzalez (M)
9:00 - Simon Cole (E)
9:05 - William Thompson (I)
9:10 - Brandon Garrett (U, W)
9:20 - Daniel Murrie (T)

9:30 AM Short Discussion

9:45 AM Break

10:00 AM Project Reports- Handwriting, Fingerprints, Firearms and Toolmarks North/Central Prairie

10:00 - Hal Stern (G)
10:05 - Daniel Murrie (V)
10:10 - Joseph Kadane (Q)
10:15 - Karen Kafadar (X)
10:20 - William F. Eddy (O)
10:25 - Heike Hofmann (CC)

10:30 AM Short Discussion North/Central Prairie

10:45 AM Break

11:00 AM Project Reports- Shoeprint and Tread marks, Digital Forensics, Blood Pattern Analysis

11:00 - Charless Fowlkes (A)
11:05 - Jared Murray (II)
11:10 - Alicia Carriquiry (EE)
11:15 - Yong Guan (J)
11:20 - Jennifer Newman (D)
11:25 - Chris Galbraith (S)
11:30 - Daniel Attinger (P)

11:30 AM Short Discussion North/Central Prairie

11:45 AM Break

CSAFE All Hands Meeting 2017 Page 1

12:00 PM Buffet Lunch North/Central Prairie

1:00 PM Breakout Session 1 by Research Topic

Group 1 - Human Factors South Prairie
Project I - Evaluating Lay Perceptions of Forensic Evidence and Forensic Statistics (Thompson)
Project M - Human Factors in Visual Identification: A Cross-Cutting Research Proposal (Gonzalez)
Project T - Forensic Processing and Human Factors at Crime Laboratories (Murrie)
Project U - Research on Lawyers, Jurors, and the Evaluation of Forensic Evidence (Garrett)
Project W - Legal Education and Forensic Evidence (Garrett)

Group 2 – Statistical Foundations North/Central Prairie
Project K - Improving the Statistical Validity of Forensic Science Databases – Size, Relevance, Representativeness and Utility (Mazumder)
Project AA - Inverse Problems in Forensic Science – A Statistical Approach (Eddy)
Project BB - Blind Proficiency Testing: Designing a Methodology for Forensic Laboratories (Cuellar)
Project DD - Probabilistic Framework for the Evaluation of Complex Forensic Inference: Accounting for Uncertainty, Multiple Data Sources and Expert Judgement (Mazumder)
Project HH - Statistical Modeling Framework for Pattern Evidence (Spitzner)

Group 3 – Blood Pattern Analysis Conference Room 2
Project P – Combining Fluid Dynamics, Statistics and Pattern Recognition in Bloodstain Pattern Analysis, to Quantify Spatial Uncertainty and Remove Human Bias (Attinger)

Group 4 – Fingerprints North/South Meadow
Project Q - Developing a Statistical Foundation for Latent Print Comparison (Kadane)
Project V - Latent Fingerprint Proficiency Testing (Murrie)
Project X - Quality Metrics for Latent Fingerprints (Kafadar)
Project E - Analysis of Forensic Testimony and Reports (Cole)

2:45 PM Break

3:15 PM Breakout Session 2 by Research Topic

Group 5 – Firearms/Handwriting South Prairie
Project O - Developing Methods for Comparison of Cartridge Breechface Images (Eddy)
Project CC – Statistical and Algorithmic Approaches to Matching Bullets (Hofmann)
Project G - Towards a Score-Based Likelihood Ratio for Handwriting Evaluation (Stern)

Group 6 – Digital Forensics North/South Meadow
Project D - StegoDB: An Image Dataset for Benchmarking Steganalysis Algorithms (Newman)
Project J - Mobile App Forensic Analysis (Guan)
Project S - Statistical Methods for Change Detection Over Time in Digital Forensics Data (Smyth)

Group 7 – Shoeprints North/Central Prairie
Project A - Statistical Models for the Generation and Interpretation of Shoeprint Evidence (Fowlkes)
Project EE – Statistical and Algorithmic Approaches to Shoeprint Analysis (Carriquiry)
Project II - Understanding and Modeling the Probability Distribution of Accidental/Randomly Acquired Characteristics for Shoeprint Matching (Murray)

4:30 PM Training and Education General Session North/Central Prairie

5:30 PM Break and Transportation- Shuttles to transport guests Reiman Gardens

5:45 PM Poster set up (each project is strongly encouraged to present a poster)

6:00 PM Mixer and Poster Session Reiman Gardens

7:30 PM Dinner and Keynote Address from Hon. Barbara Hervey Reiman Gardens

Shuttles running from 7pm-9:15 pm to transport guests to Gateway Hotel

Friday, June 9, 2017

7:00 AM Breakfast North/Central Prairie

8:00 AM Reports from Groups North/Central Prairie

10:00 AM Break

10:30 AM Reports from Groups North/Central Prairie

12:00 PM Lunch and Closing Remarks North/Central Prairie

1:00 PM Adjourn

Airport shuttles from Gateway Hotel to the Des Moines airport can be booked through the registration or 10 days in advance of the meeting by emailing [email protected]
