Forensic Genetics: Research Projects and Standards Production

Peter M. Vallone, Ph.D.
Leader, Applied Genetics Group
Forensics at NIST 2020
November 5, 2020

Disclaimer

• Points of view in this document are those of the author and do not necessarily represent the official position or policies of the U.S. Department of Commerce

• Certain commercial equipment, instruments, and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by NIST, nor does it imply that any of the materials, instruments, or equipment identified are necessarily the best available for the purpose.

• All work presented has been reviewed and approved by the NIST Research Protections Office.

Forensic Genetics – Forensic DNA Typing Workflow

DNA Extraction → DNA Quantitation → PCR Amplification → Genotyping & Sequencing → Interpretation

Advancing technology and traceability through quality genetic measurements to aid work in Forensic and Clinical Genetics.

• Standards and data
• Applied research and assessing emerging technologies
• Community Engagement

Variations upon the polymerase chain reaction (PCR) technique such as rapid PCR, multiplex PCR, real-time PCR, and digital PCR are used to [...], assessing sequence, and provide quantitative information pertaining to an organism's [...].

Forensic Genetics Team

Margaret Kline will be retiring on November 20, 2020
Congratulations Margaret on a 35-year career at NIST
We’ll miss you!

Topics for today

SRM 2391 series

• SRM 2391c and 2391d – PCR-based DNA standard
• Used by the community to calibrate STR typing methods
• Purchased by practitioners and companies performing validations
• Further characterization of the samples and updates to the certificate

Figure: STR profile – certified allele calls

Markers included in the Certificate of Analysis

SRM 2391d Certified and Information Values

• Autosomal STR: 48 Markers + Amelogenin
• Y-STR: 31 Markers
• X-STR: 12 Markers
• Indels: 30 Markers
• INNULS: 20 Markers
• mtDNA: Whole Genome
• Identity SNPs: 94 ForenSeq, 124 Precision ID
• Ancestry SNPs: 54 ForenSeq, 168 Precision ID
• Phenotypic SNPs: 24 ForenSeq, 24 Precision ID

Platforms used for characterization

• Capillary Electrophoresis (CE) was performed with one instrument:
  • 3500xL Genetic Analyzer (Thermo Fisher)
• Next Generation Sequencing (NGS) was performed with two different instruments:
  • MiSeq FGx (Verogen)
  • Ion S5 XL (Thermo Fisher)

Autosomal STR Markers

Materials characterized using various commercial kits, typed by CE methods and NGS methods:
• Thermo Fisher CE STR kits
• Promega CE STR kits
• Qiagen Investigator CE STR kits
• Verogen NGS kit
• Thermo Fisher NGS kits
• Promega NGS kits

Table: autosomal STR markers (rows) covered by each marker set or kit (columns).
Columns: Certified Value, Information Value, CODIS 20, European Standard Set (ESS 12), Identifiler, Identifiler Plus, Identifiler Direct, MiniFiler, NGM, NGM SElect, NGM Detect, GlobalFiler, GlobalFiler Express, Verifiler, Verifiler Plus, Verifiler Express, PP16, PP16HS, PP18D, PP21, PP S5, PP CS7, PP ESX 17, PP ESX 17 Fast, PP ESI 17 Pro, PP ESI 17 Fast, PP Fusion, PP Fusion 6C, PP VersaPlex 27PY, HDplex, ESSplex SE Plus, 24plex QS, 24plex GO!, ForenSeq, Precision ID GF, PowerSeq 46GY

Marker list: D1S1656, D1S1677, D2S1338, D2S441, D2S1360, D2S1776, D3S1358, D3S1744, D3S4529, D4S2366, D4S2408, D5S818, D5S2500, D5S2800, D6S474, D6S1043, D7S820, D7S1517, D8S1132, D8S1179, D9S1122, D10S1248, D10S2325, D12S391, D12ATA63, D13S317, D14S1434, D16S539, D17S1301, D18S51, D19S433, D20S482, D21S11, D21S2055, D22S1045, CSF1PO, F13A01, F13B, FESFPS, FGA, LPL, Penta C, Penta D, Penta E, SE33, TH01, TPOX, vWA

35 Certified Autosomal STR Markers
13 Information Autosomal STR Markers

Y-STR Markers

• Thermo Fisher CE STR kits
• Promega CE STR kits
• Qiagen Investigator CE STR kits
• Verogen NGS kit
• Thermo Fisher NGS kits
• Promega NGS kits

Table: Y-STR markers (rows) covered by each kit (columns).
Columns: Certified Value, Information Value, Yfiler, Yfiler Plus, PowerPlex Y23, PP Fusion, PP Fusion 6C, PP VersaPlex 27PY, GlobalFiler, GlobalFiler Express, 24plex QS, 24plex GO!, ForenSeq, Precision ID GF, PowerSeq 46GY

Y-STR Marker List: DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS449, DYS456, DYS458, DYS460, DYS461, DYS481, DYS505, DYS518, DYS522, DYS533, DYS549, DYS570, DYS576, DYS612, DYS627, DYS635, DYS643, Y-GATA-H4, DYS387S1

28 Certified Y-STR Markers
3 Information Y-STR Markers

Information for additional marker systems

Support the adoption of new markers and technology platforms

• Mitochondrial genome sequence

• Identity SNPs – for degraded samples

• Ancestry SNPs – biogeographical ancestry prediction

• Phenotypic SNPs – eye and hair color prediction

SNP allele calls for all components

How can SRM 2391d be used in YOUR lab?

• To meet the FBI Quality Assurance Standards: QAS 8.4

• Validation Studies: instrument, commercial kit, and software
  • Developmental and Internal Validations
  • Known, well-characterized samples for forensic marker systems

• Make NIST traceable materials: http://ts.nist.gov/traceability/

Establishing Traceability to NIST SRM 2391d

• Traceability requires the establishment of an unbroken chain of comparisons to stated references: http://ts.nist.gov/traceability/
• In the case of DNA testing with STR markers, the reference material is SRM 2391d
• Materials deemed traceable to NIST-created materials must have a record associated with them

‘Lot’ of DNA material → run against SRM 2391d → everyday use calibrant

Contact [email protected] for traceability questions

Notes for SRM 2391c and Mitochondrial SRMs

• For those still using SRM 2391c (no longer being sold), the certificate expiration date has been extended through February 3, 2022

• This will be the final extension; after that, SRM 2391d must be used

• SRMs 2392 and 2392-I (mitochondrial DNA sequencing) will not be replaced – use SRM 2391d

SRM 2372a (Human DNA Quantitation Std)

• SRM 2372a – Human DNA Quantitation Standard
• Used to calibrate qPCR methods and commercial DNA standards
• Digital PCR used for value assignment
• Mitochondrial to nuclear DNA ratio information included

Components: Female, Male, 1:3 M/F

https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.260-189.pdf

Digital PCR

Partitioning of DNA targets into individual chambers or droplets

A standard curve is not needed

Positives (at least 1 copy of the target)

Negatives (0 copies of the target)
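The counting idea on this slide can be sketched in a few lines. Because a positive partition may hold more than one copy, digital PCR analyses commonly apply a Poisson correction to the fraction of negative partitions rather than counting positives directly. The droplet count, droplet volume, and function names below are illustrative assumptions, not values from the SRM 2372a work.

```python
import math


def copies_per_partition(n_positive: int, n_total: int) -> float:
    """Mean copies per partition (lambda) from the fraction of negatives.

    Poisson model: P(negative) = exp(-lambda), so lambda = -ln(negatives / total).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)


def copies_per_microliter(n_positive: int, n_total: int,
                          partition_volume_ul: float) -> float:
    """Target concentration in the reaction, in copies per microliter."""
    return copies_per_partition(n_positive, n_total) / partition_volume_ul


# Hypothetical run: 20,000 droplets of 0.85 nL each, 5,000 of them positive.
lam = copies_per_partition(5_000, 20_000)
conc = copies_per_microliter(5_000, 20_000, partition_volume_ul=0.85e-3)
print(f"lambda = {lam:.4f} copies per partition")
print(f"concentration = {conc:.1f} copies/uL of reaction")
```

No standard curve enters the calculation: only the partition counts and the partition volume are needed.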

dPCR is counting accessible, amplifiable targets

Assigned concentration values

Figure: the assigned values (with error) for the Components in SRM 2372a. Value and uncertainty based on ten unique digital PCR assays. Y-axis: [DNA] ng/µL (25 to 65); x-axis: SRM 2372a Component (A, B, C); annotation: 10%.

SRM 2372a includes the ratio of mitochondrial to nuclear haploid

mtDNA/nDNA ratio for three mitochondrial quantification assays optimized for dPCR

Developing Y-specific digital PCR assays

STRBase and STRBase 2.0

https://strbase-b.nist.gov/
https://strbase.nist.gov/

STRBase 2.0

Updated navigation of STR fact sheets and variant alleles

Updated topic pages for:
• SNP markers
• Mitochondrial DNA
• Insertion-Deletion markers
• Forensic SRMs
• NIST Interlaboratory studies
• Population data
• Educational resources

https://strbase-b.nist.gov/
• Please visit and give us any feedback: [email protected]

Forensic DNA Open Dataset

• Periodic requests for raw CE (.HID) or sequencing files (.FASTQ)
  • From academics, researchers, software vendors, etc.
• Goal: make data freely available for learning/training purposes
• Source: Eleven single source samples previously screened as SRM candidates (not NIST population samples or SRM samples)

https://data.nist.gov/od/id/mds2-2157

Email [email protected] for questions/ideas

Rapid DNA Maturity Assessment

What is “Rapid DNA”? A fully automated instrument capable of generating a DNA profile from a swab (‘swab in – profile out’)

The maturity assessment was an interlaboratory study assessing DNA typing success and accuracy results on three Rapid DNA platforms

Twenty swabs were provided to each participant (crime labs, police agencies, vendors)

The results support the implementation of Rapid DNA instrumentation at the booking stations for single source samples

Stakeholders: FBI laboratory, DNA databasing labs, booking stations, Rapid DNA community, ANDE, Thermo Fisher

Assessment Scheme

NIST provides 20 reference buccal swabs to each participant → Participant runs (RH-ID GFE, RH200 GFE, or ANDE FlexPlex) → Data transferred back to NIST via electronic format → NIST reports CODIS 20 success rate for all data combined (% success)

• Participants may choose one chemistry per 20 NIST provided swabs
• Additional packages may be requested

Profiles from high quality single source samples generated by Rapid DNA instruments

Genotyping Success: Rapid DNA Analysis

Figure: genotyping success (%) for CODIS 20 and Full Profile; each data point is an independent instrument. CODIS 20: 85%; Full Profile: 80%

Genotyping Success: Modified Rapid DNA Analysis

Figure: genotyping success (%) for CODIS 20 and Full Profile; each data point is an independent instrument. CODIS 20: 90%; Full Profile: 90%

Examination of Front-End Methods in DNA Typing

• Problem: Assess the amount of sample loss during the extraction
• Low extraction efficiency could result in overall lower sample quantity
• May fail to yield full STR profiles or minor components in mixtures

Starting DNA amount: 0.5 ng → Extraction (sample loss) → Ending DNA amount: 100 pg

Methods for determining extraction efficiency and sample loss vary

Absolute Extraction Efficiency

Absolute Extraction Efficiency = DNA Recovered / Original Amount of DNA

Measured by digital PCR
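As a small worked instance of the ratio above, the sketch below uses the earlier slide's illustration (0.5 ng going into extraction, 100 pg coming out). The function name is an assumption introduced here; both quantities are taken to be measured in the same units.

```python
def absolute_extraction_efficiency(recovered_ng: float, input_ng: float) -> float:
    """DNA recovered divided by the original amount of DNA (same units)."""
    if input_ng <= 0:
        raise ValueError("input amount must be positive")
    return recovered_ng / input_ng


# Illustration from the earlier slide: 0.5 ng in, 100 pg (0.100 ng) out.
input_ng = 0.5
recovered_ng = 0.100
eff = absolute_extraction_efficiency(recovered_ng, input_ng)
loss_pg = (input_ng - recovered_ng) * 1000.0

print(f"efficiency = {eff:.0%}")          # fraction of the input that was recovered
print(f"sample loss = {loss_pg:.0f} pg")  # amount lost during extraction
```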

Offers the ability to evaluate individual extraction processes and their efficiency independent of another method

DNA Sources

• Component A of SRM 2372a: Human DNA Quantitation Standard
  • Known concentration of 49.8 ng/µL
  • Determined by digital PCR
• Freshly collected whole blood
  • Known WBC of 4.6 × 10^3 per µL
  • WBC reported by blood bank
• Washed cell suspension in dPBS
  • Known cell count of 1 × 10^6 per mL
  • Determined by flow cytometry

DNA Extraction Methods

Phenol Chloroform (Organic)
• Often referred to as the “gold standard”
• Proteinase K digestion of the cells
• Equal volumes of Phenol Chloroform added
• Phase lock gel tubes used for promoting separation
• DNA was precipitated with Ethanol and resolubilized with TE-4 buffer

QIAamp Spin Columns
• Manual method commonly used in forensic DNA laboratories
• Silica columns for collection of DNA
• Elution in proprietary buffer
  • Similar to TE-4

Qiagen EZ1 Advanced XL
• Robotic purification instrument
• Cell lysis takes place on the benchtop in a thermomixer
• Purification with paramagnetic bead collection
• Elution in TE-4

Extraction methods available in our lab

Four DNA input amounts were tested in replicates of five for each extraction method

DNA Source      Amount (ng)   # of Cells   Uncertainty (± # Cells)   # of Replicates
Extracted DNA   50            8,333        833                       5 per amount (20 per DNA Source)
                20            3,333        333
                10            1,667        167
                5             781          78
Cells           38            6,250        313                       5 per amount (20 per DNA Source)
                19            3,125        156
                9             1,563        78
                5             781          39
Blood           276           46,000       2,300                     5 per amount (20 per DNA Source)
                138           23,000       1,150
                28            4,667        233
                14            2,333        117

60 Samples per DNA Source
60 Samples per Extraction Method
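The cell counts in the table appear consistent with roughly 6 pg of DNA per diploid human cell (50 ng corresponds to 8,333 cells), and the Extracted DNA uncertainties with 10% of the cell count. The sketch below reproduces a few rows under that assumption; the constant and function name are introduced here for illustration. Rows whose amounts are rounded in the table (for example 5 ng listed with 781 cells) will not reproduce exactly.

```python
PG_PER_DIPLOID_CELL = 6.0  # assumption: about 6 pg of DNA per diploid human cell


def ng_to_cells(amount_ng: float) -> float:
    """Approximate number of diploid cells that carry the given DNA mass."""
    return amount_ng * 1000.0 / PG_PER_DIPLOID_CELL


# Extracted DNA rows, with the 10% uncertainty the table appears to use.
for ng in (50, 20, 10):
    cells = ng_to_cells(ng)
    print(f"{ng:>3} ng ~ {cells:,.0f} cells (+/- {0.10 * cells:,.0f})")

# Blood rows follow the same conversion.
for ng in (276, 138, 28, 14):
    print(f"{ng:>3} ng ~ {ng_to_cells(ng):,.0f} cells")
```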

Digital PCR was used to determine the concentration post-extraction

The efficiencies of the extraction methods are shown (figure; x-axis: Extraction Method)

Repeating – re-optimizing between methods and DNA sources

Romsos, E.L. & Vallone, P.M. (2019) Estimation of extraction efficiency by droplet digital PCR. Forensic Sci Int Genet Suppl Ser. DOI: 10.1016/j.fsigss.2019.10.072

DNA Mixture Interpretation

LR system

DNA Extraction → DNA Quantitation → PCR Amplification → Genotyping & Sequencing → Interpretation

Statistics measurement: assign a strength of evidence

Likelihood ratio (LR): LR = Pr(E | Hp, I) / Pr(E | Hd, I)

Hp: the DNA from the POI IS in the mixture
Hd: the DNA from the POI IS NOT in the mixture
I: background information
E: evidence
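A minimal numeric sketch of the likelihood ratio defined on this slide follows. The two probabilities are made-up inputs; in practice they would come from a probabilistic genotyping model, which is not reproduced here.

```python
import math


def likelihood_ratio(p_e_given_hp: float, p_e_given_hd: float) -> float:
    """LR = Pr(E | Hp, I) / Pr(E | Hd, I)."""
    if p_e_given_hd <= 0:
        raise ValueError("Pr(E | Hd, I) must be positive")
    return p_e_given_hp / p_e_given_hd


# Hypothetical probabilities of the evidence under each proposition.
lr = likelihood_ratio(p_e_given_hp=1e-12, p_e_given_hd=1e-18)
log10_lr = math.log10(lr)

# log10(LR) > 0 favors Hp (POI is in the mixture); log10(LR) < 0 favors Hd.
print(f"LR = {lr:.3g}, log10(LR) = {log10_lr:.2f}")
```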

• Examine methods to assess the performance of LR systems using publicly available ground truth data (mixture profiles)
  Alfonse, L.E., et al. A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt. Forensic Sci. Int. Genetics 32, 62-70.

• Examine the similarities and differences between the LR systems
  https://www.strmix.com/
  http://www.euroformix.com/

Discrimination power of LR Systems using Hp true & Hd true LR distribution

Figure: Log10(LR) Distribution by PG model, NOC, & Propositions. Panels: 2 Person and 3 Person. Series: Hp true STRmix, Hd true STRmix, Hp true EFM, Hd true EFM (known contributor vs. known non-contributor). Y-axis: Log10(LR).

Log10(LR) Distribution by Software & Mixture Ratios (2P)

Figure: Log10(LR) Distribution for 2P by Software & Proposition. Mixture ratios: 1:1, 1:2, 1:4, 1:9, each with STRmix and EFM. Series: Hp true STRmix, Hd true STRmix, Hp true EFM, Hd true EFM. Y-axis: Log10(LR).

Log10(LR) Distribution by Software & Treatment (2P)

Figure: Log10(LR) Distribution for 2P by Software & Proposition. Treatments: Pristine, DNase I, UV, Sonication, Inhibition, each with STRmix and EFM. Y-axis: Log10(LR).

Global profile Log10(LR) from 2P and 3P for Hp true

Figure: EFM Log10(LR) plotted against STRmix Log10(LR), one panel each for 2P and 3P, with bands marking a factor of 10^2, 10^4, and 10^6.

Global profile Log10(LR) from 2P for Hp true

Figure: EFM Log10(LR) plotted against STRmix Log10(LR) for 2P, with bands marking a factor of 10^2, 10^4, and 10^6.

Degree of agreement between LR systems

Figure: histogram of differences in log scale. Y-axis: relative frequency × 100 (0 to 80); n = 310.
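One way to summarize the agreement shown in these plots is to take the difference in log10(LR) between the two systems for each profile and report the share of profiles that fall within each band; a factor of 10^2 in LR corresponds to a difference of 2 on the log10 scale. The sketch below uses invented values, not the study data, and the function name is an assumption.

```python
def agreement_within(log10_lr_a, log10_lr_b, thresholds=(2, 4, 6)):
    """Fraction of paired results whose |log10(LR) difference| <= each threshold."""
    if len(log10_lr_a) != len(log10_lr_b):
        raise ValueError("both systems must report the same number of results")
    diffs = [abs(a - b) for a, b in zip(log10_lr_a, log10_lr_b)]
    return {t: sum(d <= t for d in diffs) / len(diffs) for t in thresholds}


# Hypothetical log10(LR) values for the same five Hp-true comparisons.
strmix_log10_lr = [12.1, 8.4, 15.0, 3.2, 20.5]
efm_log10_lr = [11.5, 10.9, 14.2, 8.1, 20.0]

agreement = agreement_within(strmix_log10_lr, efm_log10_lr)
for threshold, fraction in agreement.items():
    print(f"within a factor of 10^{threshold}: {fraction:.0%}")
```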

• In the process of finalizing 4-person mixtures
• Investigating the sources of variation
• 155 two-person mixtures
• 147 three-person mixtures
• 132 four-person mixtures

Working group and stakeholder engagement

• NIST: Human Factors and Mixture Review
• NIJ FLNTWG – Sequencing white paper
• NIJ Technology Working Group
• FBI: SWGDAM
  • Laboratory Operations, Sequencing, Body Fluid Identification
• FBI: Rapid DNA task groups
• ISFG: STR Nomenclature (Recommendations)
• NIST/ANSI Type 18 DNA Standard Working Group
• OSAC: Sequencing subgroup
• Pre-release testing for Promega, Thermo Fisher, QIAGEN

Looking forward into FY21

• Skin protein work with IARPA (genetically variant peptides – GVP)
• Assessing the landscape – needs for validation, further foundational research, and written standards
• Probabilistic Modeling for Forensic Interpretation of DNA Mixtures Using Next Generation Sequencing Data
• The application of AI/machine learning for DNA mixtures (NoC, deconvolution)
• Y SNP interlaboratory study (Thermo Fisher, 800+ Y SNPs)
• Hosting a continuing education day at NIST
• Develop digital PCR assays for the Y chromosome

Thank you for your attention

[email protected]

Forensic Genetics: Next Generation Sequencing

Katherine Butler Gettings, Ph.D.
Research Biologist, Applied Genetics Group
Forensics at NIST 2020
November 5, 2020

Disclaimer

• Points of view in this document are those of the author and do not necessarily represent the official position or policies of the U.S. Department of Commerce.

• Certain commercial equipment, instruments, and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by NIST, nor does it imply that any of the materials, instruments, or equipment identified are necessarily the best available for the purpose.

• All work presented has been reviewed and approved by the NIST Research Protections Office.

Short Tandem Repeats

Markers and peaks are separated by size (time)
Colors separated by fluorescent dye labels

Single Nucleotide Polymorphism

Allele 1: TAGGATCGTGCCGATGACTG
Allele 2: TAGGATCGTACCGATGACTG
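To make the single-base difference between the two alleles explicit, the short sketch below compares them position by position and labels a genotype as homozygous or heterozygous. The helper names are assumptions introduced for this example.

```python
allele_1 = "TAGGATCGTGCCGATGACTG"
allele_2 = "TAGGATCGTACCGATGACTG"


def differing_positions(a: str, b: str):
    """Return (1-based position, base in a, base in b) wherever the sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def zygosity(genotype: str) -> str:
    """Classify a genotype written as 'X/Y'."""
    first, second = genotype.split("/")
    return "Homozygous" if first == second else "Heterozygous"


snp_sites = differing_positions(allele_1, allele_2)
print(snp_sites)  # one site: position 10, G in allele 1 and A in allele 2

for genotype in ("A/A", "A/G", "G/G"):
    print(genotype, zygosity(genotype))
```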

Genotypes: A/A (Homozygous), A/G (Heterozygous), G/G (Homozygous)

Mitochondrial Genome

NGS Platforms @NIST

MiSeq FGx / RUO
• Verogen ForenSeq DNA Signature Prep
• Promega PowerSeq 46GY
• Promega PowerSeq CRM Nested System
• QIAseq Targeted Human Mitochondria Panel
• QIAgen GeneRead DNAseq Targeted V2 Panel (Identity SNP)

Ion Chef and Ion S5
• Precision ID GlobalFiler NGS STR Panel v2
• Precision ID Identity and Ancestry SNP Panels
• Ion AmpliSeq DNA Phenotyping Panel

MinION
• mtDNA whole genome and microbial genome

Population Sample Sequencing

When a match is made in a forensic case, allele frequencies are used to calculate the rarity of the DNA profile

              Length           Sequence
Genotype      8,9              [ATCT]8, [ATCT]9
Formula       2pq              2pq
Calculation   2*0.144*0.375    2*0.144*0.113
Frequency     0.108            0.033
Rarity        1 in 9.3         1 in 30.7
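The arithmetic in this comparison can be checked directly. The sketch below applies the heterozygote formula 2pq to the allele frequencies shown on the slide; the function name is an assumption, and which of the two alleles carries the lower sequence-based frequency is not stated in the source.

```python
def heterozygote_frequency(p: float, q: float) -> float:
    """Expected heterozygous genotype frequency, 2pq."""
    return 2.0 * p * q


length_based = heterozygote_frequency(0.144, 0.375)
sequence_based = heterozygote_frequency(0.144, 0.113)

for label, freq in (("Length", length_based), ("Sequence", sequence_based)):
    print(f"{label}: {freq:.3f} (1 in {1 / freq:.1f})")
```

With the sequence-based frequency the same genotype is roughly three times rarer, which is the point of the comparison.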

STR, Y-STR, X-STR and SNP

Illumina MiSeq FGx instrument, ForenSeq
• 27 autosomal STRs + 24 Y-STR + 7 X-STR + Amel
• 94 HID-SNPs + 56 ancestry SNPs + 22 phenotype SNPs

• 1036 Samples
• Sequenced in batches of 24 or 32
• 41 total sequencing runs in 2016

Figure (data release timeline): data sets 27 auSTR, 24 Y-STR, 7 X-STR (2020), SE33, and 94 HID-SNP, released in 2018 and 2020 or marked in progress

Population Data in NIST PDR

Mitochondrial Genome Population Sample Sequencing

659 samples attempted and 704 samples attempted (two sample sets)
Populations: African American, U.S. Caucasian, and U.S. Hispanic in both sets, plus Native American and Asian

Cassandra Taylor, Kimberly Sturk-Andreaggi, Charla Marshall

Project Aims

Generate high-quality data
• CLC Bio Workbench Plugin
  • “AQME” developed by AFDIL
• Double review of variant calls
• Confirmation of heteroplasmy < 10 %

Phylogenetic QC by EMPOP
• International mtDNA database
• Identify unlikely variants

Large dataset of mtGenomes
• 1,327 total passed QC
• Searchable in EMPOP
• Enable match statistics (mtGenome)

Long-PCR Workflow

Figure: circular mtGenome map (origin at 16,569/1) with positions 2,499; 2,669; 10,672; and 10,837 marked

PCR
• Two long amplicons ~ 8.5 kb each
• QC +/- on AATI Fragment Analyzer

Library Preparation
• Fragmentation (enzymatic) of PCR products
• Ligate adaptors/barcodes
• QC & Quantitate libraries on AATI F/A

Sequencing
• Run on MiSeq FGx

Data Analysis
• Align to reference genome (rCRS)
• Dual Review QC of data

Heteroplasmy Observed

Dataset   Total Individuals   Total PHPs   Individuals with PHPs   Individuals with 1 PHP   Individuals with 2 PHPs   Individuals with 3 PHPs
COAF      112                 37           31 (28 %)               26                       4                         1
COCN      112                 41           30 (27 %)               20                       9                         1
COHS      109                 36           27 (25 %)               20                       5                         2
NTAF      256                 77           60 (23 %)               43                       17                        0
NTCN      260                 92           77 (30 %)               65                       10                        2
NTHS      138                 53           43 (31 %)               34                       8                         1
DSAS      169                 62           54 (32 %)               49                       2                         3
DSNA      171                 48           43 (25 %)               38                       5                         0
All       1327                446          365 (28 %)              295                      60                        10

Population Pairwise Fst Comparisons

• Similar population ‘samplings’
  • African American and U.S. Caucasian
  • Homogeneous across geographic samples
• Differences in U.S. Hispanics
  • Could be due to site of collection
  • Western U.S. vs Eastern

Combining Marker Types

We calculate Match Statistics by multiplying allele frequencies across markers

• Requires markers to be independent

Sequencing allows typing more markers and different marker types

Laboratories need guidance on calculating the appropriate match statistics
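The multiplication described on this slide (often called the product rule) can be sketched as follows. The per-locus genotype frequencies are hypothetical, and the result is only meaningful if the markers are independent, which is the requirement stated above.

```python
import math


def profile_frequency(genotype_frequencies) -> float:
    """Multiply per-marker genotype frequencies (valid only for independent markers)."""
    return math.prod(genotype_frequencies)


# Hypothetical genotype frequencies at four independent markers.
per_locus = [0.108, 0.05, 0.02, 0.075]
combined = profile_frequency(per_locus)

print(f"combined profile frequency = {combined:.3g}")
print(f"approximately 1 in {1 / combined:,.0f}")
```

If two markers are in linkage disequilibrium, multiplying their frequencies as above would misstate the rarity of the profile, which is why the independence of newly combined marker types needs to be evaluated.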

Evaluating Linkage Disequilibrium in auSTR and IISNP loci in NIST 1036

• Collaboration with Andreas Tillmar, National Board of Forensic Medicine

27 Autosomal STRs in ForenSeq
• CODIS 13: TPOX, D3S1358, FGA, D5S818, CSF1PO, D7S820, D8S1179, TH01, vWA, D13S317, D16S539, D18S51, D21S11
• CODIS 7: D1S1656, D2S441, D2S1338, D10S1248, D12S391, D19S433, D22S1045
• Commonly Used: D6S1043, Penta D, Penta E
• NIST Minis: D4S2408, D9S1122, D17S1301, D20S482

94 IISNPs in ForenSeq
• 50 from SNPforID: LD tests for pairs of SNPs on same chromosome demonstrated no significant deviation from expectations
• 44 from Kidd: 45 SNPs are spread across the 22 human autosomes and show very loose or no linkage with each other

LD in casework

When LD is detected, ideally:
• Rule out technical issues by testing on different platforms/assays
• Confirm with multiple sample sets from same population, and multiple test methods

Designing panel/assay: Evaluate LD, eliminate loci as needed based on informativeness

Implementing established panel/assay:
• Best – Determine haplotype frequency for pair or block
  • for polymorphic loci the sample size would be unfeasible
• Alternative – Exclude one of the two markers during validation
  • Keep the more informative, similar to assay design
• Problematic – Exclude one of the two markers case-by-case
  • RMP vs Kinship

STRSeq Catalog of Sequences

Figure: D12S391 Alleles by Lab (UNT - 80, USC - 96, NIST - 105, KCL - 96)

https://www.ncbi.nlm.nih.gov/bioproject/380127

STRSeq in Population Data

STRAND working group
align|name|define

Our mission is to harmonize related efforts across member laboratories:

• STRidER: STR sequence quality control
• Population sample sequencing
• STRSeq: Catalog of sequences
• Forensic STR Sequence Guide
• STRait Razor: Bioinformatic freeware

and to characterize additional STR loci present in the genome which may be useful for forensic purposes in the future.

April 2019: STR Nomenclature Meeting
Spring 2020: ISFG EB approved STRAND WG proposal for DNA Commission on STR Sequence Nomenclature Recommendations

STR Nomenclature Meeting Report

Defined Coordinates
• PowerSeq 46GY: GeneMarker NGS Range
• ForenSeq DNA Signature Prep Kit: UAS Flanking Region Report Range
• Precision ID GlobalFiler NGS v2: Converge .bed file range

Supplementary File - 24 auSTRs

Comparison of conventional CE versus NGS-STR genotyping workflows

Riman et al. 2020

Figure: Data Analysis by CE. D21S11 (29, 31) alleles; y-axis: RFU.

Figure: Data Analysis by NGS. D21S11 (29, 31); y-axis: coverage in log scale. Sequence categories: A = true (known) allele sequences, S1 and S2 = back stutter sequences (S1 primary), N = noise sequences not attributed to known allele or stutter sequences. Number of unique sequences by category: 50, 278, 39, and 257 types. Total Types of Sequences = 646; Locus Coverage = 6949.

Average library concentration yield versus starting amount of DNA template

Riman et al. 2020

Average Locus Coverage relative to DNA template amounts
Normalized vs. Non-normalized libraries

Riman et al. 2020

Impact of DNA template amount on the distribution of known allele, stutter, and noise sequences

Riman et al. 2020

Y-SNP Sequence Study with U.S. Population Samples

Worldwide collaboration with Erasmus Medical Center Rotterdam
Call for Participants at ISFG 2019

Project Title: Bringing Forensic Y-Chromosome Haplogrouping to the Next Resolution Level by using Targeted Massively Parallel Sequencing

Study Organizers: Manfred Kayser and Arwin Ralf, Erasmus MC
Timeframe: Dec 2020 (extended due to COVID-19)

Provided by Organizers
• Protocol
• Y-SNP Panel

Reference Paper

Description of Study – Sequencing Y-SNP markers

To obtain worldwide population frequency data:
• 884 Y-SNP Markers sequenced on Ion S5
• Infer 640 Y haplogroups
• ≥192 male samples per population

NIST 1032 male samples:
• 359 U.S. Caucasians
• 341 African Americans
• 236 U.S. Hispanics
• 96 U.S. Asians (extra data as needed)

Previous Study with Erasmus: Rapidly Mutating Y-STR Markers with U.S. Population Male Samples

Future Directions
Implementation… what are the barriers?

• Vendors create/adapt assays and analysis methods
• Developmental validation
• Casework Laboratories evaluate cost/benefit
• Internal validation
• Sequence Nomenclature
• Proficiency Tests
• Probabilistic Genotyping for NGS
• Databases of the future

Future Directions
Implementation… what is needed?

• Vendors create/adapt assays and analysis methods
• Developmental validation
• Casework Laboratories evaluate cost/benefit
• Internal validation
• Sequence Nomenclature
• Proficiency Tests
• Match Statistics
• Probabilistic Genotyping for NGS
• Databases of the Future

Acknowledgements

[email protected]

Collaborators
• David Ballard, King's College London
• Martin Bodner & Walther Parson, Medical University of Innsbruck
• Jonathan King, University of North Texas Health Science Center
• Chris Phillips, University of Santiago de Compostela
• George Riley, National Center for Biotechnology Information
• Arwin Ralf & Manfred Kayser, Erasmus University Medical Center
• Cassandra Taylor, Kimberly Sturk-Andreaggi, Charla Marshall, AFDIL
• Andreas Tillmar, National Board of Forensic Medicine

Questions?
[email protected]

Funding
NIST Special Programs Office and the FBI Biometric Center of Excellence Unit: DNA as a Biometric.