Comprehensive single-nucleotide, indel, structural, and copy-number variant detection in human genomes with PacBio HiFi reads
Abstract #: 917418
William J. Rowell, Aaron M. Wenger, Armin Töpfer, and Luke Hickey
PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025

HiFi Reads and Circular Consensus Sequencing

Circular Consensus Sequencing. A linear template sequence is ligated to SMRTbell adapters. DNA polymerase synthesizes complementary sequences to both strands of the original linear template, leading to rolling circle sequencing and multiple passes of the original template. CCS uses the individual subreads to generate a highly accurate consensus sequence (HiFi read).

[Figure: circular consensus sequencing. Circularized DNA is sequenced in repeated passes; the polymerase reads are trimmed of adapters to yield subreads; consensus is called from subreads to give HiFi reads.]

https://ccs.how

Human Genomic Variation

[Figure: size classes of human genomic variation: SNVs (1 bp), indels (1-49 bp), and structural variants (≥50 bp), labeled 5 Mb, 3 Mb, and 10 Mb respectively.]

Human genomes differ at many scales, from single-nucleotide variants to large structural variants, which are few by count but contribute most of the basepair differences between two human genomes. Short-read sequencing gives a broad assay of variation, but it misses most structural variants and the small variants in difficult-to-map regions. HiFi reads provide a comprehensive view of variation of all classes, including in difficult regions of the genome.

Eichler, EE (2019) Genetic Variation, Comparative Genomics, and the Diagnosis of Disease, N Engl J Med. DOI: 10.1056/NEJMra1809315

HiFi Reads Detect Small Variants with High Precision and Recall

HIFI READS IDENTIFY AND PHASE VARIANTS IN DIFFICULT REGIONS

[Figure: STRC, GRCh37 15:43,891,619-43,911,196 (19 kb), HG002. Tracks: short reads; HiFi reads, haplotype 1; HiFi reads, haplotype 2.]

SMALL VARIANT CALLING WITH HIFI READS UTILIZES A STANDARD WORKFLOW

Map to reference (pbmm2) → Call variants (DeepVariant)

HG002                          Precision   Recall
30-fold HiFi, DeepVariant (DV)
  SNV                          99.97%      99.97%
  Indel, DV 0.8                96.90%      95.98%
  Indel, DV 1.0                98.94%      98.88%
30-fold NovaSeq, GATK
  SNV                          99.85%      99.88%
  Indel                        99.37%      99.16%

• HiFi reads match short reads for SNV calling.
• Indel performance has improved rapidly and further improvements are expected.

https://github.com/google/deepvariant
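The precision and recall figures reported for small variants follow the usual benchmarking definitions. A minimal sketch, assuming the calls have already been compared against a truth set and tallied as true positives, false positives, and false negatives; the function name and the counts are hypothetical and are not taken from the poster:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from benchmark counts.

    tp: calls that match a truth-set variant
    fp: calls with no matching truth-set variant
    fn: truth-set variants that were not called
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one call and one truth-set variant")
    precision = tp / (tp + fp)  # fraction of calls that are correct
    recall = tp / (tp + fn)     # fraction of true variants that are found
    return precision, recall


# Illustrative counts only (not poster data).
p, r = precision_recall(tp=990, fp=10, fn=20)
print(f"precision={p:.2%} recall={r:.2%}")
```

Reporting both numbers matters because a caller can raise one at the expense of the other: filtering more aggressively removes false positives but also discards true variants.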

HiFi Reads Excel at Detecting Structural Variants

HIFI READS SPAN LARGE INSERTIONS AND DETECT VARIANTS IN REPETITIVE REGIONS

[Figure: GRCh37 13:112,993,400-112,994,200 (800 bp), HG002, with a 328 bp label. Tracks: HiFi reads, haplotype 1; HiFi reads, haplotype 2; short reads.]

STRUCTURAL VARIANT CALLING WITH HIFI READS UTILIZES A STANDARD WORKFLOW

Map to reference (pbmm2) → Call variants (pbsv)

HG002                                  Precision   Recall
30-fold HiFi, pbsv
  Deletions                            96.7%       95.0%
  Insertions                           96.0%       94.9%
30-fold NovaSeq*, Manta (DRAGEN 3.5)
  Deletions                            94.0%       70.1%
  Insertions                           95.3%       54.7%

*https://www.linkedin.com/pulse/dragen-35-out-rami-mehio/

https://github.com/PacificBiosciences/pbsv

Extended pbsv to Call Copy-Number Variants

PBSV COMBINES READ CLIPPING AND READ DEPTH TO CALL COPY-NUMBER VARIANTS

1) Determine genome-wide coverage: median coverage of non-gap positions at mapping quality 60.
2) Identify candidate CNV breakpoints: positions with multiple clipped reads.
3) Evaluate coverage between adjacent candidate breakpoints: calculate z-score vs Poisson expectation.

DETECTS DUPLICATIONS THAT ARE TOO LONG TO SPAN WITH INDIVIDUAL READS

[Figure: GRCh37 6:220,626-410,470 (189 kb), HG001. Tracks: HiFi reads; pbsv CNV; Genes; Segdups; Repeats.]

HiFi Reads Characterize Pathogenic Variant in Mendelian Disease

FAME2 DISEASE COHORT

• Familial Adult Myoclonic Epilepsy (FAME) is characterized by 1) myoclonic tremor and 2) myoclonic or tonic-clonic seizures.
• FAME2 is linked to chr2.
• WGS identified a repeat expansion intronic to STARD7 in 158/158 individuals.

HIFI READS CHARACTERIZE A COMPLEX REPEAT IN ONE AFFECTED INDIVIDUAL

[Figure: HiFi sequence across the repeat expansion, with unique flanking sequence on both sides. Annotated repeat blocks: TAAAA/TTTTA × 388 (1,942 bp) and TGAAA/TTTCA × 274 (1,370 bp).]

Corbett, MA (2019) Intronic ATTTC repeat expansions in STARD7 in familial adult myoclonic epilepsy linked to chromosome 2, Nat Commun. doi:10.1038/s41467-019-12671-y.
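The read-depth part of the copy-number workflow (a genome-wide median coverage, then a z-score against a Poisson expectation for each interval between candidate breakpoints) can be sketched as follows. This is an illustration under stated assumptions, not pbsv's implementation: the function names, the per-base depth input, the choice to treat an interval's mean depth as a single Poisson-like observation, and the toy data are all hypothetical.

```python
import math
from statistics import median


def genome_median_coverage(depths):
    """Median per-base depth.

    Assumption: `depths` has already been restricted to non-gap
    positions and counts only reads at mapping quality 60.
    """
    return median(depths)


def interval_z_score(interval_depths, expected_depth):
    """z-score of an interval's mean depth against a Poisson expectation.

    Assumption: the mean depth is treated as one Poisson-like
    observation whose mean and variance both equal `expected_depth`,
    so z = (observed - expected) / sqrt(expected).
    """
    if expected_depth <= 0:
        raise ValueError("expected depth must be positive")
    if not interval_depths:
        raise ValueError("interval has no positions")
    observed = sum(interval_depths) / len(interval_depths)
    return (observed - expected_depth) / math.sqrt(expected_depth)


# Toy data: 30-fold background with a 100-position stretch at 45-fold,
# roughly what a heterozygous duplication would look like.
genome = [30] * 900 + [45] * 100
expected = genome_median_coverage(genome)
dup_z = interval_z_score([45] * 100, expected)
flat_z = interval_z_score([30] * 100, expected)
print(expected, round(dup_z, 2), round(flat_z, 2))
```

A large positive z-score between two clipped-read breakpoints is consistent with a duplication and a large negative one with a deletion; where to set the threshold is a tuning choice that this sketch leaves open.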

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.