PRODUCT TABLE APPLICATION NOTE Cat. No. Description Reactions 30XX16M.v5 Daicel Arbor Biosciences myBaits® Custom Methyl-Seq Kit 16 30XX48M.v5 Daicel Arbor Biosciences myBaits® Custom Methyl-Seq Kit 48 Accurate, Customizable, and Cost-effective 30XX96M.v5 Daicel Arbor Biosciences myBaits® Custom Methyl-Seq Kit 96 30024 Swift Biosciences Accel-NGS® Methyl-Seq Library Prep Kit* 24 Targeted 30096 Swift Biosciences Accel-NGS® Methyl-Seq Library Prep Kit* 96 Coupling the Swift Accel-NGS® Methyl-Seq Library Preparation and * Swift Methyl-Seq Indexing Kits complete the library preparation workflow. Daicel Arbor Biosciences myBaits® Custom Methyl-Seq Systems

ABSTRACT

Methylation sequencing is an important tool in understanding gene regulation during various processes, such as cell differentiation and disease progression, and is seeing increased use in diagnostic assays including oncology. Whole- methylation sequencing is ideal for target discovery with its unbiased and comprehensive single base resolution. Once specific genome regions of interest are identified, targeted methylation sequencing through hybridization capture prior to next generation sequencing (NGS) enables a more cost-effective methylation profiling solution that maximizes both the number of samples analyzed per experiment and the potential depth of sequencing for evaluating low frequency methylation signatures in complex samples (e.g., cell free DNA for liquid biopsy). In this application note, we demonstrate a highly accurate targeted methylation sequencing workflow using Swift Accel-NGS Methyl-Seq and Daicel Arbor Biosciences myBaits Custom Methyl-Seq systems. Through proprietary library preparation and hybridization chemistries, and a unique probe design algorithm, the system overcomes many inherent challenges to the efficient retrieval of specific -converted DNA sequences. Using a target set comprising the regions of over fifty -related genes, we demonstrate that the workflow generates highly complex methylation sequencing libraries with low input quantity capabilities, and also accurately represents methylation profiles in a highly specific, unbiased fashion.

Key features include: • Over 85% reads mapped to the genome following hybridization capture • Over 80% of reads on-target, representing 8000 to 9000-fold enrichment • Highly accurate genome-wide and site-specific methylation level measurement • Compatible with as little as 1 ng starting DNA input

INTRODUCTION hybridization capture, which can target thousands to millions of bases of specific genomic sequence in a single reaction, DNA methylation occurs at within CpG and other offers an extremely cost-effective solution that enables deeper dinucleotides and is a primary epigenetic mechanism controlling sequencing and larger studies using a greater number of a variety of biological processes, especially transcriptional samples. Here we demonstrate a highly efficient, accurate, regulation of cell development and various diseases, including and reproducible targeted methylation sequencing workflow cancer. Whole genome bisulfite sequencing (WGBS) has (Figure 1) that couples the Swift Accel-NGS Methyl-Seq become the gold standard in measuring methylation status library preparation with Daicel Arbor Biosciences myBaits genome-wide, as it enables methylation signatures to be Methyl-Seq hybridization system. We show how the system discovered in a wide range of sample types [1]. Once regions accurately reflects the region-wide as well as site-specific of specific interest are identified and require characterization methylation levels obtained with standard whole-genome for large sample sets, WGBS becomes a relatively expensive bisulfite sequencing, even in samples of mixed methylation option because of its comprehensive nature. Fortunately, states and low starting input.

arborbiosci.com | swiftbiosci.com | Page 1 SWIFT BIOSCIENCES

o Stran AGACTCGGTCGACGAGAT Accel-NGS Methyl-Seq Library Kit T C CGAGT C TCGTCGA C AT Sar Bottom Stran The Swift Methyl-Seq kit is based on patented Adaptase® nomic technology, which is a highly efficient and template- on-tat tosin sA independent adapter attachment chemistry to single stranded DNA. This enables the construction of high complexity libraries from single stranded bisulfite converted DNA to provide comprehensive genome coverage, and can support a range of samples including liquid biopsy/cfDNA often limited to low natration inputs (<10 ng). Because adapter titration is not required, aminat tosin raci an Bisfit high ligation efficiency is maintained at low input quantities onrsion for the Swift kit. Methods that ligate methylated adapters followed by bisulfite conversion lose library complexity due to bisulfite induced fragmentation of the completed libraries that limits support of low input applications, and methods utilizing enzymatic conversion also cannot support inputs <10 ng. i-omit ibrar r DAICEL ARBOR BIOSCIENCES mina Aatr myBaits Custom Methyl-Seq System With myBaits Custom Methyl-Seq, Daicel Arbor Biosciences has adapted its foundational myBaits technology to meet the Accel NGS® Methyl-Seq DNA Library Kit challenges of capturing -deaminated templates using pools of biotinylated RNA probes to hybridize genomic regions of interest. Traditional hybridization capture can tolerate several probe-target mismatches and still efficiently enrich target molecules. However, hybrid capture of cytosine-deaminated templates of unknown methylation mBaits rob art atr it state is faced with several unique challenges to the design Biotinat robs and execution of the experiment: 1) for a given target sequence, more probe sequences are required to accommodate a wide range of possible methylation states among the cytosines,

myBaits® Custom 2) GC content and thus probe-target hybridization strength is Methyl-Seq Kit significantly lower after conversion of many cytosines, and 3) genomic sequence complexity is also significantly lowered following cytosine deamination, reducing the mean specificity of probes to their intended targets.

Sncin The points above magnify the costs of both probe manufacturing an Anasis and sequencing, due in large part to overall lower potential specificity of a given probe design. The myBaits Custom Methyl-Seq system solves these problems in two ways. First, it uses a methylation-specific probe design algorithm that Figure 1. Simple and easy to implement. Schematic illustration of the simulates some but not all potential cytosine methylation Swift Accel-NGS Methyl-Seq and Daicel Arbor Biosciences myBaits configurations on both strands from the converted genomic Custom Methyl-Seq target methylation sequencing workflow. Starting templates; probes are also aggressively filtered for specificity substrate is sheared genomic DNA with a specific set of methylation states (one of several potential states are depicted). After bisulfite conversion and to both strands of a fully CpG-methylated genome sequence library preparation, probes designed to accommodate a wide range of simulation. Additionally, the myBaits Custom Methyl-Seq potential states are used in myBaits hybridization capture. system employs optimized target enrichment parameters to achieve high reads on-target rate with minimal locus dropout.

2 MATERIAL AND METHODS Target enrichment reactions used the myBaits version 5.0 High Sensitivity Protocol using two rounds of 63°C hybridization To demonstrate how the Swift-Daicel Arbor targeted and wash temperatures. Libraries were captured in singleplex methylation sequencing system successfully reproduces using 400 ng input per enrichment reaction, and in multiplex methylation states at multiple genomic loci, we selected the using 200ng input per sample and 8 samples per reaction. putative promoter regions of 50 cancer-associated genes Following capture, libraries were pooled in equimolar ratio and (total target size ~100 kb) as initial targets. A total of 20,003 sequenced on one Illumina MiSeq® lane using a 2 × 76 bp probes were designed and synthesized for these regions protocol with dual 8 nt indexing reads. using the Daicel Arbor Biosciences myBaits Custom Methyl-Seq probe design and filtration pipeline. Hereafter we focus our Raw reads were trimmed of adapter sequence and low-quality analysis on the regions probed by the design, which covers bases using Trimmomatic [2], and 15 nt from the 5’ end more than 92% of the initial intended promoter region space. of Read 2 were trimmed with cutadapt [3] as recommended to eliminate the majority of Adaptase tail sequence. To address whether the system is compatible with a range of Processed reads were aligned to the human genome (hg19) DNA inputs and methylation states, we coupled the myBaits kit using bismark [4] with default parameters. PCR duplicates with a number of sequencing libraries prepared with the Swift were removed using deduplicate_bismark, and methylation call Methyl-Seq Library Preparation Kit. The first series of libraries and CpG coverage were calculated with bismark_methylation_ used a range of bisulfite converted input DNA amounts (1 ng, extractor. We imported the methylation call files into R using 5 ng, and 100 ng) isolated from the NA12878 cell line (Coriell methylKit [5] and used coverage of 5 as a filtering parameter. Institute). Extensive WGBS data is readily available for this cell The Pearson correlation coefficient was then calculated line using the Swift kit, allowing us to determine the accuracy among replicates and between WGBS and enriched libraries of our site-specific methylation calls following target enrichment. in methylKit. The second set of libraries comprised 50 ng total input from different combinations of bisulfite-converted fully-methylated RESULTS and fully-nonmethylated DNA (Zymo Cat. Nos. D5014-2 and High Coverage with Minimal Sequencing D5014-1), simulating approximately 10%, 30%, 50%, 70%, and 90% genome-wide methylation levels (referred to simulated To evaluate the hybridization capture performance, we assessed libraries hereinafter). For library preparation, samples were key metrics including mapping rate, percentage of reads sheared to 350 bp by Covaris M220, followed by bisulfite on-target (specificity), and per-site depth of coverage in a conversion using Zymo EZ DNA Methylation-Gold (Cat. No. random subsample of 950K raw sequencing reads for all D5006). This kit converts unmethylated cytosines to samples. For NA12878, we obtained an average ~85% and results in single-stranded DNA further fragmented to unique mapping rate to the genome, and an average of 84.6%, 170-200 bp. Accel-NGS Methyl Seq libraries were prepared 84.0% and 76.7% raw reads on-target following enrichment according to manufacturer’s recommendations, and for of libraries with 100 ng, 5 ng and 1 ng DNA input respectively compatibility with hybridization capture, amplified using KAPA (Figure 2A). Compared to the 0.009% on-target rate HiFi HotStart ReadyMix (Cat. No. KK2602) for both pre- and measured in the WGBS data for this sample, this translates to post- hybridization capture library amplification steps. A roughly 8400-fold to 9300-fold enrichment. The mapping representative library from each simulated level was sequenced rate and the reads on-target rate did not change significantly with PE150 sequencing on a partial NovaSeq S4 flowcell to with reduced DNA input, indicating input-independent high directly measure the methylation level and evaluate reproducibility. specificity of the system.

84.000 84.667 79.35 93.47 A 76.667 B C 75 75 75

56.35 50 50 50

25 25 25 10.31 0.009 0 0 0 as Ainin to art ions

1 n 5 n 100 n BS 5 n 100 n -Sit it a t 5 5 n 100 n n-art Ara a t ibraris ibraris ibraris

Figure 2. High specificity and orders of magnitude target enrichment.Performance metrics: (A) percentage of on-target reads in enriched and WGBS libraries, (B) average per-site depth of coverage at on-target CpG sites at an even subsample of 950K read-pairs in the enriched libraries, (C) percentage of on-target CpGs with read depth > = 5 at the even subsample in the enriched libraries. Libraries were prepared with the Swift Accel-NGS Methyl-Seq Library Kit with varying input amounts and captured with the Daicel Arbor Biosciences myBaits Custom Methyl-Seq system.

3 A CpG Base Pearson Correlations (r)

nric ibraris it 5 n Startin nt nric ibraris it 100 n Startin nt

B BRCA1 SEPT9 CAMTA1 MTOR FAF1 TP53 RBM15 NDRG4 SETDB1 BRCA2 TDRD10

BCAT1 YY1AP1

SDC2 NTRK1

NCOA2 CDC73

MNX1 PPP1CB

KMT2C YPEL5

EZH2 SOS1

ACVR2A

MET

NFE2L2

RELN

TFPI2

BARD1 RSBN1L

PASK IKZF1 BS

FANCD2 HOXA13

SETD2 PMS2 5 n nt ibraris Aftr atr

MLLT4 PARP3 100 n nt ibraris Aftr atr BCLAF1 BAP1

TERC EPHA7

MAP3K7 KDR

SRSF3 BMP3

STK19 INTS12 HIST1H4E SNX25 MSH3 NSD1 TLX3TLX3 ARHGAP26

Figure 3. High accuracy and reproducibility of targeted methylation capture. (A) Histograms and scatter plots show the comparison of CpG methylation levels across the target regions between WGBS and capture libraries and among replicates. (B) Circos plot illustrating methylation patterns for the entire target regions in capture and WGBS libraries.

We achieved an average of 201× and 18× unique read coverage and WGBS data for this same cell line, both prepared with across all targeted sites, and an average of 79× and 10× read the Swift Methyl-Seq kit. This revealed high correlations coverage on targeted CpG sites in the capture libraries from of site-specific methylation quantification between WGBS 100 ng and 5 ng DNA input respectively (Figure 2B). This and capture libraries regardless of input amount (Pearson enabled methylation states to be called at over 57% and 90% correlation coefficientr > = 0.95 for 5 ng, and r > = 0.97 for of the target region CpG sites with read depth > = 5 for libraries 100 ng, Figure 3A). Remarkable correlations (r > = 0.95 for 5 ng, from 5 ng and 100 ng input, respectively (Figure 2C). This and r > = 0.99 for 100 ng, Figure 3A) were also observed indicates that on a single MiSeq run, a target of this size can be among technical replicates. These strong correlations manifest resolved in at least 24 samples of comparable starting quality. in high correspondence in regional methylation calls, as illustrated in Figure 3B. Taken together, this indicates that the Highly Accurate Site-Specific Methylation Quantification Swift Accel-NGS Methyl-Seq and Daicel Arbor Biosciences To assess the accuracy of our target methylation sequencing, myBaits Custom Methyl-Seq system accurately reproduces we compared the distribution of methylation levels across the methylation measurements of WGBS for a fraction of the the target regions between the enriched NA12878 libraries required sequencing depth.

4

5 Sin ti atr 31,938,800 bp 80.9% 2 82.6% 31,938,600 bp 82.1% 90 tat 1 83.5% 90 87.3 81.8% 2 31,938,400 bp 84.4% (A) Target-wide methylation levels in capture (A) Target-wide methylation levels in capture 83.3% 70 tat 1 84.8% 31,938,200 bp 70 66.7 77.3% 2 83.6% 82.2% 31,938,000 bp 50 tat 1 84.8% 85.9% Efficient Multiplexing Capabilities sequencing on indexed Hybridization capture is performed be pooled into single libraries, and multiple libraries can reactions to enable even more cost-savings enrichment that pooling demonstrate and increased throughput. To does not compromise assay libraries prior to enrichment obtained for libraries performance, we compared results 8 (“multiplex” when pooled together prior to enrichment results when enriched libraries per reaction) to the same observed no significant We individually (“singleplex”). or accuracy between efficiency, in sensitivity, differences (Figure 5), experiments and singleplex the multiplex Arborthereby demonstrating that the Swift-Daicel system has tremendous scalability potential. 50 2000 b 46.7 2 ibraris 88.1% Percentage of target CpGs at 5× or higher read depth in a 460K raw read-pair 31,937,800 bp

ibraris 85.6% 30 tat 1 88.1% 31,937,600 bp 86.4% 2 30 89.4% 29.3 84.7% 10 tat 1 31,937,400 bp 85.9% 89.6%

2 91.8% 10 31,937,200 bp 10.7 89.8% 100 n nt 1 91.7% 50.9% 31,937,000 bp 0 ] ] ] ] ]

2 75 50 25

59.5% art-i tation s tation art-i 0-100 0-100 0-100 0-100 0-100 [ [ [ [ [ 5 n nt 51.6%

1 10 30 50 70 90

53.8% ibrar tation s tation ibrar High sensitivity and fidelity of targeted methylation detection at any methylation levels. High sensitivity and fidelity of targeted Comparable performance when enriching pools of libraries.

0

25 50 75 rcnt tct it t 5 t it tct rcnt B A subsample per library when captured in pools of 8 (Multiplex) vs. individually (Singleplex). subsample per library when captured in pools of 8 (Multiplex) vs. individually (Singleplex). Figure 5. little bias towards methylated or non-methylated states in the little bias towards methylated or non-methylated states in the mixed samples on per-site basis. levels (i.e., 10 to 90% methylated) (Figure 4A). Methylation region for all enriched levels at sites along the entire target indicating 4B), (Figure consistent extremely also were libraries Arbor system following enrichment, we measured target-wide Arbor we measured target-wide system following enrichment, methylation levels for the libraries simulated and found that intended simulated these closely reflect the corresponding their . Therefore, it is critical that a hybridization it is critical that a hybridization their genomes. Therefore, can reproduce capture system for methylation sequencing Swift-Daicel the with this evaluate To accurately. ratios these Methylation Detection at All Levels Methylation Detection at All free cell or cells of mixtures contain samples biological Many in sites various at states methylation DNA different with High Sensitivity and Fidelity of Figure 4. (B) IGV screenshot showing the uniform distribution of methylation levels along the gene STK12 promoter locus libraries. Red lines indicate the simulated level. methylation levels. observed with the capture libraries at various

, , 6 Humana Genome

Bioinformatics

Trimmomatic: Trimmomatic: EMBnet. Journal EMBnet.

Bismark: a flexible Reactions 16 48 96 24 96 , 27(11), pp.1571-1572. DNA Methylation Protocols. Cutadapt adapter sequences removes Bioinformatics 13(10), pp.1-9. @ArborBio www.swiftbiosci.com [email protected] 1-734-330-2568 @SwiftBioSci www.arborbiosci.com [email protected] 1-734-998-0751

from high-throughput sequencing reads. high-throughput from 17(1), pp.10-12. and Andrews, S.R. (2011) F. Krueger, aligner and methylation caller for Bisulfite-Seq applications. M., Li, S., Garrett-Bakelman, F.E., Akalin, A., Kormaksson, A. and Mason, C.E. (2012)Figueroa, M.E., Melnick, R package for the analysis methylKit: a comprehensive profiles. of genome-wide DNA methylation Biology, Tost, J. ed. (2018) Tost, Press. M. and Usadel, B. (2014) A.M., Lohse, Bolger, data.a flexible trimmer for Illumina sequence 30(15), pp. 2114-2120.

twitter web email phone twitter 3. Martin, M. (2011) 4. 5. web email phone REFERENCES 1. 2. Custom Methyl-Seq Custom Kit Methyl-Seq Custom Kit Methyl-Seq Custom Kit ® ® ®

Methyl-Seq Kit* Library Prep Methyl-Seq Kit* Library Prep ® ® Daicel Arbor myBaits Biosciences Daicel Arbor myBaits Biosciences Accel-NGSSwift Biosciences Accel-NGSSwift Biosciences Description Daicel Arbor myBaits Biosciences 30096 30XX96M.v5 30024 30XX16M.v5 30XX48M.v5 Cat. No. Daicel and Daicel Arbor Biosciences are registered trademarks of Daicel Corporation. matter the quality or quantity of starting nucleic acid material. matter the quality or quantity of starting myBaits Custom Methyl-SeqCustom myBaits system a key tool for epigenetic potential any application, with particularly excellent of research biomarker discovery no and in cancer-associated prognostic samples with a mixture of methylation states, and achieve states, and achieve samples with a mixture of methylation This makes the remarkable specificity and sensitivity. Swift Accel-NGS Methyl-Seq and Daicel Arbor Biosciences solution. For a panel of 50 cancer-associated gene promoters, gene a panel of 50 cancer-associated solution. For to measure researcher we show that the system allows the target DNA methylation levels even for low input DNA Methyl-Seq of the Daicel Arbor optimized design features and Biosciences myBaits Custom Methyl-Seq systems provides targeted methylation sequencing an accurate and cost-effective CONCLUSION Accel-NGS Swift highly robust the of coupling unique The * Swift Methyl-Seq Indexing Kits complete the library preparation workflow. Kits complete the library preparation * Swift Methyl-Seq Indexing PRODUCT TABLE