Application Note DNA Resequencing/Variant ID on the 3130 Series Systems

DNA Resequencing and Variant Identification on the 3130 Series Genetic Analyzers

Introduction

Direct DNA sequencing is an accurate and proven technique for mutation detection. However, despite widespread use and accessibility, adapting this technique to gene resequencing projects can include challenges in primer design, PCR validation, data analysis, and data interpretation. In the course of designing more than 200,000 primer pairs and generating more than 18 million sequence reads, bioinformatics specialists and R&D scientists who worked on the Applera Genome Initiative gained valuable experience in primer design, as well as insight into the genomic complexities and technical challenges associated with them. The experiences gained from the initiative revolutionized DNA sequencing as a tool for the detection of human gene mutations.

Figure 1. The Applied Biosystems 3130 Series Genetic Analyzers are robust, reproducible systems that are fully automated from the moment each 96- or 384-well plate is placed on the instrument and the run is initiated. The systems provide continuous, unattended operation, from automated polymer loading and sample injection to separation, detection, and data generation. The instrument also includes several new enhancements, such as a detection cell heater that facilitates better thermal control and thus improves sizing precision, an Automated Polymer Delivery System that significantly reduces set-up time, and easy-to-use wizards for instrument operation.

The 3130 and 3130xl Genetic Analyzers, when combined with the Applied Biosystems VariantSEQr™ Resequencing System, reagents, and SeqScape® Software v2.5, provide the most robust and efficient resequencing system for customers who require low to medium throughput. Together, the components of this integrated system simplify the rate-limiting steps found in current resequencing protocols, and offer a complete, cost-effective solution for laboratories performing either large or small resequencing studies. It also revolutionizes DNA sequencing as a tool for the detection of human gene mutations.

The 3130 Series Genetic Analyzers

The 3130 Series Genetic Analyzers (Figure 1) are fully automated, high-performance, fluorescence-based, capillary electrophoresis systems that can analyze multiple samples simultaneously. Samples are sequenced in 35 minutes on the 16-capillary 3130xl system, or on the 4-capillary 3130 system with 3130 POP-7™ Polymer and the 36 cm capillary array. After the sequence data are collected, the KB™ Basecaller automatically processes them and provides LOR (Length-of-Read) greater than 500 base pairs (bp), with average base Quality Values greater than 20 (QV 20). Furthermore, the 3130xl system, using the UltraSeq36_POP7 run module, can efficiently sequence up to 41 runs (656 samples) in a 24-hour period, generating high-quality, high-resolution data with minimal hands-on time.

The 3130 Series Systems are designed for ease-of-use to maximize laboratory productivity while reducing the overall cost per sample. Now more than ever, researchers have the flexibility to choose one configuration for all their resequencing needs. The Automated Polymer Delivery System in the 3130 Series Systems allows automatic polymer loading, which minimizes hands-on time and maintenance while maximizing performance. The systems enable the use of 3130 POP-7 Polymer, not only for the 36 cm capillary array, but also for the 50 cm and 80 cm capillary arrays. Run configurations, specific for the 3130 POP-7 Polymer, incorporate a higher temperature through the detection cell heater, which yields less run-to-run variability and faster electrophoresis times than any other capillary electrophoresis system available in the market today.

The VariantSEQr Resequencing System for Low to Medium Throughput

The Applied Biosystems VariantSEQr Resequencing System consists of pre-designed resequencing sets, streamlined protocols, and a Project Template on a CD that contains the reference sequence for integrated data analysis with SeqScape Software v2.5 (Figure 2). This system is designed to work seamlessly with Applied Biosystems instruments, reagents, and software for the study of human genes and other target regions. The goal is to discover variants within genes that may correlate to disease development and response to drug treatment.

Experienced researchers identify primer design, validation, and downstream data analysis as the major bottlenecks in studies involving the resequencing of many genes. Researchers often spend weeks producing effective primers for both PCR and sequencing, especially when working with regions of high genomic complexity. The Applied Biosystems VariantSEQr™ Resequencing System, along with the 3130 Series Systems and SeqScape software, not only minimizes these bottlenecks but also eliminates the primer design and application validation processes. The proven technology of this entire system provides simplified, robust protocols along with automatic data analysis and reporting (Figure 3).

Figure 2. The VariantSEQr Resequencing System includes ready-to-pipette M13-tailed PCR primers and universal protocols for PCR and sequencing, as well as project templates that incorporate relevant content and gene reference information.

Advanced Design Pipeline

Applied Biosystems has developed an advanced design pipeline that generates resequencing primers for optimal performance. The pipeline also assigns a confidence value to each primer pair and annotates associated genome complexity and technical challenges. Gene coverage includes exons (coding or non-coding), splice junctions, and regulatory regions (Figure 4).

www.appliedbiosystems.com
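The throughput figures quoted for the 3130xl system (35-minute runs, 16 capillaries, up to 41 runs or 656 samples in 24 hours) are mutually consistent, as a few lines of arithmetic show. The sketch below is illustrative only. It assumes that the quoted 35 minutes covers the whole run cycle with no extra overhead between runs, and the function names are hypothetical rather than part of any Applied Biosystems software.

```python
# Figures quoted in the text for the 16-capillary 3130xl system.
CAPILLARIES_3130XL = 16
RUN_MINUTES = 35  # quoted run time with POP-7 polymer and the 36 cm array


def max_runs(period_minutes: int, run_minutes: int) -> int:
    """Whole runs that fit in a period (assumes no overhead between runs)."""
    return period_minutes // run_minutes


def samples_per_period(runs: int, capillaries: int) -> int:
    """Each run sequences one sample per capillary."""
    return runs * capillaries


runs_per_day = max_runs(24 * 60, RUN_MINUTES)
print(runs_per_day)                                          # 41
print(samples_per_period(runs_per_day, CAPILLARIES_3130XL))  # 656
```

Under the same assumption, the 4-capillary 3130 system would process 164 samples in 24 hours; that number is derived here and is not quoted in the text.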

[Figure 3 workflow steps: genomic DNA plus validated PCR primers; PCR using AmpliTaq Gold® PCR Master Mix; sequencing using BigDye® Terminators; resolution of sequencing reactions on 3130 or 3130xl platforms; automatic analysis of variants using SeqScape® Software v2.5; genotype report generation]

Figure 3. Workflow of the Applied Biosystems VariantSEQr™ Resequencing System illustrates integration of the 3130 Series Systems and SeqScape® Software v2.5. The system has been designed to fit the workflow of a typical resequencing laboratory. An optimized protocol provides complete integration of each step, from PCR amplification using PCR primers validated by a combination of laboratory and computational systems, to the generation of genotype reports. An important new feature is the integration of the 3130 Series System and Data Collection Software v3.0 with SeqScape software. At the end of each run, the integration feature allows the sequence files not only to be automatically basecalled, but also trimmed, aligned, and assembled against a reference sequence within SeqScape software. Data analysis results and generated reports can either be reviewed locally or exported to separate desktops convenient to the scientists.

[Figure 4 labels: Exons; Intron/Exon Junction; Regulatory Region; RSA; RSS]

Figure 4. Diagrammatic view of primers designed for a typical Resequencing Set (RSS). The blue regions at the top represent exons, and the lighter green regions represent either the promoter region or introns. The target regions of the Resequencing Amplicons (RSA, shown in orange) have been designed to provide complete coverage of promoter regions, intron/exon junctions, and all exons. The regions flanking the target regions (represented in red) are part of the amplicons but may not always contain data of the desired quality. Each PCR primer is tailed with priming sites for either the M13 universal forward or reverse sequencing primer to permit robust and specific sequencing of the amplified regions.
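The M13 tailing described in the Figure 4 caption can be illustrated with a short sketch: a universal priming site is prepended to the 5' end of each gene-specific PCR primer, so that every amplicon can be sequenced with the same two universal primers. The tail sequences below are the commonly cited -21 M13 forward and M13 reverse sequences and are an assumption here; they should be confirmed against the kit documentation. The gene-specific sequences and the function name are hypothetical placeholders, not VariantSEQr primer designs.

```python
# Commonly cited universal priming-site sequences (assumption: verify
# against the product documentation before relying on them).
M13_FORWARD_TAIL = "TGTAAAACGACGGCCAGT"  # -21 M13 forward
M13_REVERSE_TAIL = "CAGGAAACAGCTATGACC"  # M13 reverse

VALID_BASES = set("ACGT")


def tail_primer_pair(forward_specific: str, reverse_specific: str) -> tuple[str, str]:
    """Prepend the M13 priming sites to the 5' ends of a PCR primer pair.

    Both inputs are gene-specific primer sequences written 5'->3'.
    """
    pair = (forward_specific.upper(), reverse_specific.upper())
    for seq in pair:
        if not seq or set(seq) - VALID_BASES:
            raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return M13_FORWARD_TAIL + pair[0], M13_REVERSE_TAIL + pair[1]


# Hypothetical gene-specific primers, for illustration only.
fwd, rev = tail_primer_pair("ACCTGGAGTCCATTGCAGAA", "ttgcacagggtcagtagctc")
print(fwd)
print(rev)
```

Because the tails are identical for every amplicon, one pair of sequencing primers serves the whole resequencing set, which is what allows a single universal sequencing protocol.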

While a number of software packages have the ability to design primers for amplification, it has not previously been possible to know whether sequences in the genome will interfere with the generation of the high-quality sequence data necessary for resequencing projects. This uncertainty led us to develop a way to reliably predict resequencing amplicon performance without the need to test every resequencing set in the laboratory.

SeqScape® Software v2.5

SeqScape® Software v2.5 is a sequencing analysis software package designed expressly for resequencing applications (Figure 5). SeqScape software enables researchers to compare their sequence data to a reference sequence to identify variants. Each consensus sequence can be compared to a library of sequences to identify top matches. Unique to SeqScape software is the sequence feature, which incorporates into the reference sequence information from the sequence feature, such as exons, introns, known SNPs, and protein-coding sequences. Pressing the analysis button in SeqScape software will trigger basecalling with quality values, trimming, and generation of the consensus sequence, followed by a comparison to the reference sequence. The analysis results can be viewed in the Mutation Report, which lists all mutation types, such as insertions, deletions, substitutions, and heterozygous indels with their amino acid effect. SeqScape software also offers the following benefits:

• Data analysis, including base calling, quality values, and heterozygote detection, is provided in a single software package
• All basecalls and variants are reported with quality values, known and novel variants are identified, and the corresponding amino acid effect is displayed
• All mutation types, including substitutions, insertions, deletions, and heterozygous indels, and quality values are automatically detected and identified
• All results are generated in reports and each result is hyperlinked back to the underlying electropherogram

Figure 5. Applied Biosystems SeqScape® Software v2.5, designed expressly for mutation profiling, features robust algorithms, enhanced display capabilities, and detailed results reports. With a single mouse click, the software feeds the sequencing files into the analysis pipeline and generates detailed reports containing mutation information. Essential to the analysis pipeline, robust algorithms are integrated to ensure automated processing and accurate results from raw sequence data to the final mutation report. Unique to this tool are base-calling and consensus calling algorithms that provide quality values for each base pair, sample, and mutation to enable easy distinction between poor and high-quality data. Enhanced display capabilities, such as hyperlinks between reported results and the actual base pairs, reduce data review time.

Integration with SeqScape® Software

The 3130 Series Systems provide seamless integration between the instrument and analysis software, ensuring automatic sample loading, generation of sequencing data, basecalling, and alignment of sequence and reference data. An integral part of successful resequencing, SeqScape software analyzes data from both small- and large-scale projects. The process, which is simple and straightforward, is almost completely automated, providing fast, accurate data analysis. After the sequences have been generated, the sequence files are analyzed by SeqScape software using the provided project template, which contains the advance reference information (Figure 6).

The experiment illustrated in Figures 6 and 7 was performed with the VariantSEQr Resequencing Set, CCL24. The sequencing reactions were performed using the Applied Biosystems BigDye® Terminator v3.1 Cycle Sequencing Kit and the VariantSEQr Resequencing System protocol. The reactions were subsequently purified with Centri-Sep™ spin columns. The primers were -21 M13 forward and M13 reverse primers. The 3130xl UltraSeq36_POP7 run module was used, and the samples were analyzed by KB™ Basecaller v1.2 in SeqScape® Software v2.5.

Applied Biosystems provides project templates, which contain reference sequence and associated data for each RSS. The project template and the sequences are imported into SeqScape software for complete analysis. After the project is analyzed, the quality of the results can be reviewed and the variants can be examined (Figure 7).

Algorithms such as the KB™ Basecaller and consensus caller implemented in SeqScape software provide accurate detection of heterozygote mutations with quality values. The color bars shown above each base (Figure 7) represent the quality value and they provide the estimated accuracy of the assigned base, allowing clear distinctions between poor and high-quality data. After the data is assembled, the consensus caller algorithm examines the quality of each trace for background noise, and the orientation of the traces, forming an accurate consensus call, which is particularly useful in heterozygote base calling. When sequencing anomalies, such as PCR noise or unincorporated dye terminators, are present in one strand and cause a base calling error, the error can be corrected at the consensus level. This requires less manual editing, as an accurate consensus is determined for each DNA sample being investigated.

Figure 6. The Amplicon View is a new feature in SeqScape® Software v2.5 to review the VariantSEQr system's amplicon (RSA) coverage. Six amplicons are represented above, spanning the regions of interest (ROIs) for exon and intron target regions in CCL24. The teal bar above each amplicon indicates a successful experiment with complete coverage in forward and reverse orientations. If failures during sequencing had resulted in incomplete or lack of coverage for an amplicon, this view would enable you to identify and troubleshoot the failure easily. The blue bars in the top panel indicate unknown variants in this experiment; three unknown variants were detected in specific target exons and introns, respectively.

Figure 7. Heterozygote detection using SeqScape® Software v2.5. As noted in Figure 6, three variants were detected for this experiment using the CCL24 gene. Two of the variants are heterozygote bases illustrated in the consensus sequence with black dots above the base. The individual electropherogram sequence traces covering that base position are seen in both orientations. The sequencing reactions used in this experiment were performed using the Applied Biosystems BigDye® Terminator v3.1 Cycle Sequencing Kit and the VariantSEQr Resequencing System protocol. Subsequently, they were purified using Centri-Sep™ spin columns. The 3130xl UltraSeq36_POP7 run module was used, and the samples were analyzed using KB™ Basecaller v1.2 in SeqScape Software v2.5.

Conclusion

To address the challenges inherent in the use of DNA sequencing for mutation detection, Applied Biosystems developed an integrated system that comprises the new 3130 Series Genetic Analyzers, the VariantSEQr Resequencing System, and SeqScape software. Together, this system, along with BigDye® Terminator chemistry, provides a complete, cost-effective solution for mutation detection.

References

1. Sequencing Analysis Software v5.2 provides a metric, Length-of-Read (LOR), defined as the usable range of high-quality or high-accuracy bases, determined by Quality Values (QV) generated by the KB™ Basecaller v1.2. The LOR is determined using a sliding window of 20 bases which have an average QV greater than 20 (QV20).
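The Length-of-Read definition above (a sliding window of 20 bases with an average QV greater than 20) can be sketched in a few lines. This is not the Sequencing Analysis Software implementation. It is one plausible reading of the definition, in which LOR is the longest contiguous stretch of bases where every 20-base window passes the threshold. The function names are hypothetical, and the QV-to-error conversion assumes the conventional Phred-style scale (QV = -10 log10 of the error probability), which the text does not state explicitly.

```python
from typing import Sequence


def qv_to_error_probability(qv: float) -> float:
    """Estimated basecall error probability, assuming a Phred-style scale."""
    return 10 ** (-qv / 10)


def length_of_read(qvs: Sequence[int], window: int = 20,
                   threshold: float = 20.0) -> int:
    """Longest stretch in which every `window`-base window averages > threshold.

    One interpretation of the LOR definition; returns 0 if no window passes.
    """
    if len(qvs) < window:
        return 0
    best = current = 0
    window_sum = sum(qvs[:window])
    for start in range(len(qvs) - window + 1):
        if start > 0:
            # Slide the window one base to the right.
            window_sum += qvs[start + window - 1] - qvs[start - 1]
        if window_sum > threshold * window:
            current += 1  # another consecutive passing window
            best = max(best, current)
        else:
            current = 0
    # N consecutive passing windows span N + window - 1 bases.
    return best + window - 1 if best else 0


# Synthetic read: noisy start, long high-quality middle, degraded tail.
read_qvs = [10] * 30 + [40] * 600 + [5] * 50
print(length_of_read(read_qvs))
print(qv_to_error_probability(20))
```

On the Phred-style scale assumed here, QV20 corresponds to an estimated error rate of about 1 in 100 basecalls, so a window averaging above QV20 is roughly better than 99% accurate.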


Ordering Information

Description                                                   P/N
Applied Biosystems VariantSEQr™ Resequencing Systems          *

Each tube contains a pre-formulated primer pair, 500 PCR reactions (1,000 µL volume, 0.6 µM of each primer), and multiple primer pairs for each gene. Tubes are shipped in a convenient carrier tray.
Compact Disk: Includes a Data Sheet, Protocol, Quick Reference Card, Gene Information File, and SeqScape® Software v2.5 Project Template.
Barcodes: A 2D laser-etched barcode is added to the bottom of each primer-pair tube, and a 1D barcode is printed on each tube rack.

*Part numbers for this product are generic and not gene-specific. To order your genes of interest, please use the RSS ID and version, or contact your local sales representative.

Recommended Products                                          P/N
AmpliTaq Gold® Universal PCR Master Mix
    10 x 250 units/10 x 5 mL                                  4327058
    250 units/5 mL                                            4318739
    2,500 units/50 mL                                         4327059
BigDye® Terminator v3.1 Cycle Sequencing Kit
    100-Ready Reaction Kit                                    4337455
    1,000-Ready Reaction Kit                                  4337456
    5,000-Ready Reaction Kit                                  4337457
    25,000-Ready Reaction Kit                                 4337458
BigDye® Terminator v3.1 Matrix Standard                       4336974
BigDye® Terminator v3.1 Sequencing Standard                   4336935
3130xl and 3100 Capillary Array (36 cm)                       4315931
3130 and 3100-Avant Capillary Array (36 cm)                   4333464
3130 POP-7™ Polymer                                           4352759
Hi-Di™ Formamide                                              4311320
10X Genetic Analyzer Buffer with EDTA                         402824
SeqScape® Software v2.5
    45-Day Demo License                                       4327099
    Initial License                                           4327091
    Additional License (1 user)                               4327092
    Additional Licenses (10 users)                            4327094

iScience. To better understand the complex interaction of biological systems, life scientists are developing revolutionary approaches to discovery that unite technology, informatics, and traditional laboratory research. In partnership with our customers, Applied Biosystems provides the innovative products, services, and knowledge resources that make this new, Integrated Science possible.

Worldwide Sales Offices
Applied Biosystems vast distribution and service network, composed of highly trained support and applications personnel, reaches 150 countries on six continents. For international office locations, please call the division headquarters or refer to our Web site at www.appliedbiosystems.com

Applera is committed to providing the world’s leading technology and information for life scientists. Applera Corporation consists of the Applied Biosystems and Celera businesses.

Headquarters
850 Lincoln Centre Drive
Foster City, CA 94404 USA
Phone: 650.638.5800
Toll Free: 800.345.5224
Fax: 650.638.5884

For Research Use Only. Not for use in diagnostic procedures.

NOTICE TO PURCHASER. DISCLAIMER OF LICENSE: The VariantSEQr™ Resequencing System is optimized for use in the polymerase chain reaction (PCR) covered by patents owned by Roche Molecular Systems, Inc. and F. Hoffmann-La Roche Ltd. No license under these patents to use the PCR Process is conveyed expressly or by implication to the purchaser by the purchase of these products. A license to use the PCR Process for certain research and development activities accompanies the purchase of certain Applied Biosystems reagents when used in conjunction with an authorized thermal cycler, or is available from Applied Biosystems. Further information on purchasing licenses to practice the PCR Process may be obtained by contacting the Director of Licensing at Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404 or Roche Molecular Systems, Inc., 1145 Atlantic Avenue, Alameda, California 94501.

NOTICE TO PURCHASER. DISCLAIMER OF LICENSE: This product is optimized for use in the DNA sequencing or fragment analysis methods covered by patents owned or licensable by Applied Biosystems. No license under these patents to use the DNA sequencing or fragment analysis methods is conveyed expressly or by implication to the purchaser by the purchase of this product. A license to use the DNA sequencing or fragment analysis methods for certain research and development activities accompanies the purchase of certain Applied Biosystems reagents when used in conjunction with an authorized DNA sequencing machine, or is available from Applied Biosystems. Further information on purchasing licenses to practice the DNA sequencing or fragment analysis methods may be obtained by contacting the Director of Licensing, Applied Biosystems, 850 Lincoln Centre Drive, Foster City, California 94404, U.S.A.

NOTICE TO PURCHASER: Please refer to the Applied Biosystems 3130/3130xl Genetic Analyzer, GeneMapper and SeqScape Software, and VariantSEQr Resequencing System User’s Manual for limited label license or disclaimer information.

Applied Biosystems, BigDye, and SeqScape are registered trademarks and AB (Design), Applera, Hi-Di, iScience, iScience (Design), POP-7, and VariantSEQr are trademarks of Applera Corporation or its subsidiaries in the US and/or certain other countries.

AmpliTaq Gold is a registered trademark of Roche Molecular Systems.

All other trademarks are the property of their respective owners.

© 2004. Applied Biosystems. All Rights Reserved. Information subject to change without notice. Printed in the USA, 11/04, P+s, Publication 106AP19-01