The Importance of Coverage: Advantages of Amplicon-Based Approaches in Next-Generation Sequencing
Total Page:16
File Type:pdf, Size:1020Kb
WHITE PAPER Ion AmpliSeq technology The importance of coverage: advantages of amplicon-based approaches in next-generation sequencing In this paper, we highlight: of little benefit when studying a specific genomic region • Why genomic coverage is important for disease and translational research applications. To help address this issue, many researchers have adopted • The advantages of targeted amplification compared to a targeted sequencing approach to improve coverage, hybridization enrichment simplify analysis and interpretation, and lower their total • The benefits of Ion AmpliSeq™ technology sequencing workflow costs. and its comprehensive solutions for targeted next-generation sequencing The importance of coverage Coverage, as the word implies, describes the number of Introduction sequencing reads that are uniquely mapped to a reference Next-generation sequencing (NGS) has proven to be a and “cover” a known part of the genome. Ideally, the disruptive technology, furthering our scientific knowledge sequencing reads that are uniquely aligned are uniformly and opening research opportunities faster than anyone distributed across the reference genome and hence could have envisioned even just 10 years ago. During provide uniform coverage. In reality, coverage is not uniform this time, massively parallel short-read sequencing has and may be underrepresented in genetic regions of interest decreased sequencing costs faster than Moore’s Law due to a variety of factors, including genomic complexity [1], enabling researchers to study entire genomes at an (Table 1). The genome contains an assortment of coding unprecedented scale and capacity. Fundamental advances and noncoding DNA, repetitive sequences, and other in genomics enabled by NGS have made precision elements that can make it difficult to align the sequencing medicine a reality, with medical decisions and treatments reads to the proper genomic coordinates. now tailored to individuals and their genetic makeup. The number of sequencing reads that map to a known While whole-genome sequencing has advanced discovery region is also an important part of coverage. A sufficient and human health, challenging regions of the genome number of properly mapped reads is required to find and are difficult to analyze using this approach, resulting in correctly identify genetic mutations. With high sequencing sequencing bias, and existing databases are noted to coverage, researchers can identify low-frequency be neither complete nor accurate [2]. For many research mutations or discover mutations in a heterogeneous applications the cost of whole-genome sequencing can sample such as a tumor biopsy. Poor coverage, whether still be a burden, particularly when you take into account due to an insufficient number of reads or sequencing reads the computational processing and informatics needs that are mapped incorrectly, will result in the inability to for analysis. This added cost and complexity would be detect variants of interest. Table 1. Potential reasons for poor sequencing coverage and uniformity. Reason for poor coverage Why this can affect coverage Degraded samples are more difficult to prepare and result in shorter sequencing reads that Sample quality are more difficult to map to the correct region since they may be less unique. Enough sample may not be available to sequence, and the DNA is not representative of the Sample input entire genome. Homologous Homologous regions have similar sequences, making it more difficult to map the reads to regions the correct positions of the reference genome. Regions of low Sequence reads with low complexity may be mapped to the wrong part of the genome, complexity resulting in coverage bias. Hypervariable Due to the high number of variants, the sequencing read will look very different compared to regions the reference genome and may not be mapped appropriately. GC content Potential sequencing bias due to the percentage of guanine and cytosine nucleotides. Coverage is clearly important to can also be processed simultaneously material is washed away, leaving the ensure that the genomic region of in a single sequencing run, for faster desired target regions isolated on interest can be studied with high time-to-results from individual and the substrate. For both methods, the confidence. For regions with low cohort samples. isolated and purified target regions are coverage, researchers frequently subsequently amplified and prepared increase the sequencing throughput Hybridization capture and for sequencing. Amplicon-based for their studies and try to increase amplicon-based enrichment are enrichment uses carefully designed coverage by brute force. However, this two general techniques used in PCR primers to flank targets and method is inefficient, increases costs, sequencing to enrich for specific specifically amplify regions of interest. and does not address the underlying genetic regions (Figure 1). The amplified products are then reasons for the poor coverage itself. Hybridization capture can be either purified from the sample material and By increasing throughput, genomic solution based or performed on a used for sequencing, bypassing the regions with sufficient coverage will solid substrate such as a microarray. need for enrichment by hybridization. be overrepresented and the reads Both require the use of synthesized effectively wasted. Areas with zero oligonucleotide probes (also known Advantages of amplicon-based coverage before may not have as baits) that are complementary to approaches sufficient coverage just by sequencing the genetic sequences of interest. There are many advantages in more sample. A more efficient way In solution-based methods, the using amplicon-based enrichment to address coverage is a targeted probes are biotinylated and added techniques compared to hybridization sequencing approach. Through to the genetic material in solution to capture methods (Table 2). targeted sequencing, researchers hybridize with the desired regions Amplicon-based approaches offer can focus on just their regions of of interest. Magnetic streptavidin a simpler, faster workflow with interest instead of sequencing the beads are used to capture and isolate unmatched PCR specificity, allowing entire genome. This helps to reduce the hybridized sequences from the enrichment for target gene regions costs and ensure sufficient coverage, unwanted genetic material. With from low sample input amounts. including in parts of the genome array-based capture, the probes are Genetic material from limited sample that may not have been accessible attached directly to a solid surface. sources, such as fine-needle aspirates previously. The genetic material is applied to or circulating tumor DNA, can be the microarray, and target regions sequenced for biomarker discovery. Targeted sequencing hybridize to the surface. Any unbound Leveraging current genomic knowledge, a targeted NGS approach utilizes molecular biology methods to enrich for specific genetic sequences, A Sequencing library B Genomic DNA allowing researchers to focus their studies on individual genes or Multiplex PCR genomic regions. Obtaining sequence Solution-based Array-based with Ion AmpliSeq primer pool selection selection coverage of challenging genomic regions is now possible, including regions from difficult sample types Partially digest primer sequences Hybridization such as samples with degraded DNA or RNA, or circulating cell-free DNA from blood. By only sequencing what Adaptors Washing you need, there are cost benefits and elution or Barcode adaptors Ligate adaptors beyond more efficient computational Enriched-sample sequencing A P1 processing and informatics. Focusing Nonbarcoded library or on specific regions of interest allows X P1 Barcoded library researchers to sequence at a much higher depth of coverage for rare Figure 1. Workflows for target enrichment. (A) Hybridization capture can be performed in solution variant discovery. Many more samples or on a solid substrate. (B) Amplicon-based target enrichment using Ion AmpliSeq technology. Table 2. Advantages of amplicon-based enrichment compared to hybridization multiplex up to 24,000 primer pairs in capture methods. a single reaction, allowing researchers Advantage Example Why is it important? to sequence hundreds of genes Liquid biopsies and Enables sequencing of limited samples, ensuring Low sample input from multiple samples in a single fine-needle aspirates appropriate sequencing coverage. Frequently studied genes such as PTEN have sequencing run with faster turnaround Pseudogenes Better enrichment for associated pseudogenes [3]. time and lower cost. Targeted homologous regions Monogenic disease genes often have functionally Paralogs redundant paralogs [4]. enrichment can be performed on Targeting of T cell receptor Predictive biomarker discovery for immunotherapy [5]. as little as 1 ng of low-quality DNA hypervariable regions Target and enrich Di- and tri-nucleotide Microsatellite instability studies for diseases such as or RNA, with more full-length PCR low-complexity regions repeats prostate cancer [6]. products and thus better coverage Known fusion oncogene for secretory cancers [7] Fusion detection ETV6-NTRK3 and other cancer types. NTRK3 inhibitors may be compared to other amplicon-based therapeutic for cancers exhibiting this biomarker. methods. The analytical validity of Ion AmpliSeq technology has been A targeted gene of interest may share better detect known and novel demonstrated in thousands of peer- homology with another segment