Assessing the Necessity of Confirmatory Testing for Exome-Sequencing Results in a Clinical Molecular Diagnostic Laboratory
Total Page:16
File Type:pdf, Size:1020Kb
ORIGINAL RESEARCH ARTICLE © American College of Medical Genetics and Genomics Assessing the necessity of confirmatory testing for exome-sequencing results in a clinical molecular diagnostic laboratory Samuel P. Strom, PhD1,2, Hane Lee, PhD1, Kingshuk Das, MD1, Eric Vilain, MD, PhD2,3, Stanley F. Nelson, MD1,2, Wayne W. Grody, MD, PhD1,2,3 and Joshua L. Deignan, PhD1 Purpose: Sanger sequencing is currently considered the gold stan- quality scores <Q500, six were confirmed by Sanger sequencing dard methodology for clinical molecular diagnostic testing. How- (85%). ever, next-generation sequencing has already emerged as a much more efficient means to identify genetic variants within gene panels, Conclusion: For single-nucleotide variants, we predict that going the exome, or the genome. We sought to assess the accuracy of next- forward, we will be able to reduce our Sanger confirmation workload generation sequencing variant identification in our clinical genomics by 70–80%. This serves as a proof of principle that as long as sufficient laboratory with the goal of establishing a quality score threshold for validation and quality control measures are implemented, the vol- confirmatory Sanger-based testing. ume of Sanger confirmation can be reduced, alleviating a significant amount of the labor and cost burden on clinical laboratories wishing Methods: Confirmation data for reported results from 144 sequen- to use next-generation sequencing technology. However, Sanger con- tial clinical exome-sequencing cases (94 unique variants) and an firmation of low-quality single-nucleotide variants and all insertions additional set of 16 variants from comparable research samples were or deletions <10 bp remains necessary at this time in our laboratory. analyzed. Genet Med advance online publication 9 January 2014 Results: Of the 110 total single-nucleotide variants analyzed, 103 variants had a quality score ≥Q500, 103 (100%) of which were con- Key Words: exome; molecular diagnostics; next-generation sequenc- firmed by Sanger sequencing. Of the remaining seven variants with ing; variant confirmation Next-generation sequencing (NGS) technologies require prob- recalibration to estimate posterior probabilities for each variant abilistic algorithms for the conversion of uniquely aligned short call (with hapmap_3.3.b37.sites and 1000G_omni2.5.b37.sites sequence reads into genotypes. These algorithms are sensi- for training resources). tive to multiple sources of error, including sequencing errors, Although technically these final quality scores (“Qscores”, incorrect alignment (“mismapping”), and random sampling.1–8 or “QN,” where N is a value greater than zero) are reported as False-positive results due to sequencing errors are particularly log-scaled probabilities, comparison across experiment types prevalent when read depth is below 10 reads per base on aver- is not advisable because of the large degree of variability of age (“10× coverage” by convention).3 Due to this uncertainty, data volume, data quality, and options between NGS analyti- amplification-based dye terminator dideoxy DNA (“Sanger”) cal pipelines. In this study, Qscores are considered to be rela- sequencing has been used routinely to confirm NGS results.9–16 tive measures and are compared only between clinical exome- However, as read depth increases and additional samples are sequencing (CES) data sets from the end-to-end analytically tested using a consistent experimental protocol and analytical validated procedures established in the University of California, pipeline, more information is available to interrogate the valid- Los Angeles (UCLA) Clinical Genomics Center, which is part ity of a given variant call. Valuable data, in addition to the count of the UCLA Molecular Diagnostics Laboratories (accredited of reference and nonreference (“variant”) nucleotides observed by both the Clinical Laboratory Improvement Amendments at a given position, are amassed. These data include mapping (CLIA) and the College of American Pathologists). quality, strand origin, base call quality, position of the variant For variants with high quality scores (>Q10,000) and high within a sequence read, haplotype information, and cross- coverage (>100×), the amount of information supporting the sample comparisons. The commonly used genotype-calling genotype call is overwhelming. For such variants, failure to rep- pipeline using the Genome Analysis Toolkit1,17 implements a licate the finding by Sanger sequencing is highly indicative of Bayesian genotype likelihood model (based on known poly- human error (e.g., sample swap). Thus, for high-quality NGS morphic loci such as dbSNP variants) and variant quality score variants, Sanger confirmation serves almost exclusively as a 1Department of Pathology and Laboratory Medicine¸ David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; 2Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; 3Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA. Correspondence: Samuel P. Strom ([email protected]) Submitted 10 July 2013; accepted 18 October 2013; advance online publication 9 January 2014. doi:10.1038/gim.2013.183 510 Volume 16 | Number 7 | July 2014 | GENETIcS in mEDIcINE Necessity of exome confirmatory testing | STROM et al ORIGINAL RESEARCH ARTICLE sample quality control measure. Therefore, it is the goal of this individual gene assessments or regions of homozygosity from study to establish a conservative internal quality score cutoff, chromosomal microarray analyses) are cross-referenced with above which Sanger confirmation of CES-identified variants the CES data also. Some additional steps that laboratories will no longer be a necessary quality control measure in our could use to reduce sample errors include running samples laboratory. in duplicate, running the CES assay in parallel with a sin- gle-nucleotide polymorphism array or genotyping identity MATERIALS AND METHODS panel for concordance analysis (if not previously performed CES procedure as mentioned earlier), or spiking the blood sample with a Exome sequencing was performed in the UCLA Clinical unique plasmid during extraction and confirming that the Genomics Center (http://pathology.ucla.edu/genomics) fol- plasmid sequence occurs in the final result. lowing validated protocols. Briefly, high-molecular-weight genomic DNA was isolated from whole blood collected in Variant selection a lavender top tube (ethylene diamine tetraacetic acid dipo- All clinically reported variants, both clinically significant tassium (K2EDTA) or ethylene diamine tetraacetic acid tri- findings and variants of uncertain significance, were selected potasium (K3EDTA)) using a QIAcube (Qiagen, Venlo, The for confirmation. In total, 110 unique single-nucleotide vari- Netherlands). For all of the clinical samples, exome sequenc- ants (SNVs) were selected for Sanger confirmation (Table 1), ing was performed using the Agilent SureSelect Human All and a subset (16 SNVs) of these was randomly selected Exon 50mb (Agilent Technologies, Santa Clara, CA) for for assessment from a pool of variants with quality scores exome capture and Illumina HiSeq2000 (Illumina, San Diego, <Q2000. These additional variants had not been clinically CA) for sequencing as 50-bp paired-end runs using V3 chem- reported because they are not in genes that are known to istry. For the nonclinical samples, Agilent SureSelect Human cause any clinical condition. All Exon 50mb XT kit (V2) was used for exome capture and All SNVs selected, regardless of report status, are predicted to Illumina HiSeq2000 for sequencing as 100-bp paired-end be nonsynonymous and are rare (with an average minor allele runs using V3 chemistry. frequency <1% in the Exome Variant Server).19 Data analysis was performed using the analytical pipeline implemented and validated for CES in the UCLA Clinical Sanger sequencing Genomics Center. All sequence reads were aligned to the PCR primers were designed for each target locus using the human reference genome (Human GRCh37/hg19) using Web-based Primer3Plus software (Andreas Untergasser and Novoalign (Novocraft, Selangor, Malaysia). Polymerase chain Harm Nijveen).20 Targets were amplified using PCR and sub- reaction (PCR) duplicates were marked by Picard (http:// jected to agarose gel electrophoresis for size analysis of resulting picard.sourceforge.net/), and Genome Analysis Toolkit (Broad amplicons. If no amplicon was observed, multiple amplicons Institute, Cambridge, MA) was used to realign indels, recali- were observed, or an amplicon of improper size was observed, brate the quality scores, call, filter, recalibrate, and evaluate the a second independent set of PCR primers was designed and variants. All variants called across the protein-coding regions tested in a similar fashion. and the flanking junctions were annotated using Variant Unique, properly sized amplicons were purified using stan- Annotator X, an in-house MySQL database using data from the dard techniques. BigDye Terminator DNA-sequencing reac- publicly available Ensembl Variant Effect Predictor (European tions were then performed on eluted amplicons, and sequenc- Bioinformatics Institute, Hinxston, UK).18 A detailed descrip- ing was conducted by automated capillary gel electrophoresis tion of the bioinformatic methods used to analyze these data is (ABI 3730, Life Technologies, Carlsbad,