Exome Sequencing Resolves Apparent Incidental Findings and Reveals Further Complexity of SH3TC2 Variant Alleles Causing Charcot
Total Page:16
File Type:pdf, Size:1020Kb
Lupski et al. Genome Medicine 2013, 5:57 http://genomemedicine.com/content/5/6/57 RESEARCH Open Access Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy James R Lupski1,2*, Claudia Gonzaga-Jauregui1, Yaping Yang3, Matthew N Bainbridge2, Shalini Jhangiani2, Christian J Buhay2, Christie L Kovar2, Min Wang2, Alicia C Hawes2, Jeffrey G Reid2, Christine Eng3, Donna M Muzny2 and Richard A Gibbs1,2* Abstract Background: The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation. Methods: We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now perform six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies’ Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq). Results: We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband. Conclusions: ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNV) documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals. Keywords: Exome sequencing, Whole-genome sequencing, Incidental findings, SH3TC2, Personal genomes, Precision medicine * Correspondence: [email protected]; [email protected] 1Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA Full list of author information is available at the end of the article © 2013 Lupski et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Lupski et al. Genome Medicine 2013, 5:57 Page 2 of 17 http://genomemedicine.com/content/5/6/57 Background databases and the lack of clinical indications for any of Exome sequencing (ES) is an approach to human genome these five conditions, led to the conclusion that these analysis and genetic variant detection that focuses on the alleles were most probably not pathogenic in the proband. coding exons of genes and closely linked functional ele- In the period since the initial CMT study, technologies ments. ES is less expensive than comparable approaches for both WGS and ES have advanced considerably. We based on whole genome sequencing (WGS), because developed a series of improved ES methods and reagents there is overall less raw DNA sequence data required. that reliably assay >95% of the coding regions of the Furthermore, coding regions contain the changes that are human genome [5,6]. The current ES assay design currently the most easily interpretable, and knowledge involves routine ‘oversampling’ of the targeted regions, gained from ES may therefore have more immediate med- and ensures that a minimum of 95% (average 97%) of ical utility and application. There is a generally held belief, these positions are tested with >20-fold NGS coverage. however, that WGS may be more informative than ES, The new methods also take advantage of longer indivi- even within the boundaries of exome regions, as the ran- dual DNA sequence reads generated on the Illumina dom distribution of individual sequence reads offers an HiSeq platform (2 × 100 bp) versus the previous SOLiD overall higher likelihood of effective testing of individual platform (2 × 50 bp) used for WGS. sites in the genome. Further, biases inherent to the cap- ture process may result in missing or poorly covered Methods bases in a subset of the exome regions [1]. Sample inclusion We previously applied WGS using massively parallel This study conforms to the Helsinki Declaration regarding next-generation sequencing (NGS) methods to a subject ethical principles for medical research involving human with molecularly undefined, autosomal recessive, Char- subjects. Subject BAB195 and additional family members cot-Marie-Tooth (CMT) neuropathy and identified were recruited and consented for genomic and DNA stu- apparently causative compound heterozygous SH3TC2 dies under a research protocol approved by the Institutional variants: nonsense p.R954X (g. chr5: 148,406,435 G>A) Review Board of Baylor College of Medicine. All patients and missense p.Y169H alleles (g. chr5: 148,422,281 gave informed consent for participation in this study. A>G) [2]. The two alleles satisfied multiple criteria for disease causation: both were validated with alternate DNA preparation methods technologies; the nonsense p.R954X allele had been pre- Blood was directly collected in PAXgene Blood DNA tubes viously reported in unrelated CMT families and corro- and DNA was isolated using the PAXgene Blood DNA kit borated with functional studies; each segregated (PreAnalytiX, Qiagen, Valencia, CA, USA). The quality of faithfully in the pedigree and no other lesion was dis- the DNA sample was ascertained by electrophoresis and covered in the locus despite adequate coverage of all determined to be of high quality (size >23 kb) with no visi- coding bases with NGS reads. Further, each allele faith- ble degradation. Quantity was determined using standard fully independently segregated with additional different Pico Green assays. mild phenotypes observed in other family members - susceptibility to carpal tunnel syndrome (presumably Description of NimbleGen VCRome 2.1 exome capture due to loss-of-function nonsense mutation) and an elec- design trophysiologically identified subclinical axonopathy (mis- In the current methods, the HGSC-design ‘VCRome 2.1’ sense allele). reagent [7] was used for exome capture. This NimbleGen Unresolved issues in the study included discovery of probe set targets the Vega [8], CCDS, and RefSeq gene additional reportedly pathogenic alleles at other loci, models, including predicted genes within RefSeq, as well related to potentially medically actionable conditions (inci- as microRNA (miRNA) [9] and regulatory regions for a dental or secondary findings). For example, the proband total target size of 42 Mb. Thus, exome capture sequen- was suggested to have a dominant variant that was causa- cing will identify SNPs and indels in protein coding tive for adrenoleukodystrophy (ALD, MIM #300100) [3]; regions, exon flanking sequences (including intron splice ABCD1, chrX:153,008,483 G>A; p.G608D, as well as to be sites), regulatory regions, and small non-coding RNAs (for homozygous for an allele causative for spinal muscular example, microRNAs). atrophy with respiratory distress type 1 (SMARD, MIM #604320) [4]; IGHMBP2, chr11:68,705,674C>A; pT879K. Illumina exome sequencing on GAII and HiSeq These variants were confirmed by Sanger sequencing and Library construction restriction endonuclease digestion (Figure 1). The proband Genomic DNA samples were constructed into Illumina was also found by WGS to be homozygous for reported PairEnd pre-capture libraries according to the manufac- causative mutations in three other Mendelian disease turer’s protocol (Illumina Multiplexing_SamplePrep_- genes. The possibility for erroneous entries in public Guide_1005361_D) with modifications as described in the Lupski et al. Genome Medicine 2013, 5:57 Page 3 of 17 http://genomemedicine.com/content/5/6/57 Figure 1 Confirmation and segregation by endonuclease restriction digestion of mutations presumed to represent incidental findings. The upper gel shows restriction digestion using the HphI restriction enzyme for the p.G608D mutation in ABCD1, causative for adrenoleukodystrophy (ALD). All tested individuals, none of which presented with ALD, are shown to be heterozygous for the mutation, including all males whom are hemizygous for genes on the × chromosome. The lower gel shows restriction digestion using the BtgI restriction enzyme for the p.T879K mutation in IGHMBP2, putatively causative for spinal muscular atrophy with respiratory distress type 1 (SMARD1). Random segregation and zygosity for this variant can be observed in this family; none of the individual subjects present with SMARD. BCM-HGSC Illumina Barcoded Paired-End Capture adaptor sequences. Post-capture LM-PCR amplification Library Preparation protocol. The complete library and was performed using the 2X SOLiD Library