Quantitative-PCR Validation of 154 Genomic Segments Called as CNVs in Five Replicat
Supplementary Table 1

| status | number of regions | calls in A | calls in B | calls in C | calls in D | calls in E | average |
|---|---|---|---|---|---|---|---|
| non validated | 31 | 5 | 6 | 5 | 1 | 0 | 3.4 |
| validated | 123 | 78 | 77 | 74 | 52 | 43 | 64.8 |
| total | 154 | 83 | 83 | 79 | 53 | 43 | 68.2 |
| false positive rate * | | 3.2% | 3.9% | 3.2% | 0.6% | 0.0% | 2.2% |
| false negative rate # | | 29.2% | 29.9% | 31.8% | 46.1% | 51.9% | 37.8% |
| % false positive calls $ | | 6.0% | 7.2% | 6.3% | 1.9% | 0.0% | 5.0% |

Supplementary Table 1: Quantitative-PCR validation of 154 genomic segments called as CNVs in five replicate comparisons of NA15510 versus NA10851 on the WGTP array. Replicate experiments A to E are ranked by global SDe (A: 0.033; B: 0.033; C: 0.036; D: 0.039; E: 0.053).

*: false positive rate = number of called but not validated regions / total number of tested regions
#: false negative rate = number of non called but validated regions / total number of tested regions
$: % false positive calls = number of called but not validated regions / total number of calls

False positive estimates for 500K EA CNV calls

| | Rep1 | Rep2 | Rep3 | Avg | Total (unique) |
|---|---|---|---|---|---|
| Validated | 33 | 28 | 32 | 31 | 38 |
| Not validated | 2 | 2 | 2 | 2 | 5 |
| Total | 35 | 30 | 34 | 33 | 43 |
| % False positive | 5.71% | 6.67% | 5.88% | 6.09% | - |
| % False negative | 13.16% | 26.32% | 15.79% | 18.42% | - |

Supplementary Table 2A: Quantitative PCR validation of 43 unique CNV regions called as CNVs in three replicate comparisons of NA15510 versus NA10851 using the 500K EA array.

% False positive: calculated for each replicate independently. Formula = # non-validated CNVs per replicate / total CNVs called per replicate
% False negative = (# total unique validated CNVs - number of validated replicate CNVs) / # total unique validated CNVs

For these experiments, each CNV called in each of the three replicates was tested using quantitative PCR or Mass Spectrometry as an independent validation method (see also Supplementary Table 4). Each pair-wise comparison gives rise to 2 false positive CNV calls, or approximately 6% of the calls.
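The rate definitions given for Supplementary Tables 1 and 2A can be checked directly against the published counts. The following is a minimal sketch, not part of the original analysis; the function names `table1_rates` and `table2_rates` and the dictionary layout are hypothetical, and only the counts and formulas are taken from the tables above.

```python
# Illustrative sketch: recompute the rates of Supplementary Tables 1 and 2A
# from the published counts. All names here are hypothetical.


def table1_rates(called_validated, called_not_validated, n_tested, n_validated):
    """Return (false positive rate, false negative rate, % false positive calls).

    Follows the footnotes of Supplementary Table 1: the first two rates are
    relative to all tested regions, the third is relative to the calls made.
    """
    total_calls = called_validated + called_not_validated
    fp_rate = called_not_validated / n_tested
    fn_rate = (n_validated - called_validated) / n_tested
    fp_calls = called_not_validated / total_calls if total_calls else 0.0
    return fp_rate, fn_rate, fp_calls


def table2_rates(validated, not_validated, unique_validated):
    """Return (% false positive, % false negative) as fractions for one replicate.

    Follows the formulas given with Supplementary Table 2A.
    """
    total_calls = validated + not_validated
    fp = not_validated / total_calls if total_calls else 0.0
    fn = (unique_validated - validated) / unique_validated
    return fp, fn


# (validated calls, non-validated calls) per WGTP replicate, from Table 1.
TABLE1 = {"A": (78, 5), "B": (77, 6), "C": (74, 5), "D": (52, 1), "E": (43, 0)}
# (validated calls, non-validated calls) per 500K EA replicate, from Table 2A.
TABLE2A = {"Rep1": (33, 2), "Rep2": (28, 2), "Rep3": (32, 2)}

if __name__ == "__main__":
    for name, (val, nonval) in TABLE1.items():
        fp, fn, fpc = table1_rates(val, nonval, n_tested=154, n_validated=123)
        print(f"WGTP {name}: FP rate {fp:.1%}, FN rate {fn:.1%}, FP calls {fpc:.1%}")
    for name, (val, nonval) in TABLE2A.items():
        fp, fn = table2_rates(val, nonval, unique_validated=38)
        print(f"500K EA {name}: false positive {fp:.2%}, false negative {fn:.2%}")
```

For replicate A this gives 3.2%, 29.2% and 6.0%, and for Rep1 it gives 5.71% and 13.16%, in agreement with the tabulated values.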
The percentage of false negative calls is estimated by calculating the percentage of validated CNVs that are captured in each replicate relative to the total number found in all three replicates. This is likely an underestimation of the false negative rate, since this calculation only considers CNVs from the 500K EA platform that have been called with our final algorithm parameters.

| | Rep1 | Rep2 | Rep3 | Avg | Total (unique) |
|---|---|---|---|---|---|
| Validated | 13 | 13 | 18 | 14.67 | 18 |
| Not validated | 1 | 0 | 0 | 0.33 | 1 |
| Total | 14 | 13 | 18 | 15 | 19 |
| % False positive | 7.14% | 0.00% | 0.00% | 2.38% | - |
| % False negative | 27.8% | 27.8% | 0.0% | 24.1% | - |

Supplementary Table 2B: Quantitative PCR validation of 19 unique CNV regions called as CNVs in three replicate comparisons of NA15510 versus the HapMap reference set using the 500K EA array.

This table uses the same calculations as above, applied to validation data from CNVs called in NA15510 compared to the 270 HapMap samples. The total number of CNVs detected by each replicate is lower in the population-wide comparisons because CNVs are being detected only in the test sample, whereas in the pair-wise comparisons, CNVs are being detected in both the test and the reference sample.

Supplementary Table 3.
Overlap of CNVs called from NA15510 with Tuzun et al. (2005)

A) Tuzun deletions detectable by 500K EA platform using 4 SNP criteria

| Tuzun call | chr | begin_span | end_span | size | median SNPs in size window | minimum SNPs | 500K EA NA15510 vs NA10851 | 500K EA NA15510 vs HapMap | 500K EA NA10851 vs HapMap | WGTP NA15510 vs NA10851 |
|---|---|---|---|---|---|---|---|---|---|---|
| Deletion | 20 | 14700001 | 14897555 | 165991 | 36 | 35 | loss | loss | Not called | loss |
| Deletion | 16 | 34236453 | 34590205 | 329311 | 30 | 30 | Not called | gain | gain | Not called |
| Deletion | 15 | 32472522 | 32628056 | 149270 | 13 | 13 | Not called | loss | loss | Not called |
| Deletion | 22 | 21476328 | 21576188 | 72623 | 13 | 12 | gain | Ig locus | Ig locus | gain |
| Deletion | 11 | 55115601 | 55226935 | 85414 | 8 | 6 | loss | homozygous deletion | Not called | loss |
| Deletion | 1 | 149363029 | 149421545 | 32480 | 8 | 0 | Not called | Not called | Not called | loss |
| Deletion | 7 | 141501682 | 141557369 | 22083 | 7.5 | 2 | Not called | Not called | Not called | Not called |
| Deletion | 8 | 1327422 | 1358228 | 13004 | 7 | 0 | Not called | Not called | Not called | Not called |
| Deletion | 12 | 57006943 | 57032578 | 12415 | 7 | 2 | Not called | Not called | Not called | Not called |
| Deletion | 11 | 4915812 | 4941949 | 10197 | 5.5 | 1 | Not called | Not called | Not called | Not called |
| Deletion | 16 | 83201418 | 83232168 | 18892 | 5.5 | 3 | loss | loss | Not called | Not called |
| Deletion | 19 | 59408310 | 59460933 | 21523 | 5 | 2 | loss | loss | Not called | Not called |
| Deletion | 8 | 126662647 | 126708435 | 10798 | 4 | 1 | Not called | Not called | Not called | Not called |
| Deletion | 17 | 11157980 | 11202026 | 11860 | 4 | 0 | Not called | Not called | Not called | Not called |
| Deletion | 1 | 112402144 | 112423342 | 12173 | 4 | 2 | Not called | Not called | loss | Not called |
| Deletion | 11 | 5738669 | 5768809 | 22662 | 4 | 3 | Not called | loss | loss | Not called |
| Deletion | 16 | 76916517 | 76969258 | 14067 | 4 | 0 | Not called | Not called | Not called | Not called |
| Deletion | 4 | 9874892 | 9914828 | 22358 | 4 | 0 | Not called | Not called | Not called | Not called |
| Deletion | 1 | 149559508 | 149586671 | 11100 | 4 | 0 | Not called | Not called | loss | Not called |

B) Tuzun deletions detectable by WGTP using a 50kb size requirement

| Tuzun call | chr | begin_span | end_span | size | WGTP NA15510 vs NA10851 | 500K EA NA15510 vs NA10851 | 500K EA NA15510 vs HapMap | 500K EA NA10851 vs HapMap | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Deletion | 20 | 14700001 | 14897555 | 165991 | loss | loss | loss | no call | - |
| Deletion | 14 | 105301889 | 105469293 | 136594 | gain | no call | Ig locus | Ig locus | Ig locus |
| Deletion | 11 | 55115601 | 55226935 | 85414 | loss | loss | homozygous deletion | no call | - |
| Deletion | 22 | 21476328 | 21576188 | 72623 | gain | gain | Ig locus | Ig locus | Ig locus |
| Deletion | 16 | 34236453 | 34590205 | 329311 | no call | no call | gain | gain | Same CNV genotype for NA15510 and NA10851 |
| Deletion | 17 | 41724262 | 41981453 | 218711 | no call | no call | Not called | no call | Potential insertion in human genome reference DNA: |
| Deletion | 15 | 32472522 | 32628056 | 149270 | no call | no call | loss | loss | Same CNV genotype for NA15510 and NA10851 |
| Deletion | 3 | 163991649 | 164115393 | 112179 | no call | no call | no call | no call | Potential insertion in human genome reference DNA: |
| Deletion | 7 | 101825756 | 101930685 | 96208 | no call | no call | no call | no call | No BAC clones in region |
| Deletion | 1 | 25318324 | 25417454 | 67848 | no call | no call | no call | no call | False negative WGTP |

Supplementary Table 3: Overlap of CNVs called from NA15510 with Tuzun et al. (2005) for A) 500K EA calls and B) WGTP calls.

A) For the 500K EA platform, 5 Tuzun deletion regions identified by fosmid end sequencing are detectable, having a minimum of 4 SNPs in any given size region. To calculate the median and minimum number of SNPs, a sliding window of the known deletion size was moved across the genomic span defined by Tuzun et al., 2005. All 5 detectable regions are called by 500K EA in either the pair-wise comparison with NA10851 or the population-wide comparison with all 270 HapMap samples. A further 14 deletions contain a median of at least 4 SNPs, and 3 of these are called in pair-wise or population-wide comparisons.

B) For the WGTP platform, 10 deletions identified by fosmid end sequencing should be large enough for BAC array CGH, with a size larger than 50kb.
Four of these 10 regions are called, 2 of the non-called regions contain a CNV in the reference sample, and 2 of the non-called regions are likely artefacts of the fosmid end sequencing method, since these regions harbor duplications or complex CNVs in the HapMap population (see Supplementary Table 8). Only one of these non-called regions may be a true false negative.

Supplementary Table 4: Validation data for CNVs called by WGTP and 500K EA

CNVs called by WGTP

Quantitative PCR validation by SYBR Green real-time PCR (NA10851 vs NA15510) for WGTP

Columns: Region; Chr; Start; End; Sanger CGH Result; Called by Affy; Called by Affy 060308; qPCR validation; Affy result 060308; validation status (0 = not validated, 1 = confirmed, 2 = mapping problem); new Chr; new Start; new End; Forward primer; Reverse primer; Qty NA15510 (rel to 1); SD NA15510

Chr1tp-25F10 1 12757377 12790369 Loss No No Loss 1 1 12757377 12842343 ACAATCTTCTCTCTGGCCTCTG AGGGTGTTTACCCACAAAAATG 0.421 0.099
Chr1tp-6D2 1 103923046 104164023 Loss No No Loss 1 1 103898814 104049621 GGAAGAATATAGATGCCAACCC TCAGCTGAACTGGATCATTTGT 0.798 0.068
Chr1tp-38C8 1 120245520 120395759 Gain No No Gain 1 1 120245520 120395759 GCAGACATATTGCACCCAGATA TTCCAGTGTTTAACGCTTCTGA 1.590 0.276
Chr1tp-36B8 1 141918516 142199768 Gain No No Gain 1 1 141851837 141979099 AGTGGAAGCAGAGACTCCAGAC AAGGAAGCACATAACCTCCAAA 1.207 0.088
Chr1tp-7D2 1 142391276 142548088 Gain Gain Gain Gain Gain 1 1 142391276 142548088 ACTGGCTCTCCTCACAAAGTTC ATAAGGGGGAAGAAAACCAAAA 1.860 0.194
Chr1tp-20E3 1 142678316 142882622 Gain No No Gain 1 1 142678316 142882622 AGGTAGGGGTGATGCTAGTCAA AAAAACAAAACCAGGGACAAGA 1.297 0.069
Chr1tp-7H5 1 145888591 146093292 Gain Gain Gain Gain Gain 1 1 145888591 146093292 CCCAAGGGAGAAAAGAGTAGGT AGGAGAGTTTCTGCAGTCTTGG 1.245 0.049
Chr3tp-19E4 3 19709162 19895852 Gain Gain Gain Gain 1 3 19709162 19895852 CCCTTCTCATCTCTCACCAAAC CATCTAACCAGATGCCATTTCA 1.453 0.087
Chr3tp-14B12 3 20240591 20399098 Gain Gain Gain Gain 1 3 20240591 20399098 TTGCTGGAAATGAAAACAAATG TACAGGACAGTGCTCTGGAAGA 1.329 0.164
Chr3tp-3C7 3 98699482 98904950 Loss No No Loss 1 3 98699482 98904950 GCCATTTGATTCCTAGGTTCTG TATGGGAAAGCCTGACATTTCT 0.862 0.052
Chr3tp-9H6 3 196908420 197078010 Loss No No No change 1 3 196908420 197078010 GACCTAAACAAGATGGCCTGTC AAGTTCCACCTCACGTTGTCTT GGTGTTTCTGATATGCAGGTGA ACAGGGCTTGAAGAAGTTACCA NS NS TCTAGGACTGCGATGGTGTATG GTATCGCCCTGATAGATTCCTG
Chr3tp-10B7 3 196920033 197108192 Loss No No No change 1 3 196920033 197108192 GACCTAAACAAGATGGCCTGTC AAGTTCCACCTCACGTTGTCTT GGTGTTTCTGATATGCAGGTGA ACAGGGCTTGAAGAAGTTACCA NS NS TCTAGGACTGCGATGGTGTATG GTATCGCCCTGATAGATTCCTG
Chr4tp-9A2 4 9000913 9120591 Gain No No Gain 1 4 9000913 9120591 AATCCCTAGAGCCAGGATCTTC AAATTGTAATGCTGCCGAAAGT