
SNP Genotyping - Overview
Debbie Nickerson
Department of Genome Sciences, University of Washington

Overview of SNP Genotyping
• Project Rationale
• Genotyping Strategies/Technical Leaps
• Data Management/Quality Control

SNP Genotyping
[Diagram: assay chemistries, each shown with a matched and a mismatched probe and target (C allele and T allele; target G or A)]
• Allele-Specific Hybridization: hybridize / fail to hybridize (Eclipse, Dash, Molecular Beacon, Affymetrix)
• Taqman: degrade / fail to degrade
• Polymerase Extension (+ddCTP): C incorporated / fails to incorporate (Fluorescence Polarization, Sequenom)
• Oligonucleotide Ligation: ligate / fail to ligate (SNPlex, Parallele, Illumina)

SNP Project Rationale
• Heritability
• Power - Number of Individuals
• Number of SNPs - Candidate Gene, Pathway, Genome
  – 5-10 SNPs, 400 to 1,000, 10K, 500K
• DNA requirements
• Cost

SNP Typing Formats
Scale: Low
  Microtiter Plates - Fluorescence
  eg. Taqman - Good for a few markers - lots of samples - PCR prior to genotyping
Scale: Medium
  Size Analysis by Electrophoresis
  eg. SNPlex - Intermediate Multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput
Scale: High
  Arrays - Custom or Universal
  eg. Illumina, ParAllele, Affymetrix - Highly multiplexed - 96, 1,500 SNPs and beyond (500K+)

Defining the scale of the genotyping project is key to selecting an approach:
1000 individuals

  SNPs                                                          Total cost          Cost per SNP/genotype
  5 to 10 SNPs in a candidate gene - Many approaches            $6,000              ~0.60 (expensive)
  48 (to 96) SNPs in a handful of candidate genes               $~29,000            ~0.25 to 0.30
  384 - 1,536 SNPs - cost reductions based on scale             $57,600-122,880     ~0.08 - 0.15
  10,000-20,000 SNPs - defined and custom formats               $>250,000           ~0.03
  300,000 to 500,000 SNPs - defined format                      $800,000            ~0.002

Many Approaches to Genotype a Handful of SNPs
PCR region prior to SNP genotyping
  - Adds to cost
  - Many use modified primers - the more modified, the higher the cost
Genotyping with fluorescence-based homogeneous assays (single-tube assay) = 1 SNP/tube
• Taqman
• Single base extension - Fluorescence Polarization; Sequenom - Mass Spec
• Eclipse
• Dash
• Molecular Beacons

Taqman
[Figure]

Genotype Calling - Cluster Analysis
[Figure]

Genotyping by Mass Spectrometry - 24 SNP
[Figure: mass spectra labeled "SNP 1252 - T" and "SNP 1252 - C"]

Technological Leap - No advance PCR
Universal PCR after preparing multiple regions for analysis
  - Several based on primer specific on genomic DNA followed by PCR of the ligated products - different strategies and different readouts.
    SNPlex, Illumina, Parallele (Affymetrix)
Also, reduced representation - Affymetrix
  - cut with restriction enzyme, then ligate linkers and amplify from linkers and follow by chip hybridization to read out.

SNPlex Assay - 48 SNPs
[Diagram labels: Universal PCR Priming site, Allele Specific Sequence, ZipCode1, ZipCode2, Locus Specific Sequence, Genomic DNA Target, Ligation Product Formed (Homozygote shown in this case)]
1. Ligation
2. Clean-up

PCR & ZipChute Hybridization
3. Multiplexed Universal PCR (Univ. PCR Primer; Biotin Univ. PCR Primer)
4. Capture double stranded DNA - microtiter plate (Streptavidin)
5. Denature double stranded DNA
6. Wash away one strand
7. Zip Chute Hybridization

Detection
9. Characterize on Capillary Sequencer
[Figure labels: SNP 1, SNP 2]

SNPlex Readout
[Diagram: Zipchute1 (N) C, Position 1; Zipchute2 (NN) A, Position 2; Zipchute3 (NNN) T, Position 3; ZipChute n (N(n)) T, Position n]
n ~ 48/lane
~2000 lanes/day
~96,000 genotypes/day

Multiplexed Genotyping - Universal Tag Readouts
[Diagram labels: alleles C, T, A, G; Locus 1 Specific Sequence; Locus 2 Specific Sequence; Tag1 sequence, cTag1 sequence; Tag2 sequence, cTag2 sequence; Substrate - Bead or Chip; Bead Array; Chip Array; Tag 1 to Tag 4]
Multiplex ~96 - 20,000 SNPs
Illumina, ParAllele, Affymetrix
Not dependent on primary PCR

Arrays - High Density Genotyping
Thousands of SNPs and Beyond

Sentrix™ Platform
• "Bead" Arrays - Illumina
  – Manufactured by self-assembly
  – Beads identified by decoding
Sentrix™ 96 Multi-array Matrix matches standard microtiter plates (96 - 1536 SNPs/well)
Up to ~140,000 assays per matrix

Fluorescent Image of BeadArray
~3 micron diameter beads
~5 micron center-to-center
~50,000 features on ~1.5 mm diameter bundle
Currently: up to 1,536 SNPs genotyped per bundle - at least 30 beads per code - many internal replicates

Illumina Assay - 3 Primers per SNP
[Diagram labels: Universal forward Sequences (1, 2); Allele specific Sequence; (1-20 nt gap); Locus specific Sequence; Illumicode™ Sequence tag; Universal reverse sequence; Genomic DNA template; SNP]

GoldenGate™ Assay
[Diagram: Allele-Specific Extension and Ligation (Polymerase, Ligase) on Genomic DNA Template ([T/C], [T/A]); Amplification: PCR with Common Universal Primers (Primer 1 Cy3, Primer 2 Cy5, Primer P3) across Universal PCR Sequences 1, 2, 3' and illumiCode' Address; Hybridization to Universal IllumiCode™. Example readout: illumiCode #561 A/A, #217 T/T, #1024 C/T]

BeadArray Reader
• Confocal laser scanning system
• Resolution, 0.8 micron
• Two lasers 532, 635 nm
  – Supports Cy3 & Cy5 imaging
• Sentrix Arrays (96 bundle) and Slides for 100k fixed formats

Process Controls
[Figure labels: Mismatch, High AT/GC, Gender, Gap, First Hyb, Second Hyb, Contamination]

Illumina Readout for Sentrix Array
> 1,000 SNPs Assayed on 96 Samples

Multiplexed Genotyping - Universal Tag Readouts
[Diagram repeated from the earlier slide of the same title]

Parallele - Defined and Custom Formats
- Intermediate Strategy
- Multiplex ~ 20,000 SNPs
- Affymetrix readout Universal Arrays

Parallele Technology (MIP)
[Figure]

Molecular Inversion Probes (MIP)
[Figure]

Affymetrix's Chip
[Figure]

Whole Genome Association Strategies
Two Platforms Available, Different Designs
- Affymetrix
- Illumina

500K: Content Optimized SNP Selection
Affymetrix GeneChip Mapping 500K Array Set
• Initial Selection: 48 people
  – 2.2M SNPs From Public & Perlegen
  – 25 million genotypes
  – 16 each Caucasian, African, Asian
  – All HapMap samples
• Maximize performance: Second selection over 400 people
  – 270 HapMap Samples
  – 130 diversity samples
  – Accuracy
    • HW, Mendel error, reproducibility
  – Call rates
• Maximize information content:
  – Prioritize SNPs based on LD & HapMap (Broad Institute)
[Flow diagram: ~2,200,000 SNPs -> 48 individuals (call rate, concordance) -> ~650K SNPs -> 400 samples (call rate, accuracy) -> LD -> 500K SNPs]

80% genome coverage of Mapping 500K
• 500K run on 270 HapMap samples
• Pairwise r² analysis for common SNPs (MAF>0.05)
• Robust coverage across populations, r²=0.8
  – CEPH, Asian ~66%
  – Yoruba ~45%
• 2 & 3 marker predictors (multimarker) further increase coverage

Mapping 500K Set
• >500K SNPs
  – 2 array set
• Performance
  – 93-98% call rate range (>95% average)
  – >99.5% concordance with HapMap Genotypes, 99.9% reproducibility
• SNP lists, annotation and genotype data available without restriction at Affymetrix.com

Illumina - Infinium I & II

Infinium II Assay
10K - 300K
Single Base Extension
Two haptens/colors
Whole genome amplified DNA
[Diagram: beads carrying probes for SNP1, SNP2, SNP3 ... hybridized to WGA target; extension with A-DNP or C-Bio; Signal Green/Red]

HumanHap-1 Genotyping BeadChip Content Strategy
Maximize coverage of human variation by choosing tag SNPs to uniquely identify haplotypes.
Tag SNP selection process:
1. Examine HapMap Phase I SNPs with MAF > 0.05 in CEU
2. Bin SNPs in high LD with one another using ldSelect (Carlson, et al. 2004)
3. Select tag SNP with highest design score for each bin.

HumanHap300 Content
• Tag SNPs
  – r² ≥ 0.80 for bins containing SNPs within 10kb of genes or in evolutionarily conserved regions (ECRs)
  – r² ≥ 0.70 for bins containing SNPs outside of genes or ECRs.
• Additional Content
  – ~8,000 nsSNPs
  – ~1,500 tag SNPs selected from high density SNP data in the MHC region
• Total 317,503 loci

HumanHap300 Data Quality
127 samples, 25 trios, 15 replicates

  Parameter                        Percent
  Call rate                        99.93%
  Reproducibility                  >99.99%
  Mendelian Inconsistencies        0.035%
  Concordance with HapMap Data     99.69%

HumanHap300 Genomic Coverage by Population
[Plot: Coverage of Phase I+II Data (0.0 to 1.0) against Max r² threshold (>0 to >0.9)]
  CEU, mean 0.87, median 0.97
  CHB+JPT, mean 0.80, median 0.94
  YRI, mean 0.57, median 0.55

HumanHap500 Content Strategy
– Analysis of full HapMap data set (Phase I + II) using HumanHap300 SNP list
– Fill in regions of low LD requiring higher density of tag SNPs
– Content Strategy
  • r² ≥ 0.80 for bins containing SNPs within 10kb of genes or in evolutionarily conserved regions (ECRs) in CEU
  • r² ≥ 0.70 for bins containing SNPs outside of genes or ECRs in CEU
  • r² ≥ 0.80 for large bins (≥ 3 SNPs) in CHB+JPT population
  • r² ≥ 0.70 for large bins (≥ 5 SNPs) in YRI population

Preliminary HumanHap500 Genomic Coverage by Population
[Plot: Coverage of Phase I+II Loci (0.0 to 1.0) against Max r² threshold (>0 to >0.9)]
  CEU, mean 0.95, median 1.0
  CHB+JPT, mean 0.93, median 1.0
  YRI, mean 0.75, median 0.88

Data Quality Control
• Estimating Error Rates
• Hardy Weinberg Equilibrium
• Frequency Analysis
• Missing Data

Measuring Error Rates
• Genotype replicate samples
• Error rates generally <<1%
• Error rates are SNP specific

                 Rep 1
                 CC    CT    TT
  Rep 2   CC     24     1     0
          CT      0    50     0
          TT      0     0    25

Measuring Error Rates
• Genotype replicate samples
• Absolute number of replicates is more important than percentage
[Replicate table repeated from the previous slide]

Replicate samples
• Replicates can also detect sample handling errors
  – Wrong plate
  – Plate rotation
  – E.g.
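The replicate table and the Hardy Weinberg check listed under Data Quality Control can be worked through numerically. The following is a minimal Python sketch, not the method used in the slides: the function names are illustrative, the chi-square form of the Hardy-Weinberg test is one common choice that the slides do not specify, and using the Rep 2 row totals as genotype counts is an assumption made only to have numbers to feed in.

```python
# Replicate concordance table copied from the slide.
# Rows are the Rep 2 call, columns are the Rep 1 call, order CC, CT, TT.
REPLICATE_TABLE = [
    [24, 1, 0],
    [0, 50, 0],
    [0, 0, 25],
]


def discordance_rate(table):
    """Fraction of replicate pairs whose two genotype calls disagree."""
    total = sum(sum(row) for row in table)
    concordant = sum(table[i][i] for i in range(len(table)))
    return (total - concordant) / total


def hwe_chi_square(n_cc, n_ct, n_tt):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (hypothetical helper, 1 degree of freedom)."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)  # frequency of the C allele
    q = 1.0 - p                      # frequency of the T allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ct, n_tt)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Assumption: treat the Rep 2 row totals as the genotype counts for this SNP.
rep2_counts = [sum(row) for row in REPLICATE_TABLE]

print(discordance_rate(REPLICATE_TABLE))  # 0.01 (1 discordant pair out of 100)
print(hwe_chi_square(*rep2_counts))       # 0.0 (counts 25/50/25 match expectation)
```

One discordant pair in 100 gives a 1% discordance estimate, but with so few replicate pairs a single miscall moves the estimate by a full percentage point, which is consistent with the slide's point that the absolute number of replicates matters more than the percentage.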