Overview of SNP Genotyping • Project Rationale

SNP Genotyping - Overview

Debbie Nickerson
Department of Genome Sciences
University of Washington
[email protected]

Overview of SNP Genotyping
• Project Rationale
• Genotyping Strategies/Technical Leaps
• Data Management/Quality Control

SNP Project Rationale
• Heritability
• Power - Number of Individuals
• Number of SNPs - Candidate Gene, Pathway, Genome (5-10 SNPs, 400 to 1,000, 10K, 500K)
• DNA requirements
• Cost

SNP Genotyping
(Figure: matched vs. mismatched probe and target for the C allele and the T allele; the matched target is G, the mismatched target is A.)
• Allele-Specific Hybridization: matched probe hybridizes, mismatched probe fails to hybridize. Eclipse, Dash, Molecular Beacon, Affymetrix
• Taqman: matched probe is degraded, mismatched probe fails to degrade
• Polymerase Extension (+ddCTP): C is incorporated on the matched target and fails to incorporate on the mismatched target. Fluorescence Polarization, Sequenom
• Oligonucleotide Ligation: matched probes ligate, mismatched probes fail to ligate. SNPlex, ParAllele, Illumina

SNP Typing Formats
• Microtiter Plates - Fluorescence (low scale), e.g. Taqman
  - Good for a few markers, lots of samples
  - PCR prior to genotyping
• Size Analysis by Electrophoresis (medium scale), e.g. SNPlex
  - Intermediate multiplexing reduces costs
  - Genotype directly on genomic DNA: new paradigm for high throughput
• Arrays - Custom or Universal (high scale), e.g. Illumina, ParAllele, Affymetrix
  - Highly multiplexed: 96, 1,500 SNPs and beyond (500K+)

Defining the scale of the genotyping project is key to selecting an approach
Scale: 1,000 individuals
• 5 to 10 SNPs in a candidate gene (many approaches): $6,000 (expensive, ~$0.60 per SNP genotype)
• 48 (to 96) SNPs in a handful of candidate genes: ~$29,000 (~$0.25 to $0.30 per SNP genotype)
• 384 - 1,536 SNPs (cost reductions based on scale): $57,600-122,880 (~$0.08 - $0.15 per SNP genotype)
• 10,000-20,000 SNPs (defined and custom formats): >$250,000 (~$0.03 per SNP genotype)
• 300,000 to 500,000 SNPs (defined format): $800,000 (~$0.002 per SNP genotype)

Many Approaches to Genotype a Handful of SNPs
• PCR of the region prior to SNP genotyping
  - Adds to cost
  - Many use modified primers: the more modified, the higher the cost
• Genotyping with fluorescence-based homogeneous assays (single-tube assay) = 1 SNP/tube
  - Taqman
  - Single base extension: Fluorescence Polarization; Sequenom (Mass Spec)
  - Eclipse
  - Dash
  - Molecular Beacons

Taqman
Genotype Calling - Cluster Analysis

Genotyping by Mass Spectrometry - 24 SNP
(Figure labels: SNP 1252 - T, SNP 1252 - C)

Technological Leap - No advance PCR
• Universal PCR after preparing multiple regions for analysis
• Several are based on a primer-specific reaction on genomic DNA followed by PCR of the ligated products, with different strategies and different readouts: SNPlex, Illumina, ParAllele (Affymetrix)
• Also, reduced representation (Affymetrix): cut with restriction enzyme, then ligate linkers, amplify from the linkers, and follow by chip hybridization to read out

SNPlex Assay - 48 SNPs
(Figure: allele-specific probes carrying a universal PCR priming site, ZipCode1 or ZipCode2, and the allele-specific sequence; a phosphorylated locus-specific probe carrying the locus-specific sequence and a universal PCR priming site; genomic DNA target.)
1. Ligation: ligation product formed (homozygote shown in this case)
2. Clean-up

PCR & ZipChute Hybridization Detection
3. Multiplexed universal PCR (universal PCR primer and biotinylated universal PCR primer)
4. Capture double-stranded DNA on a microtiter plate (streptavidin)
5. Denature double-stranded DNA
6. Wash away one strand
7. ZipChute hybridization
9. Characterize on capillary sequencer

SNPlex Readout
• ZipChute 1 (N), ZipChute 2 (NN), ZipChute 3 (NNN), ... ZipChute n (N(n)): each ZipChute reads out at its own position (Position 1 to Position n)
• n ~ 48/lane
• ~2,000 lanes/day
• ~96,000 genotypes/day

Multiplexed Genotyping - Universal Tag Readouts
• Each locus-specific sequence (Locus 1, Locus 2, ...) carries its own tag sequence (Tag 1, Tag 2, ...), which hybridizes to the complementary tag (cTag) on the substrate
• Substrate: bead or chip
  - Bead array: Illumina
  - Chip array: ParAllele (Affymetrix)
• Multiplex ~96 - 20,000 SNPs
• Not dependent on primary PCR

Arrays - High Density Genotyping: Thousands of SNPs and Beyond
Sentrix™ Platform
• "Bead" Arrays - Illumina
  - Manufactured by self-assembly
  - Beads identified by decoding
• Sentrix™ 96 Multi-array Matrix matches standard microtiter plates (96 - 1,536 SNPs/well)
• Up to ~140,000 assays per matrix

Fluorescent Image of BeadArray
• ~3 micron diameter beads
• ~5 micron center-to-center
• ~50,000 features on a ~1.5 mm diameter bundle
• Currently: up to 1,536 SNPs genotyped per bundle, with at least 30 beads per code (many internal replicates)

Illumina Assay - 3 Primers per SNP
(Figure: two allele-specific primers, each with a universal forward sequence (1 or 2) and an allele-specific sequence; one locus-specific primer with the locus-specific sequence, the IllumiCode™ sequence tag, and the universal reverse sequence; 1-20 nt gap between the allele-specific and locus-specific primers on the genomic DNA template.)

GoldenGate™ Assay
• Allele-specific extension and ligation on the genomic DNA template (polymerase, ligase)
• Amplification: PCR with universal primers (Cy3 universal primer 1, Cy5 universal primer 2, common primer P3)
• Hybridization to universal IllumiCode™ address
(Figure: illumiCode #561, #217, and #1024 reporting genotypes A/A, T/T, and C/T.)

BeadArray Reader
• Confocal laser scanning system
• Resolution, 0.8 micron
• Two lasers, 532 and 635 nm
  - Supports Cy3 & Cy5 imaging
• Sentrix Arrays (96 bundle) and Slides for 100k fixed formats

Process Controls
• Mismatch
• High AT/GC
• Gender
• Gap
• First Hyb
• Second Hyb
• Contamination

Illumina Readout for Sentrix Array
> 1,000 SNPs assayed on 96 samples

ParAllele - Defined and Custom Formats
• Intermediate strategy
• Multiplex ~20,000 SNPs
• Affymetrix readout: universal arrays

ParAllele Technology (MIP)
Molecular Inversion Probes (MIP)

Affymetrix's Chip

Whole Genome Association Strategies
Two platforms available, with different designs:
• Affymetrix
• Illumina

500K: Content Optimized SNP Selection (Affymetrix GeneChip Mapping 500K Array Set)
• Initial selection: 48 people
  - 2.2M SNPs from public sources & Perlegen
  - 25 million genotypes
  - 16 each Caucasian, African, Asian
  - All HapMap samples
• Maximize performance: second selection over 400 people
  - 270 HapMap samples
  - 130 diversity samples
  - Accuracy: HW, Mendel error, reproducibility
  - Call rates
• Maximize information content
  - Prioritize SNPs based on LD & HapMap (Broad Institute)
(Figure: selection funnel. ~2,200,000 SNPs, screened on 48 individuals for call rate and concordance, give ~650K SNPs; these are screened on 400 samples for call rate and accuracy, then by LD, to give 500K SNPs.)

80% genome coverage of Mapping 500K
• >500K SNPs
• Pairwise r² analysis for common SNPs (MAF > 0.05)
• Robust coverage across populations at r² = 0.8
  - CEPH, Asian ~66%
  - Yoruba ~45%
• 2 & 3 marker predictors (multimarker) further increase coverage

Mapping 500K Set
• 500K run on 270 HapMap samples
  - 2 array set
• Performance
  - 93-98% call rate range (>95% average)
  - >99.5% concordance with HapMap genotypes, 99.9% reproducibility
• SNP lists, annotation and genotype data available without restriction at Affymetrix.com

Illumina - Infinium I & II

Infinium II Assay
• 10K - 300K
• Single base extension
• Two haptens/colors (A-DNP, C-Bio; signal green/red)
• Whole genome amplified (WGA) DNA target

HumanHap-1 Genotyping BeadChip Content Strategy
Maximize coverage of human variation by choosing tag SNPs to uniquely identify haplotypes.
Tag SNP selection process:
1. Examine HapMap Phase I SNPs with MAF > 0.05 in CEU
2. Bin SNPs in high LD with one another using ldSelect (Carlson, et al. 2004)
3. Select tag SNP with highest design score for each bin.

HumanHap300 Content
• Tag SNPs
  - r² ≥ 0.80 for bins containing SNPs within 10 kb of genes or in evolutionarily conserved regions (ECRs)
  - r² ≥ 0.70 for bins containing SNPs outside of genes or ECRs
• Additional content
  - ~8,000 nsSNPs
  - ~1,500 tag SNPs selected from high density SNP data in the MHC region
• Total 317,503 loci

HumanHap300 Data Quality
127 samples, 25 trios, 15 replicates
• Call rate: 99.93%
• Reproducibility: >99.99%
• Mendelian inconsistencies: 0.035%
• Concordance with HapMap data: 99.69%

HumanHap300 Genomic Coverage by Population
(Plot: coverage of Phase I+II data against max r².)
• CEU: mean 0.87, median 0.97
• CHB+JPT: mean 0.80, median 0.94
• YRI: mean 0.57, median 0.55

HumanHap500 Content Strategy
• Analysis of full HapMap data set (Phase I + II) using HumanHap300 SNP list
• Fill in regions of low LD requiring higher density of tag SNPs
• Content strategy
  - r² ≥ 0.80 for bins containing SNPs within 10 kb of genes or in evolutionarily conserved regions (ECRs) in CEU
  - r² ≥ 0.70 for bins containing SNPs outside of genes or ECRs in CEU
  - r² ≥ 0.80 for large bins (≥ 3 SNPs) in CHB+JPT population
  - r² ≥ 0.70 for large bins (≥ 5 SNPs) in YRI population

Preliminary HumanHap500 Genomic Coverage by Population
(Plot: coverage of Phase I+II loci against max r².)
• CEU: mean 0.95, median 1.0
• CHB+JPT: mean 0.93, median 1.0
• YRI: mean 0.75, median 0.88

Data Quality Control
• Estimating Error Rates
• Hardy Weinberg Equilibrium
• Frequency Analysis
• Missing Data

Measuring Error Rates
• Genotype replicate samples
• Error rates generally <<1%
• Error rates are SNP specific

             Rep 1
             CC   CT   TT
Rep 2   CC   24    1    0
        CT    0   50    0
        TT    0    0   25

Replicate samples
• Replicates can also detect sample handling errors
  - Wrong plate
  - Plate rotation
• Absolute number of replicates is more important than percentage
  - E.g.
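The two quality-control checks named above, estimating an error rate from replicate genotypes and testing Hardy-Weinberg equilibrium, can be sketched in a few lines of Python. This is a minimal illustration and not the method used in the talk: the function names are hypothetical, the table orientation (rows are Rep 2 calls, columns are Rep 1 calls) is an assumption, and the equilibrium test shown is a plain one-degree-of-freedom chi-square statistic, chosen here for simplicity.

```python
# Replicate concordance table from the slide.
# Assumption: rows are Rep 2 calls, columns are Rep 1 calls, order CC, CT, TT.
REPLICATE_TABLE = [
    [24, 1, 0],
    [0, 50, 0],
    [0, 0, 25],
]


def discordance_rate(table):
    """Fraction of replicate pairs whose two genotype calls disagree."""
    total = sum(sum(row) for row in table)
    concordant = sum(table[i][i] for i in range(len(table)))
    return (total - concordant) / total


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic comparing observed genotype counts with the
    counts expected under Hardy-Weinberg equilibrium (1 degree of freedom)."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_hom1, n_het, n_hom2]
    # Skip classes with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Genotype counts for Rep 2 are the row sums of the table: CC, CT, TT.
rep2_counts = [sum(row) for row in REPLICATE_TABLE]

print(f"Replicate discordance: {discordance_rate(REPLICATE_TABLE):.2%}")
print(f"HWE chi-square (Rep 2 counts): {hwe_chi_square(*rep2_counts):.2f}")
```

In the table, one of the 100 replicate pairs is discordant (a CC call in one replicate against a CT call in the other), so the estimated discordance for this SNP is 1%. A chi-square value above about 3.84 would correspond to p < 0.05 at one degree of freedom; with small counts an exact test is usually preferred.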
