
SOLiD Overview

Yang Wang, Staff Field Bioinformatics Scientist, Life Technologies

April 14, UMASS

Part I: SOLiD System Overview
• SOLiD workflow and system components
• Library preparation
• On-bead chemistry
• Data analysis workflow

Part II: Color Concept and Mapping
• Color space and 2-base encoding
• Quality values and filtering
• Mapping algorithm and considerations

© 2009 Applied Biosystems

SOLiD Workflow

[Figure: workflow diagram, starting from RNA or DNA]
• Application-specific sample preparation
• Emulsion PCR & slide preparation
• Sequencing chemistry
• Imaging & basecalling
• Application-specific data analysis

SOLiD Instrument

[Figure: instrument with labeled components]
• Stage with dual flow cell
• Monitor and keyboard to onboard XP box
• Reagent handling
• Waste containers
• Computer cluster

System Hardware Overview

1Gbit Switch

Instrument Controller/XP Box

Head Node – Dell 2950

Compute Nodes – Dell 1950

MD1000 – secondary storage

Slide Is Placed On SOLiD Instrument

[Figure labels: Slide; Reagent handling; Stage with dual flow cell]

Slide Configurations

Configuration    Panels
Spot (1/4)       426 panels/spot
Spot (1/1)       2357 panels
Spot (1/8)       186 panels/spot

• 1 image per panel/ligation
• Less real estate for multiple samples

Bead Deposition On Slide

3’-end modification

Beads attached to glass surface in a random array

Section Outline - Overview

• SOLiD workflow and system components
• Library preparation
• On-bead chemistry
• Data analysis workflow

Create Library of DNA Fragments

2 methods for DNA

Fragment Library (Targeted Resequencing, ChIP-Seq, Small RNA, Multiplexing)

P1 adapter DNA fragment P2 adapter

60-90 Bases

Mate-pair Library (, Structural Variation)

P1 adapter F3 Tag Internal Adapter R3 Tag P2 adapter

Fragment Library Creation

• Complex sample, e.g. genomic DNA, TAG library, concatenated PCR products
• Fragment sample, randomly or targeted, e.g. sonication, mechanical, enzymatic digestion
• Ligate P1 and P2 adaptors

P1 adapter | DNA fragment | P2 adapter

60-90 bases

Fragment Library Representation

P1 adapter | DNA fragment | P2 adapter
F3 tag: 50 bp; T marks the last base of the P1 adapter

Colorspace read output in FASTA format

>1_88_1830_F3
T2103112003130213233110321
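As a rough illustration of the record shown above, the sketch below splits a colorspace FASTA header and sequence line into fields. Reading the header as panel, x, y, and tag is an assumption about the naming convention, and `parse_csfasta_record` is a hypothetical helper, not part of any SOLiD tool.

```python
from typing import NamedTuple


class ColorRead(NamedTuple):
    panel: int        # assumed meaning of the first header field
    x: int            # assumed bead x coordinate
    y: int            # assumed bead y coordinate
    tag: str          # e.g. "F3"
    start_base: str   # known base that precedes the first color
    colors: list      # color calls as integers 0-3


def parse_csfasta_record(header: str, sequence: str) -> ColorRead:
    """Parse one header line and one sequence line of a colorspace read."""
    # Tolerate a stray space such as ">1_88_1830_ F3".
    fields = header.lstrip(">").replace(" ", "").split("_")
    panel, x, y, tag = fields
    return ColorRead(int(panel), int(x), int(y), tag,
                     sequence[0], [int(c) for c in sequence[1:]])


read = parse_csfasta_record(">1_88_1830_F3", "T2103112003130213233110321")
print(read.tag, read.start_base, len(read.colors))
```

The leading base is kept separate from the colors because it is not a color call; it is the known base from which decoding would start.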

Fragment Multiplexed Library

• Complex sample, e.g. genomic DNA, TAG library, concatenated PCR products
• Fragment sample, randomly or targeted, e.g. sonication, mechanical, enzymatic digestion
• Ligate P1 and modified P2 adaptors

P1 adapter | DNA fragment | Internal adapter | Barcode | P2 adapter

60-90 bases

Fragment Multiplexed Libraries

P1 adapter | DNA fragment | Internal adapter | Barcode 1 | P2 adapter
P1 adapter | DNA fragment | Internal adapter | Barcode 2 | P2 adapter
...
P1 adapter | DNA fragment | Internal adapter | Barcode 16 | P2 adapter

The F3 tag (starting with T) reads the DNA fragment; the BC tag (starting with G) reads the barcode.

>1_88_1830_F3
T2103112003130213233110321
>1_88_1830_BC
G00032

Mate-pair Library Creation

• Complex sample
• Randomly fragment sample
• Size select (e.g. 1, 2, 3, 5, 10 kb)
• Ligate internal adapters (IA)
• Circularize

Mate-pair Library Creation - continued

• Cleave the nicked circle, keeping 50 bp of sample DNA on each side of the IA
• Ligate P1 and P2

P1 adapter | DNA sample (F3 tag, 50 bp) | Internal adapter | DNA sample (R3 tag, 50 bp) | P2 adapter

>1_88_1830_F3
T2103112003130213233110321
>1_88_1830_R3
G3211312320130023232012112

Emulsion PCR

Mix PCR aqueous phase into a water-in-oil emulsion and carry out emulsion PCR.

Emulsion PCR - Components

Library Template

+

Primers P1<

P1-coupled beads

Polymerase Enzyme

Emulsion PCR - Individual Bead

P1-coupled beads

1) Template anneals to P1

2) Polymerase extends from P1

3) Complementary sequence is extended off bead surface

4) Template dissociates

Emulsion PCR – Post Amplification

Beads with no product

Bead contains ~30K amplified products from original single strand molecule

Mate-pair Library F3 - R3 orientation

[Figure: double-stranded mate-pair insert with 5' and 3' ends labeled; R3 and F3 marked on each strand]

Note strand and orientation of the tags per mate-pair library construction
- Both F3 and R3 are on the same strand
- R3 is upstream of F3

Section Outline - Overview

• SOLiD workflow and system components
• Library preparation
• On-bead chemistry
• Data analysis workflow

SOLiD Color Space

• SOLiD uses 2-base color encoding of data (2BE)

Collect color image → Identify beads → Identify bead color

Record colors for each bead over consecutive cycles

[Figure: color space calls shown alongside the base space sequence A C G G T C G T C G T G T G C G T]

Properties Of The Probes

[Figure: octamer probe T C n n n z z z, labeled with the 3’ ligation site, the 1st and 2nd bases, the cleavage site, and the fluorescent dye]

• 1,024 octamer probes (4^5)
• 4 dyes, 4 dinucleotides per dye, 256 probes per dye
• Each dinucleotide is encoded by a color
• n = degenerate bases, z = universal bases
• Dyes: CY5, TXR, CY3, FAM

SOLiD Chemistry System 4-color Ligation

Step 1: making beads single stranded
[Figure: Initialize. 1 µm bead carrying the P1 adapter and template sequence, 5’ to 3’]

Step 2: the ligation reaction process
[Figure: universal seq primer annealed to the P1 adapter; ligase joins one of the competing probes (GG, TA, AT, TC, each followed by n n n z z z) to the 5’ phosphate of the primer]

Image
[Figure: the ligated probe reports template positions 1,2]

Cleavage/phosphate generation
[Figure: the probe is cleaved, regenerating a 5’ phosphate (p5’)]

Ligation (2nd cycle)
[Figure: a second probe is ligated to the extended primer]

Visualization (2nd cycle)
[Figure: template positions 1,2 and 6,7 have been interrogated]

Cleavage (2nd cycle)

Interrogates every 5th base
[Figure: successive ligation cycles report dibases at positions 1,2; 6,7; 11,12; and so on along the template]

Reset
[Figure: reset and primer annealing]

1st cycle after reset
[Figure: universal seq primer n-1 is annealed; the first ligation reports template positions 0,1]

2nd round
[Figure: with primer n-1, ligation cycles report dibases at positions 0,1; 5,6; 10,11; 15,16; and so on]

Collection Order Is Not Sequence Order

Section Outline - Overview

• SOLiD workflow and library creation
• On-bead chemistry
• Data analysis workflow

SOLiD Data Analysis Overview

SOLiD Workflow Overview On-Instrument

[Figure: on-instrument workflow components]
• Instrument Control Software (ICS): set up run details on XP box
• Job Manager: executes workflows, initiates pipeline execution, sends jobs to cluster (primary analysis, secondary analysis)
• SOLiD DB (JobManager relational DB): stores SOLiD run workflow information (insert workflow, query/update job status)
• PBS resource manager: queues, job submission
• SETS (SOLiD Experiment Tracking Software): results view, run tracking, view real-time results

Primary Analysis – Images To Reads

[Figure steps: Align to focal map; Bead finding; Barcoding/Multiplexing]

Primary Analysis Outputs
- Colorspace reads in FASTA format
- Quality value scores
- Panel and bead statistics

Secondary and Application Analysis

Reads-to-results: processing per library

[Figure: analysis pipeline, run on-instrument or in BioScope]
• Input: Fasta/Qual files
• Filtering pipeline (optional)
• Mapping/Pairing, producing GFF/SAM mapping files
• Application-specific analysis: DiBayes (SNP), Structural Variation, Whole Transcriptome
• Output: application-specific files

Auto-export and Offline Analysis

New features
• Job Manager and BioScope mediated
• Configured from on-instrument software package, SETS
• BioScope will launch the remote analysis job when run completes

[Figure: On-instrument: Image acquisition → Primary analysis → Auto-export. Offline analysis: BioScope]

Additional Tertiary, Application Analysis Tools

http://info.appliedbiosystems.com/solidsoftwarecommunity

Available tools include (among others):
• Small RNA Pipeline
• SOLiD GFF Conversion Tool
• SOLiD Base QV Tool

Color Concept and Mapping

• Color space and 2-base encoding
• Quality values and filtering
• Mapping algorithm and considerations

What Is Color Space?

• Capillary electrophoresis uses single-base color encoding of data

Collect color image → Identify peaks → Identify peak colors → Convert to base calls

[Figure: color space trace converted to base space]

SOLiD Color Space

• SOLiD uses 2-base color encoding of data (2BE)

Collect color image → Identify beads → Identify bead color

Record colors for each bead over consecutive cycles

[Figure: color space calls shown alongside the base space sequence A C G G T C G T C G T G T G C G T]

Properties Of 2-Base Encoding (2BE)

[Figure: color code matrix indexed by first base and second base]

    1 3 1 3 1 3 2 3
5’-A C G T A C G A T-3’
3’-T G C A T G C T A-5’
    1 3 1 3 1 3 2 3

• Two dibases that agree in just one base have different colors
  • color(AC) ≠ color(AG) ≠ color(AT) ≠ color(AA)
• Two dibases that differ in both bases can have the same color
  • color(AC) = color(GT) and color(CG) = color(AT)
• A dibase and its reverse have the same color
  • color(AC) = color(CA), color(GT) = color(TG)
• Repeated-base dibases have the same color
  • color(AA) = color(CC) = color(GG) = color(TT)

“Valid” and “Invalid” Adjacent Color Substitutions

• “Invalid” changes are inconsistent with a SNP and are likely sequencing errors
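The properties above can be reproduced with a small sketch. It assumes the color of a dibase equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), which matches the colors printed on the slide; the function names are illustrative only.

```python
BASES = "ACGT"  # assumed codes: A=0, C=1, G=2, T=3; color = XOR of the two codes


def encode(bases: str) -> list:
    """Translate a base sequence into its list of dibase colors."""
    codes = [BASES.index(b) for b in bases]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base: str, colors: list) -> str:
    """Rebuild the base sequence from a known first base and the colors."""
    code = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        code ^= color
        out.append(BASES[code])
    return "".join(out)


def is_valid_adjacent(ref_pair, read_pair) -> bool:
    """True when two adjacent color changes are consistent with one changed base."""
    changed_both = ref_pair[0] != read_pair[0] and ref_pair[1] != read_pair[1]
    same_sum = (ref_pair[0] ^ ref_pair[1]) == (read_pair[0] ^ read_pair[1])
    return changed_both and same_sum


print(encode("ACGTACGAT"))   # top strand from the slide
print(encode("TGCATGCTA"))   # bottom strand, written left to right
print(decode("A", [1, 3, 1, 3, 1, 3, 2, 3]))
```

Because decoding chains every color onto the previous base, a single wrong color changes all downstream bases, which is one reason mapping is done in color space rather than after translation.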

Outline

• Color space and 2-base encoding
• Quality values and filtering
• Mapping algorithm and considerations

Quality Value (QV) For Color Call

• A score calculated from the probability of an error in that color call
• Similar to those generated by phred and the KB Basecaller for capillary electrophoresis sequencing

$$q = -10 \log_{10} p$$

where p = probability of color call error

• A QV score of 10 represents a 10% error rate, whereas a QV score of 20 represents a 1% error rate
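The relationship between QV and error probability can be checked with a short sketch; the function names are illustrative only.

```python
import math


def qv_from_error_prob(p: float) -> float:
    """q = -10 * log10(p)"""
    return -10.0 * math.log10(p)


def error_prob_from_qv(q: float) -> float:
    """Inverse of the definition: p = 10^(-q/10)"""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30):
    print(q, error_prob_from_qv(q))
```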

QVs – How Are They Generated?

• Use angle of vector / intensity of color and a lookup table (pre-computed from training data sets) to predict QV

• Each color call and QV is computed independently

Filtering By QV - Two Choices

• Filter
  • Filter out reads based on QV values and patterns
  • Reduces total number of reads and raw error rate
  • Increases % of reads mapped
  • Loses some good alignments

• Don’t filter
  • Filter the data by mapping
  • Low quality reads won’t map

Outline

• Color space and 2-base encoding
• Quality values and filtering
• Mapping algorithm and considerations

Mapping Algorithm

• Challenge:
  • A small word size is needed for continuous word searches in short reads. This is computationally intensive and slow.

• Our Approach:
  • Use discontinuous word patterns
  • Allows faster searching and is guaranteed to find all hits up to a certain number of mismatches

Discontinuous Words

• Continuous words: searching for a perfect alignment, 8/8 bases (word size 8, e.g. used by BLAST)

ATTTTTTGGGTAGCCCCTTGGATGAGT
       ||||||||
     AGGGGTAGCCTGATGATGGT

• Discontinuous words: searching 8/18 matches (effective word size is also 8)

ATTTTTTGGGTAGCCCCTTGGATGAGT
     ||     ||     ||||
     TTGACCGGCATGGGGGAT
     110000011000001111
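The discontinuous word idea can be sketched as a masked comparison: only positions where the mask holds a 1 must match. The sequences and mask below come from the slide; `word_matches` is a hypothetical helper.

```python
def word_matches(query: str, target: str, mask: str) -> bool:
    """Compare only the positions where the mask holds a '1'."""
    return all(q == t for q, t, m in zip(query, target, mask) if m == "1")


MASK = "110000011000001111"            # 8 checked positions out of 18
QUERY = "TTGACCGGCATGGGGGAT"
TARGET = "ATTTTTTGGGTAGCCCCTTGGATGAGT"

# Slide the 18-long window along the target and keep offsets that match.
hits = [i for i in range(len(TARGET) - len(MASK) + 1)
        if word_matches(QUERY, TARGET[i:i + len(MASK)], MASK)]
print(hits)
```

A real mapper would index the masked words of the reference in a hash table instead of scanning every offset; the scan here only keeps the example short.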

Mapping to Reference With Discontinuous Words

• For a read length of 15, we can find all alignments with 1 mismatch (15_1) using discontinuous words in the 3 schemas of word size 10

Schema_15_1
111111111100000
000001111111111
111110000011111

[Figure: read 32103133 3122013 placed against Ref 002321031332122013220 under each schema; the matching word is found and then extended]

• Using continuous words, word size must be at most 7 to find all alignments with 1 mismatch
  • 40 times slower than the three schemas above
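A minimal sketch of why the three schemas suffice: for any single mismatch position, at least one schema ignores that position, so its word still matches exactly. The reference and read are taken from the slide, joining the read across the gap shown in the figure (an assumption); the function names are illustrative.

```python
from itertools import combinations

SCHEMAS_15_1 = ["111111111100000", "000001111111111", "111110000011111"]
REF = "002321031332122013220"
READ = "321031333122013"   # 15 colors, read from the slide figure


def schemas_cover(schemas, read_len, max_mm):
    """Check that any set of max_mm mismatch positions is ignored by some schema."""
    return all(
        any(all(s[p] == "0" for p in positions) for s in schemas)
        for positions in combinations(range(read_len), max_mm)
    )


def find_hits(ref, read, schemas, max_mm):
    """Seed with each schema, then keep offsets with at most max_mm mismatches."""
    hits = set()
    for schema in schemas:
        for off in range(len(ref) - len(read) + 1):
            window = ref[off:off + len(read)]
            seeded = all(r == w for r, w, s in zip(read, window, schema) if s == "1")
            if seeded and sum(r != w for r, w in zip(read, window)) <= max_mm:
                hits.add(off)
    return sorted(hits)


print(schemas_cover(SCHEMAS_15_1, 15, 1))
print(find_hits(REF, READ, SCHEMAS_15_1, 1))
```

Dropping any one schema leaves a block of five positions unprotected, which is why all three are needed for the 1-mismatch guarantee.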

Schema Set for 25-mer Read, Word Size = 14

More mismatches, longer run time

# 14 base index on 25, 0 mismatches
000000000011111111111111

# 14 base index on 25, 1 mismatches
111111111111110000000000
111110000000000111111111
000000000011111111111111

# 14 base index on 25, 2 mismatches
000001101111001110110111
110110001000101010111110
111101100111100100001001
100011110011110011011010
011110010100010111010011
101000011100111101101101
010101111011011001100100

Mismatch Tolerance

• A SNP will generate two color mismatches
• Consider the SNP frequency in the genome when setting the number of mismatches allowed in a read

• Recommended mismatch levels
  • 50 base-pair read – 6 mismatches
  • 35 base-pair read – 3 mismatches
  • 25 base-pair read – 2 mismatches

Mapping Tool - mapreads

• General features of mapping tool
  • Aligns in color space
  • Translates reference sequence to color space
  • Allows mismatches (no indels); valid adjacent mismatches can be counted as one
  • Allows masking of certain positions (bad calls)
  • For a fixed reference sequence, running time is linear in the number of reads

• New with SOLiD 3+
  • Seed and extend mapping approach
  • Multi-threaded

Local Mapping

• Motivation
  • Long reads, non-uniform quality
  • Errors tend to accumulate at the end of reads
  • Some applications show sequencing into adaptors

Local Alignment Strategy

• Map the first 25 colors of the read to the reference, allowing 2 mismatches (MM).
• For every hit found (up to the Z-limit), do a local extension
  • Accumulate alignment score (Match = 1, MM = -2 [user defined])
  • Report the best partial alignment (anchored local) based on score
  • Discard if score does not meet minimum cutoff

Read: 0122130123012303201203021 123012310231203120103120
      ||||||| ||||||||||||||| | ||||| |||||||||| |||
Ref:  0122130023012303201203011 123011310231203100101203

• For reads not mapped, shift anchor location and attempt additional mapping
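The anchored extension can be sketched as a running score that keeps the best-scoring prefix. This is a simplified illustration under stated assumptions, not the mapreads implementation: the reference window is assumed to be already positioned at the anchor hit, the minimum-score cutoff is left out, and the two strings are the read and reference from the slide figure.

```python
def anchored_local_extension(read, ref, anchor_len=25, match=1, mismatch=-2):
    """Score the anchor, extend to the right, return (best_length, best_score)."""
    score = sum(match if r == f else mismatch
                for r, f in zip(read[:anchor_len], ref[:anchor_len]))
    best_len, best_score = anchor_len, score
    for i in range(anchor_len, min(len(read), len(ref))):
        score += match if read[i] == ref[i] else mismatch
        if score > best_score:          # keep the highest-scoring partial alignment
            best_len, best_score = i + 1, score
    return best_len, best_score


READ = "0122130123012303201203021" "123012310231203120103120"
REF = "0122130023012303201203011" "123011310231203100101203"

length, score = anchored_local_extension(READ, REF)
print(length, score)
```

In this example the last four colors all mismatch, so the best partial alignment stops before them rather than covering the whole read.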

Local Mapping: Anchor Offset

[Figure: read aligned to the reference; start and end marked on the reference, offset marked on the read]

• start and end mark the start and end of the alignment in the reference.
• The alignment may not encompass the entire read.
• The start of the alignment in the read is called the offset.

Mapping QV: Mathematical Definition

• Mapping quality is an estimate of how likely an alignment is to be correct
• First, calculate the posterior probability

$$P(r \mid \text{Alignment}) = (1 - e)^{t-m} \, e^{m} \left(\frac{1}{4}\right)^{L-t}$$

• If an alignment has a probability of $P(r)$, its mapping QV is defined as

$$-10 \log_{10}\left(1 - \frac{P(r)}{P}\right), \quad \text{where } P = \sum P(r) \text{ for all reads}$$

Mapping QV: Working Examples

• Read maps two places, both with one mismatch
  • Each has equal chance of being correct

mqv=0
Read: 0122130123012303201203021
      ||||||||||||||||||||||| |
Ref:  0122130123012303201203011

mqv=0
Read: 0122130123012303201203021
      ||||||||||| |||||||||||||
Ref:  0122130123002303201203021

• Read maps two places, one with zero mismatches and one with two mismatches
  • Higher likelihood the true alignment is the perfect alignment

1_2434280.2:(47.3.0):q22, mqv=22
Read: T…0120123012303201203011
        ||||||||||||||||||||||
Ref:   …0120123012303201203011

1_24854171.2:(47.5.0):q0, mqv=0
Read: T…0120123012303201203021
        |||||||| |||||||| ||||
Ref:   …0120123002303201223021

• Mapping QV largely depends on the difference of the hits

Traditional Mapreads

• Split up the mapping jobs
  • All reads mapped against part of reference (IO intensive, processing the same read multiple times)
  • Limit read mapping to each reference entry (-z)
• Merge results (IO intensive)

[Figure: CPU 1 through CPU n each map all reads (.csfasta) against one reference entry (Reference Entry 1 through n), producing Mapped Results 1 through n, which are merged into Merged Results]

Multithreaded Mapreads

• Single mapping job
  • Fraction of reads (1/n) are mapped against the whole reference
  • ~20GB of RAM for the human genome
  • Limit read mapping to whole genome (-z)
• Combine results (simple merge)

[Figure: CPU 1 through CPU n each map 1/n of the reads (.csfasta) against the full reference, producing Mapped Results 1 through n, which are combined into Combined Results]

Local Mapping: Advantages

• Increased throughput
  • Some data sets show a 2-fold increase in mapping using local mapping vs. classical mapping
• Increased speed
  • Up to 15X faster than iterative mapping with trimming
  • As read length increases, only a small set of schemas needs to be optimized

Expected Number Of Errors In A Read

• Given a read with QVs, one can estimate the expected number of errors in the read

• If the QV of the i-th call is $q_i$, then the expected number of errors in the read is

$$m = \sum_{i=1}^{n} p_i, \qquad p_i = 1.0022 \cdot 10^{-q_i/10}$$

The factor 1.0022 accounts for q values being rounded to integers.
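The expected-error estimate above can be computed directly from a list of QVs. The sketch below follows the slide's formula; the function name and the sample QVs are illustrative.

```python
def expected_errors(qvs, rounding_factor=1.0022):
    """Sum of per-call error probabilities p_i = rounding_factor * 10^(-q_i/10)."""
    return sum(rounding_factor * 10.0 ** (-q / 10.0) for q in qvs)


# Hypothetical QVs for a short read, for illustration only.
print(expected_errors([30, 30, 25, 20, 10]))
```

A filter could, for example, discard reads whose expected number of errors exceeds the number of mismatches allowed during mapping.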

Accuracy

• Accuracy is affected by the mapping parameters used
  • Increasing the number of mismatches allowed will increase the number of reads that map and drive up the error rate

• The accuracy after applying 2-Base Encoding (2BE) rules improves significantly over raw color accuracy

Accuracy Assessment – With a Known Genome

50 base read mapped with up to 6 MM (DH10B results)

[Figure: pie chart of color-space (CS) calls. Correct CS calls: 97.60%. Mismatched calls: 2.40%, split among single mismatched calls, valid adjacent, and invalid adjacent (slices of 0.14%, 2.17%, and 0.09%)]

Accuracy = 99.91%
Raw accuracy (before correction by 2BE) = 97.6%

Accuracy Effect of 2BE correction

[Figure: base accuracy by position in read. x-axis: base position in read (1 to 49); y-axis: percent error (log scale, 0.01 to 100) with the equivalent QV (10, 20, 30); series: raw and corrected]

1000 Genomes Project Data Accuracy Comparison

• Using Broad Institute IGV (Integrative Genomics Viewer)
  • NA19240 SOLID YRI Daughter
  • NA NA19240 SLX(Illumina) YRI Daughter
• Install IGV and view the SOLiD and SLX (Illumina) data
  • Download and install from http://www.broadinstitute.org/igv/ (registration required)
  • Open IGV and select Human hg18
  • Go to File --> Load from server (requires an internet connection)
  • In the Available Datasets pop-up window, expand 1000 Genomes
  • Expand YRI Trio
  • Select NA19240 SOLID YRI Daughter and NA NA19240 SLX(Illumina) YRI Daughter
  • Pick any chromosome and move the blue zoom bar (minus/plus control at top right) all the way to the right (closest to +)
  • Move the solid red marker to a random region on the chromosome
  • Count SNP/Error calls by counting the colored bases

Random Pick #1

In chr1:77,762,062-77,762,104, a 42 bp region, Illumina has 29 SNP/Error calls, and SOLiD has 1 SNP/Error call.

Random Pick #2

In chr2:124,504,606-124,504,653, a 47 bp region, Illumina has 15 SNP/Error calls, and SOLiD has none.

Random Pick #3

In chr3:28,167,215-28,167,283, a 68 bp region, Illumina has 22 SNP/Error calls, and SOLiD has 1 SNP/Error call.

SNP Detection at Different Coverages

Wheeler et al., Nature, Vol 452, 17 April 2008, doi:10.1038/nature06884

How Much Coverage Do You Need? – For SNP

• 2-base encoding helps to reduce the coverage needed to detect a SNP with high confidence
• A heterozygous SNP needs higher coverage than a homozygous SNP, because both alleles must be detected
  • If the coverage at a heterozygous position is less than 10X, the probability that one of the alleles will not be detected is 1% or more
• If the sample preparation method is likely to introduce some bias in allele ratio, coverage should be increased
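The coverage argument for heterozygous positions can be explored with a simple binomial model. This is only a sketch under stated assumptions: each read samples either allele with probability 0.5, and an allele counts as detected once it appears in at least `min_reads` reads. The slide does not state its detection criterion, so `min_reads` is hypothetical.

```python
from math import comb


def prob_allele_missed(coverage: int, min_reads: int = 1) -> float:
    """P(at least one allele is seen in fewer than min_reads reads) at a het site."""
    missed = 0
    for a_reads in range(coverage + 1):     # reads carrying allele A
        b_reads = coverage - a_reads        # reads carrying allele B
        if a_reads < min_reads or b_reads < min_reads:
            missed += comb(coverage, a_reads)
    return missed / 2 ** coverage


for cov in (5, 10, 15, 20):
    print(cov, prob_allele_missed(cov, min_reads=2))
```

Under this model, requiring two supporting reads per allele gives a miss probability of about 2% at 10X. Any allele-ratio bias from sample preparation would push the required coverage higher.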

Coverage - Variance in Different Genomic Regions

• Ideally, the coverage would follow a binomial distribution

• Possible reasons for deviations
  • Characteristics of the reference genome (complexity, frequency of repeats, etc.)
  • The samples being sequenced (structural variation)
  • Sample preparation and sequencing chemistry

• Mate-pair data (rather than fragment data) can largely, but not completely, overcome this problem

• Consistent contiguous regions of over/under-coverage may represent copy number variation
  • Detection of SNPs or InDels in these regions should be treated with caution

“Addition Table” for Colors

First color (rows) ⊕ second color (columns):

⊕ | 0 1 2 3
0 | 0 1 2 3
1 | 1 0 3 2
2 | 2 3 0 1
3 | 3 2 1 0

ref     A G G C A C C    2 0 3 1 1 0
sample  A G G G A C C    2 0 0 2 1 0
3 ⊕ 1 = 2, 0 ⊕ 2 = 2

ref     A G G C A C C    2 0 3 1 1 0
sample  A G G G C C C    2 0 0 3 0 0
3 ⊕ 1 ⊕ 1 = 2 ⊕ 1 = 3, 0 ⊕ 3 ⊕ 0 = 3 ⊕ 0 = 3

More on Color Consistency

• Isolated color changes do not always correspond to measurement errors
  • e.g. the following reference/read combination results in two single-color mismatches

reference  A G G C A C C    2 0 3 1 1 0
read       A G G G T C C    2 0 0 1 2 0

  • Permitted by addition table because 3 ⊕ 1 ⊕ 1 = 0 ⊕ 1 ⊕ 2 = 3

• “Invalid” two-position changes do not always correspond to measurement errors
  • e.g. the combination below results in one “forbidden” 2-color change and one isolated single-color change

reference  A A C T T A A    0 1 2 0 3 0
read       A A T G G A A    0 3 1 0 2 0

  • Permitted by addition table because 1 ⊕ 2 ⊕ 0 ⊕ 3 = 3 ⊕ 1 ⊕ 0 ⊕ 2 = 0
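The consistency rule in these examples can be sketched with XOR, which reproduces the addition table for colors 0 to 3. The helper names are illustrative; the color lists are the slide's reference/read pairs.

```python
from functools import reduce
from operator import xor


def window_consistent(ref_colors, read_colors, start, end):
    """True when colors in [start, end) combine to the same value in ref and read."""
    ref_sum = reduce(xor, ref_colors[start:end], 0)
    read_sum = reduce(xor, read_colors[start:end], 0)
    return ref_sum == read_sum


def mismatch_window(ref_colors, read_colors):
    """Smallest window covering every color mismatch, or None if identical."""
    diffs = [i for i, (a, b) in enumerate(zip(ref_colors, read_colors)) if a != b]
    return (diffs[0], diffs[-1] + 1) if diffs else None


ref = [2, 0, 3, 1, 1, 0]     # A G G C A C C
read = [2, 0, 0, 1, 2, 0]    # A G G G T C C
start, end = mismatch_window(ref, read)
print(start, end, window_consistent(ref, read, start, end))
```

A window whose colors combine to the same value in both sequences starts and ends on the same bases, so the differences inside it can be explained by base changes rather than by measurement errors alone.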
