
MOLECULAR THERMODYNAMICS OF THE STABILITY OF NATURAL, SUGAR

AND BASE-MODIFIED DNA DUPLEXES AND ITS APPLICATION TO THE

DESIGN OF PROBES AND PRIMERS FOR SENSITIVE DETECTION OF

SOMATIC POINT MUTATIONS

by

Curtis Hughesman

B.A.Sc., The University of Calgary, 1997

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

(Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

December 2012

© Curtis Hughesman, 2012

Abstract

Cancer is characterized as a genetic disease associated with acquired somatic mutations, a majority of which consist of only a single base change and are commonly referred to as somatic point mutations (SPMs). Real-time quantitative polymerase chain reaction (qPCR) techniques using allele-specific (AS) probes or primers are widely used in genotyping assays to detect commonly known single nucleotide polymorphisms (SNPs), and also have the potential to detect SPMs, provided the required analytical sensitivity and specificity can be realized. One strategy to establish the necessary performance is to introduce nucleotide analogs such as Locked Nucleic Acids (LNAs) into AS probes or primers; however, successful design requires a fundamental understanding of both the thermodynamics and kinetics of LNA-DNA heteroduplexes. Melting thermodynamic studies of DNA duplexes and LNA-DNA heteroduplexes were therefore carried out using both ultraviolet (UV) spectroscopy and differential scanning calorimetry (DSC) to quantify the thermodynamics (ΔH°, ΔS°, ΔCp and Tm) associated with the helix-to-coil transition. Data collected on DNA duplexes and LNA-DNA heteroduplexes were used to introduce improvements in the “unified” nearest-neighbor model, and to develop a new model, referred to as the Single Base Thermodynamic (SBT) model, that accurately predicts the Tm for the melting of LNA-DNA heteroduplexes.

The SBT model was extended and applied to PCR conditions to design LNA-bearing AS probes for qPCR assays to detect the clinically important SPMs KIT c.2468a>t (D816V) and JAK2 c.1849g>t (V617F); these probes were found to significantly outperform standard AS probes containing only DNA. The interaction of Taq polymerase with heteroduplexes formed between an LNA-bearing primer and a target template was also studied, and the results were used to generate general rules for designing LNA-bearing AS primers capable of unequivocal detection of a rare mutant allele bearing a SPM. The method was then extended to allow qPCR detection by Plexor™ technology and applied to create an AS primer directed against the JAK2 V617F SPM that can detect one mutation in a background of more than 100,000 copies of the wild-type allele and which is now used by the Cancer Laboratory of the British Columbia Cancer Agency (BCCA) to analyze patient samples.

Preface

A version of Chapter 1 has been submitted for publication to "Biochemical Engineering

Thermodynamics", in press, von Stocker U, et al. (Eds.), EPFL Press (2012). The

manuscript was written through collaboration between Dr. Charles Haynes and me.

A version of Chapter 2 has been published. Hughesman, C.B., Turner, R.F. and Haynes, C.

(2011) Correcting for Heat Capacity and 5'-TA Type Terminal Nearest Neighbors Improves

Prediction of DNA Melting Temperatures Using Nearest-Neighbor Thermodynamic Models.

Biochemistry, 50, 2642-2649. I performed all of the experiments and wrote most of the

manuscript. Dr. Charles Haynes provided guidance on the research. Dr. Charles Haynes and

Dr. Robin Turner reviewed and edited the manuscript.

A version of Chapter 3 has been published. Hughesman, C.B., Turner, R.F.B. and Haynes, C.A. (2011) Role of the Heat Capacity Change in Understanding and Modeling Melting Thermodynamics of Complementary Duplexes Containing Standard and Nucleobase-Modified LNA. Biochemistry, 50, 5354-5368. I performed or directly supervised all of the experiments and wrote most of the manuscript. Dr. Charles Haynes provided guidance on the research. Dr. Charles Haynes and Dr. Robin Turner reviewed and edited the manuscript.

A version of Chapter 5 is being prepared for submission as a publication. Experiments on the LNA-bearing primers directed at the BCL2 plasmids were performed by Colin Olsen and me. Kelly McNeil generated the JAK2 plasmids and DNA from patients for testing. All experiments involving the AS primers were performed by me. Kelly McNeil, Sean Young, Dr. Aly Karsan and Dr. Charles Haynes provided guidance on the research and experiments. Approval by UBC’s Research Ethics Board was obtained; blind testing of anonymous patient samples previously acquired and stored at the BC Cancer Agency, using assays developed in this work, was conducted under UBC’s research ethics certificate H08-01035. All testing was conducted for research purposes only, and no information about patient identity or medical history was known or transferred.

Table of Contents

Abstract ...... ii
Preface ...... iv
Table of Contents ...... vi
List of Tables ...... ix
List of Figures ...... xii
Nomenclature ...... xv
Acknowledgements ...... xviii
Dedication ...... xx
Chapter 1: Introduction ...... 1
1.1 Thesis Overview ...... 1
1.2 Background ...... 7
1.2.1 Methods for measuring duplex DNA melting thermodynamics ...... 12
1.2.1.1 UV absorption spectroscopy ...... 13
1.2.1.2 Calorimetry ...... 20
1.2.2 Thermodynamic models used to predict DNA duplex stability ...... 24
1.2.3 Locked Nucleic Acids (LNAs) ...... 30
1.2.3.1 Chemistry and properties ...... 31
1.2.3.2 Predicting the stability of LNA-DNA heteroduplexes ...... 32
1.2.4 PCR based methods for detection and quantification of a SPM ...... 34
1.2.4.1 LNA containing AS probes ...... 38
1.2.4.2 LNA containing AS primers ...... 39
1.2.5 Clinically significant somatic point mutations ...... 41
1.2.5.1 JAK2 V617F ...... 41
1.2.5.2 KIT D816V ...... 42
1.2.5.3 BRAF V600E ...... 42
1.3 Thesis Objectives and Content Overview ...... 43
Chapter 2: Correcting for Heat Capacity and 5'-ta Type Terminal Nearest Neighbors Improves Prediction of DNA Melting Temperatures Using Nearest-Neighbor Thermodynamic Models ...... 46
2.1 Materials and Methods ...... 47
2.1.1 DNA synthesis and purification ...... 47
2.1.2 Differential scanning calorimetry ...... 47
2.1.3 Regression of melting thermodynamics data ...... 48
2.1.4 Error analysis ...... 48
2.2 Results and Discussion ...... 49
2.2.1 Introduction of ΔCp into the unified NNT model improves Tm predictions ...... 51
2.2.2 Regressed ΔCp^bp and Tref values are supported by DSC data ...... 55
2.2.3 Duplexes terminating in a 5’-ta have statistically significant Tm(error) ...... 59
2.2.4 Correcting Tm predictions for duplexes containing 5’-ta type termini ...... 63
2.3 Conclusions ...... 65
Chapter 3: The Role of the Heat Capacity Change in Understanding and Modeling Melting Thermodynamics of Complementary Duplexes Containing Standard and Nucleobase Modified LNA ...... 67
3.1 Materials and Methods ...... 69
3.1.1 Sequence design ...... 69
3.1.2 synthesis and purification ...... 70
3.1.3 Differential scanning calorimetry ...... 70
3.1.4 UV spectroscopy ...... 71
3.1.5 Error analysis ...... 72
3.1.6 Regression of SBT model parameters ...... 73
3.1.7 Model predicted ΔTm values for LNA substituted duplexes ...... 73
3.2 Results and Discussion ...... 74
3.2.1 Accounting for ΔCp shows that the increase in duplex stability resulting from LNA substitutions is predominantly driven by a favorable entropy change ...... 79
3.2.2 Base classification and pairing explain differences in the ΔΔSi° value and stability enhancement offered by different LNAs ...... 84
3.1.1 Terminal 5’ and 3’ LNA substitutions are much less stabilizing than internal LNA substitutions ...... 86
3.1.2 A new model for predicting the melting thermodynamics of LNA substituted duplexes ...... 88
3.1.3 The SBT model predicts Tm values for standard-LNA-containing mixmer duplexes with similar accuracy as more complex NNT models ...... 90
3.1.4 Testing the validity of SBT model assumptions ...... 92
3.1.5 Substitution of standard LNAs with base-modified LNA nucleosides provides further stability increases ...... 95
3.1.6 D•H base pairs demonstrate pseudo-complementary properties ...... 97
3.2 Conclusions ...... 99
Chapter 4: Design of LNA-rich Hydrolysis Probes for Detection of Somatic Point Mutations ...... 102
4.1 Materials and Methods ...... 104
4.1.1 ...... 104
4.1.2 Plasmids ...... 105
4.1.3 Monitoring helix-to-coil transitions with UV spectroscopy ...... 106
4.1.4 UVM of 9-mer duplexes to determine ΔΔTmax(MM) ...... 106
4.1.5 Prediction of melting thermodynamics for probe•template duplexes ...... 107
4.1.6 Real-time qPCR ...... 108
4.2 Results and Discussion ...... 109
4.2.1 Model-based design of LNA-rich AS primers offering increased ΔTm(MT–WT) ...... 109
4.2.2 Performance testing of probe designs on plasmid templates ...... 114
4.2.3 Application of LNA-rich probes to mixtures of MT and WT alleles ...... 119
4.2.4 Understanding the reduction in hydrolysis probe signal strength ...... 122
4.3 Conclusions ...... 129
Chapter 5: Novel Plexor™ Multi-LNA Allele Specific Primers for Unequivocal Clinical Detection of Somatic Point Mutations: Design Rules and Application to JAK2 V617F, KIT D816V and BRAF V600E ...... 131
5.1 Materials and Methods ...... 133
5.1.1 Design and testing PCR primers with single and multiple LNA substitutions ...... 133
5.1.2 Testing of JAK2 WT and MT (V617F) AS primers on plasmids ...... 134
5.1.3 Testing of KIT and BRAF plasmids with AS primers ...... 135
5.1.4 Genomic DNA isolation ...... 135
5.1.5 Plexor multi-LNA AS primer ...... 137
5.1.6 Benchmark hydrolysis-probe based assay of JAK2 V617F ...... 138
5.1.7 JAK2 MutaQuant assay ...... 138
5.2 Results and Discussion ...... 141
5.2.1 The impact of LNA substitutions in the 3’ region of a primer ...... 141
5.2.2 AS primer design: preferred location for the site of variation (SOV) ...... 147
5.2.3 AS primer design: the effect of LNA substitution versus mismatch insertion ...... 152
5.2.4 AS primer design: application of LNA substitution guidelines ...... 153
5.2.5 AS primer design: Plexor™ LNA AS primers directed against JAK2 V617F ...... 158
5.2.6 Testing of LNA AS primer designs for absolute detection of KIT D816V and BRAF V600E ...... 161
5.3 Conclusions ...... 163
Chapter 6: Conclusions and Suggestions for Future Work ...... 165
References ...... 173
Appendices ...... 187
Appendix A Tmax (°C) data for the helix-to-coil transition of DNA duplexes and LNA-DNA heteroduplexes that are perfectly matched or contain a single centrally located mismatch ...... 187

List of Tables

Table 1.1 “Unified” Nearest Neighbors Parameters for Helix-to-Coil Transition ...... 29
Table 2.1 Measured thermodynamic values from DSC analysis of short duplex DNA ...... 56
Table 2.2 Average Tm(error) (°C) values associated with terminal base pairs and terminal nearest-neighbors ...... 60
Table 2.3 Measured thermodynamic data for helix to coil transition of 11-mer DNA duplexes used to study duplex end effects ...... 61
Table 2.4 Thermodynamic parameters for 5'-ta terminal NN at Tref = 53 °C ...... 64
Table 3.1 DSC determined thermodynamic parameters for duplexes between DNA oligonucleotides with and without A, T, G and/or C substitutions ...... 75
Table 3.2 DSC derived thermodynamic data for complementary DNA duplexes where one strand contains D and/or H substitutions ...... 77
Table 3.3 Average ΔCp^bp and Tm values determined by DSC for duplexes with and without LNA substitutions ...... 78
Table 3.4 General SBT model parameters for the helix-to-coil transition of duplexes containing standard and nucleoside-modified LNA substitutions ...... 83
Table 3.5 ΔΔSi° parameters and differences in them for the helix-to-coil transition of duplexes containing standard LNA bases ...... 85
Table 3.6 DSC derived thermodynamic data for duplexes used to study the effect of LNA substitution at the 3’ or 5’ termini ...... 87
Table 3.7 Errors in ΔTm (°C) values predicted using the specified model for standard LNA (A, T, G and/or C) substituted mixmer duplexes ...... 91
Table 3.8 UVM derived thermodynamic data for duplexes containing tandem LNA substitutions ...... 93
Table 3.9 ΔTm prediction errors using SBT model for LNA substituted gapmer duplexes and fully modified LNA-DNA heteroduplexes ...... 95
Table 3.10 UVM derived ΔTmax data for the helix-to-coil transition of duplexes with base pairs formed between a, d, A or D and t, h, T, or H ...... 98
Table 4.1 Model based design of DNA and LNA probes for KIT D816V ...... 110
Table 4.2 Incremental ΔΔTmax(MM) for LNA•DNA mismatch in 9-mer duplexes determined from UVM experiments ...... 111
Table 4.3 Model predicted and UVM experimentally determined thermodynamic parameters for KIT D816V and JAK2 V617F probes at PCR solution conditions ...... 112
Table 4.4 Calculated theoretical analytical specificity of KIT D816V and JAK2 V617F hydrolysis probes at Ta = 62 °C and 65 °C ...... 116
Table 4.5 Experimentally determined SPE for LNA bearing probes ...... 120
Table 5.1 PCR efficiencies Eexpt for amplification of the BCL2 plasmid mini-gene using pure-DNA and LNA-substituted primers ...... 139
Table 5.2 Average Eexpt for amplification of the BCL2 plasmid mini-gene reported as a function of LNA base type and base location in primers containing a single LNA substitution within the 3’ (L0) to 3’-8 (L8) positions ...... 141
Table 5.3 PCR efficiencies for amplification of the BCL2 plasmid mini-gene using forward (FP) and reverse (RP) primers containing multiple LNA substitutions ...... 142
Table 5.4 Experimental and model estimated PCR efficiencies for amplification of the BCL2 plasmid mini-gene using primers containing three LNA substitutions within the 3’-1 to 3’-6 positions ...... 144
Table 5.5 Average efficiency of amplification of the BCL2 mini-gene using each possible primer design comprised of either one or two LNA substitutions within the 3’ to 3’-6 region ...... 146
Table 5.6 Eexpt, Epred & SPE for various JAK2 WT AS primers directed against the JAK2 WT and MT (V617F) plasmid templates ...... 150
Table 5.7 Eexpt, Epred & SPE for various JAK2 MT (V617F) AS primers directed against the JAK2 WT and MT (V617F) plasmid templates ...... 151
Table 5.8 Putative AS primer designs possessing a 3’ SOV and one or more LNA substitutions within the 3’ to 3’-6 positions ...... 155
Table 5.9 Summary of results of different techniques used to classify the JAK2 V617F status of 96 patients suspected of a myeloproliferative neoplasm (MPN) ...... 160
Table 5.10 Experimental Cq, Eexpt and SPE data for various AS primer designs directed against the KIT D816V and BRAF V600E SPM-bearing alleles ...... 162

List of Figures

Figure 1.1 Common types of base-pairing between DNA nucleobases ...... 10
Figure 1.2 Example of UVM data and analysis for a short DNA duplex ...... 14
Figure 1.3 Example of DSC data and analysis for a short DNA duplex ...... 21
Figure 1.4 Structure of sugar in DNA, RNA and LNA ...... 31
Figure 1.5 AS hydrolysis probe based qPCR assay for SPM detection ...... 36
Figure 1.6 AS primer based qPCR assay for SPM detection ...... 37
Figure 2.1 Tm(error) values for 125 test sequences in which the predicted Tm is determined by the unified NNT model ...... 51
Figure 2.2 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp ...... 54
Figure 2.3 Relationship between predicted (unified NNT model) and experimental ΔH° as a function of Tm(expt) ...... 57
Figure 2.4 Measured ΔCp^bp values as a function of experimental melting temperature ...... 58
Figure 2.5 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp and parameters for sequences with terminal 5’-ta ...... 65
Figure 3.1 Structure of LNA-2-aminoadenine and LNA-2-thiothymine ...... 68
Figure 3.2 Experimental helix-to-coil transition ΔΔH° and ΔΔS° for 43 duplexes with standard LNA substitutions ...... 80
Figure 3.3 Comparison of ΔΔG°37 values predicted using the LNA NNT model to experimental helix-to-coil ΔΔG°37 data for the 43 duplexes with standard LNA substitutions ...... 82
Figure 4.1 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C) hydrolysis probes using serially diluted MT only plasmids (10^6 to 10^2 copies) and WT only plasmids (10^6 copies) ...... 115
Figure 4.2 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C) hydrolysis probes using MT only plasmids (100% MT), WT only plasmids (100% WT) and MT plasmids diluted in a background of WT plasmids (10% and 1% MT) ...... 118
Figure 4.3 RFUend versus MT template frequency for KIT D816V & JAK2 V617F probes ...... 121
Figure 4.4 qPCR of KIT D816V using SYBR Green or one of three hydrolysis probes ...... 123
Figure 4.5 RFUi+1/RFUi per cycle for qPCR amplification at Ta = 62 °C of KIT D816V plasmid monitored by SYBR Green or hydrolysis probe ...... 125
Figure 4.6 Theoretical fractional curves determined for KIT D816V hydrolysis probes ...... 128
Figure 4.7 Signal strength for KIT D816V DNA and LNA-bearing probes ...... 129
Figure 5.1 Cq data of various pure-DNA AS primers for their target allele reported as a function of the target allele, the primer position interacting with the SOV, and the annealing temperature Ta used in the qPCR ...... 148
Figure 5.2 qPCR amplification curves using the JAK2 WT0 L025 or WT0 L25 primers ...... 157
Figure 5.3 qPCR amplification curves using the JAK2 MT0 L0123 or MT0 L123 primers ...... 158
Figure 5.4 qPCR amplification of the JAK2 MT (V617F) and WT plasmid template using the Plexor™ MT0 L123 AS primer ...... 159

Nomenclature

BRAF V600E: The somatic point mutation c.1799 t>a in the BRAF gene that causes a substitution of the amino acid valine with glutamic acid at position 600 (p.V600E) in the BRAF protein
JAK2 V617F: The somatic point mutation c.1849 g>t in the JAK2 gene that causes a substitution of the amino acid valine with phenylalanine at position 617 (p.V617F) in the JAK2 protein
KIT D816V: The somatic point mutation c.2468 a>t in the KIT gene that causes a substitution of the amino acid aspartic acid with valine at position 816 (p.D816V) in the KIT protein

a: adenine
c: cytosine
d: 2-aminoadenine (2,6-diaminopurine)
g: guanine
h: 2-thiothymine
t: thymine
A: LNA-adenine
C: LNA-cytosine
G: LNA-guanine
T: LNA-thymine
D: LNA-2-aminoadenine
H: LNA-2-thiothymine

•: Hydrogen bonding involved in base pairs between strands
-: Covalent bonding between nucleotides on a single strand

MT: Mutant gene or template containing SPM
WT: Wild-type (germline) gene or template

LOD: Limit of detection (analytical sensitivity)
SPE: Analytical specificity

A260: Absorbance at 260 nm
Cp^ex: Excess heat capacity
ΔCp^bp: Change in heat capacity for helix to coil transition per base pair
ΔCp: Change in heat capacity for helix to coil transition
CA: Concentration of the more concentrated strand
CB: Concentration of the less concentrated strand
Cq: Quantification cycle
CqMT: Quantification cycle for MT template
CqWT: Quantification cycle for WT template
ΔCq(MT-WT): Difference in Cq between MT and WT template
ΔCq(WT-MT): Difference in Cq between WT and MT template
CT: Total strand concentration
CT(MT): Total strand concentration of probe and MT template
CT(WT): Total strand concentration of probe and WT template
E: Amplification efficiency of a qPCR reaction
EDNA: Amplification efficiency of a qPCR reaction with DNA based primer
Eexpt: Experimentally determined amplification efficiency of a qPCR reaction
ELNA: Amplification efficiency of a qPCR reaction with LNA-bearing primer
Epred: Predicted amplification efficiency of a qPCR reaction
ΔG: Change in Gibbs energy
ΔG°: Change in Gibbs energy at standard-state conditions
ΔG°37: Change in Gibbs energy at standard-state conditions determined at T = 37 °C
ΔΔG°37: Incremental change in Gibbs energy at standard-state conditions determined at T = 37 °C
ΔH°: Change in enthalpy at standard-state conditions
ΔH°cal: Change in enthalpy at standard-state conditions determined directly from integration of excess heat capacity data
ΔH°2-st: Change in enthalpy at standard-state conditions determined from two-state model
ΔH°5’-ta: Change in enthalpy at standard-state conditions associated with terminal 5’-ta
ΔH°expt: Experimentally determined change in enthalpy at standard-state condition
ΔH°pred: Model predicted change in enthalpy at standard-state condition
ΔH°MT: Change in enthalpy at standard-state condition for probe•MT template duplex
ΔH°WT: Change in enthalpy at standard-state condition for probe•WT template duplex
ΔΔH°: Incremental change in enthalpy
ΔΔH°LNA: Incremental change in enthalpy associated with LNA substitution
ΔΔH°(LNA-DNA): Incremental change in enthalpy between LNA-DNA heteroduplex and isosequential DNA duplex
ΔH°UVM: Change in enthalpy at standard-state conditions determined from UVM data
K(T): Equilibrium constant at transition temperature (T)
nbp: Number of base pairs
ni: Number of LNA bases of type i
P: Pressure
R: Gas constant
RFU: Relative fluorescence unit
RFUend: End point RFU
RFUend(MT): End point RFU for probe determined for 100% MT template
ΔS°: Change in entropy at standard-state conditions
ΔS°5’-ta: Change in entropy at standard-state conditions associated with terminal 5’-ta
ΔS°MT: Change in entropy at standard-state condition for probe•MT template duplex
ΔS°WT: Change in entropy at standard-state condition for probe•WT template duplex
ΔΔS°: Incremental change in entropy
ΔΔS°LNA: Incremental change in entropy associated with LNA substitution
ΔΔS°(LNA-DNA): Incremental change in entropy between LNA-DNA heteroduplex and isosequential DNA duplex
T: Temperature
Ta: Annealing temperature
Tm: Melting temperature (predicted)
Tm(error): Error associated with the model predicted melting temperature as compared to the experimental melting temperature
Tm(MT): Melting temperature of probe and MT template duplex
Tm(WT): Melting temperature of probe and WT template duplex
ΔTm: Incremental change in melting temperature between a LNA-DNA heteroduplex and an isosequential DNA duplex
ΔTm(error): Error associated with the model predicted incremental melting temperature as compared to the experimental incremental melting temperature determined between a LNA-DNA heteroduplex and an isosequential DNA duplex
ΔTm(LNA-DNA): Incremental change in melting temperature for a LNA-DNA heteroduplex and isosequential DNA duplex
ΔTm(MT-WT): Difference in the melting temperature determined for duplexes formed between the probe•MT template and the probe•WT template
ΔTm(pred): Predicted incremental change in melting temperature for a LNA-DNA heteroduplex compared to the isosequential DNA duplex
Tm(expt): Experimentally determined melting temperature
Tm(pred): Predicted melting temperature
Tm(DNA): Melting temperature of DNA duplex
Tm(LNA): Melting temperature of LNA-DNA heteroduplex
Tmax: Temperature determined at the maximum of d(A260)/dT curve
Tmax(DNA-PM): Tmax of DNA duplex
Tmax(DNA-MM): Tmax of DNA duplex containing a single mismatch
Tmax(LNA-PM): Tmax of LNA-DNA heteroduplex
Tmax(LNA-MM): Tmax of LNA-DNA heteroduplex containing a single mismatch
ΔΔTmax(MM): Incremental change in ΔTmax for a LNA•DNA mismatch compared to the DNA•DNA mismatch involving the same nucleobases (i.e. A•a vs a•a)
Tref: Reference temperature
tc: t-distribution critical value

α: Fraction of strands in double stranded state
αMT: Fraction of strands in duplex state formed between probe and MT template
αWT: Fraction of strands in duplex state formed between probe and WT template
χ: Residual error
θ: Fraction of strands in single stranded state
σCqMT: Standard deviation of CqMT
σCqWT: Standard deviation of CqWT

Acknowledgements

The enigma and pervasiveness of cancer have intrigued me since I was a young engineer. This thesis represents my contribution, through engineering and scientific research, to improving our understanding, diagnosis and treatment of cancer. As my own experience of my father’s lost battle with cancer showed me, and as the overwhelming statistics affirm, too many lives are affected by this disease. It is my hope that more sensitive and specific detection of cancer will provide patients with a greater variety of selective treatment options and, therefore, better outcomes.

The ability to pursue this Ph.D. begins first with my parents, Yvonne and Doug, who instilled in me the necessary skills and attributes, and who have provided me, in every conceivable way, with the support and encouragement required to complete this goal.

I owe enduring gratitude to my wife Catherine, whose unending love and guidance have provided me a solid foundation and with whom I have shared the happiest moments of my life, the birth of our son Kai Douglas and daughter Makena Alice. The experience of being a father has granted me a different perspective on both work and life, and the joy and laughter that Kai and Makena bring to me daily continue to inspire me in new and unique ways.

I would like to thank Dr. Charles Haynes for providing me the opportunity to be a member of his research team, and who has dedicated significant time and effort not only aiding me in the completion of my thesis, but also in developing the essential skills to be a successful member of and contributor to the scientific community. I appreciate all of the members of his laboratory group with whom I have had the pleasure to either work with or work near. In particular I would like to thank Dr. Louise Creagh for always providing her time and support with the finer details of research.

I would also like to thank the members of the B.C. Cancer Agency who have provided me the opportunity to be directly involved in scientific research aimed at improving diagnostics for cancer genetics. In particular I would like to thank Dr. Sean Young and Dr. Aly Karsan. A very special thank you to Kelly McNeil, with whom I have been fortunate enough not only to work but also to become friends.

Thanks go to an inspiring group of mentors, both past and present, who have made a significant impact on my engineering career. These people include Pat Carlson and Ron Sawatzky and my colleagues David Grohs, Jonathan White and Nance McCollom at Pavilion Energy Corp. Finally, I want to thank all those whose patient encouragement, couched sometimes in provocation, has also inspired the completion of this epic Ph.D. journey: Hughesman & Chow families, Stephen Dickinson, Philip Wong, Eddie Chan, Doug Howes, Dr. Patrick Francis, Dr. Matt Larouche, Dr. Natasha Pollock, John Seo, Eric Chow, Kevin Yee, Dr. Hans Drouin, Betty & Chris Marshall, Barry & Marilyn Wong and Rod Beltran.

Dedication

To my mom Yvonne, and in memory of my dad Douglas

&

To my family Catherine, Kai and Makena

Chapter 1: Introduction

1.1 Thesis Overview

The completion of the draft (1,2) and final (3) versions of the first human genome sequence had a profound impact on research and technical advances in the life sciences and clinical . Increasingly detailed genome maps permit identification of genes associated with dozens of diseases, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2, inherited colon cancer, Alzheimer's disease, and familial breast cancer. The study of genetic variations within a population, including those that are inherited (germline) and acquired (somatic) by individuals, is likewise opening a new era of molecular medicine that will replace treating symptoms with identifying and repairing fundamental causes of disease. The development of rapid and more specific diagnostic tests is an essential component of this transformation, and will make possible earlier treatment of countless life-threatening and life-altering maladies. Medical researchers will also be able to devise personalized therapeutic regimens, improve immunotherapy techniques, identify environmental conditions that may trigger disease, and possibly augment or even replace defective genes through gene therapy.

Large-scale sequencing programs such as the International HapMap Project (4) and the Cancer Genome Project (5) led by the Wellcome Trust Sanger Institute have collected and compiled extensive data for common germline genetic variations and somatic mutations, respectively. Common types of germline variations include copy-number variations (CNV) and single nucleotide polymorphisms (SNPs). Several classes of somatic mutations have also been identified, including large genomic rearrangements such as translocations, smaller genetic variations such as short insertions, deletions or replacements, and individual base pair

differences created by somatic point mutations (SPMs). The Catalog of Somatic Mutations

in Cancer (COSMIC) created and curated by the Cancer Genome Project reveals that SPMs

are the most commonly observed class of somatic mutations (6), with a subset of these SPMs

identified as “driver mutations” in cancer development and progression (7,8).

Although significant research is still required to both discover and validate the

connection between genetic variations and disease or potential for disease, a number of

examples exist where knowledge of either a SNP or SPM has proven useful for differential

diagnosis, prognosis and/or prediction of therapeutic response (9-13). Among these

examples is a growing list of genetic variations that have been validated as clinically

relevant, and this has intensified the need for accurate, cost effective tests that can be used in

clinical laboratories and hospitals. Establishing reliable methods to detect SPMs is

particularly important. Although cancer has been found to be a heterogeneous disease with

both the type and number of somatic mutations unique to an individual’s cancer, a subset of specific SPMs appear at such frequencies in certain cancers that their role as “drivers” has been signified (14). Indeed the recent success of using BRAF inhibitors in treating patients with metastatic melanoma positive for mutations in the BRAF V600 codon (15,16) highlights

the potential to improve cancer therapy using targeted treatment based on SPM profiling.

Detecting a SPM can be challenging, as unlike germline SNP variations, the

frequency (percentage) of cells that have acquired one or more clinically relevant SPMs can

be highly variable, and therefore the mutated cell pool may represent far less than 1% of the total cell population within a clinical sample. The detection of a low frequency somatic

mutation is further challenged by the fact that the mutant allele¹ typically differs from the

parent wild-type (WT) allele by a single base pair, creating the need to avoid or at least strongly inhibit cross reactions with and/or amplification of the much higher abundance WT sequence (17,18).

Deep sequencing has been used to discover most of the clinically important SPMs known to date (19). However sequencing of SPMs, even using next-generation technology, is unlikely to find significant application in clinical laboratories for initial screening due to the cost, as well as technical limitations associated with sequencing error rates (20-22).

Instead, routine and repeated testing for somatic point mutations will more likely rely on techniques that not only accurately detect low frequency SPMs, but also allow for cost-effective continual monitoring of minimal residual disease (MRD) levels during and following treatment (23).

Of the techniques available to detect SPMs, those based on the polymerase chain reaction (PCR) are among the most promising due to their low cost, general ease of use, and potential for very good analytical sensitivity, also known as the limit of detection (LOD), and

analytical specificity (SPE), particularly when coupled with in-line detection systems as in

quantitative real-time PCR (qPCR). The terminology for analytical sensitivity (LOD) and

analytical specificity (SPE) used throughout this thesis is that recommended by the MIQE

(minimum information for publication of quantitative real-time PCR experiments) guidelines

(24) and also accepted by the medical community (25). For SPM detection, analytical

1 In this thesis allele is applied in accordance with the more general definition in which an allele represents one of two or more forms of a gene.

sensitivity (LOD) is the minimum number of copies that can be reliably detected by a given

assay, while analytical specificity (SPE) is a measure of that assay’s ability to discriminate

the target gene containing the SPM from closely related alleles (i.e. germline) and is

generally presented as a ratio or percent.

The LOD and SPE of a clinical assay required for unequivocal detection of a SPM is

dependent on both the quality and quantity of genomic DNA present in a typical patient

sample. Although the amount of starting material may vary, a typical patient sample

collected and processed for use in qPCR contains ca. 100 ng of genomic DNA. A single

copy of the human genome is estimated to contain ~3 × 10⁹ bp, with each nucleotide base

pair weighing on average 660 g/mole; therefore, ca. 30,000 copies of genomic DNA are present in a typical clinical sample. Stochastic effects limit the LOD that can be theoretically achieved with confidence (95%) to 3 copies of the mutant gene (24). Therefore, in a typical patient sample, an assay with an SPE of 3:30,000 or 0.01% can be classified as providing unequivocal detection of a SPM. Although the clinical significance of detecting very low mutation loads is not clear, the development of clinical qPCR-based assays capable of unequivocal detection of SPMs is a desirable goal, both in that it improves confidence in detecting what is currently set as a positive result in the clinic and that it may lead to improved understanding of the and progress of cancer-related diseases as well as

earlier detection of cancerous or precancerous states for patients.
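The copy-number arithmetic in the preceding paragraph can be checked with a short Python sketch. The constants are the values quoted in the text; the function names are illustrative only and are not part of the thesis.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_PER_GENOME = 3e9        # ~3 x 10^9 bp per genome copy (value quoted in the text)
G_PER_MOL_PER_BP = 660.0   # average molar mass of one base pair (value quoted in the text)


def genome_copies(mass_ng: float) -> float:
    """Approximate number of genome copies in mass_ng nanograms of genomic DNA."""
    grams = mass_ng * 1e-9
    molar_mass_genome = BP_PER_GENOME * G_PER_MOL_PER_BP  # grams per mole of genomes
    return grams / molar_mass_genome * AVOGADRO


def spe_percent(mutant_copies: float, total_copies: float) -> float:
    """Mutant-to-total copy ratio expressed as a percentage."""
    return 100.0 * mutant_copies / total_copies


copies = genome_copies(100.0)
print(f"Genome copies in 100 ng: {copies:,.0f}")
print(f"SPE for 3 mutant copies in 30,000: {spe_percent(3, 30_000):.2f}%")
```

The result is on the order of 3 × 10⁴ copies, consistent with the rounded figure of ca. 30,000 used above.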

Common qPCR techniques used to detect and differentiate sequences that may differ

by as little as one nucleotide include both allele-specific probe (AS probe) and allele-specific

primer (AS primer) based assays. In both cases, accurate prediction of probe or primer

hybridization thermodynamics, especially the melting temperature (Tm) under specific PCR

solution conditions, is generally required to design a robust assay offering an acceptable LOD and SPE. First described more than a decade ago, AS probes and AS primers were originally designed and are still typically designed as standard DNA oligonucleotides

(26,27). Although SNPs are often reliably detected using these pure-DNA reagents, they have proven far less effective for the detection of SPMs, as the frequency of the allele bearing the SPM in the clinical sample is unknown. SPE below 5% are rarely achieved for

DNA-based AS probes (28), and although DNA-based AS primers can on occasion achieve

SPE capable of unequivocal detection of SPM (29), the performance of DNA-based AS primers is known to be sensitive to a number of factors including the type of base pair mismatch and the surrounding sequence, which makes it difficult to consistently achieve SPE below 1% (30-33). DNA based AS probes and AS primers are therefore generally unsuitable for detection of acquired mutations present at low frequency, such as during early disease pathology or following treatment.

The need for improved AS probe and AS primer chemistries and designs that greatly enhance the LOD and more notably the SPE for rare SPMs has in part motivated the development of a number of useful nucleic-acid analogs. Of particular importance are

Locked Nucleic Acids (LNAs), a class of RNA analogues that provide several advantages over natural oligonucleotides in and antisense applications (34). Each of the four standard deoxynucleotides can be converted to its corresponding LNA through the introduction of a 2’-O, 4’-C methylene bridge into the ribose ring of the nucleotide. LNA is therefore a chemical analogue of RNA, and commercial DNA synthesizers can be used to synthesize either probes or primers comprised of linear combinations of standard and locked nucleotides. There is evidence that LNA containing oligonucleotides can offer performance

advantages when used as probes in qPCR assays to detect either SNPs or SPMs (35,36).

LNAs display greater base-pairing stability and mismatch discrimination, and these attributes

can be used to design shorter, more selective AS probes (36-40). Similarly AS primers that

incorporate a single LNA at or near the 3’ terminus have also led to improvements in SPE

when compared to pure-DNA AS primers of the same base sequence (41-43).

However, as LNAs alter duplex stability, the effective design of LNA-bearing AS

probes and AS primers requires accurate prediction of the hybridization thermodynamics

with the target allele. In particular, accurate knowledge or prediction of Tm is needed.

Regrettably, current thermodynamic models capable of predicting Tms of complementary

duplexes formed with an LNA containing probe or primer are limited in a number of

important ways. They are only applicable to oligonucleotides with internal non-neighboring

LNA substitutions (single-stranded sequences known as LNA/DNA “mixmers”), as well as to a narrow range of solution conditions (44,45). Furthermore, they are not always accurate in predicting Tm, and the accuracy of these models when applied to mixmer AS probes and

AS primers used in qPCR assays has yet to be tested. A number of improvements and

analyses are therefore needed before these models can be confidently applied to SNP and

SPM assay design.

In addition to an improved knowledge of hybridization thermodynamics, the

successful application of LNA-bearing AS probes and AS primers in qPCR assays designed

to detect SPMs will require better understanding of the interaction of the probe or primer

with the polymerase enzyme, which for qPCR is generally the thermostable DNA

polymerase Taq. Heteroduplexes containing both DNA and LNA have been shown to have

unique interactions with DNA polymerases (46), likely due to localized structural changes

caused by the LNA substitution(s) (47-50), and these altered interactions may perturb

extension kinetics in desirable or undesirable ways.

This thesis reports results from an integrated experimental and theoretical research

program aimed at providing an improved understanding of the impact of LNA substitutions

on the hybridization thermodynamics and performance of AS probes and primers designed to

selectively detect SPMs that have been identified as being “driver mutations” in cancer. A

new model that accurately predicts hybridization thermodynamics of oligonucleotides

containing any number and pattern of LNA substitutions is developed and then applied to AS

probe and AS primer design. Results from experiments designed to investigate the impact of

LNA substitution patterns within probes and primers on Taq polymerase activity are also

reported, and guidelines for placement of LNAs to improve the specificity of qPCR assays

for SPMs are developed. Together, these advances are then used to create new LNA-

containing AS probes and AS primers directed against clinically relevant SPMs that exhibit

equal to greatly improved performance when compared to previously reported detection

methods.

1.2 Background

The importance of deoxyribonucleic acid (DNA) to life and medical science has made

it the focus of intense research over the past 70 years. In vivo, chromosomal duplex DNA is

sufficiently stable to preserve one’s genetic code. Yet, under appropriate conditions, portions

of a must and do dissociate into single strands to permit, among other things,

the expression of genes. Many powerful techniques and technologies used in molecular biology and in clinical laboratories also exploit the ability to dissociate duplex DNA into its

component single strands. Oligonucleotide probes that hybridize to natural single-stranded

DNA are used to identify specific sequences that are diagnostic of disease or to identify a

unique person of interest in a criminal investigation. Oligonucleotide primers are used in a

wide range of applications, including initiating complementary strand synthesis for

sequencing or PCR-based amplification. The development and successful application of

these techniques typically requires knowledge of how the stability or melting temperature Tm of a given duplex depends on its length, sequence and concentration. Solvent composition

(e.g. salt concentration, pH, added metal ions or organic solvents, etc.) is also known to affect the stability of a duplex at a given temperature (51-53). A long-standing goal of researchers studying structures, dynamics and energetics of nucleic acids has therefore been to understand, predict and control the properties and functions of natural nucleic acids and modifications to them.

Duplex DNA consists of two polynucleotide chains that are arranged in an anti-parallel double-helical structure. The nucleotides of DNA are all comprised of three chemical moieties: a phosphate group, a five-carbon sugar (deoxyribose), and an organic nitrogen-containing base. Four different nucleotides that are distinguished through their unique nitrogenous base occur in DNA. They include the pyrimidines, cytosine (c) and thymine (t), and the purines, adenine (a) and guanine (g). Each polynucleotide, also known as single-stranded (ss) DNA, is formed through covalent linkage of the deoxyribose sugar of one nucleotide to the phosphate group of the next nucleotide, with the bases orientated as side groups off the phosphodiester bonded backbone. Several conformations of duplex DNA are found in nature, including A-DNA, B-DNA and Z-DNA. First described by Watson and

Crick, the B form of DNA, which consists of a right-handed double helix that in aqueous

solution makes one complete turn about its central axis every 10.4 to 10.5 base pairs, is the most common double-stranded conformation in cells and other living systems. Orientation-specific pairing of a with t and g with c on opposing anti-parallel strands, as well as stacking forces between neighboring bases on each strand, serve to create the observed B-form double helix. That knowledge alone is sufficient to understand how genes duplicate. However, through their careful structural and biochemical studies, Watson and Crick (54), along with others (55,56), have provided many additional insights that to this day continue to serve as reliable fundamental underpinnings for understanding and manipulating nucleic acid functions. For example, the neutral bases of individual nucleotides in solution can form at least 28 unique and stable base pair structures that include at least two hydrogen bonds (57); some of these structures are shown in Figure 1.1. The breakthrough discovery of Watson and

Crick was that only one type of these possible base pair structures, now appropriately named the Watson-Crick base pairs (Figure 1.1), fits into the uniform conformationally constrained double helical structure of B-DNA, which for the remainder of this thesis we will simply refer to as duplex or double-stranded (ds) DNA.

[Figure 1.1 panels: Watson-Crick and Hoogsteen base pairs]

Figure 1.1 Common types of base-pairing between DNA nucleobases.

In Watson-Crick base pairs, a forms 2 hydrogen bonds with t, and g forms 3 hydrogen bonds with c. The formation of hydrogen bonds between paired bases explains in large part why dsDNA is enthalpically and thermodynamically favored over the ssDNA state at physiological conditions. However, hydrogen bonds between a given base and its surrounding water molecules, as well as van der Waals and π-π∗ interactions between adjacent stacked bases along the helix also contribute to helix stability and structure. The strength of each of these interactions depends on base sequence, making the stability of a duplex sequence specific. In their now classic study, Marmur and Doty (58) found that the hydrogen-bond rich g•c base pair is more stable within the duplex. They then used their results to provide the first useful model of the sequence and length dependence of duplex

DNA stability by showing that, to a first approximation, Tm increases linearly with g•c base

pair content. Later seminal studies showed that the degree of base pair complementarity is also important (e.g., Aboul-ela F. et al. (59)), as mutations in base sequence that result, for example, from errors in DNA replication, can destabilize a duplex through the formation of mismatched base pairs.

The entropic cost of generating an ordered bimolecular structure from two flexible single strands destabilizes dsDNA. Entropy therefore compensates the favorable enthalpy

change for the duplexation reaction. As a result, the incremental Gibbs energy change ∆G

per base pair added to a DNA duplex increases the stability of the duplex at physiological

conditions by only a small amount, as has been shown in a number of studies (60-65) using techniques that will be described. This is important biologically since it provides the relatively modest duplex stability needed for gene transcription and translation to occur. It is also of scientific importance, as it means that the accurate prediction of melting temperatures

(and thermodynamics) will require a molecular thermodynamic model that properly describes the balance of compensating forces that give rise to the small base pair specific incremental

Gibbs energy changes that collectively stabilize a duplex.

To gain insight into dsDNA stability and how it is affected by changes in primary structure, scientists have studied duplexes using a combination of methods that include X-ray crystallography (66,67), Raman spectroscopy (68,69) and NMR (70,71) to obtain structural information, and ultraviolet (UV) spectroscopy and differential scanning calorimetry (DSC) to quantify the thermodynamics of the melting transition (61,62,64,65,72-74). These various studies have shown that the denaturation of B-DNA involves disruption of stacking interactions between adjacent bases on a given strand and between the two base pairs within

the corresponding base pair doublet. Inter-chain stacking interactions are completely lost

during denaturation, while intra-strand stacking interactions are partly disrupted. Base pair

doublets, also known as nearest-neighbors, can be classified into four distinct groups based

on their composition and sequence. If R and Y denote the purine and pyrimidine bases,

respectively, these are RR, RY, YR and YY, with the RR-type doublet generally providing

the highest stacking stability (75). Hydrogen bonds between Watson-Crick base pairs are

also lost during denaturation. As noted above, one additional hydrogen bond is formed in the more stable g•c base pair. As a result, doublets may contain a total of 4, 5 or 6 hydrogen bonds between the two base pairs, and thus exhibit marked differences in stability. Finally, the contributions of base-stacking and base-pairing to B-DNA stability are thought to be fairly similar in magnitude (76), so that both effects must be properly modeled if accurate predictions of melting thermodynamics are to be realized.

1.2.1 Methods for measuring duplex DNA melting thermodynamics

The importance of duplex DNA stability to biology and health science has motivated substantial research toward understanding, through experimentation, the thermodynamics of the melting of duplex DNA to the random-coil single-stranded state. Focus has largely concentrated on relatively short oligonucleotides because of their relatively simple nature and their widespread use as probes and primers in the polymerase chain reaction (PCR), in various quantitative real-time PCR techniques, and in next-generation DNA sequencing and microarray technologies. For these short dsDNAs, two experimental methods have been developed and widely used to measure Tm, as well as the change in Gibbs energy (∆G),

enthalpy (ΔH), entropy (ΔS) and, in some cases, heat capacity (ΔCp) upon melting. These

include i) indirect monitoring of melting thermodynamics by UV spectroscopy and ii) direct measurement by DSC. A description and comparison of these two methods is provided

below.

1.2.1.1 UV absorption spectroscopy

The denaturation of dsDNA into its composite single strands is typically measured by

optical absorption versus temperature studies that generate a “melting curve”. The experiment is generally conducted at 260 nm, where UV light absorption mainly occurs through a π-π* electronic transition in both pyrimidine and purine bases. An example of a

melting curve is provided in Figure 1.2 and shows that an increase in absorption is recorded

during the dsDNA to ssDNA transition. Commonly referred to as the hyperchromic effect,

this increase in the molar absorptivity of DNA is due to changes in vibrational modes of the

bases. For denaturation of short, fully complementary dsDNA, linear pre- and post-transition

base lines are expected and generally observed. This feature, along with the overall

simplicity of the technique and the low concentration of oligonucleotides required have

served to make UV absorption spectroscopy the primary method used to study DNA melting

transition thermodynamics (61).


Figure 1.2 Example of UVM data and analysis for a short DNA duplex.

Several assumptions are generally made to derive thermodynamic data from a UV melt (UVM) curve. The first is that the measured change in absorbance correlates directly with a transition in the DNA from the ds to the ss state. As noted above, helix denaturation alters the electronic configuration of the bases through both base unstacking and unpairing contributions, so it is indeed reasonable to think that the observed shift in absorption intensity to lower energy bands is proportional to the percentage of the original dsDNA that has denatured (77). The importance of this assumption is that it permits the fraction (α) of strands in the ds state to be estimated from the UVM curve, provided baselines representing the pre-transition (α = 1) and post-transition (α = 0) states can be accurately assigned (78).

Two further assumptions are then required to compute thermodynamic data from the melting

curve. First, one must assume the reaction can be modeled as a reversible two-state (all or

none) transition. Though unequivocal proof that this condition is met is hard to obtain, it is

common to test for two-state melting behavior by analyzing UVM data using two

independent methods: the first is the classic van’t Hoff analysis and the second is based on a

Levenberg-Marquardt nonlinear least-squares fit of the normalized melting curve. Both methods are described below and agreement (±15%) of the thermodynamic values obtained

using the two analyses is generally accepted as an indication that two-state thermodynamics

are applicable to the melting transition of a given duplex (64). Second, the change in heat

capacity (ΔCp) between the two states is assumed to be zero in this analysis. One clear and

widely recognized value in assuming ΔCp = 0 is that thermodynamic changes for the measured melting transition can be computed with ease via a van’t Hoff analysis (64,65). To

both see how this is done and understand the thermodynamics of bimolecular dissociation

reactions, consider the melting of a short dsDNA sequence into two single strands (ssDNA1 and ssDNA2) that are not self-complementary in the 5’→ 3’ sense (e.g. ssDNA1 might be the homo-polynucleotide aaaaaaaaaaaa, which cannot base-pair with itself and is therefore not self-complementary). The melting reaction is then described as

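$$\mathrm{dsDNA} \overset{K(T)}{\rightleftharpoons} \mathrm{ssDNA_1} + \mathrm{ssDNA_2} \tag{1.1}$$

where K(T) is the equilibrium constant for the helix-to-coil transition at temperature T and is

given by

$$K(T) = \frac{[\mathrm{ssDNA_1}][\mathrm{ssDNA_2}]}{[\mathrm{dsDNA}]} \tag{1.2}$$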

Note that the equilibrium constant defined by Equation 1.2 is an effective one, as it is based

on equilibrium concentrations and not on activities at the chosen solvent conditions. A

strand mass balance gives

$$C_T = [\mathrm{ssDNA_1}] + [\mathrm{ssDNA_2}] + 2[\mathrm{dsDNA}] \tag{1.3}$$

where CT is the total strand concentration in the sample. Division of Equation 1.3 by CT allows one to define α (= 2[dsDNA]/CT), the fraction of strands in the duplex state, so that

$$K(T) = \frac{C_T (1-\alpha)^2}{2\alpha} \tag{1.4}$$

when the concentrations of the two strands are equimolar; at Tm, where α = 0.5, this reduces to K(Tm) = CT/4. The fundamental thermodynamic

relationship for the melting of a duplex formed from two non-self-complementary strands is

therefore given by

$$\Delta G = -RT\ln K = -RT\ln\left(\frac{C_T (1-\alpha)^2}{2\alpha}\right) = \Delta H^\circ - T\Delta S^\circ \tag{1.5}$$

where ∆G is the Gibbs energy change at temperature T, and ∆H° and ∆S° are the standard enthalpy

and entropy changes, respectively, for the helix-to-coil transition. The superscript o in

Equation 1.5 denotes that both ∆H° and ∆S° are defined at the standard-state condition, which for UVM experiments is typically a buffered aqueous solution (pH 7) at temperature

Tm and atmospheric pressure containing NaCl at a concentration of either 1 M (typically) or

0.15 M (occasionally). Rearrangement of Equation 1.5 yields the melting curve modeling equation for the case where ∆Cp is assumed to be zero

$$T(\alpha) = \frac{\Delta H^\circ}{\Delta S^\circ - R\ln\left(\dfrac{C_T (1-\alpha)^2}{2\alpha}\right)} \tag{1.6}$$

When α = 0.5, Equation 1.6 gives the melting temperature Tm of the duplex at the given total

strand concentration CT

$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ - R\ln\left(\dfrac{C_T}{4}\right)} \tag{1.7}$$

Equation 1.7 emphasizes the role of enthalpy-entropy compensation in defining the stability

of duplex DNA. In particular, for the melting reaction (Equation 1.1), both ∆H° and ∆S° are

positive and relatively large in value; the term -R ln (CT/4) in the denominator of Equation

1.7 is also positive in value since CT is typically less than 1 × 10⁻⁴ M, but the magnitude of this term is generally a fraction of ∆S°.
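To make the enthalpy-entropy compensation in Equations 1.6 and 1.7 concrete, the following Python sketch evaluates both expressions for a duplex formed from equimolar, non-self-complementary strands. The values of ∆H°, ∆S° and CT are hypothetical numbers chosen only for illustration, and the function names are not taken from the thesis.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def temperature_at_alpha(dH: float, dS: float, ct: float, alpha: float) -> float:
    """Equation 1.6: temperature (K) at which the duplex fraction equals alpha."""
    k = ct * (1.0 - alpha) ** 2 / (2.0 * alpha)  # Equation 1.4
    return dH / (dS - R * math.log(k))


def melting_temperature(dH: float, dS: float, ct: float) -> float:
    """Equation 1.7: Tm (K), i.e. Equation 1.6 evaluated at alpha = 0.5."""
    return dH / (dS - R * math.log(ct / 4.0))


# Hypothetical melting parameters: dH in kcal/mol, dS in kcal/(mol K), CT in mol/L.
dH, dS = 80.0, 0.220
for ct in (2e-6, 2e-5, 2e-4):
    tm_c = melting_temperature(dH, dS, ct) - 273.15
    print(f"CT = {ct:.0e} M  ->  Tm = {tm_c:.1f} C")
```

Under these assumptions the concentration term shifts Tm by only a few degrees per decade of CT, which reflects the statement above that -R ln (CT/4) is a fraction of ∆S°.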

It is important to note that CT/4 becomes CT in Equation 1.7 when it is applied to a

duplex formed from a self-complementary oligonucleotide. This change arises because the

melting reaction for a duplex formed from a self-complementary strand (e.g. the ssDNA might be the polynucleotide aaaaaatttttt) is not given by Equation 1.1, but instead by

$$\mathrm{dsDNA} \overset{K(T)}{\rightleftharpoons} 2\,\mathrm{ssDNA} \tag{1.8}$$

As a result, the equilibrium constant K for self-complementary DNA differs by a factor of 4

at a given CT from that for a duplex formed from non self-complementary strands. As

described in detail below, the analysis of UVM data for self-complementary DNA also

requires an entropy correction that arises from the symmetry of self-complementary strands

(79). Finally, if the two strands are non self-complementary and are not present in equimolar concentrations, CT/4 becomes CA – CB/2, where CA and CB are the concentrations of the more

and less concentrated strands, respectively. If one strand is present in great excess such that

CA >> CB, only the concentration of the more abundant sequence is required in Equation 1.7 to determine Tm.

To analyze UVM data by the classic van’t Hoff analysis, it is convenient to linearize

Equation 1.7 to

$$\frac{1}{T_m} = -\frac{R}{\Delta H^\circ}\ln\left(\frac{C_T}{4}\right) + \frac{\Delta S^\circ}{\Delta H^\circ} \tag{1.9}$$

This result shows that a plot of 1/Tm versus ln (CT), known as the van’t Hoff plot, allows determination of ∆H° from the slope and, in theory, ∆S° from the intercept. The use of

Equation 1.9 to obtain accurate thermodynamic data requires the acquisition of melting temperatures over a wide range of CT values. However, the determination of Tm from UVM data is not straightforward, as it requires careful non-linear fitting and model-based analysis of the melting curve. To avoid this complication, the value of Tm in Equation 1.9 is usually taken to be that of Tmax, which is easily determined as the temperature at which d(A260)/dT is a maximum. Though they are close in value, Tm ≠ Tmax for bimolecular reactions (80,81), and this assumption therefore introduces error into the regressed thermodynamic values.

Additional and sometimes significant error in values of ∆S° determined by van’t Hoff analysis can occur due to the inability to collect data at CT values close to the intercept of

Equation 1.9. Several other limitations to this classic analysis method have also been identified and carefully explained in previous reports (80).
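The van’t Hoff regression described above can be sketched in Python with NumPy as follows. The data are synthetic, generated from Equation 1.7 with hypothetical values of ∆H° and ∆S°, so the fit simply recovers its inputs; real Tm data would carry the errors discussed in the text. Regressing against ln(CT/4) rather than ln(CT) is a choice made here so that the intercept equals ∆S°/∆H° directly.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol K)


def vant_hoff_fit(ct_values, tm_values):
    """Fit 1/Tm against ln(CT/4) and return (dH, dS) according to Equation 1.9."""
    x = np.log(np.asarray(ct_values, dtype=float) / 4.0)
    y = 1.0 / np.asarray(tm_values, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    dH = -R / slope        # slope of Equation 1.9 is -R/dH
    dS = intercept * dH    # intercept of Equation 1.9 is dS/dH
    return dH, dS


# Synthetic melting temperatures from hypothetical parameters (kcal/mol, kcal/(mol K)).
dH_true, dS_true = 80.0, 0.220
ct = np.array([1e-6, 3e-6, 1e-5, 3e-5, 1e-4])        # total strand concentrations, mol/L
tm = dH_true / (dS_true - R * np.log(ct / 4.0))      # Equation 1.7

dH_fit, dS_fit = vant_hoff_fit(ct, tm)
print(f"dH = {dH_fit:.2f} kcal/mol, dS = {1000.0 * dS_fit:.1f} cal/(mol K)")
```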

An alternative method for analyzing UVM data according to two-state theory is provided by substituting the correct relation between K and α (i.e. Equation 1.4 for a non-self-complementary bimolecular reaction) into the Gibbs-Helmholtz equation and evaluating the result at Tm (α = 0.5), which gives

$$\Delta H^\circ = -6RT_m^2\left(\frac{d\alpha}{dT}\right)_{T_m} \tag{1.10}$$

To apply Equation 1.10, raw UVM data are first normalized to 0 ≤ α ≤ 1, typically by independent linear fits of the pre- and post-transition baselines as shown in Figure 1.2. A non-linear fit of one or each baseline might prove more accurate in certain cases, though the added number of fitted variables complicates both the analysis and the estimation of errors.

Alternatively, Owczarzy has shown that regions where the second derivative d²(A260)/dT² is zero can be used to define temperatures where melting curve data are linear and may therefore be used to establish the pre-transition and post-transition baselines (81). Linear least-squares fitting of the resulting α(T) curve with Equation 1.10 in the range near and centered around α = 0.5 is then used to estimate Tm, as well as the value of dα/dT at Tm needed to compute ∆H° using Equation 1.10. ∆S° can then be calculated using Equation 1.7.

Typically, the values of ∆H° and Tm obtained by this local regression procedure are precise.

However, as a secondary check of data quality, both quantities may also be obtained by fitting the entire α(T) curve (or even sets of curves) using the Levenberg-Marquardt nonlinear least-squares method (Figure 1.2) with the local (linear least-squares) estimates used as initial guesses.
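A minimal sketch of the local regression step is given below, assuming a noise-free synthetic α(T) curve generated from the two-state model with ∆Cp = 0. The duplex parameters, the temperature grid and the fitting window of ±0.05 around α = 0.5 are hypothetical choices made for illustration; baseline normalization and the full Levenberg-Marquardt fit are not shown.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol K)


def alpha_two_state(T, dH, dS, ct):
    """Duplex fraction alpha(T) for an equimolar, non-self-complementary two-state melt."""
    K = np.exp(-(dH - T * dS) / (R * T))      # Equation 1.5 with dCp = 0
    b = 1.0 + K / ct                          # Equation 1.4 rearranged: alpha^2 - 2*b*alpha + 1 = 0
    return 1.0 / (b + np.sqrt(b * b - 1.0))   # root that lies between 0 and 1


def local_fit(T, alpha, window=0.05):
    """Linear least-squares fit near alpha = 0.5; returns (Tm, dH) using Equation 1.10."""
    mask = np.abs(alpha - 0.5) <= window
    slope, intercept = np.polyfit(T[mask], alpha[mask], 1)
    tm = (0.5 - intercept) / slope            # temperature at which the fitted line crosses 0.5
    dH = -6.0 * R * tm ** 2 * slope           # Equation 1.10
    return tm, dH


# Hypothetical parameters used only to generate the synthetic curve.
dH_true, dS_true, ct = 80.0, 0.220, 2e-6
T = np.linspace(290.0, 350.0, 6001)           # temperature grid in K
alpha = alpha_two_state(T, dH_true, dS_true, ct)

tm_fit, dH_fit = local_fit(T, alpha)
tm_true = dH_true / (dS_true - R * np.log(ct / 4.0))   # Equation 1.7
print(f"Tm: fitted {tm_fit:.2f} K, Equation 1.7 gives {tm_true:.2f} K")
print(f"dH: fitted {dH_fit:.1f} kcal/mol, input {dH_true:.1f} kcal/mol")
```

Because α(T) is not exactly linear within the window, the fitted slope and hence ∆H° differ slightly from the input values; narrowing the window reduces this bias at the cost of fewer data points.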

Important points may be made regarding analysis of UVM data as a means of building predictive models of dsDNA stability. First, the UVM method measures changes in absorbance due to the hyperchromic effect and does not directly measure melting thermodynamics. Instead, thermodynamic data are obtained by regressing a two-state thermodynamic model to α(T) data. The quality of this regression depends on several factors, the most significant of which are the proper selection of the pre- and post-transition baselines and the implications of assuming ΔCp equals zero. The first factor is significant because the baselines are used to normalize the absorbance data to α values, and this in turn

defines the shape of the function α(T) from which Tm (taken as the temperature at which α =

0.5) and ∆H° are estimated. The second factor is also important, though it does not, in general, introduce significant error in the regressed melting thermodynamics at Tm. Instead, its impact is felt because it leads to a thermodynamic analysis, and ultimately to nearest-neighbor type thermodynamic models, where ∆H and ∆S are assumed to be functions of duplex length and sequence, but not of temperature. The consequence of this is that ∆H° and ∆S° values acquired at Tm are then used without temperature correction in Equation 1.5 to estimate ∆G at a temperature T of interest, say 37 °C, that may be far from Tm. If, in fact,

ΔCp ≠ 0, this analysis procedure can result in significant errors that will serve to obscure the true differences in stability arising from duplex length, sequence, etc.
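The size of the extrapolation error can be illustrated with the standard temperature corrections ∆H(T) = ∆H°(Tm) + ∆Cp(T − Tm) and ∆S(T) = ∆S°(Tm) + ∆Cp ln(T/Tm), which hold when ∆Cp is independent of temperature. The sketch below is written under that assumption; all numerical values, including ∆Cp, are hypothetical and are not data from this thesis.

```python
import math


def delta_g(T: float, dH_tm: float, dS_tm: float, tm: float, dCp: float = 0.0) -> float:
    """Gibbs energy change of melting at T (K), from values of dH and dS defined at Tm."""
    dH = dH_tm + dCp * (T - tm)               # enthalpy corrected from Tm to T
    dS = dS_tm + dCp * math.log(T / tm)       # entropy corrected from Tm to T
    return dH - T * dS


# Hypothetical duplex: dH in kcal/mol, dS and dCp in kcal/(mol K), temperatures in K.
dH_tm, dS_tm, tm = 80.0, 0.220, 338.15
T37 = 310.15

g_uncorrected = delta_g(T37, dH_tm, dS_tm, tm)             # dCp assumed to be zero
g_corrected = delta_g(T37, dH_tm, dS_tm, tm, dCp=1.0)      # dCp = 1.0 kcal/(mol K), assumed
print(f"dG(37 C), dCp = 0:   {g_uncorrected:.2f} kcal/mol")
print(f"dG(37 C), dCp = 1.0: {g_corrected:.2f} kcal/mol")
```

For a positive ∆Cp, neglecting the correction overestimates ∆G of melting at temperatures away from Tm, and the discrepancy grows with the distance between T and Tm.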

1.2.1.2 Calorimetry

Calorimetric methods such as differential scanning calorimetry (DSC) and isothermal titration calorimetry (ITC) offer the advantage of directly measuring the thermodynamics of

DNA melting transitions (82-84). Uncertainties in the data are therefore reduced. In DSC,

DNA helix-to-coil transitions are followed as a function of temperature by measuring the

excess heat capacity ($C_p^{ex}$) of a DNA-containing solution relative to an otherwise identical

DNA-free control solution. Integration of the resulting $C_p^{ex}$ versus T curve (Figure 1.3) provides a direct measure of the transition enthalpy and entropy

$$\Delta H^\circ_{cal} = \int_{T_1}^{T_2} C_p^{ex}(T)\,dT \tag{1.11}$$

$$\Delta S^\circ_{cal} = \int_{T_1}^{T_2} \frac{C_p^{ex}(T)}{T}\,dT \tag{1.12}$$

where T1 and T2 are the temperatures at which $C_p^{ex}(T)$ meets the pre- and post-transition baselines. As shown in Figure 1.3, Tm is also directly measured in the DSC thermogram as

the temperature at the area midpoint of the enclosed melting transition. Determination of

DNA melting thermodynamics by DSC is therefore not subject to assumptions required to analyze UVM data. However, since DSC directly measures total energy changes in the system, it is important that the duplex is of high purity (i.e. HPLC purified) to ensure molar calculations of thermodynamic parameters are accurate.
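The direct integration described above can be sketched as follows, assuming Equations 1.11 and 1.12 take the standard forms ∆H°cal = ∫ Cp,ex dT and ∆S°cal = ∫ (Cp,ex/T) dT between T1 and T2. The thermogram is a synthetic Gaussian peak used purely as a stand-in for baseline-corrected data; it is not a physical model of duplex melting, and the function names are illustrative.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def analyze_thermogram(T, cp_ex):
    """Return (dH_cal, dS_cal, Tm) from a baseline-corrected excess heat capacity curve."""
    dH_cal = trapezoid(cp_ex, T)        # enthalpy: area under Cp_ex versus T
    dS_cal = trapezoid(cp_ex / T, T)    # entropy: area under Cp_ex/T versus T
    # Tm is taken as the temperature at which half of the peak area has accumulated.
    steps = 0.5 * (cp_ex[1:] + cp_ex[:-1]) * np.diff(T)
    running = np.concatenate(([0.0], np.cumsum(steps)))
    tm = float(np.interp(0.5 * dH_cal, running, T))
    return dH_cal, dS_cal, tm


# Synthetic peak: area 80 kcal/mol centered at 330 K with a width of 3 K (all assumed).
T = np.linspace(300.0, 360.0, 1201)
area, center, width = 80.0, 330.0, 3.0
cp_ex = area / (width * np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * ((T - center) / width) ** 2)

dH_cal, dS_cal, tm = analyze_thermogram(T, cp_ex)
print(f"dH_cal = {dH_cal:.2f} kcal/mol, dS_cal = {1000.0 * dS_cal:.1f} cal/(mol K), Tm = {tm:.2f} K")
```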

Figure 1.3 Example of DSC data and analysis for a short DNA duplex.

Calorimetric data can provide valuable additional information regarding the nature of the melting transition. In particular, DSC offers the ability to measure the difference in heat capacity (ΔCp) between the dsDNA and ssDNA states at the standard-state temperature Tm.

Although the exact value of ΔCp and its sequence dependence remain open questions, there is now ample data from calorimetry studies showing that ΔCp is not zero, but rather takes on

values > 0 (73,74,85-90). This result is not unexpected, in part because the denaturation of each anti-parallel strand of the double helix to its random-coil single stranded state will result in changes in the solvation of chemical groups. A number of studies exist (for a review, see

Prabhu and Sharp (91)), largely focused on protein folding, reporting very good correlation between a change in solvent-exposed surface area and the value of ΔCp. Those bases that remain stacked in the single strand following duplex denaturation are likely to produce little net change in polar and nonpolar surface hydration, and thus little change in Cp. But those bases that do unstack will change their states of hydration, and this in turn should contribute to a non-zero ∆Cp. This is supported by studies conducted by Schwarz and coworkers

(92,93) and by Mrevlishvili et al. (94) that show, respectively, that ssDNA exhibits broad changes in stacking structure with temperature and that the denaturation reaction is accompanied by a change in hydration. These results align with a particularly elegant study by Spink and Chaires (95), who combined UVM, circular dichroism and vapor-pressure osmometry experiments to show that, on average, the melting of poly(a)-poly(t) dsDNA results in the release of 4 water molecules per base pair. Other contributions to ∆Cp are thought to include changes in the conformations (and thus the hydration shells) of the bases and strands resulting from condensed counter-ions and other electrostatic effects (96).

Though questions remain regarding the precise magnitude of these various contributions

(97), current theory and data suggest that stacking/unstacking events provide important, if not dominant contributions to ∆Cp, as summarized in the excellent review by Mikulecky and

Feig (88).

In Equations 1.11 and 1.12, the subscript cal makes clear that the enthalpy and entropy changes for the helix-to-coil transition have been directly computed from the

calorimetry data and not through fitting the melting transition to the two-state (2-st) reaction model. As with UVM data, one can analyze dsDNA melting data acquired by DSC using the two-state model. In this case, however, the model need not assume that ∆Cp is zero and

DSC-derived melting transition data can be analyzed by the non-linear (i.e., ∆Cp ≠ 0) form of the two-state transition model, given by

[Equation 1.13]

[Equation 1.14]

and

o ⎛ C ⎞ ⎡ ΔH ⎛ T ⎞ ΔC p ⎛ ⎛ T ⎞⎞⎤ K()T = ⎜ T ⎟exp⎢− ⎜1− ⎟ − ⎜T − T − T ln⎜ ⎟⎟⎥ (1.15) 4 RT ⎜ T ⎟ RT ⎜ m ⎜ T ⎟⎟ ⎝ ⎠ ⎣⎢ ⎝ m ⎠ ⎝ ⎝ m ⎠⎠⎦⎥ to determined the two-state theory derived values of ∆Cp, ΔH° and Tm, as well as K(T).

Hereafter these values will therefore carry the subscript 2-st (e.g., ΔH°2-st) to distinguish them from those determined by direct integration of calorimetry data. Here, mpost and bpost are the slope and intercept, respectively, of the post-transition baseline, which tends to be more stable and easily fit to provide a reliable reference for regression of thermodynamic data.
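The temperature dependence in Equation 1.15 can be made concrete with a short numerical sketch. The function names, units and example values below are illustrative assumptions and are not taken from the thesis; the helper that converts K(T) into a melted fraction additionally assumes a non-self-complementary duplex whose two strands are each present at CT/2.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def two_state_K(T, Tm, dH, dCp, CT):
    """Dissociation constant K(T) of Equation 1.15.

    T and Tm are in kelvin, dH is the enthalpy change at Tm in kcal/mol,
    dCp is in kcal/(mol K) and CT is the total strand concentration in mol/L.
    """
    enthalpy_term = dH / (R * T) * (1.0 - T / Tm)
    heat_capacity_term = dCp / (R * T) * (T - Tm - T * math.log(T / Tm))
    return (CT / 4.0) * math.exp(-enthalpy_term - heat_capacity_term)


def fraction_melted(K, CT):
    """Fraction of strands in the coil state (assumed equimolar strands).

    Solves K = theta^2 * CT / (2 * (1 - theta)) for theta between 0 and 1.
    """
    return (-K + math.sqrt(K * K + 2.0 * K * CT)) / CT


# Hypothetical duplex: Tm = 330 K, dH = 80 kcal/mol, dCp = 1 kcal/(mol K), CT = 2 uM
for T in (320.0, 330.0, 340.0):
    K = two_state_K(T, 330.0, 80.0, 1.0, 2e-6)
    print(T, K, fraction_melted(K, 2e-6))
```

By construction K(Tm) reduces to CT/4, which corresponds to half of the strands being melted; this provides a quick consistency check on any fitted parameter set.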

Equivalency of thermodynamic values obtained directly from integration of calorimetry data and from non-linear two-state model analysis of that data (i.e., ΔH°cal = ΔH°2-st) is among the strongest indicators that the melting of a given dsDNA sequence obeys two-state thermodynamics and can therefore be treated using a molecular thermodynamic model based on two-state reaction theory. Cases where ΔH°cal > ΔH°2-st are indicative of a melting reaction that may involve partially unfolded intermediate states that are significantly populated, while systems where ΔH°cal < ΔH°2-st are less easily understood, but are nevertheless not amenable to two-state thermodynamic analyses.

Despite its apparent advantages, calorimetry has not found particularly wide use in studies of DNA melting thermodynamics, especially when compared to the UVM technique.

An explanation for this may be the poor sensitivity of older calorimeters, which required high concentrations of DNA at a time when oligonucleotides were quite expensive due to the lack of efficient synthesis methods. However, the development of much higher sensitivity calorimeters (98) and the dramatic decrease in the cost of oligonucleotides now make calorimetry a more attractive technique to study DNA thermodynamics.

1.2.2 Thermodynamic models used to predict DNA duplex stability

As detailed above, thermodynamic data are relatively easy to extract from either

UVM or DSC-derived melting curves for short dsDNA sequences that denature in a single, two-state transition. As a result, a rich database comprised of results from many independent studies (51,52,60-62,64,65,99,100) is now available and has been used to great effect to develop models capable of accurately predicting Tm values for short B-DNAs as a function of sequence and length, as well as solvent composition. The most widely used of these models are collectively known as nearest-neighbor thermodynamic (NNT) models, and much has been written about them (53,60-65,101-105) due to their ubiquitous use in molecular biology, biotechnology, and industry. While attempts have been made to directly connect them to

Poland-Scheraga (PS) theory and other statistical mechanics (SM) based theories of DNA

phase transitions (see, for example, Tøstesen et al. (106)), NNT models were not derived on that basis and it is therefore more accurate to state that their structure is merely inspired by these more rigorous, but less quantitative SM theories.

As noted above, many independent experiments have shown that DNA melting thermodynamics depend not only on duplex length and the number of a•t and g•c base pairs, but also on sequence. PS theory and advances made to it by Fisher (107) and others (108) predict that the sequence dependence arises, at least in part, from the contribution of base-stacking interactions. Any successful model of dsDNA thermal denaturation would therefore need to account for base-stacking interactions in addition to base-pairing. Regrettably, DNA melting experiments do not provide a means to separate these two important contributions.

The simplest way to use available melting thermodynamic data to develop a model capable of predicting Tm and melting thermodynamics is therefore to assume that base-pairing and base-stacking contributions can be captured together at the nearest-neighbor level, particularly since both are short-range interactions.

We may therefore consider a duplex n base pairs in length as n – 1 stacked doublet base pairs of the form

$$\begin{array}{ccc}
5' & & 3' \\
\vdots & & \vdots \\
N_{3'+m+1} & - & N_{5'+m+1} \\
N_{3'+m} & - & N_{5'+m} \\
\vdots & & \vdots \\
3' & & 5'
\end{array} \qquad (1.16)$$

where the mth base pair (N3’+m – N5’+m) might be, for example, an a•t pair that stacks with the (m + 1)th to form a doublet base pair, also known as a nearest-neighbor (NN) base pair. As

originally proposed by Gray and Tinoco, Jr. (109), NNT models assume that (1) the hydrogen bonds between the bases of the mth base pair are sensitive to structural and electronic perturbations caused by the neighboring (m + 1)th base pair, and (2) the energetics of stacking interactions with the bases of the mth base pair depend only on the types of base pairs in the mth and (m + 1)th positions of the duplex. All longer-range contributions are ignored, so that ∆H° may be computed as a simple summation of the energy (ΔH_j^init) required to initiate denaturation through the dissociation of terminal base pairs, and the energies of denaturation (∆H°_NNi) for the set of nearest-neighbor (NN) base pairs comprising the duplex:

$$\Delta H^{\circ} = \sum_{j} m_j\,\Delta H_j^{\mathrm{init}} + \sum_{i} n_i\,\Delta H^{\circ}_{NN_i} \qquad (1.17)$$

Here, ∆H° is the standard state enthalpy change for duplex denaturation at Tm. In Equation 1.17, index j counts the four possible terminal base pairs (e.g., 5’-a•t-3’, 5’-t•a-3’, etc.), mj is the number (0, 1 or 2) of terminal base pairs of type j in the duplex, index i counts the 10 possible unique Watson-Crick nearest neighbors, and ni is the number of each nearest neighbor of type i in the duplex. As Gray has elegantly proven (101,102), the base-pairing requirements imposed by Watson and Crick in solving the structure of B-DNA reduce the 16 (i.e., 4^2) possible nearest neighbors within a single strand to 10 possible unique nearest neighbors within a double strand. If we use Equation 1.16 as a guide, this is because the two strands are anti-parallel, so that the doublet N3’+mN3’+(m+1)/N5’+mN5’+(m+1) is always equivalent to the doublet N5’+mN5’+(m+1)/N3’+mN3’+(m+1), a constraint that reduces the total number of independent nearest-neighbors to 10. In accordance with the statistical-mechanics derived theories mentioned above, the first term on the right-hand-side of Equation 1.17 accounts for the unique energetics of terminal base pairs due to the fact that they are unbounded on one side.
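The reduction from 16 single-strand doublets to 10 unique duplex nearest neighbors can be checked by direct enumeration. The sketch below is only an illustration of the counting argument; the helper names are hypothetical, and the choice of the alphabetically smaller doublet as the canonical label is an arbitrary convention adopted here.

```python
from itertools import product

COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def partner(doublet):
    """Return the 5'->3' doublet on the complementary, anti-parallel strand."""
    return "".join(COMPLEMENT[base] for base in reversed(doublet))


def unique_nearest_neighbors():
    """Collapse the 16 single-strand doublets into unique duplex doublets."""
    canonical = set()
    for first, second in product("acgt", repeat=2):
        doublet = first + second
        # A doublet and its anti-parallel partner describe the same stack,
        # so keep one representative per pair.
        canonical.add(min(doublet, partner(doublet)))
    return sorted(f"{d}•{partner(d)}" for d in canonical)


print(unique_nearest_neighbors())
```

Four doublets (at, ta, cg and gc) are their own partners, and the remaining twelve pair up into six, which gives the ten nearest neighbors used in the model.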

Entropy is handled in an analogous manner within most NNT models in that it is assumed that the combined contributions of base-pair formation and base-stacking interactions, which are both orientationally (e.g., bases stack in plane) and spatially specific, to the total entropy gain accompanying the helix-to-coil transition (∆S°) can be partitioned into a sum of nearest-neighbor contributions, so that

$$\Delta S^{\circ} = \sum_{j} m_j\,\Delta S_j^{\mathrm{init}} + \sum_{i} n_i\,\Delta S^{\circ}_{NN_i} + \Delta S_{sym} \qquad (1.18)$$

where ∆S_j^init corrects for the unique entropy of the terminal base pairs that arises, in part, from the translational entropy loss in bringing the two independent strands together to form the first base pair. ∆Ssym applies only to self-complementary sequences and is an exact result of statistical mechanics that accounts for the fact that a bimolecular complex formed from two self-complementary strands has a two-fold rotational symmetry that is not present in either of the single strands or in any duplex formed from non-self-complementary strands (79,110).

In Equation 1.17 or 1.18, all terms on the right-hand-side of the equality are temperature independent. The original NNT model for DNA (61), as well as those that are currently the most widely used (60,63), therefore compute both ∆H° and ∆S° as temperature-independent parameters by invoking the assumption that ΔCp = 0. As a result, ∆G is predicted by these models to have a linear dependence on temperature

$$\Delta G = \Delta H^{\circ} - T\Delta S^{\circ} \qquad (1.19)$$

and we therefore classify them as “linear” NNT models. They are specific to short B-DNAs formed from single strands that melt in a two-state manner. If the two strands are not self-complementary and are present at equal concentration, Tm values at standard solution conditions (1 M NaCl, pH 7) are calculated with Equation 1.7, using Equations 1.17 and 1.18 to estimate ∆H° and ∆S°, respectively. Otherwise, modified forms of Equation 1.7 are required as described above.

As indicated by Equations 1.17 and 1.18, linear NNT models are of the group-contribution type. They are primarily applied to the prediction of Tm, knowledge of which is critical to the proper design of oligonucleotides used as primers (111), probes (112,113) or antisense agents (114). Several versions of the linear NNT model (61,64,65), each possessing a unique set of regressed NN parameters, have been reported over the past few decades and have served to improve the accuracy of predicted Tm and ∆G (i.e., stability) values and, to a lesser extent, associated changes in enthalpy (ΔH°) and entropy (ΔS°) at the melting temperature. A thorough comparison and analysis of the different NNT models was performed by Allawi and SantaLucia, Jr. (60), and this permitted the conversion of existing versions of the NNT model, and their associated NN parameter sets, to a common thermodynamic reference state. The resulting “unified” NNT model, developed by

SantaLucia, Jr. and coworkers (63), currently provides one of the most accurate methods for predicting melting thermodynamics of short B-DNAs (115). SantaLucia, Jr.’s unified set of

NN parameters are reported in Table 1.1 along with additional parameters describing the energetics of terminal base pairs and duplex initiation. The “unified” NNT model accurately predicts Tm and thermodynamic changes such as ∆H° and ∆G for duplexes that melt near 50

°C, but becomes less accurate with increasing departure of Tm or T away from this temperature window (116). As noted above, the heat capacity change associated with duplex

denaturation is ignored, in part because reliable measurement of ΔCp was difficult to achieve at the time the model was established.

Table 1.1 “Unified” Nearest-Neighbor Parameters for Helix-to-Coil Transition.

NN Sequence                 ∆G°37       ∆H°         ∆S°
(5’-3’)•(5’-3’)             kcal/mol    kcal/mol    cal/(mol K)

Internal
aa•tt                        1.00        7.9        22.2
at•at                        0.88        7.2        20.4
ta•ta                        0.58        7.2        21.3
ac•gt                        1.44        8.4        22.4
ag•ct                        1.28        7.8        21.0
ca•tg                        1.45        8.5        22.7
ga•tc                        1.30        8.2        22.2
cc•gg                        1.84        8.0        19.9
cg•cg                        2.17       10.6        27.2
gc•gc                        2.24        9.8        24.4

Initiation & Terminal
5’-a•t-3’ or 5’-t•a-3’      -1.03       -2.3        -4.1
5’-c•g-3’ or 5’-g•c-3’      -0.98       -0.1         2.8

Symmetry Correction
If self complementary       -0.43                    1.4
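As a rough illustration of how Equations 1.17 and 1.18 are applied with the parameters of Table 1.1, the following sketch sums the terminal and nearest-neighbor contributions for a single strand written 5’ to 3’ and then estimates Tm. The function names are hypothetical. Because Equation 1.7 is not reproduced in this section, the Tm expression used here is an assumption: the standard two-state result for a bimolecular helix-to-coil transition, with CT/4 for non-self-complementary strands at equal concentration and CT for self-complementary strands. The parameters apply to 1 M NaCl.

```python
import math

R = 1.987  # gas constant in cal/(mol K)
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

# (dH in kcal/mol, dS in cal/(mol K)) for the helix-to-coil transition, Table 1.1
NN = {"aa": (7.9, 22.2), "at": (7.2, 20.4), "ta": (7.2, 21.3),
      "ac": (8.4, 22.4), "ag": (7.8, 21.0), "ca": (8.5, 22.7),
      "ga": (8.2, 22.2), "cc": (8.0, 19.9), "cg": (10.6, 27.2),
      "gc": (9.8, 24.4)}
# Terminal a•t / t•a and c•g / g•c base pairs share one parameter set each.
TERMINAL = {"a": (-2.3, -4.1), "t": (-2.3, -4.1),
            "c": (-0.1, 2.8), "g": (-0.1, 2.8)}
SYMMETRY_DS = 1.4  # cal/(mol K), self-complementary duplexes only


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nn_sums(seq):
    """Return (dH, dS) from Equations 1.17 and 1.18 for a fully matched duplex."""
    seq = seq.lower()
    dH = dS = 0.0
    for end_base in (seq[0], seq[-1]):          # two terminal base pairs
        h, s = TERMINAL[end_base]
        dH += h
        dS += s
    for k in range(len(seq) - 1):               # n - 1 stacked doublets
        doublet = seq[k:k + 2]
        h, s = NN.get(doublet) or NN[reverse_complement(doublet)]
        dH += h
        dS += s
    if seq == reverse_complement(seq):
        dS += SYMMETRY_DS
    return dH, dS


def melting_temperature_celsius(seq, total_strand_conc):
    """Two-state Tm estimate (assumed form of Equation 1.7), in degrees C."""
    seq = seq.lower()
    dH, dS = nn_sums(seq)
    x = total_strand_conc if seq == reverse_complement(seq) else total_strand_conc / 4.0
    return 1000.0 * dH / (dS - R * math.log(x)) - 273.15


print(nn_sums("cgttga"), melting_temperature_celsius("cgttga", 1e-4))
```

Since ΔCp is taken as zero in this linear form, the summed ∆H° and ∆S° are treated as constants at every temperature, which is the simplification discussed in the text that follows.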

The recognition of the non-zero value of ΔCp and the resulting temperature dependence of ∆H° and ∆S° has yielded a number of important insights into the hybridization thermodynamics of complementary oligonucleotides, including improved molecular interpretation of the well-known dependence of duplex RNA and DNA stabilities and conformations on nucleotide sequence (90). It has also permitted the reconciliation of differences in NNT model parameters derived from either oligomeric or polymeric DNA

(103). However, attention has not been given to properly accounting for the temperature

dependence of ΔH° and ΔS° within the NNT model so as to improve the prediction of Tms and hybridization thermodynamics, particularly at higher temperature.

1.2.3 Locked Nucleic Acids (LNAs)

Though they provide the elegantly simple yet extremely powerful language of life, the nucleotides of DNA, through their specific backbone and base-pairing chemistry, limit the thermodynamic behavior and other functional properties that can be engineered into oligonucleotides created for research, diagnostics and therapeutics applications. This has prompted extensive research into chemical modifications that enable facile tuning of the stability and functional properties of oligonucleotides to enhance their performance within specific applications and technologies (117). A number of unnatural nucleotide chemistries have been investigated to increase duplex stability while maintaining sequence specific recognition of complementary DNA or RNA strands. These studies show that stability enhancements can be achieved with chemistries that increase pre-organization of the modified single strand through a reduction in conformational degrees of freedom, including rotations around internal bonds (118). The Locked Nucleic Acid (LNA) is one example of a nucleotide that is conformationally restricted through chemical modification (119-121).

Discovered in 1998 by Wengel and Imanishi (120,122), LNAs are modified ribonucleotides that, when present in an RNA or DNA sequence, increase the stability of the helical structure of the oligonucleotide and thereby the stability of a complementary duplex in which an LNA-containing oligonucleotide is present (123-125). A discussion of the chemistry and properties of LNAs and the current models to predict melting temperatures of complementary

LNA-DNA duplexes is provided below.

1.2.3.1 Chemistry and properties

In LNA nucleotides (Figure 1.4) a 2’-O,4’-C methylene link is introduced into the ribose ring to constrain or “lock” the sugar moiety into an N-type (3’-endo) conformation

(121,122,126,127). The 2’-O,4’-C methylene bridge can be introduced into any of the DNA nucleosides to create LNA-adenine (A), LNA-thymine (T), LNA-guanine (G), or LNA-cytosine (C). Oligonucleotides containing A, T, G and C substitutions can be synthesized using standard phosphoramidite chemistry employed in DNA synthesis (34). Moreover, the incorporation of LNA bases does not alter oligonucleotide solubility, allowing LNAs to be strategically substituted into DNA or RNA for focused functional design of reagents for in vitro and in vivo applications (128,129).

Figure 1.4 Structure of sugar in DNA, RNA and LNA.

Recent studies have also indicated that LNA bases, despite their unnatural chemistry, can act as substrates for many important modification enzymes such as T4 PNK, T4 DNA ligase and DNA polymerases (130); however, they demonstrate resistance to certain exo- and

endonucleases (131). Interest in LNAs is driven by the ability of this chemical modification to increase the stability of oligonucleotide duplexes through the addition of one or more LNA bases to either or both of the complementary strands (123-125,132). In addition, heteroduplexes formed between an LNA-containing oligonucleotide and pure ssDNA are relatively intolerant of base pair mismatches in general. These properties make LNAs a particularly attractive chemistry for designing short probes and primers for detecting low frequency or minority point mutations such as SPMs (133). To achieve this goal, however, models must be developed to accurately predict melting thermodynamics of heteroduplexes in which one strand (the probe or primer) contains one or more LNAs.

1.2.3.2 Predicting the stability of LNA-DNA heteroduplexes

The design and application of single-stranded (ss) DNA reagents such as probes and primers has benefitted significantly from the development of thermodynamic models, most notably NNT type models that accurately predict duplex stability and Tm from sequence knowledge. The development of an equivalent model for oligonucleotides bearing LNA substitutions would obviously offer similar benefits. To date, two models have been developed to predict the Tm of a complementary duplex formed with an oligonucleotide containing one or more LNA substitutions.

The first model, referred to as “OligoDesign”, was developed by Exiqon Inc. (45). It is restricted to calculation of Tm and the details regarding the structure and parameters of the model have not been published. However, it is stated to have a standard deviation of ± 5.0

°C for Tm prediction for duplexes with non-neighboring LNA substitutions, and a convenient interface for using the model was available at http://lna-tm.com. However, at the time of

writing this thesis, this web-based tool is no longer accessible. In its place, Exiqon has recently launched a new Tm prediction tool for LNA heteroduplexes formed with either a pure-DNA or a pure-RNA complementary strand that is available at http://www.exiqon.com/ls/Pages/ExiqonTMPredictionTool.aspx. It allows Tm prediction for

LNA-DNA heteroduplexes with any number and configuration of LNA nucleotides.

However, it does not allow the user to specify either CT or [Na+], but rather uses preset values of 2 μM oligonucleotide and 115 mM Na+, respectively, which are not consistent with most qPCR protocols.

The second model, which has been described in much greater detail (44), is specifically designed to predict the Tm of LNA-DNA “mixmers”, which are short complementary duplexes containing individual LNA•DNA base pairs flanked on both sides by DNA•DNA base pairs. It is an NNT type model that builds on the widely used “unified”

NNT model for unmodified duplex DNA (63) through the addition of 64 regressed LNA nearest neighbor parameters, 32 of which are used to compute ∆ΔH° (= ∆H°LNA - ∆H°DNA, where ∆H° is the standard enthalpy change for the helix-to-coil transition) and the remaining

32 are used to compute ∆ΔS°. The model therefore assumes that LNA substitutions enhance duplex stability by altering both ∆H° and ∆S°. Given the importance of the base-pairing and base-stacking forces that promote duplex DNA formation, as well as the chain organization processes that oppose it, this assumption seems reasonable. Indeed, on first inspection, the limited ∆H° and ∆S° data available for melting of duplexes containing LNA substitutions appear consistent with this assumption (44,122,126,134,135), though they do not reveal a definitive mechanism for the stability enhancement.

It is important to recognize that the heat capacity change ΔCp accompanying the helix-to-coil transition was not measured or accounted for in either model. By assuming ∆Cp

= 0, the second of these models (44), designated the LNA NNT model in this work, treats

∆H° as temperature invariant and computes ∆∆H° from melting thermodynamics data as

∆H°LNA(Tm(LNA)) - ∆H°DNA(Tm(DNA)), where Tm(LNA) ≠ Tm(DNA) due to the stability enhancement provided by the LNA substitution(s). If ∆Cp ≠ 0 but rather is > 0, this treatment of the data is incorrect and will create an analysis error that biases ∆∆H° to values greater than 0 and thus an improper accounting of the mechanism of action of an LNA substitution. There are now numerous data available showing that ΔCp for the melting of unmodified duplex DNA is non-zero and significant (73,74,86-90,136,137), and one recent study has reported a non-zero

(positive) ΔCp for an LNA-bearing duplex (85). These data therefore show that both ∆H°LNA and ∆H°DNA are temperature-dependent state functions, and ∆∆H° must then be computed at a common temperature, with Tm(DNA) being the most appropriate when one is trying to understand the mechanism of action of an LNA. ∆∆H° is therefore properly given by

∆H°LNA(Tm(DNA)) - ∆H°DNA(Tm(DNA)), where ∆H°LNA(Tm(DNA)) is computed using the measured enthalpy change ∆H°LNA(Tm(LNA)) and the measured value of ΔCp.
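The correction described above can be sketched numerically. The sketch assumes that ΔCp is constant between Tm(LNA) and Tm(DNA), so that the enthalpy change shifts linearly with temperature; the function names and all numerical values are hypothetical and chosen only to show the size of the bias that the uncorrected treatment can introduce.

```python
def enthalpy_at(T_target, dH_measured, T_measured, dCp):
    """Shift a measured enthalpy change to another temperature.

    Assumes a temperature-independent dCp over the interval, so that
    dH(T_target) = dH(T_measured) + dCp * (T_target - T_measured).
    """
    return dH_measured + dCp * (T_target - T_measured)


def ddH_uncorrected(dH_lna, dH_dna):
    """ddH taken at two different temperatures, as in the dCp = 0 treatment."""
    return dH_lna - dH_dna


def ddH_common_temperature(dH_lna, Tm_lna, dH_dna, Tm_dna, dCp_lna):
    """ddH with both enthalpy changes evaluated at Tm(DNA)."""
    return enthalpy_at(Tm_dna, dH_lna, Tm_lna, dCp_lna) - dH_dna


# Hypothetical data: kcal/mol for enthalpies, kcal/(mol K) for dCp, deg C for Tm
naive = ddH_uncorrected(70.0, 66.0)
corrected = ddH_common_temperature(70.0, 65.0, 66.0, 60.0, 1.0)
print(naive, corrected)
```

With these illustrative numbers the apparent enthalpic stabilization obtained without the correction disappears once both duplexes are compared at Tm(DNA), which is the type of analysis error the text warns about when ΔCp > 0.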

1.2.4 PCR based methods for detection and quantification of a SPM

Standard primer-based real-time or quantitative PCR (qPCR) was pioneered by

Higuchi et al. (138) in the early 1990s, and provides the ability to monitor amplification in real time (139-141) through the application of dyes such as SYBR Green (Molecular

Probes Inc.) which fluoresce upon intercalation with dsDNA amplification products. During

PCR amplification with an intercalating dye such as SYBR Green, the fluorescent signal

intensity increases in direct proportion to the amount of desired PCR product, allowing the progress of the PCR reaction to be monitored. Measurement of the number of PCR cycles required for the fluorescence intensity of the sample to reach a certain threshold, known as the quantification cycle (Cq), and comparison of Cq values with standards of known concentration allows the amount of starting material in the sample to be determined.
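The comparison of Cq values with standards of known concentration is commonly carried out with a standard curve. The sketch below assumes the usual linear relationship between Cq and the logarithm of starting quantity; the function names and the standard values are invented for illustration and do not come from the thesis.

```python
import math


def fit_standard_curve(quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    x = [math.log10(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(cq, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10.0 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series (template copies) and measured Cq values
standards = [1e5, 1e4, 1e3, 1e2]
measured_cq = [23.40, 26.72, 30.04, 33.36]
slope, intercept = fit_standard_curve(standards, measured_cq)
print(slope, intercept, quantify(28.38, slope, intercept))
```

A sample whose Cq falls between two standards is assigned a starting quantity by interpolation along the fitted line, so the accuracy of the estimate depends on how well the standards bracket the unknown.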

qPCR can also be performed using a sequence specific fluorescently labeled probe designed to selectively hybridize with the gene of interest (142,143). Primers common to the target gene and any homologous alleles are used for amplification, with the probe responsible for selectively reporting the amplification of the target. A number of different types of probes, including hydrolysis probes, hybridization probes, molecular beacons and FRET

(fluorescence resonance energy transfer) probes, have been developed (144).

Specific detection and discrimination of alleles that differ by as little as a single base pair can be achieved using qPCR assays utilizing either AS probes (Figure 1.5) or AS primers (Figure 1.6). Traditionally, DNA oligonucleotides have been used to design both AS probes and AS primers capable of differentiating homologous sequences such as those found in SNPs and SPMs. However, the analytical specificity of these probes and primers is generally limited to ca. 5% and 1%, respectively (28,30-33).


Figure 1.5 AS hydrolysis probe based qPCR assay for SPM detection. Primers common to both alleles hybridize and are then extended by Taq polymerase resulting in exponential amplification of both templates. Real-time fluorescent signal is only generated when the mutant specific probe hybridizes to the target mutant allele allowing cleavage of the 5’ terminal fluorophore by the 5’-3’ exonuclease activity of Taq. The single base pair mismatch between the mutant specific probe and wild-type allele prevents or limits hybridization.


Figure 1.6 AS primer based qPCR assay for SPM detection. Either the forward primer (as shown above) or the reverse primer is designed to be specific to the mutant allele with the site of variation (i.e. SPM) usually located at the 3’ terminus of the primer. A perfect match between the mutant specific AS primer and mutant allele allows for efficient extension of the AS primer and exponential amplification ensues, monitored by either SYBR Green or fluorescently labeled probes or primers. A mismatch formed between the mutant specific AS primer and wild-type allele inhibits extension of the AS primer.

1.2.4.1 LNA containing AS probes

The use of a real-time probe to accurately detect and distinguish between two alleles that differ greatly in abundance while differing in sequence by as little as a single base requires that the probe display high affinity to the low-abundance target template and little to no affinity to the closely related sequence. When applied to the detection of an SPM, this requires maximizing the difference (∆) between Tm(MT), the melting temperature of the perfectly matched duplex formed between the probe and its mutant (MT) target, and Tm(WT), the Tm for the mismatched duplex formed between the probe and the more abundant wild-type (WT) sequence. The thermodynamics of bimolecular complexation equilibria provide the necessary conditions for achieving this goal. If heat capacity effects are, for the moment, ignored,

$$\Delta T_{m(MT-WT)} = T_{m(MT)} - T_{m(WT)} = \frac{\Delta H^{\circ}_{MT}}{\Delta S^{\circ}_{MT} - R\ln\left(C_{T(MT)}\right)} - \frac{\Delta H^{\circ}_{WT}}{\Delta S^{\circ}_{WT} - R\ln\left(C_{T(WT)}\right)} \qquad (1.20)$$

where ∆H° and ∆S° are the standard enthalpy change and entropy change, respectively, for the helix-to-coil transition. Both are positive in value for either the perfect-match MT or the mismatched WT duplex denaturation reaction. Both CT(MT) and CT(WT) are equal to the probe concentration since the probe is present in great excess relative to either template at the start of a PCR, making the reaction pseudo-first-order. As a result, -R ln(CT(MT)) and -R ln(CT(WT)) in Equation 1.20 can be replaced by a constant C. Achieving an acceptable ∆Tm(MT-WT) for qPCR analysis of a somatic mutation therefore requires the design of a probe for which

∆H°WT/(∆S°WT + C) < Ta < ∆H°MT/(∆S°MT + C), so as to permit hybridization of the probe to the target MT template but not to the mismatched WT template during the PCR annealing step, conducted at temperature Ta. This thermodynamic criterion is difficult to achieve with unmodified DNA probes, as ∆Tm(MT-WT) is rarely larger than 5 °C for a single bp mismatch

within a standard 22 – 25 bp DNA probe due to the relatively low proportional contribution of the mismatched bp to the overall thermodynamic stability of the duplex. Current strategies for realizing a suitable ∆Tm(MT-WT) therefore involve modifying standard oligonucleotides used as hydrolysis probes either by adding a 3’ terminal minor groove binder (MGB) ligand (145) or by replacing nucleotides within the probe with locked nucleic acid (LNA) substitutions (36,39). The latter approach is intended to increase the difference in MT and WT duplex stabilities as a consequence of both the shorter probe length (60,146-148) and the greater intolerance of LNAs to participation in mismatches (38).
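The design criterion built on Equation 1.20 can be explored with a few lines of code. In the sketch below the probe concentration sets the constant C = -R ln(CT), and the enthalpy and entropy changes for the matched (MT) and mismatched (WT) duplexes are hypothetical values chosen only to resemble a long unmodified DNA probe; none of the numbers or function names come from the thesis.

```python
import math

R = 1.987  # gas constant in cal/(mol K)


def probe_tm(dH_kcal, dS_cal, probe_conc):
    """Tm in kelvin for a probe in large excess over its template.

    Each term of Equation 1.20 has this form: Tm = dH / (dS - R ln(C)).
    """
    return 1000.0 * dH_kcal / (dS_cal - R * math.log(probe_conc))


def annealing_window(mt, wt, probe_conc):
    """Return (Tm_WT, Tm_MT, delta_Tm) for (dH, dS) pairs mt and wt."""
    tm_mt = probe_tm(mt[0], mt[1], probe_conc)
    tm_wt = probe_tm(wt[0], wt[1], probe_conc)
    return tm_wt, tm_mt, tm_mt - tm_wt


def is_selective(Ta, mt, wt, probe_conc):
    """True when Ta lies between Tm(WT) and Tm(MT), the stated criterion."""
    tm_wt, tm_mt, _ = annealing_window(mt, wt, probe_conc)
    return tm_wt < Ta < tm_mt


# Hypothetical probe: (dH in kcal/mol, dS in cal/(mol K)), 200 nM probe
matched = (150.0, 420.0)
mismatched = (140.0, 396.0)
print(annealing_window(matched, mismatched, 2e-7))
```

For these illustrative parameters the window between the two melting temperatures is only a few degrees wide, which mirrors the difficulty described above for unmodified DNA probes and shows why a larger ∆Tm(MT-WT) is sought.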

1.2.4.2 LNA containing AS primers

AS primers are usually designed such that the 3’ terminal base is aligned with the site of variation (SOV) within the target gene (149,150). The complementarity of the primer with the MT template bearing the SPM allows efficient amplification to occur, while the single mismatch formed with the parent WT allele at the SOV reduces the efficiency of amplification of that more abundant template, presumably because the resulting perturbation in duplex structure reduces Taq polymerase extension activity (151,152). If the extension activity of Taq is completely inhibited by non-complementary binding of the AS primer to the non-target template, the primer can be defined as absolute in its allele selectivity and the

SPE is then set by the fidelity of the polymerase, which for Taq is estimated to lie between 1 x 10^-5 and 2 x 10^-4 errors per base pair per cycle (153,154).

Real-time qPCR performed with an AS primer linearly amplifies the strand complementary to the common primer. In time, Taq-mediated erroneous nucleotide substitution will occur during this linear amplification and can result in the generation of

templates with transition or transversion errors at the SOV. When such Taq-mediated errors at the SOV create base complementarity with the AS primer, exponential amplification ensues, resulting in a false qPCR signal. Based on this method of error accumulation and the associated 1 in 3 chance that a base substitution at the SOV is complementary to the AS primer, the lowest SPE of a qPCR assay utilizing an AS primer can be estimated to lie between 10^-4 (0.01%) and 10^-6 (0.0001%).

AS primers that align the site of variation with the 3’ terminal base are known to display a wide range of SPE, due in part to the complex dependence of primer performance on the nucleotides present at the SOV and within the primer•template sequence flanking that site (29,151,155,156). Several strategies have been proposed to improve discrimination, including the introduction of an additional mismatch near the 3’ terminus of the primer

(149,150). This approach, which typically requires trial and error placement of the second mismatch, has proven effective for some sequences, but does not appear to provide a general method for absolute AS primer design (31).

A second strategy is to incorporate a single Locked Nucleic Acid (LNA) as the primer’s 3’ terminal nucleotide, creating an LNA•DNA base-pair mismatch with the WT allele at the 3’ SOV (41-43,157-161). 3’-LNA AS primers can offer significantly enhanced

SPE for target alleles when compared to primers comprised only of DNA, but the success of this approach can vary widely (41,157,161). Nevertheless, these and other recent results

(162) suggest that clinically useful AS primers can be created using LNAs if design refinements can be established so as to systematically deliver acceptable SPE. At present, however, no guidelines exist for a priori design of absolute AS primers utilizing LNA substitutions.

1.2.5 Clinically significant somatic point mutations

In 1982 the first SPM directly linked to a cancerous cell state was identified in the

HRAS gene located on the short (p) arm of chromosome 11 (163). Since then, cancer-related and clinically significant somatic mutations, the majority being SPMs, have been identified in over 350 protein-coding genes (8). A subset of these SPMs is now part of routine cancer screening protocols conducted within clinical genetic laboratories, with results from these assays used to aid diagnoses of cancer presence and type, or to guide and monitor treatment.

Below I provide a brief description of three clinically relevant somatic point mutations that are adopted in this thesis work as meaningful targets for AS probe and AS primer design and testing. The intent is to use them as a basis to develop general rules for designing highly specific AS probes or AS primers for detection of SPMs.

1.2.5.1 JAK2 V617F

The JAK2 p.Val617Phe (JAK2 V617F) mutation is caused by a transversion mutation

(g•c to t•a) at nucleotide 1849 in the gene coding for the tyrosine kinase Janus kinase 2 (JAK2), resulting in a substitution of phenylalanine (F) for valine (V) at codon 617. This mutation in JAK2 causes constitutive activation of the kinase, resulting in abnormal cell proliferation that is no longer regulated by associated cytokines and growth factors (164).

JAK2 mutations may be detected in up to 97% of polycythemia vera (PV) and up to 57% of essential thrombocythemia (ET) and idiopathic myelofibrosis (MF) cases, and may also be seen in cases of myelodysplastic syndrome (MDS) and MDS/myeloproliferative neoplasm overlap syndromes (165).

Clinical testing for the JAK2 V617F SPM is common, especially in patients with myeloproliferative disorders that are negative for the classic BCR/ABL translocation (166).

Detection of the JAK2 V617F SPM is currently used to guide diagnosis (164,166), and in the future may also be used to direct and monitor treatment with JAK2 inhibitors currently in development and in clinical trials (167).

1.2.5.2 KIT D816V

A transversion mutation (a•t to t•a) at nucleotide position 2468 in the KIT gene is responsible for the substitution of aspartic acid (D) with valine (V) in amino acid 816 of KIT, a trans-membrane class III receptor tyrosine kinase whose ligand is the stem cell factor

(SCF). Gain-of-function, SCF-independent growth is generally associated with the KIT p.Asp816Val (KIT D816V) mutation, found to occur in over 90% of adult cases of systemic mastocytosis (SM) (168). It has also been observed in gastrointestinal stromal tumors

(GISTs), acute myelogenous leukemia (AML), and a number of other malignancies (169-171). Clinical diagnosis of KIT D816V positive SM or AML disease is currently used to identify patients unsuitable for imatinib (Gleevec) based treatment (172).

1.2.5.3 BRAF V600E

The BRAF p.Val600Glu c.1799t>a SPM is a transversion mutation (t•a to a•t) occurring at nucleotide position 1799 of the BRAF gene and is commonly found in a variety of human cancers, including colon, melanoma, lung cancer, and thyroid cancer (173). Of the three members of the RAF protein family, only the BRAF kinase has been found to be frequently activated in cancer. The single amino acid substitution of valine (V) with

glutamic acid (E) accounts for the majority of the mutations observed in the BRAF gene, and results in constitutive activation of BRAF kinase and transient cell proliferation. Detection of BRAF V600E is clinically relevant, as patients with metastatic melanoma positive for mutations in the BRAF V600 codon have been shown to have higher survival rates when treated with the BRAF inhibitor vemurafenib (16).

1.3 Thesis Objectives and Content Overview

The objectives of this thesis are to

• develop and validate improved molecular thermodynamic models for predicting the hybridization thermodynamics of short DNA duplexes and LNA-DNA heteroduplexes;

• combine these two models with experimental studies to establish guidelines for defining the optimal number and placement of LNA substitutions in AS probes and AS primers so as to achieve highly selective qPCR-based detection of SPMs;

• show that these new design guidelines generally result in the creation of AS primers that can be classified as unequivocal in their SPE for a target, clinically relevant SPM.

Chapter 2 (“Correcting for Heat Capacity and 5'-ta Type Terminal Nearest Neighbors

Improves Prediction of DNA Melting Temperatures Using Nearest-Neighbor

Thermodynamic Models”) reports on a critical analysis of the unified NNT model and modifications to that model that enable more accurate thermodynamic predictions for certain classes of sequences and for a broader range of temperatures. The analysis focuses on the

impact that assuming ΔCp = 0 has on the accuracy of the “unified” NNT model and the improvements that can be realized by relaxing that assumption.

In Chapter 3 (“The role of the heat capacity change in understanding and modeling melting thermodynamics of complementary duplexes containing standard and nucleobase modified LNA”) the improved NNT model developed in Chapter 2 is extended further to create a molecular thermodynamic model that accurately predicts stabilities of LNA-DNA heteroduplexes. Thermodynamic data collected by DSC required to regress model parameters are reported. By accounting for a non-zero ΔCp, the model is shown to accurately predict hybridization thermodynamics for LNA-DNA heteroduplexes containing any number or pattern of LNA substitutions. Furthermore, the model is found to provide insight into known differences in the stability enhancement provided by each of the four standard LNA bases (A, T, G or C).

Both models are then combined with experimental studies to design improved AS probes for selectively detecting the SPMs KIT D816V and JAK2 V617F by real-time qPCR

(Chapter 4). Short LNA containing AS probes with sufficiently large ∆Tm(MT-WT) are created using the models so as to prevent cross-hybridization between the MT-specific probe and the non-target WT template. Based on the results obtained, guidelines are provided for designing LNA-containing hydrolysis probes that achieve SPE as low as 0.5%. Insights are also offered as to why unequivocal detection of SPMs is not possible using AS probe based qPCR assays.

This limitation is overcome in Chapter 5 (“Novel Plexor™ Multi-LNA Allele

Specific Primers for Unequivocal Clinical Detection of Somatic Point Mutations: Design

Rules and Application to JAK2 V617F, KIT D816V and BRAF V600E”) which focuses on

the design of AS primers containing LNA substitutions. An extensive study that defines the sensitivity of Taq polymerase to LNA substitutions in the 3’ terminal region of a primer is reported and then used to establish general guidelines for designing LNA-containing AS primers capable of unequivocal detection of the clinically relevant SPMs JAK2 V617F, KIT

D816V and BRAF V600E.

Chapter 2: Correcting for Heat Capacity and 5'-ta Type Terminal Nearest

Neighbors Improves Prediction of DNA Melting Temperatures Using

Nearest-Neighbor Thermodynamic Models

The unified NNT model of SantaLucia, Jr. et al. (63), which currently provides the

most accurate predictions of stability of DNA duplexes, assumes that ΔCp is zero or

negligible. The model is known to be most accurate for short dsDNA that melts near 50°C

(116). Here, the effect that the zero ΔCp assumption within the unified NNT model and other

current NNT models has on the accuracy of Tm predictions, particularly for higher stability

duplexes that melt above 60°C, is investigated. The average heat capacity change per base pair (ΔCp^bp) is measured and applied to remove the zero ΔCp assumption in the unified NNT

model to create a non-linear NNT model that extends the range over which Tm is accurately predicted. By properly accounting for the temperature dependence of ∆H° and ∆S°, a second limitation in the unified NNT model is identified that serves to increase Tm prediction errors.

Specifically, duplex sequences with at least one terminal 5’-ta sequence were found to be

more stable than predicted by the model. A library of sequences with equal representation of

all NN, but with different terminal base pairs and/or terminal nearest neighbors, was created

and examined to elucidate the nature of the stability increase in duplexes with terminal 5’-ta

sequences. A further correction to the unified NNT model for duplexes with one or two 5’-ta

termini is proposed and shown to provide significantly more accurate predictions of Tm for these duplexes.

2.1 Materials and Methods

2.1.1 DNA synthesis and purification

Desalted and HPLC purified single-strand oligonucleotides were purchased from

Integrated DNA Technologies (Coralville, IA), with the purity of each product determined by

capillary electrophoresis (CE) or calculated from the measured coupling efficiency to

establish the concentration of the full length product for each sample analyzed by DSC. The average purity of oligonucleotides following HPLC processing was 97.9% (± 1.8%), while the base coupling efficiency regressed from CE chromatograms was determined to be 99.3% for the desalted oligonucleotides.

2.1.2 Differential scanning calorimetry

Single-strand oligonucleotides were dissolved in 10 mM Na2HPO4 buffer (pH 7)

containing 1 M NaCl and 1 mM Na2EDTA. For the three duplexes where different salt

concentrations were tested, a NaCl concentration of 50 mM or 250 mM was used. The

concentration of each single strand was determined using absorbance readings at 260 nm and

80°C and extinction coefficients provided by the manufacturer. Duplex DNA sample

solutions, each containing an equimolar concentration of the two complementary strands,

were then prepared to the desired total molar concentration (CT) of 50, 75 or 100 μM.

Samples were degassed for 7 minutes under gentle stirring immediately prior to DSC

analysis. All melting experiments were performed on a VP-DSC differential scanning

calorimeter (Microcal, Inc., Northampton, MA). The nominal cell volume of the instrument

is 0.52811 ml and the scan rate used to collect data from 1 to 100 °C was 1°C/min. For each

experiment, buffer versus buffer excess thermal power baselines were measured immediately

before loading the degassed duplex solution into the DSC sample cell and then averaged.

The duplex-containing sample solution was then dynamically loaded on a thermal down scan

and several denaturation/renaturation scans were performed to verify the reversibility of the

melt transition. Except for occasional instabilities in the first denaturation scan associated

with the sample loading process, repeat denaturation scans for a given sample were superimposable, indicating that the helix-to-coil transition was fully reversible. The average of these denaturation scans was computed and the average buffer versus buffer baseline was subtracted from it to produce an excess heat capacity (Cp^ex) versus temperature curve for the

helix-to-coil transition.

2.1.3 Regression of melting thermodynamics data

Excess heat capacity data were concentration normalized and then analyzed using two

different methods to determine melting thermodynamics. The first method is model

independent and computes the calorimetric enthalpy change for the helix-to-coil transition

(∆H°cal) as the area enclosed by the transition peak and the underlying pre-to-post transition

baseline; Tm^cal is the temperature at the area midpoint of the enclosed melting transition.
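The model-independent analysis can be sketched numerically. The Python example below is a minimal illustration and not the analysis code used in this work. It assumes that the pre-to-post transition baseline has already been subtracted, builds a synthetic Gaussian excess heat capacity peak (all numerical values are hypothetical), integrates it by the trapezoidal rule to obtain the calorimetric enthalpy, and takes the melting temperature as the temperature at which half of the peak area has accumulated.

```python
import math

def calorimetric_analysis(temps_c, cp_excess):
    """Return (dH_cal, Tm_cal) from a baseline-subtracted excess heat capacity curve.

    temps_c   : temperatures in deg C, strictly increasing
    cp_excess : excess heat capacity (cal mol^-1 K^-1) at each temperature
    """
    # Running trapezoidal area under the peak (cal/mol).
    cumulative = [0.0]
    for k in range(1, len(temps_c)):
        dt = temps_c[k] - temps_c[k - 1]
        cumulative.append(cumulative[-1] + 0.5 * (cp_excess[k] + cp_excess[k - 1]) * dt)
    total = cumulative[-1]

    # Area midpoint: interpolate inside the first interval that crosses half the area.
    half = 0.5 * total
    for k in range(1, len(temps_c)):
        if cumulative[k] >= half:
            frac = (half - cumulative[k - 1]) / (cumulative[k] - cumulative[k - 1])
            return total, temps_c[k - 1] + frac * (temps_c[k] - temps_c[k - 1])
    raise ValueError("curve has no positive area")

# Hypothetical synthetic peak: 80 kcal/mol centred at 60 deg C, width 4 deg C.
AREA, CENTER, WIDTH = 80000.0, 60.0, 4.0
temps = [1.0 + 0.1 * k for k in range(991)]  # 1 to 100 deg C in 0.1 deg steps
cp_ex = [AREA / (WIDTH * math.sqrt(2.0 * math.pi))
         * math.exp(-0.5 * ((t - CENTER) / WIDTH) ** 2) for t in temps]

dh_cal, tm_cal = calorimetric_analysis(temps, cp_ex)
print(f"dH_cal = {dh_cal / 1000:.1f} kcal/mol, Tm_cal = {tm_cal:.1f} deg C")
```

For measured data the result depends on how the baseline under the peak is drawn, which this sketch does not address.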

In the second method, melting transition data were fit to the non-linear two-state

transition model, given by Equations 1.13 to 1.15, to determine ΔCp, ΔH°2-st and Tm^2-st.

2.1.4 Error analysis

Independent experiments were performed in triplicate for duplexes C1 and T1 in order to determine repeatability and to obtain an estimate of experimental error. Standard deviations for each thermodynamic parameter were determined from these repeat runs and

reported in Table 2.3. Experimental errors for the remaining duplexes, each studied by

independent runs performed in duplicate, were estimated from errors for C1 and T1 to be

±33%, ±3%, ±3% and ±0.5 °C for the measured thermodynamic parameters ΔCp, ΔH°, ΔS°

and Tm, respectively. While the error in ΔCp is higher than for the other regressed

thermodynamic data, higher error is expected for a second derivative of ΔG (i.e., ΔCp) than for a first derivative (i.e., ΔH°, ΔS°). More importantly, I will show that despite this uncertainty in ΔCp, significant improvements in the prediction of melting thermodynamics can be realized by using the mean ΔCp per base pair (ΔCp^bp) reported in this work.

2.2 Results and Discussion

Initial studies focused on comparing Tm values predicted by available NNT models to published data for 125 complementary DNA duplexes (44,52,174) with the aim of determining if the zero ∆Cp assumption in current NNT models results in any prediction bias

or systematic error. The duplexes included in the study have lengths ranging from 8 to 30

bases and were selected based on a number of criteria, including demonstration of two-state

melting behavior (44,52), and the fact that they were not used for regression of the reported

NNT model parameters (60,61,63-65). Also, for each sequence, the true Tm, rather than the

value of Tmax (81), was available at 1 M Na+, and therefore no salt or Tmax-to-Tm correction

was required for comparison with model predictions. For each sequence, the experimental

melting temperature (Tm(expt)) was subtracted from the predicted melting temperature (Tm) to

determine the model error (Tm(error)). Predictions from several available NNT models for short DNA duplexes were evaluated (61,64,65). The NNT model employing the unified

parameter set (60,63), previously and hereafter referred to as the unified NNT model, was

found to provide the lowest average Tm(error) and was therefore chosen as the reference (i.e., zero ∆Cp) model used for all subsequent linear NNT model predictions of melting

thermodynamics reported in this chapter.

Figure 2.1 shows a plot of Tm(error) versus Tm(expt) values for the 125 duplexes within

our study set. The Tm(expt) ranged from 33.6 to 88.6 °C and therefore provided a good test of

unified NNT model predictions at temperatures away from 50°C. The average Tm(error) for the 125 sequences was 0.7 ± 1.8°C, indicating that the unified NNT model typically provides good estimates of Tm. However, for sequences with a Tm(expt) above 70°C, the unified NNT model systematically overpredicts Tm, such that Tm(error) = +1.6 ± 1.8°C. Similar trends in Tm(error) were observed when analyzing other NNT models in the same manner, and these results suggest that model improvement could be realized by properly defining the temperature dependence of the sequence specific ∆H° and ∆S° values computed by the model and used to predict Tm. Similarly, at lower temperatures, accounting for ∆Cp might serve to improve

structural prediction algorithms for short DNA duplexes, as has recently been discussed (88).


Figure 2.1 Tm(error) values for 125 test sequences in which the predicted Tm is determined by the unified NNT model. Duplexes without 5’-ta (□) and duplexes with one or two 5’-ta (■) are shown.

2.2.1 Introduction of ΔCp into the unified NNT model improves Tm predictions

All previously published NNT parameter sets, including that used in the unified NNT model, have been determined under the assumption that ΔCp is zero for the helix-to-coil transition. An obvious benefit of this simplifying assumption is that it allows NNT parameters for ∆H°, ΔS° and ΔG° to be regressed directly from experimental data without the need to correct for differences in Tm(expt) within the data set. In principle, correcting a NNT model for a non-zero ΔCp would require (re)collecting melting thermodynamics data, including the ΔCp value, for a large set of duplex sequences to permit accurate regression of

NNT parameters following correction of the data set to a common reference temperature

(Tref). The regressed NNT parameters are then specific to Tref, and the Tm for a given sequence is predicted using Equation 2.1

$$T_m = \frac{\Delta H^\circ(T_{ref}) + \Delta C_p\,(T_m - T_{ref})}{\Delta S^\circ(T_{ref}) + \Delta C_p \ln(T_m/T_{ref}) + R\ln(C_T/4)} \tag{2.1}$$

where ΔCp is assumed to be non-zero and temperature independent. Although this approach

arguably provides the most rigorous strategy for addressing the observed bias in the unified

NNT model at high temperatures, its implementation is troublesome, as the large existing set

of thermodynamic data for duplex DNA sequences that undergo the helix-to-coil transition

according to two-state thermodynamics does not include accurate ΔCp values for the

sequences and therefore could not be utilized for parameter regression. What remains is a

rather limited set of ΔCp-inclusive melting thermodynamics data that is not sufficiently large

to permit accurate regression of the required set of new NN parameters needed to compute

∆H°(Tref) and ∆S°(Tref) in Equation 2.1.
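For reference, Equation 2.1 follows from the standard two-state expression for Tm once ∆H° and ∆S° are allowed to vary with temperature. The short derivation below assumes, as stated above, a temperature-independent ΔCp, and in addition a non-self-complementary duplex and the same sign convention as Equation 2.1.

```latex
\begin{align*}
\Delta H^\circ(T) &= \Delta H^\circ(T_{ref}) + \Delta C_p\,(T - T_{ref}) \\
\Delta S^\circ(T) &= \Delta S^\circ(T_{ref}) + \Delta C_p \ln\!\left(\frac{T}{T_{ref}}\right) \\
T_m &= \frac{\Delta H^\circ(T_m)}{\Delta S^\circ(T_m) + R\ln(C_T/4)}
     = \frac{\Delta H^\circ(T_{ref}) + \Delta C_p\,(T_m - T_{ref})}
            {\Delta S^\circ(T_{ref}) + \Delta C_p \ln(T_m/T_{ref}) + R\ln(C_T/4)}
\end{align*}
```

Because Tm appears on both sides, the expression is solved iteratively. Setting ΔCp = 0 recovers the linear form used by the unified NNT model.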

An alternative, but less rigorous approach is to introduce a non-zero ΔCp estimate into

Equation 2.1 such that it and the existing unified NNT model (and associated NN parameter

set) can then be used to more accurately predict Tm values and helix-to-coil thermodynamics for duplexes that melt at temperatures away from Tref. Existing thermodynamic data and the

many advantages of the unified NNT model are thereby fully exploited.
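A minimal Python sketch of this calculation is given below. It is an illustration under stated assumptions rather than the implementation used in this work. The sketch uses helix-to-coil (positive) values of ∆H°, ∆S° and ΔCp, as tabulated in this chapter. In that convention the concentration term enters the denominator as -R ln(CT/4), which is Equation 2.1 with numerator and denominator multiplied by -1; this reading of the sign convention is an assumption of the sketch. The function name, the fixed-point solver and the example input values are hypothetical.

```python
import math

R = 1.987  # gas constant, cal mol^-1 K^-1

def predict_tm(dh_ref, ds_ref, n_bp, ct, dcp_bp=42.0, tref_c=53.0, tol=1e-6):
    """Solve Equation 2.1 for Tm (deg C) by fixed-point iteration.

    dh_ref, ds_ref : helix-to-coil enthalpy (cal/mol) and entropy (cal/mol/K)
                     assigned to Tref; both positive in this convention
    n_bp           : number of base pairs, so that dCp = n_bp * dcp_bp
    ct             : total strand concentration (mol/L), non-self-complementary
    """
    tref = tref_c + 273.15
    dcp = n_bp * dcp_bp
    conc = R * math.log(4.0 / ct)   # equals -R ln(CT/4)
    tm = dh_ref / (ds_ref + conc)   # linear (dCp = 0) starting guess, in K
    for _ in range(100):
        dh = dh_ref + dcp * (tm - tref)
        ds = ds_ref + dcp * math.log(tm / tref)
        tm_new = dh / (ds + conc)
        if abs(tm_new - tm) < tol:
            return tm_new - 273.15
        tm = tm_new
    raise RuntimeError("Tm iteration did not converge")

# Hypothetical 20-mer with illustrative (not measured) reference values.
tm_linear = predict_tm(150000.0, 410.0, 20, 1e-6, dcp_bp=0.0)
tm_cp = predict_tm(150000.0, 410.0, 20, 1e-6)
print(f"dCp = 0: {tm_linear:.1f} deg C; dCp = 20 x 42: {tm_cp:.1f} deg C")
```

With a positive helix-to-coil ΔCp, the predicted Tm can only move downward relative to the linear estimate, and the shift grows as Tm moves away from Tref.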

As noted previously, SantaLucia, Jr. found that the unified NNT model most

accurately predicts ΔG° at temperatures near 50 °C (116), which suggests that model

estimates of ∆H° and ∆S° are also most reliable near this temperature. This result, which

supports a Tref of 50°C in Equation 2.1, is not surprising, as it is very close to the average Tm(expt) (47 ± 11 °C) for the 108 duplexes originally used to regress the unified NNT

parameter set (60).

The value of ΔCp^bp could likewise be drawn from existing literature. However, it is possible to estimate the values of ΔCp^bp and Tref using the Tm(expt) data for the 125 duplex sequences within our test set. Regression of the required ΔCp^bp and Tref parameters is then

achieved by minimizing the error of the residual (χ²) between Tm predicted by Equation 2.1 and Tm(expt)

$$\chi^2 = \sum_{i=1}^{125} \left[ T_{m,i} - T_{m(expt),i} \right]^2 \tag{2.2}$$

An average ΔCp^bp of 42 ± 16 cal mol-1 K-1 bp-1 and a Tref of 53 ± 5 °C were thereby obtained. Uncertainties in the ΔCp^bp and Tref parameters were estimated from regression analysis and correspond to the upper and lower limits that result in a 10% increase in the value of χ². Note that the regressed Tref is quite close to 50°C. Moreover, the value of ΔCp^bp falls well within the consensus range of previously reported experimental values (88), and is also in excellent agreement with the intrinsic ΔCp^bp value of 48 ± 10 cal mol-1 K-1 bp-1 that has been reported based on theoretical arguments (74).
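The two-parameter regression can be illustrated with a small Python sketch. Several points here are assumptions rather than details taken from this work: the residual is taken as an unweighted sum of squared Tm differences, an exhaustive grid search stands in for whatever optimizer was actually used, helix-to-coil (positive) signs are used so the concentration term appears as R ln(4/CT), and the six duplexes are hypothetical. The "experimental" Tm values are generated from known parameters (42 cal mol-1 K-1 bp-1 and 53 °C), so the search should simply recover them.

```python
import math

R = 1.987  # gas constant, cal mol^-1 K^-1

def predict_tm(dh_ref, ds_ref, n_bp, ct, dcp_bp, tref_c):
    """Equation 2.1 by fixed-point iteration; helix-to-coil signs, Tm in deg C."""
    tref = tref_c + 273.15
    dcp = n_bp * dcp_bp
    conc = R * math.log(4.0 / ct)
    tm = dh_ref / (ds_ref + conc)
    for _ in range(200):
        tm_new = (dh_ref + dcp * (tm - tref)) / (ds_ref + dcp * math.log(tm / tref) + conc)
        if abs(tm_new - tm) < 1e-9:
            break
        tm = tm_new
    return tm_new - 273.15

def chi_squared(duplexes, tm_expt, dcp_bp, tref_c):
    """Unweighted sum of squared residuals between predicted and experimental Tm."""
    return sum((predict_tm(dh, ds, n, ct, dcp_bp, tref_c) - t) ** 2
               for (dh, ds, n, ct), t in zip(duplexes, tm_expt))

def regress(duplexes, tm_expt, dcp_grid, tref_grid):
    """Grid search for the (dCp_bp, Tref) pair that minimizes chi-squared."""
    return min((chi_squared(duplexes, tm_expt, c, t), c, t)
               for c in dcp_grid for t in tref_grid)

# Hypothetical duplexes: (dH_ref cal/mol, dS_ref cal/mol/K, n_bp, CT mol/L).
duplexes = [(60000.0, 170.0, 8, 1e-4), (80000.0, 220.0, 11, 1e-4),
            (110000.0, 300.0, 15, 5e-5), (150000.0, 405.0, 20, 5e-5),
            (200000.0, 530.0, 26, 2e-5), (230000.0, 600.0, 30, 2e-5)]
# Synthetic "experimental" Tm values generated with known parameters.
tm_expt = [predict_tm(*d, 42.0, 53.0) for d in duplexes]

dcp_grid = [float(v) for v in range(20, 71, 2)]   # 20, 22, ..., 70
tref_grid = [float(v) for v in range(40, 66)]     # 40, 41, ..., 65
chi2_min, dcp_fit, tref_fit = regress(duplexes, tm_expt, dcp_grid, tref_grid)
print(dcp_fit, tref_fit)
```

With real, noisy data the minimum χ² is non-zero, and limits of the kind quoted above could be obtained by stepping each parameter away from its optimum until χ² exceeds 1.1 times its minimum value.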

As demonstrated in Figure 2.2, the application of Equation 2.1, where ∆H°(Tref) and

∆S°(Tref) are computed using the unified NNT model and ΔCp is given by nbp·ΔCp^bp (nbp = the number of base pairs in the duplex), results in an improvement in both the overall accuracy and precision of Tm predictions, with the average Tm(error) now decreased from 0.7 ± 1.8 °C to -0.2 ± 1.4 °C for the 125 duplexes in the test set. This improvement can be attributed to better Tm predictions for duplexes melting above 70°C, with the average Tm(error) for these duplexes improving to -0.2 ± 1.1°C, as compared to 1.6 ± 1.8°C when ΔCp is assumed to be zero. There was no significant improvement in the average Tm(error) for duplexes with a Tm(expt) between 40 and 70°C, as errors were -0.1 ± 1.6°C using Equation 2.1 and +0.1 ± 1.6°C using the existing unified NNT model.

Figure 2.2 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp. Predicted Tm is determined by Equation 2.1 with ΔCp^bp = 42 cal mol-1 K-1 bp-1 and Tref = 53 °C. Duplexes without 5’-ta (□) and duplexes with one or two 5’-ta (■) are shown.

Finally, for a number of duplex sequences that melt at temperatures at or below 40°C,

Tm is not well predicted by either method (zero or non-zero ΔCp models). For many or all of these sequences, this is likely due to significant single stranded ordering at lower temperatures that is not accounted for in the unified NN thermodynamic parameters. This phenomenon has been carefully studied by others (73,74,87,92,175) and is not the focus of our investigation.

2.2.2 Regressed ΔCp^bp and Tref values are supported by DSC data

The accuracy of our method, embodied in Equation 2.1, for predicting melting

thermodynamics of short DNA duplexes depends on the validity of three key model

assumptions: namely that (1) a Tref can be identified at which the unified NNT model

provides accurate estimates of ∆H°(Tref) and ∆S°(Tref), (2) a value of ΔCp^bp can be established and used to estimate the ΔCp required in Equation 2.1, and (3) that said value of ΔCp^bp is temperature independent over the temperature range where estimates of Tm are typically

required for primer and probe design.

To address the validity of these assumptions, DSC data were collected on 16 duplexes

at 1022 mM Na+ with Tm values ranging from 40.6 to 77.8°C (Table 2.1). DSC data were

also collected at two additional total salt concentrations of 272 mM and 72 mM Na+ for a

subset of these sequences consisting of three 12-mers with GC content of 25%, 50% or 75%.

For each sample, the ratio of the predicted enthalpy (ΔH°pred) to experimental enthalpy

(ΔH°expt) was determined and plotted against the corresponding Tm(expt) (Figure 2.3). The

results show that ΔH°pred agrees reasonably well with ∆H°expt between 50 and 60°C when

experimental errors are taken into account. The trend line suggests a Tref near 55°C is

optimal, which is in good agreement with our previously regressed value for Tref of 53 ± 5°C.

It also supports our assumption that the unified NNT model can provide reliable estimates of

∆H°(Tref) and ∆S°(Tref) at our chosen Tref of 53°C.

Table 2.1 Measured thermodynamic values from DSC analysis of short duplex DNA.

Sequence               CT     Na+    ΔCp (a)      ΔH° (a)     Tm (a)
                       (μM)   (mM)   (cal/mol K)  (kcal/mol)  (°C)
ctggagc                100    1022   418          46.2        40.6
agacctagt              100    1022   440          56.0        46.2
aaatagagaattc          75     1022   610          88.9        52.8
ccatgtccc              100    1022   509          65.9        53.4
gaaacagttaaag          75     1022   474          96.6        56.1
ttcttatagatacaag       50     1022   504          113.6       57.8
gatctgaggtact          75     1022   523          98.4        62.5
agtagtaatcacacc        50     1022   645          117.1       64.9
gcatgcccgtgacac        50     1022   650          129.0       76.2
acgcttgtaacactagt      50     1022   534          134.2       70.7
caaagttcacccaggaaca    50     1022   550          153.8       75.2
tcagatccgaggaacgtt     50     1022   657          147.4       76.3
ccgcactggacgcc         50     1022   735          122.1       77.8
aactatgaaact           75     1022   516          81.9        52.9
                              272    595          80.3        46.9
                              72     629          78.6        38.5
ctcgggaacgcc           75     1022   519          100.1       71.2
                              272    433          98.5        67.5
                              72     523          94.9        59.8
ggaacaagatgc           75     1022   604          93.8        61.3
                              272    623          93.9        56.4
                              72     663          91.0        48.3

(a) Reported values of ΔCp, ΔH° and Tm are determined from DSC data and are an average of thermodynamic parameters determined by the two-state model and by direct integration of the calorimetric data. Estimated errors for thermodynamic values are as stated in Materials and Methods.


Figure 2.3 Relationship between predicted (unified NNT model) and experimental ΔH° as a function of Tm(expt). Solid circles (●) represent 16 duplexes with DSC data collected at 1 M NaCl; open squares (□) are for duplexes with DSC data collected at various salt concentrations. The trend in the data is shown as a dashed line and demonstrates the temperature dependence of ∆H°.

In accordance with previous experimental studies (85-87,89,90,137), our data show that ΔH° is temperature dependent. DSC analysis of the 16 duplexes at 1022 mM Na+ yielded an average ΔCp^bp of 43 ± 9 cal mol-1 K-1 bp-1, while an average ΔCp^bp value of 47 ± 6 cal mol-1 K-1 bp-1 was obtained for the three duplexes studied at various salt concentrations. Both of these values are in good agreement with our average ΔCp^bp value of 42 ± 16 cal mol-1 K-1 bp-1 regressed from a much larger data set. In addition, Figure 2.4 shows, at least for the 16 sequences examined here, that there is no statistically significant temperature dependence to ΔCp^bp over the temperature range 38.5 to 77.8°C. The assumption of a temperature-independent ΔCp^bp is likely not valid for all duplexes, especially those exhibiting a Tm lower than 40°C or significant ordering (i.e., pre-organization) of the single strands, including simple single stranded base-stacking or more sequence specific structures such as hairpins or homo-dimers (92). As noted previously, the contribution of significant single stranded structure to ΔCp and other thermodynamic values such as ΔH° and ΔS° for the helix-to-coil transition has been well documented and is not generally well predicted by current NNT models (73,74,87,175).
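The per-base-pair average for the 16 duplexes measured at 1022 mM Na+ can be recomputed directly from Table 2.1. The short Python sketch below assumes that the number of base pairs equals the sequence length (a fully paired, blunt-ended duplex) and that the quoted uncertainty is a sample standard deviation; both are assumptions of the sketch. Under those assumptions it should reproduce the value of 43 ± 9 cal mol-1 K-1 bp-1 reported above.

```python
from statistics import mean, stdev

# (sequence, dCp in cal mol^-1 K^-1) for the 16 duplexes at 1022 mM Na+ (Table 2.1).
table_2_1 = [
    ("ctggagc", 418), ("agacctagt", 440), ("aaatagagaattc", 610),
    ("ccatgtccc", 509), ("gaaacagttaaag", 474), ("ttcttatagatacaag", 504),
    ("gatctgaggtact", 523), ("agtagtaatcacacc", 645), ("gcatgcccgtgacac", 650),
    ("acgcttgtaacactagt", 534), ("caaagttcacccaggaaca", 550),
    ("tcagatccgaggaacgtt", 657), ("ccgcactggacgcc", 735),
    ("aactatgaaact", 516), ("ctcgggaacgcc", 519), ("ggaacaagatgc", 604),
]

# Divide each duplex dCp by its length to obtain a per-base-pair value.
per_bp = [dcp / len(seq) for seq, dcp in table_2_1]
dcp_bp_mean = mean(per_bp)
dcp_bp_sd = stdev(per_bp)  # sample standard deviation (n - 1)
print(f"{dcp_bp_mean:.0f} +/- {dcp_bp_sd:.0f} cal mol^-1 K^-1 bp^-1")
```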

Although it has been suggested that the introduction of a temperature dependent ΔCp in the modeling of duplexes with a low Tm(expt) may improve predictions (87), there is currently insufficient data available for this concept to be properly implemented.

Figure 2.4 Measured ΔCp^bp values as a function of experimental melting temperature. Duplexes with DSC data collected at 1 M NaCl (●) and at various salt concentrations (□) are shown. The average ΔCp^bp value of 45 cal mol-1 K-1 bp-1 for the complete set of duplexes is also shown (―).

2.2.3 Duplexes terminating in a 5’-ta have statistically significant Tm(error)

The results in Figure 2.2 show that the application of Equation 2.1 and its associated

use of a non-zero ΔCp reduce the average error in predicted Tm values. However, a number of duplex sequences within the test set remain considerably more stable than predicted by

Equation 2.1, with some showing Tm(error) more negative than -2°C. A common trait for all of these

duplexes is that the sequences contain at least one terminal 5’-ta. To investigate this further,

a complete analysis of Tm(error) as a function of duplex termini chemistry was completed and

the results are shown in Table 2.2. The 125 reference duplexes were categorized by their

terminal base pairs, terminal base pair direction and terminal nearest neighbors to determine

if Tm(error) associated with a certain terminal sequence were significant. Of the sixteen

possible terminal nearest-neighbors, only duplexes ending in a NN containing a 5’-ta had an

average prediction error (Tm(error) = -1.8 ± 1.2°C) greater in magnitude than the standard deviation of

±1.4°C for the entire set of 125 duplexes, indicating that the presence of a terminal 5’-ta

sequence has a statistically significant stabilizing effect that is not accounted for in current

NNT models.

Table 2.2 Average Tm(error) (°C) values associated with terminal base pairs and terminal nearest-neighbors.

Terminal Base Pair    Terminal Base Pair Direction    Terminal NN
a•t   -0.4 ± 1.5      5’-a•t-3’    0.2 ± 1.4          5’-aa•tt-3’   -0.8 ± 0.7
                                                      5’-ac•gt-3’    0.7 ± 1.2
                                                      5’-ag•ct-3’    0.5 ± 1.7
                                                      5’-at•at-3’   -0.7 ± 1.2
                      5’-t•a-3’   -0.7 ± 1.6          5’-ta•ta-3’   -1.8 ± 1.2
                                                      5’-tc•ga-3’    0.3 ± 0.8
                                                      5’-tg•ca-3’    0.4 ± 1.2
                                                      5’-tt•aa-3’    0.0 ± 1.4
g•c    0.1 ± 1.4      5’-c•g-3’    0.2 ± 1.3          5’-ca•tg-3’    0.2 ± 1.1
                                                      5’-cc•gg-3’    0.2 ± 0.9
                                                      5’-cg•cg-3’    0.8 ± 1.2
                                                      5’-ct•ag-3’   -0.8 ± 1.3
                      5’-g•c-3’    0.0 ± 1.4          5’-ga•tc-3’   -0.1 ± 1.7
                                                      5’-gc•gc-3’    0.3 ± 1.3
                                                      5’-gg•cc-3’    0.7 ± 1.0
                                                      5’-gt•ac-3’   -0.6 ± 1.2

Since the library of published thermodynamic data used to generate the errors

reported in Table 2.2 was not designed to isolate the contribution of the termini to duplex stability, a new library of short DNA sequences specifically designed to more cleanly segregate the contributions to duplex stability of internal versus terminal nearest neighbors

was created and then characterized by DSC. Table 2.3 lists the twelve 11-mer duplexes

designed to contain all ten internal nearest-neighbor base pairs represented equally while

varying the terminal base pairs and terminal nearest-neighbors. Melting thermodynamic data

collected by DSC are also reported for this library and were used to determine the

thermodynamic contributions of different terminal groups. For all sequences in the library,

there is good agreement in thermodynamic parameters determined by the two-state model

and by direct integration of the calorimetric data. Agreement between these two methods of

data analysis is a necessary but not sufficient condition to indicate that the melting transition

obeys two-state thermodynamics (176). The average ΔCp for the twelve 11-mer duplexes was determined to be 378 ± 125 cal K-1 mol-1 or 34 ± 11 cal mol-1 K-1 bp-1, which is consistent with our previously regressed value of ΔCp^bp.

Table 2.3 Measured thermodynamic data for helix to coil transition of 11-mer DNA duplexes used to study duplex end effects.

                                  Two State Model (a)                                      Calorimetric (a)
Name     Sequence      ΔCp          ΔH°2-st      ΔS°2-st       Tm^2-st       ΔH°cal       ΔS°cal         Tm^cal
                       (cal/mol K)  (kcal/mol)   (cal/mol K)   (°C)          (kcal/mol)   (cal/mol K)    (°C)
C1 (b)   ctacgcattcc   448 ± 147    82.5 ± 2.5   226.5 ± 7.3   59.4 ± 0.5    84.5 ± 4.6   232.3 ± 13.3   59.7 ± 0.5
C2       ctaacggatgc   328          83.8         230.2         59.6          85.3         234.4          59.9
C3       ctattggcgac   386          82.3         225.0         60.5          83.8         229.5          60.5
C4       cgtattcaggc   403          87.9         243.4         58.7          88.6         245.0          59.0
C5       caatacgcctc   403          78.8         215.7         58.8          77.7         212.2          59.2
Average (C1 to C5)     394          83.1         228.2         59.4          84.0         230.7          59.6
T1 (b)   ttcatagccgt   356 ± 74     79.7 ± 1.3   218.1 ± 4.0   59.4 ± 0.1    80.4 ± 1.2   219.9 ± 3.6    59.6 ± 0.1
T2       ttccgtagcat   280          74.6         202.9         59.2          74.9         203.6          59.4
T3       tgcggataagt   308          79.7         217.9         59.6          79.7         217.7          59.7
T4       tcggctattgt   367          78.2         214.5         58.2          77.6         212.4          58.3
Average (T1 to T4)     328          78.1         213.3         59.1          78.1         213.4          59.3
T5       tactccgcatt   478          73.7         199.4         60.5          74.0         199.9          60.9
T6       tagaccgcaat   320          74.4         200.8         61.3          74.1         199.8          61.4
T7       tatcgttgcct   463          74.1         199.8         61.4          74.0         199.4          61.8
Average (T5 to T7)     421          74.1         200.0         61.1          74.0         199.7          61.4
Average (T1 to T7)     368          76.4         207.6         59.9          76.4         207.5          60.2

(a) DSC data were collected at CT = 75 μM for all duplexes in Table 2.3.
(b) Errors for C1 and T1 determined from triplicate repeats. Estimated errors for other duplexes as stated in experimental section.

The sequences in Table 2.3 have been segregated such that those in a given cluster

(e.g. C1 to C5, or T1 to T7) are predicted by the unified NNT model to have identical

melting thermodynamics since they share the exact same composition of nearest neighbors

and terminal base pairs. For each cluster, the average Tm agrees reasonably well with the

average Tm(expt). Average sequence dependent trends in the experimental data are also

reasonably well predicted by the unified NNT model. For example, 11-mer sequences with

terminal g•c base pairs on average hybridize with a greater enthalpic driving force (∆∆H° = -5.4 ± 2.5 kcal mol-1) and more entropy loss (∆∆S° = -16.1 ± 6.9 cal mol-1 K-1) when compared to those sequences terminating in a•t base pairs but not containing a terminal 5’-ta sequence. Within experimental error, the unified NNT model captures this, predicting that ∆∆H° = -4.4 kcal mol-1 and ∆∆S° = -13.8 cal mol-1 K-1.

Several insights can be drawn from comparing model predictions to the acquired thermodynamic data, but among the most important for the purposes of this study is the confirmation that 5’-ta termini provide a duplex stabilizing effect that is not accounted for in the unified NNT model (60,63) or in other NNT models (61,64,65). In particular, 11-mer sequences in the T1 to T7 cluster having a single 5’-ta terminus (T5 to T7) exhibit a Tm(expt) that is on average 2.0 ± 0.5°C higher than the average Tm(expt) for the four sequences devoid

of a 5’-ta terminus. This increase in Tm is driven by an average ∆∆S° of 13.5 ± 6.4 cal mol-1 K-1 that serves to reduce the entropic penalty accompanying duplex formation. That entropy gain is only partially compensated by a weaker enthalpy of interaction (∆∆H° = 4.0 ± 2.3 kcal mol-1) between the complementary bases of the 5’-ta terminus.

It is interesting to note that although terminal 5’-ta NN sequences stabilize the

duplex, internal ta NN sequences are predicted by the unified NNT model (63) and other

NNT models (61,64,65) to provide the lowest contribution to duplex stability of all 10 possible unique NN sequences. This indicates that the base-pairing and base-stacking interactions of a ta NN sequence within a duplex are relatively poor. However, when a 5’-ta

NN sequence is placed at a terminal position, that same weakness in base-pairing and base-stacking energy provides for greater configurational freedom (entropy) at the 5’-ta terminus, resulting in a net stabilization of the duplex.

2.2.4 Correcting Tm predictions for duplexes containing 5’-ta type termini

The proposed model, embodied in Equation 2.1, for predicting helix-to-coil transition thermodynamics and melting temperatures for short duplex DNA requires values for

∆H°(Tref) and ∆S°(Tref). These were computed using the unified NNT model. In that model,

∆H° is computed using 10 NN parameters (∆Hi where i represents the type of NN) and an additional 2 "chain-initiation/end-effect" parameters (∆Hj^init where j represents the type of end, g•c or a•t) that account for the terminal base pairs as well as duplex initiation (60,63,116). Likewise, a set of 10 NN parameters (∆Si), 2 end-effect parameters (∆Sj^init), and one theoretically derived chain symmetry parameter (∆S^sym) are used to compute ∆S°. Given

the findings reported here that terminal 5’-ta groups are thermodynamically distinct from

internal ta NN sequences, enthalpic and entropic corrections, denoted ∆∆H5’-ta and ∆∆S5’-ta respectively, were determined to account for the excess duplex stabilizing effect of placing a

5’-ta type NN at a strand terminus. Values of ∆H°(Tref) and ∆S°(Tref) for duplexes with

terminal a•t base pairs, either with or without terminal 5’-ta groups, were computed using the

two-state as well as the calorimetric values of ∆H° and ∆S° at Tm (Table 2.3) and the regressed ΔCp^bp value of 42 cal K-1 mol-1 bp-1, which gives a ΔCp for each 11-mer duplex of 462 cal K-1 mol-1. These results were then used to regress the ∆∆H5’-ta and ∆∆S5’-ta correction parameters for a 5’-ta type terminus by subtracting the average ∆H°(Tref) for duplexes with terminal 5’-ta

groups from that for duplexes with terminal a•t base pairs but no terminal 5’-ta sequence. A

similar calculation was completed with the entropy data, and the resulting values of ∆∆H5’-ta and ∆∆S5’-ta are reported in Table 2.4.
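This calculation can be retraced from the two-state columns of Table 2.3. The Python sketch below is an illustration under stated assumptions: each duplex is shifted from its own Tm to Tref = 53 °C using ΔCp = 462 cal K-1 mol-1, helix-to-coil (positive) values are used throughout, and the difference is taken as the mean for duplexes with a 5’-ta terminus (T5 to T7) minus the mean for those without (T1 to T4). That ordering is chosen here because it yields the sign of the tabulated helix-to-coil parameters. Under these assumptions the result should fall close to the two-state entries of Table 2.4.

```python
import math

TREF_K = 53.0 + 273.15
DCP = 11 * 42.0  # cal mol^-1 K^-1 for an 11-mer (462)

# Two-state values from Table 2.3: (dH kcal/mol, dS cal/mol/K, Tm deg C).
no_ta = [(79.7, 218.1, 59.4), (74.6, 202.9, 59.2),
         (79.7, 217.9, 59.6), (78.2, 214.5, 58.2)]     # T1 to T4
with_ta = [(73.7, 199.4, 60.5), (74.4, 200.8, 61.3),
           (74.1, 199.8, 61.4)]                        # T5 to T7

def to_tref(dh_kcal, ds, tm_c):
    """Shift dH and dS from Tm to Tref assuming a constant dCp."""
    tm_k = tm_c + 273.15
    dh_ref = dh_kcal - DCP * (tm_k - TREF_K) / 1000.0
    ds_ref = ds - DCP * math.log(tm_k / TREF_K)
    return dh_ref, ds_ref

def cluster_mean(rows):
    """Mean dH(Tref) and dS(Tref) over a cluster of duplexes."""
    shifted = [to_tref(*row) for row in rows]
    n = len(shifted)
    return sum(h for h, _ in shifted) / n, sum(s for _, s in shifted) / n

dh_ta, ds_ta = cluster_mean(with_ta)
dh_no, ds_no = cluster_mean(no_ta)
ddh_5ta = dh_ta - dh_no  # kcal/mol
dds_5ta = ds_ta - ds_no  # cal/mol/K
print(f"ddH(5'-ta) = {ddh_5ta:.1f} kcal/mol, ddS(5'-ta) = {dds_5ta:.1f} cal/mol/K")
```

The same two functions can be applied to the calorimetric columns of Table 2.3 by swapping in those values.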

Table 2.4 Thermodynamic parameters for 5'-ta terminal NN at Tref = 53 °C.

Method          ∆∆H5’-ta (a)    ∆∆S5’-ta (a)
                (kcal/mol)      (cal/mol K)
Two State       -4.9            -16.1
Calorimetric    -5.0            -16.6
Average         -5.0            -16.4

(a) Reported parameters are for helix-to-coil transition and used in conjunction with Table 1.1 in Equations 2.3 and 2.4.

For duplex sequences possessing a 5’-ta group at one or both termini, ∆H°(Tref) and

∆S°(Tref) are then computed as

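$$\Delta H^\circ(T_{ref}) = \sum_{i=1}^{10} n_i\,\Delta H_i + \sum_{j} m_j\,\Delta H_j^{init} + l\,\Delta\Delta H_{5'\text{-}ta} \tag{2.3}$$

and

$$\Delta S^\circ(T_{ref}) = \sum_{i=1}^{10} n_i\,\Delta S_i + \sum_{j} m_j\,\Delta S_j^{init} + l\,\Delta\Delta S_{5'\text{-}ta} + \Delta S^{sym} \tag{2.4}$$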

where ni is the number of NN of type i, mj is the number of terminal base pairs of type j, and l is the number of 5’-ta sequences.
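Equations 2.3 and 2.4 can be sketched in Python as follows. Several inputs are assumptions and should be checked against Table 1.1, which is not reproduced in this section: the NN, initiation and symmetry values are entered from the published unified parameter set (60,63) with signs reversed to the helix-to-coil direction, and the 5’-ta correction uses the average row of Table 2.4. The rule used to count 5’-ta termini (the listed strand begins or ends with "ta", so that one strand of the duplex reads 5’-ta at that end) is likewise an interpretation, and the function names are hypothetical. As a consistency check, these values reproduce the 4.4 kcal mol-1 and 13.8 cal mol-1 K-1 differences (in magnitude) between g•c- and a•t-terminated 11-mers quoted in Section 2.2.3.

```python
# Helix-to-coil NN parameters: (dH kcal/mol, dS cal/mol/K). Assumed values; see lead-in.
NN = {"aa": (7.9, 22.2), "at": (7.2, 20.4), "ta": (7.2, 21.3), "ca": (8.5, 22.7),
      "gt": (8.4, 22.4), "ct": (7.8, 21.0), "ga": (8.2, 22.2), "cg": (10.6, 27.2),
      "gc": (9.8, 24.4), "gg": (8.0, 19.9)}
INIT = {"gc": (-0.1, 2.8), "at": (-2.3, -4.1)}  # per terminal base pair
SYM_DS = 1.4                                    # self-complementary duplexes only
DD_5TA = (-5.0, -16.4)                          # Table 2.4, average row
COMP = str.maketrans("acgt", "tgca")

def revcomp(seq):
    """Reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMP)[::-1]

def count_5ta(seq):
    """Number of duplex termini at which one strand reads 5'-ta (0, 1 or 2)."""
    return int(seq.startswith("ta")) + int(seq.endswith("ta"))

def reference_thermo(seq):
    """Equations 2.3 and 2.4: dH(Tref) in kcal/mol and dS(Tref) in cal/mol/K."""
    dh = ds = 0.0
    for k in range(len(seq) - 1):                 # sum over nearest neighbors
        pair = seq[k:k + 2]
        h, s = NN[pair] if pair in NN else NN[revcomp(pair)]
        dh, ds = dh + h, ds + s
    for base in (seq[0], seq[-1]):                # two terminal base pairs
        h, s = INIT["gc" if base in "gc" else "at"]
        dh, ds = dh + h, ds + s
    n_ta = count_5ta(seq)                         # l in Equations 2.3 and 2.4
    dh, ds = dh + n_ta * DD_5TA[0], ds + n_ta * DD_5TA[1]
    if seq == revcomp(seq):                       # symmetry term
        ds += SYM_DS
    return dh, ds

for name, seq in [("C1", "ctacgcattcc"), ("T1", "ttcatagccgt"), ("T5", "tactccgcatt")]:
    dh, ds = reference_thermo(seq)
    print(name, count_5ta(seq), round(dh, 1), round(ds, 1))
```

The returned values (converted to cal/mol for ∆H°) are the ∆H°(Tref) and ∆S°(Tref) inputs required by Equation 2.1.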

Equations 2.1, 2.3 and 2.4 were used to recalculate Tm and Tm(error) for all duplexes

with one or two 5’-ta sequences. As shown in Figure 2.5, a substantial improvement in

model predictions is observed for all such duplexes, with the average Tm(error) decreasing from

-1.8 ± 1.2°C to -0.6 ± 1.5°C, an average error that is similar to those found for all other

terminal NN sequences (Table 2.2). Comparison of Tm(error) in Figures 2.2 and 2.5 shows that

this method substantially eliminates the bias in predicted Tm values.


Figure 2.5 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp and parameters for sequences with terminal 5’-ta. Predicted Tm is determined by Equation 2.1 with ΔCp^bp = 42 cal mol-1 K-1 bp-1 and Tref = 53 °C, with the correction for 5’-ta determined using Equations 2.3 and 2.4 and parameters from Table 2.4. Duplexes without 5’-ta (□) and duplexes with one or two 5’-ta (■) are shown.

2.3 Conclusions

Previous calorimetry studies show that ΔCp exhibits positive, non-zero values

(73,74,85-90,103,136). Here, published and new melting thermodynamic data are collected and used to first show how the simplifying assumption in current NNT models that ∆H° and

∆S° are temperature independent biases predicted Tm values away from Tm(expt), and to then

regress an average value for ΔCp^bp of 42 cal mol-1 K-1 bp-1 that can be used to more accurately compute the temperature dependence of ∆H° and ∆S°. These results were

combined with the unified NNT model to establish a method, embodied in Equations 2.1, 2.3

and 2.4, that more accurately predicts across a broad temperature range the sequence

dependent Tm and associated melting thermodynamics (∆H°, ∆S° and ∆G°) of short

complementary duplex DNA. The model exploits the ability of the unified NNT model to

provide good estimates of the melting enthalpy and entropy at 53°C, which was found in our

work to be the optimal Tref for model calculations. Key model assumptions were

experimentally validated, including the finding that ΔCp^bp is, to a first approximation,

invariant over the temperature range within which the Tms of probes and primers are typically

designed.

Results from using the new method to reanalyze the melting thermodynamics for a

large library of duplex DNA sequences suggested a significant stabilization of duplexes

containing one or more 5’-ta termini. DSC data were therefore acquired and analyzed for

duplexes designed to specifically isolate terminal NN effects. The results revealed that a

terminal 5’-ta provides excess stability to the duplex through an entropy increase

that is likely related to duplex fraying and is only partially compensated by the expected

enthalpy gain accompanying that fraying process. The data were also used to determine two

parameters, ∆∆H5’-ta and ∆∆S5’-ta, that can be incorporated into the model to improve Tm predictions for duplex sequences with 5’-ta.

Chapter 3: The Role of the Heat Capacity Change in Understanding and

Modeling Melting Thermodynamics of Complementary Duplexes

Containing Standard and Nucleobase Modified LNA

Chapter 2 and the associated publication (177) demonstrated that accounting for a non-zero ΔCp within NNT models can improve predictions of helix-to-coil transition

thermodynamics over a wide range of melting temperatures for complementary duplexes

comprised of short DNA oligonucleotides. Here, this concept is extended to short duplexes

containing standard and nucleobase modified LNA(s) with the aim of both improving our understanding of the thermodynamic origin of the stability enhancement provided by LNA nucleotides and creating a new model for predicting the stabilities of such duplexes.

Differential scanning calorimetry (DSC) is used to measure helix-to-coil transition thermodynamics for a library of unmodified and LNA-substituted duplexes. The data show

that ΔCp per base pair (∆Cp^bp) is non-zero, positive and, within the accuracy of the

experiment, unchanged by LNA substitution(s). A simple 4-parameter nonlinear NNT

model, the “single-base thermodynamic” (SBT) model, is derived based on the non-zero

∆Cp^bp and shown to predict ΔTm (= Tm(LNA) - Tm(DNA)) with an accuracy and precision similar

to the previously described LNA nearest-neighbor thermodynamic model (44) requiring a

total of 64 parameters. The regressed single base (SB) parameters are entropy related and

offer insight into the mechanism by which the four standard LNA nucleobases provide their

stability enhancements. The model further distinguishes itself through its ability to also

accurately predict ∆H° and ∆S° for the melting of LNA-containing duplexes, both of which

are poorly predicted by the previously reported model (39). Moreover, the SBT model,

which works at the individual base pair level, can be used to predict ΔTm for both mixmers

and “gapmers”, which are duplex sequences containing one or more neighboring (adjacent)

LNA•DNA base pairs.

DSC-derived melting thermodynamics data and SBT model parameters are also

reported for the modified nucleotides LNA-2-aminoadenine (D) and LNA-2-thiothymine (H)

(178-180). Both show that duplexes substituted with D and/or H bases are more stable than

their isosequential duplexes bearing the LNAs A and/or T, respectively. The D•H base pair

(Figure 3.1) is shown to be pseudo-complementary, a property that can be exploited in the functional design of oligonucleotides to minimize the formation of undesired stable secondary structures (181-183).

Figure 3.1 Structure of LNA-2-aminoadenine and LNA-2-thiothymine. Base-pairing configuration is shown to demonstrate steric hindrance (orange) between the 2-amino (blue) and 2-thioketo (green) groups of the respective bases. The 2’-O,4’-C-methylene bridge of LNA is also identified (pink).

3.1 Materials and Methods

3.1.1 Sequence design

Short complementary duplexes containing A and/or T substitutions were designed

from 12 duplex DNA reference sequences known to possess a two-state melting transition

from repeated DSC scans. From these 12 reference sequences a total of 20 duplexes with

one, two or three A and/or T substitutions were designed such that all 32 possible tri-

nucleotide sequences 5’-x-M-y-3’ (where x and y = a, t, g or c and M = A or T) were represented (Table 3.1). The modified ss within each duplex was designed as a mixmer with each LNA separated from any other LNA, as well as from the 5’ and 3’ termini, by a minimum of two DNA nucleotides. Each modified ss was also designed to preclude formation of stable single stranded structures (e.g., hairpins and homo-dimers), as well as undesired bimolecular products such as slipped duplexes with the complementary DNA strand. A second set of sequences used to analyze stability enhancements provided by the two base-modified LNAs (D and H) was then created by substituting A with D and T with H within the same set of duplexes (Table 3.2).

This basic design strategy was also used to study duplexes with G and/or C substitutions, where the set of LNA substituted duplexes (Table 3.1) were derived from 14

DNA reference duplexes. Thus, results for each LNA containing duplex could be directly compared to results for its corresponding isosequential DNA duplex. As before, all 32 possible trinucleotide 5’-x-M-y-3’ sequences containing a G or C central substitution were represented. However, two of these duplexes failed to show the required two-state melting behavior, resulting in 3 trinucleotide sequences (5’-c-G-a-3’, 5’-c-G-t-3’ and 5’-g-C-a-3’)

being excluded from our analysis.

Finally, the reference duplex sequences that were modified to study terminal and

tandem (gapmer-type) LNA substitutions were 11-mers that contain an equal representation

of all 10 possible DNA nearest neighbors. Single terminal LNA substitutions at either the 5’

or 3’ end were made to a total of 3 reference duplexes. All 4 standard LNA bases were tested

at both positions. Likewise, 16 tandem LNA sequences were created from three reference

duplexes. To avoid any end effects, all tandem LNA substitutions were placed a minimum of

two nucleotides from the 5’ or 3’ end.

3.1.2 Oligonucleotide synthesis and purification

Oligonucleotides containing D and/or H substitutions were synthesized and then

HPLC purified by Exiqon Inc. (Vedbæk, Denmark). All other DNA and LNA containing

oligonucleotides used in this study were obtained from Integrated DNA Technologies

(Coralville, IA). All oligonucleotides were resuspended in buffer containing 1 M NaCl, 10

mM Na2HPO4, 1 mM Na2EDTA at pH = 7.0 and quantified by UV spectrophotometer (Cary

1E) at 80 °C using extinction coefficients provided by the supplier.

3.1.3 Differential scanning calorimetry

DSC was used to collect excess heat capacity (ΔCp^ex) versus temperature (T) data for

the helix-to-coil transition. All experiments were performed on a Microcal (Northampton,

MA) VP-DSC, with a nominal cell volume of 0.52811 ml. Data were collected by scanning

from 1 to 100 °C at a rate of 1 °C min-1. Both the sample and reference cells were initially

filled with buffer solution and DSC data were collected for a minimum of two cycles to

establish baseline buffer scans. Samples with total strand concentrations (CT) of 50, 75 or

100 μM containing equimolar amounts of two complementary strands were then dynamically

loaded during the down scan part of the cycle and data were collected for a minimum of two

cycles. Final ΔCp^ex versus T profiles used for data analysis were obtained by subtracting the

average baseline scans from those obtained for duplex melting after normalizing against the

concentration, corrected for strand purities. These corrected thermograms were then used to

determine ΔCp as well as two-state (ΔH°2-st, ΔS°2-st and Tm^2-st) and calorimetric (ΔH°cal, ΔS°cal

and Tm^cal) data as described previously in Chapter 1 and in (177). Thermodynamic data reported throughout this paper are the average of two-state (model dependent) and

calorimetric (model independent) values.

3.1.4 UV spectroscopy

UV spectroscopy was also used to collect helix-to-coil transition data using a Varian

(Santa Clara, CA) Cary 1E spectrophotometer equipped with a 12-cell Peltier temperature

controller and sample temperature probes. All UV-monitored melt (UVM) experiments were

conducted at 260 nm by scanning from 10 to 98 °C at a scan rate of 0.5 °C min-1 to obtain absorbance (A260) versus T profiles. Cuvettes with path lengths of 1 mm or 10 mm were

loaded with buffer or duplex samples with CT = 75 or 7.5 μM respectively, and caps were

fitted to prevent evaporation during heating. UVM data were analyzed using a two-state

thermodynamic model that includes contributions from a non-zero ΔCp as previously

described (136). Raw UVM curves were transformed to compute the fraction of strands in

the random-coil state (θ) by fitting the pre-transition and post-transition baselines to

$$A_{260}(T) = \left(m_{pre}^{UVM}\,T + b_{pre}^{UVM}\right)(1-\theta) + \left(m_{post}^{UVM}\,T + b_{post}^{UVM}\right)\theta \tag{3.1}$$

where m_pre^UVM and b_pre^UVM, and m_post^UVM and b_post^UVM, are the slopes and intercepts of the pre

and post transition baselines, respectively. The enthalpy ΔH°UVM and entropy ΔS°UVM changes at the measured Tm for the helix-to-coil transition were then determined by fitting the fractional curve to Equations 3.2 and 3.3

$$\theta(T) = \frac{-K(T) + \sqrt{K^{2}(T) + 2K(T)\,C_T}}{C_T} \tag{3.2}$$

$$K(T) = \exp\left[-\left(\frac{\Delta H^{\circ}_{UVM} + \Delta C_p\,(T - T_m)}{RT}\right) + \frac{\Delta S^{\circ}_{UVM} + \Delta C_p \ln(T/T_m)}{R}\right] \tag{3.3}$$

where K(T) is the temperature-dependent equilibrium constant for the dissociation (helix-to-

coil) reaction. The ΔCp required in Equations 3.2 and 3.3 was computed from the strand

length and the average ΔCp^bp value of 42 cal mol-1 K-1 bp-1. Tmax was used as an estimate of

Tm in Equation 3.3 during the first solution iteration. After solving for ΔH°UVM(Tm) and

ΔS°UVM(Tm), a new value of Tm is then determined from Equation 3.4 written for the helix-to-

coil transition

$$T_m = \frac{\Delta H^{\circ}_{UVM}(T_m)}{\Delta S^{\circ}_{UVM}(T_m) - R\ln(C_T/4)} \tag{3.4}$$

and the solution iteration is continued with Equations 3.2 and 3.3 until a best fit with the

experimental data is obtained using Microsoft Excel™ Solver.
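The forward model behind this fit can be sketched in a few lines of Python. The code below only evaluates Equations 3.2 to 3.4 for given ΔH°UVM, ΔS°UVM and ΔCp; it does not reproduce the Excel Solver regression, and the function names and the value used for R are assumptions made for illustration. Equation 3.2 is written in the algebraically equivalent form 2K/(K + sqrt(K^2 + 2K·CT)), which avoids subtracting two nearly equal numbers when K is large.

```python
import math

R = 1.987        # gas constant, cal mol-1 K-1 (assumed value)
DCP_PER_BP = 42  # average heat capacity change per base pair, cal mol-1 K-1 bp-1


def equilibrium_constant(t, dh_tm, ds_tm, dcp, tm):
    """Equation 3.3: dissociation constant at temperature t (K).

    dh_tm (cal/mol) and ds_tm (cal/mol/K) are the values at tm (K).
    """
    dh_t = dh_tm + dcp * (t - tm)
    ds_t = ds_tm + dcp * math.log(t / tm)
    return math.exp(-dh_t / (R * t) + ds_t / R)


def fraction_coil(t, dh_tm, ds_tm, dcp, tm, ct):
    """Equation 3.2 in a numerically stable, equivalent form.

    ct is the total strand concentration in mol/L.
    """
    k = equilibrium_constant(t, dh_tm, ds_tm, dcp, tm)
    return 2.0 * k / (k + math.sqrt(k * k + 2.0 * k * ct))


def melting_temperature(dh_tm, ds_tm, ct):
    """Equation 3.4: Tm (K) for a non-self-complementary duplex."""
    return dh_tm / (ds_tm - R * math.log(ct / 4.0))
```

For example, with the rounded Table 3.1 entries for UD1 (ΔH° = 57600 cal mol-1, ΔS° = 154 cal mol-1 K-1, CT = 100 μM, 8 base pairs so ΔCp = 8 × 42 cal mol-1 K-1), `melting_temperature` returns a value within about 1 °C of the tabulated Tm, and `fraction_coil` evaluated at that temperature is 0.5 by construction.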

3.1.5 Error analysis

To establish the accuracy of the DSC experiment, repeated melting transition studies

were conducted for several duplexes to obtain three to six complete and independent data sets

for each duplex. From these data, the average errors in the regressed thermodynamic

parameters ΔCp, ΔH°, ΔS° and Tm were determined to be 25%, 3%, 3% and 0.5 °C

respectively. All UVM studies were performed in triplicate and the combined data also used

to estimate errors in reported thermodynamic data.

3.1.6 Regression of SBT model parameters

Incremental thermodynamic parameters (∆∆Hi° and ∆∆Si°) within the SBT model for

each possible LNA substitution i (where i = A, T, G, C, D or H) were determined through

global regression to the set of incremental thermodynamic data (63 modified sequences

identified as j = 1 to 63) reported in Tables 3.1 and 3.2. For example, the set of incremental

enthalpy parameters ∆∆Hi° was computed by minimizing the error weighted squares of the

residual (χ2) as shown in Equations 3.5 and 3.6

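$$\chi^{2} = \sum_{j} \frac{\left(\Delta\Delta H^{\circ}_{j(pred)} - \Delta\Delta H^{\circ}_{j}\right)^{2}}{\sigma_j^{2}} \tag{3.5}$$

$$\Delta\Delta H^{\circ}_{j(pred)} = \sum_{i} n_i\,\Delta\Delta H^{\circ}_{i} \tag{3.6}$$

where ΔΔH°j(pred) and ΔΔH°j represent the model predicted and experimental values, respectively, and ni is the number of LNA substitutions of type i within duplex sequence j.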

The same approach was used to determine ∆∆Si° and ∆∆Gi°37 parameters for each possible

base substitution. Parameter regression was completed using the program Microsoft Excel™

with the add-in software XLSTAT™ v. 2009.4.03.
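Because Equation 3.6 is linear in the parameters, minimizing Equation 3.5 is a weighted linear least-squares problem. The sketch below solves the corresponding normal equations in plain Python; it is an illustrative stand-in, not the XLSTAT procedure used in this work. The function names are hypothetical, and the convention that upper-case letters mark LNA substitutions follows the sequence notation of Tables 3.1 and 3.2.

```python
def count_lna(seq, types):
    """n_i for one duplex: LNA bases are the upper-case letters in seq."""
    return [seq.count(base) for base in types]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("a substitution type is not constrained by the data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_increments(seqs, observed, sigmas, types="ATGC"):
    """Minimize chi^2 (Eq 3.5) for the linear model of Eq 3.6.

    Returns a dict mapping each LNA type to its regressed increment.
    """
    rows = [count_lna(s, types) for s in seqs]
    k = len(types)
    # Normal equations (N^T W N) p = N^T W y with W = diag(1 / sigma_j^2).
    lhs = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for n_j, y_j, s_j in zip(rows, observed, sigmas):
        w = 1.0 / (s_j * s_j)
        for a in range(k):
            rhs[a] += w * n_j[a] * y_j
            for b in range(k):
                lhs[a][b] += w * n_j[a] * n_j[b]
    return dict(zip(types, solve_linear(lhs, rhs)))
```

Calling `fit_increments(seqs, ddH_values, sigmas)` with the modified sequences and their incremental enthalpies returns one ∆∆Hi° per LNA type; the same call with ∆∆S° or ∆∆G°37 data gives the other parameter sets. Every type listed in `types` must occur in at least one sequence, otherwise the normal equations are singular.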

3.1.7 Model predicted ΔTm values for LNA substituted duplexes

Model predicted ΔTm values for LNA substituted duplexes were determined and used

to compare and evaluate the performance of the SBT model relative to other available

methods. For the “Oligodesign” method (45), the web-based algorithm accessed at http://lna-tm.com was used to predict ΔTm (=Tm(LNA) - Tm(DNA)) values, hereafter denoted ∆Tm(pred) to explicitly recognize the value is model derived, by computing the Tm of both the LNA

substituted duplex (Tm(LNA)) and the isosequential DNA reference duplex (Tm(DNA)) with all input fields including the total strand concentration (CT) set to match experimental

conditions. ΔTm(pred) values predicted using the LNA NNT model of McTigue et al. were obtained as described in (44).

3.2 Results and Discussion

The central goal of this work was to conduct UVM and high-sensitivity DSC

experiments that serve to establish an improved understanding of the thermodynamic basis

for the enhanced duplex stability provided by replacement of a nucleotide with its LNA

analogue, and likewise provide the first detailed study of the thermodynamic properties of

oligodeoxynucleotide duplexes incorporating the modified nucleobases LNA-2-

aminoadenine (D) and LNA-2-thiothymine (H). Note that the thermodynamic data reported

here (typically, Tm, ∆G°37, ∆H°, ∆S°, and ∆Cp values) are for the helix-to-coil transition and

thus, the signs of data for thermodynamic changes are opposite to those in some previously

published literature on DNA and LNA thermodynamics (44,63). These data are used to

compute incremental thermodynamic changes (∆∆G°37, ∆∆H°, and ∆∆S°) at the chosen

reference temperature, Tm(DNA), the melting temperature of the corresponding isosequential

DNA duplex at the same solution conditions (P = 1 atm, same CT, 1 M NaCl, 10 mM

Na2HPO4, 1 mM Na2EDTA at pH = 7.0).

Table 3.1 DSC determined thermodynamic parameters for duplexes between DNA oligonucleotides with and without A, T, G and/or C substitutions.

Units: CT in μM; ∆Cp, ∆S° and ∆∆S° in cal mol-1 K-1; ∆G°37, ∆H°, ∆∆G°37 and ∆∆H° in kcal mol-1; Tm and ∆Tm in °C.

Name   Sequence          CT   ∆Cp  ∆G°37  ∆H°    ∆S°  Tm    ∆∆G°37  ∆∆H°  ∆∆S°   ∆Tm
UD1    gcccagcg          100  557  10.0   57.6   154  56.9
UL1a   gcccAgcg          100  556  10.5   58.9   156  59.3  0.4     -0.1  -1.6   2.4
UL1b   cgcTgggc          100  439  10.9   59.4   157  61.4  0.7     -0.2  -2.9   4.5
UL1c   gccCagcg          100  466  11.2   62.0   164  62.4  1.0     1.8   2.5    5.5
UL1d   cgCtgGgc          100  394  13.3   65.8   169  72.8  2.8     1.9   -2.8   15.9
UD2    agacctagt         100  440  8.1    56.0   154  46.2
UL2    agAcctAgt         100  445  10.2   61.1   164  56.7  1.8     0.4   -4.7   10.5
UD3    ccatgtccc         100  509  9.8    65.9   181  53.4
UL3    ccatgTccc         100  351  11.1   67.6   182  59.6  1.1     -0.5  -5.1   6.1
UD4*   tgcacgcta         100  538  10.0   55.0   145  58.2
UL4*   tgcAcgcta         100  393  10.3   55.2   145  60.0  0.3     -0.5  -2.3   1.9
UD5    atgctcatgc        100  472  10.5   65.0   176  57.1
UL5a   atgctcAtgc        100  445  11.1   67.8   183  59.3  0.5     1.8   4.0    2.2
UL5b   atgcTcatgc        100  472  11.2   65.8   176  60.7  0.6     -1.0  -5.1   3.7
UD6    ggcacgcttcg       75   469  14.8   88.4   237  68.2
UL6a   cgAagcgTgcc       75   499  16.8   92.7   245  75.0  1.7     1.0   -2.4   6.8
UL6b   ggcaCgcttcg       75   343  16.1   90.2   239  73.2  1.2     0.1   -3.5   5.0
UL6c   cgaagCgtGcc       75   302  17.1   90.0   235  77.6  2.0     -1.2  -10.6  9.4
UD7    aagttctcttat      75   531  10.1   74.2   207  52.0
UL7a   aagTtctcTtat      75   372  11.8   76.8   209  59.2  1.5     -0.1  -5.3   7.1
UL7b   aagtTctctTat      75   563  12.2   80.0   218  60.1  1.8     1.2   -1.9   8.0
UD8    aactatgaaact      75   516  10.7   81.9   229  52.9
UL8a   aactAtgaaAct      75   498  12.1   83.6   231  58.4  1.2     -1.0  -7.2   5.4
UL8b   agtTtcaTagtt      75   510  12.7   84.9   233  60.4  1.7     -0.8  -8.1   7.4
UD9    aaatagagaattc     75   610  11.0   88.9   251  52.8
UL9    aaAtagAgaaTtc     75   850  13.3   93.5   259  60.4  1.9     -1.9  -12.2  7.6
UD10   gaaacagttaaag     75   474  12.3   96.6   272  56.1
UL10   gaAacagttAaag     75   387  12.8   96.0   268  58.0  0.4     -1.3  -5.6   1.8
UD11   aacatagattacat    50   480  12.9   98.4   276  56.8
UL11a  aacatagattAcat    50   558  13.6   100.2  279  58.9  0.6     0.6   0.0    2.1
UL11b  atgTaatctaTgtt    50   543  15.4   104.7  288  64.1  2.3     2.3   0.2    7.4
UL11c  aacatagAttacat    50   412  13.5   98.2   273  59.0  0.6     -1.1  -5.4   2.2
UL11d  atgtaaTctatgtt    50   663  13.3   95.3   264  59.2  0.4     -4.7  -16.5  2.5
UD12   ttcttatagatacaag  50   504  14.1   113.6  321  57.8
UL12a  ttcttatagatacAag  50   622  14.5   114.5  322  59.0  0.4     0.1   -0.8   1.2
UL12b  ctTgtatcTataAgaa  50   647  17.4   119.3  329  66.8  2.9     -0.1  -9.4   8.9
UL12c  ttcttataGataCaag  50   416  16.1   113.9  315  64.2  1.8     -2.4  -13.6  6.3
UL12d  cttGtatCtataagaa  50   604  16.4   117.4  326  64.1  2.0     0.0   -6.5   6.3
UD13   ctggagc           100  418  7.1    46.2   126  40.6
UL13   ctGgagc           100  375  7.7    48.1   130  44.9  0.6     0.3   -1.0   4.3
UD14*  ggtgccaa          100  382  8.6    52.0   140  49.8
UL14*  ggtgCcaa          100  267  9.7    51.6   135  57.2  1.0     -2.4  -10.9  7.4
UD15*  acgtcttcg         100  583  9.1    57.0   154  51.4
UL15*  acgtCttcg         100  380  10.5   61.3   164  58.3  1.3     1.7   1.4    6.9
UD16   tagagggcagac      75   399  14.6   91.0   247  66.4
UL16a  tagaggGcagac      75   523  15.5   92.9   249  69.5  0.8     0.2   -2.0   3.2
UL16b  tagaGggcagac      75   542  16.1   93.9   251  71.4  1.2     0.2   -3.4   5.0
UD17   ctcgggaacgcc      75   519  16.6   100.1  269  71.2
UL17a  ctcGggaacgcc      75   317  17.3   100.8  269  73.4  0.6     0.0   -1.7   2.2
UL17b  ggcgttccCgag      75   552  17.1   98.4   262  73.5  0.3     -3.0  -10.5  2.3
UL17c  ggcgttCccgag      75   307  17.5   97.8   259  75.4  0.7     -3.5  -13.7  4.2
UL17d  ctcggGaacgcc      75   333  17.1   98.1   261  73.9  0.4     -2.9  -10.7  2.7
UD18   gatctgaggtact     75   523  14.2   98.4   271  62.5
UL18a  gatctgagGtact     75   500  15.3   99.9   273  66.4  1.0     -0.4  -4.6   3.8
UL18b  agtaCctcagatc     75   400  15.9   101.6  276  67.9  1.6     1.0   -1.8   5.4
UL18c  agtacCtcagatc     75   486  15.5   101.4  277  66.5  1.2     1.1   -0.3   4.0
UD19   gccctcgcacgtc     75   674  18.8   109.9  294  75.2
UL19   gccctcGcacgtc     75   476  19.3   112.0  299  76.1  0.4     1.7   3.9    0.9
UD20   cgaaagcctcggc     75   749  17.1   102.7  276  71.8
UL20   cgaaaGcctCggc     75   701  19.4   107.0  282  78.8  1.8     -0.6  -7.7   7.0
UD21   gcatgcccgtgacac   50   650  21.4   129.0  347  76.2
UL21   gcatgcCcgtgacac   50   476  22.4   127.6  339  79.7  0.8     -3.1  -12.5  3.5
UD22   agtagtaatcacacc   50   645  16.6   117.1  324  64.9
UL22   agtaGtaatCacacc   50   737  18.4   118.8  324  70.1  1.5     -2.2  -11.7  5.2
UD23   caacttgatattaata  50   609  14.5   114.9  324  58.9
UL23   caaCttGatattaata  50   554  17.3   119.2  329  66.4  2.4     0.2   -7.3   7.5

Measured thermodynamic values (∆G°37, ∆H°, ∆S° and Tm) are for the helix-to-coil transition and are the average of two-state and calorimetric analysis. For the mean experimental values reported, the estimated errors (standard deviations) for ∆Cp, ∆G°37, ∆H°, ∆S°, and Tm were determined to be 25%, 2%, 3%, 3%, and 0.5 °C, respectively. ∆H° and ∆S° are reported at the Tm of the sample, while ∆∆H° and ∆∆S° are at the Tm of the reference isosequential DNA duplex and computed from the measured thermodynamic values using the measured ∆Cp for the given duplex. Sequences marked with * were also previously analyzed by McTigue et al. (44) using UVM.

Thermodynamic data determined by DSC for 43 duplexes containing A, T, G and/or

C substitutions, (identified as UL#) and for 23 unmodified DNA duplexes used as references

(identified as UD#) are shown in Table 3.1. Similarly, thermodynamic data for 20 duplexes containing D and/or H substitutions (identified as USL#) are shown in Table 3.2. The thermodynamic values reported in Tables 3.1 and 3.2 were computed as the average of those determined by model dependent and model independent analyses (see Materials and

Methods). For all 86 duplexes there was very good agreement in the thermodynamic data obtained by these two methods, with the value of ∆H2-st/∆Hcal lying between 0.97 and 1.03 for all samples. Average values of the two-state (model dependent) and calorimetric (model

independent) analysis could therefore be used so as to ameliorate any very minor biases that might be introduced by either data analysis method. We note that Zhou et al. (176) have shown that agreement between these two methods is a necessary, but not a completely sufficient condition to prove that the melting reaction occurs as a two-state transition.

Table 3.2 DSC derived thermodynamic data for complementary DNA duplexes where one strand contains D and/or H substitutions.

Units: CT in μM; ∆Cp, ∆S° and ∆∆S° in cal mol-1 K-1; ∆G°37, ∆H°, ∆∆G°37 and ∆∆H° in kcal mol-1; Tm and ∆Tm in °C.

Name   Sequence          CT   ∆Cp  ∆G°37  ∆H°    ∆S°  Tm    ∆∆G°37  ∆∆H°  ∆∆S°   ∆Tm
USL1a  gcccDgcg          100  463  11.4   61.5   162  63.8  1.2     0.7   -1.6   6.9
USL1b  cgcHgggc          100  501  11.5   60.2   157  65.2  1.2     -1.6  -9.3   8.3
USL2   agDcctDgt         100  414  11.0   62.7   167  61.1  2.6     0.5   -6.6   14.9
USL3   ccatgHccc         100  330  11.5   66.9   179  61.9  1.5     -1.7  -10.4  8.5
USL4   tgcDcgcta         100  384  11.3   60.2   158  63.9  1.1     3.0   6.1    5.8
USL5a  atgctcDtgc        100  436  12.0   71.4   192  62.8  1.4     3.9   8.2    5.8
USL5b  atgcHcatgc        100  311  12.1   69.5   185  63.8  1.4     2.4   3.3    6.7
USL6a  cgDagcgHgcc       75   449  17.5   92.1   241  78.2  2.2     -0.7  -9.6   10.0
USL7a  aagHtctcHtat      75   425  12.3   75.8   205  61.9  1.9     -2.6  -14.6  9.8
USL7b  aagtHctctHat      75   333  12.3   73.4   197  62.7  2.0     -4.4  -20.6  10.7
USL8a  aactDtgaaDct      75   418  13.2   87.0   238  62.2  2.3     1.2   -3.5   9.3
USL8b  agtHtcaHagtt      75   474  13.8   85.7   232  64.8  2.7     -1.8  -14.6  11.9
USL9   aaDtagDgaaHtc     75   600  14.4   92.7   252  65.1  2.9     -3.6  -20.9  12.3
USL10  gaDacagttDaag     75   343  6      96.8   268  60.7  1.2     -1.4  -8.2   4.6
USL11a aacatagattDcat    50   276  14.3   100.4  278  61.4  1.4     0.7   -2.1   4.7
USL11b atgHaatctaHgtt    50   591  16.7   105.1  285  68.6  3.3     -0.3  -11.7  11.8
USL11c aacatagDttacat    50   464  14.0   98.3   272  60.7  1.0     -2.0  -9.5   4.0
USL11d atgtaaHctatgtt    50   526  14.0   94.2   259  62.0  0.9     -7.0  -25.5  5.2
USL12a ttcttatagatacDag  50   854  14.8   114.8  323  59.7  0.6     -0.3  -2.9   1.8
USL12b ctHgtatcHataDgaa  50   610  18.5   117.0  318  71.0  3.8     -4.6  -27.0  13.1

Measured thermodynamic values (∆G°37, ∆H°, ∆S° and Tm) are for the helix-to-coil transition and are the average of two-state and calorimetric analysis. For the mean experimental values reported, the estimated errors (standard deviations) for ∆Cp, ∆G°37, ∆H°, ∆S°, and Tm were determined to be 25%, 2%, 3%, 3%, and 0.5 °C, respectively. ∆H° and ∆S° are reported at the Tm of the sample, while ∆∆H° and ∆∆S° are at the Tm of the reference isosequential DNA duplex and computed from the measured thermodynamic values using the measured ∆Cp for the given duplex.

There is also excellent agreement between the DSC-derived thermodynamic data for the unmodified DNA reference duplexes and those predicted using the non-linear NNT model reported in Chapter 2 that accounts for a non-zero ∆Cp (116). Absolute average

differences in ΔH°, ΔS°, and Tm values predicted by the model and those obtained by DSC

were +5 ± 6% (% mean error ± % standard deviation), +5 ± 6%, and +0.5 ± 1.8 °C,

respectively. Moreover, six of the DNA reference duplexes investigated by DSC were

previously studied by UVM (44). Good agreement between these independent methods was

also observed, with average differences in ΔH°, ΔS°, and ΔG°37 of +2 ± 5%, +3 ± 7% and -2

± 1%, respectively.

Each of the 43 duplexes containing one or more A, T, G and/or C substitutions was

more stable than its isosequential DNA reference sequence, with ΔTm values ranging from

+1.1 to +15.9 °C. When the same sequences contained D and/or H substitutions, even larger

stability increases (ΔTm = +1.7 to +14.9 °C) were observed. Finally, DSC analysis of all 86

duplexes reported in Tables 3.1 and 3.2 gave (Table 3.3) an average heat capacity change per

base pair ΔCp^bp of 42 ± 11 cal mol-1 K-1 bp-1 for the helix-to-coil transition. This value is in

excellent agreement with a ΔCp^bp reported in Chapter 2 and elsewhere (177) for melting of

duplex DNA, and also lies within the consensus range (30 to 60 cal mol-1 K-1 bp-1) of all

previously reported ΔCp^bp values for duplex DNA (88). Over the range of melting

temperatures analyzed (~ 42 °C < Tm < ~ 80 °C), the experimental value of ΔCp^bp decreases

slightly but not appreciably with temperature.

Table 3.3 Average ΔCp^bp and Tm values determined by DSC for duplexes with and without LNA substitutions.

Source           Group  ΔCp^bp (cal mol-1 K-1 bp-1)  Tm (°C)
Table 3.1        UD     47 ± 10                      59 ± 9
Table 3.1        UL     41 ± 10                      65 ± 8
Table 3.2        USL    40 ± 10                      64 ± 4
Table 3.1 & 3.2  All    42 ± 11                      63 ± 8

3.2.1 Accounting for ∆Cp shows that the increase in duplex stability resulting from

LNA substitutions is predominantly driven by a favorable entropy change.

A non-zero value of ΔCp indicates that ∆H° and ∆S° are functions of temperature

(85,90). To determine the incremental enthalpy (∆∆H°) and entropy (∆∆S°) changes, the measured ΔCp for

each modified duplex was therefore first used to compute ∆H° and ∆S° for that duplex at the

Tm of the corresponding DNA reference sequence. Nearly identical results were obtained

when the average ΔCp^bp value of 42 cal mol-1 K-1 bp-1 was used for this computation.

However, while the two methods yield similar results, the errors in the ∆∆H° and ∆∆S° values (as well as in incremental Gibbs energy changes ∆∆G°37) decrease slightly when the

experimental ΔCp value for the modified duplex is employed. For each sequence, the values

of ∆H° and ∆S°, as well as values of ∆∆H°, ∆∆S°, and ∆∆G°37 are reported in Tables 3.1 and

3.2. Average ΔCp^bp values for each data set are reported in Table 3.3, and no statistical

difference is found in the average value of ΔCp^bp for LNA-modified duplexes relative to that

for unmodified DNA duplexes, indicating that there is no appreciable ∆ΔCp associated with

LNA substitution(s). ∆∆H°, ∆∆S°, and ∆∆G°37 values are therefore assumed to be

temperature independent.
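The temperature correction described in this section can be illustrated with a short Python sketch. It assumes the standard constant-ΔCp relations, ∆H°(T2) = ∆H°(T1) + ΔCp(T2 - T1) and ∆S°(T2) = ∆S°(T1) + ΔCp ln(T2/T1), and takes the incremental changes as the difference between the shifted LNA-duplex values and the DNA reference values at Tm(DNA). The function names are hypothetical, and the inputs in the usage note are the rounded entries of Table 3.1, so the results reproduce the tabulated increments only approximately.

```python
import math


def shift_to_reference(dh_kcal, ds_cal, dcp_cal, tm_c, tref_c):
    """Shift dH (kcal/mol) and dS (cal/mol/K) from Tm to Tref at constant dCp.

    dcp_cal is in cal/mol/K; both temperatures are given in degrees C.
    """
    tm_k = tm_c + 273.15
    tref_k = tref_c + 273.15
    dh_ref = dh_kcal + dcp_cal * (tref_k - tm_k) / 1000.0  # cal -> kcal
    ds_ref = ds_cal + dcp_cal * math.log(tref_k / tm_k)
    return dh_ref, ds_ref


def incremental_changes(dh_lna, ds_lna, dcp_lna, tm_lna,
                        dh_dna, ds_dna, tm_dna):
    """Return (ddH, ddS) at Tm(DNA) for an LNA duplex and its DNA reference."""
    dh_ref, ds_ref = shift_to_reference(dh_lna, ds_lna, dcp_lna, tm_lna, tm_dna)
    return dh_ref - dh_dna, ds_ref - ds_dna
```

For UL1d against UD1, `incremental_changes(65.8, 169.0, 394.0, 72.8, 57.6, 154.0, 56.9)` gives a ∆∆H° of about 1.9 kcal mol-1, matching the tabulated value, and a ∆∆S° of about -3.5 cal mol-1 K-1 compared with the tabulated -2.8, a difference that is presumably due to rounding of the tabulated inputs.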

These results were used to assess the impact of a non-zero ΔCp on our fundamental

understanding of melting thermodynamics for short complementary duplexes containing

LNA substitutions. For all standard LNA (A, T, C or G) substituted duplexes studied in this

work, Figure 3.2A plots ∆∆H° versus ∆∆S° for the case in which both incremental changes

are computed assuming ΔCp is zero. When applied to the data set reported in Table 3.1, this assumption, which has been widely adopted in the DNA melting thermodynamics literature, leads to a result that suggests that the observed stability enhancement for LNA substituted

duplexes can be driven enthalpically, entropically, or by a combination thereof. This as yet unproven assumption has been adopted in previous thermodynamic studies of LNA-containing duplexes (44,85,122,126,134,135) and represents the primary justification made by

McTigue et al. (44) for including both incremental enthalpy and entropy correction terms in

their LNA NNT model.

[Figure 3.2: two panels, (A) and (B), each plotting ΔΔS° (cal mol-1 K-1) against ΔΔH° (kcal mol-1).]

Figure 3.2 Experimental helix-to-coil transition ΔΔH° and ΔΔS° for 43 duplexes with standard LNA substitutions. Correlation between ΔΔH° and ΔΔS° determined under the assumption that ΔCp = 0 (A) or determined at the Tm of the isosequential DNA sequence using the ΔCp measured by DSC (B).

However, first and correctly computing ∆H°LNA(Tm(DNA)) and ∆S°LNA(Tm(DNA)) from the ∆H°LNA(Tm(LNA)), ∆S°LNA(Tm(LNA)) and ΔCp data for the modified duplex shows that, within

experimental error, a favorable incremental entropy change is the predominant duplex

stabilization mechanism (Figure 3.2B), with average ∆∆H° and ∆∆S° values for the helix-to-

coil transition for all duplexes studied given by -0.5 ± 1.5 kcal mol-1 and -5.3 ± 4.8 cal mol-1

K-1, respectively. Thus, with respect to duplex stability, a single LNA substitution is, on

average, an athermal (or enthalpically slightly unfavorable) process that succeeds in

stabilizing the duplex by reducing the entropy gain that accompanies the helix-to-coil

transition. Not surprisingly then, Figures 3.2A and 3.2B show that the erroneous assumption

that ∆Cp = 0 can lead to significant errors in the interpretation of both DNA and LNA

melting thermodynamics, including the introduction of non-random errors that result in an

overestimation of any entropy-enthalpy compensation associated with LNA substitutions.

This motivates the question of whether any advances in the thermodynamic modeling

of LNA-bearing oligonucleotides can be achieved by recognizing the non-zero value of ΔCp.

The existing LNA NNT model of McTigue et al. (44) can be used to predict incremental

thermodynamic parameters. Areas where efforts toward model improvement are justified

were therefore identified by comparing incremental thermodynamic values predicted by that

model (∆∆H°(pred), ∆∆S°(pred) and ∆∆G°37(pred)) with those determined from experimental data

for the 43 standard LNA-containing duplexes reported in Table 3.1. ∆∆G°37(pred) values computed by the LNA NNT model of McTigue et al. correlate somewhat (R2 = 0.78) with

∆∆G°37 values derived from experiment assuming ∆Cp = 0 (Figure 3.3; filled circles). A

similar correlation (R2 = 0.73) was obtained when LNA NNT model predictions were

compared to ∆∆G°37 values corrected for the non-zero value of ∆Cp (Figure 3.3, open

squares). However, many significant outliers are observed in either correlation. Moreover,

∆∆H°(pred) and ∆∆S°(pred) values predicted by the linear LNA NNT model do not correlate at

all (R2 = 0) with either the corresponding uncorrected or temperature-corrected (using ΔCp) experimental data, indicating that the linear LNA NNT model, which assumes ΔCp is zero, is unable to accurately quantify the enthalpic and entropic changes that drive the stability enhancements observed for LNA substitutions. As a result, the fundamental significance of

the large set of LNA NN enthalpic and entropic parameters generated and required for that

model is not clear. This then represents one area where useful advances in the

understanding and modeling of LNA hybridization thermodynamics may be realized.

[Figure 3.3: predicted ΔΔG°37 (kcal mol-1) plotted against experimental ΔΔG°37 (kcal mol-1).]

Figure 3.3 Comparison of ∆∆G°37 values predicted using the LNA NNT model to experimental helix-to-coil ∆∆G°37 data for the 43 duplexes with standard LNA substitutions. Data taken from Table 3.1 and both the raw experimental data (●) and data corrected to the Tm of the isosequential DNA reference sequences using ΔCp (□) are shown.

The incremental thermodynamic values (∆∆G°37, ∆∆H°, and ∆∆S°) reported in Table

3.1 were therefore used to regress a set of thermodynamic parameters (Table 3.4), identified

as ∆∆Gi°37, ∆∆Hi°, and ∆∆Si°, that quantify the average incremental contribution (i.e., the excess contribution relative to that provided by the corresponding DNA nucleotide) to duplex

stability provided by a single LNA of type i within its complementary base pair (A•t, T•a,

G•c or C•g). Consistent with Figure 3.2B, these average incremental changes indicate that the

improved stability observed for LNA substitutions is entropically driven. Similar results

were obtained when incremental thermodynamic values from either the two-state (model

dependent) or the calorimetric (model independent) analysis were instead used to regress the

parameters, but in either case this resulted in a slightly larger uncertainty (standard error) in the parameters. On average, no statistically significant incremental enthalpy change ∆∆Hi° is

observed for any of the four standard LNA substitutions. A simple model describing the

melting thermodynamics of LNA bearing duplexes may therefore be realized by setting all

∆∆Hi° = 0 so that ∆∆Gi°37 = - 310∆∆Si°. Differences in the stability enhancement provided

by a particular LNA substitution are then ascribed to the fact that the magnitude of ∆∆Si° is

LNA base specific (Table 3.5).

Table 3.4 General SBT model parameters for the helix-to-coil transition of duplexes containing standard and nucleoside-modified LNA substitutions.

LNA•DNA        ΔΔGi°37        ΔΔHi°          ΔΔSi°
base pair      (kcal mol-1)   (kcal mol-1)   (cal mol-1 K-1)
A•t            0.58 ± 0.06    -0.1 ± 0.3     -2.3 ± 0.9
T•a            0.88 ± 0.06    -0.1 ± 0.3     -3.2 ± 1.0
G•c            0.83 ± 0.10     0.1 ± 0.5     -2.5 ± 1.4
C•g            1.14 ± 0.09    -0.3 ± 0.4     -4.8 ± 1.3
D•t            1.05 ± 0.06     0.4 ± 0.3     -2.2 ± 0.9
H•a            1.20 ± 0.06    -1.5 ± 0.3     -8.7 ± 0.9
Effect of Nucleoside Modification in LNA
D•t - A•t      0.47 ± 0.06     0.5 ± 0.3      0.1 ± 0.9
H•a - T•a      0.32 ± 0.06    -1.4 ± 0.3     -5.5 ± 1.0

3.2.2 Base classification and pairing explain differences in the ∆∆Si° value and

stability enhancement offered by different LNAs.

Structural studies of single-stranded and duplexed oligonucleotides containing one or

more LNA substitutions indicate that introduction of an LNA pre-organizes (i.e. lowers the

entropy of) the LNA-containing single strand (47-50). This serves to improve duplex

stability. In particular, the introduction of a 2’-O, 4’-C-methylene bridge forces the

nucleoside into its N-type conformation in the single strand, greatly lowering the entropy of the strand. It remains fixed in its N-type conformation within the duplex, which lowers the

entropy of the duplex, but likely to a lesser extent due to the inherently greater overall

organization of the duplex structure. These opposing entropic effects associated with LNA-

nucleoside structure are consistent with the thermodynamic data reported here. They further

suggest that the differences in ∆∆Si° values for the four possible LNA nucleosides arise from

differences in the extent of compensation between the two opposing entropy changes. To

better understand this, incremental entropy changes were computed as a function of LNA

base classification (i.e., purine vs. pyrimidine) and pair (i.e., a•t type base pair versus g•c type base pair). Table 3.5 provides a summary of this analysis, where the ∆∆Si° values have

now been calculated directly from the ∆∆Gi°37 values reported in Table 3.4 assuming ∆∆Hi° is 0 for each of the four standard LNA base substitutions. The results show that either possible LNA-pyrimidine substitution (t to T, or c to C) results in a -∆∆Si° (the incremental

entropy change for the hybridization reaction) that is 1.0 ± 0.3 cal mol-1 K-1 more favorable

than the corresponding LNA-purine substitution, suggesting that pyrimidinic LNA

substitutions provide a comparatively greater increase in the pre-organization of single

strands. The results further show that -∆∆Si° for C > T ≥ G > A (Table 3.5), which is in good

agreement with the average stability enhancement observed for each of these substitutions as

determined in this work and by others (44).

Table 3.5 ΔΔSi° parameters and differences in them for the helix-to-coil transition of duplexes containing standard LNA bases.

ΔΔSi° (cal mol-1 K-1)^a

LNA•DNA base pair   Parameter     Base type difference   Base pair difference
A•t                 -1.9 ± 0.2
                                  -1.0 ± 0.2
T•a                 -2.9 ± 0.2
                                                         -0.8 ± 0.3
G•c                 -2.7 ± 0.3
                                  -1.0 ± 0.3
C•g                 -3.7 ± 0.3

^a ΔΔSi° parameters were determined from the ΔΔGi°37 parameters reported in Table 3.4 assuming ΔΔH° = 0.
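The conversion described in the table footnote can be sketched numerically. The following is a minimal Python sketch, assuming T = 310.15 K (37 °C) and the ΔΔGi°37 values of Table 3.4; the names `DDG37_KCAL` and `dds_from_ddg` are illustrative and not part of the original work. Because the inputs are rounded, the results may differ from the tabulated ΔΔSi° values by up to about 0.1 cal mol-1 K-1.

```python
# Simplified SBT relation for standard LNAs: with ddH = 0,
# ddG37 = -T * ddS, so ddS = -ddG37 / T.
T37_K = 310.15  # 37 degC in kelvin (assumption; the text rounds this to 310)

# ddG37 for the helix-to-coil transition, kcal/mol (Table 3.4)
DDG37_KCAL = {"A•t": 0.58, "T•a": 0.88, "G•c": 0.83, "C•g": 1.14}


def dds_from_ddg(ddg_kcal: float, temp_k: float = T37_K) -> float:
    """Return ddS in cal mol^-1 K^-1, assuming ddH = 0."""
    return -ddg_kcal * 1000.0 / temp_k


DDS_CAL = {pair: dds_from_ddg(g) for pair, g in DDG37_KCAL.items()}

for pair, dds in DDS_CAL.items():
    print(f"{pair}: ddS = {dds:.2f} cal/(mol K)")
```

The ordering of the computed magnitudes (C > T > G > A) follows directly from the ordering of the ΔΔGi°37 inputs.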

The entropy gain arising from an LNA substitution likewise depends on base pair type, with -∆∆Si° for substitution within a g•c base pair being 0.8 ± 0.3 cal mol-1 K-1 larger than that when the corresponding nucleotide is replaced within an a•t base pair. In this case,

however, the effect is likely related to differences in the degrees of freedom available to the

DNA base pair within the reference duplex. The unified NNT model of SantaLucia Jr. (63) estimates the average thermodynamic contributions of a g•c base pair within duplex DNA to be -∆G°37 = -1.8 kcal mol-1, -∆H° = -9.0 kcal mol-1 and -∆S° = -23.1 cal mol-1 K-1. Those for an a•t base pair are estimated to be -1.0 kcal mol-1, -7.8 kcal mol-1 and -21.6 cal mol-1 K-1, respectively. On average then, the additional hydrogen bond in a g•c base pair acts to

enhance the stability of a duplex by ~ -0.8 kcal mol-1 relative to an a•t base pair. As a result of this stronger interaction, the entropy of each base within the g•c pair is ~ 0.8 cal mol-1 K-1 lower than its corresponding base within an a•t pair. Given this more organized base

structure within a g•c pair, the average entropic penalty associated with locking either base

into its N-type conformation is reduced when compared to that when the LNA substitution

occurs within an a•t base pair.
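The arithmetic behind these estimates can be written out explicitly. The lines below use only the per-base-pair averages quoted above and assume, as the text does, that the entropy difference between the two pair types is shared equally by the two bases of a pair:

\[
\Delta G^{\circ}_{37}(g{\cdot}c) - \Delta G^{\circ}_{37}(a{\cdot}t) = (-1.8) - (-1.0) = -0.8\ \mathrm{kcal\ mol^{-1}}
\]

\[
\tfrac{1}{2}\left[(-23.1) - (-21.6)\right] = -0.75 \approx -0.8\ \mathrm{cal\ mol^{-1}\ K^{-1}}\ \text{per base}
\]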

3.1.1 Terminal 5’ and 3’ LNA substitutions are much less stabilizing than internal

LNA substitutions

The thermodynamic values reported in Tables 3.1 to 3.5 were regressed from DSC data for duplexes that were intentionally designed to avoid artifacts attributable to the duplex termini. However, development of a general and accurate model for predicting the Tm and

melting thermodynamics of LNA-containing duplexes will require proper accounting of the

influence of terminal LNA substitutions on duplex stability. Although not generally their

focus, previous investigations found that LNA substitutions at the 5’ or 3’ end of duplex

DNA are slightly destabilizing (46,134,174), but conflicting findings have been reported

(135), and the available information is not sufficient for reliable model development. A set

of 10 duplexes with a single LNA substitution at the 5’ or 3’ end were therefore designed and

studied. All four possible base substitutions at each position were tested, and results are

shown in Table 3.6 together with reference data for the associated isosequential DNA

duplexes.

Table 3.6 DSC derived thermodynamic data for duplexes used to study the effect of LNA substitution at the 3’ or 5’ termini.

Name     Sequence      ΔCp              ΔG°37         ΔH°           ΔS°              Tm     ΔΔG°37        ΔΔH°          ΔΔS°             ΔTm
                       (cal mol-1 K-1)  (kcal mol-1)  (kcal mol-1)  (cal mol-1 K-1)  (°C)   (kcal mol-1)  (kcal mol-1)  (cal mol-1 K-1)  (°C)
C1       ctacgcattcc   462              12.18         81.7          224.0            59.2
L5C-C1   Ctacgcattcc   492              12.28         82.8          227.2            59.4    0.03          1.0           3.4              0.1
L5G-C1   Ggaatgcgtag   405              12.26         81.8          224.2            59.6    0.18          0.0           0.5              0.3
L3C-C1   ctacgcattcC   634              12.16         81.0          222.1            59.4    0.16         -0.7          -1.7              0.1
L3G-C1   ggaatgcgtaG   449              12.46         82.5          226.0            60.2    0.58          0.4           3.3              1.0
T1       ttcatagccgt   353              11.98         79.3          216.9            59.1
L5T-T1   Ttcatagccgt   430              11.73         78.2          214.2            58.3   -0.43         -0.7          -3.7             -0.8
L5A-T1   Acggctatgaa   260              11.64         77.2          211.4            58.2   -0.13         -1.8          -6.3             -1.0
L3T-T1   ttcatagccgT   414              12.05         79.7          218.2            59.3    0.07          0.4           1.5              0.2
L3A-T1   acggctatgaA   475              12.08         80.3          220.0            59.2    0.00          1.0           3.3              0.1
T5       tactccgcatt   478              11.95         73.9          199.7            60.7
LT5-T5   Tactccgcatt   357              11.58         71.5          193.1            59.7   -0.33         -2.0          -7.6             -1.0
LA3-T5   aatgcggagtA   280              12.03         74.1          200.1            61.0    0.09          0.1           0.7              0.3

Measured thermodynamic values (ΔG°37, ΔH°, ΔS° and Tm) are for the helix-to-coil transition and were determined as reported in Materials and Methods. For the mean experimental values reported, the estimated errors (standard deviation) for ΔCp, ∆G°37, ∆H°, ∆S° and Tm were determined to be 25%, 2%, 3%, 3% and 0.5 °C respectively. ∆H° and ∆S° are determined at the Tm of the sample, while ΔΔH° and ΔΔS° are at the Tm of the reference isosequential DNA duplex and computed using the average ΔCp^bp = 42 cal mol-1 K-1 bp-1. CT = 75 μM for all duplexes in Table 3.6.

On average, a single LNA substitution at the 5’ or 3’ terminal end results in no significant stability change in the duplex (∆Tm = -0.1 ± 0.7 °C and ∆∆G°37 = -0.02 ± 0.28 kcal mol-1). Duplexes with 5’ terminal substitutions exhibited an average ∆Tm of -0.5 ± 0.6 °C, with 3 of the modified sequences showing a Tm reduction larger than 0.5 °C. One LNA substitution at the 3’ terminus resulted in a statistically significant increase in stability (∆Tm > 0.5 °C). However, on average, 3’ substitutions were also found to provide no significant stability improvement, exhibiting an average ∆Tm of +0.3 ± 0.4 °C. These end effects are not well described by either the LNA NNT model or the incremental thermodynamic changes reported in Table 3.5 for LNA substitutions within the interior of the duplex. Instead, the results, particularly the finding that on average ∆∆G°37 for terminal substitutions is -0.02 ± 0.28 kcal mol-1, indicate that to a good approximation single isolated LNA substitutions at oligonucleotide termini can be ignored for the purpose of stability prediction.
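The ∆Tm averages quoted for terminal substitutions can be checked against the ΔTm column of Table 3.6. The sketch below is illustrative only: it assumes the quoted uncertainties are sample standard deviations, and the variable and function names are not from the original work.

```python
from statistics import mean, stdev

# dTm (degC) for single terminal LNA substitutions, taken from Table 3.6
DTM_5PRIME = [0.1, 0.3, -0.8, -1.0, -1.0]  # L5C-C1, L5G-C1, L5T-T1, L5A-T1, LT5-T5
DTM_3PRIME = [0.1, 1.0, 0.2, 0.1, 0.3]     # L3C-C1, L3G-C1, L3T-T1, L3A-T1, LA3-T5


def summarize(values):
    """Return (mean, sample standard deviation) of a list of dTm values."""
    return mean(values), stdev(values)


SUMMARY = {
    "5' terminal": summarize(DTM_5PRIME),
    "3' terminal": summarize(DTM_3PRIME),
    "all terminal": summarize(DTM_5PRIME + DTM_3PRIME),
}

for label, (avg, sd) in SUMMARY.items():
    print(f"{label}: dTm = {avg:+.2f} ± {sd:.2f} degC")
```

Under these assumptions the three summaries agree with the values quoted in the text to within rounding.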

3.1.2 A new model for predicting the melting thermodynamics of LNA substituted duplexes

The results reported suggest that accurate calculation of ΔTm(pred) and melting

thermodynamics for complementary duplexes containing standard LNAs might be achieved

with a new model, hereafter called the single base thermodynamic (SBT) model, that corrects

predictions for the isosequential DNA duplex made by a recently described non-linear

extension (177) of the unified NNT model of SantaLucia Jr. (63) using only the

experimentally determined average ∆Cp^bp value and the set of four incremental entropy

parameters (∆∆Si°) reported in Table 3.5. This four-parameter model is therefore

considerably simpler in structure than the existing 64-parameter linear LNA NNT model

(44). Moreover, as the incremental thermodynamic parameters reported in Table 3.5 specifically characterize the individual LNA base substitution and not the associated nearest neighbor, the SBT model can in principle be applied to highly substituted duplexes containing one or more adjacent LNAs within one of the duplex-forming oligonucleotides.

In its most general form, the SBT model can be applied to oligonucleotides

substituted with standard LNAs and the nucleobase-modified LNAs D and H, which, unlike

standard LNAs, provide both an entropic and an enthalpic stability enhancement (see below).

Thermodynamic changes for the helix-to-coil transition at temperature T are computed as

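\[ \Delta H^{\circ}_{\mathrm{LNA}}(T_m) = \Delta H^{\circ}_{\mathrm{DNA}}(T_{\mathrm{ref}}) + \Delta\Delta H^{\circ}_{\mathrm{LNA}} + \Delta C_p\,(T_m - T_{\mathrm{ref}}) \tag{3.7} \]

\[ \Delta S^{\circ}_{\mathrm{LNA}}(T_m) = \Delta S^{\circ}_{\mathrm{DNA}}(T_{\mathrm{ref}}) + \Delta\Delta S^{\circ}_{\mathrm{LNA}} + \Delta C_p \ln\!\left(\frac{T_m}{T_{\mathrm{ref}}}\right) \tag{3.8} \]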

where ∆H°DNA(Tref) and ∆S°DNA(Tref) are determined using the aforementioned modified form of the unified NNT model for duplex DNA that corrects for both a non-zero ΔCp and the unique base-pairing energetics of terminal 5’-ta groups (177). That model requires selection

of a reference temperature Tref, which is taken to be 53 °C for all SBT model calculations.

The SBT model therefore correctly treats ∆H and ∆S as temperature dependent functions

with their values at a given temperature computed from the chosen standard state of 53 °C.

Tref was identified through regression of the nonlinear NNT model to melting data for 125 duplex sequences; in particular, minimum model error was observed when ΔCp^bp = 42 cal mol-1 K-1 bp-1 and Tref = 53 °C (34). In Equations 3.7 and 3.8, ∆Cp is given by n∆Cp^bp, where n is the total number of base pairs in the duplex and ∆Cp^bp is given by the average experimental value, 42 cal mol-1 K-1 bp-1.

∆∆H°LNA and ∆∆S°LNA are computed as

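\[ \Delta\Delta H^{\circ}_{\mathrm{LNA}} = \sum_i n_i\,\Delta\Delta H^{\circ}_i \tag{3.9} \]

\[ \Delta\Delta S^{\circ}_{\mathrm{LNA}} = \sum_i n_i\,\Delta\Delta S^{\circ}_i \tag{3.10} \]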

where ni is the number of standard and base-modified LNA substitutions of type i (not

including those at either terminus), and the values of ∆∆Hi° and ∆∆Si° are those reported in

Table 3.4. The value of Tm for a duplex where one strand contains one or more internal LNA

substitutions is then given by

\[ T_m = \frac{\Delta H^{\circ}_{\mathrm{LNA}}(T_m)}{\Delta S^{\circ}_{\mathrm{LNA}}(T_m) - R\ln\left(C_T/4\right)} \tag{3.11} \]

where R is the universal gas constant and CT is the total molar concentration of single stranded oligonucleotides. Solving for Tm using Equation 3.11 requires iteration, but convergence is rapid if the initial estimate for Tm is found by setting ∆Cp = 0 in Equations 3.7 and 3.8.

For duplexes containing only standard LNA base substitutions, all ∆∆Hi° are zero within experimental error. A simplified form of the SBT model can therefore be obtained by setting ∆∆H° = 0 and calculating ∆∆S° using the ∆∆Si° parameters in Table 3.5.
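The iteration implied by Equations 3.7, 3.8 and 3.11 can be sketched for the simplified (∆∆H° = 0) form of the model as follows. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, upper-case letters are taken to denote LNA bases as in the tables, the ∆∆Si° values are those of Table 3.5, and the reference values `DH_REF` and `DS_REF` are placeholders rather than outputs of the nonlinear NNT model (177), which is not reproduced here.

```python
import math

R = 1.987          # gas constant, cal mol^-1 K^-1
T_REF = 326.15     # 53 degC reference temperature, K
DCP_PER_BP = 42.0  # average dCp, cal mol^-1 K^-1 per base pair

# Simplified SBT parameters (ddH = 0), cal mol^-1 K^-1, from Table 3.5.
# Keys are LNA bases, written in upper case as in the tables.
DDS_LNA = {"A": -1.9, "T": -2.9, "G": -2.7, "C": -3.7}


def lna_dds(strand: str) -> float:
    """Sum ddS_i over internal LNAs (Eq. 3.10); terminal positions are ignored."""
    return sum(DDS_LNA[base] for base in strand[1:-1] if base in DDS_LNA)


def sbt_tm(dh_ref, ds_ref, strand, ct, ddh=0.0, tol=1e-6, max_iter=50):
    """Solve Eq. 3.11 for Tm (K) by fixed-point iteration.

    dh_ref (cal/mol) and ds_ref (cal/mol/K) are helix-to-coil values for the
    isosequential DNA duplex at T_REF; ct is the total strand molarity.
    """
    dcp = DCP_PER_BP * len(strand)  # dCp = n * dCp(bp)
    dds = lna_dds(strand)
    conc = R * math.log(ct / 4.0)
    # Initial estimate: Eqs. 3.7 and 3.8 with dCp = 0
    tm = (dh_ref + ddh) / (ds_ref + dds - conc)
    for _ in range(max_iter):
        dh = dh_ref + ddh + dcp * (tm - T_REF)          # Eq. 3.7
        ds = ds_ref + dds + dcp * math.log(tm / T_REF)  # Eq. 3.8
        tm_new = dh / (ds - conc)                       # Eq. 3.11
        if abs(tm_new - tm) < tol:
            return tm_new
        tm = tm_new
    raise RuntimeError("Tm iteration did not converge")


# Placeholder reference values for an 11-mer (illustrative only)
DH_REF, DS_REF, CT = 80_000.0, 220.0, 75e-6
tm_dna = sbt_tm(DH_REF, DS_REF, "acggctatgaa", CT)
tm_lna = sbt_tm(DH_REF, DS_REF, "acggCTatgaa", CT)
print(f"predicted dTm = {tm_lna - tm_dna:.1f} K")
```

Here ∆Tm(pred) is obtained as Tm(LNA) - Tm(DNA), with the unmodified strand contributing no incremental entropy. For the general model with D or H substitutions, a non-zero `ddh` computed from Equation 3.9 and the Table 3.4 parameters would be supplied instead.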

3.1.3 The SBT model predicts Tm values for standard-LNA-containing mixmer

duplexes with similar accuracy as more complex NNT models.

The performance of the SBT model was first tested by determining errors in model-predicted Tm(LNA) and ∆Tm(pred) values through comparison with experimental data for 206

mixmer duplexes containing one or more interspersed (non-adjacent) standard LNA

substitutions. The 206 test sequences included the 43 duplexes from Table 3.1 that were

used to regress the SBT model parameters, along with 163 mixmer duplexes for which

experimental Tm(LNA) and ∆Tm values have been reported previously (38,44,174). Many of

these 163 sequences were part of the basis set used to regress the parameters of the linear

LNA NNT model of McTigue et al. (44). All experimental ∆Tm values were measured at or

very near [Na+] = 1 M, eliminating the need to correct for differences in salt concentration.

SBT model predicted ∆Tm(pred) values were computed as Tm(LNA) – Tm(DNA), with both required

Tm values determined by Equation 3.11 (∆∆H° and ∆∆S° = 0 for the unmodified reference duplex calculations).

As shown in Table 3.7, when applied to mixmers, the 4-parameter SBT model

calculates ∆Tm(pred) with accuracy similar to that of the two methods previously described (44,45).

Average ΔTm(error) values for duplexes not used for parameter regression are +0.4 ± 1.5 °C for the

SBT model, -0.2 ± 1.4 °C for the linear LNA NNT model, and 0.0 ± 1.9 °C for the

OligoDesign algorithm. However, though all of these methods show similar performance in

correctly estimating ∆Tm(pred) of LNA mixmer duplexes, the SBT model provides ∆Tm(pred) values of greater accuracy and precision for duplexes highly substituted with LNA due to the temperature dependence of ∆H° and ∆S°. In particular, for eight 15-mer LNA mixmer

duplexes studied by You et al., each containing 7 LNAs on the modified strand, there is a

significant improvement in the ∆Tm(error) when the SBT model is used (0.0 ± 1.2 °C) as

compared to either the OligoDesign (-2.2 ± 1.9 °C) or LNA NNT model (-0.4 ± 2.3 °C).

Table 3.7 Errors in ΔTm (°C) values predicted using the specified model for standard LNA (A, T, G and/or C) substituted mixmer duplexes.

Source         Count   Exiqon        LNA NNT       SBT NNT
                       OligoDesign   Model^a       Model^a
This study^b   43       0.1 ± 2.1    -0.6 ± 1.2
McTigue^c      100      0.3 ± 1.7                   0.5 ± 1.3
You^d          36      -1.3 ± 1.4    -0.1 ± 1.1     0.1 ± 1.2
Levin^e        27       0.2 ± 2.1     0.1 ± 2.1     0.5 ± 2.2
Subtotal                0.0 ± 1.9    -0.2 ± 1.4     0.4 ± 1.5

Error values (denoted ΔTm(error) in the text) reported in Table 3.7 give the mean error ± standard deviation for model-predicted ΔTm(pred) values relative to experimental ΔTm. ^a ΔTm values predicted by the LNA NNT and SBT NNT models ignored 5’ terminal LNA substitutions. ^b Data from Tables 3.1 and 3.2. ^c Data from McTigue et al. (44). ^d Data from You et al. (38). ^e Data from Levin et al. (174).
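The subtotal row of Table 3.7 can be approximately reproduced as a count-weighted mean of the per-source errors. The sketch below is an illustration under two assumptions that are not stated explicitly in the table: that each model's missing entry corresponds to the data set used to regress that model (this study for SBT, McTigue et al. for LNA NNT), and that the subtotals pool sources by sequence count. Names are illustrative.

```python
# (source, count, OligoDesign, LNA NNT, SBT); None = no value reported
ROWS = [
    ("This study", 43, 0.1, -0.6, None),
    ("McTigue", 100, 0.3, None, 0.5),
    ("You", 36, -1.3, -0.1, 0.1),
    ("Levin", 27, 0.2, 0.1, 0.5),
]


def pooled_mean(column: int) -> float:
    """Count-weighted mean error over the rows that report a value."""
    used = [(row[1], row[column]) for row in ROWS if row[column] is not None]
    total = sum(count for count, _ in used)
    return sum(count * err for count, err in used) / total


POOLED = {
    name: pooled_mean(col)
    for name, col in (("OligoDesign", 2), ("LNA NNT", 3), ("SBT", 4))
}

for name, value in POOLED.items():
    print(f"{name}: pooled mean dTm error = {value:+.2f} degC")
```

Because the per-source means are themselves rounded, agreement with the tabulated subtotals is expected only to within about 0.1 °C.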

Finally, independent data from Levin et al. (174) provide reasonable confirmation

that single LNA substitutions at terminal ends do not provide an incremental increase in

duplex stability. For the four duplexes they studied that contain both interspersed internal

LNA(s) and a 5’ terminal LNA substitution, the computed ∆Tm(error) values decreased from

+3.7 °C when the appropriate incremental ∆∆Si° parameter of the SBT model was applied to

the 5’ LNA, to +0.2 °C when it was not. Tm data reported by Latorra et al. (184) for duplexed

primers containing terminal 3’ LNA substitutions show similar behavior, further supporting

the finding reported here that an LNA substitution at the duplex termini provides no

significant change in duplex stability relative to the DNA isosequence when adjacent base

pairs in the duplex are not populated with an LNA.

3.1.4 Testing the validity of SBT model assumptions.

A key finding of this work is the discovery that stability improvements resulting from standard LNA substitutions are entropically driven, likely due to a preorganization of the specific nucleotide involved. Derivation of the SBT model, however, assumes that these preorganization effects manifest themselves within the modified nucleotide, making incremental energy and entropy corrections additive at the individual base pair level. The validity of this potential insight and associated model assumption was therefore tested.

All parameters of the SBT model were regressed from DSC data for mixmer duplexes. However, if the underlying assumption of additivity at the base pair level is

correct, the model should be applicable to gapmer-type duplexes incorporating tandem

(neighboring) LNAs. To test this, a set of sixteen gapmer duplexes (Table 3.8), each

containing 1 of the 16 possible tandem LNA sequences, were designed and studied by UVM.

Melting of the three associated reference DNA duplexes was likewise studied both by UVM

and DSC, and DSC was also used to confirm thermodynamic values obtained by UVM for

seven of the sixteen tandem LNA substituted sequences. There was excellent agreement

between DSC and UVM data for all 10 duplexes studied by both techniques, with average

differences between the two methods of ∆H° = 0 ± 1%, ∆S° = 1 ± 1%, ∆G°37 = 0 ± 1%, and

Tm = 0.3 ± 0.2 °C.

Table 3.8 UVM derived thermodynamic data for duplexes containing tandem LNA substitutions.

Name   Strand 1      Strand 2      ΔG°37          ΔH°           ΔS°               Tm           ΔΔG°37        ΔΔH°          ΔΔS°             ΔTm
                                   (kcal mol-1)   (kcal mol-1)  (cal mol-1 K-1)   (°C)         (kcal mol-1)  (kcal mol-1)  (cal mol-1 K-1)  (°C)
T1     ttcatagccgt   acggctatgaa   11.96 ± 0.13   79.2 ± 2.6    216.9 ± 7.6       59.0 ± 0.4
LCT    ttcatagccgt   acggCTatgaa   14.70 ± 0.37   81.7 ± 4.9    215.9 ± 14.1      70.6 ± 0.4   2.3           -2.9          -16.8            11.6
LGG    ttcatagccgt   acGGctatgaa   13.90 ± 0.34   81.5 ± 3.5    218.1 ± 10.0      67.0 ± 0.6   1.6           -1.4           -9.8             8.0
LTA    ttcatagccgt   acggcTAtgaa   13.83 ± 0.28   81.9 ± 4.3    219.4 ± 12.5      66.6 ± 0.2   1.6           -0.8           -7.9             7.5
LTG    ttcatagccgt   acggctaTGaa   14.26 ± 0.21   84.3 ± 0.9    225.8 ± 2.9       67.5 ± 0.6   2.0            1.1           -2.8             8.5
LAG    ttcatAGccgt   acggctatgaa   13.19 ± 0.06   76.9 ± 2.7    205.3 ± 8.0       65.6 ± 0.3   1.0           -5.4          -20.6             6.6
LCA    ttCAtagccgt   acggctatgaa   13.55 ± 0.21   81.9 ± 3.7    220.4 ± 10.9      65.3 ± 0.1   1.4           -0.2           -5.1             6.2
LCC    ttcatagCCgt   acggctatgaa   14.38 ± 0.26   81.3 ± 2.1    215.7 ± 6.0       69.3 ± 0.5   2.0           -2.7          -15.3            10.3
C1     ctacgcattcc   ggaatgcgtag   12.08 ± 0.05   80.6 ± 1.6    221.0 ± 4.8       59.1 ± 0.1
LAT    ctacgcATtcc   ggaatgcgtag   12.98 ± 0.27   80.6 ± 3.5    217.9 ± 10.2      63.2 ± 0.5   0.8           -1.9           -8.9             4.1
LCG    ctaCGcattcc   ggaatgcgtag   14.23 ± 0.09   83.4 ± 2.0    222.9 ± 5.7       67.8 ± 0.2   1.9           -1.3          -10.1             8.6
LGC    ctacGCattcc   ggaatgcgtag   13.97 ± 0.18   82.4 ± 2.8    220.8 ± 8.2       66.9 ± 0.2   1.6           -1.8          -11.1             7.8
LTT    ctacgcaTTcc   ggaatgcgtag   14.08 ± 0.07   84.6 ± 2.2    227.4 ± 6.7       66.6 ± 0.3   1.8            0.5           -4.1             7.5
LAA    ctacgcattcc   ggAAtgcgtag   12.82 ± 0.08   80.3 ± 2.0    217.5 ± 5.9       62.5 ± 0.2   0.7           -1.9           -8.4             3.4
C2     ctaacggatgc   gcatccgttag   12.00 ± 0.05   80.0 ± 0.3    219.2 ± 0.7       59.0 ± 0.5
LGT    ctaacggatgc   gcatccGTtag   12.79 ± 0.18   80.6 ± 0.6    218.5 ± 1.7       62.3 ± 0.3   0.7           -1.0           -5.3             3.4
LTC    ctaacggatgc   gcaTCcgttag   13.97 ± 0.24   81.1 ± 3.7    216.4 ± 10.6      67.5 ± 0.3   1.6           -2.8          -14.4             8.5
LAC    ctaACggatgc   gcatccgttag   13.85 ± 0.06   83.8 ± 0.8    225.4 ± 2.2       65.9 ± 0.4   1.6            0.6           -3.2             6.9
LGA    ctaacgGAtgc   gcatccgttag   13.70 ± 0.11   80.2 ± 1.9    214.5 ± 5.6       66.6 ± 0.3   1.4           -3.3          -15.1             7.6

The reported UVM data (ΔG°37, ΔH°, ΔS° and Tm) are for the helix-to-coil transition and were determined as reported in Materials and Methods. The reported errors refer to the standard deviation of triplicate runs. ∆H° and ∆S° are determined at the Tm of the sample, while ΔΔH° and ΔΔS° are at the Tm of the reference isosequential DNA duplex and computed using the average ΔCp^bp = 42 cal mol-1 K-1 bp-1. CT = 75 μM for all duplexes in Table 3.8.

Table 3.8 reports helix-to-coil transition data determined by UVM for these sequences, as well as the experimental incremental thermodynamic values ∆∆G°37, ∆∆H°, ∆∆S° and ΔTm. On average, ∆∆H° is -1.6 ± 2.4 kcal mol-1 and ∆∆S° is -9.9 ± 5.3 cal mol-1 K-1, in accordance with the stability enhancement being entropically driven. Incremental thermodynamic values for the sixteen duplexes were also computed using the four-parameter SBT model and good agreement was observed between model predictions and experiment, with average model errors of -0.2 ± 0.3 kcal mol-1, -1.3 ± 1.7 kcal mol-1, -3.6 ± 5.4 cal mol-1 K-1, and 0.0 ± 1.5 °C for ∆∆G°37(error), ∆∆H°(error), ∆∆S°(error), and ∆Tm(error), respectively. As

these sequences were not part of the data set used to regress SBT model parameters, the

agreement of model predictions with experiment provides good evidence that the

assumptions upon which the SBT model was derived are sound. Moreover, the SBT model

accurately predicts ∆Tm values for an additional 19 gapmer duplexes (Table 3.9) studied

previously (38). The prediction accuracy for the combined set of 35 duplexes was ∆Tm(error) =

0.1 ± 1.4 °C (Table 3.9), similar to that reported for mixmer duplexes (Table 3.7).
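The average incremental values quoted above for the sixteen tandem-LNA duplexes, and the sense in which the stabilization is entropically driven, can be illustrated directly from the Table 3.8 columns. The sketch below is a rough consistency check only: it assumes the mean ∆∆H° and ∆∆S° can be combined at 310.15 K without any further ∆Cp correction, and the variable names are illustrative.

```python
from statistics import mean

# Incremental values for the 16 tandem-LNA duplexes, in Table 3.8 row order
DDG = [2.3, 1.6, 1.6, 2.0, 1.0, 1.4, 2.0, 0.8,
       1.9, 1.6, 1.8, 0.7, 0.7, 1.6, 1.6, 1.4]              # kcal/mol
DDH = [-2.9, -1.4, -0.8, 1.1, -5.4, -0.2, -2.7, -1.9,
       -1.3, -1.8, 0.5, -1.9, -1.0, -2.8, 0.6, -3.3]        # kcal/mol
DDS = [-16.8, -9.8, -7.9, -2.8, -20.6, -5.1, -15.3, -8.9,
       -10.1, -11.1, -4.1, -8.4, -5.3, -14.4, -3.2, -15.1]  # cal/(mol K)

T37_K = 310.15  # 37 degC in kelvin

MEAN_DDG = mean(DDG)
MEAN_DDH = mean(DDH)
MEAN_DDS = mean(DDS)

# Entropic contribution to ddG37 (kcal/mol): -T * ddS
ENTROPIC_TERM = -T37_K * MEAN_DDS / 1000.0
DDG_FROM_MEANS = MEAN_DDH + ENTROPIC_TERM

print(f"mean ddH = {MEAN_DDH:.2f} kcal/mol, mean ddS = {MEAN_DDS:.2f} cal/(mol K)")
print(f"-T*ddS = {ENTROPIC_TERM:.2f} kcal/mol, ddH - T*ddS = {DDG_FROM_MEANS:.2f} kcal/mol")
```

In the helix-to-coil convention used here, the negative mean ∆∆H° on its own would lower duplex stability; the positive -T∆∆S° term more than offsets it, and the sum is close to the mean experimental ∆∆G°37.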

As an extreme test of SBT model performance, predicted ΔTm values were compared

with experimental data for 5 heteroduplexes in which the modified single strand bore an

LNA at every position. For each of these fully substituted sequences, the experimental ΔTm is very large, falling between 31.1 and 40.4 °C. ΔTm(error) values for this prediction are

reported in Table 3.9 and the results show that, even for these extreme cases, the predicted

ΔTm values differ from the corresponding experimental ΔTm by no more than 15%. A bias is

observed, as the SBT model under-predicts the melting temperature for the five fully

substituted LNA-DNA heteroduplexes. This bias arises, at least in part, because terminal LNAs can contribute to the net stability enhancement, albeit to a lesser extent than an internal LNA, when they are flanked by additional LNAs. Indeed, ΔTm(pred) values for

fully substituted sequences differ from experiment by no more than ±6% (Table 3.9) when

the ∆∆Si° for each terminal LNA is taken to be half that reported in Table 3.5 for the

corresponding internal LNA. However, additional studies are required to establish a firm

understanding of the contribution to duplex stability made by terminal LNA(s) in highly

substituted gapmer sequences such as those found in fully substituted LNA-DNA

heteroduplexes.
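The two treatments of terminal LNAs compared above (ignored, or counted at half the internal value) differ only in the weight applied to the first and last positions when summing ∆∆Si°. A minimal sketch follows, assuming the Table 3.5 parameters and the upper-case-equals-LNA convention of the tables; the function name and the `terminal_weight` argument are hypothetical, and bases without a Table 3.5 parameter (such as D or H) are simply skipped.

```python
# Simplified SBT ddS_i parameters, cal mol^-1 K^-1 (Table 3.5)
DDS_LNA = {"A": -1.9, "T": -2.9, "G": -2.7, "C": -3.7}


def lna_dds(strand: str, terminal_weight: float = 0.0) -> float:
    """Sum ddS_i over LNAs (upper case); terminal LNAs are scaled by terminal_weight."""
    total = 0.0
    last = len(strand) - 1
    for pos, base in enumerate(strand):
        if base not in DDS_LNA:
            continue  # DNA nucleotide (lower case) or unparameterized base
        weight = terminal_weight if pos in (0, last) else 1.0
        total += weight * DDS_LNA[base]
    return total


FULL_LNA = "CTTGTGA"  # fully substituted strand listed in Table 3.9
DDS_IGNORED = lna_dds(FULL_LNA)                    # terminal LNAs ignored
DDS_HALF = lna_dds(FULL_LNA, terminal_weight=0.5)  # terminal LNAs at half weight

print(f"terminals ignored: {DDS_IGNORED:.2f} cal/(mol K)")
print(f"terminals at half weight: {DDS_HALF:.2f} cal/(mol K)")
```

The more negative ∆∆S° obtained with half-weighted termini translates into a larger predicted ΔTm, which is the direction needed to reduce the under-prediction noted for the fully substituted strands.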

Table 3.9 ΔTm prediction errors using SBT model for LNA substituted gapmer duplexes and fully modified LNA-DNA heteroduplexes.

Source                                        # of Sequences   ΔTm(error) (°C)^a
LNA substituted gapmer duplexes
This Study^b                                  16                0.0 ± 1.5
You^c                                         19                0.2 ± 1.3
Subtotal                                      35                0.1 ± 1.4
Fully substituted LNA-DNA heteroduplexes^d
CTTGTGA^e                                                      -4.1 (1.1)
TCACAAG^e                                                      -2.5 (1.8)
CGTATAGT^f                                                     -6.9 (-0.5)
CGTDTAGT^f                                                     -6.1 (0.2)
CGTDTDGT^f                                                     -5.6 (0.7)

^a ΔTm(error) reported as the mean error ± the standard deviation between the model predicted ΔTm(pred) and the experimental ΔTm. ^b Data from Table 3.8. ^c Data from You et al. (38). ^d ΔTm(pred) values predicted for fully modified LNA-DNA heteroduplexes were calculated either ignoring terminal LNA substitutions or assuming that terminal LNA substitutions contribute half the stability of internal LNAs (shown in brackets). ^e ΔTm values estimated from ΔTmax, as the high stability of the LNA-DNA heteroduplex prevented complete thermodynamic analysis. ^f Data from Rosenbohm et al. (180).

3.1.5 Substitution of standard LNAs with base-modified LNA nucleosides provides

further stability increases.

There are very few thermodynamic data available for duplexes containing the base-

modified LNA nucleotides D and H. Table 3.4 reports average incremental thermodynamic parameters for D and H substitutions determined by experiment and using the non-zero value

of ∆Cp. When compared to its standard LNA analog (A or T, respectively), either

substitution further stabilizes the duplex. However, as the chemical modification is now

within the base of the LNA, it is intended by design to alter the energetics (enthalpy) of

complementary base-pairing. Therefore, ∆∆Hi° are no longer athermal and indicate a change

in the strength of base-pairing, but not necessarily in base stacking since stacking is believed

to be primarily related to the surface area of the base, which is not changed appreciably by

the base modification (118).

The incremental entropy change for the modified LNA base D is similar to that for its

standard LNA homologue A. The D•t base pair is then further stabilized as compared to an

A•t pair by a favorable incremental enthalpy change for the hybridization reaction (-∆∆Hi°) of -0.5 ± 0.3 kcal mol-1, which is consistent with previously reported values for the

contribution to DNA duplex stability of a single hydrogen bond (185,186). This result is also

consistent with previous studies of DNA-2-aminoadenine (d) substitutions which indicate

that the increased duplex stability occurs through the creation of a third hydrogen bond

between the bases, formed between the 2-amine group introduced into adenine and the

existing 2-keto oxygen atom in thymine (187-189).

In contrast, the enhancement to duplex stability provided by H substitution when

compared to T substitution is found to be due to a more favorable incremental entropy

change, which in this case is partially compensated by an unfavorable incremental enthalpy

change. Previous studies on DNA-2-thiothymine (h) suggest the replacement of the 2-keto

oxygen with 2-keto sulfur may act to shift the base pair into a more stable A-type form (190).

Other work has suggested that the stability increase could be caused by improved stacking

and/or a change in solvation or cation interactions (191). However, the signs and magnitudes

of the regressed incremental thermodynamic parameters for the T to H substitution appear

consistent with the first argument; they also suggest a greater degree of preorganization of

both the altered base and the sugar ring of H within the single strand.

The incremental thermodynamic parameters for D and H reported in Table 3.4 can be used in the general SBT model to accurately determine ∆Tm values for mixmer and gapmer

duplexes containing both standard and base-modified LNAs. Average errors in predicted

∆Tm and Tm(LNA) values for the 20 duplexes in Table 3.2 bearing D and/or H substitutions were -0.5 ± 1.3 °C and -0.3 ± 2.1 °C, respectively.

3.1.6 D•H base pairs demonstrate pseudo-complementary properties.

Two duplexes from Table 3.1 (UD1 and UD11) were chosen to further study all

possible complementary base pairs between adenine (a, d, A and D) and thymine (t, h, T, H)

analogs. The results of thermal denaturation experiments on these duplexes are compiled in

Table 3.10. The results show that duplexes containing d•h, D•h and/or d•H base pairs have

lower Tmax values than their corresponding isosequential reference duplex containing a•t base

pairs. Duplexes containing D•H base pairs have a higher Tmax than their reference duplex

(a•t), but still exhibit net lower stability when compared to isosequences containing their three LNA•LNA base pair analogs (A•T, D•T and A•H). The D•H base pair (as well as the d•h, d•H, and D•h base pairs) is therefore found to be pseudo-complementary, likely due to steric hindrance (Figure 3.1) between the 2-amino group in d or D and the 2-thioketo group in h or H that serves to reduce base pair stability (179,181,183,192). This could prove useful in certain applications, as D•T and H•A base pairs have very high stability (Table 3.10) due to the absence of these steric hindrance effects. As a result, oligonucleotides containing A and T, A and H, and/or D and T must be designed carefully, especially in regions of self-complementarity, to ensure that highly stable secondary products are not formed. The results reported here suggest that when these undesired secondary products are observed, it may be

advantageous to redesign the required oligonucleotides exclusively with D and H

substitutions and avoid the use of A and T.

Table 3.10 UVM derived ΔTmax data for the helix-to-coil transition of duplexes with base pairs formed between a, d, A or D and t, h, T, or H.

ΔTmax (°C)

gcccXgcg / cgcYgggc
          X = a    X = d    X = A    X = D
Y = t     -         3.5      1.8      8.3
Y = h      2.8     -8.4      4.8     -3.3
Y = T      4.7      7.9     12.0     20.5
Y = H      9.4     -0.4     17.1      9.8

aacatagXttacat / atgtaaYctatgtt
          X = a    X = d    X = A    X = D
Y = t     -         0.7      2.0      3.4
Y = h      0.3     -6.6      3.8     -2.7
Y = T      3.8      4.9      6.5      9.1
Y = H      5.6     -2.8     10.0      4.0

Estimated error in ΔTmax = ± 1.0 °C. For both sequences tested, the Tmax of the duplex containing the base pair a•t at position X•Y was used as the reference to determine ΔTmax.

3.2 Conclusions

Chapter 2 reported results showing that nonlinear extension of the unified DNA NNT

model to account for the temperature dependence of ∆H° and ∆S° can improve Tm(DNA) predictions, especially at temperatures above 70 °C (177). Here, DSC and UVM data are presented which show that ∆H° and ∆S° for the helix-to-coil transition in duplexes containing LNA substitutions are also temperature dependent. The realization that ∆Cp is

non-zero is then leveraged to provide important insights into the manner by which LNA

substitutions serve to stabilize a duplex. In particular, previous studies, which treat melting

data assuming ∆Cp = 0, have concluded that the stability enhancement in duplexes

substituted with standard LNAs is enthalpically driven when ∆Tm becomes large (44).

Indeed, incremental enthalpy ∆∆H° and entropy ∆∆S° changes (Figure 3.2A) computed from the

DSC data reported here assuming ∆Cp = 0 are consistent with this observation. However,

proper accounting of the temperature dependencies of ∆H° and ∆S° shows that a negative

∆∆S° provides the stability enhancement, and that this stabilization effect is localized at the

level of the individual base pair. The results therefore suggest that neighboring bases play

little role in the net change in stability accompanying the substitution of a standard LNA into

a duplex. Though the possibility that LNA nucleobase substitutions promote subtle changes

in the base pair bonding interaction and/or stacking interactions cannot be categorically ruled

out, the results clearly show that ∆∆H° is zero within the experimental error of current

measurement techniques, including high-sensitivity DSC equipment.

Structural studies on duplexes containing one or more LNA substitutions have been

carried out to investigate possible differences in base-pairing and base-stacking between

LNA and DNA nucleotides (47-50). Regrettably, these studies have generally been

conducted at temperatures (25 to 27 °C) where stacking in both the single stranded and double stranded states occurs much more strongly than near the Tm of the duplex. As a

result, values of ΔCp are known to be significantly higher at 25 °C than at Tm (87). This

makes it difficult to relate structural data collected near room temperature to thermodynamic

data determined at Tm. However, even when conducting structural studies at these lower

temperatures, which should promote improved base-pairing and base-stacking interactions,

Jensen et al. (47) found that although LNA substitutions do alter the conformation of the

sugar within the LNA nucleotide, they do not promote significant changes in the base-

stacking when compared to an unsubstituted DNA duplex. Though limited, these structural

data are consistent with the finding that the stability enhancement mainly occurs at the level

of the individual base pair.

In accordance with both the structural data of Jensen et al. (47) and the extensive set

of DSC and UVM derived melting data reported here, melting thermodynamics of mixmer

(Table 3.7) and gapmer (Table 3.9) duplexes can be accurately predicted with a simple

group-contribution-type model that is additive at the level of the individual base pair. It is

important to note that this model structure is challenged by one study on RNA duplexes

(LNA can also be introduced into RNA) that suggests that a saturation of ∆Tm occurs with

increasing LNA substitutions prior to complete substitution of the strand with LNA (48). A

few other groups have likewise argued that the stability enhancement within duplex DNA

saturates with increasing LNA substitutions in a duplex of fixed length (49,50,193).

However, these arguments are largely based on a limited set of data that show that the

average increase in ΔTm per LNA substitution appears to decrease with increasing

substitution percentage. The SBT model, which is restricted to short complementary

oligonucleotides that show two-state melting behavior, does not predict a saturation effect

and instead predicts that ΔTm will continue to increase with each successive internal LNA

substitution by an amount specific to the type of LNA substituted. To address this

discrepancy, ΔTm was measured for a set of gapmer and LNA-DNA heteroduplexes offering increasing LNA content (Table 3.9). The data show that when the reduced contribution of terminal LNAs to stability is taken into account, ∆Tm does not saturate as LNA content

increases, a result that is well described by the SBT model. As a result, the SBT model

accurately predicts the ΔTm of gapmer duplexes containing 2, 3 or 6 neighboring LNA

substitutions. More importantly, experimental ∆Tm values for the five fully substituted LNA-

DNA heteroduplexes are slightly larger than predicted by the additive SBT model, a result

that is completely inconsistent with the idea that ∆Tm saturates (levels off) with increasing

LNA content. The saturation effect noted in a few previous studies therefore appears not to

be a general property of LNA-substituted duplexes, but may arise in certain specific

situations, including terminal LNA substitutions due to their limited contribution to duplex

stability.

Finally, the use of D and H in place of A and T in a/t rich DNA oligonucleotides has

been characterized and the results reveal they may provide an effective strategy for improved

design of more challenging oligonucleotides for which formation of stable hairpins or homo-

dimers can be problematic (34,126) due to the exceptionally high stability of the A•T base

pair (Table 3.10).

Chapter 4: Design of LNA-rich Hydrolysis Probes for Detection of

Somatic Point Mutations

Real-time quantitative PCR (qPCR) using hydrolysis probes is regularly applied in a

variety of genotyping applications (36,39,194,195) to detect SNPs. However, this powerful

technology has not been extensively used in detection and quantification of somatic point

mutations (SPM), in part because an acceptable analytical specificity (SPE) is difficult to achieve. When applied in qPCR assays, standard DNA hydrolysis probes, such as Taqman™ technology, directed against a SNP or SPM typically offer an SPE for a target mutant (MT)

gene of ca. 5% relative to the background abundance of the germline or wild-type (WT) gene

(28). As a result, the MT gene must be present in relatively high abundance within the

clinical sample, making the technology unsuitable or at least unattractive for early disease

detection or for monitoring of minimal residual disease following treatment or surgery. One

increasingly popular strategy for improving the analytical specificity of an allele-specific

(AS) probe directed against a clinically relevant SPM is to introduce one or more locked

nucleic acids (LNAs) into the probe. An example of the effectiveness of this strategy is

provided by a 16-mer LNA-bearing probe directed against JAK2 V617F; it achieves an SPE

of 2% (35). Although this represents an improvement over what can typically be achieved

using standard pure-DNA hydrolysis probes, it does not approach the SPE (≤ 0.01%)

generally required for unequivocal detection of an SPM within a standard clinical sample

containing 100 ng of total genomic DNA.

Improving the performance of hydrolysis probes used in AS qPCR assays of SPMs

therefore remains a desirable goal. One potential strategy is to increase the LNA content of

probes in a manner that improves their performance in detecting rare target alleles. Short

fully substituted 8-mer and 9-mer LNA probes are available as a commercial Universal Probe

Library engineered for use in a variety of applications, including the monitoring of gene

expression (196) and detection of miRNAs (197) and various viruses (198). By

extension, short LNA-rich probes may therefore also have efficacy in the specific and

sensitive detection of SPMs. However, current guidelines for designing multi-LNA probes

(38), though useful, lack the fundamental underpinnings required either to meet the

performance needed to detect a rare SPM or to understand when and why a highly specific

and sensitive probe cannot be designed using LNA substitutions. Here, qPCR studies are

combined with model predictions of melting thermodynamics of LNA-containing duplexes to

better understand the requirements for designing LNA-rich hydrolysis probes offering

improved analytical specificity. AS probe design guidelines are proposed based on the

hypothesis that probe positioning on the template and LNA-substitution patterns on the probe

can be used to both define Tm, the melting temperature of the duplex formed between the

probe and its fully complementary mutant template, and increase ΔTm(MT-WT) so as to permit

efficient hybridization to the MT allele while effectively eliminating probe duplexation with

the non-target WT allele at the annealing temperature Ta of the qPCR assay. In principle,

both mismatch discrimination and the SPE of the associated qPCR assay should thereby improve in comparison to either unmodified DNA probes or first-generation LNA-type probes designed using methods proposed by You et al. (38).

The novel design strategy proposed here demands accurate prediction of hybridization thermodynamics and therefore leverages the fundamental advances reported in

Chapters 2 and 3 and the journal publications (177,199) on which they are based. Improved qPCR-based detection of KIT D816V and JAK2 V617F using hydrolysis probes is set as an

objective for the purposes of design guidelines development and technology assessment.

The results indicate that LNA substitutions made according to the design criteria can be used to create probes that show little or no cross-hybridization to the non-target WT allele in an

AS PCR assay. However, though unequivocal detection of the SPM is then predicted, qPCR experiments on mixtures of MT and WT templates reveal that the SPE becomes limited by a drop in the final (end-point) fluorescence signal as the MT allele is increasingly diluted in a constant background of WT allele. As a result, SPEs below 0.5% are not observed.

Nevertheless, considerable improvements in probe performance are realized by analyzing the melting thermodynamics of pure-DNA and LNA-rich probes and utilizing those results for probe design. In turn, this has provided an improved understanding of the factors that ultimately limit the SPE of qPCR assay technology utilizing hydrolysis probes.

4.1 Materials and Methods

4.1.1 Oligonucleotides

All DNA and LNA oligonucleotides used as templates, probes and primers in UV melting (UVM) experiments and in qPCR assays were purchased from Integrated DNA

Technologies (IDT; Coralville, Iowa, USA). Dual-labeled KIT D816V and JAK2 V617F

directed probes (Table 4.3) were HPLC purified after synthesis. Primer sequences used with

the KIT D816V probes included the forward primer (FP) 5’-ctcctccaacctaatagtgtattcacag-3’

and reverse primer (RP) 5’-gcagagaatgggtactcacg-3’. Primers used for the JAK2 V617F

P16L5 probe were as previously specified (35), while the forward and reverse primers used

with the JAK2 V617F P13L10 probe were 5’-gcagcaagtatgatgagcaagc-3’ and 5’-

cagatgctctgagaaaggcattag-3’, respectively.

4.1.2 Plasmids

Mini-genes synthesized by IDT (Coralville, IA) were used to represent the KIT WT and KIT D816V alleles. The KIT WT and D816V MT plasmids were kanamycin resistant, of pUC origin, and contained a 335 bp segment of KIT spanning portions of introns 16-17 and

17-18, and all of exon 17. BRAF plasmids were ampicillin resistant and also of pUC origin, and included a 280 bp fragment of BRAF spanning from intron 14-15, through exon 15, and into intron 15-16.

JAK2 WT and V617F plasmids were created from PCR products derived from a patient that was positive for the JAK2 V617F mutation. Briefly, a 453 bp target was amplified using primers JAK2-FP, 5’-tcctcagaacgttgatggcag-3’, and JAK2-RP, 5’-attgctttcctttttcacaagat-3’. The PCR protocol consisted of an initial enzyme activation at 94

°C for 5 minutes followed by 35 cycles, each consisting of 94 °C for 1 minute, 50 °C for 1 minute and 72 °C for 1 minute. A final extension at 72 °C for 6 minutes was performed at the end of the run. The 50 µl PCR reaction contained 300 nM of primers, 3.5 mM MgCl2 and 100 ng of genomic DNA template. PCR products were analyzed on a 7% polyacrylamide gel to verify that the amplification fragments were of the correct size.

Subcloning into chemically competent TOP10 E. coli was achieved using a TOPO

TA subcloning kit (Invitrogen). 50 μl of the transfection reaction was plated on a

LB/ampicillin plate and grown overnight at 37 °C. Ten single colonies were picked and each colony was again grown overnight in LB/ampicillin media at 37 °C. Plasmid DNA was isolated using the Qiagen Midi plasmid purification kit according to the manufacturer’s instructions. Purified plasmid from each culture was individually sequenced and tested in a

Taqman®-based allelic discrimination assay (Applied Biosystems) to select the JAK2 WT

and JAK2 V617F bearing plasmids used for all studies reported here.

4.1.3 Monitoring helix-to-coil transitions with UV spectroscopy

UV melting (UVM) profiles for duplexes formed between a probe and template were

collected at a wavelength of 260 nm using a Varian (Santa Clara, CA) Cary 1E

spectrophotometer equipped with a 12-cell Peltier temperature controller. Each probe was

combined with an equimolar concentration of a single-strand fragment of WT or MT

template. Probe sequences targeting KIT MT (D816V) and JAK2 MT (V617F) are shown in

Table 4.3. The probe and the corresponding template sequences (site of variance in bold) for

either KIT WT (5’-ttcttgatgtctctggctagaccaaaat-3’) and D816V (5’-

ttcttgatgactctggctagaccaaaat-3’) or JAK2 WT (5’-ggagtatgtgtctgtggagacga-3’) and V617F

(5’-ggagtatgtttctgtggagacga-3’) were combined to a total molar strand concentration (CT) of

5 or 10 μM in a solution containing 3 mM MgCl2 and Biorad iTaq buffer (20 mM Tris-HCl,

50 mM KCl, pH = 8.4). To ensure proper duplex formation, the samples were initially

heated to 95 °C and then slowly cooled. Samples were then heated from 25 to 95 °C using a

thermal ramp rate of 0.5 °C/min and the absorbance at 260 nm (A260) was collected every 0.5

°C. Raw data were exported to Microsoft Excel™ and the thermodynamic parameters ΔH°,

ΔS° and Tm were determined as previously described in Chapter 3 and (199).

4.1.4 UVM of 9-mer duplexes to determine ΔΔTmax(MM)

UVM experiments were also performed to investigate the destabilizing effect of

LNA•DNA base pair mismatches compared to their corresponding DNA•DNA base pair

mismatches. Melting data were collected for a total of 128 9-mer duplexes comprised of 64

DNA duplexes and an additional 64 isosequential LNA-DNA heteroduplexes that contained

a single centrally located LNA substitution (Appendix A). The 9-mer duplex sequences were

chosen such that all 64 unique trinucleotide sequences (5’-x-m-y-3’) with a centrally

positioned DNA nucleotide (m = a, c, g or t) were tested by varying the neighboring nucleotides (x and y = a, c, g or t). Similarly, the same 64 9-mer sequences allowed all possible trinucleotide sequences (5’-x-M-y-3’) with a centrally located LNA nucleotide (M =

A, C, G or T) to be studied. The impact of mismatches on both the DNA duplexes and LNA-

DNA heteroduplexes was studied by designing 192 9-mer oligonucleotides containing a centrally located non-complementary (i.e. mismatch) nucleotide (n = a, c, g or t), so that all possible DNA•DNA base pair mismatches (m•n) and LNA•DNA base pair mismatches (M•n) were investigated.
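The counts quoted above (64 trinucleotide contexts and 192 centrally mismatched oligonucleotides) follow from simple enumeration. The sketch below is illustrative only: the function names are hypothetical and it enumerates the central trinucleotide contexts, not the full 9-mer sequences listed in Appendix A.

```python
from itertools import product

BASES = "acgt"
# Watson-Crick complement of each DNA base
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def trinucleotide_contexts():
    """Return every 5'-x-m-y-3' context (x, m, y each one of a, c, g, t)."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]

def central_mismatches():
    """Return (context, n) pairs in which the template base n opposite the
    central base m is NOT the Watson-Crick complement of m."""
    pairs = []
    for x, m, y in product(BASES, repeat=3):
        for n in BASES:
            if n != COMPLEMENT[m]:
                pairs.append((x + m + y, n))
    return pairs

contexts = trinucleotide_contexts()
mismatches = central_mismatches()
print(len(contexts), len(mismatches))  # 4**3 contexts, 3 mismatches per context
```

Each of the 64 contexts admits three non-complementary template bases, which gives the 192 mismatch oligonucleotides; the same enumeration applies when the central base is an LNA nucleotide.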

UVM data were collected for all duplexes at CT = 75 μM in a buffer consisting of

1 M NaCl, 10 mM Na2HPO4 and 1 mM Na2EDTA at a pH = 7. The Tm for many of the

mismatched duplexes was very low (<30 °C) and UVM data for the complete melting

transition could not be collected. Therefore, the Tmax value, which could be determined from

the maximum of dA260/dT plots, was instead recorded for all duplexes. Although Tmax ≠ Tm

(80,81), ΔTmax and ΔΔTmax values do provide good estimates of ΔTm and ΔΔTm respectively.
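As a minimal sketch of how Tmax can be read from the maximum of a dA260/dT plot, the following uses central differences on a melting curve. The function name and the synthetic sigmoidal curve are assumptions made for illustration; they are not the thesis data or the spreadsheet procedure actually used.

```python
import math

def t_max(temps, a260):
    """Temperature at which the central-difference estimate of dA260/dT
    is largest. temps and a260 are equal-length sequences."""
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (a260[i + 1] - a260[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic melting curve sampled every 0.5 deg C (illustrative, not measured)
temps = [25.0 + 0.5 * i for i in range(141)]
a260 = [0.50 + 0.10 / (1.0 + math.exp(-(t - 60.0) / 2.0)) for t in temps]
tmax_example = t_max(temps, a260)
print(tmax_example)
```

Because only the position of the steepest part of the curve is needed, this estimate remains usable when the lower baseline of the transition is not captured, which is the situation described for the low-Tm mismatched duplexes.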

4.1.5 Prediction of melting thermodynamics for probe•template duplexes

Melting thermodynamics for duplexes formed between KIT D816V or JAK2 V617F directed probes and the WT and MT target templates were predicted using the molecular thermodynamic model described in Chapter 3 and (199). In that model, Tm is calculated as

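$$T_m = \frac{\Delta H^{\circ}(T_{ref}) + \Delta C_p\,(T_m - T_{ref})}{\Delta S^{\circ}(T_{ref}) + \Delta\Delta S^{\circ}_{Mg^{2+}} + \Delta\Delta S^{\circ}_{LNA} + \Delta C_p \ln\left(T_m/T_{ref}\right) - R\ln\left(C_T/4\right)} \tag{4.1}$$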

where the heat capacity change for the helix to random-coil transition ΔCp is determined

assuming ΔCp^bp = 42 cal mol⁻¹ K⁻¹ per base pair and Tref is 53 °C (177). ∆H°(Tref) and

∆S°(Tref) were determined from published nearest-neighbor thermodynamic parameters for complementary and mismatched DNA•DNA base pairs within duplexed oligonucleotides

(116), while ∆∆S°LNA values for LNA•DNA base pairs were computed as described in (199).

∆∆S°LNA is the incremental change in the entropy gain ∆S° for the helix-to-coil transition

resulting from substitution of a nucleotide with its corresponding LNA. In calculating

∆∆S°LNA, LNA substitutions at the 3’ end of a probe were treated as internal due to the

presence of the 3’ terminal quenchers. The entropic correction to Tm for the concentration of

Mg2+, ΔΔS°Mg2+, was determined using the method reported by Owczarzy (200). Finally,

corrections to Tm to account for 5’-FAM and 3’ Black Hole Quencher 1 (BHQ1)

modifications to each probe were made using ΔΔH° and ΔΔS° parameters regressed from

melting thermodynamic data for duplexes harboring terminal fluorophores and quenchers

(201). For predictions under PCR conditions it is assumed that the CT of the probe is much

larger than the CT of the template, so that ln(CT) replaces ln(CT/4) in Equation 4.1.
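Because Tm appears on both sides of Equation 4.1, it has to be solved numerically. The sketch below uses simple fixed-point iteration and takes the equation in the form Tm = [ΔH°(Tref) + ΔCp(Tm − Tref)] / [ΔS°(Tref) + ΔΔS°Mg2+ + ΔΔS°LNA + ΔCp ln(Tm/Tref) − R ln(CT/4)], with temperatures in kelvin. The function name, argument names and example inputs are hypothetical; the nearest-neighbor sums and the Mg2+ and LNA corrections are assumed to be supplied by the caller.

```python
import math

R = 1.987  # gas constant, cal mol^-1 K^-1

def predict_tm(dH_ref, dS_ref, n_bp, c_t, ddS_mg=0.0, ddS_lna=0.0,
               t_ref_c=53.0, dcp_per_bp=42.0, probe_excess=False,
               tol=1e-6, max_iter=100):
    """Solve Equation 4.1 for Tm by fixed-point iteration; returns deg C.

    dH_ref: dissociation enthalpy at Tref (cal/mol)
    dS_ref, ddS_mg, ddS_lna: entropy terms (cal/mol/K)
    n_bp: number of base pairs (sets the total heat capacity change)
    c_t: total strand concentration (mol/L)
    probe_excess: if True, use ln(CT) instead of ln(CT/4) (PCR conditions)
    """
    t_ref = t_ref_c + 273.15
    dcp = dcp_per_bp * n_bp
    conc = c_t if probe_excess else c_t / 4.0
    tm = t_ref  # initial guess
    for _ in range(max_iter):
        numerator = dH_ref + dcp * (tm - t_ref)
        denominator = (dS_ref + ddS_mg + ddS_lna
                       + dcp * math.log(tm / t_ref) - R * math.log(conc))
        new_tm = numerator / denominator
        if abs(new_tm - tm) < tol:
            return new_tm - 273.15
        tm = new_tm
    raise RuntimeError("Tm iteration did not converge")

# Illustrative inputs only (not fitted parameters from this work)
print(predict_tm(150000.0, 420.0, 14, 5e-6))
print(predict_tm(150000.0, 420.0, 14, 0.2e-6, probe_excess=True))
```

Setting `dcp_per_bp=0.0` recovers the familiar temperature-independent expression Tm = ΔH°/(ΔS° − R ln(CT/4)), which provides a quick sanity check on any implementation.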

4.1.6 Real-time qPCR

All primers were designed using Primer3 software (21), with the reverse primer (RP)

specifically designed to target a portion of the intron sequence. Reaction volumes (20 µL)

for all qPCR experiments contained 300 nM of each primer along with 200 nM of either the

KIT D816V directed pure-DNA probe or one of the corresponding LNA probes. Biorad iQ™

Powermix was used in all experiments and all real-time PCR experiments were performed on

a Biorad (Hercules, CA) MiniOpticon system. The cycle conditions used were 95 °C for 180 seconds for enzyme activation, followed by 50 cycles each consisting of a 15 s denaturation

reaction at 95 °C and a 45 s annealing/extension reaction at Ta = 62 or 65 °C. Only threshold cycle (Cq) values ≤ 40 were considered relevant and reported.

4.2 Results and Discussion

4.2.1 Model-based design of LNA-rich AS primers offering increased ∆Tm(MT–WT)

For benchmark pure-DNA probes and putative LNA-rich AS probes annealed to their complementary fragment of KIT D816V, Table 4.1 reports melting thermodynamics at PCR conditions predicted using Equation 4.1 and the molecular thermodynamic models described in Chapters 2 and 3. Each probe is designed to span the site of variance (SOV). The benchmark pure-DNA probe (KIT-P22) is 22 base pairs in length so as to meet the required

Tm (~5 °C above Ta = 60 to 65 °C). The probe was designed in combination with a set of

associated primers required for template amplification. The primers were designed using the

software Primer3™ with the RP specifically designed to target a portion of the intron

sequence. Primer3 (202) or other similar probe design software may also be used to design hydrolysis probes, but the predictions made are generally at preset solution conditions that

differ from those used for AS-qPCR and the models do not account for thermodynamic

contributions of LNAs, terminal fluorophores and quenchers. More accurate predictions of

Tm(MT) and ΔTm(MT-WT) for DNA as well as LNA-bearing probes at AS-qPCR conditions are therefore provided by the protocol described in the Materials and Methods, and

representative predictions using this model are shown in Table 4.1. For all probes reported, a

5’ terminal guanyl nucleotide was avoided due to its ability to quench the fluorophore (203).

Table 4.1 Model based design of DNA and LNA probes for KIT D816V.

Target-Probe   Sequence^a                Tm (°C)   ΔTm(MT-WT) (°C)
KIT-P22        ttggtctagccagagtcatcaa    65.7      5.2
KIT-P19L1      tggtctagccagagTcatc       65.7      7.9
KIT-P15L3      agccagaGTCatcaa           65.9      11.0
KIT-P14L5      ccagaGTCaTCaag            67.7      12.3
KIT-P11L6      ccagaGTCATC               66.2      18.4

Thermodynamic parameters predicted for probes as described in Materials and Methods in PCR buffer containing 50 mM K+, 3 mM Mg2+ and a probe CT = 0.2 μM. ^a The location of the site of variation is in bold.

LNA substitutions enhance the stability of duplexes (44,199), and this attribute may be used to design probes shorter than KIT-P22 that still meet the required Tm criteria (65 to

70 °C) for an AS hydrolysis probe. Moreover, the model proposed here predicts that the

shortening of probe length through LNA substitutions can greatly enhance its discrimination

potential (ΔTm(MT-WT)) so as to achieve selective annealing to the target allele (Table 4.1).

Consistent with the findings of You et al. (38), additional discrimination may be achieved

through the creation of an LNA•DNA base pair mismatch, as compared to a DNA•DNA

mismatch, at the SOV. Table 4.2 reports average ΔΔTmax(MM) values derived from UVM data

that may be used within the model to correct for the contribution of each possible LNA•DNA

mismatch at the SOV. In the proposed model, the selectivity of an LNA-rich AS probe is

estimated as:

$$\Delta T_{m(MT-WT)} = T_{m(MT)} - T_{m(WT)} + \Delta\Delta T_{max(MM)} \tag{4.2}$$

where Tm(MT) and Tm(WT) are predicted using Equation 4.1 as described in Materials and

Methods. With the exception of a G•t mismatch, an LNA•DNA mismatch formed at the SOV is found to enhance discrimination (ΔΔTmax(MM) > 0).

Table 4.2 Incremental ΔΔTmax(MM) for LNA•DNA mismatch in 9-mer duplexes determined from UVM experiments.

Mutation                       Mismatch              ΔΔTmax(MM)^a (°C)
                               Probe    Template
Transition     a•t ↔ g•c       C        a            1.2 ± 2.5
                               G        t            -2.7 ± 1.6
                               T        g            1.3 ± 2.0
                               A        c            0.7 ± 3.5
Transversion   a•t ↔ t•a       A        a            3.4 ± 3.3
                               T        t            1.6 ± 1.3
               c•g ↔ g•c       C        c            3.1 ± 3.3
                               G        g            4.6 ± 1.5
               a•t ↔ c•g       C        t            2.5 ± 2.2
                               G        a            6.7 ± 1.9
                               T        c            1.9 ± 2.4
                               A        g            5.3 ± 2.5

^a ΔΔTmax(MM) = (Tmax(LNA-PM) – Tmax(LNA-MM)) – (Tmax(DNA-PM) – Tmax(DNA-MM))
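Equation 4.2 together with the averages in Table 4.2 can be expressed as a small lookup, sketched below. The dictionary holds the mean ΔΔTmax(MM) values from Table 4.2 (uncertainties omitted); the function name and the Tm inputs in the example are hypothetical, and the correction is applied only when the probe base at the SOV is an LNA (upper case).

```python
# Mean ddTmax(MM) values in deg C from Table 4.2, keyed by
# (LNA probe base at the SOV, mismatched DNA template base)
DD_TMAX_MM = {
    ("C", "a"): 1.2, ("G", "t"): -2.7, ("T", "g"): 1.3, ("A", "c"): 0.7,
    ("A", "a"): 3.4, ("T", "t"): 1.6, ("C", "c"): 3.1, ("G", "g"): 4.6,
    ("C", "t"): 2.5, ("G", "a"): 6.7, ("T", "c"): 1.9, ("A", "g"): 5.3,
}

def delta_tm_mt_wt(tm_mt, tm_wt, probe_base=None, wt_template_base=None):
    """Selectivity estimate of Equation 4.2 in deg C.

    tm_mt, tm_wt: Tm of the probe on the MT and WT templates (Equation 4.1).
    probe_base: probe nucleotide at the SOV; upper case denotes an LNA.
    wt_template_base: WT template nucleotide facing the SOV (lower case).
    """
    correction = 0.0
    if probe_base is not None and probe_base.isupper():
        correction = DD_TMAX_MM[(probe_base, wt_template_base)]
    return tm_mt - tm_wt + correction

# Hypothetical Tm values; an LNA-T facing a template t gives a T•t mismatch
print(delta_tm_mt_wt(68.0, 58.0, probe_base="T", wt_template_base="t"))
# Same probe with a DNA nucleotide at the SOV: no LNA mismatch correction
print(delta_tm_mt_wt(68.0, 58.0, probe_base="t", wt_template_base="t"))
```

The G•t entry is the only negative value in the table, so a lookup of this kind also flags the one case in which placing an LNA at the SOV is expected to reduce discrimination.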

This new modeling approach yields predictions of sufficient accuracy for in silico AS

probe design. For example, Table 4.3 compares model predictions to UVM experimental

data (values in parentheses) for a representative set of probes directed against KIT D816V or

JAK2 V617F. In all cases, predicted melting thermodynamics are in good agreement with

experiment, typically matching or slightly under-predicting the true ΔTm(MT-WT), showing that

the model provides a conservative but useful estimate of ΔTm(MT-WT). It also provides a reliable thermodynamic basis for the initial design of LNA-rich AS probes based on the

principle of establishing a ∆Tm(MT–WT) that is sufficiently large to fully or largely avoid

hybridization to the WT template while permitting duplexation to the target MT template at

Ta. Finally, it reveals specific strategies for achieving these two necessary design criteria.

Table 4.3 Model predicted and UVM experimentally determined thermodynamic parameters for KIT D816V and JAK2 V617F probes at PCR solution conditions.

Target/Probe   Sequence^a                ΔH° (kcal/mol)    ΔS° (cal/mol K)   Tm (°C)        ΔTm(MT-WT) (°C)
KIT D816V
P22            ttggtctagccagagtcatcaa    180.3             501.0             68.2           5.2
                                         (175.3 ± 3.4)     (481.9 ± 9.8)     (71.4 ± 0.1)   (6.7)
P14L5          ccagaGTCaTCaag            115.8             308.8             71.8           11.8
                                         (117.7 ± 2.7)     (312.9 ± 7.7)     (73.1 ± 0.0)   (15.1)
P11L6          ccagaGTCATC               89.1              231.6             71.5           17.3
                                         (93.5 ± 1.6)      (242.6 ± 4.4)     (73.7 ± 0.2)   (19.1)
JAK2 V617F
P16L5          ctccACagAAaCatac^b        132.1             359.4             70.0           10.1
                                         (131.1 ± 3.1)     (353.8 ± 9.2)     (72.3 ± 0.4)   (9.9)
P13L10         acaGAAACATACT             105.5             274.4             78.4           14.5
                                         (103.9 ± 9.4)     (270.6 ± 26.4)    (77.6 ± 0.7)   (14.1)

UVM performed in PCR buffer containing 50 mM K+, 3 mM Mg2+. CT = 5 μM and 10 μM for the KIT D816V and JAK2 V617F probes, respectively. Experimental thermodynamic parameters are shown in parentheses. ^a The location of the site of variation is in bold, and LNA substitutions are capitalized. ^b Previously reported probe design for JAK2 V617F by Markova et al. (35).

Based on predictions from the model, the following putative guidelines apply to the in-silico

design of AS probes offering both a required Tm and a large ΔTm(MT-WT):

1. Design DNA oligonucleotides targeting the SPM for both the sense and antisense

strands using an appropriate primer design software (Primer3, etc.).

2. Determine Tm and ΔTm(MT–WT) for both probe designs using Equation 4.1 as described

in Materials and Methods.

3. Using the benchmark DNA probe sequences, shorten the length of the probe to

between 13 and 15 base pairs through LNA substitution. In order to maximize the

effect of the mismatch, the SPM should not be located at or near the terminal ends of

the probe. 5’ terminal guanines should also be avoided.

4. LNA substitutions should start at the location of the SPM and neighboring

nucleotides. Additional LNA substitutions are then made using Equation 4.1 such that

the probe has a predicted Tm of ~65 to 70 °C under PCR conditions. The exact

number and location of LNA substitutions are dependent on the probe sequence, but

some general suggestions include:

a. LNA-C provides the largest stability increase and therefore is the most useful

for elevating Tm of probe.

b. LNA substitutions at the 5’ terminal position should generally be avoided as

they may reduce the 5’ exonuclease activity of Taq polymerase (46).

c. The generation of secondary structures within the probe itself or within one of

the two primers must be checked using appropriate software, such as

Exiqon’s Oligo Optimizer (45). If stable secondary products are predicted,

the LNA substitution pattern in the probe should be altered.

5. Once the length, sequence and LNA substitution pattern have been defined for the

probes targeting the sense or antisense strands, the ΔTm(MT–WT) of each should be

predicted using Equation 4.2 and the probe with the highest ΔTm(MT-WT) selected.

When followed, these guidelines should generate a probe offering the required Tm and a large

ΔTm(MT–WT). However, it is not known if these guidelines will also produce a probe capable of

achieving an SPE ≤ 0.01% and therefore offering the capacity for unequivocal detection of a

rare SPM in a clinical sample.
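The screening and ranking steps of these guidelines can be sketched as a simple filter applied to candidates whose Tm and ΔTm(MT-WT) have already been predicted. The candidate list below reuses the predictions in Table 4.1; the function names and the optional length filter are assumptions made for illustration, and the secondary-structure check of guideline 4c is not represented.

```python
# (name, sequence, predicted Tm in deg C, predicted dTm(MT-WT) in deg C)
# taken from Table 4.1; upper-case letters mark LNA substitutions.
CANDIDATES = [
    ("KIT-P22", "ttggtctagccagagtcatcaa", 65.7, 5.2),
    ("KIT-P19L1", "tggtctagccagagTcatc", 65.7, 7.9),
    ("KIT-P15L3", "agccagaGTCatcaa", 65.9, 11.0),
    ("KIT-P14L5", "ccagaGTCaTCaag", 67.7, 12.3),
    ("KIT-P11L6", "ccagaGTCATC", 66.2, 18.4),
]

def passes_screen(seq, tm, tm_window=(65.0, 70.0), length_range=None):
    """Apply the Tm window and the 5'-end rules to one candidate probe."""
    if not (tm_window[0] <= tm <= tm_window[1]):
        return False
    if seq[0].lower() == "g":   # avoid a 5' terminal guanine
        return False
    if seq[0].isupper():        # avoid an LNA at the 5' terminal position
        return False
    if length_range is not None:
        if not (length_range[0] <= len(seq) <= length_range[1]):
            return False
    return True

def select_probe(candidates, **screen_kwargs):
    """Return the passing candidate with the largest dTm(MT-WT), or None."""
    kept = [c for c in candidates if passes_screen(c[1], c[2], **screen_kwargs)]
    return max(kept, key=lambda c: c[3]) if kept else None

print(select_probe(CANDIDATES))
print(select_probe(CANDIDATES, length_range=(13, 15)))
```

Applied without a length restriction the ranking favors the shortest, most heavily substituted design, whereas restricting the length to 13 to 15 nucleotides as in guideline 3 selects a longer probe with a smaller predicted ΔTm(MT-WT).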

4.2.2 Performance testing of probe designs on plasmid templates

The benchmark DNA probes and putative LNA-rich AS probe designs reported in

Table 4.3 were evaluated in qPCR runs on samples containing serial dilutions of MT plasmid

template alone to determine amplification efficiencies (Figure 4.1) from relative fluorescence

unit (RFU) versus cycle number data. The pure-DNA parent probe (KIT-P22) and two LNA-

rich probes (KIT-P14L5 and KIT-P11L6) each provide for efficient amplification when

directed against KIT D816V (Table 4.4). As a simple test of the generality of the design approach, an extremely LNA-substituted probe (JAK2-P13L10) directed against JAK2 V617F

was also evaluated and displayed similar amplification efficiency (Table 4.4). Thus, the

presence of multiple LNA substitutions within the probes, including a large contiguous string

of substitutions, does not appear to alter the PCR such that incomplete or inefficient

amplification is observed.

Measured threshold cycle values CqWT are also reported for each probe

directed against the non-target WT template in the absence of the MT allele. Together, the

data in Table 4.4 permit a theoretical analytical specificity (theoretical SPE) to be determined for each

probe directed against its target allele based on the number of replicates performed and the

desired degree of confidence (24,204). For a desired confidence interval (e.g., 95%), the

theoretical SPE provided by an AS probe is given by

$$\text{theoretical SPE}\,(\%) = 100\% \times (1 + E)^{-\left(\Delta C_{q(WT-MT)} - t_c\,\sigma_{CqWT}\right)} \tag{4.3}$$

where E is the qPCR reaction efficiency estimated by standard curves using serial dilutions

of the MT template alone (Figure 4.1), ΔCq(WT-MT) is determined for reactions on 100% WT

and 100% MT template (using equivalent plasmid copy numbers) respectively, σCqWT is the standard deviation of CqWT determined from reactions performed in sextuplicate containing

only the WT template, and tc is the t-distribution critical value for the desired confidence interval (e.g., 2.01 for a one-tail 95% confidence interval).
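A short numerical sketch of this calculation follows. It assumes Equation 4.3 has the form theoretical SPE (%) = 100% × (1 + E)^−(ΔCq(WT-MT) − tc σCqWT), with E expressed as a fraction (1.0 for 100% efficiency); the function name is hypothetical, and the inputs in the example are the P22 (Ta = 65 °C) and P14L5 (Ta = 62 °C) entries of Table 4.4.

```python
def theoretical_spe_percent(efficiency, dcq_wt_mt, sd_cq_wt, t_crit=2.01):
    """Theoretical analytical specificity in percent.

    efficiency: qPCR efficiency E as a fraction (1.0 means 100%)
    dcq_wt_mt:  Cq(WT) - Cq(MT) at equal plasmid copy number
    sd_cq_wt:   standard deviation of Cq(WT) across replicates
    t_crit:     one-tail t critical value for the chosen confidence level
    """
    exponent = dcq_wt_mt - t_crit * sd_cq_wt
    return 100.0 * (1.0 + efficiency) ** (-exponent)

# KIT-P22 at Ta = 65 deg C: E = 100%, dCq = 5.6, sd(CqWT) = 0.6
print(round(theoretical_spe_percent(1.00, 5.6, 0.6), 1))
# KIT-P14L5 at Ta = 62 deg C: E = 99%, dCq = 13.9, sd(CqWT) = 1.8
print(round(theoretical_spe_percent(0.99, 13.9, 1.8), 2))
```

Under this assumed form, a larger replicate scatter in CqWT shrinks the effective Cq separation, so the reported specificity is a conservative bound and not simply (1 + E)^−ΔCq.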

Figure 4.1 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C) hydrolysis probes using serially diluted MT only plasmids (10^6 to 10^2 copies) and WT only plasmids (10^6 copies).

Table 4.4 Calculated theoretical analytical specificity of KIT D816V and JAK2 V617F hydrolysis probes at Ta = 62 °C and 65 °C.

                      Ta = 62 °C                                               Ta = 65 °C
Probe     E      CqMT         CqWT         ΔCq     Theoretical SPE^a    CqMT         CqWT         ΔCq    Theoretical SPE^a
KIT D816V
P22       100%   17.8 ± 0.2   21.2 ± 0.2   3.3     12%                  18.1 ± 0.0   23.6 ± 0.6   5.6    5%
P14L5     99%    18.5 ± 0.3   32.4 ± 1.8   13.9    0.1%                 18.9 ± 0.1   -            -      <0.01%
P11L6     100%   20.9 ± 0.7   -            -       <0.01%
JAK2 V617F
P13L10    99%    17.5 ± 0.1   26.6 ± 1.1   9.1     0.8%                 17.6 ± 0.3   -            -      <0.01%

^a Determined for a one-tail confidence interval = 95% (tc = 2.01)

For the P22 probe directed against KIT D816V, a theoretical SPE of between 5 and 12% is

predicted using Equation 4.3 (Table 4.4). This result is consistent with SPE values

previously reported for DNA hydrolysis probe based assays (205). Such SPE may be

sufficient for analyses, but clearly fall well short of that needed to reliably detect a

minority SPM. More important from the perspective of this study is the fact that the result is

consistent with the model predicted ∆Tm(MT–WT) (Table 4.1) which, as shown in Figure 4.1, is insufficient to prevent the DNA probe (P22) from cross-hybridizing to the WT template at a

Ta generally used for qPCR probe based assays (60 to 65 °C). Compared to the benchmark

P22 probe, the P14L5 probe offers a significantly larger ∆Tm(MT–WT). As shown in Table 4.4

and in accordance with the design criteria, cross-reactivity of this probe with the WT

template is greatly abated at a Ta = 62 °C and completely eliminated when Ta = 65 °C.

Interestingly, though it is not an ideal probe design (see above; it was designed to test model

predictions on a more extreme design case and is therefore characterized by a rather high Tm), the JAK2-P13L10 probe likewise offers a very large ∆Tm(MT-WT) (Table 4.3) and thus the

ability to fully discriminate its target JAK2 V617F allele from the JAK2 WT allele (Table

4.4).

The KIT-P11L6 design, in contrast to the KIT-P14L5 and JAK2-P13L10

probes, shows no statistically significant fluorescence signal during qPCR amplification of

the KIT WT plasmid alone at any of the Ta tested (Table 4.4). The theoretical SPE for the KIT-P11L6 probe, owing to the large ∆Tm(MT–WT), is therefore clearly not limited by cross-reactivity. In theory then, the performance of this AS probe is limited only by the number of cycles where qPCR data are deemed reliable, which is generally accepted as Cq ≤ 40 cycles (24). The

thermodynamic model and proposed design guidelines have thus been shown to provide a

means to generate LNA-rich AS hydrolysis probes that are highly specific for their target allele. When applied to individual templates, probes designed using this approach offer

theoretical SPE values consistent with unequivocal detection of a rare SPM in clinical samples.

However, realizing unequivocal MT allele detection in clinical samples, where the target

template may be present at low frequency in a high background of WT template, requires that

probe performance be limited by the melting thermodynamic considerations upon which the

design guidelines are based, and not by any, as yet, unspecified effect(s) that arise when

analyzing mixtures of MT and WT genes.


Figure 4.2 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C) hydrolysis probes using MT only plasmids (100% MT), WT only plasmids (100% WT) and MT plasmids diluted in a background of WT plasmids (10% and 1% MT).

4.2.3 Application of LNA-rich probes to mixtures of MT and WT alleles

The study reported above demonstrates that model-designed LNA-rich probes,

engineered to exhibit a large ∆Tm(MT–WT), yield theoretical SPE values that should allow for

unequivocal detection of rare SPM in clinical samples. However, when the KIT D816V MT

plasmid is diluted into a background of the KIT WT plasmid, which along with the MT

template is co-amplified by both primers, an SPE equal to the corresponding theoretical value is

not observed for either the KIT-P11L6 or KIT-P14L5 AS probes, nor for the JAK2-P13L10 AS

probe. In each case, the measured SPE is equal to or above 0.5% (Table 4.5), and therefore

more than an order of magnitude higher than expected based on the design criteria and

associated individual template studies.

Cross-reactivity effects can be ruled out as an explanation for this discrepancy since

the design guidelines have been shown to ameliorate this possibility. Instead, what is

observed is a reduction in the fluorescent signal strength that is proportional to the degree of dilution of the KIT D816V plasmid relative to the constant background concentration of KIT

WT (Figure 4.2). The effect is observed for both the benchmark DNA and LNA-modified

probes, but the effect is more severe for the LNA-substituted probes where cross-reactivity

with the WT is eliminated. The end-point relative fluorescence (RFUend) provides a

means to quantify this loss of signal. As shown in Figure 4.3, RFUend can be correlated with

the degree of KIT D816V plasmid dilution into a background of KIT WT plasmid using the

power function

$$\mathrm{RFU}^{end} = \mathrm{RFU}^{end}_{100\%}\,(f_{MT})^{b} \tag{4.4}$$

where RFUend,100% is the end-point relative fluorescence unit, determined as the average RFU

between cycles 38 to 40, for calibration samples where the MT template frequency (fMT) is

100%. The constant b (Figure 4.3) in Equation 4.4 is given by the slope of the log (RFUend)

vs. log (fMT) plot. As there is no cross-hybridization between the probe and the non-target

template, the true SPE of the assay can then be determined from the minimum RFUend,

denoted here as RFUCq, that can be reliably detected above background:

$$\mathrm{SPE}\,(\%) = 100\% \times \left(\frac{\mathrm{RFU}_{Cq}}{\mathrm{RFU}^{end}_{100\%}}\right)^{1/b} \tag{4.5}$$

For all non-cross-reactive LNA probes tested at Ta, similar values of b (0.61 to 0.67) were

obtained, confirming that Equation 4.4 may be generally used to determine the SPE (Table

4.5). As shown in Figure 4.3, the SPE is given by the value of fMT where the line intersects RFUCq.

Table 4.5 Experimentally determined SPE for LNA bearing probes.

Probe                Ta (°C)   RFUend at fMT = 100%   b      SPE^a
KIT D816V P14L5      65        0.36                   0.67   0.5%
KIT D816V P11L6      62        0.15                   0.65   1.5%
JAK2 V617F P13L10    65        0.30                   0.61   0.5%

^a RFUCq = 0.01 was used to determine the SPE
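The relation between the power-law fit and the experimental SPE can be illustrated numerically. The sketch assumes that Equation 4.4 has the form RFUend = RFUend(100% MT) × fMT^b with fMT expressed as a fraction, that Equation 4.5 is its inversion at RFUend = RFUCq, and that the unlabeled numerical column of Table 4.5 is the end-point RFU of the 100% MT calibration sample. The function names are hypothetical.

```python
def rfu_end(f_mt, rfu_end_100, b):
    """End-point RFU predicted by the assumed power law.
    f_mt is the MT template frequency as a fraction (1.0 means 100%)."""
    return rfu_end_100 * f_mt ** b

def experimental_spe_percent(rfu_cq, rfu_end_100, b):
    """MT frequency (percent) at which the end-point RFU falls to rfu_cq."""
    return 100.0 * (rfu_cq / rfu_end_100) ** (1.0 / b)

# KIT D816V P14L5 at Ta = 65 deg C: RFUend(100%) = 0.36, b = 0.67
spe_p14l5 = experimental_spe_percent(0.01, 0.36, 0.67)
# KIT D816V P11L6 at Ta = 62 deg C: RFUend(100%) = 0.15, b = 0.65
spe_p11l6 = experimental_spe_percent(0.01, 0.15, 0.65)
print(round(spe_p14l5, 2), round(spe_p11l6, 2))
```

Because the exponent is 1/b with b below 1, modest changes in the 100% MT end-point signal translate into larger changes in the attainable SPE, which is consistent with the weaker-signal P11L6 probe showing the poorer experimental SPE.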

Further results show that the decrease in probe signal strength is a feature inherent to

hydrolysis-probe-based qPCR detection of closely related template sequences (i.e. WT and

MT alleles differing by a single SPM) that share the same primer sites (Figure 4.2). The effect was observed for all probes designed to target either KIT D816V or JAK2 V617F. In

contrast, when an unrelated template such as BRAF WT, rather than KIT WT, serves as the

high-abundance background template during AS PCR of KIT D816V, no decrease in RFUend is observed with decreasing fMT. This is also true when both templates (BRAF WT and KIT

D816V) are co-amplified using two unique sets of primers and either the P22 or P11L6 probe

is used to detect KIT D816V amplicons; no reduction in RFUend is observed with increasing dilution of KIT D816V plasmids down to fMT = 0.1%. Thus, the drop in RFUend associated with qPCR detection of a rare MT allele in the presence of an abundance of a near-sequence-identical WT allele is not due to a limitation in one or more of the common reagents (Taq, dNTPs, Mg2+, etc.) in the qPCR mastermix. Instead, the signal suppression is specifically related to near identical templates that co-amplify under a common set of primers. From a clinical applications perspective, this suppression limits the expected performance of AS hydrolysis probes such that SPE < ca. 0.5% may be difficult to realize.

Figure 4.3 RFU^end versus MT template frequency for KIT D816V and JAK2 V617F probes. The KIT-P11L6 probe (■) at Ta = 62 °C, and the KIT-P14L5 (×) and JAK2-P13L10 (○) probes at Ta = 65 °C, are shown. Both the KIT D816V and JAK2 V617F templates were diluted into a background of their corresponding WT plasmid template. The RFUCq used to determine the SPE is also shown (―).

4.2.4 Understanding the reduction in hydrolysis probe signal strength

Figure 4.2 shows that a drop in the end-point RFU (RFU^end) with decreasing fMT occurs when either a pure-DNA or an LNA-rich hydrolysis probe directed against a MT allele is applied to samples containing both MT and closely related WT templates. However, the signal loss is more pronounced when LNA-rich hydrolysis probes are used. As a result, unlike for pure-DNA hydrolysis probes, where the SPE is determined by probe cross-hybridization with the WT gene (Figures 4.1A and 4.2A, WT-only control), the SPE for each of our LNA-rich hydrolysis probes is defined by the MT frequency at which RFU^end drops

to the background (Figure 4.3). The mechanism by which this loss in probe signal strength

occurs is not clear, but is likely related to the way in which a hydrolysis probe monitors PCR

amplification.

PCR hydrolysis probes are dual labeled oligonucleotides that rely on the inherent 5′-

exonuclease activity of the Thermus aquaticus (Taq) polymerase to hydrolyze the probe and

thereby physically separate the 5′ reporter, often 6-carboxyfluorescein (FAM), and the 3′

quencher, which is often vendor specific (for example, Black Hole Quencher™). Given that neither the fluorophore- nor the quencher-modified nucleotides are natural substrates for Taq, the 5′-exonuclease activity, and thus the probe hydrolysis rate, can be

affected by probe chemistry. More importantly, this added hydrolysis reaction, and the rate

at which it occurs, can serve to differentiate the kinetics of probe-based detection of real-time

PCR from those for SYBR-Green-based detection.

This is apparent in Figure 4.4 which shows the real-time monitoring by SYBR Green

or one of three hydrolysis probes of the qPCR-generated 137 bp amplicon from the KIT

D816V plasmid. Although SYBR-Green and the three hydrolysis probes are monitoring the

same amplification process, each of these reporters clearly provides a unique readout of the process, indicating either that they are altering the rate at which the amplification reaction proceeds or that at least some do not directly record amplification rates but instead report on the rate of a reaction that is dependent on amplification.

Figure 4.4 qPCR of KIT D816V using SYBR Green or one of three hydrolysis probes. Amplification monitored by SYBR Green (―), or the hydrolysis probes P22 (□), P14L5 (×) or P11L6 (●) is shown.

While it is not the intent of this study to fully elucidate the mechanism of hydrolysis-probe signal suppression and its dependence on LNA content (this thesis focuses on DNA/LNA thermodynamics and its use to create improved assays for SPMs), the data presented in Figures 4.3 to 4.5 are consistent with a landmark study by Wittwer et al. (206) that carefully and comprehensively analyzes the performance differences between intercalating dyes and hydrolysis probes. During qPCR, it is commonly assumed that the change in RFU recorded by the reporter molecule (dye or probe) directly monitors the

products of PCR amplification. Indeed, Wittwer et al. (206) showed both that SYBR Green provides a direct linear measure of dsDNA abundance over a wide range of template/amplicon concentrations and that RFU values, recorded by SYBR Green, increase in linear proportion to total amplification product. Figures 4.4 and 4.5 show that a statistically significant RFU, above background, from the PCR amplification of ~1,000,000 (8 × 10^-5 nM) KIT D816V plasmids is first detected by SYBR Green at ~13 cycles. In each of the next 4 to 5 cycles, the ratio RFUi+1/RFUi, where i is the cycle number, is approximately 2 (Figure 4.5), in accordance with the recorded amplification efficiency of

100% (Table 4.4). After this, per cycle increases in PCR amplification product drop rapidly.

As SYBR green provides a direct linear measurement of the quantity of dsDNA, changes in

RFU can be used to compute changes in the concentrations of the different components

(primers and template) involved in the PCR reaction. The results (Figure 4.5) show that after approximately 19 PCR cycles, the concentration of the template is within an order of magnitude of the primers. This has obvious implications, the first being that the primer concentration is no longer sufficient to support exponential amplification. In addition, the higher template concentration can result in significant self-annealing of the template (206).
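
The arithmetic behind the template and primer comparison can be sketched as follows. The 20 μl reaction volume and 200 nM primer concentration are assumptions here (they are the values quoted for the Chapter 5 reactions and may differ for this assay), and ideal doubling with no primer depletion is a deliberate simplification; the function names are illustrative only.

```python
AVOGADRO = 6.022e23  # molecules per mole


def copies_to_nanomolar(copies: float, volume_ul: float) -> float:
    """Convert a copy number in a reaction volume (microlitres) to nM."""
    moles = copies / AVOGADRO
    litres = volume_ul * 1e-6
    return moles / litres * 1e9


def template_to_primer_ratio(copies0: float, volume_ul: float, primer_nm: float,
                             cycles: int, efficiency: float = 1.0) -> float:
    """Ratio [template]/[primer] after `cycles` of amplification.

    Assumes ideal exponential growth at a fixed efficiency and ignores primer
    consumption; both are simplifying assumptions for illustration.
    """
    template_nm = copies_to_nanomolar(copies0, volume_ul) * (1.0 + efficiency) ** cycles
    return template_nm / primer_nm


start_nm = copies_to_nanomolar(1e6, 20.0)  # about 8e-5 nM, as quoted in the text
ratio_19 = template_to_primer_ratio(1e6, 20.0, 200.0, 19)
print(f"initial template: {start_nm:.1e} nM; [template]/[primer] after 19 cycles: {ratio_19:.2f}")
```

With these assumed inputs the ratio after 19 cycles lies between 0.1 and 1, which is consistent with the statement that the template is then within an order of magnitude of the primers.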


Figure 4.5 RFUi+1/RFUi per cycle for qPCR amplification at Ta = 62 °C of KIT D816V plasmid monitored by SYBR Green or hydrolysis probe. Data for SYBR Green (×) or the hydrolysis probes P22 (□) or P11L6 (●) are shown, together with the model-predicted [template]/[primer] ratio (▬).

For both pure-DNA and LNA-rich probes, results in Figures 4.4 and 4.5 show that the kinetics recorded by hydrolysis probes differ significantly from those recorded by SYBR

Green; Cq, for example, is 4 to 5 cycles larger (Figure 4.4). More importantly, comparison to the SYBR Green generated data shows that the RFU signal increase detected by hydrolysis probes continues outside the exponential phase of template amplification (Figure 4.5). The reason for this is not fully defined, though the detailed studies of Wittwer et al. (204,206) suggest that hydrolysis probes do not measure the amplification reaction directly, but instead report the rate of a reaction that depends upon the amplification reaction. That rate-defining reaction is not known, but Wittwer et al. (206) have noted that signal generation by 5′-exonuclease hydrolysis probes involves a number of reactions in addition to template amplification. These include sequence-specific hybridization of the probe to single-stranded template, as well as

hydrolysis of the reporting fluorophore off the probe and removal of the partially digested

probe from the growing extension product. While the data reported in this thesis on their

own are not sufficient to verify the findings of Wittwer et al. (206), the probe signal kinetics

reported in Figures 4.4 and 4.5 are consistent with relatively slow rates of fluorophore

cleavage and reporter release.

Finally, Figure 4.4 shows that Cq is similar for pure-DNA and LNA-rich probes

directed against the same template and operating on the same initial template concentration. Where differences in the performance of pure-DNA and LNA-rich probes can

be noted is in the slope of the RFU increase with cycle number above Cq (Figure 4.4), which

is found to decrease with increasing LNA content and, thus, decreasing probe length. To this

point, Kaiser et al. (207) demonstrated that the 5’ exonuclease activity of Taq polymerase is

lower for shorter duplexes than longer ones, with a significant reduction in cleavage activity

observed for duplexes that were 10 bp long compared to those 12 bp long. The RFU^end_MT values (Table 4.5) for the 11-mer, 13-mer and 14-mer LNA-bearing probes are consistent with this finding, suggesting that the length of the probe may play an important role in determining both the strength of the probe signal and the SPE that can be achieved with the probe when applied to the detection of a rare SPM.

While not the dominant effect, differences in probe hybridization efficiency may also contribute to the observed reduction in RFU^end_MT. A thermodynamic analysis of probe hybridization under PCR conditions was therefore conducted. Data in Chapters 2 and 3 show that the change in melting enthalpy ΔH° of a complementary pure-DNA or LNA-substituted duplex increases in proportion to duplex length. As a result, dsDNA formed with a long pure-DNA probe such as P22 has a significantly higher ΔH° than that formed from a shorter LNA-rich probe (e.g. KIT-P11L6) designed to melt at the same Tm (Table 4.3). The significance of this effect is shown in Figure 4.6, where the shape of the fractional curve (fraction of strands in the dsDNA state) around Tm is computed as a function of ΔH°. Probe•template duplexes having a lower ΔH° display broader melting transitions. Despite the KIT-P11L6 probe having a slightly higher Tm than the P22 probe, the fraction of dsDNA (αMT) at Ta = 62 °C is therefore higher for P22 (αMT = 0.89) than for P11L6 (αMT = 0.81). To understand the relationship between αMT and the signal strength, RFU^end_MT for the P22, P14L5 and P11L6 probes was determined at Ta values ranging from 57 to 70 °C. Ta values were then converted to αMT values for each probe using data from Figure 4.6. Based on this analysis, probe signal strength (RFU^end_MT) is found to increase with αMT, especially when αMT rises above 90% (Figure 4.7). Thus, hybridization efficiency likely also contributes to differences in end-point probe signal strength. However, a comparison of RFU^end_MT values for the P22, P14L5 and P11L6 probes at the same αMT indicates that the contribution is small.
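
The dependence of the duplex fraction on ΔH° can be illustrated with a minimal two-state calculation. The sketch below assumes a bimolecular, non-self-complementary transition with equimolar strands and a temperature-independent ΔH° (so the ΔCp correction applied in this work is ignored). The enthalpies and the common Tm are hypothetical values chosen only to show the trend; they are not the parameters of Table 4.3.

```python
import math

R_CAL = 1.987  # gas constant, cal mol^-1 K^-1


def duplex_fraction(temp_c: float, tm_c: float, dh_melt_cal: float) -> float:
    """Fraction of strands in the duplex state for a two-state transition.

    dh_melt_cal is the (positive) melting enthalpy in cal/mol, treated as
    temperature independent. The strand concentration is absorbed into Tm.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # q = K(T) * CT / 2; alpha = 0.5 at Tm requires q = 2 there (van't Hoff scaling).
    q = 2.0 * math.exp((dh_melt_cal / R_CAL) * (1.0 / t - 1.0 / tm))
    # Solve q * (1 - alpha)**2 = alpha for the root lying between 0 and 1.
    return ((1.0 + 2.0 * q) - math.sqrt(1.0 + 4.0 * q)) / (2.0 * q)


# Hypothetical enthalpies with an identical Tm of 66 °C, evaluated at Ta = 62 °C.
for label, dh in (("long, high-enthalpy probe", 160_000.0),
                  ("short, low-enthalpy probe", 90_000.0)):
    print(label, round(duplex_fraction(62.0, 66.0, dh), 3))
```

In this simplified model the lower-enthalpy duplex has the smaller bound fraction at the same distance below Tm, which is the qualitative behaviour described above for short LNA-rich probes.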


Figure 4.6 Theoretical fractional curves determined for KIT D816V hydrolysis probes. P22 probe αMT (▬) and αWT (▪▪▪), P14L5 probe αMT (▬) and αWT (▪▪▪) and P11L6 probe αMT (▬) and αWT (▪▪▪) are shown. Fractional curves were generated using thermodynamic parameters from Table 4.3 at PCR conditions with probe CT = 0.2 μM, and ΔH°(Tm) values were corrected using ΔCp = 42 cal mol^-1 K^-1 bp^-1.


Figure 4.7 Signal strength for KIT D816V DNA and LNA-bearing probes. KIT D816V P22 (□), P14L5 (●) and P11L6 (×) hydrolysis probes as a function of predicted bound fraction (αMT).

4.3 Conclusions

Both pure-DNA hydrolysis probes and sparsely LNA-substituted hydrolysis probes have previously been described and used to identify and differentiate genes or genotypes.

However, when probes of either type have been applied to the detection of a rare allele bearing an SPM, none has yielded an SPE of less than 2%, and few show an SPE less than 5%. As these previous probes were not designed to maximize ΔTm(MT–WT) and thereby eliminate cross-hybridization to the non-target WT allele, it was reasonable to assume that significant improvements in SPE could be achieved through probe designs that satisfy those thermodynamic criteria. Indeed, the LNA-bearing probes tested here were found to significantly reduce or eliminate cross-reactivity with the non-target WT template, with the

short probe KIT-P11L6 showing the highest ΔTm(MT–WT) and also the greatest range of Ta where no cross-reactivity was observed.

However, the work reported here identifies significant non-thermodynamic limitations to the performance of hydrolysis probes when applied to sensitive detection of

SPMs. Probe signal strength was found to be an important factor in determining the ultimate

SPE of an LNA-bearing probe, with the longer LNA-rich probes KIT-P14L5 and JAK2-P13L10 having a higher RFU^end_MT and a better SPE than the shorter KIT-P11L6 probe. As a

result, if an SPE less than 0.5% is to be realized, the guidelines to LNA-rich AS probe design

proposed in this Chapter will need to be augmented with additional rules and chemistries that

permit highly efficient cleavage of the 5’ fluorophore in a length-independent manner.

Additional, though smaller improvements may be realized by identifying ways to maintain a

sharp melting transition while achieving a ΔTm(MT–WT) sufficient to exclude cross-

hybridization. This would permit an αMT ≈ 1 for the probe•target-allele duplex at Ta.

Melting thermodynamic data reported in Chapter 3 for the novel 2,6-diaminopurine and 2-thiothymidine base-modified LNAs suggest that these attributes may be engineered into hydrolysis probes through the precise substitution of these two super-bases, possibly in conjunction with substitutions of conventional LNAs.

However, the development of improved fluorophore cleavage chemistries and appropriate super-base substitution rules was not considered further in this thesis, in large part because an alternate strategy for using LNAs to achieve unequivocal detection of rare

SPMs was explored, and it proved to be exceptionally powerful and versatile. That strategy and results from it are reported in Chapter 5.

Chapter 5: Novel Plexor™ Multi-LNA Allele Specific Primers for

Unequivocal Clinical Detection of Somatic Point Mutations: Design Rules and Application to JAK2 V617F, KIT D816V and BRAF V600E

DNA AS primer qPCR assays can, on occasion, provide sufficient discrimination to

permit unequivocal detection of a SPM (29). Typically, however, the lowest SPE that can be

detected with statistical confidence as being different from the signal generated from

unwanted amplification of the WT allele alone, lies near or above 1% (i.e. a MT to WT allele

copy frequency of 10^-2) (30-33). This SPE falls well short of that required to accurately and unambiguously detect a rare SPM within a typical clinical sample containing 100 ng of genomic DNA. AS primers containing a single LNA substitution at or near the 3’ terminus have been shown to provide better discrimination than DNA AS primers (41,42,161); however, such AS primer designs are not guaranteed to provide the required SPE for unequivocal detection of SPMs. AS primers bearing multiple LNA substitutions may provide a means to design AS primers with LNA patterns that can consistently achieve the SPE required for unequivocal detection of SPMs. Although studies involving primers containing multiple LNA substitutions have been performed (174,184), they do not provide definitive rules for optimal positional placement of LNAs, especially for application as AS primers. Instead, they provide some general insights, including the observation that the introduction of a large number of LNA substitutions within a primer can prevent amplification, a trait that has been exploited in the design of LNA-based blocking oligonucleotides (208-211).

This chapter reports on a comprehensive study of single and multiple LNA substitutions at or near the 3’ terminus of a primer. Positions in the primer are identified where Taq polymerase activity appears sensitive to the chemical and structural changes

provided by the 2’-O, 4’-C methylene bridge within LNA nucleotides that restricts the ribose

sugar to its C3’-endo (N-type) conformation and locally shifts the duplex into the A form

(47). Guidelines for the positional placement of LNAs are proposed and then used to create a set of multi-LNA AS primers directed against the JAK2 V617F SPM that demonstrate an

SPE < 0.01% and thus meet the criteria for clinically unequivocal SPM detection.

The general utility of the design guidelines is established through creation of AS primers directed against KIT D816V and BRAF V600E, respectively. These two additional clinically relevant targets, which were not used to establish the putative AS primer design guidelines, show that unequivocal detection of a rare SPM can be achieved using one or more of the LNA substitution patterns proposed in this work.

Finally, the technology is adapted for clinical use by combining the proposed multi-LNA substitution patterns with Promega’s Plexor™ qPCR reporter technology to establish a novel platform for creating absolute multi-LNA AS primers that offer several advantages, including the need for only two primers to achieve unequivocal and specific quantitation of an amplified allele, and the ability to confirm the specificity of the amplification product through post-PCR melting curve analysis. No probes or sequence-specific blocking agents are required, reducing the cost and complexity of the assay. When the Plexor multi-LNA AS primers are applied to detection of JAK2 V617F within clinical samples, an SPE < 0.01% is again observed, which represents a three-order-of-magnitude improvement in SPE relative to the hydrolysis-probe-based assay it has now replaced in the Cancer Genetics Laboratory of the British Columbia Cancer Agency (BCCA).

5.1 Materials and Methods

5.1.1 Design and testing PCR primers with single and multiple LNA substitutions

A mini-gene spanning nucleotides 755 to 851 of the human BCL2 gene (Accession:

NM_00063) and all pure DNA and LNA substituted primers were synthesized by Integrated

DNA Technologies (Coralville, IA). A set of reverse primers (RP) directed against the BCL2

mini-gene (Table 5.1) were designed by shifting the primer position by one base to permit

each LNA nucleobase (A, T, G or C) to be introduced at each of the last five 3’-end positions

of the primer, either as a single substitution or as part of a combination of LNA substitutions.

The sequences of all forward and reverse primers used in these studies are reported in Tables

5.1 and 5.3.
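
The primer naming used in the tables (L0 for a 3’-terminal LNA, L123 for LNAs at 3’-1, 3’-2 and 3’-3, and so on) can be derived mechanically from a sequence string. The sketch below assumes primers are written 5’ to 3’ with LNA bases in upper case and DNA bases in lower case, as they appear in Tables 5.1 and 5.3; the helper names are hypothetical.

```python
def lna_positions(primer: str) -> list[int]:
    """Return offsets from the 3' end (0 = 3'-terminal base) of upper-case bases.

    Assumes the primer is written 5'->3' with LNA bases in upper case and DNA
    bases in lower case.
    """
    n = len(primer)
    return sorted(n - 1 - i for i, base in enumerate(primer) if base.isupper())


def design_label(primer: str) -> str:
    """Build a label such as 'L123' from the LNA offsets ('' for a pure-DNA primer)."""
    offsets = lna_positions(primer)
    return "L" + "".join(str(k) for k in offsets) if offsets else ""


print(lna_positions("gacatctcggcgAagtcgcg"))  # single substitution at 3'-7
print(design_label("ctggacatctcggcgaAGTc"))   # triple substitution
```

The label is only unambiguous for offsets below 10, which covers every design considered in this chapter.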

Real-time qPCR experiments used to establish standard and multi-LNA primer

designs were conducted on a Biorad MyiQ qPCR instrument. PCR products were monitored

using either SYBR green or a hydrolysis probe synthesized with a 5’ 6-carboxyfluorescein

(FAM) reporter dye and a 3’ Black Hole Quencher-1 (BHQ1) group (5’ FAM-

cgacgacttctcccgccgct-BHQ1 3’). Each 20 μl reaction consisted of primers (200 nM), a

specified amount of template, and either Biorad iQ SYBR green Supermix or Biorad iQ

Supermix containing 200 nM of added hydrolysis probe. The PCR protocol consisted of 3

min at 95 °C, followed by 40 reaction cycles comprised of denaturation at 95 °C for 30

seconds, annealing at 60 °C for 30 seconds, and extension at 72 °C for 30 seconds. Melt

curves of PCR products were measured for the SYBR green monitored reactions to confirm

generation of a single amplicon.

Two different methods were used to determine PCR efficiency. In the first method, a

standard curve was generated by plotting the experimentally determined quantification cycle

(Cq) against the logarithm of the BCL2 plasmid template concentration, which was varied

over seven orders of magnitude by serial dilution. The PCR efficiency of the given primer

set was then determined from the slope of the standard curve.
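
A minimal sketch of the standard-curve calculation is given below. It uses the conventional relation E = 10^(-1/slope) - 1 between the slope of Cq versus log10(template) and the per-cycle efficiency. The dilution series is synthetic (generated for perfect doubling) and is not data from this work; the function name is illustrative.

```python
import math


def efficiency_from_standard_curve(log10_conc, cq):
    """Least-squares slope of Cq vs log10(template), and E = 10**(-1/slope) - 1."""
    n = len(cq)
    if n < 2 or n != len(log10_conc):
        raise ValueError("need at least two matched points")
    mean_x = sum(log10_conc) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    return slope, 10.0 ** (-1.0 / slope) - 1.0


# Synthetic seven-point, ten-fold dilution series with perfect doubling per cycle.
logs = [float(k) for k in range(1, 8)]
cqs = [35.0 - math.log2(10.0) * x for x in logs]
slope, eff = efficiency_from_standard_curve(logs, cqs)
print(f"slope = {slope:.3f}, efficiency = {100 * eff:.1f}%")
```

A slope near -3.32 cycles per decade corresponds to an efficiency of 100%; shallower or steeper slopes give efficiencies above or below that value, respectively.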

In the second approach, which used a predetermined BCL2 template concentration,

eight identical reaction mixtures containing a given primer set were loaded across a PCR

plate to permit monitoring of amplification at different annealing temperatures (Ta). The

PCR followed the protocol described above, except the gradient feature of the machine was

used to vary the Ta from 48 to 72 °C across the length of the block. The efficiency of a PCR

using an LNA containing primer (ELNA) was then estimated from the average Cq recorded for

the LNA-bearing RP/FP set at Ta ≤ 60 °C (CqLNA) and for the corresponding isosequential DNA RP/FP set (CqDNA):

$$E_{LNA} = \left[(1 + E_{DNA})^{Cq_{DNA}/Cq_{LNA}}\right] - 1 \qquad (5.1)$$

with EDNA determined using the standard curve method described above. For several LNA- bearing primer sets, both approaches were applied and good agreement between estimated

PCR efficiencies was observed (Table 5.1).
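
Equation 5.1 is read here as E_LNA = (1 + E_DNA)^(CqDNA/CqLNA) - 1, which follows from requiring both primer sets to reach the same threshold amount from the same starting template, i.e. (1 + E_LNA)^CqLNA = (1 + E_DNA)^CqDNA. The sketch below implements that reading; the Cq values in the example are hypothetical.

```python
def lna_efficiency(e_dna: float, cq_dna: float, cq_lna: float) -> float:
    """Equation 5.1 as read above: E_LNA = (1 + E_DNA) ** (Cq_DNA / Cq_LNA) - 1.

    e_dna is the efficiency of the isosequential DNA primer set as a fraction
    (0.97 for 97%); cq_dna and cq_lna are the average Cq values of the two sets.
    """
    if cq_dna <= 0 or cq_lna <= 0:
        raise ValueError("Cq values must be positive")
    return (1.0 + e_dna) ** (cq_dna / cq_lna) - 1.0


# Hypothetical example: a two-cycle delay relative to a 97%-efficient DNA primer set.
print(f"{100 * lna_efficiency(0.97, 20.0, 22.0):.0f}%")
```

Equal Cq values return E_DNA unchanged, and any delay in Cq for the LNA-bearing set lowers the estimated efficiency.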

5.1.2 Testing of JAK2 WT and MT (V617F) AS primers on plasmids

JAK2 WT and V617F plasmids were created as described previously in Section 4.1.2.

Testing of JAK2 AS primers was performed using Biorad iQ SYBR green Supermix and 200

nM of both the AS and common primers. All assays used 5 × 10^-3 ng (~1 million copies) of

plasmid and the PCR protocol was that used in the single and multiple LNA primer

experiments described above, with Ta either set to 60 °C or varied over the range 55 to 70 °C to determine amplification efficiency.

5.1.3 Testing of KIT and BRAF plasmids with AS primers

Mini-genes synthesized by Integrated DNA Technologies (Coralville, IA) were used

to represent the KIT WT and D816V MT alleles, and also the BRAF WT and V600E MT

alleles. The resulting KIT WT and D816V MT plasmids were kanamycin resistant, of pUC origin, and contained a 335 bp segment of KIT spanning portions of introns 16-17 and 17-18,

and all of exon 17. BRAF WT and V600E MT plasmids were ampicillin resistant and also of

pUC origin, and included a 280 bp fragment of BRAF spanning from intron 14-15, through exon

15, and into intron 15-16.

LNA AS primers against KIT WT and D816V MT alleles and all pure DNA primers directed against either BRAF or KIT were synthesized by Integrated DNA Technologies.

LNA AS primers against BRAF WT and V600E MT alleles were synthesized by Exiqon Inc.

(Vedbæk, Denmark). The common FP or RP used in combination with each AS primer was designed to hybridize to a portion of an intron within the mini-gene template sequence on the plasmid.

All qPCR experiments on KIT and BRAF alleles were performed on a Biorad MiniOpticon instrument using the same reaction conditions (Biorad iQ SYBR green Supermix, 200 nM of each primer, 10^6 plasmid copies) and qPCR protocol (Ta = 60 °C) as used for the

JAK2 related experiments described above.

5.1.4 Genomic DNA isolation

Patients requiring testing for a myeloproliferative neoplasm (MPN) had peripheral

blood submitted to the Cancer Genetics Lab at the BCCA. According to the World Health

Organization (WHO), MPNs are classified in two groups: those that are BCR-ABL fusion

positive such as in chronic myeloid leukemia (CML), and those that are BCR-ABL fusion

negative such as in myelofibrosis (MF), polycythemia vera (PV) and essential

thrombocythemia (ET). BCR-ABL fusion negative patients were tested for the JAK2 V617F mutation by isolating genomic DNA from 220 µl of peripheral blood using the FujiFilm

QuickGene-810 kit, a column based extraction method. Peripheral blood samples were frozen at -80 °C until a sufficient number was collected for testing. Each sample was then thawed and 30 μl of enzyme dilution buffer (EDB) and 250 μl of lysis buffer (LDB) were added. The diluted peripheral blood samples were incubated at 57 °C for 2 minutes, after which 250 μl of 100% ethanol (EtOH) was added to each tube and the mixture vortexed for

10 seconds. The samples were then loaded onto the Fuji QuickGene-810 extractor to isolate genomic DNA, which was quantified using a Nanodrop DNA spectrophotometer (Thermo

Scientific, Waltham, MA, USA).

Genomic DNA was also isolated from 96 patients with non-hematologic conditions.

EDTA whole blood treated with red blood cell (RBC) lysis buffer was spun to pellet white blood cells (WBC). Approximately 1 × 10^7 WBCs were taken and treated with cell lysis

reagent ± proteinase K in an overnight digest. Protein precipitate solution was then added and the resulting mixture vortexed and spun at 15,000 rpm for 5 minutes. The supernatant was drawn off and mixed with an equal volume of 100% isopropanol. The mixture was again spun at 15,000 rpm for 5 minutes and the recovered DNA pellet was washed with

70% ethanol, dried and resuspended in TE for quantitative analysis of nucleic acid concentration using a Nanodrop DNA spectrophotometer.

5.1.5 Plexor multi-LNA AS primer assay

Two primers, L123 WT0 and L123 MT0, directed against JAK2 WT and MT

(V617F) were selected for further modification with Plexor technology and subsequent

performance testing. Each modified primer was synthesized by IDT with the addition of

FAM-isodC at the 5’ terminus. Serial dilutions were created of the JAK2 WT or MT plasmid

alone or in a background of the corresponding JAK2 MT or WT plasmid. The Plexor-based

PCR protocol developed in this work uses 20 μl reaction mixtures containing Promega

Plexor mastermix pre-incubated with Clontech Taqstart Antibody (0.16 μl for every 10 μl of

mastermix) and 250 nM of the AS and common primers (Plexor primers were resuspended

in MOPS-EDTA buffer). The addition of Mg2+ (1.5 mM) to the final mix then provided for

efficient and specific amplification. The PCR protocol consisted of 3 minutes at 95 °C to activate the enzyme followed by 50 cycles, each comprised of 95 °C for 15 seconds and 65 °C for 35 seconds. Post PCR melt data were acquired at the end of each run, with data collected from 50 to 95 °C at a scan rate of 1 °C/min.

Genomic DNA extracted from each clinical sample was diluted to 20 ng/μl and 5 μl

(100 ng) of that mixture was used for the 20 μl reaction. For each patient, two WT reactions

and two MT (V617F) reactions were performed on an ABI 7900 qPCR instrument (Applied

Biosystems, Foster City, CA). Raw amplification data were transferred to Plexor analysis

software for calculation of Cq and ΔCq. Melt curves were also acquired, and the peak derived

from the derivative of each curve was used to determine Tm and to confirm the generation of

a single amplicon.
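
To put the 100 ng input in context, the number of allele copies it contains, and hence the number of MT copies expected at a given allele frequency, can be estimated as sketched below. The figure of roughly 3.3 pg per haploid human genome is a commonly used approximation and is an assumption here, as are the function names.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in picograms


def haploid_copies(mass_ng: float) -> float:
    """Approximate number of haploid genome copies in a mass of human genomic DNA."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(mass_ng: float, mutant_fraction: float) -> float:
    """Expected MT allele copies for a given MT allele frequency (0 to 1)."""
    return haploid_copies(mass_ng) * mutant_fraction


print(round(haploid_copies(100.0)))                    # allele copies in a 100 ng input
print(round(expected_mutant_copies(100.0, 1e-4), 1))   # MT copies at 0.01% frequency
```

On this estimate a 100 ng reaction holds about 30,000 allele copies, so an allele frequency of 0.01% corresponds to only a few MT copies per reaction.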

5.1.6 Benchmark hydrolysis-probe based assay of JAK2 V617F

Clinical samples were also tested using a multiplexed MGB-hydrolysis-probe based assay developed and applied at the BCCA. Each 25 μl reaction consisted of 100 ng of extracted genomic DNA, 300 nM of primers FP 5’-gagcaagctttctcacaagcat-3’ and RP 5’-gcattagaaagcctgtagttttacttactct-3’, and 250 nM of dual labeled hydrolysis probes specific for the JAK2 WT (5’-FAM-cacagacacatactc-MGB-NFQ-3’) and JAK2 V617F (5’-VIC-ccacagaaacatactc-MGB-NFQ-3’), respectively. qPCR was performed on an ABI 7900 qPCR instrument (Applied Biosystems) using the following protocol: 50 °C for 2 minutes, 95 °C for 10 minutes, and 40 cycles, each comprised of 95 °C for 15 seconds and 62 °C for 90 seconds.

5.1.7 JAK2 MutaQuant assay

Clinical samples were also analyzed with the commercially available JAK2

MutaQuant assay (Ipsogen, Luminy Biotech, Marseille, France) performed in single-plex using the protocol provided by the manufacturer. All reactions were performed in duplicate, with the results from the single-plex reactions compared to standard curves to determine the mutant frequency in each patient sample. A confidence interval of 95% relating to a manufacturer’s stated SPE of 0.21% was used to classify positive samples.

Table 5.1 PCR efficiencies Eexpt for amplification of the BCL2 plasmid mini-gene using pure-DNA and LNA-substituted primers.

Name       Sequence               LNA  Position  Eexpt (SYBR / Probe / Cq based)  Eexpt (Average)

Reverse Primers with LNA Substitutions(a)
RP1        gacatctcggcgaagtcgcg   -    -         97%, 97%                         97%
RP1 L0     gacatctcggcgaagtcgcG   G    3'        94%, 99%                         97%
RP1 L1     gacatctcggcgaagtcgCg   C    3'-1      97%, 101%                        99%
RP1 L2     gacatctcggcgaagtcGcg   G    3'-2      96%, 99%                         97%
RP1 L3     gacatctcggcgaagtCgcg   C    3'-3      95%, 99%                         97%
RP1 L4     gacatctcggcgaagTcgcg   T    3'-4      96%, 97%                         96%
RP1 L7     gacatctcggcgAagtcgcg   A    3'-7      98%                              98%
RP1 L8     gacatctcggcGaagtcgcg   G    3'-8      97%                              97%
RP2        ggacatctcggcgaagtcgc   -    -         96%, 95%                         95%
RP2 L0     ggacatctcggcgaagtcgC   C    3'        96%, 95%                         96%
RP2 L1     ggacatctcggcgaagtcGc   G    3'-1      95%, 101%                        98%
RP2 L2     ggacatctcggcgaagtCgc   C    3'-2      95%, 94%                         94%
RP2 L3     ggacatctcggcgaagTcgc   T    3'-3      95%, 100%                        98%
RP2 L4     ggacatctcggcgaaGtcgc   G    3'-4      93%, 98%                         95%
RP2 L5     ggacatctcggcgaAgtcgc   A    3'-5      94%, 90%                         92%
RP2 L6     ggacatctcggcgAagtcgc   A    3'-6      85%, 91%                         88%
RP2 L7     ggacatctcggcGaagtcgc   G    3'-7      99%                              99%
RP3        tggacatctcggcgaagtcg   -    -         93%, 91%                         92%
RP3 L0     tggacatctcggcgaagtcG   G    3'        90%, 90%                         90%
RP3 L1     tggacatctcggcgaagtCg   C    3'-1      90%, 89%                         90%
RP3 L2     tggacatctcggcgaagTcg   T    3'-2      92%, 89%                         91%
RP3 L3     tggacatctcggcgaaGtcg   G    3'-3      92%, 90%                         91%
RP3 L4     tggacatctcggcgaAgtcg   A    3'-4      93%, 91%                         92%
RP4        ctggacatctcggcgaagtc   -    -         98%, 99%                         99%
RP4 L0     ctggacatctcggcgaagtC   C    3'        94%, 94%                         94%
RP4 L1     ctggacatctcggcgaagTc   T    3'-1      96%, 100%                        98%
RP4 L2     ctggacatctcggcgaaGtc   G    3'-2      97%, 99%                         98%
RP4 L3     ctggacatctcggcgaAgtc   A    3'-3      95%, 100%                        97%
RP4 L4     ctggacatctcggcgAagtc   A    3'-4      95%, 100%                        97%
RP5        gctggacatctcggcgaagt   -    -         102%, 105%                       104%
RP5 L0     gctggacatctcggcgaagT   T    3'        101%, 107%                       104%
RP5 L1     gctggacatctcggcgaaGt   G    3'-1      101%, 107%                       104%
RP5 L2     gctggacatctcggcgaAgt   A    3'-2      100%, 104%                       102%
RP5 L3     gctggacatctcggcgAagt   A    3'-3      103%, 109%                       106%
RP5 L4     gctggacatctcggcGaagt   G    3'-4      95%, 104%                        100%
RP5 L5     gctggacatctcggCgaagt   C    3'-5      83%, 84%                         83%
RP5 L6     gctggacatctcgGcgaagt   G    3'-6      88%                              88%
RP6        ggctggacatctcggcgaag   -    -         95%, 98%                         97%
RP6 L0     ggctggacatctcggcgaaG   G    3'        90%, 95%                         92%
RP6 L1     ggctggacatctcggcgaAg   A    3'-1      94%, 101%                        98%
RP6 L2     ggctggacatctcggcgAag   A    3'-2      94%, 101%                        97%
RP6 L3     ggctggacatctcggcGaag   G    3'-3      93%, 98%                         95%
RP6 L4     ggctggacatctcggCgaag   C    3'-4      89%, 97%                         93%
RP6 L5     ggctggacatctcgGcgaag   G    3'-5      68%, 73%                         70%
RP6 L7     ggctggacatctCggcgaag   C    3'-7      97%                              97%
RP6 L8     ggctggacatcTcggcgaag   T    3'-8      98%                              98%
RP7        tggctggacatctcggcgaa   -    -         98%, 97%                         98%
RP7 L0     tggctggacatctcggcgaA   A    3'        95%, 101%                        98%
RP7 L1     tggctggacatctcggcgAa   A    3'-1      97%, 96%                         97%
RP7 L2     tggctggacatctcggcGaa   G    3'-2      96%, 94%                         95%
RP7 L3     tggctggacatctcggCgaa   C    3'-3      96%, 102%                        99%
RP7 L4     tggctggacatctcgGcgaa   G    3'-4      88%, 99%                         93%
RP7 L7     tggctggacatcTcggcgaa   T    3'-7      98%                              98%
RP7 L8     tggctggacatCtcggcgaa   C    3'-8      94%                              94%

Forward Primers with LNA Substitutions
FP L7(b)   gccacctgtggTccacct     T    3'-6      96%                              96%
FP2(c)     tgccacctgtggtccacc     -    -         97%                              97%
FP2 L6(c)  tgccacctgtggTccacc     T    3'-5      80%                              80%

Eexpt values obtained using each readout and computational method (see Materials and Methods section) are recorded, and an average Eexpt value was computed in cases where multiple methods were applied.
(a) The forward primer 5’-gccacctgtggtccacct-3’ was used in these experiments.
(b) The RP6 primer was used in this experiment.
(c) The RP7 primer was used in these experiments.

5.2 Results and Discussion

5.2.1 The impact of LNA substitutions in the 3’ region of a primer

Table 5.1 reports PCR efficiency (Eexpt) values for amplification of the BCL2 plasmid

mini-gene using pure-DNA RPs directed against BCL2 or variants of those same primers

modified through substitution of a single LNA within base positions 3’ (denoted as an L0

substitution) to 3’-8 (L8). Average Eexpt values as a function of the substituted base type and

position are reported in Table 5.2. The combined results indicate that any single LNA

substitution near a primer’s 3’ terminus is generally well tolerated by Taq polymerase. The

exception is L5 substitutions, for which Eexpt values between 70 and 92% were recorded, indicating that the complex formed between Taq polymerase and the primer•template duplex may be sensitive to LNA-induced perturbations to nucleotide chemistry and primer•template duplex structure at this position.
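
The positional averages can be reproduced directly from the individual entries of Table 5.1. The sketch below does this for the four primers carrying a single LNA at 3’-5, assuming that the reported spread is a sample standard deviation (an assumption, since the statistic is not named in this section).

```python
from statistics import mean, stdev

# Average Eexpt (%) for the four single substitutions at 3'-5 listed in Table 5.1:
# RP2 L5, RP5 L5, RP6 L5 and the forward primer carrying its LNA at 3'-5.
l5_efficiencies = [92.0, 83.0, 70.0, 80.0]

l5_mean = mean(l5_efficiencies)
l5_sd = stdev(l5_efficiencies)  # sample standard deviation (n - 1 denominator)
print(f"3'-5 (L5): {l5_mean:.0f} ± {l5_sd:.0f}% (n = {len(l5_efficiencies)})")
```

Under that assumption the result matches the 81 ± 9% reported for the 3’-5 position.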

Table 5.2 Average Eexpt for amplification of the BCL2 plasmid mini-gene reported as a function of LNA base type and base location in primers containing a single LNA substitution within the 3’ (L0) to 3’-8 (L8) positions.

LNA Base Type or Position   Count   Average Eexpt
DNA                         8       97 ± 4%

A                           12      97 ± 5%
T                           9       95 ± 7%
G                           17      94 ± 7%
C                           11      94 ± 4%

3' (L0)                     7       96 ± 4%
3'-1 (L1)                   7       98 ± 4%
3'-2 (L2)                   7       96 ± 4%
3'-3 (L3)                   7       98 ± 4%
3'-4 (L4)                   7       95 ± 3%
3'-5 (L5)                   4       81 ± 9%(a)
3'-6 (L6)                   3       91 ± 5%
3'-7 (L7)                   4       98 ± 1%
3'-8 (L8)                   3       96 ± 2%

(a) Eexpt for LNA substitutions at 3’-5 ranged from 70 to 92%.

Table 5.3 PCR efficiencies for amplification of the BCL2 plasmid mini-gene using forward (FP) and reverse (RP) primers containing multiple LNA substitutions.

Forward Primer                 Reverse Primer                     LNA      Eexpt
Name     Sequence              Name       Sequence                Bases

Primers with 2 LNA Substitutions
FP       gccacctgtggtccacct    RP6 L01    gctggacatctcggcgaAG     A,G      0%
FP L01   gccacctgtggtccacCT    RP2        ggacatctcggcgaagtcgc    C,T      24%
FP       gccacctgtggtccacct    RP3 L02    tggacatctcggcgaagTcG    T,G      61%
FP       gccacctgtggtccacct    RP6 L02    gctggacatctcggcgAaG     A,G      18%
FP       gccacctgtggtccacct    RP2 L02    ggacatctcggcgaagtCgC    C,C      88%
FP L02   gccacctgtggtccaCcT    RP2        ggacatctcggcgaagtcgc    C,T      92%
FP       gccacctgtggtccacct    RP3 L03    tggacatctcggcgaaGtcG    G,G      98%
FP       gccacctgtggtccacct    RP3 L04    tggacatctcggcgaAgtcG    A,G      91%
FP       gccacctgtggtccacct    RP2 L05    ggacatctcggcgaAgtcgC    A,C      42%
FP L05   gccacctgtggtCcaccT    RP2        ggacatctcggcgaagtcgc    C,T      42%
FP L06   gccacctgtggTccaccT    RP2        ggacatctcggcgaagtcgc    T,T      92%
FP       gccacctgtggtccacct    RP4 L12    ctggacatctcggcgaaGTc    G,T      95%
FP       gccacctgtggtccacct    RP4 L13    ctggacatctcggcgaAgTc    A,T      96%
FP       gccacctgtggtccacct    RP4 L14    ctggacatctcggcgAagTc    A,T      78%
FP       gccacctgtggtccacct    RP4 L15    ctggacatctcggcGaagTc    G,T      0%
FP       gccacctgtggtccacct    RP6 L15    gctggacatctcgGcgaAg     G,A      0%
FP L16   gccacctgtggTccacCt    RP2        ggacatctcggcgaagtcgc    T,C      94%
FP       gccacctgtggtccacct    RP4 L23    ctggacatctcggcgaAGtc    A,G      99%
FP       gccacctgtggtccacct    RP4 L24    ctggacatctcggcgAaGtc    A,G      95%
FP       gccacctgtggtccacct    RP4 L25    ctggacatctcggcGaaGtc    G,G      52%
FP       gccacctgtggtccacct    RP2 L25    ggacatctcggcgaAgtCgc    A,C      88%
FP L25   gccacctgtggtCcaCct    RP2        ggacatctcggcgaagtcgc    C,C      84%
FP L26   gccacctgtggTccaCct    RP2        ggacatctcggcgaagtcgc    T,C      91%
FP L34   gccacctgtggtcCAcct    RP2        ggacatctcggcgaagtcgc    C,A      25%
FP       gccacctgtggtccacct    RP2 L35    ggacatctcggcgaAgTcgc    A,T      94%
FP       gccacctgtggtccacct    RP2 L36    ggacatctcggcgAagTcgc    A,T      95%
FP       gccacctgtggtccacct    RP2 L45    ggacatctcggcgaAGtcgc    A,G      92%
FP L46   gccacctgtggTcCacct    RP2        ggacatctcggcgaagtcgc    T,C      76%
FP       gccacctgtggtccacct    RP2 L56    ggacatctcggcgAAgtcgc    A,A      84%

Primers with 3 LNA Substitutions
FP       gccacctgtggtccacct    RP6 L012   gctggacatctcggcgAAG     A,A,G    0%
FP       gccacctgtggtccacct    RP4 L024   ctggacatctcggcgAaGtC    A,G,C    76%
FP       gccacctgtggtccacct    RP4 L123   ctggacatctcggcgaAGTc    A,G,T    92%
FP       gccacctgtggtccacct    RP4 L145   ctggacatctcggcGAagTc    G,A,T    0%
FP       gccacctgtggtccacct    RP6 L145   gctggacatctcgGCgaAg     G,C,A    0%
FP       gccacctgtggtccacct    RP4 L245   ctggacatctcggcGAaGtc    G,A,G    34%
FP       gccacctgtggtccacct    RP4 L246   ctggacatctcggCgAaGtc    C,A,G    88%

Much larger variation in Eexpt is observed when multiple LNA substitutions are made

within positions 3’ to 3’-6 of either a forward (FP) or reverse (RP) primer (Table 5.3). A

majority of primer designs containing two LNA substitutions (L03, L04, L06, L12, L13, L16,

L23, L24, L26, L35, L36, L45) exhibit efficient amplification, with Eexpt = 94 ± 3% on

average. However, a significant to complete inhibition (i.e., 0 ≤ Eexpt ≤ 85%) of extension

activity was recorded for several primer designs containing two LNA substitutions (L01,

L05, L14, L15, L34, L46, L56). These results therefore identify LNA substitution patterns

that appear unsuitable as general primer designs. Finally, two of the dual-substitution primer designs, L02 (Eexpt = 18 to 92%) and L25 (Eexpt = 55 to 88%), exhibit an Eexpt that depends

strongly on primer sequence. The utility of these two designs is therefore not clear and they

were retained in subsequent studies directed toward establishing AS primer designs.

Primers containing three LNA substitutions within the 3’ to 3’-6 positions were also

analyzed (Table 5.3). Though less comprehensive, this study was nevertheless sufficient to

permit investigation of whether results for dual-substituted primers can be used to forecast, at

least qualitatively, the amplification efficiency of primers containing a larger number of LNA

substitutions. Predicted PCR efficiencies (Epred) of primers containing three LNA substitutions (Table 5.4) were computed as the product of Eexpt values for primers containing

two LNA substitutions (Table 5.3). Results for primers L12 (Eexpt = 95%), L13 (96%) and

L23 (99%) therefore yield an Epred of 90% for primer L123, which agrees well with

experiment (92%). In general, reasonable agreement was observed between Epred and Eexpt for primers containing 3 or more LNA substitutions, suggesting that this approach may be useful as an initial guide to AS primer design. However, given the uncertainty of Eexpt for a

few dual-substitution patterns, most notably the L02 pattern, additional rules will be required

to reliably predict the performance of multi-LNA substituted primers.
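
The product rule described above can be illustrated with a short sketch. The Python below is not part of the original analysis: the function name `predict_efficiency` and the dictionary layout are assumptions made for illustration, and the only data used are the L12, L13 and L23 efficiencies quoted in the text.

```python
from itertools import combinations

# Eexpt values (as fractions) for the dual-LNA primers quoted in the text.
E_PAIR = {"L12": 0.95, "L13": 0.96, "L23": 0.99}


def predict_efficiency(positions, pair_efficiency):
    """Estimate Epred for a multi-LNA primer as the product of the Eexpt
    values of every dual-LNA pattern contained in it.

    positions       -- LNA positions counted from the 3' end (0 = 3', 1 = 3'-1, ...)
    pair_efficiency -- mapping from a pattern name such as "L12" to its Eexpt
    """
    e_pred = 1.0
    for i, j in combinations(sorted(positions), 2):
        e_pred *= pair_efficiency[f"L{i}{j}"]
    return e_pred


e_l123 = predict_efficiency([1, 2, 3], E_PAIR)
print(f"Epred(L123) = {e_l123:.0%}")  # compare with the experimental value of 92%
```

Because the estimate is a simple product, a single strongly inhibitory pair (for example one with Eexpt = 0%) forces Epred to zero regardless of the other pairs.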

Table 5.4 Experimental and model-estimated PCR efficiencies for amplification of the BCL2 plasmid mini-gene using primers containing three LNA substitutions within the 3’-1 to 3’-6 positions.

Name   LNA Interactions   Eexpt   Epred (a)
L123   L12, L13, L23      92%     90%
L145   L14, L15, L45      0%      0%
L246   L24, L26, L46      88%     65%
L012   L01, L02, L12      0%      0 - 21%
L024   L02, L04, L24      76%     16 - 76%
L245   L24, L25, L45      34%     46 - 77%

a For the first three entries, model predictions are computed as Epred = ELij × ELik × ELjk, where ELij is the Eexpt for the same amplification reaction conducted using an Lij primer having the same base sequence as primer Lijk. For the last three entries, model predictions are computed from the same product, where ELij is instead the average Eexpt ± standard error for all amplification reactions conducted using an Lij primer design. The uncertainties reported reflect, in large part, the variable effect of LNA substitutions at the 3’-2 and 3’-5 primer positions on amplification efficiency (see Table 5.3).

Many of these additional rules are drawn from Table 5.5, which reports average Eexpt values for amplification of the BCL2 gene using primers bearing either one or two LNAs within the 3’ to 3’-6 positions; all possible substitution patterns were investigated in either case. To assess the value of Table 5.5, let us assume for the moment that it is applicable to the general design of absolute AS primers, and that Eexpt is altered to a similar extent by a

mismatch or an LNA present at a given primer position. Based on the poor performance

(Eexpt ≤ 24%) of the L01 primer directed against the BCL2 template, Table 5.5 would then predict that an L1 primer where the 3’ nucleotide aligns with the SOV will amplify the allele to which it is directed, say a WT allele, with 98% efficiency, while amplifying the corresponding SPM-bearing MT allele with an efficiency near or below 24%.

The above analysis is of course unproven at this point. Moreover, it is based only on amplification data for LNA-substituted primers directed against the model BCL2 template.

In the next sections, the assumptions made in establishing this putative design strategy are

tested to determine if additional design rules are required. The resulting protocol is thereby

refined and then applied to the design of LNA-containing primers directed against other, more clinically relevant alleles. Initial focus is on JAK2, where the aim is to establish the

guidelines needed to create AS primers offering superior SPE for either the JAK2 WT or MT

(V617F) allele.

Table 5.5 Average efficiency of amplification of the BCL2 mini-gene using each possible primer design comprising either one or two LNA substitutions within the 3’ to 3’-6 region.

Name  LNA Position(s)  E (a)

All primers containing a single LNA substitution
L0    3’               96%
L1    3’-1             98%
L2    3’-2             96%
L3    3’-3             98%
L4    3’-4             95%
L5    3’-5             70 - 92%
L6    3’-6             91%

All primers containing two LNA substitutions
L01   3’-1, 3’         0 - 24%
L02   3’-2, 3’         18 - 92%
L03   3’-3, 3’         98%
L04   3’-4, 3’         91%
L05   3’-5, 3’         42%
L06   3’-6, 3’         92%
L12   3’-2, 3’-1       95%
L13   3’-3, 3’-1       96%
L14   3’-4, 3’-1       78%
L15   3’-5, 3’-1       0%
L16   3’-6, 3’-1       94%
L23   3’-3, 3’-2       99%
L24   3’-4, 3’-2       95%
L25   3’-5, 3’-2       52 - 88%
L26   3’-6, 3’-2       91%
L34   3’-4, 3’-3       25%
L35   3’-5, 3’-3       94%
L36   3’-6, 3’-3       95%
L45   3’-5, 3’-4       92%
L46   3’-6, 3’-4       76%
L56   3’-6, 3’-5       84%

a The set of E values is used as input parameters to our proposed model to estimate the efficiency and SPE of amplification of an SPM-bearing allele using any putative AS primer design containing LNA substitution(s) within the 3’ to 3’-6 positions. Data used to generate E values were taken from Tables 5.1 and 5.3.

5.2.2 AS primer design: preferred location for the site of variation (SOV)

Cues for determining the optimal alignment of an AS primer with a SOV are provided by recent studies that show that the presence of a mismatch at primer position 3’, 3’-1, 3’-2 or 3’-5 can reduce PCR efficiency (212). These findings were combined with qPCR results

(Figures 5.1A and 5.1B) for pure-DNA AS primers directed against JAK2 WT or JAK2 MT

(V617F), respectively, to better define the impact of positional alignment of a primer with a

SOV. The study was conducted across a range of Ta so as to identify alignments capable of

providing robust, temperature-insensitive allele discrimination. At high Ta (≥ 64.5 °C) both

the WT and MT AS primers exhibit a preference to selectively amplify their respective target

allele. Here, the observed discrimination is likely due to differences in the hybridization

thermodynamics, since Ta falls below the predicted melting temperature (Tm) of the duplex formed with the respective target template, but above that for the non-target template due to

the mismatch formed. However, at a lower Ta (≤ 60.7 °C) that falls at or below the Tm of

either the primer•WT template or primer•MT template duplex, the ability of the primer to

selectively amplify its target is instead related to differences in the kinetics of Taq-mediated extension of the two templates. Under these preferred reaction conditions, significant allele

discrimination is retained only when the SOV aligns with the primer’s 3’ position. This result is supported by previous studies reporting improved amplification selectivity when a

3’-SOV primer is used (149).

Figure 5.1, panels A and B: Cq (axis range 0 to 45) plotted against Ta (67.3, 64.5, 60.7, 58.0, 56.2 and 55.0 °C). Panel A series: WT0:WT, WT0:MT, WT1:WT, WT1:MT, WT2:WT, WT2:MT, WT5:WT and WT5:MT. Panel B series: MT0:MT, MT0:WT, MT1:MT, MT1:WT, MT2:MT, MT2:WT, MT5:MT and MT5:WT.

Figure 5.1 Cq data for various pure-DNA AS primers, reported as a function of the template amplified, the primer position interacting with the SOV, and the annealing temperature Ta used in the qPCR. Panels A and B show AS primers for WT JAK2 and MT JAK2 V617F, respectively. Key: WT1:MT, for example, indicates amplification of the MT template using a WT-directed primer that forms a mismatch with the variant base at the primer’s 3’-1 position.

Further analysis of the ΔCq provided by 3’-SOV AS primers (hereafter identified as

the WT0 and MT0 primer designs, where WT and MT identify the target templates and “0”

the primer position aligned with the SOV) reveals that the allele discrimination provided by

this pure-DNA primer design strongly depends on the type of base pair mismatch created at

the 3’ position. For example, annealing of the WT0 AS primer (Table 5.6) to its non-target

JAK2 V617F MT allele creates a pyrimidine•pyrimidine c•t (primer•template) mismatch, and

the selectivity (∆Cq = 1.1 at Ta < 60.7 °C) of the primer for the target WT allele is very poor.

In contrast, when the corresponding MT0 AS primer (Table 5.7) is annealed to its non-target

JAK2 WT template, a purine•purine a•g (primer•template) mismatch is formed and the

primer achieves a much higher degree of discrimination (∆Cq = 11.9 at Ta < 60.7 °C). One

might expect this is due to the greater steric penalty created by the mismatching of bulkier

purinic bases which, in turn, leads to a greater reduction in Taq polymerase extension activity

on the non-complementary WT allele. However, this relationship does not appear to be

general (155,156), and at this point we can only conclude that the performance of pure-DNA

3’ SOV primers is highly dependent on the base pair and mismatch formed. Taq polymerase

extension activity is also known to be sensitive to the base pair at and the base sequence

flanking an AS primer’s 3’ position (155). Thus, the results reported here indicate that

alignment of the SOV with the 3’ position may be adopted as part of a general strategy for

AS primer design, but it is certainly not sufficient to consistently achieve the SPE required

for unequivocal detection of SPMs.

Table 5.6 Eexpt, Epred & SPE for various JAK2 WT AS primers directed against the JAK2 WT and MT (V617F) plasmid templates.

Name      RP Sequence (a)         Cq(WT)      Cq(MT)      ΔCq(MT-WT)  WT Eexpt  WT Epred  MT Eexpt  MT Epred  SPE (b)  Method (c)
WT0       acttactctcgtctccacagac  15.3 ± 0.1  16.4 ± 0.1  1.1         100%                96%                 62%      1

WT0 AS Primer with Additional Mismatch (MM) at 3'-5 Position
WT0 MM5c  acttactctcgtctccccagac  16.0 ± 1.0  24.9 ± 1.2  8.9         95%                 53%                 3%       2
WT0 MM5g  acttactctcgtctccgcagac  16.3 ± 0.3  19.9 ± 0.9  3.6         92%                 70%                 54%      2
WT0 MM5t  acttactctcgtctcctcagac  14.9 ± 1.0  23.9 ± 1.2  9.0         104%                56%                 2%       2

WT0 AS Primer with 1 LNA Substitution
WT0 L0    acttactctcgtctccacagaC  15.0 ± 0.3  21.2 ± 0.1  6.2         103%      96%       65%                 2%       1
WT0 L0    acttactctcgtctccacagaC  15.0 ± 0.2  20.5 ± 0.3  5.5         103%      96%       68%                 4%       2
WT0 L1    acttactctcgtctccacagAc  15.5 ± 0.3  21.0 ± 0.3  5.5         99%       100%      66%       0 - 24%   4%       2
WT0 L2    acttactctcgtctccacaGac  15.6 ± 0.2  20.1 ± 0.1  4.5         98%       99%       70%       18 - 88%  6%       2
WT0 L4    acttactctcgtctccaCagac  15.5 ± 0.3  19.2 ± 0.4  3.7         98%       98%       74%       91%       18%      2
WT0 L5    acttactctcgtctccAcagac  15.8 ± 0.2  27.7 ± 0.7  11.8        96%       70 - 92%  47%       42%       0.1%     2
WT0 L6    acttactctcgtctcCacagac  15.6 ± 0.2  18.7 ± 0.1  3.1         98%       91%       77%       92%       15%      2

WT0 AS Primer with 2 LNA Substitutions
WT0 L02   acttactctcgtctccacaGaC  16.0 ± 0.2  24.4 ± 0.1  8.3         94%       18 - 92%  55%                 0.5%     1
WT0 L03   acttactctcgtctccacAgaC  15.7 ± 0.5  20.7 ± 0.4  5.0         97%       98%       67%                 7%       2
WT0 L05   acttactctcgtctccAcagaC  29.4 ± 0.8  n.d.        n.d.        44%       42%       0%                  2%       2
WT0 L12   acttactctcgtctccacaGAc  16.7 ± 0.8  26.5 ± 0.5  9.8         89%       95%       49%       0 - 21%   0.5%     2
WT0 L13   acttactctcgtctccacAgAc  16.1 ± 0.1  19.4 ± 0.2  3.4         94%       96%       73%       0 - 23%   15%      1
WT0 L15   acttactctcgtctccAcagAc  n.d.        n.d.        n.d.        0%        0%        0%        0%                 2
WT0 L23   acttactctcgtctccacAGac  16.1 ± 0.3  25.3 ± 0.1  9.2         94%       99%       52%       17 - 85%  0.3%     1
WT0 L25   acttactctcgtctccAcaGac  16.0 ± 0.2  33.3 ± 0.4  17.3        95%       55 - 88%  38%       4 - 34%   0.002%   1
WT0 L25   acttactctcgtctccAcaGac  15.8 ± 0.1  33.0 ± 0.3  17.2        96%       55 - 88%  38%       4 - 34%   0.002%   2
WT0 L45   acttactctcgtctccACagac  16.9 ± 0.2  32.7 ± 0.1  15.8        87%       92%       38%       35%       0.006%   1

WT0 AS Primer with 3 LNA Substitutions
WT0 L123  acttactctcgtctccacAGAc  17.7 ± 1.0  32.8 ± 0.7  15.1        82%       90%       38%       0 - 19%   0.04%    1

a The sequence for the FP used in all JAK2 studies is 5’-tgaagcagcaagtatgatgagc-3’. The location of the site of variation is in bold and introduced mismatches are underlined.

b $\mathrm{SPE} = (1 + E_{\mathrm{expt}})^{-(\Delta C_q(\mathrm{MT-WT}) - t_c \sigma_{\mathrm{MT}})}$, where $t_c$ = 2.92 for a 95% confidence interval.

c Method used to determine Cq. In method 1, Cq was determined from an average of three qPCR runs performed at Ta = 60 °C. In method 2, Cq was determined from an average of three qPCR runs performed at Ta = 58.7, 56 and 55 °C, respectively.

Table 5.7 Eexpt, Epred & SPE for various JAK2 MT (V617F) AS primers directed against the JAK2 WT and MT (V617F) plasmid templates.

Name      RP Sequence (a)         Cq(MT)      Cq(WT)      ΔCq(WT-MT)  MT Eexpt  MT Epred  WT Eexpt  WT Epred  SPE (b)  Method (c)
MT0       acttactctcgtctccacagaa  15.0 ± 0.2  26.9 ± 0.3  11.9        100%                47%                 0.05%    1

MT0 AS Primer with 1 LNA Substitution
MT0 L0    acttactctcgtctccacagaA  15.2 ± 0.2  32.6 ± 0.5  17.4        98%       96%       38%                 0.002%   2
MT0 L1    acttactctcgtctccacagAa  15.5 ± 0.2  30.9 ± 0.6  15.4        96%       100%      40%       0 - 24%   0.01%    2
MT0 L2    acttactctcgtctccacaGaa  15.3 ± 0.1  26.4 ± 0.1  11.2        98%       99%       48%       18 - 88%  0.06%    2
MT0 L5    acttactctcgtctccAcagaa  16.2 ± 0.2  31.8 ± 0.2  15.6        90%       70 - 92%  39%       42%       0.01%    2

MT0 AS Primer with 2 LNA Substitutions
MT0 L25   acttactctcgtctccAcaGaa  16.9 ± 0.0  36.3 ± 0.1  19.4        85%       55 - 88%  33%       4 - 34%   0.0008%  1

MT0 AS Primer with 3 LNA Substitutions
MT0 L123  acttactctcgtctccacAGAa  16.0 ± 0.2  36.9 ± 0.7  20.9        91%       90%       33%       0 - 19%   0.0006%  1

a The sequence for the FP used in all JAK2 studies is 5’-tgaagcagcaagtatgatgagc-3’. The location of the site of variation is in bold and introduced mismatches are underlined.

b $\mathrm{SPE} = (1 + E_{\mathrm{expt}})^{-(\Delta C_q(\mathrm{WT-MT}) - t_c \sigma_{\mathrm{WT}})}$, where $t_c$ = 2.92 for a 95% confidence interval.

c Method used to determine Cq. In method 1, Cq was determined from an average of three qPCR runs performed at Ta = 60 °C. In method 2, Cq was determined from an average of three qPCR runs performed at Ta = 58.7, 56 and 55 °C, respectively.
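
The SPE expression given in footnote b of Tables 5.6 and 5.7 can be evaluated directly. The sketch below is illustrative only: the function name `spe` is hypothetical, and it assumes that Eexpt in the footnote is the efficiency measured on the target template and that σ is the standard deviation of the non-target Cq, a reading that reproduces the tabulated SPE values for the MT0 and WT0 L45 primers to within rounding.

```python
T_C = 2.92  # t-value quoted in the table footnotes for a 95% confidence interval


def spe(e_expt, delta_cq, sigma_nontarget, t_c=T_C):
    """Return SPE as a fraction: (1 + Eexpt) ** -(dCq - t_c * sigma).

    e_expt          -- efficiency on the target template (fraction; assumed reading)
    delta_cq        -- Cq(non-target) - Cq(target)
    sigma_nontarget -- standard deviation of the non-target Cq
    """
    return (1.0 + e_expt) ** -(delta_cq - t_c * sigma_nontarget)


# MT0 (Table 5.7): Eexpt = 100%, dCq = 11.9, Cq(WT) = 26.9 +/- 0.3
spe_mt0 = spe(1.00, 11.9, 0.3)
# WT0 L45 (Table 5.6): Eexpt = 87%, dCq = 15.8, Cq(MT) = 32.7 +/- 0.1
spe_wt0_l45 = spe(0.87, 15.8, 0.1)

print(f"MT0:     SPE = {spe_mt0:.3%}")      # table reports 0.05%
print(f"WT0 L45: SPE = {spe_wt0_l45:.4%}")  # table reports 0.006%
```

Subtracting the t_c·σ term shrinks the effective ∆Cq, so a noisy non-target Cq yields a more conservative (larger) SPE.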

5.2.3 AS primer design: the effect of LNA substitution versus mismatch insertion

One way to improve the specificity of 3’-SOV AS primers is to introduce an artificial mismatch in the 3’ region of the primer (149,150). An understanding of how the specificity enhancement from introducing a mismatch compares to introducing an LNA substitution into the same position of a 3’-SOV primer was therefore sought. The WT0 AS primer directed against JAK2 (Table 5.6) was modified to create three primers that, together, cover all possible primer•template base pair mismatches at the 3’-5 position. Efficient amplification of the target WT template was observed in each case (Table 5.6), with an average Eexpt of 97%

(92 to 104%) recorded. When these 3’-5 mismatch-modified (MM5) WT0-type AS primers were applied to the MT template, amplification efficiency was significantly reduced relative to the parent 3’ SOV primer, with an average Eexpt of 60% (53 to 70%) recorded. Thus the introduction of a 3’-5 mismatch does result in improved discrimination, but it does not provide for unequivocal detection.

Modification of the WT0 AS primer to instead contain a complementary LNA at the

3’-5 position (i.e., WT0 L5) greatly improves allele specificity as well. An Eexpt of 96% is

observed for the WT0 L5 primer directed against its target, JAK2 WT, with unwanted cross amplification of the JAK2 MT allele inhibited significantly (Eexpt = 47%) but not completely.

These two results are best interpreted as showing that duplex perturbations caused by either a

mismatch or an LNA substitution at a given primer position alter Taq polymerase extension

activity to a comparable extent.

The results also support the assertion that Table 5.5 can be of use to forecast both the

efficiency and the specificity of primers possessing a particular LNA substitution pattern.

For example, Eexpt is 44% for amplification of the JAK2 WT allele using the JAK2 WT0 L05

AS primer (Table 5.6). The predicted efficiency Epred of this amplification reaction is 42%

based on results for the L05 primer directed against BCL2 (Table 5.5). Epred values for other

LNA substituted primers directed against the JAK2 WT allele likewise agree well with

corresponding Eexpt data. This includes primers containing three LNA substitutions where

predicted amplification efficiencies are computed from data for primers having two LNA

substitutions (Table 5.4) as described above. However, both Epred and Eexpt values for certain

LNA substitution patterns, most notably the L02 pattern, carry significant uncertainty, further

confirming that the properties of these patterns are sensitive to both the nucleotide being

substituted and the flanking base sequence, as recorded in Table 5.3.

Specificity is then predicted, at least qualitatively, from Table 5.5 by assuming that a

primer possessing an LNA at the 3’ position amplifies its target template with an efficiency

similar to that of a primer of the same base sequence bearing no LNA at the 3’ position and

directed toward the non-target allele; a mismatch at the 3’ position is then formed in lieu of

an LNA•DNA base pair. As an example, Table 5.5 accurately predicts the performance of the MT0 L5 AS primer (Table 5.7) when directed against its target JAK2 MT allele (Eexpt =

90%; Epred = 92%) or against the JAK2 WT allele (Eexpt = 39%; Epred = 42%). The difference in predicted E values (∆E = 92% - 42% = 50%) may therefore provide a useful estimate of primer specificity. Tables 5.6 and 5.7 show that this predictive capability generally holds when applied to other LNA substitution patterns.

5.2.4 AS primer design: application of LNA substitution guidelines

One may then hypothesize that clinically useful absolute AS primers can be created by identifying LNA substitution patterns that modestly destabilize Taq polymerase activity

such that amplification proceeds in the absence of a base pair mismatch at the 3’ position, but

does not (or is greatly inhibited) when a 3’ mismatch is present. Too significant a destabilization of Taq activity on the target allele is undesirable. In particular, PCR amplification of the non-target JAK2 MT allele using the WT0 L05 primer results in no detectable amplification products after 60 cycles. When WT0 L05 is directed against the target JAK2 WT allele, gene amplification is detected, but at a Cq of 29.4. If selective

detection of an allele is the goal, a high Cq for amplification of the target template will

greatly reduce the value of ∆Cq(MT-WT) (since the maximum Cq for either reaction is taken to be 40 based on literature studies (24)), resulting in a far less favorable SPE. An additional

design constraint is therefore a requirement that Epred be ≥ 85% for amplification of the target

allele. An Epred ≤ 40% for amplification of the non-target allele is then used as a predictive

benchmark for a primer design with the potential to provide good specificity for an SPM-

bearing allele.

Table 5.8 lists those LNA substitution patterns within a 3’ SOV primer that are

predicted to offer good specificity based on the proposed design rules. Both L0 and L1 are

predicted to be useful AS primer designs in accordance with past studies (41,46,157,161).

The remaining designs, all of which are comprised of two or more LNA substitutions, are

new. Based on the degree to which they satisfy all design guidelines and the experimental

results on which they are based, certain primer designs, shown in bold in Table 5.8, appear

particularly promising, and those should be examined first when seeking an AS primer

capable of providing unequivocal detection of an SPM of interest. The other designs are

those that should be explored if the more promising group of designs proves unsuccessful.

Table 5.8 Putative AS primer designs possessing a 3’ SOV and one or more LNA substitutions within the 3’ to 3’-6 positions.

Single LNA Designs   Two LNA Designs   Three LNA Designs
L0                   L25               L123
L1                   L45               L136
L2                   L03               L236
L5                   L04
                     L16
                     L13
                     L16
                     L35

All primer designs listed meet our model-based requirements. Primer designs in bold are those predicted by the model and our accompanying data analyses to offer the best chance of providing the SPE required for unequivocal detection of SPMs.

Initial testing of putative designs was conducted by applying them to create an AS

primer directed against JAK2 WT, which, due to the innately low specificity of the c•t 3’

SOV mismatch formed with the JAK2 V617F MT allele (see above), represents a severe technical challenge. Results for the WT0 L45 primer design illustrate the process used. Epred is 92% for amplification of WT JAK2, which is in good agreement with Eexpt = 87% (Table

5.6). For the L045 primer design acting on the WT JAK2 template, an Epred = 35% is then

calculated as the product of Eexpt values (Table 5.3) for the L04 (91%), L05 (42%) and L45

(92%) primer designs, which is consistent with the Eexpt = 38% determined for the WT0 L45

AS primer directed against the MT template. The design therefore meets all proposed

criteria and all design predictions based on the BCL2 data set are in accordance with experimental amplification efficiencies for the WT0 L45 primer. As a result, the primer is indeed highly specific for WT JAK2, with an SPE of 0.006% (Table 5.6).
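
The screening logic applied to the L45 design can be written out as a short sketch. The code is an illustration rather than the procedure actually used: the helper names are hypothetical, the only inputs are the L04, L05 and L45 values from Table 5.5, and the 85% and 40% thresholds are the benchmarks stated in the text. The non-target prediction assumes, as the text does, that an LNA at the 3’ position (position 0) stands in for the 3’ mismatch.

```python
from itertools import combinations

# Average E values for the BCL2 mini-gene (fractions), copied from Table 5.5.
E_TABLE = {"L04": 0.91, "L05": 0.42, "L45": 0.92}


def pattern_name(positions):
    """Build a name such as 'L045' from LNA positions counted from the 3' end."""
    return "L" + "".join(str(p) for p in sorted(positions))


def predicted_efficiency(positions, table):
    """Use the tabulated value if the pattern is listed; otherwise multiply
    the tabulated values of all dual-LNA patterns it contains."""
    name = pattern_name(positions)
    if name in table:
        return table[name]
    e_pred = 1.0
    for pair in combinations(sorted(positions), 2):
        e_pred *= table[pattern_name(pair)]
    return e_pred


def screen(positions, table, min_target=0.85, max_nontarget=0.40):
    """Return (target Epred, non-target Epred, passes both benchmarks)."""
    target = predicted_efficiency(positions, table)
    # Assumption from the text: a 3' LNA is a surrogate for the 3' mismatch.
    nontarget = predicted_efficiency(set(positions) | {0}, table)
    return target, nontarget, target >= min_target and nontarget <= max_nontarget


target, nontarget, ok = screen({4, 5}, E_TABLE)
print(f"L45: target Epred = {target:.0%}, non-target Epred = {nontarget:.0%}, pass = {ok}")
```

For patterns whose tabulated E is a range rather than a single value (L02 and L25, for example), the same product would have to be evaluated at both ends of the range.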

A number of additional primer designs in Table 5.8 were tested to further validate the proposed method. Tables 5.6 and 5.7 present amplification data for four JAK2 primers (WT0

L25, MT0 L0, MT0 L25, MT0 L123) based on the putative designs that achieve SPEs

meeting the criteria for unequivocal detection. For example, the MT0 L123 forward primer

directed against JAK2 MT is characterized by an experimental SPE of 0.0006%, indicating that an associated qPCR assay can detect one JAK2 V617F mutation in a background of more than 100,000 copies of the JAK2 WT allele. Tables 5.6 and 5.7 also confirm the accuracy of the proposed design strategy; amplification data for a large set of JAK2 WT0 and MT0 LNA-bearing AS primers show that there is excellent agreement between Eexpt and Epred, with an average difference (Eexpt – Epred) of 0% ± 4%.

Raw qPCR data for amplification of target and non-target JAK2 alleles using the

WT0 L25 or MT0 L123 AS primer are shown in Figures 5.2 and 5.3, respectively. Also

shown for reference are corresponding data when the WT0 L025 and MT0 L0123 primer

designs are used. No amplification products are detected after 60 cycles of qPCR when

either the WT0 L025 or the MT0 L0123 primer is directed against its target JAK2

template, confirming a near zero Epred for these reactions and supporting the assumption that

an LNA at the 3’ primer position can serve as a surrogate for understanding the effect of a

mismatch at that position.

Similar performance was expected and observed when the WT0 L25 or the MT0

L123 primer is applied to its respective non-target allele. Though a cross-product

amplification signal is recorded for each primer, the Cq is very large, particularly when compared to that for amplifications using these same primers directed against their target template. The appearance of a signal is expected in this case, as unlike LNA substitutions, which are permanent during PCR, the mismatch formed between the 3’ base of a primer and the non-target allele is dynamic during the amplification reaction due to errors made by Taq polymerase. Thus, for an AS primer that is absolute in its specificity and therefore does not

extend when acting on the non-target template, linear amplification of the opposite strand with the common primer will still occur, along with Taq-mediated errors. Substitutions at the

SOV can therefore occur and lead to generation of target templates for the AS primer and the subsequent detection of their exponential amplification. Despite these Taq-fidelity derived limitations, both the JAK2 WT0 L25 and MT0 L123 AS primers provide a ∆Cq sufficient for unequivocal detection of their target allele.

Figure 5.2 qPCR amplification curves using the JAK2 WT0 L025 or WT0 L25 primers. Data are shown for amplification of the JAK2 WT plasmid using the WT0 L025 primer (▲) or the WT0 L25 primer (▬). Data for amplification of the JAK2 MT V617F plasmid using the WT0 L25 primer (▪▪▪) are also shown. For all experiments, amplification was detected using SYBR green.

Figure 5.3 qPCR amplification curves using the JAK2 MT0 L0123 or MT0 L123 primers. Data are shown for amplification of the JAK2 MT (V617F) plasmid using the MT0 L0123 primer (▲) or the MT0 L123 primer (▬). Data for amplification of the JAK2 WT plasmid using the MT0 L123 primer (▪▪▪) are also shown. For all experiments, amplification was detected using SYBR green.

5.2.5 AS primer design: Plexor™ LNA AS primers directed against JAK2 V617F

The adaptation of LNA-based AS primers to the testing of clinical samples requires the incorporation of a readout method that specifically records amplification of the MT allele bearing the SPM within a complex background containing an excess of the associated WT allele. The recently developed Plexor™ real-time PCR system has been adopted for this purpose. Compared to hydrolysis probes, Plexor™ labelling provides an alternative and arguably more powerful means to monitor primer-specific amplification as it allows for post-PCR product confirmation by melting curve analysis (213,214). To illustrate how LNA-based AS primers may be made compatible with Plexor™ real-time detection, the JAK2 MT0 L123 primer was modified through addition of a fluorescent dye (FAM) and the non-natural base methylisocytosine (iso-dC) at the 5’ end. qPCR experiments on serial dilutions of JAK2 MT (V617F) plasmid in excess WT plasmid (Figure 5.4) were then performed using the Plexor™-compatible version of the LNA-bearing AS primer. Results confirm that both the amplification efficiency (92%) and SPE (0.0007%) are maintained following JAK2 MT0 L123 primer modification and PCR adaptation to the Plexor™ mastermix.

Figure 5.4 plot: RFU versus cycle number, with amplification curves labelled 1:1, 1:10, 1:100, 1:1000, 1:10,000, 1:100,000 and WT Only.

Figure 5.4 qPCR amplification of the JAK2 MT (V617F) and WT plasmid template using the Plexor™ MT0 L123 AS primer. The MT (V617F) plasmids were serially diluted into a background of WT plasmids and a Ta of 60 °C was used. A PCR efficiency of 92% was determined based on amplification of the serially diluted MT (V617F) plasmids, and an SPE of 0.0007% was obtained using a 95% confidence interval.
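
The text does not state how the 92% efficiency was extracted from the dilution series. One common approach, sketched here as an assumption rather than as the method actually used, is to regress Cq on log10 of the template input and convert the slope through E = 10^(−1/slope) − 1. The Cq values below are synthetic: they are generated from an assumed efficiency of 92% and an arbitrary Cq of 18.0 for the undiluted sample, purely to show that the calculation recovers the efficiency it was given.

```python
import math


def efficiency_from_dilutions(log10_input, cq):
    """Least-squares slope of Cq versus log10(template input), converted to a
    PCR efficiency via E = 10 ** (-1 / slope) - 1."""
    n = len(cq)
    mean_x = sum(log10_input) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope) - 1.0


# Synthetic data only: a 1:1 to 1:100,000 series generated with E = 0.92.
assumed_e = 0.92
log10_input = [0, -1, -2, -3, -4, -5]
cq = [18.0 - x / math.log10(1.0 + assumed_e) for x in log10_input]

recovered_e = efficiency_from_dilutions(log10_input, cq)
print(f"recovered efficiency = {recovered_e:.0%}")
```

With real data the points scatter about the fitted line, so the quality of the fit should be checked before the slope is converted to an efficiency.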

A three-way blind comparison of JAK2 V617F status was then undertaken on 96 consecutively accrued anonymous MPN samples at the BCCA Cancer Genetics Lab using the LDT, Ipsogen MutaQuant and Plexor LNA AS primer (Plexor JAK2 MT0 L123) assays.

Experimentally determined or manufacturer reported analytical specificities at a 95% confidence interval were 5% and 0.2% for the LDT and MutaQuant assays, respectively. An

SPE = 0.01% (ΔCq = 14 based on an E = 92% and a 95% confidence interval) was used to

identify JAK2 V617F positive samples using the Plexor LNA AS primer assay. Both the

MutaQuant and the Plexor LNA AS primer assays recorded 11 positives that tested negative

by the LDT assay (Table 5.9). One additional MPN sample tested positive with the

Plexor LNA primer assay. For this patient, two additional samples were acquired, each after

a successive ca. 8 month period, to confirm JAK2 V617F status and assess temporal changes

in JAK2 V617F levels. Both samples tested positive using the Plexor LNA assay, with the measured ∆Cq decreasing monotonically from 13.3 (month 0) to 12.9 (month 8) to 12.2

(month 17) in a manner consistent with early disease progression.
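
The ∆Cq cut-off quoted above follows from inverting the SPE expression used in Tables 5.6 and 5.7. The sketch below is illustrative: the function name is hypothetical, the t_c·σ term is set to zero because σ is not reported for this assay, and it assumes that a sample is called positive when its ∆Cq falls below the cut-off.

```python
import math


def delta_cq_cutoff(spe, efficiency, t_c=2.92, sigma=0.0):
    """Invert SPE = (1 + E) ** -(dCq - t_c * sigma) to obtain dCq."""
    return -math.log(spe) / math.log(1.0 + efficiency) + t_c * sigma


# SPE = 0.01% and E = 92%, as quoted in the text; sigma is not given, so 0 is assumed.
cutoff = delta_cq_cutoff(spe=1e-4, efficiency=0.92)

# dCq values reported for the three samples from the additional Plexor-positive patient.
samples = {"month 0": 13.3, "month 8": 12.9, "month 17": 12.2}
calls = {label: dcq < cutoff for label, dcq in samples.items()}

print(f"cut-off dCq = {cutoff:.1f}")
print(calls)
```

Under these assumptions the cut-off evaluates to roughly 14 cycles, consistent with the value quoted, and all three serial samples fall below it.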

Table 5.9 Summary of results of different techniques used to classify the JAK2 V617F status of 96 patients suspected of a myeloproliferative neoplasm (MPN).

            Lab Developed Multiplex   MutaQuant   Plexor LNA
            Hydrolysis Probes                     AS Primers
Positive    24                        35          36
Negative    72                        61          60
Call Rate   25%                       36%         37.5%

The Plexor LNA technology was also evaluated for detection of false positives

through the testing of 96 blind samples acquired from non-hematological patients. No

positives were recorded. Together, these results confirm that the MT0 L123 AS primer, and

by extension the proposed protocol for AS primer design, may offer distinct advantages in the early detection, treatment and monitoring of life-threatening diseases associated with rare

SPMs.

5.2.6 Testing of LNA AS primer designs for absolute detection of KIT D816V and BRAF V600E

To verify that the design rules are generally applicable, three multi-LNA designs

(L45, L25 and L123) from Table 5.8 that yielded absolute AS primers against either

JAK2 WT or MT (V617F) were applied to qPCR-based detection of serially diluted KIT

D816V or BRAF V600E in a background of WT template. For each of these clinically relevant SPMs, at least one of the LNA-substitution patterns yielded an AS primer that can be classified as unequivocal in its analytical specificity (Table 5.10). For KIT D816V, both the MT0 L25 and MT0 L45 designs can be defined as absolute (SPE < 0.01%) when applied as forward primers. The MT0 L45 design is also absolute (SPE = 0.001%) when applied as a reverse primer, as are the MT0 L0, MT0 L25 and MT0 L123 designs, highlighting the value of the AS primer designs and design guidelines.

Similarly, all three multi-LNA primer designs show excellent selectivity for BRAF

V600E, with FP MT0 L123 permitting unequivocal discrimination of the BRAF V600E SPM from

BRAF WT template in a real-time PCR assay. Moreover, for each template tested, at least one of the multi-LNA AS primer designs provides an analytical specificity that is significantly better than that provided by the benchmark MT0 design. These new designs, and all of the multi-LNA designs reported in Table 5.8, can therefore be added to the more

common MT0 L0 design to provide a set of LNA substitution patterns from which there is a strong potential to create at least one absolute AS primer for any given SPM.

Table 5.10 Experimental Cq, Eexpt and SPE data for various AS primer designs directed against the KIT D816V and BRAF V600E SPM-bearing alleles.

Name         Sequence                        Cq(MT)      Cq(WT)      ΔCq(WT-MT)  Eexpt (c)  SPE (d)

KIT D816V AS primers (a)
FP MT0       gtgattttggtctagccagagt          17.7 ± 0.1  19.0 ± 0.1  1.4         100%       51%
FP MT0 L0    gtgattttggtctagccagagT          17.7 ± 0.2  25.9 ± 0.1  8.2         100%       0.4%
FP MT0 L45   gtgattttggtctagcCAgagt          18.5 ± 0.1  37.9 ± 0.9  19.4        94%        0.002%
FP MT0 L25   gtgattttggtctagcCagAgt          18.0 ± 0.1  34.4 ± 0.1  16.5        98%        0.002%
FP MT0 L123  gtgattttggtctagccaGAGt          18.0 ± 0.1  27.6 ± 0.1  9.6         98%        0.2%
RP MT0       taaccacataattagaatcattcttgatga  17.5 ± 0.0  24.8 ± 0.2  7.3         100%       1%
RP MT0 L0    taaccacataattagaatcattcttgatgA  20.9 ± 0.0  39.8 ± 0.2  18.9        79%        0.002%
RP MT0 L45   taaccacataattagaatcattctTGatga  17.8 ± 0.2  35.0 ± 0.2  17.2        98%        0.001%
RP MT0 L25   taaccacataattagaatcattctTgaTga  18.8 ± 0.2  37.4 ± 1.0  18.6        91%        0.004%
RP MT0 L123  taaccacataattagaatcattcttgATGa  18.1 ± 0.0  34.9 ± 0.5  16.8        95%        0.004%

BRAF V600E AS primers (b)
FP MT0       ggtgattttggtctagctacaga         16.9 ± 0.2  23.5 ± 0.6  6.6         100%       4%
FP MT0 L0    ggtgattttggtctagctacagA         18.2 ± 0.2  34.6 ± 0.3  16.5        91%        0.004%
FP MT0 L45   ggtgattttggtctagcTAcaga         18.9 ± 0.2  40.0 ± 1.6  21.1        86%        0.004%
FP MT0 L25   ggtgattttggtctagcTacAga         18.5 ± 0.3  37.0 ± 1.0  18.5        89%        0.006%
FP MT0 L123  ggtgattttggtctagctaCAGa         17.5 ± 0.6  36.1 ± 0.5  18.7        96%        0.001%
RP MT0       cccactccatcgagatttct            16.9 ± 0.4  27.0 ± 0.3  10.1        100%       0.2%
RP MT0 L0    cccactccatcgagatttcT            23.7 ± 1.0  43.4 ± 1.0  19.7        64%        0.03%
RP MT0 L45   cccactccatcgagATttct            19.8 ± 0.2  37.4 ± 0.5  17.6        80%        0.01%
RP MT0 L25   cccactccatcgagAttTct            45.0 ± 0.6  n.d.        n.d.        30%        n.d.
RP MT0 L123  cccactccatcgagatTTCt            29.7 ± 1.0  n.d.        n.d.        48%        n.d.

a Sequences of the common FP and RP used with the KIT D816V AS primers are 5’-ctcctccaacctaatagtgtattcacag-3’ and 5’-gcagagaatgggtactcacg-3’, respectively.

b Sequences of the common FP and RP used with the BRAF V600E AS primers are 5’-gcttgctctgataggaaaatgagatc-3’ and 5’-tcagtggaaaaatagcctcaattc-3’, respectively. The location of the site of variation is in bold.

c Eexpt for amplification of the MT target with the isosequential DNA primer is assumed to be 100%.

d $\mathrm{SPE} = (1 + E_{\mathrm{expt}})^{-(\Delta C_q(\mathrm{WT-MT}) - t_c \sigma_{\mathrm{WT}})}$, where $t_c$ = 2.92 for a 95% confidence interval.

5.3 Conclusions

The studies of primers containing single and multiple LNA substitutions described

here have yielded effective new guidelines for the design of absolute AS primers, in part by

identifying positions within the 3’ end of primers where LNA-induced conformational

changes to the primer•template duplex can reduce or even fully inhibit the extension activity

of Taq polymerase. The positions identified are consistent with a single but quite

illuminating structural study of Taq polymerase bound to a primer•template duplex,

which shows that, at the onset of the extension reaction, Taq interacts to varying degrees

with the five 3'-most nucleotides of the primer (215). The guidelines described in this chapter

also leverage important results of Bru et al. (212), who found that a mismatch at the 3’-5

position of a primer can reduce amplification efficiency significantly. In particular, Table 5.5

shows that a similar effect can be achieved through a single LNA substitution at that primer

position. This is not to suggest that the effects of an LNA substitution and of a mismatch at a

given primer position are entirely equivalent. Indeed, Bru et al.’s data suggest that the impact

on Taq extension activity of an added mismatch is, in general, more severe.

A global analysis of all data collected here suggests that the application of a given AS

primer design to a particular SPM can fail by not reaching the desired SPE, by failing to meet the minimum required PCR efficiency (85%) for amplification of the target allele, or by both, as is the case for RP MT0 L0 directed against BRAF V600E. Moreover, while specificity predictions, expressed as ∆E, generally match well with experiment (Tables

5.6 and 5.7), there are a few cases where significant discrepancies occur, indicating that the

type of 3’ mismatch formed or the specific primer sequence used can influence amplification

efficiency and allele specificity in ways not fully captured by the proposed guidelines.
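The two failure modes described above can be expressed as a simple screening check. This is a minimal sketch rather than the procedure used in the thesis: the 85% efficiency floor comes from the text, whereas the SPE target of 0.01% (`MAX_SPE`), the class `PrimerResult` and the function `failure_modes` are hypothetical names and values chosen for illustration. The example values are taken from the BRAF V600E rows of the table preceding this section.

```python
from dataclasses import dataclass

MIN_EFFICIENCY = 0.85  # minimum PCR efficiency for the target allele (from the text)
MAX_SPE = 1e-4         # hypothetical SPE target of 0.01%; not a value from the thesis


@dataclass
class PrimerResult:
    name: str
    efficiency: float  # E_expt as a fraction
    spe: float         # SPE as a fraction


def failure_modes(p, min_eff=MIN_EFFICIENCY, max_spe=MAX_SPE):
    """Return the list of criteria a primer design fails (empty if it passes)."""
    modes = []
    if p.efficiency < min_eff:
        modes.append("efficiency")
    if p.spe > max_spe:
        modes.append("SPE")
    return modes


candidates = [
    PrimerResult("BRAF RP MT0 L0", 0.64, 3e-4),    # 64%, 0.03%
    PrimerResult("BRAF RP MT0 L45", 0.80, 1e-4),   # 80%, 0.01%
    PrimerResult("BRAF FP MT0 L123", 0.96, 1e-5),  # 96%, 0.001%
]
for c in candidates:
    print(c.name, failure_modes(c) or "passes")
```

With the assumed SPE target, RP MT0 L0 fails on both criteria, which mirrors the case singled out in the text. A different target would change which designs pass.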

Finally, a large experimental uncertainty in amplification efficiency values for primers

containing L2 and/or L5 substitutions was observed. AS primer designs containing either of

these substitutions, particularly an L2 substitution, may therefore exhibit a large range of

specificities when applied to different templates. As exemplified by the JAK2 MT0 L123 primer, however, these designs may provide an exceptional SPE. Nevertheless, due to this

uncertainty and the design limitations acknowledged above, screening of the candidate

primer designs in Table 5.8 will generally be required to identify an appropriate AS primer

providing the required SPE for a given SPM.

Chapter 6: Conclusions and Suggestions for Future Work

The advancement and refinement of DNA technology, including PCR and next-

generation sequencing systems, have allowed facile detection of differences in and mutations

to germline sequences, and the ability to link specific differences and mutations to more than

6000 rare diseases as well as to predispositions to more common diseases such as diabetes,

Alzheimer’s and cancer. Moreover, the detection of germline variations by qPCR,

microarrays, or to a lesser extent sequencing is routinely performed by clinical medical

genetics laboratories to diagnose disease and set treatment regimens. Genotype testing is

also available to the public through companies such as 23andMe, deCODE and Navigenics.

Major research programs are now revealing the different somatic mutations presented

in particular forms of cancer, and then identifying those “driver mutations” that are most

likely responsible for providing the cancer cells with genetic advantages for survival and proliferation (7,14). Although the work on the cancer genome is not complete, research efforts to date have already identified a number of clinically important SPMs, including the

JAK2 V617F, KIT D816V and BRAF V600E mutations considered in this thesis.

A challenge to studying the cancer genome is that somatic mutations important to cancer onset, progression or treatment often differ from the germline sequence by only a single base pair. Moreover, mutations are not necessarily present in the genomic DNA of all cells collected within the sample. As a result, the frequency of template containing a

SPM within a sample to be analyzed can be very low (i.e. far less than 1%). Within the cancer genome project, extremely deep (i.e. repetitive) sequencing of the genome is helping to detect these rare SPMs. However, the substantial effort and costs associated with deep sequencing generally preclude its regular use in the clinical laboratory setting. As discussed

in detail in Chapters 4 and 5 of this thesis, the limited analytical sensitivity and specificity of conventional genotyping assays also preclude their general application to SPM detection in clinics.

Thus, improved techniques for detecting a rare SPM are required and are being developed. Although some of these techniques have been validated and are now used in research laboratories, their translation to clinical laboratories has been slowed by poor availability of the equipment or resources required, and/or the inability of a typical clinical laboratory budget to cover the associated assay costs. Thus, the development of techniques suitable for use in clinical laboratories, and therefore offering both the analytical sensitivity and specificity to detect low frequency SPMs at the required cost point, was the primary objective of the research outlined in this thesis.

The specific aim was to advance existing qPCR-based techniques, as clinical laboratories are generally already equipped with the necessary instrumentation and the laboratory personnel are well acquainted with this technique. qPCR-based assays are also appealing due to their relatively low costs and acceptable throughput. Moreover, since subsequent analyses of products of qPCR are generally not required, the risk of contamination is minimized.

The success of traditional DNA-based oligonucleotides commonly used as AS probes and AS primers in genotyping qPCR assays provided a foundation for this technology development program. However, detecting a rare SPM is considerably more challenging than the qPCR-based detection of higher frequency alleles, and demanded further refinement of AS probe or AS primer designs so as to achieve the required analytical sensitivity and specificity. These refinements were provided through selective replacement of

nucleotides with the nucleotide analog LNA. The previously documented properties of

LNA-DNA heteroduplexes, including improved duplex stability and enhanced intolerance to base pair mismatches, make them potentially attractive for improving the design of AS probes and primers directed at SPMs. That potential has been recognized by others and used to design AS primers substituted with one or a small number of LNAs, some of which proved effective in enhancing the specificity of AS primers.

However, guidelines for how best to introduce LNAs into AS probes or primers were lacking, highlighting the need to conduct studies aimed at improving our understanding of

the hybridization thermodynamics of LNA-DNA heteroduplexes. This in turn requires a

more detailed understanding of how changes in probe•template or primer•template duplex

structure mediated by LNA substitutions affect the interaction of Taq polymerase with the

complex and its subsequent hydrolysis or extension activity. Both studies were conducted to

provide fundamental knowledge useful for establishing reliable guidelines for designing

LNA-bearing AS probes or primers. At the outset of this thesis, two models had been

described that were capable of predicting hybridization thermodynamics of duplexes

containing LNA substitutions. However, those models suffer from a relatively high

standard deviation (± 5.0 °C) in predicted melting temperatures, are applicable to only a

limited range of solution conditions, and/or predict hybridization thermodynamics only for LNA-DNA

heteroduplexes involving mixmers (i.e. non-consecutive LNAs). Each of these limitations was a concern when considering the models as a starting point for AS probe and primer design.

Chapters 2 and 3 therefore describe a set of new models that improve the accuracy of melting thermodynamics predictions and expand the range of application to include duplexes either heavily substituted with LNA, having substituted LNAs adjacent to one another on a strand

(gapmers), or both. In part through a proper accounting of the non-zero heat

capacity change ΔCp associated with duplex melting, the LNA Single-Base Thermodynamic

(SBT) model presented in Chapter 3 took a unique approach to describing the incremental

changes in enthalpy (ΔΔH°LNA) and entropy (ΔΔS°LNA) associated with melting duplexes

containing LNA substitutions. This allowed accurate prediction of melting thermodynamics

for LNA-DNA heteroduplexes where one strand contains any pattern of LNA substitutions,

including mixmer, gapmer and highly or fully substituted sequences. The SBT model

requires only four parameters in addition to those used to predict the melting thermodynamics of the isosequential pure-DNA duplex with the model reported in Chapter 2.

Being based on a more mechanistically sound theory of melting, the SBT model is capable of predicting the Tm of mixmer LNA-DNA heteroduplexes with similar precision and accuracy

as the best of the previously reported models, the model of McTigue et al. (44), which

requires 64 parameters as opposed to 4.
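To illustrate why a non-zero ΔCp matters for Tm prediction, the sketch below solves the generic two-state melting condition for a non-self-complementary duplex when ΔH° and ΔS° are temperature dependent. It is not the SBT model and uses none of its parameters: the function name, the duplex-formation sign convention, equal strand concentrations, and the numerical values in the example are all assumptions made for illustration.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def melting_temperature(dh_ref, ds_ref, dcp, t_ref, c_total,
                        lo=273.15, hi=373.15, tol=1e-6):
    """Tm (K) of a two-state, non-self-complementary duplex.

    dh_ref, ds_ref : duplex-formation enthalpy (cal/mol) and entropy
                     (cal/(mol K)) at the reference temperature t_ref (K)
    dcp            : heat capacity change (cal/(mol K)), taken as constant
    c_total        : total strand concentration (M), equal strand amounts
    Solves dH(T) - T * (dS(T) + R ln(c_total / 4)) = 0 by bisection.
    """
    def residual(t):
        dh = dh_ref + dcp * (t - t_ref)            # Kirchhoff correction
        ds = ds_ref + dcp * math.log(t / t_ref)
        return dh - t * (ds + R * math.log(c_total / 4.0))

    f_lo = residual(lo)
    if f_lo * residual(hi) > 0:
        raise ValueError("no melting transition bracketed in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Hypothetical duplex: dH = -80 kcal/mol, dS = -220 cal/(mol K) at 37 C, 2 uM strands
tm_const = melting_temperature(-80000.0, -220.0, 0.0, 310.15, 2e-6)
tm_dcp = melting_temperature(-80000.0, -220.0, -500.0, 310.15, 2e-6)
print(f"Tm with dCp = 0:    {tm_const - 273.15:.1f} C")
print(f"Tm with dCp = -500: {tm_dcp - 273.15:.1f} C")
```

With ΔCp set to zero the solver reduces to the familiar closed form Tm = ΔH°/(ΔS° + R ln(Ct/4)). With a non-zero ΔCp the predicted Tm depends on how far the reference temperature lies from the transition.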

Following publication of the SBT model (199) described in Chapter 3, an extension to

the McTigue model was published by Owczarzy et al. (216) which allows for prediction of

melting thermodynamics of LNA gapmers as well as mixmers. The accuracy of that model

with respect to predicted melting temperatures is similar to that of the SBT model. However,

in addition to the 64 parameters required by the original model of McTigue et al., Owczarzy

et al.’s extension requires 32 LNA NN parameters specific to the prediction of hybridization

thermodynamics of nearest neighbors comprised of consecutive LNA•DNA base pairs (i.e.,

gapmers). A preliminary comparison, yet to be published, between the McTigue-Owczarzy

linear LNA NNT model and the nonlinear LNA SBT shows that the accuracy and precision

of Tm and ΔTm predictions for DNA-LNA heteroduplexes containing gapmers are similar despite the significant difference in parameters required (96 parameters versus 4 parameters).

The structure of the LNA SBT model assumes that the substitution of LNAs into a

DNA duplex has minimal net effect on base stacking forces. The stabilizing effect is therefore largely isolated to within the nucleoside and the modified base pair. However, though not addressed in this thesis, melting thermodynamic data collected as part of this work indicate that there may be some exceptions to this rule. In particular, significant absolute error (> 5 °C) in Tm and ∆Tm predictions using either the LNA SBT or the McTigue-

Owczarzy LNA NNT model is observed for LNA-DNA heteroduplexes consisting of

repetitive LNA pyrimidine or purine substitutions. Theoretically, a model that explicitly

considers nearest-neighbor interactions between LNA substituted base pairs such as the

McTigue-Owczarzy LNA NNT model should capture these effects. By design, the LNA

SBT model will not in its current form. Nevertheless, both models predict Tm and ∆Tm of duplexes containing stretches of consecutive LNA pyrimidine or purine substitutions with equal accuracy, suggesting that the mechanism for the unusual stabilities of this class of

heteroduplexes is not understood. A potentially fruitful avenue for future research is therefore to investigate the hybridization thermodynamics of homopurine and homopyrimidine LNA-DNA heteroduplexes using the approach described in Chapter 3, in part to allow additional parameters to be regressed for an extended form of the LNA SBT model that captures the unique stacking effects of this subset of sequences and thereby improves predictions.

One of the appeals of basing the LNA SBT on the “unified” NNT model is that, owing to the success of this model, substantial progress has already been made to extend the model to

allow for prediction of hybridization thermodynamics at a variety of different solution

conditions (51-53), as well as to account for the effect of mismatches (116) or

oligonucleotide modifications such as terminal 5’ fluorophores and 3’ quenchers (201).

These benefits were exploited in Chapter 4 to extend the LNA SBT to permit accurate

prediction of hybridization thermodynamics of LNA-DNA heteroduplexes formed between

hydrolysis probes and either the target or non-target allele at PCR buffer conditions. This

then allowed for the discovery that hydrolysis probe performance in specifically detecting a

rare SPM-bearing allele in a background of WT allele is ultimately not limited by the

selectivity of the probe in annealing to the target template, but rather by more general limitations

to qPCR when applied to mixtures of templates sharing nearly identical sequence. This work therefore highlights the need to develop alternative techniques offering greater analytical

specificity, such as the creation of AS primers that selectively amplify only the target of interest (i.e. the lower frequency MT template bearing the SPM).

Although AS primers bearing a terminal 3’ LNA substitution have previously been described and used to improve analytical specificity for a template containing a single base difference, the performance attributes of that technology are highly variable (41-43,157,161).

Chapter 5 therefore focused on combining predictions of melting thermodynamics with an investigation of the impact of LNA substitutions and substitution patterns at or near the 3’

end of a primer on the extension activity of Taq polymerase. The combined results served as

a basis for designing new AS primers capable of selective amplification of either a MT or a

WT template. Insights from this study thereby provided the foundation to develop a number

of unique LNA substitution patterns potentially capable of unequivocal SPM detection.

Indeed, one of the AS LNA primers demonstrating unequivocal detection of the SPM JAK2

V617F has been coupled to Plexor detection technology and is now applied to patient samples at the BCCA’s Clinical Cancer Genetics Laboratory. At the time of writing, AS

LNA primers designed using the guidelines reported in Chapter 5 and directed toward BRAF

V600E were also being evaluated by the BCCA for clinical use. This translation into clinical practice confirms that the design guidelines presented in Chapter 5 may be used to generate

AS LNA primers targeting clinically important SPMs that offer exceptional performance attributes and acceptable price points. A primary focus of future efforts will therefore be to apply this method to design AS primers directed against a broad range of SPMs relevant to cancer diagnoses and setting of treatment regimens.

In this area, some exciting medical opportunities are presented by the extraordinary analytical specificities and sensitivities that can be achieved using the AS primer designs

(and associated guidelines) reported in Chapter 5. For example, the favorable performance attributes of this technology, particularly when applied to samples collected from cancer patients, should improve both early disease detection and the monitoring of minimal residual disease (MMRD) after treatment. Following treatment where the cancer goes into remission, the detection of low frequency SPMs, either previously identified or newly acquired by the specific cancer, may indicate disease relapse or progression. An ability to detect this may therefore allow appropriate treatment to be implemented early, ideally sending the cancer back into remission and improving patient outcome. With respect to early cancer detection, the ability to identify either rare circulating tumor cells (CTC) or circulating cell-free DNA

(cfDNA) containing clinically important SPMs represents a significant challenge due to the extremely low concentrations of CTC or cfDNA typically present. Establishment of a

cfDNA/CTC detection technique offering excellent analytical sensitivity and specificity may therefore represent an ultimate challenge and application for AS LNA primer technology.

References

1. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860-921.
2. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304-1351.
3. Human Genome Sequencing, C. (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931-945.
4. The International HapMap Consortium. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851-861.
5. Forbes, S.A., Bhamra, G., Bamford, S., Dawson, E., Kok, C., Clements, J., Menzies, A., Teague, J.W., Futreal, P.A. and Stratton, M.R. (2008) The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet, Chapter 10, Unit 10 11.
6. Forbes, S.A., Bindal, N., Bamford, S., Cole, C., Kok, C.Y., Beare, D., Jia, M., Shepherd, R., Leung, K., Menzies, A. et al. (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research, 39, D945-D950.
7. Stratton, M.R. (2011) Exploring the Genomes of Cancer Cells: Progress and Promise. Science, 331, 1553-1558.
8. Stratton, M.R., Campbell, P.J. and Futreal, P.A. (2009) The cancer genome. Nature, 458, 719-724.
9. Blank, P.R., Moch, H., Szucs, T.D. and Schwenkglenks, M. (2011) KRAS and BRAF mutation analysis in metastatic colorectal cancer: a cost-effectiveness analysis from a Swiss perspective. Clin Cancer Res, 17, 6338-6346.
10. Marsh, S. and McLeod, H.L. (2006) Pharmacogenomics: from bedside to clinical practice. Hum Mol Genet, 15 Spec No 1, R89-93.
11. Stuart, D. and Sellers, W.R. (2009) Linking somatic genetic alterations in cancer to therapeutics. Curr Opin Cell Biol, 21, 304-310.
12. Swen, J.J., Huizinga, T.W., Gelderblom, H., de Vries, E.G., Assendelft, W.J., Kirchheiner, J. and Guchelaar, H.J. (2007) Translating pharmacogenomics: challenges on the road to the clinic. PLoS Med, 4, e209.
13. Vincent, M.D., Kuruvilla, M.S., Leighl, N.B. and Kamel-Reid, S. (2012) Biomarkers that currently affect clinical practice: EGFR, ALK, MET, KRAS. Current Oncology, 19.
14. Wood, L.D., Parsons, D.W., Jones, S., Lin, J., Sjoblom, T., Leary, R.J., Shen, D., Boca, S.M., Barber, T., Ptak, J. et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science, 318, 1108-1113.
15. Arkenau, H.T., Kefford, R. and Long, G.V. Targeting BRAF for patients with melanoma. Br J Cancer, 104, 392-398.
16. Ascierto, P.A., Kirkwood, J.M., Grob, J.J., Simeone, E., Grimaldi, A.M., Maio, M., Palmieri, G., Testori, A., Marincola, F.M. and Mozzillo, N. (2012) The role of BRAF V600 mutation in melanoma. J Transl Med, 10, 85.
17. Wilson, V.L., Wei, Q., Wade, K.R., Chisa, M., Bailey, D., Kanstrup, C.M., Yin, X., Jackson, C.M., Thompson, B. and Lee, W.R. (1999) Needle-in-a-haystack detection and identification of base substitution mutations in human tissues. Mutat Res, 406, 79-100.
18. Kwok, P.Y. (2000) Finding a needle in a haystack: detection and quantification of rare mutant alleles are coming of age. Clin Chem, 46, 593-594.
19. Ledford, H. (2010) Big science: The cancer genome challenge. Nature, 464, 972-974.
20. Hert, D.G., Fredlake, C.P. and Barron, A.E. (2008) Advantages and limitations of next-generation sequencing technologies: A comparison of electrophoresis and non-electrophoresis methods. ELECTROPHORESIS, 29, 4618-4626.
21. Kircher, M. and Kelso, J. (2010) High-throughput DNA sequencing – concepts and limitations. BioEssays, 32, 524-536.
22. Niedringhaus, T.P., Milanova, D., Kerby, M.B., Snyder, M.P. and Barron, A.E. (2011) Landscape of Next-Generation Sequencing Technologies. Analytical Chemistry, 83, 4327-4341.
23. Tsongalis, G.J. and Coleman, W.B. (2006) Clinical genotyping: the need for interrogation of single nucleotide polymorphisms and mutations in the clinical laboratory. Clin Chim Acta, 363, 127-137.
24. Bustin, S.A., Benes, V., Garson, J.A., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M.W., Shipley, G.L. et al. (2009) The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem, 55, 611-622.
25. Saah, A.J. and Hoover, D.R. (1997) "Sensitivity" and "Specificity" Reconsidered: The Meaning of These Terms in Analytical and Diagnostic Settings. Annals of Internal Medicine, 126, 91-94.
26. Germer, S., Holland, M.J. and Higuchi, R. (2000) High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res, 10, 258-266.
27. Livak, K.J. (1999) Allelic discrimination using fluorogenic probes and the 5' nuclease assay. Genet Anal, 14, 143-149.
28. Oliver, D.H., Thompson, R.E., Griffin, C.A. and Eshleman, J.R. (2000) Use of Single Nucleotide Polymorphisms (SNP) and Real-Time Polymerase Chain Reaction for Bone Marrow Engraftment Analysis. The Journal of Molecular Diagnostics, 2, 202-208.
29. Kang, H.Y., Hwang, J.Y., Kim, S.H., Goh, H.G., Kim, M. and Kim, D.W. (2006) Comparison of allele specific oligonucleotide-polymerase chain reaction and direct sequencing for high throughput screening of ABL kinase domain mutations in chronic myeloid leukemia resistant to imatinib. Haematologica, 91, 659-662.
30. Ellison, G., Donald, E., McWalter, G., Knight, L., Fletcher, L., Sherwood, J., Cantarini, M., Orr, M. and Speake, G. (2010) A comparison of ARMS and DNA sequencing for mutation analysis in clinical biopsy samples. Journal of Experimental & Clinical Cancer Research, 29, 132.
31. Hodgson, D.R., Foy, C.A., Partridge, M., Pateromichelakis, S. and Gibson, N.J. (2002) Development of a facile fluorescent assay for the detection of 80 mutations within the p53 gene. Mol Med, 8, 227-237.
32. Ogasawara, N., Bando, H., Kawamoto, Y., Yoshino, T., Tsuchihara, K., Ohtsu, A. and Esumi, H. (2010) Feasibility and robustness of amplification refractory mutation system (ARMS)-based KRAS testing using clinically available formalin-fixed, paraffin-embedded samples of colorectal cancers. Jpn J Clin Oncol, 41, 52-56.
33. Vannucchi, A.M., Pancrazzi, A., Bogani, C., Antonioli, E. and Guglielmelli, P. (2006) A quantitative assay for JAK2(V617F) mutation in myeloproliferative disorders by ARMS-PCR and capillary electrophoresis. Leukemia, 20, 1055-1060.
34. Vester, B. and Wengel, J. (2004) LNA (locked nucleic acid): high-affinity targeting of complementary RNA and DNA. Biochemistry, 43, 13233-13241.
35. Markova, J., Prukova, D., Volkova, Z. and Schwarz, J. (2007) A new allelic discrimination assay using locked nucleic acid-modified nucleotides (LNA) probes for detection of JAK2 V617F mutation. Leuk Lymphoma, 48, 636-639.
36. Ugozzoli, L.A., Latorra, D., Puckett, R., Arar, K. and Hamby, K. (2004) Real-time genotyping with oligonucleotide probes containing locked nucleic acids. Anal Biochem, 324, 143-152.
37. Letertre, C., Perelle, S., Dilasser, F., Arar, K. and Fach, P. (2003) Evaluation of the performance of LNA and MGB probes in 5'-nuclease PCR assays. Mol Cell Probes, 17, 307-311.
38. You, Y., Moreira, B.G., Behlke, M.A. and Owczarzy, R. (2006) Design of LNA probes that improve mismatch discrimination. Nucleic Acids Res, 34, e60.
39. Johnson, M.P., Haupt, L.M. and Griffiths, L.R. (2004) Locked nucleic acid (LNA) single nucleotide polymorphism (SNP) genotype analysis and validation using real-time PCR. Nucleic Acids Res, 32, e55.
40. Costa, J.M., Ernault, P., Olivi, M., Gaillon, T. and Arar, K. (2004) Chimeric LNA/DNA probes as a detection system for real-time PCR. Clin Biochem, 37, 930-932.
41. Latorra, D., Campbell, K., Wolter, A. and Hurley, J.M. (2003) Enhanced allele-specific PCR discrimination in SNP genotyping using 3′ locked nucleic acid (LNA) primers. Human Mutation, 22, 79-85.
42. Latorra, D., Hopkins, D., Campbell, K. and Hurley, J.M. (2003) Multiplex allele-specific PCR with optimized locked nucleic acid primers. Biotechniques, 34, 1150-1152, 1154, 1158.
43. Maertens, O., Legius, E., Speleman, F., Messiaen, L. and Vandesompele, J. (2006) Real-time quantitative allele discrimination assay using 3' locked nucleic acid primers for detection of low-percentage mosaic mutations. Anal Biochem, 359, 144-146.
44. McTigue, P.M., Peterson, R.J. and Kahn, J.D. (2004) Sequence-dependent thermodynamic parameters for locked nucleic acid (LNA)-DNA duplex formation. Biochemistry, 43, 5388-5405.
45. Tolstrup, N., Nielsen, P.S., Kolberg, J.G., Frankel, A.M., Vissing, H. and Kauppinen, S. (2003) OligoDesign: Optimal design of LNA (locked nucleic acid) oligonucleotide capture probes for gene expression profiling. Nucleic Acids Res, 31, 3758-3762.
46. Di Giusto, D.A. and King, G.C. (2004) Strong positional preference in the interaction of LNA oligonucleotides with DNA polymerase and proofreading exonuclease activities: implications for genotyping assays. Nucleic Acids Res, 32, e32.
47. Jensen, G.A., Singh, S.K., Kumar, R., Wengel, J. and Jacobsen, J.P. (2001) A comparison of the solution structures of an LNA:DNA duplex and the unmodified DNA:DNA duplex. Journal of the Chemical Society, Perkin Transactions 2, 1224-1232.

48. Nielsen, K.E., Rasmussen, J., Kumar, R., Wengel, J., Jacobsen, J.P. and Petersen, M. (2004) NMR studies of fully modified locked nucleic acid (LNA) hybrids: solution structure of an LNA:RNA hybrid and characterization of an LNA:DNA hybrid. Bioconjug Chem, 15, 449-457.
49. Nielsen, K.E., Singh, S.K., Wengel, J. and Jacobsen, J.P. (2000) Solution structure of an LNA hybridized to DNA: NMR study of the d(CT(L)GCT(L)T(L)CT(L)GC):d(GCAGAAGCAG) duplex containing four locked nucleotides. Bioconjug Chem, 11, 228-238.
50. Petersen, M., Nielsen, C.B., Nielsen, K.E., Jensen, G.A., Bondensgaard, K., Singh, S.K., Rajwanshi, V.K., Koshkin, A.A., Dahl, B.M., Wengel, J. et al. (2000) The conformations of locked nucleic acids (LNA). J Mol Recognit, 13, 44-53.
51. Owczarzy, R., Moreira, B.G., You, Y., Behlke, M.A. and Walder, J.A. (2008) Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations. Biochemistry, 47, 5336-5353.
52. Owczarzy, R., You, Y., Moreira, B.G., Manthey, J.A., Huang, L., Behlke, M.A. and Walder, J.A. (2004) Effects of sodium ions on DNA duplex oligomers: improved predictions of melting temperatures. Biochemistry, 43, 3537-3554.
53. von Ahsen, N., Wittwer, C.T. and Schutz, E. (2001) Oligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg(2+), deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem, 47, 1956-1961.
54. Watson, J.D. and Crick, F.H. (1953) Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 171, 737-738.
55. Franklin, R.E. and Gosling, R.G. (1953) Molecular configuration in sodium thymonucleate. Nature, 171, 740-741.
56. Wilkins, M.H., Stokes, A.R. and Wilson, H.R. (1953) Molecular structure of deoxypentose nucleic acids. Nature, 171, 738-740.
57. Bloomfield, V.A., Crothers, D.M. and Tinoco, I. (2000) Nucleic acids: structures, properties, and functions. University Science Books, Sausalito, Calif.
58. Marmur, J. and Doty, P. (1962) Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. J Mol Biol, 5, 109-118.
59. Aboul-ela, F., Koh, D., Tinoco, I. and Martin, F.H. (1985) Base-base mismatches. Thermodynamics of double helix formation for dCA3XA3G + dCT3YT3G (X, Y = A,C,G,T). Nucleic Acids Research, 13, 4811-4824.
60. Allawi, H.T. and SantaLucia, J., Jr. (1997) Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry, 36, 10581-10594.
61. Breslauer, K.J., Frank, R., Blocker, H. and Marky, L.A. (1986) Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A, 83, 3746-3750.
62. Owczarzy, R., Vallone, P.M., Gallo, F.J., Paner, T.M., Lane, M.J. and Benight, A.S. (1997) Predicting sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44, 217-239.
63. SantaLucia, J., Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A, 95, 1460-1465.

64. SantaLucia, J., Jr., Allawi, H.T. and Seneviratne, P.A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555-3562.
65. Sugimoto, N., Nakano, S., Yoneyama, M. and Honda, K. (1996) Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res, 24, 4501-4505.
66. Arnott, S. and Hukins, D.W.L. (1972) Optimised parameters for A-DNA and B-DNA. Biochemical and Biophysical Research Communications, 47, 1504-1509.
67. Wing, R., Drew, H., Takano, T., Broka, C., Tanaka, S., Itakura, K. and Dickerson, R.E. (1980) Crystal structure analysis of a complete turn of B-DNA. Nature, 287, 755-758.
68. Benevides, J.M. and Thomas, G.J. (1983) Characterization of DNA structures by Raman spectroscopy: high-salt and low-salt forms of double helical poly(dG-dC) in H2O and D2O solutions and application to B, Z and A-DNA. Nucleic Acids Research, 11, 5747-5761.
69. Fodor, S.P.A. and Spiro, T.G. (1986) Ultraviolet resonance Raman spectroscopy of DNA with 200-266-nm laser excitation. Journal of the American Chemical Society, 108, 3198-3205.
70. Hare, D.R., Wemmer, D.E., Chou, S.-H., Drobny, G. and Reid, B.R. (1983) Assignment of the non-exchangeable proton resonances of d(C-G-C-G-A-A-T-T-C-G-C-G) using two-dimensional nuclear magnetic resonance methods. Journal of Molecular Biology, 171, 319-336.
71. Privé, G.G., Heinemann, U., Chandrasegaran, S., Kan, L.-S., Kopka, M.L. and Dickerson, R.E. (1987) Helix Geometry, Hydration, and G-A Mismatch in a B-DNA Decamer. Science, 238, 498-504.
72. Petersheim, M. and Turner, D.H. (1983) Base-stacking and base-pairing contributions to helix stability: thermodynamics of double-helix formation with CCGG, CCGGp, CCGGAp, ACCGGp, CCGGUp, and ACCGGUp. Biochemistry, 22, 256-263.
73. Holbrook, J.A., Capp, M.W., Saecker, R.M. and Record, M.T., Jr. (1999) Enthalpy and heat capacity changes for formation of an oligomeric DNA duplex: interpretation in terms of coupled processes of formation and association of single-stranded helices. Biochemistry, 38, 8409-8422.
74. Jelesarov, I., Crane-Robinson, C. and Privalov, P.L. (1999) The energetics of HMG box interactions with DNA: thermodynamic description of the target DNA duplexes. J Mol Biol, 294, 981-995.
75. Guckian, K.M., Schweitzer, B.A., Ren, R.X.F., Sheils, C.J., Tahmassebi, D.C. and Kool, E.T. (2000) Factors Contributing to Aromatic Stacking in Water: Evaluation in the Context of DNA. Journal of the American Chemical Society, 122, 2213-2222.
76. Dixit, S.B., Beveridge, D.L., Case, D.A., Cheatham, T.E., 3rd, Giudice, E., Lankas, F., Lavery, R., Maddocks, J.H., Osman, R., Sklenar, H. et al. (2005) Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. II: sequence context effects on the dynamical structures of the 10 unique dinucleotide steps. Biophys J, 89, 3721-3740.
77. Tinoco, I. (1960) Hypochromism in Polynucleotides. Journal of the American Chemical Society, 82, 4785-4790.

78. Breslauer, K.J. (1995) Extracting thermodynamic data from equilibrium melting curves for oligonucleotide order-disorder transitions. Methods Enzymol, 259, 221-242.
79. Cantor, C.R. and Schimmel, P.R. (1980) Biophysical chemistry, Part III: The Behavior of Biological Macromolecules. W.H. Freeman, San Francisco.
80. Mergny, J.L. and Lacroix, L. (2003) Analysis of thermal melting curves. Oligonucleotides, 13, 515-537.
81. Owczarzy, R. (2005) Melting temperatures of nucleic acids: discrepancies in analysis. Biophys Chem, 117, 207-215.
82. Biltonen, R.L. and Freire, E. (1978) Thermodynamic characterization of conformational states of biological macromolecules using differential scanning calorimetry. CRC Crit Rev Biochem, 5, 85-124.
83. Privalov, G.P. and Privalov, P.L. (2000) Problems and prospects in microcalorimetry of biological macromolecules. Methods Enzymol, 323, 31-62.
84. Freire, E. (1995) Differential scanning calorimetry. Methods Mol Biol, 40, 191-218.
85. Bruylants, G., Boccongelli, M., Snoussi, K. and Bartik, K. (2009) Comparison of the thermodynamics and base-pair dynamics of a full LNA:DNA duplex and of the isosequential DNA:DNA duplex. Biochemistry, 48, 8473-8482.
86. Chalikian, T.V., Volker, J., Plum, G.E. and Breslauer, K.J. (1999) A more unified picture for the thermodynamics of nucleic acid duplex melting: a characterization by calorimetric and volumetric techniques. Proc Natl Acad Sci U S A, 96, 7853-7858.
87. Mikulecky, P.J. and Feig, A.L. (2006) Heat capacity changes associated with DNA duplex formation: salt- and sequence-dependent effects. Biochemistry, 45, 604-616.
88. Mikulecky, P.J. and Feig, A.L. (2006) Heat capacity changes associated with nucleic acid folding. Biopolymers, 82, 38-58.
89. Tikhomirova, A., Beletskaya, I.V. and Chalikian, T.V. (2006) Stability of DNA duplexes containing GG, CC, AA, and TT mismatches. Biochemistry, 45, 10563-10571.
90. Tikhomirova, A., Taulier, N. and Chalikian, T.V. (2004) Energetics of nucleic acid stability: the effect of DeltaCP. J Am Chem Soc, 126, 16387-16394.
91. Prabhu, N.V. and Sharp, K.A. (2005) Heat Capacity in Proteins. Annual Review of Physical Chemistry, 56, 521-548.
92. Ramprakash, J., Lang, B. and Schwarz, F.P. (2008) Thermodynamics of single strand DNA base stacking. Biopolymers, 89, 969-979.
93. Zhou, J., Gregurick, S.K., Krueger, S. and Schwarz, F.P. (2006) Conformational changes in single-strand DNA as a function of temperature by SANS. Biophys J, 90, 544-551.
94. Mrevlishvili, G.M., Carvalho, A.P.S.M.C. and Ribeiro da Silva, M.A.V. (2002) Low-temperature DSC study of the hydration of ss-DNA and ds-DNA and the role of hydrogen-bonded network to the duplex transition thermodynamics. Thermochimica Acta, 394, 73-82.
95. Spink, C.H. and Chaires, J.B. (1998) Effects of Hydration, Ion Release, and Excluded Volume on the Melting of Triplex and Duplex DNA. Biochemistry, 38, 496-508.
96. Madan, B. and Sharp, K.A. (2001) Hydration Heat Capacity of Nucleic Acid Constituents Determined from the Random Network Model. Biophysical journal, 81, 1881-1887.

178 97. Gallagher, K. and Sharp, K. (1998) Electrostatic Contributions to Heat Capacity Changes of DNA-Ligand Binding. Biophysical journal, 75, 769-776. 98. Plotnikov, V.V., Brandts, J.M., Lin, L.N. and Brandts, J.F. (1997) A new ultrasensitive scanning calorimeter. Anal Biochem, 250, 237-244. 99. Freier, S.M., Alkema, D., Sinclair, A., Neilson, T. and Turner, D.H. (1985) Contributions of dangling end stacking and terminal base-pair formation to the stabilities of XGGCCp, XCCGGp, XGGCCYp, and XCCGGYp helixes. Biochemistry, 24, 4533-4539. 100. Turner, D.H., Sugimoto, N., Kierzek, R. and Dreiker, S.D. (1987) Free energy increments for hydrogen bonds in nucleic acid base pairs. Journal of the American Chemical Society, 109, 3783-3785. 101. Gray, D.M. (1997) Derivation of nearest-neighbor properties from data on nucleic acid oligomers. II. Thermodynamic parameters of DNA.RNA hybrids and DNA duplexes. Biopolymers, 42, 795-810. 102. Gray, D.M. (1997) Derivation of nearest-neighbor properties from data on nucleic acid oligomers. I. Simple sets of independent sequences and the influence of absent nearest neighbors. Biopolymers, 42, 783-793. 103. Rouzina, I. and Bloomfield, V.A. (1999) Heat capacity effects on the melting of DNA. 2. Analysis of nearest-neighbor base pair effects. Biophys J, 77, 3252-3255. 104. Turner, D.H. (1996) Thermodynamics of base pairing. Curr Opin Struct Biol, 6, 299- 304. 105. von Ahsen, N., Oellerich, M., Armstrong, V.W. and Schutz, E. (1999) Application of a thermodynamic nearest-neighbor model to estimate nucleic acid stability and optimize probe design: prediction of melting points of multiple mutations of apolipoprotein B-3500 and factor V with a hybridization probe genotyping assay on the LightCycler. Clin Chem, 45, 2094-2101. 106. Tøstesen, E., Liu, F., Jenssen, T.K. and Hovig, E. (2003) Speed-up of DNA melting algorithm with complete nearest neighbor properties. Biopolymers, 70, 364-376. 107. Fisher, M.E. 
(1966) Effect of Excluded Volume on Phase Transitions in Biopolymers. The Journal of Chemical Physics, 45, 1469-1473. 108. Causo, M.S., Coluzzi, B. and Grassberger, P. (2000) Simple model for the DNA denaturation transition. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 62, 3958-3973. 109. Gray, D.M. and Tinoco, I., Jr. (1970) A new approach to the study of sequence- dependent properties of polynucleotides. Biopolymers, 9, 223-244. 110. Bailey, W.F. and Monahan, A.S. (1978) Statistical effects and the evaluation of entropy differences in equilibrium processes. Symmetry corrections and entropy of mixing. Journal of Chemical Education, 55, 489. 111. Rychlik, W., Spencer, W.J. and Rhoads, R.E. (1990) Optimization of the annealing temperature for DNA amplification in vitro. Nucleic Acids Res, 18, 6409-6412. 112. Rouillard, J.M., Zuker, M. and Gulari, E. (2003) OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res, 31, 3057-3062. 113. Matveeva, O.V., Shabalina, S.A., Nemtsov, V.A., Tsodikov, A.D., Gesteland, R.F. and Atkins, J.F. (2003) Thermodynamic calculations and statistical correlations for oligo-probes design. Nucleic Acids Res, 31, 4211-4217.

179 114. Matveeva, O.V., Mathews, D.H., Tsodikov, A.D., Shabalina, S.A., Gesteland, R.F., Atkins, J.F. and Freier, S.M. (2003) Thermodynamic criteria for high hit rate antisense oligonucleotide design. Nucleic Acids Res, 31, 4989-4994. 115. Chavali, S., Mahajan, A., Tabassum, R., Maiti, S. and Bharadwaj, D. (2005) Oligonucleotide properties determination and primer designing: a critical examination of predictions. Bioinformatics, 21, 3918-3925. 116. SantaLucia, J., Jr. and Hicks, D. (2004) The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct, 33, 415-440. 117. Bergstrom, D.E. (2001) Unnatural nucleosides with unusual base pairing properties. Curr Protoc Nucleic Acid Chem, Chapter 1, Unit 1 4. 118. Kool, E.T. (1997) Preorganization of DNA: Design Principles for Improving Nucleic Acid Recognition by Synthetic Oligonucleotides. Chem Rev, 97, 1473-1488. 119. K. Singh, S., A. Koshkin, A., Wengel, J. and Nielsen, P. (1998) LNA (locked nucleic acids): synthesis and high-affinity nucleic acid recognition. Chemical Communications, 0, 455-456. 120. Koshkin, A.A., Singh, S.K., Nielsen, P., Rajwanshi, V.K., Kumar, R., Meldgaard, M., Olsen, C.E. and Wengel, J. (1998) LNA (Locked Nucleic Acids): Synthesis of the adenine, cytosine, guanine, 5-methylcytosine, thymine and uracil bicyclonucleoside monomers, oligomerisation, and unprecedented nucleic acid recognition. Tetrahedron, 54, 3607-3630. 121. Obika, S., Nanbu, D., Hari, Y., Morio, K.-i., In, Y., Ishida, T. and Imanishi, T. (1997) Synthesis of 2'-O,4'-C-methyleneuridine and -cytidine. Novel bicyclic nucleosides having a fixed C3, -endo sugar puckering. Tetrahedron Letters, 38, 8735-8738. 122. Obika, S., Nanbu, D., Hari, Y., Andoh, J.-i., Morio, K.-i., Doi, T. and Imanishi, T. (1998) Stability and structural features of the duplexes containing nucleoside analogues with a fixed N-type conformation, 2'-O,4'-C-methyleneribonucleosides. Tetrahedron Letters, 39, 5401-5404. 123. 
Koshkin, A.A., Nielsen, P., Meldgaard, M., Rajwanshi, V.K., Singh., S.K. and Wengel, J. (1998) LNA (Locked Nucleic Acid): An RNA Mimic Forming Exceedingly Stable LNA:DNA Duplexes. Journal of the American Chemical Society, 120, 13252-13253. 124. Lauritsen, A. and Wengel, J. (2002) Oligodeoxynucleotides containing amide-linked LNA-type dinucleotides: synthesis and high-affinity nucleic acid hybridization. Chem Commun (Camb), 530-531. 125. Singh, S.K., Nielsen, P., Koshkin, A.A. and Wengel, J. (1998) LNA (Locked nucelic acids): synthesis and high-affinity nucleic acid recognition. Chemical Communications, 455-456. 126. Koshkin, A.A., Nielsen, P., Meldgaard, M., Rajwanshi, V.K., Singh, S.K. and Wengel, J. (1998) LNA (Locked Nucleic Acid): An RNA Mimic Forming Exceedingly Stable LNA:LNA Duplexes. Journal of the American Chemical Society, 120, 13252-13253. 127. Singh, S.K., Koshkin, A.A., Wengel, J. and Nielsen, P. (1998) LNA (locked nucleic acids): synthesis and high-affinity nucleic acid recognition. Chemical Communications, 0, 455-456. 128. Kaur, H., Babu, B.R. and Maiti, S. (2007) Perspectives on chemistry and therapeutic applications of Locked Nucleic Acid (LNA). Chem Rev, 107, 4672-4697.

180 129. Petersen, M. and Wengel, J. (2003) LNA: a versatile tool for therapeutics and . Trends Biotechnol, 21, 74-81. 130. . Proligo. 131. Frieden, M., Hansen, H.F. and Koch, T. (2003) Nuclease stability of LNA oligonucleotides and LNA-DNA chimeras. Nucleosides Nucleotides Nucleic Acids, 22, 1041-1043. 132. Braasch, D.A. and Corey, D.R. (2001) Locked nucleic acid (LNA): fine-tuning the recognition of DNA and RNA. Chem Biol, 8, 1-7. 133. Kongsbak, L. (2002). Exiqon, Vedbaek, Denmark, pp. 1. 134. Christensen, U., Jacobsen, N., Rajwanshi, V.K., Wengel, J. and Koch, T. (2001) Stopped-flow kinetics of locked nucleic acid (LNA)-oligonucleotide duplex formation: studies of LNA-DNA and DNA-DNA interactions. Biochem J, 354, 481- 484. 135. Kaur, H., Arora, A., Wengel, J. and Maiti, S. (2006) Thermodynamic, counterion, and hydration effects for the incorporation of locked nucleic acid nucleotides into DNA duplexes. Biochemistry, 45, 7347-7355. 136. Wu, P., Nakano, S. and Sugimoto, N. (2002) Temperature dependence of thermodynamic properties for DNA/DNA and RNA/DNA duplex formation. Eur J Biochem, 269, 2821-2830. 137. Rouzina, I. and Bloomfield, V.A. (1999) Heat capacity effects on the melting of DNA. 1. General aspects. Biophys J, 77, 3242-3251. 138. Higuchi, R., Fockler, C., Dollinger, G. and Watson, R. (1993) Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology (N Y), 11, 1026- 1030. 139. Klein, D. (2002) Quantification using real-time PCR technology: applications and limitations. Trends Mol Med, 8, 257-260. 140. Mackay, I.M. (2004) Real-time PCR in the microbiology laboratory. Clin Microbiol Infect, 10, 190-212. 141. Ginzinger, D.G. (2002) Gene quantification using real-time quantitative PCR: an emerging technology hits the mainstream. Exp Hematol, 30, 503-512. 142. Livak, K.J., Flood, S.J., Marmaro, J., Giusti, W. and Deetz, K. 
(1995) Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl, 4, 357-362. 143. Orlando, C., Pinzani, P. and Pazzagli, M. (1998) Developments in quantitative PCR. Clin Chem Lab Med, 36, 255-269. 144. Juskowiak, B. (2011) Nucleic acid-based fluorescent probes and their analytical potential. Anal Bioanal Chem, 399, 3157-3176. 145. Kutyavin, I.V., Afonina, I.A., Mills, A., Gorn, V.V., Lukhtanov, E.A., Belousov, E.S., Singer, M.J., Walburger, D.K., Lokhov, S.G., Gall, A.A. et al. (2000) 3'-minor groove binder-DNA probes increase sequence specificity at PCR extension temperatures. Nucleic Acids Res, 28, 655-661. 146. Allawi, H.T. and SantaLucia, J., Jr. (1998) Thermodynamics of internal C.T mismatches in DNA. Nucleic Acids Res, 26, 2694-2701. 147. Allawi, H.T. and SantaLucia, J., Jr. (1998) Nearest neighbor thermodynamic parameters for internal G.A mismatches in DNA. Biochemistry, 37, 2170-2179.

181 148. Peyret, N., Seneviratne, P.A., Allawi, H.T. and SantaLucia, J., Jr. (1999) Nearest- neighbor thermodynamics and NMR of DNA sequences with internal A.A, C.C, G.G, and T.T mismatches. Biochemistry, 38, 3468-3477. 149. Kwok, S., Chang, S.Y., Sninsky, J.J. and Wang, A. (1994) A guide to the design and use of mismatched and degenerate primers. PCR Methods Appl, 3, S39-47. 150. Newton, C.R., Graham, A., Heptinstall, L.E., Powell, S.J., Summers, C., Kalsheker, N., Smith, J.C. and Markham, A.F. (1989) Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Res, 17, 2503- 2516. 151. Huang, M.M., Arnheim, N. and Goodman, M.F. (1992) Extension of base mispairs by Taq DNA polymerase: implications for single nucleotide discrimination in PCR. Nucleic Acids Res, 20, 4567-4573. 152. Petruska, J., Goodman, M.F., Boosalis, M.S., Sowers, L.C., Cheong, C. and Tinoco, I., Jr. (1988) Comparison between DNA melting thermodynamics and DNA polymerase fidelity. Proc Natl Acad Sci U S A, 85, 6252-6256. 153. Eckert, K.A. and Kunkel, T.A. (1990) High fidelity DNA synthesis by the Thermus aquaticus DNA polymerase. Nucleic Acids Res, 18, 3739-3744. 154. Li, M., Diehl, F., Dressman, D., Vogelstein, B. and Kinzler, K.W. (2006) BEAMing up for detection and quantification of rare sequence variants. Nat Methods, 3, 95-97. 155. Ayyadevara, S., Thaden, J.J. and Shmookler Reis, R.J. (2000) Discrimination of primer 3'-nucleotide mismatch by taq DNA polymerase during polymerase chain reaction. Anal Biochem, 284, 11-18. 156. Kwok, S., Kellogg, D.E., McKinney, N., Spasic, D., Goda, L., Levenson, C. and Sninsky, J.J. (1990) Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies. Nucleic Acids Res, 18, 999-1005. 157. Fang, J., Wichroski, M.J., Levine, S.M., Baldick, C.J., Mazzucco, C.E., Walsh, A.W., Kienzle, B.K., Rose, R.E., Pokornowski, K.A., Colonno, R.J. et al. 
(2009) Ultrasensitive genotypic detection of antiviral resistance in hepatitis B virus clinical isolates. Antimicrob Agents Chemother, 53, 2762-2772. 158. Nakitandwe, J., Trognitz, F. and Trognitz, B. (2007) Reliable allele detection using SNP-based PCR primers containing Locked Nucleic Acid: application in genetic mapping. Plant Methods, 3, 2. 159. Rupp, J., Solbach, W. and Gieffers, J. (2006) Single-nucleotide-polymorphism- specific PCR for quantification and discrimination of Chlamydia pneumoniae genotypes by use of a "locked" nucleic acid. Appl Environ Microbiol, 72, 3785-3787. 160. Thomassin, H., Kress, C. and Grange, T. (2004) MethylQuant: a sensitive method for quantifying methylation of specific cytosines within the genome. Nucleic Acids Res, 32, e168. 161. Strand, H., Ingebretsen, O.C. and Nilssen, O. (2008) Real-time detection and quantification of mitochondrial mutations with oligonucleotide primers containing locked nucleic acid. Clin Chim Acta, 390, 126-133. 162. Frantz, C., Sekora, D.M., Henley, D.C., Huang, C.K., Pan, Q., Quigley, N.B., Gorman, E., Hubbard, R.A. and Mirza, I. (2007) Comparative evaluation of three JAK2V617F mutation detection methods. Am J Clin Pathol, 128, 865-874.

182 163. Reddy, E.P., Reynolds, R.K., Santos, E. and Barbacid, M. (1982) A point mutation is responsible for the acquisition of transforming properties by the T24 human bladder carcinoma oncogene. Nature, 300, 149-152. 164. Steensma, D.P. (2006) JAK2 V617F in myeloid disorders: molecular diagnostic techniques and their clinical utility: a paper from the 2005 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn, 8, 397-411; quiz 526. 165. Cankovic, M., Whiteley, L., Hawley, R.C., Zarbo, R.J. and Chitale, D. (2009) Clinical performance of JAK2 V617F mutation detection assays in a molecular diagnostics laboratory: evaluation of screening and quantitation methods. Am J Clin Pathol, 132, 713-721. 166. Bennett, M. and Stroncek, D.F. (2006) Recent advances in the bcr-abl negative chronic myeloproliferative diseases. J Transl Med, 4, 41. 167. Sayyah, J. and Sayeski, P.P. (2009) Jak2 inhibitors: rationale and role as therapeutic agents in hematologic malignancies. Curr Oncol Rep, 11, 117-124. 168. Lim, K.-H., Pardanani, A. and Tefferi, A. (2008) KIT and Mastocytosis. Acta Haematologica, 119, 194-198. 169. Hirota, S., Isozaki, K., Moriyama, Y., Hashimoto, K., Nishida, T., Ishiguro, S., Kawano, K., Hanada, M., Kurata, A., Takeda, M. et al. (1998) Gain-of-function mutations of c-kit in human gastrointestinal stromal tumors. Science, 279, 577-580. 170. Beghini, A., Peterlongo, P., Ripamonti, C.B., Larizza, L., Cairoli, R., Morra, E. and Mecucci, C. (2000) C-kit mutations in core binding factor leukemias. Blood, 95, 726- 727. 171. Hongyo, T., Li, T., Syaifudin, M., Baskar, R., Ikeda, H., Kanakura, Y., Aozasa, K. and Nomura, T. (2000) Specific c-kit mutations in sinonasal natural killer/T-cell lymphoma in China and Japan. Cancer Res, 60, 2345-2347. 172. Corless, C.L., Harrell, P., Lacouture, M., Bainbridge, T., Le, C., Gatter, K., White, C., Jr., Granter, S. and Heinrich, M.C. 
(2006) Allele-specific polymerase chain reaction for the imatinib-resistant KIT D816V and D816F mutations in mastocytosis and acute myelogenous leukemia. J Mol Diagn, 8, 604-612. 173. Michaloglou, C., Vredeveld, L.C.W., Mooi, W.J. and Peeper, D.S. (2007) BRAFE600 in benign and malignant human tumours. Oncogene, 27, 877-895. 174. Levin, J.D., Fiala, D., Samala, M.F., Kahn, J.D. and Peterson, R.J. (2006) Position- dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers. Nucleic Acids Res, 34, e142. 175. Vesnaver, G. and Breslauer, K.J. (1991) The contribution of DNA single-stranded order to the thermodynamics of duplex formation. Proc Natl Acad Sci U S A, 88, 3569-3573. 176. Zhou, Y., Hall, C.K. and Karplus, M. (1999) The calorimetric criterion for a two-state process revisited. Protein Sci, 8, 1064-1074. 177. Hughesman, C.B., Turner, R.F. and Haynes, C. (2011) Correcting for Heat Capacity and 5'-TA Type Terminal Nearest Neighbors Improves Prediction of DNA Melting Temperatures Using Nearest-Neighbor Thermodynamic Models. Biochemistry, 50, 2642-2649. 178. Koshkin, A.A. (2004) Syntheses and base-pairing properties of locked nucleic acid nucleotides containing hypoxanthine, 2,6-diaminopurine, and 2-aminopurine nucleobases. J Org Chem, 69, 3711-3718.

183 179. Lahoud, G., Arar, K., Hou, Y.M. and Gamper, H. (2008) RecA-mediated strand invasion of DNA by oligonucleotides substituted with 2-aminoadenine and 2- thiothymine. Nucleic Acids Res, 36, 6806-6815. 180. Rosenbohm, C., Pedersen, D.S., Frieden, M., Jensen, F.R., Arent, S., Larsen, S. and Koch, T. (2004) LNA guanine and 2,6-diaminopurine. Synthesis, characterization and hybridization properties of LNA 2,6-diaminopurine containing oligonucleotides. Bioorg Med Chem, 12, 2385-2396. 181. Compagno, D., Lampe, J.N., Bourget, C., Kutyavin, I.V., Yurchenko, L., Lukhtanov, E.A., Gorn, V.V., Gamper, H.B., Jr. and Toulme, J.J. (1999) Antisense oligonucleotides containing modified bases inhibit in vitro translation of Leishmania amazonensis mRNAs by invading the mini-exon hairpin. J Biol Chem, 274, 8191- 8198. 182. Gamper, H.B., Jr., Arar, K., Gewirtz, A. and Hou, Y.M. (2006) Unrestricted hybridization of oligonucleotides to structure-free DNA. Biochemistry, 45, 6978- 6986. 183. Kutyavin, I.V., Rhinehart, R.L., Lukhtanov, E.A., Gorn, V.V., Meyer, R.B., Jr. and Gamper, H.B., Jr. (1996) Oligonucleotides containing 2-aminoadenine and 2- thiothymine act as selectively binding complementary agents. Biochemistry, 35, 11170-11176. 184. Latorra, D., Arar, K. and Hurley, J.M. (2003) Design considerations and effects of LNA in PCR primers. Mol Cell Probes, 17, 253-259. 185. SantaLucia, J., Jr., Kierzek, R. and Turner, D.H. (1992) Context dependence of hydrogen bond free energy revealed by substitutions in an RNA hairpin. Science, 256, 217-219. 186. Searle, M.S. and Williams, D.H. (1993) On the stability of nucleic acid structures in solution: enthalpy-entropy compensations, internal rotations and reversibility. Nucleic Acids Res, 21, 2051-2056. 187. Chazin, W.J., Rance, M., Chollet, A. and Leupin, W. (1991) Comparative NMR analysis of the decadeoxynucleotide d-(GCATTAATGC)2 and an analogue containing 2-aminoadenine. Nucleic Acids Res, 19, 5507-5513. 188. Cheong, C., Tinoco, I., Jr. 
and Chollet, A. (1988) Thermodynamic studies of base pairing involving 2,6-diaminopurine. Nucleic Acids Res, 16, 5115-5122. 189. Gryaznov, S. and Schultz, R.G. (1994) Stabilization of DNA:DNA and DNA:RNA duplexes by substitution of 2'-deoxyadenosine with 2'-deoxy-2-aminoadenosine. Tetrahedron Letters, 35, 2489-2492. 190. Connolly, B.A. and Newman, P.C. (1989) Synthesis and properties of oligonucleotides containing 4-thiothymidine, 5-methyl-2-pyrimidinone-1-beta-D(2'- deoxyriboside) and 2-thiothymidine. Nucleic Acids Res, 17, 4957-4974. 191. Sintim, H.O. and Kool, E.T. (2006) Enhanced base pairing and replication efficiency of thiothymidines, expanded-size variants of thymidine. J Am Chem Soc, 128, 396- 397. 192. Sismour, A.M. and Benner, S.A. (2005) The use of thymidine analogs to improve the replication of an extra DNA base pair: a synthetic biological system. Nucleic Acids Res, 33, 5640-5646. 193. Singh, S.K. and Wengel, J. (1998) Universality of LNA-mediated high-affinity nucleic acid recognition. Chemical Communications, 0, 1247-1248.

184 194. Jacobsen, N., Bentzen, J., Meldgaard, M., Jakobsen, M.H., Fenger, M., Kauppinen, S. and Skouv, J. (2002) LNA-enhanced detection of single nucleotide polymorphisms in the apolipoprotein E. Nucleic Acids Research, 30, e100. 195. Mouritzen, P., Nielsen, A.T., Pfundheller, H.M., Choleva, Y., Kongsbak, L. and Moller, S. (2003) Single nucleotide polymorphism genotyping using locked nucleic acid (LNA). Expert Rev Mol Diagn, 3, 27-38. 196. Leucht, C. and Baily-Cuif, L. (2007) The Universal ProbeLibrary - A versatile Tool for Quantitative Expression Analysis in Zebrafish. Biochemica, 2, 16-18. 197. Wu, R.-M., Wood, M., Thrush, A., Walton, E.F. and Varkonyi-Gasic, E. (2007) Real- time PCR Quantification of Plant miRNAs using Universal ProbeLibrary Technology. Biochemica, 2, 12-15. 198. Kocjan, B.J., Poljak, M. and Seme, K. (2010) Universal ProbeLibrary based real-time PCR assay for detection and confirmation of human papillomavirus genotype 52 . J Virol Methods, 163, 492-494. 199. Hughesman, C.B., Turner, R.F.B. and Haynes, C.A. (2011) Role of the Heat Capacity Change in Understanding and Modeling Melting Thermodynamics of Complementary Duplexes Containing Standard and Nucleobase-Modified LNA. Biochemistry, 50, 5354-5368. 200. Owczarzy, R., Tataurov, A.V., Wu, Y., Manthey, J.A., McQuisten, K.A., Almabrazi, H.G., Pedersen, K.F., Lin, Y., Garretson, J., McEntaggart, N.O. et al. (2008) IDT SciTools: a suite for analysis and design of nucleic acid oligomers. Nucleic Acids Res, 36, W163-169. 201. Moreira, B.G., You, Y., Behlke, M.A. and Owczarzy, R. (2005) Effects of fluorescent dyes, quenchers, and dangling ends on DNA duplex stability. Biochem Biophys Res Commun, 327, 473-484. 202. Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol, 132, 365-386. 203. Lunge, V.R., Miller, B.J., Livak, K.J. and Batt, C.A. 
(2002) Factors affecting the performance of 5' nuclease PCR assays for Listeria monocytogenes detection. J Microbiol Methods, 51, 361-368. 204. Wittwer, C.T. and Kusakawa, N. (2004) In Persing, D. H., Tenover, F. C., Versalovic, J., Tang, J. W., Unger, E. R., Relman, D. A. and White, T. J. (eds.), Molecular microbiology: diagnostic principles and practice. ASM Press, Washington, pp. 71- 84. 205. Pao, W. and Ladanyi, M. (2007) Epidermal Growth Factor Receptor Mutation Testing in Lung Cancer: Searching for the Ideal Method. Clinical Cancer Research, 13, 4954-4955. 206. Wittwer, C.T., Herrmann, M.G., Moss, A.A. and Rasmussen, R.P. (1997) Continuous fluorescence monitoring of rapid cycle DNA amplification. Biotechniques, 22, 130- 131, 134-138. 207. Kaiser, M.W., Lyamicheva, N., Ma, W., Miller, C., Neri, B., Fors, L. and Lyamichev, V.I. (1999) A Comparison of Eubacterial and Archaeal Structure-specific 5'- Exonucleases. Journal of Biological Chemistry, 274, 21387-21394. 208. Dominguez, P.L. and Kolodney, M.S. (2005) Wild-type blocking polymerase chain reaction for detection of single nucleotide minority mutations from clinical specimens. Oncogene, 24, 6830-6834.

185 209. Hummelshoj, L., Ryder, L.P., Madsen, H.O. and Poulsen, L.K. (2005) Locked nucleic acid inhibits amplification of contaminating DNA in real-time PCR. Biotechniques, 38, 605-610. 210. Oldenburg, R.P., Liu, M.S. and Kolodney, M.S. (2008) Selective amplification of rare mutations using locked nucleic acid oligonucleotides that competitively inhibit primer binding to wild-type DNA. J Invest Dermatol, 128, 398-402. 211. Thiede, C., Creutzig, E., Illmer, T., Schaich, M., Heise, V., Ehninger, G. and Landt, O. (2006) Rapid and sensitive typing of NPM1 mutations using LNA-mediated PCR clamping. Leukemia, 20, 1897-1899. 212. Bru, D., Martin-Laurent, F. and Philippot, L. (2008) Quantification of the detrimental effect of a single primer-template mismatch by real-time PCR using the 16S rRNA gene as an example. Appl Environ Microbiol, 74, 1660-1663. 213. Sherrill, C.B., Marshall, D.J., Moser, M.J., Larsen, C.A., Daude-Snow, L., Jurczyk, S., Shapiro, G. and Prudent, J.R. (2004) Nucleic acid analysis using an expanded genetic alphabet to quench fluorescence. J Am Chem Soc, 126, 4550-4556. 214. Tindall, E.A., Speight, G., Petersen, D.C., Padilla, E.J. and Hayes, V.M. (2007) Novel Plexor SNP genotyping technology: comparisons with TaqMan and homogenous MassEXTEND MALDI-TOF mass spectrometry. Hum Mutat, 28, 922-927. 215. Eom, S.H., Wang, J. and Steitz, T.A. (1996) Structure of Taq polymerase with DNA at the polymerase active site. Nature, 382, 278-281. 216. Owczarzy, R., You, Y., Groth, C.L. and Tataurov, A.V. (2011) Stability and mismatch discrimination of locked nucleic acid-DNA duplexes. Biochemistry, 50, 9352-9367.

Appendices

Appendix A Tmax (°C) data for the helix-to-coil transition of DNA duplexes and LNA-DNA heteroduplexes that are perfectly matched or contain a single centrally located mismatch

Sequence (5' to 3')                   Tmax(DNA-PM)  Tmax(DNA-MM)         Tmax(LNA-PM)  Tmax(LNA-MM)
Central DNA  Central LNA  Complement  a•t           a•a    a•c    a•g    A•t           A•a    A•c    A•g

gtcaaatcg    gtcaAatcg    cgatntgac   48.3          32.7   30.5   30.8   52.4          28.9   25.8   31.1
ctgaacgga    ctgaAcgga    tccgntcag   51.5          35.5   33.2   42.7   54.3          35.5   34.2   41.0
cctaagtat    cctaAgtat    atacntagg   38.3          16.0   31.2   31.2   45.7          22.7   13.5   12.7
atcaatgcg    atcaAtgcg    cgcantgat   50.5          33.8   31.0   39.0   53.0          34.3   32.0   37.8
tcccaatgt    tcccAatgt    acatnggga   46.5          24.5   23.4   30.5   48.7          23.7   25.9   28.2
gcccactct    gcccActct    agagngggc   57.4          41.3   41.2   n.d.   60.9          39.5   42.7   n.d.
tgccagttc    tgccAgttc    gaacnggca   53.3          35.6   32.8   41.3   58.2          36.4   37.5   39.3
ccacataga    ccacAtaga    tctangtgg   44.0          23.9   20.8   32.6   46.7          21.6   21.4   27.6
cctgaactc    cctgAactc    gagtncagg   50.5          32.3   29.8   n.d.   54.0          35.3   34.8   n.d.
ctagacagt    ctagAcagt    actgnctag   46.1          29.6   26.1   39.7   49.0          31.2   29.6   42.7
gctgagaac    gctgAgaac    gttcncagc   51.1          34.8   30.0   42.2   55.8          32.1   34.7   39.6
agggatacc    agggAtacc    ggtanccct   46.7          26.3   31.7   41.3   48.7          24.3   28.6   36.8
cgttaaagc    cgttAaagc    gcttnaacg   46.0          32.7   25.0   30.8   47.2          29.0   28.7   24.9
ggatacatg    ggatAcatg    catgnatcc   44.1          28.3   26.2   38.1   47.9          25.1   28.2   33.2
acctagtac    acctAgtac    gtacnaggt   43.9          23.0   19.4   31.4   50.5          35.1   32.7   35.1
ccttatcga    ccttAtcga    tcganaagg   44.5          25.4   22.4   n.d.   46.5          24.1   26.3   n.d.


Sequence (5' to 3')                   Tmax(DNA-PM)  Tmax(DNA-MM)         Tmax(LNA-PM)  Tmax(LNA-MM)
Central DNA  Central LNA  Complement  c•g           c•a    c•c    c•t    C•g           C•a    C•c    C•t

acgacaatc    acgaCaatc    gattntcgt   49.7          22.7   15.4   26.3   56.3          29.0   19.1   29.6
cggaccaca    cggaCcaca    tgtgntccg   58.0          36.3   28.8   34.8   65.0          41.1   31.1   39.0
gaaacggct    gaaaCggct    agccntttc   55.7          26.9   23.3   29.9   61.7          34.7   27.7   35.9
agcactcta    agcaCtcta    tagantgct   51.6          23.0   17.3   25.3   59.4          29.5   18.9   30.3
tgcccagat    tgccCagat    atctnggca   53.9          25.7   24.7   29.4   59.5          30.7   29.4   33.7
atacccttg    atacCcttg    caagngtat   44.6          15.6   14.2   19.5   52.7          23.5   27.5   24.9
ccaccgcta    ccacCgcta    tagcngtgg   58.4          32.1   30.0   35.5   66.6          36.2   39.7   37.3
tagcctttg    tagcCtttg    caaangcta   47.8          19.8   15.6   23.8   55.6          27.8   20.8   27.3
ctcgcaaac    ctcgCaaac    gtttncgag   54.6          29.1   24.7   31.2   59.3          36.8   27.8   33.9
tgagccaaa    tgagCcaaa    tttgnctca   50.8          31.5   19.2   24.9   57.1          31.1   17.7   28.1
taagcgtcc    taagCgtcc    ggacnctta   54.6          26.7   24.8   24.2   62.6          36.7   26.2   35.5
agggctgaa    agggCtgaa    ttcanccct   54.9          29.2   23.9   29.2   60.5          32.5   23.9   31.2
gcatcaact    gcatCaact    agttnatgc   51.7          22.4   18.9   27.2   56.9          27.7   21.7   29.7
caatccgaa    caatCcgaa    ttcgnattg   48.5          23.2   18.0   20.8   53.2          27.0   19.6   22.2
tcatcgaac    tcatCgaac    gttcnatga   48.8          19.0   14.4   21.5   55.5          22.0   15.9   25.9
gagtcttat    gagtCttat    ataanactc   41.3          14.0   10.0   17.9   48.8          17.5   11.5   20.2


Sequence (5' to 3')                   Tmax(DNA-PM)  Tmax(DNA-MM)         Tmax(LNA-PM)  Tmax(LNA-MM)
Central DNA  Central LNA  Complement  g•c           g•a    g•g    g•t    G•c           G•a    G•g    G•t

cccagagtg    cccaGagtg    cactntggg   51.7          35.7   34.5   32.7   56.2          35.2   35.5   39.5
cttagcgtg    cttaGcgtg    cacgntaag   51.2          36.7   39.5   33.0   54.0          29.2   38.4   38.4
tgaaggctc    tgaaGgctc    gagcnttca   52.3          36.7   36.9   32.7   60.0          36.1   39.4   43.6
attagtccc    attaGtccc    gggantaat   45.6          27.2   30.4   26.7   49.9          25.6   30.2   33.4
tgccgattc    tgccGattc    gaatnggca   56.9          36.3   35.0   39.7   58.4          32.6   35.3   46.0
aaacgcatc    aaacGcatc    gatgngttt   50.8          30.1   35.4   34.6   53.9          26.8   33.6   41.6
aatcggctc    aatcGgctc    gagcngatt   54.8          36.6   36.1   39.1   59.3          33.6   36.6   46.9
caccgtatg    caccGtatg    catanggtg   48.8          30.1   32.3   34.1   51.2          25.8   28.6   41.3
acaggaaag    acagGaaag    ctttnctgt   46.3          34.1   35.1   31.9   50.3          30.6   34.1   35.9
tctggcaac    tctgGcaac    gttgncaga   53.3          42.1   44.0   38.4   57.7          38.1   44.7   43.1
tatgggcac    tatgGgcac    gtgcncata   54.6          47.4   44.2   38.1   62.7          47.1   45.8   46.8
cacggtcta    cacgGtcta    tagancgtg   53.5          38.3   43.3   38.1   59.5          37.3   42.1   45.0
ggatgaagc    ggatGaagc    gcttnatcc   50.3          30.8   29.3   34.1   52.0          29.1   28.8   39.1
tactgccta    tactGccta    taggnagta   51.0          24.8   33.5   32.5   55.0          21.5   32.3   40.8
atttggaca    atttGgaca    tgtcnaaat   44.7          25.3   25.5   27.3   49.6          23.0   25.8   34.5
acttgtaac    acttGtaac    gttanaagt   40.6          13.9   20.1   22.8   43.6          13.8   17.4   30.8


Sequence (5' to 3')                   Tmax(DNA-PM)  Tmax(DNA-MM)         Tmax(LNA-PM)  Tmax(LNA-MM)
Central DNA  Central LNA  Complement  t•a           t•c    t•g    t•t    T•a           T•c    T•g    T•t

ccgataagt    ccgaTaagt    acttntcgg   45.1          23.0   32.5   27.8   49.7          24.4   36.0   28.3
ggaatcgct    ggaaTcgct    agcgnttcc   56.0          34.3   45.0   38.3   59.8          36.1   47.6   38.8
tccatgcct    tccaTgcct    aggcntgga   54.3          44.9   44.8   39.3   60.8          43.8   48.6   42.3
gtcatttac    gtcaTttac    gtaantgac   43.0          23.6   29.6   26.6   46.5          26.4   33.2   30.8
ggtctagtg    ggtcTagtg    cactngacc   48.0          32.6   37.5   35.0   53.0          35.0   35.0   38.4
tacctcgcc    taccTcgcc    ggcgnggta   58.3          40.8   n.d.   44.3   60.8          40.8   n.d.   44.6
acactgtct    acacTgtct    agacngtgt   50.0          30.1   39.0   35.3   57.0          35.8   44.5   42.5
acccttctg    acccTtctg    cagangggt   48.9          31.6   35.0   35.0   52.5          33.0   38.3   37.5
gcggtacgt    gcggTacgt    acgtnccgc   57.7          34.6   50.3   40.6   63.7          44.3   57.2   46.6
cacgtcgca    cacgTcgca    tgcgncgtg   62.0          45.0   53.6   46.6   65.6          46.5   57.3   48.0
tctgtggga    tctgTggga    tcccncaga   50.7          28.8   39.3   33.0   55.0          34.0   43.5   36.8
tgcgtttct    tgcgTttct    agaancgca   53.1          33.0   39.8   35.8   58.5          37.8   44.3   40.0
cccttatgc    ccctTatgc    gcatnaggg   47.9          31.8   38.5   33.5   54.2          34.3   41.3   37.8
ggtttcccg    ggttTcccg    cgggnaacc   53.0          32.4   40.8   34.8   57.8          34.0   45.1   37.8
atattgcgt    atatTgcgt    acgcnatat   47.1          27.8   37.0   31.5   52.2          31.0   40.2   34.5
tggtttgag    tggtTtgag    ctcanacca   48.9          30.2   35.0   31.0   51.9          31.2   37.2   33.0
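
For readers who want to work with these data programmatically, the following is a minimal sketch, not part of the original analysis, of how one row of Appendix A might be parsed and how the drop in Tmax caused by each central mismatch could be computed for the DNA duplex and the LNA-DNA heteroduplex. The function names (parse_row, delta_tmax), the dictionary keys, and the label "delta Tmax" for the difference Tmax(PM) - Tmax(MM) are assumptions introduced here for illustration. The two example rows are copied from the t/T table above, and entries marked n.d. are carried through as None.

```python
# Illustrative sketch only: helper names and dictionary keys are hypothetical.
# Each row: central-DNA sequence, central-LNA sequence, complement, then
# Tmax(DNA-PM), three Tmax(DNA-MM), Tmax(LNA-PM), three Tmax(LNA-MM), in deg C.
ROWS = [
    "ccgataagt ccgaTaagt acttntcgg 45.1 23.0 32.5 27.8 49.7 24.4 36.0 28.3",
    "tacctcgcc taccTcgcc ggcgnggta 58.3 40.8 n.d. 44.3 60.8 40.8 n.d. 44.6",
]


def parse_row(line):
    """Split one whitespace-separated table row into named fields."""
    fields = line.split()
    seqs, raw = fields[:3], fields[3:]
    # "n.d." (not determined) becomes None so it cannot be used in arithmetic.
    vals = [None if v == "n.d." else float(v) for v in raw]
    return {
        "dna": seqs[0],
        "lna": seqs[1],
        "complement": seqs[2],
        "dna_pm": vals[0],
        "dna_mm": vals[1:4],
        "lna_pm": vals[4],
        "lna_mm": vals[5:8],
    }


def delta_tmax(pm, mms):
    """Return Tmax(PM) - Tmax(MM) for each mismatch, keeping None for n.d."""
    return [None if mm is None else round(pm - mm, 1) for mm in mms]


for line in ROWS:
    row = parse_row(line)
    print(row["dna"], "DNA:", delta_tmax(row["dna_pm"], row["dna_mm"]),
          "LNA:", delta_tmax(row["lna_pm"], row["lna_mm"]))
```

A larger difference indicates that the mismatch lowers Tmax more relative to the perfect match. Comparing the DNA and LNA lists for the same sequence shows how the central LNA substitution changes that difference for each mismatch type.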
