MOLECULAR THERMODYNAMICS OF THE STABILITY OF NATURAL, SUGAR
AND BASE-MODIFIED DNA DUPLEXES AND ITS APPLICATION TO THE
DESIGN OF PROBES AND PRIMERS FOR SENSITIVE DETECTION OF
SOMATIC POINT MUTATIONS
by
Curtis Hughesman
B.A.Sc., The University of Calgary, 1997
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
(Chemical and Biological Engineering)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
December 2012
© Curtis Hughesman, 2012

Abstract
Cancer is characterized as a genetic disease associated with acquired somatic
mutations, a majority of which consist of only a single base change and are commonly
referred to as somatic point mutations (SPM). Real-time quantitative polymerase chain reaction (qPCR) techniques using allele-specific (AS) probes or primers are widely used in genotyping assays to detect commonly known single nucleotide polymorphisms (SNP), and also have the potential to detect SPMs, provided the required analytical sensitivity and specificity can be realized. One strategy to establish the necessary performance is to introduce nucleotide analogs such as Locked Nucleic Acids (LNAs) into AS probes or primers; however, successful design requires a fundamental understanding of both the thermodynamics and kinetics of LNA-DNA heteroduplexes. Melting thermodynamic studies of DNA duplexes and LNA-DNA heteroduplexes were therefore carried out using both
ultraviolet (UV) spectroscopy and differential scanning calorimetry (DSC) to quantify the
thermodynamics (ΔH°, ΔS°, ΔCp and Tm) associated with the helix-to-coil transition. Data
collected on DNA duplexes and DNA-LNA heteroduplexes were used to introduce
improvements in the “unified” nearest-neighbor model, and to develop a new
model, referred to as the Single Base Thermodynamic (SBT) model, that accurately predicts the Tm for the melting of LNA-DNA heteroduplexes.
The SBT model was extended and applied to PCR conditions to design LNA-bearing
AS probes for qPCR assays to detect the clinically important SPMs KIT c.1799t>a (D816V)
and JAK2 c.1849g>t (V617F); these probes were found to significantly outperform standard AS
probes containing only DNA. The interaction of Taq polymerase with heteroduplexes
formed between an LNA-bearing primer and a target template was also studied, and the results
were used to generate general rules for designing LNA-bearing AS primers capable of unequivocal detection of a rare mutant allele bearing a SPM. The method was then extended to allow qPCR detection by Plexor™ technology and applied to create an AS primer directed against the JAK2 V617F SPM that can detect one mutation in a background of more than 100,000 copies of the wild-type allele and that is now used by the Cancer Genetics Laboratory of the British Columbia Cancer Agency (BCCA) to analyze patient samples.
Preface
A version of Chapter 1 has been submitted for publication to "Biochemical Engineering
Thermodynamics", in press, von Stocker U, et al. (Eds.), EPFL Press (2012). The
manuscript was written through collaboration between Dr. Charles Haynes and me.
A version of Chapter 2 has been published. Hughesman, C.B., Turner, R.F. and Haynes, C.
(2011) Correcting for Heat Capacity and 5'-TA Type Terminal Nearest Neighbors Improves
Prediction of DNA Melting Temperatures Using Nearest-Neighbor Thermodynamic Models.
Biochemistry, 50, 2642-2649. I performed all of the experiments and wrote most of the
manuscript. Dr. Charles Haynes provided guidance on the research. Dr. Charles Haynes and
Dr. Robin Turner reviewed and edited the manuscript.
A version of Chapter 3 has been published. Hughesman, C.B., Turner, R.F.B. and Haynes,
C.A. (2011) Role of the Heat Capacity Change in Understanding and Modeling Melting
Thermodynamics of Complementary Duplexes Containing Standard and Nucleobase-Modified LNA. Biochemistry, 50, 5354-5368. I performed or directly supervised all of the experiments and wrote most of the manuscript. Dr. Charles Haynes provided guidance on the research. Dr. Charles Haynes and Dr. Robin Turner reviewed and edited the manuscript.
A version of Chapter 5 is being prepared for submission as a publication. Experiments on the LNA-bearing primers directed at the BCL2 plasmids were performed by Colin Olsen and me. Kelly McNeil generated the JAK2 plasmids and DNA from patients for testing. All experiments involving the AS primers were performed by me. Kelly McNeil, Sean
Young, Dr. Aly Karsan and Dr. Charles Haynes provided guidance on the research and
experiments. Approval by UBC’s Research Ethics Board was obtained; blind testing of anonymous patient samples previously acquired and stored at the BC Cancer Agency using assays developed in this work was conducted under UBC’s research ethics certificate H08-01035. All testing was conducted for research purposes only, and no information on patient identity or medical history was known or transferred.
Table of Contents
Abstract ...... ii
Preface ...... iv
Table of Contents ...... vi
List of Tables ...... ix
List of Figures ...... xii
Nomenclature ...... xv
Acknowledgements ...... xviii
Dedication ...... xx
Chapter 1: Introduction ...... 1
  1.1 Thesis Overview ...... 1
  1.2 Background ...... 7
    1.2.1 Methods for measuring duplex DNA melting thermodynamics ...... 12
      1.2.1.1 UV absorption spectroscopy ...... 13
      1.2.1.2 Calorimetry ...... 20
    1.2.2 Thermodynamic models used to predict DNA duplex stability ...... 24
    1.2.3 Locked Nucleic Acids (LNAs) ...... 30
      1.2.3.1 Chemistry and properties ...... 31
      1.2.3.2 Predicting the stability of LNA-DNA heteroduplexes ...... 32
    1.2.4 PCR based methods for detection and quantification of a SPM ...... 34
      1.2.4.1 LNA containing AS probes ...... 38
      1.2.4.2 LNA containing AS primers ...... 39
    1.2.5 Clinically significant somatic point mutations ...... 41
      1.2.5.1 JAK2 V617F ...... 41
      1.2.5.2 KIT D816V ...... 42
      1.2.5.3 BRAF V600E ...... 42
  1.3 Thesis Objectives and Content Overview ...... 43
Chapter 2: Correcting for Heat Capacity and 5'-ta Type Terminal Nearest Neighbors Improves Prediction of DNA Melting Temperatures Using Nearest-Neighbor Thermodynamic Models ...... 46
  2.1 Materials and Methods ...... 47
    2.1.1 DNA synthesis and purification ...... 47
    2.1.2 Differential scanning calorimetry ...... 47
    2.1.3 Regression of melting thermodynamics data ...... 48
    2.1.4 Error analysis ...... 48
  2.2 Results and Discussion ...... 49
    2.2.1 Introduction of ΔCp into the unified NNT model improves Tm predictions ...... 51
    2.2.2 Regressed ΔCp^bp and Tref values are supported by DSC data ...... 55
    2.2.3 Duplexes terminating in a 5’-ta have statistically significant Tm(error) ...... 59
    2.2.4 Correcting Tm predictions for duplexes containing 5’-ta type termini ...... 63
  2.3 Conclusions ...... 65
Chapter 3: The Role of the Heat Capacity Change in Understanding and Modeling Melting Thermodynamics of Complementary Duplexes Containing Standard and Nucleobase Modified LNA ...... 67
  3.1 Materials and Methods ...... 69
    3.1.1 Sequence design ...... 69
    3.1.2 Oligonucleotide synthesis and purification ...... 70
    3.1.3 Differential scanning calorimetry ...... 70
    3.1.4 UV spectroscopy ...... 71
    3.1.5 Error analysis ...... 72
    3.1.6 Regression of SBT model parameters ...... 73
    3.1.7 Model predicted ΔTm values for LNA substituted duplexes ...... 73
  3.2 Results and Discussion ...... 74
    3.2.1 Accounting for ∆Cp shows that the increase in duplex stability resulting from LNA substitutions is predominantly driven by a favorable entropy change ...... 79
    3.2.2 Base classification and pairing explain differences in the ∆∆Si° value and stability enhancement offered by different LNAs ...... 84
    3.2.3 Terminal 5’ and 3’ LNA substitutions are much less stabilizing than internal LNA substitutions ...... 86
    3.2.4 A new model for predicting the melting thermodynamics of LNA substituted duplexes ...... 88
    3.2.5 The SBT model predicts Tm values for standard-LNA-containing mixmer duplexes with similar accuracy as more complex NNT models ...... 90
    3.2.6 Testing the validity of SBT model assumptions ...... 92
    3.2.7 Substitution of standard LNAs with base-modified LNA nucleosides provides further stability increases ...... 95
    3.2.8 D•H base pairs demonstrate pseudo-complementary properties ...... 97
  3.3 Conclusions ...... 99
Chapter 4: Design of LNA-rich Hydrolysis Probes for Detection of Somatic Point Mutations ...... 102
  4.1 Materials and Methods ...... 104
    4.1.1 Oligonucleotides ...... 104
    4.1.2 Plasmids ...... 105
    4.1.3 Monitoring helix-to-coil transitions with UV spectroscopy ...... 106
    4.1.4 UVM of 9-mer duplexes to determine ΔΔTmax(MM) ...... 106
    4.1.5 Prediction of melting thermodynamics for probe•template duplexes ...... 107
    4.1.6 Real-time qPCR ...... 108
  4.2 Results and Discussion ...... 109
    4.2.1 Model-based design of LNA-rich AS primers offering increased ∆Tm(MT–WT) ...... 109
    4.2.2 Performance testing of probe designs on plasmid templates ...... 114
    4.2.3 Application of LNA-rich probes to mixtures of MT and WT alleles ...... 119
    4.2.4 Understanding the reduction in hydrolysis probe signal strength ...... 122
  4.3 Conclusions ...... 129
Chapter 5: Novel Plexor™ Multi-LNA Allele Specific Primers for Unequivocal Clinical Detection of Somatic Point Mutations: Design Rules and Application to JAK2 V617F, KIT D816V and BRAF V600E ...... 131
  5.1 Materials and Methods ...... 133
    5.1.1 Design and testing PCR primers with single and multiple LNA substitutions ...... 133
    5.1.2 Testing of JAK2 WT and MT (V617F) AS primers on plasmids ...... 134
    5.1.3 Testing of KIT and BRAF plasmids with AS primers ...... 135
    5.1.4 Genomic DNA isolation ...... 135
    5.1.5 Plexor multi-LNA AS primer assay ...... 137
    5.1.6 Benchmark hydrolysis-probe based assay of JAK2 V617F ...... 138
    5.1.7 JAK2 MutaQuant assay ...... 138
  5.2 Results and Discussion ...... 141
    5.2.1 The impact of LNA substitutions in the 3’ region of a primer ...... 141
    5.2.2 AS primer design: preferred location for the site of variation (SOV) ...... 147
    5.2.3 AS primer design: the effect of LNA substitution versus mismatch insertion ...... 152
    5.2.4 AS primer design: application of LNA substitution guidelines ...... 153
    5.2.5 AS primer design: Plexor™ LNA AS primers directed against JAK2 V617F ...... 158
    5.2.6 Testing of LNA AS primer designs for absolute detection of KIT D816V and BRAF V600E ...... 161
  5.3 Conclusions ...... 163
Chapter 6: Conclusions and Suggestions for Future Work ...... 165
References ...... 173
Appendices ...... 187
  Appendix A Tmax (°C) data for the helix-to-coil transition of DNA duplexes and LNA-DNA heteroduplexes that are perfectly matched or contain a single centrally located mismatch ...... 187
List of Tables
Table 1.1 “Unified” Nearest Neighbors Parameters for Helix-to-Coil Transition...... 29
Table 2.1 Measured thermodynamic values from DSC analysis of short duplex DNA. .... 56
Table 2.2 Average Tm(error) (°C) values associated with terminal base pairs and terminal
nearest-neighbors...... 60
Table 2.3 Measured thermodynamic data for helix to coil transition of 11-mer DNA
duplexes used to study duplex end effects...... 61
Table 2.4 Thermodynamic parameters for 5'-ta terminal NN at Tref = 53 °C...... 64
Table 3.1 DSC determined thermodynamic parameters for duplexes between DNA
oligonucleotides with and without A, T, G and/or C substitutions...... 75
Table 3.2 DSC derived thermodynamic data for complementary DNA duplexes where one
strand contains D and/or H substitutions...... 77
Table 3.3 Average ΔCp^bp and Tm values determined by DSC for duplexes with and without
LNA substitutions...... 78
Table 3.4 General SBT model parameters for the helix-to-coil transition of duplexes
containing standard and nucleoside-modified LNA substitutions...... 83
Table 3.5 ΔΔSi° parameters and differences in them for the helix-to-coil transition of
duplexes containing standard LNA bases...... 85
Table 3.6 DSC derived thermodynamic data for duplexes used to study the effect of LNA
substitution at the 3’ or 5’ termini...... 87
Table 3.7 Errors in ΔTm (°C) values predicted using the specified model for standard LNA
(A, T, G and/or C) substituted mixmer duplexes...... 91
Table 3.8 UVM derived thermodynamic data for duplexes containing tandem LNA
substitutions...... 93
Table 3.9 ΔTm prediction errors using SBT model for LNA substituted gapmer duplexes
and fully modified LNA-DNA heteroduplexes...... 95
Table 3.10 UVM derived ΔTmax data for the helix-to-coil transition of duplexes with base
pairs formed between a, d, A or D and t, h, T, or H...... 98
Table 4.1 Model based design of DNA and LNA probes for KIT D816V...... 110
Table 4.2 Incremental ΔΔTmax(MM) for LNA•DNA mismatch in 9-mer duplexes determined
from UVM experiments...... 111
Table 4.3 Model predicted and UVM experimentally determined thermodynamic
parameters for KIT D816V and JAK2 V617F probes at PCR solution conditions.
...... 112
Table 4.4 Calculated theoretical analytical specificity of KIT D816V and JAK2 V617F
hydrolysis probes at Ta = 62 °C and 65 °C...... 116
Table 4.5 Experimentally determined SPE for LNA bearing probes...... 120
Table 5.1 PCR efficiencies Eexpt for amplification of the BCL2 plasmid mini-gene using
pure-DNA and LNA-substituted primers...... 139
Table 5.2 Average Eexpt for amplification of the BCL2 plasmid mini-gene reported as a
function of LNA base type and base location in primers containing a single LNA
substitution within the 3’ (L0) to 3’-8 (L8) positions...... 141
Table 5.3 PCR efficiencies for amplification of the BCL2 plasmid mini-gene using forward
(FP) and reverse (RP) primers containing multiple LNA substitutions...... 142
Table 5.4 Experimental and model estimated PCR efficiencies for amplification of the
BCL2 plasmid mini-gene using primers containing three LNA substitutions
within the 3’-1 to 3’-6 positions...... 144
Table 5.5 Average efficiency of amplification of the BCL2 mini-gene using each possible
primer design comprised of either one or two LNA substitutions within the 3’ to
3’-6 region...... 146
Table 5.6 Eexpt, Epred & SPE for various JAK2 WT AS primers directed against the JAK2
WT and MT (V617F) plasmid templates...... 150
Table 5.7 Eexpt, Epred & SPE for various JAK2 MT (V617F) AS primers directed against the
JAK2 WT and MT (V617F) plasmid templates...... 151
Table 5.8 Putative AS primer designs possessing a 3’ SOV and one or more LNA
substitutions within the 3’ to 3’-6 positions...... 155
Table 5.9 Summary of results of different techniques used to classify the JAK2 V617F
status of 96 patients suspected of a myeloproliferative neoplasm (MPN)...... 160
Table 5.10 Experimental Cq, Eexpt and SPE data for various AS primer designs directed
against the KIT D816V and BRAF V600E SPM-bearing alleles...... 162
List of Figures
Figure 1.1 Common types of base-pairing between DNA nucleobases...... 10
Figure 1.2 Example of UVM data and analysis for a short DNA duplex...... 14
Figure 1.3 Example of DSC data and analysis for a short DNA duplex...... 21
Figure 1.4 Structure of sugar in DNA, RNA and LNA...... 31
Figure 1.5 AS hydrolysis probe based qPCR assay for SPM detection...... 36
Figure 1.6 AS primer based qPCR assay for SPM detection...... 37
Figure 2.1 Tm(error) values for 125 test sequences in which the predicted Tm is determined
by the unified NNT model...... 51
Figure 2.2 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp...... 54
Figure 2.3 Relationship between predicted (unified NNT model) and experimental ΔH°
as a function of Tm(expt)...... 57
Figure 2.4 Measured ΔCp^bp values as a function of experimental melting temperature. .... 58
Figure 2.5 Tm(error) values for 125 test sequences accounting for a non-zero ΔCp and
parameters for sequences with terminal 5’-ta...... 65
Figure 3.1 Structure of LNA-2-aminoadenine and LNA-2-thiothymine...... 68
Figure 3.2 Experimental helix-to-coil transition ΔΔH° and ΔΔS° for 43 duplexes with
standard LNA substitutions...... 80
Figure 3.3 Comparison of ∆∆G°37 values predicted using the LNA NNT model to
experimental helix-to-coil ∆∆G°37 data for the 43 duplexes with standard
LNA substitutions...... 82
Figure 4.1 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C)
hydrolysis probes using serially diluted MT only plasmids (10^6 to 10^2 copies)
and WT only plasmids (10^6 copies)...... 115
Figure 4.2 qPCR amplification at Ta = 62 °C with P22 (A), P14L5 (B) and P11L6 (C)
hydrolysis probes using MT only plasmids (100% MT), WT only plasmids
(100% WT) and MT plasmids diluted in a background of WT plasmids (10%
and 1% MT)...... 118
Figure 4.3 RFUend versus MT template frequency for KIT D816V & JAK2 V617F probes.
...... 121
Figure 4.4 qPCR of KIT D816V using SYBR Green or one of three hydrolysis probes. ....
...... 123
Figure 4.5 RFUi+1/RFUi per cycle for qPCR amplification at Ta = 62 °C of KIT D816V
plasmid monitored by SYBR Green or hydrolysis probe...... 125
Figure 4.6 Theoretical fractional curves determined for KIT D816V hydrolysis probes. ....
...... 128
Figure 4.7 Signal strength for KIT D816V DNA and LNA-bearing probes...... 129
Figure 5.1 Cq data of various pure-DNA AS primers for their target allele reported as a
function of the target allele, the primer position interacting with the SOV, and
the annealing temperature Ta used in the qPCR...... 148
Figure 5.2 qPCR amplification curves using the JAK2 WT0 L025 or WT0 L25 primers. ..
...... 157
Figure 5.3 qPCR amplification curves using the JAK2 MT0 L0123 or MT0 L123
primers...... 158
Figure 5.4 qPCR amplification of the JAK2 MT (V617F) and WT plasmid template using
the Plexor™ MT0 L123 AS primer...... 159
Nomenclature
BRAF V600E        The somatic point mutation c.1799 t>a in the BRAF gene that causes a substitution of the amino acid valine with glutamic acid at position 600 (p.V600E) in the BRAF protein
JAK2 V617F        The somatic point mutation c.1849 g>t in the JAK2 gene that causes a substitution of the amino acid valine with phenylalanine at position 617 (p.V617F) in the JAK2 protein
KIT D816V         The somatic point mutation c.2468 a>t in the KIT gene that causes a substitution of the amino acid aspartic acid with valine at position 816 (p.D816V) in the KIT protein

a                 adenine
c                 cytosine
d                 2-aminoadenine (2,6-diaminopurine)
g                 guanine
h                 2-thiothymine
t                 thymine
A                 LNA-adenine
C                 LNA-cytosine
G                 LNA-guanine
T                 LNA-thymine
D                 LNA-2-aminoadenine
H                 LNA-2-thiothymine

•                 Hydrogen bonding involved in base pairs between strands
-                 Covalent bonding between nucleotides on a single strand

MT                Mutant gene or template containing SPM
WT                Wild-type (germline) gene or template

LOD               Limit of detection (analytical sensitivity)
SPE               Analytical specificity

A260              Absorbance at 260 nm
Cp^ex             Excess heat capacity
ΔCp^bp            Change in heat capacity for helix to coil transition per base pair
ΔCp               Change in heat capacity for helix to coil transition
CA                Concentration of the more concentrated strand
CB                Concentration of the less concentrated strand
Cq                Quantification cycle
CqMT              Quantification cycle for MT template
CqWT              Quantification cycle for WT template
ΔCq(MT-WT)        Difference in Cq between MT and WT template
ΔCq(WT-MT)        Difference in Cq between WT and MT template
CT                Total strand concentration
CT(MT)            Total strand concentration of probe and MT template
CT(WT)            Total strand concentration of probe and WT template
E                 Amplification efficiency of a qPCR reaction
EDNA              Amplification efficiency of a qPCR reaction with DNA based primer
Eexpt             Experimentally determined amplification efficiency of a qPCR reaction
ELNA              Amplification efficiency of a qPCR reaction with LNA-bearing primer
Epred             Predicted amplification efficiency of a qPCR reaction
∆G                Change in Gibbs energy
∆G°               Change in Gibbs energy at standard-state conditions
∆G°37             Change in Gibbs energy at standard-state conditions determined at T = 37 °C
∆∆G°37            Incremental change in Gibbs energy at standard-state conditions determined at T = 37 °C
∆H°               Change in enthalpy at standard-state conditions
∆H°cal            Change in enthalpy at standard-state conditions determined directly from integration of excess heat capacity data
∆H°2-st           Change in enthalpy at standard-state conditions determined from two-state model
ΔH°5’-ta          Change in enthalpy at standard-state conditions associated with terminal 5’-ta
ΔH°expt           Experimentally determined change in enthalpy at standard-state condition
ΔH°pred           Model predicted change in enthalpy at standard-state condition
∆H°MT             Change in enthalpy at standard-state condition for probe•MT template duplex
∆H°WT             Change in enthalpy at standard-state condition for probe•WT template duplex
∆∆H°              Incremental change in enthalpy
∆∆H°LNA           Incremental change in enthalpy associated with LNA substitution
∆∆H°(LNA-DNA)     Incremental change in enthalpy between LNA-DNA heteroduplex and isosequential DNA duplex
∆H°UVM            Change in enthalpy at standard-state conditions determined from UVM data
K(T)              Equilibrium constant at transition temperature (T)
nbp               Number of base pairs
ni                Number of LNA bases of type i
P                 Pressure
R                 Gas constant
RFU               Relative fluorescence unit
RFUend            End point RFU
RFUend(MT)        End point RFU for probe determined for 100% MT template
∆S°               Change in entropy at standard-state conditions
ΔS°5’-ta          Change in entropy at standard-state conditions associated with terminal 5’-ta
∆S°MT             Change in entropy at standard-state condition for probe•MT template duplex
∆S°WT             Change in entropy at standard-state condition for probe•WT template duplex
∆∆S°              Incremental change in entropy
∆∆S°LNA           Incremental change in entropy associated with LNA substitution
∆∆S°(LNA-DNA)     Incremental change in entropy between LNA-DNA heteroduplex and isosequential DNA duplex
T                 Temperature
Ta                Annealing temperature
Tm                Melting temperature (predicted)
Tm(error)         Error associated with the model predicted melting temperature as compared to the experimental melting temperature
Tm(MT)            Melting temperature of probe and MT template duplex
Tm(WT)            Melting temperature of probe and WT template duplex
∆Tm               Incremental change in melting temperature between a LNA-DNA heteroduplex and an isosequential DNA duplex
∆Tm(error)        Error associated with the model predicted incremental melting temperature as compared to the experimental incremental melting temperature determined between a LNA-DNA heteroduplex and an isosequential DNA duplex
ΔTm(LNA-DNA)      Incremental change in melting temperature for a LNA-DNA heteroduplex and isosequential DNA duplex
∆Tm(MT-WT)        Difference in the melting temperature determined for duplexes formed between the probe•MT template and the probe•WT template
∆Tm(pred)         Predicted incremental change in melting temperature for a LNA-DNA heteroduplex compared to the isosequential DNA duplex
Tm(expt)          Experimentally determined melting temperature
Tm(pred)          Predicted melting temperature
Tm(DNA)           Melting temperature of DNA duplex
Tm(LNA)           Melting temperature of LNA-DNA heteroduplex
Tmax              Temperature determined at the maximum of d(A260)/dT curve
Tmax(DNA-PM)      Tmax of DNA duplex
Tmax(DNA-MM)      Tmax of DNA duplex containing a single mismatch
Tmax(LNA-PM)      Tmax of LNA-DNA heteroduplex
Tmax(LNA-MM)      Tmax of LNA-DNA heteroduplex containing a single mismatch
ΔΔTmax(MM)        Incremental change in ΔTmax for a LNA•DNA mismatch compared to the DNA•DNA mismatch involving the same nucleobases (i.e., A•a vs. a•a)
Tref              Reference temperature
tc                t-distribution critical value

α                 Fraction of strands in double stranded state
αMT               Fraction of strands in duplex state formed between probe and MT template
αWT               Fraction of strands in duplex state formed between probe and WT template
χ                 Residual error
θ                 Fraction of strands in single stranded state
σCqMT             Standard deviation of CqMT
σCqWT             Standard deviation of CqWT
Acknowledgements
The enigma and pervasiveness of cancer have intrigued me since I was a young engineer.
This thesis represents my contribution through engineering and scientific research to improve our understanding, diagnosis and treatment of cancer. My own experience of my father’s lost battle with cancer, like the overwhelming statistics, affirms that too many lives are affected by this disease. It is my hope that more sensitive and specific detection of cancer will provide patients with a greater variety of selective treatment options, and therefore, better outcomes.
The ability to pursue this Ph.D. begins first with my parents, Yvonne and Doug, who have planted in me the necessary skills and attributes, and who have provided me, in every conceivable way, with the support and encouragement required to complete this goal.
I owe enduring gratitude to my wife Catherine, whose unending love and guidance have provided me a solid foundation and with whom I have shared the happiest moments of my life, the births of our son Kai Douglas and daughter Makena Alice. The experience of being a father has granted me a different perspective on both work and life, and the joy and laughter that Kai and Makena bring to me daily continue to inspire me in new and unique ways.
I would like to thank Dr. Charles Haynes for providing me the opportunity to be a member of his research team, and for dedicating significant time and effort not only to aiding me in the completion of my thesis, but also to developing in me the essential skills to be a successful member of and contributor to the scientific community. I appreciate all of the members of his laboratory group with or near whom I have had the pleasure of working. In particular I would like to thank Dr. Louise Creagh for always providing her time and support with the finer details of research.
I would also like to thank the members of the B.C. Cancer Agency who have provided me the opportunity to be directly involved in scientific research aimed at improving diagnostics for cancer genetics. In particular I would like to thank Dr. Sean Young and Dr.
Aly Karsan. A very special thank you goes to Kelly McNeil, with whom I have been fortunate
enough not only to work but also to become friends.
Thanks go to an inspiring group of mentors, both past and present, who have made a
significant impact on my engineering career. These people include Pat Carlson and Ron
Sawatzky and my colleagues David Grohs, Jonathan White and Nance McCollom at Pavilion
Energy Corp. Finally, I want to thank all those whose patient encouragement, couched
sometimes in provocation, has also inspired the completion of this epic Ph.D. journey:
Hughesman & Chow families, Stephen Dickinson, Philip Wong, Eddie Chan, Doug Howes,
Dr. Patrick Francis, Dr. Matt Larouche, Dr. Natasha Pollock, John Seo, Eric Chow, Kevin
Yee, Dr. Hans Drouin, Betty & Chris Marshall, Barry & Marilyn Wong and Rod Beltran.
Dedication
To my mom Yvonne, and in memory of my dad Douglas
&
To my family Catherine, Kai and Makena
Chapter 1: Introduction
1.1 Thesis Overview
The completion of the draft (1,2) and final (3) versions of the first human genome sequence had a profound impact on research and technical advances in the life sciences and clinical medicine. Increasingly detailed genome maps permit identification of genes associated with dozens of diseases, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2, inherited colon cancer, Alzheimer's disease, and familial breast cancer. The study of genetic variations within a population, including those that are inherited (germline) and those that are acquired (somatic) by individuals, is likewise opening a new era of molecular medicine that will replace treating symptoms with identifying and repairing fundamental causes of disease. The development of rapid and more specific diagnostic tests is an essential component of this transformation, and will make possible earlier treatment of countless life-threatening and life-altering maladies. Medical researchers will also be able to devise personalized therapeutic regimens, improve immunotherapy techniques, identify environmental conditions that may trigger disease, and possibly augment or even replace defective genes through gene therapy.
Large-scale sequencing programs such as the International HapMap Project (4) and the Cancer Genome Project (5) led by the Wellcome Trust Sanger Institute have collected and compiled extensive data for common germline genetic variations and somatic mutations, respectively. Common types of germline variations include copy-number variations (CNV) and single nucleotide polymorphisms (SNPs). Several classes of somatic mutations have also been identified, including large genomic rearrangements such as translocations, smaller genetic variations such as short insertions, deletions or replacements, and individual base pair
differences created by somatic point mutations (SPMs). The Catalogue of Somatic Mutations
in Cancer (COSMIC) created and curated by the Cancer Genome Project reveals that SPMs
are the most commonly observed class of somatic mutations (6), with a subset of these SPMs
identified as “driver mutations” in cancer development and progression (7,8).
Although significant research is still required to both discover and validate the
connection between genetic variations and disease or potential for disease, a number of
examples exist where knowledge of either a SNP or SPM has proven useful for differential
diagnosis, prognosis and/or prediction of therapeutic response (9-13). Among these
examples is a growing list of genetic variations that have been validated as clinically
relevant, and this has intensified the need for accurate, cost effective tests that can be used in
clinical laboratories and hospitals. Establishing reliable methods to detect SPMs is
particularly important. Although cancer has been found to be a heterogeneous disease with
both the type and number of somatic mutations unique to an individual’s cancer, a subset of specific SPMs appear at such frequencies in certain cancers that their role as “drivers” has been indicated (14). Indeed, the recent success of using BRAF inhibitors in treating patients with metastatic melanoma positive for mutations in the BRAF V600 codon (15,16) highlights
the potential to improve cancer therapy using targeted treatment based on SPM profiling.
Detecting a SPM can be challenging because, unlike germline SNP variations, the
frequency (percentage) of cells that have acquired one or more clinically relevant SPMs can
be highly variable, and the mutated cell pool may therefore represent far less than 1% of the total cell population within a clinical sample. The detection of a low frequency somatic
mutation is further challenged by the fact that the mutant allele¹ typically differs from the
parent wild-type (WT) allele by a single base pair, creating the need to avoid or at least strongly inhibit cross reactions with and/or amplification of the much higher abundance WT sequence (17,18).
Deep sequencing has been used to discover most of the clinically important SPMs known to date (19). However, sequencing of SPMs, even using next-generation technology, is unlikely to find significant application in clinical laboratories for initial screening due to the cost, as well as technical limitations associated with sequencing error rates (20-22).
Instead, routine and repeated testing for somatic point mutations will more likely rely on techniques that not only accurately detect low frequency SPMs, but also allow for cost-effective continual monitoring of minimal residual disease (MRD) levels during and following treatment (23).
Of the techniques available to detect SPMs, those based on the polymerase chain reaction (PCR) are among the most promising due to their low cost, general ease of use, and potential for very good analytical sensitivity, also known as the limit of detection (LOD), and
analytical specificity (SPE), particularly when coupled with in-line detection systems as in
quantitative real-time PCR (qPCR). The terminology for analytical sensitivity (LOD) and
analytical specificity (SPE) used throughout this thesis is that recommended by the MIQE
(minimum information for publication of quantitative real-time PCR experiments) guidelines
(24) and also accepted by the medical community (25). For SPM detection, analytical
sensitivity (LOD) is the minimum number of copies that can reliably be detected by a given
assay, while analytical specificity (SPE) is a measure of that assay’s ability to discriminate
the target gene containing the SPM from closely related alleles (i.e. germline) and is
generally presented as a ratio or percent.
¹ In this thesis allele is applied in accordance with the more general definition in which an allele represents one of two or more forms of a gene.
The LOD and SPE of a clinical assay required for unequivocal detection of a SPM is
dependent on both the quality and quantity of genomic DNA present in a typical patient
sample. Although the amount of starting material may vary, a typical patient sample
collected and processed for use in qPCR contains ca. 100 ng of genomic DNA. A single
copy of the human genome is estimated to contain ~3 × 10⁹ bp, with each nucleotide base
pair weighing on average 660 g/mole; therefore ca. 30,000 copies of genomic DNA are present in a typical clinical sample. Stochastic effects limit the LOD that can be theoretically achieved with confidence (95%) to 3 copies of the mutant gene (24). Therefore in a typical patient sample, an assay with an SPE of 3:30,000 or 0.01% can be classified as providing unequivocal detection of a SPM. Although the clinical significance of detecting very low mutation loads is not clear, the development of clinical qPCR based assays capable of unequivocal detection of SPMs is a desirable goal, both in that it improves confidence in detecting what is currently set as a positive result in the clinic and that it may lead to improved understanding of the evolution and progress of cancer-related diseases as well as
earlier detection of cancerous or precancerous states for patients.
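The copy-number estimate above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (100 ng of DNA, ~3 × 10⁹ bp per genome copy, 660 g/mole per base pair, an LOD of 3 copies); the function name and the use of Avogadro's number as an explicit constant are illustrative choices, not part of the original analysis.

```python
AVOGADRO = 6.022e23  # molecules per mole


def genome_copies(mass_ng, genome_bp=3e9, bp_molar_mass=660.0):
    """Estimate the number of genome copies in a DNA sample.

    mass_ng       -- mass of genomic DNA in nanograms
    genome_bp     -- base pairs per genome copy (value quoted in the text)
    bp_molar_mass -- average molar mass of one base pair, g/mole
    """
    grams_per_copy = genome_bp * bp_molar_mass / AVOGADRO
    return mass_ng * 1e-9 / grams_per_copy


copies = genome_copies(100.0)        # roughly 3 x 10^4 copies in 100 ng
lod_copies = 3                       # stochastic limit quoted in the text
required_spe = lod_copies / 30_000   # ratio form used in the text
print(f"{copies:.0f} copies; required SPE = {required_spe:.2%}")
```

The computed copy number is slightly above 30,000, which is consistent with the rounded value used in the text to express the required SPE as 3:30,000.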
Common qPCR techniques used to detect and differentiate sequences that may differ
by as little as one nucleotide include both allele-specific probe (AS probe) and allele-specific
primer (AS primer) based assays. In both cases, accurate prediction of probe or primer
hybridization thermodynamics, especially the melting temperature (Tm) under specific PCR
solution conditions, is generally required to design a robust assay offering an acceptable LOD and SPE. First described more than a decade ago, AS probes and AS primers were originally, and are still typically, designed as standard DNA oligonucleotides
(26,27). Although SNPs are often reliably detected using these pure-DNA reagents, they have proven far less effective for the detection of SPMs, as the frequency of the allele bearing the SPM in the clinical sample is unknown. SPEs below 5% are rarely achieved for
DNA-based AS probes (28), and although DNA-based AS primers can on occasion achieve
SPE capable of unequivocal detection of SPM (29), the performance of DNA-based AS primers is known to be sensitive to a number of factors including the type of base pair mismatch and the surrounding sequence, which makes it difficult to consistently achieve SPE below 1% (30-33). DNA based AS probes and AS primers are therefore generally unsuitable for detection of acquired mutations present at low frequency, such as during early disease pathology or following treatment.
The need for improved AS probe and AS primer chemistries and designs that greatly enhance the LOD and more notably the SPE for rare SPMs has in part motivated the development of a number of useful nucleic-acid analogs. Of particular importance are
Locked Nucleic Acids (LNAs), a class of RNA analogues that provide several advantages over natural oligonucleotides in molecular biology and antisense applications (34). Each of the four standard deoxynucleotides can be converted to its corresponding LNA through the introduction of a 2’-O, 4’-C methylene bridge into the ribose ring of the nucleotide. LNA is therefore a chemical analogue of RNA, and commercial DNA synthesizers can be used to synthesize either probes or primers comprised of linear combinations of standard and locked nucleotides. There is evidence that LNA containing oligonucleotides can offer performance
advantages when used as probes in qPCR assays to detect either SNPs or SPMs (35,36).
LNAs display greater base-pairing stability and mismatch discrimination, and these attributes
can be used to design shorter, more selective AS probes (36-40). Similarly AS primers that
incorporate a single LNA at or near the 3’ terminus have also led to improvements in SPE
when compared to pure-DNA AS primers of the same base sequence (41-43).
However, as LNAs alter duplex stability, the effective design of LNA-bearing AS
probes and AS primers requires accurate prediction of the hybridization thermodynamics
with the target allele. In particular, accurate knowledge or prediction of Tm is needed.
Regrettably, current thermodynamic models capable of predicting Tms of complementary
duplexes formed with an LNA containing probe or primer are limited in a number of
important ways. They are only applicable to oligonucleotides with internal non-neighboring
LNA substitutions (single-stranded sequences known as LNA/DNA “mixmers”), as well as to a narrow range of solution conditions (44,45). Furthermore, they are not always accurate in predicting Tm, and the accuracy of these models when applied to mixmer AS probes and
AS primers used in qPCR assays has yet to be tested. A number of improvements and
analyses are therefore needed before these models can be confidently applied to SNP and
SPM assay design.
In addition to an improved knowledge of hybridization thermodynamics, the
successful application of LNA-bearing AS probes and AS primers in qPCR assays designed
to detect SPMs will require better understanding of the interaction of the probe or primer
with the polymerase enzyme, which for qPCR is generally the thermostable DNA
polymerase Taq. Heteroduplexes containing both DNA and LNA have been shown to have
unique interactions with DNA polymerases (46), likely due to localized structural changes
caused by the LNA substitution(s) (47-50), and these altered interactions may perturb
extension kinetics in desirable or undesirable ways.
This thesis reports results from an integrated experimental and theoretical research
program aimed at providing an improved understanding of the impact of LNA substitutions
on the hybridization thermodynamics and performance of AS probes and primers designed to
selectively detect SPMs that have been identified as being “driver mutations” in cancer. A
new model that accurately predicts hybridization thermodynamics of oligonucleotides
containing any number and pattern of LNA substitutions is developed and then applied to AS
probe and AS primer design. Results from experiments designed to investigate the impact of
LNA substitution patterns within probes and primers on Taq polymerase activity are also
reported, and guidelines for placement of LNAs to improve the specificity of qPCR assays
for SPMs are developed. Together, these advances are then used to create new LNA-
containing AS probes and AS primers directed against clinically relevant SPMs that exhibit
equal or greatly improved performance when compared to previously reported detection
methods.
1.2 Background
The importance of deoxyribonucleic acid (DNA) to life and medical science has made
it the focus of intense research over the past 70 years. In vivo, chromosomal duplex DNA is
sufficiently stable to preserve one’s genetic code. Yet, under appropriate conditions, portions
of a chromosome must and do dissociate into single strands to permit, among other things,
the expression of genes. Many powerful techniques and technologies used in molecular biology and in clinical laboratories also exploit the ability to dissociate duplex DNA into its
component single strands. Oligonucleotide probes that hybridize to natural single-stranded
DNA are used to identify specific sequences that are diagnostic of disease or to identify a
unique person of interest in a criminal investigation. Oligonucleotide primers are used in a
wide range of applications, including initiating complementary strand synthesis for
sequencing or PCR-based amplification. The development and successful application of
these techniques typically requires knowledge of how the stability or melting temperature Tm of a given duplex depends on its length, sequence and concentration. Solvent composition
(e.g. salt concentration, pH, added metal ions or organic solvents, etc.) is also known to affect the stability of a duplex at a given temperature (51-53). A long-standing goal of researchers studying structures, dynamics and energetics of nucleic acids has therefore been to understand, predict and control the properties and functions of natural nucleic acids and modifications to them.
Duplex DNA consists of two polynucleotide chains that are arranged in an anti-parallel double-helical structure. The nucleotides of DNA are all comprised of three chemical moieties: a phosphate group, a five-carbon sugar (deoxyribose), and an organic nitrogen-containing base. Four different nucleotides that are distinguished through their unique nitrogenous base occur in DNA. They include the pyrimidines, cytosine (c) and thymine (t), and the purines, adenine (a) and guanine (g). Each polynucleotide, also known as single-stranded (ss) DNA, is formed through covalent linkage of the deoxyribose sugar of one nucleotide to the phosphate group of the next nucleotide, with the bases orientated as side groups off the phosphodiester bonded backbone. Several conformations of duplex DNA are found in nature, including A-DNA, B-DNA and Z-DNA. First described by Watson and
Crick, the B form of DNA, which consists of a right-handed double helix that in aqueous
solution makes one complete turn about its central axis every 10.4 to 10.5 base pairs, is the most common double-stranded conformation in cells and other living systems. Orientation-specific pairing of a with t and g with c on opposing anti-parallel strands, as well as stacking forces between neighboring bases on each strand serve to create the observed B-form double helix. That knowledge alone is sufficient to understand how genes duplicate. However, through their careful structural and biochemical studies, Watson and Crick (54), along with others (55,56), have provided many additional insights that to this day continue to serve as reliable fundamental underpinnings for understanding and manipulating nucleic acid functions. For example, the neutral bases of individual nucleotides in solution can form at least 28 unique and stable base pair structures that include at least two hydrogen bonds (57); some of these structures are shown in Figure 1.1. The breakthrough discovery of Watson and
Crick was that only one type of these possible base pair structures, now appropriately named the Watson-Crick base pairs (Figure 1.1), fits into the uniform conformationally constrained double helical structure of B-DNA, which for the remainder of this thesis we will simply refer to as duplex or double-stranded (ds) DNA.
[Figure 1.1 panels: Watson-Crick; Hoogsteen. Chemical structure drawings not reproduced.]
Figure 1.1 Common types of base-pairing between DNA nucleobases.
In Watson-Crick base pairs, a forms 2 hydrogen bonds with t, and g forms 3 hydrogen bonds with c. The formation of hydrogen bonds between paired bases explains in large part why dsDNA is enthalpically and thermodynamically favored over the ssDNA state at physiological conditions. However, hydrogen bonds between a given base and its surrounding water molecules, as well as van der Waals and π-π∗ interactions between adjacent stacked bases along the helix also contribute to helix stability and structure. The strength of each of these interactions depends on base sequence, making the stability of a duplex sequence specific. In their now classic study, Marmur and Doty (58) found that the hydrogen-bond rich g•c base pair is more stable within the duplex. They then used their results to provide the first useful model of the sequence and length dependence of duplex
DNA stability by showing that, to a first approximation, Tm increases linearly with g•c base
pair content. Later seminal studies showed that the degree of base pair complementarity is also important (e.g., Aboul-ela F. et al. (59)), as mutations in base sequence that result, for example, from errors in DNA replication, can destabilize a duplex through the formation of mismatched base pairs.
The entropic cost of generating an ordered bimolecular structure from two flexible single strands destabilizes dsDNA. Entropy therefore compensates the favorable enthalpy
change for the duplexation reaction. As a result, the incremental Gibbs energy change ∆G
per base pair added to a DNA duplex increases the stability of the duplex at physiological
conditions by only a small amount, as has been shown in a number of studies (60-65) using techniques that will be described. This is important biologically since it provides the relatively modest duplex stability needed for gene transcription and translation to occur. It is also of scientific importance, as it means that the accurate prediction of melting temperatures
(and thermodynamics) will require a molecular thermodynamic model that properly describes the balance of compensating forces that give rise to the small base pair specific incremental
Gibbs energy changes that collectively stabilize a duplex.
To gain insight into dsDNA stability and how it is affected by changes in primary structure, scientists have studied duplexes using a combination of methods that include X-ray crystallography (66,67), Raman spectroscopy (68,69) and NMR (70,71) to obtain structural information, and ultraviolet (UV) spectroscopy and differential scanning calorimetry (DSC) to quantify the thermodynamics of the melting transition (61,62,64,65,72-74). These various studies have shown that the denaturation of B-DNA involves disruption of stacking interactions between adjacent bases on a given strand and between the two base pairs within
the corresponding base pair doublet. Inter-chain stacking interactions are completely lost
during denaturation, while intra-strand stacking interactions are partly disrupted. Base pair
doublets, also known as nearest-neighbors, can be classified into four distinct groups based
on their composition and sequence. If R and Y denote the purine and pyrimidine bases,
respectively, these are RR, RY, YR and YY, with the RR-type doublet generally providing
the highest stacking stability (75). Hydrogen bonds between Watson-Crick base pairs are
also lost during denaturation. As noted above, one additional hydrogen bond is formed in the more stable g•c base pair. As a result, doublets may contain a total of 4, 5 or 6 hydrogen bonds between the two base pairs, and thus exhibit marked differences in stability. Finally, the contributions of base-stacking and base-pairing to B-DNA stability are thought to be fairly similar in magnitude (76), so that both effects must be properly modeled if accurate predictions of melting thermodynamics are to be realized.
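The doublet bookkeeping described above can be illustrated with a short script. This is a minimal sketch that only classifies each nearest-neighbor doublet of one strand as RR, RY, YR or YY and counts the Watson-Crick hydrogen bonds it contains (2 per a•t pair, 3 per g•c pair); the function names and the example sequence are hypothetical, and no stability values are assigned.

```python
PURINES = {"a", "g"}
# Watson-Crick hydrogen bonds formed by the base pair containing each base
HBONDS = {"a": 2, "t": 2, "g": 3, "c": 3}


def doublets(seq):
    """Return the overlapping nearest-neighbor doublets of a strand (5' to 3')."""
    seq = seq.lower()
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def doublet_class(doublet):
    """Classify a doublet as RR, RY, YR or YY (R = purine, Y = pyrimidine)."""
    return "".join("R" if base in PURINES else "Y" for base in doublet)


def doublet_hbonds(doublet):
    """Total hydrogen bonds in the two base pairs of a doublet (4, 5 or 6)."""
    return sum(HBONDS[base] for base in doublet)


for d in doublets("gattc"):
    print(d, doublet_class(d), doublet_hbonds(d))
```

For the example strand, the four doublets ga, at, tt and tc fall into three of the four classes and contain either 4 or 5 hydrogen bonds; a gc or gg doublet would contain 6.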
1.2.1 Methods for measuring duplex DNA melting thermodynamics
The importance of duplex DNA stability to biology and health science has motivated substantial research toward understanding, through experimentation, the thermodynamics of melting duplex DNA to the random-coil single-stranded state. Focus has largely concentrated on relatively short oligonucleotides because of their relatively simple nature and their widespread use as probes and primers in the polymerase chain reaction (PCR), in various quantitative real-time PCR techniques, and in next-generation DNA sequencing and microarray technologies. For these short dsDNAs, two experimental methods have been developed and widely used to measure Tm, as well as the change in Gibbs energy (∆G),
enthalpy (ΔH), entropy (ΔS) and, in some cases, heat capacity (ΔCp) upon melting. These
include i) indirect monitoring of melting thermodynamics by UV spectroscopy and ii) direct measurement by DSC. A description and comparison of these two methods is provided
below.
1.2.1.1 UV absorption spectroscopy
The denaturation of dsDNA into its composite single strands is typically measured by
optical absorption versus temperature studies that generate a “melting curve”. The experiment is generally conducted at 260 nm, where UV light absorption mainly occurs through a π-π* electronic transition in both pyrimidine and purine bases. An example of a
melting curve is provided in Figure 1.2 and shows that an increase in absorption is recorded
during the dsDNA to ssDNA transition. Commonly referred to as the hyperchromic effect,
this increase in the molar absorptivity of DNA is due to the loss of base stacking, which alters the
electronic interactions between neighboring bases. For denaturation of short, fully complementary dsDNA, linear pre- and post-transition
base lines are expected and generally observed. This feature, along with the overall
simplicity of the technique and the low concentration of oligonucleotides required have
served to make UV absorption spectroscopy the primary method used to study DNA melting
transition thermodynamics (61).
Figure 1.2 Example of UVM data and analysis for a short DNA duplex.
Several assumptions are generally made to derive thermodynamic data from a UV melt (UVM) curve. The first is that the measured change in absorbance correlates directly with a transition in the DNA from the ds to the ss state. As noted above, helix denaturation alters the electronic configuration of the bases through both base unstacking and unpairing contributions, so it is indeed reasonable to think that the observed shift in absorption intensity to lower energy bands is proportional to the percentage of the original dsDNA that has denatured (77). The importance of this assumption is that it permits the fraction (α) of strands in the ds state to be estimated from the UVM curve, provided baselines representing the pre-transition (α = 1) and post-transition (α = 0) states can be accurately assigned (78).
Two further assumptions are then required to compute thermodynamic data from the melting
curve. First, one must assume the reaction can be modeled as a reversible two-state (all or
none) transition. Though unequivocal proof that this condition is met is hard to obtain, it is
common to test for two-state melting behavior by analyzing UVM data using two
independent methods: the first is the classic van’t Hoff analysis and the second is based on a
Levenberg-Marquardt nonlinear least-squares fit of the normalized melting curve. Both methods are described below and agreement (±15%) of the thermodynamic values obtained
using the two analyses is generally accepted as an indication that two-state thermodynamics
are applicable to the melting transition of a given duplex (64). Second, the change in heat
capacity (ΔCp) between the two states is assumed to be zero in this analysis. One clear and
widely recognized value in assuming ΔCp = 0 is that thermodynamic changes for the measured melting transition can be computed with ease via a van’t Hoff analysis (64,65). To
both see how this is done and understand the thermodynamics of bimolecular dissociation
reactions, consider the melting of a short dsDNA sequence into two single strands (ssDNA1 and ssDNA2) that are not self-complementary in the 5’→ 3’ sense (e.g. ssDNA1 might be the homo-polynucleotide aaaaaaaaaaaa, which cannot base-pair with itself and is therefore not self-complementary). The melting reaction is then described as
$$\mathrm{dsDNA} \;\overset{K(T)}{\rightleftharpoons}\; \mathrm{ssDNA1} + \mathrm{ssDNA2} \tag{1.1}$$
where K(T) is the equilibrium constant for the helix-to-coil transition at temperature T and is
given by
$$K(T) = \frac{[\mathrm{ssDNA1}][\mathrm{ssDNA2}]}{[\mathrm{dsDNA}]} \tag{1.2}$$
Note that the equilibrium constant defined by Equation 1.2 is an effective one, as it is based
on equilibrium concentrations and not on activities at the chosen solvent conditions. A
strand mass balance gives
$$C_T = [\mathrm{ssDNA1}] + [\mathrm{ssDNA2}] + 2[\mathrm{dsDNA}] \tag{1.3}$$
where CT is the total strand concentration in the sample. Division of Equation 1.3 by CT allows one to define α (= 2[dsDNA]/CT), the fraction of strands in the duplex state, so that at Tm
$$K(T_m) = \frac{C_T (1-\alpha)^2}{2\alpha} \tag{1.4}$$
when the concentrations of the two strands are equimolar. The fundamental thermodynamic
relationship for the melting of a duplex formed from two non-self-complementary strands is
therefore given by
$$\Delta G = -RT \ln K = -RT \ln\!\left(\frac{C_T (1-\alpha)^2}{2\alpha}\right) = \Delta H^\circ - T\Delta S^\circ \tag{1.5}$$
where ∆G is the Gibbs energy change at temperature T, and ∆H° and ∆S° are the standard enthalpy
and entropy changes, respectively, for the helix-to-coil transition. The superscript o in
Equation 1.5 denotes that both ∆H° and ∆S° are defined at the standard-state condition, which for UVM experiments is typically a buffered aqueous solution (pH 7) at temperature
Tm and atmospheric pressure containing NaCl at a concentration of either 1 M (typically) or
0.15 M (occasionally). Rearrangement of Equation 1.5 yields the melting curve modeling equation for the case where ∆Cp is assumed to be zero
$$T(\alpha) = \frac{\Delta H^\circ}{\Delta S^\circ - R \ln\!\left(\dfrac{C_T (1-\alpha)^2}{2\alpha}\right)} \tag{1.6}$$
When α = 0.5, Equation 1.6 gives the melting temperature Tm of the duplex at the given total
strand concentration CT
$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ - R \ln\!\left(\dfrac{C_T}{4}\right)} \tag{1.7}$$
Equation 1.7 emphasizes the role of enthalpy-entropy compensation in defining the stability
of duplex DNA. In particular, for the melting reaction (Equation 1.1), both ∆H° and ∆S° are
positive and relatively large in value; the term -R ln (CT/4) in the denominator of Equation
1.7 is also positive in value since CT is typically less than 1 × 10⁻⁴ M, but the magnitude of this term is generally a fraction of ∆S°.
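Equations 1.6 and 1.7 are simple to evaluate numerically. The sketch below does so for equimolar, non-self-complementary strands; the values of ∆H°, ∆S° and CT are hypothetical and chosen only to be of a plausible magnitude for a short duplex, and the function names are illustrative.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def temperature_at_alpha(alpha, dH, dS, CT):
    """Equation 1.6: temperature (K) at which a fraction alpha of strands is duplex."""
    K = CT * (1.0 - alpha) ** 2 / (2.0 * alpha)  # Equation 1.4
    return dH / (dS - R * math.log(K))


def melting_temperature(dH, dS, CT):
    """Equation 1.7: Tm (K), i.e. Equation 1.6 evaluated at alpha = 0.5."""
    return dH / (dS - R * math.log(CT / 4.0))


# Hypothetical melting parameters (assumed for illustration only)
dH = 80000.0   # cal/mol
dS = 220.0     # cal/(mol K)
CT = 2e-6      # total strand concentration, M

Tm = melting_temperature(dH, dS, CT)
conc_term = -R * math.log(CT / 4.0)  # positive concentration term in the denominator
print(f"Tm = {Tm - 273.15:.1f} C; -R ln(CT/4) = {conc_term:.1f} cal/(mol K)")
```

With these assumed inputs the concentration term is positive and much smaller than ∆S°, in line with the discussion of Equation 1.7.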
It is important to note that CT/4 becomes CT in Equation 1.7 when it is applied to a
duplex formed from a self-complementary oligonucleotide. This change arises because the
melting reaction for a duplex formed from a self-complementary strand (e.g. the ssDNA might be the polynucleotide aaaaaatttttt) is not given by Equation 1.1, but instead by
$$\mathrm{dsDNA} \;\overset{K(T)}{\rightleftharpoons}\; 2\,\mathrm{ssDNA} \tag{1.8}$$
As a result, the equilibrium constant K for self-complementary DNA differs by a factor of 4
at a given CT from that for a duplex formed from non self-complementary strands. As
described in detail below, the analysis of UVM data for self-complementary DNA also
requires an entropy correction that arises from the symmetry of self-complementary strands
(79). Finally, if the two strands are non self-complementary and are not present in equimolar concentrations, CT/4 becomes CA – CB/2, where CA and CB are the concentrations of the more
and less concentrated strands, respectively. If one strand is present in great excess such that
CA >> CB, only the concentration of the more abundant sequence is required in Equation 1.7 to determine Tm.
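The three concentration cases described above can be collected into a single helper. The sketch below returns the quantity that replaces CT/4 inside the logarithm of Equation 1.7; the function names and the numerical inputs are hypothetical, and the symmetry-related entropy correction for self-complementary strands mentioned in the text is deliberately not included.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def concentration_term(CA, CB=None, self_complementary=False):
    """Quantity that replaces CT/4 inside the logarithm of Equation 1.7.

    Self-complementary strand: pass the total strand concentration CT as CA.
    Non-self-complementary strands: pass the two strand concentrations; the
    result is C_major - C_minor/2, which reduces to CT/4 when they are equal.
    """
    if self_complementary:
        return CA
    if CB is None:
        CB = CA  # equimolar strands
    major, minor = max(CA, CB), min(CA, CB)
    return major - minor / 2.0


def melting_temperature(dH, dS, conc_term):
    """Equation 1.7 with a generalized concentration term; returns Tm in K."""
    return dH / (dS - R * math.log(conc_term))


# Hypothetical duplex parameters, for illustration only
dH, dS = 80000.0, 220.0
tm_equimolar = melting_temperature(dH, dS, concentration_term(1e-6, 1e-6))
tm_excess = melting_temperature(dH, dS, concentration_term(1e-4, 1e-8))
print(f"equimolar: {tm_equimolar:.1f} K; one strand in excess: {tm_excess:.1f} K")
```

Because the equimolar case is recovered from CA – CB/2 when CA = CB = CT/2, a single expression covers both non-self-complementary situations.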
To analyze UVM data by the classic van’t Hoff analysis, it is convenient to linearize
Equation 1.7 to
$$\frac{1}{T_m} = -\frac{R}{\Delta H^\circ} \ln\!\left(\frac{C_T}{4}\right) + \frac{\Delta S^\circ}{\Delta H^\circ} \tag{1.9}$$
This result shows that a plot of 1/Tm versus ln (CT), known as the van’t Hoff plot, allows determination of ∆H° from the slope and, in theory, ∆S° from the intercept. The use of
Equation 1.9 to obtain accurate thermodynamic data requires the acquisition of melting temperatures over a wide range of CT values. However, the determination of Tm from UVM data is not straightforward, as it requires careful non-linear fitting and model-based analysis of the melting curve. To avoid this complication, the value of Tm in Equation 1.9 is usually taken to be that of Tmax, which is easily determined as the temperature at which d(A260)/dT is a maximum. Though they are close in value, Tm ≠ Tmax for bimolecular reactions (80,81), and this assumption therefore introduces error into the regressed thermodynamic values.
Additional and sometimes significant error in values of ∆S° determined by van’t Hoff analysis can occur due to the inability to collect data at CT values close to the intercept of
Equation 1.9. Several other limitations to this classic analysis method have also been identified and carefully explained in previous reports (80).
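The van’t Hoff regression can be sketched in a few lines. The example below generates noise-free Tm values from Equation 1.7 for hypothetical ∆H° and ∆S° values and then recovers them by a linear least-squares fit of 1/Tm against ln(CT/4); using ln(CT/4) rather than ln(CT) as the abscissa leaves the slope unchanged and makes the intercept equal to ∆S°/∆H° directly. All names and numbers are illustrative assumptions, and real data would of course carry the errors discussed above.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def tm_two_state(dH, dS, CT):
    """Equation 1.7 for equimolar, non-self-complementary strands (Tm in K)."""
    return dH / (dS - R * math.log(CT / 4.0))


def vant_hoff_fit(CT_values, Tm_values):
    """Fit Equation 1.9 by linear least squares; return (dH, dS)."""
    xs = [math.log(c / 4.0) for c in CT_values]
    ys = [1.0 / t for t in Tm_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    dH = -R / slope        # slope = -R / dH
    dS = intercept * dH    # intercept = dS / dH
    return dH, dS


# Synthetic, noise-free "measurements" from hypothetical parameters
CT_values = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4]
Tm_values = [tm_two_state(80000.0, 220.0, c) for c in CT_values]

dH_fit, dS_fit = vant_hoff_fit(CT_values, Tm_values)
print(f"dH = {dH_fit:.0f} cal/mol, dS = {dS_fit:.1f} cal/(mol K)")
```

The long extrapolation from the measured ln(CT/4) range to the intercept is what makes the regressed ∆S° sensitive to small errors in Tm.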
An alternative method for analyzing UVM data according to two-state theory is provided by substituting the correct relation between K and α (i.e. Equation 1.4 for a non-self complementary bimolecular reaction) into the Gibbs-Helmholtz equation:
$$\Delta H^\circ = -6 R T_m^2 \left.\frac{d\alpha}{dT}\right|_{T_m} \tag{1.10}$$
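The factor of 6 in Equation 1.10 can be recovered in a few lines. The sketch below assumes only Equation 1.4, the two-state model and a temperature-independent ∆H°, and uses the van’t Hoff form of the temperature dependence of K.

```latex
\begin{align*}
\ln K &= \ln\frac{C_T}{2} + 2\ln(1-\alpha) - \ln\alpha \\
\frac{d\ln K}{dT} &= -\left(\frac{2}{1-\alpha} + \frac{1}{\alpha}\right)\frac{d\alpha}{dT}
  = \frac{\Delta H^\circ}{RT^2} \\
\text{at } \alpha = \tfrac{1}{2}:\quad
-6\left.\frac{d\alpha}{dT}\right|_{T_m} &= \frac{\Delta H^\circ}{RT_m^2}
\;\Longrightarrow\;
\Delta H^\circ = -6RT_m^2\left.\frac{d\alpha}{dT}\right|_{T_m}
\end{align*}
```

Since α is the duplex fraction, dα/dT is negative through the transition and ∆H° for melting is positive, as expected.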
To apply Equation 1.10, raw UVM data are first normalized to 0 ≤ α ≤ 1, typically by independent linear fits of the pre- and post-transition baselines as shown in Figure 1.2. A non-linear fit of one or each baseline might prove more accurate in certain cases, though the added number of fitted variables complicates both the analysis and the estimation of errors.
Alternatively, Owczarzy has shown that regions where the second derivative d²(A260)/dT² is zero can be used to define temperatures where melting curve data are linear and may therefore be used to establish the pre-transition and post-transition baselines (81). Linear least-squares fitting of the resulting α(T) curve with Equation 1.10 in the range near and centered around α = 0.5 is then used to estimate Tm, as well as the value of dα/dT at Tm needed to compute ∆H° using Equation 1.10. ∆S° can then be calculated using Equation 1.7.
Typically, the values of ∆H° and Tm obtained by this local regression procedure are precise.
However, as a secondary check of data quality, both quantities may also be obtained by fitting the entire α(T) curve (or even sets of curves) using the Levenberg-Marquardt nonlinear least-squares method (Figure 1.2) with the local (linear least-squares) estimates used as initial guesses.
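The local regression procedure can be sketched on synthetic data. The example below builds an absorbance curve from the two-state model with hypothetical linear baselines, normalizes it to α(T) using linear fits of the pre- and post-transition regions, and then applies Equation 1.10 and Equation 1.7 near α = 0.5. All parameter values, baseline windows and function names are assumptions made for illustration, and the full-curve Levenberg-Marquardt refinement is not shown.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def alpha_two_state(T, dH, dS, CT):
    """Duplex fraction from Equation 1.4 (equimolar, non-self-complementary)."""
    K = math.exp(-dH / (R * T) + dS / R)
    x = 2.0 * K / CT  # (1 - alpha)^2 / alpha = x
    return 2.0 / (2.0 + x + math.sqrt(x * (x + 4.0)))  # root lying in [0, 1]


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Synthetic melting curve (all numbers hypothetical)
dH_true, dS_true, CT = 80000.0, 220.0, 2e-6
temps = [273.15 + 0.5 * i for i in range(201)]
absorbance = []
for T in temps:
    a = alpha_two_state(T, dH_true, dS_true, CT)
    ds_base = 0.80 + 0.0005 * (T - 273.15)  # duplex (pre-transition) baseline
    ss_base = 1.00 + 0.0008 * (T - 273.15)  # single-strand (post-transition) baseline
    absorbance.append(a * ds_base + (1.0 - a) * ss_base)

# Independent linear fits of the pre- and post-transition regions
pre = [(T, A) for T, A in zip(temps, absorbance) if T <= 290.0]
post = [(T, A) for T, A in zip(temps, absorbance) if T >= 350.0]
m_lo, b_lo = linear_fit(*zip(*pre))
m_hi, b_hi = linear_fit(*zip(*post))

# Normalize absorbance to the duplex fraction alpha(T)
alpha = [((m_hi * T + b_hi) - A) / ((m_hi * T + b_hi) - (m_lo * T + b_lo))
         for T, A in zip(temps, absorbance)]

# Local linear fit around alpha = 0.5, then Equations 1.10 and 1.7
near = [(T, a) for T, a in zip(temps, alpha) if 0.4 <= a <= 0.6]
slope, intercept = linear_fit(*zip(*near))
Tm_fit = (0.5 - intercept) / slope
dH_fit = -6.0 * R * Tm_fit ** 2 * slope
dS_fit = dH_fit / Tm_fit + R * math.log(CT / 4.0)
print(f"Tm = {Tm_fit:.2f} K, dH = {dH_fit:.0f} cal/mol, dS = {dS_fit:.1f} cal/(mol K)")
```

Because α(T) is slightly curved around its midpoint, the local linear slope only approximates dα/dT at Tm, which is one reason the local estimates are best treated as initial guesses for a fit of the entire curve.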
Important points may be made regarding analysis of UVM data as a means of building predictive models of dsDNA stability. First, the UVM method measures changes in absorbance due to the hyperchromic effect and does not directly measure melting thermodynamics. Instead, thermodynamic data are obtained by regressing a two-state thermodynamic model to α(T) data. The quality of this regression depends on several factors, the most significant of which are the proper selection of the pre- and post-transition baselines and the implications of assuming ΔCp equals zero. The first factor is significant because the baselines are used to normalize the absorbance data to α values, and this in turn
defines the shape of the function α(T) from which Tm (taken as the temperature at which α =
0.5) and ∆H° are estimated. The second factor is also important, though it does not, in general, introduce significant error in the regressed melting thermodynamics at Tm. Instead, its impact is felt because it leads to a thermodynamic analysis, and ultimately to nearest-neighbor type thermodynamic models, where ∆H and ∆S are assumed to be functions of duplex length and sequence, but not of temperature. The consequence of this is that ∆H° and ∆S° values acquired at Tm are then used without temperature correction in Equation 1.5 to estimate ∆G at a temperature T of interest, say 37 °C, that may be far from Tm. If, in fact,
ΔCp ≠ 0, this analysis procedure can result in significant errors that will serve to obscure the true differences in stability arising from duplex length, sequence, etc.
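The size of this effect can be gauged with the standard thermodynamic relations for a temperature-independent ΔCp, namely ∆H(T) = ∆H(Tm) + ΔCp(T – Tm) and ∆S(T) = ∆S(Tm) + ΔCp ln(T/Tm). These relations are not written out in the text above and are introduced here as an assumption; the sketch below compares ∆G at 37 °C with and without the correction, and all numerical values, including ΔCp, are hypothetical.

```python
import math


def melting_thermo_at(T, dH_m, dS_m, Tm, dCp=0.0):
    """Return (dH, dS, dG) at temperature T from values measured at Tm.

    Assumes a temperature-independent heat capacity change dCp:
        dH(T) = dH(Tm) + dCp * (T - Tm)
        dS(T) = dS(Tm) + dCp * ln(T / Tm)
    With dCp = 0 this reduces to the uncorrected use of Equation 1.5.
    """
    dH = dH_m + dCp * (T - Tm)
    dS = dS_m + dCp * math.log(T / Tm)
    return dH, dS, dH - T * dS


# Hypothetical values, for illustration only
Tm = 321.5       # K
dH_m = 80000.0   # cal/mol, at Tm
dS_m = 220.0     # cal/(mol K), at Tm
dCp = 800.0      # cal/(mol K), assumed
T = 310.15       # 37 C

_, _, dG_uncorrected = melting_thermo_at(T, dH_m, dS_m, Tm)
_, _, dG_corrected = melting_thermo_at(T, dH_m, dS_m, Tm, dCp)
print(f"dG(37 C): {dG_uncorrected:.0f} cal/mol uncorrected, "
      f"{dG_corrected:.0f} cal/mol with dCp")
```

For a positive ΔCp of melting, the corrected ∆G at any temperature below Tm is smaller than the uncorrected value, and the gap widens as T moves further from Tm.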
1.2.1.2 Calorimetry
Calorimetric methods such as differential scanning calorimetry (DSC) and isothermal titration calorimetry (ITC) offer the advantage of directly measuring the thermodynamics of
DNA melting transitions (82-84). Uncertainties in the data are therefore reduced. In DSC,
DNA helix-to-coil transitions are followed as a function of temperature by measuring the
excess heat capacity (Cp^ex) of a DNA-containing solution relative to an otherwise identical
DNA-free control solution. Integration of the resulting Cp^ex versus T curve (Figure 1.3) provides a direct measure of the transition enthalpy and entropy