Free Energy Landscape and Conformational Kinetics of Hoogsteen Base-Pairing in DNA Vs RNA
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.09.868018; this version posted January 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Manuscript submitted to BiophysicalJournal Article Free Energy Landscape and Conformational Kinetics of Hoogsteen Base-Pairing in DNA vs RNA D. Ray and I. Andricioaei ABSTRACT Genetic information is encoded in the DNA double helix which, in its physiological milieu, is characterized by the iconical Watson- Crick nucleobase pairing. Recent NMR relaxation experiments revealed the transient presence of an alternative, Hoogsteen base pairing pattern in naked DNA duplexes and estimated its relative stability and lifetime. In contrast, HG transitions in RNA were not observed. Understanding Hoogsteen (HG) base pairing is important because the underlying "breathing" can modulate significantly DNA/RNA recognition by proteins. However, a detailed mechanistic insight into the transition pathways and kinetics is still missing. We performed enhanced sampling simulation (with combined metadynamics and adaptive force bias method) and Markov State modeling to obtain accurate free energy, kinetics and the intermediates in the transition pathway between WC and HG base pair for both naked B-DNA and A-RNA duplexes. The Markov state model constructed from our unbiased MD simulation data revealed previously unknown complex extra-helical intermediates in this seemingly simple process of base pair conformation switching in B-DNA. Extending our calculation to A-RNA, for which HG base pair is not observed experimentally, resulted in relatively unstable single hydrogen bonded distorted Hoogsteen like base pair. Unlike B-DNA the transition pathway primarily involved base paired and intra-helical intermediates with transition timescales much higher than that of B-DNA. The seemingly obvious flip-over reaction coordinate, i.e., the glycosidic torsion angle is unable to resolve the intermediates; so a multidimensional picture, involving backbone dihedral angles and distance between atoms participating in hydrogen bonds, is required to gain insight into the molecular mechanism. SIGNIFICANCE Formation of unconventional Hoogsteen (HG) base pairing is an important problem in DNA biophysics owing to its key role in facilitating the binding of DNA repairing enzymes, proteins and drugs to damaged DNA. X-ray crystallography and NMR relaxation experiments revealed the presence of HG base pair in naked DNA duplex and protein- DNA complex but no HG base pair was observed in RNA. Molecular dynamics simulations could reproduce the experimental free energy cost of HG base pairing in DNA although a detailed mechanistic insight is still missing. We performed enhanced sampling simulation and Markov state modeling to obtain accurate free energy, kinetics and the intermediates in the transition pathway between WC and HG base pair for both B-DNA and A-RNA. INTRODUCTION The one-dimensional genetic information encoded in the sequence of DNA base pairs is intrinsically related to its three- dimensional structure described by the iconical Watson-Crick (WC) helix (1), with its specific hydrogen-bonding pattern between purine and pyrimidine (A-T, C-G) complementary nucleobases. Recently, a series of exciting NMR relaxation studies (2) on free DNA in solution have shown that the predominant WC base-pairing pattern between A-T and G-C nucleotides is in dynamic equilibrium with a less common and transient (µs-ms lifetime) Hoogsteen (HG) form (3), where the purine nucleotide in the base pair flips over. The WC to HG base pair transition, a form of “DNA breathing” (4), hints to the existence of a “secondary genetic code” signaled by base pair flipping, apart from the primary encoding in the nucleotide sequence. Crystallographic experiments have reported the presence of Hoogsteen base pairs only in specific contexts, such as when DNA binds to proteins like p53 tumor suppressor protein (5) or integration host factor (IHF) in E. coli (6). Detailed biochemical studies showed that HG base pairing plays a key role in circumventing interference in DNA replication mechanism by DNA polymerase by avoiding appearance of lesions (7, 8). In p53 binding sequence HG conformation exposes negative charged regions of nucleic acid to the positive arginine residue of the protein (5) leading to stability of the complex and playing crucial role in DNA recognition (9, 10). Examples of specific binding to DNA guided by HG base pair extends beyond proteins and even includes small molecules and drugs, which in turn facilitates its recognition by proteins and enzymes (11, 12). In a nutshell, HG base pairing therefore expands the structural and functional repertoire of Manuscript submitted to Biophysical Journal 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.09.868018; this version posted January 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. D. Ray and I. Andricioaei duplex DNA beyond what can be achieved by Watson-Crick base-pairing alone. For example, HG base pairs are recognized by DNA repairing enzymes resulting in selective binding in the damaged regions of DNA, rich in syn-anti configuration over the anti-anti WC base pair rich normal DNA. Thus, apart from providing structural integrity to the damaged and deformed DNA, HG base pairs assist in the DNA repair mechanism (13). Apart from multiple appearance of HG base pairing in damaged or distorted DNA (14–16) several studies have confirmed presence of Hoogsteen conformation in undistorted DNA duplex (17, 18); the first X-ray crystallographic evidence of Hoogsteen base pairing in a simple stretch of DNA sequence composed exclusively of A-T base pairs was provided by Abrescia and coworkers (19). Early computational work by Gould and Kollman had showed that HG base pairing between adenine and thymine nucleobases (i.e., without sugar-phosphate backbone) in vacuum was energetically more favourable (∼1 kcal/mol) than its WC counterpart (20) in agreement with the structure in crystalline environment (3). Subsequent ab-initio quantum chemistry calculations for gas phase A-T base pairs have painted a more complex picture, showing that neither HG or WC forms are the global minimum (21). Furthermore, NMR studies have found that N-methylation of adenine favors syn HG base pairing over the anti WC form by preventing formation of a WC specific hydrogen bond by inducing steric clash (16). NMR chemical shift perturbation, NOESY experiments and constant pH MD simulation show that the HG base pairing among G-C base pairs is ∼20 times less abundant than that in A-T base pairs arguably because a protonation of cytosine is necessary for the transition to HG form (11, 22). Experiments of bare DNA duplex like A-T rich A6-DNA segment have shown presence of Hoogsteen base pair in the A16-T9 position from carbon R1ρ relaxation dispersion NMR signal (2). Such experiments on DNA bound to base intercalating drugs like echinomycin and actinomycin showed formation of Hoogsteen base pair (23). Since RNA duplexes also involve Watson-Crick like base pairs, it was natural to look for the existence of Hoogsteen transitions there, as well. However, HG base pairing was not observed in NMR relaxation experiment in a common A-form RNA hairpin with same sequence as A6-DNA. In contrast to DNA N-methylation of RNA adenine base to induce HG base pair formation resulted in base pair melting and consequent dissociation of the RNA helical region (24). In the present paper, we perform a computational comparison of the free energy psurface and transition paths of DNA vs RNA WC to HG transitions, seeking to understand the relative stability of the HG state and the factors that modulate the dynamical equilibrium in DNA vs RNA. The first report (2) of the NMR measured WC-HG dynamic equilibrium also included our preliminary work on searching pathways of conformational transition using a biased molecular dynamics simulation. Further computational work from our group and others (12, 23–26) have since uncovered several other pieces of information, including energetic details, transition pathways and kinetics of the transition between WC and HG base pairs in naked A-T rich DNA helical duplexes. The paths predicted in our initial study were also recently sampled by Vreede et al.(27), who have calculated the rate constants of back and forth transitions between WC and HG base pairs using Transition Path Sampling (TPS) (28). Experimental estimates put the WC-HG transition in DNA on a timescale of 50-250 ms (2). Therefore, direct simulation of Hoogsteen base pair formation in explicit water for a biologically relevant system is still beyond the reach of present day computational power. In previous work, we showed that formation of Hoogsteen base pair can be achieved in molecular dynamics simulation by applying an artificial biasing force on the the glycosidic torsion angle for free DNA (2, 24) and echinomycin-bound DNA (23). Enhanced sampling methods like umbrella sampling (29) and metadynamics (30, 31) can be utilized to obtain the free energy surface along slow coordinates. For example, Yang et. al. have calculated the free energy landscape of DNA breathing motion for a A-T rich duplex DNA segment from umbrella sampling (25) and multiple walker well tempered metadynamics simulation(26). Even for a short DNA segment, it required 6-40 µs long enhanced sampling simulations to obtain a converged physically meaningful free energy surface. Adaptive biasing force (ABF) (32) has recently become a good alternative to the traditional free energy sampling methods (33) for a wide range of biophysical problems including membrane permeation (34), ion transport (35), ligand diffusion in protein (36) and binding free energy calculation (37).