
UNIT 7.5 Molecular Modeling of Nucleic Acid Structure

Biophysical Analysis of Nucleic Acids
Contributed by Thomas E. Cheatham, III, Bernard R. Brooks, and Peter A. Kollman
Current Protocols in Nucleic Acid Chemistry (2000) 7.5.1-7.5.12, Supplement 6
Copyright © 2000 by John Wiley & Sons, Inc.

Molecular modeling, loosely defined, relates to the use of models to investigate the three-dimensional structure, dynamics, and properties of a molecule or set of molecules. At the heart of this is specification of a molecular model, which provides a molecular structure at an appropriate level of granularity, usually in terms of three-dimensional atomic coordinates. Molecular modeling can be approached on many levels, ranging from energy minimization (finding the set of coordinates that minimizes the energy) with a complete ab initio quantum-mechanical treatment of the energetics, to sampling “reasonable” conformations with a simplified energy representation or potential, to the manipulation of physical models where no explicit energy representation is included. These methods serve not only as tools to aid in the interpretation of experimental data, but also to directly complement such data by providing a relationship between the macroscopic behavior observed experimentally and the microscopic properties represented in the model or simulation.

As discussed in previous units, various molecular modeling tools can serve as conformational search engines for sampling conformational space subject to the restraints inferred from nuclear magnetic resonance (NMR; see UNIT 7.2) and X-ray crystallography (see UNIT 7.1) experiments. This is a critical step in the refinement of three-dimensional atomic structure. Inclusion of some representation of the energy, such as through the use of a specially parameterized empirical force field, can aid in this endeavor by limiting sampling to more realistic (in terms of energy) conformations.

As mentioned above, molecular mechanics methods not only can be used as tools, but can also directly complement experimental data. For instance, molecular dynamics simulations can be used to aid in the interpretation of NMR order parameters or to estimate anisotropic rotational diffusion. In addition, computer simulation techniques have the potential to give structural and dynamic insight into atomic interactions occurring on time scales (<µsec) that are typically not observable because of averaging in crystallography and NMR experiments. Ultimately, as methods are proven reliable, they can be applied in cases where experimentation is limited, difficult, or unfeasible, such as studying highly flexible systems, investigating proposed chemical modifications that have yet to be synthesized, or representing extremes of pressure, temperature, and concentration. As will become apparent, the methods are steadily improving to the point that reliable predictions are emerging.

A critical point that needs to be made at the outset is that these methods cannot be treated as a “black box” or hands-off procedure; there is no standard protocol that can be applied. Modeling is really more of an art. As each situation has differing requirements and needs, various choices need to be made as to what level of treatment to apply and what model to use. These choices rely on a critical understanding of the limitations of the methods. Therefore, the purpose of this discussion is to open up this black box a bit to allow some understanding of the options and choices a modeler makes, highlighting the tradeoffs that must be made in accuracy, system size, and time scale.

The discussion here and in UNITS 7.8 to 7.10 is not meant to provide a complete review of nucleic acid modeling, nor to substitute for the more complete treatment in the primary literature. Instead, these units are intended to provide a framework that describes molecular modeling of nucleic acids, points out common issues and limitations, and directs the reader to other useful information sources.

Implicit in this discussion is a realization that a molecular model is more than simply a representation of the covalent connectivity or static structure. The model may also include some representation of the energetics of the system and perhaps the dynamics over a particular time scale. Although it increases the utility, supplementing static structure with a representation of the energy and dynamics of molecular motion tremendously increases the cost of the modeling. For example, the simulations required to accurately represent the sequence-specific structure and molecular dynamics of a small, solvated nucleic acid duplex (<20 base pairs) on a nanosecond time scale would likely require weeks to months on available computer workstations, even with simple empirical energy representations. Of course, this added information may not always be necessary. For example, investigating whether a proposed modification to a DNA base is sterically feasible may only require the crude manipulation of a physical model to see an effect. Therefore, it is critical to understand the applicability, reliability, and limitations of these methods. In other words, the choice of the model depends on the question being asked.

The remainder of the discussion in this unit introduces the simplest levels of molecular modeling applied to nucleic acids. These include generation, evaluation, and characterization of the initial molecular model. At this simplest level, a nucleic acid model is limited to a static representation of the structure in the gas phase. Evaluation of a given model’s utility is therefore based on the chemical intuition of the modeler, and manipulations to the model are limited to rotation about single bonds. To move beyond this level, subsequent units in this series delve more deeply into the myriad issues involved in the computer simulation of nucleic acids. These include a description of the common energy representations that may be applied to nucleic acids (UNIT 7.8) and a discussion of how to properly represent electrostatic interactions and solvation effects (UNIT 7.9). Additionally, various methods to find more representative structures are introduced, with a focus on molecular dynamics simulation methodologies. Finally, practical issues in nucleic acid simulations are described (UNIT 7.10), such as which force fields are appropriate to apply, how simulations of nucleic acids are set up with explicit solvent and counterions, and how crude relative free energy differences can be estimated from molecular dynamics simulations. In these discussions, the focus will be on the middle ground in terms of size, time scale, and accuracy: the simulation of small nucleic acids (typically less than ~250 base pairs) with explicit representation of the environment (if feasible or necessary), empirical pairwise potential functions, and time scales ranging from the analysis of individual snapshots to nanosecond-length simulations. Readers more interested in the simulation of larger nucleic acid systems (~1,000 to 15,000 base pairs) can consult a variety of reviews (Vologodskii and Cozzarelli, 1994; Schlick, 1995; Olson, 1996).

MOLECULAR MODELING

The practice of molecular modeling basically involves the generation of an initial molecular model, evaluation of the model’s utility, and perhaps manipulation of the molecular model (followed by further evaluation; see Figure 7.5.1).

Figure 7.5.1 Schematic representation of molecular modeling analysis.

Prior to generating an initial molecular model, it is necessary to choose its representation or level of detail. For nucleic acids, the structural representation can be approached on many levels, ranging from the atomic level (including electrons) to coarser levels, such as those that model structure using a single point per base pair. The realism of the model depends directly on this choice of representation and further depends on what properties one is trying to represent. As shown in Table 7.5.1, modeling can be considered a tradeoff between the accuracy, the size and granularity of the system, and the time scale to be represented. If the model only concerns a single conformation or small set of conformations of a molecule of <100 atoms, a very accurate energy model and a description that includes all the atoms and electrons can be used (such as ab initio quantum mechanics with a fairly large basis set and even electron correlation). However, to investigate the supercoiling of a small DNA plasmid over a microsecond time scale, the system can no longer be represented at the atomic level, and a much simpler description of the energetics and a coarser representation of the structure must be imposed. Even so, this may be sufficient to represent the properties of interest. Between a full quantum-mechanical treatment appropriate for small molecules and the coarse-grained single-point-per-base-pair model appropriate for large systems, molecular dynamics methods with an empirical potential may give reliable results as long as no “chemistry” is involved (such as bond forming, bond breaking, or electron transfer) and highly polarizable metal ions are treated at a very approximate level. These methods can give reliable insight into the sequence-specific structure and dynamics of a small nucleic acid duplex in solution.

The Static Structure Model

At the simplest level, where the representation of the model does not include any reality beyond the covalent connectivity, molecular modeling can be performed by creating and manipulating physical models.

Table 7.5.1 Tradeoffs in Molecular Modeling

Accuracy (increasing)                   Time scale (decreasing)       System size (decreasing)     Granularity (finer grain)
Effective potential                     Microseconds                  Supercoiled DNA, plasmid     One point per base pair, elastic rod
Molecular mechanics (implicit solvent)  Nanoseconds to microseconds   <1000 base pairs             All atom, implicit solvent
Molecular mechanics                     Nanoseconds                   <250 base pairs              All atom, explicit solvent
Quantum mechanics                       Individual snapshots          …(s), few waters/ions        All atom plus electrons, implicit solvent

Physical models are available that can represent three levels of granularity. At the finest level, there are a variety of atomic and molecular orbital models that represent the atoms and electrons. These molecular orbital models are not really appropriate for larger and more complicated molecules (such as anything larger than perhaps benzene), and therefore their use is really limited to teaching. Much more useful for representing nucleic acid structure are models that represent the atoms and bonds and, therefore, the covalent connectivity of a molecule.

There are a few common types of models in use that can be classified as either space-filling or bond-oriented. The most common space-filling models are of the Corey-Pauling-Koltun (CPK) variety, named after the researchers who developed them. These space-filling models represent the various atoms as cut-out spheres of a size proportional to the van der Waals radius, which are colored and shaped according to atom type and can be connected together (based upon the hybridization state and possible connectivity of the atom). The most common bond-oriented models are polyhedral models. These provide a series of pieces in various polyhedral shapes with holes for pegs, which represent the bonds. The shape, color, and number of holes represent the various atom types (and hybridization states), and connecting pegs represent the bonds.

Although these models are useful for teaching and for building models of small molecules, they are not appropriate for building macromolecular models, such as of a DNA duplex. To build a larger molecule, special-purpose and more durable physical models can be purchased. These provide larger building units (such as DNA bases) in addition to smaller atom/half-bond units, which can be connected together. The scale of these models is usually in the range of 1 cm to 1 inch per Å. Some models that have been used successfully are the Maruzen models, such as the HGS Biochemistry Molecular Model (see Internet Resources). Coarser folded-chain models, such as models that represent a connection/bond for each α-carbon, are also in use.

The physical bond-oriented models, although tedious to build and often very fragile, are very useful for gaining insight into atomic structure. In addition, the models can be manipulated (which can lead to problems with larger model structures, as they tend to deform). Although the models have rigid bonds and angles, they typically allow free rotation about single bonds. This can provide insight into the correlated conformational changes that occur upon change in a given coordinate. One example is the change in sugar pucker conformation from C2′-endo to C3′-endo, which lowers the rise between base pairs and shifts the conformation not only of the atoms in the sugar ring but also of the nucleic acid backbone. In fact, modeling B-DNA with physical models led to the formulation of Calladine’s rules, which suggest means to overcome strong steric hindrances between adjacent purines in opposite strands as the base pair propeller twist increases to improve stacking.

Computational Graphics and Energy Models

A problem with physical models is that there is no reliable means to include a description of the energy. With these models, energy can only be represented rather crudely, such as by inhibiting free rotation because of the connectivity or by the addition of physical restraints to prevent rotation about double bonds. This allows a minimal interpretation of the intramolecular or internal energetics of the system (related to the connectivity of the molecule).

In addition to intramolecular interactions, a realistic depiction of the energy requires representation of the intermolecular interactions (e.g., van der Waals or steric repulsion and dispersion attraction interactions, hydrogen bonding, and electrostatic interactions). Although the hard-sphere models can represent steric repulsion, they cannot be used to accurately describe the total energy; however, a realistic treatment of the energetics can readily be calculated by computer. Coupled with molecular graphics (digital display of molecular models), computational energy models open the door to much more realistic and reliable molecular modeling. Prior to the advent of molecular graphics, physical models were routinely used as aids for crystallographic refinement.

Molecular graphics programs are now abundant and allow very nice and realistic display of molecular structure. The generality of the programs removes some of the tedium and cost of building physical models. However, since the computer graphics display is two-dimensional, the ease of seeing the three-dimensional model is lost and needs to be recovered by coloring, shading, or rotating the model to project the third dimension. Alternatively, stereo-view displays can be used, which allow three-dimensional viewing with special glasses (either through shuttering, as with the Crystal Eyes display, or with coloring and shading). In addition to more general usage, adding a description of the conformational energy to the molecular model is easier on the computer. Including a picture of the energy along with the molecular graphics can provide greater insight and aid in the evaluation of the model. Examples include coloring regions of a molecule based on favorable electrostatic potential or highlighting atoms that show significant steric overlap. The manipulations possible at the simplest level mirror those of physical models and include a variety of coordinate manipulations, such as rotating about bonds or chemically modifying the structure. However, rather than manipulating the model by hand as with physical models, hooks need to be provided in the molecular graphics software to allow selection and rotation of various parts of the molecule.

Given a reliable initial model structure, molecular modeling with simple coordinate manipulations may be sufficient for many applications, such as suggesting that it is not feasible to fit a particular drug into the minor groove of a double-helical nucleic acid without seriously distorting the duplex, or showing that a certain chemical modification to the phosphodiester backbone is incompatible with the model structure. Simple modeling studies were used as a guide in the initial design of peptide nucleic acid (PNA), an isosteric and stable backbone modification to DNA proposed for use as an antisense therapeutic agent (Nielsen et al., 1991).

Manipulation of molecular graphics or physical models, when coupled with an appropriate chemical/structural intuition, can give useful information. Examples include understanding steric effects, such as the interaction of drugs with the grooves or base pairs of nucleic acid duplexes, or correlated changes in structure due to rotation about particular bonds. However, a major issue with this type of modeling is evaluation of the molecular models. Evaluation and interpretation of the meaning of the molecular model depend on the quality of the initial model, the reliability of the energy representation (if any), and the choice of coordinate manipulations made to the model. Without a reliable guide to the conformational energetics and to the coordinate manipulations necessary to “improve” the model, evaluation of the model depends solely on the chemical intuition of the modeler. This intuition is necessary to rule out unfeasible or unrealistic models or to suggest manipulations to the model that may improve the property of interest.

Because there is no easy way to judge the quality of these models within this simple modeling framework, the conclusions drawn are often tenuous in the absence of experimental verification. For example, the initial model may not have been at all representative of what is seen experimentally, or structural manipulations may lead to a model structure that is energetically unreasonable. Although the situation, in principle, improves with more advanced treatments because the energy is included and unreasonable coordinate manipulations are avoided, there are still many limitations in the methods. This is compounded by the sheer complexity of the rugged energy landscapes of biomolecular structures, which makes evaluation of the reliability of a model structure difficult. In this sense, one should not assume that more advanced treatments give “better” results simply because they use more reliable methods. There is still an essential need to compare the model with experimental data and to critically evaluate the model.

To aid the modeler with simple molecular modeling, perhaps the ultimate molecular modeling environment might involve viewing a molecular graphics depiction of the model as it updates in real time according to the underlying energy potential, while the model is manipulated according to the whims of the modeler. An example of this type of program is Sculpt (Surles et al., 1994), which allows real-time minimization of the structure as it is manipulated. Further enhancement to this environment could come from visual and aural feedback from the system, such as a bang sound and a flash of red light, to discourage manipulations that move atoms into sterically forbidden regions. More involved haptic feedback mechanisms are also possible, such as increasing the difficulty of performing a given manipulation in proportion to the energetic penalty. Ultimately, molecular modeling environments of this type will incorporate visual, aural, and tactile feedback mechanisms, coupled with stereoscopic three-dimensional display in a virtual reality “cave” (Cruz-Neira et al., 1992), to guide the modeler as the model is manipulated. Software to perform this type of real-time modeling has become available in recent years, although the complexity of the calculations limits the treatment, and therefore fairly approximate representations of the energetics must be employed.

Even this ultimate molecular modeling facility, with realistic energy representations and user feedback to steer the various molecular manipulations, does not give a complete understanding of the molecular structure. The energy (enthalpy) alone is insufficient to describe the relative stability of various models, and care needs to be taken in judging the reliability of models based on differences in energy. In addition to describing the energy of the system, it is also necessary to include entropic effects. When entropic effects are included, free energy values may be obtained, providing the connection with reality and experimental measurement. With free energy, the modeler has a handle on the relative population of each state or, equivalently, can understand the various thermally accessible conformations of the molecule in its native environment.

To add entropic effects, some means of sampling the space of accessible conformations (according to the relative probability of observing a given conformation or, equivalently, according to the Boltzmann distribution) is needed. To do this, molecular dynamics (MD) or Monte Carlo (MC) simulation (discussed in more detail in UNIT 7.8) can be performed with the given energy representation. This, however, tremendously increases the cost and complexity of modeling. Whether or not the sampled space of conformations is representative depends on the reliability of the energy description, the amount of conformational sampling, and the reliability of the initial model. However, it should be emphasized that more costly and detailed treatments do not always lead to “better” insight and are not always necessary to address the question at hand.

Generating the Initial Model

The first step in any modeling endeavor is creation of the initial molecular model, where “model” refers to a particular set of three-dimensional coordinates that define the structure of interest. In this discussion, which concerns nucleic acid structure on an atomic level (as opposed to the more coarse-grained bead models appropriate for modeling larger nucleic acid structures), this model is the set of three-dimensional atomic coordinates. Generally, a model of the coordinates is built by hand or received from another source (such as a database of experimentally derived structures). As will become more apparent later in this overview, the quality of the modeling in large part relates to the quality of the initial model or to the ability to find or sample the “correct” structure given the initial model. In this regard, studying an unknown RNA structure is likely to be unfeasible at present, since it is unrealistic to imagine correctly folding up the RNA structure in dynamics simulations (because of barriers to conformational transitions that cannot be overcome during the time scale of the simulations, and because of inaccuracies in the energetic representation). Although there has been tremendous progress in predicting RNA secondary structure, predicting the overall tertiary structure (i.e., three-dimensional atomic coordinates) is still a major unsolved challenge. In spite of this, there have been a few attempts (for review see Brion and Westhof, 1997; Leclerc et al., 1997). Therefore, it is best to base the modeling on experimentally derived structures. Since DNA tends to adopt regular duplex structures, one can often use the canonical structures as an initial guess. The canonical models were derived from fiber diffraction studies of large DNA fibers and give an average idealized geometry and structure representative of DNA under specific conditions (such as A-DNA under low humidity and B-DNA under physiological conditions; Arnott and Hukins, 1972).

Crystallography provides another source of high-resolution structures, such as the left-handed Z-DNA duplex (Wang et al., 1979). The common canonical forms of DNA (A-DNA, B-DNA, and Z-DNA) are shown in Figure 7.5.2 as stereo views. A good resource (although somewhat out of date) for general information on the structure of DNA is Saenger’s excellent book (Saenger, 1984). High-resolution structures are also emerging from NMR spectroscopy (Ulyanov and James, 1995; also see UNIT 7.2). A more recent book surveying nucleic acid structure and interactions as well as NMR and crystallography studies is Bioorganic Chemistry: Nucleic Acids (Hecht, 1996).

Many of the experimentally derived nucleic acid structures are freely available through either the Protein Data Bank (PDB; see Internet Resources; Abola et al., 1987) or the Nucleic Acid Database (NDB; see Internet Resources; Berman et al., 1992), both of which contain the coordinates for a variety of nucleic acid structures and protein-nucleic acid complexes derived from crystallography or NMR experiments. The NDB may be a more appropriate place to start, as (1) it has been specifically tailored to assemble and distribute structural information about nucleic acids, (2) it can be searched, and (3) it provides coordinates (in multiple formats) as well as information about the crystal parameters, packing, and experimental conditions. From both of these sources, coordinate files in the commonly used PDB format can be obtained.
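Because PDB files use fixed column positions rather than whitespace-separated fields, reading them reliably means slicing each ATOM or HETATM record by column. The Python sketch below follows the column layout of the PDB coordinate format (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46, and 47-54). `parse_pdb_atoms`, `format_atom`, and the toy records are hypothetical, and the sketch ignores alternate locations, insertion codes, and the format variants mentioned in this unit.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(lines):
    """Read ATOM/HETATM records by fixed column position (not by whitespace)."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip REMARK, TER, END, and other record types
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper that writes a minimal ATOM record (no occupancy/B)."""
    padded = name if len(name) == 4 else f" {name:<3}"
    return (f"ATOM  {serial:>5} {padded} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


records = [
    "REMARK   toy example",
    format_atom(1, "P", "DG", "A", 1, -0.538, 9.005, 2.089),
    format_atom(2, "C1'", "DG", "A", 1, 1.0, 2.0, 3.0),
    format_atom(3, "H1'", "DC", "A", 2, 0.5, 1.5, 2.5),
    "END",
]
atoms = parse_pdb_atoms(records)

# Flag residues with no hydrogens (heuristic: name starts with H after any digits).
has_h = {}
for atom in atoms:
    key = (atom.chain, atom.res_seq, atom.res_name)
    is_h = atom.name.lstrip("0123456789").startswith("H")
    has_h[key] = has_h.get(key, False) or is_h
missing_h = [key for key, flag in has_h.items() if not flag]
```

For real conversion between formats, a maintained tool such as babel (or its successor, Open Babel) is a better choice than hand-written parsing; the point here is only that column position, not spacing, defines each field.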

Figure 7.5.2 Canonical structures of DNA shown as stereo views. Shown are canonical models of A-DNA and B-DNA of d[CCAACGTTGG]₂ (Arnott and Hukins, 1972) and a 10-mer extended model of the Wang Z-DNA structure of d[CGCGCGCGCG]₂ (Wang et al., 1979). Stereo views are common in the literature; these are wall-eyed stereo views as opposed to cross-eyed. Although some people can view these directly, most people resort to one of a variety of hand-held stereo viewers, such as those based on mirrors or, better, ones that use focusing lenses. The model of Z-DNA was built by overlaying two 6-mers at the joining region, using the root-mean-square (RMS) best fit of the overlapping CpG steps, and the A- and B-DNA models were built using the NUCGEN module of AMBER 4.1 (Pearlman et al., 1995). The A-DNA and B-DNA models were all-atom RMS best fit to a common reference frame, and the view is into the major groove on top and the minor groove on the bottom.
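The legend above refers to all-atom RMS best fitting. The plain-Python sketch below computes the RMS deviation after optimal rigid-body superposition using Horn’s closed-form quaternion approach: after both coordinate sets are centered, the largest eigenvalue of a 4 × 4 symmetric matrix built from their cross-covariance gives the best achievable overlap. Finding that eigenvalue by shifted power iteration is a simplification chosen to keep the example dependency-free, not the procedure used by AMBER; `best_fit_rmsd` and the toy coordinates are hypothetical.

```python
import math


def _centered(coords):
    """Translate (x, y, z) tuples so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def best_fit_rmsd(a, b, iterations=500):
    """RMS deviation between matched atom lists a and b after optimal superposition.

    Only proper rotations are considered, so a structure and its mirror image
    do not superimpose.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _centered(a), _centered(b)
    # Cross-covariance sums s[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes every eigenvalue non-negative, so power
    # iteration converges toward the eigenvector of the largest eigenvalue.
    shift = math.sqrt(sum(e * e for row in n_mat for e in row))
    m = [[n_mat[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    v = [1.0, 0.3, 0.2, 0.1]
    for _ in range(iterations):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
        v = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    v = [x / norm for x in v]
    # Rayleigh quotient of the unshifted matrix estimates its largest eigenvalue.
    lam = sum(v[i] * n_mat[i][j] * v[j] for i in range(4) for j in range(4))
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.7), (1.0, 1.0, 1.0)]
# Same structure rotated 90 degrees about z and translated.
moved = [(-y + 5.0, x - 3.0, z + 2.0) for x, y, z in coords]
# Mirror image (z -> -z), which no proper rotation can superimpose.
mirror = [(x, y, -z) for x, y, z in coords]
rmsd_moved = best_fit_rmsd(coords, moved)
rmsd_mirror = best_fit_rmsd(coords, mirror)
```

Power iteration converges slowly when the two largest eigenvalues are nearly equal, so a library eigensolver (or the equivalent SVD-based Kabsch procedure) is the safer choice outside of a teaching example.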

If an experimental structure is not available, it may still be possible to generate a reasonable model structure. A tool (or more accurately, a language for molecular manipulation) that can help develop such an initial model is Nucleic Acid Builder (NAB), developed by Tom Macke and Dave Case (Macke and Case, 1998). The NAB molecular manipulation language allows specification of rigid body translations, specification of restraints, distance geometry methods, and various other tools to aid in the generation of arbitrary structures. It has been used to generate model structures of synthetic Holliday junctions, protein-DNA complexes, RNA …, supercoiled DNA, and other structures (Macke and Case, 1998). If the model shares properties with other known structures, such as common secondary structure elements or sequence, it may be possible to model by homology to the known structures or, alternatively, to build up the structure from a library of smaller pieces of known structure. This approach has been used to model RNA tertiary structure (Major et al., 1991) and the structure of DNA single strands (Erie et al., 1993).

Recent surveys of crystal structures in the Cambridge Structural Database (which contains a variety of high-resolution structures of mononucleosides and mononucleotides; Allen et al., 1979) and the NDB (Berman et al., 1992) provide a set of parameters that can serve as the beginnings of a dictionary for standard nucleic acid geometry. These surveys investigate the geometry of the bases (Clowney et al., 1996) and of the sugar and phosphate backbone (Gelbin et al., 1996; Schneider et al., 1997). Additionally, recent surveys have investigated the specific hydration of nucleic acids and their interaction with metal ions (Schneider et al., 1993; Schneider and Kabelac, 1998). High-level theoretical techniques can also give useful information. Ab initio quantum-mechanical calculations with a reasonable basis set (6-31G* or better) and some inclusion of electron correlation can accurately represent geometry and polarization effects, and therefore properly represent nucleic acid interactions with various ions, metals, or nucleic acid bases. Monte Carlo and molecular dynamics simulations can also be used to obtain specific insight into association and hydration.

Completing the Initial Model

Often the experimentally determined structures obtained from the PDB or NDB lack explicit hydrogen atoms. Additionally, the nomenclature used is invariably different from that of the given modeling program, and the user has to perform various contortions to coerce the file into the expected naming and numbering conventions. Therefore, it is fairly common to have to modify a PDB file to conform to the particular program’s pedantic conventions and, additionally, to somehow add hydrogen atoms to the structure. Almost all modeling programs are equipped with some facility for adding missing atoms, particularly hydrogens. For more advanced treatments, solvent and counterions can also be added (discussed in UNIT 7.9).

It is always a good idea to check the initial structure carefully to determine whether the conformation and nomenclature are as expected and whether the hydrogen atoms were added with the correct stereochemistry. It would be very disappointing to discover, after spending weeks running nanosecond-length molecular dynamics simulations of solvated DNA, that one of the H1′ atoms on a particular residue was inadvertently added with the wrong stereochemistry, leading to an α-glycosyl linkage rather than the expected β linkage. It is likewise critical to check the stereochemistry of the structure after manipulations to the molecular model are made. Under some conditions, such as when using distance geometry methods or when performing stringent minimization with large restraints, the structure can be distorted and the stereochemistry altered.

Although not all modeling programs adhere to IUPAC naming conventions (JCBN, 1983; see APPENDIX 1C), these conventions are a good reference for checking the naming, orientation, and placement of the various atoms. Additionally, there are a variety of tools for characterizing nucleic acid structure, which are discussed in the next section. However, these tools do not necessarily check stereochemistry or enforce IUPAC naming conventions, and they may depend on the use of correct hydrogen naming conventions.

Although the PDB format is a common and well-defined standard for three-dimensional atomic coordinates, not all programs understand the standard PDB format; instead, they rely on some subtle variant or expect another coordinate format entirely. To aid in converting between the large set of formats used by the various modeling tools, the program babel is very useful (see Internet Resources). Not only can it perform direct conversion among various coordinate file formats, it can also assign connectivity, bond orders, and hybridization when this information is not present.

is manipulated (for example, during MD simulation), it is useful to characterize the overall three-dimensional structure. In proteins, one is typically concerned only with the φ and ψ backbone angles and perhaps some of the side-chain χ angles. The overall structure is characterized by the particular secondary structure elements and folding class. In contrast, with nucleic acids, there are many angles of interest. These range from the backbone angles α, β, γ, ε, and ζ, to the puckering conformation of the furanose ring, to the χ angle representing the orientation of the base relative to the sugar (Saenger, 1984; see APPENDIX 1B). To characterize the conformation of the sugar moiety (the furanose ring), the Altona and Sundaralingam concept of pseudorotation is generally used (Altona and Sundaralingam, 1972). This defines two quantities. The sugar pucker amplitude represents how far the ring is from planar. The pseudorotation phase angle represents the correlated values of the individual torsions making up the ring. Various values of the pseudorotation phase angle, more commonly referred to as the sugar pucker, represent different puckerings out of the plane: endo, on the same side as the C5′ atom, or exo, on the opposite side. Methods for calculating these values are straightforward and are typically included in most modeling packages.

Figure 7.5.3 Pictorial definition of the helicoidal parameters.

In addition to characterizing the overall backbone structure, sugar pucker, and χ angle of a single polynucleotide strand, it is also desirable to characterize the commonly occurring duplex structures that result from complementary base pairing between strands. Helicoidal analysis is typically applied at three levels. It can characterize global properties of the duplex (such as the helical repeat or overall helical twist), properties between adjacent base pairs (such as the rise), or properties within individual base pairs (such as the propeller twist). These properties represent the extent of rotation or translation of the bases or base pairs with respect to a common reference frame, typically the helical axis. The nomenclature and definitions were standardized at an EMBO workshop on DNA curvature and bending (Dickerson et al., 1989). See Figure 7.5.3 for a graphical description of these values. Despite the standard nomenclature and definitions, the precise details of the mathematics were not standardized. Therefore, the various programs commonly used to analyze helicoidal structure each differ in how they define the helical axis, reference frame, and pivot points. Commonly used programs include NEWHELIX by Richard Dickerson, Curves by Heinz Sklenar and Richard Lavery (Lavery and Sklenar, 1988), and programs by Marla Babcock and Wilma Olson (Babcock et al., 1994), among others. The most developed and consistent mathematical treatment of the helicoidal
parameters is likely either that of Babcock and Olson or that of Elhassan and Calladine, which is fully reversible (Elhassan and Calladine, 1995). The former uses symmetrical definitions on a uniform scale for the various rotations. It also defines pivot points or axes that minimize mathematically induced artifactual correlations between the various rotational and translational parameters. Despite the advantages of these programs, NEWHELIX and Curves are the most commonly used programs for calculating helicoidal parameters. These methods give qualitatively comparable results. However, care should be taken when quantitatively comparing helicoidal values calculated with different programs, because the results are sensitive to the definition of the reference frame. This is discussed in more detail in recent work by Lu and Olson (Lu and Olson, 1999; Lu et al., 1999).

A further distinction relates to global versus local helicoidal parameters. A local helical axis typically refers to the axis relating adjacent base pairs, whereas global helicoidal parameters are referenced to a best-fit global helical axis over the whole duplex. The global parameters typically lead to more regular values (and less individual variation). However, the global axis may not be sufficiently well determined for small duplexes (such as those with less than a full helical repeat) or for distorted duplexes (such as an RNA duplex with a bulge). In these cases it gives rise to misleading helicoidal parameters, so the global axis may not be appropriate. Moreover, the overall structure is determined by local interactions between adjacently stacked base pairs, so local helicoidal parameters may be more representative. When comparing helicoidal values calculated during modeling to those in the literature, make sure that consistent reference frames (local versus global) and consistent definitions of the values are applied. In addition to standard helicoidal analysis, groove structure is also commonly investigated, such as the relative width and depth of the minor or major groove (see, for example, Stofer and Lavery, 1994).

Helicoidal analysis and calculation of the various backbone angles can also be applied in two ways. They can be applied to the individual coordinate snapshots (for similar conformations). Alternatively, they can be applied to a representative coordinate-averaged structure generated during modeling, such as from a molecular dynamics or Monte Carlo simulation. Average backbone angles calculated as the mean of the individual values for each coordinate snapshot are often close to the values determined from the average structure. This is not typically true for helicoidal parameters, which are very sensitive to the conformation (Cheatham and Kollman, 1997). Modelers should keep in mind that an average structure, such as that seen in crystallography or NMR experiments, hides the detailed dynamics. Moreover, coordinate-averaged conformations are not equivalent to torsion-averaged structures. The properties of an averaged structure do not necessarily match the mean of the properties computed from the individual coordinate sets. Therefore, care should be taken when comparing coordinates. The common means of comparing structures is the best-fit root-mean-squared deviation (RMSd) between the coordinates or torsion angles. This indicator is very useful for determining the degree of similarity between two structures (when the RMSd values are small). It does less well at representing dissimilarity, since small, localized differences in structure can lead to large RMSd values.

SUMMARY

This unit has introduced molecular modeling of nucleic acids at the simplest level. The modeling process can be described in three stages:

Generation. Create an initial model in one of two ways. Build it by hand based on the molecular connectivity, or obtain the coordinates from a repository of experimentally derived structures. In the absence of a complete experimental structure, base the model on a known (canonical) structure and/or use tools (e.g., Nucleic Acid Builder) to complete the model.

Evaluation. Is the structure valid? Judge this based on chemical/structural intuition and comparison with experimentally derived structures. The structure can be described in terms of the backbone angles, sugar pucker, glycosidic χ torsion, and helicoidal parameters. Additionally, it is important to check the stereochemistry and hydrogen placement.

Manipulation. Coordinate manipulations can be made by simple rotation around chemical bonds. Where possible, include some crude representation of the energy to avoid bad steric overlap and unrealistic rotations.

Other units delve more deeply into methods for evaluating and manipulating the models. They also cover representations of nucleic acids that go beyond the single static gas-phase structure. These include a discussion of how to properly represent long-range electrostatic interactions. They also discuss how to include some representation of the effect of the environment (solvent and ionic strength effects; see UNIT 7.9). With a more
realistic representation of the energy (UNIT 7.8), the energy can be used as a guide to suggest coordinate manipulations. Evaluation of the model depends on the reliability of the energy and on how the system is represented. It also depends on the chemical intuition of the modeler and on comparison to experimental data.

LITERATURE CITED

Abola, E.E., Bernstein, F.C., Bryant, S.H., Koetzle, T.F., and Weng, J. 1987. Protein Data Bank. In Crystallographic Databases—Information Content, Software Systems, Scientific Applications (F.H. Allen, G. Bergerhoff, and R. Sievers, eds.) pp. 107-132. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester.
Allen, F.H., Bellard, S., Brice, M.D., Cartwright, B.A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B.G., Kennard, O., Motherwell, W.D.S., Rodgers, J.R., and Watson, D.G. 1979. The Cambridge Crystallographic Data Centre: Computer-based search, retrieval, analysis and display of information. Acta Crystallogr. B35:2331-2339.
Altona, C. and Sundaralingam, M. 1972. Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation. J. Am. Chem. Soc. 94:8205-8212.
Arnott, S. and Hukins, D.W. 1972. Optimised parameters for A-DNA and B-DNA. Biochem. Biophys. Res. Commun. 47:1504-1509.
Babcock, M.S., Pednault, E.P., and Olson, W.K. 1994. Nucleic acid structure analysis. Mathematics for local Cartesian and helical structure parameters that are truly comparable between structures. J. Mol. Biol. 237:125-156.
Berman, H.M., Olson, W.K., Beveridge, D.L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.H., Srinivasan, A.R., and Schneider, B. 1992. The nucleic acid database—A comprehensive relational database of 3-dimensional structures of nucleic acids. Biophys. J. 63:751-759.
Brion, P. and Westhof, E. 1997. Hierarchy and dynamics of RNA folding. Annu. Rev. Biophys. Biomol. Struct. 26:113-137.
Cheatham, T.E. III and Kollman, P.A. 1997. Molecular dynamics simulations highlight the structural differences in DNA:DNA, RNA:RNA and DNA:RNA hybrid duplexes. J. Am. Chem. Soc. 119:4805-4825.
Clowney, L., Jain, S.C., Srinivasan, A.R., Westbrook, J., Olson, W.K., and Berman, H.M. 1996. Geometric parameters in nucleic acids: Nitrogenous bases. J. Am. Chem. Soc. 118:509-518.
Cruz-Neira, C., Sandin, D.J., DeFanti, T.A., Kenyon, R.V., and Hart, J.C. 1992. The CAVE: Audio visual experience automatic virtual environment. Commun. ACM 35:65-72.
Dickerson, R.E., Bansal, M., Calladine, C.R., Diekmann, S., Hunter, W., Kennard, O., von Kitzing, E., Lavery, R., Nelson, H.C.M., Olson, W.K., Saenger, W., Shakked, Z., Sklenar, H., Soumpasis, D.M., Tung, C.S., Wang, A.H., and Zhurkin, V.B. 1989. Definitions and nomenclature of nucleic acid structure components. Nucleic Acids Res. 17:1797-1803.
Elhassan, M.A. and Calladine, C.R. 1995. The assessment of the geometry of dinucleotide steps in double helical DNA; a new local calculation scheme. J. Mol. Biol. 251:648-664.
Erie, D.A., Breslauer, K.J., and Olson, W.K. 1993. A Monte Carlo method for generating structures of short single-stranded DNA sequences. Biopolymers 33:75-105.
Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W.K., and Berman, H.M. 1996. Geometric parameters in nucleic acids: Sugar and phosphate constituents. J. Am. Chem. Soc. 118:519-529.
Hecht, S. 1996. Bioorganic Chemistry: Nucleic Acids (S. Hecht, ed.) pp. 512. Oxford University Press, New York.
JCBN. 1983. IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN). Abbreviations and symbols for the description of conformations of polynucleotide chains. Recommendations 1982. Eur. J. Biochem. 131:9-15.
Lavery, R. and Sklenar, H. 1988. The definition of generalized helicoidal parameters and of axis curvature for irregular nucleic acids. J. Biomol. Struct. Dyn. 6:63-91.
Leclerc, F., Srinivasan, J., and Cedergren, R. 1997. Predicting RNA structures: The model of the RNA element binding Rev meets the NMR structure. Folding Des. 2:141-147.
Lu, X.-J. and Olson, W.K. 1999. Resolving the discrepancies among nucleic acid conformational analyses. J. Mol. Biol. 285:1563-1575.
Lu, X.-J., Babcock, M.S., and Olson, W.K. 1999. Overview of nucleic acid analysis programs. J. Biomol. Struct. Dyn. 16:833-843.
Macke, T. and Case, D.A. 1998. Modeling unusual nucleic acid structures. In Molecular Modeling of Nucleic Acids (N.B. Leontis and J. SantaLucia, eds.) pp. 379-393. ACS, Washington, D.C.
Major, F., Turcotte, M., Gautheret, D., LaPalme, G., Fillion, E., and Cedergren, R. 1991. The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science 253:1255-1260.
Nielsen, P.E., Egholm, M., Berg, R.H., and Buchardt, O. 1991. Sequence-selective recognition of DNA by strand displacement with a thymine-substituted polyamide. Science 254:1497-1500.
Olson, W.K. 1996. Simulating DNA at low resolution. Curr. Opin. Struct. Biol. 6:242-256.
Pearlman, D.A., Case, D.A., Caldwell, J.W., Ross, W.S., Cheatham, T.E., Debolt, S., Ferguson, D., Seibel, G., and Kollman, P. 1995. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structure and energetic properties of molecules. Comput. Phys. Commun. 91:1-41.
Saenger, W. 1984. Principles of Nucleic Acid Structure. Springer Advanced Texts in Chemistry (C.E. Cantor, ed.). Springer-Verlag, New York.
Schlick, T. 1995. Modeling superhelical DNA: Recent analytical and dynamical approaches. Curr. Opin. Struct. Biol. 5:245-252.
Schneider, B. and Kabelac, M. 1998. Stereochemistry of binding of metal cations and water to a phosphate group. J. Am. Chem. Soc. 120:161-165.
Schneider, B., Cohen, D.M., Schleifer, L., Srinivasan, A.R., Olson, W.K., and Berman, H.M. 1993. A systematic method for studying the spatial distribution of water molecules around nucleic acid bases. Biophys. J. 65:2291-2303.
Schneider, B., Neidle, S., and Berman, H.M. 1997. Conformations of the sugar-phosphate backbone in helical DNA crystal structures. Biopolymers 42:113-124.
Stofer, E. and Lavery, R. 1994. Measuring the geometry of DNA grooves. Biopolymers 34:337-346.
Surles, M.C., Richardson, J.S., Richardson, D.C., and Brooks, F.P. 1994. Sculpting proteins interactively—Continual energy minimization embedded in a graphical modeling system. Protein Sci. 3:198-210.
Ulyanov, N.B. and James, T.L. 1995. Statistical analysis of DNA duplex structural features. Methods Enzymol. 261:90-120.
Vologodskii, A.V. and Cozzarelli, N.R. 1994. Conformational and thermodynamic properties of supercoiled DNA. Annu. Rev. Biophys. Biomol. Struct. 23:609-643.
Wang, A.H., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G., and Rich, A. 1979. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature 283:743-745.

INTERNET RESOURCES

Simulation codes

http://www.amber.ucsf.edu/amber
The home page for the AMBER suite of programs for molecular mechanics and dynamics. See also the subpage http://www.amber.ucsf.edu/amber/polyA-polyT/ for a tutorial. It describes in detail how to set up, equilibrate, and run molecular dynamics simulations using AMBER on a small DNA duplex in solution.

http://igc.ethz.ch/gromos
The GROMOS molecular mechanics/dynamics software home page.

http://honiglab.cpmc.columbia.edu/grasp
The home page for the GRASP continuum electrostatics and molecular graphics display code developed by Anthony Nicholls.

http://www.lobos.nih.gov/Charmm
The CHARMM molecular mechanics/dynamics software home page at the National Institutes of Health. The root of this link discusses the LoBoS ("lots of boxes on shelves") parallel computer developed at the NIH for use in molecular simulation.

http://www.msi.com
The home page for Molecular Simulations, which distributed X-Plor and the commercial version of CHARMM.

http://www.intsim.com
The home page for the company Interactive Simulations, which develops the Sculpt software. This program allows real-time molecular modeling with continuous energy minimization as the model is manipulated.

http://www.ks.uiuc.edu/Research/namd
The home page for the NAMD molecular mechanics/dynamics simulation package developed by Klaus Schulten's group at the University of Illinois.

http://dasher.wustl.edu/tinker
The home page for the TINKER molecular mechanics/dynamics software. It includes an extensive list of WWW links to other MM/MD resources.

Model building and analysis tools, nucleic acid nomenclature

http://www.scripps.edu/case
The home page of Professor David Case at the Scripps Research Institute contains links to the NAB (Nucleic Acid Builder) software and manuals.

http://www.eyesopen.com/babel.html
The home page of the Molecular Structure Information Interchange Hub for the program babel. The program was developed in Professor Dan Dolata's group by Pat Walters and Matt Stahl. It is very useful for interconverting a variety of different molecular modeling program file formats.

http://www.chem.qmw.ac.uk/iupac
A repository of many of the IUPAC naming conventions. This site has a very nice Web page describing in detail the notation and naming conventions that apply to nucleic acids.

http://www.sphere.ad.jp/hgs
The site for the company that makes the Maruzen physical molecular models (HGS). For proteins and nucleic acids, the Maruzen Biochemistry Molecular Models are of particular interest.

Coordinate repositories and information resources

http://www.rcsb.org/pdb
The Protein Data Bank server at the Research Collaboratory for Structural Bioinformatics (Rutgers, SDSC, NIST).

http://ndbserver.rutgers.edu
The Nucleic Acid Database server maintained by Helen Berman and others at Rutgers University.

http://www.ccl.net/chemistry
The computational chemistry list archives. This contains information about a number of modeling programs, conference listings, and job postings.

http://cmm.info.nih.gov/intro_simulation/course_for_html.html
This page, sponsored by the Center for Molecular Modeling at the NIH, provides a nice introduction to macromolecular simulation.

Contributed by Thomas E. Cheatham, III and Bernard R. Brooks
National Heart, Lung and Blood Institute, NIH
Bethesda, Maryland

Peter A. Kollman
University of California
San Francisco, California

Current Protocols in Nucleic Acid Chemistry