University of Groningen

NMR chemical shift data and ab initio shielding calculations Mulder, Frans A. A.; Filatov, Michael

Published in: Chemical Society Reviews

DOI: 10.1039/b811366c

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record

Publication date: 2010

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA): Mulder, F. A. A., & Filatov, M. (2010). NMR chemical shift data and ab initio shielding calculations: emerging tools for structure determination. Chemical Society Reviews, 39(2), 578-590. https://doi.org/10.1039/b811366c

Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Download date: 12-11-2019 TUTORIAL REVIEW www.rsc.org/csr | Chemical Society Reviews

NMR chemical shift data and ab initio shielding calculations: emerging tools for protein structure determination

Frans A. A. Muldera and Michael Filatovb

Received 29th June 2009 First published as an Advance Article on the web 4th November 2009 DOI: 10.1039/b811366c

In this tutorial review, we discuss the utilization of chemical shift information as well as ab initio calculations of nuclear shieldings for protein structure determination. Both the empirical and computational aspects of the chemical shift are reviewed and the role of molecular dynamics and the accuracy of different computational methods are discussed. It is anticipated that incorporating theoretical information on chemical shifts will increase the accuracy of protein structures, in the solid and liquid state alike, and extend the applicability of NMR spectroscopy to ever larger systems.

I. Introduction natural milieu. The continued success of NMR spectroscopy is largely due to the constant development of NMR spectroscopy Over the past decades nuclear magnetic resonance (NMR) as a technique. Important advances include improvements spectroscopy has established itself as a powerful technique to in instrumentation hardware, pulse sequence development, determine the three-dimensional (3D) structures of biological access to residual anisotropic interactions, (bio)synthetic macromolecules at atomic resolution, and a valuable comple- isotope enrichment schemes, and in vivo, in-cell detection. It ment to X-ray . Notably, it is the only method is also evident that solid-state NMR spectroscopy is making that can accurately define atomic structures in solution. headway, extending the applicability of NMR spectroscopy to To date (June 2009) over 5000 NMR structures of study protein atomic structure in fibers, crystals, glasses (here taken to contain 50 or more residues) have been deposited and amorphous formulations. For the immediate future, the at the (PDB), whereas the first deposited inclusion of chemical shift information promises to become a structures date back only to 1989. With few exceptions these very powerful addition to structure determination, making use structures have been determined in solution, similar to their of semiempirical relationships as well as ab initio calculations. Recent progress in quantum chemical methods makes it a Groningen Biomolecular Sciences and Biotechnology Institute, possible to derive accurate relationships between calculated University of Groningen, Nijenborgh 4, 9747 AG Groningen, shielding parameters and protein 3D conformation and The Netherlands b Zernike Institute for Advanced Materials, University of Groningen, increases significantly the range of proteins that can be Nijenborgh 4, 9747 AG Groningen, The Netherlands characterized structurally with NMR spectroscopy. Although

Frans A. A. Mulder (MSc and M. Filatov (MSc University PhD cum laude from Utrecht of Novosibirsk (Russia), University) was a post-doctoral PhD Institute of Catalysis fellow with Lewis E. Kay at (Novosibirsk, Russia)) was a the University of Toronto, and postdoctoral fellow with Mikael Akke at Lund Univer- Walter Thiel at Zu¨rich Uni- sity. In 2004 he moved to the versity and with Sason Shaik University of Groningen, and at the Hebrew University of in 2006 he was awarded a Jerusalem before working young investigator VIDI grant as an Assistant Professor at from the Netherlands Research the University of Go¨teborg, Organization. In 2005, he Sweden from 2000 till 2005. spent a sabbatical with H. Vogel Since 2005 he has been a at the University of Calgary. Professor of Theoretical Frans A. A. Mulder Current research interests in Michael Filatov Chemistry at the University his group include protein of Groningen (Netherlands). electrostatics, natively unfolded proteins, NMR pulse sequence His research interests focus on the development of accurate development, protein molecular dynamics, and NMR structure quantum chemical computational schemes within the domain of determination of large proteins using bio-synthetic isotope density functional theory and on modeling ground and excited labeling and computational methods. state electronic properties of organic, bio-inorganic and inorganic compounds.

578 | Chem.Soc.Rev., 2010, 39, 578–590 This journal is c The Royal Society of Chemistry 2010 chemical shieldings are very rich in information, owing to their The most easily accessible alternative probes of molecular tensorial character1 we focus primarily on isotropic chemical conformation in proteins are the chemical shifts of 13C and shifts, as these are more relevant for comparison with 15N nuclei. There are two major routes of using the informa- experimental data. tion from chemical shifts: (i) an empirical route based on the observation that similar structures produce similar chemical shifts, and (ii) a first principles modeling of chemical shifts in II. Experimental determination of specific protein secondary structures. three-dimensional protein structures by A. Empirical relationships between chemical shift and NMR spectroscopy structure All the parameters that can be measured by NMR spectro- Chemical shifts are readily obtained for large proteins, and are scopy are sensitive to molecular structure and dynamics and also among the more easily accessible data for proteins in the can be employed as restraints to construct models of the three- solid state. Therefore, by using this information it should be dimensional structures of proteins. The structures of small possible to further expand the reach of NMR spectroscopy in proteins (up to B100 amino acids) have traditionally been structural biology. However, since many factors potentially 1 derived from two-dimensional, homonuclear H NMR contribute to protein chemical shifts it is important to spectroscopy, and were primarily based on extensive lists of establish their relative contributions, and assess their pairwise distances derived from nuclear Overhauser effects independence.3 As an important first step, one needs to define (NOEs), and from dihedral angle restraints derived from a reference state from which all factors are to be counted. In 2 scalar spin–spin couplings (J). The obvious advantage of this practice it is common to define ‘‘random coil’’ chemical shifts approach is that protein material can be isolated from a for short peptides, which are assumed to be disordered, and natural source, and, after purification, subjected to investi- devoid of persistent structure. Random coil chemical shifts gation. However, even a small protein of 100 residues will rely on the specific identity of the side chain, and conforma- contain approximately 800 protons, hence the NMR spectra tional sampling, and have proven very valuable when investi- are highly congested. With increasing size, the number of gating the effects of amino acid sequence on chemical shifts.4 signals that will appear in the same spectral region will become In a second, alternative, approach the chemical shifts of a overwhelming, and this overlap problem becomes prohibitive peptide fragment are obtained from so-called ‘‘coil libraries’’.5 for spectral assignment and structure determination. Multi- In this case, proteins of known structure are used to filter dimensional spectroscopy mitigates this problem somewhat, chemical shift data belonging to regions of regular secondary but many restraints remain ambiguous. structure (a-helices, b-sheets and turns) from those of ‘‘coil’’. This problem can be overcome by making use of additional The resultant library is assumed to represent a generic 13 15 spins in proteins, such as C and N. However, because reference state for disordered peptides, and the formation of of the low natural abundance of the favorable spin-1/2 structure will induce chemical shift changes through, for 13 15 isotopes—1.1% for C and 0.36% for N—this approach example, changes in backbone dihedral angles, hydrogen only became feasible through the use of heterologous protein bonds and ring current shifts of nearby aromatic residues. In 13 expression and isotope enrichment. The large spread in C a third approach, a random coil chemical shift data set can be 15 1 15 and N chemical shifts, coupled with the use of [ H– N] and determined under strongly denaturing conditions. In an 1 13 [ H– C] correlation spectroscopy, largely overcame the over- extensive investigation of this kind, Schwarzinger et al.6 lap problem. The widespread practice of uniform isotopic investigated the effects of neighboring amino acids in the enrichment since the 1990s has pushed the molecular weight peptides Ac-GGXGG-NH2, where X equals any one of the limit for routine structure determination towards larger twenty naturally occurring amino acid types, and used this to systems (up to B250 amino acids), and provided additional compare with the protein apo myoglobin, under identical 13 13 1 13 probes such as C– C and H– C coupling constants, as well environmental circumstances. as access to protein dynamics from 15Nand13C spin relaxation. Nonetheless, many proteins and complexes of biological 1H chemical shifts. The sensitivity of proton chemical shifts interest are significantly larger than 30 kDa, and still remain to structural and conformational effects in proteins have long out of reach. This is because the local magnetic fields from been known, but it is still difficult to accurately decipher magnetic dipoles and anisotropic chemical shielding become the multiple contributing factors. These primarily include increasingly effective in damping the NMR signal for larger sensitivity to conformation, hydrogen bonding, electric fields, molecules; the more rapid decrease of signal leads to broader and ring currents. Their relative contributions also vary for lines, which reduces the resolution and sensitivity of the NMR different chemical groups. For example, several studies have experiment. A way around this issue has been to eliminate or demonstrated that clear correlations exist between the 1Ha dilute the abundant and strongly polarized proton spins by chemical shift and secondary structure, with upfield shift in protein deuteration, effectively removing the strongest relaxation a-helical conformations of about 0.3 ppm, and downfield source. While a high degree of deuteration is beneficial to the shifts for b-sheets of approximately 0.5 ppm.5,7 A relationship spectral quality, the information for defining protein structure between amide proton shifts and backbone conformation also is now largely lost, as most aliphatic and aromatic protons exists, contributing to the observed downfield shifts for have been removed. There is therefore clear need to use b-sheets, but having an almost negligible influence on the alternative sources of information on 3D protein structure. shifts for a-helices. In addition, the distributions of the

This journal is c The Royal Society of Chemistry 2010 Chem.Soc.Rev., 2010, 39, 578–590 | 579 secondary chemical shifts for amide protons are larger than Although these conformational effects have hitherto been those observed for 1Ha,8 indicating that their correlation with largely neglected, two recent studies demonstrate that the secondary structure is weaker. A second contribution arises effects can be sizable,9,10 and may in part explain the remaining from hydrogen bonds, and most strongly influences 1HN discrepancy between observed and calculated chemical shifts.3 chemical shifts. Hydrogen bonds often are responsible for Of note, in contrast to the other backbone shifts, 13Ca shieldings the most strongly shifted signals, and display a sensitive are not sensitive to this effect, as the intervening dihedral angle dependence on acceptor–donor distance. is the near-planar peptide bond, which is always trans with A third, and sometimes large, contribution arises from the the rare exception of cis Xxx-Pro (where Xxx denotes any magnetic susceptibility anisotropies due to aromatic groups, amino acid). and, to a lesser extent, peptide groups. The ring currents of Since carbon chemical shifts are so dominantly dependent nearby aromatic groups most significantly affect 1Ha chemical on backbone geometry, they are the most accurate predictors shifts, thereby blurring their relationship with conformation. of secondary structure.3 Based on this premise, algorithms Taking these effects into account prior to chemical shift have been developed to score the presence of secondary analyses further explicates the structural correlation of 1H structure from chemical shift data alone. The most well-known chemical shifts. However, calculating ring current contri- program is likely the Chemical Shift Index (CSI), due to butions is challenging, as they are very sensitive to the precise Wishart and Sykes,11 which originally was based on scoring structure, which may not be available with sufficient accuracy, 1Ha secondary chemical shifts, and later included 13Ca, 13Cb and subject to dynamics. and 13C0. This approach was subsequently further refined. For example, Wang and Jardetzky8 used a joint probability 13 13 a 13 b C chemical shifts. Backbone C and side chain C calculation, including 1HN, 1Ha, 13Ca, 13Cb, 13C0, and 15N carbon chemical shifts are first of all sensitive to amino acid chemical shifts. They found that the reliability to discern side chain identity, due to the presence of electron-withdrawing a-helix from coil decreased in the order 13Ca 4 13C0 4 substituents or charges. When these contributions are 1Ha 4 13Cb 4 15N 4 1HN, whereas this order is 1Ha 4 accounted for by subtracting random coil values for these 13Cb 4 1HN B 15N B 13Ca B 13C0 to detect b-sheet structures. 13 amino acids, the remaining secondary chemical shifts, Dd( C), These results indicate that 1HN and 15N chemical shifts can display a high sensitivity to backbone f/c torsion angles also be useful indicators. Further improvement can be 13 a (see Scheme 1). Excluding glycines, C chemical shifts are obtained by prior correction for neighboring effects on the shifted downfield by 2.2 (Ala) to 4.5 ppm (Thr) in helices, and random coil chemical shifts.5–7 upfield by 0.4 (Thr) to 1.8 (Arg) ppm in b-sheets.5 Since the standard deviations for the secondary chemical shifts are only 15N chemical shifts. Peptide amide nitrogen chemical shifts about 1.3 ppm this explains their power for the detection of display a large variation, covering about 30 ppm. For example, 13 b secondary structure. C secondary chemical shifts are some- the random coil chemical shifts of Gly are around 110 ppm, what smaller, and of opposite sign, such that the secondary and well-separated from these of other residue types. 15N 13 a 13 b chemical shift difference Dab = Dd( C ) Dd( C ) is an even chemical shifts for individual residue types have standard 13 0 better predictor. Finally, backbone carbonyl C chemical deviations of approximately 4 ppm. Much of the variation is shifts are sensitive to the amino acid side chains on both sides due to the large influence of the preceding amino acid side of the peptide plane, but the correction factors are small, with chain; the 15N chemical shift will be about 4 ppm higher for an the exception of Pro (up to 2.5 ppm, depending on amino acid preceded by the b-branched residues Val, Thr or conformation) and aromatic amino acids (up to 0.5 ppm) Ile as compared with Ala or Gly. After taking these neighboring 13 0 at position +1 in the sequence. The C chemical shift is also effects into account there remains less than one ppm variation quite sensitive to secondary structure, showing sizable (3 ppm) unexplained for flexible peptides, such that the remaining differences between a-helix and b-sheet. variations in folded proteins must be explained by other A further contribution may arise from conformational contributions. Besides occasional ring current effects preferences of substituents that are three bonds apart: a other significant variations result from hydrogen bonding, carbon nucleus in the gauche position about a subtending deviations from peptide bond planarity, and the presence of dihedral angle experiences increased shielding by up to 5 ppm, nearby charges. relative to the same group in the trans position. Depending on the probe nucleus considered, these gamma-gauche effects Semiempirical chemical shift calculations. We consider here depend on backbone, and/or side chain rotameric states. two recent, accurate programs to calculate chemical shifts from known 3D structures, which are freely distributed. These are SHIFTX by Neal, Wishart and co-workers,12 and SPARTA by Shen and Bax.7 The performance of these programs is summarized in Table 1. The program SHIFTX was developed to calculate 1H, 13C and 15N chemical shifts from atomic coordinates. It makes use of chemical shift hypersurfaces, derived from chemical shift and structural data, combined with semi-classical equations that relate the chemical shifts to ring currents, hydrogen Scheme 1 Backbone dihedral angles in peptide chain. bonding, solvent effects and electric field contributions. Many

580 | Chem.Soc.Rev., 2010, 39, 578–590 This journal is c The Royal Society of Chemistry 2010 Table 1 Correlation coefficient (r) between predicted and experimental closest agreement between prediction and experiment. As can chemical shifts and rms error for the semi-empirical programs be seen from Table 1, the improvements are small, but as Shen SHIFTX12 and SPARTA7 and Bax argue,7 in molecular fragment replacement methods 1Ha 1HN 15N 13Ca 13Cb 13C0 the search for matching peptide fragments on the basis of multiple chemical shifts strongly reduces the search through SHIFTXa Correlation (r) 0.91 0.74 0.91 0.98 1.00 0.86 the combined effect of a large number of independent, small Rms error (ppm) 0.23 0.49 2.43 0.98 1.10 1.16 improvements. To demonstrate the facility of the method, a SPARTA plot of the correlations of predicted versus observed chemical Correlation (r) 0.88 0.72 0.89 0.98 0.99 0.86 Rms error (ppm) 0.27 0.51 2.52 0.98 1.07 1.09 shifts is shown in Fig. 1. Similar results were obtained using SHIFTX.12 a An analysis on the same set of proteins by both procedures performed by Shen and Bax7 indicates that the SHIFTX performance on that data set deviates slightly from the numbers in the table. In that B. Using chemical shifts to aid protein structure determination comparison, the rms errors using SHIFTX were 0.29, 0.54, 2.87, 1.12, 1.25 and 1.28 ppm for 1Ha, 1HN, 15N, 13Ca, 13Cb and 13C0, respectively. With the advent of increased understanding of the various contributions to the chemical shift and as their quantitative prediction emerges, there is hope that they will be intelligible of the necessary parametrizations used in the calculation were enough to enable de novo structure prediction. One early derived from the extensive body of literature on these effects, approach in this direction is the TALOS program,13 which is in addition to a large body of three-dimensional protein based on the notion that similar structures are expected to structures and chemical shift data. The program is able to yield similar chemical shifts. The TALOS database searching obtain very good correlations between calculated and program contains extensive chemical shift assignments and observed chemical shifts. high-resolution structures for several proteins, and can be used The program SPARTA uses a database searching method, to search for contiguous segments of three amino acid residues which utilizes both structural homology and protein sequence which most closely agree with a segment of known structure. to predict the 1HN, 1Ha, 13Ca, 13Cb, 13C0, and 15N chemical The output of the program can subsequently be used to shifts, considering backbone f/c torsion angles, side chain w1, restrain the backbone f/c torsion angles in the process of hydrogen bonding, and ring current shifts. The predictive structure calculation. power of the method was optimized by iterative adjustment Recently two approaches were developed for 3D protein of weighting factors for the various contributions to yield the structure determination from chemical shift data alone that

Fig. 1 Scatter plots comparing experimental and SPARTA-predicted secondary chemical shifts for backbone 15N, 1HN, 1Ha, 13Ca, 13Cb and 13C0 nuclei. The RMS deviations (in ppm) and Pearson correlation coefficients (R) between experimental and SPARTA-predicted shifts are indicated. Reprinted with kind permission from Springer Science + Business Media: J. Biomol. NMR,7 copyright (2007).

This journal is c The Royal Society of Chemistry 2010 Chem.Soc.Rev., 2010, 39, 578–590 | 581 agree rather closely with the target structures.14,15 Here, the diamagnetic current is proportional to the magnitude of the

TALOS strategy for finding fragments is extended to applied external field, this results in a local magnetic field Bloc contiguous segments, which are subsequently combined to at the nucleus A given by build a complete 3D protein model. In the CHESHIRE B =(1 sd,A)B (1) algorithm15 secondary chemical shift information is first used loc 0 to determine secondary structure and backbone f/c torsion where sd,A is a small (usually in the range of 106) diamagnetic angles, from which low-resolution structures without side shielding constant. chains beyond the Cb atom are assembled through simulated In molecules, the circulation of electronic currents around annealing. In the next step all atoms are represented explicitly, the target nucleus is hindered due to the presence of other and refined by allowing movement about backbone and side nuclei and electrons revolving about them and this deviation chain torsion angles in an energy refinement protocol, from the spherical symmetry leads to emergence of an augmented with information about side chain rotamer statistics. additional contribution to the total nuclear shielding, Using this approach, 11 proteins of 46–123 residues were stot,A = sd,A + sp,A (2) determined to 2 A˚ or better rmsd for the backbone, whereas the agreement for side chains was poorer. which is known as the paramagnetic term, sp,A.17 Because the The program CS-ROSETTA was assembled by Shen, Bax, paramagnetic contribution opposes the diamagnetic shielding Baker and co-workers14 using similar concepts. The ROSETTA (in the cases of interest in this review), this results in deshielding machinery for structure prediction has been supplemented (an increase in the resonance frequency) of the target nucleus with the empirical relationships derived from SPARTA.7 For as compared to an isolated atom. The paramagnetic term 16 proteins, ranging in size from 56 to 129 residues, full-atom involves the mixing between ground and excited states of the models were obtained that are within 0.7–1.8 A˚ backbone molecule due to the magnetic field, and it is rather sensitive to rmsd of the experimentally determined X-ray or NMR structures. the molecular electronic structure. Here, as with the CHESHIRE protocol, the current lack of In practice, nuclear shielding constants are not measured accurate relationships between the chemical shifts of side chain relative to a bare nucleus, as eqn (1) implies, but with respect nuclei and amino acid conformation in the models results in to the target nucleus embedded in a reference compound.18 limited agreement for the side chains. Since quantum chemical Thus, the chemical shift dA is now defined as calculations can yield this information, they can provide the n n g B ðsref sAÞ missing insight into the relationship between the observed A 6 A ref 6 A 0;z d ¼ 10 ¼ 10 ref shifts and protein structure. Recent methodological advances nref gAB0;zð1 s Þ ð3Þ enable one to accurately predict chemical shifts in large 106ðsref sAÞ protein fragments from first principles, with the inclusion of important environmental and conformational factors that where it is used that the magnitude of nuclear shielding is of influence the chemical shifts.16 the order of 106–104 for nuclei of light elements such as 1H or 13C. The so-defined chemical shift is expressed in parts per III. Theoretical methods of NMR chemical shift million (ppm) units. calculation The values of absolute shielding constants eqn (2) can be obtained from the measured chemical shifts provided that tot,A The resonance frequency nA = gAB0,z/2p of a given magnetic s is known for at least one compound containing the p,A nucleus A in external magnetic field B0 is determined by its target nucleus. The paramagnetic contribution s is related magnetic moment ~mA = gAIA, which is related to the nuclear to the so-called spin-rotation constants which can be inde- 19 spin IA and the magnetogyric ratio gA. Here and below we pendently measured from molecular beam experiments. The p,A define that the magnetic field B0 as aligned with the z-axis and so-obtained s values are combined with the diamagnetic employ the atomic system of units in which the Planck terms sd,A obtained from ab initio calculations. This technique constant h = h/2p, Bohr radius a0, electron charge e and enables one to convert the entire set of experimental chemical electron mass me are set to unity. For practically affordable shifts for a given nucleus into the absolute scale of shieldings. magnitudes of the magnetic field B0, the frequency nA lies Currently, the experimental values of absolute shieldings are within the radio frequency domain. Because different magnetic known with high accuracy for many nuclei, e.g. for 1H 13 19 nuclei have different magnetogyric ratios (e.g., g1H ¼ and C. 6 1 1 6 1 1 267:513 10 rad s T , g13 ¼ 67:261 10 rad s T , C A. Theory of chemical shielding tensor: diamagnetic and etc.), they resonate at different frequencies. paramagnetic contributions This simple picture however is valid for a bare nucleus and, in atoms or molecules, it should be corrected for the presence A detailed theory of chemical shielding in molecules can be of atomic or molecular electrons. Application of an external derived from the energy expression for a molecule containing magnetic field induces electronic currents which, in turn, a magnetic nucleus placed in a magnetic field.17 When a generate an additional local magnetic field. In a spherically magnetic field of strength B0 is applied to a closed-shell symmetric atom, these currents are purely diamagnetic in molecule containing a magnetic nucleus A with magnetic origin and the resulting local magnetic field opposes the moment ~mA its ground state energy changes due to the inter- external field thus leading to shielding of the atomic nucleus action of the induced electronic currents with the nuclear (lowering of the resonance frequency). Because the induced magnetic moment and with the applied field. For weak static

582 | Chem.Soc.Rev., 2010, 39, 578–590 This journal is c The Royal Society of Chemistry 2010 perturbations, such as B0 and ~mA, the molecular electronic In practical applications, eqn (8) and (9) are often replaced energy E(~mA, B0) can be expanded in a Taylor series around with a computationally more convenient expression for the the unperturbed energy value. In this expansion, the term shielding tensor20 bilinear in B and ~m will be exactly equal to the change in the 0 A X 2 X nuclear Zeeman energy ~m (1 sA) resulting from inter- A @ hmn @Dmn @hmn A B0 s ¼ Dmn þ ð10Þ ab ! @ B ! action with the surrounding electrons. Therefore the Taylor m;n @ðB0Þb@ðmAÞa m;n ð 0Þb @ðmAÞa coefficient given by a second derivative of the molecular which is derived from eqn (4) with the help of the interchange electronic energy E(~mA, B0) with respect to the field strength and the nuclear magnetic moment can be identified with the theorem of perturbation theory. In eqn (10), hmn are the nuclear shielding tensor. elements of the one-electron Hamiltonian matrix and Dmn are the elements of the density matrix in the atomic orbitals @2E m! ; B (AO) representation. The first term in eqn (10) represents the A ð A 0Þ sab ¼ a; b ¼ x; y; z ð4Þ ! diamagnetic part of the shielding tensor and the second term @ma@Bb m n!0 B0!0 represents the paramagnetic part. For large molecules, sd and sp are rather large in absolute From eqn (4), the isotropic shielding constant sA and the iso magnitude and are of opposite sign thus nearly compensating anisotropy parameter DsA can be obtained as18 each other in the total shielding constant. This requires the use A 1 A A A 1 A A A siso = 3(sxx + syy + szz)=3(s11 + s22 + s33) (5) of accurate computational approaches in connection with eqn (8), (9) or (10). However, simple qualitative arguments and can be used to analyse different contributions to sd and sp. A 1 Thus, it is obvious from eqn (8) and (9) that non-vanishing Ds = s33 2(s11 + s22). (6) contributions are made by the products of molecular orbitals In eqn (5) and (6), sab (a,b = x,y,z) are the Cartesian (occupied–occupied orbital pairs for sd and occupied–virtual components of the shielding tensor (a non-symmetric tensor orbital pairs for sp) which comprise a rotation. Using these of rank 2) and sij (i,j = 1,2,3) are the components of arguments a contribution of diamagnetic ring currents (that is the shielding tensor in the principal axes system, a nearly free circulation of electrons in the occupied p-orbitals) coordinate system in which the symmetric part of the tensor in aromatic p-systems to e.g.sd of 1H can be identified.21 is diagonal. Analogously, s–p* and p–s* excitations should play an The magnetic field is incorporated into the quantum important role for e.g.sp of 13C in double bonds. Although mechanical equations for electrons via the minimal coupling partitioning of the overall shielding into contributions from ˆ prescription within which the canonical momentum p is ring currents, specific bonds, substituent groups, etc. can be ˆ replaced with the mechanical momentum p + A/c, where useful for analysis of trends and for a rapid evaluation of E c 137 a.u. is the velocity of light. The vector potential A chemical shifts (for example, they have been implemented in is due to both the external magnetic field B0 and the field of the the SHIFTX program12), it should be realised that such a magnetic nucleus A with the moment ~mA partitioning is somewhat artificial, because, in large molecules, 1 m! r there is no unambiguous way of separating different parts A ¼ A þ A ¼ B ðr R Þþ A iA : ð7Þ 0 mA 2 0 iA 0 r3 (for example, s–p separation), and can not serve as a basis for iA accurate calculation of nuclear shieldings.21 Using mechanical momentum with eqn (7) in the molecular B. Gauge invariance Schro¨ dinger equation, Ramsey17 derived the following expressions for the components of the shielding tensor: In eqn (8) and (9), the operator kernels depend on the

X origin R0 of the vector potential A0 of the external uniform d;A 1 3 sab ¼ hc0j ½^eb ðriA R0Þ ½^ea riA=riA jc0ið8Þ magnetic field B0, see eqn (7). Because the external field 2 ! 2c i B0ðB0 ¼r A0Þ is independent of the gauge origin R0,so for the diamagnetic component and should be the shielding tensor. Although this property is not at "#all obvious from expressions (8) and (9), with the use of the P ! P X X hypervirial theorem hc c E E c r c p;A 1 1 ^ 3 k i ri P0i¼ð 0 kÞh kj i ij 0i sab ¼ ðEk E0Þ ^ea hc0j li=riA jcki c2 and the completeness relation k|ckihck| = 1, the vector k i "# ð9Þ algebra relationships can demonstrate that the total nuclear X shielding tensor is indeed independent of the choice of R , ^0 0 ^eb hckj l ijc0i that is i A A sab(R0) sab(0) = 0 (11) for the paramagnetic component. In eqn (7)–(9), R0 is an 20 arbitrary gauge origin (origin of the vector potential A0), eˆa for any R0. The fulfilment of eqn (11) is guaranteed when the and eˆb are unit vectors along the respective Cartesian axes, riA completeness relation is satisfied and the unperturbed wave ˆ is the position of electron i with respect to the nucleus A, li = functions ck are obtained variationally. Although the latter riA pˆi is its angular momentum relative to the nucleus A, and condition is easily satisfied in practice, the former condition ˆ0 l i =(riA R0) pˆi is its angular momentum with respect to requires the use of a complete basis set (CBS) which is not the gauge origin R0. practically attainable. Therefore, in practical applications of

This journal is c The Royal Society of Chemistry 2010 Chem.Soc.Rev., 2010, 39, 578–590 | 583 eqn (8), (9), or (10), there remains a dependence on the choice a point at which the orbital wm is centered. Hence the use of of the gauge origin R0 unless special care is taken to remove it. GIAOs in the calculation of matrix elements of the molecular Hamiltonian leads to cancellation of the dependence on the C. Practical methods of NMR shielding tensor calculation 20 gauge origin R0. Perhaps the simplest starting point for the shielding calcula- In the MO-LCAO approximation using GIAOs, the tion is in the use of the Hartree–Fock approximation, within coefficients Cmj in eqn (12) become dependent on external which the ground state wave function is given by a single magnetic field. Because the basis functions and the expansion 1/2 Slater determinant c0 =(N!) Jf1f1...fjfj...fN/2fN/2J, coefficients become complex in the presence of an external where fj are the occupied molecular orbitals (MO). magnetic field, the resulting GIAO-HF equations become more Commonly the MOs are represented as linear combinations complicated than in the case of the common origin method.20,23 of the atomic orbitals, the MO-LCAO approximation An additional set of overlap integrals and two-electron integrals X need to be evaluated within the GIAO-HF method, which makes C fj ¼ mjwm ð12Þ this approach more time consuming. However, this is compen- m sated by a much faster convergence of the calculated shielding and the density matrix of a closed shell molecule is given by parameters with respect to the basis set expansion.23 In practical calculations with the GIAO approach, it is possible to obtain XN=2 reasonably converged theoretical values of shielding parameters Dmn ¼ 2 CmiCni: ð13Þ i¼1 with the use of basis sets of modest size.

The derivative of the density matrix with respect to the applied Local gauge origin methods: IGLO and LORG. A number of magnetic field qDmn/q(B0)b can be obtained from the solution computational schemes were developed which avoid the of the coupled-perturbed Hartree–Fock (CPHF) equations necessity to compute a large number of additional two- in which perturbation due to the external magnetic field electron integrals employed in the GIAO formalism. Probably (first term on the right hand side of eqn (7)) is included into the most economic gauge-independent approach to the the Fock operator.22 calculation of nuclear shielding parameters is realized in the individual gauge for localized orbitals (IGLO) method of Common gauge origin methods. With the use of a common Schindler and Kutzelnigg.24 In the IGLO method, the phase gauge origin R the perturbed Fock operator becomes gauge 0 factors similar to (14) are attached to the molecular orbitals origin dependent and this dependence is transferred into the rather than to the AOs as in the GIAO approach. The new calculated nuclear shieldings unless a complete basis set is molecular orbitals j are introduced via employed to expand the MOs (12).22 In the theory with j d p iB0ðRjR0Þr=2 common gauge origin, both s and s are rather large and fj ¼ e jj ð15Þ nearly cancel each other in a relatively small value stot. d Because s is a ground state property and can be calculated where Rj is the position vector of the individual gauge origin with relatively high accuracy, even small inconsistencies in the of the orbital fj (usually chosen as the centroid of the charge description of the paramagnetic contribution, e.g. due to the of the orbital). Similar to the GIAO approach, this choice of use of insufficiently flexible basis sets, may translate to large MOs leads to cancellation of the dependence of the matrix 24 errors in the total shielding. Although the shieldings become elements in eqn (10) on the choice of the gauge origin R0. gauge origin independent in the CBS limit, the convergence Because the canonical molecular orbitals are delocalized over with respect to basis set size is rather slow and requires the use the entire molecule, it is more convenient to work with localized of very large basis sets which severely restricts the applicability MOs. The use of localized (e.g. by the Foster–Boyd criterion) of such an approach. MOs offers a possibility to analyze the chemical shielding parameters in terms of individual contributions from atomic Gauge-including atomic orbitals (GIAO) method. An core electrons, chemical bonds, and lone pairs thus allowing for approach to the calculation of shielding tensors in which the a transparent interpretation of the shielding parameters.24 dependence on the gauge origin is removed is based on the use Yet another theoretical method which yields gauge-origin of so-called gauge-including atomic orbitals (GIAO). The first independent shielding constants was developed by Hansen and ab initio shielding calculations employing GIAOs were carried Bouman25 based on the Hartree–Fock wave function and 20 out by Ditchfield and the developed computational scheme is referred to as the localized orbital/local origin (LORG) was revised by Pulay et al.23 who formulated it in a method. In this method, individual gauge origins Rj are computationally convenient form. introduced for the occupied orbitals and the diamagnetic (8) In the GIAO approach, complex basis functions which and paramagnetic (9) contributions are split into two parts of depend on the external magnetic field B 25 0 which only one is dependent on the global gauge origin R0.

GIAO iB0ðRmR0Þr=2c The dependence on R0 is then eliminated under the assump- wm ðB0Þ¼e wm ð14Þ tion of a complete basis set. Similar to the IGLO method, the are employed in eqn (12). In eqn (14), wm is a field-independent local gauge origins Rj are selected as the charge centroids of 25 real atomic orbital centered at the position Rm. The property the localized molecular orbitals. With this choice of MOs, GIAO which makes the new functions wm especially convenient in the LORG shieldings can be decomposed into the individual shielding calculations is that, under the action of the mecha- bond contributions. With the use of the complete basis set, the nical momentum operator, the gauge origin R0 is shifted to Rm, shielding tensors obtained with both methods, LORG and

584 | Chem.Soc.Rev., 2010, 39, 578–590 This journal is c The Royal Society of Chemistry 2010 IGLO, should converge to the values obtained with a common origin CPHF approach.25

Comparison between the GIAO, IGLO, and LORG approaches. Although the methods based on localized MOs, such as IGLO and LORG, are computationally less demanding than the GIAO approach, it is the latter approach that is currently more popular in practical calculations of NMR shielding parameters. More than 1400 scientific articles published since 1980 refer to GIAO as the method of shielding calculation, whereas 411 articles refer to IGLO and only 60 to LORG. Probably the turning point was the work of Pulay et al.23 whoachievedavery efficient implementation of the GIAO-HF method for NMR chemical shieldings with the use of efficient techniques for the two-electron integrals calculation in combination with fast algorithms for solving the CPHF equations. Although the Fig. 2 Deviations from experiment of 13C chemical shifts IGLO method still remains computationally the fastest method (with respect to CH4) calculated with different ab initio theoretical for the calculation of chemical shieldings (it can be faster than methods. Data taken from ref. 29. GIAO23,24 by a factor close to 2), the GIAO method shows much lower sensitivity with respect to the choice of basis set.23 delocalized orbitals. Most straightforwardly this can be done Besides being much less sensitive to basis set choice, the GIAO with the use of the GIAO method, because it does not depend approach represents a very convenient starting point for the on a specific representation of MOs. Using the derivative development of computational schemes which include electron approach to molecular properties, eqn (10), Gauss27 has correlation effects on calculated shieldings. extended the GIAO-HF method of Pulay to second-order Møller-Plesset perturbation theory, GIAO-MP2. Although Electron correlation methods for nuclear shieldings. The first the GIAO-MP2 method is more time-consuming than practical applications of the computational schemes described GIAO-HF, its use leads to a marked improvement in the above have been done at the Hartree–Fock level of approxi- calculated nuclear shieldings in comparison with experiment.27 mation in which it is assumed that any particular electron in This is illustrated in Fig. 2, where the deviations from experi- the molecule moves in the average field of all other electrons. mental values in 13C NMR chemical shifts are shown for a Within this approximation the correlation in the motion of series of hydrocarbons and heterosubstituted compounds. electrons is lost, and, as a consequence, the Hartree–Fock Even more accurate inclusion of the electron correlation method often yields rather poor results for many molecular effects has been achieved in the GIAO-CCSD and GIAO- properties, such as the molecular atomization energies, CCSD(T) methods,28 which employ the coupled-cluster vibrational frequencies, dipole moments, etc. An accurate Ansatz with single and double excitations (CCSD) and with description of the nuclear shielding parameters requires perturbative treatment of triple excitations (CCSD(T)). Using inclusion of the electron correlation. This can be achieved, the latter method, Gauss et al.28,29 have achieved an accuracy for example, via the use of many-body perturbation theory, of ca. 1 ppm (see Fig. 2 and Table 2) in describing the absolute either in the form of a finite perturbational expansion, as in the 13C isotropic shieldings in a series of molecules for which Møller–Plesset (MP) perturbation theory, or in the form of the accurate gas-phase NMR data are available. To achieve this coupled cluster (CC) approximation, where partial summation accuracy one needs to employ very large basis sets and of the perturbation series is carried out up to infinite order.26 incorporate vibrational corrections to the isotropic shieldings.29 The importance of electron correlation for the nuclear For the latter purpose, the shielding tensor is expanded around shieldings was established already in early calculations which the molecular equilibrium geometry in a Taylor series in terms employed the common origin approach based on eqn (8) and (9). of the normal coordinates Qr. Then, one obtains the vibra- 29 Electron correlation has the greatest effect on the paramag- tionally averaged shielding tensor s0 using the average values netic contribution for which the Hartree–Fock approximation of displacements along the normal modes obtained from the often underestimates the stability of the ground state relative anharmonic force fields, as in to the excited states (see ref. 26 and references cited therein). X @s Thus, the correlation may account for up to ca. 30% reduction s ¼ s þ hQ i 0 e @Q r in the absolute magnitude of the paramagnetic contribution in r r Qr¼0 the compounds of light elements, especially those with lone ð16Þ 1 X @2s electron pairs and multiple bonds. þ hQ Q iþ: 2 @Q @Q r s Inclusion of the electron correlation into the local gauge r;s r s Qr;Qs¼0 origin methods, LORG or IGLO, is hampered by the necessity to use localized molecular orbitals.26 Since most electron Obviously, such an approach is very time consuming and can correlation theories are based on canonical Hartree–Fock currently only be applied to relatively small molecules. However, orbitals, it appears most convenient to formulate the the results of these calculations are indispensable for setting up correlated theory of nuclear shieldings in terms of these proper scales of the nuclear shielding constants in the gas phase.

This journal is c The Royal Society of Chemistry 2010 Chem.Soc.Rev., 2010, 39, 578–590 | 585 Table 2 Mean absolute deviations (MAD) and standard deviations from the experiment in the theoretical 13C shieldings obtained with different methods. Data taken from refs. 29 and 36

Method MAD Standard deviation GIAO-HF 8.8 10.7 GIAO-MP2 1.5 2.2 GIAO-CCSD(T) 0.7 0.8 GIAO-BP86 5.9 4.0 GIAO-B3LYP 6.8 5.3 IGLO-BPW91 3.4 5.4

Density functional calculations of NMR shielding tensors. A cost-effective alternative to correlated wave function ab initio methods is provided by density functional theory (DFT). DFT represents an ingenious reformulation of the many-body problem in quantum mechanics whereby the Fig. 3 Deviations from experiment of 13C chemical shifts (with problem of solving the Schro¨ dinger equation for interacting respect to CH4) calculated with different DFT methods. Data taken electrons is cast in the form of a much simpler set of one- from ref. 29 and 36. electron equations, the so-called Kohn–Sham equations. The 35 cost of a DFT calculation is roughly the same as the cost of a and GIAO-DFT. Although a large number of approximate Hartree–Fock calculation. Modern density functional methods density functionals are currently available, none of the have been successful with regard to the calculation of commonly used functionals has a clear advantage over the molecular thermochemical and electronic properties such as others. This can be seen from Table 2 and Fig. 3, which show 13 equilibrium geometries, atomization energies, vibrational deviations from the experimental values of the theoretical C frequencies, dipole moments, excitation energies, etc.However, shieldings obtained with three popular density functionals. the commonly employed density functionals are less successful Even though these functionals perform better than the in the calculation of nuclear shieldings.30,31 GIAO-HF method on average, all of them have conspicuous Generally, the magnitude of the paramagnetic contribution failures for certain molecules, especially those with multiple is overestimated with the common density functionals, thus bonds and heteroatoms. However, for a particular class of leading to much too deshielded nuclei.30 This overestimation molecules, the results of density functional calculations can reach was explained as a consequence of too low orbital energy acceptable accuracy in the calculation of chemical shifts. Because differences ea ei between t