Ab Initio Solution of Macromolecular Crystal Structures Without Direct Methods
Airlie J. McCoy^a, Robert D. Oeffner^a, Antoni G. Wrobel^b, Juha R. M. Ojala^c, Karl Tryggvason^c,d, Bernhard Lohkamp^e, and Randy J. Read^a,1

^a Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom; ^b Department of Clinical Biochemistry, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom; ^c Division of Matrix Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden; ^d Cardiovascular and Metabolic Disorders Program, Duke-NUS (National University of Singapore) Medical School, 16957 Singapore; and ^e Division of Molecular Structural Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden

Edited by Axel T. Brunger, Stanford University, Stanford, CA, and approved February 27, 2017 (received for review January 30, 2017)

The majority of macromolecular crystal structures are determined using the method of molecular replacement, in which known related structures are rotated and translated to provide an initial atomic model for the new structure. A theoretical understanding of the signal-to-noise ratio in likelihood-based molecular replacement searches has been developed to account for the influence of model quality and completeness, as well as the resolution of the diffraction data. Here we show that, contrary to current belief, molecular replacement need not be restricted to the use of models comprising a substantial fraction of the unknown structure. Instead, likelihood-based methods allow a continuum of applications depending predictably on the quality of the model and the resolution of the data. Unexpectedly, our understanding of the signal-to-noise ratio in molecular replacement leads to the finding that, with data to sufficiently high resolution, fragments as small as single atoms of elements usually found in proteins can yield ab initio solutions of macromolecular structures, including some that elude traditional direct methods.

macromolecular crystallography | likelihood | ab initio phasing | molecular replacement | Shisa

Significance

It is now possible to make an accurate prediction of whether or not a molecular replacement solution of a macromolecular crystal structure will succeed, given the quality of the model, its size, and the resolution of the diffraction data. This understanding allows the development of powerful structure-solution strategies, and leads to the unexpected finding that, with data to sufficiently high resolution, fragments as small as single atoms can be placed as the basis for ab initio structure solutions.

Author contributions: B.L. and R.J.R. designed research; A.J.M., R.D.O., A.G.W., J.R.M.O., K.T., B.L., and R.J.R. performed research; A.J.M., R.D.O., A.G.W., B.L., and R.J.R. analyzed data; A.J.M. and R.J.R. wrote the paper; and all authors contributed to revisions.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 5m0w).
^1 To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1701640114/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1701640114
PNAS | April 4, 2017 | vol. 114 | no. 14 | 3637–3641
BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Over the past century, determination of novel crystal structures has evolved from an exercise in logic identifying the locations of single atoms by inspecting diffraction patterns (1) or vector maps (2), through the development of direct methods for small molecules (3) and of isomorphous replacement (4, 5) or anomalous diffraction (6, 7) phasing for molecules as large as proteins.

Currently, about 80% of protein structures are solved by the method of molecular replacement (8), exploiting prior structural knowledge of related proteins. In principle, molecular replacement (MR) involves rotational and translational searches over many possible placements of a molecular model within the unit cell of an unknown structure. The most sensitive method of evaluating the fit to the observed data is a likelihood function (9, 10) that accounts for the effect of measurement errors in the observed diffraction intensities (11). Potential solutions are scored by the log-likelihood-gain on intensities (LLGI), the sum of the log-likelihoods for individual reflections minus the log-likelihoods for an uninformative model (Methods).

Success in MR depends on the signal-to-noise of the search, which varies according to two parameters in the likelihood function: $D_{obs}$ characterizes the precision of each measurement, taking values near 1 for moderately well-measured data and only taking values near 0 for extremely weak data; $\sigma_A$ measures the quality of the model in terms of the fraction of a crystallographic structure factor that it explains. The resolution-dependent value of $\sigma_A$ for each reflection can be estimated from the fraction ($f_P$) of the X-ray scattering power accounted for by the model (where the total scattering power is the sum of the squares of the scattering factors for the atoms in the crystal), its estimated accuracy (rms error $\Delta$), and the resolution ($d$) of the reflection (9), with (optionally) a correction for the effect of disordered solvent described by the parameters $f_{sol}$ and $B_{sol}$:

$$\sigma_A = \sqrt{f_P}\,\left[1 - f_{sol}\exp\left(-\frac{B_{sol}}{4d^2}\right)\right]\exp\left(-\frac{2\pi^2}{3}\,\frac{\Delta^2}{d^2}\right), \tag{1a}$$

$$\sigma_A \approx \sqrt{f_P}\,\exp\left(-\frac{2\pi^2}{3}\,\frac{\Delta^2}{d^2}\right). \tag{1b}$$

The simpler expression in Eq. 1b neglects the effect of disordered solvent at low resolution.

The signal for an MR search can be estimated before the calculation as the expected value, or probability-weighted average, of the LLGI for a correctly placed model. The expected value of the contribution of one reflection, $\langle LLGI \rangle_{hkl}$, can be approximated simply by $D_{obs}^4 \sigma_A^4 / 2$ (Methods), an approximation that is particularly good for the low values of $D_{obs}\sigma_A$ characterizing the difficult cases of most interest. In the following, we refer to the total expected LLGI, summed over all reflections, as the eLLG.

The variance of eLLG can similarly be approximated as the sum over all reflections of $D_{obs}^4 \sigma_A^4$, leading to the conclusion that the expected signal-to-noise ratio in an MR search will be proportional to $\sqrt{eLLG}$ (Methods). By the same reasoning, the signal-to-noise ratio achieved in a particular search will be proportional to $\sqrt{LLGI}$. The theoretical deduction that confidence in an MR solution can be judged simply by the LLGI value has been validated by analyzing a database of nearly 22,000 MR calculations, where an LLGI of 60 or more in a 6-dimensional rotation/translation search typically indicates a correct solution. (See Fig. 1, which also shows that the required signal scales with the number of degrees of freedom in the search.) The database of test calculations also reveals that the translation function Z score (TFZ: the number of SDs by which the translation function peak exceeds its mean) is roughly on the same scale as $\sqrt{LLGI}$, although the exact relationship depends on the number of primitive symmetry operators; this justifies the success of TFZ as a measure of confidence (10).

Fig. 1. Confidence in MR solution as function of final LLGI score. The final refined LLGI score provides a clear diagnostic for success in MR. The three curves show how the success rate for placing the first copy by MR varies with LLGI in 3 different space-group symmetry classes: P1 (only 3 rotational degrees of freedom; red; total of 263 MR trials), polar (3 rotational and 2 translational degrees of freedom, with an arbitrary origin along one axis; blue; 4,738 MR trials), and nonpolar (3 rotational and 3 translational degrees of freedom; black; 16,740 MR trials).

[...] freedom compared with a molecule with 6 degrees of freedom (Fig. 1). Our insights predict that, for crystals that contain up to a few thousand unique ordered atoms and diffract beyond about 1-Å resolution, there should be a significant signal in a likelihood search carried out by translating a single sulfur atom over all of its possible positions. Even if the placement of the first atom is ambiguous, the signal will increase quadratically with the number of atoms placed (Fig. 2), allowing the ambiguity to be resolved.

Results

Test calculations on a number of systems proved the principle of single-atom MR: it was indeed possible to find sulfur atoms in a variety of protein crystals, as well as phosphorus atoms in one RNA crystal tested (Table S1). The largest structure that yielded to this approach was that of aldose reductase [Protein Data Bank (PDB) ID code 3bcj] (13). The protein has a mass of 36 kDa with 2,525 nonhydrogen atoms (2,606 including ligands) and no atom heavier than sulfur, and the deposited data extend to 0.78-Å resolution. The eLLG for a sulfur atom with a B factor equal to the average in the crystal is 4.0, or 12.6 for a well-ordered sulfur atom with a B factor reduced by only 1 Å². MR implemented in Phaser was able to locate up to 10 atoms with clear signal (Table 1).

A structure comprising a few atoms can then serve as a seed for structure completion by using log-likelihood-gradient maps to select locations for new nitrogen atoms (as a surrogate for other types) that improve the MR likelihood score (14) (Methods). Starting from as few as the first two atoms placed by MR, the structure of aldose reductase was extended successfully by log-likelihood-gradient completion. The result was a model with 3,051 atoms (some accounting for solvent molecules and for static disorder) that yields an LLGI of 483,292 and an R value of
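As a rough illustration of how σA and the eLLG could be estimated before a search, the sketch below evaluates Eq. 1a (or Eq. 1b when no solvent parameters are supplied) and sums the per-reflection approximation Dobs⁴σA⁴/2. The function names `sigma_a` and `expected_llg`, and the representation of reflections as `(d, D_obs)` pairs, are assumptions made for illustration only; they are not Phaser's interface, and the numbers in the usage lines are arbitrary.

```python
import math


def sigma_a(f_p, delta, d, f_sol=None, b_sol=None):
    """Estimate sigma_A for one reflection at resolution d (in Angstrom).

    f_p   : fraction of the scattering power explained by the model
    delta : estimated rms coordinate error of the model (in Angstrom)
    Uses Eq. 1a when both f_sol and b_sol are given, otherwise Eq. 1b.
    """
    # Accuracy term shared by Eq. 1a and Eq. 1b.
    accuracy = math.exp(-(2.0 * math.pi ** 2 / 3.0) * (delta ** 2 / d ** 2))
    # Optional disordered-solvent correction (Eq. 1a only).
    solvent = 1.0
    if f_sol is not None and b_sol is not None:
        solvent = 1.0 - f_sol * math.exp(-b_sol / (4.0 * d ** 2))
    return math.sqrt(f_p) * solvent * accuracy


def expected_llg(reflections, f_p, delta):
    """Approximate the eLLG as the sum over reflections of D_obs^4 sigma_A^4 / 2.

    reflections : iterable of (d, D_obs) pairs (hypothetical input format).
    The solvent correction is omitted here, i.e. Eq. 1b is used.
    """
    return sum(
        (d_obs * sigma_a(f_p, delta, d)) ** 4 / 2.0
        for d, d_obs in reflections
    )


# Arbitrary illustrative values: four well-measured reflections at 2 A,
# a perfect-accuracy model explaining a quarter of the scattering power.
ellg = expected_llg([(2.0, 1.0)] * 4, f_p=0.25, delta=0.0)
```

Because the expected signal-to-noise ratio is only stated to be proportional to the square root of the eLLG, the sketch stops at the eLLG itself rather than committing to a constant of proportionality.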
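The quadratic growth of the signal can be traced back to Eq. 1b and the per-reflection approximation of the expected LLGI. The short derivation below is a sketch under simplifying assumptions that are not part of the original text: the N placed atoms are taken to be identical, with a common scattering factor written here as f_S (a symbol introduced for illustration), the same rms error Δ, and the solvent term neglected.

```latex
% Fraction of the scattering power explained by N identical placed atoms
f_P(N) = \frac{N f_S^2}{\sum_j f_j^2} \;\propto\; N

% Eq. 1b raised to the fourth power
\sigma_A^4 \approx f_P^2 \exp\!\left(-\frac{8\pi^2}{3}\,\frac{\Delta^2}{d^2}\right)

% Expected contribution of one reflection
\langle LLGI \rangle_{hkl} \approx \frac{D_{obs}^4\,\sigma_A^4}{2}
  \;\propto\; f_P^2 \;\propto\; N^2

% Hence, summed over all reflections
\frac{eLLG(N)}{eLLG(1)} \approx N^2
```

Under these assumptions the eLLG grows with the square of the number of atoms placed, while the expected signal-to-noise ratio, being proportional to the square root of the eLLG, grows linearly with it.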