research papers
Journal of Applied Phaser crystallographic software Crystallography a b b c ISSN 0021-8898 Airlie J. McCoy, * Ralf W. Grosse-Kunstleve, Paul D. Adams, Martyn D. Winn, Laurent C. Storonia‡2and Randy J. Reada
Received 31 January 2007 Accepted 27 April 2007 aDepartment of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, UK, bLawrence Berkeley National Laboratory, One Cyclotron Road, Bldg 64R0121, Berkeley, CA 94720-8118, USA, and cDaresbury Laboratory, Warrington WA4 4AD, UK. Correspondence e-mail: [email protected]
Phaser is a program for phasing macromolecular crystal structures by both molecular replacement and experimental phasing methods. The novel phasing algorithms implemented in Phaser have been developed using maximum likelihood and multivariate statistics. For molecular replacement, the new algorithms have proved to be significantly better than traditional methods in discriminating correct solutions from noise, and for single-wavelength anomalous dispersion experimental phasing, the new algorithms, which account for correlations between F+ and F , give better phases (lower mean phase error with respect to the phases given by the refined structure) than those that use mean F and anomalous differences F. One of the design concepts of Phaser was that it be capable of a high degree of automation. To this end, Phaser (written in C++) can be called directly from Python, although it can also be called using traditional CCP4 keyword-style input. Phaser is a platform for # 2007 International Union of Crystallography future development of improved phasing methods and their release, including Printed in Singapore – all rights reserved source code, to the crystallographic community.
1. Introduction scripting, and to facilitate the development of improved algorithms for these methods through the use of maximum Improved crystallographic methods rely on both improved likelihood and multivariate statistics. automation and improved algorithms. The software handling Other software shares some of these features. For molecular one part of structure solution must be automatically linked to replacement, AMoRe (Navaza, 1994) and MOLREP (Vagin & software handling parts upstream and downstream of it in the Teplyakov, 1997) both implement automation strategies, structure solution pathway with (ideally) no user input, and though they lack likelihood-based scoring functions. Like- the algorithms implemented in the software must be of high lihood-based experimental phasing can be carried out using quality, so that the branching or termination of the structure Sharp (La Fortelle & Bricogne, 1997). solution pathway is minimized or eliminated. Automation allows all the choices in structure solution to be explored where the patience and job-tracking abilities of users would be 2. Algorithms exhausted, while good algorithms give solutions for poorer models, poorer data or unfavourable crystal symmetry. Both The novel algorithms in Phaser are based on maximum like- forms of improvement are essential for the success of high- lihood probability theory and multivariate statistics rather throughput structural genomics (Burley et al., 1999). than the traditional least-squares and Patterson methods. Macromolecular phasing by either of the two main methods, Phaser has novel maximum likelihood phasing algorithms for molecular replacement (MR) and experimental phasing, the rotation functions and translation functions in MR and the which includes the technique of single-wavelength anomalous SAD function in experimental phasing, but also implements dispersion (SAD), are key parts of the structure solution other non-likelihood algorithms that are critical to success in pathway that have potential for improvement in both auto- certain cases. Summaries of the algorithms implemented in mation and the underlying algorithms. MR and SAD are good Phaser are given below. For completeness and for consistency phasing methods for the development of structure solution of notation, some equations given elsewhere are repeated pipelines because they only involve the collection of a single here. data set from a single crystal and have the advantage of minimizing the effects of radiation damage. Phaser aims to 2.1. Maximum likelihood facilitate automation of these methods through ease of Maximum likelihood is a branch of statistical inference that ‡ Present address: Department of Applied Mathematics and Theoretical asserts that the best model on the evidence of the data is the Physics, University of Cambridge, UK. one that explains what has in fact been observed with the
658 doi:10.1107/S0021889807021206 J. Appl. Cryst. (2007). 40, 658–674 research papers
highest probability (Fisher, 1922). The model is a set of on hFOi. (Note that here and in the following, bold font is used parameters, including the variances describing the error esti- to represent complex or signed structure factors, and italics to mates for the parameters. The introduction of maximum represent their amplitudes.) likelihood estimators into the methods of refinement, In applications to molecular replacement and structure experimental phasing and, with Phaser, MR has substantially refinement, hFOi is the structure factor calculated from the increased success rates for structure solution over the methods model (FC) multiplied by a fraction D (where 0 < D <1; that they replaced. A set of thought experiments with dice Luzzati, 1952) that accounts for the effects of errors in the (McCoy, 2004) demonstrates that likelihood agrees with our positions and scattering of the atoms that are correlated with intuition and illustrates the key concepts required for under- the true structure factor. (If one works with E values, the 2 standing likelihood as it is applied to crystallography. factor D is replaced by A and is replaced by 1 A.) The likelihood of the model given the data is defined as the Integrating out the phase between FO and hFOi gives ! probability of the data given the model. Where the data have 2 2F F2 þ F 2F F independent probability distributions, the joint probability of PF ; F ¼ O exp O O I O O the data given the model is the product of the individual O O 0 distributions. In crystallography, the data are the individual