Improved Prediction of Protein Side-Chain Conformations with SCWRL4 Georgii G

proteins STRUCTURE O FUNCTION O BIOINFORMATICS Improved prediction of protein side-chain conformations with SCWRL4 Georgii G. Krivov,1,2 Maxim V. Shapovalov,1 and Roland L. Dunbrack Jr.1* 1 Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, Pennsylvania 19111 2 Department of Applied Mathematics, Moscow Engineering Physics Institute (MEPhI), Kashirskoe Shosse 31, Moscow 115409, Russian Federation INTRODUCTION ABSTRACT The side-chain conformation prediction problem is an integral Determination of side-chain conformations is an component of protein structure determination, protein structure important step in protein structure prediction and protein design. Many such methods have prediction, and protein design. In single-site mutants and in closely been presented, although only a small number related proteins, the backbone often changes little and structure pre- 1 are in widespread use. SCWRL is one such diction can be accomplished by accurate side-chain prediction. In method, and the SCWRL3 program (2003) has docking of ligands and other proteins, taking into account changes remained popular because of its speed, accuracy, in side-chain conformation is often critical to accurate structure pre- and ease-of-use for the purpose of homology dictions of complexes.2–4 Even in methods that take account of modeling. However, higher accuracy at compara- changes in backbone conformation, one step in the process is recal- ble speed is desirable. This has been achieved in culation of side-chain conformation or ‘‘repacking.’’5 Because many a new program SCWRL4 through: (1) a new backbone conformations may be sampled in model refinements, backbone-dependent rotamer library based on side-chain prediction must also be very fast. In protein design, as kernel density estimates; (2) averaging over samples of conformations about the positions in the changes in the sequence are proposed by Monte Carlo steps or other rotamer library; (3) a fast anisotropic hydrogen algorithms, conformations of side chains need to be predicted accu- 6–8 bonding function; (4) a short-range, soft van der rately to determine whether the change is favorable or not. Waals atom–atom interaction potential; (5) fast Most side-chain prediction methods are based on a sample space collision detection using k-discrete oriented poly- that depends on a rotamer library, which is a statistical clustering of topes; (6) a tree decomposition algorithm to observed side-chain conformations in known structures.9 Such solve the combinatorial problem; and (7) optimi- rotamer libraries can be backbone-independent, lumping all side zation of all parameters by determining the chains together regardless of the local backbone conformation,10 or interaction graph within the crystal environment backbone-dependent, such that frequencies and dihedral angles vary using symmetry operators of the crystallographic with the backbone dihedral angles / and w.11,12 An alternative to space group. Accuracies as a function of electron density of the side chains demonstrate that side using statistical rotamer libraries is to use conformer libraries, which chains with higher electron density are easier to are samples of side chains from known structures, usually in the predict than those with low-electron density and form of Cartesian coordinates, thus accounting for bond length, presumed conformational disorder. For a testing bond angle, and dihedral angle variability.13–16 Once a search space set of 379 proteins, 86% of v1 angles and 75% of in the form of rotamers (and samples around rotamers in some v112 angles are predicted correctly within 408 of cases) or conformers is defined, a scoring function is required to the X-ray positions. Among side chains with evaluate the suitability of the sampled conformations. These may higher electron density (25–100th percentile), include the negative logarithm of the observed rotamer library fre- these numbers rise to 89 and 80%. The new pro- quencies,17–20 van der Waals or hard sphere steric interactions of gram maintains its simple command-line inter- side chains with other side chains or the backbone, and sometimes face, designed for homology modeling, and is 20–24 now available as a dynamic-linked library for electrostatic, hydrogen bonding, and solvation terms. Many incorporation into other software programs. Proteins 2009; 00:000–000. Additional Supporting Information may be found in the online version of this article. VC 2009 Wiley-Liss, Inc. Grant sponsor: NIH; Grant numbers: R01 HG02302, R01 GM84453, P20 GM76222. *Correspondence to: Roland L. Dunbrack, Institute for Cancer Research, Fox Chase Cancer Center, Key words: homology modeling; side-chain pre- 333 Cottman Avenue, Philadelphia, PA 19111. E-mail: [email protected]. Received 17 February 2009; Accepted 13 May 2009 diction; protein structure; rotamer library; graph Published online in Wiley InterScience (www.interscience.wiley.com). decomposition; SCWRL. DOI: 10.1002/prot.22488 VC 2009 WILEY-LISS, INC. PROTEINS 1 G.G. Krivov et al. search algorithms have been applied, including cyclic usability of the program for homology modeling and optimization of single residues or pairs of residues,11,16 other purposes. As part of this, we wanted to make sure Monte Carlo,5,18,25 dead-end elimination (DEE),26,27 that the program always solves the structure prediction self-consistent mean field optimization,28 integer pro- problem in a reasonable time, even if the graph is not gramming,29 and graph decomposition.17,30,31 These sufficiently decomposable. This is accomplished with an methods vary in how fast they can solve the combi- approximation that while not guaranteeing a global mini- natorial problem, and whether they guarantee a global mum of the energy function given the rotamer search minimum of the given energy function or instead search space, does complete the calculation quickly in all cases for a low energy without such a guarantee. In general, tested. such a guarantee is not necessary, given the approximate In this article, we describe the development of the nature of the energy functions, and it is the overall pre- SCWRL4 program for prediction of protein side-chain diction accuracy and speed that are more important conformations. We used many different approaches to ac- features of a prediction method. In recent years, it has complish the goals described earlier. We have improved become clear that some flexibility around rotameric the SCWRL energy function using a new backbone-de- positions15,16,32 and more sophisticated energy func- pendent rotamer library (Shapovalov and Dunbrack, in tions20,33 are needed for improved side-chain packing preparation) which uses kernel density estimates and ker- and prediction. nel regressions to provide rotamer frequencies, dihedral SCWRL3 is one of the most widely used programs of angles, and variances that vary smoothly as a function of its type with 2986 licenses in 72 countries as of April 30, the backbone dihedral angles / and w. SCWRL4 also 2009. It uses a backbone-dependent rotamer library,12 a uses a short-range, soft van der Waals interaction poten- simple energy function based on the library rotamer fre- tial between atoms rather than a linear repulsive-only quencies and a purely repulsive steric energy term, and a function used in SCWRL3, as well as an anisotropic graph decomposition to solve the combinatorial packing hydrogen bond function similar to that used in Rosetta36 problem.30 It has a number of features that have made it (but using a different functional form that is faster to widely used. The first of these is speed, which has evaluate). To account for variation of dihedral angles enabled the program to be used on a number of web around the mean values given in the rotamer library, we servers that predict protein structure from sequence- used the approach of Mendes et al.,32 which samples v structure alignments34 and may perform many hundreds angles around the library values and averages the energy of predictions per day. The second is accuracy. At the of interaction between rotamers of different side chains time of its publication, it was one of the most accurate over these samples, resulting in a free-energy-like scoring side-chain prediction methods. However, a number of function. To determine the interaction graph, as used in other methods have appeared claiming higher accu- SCWRL3, we implemented a fast method for detecting racy,15,18,20,35 although often at much longer CPU collisions (i.e., atom–atom interactions less than some times. The third feature of SCWRL3 is usability. The pro- distance) using k-discrete oriented polytopes (kDOPs). gram takes input PDB coordinates for the backbone, kDOPs are three-dimensional shapes with faces perpen- optionally a new sequence, and outputs coordinates for dicular to common fixed axes, such that kDOPs around the structure with predicted side chains using the same two groups of atoms can be rapidly tested for overlap.37 residue numbering and chain identifiers as the input In SCWRL3, we used a graph decomposition method structure. This feature is simple but in fact many if not that broke down the interaction graph of residues into most side-chain prediction programs renumber the resi- biconnected components, which overlap by single residues of the input structure and strip the chain identifiers, dues called articulation points. In most cases, this solves making them difficult to use in homology modeling. the graph quickly. However, with a longer range energy One unfortunate feature of SCWRL3 is that the graph function and sampling about the rotameric dihedral decomposition method used may not always result in a angles, this is no longer true. We therefore implemented combinatorial optimization that can be solved quickly. In our own version of a tree decomposition of the graphs, such cases, the program may go on for many hours as suggested by Xu31 for the side-chain prediction prob- instead of finishing in a few seconds, since it lacks any lem.

Improved Prediction of Protein Side-Chain Conformations with SCWRL4 Georgii G

Side-Chain Liquid Crystal Polymers (SCLCP): Methods and Materials. an Overview

The Role of Side-Chain Branch Position on Thermal Property of Poly-3-Alkylthiophenes

Molecular Modelling Virtual Chemistry Resource Pymol Experiment Answer Sheet Well Done for Completing Our Pymol Experiment! We Really Hope You Enjoyed It!

Amino Acid Recognition by Aminoacyl-Trna Synthetases

2. Proteins Have Hierarchies of Structure

Prediction of Amino Acid Side Chain Conformation Using a Deep Neural

6 Synthesis of N-Alkyl Amino Acids Luigi Aurelio and Andrew B

Amino Acids, Peptides, Proteins: Introduction

Effect of Side Chain Substituent Volume on Thermoelectric Properties of IDT-Based Conjugated Polymers

Naming Organic Compounds: Alkanes

Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn Fragments

1-1 Amino Acids