From: ISMB-98 Proceedings. Copyright © 1998, AAAI (www.aaai.org). All rights reserved.

A Computational System for Modelling Flexible Protein-Protein and Protein-DNA Docking

Michael J. E. Sternberg (1) ¶, Patrick Aloy (1,2), Henry A. Gabb, Richard M. Jackson (1), Gidon Moont (1), Enrique Querol (2) & Francesc X. Aviles

¶ Corresponding Author

(1) Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK; e-mail [email protected]

(2) Institut de Biologia Fonamental and Departament de Bioquímica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain

Abstract

A computational system is described that predicts the structure of protein/protein and protein/DNA complexes starting from unbound coordinate sets. The approach is (i) a global search with rigid-body docking for complexes with shape complementarity and favourable electrostatics; (ii) use of distance constraints from experimental (or predicted) knowledge of critical residues; (iii) use of pair potentials to screen docked complexes and (iv) refinement and further screening by protein side-chain optimisation and interfacial energy minimisation. The system has been applied to model ten protein/protein and eight protein-repressor/DNA (steps i to iii only) complexes. In general a few complexes, one of which is close to the true structure, can be generated.

1 - Introduction

The aim of predictive macromolecular docking is to start with the coordinates of the two molecules in their unbound states and compute the three-dimensional structure of the complex including the conformational change on association. There is an increasing need for successful docking algorithms as the rate of protein structure determination is increasing (> 5,000 deposited coordinates) whilst there are the structures of far fewer protein-protein and protein/DNA complexes (< 200 coordinates). It is the structures of these complexes that provide valuable understanding of the functions of the molecules and can serve as the basis for the structure-based design of novel regulators of activity. For reviews of macromolecular docking see (Janin, 1995; Shoichet and Kuntz, 1996; Sternberg et al., 1998).

This paper describes a system for macromolecular docking recently developed in the group (see Figure 1). The four components of the strategy are:
i) FTDOCK which performs a global search for favourable rigid-body docking of the two components (Gabb et al., 1997);
ii) FILTR which screens the favourable solutions from (i) by use of distance constraints from known or predicted regions involved in complex formation (Gabb et al., 1997);
iii) RPDOCK which rapidly screens complexes using empirical residue pair potentials (Moont et al., 1998);
iv) MULTIDOCK which refines the structure of the complex including both the side-chain conformation and rigid-body minimisation and thereby provides another screen of the complexes (Jackson et al., 1998).

The entire approach has been tested on ten protein/protein complexes (Gabb et al., 1997; Jackson et al., 1998; Moont et al., 1998) and the first three stages applied to model eight protein-repressor/DNA complexes (Aloy et al., 1998). Nearly all the starting coordinates were unbound (i.e. not taken from the complex) but an occasional lack of data required one of the two coordinate sets to be taken from the bound state. The results show that computational methods with biochemical data can generate a few docked complexes one of which is close to the correct conformation.

Figure 1 - Schematic of the strategy for macromolecular docking

(unbound state)

FTDOCK - Global search by rigid-body docking using Fourier correlation for shape and electrostatic complementarity (rotate)    4000

FILTR - Screen complexes on distance constraint (e.g. active site, combining loops or DNA fingerprint)    40-400

RPDOCK - Screen via empirical amino-acid/amino-acid or amino-acid/nucleotide potentials    4-40

MULTIDOCK - Refine and screen via all-atom side-chain multi-copy and limited rigid-body minimisation    1-10

Final model(s) for testing

The numbers on the right hand side refer to the number of complexes generated for the enzyme-inhibitor studies (see Table I).
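The Fourier correlation named in the FTDOCK stage of the schematic can be illustrated on a toy grid. The sketch below is not the FTDOCK code; it is a minimal NumPy illustration, assuming cyclic (wrap-around) shifts, of how the correlation over all translations can be obtained from three discrete Fourier transforms and checked against the explicit sum. The function names, the grid size of 6 and the random toy grids are hypothetical.

```python
import numpy as np


def correlate_fft(a, b):
    """Cyclic correlation c[s] = sum_x a[x] * b[x + s], computed via FFTs."""
    # For real grids, DFT(c) = conj(DFT(a)) * DFT(b).
    return np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)).real


def correlate_direct(a, b):
    """Same quantity by explicit summation over every shift (order N^6 work)."""
    c = np.empty(a.shape, dtype=float)
    for shift in np.ndindex(a.shape):
        # Rolling by -s places b[x + s] at position x (with wrap-around).
        moved = np.roll(b, tuple(-s for s in shift), axis=(0, 1, 2))
        c[shift] = np.sum(a * moved)
    return c


def best_shift(c):
    """Translation (alpha, beta, gamma) with the highest correlation."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(c), c.shape))


# Toy grids (assumed values): molecule A takes 1 on the surface, -15 in the
# core and 0 outside; molecule B takes 1 or 0.
rng = np.random.default_rng(0)
grid_a = rng.choice([0.0, 1.0, -15.0], size=(6, 6, 6))
grid_b = rng.choice([0.0, 1.0], size=(6, 6, 6))
scores = correlate_fft(grid_a, grid_b)
print(best_shift(scores))
```

In a real docking grid the molecules would be padded so that wrap-around shifts do not bring opposite faces into contact; that padding is left out here for brevity.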

2 - Algorithm

The approach will be described as used for protein-protein docking and then the modifications to tackle protein/DNA complexes will be reported.

2.1) FTDOCK - Rigid-body docking by Fourier Correlation

The number of internal degrees of freedom for two molecules with numerous rotatable bonds together with the six degrees of associational freedom make at present a global search for the docked conformation impracticable. Accordingly the present approach is to start with a rigid-body docking and to incorporate softness into the scoring function to allow for conformational changes on association. We have developed a program FTDOCK to perform the initial step of rigid-body docking, basing the approach on the Fourier correlation method of Katchalski-Katzir et al. (1992).

FTDOCK searches for complexes with complementarity of shape. The two molecules A and B are placed onto three-dimensional grids each of size N x N x N. For the larger molecule A each node l,m,n is assigned a value

\[ a_{l,m,n} = \begin{cases} 1 & \text{for grid points on the surface} \\ \rho & \text{for the core} \\ 0 & \text{for the outside of the molecule} \end{cases} \]

where \rho has a negative value (we use -15) for grid nodes interior to the surface layer, which has thickness t (we use between 1.5 Å and 1.2 Å). For the smaller molecule B the nodes are assigned values

\[ b_{l,m,n} = \begin{cases} 1 & \text{for the surface} \\ 0 & \text{for the outside of the molecule} \end{cases} \]

The complementarity of shape is given by the correlation c_{\alpha,\beta,\gamma} between the molecules and is evaluated from

\[ c_{\alpha,\beta,\gamma} = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} a_{l,m,n} \cdot b_{l+\alpha,m+\beta,n+\gamma} \]

where \alpha, \beta, \gamma are the translational shift of molecule B with respect to A for a given relative orientation of the two grids. A high value for c denotes a complex with good shape complementarity since there is substantial overlap of the surface grid nodes of A with molecule B without a high degree of penetration of the core of A into B. Calculation of c is time consuming as for each of the N^3 shifts \alpha, \beta, \gamma there are of the order of N^3 multiplications and additions. Katchalski-Katzir et al. suggested that proceeding via a discrete Fourier transform enables the correlation to be calculated in the order of N^3 ln N^3 steps. For a global search of rigid-body docking, B is rotated through the three Eulerian angles in 15° steps and for each step the correlation c is calculated.

An important additional constraint of macromolecular complexes is that there nearly always will be a favourable electrostatic interaction. To consider this we have augmented the Fourier correlation approach with an electrostatic calculation. As with shape complementarity, the modelling of electrostatics is designed to be sufficiently soft to cope with the conformational changes on association. For molecule A, charges are assigned to the atoms and the electrostatic potential evaluated outside the molecule from

\[ \phi_{l,m,n} = \sum_{j} \frac{q_j}{\varepsilon(r_{ij}) \, r_{ij}} \]

where \phi_{l,m,n} is the potential at node l,m,n (position i), q_j is the charge on atom j, r_{ij} is the distance between i and j (with a minimum value of 2 Å, to avoid artificially large values of the potential) and \varepsilon(r_{ij}) is a distance dependent dielectric function. Inside molecule A, \phi_{l,m,n} is zero. For molecule B charges are assigned to neighbouring grid points giving a function q_{l,m,n}. The electrostatic interaction e_{\alpha,\beta,\gamma} for a shift of \alpha, \beta, \gamma is calculated from

\[ e_{\alpha,\beta,\gamma} = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} \phi_{l,m,n} \cdot q_{l+\alpha,m+\beta,n+\gamma} \]

The Fourier correlation approach is also used to calculate e_{\alpha,\beta,\gamma}.

FTDOCK is written in FORTRAN 77 and the time-consuming Fourier correlation can be performed using the Silicon Graphics libraries and can be run in parallel on a Silicon Graphics Challenge taking a few hours of cpu time using 8 R10000 processors. Serial code is also available.

The results of benchmarking this procedure on ten protein complexes showed that electrostatics is best used as a binary filter excluding unfavourable interactions but then ranking the allowed complexes solely by shape complementarity. To sample docking space, the three most favourable complexes from each orientation of B are stored and then, after all orientations have been considered, the top set (we use 4000) are examined. From this list one needs to consider at least a list of

hundreds of complexes to include one complex that is close (< 2.5 Å rms for the Cα atoms at the interface) to the correct structure.

2.2) FILTR - Distance constraints

The lack of selectivity after a global search is a consequence of the simplified scoring functions used and the lack of consideration of conformational change on association. However in many practical applications of docking there is knowledge about the binding site on at least one of the two components. The procedure FILTR is used to screen complexes generated by FTDOCK for compatibility with residue-residue distance constraints. The inter-Cα distance is calculated and checked to be less than the sum of the effective side-chain radii plus 4.5 Å.

2.3) RPDOCK - Screening docked complexes by residue pair potentials

The next step is to employ an approach for screening that is not sensitive to precise atomic positions and has a radius of convergence that can identify suitable candidates for further examination and refinement. These considerations led us to consider empirically-derived residue-residue pair potentials. These are derived from a database of known structures and in principle, being empirical, can incorporate the dominant thermodynamic effects without explicitly having to model each of them. Indeed residue pair potentials have proved helpful in protein fold recognition which requires evaluation of the stability of a protein structure from a model at least several Angstroms away from the true structure (e.g. see review Vajda et al., 1997).

The pair potentials are evaluated from a non-redundant database of protein chains. The frequencies F_{ab} of intra-chain pairings between residues of type a and b having a Cβ-Cβ distance (for Gly Cα) less than a cut-off d_cut are calculated. The total number of pairings T is given by

\[ T = \sum_{a=1}^{20} \sum_{b=1}^{20} F_{ab} \]

These observed frequencies are compared to those expected from random pairings and the model chosen is purely compositional (i.e. based on the product of molar fractions). The molar expected frequency for the a-b pairing is given by

\[ E_{ab} = T \cdot \frac{n_a}{\sum_{a=1}^{20} n_a} \cdot \frac{n_b}{\sum_{b=1}^{20} n_b} \]

where n_a and n_b are the total occurrences of residues of types a and b. A log-odds score for a pairing of residue types a and b is derived:

\[ S_{ab} = \log_{10} \left( \frac{F_{ab}}{E_{ab}} \right) \]

Empirical residue pair potentials are derived by applying Boltzmann's equation to the log-odds score to obtain the potential. However the validity of this approach has been questioned (Skolnick et al., 1997; Thomas and Dill, 1996) and therefore we simply use the log odds as a statistical measure of the tendencies of different residue types to pair.

The program RPDOCK evaluates the stability of a complex by evaluating a total score which is the sum of S_{ab} values for all intermolecular residue pairings with the distance less than d_cut (= 12 Å) and where each residue has a relative accessibility above a cut-off A_cut (= 5%) to exclude buried side-chains. This total score is then divided by the total number of contacting pairs of residues. High scoring complexes are evaluated by this function to be more favourable.

2.4) MULTIDOCK - Refinement of protein interfaces incorporating solvation

MULTIDOCK performs the next step which is to move beyond a rigid-body treatment of the complex (Jackson et al., 1998). The model considers the placement of the side chains on a fixed main chain together with limited rigid-body minimisation of the interacting molecules.

The complex is treated at the atomic level with solvation modelled as soft sphere Langevin dipoles following the work of Luzhkov and Warshel (1992). Each water molecule is treated as a particle with a van der Waals radius and a dipole whose magnitude is evaluated from the field according to the Langevin equation. The force field used takes the van der Waals radii from AMBER (Weiner et al., 1984) with partial charges from PARSE (Sitkoff et al., 1994).

Multiple side-chain conformations are refined by a self-consistent mean field approach (Koehl and Delarue, 1994; Lee et al., 1994). For a protein of N residues, each side-chain from residue i can adopt one of K_i conformational states (i.e. rotamers). Therefore a conformational matrix CM of dimension (N x max(K_i)) defines the side-chain degrees of freedom so that each rotamer, k, has a probability of CM(i,k). The potential of mean force, E(i,k), on the k-th rotamer of residue, i, is given by

\[ E(i,k) = V_I(x_{ik}) + V_M(x_{ik}) + \sum_{j=1,\, j \neq i}^{N} \sum_{l=1}^{K_j} CM(j,l) \, V_S(x_{ik}, x_{jl}) + E_{sol}(i,k) \]

where x_{ik} are the coordinates of atoms in rotamer k of residue i. V_I represents the internal energy of the rotamer and V_M represents the interaction energy between the rotamer and all the main chain atoms. These two values are constant for a given rotamer on a given main chain. The third term represents the interaction energy (V_S) between the rotamer and all the rotamers of other residues weighted by their respective probabilities given in CM. The fourth term E_{sol}(i,k) represents the potential of mean force acting at rotamer, k, of residue, i, due to the surrounding solvent environment. For a water in conflict with a rotamer the probability of the site is dependent on the probability of that rotameric state.

The probability of a rotamer is calculated according to the Boltzmann principle as

\[ CM(i,k) = \frac{e^{-E(i,k)/RT}}{\sum_{k=1}^{K_i} e^{-E(i,k)/RT}} \]

where R is the Boltzmann constant and T the temperature. The values of CM(i,k) are substituted back into the equation describing E(i,k), its new value is recalculated and the procedure is repeated until convergence. The predicted structure is the highest probability rotamer for each residue.

Following each complete cycle of side-chain mean field optimisation, rigid-body minimisation is performed on the resultant coordinates of the proteins without consideration of the solvent. The smaller molecule is moved in three translational and three rotational degrees of freedom according to the path determined by the numerical derivatives to minimise the intermolecular interaction energy. The process cycles between side-chain optimisation and rigid-body minimisation until convergence.

2.5) Docking protein-DNA complexes

The above strategy for protein-protein docking exploits the rigid-body approximation which is expected to be effective for many systems since there is limited conformational change on association and this is often primarily restricted to side-chains. In contrast protein/DNA complex formation generally involves substantial changes to the DNA backbone and this raises the question of whether one can use the initial rigid-body search. We have investigated modelling protein repressor/DNA complexes as this system does not exhibit gross distortions to the DNA (Aloy et al., 1998).

The following modifications to the protein-protein docking approach were made. FTDOCK used a specific charge set for DNA that exaggerates the partial charges on the chemical groups in the DNA helix groove as these charges provide specificity of recognition. In contrast the phosphate charges on the backbone were damped. The distance filtering exploited knowledge of the recognition base sequence. Rotations were in 12° steps. A set of nucleotide/amino-acid pair potentials was generated from protein/DNA complexes (excluding those being modelled). The best parameters for screening were found to be the molar fraction model for the random state with d_cut of 12 Å but using a sparse matrix that did not score interactions involving hydrophobic residues (only scoring C, D, E, G, H, K, N, Q, R, S and T).

3 - Results

3.1) Protein-protein complexes

Table I presents the results of the strategy applied to six enzyme-inhibitor and four antibody-antigen complexes. The starting coordinates were from the unbound molecules with the exception of the HyHEL5 and HyHEL10 antibodies (for details of coordinate sets see Gabb et al., 1997). A correct solution was taken as one with rms difference for the Cα atoms at the interface of < 2.5 Å. The distance constraint filter used in FILTR was that one of the three active site residues of the enzyme should contact any part of the inhibitor or that one residue from either complementarity determining region L3 or H3 contacted the antigen. Out of the initial list of 4,000 complexes, applying FILTR and then ordering by shape complementarity yielded set 1 in which a good solution was placed between rank 3 and 130 for the enzyme/inhibitors and between 39 and 226 for modelling antibody docking. No solution considered correct was generated for the subtilisin/subtilisin inhibitor complex.

The set 1 results from FTDOCK were then screened by RPDOCK. The use of pair potentials (RPDOCK) provides a powerful approach for screening. For the enzyme/inhibitor complexes no more than 7 alternatives would have to be examined. Antibody/antigen

Table I - Discrimination provided by generation and screening of docked protein-protein complexes.

Columns: (1) total no. after FTDOCK & FILTR (making set 1); (2) N < 2.5 Å in FTDOCK list; (3) rank FTDOCK in set 1; (4) rank RPDOCK in set 1 (making set 2); (5) rank MULTIDOCK in set 1; (6) rank MULTIDOCK in set 2 (i.e. after RPDOCK); (7) rms Cα atoms (Å).

System                     (1)   (2)   (3)   (4)   (5)   (6)   (7)
αCHYN-HPTI                  94     1     3     1     2     1   2.0
αCHY-ovomucoid              86     5    11     3     1     1   1.1
Kallikrein-BPTI            363    18   130     5     2     1   1.0
Subtilisin-CHY I            26     2     8     1    12     2   2.0
Subtilisin-subtilisin I      -     -     -     -     -     -     -
Trypsin-BPTI               228     8    16     7    26     4   1.7
D1.3-lysozyme              694     2   168    34   235    84   1.8
D44.1-lysozyme             586     5    39    18   108    42   2.5
HyHEL5-lysozyme            516     2   226    97    31    23   1.5
HyHEL10-lysozyme           756     5    62   169    13     4   1.1

αCHYN - α-chymotrypsinogen, αCHY - α-chymotrypsin, HPTI - human pancreatic trypsin inhibitor, BPTI - bovine pancreatic trypsin inhibitor, CHY I - chymotrypsin inhibitor, Subtilisin I - subtilisin inhibitor. D1.3, D44.1, HyHEL5 and HyHEL10 are monoclonal antibodies. In the table some degenerate identical complexes included in our earlier studies have been excluded.
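Set 1 in Table I is what remains after the FILTR distance-constraint screen. A minimal sketch of such a screen follows, assuming that each residue is reduced to its Cα position and an effective side-chain radius. Only the 4.5 Å margin is taken from the text; the radii, the data layout and the function names are hypothetical.

```python
import math

MARGIN = 4.5  # Angstroms, added to the sum of the effective side-chain radii


def in_contact(res_a, res_b, margin=MARGIN):
    """Each residue is ((x, y, z), radius); compare the C-alpha distance to the cut-off."""
    (pos_a, rad_a), (pos_b, rad_b) = res_a, res_b
    return math.dist(pos_a, pos_b) < rad_a + rad_b + margin


def passes_constraint(key_residues, partner_residues):
    """Keep a complex if any key residue contacts any residue of the partner."""
    return any(in_contact(k, p) for k in key_residues for p in partner_residues)


def screen(complexes):
    """complexes: list of (key_residues, partner_residues) pairs; keep those that pass."""
    return [c for c in complexes if passes_constraint(c[0], c[1])]


# Hypothetical example: one key residue and two alternative partner placements.
key = [((0.0, 0.0, 0.0), 2.0)]
near = [((7.0, 0.0, 0.0), 1.0)]   # 7.0 A apart, cut-off 2.0 + 1.0 + 4.5 = 7.5 A
far = [((8.0, 0.0, 0.0), 1.0)]    # 8.0 A apart, beyond the cut-off
kept = screen([(key, near), (key, far)])
```

Requiring only one contact among several key residues mirrors the constraint described for the enzyme active sites, where any one of the named residues may touch the partner.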

Table 2 - Discrimination provided by generation and screening of docked repressor/DNA complexes

Columns: (1) total no. after FTDOCK & FILTR (set 1); (2) rank FTDOCK in set 1; (3) N with ≥ 65% correct contacts in FTDOCK; (4) rank RPDOCK in set 1; (5) % correct contacts; (6) rms (Å).

System     (1)    (2)   (3)   (4)   (5)   (6)
ARC       1232     91    11     1    69   4.1
CRO        570     12     2   121    80   3.0
GAL       1470     37     4     2    75   3.6
LAC        800     30     6   133    77   4.0
LAMBDA     889     22     6     4    98   3.0
MET       1017      -     -     -     -     -
PURINE    1444      9    13    28   100   4.3
TRP        564      4     6     1    74   4.2

rms is for equivalenced Cα protein and C1' DNA atoms in the entire complex.
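The RPDOCK ranks in Table 2 rest on a log-odds score with a molar-fraction reference state. The sketch below works through that arithmetic on a hypothetical alphabet of two residue types with made-up counts; the paper uses 20 residue types for proteins and nucleotide/amino-acid pairs for DNA. The function names and numbers are illustrative assumptions, and the accessibility cut-off is omitted.

```python
import numpy as np


def log_odds(pair_counts, type_counts):
    """S_ab = log10(F_ab / E_ab) with E_ab = T * x_a * x_b (x = molar fractions)."""
    f = np.asarray(pair_counts, dtype=float)
    n = np.asarray(type_counts, dtype=float)
    total = f.sum()                      # T, the total number of pairings
    x = n / n.sum()                      # molar fraction of each residue type
    expected = total * np.outer(x, x)    # E_ab under purely compositional pairing
    return np.log10(f / expected)        # assumes every F_ab is greater than zero


def complex_score(scores, contacts):
    """Sum of S_ab over the contacting pairs, divided by the number of pairs."""
    return sum(scores[a, b] for a, b in contacts) / len(contacts)


# Hypothetical counts for two residue types (0 and 1): like pairs are
# over-represented, so they score positive and unlike pairs score negative.
S = log_odds([[4, 1], [1, 4]], [1, 1])
print(complex_score(S, [(0, 0), (0, 1)]))
```

Dividing by the number of contacting pairs, as the text describes, keeps large interfaces from scoring well simply because they contain more pairs.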

Figure 2 - Docking of bovine pancreatic trypsin inhibitor to trypsin

[Figure: panels 2a and 2b; side-chain labels Arg 17 and Lys 15]

Fig 2a (left) - The results from FTDOCK with BPTI being the upper molecule. The Cα trace shows in black the X-ray structure of the complex and in white the predicted FTDOCK complex superimposed. Fig 2b - The prediction of the inhibitor loop of BPTI showing three side-chains central to the interaction with trypsin (not shown but would be below). Black for X-ray structure of complex, white for prediction by FTDOCK and grey for prediction after refinement using MULTIDOCK. Fig 2a reproduced from Sternberg et al. (1998) with permission.
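The side-chain refinement shown in Fig 2b rests on the self-consistent mean field iteration of Section 2.4. A minimal sketch follows, assuming that every residue has the same number of rotamers, that the internal, main-chain and solvent terms are pre-summed into one fixed array, and that RT is about 0.593 kcal/mol (room temperature). All names and energies are hypothetical and this is not the MULTIDOCK implementation. Plain substitution, as used here, can oscillate on strongly coupled inputs, so the loop is capped.

```python
import numpy as np

RT = 0.593  # kcal/mol near 298 K (assumed energy units)


def mean_field(fixed_energy, pair_energy, rt=RT, max_iter=200, tol=1e-8):
    """Iterate rotamer probabilities CM(i, k) to self-consistency.

    fixed_energy: (N, K) array, probability-independent terms per rotamer.
    pair_energy:  (N, K, N, K) array VS[i, k, j, l]; entries with j == i are zero.
    """
    n_res, n_rot = fixed_energy.shape
    cm = np.full((n_res, n_rot), 1.0 / n_rot)      # start from uniform probabilities
    for _ in range(max_iter):
        # E(i,k) = fixed terms + sum over other residues of CM(j,l) * VS(ik, jl)
        energy = fixed_energy + np.einsum("ikjl,jl->ik", pair_energy, cm)
        # Boltzmann weights; subtracting the row minimum only improves stability.
        weights = np.exp(-(energy - energy.min(axis=1, keepdims=True)) / rt)
        new_cm = weights / weights.sum(axis=1, keepdims=True)
        if np.max(np.abs(new_cm - cm)) < tol:
            return new_cm
        cm = new_cm
    return cm


# Hypothetical two-residue example: rotamer 0 of residue 0 is strongly
# favoured and clashes (+10) with rotamer 0 of residue 1.
fixed = np.array([[-20.0, 0.0], [0.0, 0.0]])
pair = np.zeros((2, 2, 2, 2))
pair[0, 0, 1, 0] = pair[1, 0, 0, 0] = 10.0
probabilities = mean_field(fixed, pair)
predicted = probabilities.argmax(axis=1)   # highest-probability rotamer per residue
```

The predicted structure is read off as the highest-probability rotamer of each residue, as in the text; a production code would also need the probability-dependent solvent term and per-residue rotamer counts.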

Figure 3 - Docking of protein repressors to DNA

[Figure: panels CRO and TRP]

Predicted (black) and X-ray (grey) structures of the cro and trp repressor/DNA complexes. Proteins shown as Cα trace and DNA as the phosphate backbone.

complexes were less successful with between 18 and 169 complexes to examine.

The discriminatory power of MULTIDOCK was studied considering the set 1 output from FTDOCK. MULTIDOCK by itself yielded better ranks for 3 of the 5 enzyme/inhibitor systems considered and was a powerful screen for the numerous false positives in the Kallikrein/BPTI system. For screening the antibody system MULTIDOCK did not prove successful.

The complete strategy was then evaluated in which the results from RPDOCK (i.e. set 2) were ordered by pair potential score and then the top 10% of this list were considered for the enzymes and the top 40% for antibodies. This set was then screened by MULTIDOCK. For the enzyme/inhibitor complexes the lowest rank of the first correct solution was 4 and thus for 5 of the 6 systems considered a very limited list of complexes can now be examined as plausible predicted models. The strategy also provided discrimination for the antibody/antigen complexes.

Figure 2 illustrates the agreement between the predicted and the X-ray structure for BPTI (bovine pancreatic trypsin inhibitor) with trypsin. The results of FTDOCK (Fig 2a) are shown from a view that highlights the differences. The interface rms for the Cα atoms is 1.7 Å and for all Cα atoms is 1.5 Å. The inhibitory loop has successfully been located in the active site of the enzyme. Refinement using MULTIDOCK is illustrated by the side-chains of the inhibitor as there are only minor conformational changes in the enzyme (Fig 2b). The conformation of the critical side-chain Lys 15 is improved and now points towards the enzyme. The conformation of Arg 17 is improved but FTDOCK places Arg 39 too far away for successful refinement. This result is typical of the refinement of the enzyme/inhibitor system, showing that the conformation of several, but not all, of the important side-chains can be modelled accurately.

In general, in our protein-protein docking, modelling the antibody/antigen systems was less successful than the enzyme/inhibitors. This may reflect that the enzyme/inhibitor system has evolved to a near optimal recognition. In contrast the particular antibody that recognises an antigen has not evolved just to recognise that antigen but is simply chosen from the available repertoire of antibodies. Indeed the affinity for antibody/antigen complexes tends to be one order of magnitude less than for the enzyme/inhibitors studied here. The distinction between these two types of systems has also been discussed by Lawrence and Colman (1993). The individual poor result for docking subtilisin with its inhibitor is because FTDOCK fails to generate any solution with an interface Cα rms of better than 2.5 Å. Subsequent trials suggested that a finer rotational scan at 12° would have enabled a solution to be found.

3.2) Repressor / DNA complexes

The procedure was applied to eight repressor/DNA complexes starting with unbound repressor for all proteins (except the LAMBDA repressor) and with model built B-DNA (for details see Aloy et al., 1998). The conformational changes on association were sufficiently great that we found that rms differences did not provide a good score for a correct model. Instead we required that at least 65% of the contacts should be correctly modelled (to an error of 4 Å). The distance filter was that one of the two central base pairs in the recognition sequence should contact any part of the repressor. The MET repressor could not be modelled as no solution was within the 65% correct contact criterion. This is a consequence of a loop present in the unbound coordinates that moves away from the DNA on complex formation. Apart from this, no more than 91 solutions ranked by FTDOCK shape complementarity would need to be examined to find a correct solution. Ranking by the empirical score for pairing (RPDOCK) is effective in finding a solution in the top 4 positions, but only for 4 of the 7 complexes (ignoring MET). Thus RPDOCK can be useful in suggesting a few probable complexes as the basis for experimental verification. Apart from MET, the discrimination in modelling some systems compared to others cannot simply be explained from a consideration of the magnitude of the conformational change on docking (quantified by protein Cα or DNA C1' rms). Figure 3 illustrates the agreement of the predicted and X-ray structures of two repressor/DNA complexes.

4 - Discussion and Conclusions

A computational system has been developed to perform rigid-body docking and subsequent screening of docked protein-protein and protein-DNA complexes. In addition an approach is available to refine the side-chain conformation for protein-protein complexes. The strategy has been benchmarked on several systems starting from unbound coordinates. It is shown that for useful modelling one requires the additional use of knowledge of a binding residue in one of the two proteins or of the two central base pairs for DNA/repressor docking. Then for most, but not all, of the complexes studied, modelling combined with this distance constraint can yield a very limited list of complexes one of which will be a model close to the true complex.

There are a variety of other algorithms for protein-protein docking (for reviews see Janin, 1995; Shoichet and Kuntz, 1996; Sternberg et al., 1998). The first stage of our method (FTDOCK) is similar to the original method by Katchalski-Katzir et al. (1992) upon which it is based. Vakser (1995) has also developed this method to consider low-resolution docking. In contrast to these two other approaches, FTDOCK includes a consideration of electrostatics in addition to shape complementarity. The use of pair potentials in RPDOCK to screen docked structures has not, to our knowledge, been previously evaluated. The final stage of side-chain refinement by MULTIDOCK is a new combination of the treatment of solvation by a Langevin dipole model (Luzhkov & Warshel, 1992) with a mean field approach (Koehl & Delarue, 1994; Lee et al., 1994) to identify optimal rotamers.

Protein-DNA docking has also been studied by Knegtel et al. (1994) who used Monte Carlo simulations to study shifts of a repressor within the DNA groove by a few base pairs. To our knowledge, previously there has not been a systematic study to perform a global search starting with unbound coordinates for repressors docking to DNA.

There are several other approaches for rigid-body docking. The widely used DOCK algorithm by Shoichet and Kuntz (1991) matches overlapping spheres from the cavity in one molecule with the atoms of the other. Cherfils et al. (1991) match surfaces and evaluate buried area followed by energy refinement of the complex. Duncan and Olson (1993) match smoothed molecular surfaces. Subsequent screening methods include the continuum model for the hydrophobic and electrostatic effects (Jackson and Sternberg, 1995; Weng et al., 1996).

Comparisons of protein docking algorithms are difficult because of the different test systems used by the groups. Our approach meets the following criteria that we consider important in any effective approach. First, more than a few test systems were considered without any variation in the parameters. Second, in general both starting coordinate sets were from unbound molecules. Finally, there was no selective pruning of side chains that undergo conformational change on association.

Recently there have been two blind trials of protein docking that provide some basis to evaluate the different approaches. The first trial was predicting the docking of the unbound coordinates of β-lactamase and its inhibitor (Strynadka et al., 1996). The best prediction was by the Katchalski-Katzir (1992) algorithm with rms of 1.1 Å. The three other groups that performed the original search using established algorithms (Shoichet and Kuntz, 1991; Cherfils et al., 1991; Duncan and Olson, 1993) and our group that considered only screening of previously docked complexes (Jackson and Sternberg, 1995) were able to identify a solution close to the true complex (≤ 2.5 Å Cα rms for the enzyme and inhibitor). The second trial was a far larger complex, docking the coordinates in the bound state of antibody to unbound haemagglutinin coordinates (Dixon, 1997). Two of the four groups that entered were able to suggest a model with Cα rms for the interface atoms < 10 Å. Vakser (1997), using his low-resolution implementation of the Katchalski-Katzir algorithm, provided a single prediction with an rms of 9.5 Å. We used preliminary versions of FTDOCK and MULTIDOCK and submitted 8 solutions, one of which was the closest model (8.5 Å rms) but the others were far poorer. Subsequently we showed that a major limitation in our modelling was the grid size of the search. The version of FTDOCK was not parallelised and, because of the size of the target, the grid resolution was about 1.5 Å. We can now perform the search with a 0.8 Å grid and a limited set (around 10) of models can be suggested, one of which would be a very good model with an interface rms < 1.5 Å. This re-run clearly benefits from hindsight but nevertheless it shows that our approach can tackle large complexes and yield accurate models.

As there have only been two blind trials and several approaches were successful in the first, one cannot identify any method as clearly superior to others. Further blind trials are therefore required to identify which docking approaches are accurate and robust. In addition, use by the community is another important method to validate software and the software suite is being made available via our web site (http://www.icnet.uk/bmm). This study and the results of the blind trials suggest that computer-based macromolecular docking is now likely to be a powerful strategy to obtain structural and functional insights for macromolecular complexes.

Acknowledgements

P. Aloy was supported by a short-term EMBO fellowship; H. Gabb by a long-term EMBO fellowship and R. Jackson by the Lloyd's of London Tercentenary Foundation. The collaboration is also supported by Spain/UK Acciones Integradas (3279).

References

Aloy, P., Moont, G., Gabb, H. A., Querol, E., Aviles, F. X. & Sternberg, M. J. E. (1998). Modelling repressor proteins binding to DNA. Submitted.

Cherfils, J., Duquerroy, S. & Janin, J. (1991). Protein-protein recognition analyzed by docking simulation. Proteins 11, 271-280.

Dixon, J. S. (1997). Evaluation of the CASP2 docking section. Proteins Supplement 1, 198-204.

Duncan, B. S. & Olson, A. J. (1993). Shape analysis of molecular surfaces. Biopolymers 33, 231-238.

Gabb, H. A., Jackson, R. M. & Sternberg, M. J. E. (1997). Modelling Protein Docking using Shape Complementarity, Electrostatics and Biochemical Information. J. Mol. Biol. 272, 106-120.

Jackson, R. M., Gabb, H. A. & Sternberg, M. J. E. (1998). Rapid refinement of protein interfaces incorporating solvation: application to the docking problem. J. Mol. Biol. 276, 265-285.

Jackson, R. M. & Sternberg, M. J. E. (1995). A continuum model for protein-protein interactions: Application to the docking problem. J. Mol. Biol. 250, 258-275.

Janin, J. (1995). Protein-protein recognition. Prog. Biophys. Molec. Biol. 64, 145-166.

Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A. A., Aflalo, C. & Vakser, I. A. (1992). Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. USA 89, 2195-2199.

Knegtel, R. M. A., Boelens, R. & Kaptein, R. (1994). Monte Carlo docking of protein-DNA complexes: incorporation of DNA flexibility and experimental data. Prot. Eng. 7, 761-767.

Koehl, P. & Delarue, M. (1994). Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J. Mol. Biol. 239, 249-275.

Lawrence, M. C. & Colman, P. M. (1993). Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234, 946-950.

Lee, K. H., Xie, D., Freire, E. & Amzel, L. M. (1994). Estimation of changes in side chain configurational entropy in binding and folding: general methods and application to helix formation. Proteins 20, 68-84.

Luzhkov, V. & Warshel, A. (1992). Microscopic models for quantum mechanical calculations of chemical processes in solution: LD/AMPAC and SCAAS/AMPAC calculations of solvation energies. J. Comput. Chem. 13, 199-213.

Moont, G., Gabb, H. A. & Sternberg, M. J. E. (1998). Use of pair potentials across protein interfaces in screening predicted docked complexes. Submitted.

Shoichet, B. K. & Kuntz, I. D. (1996). Predicting the structure of protein complexes: a step in the right direction. Chemistry & Biology 3, 151-156.

Sitkoff, D., Sharp, K. A. & Honig, B. (1994). Accurate calculation of hydration free energies using macromolecular solvent models. J. Phys. Chem. 98, 1978-1988.

Skolnick, J., Jaroszewski, L., Kolinski, A. & Godzik, A. (1997). Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? Protein Science 6, 676-688.

Shoichet, B. K. & Kuntz, I. D. (1991). Protein docking and complementarity. J. Mol. Biol. 221, 327-346.

Sternberg, M. J. E., Gabb, H. A. & Jackson, R. M. (1998). Predictive docking of protein-protein and protein-DNA complexes. Current Opin in Structural Biology, in the press.

Strynadka, N. C. J., Eisenstein, M., Katchalski-Katzir, E., Shoichet, B., Kuntz, I., Abagyan, R., Totrov, M., Janin, J., Cherfils, J., Zimmerman, F., Olson, A., Duncan, B., Rao, M., Jackson, R., Sternberg, M. & James, M. N. G. (1996). Molecular docking programs successfully determine the binding of a beta-lactamase inhibitory protein to TEM-1 beta-lactamase. Nature Structural Biology 3, 233-238.

Thomas, P. D. & Dill, K. A. (1996). Statistical potentials extracted from protein structures: how accurate are they? J. Mol. Biol. 257, 457-469.

Vajda, S., Sippl, M. & Novotny, J. (1997). Empirical potentials and functions for protein folding and binding. Curr. Opin. in Struct. Biol. 7, 222-228.

Vakser, I. A. (1997). Evaluation of GRAMM low-resolution docking methodology on the hemagglutinin-antibody complex. Proteins Supplement 1, 205-209.

Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S. & Weiner, P. (1984). A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 106, 765-784.

Weng, Z., Vajda, S. & Delisi, C. (1996). Prediction of protein complexes using empirical free energy functions. Prot. Sci. 5, 614-626.
