PHS 398 (Rev. 9/04), Continuation Page s1

Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P.

Refinement of Comparative Protein Models A. Specific Aims We aim to create and test computational methods capable of refining comparative protein structure models to an accuracy comparable to that of moderate to high resolution experimental structures. Our overall strategy is to dramatically improve the efficiency of sampling, which has limited the success of prior efforts at comparative model refinement, by a combination of (a) identifying which degrees of freedom are critical to sample, (b) developing new algorithms for making large moves along these degrees of freedom, and (c) using experimental data, if available, to help constrain the search space. We will 1. Implement a method for refining homology models based on methods from kinematics. The approach combines two powerful strategies for improving the efficiency of sampling. The first strategy is to replace the small, femtosecond motions of molecular dynamics sampling with large, concerted motions of critical structural elements, such as loops. These Monte Carlo (MC) moves occur in dihedral angle space, and are guided by kinematics, which has been used to solve similar problems in the field of robotics. The second strategy is to adopt the powerful multiple temperature sampling schemes (replica exchange), which dramatically increase sampling efficiency and are simple to implement with inexpensive Linux clusters. 2. Implement a method for refining comparative models based on remote homologs by combining the “zippers” strategy for sampling protein backbones with restraints derived from bioinformatics. The zippers method was originally developed to increase the sampling efficiency for all-atom protein folding by preferentially sampling local contacts. Here, the method will be adapted to refine comparative models by focusing the sampling on those portions of the structure that are poorly constrained by the sequence alignment. Two key features of this approach are that it will be (i) tolerant of modest sequence alignment errors, and (ii) capable of providing models for large insertions, which are not aligned to template residues. 3. Integrate information in template structures, a molecular mechanics force field, and crystallographic data to maximize the number and accuracy of protein structures determined by molecular replacement. Crystallographic structure determination by molecular replacement is limited in two model building aspects: (i) it is sensitive to the inaccuracy of the starting models and (ii) the ensuing refinement of the molecular replacement solution has a relatively small radius of convergence. We will address the first problem by an iterative process of target-template alignment, model building, and model assessment based partly on crystallographic data; and the second problem by several algorithms that refine a given model based on crystallographic data, a molecular mechanics force field, and information in the template structures. This work will involve close collaborations with scientists participating in the Protein Structure Initiative. 4. Evaluate the methods developed in Aims 1–3, including the utility of the models for computer-aided inhibitor discovery. We will subject the methods in Aims 1–3 to blind predictions in CASP and CAFASP, and on an on-going basis using EVA-CM. In addition, we will directly assess the utility of the models generated for structure-based inhibitor design, an important but challenging application. The ability to create comparative models with accuracy comparable to experimental structures would open up a vast new set of protein targets to structure-based drug design methods. Common methods of assessing comparative models focus on geometrical accuracy, and do not directly assess the aspects of binding sites required for inhibitor design applications. We will directly assess the quality of the models for this purpose by performing docking enrichment studies on comparative models generated by Aims 1–3 for proteins with known inhibitors. This work will involve close collaborations with experimental groups who will test our methods in inhibitor discovery projects. Aims 1–3 address both Comparative Modeling Goals identified by RFA-GM-05-008, “High Accuracy Protein Structure Modeling”. Our approach integrates methods grounded in bioinformatics (Sali), physics (Jacobson and Dill), and applied mathematics (Coutsias). Close collaborations among these researchers, as well as Dr. Shoichet (Aim 4), is facilitated by most of the researchers being located together at UCSF; our budget also allocates funds for Dr. Coutsias (U. New Mexico, Dept. of Mathematics and Statistics) to spend several months a year at UCSF. The tangible outcome of this research will be a set of freely available modular source codes and executable programs that implement the methods in Aims 1–3.

PHS 398/2590 (Rev. 09/04) Page 1 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P.

B. Background and Significance Comparative structure prediction, in conjunction with the Protein Structure Initiative (PSI), has the potential to bridge the gap between the number of available protein sequences (>1 million) and structures (>20,000). The experience of the Sali group in constructing ModBase, a database currently containing over one million protein comparative models, highlights this potential. The fraction of sequences with comparative models for at least one domain is currently 57%.1 This number will continue to grow as the PSI enters its production phase. The New York Structural Genomix Research Consortium documented the number and quality of the comparative models that could be built based on their new structures. On average, about 100 protein sequences without any prior structural characterization could be modeled for each new structure.1 The accuracy of these models, however, varies significantly, with many accurately representing the overall tertiary structure, but relatively few (<10%, i.e., those with >50% sequence identity) expected to be as accurate as moderate resolution experimental structures (1–2 Å RMSD). While the CASP and CAFASP competitions have shown some measurable progress over time in the accuracy of comparative models2,3, they indicated little ability to refine comparative models to an accuracy better than the template protein. The research we propose addresses this critical bottleneck to high accuracy protein comparative modeling. The three major sources of inaccuracy in comparative protein models are 1) incorrect choice of template protein, 2) inaccuracy in aligning the target sequence to the template, and 3) inability to routinely refine comparative models, i.e., to predict conformations of residues that do not align to the template, structural differences between the target and template proteins in aligned regions, and critical details such as side chain conformations. Improved methods of sequence alignment, fueled by the ever-growing databases of protein sequences4 and structures as well as algorithmic improvements, will contribute to the first two of these challenges. The research we propose focuses on the third of these challenges, model refinement, but will also contribute to identifying correct templates and alignments. Specifically, we propose methods for conformational sampling and scoring of comparative protein models, generated by any alignment and model building protocols. The new algorithms we develop will be capable of refining individual models to improve the accuracy, and choosing the most accurate model among several generated from different templates and/or different alignments. In short, we aim to improve the accuracy of comparative models by identifying the global free energy minimum of the protein sequence. This is a very challenging undertaking, despite the fact that the initial model(s) should be “close” to the global minimum, in the sense that at least the tertiary fold is correct, as long as the correct template is chosen. Success requires both adequate sampling and accurate scoring, and these two imperatives work against each other: more accurate scoring functions generally entail greater computational expense, reducing the amount of sampling that can be accomplished with fixed computer time. Our strategy is to develop methods that dramatically improve the efficiency of sampling, by a combination of 1) identifying which degrees of freedom are critical to sample, 2) developing new algorithms for making large moves along these degrees of freedom, and 3) using experimental data, if available, to help constrain the search space. In Section B.1, we argue that existing energy models, particularly those that treat the protein at an atomic level of detail, are accurate enough to be useful in refining comparative models. Then in Section B.2, we review available methods for protein sampling, and outline our approach to improving sampling efficiency for comparative model refinement. In Section B.3 we discuss the role that experimental data can play in aiding comparative model refinement, and identify low-resolution data from xray crystallography as a neglected but potentially very useful source of data for this purpose. Finally, in Section B.4, we return to the potential impact of new methods for high-accuracy comparative modeling, highlighting the role that comparative models can play in structure-based inhibitor discovery, and the challenges that confront this goal.

B.1 Refinement of Comparative Models: Scoring Several lines of evidence suggest that currently available all-atom energy functions, although they have limitations5, are capable of the accuracy required to achieve our goal (i.e., refining comparative models to 1–2 Å RMSD). We focus attention on scoring functions composed of all-atom force fields and implicit solvent models. These are used as the primary scoring functions in Aims 1 and 2 due to the attractive balance they provide between accuracy and computational efficiency. However, the sampling methods that we develop can

PHS 398/2590 (Rev. 09/04) Page 2 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. be used with virtually any all-atom scoring function, whether physics- or knowledge-based. Studies that provide grounds for optimism that all-atom energy functions can provide the necessary accuracy include: 1. Decoy studies, where energy functions are evaluated by their ability to distinguish native from non- native protein structures6,7, including studies that have focused on force fields such as CHARMM8, OPLS9, and Amber10 in combination with Generalized Born solvent models. Feig and Brooks also demonstrated that the CHARMM force field with GB or PB implicit solvent performed well in identifying the most accurate models among those submitted to CASP411. 2. All-atom protein folding simulations. Although these results are limited to relatively small systems (<60 residues), they nonetheless represent a stringent test of energy functions, since they involve sampling huge number of diverse protein conformations. For example, Dill et al. have used the AMBER/GB energy function to successfully fold 3 small proteins. In extensive replica exchange molecular dynamics (REMD) sampling of the conformational space of GB1, a 56-mer protein, the state of lowest free energy throughout that space is within 1.5 Å RMSD from the native structure. Other successful folding simulations with similar energy functions include several studies by Pande12-14. 3. Side chain and loop prediction. Numerous papers have tested sampling and scoring schemes for placing side chains on a native protein backbone, or reconstructing protein loops in a native protein. These tests provide an upper limit on the accuracy that could be expected for side chains and loops in more realistic homology modeling applications. Jacobson and co-workers, using the all-atom OPLS force field15- 17 and a Generalized Born implicit solvent model18,19, demonstrated the highest accuracy yet reported for predicting the conformations of loops with 4–12 residues20; other high-accuracy results in the literature have used similar molecular mechanics energy functions21. In these tests, thousands of loop conformations are generated, and the energy function shows a robust ability to identify native-like conformations, as shown in Figure 1. In an extension of this work, Li et al. combined rigid sampling of helices with prediction of the adjacent loops, to permit partial refinement of tertiary packing22. In tests analogous to those employed to validate loop prediction, i.e., predicting the loop-helix-loop region keeping the rest of the protein fixed at the native conformation, the average RMSD for the helix regions was 0.9 Å over test 36 cases.

Mean Median # # RMSD RMSD Res. Cases (Å) (Å)

4 35 0.2 0.2

6 160 0.5 0.3

8 66 0.8 0.4

10 40 1.2 0.5

Figure 1. Loop prediction results from Jacobson et al.20. Left: Statistics of loop prediction on 4, 6, 8, and 10 residue loops. Right: Example of local energy minima identified for a nine residue loop in 3pte (residues 78–86), during the three- stage prediction algorithm. These results and others suggest that, in most cases, the OPLS/GB energy function is capable of identifying near-native conformations, and dihedral-angle based sampling is capable of generating near-native conformations. Given that it is possible to identify accurate structures among decoys and reconstruct portions of proteins with high accuracy with currently available energy functions, why is it not possible to routinely refine homology models to comparable levels of accuracy? One critical bottleneck is sampling. A typical homology model requires refinement of several loops simultaneously, and may also exhibit distortions or incorrect lengths of secondary structure elements. Sampling of all of these degrees of freedom simultaneously represents a major challenge, but we argue in the next section that this challenge is not insurmountable.

PHS 398/2590 (Rev. 09/04) Page 3 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P.

B.2 Refinement of Comparative Models: Sampling Previous attempts to refine homology models using molecular mechanics methods have been largely unsuccessful. Reviews of protein structure prediction methods have bluntly reported that efforts to use energy minimization and molecular dynamics (MD) to refine initial models generally fail, frequently producing structures with significantly increased RMSD to the native state23,24. A few attempts have been made to use restrained MD to refine comparative models25-28. These methods remain computationally intensive, particularly when explicit solvent is employed, because long simulation times are required to surmount the energy barriers separating the native state and the starting model. The key problem with using MD for searching conformational space is the large gap between the size of a MD timestep and the timescale for converting between qualitatively different conformations. While femtosecond time steps must be taken to capture essential atomic motions, simulations of at least microseconds to milliseconds in duration must be performed. For example, in the area of all-atom protein folding, in 1997, Duan and Kollman achieved a major computer parallelization milestone: they ran the villin headpiece (a 36-mer) for 6 months on the SDSC supercomputer. Although they did not fold this small protein, they reached to within 4.5 Å of the native state29. It is difficult to estimate the timescale required for sampling the native state starting from an initial comparative model with significant errors, but the lack of success with refining comparative models using MD suggests that the timescale is not short, and is probably comparable to protein folding timescales. Although the initial homology model may be relatively close to the native state (a few Å RMSD), the effective energy barriers may be high, because the motions required to move towards the native state are “frustrated” by the tightly packed environment. Given this challenge, and given that prior efforts to refine homology models with MD have largely failed, why do we believe that refinement using all-atom energy function can work, in our hands, now? Our overall strategy is to sample critical degrees of freedom using highly efficient move sets, and to use all available knowledge, including bioinformatics analysis and experimental data, to constrain the search space. Specific elements of our strategy are the following: 1. Most importantly, we propose new sampling methods that avoid the inherent inefficiency of MD, by not being constrained by small, femtosecond steps. In Aim 1, we propose to refine homology models using a Monte Carlo strategy, where the moves involve large, concerted motions of critical structural elements, such as loops. The MC moves are informed by kinematics, which has been used to solve similar problems in the field of robotics. These moves perturb only dihedral angles, which avoids unnecessary sampling of high frequency modes like bond stretching, and do not perturb the overall tertiary structure. In Aim 2, we propose to refine comparative models based on remote homologs by adapting the “zippers” strategy, which dramatically improves the efficiency of protein conformational sampling by attempting to form contacts only if they are spatially proximate, due to locality in the sequence, or due to locality arising from previously formed contacts. 2. We focus sampling on those portions of the protein model that are expected to be least accurate, based on information from bioinformatics, i.e., analysis of the sequences and structures of homologous proteins. At the simplest level, the protein cores are usually better conserved than loops, where most gaps in the sequence alignment are located. The initial comparative models to be subjected to refinement will be generated by MODELLER30-37, which will quantify the expected level of uncertainty in the coordinates, based primarily on the multiple sequence alignment; the sampling methods will then use this information to preferentially sample those portions of the models with the least certainty. Other “knowledge-based” aspects of our methods include the use of side chain rotamer libraries and Ramachandran-allowed regions for backbone degrees of freedom, which avoid sampling conformations that do not occur in native proteins. 3. In both Aims 1 and 2, we will exploit simple but powerful multiple temperature sampling methods, primarily Replica Exchange, also referred to as Parallel Tempering38,39. These methods improve the efficiency of sampling by periodically exchanging conformations that run in parallel at different temperatures; the higher temperatures allow energy barriers to be traversed more efficiently. Replica Exchange finds minima of free energies, not energies, and this can be critical for predicting good structures using all-atom force-fields. Tests by the Dill group and others have confirmed that REMD is far superior to standard MD or Monte Carlo minimization methods. Replica Exchange is simple to implement and is PHS 398/2590 (Rev. 09/04) Page 4 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. ideally suited to cheap, scalable, loosely-coupled Linux clusters. The investigators on this proposal together have over 1000 CPUs at their disposal. 4. Finally, when experimental data is available, it can also be used to focus the sampling effort, as discussed in the next section and in Aim 3.

B.3 Comparative Modeling and Experimental Structure Determination At least in principle, all protein structure determination and prediction methods can be expressed as an optimization of a scoring function. They differ in (i) information about the structure encoded in a scalar scoring function and (ii) optimization of the scoring function that delivers good scoring conformations. The defining aspect of a method is the type of information, which can be derived by experiment (e.g., structure factor amplitudes from x-ray crystallography and distance restraints from NMR spectroscopy), from a physical theory (e.g., a molecular mechanics force field), or from statistical preferences observed in known protein structures (e.g., statistical potentials of mean force and geometrical restraints from homologous known structures). The difficulty of protein structure characterization encourages hybrid methods, in which information from different sources is combined to compensate for the shortcomings of the individual approaches. For example, known structures of proteins related to the target have been used to facilitate structure determination by x-ray crystallography40-43, NMR spectroscopy44-46, and electron cryo-microscopy (cryoEM)47-49. Protein crystallographers already supplement the relatively sparse experimentally determined structure factor amplitudes and potentially their phases with a molecular mechanics force field50. This synthesis is needed because of the large number of degrees of freedom in a protein structure, the limited number of experimental observations, errors in the phases, and the fundamental problem of phase bias; the integration is achieved by including both experimental and physical information in a single scoring function used for model building. Protein crystallographers can also benefit from known protein structures that are related to the target sequence: An initial model of the target based on the related known structures can be used in molecular replacement to locate the target in a unit cell and obtain the initial electron density map for subsequent model refinement43. Many papers addressed the crystallographic aspects of molecular replacement, including different crystallographic target functions51-56 and optimization protocols for the translational and/or rotational searches57-62. In contrast, relatively little has been done on assessing and improving the ability of comparative modeling to provide a sufficiently accurate starting model. The dependence of molecular replacement on the accuracy of the target-template alignments has been explored63. Most importantly, even small differences in target-template alignments can make a difference to molecular replacement. This observation justifies a tight coupling between the sampling of the crystallographic and comparative modeling degrees of freedom, guided by a unified scoring function that depends on crystallographic data, template structures, and a molecular mechanics force field. In Aim 3, we propose such an integration that will produce more accurate models for more proteins than is possible either by comparative modeling or crystallography on its own. When only a relatively low-resolution (2.5-4.0 Å) crystallographic data set is available, information from template structures allow efficient calculation of more accurate models. And when only low-accuracy initial comparative models are available, the integrated method will result in increased robustness and applicability of molecular replacement. The impact of even a modest increase in the accuracy and applicability of molecular replacement is likely to be large. Approximately one half of the crystallographic structure depositions in the Protein Data Bank are obtained by molecular replacement, corresponding to 1764 structures in 200464. The estimated fraction of the genome to which molecular replacement is applicable increases from ~15% to ~30% when a more accurate profile-profile alignment is used for construction of the initial model instead of pairwise sequence alignment.63 Moreover, more robust molecular replacement protocols proposed in Aim 3 are expected to be valuable in determining many protein structures for which only medium resolution (2.5-4.0 Å) crystallographic data sets and remotely related template structures (less than 30% sequence identity) are available. Based on the experience of the New York Structural Genomics Research Consortium (NYSGXRC), we estimate that as many as 30% of structural genomics targets fall into this category. Thus, a tight integration of crystallography and comparative modeling may increase the accuracy of a fraction of the many structures produced by molecular replacement as well as produce hundreds of structures each year that would not be produced otherwise. This impact may shorten the structural genomics effort by a year or two, or, equivalently, provide a denser and more accurate structural coverage of the protein sequence space.

PHS 398/2590 (Rev. 09/04) Page 5 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. B.4 Comparative Modeling and Structure-Based Inhibitor Design The ability to create comparative models with accuracy comparable to experimental structures would open up a vast new set of protein targets to structure-based inhibitor design. In Aim 4, we propose to directly evaluate the suitability of our models for this task, specifically the use of docking to identify inhibitors. It is widely accepted that docking to comparative models is more challenging and less successful than docking to crystallographic structures. Nonetheless, comparative models have been used in conjunction with virtual screening to successfully identify novel inhibitors over the past few years, as reviewed recently by Jacobson and Sali65. A series of papers by Kuntz, McKerrow, Cohen, and co-workers, dating back to 1993, describe successful efforts to use comparative protein models to aid inhibitor design against cysteine proteases in several parasites66-69. Other proteins successfully targeted Enzyme % of ranked Enzyme % of ranked by docking against comparative models have included db to find db to find matriptase70, Bcl-271, retinoic acid receptor72, DHFR in T. 25% of 25% of cruzi73, human CK274, and CDK475. known known ligands ligands Despite these success stories, comparative models DHFR GART continue to play a relatively small role in structure-guided Holo* 2.0 Holo* 0.4 drug design. A few studies have attempted to quantify the Apo 3.9 Apo 30.0 accuracy of docking to comparative models by evaluating Model 10.7 Model 6.8 the ability to enrich known inhibitors relative to large

PNP - PO4 SAHH databases of “decoy” ligands. A thorough study by Holo 2.8 Holo* 1.1 McGovern and Shoichet compared the success of docking Apo 13.1 Apo 9.7 against ten enzymes, each in three different conformations: Model* 1.2 Model 20.5 holo (ligand bound), apo, and homology modeled76 (Table

PNP+PO4 AR 1). In general, the enrichment was best for holo structures Holo* 0.4 Holo* 2.8 and worst for the homology models, with significant Apo N.D. Apo 4.3 variation from case to case. Nonetheless, at one extreme, Model 0.9 Model 10.6 purine nucleoside phosphorylase, the modeled structure PARP AChE actually performed better than the holo structure. In other Holo* 2.8 Holo* 6.3 cases, the homology models performed poorly enough to Apo 3.8 Apo 9.0 be problematic as targets, though in every case they lead to Model 18.2 Model 15.6 enrichments of known ligands significantly better than Thrombin TS simple random selection of compounds. Other studies of Holo 6.6 Holo 3.5 docking against comparative models, such as those of Diller 77 78 Apo* 3.1 Apo* 2.1 and Li and Oshiro et al. , focused on other classes of Model 3.5 Model 2.6 proteins (kinases) and arrived at similar conclusions. Table 1. Docking the MDDR database against Surprisingly little effort has been made to determine in detail holo, apo, and modeled structures. *Best why docking against homology models typically leads to conformations worse results than docking against crystal structures, and to improve methods for docking to comparative models explicitly. From the studies comparing the results of docking to homology models and x-ray structures76, several features are nevertheless apparent. One common problem in homology models is misplaced active site side chains, often to the point where the side chain will occupy the ligand binding site. A good example of this error occurs in the Poly(ADP-ribose) polymerase (PARP, Table 1), where the conformation of a single residue, Tyr906, leads to a bad steric conflict with the adenosine analog ligand in the homology model.76 One possible approach to increasing the robustness of docking to comparative models is considering multiple candidate conformations in the docking. Such an approach must consider the relative internal energies of differing conformations as well as the differing docking energies. This seemingly obvious point has often been overlooked, but can have a huge effect on docking success rates with homology-modeled and flexible receptors79. The comparative model refinement approaches that we will consider seem well suited to such an approach, at least in principle. Because we approach the refinement problem using physics-based, all-atom energy functions, we naturally model both the internal energy of the receptor and the interaction-energy it makes with the docked ligand and so avoid the trap of generating high energy receptor conformations for the docking ligand. We consider these points in detail in Section D, Aim 4.

PHS 398/2590 (Rev. 09/04) Page 6 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. B.5 Interdisciplinary Approach to Comparative Model Refinement As discussed above, comparative protein structure model refinement is exceptionally challenging, both scientifically and technologically. We believe that we can succeed by taking an interdisciplinary approach that integrates methods grounded in bioinformatics (Sali), physics (Jacobson and Dill), and applied mathematics (Coutsias). We will also benefit from expertise in computer science provided by Dr. Michael Wester (U. New Mexico). Close collaborations will be facilitated by most of the researchers being located together at UCSF; our budget also allocates funds for Dr. Coutsias (U. New Mexico, Dept. of Mathematics and Statistics) to spend several months a year at UCSF. The integration of our team will be facilitated by our existing collaborations. As discussed in Section D.1, Coutsias, Dill, and Jacobson have published previous work on protein loop modeling80. Coutsias and Dill have other on-going collaborations81, Dill and Jacobson jointly mentor a post-doctoral researcher, and Shoichet and Jacobson jointly mentor another post-doc. In short, we recognize that interdisciplinary collaboration is critical to the success of our proposed work, and each aim requires close interactions between several investigators.

C. Preliminary Results Two of the investigators (Jacobson and Sali) have previously worked extensively in the area of comparative protein structure modeling. The work proposed here represents an entirely new line of research and has no overlap with funded projects in their groups. However, their prior work is relevant to establishing the feasibility of this proposed research, and is briefly reviewed here. The other investigators (Dill, Shoichet, and Coutsias) have no prior direct experience with comparative modeling. Drs. Dill and Shoichet have extensive experience with other aspects of computational biochemistry (protein folding and solvation, virtual screening), while Dr. Coutsias’ background is in Applied Mathematics. We review the aspects of their research programs that we will build upon to forge the new collaboration represented by this proposal. C.1 Comparative modeling (Jacobson and Sali) Dr. Jacobson’s group has developed algorithms for comparative modeling based on all-atom force fields and implicit solvent models (Generalized Born). These algorithms include side chain prediction82,83, loop prediction20,80,84, and helix sampling22. As discussed further in Section B.1, the loop prediction algorithm was tested on over 800 loops, 4–12 residues in length, and showed the highest accuracy of any method yet reported in the literature. Applications of these algorithms have included characterizing functionally relevant loop motions in TIM84, structure determination using residual dipolar couplings85, and several inhibitor discovery projects, discussed in Section C.4. The Sali group develops and applies tools for (i) predicting the structures of proteins, (ii) determining the structures of macromolecular assemblies, and (iii) annotating the functions of proteins using their structures. Dr. Sali’s group created MODELLER, a widely used software package for comparative modeling30-37, with over 7000 academic licenses. Recent developments include improvements in fold assignment86, sequence- structure alignment87, sidechain modeling88, incorporation of an implicit solvation model34, model assessment89, and automation of the modeling process35,90. In addition, the Sali group has applied MODELLER to construct many comparative protein structure models that were used to address biological problems in collaboration with other scientists, including identification of ligand binding sites, characterization of substrate and ligand specificity, and protein-protein docking30,44,91-115. C.2 Sampling methods (Dill and Coutsias) A major emphasis of this grant is the development of powerful sampling algorithms for refining comparative models. The Dill group has collaborated with members of the math community on new methods. With Professor Ben Rosen (U. Minnesota), he has developed a novel global optimization algorithm called the Convex Global Underestimator (CGU)116-119, which has been successfully applied to both small model proteins and protein-ligand docking. During the past few years, Dill has also collaborated with Dr. Coutsias, an applied mathematician, on the development of new algorithms for computational biology80,81. The work most relevant to this proposal applied methods from kinematics to sampling protein conformations, in collaboration with both Dill and Jacobson80. The essential idea is to use correlated changes in multiple dihedral angles that allow large energy barriers to be surmounted without perturbing the overall tertiary structure of a protein. The

PHS 398/2590 (Rev. 09/04) Page 7 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. advantages of this approach relative to similar methods developed previously include its speed and generality, as discussed further in the Research Plan. This approach will be used to develop highly efficient Monte Carlo moves for refining homology models (Specific Aim 1). C.3 Force fields and solvent models (Jacobson and Dill) This proposal focuses on developing new sampling methods for refining homology models. In Section B.1, we argue that molecular mechanics energy functions are capable of achieving high accuracy in protein modeling, based on evidence from decoy studies, protein folding, and loop prediction algorithms. However, we recognize that molecular mechanics energy functions, and particularly the implicit solvent models that we rely on for much of our proposed work, have flaws that could potentially limit the accuracy of our models. The characterization and correction of flaws in the energy function is a major undertaking, and we consider it to fall outside the scope of this proposal. Nonetheless, the Jacobson and Dill groups have on-going efforts to systematically improve the accuracy of force fields and solvent models. Prior work by Jacobson in this area has included the evaluation and refinement of torsional potentials by side chain and loop prediction accuracy82, and evaluation of implicit solvent models for representing hydrogen bonding interactions120,121. The Dill group has published a large number of studies aimed at improved understanding of the hydrophobic effect and other fundamental properties of aqueous solvation122-133. Any improvements in the energy functions generated by the Dill and Jacobson groups will be tested in the context of homology model refinement. C.4 Use of comparative models for drug discovery (Shoichet and Jacobson) Comparative models have many uses in biological research; we focus attention on their use in inhibitor discovery. In Aim 4, we directly assess the suitability of the models we generate for this purpose. The Shoichet group published one of the first thorough assessments of the accuracy of virtual screening using comparative models76 (Section B.4). This study was encouraging to the extent that it suggested that homology models could be used as templates for molecular docking, though they typically performed significantly less well than ligand-bound crystal structures. This study used unrefined models, exclusively drawn from MODBASE1 without further work; one conclusion that we draw from it is that refinement of the models may dramatically improve the usefulness of these modeled structures for docking. Prior work by the Shoichet group on integrating docking with receptor internal energies134 provides a framework for utilizing the ensembles of low-energy models that will be generated in Aims 1 and 2. The Jacobson group is pursuing collaborations with several groups aimed at inhibitor discovery and functional annotation of proteins lacking experimental structures, by docking against homology models. Letters of collaboration from James McKerrow (UCSF), Jack Taunton (UCSF), John Gerlt (UIUC), Roger Brent (MSI), and Ted Holman (UCSC) are included that describe some of these collaborations. The work with Holman has led to a publication135 and the discovery of three new low micromolar inhibitors of human lipoxygenases. C.5 Combining comparative modeling with experimental structure determination (Sali) The Sali group applied comparative modeling to specific proteins in order to support structure determination by experimental methods, including x-ray crystallography, NMR spectroscopy, cryoEM, and electron tomography. For example, we contributed to the structure determination of PEBP2/CBF Runt-domain by combining the sparse restraints determined by NMR spectroscopy and the alignment to several remotely related immunoglobulin structures44. We facilitated the crystallography of a number of proteins by providing comparative models for molecular replacement calculations136. We helped construct partial molecular models of the first eukaryotic ribosome from yeast47,137, the E. coli ribosome48, and the first mammalian ribosome from dog (with C. Akey, unpublished) by providing comparative models for fitting into cryoEM maps at resolutions from 8-15 Å. We are refining the structures of scruin domains linking the actin filaments, in the context of ~10 Å resolution cryoEM density maps of the Limulus acrosomal bundle (with W. Chiu, unpublished)138. We are helping to position PSD-95 and CAM-K in the postsynaptic density area of a synapse by providing comparative models for use as templates in the analysis of the electron cryo-tomography density maps at ~60 Å resolution (with W. Baumeister and M. Kennedy, unpublished). These applications to specific proteins in collaboration with experimentalists encouraged us to work on the development of hybrid structure determination methods that integrate comparative modeling with experimental sources of information. For example, we recently developed protocols for combining comparative modeling and

PHS 398/2590 (Rev. 09/04) Page 8 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. fitting into cryoEM density maps at 5-15 Å resolution, which can significantly reduce alignment errors in comparative modeling49. We are also using information about residue solvent accessibilities obtained efficiently by electron spin resonance spectroscopy to restrain fold assignment and sequence-structure alignment (with M. Chance, unpublished).

D. Research Plan We will develop new methods to refine comparative models by integrating methods grounded in bioinformatics, physics, and applied mathematics. We address both the problem of high-resolution homology modeling and comparative modeling based on more remote templates. Although there is no clear dividing line between the two problems, we anticipate that the kinematics-based replica exchange Monte Carlo (REMC) method in Aim 1 will be of greater use for high-resolution homology modeling, and the zippers-based strategy in Aim 2, which provides more aggressive sampling, will be of greater use for comparative models based on remote templates. Long-term, we aim to combine the two methods, as well as the molecular replacement methods in Aim 3, into a single algorithm that can be applied to a continuum of comparative modeling problems, from single side chain substitutions to remote homologs. As discussed in greater detail in our Data Sharing Plan (Section D.6), all source code developed under this support will be freely available to the academic community. To maximize the utility of the code, we will use modular programming practices, allowing the code to be compiled as a stand-alone executable or compiled as a module with other programs. This modularity will allow other groups to test scoring functions other than those used here, and to modify or extend our sampling algorithms.

D.1 Specific Aim 1 Implement a method for refining homology models based on methods from kinematics. [Jacobson, Dill, Coutsias, and Sali] We argue in Section B that insufficient sampling has been a major impediment to successful refinement of comparative models. Here, we propose a strategy to dramatically improve the sampling of critical elements of protein structure, primarily loops, in comparative model refinement. The elements of the strategy are a bioinformatics approach to identifying which portions of a model are likely to require refinement, highly efficient Monte Carlo moves based on kinematics that make large perturbations along these degrees of freedom, and a multiple-temperature (replica exchange) simulation protocol. Although we and other researchers have laid the foundations for each of these elements of the strategy, combining them in the way we describe is novel and will lead to multiplicative improvements in sampling efficiency. This sampling scheme can be used with essentially any energy function, but we will use it with all-atom force fields (OPLS and AMBER) and implicit solvent models (initially Generalized Born), based on the high accuracy obtained in Jacobson’s prior loop prediction work20, discussed in Section B.2. D.1.1 Methods Replica exchange molecular dynamics (REMD) simulations are now used extensively and successfully for studying peptides139-146. In these methods, molecular dynamics or Monte Carlo trajectories at multiple temperatures are run in parallel and periodically exchanged; the high-temperature trajectories help to cross high energy barriers, while the low-temperature trajectories explore low energy basins. The strengths of the method include the following: 1) it is simple to implement, 2) it is ideally suited to take advantage of cheap, scalable Linux cluster architectures, and 3) the trajectories obey detailed balance and entropic effects are properly included. Despite these strengths, we believe that REMD will be much less efficient for refining homology models of typical size proteins (>100 residues) than for smaller systems because the energy barriers increase dramatically due to tight packing. Initial homology models may also be quite strained, and releasing the strain requires crossing high energy barriers. We propose to overcome these obstacles by 1) implementing Monte Carlo moves consisting of concerted dihedral angle motions in a replica exchange simulation, and 2) focusing sampling on the degrees of freedom that require refinement. One effective way to reduce the number of degrees of freedom is to perform sampling in dihedral angle space rather than Cartesian space147-160. That is, the bond lengths and bond angles have large force constants and thus tend to remain close to their equilibrium values. Torsion-space optimization also takes advantage of the

PHS 398/2590 (Rev. 09/04) Page 9 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. known chemical bonding structure, i.e., local energy minima can be connected by changes in one or a small number of dihedral angles. The major disadvantage is that changing backbone dihedral angles in the middle of a protein chain, even by a few degrees, can grossly perturb the structure, due to a lever-arm effect. Solutions to this problem have been developed147-160, dating back to the early work of Go and Scheraga147, who demonstrated that the possible () angles of three-residue segments, given the endpoint geometry, could be determined analytically. Coutsias, Dill, and Jacobson have recently reformulated this problem based on methods from kinematics161-165, which allowed significantly simpler solutions and much broader applicability80. In particular, our formulation can be used to sample any 6 dihedral angles in a closed loop; the significance of this generality for homology model refinement is discussed below. The numerical implementation is also extremely fast, requiring approximately 10 ms on a current generation PC to determine all accessible backbone geometries of a three-residue segment.

Figure 2. Monte Carlo minimization of an 8-residue loop (residues 84-91 in lysozyme, PDB code 135l), using a kinematics-based move set, taken from Coutsias et al.80. Left: Energy vs. iteration, for 10 MCM runs starting from the same initial conformation, which has 3.8 Å RMSD to the crystal structure. Right: Backbone RMSD vs. iteration. Each run finds the same energy minimum, which has <0.5 Å RMSD to the crystal structure. These results used the CHARMM19/EEF1 force field. The simplest MC move using this strategy randomly perturbs one backbone “driver angle” in a loop and adjusts six other dihedral angles within the loop to close it. Data using such a strategy is shown in Figure 2. The method robustly recovers the conformation of a loop when it has been deliberately perturbed away from the native; the backbone RMSD decreases from 3.8 to <0.5 Å in a few hundred MC steps. This simple strategy is already quite efficient because the torsion-space moves can cross high energy barriers. Further improvements in efficiency can be obtained by 1) screening new conformations for steric clashes prior to energy evaluation (any state with a steric clash is not accepted), 2) choosing driver dihedral angles, and screening the closure dihedral angles, to ensure that they fall within the allowed portions of the Ramachandran plot, and 3) perturbing side chain conformations concomitantly with the backbone perturbations. All of these strategies are already implemented in the Jacobson group’s software platform Protein Local Optimization Program, which has been interfaced with the kinematics code80. One important elaboration of this elementary “loop MC move” is to perturb the positions of helices. Structural alignments of homologous proteins reveal that the loops are of course the most variable, but corresponding helices frequently adopt somewhat different conformations, especially at lower levels of sequence similarity. The flexibility of our kinematics-based formalism makes it possible to trivially perturb the positions/orientations of helices, as rigid objects, by choosing driver and closure angles on the loops on either side of the helix. Long-term, we also aim to include structural deformations of secondary structure elements, e.g., helix bending, which can be important in refining comparative models based on remote homologs. Our strategy of using dihedral-angle based MC moves makes it possible to focus sampling on those portions of the structure that require refinement, because the moves only perturb local regions of the structure. We propose to identify these regions by analyzing alignments of multiple homologous sequences and structures. The Sali group, which has extensive experience with homology model evaluation, will undertake this task. This module will take as input a multiple sequence alignment, including the target and template proteins, and provide as output a number (from 0 to 1) for each residue quantifying the expected accuracy of placement based on the sequence alignment, as well as any available structure information for homologous proteins. For

PHS 398/2590 (Rev. 09/04) Page 10 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. example, residues from the target protein aligned to gaps in the template would have a value of 0; the sequence alignment provides no direct information about their placement, and the burden of predicting these residues falls squarely on the physics-based refinement protocol. Highly conserved residues would have values near 1 and would be subjected to little if any physics-based refinement. These values would then be used to prioritize MC moves. That is, driver angles for the MC move would be chosen randomly but with a frequency in inverse relationship to the score provided by the alignment module. We envision that available experimental data, e.g., from NMR, could also be incorporated into this scheme to further restrict the sampling. The initial comparative models will be obtained from MODELLER, although any model building protocol can in principle be used. The combination of using implicit solvent and sampling a relatively small number of degrees of freedom of the protein should make it possible to span a large temperature range with a relatively small number of replicas in the REMC simulation. The simplest replica exchange scheme would initialize the trajectories at each temperature with the same initial model. However, there may be advantages to initializing the replica exchange with models constructed based on different alignments to the same template (either generated by different alignment programs or by suboptimal alignments) or based on alignments to different templates. The REMC scheme would then select the lowest energy model among those presented to it, while refining each of them. In addition to Replica Exchange Monte Carlo, we will also test Monte Carlo Minimization 166 as a search strategy. A disadvantage of MCM is that is does not obey detailed balance, but energy minimization prior to evaluating the Metropolis criterion can greatly improve sampling efficiency, and make very large MC steps possible. MCM may be particularly appropriate for the early stages of refinement, when the model is far from the global energy minimum and possibly quite strained; REMC could then be used after initial relaxation. Most of the above discussion has focused on improving the efficiency of sampling for homology model refinement. However, we recognize that the energy functions we intend to use for most of the work (force field + Generalized Born) have deficiencies that could limit our accuracy. In Section B.1, we argue that such energy functions are capable of high accuracy and are significantly more accurate than most scoring functions in current use for homology modeling. Nonetheless, as described in Section C.3, the Dill and Jacobson groups have on-going research projects aimed at identifying and correcting the deficiencies of the current generation of fixed charge force fields and implicit solvent models82,120-133. Any improvements to the energy functions made in our on-going work will be tested on the homology modeling applications described here. In addition, we will explore whether more expensive, but potentially more accurate, energy functions help to improve accuracy, including the use of Poisson-Boltzmann instead of Generalized Born implicit solvent and the use of the polarizable instead of fixed charge AMBER and OPLS force fields. Because these energy functions are significantly more computationally expensive, they may be most useful for rescoring a relatively small number of low-energy conformations from the initial simulations. Initial testing of the method will be performed by creating comparative models for proteins with high-resolution crystal structures, especially those in Table 2 (Aim 4), to facilitate diagnosis of any problems that arise. Models generated from structure-based (“correct”) alignments will test the ability of the method to refine models free from alignment error, while models based on sequence alignment algorithms will test the ability to cope with imperfect alignments. Other aspects of our plans for evaluation, including blind predictions, are described in Specific Aim 4. A special focus of the evaluation will be the utility of the models for virtual screening applications, to be carried out by the Shoichet lab. One feature of this method is that it will produce not a single structure, but an ensemble of low-energy structures. This introduces additional complexity in the evaluation of the method (clustering and ensemble averaging of the “trajectories” is required), but may also provide useful information concerning uncertainties in the model as well as intrinsic flexibility in the structure. This point is elaborated upon in Aim 4. D.1.2 Expected results and alternative approaches We expect that methods described here will enable the refinement of many comparative models to significantly improved RMSD, when the initial model is built using an alignment that is free from any significant errors (generally >30% sequence identity). At least in their initial implementation, the methods will require significant computational expense, at least hours and probably days on a few dozen processors. If the initial proofs of principle are successful but slow, we will then undertake algorithmic optimization, varying parameters such as

PHS 398/2590 (Rev. 09/04) Page 11 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. the number of replicas. If some or all of the initial tests fail to improve the accuracy of the comparative models, we will distinguish between two possibilities. The first is that sampling is adequate but the energy function is inadequate to identify near-native conformations. In our initial tests using models generated for proteins with experimental structures, we can identify such cases when near-native conformations are sampled transiently, but have higher energies than incorrect conformations; we can also run sampling on the native proteins themselves to help assess this type of failure. In this case, we will explore other methods of scoring the conformations generated by the sampling. Possibilities include using Poisson-Boltzmann solvent instead of Generalized Born, or even polarizable force fields. Explicit solvent, on the other hand, is likely too computationally expensive, and not compatible with the large conformational changes generated in our MC scheme. One other possibility is that incorrect protonation states on titratable residues could lead to incorrect predictions. In principle, protonation state changes can be incorporated into the Monte Carlo scheme. The second possible cause of failure is inadequate sampling. In such a case, the energies of the conformations generated during the MC sampling would never reach those of the native basin. One possibility is that our set of elementary MC moves may be inadequate to sample all critical degrees of freedom; for example, if distortions of secondary structure elements or tertiary packing need to be sampled in order to reach the native structure, we could try normal-mode type MC moves167. If on the other hand the problem is kinetic trapping, we can either 1) use a simpler energy function which is faster to compute, specifically the MODELLER scoring function, which would enable about an order of magnitude greater sampling for the same total computer time (possibly with lower accuracy); or 2) use methods such as potential flattening at the higher temperature replicas to improve the efficiency of barrier crossing144. D.1.3 Summary We aim to dramatically improve sampling efficiency for comparative model refinement by combining 1) kinematics-based MC moves that preserve tertiary structure and bond lengths/angles, 2) the use of rotamer libraries and known Ramachandran backbone-allowed conformations, 3) restraints from the analysis of homologous sequences and structures (provided by MODELLER), 4) replica exchange, and 5) implicit treatments of solvent.

D.2 Specific Aim 2 Implement a method for refining comparative models based on remote homologs by combining the “zippers” strategy for sampling protein backbones with restraints derived from bioinformatics. [Dill, Sali, Jacobson, and Coutsias] In contrast to the refinement of comparative models based on close homologs, the accuracy of comparative models based on remote homologs is frequently limited by the accuracy of sequence alignments. We propose to address this challenge with a refinement algorithm that can actually recover from (locally) incorrect sequence alignments, and identify correct models among a series generated by multiple alignments to a single or multiple templates. The refinement algorithm will be built on the “zippers” approach to protein folding that the Dill group has been developing, and combined with restraints obtained from MODELLER. In addition to being tolerant of modest sequence alignment errors, the zippers algorithm will be capable of providing models for large insertions (i.e., those portions of the sequence that do not align to template residues), which is a restrained ab initio structure prediction problem. D.2.1 Methods The Dill group is combining Replica Exchange searching with the Amber all-atom force field (Amber 6 + parm 96 + GB solvent), in conjunction with a search method called “zipping”, which explores small-entropy-loss conformational steps. Two major directions of our group for the past decade have been the physics of how proteins fold so quickly, and new computer methods that are guided by that physics. For speeding up protein structure prediction, we have explored zipping, a Convex Global Underestimator method116-119, an elasticity- based solution to the Traveling Salesman problem168, and recently the powerful CKY method from computational linguistics, in collaboration with Aravind Joshi at U. Penn. The most successful so far is a method called “zipping”.169-173 In zipping, the computer attempts to form contacts only if they are spatially

PHS 398/2590 (Rev. 09/04) Page 12 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. proximate, due to locality in the chain sequence, or due to locality arising from previously formed contacts. We now have many proofs of principle that zipping searches only a small fraction of conformational space, that it leads to secondary and tertiary structures, that it usually finds the native states in test model problems, and that the folding routes are consistent with Phi value analysis of folding experiments 169-173. It is an exceptionally efficient conformational search method. Figure 3 illustrates the zippers method applied to the protein GB1. The method constructs the C-terminal end of protein GB1, 16 residues, to within 1.7 Å in 15 cpu-days in all-atom simulations, whereas the Folding@home14 simulations of Pande of the same system require about 5000 cpu-days. The next stage, in which we folded the first 31 residues of the same protein, to about 2.2 Å, requires 28 cpu-days. Finally, we can fold the full 56-mer protein, to about 2.5 Å resolution, in 280 cpu-days. To our knowledge, this is the first time such a large structure has been accurately folded using all-atom physical potential functions. Most importantly, the search time scales linearly with chain length N, as expected from zippers principles, rather than exponentially (in NP-search problems) or as N3, which is expected approximately from Monte Carlo methods. This, and similar results in protein A and alpha-spectrin SH3 domain give us three preliminary proofs of principle that we can fold small proteins at the all-atom level of detail, to their native configurations. Further extensive testing is required, and being supported under NIH grant GM34993 to the Dill group.

Figure 3. Zippers methods applied to ab initio protein folding. Left: Initial folding of the C-terminal end of GB1, with 1.7 Å backbone RMSD to the native structure, accomplished in 15 cpu-days. Middle: The initial fragment extended to 31 residues (2.2 Å RMSD; 28 cpu-days). Right: Full 56 residue protein, folded in 280 cpu-days, to 2.5 Å RMSD. The sampling is accomplished by a combination of replica exchange molecular dynamics with the zippers search strategy, which identifies and restrains low-entropy-loss contacts. In the work proposed here, the zippers method will start with an initial comparative model, rather than an extended peptide chain, and incorporate restraints from MODELLER. Of course, even if such all-atom methods succeed on small proteins such as these, their impact will be far greater if they can be combined with comparative models, which can be applied to much larger proteins. We will use target-template alignment to restrain much of the target structure while our zippers + replica exchange molecular dynamics (REMD) simulation focuses on sampling the loops and other insertions or structurally divergent regions. MODELLER will be applied to generate a number of initial comparative models of the target sequence based on its alignment to homologous template structure(s). The same template-derived interatomic restraints applied by MODELLER during the construction of the comparative model can be applied between well-determined regions in the zippers + REMD simulation to focus sampling in poorly determined regions. These poorly determined regions can be identified based on the sequence variability in the corresponding multiple sequence alignment, by assessment programs such as PROSAII174 and VERIFY3D175, and variability among independently calculated MODELLER models36. For those poorly determined regions, the zippers simulation will simply take the initial model as a suggestion, and explore other packing arrangements; as such, it will not necessarily become trapped in a conformation suggested by an incorrect alignment. Experimental restraints obtained from NMR or other experiments, if available, can also be included in the simulation. Starting from comparative models with restraints on those portions expected to be most accurate, the zippers method will then be used to identify low-entropy-loss contacts in the unconstrained portions of the model. Those contacts that are most favorable, based on the free energy estimated from performing a REMD simulation, will then be retained. At an algorithmic level, there is no difference between the restraints imposed from bioinformatics and those imposed based on the zippers algorithm; they will all be treated as fictitious harmonic potentials. As the simulation progresses, the number of restraints will increase as the zippers method identifies new favorable low-entropy-loss contacts.

PHS 398/2590 (Rev. 09/04) Page 13 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. Final refinement and scoring of the models, after applying the zippers search strategy to the most divergent portions of the model, will be performed by REMD. As in Aim 1, it may be possible to apply the final refinement and re-scoring to multiple models simultaneously (initially built from different alignments and/or templates) by seeding each temperature with a different model. The Dill group has performed a proof-of-principle for such an approach involving 48 replicas, 47 of which start in distant regions of conformational space, and only 1 of which is near-native, and the REMD correctly identified the native state. After equilibration, the REMD simulation generates an ensemble of structures sampled from the equilibrium distribution at a number of temperatures. These structures can be clustered into distinct conformational substates 176 and the equilibrium population (or corresponding free energy) of each cluster estimated by the weighted histogram analysis method177. As with the method described in Aim 1, we anticipate that fluctuations within this ensemble may provide useful information about uncertainty in the model as well as intrinsic flexibility of the protein. Initial testing of this method will proceed along similar lines as in Aim 1, starting with comparative models created for proteins that have high-resolution crystal structures (focusing on those in Table 2, below). For each of these targets, a control REMD simulation starting from the experimental structure will also be initiated. This reference simulation will show how well the energy function can represent the particular protein. Other aspects of our strategy for testing the method are described in Aim 4. Dr. Coutsias has recently begun to collaborate with the Dill group on the zippers search strategy, and will use his expertise in applied mathematics to help design new efficient algorithms for identifying and creating low- entropy-loss contacts. Long term, we aim to replace the use of molecular dynamics in this aim with the kinematics methods developed in Aim 1, which should greatly improve the efficiency of sampling; Dr. Coutsias will take the lead on this effort to combine these strategies. D.2.2 Expected outcomes and alternative approaches This Aim is the most ambitious of this proposal. All of the possible pitfalls discussed for Aim 1 (Section D.1.2), especially those related to potential limitations of the energy models, apply here even more strongly, because the zippers-based sampling scheme is much more aggressive. We believe, however, that the exceptionally challenging nature of refining comparative models based on remote templates requires testing of ambitious, even risky new ideas. The key new idea proposed here is that the zippers search strategy can greatly improve the efficiency of backbone sampling in comparative model refinement, as the Dill group has already shown it can in the context of protein folding. We anticipate that the zippers methods will be particularly useful for one of the most challenging problems confronting comparative modeling, which is predicting the conformations of large insertions, including those that contain secondary structure. It is also likely to represent an efficient method for perturbing the lengths and precise packing arrangements of secondary structure elements, another important challenge in refining comparative models based on remote templates. D.2.3 Summary We have a new conformational search method, combining “zippers” with REMD, that we are finding to be successful in the all-atom folding of three proteins so far, up to 60-mers. It’s an exciting result because it shows that: (a) state-of-the-art physics-based force-fields are pretty good, and (b) this new sampling method can find native states in large search spaces. The best way to scale up to larger proteins is to use homology- derived restraints and apply the all-atom methods to the subsequent sampling of the loop conformations and other parts of the chain that the comparative model is not predicting reliably.

D.3 Specific Aim 3 Integrate information in template structures, a molecular mechanics force field, and crystallographic data to maximize the number and accuracy of protein structures determined by molecular replacement. [Sali, Jacobson, and Dill] Crystallographic structure determination by molecular replacement is limited in two model building aspects. First, it is sensitive to the inaccuracy of the starting models. In general, at least 60% of the atoms in the initial model need to be within 2.5 Å RMSD error for molecular replacement to work at 2 Å crystallographic resolution63. Second, the ensuing refinement of the molecular replacement solution has a relatively small radius of convergence53. As a result of these limitations, x-ray data sets for a large fraction of structural

PHS 398/2590 (Rev. 09/04) Page 14 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. genomics targets are essentially thrown away because the resolution is too poor (2.5–4 Å) to obtain a high- accuracy structure with an acceptable amount of effort. We propose to address the first problem by an iterative process of target-template alignment, model building, and model assessment based partly on crystallographic data (D.3.1); and the second problem by an increasingly complex series of algorithms that refine a given model based on crystallographic data, a molecular mechanics force field, and information in the template structures (D.3.2-4). We will benefit from a collaboration with expert crystallographers David Agard and Luke Rice as well as test cases provided by Robert Stroud, Stephen Burley, and Wayne Anderson (see letters of collaboration). D.3.1 Methods D.3.1.1 Version 1: Molecular replacement by iterative alignment, model building, and model assessment A small number of ad hoc models are typically used independently as input to a molecular replacement program in the hope of finding at least one model that results in a molecular replacement solution. Recently, this simplest approach has been expanded by using models created based on different alignments63. We will take the next logical step by iterating the process of model building and molecular replacement, based on our existing “moulding” protocol (Appendix)86. This task will be achieved by adding a measure of harmony between a model and the crystallographic data (e.g., correlation coefficient and maximum likelihood scores) to the model assessment criterion of moulding, as follows. To ameliorate the problem of dependence of the final comparative model on the input target-template alignment, we developed an automated method that optimizes both the alignment and the model implied by it86. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments, and then iterates through re-alignment, model building, and model assessment to optimize a model assessment score. During this iterative process, (i) new alignments are constructed by application of a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by our program MODELLER; and (iii) the models are assessed by a composite criterion that depends on an atomic statistical potential178, alignment significance score, and model compactness. When testing the procedure on a very difficult set of 19 modeling targets sharing only 4-27% sequence identity with their template structures, the average model accuracy increased from 43 to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). We showed that this iterative protocol would produce even more accurate models if a better method for ranking of the models were available. Correspondingly, we will improve the composite model assessment score in moulding by adding a measure of harmony between the evaluated model and the crystallographic data. This measure (e.g., correlation coefficient and maximum likelihood scores) will be produced by a molecular replacement program. We will test the utility a variety of composite assessment scores with the aid of the existing benchmark of 27 proteins 63 and x-ray data sets for structures solved by MAD phasing by NYSGXRC (see letter of collaboration from S.K. Burley), the Midwest Center for Structural Genomics (see letter of collaboration from W. F. Anderson) and at UCSF (see letter of collaboration from R. Stroud). An improvement of the composite model assessment score will enhance the sampling of the alignment space as well as the selection of the most accurate model, so that a model resulting in a clear molecular replacement solution can be found. A variety of variations of the basic protocol will be explored: First, we will use different initial models for molecular replacement (e.g., all atom models, backbone-only models, poly-alanine models, and models of conserved regions only). Second, we will use several molecular replacement programs that vary in (i) the method for optimizing the translational and/or rotational degrees of freedom and (ii) the crystallographic target functions that depend on structure factor amplitudes and/or phases. These programs include CNS53, PHASER56, MOLREP179, EPMR61, PHENIX180, and AMORE60. An especially attractive feature of the PHASER implementation will allow us to employ multiple models as input to molecular replacement and assign different errors to different parts of a model55,56; model assessment programs such as ProsaII181 and VERIFY3D182 will be used to assess model errors. D.3.1.2 Version 2: Structure refinement by satisfying template-derived restraints while fitting an atomic model into a density map

PHS 398/2590 (Rev. 09/04) Page 15 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. Once a crystallographic solution is obtained by molecular replacement, the initial model needs to be refined. We recently developed a MODELLER module (Mod-EM) that fits a given atomic model into a mass density map by maximizing the cross-correlation between the model and the map (Appendix) 49. This module has so far been applied to the fitting of comparative models into cryoEM density maps at 5-15 Å resolution range. Here, as a first step towards using structure factor phase information in our optimizations, we propose to add the cross-correlation coefficient between a given model and a density map to the main scoring function of MODELLER. This integration will immediately allow us to optimize the conformation of a protein structure model based on simultaneous consideration of the template-derived restraints, a molecular mechanics force field, as well as its fit into a given density map. Initially, optimization protocols based on existing methods in MODELLER will be employed, including the variable target function method183, conjugate gradients, and molecular dynamics. If necessary, more powerful optimization schemes will be implemented, such as those discussed in Aims 1 and 2. This model refinement process can be viewed either as crystallographic structure refinement in real space restrained by template structure(s) or as comparative modeling restrained by a crystallographic density map. Compared to the existing real space refinement protocols, we are using additional information in the form of template-derived restraints. We expect that consideration of this additional information will improve the radius of convergence of the optimization as well as result in more accurate models. Typically, the maps from molecular replacement are subject to model bias and thus do not provide ideal starting points for real space refinement. In the cases where MAD/SAD phase information is available, however, we anticipate that incorporating template-derived information will have a positive impact on our ability to produce more accurate and complete models. For cases with no experimental phases, we will explore the use of annealed omit maps to minimize model bias. At the very least, real space refinement with template- derived restraints will represent a step towards automated model building with crystallographic target functions defined in the reciprocal space (Section D.3.1.3). D.3.1.3 Version 3: Structure refinement by satisfying template-derived restraints and a crystallographic target function In addition to real space refinement (D.3.1.2), we will test a variety of reciprocal space crystallographic target functions to optimize the harmony between the model and the crystallographic data, including the standard least-squares residual, maximum likelihood based on amplitudes (MLF)51, and maximum likelihood incorporating experimental phase information (MLHL)52. The crystallographic target functions will include the flat bulk-solvent model184,185 and the overall anisotropic B-factor correction. These target functions and their first derivatives with respect to model Cartesian coordinates will be obtained from PHENIX and, if needed, other crystallographic programs. D.3.1.4 Version 4: Iterative structure refinement We will implement and test an iterative version of structure refinement, as follows. A given starting model will result into a molecular replacement solution, producing the first electron density map. Next, the model will be refined as described in the previous sections (D.1.3.2-3). The molecular replacement map will then be re- calculated using the new model and the whole process iterated until convergence. A variety of convergence criteria will be tested, including free R value and a measure of structural change (e.g., DRMS). The benchmarks will focus on whether or not the iterative structure refinement has a larger radius of convergence and a higher accuracy than a single refinement step. D.3.2 Application to structural genomics The new functionality of MODELLER is expected to be valuable in refining many protein structures for which only medium resolution (2.5-4.0 Å) crystallographic data sets and remotely related template structures (less than 30% sequence identity) are available. Based on the experience of the NYSGXRC so far, we estimate that as many as 30% of structural genomics targets may fall into this category. Therefore, we will apply our protocols for molecular replacement to the crystallographic data sets that are not easily solved by standard methods. These data sets will be obtained from three structural genomics efforts (see letters of collaboration from S. K. Burley, W. Anderson, and R. Stroud). This application will provide valuable feedback for further development of the methods. The application of our methods to structural genomics will be funded by the corresponding centers. PHS 398/2590 (Rev. 09/04) Page 16 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. D.3.3 Expected results and alternative approaches Several protocols with a number of variations are described above. These variations increase the probability of success. They will be extensively tested based on the existing benchmark of 27 proteins63 as well as several structures solved by MAD phasing, from NYSGXRC, the Midwest Center for Structural Genomics, and UCSF. Our own web server for automated assessment of all aspects of a molecular model, EVA-CM (http://salilab.org/eva), will be used to assess geometric accuracy of the final structures186,187. The existing model building routines of MODELLER will initially be used for comparative modeling; in addition, new model building methods described in Aims 1 and 2 will also be explored at least for the creation of the initial molecular replacement models when they become available. We will strive to make our software compatible with the CCP4 and PHENIX program suites. D.3.4 Summary We will produce and test modular, freely available software that will improve the accuracy and applicability of molecular replacement. This aim will be achieved by tightly coupling exploration of comparative modeling and standard molecular replacement degrees of freedom. Several protocols to address the sensitivity of molecular replacement to the accuracy of the initial model as well as the structure refinement stage will be developed.

D.4 Specific Aim 4 Evaluate the methods in Aims 1–3, including the utility of the models for computer-aided inhibitor discovery. [Shoichet, Sali, and Jacobson] D.4.1 Methods One important component of our strategy for testing the methods we develop in Aims 1-3 will be blind predictions made in CASP and CAFASP. The next iteration of these contests will likely be summer 2006. Because of the novelty of the methods we are developing, we do not expect that they will be fully developed by that time, although it may be possible to test early versions of the algorithms on a small number of targets. The following contest, however, likely to be held in summer 2008, is ideally timed for our participation. As outlined in the Timeline in Section D.5, we expect that we can implement initial versions of the algorithms by the end of 2006, to be followed by early testing and refinement of the algorithms in 2007. An important complement to the CASP meetings is an online evaluation of protein structure modeling web servers. The online evaluation has the benefit of a much larger number of test structures, and of providing automatic and continuous feedback about the performance of the modeling servers. In collaboration with Burkhard Rost (Columbia University) and Alfonso Valencia (CNB, Madrid), the Sali group developed the EVA server for evaluation of the automated protein structure modeling servers (http://cubic.bioc.columbia.edu/eva/), including the comparative modeling servers (http://salilab.org/eva/)186,187. The evaluations are updated automatically to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently accesses servers for secondary structure prediction, residue contact prediction, comparative protein structure modeling, and fold recognition. Every day, sequences of newly available protein structures in the PDB are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. Using such a large sample common to all assessed methods ensures that methods are compared as reliably as possible. Consequently, EVA provides useful information to developers as well as users of prediction methods. We also aim to move beyond purely geometric assessment of model accuracy. One practical metric is “how useful is the model for discovering ligands of the target?” To test our models by this criterion, we will use models generated by the new methods as targets for molecular docking screens (virtual screening), asking how well they behave as templates for recognizing known ligands? We will return to the approach adopted by McGovern & Shoichet76 (Section B.4). In this paper, the performance of homology modeled structures as docking targets was compared that of the same enzymes, from different species, determined by x-ray crystallography. The MDDR database of 100,000 drug-like molecules was docked against each of the target enzymes in both the x-ray and homology modeled

PHS 398/2590 (Rev. 09/04) Page 17 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. conformations. For each target, between 30 and 900 molecules were annotated as ligands in the MDDR, allowing us to calculate “enrichment factors” of known ligands, versus the rest of the database “decoy” molecules, among the top-scoring docking “hits”. Since true ligand geometries were known from the x-ray structures, we could also compare geometric fidelity of the modeling-based results versus those docked against the crystal structures. Here we adopt the same strategy, with two exceptions: first, we expand our list of target sites from 10 to 35, each of which has a known x-ray structure and at least 30 known ligands in the docking database (Table 2). Second, rather than relying simply on models pulled from ModBase, we use models refined using the methods that we have described above.

% of ranked % of ranked Number Number of database to database to Protein PDB code of known Protein PDB code known find 25% of find 25% of ligands ligands ligands ligands AR 1ah3 722 3.5 HHSP90 1uy6 13 b 1.3 DHFR 3dfr 117 0.3 CDK2 1hck 17 7.5 GART 1c2t 50 0.9 Adometdc 1i7m 13 0.6 PARP 1efy 45 4.6 ADe 1ndw 27 5.2 PNP 1b8o 25 1.2 ERa (active) 1l2i 53 0.4 SAHH 1a7a 37 2.1 ERa (inactive) 3ert 110 0.1 Thrombin 1ba8 243 4.2 Ar 1xq2 13 c 2.3 AChE 1e66 554 5.0 PPARg 1fm9 31 5.2 TS 2bbq 171 1.5 RXRa 1fm6 17 26 P38 MAP 1kv2 234 a 1.7 FVIIa 1dan 10 0.9 EGFr 1m17 416 a 1.1 Trypsin 1bju 73 0.6 FGFr1 1agw 118 a 4.8 FXa 1f0r 425 1.7 SRC 2src 162 a 3.8 COX-1 1p4g 26 3.0 PDGFrb Modweb35 156 a 2.0 COX-2 1cvu 550 26 VEGFr2 1vr2 75 b 4.5 AK 1lij 45 18 Carbonic Peptide 1cil 241 1.6 1lqy 21 0.7 anhydrase II deformylase MMP3 1hy7 337 3.2 Xanth. Oxid. 1fiq 74 1.0 NEP 1dmt 184 0.2 Table 2. Proteins to be used as initial test cases for the comparative modeling algorithms. Most ligands are from MDDR 2001, except (a) Courtesy of Dr. David Diller;77, (b) from Refs.188-190, (c) contributed by Dr. Tong from NCTR test case. Abbreviations: AR, aldose reductase; DHFR, dihydrofolate reductase; GART, glycinamide ribonucleotide transformylase; PARP, poly(ADP-ribose) polymerase; PNP, purine nucleoside phosphorylase; SAHH, S-adenosyl- homocysteine hydrolase; AChE, acetylcholinesterase; TS, thymidylate synthase; P38 MAP, mitogen activated protein kinase P38; EGFr, epidermal growth factor receptor; FGFr1, fibroblast growth factor receptor; SRC, non receptor tyrosine kinase; PDGFrb, the platelet derived growth factor receptor; VEGRr2, vascular endothelial growth factor receptor; HHSP90, human heat shock protein 90; CDK2, cyclin-dependent kinase 2; Adometdc, S-adenosyl-L-methionine decarboxylase; ADe, Adenosine deaminase; ERa, estrogen receptor; Ar, androgen receptor; PPARg, peroxisome proliferator activated receptor gamma; RXRa, retinoic X receptor alpha; FVIIa, factor VIIa; Fxa, factor Xa; COX-1, cyclooxygenase-1; COX-2, cyclooxygenase-2; AK, adenosine kinase; MMP3: matrix metalloproteinase 3; NEP: neutral endopeptidase; Xanth. Oxid: xanthine oxidase. The refined models should outperform our original studies for three reasons. First, the refined and especially multiple-conformation homology models can overcome residue-level errors in the placement of side chains, (Table 1, Section B.4). For instance, in the homology model of PARP used by McGovern and Shoichet, Tyr906 was modeled in a position that occupied part of the binding site, presenting an unresolvable conflict for docking. An alternate, more favorable conformation of this Tyr906 would have resulted in a model with almost the same internal energy, if only it had been generated. Sampling this level of conformation change among active site residues is certainly within the range of our refinement techniques. It is also within the range of new ensemble docking methods developed in the Shoichet lab, which can treat multiple receptor conformations without significantly increasing calculation time.134 A second advantage of the refined homology models is their ability to access concerted, large scale PHS 398/2590 (Rev. 09/04) Page 18 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. conformational changes that receptors can undergo in response to ligand binding. For instance, in the original McGovern & Shoichet study, the homology model of Aldose Reductase (AR in Table 1, Section B.4) performed poorly compared to the crystallographic structures. Unlike PARP, in AR the conformation change on binding of inhibitors such as Zenarestat and Tolrestat involves large displacements of secondary structure and loops. Such large-scale motions cannot be accessed by ordinary energy refinement. These should, however, be accessible by the kinematic and replica-exchange methods proposed here, and considered by the multi- conformation docking as part of an ensemble of low energy receptor conformations. The modifier “low energy” for receptor conformations is the critical third advantage of the physics-based refinement for structure-based ligand discovery. It is widely understood that it is important to consider multiple receptor conformations when designing or discovering new ligands. “Have you considered receptor flexibility?” is a question that is often asked in docking seminars. What is sometimes overlooked is that these receptor conformations must be low energy, and these internal energies must be balanced against the docking energy. A receptor conformation that will better complement a particular ligand is only useful if it is low energy. If it is a high energy conformation, any likely complementarity advantage will be overwhelmed by a conformational energy disadvantage. This all seems obvious, but it turns out to be non-trivial to implement because most docking “scoring functions” are not based on physical models of energy and often ignore the receptor energy. In the studies proposed here, this is not the case: both the internal energy and the docking interaction energy are calculated through similar functional forms and can be integrated. D.4.2 Experimental testing of homology-model-based docking The calculations against the 35 test systems in Table 2 provide a strong retrospective test of our methods. Success here will be comforting but not compelling, since we will have not predicted anything genuinely new. To do so, we turn to docking against refined homology models, followed by experimental testing of the proposed inhibitors. Letter of support are included from several experimental groups who are eager to collaborate on such projects, including James McKerrow (UCSF, cysteine proteases), Jack Taunton (UCSF, kinases), John Gerlt (UIUC, alpha-beta barrel enzymes), Ted Holman (UCSC, lipoxygenase), and Roger Brent (MSI, GTPases and other targets). All of these researchers are already collaborating with one or more members of our research team, and will provide feedback on the quality of the comparative models we generate by using them in challenging applications, primarily oriented around inhibitor discovery. In addition, the Shoichet lab will exploit the comparative model refinement methods in their own on-going work aimed at developing new inhibitors of β-lactamases. They will target the class C β-lactamase from Acinetobacter baumanii. The structure of the enzyme has not been determined, but it shares 35% sequence identity with the AmpC β-lactamase from E. coli191. The Shoichet lab has previously determined the structure of AmpC from E. coli with multiple ligands by x-ray crystallography192-195 and discovered novel, micromolar inhibitors of it.196,197 Thus, the AmpC homolog from A. baumanii provides a feasible and well controlled challenge. Its sequence identity is low enough to be non-trivial, but the enzyme class is well in hand experimentally, making it tractable for experimental testing of inhibitors proposed by the docking. The success rate of docking against the modeled structure can be compared to that against the crystallographic structure from E. coli. The clone for the A. baumanii AmpC β-lactamase is a gift from Dr. Robert Bonomo of Case Western Reserve University. If warranted, we will follow the inhibition studies with structure determination of the inhibitor-enzyme complex by x-ray crystallography. D.4.3 Expected outcomes and alternative approaches One premise is that docking for novel inhibitors measures the usefulness of the models that emerge from the new refinement and sampling protocols. But molecular docking has its own problems, and many wrongly predicted ligands will be the fault not of the homology models but of the docking. We can control for this by comparing the docking results against the x-ray structures to those from the refined homology models. Any docking artifacts should be common to both and should drop out. We expect the refinement and sampling methods to dramatically improve the docking results against the homology models, to a point, in fact, where they are competitive with the x-ray crystallographic models. This is not so much because the structures will be of similar quality, as that by accessing more conformational states in the models, we will present the docking with a fuller picture of the receptor than can be given by even a high quality, single experimental structure. The experimental testing of results from docking against comparative models, by the Shoichet lab and other PHS 398/2590 (Rev. 09/04) Page 19 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. collaborators, may present challenges of interpretation. What do we learn if we get no new inhibitors from the docking, as occasionally happens, can we say convincingly that refined models have failed? Conversely, if we get lots of new inhibitors, can we convincingly say that the refined homology models have succeeded? A key contribution of the experimental testing will be to keep us honest, in the spirit of CASP. Any single success or failure with the experimental component will not guarantee or sink this enterprise, but it will be a strong motivation to keep us from fooling ourselves with retrospective “predictions”. We expect that the docking against the comparative modes will not completely succeed or fail, but that we will get hit rates and new inhibitors that we can compare to those found against an x-ray structure. Most compellingly, through the efforts of the Shoichet group, we will have the opportunity to compare the experimentally determined x-ray structures of a complex between the AmpC from A. baumanii and the structure predicted by modeling and docking. From this we might learn a great deal indeed. D.4.4 Summary We close by returning to what we believe are the two fundamental innovative aspects of this aim. First, simply focusing on applying refined models as docking targets forces us to consider a very practical, very desirable application of homology modeling: structure-based ligand discovery. Improvements for this application would have wide applicability and it is an area where we believe we have unique synergies and competencies. Second, we believe that the sampling and physics-based energy methods proposed are suited to the types of problems encountered in docking against homology modeled structures, and thus we can reasonably hope to significantly advance the state of the art in the field. Such an advance would open up a huge number of targets for which no experimental structure is now available for virtual screening and have considerable impact.

D.5 Timeline and deliverables 12/05 Meeting of all investigators. Implement computational platform for Aim 1 (Jacobson, Dill, Coutsias) 1/06-12/06 Implement computational platform for Aim 2 (Dill, Sali, Jacobson, Coutsias) Implement computational platform for Aim 3 (Sali, Jacobson, and Dill) Iterative cycles of testing (on known structures, and by docking) and refinement of 1/07-12/07 the methods Generation of models for experimental collaborators Large-scale benchmarking of kinematics and zippers strategies; begin participation 1/08-6/08 in EVA-CM; begin general distribution of executables and source code 6/08-9/08 CASP/CAFASP participation 6/08-11/08 Integration of methods from Aims 1–3; testing of composite strategies Dr. Coutsias plans to travel to UCSF twice per year to facilitate collaboration, staying for approximately 6 weeks each time. These visits are tentatively planned to occur during the summer and during the winter, when he will be released from teaching responsibilities due to his effort on this grant. The tangible outcome of this research will be a set of modular source codes, freely distributed to the academic community, that implement the methods in Aims 1–3 for improving sampling efficiency during homology model refinement. The codes will be useable with any scoring functions through a suitable source-level or object code interface. We will also distribute new versions of the MODELLER (Sali group) and Protein Local Optimization Program (PLOP, Jacobson group) executables that implement these methods, allowing users to input initial homology models and perform refinement with little user intervention. The primary difference between the two executables will be the scoring function used in conjunction with the sampling methods in Aims 1–3. PLOP uses a force field (OPLS) and Generalized Born implicit model for scoring, while MODELLER uses a faster, more empirically derived scoring function. Details of the licensing agreements are presented below. The source code and executables will be distributed through a web interface, which will also permit user feedback and bug tracking. Bugzilla and CVS will be used to help manage source code development by multiple researchers.

PHS 398/2590 (Rev. 09/04) Page 20 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. D.6 Data sharing plan

Our overall aim is to distribute the software developed under this support as widely as possible, and to encourage other academic researchers to make improvements at the source code level which can benefit the research community as a whole. To this end, all new software will be developed under a licensing agreement that permits free distribution of the software to all nonprofit institutions, including source code, which can be modified by the users (proposed text below). (The existing source code for the kinematics-based loop closure is already freely available on the Dill lab web page.) To make the licensing process as easy as possible, we will use an online “shrink-wrap” style license agreement for non-commercial users. In this model, the user is presented with the text of the license agreement using a web browser. Before the user can download the software they are interested in, they must click on an "I Accept" button within their browser. A letter from the UCSF Office of Technology Management supporting this plan is included with this application. The terms of the license will also permit commercialization of software components.

Beyond these licensing arrangements, we will also abide by the following “best practices” principles of scientific programming: 1) all source code will be written in a language that permits modular programming (principally Fortran 90), to make compilation with other source codes as facile as possible; 2) all source code will be annotated and “signed” by its author, to make modifying the source code as facile as possible; and 3) all source code will be maintained in the Jacobson group’s CVS repository with access provided to all investigators and collaborators, facilitating the development of source code by many parties simultaneously. We intend to use Bugzilla for bug tracking, and solicit user feedback through the web site used to distribute the software.

The tangible results from the research will be published in peer-reviewed journals and made available through web-based resources developed and supported by the Shoichet and Sali groups. Specifically, the Sali group’s MODBASE resource will provide web-based access to the homology models generated, and the Shoichet group’s DOCKBLASTER resource, which will provide web-based access to any results of docking against homology models.

Proposed text of click-to-accept on-line license:

PLEASE READ THIS SOFTWARE LICENSE AGREEMENT CAREFULLY BEFORE PRESSING THE “ACCEPT” BUTTON DISPLAYED BELOW AND DOWNLOADING THE SOFTWARE. BY PRESSING THE “ACCEPT” BUTTON, YOU ARE AGREEING TO BE BOUND BY THE TERMS OF THIS LICENSE. IF YOU DO NOT AGREE TO THE TERMS OF THIS LICENSE, PRESS THE “DECLINE” BUTTON AND DO NOT USE THE SOFTWARE.

This license agreement (“License”), effective today, is made by and between you (hereinafter referred to as the “Licensee”) and The Regents of the University of California, a California corporation having its statewide administrative offices at 1111 Franklin Street, Oakland, California 94607-5200, ("Regents") acting through its Office of Technology Management, University of California San Francisco, 185 Berry Street, Suite 4603, San Francisco, California 94107, and concerns certain software for the purposes of the refinement of comparative protein models for research purposes and includes executable code, source code, and documentation (hereinafter referred to as the “Software”).

1. General. A non-exclusive, nontransferable, perpetual license is granted to the Licensee to install and use the Software for academic, non-profit, or government-sponsored research purposes. Use of the Software under this License is restricted to non-commercial purposes. Commercial use of the Software requires a separately executed written license agreement.

2. Permitted Use and Restrictions. Licensee agrees that it will use the Software, and any modifications, improvements, or derivatives to the Software and associated documentation that the Licensee may create (collectively, “Improvements”) solely for internal, non-commercial purposes and shall not distribute or transfer the Software or Improvements to any person or third parties without prior written permission from

PHS 398/2590 (Rev. 09/04) Page 21 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. the Regents. The term “non-commercial,” as used in this License, means academic or other scholarly research which (a) is not undertaken for profit, or (b) is not intended to produce works, services, or data for commercial use, or (c) is neither conducted, nor funded, by a person or an entity engaged in the commercial use, application or exploitation of works similar to the Software.

3. Ownership of Copyright. The Licensee agrees that the copyrights in the Software and associated documentation are the property of the Regents. The Licensee may create Improvements and ownership of the copyrights in these Improvements will vest in the Licensee to the extent allowable under the law, but only in those aspects of the Improvements that are not the subject of the Regents’ copyrights in the Software and associated documentation. The Licensee agrees to use his/her reasonable best efforts to protect the contents of the Software and associated documentation and to prevent unauthorized disclosure by its agents, officers, employees, and consultants. If the Licensee receives a request to furnish all or any portion of the Software and associated documentation to a third party, Licensee will not fulfill such a request but will refer the third party to [the web page for this license] so that the third party’s use of this Software and associated documentation will be subject to the terms and conditions of this License.

4. Copies. The Licensee may make a reasonable number of copies of the Software for the purposes of backup, maintenance of the Software or the development of Improvements. These additional copies shall carry the copyright notice and shall be controlled by this License, and will be destroyed along with the original by the Licensee upon termination of the License.

5. Acknowledgement. Licensee agrees that any publication of results obtained with the Software will acknowledge its use by an appropriate citation as specified in the documentation.

6. Disclaimer of Warranties and Limitation of Liability. THE LICENSEE AGREES THAT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. THE REGENTS MAKES NO REPRESENTATION OR WARRANTY THAT THE SOFTWARE WILL NOT INFRINGE ANY PATENT OR OTHER PROPRIETARY RIGHT. IN NO EVENT SHALL THE REGENTS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

7. Termination. This License is effective until terminated by either party. Your rights under this License will terminate automatically without notice from the Regents if you fail to comply with any term(s) of this License. You may terminate the license by giving written notice of termination to the Regents. Upon termination of this License, you shall immediately discontinue all use of the Software and destroy the original and all copies, full or partial, of the Software, including any modifications or derivative works, and associated documentation.

8. Governing Law and General Provisions. This License shall be governed by the laws of the State of California, excluding the application of its conflicts of law rules. This License shall not be governed by the United Nations Convention on Contracts for the International Sale of Goods, the application of which is expressly excluded. If any provisions of this License are held invalid or unenforceable for any reason, the remaining provisions shall remain in full force and effect. This License is binding upon any heirs and assigns of the Licensee. The License granted to Licensee hereunder may not be assigned or transferred to any other person or entity without the express consent of the Regents. This License constitutes the entire agreement between the parties with respect to the use of the Software licensed hereunder and supersedes all other previous or contemporaneous agreements or understandings between the parties, whether verbal or written, concerning the subject matter. Any translation of this License is done for local requirements and in the event of a dispute between the English and any non-English versions, the English version of this License shall govern.

PHS 398/2590 (Rev. 09/04) Page 22 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Office of Technology Management

185 Berry Street – Suite 4603 Office: 415.343.4462 San Francisco, CA 94107 FAX: 415.348.1579 E-mail: [email protected]

January 7, 2005

Prof. Matthew P. Jacobson University of California, San Francisco 600 16th Street, Room N472C, Box 2240 San Francisco, CA 94143-2240

The UCSF Office of Technology Management has reviewed the software dissemination plans for your proposal entitled “Refinement of Comparative Protein Models” and concurs with these plans, including the on-line academic software licensing agreement and plans for potential commercialization and/or technology transfer of the software you will develop. The Office of Technology Management (OTM) was established in 1996 with the charge to bring the results of the research and educational programs at UCSF forward for public use and benefit. University of California patent policy requires that technologies conceived or developed by its employees be disclosed in a timely basis to this office, and title to these technologies be assigned to the University. The OTM is responsible for intellectual property issues at UCSF, including patent and copyright protection, and the timely licensing of intellectual property. Our office receives technology disclosures, evaluates the disclosed technology for commercial potential, obtains patent or other intellectual property protection when appropriate, diligently seeks to license the technology, manages the resulting license agreements, and distributes net revenues of the process to inventors and within UCSF to support further research. The OTM will work with you and the other UCSF investigators participating in this work to commercialize your software and/or databases as the need arises. Over 150 new technology disclosures per annum, or over three per week, are generated from research and scholarship at UCSF, and the OTM has a strong record of successfully licensing this technology to industry as well as facilitating the transfer of UCSF intellectual property and proprietary materials to non-profit organizations for research and education purposes. Regards,

Joel B. Kirschbaum, Ph.D. Director and Senior Technology Portfolio Manager

PHS 398/2590 (Rev. 09/04) Page 23 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. E. Human Subjects

None.

F. Vertebrate Animals

None.

G. Literature Cited

1. U. Pieper, N. Eswar, H. Braberg, M.S. Madhusudhan, F.P. Davis, A.C. Stuart, N. Mirkovic, A. Rossi, M.A. Marti-Renom, A. Fiser, B. Webb, D. Greenblatt, C.C. Huang, T.E. Ferrin, and A. Sali. (2004) MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 32 Database issue: D217-22. 2. A. Tramontano and V. Morea. (2003) Assessment of homology-based predictions in CASP5. Proteins. 53 Suppl 6: 352-68. 3. A. Tramontano, R. Leplae, and V. Morea. (2001) Analysis and assessment of comparative modeling predictions in CASP4. Proteins. Suppl 5: 22-38. 4. D. Cozzetto and A. Tramontano. (2005) Relationship between multiple sequence alignments and quality of protein comparative models. Proteins. 58: 151-7. 5. R. Zhou. (2003) Free energy landscape of protein folding in water: explicit vs. implicit solvent. Proteins. 53: 148-61. 6. D. Gilis. (2004) Protein decoy sets for evaluating energy functions. J Biomol Struct Dyn. 21: 725-36. 7. J. Tsai, R. Bonneau, A.V. Morozov, B. Kuhlman, C.A. Rohl, and D. Baker. (2003) An improved protein decoy set for testing energy functions for protein structure prediction. Proteins. 53: 76-87. 8. B.N. Dominy and C.L. Brooks. (2002) Identifying native-like protein structures using physics-based potentials. J Comput Chem. 23: 147-160. 9. A.K. Felts, E. Gallicchio, A. Wallqvist, and R.M. Levy. (2002) Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the surface generalized born solvent model. Proteins. 48: 404-422. 10. M.C. Lee and Y. Duan. (2004) Distinguish protein decoys by using a scoring function based on a new AMBER force field, short molecular dynamics simulations, and the generalized born solvent model. Proteins. 55: 620-34. 11. M. Feig and C.L. Brooks, 3rd. (2002) Evaluating CASP4 predictions with physical energy functions. Proteins. 49: 232-45. 12. C.D. Snow, B. Zagrovic, and V.S. Pande. (2002) The Trp cage: folding kinetics and unfolded state topology via molecular dynamics simulations. J Am Chem Soc. 124: 14548-9. 13. B. Zagrovic, C.D. Snow, M.R. Shirts, and V.S. Pande. (2002) Simulation of folding of a small alpha- helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol. 323: 927-37. 14. V.S. Pande, I. Baker, J. Chapman, S.P. Elmer, S. Khaliq, S.M. Larson, Y.M. Rhee, M.R. Shirts, C.D. Snow, E.J. Sorin, and B. Zagrovic. (2003) Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing. Biopolymers. 68: 91-109. 15. W.L. Jorgensen, D.S. Maxwell, and J. TiradoRives. (1996) Development and testing of the OPLS all- atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc. 118: 11225-11236. 16. G.A. Kaminski, R.A. Friesner, J. Tirado-Rives, and W.L. Jorgensen. (2001) Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B. 105: 6474-6487. 17. M.P. Jacobson, G.A. Kaminski, R.A. Friesner, and C.S. Rapp. (2002) Force field validation using protein side chain prediction. J Phys Chem B. 106: 11673-11680. 18. A. Ghosh, C.S. Rapp, and R.A. Friesner. (1998) Generalized born model based on a surface integral formulation. J Phys Chem B. 102: 10983-10990.

PHS 398/2590 (Rev. 09/04) Page 24 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 19. E. Gallicchio, L.Y. Zhang, and R.M. Levy. (2002) The SGB/NP hydration free energy model based on the surface generalized born solvent reaction field and novel nonpolar hydration free energy estimators. J Comput Chem. 23: 517-529. 20. M.P. Jacobson, D.L. Pincus, C.S. Rapp, T.J. Day, B. Honig, D.E. Shaw, and R.A. Friesner. (2004) A hierarchical approach to all-atom protein loop prediction. Proteins. 55: 351-67. 21. P.I.W. de Bakker, M.A. DePristo, D.F. Burke, and T.L. Blundell. (2003) Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the generalized born solvation model. Proteins. 51: 21-40. 22. X. Li, M.P. Jacobson, and R.A. Friesner. (2004) High resolution prediction of protein helix positions and orientations. Proteins. 55: 368-382. 23. P. Koehl and M. Levitt. (1999) A brighter future for protein structure prediction. Nature Structural Biology. 6: 108-111. 24. J. Moult. (1996) The current state of the art in protein structure prediction. Current Opinion in Biotechnology. 7: 422-427. 25. J.A. Flohil, G. Vriend, and H.J.C. Berendsen. (2002) Completion and refinement of 3-D homology models with restricted molecular dynamics: Application to targets 47, 58, and 111 in the CASP modeling competition and posterior analysis. Proteins-Structure Function and Genetics. 48: 593-604. 26. H. Li, R. Tejero, D. Monleon, D. BassolinoKlimas, C. AbateShen, R.E. Bruccoleri, and G.T. Montelione. (1997) Homology modeling using simulated annealing of restrained molecular dynamics and conformational search calculations with CONGEN: Application in predicting the three-dimensional structure of murine homeodomain Msx-1. Protein Science. 6: 956-970. 27. M.R. Lee, J. Tsai, D. Baker, and P.A. Kollman. (2001) Molecular dynamics in the endgame of protein structure prediction. J Mol Biol. 313: 417-30. 28. M.R. Lee, D. Baker, and P.A. Kollman. (2001) 2.1 and 1.8 A average C(alpha) RMSD structure predictions on two small proteins, HP-36 and s15. J Am Chem Soc. 123: 1040-6. 29. Y. Duan and P.A. Kollman. (1998) Pathways to a protein folding intermediate observed in a 1- microsecond simulation in aqueous solution. Science. 282: 740-4. 30. A. Sali, R. Matsumoto, H.P. McNeil, M. Karplus, and R.L. Stevens. (1993) Three-dimensional models of four mouse mast cell chymases. Identification of proteoglycan binding regions and protease-specific antigenic epitopes. J.Biol.Chem. 268: 9023-9034. 31. A. Sali and J.P. Overington. (1994) Derivation of rules for comparative protein modeling from a database of protein structure alignments. Protein Sci. 3: 1582-1596. 32. A. Fiser, R.K. Do, and A. Sali. (2000) Modeling of loops in protein structures. Protein Sci. 9: 1753-1773. 33. M.A. Marti-Renom, A.C. Stuart, A. Fiser, R. Sanchez, F. Melo, and A. Sali. (2000) Comparative protein structure modeling of genes and genomes. Annu.Rev.Biophys.Biomol.Struct. 29: 291-325. 34. A. Fiser, M. Feig, C.L. Brooks, III, and A. Sali. (2002) Evolution and physics in comparative protein structure modeling. Acc.Chem.Res. 35: 413-421. 35. N. Eswar, B. John, N. Mirkovic, A. Fiser, V. Ilyin, U. Pieper, A.C. Stuart, M.A. Marti-Renom, M.S. Madhusudhan, B. Yerkovich, and A. Sali. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res. 31: 3375-3380. 36. R. Sanchez and A. Sali. (1997) Evaluation of comparative protein structure modeling by MODELLER-3. Proteins. Suppl 1: 50-58. 37. A. Sali, L. Potterton, F. Yuan, H. van Vlijmen, and M. Karplus. (1995) Evaluation of comparative protein modeling by MODELLER. Proteins. 23: 318-326. 38. U.H.E. Hansmann. (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett. 281: 140-150. 39. Y. Sugita and Y. Okamoto. (1999) Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett. 314: 141-151. 40. W. Hoppe, in Acta Crystallogr. 1957. p. 750-751. 41. M.G. Rossmann and D.M. Blow. (1962) The detection of subunits within the crystallographic asymmetric unit. Acta Crystallogr. 15: 24-31. 42. J.P. Turkenburg and E.J. Dodson. (1996) Modern developments in molecular replacement. Curr Opin Struct Biol. 6: 604-10.

PHS 398/2590 (Rev. 09/04) Page 25 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 43. M.G. Rossmann. (2001) Molecular replacement--historical background. Acta Crystallogr D Biol Crystallogr. 57: 1360-6. 44. T. Nagata, V. Gupta, D. Sorce, W.Y. Kim, A. Sali, B.T. Chait, K. Shigesada, Y. Ito, and M.H. Werner. (1999) Immunoglobulin motif DNA recognition and heterodimerization of the PEBP2/CBF Runt domain. Nat.Struct.Biol. 6: 615-619. 45. P.M. Bowers, C.E. Strauss, and D. Baker. (2000) De novo protein structure determination using sparse NMR data. J Biomol NMR. 18: 311-8. 46. M. Andrec and R.M. Levy. (2002) Protein sequential resonance assignments by combinatorial enumeration using 13C alpha chemical shifts and their (i, i-1) sequential connectivities. J Biomol NMR. 23: 263-70. 47. C.M. Spahn, R. Beckmann, N. Eswar, P.A. Penczek, A. Sali, G. Blobel, and J. Frank. (2001) Structure of the 80S ribosome from Saccharomyces cerevisiae--tRNA-ribosome and subunit-subunit interactions. Cell. 107: 373-386. 48. H. Gao, J. Sengupta, M. Valle, A. Korostelev, N. Eswar, S.M. Stagg, P. VanRoey, R.K. Angrawal, S.C. Harvey, A. Sali, M.S. Chapman, and J. Frank. (2003) Study of the Structural Dynamics of the E. Coli 70S Ribosome Using Real Space Refinement. Cell. 113: 789-801. 49. M. Topf, M.L. Baker, B. John, W. Chiu, and A. Sali. (2005) Structure Characterization of Domains in Protein Assemblies by Comparative Modeling and Electron Cryo-Microscopy. J Struct Biol. 149: 191- 203. 50. T.L. Blundell and L.N. Johnson, Protein Crystallography. 1976, New York: Academic Press. 51. N.S. Pannu and R.J. Read. (1996) Improved structure refinement through maximum likelihood. Acta Crystallogr A. 54: 659-668. 52. N.S. Pannu, G.N. Murshudov, E.J. Dodson, and R.J. Read. (1998) Incorporation of prior phase information strengthens maximum-likelihood structure refinement. Acta Crystallogr D Biol Crystallogr. 54: 1285-94. 53. A.T. Brunger, P.D. Adams, G.M. Clore, W.L. DeLano, P. Gros, R.W. Grosse-Kunstleve, J.S. Jiang, J. Kuszewski, M. Nilges, N.S. Pannu, R.J. Read, L.M. Rice, T. Simonson, and G.L. Warren. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr. 54 ( Pt 5): 905-21. 54. P.D. Adams, N.S. Pannu, R.J. Read, and A.T. Brunger. (1999) Extending the limits of molecular replacement through combined simulated annealing and maximum-likelihood refinement. Acta Crystallogr D Biol Crystallogr. 55 ( Pt 1): 181-90. 55. R.J. Read. (2001) Pushing the boundaries of molecular replacement with maximum likelihood. Acta Crystallogr D Biol Crystallogr. 57: 1373-82. 56. L.C. Storoni, A.J. McCoy, and R.J. Read. (2004) Likelihood-enhanced fast rotation functions. Acta Crystallogr D Biol Crystallogr. 60: 432-8. 57. A.T. Brunger. (1990) Extension of molecular replacement: A New search strategy based on Patterson correlation refinement. Acta Cryst A. 46: 46-57. 58. W.L. DeLano and A.T. Brunger. (1995) The direct rotation function: Patterson correlation search applied to molecular replacement. Acta Crystallogr D Biol Crystallogr. 51: 740-8. 59. J. Navaza and E. Vernoslova. (1995) On the Fast Translation Functions for Molecular Replacement. Acta Crystallogr A. 51: 445-449. 60. J. Navaza. (2001) Implementation of molecular replacement in AMoRe. Acta Crystallogr D Biol Crystallogr. 57: 1367-72. 61. C.R. Kissinger, D.K. Gehlhaar, and D.B. Fogel. (1999) Rapid automated molecular replacement by evolutionary search. Acta Crystallogr D Biol Crystallogr. 55 ( Pt 2): 484-91. 62. N.M. Glykos and M. Kokkinidis. (2000) A stochastic approach to molecular replacement. Acta Crystallogr D Biol Crystallogr. 56 ( Pt 2): 169-74. 63. R. Schwarzenbacher, A. Godzik, S.K. Grzechnik, and L. Jaroszewski. (2004) The importance of alignment accuracy for molecular replacement. Acta Crystallogr D Biol Crystallogr. 60: 1229-36. 64. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. (2000) The Protein Data Bank. Nucleic Acids Res. 28: 235-42. 65. M.P. Jacobson and A. Sali. (2004) Comparative Protein Structure Modeling and Its Applications to Drug Discovery. Annual Reports in Medicinal Chemistry. 39: 259-276. PHS 398/2590 (Rev. 09/04) Page 26 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 66. C.S. Ring, E. Sun, J.H. McKerrow, G.K. Lee, P.J. Rosenthal, I.D. Kuntz, and F.E. Cohen. (1993) Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc Natl Acad Sci U S A. 90: 3583-7. 67. R. Li, X. Chen, B. Gong, P.M. Selzer, Z. Li, E. Davidson, G. Kurzban, R.E. Miller, E.O. Nuzum, J.H. McKerrow, R.J. Fletterick, S.A. Gillmor, C.S. Craik, I.D. Kuntz, F.E. Cohen, and G.L. Kenyon. (1996) Structure-based design of parasitic protease inhibitors. Bioorg Med Chem. 4: 1421-7. 68. P.M. Selzer, X. Chen, V.J. Chan, M. Cheng, G.L. Kenyon, I.D. Kuntz, J.A. Sakanari, F.E. Cohen, and J.H. McKerrow. (1997) Leishmania major: molecular modeling of cysteine proteases and prediction of new nonpeptide inhibitors. Exp Parasitol. 87: 212-21. 69. X. Que, L.S. Brinen, P. Perkins, S. Herdman, K. Hirata, B.E. Torian, H. Rubin, J.H. McKerrow, and S.L. Reed. (2002) Cysteine proteinases from distinct cellular compartments are recruited to phagocytic vesicles by Entamoeba histolytica. Mol Biochem Parasitol. 119: 23-32. 70. I.J. Enyedy, S.L. Lee, A.H. Kuo, R.B. Dickson, C.Y. Lin, and S. Wang. (2001) Structure-based approach for the discovery of bis-benzamidines as novel inhibitors of matriptase. J Med Chem. 44: 1349-55. 71. I.J. Enyedy, Y. Ling, K. Nacro, Y. Tomita, X. Wu, Y. Cao, R. Guo, B. Li, X. Zhu, Y. Huang, Y.Q. Long, P.P. Roller, D. Yang, and S. Wang. (2001) Discovery of small-molecule inhibitors of Bcl-2 through structure-based computer screening. J Med Chem. 44: 4313-24. 72. M. Schapira, B.M. Raaka, H.H. Samuels, and R. Abagyan. (2001) In silico discovery of novel Retinoic Acid Receptor agonist structures. BMC Struct Biol. 1: 1. 73. F. Zuccotto, M. Zvelebil, R. Brun, S.F. Chowdhury, R. Di Lucrezia, I. Leal, L. Maes, L.M. Ruiz-Perez, D. Gonzalez Pacanowska, and I.H. Gilbert. (2001) Novel inhibitors of Trypanosoma cruzi dihydrofolate reductase. Eur J Med Chem. 36: 395-405. 74. E. Vangrevelinghe, K. Zimmermann, J. Schoepfer, R. Portmann, D. Fabbro, and P. Furet. (2003) Discovery of a potent and selective protein kinase CK2 inhibitor by high-throughput docking. J Med Chem. 46: 2656-62. 75. T. Honma, T. Yoshizumi, N. Hashimoto, K. Hayashi, N. Kawanishi, K. Fukasawa, T. Takaki, C. Ikeura, M. Ikuta, I. Suzuki-Takahashi, T. Hayama, S. Nishimura, and H. Morishima. (2001) A novel approach for the development of selective Cdk4 inhibitors: library design based on locations of Cdk4 specific amino acid residues. J Med Chem. 44: 4628-40. 76. S.L. McGovern and B.K. Shoichet. (2003) Information decay in molecular docking screens against holo, apo, and modeled conformations of enzymes. J Med Chem. 46: 2895-907. 77. D.J. Diller and R. Li. (2003) Kinases, homology models, and high throughput docking. J Med Chem. 46: 4638-47. 78. C. Oshiro, E.K. Bradley, J. Eksterowicz, E. Evensen, M.L. Lamb, J.K. Lanctot, S. Putta, R. Stanton, and P.D. Grootenhuis. (2004) Performance of 3D-database molecular docking studies into homology models. J Med Chem. 47: 764-7. 79. A.M. Ferrari, B.Q. Wei, L. Costantino, and B.K. Shoichet. (2004) Soft docking and multiple receptor conformations in virtual screening. J Med Chem. 47: 5076-84. 80. E.A. Coutsias, C.L. Seok, M.P. Jacobson, and K.A. Dill. (2004) A kinematic view of loop closure. J Comput Chem. 25: 510-528. 81. E.A. Coutsias, C. Seok, and K.A. Dill. (2004) Using quaternions to calculate RMSD. J Comput Chem. 25: 1849-57. 82. M.P. Jacobson, Kaminski, G. A., Friesner, R. A., Rapp, C. S.,. (2002) Force Field Validation Using Protein Side Chain Prediction. J. Phys. Chem. B. 106: 11673. 83. M.P. Jacobson, R.A. Friesner, Z.X. Xiang, and B. Honig. (2002) On the role of the crystal environment in determining protein side-chain conformations. J Mol Biol. 320: 597-608. 84. V. Guallar, M.P. Jacobson, A. McDermott, and R.A. Friesner. (2004) Computational Modeling of the Catalytic Reaction in Triose Phosphate Isomerase. J Mol Biol. 337: 227-239. 85. M. Andrec, Y. Harano, M.P. Jacobson, R.A. Friesner, and R.M. Levy. (2002) Complete protein structure determination using backbone residual dipolar couplings and sidechain rotamer prediction. Journal of Structural and Functional Genomics. 2: 103-111. 86. B. John and A. Sali. (2003) Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res. 31: 3982-3992. PHS 398/2590 (Rev. 09/04) Page 27 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 87. M.A. Marti-Renom, M.S. Madhusudhan, and A. Sali. (2004) Alignment of protein sequences by their profiles. Protein Sci. 13: 1071-1087. 88. E. Feyfant, A. Fiser, and A. Sali. (2004) Modeling mutations in protein structures. In Preparation. 89. F. Melo, R. Sanchez, and A. Sali. (2002) Statistical potentials for fold assessment. Protein Sci. 11: 430- 448. 90. A. Fiser and A. Sali. (2003) ModLoop: Automated modeling of loops in protein structures. Bioinformatics. 19: 1-2. 91. B. Guenther, R. Onrust, A. Sali, M. O'Donnell, and J. Kuriyan. (1997) Crystal structure of the delta' subunit of the clamp-loader complex of E. coli DNA polymerase III. Cell. 91: 335-345. 92. X. Wu, B. Knudsen, S.M. Feller, J. Zheng, A. Sali, D. Cowburn, H. Hanafusa, and J. Kuriyan. (1995) Structural basis for the specific interaction of lysine-containing proline-rich peptides with the N-terminal SH3 domain of c-Crk. Structure. 3: 215-226. 93. N. Ghildyal, D.S. Friend, R.L. Stevens, K.F. Austen, C. Huang, J.F. Penrose, A. Sali, and M.F. Gurish. (1996) Fate of two mast cell tryptases in V3 mastocytosis and normal BALB/c mice undergoing passive systemic anaphylaxis: prolonged retention of exocytosed mMCP-6 in connective tissues, and rapid accumulation of enzymatically active mMCP-7 in the blood. J.Exp.Med. 184: 1061-1073. 94. C. Huang, G. Morales, A. Vagi, K. Chanasyk, M. Ferrazzi, C. Burklow, W.T. Qiu, E. Feyfant, A. Sali, and R.L. Stevens. (2000) Formation of enzymatically active, homotypic, and heterotypic tetramers of mouse mast cell tryptases. Dependence on a conserved Trp-rich domain on the surface. J.Biol.Chem. 275: 351-358. 95. C. Huang, A. Sali, and R.L. Stevens. (1998) Regulation and function of mast cell proteases in inflammation. J.Clin.Immunol. 18: 169-183. 96. C. Huang, G.W. Wong, N. Ghildyal, M.F. Gurish, A. Sali, R. Matsumoto, W.T. Qiu, and R.L. Stevens. (1997) The tryptase, mouse mast cell protease 7, exhibits anticoagulant activity in vivo and in vitro due to its ability to degrade fibrinogen in the presence of the diverse array of protease inhibitors in plasma. J.Biol.Chem. 272: 31885-31893. 97. J.E. Hunt, D.S. Friend, M.F. Gurish, E. Feyfant, A. Sali, C. Huang, N. Ghildyal, S. Stechschulte, K.F. Austen, and R.L. Stevens. (1997) Mouse mast cell protease 9, a novel member of the chromosome 14 family of serine proteases that is selectively expressed in uterine mast cells. J.Biol.Chem. 272: 29158- 29166. 98. R. Matsumoto, A. Sali, N. Ghildyal, M. Karplus, and R.L. Stevens. (1995) Packaging of proteases and proteoglycans in the granules of mast cells and other hematopoietic cells. A cluster of histidines on mouse mast cell protease 7 regulates its binding to heparin serglycin proteoglycans. J.Biol.Chem. 270: 19524-19531. 99. G.W. Wong, S. Yasuda, M.S. Madhusudhan, L. Li, Y. Yang, S.A. Krilis, A. Sali, and R.L. Stevens. (2001) Human tryptase epsilon (PRSS22), a new member of the chromosome 16p13.3 family of human serine proteases expressed in airway epithelial cells. J.Biol.Chem. 276: 49169-49182. 100. G.W. Wong, L. Li, M.S. Madhusudhan, S.A. Krilis, M.F. Gurish, M.E. Rothenberg, A. Sali, and R.L. Stevens. (2001) Tryptase 4, a new member of the chromosome 17 family of mouse serine proteases. J.Biol.Chem. 276: 20648-20658. 101. G.W. Wong, Y. Tang, E. Feyfant, A. Sali, L. Li, Y. Li, C. Huang, D.S. Friend, S.A. Krilis, and R.L. Stevens. (1999) Identification of a new member of the tryptase family of mouse and human mast cell proteases which possesses a novel COOH-terminal hydrophobic extension. J.Biol.Chem. 274: 30784- 30793. 102. Y. Yang, L. Li, G.W. Wong, S.A. Krilis, M.S. Madhusudhan, A. Sali, and R.L. Stevens. (2002) RasGRP4, a new mast cell-restricted Ras guanine nucleotide-releasing protein with calcium- and diacylglycerol-binding motifs. Identification of defective variants of this signaling protein in asthma, mastocytosis, and mast cell leukemia patients and demonstration of the importance of RasGRP4 in mast cell development and function. J.Biol.Chem. 277: 25756-25774. 103. G.M. Iverson, S. Reddel, E.J. Victoria, K.A. Cockerill, Y.X. Wang, M.A. Marti-Renom, A. Sali, D.M. Marquis, S.A. Krilis, and M.D. Linnik. (2002) Use of single point mutations in domain I of beta 2- glycoprotein I to determine fine antigenic specificity of antiphospholipid autoantibodies. J.Immunol. 169: 7097-7103.

PHS 398/2590 (Rev. 09/04) Page 28 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 104. D.A. Kandiah, A. Sali, Y. Sheng, E.J. Victoria, D.M. Marquis, S.M. Coutts, and S.A. Krilis. (1998) Current insights into the "antiphospholipid" syndrome: clinical, immunological, and molecular aspects. Adv.Immunol. 70: 507-563. 105. Y. Sheng, S.A. Krilis, and A. Sali. (1997) Site-directed mutagenesis of recombinant human beta 2- glycoprotein I. Effect of phospholipid binding and anticardiolipin antibody activity. Ann.N.Y.Acad.Sci. 815: 331-333. 106. Y. Sheng, A. Sali, H. Herzog, J. Lahnstein, and S.A. Krilis. (1996) Site-directed mutagenesis of recombinant human beta 2-glycoprotein I identifies a cluster of lysine residues that are critical for phospholipid binding and anti-cardiolipin antibody activity. J.Immunol. 157: 3744-3751. 107. J.M. Miwa, I. Ibanez-Tallon, G.W. Crabtree, R. Sanchez, A. Sali, L.W. Role, and N. Heintz. (1999) lynx1, an endogenous toxin-like modulator of nicotinic acetylcholine receptors in the mammalian CNS. Neuron. 23: 105-114. 108. L.Z. Xu, R. Sanchez, A. Sali, and N. Heintz. (1996) Ligand specificity of brain lipid-binding protein. J.Biol.Chem. 271: 24711-24719. 109. J. Vernal, A. Fiser, A. Sali, M. Muller, J.J. Cazzulo, and C. Nowicki. (2002) Probing the specificity of a trypanosomal aromatic alpha-hydroxy acid dehydrogenase by site-directed mutagenesis. Biochem.Biophys.Res.Commun. 293: 633-639. 110. G. Wu, A. Fiser, B. ter Kuile, A. Sali, and M. Muller. (1999) Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase. Proc.Natl.Acad.Sci.U.S.A. 96: 6285- 6290. 111. N. Mirkovic, M.A. Marti-Renom, A. Sali, and A.N.A. Monteiro, Structure-based assessment of missence mutations in human BRCA1: Implications for breast and ovarian cancer predisposition, in Cancer Research. 2003. 112. J.B. Bonanno, C. Edo, N. Eswar, U. Pieper, M.J. Romanowski, V. Ilyin, S.E. Gerchman, H. Kycia, F.W. Studier, A. Sali, and S.K. Burley. (2001) Structural genomics of enzymes involved in sterol/isoprenoid biosynthesis. Proc.Natl.Acad.Sci.U.S.A. 98: 12896-12901. 113. C.M. Groft, R. Beckmann, A. Sali, and S.K. Burley. (2000) Crystal structures of ribosome anti- association factor IF6. Nat.Struct.Biol. 7: 1156-1164. 114. E. Wolf, A. Vassilev, Y. Makino, A. Sali, Y. Nakatani, and S.K. Burley. (1998) Crystal structure of a GCN5-related N-acetyltransferase: Serratia marcescens aminoglycoside 3-N-acetyltransferase. Cell. 94: 439-449. 115. S.A. Lee, E.L. Shen, A. Fiser, A. Sali, and S. Guo. (2003) The zebrafish forkhead transcription factor Foxi1 specifies epibranchial placode-derived sensory neurons. Development. 130: 2669-2679. 116. K.W. Foreman, A.T. Phillips, J.B. Rosen, and K.A. Dill. (1999) Comparing search strategies for finding global optima on energy landscapes. J Comput Chem. 20: 1527-1532. 117. K.A. Dill, A.T. Phillips, and J.B. Rosen. (1997) Protein Structure Prediction and Potential Energy Landscape Analysis using Continuous Global Minimization. Proceedings of the First Annual International Conference on Computational Molecular Biology: 109-117. 118. A.T. Phillips, J.B. Rosen, and K.A. Dill, Energy landscape projections of molecular potential functions, in Optimization in Computational Chemistry and Molecular Biology, P.M. Pardalos, Editor. 1999, Kluwer. 119. J.B. Rosen, K.A. Dill, and A.T. Phillips. (1997) Protein structure and energy landscape dependence on sequence using a continuous energy function. J Comput Biol. 4: 227-239. 120. Z. Yu, M.P. Jacobson, C.S. Rapp, and R.A. Friesner. (2004) First-Shell Solvation of Ion Pairs: Correction of Systematic Errors in Implicit Solvent Models. J Phys Chem B. 108: 6643-6654. 121. S. Wong, K. Bernacki, and M.P. Jacobson. (2005) Competition Between Intramolecular Hydrogen Bonds and Solvation in Phosphorylated Peptides: Simulations with Implicit and Explicit Solvent. J Phys Chem B: in press. 122. K.A.T. Silverstein, K.A. Dill, and A.D.J. Haymet. (1998) Hydrophobicity in a simple model of water: solvation and hydrogen bond energies. Fluid Phase Equilibr. 151: 83-90. 123. K.A.T. Silverstein, A.D.J. Haymet, and K.A. Dill. (1998) A simple model of water and the hydrophobic effect. J Am Chem Soc. 120: 3166-3175. 124. K.A.T. Silverstein, A.D.J. Haymet, and K.A. Dill. (1999) Molecular model of hydrophobic solvation. J Chem Phys. 111: 8000-8009. PHS 398/2590 (Rev. 09/04) Page 29 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 125. N.T. Southall and K.A. Dill. (2000) The mechanism of hydrophobic solvation depends on solute radius. J Phys Chem B. 104: 1326-1331. 126. K.A.T. Silverstein, A.D.J. Haymet, and K.A. Dill. (2000) The strength of hydrogen bonds in liquid water and around nonpolar solutes. J Am Chem Soc. 122: 8037-8041. 127. Y.V. Kalyuzhnyi, V. Vlachy, and K.A. Dill. (2001) Hydration of simple ions. Effect of the charge density. Acta Chim Slov. 48: 309-316. 128. T. Urbic, V. Vlachy, Y.V. Kalyuzhnyi, N.T. Southall, and K.A. Dill. (2002) A two-dimensional model of water: Solvation of nonpolar solutes. J Chem Phys. 116: 723-729. 129. T.M. Truskett and K.A. Dill. (2002) Predicting water's phase diagram and liquid-state anomalies. J Chem Phys. 117: 5101-5104. 130. N.T. Southall, K.A. Dill, and A.D.J. Haymet. (2002) A view of the hydrophobic effect. J Phys Chem B. 106: 521-533. 131. B. Hribar, N.T. Southall, V. Vlachy, and K.A. Dill. (2002) How ions affect the structure of water. J Am Chem Soc. 124: 12302-12311. 132. T.M. Truskett and K.A. Dill. (2002) A simple statistical mechanical model of water. J Phys Chem B. 106: 11829-11842. 133. V. Vlachy, B. Hribar-Lee, Y.V. Kalyuzhnyi, and K.A. Dill. (2004) Short-range interactions: from simple ions to polyelectrolyte solutions. Curr Opin Colloid In. 9: 128-132. 134. B.Q. Wei, L.H. Weaver, A.M. Ferrari, B.W. Matthews, and B.K. Shoichet. (2004) Testing a flexible- receptor docking algorithm in a model binding site. J Mol Biol. 337: 1161-82. 135. V.C. Ruddat, R. Mogul, I. Chorny, C. Chen, N. Perrin, S. Whitman, V. Kenyon, M.P. Jacobson, C.F. Bernasconi, and T.R. Holman. (2004) Tryptophan 500 and arginine 707 define product and substrate active site binding in soybean lipoxygenase-1. Biochem. 43: 13063-71. 136. M.R. Chance, A. Fiser, A. Sali, U. Pieper, N. Eswar, G. Xu, J.E. Fajardo, T. Radhakannan, and N. Marinkovic. (2004) High-throughput computational and experimental techniques in structural genomics. Genome Res. 14: 2145-54. 137. R. Beckmann, C.M. Spahn, N. Eswar, J. Helmers, P.A. Penczek, A. Sali, J. Frank, and G. Blobel. (2001) Architecture of the protein-conducting channel associated with the translating 80S ribosome. Cell. 107: 361-372. 138. M.F. Schmid, M.B. Sherman, P. Matsudaira, and W. Chiu. (2004) Structure of the acrosomal bundle. Nature. 431: 104-7. 139. D. Paschek and A.E. Garcia. (2004) Reversible temperature and pressure denaturation of a protein fragment: a replica exchange molecular dynamics simulation study. Phys Rev Lett. 93: 238105. 140. G.S. Jas and K. Kuczera. (2004) Equilibrium structure and folding of a helix-forming peptide: circular dichroism measurements and replica-exchange molecular dynamics simulations. Biophys J. 87: 3786- 98. 141. A.K. Felts, Y. Harano, E. Gallicchio, and R.M. Levy. (2004) Free energy surfaces of beta-hairpin and alpha-helical peptides generated by replica exchange molecular dynamics with the AGBNP implicit solvent model. Proteins. 56: 310-21. 142. R. Zhou. (2004) Exploring the protein folding free energy landscape: coupling replica exchange method with P3ME/RESPA algorithm. J Mol Graph Model. 22: 451-63. 143. W.Y. Yang, J.W. Pitera, W.C. Swope, and M. Gruebele. (2004) Heterogeneous folding of the trpzip hairpin: full atom simulation and experiment. J Mol Biol. 336: 241-51. 144. S. Jang, S. Shin, and Y. Pak. (2003) Replica-exchange method using the generalized effective potential. Phys Rev Lett. 91: 058305. 145. J.W. Pitera and W. Swope. (2003) Understanding folding and design: replica-exchange simulations of "Trp-cage" miniproteins. PNAS. 100: 7587-92. 146. Y.M. Rhee and V.S. Pande. (2003) Multiplexed-replica exchange molecular dynamics method for protein folding simulation. Biophys J. 84: 775-86. 147. N. Go and H.A. Scheraga. (1970) Ring Closure and Local Conformational Deformations of Chain Molecules. Biopolymers. 3: 178-187. 148. R.E. Bruccoleri and M. Karplus. (1985) Chain Closure with Bond Angle Variations. Macromolecules. 18: 2767-2773.

PHS 398/2590 (Rev. 09/04) Page 30 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 149. W.J. Wedemeyer and H.A. Scheraga. (1999) Exact analytical loop closure in proteins using polynomial equations. J Comput Chem. 20: 819-844. 150. C. Baysal and H. Meirovitch. (1997) Efficiency of the local torsional deformations method for identifying the stable structures of cyclic molecules. J Phys Chem A. 101: 2185-2191. 151. M.W. Deem and J.S. Bader. (1996) A configurational bias Monte Carlo method for linear and cyclic peptides. Mol Phys. 87: 1245-1260. 152. M.G. Wu and M.W. Deem. (1999) Efficient Monte Carlo methods for cyclic peptides. Mol Phys. 97: 559- 580. 153. A.A. Canutescu and R.L. Dunbrack. (2003) Cyclic coordinate descent: A robotics algorithm for protein loop closure. Protein Sci. 12: 963-972. 154. D. Hoffmann and E.W. Knapp. (1996) Protein dynamics with off-lattice Monte Carlo moves. Phys Rev E. 53: 4221-4224. 155. S. Cahill, M. Cahill, and K. Cahill. (2003) On the kinematics of protein folding. J Comput Chem. 24: 1364-1370. 156. J.P. Ulmschneider and W.L. Jorgensen. (2004) Polypeptide folding using Monte Carlo sampling, concerted rotation, and continuum solvation. J Am Chem Soc. 126: 1849-1857. 157. J. Cortes, T. Simeon, M. Remaud-Simeon, and V. Tran. (2004) Geometric algorithms for the conformational analysis of long protein loops. J Comput Chem. 25: 956-967. 158. A.R. Dinner. (2000) Local deformations of polymers with nonplanar rigid main-chain internal coordinates. J Comput Chem. 21: 1132-1144. 159. L.R. Dodd, T.D. Boone, and D.N. Theodorou. (1993) A Concerted Rotation Algorithm for Atomistic Monte-Carlo Simulation of Polymer Melts and Glasses. Mol Phys. 78: 961-996. 160. R. Abagyan, S. Batalov, T. Cardozo, M. Totrov, J. Webber, and Y. Zhou. (1997) Homology modeling with internal coordinate mechanics: deformation zone mapping and improvements of models via conformational search. Proteins. Suppl 1: 29-37. 161. C. Wampler and A. Morgan. (1991) Solving the 6r Inverse Position Problem Using a Generic-Case Solution Methodology. Mech Mach Theory. 26: 91-106. 162. R. Manseur and K.L. Doty. (1988) A Fast Algorithm for Inverse Kinematic Analysis of Robot Manipulators. Int J Robot Res. 7: 52-63. 163. H.Y. Lee and C.G. Liang. (1988) Displacement Analysis of the General Spatial 7-Link 7r Mechanism. Mech Mach Theory. 23: 219-226. 164. J. Duffy, Analysis of Mechanisms and Robot Manipulators. 1980, London: Arnold. 165. K.H. Hunt, Kinematic Geometry of Mechanisms. 1990, Oxford: Oxford University Press. 166. Z. Li and H.A. Scheraga. (1987) Monte Carlo-minimization approach to the multiple-minima problem in protein folding. PNAS. 84: 6611-5. 167. P. Carnevali, G. Toth, G. Toubassi, and S.N. Meshkat. (2003) Fast protein structure prediction using Monte Carlo simulations with modal moves. J Am Chem Soc. 125: 14244-5. 168. K.D. Ball, B. Erman, and K.A. Dill. (2002) The elastic net algorithm and protein structure prediction. J Comput Chem. 23: 77-83. 169. K.M. Fiebig and K.A. Dill. (1993) Protein Core Assembly Processes. J Chem Phys. 98: 3475-3487. 170. K.A. Dill, K.M. Fiebig, and H.S. Chan. (1993) Cooperativity in Protein-Folding Kinetics. P Natl Acad Sci USA. 90: 1942-1946. 171. T.R. Weikl and K.A. Dill. (2003) Folding rates and low-entropy-loss routes of two-state proteins. J Mol Biol. 329: 585-598. 172. T.R. Weikl and K.A. Dill. (2003) Folding kinetics of two-state proteins: Effect of circularization, permutation, and crosslinks. J Mol Biol. 332: 953-963. 173. T.R. Weikl, M. Palassini, and K.A. Dill. (2004) Cooperativity in two-state protein folding kinetics. Protein Sci. 13: 822-829. 174. M.J. Sippl. (1993) Recognition of Errors in 3-Dimensional Structures of Proteins. Proteins. 17: 355-362. 175. R. Luthy, J.U. Bowie, and D. Eisenberg. (1992) Assessment of Protein Models with 3-Dimensional Profiles. Nature. 356: 83-85. 176. W. Huisinga, C. Best, R. Roitzsch, C. Schutte, and F. Cordes. (1999) From simulation data to conformational ensembles: Structure and dynamics-based methods. J Comput Chem. 20: 1760-1774.

PHS 398/2590 (Rev. 09/04) Page 31 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 177. S. Kumar, D. Bouzida, R.H. Swendsen, P.A. Kollman, and J.M. Rosenberg. (1992) The Weighted Histogram Analysis Method for Free-Energy Calculations on Biomolecules .1. The Method. J Comput Chem. 13: 1011-1021. 178. F. Melo, R. Sanchez, and A. Sali, Statistical potentials for fold assessment, in Protein Sci. 2002. p. 430- 48. 179. A. Vagin and A. Teplyakov. (1997) MOLREP: an automated program for molecular replacement. J Appl Cryst. 30: 1022-1025. 180. P.D. Adams, K. Gopal, R.W. Grosse-Kunstleve, L.W. Hung, T.R. Ioerger, A.J. McCoy, N.W. Moriarty, R.K. Pai, R.J. Read, T.D. Romo, J.C. Sacchettini, N.K. Sauter, L.C. Storoni, and T.C. Terwilliger. (2004) Recent developments in the PHENIX software for automated crystallographic structure determination. J Synchrotron Radiat. 11: 53-5. 181. M.J. Sippl. (1993) Recognition of errors in three-dimensional structures of proteins. Proteins. 17: 355- 62. 182. D. Eisenberg, R. Luthy, and J.U. Bowie. (1997) VERIFY3D: assessment of protein models with three- dimensional profiles. Methods Enzymol. 277: 396-404. 183. W. Braun and N. Go. (1985) Calculation of protein conformations by proton-proton distance constraints. A new efficient algorithm. J Mol Biol. 186: 611-26. 184. J.S. Jiang and A.T. Brunger. (1994) Protein hydration observed by X-ray diffraction. Solvation properties of penicillopepsin and neuraminidase crystal structures. J Mol Biol. 243: 100-15. 185. G.J. Kleywegt and T.A. Jones. (1993) Masks made easy. CCP4/ESF-EACBM Newslett. Protein Crystallogr. 28: 56-59. 186. V.A. Eyrich, M.A. Marti-Renom, D. Przybylski, M.S. Madhusudhan, A. Fiser, F. Pazos, A. Valencia, A. Sali, and B. Rost. (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics. 17: 1242-1243. 187. I.Y.Y. Koh, V.A. Eyrich, M.A. Marti-Renom, D. Przybylski, M.S. Madhusudhan, E. Narayanan, O. Grana, F. Pazos, A. Valencia, A. Sali, and B. Rost. (2003) EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 31. 188. L.F. Hennequin, A.P. Thomas, C. Johnstone, E.S. Stokes, P.A. Pl inverted question marke, J.J. Lohmann, D.J. Ogilvie, M. Dukes, S.R. Wedge, J.O. Curwen, J. Kendrew, and C. Lambert-van der Brempt. (1999) Design and structure-activity relationship of a new class of potent VEGF receptor tyrosine kinase inhibitors. J Med Chem. 42: 5369-89. 189. L.F. Hennequin, E.S. Stokes, A.P. Thomas, C. Johnstone, P.A. Ple, D.J. Ogilvie, M. Dukes, S.R. Wedge, J. Kendrew, and J.O. Curwen. (2002) Novel 4-anilinoquinazolines with C-7 basic side chains: design and structure activity relationship of a series of potent, orally active, VEGF receptor tyrosine kinase inhibitors. J Med Chem. 45: 1300-12. 190. L. Sun, N. Tran, C. Liang, F. Tang, A. Rice, R. Schreck, K. Waltz, L.K. Shawver, G. McMahon, and C. Tang. (1999) Design, synthesis, and evaluations of substituted 3-[(3- or 4-carboxyethylpyrrol-2- yl)methylidenyl]indolin-2-ones as inhibitors of VEGF, FGF, and PDGF receptor tyrosine kinases. J Med Chem. 42: 5120-30. 191. G. Bou, A. Oliver, M. Ojeda, C. Monzon, and J. Martinez-Beltran. (2000) Molecular characterization of FOX-4, a new AmpC-type plasmid-mediated beta-lactamase from an Escherichia coli strain isolated in Spain. Antimicrob Agents Chemother. 44: 2549-53. 192. A. Patera, L.C. Blaszczak, and B.K. Shoichet. (2000) Crystal Structures of Substrate and Inhibitor Complexes with AmpC b-lactamase: Possible Implications for Substrate-Assisted Catalysis. J. Am. Chem. Soc. 122: 10504-10512. 193. R.A. Powers, E. Caselli, P.J. Focia, F. Prati, and B.K. Shoichet. (2001) Structures of ceftazidime and its transition-state analogue in complex with AmpC beta-lactamase: implications for resistance mutations and inhibitor design. Biochemistry. 40: 9207-14. 194. B.M. Beadle, I. Trehan, P. Focia, and B.K. Shoichet. (2002) Structural Milestones in the Pathway of an Amide Hydrolase: substrate, acyl, and product complexes of cephalothin with AmpC b-lactamase. Structure. 10: 413-424. 195. F. Morandi, E. Caselli, S. Morandi, P.J. Focia, J. Blazquez, B.K. Shoichet, and F. Prati. (2003) Nanomolar inhibitors of AmpC beta-lactamase. J Am Chem Soc. 125: 685-95.

PHS 398/2590 (Rev. 09/04) Page 32 Continuation Format Page Principal Investigator/Program Director (Last, First, Middle): Jacobson, Matthew P. 196. D. Tondi, F. Morandi, R. Bonnet, M.P. Costi, and B.K. Shoichet. (2005) Optimizing a novel lead: structure-based design, synthesis and testing of non-ß-lactam inhibitors of ß-lactamase. submitted. 197. R.A. Powers, F. Morandi, and B.K. Shoichet. (2002) Structure-based discovery of a novel, noncovalent inhibitor of AmpC beta-lactamase. Structure (Camb). 10: 1013-23.

PHS 398/2590 (Rev. 09/04) Page 33 Continuation Format Page