Proteins by Design

New functional proteins are being built on advances in modeling and structure prediction

By David Baker

The Scientist, July 2006

Reprogrammed: This is the structure of the endonuclease I-MsoI computationally redesigned to target a new DNA site. The redesigned enzyme displays altered site specificity with a level of target discrimination comparable to that of wild type (see ref. 8). Such methods are currently being applied to more ambitious targets, such as disease-gene hot spots. (Image courtesy of Justin Ashworth.)

Imagine having the power to create a brand new protein – a biosensor for any small molecule, say, or a novel enzyme – on demand. It’s not pure fantasy. Computational structural biology is poised to put this power into our hands.

Along with a team of research groups around the world, we have begun designing novel proteins and folds from scratch, computing amino acid sequences that will fold to create enzymatic activities never before seen in nature. The possibilities are limited only by our imaginations: Picture an endonuclease designed to thwart malaria, molecular sensors for bioterror agents, or a vaccine that HIV is less likely to evolve around.

The mechanics of these engineering feats are closely related, perhaps not surprisingly, to their logical inverse: structure prediction. Scientists have for years tried to develop methods for predicting a protein’s structure simply from its amino acid sequence. Imagine that in the time it takes to sequence the genome of an organism, scientists could also characterize the structure of each of its proteins. This would unveil biomolecular interactions, structural homologies, functional roles, and potential drug targets that might never be found from gene sequence alone.

Although there is still a long way to go, with improvements to algorithms and increases in computing power, exciting progress is being made in both prediction and design.

STRUCTURE PREDICTION

Now-classic experiments conducted with RNAase in the early 1970s demonstrated that all the information necessary to fold a protein resides in its amino acid sequence. This suggested that protein-structure prediction could be fairly straightforward. But going from sequence to structure has proven phenomenally difficult – biology’s version of predicting the weather – at least in part because even a relatively small protein can assume a vast number of possible conformations. According to “Levinthal’s Paradox,” which assumes each amino acid can adopt just three rotational states (and that’s an underestimate), a 100-residue sequence could adopt at least 3^100 possible conformations.

To attack this problem, we have developed a computer program called ROSETTA, which has at its core a method for computing the energy of a given protein conformation. Eliminating unlikely structures that have, for instance, hydrophobic residues exposed to solvent, the program intelligently samples the total protein-folding landscape, testing perhaps a million or so possible conformations for the lowest-energy structure.

To benchmark our progress, we have since 1998 been entering ROSETTA predictions in a worldwide structure-prediction experiment called CASP (Critical Assessment of techniques for protein Structure Prediction; www.predictioncenter.org). Instituted more than a decade ago, CASP is a sort of structural biology proving ground, a community experiment/competition in which participants are asked to predict the shape of proteins whose structures have been elucidated but not yet published. The predictions are then compared to the correct structures, and a meeting is held to discuss the results and identify the most promising methods and the most important problems remaining to be solved.

No algorithm has yet produced a precisely correct structure, but ROSETTA has performed very well in these tests, and our predictions (and those of other groups) get closer every year. A highlight of the most recent event, CASP6, for instance, was the prediction by Phil Bradley in my group of a 76-residue protein to within 1.5 Angstroms of the correct structure. Phil followed up on this achievement by showing we could predict structures with this accuracy for a number of small proteins.¹

DE NOVO DESIGN

Protein design is essentially the inverse of prediction; here, we are asking for the amino acid sequence that will fold in such a way as to create a protein structure that carries out a desired function.

The field has made exciting progress designing proteins with new structures and functions. In 1998, Stephen Mayo and coworkers at the California Institute of Technology computed a novel sequence that folded into a naturally occurring zinc finger structure. In 2003, Brian Kuhlman, now a professor at the University of North Carolina, Chapel Hill, and Gautam Dantas in my group went a step further, using ROSETTA to design an exceptionally stable protein that has a sequence and structure unrelated to any known protein. The 93-amino acid protein was found to be monomeric and folded, and its X-ray structure lined up remarkably well with our prediction, demonstrating that modern protein-design methodology can design brand-new proteins with atomic-level accuracy.²

While the creation of completely new structures is exciting, current efforts in the protein-design field have primarily taken aim at giving existing proteins functions found elsewhere in protein chemistry. Mayo, along with Daniel Bolon, for instance, used the catalytically inactive Escherichia coli protein, thioredoxin, as a scaffold for a novel enzyme capable of catalyzing the histidine-mediated hydrolysis of p-nitrophenyl acetate.³ Bill DeGrado’s group at the University of Pennsylvania engineered a metalloenzyme site into a designed four-helix bundle protein, while Homme Hellinga of Duke University Medical Center in Durham, NC, and colleagues have used members of the E. coli periplasmic binding protein superfamily as scaffolds upon which to design new biosensors for, among other molecules, trinitrotoluene (TNT) and glucose.⁴

Remarkably, Hellinga and coworkers succeeded in coupling their newly designed biosensors to cellular signaling pathways to generate bacteria that turn blue when exposed to TNT; this work may lead to new detection methods for the land mines plaguing much of the world. More recently, Hellinga’s team successfully converted an otherwise inactive ribose-binding protein into an extremely active catalyst of the triose phosphate isomerase reaction.⁵

The next step is the important but formidable challenge of creating enzymes to catalyze chemical reactions not performed by naturally occurring proteins (see “Designing a New Catalyst”). Our group is heading up a worldwide team of researchers representing the wide range of expertise that will be required for success. The team includes the design groups of Mayo, Hellinga, and Kuhlman, among others; the computational chemistry expertise of William Jorgensen’s group at Yale; Ken Houk’s quantum chemistry group at the University of California, Los Angeles, which brings the ability to accurately compute structures of active sites optimal for stabilizing reaction transition states; the molecular evolution and catalytic antibody expertise of Don Hilvert’s group in Switzerland; and other groups with expertise ranging from computer science and physical chemistry to biochemistry and molecular biology. Although this is a tremendous challenge, with this stellar team, brought together and funded by the Defense Advanced Research Projects Agency (DARPA), I am optimistic we will see some real breakthroughs.

INTERACTION DESIGN

Beyond de novo design of catalysts, reengineering the interface of macromolecular interactions, whether between proteins or between a protein and a nucleic acid, is an important goal. Macromolecular interactions play critical regulatory and functional roles in the cell, and redesigning these could lead to the development of new drug compounds, research tools, and diagnostics. As with de novo protein design, we have made exciting progress in this area.

Lukasz Joachimiak, a graduate student in my group, and Tanja Kortemme, now a professor at the University of California, San Francisco, used as a test bed the interaction between colicin E7 (a nonspecific DNAase) and its inhibitor, immunity protein Im7. We first identified contacts on one partner in the pair that would destabilize the complex, and then identified compensatory changes we could make in the second partner to restore the interaction. In this way, we developed new protein-protein pairs that interact with each other with subnanomolar affinities, but which do not interact with their cognate wild-type partners.⁶

We are also working to design new protein-DNA interactions. In 2002, for example, working with the groups of Ray Monnat at the University of Washington and Barry Stoddard at the Fred Hutchinson Cancer Research Center in Seattle, we developed a novel endonuclease, E-DreI, by fusing domains from two separate homing enzymes, I-DmoI and I-CreI.⁷ The new enzyme has a DNA-binding specificity that is a hybrid of the two parent enzymes.

Recently, Jim Havranek in my group extended the ROSETTA design methodology to the reengineering of protein-DNA interaction specificity, and graduate student Justin Ashworth computationally redesigned the DNA-binding interface of the I-MsoI homing endonuclease to cleave a new DNA sequence.⁸ Justin’s experimental characterization of the redesigned enzyme showed that it efficiently cleaves the new site, but not the original site, and the high-resolution crystal structure confirmed the accuracy of the design (see image).

We are now using this computational design approach to try to create therapeutically useful endonucleases. Designed enzymes could be introduced into mutant cells or organisms, together with a wild-type copy of the mutant gene, to drive gene therapy, for instance. Following cleavage of the disease gene by the novel enzyme, the wild-type sequence would be used to drive DNA repair, thereby fixing the genetic defect. In collaboration with a team of researchers worldwide, we are also seeking to design new endonucleases that would inactivate the genes in mosquitoes required for the malaria parasite to survive and propagate; such engineered enzymes could play roles in malaria eradication programs.

OTHER APPLICATIONS

Protein design methodology can be useful in unanticipated ways. A number of significant human diseases, including Alzheimer’s and Parkinson’s, are associated with proteins that misfold to form amyloid fibrils. David Eisenberg’s group at UCLA made a major breakthrough last year in the understanding of how this occurs when they reported the first high-resolution structure of an amyloid-forming peptide.⁹ The study revealed a set of interactions that seem very likely to be general to most, if not all, amyloid structures. John Karanicolas in my group has been collaborating with Eisenberg’s group to try to predict the portions of proteins responsible for amyloid fiber formation. The design methodology in ROSETTA was used to identify sequences compatible with a generalized model of their amyloid structure.¹⁰

Designing a New Catalyst

In a recent demonstration of computational protein-design principles, we approached an intramolecular aldol reaction that naturally occurring enzymes do not catalyze. The aldol reaction constitutes one of the most powerful tools for the formation of carbon-carbon bonds, both in nature and in the lab. Our goal is to design a non-natural aldolase able to catalyze reactions of non-natural substrates.

Target Reaction: Hajos-Parrish-Eder-Sauer-Wiechert intramolecular aldol reaction.

[Figure: enzyme active site with the TS model; labeled residues Lys 167, Asp 102, and Lys 201]

Site Description: The proposed catalytic residues in the active site are based on three key residues – Lysine 167, Aspartate 102, and Lysine 201 – found in the native deoxyribose-phosphate aldolase (DERA). The transition state (TS) model (yellow) and optimal functional group positions of the catalytic residues (green) are calculated by quantum mechanical (QM) methods.

Protein design also has commercial implications. Xencor, a protein-engineering firm in Monrovia, Calif., for instance, has used its Protein Design Automation technology to develop XPro-1595, a dominant-negative inhibitor of tumor necrosis factor alpha, which is expected to enter clinical trials in 2006. Researchers at Sangamo Biosciences recently announced their success in repairing genes in vivo using a designed, zinc-finger-based enzyme. Biosensors, like Hellinga’s novel designs, could be used to build chips, or even to regulate an implantable artificial pancreas. And then there are the nearly limitless options in organic chemistry for enzymes capable of accelerating complicated synthesis reactions, such as that of the antimalarial drug, artemisinin.

Yet we remain constrained by computational power. The calculations required to probe and refine a protein’s structure are intensive, as they must account for all the various rotational and conformational degrees of freedom. It took 15 processor-days (3.2 GHz CPUs) to model the docking interface between two proteins for a community experiment similar to CASP, the Critical Assessment of Prediction of Interactions (CAPRI, http://capri.ebi.ac.uk). The same calculations would have taken a year on decade-old hardware. It took approximately 150 CPU-days to make the folding predictions for our 2005 Science paper.¹
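To put these numbers side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only figures quoted in this article (three states for each of 100 residues, about a million sampled conformations, 15 processor-days versus a year, 150 CPU-days). The function names and the count of 1,000 machines are illustrative assumptions, and the wall-clock estimate assumes perfect parallel scaling, which real distributed runs will not achieve.

```python
# Back-of-the-envelope arithmetic using figures quoted in the article.
# Function names and the machine count are illustrative assumptions.


def levinthal_count(states_per_residue: int, residues: int) -> int:
    """Conformations available if every residue independently picks a state."""
    return states_per_residue ** residues


def wall_clock_days(cpu_days: float, machines: int) -> float:
    """Idealized wall-clock time, assuming perfect parallel scaling."""
    return cpu_days / machines


if __name__ == "__main__":
    total = levinthal_count(3, 100)   # 3 states, 100 residues
    sampled = 1_000_000               # "a million or so" conformations tested
    print(f"Levinthal count:  {total:.3e}")
    print(f"Fraction sampled: {sampled / total:.1e}")

    # 15 processor-days now versus roughly a year on decade-old hardware.
    print(f"Implied speedup:  about {365 / 15:.0f}x")

    # 150 CPU-days spread over a hypothetical 1,000 machines.
    print(f"Wall clock:       {wall_clock_days(150, 1000):.2f} days")
```

The point of the comparison is that a million samples is a vanishing fraction of the Levinthal count, so the quality of the energy function and the sampling strategy matter far more than raw enumeration.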

ROSETTA Design: ROSETTA algorithms search for new active sites in a library of protein scaffolds and design the residues surrounding these potential active sites to further stabilize the TS model. The protein should fold in a way that places the desired catalytic groups in the active site and gives the pocket a shape complementary to the TS model.

Experimentation: This designed protein retains the desired catalytic geometry from the QM calculation and the pocket to bind the TS model. The next steps would include synthesizing the designed protein, evaluating its catalytic activity, and crystallizing it for comparison with the predicted structure. Such experiments will further test our understanding of catalytic sites.
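The scaffold search described in this sidebar can be pictured with a toy geometric example. The Python sketch below is not ROSETTA's algorithm: it is a brute-force search, under simplifying assumptions, for three residue positions in a scaffold whose pairwise distances match a target catalytic geometry. The function name, the coordinates, the target distances, and the tolerance are all hypothetical, and a real design calculation would also model side-chain conformations and energies.

```python
import math
from itertools import permutations


def find_site_placements(residue_coords, target_distances, tolerance=0.5):
    """Return ordered index triplets (i, j, k) whose pairwise distances
    (i-j, i-k, j-k) each match target_distances within tolerance."""
    placements = []
    for i, j, k in permutations(range(len(residue_coords)), 3):
        a, b, c = residue_coords[i], residue_coords[j], residue_coords[k]
        observed = (math.dist(a, b), math.dist(a, c), math.dist(b, c))
        if all(abs(o - t) <= tolerance
               for o, t in zip(observed, target_distances)):
            placements.append((i, j, k))
    return placements


# Hypothetical scaffold: one 3D point per residue position (made-up values).
scaffold = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (0.0, 4.0, 0.0),
    (10.0, 10.0, 10.0),
]

# Hypothetical target geometry for three catalytic groups (made-up values).
target = (3.0, 4.0, 5.0)

print(find_site_placements(scaffold, target))  # prints [(0, 1, 2)]
```

Ordered triplets are used because the three catalytic roles are not interchangeable; each position in the triplet stands for one specific catalytic residue.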


To enable the searching of the huge conformational and sequence spaces associated with protein-structure prediction and design, David Kim in my group has developed a distributed computing version of ROSETTA, called rosetta@home, which harnesses the collective computing power of tens of thousands of computers around the world.

At last count rosetta@home had some 67,000 users worldwide, yielding a combined 29 teraflops of computational power – good enough for eighth place on the November 2005 list of the world’s top 500 supercomputers. The resulting web of computing power is greatly increasing the rate at which we can improve the ROSETTA structure-prediction methodology, as we can test new ideas much more quickly. Currently rosetta@home users are field-testing our new methods on the just-released prediction targets for the ongoing CASP7 structure-prediction experiment, which closes August 4.

When CASP7 is completed, we will also direct our users’ computational power to important design problems, such as the development of a vaccine for HIV. The reason people have difficulty fighting off HIV is that the major viral coat protein, gp120, has many highly variable loops. Immune responses to the virus tend to be aimed at these variable regions and hence are relatively ineffective against the virus. There are, however, a few key regions of gp120 that cannot change because they are critical to virus infectivity. Bill Schief in my group, in collaboration with the groups of Peter Kwong, Rich Wyatt, and Gary Nabel at the National Institutes of Health, Leo Stamatatos at the Seattle Biomedical Research Institute, Roland Strong at the Fred Hutchinson Cancer Research Center, and Dennis Burton at Scripps Research Institute, is now designing a series of novel protein vaccines that mimic these Achilles-heel regions of the virus. Our collaborators will be testing whether antibodies made against these potential vaccines can neutralize virus infectivity.

These examples are just the beginning. I would guess the potential for protein design is nearly as vast as the diversity of biology itself. With the power to create unheard-of catalysts, improved biomolecular interactions, and genuinely useful new proteins, I consider myself privileged to be part of a field limited merely by imagination and computing power. And I am even more privileged to have had the opportunity to work with the wonderful students and postdoctoral fellows who have come through my research group, many of whom continue at their respective universities to develop the ROSETTA prediction and design methodology and apply it to problems I never would have dreamed of.

David Baker is a professor of biochemistry and a Howard Hughes Medical Institute investigator in Seattle. You can sign up for the rosetta@home project from his lab Web page, www.bakerlab.org.

References

1. P. Bradley et al., “Toward high-resolution de novo structure prediction for small proteins,” Science, 309:1868–71, 2005.
2. B. Kuhlman et al., “Design of a novel globular protein fold with atomic level accuracy,” Science, 302:1364–8, 2003.
3. D.N. Bolon, S.L. Mayo, “Enzyme-like proteins by computational design,” Proc Natl Acad Sci, 98:14274–9, 2001.
4. L.L. Looger et al., “Computational design of receptor and sensor proteins with novel functions,” Nature, 423:185–90, 2003.
5. M.A. Dwyer et al., “Computational design of a biologically active enzyme,” Science, 304:1967–71, 2004.
6. T. Kortemme et al., “Computational redesign of protein-protein interaction specificity,” Nat Struct Mol Biol, 11:371–9, 2004.
7. B.S. Chevalier et al., “Design, activity, and structure of a highly specific artificial endonuclease,” Mol Cell, 10:895–905, 2002.
8. J. Ashworth et al., “Computational redesign of endonuclease DNA binding and cleavage specificity,” Nature, in press, 2006.
9. R. Nelson et al., “Structure of the cross-β spine of amyloid-like fibrils,” Nature, 435:773–8, 2005.
10. M.J. Thompson et al., “The 3D profile method for identifying fibril-forming segments of proteins,” Proc Natl Acad Sci, 103:4074–8, 2006.
