
Core-directed design
Derek N Woolfson

For various reasons, it seems sensible to redesign or design from the inside out. Past approaches in this field have involved iterations of mutagenesis and characterisation to ‘evolve’ designs. Increasingly, combinatorial approaches are being taken to select ‘fit’ sequences from libraries of variant proteins. In particular, in silico methods have been used to good effect. More recently, experimental methods have been developed and improved. We are now in a position to redesign stability and function into natural protein frameworks confidently and to attempt de novo designs for more ambitious targets.

Addresses
Centre for Biomolecular Design and Drug Development, School of Biological Sciences, University of Sussex, Falmer BN1 9QG, UK; e-mail: [email protected]

Current Opinion in Structural Biology 2001, 11:464–471
0959-440X/01/$ - see front matter
© 2001 Elsevier Science Ltd. All rights reserved.

Abbreviations
PDB Protein Data Bank
TIM triose phosphate isomerase
WT wild-type

Introduction
The organisation of a hydrophobic core provides the main driving force for protein folding and stabilisation and, in some cases, native-state specification. It seems reasonable, therefore, to design new proteins from the inside out. Increasingly, protein designers are taking this approach, which I refer to as core-directed design.

In certain cases, core-directed design is protein design heaven; for example, stability and specificity can be built into simple coiled-coil structures using a few knowledge-based rules that can be applied without involving computers [1]. Unfortunately, this understanding does not extend to globular proteins. Early attempts to design globular proteins took iterative approaches, in which related sequences were sequentially tested for stability and structural uniqueness, effectively evolving the designs. Now, combinatorial methods are being applied. In these approaches, many core sequences that are potentially compatible with a target structure are tested simultaneously and winners selected. Selection can be done in silico or via wet experiments; the latter are generally referred to as directed-evolution methods. The main computational methods are amply reviewed elsewhere [2,3]. This review focuses largely on recent experimental approaches to the problem of core-directed design.

Rule-based or wholly rational design and special cases
It is convenient to consider two broad approaches in protein redesign and de novo design; namely, wholly rational design and combinatorial design. I discuss the latter in detail later.

By wholly rational design, I mean the direct application of sequence-to-structure rules to achieve a specific target structure. Preferably, the rules should be understood in physicochemical terms. The rules may be positive, that is, to design towards the target, or negative, to disfavour and design away from alternative structures [4–6].

Not surprisingly, current successes in wholly rational approaches are limited to special cases. From the perspective of core-directed design, the best examples are the rules for oligomer-state selection in coiled coils, which are two-, three-, four- or five-stranded helical bundles. The seminal studies of Harbury et al. [7] on mutants of the leucine zipper show that the oligomer state can be distinguished (at least between parallel dimer, trimer and tetramer) using appropriate combinations of isoleucine and leucine at the a and d positions of the abcdefg (heptad) sequence repeat. The resulting rules have been, and doubtless will continue to be, improved [1,8,9]. Nonetheless, the current rules provide clear guidelines for constructing specified coiled-coil oligomers and form the basis of more ambitious designs [1,10•,11–14].

Iterative experimental design processes
For non-coiled-coil proteins, design is not so prescribed and alternative routes to correctly folded, stable structures are needed. One approach is to design iteratively, adding small positive and negative design features step-by-step and testing the intermediates experimentally.

The design of α3D illustrates this process [15]. α3D is a single-chain protein designed to form a mixed parallel/antiparallel three-helix bundle. The starting point is Coil-Ser, a previously described three-stranded coiled coil [16]. This is used as a template to design α3C [17], which has shortened helices, helix-capping motifs and a repacked hydrophobic core to introduce heterogeneity and disfavour coiled-coil-type packing. The NMR structure of α3D (a variant of α3C) agrees reasonably with the design model. A noteworthy point is the use of negative design: interhelix electrostatic interactions are used to orientate the helices in an anticlockwise manner and disfavour alternative topologies; this principle also works in a canonical coiled-coil system [10•,18]. The iterative redesign and characterisation of α3D is ongoing [19]. Incidentally, Coil-Ser has been used as a template to make another single-chain three-helix bundle [20].

Many iterative designs and redesigns have focused on four-helix bundles, which offer a step up in complexity from coiled coils. The classic example is DeGrado’s evolution of four-helix-bundle designs, which was recently reviewed by Hill et al. [6].

Core-directed protein design Woolfson 465
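As a small illustration of the heptad (abcdefg) bookkeeping that underlies the coiled-coil rules discussed earlier in this section, the following Python sketch labels each residue of a peptide with its heptad position and pulls out the a and d (core) sites. It is only a sketch under stated assumptions: the function names, the toy peptide and the assumption that the register of the first residue is already known are all illustrative and are not taken from the studies cited above.

```python
HEPTAD = "abcdefg"


def heptad_positions(sequence, start_register="a"):
    """Pair every residue with its heptad position.

    start_register is the heptad position of the first residue; it is
    assumed to be known in advance (a hypothetical input for this sketch).
    """
    offset = HEPTAD.index(start_register)
    return [(residue, HEPTAD[(offset + i) % 7])
            for i, residue in enumerate(sequence)]


def core_residues(sequence, start_register="a"):
    """Return the residues found at the a sites and at the d sites."""
    labelled = heptad_positions(sequence, start_register)
    a_sites = "".join(res for res, pos in labelled if pos == "a")
    d_sites = "".join(res for res, pos in labelled if pos == "d")
    return a_sites, d_sites


# Hypothetical two-heptad peptide, used only to exercise the functions.
toy_peptide = "IEALKQKIEALKQK"
a_sites, d_sites = core_residues(toy_peptide)
print("a sites:", a_sites)
print("d sites:", d_sites)
```

Listing the a and d residues in this way is the first step in checking a sequence against any rule that is phrased in terms of the core positions of the repeat.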

Figure 1

Orthogonal views of various four-helix-bundle structures. (a) WT ROP (PDB code 1rop; [60]). (b) The Ala2Ile2-6 core mutant of ROP (PDB code 1f4m; [27•]). (c) The A31P mutant of ROP (PDB code 1b6q; [28•]). (d) The de novo design α2D (PDB code 1qp6; [29]). Different chains of each structure are coloured blue and red; N termini are highlighted black.

Gibney et al. [21] describe an iterative approach to map out sequence space and the associated free-energy landscape of a previously designed four-helix-bundle maquette [22,23]. This is done on a modest scale, limited by the level of characterisation undertaken. The parent peptide has histidine at two a sites to promote haem binding and three leucines at d sites. The apoform is unstable to guanidine denaturation and displays structural heterogeneity. In an attempt to improve this design, single, double and triple mutants were made to introduce isoleucine, valine and phenylalanine at the d positions. The most-promising mutants (in terms of stability and structural uniqueness) at each stage are taken to the next stage; the first and second iterations returned improved designs, but the third iteration was disappointing as the peptides lost conformational uniqueness. The maquette is being used as a template for the iterative redesign of haem-binding pockets [24].

Interesting but cautionary tales from the repacking of four-helix bundles
As highlighted for the coiled coils [1,7,25], repacking a hydrophobic core can have consequences other than changing stability and conformational heterogeneity. Four-helix bundles also alter in response to mutation.

The four-helix bundle ROP is a dimer of an antiparallel helical hairpin (Figure 1a). The active RNA-binding site is on the face formed by the two copies of helix 1, which are antiparallel; this provides a convenient probe for the native structure. The structure has a core of a and d layers from coiled-coil-like heptad repeats. Regan’s group [26] has systematically repacked this core. For example, in Ala2Leu2-6, the middle six a and d sites are exchanged for alanine and leucine, respectively. Theoretically, this mutant maintains the wild-type (WT) core volume. Consistent with this, the mutant is active but thermally stabilised. Willis et al. [27•] characterise a related mutant, Ala2Ile2-6, which is also thermally stabilised but inactive. The crystal structure explains the loss of activity: compared with the WT structure, one protomer in the mutant is rotated 180° around the dimer interface (Figure 1b). In this new topology, the two copies of helix 1 are juxtaposed diagonally rather than adjacent, which splits the active face.

466 Engineering and design

A more perplexing structural rearrangement occurs in the ROP mutant A31P [28•], which is a helical dimer with reduced stability but some activity. The crystal structure reveals a remarkable architectural transformation to a ‘bisecting U’ motif, in which the helical hairpins intercalate to form a right-handed four-helix bundle (Figure 1c). The term ‘bisecting U’ is introduced by Hill to describe the structure of the designed four-helix bundle α2D (Figure 1d) [6,29]. Presumably, ROP A31P retains activity because the two copies of helix 1 remain adjacent, although they are parallel and not antiparallel as in the WT structure. This dramatic rearrangement is particularly worrisome because it results from a single amino acid change to the ROP sequence.

Thus, simple rules like those for packing coiled coils are not as forthcoming in ROP and similar systems. This probably reflects the fact that, firstly, ROP is more complicated than the leucine zippers on which the coiled-coil studies are based, as ROP is twice the size and has less-regular sequence repeats. Secondly, alternative four-helix-bundle topologies and architectures are possibly more similar in energy than the alternative oligomer states of coiled coils [6].

Combinatorial design and the general case
How can one design without specific rules to relate sequence and structure? The answer is to take a combinatorial approach. This can be done either in silico or using wet experiments. Both processes involve selecting fit variants from libraries of sequences for the targeted structural scaffold.

Computational approaches in combinatorial design and redesign
Various computational methods have been developed for combinatorial core redesign and design. In essence, sequence and (to differing extents) conformational spaces are searched using methods such as simulated annealing, dead-end elimination, Metropolis Monte Carlo sampling and genetic algorithms. Sequences are then scored on the basis of the physicochemical attributes of proteins, such as van der Waals contacts, solvation terms, secondary structure propensities, electrostatic energies and hydrogen-bond potentials, which are parameterised with varying degrees of approximation and sophistication. The developments and successes in this area have been considerable. The field is extremely well reviewed elsewhere [2,3] and, with a few exceptions, I will not dwell on it.

New parameters and algorithms for in silico core-directed design
In silico design requires scoring functions to rank the compatibility of the sequences searched with the target structure. The functions must be quick to implement and, therefore, must make assumptions about interactions within proteins. The development of parameters that make scoring functions faster and/or more realistic, therefore, has clear benefits for protein design.

Parameterising the part of the hydrophobic interaction that stabilises protein structures is one issue. In an attempt to understand the energetics of core deletions from protein-engineering studies, Vlassi et al. [30•] introduce ∆nh. nh reflects the number of methylene and methyl contacts made within 6 Å of the site of mutation, and it is calculated from high-resolution structures of the WT and mutant proteins as the weighted sum of the atom–atom contacts within that region. Two weightings are included for distance and solvent accessibility, which dampen contributions from distantly spaced atoms and from surface-exposed residues, respectively. ∆nh is the difference between the nh of the mutant and WT structures. Given the limited experimental data available, ∆∆G and ∆nh correlate reasonably, and the slopes of ∆∆G versus ∆nh plots essentially provide values for the energy cost per contact lost. In this respect, ∆nh may prove useful in quickly assessing the relative quality of in silico generated design models. Similar, though less-sophisticated parameters have been introduced by others, which may also be useful in this regard [31].

What about assessing structural specificity? Fleming and Richards [32•] apply an occluded surface algorithm to calculate packing efficiencies in high-resolution protein structures. A striking approximately 20% variation in packing parameters is noted across these structures. Briefly, packing efficiency increases with protein size, α-helix content and content of aromatic and small residues. The higher packing density of α-helical structures is a result of good intrahelix packing, primarily between backbone atoms. In terms of intersecondary structure packing, β strand–β strand interactions show the highest occluded surfaces, whereas β strand–α helix interactions are poorer and only marginally better than intrastrand packing efficiencies; this fits neatly with recent experimental findings [33••]. Finally, the packing efficiencies of proteins from the same structural family are similar (which presumably reflects the sum of the above correlations) and the authors suggest that these calculations will be of use in benchmarking and validating homology and other models. Presumably, this includes design models.

Jiang et al. [34••] present a new algorithm (CORE) for in silico combinatorial design. Hydrophobic core residues are mutated on fixed backbone structures. Metropolis-driven simulated annealing and Monte Carlo sampling are combined with a novel scoring function to find sequence and rotamer combinations for potentially hyperstable proteins. The scoring function selects combinations with the best compromise of minimal atom–atom clashes, maximal burial of hydrocarbon and lowest sidechain conformational entropy. The assessment of steric compatibility is straightforward and does not evaluate van der Waals interactions explicitly. The thermodynamic term ∆CP, which is an experimental parameter that reflects the amount of hydrophobic surface buried during protein folding, is implemented to drive towards sequences with maximum burial of hydrocarbon. The entropy term is introduced to select combinations of residues that ‘freeze out’ more conformations upon packing. The latter seems counterintuitive, but is rationalised in that structurally unique and cooperatively folded states require relatively fixed sidechains to make specific interactions. In a sense, the entropy term is an attempt to parameterise negative design by selecting complementary fits. The disadvantage of the current version of CORE is that the polar residue and backbone contexts of the target are constrained, which may explain why it returns core sequences that are closely related to the WT. This aside, CORE’s ability to cope with large structures is impressive and its potential to relate in silico and experimental parameters is promising. The group has used CORE to design a hyperthermophilic protein [35].

Experimental approaches in combinatorial design and redesign
The first approaches to the combinatorial redesign of protein cores were experimental [36,37]. These used function-based selections; however, true experimental counterparts of the aforementioned in silico methods require selections that do not rely on function, but reflect only structure and stability. Such methods would complement in silico methods and find applications, firstly, in optimising stability under specific conditions, secondly, in de novo design or redesign where selectable functions are not available and, thirdly, in establishing sequence-to-structure/stability rules where structure/stability and structure/function must be uncoupled.

Three groups have succeeded in selecting stable proteins without a functional screen or selection [38,39,40•]. All combine phage display and proteolysis to recover stable proteins from mutant libraries. The underlying principle is straightforward: poorly folded mutants are proteolysed more rapidly than competently folded and stable variants. But how are stably folded (intact) variants rescued? In phage display, a target gene is fused to that for a phage coat protein. This leads to display of the target on the phage surface, where it can be subjected to various selections. Because the gene for the fusion is encased within the phage, phenotype and genotype are linked, and selected proteins can be identified by DNA sequencing. This system relies on the compliance of Escherichia coli to become infected by and then to propagate the phage. Traditional phage-display selection relies on the displayed proteins binding something, which is a function. The selection of stable proteins without resorting to functional selection has been achieved in two ways. Kristensen and Winter [38] and Sieber et al. [39] employ selectively infective phage (SIP) [41], whereas my colleagues and I use an alternative approach, which involves more traditional protein–phage fusions, to select stable target proteins (Figure 2) [40•].

Figure 2

Phage-display selection of stable, folded proteins. (a) Selectively infective phage (SIP) takes advantage of the three-domain structure of the minor coat protein (g3p) of phage. The C-terminal domain anchors the protein in the viral coat, whereas the N-terminal domains are responsible for binding and infection in E. coli. Cloning a library into the flexible linker before the C-terminal domain allows protease-based selection because proteolysis of the insert removes the N-terminal domains and prevents infection in E. coli. This selects against unstable inserts [38,39]. (b) Alternatively, an uninterrupted g3p can be used as follows. His-tag–target–g3p–phage allows intact protein–phage fusions to be tethered to nickel-coated surfaces, which can be washed with protease to remove phage harbouring unstable linkers [40•]. In this case, selection can be monitored directly by surface plasmon resonance in BIACORE, which allows many conditions to be tested quickly and individual clones to be compared. Alternatively, Ni-NTA-agarose beads can be used for large-scale selections. The domains of g3p are represented by shaded ovals and the targeted protein inserts are represented by rectangles.

The proof-of-principle studies for these methods use a variety of control inserts and/or relatively small protein libraries. All three groups have now presented more ambitious applications of the new technology: we rescued stable ubiquitin variants from a library of hydrophobic core mutants [42••]; Riechmann and Winter [43•] generated stable protein chimeras by complementing half of the CspA protein with fragments generated from genomic E. coli DNA; and Martin et al. [44•] described the selection of hyperstable variants of a mesophilic CspB.

Oil-drop versus jigsaw-puzzle models for core packing
The issue of whether complementary core packing is necessary for folding to a stable, unique state has also been addressed during the period of review. The influential work of DeGrado and co-workers [6] on the evolution of designed four-helix bundles emphasises that achieving stability and structural uniqueness are distinct; some of the earlier designs were stable to denaturation, but showed structural heterogeneity. Achieving structural uniqueness using negative design is now recognised as key in protein design. Is negative design necessary, however, in hydrophobic core design or can structural specificity be achieved elsewhere in the sequence and structure?

There are two extreme models for core packing. In the oil-drop model, partitioning of hydrophobic and polar residues is paramount and the precise fit in the core secondary. In the jigsaw-puzzle model, however, shape and chemical complementarity of residues in the core are all-important in defining the structural uniqueness. These concepts and models are more fully reviewed elsewhere [2,4].

Until recently, combinatorial mutagenesis studies of protein cores lent force to the oil-drop model; within the restraints of maintaining ballpark hydrophobicity and volume, the cores of λ-repressor [36], barnase [37] and T4 lysozyme [45] tolerate amino-acid substitutions. Recent experimental, bioinformatics and theoretical work, however, suggests that, for other proteins and even for groups of structurally related proteins, this is not necessarily the case.

Evidence highlighting the need for specific constellations of residues within a protein core comes from combinatorial mutagenesis and selection of ubiquitin [42••]. A library has been created in which the first eight core positions are substituted with combinations of phenylalanine, isoleucine, leucine, methionine and valine. (Multiple amino acids can be encoded at a single position in a protein by introducing degenerate codons into the synthetic oligonucleotides used for mutagenesis. For example, {AGT}T{CG} encodes the hydrophobic residues phenylalanine, isoleucine, leucine, methionine and valine in the ratio 1:1:1:1:2, whereas {ACG}A{ACGT} encodes the polar subset aspartic acid, glutamic acid, histidine, lysine, asparagine and glutamine in equal numbers.) The stable ubiquitin selectants show three surprises. Firstly, most have only two, three or four differences from WT, whereas random selection would have mostly returned sequences with seven substitutions. Secondly, their consensus sequence differs from WT at only one site (V26L). Thirdly, none are as stable as WT. Thus, after selection, the library becomes more like WT, although WT stability is not matched. These results concur with earlier computational studies that use design algorithms that either repack the ubiquitin core [46,47] or create simulated sequences for ubiquitin-like architectures [48]. Together, these studies suggest that specific constellations of residues, or folding nuclei, may be important for the ubiquitin-like superfamily.

More experimental evidence comes from excellent work by Silverman et al. [33••]. This describes a combinatorial dissection of structural residues important in defining and maintaining an active conformation of triose phosphate isomerase (TIM), the archetypal (β/α)₈ barrel. Effectively, this structure has two hydrophobic cores. The main conclusion regarding core packing is that the two cores react differently to mutation: the core between the outer α helices and the inner β barrel is tolerant, whereas the inner core of the β barrel is extremely sensitive.

Where do these studies leave the protein designer? Recent computational studies on other systems [49–51] lend support to the experimental work on ubiquitin and TIM; that is, for certain proteins, the jigsaw-puzzle model for core packing might be appropriate. In one respect, this is encouraging. If stable sequences do cluster in sequence space, such regions might be homed in on or otherwise targeted in computational and experimental combinatorial design. Indeed, such approaches are underway [33••,50,52]. On the other hand, if the stable regions of sequence space are highly focused, locating them could prove difficult. The problem could be particularly difficult for true de novo design of novel structures. Nonetheless, Kuhlman and Baker [51] inject some optimism here: sequence simulations using NMR-derived templates (compared with using X-ray structures) illustrate that introducing backbone flexibility widens the net of sequences compatible with the target. Furthermore, even for ubiquitin sequences with half the core sites altered, structures that do fold correctly can be selected computationally and experimentally [42••,46,53].

Experimental approaches in combinatorial de novo design
The selection studies described above use natural scaffolds; they are redesigns. What is the scope for the design of novel structures using combinatorial approaches?

Keefe and Szostak [54••] derive ATP-binding peptides from a library of 80-residue randomers displayed on mRNA. Cycles of selection and a round of mutagenesis are combined to increase the ATP-binding fraction of the library. Many of the selectants have a similar 45-residue hub with a CXXC zinc-binding motif, which is responsible for activity. Unfortunately, none of the selectants are isolated or characterised in structural detail. The authors suggest that approximately 1 in 10¹¹ randomly generated sequences should have some targetable function and, although this bodes well for directed evolution approaches, it highlights the gargantuan task facing de novo design.

Keefe and Szostak’s study is wonderful and I respect their views; however, I feel that more targeted approaches are also needed. The difficulties here are, firstly, to choose or design a starting template shrewdly and, secondly, to restrict sequence space to permit an experiment while allowing enough freedom to encounter stable proteins.

Binary patterns of hydrophobic and polar residues (HP patterns) simplify protein sequences, and offer one approach to deriving templates and limiting amino acid usage [55].

Roy and Hecht [56••] use HP patterns to make a library of potential four-helix-bundle structures: amphipathic helical segments, with PHPPHHPPHPPHHP patterns of polar (lysine, histidine, glutamate, glutamine, aspartate and asparagine) and hydrophobic (phenylalanine, isoleucine, leucine, methionine and valine) residues generated using the degenerate codons described above, are linked by glycine, proline and polar-based turns. Although the sequences cannot all be sampled, the library size is potentially 5 × 10⁴¹, which is approximately 54 orders of magnitude smaller than a completely random library. In this study, proteins are not ‘actively selected’; monoclonals are simply expressed. Most of the 26 variants analysed are monomeric and half of them unfold with sigmoidal thermal unfolding curves and measurable enthalpies. The authors argue that this should be true for the majority of the library, although it is likely that some in vivo selection for competently folded, expressible and nontoxic proteins is at play. By comparison, sequences generated from truly random libraries are generally not so well behaved.

A cautionary tale for this approach comes from the same group. To promote amphipathic β-structures, West et al. [57•] generate semi-random sequences with six segments of alternating HP patterns separated by turn-promoting tetrapeptides. Several of the expressed proteins reversibly form amyloid-like β-structured fibres. The group follow this up with an analysis of natural peptide sequences and find that simple alternating HP patterns are not favoured, but are actually under-represented [58]. They argue that Nature disfavours alternating HP sequences because of the possibility of amyloidogenesis.

How does one assign an HP pattern to a more formal model for a design target? Marshall and Mayo [59•] introduce Genclass, which automatically defines a binary (HP) pattern for a target structure. Based on the solvent accessibility of a generic sidechain placed at a position in a sequence, Genclass assigns the site as buried, surface or boundary. Appropriate cut-offs are gleaned from known structures; for example, approximately 20 Å² predicts approximately 75% of the HP patterns. The cut-off has also been optimised experimentally through redesign cycles on the homeodomain fold. Based on thermal stability and correct folding, the best designs equate to those HP patterns that would be selected using a cut-off of approximately 40 Å²; in effect, more of the boundary sites are made hydrophobic. The results show that optimisation of HP patterns can improve protein design stability considerably. The discrepancy with natural mesophilic sequences possibly reflects Nature’s lack of interest in superstable proteins and the potential role of surface hydrocarbon (or buried polar residues) in specifying structure and function.

An alternative method for assigning HP patterns is presented by Silverman et al. [33••]. These workers used sequence alignments to class residues as phylogenetically hydrophobic, polar, conserved or variable. The classes are used to guide the design of libraries for combinatorial experiments on TIM. The selected (functional) sequences indicate that approximately five times as many of the phylogenetically hydrophobic sites show amino-acid preferences compared with the polar sites.

Conclusions
Good progress is being made in experimental core-directed design to complement in silico approaches. Iterative approaches are being formalised as a continued means to test design principles and hone specific designs. Interesting, though cautionary, results are still emerging from the redesign and design of four-helix-bundle structures. Protein engineering experiments continue to be rationalised to relate stability changes to new structural parameters, which could be of value in improving scoring functions for in silico design. The most encouraging signs are in experimental combinatorial approaches. Here, methods are being developed to recover stable and correctly folded proteins from combinatorial libraries without functional selections or screens. In addition, because of the vastness of sequence and structural space, the design of libraries for such work is being rationalised and focused. In short, we are in a strong position to redesign stability into existing protein frameworks with confidence and we are better placed to tackle true de novo design of novel sequences and structures. The difficulties here will be to make sensible choices for design templates; to guide these using positive and negative design principles; and to make focused combinatorial libraries using reduced amino-acid alphabets, which, nonetheless, contain sequences compatible with a competent structure. The next step will be to append and tailor functions onto such structures.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Kohn WD, Hodges RS: De novo design of alpha-helical coiled coils and bundles: models for the development of protein-design principles. Trends Biotechnol 1998, 16:379-389.
2. Lazar GA, Handel TM: Hydrophobic core packing and protein design. Curr Opin Chem Biol 1998, 2:675-679.
3. Street AG, Mayo SL: Computational protein design. Structure 1999, 7:R105-R109.
4. Beasley JR, Hecht MH: Protein design: the choice of de novo sequences. J Biol Chem 1997, 272:2031-2034.
5. Hellinga HW: Rational protein design: combining theory and experiment. Proc Natl Acad Sci USA 1997, 94:10015-10017.
6. Hill RB, Raleigh DP, Lombardi A, DeGrado WF: De novo design of helical bundles as models for understanding protein folding and function. Accounts Chem Res 2000, 33:745-754.
7. Harbury PB, Zhang T, Kim PS, Alber T: A switch between 2-stranded, 3-stranded and 4-stranded coiled coils in GCN4 leucine-zipper mutants. Science 1993, 262:1401-1407.
8. Woolfson DN, Alber T: Predicting oligomerization states of coiled coils. Protein Sci 1995, 4:1596-1607.
9. Walshaw J, Woolfson DN: SOCKET: a program for identifying and analysing coiled-coil motifs within protein structures. J Mol Biol 2001, 307:1427-1450.

10. • Nautiyal S, Alber T: Crystal structure of a designed, thermostable, heterotrimeric coiled coil. Protein Sci 1999, 8:84-90.
    The authors present a crystal structure that confirms a previous design of a novel heterotrimeric coiled coil (see [18]) that employs positive and negative design features to specifically orient the helices.
11. Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS: High-resolution protein design with backbone freedom. Science 1998, 282:1462-1467.
12. Sharma VA, Logan J, King DS, White R, Alber T: Sequence-based design of a peptide probe for the APC tumor suppressor protein. Curr Biol 1998, 8:823-830.
13. Pandya MJ, Spooner GM, Sunde M, Thorpe JR, Rodger A, Woolfson DN: Sticky-end assembly of a designed peptide fiber provides insight into protein fibrillogenesis. Biochemistry 2000, 39:8728-8734.
14. Ogihara NL, Ghirlanda G, Bryson JW, Gingery M, DeGrado WF, Eisenberg D: Design of three-dimensional domain-swapped dimers and fibrous oligomers. Proc Natl Acad Sci USA 2001, 98:1404-1409.
15. Walsh STR, Cheng H, Bryson JW, Roder H, DeGrado WF: Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc Natl Acad Sci USA 1999, 96:5486-5491.
16. Lovejoy B, Choe S, Cascio D, McRorie DK, Degrado WF, Eisenberg D: Crystal-structure of a synthetic triple-stranded alpha-helical bundle. Science 1993, 259:1288-1293.
17. Bryson JW, Desjarlais JR, Handel TM, DeGrado WF: From coiled coils to small globular proteins: design of a native-like three-helix bundle. Protein Sci 1998, 7:1404-1414.
18. Nautiyal S, Woolfson DN, King DS, Alber T: A designed heterotrimeric coiled-coil. Biochemistry 1995, 34:11645-11651.
19. Walsh STR, Sukharev VI, Betz SF, Vekshin NL, DeGrado WF: Hydrophobic core malleability of a de novo designed three-helix bundle protein. J Mol Biol 2001, 305:361-373.
20. Johansson JS, Gibney BR, Skalicky JJ, Wand AJ, Dutton PL: A native-like three-alpha-helix bundle protein from structure-based redesign: a novel maquette scaffold. J Am Chem Soc 1998, 120:3881-3886.
21. Gibney BR, Rabanal F, Skalicky JJ, Wand AJ, Dutton PL: Iterative protein redesign. J Am Chem Soc 1999, 121:4952-4960.
22. Robertson DE, Farid RS, Moser CC, Urbauer JL, Mulholland SE, Pidikiti R, Lear JD, Wand AJ, Degrado WF, Dutton PL: Design and synthesis of multi-heme proteins. Nature 1994, 368:425-431.
23. Skalicky JJ, Gibney BR, Rabanal F, Urbauer RJB, Dutton PL, Wand AJ: Solution structure of a designed four-alpha-helix bundle maquette scaffold. J Am Chem Soc 1999, 121:4941-4951.
24. Gibney BR, Dutton PL: Histidine placement in de novo-designed heme proteins. Protein Sci 1999, 8:1888-1898.
25. Lupas A: Coiled coils: new structures and new functions. Trends Biochem Sci 1996, 21:375-382.
26. Munson M, Balasubramanian S, Fleming KG, Nagi AD, O’Brien R, Sturtevant JM, Regan L: What makes a protein a protein? Hydrophobic core designs that specify stability and structural properties. Protein Sci 1996, 5:1584-1593.
27. • Willis MA, Bishop B, Regan L, Brunger AT: Dramatic structural and thermodynamic consequences of repacking a protein’s hydrophobic core. Structure 2000, 8:1319-1328.
    A topological rearrangement of ROP is described that accompanies multiple core mutations within the hydrophobic core. See also [28•,29].
28. • Glykos NM, Cesareni G, Kokkinidis M: Protein plasticity to the extreme: changing the topology of a 4-alpha-helical bundle with a single amino acid substitution. Structure 1999, 7:597-603.
    A new protein architecture is described for a point mutant of ROP. See also [27•,29].
29. Hill RB, DeGrado WF: Solution structure of alpha D-2, a nativelike de novo designed protein. J Am Chem Soc 1998, 120:1138-1145.
30. • Vlassi M, Cesareni G, Kokkinidis M: A correlation between the loss of hydrophobic core packing interactions and protein stability. J Mol Biol 1999, 285:817-827.
    A new parameter is introduced for rationalising the effects on stability of making deletions in protein cores.
31. Main ERG, Fulton KF, Jackson SE: Context-dependent nature of destabilizing mutations on the stability of FKBP12. Biochemistry 1998, 37:6145-6153.
32. • Fleming PJ, Richards FM: Protein packing: dependence on protein size, secondary structure and amino acid composition. J Mol Biol 2000, 299:487-498.
    The packing efficiencies of protein structures are quantified and the origins of noted differences are explored.
33. •• Silverman JA, Balakrishnan R, Harbury PB: Reverse engineering the (beta/alpha)(8) barrel fold. Proc Natl Acad Sci USA 2001, 98:3092-3097.
    A wonderful combinatorial dissection of the residues important for specifying the TIM barrel is described. A variety of conclusions are made. Of note here is the finding that the two distinct hydrophobic cores respond differently to mutation: the outer core is permissive, whereas the inner core is less tolerant (see also [42••]).
34. •• Jiang X, Farid H, Pistor E, Farid RS: A new approach to the design of uniquely folded thermally stable proteins. Protein Sci 2000, 9:403-416.
    A new computational approach to core design is introduced. A new scoring function includes an entropy term, which effectively selects residues and rotamers that ‘freeze out’ more conformational degrees of freedom. The aim is to produce cores with complementary fits of residues.
35. Jiang X, Bishop EJ, Farid RS: A de novo designed protein with properties that characterize natural hyperthermophilic proteins. J Am Chem Soc 1997, 119:838-839.
36. Lim WA, Sauer RT: Alternative packing arrangements in the hydrophobic core of lambda-repressor. Nature 1989, 339:31-36.
37. Axe DD, Foster NW, Fersht AR: Active barnase variants with completely random hydrophobic cores. Proc Natl Acad Sci USA 1996, 93:5590-5594.
38. Kristensen P, Winter G: Proteolytic selection for protein folding using filamentous bacteriophages. Fold Des 1998, 3:321-328.
39. Sieber V, Pluckthun A, Schmid FX: Selecting proteins with improved stability by a phage-based method. Nat Biotechnol 1998, 16:955-960.
40. • Finucane MD, Tuna M, Lees JH, Woolfson DN: Core-directed protein design. I. An experimental method for selecting stable proteins from combinatorial libraries. Biochemistry 1999, 38:11604-11612.
    A phage-display selection method for rescuing stably folded proteins from combinatorial libraries (see also [38,39]).
41. Jung S, Arndt KM, Muller KM, Pluckthun A: Selectively infective phage (SIP) technology: scope and limitations. J Immunol Methods 1999, 231:93-104.
42. •• Finucane MD, Woolfson DN: Core-directed protein design. II. Rescue of a multiply mutated and destabilized variant of ubiquitin. Biochemistry 1999, 38:11613-11623.
    This paper describes the selection of stable ubiquitin variants from a library of hydrophobic mutants. The selectants show a clear consensus for the WT sequence, although none match WT stability. This provides evidence for the requirement for a restricted constellation of residues to specify and cement this particular core (see also [33••]).
43. • Riechmann L, Winter G: Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc Natl Acad Sci USA 2000, 97:10068-10073.
    This paper describes an interesting attempt to select stable protein chimeras formed by combining the N-terminal half of CspA and fragmented genomic E. coli DNA.
44. • Martin M, Sieber V, Schmid FX: In-vitro selection of highly stabilized protein variants with optimized surface. J Mol Biol 2001, 309:717-726.
    This paper provides an alternative view of stabilising proteins: hyperstable variants are selected from a library in which only surface residues of a mesophilic form of CspB are mutagenised.
45. Gassner NC, Baase WA, Matthews BW: A test of the ‘jigsaw puzzle’ model for protein folding by multiple methionine substitutions within the core of T4 lysozyme. Proc Natl Acad Sci USA 1996, 93:12155-12158.
46. Lazar GA, Desjarlais JR, Handel TM: De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997, 6:1167-1178.
47. Wernisch L, Hery S, Wodak SJ: Automatic protein design with all atom force-fields by exact and heuristic optimization. J Mol Biol 2000, 301:713-736.

48. Michnick SW, Shakhnovich E: A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies. Fold Des 1998, 3:239-251.
49. Koehl P, Levitt M: De novo protein design. I. In search of stability and specificity. J Mol Biol 1999, 293:1161-1181.
50. Koehl P, Levitt M: De novo protein design. II. Plasticity in sequence space. J Mol Biol 1999, 293:1183-1193.
51. Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA 2000, 97:10383-10388.
52. Voigt CA, Mayo SL, Arnold FH, Wang ZG: Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci USA 2001, 98:3778-3783.
53. Johnson EC, Lazar GA, Desjarlais JR, Handel TM: Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Structure 1999, 7:967-976.
54. •• Keefe AD, Szostak JW: Functional proteins from a random-sequence library. Nature 2001, 410:715-718.
    An excellent study is described in which ATP-binding polypeptides are selected from a starting library of near-random 80-mers.
55. Kamtekar S, Schiffer JM, Xiong HY, Babik JM, Hecht MH: Protein design by binary patterning of polar and nonpolar amino-acids. Science 1993, 262:1680-1685.
56. •• Roy S, Hecht MH: Cooperative thermal denaturation of proteins designed by binary patterning of polar and nonpolar amino acids. Biochemistry 2000, 39:4603-4607.
    An alternative view of core packing and protein design. Characterisations of variants expressed from a HP library for a four-helix-bundle template are described. It is estimated that about half of the proteins fold with some degree of cooperativity and independence, which is a considerable improvement on completely random libraries.
57. • West MW, Wang WX, Patterson J, Mancias JD, Beasley JR, Hecht MH: De novo amyloid proteins from designed combinatorial libraries. Proc Natl Acad Sci USA 1999, 96:11211-11216.
    A cautionary note on working with HP-patterned templates in combinatorial design. This work describes a library targeted at six-stranded β structures. Expressed variants self-assemble and reversibly form amyloid.
58. Broome BM, Hecht MH: Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol 2000, 296:961-968.
59. • Marshall SA, Mayo SL: Achieving stability and conformational specificity in designed proteins via binary patterning. J Mol Biol 2001, 305:619-631.
    A computational method for assigning HP patterns to a structural template is described. This is tested on the structures in the PDB and optimised through cycles of protein redesign.
60. Banner DW, Kokkinidis M, Tsernoglou D: Structure of the ColE1 Rop protein at 1.7 Å resolution. J Mol Biol 1987, 196:657-675.