COMMENTARY

Evolution and evolvability of proteins in the laboratory

Michael W. Deem* Departments of Bioengineering and Physics & Astronomy, Rice University, 6100 Main Street, MS 142, Houston, TX 77005-1892

he introduction of DNA shuf- fling in 1994 substantially in- creased the power of laboratory protocols to optimize Tprotein function and led to a significant increase in the number of scientists pur- suing directed protein evolution (1). The introduction of family shuffling in 1998 allowed for the homologous recombina- tion of natural diversity among families of proteins and further increased the range of protein sequence and function space that could be accessed in the lab- oratory (2). The realization that evolu- tion of truly novel protein function likely requires nonhomologous exchange of genetic material led to the introduc- tion of the THIO-ITCHY technique in 2001 (3). In this issue of PNAS, Saraf et al. (4) develop the theory behind non- homologous evolution of protein func- tion by using the information contained within the natural diversity found in a Fig. 1. In the canal of functional proteins (25, 26) lie members of a protein fold. The theory of Saraf et al. (4) predicts the probable activities of the nonhomologously engineered protein hybrids of members of protein fold family (Fig. 1). this fold. The theory developed by Saraf et al. (4) quantifies by how much a test pro- tein sequence differs in its structural application comes from the Stemmer combination to evolution? Can we pre- and chemical properties from the con- group (6) where a tetravalent vaccine dict such usefulness? Can we predict sensus sequence of a protein fold. The for dengue fever has been developed. likely recombinations (12)? Finally, can test sequences are those that arise by Dengue fever comes in four strains, and we suggest optimal treatment strategies the and recombination dynam- vaccines vainly struggle for in light of the recombination predic- ics of the THIO-ITCHY technique (3). over all strains. Most vaccines provide tions? Perhaps the most immediate ap- The first step of the procedure is to de- protection against only one or two plication of such a program would be to termine which pairs of residues in the strains, and there is currently no Food HIV dynamics and treatment (13). protein are evolutionary conserved, or and Drug Administration-approved An interesting question that the the- correlated (5), in the family of proteins vaccine against all four strains. The ory of Saraf et al. (4) might be able to with the chosen fold. It is implicitly as- Maxygen group (6) used a combination address is why recombination and CϩG sumed that many of the dominant inter- of nonhomologous exon shuffling and content are correlated. This correlation, actions are pairwise additive. A measure bioinformatics theory that identified and of the strictness of this conservation is which has been observed for over a de- enhanced crossover sites for optimal calculated. The second step of the pro- cade (14), still is unexplained. Various cedure is to determine whether the test shuffling to evolve a single protein anti- mechanisms that might lead to this cor- protein sequence also conserves the gen that provokes an antibody response relation have been posited. On the one against all four dengue strains. This vac- hand are the theories that suggest the structural and chemical properties at the ϩ pairs of residues identified to be con- cine has enormous potential importance C G bias is a result of selective pres- served. The amino acid properties for to the 2.5 billion people who live in sures that only become apparent with which conservation is examined are dengue-infected regions, of whom 100 the increased sequence-searching ability charge, volume, and hydrophobicity. million are stricken with dengue each that recombination provides. On the Through quantifying by how much the year. other hand are theories that suggest test sequence fluctuates away from the Predictive models for analyzing the CϩG content might bias recombination average properties of the protein fold, outcome of experimental rounds of se- rates, through chromatic associations relative to the natural fluctuations lection and mutation also may be useful that cause physical exposure of the within the fold ensemble, the authors in the analysis of disease evolution. At DNA, CG-biased mismatch repair, or an are able to rank the probable activity of the level of point mutation, for example, underlying bias of the associated bio- the test sequences. Although the rank- diversity within disease protein folds is chemical machinery. Most of these theo- ing was of enzymatic activity in this now a factor in protein inhibitor design ries rely on statistical analysis of DNA case, the theory is generalizable to any (7, 8). Larger-scale genetic changes such sequence data for their support. The measurable figure of merit that is corre- as recombination (9, 10) and transposi- lated with protein structure. tion (11), however, also play an impor- Evolution of better protein-based tant role in creating pathogen diversity. See companion article on page 4142. therapeutics is one example of how this Quantitative theories can answer the *E-mail: [email protected]. technology can be used. An interesting following questions. How useful is re- © 2004 by The National Academy of Sciences of the USA

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0400475101 PNAS ͉ March 23, 2004 ͉ vol. 101 ͉ no. 12 ͉ 3997–3998 Downloaded by guest on September 24, 2021 sequence-level theory of Saraf et al. is much lower within annotated domains lead to fewer clashes upon recombina- might help to settle this question di- than random chance would dictate (19). tion than would be expected by chance? rectly, at least in the in vitro setting. That is, the process of alternative splic- One interesting feature of crossover- The theoretical models of Saraf et al. ing tends to insert or delete entire pro- type protein evolution experiments, (4) also may be used to examine such tein domains more frequently, and which the theory of Saraf et al. (4) re- basic questions as how diversity and disrupt protein domains less frequently, produces, is a characteristic V shape in evolvability arise in natural systems. the activity as a function of crossover Within the commu- position. This characteristic shape is a nity, there is rigid resistance to the Sequence-level reflection of the typical reduction in concept that evolvability is a selectable activity as the evolved sequences be- trait: because evolvability is a character- theory may elucidate come more distinct from the parent istic of the future, causality would seem sequences: a crossover in the middle to prevent its selection. Selection, how- how diversity and creates maximal distinction on average. This result points toward the need to ever, operates at the group level. In- develop methods to search the protein deed, the mechanism for the evolution evolvability arise in sequence space in a nonrandom way. By and maintenance of adaptability traits protein families. biasing the search of protein sequence is population-based and requires a space with what we know about protein dynamic environment. The general structure, it is possible to make large framework relating evolvability and en- jumps in sequence space between func- than expected by chance. One might vironmental dynamics recently has been tional regions (20). Pathway evolution presented (15). Simulations (16, 17) and ask whether this positive selection for (21) and module recombination (22) are theories (18) suggest in a general way evolvability also has left a mark on the macromolecular examples of this find- how evolvability arises. Detailed studies within-domain splice sites. That is, do ing. The combination of predictive abil- at the sequence level, which the work of the splice sites that occur within do- ity to create new structural folds (23) Saraf et al. enables, could further clarify mains tend to occur in positions that and predictive ability to rationally (24) the mechanisms by which evolvability tend not to be ‘‘clashing,’’ in the lan- or combinatorially (4) optimize fold arises in a population. guage of Saraf et al. (4)? On a related function should yield intriguing and pos- Recent bioinformatics studies show note, do species with high crossover sibly profound results in the coming that the frequency of alternative splicing rates tend to have domain families that years.

1. Stemmer, W. P. C. (1994) Nature 370, 389–391. 10. Worobey, M., Rambaut, A. & Holmes, E. C. 19. Kriventseva, E. V., Koch, I., Apweiler, R., Vin- 2. Crameri, A., Raillard, S. A., Bermudez, E. & (1999) Proc. Natl. Acad. Sci. USA 96, 7352–7357. gron, M., Bork, P., Gelfand, M. S. & Sunyaev, S. Stemmer, W. P. C. (1998) Nature 391, 288–291. 11. Shapiro, J. A. (1997) Trends Genet. 13, 98–104. (2003) Trends Genet. 19, 124–128. 3. Lutz, S., Ostermeier, M. & Benkovic, S. J. (2001) 12. Moore, G. L., Maranas, C. D., Lutz, S. & Benk- 20. Bogarad, L. D. & Deem, M. W. (1999) Proc. Natl. Nucleic Acids Res. 29, E16. ovic, S. J. (2001) Proc. Natl. Acad. Sci. USA 98, Acad. Sci. USA 96, 2591–2595. 4. Saraf, M. C., Horswill, A. R., Benkovic, S. J. & 3226–3231. 21. Stemmer, W. P. C. (2002) J. Mol. Catal. B 19, 3–12. Maranas, C. D. (2004) Proc. Natl. Acad. Sci. USA 13. Lathrop, R. H. & Pazzani, M. J. (1999) J. Comb. 22. Dueber, J. E., Yeh, B. J., Chak, K. & Lim, W. A. Optim. 3, 101, 4142–4147. 301–320. (2003) Science 301, 1904–1908. 14. Eyre-Walker, A. (1993) Proc. R. Soc. London B 5. Rod, T. H., Radkiewicz, J. L. & Brooks, C. L. 23. Kuhlman, B., Dantas, G., Ireton, G. C., Varani, 252, 237–243. (2003) Proc. Natl. Acad. Sci. USA 100, 6980– G., Stoddard, B. L. & Baker, D. (2003) Science 15. Sato, K., Ito, Y., Yomo, T. & Kaneko, K. (2003) 6985. 302, 1364–1368. Proc. Natl. Acad. Sci. USA 100, 14086–14090. 6. Stemmer, W. & Holland, B. (2003) Am. Sci. 91, 16. Travis, J. M. J. & Travis, E. R. (2002) Proc. R. Soc. 24. Looger, L. L., Dwyer, M. A., Smith, J. J. & 526–533. London B 269, 591–597. Hellinga, H. W. (2003) Nature 423, 185–190. 7. Freire, E. (2002) Nat. Biotech. 20, 15–16. 17. Kepler, T. B. & Perelson, A. S. (1998) Proc. Natl. 25. Waddington, C. H. (1942) Nature 150, 563–565. 8. Leslie, M. (2002) Science 297, 1615. Acad. Sci. USA 95, 11514–11519. 26. Kauffman, S. A. (1993) The Origins of Order 9. Colegrave, N. (2002) Nature 420, 664–666. 18. Blasio, F. V. D. (1999) Phys. Rev. E 60, 5912–5917. (Oxford Univ. Press, New York).

3998 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0400475101 Deem Downloaded by guest on September 24, 2021