© 2001 Nature Publishing Group http://structbio.nature.com history

The birth of computational structural biology

Michael Levitt

Like Sydney Altman (ref. 1), I too was initially rejected by the renowned Medical Research Council (MRC) Laboratory of Molecular Biology in Cambridge, England. The year was 1967 and I was then in my final year of a B.Sc. degree at King's College in London. Enthralled by John Kendrew's 1964 BBC television series "The Thread of Life", I wanted desperately to do my Ph.D. at the MRC in Cambridge. Alas, there was no room for any new postgraduate students in 1967! After some negotiations, I was accepted for the following year. More importantly, John Kendrew said that I should spend the intervening period at the Weizmann Institute in Israel with Shneior Lifson. Kendrew had just heard of Lifson's initial ideas (ref. 2) on the consistent force field (CFF), which was an attempt to simulate the properties of any molecular system from a simple potential energy function. He believed that these methods should be applied to protein and nucleic acid macromolecules.

I arrived in Israel in October, 1967 and set to work programming the consistent force field under the supervision of Lifson and his Ph.D. student Arieh Warshel. At that time, computing at the Weizmann Institute was amongst the best in the world; in 1963 computer engineers there had built their own machine, appropriately known as the Golem, after the automaton of Jewish folklore.

In a few short months we had a program called CFF that allowed us to calculate the energy, forces (energy first derivatives with respect to atomic positions) and curvature (energy second derivatives with respect to atomic positions) of any molecular system. Warshel went on to use the program to calculate structural, thermodynamic and spectroscopic properties of small organic molecules (ref. 3), while I followed Kendrew's dictum and applied these same programs to proteins. This led to the first energy minimization of an entire protein structure (in fact we did two, myoglobin and lysozyme) in a process that became known as energy refinement (ref. 4).

I began my Ph.D. at the MRC in Cambridge in September, 1968 and was immediately immersed in the annual tradition of Lab Talks. These talks by members of the three divisions at the Laboratory of Molecular Biology at that time (Structural Studies Division under Kendrew, the Cell Biology Division under Sydney Brenner and Francis Crick, and Protein and Nucleic Acid Chemistry Division under Fred Sanger) were a treat for newcomers to the Lab. The 'Molecule of the Year' was tRNA, which had been predicted to exist by Francis Crick 10 years before (ref. 5) and was now the subject of intense structural and genetic interest. I decided to try to build a model of tRNA and started off playing with CPK space-filling models at home. Transfer RNA has almost 2,000 atoms and a space-filling model weighs over 100 pounds. My most vivid memory is lowering the tRNA CPK model from the first floor window of our terrace cottage in Newnham, while my somewhat pregnant wife was having a hard time controlling her laughter. The model, which was then rebuilt from brass components, towered over me as I measured all atomic positions with a plumb line (a pointed metal weight hanging from a string onto graph paper) so that the model could be energy refined. Modeling tRNA led me to interact closely with both Crick and Aaron Klug and so I was exposed to the wonders of molecular and structural biology.

The model was published in 1969 (ref. 6) and I settled down to work on my thesis entitled "Conformation Analysis of Proteins" (ref. 7). This was entirely devoted to computational biology and included chapters entitled "Energy Parameters from Proteins", "Interpreting Problematic Regions of Electron Density Maps Using Convergent Energy Refinement", "Energy Refinement of Enzyme/Substrate Complexes: Lysozyme and Hexa-N-Acetyl-glucosamine" and "Energy Refinement of Tertiary Structure Changes Caused by Oxygenation of Horse Haemoglobin".

Work on nucleic acids was not neglected and at that time it seemed that RNA folding would be easier to tackle than protein folding (ref. 8). Computational work on protein folding began in 1973 during my postdoctoral research with Shneior Lifson back at the Weizmann Institute. Arieh Warshel had returned from his postdoc at Harvard and we started to work together again on both protein folding and enzyme reactions. Each project led to novel simulations (refs 9, 10) that became the basis for a great deal of future work, with much still to be done a quarter of a century later.

I returned to a staff position at the MRC in Cambridge in October, 1974 and Warshel joined me there as a visitor. Warshel focused his attention on quantum mechanics in biology and published a model of the initial steps in the visual process, based on a molecular dynamics simulation (ref. 11). Meanwhile I worked with Cyrus Chothia on the classification and analysis of protein architecture (ref. 12) and with Tony Jack, who passed away in 1978, on the refinement of large structures by simultaneous minimization of the molecular energy and crystallographic R-factor (ref. 13). Both papers were to lead to significant future science: Chothia went on to develop the first web database, SCOP (ref. 14), and Axel Brünger based his wonderfully useful X-PLOR program (ref. 15) on Jack's work with me.

While Warshel and I were travelling around the world, our computer program, CFF, had wings of its own. Warshel took the program with him on his postdoctoral visit to Martin Karplus' lab at Harvard in 1969. In 1971, Bruce Gelin, who was just released from the US army, began working with Warshel and started writing a new version of the code. This rewrite was essential as I had learned my programming from an IBM FORTRAN II manual, whereas Bruce Gelin was much better trained. I can still recall my excitement when I saw his version of the program: many of the variable names were those I had invented but the code was so much more elegant!

Bruce Gelin's code led to his pioneering work with Andy McCammon and Martin Karplus on the simulation of protein dynamics (ref. 16). This work, published in 1977, marks the start of the next phase of computational structural biology in that it signaled the linking of computational chemistry with biology. Work in the field was becoming much more widespread; the original program that I wrote with Arieh Warshel went on through Bruce Gelin's rewrite to form the basis of the next generation of programs including CHARMM (Chemistry at HARvard Molecular Mechanics) from Karplus' group at Harvard, AMBER from Peter Kollman's group at UCSF and Discover from Arnold Hagler's company, Biosym.

Looking back to that period, it is much easier to appreciate who were the key contributors. Shneior Lifson, who passed away on 22 January, 2001, really started it all by defining the form of the empirical potential energy function still in use today (Fig. 1). In particular, he was the first to realize that the hydrogen bond could be described by simple electrostatic interaction of partial charges. With Warshel, he also set up a consistent procedure for deriving the energy parameters.

Sequence analysis, which forms such a key part of modern computational biology, was born in that same 1969–1977 period. In 1969, analysis of tRNA sequences revealed a correlated base change (two bases not in a helical stem change together to maintain function, thereby indicating a possible interaction; ref. 6); in 1970, Needleman and Wunsch applied the computer science method of dynamic programming to sequence alignment (ref. 17); and in 1977, Sanger and coworkers started genome-scale DNA sequencing with the ϕX-174 bacteriophage sequence (ref. 18).

I still remember with much chagrin that day in 1976 when Bart Barrell approached me to help analyze the ϕX DNA sequence only to be rebuffed; I felt that structure was just so much more interesting than sequence. Having confessed what may be the greatest misjudgment of my career, I would like to conclude with a few words about the future of computational biology.

Computers were made for biology: biology would never have advanced as it did without the dramatic increase in computer power and availability. One day we would like to be able to simulate complicated biological processes, perhaps even going from the genomic sequence to a full simulation of the organism's phenotype. In thinking about how to do this, it is interesting to compare Nature with simulated biology. Some things that are very difficult in Nature are trivial for computers: consider how much cellular machinery is needed to transcribe DNA sequence to RNA sequence, whereas in the computer all one needs to do is change 'T' to 'U'. Translating RNA sequence to protein sequence is even more difficult in the cell, but in a computer one just applies the genetic code table. Other things that appear very easy for Nature are almost impossibly hard for computers: once synthesized, a protein sequence spontaneously folds into the native structure, whereas simulating even a part of this process is still well beyond our computational capabilities. Computational structural biology will remain very challenging well into the 21st century.

Michael Levitt is in the Department of Structural Biology, Stanford University, Stanford, California 94305-5400, USA. email: [email protected]

1. Altman, S. Nature Structural Biology 7, 827–828 (2000).
2. Bixon, M. & Lifson, S. Tetrahedron 23, 769–784 (1967).
3. Lifson, S. & Warshel, A. J. Chem. Phys. 49, 5116–5129 (1968).
4. Levitt, M. & Lifson, S. J. Mol. Biol. 46, 269–279 (1969).
5. Crick, F.H.C. Symp. Soc. Exp. Biol. 12, 138–163 (1958).
6. Levitt, M. Nature 224, 759–763 (1969).
7. Levitt, M. Ph.D. Thesis, Conformation analysis of proteins (Cambridge University, Cambridge, UK; 1971); http://csb.stanford.edu/levitt/Levitt_Thesis_1971/Levitt_Thesis_1971.html.
8. Levitt, M. In Polymerization in Biological Systems, Ciba Foundation Symposium 7, 146–171 (Eds Wolstenholme, G.E.W. & O'Connor, M., Elsevier, Amsterdam; 1972).
9. Levitt, M. & Warshel, A. Nature 253, 694–698 (1975).
10. Warshel, A. & Levitt, M. J. Mol. Biol. 103, 227–249 (1976).
11. Warshel, A. Nature 260, 679–683 (1976).
12. Levitt, M. & Chothia, C. Nature 261, 552–558 (1976).
13. Jack, A. & Levitt, M. Acta Crystallogr. A 34, 931–935 (1978).
14. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. J. Mol. Biol. 247, 536–540 (1995).
15. Brünger, A.T., Karplus, M. & Petsko, G.A. Acta Crystallogr. A 45, 50–61 (1989).
16. McCammon, J.A., Gelin, B.R. & Karplus, M. Nature 267, 585–590 (1977).
17. Needleman, S.B. & Wunsch, C.D. J. Mol. Biol. 48, 443–453 (1970).
18. Sanger, F. et al. Nature 265, 687–695 (1977).

Fig. 1 The total potential energy of any molecule is the sum of terms allowing for bond stretching, bond angle bending, bond twisting, van der Waals interactions and electrostatics. Many properties of a molecule can be simulated with such an empirical energy function.
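The Fig. 1 caption describes the empirical energy function in words only. As a rough sketch, one common textbook way to write such a sum of terms is given below. The specific functional forms (harmonic bonds and angles, a cosine torsion term, a 12-6 van der Waals term and Coulomb electrostatics) and all symbols are assumptions chosen for illustration; they are not taken from the text and may differ from the form actually used in CFF.

```latex
% Illustrative (assumed) form of an empirical potential energy function.
% b, \theta, \phi : bond lengths, bond angles, torsion angles
% b_0, \theta_0   : equilibrium values;  K_b, K_\theta, K_\phi : force constants
% r_{ij}          : distance between non-bonded atoms i and j
% \varepsilon_{ij}, r^0_{ij} : van der Waals well depth and minimum-energy distance
% q_i, q_j        : partial charges
\begin{aligned}
U ={}& \sum_{\text{bonds}} \tfrac{1}{2} K_b \,(b - b_0)^2
     + \sum_{\text{angles}} \tfrac{1}{2} K_\theta \,(\theta - \theta_0)^2
     + \sum_{\text{torsions}} K_\phi \,\bigl[\,1 - \cos(n\phi + \delta)\,\bigr] \\
   & + \sum_{i<j} \varepsilon_{ij}
       \left[ \left(\frac{r^0_{ij}}{r_{ij}}\right)^{12}
            - 2 \left(\frac{r^0_{ij}}{r_{ij}}\right)^{6} \right]
     + \sum_{i<j} \frac{q_i\, q_j}{4\pi\epsilon_0\, r_{ij}}
\end{aligned}

% Quantities derived from U with respect to atomic positions x:
% the force is the negative first derivative, the curvature the second derivative.
F_k = -\frac{\partial U}{\partial x_k},
\qquad
H_{kl} = \frac{\partial^2 U}{\partial x_k\, \partial x_l}
```

Under these assumptions, energy minimization amounts to moving the atomic positions until every component of the force is close to zero.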

nature structural biology • volume 8 number 5 • may 2001 • 392–393