© 2000 Nature America Inc. • http://structbio.nature.com perspectives

Structural genomics in North America

Thomas C. Terwilliger

Structural genomics in North America has moved remarkably quickly from ideas to pilot projects. Just three years ago, the field was only a concept, discussed independently by its many inventors. Now it is already a well-organized, increasingly funded, consortium-based effort to determine protein structures on a large scale.

The pivotal point for the North American efforts was the 1998 Argonne meeting (see Table 1 for a timeline of structural genomics in North America). This meeting brought together over 80 researchers and representatives of funding agencies who thought that improvements in technology, combined with the successes of the genome sequencing projects, had set the stage for a large-scale structure determination project. Experts in all the required steps, from protein expression to X-ray and NMR structure determination, described recent progress and how the obstacles remaining in their areas could be overcome. The Argonne meeting led to a reinforced conviction on the part of many participants that the time was indeed right for structural genomics. It set a tone of excitement and promise for the structural genomics field that has propelled it ever since. Just as importantly, however, discussions at the Argonne meeting indicated that small-scale testing of the ideas of structural genomics and considerable additional technology development were necessary before a full-scale project could be carried out.

Pilot projects begin

A number of independent and mostly small pilot projects were initiated right after the Argonne meeting, complementing some existing projects that had begun before then (see Table 2 for a list of the major pilot projects). The projects have been supported by diverse sources, with funding for several coming from the US Department of Energy, and additional funding coming from the National Institutes of Health (NIH), the Ontario Cancer Institute, the New Jersey Commission on Science and Technology Initiative, and the University of California. The funding of these projects in 1998–2000 ranged from minimal funding to ~$1.5 million per year. The Ontario project is to be funded at ~$3.4 million per year beginning in the fall of 2000, and the Scripps/Genomics Institute of the Novartis Research Foundation (GNF) effort, involved in both public and private sectors, is funded at ~$6 million per year.

Most of these pilot projects had two principal goals: to demonstrate the overall feasibility of structural genomics, and to develop some of the technology necessary for large-scale structure determination. For feasibility demonstrations, participants in several of these pilot projects chose proteins from thermophilic organisms, hoping to start with the simplest possible case, and reasoning that these proteins would be relatively easy to work with. Other pilot projects chose proteins expected to have novel folds, or proteins from fully sequenced organisms such as yeast or Haemophilus influenzae that held the promise of yielding important functional information from protein structures.

All the pilot projects have used more or less the same initial strategy: cloning many genes and finding the ones that were highly suitable for structure determination. This process consisted of finding those genes that express well in a simple system, selecting from this set ones that produce protein in soluble form, testing these for crystallization or NMR spectra, and determining the structures of the ones that pass all these tests. All of the projects have found that this process leads to a huge attrition rate at each step, with three-dimensional structures obtained for just 5–20% of the genes that have been cloned. (To be fair, these projects are ongoing and many of the other cloned genes may yield structures later.) Table 2 lists the numbers of structures determined over the past two years by some of these pilot projects. Altogether, these pilot projects have produced over 70 new protein structures, with the Toronto-based consortium alone contributing ~20 of these.

Although the pilot projects have already generated an impressive number of structures, obtaining them required considerable resources and effort, and most people in the field feel that substantial improvements in technology will be required to make structure determination a high-throughput process. Several of the pilot projects also have major components focused on technology development. The Rutgers group, for example, is focusing on development of high-throughput methods for NMR structure determination. The Argonne group, on the other hand, is focusing on developing high-throughput methods for synchrotron-based X-ray crystallography, and the Scripps/GNF group has been entirely focused on technology development up to this point.

NIH workshops

During the year following the Argonne workshop, the NIH held a series of three workshops to discuss the possibility of a large-scale publicly funded effort in structural genomics (see http://www.nigms.nih.gov/funding/psi.html for a comprehensive discussion of the NIH program). The first workshop focused on the scope of a possible structural genomics project. The participants concluded that a project to determine a representative set of a few thousand protein structures would be the right scale to be useful in understanding the structures and functions of most other proteins. Importantly, the workshop conclusions also noted that the infrastructure and technologies that would be developed in the course of such a project would transform the way structure determination is done in the future.

Bioscience Division, Mail Stop M888, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA. email: [email protected]

nature structural biology • structural genomics supplement • november 2000

Table 1 Timeline of North American structural genomics efforts

January, 1998: Argonne Structural Genomics Meeting
1998–1999: Starts of pilot projects funded by US Department of Energy (DOE), University of California, NIH, Ontario Cancer Institute; numerous workshops and meeting sessions on structural genomics
February, 1998–January, 1999: NIH holds three workshops on a Protein Structure Initiative
June, 1999: NIH calls for center proposals and technology development proposals as part of the Protein Structure Initiative
September 30, 2000: NIH centers begin operation

The second workshop focused on the appropriateness of a structural genomics project, concluding that such a project would indeed be important, that the technology is nearly ready, and that pilot projects of a substantial size should be supported to assess feasibility and to develop additional technology. There was discussion of whether such a project would take funding away from existing projects; the NIH representatives assured participants that new funding would be sought.

The third workshop centered on target selection. Two general approaches to selecting targets for structural genomics were discussed. The first was to organize protein sequences into families at a level of ~30% sequence identity, and to determine just one representative of each family. The other approach was to focus on proteins with clear biological importance. Although there was not a general consensus on one approach or the other, a goal for the global structural genomics efforts of 10,000 structures was thought by many to be reasonable.

Many meetings during 1998–1999 focused on or contained sessions on structural genomics. A particularly influential meeting was held in the fall of 1998 in Avalon, New Jersey. At this meeting there were spirited debates about the goals of structural genomics, and it became clear that for many participants the real goal of structural genomics was to know the structures of all proteins. In this context, the targeting of current small-scale efforts could be understood as strategies for getting to this very long-term and very ambitious goal.

NIH-supported centers and technology development

The NIH moved quickly after its three workshops to develop a program in structural genomics. In June of 1999, it announced its intent to fund three to six pilot research centers for structural genomics. Each research center would be expected to carry out all the steps in structural genomics in order to assess feasibility and to develop the large-scale processes that would be necessary for full-scale structural genomics. The funding for these projects was to be in addition to existing structural biology funding by the NIH. Additionally, the NIH announced its desire to receive applications in areas of technology that would support structural genomics efforts, though this work was to be funded out of existing sources.

A total of 11 proposals for structural genomics research centers were submitted, and the NIH has announced that it will fund seven of these. The new research centers are listed in Table 3. Perhaps the most remarkable thing about these structural genomics centers is that every one of them involves at least five institutions. This suggests that at this point it is very difficult to amass the full range of capabilities and manpower at any one institution needed to carry out even what the NIH considers a pilot-scale structural genomics project. A second remarkable aspect is that the defining feature in most of the names of the centers is geography (Northeast, New York, Southeast, Midwest), rather than focus. Even in this age of rapid communication and remote X-ray and NMR data collection, geographical proximity remains important.

The seven new NIH structural genomics centers have substantially varying emphases even though all of them are designed to carry out all aspects of structural genomics. Each of the centers plans to obtain hundreds of new protein structures and expects these to contain many structures that represent families of protein structure that previously had no representatives with known structure. Each of the centers also plans to develop new technologies that will allow a full-scale structural genomics effort to succeed. The technologies to be developed by the different centers address different bottlenecks in structure determination. The Joint Center for Structural Genomics and the Northeast Structural Genomics Consortium, for example, plan to develop and use high-throughput robotic crystallization devices to set up and analyze tens of thousands to as many as 130,000 crystallization experiments in a day, hoping that even a tiny percentage of these will yield crystals. They are also developing robotic equipment for all other steps in high-throughput protein production. The TB Structural Genomics Consortium, in contrast, plans to place more emphasis on the earlier bottleneck of protein expression, and expects to use in vitro evolution-based methods to engineer its protein targets to increase solubility. Almost all of the centers plan to develop automated procedures for X-ray data collection and analysis, and the Northeast Structural Genomics Consortium plans additionally to automate NMR data analysis.

The choice of targets for the structural genomics centers also varies considerably. All expect to determine structures of many proteins with novel folds. Some (the Midwest Center for Structural Genomics, the Southeast Collaboratory for Structural Genomics) plan to choose targets from a number of different organisms largely on this basis. Others (the Northeast Structural Genomics Consortium, the New York Structural Genomics Research Consortium, and the Joint Center for Structural Genomics) plan to emphasize proteins from functionally important families. The Berkeley Structural Genomics Center plans to determine protein folds representing the complement of an entire organism (Mycoplasma genitalium). The TB Structural Genomics Consortium plans to emphasize proteins from Mycobacterium tuberculosis that are promising targets for anti-tuberculosis therapeutics.

International coordination

There have been substantial efforts to foster cooperation and communication among the many structural genomics efforts around the world (see the accompanying articles by Heinemann and Yokoyama). The NIH and the Wellcome Trust have taken the lead in this area by sponsoring a meeting in Hinxton, UK in April of 2000 to discuss international cooperation. The participants agreed on overall goals for structural genomics, the need for cooperation, including timely data release, and a need for developing intellectual property policies. There appears to be a great deal of interest among the structural genomics efforts in collaboration and sharing of methods. An extensive overlap of investigators among the structural genomics efforts and an international aspect of many of the efforts exist as well (Tables 2, 3). A second meeting on international cooperation is scheduled for April, 2001 at Airlie House, Virginia.


Relation to industrial efforts

All of the structural genomics efforts described in this article plan to make their targeting and results public. Commercial structural genomics efforts are planned as well. Two articles in this issue describe the range of structural genomics activities that involve industry: from the general business strategies of biotechnology companies devoted to structural genomics (see the article by Harris and colleagues) to efforts to form a structural genomics consortium of pharmaceutical companies and the Wellcome Trust, which will attempt to increase the number of structures freely available in the public database (see the article by Williamson). At this point it is not entirely clear how proprietary and non-proprietary efforts will be related.

In principle, there is not necessarily a need for coordination with private efforts in the biotechnology sector. Some of the structures resulting from such commercial efforts may remain trade secrets, and as far as the public efforts are concerned, it will be as if these structures were not determined. Other commercially determined structures are likely to be patented (if a clear path from structure to drug design is enabled by the structure; see the article by Waller and

Table 2 Major North American pilot projects in structural genomics

Project leaders¹: Sung-Hou Kim, University of California, Berkeley/Lawrence Berkeley National Laboratory
Funding sources: US Department of Energy (DOE)
Targeting: Methanococcus jannaschii; novel folds and unknown structures of hyperthermophile proteins
Number of structures determined: 12

Project leaders: Roberto Poljak, Edward Eisenstein, Gary Gilliland, Osnat Herzberg, John Moult, John Orban, Center for Advanced Research in Biotechnology/University of Maryland; Andrew Howard, Illinois Institute of Technology/Argonne National Laboratory
Funding sources: National Institutes of Health (NIH)
Targeting: Haemophilus influenzae; function from structure
Number of structures determined: 10

Project leaders: David Eisenberg, UCLA²
Funding sources: DOE, University of California
Targeting: Pyrobaculum aerophilum; hyperthermophile proteins; technology development
Number of structures determined: 8

Project leaders: Steven K. Burley, Rockefeller University³
Funding sources: DOE
Targeting: Yeast proteins; protein domains
Number of structures determined: 8

Project leaders: Aled Edwards, Cheryl Arrowsmith, Ontario Cancer Institute/University of Toronto⁴
Funding sources: Ontario Cancer Institute/Ontario Research & Development Challenge Fund
Targeting: M. thermoautotrophicum; function from structure
Number of structures determined: 20

Project leaders: Wayne Anderson, Northwestern University Medical School; Alfonso Mondragon, Northwestern University; Andrzej Joachimiak, Argonne National Laboratory; Aled Edwards, University of Toronto⁵
Funding sources: Institutional
Targeting: Novel folds, putative proteins unique to pathogenic organisms, M. thermoautotrophicum and other thermophilic proteins
Number of structures determined: 17

Project leaders: Gaetano Montelione, Rutgers University⁶
Funding sources: New Jersey Commission on Science and Technology Initiative/Institutional funding
Targeting: Functionally important proteins; NMR technology development
Number of structures determined: 5

Project leaders: Peter Schultz, Raymond Stevens, The Scripps Research Institute/The Genomics Institute of the Novartis Research Foundation
Funding sources: Lawrence Berkeley National Laboratory, The Genomics Institute of the Novartis Research Foundation
Targeting: Technology development
Number of structures determined: 0 (dedicated to technology development)

¹Because of space constraints, only some investigators could be listed in the table. All other participants are listed in the respective footnotes.
²T. Alber, J. Berger, University of California, Berkeley; E.N. Baker, University of Auckland; J. Berendzen, M. Park, T. Terwilliger, G. Waldo, Los Alamos National Laboratory; J. Bowie, R. Clubb, J. Feigon, J. Perry, T. Yeates, UCLA; B. Rupp, Lawrence Livermore National Laboratory; R. Smith, Pacific Northwest National Laboratory.
³B.T. Chait, T. Gaasterland, J. Kuriyan, A. Sali, Rockefeller University; F.W. Studier, S. Swaminathan, R.M. Sweet, Brookhaven National Laboratory; C. Lima, Weill Medical College; L. Shapiro, Mount Sinai School of Medicine; S. Almo, M.R. Chance, Albert Einstein College of Medicine.
⁴E. Pai, Ontario Cancer Institute/; M. Kennedy, Pacific Northwest National Laboratory; L. McIntosh, University of British Columbia; K. Gehring, I. Ekiel, McGill University.
⁵M. Egli, D. Freeman, Northwestern University Medical School; T.S. Jardetzky, A. Rosenzweig, Northwestern University; F. Stevens and W.-K. Lee, Argonne National Laboratory; C. Arrowsmith, University of Toronto.
⁶S. Anderson, C. Kulikowski, E. Arnold, Rutgers University; A. Stock, Robert Wood Johnson Medical School; J. Hunt, L. Tong, Columbia University; G. DeTitta, Hauptman-Woodward Medical Research Institute.


Table 3 NIH structural genomics centers

Center name: Northeast Structural Genomics Consortium
Focus of center: Eukaryotic gene families; technology development; small proteins; analysis of complementarity of NMR and crystallographic methods
Project leaders¹: Gaetano Montelione, Rutgers University²

Center name: Midwest Center for Structural Genomics
Focus of center: Novel protein folds (including membrane proteins), proteins unique to pathogenic organisms, proteins unique to eukaryota (human, Caenorhabditis elegans); synchrotron-based X-ray crystallography methods, robotic technology development
Project leaders: Andrzej Joachimiak, Argonne National Laboratory³

Center name: Southeast Collaboratory for Structural Genomics
Focus of center: Pyrococcus furiosus, C. elegans, and human proteins; robotic cloning, expression, purification, and crystallization; automated NMR and crystallographic structure determination
Project leaders: Bi-Cheng Wang, University of Georgia⁴

Center name: New York Structural Genomics Research Consortium
Focus of center: Yeast proteins; technology development; novel folds
Project leaders: Steven K. Burley, The Rockefeller University⁵

Center name: Joint Center for Structural Genomics
Focus of center: Human and C. elegans proteins involved in signal transduction; novel folds; technology development
Project leaders: Ian Wilson, The Scripps Research Institute⁶

Center name: Berkeley Structural Genomics Center
Focus of center: Near-complete structural representation of Mycoplasma proteome, novel folds, and technology development
Project leaders: Sung-Hou Kim, University of California, Berkeley and Lawrence Berkeley National Laboratory⁷

Center name: TB Structural Genomics Consortium
Focus of center: Mycobacterium tuberculosis proteins; novel folds, technology development
Project leaders: Thomas C. Terwilliger, Los Alamos National Laboratory⁸

¹Because of space constraints, only the principal investigators (according to the NIGMS announcement, http://www.nigms.nih.gov/news/releases/SGpilots.html) are shown in the table; all other participants are listed in the respective footnotes.
²S. Anderson, E. Arnold, C. Kulikowski, Rutgers University; A. Stock, Robert Wood Johnson Med. School; C. Arrowsmith, A. Edwards, E. Jurisica, University of Toronto; G. DeTitta, Hauptman-Woodward Medical Research Institute; M. Gerstein, L. Regan, Yale University; P. Allen, J. E. Gouaux, W. Hendrickson, B. Honig, J. Hunt, A. Palmer, B. Rost, L. Tong, Columbia University; M. Kennedy, Pacific Northwest National Laboratory; T. Szyperski, SUNY Buffalo; H. Wu, Weill Medical College; M. Zhou, Mount Sinai School of Medicine; M. Linial, Hebrew University; S. Yokoyama, The Institute of Physical and Chemical Research (RIKEN).
³M. Schiffer, F. Stevens, D. Hanson, M. Donnelly, Argonne National Laboratory; W. Anderson, M. Egli, D. Freeman, A. Mondragon, T.S. Jardetzky, A. Rosenzweig, Northwestern University; A. Edwards, University of Toronto; G. Waksman, D. Fremont, Washington University; J. Thornton, C. Orengo, University College London; W. Minor, University of Virginia; Z. Otwinowski, University of Texas.
⁴M.W.W. Adams, J. Arnold, H.A. Dailey, Jr., J.H. Prestegard, J.P. Rose, R.D. Hall, Z.-J. Liu, M.G. Newton, J.G. Omichinski, G. Rosenbaum, F. Tian, University of Georgia; R.W. Harrison, I.T. Weber, Georgia State University; L.J. DeLucas, M. Luo, W.M. Carson, C.-H. Luan, B. Sha, University of Alabama at Birmingham; E.J. Meehan, L. Chen, J. Ng, University of Alabama at Huntsville; J. Hudson, T. Moore, B. Pollock, Research Genetics; X. Lin, Oklahoma Medical Research Foundation; B.A. Roe, University of Oklahoma.
⁵B.T. Chait, T. Gaasterland, J. Kuriyan, A. Sali, The Rockefeller University; F.W. Studier, S. Swaminathan, R.M. Sweet, Brookhaven National Laboratory; C. Lima, Weill Medical College; L. Shapiro, Mount Sinai School of Medicine; S. Almo, M.R. Chance, Albert Einstein College of Medicine.
⁶R. Stevens, The Scripps Research Institute/The Genomics Institute of the Novartis Research Foundation; J. Wooley, S. Taylor, S. Subbiah, UCSD/San Diego Supercomputing Center; K. Hodgson, P. Kuhn, Stanford Synchrotron Radiation Laboratory; T. Earnest, Advanced Light Source (Lawrence Berkeley National Laboratory); S. Choe, The Salk Institute.
⁷K.K. Kim, Gyungsang National University; Y. Cho, Korea Institute of Science and Technology; E. Berry, T. Earnest, S. Holbrook, L.-W. Hung, R. Kim, P. Adams, U. Schulz-Gahmenn, Lawrence Berkeley National Laboratory; B.-H. Oh, Pohang University of Science and Technology; A. Brünger, D. McKay, W. Weis, Stanford University; W. Lim, R. Stroud, University of California, San Francisco; T. Alber, J. Berger, University of California, Berkeley; E. Baldwin, A. Fisher, D. Wilson, University of California, Davis; S. Yokoyama, University of Tokyo; C. Kang, Washington State University; C. Hutchinson, University of North Carolina.
⁸T. Alber, J. Berger, University of California, Berkeley; E.N. Baker, University of Auckland; J. Berendzen, M. Park, G. Waldo, Los Alamos National Laboratory; J. Bowie, D. Eisenberg, J. Feigon, J. Perry, T. Yeates, UCLA; A. Brünger, Stanford University; P. Adams, Lawrence Berkeley National Laboratory; W. Jacobs, Albert Einstein College of Medicine; B. Rupp, Lawrence Livermore National Laboratory; J. Sacchettini, Texas A&M University; S.W. Suh, Seoul National University; M. Weiss, Institute of Molecular Biology, Jena; M. Wilmans, P. Tucker, E. Pohl, V. Lamzin, EMBL-Hamburg; W. Wood, University of Colorado, Boulder; S. Yokoyama, RIKEN; J. Thornton, P. Driscoll, N. Stoker, University College, London; N. Keep, Birkbeck College; B. Robertson, Imperial College; G. Dodson, D. Roper, University of York; C. Carter, University of North Carolina, Chapel Hill; K. Kantardjieff, California State University, Fullerton; M. Vijayan, K. Muniyappa, U. Varshney, N.R. Chandra, K. Sekar, Indian Institute of Science; S. Cole, P. Alzari, Institut Pasteur; C. Cambillau, H. van Tilbeurgh, J.-P. Samama, CNRS; J. Naismith, St. Andrews University; A. Coates, St. George’s Hospital Medical School; M. Gennaro, Public Health Research Institute.


Table 4 Web sites for North American structural genomics

Northeast Structural Genomics Consortium
    http://www.nesg.org/
    http://www-nmr.cabm.rutgers.edu/structuralgenomics/

New York Structural Genomics Research Consortium
    http://www.nysgrc.org/

Joint Center for Structural Genomics
    http://www.jcsg.org/

Berkeley Structural Genomics Center
    http://www-kimgrp.lbl.gov/genomics/proteinlist.html

TB Structural Genomics Consortium
    http://www.doe-mbi.ucla.edu/TB

Ontario Cancer Institute/University of Toronto/Pacific Northwest National Laboratory/University of British Columbia/McGill University
    http://nmr.oci.utoronto.ca/arrowsmith/proteomics/

Center for Advanced Research in Biotechnology/University of Maryland
    http://s2f.carb.nist.gov/

NIH structural genomics initiatives
    http://www.nigms.nih.gov/funding/psi.html
    http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-00-006.html

Targeting
    http://www.structuralgenomics.org/main.html
    http://presage.berkeley.edu/

International cooperation
    http://www.nigms.nih.gov/news/meetings/hinxton.html

colleagues for an overview of patenting) or simply deposited in public databases (if the structure does not appear to have commercial value or if the structure is determined to prevent patenting by others). In these cases, the structural information will be available to the public (although commercial use may be restricted for patented structures). At this point, intellectual property rights associated with public and commercial structural genomics efforts are not yet well delineated.

Students, postdocs, and structural genomics

The academic research system in North America depends heavily on contributions by graduate students and postdoctoral researchers, and the bulk of macromolecular structure determination is currently done by this important group of people. There is considerable discussion as to whether it is appropriate for students and postdocs to be involved in structural genomics projects. The NIH has specifically asked potential structural genomics centers to describe the appropriateness of any use of students and postdocs. Although there is no consensus at this point, many in the field feel that there remains a major research aspect to structural genomics both now and in the future. For now, simply developing the technologies is a substantial research effort. For the future, analyzing the structural results and using them to understand function will require extensive work and will be highly appropriate for students and postdocs. It is likely that very routine structure determinations will not be considered substantial research contributions in the near future. Difficult structures such as those of large proteins or of complexes of proteins, however, will likely remain major research projects suitable for students and postdocs for some time.

The future

The North American structural biology community and the public funding agencies have moved very quickly to develop the idea of structural genomics and to put together strong consortia to carry out large pilot projects. The new NIH centers and the other continuing pilot projects appear certain to determine many hundreds of new protein structures over the next five years. At the same time, the technology development projects at the centers and elsewhere are likely to develop methods for protein expression, purification, crystallization, structure determination and analysis that will benefit not just structural genomics but the wider structural biology community (see the articles in the Progress section of this issue). As the various structural genomics consortia are focusing on different bottlenecks, all the projects stand to benefit from the progress of the others.

Five years from now, the situation may be quite different. It is possible that by that time methods for structure determination of routine structures will have been substantially automated, and many structure determinations will have been made routine. In this case the production of most proteins and the solution of their structures need not be carried out as research efforts, and the need for academic and research institution involvement in these steps will be greatly decreased. On the other hand, the need for detailed analysis of protein function, both through informatics and through biochemical characterization, is likely to remain paramount (see the articles by Hol, and Thornton and colleagues). In a sense this means that one of the main outcomes of structural genomics will be to allow structural biologists to worry less about structure determination per se, and to allow them to focus even more on understanding biological function.

Associations with structural genomics

T.C.T. is a member of the Pyrobaculum aerophilum Consortium pilot project, and leads the TB Structural Genomics Consortium.
