Comparative and Functional Genomics
Comp Funct Genom 2002; 3: 270–276.
Published online 9 May 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.178

Feature

Meeting Review: Bioinformatics and Medicine – From molecules to humans, virtual and real

Hinxton Hall Conference Centre, Genome Campus, Hinxton, Cambridge, UK – April 5th–7th

Roslin Russell* MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK

*Correspondence to: MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK.

Received: 22 April 2002
Accepted: 24 April 2002

Abstract

The Industrialization Workshop Series aims to promote and discuss integration, automation, simulation, quality, availability and standards in the high-throughput life sciences. The main issue addressed is the transformation of bioinformatics and bioinformatics-based drug design into a robust discipline in industry, the government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare with reference to advanced biological systems modeling and simulation, research, protein-protein interactions, metabolism and physiology. Speakers included Kenneth Buetow, François Cambien, Jean Garnier, François Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Pim van der Eijk, Michael Vieth and Richard Ward. Copyright © 2002 John Wiley & Sons, Ltd.

Keywords: bioinformatics; medicine; virtual human; protein structure; personalized medicine; ontology

Introduction

The conference was jointly organized and sponsored by the Deep Computing Institute of IBM (http://www.research.ibm.com/dci/), the Task Force on Bioinformatics of the International Union for Pure and Applied Biophysics (http://www.iupab.org/taskforces.html) and the European Bioinformatics Institute (EMBL-EBI, http://www.ebi.ac.uk/). Barry Robson (IBM Deep Computing Institute) opened the workshop by stating that industrialization does not mean commercialization but high-throughput industrial strength. Last year saw the development of ‘clinical bioinformatics’, which links clinical data to genomics and requires a large collective database that is ‘patient-centric’. This field will lead to improved diagnostic tools, therapeutics and patient care.

Keynote speakers

François Cambien (Institut National de la Santé et de la Recherche Médicale, INSERM) presented his latest results [9] and gave an overview on the molecular and epidemiological of cardiovascular disease and the different approaches towards genotyping. He stressed the importance of creating a catalogue of ‘all’ common polymorphisms in the human genome. Since the start of the Etude Cas-Témoin sur l’Infarctus du Myocarde (ECTIM) in 1988 (the first large-scale study on myocardial infarction), the total numbers of candidate genes and polymorphisms across 29 studies have reached 102 and 475 respectively, as recorded by GeneCanvas (www.genecanvas.org), a website supported by INSERM. The project and others will continue to search for more polymorphisms but


how many are there, are they all relevant, and how do we make the most out of this information? Little is known about the distribution and structure of polymorphisms across the genome despite the availability of the almost complete sequence. On the assumption that there are 10–20 common polymorphisms per gene affecting coding and regulatory regions and that there are around 30 000 human genes in total (giving 300 000–600 000), it is estimated that there may be around 500 000 functionally related SNPs. The classification and exploitation of polymorphisms for the whole genome requires not only data quality, storage, integration and accessibility, but also an understanding of data relevance and representation in relation to complex disorders. Lopez and Murray [5] ranked cardiovascular disease the fifth most common disease in 1990 but they estimate that in 2020 it will be the most common disease worldwide. There is a complex interplay between environmental and genetic determinants: the former may explain the difference in disease prevalence between population groups, and time-related changes, while the latter play a role in susceptibility. However, the expectations of finding genetic factors for multifactorial diseases may not be fulfilled because many genes are involved and several models (such as the whole genome approach) may be much too simplistic. Important correlations may also be found not only in phenotypes such as clinical data but also in biological systems. When integrated with epidemiology and Mendelian genetics, the application of high throughput proteomic tools as well as comparative investigations of different species will provide a greater understanding into and functional biology and is expected to provide new leads for drug development or other approaches to maintain health.

Janet Thornton (EMBL-EBI, http://www.ebi.ac.uk/Thornton/index.html) presented results on the analysis of the domain structure of enzymes and how this knowledge can be interpreted in the context of the evolution of metabolic pathways [10,8,6]. The results from an analysis of small molecule metabolic pathways in yeast support a ‘mosaic’ model of evolution in which enzymes in metabolic pathways are chosen in an unsystematic way to meet the functional requirements of a metabolic pathway. Further analysis on the conservation of function across families of homologous enzymes shows that the degree of functional similarity varies according to the degree of sequence identity shared by enzyme homologue pairs. High sequence identity corresponds to almost identical function, while proteins with less than 30% sequence identity show significant functional differences. These results have been applied to the problem of genome annotation, and functional assignments have been catalogued in the database Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) for a number of completed genomes, providing functional and structural annotation.

Virtual human biology

Michael Ashburner spoke about recent developments and tools associated with the Gene Ontology Consortium (GO, http://www.geneontology.org/). GO uses a natural language ontology system and aims to provide a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism. There are GO terms for human, mouse, rat, worm, fly, amoeba, yeast, Arabidopsis and the more recently sequenced rice, and TIGR are now remodeling all their bacterial genomes. The three organizing principles (orthogonal ontologies) of GO are: biological process, molecular function and cellular component. There is a hierarchical structure within these groups that allows querying at different levels, and as the number of these parent-child relationships increases, the annotations get more complex. Many databases are now collaborating with GO and these include MGI, LOCUSLINK, SWISSPROT and CGAP. Recent browsers and query tools attached to GO include the following: the MGI GO Browser; AmiGO (Berkeley, http://www.godatabase.org/); the CGAP GO Browser (http://cgap.nci.nih.gov/Genes/GOBrowser); the EP:GO Browser (http://ep.ebi.ac.uk/EP/GO/), which is built into EBI’s Expression Profiler (a set of tools for clustering and analysing microarray data); QuickGO (http://www.ebi.ac.uk/ego/), which is integrated into InterPro; and DAG-Edit (http://sourceforge.net/projects/geneontology), a purpose-built, downloadable Java application to maintain, edit and display GO terms. With the emerging development of ontologies in other domains, the Global Open Biology Ontologies (GoBo) or Extended GO (EGO) has also been set up with the aim of being open source and DAG-Edit compatible, using non-overlapping defined terms and a common ID space.

Richard Ward (Oak Ridge National Laboratory)

is involved in the Virtual Human (VH) Project (http://www.ornl.gov/virtualhuman/), a human simulation tool. The ambitious goal of the VH will be an integrated digital representation of the functioning human from the molecular level to the living human, and from conception to old age. Tools are being developed to represent the body’s processes from DNA molecules and proteins to cells, tissues, organs and gross anatomy. These tools include XML (Extensible Markup Language), where ontologies and a high level description of the human body are important, visualization tools and problem solving environments. A forum has been established for the digital human effort to provide open source code tools to represent the body’s processes: http://www.fas.org/dh/. The Physiome project (http://www.physiome.org) is another comprehensive modeling environment that aims to promote modeling for analysing integrative physiological function in terms of underlying structure and molecular mechanisms. There is a recognised need to adopt standards for data definition, storage and transfer, XML to replace standard data input, and databases for creating and developing XML schemas. There is also a need to establish integrative databases dealing with genomic, biophysiological, bioengineering and clinical data. XML descriptions are now becoming more common and examples of these include: for cellular function, CellML (http://www.cellml.org); for systems biology modeling, SBML (http://xml.coverpages.org/sbml.html); and their own prototype for physiological modeling, physioML (http://www.ornl.gov/virtualhuman). His group originally used a Java SAX parser for physioML but they converted to using a Java document object model instead, since new objects in the description are easily defined and it allows the integration of physiology and anatomy. However, ontologies have an advantage over XML because they provide a richer description of relationships (semantic rules) between objects and improve interoperability between systems. They are now creating a Digital Human (DH) for the development of an anatomical ontology. The DH requires a problem-solving environment for complex 3D-flow modeling; for high performance computing, a three-tiered client/server architecture with Java RMI and NetSolve (http://icl.cs.utk.edu/netsolve) is used. He also mentioned how a common component architecture, which is currently used for climate modeling, is the way of the future for the VH project.

Protein structure and protein interaction

Jean Garnier (INRA – de Recherche de Versailles) gave an overview of protein structure prediction techniques, describing some of the more widely used tools. Structure prediction tools fall into three classes: those which predict structures from first principles, such as molecular dynamics simulations; those based on statistical analysis, such as hidden Markov models (HMMs); and those based on comparison to known structures, i.e. homology modeling. Jean Garnier stressed the importance of testing different techniques, referring to EVA and CASP. EVA is an automatic evaluation tool that continuously tests publicly available structure prediction servers by feeding them sequences for known new structures from PDB and comparing the results with the determined structures. CASP is a public competition in which experts attempt to predict structures for proteins whose structures have been determined but not yet made public. So far, structure prediction is most successful for proteins with known folds. Identification of proteins whose structure may be similar to a novel protein is often difficult, and attempts to identify homologues of a new sequence often fail despite the existence of homologues with known structures, because linear sequence is not as well conserved as three-dimensional topology. Some recent ‘threading’ programs have been developed with better sensitivity than conventional alignment techniques for detecting distant homologues for novel sequences, such as FORESST (http://abs.cit.nih.gov/foresst/) [3] and FROST (http://www.inra.fr/bia/produits/logiciels/BD2html.php?nom=FROST). The former program predicts which known fold best matches a novel sequence using HMMs based on patterns of secondary structure in proteins with known structures. Secondary structure prediction algorithms are applied to the novel sequence to determine a predicted secondary structure topology, which is then fed to the HMM to identify known structures with matching topologies.

Major evolutionary transitions have arisen as a result of changes in protein repertoires and changes in the expression of protein repertoires. Cyrus Chothia and Bernard de Bono (at the MRC Laboratory of Molecular Biology) have been studying the immunoglobulin superfamily as a model of whole genome protein repertoires. In his talk Dr


Chothia discussed problems that have arisen from analyses of Ensembl in the context of attempts to determine the complete repertoire of immunoglobulin superfamily proteins in the human genome. Application of HMMs to predicted proteins from Ensembl to identify immunoglobulin superfamily proteins is hampered by various issues. Parts of some genes are absent while parts of other genes are split and widely separated from each other. Consistency between releases of Ensembl is also a problem. Each new Ensembl release contains new protein predictions, while predictions from previous releases may be deleted. The solution involved looking at all nine releases of Ensembl to collectively identify all immunoglobulin superfamily proteins from all the predicted proteins in these releases using HMMs. This was followed by removing redundant predictions and confirming the quality of the predictions that remain using GeneWise.

Environmental healthcare issues and bioinformatics/individuality and the pursuit of personalised molecular medicine

Kenneth Buetow (NCBI) is actively involved in the Cancer Genome Anatomy Project (CGAP) and spoke about how combining bioinformatics and genomics gives an insight into the etiology of cancer. The most obvious challenges associated with utilizing large data collections are management, visualisation and interpretation. If bioinformatics is to succeed in integrating knowledge from diverse fields of research, the problem of standardizing terminology and language must be overcome. He discussed how the NCBI is using the GO infrastructure in cancer research.

The annotation of genomes requires sensitive tools for the identification of homologous proteins, allowing proteins of unknown function to be linked to homologues with known function. Conventional sequence alignment tools are often insensitive to distant sequence relationships even though the distantly related sequences may have similar structures and functions. Since structure is typically more conserved than sequence, the use of structure-based homologue recognition tools for genome annotation has significant value. (Imperial College) presented the latest results from his group on the development of genome annotation tools based on structural profiles [2,7]. His group has developed tools both to identify distantly related homologues and to assign function to those homologues. The 3D-PSSM tool (http://www.sbg.bio.ic.ac.uk/~3dpssm/) was developed to improve PSI-BLAST predictions and to detect distant sequence relationships using structural profiles from known proteins. An algorithm has also been developed for predicting functional sites in proteins based on multiple sequence alignment of homologous sequences followed by identification of invariant polar residues. Spatially clustered groups of these residues are considered to be functional sites. Comparison of data for proteins with known functional sites with the multiple alignment predictions showed the system to be 94% accurate. He emphasized the extent to which automated methods can be used but also how human insight can provide added value to genome annotation projects.

It is proving difficult to develop novel therapeutic molecules for treatment of multifactorial disorders based upon known targets despite a steady increase in the number of such targets. In most cases, therapeutic molecules that successfully alleviate a disease do not directly affect the causative gene products; instead they act on the physiological mechanisms that are ultimately triggered. Therefore, there is a need to identify these physiological mechanisms, and an understanding of the underlying events should be the main aim of the research. However, technological and conceptual difficulties must be resolved for functionally targeted drug development to be a reality. Since it is well established that physiological changes result from and induce gene expression pattern changes, the latter together with pathological states could be used to determine the physiological mechanisms implicated. However, it is important to consider the isoforms and any protein modifications involved, the types of complexes they associate with and the nature of the systemic environment. François Iris (Hemispherx Biopharma Europe) presented Bio-Graph, a computer-assisted approach to the determination of phenotype-specific physiological mechanisms for drug discovery. He pointed out the data mining problems of ‘textual informatics’ and discussed how to approach them. Bio-Graph is an integrative, systematic approach built around a four-phase physiological modeling process. It takes information from diverse forms and formats to create pertinent knowledge that is directly implemented to tackle the biological problems being addressed. In the first phase, data from various biological databases (for


example, MedLine, OMIM, KEGG and SPAD) are mapped into a relational graph, and then in the second phase, the graph is interpreted in the context of the problem of interest, leading to a sub-graph that reveals potential pathological links. The third phase involves the construction of a theoretical physiological model entirely supported by scientific literature and finally, the last phase identifies pathway-associated potential intervention points and therapeutic intervention modes. Direct experimental biological verification of the results is then required.

Kosta Steliou (Boston University) gave an overview of population history and genomics. Studies into patient history and genealogy give important insights into the molecular basis of disease. He described how the analysis of the mitochondrial genome can be used to segregate patients into medically relevant genetic subgroups. Research has shown that there are certain mutations in the mitochondrial DNA that are associated with diverse diseases such as cancer, infertility, diabetes, stroke, deafness and many others (http://www.mitoresearch.org). There are also associations between the A3243G mtDNA mutation and neurodegenerative disorders.

Researching and simulating biomolecular interaction networks and bioinformatics aspects of intercellular integration

Despite the ease with which SNPs can be identified, determining the biological significance of SNPs provides many challenges. In his presentation, John Todd (University of Cambridge) gave examples from his work in type 1 diabetes (T1D) and autoimmune disease. He pointed out that there are two general mechanisms by which SNPs have their effect, either by changing the regulation of the gene or by changing the precise amino acid sequence coded by the gene. Analysis of the function of SNPs is complicated by epistasis (where the effects of one SNP may be masked by another). To deal with epistasis, large sample sizes are required, and Todd suggested that a case study needs around 5000 samples. This implies that SNP screening must become an industrial process once SNPs are recognized in all haplotypes. However, the technology is still approximately 12-fold too expensive, so more cost-effective methods are required. Genetics has to scale up to enable population-based studies and he currently has funding for a case study of 10 000 DNA patient samples from the UK Paediatric T1D registry. The ‘registries’ provide a diverse range of cases, patient history and other relevant clinical data such as reasons for death. Another challenge in SNP research is in understanding splicing complexity.

Matthias Mann (University of Southern Denmark and MDS-Proteomics) presented the latest results from his collaborative work on high-throughput mass spectrometric protein complex identification [4]. Libraries of expressed cDNAs encoding ‘bait’ proteins are prepared by a high-throughput recombination procedure such that the members of the library of expressed proteins are each linked to an epitope tag. This tag may be used to immunoprecipitate the expressed ‘bait’ protein and any proteins that are interacting with the ‘bait’ protein from whole cell extracts. The components of the immunoprecipitated complex can then be identified further by gel electrophoresis and peptide mass fingerprinting. The technique found three times more interactions than the yeast two-hybrid technology in a comprehensive study of the yeast proteome. Some teething troubles were mentioned, such as ectopic expression of the tagged clones and a significant number of false positives. The filtered data set has been deposited in BIND, a powerful protein-protein interaction database (http://www.bind.ca/). Raw data can be obtained from http://www.mdsp.com/yeast.

Matthias Mann also referred to his latest work in collaboration with Angus Lamond in Dundee on the characterisation of the nucleolus, a substructure within the cell nucleus [1]. This project aims to determine which proteins are found in the nucleolus, what they interact with, and what their function is. Nucleolar isolation followed by 2-D gel electrophoresis and peptide mass fingerprinting has been used to identify components of the nucleolus. Issues that have arisen from this project include how to confirm that proteins identified by this process actually reside in the nucleolus rather than being contaminants. Fluorescence localization using Yellow Fluorescent Protein tagged clones was carried out to confirm nucleolar residence of the proteins identified by mass spectrometry.

Matthias Mann emphasized the need for standards to interpret proteomic data, particularly given the variety of analytical methods that are referred to as proteomic techniques. Predictive tools may become more valuable for functional pathways once more data is available, such as more in vivo data on


phosphorylation sites to allow the identification of unknown proteins with phosphorylation sites.

David Rubin presented Cognia, a company that is developing proteomic database products. Cognia has licensed the right to distribute commercial versions of the TRANSFAC and TRANSPATH databases with improved querying tools and interfaces. Cognia is also developing a proprietary relational database product, the Protein Catabolism DataBase (PCDB), a comprehensive database on protein turnover and protein degradation pathways. This will be of value in a number of diseases in which protein turnover is believed to be aberrant, such as in certain cancers, neurodegenerative diseases and inflammatory disorders. The PCDB database includes data on degradation mechanisms, post-translational modifications, structure, protein-protein interactions and signal transduction. The PCDB product includes search tools to enable Boolean queries, structural searching and tools to enable tailored curation specific to particular organizations or projects. The design of the database is amenable to inclusion of open source code.

Molecular docking is a technique for determining how well pairs of molecules interact with each other. Pairs of molecules are ‘docked’ in various orientations and the energy of each interaction is scored. The process is repeated to find the best interaction between the pair. Applications include ‘in silico’ drug screening to filter compound libraries prior to actual screening and prediction of binding conformation of hits from screening. Michael Vieth (Eli Lilly and Company) addressed various limitations of docking algorithms. The flexibility of proteins, and their ability to adopt novel conformations on ligand binding, are difficult to account for, as are changes in ligand conformation. Multiple X-ray structures and improved Structure Activity Relationships (SARs) from known protein/ligand complexes can improve predictions for new proteins and/or new ligands.

E-Medicine, medical research and healthcare in an Internet world

Maya Natarajan (Entropia, Inc., http://www.entropia.com/) presented results of how ‘grid computing’, also known as ‘distributed computing’, can accelerate bioinformatics research. Using BLAST results and results from docking experiments from a pharmaceutical company, she demonstrated how moderately sized networks of PCs that are actively performing menial tasks can provide significant computational power for processor intensive problems. The computational power derived from such networks grows linearly with the number of computing nodes within the network and the use of such networks provides a cost-effective solution for high performance computational needs.

Barry Robson’s presentation on ‘Health Care and Personalised Medicine’ was a visionary account of what the healthcare system might be like in the future and how this may be achieved. The medicine of the future will take advantage of personal genomic and expression data but such data cannot be used to its full potential unless there is an effective link and accessibility to patient records. IBM’s vision for a global healthcare system, ‘The Secure Health and Medical Access Network’ (SHAMAN), provides the infrastructure for this goal. The future prospect of designing personal drugs in real time is achievable. The drugs will be designed for patients on high performance servers at pharmaceutical companies and FDA offices. However, such a vision will mean that a change in the healthcare system is required. The development of SHAMAN essentially requires the encoding of patient records into digital form and the team is working closely with a group in Oxford on bioethics issues. An XML standard is used for the ‘clinical’ document ‘architecture’ and various forms of visualization tools are being developed. Mitochondrial DNA may be used as a ‘barcode’ for the record. A major problem is that the data is dirty, sparse and in a high dimensional space. Algorithms are required to cope with this; for example, knowledge can be reduced into a few terms using mathematical terminologies. Extensions to SHAMAN will include a database of potential pathogens and also a database of models for human disease.

Richard Mushlin (IBM T. J. Watson Research Center) mentioned that digital patient data has been around for many years now, and the recent success of these systems is largely due to the diverse forms of data being structured according to standards and brought under the control of integrated applications. Several of these applications are complete enough that virtually no paperwork is involved. However, with the emergence of new multi-source data types such as personal genomic information, these systems must be able to accommodate them electronically, or risk returning to paper records. He emphasized the importance of integrating genomic


data with clinical data for making scientific discoveries. Re-evaluation tools for the data will be required as more information is fed to the system.

Pim van der Eijk (Oasis, http://www.oasis-open.org/) gave a high-level introduction to XML and presented the results of a survey of several XML-based languages in biology with the emphasis on schema design, process issues and XML-based knowledge formalisms.

Peter Murray-Rust (Unilever Centre for Molecular Informatics) described an XML-based system for chemical nomenclature. He emphasized the need for agreed ontologies and discussed the problems of overlapping domains. Scientific publication can be used to create a knowledge base, or semantic web, and this has great potential in providing novel knowledge discoveries beyond what is presently found in traditional publications.

Conclusion

A diverse array of subjects was covered by the speakers and a number of underlying themes have emerged relating to the exploitation of genomic data, principally the need for standardization both of data structure and vocabulary. To achieve this goal of integration of disciplines as varied as clinical genetics, structural genomics and proteomics, this process of standardization will need to be far-reaching and extensive, and achieve global acceptance. Scientific publication can also provide and create a wealth of knowledge; however, publication will need to be in an accessible format for data integration and mining. Furthermore, if medical research is to evolve, academics in particular need to take an ‘industrial approach’ to research in order to achieve the necessary scale of data acquisition and management for genome-wide population-based studies. To successfully scale up, research processes will require effective automation, particularly of genome annotation and the task of assigning functions to novel sequences. High throughput structure analysis is beginning to provide automated methods of genome annotation. Accurate functional annotation of genomes will lead to better interpretation of both molecular and clinical genetics, allowing structural and functional information to be linked to clinical phenotypes. The ultimate outcome of this process of integration will be improved health-care for all.

References

1. Andersen JS, Lyon CE, Fox AH, et al. 2002. Directed proteomic analysis of the human nucleolus. Curr Biol 12: 1–11.
2. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. 2001. Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 45: 39–46.
3. Di Francesco V, Munson PJ, Garnier J. 1999. FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics 15: 131–140.
4. Ho Y, Gruhler A, Heilbut A, et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180–183.
5. Lopez AD, Murray CC. 1998. The global burden of disease, 1990–2020. Nat Med 4: 1241–1243.
6. Rison S, Teichmann SA, Thornton JM. 2002. Homology, pathway distance and chromosomal localisation of the small molecule metabolism enzymes in Escherichia coli. J Mol Biol 00: 00–00.
7. Smith GR, Sternberg MJ. 2002. Prediction of protein-protein interactions by docking methods. Curr Opin Struct Biol 12: 28–35.
8. Teichmann SA, Rison SC, Thornton JM, Riley M, Gough J, Chothia C. 2001. The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol 311: 693–708.
9. Tiret L, Poirier O, Nicaud V, et al. 2002. Heterogeneity of linkage disequilibrium in human genes has implications for association studies of common diseases. Hum Mol Genet 11: 419–429.
10. Todd AE, Orengo CA, Thornton JM. 2001. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 307: 1113–1143.
