Meeting Review: Bioinformatics and Medicine – from Molecules To
Total Page:16
File Type:pdf, Size:1020Kb
Comparative and Functional Genomics Comp Funct Genom 2002; 3: 270–276. Published online 9 May 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.178 Feature Meeting Review: Bioinformatics And Medicine – From molecules to humans, virtual and real Hinxton Hall Conference Centre, Genome Campus, Hinxton, Cambridge, UK – April 5th–7th Roslin Russell* MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK *Correspondence to: Abstract MRC UK HGMP Resource Centre, Genome Campus, The Industrialization Workshop Series aims to promote and discuss integration, automa- Hinxton, Cambridge CB10 1SB, tion, simulation, quality, availability and standards in the high-throughput life sciences. UK. The main issues addressed being the transformation of bioinformatics and bioinformatics- based drug design into a robust discipline in industry, the government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare with reference to advanced biological systems modeling and simulation, protein structure research, protein-protein interactions, metabolism and physiology. Speakers included Michael Ashburner, Kenneth Buetow, Francois Cambien, Cyrus Chothia, Jean Garnier, Francois Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Janet Thornton, Pim van der Eijk, Michael Vieth and Richard Ward. Copyright # 2002 John Wiley & Sons, Ltd. Received: 22 April 2002 Keywords: bioinformatics; medicine; virtual human; protein structure; personalized Accepted: 24 April 2002 medicine; ontology Introduction Keynote speakers The conference was jointly organized and spon- Franc¸ois Cambien (Institut National de la Sante´etde sored by the Deep Computing Institute of IBM la Recherche Me´dicale, INSERM) presented his (http://www.research.ibm.com/dci/), the Task Force latest results [9] and gave an overview on the mole- on Bioinformatics of the International Union of cular and epidemiological genetics of cardiovascular Pure and Applied Bioinformatics (http://www.iupab. disease and the different approaches towards geno- org/taskforces.html) and the European Bioinfor- typing. He stressed the importance of creating a matics Institute (EMBL-EBI, http://www.ebi.ac.uk/). catalogue of ‘all’ common polymorphisms in the Barry Robson (IBM Deep Computing Institute) human genome. Since the start of the Etude Cas- opened the workshop by stating that industrializa- Te´moin sur l’Infarctus du Myocarde (ECTIM) in tion does not mean commercialization but high- 1988 (the first large scale study on myocardial in- throughput industrial strength. Last year saw the farction, recent figures on the total number of candi- development of ‘clinical bioinformatics’ linking cli- date genes and polymorphisms across 29 studies are nical data to genomics, requiring a large collective now at 102 and 475 respectively as recorded by database that is ‘patient-centric’. This field will lead GeneCanvas (www.genecanvas.org) a website sup- to improved diagnostic tools, therapeutics and ported by INSERM. The project and others will patient care. continue to search for more polymorphisms but Copyright # 2002 John Wiley & Sons, Ltd. Meeting Review: Bioinformatics and Medicine 271 how many are there, are they all relevant, and how almost identical function, while proteins with less do we make the most out of this information? Little than 30% sequence identity show significant func- is known about the distribution and structure of tional differences. These results have been applied polymorphisms across the genome despite the avail- to the problem of genome annotation and func- ability of the almost complete sequence. On the tional assignments have been catalogued in the assumption that there are 10–20 common poly- database, Gene3D (http://www.biochem.ucl.ac.uk/ morphisms per gene affecting coding and regulatory bsm/cath_new/Gene3D/) for a number of completed regions and that there are around 30 000 human genomes, providing functional and structural anno- genes in total, it is estimated that there may be tation. around 500 000 functionally related SNPs. The classi- fication and exploitation of polymorphisms for the whole genome not only requires data quality, Virtual human biology storage, integration and accessibility, but also an understanding of data relevance and representation Michael Ashburner (University of Cambridge) spoke in relation to complex disorders. Lopez and Murray about recent developments and tools associated [5] ranked cardiovascular disease the fifth most with the Gene Ontology2 Consortium (GO, http:// common disease in 1990 but they estimate that in www.geneontology.org/). GO uses a natural lan- 2020, it will be the most common disease world- guage ontology system and aims to provide a set wide. There is a complex interplay between envir- of structured vocabularies for specific biological onmental and genetic determinants: the former may domains that can be used to describe gene products explain the difference in disease prevalence between in any organism. There are GO terms for human, population groups, and time related changes, while mouse, rat, worm, fly, amoeba, yeast, Arabidopsis the latter plays a role in susceptibility. However, the and the more recently sequenced rice, and TIGR expectations of finding genetic factors for multi- are now remodeling all their bacterial genomes. The factorial diseases may not be fulfilled because many three organizing principles (orthogonal ontologies) genes are involved and several models (such as the of GO are: biological process, molecular function whole genome approach) may be much too simplis- and cellular component. There is a hierarchical tic. Important correlations may also be found not structure within these groups that allows querying only in phenotypes such as clinical data but also in at different levels and as the number of these biological systems. When integrated with epidemiol- parent-child relationships increase, the annotations ogy and Mendelian genetics, the application of high get more complex. Many databases are now colla- throughput proteomic tools as well as comparative borating with GO and these include MGI, LOCUS- investigations of different species will provide a LINK, SWISSPROT and CGAP. Recent browsers greater understanding into evolution and functional and query tools attached to GO include the follow- biology and is expected to provide new leads for ing: the MGI GO Browser, AmiGO (Berkeley, drug development or other approaches to maintain http://www.godatabase.org/) ; the CGAP GO Brow- health. ser (http://cgap.nci.nih.gov/Genes/GOBrowser), the Janet Thornton (EMBL-EBI, http://www.ebi.ac. EP : GO Browser (http://ep.ebi.ac.uk/EP/GO/)which uk/Thornton/index.html) presented results on the is built into EBI’s Expression Profiler (a set of tools analysis of the domain structure of enzymes and for clustering and analysing microarray data); Quick- how this knowledge can be interpreted in the con- GO (http://www.ebi.ac.uk/ego/) which is integrated text of the evolution of metabolic pathways [10,8,6]. into InterPro; and DAG-Edit (http://sourceforge. The results from an analysis of small molecule meta- net/projects/geneontology), a purpose built, down- bolic pathways in yeast support a ‘mosaic’ model of loadable Java application to maintain, edit and evolution in which enzymes in metabolic pathways display GO terms. With the emerging development are chosen in an unsystematic way to meet the func- of ontologies in other domains, the Global Open tional requirements of a metabolic pathway. Further Biology Ontologies (GoBo) or Extended GO analysis on the conservation of function across (EGO) has also been set up with the aim of families of homologous enzymes shows that the being open source and DAG-Edit compatible, degree of functional similarity varies according to the using non-overlapping defined terms and common degree of sequence identity shared by enzyme homo- ID space. logue pairs. High sequence identity corresponds to Richard Ward (Oak Ridge National Laboratory) Copyright # 2002 John Wiley & Sons, Ltd. Comp Funct Genom 2002; 3: 270–276. 272 Meeting Review: Bioinformatics and Medicine is involved in the Virtual Human (VH) Project Protein structure and protein (http://www.ornl.gov/virtualhuman/), a human simu- interaction lation tool. The ambitious goal of the VH will be an integrated digital representation of the functioning Jean Garnier (INRA – de Recherche de Versailles) human from the molecular level to the living gave an overview of protein structure prediction human, and from conception to old age. Tools are techniques, describing some of the more widely used being developed to represent the body’s processes tools. Structure prediction tools fall into three from DNA molecules and proteins to cells, tissues, classes, those which predict structures from first organs and gross anatomy. These tools include principles such as molecular dynamics simulations, XML (Extensible Markup Language) where ontol- those based on statistical analysis, such as hidden ogies and a high level description of the human Markov models (HMMs) and those based on body are important, visualization tools and prob- comparison to known structures, i.e. homology lem solving environments. A forum has been estab- modeling. Jean Garnier stressed the importance of lished for the digital human effort to provide open testing different techniques, referring to EVA and source code tools to represent