Prediction of Membrane Proteins in Post-Genomic Era

Total Page:16

File Type:pdf, Size:1020Kb

Prediction of Membrane Proteins in Post-Genomic Era Prediction of Membrane Proteins in Post-Genomic Era Daisuke Kihara1* & Minoru Kanehisa2 1) Donald Danforth Plant Science Center 893 North Warson Road St. Louis, Missouri 63141, USA [email protected] Fax: 314-812-8075 2) Institute for Chemical Research, Kyoto University Gokasho, Uji, Kyoto, 611-0011, Japan [email protected] Fax: 81-774-38-3269 *) To whom correspondence should be addressed Running title: Membrane Proteins in Genome Recent Res. Developments in Protein Engineering. 1: 179-196 (2001) 1 ABSTRACT protein. Membrane proteins have important roles in living cells, such as transport, energy The current status of the prediction methods of production, cell signaling and cell adhesion. In membrane proteins in the post-genomic era is the genomic context, membrane proteins have described. In the first section, prediction attracted more attention, because they discussed methods of transmembrane segments in that in microbial genomes, the distribution of proteins are considered, including the TSEG transporters, one of the major members of the program, which we have recently developed. membrane protein family, reflects the Several prediction methods for the tertiary environment that each organism inhabits structure of membrane proteins are also [10,11]. mentioned. Finally, we show the unique feature We begin this text by reviewing the of the distribution of membrane proteins in a prediction methods of TM segments in proteins, genome. Our prediction of the function of which are also used in detecting TM proteins in several membrane proteins is also shown. genome sequences. The prediction of TM segments is also a starting point of the tertiary INTRODUCTION structure prediction of a membrane protein. Here, we describe the TSEG program, which The beginning of the scientific research of this we have developed recently. The proper way to century is characterized by the report that the measure the performance of prediction methods human genome sequencing project has been is also mentioned. Next, methods for predicting completed [1,2]. Beyond the human genome, a the tertiary structure of TM proteins are surge of world-wide genome projects in the last reviewed. Currently, tertiary structures of only decade have been producing complete genome a few TM proteins are solved by experimental sequences of an increasing number of methods, and most of them have helical TM organisms. At this point (February 2001), segments. Limiting the objects of prediction to complete genome sequences of 48 organisms those proteins with helical TM segments are available in the KEGG database [3], which (exceptions at this point are porin, which has β- is a compilation of genome sequences and barrel structure [12] and bacterial toxin proteins pathways we have been maintaining on the web. which penetrate into membranes and form Complete genome sequences enable a membrane pores [13-16]), this prediction comprehensive study of proteins or organisms procedure can be simplified to the assembly of through a catalog of proteomes, e.g., dynamic semi-rigid helical rods. In a later section, we genome rearrangement of a pair of closely show a genome sequence analysis in terms of related organisms [4,5], the spread and membrane proteins. The membrane protein evolution of a particular protein family among content of a genome, and their physical organisms [6,7], and physical principles of how distribution in a genome is the main issue here. related genes are encoded in the genome [8,9]. We have also made predictions for the In this post-genomic era, every computational functions of membrane proteins based on the structure/function prediction method should observations. take into account its application to genome sequences. 1. Prediction Methods of Transmembrane Throughout this manuscript, trans- Segments membrane (TM) protein is meant by membrane 2 The TM segment prediction method originates amino acid propensity to construct prediction from the hydropathy plot by Kyte & Doolittle methods [30-32]. Using the Neural network is (1982) [17]. The idea is rather simple: another promising approach to these kinds of recognizing highly hydrophobic stretches in the two-dimensional structure predictions [33-35]. sequence as TM segments using a sliding Using multiple sequence alignments helps window of a certain length (7-21 amino acids). improving prediction accuracy [30,31,34,35]. Another important contribution of this paper is Kroth et al. applied a hidden Markov model the derivation of the hydrophobicity index of [36,37]. amino acids, which is still commonly used. It The topology of membrane proteins can was derived from water-vapor transfer free be derived from some biochemical energies [18-20] and interior-exterior experimental evidence, even if their tertiary distribution of amino acids [21]. Various structures are not solved [38-40]. But it is often improvements have been performed on the the case that contradictory results are suggested hydropathy analysis since then: taking by other experiments. If one also considers that amphiphilicity of TM helices into account [22, any prediction method has limited accuracy, it 23], using different or various hydrophobicity is optimal for a prediction method to output not indices [24,25], employing discriminant a single prediction but a list of possibilities with analysis [26]. Hydropathy-type methods can be certainty measures, so that further experiments generalized as follows: given a hydrophobicity can be designed to distinguish among several index ft for each amino acid type t, weights hn topology models. Jones et al. [32] elegantly for the position n in the sliding window and a applied a dynamic programming algorithm for threshold B which is constant: this purpose. TopPred II by von Heijne [41] ranks several topology models according to the ( j,m) ( j,m−n) D = ∑ M hn − B , 1 ≤ m ≤ L j (1) ‘positive-inside rule’ (see below, in Topology n Prediction section). We also mention here that (j,m-n) (j,m) neural network-based and hidden Markov where M = fp(j,m-n), and p denotes the model-based methods can assign reliability residue type at position m in chain j. Lj is the index to the predictions made for each residue. length of the chain j. The position m in chain j (j,m) is decided to be transmembrane if D > 0. TSEG program Treating ft and hn as variables, Edelman [27] derived their optimal values by minimizing the In this section, we describe TSEG, which we following quadratic equation S (which have developed recently. When annotating corresponds to minimizing prediction errors): biological functions of genes in a genome, the prediction of higher order structures can be 2 S = ∑()D ( j,m) − q ( j,m) (2) used in order to compensate for the limitation j,m of the conventional sequence similarity search [42-44]. This is especially true for membrane (j,m) Here, q = 1 when (j,m) in the training set is proteins, since the number of TM segments in a (j,m) transmembrane, otherwise q = 0. protein can be related to a functional subclass It has been observed that not only the in some cases, such as seven-TM receptors or inside of TM segments but also their flanking six-TM transporters. One of the objectives of regions have preferable amino acids, e.g. developing the Transmembrane SEGment aromatic residues [28,29]. One can utilize the prediction program [45] was to enhance the 3 functional identification of TM proteins. To segment, and ω is the angle in degrees. Figure 1 capture detailed properties of TM segments, is the model of the membrane protein used in our method is based on a classification of TM the prediction, according to the locations of the segments in a database. In fact, not all TM five groups of TM segments. Here, membrane segments are equally hydrophobic. For example, proteins with more than fourteen TM segments TM segments of single spanning TM proteins are excluded because there were not enough are known to be highly hydrophobic and have sequences in the database to execute statistical less amphiphilicity [22], whereas the last TM calculations. The most hydrophobic group, segments in seven-TM proteins are relatively group1, appears only in the single spanning less hydrophobic and often difficult to detect by membrane proteins. Group2, which has prediction methods [30,32]. Thus, we have relatively high hydrophobicity, appears at the classified TM segments first by the total N-terminal TM segments in most of the number of TM segments in a protein and the membrane protein classes (except for the 12TM order in which they appear in the protein proteins). It is possible that these TM segments sequence, and at last merged similar ones into are involved in the initiation of membrane the same group. The second feature is that the insertion, which may correspond to what TSEG enumerates possible models as ranked Eisenberg et al. [22] called ‘initiators’. The last by their scores, where a model is distinguished segment of the seven TM proteins belongs to by the number of TM segments in a protein and the least hydrophobic group 5, which also represented by the order of different groups appears in the eighth segment of the nine TM (types) of TM segments. A model of globular proteins. proteins is included as well. The prediction procedure is based on the We have classified TM segments into detection of different TM segment groups using five groups according to their average hydro- different discriminant functions, followed by phobicity and amphiphilicity (or AP value), matching with the 15 models shown in Figure 1. using the Mahalanobis distance of the linear In the first stage, a query sequence is applied to discriminant analysis [46]. The dataset of 2876 each model and the best candidates for TM non-redundant TM protein sequences used here segments in the model are selected. This was extracted from Swiss-Prot rel.
Recommended publications
  • A Method to Infer Changed Activity of Metabolic Function from Transcript Profiles
    ModeScore: A Method to Infer Changed Activity of Metabolic Function from Transcript Profiles Andreas Hoppe and Hermann-Georg Holzhütter Charité University Medicine Berlin, Institute for Biochemistry, Computational Systems Biochemistry Group [email protected] Abstract Genome-wide transcript profiles are often the only available quantitative data for a particular perturbation of a cellular system and their interpretation with respect to the metabolism is a major challenge in systems biology, especially beyond on/off distinction of genes. We present a method that predicts activity changes of metabolic functions by scoring reference flux distributions based on relative transcript profiles, providing a ranked list of most regulated functions. Then, for each metabolic function, the involved genes are ranked upon how much they represent a specific regulation pattern. Compared with the naïve pathway-based approach, the reference modes can be chosen freely, and they represent full metabolic functions, thus, directly provide testable hypotheses for the metabolic study. In conclusion, the novel method provides promising functions for subsequent experimental elucidation together with outstanding associated genes, solely based on transcript profiles. 1998 ACM Subject Classification J.3 Life and Medical Sciences Keywords and phrases Metabolic network, expression profile, metabolic function Digital Object Identifier 10.4230/OASIcs.GCB.2012.1 1 Background The comprehensive study of the cell’s metabolism would include measuring metabolite concentrations, reaction fluxes, and enzyme activities on a large scale. Measuring fluxes is the most difficult part in this, for a recent assessment of techniques, see [31]. Although mass spectrometry allows to assess metabolite concentrations in a more comprehensive way, the larger the set of potential metabolites, the more difficult [8].
    [Show full text]
  • The Kyoto Encyclopedia of Genes and Genomes (KEGG)
    Kyoto Encyclopedia of Genes and Genome Minoru Kanehisa Institute for Chemical Research, Kyoto University HFSPO Workshop, Strasbourg, November 18, 2016 The KEGG Databases Category Database Content PATHWAY KEGG pathway maps Systems information BRITE BRITE functional hierarchies MODULE KEGG modules KO (KEGG ORTHOLOGY) KO groups for functional orthologs Genomic information GENOME KEGG organisms, viruses and addendum GENES / SSDB Genes and proteins / sequence similarity COMPOUND Chemical compounds GLYCAN Glycans Chemical information REACTION / RCLASS Reactions / reaction classes ENZYME Enzyme nomenclature DISEASE Human diseases DRUG / DGROUP Drugs / drug groups Health information ENVIRON Health-related substances (KEGG MEDICUS) JAPIC Japanese drug labels DailyMed FDA drug labels 12 manually curated original DBs 3 DBs taken from outside sources and given original annotations (GENOME, GENES, ENZYME) 1 computationally generated DB (SSDB) 2 outside DBs (JAPIC, DailyMed) KEGG is widely used for functional interpretation and practical application of genome sequences and other high-throughput data KO PATHWAY GENOME BRITE DISEASE GENES MODULE DRUG Genome Molecular High-level Practical Metagenome functions functions applications Transcriptome etc. Metabolome Glycome etc. COMPOUND GLYCAN REACTION Funding Annual budget Period Funding source (USD) 1995-2010 Supported by 10+ grants from Ministry of Education, >2 M Japan Society for Promotion of Science (JSPS) and Japan Science and Technology Agency (JST) 2011-2013 Supported by National Bioscience Database Center 0.8 M (NBDC) of JST 2014-2016 Supported by NBDC 0.5 M 2017- ? 1995 KEGG website made freely available 1997 KEGG FTP site made freely available 2011 Plea to support KEGG KEGG FTP academic subscription introduced 1998 First commercial licensing Contingency Plan 1999 Pathway Solutions Inc.
    [Show full text]
  • 3 13437143.Pdf
    Title Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O.; Barrero, Roberto A.; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A.; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K.; Brookes, Anthony J.; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; Souza, Sandro J. de; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Author(s) Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei;
    [Show full text]
  • Genbank by Walter B
    GenBank by Walter B. Goad o understand the significance of the information stored in memory—DNA—to the stuff of activity—proteins. The idea that GenBank, you need to know a little about molecular somehow the bases in DNA determine the amino acids in proteins genetics. What that field deals with is self-replication—the had been around for some time. In fact, George Gamow suggested in T process unique to life—and mutation and 1954, after learning about the structure proposed for DNA, that a recombination—the processes responsible for evolution-at the triplet of bases corresponded to an amino acid. That suggestion was fundamental level of the genes in DNA. This approach of working shown to be true, and by 1965 most of the genetic code had been from the blueprint, so to speak, of a living system is very powerful, deciphered. Also worked out in the ’60s were many details of what and studies of many other aspects of life—the process of learning, for Crick called the central dogma of molecular genetics—the now example—are now utilizing molecular genetics. firmly established fact that DNA is not translated directly to proteins Molecular genetics began in the early ’40s and was at first but is first transcribed to messenger RNA. This molecule, a nucleic controversial because many of the people involved had been trained acid like DNA, then serves as the template for protein synthesis. in the physical sciences rather than the biological sciences, and yet These great advances prompted a very distinguished molecular they were answering questions that biologists had been asking for geneticist to predict, in 1969, that biology was just about to end since years.
    [Show full text]
  • Integrative Annotation of 21,037 Human Genes Validated by Full-Length Cdna Clones
    Title Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O.; Barrero, Roberto A.; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A.; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K.; Brookes, Anthony J.; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; Souza, Sandro J. de; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Author(s) Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei;
    [Show full text]
  • I S C B N E W S L E T T
    ISCB NEWSLETTER FOCUS ISSUE {contents} President’s Letter 2 Member Involvement Encouraged Register for ISMB 2002 3 Registration and Tutorial Update Host ISMB 2004 or 2005 3 David Baker 4 2002 Overton Prize Recipient Overton Endowment 4 ISMB 2002 Committees 4 ISMB 2002 Opportunities 5 Sponsor and Exhibitor Benefits Best Paper Award by SGI 5 ISMB 2002 SIGs 6 New Program for 2002 ISMB Goes Down Under 7 Planning Underway for 2003 Hot Jobs! Top Companies! 8 ISMB 2002 Job Fair ISCB Board Nominations 8 Bioinformatics Pioneers 9 ISMB 2002 Keynote Speakers Invited Editorial 10 Anna Tramontano: Bioinformatics in Europe Software Recommendations11 ISCB Software Statement volume 5. issue 2. summer 2002 Community Development 12 ISCB’s Regional Affiliates Program ISCB Staff Introduction 12 Fellowship Recipients 13 Awardees at RECOMB 2002 Events and Opportunities 14 Bioinformatics events world wide INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY A NOTE FROM ISCB PRESIDENT This newsletter is packed with information on development and dissemination of bioinfor- the ISMB2002 conference. With over 200 matics. Issues arise from recommendations paper submissions and over 500 poster submis- made by the Society’s committees, Board of sions, the conference promises to be a scientific Directors, and membership at large. Important feast. On behalf of the ISCB’s Directors, staff, issues are defined as motions and are discussed EXECUTIVE COMMITTEE and membership, I would like to thank the by the Board of Directors on a bi-monthly Philip E. Bourne, Ph.D., President organizing committee, local organizing com- teleconference. Motions that pass are enacted Michael Gribskov, Ph.D., mittee, and program committee for their hard by the Executive Committee which also serves Vice President work preparing for the conference.
    [Show full text]
  • UCLA UCLA Electronic Theses and Dissertations
    UCLA UCLA Electronic Theses and Dissertations Title Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods Permalink https://escholarship.org/uc/item/0tr9j01r Author Sun, Yidan Publication Date 2021 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Statistics by Yidan Sun 2021 © Copyright by Yidan Sun 2021 ABSTRACT OF THE DISSERTATION Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods by Yidan Sun Doctor of Philosophy in Statistics University of California, Los Angeles, 2021 Professor Jingyi Li, Chair In a bipartite network, nodes are divided into two types, and edges are only allowed to connect nodes of different types. Bipartite network clustering problems aim to identify node groups with more edges between themselves and fewer edges to the rest of the network. The approaches for community detection in the bipartite network can roughly be classified into algorithmic and model-based methods. The algorithmic methods solve the problem either by greedy searches in a heuristic way or optimizing based on some criteria over all possible partitions. The model-based methods fit a generative model to the observed data and study the model in a statistically principled way. In this dissertation, we mainly focus on bipartite clustering under two scenarios: incorporation of node covariates and detection of mixed membership communities.
    [Show full text]
  • Orchestrating Single-Cell Analysis with Bioconductor
    bioRxiv preprint doi: https://doi.org/10.1101/590562; this version posted March 27, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Orchestrating Single-Cell Analysis with Bioconductor Robert A. Amezquita1, Vince J. Carey∗2, Lindsay N. Carpp∗1, Ludwig Geistlinger∗3,4, Aaron T. L. Lun∗5, Federico Marini∗6,7, Kevin Rue-Albrecht∗8, Davide Risso∗9,10, Charlotte Soneson∗11,12, Levi Waldron∗3,4, Hervé Pagès1, Mike Smith13, Wolfgang Huber13, Martin Morgan14, Raphael Gottardo†1, and Stephanie C. Hicksy15 1Fred Hutchinson Cancer Research Center, Seattle, WA, USA 2Channing Division of Network Medicine, Brigham And Women’s Hospital, MA, USA 3Graduate School of Public Health and Health Policy, City University of New York, NY, USA 4Institute for Implementation Science in Population Health, City University of New York, NY, USA 5Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK 6Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz, Germany 7Center for Thrombosis and Hemostasis, Mainz, Germany 8Kennedy Institute of Rheumatology, University of Oxford, Oxford, OX3 7FY, UK 9Department of Statistical Sciences, University of Padua, Italy 10Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA 11Friedrich Miescher Institute
    [Show full text]
  • Bringing Data Exploration to Biologists
    BioBridge: Bringing Data Exploration to Biologists by Joseph R. Boyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree of Master of Science in Bioinformatics and Computational Biology by May 2014 APPROVED: Professor Matthew O. Ward, Major Thesis Advisor Professor Samuel Politz, Thesis Reader Professor Elizabeth Ryder, Head of Department Abstract Since the completion of the Human Genome Project in 2003, biologists have be- come exceptionally good at producing data. Indeed, biological data has experienced a sustained exponential growth rate, putting effective and thorough analysis be- yond the reach of many biologists. This thesis presents BioBridge, an interactive visualization tool developed to bring intuitive data exploration to biologists. Bio- Bridge is designed to work on omics style tabular data in general and thus has broad applicability. This work describes the design and evaluation of BioBridge's Entity View pri- mary visualization as well the accompanying user interface. The Entity View vi- sualization arranges glyphs representing biological entities (e.g. genes, proteins, metabolites) along with related text mining results to provide biological context. Throughout development the goal has been to maximize accessibility and usability for biologists who are not computationally inclined. Evaluations were done with three informal case studies, one of a metabolome dataset and two of microarray datasets. BioBridge is a proof of concept that there is an underexploited niche in the data analysis ecosystem for tools that prioritize accessibility and usability. The use case studies, while anecdotal, are very encouraging. These studies indicate that BioBridge is well suited for the task of data exploration.
    [Show full text]
  • The Kyoto Encyclopedia of Genes and Genomes–KEGG
    Yeast Yeast 2000; 17: 48±55. Website Review The Kyoto Encyclopedia of Genes and GenomesÐKEGG http://www.genome.ad.jp/kegg/ The KEGG project is run by the Institute for Chemical Research at Kyoto University, as part of the Japanese Human Genome Program. The page was constructed by a large project team (see http://www.genome.ad.jp/kegg/kegg1.html) and is maintained by Professor Minoru Kanehisa and Susumu Goto (Institute for Chemical Research). All screen views from the website are reproduced with the kind permission of Professor Minoru Kanehisa. Page structure Page guide The site (http://www.genome.ad.jp/kegg/kegg2.html) Pathway information is organized into three sections: pathway informa- Pathway maps and orthologue tables tion, genomic information, and computational This section provides metabolic and regulatory path- tools. The pathway information section includes way maps and tables of orthologous genes, the path- searchable pathway maps and orthologue tables of ways can also be searched using the tools provided. metabolic and regulatory pathways. There are also The metabolic pathway link leads to an ordered extensive catalogues of diseases (human), organisms table of known metabolic pathways. Clicking on the (completely sequenced genomes or chromosomes), pathway of interest leads to a ¯ow-chart of the cells (cell lineages), enzymes and `compounds'. All of these are extensively supported by searches of the `standard pathway' (Figure 1), in which all enzymes relevant sections of the KEGG databases, which (Figure 2), substrates, products (Figure 3) and allied can be used to help with the correct speci®cation of pathways are linked to pages on these topics.
    [Show full text]
  • Planning for Globally Sustainable Life Sciences Data Resources
    PLANNING FOR GLOBALLY SUSTAINABLE LIFE SCIENCES DATA RESOURCES W. Anderson1, R. Appel8, R. Apweiler2, G.A. Bauer1, H. Berman3, N. Blomberg4, C.H. de Brito Cruz5, T. Donohue6, M. Dunn7, C. Durinx8, P. Goodhand9, E.D. Green10, J. McEntyre2, Y. Nakamura11, A. Smith2, Y-Y. Teo12, M. Woodwark13, and P. Lasko14 1: The International Human Frontier Science Program Organization, 12 Quai Saint-Jean, 67080 Strasbourg, France 2: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK 3: Rutgers University, Center for Integrative Proteomics Research (CIPR), SAS - Chemistry & Chemical Biology, 174 Frelinghuysen Rd, Piscataway, NJ 08854-8076, USA 4: ELIXIR, Wellcome Genome Campus, Hinxton, CB10 1SD, UK 5: The Sao Paulo Research Foundation, R. Pio XI, 1500 - Alto da Lapa - CEP 05468-901 Sao Paulo/SP, Brazil 6: Department of Bacteriology, Great Lakes Bioenergy Research Center, & Wisconsin Energy Institute, University of Wisconsin-Madison, 1552 University Avenue, Madison, WI 53726, USA 7: The Wellcome Trust, Gibbs Building, 215 Euston Road, London NW1 2BE, UK Wisconsin-Madison, 1552 University Avenue, Madison, WI 53726, USA 8: SIB Swiss Institute of Bioinformatics, University of Lausanne, Batiment Genopode, 1015 Lausanne, Switzerland 9: Global Alliance for Genomics and Health, MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, Canada, M5G 0A3 10: National Human Genome Research Institute, National Institutes of Health, 31 Center Dr., Bethesda, MD 20892, USA 11: National Institute
    [Show full text]
  • KEGG: Kyoto Encyclopedia of Genes and Genomes Hiroyuki Ogata, Susumu Goto, Kazushige Sato, Wataru Fujibuchi, Hidemasa Bono and Minoru Kanehisa*
    1999 Oxford University Press Nucleic Acids Research, 1999, Vol. 27, No. 1 29–34 KEGG: Kyoto Encyclopedia of Genes and Genomes Hiroyuki Ogata, Susumu Goto, Kazushige Sato, Wataru Fujibuchi, Hidemasa Bono and Minoru Kanehisa* Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan Received September 8, 1998; Revised September 22, 1998; Accepted October 14, 1998 ABSTRACT The basic concepts of KEGG (1) and underlying informatics technologies (2,3) have already been published. KEGG is tightly Kyoto Encyclopedia of Genes and Genomes (KEGG) is integrated with the LIGAND chemical database for enzyme a knowledge base for systematic analysis of gene reactions (4,5) as well as with most of the major molecular functions in terms of the networks of genes and biology databases by the DBGET/LinkDB system (6) under the molecules. The major component of KEGG is the Japanese GenomeNet service (7). The database organization PATHWAY database that consists of graphical dia- efforts require extensive analyses of completely sequenced grams of biochemical pathways including most of the genomes, as exemplified by the analyses of metabolic pathways known metabolic pathways and some of the known (8) and ABC transport systems (9). In this article, we describe the regulatory pathways. The pathway information is also current status of the KEGG databases and discuss the use of represented by the ortholog group tables summarizing KEGG for functional genomics. orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES OBJECTIVES OF KEGG database for the gene catalogs of all organisms with complete genomes and selected organisms with In May 1995, we initiated the KEGG project under the Human partial genomes, which are continuously re-annotated, Genome Program of the Ministry of Education, Science, Sports as well as the LIGAND database for chemical com- and Culture in Japan.
    [Show full text]