Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野

Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野

116 Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野 Professor Minoru Kanehisa, Ph.D. 教授(委嘱) 理学博士 金 久 實 Assistant Professor Toshiaki Katayama, M.Sc. 助 教 理学修士 片山俊明 Assistant Professor Shuichi Kawashima, M.Sc. 助 教 理学修士 川島秀一 Lecturer Tetsuo Shibuya, Ph.D. 講 師 理学博士 渋谷哲朗 Assistant Professor Michihiro Araki, Ph.D. 助 教 薬学博士 荒木通啓 Since the completion of the Human Genome Project, high-throughput experimental projects have been initiated for uncovering genomic information in an extended sense, including transcriptomics, proteomics metabolomics, glycomics, and chemi- cal genomics. We are developing a new generation of databases and computa- tional technologies, beyond the traditional genome databases and sequence analy- sis tools, for making full use of these divergent and ever-increasing amounts of data, especially for medical and pharmaceutical applications. 1. KEGG DRUG and KEGG DISEASE natural product information in the context of plant and other genomes. We also develop Minoru Kanehisa KEGG DISEASE (http://www.genome.jp/kegg/ disease/), a new resource for understanding KEGG is a database of biological systems that molecular mechanisms of human diseases, integrates genomic, chemical, and systemic func- where molecular networks involving disease tional information. It is widely used as a refer- genes are represented as KEGG pathway maps ence knowledge base for understanding higher- or, when such details are not known, simply by order functions and utilities of the cell or the or- a list of diseases genes and other lists of mole- ganism from genomic information. Although the cules including environmental factors. basic components of the KEGG resource are de- veloped in Kyoto University, this Laboratory in 2. KEGG OC: Automatic assignments of the Human Genome Center is responsible for orthologs and paralogs in complete the applied areas of KEGG, especially in medi- genomes cal and pharmaceutical sciences. We develop KEGG DRUG (http://www.genome.jp/kegg/ Toshiaki Katayama, Shuichi Kawashima, drug/), a chemical structure based information Akihiro Nakaya and Minoru Kanehisa resource for all approved drugs in the world, in- tegrating target information in the context of The increase in the number of complete KEGG pathways, efficacy information in the genomeshasprovidedcluestogainusefulin- context of hierarchical drug classifications, and sights to understand the evolution of the gene 117 universe. Among the KEGG suites of databases, Currently, It contains 2,452,094 sequence cata- the GENES database contains more than 4.2 mil- logues (779,490 contigs and 1,672,604 singletons) lion genes from over 1,000 organisms as of Feb- in 62 plants. 25% of the sequences are assigned ruary 2009. Sequence similarities among these K numbers. EGENES is available at http:// genes are calculated by all-against-all SSEARCH www.genome.jp/kegg/plant/pln_list.html. comparison and stored them in the SSDB data- base. Based on those databases, the ORTHOL- 4. KEGG API: SOAP/WSDL interface for the OGY database has been manually constructed to KEGG system store the relationships among the genes sharing the same biological function. However, in this Shuichi Kawashima, Toshiaki Katayama and strategy, only the well known functions can be Minoru Kanehisa used for annotation of newly added genes, thus the number of annotated genes is limited. To KEGG is a suite of databases and associated overcome this situation, we developed a fully software, integrating our current knowledge of automated procedure to find candidate ortholo- molecular interaction/reaction pathways and gous clusters including whose current functional other systemic functions (PATHWAY and BRITE annotation is anonymous. The method is based databases), the information about the genomic on a graph analysis of the SSDB database, treat- space (GENES database), and information about ing genes as nodes and Smith-Waterman se- the chemical space (LIGAND and DRUG data- quence similarity scores as weight of edges. The bases). To facilitate large-scale applications of cluster is found by our heuristic method for the KEGG system programmatically, we have finding quasi-cliques, but the SSDB graph is too been developing and maintaining the KEGG large to perform quasi-clique finding at a time. API as a stable SOAP/WSDL based web service. Therefore, we introduce a hierarchy (evolution- The KEGG API is available at http://www. ary relationship) of organisms and treat the genome.jp/kegg/soap/. SSDB graph as a nested graph. The automatic decomposition of the SSDB graph into a set of 5. KEGG DAS: Comprehensive repository for quasi-cliques results in the KEGG OC (Ortholog community genome annotation Cluster) database. We built a system that per- forms automatic update of the ortholog cluster, Toshiaki Katayama, Mari Watanabe and which can be run weekly basis. As a result, we Minoru Kanehisa obtained 669,043 clusters including 403,752 sin- gleton clusters from 3,721,464 protein coding KEGG DAS is an advanced genome database genes. Among them, only 2,309 clusters were system providing DAS (Distributed Annotation shared across kingdoms and other clusters were System) service for all bacterial organisms in the kingdom specific. The automatic classification of GENOME database in KEGG. Currently, KEGG our ortholog clusters largely consistent with the DAS contains over 8 million annotations as- manually curated ORTHOLOGY database. A signed to the genome sequences of 817 organ- web interface to search and browse genes in isms (increased from 615 organisms in last year). clusters is made available at http://oc.kegg.jp/. The KEGG DAS server provides gene annota- tions linked to the KEGG PATHWAY and 3. EGENES: A database for expressed se- LIGAND databases. In addition to the coding quence tag indices of plant species genes, information of non-coding RNAs pre- dicted using Rfam database is also provided to Shuichi Kawashima, Yuki Moriya, Toshiaki fill the annotation of the intergenic regions of Tokimatsu , Susumu Goto and Minoru the genomes. The KEGG DAS service is avail- Kanehisa able at http://das.hgc.jp/. EGENES is a knowledge-based database for 6. Full-Arthropods: Constructing full length efficient analysis of plant expressed sequence cDNA of pathogenetic arthropods tags (ESTs), which was recently added to the KEGG PLANT. It links plant genomic informa- Toshiaki Katayama, Shuichi Kawashima, Miho tion with higher order functional information in Usui, Hiroyuki Wakaguri, Eri Kibukawa, KEGG. The genomic information in EGENES is Masahide Sasaki , Kazuhisa Hiranuka , a collection of EST contigs constructed from as- Ryuichiro Maeda, Yutaka Suzuki, Sumio sembled plant ESTs by using EGassembler. The Sugano and Junichi Watanabe EST indices are automatically annotated with the KEGG Orthology identifiers (K numbers) by Anopheles mosquito, tsetse fly, tsutsugamushi KEGG Automatics Annotation Server (KAAS). -mite, dust mite are arthropods which are 118 known as medically important because these and adults using the vector trapper method and either transmit various infectious disease includ- sequenced the both ends of 11,520 cDNA clones. ing malaria, Japanese river fever, or cause al- Cleaning, clustering and assembling of the row lergy such as asthma and dermatitis. Because of sequences produced 3,031 contigs and 4,281 sin- serious medical problems they cause, their gletons. 1,797 of the total unique 7,312 se- genomes are being extensively analyzed re- quences were assigned KEGG Orthology by cently. We have produced libraries of the four KAAS system. More than 30% of the sequences organisms and are constructing their databases showed significant matches to KEGG GENES for the functional genome analysis. Full- database, which includes well characterized Der Arthropods is available on the site http://ful- f group 1 allergens. We predicted 1,109 peptides larth.hgc.jp/ longer than 20 amino acids from the 3,031 con- tigs. Some of the peptides are predicted to con- 7. Full-Entamoeba: a database for the full tain the 9-mer peptides with strong affinities to length cDNA library of Entamoeba MHC class II by NetMHC 3.0. We expect that these in silico analyses will pave the way toward Toshiaki Katayama , Kazushi Hiranuka , prediction of allergens from D. farinae. Masahiro Kumagai, Yutaka Suzuki, Sumio Sugano, Atsushi Toyoda, Asao Makioka, 9. HiGet and SSS: Search engines for the Junichi Watanabe large-scale biological databases Entamoeba histolytica is a protozoan parasite Toshiaki Katayama, Shuichi Kawashima, which predominantly infects humans and other Kazuhiro Ohi, Kenta Nakai and Minoru primates and causes amebiasis. E. histolytica is Kanehisa estimated to infect about 50 million people worldwide and amebiasis is estimated to cause Recently, the number of entries in biological 70,000 deaths per year. Full-Entamoeba, a data- databases is exponentially increasing year by base for full-length cDNAs from a human para- year. For example, there were 10,106,023 entries site, E. histolytica, and a reptilian parasite, E. in- in the GenBank database in the year 2000, which vadens, has been produced. The full-length has now grown to 98,868,465 (Release 169+ cDNA libraries were produced using the oligo- daily updates). In order for such a vast amount capping method from trophozoites of each spe- of data to be searched at a high speed, we have cies cultivated axenically. A total of 5,000 5’end- developed a high performance database entry one-pass sequences of cDNAs from the two spe- retrieval system, named HiGet. For this purpose, cies were compared with the non-redundant da- the system is constructed on the HiRDB, a com- tabase of DDBL/GenBank/EMBL using BLAST mercial ORDBMS (Object-oriented Relational and TBLASTX programs. These clones are avail- Database Management System) developed by able for further analysis and experiments. Full- Hitachi, Ltd. HiGet can perform full text search Entamoeba database is available at http://ful- on various biological databases including Gen- lent.hgc.jp/ Bank, RefSeq, UniProt, Prosite, OMIM and PDB. Additional advantage of the HiGet system is the 8.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    39 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us