104

Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野

Professor Minoru Kanehisa, Ph.D. 教 授 理学博士金久 實 Assistant Professor Toshiaki Katayama, M.Sc. 助 教 理学修士 片山俊明 Assistant Professor Shuichi Kawashima, M.Sc. 助 教 理学修士 川島秀一 Lecturer Tetsuo Shibuya, Ph.D. 講 師 理学博士 渋谷哲朗 Assistant Professor Michihiro Araki, Ph.D. 助 教 薬学博士 荒木通啓

DNA, RNA, and are the basic molecular building blocks of life, but the liv- ing cell contains additional molecules, including water, ions, small chemical com- pounds, glycans, lipids, and other biochemical molecules, without which the cell would not function. We are developing bioinformatics methods to integrate differ- ent types of data and knowledge on various aspects of the biological systems to- wards basic understanding of life as a molecular interaction/reaction system and also for practical applications in medical and pharmaceutical sciences.

1. KEGG DRUG and KEGG DISEASE KEGG DISEASE (http://www.genome.jp/kegg/ disease/) as a new addition to the KEGG suite Minoru Kanehisa and Michihiro Araki of databases. Each disease entry consists of a list of diseases and other lists of molecules KEGG is a database of biological systems that such as environmental factors, markers, drugs, integrates genomic, chemical, and systemic func- etc. Both DRUG and DISEASE are highly inte- tional information. It is widely used as a refer- grated with other KEGG databases including ence knowledge base for understanding higher- PATHWAY, BRITE, GENES, and COMPOUND, order functions and utilities of the cell or the or- and also with other Internet resources. ganism from genomic information. Although the basic components of the KEGG resource are de- 2. Automatic assignments of orthologs and veloped in Kyoto University, this Laboratory in paralogs in complete genomes the Center is responsible for the applied areas of KEGG, especially in medi- Toshiaki Katayama, Shuichi Kawashima, cal and pharmaceutical sciences. We develop Akihiro Nakaya and Minoru Kanehisa KEGG DRUG (http://www.genome.jp/kegg/ drug/), which is a chemical structure database The increase in the number of complete for all approved drugs, associated with target genomeshasprovidedcluestogainusefulin- information in the context of KEGG pathways, sights to understand the evolution of the efficacy information in the context of hierarchi- universe. Among the KEGG suites of databases, cal drug classifications, etc. We also develop the GENES database contains more than 2.5 mil- 105 lion genes from over 600 organisms as of Octo- at http://www.genome.jp/kegg-bin/create_ ber 2007. Sequence similarities among these kegg_menu?category5plants_egenes and http:// genes are calculated by all-against-all SSEARCH egassembler.hgc.jp/respectively. comparison and stored them in the SSDB data- base. Based on those databases, the ORTHOL- 4. SOAP/WSDL interface for the KEGG sys- OGY database has been manually constructed to tem store the relationships among the genes sharing the same biological function. However, in this Shuichi Kawashima, Toshiaki Katayama and strategy, only the well known functions can be Minoru Kanehisa used for annotation of newly added genes, thus the number of annotated genes is limited. To KEGG (Kyoto Encyclopedia of Genes and overcome this situation, we developed a fully Genomes) is a suite of databases and associated automated procedure to find candidate ortholo- software, integrating our current knowledge of gous clusters including whose current functional molecular interaction/reaction pathways and annotation is anonymous. The method is based other systemic functions (PATHWAY and BRITE on a graph analysis of the SSDB database, treat- databases), the information about the genomic ing genes as nodes and Smith-Waterman se- space (GENES database), and information about quence similarity scores as weight of edges. The the chemical space (LIGAND and DRUG data- cluster is found by our heuristic method for bases). To facilitate large-scale applications of finding quasi-cliques, but the SSDB graph is too the KEGG system programatically, we have large to perform quasi-clique finding at a time. been developing and maintaining the KEGG Therefore, we introduce a hierarchy (evolution- API as a SOAP/WSDL based web service. Re- ary relationship) of organisms and treat the cent improvements includes retrieval of the ref- SSDB graph as a nested graph. The automatic erence information from the KEGG PATHWAY decomposition of the SSDB graph into a set of database and utilization of the newly developed quasi-cliques results in the KEGG OC (Ortholog KEGG DRUG database. We are refining our Cluster) database. A web interface to search and service to compromise standardization of the browse genes in clusters is made available at bioinformatics web services and checking of op- http://dev.kegg.jp/oc/. erations in various computer languages. The KEGG API is available at http://www.genome. 3. EGENES: Transcriptome-Based Plant Data- jp/kegg/soap/. base of Genes with Metabolic Pathway In- formation and Expressed Sequence Tag 5. Comprehensive repository for community Indices in KEGG genome annotation

Ali Masoudi-Nejad, Susumu Goto, Ruy Toshiaki Katayama, Mari Watanabe and Jauregui, Masumi Itoh, Shuichi Kawashima, Minoru Kanehisa Yuki Moriya, Takashi R. Endo and Minoru Kanehisa KEGG DAS is an advanced genome database system providing DAS (Distributed Annotation EGENES is a knowledge-based database for System) service for all organisms in the efficient analysis of plant expressed sequence GENOME and GENES databases in KEGG tags (ESTs) that was recently added to the (Kyoto Encyclopedia of Genes and Genomes). KEGG suite of databases. It links plant genomic Currently, KEGG DAS contains over 8 million information with higher order functional infor- annotations assigned to the genome sequences mation in a single database. It also provides of 615 organisms (increased from 440 organisms gene indices for each genome. The genomic in- in last year). The KEGG DAS server provides formation in EGENES is a collection of EST con- gene annotations linked to the KEGG PATH- tigs constructed from assembled ESTs by using WAY and LIGAND databases. In addition to the EGassembler. EGassembler is a web server, coding genes, information of non-coding RNAs which provides an automated as well as a user- predicted using Rfam database is also provided customized analysis tool for cleaning, repeat to fill the annotation of the intergenic regions of masking, vector trimming, organelle masking, the genomes. We have been developing the clustering and assembling of ESTs and genomic server based on open source software including fragments. Due to the extremely large genomes BioRuby, BioPerl, BioDAS and GMOD/GBrowse of plant species, the bulk collection of data such to make the system consistent with the existing as ESTs is a quick way to capture a complete open standards. We are also responsible to the repertoire of genes expressed in an organism. Japanese localization of the GBrowse genome EGENES and EGassembler are publicly available browser. The contents of the KEGG DAS data- 106 base can be accessed graphically in a web Hiroshi Wada and Minoru Kanehisa browser using GBrowse GUI (graphical user in- terface) and also programatically by the DAS In the evolution of the eukaryotic genome, (XML over HTTP) protocol. Users are able to exon or domain shuffling has produced a vari- add their own annotations onto the KEGG DAS ety of proteins. On the assumption that each fu- server by connecting other DAS servers or by sion event between two independent - simply uploading their own data as a file. This domains occurred only once in the evolution of functionality enables researchers to share the metazoans, we can roughly estimate when the “community annotation.” The KEGG DAS is fusion events were happened. For this purpose, weekly updated and freely available at http:// we made phylogenetic profiles of pair-wise das.hgc.jp/. domain-combinations of metazoans. The phylo- genetic profiles can be expected to reflect the 6. Constructing a database of full length protein evolution of metazoan. Interestingly, the cDNA of pathogenetic arthropods phylogenetic tree of metazoans, derived from the profiles, supported the “Ecdysozoa hypothe- Toshiaki Katayama, Shuichi Kawashima, sis” that is one of the major hypotheses for Junichi Watanabe, Yutaka Suzuki, Sumio metazoan evolution. Further, the phylogenetic Sugano and Minoru Kanehisa profiles showed the candidates of genes that were required for each clade-specific features in Anopheles mosquito, tsetse fly, tsutsugamushi- metazoan evolution. We propose that compara- mite, dust mite are arthropods which are known tive proteome analysis focusing on pair-wise as medically important because these either domain-combinations is a useful strategy for re- transmit various infectious disease including searching the metazoan evolution. Additionally, malaria, Japanese river fever, or cause allergy we found that the extant ecdysozoans share such as asthma and dermatitis. Because of seri- only fourteen domain-combinations in our pro- ous medical problems they cause, their genomes files. Such a small number of ecdysozoan- are being extensively analyzed recently. We specific domain-combinations is consistent with have produced libraries of the four organisms the extensive gene losses through the evolution and are constructing their databases for the of ecdysozoans. functional genome analysis. It will be available on the site http://fullarth.hgc.jp/ 9. SSS: a sequence similarity search service

7. AAindex: Amino Acid index database Toshiaki Katayama, Kazuhiro Ohi and Minoru Kanehisa Shuichi Kawashima, Toshiaki Katayama and Minoru Kanehisa There are various services in the world to find similar sequences from the database, such as the AAindex is a database of numerical indices famous BLAST service provided at NCBI. How- representing various physicochemical and bio- ever, the method to search and the database to chemical properties of amino acids and pairs of be searched could not be added from outside. amino acids. We have added a collection of pro- To provide our super computer resources at the tein contact potentials to the AAindex as a new Human Genome Center to the research commu- section. Accordingly AAindex consists of three nity, we started to develop a new service for the sections now: AAindex1 for the amino acid in- sequence similarity search, SSS. In SSS, user can dex of 20 numerical values, AAindex2 for the select the search algorithm from BLAST, FASTA, amino acid mutation matrix and AAindex3 for SSEARCH, TRANS and EXONERATE. This va- the statistical protein contact potentials. All data riety of options is unique among the public are derived from published literature. The data- services. Then user is prompted to select appro- base can be accessed through the DBGET/ priate database depending on the algorithm se- LinkDB system at GenomeNet (http://www. lected and the search is executed. On the back- genome.jp/dbget-bin/www_bfind?aaindex) or end, we implemented the search system on the downloaded by anonymous FTP (ftp://ftp. Sun Grid Engine to provide efficient resources genome.jp/pub/db/community/aaindex/). on distributed computers. As a result, we are able to provide time consuming services such as 8. Comparative pair-wise domain-combination TRANS and EXONERATE in addition to the for screening the clade specific domain- popular algorithms. The SSS service is freely architectures in metazoan genomes available at http://sss.hgc.jp/.

Shuichi Kawashima, Takeshi Kawashima, 107

10. High performance database entry re- suffix tree (or PSGST for short). According to trieval system our experiments, the PSGST outperforms the geometric suffix tree in most cases. The PSGST Toshiaki Katayama, Shuichi Kawashima, shows its best performance when the database Kenta Nakai and Minoru Kanehisa does not have many substructures similar to the query. The query is sometimes 100 times faster Recently, the number of entries in biological than the original geometric suffix trees in such databases is exponentially increasing year by cases. year. For example, there were 10,106,023 entries in the GenBank database in the year 2000, which 12. Compact Geometric Suffix Tree has now grown to 83,051,496 (Release 162+ daily updates). In order for such a vast amount Tetsuo Shibuya of data to be searched at a high speed, we have developed a high performance database entry Protein structure analysis is one of the most retrieval system, named HiGet. The HiGet sys- important research issues in the post-genomic tem is constructed on the HiRDB, a commercial era, and faster and more accurate index data ORDBMS (Object-oriented Relational Database structures for such 3-D structures are highly de- Management System) developed by Hitachi, sired for research on proteins. The geometric Ltd. It is publicly accessible on the Web page at suffix tree, which is also a data structure pro- http://higet.hgc.jp/ or SOAP based web service posed by our group, is a very sophisticated in- at http://higet.hgc.jp/soap/. HiGet can execute dex structure that enables fast and accurate full text search on various biological databases. search on protein 3-D structures. By using it, we In addition to the original plain format, the sys- can search from 3-D structure databases for all tem contains data in the XML format in order to the substructures whose RMSDs (root mean provide a field specific search facility. When a square deviations) to a given query 3-D struc- complicated search condition is issued to the ture are not larger than a given bound. But the system, the search processing is executed effi- geometric suffix tree requires rather large mem- ciently by combining several types of indices to ory, i.e., about 65n byte for a protein of length reduce the number of records to be processed n. We developed a new data structure called within the system. Current searchable databases “compact geometric suffix tree” which requires areGenBank,UniProt,Prosite,OMIM,PDBand only 23n byte, which is almost about 1/3 of the RefSeq. We are planning to include other valu- original geometric suffix tree, while the query able databases and also planning to develop an speed is almost the same as the original geomet- inter-database search interface and a complex ric suffix tree. search facility combining keyword search and sequence similarity search. 13. Protein Function Prediction based on 3-D Structure Motifs 11. PSGST: A New Data Structure for Index- ing Protein Structures Chia-Han Chu, Hiroki Sakai, and Tetsuo Shibuya Tetsuo Shibuya Protein functions are said to be determined by Protein structure analysis is one of the most its 3-D structures, but not all functions have important research issues in the post-genomic been known to be related to some 3-D structure era, and faster and more accurate index data motifs. The geometric suffix tree, a data struc- structures for such 3-D structures are highly de- ture for indexing 3-D protein structures, which sired for research on proteins. The geometric is also developed by us, enables comprehen- suffix tree, which is also proposed by our group, sively enumeration of all the possible structural is a very sophisticated index structure that en- motifs among given set of proteins. We are de- ables fast and accurate search on protein 3-D veloping a new algorithm based on the support structures. By using it, we can search from 3-D vector machine that decides protein’s function structure databases for all the substructures from the 3-D structure of a protein. This algo- whose RMSDs (root mean square deviations) to rithm utilizes all the possible 3-D motifs found a given query 3-D structure are not larger than by using the geometric suffix tree. a given bound. We proposed a new data struc- ture based on the geometric suffix tree whose 14. Fast Hinge Detection Algorithm in Protein query performance is much better than the Structures original geometric suffix tree. We call the modi- fied data structure the prefix-shuffled geometric Tetsuo Shibuya 108

Analysis of conformational changes is one of deviation) is the most frequently used measure the keys to the understanding of protein func- for comparing two protein 3-D structures. We tions and interactions. For the analysis, we often deal with two fundamental problems related to compare two protein structures, taking flexible the RMSD. We first deal with a problem called regions like hinge domains into consideration. the‘rangeRMSDquery’problem.Givenan The RMSD (Root Mean Square Deviation) is the aligned pair of structures, the problem is to most popular measure for comparing two pro- compute the RMSD between two aligned sub- tein structures, but it is only for rigid structures structures of them without gaps. This problem without hinge domains. In this paper, we pro- has many applications in protein structure pose a new measure called RMSDh (Root Mean analysis. We propose a linear-time preprocess- Square Deviation considering hinges) and its ing algorithm that enables constant-time RMSD variant RMSDh(k) for comparing two flexible computation. Next, we consider a problem proteins with hinge domains. We also propose called the ‘substructure RMSD query’ problem, novel efficient algorithms for computing them, which is a generalization of the above range which can detect the hinge positions at the same RMSD query problem. It is a problem to com- time. The RMSDh is suitable for cases where pute the RMSD between any substructures of there is one small hinge domain in each of the two unaligned structures without gaps. Based two target structures. The new algorithm for on the algorithm for the range RMSD problem, computing the RMSDh runs in linear time, we propose an O(nm) preprocessing algorithm whichissameasthetimecomplexityforcom- that enables constant-time RMSD computation, puting the RMSD and is faster than any of pre- where n and m are the lengths of the given vious algorithms for hinge detection. The structures. Moreover, we propose O(nm log r/r) RMSDh(k) is designed for comparing structures -time and O(nm/r)-space preprocessing algo- with more than one hinge domain. The RMSDh rithm that enables O(r) query, where r is an ar- (k) measure considers at most k small hinge do- bitrary integer such that 0<r<min(n, m). We mains, i.e., the RMSDh(k) value should be small also show that our strategy also works for an- if the two structures are similar except for at other measure called the URMSD (unit-vector most k hinge domains. To compute the value, root mean square deviation), which is a variant we propose an O(kn2)-time and O(n)-space algo- of the RMSD. rithm based on a new dynamic programming technique. We also test our measures against 17. Suffix Array Construction with a Lazy both flexible protein structures and non-flexible Scheme protein structures, and show that the hinge po- sitions can be correctly detected by our algo- Ben Hachimori and Tetsuo Shibuya rithms. The suffix array is one of the most important 15. Fast Flexible Protein Structure Alignment indexing data structures for alphabet strings, in- cluding DNA sequences, RNA sequences, pro- Kohichi Suematsu and Tetsuo Shibuya tein sequences, web pages, Medline database, and so on. But even the most sophisticated algo- The Hinge Detection Algorithm described in rithm for constructing the suffix array requires a section 14 only considered rigid hinge points, lot of time. We developed a new efficient lazy but the hinges are sometimes bends a little by it- algorithm that computes the suffix array only self, which sometimes leads to inaccurate pre- afterwegetthequery.Bydoingso,wehaveto diction of hinge positions. Thus we incorporated compute only the necessary part of the suffix ar- the notion ‘bending hinge’ to detect such hinge ray. positions. We developed a very efficient heuris- tic algorithm for finding such bending hinges, as 18. Genotype Clustering based on Hidden the exact algorithm for this problem requires ex- Markov Models ponential time. Ritsuko Onuki, Tetsuo Shibuya and Minoru 16. Fast Algorithms for Fast Range RMSD Kanehisa Query Haplotype clustering is important for gene Tetsuo Shibuya mapping of human disease. Although its impor- tance for the analysis, it is difficult to obtain Protein structure analysis is a very important haplotype data from present experiment for its research topic in the molecular biology of the cost and error rate. Instead of haplotypes, geno- post-genomic era. The RMSD (root mean square types are much easier to obtain. In this work, 109 we propose a new method for clustering geno- patterns were then applied for in silico drug types. In the algorithm. we first infer the multi- modifications. We developed a scheme to pre- ple haplotype candidates from the genotype, dict potential drug-like compounds, which and next we calculate the distance between the could demonstrate the reconstruction of the genotypes based on the results of the haplotype drug developmental maps. inference. Then we perform genotype clustering based on the distances. We evaluated our algo- 21. Comprehensive Analysis of Distinctive rithm by applying our algorithm against several Polyketide and Nonribosomal Peptide actual genotype data. Structural Motifs Encoded in Microbial Genomes 19. Analysis of the Chemical Diversity of Me- dicinal Natural Products Yohsuke Minowa, Michihiro Araki and Minoru Kanehisa Michihiro Araki, Tetsuo Shibuya, Kohichi Suematsu, and Minoru Kanehisa We developed a highly accurate method to predict polyketide (PK) and nonribosomal pep- The diverse activities of medicinal natural tide (NRP) structures encoded in the microbial products are explained by extensive diversities genome. PKs/NRPs are polymers of carbonyl or in the chemical structures. Although several hy- peptidyl chains synthesized by polyketide syn- potheses have been proposed to understand the thases (PKS) and nonribosomal peptide syn- origin of the chemical diversity, no systematic thetases (NRPS). We analyzed domain se- analyses have been done so far. In order to elu- quences corresponding to specific substrates and cidate how the chemical diversities are designed physical interactions between PKSs/NRPSs in in the biosynthetic circuits, we defined molecu- order to predict which substrates are selected lar building blocks required for describing the and assembled into highly ordered chemical chemical information of natural products and structures. The predicted PKs/NRPs were repre- performed the systematic analyses with the sented as the sequences of carbonyl/peptidyl transformational and combinatorial information units to extract the structural motifs efficiently. of the building blocks. We further reconstructed The structural sequences were compared using the metabolic network using the chemical and the Smith-Waterman algorithm to extract 33 the genomic information to identify alternative structural motifs that are significantly related biosynthetic pathways as well as new chemical with their bioactivities. The integrative analysis entities. of genomic and chemical information here will provide a strategy to predict the chemical struc- 20. Extraction of Chemical Modification Pat- tures, the biosynthetic pathways, and the bio- terns from Drug Developmental History logical activities of PKs/NRPs, which is useful and Its Application for the rational design of novel PKs/NRPs.

Daichi Shigemizu, Michihiro Araki, Shujiro 22. Prediction of Drug-target Interaction Net- Okuda, Susumu Goto and Minoru Kanehisa works from the Integration of Chemical and Genomic Spaces The chemical modifications in the history of drug discovery are expected to provide a wealth Yoshihiro Yamanishi, Michihiro Araki, Alex of the medicinal chemists’ knowledge and it is Gutteridge, Wataru Honda and Minoru of importance to extract the information on the Kanehisa chemical modifications in the form of distinct patterns. KEGG DRUG structure map collected The identification of interactions between such information by illustrating the develop- drugs and target proteins is a key area in drug mental relationships among the drug structures. discovery. We characterize four classes of drug- Here we focused on 288 drug pairs in the KEGG target interaction networks in humans involving DRUG structure map to extract the chemical enzymes, ion channels, GPCRs, and nuclear re- modification patterns. We developed a method ceptors, and reveal significant correlations be- for taking the transformations of core chemo- tween drug structure similarity, target sequence types and modified fragments into considera- similarity, and the drug-target interaction net- tions. A set of the pairs of the atoms on the core work topology. We then develop new statistical chemotypes and the fragments was collected as methods to predict unknown drug-target inter- the indicators of the chemical modifications, action networks from chemical structure and which identified 347 chemical modification pat- genomic sequence information simultaneously terns in the structure maps. The transformation on a large scale. As a result, we demonstrate the 110 usefulness of our proposed method for the pre- suggest many potential drug-target interactions diction of the four classes of drug-target interac- and to increase research productivity toward tion networks. Our comprehensively predicted genomic drug discovery. drug-target interaction networks enable us to

Publications

Kanehisa, M., Araki, M., Goto, S., Hattori, M., Genomics, 8: 48, 2007 Hirakawa, M., Itoh, M., Katayama, T., Huang, J., Kawashima, S. and Kanehisa, M. Kawashima,S.,Okuda,S.,Tokimatsu,T.,and New amino acid indices based on residue net- Yamanishi, Y. KEGG for linking genomes to work topology, Genome Informatics, 18, 2007, life and the environment. Nucleic Acids Re- in press search 36: 2008, in press. Shibuya, T., Combitorial Pattern Matching for Kawashima, S., Pokarowski, P., Pokarowska, M., Protein 3-D Structures, SIG FPAI, 2008, to ap- Kolinski, A., Katayama, T., Kanehisa, M. pear. AAindex: amino acid index database, progress Minowa, Y., Araki, M., Kanehisa, M. Compre- report 2008. Nucleic Acids Research 36: 2008, in hensive Analysis of Distinctive Polyketide and press. Nonribosomal Peptide Structural Motifs En- Lapp, H., Bala, S., Balhoff, J.P., Bouck, A., Goto, coded in Microbial Genomes, Journal of Mo- N., Holder, M., Holland, R., Holloway, A., lecular Biology, 368: 1500-1517, 2007. Katayama, T., Lewis, P.O., Mackey, A., Os- Shibuya, T., Efficient Substructure RMSD Query borne, B.I., Piel, W.H., Kosakovsky Pond, S.L., Algorithms, Journal of Computational Biol- Poon, A., Qiu, W.-G, Stajich, J.E., Stoltzfus, A., ogy, Vol. 14, No. 9., 2007, pp. 1201-1207. Thierer, T., Vilella, A.J., Vos, R.A., Zmasek, C. Shibuya, T., Prefix-Shuffled Geometric Suffix M., Zwickl, D. and Vision, T.J. The 2006 NES- Tree, Proc. 14th String Processing and Infor- Cent Phyloinformatics Hackathon: A field re- mation Retrieval Symposium (SPIRE 2007), port, Evolutionary Bioinformatics, 2008, in press. LNCS 4726, 2007, pp. 300-309. Kawashima, S., Kawashima, T., Putnam, N.H., Shibuya, T., Current Bioinformatics and Its Fu- Rokhsar D.S., Wada, H. and Kanehisa, M. ture, Journal of the Institute of Electronics, In- Comparative pair-wise domain-combinations formation and Communication Engineers, vol. for screening the clade specific domain- 90, no. 2, 2007, pp. 145-147. architectures in metazoan genomes, Genome Onuki, R., Shibuya, T., and Kanehisa, M., Informatics, 19: 50-60, 2007 Haplotype Inference Probability-based Geno- Hu, Z., Ng, D.M., Yamada, T., Chen, C., type Clustering, Proc. 7th Annual Workshop Kawashima, S., Mellor, J., Linghu, B., Kane- on Bioinformatics and Systems Biology (IBSB), hisa,M.,Stuart,J.M.andDeLisi,C.VisANT 2007, pp. 26-27. 3.0: new modules for pathway visualization, Shibuya, T., Efficient Substructure RMSD Query editing, prediction and construction. Nucleic Algorithms, IPSJ SIG Notes SIGAL 114-8, Acids Research, 35: W625-32, 2007 2007, pp. 57-64. Masoudi-Nejad, A., Goto, S., Jauregui, R., Ito, Shibuya, T., Fast and Accurate Algorithms for M.,Kawashima,S.,Moriya,Y.,Endo,T.R.and Protein Hinge Detection, IPSJ SIG Notes SIG- Kanehisa, M. EGENES: transcriptome-based BIO 10-4, 2007, pp. 25-32. plant database of genes with metabolic path- Shibuya, T., Prefix-Shuffled Geometric Suffix way information and expressed sequence tag Tree, IPSJ SIG Notes SIGAL 112-1, May 11, indices in KEGG. Plant Physiology, 144: 857- 2007, pp. 1-8. 866, 2007 Onuki, R., Shibuya, T., and Kanehisa, M., Geno- Okamoto, S., Yamanishi, Y., Ehira, S., type Clustering based on Haplotype Inference, Kawashima, S., Tonomura, K. and Kanehisa, Japanese Society of Human Genetics, 2007. M. Prediction of nitrogen metabolism-related Onuki, R., Shibuya, T., and Kanehisa, M., A genes in Anabaena by kernel-based network New Method for Genotype Clustering based analysis. Proteomics, 7: 900-909, 2007 on Haplotype Inference, Biochemistry and Okuda, S., Kawashima, S., Kobayashi, K., Molecular Biology (BMB 2007), 1P-1071, 2007. Ogasawara, N., Kanehisa, M. and Goto, S. 渋谷哲朗,坂内英夫(訳),Neil C. Jones, Pavel Characterization of relationships between Pevzner(著),バイオインフォマティクスのた transcriptional units and operon structures in めのアルゴリズム入門,共立出版,2007. Bacillus subtilis and Escherichia coli, BMC 111

Human Genome Center Laboratory of DNA Information Analysis DNA情報解析分野

Professor Satoru Miyano, Ph.D. 教 授 理学博士 宮 野 悟 Associate Professor Seiya Imoto, Ph.D. 准教授 博士(数理学) 井元清哉 Assistant Professor Masao Nagasaki, Ph.D. 助教 博士(理学) 長 “ 正朗 Project Lecturer Rui Yamaguchi, Ph.D. 特任講師 博士(理学) 山 口 類

The original aim of the research at this laboratory is to establish computational methodologies for discovering and interpreting information of nucleic acid se- quences, proteins and some other experimental data arising from researches in Genome Science. The recent advances in biomedical research have been produc- ing large-scale, ultra-high dimensional, ultra-heterogeneous data. Due to these post-genomic research progresses, our current mission is to create computational strategy for systems biology and medicine towards translational bioinformatics. With this mission, we have been developing computational methods for under- standing life as system and applying them to practical issues in medicine and biol- ogy.

1. Computational Systems Biology format, the ontology is implemented in the Web Ontology Language (OWL), which enables se- a. Cell System Ontology: Representation for mantic validation and automatic reasoning to modeling, visualizing, and simulating bio- check the consistency of biological pathway logical pathways models. The features of CSO are as follows: (1) manipulation of different levels of granularity Euna Jeong, Masao Nagasaki, Ayumu Saito, and abstraction of pathways, e.g., metabolic Satoru Miyano pathways, regulatory pathways, signal transduc- tion pathways, and cell-cell interactions; (2) cap- With the rapidly accumulating knowledge of ture of both quantitative and qualitative aspects biological entities and networks, there is a grow- of biological function by using hybrid functional ing need for a general framework to understand Petri net with extension (HFPNe); and (3) en- this information at a system level. In order to coding of biological pathway data related to understand life as system, a formal description visualization and simulation, as well as model- of system dynamics with semantic validation ing. The new ontology also predefines mature will be necessary. Within the context of biologi- core vocabulary, which will be necessary for cre- cal pathways, several formats have been pro- ating models with system dynamics. In addition, posed, e.g., SBML, CellML, and BioPAX. Unfor- each of the core terms has at least one standard tunately, these formats lack the formal defini- icon for easy modeling and accelerating the ex- tions of each term or fail to capture the system changeability among applications. In order to dynamics behavior. Thus, we have developed a demonstrate the potential of CSO-based path- new system dynamics centered ontology called way modeling, visualization, and simulation, we Cell System Ontology (CSO). As an exchange present an HFPNe model for the ASEL and 112

ASER regulatory networks in Caenorhabditis ele- three problems: (i) the initial layout is often very gans. far from any local optimum because nodes are initially placed at random, (ii) from a biological b. Conversion from BioPAX to CSO for sys- viewpoint, human layouts still exceed automatic tem dynamics and visualization of biologi- layouts in understanding because except subcel- cal pathway lular localization, it does not fully utilize bio- logical information of pathways, and (iii) it em- Euna Jeong, Masao Nagasaki, Satoru Miyan ploys a local search strategy in which the neigh- borhood is obtained by moving one node at The vast accumulation of biological pathway each step, and automatic layouts suggest that si- data scattered in various sources presents chal- multaneous movements of multiple nodes are lenges in the exchange and integration of these necessary for better layouts, while such exten- data. Major new standards for representation of sionmayfaceworseningthetimecomplexity. pathway data and the ability to check inconsis- We propose a new grid layout algorithm. To ad- tency in pathways are inevitable for the devel- dress problem (i), we devised a new forcedi- opment of a reliable pathway data repository. rected algorithm whose output is suitable as the Within the context of biological pathways, the initial layout. For (ii), we considered that an ap- cell system ontology (CSO) had been developed propriate alignment of nodes having the same as a general framework to model system dy- biological attribute is one of the most important namics and visualization of diverse biological factors of the comprehension, and we defined a pathways. CSO provides an excellent environ- new score function that gives an advantage to ment for modeling, visualizing, and simulating such configurations. For solving problem (iii), complex molecular mechanisms at different lev- we developed a search strategy that considers els of details. This paper examines whether CSO swapping nodes as well as moving a node, addresses the integration capability of pathway while keeping the order of the time complexity. data with system dynamics. We present a con- Though a naive implementation increases by version tool for converting BioPAX to CSO. one order, the time complexity, we solved this Transforming the data from BioPAX to CSO not difficulty by devising a method that caches dif- only allows an analysis of the dynamic behav- ferences between scores of a layout and its pos- iors in molecular interactions but also allows the sible updates. Layouts of the new grid layout al- results to be stored for further biological investi- gorithm are compared with that of the previous gations, which is not possible in BioPAX. The algorithm and human layout in an endothelial conversion is done using simple inference algo- cell model, three times as large as the apoptosis rithms with the addition of view- and model. The total cost of the result from the new simulation-related properties. We demonstrate grid layout algorithm is similar to that of the how CSO can be used to build a complete and human layout. In addition, its convergence time consistent pathway repository and enhance the is drastically reduced (40% reduction). interoperability among applications. d. Modelling and simulation of signal trans- c. An efficient grid layout algorithm for bio- ductions in an apoptosis pathway by us- logical networks utilizing various biologi- ing timed Petri nets cal attributes Chen Li, Qi-Wei Ge1,MitsuruNakata1,Hiroshi Kaname Kojima, Masao Nagasaki, Euna Matsuno1, Satoru Miyano: 1Yamaguchi Univer- Jeong, Mitsuru Kato, Satoru Miyano sity

Clearly visualized biopathways provide a This paper first presents basic Petri net com- great help in understanding biological systems. ponents representing molecular interactions and However, manual drawing of large-scale mechanisms of signalling pathways, and intro- biopathways is time consuming. We proposed a duces a method to construct a Petri net model grid layout algorithm that can handle gene- of a signalling pathway with these components. regulatory networks and signal transduction Then a simulation method of determining the pathways by considering edge-edge crossing, delay time of transitions, by using timed Petri node-edge crossing, distance measure between nets, i.e. the time taken in firing of each transi- nodes, and subcellular localization information tion is proposed based on some simple princi- from . Consequently, the layout ples that the number of tokens flowed into a algorithm succeeded in drastically reducing place is equivalent to the number of tokens these crossings in the apoptosis model. How- flowed out. Finally, the availability of proposed ever, for larger-scale networks, we encountered method is confirmed by observing signalling 113 transductions in biological pathways through need to consider some restrictions for model simulation experiments of the apoptosis signal- building. We propose weighted lasso estimation ling pathways as an example. for the graphical Gaussian models as a model of large gene networks. In the proposed method, e. Understanding endothelial cell apoptosis: the structural learning for gene networks is What can the transcriptome glycome and equivalent to the selection of the regularization proteome reveal? parameters included in the weighted lasso esti- mation. We investigate this problem from a Muna Affara2 , Benjamin Dunmore2 , Bayes approach and derive an empirical Baye- Christopher Savoie3, Seiya Imoto, Yoshinori sian information criterion for choosing them. Tamada3 , Hiromitsu Araki3 , D. Stephen Unlike Bayesian network approach, our method Charnock-Jones2 , Satoru Miyano, Cristin can find the optimal network structure and does Print4: 2Cambridge University, 3GNI, Ltd. 4Uni- not require to use heuristic structural learning versity of Auckland algorithm. We conduct Monte Carlo simulation to show the effectiveness of the proposed Endothelial cell (EC) apoptosis may play an method. We also analyze Arabidopsis thaliana important role in blood vessel development, ho- microarray data and estimate gene networks. meostasis and remodelling. In support of this concept, EC apoptosis has been detected within g. A structure learning algorithm for infer- remodelling vessels in vivo, and inactivation of ence of gene networks from microarray EC apoptosis regulators has caused dramatic gene expression data using Bayesian net- vascular phenotypes. EC apoptosis has also been works associated with cardiovascular pathologies. Therefore, understanding the regulation of EC Kazuyuki Numata, Seiya Imoto, Satoru apoptosis, with the goal of intervening in this Miyano process, has become a current research focus. The protein-based signalling and cleavage cas- Estimation of gene networks based on mi- cades that regulate EC apoptosis are well croarray gene expression data is an important known. However, the possibility that pro- problem in systems biology. In this paper we grammed transcriptome and glycome changes use Bayesian networks as a mathematical model contribute to EC apoptosis has only recently for reverse-engineering gene networks from mi- been explored. Traditional bioinformatic tech- croarray data. In such a case, structural learning niques have allowed simultaneous study of of Bayesian networks is known as an NP-hard thousands of molecular signals during the proc- problem and we need to use heuristic algo- ess of EC apoptosis. However, to progress fur- rithms to find better network structures. Re- ther, we now need to understand the complex cently, several algorithms have been proposed cause and effect relationships among these sig- to estimate optimal Bayesian network structure, nals. In this article, we will first review current but the number of genes included in the net- knowledge about the function and regulation work is limited less than 30 or so. In order to of EC apoptosis including the roles of the pro- apply Bayesian network approach to drug target teome transcriptome and glycome. Then, we as- gene discovery, we need to consider gene net- sess the potential for further bioinformatic works with several hundreds of genes. There- analysis to advance our understanding of EC fore we need to develop more efficient algo- apoptosis, including the limitations of current rithms to learn Bayesian network structure technologies and the potential of emerging tech- based on observed data. In this paper we pro- nologies such as gene regulatory networks. pose an efficient structural learning algorithm for Bayesian networks by extending K2 algo- f. Weighted Lasso in graphical Gaussian rithm that is one of the standard learning algo- modeling for large gene network estima- rithms in Bayesian networks. We conduct Monte tion based on microarray data Carlo simulations to examine the effectiveness of the proposed algorithm by comparing with Teppei Shimamura , Seiya Imoto , Rui greedy hill-climbing algorithm. We also show Yamaguchi, Satoru Miyano the application of yeast gene network estimation based on the proposed algorithm. We propose a statistical method based on graphical Gaussian models for estimating large h. Clustering samples characterized by time gene networks from DNA microarray data. In course gene expression profiles using the estimating large gene networks, the number of mixture of state space models genes is larger than the number of samples, we 114

Osamu Hirose, Ryo Yoshida5, Rui Yamaguchi, Sumio Sugano, Tadashi Yamamoto, Satoru Mi- Seiya Imoto, Tomoyuki Higuchi5, Satoru yano Miyano: 5Institute of Statistical Mathematics Comprehensive description of the behavior of We propose a novel method to classify sam- cellular components in a quantitative manner is ples where each sample is characterized by a essential for systematic understanding of bio- time course gene expression profile. By exploit- logical events. Recent LC-MS/MS (tandem mass ing the mixture of state space model, the pro- spectrometry coupled with liquid chromatogra- posed method addresses the following tasks: (1) phy) technology, in combination with the SILAC clustering samples according to temporal pat- (Stable Isotope Labeling by Amino acids in Cell terns of gene expressions, (2) automatic detec- culture) method, has enabled us to make rela- tion of genes that discriminate identified clus- tive quantitation at the proteome level. The re- ters, (3) estimation of a restricted autoregressive cent report by Blagoev et al . (Nat. Biotechnol., coefficient for each cluster. We demonstrate the 22, 1139-1145, 2004) indicated that this method proposed method along with the cluster analysis was also applicable for the time-course analysis of 53 multiple sclerosis patients under recombi- of cellular signalling events. Relative quatitation nant interferon βtherapy with the longitudinal can easily be performed by calculating the ratio time course expression profiles. of peak intensities corresponding to differen- tially labeled peptides in the MS spectrum. As i. Identification of activated transcription fac- currently available software requires some GUI tors from microarray gene expression data applications and is time-consuming, it is not of Kampo medicine-treated mice suitable for processing large-scale proteome data. Rui Yamaguchi, Masahiro Yamamoto6,Seiya To resolve this difficulty, we developed an al- Imoto, Masao Nagasaki, Ryo Yoshida5,Kenji gorithm that automatically detects the peaks in Tsuiji6, Atsushi Ishige6, Hiroaki Asou7,Kenji each spectrum. Using this algorithm, we devel- Watanabe2, Satoru Miyano: 6Keio University oped a software tool named AYUMS that auto- School of Medicine, 7Tokyo Metropolitan Insti- matically identifies the peaks corresponding to tute of Gerontology differentially labeled peptides, compares these peaks, calculates each of the peak ratios in We propose an approach to identify activated mixed samples, and integrates them into one transcription factors from gene expression data data sheet. This software has enabled us to dra- using a statistical test. Applying the method, we matically save time for generation of the final can obtain a synoptic map of transcription factor report. activities which helps us to easily grasp the sys- AYUMS is a useful software tool for compre- tem’s behavior. As a real data analysis, we use a hensive quantitation of the proteome data gen- case-control experiment data of mice treated by erated by LC-MS/MS analysis. This software a drug of Kampo medicine remedying degraded was developed using Java and runs on Linux, myelin sheath of nerves in central nervous sys- Windows, and Mac OS X. Please contact ayums tem. Kampo medicine is Japanese traditional @ims.u-tokyo.ac.jp if you are interested in the herbal medicine. Since the drug is not a single application. The project web page is http:// chemical compound but extracts of multiple me- www.csml.org/ayums/ dicinal herb, the effector sites are possibly multi- ple. Thus it is hard to understand the action k. Modeling gene expression regulatory net- mechanism and the system’s behavior by inves- works with the sparse vector autoregres- tigating only few highly expressed individual sive model genes. Our method gives summary for the sys- tem’s behavior with various functional annota- Andre Fujita, Joao R Sato8, Humberto M. tions, e.g. TFAs and gene ontology, and thus of- Garay-Malpartida8, Rui Yamaguchi, Satoru fer clues to understand it in more holistic man- Miyano, Mari C. Sogayar8, Carlos E Ferreira8: ner. 8University of Sao~ Paulo j. AYUMS: an algorithm for completely auto- To understand the molecular mechanisms un- matic quantitation based on LC-MS/MS derlying important biological processes, a de- proteome data and its application to the tailed description of the gene products networks analysis of signal transduction involved is required. In order to define and un- derstand such molecular networks, some statisti- Ayumu Saito, Masao Nagasaki, Masaaki cal methods are proposed in the literature to es- Oyama, Hiroko Kozuka-Hata, Kentaro Semba, timate gene regulatory networks from time- 115 series microarray data. However, several prob- have been developed for this purpose. The most lems still need to be overcome. Firstly, informa- existing tools implement the same approach that tion flow need to be inferred, in addition to the exploits rank statistics of the genes which are correlation between genes. Secondly, we usually ordered by the strength of statistical evidences, try to identify large networks from a large num- e.g. p-values computed by testing hypotheses at ber of genes (parameters) originating from a the individual gene level. However, such an ap- smaller number of microarray experiments proach often causes the serious false discovery. (samples). Due to this situation, which is rather Particularly, one of the most crucial drawbacks frequent in Bioinformatics, it is difficult to per- is that the rank-based approaches wrongly judge form statistical tests using methods that model the ontology term as statistically significant al- large genegene networks. In addition, most of though all of the genes annotated by the ontol- the models are based on dimension reduction ogy term are irrelevant to the underlying cellu- using clustering techniques, therefore, the result- lar mechanisms. In this paper, we first point out ing network is not a gene-gene network but a some drawbacks of the rank-based approaches module-module network. Here, we present the from the statistical point of view, and then, pro- Sparse Vector Autoregressive model as a solu- pose a new testing procedure in order to over- tion to these problems. We have applied the come the drawbacks. The method that we pro- Sparse Vector Autoregressive model to estimate pose has the theoretical basis on the statistical gene regulatory networks based on gene expres- meta-analysis, and the hypothesis to be tested is sion profiles obtained from time-series microar- suitably stated for the problem of the ontologi- ray experiments. Through extensive simulations, cal analysis. We perform Monte Carlo experi- by applying the SVAR method to artificial regu- ments for highlighting the disadvantages of the latory networks, we show that SVAR can infer rank-based approach and the advantages of the true positive edges even under conditions in proposed method. Finally, we demonstrate the which the number of samples is smaller than applicability of the proposed method along with the number of genes. Moreover, it is possible to the ontological analysis of the gene expression control for false positives, a significant advan- data of human diabetes. tage when compared to other methods de- scribed in the literature, which are based on b. Computational discovery of aberrant ranks or score functions. By applying SVAR to splice variations with genome-wide exon actual HeLa cell cycle gene expression data, we expression profiles were able to identify well known transcription factor targets. The proposed SVAR method is Ryo Yoshida5, Kazuyuki Numata, Seiya Imoto, able to model gene regulatory networks in fre- Masao Nagasaki, Atsushi Doi3, Kazuko Ueno, quent situations in which the number of sam- Satoru Miyano ples is lower than the number of genes, making it possible to naturally infer partial Granger cau- Alternative splicing plays a prominent role in salities without any a priori information. In ad- eukaryotic gene regulations that allow a single dition, we present a statistical test to control the gene to generate the multiple mRNA products. false discovery rate, which was not previously The recent advent of GeneChip Human Exon 1.0 possible using other gene regulatory network ST Array enables us to measure the exon ex- models. pression profiles of human cells on a genome- wide scale. With this advent, analysis of func- 2. Statistical and Computational Knowledge tional gene regulation could be extended to de- Discovery tect not only differentially expressed genes, but also specific splicing events that occur in target a. Statistical absolute evaluation of gene on- cells, but not in normal controls. We address tology terms with gene expression datas some statistical issues for the identification of biomarker splice variations with exon expres- Pramod K. Gupta, Ryo Yoshida5, Seiya Imoto, sion data. The proposed method involves the Rui Yamaguchi, Satoru Miyano following steps: (1) Whole transcript analysis with the nonparametric analysis of variance We propose a new testing procedure for the (ANOVA) to identify potential biomarkers that automatic ontological analysis of gene expres- present specific splice variations. (2) Meta- sion data. The objective of the ontological analy- analysis for discriminating non-specific splice sis is to retrieve some functional annotations, e. variations that are caused by clinical heterogene- g. Gene Ontology terms, relevant to underlying ity in the collected samples. In the analysis of cellular mechanisms behind the gene expression human cells, controlling non-specific splicing profiles, and currently, a large number of tools factors is essential for success in the detection of 116 biomarker splice variations because splice pat- chine learning system BONSAI to predict pat- terns are possibly affected by inter-individual terns based on which positive and negative ex- differences in the collected samples. We demon- amples could be classified. Although BONSAI strate its utility and perform a whole transcript has helped establish 2 interesting rules regard- analysis of exon expression profiles of colorectal ing the requirements for N-myristoylation, the carcinoma. accuracy rates of these rules are not satisfactory. This paper suggests an enhancement of BONSAI c. Performance improvement in protein N- by introducing an “insignificant indexing sym- myristoyl classification by BONSAI with in- bol”and demonstrates the efficiency of this en- significant indexing symbol hancement by showing an improvement in the accuracy rates. We further examine the perform- Manabu Sugii1, Ryo Okada1, Hiroshi Matsuno1, ance of this enhanced BONSAI by comparing Satoru Miyano the results of classification obtained the pro- posed method and an existing public method Many N-myristoylated proteins play key roles for the same sets of positive and negative exam- in regulating cellular structure and function. In ples. the previous study, we have applied the ma-

Publications

1. Affara, M., Dunmore, B., Savoie, C.J., Imoto, with time course gene expression profiles S., Tamada, Y., Araki, H., Charnock-Jones, and the mixture of state space models. D.S., Miyano, S., Print, C. Understanding en- Genome Informatics. 18: 258-266, 2007. dothelial cell apoptosis: What can the tran- 8. Imoto, S. Knowledge discovery of causal re- scriptome glycome and proteome reveal? lations among genes from microarray gene Philosophical Transactions of Royal Society. expression data, Journal of Japan Statistical 362 (1484): 1469-1487, 2007. Society. 37 (1): 55-70, 2007. 2. Akutsu, T., Bannai, H., Miyano, S., Ott, S. 9. Imoto, S., Miyano, S. Bayesian network ap- On the complexity of deriving position spe- proach to estimate gene networks. A. Mittal, cific score matrices from positive and nega- A. Kassim and T. Tan (Eds.), Bayesian Net- tive sequences. Discrete Applied Mathemat- work Technologies : Applications and ics. 155: 676-685, 2007. Graphical Models, Idea Group Publishers, 3. Fujita, A., Sato, J.R., Garay-Malpartida, H. USA. 269-299, 2007. M., Sogayar, M.C., Ferreira, C.E., Miyano, S. 10. Imoto, S., Tamada, Y., Savoie, C.J., Miyano, Modeling nonlinear gene regulatory net- S., Analysis of gene networks for drug tar- works from time series gene expression get discovery and validation. Methods in data. J. Bioinformatics and Computational Molecular Biology. 360: 33-56, 2007. Biology. In press. 11. Jeong, E., Nagasaki, M., Miyano, S. Conver- 4. Fujita, A., Sato, J.R., Garay-Malpartida, H. sion from BioPAX to CSO for system dy- M., Yamaguchi, R., Miyano, S., Sogayar, M. namics and visualization of biological path- C., Ferreira, C.E. Modeling gene expression way. Genome Informatics. 18: 225-236, 2007. regulatory networks with the sparse vector 12. Jeong, E., Nagasai, M., Saito, A., Miyano, S. autoregressive model. BMC Systems Biology Cell System Ontology: Representation for 2007, 1: 39 (doi:10.1186/1752-0509-1-39) modeling, visualizing, and simulating bio- 5. Gupta, P.K., Yoshida, R., Imoto, S., Yama- logical pathways. In Silico Biology 7, 0055, guchi, R., Miyano, S. Statistical absolute 2007. evaluation of gene ontology terms with gene 13. Kojima, K., Nagasaki, M., Jeong, E., Kato, expression data. Lecture Notes in Bioinfor- M., Miyano, S. An efficient grid layout algo- matics. 4463: 146-157, 2007. rithm for biological networks utilizing vari- 6. Hirose, O., Yoshida, R., Imoto, S., Yama- ous biological attributes. BMC Bioinformat- guchi, R., Higuchi, T., Charnock-Jones, D.S., ics. 8: 76, 2007. Print, C., Miyano, S. Statistical inference of 14. Li, C., Ge, Q.-W., Nakata, M., Matsuno, H., transcriptional module-based gene networks Miyano, S. Modeling and simulation of sig- from time course gene expression profiles by nal transductions in an apoptosis pathway using state space models. Bioinformatics. In by using timed Petri nets. J. Biosciences. 32 press. (1): 113-125, 2007. 7. Hirose, O., Yoshida, R., Yamaguchi, R., 15. Miyano, S., DeLisi, C., Holzhütter, H.-G., Imoto, S., Higuchi, T., Miyano, S. Clustering Kanehisa,M.(Eds.).GenomeInformatics.18, 117

2007. S.,Washio,T.,Higuchi,T.,DIGDAG,afirst 16. Numata, K., Imoto, S., Miyano, S.A structure algorithm to mine closed frequent embed- learning algorithm for inference of gene net- ded sub-DAGs. Proc. 5th International works from microarray gene expression data Workshop on Mining and Learning with using Bayesian networks. Proc. IEEE 7th In- Graphs, in press, 2007. ternational Symposium on Bioinformatics & 21. Yamaguchi, R., Yamamoto, M., Imoto, S., Bioengineering. 1280-1284, 2007. (BIBE2007: Nagasaki, M., Yoshida, R., Tsujii, K., Ishiga, Refereed conference; Digital Object Identifier A., Asou, H., Watanabe, K., Miyano, S. Iden- 10.1109/BIBE.2007.4375731) tification of activated transcription factors 17. Saito, A., Nagasaki, M., Oyama, M., Kozuka- from microarray gene expression data of Hata, H., Semba, K., Sugano, S., Yamamoto, Kampo-medicine treated mice. Genome In- T., Miyano, S. AYUMS: an algorithm for formatics. 18: 119-129, 2007. completely automatic quantitation based on 22. Yamaguchi, R., Yoshida, R., Imoto, S., LC-MS/MS proteome data and its applica- Higuchi, T., Miyano, S. Finding module- tion to the analysis of signal transduction. based gene networks with state-space mod- BMC Bioinformatics. 8: 15, 2007. els? Mining high-dimensional and short 18. Shimamura, T., Yamaguchi, R., Imoto, S., time-course gene expression data. IEEE Sig- Miyano, S. Weighted lasso in graphical nal Processing Magazine, 24 (1): 37-46, 2007. Gaussian modeling for large gene network 23. Yoshida, R., Numata, K., Imoto, S., Na- estimation based on microarray data. gasaki, M., Doi, A., Ueno, K., Miyano, S. Genome Informatics. 19: 142-153, 2007. Computational discovery of aberrant splice 19. Sugii, M., Okada, R., Matsuno, H., Miyano, variations with genome-wide exon expres- S. Performance improvement in protein N- sion profiles. Proc. IEEE 7th International myristoyl classification by BONSAI with in- Symposium on Bioinformatics & Bioengi- significant indexing symbol. Genome Infor- neering. 715-722, 2007. (IEEE BIBE2007: matics. 18: 277-286, 2007. Refereed conference; Digital Object Identifier 20. Termier, A., Tamada, Y., Numata, K., Imoto, 10.1109/BIBE.2007.4375639) 118

Human Genome Center Laboratory of Molecular Medicine Laboratory of Genome Technology Division of Advanced Clinical Proteomics ゲノムシークエンス解析分野 シークエンス技術開発分野 先端臨床プロテオミクス共同研究ユニット

Professor Yusuke Nakamura, M.D., Ph.D. 教 授 医学博士 中村祐輔 Associate Professor Toyomasa Katagiri, Ph.D. 准教授 医学博士 片桐豊雅 Hidewaki Nakagawa, M.D., Ph.D. 准教授 医学博士 中川英刀 Assistant Professor Ryuji Hamamoto, Ph.D. 助 教 理学博士 浜本隆二 Koichi Matsuda, M.D., Ph.D. 助 教 医学博士 松田浩一 Hitoshi Zembutsu, M.D., Ph.D. 助 教 医学博士 前 佛 均 Project Associate Professor Yataro Daigo, M.D., Ph.D. 特任准教授 医学博士 醍 醐 弥太郎

The major goal of our group is to identify genes of medical importance, and to de- velop new diagnostic and therapeutic tools. We have been attempting to isolate genes involving in carcinogenesis and also those causing or predisposing to vari- ous diseases as well as those related to drug efficacies and adverse reactions. By means of technologies developed through the genome project including a high- resolution SNP map, a large-scale DNA sequencing, and the cDNA microarray method, we have isolated a number of biologically and/or medically important genes, and are developing novel diagnostic and therapeutic tools.

1. Genes playing significant roles in human date, Koji Ueda, Nguyen Minh-Hue, Yuria cancer Mano, Masaya Taniwaki, Ryohei Nishino, Daizaburo Hirata, Takumi Yamabuki, Nagato Toyomasa Katagiri, Yataro Daigo, Hidewaki Sato, Junkichi Koinuma, Banu Turel, Kenji Nakagawa, Hitoshi Zembutsu, Koichi Matsuda, Tamura, Masayo Hosokawa, Su-Youn Chung, Ryo Takata, Mitsugu Kanehira, Koji Taka- Motohide Uemura, Akio Takehara, Arata hashi, Atsushi Takano, Nobuhisa Ishikawa, Shimo, Fabio Pittella Silva and Yusuke Naka- Tatsuya Kato, Satoshi Hayama, Chie Suzuki, mura Akira Togashi, Kazuhito Morioka, Tomohide Kidokoro, Chizu Tanikawa, Asahi Hishida, (1) Lung cancer Miki Akiyama, Sachiko Dobashi, Meng-Lay Lin, Jae-Hyun Park, Tomomi Ueki, Yosuke MAPJD (Myc-associated protein with JmjC Harada, Chikako Fukukawa, Toshihiko Nishi- domain) 119

Through genome-wide expression profile coding HJURP (Holliday Junction-Recognizing analysis for non-small cell lung carcinomas Protein) whose activation appeared to play a (NSCLCs), we found over-expression of a pivotal role in immortality of cancer cells. MAPJD (Myc-associated protein with JmjC do- HJURP was shown to be a possible downstream main) gene in the great majority of NSCLC target of ATM signaling and increased by DNA cases. Induction of exogenous expression of double-strand breaks (DSBs). HJURP is involved MAPJD into NIH3T3 cells conferred growth- in the homologous recombination (HR) pathway promoting activity. Concordantly, in vitro sup- in the DSBs repair process, interacting with pression of MAPJD expression with siRNA ef- hMSH5, or a member of a MRN protein com- fectively suppressed growth of NSCLC cells in plex NBS1. HJURP formed nuclear foci in cells which MAPJD was over-expressed. We found at S-phase or in those after having DNA dam- four candidate MAPJD-target genes, SBNO1, age. In vitro assays implied that HJURP directly TGFBRAP1, RIOK1,andRASGEF1A,which bound to Holliday Junction and rDNA array. were the most significantly induced by exoge- Treatment of cancer cells with siRNA against nous MAPJD expression. Through interaction HJURP caused abnormal chromosomal fusions, with MYC protein, MAPJD transactivates a set and led to genomic instability and senescence. of genes including kinases and cell signal trans- In addition, HJURP over-expression was ob- ducers that are possibly related to proliferation served in the majority of lung cancers and asso- of lung cancer cells. As our data imply that ciated with poorer prognosis. We suggest that MAPJD is a novel member of the MYC tran- HJURP is an indispensable factor for chromoso- scriptional complex and its activation is a com- mal stability in immortalized cancer cells and mon feature of lung-cancer, selective suppres- could be a novel therapeutic target for develop- sion of this pathway could be a promising ment of anti-cancer drugs. therapeutic target for treatment of lung cancers. FGFR1OP (fibroblast growth factor receptor 1 CDCA8 (cell-division-associated 8) oncogene partner)

We found in the great majority of lung-cancer We found an elevated expression of FGFR1OP samples co-transactivation of cell-division- (fibroblast growth factor receptor 1 oncogene associated 8 (CDCA8) and aurora kinase B partner) in the great majority of lung cancers. (AURKB ) that were considered to be compo- Immunohistochemical staining using tumor tis- nents of the vertebrate chromosomal passenger sue microarrays consisting of 372 archived non- complex. Immunohistochemical analysis of lung- small cell lung cancer (NSCLC) specimens re- cancer tissue microarrays demonstrated that vealed positive staining of FGFR1OP in 334 over-expression of CDCA8 and AURKB was sig- (89.8%) of 372 NSCLCs. We also found that the nificantly associated with poor prognosis of high level of FGFR1OP expression was signifi- lung-cancer patients. AURKB directly phospho- cantly associated with shorter tumor-specific rylated CDCA8 at Ser-154, Ser-219, Ser-275, and survival times (P <0.0001 by log-rank test). Thr-278, and appeared to stabilize CDCA8 pro- Moreover multivariate analysis determined that tein in cancer cells. Suppression of CDCA8 ex- FGFR1OP was an independent prognostic factor pression with siRNA against CDCA8 signifi- for surgically-treated NSCLC patients (P < cantly suppressed the growth of lung cancer 0.0001). Treatment of lung cancer cells, in which cells. In addition, functional inhibition of inter- endogenous FGFR1OP was over-expressed, us- action between CDCA8 and AURKB by a cell- ing FGFR1OP siRNA, suppressed its expression permeable peptide corresponding to 20 amino- and resulted in inhibition of the cell growth. acid sequence of a part of CDCA8 (11R-CDCA Furthermore, induction of FGFR1OP increased

8261-280), which included two phosphorylation the cellular motility, invasion, and growth- sites by AURKB, significantly reduced phospho- promoting activity of mammalian cells. To in- rylation of CDCA8 and resulted in growth sup- vestigate its function, we searched for FGFR1 pression of lung cancer cells. Our data imply OP-interacting proteins in lung cancer cells and that selective suppression of the CDCA8- identified the ABL1 (Abelson murine leukemia AURKB pathway could be a promising thera- viral oncogene homolog 1) and WRNIP1 peutic strategy for treatment of lung cancer pa- (Werner helicase interacting protein 1), which tients. was known to be involved in cell cycle progres- sion. ABL1 appeared to suppress DNA synthesis HJURP (Holliday Junction-Recognizing Pro- by directly phosphorylating tyrosine resides of tein) WRNIP1, while FGFR1OP significantly reduced ABL1-dependent phosphorylation of WRNIP1 We additionally identified a novel gene en- and resulted in the promotion of DNA synthesis 120

of cancer cells. Since our data imply that FGFR1 of WT-MELK suppressed Bcl-GL-induced apop- OP is likely to play a significant role in lung tosis, while that of D150A-MELK did not. Our cancer growth and progression, FGFR1OP findings suggest that kinase activity of MELK is should be useful as a prognostic biomarker and likely to be involved in mammary carcinogene- probably as a therapeutic target for lung cancer. sis through inhibition of pro-apoptotic function of Bcl-GL. The kinase activity of MELK should (2) Breast Cancer be a promising target for development of molecular-targeting therapy for patients with MELK (maternal embryonic leucine-zipper breast cancers. kinase) KIF2C/MCAK (Kinesin family member 2C/Mi- Cancer therapy directing at specific molecular totic centromere-associated kinesin) targets in signaling pathways of cancer cells such as Tamoxifen, aromatase inhibitors and We also demonstrated functional significance trastuzumab has been proven its usefulness for of KIF2C/MCAK (Kinesin family member 2C/ treatment of advanced breast cancers. However, Mitotic centromere-associated kinesin) in growth increases of the risk of endometrial cancer by of breast cancer cells. Northern-blot and immu- long-term tamoxifen administration as well as nohistochemical analyses confirmed KIF2C/ those of bone fracture due to osteoporosis in MCAK overexpression in breast cancer cells, postmenopausal women with aromatase inhibi- and indicated its undetectable level of expres- tor prescription are recognized as their side ef- sion in normal human tissues except testis, sug- fects. Due to the emergence of these side effects gesting KIF2C/MCAK to be a cancer-testis anti- and also drug resistance, it is necessary to gen. Western-blot analysis using breast cancer search novel targets for molecularly-orientated cell-lines revealed a significant increase of endo- drugs on the basis of well-characterized mecha- genous KIF2C/MCAK protein level and its nisms of action. Using the accurate genome- phosphorylation in G2/M phase. Treatment of wide expression profiles of breast cancers, we breast cancer cells with small-interfering RNAs found maternal embryonic leucine-zipper kinase (siRNAs) against KIF2C/MCAK effectively sup- (MELK ) that was significantly overexpressed in pressed KIF2C/MCAK expression and inhibited the great majority of breast cancer cells. To as- the growth of breast cancer cell-lines, T47D and sess a possible role of MELK in mammary car- HBC5 cells. We also demonstrated that KIF2C/ cinogenesis, we knocked down the expression of MCAK expression was significantly suppressed endogenous MELK in breast cancer cell-lines by by ectopic introduction of p53. These findings means of the mammalian vector-based RNA in- suggest that overexpression of KIF2C/MCAK terference (RNAi) technique. Furthermore, we might be involved in breast carcinogenesis and identified a long isoform of Bcl-G (Bcl-GL), a is a promising therapeutic target for breast can- pro-apoptotic member of Bcl-2 family, as a pos- cers. sible substrate(s) for the MELK kinase by pull- down assay with wild-type- and kinase-dead- (3) Bladder cancer MELK recombinant proteins. Finally, we per- formed TUNEL assay and FACS analysis to MPHOSPH1 (M-phase phosphoprotein 1) measure the proportions of sub-G1 population to investigate MELK is involved in apoptosis To disclose molecular mechanism of bladder cascade through the Bcl-GL-related pathway. The cancer, the second most common genitourinary multiple human tissues- and cancer cell lines- tumor, we had previously performed genome- northern blot analyses demonstrated that MELK wide expression profile analysis of 26 bladder was overexpressed at a significantly high level cancers by means of cDNA microarray repre- in a great majority of breast cancers and cancer senting 27,648 genes. Among genes that were cell lines, but not expressed in normal vital or- significantly up-regulated in the majority of gans (heart, liver, lung, hidney). Suppression of bladder cancers, we here report identification of MELK expression with small-interfering RNA MPHOSPH1 (M-phase phosphoprotein 1)asacan- significantly inhibited growth of human breast didate molecule for drug development for blad- cancer cells. We also found that MELK protein der cancer. Northern blot analyses using physically interacted with Bcl-GL protein, a pro- mRNAs of normal human organs and cancer apoptotic member of the Bcl-2 family through cell-lines indicated this molecule to be a novel its N-terminal region. Subsequent immunocom- cancer / testis antigen . Introduction of plex kinase assay showed Bcl-GL was specifically MPHOSPH1 into NIH3T3 cells significantly en- phosphorylated by MELK in vitro. TUNEL assay hanced cell growth at in vitro and in vivo condi- and FACS analysis revealed that overexpression tions. We subsequently found an interaction be- 121 tween MPHOSPH1 and PRC1 (Protein Regula- genome-wide cDNA microarray combined with tor of Cytokinesis 1), which was also up- laser microdissection. Among dozens of trans- regulated in bladder cancer cells. Immunocyto- activated genes in pancreatic cancer cells, this chemical analysis revealed co-localization of en- study focused on KIAA0101 whose overexpres- dogenous MPHOSPH1 and PRC1 proteins in sion in pancreatic cancer cells was validated by bladder cancer cells. Interestingly, knockdown immunohitochemical analysis. KIAA0101 was of either of MPHOSPH1 or PRC1 expression previously identified as p15PAF (PCNA-associated with specific siRNAs caused significant increase factor) to bind with PCNA (proliferating cell nu- of multi-nuclear cells and subsequent cell death clear antigen), however its function remains un- of bladder cancer cells. Our results imply that known. To investigate for the biological signifi- the MPHOSPH1/PRC1 complex is likely to play cance of KIAA0101 overexpression in cancer a crucial role in bladder carcinogenesis and that cells, we knocked down KIAA0101 by siRNA in inhibition of the MPHOSPH1/PRC1 expression pancreatic cancer cells, and found that the re- or their interaction should be novel therapeutic duced expression by siRNA caused drastic at- targets for bladder cancers. tenuation of their proliferation as well as signifi- cant decrease in DNA replication rate. Concor- DEPDC1 (DEP domain containing 1) dantly, exogenous over-expression of KIAA0101 enhanced cancer cell growth and NIH3T3- We also found a novel gene, DEPDC1 (DEP derivative cells expressing KIAA0101 revealed domain containing 1), whose over-expression was in vivo tumor formation, implying its growth confirmed in bladder cancers by northern-blot promoting and oncogenic property. We also and immunohistochemical analyses. Immunocy- demonstrated that the expression of KIAA0101 tochemical staining analysis detected strong was regulated tightly by p53-p21 pathway. To staining of endogenous DEPDC1 protein in the consider the KIAA0101/PCNA interaction as a nucleus of bladder cancer cells. Since DEPDC1 therapeutic target, we designed the cell- expression was hardly detectable in any of 24 permeable 20-amino-acid dominant-negative normal human tissues we examined except the peptide and found that it could effectively in- testis, we considered this gene-product to be a hibit the KIAA0101/PCNA interaction and re- novel cancer/testis antigen. Suppression of sulted in the significant growth suppression of DEPDC1 expression with small-interfering RNA cancer cells. Our results clearly implicated that (siRNA) significantly inhibited growth of blad- suppression of the KIAA0101 and PCNA onco- der cancer cells. Taken together, these findings genic activity, or the inhibition of KIAA0101/ suggest that DEPDC1 might play an essential PCNA interaction is likely to be a promising role in the growth of bladder cancer cells, and strategy to develop novel cancer therapeutic would be a promising molecular-target for novel drugs. therapeutic drugs or cancer peptide-vaccine to bladder cancers. GABRP (GABA receptor π subunit)

(4) Pancreatic cancer GABA (γ-amino-butyric acid) functions pri- marily as an inhibitory neurotransmitter in the KIAA0101 mature central nervous system, and GABA/ GABA receptors are also present in non-neural Pancreatic ductal adenocarcinoma (PDAC) tissues, including cancer, but their precise func- shows the worst mortality among the common tion in non-neuronal or cancerous cells is poorly malignancies, with a 5-year survival rate of only defined so far. We identified over-expression of 4%, and the majority of PDAC patients are di- GABA receptor π subunit (GABRP) in pancreatic agnosed at an advanced stage, in which no ef- ductal adenocarcinoma (PDAC) cells. We also fective therapy is available at present. Although found the expression of this peripheral-type the proportion of curable cases is still not so GABAA receptor subunit, GABRP, in some adult high, surgical resection of early-staged PDACs is human organs. Knockdown of endogenous the only way to cure the disease. Hence, estab- GABRP expression in PDAC cells by siRNA at- lishment of a screening strategy to detect early- tenuated PDAC cell growth, suggesting its es- staged PDACs by novel serological markers is sential role in PDAC cell viability. Notably, ad- urgently required, and development of novel dition of GABA into the cell-culture medium molecular therapies for PDAC treatment is also promoted the proliferation of GABRP-expressing eagerly expected. To isolate novel diagnostic PDAC cells, but not GABRP-negative cells, and markers and therapeutic targets for pancreatic GABAA receptor antagonists inhibited this cancer, we earlier performed expression profile growth-promoting effect by GABA. The HEK293 analysis of pancreatic cancer cells using a cells constitutively expressing exogenous 122

GABRP revealed the growth-promoting effect by encoding a putative 5α-steroid reductase that GABA treatment. Furthermore, GABA treatment produces the most potent androgen DHT (5α- in GABRP-positive cells increased intracellular dihydrotestosterone) from testosterone. LC-MS/ Ca2+ level and activated MAPK (mitogen- MS analysis following in vitro 5α-steroid reduc- activated protein kinase) cascade. Clinical PDAC tase reaction validated its ability to produce tissues contained a higher level of GABA ligand DHT from testosterone as similar to type-1 5α- than normal pancreas tissues due to the up- steroid reductase. Since two types of 5α-steroid regulation of GAD1 (glutamate decarboxylase 1) reductases were previously reported, we termed expression, suggesting their autocrine/paracrine this novel 5α-steroid reductase to be SRD5A3 growth-promoting effect in PDACs. These find- (type-3 5α-steroid reductase). RT-PCR and ings imply that GABA ligand and GABRP could northern blot analyses confirmed its over- play important roles in PDAC development and expression in HRPC cells, and indicated no or progression, and this pathway can be a promis- little expression in normal adult organs. Knock- ing molecular target for development of new down of SRD5A3 expression by siRNA in pros- therapeutic strategies for PDAC. tate cancer cells resulted in significant decrease of DHT production and drastic reduction of (4) Prostate cancer their cell viability. These findings implicate that anovel5α-steroid reductase, SRD5A3, is associ- One of the most critical issues in prostate can- ated with DHT production and maintenance of cer clinic is emerging hormone-refractory pros- the AR pathway activation in HRPC cells, and tate cancers (HRPCs) and their management. that this enzymatic activity should be a promis- Prostate cancer is usually androgen-dependent ing molecular target for prostate cancer therapy. and responds well to androgen ablation therapy. However, at a certain stage they eventually ac- (5) Chemosensitivity quire androgen-independent phenotype and show very poor response to any anti-cancer Bladder Cancer therapies that are presently available. To charac- terize the molecular features of clinical HRPCs, To predict the efficacy of the M-VAC neoadju- we analyzed gene-expression profiles of 25 clini- vant chemotherapy for invasive bladder cancers, cal HRPCs and 10 hormone-sensitive prostate we previously established the method to calcu- cancers (HSPCs) by genome-wide cDNA mi- late the prediction score on the basis of expres- croarrays combining with laser microbeam mi- sion profiles of 14 predictive genes. This scoring crodissection. An unsupervised clustering analy- system had clearly distinguished the responder sis clearly distinguished expression patterns of group from the non-responder group. To further HRPC cells from those of HSPC cells. In addi- validate the clinical significance of the system, tion, primary and metastatic HRPCs from indi- we applied it to 22 additional cases of bladder vidual patients were closely clustered regardless cancer patients and found that the scoring sys- to metastatic organs. A supervised clustering tem correctly predicted clinical response for 19 analysis identified 36 up-regulated genes and 70 of the 22 test cases. The group of patients with down-regulated genes in HRPCs, compared positive predictive scores had significantly with HSPCs (P <0.0001, gap>1.5). We observed longer survival than that with negative scores. over-expression of AR, ANLN ,andSNRPE ,and When we compared our results with the previ- down-regulation of NR4A1, CYP27A1,and ous report describing the prognosis of the pa- HLA-A antigeninHRPCprogression.Such tients with cystectomy alone, the results imply genes were considered to be related to the that patients with positive scores are likely to androgen-independent and more aggressive have benefit by having M-VAC neoadjuvant phenotype of HRPCs, and in fact, knockdown of chemotherapy, but that the chemotherapy some over-expressing genes by siRNA resulted would shorten lives of patients with negative in drastic attenuation of prostate cancer cell scores. We are confident that our prediction sys- growth. Our precise microarray analysis of tem to M-VAC therapy should provide opportu- HRPC cells should provide useful information nities for achieving better prognosis and im- to understand the molecular mechanism of proving quality of life of patients. HRPC progression as well as to identify molecu- lar targets for development of treatment of Philadelphia -positive acute lym- HRPCs. phoblastic leukemia (Ph+ALL)

SRD5A3 Philadelphia chromosome-positive acute lym- phoblastic leukemia (Ph+ALL) reveals very We then identified one novel gene, SRD5A2L, poor prognosis due to high incidence of relapse 123 when treated with standard chemotherapy. Al- els were significantly higher in lung and though more than 96 % of patients with Ph+ esophageal cancer patients than in healthy con- ALL achieved complete remission (CR) with trols. The proportion of the DKK1-positive cases imatinib-combined chemotherapy in a phase II was 126 (70.0%) of 180 NSCLC, 59 (69.4%)of85 clinical trial conducted by the Japan Adult Leu- SCLC, and 51 (63.0%)of81ESCCpatients, kemia Study Group (JALSG), 26 % of them ex- while only 10 (4.8%) of 207 healthy volunteers perienced hematological relapse in a short time were falsely diagnosed as positive. A combined after achievement of CR. In this study, to estab- ELISA assays for both DKK1 and CEA increased lish a prediction system for risk of relapse, we sensitivity, and classified 82.2% of the NSCLC analyzed gene expression profiles of 23 bone patients as positive while only 7.7% of healthy marrow samples from patients with Ph+ALL volunteers were falsely diagnosed to be positive. using cDNA microarray consisting of 27,648 The use of both DKK1 and ProGRP increased cDNA sequences. Using the 19 randomly- sensitivity to detect SCLCs up to 89.4%, while selected test cases, we identified 16 genes that false positive rate in healthy donors were only were expressed significantly differently between 6.3%. Our data imply that DKK1 should be use- patients with (n=8) and without (n=11) con- ful as a novel diagnostic/prognostic biomarker tinuous response; from the list of 16 genes, we in clinic and probably as a therapeutic target for selected the 6 “predictive” genes and con- lung and esophageal cancer. structed a numerical scoring system by which the two groups were clearly separated, with KIF4A positive scores for the former and the negative scores for the latter. Scoring of 4 cases that were To identify molecules that might be useful as reserved from the original 23 cases predicted diagnostic/prognostic biomarkers and as targets correctly their clinical responses. In addition, for the development of new molecular therapies, three cases whose BCR-Abl transcript levels we screened genes that were highly transacti- failed to reduce sufficiently after induction ther- vated in a large proportion of 101 lung cancers apy, also revealed negative scores. We also de- by means of a cDNA microarray representing veloped a quantitative reverse transcription-PCR 27,648 genes. We found a gene encoding KIF4A, -based prediction system that could be feasible a kinesin family member 4A, as one of such can- for routine clinical use. Our results suggest that didates. Tumor-tissue microarray was applied to achievement of continuous response with examine the expression of KIF4A protein and its imatinib-combined chemotherapy can be pre- clinicopathological significance in archival non- dicted by expression patterns in this set of small cell lung cancer (NSCLC) samples from genes, leading to achievement of “personalized 357 patients. A role of KIF4A in cancer cell therapy” for treatment of this disease. growth and/or survival was examined by small interfering RNA (siRNA) experiments. Cellular (6) Biomarker invasive activity of KIF4A on mammalian cells was examined using Matrigel assays. Immuno- DKK1 (Dickkopf-1) histochemical staining detected positive KIF4A staining in 127 (36%) of 357 NSCLCs and 19 (66 Gene-expression profile analysis of lung and %) of 29 SCLCs examined. Positive im- esophageal carcinomas revealed that Dickkopf-1 munostaining of KIF4A protein was associated (DKK1) was highly transactivated in the great with male gender ( P = 0.0287 ) , non- majority of lung cancers and esophageal adenocarcinoma histology (P =0.0097), and squamous-cell carcinomas (ESCCs). Immunohis- shorter survival for patients with NSCLC (P = tochemical staining using tumor tissue microar- 0.0005), and multivariate analysis confirmed its rays consisting of 279 archived non-small cell independent prognostic value (P =0.0012). lung cancers (NSCLCs) and 280 ESCC speci- Treatment of lung cancer cells with siRNAs for mens demonstrated that a high level of DKK1 KIF4A suppressed growth of the cancer cells. expression was associated with poor prognosis Furthermore, we found that induction of exoge- of patients with NSCLC as well as ESCC, and nous expression of KIF4A conferred cellular in- multivariate analysis confirmed its independent vasive activity on mammalian cells. These data prognostic value for NSCLC. In addition, we strongly implied that targeting the KIF4A mole- identified that exogenous expression of DKK1 cule might hold a promise for the development increased the migratory activity of mammalian of anti-cancer drugs and cancer vaccines as well cells, suggesting that DKK1 may play a signifi- as a prognostic biomarker in clinic. cant role in progression of human cancer. We established an ELISA system to measure serum levels of DKK1 and found that serum DKK1 lev- 124

LY6K (lymphocyte antigen 6 complex, locus K) nitrobenzensulfenyl) stable isotope labeling fol- lowed by MALDI-QIT-TOF mass spectrometric We revealed that a member of low molecular analysis. Through this systematic analysis for weight GPI-anchored molecule-like protein, lym- five serum samples derived from patients with phocyte antigen 6 complex, locus K (LY6K) was lung adenocarcinoma, we identified as candi- transactivated in the great majority of NSCLCs date biomarkers 34 serum glycoproteins that re- and ESCCs. Northern-blot analysis detected ex- vealed significant difference in α1,6-fucosylation pression of LY6K gene only in testis among nor- level between lung cancer and healthy control, mal tissues examined. Immunohistochemical clearly demonstrating that the carbohydrate- staining using tumor tissue microarrays consist- focused proteomics could allow for the detection ing of 413 archived NSCLC and 271 ESCC speci- of serum components with cancer-specific fea- mens confirmed that a high level of LY6K ex- tures. In addition, we developed more simpli- pression was associated with poor prognosis of fied and practical technique, mass spectrometry- patients with NSCLC as well as ESCC, and mul- based glycan structure analysis and lectin blot- tivariate analysis confirmed its independent ting, in order to validate glycan structure of can- prognostic value for NSCLC. In addition, treat- didate biomarkers that could be applicable in ment of NSCLC cells with small interfering clinical use. Our new glycoproteomic strategy RNAs against LY6K knocked-down its expres- will provide highly-sensitive and quantitative sion and resulted in growth suppression of the profiling of specific glycan structures on multi- cancer cells. Serum levels of LY6K were signifi- ple proteins, which should be useful for serum cantly higher in NSCLC and ESCC patients than biomarker discovery. in healthy controls. The proportion of the serum LY6K-positive cases defined by our criteria was 2. Common diseases 38 (33.9%) of 112 NSCLC and 26 (32.1%)of81 ESCC, while only 3 (4.1%) of 74 healthy volun- (1) Cerebral infarction teers were falsely diagnosed as positive. A com- bined assay using both LY6K and CEA in- Michiaki Kubo1,2,4,JunHata1,2,4, Toshiharu Ni- creased sensitivity, and judged 64.7% of the nomiya1,2, Koichi Matsuda4, Koji Yonemoto1, lung adenocarcinoma patients as positive while Toshiaki Nakano2,3, Tomonaga Matsushita2,4, 9.5% of healthy volunteers were falsely diag- Keiko Yamanaka-Yamazaki4, Yozo Ohnishi5, nosed to be positive. The use of both LY6K and Susumu Saito5, Takanori Kitazono2, Setsuro CYFRA 21-1 increased sensitivity to detect lung Ibayashi2, Katsuo Sueishi3 , Mitsuo Iida2 , squamous-cell carcinomas up to 70.4%, while Yusuke Nakamura4, and Yutaka Kiyohara1.: false positive rate in healthy donors were only 1Department of Environmental Medicine, 2De- 6.8%. Our data imply LY6K to be a cancer-testis partment of Medicine and Clinical Science, antigen and suggest that LY6K should be useful 3Pathophysiological and Experimental Pathol- as a diagnostic/prognostic biomarker and prob- ogy, Graduate School of Medical Sciences, Ky- ably as a therapeutic target for development of ushu University, Fukuoka 812-8582, Japan. new molecular-targeted agents and/or immuno- 4Laboratory of Molecular Medicine, Human therapy of lung and esophageal cancers. Genome Center, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan. Proteomics 5Laboratory for Genotyping, SNP Research Center, the Institute of Physical and Chemical The recent progress in various proteomic tech- Research (RIKEN), Yokohama 230-0045, Ja- nologies allows us to screen serum biomarker pan. including carbohydrate antigens. However, only a limited number of proteins could be detected PRKCH by current conventional methods such as shotgun-proteomics, primarily because of the Cerebral infarction is the most common type enormous concentration distribution of serum of stroke and often causes long-term disability. proteins and peptides. To circumvent this diffi- To investigate the genetic contribution to cere- culty and isolate potential cancer-specific bral infarction, we conducted a case-control biomarkers for diagnosis and treatment, we es- study using 52,608 gene-based tag-SNPs selected tablished a new screening system consisting of from JSNP database. Here we reported that one the sequential steps of (1) immuno-depletion of non-synonymous SNP in a member of protein 6 high abundant proteins, (2) targeted enrich- kinase C (PKC) family, PRKCH , was signifi- ment of glycoproteins by lectin column chroma- cantly associated with lacunar infarction in two tography, and (3) the quantitative proteome independent Japanese samples (p=3.2×10-7, 12 13 analysis using C6- or C6-NBS ( 2- crude odds ratio of 1.39). This SNP was likely to 125 affect the PKC activity. Furthermore, a 14-year pan. 3Department of Pediatrics, University of follow-up cohort study in Hisayama (Fukuoka, California San Diego, School of Medicine, La Japan) supported involvement of this SNP for Jolla, CA, and Rady Children’s Hospital San the development of cerebral infarction (p=0.03, Diego, CA, USA. 3Department of Cardiology, age- and sex-adjusted hazard ratio of 2.83). We Boston Children’s Hospital, Boston, MA, USA. also found that PKCη was mainly expressed in 4Department of Public Health, Jichi Medical vascular endothelial cells and foamy macro- School, Minamikawachi, Tochigi, Japan. 4Sai- phages in human atherosclerotic lesions, and its tama Prefectural University, Koshigaya, Sai- expression was enhanced as the lesion type pro- tama, Japan. 5Department of Preventive Medi- gressed. Our results support a role for PRKCH cine, Shinshu University School of Medicine, in the pathogenesis of cerebral infarction. Matsumoto, Japan. 6Department of Molecular Biology, Kawasaki Medical School, Kurashiki, AGTRL1 (angiotensin receptor like-1) Okayama, Japan. 7Department of Pediatrics, Fukuoka University School of Medicine, Comparison of allele frequencies between Fukuoka, Fukuoka, Japan. 8Department of Pe- 1,112 cases with brain infarction and age- and diatrics, Tokyo Women’s Medical University sex-matched control subjects of the same num- Yachiyo Medical Center, Yachiyo, Chiba, Ja- ber found a SNP in the 5’-flanking region of an- pan. 9Department of Pediatrics, Fuji Heavy In- giotensin receptor like-1 (AGTRL1)gene(rs dustries LTD. Health Insurance Society Gen- 9943582, -154G/A) to have a significant associa- eral Ohta Hospital, Ohta, Gunma, Japan. 10De- tion with brain infarction (Odds ratio=1.30, 95 partment of Pediatrics, Kawasaki Medical % confidence interval (CI)=1.14-1.47, P = School, Kurashiki, Okayama, Japan. 11Depart- 0.000066). We also found the binding of Sp1 ment of Pediatrics, Toho University School of transcription factor to the region including the Medicine, Tokyo, Japan. 12Department of Pedi- susceptible G allele, but not the non-susceptible atrics, Yokohama Minami Kyousai Hospital, A allele. Luciferase assay and RT-PCR analysis Yokohama, Kanagawa, Japan. 15Department of demonstrated that exogenously-introduced Sp1 Pediatrics, National Hospital Organization induced transcription of AGTRL1 and its ligand, Yokohama Medical Center, Yokohama, Kana- apelin as well, indicating direct regulation of gawa, Japan. 16Department of Pediatrics, Fujita apelin/APJ pathway by Sp1. Furthermore, a 14- Health University, Toyoake, Aichi, Japan. 17De- year follow-up cohort study in a Japanese com- partment of Pediatrics, Toyokawa Citizen’s munity in Hisayama town, Japan revealed that Hospital, Toyokawa, Aichi, Japan. 18Depart- the homozygote of the susceptible G allele of ment of Pediatrics, National Hospital Organi- this particular SNP had significantly higher risk zation Kure Medical Center, Kure, Okayama, of brain infarction (Hazard ratio=2.00, 95 % CI Japan. 19Department of Pediatrics, Dokkyo =1.22-3.29, P =0.006). Our results indicate that Medical University, Koshigaya Hospital, Koshi- the SNP in the AGTRL1 gene is associated with gaya, Saitama, Japan. 20Department of Pediat- the susceptibility to brain infarction. rics, Kawasaki Municipal Hospital, Kawasaki, Japan. 21Department of Surgery, Keio Univer- (2) Kawasaki disease sity School of Medicine, Tokyo, Japan. 22Labo- ratory for Genotyping, SNP Research Center, Yoshihiro Onouchi1, Tomohiko Gunji1,2, Jane RIKEN, Yokohama, Kanagawa, Japan. 23Labo- C. Burns3, Chisato Shimizu3, Jane W. New- ratory for Medical Informatics, SNP Research burger4, Mayumi Yashiro5, Yoshikazu Naka- Center, RIKEN, Yokohama, Kanagawa, Japan. mura5, Hiroshi Yanagawa6, Keiko Wakui7, 24Japan Kawasaki Disease Research Center, Yoshimitsu Fukushima7,FumioKishi8, Kuni- Tokyo, Japan. 25Laboratory for Molecular hiro Hamamoto9, Masaru Terai10, Yoshitake Medicine, Human Genome Center, Institute of Sato11, Kazunobu Ouchi12, Tsutomu Saji13,Aki- Medical Science, University of Tokyo, Tokyo, yoshi Nariai14, Yoichi Kaburagi15, Tetsushi Japan Yoshikawa16, Kyoko Suzuki17, Takeo Tanaka18, Toshiro Nagai19,HideoCho20, Akihiro Fujino21, Kawasaki disease (KD; OMIM 300530) is an Akihiro Sekine22, Reiichiro Nakamichi23,Tat- acute, self-limited vasculitis of infants and chil- suhiko Tsunoda23 , Tomisaku Kawasaki24 , dren. KD is characterized by prolonged fever Yusuke Nakamura25 &AkiraHata1: 1Labora- unresponsive to antibiotics, polymorphous skin tory for Gastrointestinal Diseases, SNP Re- rash, erythema of the oral mucosa, lips and search Center, RIKEN, Yokohama, Kanagawa, tongue, erythema of the palms and soles, bilat- 230-0045, Japan. 2Department of Hard Tissue eral conjunctival injection, and cervical lympha- Engineering, Graduate School Tokyo Medical denopathy. Coronary artery aneurysms develop and Dental University, 113-8549, Tokyo, Ja- in 15-25% of untreated patients making KD the 126 leading cause of acquired heart disease in chil- Naoyuki Kamatani5, Yuta Kochi6, Kenichi Shi- dren in developed countries. Treatment with in- mane6, Kazuhiko Yamamoto6, Yusuke Naka- travenous immunoglobulin (IVIG) abrogates the mura1, Wako Yumura2, and Koichi Matsuda1*: inflammation in approximately 80% of patients 1Laboratory of Molecular Medicine, Human and reduces the aneurysm rate to less than 5%. Genome Center, Institute of Medical Science, Cardiac sequelae of the aneurysms include 4Department of Human Genetics, Graduate ischemic heart disease, myocardial infarction, School of Medicine, University of Tokyo, To- and sudden death. Epidemiological features kyo 108-8639, Japan. 2Department of Medicine, such as seasonality and clustering of cases sug- Kidney Center, 5Institute of Rheumatology, To- gest an infectious trigger, although no pathogen kyo Women’s Medical University, Tokyo 162- has been isolated and the etiology remains un- 8666, Japan. 3Laboratory for Genotyping, known. Several lines of evidence suggest the im- 6Laboratory for Rheumatic Diseases, SNP Re- portance of genetic factors in disease susceptibil- search Center, The Institute of Physical and ity and outcome. First, the incidence of KD is 10 Chemical Research (RIKEN), Kanagawa 230- to 20 times higher in Japan than in Western 0045, Japan. countries4. Second, the risk of KD in siblings of affected children is 10 times higher than the SLE (OMIM #152700) is one of the common general population (λs=10) and the incidence of autoimmune diseases that predominantly afflict KD in children born to parents with a history of women of child-bearing age. The clinical fea- KD is twice as high as the general population. tures and serological abnormalities observed in Familial aggregation of the disease has also been patients with SLE are remarkably diverse, and observed. Although association studies have make clinical assessment and treatment very dif- identified candidate genes that may influence ficult. Anti-inflammatory drugs and immuno- KD susceptibility, a systematic genetic approach suppressive drugs such as cortisone, azathio- has not been previously performed. Recently, prine, methotrexate, and cyclophosphamide we conducted affected sib pair analysis of KD have been widely used for treatment of this dis- that demonstrated linkage in several chromoso- ease and contribute to significant improvement mal regions including chromosome 19. Here we in prognosis of patients with SLE; although the show the results of linkage disequilibrium map- 4-year survival was estimated to be ~50% in ping performed on 19q13.2 which identified a the 1950s (Merrell and Shulman 1955), the 15- functional SNP in intron 1 of ITPKC signifi- year survival rate is at present estimated to be cantly associated with the risk of KD and the approximately 80%. Despite the improved prog- formation of coronary artery aneurysms. We nosis in the majority of SLE patients, a signifi- also characterized ITPKC as a novel negative cant proportion of the patients still suffer from regulator of Ca2+/NFAT signaling pathway in T- the disease and/or severe adverse reactions cells. We identified a functional SNP (itpkc_3) of caused by these drugs. For example, synthetic the inositol 1,4,5-trisphosphate 3-kinase C glucocorticoid causes adverse events including (ITPKC ) gene on chromosome 19q13.2 which is hypertension, hyperglycemia, diabetes, cataract, significantly associated with susceptibility to KD glaucoma, infection, psychosis, osteoporosis, and both in Japanese and U.S. children, as well as osteonecrosis. To improve quality of life for the withanincreasedriskofcoronaryarterylesions. SLE patients, it is one of the critical steps to elu- Transfection experiments showed that the C- cidate the molecular mechanism causing SLE. To allele of itpkc_3 reduces splicing efficiency of identify a gene(s) susceptible to SLE, we per- the ITPKC mRNA. ITPKC acts as a negative formed a case-control association study using regulator of T-cell activation through the Ca2+/ genome-wide gene-based single nucleotide poly- NFAT signaling pathway and the C-allele may morphisms (SNPs) in Japanese population. Here contribute to immune hyper-reactivity in KD. we report that an SNP (rs3748079) located in a This finding provides new insights into the promoter region of the inositol 1,4,5- mechanisms of immune activation in KD and triphosphate receptor type 3 (ITPR3)geneon emphasizes the importance of activated T-cells chromosome 6p21 was significantly associated in the pathogenesis of this vasculitis. with SLE in two independent Japanese case- control samples (p=0.0000000178 with odds ra- (3) Systemic Lupus Erythematosus tio of 1.88, 95% Confidence Interval (CI) of 1.51- 2.35). This particular SNP also revealed associa- Tetsuya Oishi1,2,AritoshiIida3, Shigeru Ot- tions with rheumatoid arthritis (RA) (p=0.0084 subo1,2, Yoichiro Kamatani1, Masayuki Usami1, with odds ratio of 1.23, 95% CI of 1.05-1.43) and Takashi Takei2, Keiko Uchida2, Ken Tsuchiya2, with Graves’ disease (GD) (p=0.00036 with Susumu Saito3, Yozo Ohnisi1, Katsushi Toku- odds ratio of 1.57, 95% CI of 1.23-2.02). We naga4, Kosaku Nitta2,Yasushi Kawaguchi5, found the binding of NKX2.5 specific to the 127 non-susceptible T allele in the region including cated by independent 203 cases and 294 controls this SNP. Furthermore, an SNP in NKX2.5 also (p=0.0440, OR of 1.52 with 95%CI of 1.01-2.78). revealed an association with SLE (p=0.0058 Although a copy number variation (CNV) of the with odds ratio of 1.74, 95% CI of 1.16-2.58). In- C4 gene adjacent to the TNXB gene was re- dividuals with risk genotype of both ITPR3 and ported to be associated with SLE, our analysis NKX2.5 loci have higher risk for SLE (Odds ra- on this CNV revealed that the association of tio=5.77). Our data demonstrate that genetic CNV of the C4 gene was weaker than the SNP and functional interactions of ITPR3 and NKX in the TNXB gene and likely to reflect the link- 2.5 play a crucial role in the pathogenesis of age disequilibrium between C4 CNV and this SLE. particular SNP. Stratified analysis also revealed We also identified that an SNP, rs3130342, in that the association of SNP rs3130342 with SLE a 5’ flanking region of the TNXB gene revealed was independent of HLA-DRB1*1501 allele that a significant association with SLE (p = has been shown to be associated with SLE. Our 0.000000930, odds ratio (OR) of 3.11 with 95% findings strongly imply that the TNXB gene is a confidence interval (95%CI) of 1.89-5.28) in candidate gene susceptible to SLE in Japanese Japanese population. This association was repli- population.

Publications

1. Kato, T., Hayama, S., Yamabuki, T., sults in G2/M arrest. Oncogene. 26: 1110- Ishikawa, N., Miyamoto, M., Ito, T., 1121, 2007. Tsuchiya, E., Kondo, S., Nakamura, Y. and 7. Ebana, Y., Ozaki, K., Inoue, K., Sato, H., Daigo, Y. Increased expression of insulin- Iida, A., Lwin, H., Saito, S., Mizuno, H., like growth factor-II messenger RNA- Takahashi, A., Nakamura, T., Miyamoto, Y., binding protein 1 is associated with tumor Ikegawa, S., Odashiro, K., Nobuyoshi, M., progression in patients with lung cancer. Kamatani, N., Hori, M., Isobe, M., Naka- Clin Cancer Res. 13: 434-442, 2007. mura, Y. and Tanaka, T. A functional SNP 2. Kanehira, M., Katagiri, T., Shimo, A., Takata, in ITIH3 is associated with susceptibility to R., Shuin, T., Miki, T., Fujioka, T. and Naka- myocardial infarction. J Hum Genet. 52: 220- mura, Y. Oncogenic role of MPHOSPH1, a 229, 2007. cancer-testis antigen specific to human blad- 8. Lin, M.-L., Park, J.-H., Nishidate, T., Naka- der cancer. Cancer Res. 67: 3276-3285, 2007. mura,Y.andKatagiri,T.MELK,maternal 3.Kubo,M.,Hata,J.,Ninomiya,T.,Matsuda, embryonic leucine zipper kinase, involved in K., Yonemoto, K., Nakano, T., Matsushita, mammary carcinogenesis through interac- T., Yamazaki, K., Ohnishi, Y., Saito, S., Kita- tion with Bcl-G, a pro-apoptotic member of zono, T., Ibayashi, S., Sueishi, K., Iida, M., Bcl-2 family. Breast Cancer Res. 9: R17, 2007. Nakamura, Y. and Kiyohara, Y.A nonsyn- 9. Takata, R., Katagiri, T., Kanehira, M., Shuin, onymous SNP in PRKCH increases the risk T., Miki, T., Namiki, M., Kohri, K., Tsunoda, of cerebral infarction. Nat Genet. 39: 212- T.,Fujioka,T.andNakamura,Y.Validation 217, 2007. study of the prediction system for clinical 4. Suzuki, C., Takahashi, K., Hayama, S., response of M-VAC neoadjuvant chemother- Ishikawa,N.,Kato,T.,Ito,T.,Tsuchiya,E., apy. Cancer Sci. 98: 113-117, 2007. Nakamura, Y. and Daigo, Y. Identification of 10. Hosokawa, M., Takehara, A., Matsuda, K., Myc-assocaited protein with JmjC domain as Eguchi, H., Ohigashi, H., Ishikawa, O., Shi- a novel therapeutic target oncogene for lung nomura,Y.,Imai,K.,Nakamura,Y.and cancer. Mol Cancer Ther. 6: 542-551, 2007. Nakagawa, H. Oncogenic role of KIAA0101 5. Yamabuki, T., Takano, A., Hayama, S., interacting with proliferating cell nuclear an- Ishikawa, N., Kato, T., Miyamoto, M., Ito, T., tigen in pancreatic cancer. Cancer Res. 67: Ito, H., Miyagi, Y., Nakayama, H., Fujita, M., 2568-2576, 2007. Hosokawa, M., Tsuchiya, E., Kohno, N., 11. Hata, J., Matsuda, K., Ninomiya, T., Yo- Kondo, S., Nakamura, Y. and Y. Daigo. nemoto, K., Matsushita, T., Ohnishi, Y., Dikkopf-1 as a novel serologic and prognos- Saito, S., Kitazono, T., Ibayashi, S., Iida, M., tic biomarker for lung and esophageal carci- Kiyohara, Y., Nakamura, Y. and Kubo, M. nomas. Cancer Res. 67: 2517-2525, 2007. Functional SNP in a Sp1-binding site of 6. Saigusa,K.,Imoto,I.,Tanikawa,C.,Aoyagi, AGTRL1 gene is associated with susceptibil- M., Ohno, K., Nakamura, Y. and Inazawa, J. ity to brain infarction. Hum Mol Genet. 16: RGC32, a novel p53-inducible gene, is lo- 630-639, 2007. cated on centrosomes during mitosis and re- 12. Osawa, N., Koga, D., Araki, S., Uzu, T., 128

Tsunoda, T., Kashiwagi, A., Nakamura, Y. Y. and Ohno, R. Prediction of risk of disease and Maeda, S. Combinational effect of genes recurrence by genome-wide cDNA microar- for the renin-angiotensin system in confer- ray analysis in patients with Philadelphia ring susceptibility to diabetic nephropathy. J chromosome-positive acute lymphoblastic Hum Genet. 52: 143-151, 2007. leukemia treated with imatinib-combined 13. Onouchi, Y., Tamari, M., Takahashi, A., chemotherapy. Int J Oncol. 31: 313-322, 2007. Tsunoda, T., Yashiro, M., Nakamura, Yo., 21. Fujikawa, M., Katagiri, T., Tugores, A., Yanagawa, H., Wakui, K., Fukushima, Y., Nakamura, Y. and Ishikawa, F. ESE-3, an Kawasaki, T., Nakamura, Yu. and Hata, A. Ets family transcription factor, Is up- A gennomewide linkage analysis of regulated in cellular senescence. Cancer Sci. Kawasaki disease:evidence for linkage to 98: 1468-1475, 2007. chromosome 12. J Hum Genet. 52: 179-190, 22.Hosono,N.,Kubo,M.,Tsuchiya,Y.,Sato, 2007. H., Kitamoto, T., Saito, S., Ohnishi, Y. and 14. Miyamoto, Y., Mabuchi, A., Shi, D., Kubo, Nakamura, Y. Multiplex PCR-based real- T., Takatori, Y., Saito, S., Fujioka, M., Sudo, time Invader assay (mPCR-RTIARETINA): a A., Uchida, A., Yamamoto, S., Ozaki, K., novel SNP-based detection method for du- Takigawa, M., Tanaka, T., Nakamura, Y., Ji- plicated regions copy number polymor- ang, Q. and Ikegawa, S. A functional poly- phisms. Hum Mutat. 29: 182-189, 2007. morphism in the 5’ UTR of GDF5 is associ- 23. Kato, T., Sato, N., Hayama, S., Yamabuki, T., ated with susceptibility to osteoarthritis. Nat Ito, T., Miyamoto, M., Kondo, S., Nakamura, Genet. 39: 529-533, 2007. Y. and Daigo, Y. Activation of Holliday 15. Kanehira, M., Harada, Y., Takata, R., Shuin, Junction-Recognizing Protein involved in the T., Miki, T., Fujioka, T., Nakamura, Y. and chromosomal stability and immortality of Katagiri, T. Involvement of upregulation of cancer cells. Cancer Res. 67: 8544-8553, 2007. DEPDC1 (DEP domain containing 1) in 24. Ueda, K., Katagiri, T., Shimada, T., Irie, S., bladder carcinogenesis. Oncogene. 26: 6448- Sato,T.,Nakamura,Y.andDaigo,Y.Com- 6455, 2007. parative profiling of serum glycoproteome 16. Hayama, S., Daigo, Y., Yamabuki, T., Hirata, by sequential purification of glycoproteins D.,Kato,T.,Miyamoto,M.,Ito,T.,Tsuchiya, and 2-nitrobenzensulfenyl (NBS) stable iso- E.,Kondo,S.andNakamura,Y.Phospho- tope labeling: a new approach for the novel rylation and activation of cell division cycle biomarker discovery for cancer. J Proteome associated 8 by aurora kinase B plays a sig- Res. 6: 3475-3483, 2007. nificant role in human lung carcinogenesis. 25. Taniwaki, M., Takano, A., Ishikawa, N., Cancer Res. 67: 4113-4122, 2007. Yasui, W., Inai, K., Nishimura, H., Tsuchiya, 17. Tamura, K., Furihata, M., Tsunoda, T., E., Kohno, N., Nakamura, Y. and Daigo, Y. Ashida, S., Takata, R., Obara, W., Yoshioka, Activation of KIF4A as a prognostic H., Daigo, Y., Nasu, Y., Kumon, H., Konaka, biomarker and therapeutic target for lung H.,Namiki,M.,Tozawa,K.,Kohri,K.,Tanji, cancer. Clin Cancer Res. 13: 6624-6631, 2007. N., Yokoyama, M., Shimazui, T., Akaza, H., 26. Suda, T., Tsunoda, T., Daigo, Y., Nakamura, Mizutani,Y.,Miki,T.,Fujioka,T.,Shuin,T., Y. and Tahara, H. Identification of HLA-A Nakamura, Y. and Nakagawa, H. Molecular 24-restricted epitope-peptides derived from features of hormone-refractory prostate can- gene products up-regulated in lung and cer cells by genome-wide gene expression esophageal cancers as novel targets for im- profiles. Cancer Res. 67: 5117-5125, 2007. munotherapy. Cancer Sci. 98: 1803-1808, 18. Bourdon, A., Minai, L., Serre, V., Jais, J.-P., 2007. Sarzi, E., Aubert, S., Chretien, D., Lonlay, de 27. Mano, Y., Takahashi, K., Ishikawa, N., P., Paquis-Flucklinger, V., Arakawa, H., Takano, A., Yasui, W., Inai, K., Nishimura, Nakamura, Y., Munnich, A. and Rotig, A. H., Tsuchiya, E., Nakamura, Y. and Daigo, Mutation of RRM 2 B , encoding p 53- Y. Growth factor receptor 1 oncogene part- controlled ribonucleotide reductase (p53R2), ner as a noveprognostic biomarker and causes severe mitochondrial DNA depletion. therapeutic target for lung cancer. Cancer Nat Genet. 39: 776-780, 2007. Sci. 98: 1902-1913, 2007. 19. Giacomini, K. M., Krauss, R. M., Roden, D. 28. Senju, S., Suemori, H., Zembutsu, H., Ue- M., Eichelbaum, M., Hayden, M. R. and mura, Y., Hirata, S., Fukuma, D., Mat- Nakamura,Y.Whengooddrugsgobad suyoshi, H., Shimomura, M., Haruta, M., (commentary). Nature. 446: 975-977, 2007. Fukushima, S., Matsunaga, Y., Katagiri, T., 20. Zembutsu, H., Yanada, M., Hishida, A., Nakamura, Y., Furuya, M., Nakatsuji, N. Katagiri, T., Tsuruo, T., Sugiura, I., and Nishimura, Y. Genetically manipulated Takeuchi, J., Usui, N., Naoe, T., Nakamura, human embryonic stem cell-derived den- 129

dritic cells with immune regulatory function. hanced methyltransferase activity of SMYD3 Stem Cells. 25: 2720-2729, 2007. by the cleavage of its N-terminal region in 29. Cha, P.-C., Mushiroda, T., Takahashi, A., human cancer cells. Oncogene. Nov 12; Saito, S., Shimomura, H., Wanibuchi, Y., [Epub ahead of print], 2007. Suzuki, T., Kamatani, N. and Nakamura, Y. 37. The International HapMap Consortium. A High-resolution SNP and haplotype maps of second generation human haplotype map of the human gamma-glutamyl carboxylase over 3.1 million SNPs. Nature. 449: 851-861, gene (GGCX) and association study between 2007. polymorphisms in GGCX with warfarin 38. Sabeti, P.C., Varilly, P., Fry, B., Lohmueller, maintenance-dose requirement of the Japa- J.,Hostetter,E.,Cotsapas,C.,Xie,X.,Byrne, nese population. J Hum Genet. 52: 856-864, E.H., McCarroll, S.A., Gaudet, R., Schaffner, 2007. S.F., Lander, E.S. and The International Hap- 30. Yamazaki, K., Onouchi, Y., Takazoe, M., Map Consortium: Genome-wide detection Kubo,M.,Nakamura,Y.andHata,A.Asso- and characterization of positive selection in ciation analysis of genetic variants in IL23R, human populations. Nature. 449: 913-918, ATG16L1 and 5p13.1 loci with Crohn’s dis- 2007. ease in Japanese patients. J Hum Genet. 52: 39. Ishikawa, N., Takano, A., Yasui, W., Inai, K., 575-583, 2007. Nishimura, H., Ito, H., Miyagi, Y., 31. Tsukumo, Y., Tomida, A., Kitahara, O., Nakayama, H., Fujita, M., Hosokawa, M., Nakamura, Y., Asada, S., Mori, K. and Tsu- Tsuchiya, E., Kohno, N., Nakamura, Y. and ruo, T. Nucleobindin 1 controls the unfolded Daigo, Y. Cancer-testis antigen, LY6K is a protein response by inhibiting ATF6 activa- serologic biomarker and a therapeutic target tion. J Biol Chem. 282: 29264-29272, 2007. for lung and esophageal carcinomas. Cancer 32. Takehara, A., Hosokawa, M., Eguchi, H., Res. 67: 11601-11611, 2007. Ohigashi, H., Ishikawa, O., Nakamura, Y. 40. Kato, M., Miya, F., Kanemura, Y., Tanaka, and Nakagawa, H. GABA stimulates pancre- T., Nakamura, Y. and Tsunoda, T. Recombi- atic cancer growth through over-expressing nation rates of genes expressed in human GABAA receptor pai subunit (GABRP). Can- tissues. Hum Mol Genet. Nov 13; [Epub cer Res. 67: 9704-9712, 2007. ahead of print], 2007. 33. Kunizaki, M., Hamamoto, R., Silva, F.P., 41. Kidokoro, T., Tanikawa, C., Furukawa, Y., Yamaguchi, K., Nagayasu, T., Shibuya, M., Katagiri, T., Nakamura, Y. and Matsuda, K. Nakamura, Y. and Furukawa, Y. The lysine CDC20, a potential cancer therapeutic target, 831 of VEGFR1 a novel target of methyla- is negatively regulated by p53. Oncogene. tion by SMYD3. Cancer Res. 67: 10759-10765, Sep 17; [Epub ahead of print], 2007. 2007. 42. Brunet, J., Pfaff, A.W., Abidi, A., Unoki, M., 34. Onouchi, Y., Gunji, T., Burns, J.C., Shimizu, Nakamura, Y., Guinard, M., Klein, J.-P., C., Newburger, J.W., Yashiro, M., Naka- Candolfi, E. and Mousli, M. Toxoplasma mura, Yo., Yanagawa, H., Wakui, K., gondii exploits UHRF1 and induces host cell Fukushima, Y., Kishi, F., Hamamoto, K., cycle arrest at G2 to enable its proliferation. Terai,M.,Sato,Y.,Ouchi,K.,Saji,T.,Nariai, Cell Microbiol. Dec 6; [Epub ahead of print], A., Kaburagi, Y., Yoshikawa, T., Suzuki, K., 2007. Tanaka, T., Nagai, T., Cho, H., Fujino, A., 43. Shimo, A., Tanikawa, C., Nishidate, T., Mat- Sekine, A., Nakamichi, R., Tsunoda, T., suda, K., Lin, M.-L., Park, J.-H., Ohta, T., Kawasaki, T., Nakamura, Yu. and Hata, A. Hirata, K., Fukuda, M., Nakamura, Y. and A functional polymorphism in ITPKC is as- Katagiri,T.InvolvementofKIF2C/MCAK sociated with Kawasaki disease susceptibil- overexpression in mammary carcinogenesis. ity and formation of coronary artery aneu- Cancer Sci. 99: 62-70, 2008. rysms. Nat Genet. 40: 35-42, 2008. 44. Leung, A.A.C.C., Wong, V.C.L., Yang, L.C., 35. Kudo, S., Konda, R., Obara, W., Kudo, D., Chan, P.L., Daigo, Y., Nakamura, Y., Qi, R. Tani, K., Nakamura, Y. and Fujioka, T. Inhi- Z., Miller, L., Liu, E.T.-K., Wang, L.D.J.-L., S. bition of tumor growth through suppression Law,Tsao,W.andLung,M.L.Frequentde- of angiogenesis by brain-specific angiogene- creased expression of candidate tumor sup- sis inhibitor 1 gene transfer in murine renal pressor gene, DEC1, and its anchorage- cell carcinoma. Oncol Rep. 18: 785-791, 2007. independent growth properties and impact 36. Silva, F.P., Hamamoto, R., Kunizaki, M., on global gene expression in esophageal car- Tsuge, M., Nakamura, Y. and Furukawa, Y. cinoma. Int J Cancer. 122: 587-594, 2008. Enhanced methyltransferase activity of 45. Uemura, M., Tamura, K., Chung, S., Honma, SMYD3 by the cleavage of its N-terminal re- S., Okuyama, A., Nakamura, Y. and Naka- gion in human cancer cells. Oncogene, En- gawa, H. A novel 5-steroid reductase (SRD5 130

A3, type-3) is overexpressed in hormone- Nakayama, T., Soma, M., Hata, A., Fujioka, refractory prostate cancer. Cancer Sci. 99: 81- A., Kawano, Y., Nakao, K., Sekine, A., 86, 2008. Yoshida, T., Nakamura, Y., Saruta, T., Ogi- 46. Kamatani, Y., Matsuda, K., Ohishi, T., Oht- hara, T., Sugano, S., Miki, T. and Tomoike, subo, S., Yamazaki, K., Iida, A., Hosono, N., H. High-Density Association Study and Kubo, M., Yumura, W., Nitta, K., Katagiri, Nomination of Susceptibility Genes for Hy- T., Kawaguchi, Y., Kamatani, N. and Naka- pertension in the Japanese National Project. mura, Y. Identification of a significant asso- Hum Mol Genet. in press, 2008. ciationofanSNPinTNXBwithSLEin 50. Oishi, T., Iida, A., Otsubo, S., Kamatani, Y., Japanese population. J Hum Genet. 53: 64- Usami, M., Takei, T., Uchida, K., Tsuchiya, 73, 2008. K., Saito, S., Ohnishi, Y., Tokunaga, K., 47. Obama, K., Satoh, S., Hamamoto, R., Sakai, Nitta, K., Kawaguchi, Y., Kamatani, N., Ko- Y., Nakamura, Y. and Furukawa, Y. En- chi, Y., Shimane, K., Yamamoto, K., Naka- hanced expression of RAD51AP1 is involved mura, Y., Yumura, W. and Matsuda, K. A in the growth of intrahepatic cholangiocarci- functional SNP in the NKX2.5-binding site noma cells. Clin Cancer Res. in press, 2008. of ITPR3 promoter is associated with sus- 48. Fukukawa, C., Hanaoka, H., Nagayama, S., ceptibility to Systemic Lupus Erythematosus Tsunoda, T., Toguchida, J., Endo, K., Naka- in Japanese population. J Hum Genet. 53: mura,Y.andKatagiri,T.Radioimmunother- 151-162, 2008. apy of human synovial sarcoma using a 51. Daigo, Y. and Nakamura, Y. From cancer monoclonal antibody against FZD10. Cancer genomics to the thoracic oncology: New Sci. in press, 2008. biomarker and therapeutic target discovery 49. Kato, N., Miyata, T., Tabara, Y., Katsuya, T., for lung and esophageal carcinomas. (Re- Yanai, K., Hanada, H., Kamide, K., Nakura, view Article) Gen Thorac and Cardiovasc J., Kohara, K., Takeuchi, F., Mano, H., Yasu- Surg. 56: 43-53, 2008. nami, M., Kimura, A., Kita, Y., Ueshima, H., 131

Human Genome Center Laboratory of Functional Genomics ゲノム機能解析分野

Visiting Professor Gregory Mark Lathrop, Ph.D. 客員教授 理学博士 グレゴリー・マーク・ラスロップ Associate Professor Ryo Yamada, M.D., Ph.D. 准教授 医学博士 山 田 亮

Genetic heterogeneity of human beings is one of the most important targets of post-genomic research. Genome-wide association studies are being actively car- ried out using the genetic polymorphism markers to identify disease-related loci. We focus on the development of new methods to interpret the heterogeneity and to map the disease-associated loci and collaborate with research groups for data- mining of their genetic epidemiology studies.

1. The development of new methods to map the beginning and also attempt to improve the disease-associated loci with genetic poly- interpretation of data of GWA studies as a morphisms. whole.

Ryo Yamada a. Multiple testing correction

Genome-wide association (GWA) studies are One of the major sources of inter-test depend- resulting in many useful findings. The scale of ence is allelic association due to LD and popula- such studies is increasing along with rapid pro- tion structure. This problem has been well gress in genotyping technology. This increase in aware of since SNP-based LD mapping became scale necessarily increases the degree of depend- popular. There are some other origins of inter- ence among individual tests in GWA studies. test dependence that are less noticed but simi- The inter-test dependence is problematic be- larly troublesome as allelic association; the inter- cause almost all the conventional statistical test dependence due to application of multiple methods assume independence among multiple genetic models to individual markers and due tests. Besides the multiple sources of inter-test to study designs, such as the usage of common dependency, the variable inflation of test statis- controls and subjects with comorbidity for multi tics due to biased sampling from structured -phenotype studies, which is currently common population is one of the unavoidable conse- particularly in the gigantic study design. This quences of enlarged sample size. These prob- year we developed a method to estimate the ef- lems that complicate the interpretation of data fective number of independent tests from GWA of GWA studies are mutually related and there studies in the structured population. is no straight-forward solution of them all to- gether. We decompose the difficulty into parts, b. Test statistics correction for data of struc- i.e., the problem of linkage disequilibrium (LD), ture population population structure, multiple genetic models, study design and characterize their problem and Because the genetic epidemiology studies on propose solution of the individual problems at complex genetic traits target relatively weak fac- 132 tors, which means sample size of them should netic heterogeneity and for genetic epidemiol- be more than thousands and subsequently ogy studies usually. We develop a new method makes idealistic random sampling from homo- to quantify the heterogeneity and complexity of geneous population impossible. The test statis- population of DNA sequence with SNPs so that tics of the studies in the heterogeneous popula- various researches based on genetic heterogene- tion, in other words, structured population, ity will benefit in 2007. tends to give false positive results. One of the methodstocorrecttheincreaseinthefalseposi- b. Geometric expression of haplotype popu- tives is genomic control method for chi-square lations distribution. We modified the genomic control method so that it could correct the Fisher’s exact Haplotypes are consisted of alleles of multiple test statistics in 2007. markers. We attempted to deal the haplotype data from combination theory standpoint and c. Characterization of exact trend test for investigated the utility of polyhedral handling SNP case-control association test data of the combinatorial aspects of haplotypes.

The trend test is the test of the choice for 2×3 3. Collaboration with genetic epidemiology contingency table test of SNP data. The defini- research groups. tion of the two-sided exact trend test is contro- versial. We investigated the suitability of multi- Gregory Mark Lathrop and Ryo Yamada ple ways of the two-sided exact trend test for 2 ×3 SNP data tables this year. Besides the development of new methods to analyze genetic polymorphism data in the con- 2. The development of new methods to inter- text of population genetics and genetic statistics, pret the genetic heterogeneity. we collaborate with multiple research groups in and out of the IMS-UT, including Kyoto Univer- Ryo Yamada sity, Kyoto, The University of Tokyo Hospital, Tokyo, Laboratory for Rheumatic Diseases, SRC, As a compound in nature, the DNA sequence RIKEN, Yokohama, National Hospital Organiza- is under pressure to maximize the heterogeneity tion Sagamihara National Hospital, Sagamihara, of the sequence. Under the most random condi- and The Centre National de Génotypage, Evry, tion, all bases of the sequence would be poly- France, for the interpretation of genetic epidemi- morphic, and all bases and all sets of bases are ology data with the conventional statistical mutually independent. At the other extreme, un- methods. der the least random condition, all DNA mole- cules would be clones. In living organisms, the 4. Public distribution of population genetics number of polymorphic sites in the DNA se- and genetic association study tools. quence is limited due to the requirements for re- production and as a result of selection and ge- Ryo Yamada netic drift, against which opposite forces act to increase heterogeneity (e.g., mutation and re- Because the designs of genetic epidemiology combination). A major research target following studies have been changing, the analysis tools the completion of the genome sequence is the have to be updated all the time. The number of investigation of intra-species variations, among genetic epidemiology study groups is much which diallelic single nucleotide polymorphisms more than the groups on genetic statistics in the are the most common. world and also in Japan. We opened the web site that distributes basic tool of linkage dise- a. Quantitation of linkage disequilibrium of quilibrium mapping for public use. This distri- multiple markers bution is supported by the grant from Japan So- ciety for the Promotion of Science on the permu- Genetic variations within a population give tation test. rise to LD, and the use of the genetic history of Web-site URL: http://func-gen.hgc.jp/ the population and LD mapping is a very prom- ising method for identifying genetic back- grounds of various phenotypes. LD is a measure of inter-marker dependence. Although the inter- marker dependence exist among any set of markers, only the pair-wise inter-marker de- pendence is utilized for quantitation of the ge- 133

Publications

1. Yamamoto, K. and Yamada, R. Lessons from in rheumatoid arthritis. Ann. N.Y. Acad. Sci. a Genomewide Association Study of Rheuma- 1108: 323-39, 2007 toid Arthritis. N. Engl. J. Med. 357: 1250-1251, 6. Ryo Yamada, and F. Matsuda. A novel 2007 method to express SNP-based genetic hetero- 2. Yamada, R. and Yamamoto, K. Mechanisms geneity,Ψ,and its use to measure of disease: genetics of rheumatoid arthritis-- linkage disequilibrium for multiple SNPs, Dg, ethnic differences in disease-associated genes. and to estimate absolute maximum of haplo- Nat. Clin. Pract. Rheumatol. 3: 644-50, 2007 type frequency. Genetic Epidemiol. 31: 709- 5. Suzuki, A., Yamada, R. and Yamamoto, K. 726, 2007 Citrullination by peptidylarginine deiminase 134

Human Genome Center Laboratory of Functional Analysis In Silico 機能解析イン・シリコ分野

Professor Kenta Nakai, Ph.D. 教 授 理学博士 中井謙太 Associate Professor Kengo Kinoshita, Ph.D. 准教授 理学博士 木下賢吾

The mission of our laboratory is to conduct computational ( “in silico”) studies on the functional aspects of genome information. Roughly speaking, genome informa- tion represents what kind of proteins/RNAs are synthesized on what conditions. Thus, our study includes the structural analysis of molecular function of each gene product as well as the analysis of its regulatory information, which will lead us to the understanding of its cellular role represented by the networks of inter-gene in- teraction.

1. Transcription factor binding site co- LexA sites upstream of some of the genes in- occurrence in firmicutes volved in the first response may be necessary to ensure a tight repression of potentially harmful Nicolas Sierro and Kenta Nakai genes despite the weaker binding site.

Preliminary investigation of binding site co- 2. Promoter structure modeling for expres- occurrence in firmicutes highlighted several spe- sion pattern prediction cific combinations of transcription factors ap- pearing with a higher-than-expected frequency Alexis Vandenbon and Kenta Nakai in some genomes. Such combinations could in- dicate the presence of different regulation Initiation of transcription in eukaryotes is mechanisms for specific pathways and in spe- regulated by transcription factors binding cis- cific strains. For example, two LexA sites sepa- regulatory elements in the regulatory regions of rated by 16 or 17 nucleotides are found about genes. We can therefore assume that regulatory 150 nucleotides upstream of lexA and a gene regions containing similar sets of cis-regulatory similar to uvrX in Bacillus anthracis and clausii, motifs will drive similar expression patterns at Listeria monocytogenes and Staphylococcus aureus, the transcriptional level. We have developed a epidermidis , haemolyticus and saprophyticus Markov chain-based promoter structure model strains. Both of these genes are repressed by the that includes positional preference, order, and SOS DNA-repair genes repressor LexA. How- orientation of motifs in upstream regulatory re- ever, with the notable exception of the Staphylo- gions. The model is trained using promoter se- coccus strains for which a similar pattern with a quences of a set of co-regulated genes, and is distance of 31 nucleotides is found in front of subsequently used to predict genes having simi- recA, no other gene of the SOS DNA-repair lar expression profiles as the input gene set. We regulon is preceded by such combinations. This applied our model on a dataset of genes ex- specific co-occurrence pattern could be related pressed in pharyngeal muscle cells of Caenorhab- to the two-step SOS DNA-repair response ditis elegans, and on a dataset of muscle ex- mechanism, where the presence of multiple pressed genes of Ciona intestinalis. Both compu- 135 tational and experimental verifications indicated ward the understanding the relationship, we that this model is capable of predicting candi- analyzed human promoter sequences and their date promoters driving similar expression pat- transcriptional activities in specific tissues, terns as the input-regulatory sequences. We are which were defined by large collection of full- further developing our promoter structure length cDNAs. We used support vector machine model, and we are also working on a number of to predict the transcriptional activities from pu- projects to find common features in sets of co- tative TF binding sites (TFBSs) in the promoter regulated genes. We believe that our approaches sequences. Among 38 tissues tested, best predic- can be useful for finding promising candidate tion performances were obtained in the cases of genes for wet-lab experiments and for increasing brain and testis. Furthermore, we performed the our understanding of transcriptional regulation. same scheme of prediction by dividing the pro- moter sequences into two parts at -200 of tran- 3. DBTSS: database of transcription start scriptional start sites (TSSs). The prediction per- sites, progress report 2008 formances were improved, suggesting that the TFBSs function differently according to the posi- Hiroyuki Wakaguri1, Riu Yamashita, Yutaka tion, and the boundary is -200. We also per- Suzuki1, Sumio Sugano1, and Kenta Nakai: formed cross-species comparison of promoter se- 1Graduate School of Frontier Sciences, U. To- quences between human and mouse, and exam- kyo ined the average identities at various positions from the TSSs. The result supports the func- One of the major topics in the post-sequence tional boundary at -200 of TSSs. These results era is the analysis of transcription regulation will promote the understanding of the promoter network. In order to study this problem, we architecture of mammals. have constructed a database, DataBase of Tran- scription Starts Site (DBTSS), which included a 5. Retrotransposition as a source of new pro- number of 5’-end sequence (ESTs) produced by moters theoligo-cappingmethod.Wehaverecentlyre- leased DBTSS version 6, extended with three Kohji Okamura and Kenta Nakai major additional features. The first new feature is the new data resource by SOLEXA. Now Thefactthatpromotersareessentialforthe DBTSS has not only 1,540,411 mapped se- function of all genes presents the basis of the quences from oligo-capped clones, but also general idea that retrotranspositions give rise to 21,981,890 SOLEXA sequences from MCF7 and processed pseudogenes. However, recent studies HEK193 cells. The second feature is a viewer of have demonstrated that some retrotransposed evolutional conservation of the promoter re- genes are transcriptionally active. Because pro- gions. A user can compare the sequence around moters are not thought to be retrotransposed the promoter regions from three species among along with exonic sequences, these transcription- human, mouse, rat, chimp, and macaque. The ally active genes must have acquired a func- third feature is a tool for the transcriptome tional promoter by mechanisms that are yet to analysis of promoter region with expression and be determined. Hence, comparison between a predicted transcription factor binding sites. This retrotransposed gene and its source gene ap- tool searches for putative transcription factor pears to provide a unique opportunity to inves- binding sites commonly occurring in the pro- tigate the promoter creation for a new gene. We moters, which show similar behaviors on vari- identified 29 gene pairs in the human genome, ous drug perturbations. DBTSS can be accessed consisting of a functional retrotransposed gene at http://dbtss.hgc.jp. and its parental gene, and compared their re- spective promoters. In more than half of these 4. Predicting transcriptional activities of hu- cases, we unexpectedly found that a large part man promoters from putative transcription of the core promoter had been transcribed, factor binding sites by using support vec- reverse-transcribed, and then integrated to be tor machines operative at the transposed locus. This observa- tion can be ascribed to the recent discovery that Hirokazu Chiba, Riu Yamashita, Kenta Nakai transcription start sites tend to be interspersed rather than situated at one specific site. This Bindings of specific transcription factors (TFs) propensity could confer retrotransposability to to promoter sequences play a central role in promoters per se. Accordingly, the retrotrans- transcription regulation. However, the relation- posability can explain the genesis of some alter- ship between the combination of TFs and tran- native promoters. scription activities remains to be deciphered. To- 136

6. Investigation of motif finding perform- this analysis, different tendencies are observed ances for each organism. For example, 10bp AA/TT periodic motifs are observed in C. elegans and S. Keishin Nishida and Kenta Nakai cerevisiae,butnotinH. sapiens. These tendencies will be helpful for developing a new prediction Computational prediction of nucleotide bind- approach of nucleosome positioning. Now we ing specificity for transcription factors remains a are also investigating other important tendencies fundamental and largely unsolved problem. One for nucleosome positioning in higher eukaryotic of the prediction methods is pattern discovery genomes. in unaligned DNA sequences, called “motif finding”. Despite many studies about motif 8. Computational analysis of trans-splicing in finding, this problem is far from being resolved. C. intestinalis gene expression To provide an appropriate motif finding over- view for experiment design, we tried larger Shuang Li, Riu Yamashita, Nicolas Sierro and scale analysis of wide input conditions. We fo- Kenta Nakai cused on the Gibbs Motif Sampler, one of the most famous motif finding programs. Back- Trans-splicing, in which the 5’-ends of some ground sequences are generated randomly with pre-mRNAs are cut and replaced by the 5’- any GC content. Motif information is down- terminal sequence of a specific donor mRNA loaded from the JASPAR database to plant ex- called spliced-leader (SL), has been observed in perimentally valid motif into the background se- sixphyla.Inchordates,Ciona intestinalis is the quences. Our results show an expected ten- only one reported as performing trans-splicing dency; longer sequences are a more difficult systematically.Wehavestudiedthetrans- problem, and more mismatches further increase splicing of C. intestinalis with 5’-EST data. This the difficulty. We find that the motif finding year, we compared the two populations (trans- performance is worse for larger input sequences splicing positive or negative) of genes, aiming at than for smaller ones. Indicating that the default finding the biological meaning of trans-splicing. program parameters are not optimal for larges. We discovered that the expression ratio between Based on this result, we will investigate suitable the two gene groups varies with tissues and de- Gibbs Motif Sampler parameters for handling velopmental stages, ranging from 1:3.1 at the suchlargesizesequencedatasets.Inaddition ‘juvenile’ stage to 1:1.1 in ‘blood’. We also ob- we will investigate other motif models and served for instance that ribosome related genes other motif finding programs, and produce simi- are more likely to fall into the non-trans-spliced lar guidelines. group while mitochondria related genes show a preference for the trans-spliced group. Our 7. Analysis of Nucleosomal DNA sequences analysis indicates that in C. intestinalis,although there may not exist strong fundamental require- Yoshiaki Tanaka, Riu Yamashita and Kenta ments for genes to generate trans-splicing pre- Nakai mRNAs, the different populations of genes are likely to be spatially and temporally regulated Nucleosomes limit accessibility of some regu- differently. latory factor binding sites, and thus play an im- portant role in transcription and replication. It 9. Computational Description of Gene Net- has been reported that the nucleosome occu- works by Direct Regulatory Interactions in pancy rate in the regions upstream of transcrip- Ascidian Early Development tion start sites (TSSs) is lower than that in other regions. It is also known that the binding of nu- Xuyang Yuan, Nicolas Sierro, Kohji Okamura, cleosomes on DNA is sequence dependent. Mo- and Kenta Nakai tifs recognized by nucleosomes are categorized into two groups, periodic sequence motifs (ex. The ascidian Ciona is a good model system to 10bp AA/TT repeat) and consecutive sequence elucidate gene regulatory networks in chordate motifs (ex. poly(A/T)). Recently some research- development. Accordingly, comprehensive in ers succeeded in the computational prediction of situ hybridization assays have identified a num- nucleosome positions in the S. cerevisiae genome ber of regulatory genes with localized expres- using some of these motifs. However, their sion pattern. Subsequent knockdown assays methodsarenotasefficientinhighereukaryotic have illuminated thousands of combinations of genomes. We therefore compared nucleosome gene expression profiles in the early embryo, de- and linker DNA sequences from the genomes picting a blueprint for its early development. such as H. sapiens, C. elegans and S. cerevisiae.In However, the blueprint cannot demonstrate di- 137 rect molecular mechanisms occurring in each able to predict gene function. We are thus con- cell because an interaction between two genes structing new database named COXPRESdb (co- was detected by whether each transcription expression database) (http://coxpresdb.hgc.jp) takes place or not. Thus, we are computationally for coexpressed genes in mouse and human constructing gene networks consisting of only from such publicly available gene expression direct interactions. For trans-acting factors data. The information of gene coexpression is whose binding sites are unknown, we are pre- calculated from thousands of oligonucleotide dicting them by motif finding programs and ex- microarray (GeneChip) data and then repre- amination of orthologs to complete the whole sented as gene lists and gene networks. When network. Our result will help to understand the the networks became too big for the static pic- precise molecular mechanisms during the devel- ture on the web in GO networks or in tissue opment and will provide suggestion for further networks, we used Google Maps API to visual- experimental analyses. ize them interactively. This information of gene coexpression will widely promote experimental 10. Docking of protein molecular surfaces researches for mouse and human. with evolutionary trace analysis 13. Identification of transient hub proteins Eiji Kanamori2,3, Yoichi Murakami4, Yuko and the possible structural basis for their Tsuchiya, Daron M Standley4, Haruki Naka- multiple interactions mura4, and Kengo Kinoshita: 2Japan Biological Information Research Center, 3Hitachi Soft- Miho Higurashi, Takashi Ishida and Kengo Ki- ware Engineering Co., Ltd., 4Protein Research noshita Institute, Osaka University Proteins that can interact with multiple part- Protein-protein interaction is the first step to ners play central roles in the important biologi- construct interaction network of proteins, which cal processes, such as signal transduction. Al- is a key to understand the complex biological though it is well known that stable complexes functions. To identity the reliable interactions, and transient complexes have different struc- structural information of proteins are useful, so tural features, the hub proteins defined by pre- the method to predict the complex structure of vious study were identified as proteins with proteins is require. Many methods have been multiple partners, regardless of whether the in- developed by several groups, but the prediction teractions were transient or stable. As a result, accuracy is not so high at the moment. There- so-called hub proteins can be the components of fore, we tried to develop a new method to pre- stable supramolecules as opposed to the normal dict the complex structure of proteins using the concepts of hub proteins. In this study, we have different view of proteins, that is, molecular sur- developed a method to identify transient hub face with considering the physicochemical and proteins using PDB, and then perform statistical evolutional information. With this method, we analyses of the structural features of these pro- participated in blind prediction contest, critical teins. As a result, we found that the main differ- assessment of prediction of interactions (CAPRI) ence between sowciable and non-sociable pro- and have achieved the good results. teins is not the abundance of disordered regions, in contrast to the previous studies, but rather 11. COXPRESdb: a database of coexpressed the structural flexibility of the entire protein, re- gene networks in mammals sulting from less number of hydrogen bonds be- tween core residues. Takeshi Obayashi, Shinpei Hayashi5, Masayuki Shibaoka1, Motoshi Saeki5, Hiroyuki Ohta5,6, 14. PrDOS: prediction of disordered protein Kengo Kinoshita: 5Graduate School of Infor- regions from amino acid sequence mation Science and Engineering, Tokyo Insti- tute of Technology, 6Center of Biological Re- Takashi Ishida and Kengo Kinoshita sources and Informatics, Tokyo Institute of Technology Identification of disordered regions in pro- teins is important for the functional annotation The amount of publicly available gene expres- of proteins and for high-throughput structural sion data is the most abundant in mouse and determination. However, the cost of experimen- human, which is tenth or twentieth larger than tal determination of disordered regions is ex- that for Arabidopsis. However there is no coex- pensive. Thus, we developed a computational pressed gene database as like ATTED-II for method to predict disordered protein regions Arabidopsis, although the information is valu- from their amino acid sequences and imple- 138 mented web-interface of this prediction method. ent binding sites by different spatial arrange- Our system is composed of two predictors, that ments. is, a predictor based on the local amino acid se- quence, and one based on template proteins. 17. Molecular dynamics simulation of The performance of the method was evaluated cholesterol-induced change in a mem- in the blind benchmark by the structural biology brane environment community. The method achieved high accuracy (>90% with the sensitivity of 0.56), especially Naoya Fujita, Takashi Ishida and Kengo Ki- for short disordered regions. noshita

15. eF-seek: prediction of the functional sites Simulation in molecular level of lipid raft, of proteins by searching for similar elec- where many biological functions are realized, is trostatic potential and molecular surface important to understand membrane heterogene- shape ity, function of proteins and effects of raft struc- ture to proteins’ function. To reveal effects of Kengo Kinoshita, Yoichi Murakami4 and cholesterol, which is one of main components in Haruki Nakamura4 the raft domain, we compared a cholesterol- mixed raft-like membrane and a pure lipid, di- Molecular function of proteins are determined palmitoylphosphatidylcholine (DPPC), mem- by their three dimensional structures, thus the brane. By analyzing the simulation trajectories, a similarity of protein structure can give some significant variation of membrane width was clues to infer their functions. In many cases, the found on the pure DPPC membrane but not on molecular functions are begun with the molecu- the cholesterol-mixed membrane. Such a differ- lar interaction with small molecules (ligands). ence on the membrane environment also Therefore, to find the putative ligands is the changed stability and mobility on an embedded first step to identify the molecular function of protein, alamethicin. proteins. For the purpose, we have developed a web server to identify a putative ligand upon 18. Domain size effect on amino acid compo- the structure of proteins. The web server, eF- sition seek, accepts a coordinate file with PDB format file and returns complex structures predicted. to Matsuyuki Shirota, Takashi Ishida, and Kengo search for the similar ligand binding sites for Kinoshita the uploaded coordinate file with PDB format. The representative binding sites in eF-site data- The size effect of proteins on amino acid com- base are search by our own algorithm based on position, that is whether the frequencies of resi- the clique search algorithm. dues change systematically with increasing pro- tein size, has been tested several times under 16. Structural diversity of ligand binding the assumption that the frequencies of hydro- sites in proteins phobic residues would increase and those of hy- drophilic residues would decrease if protein size Megumi Okamoto, Matsuyuki Shirota, and increased. However, such systematic change in Kengo Kinoshita the frequency of hydrophobic or hydrophilic residues has not yet been detected. Here, taking Molecule recognition is the first step in realiz- advantage of recent large database of protein ing the function of proteins. Therefore, the un- structures, we examined the change in occur- derstanding of the molecular recognition is an rence of each amino acid with increasing protein important step to reveal the structure-function size, and identified definite correlation between relation in proteins. To observe the structure- protein size and occurrence of several amino ac- function relationship of known complexes, we ids. Contrary to the generally accepted expecta- carried out the classification of AMP, GMP, tion that the incidence of hydrophilic amino ac- CMP, UMP binding sites in PDB. We analyzed ids would decrease, only two charged amino variety of binding sites in three levels, which are acids-lysine and glutamic acid-among all hydro- secondary structure elements, spatial arrange- philic residues had decreased occurrence with ment of atoms and shape of three-residue frag- increasing protein size. Since these two amino ments. As a result, we found that binding sites, acids are the most rarely found in the protein which are various in the view of secondary core, this decrease of lysine and glutamic acid structure and atomic configuration, have com- canbeconsideredtobeinducedbytheincrease mon fragment combinations. And same frag- of protein size due to the decrease of relative ment combination constitutes structurally differ- surface area. 139

Publications

Vandenbon, A., Miyamoto, Y., Takimoto, N., Nakai, K., and Sugano, S. Intrinsic promoter Kusakabe, T., and Nakai, K. Markov chain- activities of primary DNA sequences in the based promoter structure modeling for tissue- human genome. DNA Res. 14(2): 71-77, 2007. specific expression pattern prediction. DNA Nakai, K. and Horton, P. Chapter 29: Computa- Res., in press. tional prediction of subcellular localization (in Genome Information Integration Project and H- M. van der Giezen ed., Protein Targeting Proto- invitational 2 Consortium, The H-Invitational cols (2nd ed.), Methods in Molecular Biology, Database (H-InvDB), a comprehensive annota- vol. 390). pp. 429-466, Humana Press, Totowa, tion resource for human genes and tran- 2007, (ISBN 13 978-1-58829-702-0). scripts. Nucl. Acids Res., in press. Obayashi, T., Kinoshita, K., Nakai, K., Shibaoka, Sierro, N., Makita, Y., de Hoon, M., and Nakai, M., Hayashi, S., Saeki, M., Shibata, D., Saito, K. DBTBS: a database of transcriptional regu- K.,andOhta,H.ATTED-II:adatabaseofco- lation in Bacillus subtilis containing upstream expressed genes and cis elements for identify- intergenic conservation information. Nucl. Ac- ing co-regulated gene groups in Arabidopsis. ids Res., in press. Nucl. Acids Res. 35: D863-D869, 2007. Wakaguri, H., Yamashita, R., Suzuki, Y., Koike, R., Kinoshita, K., and Kidera, A. Proba- Sugano, S., and Nakai, K. DBTSS: DataBase of bilistic alignment detects remote homology in Transcription Start Sites, progress report 2008. a pair of protein sequences without homolo- Nucl. Acids Res., in press. gous sequence information. Proteins 66: 655- Ikura, T., Kinoshita, K., and Ito, N.A cavity with 63, 2007. an appropriate size is the basis of the PPIase Kinoshita, K., Murakami, Y., and Nakamura, H. activity. PEDS , in press. eF-seek: prediction of the functional sites of Obayashi, T., Hayashi, S., Shibaoka, M., Saek, proteins by searching for similar electrostatic M., Ohta, H., and Kinoshita, K. COXPRESdb: potential and molecular surface shape. Nucleic a database of coexpressed gene networks in Acids Res. 35: W398-402, 2007. mammals. Nucl. Acids Res, in press. Ishida, T. and Kinoshita, K. PrDOS: prediction Higurashi, M., Ishida, T., and Kinoshita, K. of disordered protein regions from amino acid Identification of transient hub proteins and sequence, Nucleic Acids Res. 35: W460-464, the possible structural basis for their multiple 2007. interactions. Protein Sci, 17: 72-78, 2008. Hosaka, Y., Iwata, M., Kamiya, N., Yamada, M., Kanamori, E., Murakami, Y., Tsuchiya, Y., Stan- Kinoshita, K., Fukunishi, Y., Tsujimae, K., dley, D.M., Nakamura, H., and Kinoshita, K. Hibino, H., Aizawa, Y., Inanobe, A., Naka- Docking of protein molecular surfaces with mura, H., and Kurachi, Y. Mutational analysis evolutionary trace analysis. Proteins 69: 832- of block and facilitation of HERG current by a 838, 2007. class III anti-crrhythmic agent, nifekalant. Okumura, T., Makiguchi, H., Makita, Y., Channels 1: 198-208, 2007. Yamashita, R., and Nakai, K. Melina II: a web 蒔田由布子,中井謙太.第2章10.転写因子とシ tool for comparisons among several predictive スエレメントの相互作用解析に役立つデータ algorithms to find potential motifs from pro- ベースとウェブツール.礒辺・中山・伊藤編, moter regions. Nucl. Acids Res. 35: W227-W 分子間相互作用解析ハンドブック(実験ハンド 231, 2007. ブックシリーズ).羊土社.pp.210―217,2007. Horton,P.,Park,K.-J.,Obayashi,T.,Fujita,N., 鈴木穣,山下理宇,中井謙太,菅野純夫.転写制 Harada, H., Adams-Collier, C.J., and Nakai, 御領域(プロモーター領域)の同定と解析~ K. WoLF PSORT: protein localization predic- DBTSS/TRANSFAC~.中村・礒合・石川・平 tor. Nucl. Acids Res. 35: W585-W587, 2007. 川・坊農編,バイオデータベースとウェブツー Tsuritani, K., Irie, T., Yamashita, R., Wakaguri, ルの手とり足とり活用法 改訂第2版.羊土 H., Kanai, A., Mizushima-Sugano, J., Sugano, 社.pp.65―75,2007. S., Nakai, K., and Suzuki, Y. Distinct class of ポール・ホートン,中井謙太.PSORT(ピーソー putative “non-conserved” promoters in hu- ト).中村・礒合・石川・平川・坊農編,バイ mans: comparative studies of alternative pro- オデータベースとウェブツールの手とり足とり moters of human and mouse genes. Genome 活用法 改訂第2版.羊 土 社.pp. 108―114, Res. 17(7): 1005-1014, 2007. 2007. Sakakibara, Y., Irie, T., Suzuki, Y., Yamashita, 木下賢吾.タンパク質間相互作用に関するデータ R., Wakaguri, H., Kanai, A., Chiba, J., Takagi, ベースの構築.生体の科学.58:334―337,2007. T., Mizushima-Sugano, J., Hashimoto, S., 140

Human Genome Center Department of Public Policy 公共政策研究分野

Associate ProfessorKaoriMuto,Ph.D. 准教授 保健学博士武藤香織

Department of Public Policy is a new and the first social science section at the IMSUT. We work for three major missions; public policy studies on translational research, its application to healthcare and its impact on social security; practical advices and survey for research projects to build public trust; and “minority- centered” scientific communication. We have conducted a comparative political study on genetic testing business in East Asia and supported for “BioBank Japan” project from ethical, legal and social standpoints.

1. A comparative political study on genetic testing guidelines in 2006, which suggests ban- testing business in East Asia ning certain types of genetic tests for multi- factorial common diseases with poor scientific To examine broader social and cultural agen- evidences. Finally, the Ministry banned 20 types das on industrialization of genetic tests and sug- of genetic tests to be supplied for the clinical gest policy implications in East Asia, we’ve been and business purposes, which had been previ- conducting this research. In the first phase, we ously supplied by Korean bioindustries. The have surveyed political and legal positions re- banned genetics tests include obesity, all can- garding genetic testing business, especially ge- cers, Alzheimer’s, high blood pressures, diabe- netic tests related to lifestyle and multi-factorial tes, osteoporosis, and asthma. Genetic testing for common diseases directly to the public or via research purposes are out of scopes of these clinics in Japan, China, Taiwan and South Ko- policies. South Korean bioethics law will be re- rea. We’ve interviewed main players in this vised within a year or two and add statutes for area; the relevant authorities, bioindustries, phy- regulating genetic testing. With regards to labo- sicians, academics and patients support groups. ratory quality assurance, the Ministry of Health We also conducted literature reviews regarding and Welfare requires all genetic testing labs in regulations. South Korea to submit applications about the One of the key preliminary findings is the list of genetic tests they provide and other infor- contrary regulative differences between South mation to qualify their laboratory quality to ob- Korea and Japan. After the fabrication of Hwang tain licenses from the Ministry. The licensed labs Woo-suk’s stem cell cloning and unethical hu- have to accept laboratory inspection by the Ko- man egg collection, bioethics law has been dis- rean Institute of Genetic Testing Evaluation cussed for revision and the government seeks (KIGTE). more strict regulation towards life science and On the contrary, in Japan, we don’t have any healthcare. Then, the government takes strict po- legal regulation regarding genetic information sitions towards bioindustries which provide ge- and genetic tests. Ministry of Health and Wel- netic tests for children directly to the public or fare (MHLW) hasn’t taken any action towards via clinics. The Korean National Bioethics Advi- genetic testing markets and leave self-regulation sory Commission suggested the first genetic by bioindustries. The Council for Protection of 141

Individual Genetic Information (CPIGI), which alized consent process” and we have to change has been consisted of 25 bioindustries, published the relationship between participants and re- their first guidelines for standardization and searchers, from one-time informed consent to business ethics on genetic tests marketing in long lasting public trust. 2007, supported by the Ministry of International Obtaining feedbacks from participants is also Trade and Industry (MITI). The MHLW has effective to keep incentives to participation and made metabolic syndrome countermeasures and prevent dropout of participants from research regular health check-ups for employees over 40 process. We conducted three kinds of surveys to will be changed to stratify individual risks for evaluate and improve the consent process and metabolic syndromes since April 2008. It is obvi- explore what the project should do for public in- ous that bioindustries are ready to enter into the volvement; questionnaire surveys towards re- fast-growing market with genetic testing for search participants, a web-based questionnaire obesity, which is banned in South Korea for survey towards all MCs and focus group inter- clinical use. We could further compare the social views with chief MCs to triangulate the consent and cultural difference between these neighbors, process. The preliminary results show that par- such as notions of individual responsibility and ticipants are basically satisfied with the consent individuals’ right to obtain genetic information process and highly evaluate MCs’ attitudes to- for their healthcare management. We’re now wards them. Most MCs also responded that studying Taiwan and China’s political approach they have made their original efforts to make for genetic testing business to obtain wider im- their explanation easier and understandable spe- plications in East Asia. Some part of this project cifically towards the elderly. However, certain has been financially supported by the Japan Sci- amounts of participants have already forgotten ence and Technology Agency (JST) and the Min- about what for they have donated their DNA istry of Education, Culture, Sports, Science and and serums and the experience of watching the Technology (MEXT). DVD or the leaflet about the project overview. MCs explains that this project doesn’t have any 2. Ethical, legal and social support for “Bio- plans to disclose personal genotyped data to Bank Japan” project each participant, but a certain amount of partici- pants responded that they now want to see their For supporting “BioBank Japan” project, led own genotyped data or tentative research feed- by Professor Yusuke Nakamura of Laboratory of backs, while others are just satisfied with their Molecular Medicine of IMSUT, we’ve conducted contribution to genomic research without any three types of surveys and issued newsletters rewards. Even though participants should forget for participants. By the end of 2007, the project the fact that they gave consent for research, has obtained 200,000 written consent forms by MCs explain, encourage and appreciate partici- research coordinators called Medical Coordina- pants at each time and participants recall their tors (MC). The project trained nurses or phar- will for contribution. macists as MCs for obtaining fully informed To appreciate participants’ and MCs’ contri- consent from participants. This consent process bution to the project, we had issued “BioBank had been well-worked out in advance and is newsletters No.1” in August 2006 and “No.2” in complied with the government ethical guide- November 2006. We will explore more methods lines for genetic/genomic research. However, re- and opportunities to communicate with partici- cent publications show that the long and tedious pants. Because the current forms of BioBank consent process may not contribute to partici- newsletters are available only for the sighted pants’ understanding the overview of the re- with good eyesight, we make efforts for person- search, may be unethical rather than ethical. If alized information security to meet with dis- we long for “personalized medicine”, we should abilities of participants. think further about the construction of “person-

Publications

1. 武藤香織.遺伝病ピアサポートの可能性と課 最先端治療,福祉の実態まで~.阿部康二編 題.遺伝診療をとりまく社会―その科学的・ 著.新興医学出版社.187―192,2007. 倫理的アプローチ. 水口修紀・吉田雅幸監修, 3. 武藤香織.「オーダーメイド医療」からみえる 吉田雅幸・小笹由香編集.ブレーン出版. 科学性と不確実性.遺伝子技術の社会学.柘 149―158,2007. 植あづみ・加藤秀一編著.文化書房博文社. 2. 武藤香織.神経難病当事者団体のネットワー 177―181,2007. キング.神経難病のすべて~症状・診断から 4. 柊中智恵子,武藤香織.遺伝に対する相談へ 142

の対応.難病医療専門員による難病患者のた pan. American Journal of Medical Genetics, めの難病相談ガイドブック.吉良潤一編.九 2008 (in press). 州大学出版会.64―87,2008. 6. Kaori Muto; Truth telling and predictive ge- 5. Izumi Ishiyama, Akiko Nagai, Kaori Muto, netic testing―Huntington’s Disease in Japan, Akiko Tamakoshi, Minori Kokado, Kyoko In Predictive & Genetic Testing in Asia: Mimura, Tetsuro Tanzawa, Zentaro Yama- social-science perspectives on the ramification gata. Relationship between Public Attitudes of choice. (University of Amsterdam Press, toward Genomic Studies Related to Medicine The Netherlands), 2008 (in press). and Their Level of Genomic Literacy in Ja-