150

Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野

Professor Minoru Kanehisa, Ph. D. 教 授 理学博士金久 實 Research Associate Toshiaki Katayama, M. Sc. 助 手 理学修士 片山俊明 Research Associate Shuichi Kawashima, M. Sc. 助 手 理学修士 川島秀一 Lecturer Tetsuo Shibuya, Ph. D. 講 師 理学博士 渋谷哲朗 Research Associate Michihiro Araki, Ph. D. 助 手 薬学博士 荒木通啓

Owing to continuous developments of high-throughput experimental technologies, ever-increasing amounts of data are being generated in functional genomics and proteomics. We are developing a new generation of databases and computational technologies, beyond the traditional genome databases and sequence analysis tools, for making full use of such large-scale data in biomedical applications, espe- cially for elucidating cellular functions as behaviors of complex interaction systems.

1. Comprehensive repository for community to make the system consistent with the existing genome annotation open standards. Thus, the contents of the KEGG DAS server can be accessed programatically by Toshiaki Katayama and Minoru Kanehisa the DAS protocol and graphically in a web browser using GBrowse. BioDAS, which is an KEGG DAS is a DAS (Distributed Annotation XML over HTTP data retrieving protocol, en- System) service for all organisms in the ables the user to write various kinds of auto- GENOME and databases in KEGG mated programs for analyzing genome se- (Kyoto Encyclopedia of Genes and Genomes). It quences and annotations. For example, by com- started as part of the standardization efforts bining KEGG DAS with KEGG API, a program along with KEGG API and KGML, which are a to retrieve upstream sequences of a given set of SOAP based KEGG web service and XML repre- genes with similar expression patterns on the sentation of KEGG pathways, respectively. In same pathway can be written very easily. Fur- addition to the genome sequences, the KEGG thermore, GBrowse provides a graphical web in- DAS server also provides KEGG annotations for terface for the KEGG DAS server to browse, each linked to the PATHWAY and search, zoom and visualize a particular region LIGAND databases, as well as the SSDB data- of the genome. Thanks to the powerful feature base containing paralog, ortholog and motif in- of the GBrowse, users are also able to add their formation. We have been developing the server own annotations onto the KEGG DAS view by basedonopensourcesoftwareincluding uploading their data. This enables researchers to BioRuby, BioPerl, BioDAS and GMOD/GBrowse annotate the genome by themselves, and by 151 sharing the annotations with the community ing the KEGG system, such as searching and they can continuously refine the annotation. computing biochemical pathways in cellular This is so-called “community annotation.” processes or analyzing the universe of genes in KEGG DAS is available at http://das.hgc.jp/. the completely sequenced genomes. The users access the KEGG API server using the SOAP 2. Automatic assignments of orthologs and technology over the HTTP. The SOAP server paralogs in complete genomes also comes with WSDL, which makes it easy to build a client library for a specific computer lan- Toshiaki Katayama and Minoru Kanehisa guage. This enables the users to write their own programs for many different purposes and to In addition to the human intensive efforts in automate the procedure of accessing the KEGG the community databases, we are developing a API server and retrieving the results. Over the computational method for automating genome first year after the initial release of the KEGG annotations. It is based on a graph analysis of API, we have received numerous feedbacks be- the KEGG SSDB database, containing sequence yond our expectations. Making use of these sug- similarity relations among all the genes in the gestions, we are continuously developing the completely sequenced genomes. The nodes of KEGG API to refine the interface to the KEGG the SSDB graph are genes (currently about system. Recent key changes include the follow- 680,000 genes in over 200 genomes) and the ing: (1) Methods for retrieving the pathway in- edges are the Smith-Waterman sequence similar- formation have been re-organized for consis- ity scores computed by the SSEARCH program tency. (2) Wrapper methods for the DBGET/ (currently over 300 million edges above the LinkDB system have been fully incorporated to threshold score of 100). The edges are not only enable complicated database search. DBGET/ weighted but also directed, indicating the best LinkDB serves as a database retrieval system be- (top-scoring) hit when a gene in an organism is hind the KEGG system. (3) Several new meth- compared against all genes in another organism. ods (e.g. methods for retrieving a set of ho- Thus, a highly connected cluster of nodes con- molog genes) have been added. (4) To control taining a number of bidirectional best hits might the number of results, ‘start’ and ‘max_results’ be considered an ortholog cluster consisting of arguments were introduced to the methods that functionally identical genes. Such a cluster can return vast amounts of results at once. The be found by our heuristic method for finding KEGG API is available at http://www.genome. “quasi-cliques”, but the SSDB graph is too large jp/kegg/soap/. to perform quasi-clique finding at a time. There- fore, we introduce a hierarchy (evolutionary re- 4. High performance database entry retrieval lationship) of organisms and treat the SSDB system graph as a nested graph. The automatic decom- position of the SSDB graph into a set of quasi- Kazutomo Ushijima, Chiharu Kawagoe, Toshi- cliques results in the KEGG OC (Ortholog Clus- aki Katayama, Shuichi Kawashima, Hideo ter) database, which is regularly updated and Bannai, Kenta Nakai, Minoru Kanehisa made avaiable at http://www.genome.jp/kegg- bin/OC/count/lookup. Recently, the number of entries in biological databases is exponentially increasing year by 3. SOAP/WSDL interface for the KEGG sys- year. For example, there were 10,106,023 entries tem in the GenBank database in the year 2000, which has now grown to 43,918,359 (Release 144). Al- Shuichi Kawashima, Toshiaki Katayama and though we have provided a simple entry re- Minoru Kanehisa trieval system covering several major databases for several years at the Center, We have started developing a new web serv- searches have come to require quite a long time. ice, named KEGG API, to facilitate usability of Therefore, in order to continue to utilize the the KEGG system. KEGG is a suite of databases rapidly growing databases, it has become a criti- and associated software, integrating our current cal issue to develop a new data retrieval system knowledge of molecular interaction networks in that can scan the whole database at a high biological processes (PATHWAY database), the speed. Furthermore, as the number of matched information about the universe of genes and entries by a simple keyword search increases (GENES/SSDB databases), and the in- along with the growth of the database size, a formation about the universe of chemical com- new mechanism is needed to narrow down the pounds and reactions (LIGAND database). The results by applying a field specific search. To KEGG API provides valuable means for access- meet these needs, we have developed a new 152 high performance database entry retrieval sys- ous types of biological functions of given com- tem, named HiGet. The HiGet system is con- pounds. However, exisiting structure compari- structed on the HiRDB, a commercial ORDBMS son methods for organic compounds, not to say (Object-oriented Relational Database Manage- methods for similarity search on chemical struc- ment System) developed by Hitachi, Ltd., and is ture databases, are not well-suited for under- publicly accessible at http://higet.hgc.jp/. standing biological functions. Hence we are de- HiGet can execute full text search on various veloping new algorithms and data structures for biological databases. In addition to the original indexing and/or searching structures of various plain format, the system contains data in the kinds, including compounds, polyketides, nonri- XML format in order to provide a field specific bosomal peptides, and proteins. For polyketides search facility. When a complicated search con- and nonribosomal peptides, we already devel- dition is issued to the system, the search proc- oped algorithms for searching enzymes that are essing is executed efficiently by combining sev- used to synthesize them. For proteins, we are eral types of indices to reduce the number of re- developing a new data structure based on geo- cords to be processed within the system. Cur- metric hashing and suffix trees to achieve fast rent searchable databases are GenBank, UniProt, search for structurally similar proteins. Prosite and OMIM. We are planning to include other valuable databases (such as PDB and Ref- 6. Chemogenomic dissection of the biosyn- Seq) and also planning to develop an interdata- thetic process of medicinal natural prod- base search interface and a complex search facil- ucts ity combining keyword search and sequence similarity search. Michihiro Araki, Tetsuo Shibuya, and Minoru Kanehisa 5. Development of algorithms for searching similar patterns in molecular structures Medicinal natural products are enzymically synthesized as secondary metabolites for specific Tetsuo Shibuya, Michihiro Araki and Minoru biological purposes, which have been the major Kanehisa sources of bioactive compounds with diverse pharmacological activities. In order to make full Finding similar patterns in biological se- use of the potential of natural products as re- quences of proteins and DNAs is a critical step search tools as well as drug leads on the context in identifying biological functions of these mole- of metabolic engineering, it is of great impor- cules. Algorithms to search sequence similarities tance to understand the biosynthetic strategies were developed in conjunction with the avail- with the integrative computational analyses of ability of DNA and databases. In recent chemical and genomic information. An increas- years carbohydrate sequence data have become ing amount of such information becomes avail- publicly available, such as in the KEGG GLY- able to allow us to extract the chemical design CAN database, and algorithms to search similar principles of the natural products coded on tree structures have also been developed. Fur- genomic information. We have thus started thermore, with the emphasis on chemical identifying the system structures of the biosyn- genomics, databases of chemical compounds are thetic devices by merging the chemical informa- now rapidly expanding. Thus, it is desired to tion of molecular building blocks and module find compounds with similar chemical struc- structures into the genomic information of the tures from the databases and to elucidate vari- biosynthetic pathway.

Publications

Kanehisa, M, Goto, S, Kawashima, S, Okuno, Y, Cluster) and its assignment to draft genomes, Hattori, M: The KEGG resource for decipher- Genome Informatics, 15: P049, 2004. ing the genome. Nucleic Acids Res., 32: D277- Masoudi-Nejad, A, Jauregui, R, Kawashima, S, 80, 2004. Goto,S,Kanehisa,M,Endo,TR:Thekingdom Okamoto, S, Kawashima, S, Narikawa, R, Kane- of plantae EST indices: a resource for plant hisa, M: Correlation between signal transduc- genomics community, Genome Informatics, tion domains and habitats in cyanobacteria. 15: P102, 2004. Genome Informatics, 15: P054, 2004. Yoshizawa, A, Okuda, S, Moriya, Y, Itoh, M, Moriya, Y, Katayama, T, Nakaya, A, Itoh, M, Katayama, T, Kawano, S, Okamoto, S, Yoshizawa, A, Okuda, S, Kanehisa M: Auto- Kawashima, S, Kanehisa, M: Functional classi- matic generation of KEGG OC (Ortholog fication of coiled-coil proteins in multiple 153

genomes, Genome Informatics, 15: P157, 2004. Kwan, A., Miyano, S., Okazaki, Y. and Shibuya, Tamada,Y,Bannai,H,Imoto, S, Katayama, T, T. What are the applications and limitations Kanehisa, M, Miyano, S: Modeling gene net- of microarray datamining for immunology? works utilizing evolutionary information us- First International Immunoinformatics Sympo- ing Bayesian network models, Genome Infor- sium, 2004. matics, 15: P020, 2004. Araki, M, Katayama, T, Matsuura, Y, Kanehisa, Okuda, S, Yoshizawa, A, Moriya, Y, Itoh, M, M., KEGG PEPTIDE: a database for peptide Katayama,T,Goto,S,Kanehisa,M:Functional structures, IBSB 2004, 1-2, 2004. categorization of multiple genomes using Kobayashi, H, Kaern, M, Araki, M, Chung, K, KEGGOCintheGenomeIndecies,Genome Gardner, TS, Cantor, CR, Collins, JJ., Program- Informatics, 15: P050, 2004. mable cells: interfacing natural and engi- Yuasa, T., Hayashi, T., Ikai, N., Katayama, T., neered gene networks. Proc. Natl. Acad. Sci. Aoki, K., Obara, T., Toyoda, Y., Maruyama, U.S.A., 101: 8414-8419, 2004. T., Kitagawa, D., Takahashi, K., Nagao, K., 渋谷哲朗,バイオインフォマティクスにおける Nakaseko, Y. and Yanagida, M. 2004. An in- データマイニングの基礎,ゲノム研究実験ハン teractive gene network for securin-separase, ドブック,辻本豪三,田中利男編,羊土社, condensin, cohesin, Dis1/Mtc1 and histones pp.45―48,2004. constructed by mass transformation, Genes 片山俊明,インターネットの利用.ゲノム研究実 Cells, 9: 1069-1082, 2004. 験ハンドブック,辻本豪三,田中利男編,羊土 Shibuya,T.,Kashima,H.andKonagaya,A.Ac- 社.pp.20―29,2004. curate cDNA Clustering Algorithm based on 川島秀一,プログラムからのデータベースアクセ Spliced Sequence Alignment, Bioinformatics, ス.ゲノム研究実験ハンドブック,辻本豪三, 20: 29-39, 2004. 田中利男編,羊土社,pp.109―113,2004. Shibuya, T. Generalization of a Suffix Tree for 荒木通啓,服部正泰,金久實,バイオインフォマ RNA Structural Pattern Matching, Algo- ティクスとケモインフォマティクスの融合,現 rithmica, 39: 1-19, 2004 代医療,Vol.36,№5,pp.86―92,2004 154

Human Genome Center Laboratory of DNA Information Analysis DNA情報解析分野

Professor Satoru Miyano, Ph.D. 教 授 理学博士 宮 野 悟 Research Associate Seiya Imoto, Ph.D. 助 手 理学博士 井元清哉 Research Associate Hideo Bannai, Ph.D. 助手博 士 坂内英夫 (情報理工学) Instructor Michiel J.L. de Hoon 特任教員 Ph.D. Michiel J.L. de Hoon

The aim of the research at this laboratory is to establish computational methodolo- gies for discovering and interpreting information of nucleic acid sequences, pro- teins and some other experimental data arising from researches in Genome Sci- ence. Our current concern is focused on Computational Systems Biology and its related computational techniques. Apart from the research activity, the laboratory has been providing bioinformatics software tools and has been taking a leading part in organizing an international forum for Genome Informatics.

1. Computational Systems Biology improves the accuracy of the estimated gene networks, and successfully identifies some bio- a. Using protein-protein interactions for refin- logical facts. ing gene networks estimated from mi- croarray data by Bayesian networks b. Superiority of network motifs over optimal networks and an application to the revela- Naoki Nariai, Sunyong Kim, Seiya Imoto, Sa- tion of gene network evolution toru Miyano Sascha Ott, Annika Hansen1, Sunyong Kim, We propose a statistical method to estimate and Satoru Miyano: 1Wolfson Institute for Bio- gene networks from DNA microarray data and medical Research, University College London protein-protein interactions. Because physical in- teractions between proteins or multiprotein Estimating the network of regulative interac- complexes are likely to regulate biological proc- tions between genes from gene expression meas- esses, using only mRNA expression data is not urements is a major challenge. Recently, we sufficient for estimating a gene network accu- have shown that for gene networks of up to rately. Our method adds knowledge about around 35 genes, optimal network models can protein-protein interactions to the estimation be computed. However, even optimal gene net- method of gene networks under a Bayesian sta- work models will in general contain false edges, tistical framework. In the estimated gene net- since the expression data will not unambigu- work, a protein complex is modeled as a virtual ously point to a single network. In order to node based on principal component analysis. overcome this problem, we present a computa- We show the effectiveness of the proposed tional method to enumerate the most likely m method through the analysis of Saccharomyces networks and to extract a widely common sub- cerevisiae cell cycle data. The proposed method graph (denoted as gene network motif) from 155 these. We apply the method to bacterial gene edge. We also use Monte Carlo simulations to expression data and extensively compare esti- investigate the properties of the proposed mation results to knowledge. Our results reveal method. Real data analysis detects complicated that gene network motifs are in significantly bet- effects as well as a set of parent genes from a ter agreement to biological knowledge than opti- large set of candidates even when there are mal network models. We also confirm this ob- complicated relationship between a target gene servation in a series of estimations using syn- and its parent genes. thetic microarray data and compare estimations by our method with previous estimations for e. Functional data analysis of the dynamics yeast. Furthermore, we use our method to esti- of gene regulatory networks mate similarities and differences of the gene net- works that regulate tryptophan metabolism in Tomohiro Ando, Seiya Imoto, Satoru Miyano two related species and thereby demonstrate the analysis of gene network evolution. A new method for constructing gene net- works from microarray time-series gene expres- c. Finding optimal models for small gene net- sion data is proposed in the context of Bayesian works network approach. An essential point of Baye- sian network modeling is the construction of the Sascha Ott, Seiya Imoto, Satoru Miyano conditional distribution of each random vari- able. When estimating the conditional distribu- Finding gene networks from microarray data tions from gene expression data, a common has been one focus of research in recent years. problem is that gene expression data contain Given search spaces of super-exponential size, multiple missing values. Unfortunately, many researchers have been applying heuristic ap- methods for constructing conditional distribu- proaches like greedy algorithms or simulated tions require a complete gene expression value annealing to infer such networks. However, the and many loose effectiveness even with a few accuracy of heuristics is uncertain, which - in missing value. Additionally, they treat microar- combination with the high measurement noise ray time-series gene expression data as static of microarrays - makes it very difficult to draw data, although time can be an important factor conclusions from networks estimated by heuris- that affects the gene expression levels. We over- tics. We present a method that finds optimal come these difficulties by using the method of Bayesian networks of considerable size and functional data analysis. The proposed network show first results of the application to yeast construction method consists of two stages. data. Having removed the uncertainty due to Firstly, discrete microarray time-series gene ex- the heuristic methods, it becomes possible to pression values are expressed as a continuous evaluate the power of different statistical models curve of time. To account for the time depend- to find biologically accurate networks. ency of gene expression measurements and the noisy nature of the microarray data, P-spline d. Bayesian network and radial basis function nonlinear regression models are utilized. After network regression for nonlinear modeling this preprocessing step, the conditional distribu- of genetic network tion of each random variable is constructed based on functional linear regression models. Tomohiro Ando, Seiya Imoto, Satoru Miyano The effectiveness of the proposed method is in- vestigated through Monte Carlo simulations and A new method for constructing gene net- the analysis of Saccharomyces cerevisiae gene ex- works from microarray gene expression data is pression data. proposed using Bayesian networks and radial basis function network regression models. An f. Constructing biological pathway models essential point of Bayesian network modeling is with hybrid functional Petri nets the construction of the conditional distribution of each random variable. Unfortunately, many Atsushi Doi, Sachie Fujita2, Hiroshi Matsuno2, statistical methods often fail in capturing bio- Masao Nagasaki, Satoru Miyano: Faculty of logical systems when a target gene receives Science, Yamaguchi University complicated effects from its parent genes. To capture the complicated biological systems, ra- In many research projects on modeling and dial basis function network regression models analyzing biological pathways, the Petri net has are utilized. We analyze Saccharomyces cerevisiae been recognized as a promising method for rep- gene expression data and evaluate the resulting resenting biological pathways. From the pio- network by comparing with biological knowl- neering works by Reddy et al. (1993), and 156

Hofest? dt (1994), that model metabolic path- using a software “Genomic Object Net” which is ways by traditional Petri net, several enhanced developed based on hybrid functional Petri net Petri nets such as colored Petri net, stochastic (HFPN) architecture. However, in this model, Petri net, and hybrid Petri net have been used cellular formation is fixed throughout the simu- for modeling biological phenomena. Recently, lation. Then, this paper constructs an HFPN Matsuno et al. ((PSB 8: 152-163, 2003), intro- model of Xenopus cell cycle pathway which in- duced the hybrid functional Petri net (HFPN) in cludes the mechanism for cell division control as order to give a more intuitive and natural mod- well as checkpoint processes. With this model, eling method for biological pathways than these dynamic cell division processes of Xenopus early existing Petri nets. Although the paper demon- embryo including the changes in cell division strates the effectiveness of HFPN with two ex- cycles from synchronous to asynchronous are amples of gene regulation mechanism for circa- simulated. dian rhythms and apoptosis signaling pathway, there has been no detailed explanation about the i. A versatile Petri net based architecture for method of HFPN construction for these exam- modeling and simulation of complex bio- ples. The purpose of this paper is to describe logical processes method to construct biological pathways with the HFPN step-by-step. The method is demon- Masao Nagasaki, Atsushi Doi, Hiroshi Mat- strated by the well-known glycolytic pathway suno2, Satoru Miyano controlled by the lac operon gene regulatory mechanism. The research on modeling and simulation of complex biological systems is getting more im- g. Modeling and simulation of fission yeast portant in Systems Biology. In this respect, we cell cycle on hybrid functional Petri net have developed Hybrid Function Petri net (HFPN) that was newly developed from existing Sachie Fujita2,MikaMatsui2,HiroshiMat- Petri net because of their intuitive graphical rep- suno2, Satoru Miyano resentation and their capabilities for mathemati- cal analyses. However, in the process of model- Through many researches on modeling and ing metabolic, gene regulatory or signal trans- analyzing biological pathways, Petri net has duction pathways with the architecture, we been recognized as a promising method for rep- have realized three extensions of HFPN, (i) an resenting biological pathways. Recently, Mat- entity should be extended to contain more than suno et al. (PSB 8: 152-163, 2003) introduced hy- one value, (ii) an entity should be extended to brid functional Petri net (HFPN) for giving more handle other primitive types, e.g. Boolean, intuitive and natural biological pathway model- string, (iii) an entity should be extended to han- ing method than existing Petri nets. They also dlemoreadvancedtypecalledobjectthatcon- developed Genomic Object Net (GON) which sists of variables and methods, are necessary for employs the HFPN as a basic architecture. Many modeling biological systems with Petri net kinds of biological pathways have been modeled based architecture. To deal with it, we define a with the HFPN and simulated by GON. This new enhanced Petri net called hybrid functional paper gives a new HFPN model of cell cycle of Petri net with extension (HFPNe). To demon- fission yeast with giving six basic HFPN compo- strate the effectiveness of the enhancements, we nents of typical biological reactions, and demon- model and simulate with HFPNe four biological strating the method how biological pathways processes that are difficult to represent with the can be modeled with these HPFN components. previous architecture HFPN. Simulation results by GON suggest a new hy- pothesis which will help biologists for perform- j. Integrating biopathway databases for large- ing further experiments. scale modeling and simulation h. Simulated cell division processes of the Masao Nagasaki, Atsushi Doi, Hiroshi Mat- Xenopus cell cycle pathway by Genomic suno2, Satoru Miyano Object Net Biopathway databases have been developed, Mika Matsui2, Sachie Fujita2, Shunichi Suzuki2, such as KEGG and EcoCyc that compile interac- Hiroshi Matsuno2, Satoru Miyano tion structures of biopathways together with biological annotations. However, these biopath- Matsuno et al. (PSB 8: 152-163, 2003) modeled ways are not directly editable and simulatable. and simulated a Drosophila multicellular pat- Thus,wehavemadeanapplication,the terning by Delta-Notch signaling pathway by Biopathway Executer (BPE) that reconstructs 157 these two major biopathway databases to XML DBTBS database (http://dbtbs.hgc.jp). formats of modeling and simulation platforms. BPE is developed with JAVA and has a data- l. Predicting gene regulation by sigma factors base of executable biopathways that integrates in Bacillus subtilis from genome-wide data some parts of biopathway information, KEGG and BioCyc, and other databases, e.g. MIPS and Michiel J.L. de Hoon, Yuko Makita, Seiya BRENDA. Currently, BPE employs the XML for- Imoto, Kazuo Kobayashi3, Naotake Ogasawara3, mat (GONML) of a Hybrid Functional Petri net Kenta Nakai, Satoru Miyano (HFPN) for the output. The features of HFPN are: (i) biopathways that contain discrete and Sigma factors regulate the expression of genes continuous processes can be modeled, (ii) all in Bacillus subtilis at the transcriptional level. We biopathways that are modeled with ordinary assess the accuracy of a fold-change analysis, differential equations (ODEs) can be remodeled, Bayesian networks, dynamic models and super- (iii) biopathways can be modeled while keeping vised learning based on coregulation in predict- readability by human. Other XML formats of ing gene regulation by sigma factors from gene biopathways, SBML and CellML are subsets of expression data. To improve the prediction ac- GONML. Thus, BPE can bridge major biopath- curacy, we combine sequence information with way databases and major modeling and simulat- expression data by adding their log-likelihood ing softwares. To demonstrate the effectiveness/ scores and by using a logistic regression model. usability of BPE, four examples are created and We use the resulting score function to discover simulated on Genomic Object Net which is currently unknown gene regulations by sigma based on the HFPN architecture; (i) executable factors. The coregulation-based supervised KEGG maps while keeping the features of origi- learning method gave the most accurate predic- nal maps, (ii) executable BioCyc maps while tion of sigma factors from expression data. We keeping the features of original maps, (iii) large- found that the logistic regression model effec- scale editable and simulatable KEGG metabolic tively combines expression data with sequence pathways, (iv) a metabolic pathway with gene information. In a genome-wide search, highly regulatory networks. These examples show that significant logistic regression scores were found BPE is a useful tool for integrating biopathway for several genes whose transcriptional regula- databases for large scale modeling and simula- tion is currently unknown. We provide the cor- tion. responding RNA polymerase binding sites to enable a straightforward experimental verifica- k. Predicting the operon structure of Bacillus tion of these predictions. subtilis using operon length, intergene distance, and gene expression information m. A neural network method for identification of RNA-interacting residues in protein Michiel J.L. de Hoon, Seiya Imoto, Kazuo Ko- bayashi3, Naotake Ogasawara3, Satoru Miyano: Euna Jeong, I-Fang Chung, Satoru Miyano 3Nara Institute of Science and Technology Identification of the most putative RNA- We predict the operon structure of the Bacillus interacting residues in protein is an important subtilis genome using the average operon length, and challenging problem in a field of molecular the distance between genes in base pairs, and recognition. Structural analysis of protein-RNA the similarity in gene expression measured in complexes reveals a strong correlation between time course and gene disruptant experiments. interaction residues and their structure. Building By expressing the operon prediction for each on this viewpoint, we have developed a neural method as a Bayesian probability, we are able to network predictor to correctly identify residues combine the four prediction methods into a Bay- involved in protein-RNA interactions from pro- esian classifier in a statistically rigorous manner. tein sequence and its structural information. The The discriminant value for the Bayesian classi- system has been exhaustedly cross-validated fier can be chosen by considering the associated with various strategies differing in input encod- cost of misclassifying an operon or a nonoperon ing, amount of input information, and network gene pair. For equal costs, an overall accuracy of architectures. In addition, we have evaluated 88.7% was found in a leave one-out analysis for performance among functional subsets of com- the joint Bayesian classifier, whereas the individ- plexes. Finally, to reflect the properties of ual information sources yielded accuracies of protein-RNA complexes in our dataset, two 58.1%, 83.1%, 77.3%, and 71.8% respectively. kinds of post-processing method are adopted. The predicted operon structure based on the The experimental result shows that our system joint Bayesian classifier is available from the yields not-trivial performance although the resi- 158 dues in interaction sites are too scarce. Akaike information criterion (AIC) to find the most effective set of SNPs and environments. 2. Algorithmic and Statistical Methods for That is, the set of SNPs and environments that Bioinformatics gives the smallest AIC is chosen as the optimal set. Since the number of combinations of SNPs a. Finding optimal pairs of patterns and environments is usually huge, we propose the use of the genetic algorithm for choosing the Hideo Bannai, Heikki Hyyrö4, Ayumi Shino- optimal SNPs and environments in the sense of hara4, Masayuki Takeda4, Kenta Nakai, Satoru AIC. We show the effectiveness of the proposed Miyano: 4Department of Informatics, Kyushu method through the analysis of the case/control University populations of diabetes patients. We succeeded in finding an efficient set of to predict types of We consider the problem of finding the opti- diabetes and some SNPs which have strong in- mal pair of string patterns for discriminating be- teraction to age while it is not significant as a tween two sets of strings, i.e. finding the pair of single locus. patterns that is best with respect to some appro- priate scoring function that gives higher scores c. Kernel mixture survival models for identify- to pattern pairs which occur more in the strings ing cancer subtypes, predicting patient’s ofoneset,butlessintheother.Wepresentan cancer types and survival probabilities O(N2) time algorithm for finding the optimal pair of substring patterns, where N is the total Tomohiro Ando, Seiya Imoto, Satoru Miyano length of the strings. The algorithm looks for all possible Boolean combination of the patterns, e. One important application of microarray gene g. patterns of the form p ˆ _q, which indicates expression data is to study the relationship be- that the pattern pair is considered to match a tween the clinical phenotype of cancer patients given string s, if p occurs in s, AND q does and gene expression profiles on the whole- NOT occur in s. The same algorithm can be ap- genome scale. The clinical phenotype includes plied to a variant of the problem where we are several different types of cancers, survival times, given a single set of sequences along with a nu- relapse times, drug responses and so on. Under meric attribute assigned to each sequence, and the situation that the subtypes of cancer have the problem is to find the optimal pattern pair not been previously identified or known to ex- whose occurrence in the sequences is correlated ist, we develop a new kernel mixture modeling with this numeric attribute. An efficient imple- method that performs simultaneously identifica- mentation based on suffix arrays is presented, tion of the subtype of cancer, prediction of the and the algorithm is applied to several nucleo- probabilities of both cancer type and patient’s tide sequence datasets of moderate size, com- survival, and detection of a set of marker genes bined with microarray gene expression data, on which to base a diagnosis. The proposed aiming to find regulatory elements that cooper- method is successfully performed on real data ate, complement, or compete with each other in analysis and simulation studies. enhancing and/or silencing certain genomic functions. d. A mixed factors model for dimension re- duction and extraction of a group struc- b. Case-control study of binary trait consider- ture in gene expression data ing interactions between SNPs and envi- ronmental effects using logistic regression Ryo Yoshida5, Tomoyuki Higuchi5,Seiya Imoto: 5Institute of Statistical Mathematics Reiichro Nakamichi, Seiya Imoto, Satoru Miy- ano When we cluster tissue samples on the basis of genes, the number of observations to be In this paper, we propose a combination of lo- grouped is much smaller than the dimension of gistic regression and genetic algorithm for the feature vector. In such a case, the applicability association study of the binary disease trait. We of conventional model-based clustering is lim- use a logistic regression model to describe the ited since the high dimensionality of feature relation of multiple SNPs, environments and the vector leads to overfitting during the density es- target binary trait. The logistic regression model timation process. To overcome such difficulty, can capture the continuous effects of environ- we attempt a methodological extension of the ments without categorization, which causes the factor analysis. Our approach enables us not loss of the information. To construct an accurate only to prevent from the occurrence of overfit- prediction rule for binary trait, we adopted ting, but also to handle the issues of clustering, 159 data compression and extracting a set of genes The potential usefulness is demonstrated with to be relevant to explain the group structure. the application to the leukemia dataset.

Publications

Akutsu, T., Bannai, H., Miyano, S., Ott, S. On Predicting gene regulation by sigma factors in the complexity of deriving position specific Bacillus subtilis from genome-wide data. Bioin- score matrices from positive and negative se- formatics, 20 (Suppl. 1): i101-i108, 2004. quences. Discrete Applied Mathematics. In Doi,A.,Fujita,S.,Matsuno,H.,Nagasaki,M., press. Miyano, S. Constructing biological pathway Ando, T., Imoto, S., Miyano, S. Bayesian net- models with hybrid functional Petri nets. In work and radial basis function network re- Silico Biology. 4(3): 271-291, 2004. gression for nonlinear modeling of genetic Fujita, S., Matsui, M., Matsuno, H., Miyano, S. network. Proc. Third International Conference Modeling and simulation of fission yeast cell on Information. 561-564, 2004. cycle on hybrid functional Petri net. IEICE Ando, T., Imoto, S., Miyano, S. Functional data Transactions on Fundamentals of Electronics, analysis of the dynamics of gene regulatory Communications and Computer Sciences. E87- networks. Proc. Knowledge Exploration in A(11): 2919-2928, 2004. Life Science Informatics KELSI2004. Lecture Hirose, O., Nariai, N., Tamada, Y., Bannai, H., Notes in Artificial Intelligence. 3303: 69-83, Imoto, S., Miyano, S. Estimating gene net- 2004. works from expression data and binding loca- Ando, T., Imoto, S., Miyano, S. Kernel mixture tion data via boolean networks. Proc. 1st In- survival models for identifying cancer sub- ternational Workshop on Data Mining and types, predicting patient’s cancer types and Bioinformatics. Lecture Note in Comupter Sci- survival probabilities. Genome Informatics. 15 ence. In press. (2): 201-210, 2004. Imoto, S., Higuchi, T., Goto, T., Tashiro, K., Ku- Araki, Y., Konishi, S., Imoto, S. Functional dis- hara, S., Miyano, S. Combining microarrays criminant analysis for time-seriese gene ex- and biological knowledge for estimating gene pression data via radial basis function expan- networks via Bayesian networks. J. Bioinfor- sion. Proc. COMPSTAT 2004, 613-620, 2004. matics and Computational Biology. 2(1): 77- Physica-Verlag/Springer. 98, 2004. Bannai, H., Hyyrö, H., Shinohara, A., Takeda, Imoto,S.,Higuchi,T.,Kim,S.,Jeong,E.,Miy- M.,Nakai,K.,Miyano,S.AnO(N2) algorithm ano, S. Residual bootstrapping and median fil- for discovering optimal Boolean pattern pairs. tering for robust estimation of gene networks IEEE/ACM Transactions on Computational from microarray data. Proc. 2nd Computa- Biology and Bioinformatics. In press. tional Methods in Systems Biology. Lecture Bannai, H., Hyyrö, H., Shinohara, A., Takeda, Note in Bioinformatics. In press. M., Nakai, K., Miyano, S. Finding optimal Inenaga, S., Bannai, H., Hyyrö, H., Shinohara, pairs of patterns. Proc. 4th International A., Takeda, M., Nakai, K., Miyano, S. Finding Workshop on Algorithms in Bioinformatics optimal pairs of cooperative and competing (WABI 2004). Lecture Notes in Bioinformatics. patterns with bounded distance. Proc. 7th In- 3240: 450-462, 2004. ternational Conference on Discovery Science Bannai, H., Inenaga, S., Shinohara, A., Takeda, (DS 2004). Lecture Notes in Artifitial Intelli- M., Miyano, S. Efficiently finding regulatory gence. 3245: 32-46, 2004. elements using correlation with gene expres- Jeong, E., Chung, I., Miyano, S. A neural net- sion. J. Bioinformatics and Computational Bi- work method for identification of RNA- ology. 2(2): 273-288, 2004. interacting residues in protein. Genome Infor- De Hoon, M.J.L., Imoto, S., Kobayashi, K., matics. 15(1): 105-116, 2004. Ogasawara, N., Miyano, S. Predicting the op- Kim,S.,Imoto,S.,Miyano,S.DynamicBayesian eron structure of Bacillus subtilis using operon network and nonparametric regression for length, intergene distance, and gene expres- nonlinear modeling of gene networks from sion information. Pacific Symposium on Bio- time series gene expression data. Biosystems. computing. 9: 276-287, 2004. 75(1-3): 57-65, 2004. De Hoon, M.J.L., Imoto, S., Nolan, J., Miyano, S. Konishi,S.,Ando,T.,Imoto,S.Bayesianinfor- Open source clustering software. Bioinformat- mation criteria and smoothing parameter se- ics. 20(9): 1453-1454, 2004. lection in radial basis function networks. Bio- De Hoon, M.J.L., Makita, Y., Imoto, S., Kobay- metrika. 91(1): 27-43, 2004. ashi, K., Ogasawara, N., Nakai, K., Miyano, S. Matsui, M., Fujita, S., Suzuki, S., Matsuno, H., 160

Miyano, S. Simulated cell division processes Ohtsubo,S.,Iida,A.,Nitta,K.,Tanaka,T., of the Xenopus cell cycle pathway by Yamada, R., Ohnishi, Y., Maeda, S., Tsunoda, Genomic Object Net. J. Integrative Bioinfor- T., Takei, T., Obara, W., Akiyama, F., Ito, K., matics. 3: http://journal.imbio.de/index.php? Honda, K., Uchida, K., Tsuchiya, K., Yumura, paper_id=3, 2004. W., Ujiie, T., Nagane, Y., Miyano, S., Suzuki, Matsuno, H., Inouye, S-T., Okitsu, Y., Fujii, Y., Y.,Narita,I.,Gejyo,F.,Fujioka,T.,Nihei,H., Miyano, S. A new regulatory interactions sug- Nakamura, Y. Association of a single- gested by simulations for circadian genetic nucleotide polymorphism in the immuno- control mechanism in mammals. Proc. The 3 globulin mu-binding protein 2 gene with im- rd Asia-Pacific Bioinformatics Conference munoglobulin A nephropathy. J. Hum. Genet. 2005, Imperial College Press, In press. 50(1): 30-35, 2005. Miyano, S. Computational systems biology. Ott, S., Imoto, S., Miyano, S. Finding optimal Proc. Third International Conference on Infor- models for small gene networks. Pacific Sym- mation (Li, L. and Yen, K.K., eds.). 9-14, 2004. posium on Biocomputing. 9: 557-567, 2004. Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. Ott,S.,Hansen,A.,Kim,S.-Y.,andMiyano,S. Genomic Object Net: a platform for modeling (2005). Superiority of network motifs over op- and simulating biopathways. Applied Bioin- timal networks and an application to the reve- formatics. 2: 181-184, 2004. lation of gene network evolution. Bioinformat- Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. ics. 21(2): 227-238, 2005. Integrating biopathway databases for large- Takei,Y.,Inoue,K.,Ogoshi,M.,Kawahara,T., scale modeling and simulation. Proc. Second Bannai, H., and Miyano, S. Identification of Asia-Pacific Bioinformatics Conference (APBC novel adrenomedullin in mammals: a potent 2004) (Y.P. Chen, ed). Conferences in Research cardiovascular and renal regulator. FEBS Let- and Practice in Information Technology. 29: 43 ters, 556: 53-58, 2004. -52, 2004. Yoshida, R., Higuchi, T., Imoto, S. A mixed fac- Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. tors model for dimension reduction and ex- A versatile Petri net based architecture for traction of a group structure in gene expres- modeling and simulation of complex biologi- sion data. Proc. 3rd Computational Systems cal processes. Genome Informatics, 15(1): 180- Bioinformatics. 161-172, 2004. IEEE Press. 197, 2004. Yoshida, R., Imoto, S., Higuchi, T. A penalized Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. likelihood estimation on transcriptional Computational modeling of biological proc- module-based clustering. Proc. 1st Interna- esses with Petri net based architecture. In tional Workshop on Data Mining and Bioin- “Bioinformatics Technologies” (Y.P. Chen, ed). formatics. Lecture Note in Comupter Science. Springer Press. 179-243, 2005. In press. Nakamichi,R.,Imoto,S.,Miyano,S.Case- 土井淳,長崎正朗,松野浩嗣,宮野悟.Genomic control study of binary trait considering inter- Object Netによるパスウエイの表現とシミュ actions between SNPs and environmental ef- レーション.ゲノミクス・プロテオミクスの新 fects using logistic regression. Proc. 4th IEEE 展開~生物情報の解析と応用~(今中忠之編). Bioinformatics and Bioengineering. 73-78, エヌ・ティー・エス.930―937,2004. 2004. IEEE Press. 土井淳,長崎正朗,松野浩嗣,宮野悟.生命パス Nakano, M., Noda, R., Kitakaze, H., Matsuno, ウエイのモデル化・可視化技術と創薬研究への H., Miyano, S. XML pathway file conversion 応用.月刊薬事.46¾:1265―1272,2004. between Genomic Object Net and SBML. Proc. 宮野悟.ゲノムからバイオインフォマティクス The Third International Conference on Infor- へ.現代医療.36¼:1018―1021,2004. mation. 585-588, 2004. 宮野悟,松野浩嗣,倉田博之.システム生物学. Nariai,N.,Kim,S.,Imoto,S.,Miyano,S.Using ゲノム研究実験ハンドブック(辻本豪三,田中 protein-protein interactions for refining gene 利男編集),49―54,羊土社,2004. networks estimated from microarray data by 宮野悟.バイオパスウェイシミュレーション―医 Bayesian networks. Pacific Symposium on 学とコンピュータ.Molecular Medicine.41: Biocomputing. 9: 336-347, 2004. 328―331,2004. 161

Human Genome Center Laboratory of Molecular Medicine Laboratory of Genome Technology ゲノムシークエンス解析分野 シークエンス技術開発分野

Professor Yusuke Nakamura, M.D., Ph.D. 教 授 医学博士 中村祐輔 Professor Yoichi Furukawa M.D., Ph.D. 特任教授 医学博士 古川洋一 Associate Professor Toyomasa Katagiri, Ph.D. 助教授 医学博士 片桐豊雅 Assistant Professor Ryuji Hamamoto, Ph.D. 助 手 理学博士 浜本隆二 Assistant Professor Yataro Daigo, M.D., Ph.D. 助 手 医学博士 醍 醐 弥太郎 Assistant Professor Hidewaki Nakagawa, M.D., Ph.D. 助 手 医学博士 中川英刀 Assistant Professor Koichi Matsuda, M.D. Ph.D. 助 手 医学博士 松田浩一

The major goal of the Human Genome Project is to identify genes predisposing to diseases, and to develop new diagnostic and therapeutic tools. We have been at- tempting to isolate genes involving in carcinogenesis and also those causing or predisposing to other diseases such as IgA nephropathy, and Crohn’s disease. By means of technologies developed through the genome project including a high- resolution SNP map, a large-scale DNA sequencing, and the cDNA microarray method, we have isolated a number of biologically and/or medically important genes.

1. Genes playing significant roles in human infected by adenovirus vectors containing either cancer p53-121F (Ad-p53-121F ) or wild type p53 (Ad- wtp53). The STAG1 gene was one of the tran- a. Genes that are inducible by p53 scripts showing higher expression levels in cells infected with Ad-p53-121F as opposed to Ad- Yoshio Anazawa, Park Woong Ryeon, Chizu wtp53. The encoded product appears to contain Tanikawa, Kyongah Yoon, Hirohumi Arakawa, a transmembrane domain, and binding motifs Koichi Matsuda and Yusuke Nakamura for SH3 and WW. In two other cancer-cell lines, expression of STAG1 mRNA was induced in re- A mutant version of p53 (p53-121F), in which sponse to various genotoxic stresses in a p53- phenylalanine replaces the 121st serine residue, dependent manner; moreover, enforced expres- can induce apoptosis more effectively than wild- sion of STAG1 led to apoptosis in several addi- type p53 (wt-p53). In view of that observation tional cancer-cell lines. Suppression of endoge- we considered that one or more apoptosis- nous STAG1 using the RNA-interference related p53-target genes might be preferentially method reduced the apoptotic response , induced by p53-121F. We carried out cDNA mi- whether induced by Ad-p53-121F or Ad-p53. croarray analysis to identify such genes, using These results suggest that STAG1, a novel tran- mRNAs isolated from LS174T colon-cancer cells scriptional target for p53, mediates p53- 162 dependent apoptosis, and might be a good can- enhanced apoptosis, whereas over-expression didate for next-generation gene protected cells from apoptosis caused by DNA The p53AIP1 gene, which we recently identi- damage. p53CSV is induced significantly when fied as a novel p53-target, mediates p53- cells have a low level of genotoxic stresses, but dependent apoptosis. We evaluated the effects not when DNA damage is severe. p53CSV can of adenovirus-mediated introduction of p53AIP1 modulate apoptotic pathways through interac- (Ad-p53AIP1) on 30 human cancer-cell lines in tion with Hsp70 that probably inhibits activity vitro, and two cell lines in vivo,incomparison of Apaf-1. Our results imply that under specific with the effects of p53 (Ad-p53). In 20 of the 30 conditions of stress, p53 regulates transcription cell lines, p53AIP1-induced apoptosis was ob- of p53CSV and that p53CSV is one of important served, and in 12 of these p53AIP1-sensitive players in the p53-mediated cell survival. cancer cell lines, the apoptotic effects of p53AIP1 were greater than those of p53 itself. Cancers b. Colon, Liver, and Gastric cancers with wild-type p53, which were thought to be p 53-resisitant, were likely to be sensitive to p53 Yoichi Furukawa, Ryuji Hamamoto, Hiroshi AIP1-induced apoptosis. p53-resistant cancers Okabe, Li Meihua, Meiko Takahashi, Takashi such as LS174T (p53+/+) and A549 (p53+/ Shimokawa, Kazutaka Obama, Michihiro +), in which no increase of p53AIP1 mRNA ex- Sakai, Katsuaki Ura, Takaaki Kobayashi, Pit- pression was observed when Ad-p53 was intro- tella Fabio, Natini Jinawath and Yusuke Naka- duced, were killed effectively by Ad-p53AIP1. mura Furthermore, co-introduction of p53 and p53AIP 1 had synergistic effects on the induction of Through a genome-wide cDNA microarray, apoptosis regardless of p53 status. Finally, we identified SMYD3, a novel gene over- adenovirus-mediated introduction of p53AIP1 expressed in the great majority of colorectal car- suppressed tumor growth in vivo. These results cinomas and hepatocellular carcinomas. Intro- suggested that p53AIP1 gene transfer might be- duction of SMYD3 into NIH3T3 cells enhanced come a new strategy for the treatment of p53- the cell growth, and its reduced expression by resistant cancers. siRNAs in various cancer cells resulted in sig- We also identified the 1aldehyde dehydroge- nificant growth suppression. We found that nase 4 (ALDH4) gene as a direct target of p53. through an interaction with an RNA helicase ALDH4 is a mitochondrial matrix NAD+- HELZ, SMYD3 formed a complex with RNA po- dependent enzyme catalyzing the second step of lymerase II and transactivated a set of genes in- the proline degradation pathway. The expres- cluding oncogenes, homeobox genes, and genes sion of ALDH4 mRNA was induced by various associated with cell cycle regulation. Subsequent cellular stresses, including hydrogen peroxide, analysis revealed that SMYD3 bound to a “5’- UV, g-irradiation or adriamycin treatment in a p CCCTCC-3’” motif present in the promoter re- 53-dependent manner. p53-dependent transcrip- gion of its downstream genes such as Nkx2.8.A tional activity of the binding site was confirmed SET domain of this protein revealed histone H3- by a reporter assay. Inhibition of ALDH4 ex- lysine 4 (H3-K4)-specific methyltransferase activ- pression by antisense oligonucleotides could en- ity, which was enhanced in the presence of 90- hance cell death induced by infection of Ad-p53. kD heat shock protein (HSP90A). Our findings ALDH4 over-expressing cells showed signifi- suggest that SMYD3 has a histone methyltrans- cantly lower intracellular ROS level than control ferase activity and plays an important role in cells after treatment of hydrogen peroxide or transcriptional regulation as a member of RNA UV. Those cells were also resistant to cell dam- polymerase complex, and that its activation is age by hydrogen peroxide. These results suggest one of key factors in human carcinogenesis. that ALDH4 is a direct-target of p53, and that p We also identified a novel human gene, 53 might play a protective role against cell dam- termed DDEFL1 (development and differentia- age induced by intracellular ROS generation tion enhancing factor-like 1), that was transacti- through transcriptional activation of ALDH4. vated in colon cancer. DDEFL1 encodes a prod- Although a number of p53-target genes have uct that shared structural features with been identified, the mechanisms of p53- centaurin-family proteins. The deduced 903- dependent activities that determine cellular sur- amino-acid sequence showed 46% homology to vival or death are still not fully understood. DDEF/ASAP1 (development and differentiation Here we report isolation of a novel p53-target enhancing factor), and contained an Arf GTPase- gene, designated p53-inducible cell-survival fac- activating protein (ArfGAP) domain and two tor (p53CSV). p53CSV contains a p53-binding ankyrin repeats. Gene transfer of DDEFL1 pro- site within its second exon and the reduction of moted proliferation of cells that lacked endoge- expression by small interfering RNA (siRNA) nous expression of this gene. Furthermore, re- 163 duction of DDEFL1 expression by transfection of system and identified nuclear Clusterin. Expres- anti-sense S-oligonucleotides inhibited the sion of CLUAP1 was gradually increased in the growth of SNU475 cancer cells, in which DDEFL late S to G2/M phases of cell cycle and it re- 1 expression was highly up-regulated. Our re- turned to the basal level in the G0/G1 phases. sults provide novel insight into hepatocarcino- Suppression of this gene by siRNAs resulted in genesis and may contribute to development of growth retardation in the transfected cells. These new strategies for diagnosis and treatment of data provide better understanding of colorectal HCC. carcinogenesis, and inactivation of CLUAP1 may Among the genes that were frequently trans- conceivably serve in the future as a novel thera- activated in colorectal tumors, we identified a peutic intervention for treatment of colon can- novel human gene termed LEMD1 (LEM domain cer. containing1) whose expression was elevated in We characterized biological importance of a 17 of 18 CRCs compared with their correspond- novel human gene termed RNF43 (RING finger ing non-cancerous mucosae. Multiple-tissue protein 43) in colorectal carcinogenesis. Its ex- northern blot analysis revealed that the gene ogenous expression conferred growth-promoting was abundantly expressed in the testis in a tis- effect in COS7 and NIH3T3 cells, and suppres- sue specific manner among the 16 adult normal sion of its expression by specific short interfer- tissues examined. Subsequent analysis identified ingRNAsresultedingrowthsuppressionofco- six alternatively spliced forms of transcripts, of lon cancer cells. Interestingly, RNF43 protein which one transcript of 201 bp was expressed was shown to be a secreted protein and addi- exclusively in the CRC tissues but not in their tion of the conditioned media of the RNF43- corresponding normal tissues. Since proteins transfected cells into culture media of NIH3T3 that are not expressed in normal tissues except cells revealed a significant enhancement of cell in the testis and expressed in cancer tissues, it growth. We also found that RNF43 protein in- should serve as a promising therapeutic target teracted with an extracelluler domain of Notch2 for CRC treatment like immunotherapy. in vitro and in vivo, suggesting that RNF43 may To uncover mechanisms underlying progres- exert its growth promoting effect through sion of colorectal carcinogenesis and to identify modulation of Notch2-signaling pathway in an genes associated with liver metastasis, we ana- antocrine manner. These data imply that RNF43 lyzed expression profiles of 14 primary colorec- plays an important role in colorectal carcino- tal cancers (CRCs) with liver metastases, and genesis and may serve as a good target mole- compared them with profiles of 11 non- cule for development of diagnosis and novel metastatic carcinomas and those of 9 adenomas drugs for CRCs. of the colon. A hierarchical cluster analysis us- Among the genes up-regulated in hepatocellu- ing data from a cDNA microarray containing lar carcinomas, we focused on a novel gene, 23,040 genes indicated that the cancers with me- termed WDRPUH and characterized its biologi- tastasis had different expression profiles from cal function. WDRPUH encodes a predicted 620 those without metastasis, although a number of amino acid protein containing 11 highly con- genes were commonly up-regulated in primary served WD40 repeat domains. Multiple-tissue cancers of both categories. We documented 54 northern blot analysis revealed its specific ex- genes that were frequently up-regulated and 375 pression in the testis among the 16 normal tis- that were frequently down-regulated in primary sues examined. Transfection of plasmids de- tumors with metastases to liver, but not in tu- signed to express WDRPUH -specific siRNA sig- mors without metastasis. Subsequent quantita- nificantly reduced its expression in HCC cells tive PCR experiments confirmed that PRDX4, andresultedingrowthsuppressionofthetrans- CKS2, MAGED2, and an EST (GenBank acces- fected cells. Interestingly, we found that sion number W38659) were expressed at signifi- WDRPUH associated directly with HSP70, pro- cantly higher levels in tumors with metastasis. teins of the chaperonin-containing TCP-1 (CCT) These data should contribute to a better under- complex, as well as BRCA2. These findings have standing of the progression of colorectal tumors, disclosed a novel insight into hepatocarcino- and facilitate prediction of their metastatic po- genesis and suggested that WDRPUH may be a tential. molecular target for development of new strate- Among the genes commonly transactivated in gies to treat HCCs. the cancers, we identified a novel human gene, Gastric cancer is the fourth leading cause of which we termed CLUAP1 (Clusterin Associated cancer-related death in the world. Two his- Protein 1). It encodes a nuclear protein of 413 tologically distinct types of gastric cancers, amino acids containing a coiled-coil domain. To namely intestinal type and diffuse type, have investigate its function, we searched for CLUAP different epidemiological and pathophysiologi- 1-interacting proteins using yeast two-hybrid cal features, suggesting different mechanisms of 164 carcinogenesis. A number of studies have been when at least one symmetrically methylated- carried out to determine the molecular mecha- CpG dinucleotides is presented as its recogni- nisms of intestinal-type gastric cancer, whereas tion sequence. A SET and RING finger- little is known about those of diffuse-type gas- associated (SRA) domain accounts for the high tric cancer that has a more invasive phenotype binding affinity of ICBP90 for methyl-CpG dinu- and poorer prognosis. To clarify the mecha- cleotides. This protein constitutes a complex nisms that underlie development and/or pro- with HDAC1 also via its SRA domain, and gression of diffuse-type gastric cancer, we com- bound to methylated promoter regions of vari- pared the expression profiles of 20 diffuse-type ous tumor suppressor genes, including p16 INK4A gastric cancer tissues with their corresponding and p14 ARF, in cancer cells. It has been reported non-cancerous mucosae by means of cDNA mi- that expression of ICBP90 was up-regulated by croarray containing 23,040 genes in combination E2F-1, and we confirmed the up-regulation was with laser-microbeam microdissection. We iden- caused by binding of E2F-1 to the intron1 of tified 153 genes commonly up-regulated and ICBP90, which contains two E2F-1 binding mo- more than 500 genes commonly down- tifs. Our data also revealed accumulation of regulated. Furthermore, comparison of the ex- ICBP90 in breast-cancer cells, where it might pression profiles of diffuse-type with those of suppress expression of tumor suppressor genes intestinal-type gastric cancers identified 46 genes through deacetylation of histones after recruit- that may represent the distinct molecular signa- ment of HDAC1. The data reported here suggest ture of each type of gastric cancer. The signature that ICBP90 is involved in cell proliferation by of diffuse-type cancer exhibits altered expression way of methylation-mediated regulation of cer- of genes related to cell-matrix interaction and tain genes. extracellular matrix components. Although fur- ther investigation of their functions is essential, d. cDNA microarray analysis of cancer these data should help to better understand the different mechanisms underlying gastric carcino- Toyomasa Katagiri, Yataro Daigo, Hidewaki genesis and may also provide clues to the iden- Nakagawa, Yoichi Furukawa , Takehumi tification of novel diagnostic markers and/or Kikuchi, Soji Kakiuchi, Toru Nakamura, Koi- therapeutic targets of diffuse-type gastric cancer. chi Okada, Satoshi Nagayama, Shingo Ashida, In addition, among the 153 genes whose expres- Toshihiro Nishidate, Chie Suzuki, Nobuhisa sion levels were elevated in cancers compared to Ishikawa, Tatsuya Kato, Akira Togashi, Sa- non-cancerous mucosae, we identified a gene toshi Hayama, Megumi Iiizumi, Keisuke Tani- termed NOL8 that encodes a putative 150-kDa uchi, and Yusuke Nakamura protein with an RRM domain in its amino-acid terminal region. Northern blot analysis revealed (1) Chemosensitivity that NOL8 was abundantly expressed in skeletal Gefitinib (Iressa, ZD1839), an inhibitor of epi- muscle but not expressed in other 22 tissues ex- dermal growth factor receptor-tyrosine kinase amined. Immunocytochemical staining of NOL8 (EGFR-TK), has shown potent anti-tumor effects showed its localization in the nucleolus and sub- and improved symptom and quality-of-life of a sequent protein phosphatase analysis revealed subset of patients with advanced NSCLC. How- that it was present as the phophorylated form. ever, a large portion of the patients showed no In addition, transfection of short-interfering effect to this agent. To establish a method to RNA specific to NOL8 into three diffuse-type predict the response of NSCLC patients to gefit- gastric cancer cells, St-4, MKN45 and TMK-1, ef- inib, we used a genome-wide cDNA microarray fectively reduced expression of this gene and in- to analyze 33 biopsy samples of advanced duced apoptosis in these cells. These findings NSCLC from patients who had been treated provide a new insight into diffuse-type gastric with an identical protocol of second-to 7th-line carcinogenesis and may contribute to develop- gefitinib monotherapy. We identified 51 genes ment of new therapeutic strategies for diffuse- whose expression differed significantly between type gastric cancer. 7 responders and 10 non-responders to the drug. We selected the 12 genes that showed the c. Epigenetic control of cancer most significant differences to establish a nu- merical scoring system (GRS, gefitinib response Motoko Unoki and Yusuke Nakamura score), for predicting response to gefitinib treat- ment. The GRS system clearly separated the two ICBP90, (inverted CCAAT box-binding pro- groups without any overlap, and accurately pre- tein of 90 kDa), has been reported as a regulator dicted responses to 2the drug in sixteen addi- of topoisomerase IIa expression. We present evi- tional NSCLC cases. The system was further denceherethatICBP90bindstomethyl-CpG validated by the semi-quantitative RT-PCR, im- 165 munohistochemistry, and ELISA for serological and the other in the other types of bones in test. Moreover, we proved that the anti- rather middle age. A set of genes was identified apoptotic activity of amphiregulin (AREG), a to characterize this subgroup, some of which protein that was significantly over-expressed in were previously indicated possible relation to non-responders but undetectable in responders, carcinogenesis of osteosarcoma. Thirteen of the leads to resistance of NSCLC cells to gefitinib in 19 patients were treated with an identical proto- vitro. Our results suggested that sensitivity of a col of chemotherapy with doxorubicin, cisplatin given NSCLC to gefitinib can be predicted ac- and ifosfamide, and histological examination of cording to expression levels of a defined set of resected specimens after operation classified six genes that may biologically affect drug sensitiv- casesasrespondersandsevenasnon- ity and survival of lung-cancer cells. Our scor- responders. A comparison of expression profiles ing system might eventually lead to achieve- of these two groups identified 60 genes whose ment of personalized therapy for NSCLC pa- expression levels were likely to be correlated tients. with the response to this particular chemother- Neoadjuvant chemotherapy for invasive blad- apy (P value of <0.008). We developed a drug der cancer, involving a regimen of methotrexate, response scoring (DRS) system on the basis of vinblastin, doxorubicin, and cisplatin (M-VAC), the expression levels of these genes, and proved can improve the resectability of larger neo- this system may be applicable to predict the re- plasms for some patients and offer a better sponse to this protocol irrespective to the sub- prognosis. However, some suffer severe adverse classification of OS. The reliability of the DRS drug reactions without any effect, and no system was further confirmed by testing addi- method yet exists for predicting the response of tional five OS cases. These results indicated that an individual patient to chemotherapy. Our pur- scoring system based on gene-expression pro- pose in this study is to establish a method for files might be useful to predict the response to predicting response to the M-VAC therapy. chemotherapy for OS. Hence, we analyzed gene- expression profiles of biopsy materials from 27 invasive bladder can- (2) Lung cancer cers using a cDNA microarray consisting of We have been investigating genes involved in 27,648 genes, after populations of cancer cells pulmonary carcinogenesis by examining genome had been purified by laser-microbeam microdis- -wide gene-expression profiles of non-small cell section. We identified dozens of genes that were lung cancers (NSCLCs), to identify molecules expressed differently between nine “responder” that might serve as diagnostic markers or tar- and nine “non-responder” tumors; from that list gets for development of new molecular thera- we selected the 14 “predictive” genes that pies. A gene encoding ADAM8, a disintegrin showed the most significant differences and de- and metalloproteinase domain-8 protein was se- vised a numerical prediction-scoring system that lected as a candidate for such molecule. Tumor- clearly separated the responder group from the tissue microarray was applied to examine ex- non-responder group. This system accurately pression of ADAM8 protein in archival lung- predicted the drug responses of eight of nine cancer samples from 363 patients. Serum ADAM test cases that were reserved from the original 8 levels of 105 lung-cancer patients and 72 con- 27 cases. As real-time RT-PCR data were highly trols were also measured by ELISA. A role of concordant with the cDNA microarray data for ADAM8 in cellular motility was examined by those 14 genes, we developed a quantitative RT- Matrigel assays. ADAM8 was abundantly ex- PCR based-prediction system that could be fea- pressed at both transcriptional and protein lev- sible for routine clinical use. Our results suggest els in the great majority of lung-cancer tissues that the sensitivity of an invasive bladder cancer and cell-lines examined. A high level of ADAM8 to the M-VAC neoadjuvant chemotherapy can protein expression was significantly more com- be predicted by expression patterns in this set of mon in advanced stage-IIIB/IV adenocarcino- genes, a step toward achievement of “personal- mas (ADCs) than in ADCs at stages I-IIIA. Se- ized therapy” for treatment of this disease. rum levels of ADAM8 protein detected by To establish a method for predicting the re- ELISA were significantly higher in lung-cancer sponse to chemotherapy for osteosarcoma (OS), patients than in healthy controls. The proportion we performed expression profile analysis using of the serum ADAM8-positive cases defined by cDNA microarray consisting of 23,040 genes. Hi- our criteria was 63% and that for carcinoembry- erarchical clustering based on the expression onic antigen (CEA) was 57%, indicating equiva- profiles of 19 biopsy samples of OS demon- lent diagnostic power of these two markers. A strated two major clusters; one consisted exclu- combined assay using both ADAM8 and CEA sively of typical OS, i.e. conventional central OS increased sensitivity, as 80% of the patients with in long bone of patients in the second decade lung cancer were then diagnosed as positive 166 while only 11% of 72 healthy volunteers were volved in pancreatic carcinogenesis and apply falsely diagnosed as positive. In addition, exoge- the information toward development of novel nous expression of ADAM8 increased the migra- tumor markers and therapeutic targets, we ana- tory activity of mammalian cells, an indication lyzed gene-expression profiles of 18 pancreatic that ADAM8 may play a significant role in pro- tumors using a cDNA microarray representing gression of lung cancer. 23,040 genes. As pancreatic ductal adenocarcino- mas usually contain a low proportion of cancer (3) Testicular cancer cells in the tumor mass, we prepared 95% pure To identify new diagnostic markers for tes- populations of pancreatic-cancer cells by means ticular germ cell tumors (TGCTs), including of laser microbeam microdissection (LMM), and seminomas, as well as potential targets of new compared their expression profiles to those of drugs for treating the disease, we compared similarly purified, normal pancreatic ductal gene-expression profiles of cancer cells from 13 cells. Because of the high degree of purity in the seminomas with normal human testis using cell populations, a large proportion of genes that laser-capture microdissection and a cDNA mi- we detected as up-regulated or down-regulated croarray representing 23,040 genes. We identi- in pancreatic cancers were different from those fied 349 genes that were commonly up- reported in previous studies. We identified 260 regulated in seminoma cells. The functions of genes that were commonly up-regulated in pan- 227 were known to some extent; the remaining creatic cancer cells; the functions of 66 of them 122 included 57 ESTs. On the list were cyclin D2 (including 31 ESTs) are currently unknown. The (CCND2), prostate cancer over-expressed gene 1 up-regulated genes included some that were (POV1), and junction plakoglobin (JUP), all of previously reported to be over-expressed in pan- which were already known to be over-expressed creatic cancers, such as interferon-induced trans- in seminomas. On the other hand, our protocol membrane protein 1 (IFITM1), plasminogen acti- selected 593 genes as being commonly down- vator, urokinase (PLAU), prostate stem cell anti- regulated in seminoma cells. That list included gen (PSCA), S100 calcium binding protein P (S 340 functionally characterized genes; the other 100P), and baculoviral IAP repeat-containing 5 253 included 131 ESTs. To confirm the expres- (BIRC5). On the other hand, 346 genes were sion data, we performed semi-quantitative RT- commonly down-regulated in the cancer cells. PCR experiments with nine highly up-regulated Of those, 211 had been functionally character- genes, and the results supported those from of ized; this group included tumor suppressor our microarray analysis. The information pro- genes such as AXIN1 up-regulated 1 (AXUD1), vided here should prove useful for identifying deleted in liver cancer 1 (DLC1), growth arrest genes whose products might serve as molecular and DNA-damage-inducible beta (GADD45B), targets for treatment of TGCTs. We subse- and p53-inducible p53DINP1 (p53DINP1). These quently analyszed NACHT, leucine rich repeat data should provide useful information for find- and PYD containing 7 (NALP7), that was signifi- ing candidate genes whose products might serve cantly transactivated in testicular seminomas. as specific tumor markers and/or as molecular Northern blot analyses confirmed an approxi- targets for treatment of patients with pancreatic mately 3.3-kb transcript that was expressed ex- cancer. clusively in testis although the expression level Through functional analysis of genes that of this gene in normal testis was much lower were transactivated in PDACs, we identified than in tumor cells, suggesting an important RAB6KIFL as a good candidate for development role of this gene in germ-cell proliferation. Im- of drugs to treat PDACs at the molecular level. munohistocheminal analysis using anti-NALP7 Knockdown of endogenous RAB6KIFL expres- polyclonal antibody detected the endogenous sion in PDAC cell lines by siRNA drastically at- NALP7 protein in the cytoplasm of embryonal tenuated growth of those cells, suggesting an es- carcinoma cells and testicular seminoma tissues. sential role for the gene product in maintaining Transfection of small interfering RNA (siRNA) viability of PDAC cells. RAB6KIFL belongs to for NALP7 significantly reduced the NALP7 ex- the kinesin superfamily of motor proteins, pression and resulted in growth suppression of which have critical functions in trafficking of testicular germ-cell tumor. These findings imply molecules and organelles. Proteomics analyses that NALP7 may play crucial roles in cell prolif- using a polyclonal anti-RAB6KIFL antibody eration as well as testicular tumorigenesis, and identified one of the cargoes transported by represents a promising candidate for develop- RAB6KIFL as discs large homolog 5 (DLG5), a ment of targeted therapy for TGCTs. scaffolding protein that may link the vinexin-β- catenin complex at sites of cell-cell contact. Like (4) Pancreatic cancer RAB6KIFL, DLG5 was up-regulated in PDACs, To characterize molecular mechanism in- and knockdown of endogenous DLG5 by siRNA 167 significantly suppressed the growth of PDAC a full understanding of the molecular basis of cells as well. Decreased levels of endogenous mammary tumorigenesis. In this study we ana- RAB6KIFL in PDAC cells altered the sub- lyzed gene-expression profiles of 81 surgical cellular localization of DLG5 from cytoplasmic specimens of 12 ductal carcinomas in situ membranes to cytoplasm. Our results imply that (DCIS) and 69 invasive ductal carcinomas (IDC). collaboration of RAB6KIFL and DLG5 is likely After applying laser-microbeam microdissection to be involved in pancreatic carcinogenesis. to all samples we achieved 98-99% pure popula- These molecules should be promising targets for tions of breast-cancer cells, and of normal breast development of new therapeutic strategies for epithelial cells used as controls. A cDNA- ductal adenocarcinomas of the pancreas. microarray analysis of 23,040 genes in these samples and a subsequent unsupervised hierar- (5) Prostate cancer chical clustering distinguished two tumor To characterize the molecular mechanisms in- groups, mainly in terms of estrogen-receptor volved in prostate carcinogenesis and to evalu- (ER) status. We then undertook a supervised ate a controversial hypothesis about the putative analysis and identified 325 genes that were com- transition from prostatic intraepithelial neoplasia monly either up-or down-regulated in both pa- (PIN) to invasive prostate cancer (PC), we ana- thologically discrete stages (DCIS and IDC), in- lyzed gene-expression profiles of 20 PCs and 10 dicating that these genes might play important high-grade PINs, using a cDNA microarray rep- roles in malignant transformation of breast duc- resenting 23,040 genes. Considering the histo- tal cells. In addition, we searched invasion- logical heterogeneity of PCs and the minimal associated gene candidates whose expression nature of PIN lesions, we applied laser mi- was altered in IDC, but not in DCIS, and identi- crobeam microdissection (LMM) to purify popu- fied 24 up-regulated genes and 41 down- lations of PC and PIN cells, and then compared regulated genes. Furthermore, we identified 34 their expression profiles with those of corre- genes that were expressed differently in tumors sponding normal prostatic epithelial cells that from patients with lymph-node metastasis as were also purified by LMM. A hierarchical clus- opposed to no metastasis. On that basis we de- tering analysis clearly separated the PC group veloped a scoring system that correlated well from the PIN group except for three tumors that with the metastatic status. Tumors from all of were morphologically defined as one very-high- the 37 test patients with lymph-node metastasis grade PIN and two low-grade PCs. The findings yielded positive scores by our definition, suggested that PINs and PCs share some mo- whereas 38 of the 40 tumors (95%)without lecular features, and supported the hypothesis lymph-node metastasis had negative scores. Our of PIN-to-PC transition. Based on this hypothe- data should provide useful information for iden- sis, we identified 21 genes that were up- tifying predictive markers for invasion or metas- regulated and 63 that were down-regulated tasis, and suggest potential target molecules for commonly in PINs and PCs comparing with treatment of breast cancers. normal epithelium, which were considered to be involved in the presumably early stage of 2. Genes responsible for other diseases prostatic carcinogenesis. The altered genes in- cluded AMACR, OR51E2, RODH and SMS .Fur- a. Bone development thermore, comparing the expression profiles of PCs with those of PINs, we identified 41 genes Mitsuhito Doi and Yusuke 3Nakamura that were up-regulated and 98 that were down- regulated in the transition from PINs to PCs; Through expression profile analyses of the hu- those altered genes included elements that are man mesenchymal stem cells incubated in the likely to be involved in cell adhesion or motility osteogenic supplements, we identified and char- of invasive PC cells, such as POV1, CDKN2C, acterized a novel human cDNA, EMILIN-5 FASN, ITGB2, LAMB2, PLAU ,andTIMP1. (Elastin Microfibril Interface Located proteIN-5), These data provide clues to the molecular that is likely to play a significant role in osteo- mechanisms underlying prostatic carcinogenesis, genic process. The deduced amino acid se- and suggest candidate genes whose products quence of EMILIN-5 consists of 766 amino acids might serve as molecular targets for prevention with a cysteine-rich EMI domain at the NH2 ter- and treatment of prostate cancers. minus. Western blot analysis suggested that EMILIN-5 expression was detected in various (6) Breast cancer osteoblastic cells. Immunohistochemistry of Breast carcinoma is a complex disease charac- mouse embryos at 13.5 days post coitus interest- terized by accumulation of multiple genetic al- ingly revealed relatively high levels of EMILIN-5 terations, and investigators are far from having protein in perichondrium cells of developing 168 limbs. The present findings suggest that the q13.2-q13.4. The association (c2=17.1, p = EMILIN-5 gene plays an important role in mes- 0.00003; odds ratio of 1.85 with 95% confidence enchymal development. interval of 1.39-2.50 in a dominant association model) was found using DNA from 465 affected b. IgA nephropathy individuals and 634 controls. The SNP (G34448 A) caused an amino-acid substitution from glu- Shigeru Ohtsubo, Aritoshi Iida2, Kosaku Nitta1, tamine to lysine (E928K). As the gene product is Toshihiro Tanaka3,RyoYamada4,YozoOh- involved in immunoglobulin-class switching and nishi3, Shiro Maeda5, Tatsuhiko Tsunoda6, patients with the A allele revealed higher serum Takashi Takei1, Wataru Obara7, Fumihiro Aki- levels of IgA (p=0.048), the amino-acid change yama8, Kyoko Ito1, Kazuho Honda1,Keiko might influence a class-switch to increase serum Uchida1, Ken Tsuchiya1, Wako Yumura1, Taka- IgA levels, resulting in a higher risk of IgA shi Ujiie9, Yutaka Nagane10, Satoru Miyano, nephropathy Yasushi Suzuki7, Ichiei Narita8,FumitakeGe- jyo8, Tomoaki Fujioka9, Hiroshi Nihei1 and c. Crohn disease Yusuke Nakamura: Department of Medicine, Kidney Center, Tokyo Women’s Medical Uni- Keiko Yamazaki, Masakazu Takazoe1, Torao versity, Tokyo, Japan; Laboratory for Geno- Tanaka1, Toshiki Ichimori1, Susumu Saito2, typing, Laboratory for Cardiovascular Dis- Aritoshi Iida2, Yoshihiro Onouchi2,AkiraHata2 eases, Laboratory for Rheumatic Diseases, and Yusuke Nakamura: Department of Medi- Laboratory for Diabetic Nephropathy, and cine, Division of Gastroenterology, Social In- Laboratory for Medical Informatics, SNP Re- surance Chuo General Hospital, Tokyo, Japan, search Center, The Institute of Physical and SNP Research Center, the Institute of Physical Chemical Research (RIKEN), Tokyo, Japan; and Chemical Research (RIKEN), Kanagawa, Department of Urology, Iwate Medical Univer- Japan sity, Iwate, Japan; Division of Clinical Neph- rology and Rheumatology, Niigata University Crohn disease (CD) is an inflammatory bowel Graduate School of Medical and Dental Sci- disease as characterized by chronic transmural, ences, Niigata, Japan; Department of Urology, segmental and typically granulomatous inflam- Iwate Prefectural Ofunato Hospital, Iwate, Ja- mation of the gut. Recently, two novel candidate pan; Department of Urology, Sanai Hospital, gene loci associated with CD, SLC22A4 and SLC Iwate, Japan 22A5 on 5 known as IBD5,and DLG5 on chromosome 10, were identified Immunogobulin A (IgA) nephropathy is the through association analysis of Caucasian CD most common form of primary glomeru- patients. We validated these candidate genes in lonephritis worldwide. The pathogenesis of IgA Japanese patients with CD and found a weak nephropathy is unknown, but it is certain that but possible association with both SLC22A4 (p= some genetic factors are involved in susceptibil- 0.028) and DLG5 (p=0.023), although the re- ity to the disease. Employing a large-scale, case- ported genetic variants that were indicated to be control association study using gene-based causative in Caucasian population were com- single-nucleotide polymorphism (SNP) markers, pletely absent in or were not associated with we previously reported three candidate genes. Japanese CD patients. These findings imply sig- We report here an additional significant associa- nificant differences in genetic background with tion between IgA nephropathy and a SNP lo- CD susceptibility among different ethnic groups cated in the gene encoding immunoglobulin m- and further indicate some difficulty of binding protein 2 (IGHMBP2) at chromosome 11 population-based studies.

Publications

Okabe, H., Furukawa, Y., Kato, T., Hasegawa, pathway. Oncogene 23: 7621-7627, 2004. S.,Yamaoka,Y.andNakamura,Y.Isolationof Yoshida, K., Monden, M., Nakamura, Y. and development and differentiation enhancing Arakawa, H. Adenovirus-mediated p53AIP1 factor-like 1 (DDEFL1) as a drug target for he- gene transfer as a new strategy for treatment patocellular carcinomas. Int. J. Oncology. 24: of p53-resistant tumors. Cancer Sci. 95: 91-97, 43-48, 2004. 2004. Anazawa, Y., Arakawa, H., Nakagawa, H. and Li, M., Lin, Y.-M., Hasegawa, S., Shimokawa, T., Nakamura, Y. Identification of STAG1 as a Murata, K., Kameyama, M., Ishikawa, O., key mediator of a p53-dependent apoptotic Katagiri, T., Tsunoda, T., Nakamura, Y. and 169

Furukawa, Y. Genes associated with liver me- Kikkawa,E.,Omura,Y.,Abe,K.,Kamihara, tastasis of colon cancer, identified by genome- K.,Katsuta,N.,Sato,K.,Tanikawa,M., wide cDNA microarray. Int. J. Oncology. 24: Yamazaki, M., Ninomiya, K., Ishibashi, T., 305-312, 2004. Yamashita, H., Murakawa, K., Fujimori, K., Nagayama, S., Iiizumi, M., Katagiri, T., Tanai, H., Kimata, M., Watanabe, M., Hiraoka, Toguchida, J. and Nakamura, Y. Identification S., Chiba, Y., Ishida, S., Ono, Y., Takiguchi, S., of PDZK4, a novel human gene with PDZ do- Watanabe, S., Yosida, M., Hotuta, T., Kusano, mains, that is up-regulated in synovial sarco- J., Kanehori, K., Takahashi-Fujii, A., Hara, H., mas. Oncogene 23: 5551-5557, 2004. Tanase,T.-O.,Nomura,Y.,Togiya,S.,Komai, Ochi, K., Daigo, Y., Katagiri, T., Nagayama, S., F., Hara, R., Takeuchi, K., Arita, M., Imose, Tsunoda,T.,Myoui,A.,Naka,N.,Araki,N., N., Musashino, K., Yuuki, H., Oshima, A., Kudawara, I., Ieguchi, M., Toyama, Y., Sasaki, N., Aotsuka, S., Yoshikawa, Y., Mat- Toguchida, J., Yoshikawa, H. and Nakamura, sunawa, H., Ichihara, T., Shiohata, N., Sano, Y. Prediction of response to neoadjuvant che- S., Moriya, S., Momiyama, H., Satoh, N., motherapy for osteosarcoma by gene- Takami, S., Terashima, Y., Suzuki, O., Naka- expression profiles. Int. J. Oncol. 24: 647-655, gawa, S., Senoh, A., Mizoguchi, H., Goto, Y., 2004. Shimizu, F., Wakebe, H., Hishigaki, H., Nakamura, Y. Isolation of p53-target genes and Watanabe, T., Sugiyama, A., Takemoto, M., their functional analysis (Review). Cancer Sci. Kawakami, B., Yamazaki, M., Watanabe, K., 95: 7-11, 2004. Kumagai, A., Itakura, S., Fukuzumi, Y., Fuji- Sekiya, T., Adachi, S., Kohu, K., Yamada, T., mori, Y., Komiyama, M., Tashiro, H., Tani- Higuchi, O., Furukawa, Y., Nakamura, Y., gami, A., Fujiwara, T., Ono, T., Yamada, K., Nakamura, T., Tashiro, K., Kuhara, S., Oh- Fujii, Y., Ozaki, K., Hirao, M., Ohmori, Y., wada, S. and Akiyama, T. Identification of Kawabata, A., Hikiji, T., Kobatake, N., Ina- BMP and activin membrane-bound inhibitor gaki, H., Ikema, Y., Okamoto, S., Okitani, R., (BAMBI), an inhibitor of TGF-β signaling, as a Kawakami, T., Noguchi, S., Itoh, T., Shigeta, target of the β-catenin pathway in colorectal K., Senba, T., Matsumura, K., Nakajima, Y., tumor cells. J. Biol. Chem. 279: 6840-6846, Mizuno, T., Morinaga, M., Sasaki, M., To- 2004. gashi, T., Oyama, M., Hata, H., Watanabe, M., Doi, M., Nagano, A. and Nakamura, Y. Molecu- Komatsu, T., Mizushima-Sugano, J., Satoh, T., lar cloning and characterization of a novel Shirai, Y., Takahashi, Y., Nakagawa, K., Oku- gene, EMILIN-5, and its possible involvement mura, K., Nagase, T., Nomura, N., Kikuchi, in skeletal development. Biochem. Biophy. H., Masuho, Y., Yamashita, R., Nakai, K., Res. Commun. 313: 888-893, 2004. Yada, T., Nakamura, Y., Ohara, O., Isogai, T. Yoon, K., Nakamura, Y. and Arakawa, H. Iden- and Sugano, S. Complete sequencing and tification of ALDH4 as a p53-inducible gene characterization of 21,243 full-length human and its protective role in cellular stresses. J. cDNAs. Nat. Genet. 36: 40-45, 2004. Hum. Genet. 49: 134-140, 2004. Iida, A., Saito, S., Sekine, A., Kataoka, Y., Tabei, Nakamura, T., Furukawa, Y., Tsunoda, T., Ohi- W. and Nakamura, Y. Catalog of 300 SNPs in gashi, H., Murata, K., Ishikawa, O., Ohgaki, 23 genes encoding G-protein coupled recep- K.,Kashimura,N.,Miyamoto,M.,Hirano,S., tors. J. Hum. Genet. 49: 194-208, 2004. Kondo,S.,Katoh,H.,Nakamura,Y.and Kochi, Y., Yamada, R., Kobayashi, K., Takahashi, Katagiri, T. Genome-wide cDNA microarray A., Suzuki, A., Sekine, A., Mabuchi, A., Aki- analysis of gene-expression profiles in pancre- yama, F., Tsunoda, T., Nakamura, Y. and atic cancers using populations of tumor cells Yamamoto, K. Analysis of single-nucleotide and normal ductal epithelial cells selected for polymorphisms in Japanese rheumatoid ar- purity by laser microdissection. Oncogene 23: thritis patients shows additional susceptibility 2385-2400, 2004. markers besides the classic shared epitope Ota,T.,Suzuki,Y.,Nishikawa,T.,Otsuki,T., susceptibility sequences. Arthritis & Rheuma- Sugiyama, T., Irie, R., Wakamatsu, A., Hay- tism 50: 63-71, 2004. ashi, K., Sato, H., Nagai, K., Kimura, K., Harima, Y., Togashi, A., Horikoshi, K., Ima- Makita, H., Sekine, M., Obayashi, M., Nishi, mura, M., Sougawa, M., Sawada, S., Tsunoda, T., Shibahara, T., Tanaka, T., Ishii, S., T., Nakamura, Y. and Katagiri, T. Prediction Yamamoto,J.-I.,Saito,K.,Kawai,Y.,Isono,Y., of outcome of advanced cervical cancers to Nakamura, Y., Nagahari, K., Murakami, K., thermoradiotherapy according to expression Yasuda, T., Iwayanagi, T., Wagatsuma, M., profiles of 35 genes selected by cDNA mi- Shiratori, A., Sudo, H., Hosoiri, T., Kaku, Y., croarray analysis. Int. J. Radiat. Oncol. Biol. Kodaira, H., Kondo, H., Sugawara, M., Taka- Phys. 60: 237-248, 2004. hashi,M.,Kanda,K.,Yokoi,T.,Furuya,T., Yuki, D., Lin, Y.-M., Fujii, Y., Nakamura, Y. and 170

Furukawa, Y. Isolation of LEM domain- M.,Tsunoda,T.,Takata,R.,Kasahara,K., containing 1, a novel testis-specific gene ex- Miki,T.,Fujioka,T.,Shuin,T.andNakamura, pressed in colorectal cancers. Oncol. Rep. 12: Y. Molecular features of the transition from 275-280, 2004. prostatic intraepithelial neoplasia (PIN) to Jinawath, N., Furukawa, Y. and Nakamura, Y. prostate cancer : Genome-wide gene- Identification of NOL8, a nucleolar protein expression profiles of prostate cancers and containing an RNA recognition motif (RRM), PINs. Cancer Res. 64: 5963-5972, 2004. which was overexpressed in diffuse-type gas- Nishidate, T., Katagiri, T., Lin, M.-L., Mano, Y., tric cancer. Cancer Science 95: 430-435, 2004. Miki, Y., Kasumi, F., Yoshimoto, M., Tsunoda, Iida, A. and Nakamura, Y. Identification of 45 T., Hirata, K. and Nakamura, Y. Genome- novel SNPs in the 83-kb region containing wide gene-expression profiles of breast-cancer peptidylarginine deiminase types 1 and 3 loci cells purified with laser microbeam microdis- on chromosomal band 1p36. 13. J. Hum. section: Identification of genes associated with Genet. 49: 387-390, 2004. progression and metastasis. Int. J. Oncol. 25: Ozaki,K.,Inoue,K.,Sato,H.,Iida,A.,Ohnishi, 797-819, 2004. Y., Sekine, A., Sato, H., Odashiro, K., Mizuguchi, T., Collod-Beroud, G., Akiyama, T., Nobuyoshi, M., Hori, M., Nakamura, Y. and Abifadel, M., Harada, N., Morisaki, T., Allard, Tanaka, T. Functional variation in LGALS2 D., Varret, M., Claustres, M., Morisaki, H., confers risk of myocardial infarction and Ihara, M., Kinoshita, A., Yoshiura, K., Junien, regulates lymphotoxin-alpha secretion in vi- C., Kajii, T., Jondeau, G., Ohta, T., Kishino, T., tro. Nature 429: 72-75, 2004. Furukawa, Y., Nakamura, Y., Niikawa, N., Kamatani, N., Sekine, A., Kitamoto, T., Iida, A., Boileau, C. and Matsumoto, N. Heterozygous Saito, S., Kogame, A., Inoue, E., Kawamoto, TGFBR2 mutations in Marfan syndrome. Nat. M.,Harigai,M.andNakamura,Y.Large-scale Genet. 36: 855-860, 2004. single-nucleotide polymorphism (SNP) and Unoki, M., Nishidate, T. and Nakamura, Y. haplotype analyses, using dense SNP maps, of ICBP90, an E2F-1 target, recruits HDAC1 and 199 drug-related genes in 752 subjects: The binds to methyl-CpG through its SRA do- analysis of the association between uncom- main. Oncogene 23: 855-860, 2004. mon SNPs within haplotype blocks and the Cha, P.-C., Yamada, R., Sekine, A., Nakamura, haplotypes constructed with haplotype- Y. and Koha, C.-L. Inference from the rela- tagging SNPs. Am. J. Hum. Genet. 75: 190- tionships between linkage disequilibrium and 203, 2004. allele frequency distributions of 240 candidate Iida, A., Saito, S., Sekine, A., Tabei, W., Kataoka, SNPs in 109 genes of drug-related genes im- Y. and Nakamura, Y. Identification of 20 portance in 4 Asian populations. J. Hum. novel SNPs in the guanine nucleotide binding Genet. 49: 558-572, 2004. protein alpha 12 gene locus. J. Hum. Genet. Takahashi, M., Lin, Y.-M., Nakamura, Y. and 49: 445-448, 2004. Furukawa, Y. Isolation and characterization of Jinawath, N., Furukawa, Y., Hasegawa, S., Li, a novel gene CLUAP1 whose expression is M., Tsunoda, T., Satoh, S., Yamaguchi, T., frequently upregulated in colon cancer. Onco- Imamura, H., Inoue, M., Shiozaki, H. and gene 23: 9289-9294, 2004. Nakamura, Y. Genome-wide analysis of gene Yoshitake, Y., Nakatsura, T., Monji, M., Senju, expression in diffuse-type 4 gastric cancers us- S., Matsuyoshi, H., Tsukamoto, H., Hosaka, S., ing cDNA microarray; comparative expression Komori, H., Fukuma, D., Ikuta, Y., Katagiri, profiles between diffuse-type and intestinal- T., Furukawa, Y., Ito, H., Shinohara, M., type gastric cancer. Oncogene 23: 6830-6844, Nakamura, Y. and Nishimura, Y. Proliferation 2004. potential-related protein, an ideal esophageal Tsunoda,T.,Lathrop,G.M.,Sekine,A.,Yamada, cancer antigen for immnotherapy, identified R., Takahashi, A., Ohnishi, Y., Tanaka, T. and using cDNA microarray analysis. Clin. Cancer Nakamura,Y.Variationofgene-basedSNPs Res. 10: 6437-6448, 2004. and linkage disequilibrium patterns in the hu- Yagyu, R., Furukawa, Y., Lin, Y.-M., Shimok- man genome. Hum. Mol. Genet. 13: 1623- awa, T., Yamamura, T. and Nakamura, Y. A 1632, 2004. novel oncoprotein RNF43 functions as an Hamamoto, R., Furukawa, Y., Morita, M., autocrine manner in colorectal cancer. Int. J. Iimura,Y.,Silva,F.P.,Li,M.,Yagyu,R.and Oncol. 25: 1343-1348, 2004. Nakamura, Y. SMYD3 encodes a novel his- Yang, L., Leung, A.C.C., Ko, J.M.Y., Lo, P.H.Y., tone methyltransferase involved in the prolif- Tang, J.C.O., Srivastava, G., Oshimura, M., eration of cancer cells. Nat. Cell Biol. 6: 731- Stanbridge, E.J., Daigo, Y., Nakamura, Y., 740, 2004. Tang, C.M.C., Lau, K.W., Law, S. and Lung, Ashida, S., Nakagawa, H., Katagiri, T., Furihata, M. L. Tumor suppressive role of a 2.4 Mb 9q 171

33-q34 critical region and DEC1 in esophageal M., Nakashima, K., Hasegawa, K., Takahashi, squamous cell carcinoma. Oncogene, Dec 6, N., Shimizu, M., Sekiguchi, H., Kokubo, M., 2004 [Epub ahead of print]. Doi, S., Fujiwara, H., Miyatake, A., Fujita, K., Ishikawa, N., Daigo, Y., Yasui, W., Inai, K., Enomoto, T., Kishi, F., Suzuki, Y., Saito, H., Nishimura, H., Tsuchiya, E., Kohno, N. and Nakamura, Y., Shirakawa, T. and Tamari, M. Nakamura, Y. ADAM8 as a novel serological Association between genetic variation in the and histochemical marker for lung cancer. gene for death-associated protein-3 (DAP3) Clin. Cancer Res. 10: 8363-8370, 2004. and adult asthma. J. Hum. Genet. 49: 370-375, Okada, K., Hirota, E., Mizutani, Y., Fujioka, T., 2004. Shuin, T., Miki, T., Nakamura, Y. and Kanazawa, A., Tsukada, S., Sekine, A., Tsunoda, Katagiri,T.OncogenicroleofNALP7intes- T., Takahashi, A., Kashiwagi, A., Tanaka, Y., ticular seminomas. Cancer Sci. 95: 949-954, Babazono, T., Matsuda, M., Kaku, K., 2004. Iwamoto, Y., Kawamori, R., Kikkawa, R., Yamazaki, K., Takazoe, M., Tanaka, T., Ichimori, Nakamura,Y.andMaeda,S.Associationof T., Saito, S., Iida, A., Onouchi, Y., Hata, A. the gene encoding wingless-type mammary and Nakamura, Y. Association analysis of SLC tumor virus integration-site family member 5B 22A4, SLC22A5 and DLG5 in Japanese pa- (WNT5B) with type 2 diabetes. Am. J. Hum. tients with Crohn disease. J. Hum. Genet. 49: Genet. 75: 832-843, 2004. 664-668, 2004. Ohtsubo,S.,Iida,A.,Nitta,K.,Tanaka,T., Minaguchi, T., Yoshikawa, H., Nakagawa, S., Yamada, R., Ohnishi, Y., Maeda, S., Tsunoda, Yasugi, T., Yano, T., Iwase, H., Mizutani, K., T., Takei, T., Obara, W., Akiyama, F., Ito, K., Shiromizu, K., Ohmi, K., Watanabe, Y., Noda, Honda, K., Uchida, K., Tsuchiya, K., Yumura, K., Nishiu, M., Nakamura, Y. and Taketani, Y. W., Ujiie, T., Nagane, Y., Miyano, S., Suzuki, Association of PTEN mutation with HPV- Y.,Narita,I.,Gejyo,F.,Fujioka,T.,Nihei,H. negative adenocarcinoma of the uterine cer- and Nakamura, Y. Association of a single- vix. Canacer Letters 210: 57-62, 2004. nucleotide polymorphism in the immuno- Nakatsuda, T., Komori, H., Kudo, T., Yoshitake, globulin u-binding protein 2 gene with immu- Y.,Senju,S.,Katagiri,T.,Furukawa,Y., noglobulin A nephropathy. J. Hum. Genet., Ogawa, M., Nakamura, Y. and Nishimura, Y. Dec. 14, 2004 [Epub ahead of print]. Mouse homologue of a novel human oncofetal Silva, F.P., Hamamoto, R., Nakamura, Y. and antigen, Glypican-3, evokes T cell-mediated Furukawa, Y. WDRPUH, a novel WD-repeat tumor rejection without autoimmune reactions containing protein, is highly expressed in hu- in mice. Clin. Cancer Res. 10: 8630-8640, 2004. man hepatocellular carcinoma and involved in Uchida, N., Tsunoda, T., Wada, S., Furukawa, cell proliferation. Neoplasia, in press. Y., Nakamura, Y. and Tahara, H. Ring finger Iida, A., Ozaki, K., Tanaka, T. and Nakamura, Y. protein (RNF) 43 as a new target for cancer Fine-scale SNP map of an 11-kb genomic re- immunotherapy. Clin. Cancer Res. 10: 8577- gion at 22q13.1 containing the galectin-1 gene. 8586, 2004. J. Hum. Genet., in press. Kakiuchi, S., Daigo, Y., Ishikawa, N., Furukawa, Park, W.-R. and Nakamura, Y. p53CSV, a novel C.,Tsunoda,T.,Yano,S.,Nakagawa,K.,Tsu- p53-inducible gene involved in the p53- ruo, T., Kohno, N., Fukuoka, M., Sone, S. and dependent cell-survival pathway. Cancer Res., Nakamura,Y.Predictionofsensitivityofad- in press. vanced non-small cell lung cancers to gefit- Taniuchi, K., Nakagawa, H., Nakamura, T., inib. Hum. Mol. Genet. 13: 3029-3043, 2004. Eguchi, H., Ohigashi, H., Ishikawa, O., Kamiyama, H., Kurosaki, K., Kurimoto, M., Katagiri,T.andNakamura,Y.Over-expressed Katagiri, T., Nakamura, Y., Kurokawa, M., P-cadherin/CDH3 promotes motility of pan- Sato, H., Endo, S. and Shiraki, K. Herpes sim- creatic cancer cells by interacting with p120ctn plex virus induced death receptor-dependent and activating rho-family GTPases. Cancer apoptosis and regression of transplanted hu- Res., in press. man cancers. Cancer Sci. 95: 990-998, 2004. Takata, R., Katagiri, T., Kanehira, M. Tsunoda, Onouchi, Y., Onoue, S., Tamari, M., Wakui, K., T.,Shuin,T.,Miki,T.,Namiki,M.,Kohri,K., Fukushima, Y., Yashiro, M., Nakamura, Y., Matsushita, Y., Fujioka, T. and Nakamura, Y. Yanagawa,H.,Kishi,F.,Ouchi,K.,Terai,M., Predicting response to M-VAC neoadjuvant Hamamoto, K., Kudo, F., Aotsuka, H., Sato, chemotherapy for bladder cancers through Y.,Nariai,A.,Kaburagi,Y.,Miura,M.,Saji,T., genome-wide gene expression profiling. Clin. Kawasaki,T.,Nakamura,Y.andHata,A.CD Cancer Res., in press. 40 ligand gene and Kawasaki disease. Eur. J. Nishiu,M.,Tomita,Y.,Nakatsuka,S., Hum. Genet. 12: 1062-1068, 2004. Takakuwa, T., Iizuka, N., Hoshida, Y., Ikeda, Hirota, T., Obara, K., Matsuda, A., Akahoshi, J.,Iuchi,K.,Yanagawa,R.,Nakamura,Y.and 172

Aozasa, K. Distinct pattern of gene expression A., Kawakami, A., Yamamoto, S., Uchida, N., in pyothorax-associated lymphoma (PAL), a Nakamura, K., Notoya, K., Nakamura, Y. and lymphoma developing inlong-standing in- Ikegawa, S. An aspartic acid repeat polymor- flammation. Cancer Sci. 95: 828-834, 2004. phism in asporin negatively affects chondro- Kizawa,H.,Kou,I.,Iida,A.,Sudo,A.,Mi- genesis and increases susceptibility to osteoar- yamoto, Y., Fukuda, A., Mabuchi, A., Kotani, thritis. Nat. Genet., in press. 173

Human Genome Center Laboratory of Functional Analysis In Silico 機能解析イン・シリコ分野

Professor Kenta Nakai, Ph.D. 教授博士(理学)中井謙太 Associate Professor Kengo Kinoshita, Ph.D. 助教授 博士(理学)木下賢吾

The mission of our laboratory is to conduct computational (“in silico”) studies on the functional aspects of genome information. Roughly speaking, genome informa- tion represents what kind of proteins are synthesized on what conditions. Thus, our study includes the structural analysis of molecular function of each gene prod- uct as well as the analysis of regulatory information, which will lead us to the un- derstanding of its cellular role represented by the networks of inter-gene interac- tion.

1. Regulon prediction based on sigma factor prediction. We applied our proposed method to prediction and DBTBS (Database of Tran- all genes in Bacillus subtilis to find currently un- scriptional Regulation in Bacillus subtilis) known transcriptional regulations by transcrip- tion factors and sigma factors. Yuko Makita, Michiel J.L. de Hoon1, Naotake Ogasawara2, Satoru Miyano1, and Kenta 2. Comprehensive analysis of alternative pro- Nakai: 1Laboratory of DNA Information moters Analysis; 2Nara Institute of Science and Tech- nology Tsuritani Katuski3, Riu Yamashita, Yutaka Suzuki4, Sumio Sugano4 and Kenta Nakai: Sigma factors, often in conjunction with other 3Taisho Pharmaceutical Co. ltd.; 4Grad. School transcription factors, regulate gene expression in New Frontier Sci. prokaryotes at the transcriptional level. Specific transcription factors tend to co-occur with spe- It is getting clearer that a single gene pro- cific sigma factors. To predict new members of duces multiple transcripts in many ways. In the transcription factor regulons, we applied mammals, an example of such mechanisms is Bayes’rule to combine the Bayesian probability the use of alternative promoters, whose involve- of sigma factor prediction calculated from mi- ment is implicated in the spatial/temporal regu- croarray data and the sigma factor binding se- lation of their downstream gene expression as quence motif, the motif score of the transcrip- well as the isoformal production of their prod- tion factor associated with the sigma factor, the ucts. To further understand this phenomenon, empirically determined distance between the we comprehensively analyzed a large amount of transcription start site to the cis-regulatory re- human/mouse 5’ESTs that have intact tran- gion, and the tendency for specific sigma factors scriptional start sites (TSSs), obtained with the and transcription factors to co-occur. By combin- oligo-capping or related methods. Namely, we ing these information sources, we improve the mapped the ESTs on the genome and classified accuracy of predicting regulation by transcrip- them into clusters that are separated each other tion factors, and also confirm the sigma factor for more than 500 bp on it. If we assume that 174 these clusters are produced from alternative pro- Riu Yamashita, Yutaka Suzuki4 , Sumio moters, 7,635 of 14,312 human genes seem to Sugano4 and Kenta Nakai have alternative TSSs, which means that about 50%of mammalian genes may be regulated by It is known that the expression of many genes alternative promoters. We further created a coding ribosomal proteins and translation fac- more convincing ‘alternative promoter core da- tors is regulated at the time of translation. Those taset’ (AP core dataset) from the alternative TSS genes have terminal oligo-pyrimidine sequence dataset, based on the conservation of their up- at the 5’-end of their mRNA and are called TOP stream regions between human and mouse. The genes. However, how many TOP genes exist in AP core dataset contains 523 genes, which in- human/mouse genomes is still unknown. By us- clude genes, such as shc and AKAP1, that are ing the accurate information of TSSs (transcrip- known to have alternative promoters. Moreover, tional start sites), higher accuracy to detect them we observed that some genes have tissue- is expected. As a first step, we focused on genes specific TSS clusters. By analyzing their Gene that have fixed TSS positions, which will also Ontology annotation, we also found that genes reduce the number of potential false positives. related to ‘signal transduction’ are significantly By using a weight matrix constructed from enriched in the AP core dataset. Our results give known TOP genes, we could screen 511 candi- a firm foothold for the clarification of the alter- date TOP genes. In them, all of 9 previously native transcription mechanism. characterized TOP genes were included. More- over, 78 of 80 ribosomal protein genes were also 3. Comprehensive analysis of CpG islands in included. There were 99 human TOP gene can- promoter regions didates that have also mouse orthologs. Our re- sult suggests that TOP genes are not only Riu Yamashita, Yutaka Suzuki4 , Sumio translation-related but may include a wider va- Sugano4 and Kenta Nakai riety of genes such as chaperones and transport proteins. It has been envisaged that CpG islands are often observed near the transcriptional start sites 5. Cis element analysis using DNA array data (TSSs) of house-keeping genes. However, neither in Arabidopsis the precise positions of CpG islands relative to TSSs of genes nor the correlation between the Takeshi Obayashi5, Kenta Nakai, and Hiroyuki presence of the CpG islands and the expression Ohta5: 5Tokyo Institute of Technology specificity of these genes are well-understood. Using thousands of sequences with known TSSs Although cis-element prediction methods in human and mouse, we found that there is a from gene expression data are generally used, clear peak in the distribution of CpG islands there are few reports in plant science for the around TSSs in the genes of these two species. comprehensive analysis of regulatory mecha- Thus, we classified human (mouse) genes into nisms of gene expression using bioinformatics 6,600 (2,948) CpG+genes and 2,619 (1,830) CpG- approaches. Thus, we comprehensively pre- ones, based on the presence of a CpG island dicted cis elements from gene expression data within the -100: +100 region. We estimated obtained by our microarray analyses of Arabi- the degree of each gene being a housekeeper by dopsis thaliana genes. Based on the results of cis the number of cDNA libraries where its ESTs element prediction, we estimated the regulatory were detected. Then, the tendency that a gene mechanisms of gene expression. The analysis lacking CpG islands around its TSS is expressed was performed as follows: First, many candi- with a higher degree of tissue specificity turned dates of cis elements were selected by several out to be evolutionarily conserved. We also con- methods. After the evaluation of their effects on firmed this tendency by analyzing the gene on- gene expression, a set of non-redundant cis ele- tology annotation of classified genes. Since no ments was created. Finally, the regulatory such clear correlation was found in the control mechanisms of gene expression were inferred data (mRNAs, pre-mRNAs, and chromosome from these results. banding pattern), we concluded that the effect of a CpG island near the TSS should be more 6. Large-scale analysis of human alternative important than the global GC content of the re- protein isoforms: pattern classification and gion where the gene resides. correlation with subcellular localization signals 4. Global detection of human genes that have the terminal oligo-pyrimidine (TOP) Mitsuteru Nakao6, Roberto A. Barrero7,Paul motif Horton6, and Kenta Nakai: 6CBRC, Nat. Inst. 175

Advanced Industrial Sci. Tech.; 7Nat. Inst. Identification of protein biochemical functions Genet. based on their three-dimensional structures is now required in the post genome-sequencing We investigated human alternative protein era. Ligand binding is one of the major bio- isoforms of more than 2,600 genes based on full- chemical functions of proteins, and thus the length cDNA clones and SwissProt. We classi- identification of ligands and their binding sites fied the isoforms and examined their co- is the starting point for the function identifica- occurrence for each gene. Further, we investi- tion. Previously we reported our first trial on gated potential relationships between these structure based function prediction, based the changes and differential subcellular localization. similarity searches of molecular surfaces against The two most abundant patterns were the one the functional site database. Here we describe with different C-terminal regions and the one the extension of our first trial by expanding the with an internal insertion, which together ac- search database to whole hetero atom binding count for 43% of the total. Although changes of sites appearing within the PDB with the new the N-terminal region are less common than analysis protocol. In addition, we have deter- those of the C-terminal region, extension of the mined the similarity threshold line, by using 10 C-terminal region is much less common than structure pairs with solved free and complex that of the N-terminal region, probably because structures. Finally, we extensively applied our of the difficulty of removing stop codons in one method to newly determined hypothetical pro- isoform. We also found a tendency for the mem- teins including some without annotations, and bers of a gene family to show the same combi- evaluated the performance of our methods. nation of co-occurrence in alternative isoforms. We interpret this as evidence that there is some 8. Prediction of DNA binding sites on mo- structural relationship which produces a reper- lecular surface of proteins toire of isoformal patterns. Finally, many termi- nal changes are predicted to cause differential Yuko Tsuchiya8, Kengo Kinoshita, Haruki subcellular localization, especially in targeting to Nakamura8 either peroxisomes or mitochondria. Our study sheds new light on the enrichment of the hu- PreDs is a www server that predicts the man proteome through alternative splicing and dsDNA-binding sites on protein molecular sur- related events. Our database of alternative pro- faces generated from the atomic coordinates in a tein isoforms is available through the Internet. PDB format. The prediction was done by evalu- ating the electrostatic potential, local curvature 7. Identification of the ligand binding sites on and global curvature on the surfaces. Results of the molecular surface of proteins. the prediction can be interactively checked with our original surface viewer. Kengo Kinoshita and Haruki Nakamura8: 8In- stitute for Protein Research, Osaka University

Publications

Makita, Y., Nakao, M., Ogasawara, N., and Shiratori, A., Sudo, H., Hosoiri, T., Kaku, Y., Nakai, K., DBTBS: Database of transcriptional Kodaira, H., Kondo, H., Sugawara, M., Taka- regulation in Bacillus subtilis and its contribu- hashi,M.,Kanda,K.,Yokoi,T.,Furuya,T., tion to comparative genomics, Nucl. Acids Kikkawa,E.,Omura,Y.,Abe,K.,Kamihara, Res., 32: D75-D77, 2004. K.,Katsuta,N.,Sato,K.,Tanikawa,M., Suzuki, Y., Yamashita, R., Sugano, S., and Yamazaki, M., Ninomiya, K., Ishibashi, T., Nakai, K., DBTSS (DataBase of Transcriptional Yamashita, H., Murakaw,a K., Fujimori, K., Start Sites): Progress Report 2004, Nucl. Acids Tanai, H., Kimata, M., Watanabe, M., Hiraoka, Res., 32: D78-D81, 2004. S., Chiba, Y., Ishida, S., Ono, Y., Takiguchi, S., Ota,T.,Suzuki,Y.,Nishikawa,T.,Otsuki,T., Watanabe, S., Yosida, M., Hotuta, T., Kusano, Sugiyama, T., Irie, R., Wakamatsu, A., Hay- J., Kanehori, K., Takahashi-Fujii, A., Hara, H., ashi, K., Sato, H., Nagai, K., Kimura, K., Tanase, T.O., Nomura, Y., Togiya, S., Komai, Makita, H., Sekine, M., Obayashi, M., Nishi, F., Hara, R., Takeuchi, K., Arita, M., Imose, T., Shibahara, T., Tanaka, T., Ishii, S., N., Musashino, K., Yuuki, H., Oshima, A., Yamamoto, J., Saito, K., Kawai, Y., Isono, Y., Sasaki, N., Aotsuka, S., Yoshikawa, Y., Mat- Nakamura, Y., Nagahari, K., Murakami, K., sunawa, H., Ichihara, T., Shiohata, N., Sano, Yasuda, T., Iwayanagi, T., Wagatsuma, M., S., Moriya, S., Momiyama, H., Satoh, N., 176

Takami, S., Terashima, Y., Suzuki, O., Naka- Quackenbush J, Okazaki Y, Hayashizaki Y, gawa, S., Senoh, A., Mizoguchi, H., Goto, Y., Hide W, Chakraborty R, Nishikawa K, Suga- Shimizu, F., Wakebe, H., Hishigaki, H., wara H, Tateno Y, Chen Z, Oishi M, Tonellato Watanabe, T., Sugiyama, A., Takemoto, M., P, Apweiler R, Okubo K, Wagner L, Wiemann Kawakami, B., Yamazaki, M., Watanabe, K., S, Strausberg RL, Isogai T, Auffray C, Kumagai, A., Itakura, S., Fukuzumi, Y., Fuji- Nomura N, Gojobori T, and Sugano S., Inte- mori, Y., Komiyama, M., Tashiro. H., Tani- grative annotation of 21,037 human genes gami, A., Fujiwara, T., Ono, T., Yamada, K., validated by full-length cDNA clones, PLoS Fujii, Y., Ozaki, K., Hirao, M., Ohmori, Y., Biol., 2 (6): E162-, 2004. Kawabata, A., Hikiji, T., Kobatake, N., Ina- Horton, P., Mukai, Y., and Nakai, K., Chapter gaki, H., Ikema, Y., Okamoto, S,. Okitani, R., 14: Protein subcellular localization prediction, Kawakami, T., Noguchi, S., Itoh, T., Shigeta, in L. Wong (ed.), Practical Bioinformatician, K., Senba, T., Matsumura, K., Nakajima, Y., World Scientific Publishing Co., pp. 193-216, Mizuno, T., Morinaga, M., Sasaki, M., To- 2004. gashi, T., Oyama, M., Hata, H., Watanabe, M., De Hoon, M.J.L., Makita, Y., Imoto, S., Kobay- Komatsu, T., Mizushima-Sugano, J., Satoh, T., ashi, K., Ogasawara, N., Nakai, K., and Miy- Shirai, Y., Takahashi, Y., Nakagawa, K., Oku- ano, S., Predicting gene regulation by sigma mura, K., Nagase, T., Nomura, N., Kikuchi, factors in Bacilllus subtilis from genome-wide H., Masuho, Y., Yamashita, R., Nakai, K., data, Bioinformatics, 20 (Supp. 1): I101-I108, Yada, T., Nakamura, Y., Ohara, O., Isogai, T., 2004. Sugano, S., Complete sequencing and charac- Suzuki,Y.,Yamashita,R.,Shirota,M., terization of 21,243 full-length human cDNAs Sakakibara, Y., Chiba, J., Mizushima-Sugano, Nat. Genet., 36 (1): 40-45, 2004. J., Kel, A. E., Arakawa, T., Caminci, P., Kawai, Imanishi T, Itoh T, Suzuki Y, O’Donovan C, J., Hayashizaki, Y., Takagi, T., Nakai, K., and Fukuchi S, Koyanagi KO, Barrero RA, Tamura Sugano, S., Large-scale collection and charac- T, Yamaguchi-Kabata Y, Tanino M, Yura K, terization of promoters of human and mouse Miyazaki S, Ikeo K, Homma K, Kasprzyk A, genes, In silico Biol., 4: 0036-, 2004. Nishikawa T, Hirakawa M, Thierry-Mieg J, Suzuki,Y.,Yamashita,R.,Shirota,M., Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Sakakibara, Y., Chiba, J., Mizushima-Sugano, Thomas MA, Mulder N, Karavidopoulou Y, J., Nakai, K., and Sugano, S., Sequence com- Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, parison of human and mouse genes reveals a Suzuki Y, Yamasaki C, Takeda JI, Gough C, homolgous block structure in the promoter re- Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, gions, Genome Res., 14: 1711-1718, 2004. Bellgard M, Bonaldo Md M, Bono H, Brom- Poluliakh,N.,Konno,M.,Horton,P.,andNakai, berg SK, Brookes AJ, Bruford E, Carninci P, K., Parameter landscape analysis for common Chelala C, Couillault C, Souza SJ, Debily MA, motif discovery programs, Proc. 1st Ann. RE- Devignes MD, Dubchak I, Endo T, Estreicher COMB Satellite Workshop on Regulatory A, Eyras E, Fukami-Kobayashi K, R Gopinath Genomics, in press. G, Graudens E, Hahn Y, Han M, Han ZG, Bannai, H., Hyyro, H., Shinohara, A., Takeda, Hanada K, Hanaoka H, Harada E, Hashimoto M., Nakai, K., and Miyano, S., Finding opti- K, Hinz U, Hirai M, Hishiki T, Hopkinson I, mal pairs of patterns, Lecture Notes in Comp. Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Sci. (WABI 2004), 3240: 450-, 2004. Kasukawa T, Kelso J, Kersey P, Kikuno R, Inenaga, S., Bannai, H., Hyyro, H., Shinohara, Kimura K, Korn B, Kuryshev V, Makalowska A., Takeda, M., Nakai, K., and Miyano, S., I, Makino T, Mano S, Mariage-Samson R, Finding optimal pairs of cooperative and com- Mashima J, Matsuda H, Mewes HW, Mi- peting patterns with bounded distance, Lec- noshima S, Nagai K, Nagasaki H, Nagata N, ture Notes in Comp. Sci. (DS 2004), 3245: 32-, Nigam R, Ogasawara O, Ohara O, Ohtsubo 2004. M, Okada N, Okido T, Oota S, Ota M, Ota T, Kato, K., Yamashita, R., Matoba, R., Monden, Otsuki T, Piatier-Tonneau D, Poustka A, Ren M.,Noguchi,S.,Takagi,T.,andNakai,K., SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Cancer gene expression database (CGED): a Schupp I, Servant F, Sherry S, Shiba R, database for gene expression profiling with Shimizu N, Shimoyama M, Simpson AJ, accompanying clinical information of human Soares B, Steward C, Suwa M, Suzuki M, cancer tissues, Nucl. Acids Res., 33: D533-D Takahashi A, Tamiya G, Tanaka H, Taylor T, 536, 2005. Terwilliger JD, Unneberg P, Veeramachaneni Makita, Y., De Hoon, M.J.L., Ogasawara, N., Mi- V, Watanabe S, Wilming L, Yasuda N, Yoo yano, S., and Nakai, K., Bayesian joint predic- HS, Stodolsky M, Makalowski W, Go M, tion of associated transcription factors in Ba- Nakai K, Takagi T, Kanehisa M, Sakaki Y, cillus subtilis, Pacific Symposium on Biocom- 177

puting 2005 (Altman et al ed.), 507-518, World informatics, in press. Scientific, 2005. 山下理宇・中井謙太,第3編第1節 ゲノムの Bannai, H., Hyyro, H., Shinohara, A., Takeda, データベース,今中忠行監修 ゲノミクス・プ M., Nakai, K., and Miyano, S., An O(Nˆ2) al- ロテオミクスの新展開:生物情報の解析と応 gorithm for discovering optimal boolean pat- 用,エヌ・ティー・エス:786―791,2004. tern pairs, TCBBSI, in press. 中井謙太,ゲノムデータベース入門:つくる人と Yamashita, R., Suzuki, Y., Sugano, S., and 使う人のために,高木利久編 東京大学生物情 Nakai, K., Genome-wide analysis reveals 報科学学部教育特別プログラム:バイオイン strong correlation between CpG islands with フォマティクス概論,羊土社:100―110,2004. nearby transcription start sites of genes and 鈴木穣,山下理宇,中井謙太,菅野純夫,トラン their tissue-specificity, Gene, in press. スクリプトーム解析とトランスクリプトーム Kinoshita, K., and Nakamura, H., Identification データベース,小原・谷口・市川・猪飼編,バ of the ligand binding sites on molecular sur- イオ高性能機器・新技術利用マニュアル 蛋白 face of proteins, Protein Science, in press. 質核酸酵素8月号増刊49Â:1859―1865,2004. Tsuchiya, Y., Kinoshita, K., and Nakamura, H., 中井謙太,遺伝子の機能予測概論,松原謙一監 PreDs: A server for predicting dsDNA- 修,情報生物学講義¿―配列情報の科学財団法 binding site on protein molecular surface, Bio- 人国際高等研究所,55頁,2004.