136

Human Genome Center Laboratory of Genome Database Laboratory of Sequence Analysis ゲノムデータベース分野 シークエンスデータ情報処理分野

Professor Minoru Kanehisa, Ph.D. 教授(委嘱)理学博士金久 實 Research Associate Toshiaki Katayama, M.Sc. 助 手 理学修士 片山俊明 Research Associate Shuichi Kawashima, M.Sc. 助 手 理学修士 川島秀一 Lecturer Tetsuo Shibuya, Ph.D. 講 師 理学博士 渋谷哲朗 Research Associate Michihiro Araki, Ph.D. 助 手 薬学博士 荒木通啓

Owing to continuous developments of high-throughput experimental technologies, ever-increasing amounts of data are being generated in functional genomics and proteomics. We are developing a new generation of databases and computational technologies, beyond the traditional genome databases and sequence analysis tools, for making full use of such large-scale data in biomedical applications, espe- cially for elucidating cellular functions as behaviors of complex interaction systems.

1. Comprehensive repository for community We have been developing the server based on genome annotation open source software including BioRuby, BioPerl, BioDAS and GMOD/GBrowse to make Toshiaki Katayama, Mari Watanabe and Mi- the system consistent with the existing open noru Kanehisa standards. The contents of the KEGG DAS data- base can be accessed graphically in a web KEGG DAS is an advanced genome database browser using GBrowse GUI (graphical user in- system providing DAS (Distributed Annotation terface) and also programatically by the DAS System) service for all organisms in the protocol. The DAS, which is an XML over HTTP GENOME and databases in KEGG data retrieving protocol, enables the user to (Kyoto Encyclopedia of Genes and Genomes). write various kinds of automated programs for Currently, KEGG DAS contains 6,943,951 anno- analyzing genome sequences and annotations. tations for genome sequences of 441 organisms. For example, by combining KEGG DAS with The KEGG DAS server provides annota- KEGG API, a program to retrieve upstream se- tions linked to the KEGG PATHWAY and quences of a given set of genes which have LIGAND databases, as well as the SSDB data- similar expression patterns on the same path- base containing paralog, ortholog and motif in- way,canbewrittenveryeasily.GBrowse,the formation. In addition to the coding genes, in- graphical interface, enables user to browse, formation of non-coding RNAs predicted using search, zoom and visualize a particular region Rfam database is also provided to fill the anno- of the genome. Moreover, users are also able to tation of the intergenic regions of the genome. add their own annotations onto the GBrowse 137 view by providing another DAS server or by edge of molecular interaction/reaction pathways simply uploading their own data as a file. This and other systemic functions (PATHWAY and functionality enables researchers to add various BRITE databases), the information about the annotations on the genome and by sharing their genomic space (GENES database), and informa- annotations with the community they can con- tion about the chemical space (LIGAND data- tinuously refine the genome annotation, so- base). KEGG API provides valuable means to re- called “community annotation.” This year we trieve various kinds of information stored in the have updated the GBrowse genome browser to KEGG and has become an increasingly popular the latest version, which has improved user in- mode of access. Recently, we have introduced terface. We are also responsible to Japanese lo- several new methods to search compounds, calization of the browser. The KEGG DAS is drugs and glycans by their structure, mass, com- weekly updated and freely available at http:// position and annotations. Additionally, the das.hgc.jp/. methods to colorize the PATHWAY diagram is enhanced to control the different elements shar- 2. Automatic assignments of orthologs and ing the same name can be distinguished by their paralogs in complete genomes ID. The KEGG API is available at http://www. genome.jp/kegg/soap/. Toshiaki Katayama and Minoru Kanehisa 4. EGassembler: web server for large-scale Accession of the number of sequenced geno- clustering and assembling ESTs and mes made it difficult to characterize orthologous genomic DNA fragments relationship among organisms. We are develop- ing a computational method for finding appro- Ali Masoudi-Nejad, Shuichi Kawashima, Koi- priate orthologous gene clusters automatically. It chiro Tonomura, Masanori Suzuki, Minoru is based on a graph analysis of the KEGG SSDB Kanehisa database, containing sequence similarity rela- tions among all the genes in the completely EST sequencing has proven to be an economi- sequenced genomes. The nodes of the SSDB cally feasible alternative for gene discovery in graph are genes and the edges are the Smith- species lacking a draft genome sequence. Ongo- Waterman sequence similarity scores computed ing large-scale EST sequencing projects feel the by the SSEARCH program. The edges are not need for bioinformatics tools to facilitate uni- only weighted but also directed, indicating the form ESTs handling. This brings about a re- best (top-scoring) hit when a gene in an organ- newed importance to a universal tool for proc- ism is compared against all genes in another or- essing and functional annotation of large sets of ganism. Thus, a highly connected cluster of ESTs in order to cover the complete transcrip- nodes containing a number of bidirectional best tome of an organism. EGassembler (http://egas- hits might be considered an ortholog cluster sembler.hgc.jp/) is a web server, which provides consisting of functionally identical genes. Such a an automated as well as a user-customized cluster can be found by our heuristic method for analysis tool for cleaning, repeat masking, vec- finding quasi-cliques, but the SSDB graph is too tor trimming, organelle masking, clustering and large to perform quasi-clique finding at a time. assembling of ESTs and genomic fragments. It is Therefore, we introduce a hierarchy (evolution- also designed to serve as a standalone web ap- ary relationship) of organisms and treat the plication for each one of those processes. The SSDB graph as a nested graph. However, the web server is freely available and provides the method still requires large computation time community with a unique all-in-one online ap- along with the number of organism increases, plication web service for large scale ESTs and we are trying to refine the process to make it genomic DNA clustering and assembling, espe- faster and accurate. cially for EST processing and annotation pro- jects. 3. SOAP/WSDL interface for the KEGG sys- tem 5. New version of MAGEST database

Toshiaki Katayama, Shuichi Kawashima and Shuichi Kawashima and Minoru Kanehisa Minoru Kanehisa We have developed a new version of MAG- We have continued to develop KEGG API, a EST database. Previously MAGEST was de- web service to facilitate usability of the KEGG signed as a database for maternal gene expres- system. KEGG is a suite of databases and associ- sion information for an ascidian, Halocynthia ated software, integrating our current knowl- roretzi. In the new version of MAGEST, it is ex- 138 tended to four embryonic developmental stages developed a high performance database entry including the following additional stages, e.g. retrieval system, named HiGet. The HiGet sys- early cleavage, early gastrula and early neurula. tem is constructed on the HiRDB, a commercial Furthermore, we constructed gene clusters and ORDBMS (Object-oriented Relational Database assembled sequences of ESTs by EGassembler. Management System) developed by Hitachi, These clusters enabled us to cross-refer the gene Ltd. It is publicly accessible on the Web page at expression of the four different stages. The new http://higet.hgc.jp/ or SOAP based web service web site is accessible at http://magest.hgc.jp/. at http://higet.hgc.jp/soap/. HiGet can execute Now we are comparing the MAGEST sequences full text search on various biological databases. with other species of ascidian, Ciona sp. Because In addition to the original plain format, the sys- the phylogenetic position of H. roretzi is a good tem contains data in the XML format in order to outgroup for Enterogonia to which Ciona sp. be- provide a field specific search facility. When a longs, we expect that the result from this com- complicated search condition is issued to the parative analysis lead us to understand the di- system, the search processing is executed effi- versification of gene families among Urocho- ciently by combining several types of indices to data. reduce the number of records to be processed within the system. Current searchable databases 6. SSS: a sequence similarity search service areGenBank,UniProt,Prosite,OMIM,PDBand RefSeq. We are planning to include other valu- Toshiaki Katayama, Kazuhiro Ohi, Minoru able databases and also planning to develop an Kanehisa inter-database search interface and a complex search facility combining keyword search and There are various services in the world to find sequence similarity search. similar sequences from the database, such as the famous BLAST service provided at NCBI. How- 8. Development of algorithms for biosyn- ever, the method to search and the database to thetic process analysis be searched could not be added from outside. To provide our super computer resources at the Kohichi Suematsu, Tetsuo Shibuya, Michihiro Center to the research commu- Araki, and Minoru Kanehisa nity, we started to develop a new service for the sequence similarity search, SSS. In SSS, user can We developed algorithms for identifying bio- select the search algorithm from BLAST, FASTA, synthetic process of medicinal products by util- SSEARCH, TRANS and EXONERATE. This va- izing the database of sub-molecular building riety of options is unique among the public blocks in biosynthetic processes. The problem is services. Then user is prompted to select appro- to find the most reasonable decomposition of a priate database depending on the algorithm se- graph into subgraphs which are annotated in lected and the search is executed. On the back- the database. We have developed new efficient end, we implemented the search system on the algorithms and tools based on the algorithms Sun Grid Engine to provide efficient resources for the problem, though the problem is a very on distributed computers. As a result, we are difficult NP-hard problem. able to provide time consuming services such as TRANS and EXONERATE in addition to the 9. Geometric Suffix Tree: A New Data struc- popular algorithms. The SSS service is freely ture for Indexing Structures available at http://sss.hgc.jp/. Tetsuo Shibuya 7. High performance database entry retrieval system Protein structure analysis is one of the most important research issues in the post-genomic Kazutomo Ushijima, Chiharu Kawagoe, Toshi- era, and faster and more accurate index data aki Katayama, Shuichi Kawashima, Kenta structures for such 3-D structures are highly de- Nakai, Minoru Kanehisa sired for research on . We proposed a new data structure for indexing protein 3-D Recently, the number of entries in biological structures. For strings, there are many efficient databases is exponentially increasing year by indexing structures such as suffix trees, but it year. For example, there were 10,106,023 entries has been considered very difficult to design in the GenBank database in the year 2000, which such sophisticated data structures against 3-D has now grown to 65,771,589 (Release 156 + structures like proteins. Our index structure is daily updates). In order for such a vast amount based on the suffix trees and is called the geo- of data to be searched at a high speed, we have metric suffix tree. By using the geometric suffix 139 tree for a set of protein structures, we can search matsu, and Minoru Kanehisa for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit- Medicinal natural products have been the ma- vector root mean square deviations) to a given jor sources of bioactive compounds with diverse query 3-D structure are not larger than a given pharmacological activities, and are enzymically bound. Though there are O (N 2) substructures, synthesized as secondary metabolites for specific our data structure requires only O (N )space biological purposes. The diverse activities are where N is the sum of lengths of the set of pro- obviously explained by extensive diversities in teins. We propose an O (N 2) construction algo- the chemical structures. Although several hy- rithm for it, while a naive algorithm would re- potheses have been proposed to understand the quire O (N 3) time to construct it. Moreover we origin of the chemical diversity, no systematic propose an efficient search algorithm. Experi- analyses have been done so far. We thus define ments show that we can search for similar struc- molecular building blocks required for describ- tures much faster than naive RMSD computa- ing the chemical information of natural products tion against all over the database. The experi- to elucidate how chemical diversities are de- ments also show that the construction time of signed in the biosynthetic circuits. Each natural the geometric suffix tree is practically almost product is expressed as a combination of mo- linear to the size of the database, when applied lecular building blocks with corresponding en- to a protein structure database. zymatic information as links between building blocks to be collected in a database. The knowl- 10. Protein Function Prediction based on 3-D edge database constructed from various re- Structure Motifs sources enables us to identify the system struc- tures of the biosynthetic circuits. We also extract Hiroki Sakai and Tetsuo Shibuya distinctive network structures consisted of both chemical and genomic information, which will Protein functions are said to be determined by be useful for understanding the origin of chemi- its 3-D structures, but not all functions have cal diversity in the biosynthetic circuits. been known to be related to some 3-D structure motifs. The geometric suffix tree, a data struc- 13. Extracting A Strategy for Drug Design by ture for indexing 3-D protein structures, which Tracing Drug Development is also developed by us, enables comprehensive enumeration of all the possible structural motifs Daichi Shigemizu, Michihiro Araki and Mi- among given set of proteins. We are developing noru Kanehisa a new algorithm based on the support vector machine that decides protein’s function from the Current drugs are mostly derived by modifi- 3-D structure of a protein. This algorithm util- cation of known drug structures or from lead izes all the possible 3-D motifs found by using structures to be optimized for targeting new the geometric suffix tree. molecules or obtaining improved efficacy. To ex- plore a design rule in drug development, it is 11. Genotype Clustering based on Hidden necessary to focus on the empirical modifica- Markov Models tions to construct a knowledge base for tracing the chemical evolutions. We have been collect- Ritsuko Onuki and Tetsuo Shibuya ing data on the drug evolutions from databases and literatures, which has already been imple- Analysis of single nucleotide polymorphisms mented on the drug structure maps in the (SNPs) is one of the most important research KEGG DRUG database. In order to trace the topics in the post-genomic era. Recently, the chemical development in the database, binary genotype data increases exponentially and de- relations of chemical structures were defined be- velopment of tools for analyzing these data is tween before and after the chemical changes. highly desired. We are developing a new algo- We subsequently compared chemical structures rithm for clustering genotypes, by utilizing the in the binary relations to define functional at- hidden markov model (HMM) that infers haplo- oms and groups supposed to be important for types from genotpyes. drug development. We have been collecting a series of such transformation patterns in the 12. The Origin of Chemical Diversity in Bio- process of drug development to analyze and synthetic Circuits of Medicinal Natural provide a design strategy in drug development. Products 14. Comprehensive Analysis of Distinctive Michihiro Araki, Tetsuo Shibuya, Kohichi Sue- Polyketide and Nonribosomal Peptide 140

Structural Motifs Encoded in Microbial sented as the sequences of carbonyl/peptidyl Genomes units to extract the structural motifs effeciently. We applied our method to 4,529 PKSs/NRPSs Yohsuke Minowa, Michihiro Araki and Mi- and found 619 PKs/NRPs. We also collected noru Kanehisa 1,449 PKs/NRPs whose chemical structures have been experimentally determined. The We developed a highly accurate method to structural sequences were compared using the predict polyketide (PK) and nonribosomal pep- Smith-Waterman algorithm, and clustered into tide (NRP) structures encoded in the microbial 271 clusters. From the compound clusters, we genome. PKs/NRPs are polymers of carbonyl or extracted 33 structural motifs which are signifi- peptidyl chains synthesized by polyketide syn- cantly related with their bioactivities. The inte- thases (PKS) and nonribosomal peptide syn- grative analysis of genomic and chemical infor- thetases (NRPS). We analyzed domain se- mation here will provide a strategy to predict quences corresponding to specific substrates and the chemical structures, the biosynthetic path- physical interactions between PKSs/NRPSs in ways, and the biological activities of PKs/NRPs, order to predict which substrates are selected which is useful for the rational design of novel and assembled into highly ordered chemical PKs/NRPs. structures. The predicted PKs/NRPs were repre-

Publications

Yoshizawa AC, Kawashima S, Okuda S, Fujita Genome Inform. Ser. Workshop Genome In- M, Itoh M, Moriya Y, Hattori M, Kanehisa M. form. 2006, 17(1): 184-193, 2006 Extracting sequence motifs and the phyloge- Shibuya, T., Geometric Suffix Tree: A New In- netic features of SNARE-dependent mem- dex Structure for Protein 3-D Structures, Com- brane traffic. Traffic. 7(8): 1104-1118, 2006. binatorial Pattern Matching 2006 (CPM 2006), Masoudi-Nejad A, Tonomura K, Kawashima S, LNCS 4009, Barcelona, July 5-7, 2006, pp. 84- Moriya Y, Suzuki M, Itoh M, Kanehisa M, 93. Endo T, Goto S. EGassembler: online bioinfor- Suematsu, K., Araki, M., and Shibuya, T. Graph matics service for large-scale processing, clus- tiling algorithm for molecular synthesis analy- tering and assembling ESTs and genomic sis, SIGAL 106, May 18, 2006, Ikaho, pp. 25- DNA fragments. Nucleic Acids Res. 34: W459- 32. 62, 2006. Shibuya, T., Geometric Suffix Tree: A New In- Kanehisa,M.,Goto,S.,Hattori,M.,Aoki-Kino- dex Structure for Protein 3-D Structures, IPSJ shita, K.F., Itoh, M., Kawashima, S., Kata- SIG Notes SIGAL 105-3, March 18 2006, At- yama,T.,Araki,M.,andHirakawa,M.From sugi, pp. 17-24. genomics to chemical genomics: new develop- Shibuya, T., Combinatorial Structural Pattern ments in KEGG. Nucleic Acids Res. 34, D354- Matching, The Second Japan-Taiwan Bilateral 357 (2006). Symposium on Bioinformatics, 2006. Hashimoto, K., Goto, S., Kawano, S., Aoki-Kino- Shibuya, T., Indexing Structures for Biomolecu- shita, K.F., Ueda, N., Hamajima, M., Kawa- lar Structures, The First Japan-Taiwan Bilat- saki, T., and Kanehisa, M. KEGG as a glycome eral Symposium on Bioinformatics, 2006. informatics resource. Glycobiology 16, 63R-70 バイオインフォマティクス事典,日本バイオイン R (2006). フォマティクス学会編,共立出版,2006. Honda, W., Kawashima, S., Kanehisa, M. Me- 荒木通啓,金久實「創薬科学のためのKEGG」化 tabolite antigens and pathway incompatibility. 学工業 Vol.57№3,42―46,2006. 141

Human Genome Center Laboratory of DNA Information Analysis DNA情報解析分野

Professor Satoru Miyano, Ph.D. 教 授 理学博士 宮 野 悟 Assistant Professor Seiya Imoto, Ph.D. 助手 博士(数理学) 井元清哉 Assistant Professor Masao Nagasaki, Ph.D. 助手 博士(理学) 長 “ 正朗 Project Assistant Professor Ryo Yoshida, Ph.D. 特任助手 博士(理学) 吉 田 亮

The aim of the research at this laboratory is to establish computational methodolo- gies for discovering and interpreting information of nucleic acid sequences, pro- teins and some other experimental data arising from researches in Genome Sci- ence. Our current interests are focused on Computational Systems Biology and its related computational techniques. Apart from the research activity, the laboratory has been providing bioinformatics software tools and has been taking a leading part in organizing an international forum for Genome Informatics.

1. Computational Systems Biology -series proteomic data to our HFPNe model. We constructed an epidermal growth factor receptor a. Modeling and estimation of dynamic EGFR signal transduction pathway model (EFGR pathway by data assimilation approach us- model) by using the biological data available in ing time series proteomic data the literature. Then, the kinetic parameters were determined in the data assimilation (DA) frame- Shinya Tasaki, Masao Nagasaki, Masaaki work with some manual tuning so as to fit the Oyama, Hiroko Hata, Kazuko Ueno, Ryo proteomic data published by Blagoev et al. (Nat. Yoshida, Tomoyuki Higuchi, Sumio Sugano, Biotechnol., 22: 1139. 1145, 2004). This in silico Satoru Miyano model was further refined by adding or remov- ing some regulation loops using biological back- Cell Illustrator is a model building tool based ground knowledge. The DA framework was on the Hybrid Functional Petri net with exten- used to select the most plausible model from sion (HFPNe). By using Cell Illustrator, we have among the refined models. By using the pro- succeeded in modeling biological pathways, e.g., teomic data, we semi-automatically constructed metabolic pathways, gene regulatory networks, a well-tuned EGFR HFPNe model by using the microRNA regulatory networks, cell signaling Cell Illustrator coupled with the DA framework. networks, and cell-cell interactions. The recent development of tandem mass spectrometry cou- b. Genomic data assimilation for estimating pled with liquid chromatography (LC/MS/MS) hybrid functional Petri net from time- technology has enabled researchers to quantify course gene expression data the dynamic profile of a wide range of proteins within the cell. The proteomic data obtained by Masao Nagasaki, Rui Yamaguchi, Ryo Yoshi- using LC/MS/MS has been considerably useful da, Seiya Imoto, Atsushi Doi, Yoshinori for introducing dynamics to the HFPNe model. Tamada, Hiroshi Matsuno, Satoru Miyano, To- Here, we report the first introduction of the time moyuki Higuchi 142

We propose an automatic construction the cell fate determination model of two gusta- method of the hybrid functional Petri net as a tory neurons of Caenorhabditis elegans…ASE simulation model of biological pathways. The left (ASEL) and ASE right (ASER). These neu- problems we consider are how we choose the rons are morphologically bilaterally symmetric values of parameters and how we set the net- but physically asymmetric in function. Johnston work structure. Usually, we tune these un- et al. have suggested that their cell fate is deter- known factors empirically so that the simulation mined by the double-negative feedback loop in- results are consistent with biological knowledge. volving the lsy-6 and mir-273 microRNAs. Our Obviously, this approach has the limitation in simulation model confirms their hypothesis. In thesizeofnetworkofinterest.Toextendtheca- addition, other well-known mutants that are re- pability of the simulation model, we propose the lated with the double-negative feedback loop use of data assimilation approach that was origi- are also well-modeled. The new upstream regu- nally established in the field of geophysical lator of lsy-6 (lsy-2) that is mentioned in another simulation science. We provide genomic data as- paper is also integrated into this model for the similation framework that establishes a link be- mechanism of switching between ASEL and tween our simulation model and observed data ASER without any contradictions. Therefore, the like microarray gene expression data by using a HFPNe-based modeling will be one of the nonlinear state space model. A key idea of our promising modeling methods and simulation ar- genomic data assimilation is that the unknown chitectures that illustrate microRNA regulatory parameters in simulation model are converted networks. as the parameter of the state space model and the estimates are obtained as the maximum a d. A combined pathway to simulate CDK- posteriori estimators. In the parameter estima- dependent phosphorylation and ARF- tion process, the simulation model is used to dependent stabilization for p53 transcrip- generate the system model in the state space tional activity model. Such a formulation enables us to handle both the model construction and the parameter Atsushi Doi, Masao Nagasaki, Kazuko Ueno, tuning within a framework of the Bayesian sta- Hiroshi Matsuno, Satoru Miyano tistical inferences. In particular, the Bayesian ap- proach provides us a way of controlling overfit- The protein p53 is phosphorylated by a mem- ting during the parameter estimations that is es- ber of protein kinases such as CDK7, and stabi- sential for constructing a reliable biological lized by the protein ARF. The phosphorylation pathway. We demonstrate the effectiveness of and stabilization of p53 is believed to enhance our approach using synthetic data. As a result, its transcriptional activity and act simultane- parameter estimation using genomic data as- ously. Biological pathways composed of experts similation works very well and the network knowledge obtained from the literature are in- structure is suitably selected. cluding these activation mechanisms. However, the map of biological pathways does not reflect c. Cell fate simulation model of gustatory the combination effect of phosphorylation and neurons with microRNAs double-negative stabilization. We have conducted some simula- feedback loop by hybrid functional Petri tions of biological pathways with hybrid func- net with extension tional Petri net (HFPN) after careful reading of papers. In this paper, we constructed the HFPN Ayumu Saito, Masao Nagasaki, Atushi Doi, based biological pathway of CDK-dependent Kazuko Ueno, Satoru Miyano phosphorylation pathway and combine with ARF-dependent pathway described previously, Biological regulatory networks have been ex- to observe the effect of the phosphorylation on tensively researched. Recently, the microRNA the stabilization with simulation-based valida- regulation has been analyzed and its importance tion. has increasingly emerged. We have applied the Hybrid Functional Petri net with extension e. Analysis of gene networks for drug target (HFPNe) model and succeeded in creating discovery and validation model biological pathways, e.g. metabolic path- ways, gene regulatory networks, cell signaling Seiya Imoto, Yoshinori Tamada, Christopher J. networks, and cell-cell interaction models with Savoie, Satoru Miyano one of the HFPNe implementations Cell Illustra- tor.Thus,wehaveappliedHFPNetomodel Understanding responses of cellular system regulatory networks that involve a new key for a dosing molecule is one of the most impor- regulator microRNA. As a test case, we selected tant problems in pharmacogenomics. In this 143 chapter, we describe computational methods for model and analyze signaling pathways by using identifying and validating drug target genes Petri net. Firstly, we propose a modeling based on the gene network estimated from mi- method based on Petri net by paying attention croarray gene expression data. We use two to the molecular interactions and mechanisms. types of microarray gene expression data: One is Then, we introduce a new notion “activation gene disruptant microarray data and the other is transduction component” in order to describe an time-course drug response microarray data. For enzymic activation process of reactions in sig- this purpose, the information of gene networks naling pathways and shows its correspondence plays an essential role and is unattainable from to a so-called elementary T-invariant in the Petri clustering methods, which are the standard for net models. Further, we design an algorithm to gene expression analysis. The gene network is effectively find basic enzymic activation proc- estimated from disruptant microarray data by esses by obtaining a series of elementary T- the Bayesian network model and then the pro- invariants in the Petri net models. Finally, we posed method automatically identifies sets of demonstrate how our method is practically used genes or gene regulatory pathways affected by in modeling and analyzing signaling pathway thedrug.Weuseanactualexamplefromanaly- mediated by thrombopoietin as an example. sis of Saccharomyces cerevisiae gene expression profile data to express a concrete strategy for 2. Statistical and Computational Knowledge the application of gene network information to- Discovery ward drug target discovery. Key words: Drug target, gene network, Bayesian network, Boolean a. A statistical framework for genome-wide network, microarray data, Bayes statistics. discovery of biomarker splice variations with GeneChip Human Exon 1.0 ST arrays f. Finding module-based gene networks in time-course gene expression data with Ryo Yoshida, Kazuyuki Numata, Seiya Imoto, state space models Masao Nagasaki, Atsushi Doi, Kazuko Ueno, Satoru Miyano Rui Yamaguchi, Ryo Yoshida, Seiya Imoto, Tomoyuki Higuchi, Satoru Miyano Alternative splicing is an important regulatory mechanism that generates multiple mRNA tran- We discuss the use of the state space models scripts which are transcribed into functionally to analyze time-course microarray gene expres- diverse proteins. According to the current stud- sion data. Typical features of time-course mi- ies, aberrant transcripts due to splicing muta- croarray data are short time-course and high- tions are known to cause for 15% of genetic dis- dimensional observational vector. By these as- eases. Therefore understanding regulatory me- pects, conventional statistical models such as the chanism of alternative splicing is essential for multivariate autoregressive models lead to un- identifying potential biomarkers for several suitable results due to the overfitting. The state types of human diseases. Most recently, advent space models have the potential to overcome of GeneChipÑ Human Exon 1.0 ST Array en- this problem by the dimension reduction proc- ables us to measure genome-wide expression ess. This paper provides (1) a survey of the state profiles of over one million exons. With this space models, (2) a review of some existing re- new microarray platform, analysis of functional searches using the state space models for time- gene expressions could be extended to detect course microarray data together with a biologi- not only differentially expressed genes, but also cal meaning of the model, (3) a solution for the a set of specific-splicing events that are differen- lack of parameter identifiability, and (4) identifi- tially observed between one or more experimen- cation of biological system that is a central topic tal conditions, e.g. tumor or normal control cells. in bioinformatics. Finally, we show the useful- In this study, we address the statistical problems ness of the state space models for time-course to identify differentially observed splicing vari- microarray data by the analysis of Saccharomy- ations from exon expression profiles. The pro- ces cerevisiae cell cycle gene expression data. posed method is organized according to the fol- lowing process: (1) Data preprocessing for re- g. Structural modeling and analysis of sig- moving systematic biases from the probe inten- naling pathways based on Petri nets sities. (2) Whole transcript analysis with the analysis of variance (ANOVA) to identify a set Chen Li, Shunichi Suzuki, Qi-Wei Ge, Mitsuru of loci that cause the alternative splicing-related Nakata, Hiroshi Matsuno, Satoru Miyano to a certain disease. We test the proposed statis- tical approach on exon expression profiles of The purpose of this study is to discuss how to colorectal carcinoma. The applicability is veri- 144 fied and discussed in relation to the existing bio- own interests in these aquatic animals. How- logical knowledge. This paper intends to high- ever, the data obtained in fish studies not only light the potential role of statistical analysis of satisfy their own interests but often contribute all exon microarray data. Our work is an impor- more generally to the studies of other verte- tant first step toward development of more ad- brates, including mammals. The life of fishes is vanced statistical technology. Supplementary in- characterized by the aquatic habitat, which de- formation and materials are available from http: mands many physiological adjustments distinct //bonsai.ims.u-tokyou.ac.jp/~ yoshidar / IBSB from the terrestrial life. Among them, body fluid 2006_ExonArray.htm regulation is of particular importance as the body fluids are exposed to media of varying sa- b. Machine learning prediction of amino acid linities only across the thin respiratory epithelia patterns in protein N-myristoylation of the gills. Endocrine systems play pivotal roles in the homeostatic control of body fluid balance. Ryo Okada, Chigusa Miyakawa, Manabu Judging from the habitat-dependent control Sugii, Hiroshi Matsuno, Satoru Miyano mechanisms, some osmoregulatory hormones of fish should have undergone functional and mo- Protein N-myristoylation is the lipid modifica- lecular evolution during the ecological transition tion in which the 14-th carbon saturated fatty to the terrestrial life. In fact, water-regulating acid binds covalently to N-terminal of virus- hormones such as vasopressin are essential for based and eukaryotic protein. In this study, we survival on the land, whereas ion-regulating suggest an approach to predict the pattern of N- hormones such as natriuretic peptides, guany- myristoylation signal using the machine learn- lins and adrenomedullins are diversified and ex- ing system BONSAI. BONSAI finds rules in hibit more critical functions in aquatic species. combination of an alphabet indexings and deci- In this short review, we introduce some exam- sion trees. Computational experiments with ples illustrating how comparative fish studies BONSAI classified amino acid residues depend- contribute to general endocrinology by taking ing on effect for N-myristoylation and found advantage of such differences between fishes rules of the alphabet indexing. In addition, and tetrapods. In a functional context, fish stud- BONSAI suggested new requirements for the ies often afford a deeper understanding of the position of an amino acid in N-myristoylation essential actions of a hormone across vertebrate singal. taxa. Using the natriuretic peptide family as an example, we suggest that more functional stud- c. Contribution of comparative fish studies to ies on fishes will bring similar rewards of un- general endocrinology: structure and func- derstanding. At the molecular level, recent es- tion of some osmoregulatory hormones tablishment of genome databases in fishes and mammals brings clues to the evolutionary his- Yoshio Takei, Akatsuki Kawakoshi, Takehiro tory of hormone molecules via a comparative Tsukada, Shinya Yuge, Maho Ogoshi, Koji genomic approach. Because of the functional Inoue, Susumu Hyodo, Hideo Bannai, Satoru and molecular diversification of ion-regulating Miyano hormones in fishes, this approach sometimes leads to the discovery of new hormones in Fish endocrinologists are commonly moti- tetrapods as exemplified by adrenomedullin 2. vated to pursue their research driven by their

Publications

1. DeLisi,C.,Kanehisa,M.,Heinrich,R.,Miy- transcriptional activity. Genome Informatics. ano, S. (Eds.) Genome Informatics. 17 (1), 17 (1): 112-123, 2006. 2006. 4. Imoto, S., Miyano, S. Bayesian network ap- 2. Doi, A., Nagasaki, M., Matsuno, H., Miy- proach to estimate gene networks. in A. ano, S. Simulation based validation of the p Mittal, A. Kassim and T. Tan (Eds.), Baye- 53 transcriptional activity with hybrid unc- sian Network Technologies: Applications tional Petri net. In Silico Biology. 6 (1-2): 1- and Graphical Models, Idea Group Publish- 13, 2006. ers, USA. In press. 3. Doi, A., Nagasaki, M., Ueno, K., Matsuno, 5. Imoto, S., Tamada, Y., Araki, H., Yasuda, H., Miyano, S.A combined pathway to K., Print, C.G., Charnock-Jones, S.D., Sand- simulate CDK-dependent phosphorylation ers,D.,Savoie,C.J.,Tashiro,K.,Kuhara,S., and ARF-dependent stabilization for p53 Miyano, S. Computational strategy for dis- 145

covering druggable gene networks from 16. Saito, A., Nagasaki, M., Doi, A., Ueno, K., genome-wide RNA expression profiles. Pa- Miyano,S.Cellfatesimulationmodelof cific Symposium on Biocomputing. 11: 559- gustatory neurons with microRNAs double- 571, 2006. negative feedback loops by hybrid func- 6. Imoto, S., Tamada, Y., Savoie, C.J., Miyano, tional Petri net with extension. Genome In- S., Analysis of gene networks for drug tar- formatics 17 (1): 100-111, 2006. get discovery and validation. Methods in 17. Sakakibara, Y., Smith, T.F., Kanehisa, M., Molecular Biology. 360: 33-56, 2006. Miyano, S., Takagi, T. (Eds.) Genome Infor- 7. Imoto, S., Higuchi, T., Goto, T., Miyano, S. matics. 17 (2), 2006. Error tolerant model for incorporating bio- 18. Takei, Y., Kawakoshi, A., Tsukada, T., Yuge, logical knowledge with expression data in S.,Ogoshi,M.,Inoue,K.,Hyodo,S.,Bannai, estimating gene networks. Statistical Meth- H., Miyano, S. Contribution of comparative odology. 3 (1): 1-16, 2006. fish studies to general endocrinology: struc- 8. Jeong, E., Miyano, S.A weighted profile ture and function of some osmoregulatory based method for protein-RNA interacting hormones. J. Experimental Zoology. Part A, residue prediction. Transactions on Compu- Comparative Experimental Biology. 305 (9): tational Systems Biology. 123-139, 2006. 787-798, 2006. 9. Li, C., Suzuki, S., Ge, Q.-W., Nakata, M., 19. Tamada, Y., Imoto, S., Miyano, S. Estimat- Matsuno, H., Miyano, S. Structural model- ing gene networks from gene expression ing and analysis of signaling pathways data utilizing biological information. Proc. based on petri nets. J. Bioinf. Comput. Biol. Inst. Statist. Math. 54 (2): 333-356, 2006. 4 (5): 1119-1140, 2006. 20. Tasaki, S., Nagasaki, M., Oyama, M., Hata, 10. Matsuno, H., Inouye, S.-T., Okitsu, Y., Fujii, H.,Ueno,K.,Yoshida,R.,Higuchi,T., Y., Miyano, S.A new regulatory interactions Sugano, S., Miyano, S. Modeling and esti- suggested by simulations for circadian ge- mation of dynamic EGFR pathway by data netic control mechanism in mammals. J. assimilation approach using time series pro- Bioinf. Comput. Biol. 4 (1): 139-154, 2006. teomic data. Genome Informatics. 17 (2): 11. Matsuno, H., Li, C., Miyano, S. Petri net 226-228, 2006. based description for systematic under- 21.Washio,T.,Higuchi,T.,Imoto,S.,Tamada, standing of biological pathways. IEICE Y., Sato, K., Motoda, H. Graph mining and Trans. Fundamentals. E89-A (11): 3166-3174, its application to statistical modeling. Proc. 2006. Inst. Statist. Math. 54 (2): 315-331, 2006. 12. Miyano, S. (Ed.) Special RECOMB 2005 Is- 22. Yamaguchi, R., Yoshida, R., Imoto, S., sue. J. Comp. Biol. 13 (2), 2006. Higuchi, T., Miyano, S. Finding module- 13. Nagasaki, M., Yamaguchi, R., Yoshida, R., based gene networks in time-course gene Imoto, S., Doi, A., Tamada, Y., Matsuno, H., expression data with state space models. Miyano, S., Higuchi, T. Genomic data as- IEEE Signal Processing Magazine. In press. similation for estimating hybrid functional 23. Yoshida, R., Higuchi, T., Imoto, S., Miyano, Petri net from time-course gene expression S. ArrayCluster: an analytic tool for cluster- data. Genome Informatics. 17 (1). 46-61, ing, data visualization and module finder 2006. on gene expression profiles. Bioinformatics. 14. Nakamichi, R., Imoto, S., Miyano, S. Statisti- 22 (12): 1538-1539, 2006. cal model selection method to analyze com- 24. Yoshida, R., Numata, K., Imoto, S., Na- binatorial effects of SNPs and environ- gasaki, M., Doi, A., Ueno, K., Miyano, S.A mental factors for binary disease. Interna- statistical framework for genome-wide dis- tional Journal on Artificial Intelligence covery of biomarker splice variations with Tools. 15 (5), 711-724, 2006. GeneChip Human Exon 1.0 ST arrays. 15. Okada, R., Sugii, M., Matsuno, H., Miyano, Genome Informatics. 17 (1): 88-99, 2006. S. Machine learning prediction of amino 25.宮野 悟,江口至洋,金久 實,高木利久, acid patterns in protein N-myristoylation. 中井謙太(編).バイオインフォマティクス事 Lecture Notes in Bioinformatics. 4146: 4-14, 典.共立出版.2006. 2006. 146

Human Genome Center Laboratory of Molecular Medicine Laboratory of Genome Technology Division of Advanced Clinical Proteomics ゲノムシークエンス解析分野 シークエンス技術開発分野 先端臨床プロテオミクス共同研究ユニット

Professor Yusuke Nakamura, M.D., Ph.D. 教 授 医学博士 中村祐輔 Assistant Professor Hidewaki Nakagawa, M.D., Ph.D. 助 手 医学博士 中川英刀 Assistant Professor Koichi Matsuda, M.D., Ph.D. 助 手 医学博士 松田浩一 Assistant Professor Hitoshi Zembutsu, M.D., Ph.D. 助 手 医学博士 前 佛 均 Associate Professor Toyomasa Katagiri, Ph.D. 助教授 医学博士 片桐豊雅 Assistant Professor Ryuji Hamamoto, Ph.D. 助 手 理学博士 浜本隆二 Project Associate Professor Yataro Daigo, M.D., Ph.D. 特任助教授 医学博士 醍 醐 弥太郎

The major goal of the Human Genome Project is to identify genes of medical im- portance, and to develop new diagnostic and therapeutic tools. We have been at- tempting to isolate genes involving in carcinogenesis and also those causing or predisposing to various diseases as well as those related to drug efficacies and adverse reactions. By means of technologies developed through the genome pro- ject including a high-resolution SNP map, a large-scale DNA sequencing, and the cDNA microarray method, we have isolated a number of biologically and/or medi- cally important genes.

1. Genes playing significant roles in human Chikako Fukukawa, Koji Ueda, Minh Hue cancers Nguyen, Yuria Mano, Masaya Taniwaki, Ryo- hei Nishino, Daizaburo Hirata, Takumi Toyomasa Katagiri, Yataro Daigo, Hidewaki Yamabuki, Nagato Sato, Kenji Tamura, Nakagawa, Koichi Matsuda, Hitoshi Zembutsu, Masayo Hosokawa, Su-Youn Chung, Motohide Ryo Takata, Mitsugu Kanehira, Koji Taka- Uemura, Akio Takehara, Arata Shimo, Eiji Hi- hashi, Chiyuki Furukawa, Atsushi Takano, rota, and Yusuke Nakamura Nobuhisa Ishikawa, Tatsuya Kato, Satoshi Hayama, Chie Suzuki , Akira Togashi , (1) Chemosensitivity Kazuhito Morioka, Tomohide Kidokoro, Chizu Tanikawa, Asahi Hishida, Miki Akiyama, Sa- To predict the efficacy of the M-VAC neoadju- chiko Dobashi, Meng-Lay Lin, Jae-Hyun Park, vant chemotherapy for invasive bladder cancers, Tomomi Ueki, Yosuke Harada, Koichiro Inaki, we previously established the method to calcu- 147 late the prediction score on the basis of expres- data for those 6 genes, we developed a quantita- sion profiles of 14 predictive genes. This scoring tive reverse transcription-PCR based prediction system had clearly distinguished the responder system that could be feasible for routine clinical group from the non-responder group. To further use. Our results suggest that possibility of the validate the clinical significance of the system, relapse after complete remission by the imatinib we applied it to 22 additional cases of bladder -combined chemotherapy can be predicted by cancer patients and found that the scoring sys- expression patterns in this set of genes, leading tem correctly predicted clinical response for 19 to achievement of “personalized therapy” for of the 22 test cases. The group of patients with treatment of this disease. positive predictive scores had significantly longer survival than that with negative scores. (2) Lung cancer When we compared our results with the previ- ous report describing the prognosis of the pa- We found co-transactivation of CDCA1 (cell tients with cystectomy alone, the results imply division associated 1) and KNTC2 (kinetocore that patients with positive scores are likely to associated 2), members of the evolutionarily- have benefit by having M-VAC neoadjuvant conserved centromere protein complex, in non- chemotherapy, but that the chemotherapy small cell lung carcinomas (NSCLCs). Immuno- would shorten lives of patients with negative histochemical analysis using lung-cancer tissue scores. We are confident that our prediction sys- microarray confirmed high levels of CDCA1 and tem to M-VAC therapy should provide opportu- KNTC2 proteins in the great majority of lung nities for achieving better prognosis and im- cancers of various histological types. Their ele- proving quality of life of patients. Taken to- vated expressions were associated with poorer gether, our data suggest that the goal of “per- prognosis of NSCLC patients. Knockdown of sonalized medicine,” prescribing the appropriate either CDCA1 or KNTC2 expression with treatment regimen for each patient, may be siRNA significantly suppressed growth of achievable by selecting specific sets of genes for NSCLC cells. Furthermore, inhibition of their their predictive values. binding by a cell-permeable peptide carrying the Philadelphia--positive acute lym- CDCA1-derived 19 amino-acid peptide (11R- phoblastic leukemia (Ph+ALL) revealed one of CDCA1398-416) that correspond to the binding do- the poorest prognoses among leukemias due to main to KNTC2 effectively suppressed growth its high incidence of relapse. Although more of NSCLC cells. As our data imply that human than 96% of patients with Ph-positive ALL CDCA1 and KNTC2 appear to fall in the cate- achieved complete remission (CR) by the gory of cancer-testis antigens and their simulta- imatinib-combined chemotherapy in a phase II neous up-regulation is a frequent and important study conducted by the Japan adult leukemia feature of cell growth/survival of lung-cancer, study group (JALSG), 25% of them experienced selective suppression of CDCA1 or KNTC2 ac- relapse. To establish a prediction system for a tivity, and/or inhibition of the CDCA1-KNTC2 risk of relapse after CR, we analyzed gene ex- complex formation could be a promising thera- pression profiles of 23 bone marrow samples peutic target for treatment of lung cancer. from patients with Ph+ALL using cDNA mi- We also identified abundant expression of croarray consisting of 27,648 cDNA sequences. neuromedin U (NMU) in the great majority of Using the 19 randomly-selected test cases, we lung cancers. Immunohistochemical analysis identified 16 genes that were expressed signifi- demonstrated significant association of NMU ex- cantly differently between “non-relapse” (8 pa- pression with poorer prognosis of NSCLC pa- tients) and “relapse” (11 patients) groups; from tients. Treatment of NSCLC cells with siRNA the list of 16 genes, we selected the 6 “predic- against NMU suppressed its expression and in- tive” genes that showed significant differences hibited growth of the cells; on the other hand, and constructed a numerical prediction scoring induction of exogenous expression of NMU con- system by which the non-relapse group (with ferred growth-promoting activity and enhanced positive scores) was clearly separated from the the cell mobility in vitro. We found that two G relapse group (with negative scores). Scoring of protein-coupled receptors, growth hormone 4 cases that were reserved from the original 23 secretagogue receptor 1b (GHSR1b) and neuro- cases predicted correctly the responses to tensin receptor 1 (NTSR1), were also over- imatinib-combined chemotherapy. In addition, expressed in NSCLC cells and a hetero-dimer three cases, that were resistant to the imatinib- complex of these receptors functioned as an combined therapy and failed to induce remis- NMU receptor. The NMU-receptor interaction sion, also revealed the negative scores. Because subsequently induced generation of a second real-time reverse transcription-PCR data were messenger, cAMP, to activate its downstream highly concordant with the cDNA microarray genes including transcription factors and cell cy- 148 cle regulators. Treatment of NSCLC cells with gression of small-cell lung cancer (SCLC) and siRNAs for GHSR or NTSR1 suppressed expres- identify molecules to be applicable as novel di- sion of those genes and the growth of NSCLC agnostic markers and/or for development of cells. These data strongly implied that targeting molecular-targeted drugs, we applied cDNA mi- the NMU signaling pathway would be a prom- croarray profile analysis coupled with purifica- ising therapeutic strategy for treatment of lung tion of cancer cells by laser-microbeam microdis- cancers. section (LMM). Expression profiles of 32,256 In addition, we found over-expression of a genes in 15 SCLCs identified 252 genes that MAPJD (Myc-associated protein with JmjC do- were commonly up-regulated and 851 tran- main) gene in the great majority of NSCLC scripts that were down-regulated in SCLC cells cases. Induction of exogenous expression of compared with non-cancerous lung tissue cells. MAPJD into NIH3T3 cells conferred growth- An unsupervised clustering algorithm applied to promoting activity. Concordantly, in vitro sup- the expression data easily distinguished SCLC pression of MAPJD expression with siRNA ef- from the other major histological type of non- fectively suppressed growth of NSCLC cells in small cell lung cancer (NSCLC) and identified which MAPJD was over-expressed. We found 475 genes that may represent distinct molecular four candidate MAPJD-target genes, SBNO1, features of each of the two histological types. In TGFBRAP1, RIOK1,andRASGEF1A,which particular, SCLC was characterized by altered were the most significantly induced by exoge- expression of genes related to neuroendocrine nous MAPJD expression. Through interaction cell differentiation and/or growth such as ASCL with MYC protein, MAPJD transactivates a set 1, NRCAM ,andINSM1. We also identified 68 of genes including kinases and cell signal trans- genes that were abundantly expressed both in ducers that are possibly related to proliferation advanced SCLCs and advanced adenocarcino- of lung cancer cells. As our data imply that mas (ADCs), both of which had been obtained MAPJD is a novel member of the MYC tran- from patients with extensive chemotherapy scriptional complex and its activation is a com- treatment. Some of them are known to be tran- mon feature of lung-cancer, selective suppres- scription factors and/or gene expression regula- sion of this pathway could be a promising tors such as TAF5L, TFCP2L4, PHF20, LMO4, therapeutic target for treatment of lung cancers. TCF20, RFX2,andDKFZp547I048 as well as To identify molecules that might serve as those encoding nucleotide-binding proteins such biomarkers or targets for development of novel as C9orf76, EHD3,andGIMAP4.Ourdatapro- molecular therapies, we have been screening vide valuable information for better understand- genes encoding transmembrane/secretory pro- ing of lung carcinogenesis and chemoresistance. teins that are up-regulated in lung cancers, us- ing cDNA microarrays coupled with purification (3) Pancreatic cancer of tumor cells by laser microdissection. A gene encoding seizure-related 6 homolog (mouse)-like Pancreatic ductal adenocarcinoma (PDAC) 2 (SEZ6L2) protein, was chosen as a candidate shows the worst mortality among the common for such molecule. Semi-quantitative RT-PCR malignancies, with a 5-year survival rate of only and western-blot analyses documented in- 4%, and the majority of PDAC patients are di- creased expression of SEZ6L2 in the majority of agnosed at an advanced stage, in which no ef- primary lung cancers and lung-cancer cell lines fective therapy is available at present. Although examined. SEZ6L2 protein was proven to be the proportion of curable cases is still not so present on the surface of lung-cancer cells by high, surgical resection of early-staged PDACs is flow cytometrical analysis using anti-SEZ6L2 an- the only way to cure the disease. Hence, estab- tibody. Immunohistochemical staining for tumor lishment of a screening strategy to detect early- tissue microarray consisting of 440 archived staged PDACs by novel serological markers is lung-cancer specimens detected positive SEZ6L2 urgently required, and development of novel staining in 327 (78%) of 420 non-small cell lung molecular therapies for PDAC treatment is also cancers (NSCLCs) and 13 (65%) of 20 small-cell eagerly expected. We here report over- lung cancers (SCLCs) examined. Moreover, expression of REG4, a new member of the REG NSCLC patients whose tumors revealed a family, in PDAC cells on the basis of the higher level of SEZ6L2 expression suffered genome-wide cDNA microarray analysis as well shorter tumor-specific survival compared to as RT-PCR and immunohistochemical analysis. those with no SEZ6L2 expression. These results We also detected significant elevation of REG4 indicate that SEZ6L2 should be a useful prog- in serum of some parts of patients with early- nostic marker of lung cancers. staged PDACs by our ELISA system, indicating Furthermore, to characterize the molecular the possibility of REG4 as a new serological mechanisms involved in carcinogenesis and pro- marker of PDACs. Furthermore, we found that 149 knockdown of the endogenous REG4 expression or tumor classification (p<0.0001). Furthermore, in PDAC cell lines with siRNA caused decrease the expression levels of MICAL2-PVs were also of cell viability. Concordantly, addition of re- concordant to those of c-Met, a marker of tumor combinant REG4 to the culture medium en- aggressiveness, with statistical significance (p= hanced growth of PDAC cell line in a dose- 0.0018). To investigate its biological function in dependent manner. A monoclonal antibody PC cells in vitro, we knocked down endogenous against REG4 neutralized its growth-promoting MICAL2-PVs in PC cells by siRNA, which re- effects and attenuated significantly the growth sulted in the significant reduction of PC cell vi- of PDAC cells. These findings implicate that ability. Our findings suggest that MICAL2-PVs REG4 is a promising tumor marker to screen is likely to be involved in tumor progression or early-staged PDAC and also that neutralization aggressiveness of PC, and could be a candidate of REG4 by the antibody may offer us novel po- asanovelmolecularmarkerand/oratargetfor tential tools for treatment of PDACs. treatment of advanced PCs. Among dozens of up-regulated genes in PDAC cells, we also focused on one tyrosine (5) Breast Cancer kinase receptor, Eph receptor A4 (EphA4), as a molecular target for PDAC therapy. Immunohis- We previously reported that up-regulation of tochemical analysis validated EphA4 overex- SMYD3, a histone H3 lysine-4 specific methyl- pression in approximately a half of PDAC tis- transferase, plays a key role in the proliferation sues. To investigate its biological function in of colorectal carcinoma (CRC) and hepatocellu- PDAC cells, we knocked down EphA4 expres- lar carcinoma (HCC). In this study, we reveal sion by siRNA, which drastically attenuated that SMYD3 expression is also elevated in a PDAC cell viability. Concordantly to the siRNA great majority of breast cancer tissues. Similarly experiment, PDAC-derivative cells that were de- to CRC and HCC, silencing of SMYD3 by signed to constitutively express exogenous siRNA to this gene resulted in the inhibited EphA4 showed more rapid growth rate than growth of breast cancer cells, suggesting that the cells transfected with mock vector, suggesting increased SMYD3 expression is also essential for the growth-promoting effect of EphA4 on PDAC the proliferation of the breast cancer cells. More- cells. Furthermore, the expression analysis for over, we show here that SMYD3 could promote ephrin ligand family members indicated co- breast carcinogenesis by directly regulating the existence of ephrinA3 ligand in PDAC cells with expression of the proto-oncogene WNT10B . EphA4 receptor, and knockdown of ephrinA3 by These data imply that augmented SMYD3 plays siRNA also attenuated PDAC cell viability as a crucial role in breast carcinogenesis, and that well as EphA4. These results implicate that the inhibition of SMYD3 should be a novel thera- EphA4-ephrinA3 pathway is likely to be a peutic strategy for treatment of breast cancer. promising molecular target for pancreatic cancer Cancer therapy directing at specific molecular therapy. targets in signaling pathways of cancer cells such as Tamoxifen, aromatase inhibitors and (4) Prostate cancer trastuzumab has been proven its usefulness for treatment of advanced breast cancers. However, Through genome-wide cDNA microarray increases of the risk of endometrial cancer by analysis coupled with microdissection of pros- long-term tamoxifen administration as well as tate cancer cells, we identified MICAL2-PV those of bone fracture due to osteoporosis in (Molecule Interacting with CasL-2 Prostate Cancer postmenopausal women with aromatase inhibi- Variants) , novel splicing variants of MICAL2, tor prescription are recognized as their side ef- showing overexpression in PC cells. Northern fects. Due to the emergence of these side effects blot analysis demonstrated that MICAL2-PVs and also drug resistance, it is necessary to were cancer-testis specific transcripts. Immuno- search novel targets for molecularly-orientated histochemical analysis using an antibody gener- drugs on the basis of well-characterized mecha- ated specific to MICAL2-PV revealed that MI- nisms of action. Using the accurate genome- CAL2-PV was expressed in the cytoplasm of wide expression profiles of breast cancers, we cancer cells with various staining patterns and found maternal embryonic leucine-zipper kinase intensities, while it was not or hardly detectable (MELK) that was significantly overexpressed in in adjacent normal prostate epithelium or the great majority of breast cancer cells. To as- prostatic intraepithelial neoplasia (PIN). Interest- sess a possible role of MELK in mammary car- ingly, immunohistochemical analysis of 105 PC cinogenesis, we knocked down the expression of specimens on the tissue microarray indicated endogenous MELK in breast cancer cell-lines by that MICAL2-PV expression status was signifi- means of the mammalian vector-based RNA in- cantly correlated with Gleason scores (P <0.0001) terference (RNAi) technique. Furthermore, we 150

identified a long isoform of Bcl-G (Bcl-GL), a pro small-interfering RNAs (siRNAs) against PRC1 -apoptotic member of Bcl-2 family, as a possible effectively suppressed PRC1 expression and in- substrate (s) for the MELK kinase by pull-down hibited the growth of breast cancer cells, T47D assay with wild-type-and kinase-dead-MELK re- and HBC5. Furthermore, we found interaction combinant proteins. Finally, we performed of PRC1 and KIF2C/MCAK (Kinesin family TUNEL assay and FACS analysis to measure the member 2C/Mitotic centromere-associated kine- proportions of sub-G1 population to investigate sin) by co-immunoprecipitation and immuno- MELK is involved in apoptosis cascade through blotting using COS-7 cells in which these mole- the Bcl-GL-related pathway. The multiple human cules were introduced exogenously. These find- tissues-and cancer cell lines-northern blot analy- ings suggest a possible involvement of the PRC1 ses demonstrated that MELK was overexpressed -KIF2C/MCAK complex in breast tumorigenesis at a significantly high level in a great majority and that this complex should be a promising of breast cancers and cancer cell lines, but not target for development of novel treatment for expressed in normal vital organs (heart, liver, breast cancer. lung, hidney). Suppression of MELK expression with small-interfering RNA significantly inhib- (6) Colon cancer ited growth of human breast cancer cells. We also found that MELK protein physically inter- Through a genome-wide cDNA microarray acted with Bcl-GL protein, a pro-apoptotic mem- analysis, we identified a number of genes whose ber of the Bcl-2 family through its N-terminal expression was up-regulated frequently in col- region. Subsequent immunocomplex kinase as- orectal cancer. Among them, we here report a say showed Bcl-GL was specifically phospho- gene termed FAM84A that was expressed in rylated by MELK in vitro. TUNEL assay and none of 22 normal tissues examined except the FACS analysis revealed that overexpression of testis. Although immunocytochemical staining WT-MELK suppressed Bcl-GL-induced apoptosis, revealed localization of FAM84A protein in the while that of D150A-MELK did not. Our find- sub-cellular membrane region, the staining was ings suggest that kinase activity of MELK is observed limitedly in the region lacking the at- likely to be involved in mammary carcinogene- tachment with neighboring cells. In addition, we sis through inhibition of pro-apoptotic function found that exogenous FAM84A expression in- of Bcl-GL. The kinase activity of MELK should creased cell motility in NIH3T3 cells, and that be a promising target for development of phosphorylation of serine38 of FAM84A was as- molecular-targeting therapy for patients with sociated with morphology of cells. Our results breast cancers. indicate a possibility that up-regulation of FAM We also focused on one gene that encodes 84A plays a critical role in progression of colon PBK/TOPK including a kinase domain. North- cancer. ern blot analyses using mRNAs of normal hu- man organs, breast cancer tissues, and cancer (7) Renal cancer cell-lines indicated this molecule to be a novel cancer/testis antigen. Reduction of PBK/TOPK In order to clarify the molecular mechanism expression by siRNA resulted in significant sup- involved in renal carcinogenesis, and identify pression of cell growth probably due to dys- molecular targets for diagnosis and treatment, function in the cytokinetic process. Immunocyto- we analyzed genome-wide gene expression pro- chemical analysis with anti-PBK/TOPK anti- files of 15 surgical specimens of clear cell renal body implicated a critical role of PBK/TOPK in cell carcinoma (RCC), compared to normal renal an early step of mitosis. PBK/TOPK could phos- cortex, using a combination of laser microbeam phorylate histone H3 at Ser10 in viro and in microdissection (LMM) with a cDNA microarray vivo, and medicated its growth-promoting effect representing 27,648 genes. We identified 257 through histone H3 modification. Since PBK/ genes that were commonly up-regulated and TOPK is the cancer/testis antigen and its kinase 721 genes that were down-regulated in RCCs. function is likely to be related to its oncogenic Interestingly, none of top 24 up-regulated genes activity, we suggest PBK/TOPK to be a promis- that showed most significant differences in in- ing molecular target for breast cancer therapy. formativeRCC-caseswereincludedinprevi- Among the up-regulated genes, we further fo- ously reports describing expression profiles of cused on functional significance of PRC1 (pro- ccRCC using RNAs isolated from bulk tissues. tein regulator of cytokinesis 1) in development These findings suggest that it is important to of breast cancer. Western blot analysis using purify as much as possible the populations of breast cancer cell-lines revealed a significant in- cancerous and normal epithelial cells obtained crease of endogenous PRC1 level in G2/M from surgical specimens. Among significantly- phase. Treatment of breast cancer cells with transactivated genes, we in this study focused 151 on Semaphorin 5B (SEMA5B ) and knocked- 20 Japanese ICCs that were not associated with down its expression in RCC cells by small- fluke by means of laser-microbeam-micro- interfering RNA (siRNA). Effective down- dissection and a cDNA microarray containing regulation of its expression levels in RCC cells 27,648 genes. We identified 77 commonly upre- significantly attenuated RCC cell viability. In gulated genes and 325 commonly downregu- conclusion, our data should be helpful for a bet- lated genes in the two ICC groups. Unsuper- ter understanding of the tumorigenesis of RCC vised hierarchical cluster analysis separated the and should contribute to the development of di- 40 ICCs into two major branches almost com- agnostic tumor marker and molecular-targeting pletely according to the fluke status. The puta- therapy for patients with RCC. tive signature of liver fluke-associated ICC ex- hibited elevated expression of genes involved in (8) Bladder cancer xenobiotic metabolism (UGT2B11, UGT1A10, CHST4, SULT1C1), while that of non-liver fluke- To disclose molecular mechanism of bladder associated ICC represented enhanced expression cancer, the second most common genitourinary of genes related to growth factor signaling tumor, we had previously performed genome- (TGFBI, PGF, IGFBP1, IGFBP3). Additional ran- wide expression profile analysis of 26 bladder dom permutation tests identified a total of 52 cancers by means of cDNA microarray repre- genes whose expression levels were significantly senting 27,648 genes. Among genes that were different between the two groups. We also iden- significantly up-regulated in the majority of tified 17 genes associated with macroscopic type bladder cancers, we here report identification of of fluke-associated ICCs. These data may not MPHOSPH1 (M-phase phosphoprotein 1) as a can- only contribute to clarification of common and didate molecule for drug development for blad- fluke-specific mechanisms underlying ICC, but der cancer. Northern blot analyses using mRNAs also serve as a starting point for the identifica- of normal human organs and cancer cell-lines tion of novel diagnostic markers and/or thera- indicated this molecule to be a novel cancer/tes- peutic targets for the disease. tis antigen. Introduction of MPHOSPH1 into NIH3T3 cells significantly enhanced cell growth (10) Biomarker for esophageal and lung can- at in vitro and in vivo conditions. We subsequ- cer ently found an interaction between MPHOSPH1 and PRC1 (Protein Regulator of Cytokinesis 1), Gene-expression profile analysis of lung and which was also up-regulated in bladder cancer esophageal carcinomas revealed that Dikkopf-1 cells. Immunocytochemical analysis revealed co- (DKK1) was highly transactivated in the great localization of endogenous MPHOSPH1 and majority of lung cancers and esophageal PRC1 proteins in bladder cancer cells. Interest- squamous-cell carcinomas (ESCCs). Immunohis- ingly, knockdown of either of MPHOSPH1 or tochemical staining using tumor tissue microar- PRC1 expression with specific siRNAs caused rays consisting of 279 archived non-small cell significant increase of multi-nuclear cells and lung cancers (NSCLCs) and 280 ESCC speci- subsequent cell death of bladder cancer cells. mens demonstrated that a high level of DKK1 Our results imply that the MPHOSPH1/PRC1 expression was associated with poor prognosis complex is likely to play a crucial role in blad- of patients with NSCLC as well as ESCC, and der carcinogenesis and that inhibition of the multivariate analysis confirmed its independent MPHOSPH1/PRC1 expression or their interac- prognostic value for NSCLC. In addition, we tion should be novel therapeutic targets for identified that exogenous expression of DKK1 bladder cancers. increased the migratory activity of mammalian cells, suggesting that DKK1 may play a signifi- (9) Cholangiocarcinoma cant role in progression of human cancer. We established an ELISA system to measure serum Intrahepatic cholangiocarcinoma (ICC) is the levels of DKK1 and found that serum DKK1 lev- second most common primary cancer in the els were significantly higher in lung and liver, and its incidence is highest in northeastern esophageal cancer patients than in healthy con- part of Thailand. ICCs in this region are known trols. The proportion of the DKK1-positive cases to be associated with infection with liver flukes, was 126 (70.0%) of 180 NSCLC, 59 (69.4%)of85 particularly Opisthorchis viverrini (OV), as well SCLC, and 51 (63.0%)of81ESCCpatients, as nitrosamines from food. To clarify molecular while only 10 (4.8%) of 207 healthy volunteers mechanisms of ICC associated with or without were falsely diagnosed as positive. A combined liver flukes, we analyzed gene-expression pro- ELISA assays for both DKK1 and CEA increased files of fluke-associated ICCs from 20 Thai pa- sensitivity, and classified 82.2% of the NSCLC tients, and compared their profiles with those of patients as positive while only 7.7% of healthy 152 volunteers were falsely diagnosed to be positive. 5Laboratory for Genotyping, SNP Research The use of both DKK1 and ProGRP increased Center, the Institute of Physical and Chemical sensitivity to detect SCLCs up to 89.4%, while Research (RIKEN), Yokohama 230-0045, Ja- false positive rate in healthy donors were only pan. 6.3%. Our data imply that DKK1 should be use- ful as a novel diagnostic/prognostic biomarker Cerebral infarction is the most common type in clinic and probably as a therapeutic target for of stroke and often causes long-term disability. lung and esophageal cancer. To investigate the genetic contribution to cere- bral infarction, we conducted a case-control 2. Common diseases study using 52,608 gene-based tag-SNPs selected from JSNP database. Here we reported that one (1) cerebral infarction non-synonymous SNP in a member of protein kinase C (PKC) family, PRKCH , was signifi- Authors: Michiaki Kubo1,2,4,JunHata1,2,4,Toshi- cantly associated with lacunar infarction in two haru Ninomiya1,2, Koichi Matsuda4, Koji Yo- independent Japanese samples (p=3.2x10-7, nemoto1, Toshiaki Nakano2,3, Tomonaga Mat- crude odds ratio of 1.39). This SNP was likely to sushita2,4, Keiko Yamanaka-Yamazaki4,Yozo affect the PKC activity. Furthermore, a 14-year Ohnishi5, Susumu Saito5, Takanori Kitazono2, follow-up cohort study in Hisayama (Fukuoka, Setsuro Ibayashi2, Katsuo Sueishi3,Mitsuo Japan) supported involvement of this SNP for Iida2, Yusuke Nakamura4, and Yutaka Kiyo- the development of cerebral infarction (p=0.03, hara1.: 1Department of Environmental Medi- age-and sex-adjusted hazard ratio of 2.83). We cine, 2Department of Medicine and Clinical Sci- also found that PKCη was mainly expressed in ence, 3Pathophysiological and Experimental Pa- vascular endothelial cells and foamy macro- thology, Graduate School of Medical Sciences, phages in human atherosclerotic lesions, and its Kyushu University, Fukuoka 812-8582, Japan. expression was enhanced as the lesion type pro- 4Laboratory of Molecular Medicine, Human gressed. Our results support a role for PRKCH Genome Center, Institute of Medical Science, in the pathogenesis of cerebral infarction. University of Tokyo, Tokyo 108-8639, Japan.

Publications

1. R. Hamamoto, F.P. Silva, M. Tsuge, T. requirements in Japanese patients. Journal of Nishidate, T. Katagiri, Y. Nakamura and Y. Human Genetics, 51: 249-253, 2006 Furukawa: Enhanced SMYD3 expression is 6. T.Suda,T.Tsunoda,N.Uchida,T.Watan- essential for the growth of breast cancer abe,S.Hasegawa,S.Satoh,S.Ohgi,Y.Fu- cells. Cancer Science, 97: 113-118, 2006 rukawa, Y. Nakamura, and H. Tahara: Iden- 2. K. Obama, T. Kato, S. Hasegawa, S. Satoh, tification of secernin 1 as a novel immuno- Y. Nakamura, and Y. Furukawa: Overex- therapy target for gastric cancer using the pression of peptidyl-prolyl isomerase-like 1 expression profiles of cDNA microarray. is associated with the growth of colon can- Cancer Science, 97: 411-419, 2006 cer cells. Clinical Cancer Research, 12: 70-76, 7. K.Yoshiura,A.Kinoshita,T.Ishida,A.Ni- 2006 nokata, T. Ishikawa, T. Kaname, M. Bannai, 3. A.Iida,S.Saito,A.Sekine,A.Takahashi,N. K. Tokunaga, S. Sonoda, R. Komaki, M. Kamatani, and Y. Nakamura: Japanese sin- Ihara, V. Saenko, A.G. Kaimovich, I. Sekine, gle nucleotide polymorphism database for K. Komatsu, H. Takahashi, M. Nakashima, 267 possible drug-related genes. Cancer Sci- N. Sosonkina, C.K. Mapendano, M. Ghad- ence, 97: 16-24, 2006 ami, M. Nomura, D.-S. Linag, D.-K. Kim, A. 4. T. Kikuchi, Y. Daigo, N. Ishikawa, T. Garidkhuu, N. Natsume, T. Ohta, H. To- Katagiri,T.Tsunoda,S.Yoshida,andY. mita, K. Hirayama, M. Ishibashi, A. Taka- Nakamura: Expression profiles of metastatic hashi, N. Saito, S. Saitou, Y. Nakamura, and brain tumor from lung adenocarcinomas on N, Niikawa: A SNP in the ABCC11 gene is cDNA microarray. International Journal of the determinant of human earwax type. Na- Oncology, 28: 799-805, 2006 ture Genetics, 38: 324-330, 2006 5. T. Mushiroda, Y. Ohnishi, S. Saito, A. Taka- 8. T.Yamabuki,Y.Daigo,T.Kato,S.Hayama, hashi, Y. Kikuchi, S. Saito, H. Shimomura, Y. T.Tsunoda,M.Miyamoto,T.Ito,M.Fujita, Wanibuchi, T. Suzuki, N. Kamatani, and Y. M.Hosokawa,S.Kondo,andY.Nakamura: Nakamura: Association of VKORC1 and Genome-wide gene expression profile analy- CYP2C9 polymorphisms with warfarin dose sis of esophageal squamous cell carcinomas. 153

International Journal of Oncology, 28: 1375- T. Katagiri: Genome-wide gene expression 1384, 2006 profiles of clear cell renal cell carcinoma; In- 9. A. Iida, H. Kizawa, Y. Nakamura, and S. ternational Journal of Oncology, 29: 799-827, Ikegawa: High-resolution SNP map of 2006 ASPN, a susceptibility gene for osteoarthri- 18.J.-H.Park,M.-L.Lin,T.Nishidate,Y.Naka- tis. Journal of Human Genetics, 51: 151-154, mura, and T. Katagiri: PDZ-binding kinase/ 2006 T-LAK cell-originated protein kinase, a puta- 10. F.P. Silva, R. Hamamoto, Y. Furukawa, and tive cancer/testis antigen with an oncogenic Y. Nakamura: TIPUH1 encodes a novel activity in breast cancer. Cancer Research, KRAB zinc-finger protein highly expressed 66: 9186-9195, 2006 in human hepatocellular carcinomas. Onco- 19. K. Ozaki, H. Sato, A. Iida, H. Mizuno, T. gene, 25: 5063-5070, 2006 Nakamura,Y.Miyamoto,A.Takahashi,T. 11. K. Nakashima, T. Hirota, Y. Suzuki, A. Mat- Tsunoda, S. Ikegawa, N. Kamatani, M. Hori, suda, M. Akahoshi, M. Shimizu, A. Jodo, S. Y. Nakamura, and T. Tanaka: A functional Doi,K.Fujita,M.Ebisawa,S.Yoshihara,T. SNP in PSMA6 confers risk of myocardial Enomoto, T. Shirakawa, F. Kishi, Y. Naka- infarction in the Japanese population. Na- mura, and M. Tamari: Association of the RIP ture Genetics, 38: 921-925, 2006 2 gene with childhood atopic asthma. Alleol- 20. N. Jinawath, Y. Chamgramol, Y. Furukawa, ogy International, 55: 77-83, 2006 K. Obama, T. Tsunoda, B. Sripa, C. Pairo- 12.K.Nakashima,T.Hirota,K.Obara,M. jkul, and Y. Nakamura: Comparison of gene Shimizu, A. Jodo, M. Kameda, S. Doi, K. Fu- -expression profiles between opisthorchis vi- jita,T.Shirakawa,T.Enomoto,F.Kishi,S. verrini and non-opisthorchis viverrini associ- Yoshihara, K. Matsumoto, H. Saito, Y. ated human intrahepatic cholangiocarci- Suzuki, Y. Nakamura, and M. Tamari: An noma. Hepatology, 44: 1025-1038, 2006 association study of asthma and related phe- 21. Y. Nakamura, M. Futamura, H. Kamino, K. notypes with polymorphisms in negative Yoshida, Y. Nakamura, and H. Arakawa: regulator molecules of the TLR signaling Identification of p53-46F as a super p53 with pathway. Journal of Human Genetics, 51: an enhanced ability to induce p53-depen- 284-291, 2006 dent apoptosis. Cancer Science, 97: 633-641, 13. S. Ashida, M. Furihata, T. Katagiri, K. Ta- 2006 mura, Y. Anazawa, H. Yoshioka, T. Miki, T. 22. A. Takehara, H. Eguchi, H. Ohigashi, O. Fujioka, T. Shuin, Y. Nakamura, and H. Ishikawa, T. Kasugai, M. Hosokawa, T. Nakagawa: Expression of novel molecules, Katagiri, Y. Nakamura, and H. Nakagawa: MICAL2-PV (MICAL2 prostate cancer vari- Novel tumor marker REG4 detected in se- ants), increases with high gleason score and rum of patients with resectable pancreatic prostate cancer progression. Clinical Cancer cancer and feasibility for antibody therapy Research, 12: 2767-2773, 2006 targeting REG4. Cancer Science, 97: 1191- 14. N. Ishikawa, Y. Daigo, A. Takano, M. Tani- 1197, 2006 waki,T.Kato,S.Tanaka,W.Yasui,Y.Take- 23. M. Iiizumi, H. Nakagawa, M. Hosokawa, A. shima,K.Inai,H.Nishimura,E.Tsuchiya, Takehara, C. Su-Youn, T. Nakamura, T. N. Kohno, and Y. Nakamura: Characteriza- Katagiri,H.Eguchi,H.Ohigashi,O. tion of SEZ6L2 cell-surface protein as a Ishikawa, Y. Nakamura, and H. Nakagawa: novel prognostic marker for lung cancer. EphA4 receptor, overexpressed in pancreatic Cancer Science, 97737-745, 2006 ductal adenocarcinoma, promotes cancer cell 15. T. Kobayashi, T. Masaki, M. Sugiyama, Y. growth. Cancer Science, 97: 1211-1216, 2006 Atomi, Y. Furukawa, and Y. Nakamura: A 24. K. Takahashi, C. Furukawa, A. Takano, N. gene encoding a family with sequence simi- Ishikawa,T.Kato,S.Hayama,C.Suzuki,W. larity 84, member A (FAM84A) enhanced Yasui, K. Inai, S. Sone, T. Ito, H. Nishimura, migration of human colon cancer cells. Inter- E.Tsuchiya,Y.NakamuraandY.Daigo: national Journal of Oncology, 29: 341-347, The neuromedin u-growth hormone secre- 2006 tagogue receptor 1b/neurotensin receptor 1 16. M. Taniwaki, Y. Daigo, N. Ishikawa, A. oncogenic signaling pathway as a therapeu- Takano,T.Tsunoda,W.Yasui,K.Inai,N. tic target for lung cancer. Cancer Research, Kohno, and Y. Nakamura: Gene expression 66: 9408-9419, 2006 profiles of small-cell lung cancers: molecular 25. S. Hayama, Y. Daigo, T. Kato, N. Ishikawa, signatures of lung cancer. International Jour- T. Yamabuki, M. Miyamoto, T. Ito, E. nal of Oncology, 29: 567-575, 2006 Tsuchiya, S. Kondo, and Y. Nakamura: Acti- 17.E.Hirota,L.Yan,T.Tsunoda,S.Ashida,M. vation of CDCA1-KNTC2, members of cen- Fujime,T.Shuin,T.Miki,Y.Nakamuraand tromere protein complex, involved in pul- 154

monary carcinogenesis. Cancer Research, 66: Similarity of the allele frequency and linkage 10339-10348, 2006 disequilibrium pattern of single nucleotide 26. K. Ura, K. Obama, S. Satoh, Y. Sakai, Y. polymorphisms in drug-related gene loci be- Nakamura, and Y. Furukawa: Enhanced tween Thai and northern East Asian popula- RASGEF1A expression is involved in the tions: implications for tagging SNP selection growth and migration of intrahepatic cho- in Thais. Journal of Human Genetics, 51: 896 langiocarcinoma. Clinical Cancer Research, -904, 2006 12: 6611-6616, 2006 29. Y. Nakazaki, H. Hase, H. Inoue, Y. Beppu, 27. N. Ishii, K. Ozaki, H. Sato, H. Mizuno, S. M.Xin,G.Sakaguchi,R.Kurita,S.Asano,Y. Saito, A. Takahashi, Y. Miyamoto, S. Ike- Nakamura, and K. Tani: Serial analysis of gawa,N.Kamatani,M.Hori,S.Saito,Y. gene expression in progressing and regress- Nakamura and T. Tanaka: Identification of a ing mouse tumor implicates the involvement novel non-coding RNA, MIAT, that confers of RANTES andTARC in antitumor immune risk of myocardial Infarction. Journal of Hu- responses. Molecular Therapy, 14: 599-606, man Genetics, 51: 1087-1099, 2006 2006 28. S. Mahasirimongkol, W. Chantratita, S. 30. M. Kato, A. Sekine, Y. Ohnishi, T.A. John- Promso, E. Pasomsab, N. Jinawath, W. Jong- son, T. Tanaka, Y. Nakamura and T. Tsu- jaroenprasert, V. Lulitanond, P. Kritta- noda: Linkage disequilibrium of evolutionar- yapoositpot, S. Tongsima, P. Sawanpany- ily conserved regions in the human genome. alert, N. Kamatani, Y. Nakamura, T. Sura: BMC Genomics, 7: 326, 2006 155

Human Genome Center Laboratory of Functional Analysis In Silico 機能解析イン・シリコ分野

Professor Kenta Nakai, Ph.D. 教 授 理学博士 中井謙太 Associate Professor Kengo Kinoshita, Ph.D. 助教授 理学博士 木下賢吾

The mission of our laboratory is to conduct computational (“in silico”) studies on the functional aspects of genome information. Roughly speaking, genome informa- tion represents what kind of proteins/RNAs are synthesized on what conditions. Thus, our study includes the structural analysis of molecular function of each gene product as well as the analysis of its regulatory information, which will lead us to the understanding of its cellular role represented by the networks of inter-gene in- teraction.

1. Conservation of regulation systems in fir- known B. subtilis transcription factor binding micutes sites on every genome using the position spe- cific weight matrices provided by DBTBS, and Nicolas Sierro and Kenta Nakai the analysis of the upstream intergenic region conservation for each cluster of homologous Due to the probable co-regulation by a com- genes. mon transcription factor of genes showing a To provide a comprehensive yet easy to un- similar expression profile, investigation of their derstand and interpret representation of the promoter regions is an important step towards known motif conservation pattern, new tools the understanding of global cell regulation net- were developed that can generate a graphic for works. By coupling the current knowledge each cluster indicating on one side in which about experimentally proven transcriptional re- stains homologous proteins are found, and on gulation with the currently available raw genetic the other side whether or not these proteins pos- data, a better understanding of the similarities sess a certain transcription factor binding site in and differences between various species could their upstream intergenic region. With this be obtained. Most of the bacterial transcriptional method, the existence of different regulation sys- regulation data have however been obtained in tems for the CtsR and HrcA heat shock response two specific organisms, Escherichia coli and Bacil- regulons within the firmicutes could for instance lus subtilis, and comparative genomics is there- be shown. Our data suggest for example that fore necessary to evaluate to which extent the the mollicutes, which are characterized by a acquired knowledge is applicable to other bacte- verysmallgenomesizeandlackCtsR,have rial species. Therefore, the annotated proteins of placed the regulation of genes typically regu- 66 complete firmicutes genomes, including that lated by CtsR under the control of HrcA, or that of B. subtilis, were compared to each other in or- in the Staphylococcus strains the HrcA regulons der to build clusters of homologous proteins is not only regulated by HrcA itself, but also by and their upstream intergenic regions. Two ap- CtsR. proaches were then used to analyze these up- The analysis of the upstream intergenic region stream intergenic regions: the mapping of conservation for each cluster of homologous 156 genes is also of interest because it is expected high-scoring genes indicate that this promoter that regions involved in gene regulation are model might be useful for predicting promising more conserved than regions with no particular candidates for wet-lab experiments. function. To investigate this conservation, each cluster was therefore divided in subclusters 3. Analysis of trans-splicing in Ciona intesti- basedbothonthebacterialgenusandthesize nalis of the intergenic regions and aligned by Clus- talW. The aligned sequences were used to calcu- Li Shuang, Riu Yamashita, Takehiro Kusak- late the sequence conservation and position spe- abe1 and Kenta Nakai cific weight matrices generated for the con- served regions. By comparing these matrices to Trans-splicing, in which the original 5’-ends of each other, motifs corresponding to both known some pre-mRNAs are discarded and replaced by transcription factor binding sites and unknown the 5’-region of a certain SL RNA, has been re- conserved regions could be obtained; the analy- ported in Ciona intestinalis. We analyzed this sis of the latter group being currently underway. phenomenon using its EST and genome se- By carrying out comparative analysis of a quences. A conserved head sequence, which is large number of related genomes, concentrating called Spliced-Leader (SL), ATTCTATTTGAATA particularly on the conservation of the promoter AG, and could not be mapped to the genome regions of homologous genes, and using the ex- sequence, has been found in 419 of 2,077 5’- perimental data available in literature, signifi- ESTs. Depending on the existence of this Spliced cant differences in the regulatory networks of -Leader sequence, we classified the 5’-ESTs into not only a single strain, but of whole genus 2groups:SL+ and SL-. Using the UniGene could be highlighted. This approach will there- database, the gene names of each SL+ and SL- fore not only allow a refinement of the current groups were obtained. The fact that there were understanding of bacterial regulation networks, only small overlaps between these groups sug- but also provide new input for experimental re- gests that trans-splicing occurs on a specific search by helping in the design of experiments gene set. By regarding the number of EST clones and the interpretation of their results. for each gene as its expression strength, we found that the expression level of SL+ genes is 2. Construction of a promoter model for significantly high while that of SL- genes var- tissue-specific expression in Ciona intesti- ies greatly. In addition, by regarding the num- nalis ber of libraries where a gene’s EST(s) are ob- served as a degree of anti-tissue/developmental Alex Vandenbon, Takehiro Kusakabe1,and stage specificity, we found that the gene expres- Kenta Nakai: 1Graduate School of Life Science, sion between SL+ and SL- groups shows a University of Hyogo significant difference at the developmental stage of juvenile. We further used the Transcriptional regulation of gene expression information of human homologous genes to an- in eukaryotes is controlled by transcription fac- notate corresponding Ciona genes. As a result, tors binding cis-regulatory elements in regula- we found that a significant number of SL- tory sequences. Genes containing binding sites genes seem to be related to ribosomal functions for the same set of transcription factors in their while mitochondrion-related genes are found non-coding regions can be expected to be co- more frequently in SL+ genes. regulated at the transcriptional level. However, using merely the presence of these binding sites 4. Updating the database of transcriptional to predict new target genes for tissue-specific start sites (DBTSS) expression might result in high numbers of false positives, as the computational prediction of Riu Yamashita, Yutaka Suzuki2, Sumio Suga- these binding sites has been shown to suffer no2, and Kenta Nakai: 2Graduate School of from very bad specificity. To increase accuracy Frontier Sciences of the prediction of new target genes we con- structed a promoter model that does not only DBTSS (http://dbtss.hgc.jp) was constructed take into account the presence of cis-regulatory in 2002 based on experimentally-determined 5’- elements, but also their order in the regulatory end clones. During this year, we have added sequence and the distances between pairs of ele- several features. First, the number of clones cor- ments. We trained this model on 5 sets of tissue- responding to human genes has significantly in- specifically expressed genes of C. intestinalis and creased, from 190,964 to 1,359,000. Second, we used it to predict new target genes in a genome- defined putative promoter groups by clustering wide set of promoter sequences. Blast results for TSSs within a 500 base range because the con- 157 tent of our database is now large enough to ana- fied the 17,245 putative promoter regions (PPRs) lyze the existence of multiple promoters for each in 5,764 PAP-containing human genes into three gene. If a gene has several putative TSS clusters, categories: “conserved”, “marginal” and “non- we regard them as the evidence of alternative conserved” according to the results of sequence promoters. We found 8,308 human genes and comparison. The conserved PPRs are major and 4,276 mouse genes which have alternative pro- rich in CpG-island. In contrast, the non-conserved moters. Third, DBTSS now supports a function PPRs are minor and rich in repetitive elements. that enables detailed comparison between any The latter also are similar to intergenic region in pair of TSSs present in it. Finally, we have the distribution of CG content, and they pro- added TSS information of zebrafish (15,189 duce more transcripts that encode small or no TSSs: 32,263 clones), malaria (6,908 TSSs: 10,236 proteins than the conserved ones. Systematic lu- clones), and schyzon (14,029 TSSs: 22,923 ciferase assays of these PPRs revealed that both clones). classes of PPRs did have promoter activity, but that their strength ranges were significantly dif- 5. Comprehensive detection of human termi- ferent. Furthermore, we demonstrated that these nal oligo-pyrimidine (TOP) genes and characteristic features of the non-conserved analysis of their characteristics PPRs are shared with the PPRs of previously discovered putative non-protein coding tran- Riu Yamashita, Yutaka Suzuki2, Nono Tomita- scripts. Taken together, our data suggest that Takeuchi2, Hiroyuki Wakaguri2, Takuya Ueda2, there are two distinct classes of promoters in Sumio Sugano2, and Kenta Nakai humans, with the latter class of promoters emerging frequently during evolution. It is known that several genes have a terminal oligo-pyrimidine sequence at the 5’-end of their 7. Relationships between protein conserva- mRNA, and are hence called TOP genes. These tion and promoter conservation revealed genes are also known to be transcriptionally or by comparative sequence analysis be- translationally regulated. But it is not known tween human and mouse how many of these genes are present in the hu- man genome. We performed a detection of TOP Hirokazu Chiba, Riu Yamashita, Kengo Ki- gene candidates using accurate TSS information noshita, and Kenta Nakai provided by DBTSS. By using a position specific weight matrix constructed from 48 known TOP Comparative sequence analysis is a powerful genes, we could detect 1,645 candidate TOP tool to extract functional or evolutionary infor- genes from the 13,717 human genes in our data- mation from genomes of organisms. With the base. These 1,645 genes were broadly expressed use of complete genome sequences, there have compared with the rest of genes (p

Takeshi Obayashi, Atsushi Takabayashi7 , Noriko Ishikawa7, Fumihiko Sato7, and Kengo 159

Kinoshita: 7Kyoto University 15. Probabilistic alignment detects remote homology in a pair of protein sequences The information of gene co-expression is valu- without homologous sequence information able to predict gene functions. The performance of gene prediction depends on the quality of the Ryotaro Koike10, Kengo Kinoshita, and Akinori gene co-expression data and the method of pre- Kidera10: 10Yokohama City University diction. The quality of gene co-expression data depends on not only the quality and quantity of Dynamic programming algorithm and its heu- the original data but also on the normalizing ristics are the fundamental methods for similar- method and the method used to calculate corre- ity searches of biological sequences. Including lation coefficient between gene expression pat- additional information, such as homologous se- terns. To compare these methods, we developed quences in the profile comparison, has refined a new system to automatically predict gene the detection power of sequence comparison. function from its co-expression data. Using this We described a new approach, probabilistic system, we assessed the methods to construct alignment (PA), which gives improved detection gene co-expression data. In addition to compu- power using only a pair of amino acid se- tational prediction, the methods are compared quences. Receiver operating characteristic (ROC) by large-scale gene disruption. analysis showed that the PA method is far supe- rior to BLAST, and that its sensitivity and selec- 13. Analyses of homo-oligomer interfaces of tivity approach to those of PSI-BLAST. Particu- proteins from the complementarity of mo- larly for orphan proteins, PA exhibits much bet- lecular surface, electrostatic potential and ter performance than PSI-BLAST. hydrophobicity 16. Prediction of disordered region and terti- Yuko Tsuchiya, Kengo Kinoshita, and Haruki ary structure of proteins from its amino Nakamura8: 8Osaka University acid sequence

To extract the structural interacting patterns Takashi Ishida and Kengo Kinoshita between proteins, we developed a method to es- timate the complementarities for hydrophobic- Identification of protein intrinsically disor- ity, electrostatic potential on the molecular sur- dered or unstructured regions is useful for pro- faces and shape of the surfaces in protein- tein structure determination and protein tertiary protein interfaces. We have found that the homo structure prediction. We developed a method to -oligomer interfaces can be classified into five predict protein-disordered regions from its groups according to the structure of the inter- amino acid sequence. We achieved high predic- face; cyclic-oligomer, twisted-dimer, dimer- tion accuracy by combining the prediction from parallel , dimer-perpendicular and dimer- local sequence information and the prediction circular. As the results of correlation analyses from global sequence alignment. At the same between the shape classification and the prop- time, prediction of protein tertiary structure erty complementary, new characteristic trends from amino acid sequence without the informa- as the possible necessary conditions of protein- tion of structure of homologues is still a big protein interactions are emerging. problem. We developed a new potential func- tion based on contact number prediction to 14. PreBI: prediction of biological interfaces evaluate the matching between an amino acid of proteins in crystals sequence and a tertiary structure, and applied this potential to protein tertiary structure predic- Yuko Tsuchiya, Kengo Kinoshita, Nobutoshi tion. Ito9, and Haruki Nakamura8: 9Tokyo Medical and Dental University 17. Int-surf: a database for interacting sites for protein-protein interaction PreBI is a WWW server for predicting biologi- cal interfaces in protein crystal structures ac- Miho Higurashi, Takashi Ishida, and Kengo cording to the complementarities of the electro- Kinoshita static potential, hydrophobicity and shape of the interfaces, along with the area of the interfaces. Int-surf is a database of interacting sites of The results can be checked through our interac- proteins. For all PDB entries, interacting sites tive viewer. (http://pre-s.protein.osaka-u.ac.jp/ with other proteins, small molecules and DNA ¯prebi/) are calculated from the atomic coordinates in PDB. The information of the interacting sites of 160 homologous proteins is mapped onto each pro- Cholesterol, as a component of raft structures tein structure, so users can see possible interac- which are important in signal transduction, pro- tion sites of the considering protein, when its tein transport, etc., is a fundamental molecule in complex structure is not available, if the com- most eukaryotic cell membranes. A Molecular plex structures of close homologues are avail- dynamics simulation was applied to dipalmi- able. Using jV version 3, an interactive 3D toylphosphatidylcholine (DPPC) with several viewer program, Int-surf allow users to observe concentrations of cholesterol. Comparing a pure the interactions with interactive visualization. phospholipid membrane and a cholesterol- containing one suggests that sterol modifies the 18. Molecular dynamics study of lipid mem- membrane to a higher order phase. These mixed branes containing cholesterol bilayers could be useful as surrounding systems for eukaryotic membrane proteins. Naoya Fujita, Takashi Ishida, and Kengo Ki- noshita

Publications

Horton,P.,Park,K.-J.,Obayashi,T.,andNakai, DNA methylation status at alternative pro- K. Protein subcellular localization prediction moters of human genes in various normal tis- with WoLF PSORT, In Proc. 4th Asia-Pacific sues, DNA Res., 13(14): 155-167, 2006. BIOINFORMATICS Conference (APBC2006) Tsuchiya, Y., Kinoshita, K., and Nakamura, H. Edited by Jiang, T. et al. (Imperial College Analysis of homo-oligomer interfaces of pro- Press), pp. 39-48, 2006. teins from the complementarity of molecular Kimura, K., Watanabe, A., Suzuki, Y., Ota, T., surface, electrostatic potential and hydropho- Nishikawa, T., Yamashita, R., Yamamoto, J., bicity, Protein Eng Des Sel., 19: 421-429, 2006. Sekine, M., Tsuritani, K., Ishii, S., Sugiyama, Tsuchiya, Y., Kinoshita, K., Ito, N., and Naka- T.,Saito,K.,Isono,Y.,Irie,R.,Kushida,N., mura, N., PreBI: Prediction of biological inter- Yoneyama, T., Otsuka, R., Kanda, K., Yokoi, faces of proteins in crystals, Nucl Acid Res, T., Kondo, H., Wagatsuma, M., Murakawa, K., 34: W320-324, 2006. Ishida, S., Ishibashi, T., Takahashi-Fujii, A., Kinoshita, K., Kusunoki, M., and Miyai, K. Tanase, T., Nagai, K., Kikuchi, H., Nakai, K., Analysis of the three-dimensional structure of Isogai, T., and Sugano, S. Diversification of the “CXGXC” motif in the CMGCC and transcriptional modulation: large-scale identi- CAGYC regions of alpha-and beta-subunits of fication and characterization of putative alter- human chorionic gonadotropin: Importance of native promoters of human genes, Genome Glycine residue in the motif, Endocrine J., 53, Res., 16: 55-65, 2006. 51-38, 2006. Sierro,N.,Kusakabe,T.,Park,K.-J.,Yamashita, Ishida, T., Nakamura, S., and Shimizu, K. Poten- R.,Kinoshita,K.,andNakai,K.DBTGR:ada- tial for assessing quality of protein structure tabase of tunicate promoters and their regula- based on contact number prediction. Proteins tory elements, Nucl. Acids Res., 34: D552-D 64: 940-947, 2006. 555, 2006. 中井謙太,生物学の未来を考える,蛋白質核酸酵 Yamashita, R., Suzuki, Y., Wakaguri, H., Tsuri- 素51(12):1704―1707,2006. tani, K., Nakai, K., and Sugano, S. DBTSS: Da- 中井謙太,第8章:バイオインフォマティクス~ tabase of Human Transcription Start Sites, ホモロジー解析からシステム生物学まで~,遺 Progress Report 2006, Nucl. Acids Res., 34: D 伝子工学集中マスター,山本雅・仙波憲太郎編 86-D89, 2006. 集,羊土社,pp.102―111,2006. Cheong, J., Yamada, Y., Yamashita, R., Irie, T., 木下賢吾,eF-siteを利用した機能部位の予測,バ Kanai, A., Wakaguri, H., Nakai, K., Ito, T., イオテクノロジージャーナル,6:444―447, Saito, I., Sugano, S., and Suzuki, Y. Diverse 2006. 161

Human Genome Center Laboratory of Biostatistics (Biostatistics Training Unit) バイオスタティスティクス人材養成ユニット

Project Lecturer Rui Yamaguchi, Ph.D. 特任講師 博士(理学) 山 口 類 Project Research Associate Atsushi Doi, Ph.D. 特任助手 博士(理学) 土 井 淳

The main projects of our laboratory are to discover new biological meaning at mo- lecular level from vast array of biological data by various statistical approaches, and to train the researchers for the right use of statistical techniques. The sub- jects under investigation are as follows: Extraction of useful information from time- course gene expression data by using time-series models, development of pa- rameter estimation methods for bio-simulation models with statistical approaches, and building bio-pathway models with wide variety of background knowledge.

1. Extraction of Useful Information from Time b. Finding Module-Based Gene Networks Course Gene Expression Data with Statis- with State-Space Models tical Time Series Models Rui Yamaguchi, Ryo Yoshida2, Seiya Imoto2, a. State-space Approach with the Maximum Tomoyuki Higuchi2, Satoru Miyano2: 2Human Likelihood Principle to Identify the System Genome Center, Laboratory of DNA Informa- Generating Time-course Gene Expression tion Analysis Data of Yeast We discuss the use of the state space models Rui Yamaguchi and Tomoyuki Higuchi1: 1Insti- to analyze time-course microarray gene expres- tute of Statistical Mathematics sion data. Typical features of time-course mi- croarray data are short time-course and high- We use linear Gaussian state-space models to dimensional observational vector. By these as- analyze time-course gene expression data. They pects, conventional statistical models such as the are modeled to be generated from hidden state multivariate autoregressive models lead to un- variable in a system. To identify the system, we suitable results due to the overfitting. The state estimate parameters of the model by EM algo- space models have the potential to overcome rithm and determine dimension of the state this problem by the dimension reduction proc- variable by BIC. We apply this method to a ess. This study provides (1) a survey of the state published cell-cycle gene expression data of space models, (2) a review of some existing re- yeast (Saccharomyces cerevisiae). The determined searches using the state space models for time- dimension of the state variable are compared course microarray data together with a biologi- with those reported in other papers, which were cal meaning of the model, (3) a solution for the obtained by other parameter estimation methods lack of parameter identifiability, and (4) identifi- and criteria. cation of biological system that is a central topic 162 in bioinformatics. Finally, we show the useful- 3. Building Bio-Pathway Models with Back- ness of the state space models for time-course ground Knowledge microarray data by the analysis of Saccharomyces cerevisiae cell cycle gene expression data. As a a. Simulation-Based Validation of the p53 result, we estimated the number of the gene Transcriptional Activity with Hybrid Func- modules, and a gene-module network. tional Petri Net

2. Development of Parameter Estimation Atsushi Doi, Masao Nagasaki2,HiroshiMat- Methods for Bio-Simulation Models with suno3, and Satoru Miyano2 Statistical Approaches MDM2 and p19ARF are essential proteins in a. Genomic Data Assimilation for Estimating cancer pathways forming a complex with pro- Hybrid Functional Petri Net from Time- tein p53 to control the transcriptional activity of Course Gene Expression Data protein p53. It is confirmed that protein p53 loses its transcriptional activity by forming the Masao Nagasaki2 , Rui Yamaguchi, Ryo functional dimer with protein MDM2. However, Yoshida2, Seiya Imoto2, Atsushi Doi, Yoshinori it is still unclear that protein p53 keeps its tran- Tamada1, Hiroshi Matsuno3, Satoru Miyano2, scriptional activity when it forms the trimer and Tomoyuki Higuchi1: 3Fuculty of Science, with proteins MDM2 and p19ARF. We have ob- Yamaguchi University served mutual behaviors among genes p53, MDM2, p19ARF and their products on a com- For simulation models of biological pathways, putational model with hybrid functional Petri in most of cases, parameters to govern them are net (HFPN) which is constructed based on infor- tuned by experts empirically at the present time. mation described in the literature. The simula- In this study we take a novel approach for this tion results suggested that protein p53 should area, in order to estimate these parameters and have the transcriptional activity in the forms of to select the best model from several candidate the trimer of proteins p53, MDM2, and p19ARF. ones by using observed data. The approach is This paper also discusses the advantages of called the data assimilation (DA) of which the HFPN based modeling method in terms of path- concept is to incorporate information from ob- way description for simulations. served data into a simulation model. We can ex- pect to obtaine more plausible results from the b. A Combined Pathway to Simulate CDK- simulation model by this approach. From the Dependent Phosphorylation and ARF- point of view of the statistical modeling, DA can Dependent Stabilization for p53 Transcrip- be realized by solving an inverse problem to es- tional Activity timate unknown state and parameters of a simu- lation model by using observed data. To formu- Atsushi Doi, Masao Nagasaki2,HiroshiMat- late the problem, we use a statistical time series suno3, and Satoru Miyano2 model which is called a nonlinear state space model (SSM). Using an SSM, we can employ ef- The protein p53 is phosphorylated by a mem- fective statistical methods to estimate parame- ber of protein kinases such as CDK7, and stabi- ters, i.e. a particle filter, which is based on lized by the protein ARF. The phosphorylation Monte Carlo simulation. In order to examine the and stabilization of p53 is believed to enhance applicability of the approach for biological simu- its transcriptional activity and act simultane- lation models, the methods was applied to a ously. Biological pathways composed of experts model of circadian rhythm represented by a hy- knowledge obtained from the literature are in- brid functional Petri net (HFPN) with a synthe- cluding these activation mechanisms. However, sized data. the map of biological pathways does not reflect the combination effect of phosphorylation and stabilization. We have conducted some simula- tions of biological pathways with hybrid func- tional Petri net (HFPN) after careful reading of papers. In this paper, we constructed the HFPN based biological pathway of CDK-dependent phosphorylation pathway and combine with ARF-dependent pathway described previously, to observe the effect of the phosphorylation on the stabilization with simulation-based valida- tion. 163

Publications

Doi, A., Nagasaki, M., Matsuno, H., and Miya- Miyano, S., and Higuchi, T., Genomic Data no, S., Simulation based validation of the p53 Assimilation for Estimating Hybrid Functional transcriptional activity with hybrid functional Petri Net from Time-Course Gene Expression Petri net, In Silico Biol., 6(0001), 2006.. 46-61, 2006. Doi, A., Nagasaki, M., Matsuno, H., and Miya- Yamaguchi, R., and Higuchi, T., State-space Ap- no, S., A combined pathway to simulate CDK- proach with the Maximum Likelihood Princi- dependent phosphorylation and ARF-depen- ple to Identify the System Generating Time- dent stabilization for p53 transcriptional activ- Course Gene Expression Date of Yeast, Inter- ity, Genome inform., 17(1): 112-123, 2006. national Journal of Data Mining and Bioinfor- Nagasaki, M., Yamaguchi, R., Yoshida, R., matics, 1(1): 77-87, 2006. Imoto, S., Doi, A., Tamada, Y., Matsuno, H., 164

Human Genome Center Department of Public Policy 公共政策研究分野

Associate ProfessorKaoriMuto,Ph.D. 助教授 武藤香織

Department of Public Policy has launched since September 2007 as a new and the first social scientific section on medical sciences featuring genomics. We value conducting empirical studies in many fields of social science for future policy mak- ing. We work for three major missions; public policy studies on translational re- search, its application to healthcare and its impact on social security, practical ad- vices and survey for research projects to build public trust, and “minority-centered” scientific communication.

1. Public policy studies on translational re- research participants and candidates to evaluate search, its application to healthcare and the consent process of ongoing genomic studies its impact on social security to suggest future change or correction of ethical guidelines for genomic research. Exploring the human genome give us lots of critical keys to approach human health and dis- 2. Practical advices and survey for research ease and to develop new diagnostic tools and projects to build public trust preventive methods. However, ethical, legal and social aspects have been featured since the Medical scientists sometimes feel anxiety launch of the Human Genome Project. Various about protection of human participants and ethi- ethical concerns have been raised concerning cal issues. We help them to picture the ethically- these scientific quest as well as expectation for valid roadmap towards their goals and suggest change. Ethical discussions, however, tend to go regulatory scheme for social responsibility of before reviewing empirical evidences and facts. science, which includes creating consent forms Our major mission is to conduct empirical and brochures with high readability, considering analysis by interdisciplinary approach for future procedures of informing participants and obtain- public discussions. We collaborate with experts ing consents, considering ethical balance of the of several fields of social sciences; e.g. sociology, whole protocol before applying ethical review anthropology, psychology, disability studies, boards, and public disclosure throughout the re- economics, finance, policy science, law, market- search process. ing, STS (science, technology and society) and so on. One of the urgent issues should be an em- 3. “Minority-centered” scientific communica- pirical analysis of social and financial aspects of tion personalized medicine. We’re examining social and financial validity and draw regulatory Japanese government emphasizes the impor- framework of clinical introduction of drug sensi- tance of promoting communication between re- tivity genetic testing, by estimating saving costs searchers, engineers and society in Science and and impact on healthcare insurance system. Technology Basic Plan (2006). In a word “soci- Also we’re planning to conduct surveys towards ety”, we have to imagine various people with 165 diverse values. We will organize a range of pub- We conduct surveys and suggest policy agenda lic engagement projects and science participation how science and policy works throughout these events and particularly we focus on “minority- practices. centered” scientific communication, empowered Collaborating with modern artists, we think by disability studies and gay/lesbian studies. future of science and society. One of our related We feature perspectives from people with chro- projects is “Delivery by Male Project” (2004-) nic disorders, the disabled/challenged, LGBT/ with Hiroko Okada, which features ethical and queers and ethnic minorities. These people are social issues of male pregnancy using assisted not exactly “minority” in our society, but their reproductive technologies. claims towards science are not easy to be heard.