Detection of Tissue-Specific Genes and Computational Analysis of Testis
Osaka University Knowledge Archive : OUKA
https://ir.library.osaka-u.ac.jp/
Osaka University

Title: Detection of tissue-specific genes and computational analysis of testis-specific gene expression regulatory regions
Author(s): 山下, 明史
Text Version: ETD
URL: http://hdl.handle.net/11094/2599

Detection of tissue-specific genes and computational analysis of testis-specific gene expression regulatory regions
(組織特異的遺伝子の検出と精巣特異的遺伝子の発現制御領域のコンピュータを使った解析)

by Akifumi Yamashita

Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan

February, 2009

Contents

LIST of ABBREVIATIONS
ABSTRACT
1 INTRODUCTION
2 METHODS
  2.1 Detection of tissue-specific genes
  2.2 Annotation of tissue-specific genes
  2.3 Extracting the 5′-regulatory regions
  2.4 Selection of over-represented 8-mers
  2.5 Comparison of the 8-mer frequency within regulatory regions of testis-specific genes with those of non-testis-specific genes
3 RESULTS
  3.1 Detection of tissue-specific genes
  3.2 Selection of testis-specific genes which can be used for the following analysis
  3.3 Classification of the over-represented 8-mers
  3.4 Conservations of the flanking region of highlighted 8-mers
4 DISCUSSION
ACKNOWLEDGEMENTS
REFERENCES
TABLES
  Table 1 - Number of tissue-specific genes
  Table 2 - Classification of 117 representative 8-mers on the basis of their testis association level
  Table 3 - 8-mers which appeared near by the other 8-mers
FIGURES
  Fig. 1 - Number of tissue-specific genes
  Fig. 2 - Identification of testis-specific genes and their 5′-regulatory regions
  Fig. 3 - Profiling of 634 testis-specific genes
  Fig. 4 - Conservations of flanking region of the highlighted 8-mers in testis-specific and non-testis-specific regulatory regions
APPENDIX TABLES
  Appendix Table 1 - List of tissue specific genes
  Appendix Table 2a - Testis-specificity of the genes, representative 8-mer and annotation
  Appendix Table 2b - Testis-specificity and annotation of the genes whose regulatory regions were not completely sequenced or overlapped with other regulatory regions
  Appendix Table 2c - Testis-specificity and annotation of the genes which were not supported to cover the intact 5′-ends
  Appendix Table 2d - Testis specificity and annotation of 18 testis-specific genes whose clone IDs were duplicated with other genes shown in Appendix Table 2a, b, and c
  Appendix Table 2e - Testis-specificity of the genes whose clone IDs were not available
PUBLICATIONS

LIST of ABBREVIATIONS

bp      base pair(s)
cAMP    cyclic adenosine 3′,5′-monophosphate
cDNA    DNA complementary to RNA
CRE     cAMP response element
CSS     cardiac specific sequence
GNF     Genomics Institute of the Novartis Research Foundation
S.D.    standard deviation
TPA     12-O-tetradecanoylphorbol-13-acetate
TRE     TPA-responsive element
TSS     transcription start site
UCSC    University of California, Santa Cruz, USA
A       Adenine
T       Thymine
G       Guanine
C       Cytosine
R       Guanine or Adenine (puRine)
Y       Thymine or Cytosine (pYrimidine)
H       Adenine or Thymine or Cytosine (not Guanine, H follows G)
V       Guanine or Cytosine or Adenine (not Thymine (Uracil), V follows U)
B       Guanine or Cytosine or Thymine (not Adenine, B follows A)
N       Any nucleotide (aNy)

ABSTRACT

The accumulation of microarray data has enabled the computational analysis of gene expression in various tissues. We collected genes showing tissue-, organ-, or cell-specific expression using the GNF mouse gene expression database.
Thereafter, we searched for features of the regulatory regions of testis-specific genes, because genes showing testis-specific expression are the most abundant among the genes exhibiting tissue-specific expression, and because no systematic study of over-represented motifs within their regulatory regions has been conducted. We identified 117 over-represented 8-nucleotide sequences (8-mers) that appeared 2,648 times within the regulatory regions of 634 testis-specific genes. Of these, 64 over-represented 8-mers were significantly more frequent in the regulatory regions of testis-specific genes than in those of non-testis-specific genes. Within this group, four 8-mers differed from the canonical cAMP response element (CRE) 8-mer by 1 nucleotide, whereas the frequency of the canonical CRE in the regulatory regions of testis-specific genes did not differ significantly from that in the regulatory regions of non-testis-specific genes. We consider that these CRE-like 8-mers participate in the regulation of testis-specific gene expression to a greater extent than the canonical CRE 8-mer.

1. INTRODUCTION

Differences among cell types in a multicellular organism arise from differences in gene expression. Gene expression can be regulated at many steps in the pathway from DNA to protein, including transcriptional control, RNA processing control, RNA transport and localization control, translational control, mRNA degradation control, and protein activity control. For most genes, transcriptional control is the principal point of regulation. The transcriptional control of each gene, whether complex or simple, is carried out through interactions between gene regulatory proteins and short stretches of DNA of defined sequence in a regulatory region relatively near the site where transcription begins (Molecular Biology of the Cell, fourth edition, chapter 7).
Such DNA sequences can now be sought in regulatory regions by statistical methods, because the genomic sequences of more than 48 eukaryotes (http://www.ensembl.org/, as of Nov. 21, 2008) and the transcription start sites (TSSs) of genes have been determined. FitzGerald et al. (2004) used the database of TSSs (dbTSS, Suzuki et al., 2004) to collect human promoter sequences and investigated the distribution of 8-nucleotide sequences (8-mers). They found that 156 8-mers that cluster very significantly near the TSS could be placed into 9 groups of related sequences, and 7 of them were found to be known transcription factor binding sites.

Furthermore, microarray technology has made it possible to measure the expression levels of many genes in many different tissues. This technology can be used to find genes with unique expression patterns, such as tissue-specific genes. For example, Bono et al. (2003) identified 7,206 separate mRNA clones that satisfied stringent criteria for tissue-specific expression, based on the expression profiles of 57,931 clones in 20 mouse tissues measured with cDNA microarrays. In this way, we are able to collect tissue-specific genes and their regulatory regions.

We began our research by detecting tissue-specific genes. Thereafter, we used testis-specific genes to investigate the mechanism regulating tissue-specific gene expression, because spermatogenesis is an excellent model for studying the regulation of gene expression during differentiation (Wolgemuth and Wartrin, 1991), and because no systematic study has searched for frequently appearing sequences within these regulatory regions as candidate transcription factor binding sites. To find these motifs, we initiated the systematic isolation of testis-specific genes based on the method of Bono et al.
(2003) with more stringent parameters, using the latest gene expression database of the Genomics Institute of the Novartis Research Foundation (GNF), which provides gene expression data for 36,182 mouse genes obtained from 2 independent experiments with 61 tissues, organs, and cells (Su et al., 2004). The functions of these testis-specific genes were checked, and the regulatory sequences were then extracted based on the positional information of the TSSs (Suzuki et al., 2004). We then attempted to identify the regulatory motifs participating in testis-specific gene expression on the basis of the following working hypotheses: (i) most of the regulatory motifs are localized within the 5′-regulatory regions between –300 bp and +50 bp ([–300 to +50]) relative to the TSSs (FitzGerald et al., 2004), (ii) a length of 8 nucleotides (8-mer) is adequate for searching for the regulatory motifs (FitzGerald et al., 2004), and (iii) most of the significantly over-represented motifs are regulatory motifs.

To find significantly over-represented motifs within these regulatory regions, we conducted the following two-step analysis. First, we estimated the expected number of appearances of each 8-mer from the nucleotide content of these regions. If an 8-mer appeared significantly more often than expected, it was regarded as an over-represented 8-mer. After collecting the 8-mers showing the greatest significance in each regulatory region, we proceeded to the second step: we compared the frequency of each of these 8-mers per regulatory region among testis-specific genes with that among non-testis-specific genes. If an 8-mer appears significantly more frequently in the regulatory regions of testis-specific genes than in those of non-testis-specific genes, the 8-mer is likely to be related to the expression of testis-specific genes.

2. METHODS

2.1. Detection of tissue-specific genes

Tissue-specific genes were selected from the Mouse GNF1M (MAS5-condensed) expression database, which was downloaded from the GNF website (http://wombat.gnf.org/index.html). This database contains the gene expression data of 36,182 mouse genes obtained from 2 independent experiments with 61 tissues, organs, and cells (hereafter referred to simply as tissues). The gene expression intensities of the 2 experiments were averaged and then log-transformed. Because each gene has expression data from 61 tissues, the data for each gene were normalized across those tissues. Bono et al. (2003) designated a gene as tissue-specific if the normalized value exceeded the mean + 3 standard deviations (S.D.) for their cDNA microarray data and the mean + 2 S.D. for Affymetrix chips.
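
The selection rule described above can be illustrated with a short Python sketch. It is not the implementation used in this study. The function name `tissue_specific_hits`, the base-10 logarithm, the per-gene z-score normalization using the population standard deviation, and the default threshold `k = 3.0` are all assumptions made for illustration. The text states only that the intensities were averaged, log-transformed, and normalized per gene, and that Bono et al. (2003) used mean + 3 S.D. or mean + 2 S.D. as thresholds.

```python
import math
from statistics import mean, pstdev


def tissue_specific_hits(rep1, rep2, tissues, k=3.0):
    """Return (tissue, z-score) pairs for one gene whose normalized
    expression exceeds mean + k * S.D. across all tissues.

    rep1, rep2: expression intensities of the gene in the 2 experiments,
                one value per tissue (assumed strictly positive).
    tissues:    tissue names, in the same order as rep1 and rep2.
    k:          threshold in S.D. units (hypothetical default).
    """
    # Average the 2 experiments, then log-transform (log10 is an assumption).
    logged = [math.log10((a + b) / 2.0) for a, b in zip(rep1, rep2)]

    # Normalize per gene across tissues as a z-score (assumed normalization;
    # the population S.D. is used here, the sample S.D. is equally plausible).
    mu = mean(logged)
    sd = pstdev(logged)
    if sd == 0:
        return []  # flat profile: no tissue stands out

    return [(t, (x - mu) / sd) for t, x in zip(tissues, logged)
            if (x - mu) / sd > k]


if __name__ == "__main__":
    # Toy profile: 61 tissues, one of which is strongly expressed.
    names = ["testis"] + [f"tissue_{i}" for i in range(60)]
    exp1 = [12000.0] + [100.0] * 60
    exp2 = [8000.0] + [100.0] * 60
    for tissue, z in tissue_specific_hits(exp1, exp2, names):
        print(tissue, round(z, 2))
```

In this toy profile only the first tissue exceeds the threshold. Raising `k` makes the rule more stringent, which matches the direction of the "more stringent parameters" mentioned in the Introduction, although the value actually used is not given in this part of the text.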