US 2006O123505A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2006/0123505 A1 Kikuchi et al. (43) Pub. Date: Jun. 8, 2006

(54) FULL-LENGTH PLANT CDNA AND USES Publication Classification THEREOF (75) Inventors: Shoshi Kikuchi, Ibaraki (JP): Naoki (51) Int. Cl. Kishimoto, Ibaraki (JP): Kouji Satoh, AOIH 5/00 (2006.01) Ibaraki (JP); Toshifumi Nagata, Ibaraki G06F 9/00 (2006.01) (JP); Nobuyuki Kawagashira, C7H 2L/04 (2006.01) Tsukuba-shi (JP); Junshi Yazaki, Ibaraki (JP); Masahiro Ishikawa, AOIH I/00 (2006.01) Ibaraki (JP); Koji Doi, Ibaraki (JP); C07K I4/45 (2006.01) Jun Kawai, Kanagawa (JP); Yoshihide C07K 6/18 (2006.01) Hayashizaki, Ibaraki (JP); Yasuhiro (52) U.S. Cl...... 800/278; 800/320.2: 435/468; Otomo, Ibaraki (JP); Kenichi 530/370; 435/419; 530/388.1; Matsubara, Osaka (JP); Kazuo 536/23.6; 702/20 Murakami, Ibaraki (JP) Correspondence Address: (57) ABSTRACT KLARQUIST SPARKMAN, LLP 121 SW SALMON STREET SUTE 16OO Full-length cDNAs of plants and their uses are provided. PORTLAND, OR 97204 (US) Source plants are preferably monocot plants, more prefer ably poaceous plants, and most preferably rice. Vectors (73) Assignees: National Institute of Agrobiological carrying said cDNAS and transformants containing said Sciences: RIKEN, Foundation for cDNAS or said vectors, transgenic plants containing said Advancement of International Scienc transformants, polypeptides encoded by said cDNAS are also provided. The full-length cDNA clones play important (21) Appl. No.: 10/449,902 roles in the annotation of correct gene coding region, deter mination of exons and introns, comprehensive expression (22) Filed: May 29, 2003 analysis on the transcription level and proteome analysis. (30) Foreign Application Priority Data Furthermore, full-length cDNA clones are industrially useful in producing plants having different properties from those of May 30, 2002 (JP)...... 2002-203269 the wild type due to the inhibition of expression and func Dec. 11, 2002 (JP)...... 2002-38387O tional Suppression in plant bodies. Patent Application Publication Jun. 8, 2006 US 2006/0123505 A1

Fig. 1 006-203-C08 006-203-C08 006-203-C08 2300610 23006J 10 23006J10 2300610 CDS01-11 CDS0-09 CDSO-07 COSO-05 CDS01-03 CDS01-01 O 1000 2000 3000 4000 5000 6000 7000 8000 9000 US 2006/O123505 A1 Jun. 8, 2006

FULL-LENGTH PLANT CIDNA AND USES these draft genome sequences were obtained based on the THEREOF whole-genome shotgun sequencing method, and therefore they have gaps, and no chromosomal information. In com FIELD OF THE INVENTION parison, the International Rice Genome Sequencing Project announced its intention to use the sequencing by the method 0001. The present invention relates to a full-length plant using BAC/PAC clones to complete most of the rice genome CDNA sequence by the end of 2002. The results of partial analysis of rice chromosomes 1 and 4 were published (Sasaki, T. et and uses thereof. al., “The genome sequence and structure of rice chromo REFERENCE TO TABLES AND A SEQUENCE some 1.” Nature, 2002, 420, 312-316; Q. Feng et al., LISTING “Sequence and analysis of rice chromosome 4.' Nature, 2002, 420, 316-320). The genome sequence data from both 0002 Tables 13 through 17 inclusive and a Sequence rice and Arabidopsis thaliana can be used to compare how Listing are providedinelectronicformatonly oncompactdiscs, monocot and dicot plants differ from each other and how aspermitted under 37 CFR 1.52(e) and 1.821(c). The disc plant genomes differ from animal genomes. entitled “Tables' (Copy 1 and Copy 2) contains the follow 0005. When all of these sequence data are completely ing files: obtained, decoding of the rice genome sequence will be accomplished. These sequence data are, however, not suf ficient as data for gene function analysis, because the current File name Size (KB) Date recorded onto disc technology cannot completely predictgene coding regions and other regions based on only the genome sequence data. Table 13.doc 3,770 May 23, 2003 Table 14.doc 5,148 May 23, 2003 The full-length cDNA project has been initiated aiming at Table 15.doc 234 May 23, 2003 making up for Such data gaps as well as accumulating Table 16.doc 4,388 May 23, 2003 comprehensive information on the expression of the rice Table 17.doc 858 May 23, 2003 gene and its protein. Information on the full-length cDNA clones greatly contributes to determine annotation, exon and intron of the correct gene-coding region. In addition, Such 0003) The disc entitled “Sequence List” (Copy 1 and information on these cDNA clones is important for the Copy 2) contains the following files: exhaustive expression analysis at the transcriptional level and proteome analysis. The results of the full-length cDNA project on Arabidopsis have already been published (Seki, M. et al., “Functional Annotation of a Full-Length Arabi File name Size (KB) Date recorded onto disc dopsis cDNA Collection.” Science, 2002, 296, 141-145). OO1.tx 4,688 Apr. 18, 2003 OO2.tx 4,662 Nov. 25, 2002 SUMMARY OF THE INVENTION OO3.tx 4,853 Nov. 25, 2002 0006 Even today when decoding of the rice genomic OO4.tx 4,832 Nov. 25, 2002 OO5.tx 4,985 Nov. 25, 2002 DNA is coming to an end, we are still in great need of OO6.tx 4,962 Nov. 25, 2002 isolation and characterization of the full-length cDNA of OO7.tx 5,030 Nov. 25, 2002 rice. OO8.tx 5,000 Nov. 25, 2002 OO9.tx 4.944 Nov. 25, 2002 0007. These full-length cDNA clones can be used for O10.tx 4,981 Nov. 25, 2002 various purposes. They play important roles in the annota O11.tx 5,024 Nov. 25, 2002 tion of correct gene coding regions, exon-intron determina O12.tx 5,036 Nov. 25, 2002 O13.tx 5,662 Nov. 26, 2002 tion, comprehensive expression analysis at the transcription 014.tx 7,950 Nov. 28, 2002 level and analysis of proteome. Furthermore, full-length O15.tx 7,906 Nov. 28, 2002 cDNAs are industrially useful in creating plant bodies O16.tx 8,018 Nov. 28, 2002 showing a phenotype that is different from the wild type as O17.tx 8,514 Nov. 28, 2002 O18.tx 8,429 Nov. 28, 2002 a result of inhibiting their expression and function in the O19.tx 8,350 Nov. 28, 2002 plant bodies. O2O.tx 8,294 Nov. 28, 2002 O21.tx 8,463 Nov. 28, 2002 0008 Diverse uses of these full-length cDNA clones can O22.tx 8,449 Nov. 28, 2002 be mentioned. First, alignment of these clones togenome O23.tx 499 Nov. 28, 2002 sequences enables to check the fidelity of computer predic tion of gene coding regions from the genome sequence. Such as, a transcription initiation point, exons, introns, and a The material on these discs is hereby incorporated by transcription termination point. In contrast, this fidelity reference in its entirety. checking leads to the improvement of gene coding region prediction program. Furthermore, information of full-length BACKGROUND OF THE INVENTION cDNA clones is combined with results of comprehensive 0004 Recently, draft sequences of the rice (indica sub analysis of expression profiles at the transcriptional level species) genome (Yu, J. et al., “A Draft sequence of the rice using microarrays such, to predict promoter and transcrip genome (Oryza sativa L. ssp. indica) Science, 2002, 296, tion regulatory regions of genes of interest in the genome 79-92) and japonica subspecies (Goff, S. A. et al., “A Draft Sequence. sequence of the ricegenome (Oryza sativa L. ssp. Japonica) 0009 Furthermore, comparison between full-length Science, 2002, 296, 92-100), have been published. Both of cDNA clone sequence information and sequences adjacent US 2006/O123505 A1 Jun. 8, 2006

the insertion site of insertion-mutants such as transposon (b) a DNA molecule encoding RNA having ribozyme activ insertion mutants, enables to find a plant whose gene cor ity to specifically cleave a transcript of the DNA of 1 or (2): responding to said clone is disrupted. The function of said gene can be predicted from the phenotype of the plant. It is (c) a DNA molecule encoding RNA inhibiting the expres also possible to express the protein encoded by the full sion of the DNA of 1 or 2) via an RNAi effect at the time length cDNA clone using various systems such as Escheri of expression of said DNA in plant cells; and chia coli, yeast, in vitro system so as to investigate the (d) a DNA molecule encoding RNA inhibiting the expres biochemical function and conformation of the protein, sion of the DNA of 1 or 2 by the co-suppression effect at thereby elucidating overall its functions. Furthermore, pro the time of expression of said DNA in plant cells. teins that interact with said protein can be found by using the yeast two-hybrid system and Such to clarify a part of a 4) A vector containing the nucleic acid of anyone of 1 biological network in vivo. through 3. 5A transformed plant cell maintaining the nucleic acid of 0010 Given this, an objective of the present invention is to provide the full-length cDNA clones of plants. Another any one of 1 through 3 or the vector of 4. objective of this invention is to modify plants using the 6. A transformed plant body containing the transformed cDNAs thus isolated and characterized. plant cell of 5. 0011. The present inventors collected 3'-EST sequences 7. A progeny or clone of the transformed plant body of 6. of 175,642 full-length cDNA clones using two different 8) A propagation material of the transformed plant body of methods, namely, the oligo-capping and biotinylated CAP 6 or 7. trapper methods. These clones were clustered into 28,469 nonredundant groups, and all representative clones from 0013 9 A method of producing the transformed plant each group were completely sequenced with 99.98% fidelity. body of 6), wherein said method comprises the step of As a result of homology searches, 21.596 (75.86%) of these transducing the DNA of any one of 1 through 3 or the representative full-length cDNA clones (28,469) were anno vector of 4 into plant cells to regenerate a plant body from tated. ORFs were present in 28.332 clones; among them said plant cells. 24,507 clones had ORFs containing 100 or more amino-acid 10A protein encoded by any one of the nucleic acids of 1. residues. As a result of attempting to map the28,469 full length cDNA clones, 18.933 transcription units (TU) were 11 A method of producing the protein of 10 comprising mapped to the indica draft genome. Of the said full-length the following steps: cDNA clones, 18.900 clones (12.996TU) had a homology to (1) transducing any one of the nucleic acids of 1 or a vector genes (27.288) of Arabidopsis predicted from its genome containing said nucleic acid into cells capable of expressing sequence. Thus, the present inventors succeeded in compre said nucleic acid so as to obtain a transformant; hensive collection, grouping, sequencing and functional annotation of full-length cDNA clones from rice. (2) culturing said transformant; and 0012 That is, the present invention relates to full-length (3) recovering the protein of 10 from the culture of the step cDNAs of plants and uses thereof, more specifically to: (2). 12 An antibody binding to the protein of 10). 1) An isolatedplant-derivednucleic acid, wherein said nucleic acid is selected from the group consisting of 13A rice gene database comprising sequence information selected from the group consisting of: (a) a nucleic acid encoding a protein comprising an amino acid sequence set forth in any one of SEQ ID NOS: 28470 (a) one or more amino acid sequences selected from SEQID through 56791; NOs: 28470 through 56791; (b) a nucleic acid containing the coding region of a nucle (b) one or more nucleotide sequences selected from SEQID otide sequence set forth in any one of SEQ ID NOs: 1 NOs: 1 through 28469; and through 284.69; (c) both (a) and (b). (c) a nucleic acid encoding a protein comprising an amino 14. A method of determining the transcriptional regulatory acid sequence set forth in any one of SEQ ID NOS: 28470 region comprising the steps of through 56791 wherein one or more amino acids are sub (1) mapping the nucleotide sequence of any one of SEQ ID stituted, deleted, inserted and/or added; and NOs: 1 through 28,469 to the rice genome nucleotide (d) a nucleic acid hybridizing to a nucleic acid comprising sequence, and a nucleotide sequence set forth in any one of SEQID NOs: (2) determining the transcriptional regulatory region of the 1 through 28469 under stringent conditions. gene mapped in the step (1) which contains the transcrip 2 The nucleic acid according to 1), wherein said nucleic tional regulatory region found on the 5'-side of the 5' most acid is derived from rice. end of the mapped region. 3. An isolated DNA molecule selected from the group BRIEF DESCRIPTION OF THE DRAWING consisting of: 0014 FIG. 1 represents the results of detailed compari (a) a DNA molecule encoding an antisense RNA comple son of CDS1 region at the exon and intron levels. One mentary to a transcript of the DNA molecule of 1 or 2: complete coding sequence (CDS) has been predicted in the US 2006/O123505 A1 Jun. 8, 2006

BACO-10 kb region; this figure shows introns and exons in 0024 Found specifically and frequently in Arabidopsis the region and relationship between two cDNA clones of the thaliana present invention that have been mapped to the BACO-10 kb region. 0.025 (Table 4d) 0026 domain DETAILED DESCRIPTION OF THE INVENTION 0027 Transcription factors play important roles in the gene expression control of living organisms. Controlling the 0.015 The present invention provides full-length cDNA transcription factor activity is none other than controlling the clones derived fromplants. Nucleotide sequences of full gene expression. Therefore, genes encoding transcription length cDNA clones isolated from a rice plant by the present factors are useful in the control of the gene expression in inventors are set forth in SEQID NOs: 1 through 28,469 and plants. amino acid sequences of proteins encoded by these cDNAS are shown in SEQ ID NOs: 28,470through56,791. Corre 0028. The InterPro search for full-length cDNAs of the spondences of the nomenclature of each clone, SEQID NOS present invention yielded 1336 transcription factor clones of nucleotide sequences, initiation and terminationpoints of classified into 18 DNA binding domains, and the details are ORFs, and SEQ ID NOs of amino acid sequences encoded shown in Table 5. Clones predicted as the transcription in ORFs are set forth in the "List of Clones' at the end of this factor are shown in the “List of clones predicted as the specification. transcription factor” categorized into the respective classes at the end of this specification. As a result of this classifi 0016. As used herein, an "isolated nucleic acid is a cation, Zn finger-type transcription factors are dominant, nucleic acid the structure of which is not identical to that of followed by Myb-type factors; these constitutions are shared any naturally occurring nucleic acid or to that of any with Arabidopsis. Of the clones predicted as a transcription fragment of a naturally occurring genomic nucleic acid factor, those classified into Zn finger-type were collected and spanning more than three genes. The term therefore covers, subclassified in the “List of clones predicted as Zn finger for example, (a) a DNA which has the sequence of part of type' at the end of this specification. a naturally occurring genomic DNA molecule but is not flanked by both of the coding sequences that flank that part 0029 Control of a transcription factor expression enables of the molecule in the genome of the organism in which it to control the expression of a gene transcriptionally regu naturally occurs; (b) a nucleic acid incorporated into a vector lated by said transcription factor. For example, a plant body or into the genomic DNA of a prokaryote or eukaryote in a produced by transducing an antisense sequence of a tran manner Such that the resulting molecule is not identical to Scription factor involved in the transcription of a gene any naturally occurring vector or genomic DNA; (c) a causing an undesirable phenotype enables to Suppress the separate molecule Such as a cDNA, a genomic fragment, a action of said transcription factor. It is also possible to fragment produced by chain reaction (PCR), or competitively inhibit the action of a transcription factor by a restriction fragment; and (d) a recombinant nucleotide transducing into cells a nucleic acid imitating the recogni sequence that is part of a hybrid gene, i.e., a gene encoding tion sequence for the transcription factor as a decoy nucleic a fusion protein. Specifically excluded from this definition acid. The recognition sequence for transcription factors are nucleic acids present in random, uncharacterized mix encoded by full-length cDNA of the present invention can be tures of different DNA molecules, transfected cells, or cell determined by the footprinting method or gel shift assay. On clones, e.g., as these occur in a DNA library such as a CDNA the other hand, the activity of a transcription factor involved or genomic DNA library. in expression of a desirable phenotype can be enhanced by 0017. Of 28,469 clones, those with the longest ORF transducing a gene encoding the transcription factor. encoding 100 or more amino acid residues were examined 0030 Plant membrane proteins play important roles in for what functional domains they had against the InterPro the interaction between cells, absorption of nutrients from DB, yielding a total of 3491 InterPro domains. Comparison the extracellular environment, recognition and infection by of the search results of rice cDNA clones obtained in the viruses, etc. For example, the transporter present in the present invention with those of cDNAs from Arabidopsis plasma membrane is closely associated with the salt toler thaliana, Caenorhabditis elegans, Drosophila melanogaster; ance of plants. Therefore, full-length cDNA clones encoding Homo sapiens, Saccharomyces cerevisiae and S. pombe the plant membrane protein is useful in controlling various yielded the following characteristic InterPro domains. properties of plants. 0018 Found commonly in eukaryotes (Table 4a) 0031. The MEMSAT program was used to predict trans 0.019 Found specifically or frequently in rice (Table 4b membrane domains of proteins encoded by full-length and 4c) cDNA clones of the present invention. As shown in Table 6, 6,280 clones, which account for 22.1% of the whole full 0020 Pollen allergic protein domain length cDNA clones, had two or more transmembrane spanning domains. When the direction of the transmem 0021 Expressed organ-specifically (e.g., serine pro brane domain whose N-terminus is intracellular is referred tease inhibitor in seeds) to as IN, and whose N-terminus is extracellular OUT, the 0022. Environmental stress-inducible proteins (anti number of transmembrane spanning domains in IN direction freeze and that in OUT direction was about the same, although, in Some instances, the number of either one direction is pre 0023 protein, ABS/WDS-inducible anti-drought pro dominant over the other, depending on the number of tein, etc.) transmembrane spanning. US 2006/O123505 A1 Jun. 8, 2006

0032. Furthermore, intracellular localization of proteins, 0040 Fumio Takaiwa, Shoshi Kikuchi and Kiyoharu encoded by full-length cDNA clones of the present inven Oono, "Structure and expression of rice storage protein tion, was analyzed using the pSORT program. Among those glutelin genes (Oral presentation, abstract). Abstr. 2nd proteins with ORF encodding 100 or more amino acid Int. Congr. Plant Mol. Biol., 1988 November, p. 325. residues, those encoded by 18,166 clones were predicted tobe localized intracellularly. Table 7 shows the putative 0041 Shoshi Kikuchi, Fumio Takaiwa and Kiyoharu target organelles, the number of clones encoding proteins Oono, “Gene expression analysis in rice cultured cells— that are predicted to belocalized, and the ratio (%) of said analysis of protein synthesis in rice callus using two number of clones to 18,166 clones. The major organelle dimensional electrophoresis (Oral presentation, where the proteins are localized is nucleus, accounting for abstract), Ikushugaku Zasshi (Breeding Science), 1985 20.0% of clones. Second major organelles include plasma April, Vol. 35 (Suppl. 1), pp. 14-15. membrane, cytoplasm, endoplasmic reticulum (ER), and microbody, and each accounted for about 10% of the clones analyzed. 0042 Shoshi Kikuchi, Yoshio Kaneko, Takao Komat 0033 Homology search was performed between known suda, Fumio Takaiwa and Kiyoharu Oono, “DNA analysis genes and nucleotide sequences of the full-length cDNA in rice cultured cells isolation and characterization of clones or their amino acid sequences of the present invention DNA fragments whose copy numbers are significantly as query sequences. BLASTN search revealed that 2603 different between rice embryo and callus (oral presenta cDNA clones were identical to already-known rice genes. tion, abstract). Ikushugaku Zasshi (Breeding Science), These clones were classified into the identical rice genes. 1985 September, Vol. 35 (suppl. 2), pp. 102-103. BLAST X search revealed that 5607 clones were homolo 0043 Shoshi Kikuchi, Fumio Takaiwa and Kiyoharu gous to rice homolog genes, 12 527 clones to genes of other Oono, “Analysis of rice genes—analysis of DNA frag plants than rice, and 859 clones to genes of organisms other ments whose copy numbers are varied in cultured cells than plants. These results were used to functionally classify and differentiated cells (oral presentation, abstracts), 21,596 (75.86%) of 28,469 full-length cDNA clones of the Abstr. Annu. Meet. Mol. Biol. Soc. Jap., 1985 December, present invention. The identical rice genes or the already Vol. 8, p. 56. known genes found to be homologous to genes of rice and plants other than rice include the following genes. 0044 Shoshi Kikuchi, Fumio Takaiwa and Kiyoharu Oono, “Analysis of rice genes—analysis of DNA whose

0050 Shoshi Kikuchi, M. Soliman, Shin Dong-Hyun, Kazunari Maruta, Fumio Takaiwa and Kiyoharu Oono, 0060 Kanegae, H., H. Funatsuki, S. Kikuchi and M. 'Analysis of genes which specifically express in rice dry Takano, “Differential expression of the rice snf-1 related callus culture (oral presentation abstr.). Ikushugaku protein gene family.” Abs. Int. Congr. Plant Mol. Zasshi (Breeding Science), 1990 November, Vol. 40 Biol. 5th, p. 147. (Suppl. 2), p. 20-21. 0061 Takano, M., H. Kanegae, and S. Kikuchi, “Genome 0051 Shoshi Kikuchi, M. Soliman, Shin Dong-Hyun and structure of SNF-related genes in rice.” Kiyoharu Oono, “Cloning and characterization of the Abs. Annu. Meet. Mol. Biol. Soc. Jpn. 20th, 1997, 4-EH genes which express under dried condition of rice callus P-O65. (Accumulation of MRNA discovered in rice callus under 0062 Takano, M., H. Kanegae, K. Miyoshi, M. Mori, S. dried conditions) (oral presentation abstr.), 20th Annu. Kikuchi and Y. Nagato, “Rice has two distinct classes of Meet. Mol. Cell Biol. at Intnl. Congr. Mol. Biol. (Key protein kinase genes related to SNF 1 of Saccharomyces stone Symposium), 1991 January, p. 58. cerevisiae, which are differently regulated in the early 0.052 Shoshi Kikuchi, Kazunari Maruta and Kiyoharu seed development. Interact, Intersect. Plant Signal Pathw. Oono, “Analysis of gene specifically expressed in rice dry 1999, 60. callus II—Giant transcript discovered in mature seeds and 0063 Kanegae, H. S. Kikuchi and M. Takano, “Analysis dried rice callus culture (oral presentation abstr.). Ikush of promoter activity of OSK genes in rice.” Plant Cell ugaku Zasshi (Breeding Science), 1991 April, Vol. 41 Physiol., 1998, 39 (suppl), p. 125. (Suppl. 1), pp. 136-137. 0064. Takano, M., H. Kanegae, K. Miyoshi, M. Mori, S. 0053 Shoshi Kikuchi and Kiyoharu Oono, “Giant tran Kikuchi and Y. Nagato, “Rice has two distinct classes of Script of drought-specific gene discovered in mature seeds protein kinase genes related to SNF 1 of Saccharomyces and dried callus culture of rice (oral presentation abstr.).” cerevisiae, which are differently regulated in the early 4th Plant Mol. Biol. Symp., 1991 January, p. 37. seed development.” Plant Cell Physiol., 1999, 40 (suppl), p. 79. 0054 Shoshi Kikuchi and Kiyoharu Oono, “Cloning and characterization of genes specifically expressed in rice callus (Analysis of genes expressed in rice callus) (Oral 0065 Kanegae, H., M. Tahira, S. Kikuchi, K. Yamamoto, presentation, abstr.). Intnl. Workshop Rice Mol. Biol. M. Yano, T. Sasaki, K. Kanegae, M. Wada and M. Takano, 1991 August, p. 35. “Identification of NPH1 homologs in rice.” Abs. Annu. 0.055 Shoshi Kikuchi and Kiyoharu Oono, “Cloning and Meet. Mol. Biol. Soc. Jap. 21st, 1999, p. 274. characterization of the genes which specifically express in callus of rice (Oral presentation abstr.). Abst. 3rd Congr. Int. Soc. Plant. Mol. Biol., 1991 Octover, p. 872. 0066 Masaki Mori, Hiroshi Takatsuji, Hiromi Kanegae, Toshifumi Nagata, Yuriko Shibata and Shoshi Kikuchi, 0056 Shoshi Kikuchi, Kazumaru Miyoshi, Kazunari “Cloning and structural analysis of rice SUPERMAN Maruta and Kiyoharu Oono, “Rice callus cDNA clones gene.” Abs. Annu. Meet. Mol. Biol. Soc. Jpn., 1997, p. for the characterization and understanding of calli in 475. molecular level (oral presentation abstr.), German-Japa nese Work-shop Plant Culture, Breeding and Formation of 0067 Masaki Mori, Hiroshi Takatsuji, Hiromi Kanegae, Phytochemicals, Abstr, 1992 February, p. 35. Toshifumi Nagata, Yurikio Shibata and Shoshi Kikuchi, “Isolation and expression analysis of genomic DNA of 0057 Shoshi Kikuchi, Kazumaru Miyoshi, Kazunari rice SUPERMAN gene (RSUP).” Ikushugaku Zasshi Maruta and Kiyoharu Oono, “cDNA cloning from rice (Breeding Science), 1998, 48 (suppl. 1), p. 28. callus I. Classification of clones based on the expression 0068 Masaki Mori, Hiroshi Takatsuji, Hiromi Kanegae, specificity and analysis of nucleotide sequences (Oral Toshifumi Nagata, Yuriko Shibata and Shoshi Kikuchi, presentation, abstract). Ikushugaku Zasshi (Breeding “Isolation and characterization of a rice gene encoding a Science), 1992 April, Vol. 42 (suppl. 1), pp. 206-207. zinc-fingerprotein related to Arabidopsis SUPERMAN.” 0.058 Kazumaru Miyoshi, Shoshi Kikuchi, Kazunari Abstracts of 15th International Congress on Sexual Plant Maruta, Tadao Naito and Kiyoharu Oono, “Expression Reproduction, 1998, p. 96. analysis of rice callus cDNA in long-term subcultured cells (oral presentation abstr.). Ikushugaku Zasshi (Breeding Science), 1992 April, Vol. 42 (suppl. 1), pp. 0069. Masaki Mori. Hisako Ooka, Kazuhiko Sugimoto, 208-209. Kouji Sato, Hirohiko Hirochika, Koji Yamamoto and Shoshi Kikuchi. “Analysis of dwarf mutant strain whose

brd 1 is a brassinosteroidbiosynthesis mutant.” Ikush cursor consisted of 49 amino acid residues, arranged in 0085 Masaki Mori, Takahito Nomura, Hisako Ooka, order of signal peptide-acidic Subunit-basic Subunit. Masumi Ishizaka, Takao Yokota, Kazuhiko Sugimoto, Ken Okabe, Hideyuki Kajiwara, Kouji Satoh, Koji Yama 0074 Fumio Takaiwa, Shoshi Kikuchi and Kiyoharu moto, Hirohiko Hirochika and Shoshi Kikuchi, “Isolation Oono, “Ariceglutelin gene family—A major type of glu and characterization of a rice dwarf mutant with a defect telin mRNAs can be divided into two classes'. Mol. Gen. in brassinosteroid biosynthesis' Plant Physiology, 2002, Genetics, 1987 July 208, pp. 15-22. Vol.130, pp. 1152-1161. 0075 Analysis of cDNA clones of rice glutelin gene 0086). Additional homology search was performed revealed that they could be classified into 2 classes based between amino-acid sequences encoded by ORFs of full on the differences in the restriction map and length cDNA clones of the present invention and those expression time, etc. encoded by the predicted CDS of the Arabidopsis genome 0.076 Fumio Takaiwa, Hiroyasu Ebinuma, Shoshi Kiku nucleotide sequence. As shown in Table 8, 18.900 full chi and Kiyoharu Oono, “Nucleotide sequence of a rice length cDNA clones (12.996 TU) yielded hits, confirming glutelin gene FEBS. Lett., 1987 August 221 (1), pp. the homology of the aforementioned ORF amino acid 43-47. sequences with those Arabidopsis with high probability. These results substantiated a high reliability of amino acid 0.077 Genomic DNA of rice glutelin gene was cloned to sequences obtained in this invention. Furthermore, it is elucidate its entire structure. highly reliable that cDNAs of the present invention have full-length, which makes their ORFs also reliable. 0087) “BLAST search results' at the end of this specifi 0078 Shoshi Kikuchi, Fumio Takaiwa and Kiyoharu cation lists combinations of clones which showed the high Oono, “Variable copy number DNA sequences in rice' est homology among the 28,469 clones searched by BLAST Mol. Genetics, 1987 December 210, pp. 7373-380. N and BLAST X. Rice genes highly homologous to known 0079 Rice nuclear DNA whose copy number reversibly genes whose functions are known, would function similar to changes during cellular differentiation and dedifferentia the homologous known genes. tion, was cloned to elucidate its structure. 0088 For example, the following clones were found to

0.091 The following clones were found to have the present invention was 18,485. Tables 9-12 show categori domain associated with ethylene receptors, which are phy zation of these clones. The largest class was “Unclassified.” tohormone receptors and involved in controlling the matu accounting for about 4 of the total terms, followed by ration and deterioration of plants. “,”“Transport,” and “Translation”. GO terms J023034P21 1561: two-component response regulator and associated with “function” were assigned to 10,942 full length cDNA clones, and the total number of these GO terms J023056E19 1551: two-component sensor molecule. was 16,853. These GO terms could not be classified exclu 0092 Proteins having the following domains are sively, and the most frequently observed GO “function' term assumed to be so-called G-proteins, which are involved in was “enzyme'. GO terms associated with “cellular compo the signal transduction and are useful in the regulation of nent' were assigned to 3629 full-length cDNA clones, and signal transduction system in plants. the total number of the terms was 3637. Small monomeric GTPase, 0099 Relationship between GO terms contained in each category and clones to which the GO terms are assigned is RAS small monomeric GTPase, and shown in the “List of clones included in each Gene Ontology GTPase small monomeric GTPase. category' at the end of this specification. The function of each clone can be found from GO terms shown in the list. 0093 Proteins having the following domains are Representative GO terms will be specifically described expected to be involved in the signal transduction system. below. The largest number of clones was classified into the Inositol-3-phosphate synthase, and calcium ion binding. category “Enzyme'. GO function term “Enzyme includes many industrially useful as listed below. When Clones (e.g., 002-143-H11) having the above domains are protein functions are predicted by Motif search results, useful in controlling the signal transduction in plants. prediction as being enzymatic activity is usually highly accurate. In other words, proteins whose functions associ 0094 Gene ontology (GO) is useful to clarify the func ated with enzyme are predicted by Motif search results tion of a gene based on the results of motif and homology would have that function actually. Therefore, proteins to searches using protein informatics analysis against the Inter which GO term included in category “Enzyme’ attached to Pro database. Construction of GO enables to systematically the InterPro domain names (results of functional domain comprehend functions of known genes detected by a homol search) would have the activity corresponding to that GO ogy search. term. Since the GO terms included in the category “Enzyme 0.095 Motif search results predict functional domains of Inhibitor also designate functions to control enzyme proteins encoded by cINAs of the present invention. actions, proteins to which these GO terms are assigned Homology search results predict functions of known genes include many of those having useful functions like proteins having homology to full-length cDNAs of this invention. to which “Enzyme” Go terms are assigned. However, functions of known genes described as DEFINI TION in the homology search results below arevariously 1,3-beta-Glucan synthase: expressed. To comprehend functions of each gene from : expressions of functions, the expressions should be unified as much as possible. GO is a tool for comprehending gene 0.100 This enzyme is involved in disease tolerance and functions by replacing expressions actually assigned to each Vermin tolerance of plants. Therefore, genes encoding pro gene with a unified expression. In other words, the use of teins having this enzyme activity are useful in breeding GO makes it possible to comprehend homology search disease tolerant or Vermin tolerant plants. results by the unified expression terms. GO terms have been attached to the GenBank report, InterPro domain names, and 11-Aminocyclopropane-1-carboxylate synthase: Arabidopsis genes. Therefore, if GO terms are assigned to 0101 This enzyme is involved in the synthesis of ethyl the records of the database to which full-length cDNAs of ene, a phytohormone. Therefore, a gene encoding a protein the present invention showed homology, GO terms can also be assigned to full-length cDNA clones of this invention having this enzyme activity is useful in breeding environ based onhomology search results. Full-length cDNA clones mental stress-tolerant plants. of this invention were classified based on the GO terms. Alpha, alpha-: 0096. The unified expression assigned to genes using GO Alpha, alpha-trehalose-phosphate synthase (UDP-forming): is referred to as GO term. It is classified into a number of categories. In other words, each category may include plural 0102) These enzymes are involved in the synthesis of GO terms. Furthermore, categories are divided into subcat trehalose, which is an element regulating cryotolelance of egories based on different aspects, and one GO term may be plants. Therefore, a gene encoding a protein having either included in two or more categories. one of these enzyme activities is useful in breeding cryo 0097 GO terms assigned to full-length cDNA clones of tolerant plants. the present invention are shown at the end of this specifi Alpha-: cation. Furthermore, the names of categoriess and the num ber of constitutive GO terms in each category are listed in Beta-amylase: Tables 10-12. 0103) These enzymes are involved in degrading plant 0098. The total number of GO terms associated with storage starch. Therefore, genes encoding these proteins are “biological processes’ in the full-length cDNA clones of the useful in the artificial regulation of germination. US 2006/O123505 A1 Jun. 8, 2006

Aspartate kinase: Such as development and differentiation, and response to Glutamate synthase: environmental factors. Therefore, genes encoding proteins having these enzyme activities are extremely important 0104. These enzymes are involved in the nitrogen meta research Subjects. bolic system of rice. Therefore, genes encoding proteins having these enzyme activities can be used in breed : improvement. 0.112. This enzyme is important for sulfur nutrition in Caspase: plants. Therefore, genes encoding proteins having this enzy 0105. This is the enzyme involved in cell death of plant. matic activity are useful in breeding plants responsive to Therefore, genes encoding proteins having this enzyme nutrients and/or environment. activity are useful in the artificial induction of apoptosis in plants. 3',5'-Cyclic-nucleotide phoshodiesterase (TOC1 homolog): : Dyneinalpha chain, flagellarouterarm (Adagio3 =FKFlho Copper, Zinc Superoxide : molog): Ferredoxin : 0113. These enzymes are involved in regulation of cir Glutathione : cadian rhythm in plants. Circadian rhythm refers to a bio logical rhythm having a period of approximately 24 hours. Glutathione synthase: Therefore, genes encoding proteins having these enzyme Peroxidase: activities are useful in breeding environment-responsive : plants. Examples of full-length cDNA clones of the present 0106 These enzymes play important roles in the plant invention to which these GO terms are assigned, include response to oxidation stresses. Therefore, genes encoding JO1311 6P12 and JO13023A04. proteins having these enzyme activities are important in DNA (DDM1=SYD homolog): breeding oxidation stress tolerant plants. 0114. This enzyme controls DNA methylation. There : fore, genes encoding proteins having this enzyme activity 01.07 This enzyme is involved in the accumulation of are useful in breeding plants by controlling gene expression. storage Substances in rice. Therefore, genes encoding pro An example of clones to which this GO term is assigned teins having this enzyme activity are useful in the improve includes JO13133NO2. ment of phenotypes associated with plant storage Sub Stances. : O-: 0115 This enzyme is involved in the control of starch 0108. This enzyme is involved in the syntheses of sec storage in rice. Therefore, genes encoding proteins having ondary metabolites in plants. Therefore, genes encoding this enzyme activity are useful in the improvement of proteins having this enzymatic activity are useful in produc phenotypes associated with storage Substances in plants. ing pharmacological Substances. Examples of clones to which this GO term is assigned include 002-108-EO1 and JO13167021. Phosphoenolpyruvate carboxylase: 0109) This enzyme has an important role in the C4 Heat shock protein (HSP100 homolog): photosynthesis. Therefore, genes encoding proteins having 0.116) This enzyme is an environmental stress-responsive this enzyme activity may be used in improvement of pho enzyme. Therefore, genes encoding proteins having this tosynthesis efficiency. enzyme activity areuseful inbreedingheat-tolerantplants. : Examples of clones to which this GO term is assigned 0110. This enzyme is involved in the synthesis of in vivo include JO23007C17 and 001-027-DO1. second messengers (phospholipids). Therefore, genes Ubiquitin activating enzyme: encoding proteins having this enzymatic activity may be used in controlling the signal transduction system. Ubiquitin : Protein kinase: 0.117) These enzymes can be useful in elucidating the Protein : basic physiological regulatory mechanism of plants. This GO term is assigned to, for example, J033076H04 and Protein serine/threonine kinase: OO1-046-CO3. Protein serine/threonine phosphatase: (Wooden leg homolog): Protein : Protein tyrosine phosphatase: HD-zipped (Revoluta homolog): Protein tyrosine?serine/threonine phosphatase: 0118. These enzymes are involved in intracellular signal transduction occurring in plant biological process Such as Transmembrane receptor protein tyrosine kinase: development and differentiation, and response to environ 0111. These enzymes are involved in the intracellular mental factors. Therefore, genes encoding proteins having signal transduction occurring in plant biological process these enzyme activities are extremely important research US 2006/O123505 A1 Jun. 8, 2006

Subjects. These GO terms are assigned to, for example, organ-specific genes. Therefore, the transcriptional regula JO13112K17 and 001-023-HO8. tory region of thesegenes is useful, for example, in control Receptor-like protein kinase (bril homolog): ling the expression of proteins in seeds. Plant seeds contain many industrially important proteins such as enzymes influ Serine/threonine kinase (shaggy-like homolog): encing nutrition of seeds and potentially allergic proteins. 0119) These enzymes play important roles in signal trans duction mediated by brassinosteroid, which is a phytohor 0.126 Regulation of these gene functions within plant mone having important actions such as growth promotion, bodies may change characteristics and morphology of an increase in the plant yield, or enhancement of stress plants. Examples of Such changes are changes in Salt and tolerance. Therefore, genes encoding proteins having these Vermintolerances, growth, and flowering time of plants, but enzyme activities can be extremely important research Sub not limited thereto. jects. These GO terms are assigned to, for example, 0127 Plant-derived DNAs of the present invention are JO33069J12 and JO33061 L20. not limited to those derived from rice, including DNAS Serine/threonine kinase (ERECTA homolog): derived from other plants as long as they have the function Transporter (shoot gravitropism 2 homolog): equivalent to that of the rice-derived DNA set forth in any of SEQ ID NOs: 1 through 28,469. Source plants are Replication licensing factor (Prolifera homolog): preferably monocot plants, more preferably poaceous plants, 0120) These enzymes control morphogenesis of plants. and most preferably rice. Whether a DNA has the function Therefore, genes encoding proteins having these enzyme equivalent to that of rice-derived DNAs isolated by the activities can be extremely important research Subjects. present inventors or not can be judged by whether, compared These GO terms are assigned to, for example, J033070P05, to rice-derived DNAs, similar changes occur in plant bodies JO13O87B12 and JO33O41P2O. when said DNA is expressed in plants or when the function 0121 GO terms associated with the category “trans of said DNA is inhibited in plant bodies. DNAS which porter include proteins which are all involved in substance induce similar changes in plant bodies have the “function transport within plant bodies or between the inside and equivalent” to that of rice-derived DNAs isolated by the outside of plants. Substance transport in plants is important present inventors. biologically and industrially. The GO terms associated with 0128. The present invention includes DNAS encoding “transporter and their industrial usefulness is specifically proteins which are structurally analogous to a protein having described below. the amino acid sequence set forth in any one of SEQ ID Ammonium transporter: NOs: 28470 through 56791 and have functions equivalent to 0122) This protein functions as an ammonium ion trans that of said protein. Such DNAs include, mutants, deriva porter within plant bodies. Therefore, genes encoding pro tives, alleles, variants and homologs of DNAS encoding teins having this activity are useful in breeding plants proteins comprising amino acid sequences set forth in any of focusing on nitrogen metabolism. Nitrogen absorption capa SEQ ID NOs: 28470 through 56791, in which one or more bility of plants can be enhanced by increasing expression of amino acid residues are substituted, deleted, added and/or this protein. inserted. Cobalt ion transporter: 0129. Examples of the method known to those skilled in Heavy metal ion transporter: the art for preparing DNAS encoding proteins whose amino acid sequences have been modified include the site-directed 0123 These proteins function as a heavy metal trans mutagenesis method (Kramer, W. & Fritz, H.-J., Methods porter within plant bodies. Therefore, genes encoding pro Enzymol, 1987, 154: 350). Furthermore, mutation in the teins having these activities are useful in breeding plants amino acid sequence of a protein due to the mutation in the capable of growing on heavy metal-polluted soil. For coding nucleotide sequence may occur spontaneously. Thus, example, plants become tolerant to heavy metal-polluted even DNAS encoding proteins having amino acid sequences soil by Suppressing activities of these proteins. Plants and in which one or more amino acids have been Substituted, microorganisms in which these proteins are expressed can deleted, added and/or inserted are included in the DNAs of be utilized to recover the heavy metal-polluted soil. the present invention as long as they encode proteins having 0124 Full-length cDNAs of the present invention were functions equivalent to those of the natural proteins (SEQID isolated fromavarietyof library sources. Comparisonof NOs: 28470 through 56791). library sources from which cDNAs of the present invention are derived enables to detect genes specifically found in a 0.130. To maintain the original function of a protein, an specific library Source. Flower organ-specific genes thus amino acid of the protein is preferably substituted with an amino acid thathas the similar property as that of the amino found based on such idea are shown in the "List of flower acid of the protein. For example, amino acids belonging to organ-derived clones' at the end of this specification. the same group shown below have similar property. Even 0125 Flower organs have functions that directly influ when an amino acid is Substituted with another amino acid ence the rice yield such as flowering and fruition. Needless in the same group, the essential function of the protein does to say, genes specifically expressed in flower organs are not change in most cases. Such amino acid Substitution is industrially extremely useful in genetically breeding new referred to as conservative substitution, which is well rice plants. Furthermore, regions that control the expression known modification of amino acid sequences without alter of these genes may possibly control the expression of flower ing the original function of a protein. US 2006/O123505 A1 Jun. 8, 2006

Non-polar amino acids: Ala, Val, Leu, Ile, Pro, Met, Phe and Natl. Acad. Sci. USA, 1990, 87: 2264-2268: Proc. Natl. Trp; Acad. Sci. USA, 1993, 90:5873). Programs called BLASTN and BLASTX based on the BLAST algorithm have been Non-charged amino acids: Gly, Ser, Thr, Cys, Tyr, ASn and developed (Altschul, S. F., et al., J. Mol. Biol., 1990, 215: Gln; 403). In the case of analyzing nucleotide sequences using Acidic amino acids: Asp and Glu; and BLASTN, parameters are set at, for example, score=100, and word length=12. Furthermore, in the case of analyzing Basic amino acids: Lys, Arg and His. amino acid sequences using BLASTX, parameters are set at, 0131 The number of amino acids that are mutated is not for example, score=100 and word length=12. In the case of particularly restricted, as long as the a mutant protein is using BLAST and Gapped BLAST programs, default functionally equivalent to the original protein. Normally, it parameters are used. Specific techniques of these analytical is within 50 amino acids, preferably within 30 amino acids, methods are well-known (see the NCBI website. http:// more preferably within 10 amino acids, and even more www.ncbi.nlm.nih.gov/). preferably within 3 amino acids. The site of mutation may be any site, as long as a mutant protein is functionally equiva 0.137 cloNAs of the present invention can be prepared by lent to the original protein. synthesizing cDNAs based on mRNAs extracted from plant Such as rice, inserting them into a vector Such as W%ZAP to 0132) DNAs having changes in their nucleotide construct cDNA libraries, and developing them to perform sequences due to degeneracy are also included in the present the colony hybridization or plaque hybridization using invention. Degeneracy refers to a mutation in nucleotide probes comprising the entire nucleotide sequences set forth sequences, which does not cause any mutation in amino acid in SEQ IDNOs: 1 through 28469 or portions thereof as residues in proteins. probes or to carry out PCR using primers designed based on 0.133 Examples of other methods for preparing DNAS the nucleotide sequences set forth in SEQID NOs: 1 through encoding proteins functionally equivalent to those compris 28469. ing the amino acid sequences set forth in SEQ ID NOs: 0.138 Comparison of nucleotide sequences of cDNAs of 28470 through 56791 are those using the hybridization the present invention with the genomic nucleotide sequence, technique (Southern, E. M., J. Mol. Biol., 1975,98: 503) and can clarify the locations of the transcription initiation point polymerase chain reaction (PCR) technique (Saiki, R. K., et or boundaries between exons and introns in the genome. The al., Science, 1985, 230:1350; Saiki. R. K., et al., Science, genome nucleotide sequence is available, for example, as the 1988, 239: 487). Using the nucleotide sequences of cDNAs sequence of the BAC/PAC clone presented by the Interna of the present invention (SEQID NOs: 1 through 28469) or tional Rice Genome Sequencing Project (IRGSP) . The their portions as a probe, and oligonucleotides specifically nucleotide sequences of full-length cDNA clones of the hybridizing to these cDNAs as a primer, those skilled in the present invention are aligned to map them to each exon of art would readily isolate DNAs highly homologous to these the genomic sequence. This mapping gives information as to cDNAs from rice and other plants. Thus, DNAs of the at what positions of the genome transcription starts and ends, present invention include DNAS encoding proteins having and from what sites introns are excised. functions equivalent to those of the proteins comprising amino acid sequences set forth in SEQ ID NOs: 28470 0.139. Full-length cDNA clones of the present invention through 56791 which can be isolated by the hybridization can be used to perform comprehensive expression analysis and PCR techniques. as well as proteome analysis at the transcription level. For example, full-length cDNA clones of the present invention 0134) For the isolation of such DNAs, hybridization is as they are or their regions that would be clone-specific. Such carried out preferably under the stringent condition. The as 5'-UTR or 3'-UTR region of the sequences of cDNA stringent hybridization condition in the present invention clones, may be immobilized on glass plates to prepare refers to the condition of 6 Murea and 0.4% SDS in 0.5x microarrays for expression analysis of rice. DNA fragments SSC, or the stringent hybridization condition equivalent to be immobilized can be synthesized by known techniques thereto. Under the more stringent condition, for example, such as PCR. Methods of immobilizing cDNA clones and that of 6 Murea and 0.4% SDS in 0.1x SSC, more highly their fragments on glass plates are also well-known in the homologous DNAs can be isolated. High homology refers to art. cRNAs prepared from various cells of rice are allowed the sequence homology of at least 50% or more, preferably to hybridize to the microarrays thus obtained, thereby 70% or more, more preferably 90% or more, and most obtaining expression patterns of genes characteristic for preferably 95% or more (e.g. 96, 97, 98 and 99%) in the entire amino acid sequence. each cell. 0140. A DNA array is composed of a onto 0135 Preferably, isolated nucleic acid of the present which a large quantity of probes are attached in high density invention includes a nucleotide sequence that is at least 50% So as to analyze changes in the expression levels of a large identical to any one of the nucleotide sequence shown in amount of genes at high speed. The use of DNA arrays SEQ ID NO: 1 to 28469. More preferably, the isolated enables to comprehensively find out genes whose expression nucleic acid molecule is at least 60%, 70%, 80%, 85%, 90%, levels are altered in plant cells exposed to the specific 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, condition. This analytical technique is referred to as gene identical to anyone of the nucleotide sequence shown in expression profile analysis. One of factors that greatly SEQ ID NO: 1 to 28469. influence usefulness of DNA arrays as an analytical tool, is 0136 Homologies of amino acid and nucleotide the number of probes attached to DNA arrays. It may be sequences (sequence identity) can be determined using easily understood that, for example, a DNA array compre algorithm BLAST reported by Carlin and Altschul (Proc. hensively containing all of the genes of a plant is an ideal US 2006/O123505 A1 Jun. 8, 2006 tool for the gene expression profile analysis. In fact, there is Inhibition of transcription initiation due to triple helix for no higher plant in which all of the existing genes of which mation; have been analyzed. Therefore, in reality, usefulness of DNA Transcriptional inhibition due to hybridization to the site of arrays depends on how as many genes as possible they opened loop structure locally formed by RNA polymerase: contain. The present inventors determined the structures of a great number of rice-derived genes using a rice DNA array. Transcriptional inhibition due to hybridization to RNA It can thus be said that the present invention remarkably whose synthesis is in progress; improved the usefulness of rice DNA arrays. Inhibition of splicing due to hybridization to the boundaries 0141 DNA arrays used in the present invention can of introns and exons; contain probes comprising nucleotide sequences specifically Inhibition of splicing due to hybridizaation to the site of found in each of 28,469 clones. The probes constituting spliceosome formation; DNA arrays of the present invention may include probes of any clones selected from28,469 clones. The number of Inhibition of transition of mRNA from the nucleus to cytosol selected clones is, for example, 10% or more, usually 30%, due to hybridization to the mRNA: preferably 50% or more, more preferably 70% or more, and Inhibition of splicing due to hybridization to the capping site most preferably 80% or more of 28,649 clones. The more the and poly(A) addition site; number of selected clones is, the more genes are compre Inhibition of translational initiation due to hybridization to hensively analyzed. the translation initiation factor ; 0142. The proteome analysis can be carried out, for Inhibition of translation due to hybridization to the ribosome example, as follows. First, the coding regions of cDNAS binding site near the initiation codon; excised from the full-length cDNA clones are each linked to an appropriate vector to express their proteins using Escheri Inhibition of peptide chain elongation due to hybridization chia coli and yeast systems. Proteins thus obtained are to the mRNA coding region and polysome binding site; and purified, and can be used in structural analysis, interaction Inhibition of gene expression due to hybridization to the analysis, complementation test for mutation, etc. interaction site between nucleic acid and protein. 0143 DNAs of the present invention can be used to 0146 Thus, antisense nucleic acids inhibit various pro produce mutant plants. Mutant plants expressing DNAS of cesses such as transcription, splicing, and translation to the present invention can be produced by inserting the Suppress the expression of a target gene (Hirashima and above-described DNA into an appropriate vector, introduc Inoue, “New Experimental Biochemistry Manual 2, Nucleic ing the vector with the insertion into plant cells by methods Acid IV. Gene Duplication and Expression.” The Japanese described below, and regenerating transformed plant cells Biochemical Society, ed., Tokyo Kagaku Dojin, 1993, pp. thus obtained. On the other hand, plants in which the 319-347). expression of DNAs of the present invention is suppressed 0147 Antisense sequences used in the present invention can be produced, for example, as described below, by may inhibit the expression of a target gene by any of the inserting a DNA that suppresses the expression of a DNA of above-described actions. As an embodiment, an antisense the present invention into an appropriate vector, transducing sequence designed so as to be complementary to the non the vector with the insertion into plant cells, and regenerat coding region near the 5'-end of mRNA of a target gene ing the transformed plant cells thus obtained. Herein, the would inhibit its translation. Sequences complementary to “suppression of DNA expression' includes the suppression the coding region or non-coding region of the 3'-side can of transcription of DNAS as well as their translation into also be used. Antisense DNAs used in the present invention proteins. Furthermore, it also includes not only complete include DNAS comprising antisense sequences of not only block but also reduction of DNA expression, and further the coding region but also non-coding region of a target more, inhibition of the translated protein to express its gene. inherent function within plant cells. 0.148 Antisense DNA to be used is linked downstream of 0144. The antisense technique is the most common an appropriate promoter and the transcription termination method for Suppressing the expression of a specific endog signal is preferably linked to the 3'-end of antisense DNA. enous gene in plants employed by those skilled in the art. Antisense DNA thus prepared can be transformed to a Antisense effects in plant cells were proved for the first time desired plant by known methods. Although sequences of by Ecker et al. who demonstrated antisense effects of the antisense DNAS are preferably complementary to the endog antisense RNA transduced into plant cells by electroporation enous gene of a plant to be transformed or a portion thereof, (Ecker, J. R. & Davis, R. W. Proc. Natl. Acad. Sci. USA, they may not be completely complementary thereto as long 1986, 83: 5372). Thereafter, a report presenteda decrease in as they are capable of efficiently inhibit target gene expres the target gene expression in tobacco and petunia plants due sion. The transcribed RNAs are preferably 90% or more and to the antisense RNA expression (van der Krol, A. R. et al., most preferably 95% or more complementary to the tran Nature, 1988, 333: 866), and, to-date, the antisense tech script of a target gene. For the effective inhibition of target nique has been established as a means of inhibiting gene gene expression, an antisense sequence has at least 15 expression in plants. nucleotides or more, preferably 100 nucleotides or more, and more preferably 500 nucleotides or more. Antisense 0145 For example, antisense nucleic acids inhibit the DNAs commonly used are less than 5 kb, preferably less expression of a target gene, in the following manners. than 2.5 kb. US 2006/O123505 A1 Jun. 8, 2006

0149. It is also possible to inhibit the expression of sequence identical or analogous to a target gene sequence, endogenous genes using a DNA encoding ribozyme. into cells Suppresses expression of both the transgene and Ribozyme refers to an RNA molecule with catalytic activity. the target endogenous gene. Although the mechanism of Various activities of ribozymes are known, and, among RNAi has not been elucidated in detail, it is thought that the them, researches focusing on the ribozyme activity to cleave double-stranded RNA introduced is decomposed into small RNA have enabled to design ribozymes that site-specifically fragments, which, in turn, become an indicator of the target cleave RNA. Large-sized ribozymes Such as group I intron gene by Some meansand induce degradation of the target type RNA and M1 RNA included in RNase P, consist of 400 gene. It is also known that RNAi is effective in plants or more nucleotides, while some ribozymes have about 40 (Chuang, C. F. & Meyerowitz, E. M., Proc. Natl. Acad. Sci. nucleotides of the active domain, including hammerhead USA, 2000, 97: 4985). For example, for the suppression of and hairpin ribozymes (Makoto Koizumi and Eiko Ohot expression of DNA encoding the target protein in plants, suka, Protein Nucleic Acid Enzyme, 1990, 35: 2191). DNA encoding said protein or a double-stranded RNA having the sequence analogous thereto may be introduced 0150. For example, the self-splicing domain of hammer into a plant to select, out of the plant bodies thus obtained, head ribozymes cleaves the 3'-side of C15 in the plants in which the expression level of said protein is G13U14C15 sequence. Base pairing of U14 with A9 is reduced compared to the wild type plant. Genes used in thought to be important for that cleaving activity, and it has RNAi are not necessarily completely identical to a target been demonstrated that said domain can be cleaved even gene, but have at least 70% or more, preferably 80% or when C15 is replaced by A15 or U15 (Koizumi, M., et al., more, more preferably 90% or more, and most preferably FEBS Lett., 1988, 228: 228). Ribozymes whose substrate 95% or more of sequence identity to the target gene. binding site is complementary to an RNA sequence near the Sequence identity can be determined by the above-described target site, recognize a sequence Such as UC. UU or UA in technique. the target RNA and cleave it like restriction enzymes (Koi Zumi, M., etal, FEBS Lett., 1988, 239: 285; Koizumi, M. 0154 Expression of an endogenous gene can also be and Ohotsuka, E., Protein Nucleic Acid Enzyme, 1990, 35: inhibited by co-suppression due to transformation with DNA 2191: Koizumi, M., et al., Nucl. Acids Res., 1989, 17: 7059). having a sequence identical or analogous to a target gene. For example, DNAs of the present invention (SEQID NOs: “Co-suppression” refers to the phenomenon in which intro 1 through 28469) contains several potential ribozyme target duction of a gene having a sequence identical or analogous sites. to a target endogenous gene into plants by transformation, inhibits the expression of both of the transgene and the target 0151 Hairpin ribozymes, which are found, for example, endogenous gene. Although the mechanism of co-suppres in the negative strand of satellite RNA of tobacco ring spot sion has not been clarified in detail, it is thought to at least virus (Buzayan, J. M., Nature, 1986, 323: 349), are also partially overlap that of RNAi. This phenomenon has been useful for the purpose of the present invention. It was shown observed also in plants (Smyth, D. R. Curr. Biol., 1997, 7: that hairpin ribozymes can also be designed to cleave RNA R793; Martienssen. R. Curr. Biol., 1996, 6: 810). For target site-specifically (Kikuchi, Y. and Sasaki, N., Nucl. example, a plant body having co-suppressed DNA encoding Acids Res., 1991, 19: 6751; Kikuchi, Y., Chemistry and a protein can be obtained by preparing a vector DNA that Biology, 1992, 30: 112). expresses DNA encoding said protein or DNA having the sequence analogous thereto, transforming a target plant with 0152. A ribozyme designed to cleave a target site is the vector, and selecting plants in which the expression level linked to a promoter such as 35S promoter of cauliflower of said protein is reduced compared to the wild type plant mosaic virus and the transcription termination sequence so body. Although a gene used in co-suppression needs not be as to be transcribed in plant cells. However, when an excessive sequence is attached to the 5’- and 3'-ends of the completely identical to a target gene, it has at least 70% or transcribed RNA, the ribozyme activity might be sometimes more, preferably 80% or more, more preferably 90% or lost. In Such a case, it is possible to arrange a different more, and most preferably 95% or more sequence identity to cis-acting trimming ribozyme on the 5'- and 3'-ends of the the target gene. Sequence identity can be determined using ribozyme portion so as to accurately excise only the the aforementioned technique. ribozyme portion from the transcribed RNA containing said 0.155 The present invention provides a method of pro ribozyme (Taira, K. et al., Protein Eng. , 1990, 3: 733; ducing a transformed plant body comprising the steps of Dzianott, A. M. and Bujarski, J. J., Proc. Natl. Acad. Sci. introducing a DNA of this invention into plant cells and USA, 1989, 86: 4823; Grosshans, C. A. and Cech, T. R., regenerating plant bodies from said plant cells. Nucl. Acids Res., 1991, 19: 3875; Taira, K. et al., Nucl. 0.156. In the present invention, there is no particular Acids Res., 1991, 19: 5125). Furthermore, it is also possible limitation in the types of plants from which plant cells are to arrange these constitutive units in tandem so as to cleave derived. There is also no particular limitation in the types of at plural sites within the target gene, thereby enhancing its vectors used in the transformation of plant cells, as long as effect (Yuyama, N. et al., Biochem. Biophys. Res. Commun., they are capable of expressing the transgene in said cells. 1992, 186: 1271). Expression of a target gene of the present For example, a vector to be used has a promoter (such as 35S invention can be suppressed by specifically cleaving the promoter of cauliflower mosaic virus) that constantly transcript of the target gene. express a gene in plant cells and is inducible by an extra 0153 Endogenous gene expression can be suppressed by neous stress. Herein, “plant cells' include plant cells in RNA interference (RNAi) using a double-stranded RNA various forms such as Suspended cultured cells, protoplasts, having a sequence identical or analogous to a target gene leaf sections, and calli. sequence. RNAi refers to the phenomenon wherein intro 0157 The vectors can be introduced into plant cells using duction of a double-stranded RNA, which comprises a various methods known to those skilled in the art such as the US 2006/O123505 A1 Jun. 8, 2006 polyethylene glycol method, electroporation method, Maltose binding protein (vector PMAL series, New England method mediated by Agrobacterium, and particle gun BioLabs, USA), method. In the method mediated by Agrobacteriun (e.g., Glutathion-S- (GST) (vector pGEX series, Amer EHA101), it is possible to use, for example, the ultrahigh sham Pharmacia Biotech), and speed monocot transformation method (Japanese Patent No. 3.141084). Furthermore, the particle gun method can be Histidine tag (pET series, Novagen) carried out by using, for example, the from BioRad. 0162 There is no particular limitation in the type of host Regeneration of plant bodies from the transformed plant cells to be used for producing recombinant proteins as long cells may be conducted using methods known to those as they are Suitable for expressing the recombinant proteins. skilled in the art depending on the type of plant cells (Toki, Besides the aforementioned Escherichia coli, for example, host cells Such as yeasts, various animal and plant cells, and S., et al., Plant Physiol., 1995, 100: 1503). insect cells may be used. The vector can be introduced into 0158. Several methods of producing transformed rice cells by various methods known to those skilled in the art. plant bodies have already been established and extensively For example, introduction into Escherichia coli can be performed by the method using calcium ion (Mandel, M. & used in the technical field to which the present invention Higa, A., Journal of Molecular Biology, 1970, 53, 158-162; pertains. These methods include the method of introducing Hanahan, D., Journal of Molecular Biology, 1983, 166, a gene into protoplasts using polyethylene glycol to regen 557-580). Recombinant proteins expressed in host cells can erate plant bodies (suitable for Oryza sativa L. ssp. Indica) be recovered and purified from the host cells or their culture (Datta, S. K., “In Gene Transfer To Plants. Potrykus, I. and supernatant by the method known to those skilled in the art. Spangenberg, Eds. , 1995, pp. 66–74); Toki, S., et al., “the When recombinant proteins are expressed as a fusion protein method of introducing a gene into protoplasts with the with the above-described fusion partner such as maltose electric pulse to regenerate plant bodies (suitable for Oryza binding protein, the proteins of interest can be easily purified sativa L. ssp. japonica), Plant Physiol., 1992, 100: 1503); by affinity chromatography. the method of directly introducing a gene into cells by the 0.163 For more easily performing the affinity purifica particle gun technique to regenerate plant bodies (Christou, tion, an amino acid sequence recognized by protease may be P. et al., Biotechnology, 1991, 9: 957), and the method of inserted into the fusion protein. For example, amino acid introducing a gene into cells mediated by Agrobacterium to sequences of the maltose-binding protein and a protein to be regenerate plant bodies (Hiei, Y., et al., Plant J., 1994, 6: produced are connected mediated by a protease recognition sequence. After capturing the fusion protein using the action 271). These methods can be preferably used in the present of maltose-binding protein, the target protein can be recov invention. ered by the action of the protease. In this case, any protease 0159) Once the transformed plant body in which a DNA not acting on the target protein may be used. of the present invention has been introduced is obtained, it 0164. The recombinant protein thus obtained can be used is possible to obtain progenies from said plant by sexual or to prepare antibodies binding thereto. For example, poly asexual reproduction. It is also possible to obtain propaga clonal antibodies can be obtained by immunizing animals tion materials from said plant body and its progeny or clone, such as rabbits with a purified protein of the present inven which allows mass-production of said plant body. Propaga tion or its partial peptide. After confirming the titer elevation tion materials which can be thus obtained are included in the of immunized animals, the blood is withdrawn, and the serum is recovered to obtain anti serum against the target present invention. Propagation materials of the present protein. Alternatively, polyclonal antibodies can be obtained invention include all of the materials capable of regenerating by purifying IgG from the antiserum. Furthermore, purified the plant bodies having the genetic characteristics intro antibodies can be obtained by carrying out immuno-affinity duced into the transformed plant body of this invention, for purification using the target protein as an antigen. example, seeds, fruits, spikes, tubers, tuberous roots, stubs, 0.165. Furthermore, monoclonal antibodies can be pre calli, and protoplasts. pared by fusing myeloma cells and antibody-producing cells 0160 The present invention also relates to proteins hav of the animal immunized with the above-described protein ing the amino acid sequences encoded by the above-de or peptide, isolating monoclonal cells (hybridoma) produc scribed nucleic acids. Proteins of this invention can be ing the target antibodies, and obtaining the antibodies from prepared by the gene recombination technique using the said cells. The antibodies thus obtained can be used in the aforementioned nucleic acids. Recombinant proteins of this purification and detection of proteins of this invention. The invention can be prepared by inserting a DNA encoding a present invention includes antibodies capable of binding to protein of this invention into an appropriate vector, intro proteins of this invention. ducing said vector into Suitable cells, culturing transformed 0166 The present invention also provides a database cells thus obtained, and purifying the expressed protein. including information on the nucleotide sequences of rice full-length cDNA clones ofthis invention and/or information 0161. A recombinant protein can be expressed as a fusion on their amino acid sequences. A data base refers to a protein with another protein to facilitate its purification and collection of information on the nucleotide sequences and/or detection. For example, a method for expressing in a host amino acid sequences contained as retrievable and machine Escherichia coli a protein as a fusion protein with a fusion readable data. Databases of the present invention contain at partner protein as described below. Preferable vectors for least one of the nucleotide sequences of rice full-length each fusion protein are parenthetically shown. cDNAs of this invention. Databases of the present invention US 2006/O123505 A1 Jun. 8, 2006

may consist of rice full-length cDNAs of this invention, or (2) judging the region containing the transcriptional regu include information on nucleotide sequences of known latory region found upstream of the 5' most end of the full-length cDNAs, ESTs, etc. In databases of the present mapped region as the transcriptional regulatory region of the invention, not only information on the nucleotide sequences gene mapped in the step (1). but also information related thereto Such as the gene func 0172 Gene expression is controlled by transcription fac tion revealed by this invention and names of clones retaining tors. In genomes, the region containing the nucleotide those full-length cDNAs may be recorded together or linked sequence recognized by a transcription factor is referred to thereto. as the transcription regulatory region, which usually exists in the region on the 5'-side of the transcription initiation 0167 Databases of the present invention are useful for point. For example, the region comprising several hundreds acquiring a full-length gene based on information on gene to several kbs of DNAs arranged in tandem often has the fragments. Databases based on this invention all comprise transcriptional regulatory action. In genomes, in contrast to information on full-length cDNA nucleotide sequences. a nucleotide sequence encoding mRNA which is divided by Therefore, by comparing nucleotide sequences of gene frag many introns, the transcriptional regulatory region exists ments obtained by the gene expression analyses using a with being linked in tandem. Therefore, once the transcrip DNA array and subtraction method with the information of tion initiation point can be mapped to the genome, the this database, the full-length nucleotide sequences of genes transcriptional regulatory region can be relatively easily can be revealed. obtained. The method of obtaining the transcriptional regu latory region based on information on nucleotide sequences 0168 Furthermore, since databases of the present inven of the full-length cDNAs, will be described below in more tion contains information associated with rice genes, it is detail. useful for isolating rice homologs based on information on nucleotide sequences of genes isolated from other species. 0173 First, of the nucleotide sequences of full-length cDNAs, sequences containing the 5'-end, in particular, are 0169. At present, gene expression analysis such as DNA mapped to the genomic sequence. Genomic nucleotide array analysis enables to obtain information on diverse gene sequences of any species may be used, and the genome of fragments. These gene fragments, ingeneral, are used as a Oryza sativa L. ssp. japonica is preferably used. Mapping tool for obtaining their full-length sequence. When a gene reveals the position coinciding with the 5'-end as the tran fragment is derived from a known gene, its full-length can scription initiation point. Nucleotide sequences near the be easily elucidated by comparing it with a known database. transcription initiation point will be the analytical Subject as the candidate sequence. The candidate sequence is usually However, when no identical nucleotide sequence is found in selected from the region ranging from (-) several kb to (+) a known database, cloning of gene fragments must be several hundreds b counting the transcription initiation point carried out to obtain the full-length cDNA. It is often to be “1.” Herein, the minus (-) numerals refer to the values difficult to obtain full-length nucleotide sequences based on counted in the direction toward the 5'-end, while the plus (+) information on DNA fragments. Without obtaining a full numerals the values counted in the direction toward the length gene, it is impossible to deduce the amino acid 3'-end. More specifically, the candidate sequence may be in sequence of a protein encoded by that gene. Databases of the the region of from -2 kb to +500 b, for example, from -1 kb present invention would contribute to identification of full to +200 b, and preferably from -1 kb to +1 b. The tran length cDNAS corresponding to gene fragments that cannot Scriptional regulatory region can be predicted by searching be elucidated by the already known gene databases. for the transcription factor-binding consensus sequence. 0170 Databases of the present invention can also be used Nucleotide sequences found in genomes capable of binding to isolate genes associated with diverse characteristics. For to the transcription factor are referred to as cis sequences. example, the relationship between polymorphism markers For example, TATA box is a representative cis sequence. and phenotypes of organisms has been clarified. cDNA Many nucleotide sequences have been identified in the nucleotide sequences of this invention are mapped to transcription factor-binding consensus sequence, and infor genomic DNA and compareded with the information on mation on a specific sequence can be obtained, for example, polymorphism markers whose association with phenotypes from the transcription factor-binding consensus sequence is known. A gene having a polymorphism marker within the database TRANSFAC (http://transfac.gbf.de/homepage/da transcriptional regulatory region and its exons would asso tabases/transfac/transfac.html). ciate with the phenotype correlated to the polymorphism 0.174 Analytical results determine the region containing marker. Thus, the use of information on nucleotide the cis sequence in candidate sequences as the transcrip sequences of cDNAs of this invention enables positional tional regulatory region. Further analysis can be done based cloning in silico. The polymorphism markers usable in Such on their interaction with the transcription factor recognizing analysis include, for example, SNPs. SNPs may be exam cis sequence constituting said transcriptional regulatory ined for the relationship with a single polymorphism site, or region. For example, the footprinting method and gel shift focused on a plurality of SNPs. assay have been used as the analytical methods for clarifying 0171 Furthermore, the present invention relates to a the relationship between the transcriptional regulatory method of obtaining the transcriptional regulatory region of region and transcription factor. These analytical methods rice, wherein said method comprising the following steps: identify a region necessary for the transcription in the transcriptional regulatory region thus selected. Furthermore, (1) mapping the nucleotide sequence set forth in any of SEQ a reporter assay can be performed to assess the transcrip ID NOs: 1 through 28469 to the genomic nucleotide tional action of the transcriptional regulatory region sequence of rice, and obtained by the present invention. US 2006/O123505 A1 Jun. 8, 2006

0175) Full-length cDNAs of the present invention would contain the nucleotide sequence of the 5'-end of mRNA, and TABLE 1. are useful as a tool for obtaining the transcriptional regula Library Starting materials for Full-length tory region. Depending on purposes. A cDNA derived from No cDNA library construction a special library source can also be used to obtain the Seedlings two weeks transcriptional regulatory region. For example, flower bud after germination specific cDNAs can be used to obtain the transcriptional 1 Normally grown Green Shoot regulatory region specific to flower bud. Furthermore, 2 Normally grown Root cDNAs obtained from tissues exposed to various stresses 3 Dark grown Etiolated Shoot can be used to obtain the transcriptional regulatory region (Radiation and Oxidative) 4 UVB 550 J/m2 associated with stress. As described above, the genome Calli ten days after transfer to new medium drafts of indicassp. (Yu, J. et al., “A draft sequence of the rice 5 Normally grown genome (Oryza sativa L. ssp. indica) Science, 2002, 296, (Temperature) 79-92) and japonica ssp. (Goff, S. A. et al., “A draft 6 +cold cold treated (at 6° C.) sequence of the rice genome (Oryza sativa L. ssp. 7 +heat heat treated (at 45° C.) japonica) Science, 2002, 296, 92-100), have been already (Hormone) 8 +auxin 2 ppm of NAA published. Therefore, information on sequences of cDNAs 9 +Cytokinine 2 ppm of BAP of this invention can be used to obtain many transcriptional 10 +ABA 2 ppm of ABA regulatory regions. (Chemical) 11 +Cd 6 ppm of CdCl2 0176) The present invention provides and characterizes Germinating seeds three days after imbibition full-length cDNAs comprehensively. These full-length 12 Normally grown cDNA scan be used for annotation of correct gene coding (Hormone) region, determination of exons and introns, comprehensive 13 +auxin 2 ppm of NAA 14 +Cytokinine 2 ppm of BAP expression analysis on the transcriptional level, and pro Panicles teome analysis. Additionally, they are useful in producing 15 less than 1 cm stage plants having the different characteristic from the wild type 16 less than 5 cm stage 17 more than 5 cm stage due to the inhibition of their expression and function within 18 one day after flowering the plant bodies. 19 two weeks after flowering 2O three weeks after flowering 0177 Since cDNAs of the present invention have a strong probability of being full-length, information on these nucleotide sequences would efficiently predict the transcrip 0180. To date, the Foundation of Advancement of Inter tional regulatory region. To obtain the transcriptional regu national Science (FAIS) has constructed 20 types of rice latory region based on the gene nucleotide sequences, the libraries enriched with full-length cDNAs by using the transcriptional initiation point should be correctly deter oligo-capping method combined with normalization and mined, which can be successfully achieved using informa size fractionation. To avoid PCR bias, the number of ampli tion on nucleotide sequences of full-length cDNAs whose fication cycles during the construction of these libraries was 5'-end nucleotide sequences are completely provided. Any minimized. Furthermore, about a half of the resulting cDNA patents, patent applications, and publications cited herein are were cloned into a vector, which is part of the Gateway incorporated by reference in their entirety. system, to facilitate high-throughput analysis of rice pro teomics. 0178 The present invention is explained in more detail 0181 RIKEN has constructed 4 types of rice full-length with reference to examples, but is not be construed as being cDNA libraries (shoots of seedlings, roots, calli and germi limited thereto. nating seeds) by the RIKEN’s original technologies which combined used the biotinylated CAP trapper method, thermo EXAMPLE 1. activation of reverse transcriptase by trehalose, normaliza tion, the oligolinker method, the poly-stretch-less method, Acquisition of cDNA Clones and the vector designed for the preferential cloning of long (1) Starting Materials and Method For Constituting a Full inserts. Length cDNA Library 0182 Briefly, mRNAs were extracted by using the modi fied CTAB method, underwent cloNA synthesis, CAP trap 0179 The rice genome project of Japan has obtained EST ping and normalization, and were cloned into the lambda clones from various tissues and organs at each developmen vector. Lambda cDNA libraries were bulk-excised to plas tal stage ofrice. However, most of them were derived from mid libraries. non-full-length cDNA libraries. ESTs are effective in cata loging the expressed genes, but not suitable for functional (2) Grouping of Terminal Sequences and Determination of genome analysis. Starting materials for the library construc Full-Length Sequences tion are extremely important to collect as many types of 0183 Clones were randomly picked up from each library, clones as possible. Table 1 is the list of starting materials for and both the 5’- and 3'-ends of each clone were sequenced constructing a library Suitable for functional genome analy once. 175,642 of these clones were sequenced from the S1S. 3'-end, and 91,425 clones from the 5'-end. These clones US 2006/O123505 A1 Jun. 8, 2006

(175,642) were clustered into 28,469 (nonredundant) groups by means of a grouping program using the nucleotide TABLE 2-continued sequences determined from the 3'-end. All clones from each Number of group were completely sequenced using the two methods Number of non-redundant (primer walking and shotgun methods). Assessment of Genome Size mapped Transcription sequence fidelity with the Fred value indicated that all Sequence Source (Mbp) clones % Unit representative 28,469 clones were sequenced with 99.98% BACPAC IRGSP 368 221 62 77.8 15523 fidelity. The length of the insert cDNA varied from 55 to clones from 6528 bp, and its average length was 1655.0 bp. japonica EXAMPLE 2 Functional Classification of Full-Length cDNA Clones 0190. The 28,469 clones originated from 19,000 TU of the rice genome draft sequences using 94% of genome (1) BLAST Search coverage data. Since, in this mapping method, paralog, 0184 Sequence homology search by BLAST was con pseudogene, and Such are counted into a single orthologue ducted as follows. Sequence data from 10 divisions of gene, the actual TU number is expected to be larger. In fact, NCBI's GenBank (as of June 15, 2002; release version 130) mapping results of the full-length cDNA clones to the were downloaded, and searches were carried out using BAC/PAC clone cluster derived from Chromosome 1 BLASTN and BLASTX programs using 28,469 sequences revealed that about 3,800 clones were mapped to 7,700 sites, as query (2002/7/20). Search subjects are the following 10 indicating that approximately twice the cDNAs overlap in divisions: PRI, ROD, MAM, VRT, INV. PLN, BCT, VRL, the rice gene. PHG, and PAT. Sizes of searched database were as follows: 0191 Herein, “one transcription unit’ means a group 0185 BLASTN: 1,212,780 sequences : 1998,000.464 comprising a plurality of transcripts sharing exons. When Letters two cDNAS are mapped to the same genomic region, an intron of one cDNA may have an exon of the other. In this 0186 BLASTX: 623,580 sequences : 327,145,996 Let case, the exon is not shared between cDNAs so that these ters cDNAs constitute different transcription units. In another 0187 Alignment pattern was checked for sequence case where two cDNAS are mapped to the same genomic homology, and a similarity threshold of E<10' was used. region, these cDNAs are sometimes different in their orien Because of BLAST N search, 2603 cDNA clones were tations; an exon of one cDNA is mapped to the nucleotide identical to already-registered rice genes, and classified into sequence of an antisense Strand of the other. In addition, in the identical rice genes. As a result of BLAST X search, this case, since the exon is not shared, they constitute 5,607 clones were homologs of already known rice genes, different transcription units. 12,527 clones were homologous to already-known genes of 0.192 Considering the genome coverage of less than plants other than rice, and 859 clones were homologous to 100% and the clustering results at the clone level, a unique already-known genes in organisms other than plants. These population of 20.259 representative full-length CDNA search results were shown in Table 2. In total, these homol clones is selected to hereafter discuss sequence data of total ogy searches enabled to assign potential functions to 21.596 clones (28,469) and unique representative clones (20.259). A (75.86%) clones. unique population refers to a cluster of cDNAs predicted to 0188 “Results of BLAST search” at the end of this be transcribed from the same TU. A particular clone of the specification shows data of the highest homology to each unique population representing said unique population is clone obtained by the homology search of 28,469 represen referred to as a representative clone. tative full-length cDNA clones using BLASTN and BLSST 0193 The nucleotide sequence identity between CDSs X programs. predicted in the indica genome and ORFs in cDNAs of the 0189 28,469 full-length cDNA clones were mapped to present invention is 93%, while the amino acid identity is three types of already known rice genome sequence data, only 53%. These results may reflect the following reasons. that is, the indica (aforementioned) and japonica (aforemen tioned) draft genomes. Sequence data of the gene coding The longest ORF of cDNA of the present invention is not regions are said to be very similar between japonica and correct; and/or indica subspecies. 94% of the 28,469 full-length cDNA clones were mapped to these rice draft genomic sequences. the predicted CDS of indica genome is not correct. Mapping results are shown in Table 2. 0194 From the results of InterPro search and such as described below, cDNA nucleotide sequences of the present TABLE 2 invention would have higher fidelity than CDS predicted in Number of indica, indicating that the actual acquisition of cDNAS Number of non-redundant improves the quality of information on gene sequence. Genome Size mapped Transcription Sequence Source (Mbp) clones % Unit 0.195 For example, comparison between predicted CDSs japonica draft Ref 2 390 26930 94.6 18933 reported for the BAC clone derived from Chromosome 1 genome and mapping results of full-length cDNA clones of the indica draft Ref1 363 26784 94.1 19036 present invention clearly revealed the presence of various genome differences between them. Results of comparison of them at the exon and intron levels are shown in FIG. 1. Results of US 2006/O123505 A1 Jun. 8, 2006

comparison of cDNAs with all CDSs of Chromosome 1 are the clones that were mapped in the opposite direction to the listed in Table 3. same genomic DNA region. The number of such clones was 1443. TABLE 3 4) Full-Length cDNAs Are Useful in the Promoter Analysis. Total number of BACPAC clones from 418 (redundant) Chromosome 1 0208. The expression analysis using a microarray system Total number of annotated BACPAC 387 (redundant) containing 8987 rice EST clones found that UV irradiation clones from Chromosome 1 to rice led to the transcriptional upregulation of 58 EST Total number of CDS 9828 (redundant) clones. After confirming these upregulations by the real-time Total number of Transcription Units 7,763 mapped by FL-cDNA clones by 3895 FL-cDNA clones PCR method, candidate genes whose expression was Total number of the hit CDS by 4874 (49.6%) enhanced were PR-10b and PBZl genes. FL-cDNA clones by 3239 FL-cDNA clones 0209 The full-length cDNA clones found to correspond to these ESTs were mapped to the region 10 kb away in the 0196) In Table 3, 418 CDSs derived from Chromosome 1 same BAC sequence derived from Chromosome 12. Fur are registered, 387 clones of which are annotated with CDS. thermore, the use of sequences 1 kb upstream of the respec Total number of CDS is 9828. On the other hand, when tive transcription initiation points of the full-length cDNA full-length cDNA clones of the present invention were clones as queries for the cis element database (PLACE) fou mapped to the same BAC/PAC clones, the total number of nd cis elements such as GT1 CONSENSUS and TBOXAT TU mapped by 3895 full-length cDNA clones was 7763. GAPB. Although further experiments are needed to confirm Then, the coincidence of the mapped regions with the CDS whether these cis elements are actually involved in UV regions was examined. When the coincidence of at least 10 related transcriptional upregulation or not, the use of infor bp in the same direction is judged as a coincidence, the total mation on full-length CDNA clones, as described above, number of the CDSs that hit by mapped cDNA clones was will facilitate the promoter analysis of target genes from only 4874, accounting for only 49.6% of the total CDS. That EST. is, the results of assessing the predicted CDS based on the 5) Protein Informatics Analysis full-length cDNA sequences actually isolated from rice 0210 Amino acid sequences of proteins encoded by indicated that current programs cannot find CDS with suf nucleotide sequences of cDNAs were deduced from the ficient accuracy, and that information on the cDNA nucle longest ORF (from ATG to term codon). Because of homol otide sequence isolated from the living body is important. ogy search, clones clearly encoding RNA (24.397 clones), 3) Changes in the Transcript Form and clones with an ORF shorter than 100 amino-acid resi dues were excluded from analysis. 28.332 clones had ORF, 0197). Of the 18.933 TUs mapped to the japonica genome and the average number of amino acid residues was 331 draft, 5045 are multiexon TU that contains two or more residues. Average numbers of nucleotides of the 5’- and transcripts with a plurality of forms. These transcripts were 3'-untranslated region (UTR) were 259.83 and 398.41, analyzed in detail focusing on the following conditions: respectively. 24.507 clones had ORFs exceeding 100 amino 0198 Difference in length of the 5’- and 3'-ends: acid residues. 0199 presence or absence of exons; and 6) Assessment of Completeness of the Full-Length cDNAs 0200 initiation and termination sites of intron at the time 0211 cDNAs of the present invention were acquired of splicing. using the CAP trapper method that is advantageous for acquiring full-length cDNAs. Whether cDNA clones of the 0201 As a result, of the 5045 TU, 2471 TU (13.1%) present invention were full-length was confirmed by the contains clones having variations in the above-described following analyses. conditions. Contents of variations were as described below. 0212 Comparison of the full-length cDNA clones with 0202) Difference in the 5'-end: 1673 TU (8.8%), the previously registered 859 cDNA sequences according to single-pass sequence data, revealed for 621 of 667 5'-end 0203 Difference in the 3'-end: 853 TU (0.5%), sequences and 570 of 648 3'-end sequences, the full-length 0204 Alternatively spliced exons: 94 TU (4.5%), cDNA clones of the present invention were longer than the registered cDNA sequence. Using full sequences of the 0205 Difference in the intron initiation point: 180 TU clones, the full-length cDNA clones of this invention were (1.0%), and longer than the registered cDNA sequence in 468 of 579 0206. Difference in the intron termination point: 241 TU cases. Thus, it was obvious that most of the full-length cDNA clones of the present invention, which were found to (1.3%). coincide with the known nucleotide sequences, comprise When the above TUs are simply summed up, the sum does longer terminal nucleotide sequences compared to the not become 2471 because the identical TUs might have known sequences. Therefore, cDNAs of the present inven different combinations. Even considering the collection bias, tion were confirmed to be a cluster of cDNAs that are likely the alternative splicing event is not frequent in rice. to be full-length. 0207. Due to these differences in the transcript structure, 0213 Furthermore, whether cDNAs have full-length or alterations at the amino acid level accompanying with ORF not can be confirmed if they contain the initiation and changes among transcripts occurred at 1937 TU (78.4%) termination codons. More preferable full-length cDNA loci. In contrast, 902 antisense transcript pairs were found as clones according to this invention may contain more US 2006/O123505 A1 Jun. 8, 2006 18

5'-UTRs of their transcript. Containg of all of the 5'-UTRs contain the false hit data due to the small number of is not the essential prerequisite for the full-length cDNA. constitutive amino acid residues. However, if transcript nucleotide sequences are maintained 0222 Among 3491 domains, 313 are plant (only O. as completely as possible, the usefulness of full-length sativa and A. thaliana)-specific and 1356 do not occur in cDNAs increases. For example, when the transcriptional plant proteins. There are 1177 domains that are animal (any regulatory region is obtained, the transcription initiation of C. elegans, D. melanogaster, and H. sapiens) -specific, point must be correctly determined for accurately obtaining and 528 are not observed in these species. Eighty domains it. For example, because of the alignment of cDNA clones were yeast (S. cerevisiae and S. pombe) -specific and 1776 and EST, etc., clones having a long nucleotide sequence on were not found in yeasts. These results indicate that 85% of the 5' side is likely to have full-length. When 98% homology the total number of domains is found in animals, 61% occur or more was observed for 80% region or more of cDNA in plants, and 50% occur in yeasts. Comparison of the clones to the registered clones, they were regarded as the numbers of proteins having domains (number of clones) same clone. between rice and Arabidopsis yielded interesting results as 7) InterPro Homology Search described below. 0214) A protein domain search was performed by using 0223 Rice-specific domains (Table 4b), or those more information on the amino acid sequence of the rice full frequent in rice than Arabidopsis (Table 4c), length cDNA clones having the 28.332 deduced ORFs 0224 Pollen allergic protein domain, against the InterPro database. At the same time, similar 0225 Organ-specifically expressed domains (e.g., serine InterPro searches were conducted using the following protease inhibitor of seed), amino-acid sequence data to compare the both results. 0226 Environmental stress-inducible proteins (antifreeze 0215 Arabidopsis thaliana (27.288 sequences), proteins, drought: ABA/WDS-inducible protein, etc.), 0216 Caenorhabditis elegans (20.732), 0227 Domains more frequent specifically in Arabidopsis 0217 Drosophila melanogaster (18,118), (Table 4d) 0228 Toll and interleukin-1 receptor (TIR) domain. 0218 Homo sapiens (24,147), 0229. It was found that the TIR domain overwhelmingly 0219 Saccharomyces cerevisiae (6360), and occurs in the N-terminal region of the NBS-LRR type disease-resistance genes (R gene) products in Arabidopsis, 0220 S. pombe (4962). but does not at all in rice. TIR-NBS-LRR type R genes are 0221) The search yielded a total of 3491 InterPro known to be amplified at high level in a specific genome domains from all the above organisms. Table 4 highlights the region of Arabidopsis, but no alternative amplified domain 13 most frequent domains, and domains ranked below the was found in the R gene in rice. Transposon-related gene third were observed almost commonly among 7 organism products were also more specifically and frequently found in species. Second most frequently observed domains may Arabidopsis than in rice.

TABLE 4

O. A. C D H. S. S. Saiiva thaiana elegans melanogaster sapierce cereviciae Dombe interPro ID InterPro name count rank count rank count rank count rank count rank count rank count rank a. Top 13

PROO1687 ATP, GTP- 1281 1 1781 1 894 1 1061 1 1435 2 441 1 330 1 binding site motif A (P-loop) (*) PROOO694 Proline-rich 116S 2 618 6 723 2 794 2 1791 1 88 6 51 10 region PRO00719 Eukaryotic 944 3 1053 2 520 4 397 4 639 5 122 2 111 3 protein kinase PROO2290 Serine. Threonine 827 4 1010 3 488 5 359 5 589 6 114 3 107 4 protein kinase PROO1245 Tyrosine protein 792 5 995 4 402 8 340 6 557 8 106 4 101 5 kinase PROO1611 Leucine-rich 344 6 SO4 7 68 51 151 16 219 23 6 46 10 37 repeat PROO1841 Zn-finger, RING 341 7 436 9 155 22 145 19 320 13 34 19 44 11 PROO1810 Cyclin-like F-box 335 8 6SO 5 428 6 46 63 70 69 14 38 13 34 PRO00504 RNA-binding 325 9 313 10 160 2O 293 7 340 10 65 11 86 6 region RNP-1 (RNA recognition motif) PRO02885 PPR repeat 288 10 459 8 1 110 5 104 5 130 2 50 4 43 PROO1680 G-protein beta 268 11 267 14 166 19 240 9 329 11 102 5 121 2 WD-40 repeat US 2006/O123505 A1 Jun. 8, 2006 19

TABLE 4-continued

O. A. C D H. S. S. Saiiva thaiana elegans melanogaster sapience cereviciae bombe

interPro ID InterPro name count rank count rank count rank count rank count rank count rank count rank PROOO379 /? 232 12 221 17 137 25 167 14 108 49 37 18 25 22 thioesterase, PROO3593 AAA ATPase 227 13 307 11 116 27 160 15 159 31 82 7 70 7 b. Exist in rice but not in Arabidopsis

PROO5795 Major pollen 16 99 allergen Lolp PROO3496 ABAWDS 1S 100 induced protein PROOO877 Bowman-Birk 13 102 serine protease inhibitor c. Ricef Arabi >> 1

PROO4873 BURP domain 70 51 5 125 PROO1568 T2 29 86 5 125 1 110 1 108 1 134 51 PROOO104 Antifreeze 69 52 13 117 4 107 143 2O 41 94 47 protein, type I d. Arabi Rice >> 1

PROOO157 TTR domain 1 14 126 29 3 108 12 97 23 112 PROO4252 Putative plant 2 13 76 57 transposon protein PROO4146 DC1 domain 4 11 138 25 PROO3614 Knottin 1 14 19 111 6 103 PROOO477 RNA-directed 5 10 91 47 64 S4 33 76 69 70 47 13 34 DNA polymerase (Reverse transcriptase) PROO5174 Protein of 2 13 34 96 unknown function DUF295 PROOS162 Retrotransposon 7 O8 86 50 S 106 2 107 1 134 gag protein PROO3653 SUMO'Sentrin 8 O7 81 52 6 105 7 102 7 128 50 4 43 Ubl1 specific brotease PROO1584 integrase, 13 O2 117 32 21 90 33 76 7 128 47 15 11 36 catalytic domain PROO4332 Plant MudR 13 O2 99 42

8) Transcription Factor TABLE 5-continued 0230. The InterPro search yielded 1336 transcription factor clones classified into 18 DNA-binding domains, as Number of categorized in Table 5. As a result of classification, Zn Category FL-cDNA clones Comment finger-type transcription factors are most numerous, fol Myb 158 lowed by Myb-typefactors; these constitutions are similar to ERF 83 Including one ERF those for Arabidopsis. Of the predicted transcription factor NAM 74 clones, those included in the Zn finger-type are classified Homeo box 73 Including ELK, KNOX1.2 into Subtypes as shown at the end of this specification. bZIP 63 Not includind Zn finger AUXIAA 47 Including four TFB3 TABLE 5 WRKY 46 TFB3 27 Not including ERF, AUXIAA Number of GRAS 27 Not including Zn finger Category FL-cDNA clones Comment MADS 26 HSF 25 Zn finger S88 Including Ring, C2H2, Cx8CX5C3H, Tubby 24 Dof, GATA, Constans, NF-X1 BRCT 19 Not including Zn finger US 2006/O123505 A1 Jun. 8, 2006 20

ranged from 0.9968 to 0.2 in the trial analysis. When, a TABLE 5-continued cut-off value for the certainity was set at 0.5, this sorting process predicted the localization of signal peptide Number of Category FL-cDNA clones Comment sequences in 18,166 clones out of those with ORF having 100 or more amino acid residues. Table 7 shows the pre Fungal TF 18 dicted target organelles, number of clones whose localiza SBP 17 Jumonji 11 Including mC, Jim N, not including tion in that target has been predicted, and the ratio (%) of Zn finger said clones to 18,166 clones. Proteins predicted to be in the TCP(k 10 nucleus formed the largest group, accounting for 20.0% of total, followed by those in the plasma membrane, cytoplasm, Total 1336 endoplasmic reticulum, microbody, etc., and these locations all accounted for about 10%.

9) Searches for Transmembrane Spanning Proteins and TABLE 7 Their Cellular Location 0231. The MEMSAT programis effective in predicting Target organella Number of clones (%) the secondary structure of a transmembrane spanning pro Chloroplast Stroma 1383 7.6 teins and their topology. The use of this program identifies thylakoid membrane S42 3.0 thylakoid space 90 O.S the number of transmembrane spanning segments of full Cytoplasm 1875 10.3 length CDNA clones and their topology. Table 6 shows the ER 2045 11.3 number of transmembrane-spanning segments, total number Golgi body 26 O.1 of said clones, and direction of the segments (when the Microbody 1783 9.8 N-terminus is intracellular: IN, when extracellular: OUT). Mitochondorion inner membrane 5O2 2.8 Intermembrane space 8O 0.4 matrix space 1434 7.9 TABLE 6 Outer membrane 32 O.2 Nuleus 3635 2O.O Number of Spanning Outside 1508 8.3 transmembrane-spanning Total number direction Plasma membrane 2982 16.4 Vacuole 249 1.4 segment of clone In Out Total 18166 1 17839 8175 9664 2 2842 1585 1257 3 1259 646 613 A cut-off value for certainity: >0.5 4 789 470 319 5 383 160 223 6 290 177 113 10) Homology With Arabidopsis 7 2S6 90 166 8 109 62 47 0234) To examine the homology of the full-length cDNA 10 138 97 41 clones to the genes encoded in the Arabidopsis genome, rice 11 120 59 61 28,444 ORF amino-acid sequences were compared with the 13 51 14 37 14 19 18 1 27.288 deduced amino-acid sequences from the predicted 15 4 3 1 CDS of the Arabidopsis genome at the BLASTP threshold 16 12 7 5 of E-107. Results are summarized in Table 8. 18,900 17 5 3 2 full-length CDNA clones (12,996 TU, 64%) had a homolo 2O 3 3 O gin Arabidopsis, and 20,473 Arabidopsis genes (75%) had a homolog in the rice full-length cDNAs. 0232. It was found that 17,839 clones had only one 0235 Similar comparison of the indica genome with transmembrane-spanning segment, while 6.280 clones had Arabidopsis genes at the genome sequence level was already two or more, accounting for 22.1% of the total cDNA clones. reported, saying that 50% of predicted rice genes had a As to the spanning direction, one direction overwhelmingly homolog in Arabidopsis and 80% of Arabidopsis genes had exceeds the other in number depending on the number of a homolog in rice. If the prediction in indica is correct, then spannings in Some cases, but the numbers of both directions considering the full-length cDNA clones being 19,000 TU were mostly about the same. and the redundancy of genes in their respective genomes, it 0233. The pSORT program is effective in predicting becomes a subject to further acquire a 5% equivalent gene signal peptide sequences and their cellular location. This family common to Arabidopsis and collect more rice-spe program yields the data on the putative target organelles to cific gene family to reduce the fraction of Arabidopsis be sorted and the certainity of the localization. The certainity common TU from 64% to 50%.

TABLE 8 iaponica indica Arabidopsis

Fraction E-10 ES-SO E-100 E-10 E-SO E-1OO E-10 ES-SO E-100 1 japonica 2667 specific US 2006/O123505 A1 Jun. 8, 2006

TABLE 8-continued iaponica indica Arabidopsis

Fraction E-10 ES-SO E-1OO E-10 ES-SO E-100 E-10 ES-SO E-100

2 indica ::::::::: ::::::::: ::::::::: 18461 306SS 38684 ::::::::: ::::::::: ::::::::: specific 3 Arabidopsis ::::::::: ::::::::: ::::::::: ::::::::: ::::::::: ::::::::: 5171 12148 18129 specific 4 iaponica de 5560 7627 8682 8215 7969 7252 ::::::::: ::::::::: ::::::::: indica COOl 5 indica & ::::::::: ::::::::: ::::::::: 1553 1094 777 1644 1667 1445 Arabidopsis COOl 6 Arabidopsis 372 610 622 ::::::::: ::::::::: ::::::::: 674 983 1036 & japonica COOl 7 common 2O3S1 1401S 7573 25169 1368O 6685 19799 12490 6678 amang three

total 2.8950 53398 27288

11) Functional Classification of Genes Using Gene Ontol- 0238 ogy (GO) TABLE 10 0236 GO terms are assigned to GenBank reports, Inter Pro domains and Arabidopsis genes. If GO terms are Category Number assigned to the information in the concerned database to whichhich full-length cDNAsc s of theth present invention showedshowe enzymeligand binding or carrier S988S369 homology, GO terms can be added to the full-length cDNA nucleic acid binding 2199 clones of the present invention based on the results of transporter 1615 homology search. Therefore, these GO terms were collected still prote nkn s: to perform the functional classification based on them. molecularsignal transducer function unknown 193 inhibit 143 0237) The total number of cDNA clones of the&G1 present E.efense immunityinnibitor protein 112 invention to which GO terms associated with “biological motor 60 processes' were added was 18,485, which were classified as chaperone 48 shown in Tables 9-12. About a half of full-length cDNAs storage protein 41 with “unclassified function' formed the largest group, fol- toxin 12 1 d by those classified into “metabolism.”“transport microtubule binding 8 lowe y 99 s port, 8 translation,”, accounting for one-fourth of the total terms. apoptosis regulator 5 GO terms associated with “function' were added to 10,942 obsolete 4 full-length cDNA clones, and the total number of said GO cell adhesion molecule 3 terms was 16,853. These GO terms are unable to be classi- Total 16583 fied exclusively, and the most frequently observed GO O8. “function' term was “enzyme.” GO terms associated with “cellular component” were added to 3629 full-length cDNA clones, and the total number of said GO terms was 3637 0239) TABLE 9 TABLE 11 Category Number % Enzymes Number unclassified 98.69 53.4 2099 Metabolism 4912 26.6 transferase 1909 Transport 1108 6.O kinase 1126 Translation 694 3.8 796 Transcription 583 3.2 ATPase 378 cell communication 436 2.4 250 Energy 276 1.5 phosphatase 213 Communication, Defense 241 1.3 ligase 18S Cell growth Maintenance 164 O.9 helicase 130 Developmental Process, Aging, Death 101 O.S monooxygenase 105 DNA replication 92 O.S 87 others 9 O.O Small protein conjugating enzyme 50 aldolase 37 total 18485 serine esterase 35 disulfide oxidoreductase 25 US 2006/O123505 A1 Jun. 8, 2006 22

0244 Many plant genes are known to be amplified com TABLE 11-continued pared to those in animals, suggesting differences between plants and animals in the strategies used to increase the Enzymes Number variety of proteins. Protein informatics analyses using the Small protein activating enzyme 7 InterPro database provide insight into differences in the glycine cleavage system 6 profile of the plant proteins such as proteins more frequent heme-copper terminal 6 in rice than Arabidopsis and vice versa. Rieske iron-sulfur protein 4 1-deoxyxylulose-5-phosphate synthase 3 0245. The protein homology searches between rice and 34 dihydroxy-2-butanone-4-phosphate synthase 3 lipoate-protein ligase 3 Arabidopsis confirmed that the present full-length cDNA 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase 1 collection contains homologs corresponding to 75% of DNA repair enzyme 1 genes predicted in Arabidopsis, while 64% of the TU glycogen debranching enzyme 1 clusters in rice are homologous to Arabidopsis gene. imidazoleglycerol-phosphate synthase 1 “Clone List Total 7461 0246 The Clone list (Table 13; submitted in electronic format) shows, from the left column, the clone name 0240) (CLONE NAME), DDBJ accession number (DDBJ ACC), nucleotide sequence identification number (N ID), amino TABLE 12 acid sequence identification number (AMINO ID), tran scription initiation codon position (START POSITION) and Location Number transcription termination codon position (STOP POSI cell 3414 TION). unlocalized 16S “BLASTX Search Results extracellular 56 external protective structure 2 cellular component unknown O 0247 The following information on each clone was obsolete O described in Table 14 (submitted in electronic format) from the left divided by //. Total 3637 intracellular 2222 Clone name, membrane 1457 DDBJ accession number of each clone, Total (cell) 3679 Cluster ID to which the clone belongs, Accession number of clone which yielded hits in BLASTN, 12) Tissue-Specifically Expressed Gene Definition of clone that yielded hits in BLASTN, 0241 Full-length CDNA clones of the present invention were obtained from various libraries as shown in Table 1. Score when yielding hits in BLAST N. Comparing genes obtained among these libraries with each E value when yielding hits in BLAST N. other enables to sort genes universally obtained from a plurality of tissues and those specifically expressed in par Accession number of clone that yielded hits in BLAST X, ticular tissues. The results obtained by this gene expression Definition of clone that yielded hits in BLAST X, analysis are referred to as body map. Score when yielding hits in BLAST X, 0242. The body map of the full-length cDNAs of the present invention revealed that most of full-length cDNA E value when yielding hits in BLAST X clones obtained from libraries derived from the flower “List of Clones Predicted as Transcription Factor organs are specifically Xpressed in the flower organs and are not expressed in other tissues. A list of clones derived from 0248. In the List of Clones Predicted as Transcription flower organs is shown as “Clone list derived from flower Factors (Table 15; submitted in electronic format), Clone organs' at the end of this specification. Most of clones set name, ID of InterPro domain name, and domain name are forth in this list are genes whose expressions have been described from the left divided by =. confirmed in the maturation processes from the stages of young spikes through seed maturation. “List of Clones Predicted to be Zn Finger” 0249 Results of categorizing clones predicted to be Zn 13) Conclusion finger into Subtypes are shown. Corresponding clones are 0243 The rice full-length cDNA project has completely listed following the InterPro domain ID and domain name. sequenced a set of 28.469 clones and mapped them to the IPR000822:Zn-Finger, C2H2 Type genome, revealing that these clones are those derived from at least 19,000 TU. Information on these cDNA clones will 0250 001-012-G08, 001-014-E06, 001-015-H09, 001 provide precise gene annotation from the genome sequence 017-D01, 001-017-F12, 001-017-G09, 001-019-B08, 001 data, and facilitate the assignment of EST sequences to the 021-A06, 001-021-FO9, 001-023-A03, 001-030-GO2, 001 corresponding promoter sequences. Considering possible O38-B11, 001-102-C06, 001-103-H07, 001-107-E09, 001 bias during collection of various clones, the frequency of 111-B01, 001-111-H03, 001-113-E02, 001-113-E12, 001 alternative splicing events in plants may be lower than that 119-H03, 001-125-A08, 001-128-E09, 001-200-D09, 001 in animals. 200-H06, 001-203-E 10, 002-102-A12, 002-107-D06, 002

US 2006/O123505 A1 Jun. 8, 2006 24

J023020P04, J023021A17, J023023M16, J023031B11, 0272 002-150-D04==microtubule binding JO23031G24, J023034D08, J023039004, J023041 E24, J023044E12, J023044P07, J023047F13, J023047F14, 0273) 002-143-F09==nucleic acid binding J02304.9F08, J023049J20, J023052D05, J023052J15, 0274) 002-147-D09==nucleotide binding J023054E15, J023054P08, JO23061KO6, J02306318, 002-145-H10==protein binding JO23066IO4, J023072K15, J023075N19, J023077E12, 0275 J023077P18, J023078L19, J023089D17, J023096P15, 0276) 002-145-G05==RNA binding J023097G23, J023098C19, J023105K15, J023105K22, J023106F04, JO23106J06, J023106M02, J023107E20, 0277 002-143-E 10==Sugar binding J023108I11, J023110J02, JO23114C23, J023124E09, 0278) 002-137-H02==tRNA binding J023127C10, J023133D04, J023134P18, J023139M11, J023142C17, JO23148H24, JO23149D24, J023149P20, 0279 002-143-E06==zinc binding J023150.J15, J033000G10, J033010L07, J033015K15, ( J033020F05, J03302008, J033021A14, J033023F07, 0280 J023043J23== inhibitor J033023K06, J033025K23, J033026G05, J033026H11, J033029A20, J03303OD14, J033033J06, J033036L15, 0281 J013043C13== inhibitor J033038J15, J033041L15, J033047I01, J033048I07, J033048J15, J033050HO1, J033051D17, J033058G15, 0282) 002-151-F03==enzyme inhibitor J033060L17, J033063D14, J033067G01, JO33068I24, 0283 J023100009==RAB GDP-dissociation inhibitor JO33068K11, J033068N07, J033069D23, J033072K04, J033073H23, J033075P09, J033083D05, JO33084AO1, 0284 J023112G04==Rho GDP-dissociation inhibitor J033088P15, J033089H07, J033090H12, J033091 I10, 0285 002-145-A10==serine protease inhibitor J033094D12, JO33094G14, J033101B01, J033104F07, Transporter Protein Transporter J033106E23, J033107017, J033108E06, J033117A10, J033119F23, J033119L05, J033120I17, JO33122C24, 0286 J023037NO7==amino acid-polyamine transporter J033125G22, J033129G15, J033132P09, J033133F19, J033142N16, J033149FO1 0287 002-150-A12==ammonium transporter 0288 002-145-D12==ATP-binding cassette (ABC) “Results of Gene Ontology’ transporter 0254 Clone name (CLONE NAME), database name, GO code, and gene ontology label (GO LABEL) were 0289 J023036P06==cation transporter described in Table 16 (submitted in electronic format) 0290 J013108A03==cobalt ion transporter according to GO categories divided by =. 0291) 002-144-CO2==electron transporter “List of Clones Included in Each Category of Gene Ontol 99 J023054A11==heavy metal ion transporter Ogy 0292) 0255 Clone names and GO terms added to respective 0293 002-111-D08==heme transporter clones are categorized. 0294 J023028L 13==hydrogen-transporting tWO-SectOr Binding Protein Binding ATPase J023050L03==inorganic phosphate transporter 0256 J023051M04==actin binding 0295) 0257 J023149A14==acyl-CoA binding 0296) J023136I24==intracellular transporter 0258 J023030G12==amino acid binding 0297) 002-169-D04==nucleobase transporter 0259 002-144-D12==ATP binding 0298) J033024P16==nucleoside transporter J023052011 ==nucleotide-sugar transporter 0260 002-145-D12==ATP-binding cassette (ABC) 0299) transporter 0300 002-151-FO1 ==plasma membrane cation-trans 0261 J013002C11==biotin binding porting ATPase 0262 002-143-H11==calcium ion binding 0301 J023055A03==potassium transporter 0263 002-154-C06==chitin binding 0302) J023048B11==protein transporter 0264) 002-160-B03==chromatin binding Carrier 0265 002-149-C02==copper binding 0303) J023122I22==acyl carrier 0266 002-143-H02==DNA binding 0304 JO33133F17==carrier 0305 002-150-BO7==Fe3S4/FeaS4 electron transfer car 0267 J023111A18==double-stranded RNA binding rier 0268 J023029A13==GTP binding 0306 JO23035HO8==iron-sulfur electron transfer carrier 0269 J023033I19==heavy metal binding 0307 J023034G24==protein carrier 0270 J023079P11==iron binding 0308) J023028D17==redox-active disulfide bond elec 0271 J023041 I08==lipid binding tron carrier US 2006/O123505 A1 Jun. 8, 2006

Enzyme 0343 002-145-G09=="alpha,alpha-trehalose-phosphate 0309 001-100-C02=="1,3-beta-glucan synthase” synthase (UDP-forming) 0310 J023080J16==1-aminocyclopropane-1-carboxy 0344) JO33046P17=="alpha-1,3-mannosylglycoprotein late synthase beta-1,2-N-acetylglucosaminyltransferase' 0311 J023132G24==1-deoxyxylulose-5-phosphate syn 0345 002-162-AO1==alpha-amylase thase 0346 002-167-G08==alpha- 0312 006-308-E05=="1-phosphatidylinositol-4,5-bisphosp 0347 J023039E12== phosphate ' 0348 006-204-B10-=aminoacyl-tRNA hydrolase 0313 J023083N15== 1-phosphatidylinositol-4-phos phate 5-kinase 0349 J013106I02== 0314 001-003-D03=="2C-methyl-D-erythritol 2.4-cy 0350 006-208-D05== clodiphosphate synthase' 0351 J023082J11==ammonia ligase 0315 002-145-A06==2-dehydro-3-deoxyphosphohepto 0352 J013052F22==arginine-tRNA ligase nate aldolase 0353) J023081 B09==argininosuccinate synthase 0316 J0231.44M13=="3,4 dihydroxy-2-butanone-4- phosphate synthase' 0354) 006-309-E04== J033004I10== 0317 J023083L07==3'-5' exoribonuclease 0355) 0318 J013032A17==3-beta-hydroxy-delta(5)-steroid 0356) 002-160-AO4==aspartate-tRNA ligase 0357) 002-149-B12==aspartic-type endopeptidase 0319 J033028020==3-dehydroquinate dehydratase 0358) J033082C14==ATP phosphoribosyltransferase 0320 J013067A20==3-dehydroquinate synthase 0359 002-161-E09==ATP-dependent peptidase 0321 J013000009==3-deoxy-manno-octulosonate cyti 0360) J023048N01==beta-amylase dylyltransferase 0361) J033105C09==beta-galactosidase 0322 J033004006==3-hydroxyisobuty- rOXV1SObutVrate dehydroge J023055K01==beta-N-acetylhexosaminidase aSC 0362 0323 002-151-H05==5-methyltetrahydropteroyltri 0363) 002-159-A07== glutamate-homocysteine S-methyltransferase 0364 J033031 KO2== 0324 J023050A11==6- 0365) J013167021==calpain 0325 002-138-C02== 0366) J033082C21==carbamoyl-phosphate synthase 0326 002-154-A08==acetyl-CoA carboxylase 0367) 002-152-A12==carbonate dehydratase 0327 J023043F24== 0368 002-161-G04==carboxyl- and carbamoyltrans 0328 J023087C13== ferase 0329 002-154-G06==acyl-CoA dehydrogenase 0369) J023068K15==carboxy-lyase 0330 006-301-D08==acyl-CoA oxidase 0370 002-147-C06== 0331 006-307-C01.==acyl-CoA thioesterase 0371) J013051E04=="casein kinase II, regulator” 0332 002-147-G01== 0372) 002-174-H07==caspase 0373 J023079C18==catalase 0333 006-203-H06== 0374 J023041M10==CDP-diacylglycerol-glycerol-3- 0334 J013070D05==adenosinetriphosphatase phosphate 0335) JO 13002P17== 0375) 3-phosphatidyltransferase 0336) J023041P13==adenosylmethionine decarboxylase 0376) 002-164-A06== 0337 JO 13088JO3==adenylosuccinate synthase 0377) JO23042L15==chi tinase 0338 002-177-B04==alanine-tRNA ligase 0378) JO33031 C20==chorismate 0339) 002-144-E03==", zinc-de 0379) J013 110C10== pendent’ 0380) J0231.46L07==citrate (SI)-synthase 0340 J023139A20==aldolase 0381 J0230881.03==CoA hydrolase 0341 J023050C09==aldose 1-epimerase 0382) 002-162-H04=="copper, zinc superoxide dismu 0342 002-139-H02=="alpha,alpha-trehalase' tase' US 2006/O123505 A1 Jun. 8, 2006 26

0383) J023050L08==coproporphyrinogen oxidase 0422 J023058E10==glutamine amidotransferase 0384) 001-102-AO3==cyanate lyase 0423 001-047-D06==-glutamyl tRNA reductase 0385) J013071P10==cysteine-tRNA ligase 0424 J013000L07==glutamyl-tRNA(Gln) amidotrans 0386 002-143-E04==cysteine-type endopeptidase ferase 0387) J023089B16==cytochrome c oxidase 0425 J033076L03== 0388 J023148P17==D-alanine-D-alanine ligase 0426 J013033N23==glutathione synthase J033051A22-= 0427 J023036H06==glyceraldehyde 3-phosphate dehy 0389) drogenase (phosphorylating) 002-182-G05==deoxyribodipyrimidine 0390) 0428 002-159-C07==glycerol-3-phosphate dehydroge 0391) J023080K21== a SC 0392) 001-046-A07==dihydrodipicolinate reductase 0429 J023050E11==glycerol-3-phosphate dehydroge 0393) J023089118==dihydroneopterin aldolase nase (NAD+) 0394) J013002H08==dihydroorotate dehydrogenase 0430 J013124A06==glycerone kinase 0395) J013147H17== 0431 J023047B14==glycerophosphodiester phosphodi eSterase 0396) JO2303OK24==disulfide oxidoreductase 0432) 002-161-H05==glycine 0397) 006-203-E05-=DNA 3-methyladenine dehydrogenase (decar I boxylating) 0398 002-160-D05==DNA ligase (ATP) 0433 002-153-G10==glycine hydroxymethyltransferase 0399) 033.020H02==DNA photolyase 0434 J023135F01==glycine-tRNA ligase 0435 002-155-F06==glycolipid 2-alpha-mannosyltrans 04.00 002-151-C09=-DNA-directed RNA polymerase ferase 0401 J033003F07==dolichyl-diphospho-oligosaccha ride-protein 0436 J023044M15==glycosyltransferase 0402 J033038E19== 0437 J033040C05==glycyl-peptide N-tetradecanoyl transferase 0403 J023031C24==endopeptidase C1p 0438 001-201-F05==GTP cyclohydrolase I 04.04 J023054N24== 0439 J013116D09==GTP cyclohydrolase II 04.05 001-102-D09==ferredoxin reductase 04.06) J033072J07== 0440 002-167-C06==GTPase 04.07 J023125G01==folylpolyglutamate synthase 0441 002-165-D05== 0408 002-164-E10==formate-tetrahydrofolate ligase 0442 002-179-F05==heterotrimeric G-protein GTPase 0443) J023031 J07=="heterotrimeric G-protein GTPase, 04.09 002-119-H07=-formylmethionine deformylase alpha-subunit J023038E06==fructose-bisphosphate aldolase 0410 0444 J023038N14==hexokinase 0411 J013071 F22== 0445 J023037L20== 0412 J023071K13==galactosylgalactosylxylosylprotein 3-beta- 0446 J033030E05==homocysteine S-methyltransferase 0413 J023065E21== 0447 J023098E10== 0414 J033088N13==glucose-1-phosphate adenylyl 0448 001-020-B10-= transferase 0449 J023047F16==-hydrogen-translocating pyrophos 0415 J023102M14==glucose-6-phosphate 1-dehydroge phatase aSC 0450 J023086K07==hydrogen-translocating V-type 0416) J033116J21==glucose-6-phosphate isomerase ATPase J033069K09-=glutamate N-acetyltransferase 0451 J023028L 13==hydrogen-transporting tWO-SectOr 0417 ATPase 0418 002-166-H10==glutamate synthase 0452 J023031L04==hydrolase 0419 J033031H21==glutamate-5-semialdehyde dehy drogenase 0453 J023030L13==“hydrolrolase, actinging on carbon-nitrob gen (but not peptide) bonds' 0420 006-208-G06==glutamate-ammonia ligase 0454) JO23114011==“hydrolrolase, actinging on carbon-nitrob 0421 J023141A17==glutamate-tRNA ligase gen (but not peptide) bonds, in cyclic amides’ US 2006/O123505 A1 Jun. 8, 2006 27

0455 J023133M21=="hydrolase, acting on carbon-nitro 0490) J023095C13==methionine adenosyltransferase gen (but not peptide) bonds, in cyclic amidines' 0491 J023031NO1==methyltransferase 0456) J013059B20==“hydrololase, actinging on carbon-nitrob gen (but not peptide) bonds, in linear amides’ 0492 002-144-B01==monooxygenase 0457 002-145-B07=="hydrolase, hydrolyzing O-glyco 0493 J023100D05==N-acetyl-gamma-glutamyl-phos Syl compounds' phate reductase 0458 002-159-G02==hydro-lyase 0494 002-166-B07==N-acetyltranferase 0459) J023089B21=="hydroxymethyl-, formyl- and 0495 001-116-B03==NAD+ ADP-ribosyltransferase related transferase' 0496 001-113-E04==NAD+ synthase (glutamine-hydro 0460) J033088M20==hydroxymethylbilane synthase lyzing) 0461) J013170018==hydroxymethylglutaryl-CoA lyase 0497 J023048E09==NADH dehydrogenase (ubiquinone) 0462 J033025C16==hydroxymethylglutaryl-CoA reduc tase (NADPH) 0498) J023055E24== 0463 J023079L19==hydroxymethylglutaryl-CoA syn 0499) JO23049111== thase 0500 J023039G14==nucleoside-diphosphate kinase 0464 J033025I19==imidazoleglycerol-phosphate dehy dratase 0501) J023056M07== 0465) J033107D20==IMP cyclohydrolase 0502) J023054H04==oligosaccharyl transferase 0466 J023092E01.==indole-3-glycerol-phosphate syn 0503) 006-203-H05==O-methyltransferase thase 0504) J033094F19==orotidine-5'-phosphate decarboxy 0467 J023047P17==inositol/phosphatidylinositol kinase lase 0468 002-148-A06==inositol/phosphatidylinositol 0505) J023064101==O-sialoglycoprotein endopeptidase phosphatase 0506 002-147-D07==oxidoreductase 0469 J023125K01==inositol-3-phosphate synthase 0507 002-170-H05=="oxidoreductase, acting on the 0470 J023034C14=="intramolecular transferase, phos aldehyde or oxo group of donors, disulfide as acceptor photransferases” 0508 001-040-G05==pantoate-beta-alanine ligase 0471 002-168-C03==isopentenyl-diphosphate delta 002-146-G03== A isomerase 0509) J023038A03==peptidase 0472 J013002N13==ketol-acid reductoisomerase 0510) 0473 J033057L23==kinase 0511) J023033M17==peroxidase 0474 J023043J03== 0512) 002-155-B07==phenylalanine-tRNA ligase 0475 002-160-B02==leucine-tRNA ligase 0513 J023114K10==phosphatidate cytidylyltransferase 0476 J023132C04==lipid-A-disaccharide synthase 0514 J023108I08==phosphatidylcholine-sterol O-acyl transferase 0477 J033066K16==lipoate synthase 0515 J013008I04==phosphatidylserine decarboxylase 0478 J023043P03==lyase 0516 002-147-B03==phosphoenolpyruvate carboxyki 0479 002-165-A01==lysine-tRNA ligase nase (ATP) 0480 002-164-F03== 0517 J02304.9F03==phosphoenolpyruvate carboxylase 0481 J023056M13==magnesium 0518 J023080NO3==phosphogluconatedehydrogenase 0482 002-172-H02== (decarboxylating) 0483 J023065MO2==malic enzyme 0519 J023036N24== 0484 J023043B18==mannose-6-phosphate isomerase 0520 002-131-H11==phospholipase 0485 002-166-F07=="mannosyl-oligosaccharide 1,2-al 0521 001-039-E07==phospholipase A2 pha-mannosidase 0522 J02305.0004==phospholipase C 0486) 002-159-E 10== 0523 006-304-HO9== J023039G20= 0487 0524) J023133I04==phosphopyruvate hydratase 001-040-H12== 0488 0525) J023114015==phosphoribosylamine-glycine 0489 J023028L23==metallopeptidase ligase US 2006/O123505 A1 Jun. 8, 2006 28

0526 J013129D15==phosphoribosylaminoimidazole 0563) JO13169B12==ribonucleoside-diphosphate reduc carboxylase tase 0527 001-115-F04==phosphoribosylaminoimidazole 0564) J013026J06==ribose-5-phosphate isomerase Succinocarboxamide synthase 0565) J023042N11==ribulose-bisphosphate carboxylase 0528 006-203-G09==phosphoribosyl-AMP cyclohydro lase 0566) 006-205-A08==ribulose-phosphate 3-epimerase 0567 002-174-A05=="rRNA (adenine-N6.N6-)-dimeth 0529) 001-121-B07== yltransferase' 0530 002-174-E12==phosphoserine phosphatase 0568 002-143-D09==S-adenosylmethionine-dependent 0531 002-151-FO1 ==plasma membrane cation-trans methyltransferase porting ATPase 0569 002-183-AO5==serine carboxypeptidase 002-143-E 12== 0532) 0570) 002-146-HO3==serine esterase J033069C18==porphobilinogen synthase 0533) 0571) J023104M20==serine-tRNA ligase 0534 002-162-D12== 0572) J023048E15==serine-type endopeptidase 0535 J013159D01== 001-103-F08==serine-type peptidase (NADP+) 0573) 0536) JO 13002B07==procollagen-lysine 5- 0574) 002-146-B12== J023147K09== 0537) J023012J21==prolyl 0575) 0538) J023041C17==proteasome endopeptidase 0576) JO23012K16==Small monomeric GTPase 0539 002-145-E02==protein kinase 0577) 002-153-C06==Stearoyl-CoA desaturase 0540) J023079D17== 0578) 002-145-D10== 0541) J023060H13==protein prenyltransferase 0579) 002-147-B12== 0542 001-044-A03==protein serine/threonine kinase 0580) 002-144-D08== 0543) J023031I12==protein serine/threonine phos 0581 J013043008==sulfate adenylyltransferase (ATP) phatase 0582) JO23042G18==Sulfotransferase 0544 002-143-E05==protein 0583) J023060D13==superoxide dismutase 0545) 002-155-E07==protein tyrosine kinase 0584) J023133M06==thiamin-phosphate pyrophospho 0546 002-162-F06==protein tyrosine phosphatase rylase 0547 002-160-D11==protein tyrosine?serine/threonine 0585) JO13065EO5==thiosulfate phosphatase 0586 J013057I21== O548 002-145-HO9==protein-methionine-S-oxidep 0587) 002-146-A10== reductase 0588) JO23057DO5==transferase 006-310-C10== 0549) 0589 J023035J06=="transferase, transferring glycosyl 0550) J013069D01== groups' 0551) J03307OK02==pyridoxamine-phosphate oxidase 0590 J023034P03=="transferase, transferring hexosyl 0552) 006-305-H12==pyroglutamyl-peptidase I groups' 0591 J033065A07=="transferase, transferring phospho 0553) J023036A18== rus-containing groups' J023044E03==pyrroline 5- 0554 0592) 002-164-E12== 002-149-C12== 0555) 0593 J023048G03==transmembrane receptor protein 0556) J033096C17==queuine tRNA-ribosyltransferase tyrosine kinase 0557) JO23100009–=RAB GDP-dissociation inhibitor 0594) 001-118-H08== 0558) JO23076015==RAB Small monomeric GTPase 0595) J023115C24==triose-phosphate isomerase 0559) 002-145-H05=="racemase and epimerase, acting 0596) JO33124K13==tRNA on amino acids and derivatives' 0597 (5-methylaminomethyl-2-thiouridylate)-methyl 0560 002-163-F07==RAS GTPase activator transferase 0561 001-103-H08==RAS small monomeric GTPase 0598 002-148-C06==tRNA ligase 0562) J023053H09==recombinase 0599 006-206-E08==tRNA-intron endonuclease US 2006/O123505 A1 Jun. 8, 2006 29

0600 002-154-F02== 0611 002-124-B12==uroporphyringonen-III synthase 0601 J023125E11== 0612 J023047E18==uroporphyrinogen decarboxylase 0602) 001-046-E04==ubiquinol-cytochrome c reductase 0613 J023078I01==UTP-hexose-1-phosphate uridylyl 0603) J023049I12==ubiquitin activating enzyme transferase 0604) J023055G02==ubiquitin conjugating enzyme 0614 J023088P07==vacuolar 0605) JO 13066H16==ubiquitin C-terminal hydrolase 0615 J023028013==voltage-gated chloride channel 0606) J023039K21==ubiquitin-protein ligase 0616 002-127-C10== 0607) J013071 NO1==UDP-3-O-3-hydroxymyristoyl “List of Clones Derived From Flower Organs' 0608) N-acetylglucosamine deacetylase 0.617 This list (Table 17; submitted in electronic format) shows the relationship between the names of respective 0609) JO23089L18== clones derived from flower organs and the libraries from 0610) JO33082E21== which the clones are derived.

SEQUENCE LISTING The patent application contains a lengthy “Sequence Listing section. A copy of the “Sequence Listing is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20060123505 A1). An electronic copy of the “Sequence Listing will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is: 4. An isolated DNA molecule selected from the group 1. An isolated plant-derived nucleic acid, wherein said consisting of: nucleic acid is selected from the group consisting of (a) a DNA molecule encoding an antisense RNA comple (a) a nucleic acid encoding a protein comprising an amino mentary to a transcript of the DNA molecule of claim acid sequence set forth in any one of SEQ ID NOs: 2. 28470 through 56791; (b) a DNA molecule encoding RNA having ribozyme (b) a nucleic acid containing the coding region of a activity to specifically cleave a transcript of the DNA of nucleotide sequence set forth in any one of SEQ ID NOs: 1 through 28469; claim 2: (c) a nucleic acid encoding a protein comprising an amino (c) a DNA molecule encoding RNA inhibiting the expres acid sequence set forth in any one of SEQ ID NOs: sion of the DNA of claim 2 via an RNAi effect at the 28470 through 56791 wherein one or more amino acids time of expression of said DNA in plant cells; and are substituted, deleted, inserted and/or added; and (d) a DNA molecule encoding RNA inhibiting the expres (d) a nucleic acid hybridizing to a nucleic acid comprising sion of the DNA of claim 2 by the co-suppression effect a nucleotide sequence set forth in any one of SEQ ID at the time of expression of said DNA in plant cells. NOs: 1 through 28469 under stringent conditions. 5. A vector containing the nucleic acid of claim 1. 2. The nucleic acid according to claim 1, wherein said 6. A vector containing the nucleic acid of claim 2. nucleic acid is derived from rice. 7. A vector containing the nucleic acid of claim 3. 3. An isolated DNA molecule selected from the group 8. A vector containing the nucleic acid of claim 4. consisting of: 9. A transformed plant cell maintaining the nucleic acid of (a) a DNA molecule encoding an antisense RNA comple claim 1. mentary to a transcript of the DNA molecule of claim 10. A transformed plant cell maintaining the nucleic acid 1; of claim 3. 11. A transformed plant cell maintaining the nucleic acid (b) a DNA molecule encoding RNA having ribozyme of claim 4. activity to specifically cleave a transcript of the DNA of 12. A transformed plant cell maintaining the vector of claim 1: claim 5. (c) a DNA molecule encoding RNA inhibiting the expres 13. A transformed plant body containing the transformed sion of the DNA of claim 1 via an RNAi effect at the plant cell of claim 9. time of expression of said DNA in plant cells; and 14. A progeny or clone of the transformed plant body of (d) a DNA molecule encoding RNA inhibiting the expres claim 13. sion of the DNA of claim 1 by the co-suppression effect 15. A propagation material of the transformed plant body at the time of expression of said DNA in plant cells. of claim 13. US 2006/O123505 A1 Jun. 8, 2006 30

16. A propagation material of the transformed plant body (2) culturing said transformant; and of claim 14. 17. A method of producing a transformed plant body, (3) recovering the protein from the culture of the step (2). wherein said method comprises the step of transducing the 23. An antibody binding to the protein of claim 21. nucleic acid of claim 1 into plant cells to regenerate a plant 25. A rice gene database comprising sequence information body from said plant cells. selected from the group consisting of: 18. A method of producing a transformed plant body, wherein said method comprises the step of transducing the (a) one or more amino acid sequences selected from SEQ nucleic acid of claim 3 into plant cells to regenerate a plant ID NOs: 28470 through 56791; body from said plant cells. (b) one or more nucleotide sequences selected from SEQ 19. A method of producing a transformed plant body, ID NOs: 1 through 28469; and wherein said method comprises the step of transducing the nucleic acid of claim 4 into plant cells to regenerate a plant (c) both (a) and (b). body from said plant cells. 26. A method of determining the transcriptional regula 20. A method of producing a transformed plant body, tory region comprising the steps of wherein said method comprises the step of transducing the vector of claim 5 into plant cells to regenerate a plant body (1) mapping the nucleotide sequence of any one of SEQ from said plant cells. ID NOs: 1 through 28,469 to the rice genome nucle 21. A protein encoded by the nucleic acid of claim 1. otide sequence, and 22. A method of producing a protein encoded by the (2) determining the transcriptional regulatory region of nucleic acid of claim 1 comprising the following steps: the gene mapped in the step (1) which contains the (1) transducing the nucleic acid of claim 1 or a vector transcriptional regulatory region found on the 5'-side of containing said nucleic acid into cells capable of the 5' most end of the mapped region. expressing said nucleic acid so as to obtain a transfor mant, k k k k k