US 20030157592A1
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2003/0157592 A1
Lerchl et al.
(43) Pub. Date: Aug. 21, 2003

(54) GENES FROM PHYSCOMITRELLA PATENS ENCODING PROTEINS INVOLVED IN THE SYNTHESIS OF TOCOPHEROLS AND CAROTENOIDS

(76) Inventors: Jens Lerchl, Ladenburg (DE); Andreas Renz, Limburgerhof (DE); Thomas Ehrhardt, (DE); Andreas Reindl, (DE); Petra Cirpus, Mannheim (DE); Friedrich Bischoff, Mannheim (DE); Markus Frank, (DE); Annette Freund, Limburgerhof (DE); Elke Duwenig, Ludwigshafen (DE); Ralf-Michael Schmidt, Kirrweiler (DE); Ralf Reski, Oberried (DE); Ralf Badur, Goslar (DE)

Correspondence Address: KEIL & WEINKAUF, 1350 CONNECTICUT AVENUE, N.W., WASHINGTON, DC 20036 (US)

(21) Appl. No.: 10/149,759

(22) PCT Filed: Dec. 14, 2000

(86) PCT No.: PCT/EP00/12698

(30) Foreign Application Priority Data
Dec. 16, 1999 (US) ...... 60/171121

Publication Classification

(51) Int. Cl. ...... C12P 23/00; C12N 9/10; A01H 1/00; C12N 15/82; C12P 21/02; C12N 5/04

(52) U.S. Cl. ...... 435/67; 435/69.1; 435/193; 435/320.1; 435/419; 800/278; 800/282

(57) ABSTRACT

Isolated nucleic acid molecules, designated TCMRP nucleic acid molecules, which encode novel TCMRPs from e.g. Physcomitrella patens are described. The invention also provides antisense nucleic acid molecules, recombinant expression vectors containing TCMRP nucleic acid molecules, and host cells into which the expression vectors have been introduced. The invention still further provides isolated TCMRPs, mutated TCMRPs, fusion proteins, antigenic peptides and methods for the improvement of production of a desired compound from transformed cells, organisms or plants based on genetic engineering of TCMRP genes in these organisms.

Patent Application Publication Aug. 21, 2003 Sheet 1 of 4 US 2003/0157592 A1

Figure 1:

[Vector map of pQE30-092-260cds. Recoverable labels: insert 092-260cds; restriction sites XhoI, EcoRI, BamHI, KpnI, SmaI, XmaI, SalI, PstI, XbaI, MscI.]

Figure 2 (Sheet 2 of 4):

[Vector map of pBinLePTkTp9-092-260cds. Recoverable labels: LeB4 promoter, TKTP, insert 092-260cds, OCS; restriction sites EcoRI, KpnI, BamHI, SmaI, XmaI, SalI.]

Figure 3 (Sheet 3 of 4):

[Vector map of pQE30-087-259Cterm. Recoverable labels: insert 78pProt187e12Cterm; restriction sites XhoI, EcoRI, BamHI, SphI, SacI, KpnI, SmaI, XmaI, SalI, SapI, NdeI, XbaI.]

Figure 4 (Sheet 4 of 4):

[Vector map of pBinLePTkTp-087-259Cterm. Recoverable labels: LeB4 promoter; restriction sites EcoRI, BamHI, SmaI, XmaI, SalI, NheI.]

US 2003/0157592 A1 Aug. 21, 2003

MOSS GENES FROM PHYSCOMITRELLA PATENS ENCODING PROTEINS INVOLVED IN THE SYNTHESIS OF TOCOPHEROLS AND CAROTENOIDS

BACKGROUND OF THE INVENTION

[0001] Certain products and by-products of naturally-occurring metabolic processes in cells have utility in a wide array of industries, including the food, feed, cosmetics, and pharmaceutical industries. These molecules, collectively termed fine chemicals, include organic acids, both proteinogenic and non-proteinogenic amino acids, nucleotides and nucleosides, lipids and fatty acids, carotenoids, diols, carbohydrates, aromatic compounds, vitamins and cofactors, and enzymes.

[0002] Their production is most conveniently performed through the large-scale culture of bacteria developed to produce and secrete large quantities of one or more desired molecules. One particularly useful organism for this purpose is Corynebacterium glutamicum, a gram positive, nonpathogenic bacterium.

[0003] Through strain selection, a number of mutant strains of the respective microorganisms have been developed which produce an array of desirable compounds. However, selection of strains improved for the production of a particular molecule is a time-consuming and difficult process.

[0004] Alternatively, the production of fine chemicals can be most conveniently performed via the large-scale production of plants developed to produce one of the aforementioned fine chemicals. Of particular interest for this purpose are all crop plants for food and feed uses. Increased or modulated compositions of fine chemicals like amino acids, vitamins and nucleotides in these plants would lead to optimized nutritional qualities.

[0005] Through conventional breeding, a number of mutant plants have been developed which produce increased amounts of, for example, carotenoids and amino acids. However, selection of new plant cultivars improved for the production of a particular molecule is a time-consuming and difficult process.

SUMMARY OF THE INVENTION

[0006] This invention provides novel nucleic acid molecules which may be used to modify tocopherols and carotenoids in plants, algae and microorganisms.

[0007] The eight naturally occurring compounds with vitamin E activity are derivatives of 6-chromanol (Ullmann's Encyclopedia of Industrial Chemistry, Vol. A 27 (1996), VCH Verlagsgesellschaft, Chapter 4, 478-488, Vitamin E). The group of the tocopherols (1a-d) has a saturated side chain, while the group of the tocotrienols (2a-d) has an unsaturated side chain:

[Structural formula (1): 6-chromanol ring carrying HO and the ring substituents R¹, R² and R³, with a saturated side chain]

[0008] 1a, α-tocopherol: R¹ = R² = R³ = CH₃

[0009] 1b, β-tocopherol: R¹ = R³ = CH₃, R² = H

[0010] 1c, γ-tocopherol: R¹ = H, R² = R³ = CH₃

[0011] 1d, δ-tocopherol: R¹ = R² = H, R³ = CH₃

[Structural formula (2): 6-chromanol ring carrying HO and the ring substituents R¹, R² and R³, with an unsaturated side chain]

[0012] 2a, α-tocotrienol: R¹ = R² = R³ = CH₃

[0013] 2b, β-tocotrienol: R¹ = R³ = CH₃, R² = H

[0014] 2c, γ-tocotrienol: R¹ = H, R² = R³ = CH₃

[0015] 2d, δ-tocotrienol: R¹ = R² = H, R³ = CH₃

[0016] In the present invention, tocopherols are to be understood as meaning all the abovementioned tocopherols and tocotrienols and derivatives thereof with vitamin E activity.

[0017] These compounds with vitamin E activity (vitamin E compounds) are important natural lipid-soluble substances which, among other activities, function especially as antioxidants. A lack of vitamin E in humans and animals leads to pathophysiological situations. Vitamin E compounds therefore have an important economic value as additives in the food and feed sectors, in pharmaceutical formulations and in cosmetic applications.

[0018] An economical method for the production of vitamin E compounds, and foodstuffs and animal feeds with an elevated vitamin E content, are therefore of great importance.

[0019] WO 00/10380 describes the gene sequence encoding the 2-methyl-6-phytylplastoquinol-methyltransferase from the prokaryotic organism Synechocystis spec. PCC6803. WO 97/27285 describes the mapping of the gene locus of the p-hydroxyphenylpyruvate dioxygenase encoding gene of [...]. Speculations are made about the effects of overexpression or downregulation of the plant [...] on the vitamin E content or herbicide resistance in transgenic plants. WO 99/04622 and D. DellaPenna et al., Science 1998, 282, 2098-2100 describe gene sequences encoding a γ-tocopherol methyltransferase from Synechocystis PCC6803 and Arabidopsis thaliana and their incorporation into plants. However, the transgenic plants show only a shift in the spectrum of tocopherols, i.e. a shift from gamma-tocopherol to alpha-tocopherol because of the higher expression of γ-tocopherol methyltransferase. No data are shown concerning a higher yield of tocopherols, i.e. a quantitative improvement in tocopherol content.

[0020] To date no economical methods are available for an effective production of tocopherols and/or carotenoids in transgenic organisms, i.e. for effectively increasing the metabolite flow in the direction of increased tocopherol and/or carotenoid content in transgenic organisms, for example in transgenic plants, by overexpressing one or several biosynthesis genes, alone or in any combination, related to the tocopherol and/or carotenoid metabolism.

[0021] Methods which are particularly economical are biotechnological methods which exploit proteins and biosynthesis genes from tocopherol or carotenoid biosynthesis from organisms producing these compounds.

[0022] Microorganisms like Corynebacterium and fungi and algae like Phaeodactylum are commonly used in industry for the large-scale production of a variety of fine chemicals.

[0023] Given the availability of cloning vectors for use in Corynebacterium glutamicum, such as those disclosed in Sinskey et al., U.S. Pat. No. 4,649,119, and techniques for genetic manipulation of C. glutamicum and the related Brevibacterium species (e.g., lactofermentum) (Yoshihama et al., J. Bacteriol. 162: 591-597 (1985); Katsumata et al., J. Bacteriol. 159: 306-311 (1984); and Santamaria et al., J. Gen. Microbiol. 130: 2237-2246 (1984)), the nucleic acid molecules of the invention may be utilized in the genetic engineering of this organism to make it a better or more efficient producer of one or more fine chemicals. This improved production or efficiency of production of a fine chemical may be due to a direct effect of manipulation of a gene of the invention, or it may be due to an indirect effect of such manipulation.

[0024] Given the availability of cloning vectors and techniques for genetic manipulation of ciliates such as disclosed in WO9801572 or algae and related organisms such as Phaeodactylum tricornutum (described in Falciatore et al., 1999, Marine Biotechnology 1 (3): 239-251 as well as Dunahay et al. 1995, Genetic transformation of diatoms, J. Phycol. 31: 1004-1012 and references therein), the nucleic acid molecules of the invention may be utilized in the genetic engineering of these organisms to make them better or more efficient producers of one or more fine chemicals. This improved production or efficiency of production of a fine chemical may be due to a direct effect of manipulation of a gene of the invention, or it may be due to an indirect effect of such manipulation.

[0025] The moss Physcomitrella patens represents one member of the mosses. It is related to other mosses such as Ceratodon purpureus, which is capable of growing in the absence of light. Further, Physcomitrella patens represents the only plant organism which can be utilized for targeted disruption of genes by homologous recombination. Mutants generated by this technique are useful to characterize the function of genes described in the invention. Mosses like Ceratodon and Physcomitrella share a high degree of homology on the DNA sequence and polypeptide level, allowing the use of heterologous screening of DNA molecules with probes evolving from other mosses or organisms, thus enabling the derivation of a consensus sequence suitable for heterologous screening or functional annotation and prediction of gene functions in third species. The ability to identify such functions can therefore have significant relevance, e.g., prediction of substrate specificity of enzymes. Further, these nucleic acid molecules may serve as reference points for the mapping of moss genomes, or of genomes of related organisms.

[0026] This invention provides novel nucleic acid molecules which encode proteins, referred to herein as Tocopherol and Carotenoid Metabolism Related Proteins (TCMRP). These TCMRPs are capable of, for example, performing an enzymatic step involved in the metabolism of certain fine chemicals, including tocopherols and/or carotenoids.

[0027] Given the availability of cloning vectors for use in plants and plant transformation, such as those published in and cited therein: Plant Molecular Biology and Biotechnology (CRC Press, Boca Raton, Fla.), chapter 6/7, pp. 71-119 (1993); F. F. White, Vectors for Gene Transfer in Higher Plants, in: Transgenic Plants, Vol. 1, Engineering and Utilization, eds.: Kung and R. Wu, Academic Press, 1993, 15-38; B. Jenes et al., Techniques for Gene Transfer, in: Transgenic Plants, Vol. 1, Engineering and Utilization, eds.: Kung and R. Wu, Academic Press (1993), 128-143; Potrykus, Annu. Rev. Plant Physiol. Plant Molec. Biol. 42 (1991), 205-225, the nucleic acid molecules of the invention may be utilized in the genetic engineering of a wide variety of plants to make them better or more efficient producers of one or more fine chemicals. This improved production or efficiency of production of a fine chemical may be due to a direct effect of manipulation of a gene of the invention, or it may be due to an indirect effect of such manipulation.

[0028] There are a number of mechanisms by which the alteration of a TCMRP of the invention may directly affect the yield, production, and/or efficiency of production of a fine chemical in a plant due to such an altered protein.

[0029] The nucleic acid and protein molecules of the invention may directly improve the production or efficiency of production of one or more desired fine chemicals from microorganisms and plants. Using recombinant genetic techniques well known in the art, one or more of the biosynthetic or degradative enzymes of the invention for tocopherols and/or carotenoids may be manipulated such that its function is modulated. For example, a biosynthetic enzyme may be improved in efficiency, or its allosteric control region destroyed such that feedback inhibition of production of the compound is prevented. Similarly, a degradative enzyme may be deleted or modified by substitution, deletion, or addition such that its degradative activity is lessened for the desired compound without impairing the viability of the cell.

[0030] Further, one gene or one enzyme of the invention for tocopherols and/or carotenoids, or preferably a combination of several genes or enzymes of the invention, can be transformed into host cells (e.g. starting organism or already genetically modified host system), whereby the gene(s) or enzyme(s) can be modified either in their activity or number in the corresponding host cell (e.g. plant). Besides, the host cell itself might already be genetically manipulated (e.g. in a key position of the pathway) in such a way that the flux of metabolites can be directed to higher yields of tocopherols and/or carotenoids when the cell is then transformed with one or more genes (encoding the corresponding enzymes) of the invention for tocopherols and/or carotenoids. In each case, the overall yield or rate of production of the desired fine chemical may be increased. In one preferred embodiment of the instant invention the genes encoding the TCMR proteins γ-tocopherol-methyltransferase (gamma-TMT type I), 2-methyl-6-phytylplastoquinol methyltransferase (gamma-TMT type II) and/or 4-hydroxyphenylpyruvate dioxygenase, alone or in any combination, have a substantial effect on the production of the desired fine chemical, preferably vitamin E compounds, or on the production of relevant precursors, e.g. tocopherol precursors such as homogentisic acid and/or phytylpyrophosphate and/or geranylgeranyl-pyrophosphate. In the instant invention, the genes encoding the enzymes mentioned above, i.e. γ-tocopherol-methyltransferase (gamma-TMT type I), 2-methyl-6-phytylplastoquinol methyltransferase (gamma-TMT type II) and/or 4-hydroxyphenylpyruvate dioxygenase, can be isolated from the moss Physcomitrella patens and transferred into suitable host cells, but the invention is not limited to this organism as a source for the nucleic acid isolation. Thus, the mentioned genes and/or enzymes can also be isolated from any other organism, e.g. prokaryotes or eukaryotes, which comprises an endogenous sequence mentioned above. Preferred examples of such organisms, especially with a view to the enzyme 4-hydroxyphenylpyruvate dioxygenase, are Streptomyces avermitilis (database accession number of the corresponding gene is AL 096852), Rattus norvegicus (database accession number AF 082834), Synechocystis spec. PCC6803 or Arabidopsis thaliana (DellaPenna, D. et al., 1998, Science, 282, 2098-2100).

[0031] It is also possible that alterations in the protein and nucleotide molecules of the invention may improve the production of other fine chemicals besides the tocopherols and/or carotenoids through indirect mechanisms. Metabolism of any one compound is necessarily intertwined with other biosynthetic and degradative pathways within the cell, and necessary cofactors, intermediates, or substrates in one pathway are likely supplied or limited by another such pathway. Therefore, by modulating the activity of one or more of the proteins of the invention, the production or efficiency of activity of another fine chemical biosynthetic or degradative pathway may be impacted. For example, amino acids serve as the structural units of all proteins, yet may be present intracellularly in levels which are limiting for protein synthesis; therefore, by increasing the efficiency of production or the yields of one or more amino acids within the cell, proteins, such as biosynthetic or degradative proteins, may be more readily synthesized. Likewise, an alteration in a metabolic pathway enzyme such that a particular side reaction becomes more or less favored may result in the over- or under-production of one or more compounds which are utilized as intermediates or substrates for the production of a desired fine chemical.

[0032] Those TCMRPs involved in the transport of fine chemical molecules from the cell may be increased in number or activity such that greater quantities of these compounds are allocated to different plant cell compartments or the cell exterior space from which they are more readily recovered and partitioned into the biosynthetic flux or deposited. Similarly, those TCMRPs involved in the import of nutrients necessary for the biosynthesis of one or more fine chemicals (e.g. tocopherols and/or carotenoids) may be increased in number or activity such that these precursors, cofactors, or intermediate compounds are increased in concentration within the cell or within the storing compartments. The invention pertains to an isolated nucleic acid molecule which encodes a TCMRP or a TCMRP polypeptide involved in assisting in transmembrane transport.

[0033] The mutagenesis of one or more TCMRPs of the invention may also result in TCMRPs having altered activities which indirectly impact the production of one or more desired fine chemicals from plants. For example, TCMRPs of the invention involved in the export of waste products may be increased in number or activity such that the normal metabolic wastes of the cell (possibly increased in quantity due to the overproduction of the desired fine chemical) are efficiently exported before they are able to damage nucleic acids and proteins within the cell (which would decrease the viability of the cell) or to interfere with fine chemical biosynthetic pathways (which would decrease the yield, production, or efficiency of production of the desired fine chemical). Further, the relatively large intracellular quantities of the desired fine chemical may in themselves be toxic to the cell or may interfere with enzyme feedback mechanisms such as allosteric regulation, so by increasing the activity or number of transporters able to export this compound from the compartment, one may increase the viability of seed cells, in turn leading to a greater number of cells in the culture producing the desired fine chemical. The TCMRPs of the invention may also be manipulated such that the relative amounts of the different tocopherols and/or carotenoids produced are altered. This can be valuable for optimizing plant nutritional composition. In plants these changes can moreover also influence other characteristics like tolerance towards abiotic and biotic stress conditions.

[0034] This invention provides novel nucleic acid molecules which encode TCMRPs, which are capable of, for example, performing an enzymatic step involved in the metabolism of molecules important for the normal functioning of cells, such as tocopherols and/or carotenoids. Nucleic acid molecules encoding a TCMRP are referred to herein as TCMRP nucleic acid molecules. In a preferred embodiment, the TCMRP performs an enzymatic step related to the metabolism of one or more tocopherols and/or carotenoids. Examples of such proteins include those encoded by the genes set forth in Appendix A and B and Table 1.

[0035] Biotic and abiotic stress tolerance is a general trait desired in a wide variety of plants like maize, wheat, rye, oat, triticale, rice, barley, sorghum, potato, tomato, soyabean, bean, pea, peanut, cotton, rapeseed, canola, alfalfa, grape, fruit plants (apple, pear, pineapple), bushy plants (coffee, cacao, tea), trees (oil palm, coconut), legumes, perennial grasses, and forage crops. These crop plants are also preferred target plants for genetic engineering as one further embodiment of the present invention. More preferred are crop plants and oil seed plants, and most preferred are rape and soyabean.

[0036] The nucleic acid constructs according to the invention can be used for the generation of genetically modified organisms, hereinbelow also termed transgenic organisms.

0037 Starting or host organisms are to be understood as 0046 Genetically modified plants according to the inven meaning prokaryotic or eukaryotic organisms. Such as, for tion with an increased tocopherol content which can be example, microorganisms, mosses or plants. Preferred consumed by humans and animals can also be used as micororganisms are bacteria, yeasts, algae or fungi. In one foodstuffs or feeds for example directly or after processing preferred embodiment of the instant invention host organ which is known per Se. isms are plants. 0047 The invention furthermore relates to a method for 0.038 Examples of preferred plants are Tagetes, sunflow the generation of genetically modified organisms by intro ers, Arabidopsis, tobacco, red pepper, Soyabeans, tomatoes, ducing a nucleic acid according to the invention or a nucleic aubergines, capsicums, carrots, potatoes, maize, Saladings acid construct according to the invention into the of and cabbages, cereals, alfalfa, oats, barley, rye, wheat, the Starting organism. Triticale, panic grasses, rice, luceme, flax, cotton, hemp, Brassicaceae Such as, for example, oilseed rape or canola, 0048. Accordingly, one aspect of the invention pertains to Sugar beet, Sugar cane, nut and grapevine Species or Woody isolated nucleic acid molecules (e.g., cDNAS) comprising a Species Such as, for example, aspen or yew. More preferably nucleotide sequence encoding an TCMRP or biologically are crop plants or oil Seed plants, most preferably are active portions thereof, as well as nucleic acid fragments Arabidopsis thaliana, Tagetes erecta, Brassica napus, Nic Suitable as primerS or hybridization probes for the detection Otiana tabacum, canola or potatoes. Especially preferred are or amplification of TCMRP-encoding nucleic acid (e.g., rape or Soyabeans. DNA or mRNA). 
In another embodiment, the isolated nucleic acid molecule is at least 15 nucleotides in length and 0.039 Genetically modified or transgenic organisms are hybridizes under Stringent conditions to a nucleic acid to be understood as meaning the corresponding transformed molecule comprising a nucleotide Sequence of Appendix A. Starting organisms. Preferably, the isolated nucleic acid molecule corresponds to 0040. The invention relates to a genetically modified a naturally-occurring nucleic acid molecule. More prefer organism where the genetic modification of the gene expres ably, the isolated nucleic acid encodes a naturally-occurring Sion of a nucleic acid according to the invention relative to Physcomitrella patens TCMRP, or a biologically active a wild type is increased in the event that the Starting portion thereof. In particularly preferred embodiments, the organism comprises a nucleic acid according to the inven isolated nucleic acid molecule comprises one of the nucle tion or caused in the event that the Starting organism does otide Sequences Set forth in Appendix A or the coding region not contain a nucleic acid according to the invention. or a complement thereof of one of these nucleotide 0041 Transgenic organisms comprising at least one Sequences. In other particularly preferred embodiments, the exogenous or at least one additional endogenous gene isolated nucleic acid molecule of the invention comprises a according to the invention which already in the form of the nucleotide Sequence which hybridizes to or is at least about Starting organisms possess the biosynthesis genes for the 50%, preferably at least about 60%, more preferably at least about 70%, 80% or 90%, and even more preferably at least production of tocopherols Such as, for example, plants or about 95%, 96%, 97%, 98%, 99% or more homologous to a other photosynthetically active organisms. 
Such as, for nucleotide Sequence Set forth in Appendix A, or a portion example, cyanobacteria, mosses or algae exhibit an thereof. In other preferred embodiments, the isolated nucleic increased tocopherol content compared with the respective acid molecule encodes one of the amino acid Sequences Set wild type or starting organism. forth in Appendix B. The preferred TCMRP of the present 0042. Accordingly, the invention furthermore relates to invention also preferably possess at least one of the TCMRP genetically modified organisms, wherein the genetically activities described herein. modified organism exhibits an increased tocopherol content relative to the wild type in the case where the Starting 0049. In another embodiment, the instant nucleic acid organism is capable of producing tocopherols, or is capable molecule is full length or nearly full length nucleic acid of producing tocopherols in the case where the Starting molecule with an homology of at least about 50%, prefer organism comprises the genes required for tocopherol bio ably at least about 60%, more preferably at least about 70%, Synthesis. 80% or 90%, and even more preferably at least about 95%, 96%, 97%, 98%, 99% or more homologous to a nucleotide 0043. The invention preferably relates to an above-de Sequence Set forth in Appendix A. Scribed genetically modified organism which exhibits an increased tocopherols content over the wild type. 0050. 
In another embodiment, the isolated nucleic acid molecule encodes a protein or portion thereof wherein the 0044) Used in a preferred embodiment as organisms and protein or portion thereof includes an amino acid Sequence for the generation of organisms with an increased toco which is Sufficiently homologous to an amino acid Sequence pherols content compared with the wild type are plants, not of Appendix B, e.g., Sufficiently homologous to an amino only as Starting organisms but also, accordingly, as geneti acid Sequence of Appendix B Such that the protein or portion cally modified organisms. thereof maintains an TCMRP activity. Preferably, the protein 004.5 The present invention therefore also relates to or portion thereof encoded by the nucleic acid molecule processes for the production of tocopherols by growing a maintains the ability to perform an enzymatic reaction in a genetically modified organism according to the invention, tocopherol and/or carotinoid metabolic pathway. In one preferably a genetically modified plant according to the embodiment, the protein encoded by the nucleic acid mol invention, which exhibits an increased tocopherol content ecule is at least about 50%, preferably at least about 60%, over the wild type, harvesting the organism and Subse and more preferably at least about 70%, 80%, or 90% and quently isolating the tocopherol compounds from the organ most preferably at least about 95%, 96%, 97%, 98%, or 99% S. or more homologous to an amino acid Sequence of Appendix US 2003/O157592 A1 Aug. 21, 2003

B (e.g., an entire amino acid sequence Selected from those embodiment, Such a host cell is a cell capable of Storing fine Sequences set forth in Appendix B). In another preferred chemical compounds in order to isolate the desired com embodiment, the protein is a full length or nearly full length pound from harvested material. The compound or the PhySCOmitrella patens protein is Substantially homologous TCMRP can then be isolated from the medium or the host to an entire amino acid sequence of Appendix B (encoded by cell, which in plants are cells containing and Storing fine an open reading frame shown in Appendix A). AS used chemical compounds, most preferably cells of Storage tis herein, a protein which has an amino acid Sequence which Sues like epidermal and Seed cells. is Substantially homologous to a Selected amino acid Sequence is least about 50% homologous to the Selected 0057 Yet another aspect of the invention pertains to a amino acid Sequence, e.g., the entire Selected amino acid genetically altered PhySCOmitrella patens plant in which an Sequence. A protein which has an amino acid Sequence TCMRP gene has been introduced or altered. In one embodi which is Substantially homologous to a Selected amino acid ment, the genome of the PhySCOmitrella patens plant has sequence can also be least about 50-60%, preferably at least been altered by introduction of a nucleic acid molecule of about 60-70%, and more preferably at least about 70-80%, the invention encoding wild-type or mutated TCMRP 80-90%, or 90-95%, and most preferably at least about 96%, Sequence as a transgene. In another embodiment, an endog 97%, 98%, 99% or more homologous to the selected amino enous TCMRP gene within the genome of the Physcomi acid Sequence. trella patens plant has been altered, e.g., functionally dis rupted, by homologous recombination with an altered 0051. In another preferred embodiment, the isolated TCMRP gene. 
nucleic acid molecule is derived from Physcomitrella patens and encodes a protein (e.g., a TCMRP fusion protein) which includes a biologically active domain which is at least about 50% or more homologous to one of the amino acid sequences of Appendix B and is able to perform an enzymatic reaction in a tocopherol and/or carotenoid metabolic pathway or has one or more of the activities set forth in Table 1, and which also includes heterologous nucleic acid sequences encoding a heterologous polypeptide or regulatory regions.

[0052] Preferably, so-called conservative exchanges are carried out in which the amino acid which is replaced has a similar property to the original amino acid, for example the exchange of Glu by Asp, Gln by Asn, Val by Ile, Leu by Ile, and Ser by Thr. Deletion is the replacement of an amino acid by a direct bond. Preferred positions for deletions are the termini of the polypeptide and the linkages between the individual protein domains.

[0053] Insertions are introductions of amino acids into the polypeptide chain, a direct bond formally being replaced by one or more amino acids.

[0054] One embodiment of the invention pertains to TCMRP polypeptides whereby one or more amino acids are substituted or exchanged by one or more amino acids.

[0055] Another aspect of the invention pertains to a TCMRP polypeptide whose amino acid sequence can be modulated with the help of art-known computer simulation programs, resulting in a polypeptide with e.g. improved activity or altered regulation (molecular modelling). On the basis of these artificially generated polypeptide sequences, a corresponding nucleic acid molecule coding for such a modulated polypeptide can be synthesized in vitro using the specific codon usage of the desired host cell, e.g. of microorganisms, mosses, algae, ciliates, fungi or plants (back-translated nucleic acid sequences). In a preferred embodiment, even these artificial nucleic acid molecules coding for improved TCMRP proteins are within the scope of this invention.

[0056] Another aspect of the invention pertains to vectors, e.g., recombinant expression vectors, containing the nucleic acid molecules of the invention, and host cells into which such vectors have been introduced, especially microorganisms, plant cells, plant tissue, organs or whole plants. In one

In a preferred embodiment, the plant organism belongs to the genus Physcomitrella or Ceratodon, with Physcomitrella being particularly preferred. In a preferred embodiment, the Physcomitrella patens plant is also utilized for the production of a desired compound, such as tocopherols and/or carotenoids. Hence in another preferred embodiment, the moss Physcomitrella patens can be used to show the function of new, yet unidentified genes of mosses or plants using homologous recombination based on the nucleic acids described in this invention.

[0058] Still another aspect of the invention pertains to an isolated TCMRP or a portion, e.g., a biologically active portion, thereof. In a preferred embodiment, the isolated TCMRP or portion thereof can catalyze an enzymatic reaction involved in one or more pathways for the metabolism of tocopherols and/or carotenoids. In another preferred embodiment, the isolated TCMRP or portion thereof is sufficiently homologous to an amino acid sequence of Appendix B such that the protein or portion thereof maintains the ability to catalyze an enzymatic reaction involved in one or more pathways for the metabolism of tocopherols and/or carotenoids.

[0059] The invention also provides an isolated preparation of a TCMRP. In preferred embodiments, the TCMRP comprises an amino acid sequence of Appendix B. In another preferred embodiment, the invention pertains to an isolated full length protein which is substantially homologous to an entire amino acid sequence of Appendix B (encoded by an open reading frame set forth in Appendix A). In yet another embodiment, the protein is at least about 50%, preferably at least about 60%, and more preferably at least about 70%, 80%, or 90%, and most preferably at least about 95%, 96%, 97%, 98%, or 99% or more homologous to an entire amino acid sequence of Appendix B. In other embodiments, the isolated TCMRP comprises an amino acid sequence which is at least about 50% or more homologous to one of the amino acid sequences of Appendix B and is able to perform an enzymatic reaction in a tocopherol and/or carotenoid metabolic pathway in a microorganism or a plant cell or has one or more of the activities set forth in Table 1.

[0060] Alternatively, the isolated TCMRP can comprise an amino acid sequence which is encoded by a nucleotide sequence which hybridizes, e.g., hybridizes under stringent conditions, or is at least about 50%, preferably at least about 60%, more preferably at least about 70%, 80%, or 90%, and

even more preferably at least about 95%, 96%, 97%, 98%, or 99% or more homologous, to a nucleotide sequence of Appendix A. It is also preferred that the preferred forms of TCMRP also have one or more of the TCMRP activities described herein.

[0061] The TCMRP polypeptide, or a biologically active portion thereof, can be operatively linked to a non-TCMRP polypeptide to form a fusion protein. In preferred embodiments, this fusion protein has an activity which differs from that of the TCMRP alone. In other preferred embodiments, this fusion protein performs an enzymatic reaction in a tocopherol and/or carotenoid metabolic pathway. In particularly preferred embodiments, integration of this fusion protein into a host cell modulates production of a desired compound from the cell. Further, the instant invention pertains to an antibody specifically binding to a TCMRP polypeptide mentioned before or to a portion thereof.

[0062] Another aspect of the invention pertains to a test kit comprising a nucleic acid molecule encoding a TCMRP, a portion and/or a complement of this nucleic acid molecule used as probe or primer for identifying and/or cloning further nucleic acid molecules involved in the synthesis of amino acids, vitamins, cofactors, nucleotides and/or nucleosides or assisting in transmembrane transport in other cell types or organisms.

[0063] In another embodiment the test kit comprises a TCMRP-antibody for identifying and/or purifying further TCMRP molecules or fragments thereof in other cell types or organisms.

[0064] Another aspect of the invention pertains to a method for producing a fine chemical. This method involves either the culturing of a suitable microorganism or alga, or the culturing of plant cells, tissues, organs or whole plants containing a vector directing the expression of a TCMRP nucleic acid molecule of the invention, such that a fine chemical is produced. In a preferred embodiment, this method further includes the step of obtaining a cell containing such a vector, in which a cell is transformed with a vector directing the expression of a TCMRP nucleic acid. In another preferred embodiment, this method further includes the step of recovering the fine chemical from the culture. In a particularly preferred embodiment, the cell is from the genus Phaeodactylum, mosses, algae or plants.

[0065] Another aspect of the invention pertains to a method for producing a fine chemical which involves the culturing of a suitable host cell whose genomic DNA has been altered by the inclusion of a TCMRP nucleic acid molecule of the invention. Further, the invention pertains to a method for producing a fine chemical which involves the culturing of a suitable host cell whose membrane has been altered by the inclusion of a TCMRP of the invention.

[0066] Another aspect of the invention pertains to methods for modulating production of a molecule from a host cell. Such methods include contacting the cell with an agent which modulates TCMRP activity or TCMRP nucleic acid expression such that a cell-associated activity is altered relative to this same activity in the absence of the agent. In a preferred embodiment, the cell is modulated for one or more metabolic pathways for tocopherols and/or carotenoids such that the yields or rate of production of a desired fine chemical by this microorganism is improved. The agent which modulates TCMRP activity can be an agent which stimulates TCMRP activity or TCMRP nucleic acid expression. Examples of agents which stimulate TCMRP activity or TCMRP nucleic acid expression include small molecules, active TCMRPs, and nucleic acids encoding TCMRPs that have been introduced into the cell. Examples of agents which inhibit TCMRP activity or expression include small molecules and antisense TCMRP nucleic acid molecules.

[0067] Another aspect of the invention pertains to methods for modulating yields of a desired compound from a cell, involving the introduction of a wild-type or mutant TCMRP gene into a cell, either maintained on a separate plasmid or integrated into the genome of the host cell. If integrated into the genome, such integration can be random, or it can take place by recombination such that the native gene is replaced by the introduced copy, causing the production of the desired compound from the cell to be modulated, or by using a gene in trans such that the gene is functionally linked to a functional expression unit containing at least a sequence facilitating the expression of a gene and a sequence facilitating the polyadenylation of a functionally transcribed gene.

[0068] In a preferred embodiment, said yields are modified. In another preferred embodiment, said desired chemical is increased while unwanted disturbing compounds can be decreased. In a particularly preferred embodiment, said desired fine chemical is a tocopherol and/or a carotenoid.

[0069] Another aspect of the invention pertains to the fine chemicals produced by a method described before and the use of the fine chemical or a polypeptide of the invention for the production of another fine chemical.

DETAILED DESCRIPTION OF THE INVENTION

[0070] The present invention provides TCMRP nucleic acid and protein molecules which are involved in the metabolism of tocopherols and/or carotenoids in the moss Physcomitrella patens. The molecules of the invention may be utilized in the production or modulation of fine chemicals in microorganisms, algae and plants either directly (e.g., where overexpression or optimization of a vitamin biosynthesis protein has a direct impact on the yield, production, and/or efficiency of production of the vitamin from modified organisms), or may have an indirect impact which nonetheless results in an increase of yield, production, and/or efficiency of production of the desired compound or decrease of undesired compounds (e.g., where modulation of the metabolism of tocopherols and/or carotenoids results in alterations in the yield, production, and/or efficiency of production or the composition of desired compounds within the cells, which in turn may impact the production of one or more other fine chemicals).

[0071] Preferred microorganisms for the production or modulation of fine chemicals are for example Corynebacterium, Synechocystis spec., Synechococcus spec., Ashbya gossypii, Neurospora crassa, Aspergillus spec., Saccharomyces cerevisiae. Preferred algae for the production or modulation of fine chemicals are Chlorella spec., Crypthecodinium spec., Phaeodactylum spec. Preferred plants for the production or modulation of fine chemicals are for example major crop plants, for example maize, wheat, rye, oat, triticale, rice, barley, sorghum, potato, tomato, soybean, bean, pea, peanut, cotton, rapeseed, canola, alfalfa, grape, fruit plants (apple, pear, pineapple), bushy plants (coffee, cacao, tea), trees (oil palm, coconut), legumes, perennial grasses, and forage crops.

[0072] Particularly suited for the production or modulation of lipophilic fine chemicals such as tocopherols and/or carotenoids are oil seed plants containing high amounts of lipid compounds, like rapeseed, canola, linseed, soybean and sunflower.

[0073] Aspects of the invention are further explicated below.

Fine Chemicals

[0074] The term "fine chemical" is art-recognized and includes molecules produced by an organism which have applications in various industries, such as, but not limited to, the pharmaceutical, , and cosmetics industries. Such compounds include lipids, fatty acids, vitamins, cofactors and enzymes, both proteinogenic and non-proteinogenic amino acids, purine and pyrimidine bases, nucleosides, and nucleotides (as described e.g. in Kuninaka, A. (1996) Nucleotides and related compounds, p. 561-612, in Biotechnology vol. 6, Rehm et al., eds. VCH: Weinheim, and references contained therein), lipids, both saturated and polyunsaturated fatty acids (e.g., arachidonic acid), diols (e.g., propane diol, and butane diol), carbohydrates (e.g., hyaluronic acid and trehalose), aromatic compounds (e.g., aromatic amines, vanillin, and indigo), vitamins and cofactors (as described in Ullmann's Encyclopedia of Industrial Chemistry, vol. A27, "Vitamins", p. 443-613 (1996) VCH: Weinheim and references therein; and Ong, A. S., Niki, E. & Packer, L. (1995) "Nutrition, Lipids, Health, and Disease" Proceedings of the UNESCO/Confederation of Scientific and Technological Associations in Malaysia, and the Society for Free Radical Research, Asia, held Sept. 1-3, 1994 at Penang, Malaysia, AOCS Press, (1995)), enzymes, and all other chemicals described in Gutcho (1983) Chemicals by Fermentation, Noyes Data Corporation, ISBN: 0818805086 and references therein. The metabolism and uses of certain of these fine chemicals are further explicated below.

Tocopherol and Carotenoid Metabolism and Uses

[0075] Vitamins, cofactors, and nutraceuticals comprise another group of fine chemical molecules which higher animals have lost the ability to synthesize and so must ingest. These molecules are readily synthesized by other organisms, such as bacteria, fungi, algae and plants. These molecules are either bioactive substances themselves, or are precursors of biologically active substances which may serve as electron carriers or intermediates in a variety of metabolic pathways. Besides their nutritive value, these compounds also have significant industrial value as coloring agents, antioxidants, and catalysts or other processing aids. (For an overview of the structure, activity, and industrial applications of these compounds, see, for example, Ullmann's Encyclopedia of Industrial Chemistry, "Vitamins" vol. A27, p. 443-613, VCH: Weinheim, 1996.) The term "vitamin" is art-recognized, and includes nutrients which are required by an organism for normal functioning, but which that organism cannot synthesize by itself. One preferred embodiment of the instant invention pertains to vitamin E compounds (tocopherols) and their production in plants. The group of vitamins may encompass cofactors and nutraceutical compounds. The language "cofactor" includes nonproteinaceous compounds required for a normal enzymatic activity to occur. Such compounds may be organic or inorganic; the cofactor molecules of the invention are preferably organic. The term "nutraceutical" includes dietary supplements having health benefits in plants and animals, particularly humans. Examples of such molecules are vitamins, antioxidants, and also certain lipids (e.g., polyunsaturated fatty acids).

[0076] The biosynthesis of these molecules in organisms capable of producing them, such as bacteria and plants, has been largely characterized (Friedrich, W. "Handbuch der Vitamine", Urban und Schwarzenberg, 1987; Ullmann's Encyclopedia of Industrial Chemistry, "Vitamins" vol. A27, p. 443-613, VCH: Weinheim, 1996; Michal, G. (1999) Biochemical Pathways: An Atlas of Biochemistry and Molecular Biology, John Wiley & Sons; Ong, A. S., Niki, E. & Packer, L. (1995) "Nutrition, Lipids, Health, and Disease" Proceedings of the UNESCO/Confederation of Scientific and Technological Associations in Malaysia, and the Society for Free Radical Research-Asia, held Sept. 1-3, 1994 at Penang, Malaysia, AOCS Press: Champaign, Ill. X, 374 S).

[0077] The metabolism and uses of certain of these vitamins are further explicated below.

[0078] Tocopherols (Vitamin E):

[0079] The fat-soluble vitamin E has received great attention for its essential role as an antioxidant in nutritional and clinical applications (Liebler D C 1993, Critical Reviews in Toxicology 23(2):147-169), thus representing a good area for food design, feed applications and pharmaceutical applications. In addition, beneficial effects include the retardation of diabetes-related damage in old age, anticarcinogenic effects, and a protective role against erythema and skin aging. Alpha-tocopherol, as the most important antioxidant, helps to prevent the oxidation of unsaturated fatty acids by oxygen in humans by its redox potential (Erin A N, Skrypin V V, Kagan V E 1985, Biochim. Biophys. Acta 815: 209).

[0080] The demand for this vitamin has increased year after year. The supply of tocopherols has been limited to the chemically synthesized racemate of alpha-tocopherol or a mixture of alpha-, beta(gamma)- and delta-tocopherols from vegetable oils. Altogether, the group of compounds with vitamin E activity now comprises alpha-, beta-, gamma-, and delta-tocopherol as well as alpha-, beta-, gamma-, and delta-tocotrienol.

[0081] Biologically, tocopherols are indispensable components of the lipid bilayer of cell membranes. A reduction in the availability of tocopherols leads to structural and functional damage to membranes. This stabilizing effect of the tocopherols on membranes is accepted to be related to three functions: 1) reaction of tocopherols with lipid peroxide radicals, 2) quenching of reactive molecular oxygen, and 3) reduction of the molecular mobility of the membrane bilayer by the formation of tocopherol-fatty acid complexes.

[0082] In addition to the occurrence of tocopherols in plants, their presence has been determined in various microorganisms, especially in many chlorophyll-containing organisms (Taketomi H, Soda K, Katsui G 1983, Vitamins () 57: 133-138). Algae, for example Euglena gracilis, also contain tocopherols, and Euglena gracilis is described as a suitable host for the production of tocopherols since the most valuable form, alpha-tocopherol, is the major component of tocopherols (Shigeoka S, Onishi T, Nakano Y, Kitaoka S 1986, Agric. Biol. Chem. 50: 1063-1065). Also, yeasts and bacteria were found to synthesize tocopherols (Forbes M, Zilliken F, Roberts G, György P 1958, J. Am. Chem. Soc. 80: 385-389; Hughes and Tove 1982, J. Bacteriol. 151: 1397-1402; Ruggeri B A, Gray R J H, Watkins T R, Tomlins R I 1985, Appl. Env. Microbiol. 50:1404-1408).

[0083] Tocopherol is synthesized from geranylgeranylpyrophosphate, which is generated from isopentenylpyrophosphate (IPP). IPP can be produced via two independent pathways. One pathway is located in the cytoplasm, whereas the other is located in the (for descriptions and reviews see Threlfall D R, Whistance G R in Aspects of Terpenoid Chemistry and Biochemistry, Goodwin T W, Ed., Academic Press, London, 1971: 357-404; Michal G, Ed. 1999, Biochemical Pathways, Spektrum Akademischer Verlag GmbH Heidelberg, and references cited therein; McCaskill D, Croteau R 1998, Tibtech 16:349-355 and references cited therein; Rohmer M 1998, Progress in Drug Research 50: 135-154; Lichtenthaler H K 1999, Annu. Rev. Plant Physiol. Plant Mol. Biol. 50: 47-65; Lichtenthaler H K, Schwender J, Disch A, Rohmer M 1997, FEBS Letters 400: 271-274; Schultz G, Soll J 1980, Deutsche Tierärztliche Wochenschrift 87: 401-424; Arigoni D, Sagner S, Latzel C, Eisenreich W, Bacher A, Zenk M H 1997, Proc. Natl. Acad. Sci. USA 94(2): 10600-10605). For a general review of isoprene biosynthesis and products derived from that pathway see Chappell 1995, Annu. Rev. Plant Physiol. Plant Mol. Biol. 46:521-547; Sharkey T D 1996, Endeavor 20:74-78.

[0084] The cyclic structures which are required for tocopherol biosynthesis are quinones. Quinones are synthesized from products of the shikimate pathway (for review see Dewick P M 1995, Natural Products Reports 12(6): 579-607; Weaver L M, Herrmann K M 1997, Trends in Plant Science 2(9): 346-351; Schmid J, Amrhein N 1995, Phytochemistry 39(4): 737-749).

[0085] Plant genes originating from Physcomitrella patens can be used to modify tocopherol metabolism in plants as well as algae and microorganisms, enabling these host cells to increase their capacity to produce tocopherols as well as improving survival and fitness of the host cell. Thereby, one or several genes, alone or in combination, preferably of the genes encoding the γ-tocopherol-methyltransferase (gamma-TMT type I), 2-methyl-6-phytylplastoquinol methyltransferase (gamma-TMT type II) or 4-hydroxyphenylpyruvate dioxygenase, can be used to modify the tocopherol metabolism.

[0086] Carotenoids:

[0087] Carotenoids are naturally occurring pigments, synthesized as hydrocarbons (carotenes) and their oxygenated derivatives (xanthophylls), that are produced by plants and microorganisms. The application potential was broadly investigated during the last 20 years. Besides the use of carotenoids as coloring agents, it is assumed that carotenoids play an important role in the prevention of cancer (Rice-Evans et al. 1997, Free Radic. Res. 26:381-398; Gerster 1993, Int. J. Vitam. Nutr. Res. 63:93-121; Bendich 1993, Ann. New York Acad. Sci. 691:61-67), thus representing a good area for food design, feed applications and pharmaceutical applications.

[0088] The major function of carotenoids in plants and microorganisms is in protection against oxidative damage by quenching photosensitizers, interacting with singlet oxygen and scavenging peroxy radicals, thus preventing the accumulation of harmful oxygen species and subsequently maintaining membrane integrity (Havaux 1998, Trends in Plant Science Vol 3 (4): 147-151; Krinsky 1994, Pure Appl. Chem. 66:1003-1010). Thus an application is also given for the optimization of fermentation processes with respect to lesser susceptibility to oxidative damage. For a review of biotechnological potential see Sandmann et al. (1999, Tibtech 17: 233-237).

[0089] Plant genes originating from Physcomitrella patens can be used to modify carotenoid metabolism in plants as well as algae and microorganisms, enabling these host cells to increase their capacity to produce carotenoids and to produce newly designed carotenoids as well as improving survival and fitness of the host cell due to the expression of plant carotenoid biosynthetic genes.

[0090] Due to results obtained in labelling experiments it is clear that carotenes arise from the isoprenoid biosynthesis pathway via geranylgeranylpyrophosphate synthesis. For a review of products of the isoprenoid biosynthetic pathway including carotenoids see Chappell 1995, Annu. Rev. Plant Physiol. Plant Mol. Biol. 46:521-547. The biosynthesis of carotenoids in microorganisms and plants is described in the following articles and references therein: Armstrong 1997, Annu. Rev. Microbiol. 51:629-659; Sandmann 1994, Eur. J. Biochem. 223:7-24; Misawa et al. 1995, J. Bacteriol. 177 (22): 6575-6584; Hirschberg et al. 1997, Pure & Appl. Chem. 69 (10):2151-2158; Lotan & Hirschberg 1995, FEBS Letters 364:125-128; U.S. Pat. No. 5,916,791.

[0091] The large-scale production of the fine chemical compounds described above has largely relied on cell-free chemical syntheses. Production through large-scale fermentation of microorganisms has not yet proven to be useful, due to insufficient efficiency and high costs. Although not yet applicable for large-scale production, it has been shown that production of fine chemicals can be enhanced in genetically modified plants, as exemplified for phytoene in rice (Burkhardt et al. Plant Journal 11(5):1071-8, 1997) and vitamin E in Arabidopsis thaliana and other plants (Shintani and DellaPenna. Science 282(5396):2098-100, 1998; WO 99/23231). Increased amounts of such compounds in plants are especially appreciable because the plants can be directly applied for food and feed purposes.

Elements and Methods of the Invention

[0092] The present invention is based, at least in part, on the discovery of novel molecules, referred to herein as TCMRP nucleic acid and protein molecules, which play a role in or function in one or more cellular metabolic pathways in Physcomitrella patens. In one embodiment, the TCMRP molecules catalyze an enzymatic reaction involving one or more tocopherol and/or carotenoid metabolic pathways. In a preferred embodiment, the activity of the TCMRP molecules of the present invention in one or more Physcomitrella patens metabolic pathways for tocopherols and carotenoids has an impact on the production of a desired fine chemical by this organism. In a particularly preferred embodiment, the TCMRPs encoded by TCMRP nucleotides of the invention are modulated in activity, such that the microorganisms' or plants' metabolic pathways which the TCMRPs of the invention regulate are modulated in yield, production, and/or efficiency of production and/or transport of a desired fine chemical by microorganisms and plants.

[0093] The language "TCMRP" or "TCMRP polypeptide" includes proteins which play a role in, e.g., catalyze an enzymatic reaction in, one or more tocopherol and carotenoid metabolic pathways in microorganisms and plants. Examples of TCMRPs include those encoded by the TCMRP genes set forth in Table 1 and Appendix A. The terms "TCMRP gene" or "TCMRP nucleic acid sequence" include nucleic acid sequences encoding a TCMRP, which consist of a coding region or a part thereof and/or also corresponding untranslated 5' and 3' sequence regions. Examples of TCMRP genes include those set forth in Table 1. The terms "production" or "productivity" are art-recognized and include the concentration of the fermentation product (for example, the desired fine chemical) formed within a given time and a given fermentation volume (e.g., kg product per hour per liter). The term "efficiency of production" includes the time required for a particular level of production to be achieved (for example, how long it takes for the cell to attain a particular rate of output of a fine chemical). The term "yield" or "product/carbon yield" is art-recognized and includes the efficiency of the conversion of the carbon source into the product (i.e., fine chemical). This is generally written as, for example, kg product per kg carbon source. By increasing the yield or production of the compound, the quantity of recovered molecules, or of useful recovered molecules of that compound in a given amount of culture over a given amount of time is increased. The terms "biosynthesis" or a "biosynthetic pathway" are art-recognized and include the synthesis of a compound, preferably an organic compound, by a cell from intermediate compounds in what may be a multistep and highly regulated process. The terms "degradation" or a "degradation pathway" are art-recognized and include the breakdown of a compound, preferably an organic compound, by a cell to degradation products (generally speaking, smaller or less complex molecules) in what may be a multistep and highly regulated process. The language "metabolism" is art-recognized and includes the totality of the biochemical reactions that take place in an organism. The metabolism of a particular compound, then, (e.g., the metabolism of a ) comprises the overall biosynthetic, modification, and degradation pathways in the cell related to this compound.

[0094] In another embodiment, the TCMRP molecules of the invention are capable of modulating the production of a desired molecule, such as a fine chemical, in microorganisms and plants. There are a number of mechanisms by which the alteration of a TCMRP of the invention may directly affect the yield, production, and/or efficiency of production of a fine chemical from a microorganism or plant strain incorporating such an altered protein. Those TCMRPs involved in the transport of fine chemical molecules within or from the cell may be increased in number or activity such that greater quantities of these compounds are transported across membranes. Similarly, those TCMRPs involved in the import of nutrients necessary for the biosynthesis of one or more fine chemicals may be increased in number or activity such that these precursor, cofactor, or intermediate compounds are increased in concentration within a desired cell. Further, TCMRPs which lead to a regeneration of a pool of fine chemicals in a desired state may be increased in number or activity. The mutagenesis of one or more TCMRP genes of the invention may also result in TCMRPs having altered activities which indirectly impact the production of one or more desired fine chemicals from microorganisms, algae and plants. For example, a biosynthetic enzyme may be improved in efficiency, or its allosteric control region destroyed such that feedback inhibition of production of the compound is prevented. Similarly, a degradative enzyme may be deleted or modified by substitution, deletion, or addition such that its degradative activity is lessened for the desired compound without impairing the viability of the cell. In each case, the overall yield or rate of production of one of these desired fine chemicals may be increased.

[0095] It is also possible that such alterations in the protein and nucleotide molecules of the invention may improve the production of other fine chemicals besides the tocopherols and carotenoids. Metabolism of any one compound is necessarily intertwined with other biosynthetic and degradative pathways within the cell, and necessary cofactors, intermediates, or substrates in one pathway are likely supplied or limited by another such pathway. Therefore, by modulating the activity of one or more of the proteins of the invention, the production or efficiency of activity of another fine chemical biosynthetic or degradative pathway may be impacted. For example, amino acids serve as the structural units of all proteins, yet may be present intracellularly in levels which are limiting for protein synthesis; therefore, by increasing the efficiency of production or the yields of one or more amino acids within the cell, proteins, such as biosynthetic or degradative proteins, may be more readily synthesized. Likewise, an alteration in a metabolic pathway enzyme such that a particular side reaction becomes more or less favored may result in the over- or under-production of one or more compounds which are utilized as intermediates or substrates for the production of a desired fine chemical.

[0096] TCMRPs of the invention involved in the export of waste products may be increased in number or activity such that the normal metabolic wastes of the cell (possibly increased in quantity due to the overproduction of the desired fine chemical) are efficiently exported before they are able to damage nucleotides and proteins within the cell (which would decrease the viability of the cell) or to interfere with fine chemical biosynthetic pathways (which would decrease the yield, production, or efficiency of production of the desired fine chemical). Further, the relatively large intracellular quantities of the desired fine chemical may in themselves be toxic to the cell, so by increasing the activity or number of transporters able to export this compound from the cell, one may increase the viability of the cell in culture, in turn leading to a greater number of cells in the culture producing the desired fine chemical.

[0097] The TCMRPs of the invention may also be manipulated such that the relative amounts of different tocopherols and carotenoids are produced. The isolated nucleic acid sequences of the invention are contained within the genome of a Physcomitrella patens strain available through the moss collection of the . The nucleotide sequence of the isolated Physcomitrella patens TCMRP cDNAs and the predicted amino acid sequences of the respective Physcomitrella patens TCMRPs are shown in Appendices A and B, respectively.

[0098] Computational analyses were performed which classified and/or identified these nucleotide sequences as

sequences which encode proteins involved in the metabolism of amino acids, vitamins, cofactors, nutraceuticals, nucleotides or nucleosides.

[0099] The present invention also pertains to proteins which have an amino acid sequence which is substantially homologous to an amino acid sequence of Appendix B. As used herein, a protein which has an amino acid sequence which is substantially homologous to a selected amino acid sequence is at least about 50% homologous to the selected amino acid sequence, e.g., the entire selected amino acid sequence. A protein which has an amino acid sequence which is substantially homologous to a selected amino acid sequence can also be at least about 50-60%, preferably at least about 60-70%, and more preferably at least about 70-80%, 80-90%, or 90-95%, and most preferably at least about 96%, 97%, 98%, 99% or more homologous to the selected amino acid sequence.

[0100] The TCMRP or a biologically active portion or fragment thereof of the invention can catalyze an enzymatic reaction in one or more tocopherol and carotenoid metabolic pathways in plants and microorganisms, or have one or more of the activities set forth in Table 1. Various aspects of the invention are described in further detail in the following subsections:

A. Isolated Nucleic Acid Molecules

[0101] One aspect of the invention pertains to isolated nucleic acid molecules that encode TCMRP polypeptides or biologically active portions thereof, as well as nucleic acid fragments sufficient for use as hybridization probes or primers for the identification or amplification of TCMRP-encoding nucleic acid (e.g., TCMRP DNA). As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. This term also encompasses untranslated sequence located at both the 3' and 5' ends of the coding region of the gene: at least about 100 nucleotides of sequence upstream from the 5' end of the coding region and at least about 20 nucleotides of sequence downstream from the 3' end of the coding region of the gene. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. An "isolated" nucleic acid molecule is one which is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated TCMRP nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived (e.g., a Physcomitrella patens cell). Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized.

[0102] A nucleic acid molecule of the present invention, e.g., a nucleic acid molecule having a nucleotide sequence of Appendix A, or a portion thereof, can be isolated using standard molecular biology techniques and the sequence information provided herein. For example, a P. patens TCMRP cDNA can be isolated from a P. patens library using all or a portion of one of the sequences of Appendix A as a hybridization probe and standard hybridization techniques (e.g., as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989). Moreover, a nucleic acid molecule encompassing all or a portion of one of the sequences of Appendix A can be isolated by the polymerase chain reaction using oligonucleotide primers designed based upon this sequence (e.g., a nucleic acid molecule encompassing all or a portion of one of the sequences of Appendix A can be isolated by the polymerase chain reaction using oligonucleotide primers designed based upon this same sequence of Appendix A). For example, mRNA can be isolated from plant cells (e.g., by the guanidinium-thiocyanate extraction procedure of Chirgwin et al. (1979) Biochemistry 18: 5294-5299) and cDNA can be prepared using reverse transcriptase (e.g., Moloney MLV reverse transcriptase, available from Gibco/BRL, Bethesda, Md., or AMV reverse transcriptase, available from Seikagaku America, Inc., St. Petersburg, Fla.). Synthetic oligonucleotide primers for polymerase chain reaction amplification can be designed based upon one of the nucleotide sequences shown in Appendix A. A nucleic acid of the invention can be amplified using cDNA or, alternatively, genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to a TCMRP nucleotide sequence can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.

[0103] In a preferred embodiment, an isolated nucleic acid molecule of the invention comprises one of the nucleotide sequences shown in Appendix A. The sequences of Appendix A correspond to the Physcomitrella patens TCMRP cDNAs of the invention. This cDNA comprises sequences encoding TCMRPs (i.e., the "coding region", indicated in each sequence in Appendix A), as well as 5' untranslated sequences and 3' untranslated sequences. Alternatively, the nucleic acid molecule can comprise only the coding region of any of the sequences in Appendix A or can contain whole genomic fragments isolated from genomic DNA. In another embodiment, the sequences of Appendix A can have corresponding longest nucleic acid molecules, e.g. full length or nearly full length nucleic acid molecules encoding a TCMRP. The corresponding clone name is given in Table 1.

[0104] For the purposes of this application, it will be understood that each of the sequences set forth in Appendix A has an identifying entry number. Each of these sequences comprises up to three parts: a 5' upstream region, a coding region, and a downstream region. Each of these three regions is identified by the same entry number designation to eliminate confusion. The recitation "one of the sequences in Appendix A", then, refers to any of the sequences in Appendix A, which may be distinguished by their differing entry number designations. The coding region of each of these sequences is translated into a corresponding amino acid sequence, which is set forth in Appendix B. The sequences of Appendix B are identified by the same entry number designations as Appendix A, such that they can be readily correlated. For example, the amino acid sequence in Appendix B designated 41 bd10 g03rev is a translation of the coding region of the nucleotide sequence of nucleic acid molecule 41 bd10 g03rev in Appendix A, and the amino acid sequence in Appendix B designated 68 ck12 d10fwd is a translation of the coding region of the nucleotide

[0108] In one embodiment, the nucleic acid molecule of the invention encodes a protein or portion thereof which includes an amino acid sequence which is sufficiently homologous to an amino acid sequence of Appendix B such that the protein or portion thereof maintains the ability to catalyze an enzymatic reaction in a tocopherol or carotenoid metabolic pathway in microorganisms or plants.
AS used Sequence of nucleic acid molecule 68 ck12 d10fwd in herein, the language “Sufficiently homologous' refers to Appendix A. proteins or portions thereof which have amino acid Sequences which include a minimum number of identical or 0105. In another preferred embodiment, an isolated equivalent (e.g., an amino acid residue which has a similar nucleic acid molecule of the invention comprises a nucleic Side chain as an amino acid residue in one of the Sequences acid molecule which is a complement of one of the nucle of Appendix B) amino acid residues to an amino acid otide Sequences shown in Appendix A, or a portion thereof. Sequence of Appendix B Such that the protein or portion A nucleic acid molecule which is complementary to one of thereof is able to catalyze an enzymatic reaction in a the nucleotide Sequences shown in Appendix A is one which tocopherol or carotenoid metabolic pathway in microorgan is Sufficiently complementary to one of the nucleotide isms or plants. Protein members of Such metabolic path Sequences shown in Appendix A Such that it can hybridize to ways, as described herein, function to catalyze the biosyn one of the nucleotide Sequences shown in Appendix A, thesis or degradation or Stabilisation of one or more thereby forming a stable dupleX. tocopherols or carotenoids. Examples of Such activities are 0106. In still another preferred embodiment, an isolated also described herein. Thus, the function of an TCMRP” nucleic acid molecule of the invention comprises a nucle contributes either directly or indirectly to the yield, produc otide sequence which is at least about 50-60%, preferably at tion, and/or efficiency of production of one or more fine least about 60-70%, more preferably at least about 70-80%, chemicals. Examples of TCMRP activities are set forth in 80-90%, or 90-95%, and even more preferably at least about Table 1. 95%, 96%, 97%, 98%, 99% or more homologous to a nucleotide Sequence shown in Appendix A, or a portion 0109. 
In another embodiment, the protein is at least about thereof. In an additional preferred embodiment, an isolated 50-60%, preferably at least about 60-70%, and more pref nucleic acid molecule of the invention comprises a nucle erably at least about 70-80%, 80-90%, 90-95%, and most otide Sequence which hybridizes, e.g., hybridizes under preferably at least about 96%, 97%, 98%, 99% or more Stringent conditions, to one of the nucleotide Sequences homologous to an entire amino acid Sequence of Appendix shown in Appendix A, or a portion thereof. B. 0107 Moreover, the nucleic acid molecule of the inven 0110 Portions of proteins encoded by the TCMRP tion can comprise only a portion of the coding region of one nucleic acid molecules of the invention are preferably bio of the Sequences in Appendix A, for example a fragment logically active portions of one of the TCMRP. As used which can be used as a probe or primer or a fragment herein, the term “biologically active portion of an TCMRP” encoding a biologically active portion of an TCMRP. The is intended to include a portion, e.g., a domain/motif, of an nucleotide Sequences determined from the cloning of the TCMRP that participates in the metabolism of fine chemi TCMRP genes from P patens allows for the generation of cals like amino acids, Vitamins, cofactors, nutraceuticals, probes and primerS designed for use in identifying and/or nucleotides, or nucleosides in microorganisms or plants or cloning TCMRPhomologues in other cell types and organ has an activity as set forth in Table 1. To determine whether isms, as well as TCMRP homologues from other mosses or an TCMRP or a biologically active portion thereof can related Species. The probe/primer typically comprises Sub participate in the metabolism of fine chemicals like amino Stantially purified oligonucleotide. 
The oligonucleotide typi acids, Vitamins, cofactors, nutraceuticals, nucleotides, or cally comprises a region of nucleotide Sequence that hybrid nucleosides in microorganisms or plants, an assay of enzy izes under Stringent conditions to at least about 12, matic activity may be performed. Such assay methods are preferably about 25, more preferably about 40, 50 or 75 well known to those skilled in the art, as detailed in Example consecutive nucleotides of a Sense Strand of one of the 17 of the Exemplification. Sequences Set forth in Appendix A, an anti-Sense Sequence of 0111 Additional nucleic acid fragments encoding bio one of the Sequences Set forth in Appendix A, or naturally logically active portions of an TCMRP can be prepared by occurring mutants thereof. Primers based on a nucleotide isolating a portion of one of the Sequences in Appendix B, Sequence of Appendix A can be used in PCR reactions to expressing the encoded portion of the TCMRP or peptide clone TCMRP homologues. Probes based on the TCMRP (e.g., by recombinant expression in vitro) and assessing the nucleotide Sequences can be used to detect transcripts or genomic Sequences encoding the Same or homologous pro activity of the encoded portion of the TCMRP or peptide. teins. In preferred embodiments, the probe further comprises 0112 The invention further encompasses nucleic acid a label group attached thereto, e.g. the label group can be a molecules that differ from one of the nucleotide Sequences radioisotope, a fluorescent compound, an enzyme, or an shown in Appendix A (and portions thereof) due to degen enzyme cofactor. Such probes can be used as a part of a eracy of the genetic code and thus encode the same TCMRP genomic marker test kit for identifying cells which misex as that encoded by the nucleotide Sequences shown in press an TCMRP, such as by measuring a level of an Appendix A. 
In another embodiment, an isolated nucleic TCMRP-encoding nucleic acid in a Sample of cells, e.g., acid molecule of the invention has a nucleotide Sequence detecting TCMRP mRNA levels or determining whether a encoding a protein having an amino acid Sequence shown in genomic TCMRPgene has been mutated or deleted. Appendix B. In a still further embodiment, the nucleic acid US 2003/O157592 A1 Aug. 21, 2003

molecule of the invention encodes a full length Physcomitrella patens protein which is substantially homologous to an amino acid sequence of Appendix B (encoded by an open reading frame shown in Appendix A).

[0113] In addition to the Physcomitrella patens TCMRP nucleotide sequences shown in Appendix A, it will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of TCMRPs may exist within a population (e.g., the Physcomitrella patens population). Such genetic polymorphism in the TCMRP gene may exist among individuals within a population due to natural variation. As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid molecules comprising an open reading frame encoding a TCMRP, preferably a Physcomitrella patens TCMRP. Such natural variations can typically result in 1-5% variance in the nucleotide sequence of the TCMRP gene. Any and all such nucleotide variations and resulting amino acid polymorphisms in TCMRPs that are the result of natural variation and that do not alter the functional activity of TCMRPs are intended to be within the scope of the invention.

[0114] Nucleic acid molecules corresponding to natural variants and non-Physcomitrella patens homologues of the Physcomitrella patens TCMRP cDNA of the invention can be isolated based on their homology to Physcomitrella patens TCMRP nucleic acid disclosed herein using the Physcomitrella patens cDNA, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions. Accordingly, in another embodiment, an isolated nucleic acid molecule of the invention is at least 15 nucleotides in length and hybridizes under stringent conditions to the nucleic acid molecule comprising a nucleotide sequence of Appendix A. In other embodiments, the nucleic acid is at least 30, 50, 100, 250 or more nucleotides in length. As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences at least 60% homologous to each other typically remain hybridized to each other. Preferably, the conditions are such that sequences at least about 65%, more preferably at least about 70%, and even more preferably at least about 75% or more homologous to each other typically remain hybridized to each other. Such stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. A preferred, non-limiting example of stringent hybridization conditions is hybridization in 6x sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2x SSC, 0.1% SDS at 50-65° C. Preferably, an isolated nucleic acid molecule of the invention that hybridizes under stringent conditions to a sequence of Appendix A corresponds to a naturally-occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a natural protein). In one embodiment, the nucleic acid encodes a natural Physcomitrella patens TCMRP.

[0115] In addition to naturally-occurring variants of the TCMRP sequence that may exist in the population, the skilled artisan will further appreciate that changes can be introduced by mutation into a nucleotide sequence of Appendix A, thereby leading to changes in the amino acid sequence of the encoded TCMRP, without altering the functional ability of the TCMRP. For example, nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid residues can be made in a sequence of Appendix A. A "non-essential" amino acid residue is a residue that can be altered from the wild-type sequence of one of the TCMRP proteins (Appendix B) without altering the activity of said TCMRP, whereas an "essential" amino acid residue is required for TCMRP activity. Other amino acid residues, however, (e.g., those that are not conserved or only semi-conserved in the domain having TCMRP activity) may not be essential for activity and thus are likely to be amenable to alteration without altering TCMRP activity.

[0116] Accordingly, another aspect of the invention pertains to nucleic acid molecules encoding TCMRPs that contain changes in amino acid residues that are not essential for TCMRP activity. Such TCMRPs differ in amino acid sequence from a sequence contained in Appendix B yet retain at least one of the TCMRP activities described herein. In one embodiment, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises an amino acid sequence at least about 50% homologous to an amino acid sequence of Appendix B and is able to catalyze an enzymatic reaction in a tocopherol or carotenoid metabolic pathway in P. patens, or has one or more activities set forth in Table 1. Preferably, the protein encoded by the nucleic acid molecule is at least about 50-60% homologous to one of the sequences in Appendix B, more preferably at least about 60-70% homologous to one of the sequences in Appendix B, even more preferably at least about 70-80%, 80-90%, 90-95% homologous to one of the sequences in Appendix B, and most preferably at least about 96%, 97%, 98%, or 99% homologous to one of the sequences in Appendix B.

[0117] To determine the percent homology of two amino acid sequences (e.g., one of the sequences of Appendix B and a mutant form thereof) or of two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of one protein or nucleic acid for optimal alignment with the other protein or nucleic acid). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in one sequence (e.g., one of the sequences of Appendix B) is occupied by the same amino acid residue or nucleotide as the corresponding position in the other sequence (e.g., a mutant form of the sequence selected from Appendix B), then the molecules are homologous at that position (i.e., as used herein amino acid or nucleic acid "homology" is equivalent to amino acid or nucleic acid "identity"). The percent homology between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % homology = number of identical positions / total number of positions × 100).

[0118] An isolated nucleic acid molecule encoding a TCMRP homologous to a protein sequence of Appendix B can be created by introducing one or more nucleotide substitutions, additions or deletions into a nucleotide sequence of Appendix A such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced into one of the sequences of Appendix A by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis.
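The percent homology calculation described above (identical positions divided by total positions, times 100) can be illustrated with a short sketch. It assumes the two sequences have already been aligned and that gaps are written as "-"; taking the total number of positions to be the alignment length, gap columns included, is an assumption made here, since the text does not say how gap positions are counted. The function name and the example sequences are hypothetical and are not taken from Appendix A or B.

```python
def percent_homology(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both inputs must have the same length; '-' marks a gap.
    Assumption: the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A position counts as identical only if both residues match
    # and the position is not a gap.
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical 8-column alignment: 6 identical positions out of 8.
print(percent_homology("MKTAYIAK", "MKTA-IAR"))
```

A value computed this way can be compared directly against the thresholds recited in the text (e.g., at least about 50%, 60-70%, 96%), keeping in mind that a different treatment of gaps would shift the result.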

Preferably, conservative amino acid substitutions are made at one or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted non-essential amino acid residue in a TCMRP is preferably replaced with another amino acid residue from the same side chain family. Alternatively, in another embodiment, mutations can be introduced randomly along all or part of a TCMRP coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for a TCMRP activity described herein to identify mutants that retain TCMRP activity. Following mutagenesis of one of the sequences of Appendix A, the encoded protein can be expressed recombinantly and the activity of the protein can be determined using, for example, assays described herein (see Example 17 of the Exemplification).

[0119] In addition to the nucleic acid molecules encoding TCMRPs described above, another aspect of the invention pertains to isolated nucleic acid molecules which are antisense thereto. An "antisense" nucleic acid comprises a nucleotide sequence which is complementary to a "sense" nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. Accordingly, an antisense nucleic acid can hydrogen bond to a sense nucleic acid. The antisense nucleic acid can be complementary to an entire TCMRP cDNA coding strand, or to only a portion thereof. In one embodiment, an antisense nucleic acid molecule is antisense to a "coding region" of the coding strand of a nucleotide sequence encoding a TCMRP. The term "coding region" refers to the region of the nucleotide sequence comprising codons which are translated into amino acid residues. In another embodiment, the antisense nucleic acid molecule is antisense to a "noncoding region" of the coding strand of a nucleotide sequence encoding TCMRPs. The term "noncoding region" refers to 5' and 3' sequences which flank the coding region that are not translated into amino acids (i.e., also referred to as 5' and 3' untranslated regions).

[0120] Given the coding strand sequences encoding TCMRPs disclosed herein (e.g., the sequences set forth in Appendix A), antisense nucleic acids of the invention can be designed according to the rules of Watson and Crick base pairing. The antisense nucleic acid molecule can be complementary to the entire coding region of TCMRP cDNA, but more preferably is an oligonucleotide which is antisense to only a portion of the coding or noncoding region of TCMRP mRNA. For example, the antisense oligonucleotide can be complementary to the region surrounding the translation start site of TCMRP mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention can be constructed using chemical synthesis and enzymatic ligation reactions using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be used. Examples of modified nucleotides which can be used to generate the antisense nucleic acid include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

[0121] The antisense nucleic acid molecules of the invention are typically administered to a cell or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a TCMRP to thereby inhibit expression of the protein, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule which binds to DNA duplexes, through specific interactions in the major groove of the double helix. The antisense molecule can be modified such that it specifically binds to a receptor or an antigen expressed on a selected cell surface, e.g., by linking the antisense nucleic acid molecule to a peptide or an antibody which binds to a cell surface receptor or antigen. The antisense nucleic acid molecule can also be delivered to cells using the vectors described herein. To achieve sufficient intracellular concentrations of the antisense molecules, vector constructs in which the antisense nucleic acid molecule is placed under the control of a strong prokaryotic, viral, or eukaryotic (including plant) promoter are preferred.

[0122] In yet another embodiment, the antisense nucleic acid molecule of the invention is an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al. (1987) Nucleic Acids. Res. 15:6625-6641). The antisense nucleic acid molecule can also comprise a 2'-O-methylribonucleotide (Inoue et al. (1987) Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue (Inoue et al. (1987) FEBS Lett. 215:327-330).
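The design of an antisense oligonucleotide "according to the rules of Watson and Crick base pairing", as described above, amounts to taking the reverse complement of a chosen stretch of the coding strand. The following is a minimal sketch of that step only, not the applicants' procedure: the function name `antisense_oligo`, the 0-based `start` index and the example coding-strand fragment are hypothetical and are not taken from Appendix A.

```python
# Watson-Crick pairing for DNA: A-T and G-C.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_oligo(coding_strand: str, start: int, length: int) -> str:
    """Return the antisense DNA oligonucleotide, written 5'->3', for the
    window coding_strand[start:start + length] (0-based start)."""
    if start < 0 or length <= 0 or start + length > len(coding_strand):
        raise ValueError("window lies outside the coding strand")
    target = coding_strand[start:start + length]
    # Complement every base, then reverse so the result reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]


# Hypothetical fragment; the 15-nt window spans an ATG start codon.
fragment = "GGCACCATGGCTTCAGTTGCA"
print(antisense_oligo(fragment, 3, 15))  # AACTGAAGCCATGGT
```

The window length can be set to any of the sizes recited in the text (about 5 to 50 nucleotides). The modified nucleotides listed above concern chemical synthesis and are not modelled in this sketch.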

[0123] In still another embodiment, an antisense nucleic acid of the invention is a ribozyme. Ribozymes are catalytic RNA molecules with ribonuclease activity which are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region. Thus, ribozymes (e.g., hammerhead ribozymes (described in Haselhoff and Gerlach (1988) Nature 334:585-591)) can be used to catalytically cleave TCMRP mRNA transcripts to thereby inhibit translation of TCMRP mRNA. A ribozyme having specificity for a TCMRP-encoding nucleic acid can be designed based upon the nucleotide sequence of a TCMRP cDNA disclosed herein. For example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in a TCMRP-encoding mRNA. See, e.g., Cech et al. U.S. Pat. No. 4,987,071 and Cech et al. U.S. Pat. No. 5,116,742. Alternatively, TCMRP mRNA can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules. See, e.g., Bartel, D. and Szostak, J. W. (1993) Science 261:1411-1418.

[0124] Alternatively, TCMRP gene expression can be inhibited by targeting nucleotide sequences complementary to the regulatory region of a TCMRP nucleotide sequence (e.g., a TCMRP promoter and/or enhancers) to form triple helical structures that prevent transcription of a TCMRP gene in target cells. See generally, Helene, C. (1991) Anticancer Drug Des. 6(6):569-84; Helene, C. et al. (1992) Ann. N.Y. Acad. Sci. 660:27-36; and Maher, L. J. (1992) Bioassays 14(12):807-15.

B. Recombinant Expression Vectors and Host Cells

[0125] Another aspect of the invention pertains to vectors, preferably expression vectors, containing a nucleic acid encoding a TCMRP (or a portion thereof). As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid", which refers to a circular double stranded DNA loop into which additional DNA segments can be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "expression vectors". In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, "plasmid" and "vector" can be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

[0126] The recombinant expression vectors of the invention comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory sequences, selected on the basis of the host cells to be used for expression, which are operatively linked to the nucleic acid sequence to be expressed.

[0127] Suitable vectors for plants are described, inter alia, in "Methods in Plant Molecular Biology and Biotechnology" (CRC Press), chapter 6/7, pp. 71-119 (1993).

[0128] Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner which allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell), the two being fused to each other so that both sequences fulfil the function ascribed to them. The term "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel; Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) or in Gruber and Crosby, in: Methods in Plant Molecular Biology and Biotechnology, CRC Press, Boca Raton, Fla., eds. Glick and Thompson, Chapter 7, 89-108 including the references therein.

[0129] Other advantageous regulatory sequences are present in, for example, the Gram-positive promoters amy and SPO2, in the yeast or fungal promoters ADC1, MFa, AC, P-60, CYC1, GAPDH, TEF, rp28, ADH or in the plant promoters CaMV/35S (Franck et al., Cell 21 (1980) 285-294), PRP1 (Ward et al., Plant. Mol. Biol. 22 (1993)), SSU, OCS, leb4, usp, STLS1, B33, nos or in the ubiquitin or phaseolin promoters.

[0130] As regards plants as genetically modified organisms, any promoter capable of governing the expression of foreign genes in plants is suitable in principle as promoter of the expression cassette.

[0131] Preferably, it is in particular a plant promoter or a promoter derived from a plant virus which is used. Particularly preferred is the cauliflower mosaic virus CaMV 35S promoter (Franck et al., Cell 21 (1980), 285-294). As is known, this promoter comprises various recognition sequences for transcriptional effectors which, in totality, lead to permanent and constitutive expression of the gene which has been inserted (Benfey et al., EMBO J. 8 (1989), 2195-2202).

[0132] The expression cassette can also comprise a pathogen-inducible or chemically inducible promoter by means of which expression of the exogenous TCMRP genes in the plant can be governed at a particular point in time.

[0133] Examples of such promoters which can be used are, for example, the PRP1 promoter (Ward et al., Plant. Mol. Biol. 22 (1993), 361-366), a salicylic-acid-inducible promoter (WO 95/19443), a benzenesulfonamide-inducible promoter (EP-A 388186), a tetracyclin-inducible promoter (Gatz et al., (1992) Plant J. 2, 397-404), an abscisic-acid-inducible promoter (EP-A 335528) or an ethanol- or cyclohexanone-inducible promoter (WO 93/21334).

[0134] Furthermore, preferred promoters are in particular those which ensure expression in tissues or plant organs in which, for example, the biosynthesis of tocopherol or its precursors takes place or in which the products are advantageously accumulated.

[0135] Promoters which must be mentioned in particular are those for the entire plant owing to constitutive expression, such as, for example, the CaMV promoter, the Agrobacterium OCS promoter (octopine synthase), the Agrobacterium NOS promoter (nopaline synthase), the ubiquitin promoter, promoters of vacuolar ATPase subunits, or the promoter of a proline-rich protein from wheat (WO 9113991).

[0136] Furthermore, promoters which must be mentioned in particular are those which ensure leaf-specific expression. Promoters which must be mentioned are the potato cytosolic FBPase promoter (WO9705900), the Rubisco (ribulose-1,5-bisphosphate carboxylase) SSU (small subunit) promoter or the potato ST-LSI promoter (Stockhaus et al., EMBO J. 8 (1989), 2445-245).

Examples of Further Suitable Promoters Are

[0137] specific promoters for tubers, storage roots or roots such as, for example, the patatin promoter class I (B33), the potato cathepsin D inhibitor promoter, the starch synthase (GBSS1) promoter or the sporamin promoter, fruit-specific promoters such as, for example, the tomato fruit-specific promoter (EP 409625), fruit-maturation-specific promoters such as, for example, the tomato fruit-maturation-specific promoter (WO9421794), flower-specific promoters such as, for example, the phytoene synthase promoter (WO 92/16635) or the promoter of the P-rr gene (WO9822593) or specific plastid or chromoplast promoters such as, for example, the RNA polymerase promoter (WO 9706250).

[0138] Other promoters which can advantageously be used are the Glycine max phosphoribosyl pyrophosphate amidotransferase promoter (see also Genbank Accession Number U87999) or another nodia-specific promoter as described in EP 249676.

[0139] In principle, all natural promoters together with their regulatory sequences like those mentioned above can be used for the process according to the invention. In addition, synthetic promoters can also be used advantageously.

[0140] Further, a seed-specific promoter (preferably the phaseolin promoter (U.S. Pat. No. 5,504,200), the USP promoter (Baumlein, H. et al., Mol. Gen. Genet. (1991) 225 (3), 459-467), the Brassica Bce4 gene promoter (WO 9113980) or the LEB4 promoter (Fiedler and Conrad, 1995)) is advantageous.

[0141] Regulatory sequences include those which direct constitutive expression of a nucleotide sequence in many types of host cell and those which direct expression of the nucleotide sequence only in certain host cells or under certain conditions. It will be appreciated by those skilled in

[0142] The recombinant expression vectors of the invention can be designed for expression of TCMRPs in prokaryotic or eukaryotic cells. For example, TCMRP genes can be expressed in bacterial cells such as C. glutamicum, insect cells (using baculovirus expression vectors), yeast and other fungal cells (see Romanos, M.A. et al. (1992) Foreign gene expression in yeast: a review, Yeast 8: 423-488; van den Hondel, C. A. M. J. J. et al. (1991) Heterologous gene expression in filamentous fungi, in: More Gene Manipulations in Fungi, J. W. Bennet & L. L. Lasure, eds., p. 396-428: Academic Press: San Diego; and van den Hondel, C. A. M. J. J. & Punt, P. J. (1991) Gene transfer systems and vector development for filamentous fungi, in: Applied Molecular Genetics of Fungi, Peberdy, J. F. et al., eds., p. 1-28, Cambridge University Press: Cambridge), algae (Falciatore et al., 1999, Marine Biotechnology. 1 (3):239-251), ciliates of the types: Holotrichia, Peritrichia, Spirotrichia, Suctoria, Tetrahymena, Paramecium, Colpidium, Glaucoma, Platyophrya, Potomacus, Pseudocohnilembus, Euplotes, Engelmaniella, and Stylonychia, especially of the genus Stylonychia lemnae with vectors following a transformation method as described in WO9801572 and multicellular plant cells (see Schmidt, R. and Willmitzer, L. (1988), High efficiency Agrobacterium tumefaciens-mediated transformation of Arabidopsis thaliana leaf and cotyledon explants, Plant Cell Rep.: 583-586; Plant Molecular Biology and Biotechnology, CRC Press, Boca Raton, Fla., chapter 6/7, pp. 71-119 (1993); F. F. White, B. Jenes et al., Techniques for Gene Transfer, in: Transgenic Plants, Vol. 1, Engineering and Utilization, eds. Kung and R. Wu, Academic Press (1993), 128-43; Potrykus, Annu. Rev. Plant Physiol. Plant Molec. Biol. 42 (1991), 205-225) or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

[0143] Expression of proteins in prokaryotes is most often carried out with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to the amino terminus of the recombinant protein but also to the C-terminus or fused within suitable regions in the proteins. Such fusion vectors typically serve three purposes: 1) to increase expression of recombinant protein; 2) to increase the solubility of the recombinant protein; and 3) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase.
the art that the design of the expression vector can depend 0144) Typical fusion expression vectors include pGEX on Such factors as the choice of the host cell to be trans (Pharmacia Biotech Inc.; Smith, D. B. and Johnson, K. S. formed, the level of expression of protein desired, etc. The (1988) Gene 67:31-40), pMAL (New England Biolabs, expression vectors of the invention can be introduced into Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) host cells to thereby produce proteins or peptides, including which fuse glutathione S-transferase (GST), maltose E bind fusion proteins or peptides, encoded by nucleic acids as ing protein, or protein A, respectively, to the target recom described herein (e.g., TCMRPs, mutant forms of TCMRPs, binant protein. In one embodiment, the coding Sequence of fusion proteins, etc.). the TCMRP is cloned into a pCEX expression vector to US 2003/O157592 A1 Aug. 21, 2003 create a vector encoding a fusion protein comprising, from Simian Virus 40. For other suitable expression systems for the N-terminus to the C-terminus, GST-thrombin cleavage both prokaryotic and -eukaryotic cells See chapters 16 and site-X protein. The fusion protein can be purified by affmity 17 of Sambrook, J., Fritsh, E. F., and Maniatis, T. Molecular chromatography using glutathione-agarose resin. Recombi Cloning. A Laboratory Manual. 2nd, ed., Cold Spring nant TCMRP unfused to GST can be recovered by cleavage Harbor Laboratory, Cold Spring Harbor Laboratory Press, of the fusion protein with thrombin. Cold Spring Harbor, N.Y., 1989. 0145 Examples of suitable inducible non-fusion E. coli 0150. In another embodiment, the recombinant mamma expression vectors include pTrc (Amann et al., (1988) Gene lian expression vector is capable of directing expression of 69:301-315) and pET 11d (Studier et al., Gene Expression the nucleic acid preferentially in a particular cell type (e.g., Technology. 
Methods in Enzymology 185, Academic Press, tissue-specific regulatory elements are used to express the San Diego, Calif. (1990) 60-89). Target gene expression nucleic acid). Tissue-specific regulatory elements are known from the pTrc vector relies on host RNA polymerase tran in the art. Non-limiting examples of Suitable tissue-specific Scription from a hybrid trp-lac fusion promoter. Target gene promoters include the albumin promoter (liver-specific; expression from the pe.T 11d vector relies on transcription Pinkert et al. (1987) Genes Dev. 1:268-277), lymphoid from a T7 gn10-lac fusion promoter mediated by a coex specific promoters (Calame and Eaton (1988) Adv. Immunol. pressed viral RNA polymerase (T7 gn1). This viral poly 43:235-275), in particular promoters of T cell receptors merase is supplied by host strains BL21(DE3) or (Winoto and Baltimore (1989) EMBO J. 8:729-733) and HMS174(DE3) from a resident prophage harboring a T7 immunoglobulins (Banerji et al. (1983) Cell 33:729-740; gn1 gene under the transcriptional control of the lacUV 5 Queen and Baltimore (1983) Cell 33:741-748), neuron rOmoter. Specific promoters (e.g., the neurofilament promoter; Byrne 0146) One strategy to maximize recombinant protein and Ruddle (1989) PNAS 86:5473-5477), pancreas-specific expression is to express the protein in a host bacteria with an promoters (Edlund et al. (1985) Science 230:912-916), and impaired capacity to proteolytically cleave the recombinant mammary gland-specific promoters (e.g., milk whey pro protein (Gottesman, S., Gene Expression Technology. Meth moter; U.S. Pat. No. 4,873,316 and European Application Ods in Enzymology 185, Academic Press, San Diego, Calif. Publication No. 264,166). Developmentally-regulated pro (1990) 119-128). 
Another strategy is to alter the nucleic acid moters are also encompassed, for example the murine hoX Sequence of the nucleic acid to be inserted into an expression promoters (Kessel and Gruss (1990) Science 249:374-379) vector So that the individual codons for each amino acid are and the fetoprotein promoter (Campes and Tilghman (1989) those preferentially utilized in the bacterium chosen for Genes Dev. 3:537-546). expression, such as C. glutamicum (Wada et al. (1992) 0151. In another embodiment, the TCMRPs of the inven Nucleic Acids Res. 20:2111-2118). Such alteration of nucleic tion may be expressed in unicellular plant cells (such as acid Sequences of the invention can be carried out by algae) see Falciatore et al., 1999, Marine Biotechnology.1 Standard DNA Synthesis techniques. (3):239-251 and references therein and plant cells from 0147 In another embodiment, the TCMRP expression higher plants (e.g., the Spermatophytes, Such as crop plants). vector is a yeast expression vector. Examples of vectors for Examples of plant expression vectors include those detailed expression in yeast S. cerivisae include pYepSec1 (Baldari, in: Becker, D, Kemper, E., Schell, J. and Masterson, R. et al., (1987) Embo J. 6:229-234), pMFa (Kurjan and (1992) “New plant binary vectors with selectable markers Herskowitz, (1982) Cell 30:933-943), p.JRY88 (Schultz et located proximal to the left border”, Plant Mol. Biol. 20: al., (1987) Gene 54: 113-123), and pYES2 (Invitrogen Cor 1195-1197; and Bevan, M. W. (1984) “Binary Agrobacte poration, San Diego, Calif.). Vectors and methods for the rium vectors for plant transformation, Nucl. Acid. Res. 12: construction of vectors appropriate for use in other fungi, 8711-8721; Vectors for Gene Transfer in Higher Plants; in: Such as the filamentous fungi, include those detailed in: Van Transgenic Plants, Vol. 1, Engineering and Utilization, eds.: den Hondel, C. A. M. J.J. & Punt, P.J. (1991) "Gene transfer Kung und R. Wu, Academic Press, 1993, S. 
15-38. Systems and vector development for filamentous fungi, in: 0152. Further, TCMRP genes can be incorporated into a Applied Molecular Genetics of Fungi, J. F. Peberdy, et al., derivative of the transformation vector pBin-19 with 35S eds., p. 1-28, Cambridge University Press: Cambridge. promoter (Bevan, M., Nucleic Acids Research 12: 8711 0148 Alternatively, the TCMRPs of the invention can be 8721 (1984)). expressed in insect cells using baculovirus expression vec 0153. A plant expression cassette preferably contains tors. Baculovirus vectors available for expression of proteins regulatory Sequences capable to drive gene expression in in cultured insect cells (e.g., Sf 9 cells) include the pac plants cells and which are operably linked So that each series (Smith et al. (1983) Mol. Cell Biol. 3:2156-2165) and Sequence can fulfil its function Such as termination of the pVL series (Lucklow and Summers (1989) Virology transcription Such as polyadenylation Signals. Preferred 170:31-39). polyadenylation Signals are those originating from Agrobac 0149. In yet another embodiment, a nucleic acid of the terium tumefaciens t-DNA such as the gene 3 known as invention is expressed in mammalian cells using a mamma octopine synthase of the Ti-plasmid pTiACH5 (Gielen et al., lian expression vector. Examples of mammalian expression EMBO J. 3 (1984), 835 ff) or functional equivalents therof vectors include pCDM8 (Seed, B. (1987) Nature 329:840) but also all other terminators are Suitable. and pMT2PC (Kaufman et al. (1987) EMBO J. 6:187-195). 0154 As plant gene expression is very often not limited When used in mammalian cells, the expression vector's on transcriptional levels a plant expression cassette prefer control functions are often provided by viral regulatory ably contains other operably linked Sequences like transla elements. 
For example, commonly used promoters are tional enhancerS Such as the overdrive-Sequence containing derived from polyoma, Adenovirus 2, cytomegalovirus and the 5'-untranlated leader Sequence from tobacco mosaic US 2003/O157592 A1 Aug. 21, 2003 virus enhancing the protein per RNA ratio (Gallie et al 1987, Nucl. Acids Research 15:8693–8711). -continued pTP11 O155 Plant gene expression has to be operably linked to KpnI GGTACCATGGCGTCTTCTTCTTCTCTCACTCTCTCTCAAGCTATC an appropriate promoter conferring gene expression in a timely , cell or tissue Specific manner. Preferrred are pro CTCTCTCGTTCTGTCCCTCGCCATGGCTCTGCCTCTTCTTCTCAACTTTC moters driving constitutitive expression (Benfey et al., CCCTTCTTCTCTCACTTTTTCCGGCCTTAAATCCAATCCCAATATCACCA EMBO J.8 (1989) 2195-2202) like those derived from plant viruses like the 35S CAMV (Franck et al., Cell 21 (1980) CCTCCCGCCGCCGTACTCCTTCCTCCGCCGCCGCCGCCGCCGTCGTAAGG 285-294), the 19S CaMV (see also U.S. Pat. No. 5352605 TCACCGGCGATTCGTGCCTCAGCTGCAACCGAAACCATAGAGAAAACTGA and WO8402913) or plant promoters like those from Rubisco Small Subunit described in U.S. Pat. No. 4,962,028. GACTGCGGGGATCC BamHI. WO 8705629, WO 92.04449. 0159. The biosynthesis site of tocopherols is, inter alia, 0156. Other preferred sequences for use operable linkage the leaf tissue, so that leaf-specific expression of the TCMRP in plant gene expression cassettes are targeting-Sequences genes constitutes a preferred embodiment. However, this necessary to direct the gene-product in its appropriate cell does not constitute a limitation Since tocopherol biosynthe compartment (for review see Kermode, Crit. Rev. Plant Sci. Sis need not be restricted to leaf tissue but can also take place 15, 4 (1996), 285-423 and references cited therin) such as in a tissue-specific manner in all other parts of the plant, in the vacuole, the nucleus, all types of plastids like amylo particular in fatty Seeds. 
plasts, chloroplasts, chromoplasts, the extracellular space, mitochondria, the endoplasmic reticulum, oil bodies, per 0160 Accordingly, a further preferred embodiment oxisomes and other compartments of plant cells. relates to a seed-specific expression of the TCMRP genes. O157. It is also possible to use expression cassettes whose 0.161 In addition, constitutive expression of the exog DNA sequence encodes, for example, a fusion protein, part enous TCMRP genes is advantageous. On the other hand, of the fusion protein being a transit peptide which governs inducible expression may also appear desirable. the translocation of the polypeptide. Preferred are chloro 0162 Expression efficacy of the recombinantly expressed plast-specific transit peptides, which are cleaved enzymati genes can be determined for example in vitro by Shoot cally from the moiety after the TCMRP gene product has meristem propagation. Also, changes in the nature and level been translocated into the chloroplasts. Particularly pre of the expression of the genes, and their effect on tocopherol ferred is the transit peptide which is derived from the plastid biosynthesis performance, can be tested on test plants in Nicotiana tabacum transketolase or from another transit greenhouse experiments. peptide (for example the Rubisco Small Subunit transit 0163 Plant gene expression can also be facilitated via a peptide, or the ferredoxin NADP oxidoreductase and also chemically inducible promoter (for rewiew see Gatz 1997, the isopentenyl pyrophosphate isomerase-2) or its functional Annu. Rev. Plant Physiol. Plant Mol. Biol., 48:89-108). equivalent. Chemically inducible promoters are especially Suitable if 0158 Especially preferred are DNA sequences of three gene expression is wanted to occur in a time Specific manner. 
cassettes of the plastid transit peptide of the tobacco plastid Examples for Such promoters are a Salicylic acid inducible transketolase in three reading frames as Kipn/BamHI frag promoter (WO95/19443), a tetracycline inducible promoter (Gatz et al., (1992) Plant J. 2, 397-404) and an ethanol ments with an ATG codon in the Nco cleavage site: inducible promoter (WO 93/21334). 0164. Also promoters responding to biotic or abiotic pTPO9 StreSS conditions are Suitable promoterS Such as the pathogen KpnI GGTACCATGGCGTCTTCTTCTTCTCTCACTCTCTCTCAAGCTATC inducible PRP1-gene promoter (Ward et al., Plant. Mol. CTCTCTCGTTCTGTCCCTCGCCATGGCTCTGCCTCTTCTTCTCAACTTTC Biol. 22 (1993), 361-366), the heat inducible hsp80-pro moter from tomato (U.S. Pat. No. 5,187,267), cold inducible CCCTTCTTCTCTCACTTTTTCCGGCCTTAAATCCAATCCCAATATCACCA alpha-amylase promoter from potato (WO9612814) or the CCTCCCGCCGCCGTACTCCTTCCTCCGCCGCCGCCGCCGCCGTCGTAAGG wound-inducible pinII-promoter (EP3 75091). 01.65 Especially those promoters are preferred which TCACCGGCGATTCGTGCCTCAGCTGCAACCGAAACCATAGAGAAAACTGA confer gene expression in Storage tissues and organs Such as GACTGCGGGATCC BamHI cells of the endosperm and the developing embryo. pTP10 0166 Suitable promoters are the napin-gene promoter KpnI GGTACCATGGCGTCTTCTTCTTCTCTCACTCTCTCTCAAGCTATC from rapeseed (U.S. Pat. No. 5,608,152), the USP-promoter from Vicia faba (Baeumlein et al., Mol Gen Genet, 1991, CTCTCTCGTTCTGTCCCTCGCCATGGCTCTGCCTCTTCTTCTCAACTTTC 225 (3):459-67), the oleosin-promoter from Arabidopsis (WO9845.461), the phaseolin-promoter from Phaseolus vul CCCTTCTTCTCTCACTTTTTCCGGCCTTAAATCCAATCCCAATATCACCA garis (U.S. Pat. No. 
5,504,200), the Bce4-promoter from CCTCCCGCCGCCGTACTCCTTCCTCCGCCGCCGCCGCCGCCGTCGTAAGG Brassica (WO9113980) or the legumin B4 promoter (LeB4; Baeumlein et al., 1992, Plant Journal, 2 (2):233-9) as well as TCACCGGCGATTCGTGCCTCAGCTGCAACCGAAACCATAGAGAAAACTGA promoters conferring Seed specific expression in monocot GACTGCGCTGGATCC BamHI plants like maize, barley, wheat, rye, rice etc. Suitable promoters to note are the 1pt2 or 1pt1-gene promoter from barley (WO9515389 and WO9523230) or those desribed in US 2003/O157592 A1 Aug. 21, 2003

WO9916890 (promoters from the barley hordein-gene, the rice glutelin gene, the rice oryzin gene, the rice prolamin gene, the wheat gliadin gene, wheat glutelin gene, the maize zein gene, the oat glutelin gene, the sorghum kasirin-gene, the rye secalin gene).

[0167] Also especially suited are promoters that confer plastid-specific gene expression, as plastids are the compartment where part of the biosynthesis of amino acids, vitamins, cofactors, nutraceuticals, nucleotides or nucleosides takes place. Suitable promoters such as the viral RNA polymerase promoter are described in WO9516783 and WO9706250 and the clpP-promoter from Arabidopsis described in WO9946394.

[0168] The invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner which allows for expression (by transcription of the DNA molecule) of an RNA molecule which is antisense to TCMRP mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen which direct the continuous expression of the antisense RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can be chosen which direct constitutive, tissue specific or cell type specific expression of antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced. For a discussion of the regulation of gene expression using antisense genes see Weintraub, H. et al., Antisense RNA as a molecular tool for genetic analysis, Reviews - Trends in Genetics, Vol. 1(1) 1986 and Mol et al., 1990, FEBS Letters 268:427-430.

[0169] Another aspect of the invention pertains to host cells into which a recombinant expression vector of the invention has been introduced. The terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

[0170] A host cell can be any prokaryotic or eukaryotic cell. For example, a TCMRP can be expressed in bacterial cells such as E. coli, C. glutamicum, insect cells, fungal cells or mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells), algae, ciliates, plant cells or fungi. Other suitable host cells are known to those skilled in the art. Preferred are plant cells.

[0171] Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection", conjugation and transduction are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, natural competence, chemical-mediated transfer, or electroporation. Suitable methods for transforming or transfecting host cells including plant cells can be found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) and other laboratory manuals such as Methods in Molecular Biology, 1995, Vol. 44, Agrobacterium protocols, ed: Gartland and Davey, Humana Press, Totowa, N.J.

[0172] Suitable methods are transformation by polyethylene-glycol-induced DNA uptake, the biolistic method using the gene gun (the so-called particle bombardment method), electroporation, incubation of dry embryos in DNA-containing solution, microinjection and Agrobacterium-mediated gene transfer.

[0173] For stable transfection of mammalian cells, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. Preferred selectable markers include those which confer resistance to drugs, such as G418, hygromycin and methotrexate, or in plants those that confer resistance towards a herbicide such as glyphosate or glufosinate. Nucleic acid encoding a selectable marker can be introduced into a host cell on the same vector as that encoding a TCMRP or can be introduced on a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by, for example, drug selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other cells die).

[0174] To create a homologous recombinant microorganism, a vector is prepared which contains at least a portion of a TCMRP gene into which a deletion, addition or substitution has been introduced to thereby alter, e.g., functionally disrupt, the TCMRP gene. Preferably, this TCMRP gene is a Physcomitrella patens TCMRP gene, but it can be a homologue from a related plant or even from a mammalian, yeast, or insect source. In a preferred embodiment, the vector is designed such that, upon homologous recombination, the endogenous TCMRP gene is functionally disrupted (i.e., no longer encodes a functional protein; also referred to as a knock-out vector). Alternatively, the vector can be designed such that, upon homologous recombination, the endogenous TCMRP gene is mutated or otherwise altered but still encodes functional protein (e.g., the upstream regulatory region can be altered to thereby alter the expression of the endogenous TCMRP). To create a point mutation via homologous recombination, DNA-RNA hybrids can also be used, a technique known as chimeraplasty (Cole-Strauss et al. 1999, Nucleic Acids Research 27(5):1323-1330 and Kmiec, Gene therapy. 1999, American Scientist 87(3):240-247).

[0175] In the homologous recombination vector, the altered portion of the TCMRP gene is flanked at its 5' and 3' ends by additional nucleic acid of the TCMRP gene to allow for homologous recombination to occur between the exogenous TCMRP gene carried by the vector and an endogenous TCMRP gene in a microorganism or plant. The additional flanking TCMRP nucleic acid is of sufficient length for successful homologous recombination with the endogenous gene. Typically, several hundreds of basepairs up to kilobases of flanking DNA (both at the 5' and 3' ends) are included in the vector (see e.g., Thomas, K. R., and Capecchi, M. R. (1987) Cell 51: 503 for a description of homologous recombination vectors or Strepp et al., 1998, PNAS, 95 (8):4368-4373 for cDNA based recombination in Physcomitrella patens). The vector is introduced into a microorganism or plant cell (e.g., via polyethylene glycol-mediated DNA transfer) and cells in which the introduced TCMRP gene has homologously recombined with the endogenous TCMRP gene are selected, using art-known techniques.

[0176] In another embodiment, recombinant microorganisms can be produced which contain selected systems which allow for regulated expression of the introduced gene. For example, inclusion of a TCMRP gene on a vector placing it under control of the lac operon permits expression of the TCMRP gene only in the presence of IPTG. Such regulatory systems are well known in the art.

[0177] A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) a TCMRP. An alternate method can be applied in addition in plants by the direct transfer of DNA into developing flowers via electroporation or Agrobacterium-mediated gene transfer. Accordingly, the invention further provides methods for producing TCMRPs using the host cells of the invention. In one embodiment, the method comprises culturing the host cell of the invention (into which a recombinant expression vector encoding a TCMRP has been introduced, or into whose genome has been introduced a gene encoding a wild-type or altered TCMRP) in a suitable medium until TCMRP is produced. In another embodiment, the method further comprises isolating TCMRPs from the medium or the host cell.

C. Isolated TCMRPs

[0178] Another aspect of the invention pertains to isolated TCMRPs, and biologically active portions thereof. An "isolated" or "purified" protein or biologically active portion thereof is substantially free of cellular material when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. The language "substantially free of cellular material" includes preparations of TCMRP in which the protein is separated from cellular components of the cells in which it is naturally or recombinantly produced. In one embodiment, the language "substantially free of cellular material" includes preparations of TCMRP having less than about 30% (by dry weight) of non-TCMRP (also referred to herein as a "contaminating protein"), more preferably less than about 20% of non-TCMRP, still more preferably less than about 10% of non-TCMRP, and most preferably less than about 5% non-TCMRP.

[0179] When the TCMRP or biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 20%, more preferably less than about 10%, and most preferably less than about 5% of the volume of the protein preparation. The language "substantially free of chemical precursors or other chemicals" includes preparations of TCMRP in which the protein is separated from chemical precursors or other chemicals which are involved in the synthesis of the protein. In one embodiment, the language "substantially free of chemical precursors or other chemicals" includes preparations of TCMRP having less than about 30% (by dry weight) of chemical precursors or non-TCMRP chemicals, more preferably less than about 20% chemical precursors or non-TCMRP chemicals, still more preferably less than about 10% chemical precursors or non-TCMRP chemicals, and most preferably less than about 5% chemical precursors or non-TCMRP chemicals. In preferred embodiments, isolated proteins or biologically active portions thereof lack contaminating proteins from the same organism from which the TCMRP is derived. Typically, such proteins are produced by recombinant expression of, for example, a Physcomitrella patens TCMRP in plants other than Physcomitrella patens or in microorganisms such as C. glutamicum or ciliates, algae or fungi.

[0180] An isolated TCMRP or a portion thereof of the invention can participate in the metabolism of amino acids, vitamins, cofactors, nutraceuticals, nucleotides or nucleosides in Physcomitrella patens, or has one or more of the activities set forth in Table 1. In preferred embodiments, the protein or portion thereof comprises an amino acid sequence which is sufficiently homologous to an amino acid sequence of Appendix B such that the protein or portion thereof maintains the ability to participate in the metabolism of fine chemicals like amino acids, vitamins, cofactors, nutraceuticals, nucleotides, or nucleosides in Physcomitrella patens. The portion of the protein is preferably a biologically active portion as described herein. In another preferred embodiment, a TCMRP of the invention has an amino acid sequence shown in Appendix B. In yet another preferred embodiment, the TCMRP has an amino acid sequence which is encoded by a nucleotide sequence which hybridizes, e.g., hybridizes under stringent conditions, to a nucleotide sequence of Appendix A. In still another preferred embodiment, the TCMRP has an amino acid sequence which is encoded by a nucleotide sequence that is at least about 50-60%, preferably at least about 60-70%, more preferably at least about 70-80%, 80-90%, 90-95%, and even more preferably at least about 96%, 97%, 98%, 99% or more homologous to one of the amino acid sequences of Appendix B. The preferred TCMRPs of the present invention also preferably possess at least one of the TCMRP activities described herein. For example, a preferred TCMRP of the present invention includes an amino acid sequence encoded by a nucleotide sequence which hybridizes, e.g., hybridizes under stringent conditions, to a nucleotide sequence of Appendix A, and which can participate in the metabolism of tocopherols or carotenoids in Physcomitrella patens, or which has one or more of the activities set forth in Table 1.

[0181] In other embodiments, the TCMRP is substantially homologous to an amino acid sequence of Appendix B and retains the functional activity of the protein of one of the sequences of Appendix B yet differs in amino acid sequence due to natural variation or mutagenesis, as described in detail in subsection I above. Accordingly, in another embodiment, the TCMRP is a protein which comprises an amino acid sequence which is at least about 50-60%, preferably at least about 60-70%, and more preferably at least about 70-80%, 80-90%, 90-95%, and most preferably at least about 96%, 97%, 98%, 99% or more homologous to an entire amino acid sequence of Appendix B and which has at least one of the TCMRP activities described herein. In another embodiment, the invention pertains to a full Physcomitrella patens protein which is substantially homologous to an entire amino acid sequence of Appendix B.

[0182] Biologically active portions of a TCMRP include peptides comprising amino acid sequences derived from the amino acid sequence of a TCMRP, e.g., the amino acid sequence shown in Appendix B or the amino acid sequence of a protein homologous to a TCMRP, which include fewer amino acids than a full length TCMRP or the full length protein which is homologous to a TCMRP, and exhibit at least one activity of a TCMRP. Typically, biologically active portions (peptides, e.g., peptides which are, for example, 5, 10, 15, 20, 30, 35, 36, 37, 38, 39, 40, 50, 100 or more amino acids in length) comprise a domain or motif with at least one activity of a TCMRP. Moreover, other biologically active portions, in which other regions of the protein are deleted, can be prepared by recombinant techniques and evaluated for one or more of the activities described herein. Preferably, the biologically active portions of a TCMRP include one or more selected domains/motifs or portions thereof having biological activity.

[0183] TCMRPs are preferably produced by recombinant DNA techniques. For example, a nucleic acid molecule encoding the protein is cloned into an expression vector (as described above), the expression vector is introduced into a host cell (as described above) and the TCMRP is expressed in the host cell. The TCMRP can then be isolated from the cells by an appropriate purification scheme using standard protein purification techniques. Alternative to recombinant expression, a TCMRP, polypeptide, or peptide can be synthesized chemically using standard peptide synthesis techniques. Moreover, native TCMRP can be isolated from cells (e.g., endothelial cells), for example using an anti-TCMRP antibody, which can be produced by standard techniques utilizing a TCMRP or fragment thereof of this invention.

[0184] The invention also provides TCMRP chimeric or fusion proteins. As used herein, a TCMRP "chimeric protein" or "fusion protein" comprises a TCMRP polypeptide operatively linked to a non-TCMRP polypeptide. A "TCMRP polypeptide" refers to a polypeptide having an amino acid sequence corresponding to a TCMRP, whereas a "non-TCMRP polypeptide" refers to a polypeptide having an amino acid sequence corresponding to a protein which is not substantially homologous to the TCMRP, e.g., a protein which is different from the TCMRP and which is derived from the same or a different organism. Within the fusion protein, the term "operatively linked" is intended to indicate that the TCMRP polypeptide and the non-TCMRP polypeptide are fused to each other so that both sequences fulfil the proposed function attributed to the sequence used. The non-TCMRP polypeptide can be fused to the N-terminus or C-terminus of the TCMRP polypeptide. For example, in one embodiment the fusion protein is a GST-TCMRP fusion protein in which the TCMRP sequences are fused to the C-terminus of the GST sequences. Such fusion proteins can facilitate the purification of recombinant TCMRPs. In another embodiment, the fusion protein is a TCMRP containing a heterologous signal sequence at its N-terminus. In certain host cells (e.g., mammalian host cells), expression

[0185] Preferably, a TCMRP chimeric or fusion protein of the invention is produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, for example by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers which give rise to complementary overhangs between two consecutive gene fragments which can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, for example, Current Protocols in Molecular Biology, eds. Ausubel et al. John Wiley & Sons: 1992). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide). A TCMRP-encoding nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked in-frame to the TCMRP.

[0186] Homologues of the TCMRP can be generated by mutagenesis, e.g., discrete point mutation or truncation of the TCMRP. As used herein, the term "homologue" refers to a variant form of the TCMRP which acts as an agonist or antagonist of the activity of the TCMRP. An agonist of the TCMRP can retain substantially the same, or a subset, of the biological activities of the TCMRP. An antagonist of the TCMRP can inhibit one or more of the activities of the naturally occurring form of the TCMRP, by, for example, competitively binding to a downstream or upstream member of the cell membrane component metabolic cascade which includes the TCMRP, or by binding to a TCMRP which mediates transport of compounds across such membranes, thereby preventing translocation from taking place.

[0187] In an alternative embodiment, homologues of the TCMRP can be identified by screening combinatorial libraries of mutants, e.g., truncation mutants, of the TCMRP for TCMRP agonist or antagonist activity. In one embodiment, a variegated library of TCMRP variants is generated by combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene library. A variegated library of TCMRP variants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential TCMRP sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of TCMRP sequences therein. There are a variety of methods which can be used to produce libraries of potential TCMRP homologues from a degenerate oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, of all of the sequences encoding the desired set of potential TCMRP sequences. Methods for synthesizing degenerate oligonucleotides are known in the art (see, e.g., Narang, S. A. (1983) Tetrahedron 39:3; Itakura et al. (1984) Annu. Rev.
and/or secretion of an TCMRP can be increased through use Biochem. 53:323; Itakura et al. (1984) Science 198: 1056; Ike of a heterologous Signal Sequence. et al. (1983) Nucleic Acid Res. 11:477. US 2003/O157592 A1 Aug. 21, 2003
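As an illustration of what a degenerate oligonucleotide sequence encodes, the following minimal sketch enumerates every concrete sequence represented by one degenerate oligonucleotide written in the standard IUPAC ambiguity codes. It is not part of the disclosed methods; the function names and the example oligonucleotide GCNAAR (an alanine codon followed by a lysine codon) are assumptions chosen only for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(oligo):
    """Return every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(oligo):
    """Return the number of distinct sequences present in the mixture."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


# Hypothetical example: GCN (any alanine codon) followed by AAR (either lysine codon).
example = "GCNAAR"
variants = expand_degenerate(example)
print(degeneracy(example), variants)
```

The degeneracy grows multiplicatively with each ambiguous position, so such a calculation may help when judging whether a planned library remains small enough to be screened.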

[0188] In addition, libraries of fragments of the TCMRP coding sequence can be used to generate a variegated population of TCMRP fragments for screening and subsequent selection of homologues of a TCMRP. In one embodiment, a library of coding sequence fragments can be generated by treating a double stranded PCR fragment of a TCMRP coding sequence with a nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the double stranded DNA, renaturing the DNA to form double stranded DNA which can include sense/antisense pairs from different nicked products, removing single stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector. By this method, an expression library can be derived which encodes N-terminal, C-terminal and internal fragments of various sizes of the TCMRP.

[0189] Several techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a selected property. Such techniques are adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis of TCMRP homologues. The most widely used techniques, which are amenable to high through-put analysis, for screening large gene libraries typically include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates isolation of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis (REM), a new technique which enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify TCMRP homologues (Arkin and Youvan (1992) PNAS 89:7811-7815; Delagrave et al. (1993) Protein Engineering 6(3):327-331).

[0190] In another embodiment, cell based assays can be exploited to analyze a variegated TCMRP library, using methods well known in the art.

D. Uses and Methods of the Invention

[0191] The nucleic acid molecules, proteins, protein homologues, fusion proteins, primers, vectors, and host cells described herein can be used in one or more of the following methods: identification of Physcomitrella patens and related organisms; mapping of genomes of organisms related to Physcomitrella patens; identification and localization of Physcomitrella patens sequences of interest; evolutionary studies; determination of TCMRP regions required for function; modulation of a TCMRP activity; modulation of the cellular production of one or more fine chemicals, such as tocopherols or carotenoids. The TCMRP nucleic acid molecules of the invention have a variety of uses. First, they may be used to identify an organism as being Physcomitrella patens or a close relative thereof. Also, they may be used to identify the presence of Physcomitrella patens or a relative thereof in a mixed population of microorganisms. The invention provides the nucleic acid sequences of a number of Physcomitrella patens genes; by probing the extracted genomic DNA of a culture of a unique or mixed population of microorganisms under stringent conditions with a probe spanning a region of a Physcomitrella patens gene which is unique to this organism, one can ascertain whether this organism is present.

[0192] Further, the nucleic acid and protein molecules of the invention may serve as markers for specific regions of the genome. This has utility not only in the mapping of the genome, but also for functional studies of Physcomitrella patens proteins. For example, to identify the region of the genome to which a particular Physcomitrella patens DNA-binding protein binds, the Physcomitrella patens genome could be digested, and the fragments incubated with the DNA-binding protein. Those which bind the protein may be additionally probed with the nucleic acid molecules of the invention, preferably with readily detectable labels; binding of such a nucleic acid molecule to the genome fragment enables the localization of the fragment to the genome map of Physcomitrella patens, and, when performed multiple times with different enzymes, facilitates a rapid determination of the nucleic acid sequence to which the protein binds. Further, the nucleic acid molecules of the invention may be sufficiently homologous to the sequences of related species such that these nucleic acid molecules may serve as markers for the construction of a genomic map in related mosses, such as Physcomitrella patens.

[0193] The TCMRP nucleic acid molecules of the invention are also useful for evolutionary and protein structural studies. The metabolic and transport processes in which the molecules of the invention participate are utilized by a wide variety of prokaryotic and eukaryotic cells; by comparing the sequences of the nucleic acid molecules of the present invention to those encoding similar enzymes from other organisms, the evolutionary relatedness of the organisms can be assessed. Similarly, such a comparison permits an assessment of which regions of the sequence are conserved and which are not, which may aid in determining those regions of the protein which are essential for the functioning of the enzyme. This type of determination is of value for protein engineering studies and may give an indication of what the protein can tolerate in terms of mutagenesis without losing function.

[0194] Manipulation of the TCMRP nucleic acid molecules of the invention may result in the production of TCMRPs having functional differences from the wild-type TCMRPs. These proteins may be improved in efficiency or activity, may be present in greater numbers in the cell than is usual, or may be decreased in efficiency or activity.

[0195] There are a number of mechanisms by which the alteration of a TCMRP of the invention may directly affect the yield, production, and/or efficiency of production of a fine chemical, such as tocopherols and carotenoids, upon incorporating such an altered protein into microorganisms, algae or plants. Recovery of fine chemical compounds from large-scale cultures of C. glutamicum, ciliates, algae or fungi is significantly improved if the cell secretes the desired compounds, since such compounds may be readily purified from the culture medium (as opposed to extracted from the mass of cultured cells). In the case of plants expressing TCMRPs, increased transport can lead to improved partitioning within the plant tissue and organs. By either increasing the number or the activity of transporter molecules which export fine chemicals from the cell, it may be possible to increase the amount of the produced fine chemical which is present in the extracellular medium, thus permitting greater ease of harvesting and purification or, in the case of plants, more efficient partitioning. Conversely, in order to efficiently overproduce one or more fine chemicals, increased amounts of the cofactors, precursor molecules, and intermediate compounds for the appropriate biosynthetic pathways are required. Therefore, by increasing the number and/or activity of transporter proteins involved in the import of nutrients, such as carbon sources (i.e., sugars), nitrogen sources (i.e., amino acids, ammonium salts), phosphate, and sulfur, it may be possible to improve the production of a fine chemical, due to the removal of any nutrient supply limitations on the biosynthetic process.

[0196] The engineering of one or more TCMRP genes of the invention may also result in TCMRPs having altered activities which indirectly impact the production of one or more desired fine chemicals from algae, plants, ciliates or fungi or other microorganisms like C. glutamicum. For example, the normal biochemical processes of metabolism result in the production of a variety of waste products (e.g., hydrogen peroxide and other reactive oxygen species) which may actively interfere with these same metabolic processes (for example, peroxynitrite is known to nitrate tyrosine side chains, thereby inactivating some enzymes having tyrosine in the active site (Groves, J. T. (1999) Curr. Opin. Chem. Biol. 3(2): 226-235)). While these waste products are typically excreted, cells utilized for large-scale fermentative production are optimized for the overproduction of one or more fine chemicals, and thus may produce more waste products than is typical for a wild-type cell. By optimizing the activity of one or more TCMRPs of the invention which are involved in the export of waste molecules, it may be possible to improve the viability of the cell and to maintain efficient metabolic activity. Also, the presence of high intracellular levels of the desired fine chemical may actually be toxic to the cell, so by increasing the ability of the cell to secrete these compounds, one may improve the viability of the cell.

[0197] Further, the TCMRPs of the invention may be manipulated such that the relative amounts of various lipophilic fine chemicals, for example vitamin E or carotenoids, are altered. This may have a profound effect on the lipid composition of the membrane of the cell. Since each type of lipid has different physical properties, an alteration in the lipid composition of a membrane may significantly alter membrane fluidity. Changes in membrane fluidity can impact the transport of molecules across the membrane, which, as previously explicated, may modify the export of waste products or the produced fine chemical or the import of necessary nutrients. Such membrane fluidity changes may also profoundly affect the integrity of the cell; cells with relatively weaker membranes are more vulnerable to abiotic and biotic stress conditions which may damage or kill the cell. By manipulating TCMRPs involved in the production of fatty acids and lipids for membrane construction such that the resulting membrane has a membrane composition more amenable to the environmental conditions extant in the cultures utilized to produce fine chemicals, a greater proportion of the cells should survive and multiply. Greater numbers of producing cells should translate into greater yields, production, or efficiency of production of the fine chemical from the culture.

[0198] The aforementioned mutagenesis strategies for TCMRPs to result in increased yields of a fine chemical are not meant to be limiting; variations on these strategies will be readily apparent to one skilled in the art. Using such strategies, and incorporating the mechanisms disclosed herein, the nucleic acid and protein molecules of the invention may be utilized to generate algae, ciliates, plants, fungi or other microorganisms like C. glutamicum expressing mutated TCMRP nucleic acid and protein molecules such that the yield, production, and/or efficiency of production of a desired compound is improved. This desired compound may be any natural product of algae, ciliates, plants, fungi or C. glutamicum, which includes the final products of biosynthesis pathways and intermediates of naturally-occurring metabolic pathways, as well as molecules which do not naturally occur in the metabolism of said cells, but which are produced by said cells of the invention.

[0199] This invention is further illustrated by the following examples which should not be construed as limiting. The contents of all references, patent applications, patents, and published patent applications cited throughout this application are hereby incorporated by reference.

EXEMPLIFICATION

Example 1

General Processes

[0200] a) General Cloning Processes:

[0201] Cloning processes such as, for example, restriction cleavages, agarose gel electrophoresis, purification of DNA fragments, transfer of nucleic acids to nitrocellulose and nylon membranes, linkage of DNA fragments, transformation of Escherichia coli and yeast cells, growth of bacteria and sequence analysis of recombinant DNA were carried out as described in Sambrook et al. (1989) (Cold Spring Harbor Laboratory Press: ISBN 0-87969-309-6) or Kaiser, Michaelis and Mitchell (1994) "Methods in Yeast Genetics" (Cold Spring Harbor Laboratory Press: ISBN 0-87969-451-3). Algae such as Chlorella or Phaeodactylum are transformed and cultivated as described by El-Sheekh (1999), Biologia Plantarum 42: 209-216; Apt et al. (1996), Molecular and General Genetics 252 (5): 872-9.

[0202] b) Chemicals:

[0203] The chemicals used were obtained, if not mentioned otherwise in the text, in p.a. quality from the companies Fluka (Neu-Ulm), Merck (Darmstadt), Roth (Karlsruhe), Serva (Heidelberg) and Sigma (Deisenhofen). Solutions were prepared using purified, pyrogen-free water, designated as H2O in the following text, from a Milli-Q water system water purification plant (Millipore, Eschborn). Restriction endonucleases, DNA-modifying enzymes and molecular biology kits were obtained from the companies AGS (Heidelberg), Amersham (Braunschweig), Biometra (Göttingen), Boehringer (Mannheim), Genomed (Bad Oeynhausen), New England Biolabs (Schwalbach/Taunus), Novagen (Madison, Wis., USA), Perkin-Elmer (Weiterstadt), Pharmacia (Freiburg), Qiagen (Hilden) and Stratagene (Amsterdam, Netherlands). They were used, if not mentioned otherwise, according to the manufacturer's instructions.

c) Plant Material

[0204] For this study, plants of the species Physcomitrella patens (Hedw.) B.S.G. from the collection of the genetic studies section of the University of Hamburg were used.

They originate from the strain 16/14 collected by H. L. K. Whitehouse in Gransden Wood, Huntingdonshire (England), which was subcultured from a spore by Engel (1968, Am J Bot 55, 438-446). Proliferation of the plants was carried out by means of spores and by means of regeneration of the gametophytes. The protonema developed from the haploid spore as a chloroplast-rich chloronema and chloroplast-low caulonema, on which buds formed after approximately 12 days. These grew to give gametophores bearing antheridia and archegonia. After fertilization, the diploid sporophyte with a short seta and the spore capsule resulted, in which the meiospores mature.

d) Plant Growth

[0205] Culturing was carried out in a climatic chamber at an air temperature of 25°C and light intensity of 55 µmol s⁻¹ m⁻² (white light; Philips TL 65W/25 fluorescent tube) and a light/dark change of 16/8 hours. The moss was either grown in liquid culture using modified Knop medium according to Reski and Abel (1985, Planta 165, 354-358) or cultured on Knop solid medium using 1% oxoid agar (Unipath, Basingstoke, England).

[0206] The protonemata used for RNA and DNA isolation were cultured in aerated liquid cultures. The protonemata were comminuted every 9 days and transferred to fresh culture medium.

Example 2

Total DNA Isolation From Plants

[0207] The details for the isolation of total DNA relate to the working up of one gram fresh weight of plant material.

[0208] CTAB buffer: 2% (w/v) N-cetyl-N,N,N-trimethylammonium bromide (CTAB); 100 mM Tris HCl pH 8.0; 1.4 M NaCl; 20 mM EDTA.

[0209] N-Laurylsarcosine buffer: 10% (w/v) N-laurylsarcosine; 100 mM Tris HCl pH 8.0; 20 mM EDTA.

[0210] The plant material was triturated under liquid nitrogen in a mortar to give a fine powder and transferred to 2 ml Eppendorf vessels. The frozen plant material was then covered with a layer of 1 ml of decomposition buffer (1 ml CTAB buffer, 100 µl of N-laurylsarcosine buffer, 20 µl of β-mercaptoethanol and 10 µl of proteinase K solution, 10 mg/ml) and incubated at 60°C for one hour with continuous shaking. The homogenate obtained was distributed into two Eppendorf vessels (2 ml) and extracted twice by shaking with the same volume of chloroform/isoamyl alcohol (24:1). For phase separation, centrifugation was carried out at 8000 x g and RT for 15 min in each case. The DNA was then precipitated at -70°C for 30 min using ice-cold isopropanol. The precipitated DNA was sedimented at 4°C and 10,000 g for 30 min and resuspended in 180 µl of TE buffer (Sambrook et al., 1989, Cold Spring Harbor Laboratory Press: ISBN 0-87969-309-6). For further purification, the DNA was treated with NaCl (1.2 M final concentration) and precipitated again at -70°C for 30 min using twice the volume of absolute ethanol. After a washing step with 70% ethanol, the DNA was dried and subsequently taken up in 50 µl of H2O + RNAse (50 µg/ml final concentration). The DNA was dissolved overnight at 4°C and the RNAse digestion was subsequently carried out at 37°C for 1 h. Storage of the DNA took place at 4°C.

Example 3

Isolation of Total RNA and Poly-(A)+ RNA From Plants

[0211] For the investigation of transcripts, both total RNA and poly-(A)+ RNA were isolated. The total RNA was obtained from wild-type 9 d old protonemata following the GTC-method (Reski et al. 1994, Mol. Gen. Genet., 244:352-359).

[0212] Poly-(A)+ RNA was isolated using Dyna Beads(R) (Dynal, Oslo) following the instructions of the manufacturer's protocol.

[0213] After determination of the concentration of the RNA or of the poly-(A)+ RNA, the RNA was precipitated by addition of 1/10 volume of 3 M sodium acetate pH 4.6 and 2 volumes of ethanol and stored at -70°C.

Example 4

cDNA Library Construction

[0214] For cDNA library construction, first strand synthesis was achieved using Murine Leukemia Virus reverse transcriptase (Roche, Mannheim, Germany) and oligo-d(T) primers, second strand synthesis by incubation with DNA polymerase I, Klenow enzyme and RNAse H digestion at 12°C (2 h), 16°C (1 h) and 22°C (1 h). The reaction was stopped by incubation at 65°C (10 min) and subsequently transferred to ice. Double stranded DNA molecules were blunted by T4-DNA-polymerase (Roche, Mannheim) at 37°C (30 min). Nucleotides were removed by phenol/chloroform extraction and Sephadex-G50 spin columns. EcoRI adapters (Pharmacia, Freiburg, Germany) were ligated to the cDNA ends by T4-DNA-ligase (Roche, 12°C, overnight) and phosphorylated by incubation with polynucleotide kinase (Roche, 37°C, 30 min). This mixture was subjected to separation on a low melting agarose gel. DNA molecules larger than 300 base pairs were eluted from the gel, phenol extracted, concentrated on Elutip-D-columns (Schleicher and Schuell, Dassel, Germany) and were ligated to vector arms and packed into lambda ZAPII-phages or lambda ZAP-Express phages using the Gigapack Gold Kit (Stratagene, Amsterdam, Netherlands) using material and following the instructions of the manufacturer.

Example 5

Identification of Genes of Interest

[0215] Gene sequences can be used to identify homologous or heterologous genes from cDNA or genomic libraries.

[0216] Homologous genes (e.g. full length cDNA clones) can be isolated via nucleic acid hybridization using for example cDNA libraries: depending on the abundance of the gene of interest, 100 000 up to 1 000 000 recombinant bacteriophages are plated and transferred to a nylon membrane. After denaturation with alkali, DNA is immobilized on the membrane by e.g. UV cross linking. Hybridization is carried out at high stringency conditions. In aqueous solution, hybridization and washing is performed at an ionic strength of 1 M NaCl and a temperature of 68°C. Hybridization probes are generated by e.g. radioactive (32P) nick translation labeling (Amersham Ready Prime). Signals are detected by exposure to X-ray films.

[0217] Partially homologous or heterologous genes that are related but not identical can be identified analogously to the above described procedure using low stringency hybridization and washing conditions. For aqueous hybridization the ionic strength is normally kept at 1 M NaCl while the temperature is progressively lowered from 68 to 42°C.

[0218] Isolation of gene sequences with homologies only in a distinct domain (for example 20 amino acids) can be carried out by using synthetic radiolabeled oligonucleotide probes. Radiolabeled oligonucleotides are prepared by phosphorylation of the 5' end of two complementary oligonucleotides with T4 polynucleotide kinase. The complementary oligonucleotides are annealed and ligated to form concatemers. The double stranded concatemers are then radiolabeled by, for example, nick translation. Hybridization is normally performed at low stringency conditions using high oligonucleotide concentrations.

[0219] Oligonucleotide hybridization solution:

[0220] 6 x SSC

[0221] 0.01 M sodium phosphate

[0222] 1 mM EDTA (pH 8)

[0223] 0.5% SDS

[0224] 100 µg/ml denatured salmon sperm DNA

[0225] 0.1% nonfat dried milk

[0226] During hybridization the temperature is lowered stepwise to 5-10°C below the estimated oligonucleotide Tm.

[0227] Further details are described by Sambrook, J. et al. (1989), "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press or Ausubel, F. M. et al. (1994) "Current Protocols in Molecular Biology", John Wiley & Sons.

Example 6

Identification of Genes of Interest by Screening Expression Libraries With Antibodies

[0228] cDNA sequences can be used to produce recombinant protein, for example in E. coli (e.g. Qiagen QIAexpress pQE system). Recombinant proteins are then normally affinity purified via Ni-NTA affinity chromatography (Qiagen). Recombinant proteins are then used to produce specific antibodies, for example by using standard techniques for rabbit immunization. Antibodies are affinity purified using a Ni-NTA column saturated with the recombinant antigen as described by Gu et al. (1994) BioTechniques 17: 257-262. The antibody can then be used to screen expression cDNA libraries to identify homologous or heterologous genes via an immunological screening (Sambrook, J. et al. (1989), "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press or Ausubel, F. M. et al. (1994) "Current Protocols in Molecular Biology", John Wiley & Sons).

Example 7

Northern-Hybridization

[0229] For RNA hybridization, 20 µg of total RNA or 1 µg of poly-(A)+ RNA were separated by gel electrophoresis in 1.25% strength agarose gels using formaldehyde as described in Amasino (1986, Anal. Biochem. 152, 304), transferred by capillary attraction using 10 x SSC to positively charged nylon membranes (Hybond N+, Amersham, Braunschweig), immobilized by UV light and prehybridized for 3 hours at 68°C using hybridization buffer (10% dextran sulfate w/v, 1 M NaCl, 1% SDS, 100 µg of herring sperm DNA). The labeling of the DNA probe with the "Highprime DNA labeling kit" (Roche, Mannheim, Germany) was carried out during the prehybridization using alpha-32P dCTP (Amersham, Braunschweig, Germany). Hybridization was carried out after addition of the labeled DNA probe in the same buffer at 68°C overnight. The washing steps were carried out twice for 15 min using 2 x SSC and twice for 30 min using 1 x SSC, 1% SDS at 68°C. The exposure of the sealed-in filters was carried out at -70°C for a period of 1-14 d.

Example 8

DNA Sequencing

[0230] cDNA libraries as described in Example 4 were used for DNA sequencing according to standard methods, in particular by the chain termination method using the ABI PRISM Big Dye Terminator Cycle Sequencing Ready Reaction Kit (Perkin-Elmer, Weiterstadt, Germany). Random sequencing was carried out subsequent to preparative plasmid recovery from cDNA libraries via in vivo mass excision and retransformation of DH10B on agar plates (material and protocol details from Stratagene, Amsterdam, Netherlands). Plasmid DNA was prepared from overnight grown E. coli cultures grown in Luria-Broth medium containing ampicillin (see Sambrook et al. (1989) (Cold Spring Harbor Laboratory Press: ISBN 0-87969-309-6)) on a Qiagen DNA preparation robot (Qiagen, Hilden) according to the manufacturer's protocols. Sequencing primers with the following nucleotide sequences were used:

5'-CAGGAAACAGCTATGACC-3'
5'-CTAAAGGGAACAAAAGCTG-3'
5'-TGTAAAACGACGGCCAGT-3'

Example 9

Plasmids for Plant Transformation

[0231] For plant transformation binary vectors such as pBinAR-TkTp-9 (Badur, 1998, PhD thesis, Georg August University of Göttingen, Germany, "Molecular and functional analysis of isoenzymes for example of fructose-1,6-bisphosphate aldolase, phosphoglucose-isomerase and 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase", original title "Molekularbiologische und funktionelle Analyse von pflanzlichen Isoenzymen am Beispiel der Fructose-1,6-bisphosphat Aldolase, Phosphoglucose-Isomerase und der 3-Deoxy-D-Arabino-Heptulosonat-7-Phosphat Synthase") can be used. This vector is a derivative of pBinAR (Höfgen and Willmitzer, Plant Science 66 (1990), 221-230) and contains the CaMV (cauliflower mosaic virus) 35S promoter (Franck et al., 1980), the termination signal of the octopine synthase gene (Gielen et al., 1984) and the DNA sequence encoding the transit peptide of the Nicotiana tabacum plastid transketolase. Construction of the binary vectors can be performed by ligation of the cDNA in sense or antisense orientation into the T-DNA.

[0232] 5' to the cDNA, a plant promoter activates transcription of the cDNA. A polyadenylation sequence is located 3' to the cDNA.

[0233] Tissue specific expression can be achieved by using a tissue specific promoter. For example, seed specific expression can be achieved by cloning the napin or USP promoter 5' to the cDNA. Also any other seed specific promoter element can be used. For constitutive expression within the whole plant the CaMV 35S promoter can be used.

[0234] The expressed protein can be targeted to a cellular compartment using a signal peptide, for example for plastids, mitochondria or endoplasmic reticulum (Kermode, Crit. Rev. Plant Sci. 15, 4 (1996), 285-423). The signal peptide is cloned 5' in frame to the cDNA to achieve subcellular localization of the fusion protein.

[0235] Nucleic acid molecules from Physcomitrella are used for a direct gene knock-out by homologous recombination. Therefore Physcomitrella sequences are useful for functional genomic approaches. The technique is described by Strepp et al., Proc. Natl. Acad. Sci. USA, 1998, 95: 4369-4373; Girke et al. (1998), Plant Journal 15: 39-48; Hofmann et al. (1999) Molecular and General Genetics 261: 92-99.

Example 10

Transformation of Agrobacterium

[0236] Agrobacterium mediated plant transformation can be performed using for example the GV3101(pMP90) (Koncz and Schell, Mol. Gen. Genet. 204 (1986), 383-396) or LBA4404 (Clontech) Agrobacterium tumefaciens strain. Transformation can be performed by standard transformation techniques (Deblaere et al., Nucl. Acids Res. 13 (1984), 4777-4788).

Example 11

Plant Transformation

[0237] Agrobacterium mediated plant transformation has been performed using standard transformation and regeneration techniques (Gelvin, Stanton B.; Schilperoort, Robert A., "Plant Molecular Biology Manual", 2nd Ed., Dordrecht: Kluwer Academic Publ., 1995, ISBN 0-7923-2731-4; Glick, Bernard R.; Thompson, John E., "Methods in Plant Molecular Biology and Biotechnology", Boca Raton: CRC Press, 1993, 360 pp., ISBN 0-8493-5164-2).

[0238] For example, rapeseed can be transformed via cotyledon or hypocotyl transformation (Moloney et al., Plant Cell Report 8 (1989), 238-242; De Block et al., Plant Physiol. 91 (1989), 694-701). Use of antibiotics for Agrobacterium and plant selection depends on the binary vector and the Agrobacterium strain used for transformation. Rapeseed selection is normally performed using kanamycin as selectable plant marker.

[0239] Agrobacterium mediated gene transfer to flax can be performed using for example a technique described by Mlynarova et al. (1994), Plant Cell Report 13: 282-285.

[0240] Transformation of soybean can be performed using for example a technique described in EP 0424047, U.S. Pat. No. 322,783 (Pioneer Hi-Bred International) or in EP 0397687, U.S. Pat. No. 5,376,543, U.S. Pat. No. 5,169,770 (University Toledo).

[0241] Plant transformation using particle bombardment, polyethylene glycol mediated DNA uptake or via the silicon carbide fiber technique is for example described by Freeling and Walbot, "The maize handbook" (1993), ISBN 3-540-97826-7, Springer Verlag New York.

Example 12

In vivo Mutagenesis

[0242] In vivo mutagenesis of microorganisms can be performed by passage of plasmid (or other vector) DNA through E. coli or other microorganisms (e.g. Bacillus spp. or yeasts such as Saccharomyces cerevisiae) which are impaired in their capabilities to maintain the integrity of their genetic information. Typical mutator strains have mutations in the genes for the DNA repair system (e.g., mutHLS, mutD, mutT, etc.; for reference, see Rupp, W. D. (1996) DNA repair mechanisms, in: Escherichia coli and Salmonella, p. 2277-2294, ASM: Washington). Such strains are well known to those skilled in the art. The use of such strains is illustrated, for example, in Greener, A. and Callahan, M. (1994) Strategies 7: 32-34. Transfer of mutated DNA molecules into plants is preferably done after selection and testing in microorganisms. Transgenic plants are generated according to various examples within the exemplification of this document.

Example 13

DNA Transfer Between Escherichia coli and Corynebacterium glutamicum

[0243] Several Corynebacterium and Brevibacterium species contain endogenous plasmids (as e.g., pHM1519 or pBL1) which replicate autonomously (for review see, e.g., Martin, J. F. et al. (1987) Biotechnology, 5:137-146). Shuttle vectors for Escherichia coli and Corynebacterium glutamicum can be readily constructed by using standard vectors for E. coli (Sambrook, J. et al. (1989), "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press or Ausubel, F. M. et al. (1994) "Current Protocols in Molecular Biology", John Wiley & Sons) to which an origin of replication for and a suitable marker from Corynebacterium glutamicum is added. Such origins of replication are preferably taken from endogenous plasmids isolated from Corynebacterium and Brevibacterium species. Of particular use as transformation markers for these species are genes for kanamycin resistance (such as those derived from the Tn5 or Tn903 transposons) or chloramphenicol (Winnacker, E. L. (1987) "From Genes to Clones - Introduction to Gene Technology", VCH, Weinheim). There are numerous examples in the literature of the construction of a wide variety of shuttle vectors which replicate in both E. coli and C. glutamicum, and which can be used for several purposes, including gene over-expression (for reference, see e.g., Yoshihama, M. et al. (1985) J. Bacteriol. 162:591-597, Martin J. F. et al. (1987) Biotechnology, 5:137-146 and Eikmanns, B. J. et al. (1991) Gene, 102:93-98). Using standard methods, it is possible to clone a gene of interest into one of the shuttle vectors described above and to introduce such hybrid vectors into strains of Corynebacterium glutamicum. Transformation of C. glutamicum can be achieved by protoplast transformation (Katsumata, R. et al. (1984) J. Bacteriol. 159:306-311), electroporation (Liebl, E. et al. (1989) FEMS Microbiol. Letters, 53:399-303) and in cases where special vectors are used, also by conjugation (as described e.g. in Schäfer, A. et al. (1990) J. Bacteriol. 172:1663-1666). It is also possible to transfer the shuttle vectors for C. glutamicum to E. coli by preparing plasmid DNA from C. glutamicum (using standard methods well known in the art) and transforming it into E. coli. This transformation step can be performed using standard methods, but it is advantageous to use an Mcr-deficient E. coli strain, such as NM522 (Gough & Murray (1983) J Mol.

nol., 32:205-210; von der Osten et al. (1998) Biotechnology Letters, 11:11-16; Patent DE 4,120,867; Liebl (1992) "The Genus Corynebacterium", in: The Procaryotes, Volume II, Balows, A. et al., eds. Springer-Verlag). These media consist of one or more carbon sources, nitrogen sources, inorganic salts, vitamins and trace elements. Preferred carbon sources are sugars, such as mono-, di-, or polysaccharides. For example, glucose, fructose, mannose, galactose, ribose, sorbose, ribulose, lactose, maltose, sucrose, raffinose, starch or cellulose serve as very good carbon sources. It is also possible to supply sugar to the media via complex compounds such as molasses or other by-products from sugar refinement. It can also be advantageous to supply mixtures of different carbon sources. Other possible carbon sources
are alcohols and organic acids, Such as methanol, ethanol, Biol. 166:1-19). acetic acid or lactic acid. Nitrogen Sources are usually organic or inorganic nitrogen compounds, or materials Example 14 which contain these compounds. Exemplary nitrogen Sources include ammonia gas or ammonia Salts, Such as ASSessment of the Expression of a Recombinant NHCl or (NHA)SO, NH4OH, nitrates, urea, amino acids Gene Product in a Transformed Organism or complex nitrogen Sources like corn Steep liquor, Soybean 0244. The activity of a recombinant gene product in the flour, Soy bean protein, yeast extract, meat extract and transformed host organism has been measured on the tran others. Scriptional or/and on the translational level. 0248 Inorganic salt compounds which may be included 0245. A useful method to ascertain the level of transcrip in the media include the chloride-, phosphorous- or Sulfate tion of the gene (an indicator of the amount of mRNA Salts of calcium, magnesium, Sodium, cobalt, molybdenum, available for translation to the gene product) is to perform a potassium, manganese, Zinc, copper and iron. Chelating Northern blot (for reference see, for example, Ausubel et al. compounds can be added to the medium to keep the metal (1988) Current Protocols in Molecular Biology, Wiley: New ions in Solution. Particularly useful chelating compounds York), in which a primer designed to bind to the gene of include dihydroxyphenols, like catechol or protocatechuate, interest is labeled with a detectable tag (usually radioactive or organic acids, Such as citric acid. 
It is typical for the media or chemiluminescent), such that when the total RNA of a to also contain other growth factors, Such as Vitamins or culture of the organism is extracted, run on gel, transferred growth promoters, examples of which include biotin, ribo to a stable matrix and incubated with this probe, the binding flavin, thiamin, folic acid, nicotinic acid, pantothenate and and quantity of binding of the probe indicates the presence pyridoxin. Growth factors and Salts frequently originate and also the quantity of mRNA for this gene. This informa from complex media components Such as yeast extract, tion is evidence of the degree of transcription of the trans molasses, corn Steep liquor and others. The exact composi formed gene. Total cellular RNA can be prepared from cells, tion of the media compounds depends Strongly on the tissueS or organs by Several methods, all well-known in the immediate experiment and is individually decided for each art, such as that described in Bormann, E. R. et al. (1992) Specific case. Information about media optimization is avail MOI. Microbiol. 6: 317-326. able in the textbook “ Applied Microbiol. Physiology, A Practical Approach (eds. P. M. Rhodes, P. F. Stanbury, IRL 0246 To assess the presence or relative quantity of pro Press (1997) pp. 53-73, ISBN 0 199635773). It is also tein translated from this mRNA, Standard techniques, Such possible to Select growth media from commercial Suppliers, as a Western blot, may be employed (see, for example, like standard 1 (Merck) or BHI (brain heart infusion, DIFC) Ausubel et al. (1988) Current Protocols in Molecular Biol or others. ogy, Wiley: New York). In this process, total cellular pro teins are extracted, Separated by gel electrophoresis, trans 0249 All medium components are sterilized, either by ferred to a matrix Such as nitrocellulose, and incubated with heat (20 minutes at 1.5 bar and 121DC) or by sterile a probe, Such as an antibody, which specifically binds to the filtration. 
The components can either be Sterilized together desired protein. This probe is generally tagged with a or, if necessary, Separately. All media components can be chemiluminescent or calorimetric label which may be present at the beginning of growth, or they can optionally be readily detected. The presence and quantity of label added continuously or batchwise. observed indicates the presence and quantity of the desired 0250 Culture conditions are defined separately for each mutant protein present in the cell. experiment. The temperature should be in a range between 15DC and 45D.C. The temperature can be kept constant or Example 15 can be altered during the experiment. The pH of the medium should be in the range of 5 to 8.5, preferably around 7.0, and Growth of Genetically Modified Corynebacterium can be maintained by the addition of buffers to the media. An Glutamicum-Media and Culture Conditions exemplary buffer for this purpose is a potassium phosphate 0247 Genetically modified Corynebacteria are cultured buffer. Synthetic buffers such as MOPS, HEPES, ACES and in Synthetic or natural growth media. A number of different others can alternatively or simultaneously be used. It is also growth media for Corynebacteria are both well-known and possible to maintain a constant culture pH through the readily available (Lieb et al. (1989) App. Microbiol Biotech addition of NaOH or NHOH during growth. If complex US 2003/O157592 A1 Aug. 21, 2003 27 medium components Such as yeast extract are utilized, the band-shift assays (also called gel retardation assays). The necessity for additional buffers may be reduced, due to the effect of Such proteins on the expression of other molecules fact that many complex compounds have high buffer capaci can be measured using reporter gene assays (Such as that ties. If a fermentor is utilized for culturing the micro described in Kolmar, H. et al. (1995) EMBO J. 
14: 3895 organisms, the pH can also be controlled using gaseous 3904 and references cited therein). Reporter gene test sys ammonia. tems are well known and established for applications in both 0251 The incubation time is usually in a range from pro- and eukaryotic cells, using enzymes Such as beta Several hours to Several days. This time is Selected in order galactosidase, green fluorescent protein, and Several others. to permit the maximal amount of product to accumulate in the broth. The disclosed growth experiments can be carried 0255 The determination of activity of membrane-trans out in a variety of vessels, Such as microtiter plates, glass port proteins can be performed according to techniques Such tubes, glass flaskS or glass or metal fermentors of different as those described in Gennis, R. B. (1989) “Pores, Channels sizes. For Screening a large number of clones, the microor and Transporters”, in Biomembranes, Molecular Structure ganisms should be cultured in nmicrotiter plates, glass tubes and Function, Springer: Heidelberg, p. 85-137; 199-234; and or shake flasks, either with or without baffles. Preferably 100 27O-322. ml shake flasks are used, filled with 10% (by volume) of the required growth medium. The flaskS should be shaken on a Example 17 rotary Shaker (amplitude 25 mm) using a speed-range of 100 -300 rpm. Evaporation losses can be diminished by the Analysis of Impact of Recombinant Proteins on the maintenance of a humid atmosphere; alternatively, a math Production of the Desired Product ematical correction for evaporation losses should be per 0256 The effect of the genetic modification in plants, formed. algae, C. glutamicum, fungi, cilates or on production of a 0252) If genetically modified clones are tested, an desired compound (such as Vitamins) can be assessed by unmodified control clone or a control clone containing the growing the modified microorganism or plant under Suitable basic plasmid without any insert should also be tested. 
The conditions (Such as those described above) and analyzing the medium is inoculated to an ODoo of 0.5-1.5 using cells medium and/or the cellular component for increased pro grown on agar plates, Such as CM plates (10 g/l glucose, 2.5 duction of the desired product (i.e. fine chemicals). Such g/l NaCl, 2 g/l urea, 10 g/l polypeptone, 5 g/l yeast extract, analysis techniques are well known to one skilled in the art, 5 g/l meat extract, 22 g/l NaCl, 2 g/l urea, 10 g/l polypep and include SpectroScopy, thin layer chromatography, Stain tone, 5 g/l yeast extract, 5 g/l meat extract, 22 g/l agar, pH ing methods of various kinds, enzymatic and microbiologi 6.8 with 2M NaOH) that had been incubated at 3D.C. cal methods, and analytical chromatography Such as high Inoculation of the media is accomplished by either intro performance liquid chromatography (See, for example, Ull duction of a Saline Suspension of C. glutamicum cells from man, Encyclopedia of Industrial Chemistry, Vol. A2, p. CM plates or addition of a liquid preculture of this bacte 89-90 and p. 443-613, VCH: Weinheim (1985); Fallon, A. et U. al., (1987) “Applications of HPLC in Biochemistry” in: Laboratory Techniques in Biochemistry and Molecular Biol Example 16 ogy, vol. 17; Rehm et al. (1993) Biotechnology, vol. 3, In vitro Analysis of the Function of Physcomitrella Chapter III: “Product recovery and purification”, page 469 Genes in Transgenic Organisms 714, VCH: Weinheim; Belter, P. A. et al. (1988) Biosepa 0253) The determination of activities and kinetic param rations: downstream processing for biotechnology, John eters of enzymes is well established in the art. Experiments Wiley and Sons; Kennedy, J. F. and Cabral, J. M. S. (1992) to determine the activity of any given altered enzyme must Recovery processes for biological materials, John Wiley and be tailored to the Specific activity of the wild-type enzyme, Sons; Shaeiwitz, J. A. and Henry, J. D. 
(1988) Biochemical which is well within the ability of one skilled in the art. Separations, in: Ulmann's Encyclopedia of Industrial Chem Overviews about enzymes in general, as well as Specific istry, vol. B3, Chapter 11, page 1-27, VCH: Weinheim; and details concerning Structure, kinetics, principles, methods, Dechow, F. J. (1989) Separation and purification techniques applications and examples for the determination of many in biotechnology, Noyes Publications.) enzyme activities may be found, for example, in the follow 0257. In addition to the measurement of the final product ing references: Dixon, M., and Webb, E. C., (1979) in plant cells, microorganisms and algae, it is also possible Enzymes. Longmans: London; Fersht, (1985) Enzyme to analyze other components of the metabolic pathways Structure and Mechanism. Freeman: New York; Walsh, utilized for the production of the desired compound, Such as (1979) Enzymatic Reaction Mechanisms. Freeman: San intermediates and Side-products, to determine the overall Francisco; Price, N. C., Stevens, L. (1982) Fundamentals of efficiency of production of the compound. Analysis methods Enzymology. Oxford Univ. Press: Oxford; Boyer, P.D., ed. include measurements of nutrient levels in the medium (e.g., (1983) The Enzymes, 3" ed. Academic Press: New York; Sugars, hydrocarbons, nitrogen Sources, phosphate, and Bisswanger, H., (1994) Enzymkinetik, 2" nd ed. VCH: other ions), measurements of biomass composition and Weinheim (ISBN 3527300325); Bergmeyer, H. U., Bergm growth, analysis of the production of common metabolites eyer, J., Graf31, M., eds. (1983-1986) Methods of Enzymatic of biosynthetic pathways, and measurement of gasses pro Analysis, 3" ed., vol. I-XII, Verlag Chemie: Weinheim; and duced during fermentation. Standard methods for these Ullmann's Encyclopedia of Industrial Chemistry (1987) vol. measurements are outlined in Applied Microbial Physiology, A9, “Enzymes”. VCH: Weinheim, p. 352-363. A Practical Approach, P. M. 
Rhodes and P. F. Stanbury, eds., 0254 The activity of proteins which bind to DNA can be IRL Press, p. 103-129; 131-163; and 165-192 (ISBN: measured by several well-established methods, such as DNA 0199635773) and references cited therein. US 2003/O157592 A1 Aug. 21, 2003 28

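The shake-flask inoculation described in Example 15 (100 ml flasks filled to 10% by volume, medium inoculated to an OD600 of 0.5-1.5 from a liquid preculture) comes down to a simple dilution calculation. The following Python lines are only an illustrative sketch of that arithmetic. The function name and the preculture OD600 of 8.0 are hypothetical values chosen for the example and are not taken from this document; the sketch further assumes that OD600 scales linearly with cell density over the range used.

```python
def preculture_volume_ml(od_preculture, od_target, final_volume_ml):
    """Volume of preculture that yields od_target in final_volume_ml.

    Uses the dilution relation C1 * V1 = C2 * V2 and assumes that OD600
    is proportional to cell density (an idealization).
    """
    if od_preculture <= od_target:
        raise ValueError("preculture must be denser than the target OD600")
    return od_target * final_volume_ml / od_preculture


# 100 ml shake flask filled with 10% (by volume) growth medium
working_volume_ml = 0.10 * 100.0

# Hypothetical preculture at OD600 = 8.0, target OD600 = 1.0 (within 0.5-1.5)
inoculum_ml = preculture_volume_ml(
    od_preculture=8.0, od_target=1.0, final_volume_ml=working_volume_ml
)
medium_ml = working_volume_ml - inoculum_ml

print(f"add {inoculum_ml:.2f} ml preculture to {medium_ml:.2f} ml medium")
```

The same relation could be used for the Agrobacterium suspension of Example 19, which is brought to an OD600 of 0.3 by adding basal medium.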
[0258] Material to be analyzed can be disintegrated via sonication, glass milling, liquid nitrogen and grinding or via other applicable methods. The material has to be centrifuged after disintegration.

[0259] Vitamin E:

[0260] The determination of tocopherols in cells has been conducted either according to Kurilich et al. 1999, J. Agric. Food Chem. 47: 1576-1581 or alternatively as described in Tani Y and Tsumura H 1989 (Agric. Biol. Chem. 53: 305-312).

[0261] Carotenoids:

[0262] The large scale production and purification of carotenoids requires a solution for the separation of lipophilic impurities from the host cell, which have to be separated from the carotenoids. On a production scale the material has to be disintegrated for the production of oleoresins via centrifugation, as known to those skilled in the art from various production processes, or via disintegration followed by evaporation and extraction. Acetone or hexane extraction is carried out for 8-12 hours in the dark to avoid carotenoid breakdown. After removal of the solvent the residue is dissolved in a diethylether-hexane mixture or, in case of hydroxycarotenoids, in acetone-petrol and purified via a silica-gel column. Suitable solvent mixtures are diethylether:hexane or petrol (1:4 v/v) for carotenes and acetone:hexane or petrol (1:4 v/v) for hydroxycarotenoids. To determine carotenoid purity in isolated fractions, HPLC techniques are most appropriate (Linden et al., FEMS Microbiol. Lett. 106:99-104; Piccaglia et al., 1998; Industrial Crops and Products 8:45-51 and references therein).

Example 18

Purification of the Desired Product From Transformed Organisms

[0263] Recovery of the desired product from plant material or fungi, algae, ciliates or C. glutamicum cells or supernatant of the above-described cultures can be performed by various methods well known in the art. If the desired product is not secreted from the cells, the cells can be harvested from the culture by low-speed centrifugation and lysed by standard techniques, such as mechanical force or sonication. Organs of plants can be separated mechanically from other tissue or organs. Following homogenization, cellular debris is removed by centrifugation, and the supernatant fraction containing the soluble proteins is retained for further purification of the desired compound. If the product is secreted from desired cells, then the cells are removed from the culture by low-speed centrifugation, and the supernatant fraction is retained for further purification.

[0264] The supernatant fraction from either purification method is subjected to chromatography with a suitable resin, in which the desired molecule is either retained on a chromatography resin while many of the impurities in the sample are not, or where the impurities are retained by the resin while the sample is not. Such chromatography steps may be repeated as necessary, using the same or different chromatography resins. One skilled in the art would be well-versed in the selection of appropriate chromatography resins and in their most efficacious application for a particular molecule to be purified. The purified product may be concentrated by filtration or ultrafiltration, and stored at a temperature at which the stability of the product is maximized.

[0265] There is a wide array of purification methods known to the art and the preceding method of purification is not meant to be limiting. Such purification techniques are described, for example, in Bailey, J. E. & Ollis, D. F. Biochemical Engineering Fundamentals, McGraw-Hill: New York (1986).

[0266] The identity and purity of the isolated compounds may be assessed by techniques standard in the art. These include high-performance liquid chromatography (HPLC), spectroscopic methods, staining methods, thin layer chromatography, NIRS, enzymatic assay, or microbiological assay. Such analysis methods are reviewed in: Patek et al. (1994) Appl. Environ. Microbiol. 60: 133-140; Malakhova et al. (1996) Biotekhnologiya 11:27-32; Schmidt et al. (1998) Bioprocess Engineer. 19: 67-70; Ullmann's Encyclopedia of Industrial Chemistry, (1996) vol. A27, VCH: Weinheim, p. 89-90, p. 521-540, p. 540-547, p. 559-566, 575-581 and p. 581-587; Michal, G. (1999) Biochemical Pathways: An Atlas of Biochemistry and Molecular Biology, John Wiley and Sons; Fallon, A. et al. (1987) Applications of HPLC in Biochemistry in: Laboratory Techniques in Biochemistry and Molecular Biology, vol. 17.

Example 19

Generation of Transgenic Brassica napus Plants

[0267] The generation of transgenic oilseed rape plants followed in principle a procedure of Bade, J. B. and Damm, B. (in Gene Transfer to Plants, Potrykus, I. and Spangenberg, G., eds, Springer Lab Manual, Springer Verlag, 1995, 30-38), which also indicates the composition of the media and buffers used. Transformations were done with the Agrobacterium tumefaciens strains EHA105 and GV3101, respectively. Recombinant plasmids were used for transformation. Seeds of Brassica napus var. Westar were surface sterilized with 70% ethanol (v/v), washed for 10 minutes at 55° C. in water, incubated for 20 minutes in 1% strength hypochlorite solution (25% v/v Teepol, 0.1% v/v Tween 20) and washed six times with sterile water for 20 minutes each. The seeds were dried for three days on filter paper and 10-15 seeds were germinated in a glass flask containing 15 ml of germination medium. Roots and apices were removed from several seedlings (approx. size 10 cm), and the hypocotyls which remained were cut into sections of approx. length 6 mm. The approx. 600 explants thus obtained were washed for 30 minutes in 50 ml of basal medium and transferred into a 300 ml flask. After addition of 100 ml of callus induction medium, the cultures were incubated for 24 hours at 100 rpm.

[0268] An overnight culture of the agrobacterial strain was set up in Luria broth medium supplemented with kanamycin (20 mg/l) at 29° C., and 2 ml of this were incubated in 50 ml of Luria broth medium without kanamycin for 4 hours at 29° C. until an OD600 of 0.4-0.5 was reached. After the culture had been pelleted for 25 minutes at 2000 rpm, the cell pellet was resuspended in 25 ml of basal medium. The bacterial concentration of the solution was brought to an OD600 of 0.3 by adding more basal medium.

[0269] The callus induction medium was removed from the oilseed rape explants using sterile pipettes, 50 ml of agrobacterial solution were added, and the reaction was mixed carefully and incubated for 20 minutes. The agrobacterial suspension was removed, the oilseed rape explants were washed for 1 minute with 50 ml of callus induction medium, and 100 ml of callus induction medium were subsequently added. Coculturing was carried out for 24 hours on an orbital shaker at 100 rpm. Coculturing was stopped by removing the callus induction medium, and the explants were washed twice for 1 minute each with 25 ml and twice for 60 minutes each with 100 ml of wash medium at 100 rpm. The wash medium together with the explants was transferred into 15 cm Petri dishes, and the medium was removed using sterile pipettes.

[0270] For regeneration, 20-30 explants in each case were transferred into 90 mm Petri dishes containing 25 ml of shoot induction medium supplemented with kanamycin. The Petri dishes were sealed with 2 layers of Leukopor and incubated at 25° C. and 2000 lux at photoperiods of 16 hours light/8 hours darkness. Every 12 days, the calli which developed were transferred to fresh Petri dishes containing shoot induction medium. All further steps for the regeneration of intact plants were carried out as described by Bade, J. B. and Damm, B. (in Gene Transfer to Plants, Potrykus, I. and Spangenberg, G., eds, Springer Lab Manual, Springer Verlag, 1995, 30-38).

Example 20

Generation of Transgenic Nicotiana tabacum Plants

[0271] 10 ml of YEB medium supplemented with antibiotic (5 g/l beef extract, 1 g/l yeast extract, 5 g/l peptone, 5 g/l sucrose and 2 mM MgSO4) were inoculated with a colony of Agrobacterium tumefaciens and the culture was grown overnight at 28° C. The cells were pelleted for 20 minutes at 4° C., 3500 rpm, using a bench-top centrifuge and then resuspended under sterile conditions in fresh YEB medium without antibiotics. The cell suspension was used for the transformation.

[0272] The sterile-grown wild-type plants were obtained by vegetative propagation. To this end, only the tip of the plant was cut off and transferred to fresh 2MS medium in a sterile preserving jar. As regards the rest of the plant, the hairs on the upper side of the leaves and the central veins of the leaves were removed. Using a razor blade, the leaves were cut into sections of approximate size 1 cm. The agrobacterial culture was transferred into a small Petri dish (diameter 2 cm). The leaf sections were briefly drawn through this solution and placed with the underside of the leaves on 2MS medium in Petri dishes (diameter 9 cm) in such a way that they touched the medium. After two days in the dark at 25° C., the explants were transferred to plates with callus induction medium and warmed at 28° C. in a controlled-environment cabinet. The medium had to be changed every 7-10 days. As soon as calli formed, the explants were transferred into sterile preserving jars onto shoot induction medium supplemented with claforan (0.6% BiTec-Agar (w/v), 2.0 mg/l zeatin ribose, 0.02 mg/l naphthylacetic acid, 0.02 mg/l of gibberellic acid, 0.25 g/ml claforan, 1.6% glucose (w/v) and 50 mg/l kanamycin). Organogenesis started after approximately one month and it was possible to cut off the shoots which had formed. The shoots were grown on 2MS medium supplemented with claforan and selection marker. As soon as a substantial root ball had developed, it was possible to pot up the plants in seed compost.

Example 21

Generation of Transgenic A. thaliana Plants

[0273] Wild-type A. thaliana plants (Columbia) were transformed with the Agrobacterium tumefaciens strain (EHA105) on the basis of a modified method (Steve Clough and Andrew Bent. Floral dip: a simplified method for Agrobacterium mediated transformation of A. thaliana. Plant J 16(6):735-43, 1998) of the vacuum infiltration method as described by Bechtold and coworkers (Bechtold, N., Ellis, J. and Pelletier, G., in planta Agrobacterium-mediated gene transfer by infiltration of adult A. thaliana plants. C R Acad Sci Paris, 1993, 1144(2):204-212).

Example 22

Characterization of the Transgenic Plants

[0274] To confirm that expression of the TCMRP genes affected Vitamin E biosynthesis in the transgenic plants, the tocopherol and tocotrienol contents in leaves and seeds of the plants (Arabidopsis thaliana, Brassica napus and Nicotiana tabacum) which had been transformed with the above-described constructs were analyzed. To this end, the transgenic plants were grown in the greenhouse, and plants which express the gene encoding the TCMRP polypeptides were identified at Northern level. The tocopherol content and the tocotrienol content in leaves and seeds of these plants were determined. In all cases, the tocopherol or tocotrienol concentration is elevated in comparison with untransformed plants.

Example 23

Isolation of Full Length Physcomitrella patens 78 ppprot1 092 E12-260 cDNA

[0275] Utilizing the partial sequence of the Physcomitrella patens clone 78 ppprot1 092 E12 as probe, a Physcomitrella patens cDNA library was screened by nucleic acid hybridization for full length cDNAs.

[0276] A large number of hybridizing clones were isolated. The isolated cDNA 78 ppprot1 092 E12-260 (1968 bp) was sequenced completely. 78 ppprot1 092 E12-260 encodes a 492 amino acid protein.

Example 24

Amplification of the Coding Sequence (ORF) of the Full Length Clone 78 ppprot1 092 E12-260

[0277] The coding sequence (ORF) of the 78 ppprot1 092 E12-260 clone was amplified using polymerase chain reaction (PCR). The sequence of the resultant PCR fragment is designated 092-260 cds. The forward and reverse primers (78 ppprot1 092 E12 5' and 78 ppprot1 092 E12 3', respectively) were designed to add a BamHI site to the 5' and 3' end of the resulting amplification product.

[0278] Forward primer 78 ppprot1 092 E12-260 5':

[0279] GGATCCATCATGGCGGTCAATACCGAGC

[0280] Reverse primer 78 ppprot1 092 E12-260 3':

[0281] GGATCCCAAGATCATAATGCCTTGTAGGC

[0282] The PCR reaction was conducted in a 50 µl reaction mixture, containing dNTPs (0.2 mM each), 1.5 mM Mg(OAc)2, 40 pmol 78 ppprot1 092 E12 5', 40 pmol 78 ppprot1 092 E12 3', 15 µl 3.3x rTth DNA Polymerase XL buffer (PE Applied Biosystems), 5 U rTth DNA Polymerase XL (PE Applied Biosystems).

[0283] The following conditions were used:

[0284] step 1: 5 minutes 94° C. (denaturation)

[0285] step 2: 3 seconds 94° C. (denaturation)

[0286] step 3: 2 minutes 65° C. (annealing)

[0287] step 4: 1 minute 72° C. (elongation)

[0288] 40 cycles step 2-4

[0289] step 5: 10 minutes 72° C.

[0290] The resulting PCR fragment was cloned into the PCR cloning vector pGEM-T (Promega) as described in the instructions. The recombinant plasmid (pGEM-Teasy/092-260 cds) was sequenced to confirm the correct amplification.

Example 25

Demonstration of 2-methyl-6-phytylplastoquinol-methyltransferase Activity (TMT type II) of 78 ppprot1 092 E12 cDNA Clone by Expression and Biochemical Analysis in E. coli

[0291] In order to demonstrate that the clone 78 ppprot1 092 E12-260 encodes a protein involved in tocopherol biosynthesis, the cDNA 092-260 cds (cds = coding sequence, amplified as described above) was expressed in E. coli and tested for 2-methyl-6-phytylplastoquinol-methyltransferase activity.

[0292] Hence, the 092-260 cds BamHI fragment was subcloned in the correct reading frame into the BamHI site of the E. coli pQE30 expression vector (QIAexpress Kit, Qiagen). The resulting plasmid (designated pQE30-092-260 cds, see FIG. 1) was used to transform the E. coli expression host strain M15[pREP4].

[0293] An E. coli colony transformed with the plasmid pQE30-092-260 cds was used to inoculate an overnight culture of Luria broth containing 200 µg/ml ampicillin. In the morning an aliquot of this culture was used to inoculate a 100 ml culture of Luria broth containing 200 µg/ml ampicillin. This culture was incubated in a shaking incubator at 28° C. until the OD600 of the culture reached 0.4, at which time isopropyl-β-D-thiogalactopyranoside (IPTG) was added to obtain a final concentration of 0.4 mM IPTG. The culture was incubated for an additional three hours at 28° C. Afterwards the cells were harvested by centrifugation at 8000 g.

[0294] The pellet was resuspended in 600 µl lysis buffer (approximately 1-1.5 ml/g cell pellet, 10 mM HEPES-KOH pH 7.8, 5 mM dithiothreitol (DTT), 0.24 M sorbitol). Subsequently phenylmethylsulfonyl fluoride (PMSF) was added to a final concentration of 0.15 mM and the homogenate was incubated on ice for 10 minutes.

[0295] The cells were lysed by sonication with a microtip sonicator using several 10 second pulses.

[0296] After adding Triton X100 (f.c. 0.1%) the homogenate was incubated for 30 minutes on ice, and subjected to centrifugation at 25000 g for 30 minutes. The supernatant was saved for methyltransferase assays.

[0297] The 2-methyl-6-phytylplastoquinol-methyltransferase assay was performed in a 500 µl volume containing 135 µl (about 300-600 µg total protein) E. coli extract expressing the 092-260 cDNA (prepared as described above), 200 µl (125 mM) Tricine-NaOH pH 8.0, 100 µl (1.25 mM) sorbitol, 10 µl (50 mM) MgCl2, 20 µl (250 mM) ascorbate, 15 µl (0.46 mM) 14C-methyl-S-adenosylmethionine (SAM) as methyl group donor, and 2-methyl-6-phytylplastoquinol as substrate. The reaction was incubated for four hours at 25° C. in the dark.

[0298] The reaction was stopped by adding 750 µl chloroform/methanol (1:2) + 150 µl 0.9% NaCl. The tubes were mixed thoroughly, the phases were separated by centrifugation and the upper part was discarded. The lower part was transferred to a new tube and evaporated under a stream of nitrogen.

[0299] The dried residue was resuspended in 20 µl ether and spotted onto a silica thin layer chromatography (TLC) plate. The TLC plate was exposed to a phosphoimager screen.

[0300] The result showed that the expressed 092-260 cds protein was able to methylate 2-methyl-6-phytylplastoquinol. No radioactive labelling of the substrate was observed in assays using extracts from control cells.

Example 26

Construction of Vectors for Expressing the Physcomitrella 2-Methyl-6-phytylplastoquinol-methyltransferase in A. thaliana and Other Plants for Altering the Content of Tocopherols

[0301] In order to manipulate the Vitamin E levels in seeds, the cDNA clone 78 ppprot1 092 E12-260 encoding the Physcomitrella patens 2-methyl-6-phytylplastoquinol-methyltransferase was expressed under the control of a seed-specific promoter in transgenic A. thaliana plants. The seed-specific plant gene expression plasmid was constructed using a pBin19 (Bevan, Nucleic Acid Research 12: 8711-8720, 1984) derivative. The plasmid contains the Vicia faba seed-specific promoter from the Legumin B4 gene (Bäumlein et al., Nucleic Acids Research 14:2707-2719, 1996), the sequence encoding the transit peptide of the N. tabacum transketolase (TkTp) (Badur, R., 1998, PhD thesis, Georg August University of Göttingen, Germany, "Molecular and functional analysis of isoenzymes for example of fructose 1,6-bisphosphate aldolase, phosphoglucose-isomerase and 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase", original title "Molekularbiologische und funktionelle Analyse von pflanzlichen Isoenzymen am Beispiel der Fructose-1,6-bisphosphat Aldolase, Phosphoglucose-Isomerase und der 3-Deoxy-D-Arabino-Heptulosonat-7-Phosphat Synthase") and the transcriptional termination sequence from the octopine synthase gene (Gielen et al., EMBO J. 3: 835-846, 1984). The cDNA 092-260 cds was cloned in sense orientation as a BamHI fragment into the BamHI site of the pBin-LePTkTp9 vector. The created plasmid was designated pBin-LePTkTp9-092-260 cds. Due to the cloning in the correct reading frame, the cDNA 092-260 cds was fused to the TkTp transit peptide, which governs the translocation of the 092-260 cds protein into plastids.

[0302] A recombinant plasmid was obtained and designated pBin-LePTkTp9-092-260 cds (see FIG. 2). This seed-specific 78 ppprot1 092 E12-260 plant gene expression construct (pBin-LePTkTp9-092-260 cds) was used to transform wild type A. thaliana plants.

Example 27

Isolation of Full Length Physcomitrella patens 78 ppprot1 087 E12-259 cDNA

[0303] Utilizing the partial sequence of the Physcomitrella patens clone 78 ppprot1 087 E12 as probe, a Physcomitrella patens cDNA library was screened by nucleic acid hybridization for full length cDNAs.

[0304] A large number of hybridizing clones were isolated. The isolated cDNA 78 ppprot1 087 E12-259 (1867 bp) was sequenced completely. 78 ppprot1 087 E12-259 encodes a 371 amino acid protein.

Example 28

[0305] Amplification of the coding sequence (ORF) of the full length clone 78 ppprot1 087 E12-259

[0306] The coding sequence (ORF) of the 78 ppprot1 087 E12-259 clone with homology to the γ-tocopherol methyltransferases (designated 087-259Cterm) was amplified using polymerase chain reaction (PCR). The forward and reverse primers (78 ppprot1 087 E12-259 5' and 78 ppprot1 087 E12-259 3', respectively) were designed to add a BamHI site to the 5' and 3' end of the resulting amplification product.

[0307] Forward primer 78 ppprot1 087 E12-259 5'

[0308] GGATCCCGGACGGAGCCGGAGCTTTACG

[0309] Reverse primer 78 ppprot1 087 E12-259 3'

[0310] GGATCCCTACTAGCGGAGACCTCAATCC

[0319] The resulting PCR fragment was cloned into the PCR cloning vector pGEM-T (Promega) as described in the instructions. The recombinant plasmid (pGEM-Teasy/087-259Cterm) was sequenced to confirm the correct amplification.

Example 29

Demonstration of γ-Tocopherol-Methyltransferase Activity of 087-259Cterm cDNA Clone by Expression and Biochemical Analysis in E. coli

[0320] In order to demonstrate that the clone 087-259Cterm (amplified as described above) encodes a protein involved in tocopherol biosynthesis, the cDNA 087-259Cterm was expressed in E. coli and tested for γ-tocopherol methyltransferase activity. Hence, the 087-259Cterm BamHI fragment was subcloned in the correct reading frame into the BamHI site of the E. coli pQE30 expression vector (QIAexpress Kit, Qiagen). The resulting plasmid (designated pQE30-087-259Cterm, see FIG. 3) was used to transform the E. coli expression host strain M15[pREP4].

[0321] An E. coli colony transformed with the plasmid pQE30-087-259Cterm was used to inoculate an overnight culture of Luria broth containing 200 µg/ml ampicillin. In the morning an aliquot of this culture was used to inoculate a 100 ml culture of Luria broth containing 200 µg/ml ampicillin. This culture was incubated in a shaking incubator at 28° C. until the OD600 of the culture reached 0.4, at which time isopropyl-β-D-thiogalactopyranoside (IPTG) was added to obtain a final concentration of 0.4 mM IPTG.

[0322] The culture was incubated for an additional three hours at 28° C. Afterwards the cells were harvested by centrifugation at 8000 g.

[0323] The pellet was resuspended in 600 µl lysis buffer (approximately 1-1.5 ml/g cell pellet, 10 mM HEPES-KOH pH 7.8, 5 mM dithiothreitol (DTT), 0.24 M sorbitol). Subsequently phenylmethylsulfonyl fluoride (PMSF) was added to a final concentration of 0.15 mM and the homogenate was incubated on ice for 10 minutes.

[0324] The cells were lysed by sonication with a microtip sonicator using several 10 second pulses. After adding Triton X100 (f.c. 0.1%) the homogenate was incubated for 30 minutes on ice, and subjected to centrifugation at 25000 g for 30 minutes

[0311]
The PCR reaction was conducted in a 50 82 1 0325 The Supernatant of this extract was assayed for reaction mixture, containing dNTPs (0.2 mM each), 1.5 mM Y-tocopherol-methyltransferase activity as follows. Mg(OAc), 40 pmol 78 ppprot1 087 E125', 40 pmol 0326. The Y-Tocopherol-methyltransferase assay was 78 ppprot1 087 E123", 15 ul 3,3xrTth DNA Polymerase performed in a 500 ul volume containing 135 ul (about XLPuffer (PE Applied Biosystems), 5U rTth DNA Poly 300-600 ug total protein) E.coli extract expressing the merase XL (PE Applied Biosystems). 087-259 cDNA (prepared as described above), 200ul (125 0312 The following conditions were used: mM) Tricine-NaOH pH 7.6, 100 ul (1.25 mM) Sorbitol, 10ug (50 mM) MgCl, and 20 ul (250 mM) Ascorbate, 15ul 0313) step 1: 5 minutes 94° C. (denaturation) (0.46 mM 'C-methyl-S-adenosylmethionine (SAM)) as 0314) step 2: 3 seconds 94° C.(denaturation) methyl group donor and 4.8 mM Y-Tocopherol as Substrate. The reaction was incubated for four hours at 25 C. in the 0315) step 3: 2 minutes 65 C. (annealing) dark. 0316) step 4: 2 minutes 72° C. (elongation) 0327. The reaction was stopped by adding 750 ul Chlo 0317) 40 cycles step 2-4 roform/Methanol (1:2)+150 ul 0.9% NaCl. The tube were 0318) step: 5: 10 minutes 72° C. mixed thoroughly, the phases were Separated by centrifuga US 2003/O157592 A1 Aug. 21, 2003 32 tion and the upper part was discarded. The lower part was Equivalents transferred to a new tube and vaporized under a stream of nitrogen. 0331 Those skilled in the art will recognize, or will be able to ascertain using no more than routine experimenta 0328. The dried residue was resuspended in 20 ul ether tion, many equivalents to the Specific embodiments of the and Spotted onto a silica thin layer-chromatography (TLC) invention described herein. Such equivalents are intended to plate. The TLC plate was exposed to a phosphoimager be encompassed by the following claims. SCCC. 0329. 
The result shows that the in E.coli expressed 087 Legends to the Figurs: 259Cterm protein was able to methylate Y-Tocopherol. No 0332 FIG. 1: Expression vector pGE30 harboring the radioactive labelling of the Substrate was observed in assays coding sequence of full length clone 78 ppprot1 092 E using extracts from control cells. 12-260 resluting in vector pGE30-092-260 cds Example 30 0333 FIG. 2: Plant transformation vector pBinLePT. kTp9-092-260 cds with abbreviations as follows: Construction of Vectors for Expressing the Physcomitrella patens 0334) LeB4. Vicia faba legumin B4 gene promoter Y-Tocopherol-methyltransferase in A. thaliana and (2700 bp) Other Plants for Altering the Content of 0335) TKTP: Sequence encoding the N. tabacum Tocopherols transketolase transit peptide (245 bp) 0330. In order to manipulate the Vitamin E levels in seeds, the cDNA clone 78 ppprot1 087 E12-259 encoding 0336) 092-260 cds: Sequence of the cDNA clone the Physcomitrella patens Y-tocopherol-methyltransferase 092-260 cds (1490 bp) was expressed under the control of a Seed specific promoter in transgenic A. thaliana plants. The Seed-specific plant gene 0337 OCS: Octopin synthase transcritional termi expression plasmid was constructed using a pBin 19 (Bevan, nation signal (219 bp) Nucleic Acid Research 12:8711-8720, 1984) derivative. The 0338 FIG. 3: Expression vector pGE30 harboring the plasmid contains the Vicia faba Seed specific promotor from coding Sequence of full length clone 78 ppprot1 osz E12 the Legumin B4 gene (Bäumlein et al., Nucleic Acids 259 resluting in vector pGE30-087-259Cterm Research 14: 2707-2719, 1996), the sequence encoding the transit peptide of the N. tabacum Transketolase (TkTp) 0339 FIG. 4: Plant transformation vector pBinLePT. (Badur, R., Ph.D thesis, 1998, Georg August University of kTp9-092-260 cds with abbreviations as follows: Göbttingen, Germany, "Molecular and functional analysis of isoenzymes for example of fructose-1,6-bisphosphate 0340 LeB4. 
Vici afaba legumin B4 gene promoter aldolase, phosphoglucose-isomerase and 3-deoxy-D-ara (2700 bp) bino-heptusolonate-7-phosphate synthase'“Molekularbi 0341 TKTP: Sequence encoding the N. tabacum ologische und funktionelle Analyse Von pflanzlichen ISOen Zymen am Beispiel der Fructose-1,6-bisphosphat Aldolase, transketolase transit peptide (245 bp) Phosphoglucose-Isomerase und der 3-Deoxy-D-Arabino 0342 092-260 cds: Sequence of the cDNA clone Heptusolonat-7-Phosphat Synthase”) and the transcrip 092-260 cds (1490 bp) tional termination Sequence from the Octopin Synthase gene (Gielen et al., EMBO J. 3: 835-846, 1984). The cDNA 0343 OCS: Octopin synthase transcritional termi 087-259Cterm was cloned in sense orientation as a BamHI nation signal (219 bp) fragment into the BamHI site of the p3in-LePTkTp9 vector. The created plasmid was designated pBin LePTkTp9-87 0344) Table 1: Enzymes involved in production of toco 259Cterm. Due to the cloning in the correct reading frame pherols and/or carotenoids, the accession/entry number of the cDNA 087-259Cterm was fused to the TkTp transit the corresponding partial nucleic acid molecules, the corre peptide which governs the translocation of the 087 sponding longest clones and the position of open reading 259Cterm protein into plastids. A recombinant plasmid frames. designated pBin-LePTkTp9-087-259Cterm was obtained 0345 Appendix A: Nucleic acid sequences encoding for (see FIG. 4). This seed-specific 78 ppprot1 087 E12-259 TCMRPs (Tocopherol and Caotenoid Metabolism Related plant gene expression construct (pBin-LePTkTp9-087 protein) 259Cterm) was used to transform wild type A. thaliana plants. 0346 Appendix B: TCMRP polypeptide sequences
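The primer design and cycling conditions of Example 28 can be checked with a short script. The following is a minimal sketch, not part of the described method: the function and constant names are illustrative, and joining the two printed fragments of each primer into a single 28-nt sequence is an assumption about how the sequences were broken across lines. It confirms that both primers carry a BamHI recognition site (GGATCC) at the 5' end and sums the programmed hold times of the cycling protocol.

```python
# Illustrative check of the Example 28 primers and cycling program.
# All names here are hypothetical helpers, not part of the patent text.

BAMHI_SITE = "GGATCC"

# Primer sequences as printed in Example 28. Each is split across a line
# break in the source; joining the two fragments is an assumption.
FORWARD_PRIMER = "GGATCCCGGACGGAGCCGGAGCTT" + "TACG"
REVERSE_PRIMER = "GGATCCCTACTAGCGGAGACCT" + "CAATCC"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def starts_with_site(primer: str, site: str = BAMHI_SITE) -> bool:
    """True if the primer begins (5' end) with the restriction site."""
    return primer.upper().startswith(site)


# Cycling program as stated in the text, in seconds.
INITIAL_DENATURATION_S = 5 * 60      # step 1
CYCLE_STEPS_S = (3, 2 * 60, 2 * 60)  # steps 2-4
CYCLES = 40
FINAL_EXTENSION_S = 10 * 60          # step 5


def total_hold_time_s() -> int:
    """Sum of programmed hold times; ramp times between steps are ignored."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
        print(name, len(primer), "nt; BamHI site at 5' end:", starts_with_site(primer))
    print("reverse primer read on the sense strand:", reverse_complement(REVERSE_PRIMER))
    print("programmed hold time: %.0f min" % (total_hold_time_s() / 60))
```

Because GGATCC is palindromic, the site added by the reverse primer reads identically on the sense strand, which is why the amplified product can be excised as a single BamHI fragment. The programmed holds add up to 177 minutes; actual run time would be longer because of temperature ramps.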

SEQUENCE LISTING
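In this listing each DNA entry gives a CDS range in its <222> LOCATION field and is followed by a PRT entry whose <211> LENGTH is the number of translated residues. As a hedged illustration (the helper names are hypothetical, and the values are simply copied from SEQ ID NOs 1 to 20), the sketch below checks that each CDS range spans a whole number of codons and reports how many codons exceed the listed protein length. A surplus of one codon is consistent with the range including a terminal stop codon.

```python
# Illustrative consistency check between CDS ranges and protein lengths.
# Tuples: (DNA SEQ ID NO, CDS start, CDS end, length of the following PRT entry),
# copied from the <222> LOCATION and <211> LENGTH fields of SEQ ID NOs 1-20.
ENTRIES = [
    (1, 66, 257, 63),
    (3, 2, 439, 146),
    (5, 3, 563, 187),
    (7, 38, 394, 119),
    (9, 3, 533, 177),
    (11, 2, 118, 39),
    (13, 3, 521, 173),
    (15, 3, 452, 149),
    (17, 1, 408, 136),
    (19, 3, 461, 153),
]


def codons_in_range(start: int, end: int) -> int:
    """Number of codons in an inclusive, 1-based CDS range."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a multiple of three")
    return span // 3


def surplus_codons(entries):
    """Map each DNA SEQ ID NO to (codons in CDS range - listed protein length)."""
    return {
        seq_id: codons_in_range(start, end) - protein_len
        for seq_id, start, end, protein_len in entries
    }


if __name__ == "__main__":
    for seq_id, surplus in surplus_codons(ENTRIES).items():
        note = "range includes one extra codon" if surplus == 1 else "range matches protein length"
        print(f"SEQ ID NO {seq_id}: {note}")
```

For the entries above, only SEQ ID NOs 1 and 15 show a one-codon surplus; the remaining ranges match the listed protein lengths exactly.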

<160> NUMBER OF SEQ ID NOS: 82

<210> SEQ ID NO 1
<211> LENGTH: 560
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:

-continued <221 NAME/KEY: CDS <222> LOCATION: (66) . . (257) <223> OTHER INFORMATION: 84 ppprot1 50 f12 rev <400 SEQUENCE: 1 gctitatgg to aggaagtgaa toag catggg aaggttgaca atgcaaggta caagatc gat 60 cctga cct togc ggg cqc tot tta cqa gga cit g g gt tat gcc titt gac caa 110 Pro Cys Gly Arg Ser Lieu Arg Gly Lieu Gly Tyr Ala Phe Asp Glin 1 5 10 15 gca ggit coa ggt ggc cita tot tot cog acg tot goa citg acg to a titt 158 Ala Gly Pro Gly Gly Leu Ser Ser Pro Thr Ser Gly Leu Thir Ser Phe 2O 25 30 aac to g togg cag ata gtc. aag titg aag agg atc atc act gac at a gC c 2O6 Asn Ser Trp Glin Ile Wall Lys Lieu Lys Arg Ile Ile Thr Asp Ile Ala 35 40 45 cat tdt ggc ctic titc act c gt gag tta gC c tot gta cag aaa aca titt 254 His Cys Gly Leu Phe Thr Arg Glu Leu Ala Cys Val Gln Lys Thr Phe 5 O 55 60 tag totcatttitt ttgcatagaa goaccatcga ttgcttcttg ctitccaagttc 307 cagttittagc gcattcattt coctaggtgag catactittca acataaagat citccaccitcc 367 gaggttgagc cagtacgcct agattctgtg aatcago aac ggccaaagct tittcttctot 427 ggataggtoa gtcaatgcat acacttggca tacatacacc atgcggtgtt agtgctttitt 487 tittcgctato aaccqaggitt titact gctta totgcaataa gag cago caa tacct gcaag 547 titttittcaa aaa. 560

<210> SEQ ID NO 2
<211> LENGTH: 63
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 2
Pro Cys Gly Arg Ser Leu Arg Gly Leu Gly Tyr Ala Phe Asp Gln Ala
1               5                   10                  15
Gly Pro Gly Gly Leu Ser Ser Pro Thr Ser Gly Leu Thr Ser Phe Asn
            20                  25                  30
Ser Trp Gln Ile Val Lys Leu Lys Arg Ile Ile Thr Asp Ile Ala His
        35                  40                  45
Cys Gly Leu Phe Thr Arg Glu Leu Ala Cys Val Gln Lys Thr Phe
    50                  55                  60

<210> SEQ ID NO 3 <211& LENGTH 45.4 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) ... (439) <223> OTHER INFORMATION: 41 bd10 g 03 rev &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: 450 <223> OTHER INFORMATION: n is g, a t, or c <400 SEQUENCE: 3 t caa aat cqg aaa at g g ga acg gala gtt aag citc act aat gga aac acc 49 Glin Asn Arg Lys Met Gly Thr Glu Val Lys Lieu. Thr Asn Gly Asn Thr 1 5 10 15 US 2003/O157592 A1 Aug. 21, 2003 34

-continued gto act gca cot gcc gga gaa cag act agt toc goc tac aag cita gtt 97 Val Thr Ala Pro Ala Gly Glu Gln Thr Ser Ser Ala Tyr Lys Leu Val 2O 25 30 ggc titc gala aac titc gtc. c.gg aac aac cot atg toc gac aaa titt aca 145 Gly Phe Glu Asin Phe Val Arg Asn Asn Pro Met Ser Asp Llys Phe Thr 35 40 45 gto: aaa agc titc. cac cat gtt gag titc togg togc ticc gac goc acc aac 193 Val Lys Ser Phe His His Val Glu Phe Trp Cys Ser Asp Ala Thr Asn 5 O 55 60 acc goc cqc cqt titc. tcc tog gga citc ggt atg cca atc gtt tac aag 241 Thr Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Pro Ile Val Tyr Lys 65 70 75 8O toc gat tta tot acc gga aac aat atc. cac got tot tac citc citc cq c 289 Ser Asp Leu Ser Thr Gly Asn. Asn. Ile His Ala Ser Tyr Lieu Lieu Arg 85 90 95 toc ggit cac citc aat titc citc titt acc got cot tat tct cot toc ata 337 Ser Gly. His Leu Asn Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Ile 100 105 110 toc acc goc acc gct tcc att cot acg ttt tot cac acc gac toc cq c 385 Ser Thr Ala Thr Ala Ser Ile Pro Thr Phe Ser His Thr Asp Cys Arg 115 120 125 aac titc acc goc tot cac ggt titt ggt gtc. c.gc ticg att got att gala 433 Asn Phe Thr Ala Ser His Gly Phe Gly Val Arg Ser Ile Ala Ile Glu 130 135 1 4 0 gtt gala gatgcc gacc nagct 454 Wall Glu 145

<210> SEQ ID NO 4
<211> LENGTH: 146
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 4
Gln Asn Arg Lys Met Gly Thr Glu Val Lys Leu Thr Asn Gly Asn Thr
1               5                   10                  15
Val Thr Ala Pro Ala Gly Glu Gln Thr Ser Ser Ala Tyr Lys Leu Val
            20                  25                  30
Gly Phe Glu Asn Phe Val Arg Asn Asn Pro Met Ser Asp Lys Phe Thr
        35                  40                  45
Val Lys Ser Phe His His Val Glu Phe Trp Cys Ser Asp Ala Thr Asn
    50                  55                  60
Thr Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Pro Ile Val Tyr Lys
65                  70                  75                  80
Ser Asp Leu Ser Thr Gly Asn Asn Ile His Ala Ser Tyr Leu Leu Arg
                85                  90                  95
Ser Gly His Leu Asn Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Ile
            100                 105                 110
Ser Thr Ala Thr Ala Ser Ile Pro Thr Phe Ser His Thr Asp Cys Arg
        115                 120                 125
Asn Phe Thr Ala Ser His Gly Phe Gly Val Arg Ser Ile Ala Ile Glu
    130                 135                 140
Val Glu
145

<210> SEQ ID NO 5
<211> LENGTH: 565
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(563)
<223> OTHER INFORMATION: 58 mm.15 b11rev

<400 SEQUENCE: 5 ga titt gca atg gac cqa gct ggg citc gtt goa gCC gat ggg cct act 47 Phe Ala Met Asp Arg Ala Gly Lieu Val Gly Ala Asp Gly Pro Thr 1 5 10 15 cac tot ggg gct titc gat gito acc tac at g gCC togc cta colt aac atg 95 His Cys Gly Ala Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met 2O 25 30 gtt gta at g gCt c cit gct gat gala gct gag citt titc. cac at g gta gCa 1 4 3 Val Val Met Ala Pro Ala Asp Glu Ala Glu Leu Phe His Met Val Ala 35 40 45 act gct gcc got att gat gac cqt coc agc tot titc agg tat coc aga 191 Thr Ala Ala Ala Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg 5 O 55 60 ggit aac ggg att got gtc. caa ttg cct gca aag aac aaa gga att cot 239 Gly Asn Gly Ile Gly Val Glin Leu Pro Ala Lys Asn Lys Gly Ile Pro 65 70 75 att gag gtc. g.gt aga ggg cqa att cita citg gaa got act gala gtg gCa 287 Ile Glu Val Gly Arg Gly Arg Ile Leu Lieu Glu Gly Thr Glu Val Ala 8O 85 90 95 citt cta ggit tat ggit aca atg gtc. caa aat tigc citg gct gct cac gtc 335 Leu Lieu Gly Tyr Gly Thr Met Val Glin Asn. Cys Lieu Ala Ala His Val 1 OO 105 110 tta citt goc gac citg g g g g to tca gcg act gtc gcc gat got cqg titt 383 Leu Lieu Ala Asp Leu Gly Val Ser Ala Thr Val Ala Asp Ala Arg Phe 115 120 125 tgc aag coc citt gac cqt gat citt att cqc cag citt gct aag aac cat 431 Cys Llys Pro Leu Asp Arg Asp Lieu. Ile Arg Glin Leu Ala Lys Asn His 130 135 1 4 0 caa gtg citt attaca gtg gaa gag ggit tot att goa ggc titt ggit tot 479 Glin Val Leu Ile Thr Val Glu Glu Gly Ser Ile Gly Gly Phe Gly Ser 145 15 O 155 cat gtt gtg caa titc atg gca ttg gat ggg citc citc gac gga aag citg 527 His Val Val Glin Phe Met Ala Lieu. Asp Gly Lieu Lleu. Asp Gly Lys Lieu 160 1.65 170 175 aag togg aga cca citt gtg cita cot gac cqc tac atc ga 565 Lys Trp Arg Pro Leu Val Lieu Pro Asp Arg Tyr Ile 18O 185

<210> SEQ ID NO 6
<211> LENGTH: 187
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 6
Phe Ala Met Asp Arg Ala Gly Leu Val Gly Ala Asp Gly Pro Thr His
1               5                   10                  15
Cys Gly Ala Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met Val
            20                  25                  30
Val Met Ala Pro Ala Asp Glu Ala Glu Leu Phe His Met Val Ala Thr
        35                  40                  45
Ala Ala Ala Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg Gly
    50                  55                  60
Asn Gly Ile Gly Val Gln Leu Pro Ala Lys Asn Lys Gly Ile Pro Ile
65                  70                  75                  80
Glu Val Gly Arg Gly Arg Ile Leu Leu Glu Gly Thr Glu Val Ala Leu
                85                  90                  95
Leu Gly Tyr Gly Thr Met Val Gln Asn Cys Leu Ala Ala His Val Leu
            100                 105                 110
Leu Ala Asp Leu Gly Val Ser Ala Thr Val Ala Asp Ala Arg Phe Cys
        115                 120                 125
Lys Pro Leu Asp Arg Asp Leu Ile Arg Gln Leu Ala Lys Asn His Gln
    130                 135                 140
Val Leu Ile Thr Val Glu Glu Gly Ser Ile Gly Gly Phe Gly Ser His
145                 150                 155                 160
Val Val Gln Phe Met Ala Leu Asp Gly Leu Leu Asp Gly Lys Leu Lys
                165                 170                 175
Trp Arg Pro Leu Val Leu Pro Asp Arg Tyr Ile
            180                 185

<210 SEQ ID NO 7 &2 11s LENGTH 630 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (38) . . (394) <223> OTHER INFORMATION: 10 ppprot1 092 b08 rev &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: 1. 630 <223> OTHER INFORMATION: n is g, a t or c <400 SEQUENCE: 7 gatting caat ggatcgtgct gntcttgttg gagctga togg cca act cac tot goa 55 Trp Pro Thr His Cys Gly 1 5 gcq titc gat gta acc tac atg gCt tot cita colt aat atg gta gtc atg 103 Ala Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met Val Val Met 10 15 20 gct cot gct gac gaa gog gaa citt titc. cac at g g to goc act gct gct 151 Ala Pro Ala Asp Glu Ala Glu Lieu Phe His Met Val Ala Thr Ala Ala 25 30 35 caa att gat gat cqa cot agt tot titc agg tat coa agg ggit aac gga 199 Glin Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg Gly Asn Gly 40 45 50 atc ggit gcc cag titg cct gag aat aac aag ggg atc ccc gtc. gag att 247 Ile Gly Ala Glin Leu Pro Glu Asn. Asn Lys Gly Ile Pro Val Glu Ile 55 60 65 70 ggit aaa gga aga att cita tta gala ggt acg gala gtg gca citt ttg ggit 295 Gly Lys Gly Arg Ile Leu Lleu Glu Gly Thr Glu Val Ala Lieu Lieu Gly 75 8O 85 tat ggc acc at g g to cag aat tdt citg gCt gct c go goa tta citt go c 343 Tyr Gly. Thir Met Val Glin Asn. Cys Lieu Ala Ala Arg Ala Lieu Lieu Ala 90 95 100 gac ttg ggt gtt gcg gcg act gtt gct gat gct agg titc toc aag coc 391 Asp Leu Gly Val Ala Ala Thr Val Ala Asp Ala Arg Phe Cys Llys Pro 105 110 115 citt taaatgaaat citgaaaggitt aggaataggit gctgctgctc tdaaatcgga 444 Teu US 2003/O157592 A1 Aug. 21, 2003 37

-continued gcagtcggat gttctgtggg gagittagagg cct gttc.cgit tagg gaggat aattittc.cct 504 toagtacggit gcatcgaact tag acatggc aaattttgta ccctacacac tottgtaaat 564 tattogtggit gatcacctica ttaataagtgaaatggg acc gaacttgacc citt cacttitt 624 toaaaa. 630

<210> SEQ ID NO 8
<211> LENGTH: 119
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 8
Trp Pro Thr His Cys Gly Ala Phe Asp Val Thr Tyr Met Ala Cys Leu
1               5                   10                  15
Pro Asn Met Val Val Met Ala Pro Ala Asp Glu Ala Glu Leu Phe His
            20                  25                  30
Met Val Ala Thr Ala Ala Gln Ile Asp Asp Arg Pro Ser Cys Phe Arg
        35                  40                  45
Tyr Pro Arg Gly Asn Gly Ile Gly Ala Gln Leu Pro Glu Asn Asn Lys
    50                  55                  60
Gly Ile Pro Val Glu Ile Gly Lys Gly Arg Ile Leu Leu Glu Gly Thr
65                  70                  75                  80
Glu Val Ala Leu Leu Gly Tyr Gly Thr Met Val Gln Asn Cys Leu Ala
                85                  90                  95
Ala Arg Ala Leu Leu Ala Asp Leu Gly Val Ala Ala Thr Val Ala Asp
            100                 105                 110
Ala Arg Phe Cys Lys Pro Leu
        115

<210 SEQ ID NO 9 &2 11s LENGTH 534 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (3) . . (533) <223> OTHER INFORMATION: 68 ck12 d1 Ofwd <400 SEQUENCE: 9 ag cct titt tot agt atc tat toc toc titc citt caa aga gga tat gac 47 Pro Phe Cys Ser Ile Tyr Ser Ser Phe Leu Glin Arg Gly Tyr Asp 1 5 10 15 cag gtt gta cac gat gta gat citg cag aaa ttg cca gtc. c.ga titt goa 95 Glin Val Val His Asp Wall Asp Leu Gln Lys Lieu Pro Val Arg Phe Ala 2O 25 30 atg gat cqt gct ggit citt gtt gga gct gat ggg cca act cac tot gga 1 4 3 Met Asp Arg Ala Gly Lieu Val Gly Ala Asp Gly Pro Thr His Cys Gly 35 40 45 gcq titc gat gta acc tac atg gCt tot cita colt aat atg gta gtc atg 191 Ala Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met Val Val Met 5 O 55 60 gct cot gct gac gaa gog gaa citt titc. cac at g g to goc act gct gct 239 Ala Pro Ala Asp Glu Ala Glu Lieu Phe His Met Val Ala Thr Ala Ala 65 70 75 caa att gat gat cqa cot agt tot titc agg tat coa agg ggit aac gga 287 Glin Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg Gly Asn Gly 8O 85 90 95 US 2003/O157592 A1 Aug. 21, 2003

-continued atc ggit gcc cag titg cct gag aat aac aag ggg atc ccc gtc. gag att 335 Ile Gly Ala Glin Leu Pro Glu Asn. Asn Lys Gly Ile Pro Val Glu Ile 1 OO 105 110 ggit aaa gga aga att cita tta gala ggt acg gala gtg gca citt ttg ggit 383 Gly Lys Gly Arg Ile Leu Lleu Glu Gly Thr Glu Val Ala Lieu Lieu Gly 115 120 125 tat ggc acc at g g to cag aat tdt citg gCt gct c go goa tta citt go c 431 Tyr Gly. Thir Met Val Glin Asn. Cys Lieu Ala Ala Arg Ala Lieu Lieu Ala 130 135 1 4 0 gac ttg ggt gtt gcg gcg act gtt gct gat gct agg titc toc aag coc 479 Asp Leu Gly Val Ala Ala Thr Val Ala Asp Ala Arg Phe Cys Llys Pro 145 15 O 155 citt gac cqa gat citt att cqt caa citt gog aag aac cac caa gtg att 527 Leu Asp Arg Asp Lieu. Ile Arg Glin Leu Ala Lys Asn His Glin Val Ile 160 1.65 170 175 ata acc c 534 Ile Thr

<210> SEQ ID NO 10
<211> LENGTH: 177
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 10
Pro Phe Cys Ser Ile Tyr Ser Ser Phe Leu Gln Arg Gly Tyr Asp Gln
1               5                   10                  15
Val Val His Asp Val Asp Leu Gln Lys Leu Pro Val Arg Phe Ala Met
            20                  25                  30
Asp Arg Ala Gly Leu Val Gly Ala Asp Gly Pro Thr His Cys Gly Ala
        35                  40                  45
Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met Val Val Met Ala
    50                  55                  60
Pro Ala Asp Glu Ala Glu Leu Phe His Met Val Ala Thr Ala Ala Gln
65                  70                  75                  80
Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg Gly Asn Gly Ile
                85                  90                  95
Gly Ala Gln Leu Pro Glu Asn Asn Lys Gly Ile Pro Val Glu Ile Gly
            100                 105                 110
Lys Gly Arg Ile Leu Leu Glu Gly Thr Glu Val Ala Leu Leu Gly Tyr
        115                 120                 125
Gly Thr Met Val Gln Asn Cys Leu Ala Ala Arg Ala Leu Leu Ala Asp
    130                 135                 140
Leu Gly Val Ala Ala Thr Val Ala Asp Ala Arg Phe Cys Lys Pro Leu
145                 150                 155                 160
Asp Arg Asp Leu Ile Arg Gln Leu Ala Lys Asn His Gln Val Ile Ile
                165                 170                 175
Thr

<210> SEQ ID NO 11
<211> LENGTH: 567
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (2)..(118)
<223> OTHER INFORMATION: 39 cK27 g O2fwdrew

<400 SEQUENCE: 11 c atc gag cat ggg gct coc aag gac cag tat gcc gaa goa ggit cita act 49 Ile Glu His Gly Ala Pro Lys Asp Glin Tyr Ala Glu Ala Gly Lieu. Thr 1 5 10 15 gCg ggit CaC att gca gcc act gca citg aac gtt citc ggg aag acg aga 97 Ala Gly His Ile Ala Ala Thr Ala Lieu. Asn. Wall Leu Gly Lys Thr Arg 2O 25 30 gala gC g Ctg Cala gto atg acc talagat.ctitc gtggittaaga tatggtgaat 148 Glu Ala Leu Glin Wall Met Thr 35 togttgcgaa citatgatcca gtcgacgacg ggcttctdat caatcaaag.c attacccaga 208 ttgcatgtct galacatgcca totaatgaac atattotggit citactgttcg totccittaaa. 268 tttaca aggc aacttctato atttgctgat tgcttagcag actitgaagat agggtottac 328 togaaagctg aaacgttgaa tatagatgct gctactctaa aattagagca gttggatggit 388 ttctaggcag ttatttggta tactacgc.ca tggagggcaa. tocgtactg.c actgctgtag 4 48 gctttgagcc taaacaatgc caaagtttgt actittacaca citcttgtaca citatagitttg 508 atcattcc.ca tittaataact gtaatggggit gcatgatgac totttittctic aaaaaaaaa. 567

<210> SEQ ID NO 12
<211> LENGTH: 39
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 12
Ile Glu His Gly Ala Pro Lys Asp Gln Tyr Ala Glu Ala Gly Leu Thr
1               5                   10                  15
Ala Gly His Ile Ala Ala Thr Ala Leu Asn Val Leu Gly Lys Thr Arg
            20                  25                  30
Glu Ala Leu Gln Val Met Thr
        35

<210> SEQ ID NO 13
<211> LENGTH: 523
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(521)
<223> OTHER INFORMATION: 68 mm17 D1 Orew

<400 SEQUENCE: 13 ga titt gca atg gac cqa gct ggg citc gtt goa gCC gat ggg cct act 47 Phe Ala Met Asp Arg Ala Gly Lieu Val Gly Ala Asp Gly Pro Thr 1 5 10 15

Cac togt ggg gct titc. gat gtc acc tac at g gCC tgc cta cot aac atg 95 His Cys Gly Ala Phe Asp Val Thr Tyr Met Ala Cys Lieu Pro Asn Met 2O 25 30 gtt gta atg gct cct gct gat gala gct gag citt titc. cac atg gta gCa 1 4 3 Wal Wal Met Ala Pro Ala Asp Glu Ala Glu Lieu Phe His Met Wall Ala 35 40 45 act gct gcc gct att gat gac cqt cc c agc tot titc agg tat cc c aga 191 Thir Ala Ala Ala Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg 5 O 55 60 ggit aac ggg att ggit gto caa ttg cct gca aag aac aaa gga att cost 239 Gly Asn Gly Ile Gly Wall Glin Leu Pro Ala Lys Asn Lys Gly Ile Pro US 2003/O157592 A1 Aug. 21, 2003 40


65 70 75 att gag gtc. g.gt aga ggg cqa att cita citg gaa got act gala gtg gCa 287 Ile Glu Val Gly Arg Gly Arg Ile Leu Lieu Glu Gly Thr Glu Val Ala 8O 85 90 95 citt cita ggit tat ggit aca atg gtc. caa aat tdc citg gct gct cac gtc 335 Leu Lieu Gly Tyr Gly Thr Met Val Glin Asn. Cys Lieu Ala Ala His Val 1 OO 105 110 tta citt goc gac citg g g g g to tca gcg act gtc gcc gat got cqg titt 383 Leu Lieu Ala Asp Leu Gly Val Ser Ala Thr Val Ala Asp Ala Arg Phe 115 120 125 tgc aag coc citt gac cqt gat citt att cqc cag citt gct aag aac cat 431 Cys Llys Pro Leu Asp Arg Asp Lieu. Ile Arg Glin Leu Ala Lys Asn His 130 135 1 4 0 caa gtg citt attaca gtg gaa gag ggit tot att goa ggc titt ggit tot 479 Glin Val Leu Ile Thr Val Glu Glu Gly Ser Ile Gly Gly Phe Gly Ser 145 15 O 155 cat gtt gtg caa titc atg gca ttg gat ggg citc citc gac gga aa 523 His Val Val Glin Phe Met Ala Lieu. Asp Gly Lieu Lleu. Asp Gly 160 1.65 170

<210> SEQ ID NO 14
<211> LENGTH: 173
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 14
Phe Ala Met Asp Arg Ala Gly Leu Val Gly Ala Asp Gly Pro Thr His
1               5                   10                  15
Cys Gly Ala Phe Asp Val Thr Tyr Met Ala Cys Leu Pro Asn Met Val
            20                  25                  30
Val Met Ala Pro Ala Asp Glu Ala Glu Leu Phe His Met Val Ala Thr
        35                  40                  45
Ala Ala Ala Ile Asp Asp Arg Pro Ser Cys Phe Arg Tyr Pro Arg Gly
    50                  55                  60
Asn Gly Ile Gly Val Gln Leu Pro Ala Lys Asn Lys Gly Ile Pro Ile
65                  70                  75                  80
Glu Val Gly Arg Gly Arg Ile Leu Leu Glu Gly Thr Glu Val Ala Leu
                85                  90                  95
Leu Gly Tyr Gly Thr Met Val Gln Asn Cys Leu Ala Ala His Val Leu
            100                 105                 110
Leu Ala Asp Leu Gly Val Ser Ala Thr Val Ala Asp Ala Arg Phe Cys
        115                 120                 125
Lys Pro Leu Asp Arg Asp Leu Ile Arg Gln Leu Ala Lys Asn His Gln
    130                 135                 140
Val Leu Ile Thr Val Glu Glu Gly Ser Ile Gly Gly Phe Gly Ser His
145                 150                 155                 160
Val Val Gln Phe Met Ala Leu Asp Gly Leu Leu Asp Gly
                165                 170

<210> SEQ ID NO 15
<211> LENGTH: 510
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(452)
<223> OTHER INFORMATION: 93 ck10 ho5fwdrew

<400 SEQUENCE: 15 tt toa ttg cag toc tat to a tta gaa aag tat ttg cct ttg ttg gca 47 Ser Leu Glin Ser Tyr Ser Lieu Glu Lys Tyr Lieu Pro Leu Lleu Ala 1 5 10 15 tgc aga citc at a ggg tta gta gag cqa tog aat cqt cac go a gga gaa 95 Cys Arg Lieu. Ile Gly Lieu Val Glu Arg Trp Asn Arg His Ala Gly Glu 2O 25 30 cca cag gtt goc tac acg titt gait gct ggit cog aat gcg gta atg titt 1 4 3 Pro Glin Val Ala Tyr Thr Phe Asp Ala Gly Pro Asn Ala Val Met Phe 35 40 45 gcc aag aac aaa gaa gtt gca gC g cag citg citt cag cqc citt citg tac 191 Ala Lys Asn Lys Glu Val Ala Ala Glin Leu Lieu Glin Arg Lieu Lleu Tyr 5 O 55 60 cag titc cct coa toc gog gat act gat att toc aga tat gtt cac ggc 239 Gln Phe Pro Pro Ser Ala Asp Thr Asp Ile Ser Arg Tyr Val His Gly 65 70 75 gat caa agt att ttg gag tot got ggc gtgaat toc titg aag gac atc 287 Asp Glin Ser Ile Leu Glu Ser Ala Gly Val Asn. Ser Lieu Lys Asp Ile 8O 85 90 95 gac toc citt tot gcg cca gct gag gtg gCt ggc att coc aat ttg cag 335 Asp Ser Lieu Ser Ala Pro Ala Glu Val Ala Gly Ile Pro Asn Lieu Glin 1 OO 105 110 agg at a cot gga gag gtt gac tat citc at a toc act aat gtt ggg aaa. 383 Arg Ile Pro Gly Glu Val Asp Tyr Lieu. Ile Cys Thr Asn. Wall Gly Lys 115 120 125 ggit gca tat gta ttg ggc gag cag ggit gca aac citg ata gac cot gtt 431 Gly Ala Tyr Val Lieu Gly Glu Glin Gly Ala Asn Lieu. Ile Asp Pro Wal 130 135 1 4 0 tot ggit citt citg aaa aag taa tag catttag tatcaggtgc taatttgttc 482 Ser Gly Lieu Lleu Lys Lys 145 tggatcaagc ticgcto catc atgctaat 510

<210> SEQ ID NO 16
<211> LENGTH: 149
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 16
Ser Leu Gln Ser Tyr Ser Leu Glu Lys Tyr Leu Pro Leu Leu Ala Cys
1               5                   10                  15
Arg Leu Ile Gly Leu Val Glu Arg Trp Asn Arg His Ala Gly Glu Pro
            20                  25                  30
Gln Val Ala Tyr Thr Phe Asp Ala Gly Pro Asn Ala Val Met Phe Ala
        35                  40                  45
Lys Asn Lys Glu Val Ala Ala Gln Leu Leu Gln Arg Leu Leu Tyr Gln
    50                  55                  60
Phe Pro Pro Ser Ala Asp Thr Asp Ile Ser Arg Tyr Val His Gly Asp
65                  70                  75                  80
Gln Ser Ile Leu Glu Ser Ala Gly Val Asn Ser Leu Lys Asp Ile Asp
                85                  90                  95
Ser Leu Ser Ala Pro Ala Glu Val Ala Gly Ile Pro Asn Leu Gln Arg
            100                 105                 110
Ile Pro Gly Glu Val Asp Tyr Leu Ile Cys Thr Asn Val Gly Lys Gly
        115                 120                 125
Ala Tyr Val Leu Gly Glu Gln Gly Ala Asn Leu Ile Asp Pro Val Ser
    130                 135                 140
Gly Leu Leu Lys Lys
145

<210 SEQ ID NO 17 &2 11s LENGTH 409 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) ... (408) <223> OTHER INFORMATION: 66 boos c12rev &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: 405 <223> OTHER INFORMATION: n is g, a t, or c <400 SEQUENCE: 17 aat gtt citt gat tac citt caa acc gat titc. coc gat atg gat gtc atg 48 Asn Val Lieu. Asp Tyr Lieu Glin Thr Asp Phe Pro Asp Met Asp Wal Met 1 5 10 15 ggc att tot gga aac tat tdc tog gac aag aaa ccg gct gc g g to aac 96 Gly Ile Ser Gly Asn Tyr Cys Ser Asp Llys Llys Pro Ala Ala Val Asn 2O 25 30 tgg at a gaa ggg cqt ggit aaa tot gtg gtt tot gaa got gtg atc aag 144 Trp Ile Glu Gly Arg Gly Lys Ser Val Val Cys Glu Ala Wal Ile Lys 35 40 45 gala gag gtg gtg agc aag gtt ttg aaa acc aat gta gcc agt ttg gtc 192 Glu Glu Val Val Ser Lys Val Leu Lys Thr Asn Val Ala Ser Leu Val 5 O 55 60 gaa citt aac at g citc aag aac cita acc ggg to a gcc atg gCt ggit gca 240 Glu Lieu. Asn Met Leu Lys Asn Lieu. Thr Gly Ser Ala Met Ala Gly Ala 65 70 75 8O citt ggt ggg ttcaat gcg cat gct agc aat ata gtc. tcg gCt at a tat 288 Leu Gly Gly Phe Asn Ala His Ala Ser Asn. Ile Val Ser Ala Ile Tyr 85 90 95 ata gCC acc ggit caa gac cca gCC cag aat gttc gag agt tot cac toc 336 Ile Ala Thr Gly Glin Asp Pro Ala Glin Asn Val Glu Ser Ser His Cys 100 105 110 atc acc at g at g gaa goc att aac aat gga aaa gat citc cat atc to a 384 Ile Thr Met Met Glu Ala Ile Asn. Asn Gly Lys Asp Lieu. His Ile Ser 115 120 125 gto acc atg cct tct att gan gtt g 4.09 Wall Thr Met Pro Ser Ile Xaa Wall 130 135

<210> SEQ ID NO 18
<211> LENGTH: 136
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 135
<223> OTHER INFORMATION: Xaa is Glu or Asp.
<400> SEQUENCE: 18
Asn Val Leu Asp Tyr Leu Gln Thr Asp Phe Pro Asp Met Asp Val Met
1               5                   10                  15
Gly Ile Ser Gly Asn Tyr Cys Ser Asp Lys Lys Pro Ala Ala Val Asn
            20                  25                  30
Trp Ile Glu Gly Arg Gly Lys Ser Val Val Cys Glu Ala Val Ile Lys
        35                  40                  45
Glu Glu Val Val Ser Lys Val Leu Lys Thr Asn Val Ala Ser Leu Val
    50                  55                  60
Glu Leu Asn Met Leu Lys Asn Leu Thr Gly Ser Ala Met Ala Gly Ala
65                  70                  75                  80
Leu Gly Gly Phe Asn Ala His Ala Ser Asn Ile Val Ser Ala Ile Tyr
                85                  90                  95
Ile Ala Thr Gly Gln Asp Pro Ala Gln Asn Val Glu Ser Ser His Cys
            100                 105                 110
Ile Thr Met Met Glu Ala Ile Asn Asn Gly Lys Asp Leu His Ile Ser
        115                 120                 125
Val Thr Met Pro Ser Ile Xaa Val
    130                 135

<210 SEQ ID NO 19 &2 11s LENGTH 694 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (3) . . (461) <223> OTHER INFORMATION: 26 ppprot1.40 E07 rev <400 SEQUENCE: 19 ct gga aac ggit ata tat aca ccc atg gat cog aaa ttg citt cot caa 47 Gly Asn Gly Ile Tyr Thr Pro Met Asp Pro Llys Leu Leu Pro Glin 1 5 10 15 citg tac citg atc tac acg aag aat coc agc gat tot goc aag gtg cat 95 Leu Tyr Lieu. Ile Tyr Thr Lys Asn Pro Ser Asp Ser Gly Lys Wal His 2O 25 30 agt acg gtg agg aaa agg tog tta gac ggit gat gaa ttg gtt agg aat 1 4 3 Ser Thr Val Arg Lys Arg Trp Lieu. Asp Gly Asp Glu Lieu Val Arg Asn 35 40 45 tgt at g aaa gala gtt gcg agt citt gcc gta aag gga cqa gat gct ttg 191 Cys Met Lys Glu Val Ala Ser Lieu Ala Wall Lys Gly Arg Asp Ala Lieu 5 O 55 60 citt cqg caa gat ttt toc acc atc gcg aag cita atg gac acc aac titt 239 Leu Arg Glin Asp Phe Ser Thr Ile Ala Lys Lieu Met Asp Thr Asn. Phe 65 70 75 gac tta cqt aga act atg titt ggc gat gct act citt goa aag atgaac 287 Asp Leu Arg Arg Thr Met Phe Gly Asp Ala Thr Lieu Gly Lys Met Asn 8O 85 90 95 att aaa at g gtt gag act gct cqc ggt gtt gga gct gca toc aag titt 335 Ile Lys Met Val Glu Thr Ala Arg Gly Val Gly Ala Ala Cys Llys Phe 1 OO 105 110 aca ggg agt gga ggit gca gtt att gca ttctgt cct gac ggc gala aag 383 Thr Gly Ser Gly Gly Ala Val Ile Ala Phe Cys Pro Asp Gly Glu Lys 115 120 125 caa gtg aag gCt ttg cag gag got tot got aaa got got tac act gtt 431 Glin Wall Lys Ala Leu Glin Glu Ala Cys Ala Lys Ala Gly Tyr Thr Val 130 135 1 4 0 gag ggt gtt att cot gct coa gCC aat gtc talaccitataa tatcc tagat 481 Glu Gly Val Ile Pro Ala Pro Ala Asn Val 145 15 O ttctgaga.gc gggtgg gaat titccalaggta ataatcatgg citgagtgcta tittatto gag 541 US 2003/O157592 A1 Aug. 21, 2003 44

-continued cactaaaaga ggatttittaa atacgctdaa tdcacgtatt tttctagttt cotctgtttg 6O1 accatgaaaa agggaaatgt acatgatgaa actgacaagg acactgcatc cagtatagtic 661 cittaacattt tttcctcitcc tittcttgaaa aaa 694

<210> SEQ ID NO 20 <211> LENGTH: 153 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 20 Gly Asn Gly Ile Tyr Thr Pro Met Asp Pro Lys Leu Leu Pro Gln Leu 1 5 10 15 Tyr Leu Ile Tyr Thr Lys Asn Pro Ser Asp Ser Gly Lys Val His Ser 20 25 30 Thr Val Arg Lys Arg Trp Leu Asp Gly Asp Glu Leu Val Arg Asn Cys 35 40 45 Met Lys Glu Val Ala Ser Leu Ala Val Lys Gly Arg Asp Ala Leu Leu 50 55 60 Arg Gln Asp Phe Ser Thr Ile Ala Lys Leu Met Asp Thr Asn Phe Asp 65 70 75 80 Leu Arg Arg Thr Met Phe Gly Asp Ala Thr Leu Gly Lys Met Asn Ile 85 90 95 Lys Met Val Glu Thr Ala Arg Gly Val Gly Ala Ala Cys Lys Phe Thr 100 105 110 Gly Ser Gly Gly Ala Val Ile Ala Phe Cys Pro Asp Gly Glu Lys Gln 115 120 125 Val Lys Ala Leu Gln Glu Ala Cys Ala Lys Ala Gly Tyr Thr Val Glu 130 135 140 Gly Val Ile Pro Ala Pro Ala Asn Val 145 150

<210> SEQ ID NO 21 <211> LENGTH: 548 <212> TYPE: DNA <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: CDS <222> LOCATION: (2)..(457) <223> OTHER INFORMATION: 45 ck24 ho2fwd

<400 SEQUENCE: 21 c at g gat gac att atg gac aat to a gttc act c gt cqa gga caa cot to c 49 Met Asp Asp Ile Met Asp Asn. Ser Val Thr Arg Arg Gly Glin Pro Cys 1 5 10 15 tgg tac cqc gtt coa aag gtt ggc citc att got atc aac gat gga ata 97 Trp Tyr Arg Val Pro Lys Val Gly Lieu. Ile Ala Ile Asn Asp Gly Ile 2O 25 30 atc titg aga acg cat atc. tct cqt gtt citg aag aga cat titc cqg cag 145 Ile Leu Arg Thr His Ile Ser Arg Val Lieu Lys Arg His Phe Arg Glin 35 40 45 toc coa atc tat gtg gaa citt gtc gac tta ttcaat gat gtc gag tat 193 Ser Pro Ile Tyr Val Glu Leu Val Asp Leu Phe Asn Asp Val Glu Tyr 5 O 55 60 cag aca gcc tot goa cag atg ttg gac citg atc acc act coa gca gga 241 Gln Thr Ala Ser Gly Glin Met Leu Asp Leu Ile Thr Thr Pro Ala Gly 65 70 75 8O US 2003/O157592 A1 Aug. 21, 2003

-continued gala gtt gat ttg tog aaa tat gta tta coc act tat citg cqa atc gta 289 Glu Val Asp Leu Ser Lys Tyr Val Leu Pro Thr Tyr Leu Arg Ile Val 85 90 95 aaa tac aaa act gca tat tat to a titt tat citt cot gtg gCa tot goc 337 Lys Tyr Lys Thr Ala Tyr Tyr Ser Phe Tyr Leu Pro Val Ala Cys Ala 100 105 110 ttg citt tta gCt ggg gag acg agc gtg gCC aag titt gag gCa gct aag 385 Leu Lleu Lieu Ala Gly Glu Thir Ser Val Ala Lys Phe Glu Ala Ala Lys 115 120 125 gala gtc. citt gta cag at g g gC aca tac titc caa gtc. cag gac gac tat 433 Glu Val Leu Val Gln Met Gly Thr Tyr Phe Glin Val Glin Asp Asp Tyr 130 135 1 4 0 citt gac tot tac ggc gcg cca gala gtgattggaa agat.cggaac togacattgaa 487 Leu Asp Cys Tyr Gly Ala Pro Glu 145 15 O gacactaaat gttcctggct gatagttcaa gocittaaagc gtgccaatga atc.ccagaaa 547 c 548

<210> SEQ ID NO 22 <211> LENGTH: 152 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 22 Met Asp Asp Ile Met Asp Asn Ser Val Thr Arg Arg Gly Gln Pro Cys 1 5 10 15 Trp Tyr Arg Val Pro Lys Val Gly Leu Ile Ala Ile Asn Asp Gly Ile 20 25 30 Ile Leu Arg Thr His Ile Ser Arg Val Leu Lys Arg His Phe Arg Gln 35 40 45 Ser Pro Ile Tyr Val Glu Leu Val Asp Leu Phe Asn Asp Val Glu Tyr 50 55 60 Gln Thr Ala Ser Gly Gln Met Leu Asp Leu Ile Thr Thr Pro Ala Gly 65 70 75 80 Glu Val Asp Leu Ser Lys Tyr Val Leu Pro Thr Tyr Leu Arg Ile Val 85 90 95 Lys Tyr Lys Thr Ala Tyr Tyr Ser Phe Tyr Leu Pro Val Ala Cys Ala 100 105 110 Leu Leu Leu Ala Gly Glu Thr Ser Val Ala Lys Phe Glu Ala Ala Lys 115 120 125 Glu Val Leu Val Gln Met Gly Thr Tyr Phe Gln Val Gln Asp Asp Tyr 130 135 140 Leu Asp Cys Tyr Gly Ala Pro Glu 145 150

<210> SEQ ID NO 23 <211> LENGTH: 544 <212> TYPE: DNA <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: CDS <222> LOCATION: (3)..(539) <223> OTHER INFORMATION: 95 boo2 ho6rev <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 532, 541 <223> OTHER INFORMATION: n is g, a, c or t

-continued <400 SEQUENCE: 23 ct gga att caa citt tot ct g tac aga toa aat citc agc cqt coa toc 47 Gly Ile Glin Leu Ser Leu Tyr Arg Ser Asn Lieu Ser Arg Pro Ser 1 5 10 15

gto toa cc.g gca cca tct gct tac cgt. aga titt acc atc atc to c ggit 95 Wall Ser Pro Ala Pro Ser Ala Arg Arg Phe Thr Ile Ile Ser Gly 2O 25 30

atg gcc Cala a.a. C. Cala toa tat tgg gat toa ata cat toa gat atc gac 1 4 3 Met Ala Glin Asn Glin Ser Trp Asp Ser Ile His Ser Asp Ile Asp 35 40 45

to c cac citg a.a.a. a.a.a. gcc att cca att cgt. gag ccc. gtt to c gtt titc. 191 Ser His Telu Lys Ala Ile Pro Ile Arg Glu Pro Wall Ser Wall Phe 5 O 55 60 gag cca atg cac cac a Ca titt gca cca ccc. a.a.a. to c aCC gCg tog 239 Glu Pro Met His His Teu Thr Phe Ala Pro Pro Lys Ser Thr Ala Ser 65 70 75 gCg ttg ata gcc gcc gag cita gta ggC ggC cac cgg gaa gat 287 Ala Telu Cys Ile Ala Ala Glu Telu Wall Gly Gly His Arg Glu Asp 8O 85 90 95 gca gtt gtg gCg gCg toa gcc att cac cita atg cat gct tot ata tac 335 Ala Wall Wall Ala Ala Ser Ala Ile His Telu Met His Ala Ser Ile Tyr 1 OO 105 110 act cat gag cat citc. cita agg gaa cgg gcc atg ccc. gaa to c aga 383 Thr His Glu His Teu Teu Teu Arg Glu Arg Ala Met Pro Glu Ser Arg 115 120 125

atc. Ca CaC aag titt ggc cc.g aat atc. gag citt Cita act ggc gat ggg 431 Ile Pro His Lys Phe Gly Pro Asn Ile Glu Teu Teu Thr Gly Asp Gly 130 135 1 4 0 titt citg cct titc. ggg titt gag ttg citg gct gga tct gCg a.a. C. cag cita 479 Phe Telu Pro Phe Gly Phe Glu Telu Telu Ala Gly Ser Ala Asn Glin Telu 145 15 O 155

gta aCa act citg alta aat act aag ggit gat cat aga gat cac cc.g agc 527 Wall Thr Thr Telu Ile Asn Thr Lys Gly Asp His Arg Asp His Pro Ser 160 1.65 170 175

cgt. ang tgc angga 544 Arg Xaa Cys

<210> SEQ ID NO 24 <211> LENGTH: 178 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 178 <223> OTHER INFORMATION: Xaa is Lys, Met, Thr, or Arg

<400> SEQUENCE: 24

Gly Ile Gln Leu Ser Leu Tyr Arg Ser Asn Leu Ser Arg Pro Ser Val 1 5 10 15

Ser Pro Ala Pro Ser Ala Tyr Arg Arg Phe Thr Ile Ile Ser Gly Met 20 25 30

Ala Gln Asn Gln Ser Tyr Trp Asp Ser Ile His Ser Asp Ile Asp Ser 35 40 45

His Leu Lys Lys Ala Ile Pro Ile Arg Glu Pro Val Ser Val Phe Glu 50 55 60

Pro Met His His Leu Thr Phe Ala Pro Pro Lys Ser Thr Ala Ser Ala 65 70 75 80

Leu Cys Ile Ala Ala Cys Glu Leu Val Gly Gly His Arg Glu Asp Ala 85 90 95 Val Val Ala Ala Ser Ala Ile His Leu Met His Ala Ser Ile Tyr Thr 100 105 110 His Glu His Leu Leu Leu Arg Glu Arg Ala Met Pro Glu Ser Arg Ile 115 120 125 Pro His Lys Phe Gly Pro Asn Ile Glu Leu Leu Thr Gly Asp Gly Phe 130 135 140 Leu Pro Phe Gly Phe Glu Leu Leu Ala Gly Ser Ala Asn Gln Leu Val 145 150 155 160 Thr Thr Leu Ile Asn Thr Lys Gly Asp His Arg Asp His Pro Ser Arg 165 170 175 Xaa Cys

<210> SEQ ID NO 25 &2 11s LENGTH 586 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (585) <223> OTHER INFORMATION: 14 ppprot1 53 co7 <400 SEQUENCE: 25 ccg aag togt gac cac gtt gca gttc gga acg ggg acg gtc atc aac aag 48 Pro Lys Cys Asp His Val Ala Val Gly Thr Gly Thr Val Ile Asin Lys 1. 5 10 15 cca gCC atc aaa aag tac cag acg gcc acg agg aac cqg gC g aag gac 96 Pro Ala Ile Lys Lys Tyr Glin Thr Ala Thr Arg Asn Arg Ala Lys Asp 2O 25 30 aag att gcc gga gga aag atc atc agg gtt gag goa cac coc att cog 144 Lys Ile Ala Gly Gly Lys Ile Ile Arg Val Glu Ala His Pro Ile Pro 35 40 45 gag cac cca agg cct C gC agg gC g agc gaC aga. gtg gcq tta gtt ggg 192 Glu His Pro Arg Pro Arg Arg Ala Ser Asp Arg Val Ala Lieu Val Gly 5 O 55 60 gac gog gCt gga tac gitg acg aag toc toc ggg gag ggt atc tac titt 240 Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly Glu Gly Ile Tyr Phe 65 70 75 8O gct gct aag tot goa cigc atg togt gct gag gCt att gtg gala ggc to c 288 Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala Ile Val Glu Gly Ser 85 90 95 gcc aac gga act c gt atg att gac gag to a gat titg agg aca tat cita 336 Ala Asn Gly Thr Arg Met Ile Asp Glu Ser Asp Leu Arg Thr Tyr Lieu 100 105 110 gat aaa togg gac aag aag tac togg gca act tac aag gtg citg gac at a 384 Asp Llys Trp Asp Lys Lys Tyr Trp Ala Thr Tyr Lys Val Lieu. Asp Ile 115 120 125 ttg cag aag gtt titc tac agg to c aac cot goc aga gag gCattic gtc 432 Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala Arg Glu Ala Phe Val 130 135 1 4 0 gag at g togc goc gac gac tac gtg caa aag atg acg titt gat agt tat 480 Glu Met Cys Ala Asp Asp Tyr Val Glin Lys Met Thr Phe Asp Ser Tyr 145 15 O 155 160 ttg tac aag gtg gtg gtg cct gga aac coa ttg gac gac citg aag cita 528 Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu Asp Asp Leu Lys Lieu 1.65 170 175 US 2003/O157592 A1 Aug. 21, 2003

-continued gca gtt aac act atc ggg agc citg atc aga gCo aat gca ttg cqc aag 576 Ala Val Asn. Thir Ile Gly Ser Lieu. Ile Arg Ala Asn Ala Lieu Arg Lys 18O 185 190 gag tot gag a 586 Glu Ser Glu 195

<210> SEQ ID NO 26 <211> LENGTH: 195 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 26 Pro Lys Cys Asp His Val Ala Val Gly Thr Gly Thr Val Ile Asn Lys 1 5 10 15 Pro Ala Ile Lys Lys Tyr Gln Thr Ala Thr Arg Asn Arg Ala Lys Asp 20 25 30 Lys Ile Ala Gly Gly Lys Ile Ile Arg Val Glu Ala His Pro Ile Pro 35 40 45 Glu His Pro Arg Pro Arg Arg Ala Ser Asp Arg Val Ala Leu Val Gly 50 55 60 Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly Glu Gly Ile Tyr Phe 65 70 75 80 Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala Ile Val Glu Gly Ser 85 90 95 Ala Asn Gly Thr Arg Met Ile Asp Glu Ser Asp Leu Arg Thr Tyr Leu 100 105 110 Asp Lys Trp Asp Lys Lys Tyr Trp Ala Thr Tyr Lys Val Leu Asp Ile 115 120 125 Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala Arg Glu Ala Phe Val 130 135 140 Glu Met Cys Ala Asp Asp Tyr Val Gln Lys Met Thr Phe Asp Ser Tyr 145 150 155 160 Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu Asp Asp Leu Lys Leu 165 170 175 Ala Val Asn Thr Ile Gly Ser Leu Ile Arg Ala Asn Ala Leu Arg Lys 180 185 190

Glu Ser Glu 195

<210 SEQ ID NO 27 &2 11s LENGTH 655 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (92). . (349) <223> OTHER INFORMATION: 34 ppprot1 092 fo8 rev <400 SEQUENCE: 27 totggacgca totgtgctga ggctattgttg aaggctcc.gc caacggaact c gitatgattg 60 acgagt caga tittgaggaca tat citagata a atg gga caa gaa gta citg gca 112 Met Gly Glin Glu Val Leu Ala 1 5 act tac aag gtg citg gac ata ttg cag aag gtt titc tac agg to c aac 160 Thr Tyr Lys Val Lieu. Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn 10 15 20 US 2003/O157592 A1 Aug. 21, 2003 49

-continued cct gcc aga gag goa titc gito gag at g togc goc gac gac tac gitg caa 208 Pro Ala Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Glin 25 30 35 aag at g acg titt gat agt tat ttg tac aag gtg gtg gtg cct gga aac 256 Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn 40 45 50 55 cca ttg gac gac citg aag cita gCa gtt aac act atc ggg agc citg atc 3O4. Pro Leu Asp Asp Leu Lys Lieu Ala Val Asn. Thir Ile Gly Ser Lieu. Ile 60 65 70 aga gCC aat gca ttg cqc aag gag tot gag aag atg acc gta tag 349 Arg Ala Asn Ala Leu Arg Lys Glu Ser Glu Lys Met Thr Val 75 8O 85 ygtgtggcgct ggaaatcttic to agttgata ttggcc agtic citcc to gaat totaaaattg 4.09 tagtggtata titc.cgaggct cocgggcacg gctctggttt togtaatcaa ttittgacitac 469 catt cattta cittgtagaac agagtaagta to cittittagt atc.ccgggat taggaatgct 529 agataatact ttgcagotaa tittaa.ccggc tict gaattta citaag.cgtoc toc goggttt 589 gacacatcct gaattctaat totcitcagat gttgttc.cct to atggc gala aaaaaaaaaa 649 aaaaaa. 655

<210> SEQ ID NO 28 <211> LENGTH: 85 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens

<400> SEQUENCE: 28 Met Gly Gln Glu Val Leu Ala Thr Tyr Lys Val Leu Asp Ile Leu Gln 1 5 10 15 Lys Val Phe Tyr Arg Ser Asn Pro Ala Arg Glu Ala Phe Val Glu Met 20 25 30 Cys Ala Asp Asp Tyr Val Gln Lys Met Thr Phe Asp Ser Tyr Leu Tyr 35 40 45 Lys Val Val Val Pro Gly Asn Pro Leu Asp Asp Leu Lys Leu Ala Val 50 55 60 Asn Thr Ile Gly Ser Leu Ile Arg Ala Asn Ala Leu Arg Lys Glu Ser 65 70 75 80 Glu Lys Met Thr Val 85

<210 SEQ ID NO 29 &2 11s LENGTH 604 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (22). . (603) <223> OTHER INFORMATION: 83 ppprot1 056 foe &220s FEATURE <221 NAME/KEY: misc feature &222> LOCATION 595 <223> OTHER INFORMATION: n is g, a t, or c <400 SEQUENCE: 29 gtocatcttgt gcggggcctg a gac att gcg aga cat tot gca gtc at g gCt 51 Asp Ile Ala Arg His Ser Ala Wal Met Ala 1 5 10 tot citc cag goc gtt atc acc got toc cot goc toc titc got gog to c 99 US 2003/O157592 A1 Aug. 21, 2003 50


Ser Leu Glin Ala Wall Ile Thr Ala Ser Pro Ala Ser Phe Ala Ala Ser 15 2O 25 tot aga gC c gtc. tcc tocc cac to g gag act gct gcc gito ttg gtg cct 147 Ser Arg Ala Val Ser Ser His Ser Glu Thr Ala Ala Val Leu Val Pro 30 35 40 tgc gcc agc att toc toc cqa ggc gtg agc act tct toc ct g g g c titt 195 Cys Ala Ser Ile Ser Ser Arg Gly Val Ser Thr Ser Cys Leu Gly Phe 45 5 O 55 gtt goc toc agc ggg cqt aat gct tcg ttg aag toc titc gag ggc titg 243 Val Ala Ser Ser Gly Arg Asn Ala Ser Lieu Lys Ser Phe Glu Gly Lieu 60 65 70 agg ggt ttgaat gcc agt gga ccc acc toc goc gtg gag agc citg aag 291 Arg Gly Lieu. Asn Ala Ser Gly Pro Thir Ser Ala Val Glu Ser Lieu Lys 75 8O 85 90 gcc gag aga aga agc aat gtg gtt gala gaa gCo gga tac cag cot citt 339 Ala Glu Arg Arg Ser Asn Val Val Glu Glu Ala Gly Tyr Glin Pro Leu 95 100 105 Cgg gtg tat gCC gcg agg gga agt aaa aag att gag ggg cqa aag ttg 387 Arg Val Tyr Ala Ala Arg Gly Ser Lys Lys Ile Glu Gly Arg Lys Lieu 110 115 120 cga gtg gCa gtt gtc gga ggt ggC cct gcc ggt gga togc gct gCg gag 435 Arg Val Ala Val Val Gly Gly Gly Pro Ala Gly Gly Cys Ala Ala Glu 125 130 135 act citt gcc aag g g c ga att gag acattt citc att gag cqa aag ttg 483 Thr Lieu Ala Lys Gly Gly Ile Glu Thir Phe Lieu. Ile Glu Arg Lys Lieu 14 O 145 15 O gat aat gct aag cca tot gig gga gct att coc citt toc at g g to gga 531 Asp Asn Ala Lys Pro Cys Gly Gly Ala Ile Pro Lieu. Cys Met Val Gly 155 160 1.65 170 gala titc gac citg cc.g. ccc gaa att atc gac cqc aaa gtg acg aag atg 579 Glu Phe Asp Leu Pro Pro Glu Ile Ile Asp Arg Lys Val Thr Lys Met 175 18O 185 aaa atg att tog cct tmc aat gtt t 604 Lys Met Ile Ser Pro Xaa Asn Val 190

<210> SEQ ID NO 30 <211> LENGTH: 194 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 192 <223> OTHER INFORMATION: Xaa is Tyr, Phe, Cys, or Ser. <400> SEQUENCE: 30 Asp Ile Ala Arg His Ser Ala Val Met Ala Ser Leu Gln Ala Val Ile 1 5 10 15 Thr Ala Ser Pro Ala Ser Phe Ala Ala Ser Ser Arg Ala Val Ser Ser 20 25 30 His Ser Glu Thr Ala Ala Val Leu Val Pro Cys Ala Ser Ile Ser Ser 35 40 45 Arg Gly Val Ser Thr Ser Cys Leu Gly Phe Val Ala Ser Ser Gly Arg 50 55 60 Asn Ala Ser Leu Lys Ser Phe Glu Gly Leu Arg Gly Leu Asn Ala Ser 65 70 75 80 Gly Pro Thr Ser Ala Val Glu Ser Leu Lys Ala Glu Arg Arg Ser Asn 85 90 95


Val Val Glu Glu Ala Gly Tyr Gln Pro Leu Arg Val Tyr Ala Ala Arg 100 105 110 Gly Ser Lys Lys Ile Glu Gly Arg Lys Leu Arg Val Ala Val Val Gly 115 120 125 Gly Gly Pro Ala Gly Gly Cys Ala Ala Glu Thr Leu Ala Lys Gly Gly 130 135 140 Ile Glu Thr Phe Leu Ile Glu Arg Lys Leu Asp Asn Ala Lys Pro Cys 145 150 155 160 Gly Gly Ala Ile Pro Leu Cys Met Val Gly Glu Phe Asp Leu Pro Pro 165 170 175 Glu Ile Ile Asp Arg Lys Val Thr Lys Met Lys Met Ile Ser Pro Xaa 180 185 190

Asn Val

<210> SEQ ID NO 31 &2 11s LENGTH 604 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (19) . . (348) <223> OTHER INFORMATION: 23 ppprot1 071 d03 rev <400 SEQUENCE: 31 tggacgcatg totgct ga ggc tat tdt gala ggc toc goc aac gga act cqt 51 Gly Tyr Cys Glu Gly Ser Ala Asn Gly Thr Arg 1 5 10 atg att gac gag to a gat ttg agg aca tat cita gat aaa togg gac aag 99 Met Ile Asp Glu Ser Asp Leu Arg Thr Tyr Lieu. Asp Lys Trp Asp Lys 15 20 25 aag tac togg gca act tac aag gtg citg gac ata ttg cag aag gtt titc 147 Lys Tyr Trp Ala Thr Tyr Lys Val Lieu. Asp Ile Leu Gln Lys Val Phe 30 35 40 tac agg to c aac cct gcc aga gag gCattic gtc. gag atg togc gcc gac 195 Tyr Arg Ser Asn. Pro Ala Arg Glu Ala Phe Val Glu Met Cys Ala Asp 45 50 55 gac tac gtg caa aag atg acg titt gat agt tat ttg tac aag gtg gtg 243 Asp Tyr Val Glin Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val 60 65 70 75 gtg cct gga aac cca ttg gac gac citg aag cita gca gtt aac act atc 291 Val Pro Gly Asn Pro Leu Asp Asp Lieu Lys Lieu Ala Wall Asn. Thir Ile 8O 85 9 O ggg agc citg atc aga gcc aat gca ttg cqc aag gag tot gag aag atg 339 Gly Ser Lieu. Ile Arg Ala Asn Ala Lieu Arg Lys Glu Ser Glu Lys Met 95 100 105 acc gta tag gtgtggcgct ggaaatcttic to agttgata ttggcc agtc 388 Thir Wall citcc to gaat totaaaattg tagtggtata titc.cgaggct cocgggcacg gctctggttt 4 48 tggtaatcaa ttittgactac cattcattta cittgtagaac agagtaagta toctitttagt 508 atcc.cgggat taggaatgct agataatact ttgcagotaa tittaa.ccggc tict gaattta 568 citaag.cgt.cc toc goggttt gacaaaaaaa aaaaaa 604

<210> SEQ ID NO 32 <211> LENGTH: 109 <212> TYPE: PRT

<213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 32 Gly Tyr Cys Glu Gly Ser Ala Asn Gly Thr Arg Met Ile Asp Glu Ser 1 5 10 15 Asp Leu Arg Thr Tyr Leu Asp Lys Trp Asp Lys Lys Tyr Trp Ala Thr 20 25 30 Tyr Lys Val Leu Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro 35 40 45 Ala Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Gln Lys 50 55 60 Met Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn Pro 65 70 75 80 Leu Asp Asp Leu Lys Leu Ala Val Asn Thr Ile Gly Ser Leu Ile Arg 85 90 95 Ala Asn Ala Leu Arg Lys Glu Ser Glu Lys Met Thr Val 100 105

<210> SEQ ID NO 33 <211> LENGTH: 620 <212> TYPE: DNA <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: CDS <222> LOCATION: (2)..(472) <223> OTHER INFORMATION: 70 mb1 D11rev

<400 SEQUENCE: 33 g gCt cat coa att coa gag cac cot agg cct c go agg gcg agt aac cqg 49 Ala His Pro Ile Pro Glu His Pro Arg Pro Arg Arg Ala Ser Asn Arg 1 5 10 15 gtg gC g ttg atc ggg gat gcg gCa ggg tat gtt acc aag toc tot ggg 97 Val Ala Lieu. Ile Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly 2O 25 30 gag gga att tac titc gct gcc aag to c ggg cqc atg tdt gct gag gC g 145 Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala 35 40 45 atc gtg gag gga toc goc aat ggit act cqc at g g to gac gala to a gac 193 Ile Val Glu Gly Ser Ala Asn Gly Thr Arg Met Val Asp Glu Ser Asp 5 O 55 60 ttg aga aca tac citg gala aag togg gat aag aag tac tog gCC aca tat 241 Leu Arg Thr Tyr Lieu Glu Lys Trp Asp Llys Lys Tyr Trp Ala Thr Tyr 65 70 75 8O aag gtg ttg gac att citt cag aag gtt titc tac aga. tcg aac cot go c 289 Lys Val Lieu. Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala 85 90 95 cga gag gC g titc gtg gag atg togc gcc gat gac tat gtg cag aag atg 337 Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Glin Lys Met 100 105 110 acg titc gac agc tat citg tac aag gtg gtg gtg cct gga aac coa ttg 385 Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu 115 120 125 gac gac atc aag titg gca atc aac aca atc ggg agt titg att aga gC c 433 Asp Asp Ile Lys Lieu Ala Ile Asn. Thir Ile Gly Ser Lieu. Ile Arg Ala 130 135 1 4 0 aac goc ttg cqc aag gag tog gag aag at g acc gtg tag ggittagggitt 482 Asn Ala Lieu Arg Lys Glu Ser Glu Lys Met Thr Val 145 15 O 155 US 2003/O157592 A1 Aug. 21, 2003

-continued cittatc.cgtt gatact gcct agacitttctg gttittataca attcgtagaa goacgttcgg 542 aggttcct ga gcttgggitat gtatttgtca atc cattgttg atgactic to a titcacttgta 6O2 aaacaggaca tottatct 62O

<210> SEQ ID NO 34 <211> LENGTH: 156 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 34 Ala His Pro Ile Pro Glu His Pro Arg Pro Arg Arg Ala Ser Asn Arg 1 5 10 15 Val Ala Leu Ile Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly 20 25 30 Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala 35 40 45 Ile Val Glu Gly Ser Ala Asn Gly Thr Arg Met Val Asp Glu Ser Asp 50 55 60 Leu Arg Thr Tyr Leu Glu Lys Trp Asp Lys Lys Tyr Trp Ala Thr Tyr 65 70 75 80 Lys Val Leu Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala 85 90 95 Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Gln Lys Met 100 105 110 Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu 115 120 125 Asp Asp Ile Lys Leu Ala Ile Asn Thr Ile Gly Ser Leu Ile Arg Ala 130 135 140 Asn Ala Leu Arg Lys Glu Ser Glu Lys Met Thr Val 145 150 155

<210 SEQ ID NO 35 &2 11s LENGTH 637 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) ... (394) <223> OTHER INFORMATION: 84 ppprot1 36 F12 rev <400 SEQUENCE: 35 c gtg acg aag togc ticc ggg gag ggt atc tac titt gct gct aag tot gga 49 Val Thr Lys Cys Ser Gly Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly 1 5 10 15 cgc at g togt gct gag gCt att gtg gala ggc tocc gcc aac gga act cqt 97 Arg Met Cys Ala Glu Ala Ile Val Glu Gly Ser Ala Asn Gly. Thir Arg 2O 25 30 atg att gac gag to a gat ttg agg aca tat cita gat aaa togg gac aag 145 Met Ile Asp Glu Ser Asp Leu Arg Thr Tyr Lieu. Asp Lys Trp Asp Lys 35 40 45 aag tac togg gca act tac aag gtg citg gac ata ttg cag aag gtt titc 193 Lys Tyr Trp Ala Thr Tyr Lys Val Lieu. Asp Ile Leu Gln Lys Val Phe 5 O 55 60 tac agg to c aac cct gcc aga gag gCattic gtc. gag atg togc gcc gac 241 Tyr Arg Ser Asn. Pro Ala Arg Glu Ala Phe Val Glu Met Cys Ala Asp 65 70 75 8O US 2003/O157592 A1 Aug. 21, 2003

-continued gac tac gtg caa aag atg acg titt gat agt tat ttg tac aag gtg gtg 289 Asp Tyr Val Glin Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val 85 90 95 gtg cct gga aac cca ttg gac gac citg aag cita gca gtt aac act atc 337 Val Pro Gly Asn Pro Leu Asp Asp Lieu Lys Lieu Ala Wall Asn. Thir Ile 100 105 110 ggg agc citg atc aga gcc aat gca ttg cqc aag gag tot gag aag atg 385 Gly Ser Lieu. Ile Arg Ala Asn Ala Lieu Arg Lys Glu Ser Glu Lys Met 115 120 125 acc gta tag gtgtggcgct ggaaatcttic to agttgata ttggcc agtc 434 Thir Wall 130 citcc to gaat totaaaattg tagtggtata titc.cgaggct cocgggcacg gctctggttt 494 tggtaatcaa ttittgactac cattcattta cittgtagaac agagtaagta toctitttagt 554 atcc.cgggat taggaatgct agataatact ttgcagotaa tittaa.ccggc tict gaattta 614 citaag.cgt.cc toc goggttt gac 637

<210> SEQ ID NO 36 <211> LENGTH: 130 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 36 Val Thr Lys Cys Ser Gly Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly 1 5 10 15 Arg Met Cys Ala Glu Ala Ile Val Glu Gly Ser Ala Asn Gly Thr Arg 20 25 30 Met Ile Asp Glu Ser Asp Leu Arg Thr Tyr Leu Asp Lys Trp Asp Lys 35 40 45 Lys Tyr Trp Ala Thr Tyr Lys Val Leu Asp Ile Leu Gln Lys Val Phe 50 55 60 Tyr Arg Ser Asn Pro Ala Arg Glu Ala Phe Val Glu Met Cys Ala Asp 65 70 75 80 Asp Tyr Val Gln Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val 85 90 95 Val Pro Gly Asn Pro Leu Asp Asp Leu Lys Leu Ala Val Asn Thr Ile 100 105 110 Gly Ser Leu Ile Arg Ala Asn Ala Leu Arg Lys Glu Ser Glu Lys Met 115 120 125

Thr Val 130

<210 SEQ ID NO 37 &2 11s LENGTH 519 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (3) . . (515) <223> OTHER INFORMATION: 27 mm& 55 EO2rev <400 SEQUENCE: 37 ct cot gcg gtg ttg gala gtc gat gct gta att gga gCt gac ggit gcc 47 Pro Ala Wall Leu Glu Val Asp Ala Val Ile Gly Ala Asp Gly Ala 1 5 10 15 US 2003/O157592 A1 Aug. 21, 2003

-continued aac agc agg gtg gcc aag gac att gac got ggit gag tac gac tac goc 95 Asn Ser Arg Val Ala Lys Asp Ile Asp Ala Gly Glu Tyr Asp Tyr Ala 2O 25 30 atc got titc caa gaa agg att aag att cot gag gat aag at g gag tac 1 4 3 Ile Ala Phe Glin Glu Arg Ile Lys Ile Pro Glu Asp Lys Met Glu Tyr 35 40 45 tat gag aac ttg gca gag atg tat gtc. g.gt gac gat gtg tog cca gac 191 Tyr Glu Asn Lieu Ala Glu Met Tyr Val Gly Asp Asp Wal Ser Pro Asp 5 O 55 60 titc tac ggg togg gtg titc ccg aag togt gac cac gtt gca gtc gga acg 239 Phe Tyr Gly Trp Val Phe Pro Lys Cys Asp His Val Ala Val Gly. Thr 65 70 75 ggg acg gtc atc aac aag cca gCC atc aaa aag tac cag acg gcc acg 287 Gly Thr Val Ile Asn Lys Pro Ala Ile Llys Lys Tyr Gln Thr Ala Thr 8O 85 90 95 agg aac cqg gC g aag gac aag att gcc gga gga aag atc atc agg gtt 335 Arg Asn Arg Ala Lys Asp Lys Ile Ala Gly Gly Lys Ile Ile Arg Val 1 OO 105 110 gag gCa cac coc att cog gag cac coa agg cct c gc agg gcg agc gac 383 Glu Ala His Pro Ile Pro Glu. His Pro Arg Pro Arg Arg Ala Ser Asp 115 120 125 aga gtg gC g tta gtt gog gac gog gCt gga tac gtg acg aag toc to c 431 Arg Val Ala Lieu Val Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser 130 135 1 4 0 ggg gag ggt atc tac titt gct gct aag tot gga cqc atg togt gct gag 479 Gly Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu 145 150 155 gct att gtg gala gct cog cca acg gaa citc gta toga ttga 519 Ala Ile Wall Glu Ala Pro Pro Thr Glu Lieu Wall 160 1.65 170

<210> SEQ ID NO 38 <211> LENGTH: 170 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 38 Pro Ala Val Leu Glu Val Asp Ala Val Ile Gly Ala Asp Gly Ala Asn 1 5 10 15 Ser Arg Val Ala Lys Asp Ile Asp Ala Gly Glu Tyr Asp Tyr Ala Ile 20 25 30 Ala Phe Gln Glu Arg Ile Lys Ile Pro Glu Asp Lys Met Glu Tyr Tyr 35 40 45 Glu Asn Leu Ala Glu Met Tyr Val Gly Asp Asp Val Ser Pro Asp Phe 50 55 60 Tyr Gly Trp Val Phe Pro Lys Cys Asp His Val Ala Val Gly Thr Gly 65 70 75 80 Thr Val Ile Asn Lys Pro Ala Ile Lys Lys Tyr Gln Thr Ala Thr Arg 85 90 95 Asn Arg Ala Lys Asp Lys Ile Ala Gly Gly Lys Ile Ile Arg Val Glu 100 105 110 Ala His Pro Ile Pro Glu His Pro Arg Pro Arg Arg Ala Ser Asp Arg 115 120 125 Val Ala Leu Val Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly 130 135 140 Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala


145 150 155 160

Ile Val Glu Ala Pro Pro Thr Glu Leu Val 165 170

<210 SEQ ID NO 39 &2 11s LENGTH 602 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) ... (328) <223> OTHER INFORMATION: 54 ppprot1 081 a 12 rev <400 SEQUENCE: 39 t att gtg gala ggc ticc gcc aac gga act cqt atg att gac gag to a gat 49 Ile Val Glu Gly Ser Ala Asn Gly Thr Arg Met Ile Asp Glu Ser Asp 1 5 10 15 ttg agg aca tat cita gat aaa togg gac aag aag tac tog gCa act tac 97 Leu Arg Thr Tyr Lieu. Asp Lys Trp Asp Llys Lys Tyr Trp Ala Thr Tyr 2O 25 30 aag gtg citg gac ata ttg cag aag gtt titc tac agg toc aac cot go c 145 Lys Val Lieu. Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala 35 40 45 aga gag gCattic gtc. gag atg togc gcc gac gac tac gtg caa aag atg 193 Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Glin Lys Met 5 O 55 60 acg titt gat agt tat ttg tac aag gtg gtg gtg cct gga aac coa ttg 241 Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu 65 70 75 8O gac gac citg aag cita gca gtt aac act atc ggg agc ctd atc aga gC c 289 Asp Asp Leu Lys Lieu Ala Val Asn. Thir Ile Gly Ser Lieu. Ile Arg Ala 85 90 95 aat go a ttg cqc aag gag tot gag aag at g acc gta tag gtgtgg.cgct 338 Asn Ala Lieu Arg Lys Glu Ser Glu Lys Met Thr Val 100 105 ggaaatctitc. tcagttgata ttggc.cagtc. citcctggaat totaaaattig tagtggtata 398 titcc gaggct cocggg cacg gctctggttt togtaatcaa ttittgacitac cattcattta 458 cittgtagaac agagtaagta to cittittagt atc.ccgggat taggaatgct agataatact 518 ttgcagotaa tittaa.ccggc tict gaattta citaag.cgtoc toc goggttt gacacatcct 578 gaattctaat tctotcag at gttg 6O2

<210> SEQ ID NO 40 <211> LENGTH: 108 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 40 Ile Val Glu Gly Ser Ala Asn Gly Thr Arg Met Ile Asp Glu Ser Asp 1 5 10 15 Leu Arg Thr Tyr Leu Asp Lys Trp Asp Lys Lys Tyr Trp Ala Thr Tyr 20 25 30 Lys Val Leu Asp Ile Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala 35 40 45 Arg Glu Ala Phe Val Glu Met Cys Ala Asp Asp Tyr Val Gln Lys Met 50 55 60 Thr Phe Asp Ser Tyr Leu Tyr Lys Val Val Val Pro Gly Asn Pro Leu


-continued <222> LOCATION: (86).. (502) <223> OTHER INFORMATION: 25 mm.18 eC1rev <400 SEQUENCE: 43 tgataataca taaattagtt coaaaaatca taaga gagga atacaagaca atatacgact 60 aaaacaaata catccataac aatga cca cog gCa at g g to acc tot gta cot 112 Pro Pro Ala Met Wall Thir Ser Wall Pro 1 5 act tc g g g c aca ata tat att gag aac ttg gCa gag atg tat gtc ggit 160 Thr Ser Gly Thr Ile Tyr Ile Glu Asn Leu Ala Glu Met Tyr Val Gly 10 15 2O 25 gac gat gtg tog cca gac titc tac ggg togg gtg titc ccg aag togt gac 208 Asp Asp Wal Ser Pro Asp Phe Tyr Gly Trp Val Phe Pro Lys Cys Asp 30 35 40 cac gtt gca gtc gga acg g g g acg gtc atc aac aag cca gCC atc aaa. 256 His Val Ala Val Gly Thr Gly Thr Val Ile Asn Lys Pro Ala Ile Lys 45 5 O 55 aag tac Cag acg gcc acg agg aac cqg gC g aag gaC aag att gCC gga 3O4. Lys Tyr Glin Thr Ala Thr Arg Asn Arg Ala Lys Asp Lys Ile Ala Gly 60 65 70 gga aag atc atc agg gtt gag gCa cac coc att cog gag cac coa agg 352 Gly Lys Ile Ile Arg Val Glu Ala His Pro Ile Pro Glu His Pro Arg 75 8O 85 Cct cqC agg gC g agc gaC aga. gtg gC g tta gtt ggg gaC gCg gct gga 400 Pro Arg Arg Ala Ser Asp Arg Val Ala Lieu Val Gly Asp Ala Ala Gly 90 95 1 OO 105 tac gitg acg aag togc ticc ggg gag ggt atc tac titt gct gct aag tot 4 48 Tyr Val Thr Lys Cys Ser Gly Glu Gly Ile Tyr Phe Ala Ala Lys Ser 110 115 120 gga cqc at g tot gct gag cita ttg tog aag gCt c cq cca acg gaa citc 496 Gly Arg Met Cys Ala Glu Lieu Lleu Trp Lys Ala Pro Pro Thr Glu Lieu 125 130 135 gta to a ttgacgagtic agatttgagg acat atctag ataaatggga caagaag 549 Wall

<210> SEQ ID NO 44 <211> LENGTH: 138 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 44 Pro Pro Ala Met Val Thr Ser Val Pro Thr Ser Gly Thr Ile Tyr Ile 1 5 10 15 Glu Asn Leu Ala Glu Met Tyr Val Gly Asp Asp Val Ser Pro Asp Phe 20 25 30 Tyr Gly Trp Val Phe Pro Lys Cys Asp His Val Ala Val Gly Thr Gly 35 40 45 Thr Val Ile Asn Lys Pro Ala Ile Lys Lys Tyr Gln Thr Ala Thr Arg 50 55 60 Asn Arg Ala Lys Asp Lys Ile Ala Gly Gly Lys Ile Ile Arg Val Glu 65 70 75 80 Ala His Pro Ile Pro Glu His Pro Arg Pro Arg Arg Ala Ser Asp Arg 85 90 95 Val Ala Leu Val Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly 100 105 110 Glu Gly Ile Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Leu


115 120 125 Leu Trp Lys Ala Pro Pro Thr Glu Leu Val 130 135

<210> SEQ ID NO 45 &2 11s LENGTH 2.74 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (273) <223> OTHER INFORMATION: 80 bc.09 f1 Orew &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: 1 - 274 <223> OTHER INFORMATION: n is g, a c or t <400 SEQUENCE: 45 agt tot cag titt cat tct citg aac aat acg gat tca gtt coc aat aac 48 Ser Ser Glin Phe His Ser Leu Asn Asn Thr Asp Ser Val Pro Asn Asn 1 5 10 15 agt cat ttg gCa anc aca tat tdt gca ttg gCt ata ttg aag aca gtt 96 Ser His Leu Ala Xala Thr Tyr Cys Ala Lieu Ala Ile Leu Lys Thr Val 2O 25 30 ggit tat gac ttn to a citt att gac tot cqg toa ata tat aag toa atg 144 Gly Tyr Asp Xala Ser Lieu. Ile Asp Ser Arg Ser Ile Tyr Lys Ser Met 35 40 45 aaa cat citt caa caa cct gat ggc agt titc atg cct att cat aca gga 192 Lys His Leu Gln Gln Pro Asp Gly Ser Phe Met Pro Ile His Thr Gly 5 O 55 60 gca gag acc gat tta cng ttn gtin tat tdt gct gct gtc nitt tot cot 240 Ala Glu Thr Asp Leu Xaa Xaa Val Tyr Cys Ala Ala Val Xaa Ser Pro 65 70 75 8O cita ttg gat aat togg agt gga at g gat naa gac a 274 Leu Lieu. Asp Asn Trp Ser Gly Met Asp Xaa Asp 85 90

<210> SEQ ID NO 46 <211> LENGTH: 91 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 21 <223> OTHER INFORMATION: Xaa is Asn, Ile, Ser, or Thr. <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 36 <223> OTHER INFORMATION: Xaa is Phe or Leu. <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 70 <223> OTHER INFORMATION: Xaa is Gln, Leu, Pro, or Arg <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 71 <223> OTHER INFORMATION: Phe or Leu <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 78 <223> OTHER INFORMATION: Xaa is Ile, Leu, Val or Phe <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 90 <223> OTHER INFORMATION: Xaa is Asn, His, Glu or Stop. <400> SEQUENCE: 46


Ser Ser Gln Phe His Ser Leu Asn Asn Thr Asp Ser Val Pro Asn Asn 1 5 10 15 Ser His Leu Ala Xaa Thr Tyr Cys Ala Leu Ala Ile Leu Lys Thr Val 20 25 30 Gly Tyr Asp Xaa Ser Leu Ile Asp Ser Arg Ser Ile Tyr Lys Ser Met 35 40 45 Lys His Leu Gln Gln Pro Asp Gly Ser Phe Met Pro Ile His Thr Gly 50 55 60 Ala Glu Thr Asp Leu Xaa Xaa Val Tyr Cys Ala Ala Val Xaa Ser Pro 65 70 75 80 Leu Leu Asp Asn Trp Ser Gly Met Asp Xaa Asp 85 90

<210> SEQ ID NO 47 &2 11s LENGTH 488 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) ... (247) <223> OTHER INFORMATION: 78 ppprot1 087 e12 rev <400 SEQUENCE: 47 g tog gac tac gtc. tcc ata gcc aaa gac tta ggc ctd cag gat atc aag 49 Ser Asp Tyr Val Ser Ile Ala Lys Asp Leu Gly Lieu Glin Asp Ile Lys 1 5 10 15 agc gag gac to g toc gag tac gitg acg ccc titc togg cca gC g g to atg 97 Ser Glu Asp Trp Ser Glu Tyr Val Thr Pro Phe Trp Pro Ala Val Met 2O 25 30 aaa acc gcc ttg toc atg gaa ggg citg gtg gga citg gtc. aag to c ggc 145 Lys Thr Ala Leu Ser Met Glu Gly Lieu Val Gly Lieu Val Lys Ser Gly 35 40 45 tgg act act at g aaa gqa gct titc gcc at g acg citc atg atc cag ggc 193 Trp Thr Thr Met Lys Gly Ala Phe Ala Met Thr Leu Met Ile Glin Gly 5 O 55 60 tac cag cqa ggg citc att aaa titc gct gcc atc act toc agg aag cqg 241 Tyr Glin Arg Gly Lieu. Ile Lys Phe Ala Ala Ile Thr Cys Arg Lys Arg 65 70 75 8O gat tda cog actgatt cagtcctitcc to atttctoa to acatcatg gacaatgtcg 297 Asp caa.ccgatta cattctitatg ccagtgagga atggttgcgt ggtttctggit aatcgtoaag 357 cittcggagta taagggattg aggtotcc.gc tagtag actt tactato go a tattoaiacca 417 totgtacctt gagggagtaa toaccalattc gtgcatacat cattcggcaa aagat cattg 477 gacgtoaaaa a 488

<210> SEQ ID NO 48
<211> LENGTH: 81
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 48
Ser Asp Tyr Val Ser Ile Ala Lys Asp Leu Gly Leu Gln Asp Ile Lys
1 5 10 15
Ser Glu Asp Trp Ser Glu Tyr Val Thr Pro Phe Trp Pro Ala Val Met
20 25 30
Lys Thr Ala Leu Ser Met Glu Gly Leu Val Gly Leu Val Lys Ser Gly
35 40 45
Trp Thr Thr Met Lys Gly Ala Phe Ala Met Thr Leu Met Ile Gln Gly
50 55 60
Tyr Gln Arg Gly Leu Ile Lys Phe Ala Ala Ile Thr Cys Arg Lys Arg
65 70 75 80
Asp

<210 SEQ ID NO 49 &2 11s LENGTH 619 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) . . (508) <223> OTHER INFORMATION: 78 ppprot1 092 e12rev <400 SEQUENCE: 49 a to g atc goc aga aaa tot gca gtc gag titt gaa gtt gogg gat tdc acc 49 Ser Ile Ala Arg Lys Cys Ala Val Glu Phe Glu Val Gly Asp Cys Thr 1 5 10 15 aag att aat tac cct cac gca tot titt gat gto atc tac agt cgt. gat 97 Lys Ile Asn Tyr Pro His Ala Ser Phe Asp Wall Ile Ser Arg Asp 2O 25 30 acc att cita cac att Cala gat a.a.a. cct gCg citt titt Cala cgg titt tat 145 Thr Ile Telu His Ile Glin Asp Lys Pro Ala Teu Phe Glin Arg Phe Tyr 35 40 45 a.a.a. tgg ttg aag cct gga ggit cgg gtg citt atc agt gac tac tgt aga 193 Trp Telu Lys Pro Gly Gly Arg Wall Telu Ile Ser Asp Cys Arg 5 O 55 60 gct cca Cala act cc.g tog gCg titc. gct gca tac att cag cag agg 241 Ala Pro Glin Thr Pro Ser Ala Phe Ala Ala Ile Glin Glin Arg 65 70 75 8O ggit tat gat citc. cat agc gtt aag tac gga gag atg citg gaa gat 289 Gly Asp Telu His Ser Wall Lys Tyr Gly Glu Met Teu Glu Asp 85 90 95 gcc ggit titt gtg gaa gtg gto gag gac cgc acg gat cag titc. att 337 Ala Gly Phe Wall Glu Wall Wall Glu Asp Arg Thr Asp Glin Phe Ile 100 105 110 gaa gtg tta cag agg gag cita acc act gaa gca ggit cgt. gac cag 385 Glu Wall Telu Glin Arg Glu Teu Thr Thr Glu Ala Gly Arg Asp Glin 115 125 titc. atc a.a. C. gat titc. to c gag gat tat aaC tac att gtg agc gga 433 Phe Ile Asn Asp Phe Ser Glu Asp Asn Tyr Ile Wall Ser Gly 130 135 1 4 0 tgg aag agt aag citg aag tgt tog aat gac gaa cag aag tgg gga 481 Trp Lys Ser Lys Teu Lys Arg Cys Ser Asn Asp Glu Glin Tys Trp Gly 145 15 O 155 160 citc. titc. ata gcc tac aag gca tta tottgaaatt attitcggata 528 Teu Phe Ile Ala Tyr Lys Ala Telu 1.65 tagataaaac agcattgttg gaatagttca cacttgagag totgttttgt cittcttataa 588 ataaacatcg atactattoa cocacttaaa a 619

<210> SEQ ID NO 50
<211> LENGTH: 168
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 50
Ser Ile Ala Arg Lys Cys Ala Val Glu Phe Glu Val Gly Asp Cys Thr
1 5 10 15
Lys Ile Asn Tyr Pro His Ala Ser Phe Asp Val Ile Tyr Ser Arg Asp
20 25 30
Thr Ile Leu His Ile Gln Asp Lys Pro Ala Leu Phe Gln Arg Phe Tyr
35 40 45
Lys Trp Leu Lys Pro Gly Gly Arg Val Leu Ile Ser Asp Tyr Cys Arg
50 55 60
Ala Pro Gln Thr Pro Ser Ala Glu Phe Ala Ala Tyr Ile Gln Gln Arg
65 70 75 80
Gly Tyr Asp Leu His Ser Val Gln Lys Tyr Gly Glu Met Leu Glu Asp
85 90 95
Ala Gly Phe Val Glu Val Val Ala Glu Asp Arg Thr Asp Gln Phe Ile
100 105 110
Glu Val Leu Gln Arg Glu Leu Ala Thr Thr Glu Ala Gly Arg Asp Gln
115 120 125
Phe Ile Asn Asp Phe Ser Glu Glu Asp Tyr Asn Tyr Ile Val Ser Gly
130 135 140
Trp Lys Ser Lys Leu Lys Arg Cys Ser Asn Asp Glu Gln Lys Trp Gly
145 150 155 160
Leu Phe Ile Ala Tyr Lys Ala Leu
165

<210> SEQ ID NO 51
<211> LENGTH: 563
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(563)
<223> OTHER INFORMATION: O5 ck 19 a)3

<400 SEQUENCE: 51 tg togc gcc toc acc aca gito cot acg agg att tat gat gga gtg gcg 47 Cys Ala Ser Thr Thr Val Pro Thr Arg Ile Tyr Asp Gly Val Ala 1 5 10 15 gag gac caa gag gat tac atc aag gCt ggt gga gaa gag ttg gat citc 95 Glu Asp Glin Glu Asp Tyr Ile Lys Ala Gly Gly Glu Glu Lieu. Asp Lieu 2O 25 30 gtg cag citg cag goc toc aag to c titt gat cag toc aag att ggg gag 1 4 3 Val Glin Leu Glin Ala Ser Lys Ser Phe Asp Glin Ser Lys Ile Gly Glu 35 40 45 aag tta caa citt citg gga gac gala acg tta gat ttg gta gtt gta ggc 191 Lys Lieu Gln Leu Lleu Gly Asp Glu Thir Lieu. Asp Leu Val Val Val Gly 5 O 55 60 tgc ggit cot got goga atg togc titg gCa gct gaa goa gcg aaa cag ggc 239 Cys Gly Pro Ala Gly Met Cys Lieu Ala Ala Glu Ala Ala Lys Glin Gly 65 70 75 citt aat gt g g g c citc gta ggc cct gac cita cog titc gtc. aac aat tat 287 Lieu. Asn Val Gly Lieu Val Gly Pro Asp Leu Pro Phe Val Asn. Asn Tyr 8O 85 90 95 ggit gtt togg act gac gaa titt got gca ttg ggc citc gag gac toc at a 335 Gly Val Trp Thr Asp Glu Phe Ala Ala Leu Gly Lieu Glu Asp Cys Ile 1 OO 105 110 US 2003/O157592 A1 Aug. 21, 2003

-continued gag caa acc tog aaa gac to a gCt atg tat att gaa gag gac to g cct 383 Glu Glin Thir Trp Lys Asp Ser Ala Met Tyr Ile Glu Glu Asp Ser Pro 115 120 125 ata at g at a ggg cqt gca tat ggit cqt gtg agt cqg act citt citg aga 431 Ile Met Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Thr Lieu Lieu Arg 130 135 1 4 0 gala gag Ctt citg agg agg togC gct gag gga ggg gtt aga tac gtt gat 479 Glu Glu Lieu Lieu Arg Arg Cys Ala Glu Gly Gly Val Arg Tyr Val Asp 145 15 O 155 tot aaa gtt gac agg ata citt gala gtc gat gag gat ttg agt acc gtt 527 Ser Lys Val Asp Arg Ile Leu Glu Val Asp Glu Asp Leu Ser Thr Val 160 1.65 170 175 cita toc acc aat gga aaa aat atc aag agc aga citt 563 Lieu. Cys Thr Asn Gly Lys Asn. Ile Lys Ser Arg Lieu 18O 185

<210> SEQ ID NO 52
<211> LENGTH: 187
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 52
Cys Ala Ser Thr Thr Val Pro Thr Arg Ile Tyr Asp Gly Val Ala Glu
1 5 10 15
Asp Gln Glu Asp Tyr Ile Lys Ala Gly Gly Glu Glu Leu Asp Leu Val
20 25 30
Gln Leu Gln Ala Ser Lys Ser Phe Asp Gln Ser Lys Ile Gly Glu Lys
35 40 45
Leu Gln Leu Leu Gly Asp Glu Thr Leu Asp Leu Val Val Val Gly Cys
50 55 60
Gly Pro Ala Gly Met Cys Leu Ala Ala Glu Ala Ala Lys Gln Gly Leu
65 70 75 80
Asn Val Gly Leu Val Gly Pro Asp Leu Pro Phe Val Asn Asn Tyr Gly
85 90 95
Val Trp Thr Asp Glu Phe Ala Ala Leu Gly Leu Glu Asp Cys Ile Glu
100 105 110
Gln Thr Trp Lys Asp Ser Ala Met Tyr Ile Glu Glu Asp Ser Pro Ile
115 120 125
Met Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Thr Leu Leu Arg Glu
130 135 140
Glu Leu Leu Arg Arg Cys Ala Glu Gly Gly Val Arg Tyr Val Asp Ser
145 150 155 160
Lys Val Asp Arg Ile Leu Glu Val Asp Glu Asp Leu Ser Thr Val Leu
165 170 175
Cys Thr Asn Gly Lys Asn Ile Lys Ser Arg Leu
180 185

<210> SEQ ID NO 53
<211> LENGTH: 684
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (2)..(397)
<223> OTHER INFORMATION: 02 ppprot1 046 a07 rev
<400> SEQUENCE: 53

-continued t acc atc. citg agg gat gtt gaa gaa gat gca cqc cqt ggc aga gta tac 49 Thir Ile Leu Arg Asp Val Glu Glu Asp Ala Arg Arg Gly Arg Val Tyr 1 5 10 15 citc cca cag gat gaa citg gca cqt titc ggit cto tcg gat gca gac att 97 Leu Pro Glin Asp Glu Lieu Ala Arg Phe Gly Lieu Ser Asp Ala Asp Ile 2O 25 30 titt gtc gga aaa gtt act gat aaa togg agg gCa titc atg aaa gac caa 145 Phe Val Gly Lys Val Thr Asp Llys Trp Arg Ala Phe Met Lys Asp Glin 35 40 45 att aaa aga gCt aga gtg titc titt gtg gag gCt gag aaa ggt gta cqt 193 Ile Lys Arg Ala Arg Val Phe Phe Val Glu Ala Glu Lys Gly Val Arg 5 O 55 60 gag citg gac aaa gac agt cqc togg cct gtg togg toc goc ctic att citt 241 Glu Lieu. Asp Lys Asp Ser Arg Trp Pro Val Trp Ser Ala Lieu. Ile Leu 65 70 75 8O tac cag caa att citg gac goc att gala gCC aac gat tac gat aac titc 289 Tyr Glin Glin Ile Leu Asp Ala Ile Glu Ala Asn Asp Tyr Asp Asn. Phe 85 90 95 aca aaa aga gCt tac gta gga aag tog aaa aag citg gct tct cita cot 337 Thr Lys Arg Ala Tyr Val Gly Lys Trp Llys Lys Lieu Ala Ser Lieu Pro 100 105 110 atc got tat ggc aga gcg ttg gtt coa cot coa gat gca citt coc agg 385 Ile Ala Tyr Gly Arg Ala Lieu Val Pro Pro Pro Asp Ala Lieu Pro Arg 115 120 125 tta gca cqt taa gttctaactt citgatgtacc atgggitatcg citggtoaacg. 437 Leu Ala Arg 130 aatticcacca gaatctgttt cqctgtcaca gggaatcctd aaagagctgc atttgcatcc 497 citgtc.ttittg acgaaacticc tagagcc.gga agaggcaaaa attgtagatg tagtggagtt 557 gacaagttctt ttgtaccg to cqtacttctg tacttggaac catttatgttg agc.cggttgt 617 ttatatagot gtgtatagot gag cagtctt tactatotac taaataaaat tctitccttct 677 cittcttg 684

<210> SEQ ID NO 54
<211> LENGTH: 131
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 54
Thr Ile Leu Arg Asp Val Glu Glu Asp Ala Arg Arg Gly Arg Val Tyr
1 5 10 15
Leu Pro Gln Asp Glu Leu Ala Arg Phe Gly Leu Ser Asp Ala Asp Ile
20 25 30
Phe Val Gly Lys Val Thr Asp Lys Trp Arg Ala Phe Met Lys Asp Gln
35 40 45
Ile Lys Arg Ala Arg Val Phe Phe Val Glu Ala Glu Lys Gly Val Arg
50 55 60
Glu Leu Asp Lys Asp Ser Arg Trp Pro Val Trp Ser Ala Leu Ile Leu
65 70 75 80
Tyr Gln Gln Ile Leu Asp Ala Ile Glu Ala Asn Asp Tyr Asp Asn Phe
85 90 95
Thr Lys Arg Ala Tyr Val Gly Lys Trp Lys Lys Leu Ala Ser Leu Pro
100 105 110
Ile Ala Tyr Gly Arg Ala Leu Val Pro Pro Pro Asp Ala Leu Pro Arg
115 120 125
Leu Ala Arg
130

<210> SEQ ID NO 55
<211> LENGTH: 576
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (3)..(221)
<223> OTHER INFORMATION: 96 ck5 h12fwdrew

<400 SEQUENCE: 55 tt tac aag acg gtg cca gat tdt gag cct tot agg cca citt caa aga 47 Tyr Lys Thr Val Pro Asp Cys Glu Pro Cys Arg Pro Leu Glin Arg 1 5 10 15 to a cot att coa aag titc tac at g g c g g gt gac titc act aag cag aag 95 Ser Pro Ile Pro Lys Phe Tyr Met Ala Gly Asp Phe Thr Lys Gln Lys 2O 25 30 tac citc got tot atg gaa gag gCt gtg citc. tct ggc aaa ttt togt gcc 1 4 3 Tyr Lieu Ala Ser Met Glu Gly Ala Val Lieu Ser Gly Lys Phe Cys Ala 35 40 45 caa toc att gta cag gat titc aag gCa gga aaa citg aaa gC g g g c ggit 19 Glin Ser Ile Val Glin Asp Phe Lys Ala Gly Lys Lieu Lys Ala Gly Gly 5 O 55 60 gag aag gala gCt gtg citg gto tct caa toga ccaaagcttg agact cattt 24 Glu Lys Glu Ala Val Leu Val Ser Glin 65 70 accottgtac ttgtaattica ttatacttgg togtttgcac tagttgacgc gcgcttctoa 30 gcta acacat titt caccaat aataggtggg gctgttgttca atgcgcagaa atttggattg 36 gtacaggatt cactgatcca citgattacga tigcagotgat gggtotcgtt gttagg tagg 42 cittcattcat atgcc.gcaag citgatttgcc ggaaatccag caattcact g gtttittgaac 48 gaaaattgct g gttgaag at ttact gtaag cqgttcaccg catgctatto agtgcacttic 54 atgttcaaat citgaatcaat ttctgtcaaa aaaaa 576

<210> SEQ ID NO 56
<211> LENGTH: 72
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 56
Tyr Lys Thr Val Pro Asp Cys Glu Pro Cys Arg Pro Leu Gln Arg Ser
1 5 10 15
Pro Ile Pro Lys Phe Tyr Met Ala Gly Asp Phe Thr Lys Gln Lys Tyr
20 25 30
Leu Ala Ser Met Glu Gly Ala Val Leu Ser Gly Lys Phe Cys Ala Gln
35 40 45
Ser Ile Val Gln Asp Phe Lys Ala Gly Lys Leu Lys Ala Gly Gly Glu
50 55 60
Lys Glu Ala Val Leu Val Ser Gln
65 70

<210> SEQ ID NO 57
<211> LENGTH: 476
<212> TYPE: DNA

-continued <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (245) . . (475) <223> OTHER INFORMATION: 42 cK10 g 09fwd <400 SEQUENCE: 57 gtgcaa.ca.gc actgaattgg aattgttgttcaagaggitttg ggattgttggg ttagtgttgtg 60 cgtgcgtgcg agtttgagag aagggggttt taagcticag gttgcaaata ttittggtagc 120 tatggcgggg ttggtggtgc aggcggggag gtgtgcaggg gtggctt Cac togtogttggC 18O titcc togtog to gagt catg toga agggatc gattccagog ccatgttittg cagttgttgga 240 citgaaag gat gcc agc agc aga cigg aca ggg agt gtg cqc gtc aca gcc 289 Lys Asp Ala Ser Ser Arg Arg Thr Gly Ser Val Arg Val Thr Ala 1 5 10 15 agc titg caa agc atg gtg tog gac at g agc agg aaa goa cog aaa ggit 337 Ser Lieu Glin Ser Met Val Ser Asp Met Ser Arg Lys Ala Pro Lys Gly 2O 25 30 citg titc cct coc gag ccc gag got tac aag ggg ccc aag citc aag gtc 385 Leu Phe Pro Pro Glu Pro Glu Ala Tyr Lys Gly Pro Lys Lieu Lys Wal 35 40 45 gcc att att ggc gct ggit citt go g g g c at g toc acc gct gtt gag citt 433 Ala Ile Ile Gly Ala Gly Lieu Ala Gly Met Ser Thr Ala Wall Glu Lieu 5 O 55 60 citc gag caa ggc cac gag gtg gat atc tat gag to g c ga aag t 476 Leu Glu Glin Gly His Glu Val Asp Ile Tyr Glu Ser Arg Lys 65 70 75

<210> SEQ ID NO 58
<211> LENGTH: 77
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 58
Lys Asp Ala Ser Ser Arg Arg Thr Gly Ser Val Arg Val Thr Ala Ser
1 5 10 15
Leu Gln Ser Met Val Ser Asp Met Ser Arg Lys Ala Pro Lys Gly Leu
20 25 30
Phe Pro Pro Glu Pro Glu Ala Tyr Lys Gly Pro Lys Leu Lys Val Ala
35 40 45
Ile Ile Gly Ala Gly Leu Ala Gly Met Ser Thr Ala Val Glu Leu Leu
50 55 60
Glu Gln Gly His Glu Val Asp Ile Tyr Glu Ser Arg Lys
65 70 75

<210 SEQ ID NO 59 &2 11s LENGTH 535 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) ... (486) <223> OTHER INFORMATION: 84 mm.11 f12rev &220s FEATURE <221 NAME/KEY: misc feature &222> LOCATION 527 <223> OTHER INFORMATION: n is c, g, a or t. <400 SEQUENCE: 59 att acc gga gag togg tac toc aag titc gat act titc. tca coc go a gCa 48 US 2003/O157592 A1 Aug. 21, 2003 67

-continued Ile Thr Gly Glu Trp Tyr Cys Llys Phe Asp Thr Phe Ser Pro Ala Ala 1 5 10 15 gag cqa ggc titg cca gtc act cqa gtg atc agt cqg atg aaa citc cag 96 Glu Arg Gly Lieu Pro Val Thr Arg Val Ile Ser Arg Met Lys Lieu Glin 2O 25 30 gala att citt toc got gca ttg gga to a gag tac ata cag aat ggc tot 144 Glu Ile Leu Ser Gly Ala Leu Gly Ser Glu Tyr Ile Glin Asn Gly Ser 35 40 45 aat gtg gta gat titt gtg gac gac ggg aac aaa gtg gaa gttc gtg citg 192 Asn Val Val Asp Phe Val Asp Asp Gly Asn Lys Val Glu Val Val Lieu 5 O 55 60 gag gat gga cqg acattt gaa ggg gac atc citc gtc. g.gc got gat ggc 240 Glu Asp Gly Arg Thr Phe Glu Gly Asp Ile Leu Val Gly Ala Asp Gly 65 70 75 8O att cqc toc aag gtg cqa acg aaa ttg cita ggit gag tog tog acc gtg 288 Ile Arg Ser Lys Val Arg Thr Lys Lieu Lieu Gly Glu Ser Ser Thr Val 85 90 95 tat tot gat tac acc toc tac acg ggg att got gat titt gtg ccc got 336 Tyr Ser Asp Tyr Thr Cys Tyr Thr Gly Ile Ala Asp Phe Val Pro Ala 100 105 110 gat atc gac acc gtt gog tac cqc gtc titc citc ggc cac aaa cag tac 384 Asp Ile Asp Thr Val Gly Tyr Arg Val Phe Leu Gly. His Lys Glin Tyr 115 120 125 titt gtt tot tog gac gtt gogg caa ggg aag atg cag tog tat gcg titc 432 Phe Val Ser Ser Asp Val Gly Glin Gly Lys Met Gln Trp Tyr Ala Phe 130 135 1 4 0 tac aat gaa cot gcg ggc ggg gta gac goc coa gcg gaa gga aag caa 480 Tyr Asn. Glu Pro Ala Gly Gly Val Asp Ala Pro Ala Glu Gly Lys Glin 145 15 O 155 160 ggit toga totcgttgtt cqggggatgg totgacaagg togtggat.ct inctactggc 535 Gly

<210> SEQ ID NO 60
<211> LENGTH: 161
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 60
Ile Thr Gly Glu Trp Tyr Cys Lys Phe Asp Thr Phe Ser Pro Ala Ala
1 5 10 15
Glu Arg Gly Leu Pro Val Thr Arg Val Ile Ser Arg Met Lys Leu Gln
20 25 30
Glu Ile Leu Ser Gly Ala Leu Gly Ser Glu Tyr Ile Gln Asn Gly Ser
35 40 45
Asn Val Val Asp Phe Val Asp Asp Gly Asn Lys Val Glu Val Val Leu
50 55 60
Glu Asp Gly Arg Thr Phe Glu Gly Asp Ile Leu Val Gly Ala Asp Gly
65 70 75 80
Ile Arg Ser Lys Val Arg Thr Lys Leu Leu Gly Glu Ser Ser Thr Val
85 90 95
Tyr Ser Asp Tyr Thr Cys Tyr Thr Gly Ile Ala Asp Phe Val Pro Ala
100 105 110
Asp Ile Asp Thr Val Gly Tyr Arg Val Phe Leu Gly His Lys Gln Tyr
115 120 125
Phe Val Ser Ser Asp Val Gly Gln Gly Lys Met Gln Trp Tyr Ala Phe
130 135 140
Tyr Asn Glu Pro Ala Gly Gly Val Asp Ala Pro Ala Glu Gly Lys Gln
145 150 155 160
Gly

<210> SEQ ID NO 61 &2 11s LENGTH 620 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (3) . . (311) <223> OTHER INFORMATION: 41 ppprot1 085 g03 rev <400 SEQUENCE: 61 ca toc gaa ata gag citt ggc gag titc cqg gct gtg acg gaa ccc gaa 47 Cys Glu Ile Glu Leu Gly Glu Phe Arg Ala Val Thr Glu Pro Glu 1 5 10 15 gtt go a coa cag cat gcc aaa citt gtg ttcaaa gac ggc goc citg titt 95 Val Ala Pro Gln His Ala Lys Lieu Val Phe Lys Asp Gly Ala Lieu Phe 2O 25 30 gtt acg gac cta gac agc aag act ggc acg togg att acg agt atc agt 1 4 3 Val Thr Asp Leu Asp Ser Lys Thr Gly Thr Trp Ile Thr Ser Ile Ser 35 40 45 ggit ggit cqc toc aaa ttg acc cc g aaa at g ccc act c ga gtt cac cog 191 Gly Gly Arg Cys Lys Leu Thr Pro Llys Met Pro Thr Arg Val His Pro 5 O 55 60 gag gat atc att gag titc ggc cct gcc aag gag gCt cag tac aag gtg 239 Glu Asp Ile Ile Glu Phe Gly Pro Ala Lys Glu Ala Glin Tyr Lys Wal 65 70 75 aag citc cqa agg toc cag cca gCt aga. tca aac tot tac aag aca gac 287 Lys Lieu Arg Arg Ser Glin Pro Ala Arg Ser Asn. Ser Tyr Lys Thr Asp 8O 85 90 95 ttgaat gcig citg aaa gtg gca taa gggg actcga taaacticcag tattogacga 341 Lieu. Asn Ala Lieu Lys Val Ala 1 OO citattotgca gtgatggg ac totag cagoa ttgaatctoc accoccc.ccc cittttitttitt 401 taattittaaa alacatcgata cagcacttga citggacccac ggattgaatt gaattgcago 461 aatgttgaag gattgctgca gctic gactica caggatagga tigtaaccoat gcc agcticta 521 gtgitatgaaa tagtaggcto tagatagatt aacco actot atattgttag totgtaatct 581 gatccaaagg gattcttaag atttcttggit toaaaaaaa 62O

<210> SEQ ID NO 62
<211> LENGTH: 102
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 62
Cys Glu Ile Glu Leu Gly Glu Phe Arg Ala Val Thr Glu Pro Glu Val
1 5 10 15
Ala Pro Gln His Ala Lys Leu Val Phe Lys Asp Gly Ala Leu Phe Val
20 25 30
Thr Asp Leu Asp Ser Lys Thr Gly Thr Trp Ile Thr Ser Ile Ser Gly
35 40 45
Gly Arg Cys Lys Leu Thr Pro Lys Met Pro Thr Arg Val His Pro Glu
50 55 60
Asp Ile Ile Glu Phe Gly Pro Ala Lys Glu Ala Gln Tyr Lys Val Lys
65 70 75 80
Leu Arg Arg Ser Gln Pro Ala Arg Ser Asn Ser Tyr Lys Thr Asp Leu
85 90 95
Asn Ala Leu Lys Val Ala
100

<210> SEQ ID NO 63
<211> LENGTH: 465
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (2)..(433)
<223> OTHER INFORMATION: 06 ppprot1 062 a09 rev
<400> SEQUENCE: 63

a gtg gala ggc gcg gcc aca gaa gag cqa ttt titt citt titt cita gag gala 49 Wall Glu Gly Ala Ala Thr Glu Glu Arg Phe Phe Leu Phe Leu Glu Glu 5 10 15

titc. Cala cga cac to c agg aat tat gtc agg cag tta Ca tgg titc. 97 Phe Glin Arg His Ser Arg Asn Wall Arg Glin Teu Thr Trp Phe 2O 25 30 cga aat a.a.a. ggit Cala agt gag cag atg aaC tgg att gat gcc aCa 145 Arg Asn Lys Gly Glin Ser Glu Glin Met Asn Trp Ile Asp Ala Thr 35 40 45 cag cc c cita gaa gtg atg gtg gac gcc tta gcg a.a.a. gag tat gaa agg 193 Glin Pro Leu Glu Wall Met Wall Asp Ala Sel Ala Lys Glu Glu Arg 5 O 55 60

ccc. aat gaa gtg gtg agc gat gtc citg a.a.a. gcg gca agt gtt gtt acc 241 Pro Asn Glu Wall Wall Ser Asp Wall Telu Ala Ala Ser Wall Wall Thr 65 70 75 8O aag gag tot tac aag gag gaa a.a. C. citt aag cgc tac cga act 289 Glu Ser Ser Tyr Glu Glu Asn Lel Teu Arg Arg Thr 85 90 95

Cala a.a. C. agg ata titt act agt a.a. C. gag gcg citc. aag cgt. act tta 337 Glin Asn Arg Ile Phe Thr Ser Asn Ser Glu Ala Teu Arg Thr Telu 100 105 110

Cala tgg ata cga gat acc cag cita tgg cgg aac agt agc acg gtg 385 Glin Trp Ile Arg Asp Thr Glin Cys Telu Trp Arg Asn Ser Ser Thr Wall 115 120 125

gat gat citc. Cala aag aga atg gaa toa to c ttg acg acc tot atg taa 433 Asp Asp Telu Glin Lys Arg Met Glu Ser Ser Teu Thr Thr Ser Met 130 135 1 4 0

cgttgctitat tittatgagtg aagattittga ct 465

<210> SEQ ID NO 64
<211> LENGTH: 143
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 64
Val Glu Gly Ala Ala Thr Glu Glu Arg Phe Phe Leu Phe Leu Glu Glu
1 5 10 15
Phe Gln Arg His Ser Arg Asn Tyr Val Lys Arg Gln Leu Thr Trp Phe
20 25 30
Arg Asn Lys Gly Gln Ser Glu Gln Met Phe Asn Trp Ile Asp Ala Thr
35 40 45
Gln Pro Leu Glu Val Met Val Asp Ala Leu Ala Lys Glu Tyr Glu Arg
50 55 60
Pro Asn Glu Val Val Ser Asp Val Leu Lys Ala Ala Ser Val Val Thr
65 70 75 80
Lys Glu Ser Ser Tyr Lys Glu Glu Asn Leu Leu Lys Arg Tyr Arg Thr
85 90 95
Gln Asn Arg Ile Phe Thr Ser Asn Ser Glu Ala Leu Lys Arg Thr Leu
100 105 110
Gln Trp Ile Arg Asp Thr Gln Cys Leu Trp Arg Asn Ser Ser Thr Val
115 120 125
Asp Asp Leu Gln Lys Arg Met Glu Ser Ser Leu Thr Thr Ser Met
130 135 140

<210 SEQ ID NO 65 &2 11s LENGTH 534 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (3) . . (533) <223> OTHER INFORMATION: 16 ppprot1 082 co8 <400 SEQUENCE: 65 cit cag att gtc at g at g cat gac titt gcc atc acg gala aat tat gca 47 Glin Ile Val Met Met His Asp Phe Ala Ile Thr Glu Asn Tyr Ala 1 5 10 15 atc titt atg gat citt coc citc ctg atg gac ggc gaa agt atg atgaaa 95 Ile Phe Met Asp Leu Pro Leu Leu Met Asp Gly Glu Ser Met Met Lys 2O 25 30 gga aac titc titt atc aag titc gac gala acc aaa gaa got cqg ttg gga 1 4 3 Gly Asn. Phe Phe Ile Lys Phe Asp Glu Thir Lys Glu Ala Arg Lieu Gly 35 40 45 gta citt cot aga tac goc act aac gag agt cag citt cqc togg titc acc 191 Val Leu Pro Arg Tyr Ala Thr Asn Glu Ser Gln Leu Arg Trp Phe Thr 5 O 55 60 att coc gtg tot titc ata titt cac aac gog aac got togg gag gala ggc 239 Ile Pro Val Cys Phe Ile Phe His Asn Ala Asn Ala Trp Glu Glu Gly 65 70 75 gat gala att gtc. ttg cat tot tot cqa at g gala gaa ata aac cita acg 287 Asp Glu Ile Val Lieu. His Ser Cys Arg Met Glu Glu Ile Asn Lieu. Thr 8O 85 90 95 acg gCa gca gac gga titc aaa gala aat gaa cqc att tot caa cot aaa. 335 Thr Ala Ala Asp Gly Phe Lys Glu Asn. Glu Arg Ile Ser Glin Pro Lys 1 OO 105 110 ttg titt gag titt agg atc aac citt aag act ggit gag gtg aga cag aaa. 383 Leu Phe Glu Phe Arg Ile Asn Lieu Lys Thr Gly Glu Val Arg Gln Lys 115 120 125 cag citc. tca gtt citg gtg gtg gat titt coa agg gtc. aac gag gag tat 431 Gln Leu Ser Val Leu Val Val Asp Phe Pro Arg Val Asn Glu Glu Tyr 130 135 1 4 0 atg gga agg aaa act caa tat atg tat gga gCo att atg gac aaa gag 479 Met Gly Arg Lys Thr Glin Tyr Met Tyr Gly Ala Ile Met Asp Lys Glu 145 15 O 155 tot aaa at g gta gga gtc gga aag titc gac cita ttg aaa gaa cca gag 527 Ser Lys Met Val Gly Val Gly Lys Phe Asp Leu Lleu Lys Glu Pro Glu 160 1.65 170 175 gtgaac c 534 Wall Asn US 2003/O157592 A1 Aug. 21, 2003 71

<210> SEQ ID NO 66
<211> LENGTH: 177
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 66
Gln Ile Val Met Met His Asp Phe Ala Ile Thr Glu Asn Tyr Ala Ile
1 5 10 15
Phe Met Asp Leu Pro Leu Leu Met Asp Gly Glu Ser Met Met Lys Gly
20 25 30
Asn Phe Phe Ile Lys Phe Asp Glu Thr Lys Glu Ala Arg Leu Gly Val
35 40 45
Leu Pro Arg Tyr Ala Thr Asn Glu Ser Gln Leu Arg Trp Phe Thr Ile
50 55 60
Pro Val Cys Phe Ile Phe His Asn Ala Asn Ala Trp Glu Glu Gly Asp
65 70 75 80
Glu Ile Val Leu His Ser Cys Arg Met Glu Glu Ile Asn Leu Thr Thr
85 90 95
Ala Ala Asp Gly Phe Lys Glu Asn Glu Arg Ile Ser Gln Pro Lys Leu
100 105 110
Phe Glu Phe Arg Ile Asn Leu Lys Thr Gly Glu Val Arg Gln Lys Gln
115 120 125
Leu Ser Val Leu Val Val Asp Phe Pro Arg Val Asn Glu Glu Tyr Met
130 135 140
Gly Arg Lys Thr Gln Tyr Met Tyr Gly Ala Ile Met Asp Lys Glu Ser
145 150 155 160
Lys Met Val Gly Val Gly Lys Phe Asp Leu Leu Lys Glu Pro Glu Val
165 170 175
Asn

<210 SEQ ID NO 67 &2 11s LENGTH 694 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) . . (694) <223> OTHER INFORMATION: 30 ppprot1 064 eo9 <400 SEQUENCE: 67 c cac togt gtt gtc. citc. tca ttt tot coa cqg titt togg caa att togt gtc 49 His Cys Val Val Leu Ser Phe Ser Pro Arg Phe Trp Glin Ile Cys Val 1 5 10 15 citt att gtt titt agt aaa aca aca aat at g gC g gcc gcg at a tot to a 97 Leu Ile Val Phe Ser Lys Thr Thr Asn Met Ala Ala Ala Ile Ser Ser 2O 25 30 gta agt toc atc. tct gca gct aag citc titc. tcc gtt gca gCt gca cot 145 Val Ser Cys Ile Ser Ala Ala Lys Lieu Phe Ser Val Ala Ala Ala Pro 35 40 45 cac goa acg agg cqc act tct gtg citg cac atc agc gct gta gct gac 193 His Ala Thr Arg Arg Thr Ser Val Leu. His Ile Ser Ala Val Ala Asp 5 O 55 60 aag gtc. tct cot gat coa gcc gttc gtg ccc cca aat gtg ctic gag tat 241 Lys Val Ser Pro Asp Pro Ala Val Val Pro Pro Asn Val Leu Glu Tyr 65 70 75 8O US 2003/O157592 A1 Aug. 21, 2003 72

-continued gcq aag aca at g ccc gga gtg act gct cog titc gag aac atc titc gac 289 Ala Lys Thr Met Pro Gly Val Thr Ala Pro Phe Glu Asn Ile Phe Asp 85 90 95 cct gct gac ctic citg gcc cqc got gcc toc agc ccc cqa coc att aag 337 Pro Ala Asp Leu Lleu Ala Arg Ala Ala Ser Ser Pro Arg Pro Ile Lys 100 105 110 gag citg aac agg togg agg gag tog gala atc act cac ggc cqt gtt go c 385 Glu Lieu. Asn Arg Trp Arg Glu Ser Glu Ile Thr His Gly Arg Val Ala 115 120 125 atg citt gcc tot tta gga titt att gtc. cag gag cag citc cag gat tac 433 Met Leu Ala Ser Leu Gly Phe Ile Val Glin Glu Gln Leu Glin Asp Tyr 130 135 1 4 0 tot ttg titc tac aac titt gac ggc caa atc tot ggit coa gCt atc tac 481 Ser Leu Phe Tyr Asn Phe Asp Gly Glin Ile Ser Gly Pro Ala Ile Tyr 145 15 O 155 160 cac titc cag cag gtt gaa got cqc ggit gcc gttc titt togg gag cot citt 529 His Phe Glin Glin Val Glu Ala Arg Gly Ala Val Phe Trp Glu Pro Leu 1.65 170 175 atc titc gcc atc gct citt toc gag gCa tac aga gta ggit citt ggit togg 577 Ile Phe Ala Ile Ala Lieu. Cys Glu Ala Tyr Arg Val Gly Lieu Gly Trp 18O 185 190 gca act coc cqt toc cag gac titc aac aca ttg agg gat gac tac gala 625 Ala Thr Pro Arg Ser Glin Asp Phe Asn. Thir Lieu Arg Asp Asp Tyr Glu 195 200 2O5 ccc ggit aac tt g g g c titt gac cot togg gcc toc toc caa citg atc cc.g 673 Pro Gly Asn Lieu Gly Phe Asp Pro Trp Ala Ser Ser Glin Lieu. Ile Pro 210 215 220 citg aaa gga agg tta toc aga 694 Leu Lys Gly Arg Lieu. Cys Arg 225 230

<210> SEQ ID NO 68
<211> LENGTH: 231
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<400> SEQUENCE: 68
His Cys Val Val Leu Ser Phe Ser Pro Arg Phe Trp Gln Ile Cys Val
1 5 10 15
Leu Ile Val Phe Ser Lys Thr Thr Asn Met Ala Ala Ala Ile Ser Ser
20 25 30
Val Ser Cys Ile Ser Ala Ala Lys Leu Phe Ser Val Ala Ala Ala Pro
35 40 45
His Ala Thr Arg Arg Thr Ser Val Leu His Ile Ser Ala Val Ala Asp
50 55 60
Lys Val Ser Pro Asp Pro Ala Val Val Pro Pro Asn Val Leu Glu Tyr
65 70 75 80
Ala Lys Thr Met Pro Gly Val Thr Ala Pro Phe Glu Asn Ile Phe Asp
85 90 95
Pro Ala Asp Leu Leu Ala Arg Ala Ala Ser Ser Pro Arg Pro Ile Lys
100 105 110
Glu Leu Asn Arg Trp Arg Glu Ser Glu Ile Thr His Gly Arg Val Ala
115 120 125
Met Leu Ala Ser Leu Gly Phe Ile Val Gln Glu Gln Leu Gln Asp Tyr
130 135 140
Ser Leu Phe Tyr Asn Phe Asp Gly Gln Ile Ser Gly Pro Ala Ile Tyr
145 150 155 160
His Phe Gln Gln Val Glu Ala Arg Gly Ala Val Phe Trp Glu Pro Leu
165 170 175
Ile Phe Ala Ile Ala Leu Cys Glu Ala Tyr Arg Val Gly Leu Gly Trp
180 185 190
Ala Thr Pro Arg Ser Gln Asp Phe Asn Thr Leu Arg Asp Asp Tyr Glu
195 200 205
Pro Gly Asn Leu Gly Phe Asp Pro Trp Ala Ser Ser Gln Leu Ile Pro
210 215 220
Leu Lys Gly Arg Leu Cys Arg
225 230

SEQ ID NO 69 LENGTH 632 TYPE DNA ORGANISM: Physcomitrella patens FEATURE: NAME/KEY: CDS LOCATION: (3) . . (548) OTHER INFORMATION: 55 ppprot1 093 b04 rev FEATURE: NAME/KEY: misc feature LOCATION: 30 OTHER INFORMATION: n is c, 9. a or t. <400 SEQUENCE: 69 tg g g g g at gca ttcaac at g aga cat cot ntg aca ggc ggc ggc atg Gly Asp Ala Phe Asn Met Arg His Pro Xaa Thr Gly Gly Gly Met 1 5 10 15

acc gtg gct citt to c gat att gtt citg citc. cgg gac atg citc. agg cct 95 Thr Wall Ala Telu Ser Asp Ile Wall Telu Telu Arg Asp Met Teu Arg Pro 2O 25 30 tta agt agt titt cat gat gct Cala toa tta gat tac ttg cag gct 1 4 3 Teu Ser Ser Phe His Asp Ala Glin Ser Telu Asp Teu Glin Ala 35 40 45 titt tac acg cga cgc aag cct gtt gca gcc act atc aat act citt gCg 191 Phe Thr Arg Arg Wall Ala Ala Thr Ile Asn Thr Telu Ala 5 O 55 60 gga gcc citt tac a.a.a. gtg titt gac to c cct gat gCg atg a.a.a. 239 Gly Ala Telu Wall Phe Cys Asp Ser Pro Asp Teu Ala Met Lys 65 70 75 gaa atg aga cag gct tgc titt gac tat ttg agc att gga ggit gtc titc. 287 Glu Met Arg Glin Ala Cys Phe Asp Telu Ser Ile Gly Gly Wall Phe 8O 85 90 95 toa gga cca gtt gcc citt ttg tot gga citt aac cct cgt. cct ttg 335 Ser Ser Gly Pro Wall Ala Lel Telu Ser Gly Teu Asn Pro Arg Pro Telu 1 OO 105 110

agt cita gtg gtc cac titc. titt gCg gtt gct gta tat gga gta ggg aga 383 Ser Telu Wall Wall His Phe Phe Ala Wall Ala Wall Gly Wall Gly Arg 115 120 125 citc. citt gtt cct titt cct toa cc.g toa agg gta tgg att ggc gca cgt. 431 Teu Telu Wall Pro Phe Pro Ser Pro Ser Arg Wall Trp Ile Gly Ala Arg 130 135 1 4 0 citc. cita cgg gga gct gCg aat att ata titc. cc.g atc att a.a.a. gca gaa 479 Teu Telu Arg Gly Ala Ala Asn Ile Ile Phe Pro Ile Ile Ala Glu 145 15 O 155 gga gtc agg cag atg titc. titt cca aat atg gtt cct gca tat tac a.a.a. 527 Gly Wall Arg Glin Met Phe Phe Pro Asn Met Wall Pro Ala Tyr Tyr Lys 160 1.65 170 175 US 2003/O157592 A1 Aug. 21, 2003

-continued gca coa cog gCa gag gag taa gtgaaatgtg atggtgcggit attgaaatta 578 Ala Pro Pro Ala Glu Glu 18O accggtotcg tttactaata aacagagact ggtoattaat tcaaccagtt cotc 632

<210> SEQ ID NO 70
<211> LENGTH: 181
<212> TYPE: PRT
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: 10
<223> OTHER INFORMATION: Xaa is Met, Leu, or Val.
<400> SEQUENCE: 70
Gly Asp Ala Phe Asn Met Arg His Pro Xaa Thr Gly Gly Gly Met Thr
1 5 10 15
Val Ala Leu Ser Asp Ile Val Leu Leu Arg Asp Met Leu Arg Pro Leu
20 25 30
Ser Ser Phe His Asp Ala Gln Ser Leu Cys Asp Tyr Leu Gln Ala Phe
35 40 45
Tyr Thr Arg Arg Lys Pro Val Ala Ala Thr Ile Asn Thr Leu Ala Gly
50 55 60
Ala Leu Tyr Lys Val Phe Cys Asp Ser Pro Asp Leu Ala Met Lys Glu
65 70 75 80
Met Arg Gln Ala Cys Phe Asp Tyr Leu Ser Ile Gly Gly Val Phe Ser
85 90 95
Ser Gly Pro Val Ala Leu Leu Ser Gly Leu Asn Pro Arg Pro Leu Ser
100 105 110
Leu Val Val His Phe Phe Ala Val Ala Val Tyr Gly Val Gly Arg Leu
115 120 125
Leu Val Pro Phe Pro Ser Pro Ser Arg Val Trp Ile Gly Ala Arg Leu
130 135 140
Leu Arg Gly Ala Ala Asn Ile Ile Phe Pro Ile Ile Lys Ala Glu Gly
145 150 155 160
Val Arg Gln Met Phe Phe Pro Asn Met Val Pro Ala Tyr Tyr Lys Ala
165 170 175
Pro Pro Ala Glu Glu
180

<210> SEQ ID NO 71
<211> LENGTH: 602
<212> TYPE: DNA
<213> ORGANISM: Physcomitrella patens
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(420)
<223> OTHER INFORMATION: O2 mm.14 a 7rev

<400 SEQUENCE: 71 cag aac cog gat ggc ggc tigg ggc gag toc toc goc tog tac gtc gac 48 Glin Asn Pro Asp Gly Gly Trp Gly Glu Ser Cys Ala Ser Tyr Val Asp 1 5 10 15 citg cag cag cqc ggt gtc. g.gc ccc agc acc gog toc cag act gcg togg 96 Leu Glin Glin Arg Gly Val Gly Pro Ser Thr Ala Ser Gln Thr Ala Trp 2O 25 30 gca citc at g gCa citg gtg to a gtg cqc cac toc agc gag tac tac gac 144 US 2003/O157592 A1 Aug. 21, 2003 75

-continued Ala Leu Met Ala Leu Val Ser Val Arg His Ser Ser Glu Tyr Tyr Asp 35 40 45 gca atc agg aat ggt gtg gag tat citg gtg cqg acg cqc aca gcg gCa 192 Ala Ile Arg Asn Gly Val Glu Tyr Lieu Val Arg Thr Arg Thr Ala Ala 5 O 55 60 ggc to a togg agt gat ggc ggc cta titc aca ggc act gga titc cct ggc 240 Gly Ser Trp Ser Asp Gly Gly Leu Phe Thr Gly Thr Gly Phe Pro Gly 65 70 75 8O aac gtc gta ggc acg cqg atc gat cit g g g c acc gat agc toc aag cog 288 Asn Val Val Gly Thr Arg Ile Asp Leu Gly Thr Asp Ser Ser Lys Pro 85 90 95 ggc cat gga aac gag citc agt cqc ggc tac atg ttg cqc tac cac atg 336 Gly His Gly Asn. Glu Lieu Ser Arg Gly Tyr Met Leu Arg Tyr His Met 100 105 110 tac cog cat tac titt cot citc at g g ct citt ggg cqg gct cqc aag tat 384 Tyr Pro His Tyr Phe Pro Leu Met Ala Leu Gly Arg Ala Arg Lys Tyr 115 120 125 titc cag cat gtg aag tot citc cct cqt toc citc tda atttatctga 430 Phe Glin His Val Lys Ser Leu Pro Arg Ser Leu 130 135 citctgaggct gcc citcaaaa tttgtaggct g gaga acaga aat attaccg acgtotaaat 490 attaaattaa atccaccitct gatcggatcc agtccittgta cacataataa gtoaaacaat 550 gacaatgtgt gactittgaag tacatatocaa to catttaca atggg tatgt ca 6O2

<210> SEQ ID NO 72 <211> LENGTH: 139 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 72 Gln Asn Pro Asp Gly Gly Trp Gly Glu Ser Cys Ala Ser Tyr Val Asp 1 5 10 15 Leu Gln Gln Arg Gly Val Gly Pro Ser Thr Ala Ser Gln Thr Ala Trp 20 25 30 Ala Leu Met Ala Leu Val Ser Val Arg His Ser Ser Glu Tyr Tyr Asp 35 40 45 Ala Ile Arg Asn Gly Val Glu Tyr Leu Val Arg Thr Arg Thr Ala Ala 50 55 60 Gly Ser Trp Ser Asp Gly Gly Leu Phe Thr Gly Thr Gly Phe Pro Gly 65 70 75 80 Asn Val Val Gly Thr Arg Ile Asp Leu Gly Thr Asp Ser Ser Lys Pro 85 90 95 Gly His Gly Asn Glu Leu Ser Arg Gly Tyr Met Leu Arg Tyr His Met 100 105 110 Tyr Pro His Tyr Phe Pro Leu Met Ala Leu Gly Arg Ala Arg Lys Tyr 115 120 125 Phe Gln His Val Lys Ser Leu Pro Arg Ser Leu 130 135
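The LENGTH field of a PRT entry can be cross-checked against its residue list. The following is a minimal Python sketch, assuming the three-letter residues of SEQ ID NO 72 as given above; the names `THREE_TO_ONE` and `parse_residues` are hypothetical helpers introduced for illustration and are not part of the listing.

```python
# Three-letter to one-letter residue codes (20 standard amino acids plus Xaa,
# which the listing uses for an unspecified residue).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def parse_residues(listing_text):
    """Convert listing text to a one-letter string, skipping position numbers."""
    residues = []
    for token in listing_text.split():
        if token.isdigit():
            continue  # position markers such as 1, 5, 10, 15
        # A KeyError here flags a token that is not a recognized residue code.
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# Residues of SEQ ID NO 72 with their position markers, as in the listing.
SEQ_ID_72 = """
Gln Asn Pro Asp Gly Gly Trp Gly Glu Ser Cys Ala Ser Tyr Val Asp 1 5 10 15
Leu Gln Gln Arg Gly Val Gly Pro Ser Thr Ala Ser Gln Thr Ala Trp 20 25 30
Ala Leu Met Ala Leu Val Ser Val Arg His Ser Ser Glu Tyr Tyr Asp 35 40 45
Ala Ile Arg Asn Gly Val Glu Tyr Leu Val Arg Thr Arg Thr Ala Ala 50 55 60
Gly Ser Trp Ser Asp Gly Gly Leu Phe Thr Gly Thr Gly Phe Pro Gly 65 70 75 80
Asn Val Val Gly Thr Arg Ile Asp Leu Gly Thr Asp Ser Ser Lys Pro 85 90 95
Gly His Gly Asn Glu Leu Ser Arg Gly Tyr Met Leu Arg Tyr His Met 100 105 110
Tyr Pro His Tyr Phe Pro Leu Met Ala Leu Gly Arg Ala Arg Lys Tyr 115 120 125
Phe Gln His Val Lys Ser Leu Pro Arg Ser Leu 130 135
"""

DECLARED_LENGTH = 139  # value of the <211> LENGTH field for SEQ ID NO 72

seq72 = parse_residues(SEQ_ID_72)
length_matches = len(seq72) == DECLARED_LENGTH
print(seq72)
print(len(seq72), length_matches)
```

The same check applied to an entry with missing residues would report a length below the declared value, which is one way to spot rows that were damaged in transcription.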

<210> SEQ ID NO 73 <211> LENGTH: 602 <212> TYPE: DNA <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: CDS <222> LOCATION: (3)..(470)

-continued <223> OTHER INFORMATION: 51 ppprot1 081, a05 rev <400 SEQUENCE: 73 gg titt cot gat gct cat gtc aca ggit cita gat ttg tog coc tac titt 47 Phe Pro Asp Ala His Val Thr Gly Leu Asp Leu Ser Pro Tyr Phe 1 5 10 15 tta gct gtg gCt caa tac atg gag aaa cag agg atc. tcc agc ggg citt 95 Leu Ala Val Ala Glin Tyr Met Glu Lys Glin Arg Ile Ser Ser Gly Lieu 2O 25 30 gga aga cqc aga cca ata agt togg gta cat gca aat gga gag to c acg 1 4 3 Gly Arg Arg Arg Pro Ile Ser Trp Val His Ala Asn Gly Glu Cys Thr 35 40 45 ggc titg cca agt toa tot titt gat gtg gtt tog citt gcc titc gtg att 191 Gly Leu Pro Ser Ser Ser Phe Asp Val Val Ser Leu Ala Phe Val Ile 5 O 55 60 cat gala tot cot caa cat gct att aga ggt tta citg aag gag gCt citc 239 His Glu Cys Pro Gln His Ala Ile Arg Gly Lieu Lleu Lys Glu Ala Lieu 65 70 75 aga tta ttg aaa ccc gga gga acc gtg tog cita act gac aac to g ccc 287 Arg Lieu Lleu Lys Pro Gly Gly Thr Val Ser Lieu. Thr Asp Asn. Ser Pro 8O 85 90 95 aaa tog aag gtc. citt cag aat ttg cca cot goa ata titt act cta atg 335 Lys Ser Lys Val Leu Glin Asn Leu Pro Pro Ala Ile Phe Thr Leu Met 1 OO 105 110 aag tot acg gag ccc tigg atg gat gag tac titc act titt gac ttg gala 383 Lys Ser Thr Glu Pro Trp Met Asp Glu Tyr Phe Thr Phe Asp Leu Glu 115 120 125 ggit gala at g gag aag att gog titc atgaat gttcaat tda att at g aca 431 Gly Glu Met Glu Lys Ile Gly Phe Met Asn Val Asn Ser Ile Met Thr 130 135 1 4 0 aat coa cq a cac cqt act g to aca ggc act gct cot tag gaatgc.cggc 480 Asn Pro Arg His Arg Thr Val Thr Gly Thr Ala Pro 145 15 O 155 agatggctta gaagattitta gtatatgaat tdttaaaggg cattttggag aatccatggc 540 cactitttitta citagat.cgaa gttccaagct coaa.gagcaa gatgaattaa gttctttittg 600 a.a.

<210> SEQ ID NO 74 <211> LENGTH: 155 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 74 Phe Pro Asp Ala His Val Thr Gly Leu Asp Leu Ser Pro Tyr Phe Leu 1 5 10 15 Ala Val Ala Gln Tyr Met Glu Lys Gln Arg Ile Ser Ser Gly Leu Gly 20 25 30 Arg Arg Arg Pro Ile Ser Trp Val His Ala Asn Gly Glu Cys Thr Gly 35 40 45 Leu Pro Ser Ser Ser Phe Asp Val Val Ser Leu Ala Phe Val Ile His 50 55 60 Glu Cys Pro Gln His Ala Ile Arg Gly Leu Leu Lys Glu Ala Leu Arg 65 70 75 80 Leu Leu Lys Pro Gly Gly Thr Val Ser Leu Thr Asp Asn Ser Pro Lys 85 90 95

Ser Lys Val Leu Gln Asn Leu Pro Pro Ala Ile Phe Thr Leu Met Lys 100 105 110 Ser Thr Glu Pro Trp Met Asp Glu Tyr Phe Thr Phe Asp Leu Glu Gly 115 120 125 Glu Met Glu Lys Ile Gly Phe Met Asn Val Asn Ser Ile Met Thr Asn 130 135 140 Pro Arg His Arg Thr Val Thr Gly Thr Ala Pro 145 150 155

<210 SEQ ID NO 75 &2 11s LENGTH 475 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (2) ... (475) <223> OTHER INFORMATION: 93 ck24 ho5fwd &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: 413 <223> OTHER INFORMATION: n is c, g, a or t. <400 SEQUENCE: 75 c gac tac ttgaac cag citc citc atc aag titc gac cac got tagt coa aac 49 Asp Tyr Lieu. Asn. Glin Lieu Lleu. Ile Llys Phe Asp His Ala Cys Pro Asn 1 5 10 15 gtg tac coc gtt gat citc titc gag cqt ttg togg atg gta gac cqc cta 97 Val Tyr Pro Val Asp Lieu Phe Glu Arg Lieu Trp Met Val Asp Arg Lieu 20 25 30 caa agg citg gga ata toc cqc tac titc gag cqa gaa atc aga gac tot 145 Glin Arg Lieu Gly Ile Ser Arg Tyr Phe Glu Arg Glu Ile Arg Asp Cys 35 40 45 cita caa tat gta tac cqa tac tog aag gat tdt ggt att ggc togg gca 193 Leu Glin Tyr Val Tyr Arg Tyr Trp Lys Asp Cys Gly Ile Gly Trp Ala 5 O 55 60 agc aat to g to c gtg cag gac gtg gac gac acg gcc atg gCC titc cqc 241 Ser Asn. Ser Ser Val Glin Asp Val Asp Asp Thr Ala Met Ala Phe Arg 65 70 75 8O citt citc cqc aca cac gga titc gac gtc. aag gag gac togc titc aga cag 289 Leu Lieu Arg Thr His Gly Phe Asp Wall Lys Glu Asp Cys Phe Arg Glin 85 90 95 titt ttcaaa gat ggit gag titc titc toc ttic goc ggc cag to c agc caa 337 Phe Phe Lys Asp Gly Glu Phe Phe Cys Phe Ala Gly Glin Ser Ser Glin 100 105 110 gcc gtc acg gga atg titc aac ctic agc aga gCatcg caa acg citc titc 385 Ala Val Thr Gly Met Phe Asn Leu Ser Arg Ala Ser Gln Thr Leu Phe 115 120 125 cca ggg gala to a citc. cita aaa aag gC cana acc titt toc aga aac titt 433 Pro Gly Glu Ser Lieu Lleu Lys Lys Ala Xala Thr Phe Ser Arg Asn. Phe 130 135 1 4 0 ttg aga acc aag cat gala aac aat gala toc titc gac aag togg 475 Leu Arg Thr Lys His Glu Asn. Asn. Glu Cys Phe Asp Lys Trp 145 15 O 155

<210> SEQ ID NO 76 <211> LENGTH: 158 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: 138 <223> OTHER INFORMATION: Xaa is Arg, Ile, Thr or Lys. <400> SEQUENCE: 76 Asp Tyr Leu Asn Gln Leu Leu Ile Lys Phe Asp His Ala Cys Pro Asn 1 5 10 15 Val Tyr Pro Val Asp Leu Phe Glu Arg Leu Trp Met Val Asp Arg Leu 20 25 30 Gln Arg Leu Gly Ile Ser Arg Tyr Phe Glu Arg Glu Ile Arg Asp Cys 35 40 45 Leu Gln Tyr Val Tyr Arg Tyr Trp Lys Asp Cys Gly Ile Gly Trp Ala 50 55 60 Ser Asn Ser Ser Val Gln Asp Val Asp Asp Thr Ala Met Ala Phe Arg 65 70 75 80 Leu Leu Arg Thr His Gly Phe Asp Val Lys Glu Asp Cys Phe Arg Gln 85 90 95 Phe Phe Lys Asp Gly Glu Phe Phe Cys Phe Ala Gly Gln Ser Ser Gln 100 105 110 Ala Val Thr Gly Met Phe Asn Leu Ser Arg Ala Ser Gln Thr Leu Phe 115 120 125 Pro Gly Glu Ser Leu Leu Lys Lys Ala Xaa Thr Phe Ser Arg Asn Phe 130 135 140 Leu Arg Thr Lys His Glu Asn Asn Glu Cys Phe Asp Lys Trp 145 150 155

<210 SEQ ID NO 77 &2 11s LENGTH 317 &212> TYPE DNA <213> ORGANISM: Physcomitrella patens &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (49).. (312) <223> OTHER INFORMATION: 51 ppprot1 0 052 a05 <400 SEQUENCE: 77 actggattta ccatacgatg ccactatott gcaacaaatc. tcggctga aag aga gaa 57 Lys Arg Glu 1 gala aat gala aaa agc agg att cot at g gC g at g g to tac aag tac coc 105 Glu Asin Glu Lys Ser Arg Ile Pro Met Ala Met Val Tyr Lys Tyr Pro act act ttg citg cat tot citg gala ggc ct g cac cqg gaa gtg gac togg 153 Thir Thr Lieu Lieu. His Ser Leu Glu Gly Lieu. His Arg Glu Val Asp Trp 2O 25 30 35 aac aag ctic ctic cag cita cag to c gag aat ggc ticc titt citg tat to a 201 Asn Lys Lieu Lieu Gln Leu Glin Ser Glu Asn Gly Ser Phe Leu Tyr Ser 40 45 5 O ccc goa toc act gca toc goa citt gta cac aaa aga tigt gala gtg citt 249 Pro Ala Ser Thr Ala Cys Ala Lieu Val His Lys Arg Cys Glu Val Lieu 55 60 65 cga cita citt gaa cca gct cot cat caa gtt cqa coa cqc ttg to c aaa. 297 Arg Lieu Lieu Glu Pro Ala Pro His Glin Val Arg Pro Arg Lieu Ser Lys 70 75 8O cgt gta coc cqt tda totct 317 Arg Val Pro Arg 85

<210> SEQ ID NO 78

-continued attatcctac gitatcagaga acgttattot go.gcttgcat gtgttcaatgaattittgaaa 1817 ataaaaaag.c atcatctoag tatgataaaa aaaaaaaaaa aaaaa 1862

<210> SEQ ID NO 80 <211> LENGTH: 370 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 80 Met Ala Val Ala Leu Gly Ala Ala Gly Ser Phe Ala Gly Ala Ala Ala 1 5 10 15 Ala Arg Ala Trp Thr Cys Ser Ser Ser Ile Ser Ser Cys Asn Glu Ile 20 25 30 Arg Thr Arg Ser Thr Ser Val Thr Ser Ala Gln Val Cys Gly Leu Ile 35 40 45 Arg Ala Asp Asp Glu Val Gly Arg Arg Gly Val Lys Thr Arg Ser Leu 50 55 60 Arg Ser Gly Gly Val Val Arg Arg Ala Val Gln Arg Thr Glu Pro Glu 65 70 75 80 Leu Tyr Asp Gly Ile Ala His Phe Tyr Asp Glu Ser Ser Gly Val Trp 85 90 95 Glu Gly Ile Trp Gly Glu His Met His His Gly Tyr Tyr Asp Glu Glu 100 105 110 Ile Val Glu Ala Val Val Asp Gly Asp Pro Asp His Arg Arg Ala Gln 115 120 125 Ile Lys Met Ile Glu Lys Ser Leu Ala Tyr Ala Gly Val Pro Asp Ser 130 135 140 Lys Asp Leu Lys Pro Lys Thr Ile Val Asp Val Gly Cys Gly Ile Gly 145 150 155 160 Gly Ser Ser Arg Tyr Leu Ala Arg Lys Phe Gln Ala Lys Val Asn Ala 165 170 175 Ile Thr Leu Ser Pro Val Gln Val Gln Arg Ala Val Asp Leu Thr Ala 180 185 190 Lys Gln Gly Leu Ser Asp Leu Val Asn Phe Gln Val Ala Asn Ala Leu 195 200 205 Asn Gln Pro Phe Gln Asp Gly Ser Phe Asp Leu Val Trp Ser Met Glu 210 215 220 Ser Gly Glu His Met Pro Asp Lys Lys Lys Phe Val Gly Glu Leu Ala 225 230 235 240 Arg Val Ala Ala Pro Gly Gly Arg Ile Ile Leu Val Thr Trp Cys His 245 250 255 Arg Asp Leu Lys Pro Gly Glu Thr Ser Leu Lys Pro Asp Glu Gln Asp 260 265 270 Leu Leu Asp Lys Ile Cys Asp Ala Phe Tyr Leu Pro Ala Trp Cys Ser 275 280 285 Pro Ser Asp Tyr Val Ser Ile Ala Lys Asp Leu Gly Leu Gln Asp Ile 290 295 300 Lys Ser Glu Gly Trp Ser Glu Tyr Val Thr Pro Phe Trp Pro Ala Val 305 310 315 320 Met Lys Thr Ala Leu Ser Met Glu Gly Leu Val Gly Leu Val Lys Ser 325 330 335 Gly Trp Thr Thr Met Lys Gly Ala Phe Ala Met Thr Leu Met Ile Gln

-continued gaa cag aag to g g g a citc titc at a gcc tac aag goa tta toga 1842 Glu Gln Lys Trp Gly Lieu Phe Ile Ala Tyr Lys Ala Lieu 480 485 490 tottgaaatt attitcggata tagataaaac agcattgttg gaatagttca cacttgagag 1902 totgttttgt cittcttataa ataaacatcg atactattoa cocaaaaaaa aaaaaaaaaa 1962

<210> SEQ ID NO 82 <211> LENGTH: 491 <212> TYPE: PRT <213> ORGANISM: Physcomitrella patens <400> SEQUENCE: 82 Met Ala Val Asn Thr Glu Arg Ser Leu Gln Ser Thr Tyr Trp Lys Glu 1 5 10 15 His Ser Val Glu Pro Ser Val Glu Ala Met Met Leu Asp Ser Gln Ala 20 25 30 Ser Lys Leu Asp Lys Glu Glu Arg Pro Glu Ile Leu Ser Leu Leu Pro 35 40 45 Pro Tyr Glu Asn Lys Asp Val Met Glu Leu Gly Ala Gly Ile Gly Arg 50 55 60 Phe Thr Gly Glu Leu Ala Lys His Ala Gly His Val Leu Ala Met Asp 65 70 75 80 Phe Met Glu Asn Leu Ile Lys Lys Asn Glu Asp Val Asn Gly His Tyr 85 90 95 Asn Asn Ile Asp Phe Lys Cys Ala Asp Val Thr Ser Pro Asp Leu Asn 100 105 110 Ile Ala Ala Gly Ser Ala Asp Leu Val Phe Ser Asn Trp Leu Leu Met 115 120 125 Tyr Leu Ser Asp Glu Glu Val Lys Gly Leu Ala Ser Arg Val Met Glu 130 135 140 Trp Leu Arg Pro Gly Gly Tyr Ile Phe Phe Arg Glu Ser Cys Phe His 145 150 155 160 Gln Ser Gly Asp His Lys Arg Lys Asn Asn Pro Thr His Tyr Arg Gln 165 170 175 Pro Asn Glu Tyr Thr Asn Ile Phe Gln Gln Ala Tyr Ile Glu Glu Asp 180 185 190 Gly Ser Tyr Phe Arg Phe Glu Met Val Gly Cys Lys Cys Val Gly Thr 195 200 205 Tyr Val Arg Asn Lys Arg Asn Gln Asn Gln Val Cys Trp Leu Trp Arg 210 215 220 Lys Val Gln Ser Asp Gly Pro Glu Ser Glu Cys Phe Gln Lys Phe Leu 225 230 235 240 Asp Thr Gln Gln Tyr Thr Ser Thr Gly Ile Leu Arg Tyr Glu Arg Ile 245 250 255 Phe Gly Glu Gly Phe Val Ser Thr Gly Gly Ile Glu Thr Thr Lys Ala 260 265 270 Phe Val Ser Met Leu Asp Leu Lys Pro Gly Gln Arg Val Leu Asp Val 275 280 285 Gly Cys Gly Ile Gly Gly Gly Asp Phe Tyr Met Ala Glu Glu Tyr Asp 290 295 300 Ala Glu Val Val Gly Ile Asp Leu Ser Leu Asn Met Ile Ser Phe Ala 305 310 315 320

Leu Glu Arg Ser Ile Gly Lys Ala Val Glu Phe Glu Val Gly 325 330 335

Asp Cys Thr Lys Ile Asn Pro His Ala Ser Phe Asp Val Ile Tyr 340 345 350

Ser Asp Thr Ile Leu His Ile Gln Asp Lys Pro Ala Leu Phe Gln 355 360 365

Arg Phe Lys Trp Leu Lys Pro Gly Gly Arg Val Leu Ile Ser Asp 370 375 380

Tyr Cys Arg Ala Pro Gln Thr Pro Ser Ala Glu Phe Ala Ala Ile 385 390 395 400

Gln Gln Arg Gly Tyr Asp Leu His Ser Val Gln Tyr Gly Glu Met 405 410 415

Leu Glu Asp Ala Gly Phe Val Glu Val Val Ala Glu Arg Thr 420 425 430

Gln Phe Ile Glu Val Leu Gln Arg Glu Leu Ala Thr Thr Glu Ala Gly 435 440 445

Arg Asp Gln Phe Ile Asn Asp Phe Ser Glu Glu Asp Tyr Asn Ile 450 455 460

Val Ser Gly Trp Ser Leu Arg Cys Ser Asn Asp Glu Gln 465 470 475 480

Trp Gly Leu Phe Ile Ala Tyr Ala Leu 485 490

1. An isolated nucleic acid molecule from a moss encoding a Tocopherol and Carotenoid Metabolism Related Protein (TCMRP), or a portion thereof.

2. An isolated nucleic acid molecule wherein the moss is selected from Physcomitrella patens or Ceratodon purpureus.

3. The isolated nucleic acid molecule of claim 1 or 2, wherein said nucleic acid molecule encodes a TCMRP capable of performing an enzymatic step involved in the production of a fine chemical.

4. The isolated nucleic acid molecule of any one of claims 1 to 3, wherein said nucleic acid molecule encodes a TCMRP capable of performing an enzymatic step involved in the metabolism of tocopherols and/or carotenoids.

5. The isolated nucleic acid molecule of any one of claims 1 to 4, wherein said nucleic acid molecule encodes a TCMRP assisting in the transmembrane transport.

6. An isolated nucleic acid molecule from mosses selected from the group consisting of those sequences set forth in Appendix A, or a portion thereof.

7. An isolated nucleic acid molecule which encodes a polypeptide sequence selected from the group consisting of those sequences set forth in Appendix B.

8. An isolated nucleic acid molecule which encodes a naturally occurring allelic variant of a polypeptide selected from the group of amino acid sequences consisting of those sequences set forth in Appendix B.

9. An isolated nucleic acid molecule comprising a nucleotide sequence which is at least 50% homologous to a nucleotide sequence selected from the group consisting of those sequences set forth in Appendix A, or a portion thereof.

10. An isolated nucleic acid molecule comprising a fragment of at least 15 nucleotides of a nucleic acid comprising a nucleotide sequence selected from the group consisting of those sequences set forth in Appendix A.

11. An isolated nucleic acid molecule which hybridizes to the nucleic acid molecule of any one of claims 1-10 under stringent conditions.

12. An isolated nucleic acid molecule comprising the nucleic acid molecule of any one of claims 1-11 or a portion thereof and a nucleotide sequence encoding a heterologous polypeptide.

13. A vector comprising one or more nucleic acid molecule(s) of any one of claims 1-12.

14. The vector of claim 13, which is an expression vector.

15. A host cell transformed with one or more expression vector(s) of claim 14.

16. The host cell of claim 15, wherein said cell is a microorganism.

17. The host cell of claim 15, wherein said cell belongs to the genus mosses or algae.

18. The host cell of claim 15, wherein said cell is a plant cell.

19. The host cell of any one of claims 15 to 18, wherein the expression of said nucleic acid molecule(s) results in the modulation of the production of a fine chemical from said cell.

20. The host cell of any one of claims 15 to 19, wherein the expression of said nucleic acid molecule(s) results in the modulation of the production of tocopherols and/or carotenoids from said cell.

21. Descendants, seeds or reproducible cell material derived from a host cell of any one of claims 15 to 20.

22. A method of producing one or more polypeptide(s) comprising culturing the host cell of any one of claims 15 to 20 in an appropriate culture medium to, thereby, produce the polypeptide.

23. An isolated TCMRP from mosses or algae or a portion thereof.

24. An isolated TCMRP from microorganisms or fungi or a portion thereof.

25. An isolated TCMRP from plants or a portion thereof.

26. The polypeptide of any one of claims 23 to 25, wherein said polypeptide is involved in the production of a fine chemical.

27. The polypeptide of any one of claims 23 to 25, wherein said polypeptide is involved in assisting in transmembrane transport.

28. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of those sequences set forth in Appendix B.

29. An isolated polypeptide comprising a naturally occurring allelic variant of a polypeptide comprising an amino acid sequence selected from the group consisting of those sequences set forth in Appendix B, or a portion thereof.

30. The isolated polypeptide of any of claims 23 to 29, further comprising heterologous amino acid sequences.

31. An isolated polypeptide which is encoded by a nucleic acid molecule comprising a nucleotide sequence which is at least 50% homologous to a nucleic acid selected from the group consisting of those sequences set forth in Appendix A.

32. An isolated polypeptide comprising an amino acid sequence which is at least 50% homologous to an amino acid sequence selected from the group consisting of those sequences set forth in Appendix B.

33. An antibody specifically binding to a TCMRP of any one of claims 23 to 32 or a portion thereof.

34. Test kit comprising a nucleic acid molecule of any one of claims 1 to 12, a portion and/or a complement thereof used as probe or primer for identifying and/or cloning further nucleic acid molecules involved in the production of tocopherols and/or carotenoids or assisting in transmembrane transport in other cell types or organisms.

35. Test kit comprising a TCMRP-antibody of claim 33 for identifying and/or purifying further TCMRP molecules or fragments thereof in other cell types or organisms.

36. A method for producing a fine chemical, comprising culturing a cell containing one or more vector(s) of claim 13 or 14 such that the fine chemical is produced.

37. The method of claim 36, wherein said method further comprises the step of recovering the fine chemical from said culture.

38. The method of claim 36 or 37, wherein said method further comprises the step of transforming said cell with one or more vector(s) of claim 13 or 14 to result in a cell containing said vector(s).

39. The method of any one of claims 36 to 38, wherein said cell is a microorganism.

40. The method of any one of claims 36 to 38, wherein said cell belongs to the genus Corynebacterium or Brevibacterium.

41. The method of any one of claims 36 to 38, wherein said cell belongs to the genus mosses or algae.

42. The method of any one of claims 36 to 38, wherein said cell is a plant cell.

43. The method of any one of claims 36 to 42, wherein expression of one or more nucleic acid molecule(s) from said vector(s) results in modulation of the production of said fine chemical.

44. The method of claim 43, wherein said fine chemical is selected from the group consisting of tocopherols and carotenoids.

45. A method for producing a fine chemical, comprising culturing a cell whose genomic DNA has been altered by the inclusion of one or more nucleic acid molecule(s) of any one of claims 1-12.

46. A method of claim 45, comprising culturing a cell whose membrane has been altered by the inclusion of one or more polypeptide(s) of any one of claims 22 to 32.

47. A fine chemical produced by a method of any one of claims 36 to 46.

48. Use of a fine chemical of claim 47 or polypeptide(s) of any one of claims 22 to 32 for the production of another fine chemical.
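Claims 9, 31 and 32 recite sequences that are at least 50% homologous to a listed sequence. The claims themselves do not state how that percentage is computed. The Python sketch below assumes, purely for illustration, that it is read as percent identity over the non-gap columns of two sequences that have already been aligned to equal length; the function names, the `-` gap symbol and the choice of denominator are all assumptions, and other conventions give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both inputs are already aligned and therefore equal in length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator here
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=50.0):
    """True if percent identity is at least the threshold (50% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example with made-up sequences (not taken from the listing).
identity = percent_identity("ACGT", "ACGA")
print(identity, meets_threshold("ACGT", "ACGA"))
```

Excluding gap columns from the denominator is only one possible convention; counting them as mismatches would lower the reported percentage for gapped alignments.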