US007795.503B2

(12) United States Patent (10) Patent No.: US 7,795,503 B2 Apuya et al. (45) Date of Patent: Sep. 14, 2010

(54) MODULATING ALKALOIDS 6,013,863 A 1/2000 Lundquist et al. 6,087,558 A 7/2000 Howard et al. (75) Inventors: Nestor Apuya, Culver City, CA (US); 6,093,874 A 7/2000 Jofuku et al. Steven Craig Bobzin, Malibu, CA (US); 6,136,320 A 10/2000 Arntzen et al. Joon-Hyun Park, Oak Park, CA (US) 6,235,975 B1 5, 2001 Harada et al. 6.255,562 B1 7/2001 Heyer et al. (73) Assignee: Ceres, Inc., Thousand Oaks, CA (US) 6,271,016 B1 8/2001 Anderson et al. 6,294,717 B1 9, 2001 Xie (*) Notice: Subject to any disclaimer, the term of this 6,303,341. B1 102001 Hiatt et al. patent is extended or adjusted under 35 6,326,527 B1 12/2001 Kirihara et al. U.S.C. 154(b) by 824 days. 6,329,567 B1 12/2001 Jofuku et al. 6,329,571 B1 12/2001 Hiei (21) Appl. No.: 11/360,459 6,417,429 B1 7/2002 Hein et al. 6,423,885 B1 7/2002 Waterhouse et al. (22) Filed: Feb. 22, 2006 6,452,067 B1 9, 2002 Bedbrook et al. 6,518,066 B1 2/2003 Oulmassov et al. (65) Prior Publication Data 6,573,099 B2 6/2003 Graham 6,645,765 B1 1 1/2003 Anderson et al. US 2006/O195934 A1 Aug. 31, 2006 6,664,446 B2 12/2003 Heard et al. 6,706,470 B2 3/2004 Choo et al. Related U.S. Application Data 6,753,139 B1 6/2004 Baulcombe et al. (60) Provisional application No. 60/654,927, filed on Feb. 6,777,588 B2 8, 2004 Waterhouse et al. 6,835,540 B2 12/2004 Broun 22, 2005. 6,846,669 B1 1/2005 Jofuku et al. 6,906.244 B2 6/2005 Fischer et al. (51) Int. Cl. - WW CI2N 15/29 (2006.01) SA, SSEord et al. CI2N 5/82 (2006.01) 2003/OO37355 A1 2/2003 Barbas et al. AOIH 5/00 (2006.01) 2003/006 1637 A1 3/2003 Jiang et al. (52) U.S. Cl...... 800/298; 800/287: 800/312: 2003/0131386 A1 7/2003 Samaha et al. 800/317; 800/321 2003. O135887 A1 7, 2003 Brandle et al. (58) Field of Classification Search ...... None 2003. O140381 A1 7, 2003 Bate et al. See application file for complete search history. 2003/0153097 A1 8/2003. Deshaies et al. 2003/017O656 A1 9, 2003 Cen et al. (56) References Cited 2003/0175783 Al 9, 2003 Waterhouse et al. U.S. PATENT DOCUMENTS 4,654,465. A 3, 1987 Brar et al. (Continued) 4,727,219 A 2f1988 Brar et al. 4,801,340 A 1, 1989 Inoue et al. FOREIGN PATENT DOCUMENTS 4,801,540 A 1, 1989 Hiatt et al. 4,936,904. A 6/1990 Carlson CA 2316036 8, 2000 4.946,778 A 8, 1990 Ladner et al. 4,987,071 A 1/1991 Cech et al. 5,034.323 A 7/1991 Jorgensen et al. (Continued) 5,188,958 A 2/1993 Moloney et al. 5,204.253 A 4, 1993 Sanford et al. OTHER PUBLICATIONS 5,231,020 A 7, 1993 Jorgensen et al. Grasser, K. Plant Molecular Biology 2003, vol. 53: pp. 281-295.* 5,254,678 A 10, 1993 Haseloff et al. 5,283,184 A 2/1994 Jorgensen et al. (Continued) 5,380,831 A 1/1995 Adang et al. 5,410,270 A 4/1995 Rybicki et al. Primary Examiner Russell Kallis 5.432,068 A 7, 1995 Albertsen et al. (74) Attorney, Agent, or Firm Fish & Richardson P.C. 5,445,934 A 8, 1995 Fodor et al. 5,538,880 A 7, 1996 Lundquist et al. (57) ABSTRACT 5,723,766 A 3/1998 Theologis et al. 5,766,847 A 6, 1998 Jackle et al. 5,777,079 A 7, 1998 Tsien et al. Materials and methods are provided for identifying regula 5,824,779 A 10/1998 Koegel et al. tory region-regulatory protein associations and modulating 5,824.798. A 10, 1998 Tallberg et al. expression of a sequence of interest. For example, a plant cell 5,859,330 A 1/1999 Bestwicket al. is provided containing a regulatory protein that can modulate 5,900,5255,880,333 A 3,5/1999 1999 GoffetAustin-Phillips al. et al. expression of one or more genes involved in alkaloid biosyn 5,925,806 A 7, 1999 McBride et al. thesis in , which, in turn, can modulate the amount 5,958,745 A 9/1999 Gruys et al. and/or rate of biosynthesis of one or more alkaloid com 5,994,622 A 1 1/1999 Jofuku et al. pounds. 6,004,804 A 12/1999 Kumar et al. 6,010,907 A 1/2000 Kmiec et al. 21 Claims, 57 Drawing Sheets US 7,795,503 B2 Page 2

U.S. PATENT DOCUMENTS WO O2/O55669 T 2002 WO O2/10.1052 12/2002 2003/0175965 A1 9, 2003 Lowe et al. WO O3,O13227 2, 2003 2003. O180945 A1 9/2003 Wang et al. WO O3,O25172 3, 2003 2003/0229915 A1 12/2003 Keddie et al. WO 03/034812 5, 2003 2004/OO 19927 A1 1/2004 Sherman et al. WO O3/O57877 T 2003 2004.0034.888 A1 2/2004 Liu et al. WO O3,060476 T 2003 2004/0045049 A1 3/2004 Zhang et al. WO 2004/O27038 4/2004 2004.0053876 A1 3, 2004 Turner et al. WO 2004/035798 4/2004 2004/OO72159 A1 4/2004 Takaiwa et al. WO 2004/O39956 5, 2004 2004/OO73972 A1 4/2004 Beachy et al. WO 2004/041170 5, 2004 2004f0078852 A1 4/2004 Thomashow et al. WO 2004/043361 5, 2004 2004/0203109 A1 10, 2004 Lal et al. WO 2005/047516 5, 2005 2004/0214330 A1 10, 2004 Waterhouse et al. WO 2005/054453 6, 2005 2004/0216190 A1 10, 2004 Kovalic WO 2006/005O23 1, 2006 2005.00091.87 A1 1/2005 Shinozaki et al. WO 2006/009922 1, 2006 2005/0081261 A1 4/2005 Pennell et al. 2005, 0108791 A1 5/2005 Edgerton et al. 2005/0223434 A1 10, 2005 Alexandrov et al. OTHER PUBLICATIONS 2005.0246785 A1 11, 2005 Cook et al. Stemmer et al. European Journal of Biochemistry; 1997, vol. 250, pp. 2005/0257293 A1 11, 2005 Mascia 646-652. 2006, OO1597O A1 1/2006 Pennell et al. Grasser KD, et al., FEBS Letters, Jul 26, 1993; vol. 327, No. 2: pp. 2006.0143729 A1 6/2006 Alexandrov et al. 141-144. 2006, O194959 A1 8/2006 Alexandrov et al. Hauschild K. et al. In Plant Molecular Biology: 1998, vol. 36; pp. 2006/0195934 A1 8/2006 Apuya et al. 473.478.* 2006/0272060 A1 11/2006 Heard et al. U.S. Appl. No. 60/121,700, filed Feb. 25, 1999, Bouckaert et al. 2007/0022495 A1 1/2007 Reuber et al. GenBank Accession No. U93215, dated Feb. 27, 2002. FOREIGN PATENT DOCUMENTS GenBank Accession No. AF129516, dated Apr. 6, 1999. GenBank Accession No. AFO96096, dated Jan. 25, 1999. DE 102 12703 10, 2003 GenBank Accession No. L05934, dated Oct. 22, 1993. EP O 329 308 8, 1989 GenBank Accession No. AAB57606, dated May 14, 1997. EP 1033 405 9, 2000 Abler et al. “Isolation and characterization of a genomic sequence EP O 625 572 4/2001 encoding the maize Cat3 catalase gene' Plant Mol. Biol., 22:1031 EP O 320500 11, 2004 1038 (1993). JP 5-219974 8, 1993 Ahn et al., “Homoeologous relationships of rice, wheat and maize JP O9-224672 9, 1997 chromosomes' Molecular and General Genetics, 241:483-490 KR 2004 OO8459 1, 2004 (1993). WO 90,08828 8, 1990 Alexandrov et al. “Features of Arabidopsis genes and genome dis WO 95/35505 12/1995 covered using full-length cDNAs' Plant Molecular Biology, 60:69 WO 96.34981 11, 1996 85 (2006). WO 96.36693 11, 1996 Allen et al. “RNAi-mediated replacement of morphine with the non WO 97/O1952 1, 1997 narcotic alkaloid reticuline in opium poppy' Nature Biotechnology, WO 97.31064 8, 1997 22(12): 1559-1566 (2004). WO 98,07842 2, 1998 Baerson et al., “Developmental regulation of an acyl carrier protein WO 98.36083 8, 1998 gene promoter in vegetative and reproductive tissues' Plant Mol. WO 98.53O83 11, 1998 Biol., 22(2):255-267 (1993). WO 99.04117 1, 1999 Bateman et al., “Pfam 3.1: 1313 multiple alignments and profile WO 99.07865 2, 1999 HMMs match the majority of proteins' Nucl. Acids Res. 27(1):260 WO 99.24574 5, 1999 262 (1999). WO 99.32619 7, 1999 Bechtold et al., “In planta Agrobacterium mediated gene transfer by WO 99.34663 7, 1999 infiltration of adult Arabidopsis thaliana plants' C.R. Acad. Sci. WO 99,38977 8, 1999 Paris, 3.16:1194-1199 (1993). WO 99.53050 10, 1999 BD Matchmaker One-Hybrid Library construction & Screening Kit, WO 99.58723 11, 1999 Clonetechniques, 2003. WO OO34318 6, 2000 Bird etal. "Atale of three cell types: alkaloid biosynthesis is localized WO OO34319 6, 2000 to sieve elements in opium poppy” The Plant Cell, 15:2626-2635 WO OO34320 6, 2000 (2003). WO OO34321 6, 2000 Broun et al. “Catalytic plasticity of fatty acid modification enzymes WO OO34322 6, 2000 underlying chemical diversity of plant lipids' Science, 282:1315 WO OO34323 6, 2000 1317 (1998). WO OO34324 6, 2000 Brummell, et al., “Inverted repeat of a heterologous 3'-untranslated WO OO,343.25 6, 2000 region for high-efficiency, high-throughput gene silencing Plant J. WO OO34326 6, 2000 33:793-800 (2003). WO OOf 42200 T 2000 Busk et al., “regulatory elements in vivo in the promoter of the WO OOf 46.383 8, 2000 abscisic acid responsive gene rab 17 from maize' plant journal, WO OO,55174 9, 2000 11(6): 1285-1295 (1997). WO O1/35725 5, 2001 Bustos et al., “Regulation of b-glucuronidase expression in WO O 1/75164 11, 2001 transgenic tobacco plants by an A/T-rich, cis-acting sequence found WO O2/15675 2, 2002 upstream of a French bean b-phaseolin gene' Plant Cell, 1 (9):839 WO O2,371.11 5, 2002 853 (1989). WO O2/46449 6, 2002 Carels et al., “Compositional properties of homologous coding WO O2/O55536 T 2002 sequences from plants' J. Mol. Evol. 46:45-53 (1998). US 7,795,503 B2 Page 3

Casaretto et al. “The transcription factors HvABI5 and Hv VP1 are Gilmour et al. “Low temperature regulation of the Arabidopsis CBF required for the abscisic acid induction of gene expression in barley family of AP2 transcriptional activators as an early step in cold aleuron cells' The Plant Cell, 15:271-284 (2003). induced COR gene expression' Plant Journal, 16(4):433-442 Cerdan et al., “A 146 by fragment of the tobacco Lhcb I*2 promoter (1998). confers very-low-fluence, low-fluence and high-irradiance responses Golovkin et al., “An SC35-like protein and a novel serine/arginine of phytochrom to a minimal CaMV 35S promoter'Plant Mol. Biol. rich protein interact with Arabidopsis U1-70K protein' J. Biol. 33:245-255 (1997). Chem. 274(51):36428-36438, (1999). Chen et al., “Functional analysis of regulatory elements in a plant Grec et al., Identification of regulatory sequence elements within the embryo-specific gene” Proc. Natl. Acad. Sci. USA, 83:8560-8564 transcription promoter region of NDABC1, a gene encoding a plant (1986). ABC transporter induced by diterpenes, The Plant Journal, vol. 35: Chen et al. “Expression profile matrix of Arabidopsis transcription 237-250 (2003). factor genes suggests their putative functions in response to environ Green et al., “Binding site requirements for pea nuclear protein factor mental stresses” The Plant Cell, 14:559-574 (2002). GT-1 correlate with sequences required for light-dependent tran Chenna et al., “Multiple sequence alignment with the Clustal series of scriptional activation of the rbcS-3A gene” EMBO.J., 7:4035-4044 programs' Nucleic Acids Res., 31(13):3497-500 (2003). (1988). Chinnusamy et al. “Screening for gene regulation mutants by Hamilton et al., “Antisense gene that inhibits synthesis of the hor bioluminescence imaging” Department of Plant Sciences, University mone ethylene in transgenic plants' Nature, 346:284-287 (1990). of Arizona, Tucson, pp. 1-10 (2002). Hannenhalli et al. “Promoter prediction in the human genome' Chitty et al., “Genetic transformation in commercial Tasmanian cul Bioinformatics, 17:S90-S96 (2001). tures of opium poppy, Papaver SOmniferum, and movement of Haralampidis et al. A new class of oxidosqualene cyclases directs transgenic pollen in the field.” Funct. Plant Biol. 30:1045-1058 synthesis of antimicrobial phytoprotectants in monocots' Proc. Natl. (2003). Acad. Sci. USA, 98(23): 13431-13436 (2001). Cho et al. “Expression of gamma-tocopherol methyltransferase Hauschild etal. “Isolation and analysis of the genebbel encoding the transgene improves tocopherol composition in lettuce (Latuca sativa berberine bridge enzyme from the California poppy Eschscholzia L.) Molecules and Cells, 19(1):16-22 (2005). californica" Plant Molec. Biol. 36:473–478 (1998). Chou et al. “Enzymatic oxidation in the biosynthesis of complex Hellens et al. “Transient expression vectors for functional genomics, alkaloids' The Plant Journal, 15(3):289-300 (1998). quantification of promoter activity and RNA silencing in plants' Collakova et al. “The role of homogentisate phytyltransferase and Plant Methods, 1:13 (2005). other tocopherol pathway enzymes in the regulation of tocopherol Herrera et al. “Cloning and characterization of the Arabidopsis synthesis during abiotic stress' Plant Physiology, 133(2):930-940 thaliana lupeol synthase gene” Phytochemistry, 49(7): 1905 (1998). (2003). Hilbricht et al. "CpR18, a novel SAP-domain plant transcription Conceicao, 'A cotyledon regulatory region is responsible for the factor, binds to a promoter region necessary for ABA mediated different spatial expression patterns of Arabidopsis 2S albumin expression of the CDeT27-45 gene from the resurrection plant genes' The Plant Journal, 1994, 5(4):493-505. Craterostigma plantagineum Hochst” The Plant Journal, 31(3):293 Conkling et al. “Isolation of transcriptionally regulated root-specific 303 (2002). Hong et al., “Promoter sequences from two different Brassica napus genes from tobacco” Plant Physiol. 93:1203-1211, (1990). tapetal oleosin-like genes direct tapetal expression of Cormack et al. “Leucine zipper-containing WRKY proteins widen B-glucuronidase intransgenic Brassica plants' Plant Mol Biol., 1997 the spectrum of immediate early elicitor-induced WRKY transcrip 34(3):549-555. tion factors in parsley” Biochimica et Biophysica Acta, 1572:92-100 Husselstein-Muller et al. “Molecular cloning and expression in yeast (2002). of 2,3-oxidosqualene-triterpenoid cyclases from Arabidopsis Dai et al., “RF2b, a rice bZIP transcription activator, interacts with thaliana' Plant Mol. Biol. 45(1):75-92 (2001). RF2a and is involved in Symptom development of rice tungro dis Hwang et al. An Arabidopsis thaliana root-specific kinase homolog ease” Proc. Natl. Acad. Sci. USA, 101(2):687-692 (2004). is induced by dehydration, ABA, and NaCl” The Plant Journal, de Feyter and Gaudron, Methods in Molecular Biology, vol. 74, 8(1):37-43 (1995). Chapter 43, “Expressing Ribozymes in Plants”. Edited by Turner, Hwang et al., “Aleurone- and embryo-specific expression of the P.C., Humana Press Inc., Totowa, NJ, 1997. 3-glucuronidase gene controlled by the barley Chi26 and Lip I pro Dixon "A two-for-one in tomato nutritional enhancement' Nature moters in transgenic rice' Plant Cell Rep. 20(7):647-654 (2001). Biotechnology, 23(7):825-826 (2005). Hyrup et al., Peptide nucleic acids (PNA): synthesis, properties and Doerks et al. “Protein annotation: detective work for function pre potential applications. Bioorgan. Med Chem., 4:5-23 (1996). diction” TIG, 14(6):248-250 (1998). Ichimura et al., “Isolation of ATMEKK1 (a MAP Kinase Kinase Dr. Duke's Phytochemical and ethnobotanical databases, obtained Kinase)- interacting proteins and analysis of a MAP Kinase cascade from the Internet on Feb. 9, 2005 at http://www.ars-grin.gov/cgi-bin/ in Arabidopsis" Biochem. Biophys. Res. Comm. 253:532-543, duke/farmacy2.pl, 7 pages. (1998). Dupont et al., “The benzophenanthridine alkaloid fagaronine induces Jakoby et al. “b/IP transcription factors in Arabidopsis' Trends in erythroleukemic cell differentiation by gene activation.” Planta Med. Plant Science, 7(3):106-111 (2002). 71(6):489-494 (2005). Joh et al. “High-level transient expression of recombinant protein in Facchini et al., “Expression patterns conferred by tyrosine? lettuce' Biotechnology and Bioengineering, 91(7):861-871 (2002). dihydroxyphenylalanine decarboxylase promoters from opium Jordano et al., “A Sunflower helianthinin gene upstream sequence poppy are conserved in transgenic tobacco Plant Physiology, Ameri ensemble contains an enhancer and sites of nuclear protein interac can Society of Plant Physiologists, vol. 118(1):69-81 (1998). tion” Plant Cell, 1:855-866 (1989). Facchini"Alkaloid biosynthesis in plants: Biochemistry, cell biology, Kanwischer et al. Alterations in tocopherol cyclase activity in molecular regulation, and metabolic engineering applications' Annu. transgenic and mutant plants of Arabidopsis affect tocopherol con Rev. Plant Physiol. Plant Mol. Biol. 52:29-66 (2001). tent, tocopherol composition, and oxidative stress' Plant Physiology, Fejes et al., “A 268 bp upstream sequence mediates the circadian 137:713-723 (2005). clock-regulated transcription of the wheat Cab-1 gene in transgenic Keller and Baumgartner, "Vascular-specific expression of the bean plants” Plant Mol. Biol. 15:921-932 (1990). GRP 1.8 gene is negatively regulated” Plant Cell, 3(10): 1051-1061 Fromm et al., “An octopine synthase enhancer element directs tissue (1991). specific expression and binds ASF-1, a factor from tobacco nuclear Kim et al. "A 20 nucleotide upstream element is essential for the extracts” The Plant Cell. 1:977-984 (1989). nopaline synthase (nos) promoter activity” Plant Molecular Biology, Galweiler et al., “The DNA-binding activity of Gal4 is inhibited by 24:105-117 (1994). Methylation of the Gal4 binding site in plant chromatin' The Plant Kuhlmann etal. "O-Helical D1 domain of the tobaccob/IP transcrip Journal, 23(1): 143-157 (2000). tion factor BZI-1 interacts with the ankyrin-repeat protein ANK1 and US 7,795,503 B2 Page 4 is important for BZI-1 function, both in auxin signaling and pathogen Pasquali et al. "Coordinated regulation of two indole alkaloid response' The Journal of Biological Chemistry, 278(10) 8786-8794 biosynthetic genes from catharanthus-roseus by auxin and elicitors' (2003). Plant Molecular Biology, 18(16): 1121-1131 (1992). Kushiro et al. “Cloning of oxidosqualene cyclase that catalyzes the Pauli et al., “Molecular cloning and functional heterologous expres formation of the most popular triterpene amoung higher plants' sion of two alleles encoding (S)-N-methylcoclaurine 3'-hydroxylase European Journal of Biochemistry, 256:238-244 (1998). (CYP80B1), a new methyl jasmonate-inducible cytochrome P-450 Kutchan “Molecular genetics of plant alkaloid biosynthesis' The dependent mono-oxygenase of benzylisoquinoline alkaloid Alkaloids, 50:257-316 (1998). biosynthesis” Plant Journal, 13(6):793-801, (1998). Lam et al., “Site-specific mutations in alter in vitro factor binding and Perriman et al., “Effective ribozyme delivery in plant cells' Proc. change promoter expression pattern in transgenic plants' Proc. Natl. Natl. Acad. Sci. USA, 92(13):6175-6179(1995). Acad. Sci. USA, 86:7890-7894 (1989). Pollock et al. “Human SRF-related proteins: DNA-binding proper Li et al., “Generation of destabilized green fluorescent protein as a ties and potential regulatory targets' Genes and Development, transcription reporter” J. Biol. Chem. 273(52):34970-34975 (1998). 12A:2327-41 (1991). Liljegren, “Interactions among APETALA I, LEAFY, and Terminal Potenza et al. “Invited review: Targeting transgene expression in Flower I specify meristem fate” Plant Cell, 11:1007-1018 (1999). research, agricultural, and environmental applications: Promoters Liu et al. “Two transcription factors, DREB1 and DREB2, with an used in plant transformation” In Vitro Cellular and Developmental EREBP/AP2 binding domain separate two cellular signal transduc Biology, 40(1):1-22 (2004). tion pathways in drought- and low- temperature-responsive gene Riechmann et al. “Arabidopsis transcription factors: genome-wide expression, respectively, in Arabidopsis.' The Plant Cell, 10:1391 comparative analysis among eukaryotes' Science, 290:2105-2110 1406 (1998). (2000). Lu et al. “Three novel MYB proteins with one DNA binding repeat Riggs et al., “Cotyledon nuclear proteins bind to DNA fragments mediate Sugar and hormone regulation of O-amylase gene expres harboring regulatory elements of phytohemagglutinin genes' Plant sion” The Plant Cell, 14:1963-1980 (2002). Cell, 1(6):609-621 (1989). Luan et al., “A rice cab gene promoter contains separate cis-acting Rivera et al., “Genomic evidence for two functionally distinct gene elements that regulate expression in dicot and monocot plants' The classes” Proc. Natl. Acad. Sci. USA.95:6239-6244 (1998). Plant Cell, 4:971-981 (1992). Rogers, S. G. et al., “Gene transfer in plants: production of trans Lubberstedt et al., “Promoters from genes for plastid proteins possess formed plants using ti plasmid vectors' Methods in Enzymology, regions with different sensitivities toward red and blue light' Plant 118:627 (1987). Physiol., 104:997-1006 (1994). Roulet et al. “Evaluation of computer tools for the prediction of Macleod et al., Expression of antisense to DNA methyltransferase transcription factor binding sites on genomic DNA obtained from mRNA induces DNA demethylation and inhibits tumorigenesis, J. the internet on Aug. 6, 2004 at http://www.bioinfo.de/isb. 1998/01/ Biological Chemistry, vol. 270(14): 8037-8043, (1995). 0004/main.html, 7 pages. Mariconti et al. “The E2F family of transcription factors from Sheehy et al., “Reduction of polygalacturonase activity in tomato Arabidopsis thaliana." The Journal of Biological Chemistry, fruit by antisense RNA” Proc. Nat. Acad. Sci. USA, 85:8805-88.09 277(12):9911-9919 (2002). (1988). Matsuoka et al., “Tissue-specific light-regulated expression directed Sheridan, “The mac I Gene: Controlling the commitment to the by the promoter of a C4 gene, maize pyruvate, orthophosphate meiotic pathway in Maize' Genetics, 142: 1009-1020 (1996). dikinase, in a C3 plant, rice' Proc. Natl. Acad. Sci. USA, 90:9586 Shintani et al. “Elevating the vitamin E content of plants through 9590 (1993). metabolic engineering Science, 282(5396): 2098-2100 (1998). Medberry et al., “The Commelina yellow mottle virus promoter is a Slocombe et al., “Temporal and tissue-specific regulation of a Bras strong promoter in vascular and reproductive tissues' Plant Cell, sica napus Stearoyl-acyl carrier protein desaturase gene' Plant 4(2):185-192 (1992). Physiol. 104(4): 1167-1176 (1994). Meier et al., “Elicitor-inducible and constitutive in vivo DNA foot Sonnhammer et al., “Pfam: A comprehensive database of protein prints indicate novel cis-acting elements in the promoter of a parsley domain families based on seed alignments' Proteins, 28:405-420 gene encoding pathogenesis-related protein 1' Plant Cell, 3:309-316 (1997). (1991). Sonnhammeret al., “Pfam: multiple sequencealignments and HMM Memelink “Putting the opium in poppy to sleep” Nature Biotechnol profiles of protein domains' Nucl. Acids Res., 26:320-322 (1998). ogy, 22(12): 1526-1527, Dec. 2004. Stockinger et al. "Arabidopsis thaliana CBFI encodes an AP2 Menke et al. A novel jasmonate- and elicitor-responsive element in domain-containing transcriptional activator that binds to the C-re the periwinkle secondary metabolite biosynthetic gene Sir interacts peat DRE, a cis-acting DNA regulatory element that stimulates tran with a jasmonate- and elicitor-inducible AP2-domain transcription Scription in response to low temperature and water deficit” Proc. factor, ORCA2” The EMBOJournal, 18(16):4455-4463 (1999). Natl. Acad. Sci. USA,94: 1035-1040 (1997). Ounaroon et al. “(R, S)-Reticuline 7-O-methyltrasnferase and (R. Summerton and Weller, “Morpholino antisense oligomers: design, S)-norcoclaurine 6-O- methyltransferase of Papaver preparation, and properties' Antisense Nucleic Acid Drug Dev., somniferum—cDNA cloning and characterization of methyl transfer 7:187-195 (1997). enzymes of alkaloid biosynthesis in opium poppy” The Plant Jour Truernit et al., “The promoter of the Arabidopsis thaliana SUC2 nal, 36(6):808-819 (2003). Sucrose-H"Symporter gene directs expression off-glucuronidase to Park et al., “Analysis of promoters from tyrosine? the phloem: Evidence for phloem loading and unloading by SUC2' dihydroxyphenylalanine decarboxylase and berberine bridge Planta. 196:564-570 (1995). enzyme genes involved in benzylisoquinoline alkaloid biosynthesis Ulmasov et al. "ARF1, a transcription factor that binds to auxin in opium poppy Plant Molecular Biology, 40(1): 121-131 (1999). response elements' Science, 276:1865-1868 (1997). Park & Facchini, “High-efficiency somatic embryogenesis and plant Urao et al. “Molecular cloning and characterization of a gene that regeneration in California poppy, Eschscholzi callifronica Cham.” encodes a MYC-related protein in Arabidopsis' Plant Mol. Biol. Plant Cell Rep 19: 421-426, (2000). 32:571-57 (1996). Park et al., “Agrobacterium rhizogenes-mediated transformation of Urdea et al. “Chemical synthesis of a gene for human epidermal opium poppy, Papaver SOmniferum L., and California poppy, growth factor urogastrone and its expression in yeast Proc. Natl. Eschscholzia californica Cham., root cultures”.J. Exp. Botany, 2000, Acad. Sci. USA, 80:7461-7465 (1983). 51(347): 1005-1016. Van der Fits et al. “The jasmonate-inducible AP2/ERF-domain tran Park & Facchini, 'Agrobacterium-mediated genetic transformation Scription factor ORCA3 activates gene expression via interaction of California poppy, Eschscholzia Californica Cham., via Somatic with a jasmonate-responsive promoter element” The Plant Journal, embryogenesis” Plant Cell Rep., 19: 1006-1012, (2000). 25(1):43-53 (2001). US 7,795,503 B2 Page 5 vander Krolet al., “Flavonoid genes in petunia: Addition of a limited Yamamoto et al., "Characterization of cis-acting sequences regulat number of gene copies may lead to a Suppression of gene expression' ing root-specific gene expression in tobacco”. The Plant Cell, 3:371 The Plant Cell, 2:291-299 (1990). 382 (1991). Van Eenennaam et al. “Elevation of seed C-tocopherol levels using Yamamoto et al., “The promoter of a pine photosynthetic gene allows plant-based transcription factors targeted to an endogenous locus' expression of a -glucuronidase reporter gene in transgenic rice Metab. Eng., 6(2): 101-108 (2004). plants in a light-independent but tissue-specific manner Plant Cell Verpoorte et al. “Engineering secondary metabolite production in Physiol. 35:773-778 (1994). plants' Current Opinion in Biotechnology, 13(2): 181-187 (2002). Yanagisawa “Transcription factors in plants: physiological functions and regulation of expression”. J. Plant Res., 111:363-371 (1998). Weber et al., “In vitro DNA methylation inhibits gene expression in Yanagisawa, “The Dof family of plant transcription factors' Trends transgenic tobacco.” The EMBO Journal, Dec. 1990, 9(13): 4409 in Plant Science, 7(12):555-560 (2002). 4415. Yanagisawa et al., “Metabolic engineering with Dof transcription Weigel D, “The APETALA2 domain is related to a novel type of DNA factor in plants: Improved nitrogen assimilation and growth under binding domain” Plant Cell, 7:388-389 (1995). low-nitrogen conditions” Proc. Natl. Acad. Sci. USA, 101 (20):7833 Wroblewski et al. “Optimization of Agrobacterium-mediated tran 7838 (2004). sient assays of gene expression in lettuce, tomato and Arabidopsis' Zhang et al. "Metabolic engineering of tropane alkaloid biosynthesis Plant Biotechnology Journal, 3:259-273 (2005). in plants' Journal of Integrative Plant Biology, 47(2): 136-143 Xiong et al. “Repression of stress-responsive genes by FIERY2, a (2005). novel transcriptional regulator in Arabidopsis" Proc. Natl. Acad. Sci. Zheng et al., “SPK1 Is an Essential S-Phase-Specific Gene of Sac USA, 99(16) 10899-10904 (2002). charomyces cerevisiae that encodes a nuclear serine/threonine? Xu et al., “Characterization of a rice gene family encoding root tyrosine kinase” Mol. Cell Biol., 1993, 13:5829-5842. specific proteins” Plant Mol. Biol., 27:237-248 (1995). * cited by examiner

U.S. Patent Sep. 14, 2010 Sheet 3 Of 57 US 7,795,503 B2

|8||

S

(panu?uoO)Z9.InãIA snsuæsuoQ

U.S. Patent Sep. 14, 2010 Sheet 7 Of 57 US 7,795,503 B2 OOZ 092

(panu?uoO)†9.In513

snsuæsuo3 snsuæsuoo snsuæsuoO

U.S. Patent Sep. 14, 2010 Sheet 14 Of 57 US 7,795,503 B2

U.S. Patent Sep. 14, 2010 Sheet 16 Of 57 US 7,795,503 B2

09||

|××VTEVÒ

(ponu?uoo)6Qin?l? snsuæsuoO Snsuasuo?) U.S. Patent Sep. 14, 2010 Sheet 17 Of 57 US 7,795,503 B2

0I9In3j U.S. Patent Sep. 14, 2010 Sheet 18 Of 57 US 7,795,503 B2 OOZ

snsuæsuoo snsuæsuoQ

U.S. Patent Sep. 14, 2010 Sheet 22 Of 57 US 7,795,503 B2

+--->•*-->

===*(=====))s?æ

(pºnu?uoo)IIQun?!) snsuæsuoo

U.S. Patent Sep. 14, 2010 Sheet 25 Of 57 US 7,795,503 B2 009:

(pºnu?uoO)ZIQIn314

snsuæsuoO snsuæsuo9

U.S. Patent Sep. 14, 2010 Sheet 27 Of 57 US 7,795,503 B2

09 09|

SSN83-),Ad––

snsuæsuo3 snsuasuo3 snsuæsuoO

U.S. Patent Sep. 14, 2010 Sheet 31 Of 57 US 7,795,503 B2

(panu?uoO)#7I9InãIA

U.S. Patent Sep. 14, 2010 Sheet 34 of 57 US 7,795,503 B2

-VOET}Xº3TOSTIHN-

–dEAST1SS9

(ponu?uoO)çIainºj

snsuæsuo9

U.S. Patent Sep. 14, 2010 Sheet 36 of 57 US 7,795,503 B2

U.S. Patent Sep. 14, 2010 Sheet 38 of 57 US 7,795,503 B2

(panu?uoO)LIQun?!) U.S. Patent Sep. 14, 2010 Sheet 39 Of 57 US 7,795,503 B2

099C

(panu?uoO)LIQIn?IJ

U.S. Patent Sep. 14, 2010 Sheet 42 of 57 US 7,795,503 B2

snsuæsuo3 snsuæsuo3

U.S. Patent Sep. 14, 2010 Sheet 45 of 57 US 7,795,503 B2

8/9 8/9 |9.Z

(panu?uoO)6IºInãIJ snsuæsuo3

U.S. Patent Sep. 14, 2010 Sheet 48 of 57 US 7,795,503 B2 097

–SS-S-S--c}

snsuæsuo3 Snsuasuo,)

U.S. Patent Sep. 14, 2010 Sheet 51 of 57 US 7,795,503 B2

OOZ––9BSTWNOA––>|—09———————————––––NNS-OS—E^—SSW}}snsuasuo3

U.S. Patent Sep. 14, 2010 Sheet 55 of 57 US 7,795,503 B2

099 006 ZO; 999, 689) 096 ZO; 999, 699C O00?

Snsuæsuo9 snsuasuoy snsuæsuoo snsuæsuoo

U.S. Patent Sep. 14, 2010 Sheet 57 Of 57 US 7,795,503 B2

09.Z| 009`|

\/\/NOOOANB|

OSIBÒ------SS100TONNA

snsuæsuoo snsuæsuo3 snsuæsuo9 US 7,795,503 B2 1. 2 MODULATING PLANTALKALOIDS interacting either directly or indirectly with regulatory regions of genes encoding enzymes in an alkaloid biosynthe CROSS-REFERENCE TO RELATED sis pathway, and thereby modulating expression, e.g., tran APPLICATIONS Scription, of Such genes. Modulation of expression can include up-regulation or activation, e.g., an increase of This application claims priority under 35 U.S.C. S 119 to expression relative to basal or native states (e.g., a control U.S. Provisional Application No. 60/654,927, filed Feb. 22, level). In other cases, modulation of expression can include 2005, which is incorporated herein by reference in its entirety. down-regulation or repression, e.g., a decrease of expression relative to basal or native states, such as the level in a control. TECHNICAL FIELD 10 In many cases, a regulatory protein is a transcription factor and its associated regulatory region is a promoter. Regulatory This document relates to materials and methods involved proteins identified as being capable of interacting directly or in modulating the rate of production and accumulation of indirectly with regulatory regions of genes encoding enzymes alkaloid compounds, e.g., alkaloid secondary metabolites, in in an alkaloid biosynthesis pathway can be used to create plants. For example, this document provides plants having 15 transgenic plants, e.g., plants capable of producing one or increased alkaloid levels as well as materials and methods for more alkaloids. Such plants can have modulated, e.g., making plants having increased alkaloid levels. increased, amounts and/or rates of biosynthesis of one or more alkaloid compounds. Regulatory proteins can also be INCORPORATION-BY-REFERENCE & TEXTS used along with their cognate promoters to modulate tran Scription of one or more endogenous sequences, e.g., alkaloid The material on the accompanying diskette is hereby incor biosynthesis genes, in a plant cell. Given the variety of uses of porated by reference into this application. The accompanying the various alkaloid classes of compounds, it would be useful compact discs contain one file, 11696-157001.txt, which was to control selective expression of one or more proteins, created on Feb. 22, 2006. The file named 11696-157001.txt is including enzymes, regulatory proteins, and other auxiliary 688 KB. The file can be accessed using Microsoft Word on a 25 proteins, involved in alkaloid biosynthesis, e.g., to regulate computer that uses Windows OS. biosynthesis of known and/or novel alkaloids. In one aspect, a method of determining whether or not a BACKGROUND regulatory region is activated by a regulatory protein is pro vided. The method comprises determining whether or not Regulation of gene expression is achieved by the direct or 30 reporter activity is detected in a plant cell transformed with indirect interaction of regulatory proteins such as transcrip (a) a recombinant nucleic acid construct comprising a regu tion factors with cis-acting DNA regulatory regions, includ latory region operably linked to a nucleic acid encoding a ing promoters, promoter elements, and promoter motifs, polypeptide having the reporter activity; and (b) a recombi which may be located upstream, downstream, and/or within a nant nucleic acid construct comprising a nucleic acid encod gene of interest. Certain regulatory proteins can interact with 35 ing a regulatory protein comprising a polypeptide sequence regulatory regions for a number of genes, often driving the having 80% or greater sequence identity to a polypeptide coordinate expression of multiple genes in a pathway. For sequence selected from the group consisting of SEQID NOs: example, binding of a transcription factor to a promoter or 2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQID NOs: promoter element usually results in a modulation, e.g., an 35-42, SEQ ID NOs:44-61, SEQ ID NOs:63-70, SEQ ID increase, of basal rates of transcription initiation and/or elon 40 NOs:72-79, SEQID NOs:81-86, SEQID NOs:88-99, SEQ gation. Promoters typically have a modular organization that ID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: 124 includes multiple cis-elements (promoter elements), which 136, SEQID NOs:138-150, SEQID NOs: 152-156, SEQID can interact in additive or synergistic manners to modulate NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177-188, transcription. Identification of regulatory proteins that bind to SEQ ID NOS:190-191, SEQ ID NOs: 193-194, SEQ ID particular DNA regulatory regions can provide tools to facili 45 NO:196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID tate the selective expression of proteins of interest, e.g., to NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID modify plant biosynthetic pathways. NOs:221-236, and the consensus sequences set forth in FIGS. Plant families that produce alkaloids include the Papaver 1-22; where detection of the reporter activity indicates that aceae, Berberidaceae, Leguminosae, Boraginaceae, Apocyn the regulatory region is activated by the regulatory protein. aceae, Asclepiadaceae, Liliaceae, Gnetaceae, Erythroxy 50 The activation can be direct or indirect. The nucleic acid laceae, Convolvulaceae, Ranunculaeceae, Rubiaceae, encoding the regulatory protein can be operably linked to a Solanaceae, and families. Many alkaloids isolated regulatory region, where the regulatory region is capable of from Such plants are known for their pharmacologic (e.g., modulating expression of the regulatory protein. The regula narcotic), insecticidal, and physiologic effects. For example, tory region capable of modulating expression of the regula the poppy () family contains about 250 species 55 tory protein can be a promoter. The promoter can be an found mainly in the northern temperate regions of the world. organ-preferential promoter, a tissue-preferential promoter, a The principal morphinan alkaloids in opium poppy (Papaver cell type-preferential promoter, or an inducible promoter. The somniferum) are morphine, codeine, and thebaine, which are tissue can be stem, seed pod, reproductive, or parenchymal used directly or modified using synthetic methods to produce tissue. The cell can be a laticifer, Sieve element, or companion pharmaceutical compounds used for pain management, 60 cell. cough Suppression, and addiction. The plant cell can be stably transformed with the recom binant nucleic acid construct comprising the regulatory SUMMARY region operably linked to the nucleic acid encoding the polypeptide having the reporter activity and transiently trans The present invention relates to materials and methods for 65 formed with the recombinant nucleic acid construct compris identifying regulatory proteins that are associated with regu ing the nucleic acid encoding the regulatory protein. The plant latory regions, i.e., regulatory proteins that are capable of cell can be stably transformed with the recombinant nucleic US 7,795,503 B2 3 4 acid construct comprising the nucleic acid encoding the regu The regulatory region can be a promoter. The promoter can latory protein and transiently transformed with the recombi be a tissue-preferential promoter, a cell type-preferential pro nant nucleic acid construct comprising the regulatory region moter, or an inducible promoter. The tissue can be stem, seed operably linked to the nucleic acid encoding the polypeptide pod, reproductive, or parenchymal tissue. The cell type can be having the reporter activity. The plant cell can be stably trans a laticifer cell, a companion cell, or a sieve element cell. The formed with the recombinant nucleic acid construct compris plant cell can be capable of producing one or more alkaloids. ing the nucleic acid encoding the regulatory protein and sta The plant cell can further comprise an endogenous regulatory bly transformed with the recombinant nucleic acid construct region that is associated with the regulatory protein. The comprising the regulatory region operably linked to the regulatory protein can modulate transcription of an endog nucleic acid encoding the polypeptide having the reporter 10 enous gene involved in alkaloid biosynthesis in the cell. The activity. The plant cell can be transiently transformed with the modulation can be an increase in transcription of the endog recombinant nucleic acid construct comprising the nucleic enous gene. The endogenous gene can comprise a coding acid encoding the regulatory protein and transiently trans sequence for a regulatory protein involved in alkaloid biosyn formed with the recombinant nucleic acid construct compris thesis. The endogenous gene can comprise a coding sequence ing the regulatory region operably linked to the nucleic acid 15 for an alkaloid biosynthesis enzyme. encoding the polypeptide having the reporter activity. The endogenous gene can be a tetrahydrobenzylisoquino The reporter activity can be selected from an enzymatic line alkaloid biosynthesis enzyme, a benzophenanthridine activity and an optical activity. The enzymatic activity can be alkaloid biosynthesis enzyme, a morphinan alkaloid biosyn selected from luciferase activity, neomycin phosphotrans thesis enzyme, a monoterpenoid indole alkaloid biosynthesis ferase activity, and phosphinothricin acetyltransferase activ enzyme, a bisbenzylisoquinoline alkaloid biosynthesis ity. The optical activity can be bioluminescence, fluores enzyme, a pyridine, purine, tropane, or quinoline alkaloid cence, or phosphorescence. biosynthesis enzyme, a terpenoid, betaine, or phenethy In another aspect, a method of determining whether or not lamine alkaloid biosynthesis enzyme, or a steroid alkaloid a regulatory region is activated by a regulatory protein is biosynthesis enzyme. provided. The method comprises determining whether or not 25 The endogenous gene can be selected from the group con reporter activity is detected in a plant cell transformed with sisting of tyrosine decarboxylase (YDC or TYD; EC (a) a recombinant nucleic acid construct comprising a regu 4.1.1.25), norcoclaurine synthase (EC 4.2.1.78), coclaurine latory region comprising a nucleic acid having 80% or greater N-methyltransferase (EC 2.1.1.140), (R, S)-norcoclaurine sequence identity to a regulatory region selected from the 6-O-methyl transferase (NOMT, EC 2.1.1.128), S-adenosyl group consisting of SEQID NOS:237-252 operably linked to 30 L-methionine:3'-hydroxy-N-methylcoclaurine 4'-O-methyl a nucleic acid encoding a polypeptide having the reporter transferase 1 (HMCOMT1: EC 2.1.1.116); S-adenosyl-L- activity; and (b) a recombinant nucleic acid construct com methionine:3'-hydroxy-N-methylcoclaurine 4'-O- prising a nucleic acid encoding a regulatory protein. Detec methyltransferase 2 (HMCOMT2: EC 2.1.1.116); tion of the reporter activity indicates that the regulatory monophenol monooxygenase (EC 1.14.18.1), N-methylco region is activated by the regulatory protein. 35 claurine 3'-hydroxylase (NMCH; EC 1.14.13.71), (R.S)-reti The regulatory protein can comprise a polypeptide culine 7-O-methyltransferase (ROMT); berbamunine syn sequence having 80% or greater sequence identity to a thase (EC 1.14.21.3), columbamine O-methyltransferase (EC polypeptide sequence selected from the group consisting of 2.1.1.118), berberine bridge enzyme (BBE; EC 1.21.3.3), SEQ ID NOS:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, reticuline oxidase (EC 1.21.3.4), dehydro reticulinium ion SEQID NOS:35-42, SEQ ID NOs:44-61, SEQ ID NOs:63 40 reductase (EC 1.5.1.27), (RS)-1-benzyl-1,2,3,4-tetrahy 70, SEQID NOs:72-79, SEQID NOs:81-86, SEQID NOs: droisoquinoline N-methyltransferase (EC 2.1.1.115), (S)- 88-99, SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID scoulerine oxidase (EC 1.14.21.2), (S)-cheilanthifoline oxi NOs: 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, dase (EC 1.14.21.1), (S)-tetrahydroprotoberberine SEQID NOs: 158-167, SEQID NOs: 169-175, SEQID NOs: N-methyltransferase (EC 2.1.1.122), (S)-canadine synthase 177-188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQ 45 (EC 1.14.21.5), tetrahydroberberine oxidase (EC 1.3.3.8), ID NO:196, SEQID NO: 198, SEQID NOs:200-204, SEQID and columbamine oxidase (EC 1.21.3.2). NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID The endogenous gene can be selected from the group con NOs:221-236, and the consensus sequences set forth in FIGS. sisting of those coding for dihydrobenzophenanthridine oxi 1-22. dase (EC 1.5.3.12), dihydrosanguinarine 10-hydroxylase In another aspect, a plant cell is provided. The plant cell 50 (EC 1.14.13.56), 10-hydroxydihydrosanguinarine 10-O-me comprises an exogenous nucleic acid comprising a nucleic thyltransferase (EC 2.1.1.119), dihydrochelirubine 12-hy acid encoding a regulatory protein comprising a polypeptide droxylase (EC 1.14.13.57), and 12-hydroxydihydrocheliru sequence having 80% or greater sequence identity to a bine 12-O-methyltransferase (EC 2.1.1.120). polypeptide sequence selected from the group consisting of The endogenous gene can be selected from the group con SEQ ID NOS:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, 55 sisting of those coding for salutaridinol 7-O-acetyltransferase SEQID NOS:35-42, SEQ ID NOs:44-61, SEQ ID NOs:63 (SAT; EC 2.3.1.150), salutaridine synthase (EC 1.14.21.4), 70, SEQID NOs:72-79, SEQID NOs:81-86, SEQID NOs: salutaridine reductase (EC 1.1.1.248), morphine 6-dehydro 88-99, SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID genase (EC 1.1.1.218); and codeinone reductase (CR; EC NOs: 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, 1.1.1.247). SEQID NOs: 158-167, SEQID NOs: 169-175, SEQID NOs: 60 The plant cell comprising an exogenous nucleic acid com 177-188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQ prising a nucleic acid encoding a regulatory protein compris ID NO:196, SEQID NO: 198, SEQID NOs:200-204, SEQID ing a polypeptide sequence having 80% or greater sequence NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID identity to a polypeptide sequence selected from the group NOs:221-236, and the consensus sequences set forth in FIGS. consisting of SEQID NOS:2-5, SEQID NOS:7-16, SEQID 1-22, where the nucleic acid is operably linked to a regulatory 65 NOs: 18-33, SEQID NOS:35-42, SEQID NOs:44-61, SEQ region that modulates transcription of the regulatory protein ID NOs:63-70, SEQ ID NOs:72-79, SEQ ID NOs:81-86, in the plant cell. SEQ ID NOs:88-99, SEQ ID NOs: 101-113, SEQ ID NOs: US 7,795,503 B2 5 6 115-122, SEQID NOs: 124-136, SEQID NOs: 138-150, SEQ erpenoid indole alkaloid, a bisbenzylisoquinoline alkaloid, a ID NOs: 152-156, SEQID NOs: 158-167, SEQID NOs: 169 pyridine, purine, tropane, or quinoline alkaloid, a terpenoid, 175, SEQID NOs: 177-188, SEQ ID NOs:190-191, SEQID betaine, or phenethylamine alkaloid, or a steroid alkaloid. NOs: 193-194, SEQ ID NO:196, SEQ ID NO.198, SEQ ID A plant cell described above can be a member of the Papav NOs:200-204, SEQID NOs:206-215, SEQID NO:217, SEQ eraceae, Menispermaceae, Lauraceae, Euphorbiaceae, Ber ID NO:219, SEQ ID NOS:221-236, and the consensus beridaceae, Leguminosae, Boraginaceae, Apocynaceae, sequences set forth in FIGS. 1-22, can further comprise an Asclepiadaceae, Liliaceae, Gnetaceae, Erythroxylaceae, exogenous regulatory region operably linked to a sequence of Convolvulaceae, Ranunculaeceae, Rubiaceae, Solanaceae, or interest. The nucleic acid is operably linked to a regulatory Rutaceae families. A plant cell described above can be a region that modulates transcription of the regulatory protein 10 member of the species Papaver bracteatum, Papaver Orien in the plant cell. The exogenous regulatory region is associ tale, Papa versetigerum, Papaver somniferum, Croton salu ated with the regulatory protein. The exogenous regulatory taris, Croton balsamifera, Sinomenium acutum, Stephania region comprises a nucleic acid having 80% or greater cepharantha, Stephania Zippeliana, Litsea sebiferea, sequence identity to a regulatory region selected from the Alseodaphne perakensis, Cocculus laurifolius, Duguetia group consisting of SEQID NOS:237-252. 15 Obovata, Rhizocarya racemifera, or Beilschmiedia Oreophila. The sequence of interest can comprise a coding sequence A plant cell described above can further comprise a nucleic for a polypeptide involved in alkaloid biosynthesis. The acid encoding a second regulatory protein operably linked to polypeptide can be a regulatory protein involved in alkaloid a second regulatory region that modulates transcription of the biosynthesis. The polypeptide can be an alkaloid biosynthesis second regulatory protein in the plant cell. The nucleic acid enzyme. encoding a second regulatory protein operably linked to a The enzyme can be a morphinan alkaloid biosynthesis second regulatory region can be present on a second recom enzyme, a tetrahydrobenzylisoquinoline alkaloid biosynthe binant nucleic acid construct. sis enzyme, or a benzophenanthridine alkaloid biosynthesis A regulatory protein-regulatory region association can be enzyme. The enzyme can be a monoterpenoid indole alkaloid effective for modulating the amount of at least one alkaloid biosynthesis enzyme, a bisbenzylisoquinoline alkaloid bio 25 compound in the cell. An alkaloid compound can be selected synthesis enzyme, a pyridine, purine, tropane, or quinoline from the group consisting of salutaridine, salutaridinol, salu alkaloid biosynthesis enzyme, a terpenoid, betaine, or phen taridinol acetate, thebaine, isothebaine, papaverine, narco ethylamine alkaloid biosynthesis enzyme, or a steroid alka tine, noscapine, narceline, hydrastine, oripavine, morphinone, loid biosynthesis enzyme. morphine, codeine, codeinone, and neopinone. An alkaloid The enzyme can be selected from the group consisting of 30 compound can be selected from the group consisting of ber salutaridinol 7-O-acetyltransferase (SAT; EC 2.3.1.150), berine, palmatine, tetrahydropalmatine, S-canadine, colum salutaridine synthase (EC 1.14.21.4), salutaridine reductase bamine, S-tetrahydrocolumbamine, S-scoulerine, (EC 1.1.1.248), morphine 6-dehydrogenase (EC 1.1.1.218); S-cheilathifoline, S-stylopine, S-cis-N-methylstylopine, pro and codeinone reductase (CR; EC 1.1.1.247). topine, 6-hydroxyprotopine, R-norreticuline, S-norreticu The enzyme can be selected from the group consisting of 35 line, R-reticuline. S-reticuline, 1,2-dehydroreticuline, S-3'- tyrosine decarboxylase (YDC or TYD; EC 4.1.1.25), norco hydroxycoclaurine, S-norcoclaurine, S-coclaurine, S-N- claurine synthase (EC 4.2.1.78), coclaurine N-methyltrans methylcoclaurine, berbamunine, 2-norberbamunine, and ferase (EC 2.1.1.140), (R, S)-norcoclaurine 6-O-methyl guatteguamerine. An alkaloid compound can be selected transferase (NOMT, EC 2.1.1.128), S-adenosyl-L-methion from the group consisting of dihydro-sanguinarine, sangui ine:3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase 40 narine, dihydroxy-dihydro-sanguinarine, 12-hydroxy-dihy 1 (HMCOMT1: EC 2.1.1.116); S-adenosyl-L-methionine:3'- drochelirubine, 10-hydroxy-dihydro-sanguinarine, dihydro hydroxy-N-methylcoclaurine 4'-O-methyltransferase 2 macarpine, dihydro-chelirubine, dihydro-sanguinarine, (HMCOMT2: EC 2.1.1.116); monophenol monooxygenase chelirubine, 12-hydroxy-chelirubine, and macarpine. (EC1.14.18.1), N-methylcoclaurine 3'-hydroxylase (NMCH: In another aspect, a Papaveraceae plant is provided. The EC 1.14.13.71), (R.S)-reticuline 7-O-methyltransferase 45 plant comprises an exogenous nucleic acid comprising a (ROMT); berbamunine synthase (EC 1.14.21.3), columbam nucleic acid encoding a regulatory protein comprising a ine O-methyltransferase (EC 2.1.1.118), berberine bridge polypeptide sequence having 80% or greater sequence iden enzyme (BBE; EC 1.21.3.3), reticuline oxidase (EC tity to a polypeptide sequence selected from the group con 1.21.3.4), dehydro reticulinium ion reductase (EC 1.5.1.27), sisting of SEQ ID NOs:2-5, SEQ ID NOs:7-16, SEQ ID (RS)-1-benzyl-1,2,3,4-tetrahydroisoquinoline N-methyl 50 NOs: 18-33, SEQID NOS:35-42, SEQID NOs:44-61, SEQ transferase (EC 2.1.1.115), (S)-scoulerine oxidase (EC ID NOs:63-70, SEQ ID NOs:72-79, SEQ ID NOs:81-86, 1.14.21.2), (S)-cheilanthifoline oxidase (EC 1.14.21.1), (S)- SEQ ID NOs:88-99, SEQ ID NOs: 101-113, SEQ ID NOs: tetrahydroprotoberberine N-methyltransferase (EC 115-122, SEQID NOs: 124-136, SEQID NOs: 138-150, SEQ 2.1.1.122), (S)-canadine synthase (EC 1.14.21.5), tetrahy ID NOs: 152-156, SEQID NOs: 158-167, SEQID NOs: 169 droberberine oxidase (EC 1.3.3.8), and columbamine oxidase 55 175, SEQID NOs: 177-188, SEQ ID NOS:190-191, SEQID (EC 1.21.3.2). NOs: 193-194, SEQ ID NO:196, SEQ ID NO.198, SEQ ID The enzyme can be selected from the group consisting of NOs:200-204, SEQID NOs:206-215, SEQID NO:217, SEQ dihydrobenzophenanthridine oxidase (EC 1.5.3.12), dihyd ID NO:219, SEQ ID NOS:221-236, and the consensus rosanguinarine 10-hydroxylase (EC 1.14.13.56), 10-hy sequences set forth in FIGS. 1-22, where the nucleic acid is droxydihydrosanguinarine 10-O-methyltransferase (EC 60 operably linked to a regulatory region that modulates tran 2.1.1.119), dihydrochelirubine 12-hydroxylase (EC Scription of the regulatory protein in the plant cell. 1.14.13.57), and 12-hydroxydihydrochelirubine 12-O-meth In another aspect, a method of expressing a sequence of yltransferase (EC 2.1.1.120). interest is provided. The method comprises growing a plant A plant cell described above can be capable of producing cell comprising (a) an exogenous nucleic acid comprising a one or more alkaloids. An alkaloid can be a morphinan alka 65 regulatory region comprising a nucleic acid having 80% or loid, a morphinan analog alkaloid, a tetrahydrobenzyliso greater sequence identity to a regulatory region selected from quinoline alkaloid, a benzophenanthridine alkaloid, a monot the group consisting of SEQ ID NOS:237-252, where the US 7,795,503 B2 7 8 regulatory region is operably linked to a sequence of interest; the regulatory protein are associated. The plant cell is grown and (b) an exogenous nucleic acid comprising a nucleic acid under conditions effective for the expression of the endog encoding a regulatory protein comprising a polypeptide enous regulatory protein. sequence having 80% or greater sequence identity to a The sequence of interest can comprise a coding sequence polypeptide sequence selected from the group consisting of 5 for a polypeptide involved in alkaloid biosynthesis. The SEQ ID NOS:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, nucleic acid encoding the exogenous regulatory protein can SEQID NOS:35-42, SEQ ID NOs:44-61, SEQ ID NOs:63 be operably linked to a regulatory region capable of modu 70, SEQID NOs:72-79, SEQID NOs:81-86, SEQID NOs: lating expression of the exogenous regulatory protein in the 88-99, SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID plant cell. The regulatory region capable of modulating 10 expression of the exogenous regulatory protein in the plant NOs: 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, cell can be selected from a tissue-preferential, cell type-pref SEQID NOs: 158-167, SEQID NOs: 169-175, SEQID NOs: erential, organ-preferential, or inducible promoter. The regu 177-188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQ latory region capable of modulating expression of the exog ID NO:196, SEQID NO: 198, SEQID NOs:200-204, SEQID enous regulatory protein can be a cell type-preferential NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID 15 promoter, where the cell type is a laticifer, sieve element, or NOs:221-236, and the consensus sequences set forth in FIGS. companion cell. 1-22. The regulatory region and the regulatory protein are In another aspect, a method of expressing a sequence of associated. The plant cell is grown under conditions effective interest is provided. The method comprises growing a plant for the expression of the regulatory protein. cell comprising an exogenous nucleic acid. The exogenous In another aspect, a method of expressing an endogenous nucleic acid comprises a nucleic acid encoding a regulatory sequence of interest is provided. The method comprises protein comprising a polypeptide sequence having 80% or growing a plant cell comprising an endogenous regulatory greater sequence identity to a polypeptide sequence selected region operably linked to a sequence of interest. The endog from the group consisting of SEQ ID NOS:2-5, SEQ ID enous regulatory region comprises a nucleic acid having 80% NOs:7-16, SEQID NOs: 18-33, SEQID NOs:35-42, SEQID or greater sequence identity to a regulatory region selected 25 NOs:44-61, SEQID NOs:63-70, SEQID NOs:72-79, SEQ from the group consisting of SEQID NOS:237-252. The plant ID NOs:81-86, SEQID NOs:88-99, SEQ ID NOs: 101-113, cell further comprises a nucleic acid encoding an exogenous SEQID NOs: 115-122, SEQID NOs: 124-136, SEQID NOs: regulatory protein. The exogenous regulatory protein com 138-150, SEQID NOs: 152-156, SEQID NOs: 158-167, SEQ prises a polypeptide sequence having 80% or greater ID NOs: 169-175, SEQID NOs: 177-188, SEQID NOS:190 30 191, SEQ ID NOs: 193-194, SEQ ID NO: 196, SEQ ID sequence identity to a polypeptide sequence selected from the NO:198, SEQID NOs:200-204, SEQID NOs:206-215, SEQ group consisting of SEQ ID NOS:2-5, SEQ ID NOS:7-16, ID NO:217, SEQID NO:219, SEQID NOs:221-236, and the SEQID NOs: 18-33, SEQ ID NOS:35-42, SEQ ID NOs:44 consensus sequences set forth in FIGS. 1-22. The nucleic acid 61, SEQID NOs:63-70, SEQID NOs:72-79, SEQID NOs: is operably linked to a regulatory region that modulates tran 81-86, SEQID NOs:88-99, SEQ ID NOs: 101-113, SEQ ID 35 Scription of the regulatory protein in the plant cell. The plant NOs: 115-122, SEQID NOs: 124-136, SEQID NOs: 138-150, cell further comprises an exogenous regulatory region oper SEQID NOs: 152-156, SEQID NOs: 158-167, SEQID NOs: ably linked to a sequence of interest, where the exogenous 169-175, SEQID NOs: 177-188, SEQID NOS:190-191, SEQ regulatory region is associated with said regulatory protein, IDNOs: 193-194, SEQIDNO:196, SEQIDNO:198, SEQID and where the exogenous regulatory region comprises a NOs:200-204, SEQID NOs:206-215, SEQID NO:217, SEQ 40 nucleic acid having 80% or greater sequence identity to a ID NO:219, SEQ ID NOS:221-236, and the consensus regulatory region selected from the group consisting of SEQ sequences set forth in FIGS. 1-22. The exogenous regulatory ID NOS:237-252. The plant cell is grown under conditions protein and the endogenous regulatory region are associated. effective for the expression of the regulatory protein. The plant cell is grown under conditions effective for the In another aspect, a method of modulating the expression expression of the exogenous regulatory protein. 45 level of one or more endogenous Papaveraceae genes In another aspect, a method of expressing an exogenous involved in alkaloid biosynthesis is provided. The method sequence of interest is provided. The method comprises comprises transforming a cell of a member of the Papaver growing a plant cell comprising an exogenous regulatory aceae family with a recombinant nucleic acid construct, region operably linked to a sequence of interest. The exog where the nucleic acid construct comprises a nucleic acid enous regulatory region comprises a nucleic acid having 80% 50 encoding a regulatory protein comprising a polypeptide or greater sequence identity to a regulatory region selected sequence selected from the group consisting of SEQID NOs: from the group consisting of SEQID NOS:237-252. The plant 2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQID NOs: cell further comprises a nucleic acid encoding an endogenous 35-42, SEQ ID NOs:44-61, SEQ ID NOs:63-70, SEQ ID regulatory protein. The endogenous regulatory protein com NOs:72-79, SEQID NOs:81-86, SEQID NOs:88-99, SEQ prises a polypeptide sequence having 80% or greater 55 ID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: 124 sequence identity to a polypeptide sequence selected from the 136, SEQID NOs:138-150, SEQID NOs: 152-156, SEQID group consisting of SEQ ID NOS:2-5, SEQ ID NOS:7-16, NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177-188, SEQID NOs: 18-33, SEQ ID NOS:35-42, SEQ ID NOs:44 SEQ ID NOS:190-191, SEQ ID NOs: 193-194, SEQ ID 61, SEQID NOs:63-70, SEQID NOs:72-79, SEQID NOs: NO:196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID 81-86, SEQID NOs:88-99, SEQ ID NOs: 101-113, SEQ ID 60 NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID NOs: 115-122, SEQID NOs: 124-136, SEQID NOs: 138-150, NOs:221-236, and the consensus sequences set forth in FIGS. SEQID NOs: 152-156, SEQID NOs: 158-167, SEQID NOs: 1-22. The nucleic acid is operably linked to a regulatory 169-175, SEQID NOs: 177-188, SEQID NOS:190-191, SEQ region that modulates transcription in the family member. IDNOs: 193-194, SEQIDNO:196, SEQIDNO:198, SEQID In another aspect, a method of producing one or more NOs:200-204, SEQID NOs:206-215, SEQID NO:217, SEQ 65 alkaloids in a plant cell is provided. The method comprises ID NO:219, SEQ ID NOS:221-236, and the consensus growing a plant cell comprising an exogenous nucleic acid. sequences set forth in FIGS. 1-22. The regulatory region and The exogenous nucleic acid comprises a nucleic acid encod US 7,795,503 B2 10 ing a regulatory protein comprising a polypeptide sequence NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID having 80% or greater sequence identity to a polypeptide NOs:221-236, and the consensus sequences set forth in FIGS. sequence selected from the group consisting of SEQID NOs: 1-22. The nucleic acid is operably linked to a regulatory 2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQID NOs: region that modulates transcription in the family member. 35-42, SEQ ID NOs:44-61, SEQ ID NOs:63-70, SEQ ID Unless otherwise defined, all technical and scientific terms NOs:72-79, SEQID NOs:81-86, SEQID NOs:88-99, SEQ used herein have the same meaning as commonly understood ID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: 124 by one of ordinary skill in the art to which this invention 136, SEQID NOs:138-150, SEQID NOs: 152-156, SEQID pertains. Although methods and materials similar or equiva NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177-188, lent to those described herein can be used to practice the SEQ ID NOS:190-191, SEQ ID NOs: 193-194, SEQ ID 10 invention, suitable methods and materials are described NO: 196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID below. All publications, patent applications, patents, and NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID other references mentioned herein are incorporated by refer NOs:221-236, and the consensus sequences set forth in FIGS. ence in their entirety. In case of conflict, the present specifi 1-22. The nucleic acid is operably linked to a regulatory cation, including definitions, will control. In addition, the region that modulates transcription of the regulatory protein 15 materials, methods, and examples are illustrative only and not in the plant cell. The plant cell further comprises an endog intended to be limiting. enous regulatory region that is associated with the regulatory The details of one or more embodiments of the invention protein. The endogenous regulatory region is operably linked are set forth in the accompanying drawings and the descrip to a sequence of interest comprising a coding sequence for a tion below. Other features, objects, and advantages of the polypeptide involved in alkaloid biosynthesis. The plant cell invention will be apparent from the description and drawings, is capable of producing one or more alkaloids. The plant cell and from the claims. is grown under conditions effective for the expression of the regulatory protein. DESCRIPTION OF THE DRAWINGS In another aspect, a method of producing one or more alkaloids in a plant cell is provided. The method comprises 25 FIG. 1 is an alignment of the amino acid sequence of Lead growing a plant cell comprising an exogenous nucleic acid. cDNAID 23356923 (SEQID NO:2) with homologous and/or The exogenous nucleic acid comprises a nucleic acid encod orthologous amino acid sequences gil51970702 (SEQ ID ing a regulatory protein comprising a polypeptide sequence NO:3), CeresClone:871060 (SEQ ID NO:4), and Ceres having 80% or greater sequence identity to a polypeptide Clone: 1069147 (SEQ ID NO:5). The consensus sequence sequence selected from the group consisting of SEQID NOs: 30 determined by the alignment is set forth. 2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQID NOs: FIG. 2 is an alignment of the amino acid sequence of Lead 35-42, SEQ ID NOs:44-61, SEQ ID NOs:63-70, SEQ ID cDNA ID 23357249 (SEQID NO:7) with homologous and/or NOs:72-79, SEQID NOs:81-86, SEQID NOs:88-99, SEQ orthologous amino acid sequences CeresClone: 1388283 ID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: 124 (SEQ ID NO:8), gil 1778374 (SEQ ID NO:9), gi743.9995 136, SEQID NOs:138-150, SEQID NOs: 152-156, SEQID 35 (SEQID NO:10), gi7489099 (SEQID NO:11), gi34906972 NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177-188, (SEQ ID NO:12), CeresClone:536457 (SEQ ID NO:13), SEQ ID NOS:190-191, SEQ ID NOs: 193-194, SEQ ID CeresClone:744170 (SEQ ID NO:14), CeresClone:579861 NO: 196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID (SEQ ID NO:15), and gi21388662 (SEQ ID NO:16). The NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID consensus sequence determined by the alignment is set forth. NOs:221-236, and the consensus sequences set forth in FIGS. 40 FIG. 3 is an alignment of the amino acid sequence of Lead 1-22. The nucleic acid is operably linked to a regulatory cDNA ID 23358452 (SEQID NO:18) with homologous and/ region that modulates transcription of the regulatory protein or orthologous amino acid sequences CeresClone:873 113 in the plant cell. The plant cell further comprises an exog (SEQ ID NO:19), CeresClone:956.177 (SEQ ID NO:20), enous regulatory region operably linked to a sequence of CeresClone:721511 (SEQ ID NO:21), CeresClone: 641329 interest. The exogenous regulatory region is associated with 45 (SEQ ID NO:22), CeresClone:782784 (SEQ ID NO:23), the regulatory protein, and the exogenous regulatory region gil 18645 (SEQ ID NO:24), gi1052956 (SEQ ID NO:25), comprises a nucleic acid having 80% or greater sequence gi436424 (SEQ ID NO:26), gi2894.109 (SEQ ID NO:27), identity to a regulatory region selected from the group con CeresClone:686294 (SEQID NO:28), gi50726318 (SEQID sisting of SEQ ID NOS:237-252. The sequence of interest NO:29), gi729737 (SEQ ID NO:30), gi|729736 (SEQ ID comprises a coding sequence for a polypeptide involved in 50 NO:31), CeresClone: 1060767 (SEQ ID NO:32), and alkaloid biosynthesis. The plant cell is grown under condi gi7446231 (SEQID NO:33). The consensus sequence deter tions effective for the expression of the regulatory protein. mined by the alignment is set forth. In another aspect, a method of modulating an amount of FIG. 4 is an alignment of the amino acid sequence of Lead one or more alkaloid compounds in a Papaveraceae family cDNA ID 233601 14 (SEQID NO:35) with homologous and/ member is provided. The method comprises transforming a 55 or orthologous amino acid sequences CeresClone: 1382382 member of the Papaveraceae family with a recombinant (SEQ ID NO:36), CeresClone: 1561543 (SEQ ID NO:37), nucleic acid construct. The nucleic acid construct comprises gi51964362 (SEQID NO:38), CeresClone:557 109 (SEQID a nucleic acid encoding a regulatory protein comprising a NO:39), gi50912679 (SEQ ID NO:40), gi51535177 (SEQ polypeptide sequence selected from the group consisting of ID NO:41), and CeresClone:888753 (SEQ ID NO:42). The SEQ ID NOS:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, 60 consensus sequence determined by the alignment is set forth. SEQID NOS:35-42, SEQ ID NOs:44-61, SEQ ID NOs:63 FIG. 5 is an alignment of the amino acid sequence of Lead 70, SEQID NOs:72-79, SEQID NOs:81-86, SEQID NOs: cDNA ID 23366941 (SEQID NO:44) with homologous and/ 88-99, SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID or orthologous amino acid sequences gil 12324817 (SEQ ID NOs: 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, NO:45), gi55584076 (SEQID NO:46), CeresClone:303971 SEQID NOs: 158-167, SEQID NOs: 169-175, SEQID NOs: 65 (SEQ ID NO:47), gil 16516825 (SEQ ID NO:50), Ceres 177-188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQ Clone: 1000657 (SEQ ID NO:52), gil 16516823 (SEQ ID ID NO:196, SEQID NO: 198, SEQID NOs:200-204, SEQID NO:53), gi2982285 (SEQ ID NO:54), CeresClone:963426 US 7,795,503 B2 11 12 (SEQ ID NO:55), CeresClone:682557 (SEQ ID NO:56), 555364 (SEQ ID NO:133), CeresClone: 94.4101 (SEQ ID gi59042581 (SEQID NO:58), CeresClone:602368 (SEQID NO:134), CeresClone:569593 (SEQ ID NO:135), and NO:59), and CeresClone:1114184 (SEQ ID NO:61). The gi50927517 (SEQ ID NO:136). The consensus sequence consensus sequence determined by the alignment is set forth. determined by the alignment is set forth. FIG. 6 is an alignment of the amino acid sequence of Lead FIG. 13 is an alignment of the amino acid sequence of Lead cDNA ID 23371050 (SEQID NO:63) with homologous and/ cDNA ID 23416527 (SEQ ID NO: 138) with homologous or orthologous amino acid sequences CeresClone: 962327 and/or orthologous amino acid sequences gill 4140141 (SEQ (SEQ ID NO:64), CeresClone:1101577 (SEQ ID NO:65), ID NO:139), gi7385636 (SEQ ID NO:141), gi50927517 CeresClone:634261 (SEQID NO:66), gi5031281 (SEQ ID (SEQ ID NO: 142), gi32401273 (SEQ ID NO:143), NO:67), gi35187687 (SEQID NO:68), gi34978689 (SEQ 10 gi3342211 (SEQID NO:144), CeresClone:605218 (SEQID ID NO:69), and gi34909836 (SEQID NO:70). The consen NO:145), gi57012759 (SEQ ID NO:146), gi57012876 Sus sequence determined by the alignment is set forth. (SEQ ID NO:147), and CeresClone:569593 (SEQ ID FIG. 7 is an alignment of the amino acid sequence of Lead NO:149). The consensus sequence determined by the align cDNA ID 23383878 (SEQID NO:72) with homologous and/ ment is set forth. or orthologous amino acid sequences CeresClone:94850 15 FIG. 14 is an alignment of the amino acid sequence of Lead (SEQ ID NO:73), gi21689807 (SEQ ID NO:74), cDNA ID 23419038 (SEQ ID NO:152) with homologous gil 18391322 (SEQID NO:75), CeresClone: 17426 (SEQ ID and/or orthologous amino acid sequences CeresClone: NO:76), CeresClone: 11593 (SEQ ID NO:77), CeresClone: 473902 (SEQ ID NO:153), Ceres.Cione: 1469452 (SEQ ID 1087844 (SEQID NO:78), and CeresClone:963628 (SEQID NO:154), gi41351817(SEQ ID NO:155), and gi33324520 NO: 79). The consensus sequence determined by the align (SEQ ID NO:156). The consensus sequence determined by ment is set forth. the alignment is set forth. FIG. 8 is an alignment of the amino acid sequence of Lead FIG.15 is an alignment of the amino acid sequence of Lead cDNA ID 23385144 (SEQID NO:81) with homologous and/ cDNA ID 23427553 (SEQ ID NO:158) with homologous or orthologous amino acid sequences CeresClone:473126 and/or orthologous amino acid sequences CeresClone: (SEQID NO:82), gi5428.7494 (SEQID NO:83), and Ceres 25 95.6457 (SEQ ID NO:159), CeresClone: 1172789 (SEQ ID Clone:238614 (SEQ ID NO:84). The consensus sequence NO:160), CeresClone:480785 (SEQ ID NO:161), Ceres determined by the alignment is set forth. Clone:85.9154 (SEQID NO:162), CeresClone:407007 (SEQ FIG. 9 is an alignment of the amino acid sequence of Lead ID NO:163), gi 13936312 (SEQ ID NO:164), CeresClone: cDNA ID 23385649 (SEQID NO:88) with homologous and/ 283597 (SEQ ID NO:165), CeresClone:443626 (SEQ ID or orthologous amino acid sequences CeresClone:474636 30 NO:166), and gil 13936314 (SEQ ID NO:167). The consen (SEQ ID NO:89), CeresClone: 1057375 (SEQ ID NO:90), Sus sequence determined by the alignment is set forth. CeresClone:1027534 (SEQID NO:91), gil 1632831 (SEQID FIG.16 is an alignment of the amino acid sequence of Lead NO:92), gi5669634 (SEQID NO:93), gi8895787 (SEQ ID cDNA ID 23472397 (SEQ ID NO:169) with homologous NO:94), CeresClone:638899 (SEQID NO:95), CeresClone: and/or orthologous amino acid sequences CeresClone: 348434 (SEQ ID NO:96), CeresClone: 1607224 (SEQ ID 35 554743 (SEQ ID NO:170), CeresClone: 1623097 (SEQ ID NO:97), gi50725389 (SEQ ID NO: 98), and gil 19225065 NO:171), gi3341468 (SEQ ID NO:172), CeresClone: (SEQID NO:99). The consensus sequence determined by the 1120474 (SEQ ID NO:173), CeresClone:729860 (SEQ ID alignment is set forth. NO:174), and gi37051131 (SEQID NO:175). The consen FIG.10 is an alignment of the amino acid sequence of Lead Sus sequence determined by the alignment is set forth. 532H5 (cDNA ID 23387851; SEQID NO:101) with homolo 40 FIG.17 is an alignment of the amino acid sequence of Lead gous and/or orthologous amino acid sequences gil30253268 cDNA ID 23522373 (5110H5; SEQ ID NO:177) with (SEQ ID NO:102), gi45826359 (SEQ ID NO:103), homologous and/or orthologous amino acid sequences gi45826360 (SEQ ID NO:104), gi37993864 (SEQ ID gi3608135 (SEQ ID NO:178), gi3336903 (SEQ ID NO: 105), CeresClone:707775 (SEQ ID NO:106), NO:180), CeresClone:545441 (SEQ ID NO:181), gi38257023 (SEQ ID NO:107), gi371.47896 (SEQ ID 45 gi5381313 (SEQ ID NO:182), gi3336906 (SEQ ID NO: 108), gi41351817 (SEQ ID NO:109), gi55824656 NO:183), gi13775109 (SEQID NO:184), gi435942 (SEQ (SEQ ID NO: 110), giG6269671 (SEQ ID NO:111), ID NO:185), and CeresClone:287677 (SEQID NO:188). The gi33638.194 (SEQ ID NO:112), and gi21908034 (SEQ ID consensus sequence determined by the alignment is set forth. NO:113). The consensus sequence determined by the align FIG. 18 is an alignment of the amino acid sequence of Lead ment is set forth. 50 cDNA ID 23655935 (5110C8: SEQ ID NO: 190) with FIG.11 is an alignment of the amino acid sequence of Lead homologous and/or orthologous amino acid sequence cDNA ID 23387900 (SEQ ID NO:115) with homologous gi50928937 (SEQ ID NO:191). The consensus sequence and/or orthologous amino acid sequences CeresClone: determined by the alignment is set forth. 118184 (SEQ ID NO: 116), CeresClone: 118878 (SEQ ID FIG. 19 is an alignment of the amino acid sequence of Lead NO:117), CeresClone:3929 (SEQID NO:118), CeresClone: 55 cDNA ID 24365511 (511OE8: SEQ ID NO.193) with 12459 (SEQ ID NO:119), CeresClone: 1354021 (SEQ ID homologous and/or orthologous amino acid sequence NO:120), gi30017217 (SEQID NO:121), and CeresClone: gi52076911 (SEQ ID NO:194). The consensus sequence 109026 (SEQ ID NO:122). The consensus sequence deter determined by the alignment is set forth. mined by the alignment is set forth. FIG.20 is an alignment of the amino acid sequence of Lead FIG. 12 is an alignment of the amino acid sequence of Lead 60 cDNA ID 23377122 (SEQ ID NO:200) with homologous cDNA ID 23401690 (SEQ ID NO:124) with homologous and/or orthologous amino acid sequences CeresClone: and/or orthologous amino acid sequences CeresClone: 467905 (SEQ ID NO:201), gi50907599 (SEQ ID NO:202), 605218 (SEQ ID NO:125), gi57012759 (SEQ ID NO.126), CeresClone:826195 (SEQ ID NO:203), and CeresClone: CeresClone:6397 (SEQ ID NO.127), CeresClone:282666 450772 (SEQ ID NO:204). The consensus sequence deter (SEQ ID NO:128), gi32401273 (SEQ ID NO:129), Ceres 65 mined by the alignment is set forth. Clone:592713 (SEQ ID NO:130), gi3342211 (SEQ ID FIG.21 is an alignment of the amino acid sequence of Lead NO:131), gi57012876 (SEQ ID NO:132), CeresClone: cDNA ID 23388445 (SEQ ID NO:206) with homologous US 7,795,503 B2 13 14 and/or orthologous amino acid sequences CeresClone: compounds, under a desired environmental condition or in a 538877 (SEQ ID NO:212), gi50907243 (SEQ ID NO:213), desired plant developmental pathway. For example, the use of CeresClone:260992 (SEQ ID NO:214), and CeresClone: recombinant regulatory proteins in plants, such as Papaver 634320 (SEQ ID NO:215). The consensus sequence deter aceae plants, that are capable of producing one or more alka mined by the alignment is set forth. loids, can permit selective modulation of the amount of Such FIG.22 is an alignment of the amino acid sequence of Lead compounds in Such plants. cDNA ID 23704869 (SEQ ID NO:221) with homologous and/or orthologous amino acid sequences CeresClone: Polypeptides 295738 (SEQ ID NO:224), gi7489532 (SEQ ID NO:227), The term “polypeptide' as used herein refers to a com gil 1076760 (SEQID NO:233), gi463212 (SEQID NO:235), 10 pound of two or more Subunit amino acids, amino acid ana and gil21435101 (SEQID NO:236). The consensus sequence logs, or other peptidomimetics, regardless of post-transla determined by the alignment is set forth. tional modification, e.g., phosphorylation or glycosylation. The subunits may be linked by peptide bonds or other bonds DETAILED DESCRIPTION such as, for example, ester or ether bonds. The term “amino 15 acid refers to natural and/or unnatural or synthetic amino Although a number of plant transcription factors are acids, including D/L optical isomers. Full-length proteins, known, as well as certain promoters and promoter motifs to analogs, mutants, and fragments thereofare encompassed by which these transcription factors bind, there has been consid this definition. erable uncertainty regarding the full range of promoters and The term "isolated with respect to a polypeptide refers to promoter motifs that are recognized by a particular transcrip a polypeptide that has been separated from cellular compo tion factor. Applicants have discovered novel methods of nents that naturally accompany it. Typically, the polypeptide screening for combinations, or associations, of regulatory is isolated when it is at least 60%, e.g., 70%, 80%, 90%,95%, proteins and regulatory regions. These discoveries can be or 99%, by weight, free from proteins and naturally occurring used to create plant cells and plants containing 1) a nucleic organic molecules that are naturally associated with it. In acid encoding a regulatory protein, and/or 2) a nucleic acid 25 general, an isolated polypeptide will yield a single major band including a regulatory region associated with a given regula on a reducing and/or non-reducing polyacrylamide gel. Iso tory protein, e.g., to modulate expression of a sequence of lated polypeptides can be obtained, for example, by extrac interest operably linked to the regulatory region. tion from a natural source (e.g., plant tissue), chemical Syn Thus, in one aspect, the invention features a method for thesis, or by recombinant production in a host plant cell. To identifying a regulatory protein capable of activating a regu 30 recombinantly produce a polypeptide, a nucleic acid latory region. The method involves screening for the ability of sequence containing a nucleotide sequence encoding a the regulatory protein to modulate expression of a reporter polypeptide of interest can be ligated into an expression vec that is operably linked to the regulatory region. The ability of tor and used to transform a bacterial, eukaryotic, or plant host the regulatory protein to modulate expression of the reporter cell, e.g., insect, yeast, mammalian, or plant cells. is determined by monitoring reporter activity. 35 Polypeptides described herein include regulatory proteins. A regulatory protein and a regulatory region are considered Such a regulatory protein typically is effective for modulating to be “associated when the regulatory protein is capable of transcription of a coding sequence, e.g., an endogenous regu modulating expression, either directly or indirectly, of a latory protein, Such as an endogenous transcription factor, nucleic acid operably linked to the regulatory region. For involved in alkaloid biosynthesis pathways; other endog example, a regulatory protein and a regulatory region can be 40 enous auxiliary proteins involved in transcription, e.g., of said to be associated when the regulatory protein directly polypeptides involved in alkaloid biosynthesis pathways; or binds to the regulatory region, as in a transcription factor an endogenous enzyme involved in alkaloid biosynthesis. promoter complex. In other cases, a regulatory protein and Modulation of transcription of a coding sequence can be regulatory region can be said to be associated when the regu either an increase or a decrease in transcription of the coding latory protein does not directly bind to the regulatory region. 45 sequence relative to the average rate or level of transcription A regulatory protein and a regulatory region can also be said of the coding sequence in a control plant. to be associated when the regulatory protein indirectly affects A regulatory protein can contain an AP2 domain charac transcription by being a component of a protein complex teristic of polypeptides belonging to the AP2/EREBP family involved in transcriptional regulation or by noncovalently of plant transcription factor polypeptides. AP2 (APETALA2) binding to a protein complex involved in transcriptional regu 50 and EREBPs (ethylene-responsive element binding proteins) lation. In some cases, a regulatory protein and regulatory are prototypic members of a family of transcription factors region can be said to be associated and indirectly affect tran unique to plants, whose distinguishing characteristic is that Scription when the regulatory protein participates in or is a they contain the so-called AP2 DNA-binding domain. AP2/ component of a signal transduction cascade or a proteasome EREBP genes form a large multigene family encoding degradation pathway, e.g., of repressors, that results in tran 55 polypeptides that play a variety of roles throughout the plant Scriptional amplification or repression. In some cases, regu life cycle: from being key regulators of several developmental latory proteins associate with regulatory regions and indi processes, such as floral organ identity determination and rectly affect transcription by, e.g., binding to methylated control of leaf epidermal cell identity, to forming part of the DNA, unwinding chromatin, binding to RNA, or modulating mechanisms used by plants to respond to various types of splicing. 60 biotic and environmental stress. SEQ ID NO:2, SEQ ID A regulatory protein and its associated regulatory region NO:101, SEQ ID NO.124, SEQ ID NO:138, SEQ ID can be used to selectively modulate expression of a sequence NO:152, and SEQ ID NO:217 set forth the amino acid of interest, when Such a sequence is operably linked to the sequences of DNA clones, identified herein as cDNA ID regulatory region. In addition, the use of Such regulatory 23356923 (SEQ ID NO:1), cDNA ID 23387851 (SEQ ID protein-regulatory region associations in plants can permit 65 NO:100), cDNA ID 23401690 (SEQID NO:123), cDNA ID selective modulation of the amount or rate of biosynthesis of 23416527 (SEQ ID NO:137), cDNA ID 23419038 (SEQ ID plant polypeptides and plant compounds, such as alkaloid NO:151), and cDNA ID 23395214 (SEQ ID NO:216), US 7,795,503 B2 15 16 respectively, that are predicted to encode AP2 domain-con (SEQ ID NO:154), gi41351817 (SEQ ID NO:155), and taining transcription factor polypeptides. gi33324520 (SEQID NO:156). A regulatory protein can comprise the amino acid sequence In some cases, a regulatory protein can include a polypep set forth in SEQID NO:2, SEQID NO:101, SEQID NO:124, tide having at least 80% sequence identity, e.g., 80%, 85%, SEQ ID NO: 138, SEQ ID NO:152, or SEQ ID NO:217. 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an Alternatively, a regulatory protein can be a homolog, amino acid sequence corresponding to any of SEQID NOs: ortholog, or variant of the polypeptide having the amino acid 3-5, SEQID NOs: 102-113, SEQ ID NOs:125-136, SEQ ID sequence set forthin SEQID NO:2, SEQID NO: 101, SEQID NOs: 139-150, SEQ ID NOs: 153-156, or the consensus NO:124, SEQ ID NO: 138, SEQ ID NO:152, or SEQ ID sequences set forth in FIG. 1, FIG. 10, FIG. 12, FIG. 13, or NO:217. For example, a regulatory protein can have an amino 10 FIG 14. acid sequence with at least 30% sequence identity, e.g., 30%, A regulatory protein can have one or more domains char 35%, 41%, 45%, 47%, 48%, 49%, 50%, 51%, 52%, 56%, acteristic of a Zinc finger transcription factor polypeptide. For 57%, 60%, 61%, 62%, 63%, 64%. 65%, 67%, 68%, 70%, example, a regulatory protein can contain a Zf-C3HC4 75%, 80%, 85%, 90%. 95%, 97%, 98%, or 99% sequence domain characteristic of a C3HC4 type (RING finger) zinc identity, to the amino acid sequence set forth in SEQID NO:2. 15 finger polypeptide. The RING finger is a specialized type of SEQID NO:101, SEQID NO.124, SEQID NO:138, SEQID zinc-finger of 40 to 60 residues that binds two atoms of zinc NO:152, or SEQID NO:217. and is reported to be involved in mediating protein-protein Amino acid sequences of homologs and/or orthologs of the interactions. There are two different variants, the C3HC4 polypeptide having the amino acid sequence set forth in SEQ type and a C3H2C3-type, which are related despite the dif ID NO:2, SEQ ID NO:101, SEQ ID NO.124, SEQ ID ferent cysteine/histidine pattern. The RING domain has been NO: 138, or SEQID NO:152 are provided in FIG. 1, FIG.10, implicated in diverse biological processes. Ubiquitin-protein FIG. 12, FIG. 13, or FIG. 14, respectively. Each of FIG. 1, ligases (E3s), which determine the substrate specificity for FIG. 10, FIG. 12, FIG. 13, and FIG. 14 also includes a con ubiquitylation, have been classified into HECT and RING sensus amino acid sequence determined by aligning homolo finger families. Various RING fingers exhibit binding to E2 gous and/or orthologous amino acid sequences with the 25 ubiquitin-conjugating enzymes. SEQID NO:35 sets forth the amino acid sequence set forth in SEQ ID NO:2, SEQ ID amino acid sequence of a DNA clone, identified herein as NO: 101, SEQ ID NO.124, SEQ ID NO:138, or SEQ ID cDNA ID 233601 14 (SEQ ID NO:34), that is predicted to NO:152, respectively. encode a C3HC4 type (RING finger) zinc-finger polypeptide. For example, the alignment in FIG. 1 provides the amino In some cases, a regulatory protein can containa Zf-C3HC4 30 domain and a Zf-CCCH domain characteristic of C-X8-C-X5 acid sequences of cDNA ID 23356923 (SEQ ID NO:2), C-X3-H type (and similar) Zinc finger transcription factor gi51970702 (SEQID NO:3), CeresClone:871060 (SEQ ID polypeptides. Polypeptides containing zinc finger domains of NO:4), and CeresClone: 1069147 (SEQID NO:5). the C-x8-C-x5-C-X3-H type include zinc finger polypeptides The alignment in FIG. 10 provides the amino acid from eukaryotes involved in cell cycle or growth phase-re sequences of 532H5 (cDNA ID 23387851; SEQID NO:101), 35 lated regulation, e.g. human TIS11B (butyrate response fac gi50253268 (SEQ ID NO:102), gi45826359 (SEQ ID tor 1), a predicted regulatory protein involved in regulating NO:103), gi45826360 (SEQ ID NO:104), gi37993864 the response to growth factors. Another protein containing (SEQ ID NO: 105), CeresClone:707775 (SEQ ID NO:106), this domain is the human splicing factor U2AF 35 kD subunit, gi38257023 (SEQ ID NO:107), gi371.47896 (SEQ ID which plays a role in both constitutive and enhancer-depen NO: 108), gi41351817 (SEQ ID NO:109), gi55824656 40 dent splicing by mediating essential protein-protein interac (SEQ ID NO: 110), giG6269671 (SEQ ID NO:111), tions and protein-RNA interactions required for 3' splice site gi33638.194 (SEQ ID NO:112), and gi21908034 (SEQ ID selection. It has been shown that different CCCH zinc finger NO:113). proteins interact with the 3' untranslated regions of various The alignment in FIG. 12 provides the amino acid mRNAs. SEQID NO:206 sets forth the amino acid sequence sequences of cDNA ID 23401690 (SEQID NO:124), Ceres 45 of a DNA clone, identified herein as cDNA ID 23388445 Clone:605218 (SEQ ID NO:125), gi57012759 (SEQ ID (SEQ ID NO:205), that is predicted to encode a zinc finger NO:126), CeresClone:6397 (SEQID NO:127), CeresClone: transcription factor polypeptide having a zf-CCCH and a 282666 (SEQID NO:128), gi32401273 (SEQ ID NO:129), Zf-C3HC4 domain. CeresClone:592713 (SEQID NO: 130), gi3342211 (SEQID In some cases, a regulatory protein can contain a Zf-CCCH NO:131), gi57012876 (SEQ ID NO:132), CeresClone: 50 domain and a YTH domain characteristic of a YT521-B-like 555364 (SEQ ID NO:133), CeresClone: 94.4101 (SEQ ID family polypeptide. The YT521-B-like family contains NO:134), CeresClone:569593 (SEQ ID NO:135), and YT521-B, a putative splicing factor from rat. YT521-B is a gi50927517 (SEQID NO:136). tyrosine-phosphorylated nuclear protein that interacts with The alignment in FIG. 13 provides the amino acid the nuclear transcriptosomal component scaffold attachment sequences of cDNA ID 23416527 (SEQ ID NO:138), 55 factor B and the 68 kDa Src substrate associated during mito gil 14140141 (SEQ ID NO:139), gil 17385636 (SEQ ID sis, Samó8. In vivo splicing assays have reportedly demon NO:141), gi50927517 (SEQ ID NO: 142), gi32401273 strated that YT521-B can modulate alternative splice site (SEQ ID NO:143), gi3342211 (SEQ ID NO:144), Ceres selection in a concentration-dependent manner. The domain Clone:605218 (SEQ ID NO:145), gi57012759 (SEQ ID is predicted to have four alpha helices and six beta strands. NO:146), gil570 12876 (SEQID NO:147), and CeresClone: 60 SEQIDNO:193 sets forth the amino acid sequence of a DNA 56.9593 (SEQID NO:149). Other homologs and/or orthologs clone, identified herein as cDNA ID 24365511 (SEQ ID of SEQID NO: 138 include gi56567585 (SEQID NO:140), NO192), that is predicted to encode a polypeptide having a CeresClone:398626 (SEQ ID NO:148), and CeresClone: Zf-CCCH domain and a YTH domain. 555364 (SEQID NO:150). In some cases, a regulatory protein can contain a Zf-Dof The alignment in FIG. 14 provides the amino acid 65 domain characteristic of a Dof domain Zinc finger transcrip sequences of cDNA ID 23419038 (SEQID NO:152), Ceres tion factor polypeptide. Dof (DNA-binding with one finger) Clone:473902 (SEQ ID NO:153), CeresClone: 1469452 domain polypeptides are plant-specific transcription factor US 7,795,503 B2 17 18 polypeptides with a highly conserved DNA-binding domain. Amino acid sequences of homologs and/or orthologs of the A Dof domain is a zinc finger DNA-binding domain that polypeptide having the amino acid sequence set forth in SEQ shows resemblance to the CyS2 Zinc finger, although it has a ID NO:35, SEQ ID NO:206, SEQ ID NO.193, SEQ ID longer putative loop where an extra Cys residue is conserved. NO:169, SEQID NO:158, and SEQID NO:63 are provided AOBP, a DNA-binding protein in pumpkin (Cucurbita in FIG. 4, FIG. 21, FIG. 19, FIG. 16, FIG. 15, and FIG. 6, maxima), contains a 52 amino acid Dof domain, which is respectively. Each of FIG.4, FIG. 21, FIG. 19, FIG. 16, FIG. highly conserved in several DNA-binding proteins of higher 15, and FIG. 6 also includes a consensus amino acid sequence plants. SEQID NO: 169 sets forth the amino acid sequence of determined by aligning homologous and/or orthologous a DNA clone, identified herein as cDNA ID 23472397 (SEQ amino acid sequences with the amino acid sequence set forth ID NO:168), that is predicted to encode a Dof domain zinc 10 in SEQID NO:35, SEQID NO:206, SEQID NO.193, SEQ ID NO:169, SEQ ID NO:158, or SEQ ID NO:63, respec finger transcription factor polypeptide. tively. In some cases, a regulatory protein can contain a Zf-CW For example, the alignment in FIG. 4 provides the amino domain characteristic of a CW-type Zinc finger transcription acid sequences of cDNA ID 233601 14 (SEQ ID NO:35), factor polypeptide. The Zf-CW domain is predicted to be a 15 CeresClone: 1382382 (SEQ ID NO:36), CeresClone: highly specialized mononuclear four-cysteine Zinc finger that 1561543 (SEQ ID NO:37), gi51964362 (SEQ ID NO:38), plays a role in DNA binding and/or promoting protein-protein CeresClone:557 109 (SEQID NO:39), gi50912679 (SEQID interactions in complicated eukaryotic processes including NO:40), gi51535177 (SEQ ID NO:41), and CeresClone: chromatin methylation status and early embryonic develop 888753 (SEQID NO:42). ment. The Zf-CW domain is found exclusively in vertebrates, The alignment in FIG. 21 provides the amino acid Vertebrate-infecting parasites and higher plants. A regulatory sequences of cDNA ID 23388445 (SEQ ID NO:206), Ceres protein having a Zf-CW domain can also have a methyl-CpG Clone:538877 (SEQ ID NO:212), gi50907243 (SEQ ID binding domain (MBD). Regulatory proteins with a methyl NO:213), CeresClone:260992 (SEQID NO:214), and Ceres binding domain, in association with other proteins, have pref Clone:634320 (SEQ ID NO:215). Other homologs and/or erential binding affinity to methylated DNA, which results in 25 orthologs of SEQID NO:206 include gi21618279 (SEQ ID changes in chromatin structure leading to transcriptional acti NO:207), CeresClone:3542 (SEQID NO:208), CeresClone: vation or transcriptional repression of affected genes. SEQID 29363 (SEQ ID NO:209), gi23198042 (SEQ ID NO:210), NO:158 and SEQ ID NO:219 set forth the amino acid and CeresClone: 1104497 (SEQID NO:211). sequences of DNA clones, referred to herein as cDNA ID The alignment in FIG. 19 provides the amino acid 23427553 (SEQID NO:157) and cDNA ID 23447935 (SEQ 30 sequences of cDNA ID 24365511 (5110E8: SEQID NO.193) ID NO:218), respectively, that are predicted to encode and gi52076911 (SEQID NO: 194). polypeptides containing Zf-CW and methyl-CpG binding The alignment in FIG. 16 provides the amino acid domains. sequences of cDNA ID 23472397 (SEQ ID NO:169), Ceres In some cases, a regulatory protein can contain a Zf-AN1 Clone:554743 (SEQ ID NO:170), CeresClone: 1623097 domain characteristic of an AN1-like Zinc finger transcription 35 (SEQ ID NO: 171), gi334.1468 (SEQ ID NO:172), Ceres factor polypeptide. The ZfAN1 domain was first identified as Clone: 1120474 (SEQ ID NO:173), CeresClone:729860 a Zinc finger at the C-terminus of An 1, a ubiquitin-like protein (SEQID NO:174), and gi37051131 (SEQID NO:175). in Xenopus laevis. The following pattern describes the Zinc The alignment in FIG. 15 provides the amino acid finger: C-X2-C-X(9-12)-C-X(11-2)-C-X4-C-X2-H-X5-H- sequences of cDNA ID 23427553 (SEQ ID NO:158), Ceres X-C, where X can be any amino acid, and the numbers in 40 Clone: 95.6457 (SEQ ID NO:159), CeresClone: 1172789 brackets indicate the number of residues. A ZfAN1 domain (SEQ ID NO:160), CeresClone:480785 (SEQ ID NO:161), has been identified in a number of as yet uncharacterized CeresClone:859154 (SEQID NO:162), CeresClone:407007 proteins from various sources. A regulatory protein having a (SEQ ID NO:163), gill 3936312 (SEQID NO:164), Ceres ZfAN1 domain can also have a ZfA20 domain. A20 (an Clone:283597 (SEQID NO:165), CeresClone:443626(SEQ inhibitor of cell death)-like zinc fingers are believed to medi 45 ID NO:166), and gil 13936314 (SEQID NO:167). ate self-association in A20. These fingers also mediate IL-1- The alignment in FIG. 6 provides the amino acid sequences induced NF-kappa Bactivation. SEQID NO:63 sets forth the of cDNA ID 23371050 (SEQ ID NO:63), CeresClone: amino acid sequence of a DNA clone, referred to herein as 962327 (SEQ ID NO:64), CeresClone:1101577 (SEQ ID cDNA ID 23371050 (SEQ ID NO:62) that is predicted to NO:65), CeresClone:634261 (SEQ ID NO:66), gi5031281 encode a Zinc finger transcription factor polypeptide having a 50 (SEQ ID NO:67), gi35187687 (SEQ ID NO:68), ZfAN1 domain and a ZfA20 domain. gi34978689 (SEQ ID NO:69), and gi34909836 (SEQ ID A regulatory protein can comprise the amino acid sequence NO:70). set forth in SEQ ID NO:35, SEQ ID NO:206, SEQ ID In some cases, a regulatory protein can include a polypep NO:193, SEQ ID NO:169, SEQ ID NO:158, SEQ ID tide having at least 80% sequence identity, e.g., 80%, 85%, NO:219, or SEQID NO:63. Alternatively, a regulatory pro 55 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an tein can be a homolog, ortholog, or variant of the polypeptide amino acid sequence corresponding to any of SEQID NOs: having the amino acid sequence set forth in SEQID NO:35, 36-42, SEQ ID NOs:207-215, SEQ ID NO:194, SEQ ID SEQID NO:206, SEQID NO.193, SEQID NO:169, SEQID NOs: 170-175, SEQID NOs: 159-167, SEQID NOs:64–70, or NO:158, SEQID NO:219, or SEQID NO:63. For example, a the consensus sequence set forth in FIG. 4, FIG. 21, FIG. 19. regulatory protein can have an amino acid sequence with at 60 FIG. 16, FIG. 15, or FIG. 6. least 30% sequence identity, e.g., 31%, 35%, 40%, 45%, A regulatory protein can contain a myb-like DNA-binding 47%, 48%, 49%, 50%, 51%, 52%, 56%, 57%, 60%, 61%, domain characteristic of myb-like transcription factor 62%, 63%, 64%, 65%, 67%, 68%, 70%, 75%, 80%, 85%, polypeptides. The retroviral oncogene V-myb and its cellular 90%.95%,97%.98%, or 99% sequence identity, to the amino counterpart c-myb encode nuclear DNA-binding proteins. acid sequence set forth in SEQID NO:35, SEQID NO:206, 65 These proteins belong to the SANT domain family that spe SEQID NO.193, SEQID NO: 169, SEQID NO:158, SEQID cifically recognize the sequence YAAC(G/T)G. In my b, one NO:219, or SEQID NO:63. of the most conserved regions consisting of three tandem US 7,795,503 B2 19 20 repeats has been shown to be involved in DNA-binding. SEQ 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, ID NO:198 and SEQ ID NO:200 set forth the amino acid to the amino acid sequence set forth in SEQID NO: 196. sequences of DNA clones, identified herein as cDNA ID A regulatory protein can have one or more domains char 23462512 (SEQID NO:197) and cDNA ID 23377122 (SEQ acteristic of a basic-leucine Zipper (bZIP) transcription factor ID NO:199), respectively, that are predicted to encode myb polypeptide. For example, a regulatory protein can have a like transcription factor polypeptides. bZIP 2 domain characteristic of a basic-leucine Zipper A regulatory protein can comprise the amino acid sequence (bZIP) transcription factor polypeptide. The basic-leucine set forth in SEQ ID NO:198 or SEQ ID NO:200. Alterna Zipper (bZIP) transcription factor polypeptides of eukaryotes tively, a regulatory protein can be a homolog, ortholog, or contain a basic region mediating sequence-specific DNA variant of the polypeptide having the amino acid sequence set 10 binding and a leucine Zipper region that is required for dimer forth in SEQID NO:198 or SEQID NO:200. For example, a ization. SEQID NO:221 sets forth the amino acid sequence of regulatory protein can have an amino acid sequence with at a DNA clone, identified herein as cDNA ID 23704869 (SEQ least 35% sequence identity, e.g., 36%, 39%, 41%, 45%, ID NO:220), that is predicted to encode a basic-leucine Zipper 47%, 48%, 49%, 50%, 51%, 52%, 56%, 57%, 60%, 61%, (bZIP) transcription factor polypeptide. 62%, 63%, 64%, 65%, 67%, 68%, 70%, 75%, 80%, 85%, 15 In some cases, a regulatory protein can have a bZIP Maf 90%.95%,97%.98%, or 99% sequence identity, to the amino domain and an MFMR domain, both of which are character acid sequence set forth in SEQ ID NO:198 or SEQ ID istic of basic region leucine Zipper (bZIP) domain-containing NO:2OO. transcription factor polypeptides. The Maf family of basic Amino acid sequences of homologs and/or orthologs of the region leucine Zipper (bZIP) domain-containing transcription polypeptide having the amino acid sequence set forth in SEQ factor polypeptides may be related to bZIP 1. An MFMR ID NO:200 are provided in FIG. 20. FIG. 20 also includes a region is found in the N-terminus of the bZIP 1 transcription consensus amino acid sequence determined by aligning factor domain. It is between 150 and 200 amino acids in homologous and/or orthologous amino acid sequences with length. The N-terminal half is rich in proline residues and has the amino acid sequence set forth in SEQID NO:200. been termed the PRD (proline rich domain). The C-terminal For example, the alignment in FIG. 20 provides the amino 25 half is more polar and has been called the MFMR (multifunc acid sequences of cDNA ID 23377122 (SEQ ID NO:200), tional mosaic region). It has been suggested that this family is CeresClone:467905 (SEQ ID NO:201), gi50907599 (SEQ composed of three sub-families called A, B and C, classified ID NO:202), CeresClone:826195 (SEQ ID NO:203), and according to motif composition, and that some of these motifs CeresClone:450772 (SEQID NO:204). may be involved in mediating protein-protein interactions. 30 SEQIDNO:177 sets forth the amino acid sequence of a DNA In some cases, a regulatory protein can include a polypep clone, identified herein as cDNA ID 23522373 (SEQ ID tide having at least 80% sequence identity, e.g., 80%, 85%, NO:176), that is predicted to encode a transcription factor 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an polypeptide having a bZIP Maf domain and an MFMR amino acid sequence corresponding to SEQID NO:201, SEQ domain. ID NO:202, SEQ ID NO:203, SEQ ID NO:204, or the con 35 A regulatory protein can comprise the amino acid sequence sensus sequence set forth in FIG. 20. set forth in SEQ ID NO:221 or SEQ ID NO:177. Alterna A regulatory protein can contain a WRKY DNA-binding tively, a regulatory protein can be a homolog, ortholog, or domain characteristic of a WRKY plant transcription factor variant of the polypeptide having the amino acid sequence set polypeptide. The WRKY domain is a 60 amino acid region forth in SEQID NO:221 or SEQID NO:177. For example, a that is defined by the conserved amino acid sequence 40 regulatory protein can have an amino acid sequence with at WRKYGQKat its N-terminal end, together with a novel Zinc least 40% sequence identity, e.g., 41%, 45%, 47%, 48%, finger-like motif. The WRKY domain is found in one or two 49%, 50%, 51%, 52%, 56%, 57%, 60%, 61%, 62%, 63%, copies in a Superfamily of plant transcription factors involved 64%, 65%, 67%, 68%, 70%, 75%, 80%, 85%, 90%, 95%, in the regulation of various physiological programs that are 97%, 98%, or 99% sequence identity, to the amino acid unique to plants, including pathogen defense, Senescence, 45 sequence set forth in SEQID NO:221 or SEQID NO:177. trichome development and the biosynthesis of secondary Amino acid sequences of homologs and/or orthologs of the metabolites. The WRKY domain binds specifically to the polypeptide having the amino acid sequence set forth in SEQ DNA sequence motif (T)(T)TGAC(C/T), which is known as ID NO:221 and SEQID NO:177 are provided in FIG.22 and the W box. The invariant TGAC core of the W box is essential FIG. 17, respectively. Each of FIG. 22 and FIG. 17 also for function and WRKY binding. Some proteins known to 50 includes a consensus amino acid sequence determined by containa WRKY domain include Arabidopsis thaliana ZAP1 aligning homologous and/or orthologous amino acid (zinc-dependent activator protein-1); AtWRKY44/TTG2, a sequences with the amino acid sequence set forth in SEQID protein involved in trichome development and anthocyanin NO:221 or SEQID NO:177. pigmentation; and wild oat ABF1-2, two proteins involved in For example, the alignment in FIG.22 provides the amino the gibberellic acid-induced expression of the alpha-Amy2 55 acid sequences of cDNA ID 23704869 (SEQ ID NO:221), gene. SEQID NO:196 sets forth the amino acid sequence of CeresClone:295738 (SEQID NO:224), gi7489532 (SEQID a DNA clone, identified herein as cDNA ID 23468313 (SEQ NO:227), gi1076760 (SEQID NO:233), gi463212 (SEQID ID NO:195) that is predicted to encode a WRKY plant tran NO:235), and gi21435101 (SEQ ID NO:236). Other Scription factor polypeptide. homologs and/or orthologs of SEQ ID NO:221 include A regulatory protein can comprise the amino acid sequence 60 gil 16797791 (SEQ ID NO:222), gi1806261 (SEQ ID set forth in SEQID NO: 196. Alternatively, a regulatory pro NO:223), gil 168428 (SEQID NO:225), gill 144536 (SEQID tein can be a homolog, ortholog, or variant of the polypeptide NO:226), gi542.187 (SEQ ID NO:228), gi34897226 (SEQ having the amino acid sequence set forth in SEQID NO: 196. ID NO:229), gi4115746 (SEQ ID NO:230), gil 15865782 For example, a regulatory protein can have an amino acid (SEQ ID NO:231), CeresClone:235570 (SEQ ID NO:232), sequence with at least 35% sequence identity, e.g., 36%. 39%. 65 and gil 1869928 (SEQID NO:234). 41%, 45%, 47%, 48%, 49%, 50%, 51%, 52%, 56%, 57%, The alignment in FIG. 17 provides the amino acid 60%, 61%, 62%, 63%, 64%. 65%, 67%, 68%, 70%, 75%, sequences of cDNA ID 23522373 (5110H5; SEQ ID US 7,795,503 B2 21 22 NO:177), gi3608135 (SEQID NO:178), gi3336903 (SEQ play important architectural roles in the assembly of nucle ID NO:180), CeresClone:545441 (SEQ ID NO:181), oprotein complexes in a variety of biological processes, for gi5381313 (SEQ ID NO:182), gi3336906 (SEQ ID example the initiation of transcription and DNA repair. In NO:183), gi13775109 (SEQID NO:184), gi435942 (SEQ addition to the HMG1 and HMG2 proteins, HMG domains ID NO:185), and CeresClone:287677 (SEQ ID NO:188). occur in single or multiple copies in the following protein Other homologs and/or orthologs of SEQID NO:177 include classes: the SOX family of transcription factors; SRY sex CeresClone:1 188156 (SEQ ID NO:179), CeresClone: determining regionY protein and related proteins; LEF1 lym 523155 (SEQ ID NO:186), and gil 13775107 (SEQ ID phoid enhancer binding factor 1; SSRP recombination signal NO:187). recognition protein; MTF1 mitochondrial transcription factor In some cases, a regulatory protein can include a polypep 10 1; UBF1/2 nucleolar transcription factors; Abf2 yeast ARS tide having at least 80% sequence identity, e.g., 80%, 85%, binding factor; and Saccharomyces cerevisiae transcription 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an factors IXr1, Rox1, Nhp6a, Nhp6b and Spp41. SEQID NO:18 amino acid sequence corresponding to any of SEQID NOs: sets forth the amino acid sequence of a DNA clone, identified 222-236, SEQID NOS:178-188, or the consensus sequence herein as cDNA ID 23358452 (SEQ ID NO:17), that is pre set forth in FIG. 22 or FIG. 17. 15 dicted to encode an HMG box-containing polypeptide. A regulatory protein can have an HTH 3 domain charac A regulatory protein can comprise the amino acid sequence teristic of helix-turn helix DNA binding polypeptides. The set forth in SEQID NO:18. Alternatively, a regulatory protein large family of DNA binding helix-turn helix proteins can be a homolog, ortholog, or variant of the polypeptide includes a bacterial plasmid copy control protein, bacterial having the amino acid sequence set forth in SEQID NO:18. methylases, various bacteriophage transcription control pro For example, a regulatory protein can have an amino acid teins, and a vegetative specific protein from Dictyostelium sequence with at least 40% sequence identity, e.g., 40%, 45%, discoideum. SEQ ID NO:88 sets forth the amino acid 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, sequence of a DNA clone, identified herein as cDNA ID 97%, 98%, or 99% sequence identity, to the amino acid 23385649 (SEQ ID NO:87), that is predicted to encode a sequence set forth in SEQID NO:18. helix-turn helix DNA-binding polypeptide. 25 Amino acid sequences of homologs and/or orthologs of the A regulatory protein can comprise the amino acid sequence polypeptide having the amino acid sequence set forth in SEQ set forth in SEQID NO:88. Alternatively, a regulatory protein ID NO:18 are provided in FIG. 3. FIG. 3 also includes a can be a homolog, ortholog, or variant of the polypeptide consensus amino acid sequence determined by aligning having the amino acid sequence set forth in SEQID NO:88. homologous and/or orthologous amino acid sequences with For example, a regulatory protein can have an amino acid 30 the amino acid sequence set forth in SEQID NO:18. sequence with at least 45% sequence identity, e.g., 45%, 50%, For example, the alignment in FIG. 3 provides the amino 55%, 60%. 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, acid sequences of cDNA ID 23358452 (SEQ ID NO:18), 98%, or 99% sequence identity, to the amino acid sequence CeresClone:873 113 (SEQ ID NO:19), CeresClone:956.177 set forth in SEQID NO:88. (SEQ ID NO:20), CeresClone:721511 (SEQ ID NO:21), Amino acid sequences of homologs and/or orthologs of the 35 CeresClone:641329 (SEQ ID NO:22), CeresClone:782784 polypeptide having the amino acid sequence set forth in SEQ (SEQ ID NO:23), gill 8645 (SEQ ID NO:24), gi1052956 ID NO:88 are provided in FIG. 9. FIG. 9 also includes a (SEQ ID NO:25), gi436424 (SEQ ID NO:26), gi2894.109 consensus amino acid sequence determined by aligning (SEQ ID NO:27), CeresClone:686294 (SEQ ID NO:28), homologous and/or orthologous amino acid sequences with gi50726318 (SEQ ID NO:29), gi729737 (SEQ ID NO:30), the amino acid sequence set forth in SEQID NO:88. 40 For example, the alignment in FIG. 9 provides the amino gi729736 (SEQID NO:31), CeresClone:1060767 (SEQ ID acid sequences of cDNA ID 23385649 (SEQ ID NO:88), NO:32), and gi7446231 (SEQ ID NO:33). CeresClone:474636 (SEQID NO:89), CeresClone: 1057375 In some cases, a regulatory protein can include a polypep (SEQ ID NO:90), CeresClone: 1027534 (SEQ ID NO:91), tide having at least 80% sequence identity, e.g., 80%, 85%, gil 1632831 (SEQID NO:92), gi5669634 (SEQID NO:93), 45 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an gi8895787 (SEQID NO:94), CeresClone:638899 (SEQ ID amino acid sequence corresponding to any of SEQID NOs: NO:95), CeresClone:348434 (SEQID NO:96), CeresClone: 19-33 or the consensus sequence set forth in FIG. 3. 1607224 (SEQID NO:97), gi0725389 (SEQID NO:98), and A regulatory protein can contain a GASA domain charac gil 19225065 (SEQID NO:99). teristic of a polypeptide belonging to the GASA gibberellin In some cases, a regulatory protein can include a polypep 50 regulated cysteine rich protein family. The expression of tide having at least 80% sequence identity, e.g., 80%, 85%, these proteins is up-regulated by the plant hormone gibberel 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an lin. Most of these gibberellin regulated proteins have a role in amino acid sequence corresponding to any of SEQID NOs: plant development. There are 12 conserved cysteine residues, 89-99 or the consensus sequence set forth in FIG. 9. making it possible for these proteins to posses six disulphide A regulatory protein can have an HMG (high mobility 55 bonds. SEQID NO:44 sets forth the amino acid sequence of group) box characteristic of a high mobility group (HMG or a DNA clone, identified herein as cDNA ID 23366941 (SEQ HMGB) protein. High mobility group (HMG or HMGB) ID NO:43), that is predicted to encode a gibberellin regulated proteins are a family of relatively low molecular weight non polypeptide. histone components in chromatin. HMG1 and HMG2 bind A regulatory protein can comprise the amino acid sequence single-stranded DNA preferentially and unwind double 60 set forth in SEQID NO:44. Alternatively, a regulatory protein Stranded DNA. Although they have no sequence specificity, can be a homolog, ortholog, or variant of the polypeptide HMG1 and HMG2 have a high affinity for bent or distorted having the amino acid sequence set forth in SEQID NO:44. DNA. HMG1 and HMG2 contain two DNA-binding HMG For example, a regulatory protein can have an amino acid box domains (A and B) and a long acidic C-terminal domain sequence with at least 50% sequence identity, e.g., 50%, 55%, rich in aspartic and glutamic acid residues. The acidic tail 65 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or modulates the affinity of the tandem HMG boxes in HMG1 99% sequence identity, to the amino acid sequence set forth in and 2 for a variety of DNA targets. HMG1 and 2 appear to SEQID NO:44. US 7,795,503 B2 23 24 Amino acid sequences of homologs and/or orthologs of the gil 18391322 (SEQID NO:75), CeresClone: 17426 (SEQID polypeptide having the amino acid sequence set forth in SEQ NO:76), CeresClone: 11593 (SEQ ID NO:77), CeresClone: ID NO:44 are provided in FIG. 5. FIG. 5 also includes a 1087844 (SEQID NO:78), and CeresClone:963628 (SEQID consensus amino acid sequence determined by aligning NO:79). homologous and/or orthologous amino acid sequences with 5 In some cases, a regulatory protein can include a polypep the amino acid sequence set forth in SEQID NO:44. tide having at least 80% sequence identity, e.g., 80%, 85%, For example, the alignment in FIG. 5 provides the amino 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an acid sequences of cDNA ID 23366941 (SEQ ID NO:44), amino acid sequence corresponding to any of SEQID NOs: gil 12324817 (SEQ ID NO:45), gi55584076 (SEQ ID 116-122, SEQID NOs:73-79, or the consensus sequence set NO:46), CeresClone:303971 (SEQID NO:47), gil 16516825 10 forth in FIG. 11 or FIG. 7. (SEQ ID NO:50), CeresClone:1000657 (SEQ ID NO:52), A regulatory protein can contain an RRM 1 domain char gil 16516823 (SEQID NO:53), gi2982285 (SEQID NO:54), acteristic of an RNA binding polypeptide. RNA recognition CeresClone:963426 (SEQ ID NO:55), CeresClone:682557 motifs (also known as RRM, RBD, or RNP domains) are (SEQ ID NO:56), gi59042581 (SEQ ID NO:58), Ceres found in a variety of RNA binding polypeptides, including Clone:602368 (SEQ ID NO:59), and CeresClone: 1114184 15 heterogeneous nuclear ribonucleoproteins (hnRNPs), (SEQ ID NO:61). Other homologs and/or orthologs of SEQ polypeptides implicated in regulation of alternative splicing, ID NO:44 include CeresClone: 1633647 (SEQ ID NO:48), and polypeptide components of small nuclear ribonucleopro CeresClone:314456 (SEQ ID NO:49), CeresClone:780025 teins (snRNPs). The RRM motif also appears in a few single (SEQID NO:51), CeresClone:646744 (SEQID NO:57), and stranded DNA binding proteins. The RRM structure consists CeresClone:566082 (SEQID NO:60). 2O of four strands and two helices arranged in an alpha/beta In some cases, a regulatory protein can include a polypep sandwich, with a third helix present during RNA binding in tide having at least 80% sequence identity, e.g., 80%, 85%, some cases. SEQID NO:7 sets forth the amino acid sequence 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an of a DNA clone, identified herein as cDNA ID 23357249 amino acid sequence corresponding to any of SEQID NOs: (SEQID NO:6), that is predicted to encode an RNA binding 45-61 or the consensus sequence set forth in FIG. 5. 25 polypeptide. A regulatory protein can contain a GRP domain character A regulatory protein can comprise the amino acid sequence istic of a polypeptide belonging to the glycine-rich protein set forth in SEQID NO:7. Alternatively, a regulatory protein family. This family of proteins includes several glycine-rich can be a homolog, ortholog, or variant of the polypeptide proteins as well as two nodulins 16 and 24. The family also having the amino acid sequence set forth in SEQID NO:7. contains proteins that are induced in response to various 30 For example, a regulatory protein can have an amino acid stresses. Some of the proteins that have a glycine-rich domain sequence with at least 50% sequence identity, e.g., 50%, 55%, (i.e., GRPs)are capable of binding to RNA, potentially affect 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or ing the stability and translatability of bound RNAs. SEQID 99% sequence identity, to the amino acid sequence set forth in NO: 115 sets forth the amino acid sequence of a DNA clone, SEQID NO:7. identified herein as cDNA ID 23387900 (SEQID NO:114), 35 Amino acid sequences of homologs and/or orthologs of the that is predicted to encode a glycine-rich protein. SEQ ID polypeptide having the amino acid sequence set forth in SEQ NO:72 sets forth the amino acid sequence of a DNA clone, ID NO:7 are provided in FIG. 2. FIG. 2 also includes a identified herein as cDNA ID 23383878 (SEQ ID NO:71), consensus amino acid sequence determined by aligning that is predicted to encode a glycine-rich RNA-binding pro homologous and/or orthologous amino acid sequences with tein. 40 the amino acid sequence set forth in SEQID NO:7. A regulatory protein can comprise the amino acid sequence For example, the alignment in FIG. 2 provides the amino set forth in SEQID NO:115 or SEQID NO:72. Alternatively, acid sequences of cDNA ID 23357249 (SEQ ID NO:7), a regulatory protein can be a homolog, ortholog, or variant of CeresClone: 1388283 (SEQID NO:8), gil 1778374 (SEQ ID the polypeptide having the amino acid sequence set forth in NO:9), gi743.9995 (SEQ ID NO:10), gi7489099 (SEQ ID SEQID NO:115 or SEQID NO:72. For example, a regulatory 45 NO:11), gi34906972 (SEQID NO:12), CeresClone:536457 protein can have an amino acid sequence with at least 75% (SEQ ID NO:13), CeresClone:744170 (SEQ ID NO:14), sequence identity, e.g., 75%, 80%, 85%, 90%, 95%, 97%, CeresClone:579861 (SEQ ID NO:15), and gi21388.662 98%, or 99% sequence identity, to the amino acid sequence (SEQID NO:16). set forth in SEQID NO: 115 or SEQID NO:72. In some cases, a regulatory protein can include a polypep Amino acid sequences of homologs and/or orthologs of the 50 tide having at least 80% sequence identity, e.g., 80%, 85%, polypeptide having the amino acid sequence set forth in SEQ 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an ID NO:115 and SEQID NO:72 are provided in FIG. 11 and amino acid sequence corresponding to any of SEQID NOs: FIG.7, respectively. Each of FIG. 11 and FIG. 7 also includes 8-16 or the consensus sequence set forth in FIG. 2. a consensus amino acid sequence determined by aligning A regulatory protein can contain a Mov34 domain charac homologous and/or orthologous amino acid sequences with 55 teristic of a Mov34/MPN/PAD-1 family polypeptide. Mov34 the amino acid sequence set forth in SEQID NO:115 or SEQ polypeptides are reported to act as regulatory Subunits of the ID NO:72. 26 proteasome, which is involved in the ATP-dependent deg For example, the alignment in FIG. 11 provides the amino radation of ubiquitinated proteins. Mov34 domains are found acid sequences of cDNA ID 23387900 (SEQ ID NO:115), in the N-terminus of the proteasome regulatory Subunits, CeresClone:1 18184 (SEQID NO:116), CeresClone:1 18878 60 eukaryotic initiation factor 3 (eIF3) subunits, and regulators (SEQ ID NO:117), CeresClone:3929 (SEQ ID NO:118), of transcription factors. SEQID NO:81 sets forth the amino CeresClone: 12459 (SEQID NO:119), CeresClone: 1354,021 acid sequence of a DNA clone, identified herein as cDNA ID (SEQ ID NO:120), gi30017217 (SEQ ID NO:121), and 23385144 (SEQ ID NO:80), that is predicted to encode a CeresClone: 109026 (SEQID NO:122). Mov34/MPN/PAD-1 family polypeptide. The alignment in FIG.7 provides the amino acid sequences 65 A regulatory protein can comprise the amino acid sequence of cDNA ID 23383878 (SEQID NO:72), CeresClone:94850 set forth in SEQID NO:81. Alternatively, a regulatory protein (SEQ ID NO:73), gi21689807 (SEQ ID NO:74), can be a homolog, ortholog, or variant of the polypeptide US 7,795,503 B2 25 26 having the amino acid sequence set forth in SEQID NO:81. A regulatory protein can include additional amino acids For example, a regulatory protein can have an amino acid that are not involved in modulating gene expression, and thus sequence with at least 60% sequence identity, e.g., 60%. 65%, can be longer than would otherwise be the case. For example, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% a regulatory protein can include an amino acid sequence that sequence identity, to the amino acid sequence set forth in SEQ functions as a reporter. Such a regulatory protein can be a ID NO:81. fusion protein in which a green fluorescent protein (GFP) Amino acid sequences of homologs and/or orthologs of the polypeptide is fused to, e.g., SEQ ID NO:7, or in which a polypeptide having the amino acid sequence set forth in SEQ yellow fluorescent protein (YFP) polypeptide is fused to, e.g., ID NO:81 are provided in FIG. 8. FIG. 8 also includes a SEQID NO:35. In some embodiments, a regulatory protein consensus amino acid sequence determined by aligning 10 includes a purification tag, a chloroplast transit peptide, a homologous and/or orthologous amino acid sequences with mitochondrial transit peptide, or a leader sequence added to the amino acid sequence set forth in SEQ ID NO:81. For the amino or carboxy terminus. example, the alignment in FIG. 8 provides the amino acid Regulatory protein candidates Suitable for use in the inven sequences of cDNA ID 23385144 (SEQ ID NO:81), Ceres tion can be identified by analysis of nucleotide and polypep Clone:473126 (SEQ ID NO:82), gi54287494 (SEQ ID 15 tide sequence alignments. For example, performing a query NO:83), and CeresClone:238614 (SEQ ID NO:84). Other on a database of nucleotide or polypeptide sequences can homologs and/or orthologs of SEQ ID NO:81 include identify homologs and/or orthologs of regulatory proteins. gi34903.124 (SEQ ID NO:85) and gi53791918 (SEQ ID Sequence analysis can involve BLAST, Reciprocal BLAST, NO:86). or PSI-BLAST analysis of nonredundant databases using In some cases, a regulatory protein can include a polypep known regulatory protein amino acid sequences. Those tide having at least 80% sequence identity, e.g., 80%, 85%, polypeptides in the database that have greater than 40% 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an sequence identity can be identified as candidates for further amino acid sequence corresponding to any of SEQID NOs: evaluation for Suitability as regulatory proteins. Amino acid 82-86 or the consensus sequence set forth in FIG. 8. sequence similarity allows for conservative amino acid Sub A regulatory protein can contain an NUC130/3NT domain 25 stitutions, such as substitution of one hydrophobic residue for and an SDA1 domain. An NUC130/3NT domain is an N-ter another or substitution of one polar residue for another. If minal domain found in a novel nucleolar protein family. An desired, manual inspection of Such candidates can be carried SDA1 domain characterizes a family consisting of several out in order to narrow the number of candidates to be further SDA1 protein homologues. SDA1 is a Saccharomyces cer evaluated. Manual inspection can be performed by selecting evisiae protein which is involved in the control of the actin 30 those candidates that appear to have domains Suspected of cytoskeleton, is essential for cell viability, and is localized in being present in regulatory proteins, e.g., conserved func the nucleus. SEQ ID NO:190 sets forth the amino acid tional domains. sequence of a DNA clone, identified herein as cDNA ID The identification of conserved regions in a template or 23655935 (SEQ ID NO:189), that is predicted to encode a Subject polypeptide can facilitate production of variants of polypeptide having an NUC130/3NT domain and an SDA1 35 regulatory proteins. Conserved regions can be identified by domain. locating a region within the primary amino acid sequence of A regulatory protein can comprise the amino acid sequence a template polypeptide that is a repeated sequence, forms set forth in SEQID NO: 190. Alternatively, a regulatory pro Some secondary structure (e.g., helices and beta sheets), tein can be a homolog, ortholog, or variant of the polypeptide establishes positively or negatively charged domains, or rep having the amino acid sequence set forth in SEQID NO: 190. 40 resents a protein motif or domain. See, e.g., the Pfamweb site For example, a regulatory protein can have an amino acid describing consensus sequences for a variety of protein sequence with at least 50% sequence identity, e.g., 50%, 55%, motifs and domains at Sanger.ac.uk/Pfam and genome.wus 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or tl.edu/Pfam. A description of the information included at the 99% sequence identity, to the amino acid sequence set forth in Pfam database is described in Sonnhammer et al., Nucl. Acids SEQID NO.190. 45 Res., 26:320-322 (1998); Sonnhammer et al., Proteins, The amino acid sequence of a homolog of the polypeptide 28:405-420 (1997); and Bateman et al., Nucl. Acids Res., having the amino acid sequence set forth in SEQID NO:190 27:260-262 (1999). is provided in FIG. 18. FIG. 18 also includes a consensus Conserved regions also can be determined by aligning amino acid sequence determined by aligning the homologous sequences of the same or related polypeptides from closely amino acid sequence with the amino acid sequence set forthin 50 related species. Closely related species preferably are from SEQ ID NO.190. For example, the alignment in FIG. 18 the same family. In some embodiments, alignment of provides the amino acid sequences of cDNA ID 23655935 sequences from two different species is adequate. For (5110C8: SEQ ID NO:190) and gi50928937 (SEQ ID example, sequences from Arabidopsis and Zea mays can be NO:191). used to identify one or more conserved regions. In some cases, a regulatory protein can include a polypep 55 Typically, polypeptides that exhibit at least about 40% tide having at least 80% sequence identity, e.g., 80%, 85%, amino acid sequence identity are useful to identify conserved 90%, 93%, 95%, 97%, 98%, or 99% sequence identity, to an regions. Conserved regions of related polypeptides can amino acid sequence corresponding to SEQID NO: 191 or exhibit at least 45% amino acid sequence identity, e.g., at least the consensus sequence set forth in FIG. 18. 50%, at least 60%, at least 70%, at least 80%, or at least 90% A regulatory protein encoded by a recombinant nucleic 60 amino acid sequence identity. In some embodiments, a con acid can be a native regulatory protein, i.e., one or more served region of target and template polypeptides exhibit at additional copies of the coding sequence for a regulatory least 92%, 94%, 96%, 98%, or 99% amino acid sequence protein that is naturally present in the cell. Alternatively, a identity. Amino acid sequence identity can be deduced from regulatory protein can be heterologous to the cell, e.g., a amino acid or nucleotide sequences. In certain cases, highly transgenic Papaveraceae plant can contain the coding 65 conserved domains have been identified within regulatory sequence for a transcription factor polypeptide from a Catha proteins. These conserved regions can be useful in identifying ranthus plant. functionally similar (orthologous) regulatory proteins. US 7,795,503 B2 27 28 In some instances, Suitable regulatory proteins can be syn DNA Binding Domain thesized on the basis of consensus functional domains and/or A regulatory protein can include a domain, termed a DNA conserved regions in polypeptides that are homologous regu binding domain, which binds to a recognized site on DNA. A latory proteins. Domains are groups of Substantially contigu DNA binding domain of a regulatory protein can bind to one ous amino acids in a polypeptide that can be used to charac or more specific cis-responsive promoter motifs described terize protein families and/or parts of proteins. Such domains herein. The typical result is modulation of transcription from have a "fingerprint’ or “signature' that can comprise con a transcriptional start site associated with and operably linked served (1) primary sequence, (2) secondary structure, and/or to the cis-responsive motif. In some embodiments, binding of (3) three-dimensional conformation. Generally, domains are a DNA binding domain to a cis-responsive motif in planta correlated with specific in vitro and/or in vivo activities. A 10 involves other cellular components, which can be supplied by domain can have a length of from 10 amino acids to 400 the plant. amino acids, e.g., 10 to 50 amino acids, or 25 to 100 amino acids, or 35 to 65amino acids, or 35 to 55 amino acids, or 45 Transactivation Domain to 60 amino acids, or 200 to 300 amino acids, or 300 to 400 A regulatory protein can have discrete DNA binding and amino acids. 15 transactivation domains. Typically, transactivation domains Representative homologs and/or orthologs of regulatory bring proteins of the cellular transcription and translation proteins are shown in FIGS. 1-22. Each Figure represents an machinery into contact with the transcription start site to alignment of the amino acid sequence of a regulatory protein initiate transcription. A transactivation domain of a regula with the amino acid sequences of corresponding homologs tory protein can be synthetic or can be naturally-occurring. and/or orthologs. Amino acid sequences of regulatory pro An example of a transactivation domain is the transactivation teins and their corresponding homologs and/or orthologs domain of a maize transcription factor C polypeptide. have been aligned to identify conserved amino acids and to Oligomerization Sequences determine consensus sequences that contain frequently In some embodiments, a regulatory protein comprises oli occurring amino acid residues at particular positions in the gomerization sequences. In some instances oligomerization aligned sequences, as shown in FIGS. 1-22. A dash in an 25 is required for a ligand/regulatory protein complex or protein/ aligned sequence represents a gap, i.e., a lack of an amino acid protein complex to bind to a recognized DNA site. Oligomer at that position. Identical amino acids or conserved amino ization sequences can permit a regulatory protein to produce acid Substitutions among aligned sequences are identified by either homo- or heterodimers. Several motifs or domains in boxes. the amino acid sequence of a regulatory protein can influence Each consensus sequence is comprised of conserved 30 heterodimerization or homodimerization of a given regula regions. Each conserved region contains a sequence of con tory protein. tiguous amino acid residues. A dash in a consensus sequence In some embodiments, transgenic plants also include a indicates that the consensus sequence either lacks an amino recombinant coactivator polypeptide that can interact with a acid at that position or includes an amino acid at that position. regulatory protein to mediate the regulatory protein’s effect If an amino acid is present, the residue at that position corre 35 on transcription of an endogenous gene. Such polypeptides sponds to one found in any aligned sequence at that position. include chaperonins. In some embodiments, a recombinant Useful polypeptides can be constructed based on the con coactivator polypeptide is a chimera of a non-plant coactiva sensus sequence in FIG. 1, FIG.2, FIG.3, FIG.4, FIG.5, FIG. tor polypeptide and a plant coactivator polypeptide. Thus, in 6, FIG. 7, FIG. 8, FIG.9, FIG. 10, FIG. 11, FIG. 12, FIG. 13, Some embodiments, a regulatory protein described herein FIG.14, FIG.15, FIG.16, FIG. 17, FIG. 18, FIG. 19, FIG.20, 40 binds as a heterodimer to a promoter motif. In such embodi FIG. 21, or FIG. 22. Such a polypeptide includes the con ments, plants and plant cells contain a coding sequence for a served regions in the selected consensus sequence, arranged second or other regulatory protein as a dimerization or mul in the order depicted in the Figure from amino-terminal end to timerization partner, in addition to the coding sequence for carboxy-terminal end. Such a polypeptide may also include the first regulatory protein. 45 Zero, one, or more than one amino acid in positions marked by Nucleic Acids dashes. When no amino acids are present at positions marked by dashes, the length of Such a polypeptide is the Sum of the A nucleic acid can comprise a coding sequence that amino acid residues in all conserved regions. When amino encodes any of the regulatory proteins as set forth in SEQID acids are present at all positions marked by dashes, such a NOs:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQ ID 50 NOs:35-42, SEQID NOs:44-61, SEQID NOs:63-70, SEQ polypeptide has a length that is the sum of the amino acid ID NOs:72-79, SEQ ID NOs:81-86, SEQ ID NOs:88-99, residues in all conserved regions and all dashes. SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: A conserved domain in certain cases may be 1) a localiza 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, SEQ tion domain, 2) an activation domain, 3) a repression domain, ID NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177 4) an oligomerization domain or 5) a DNA binding domain. 55 188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQID Consensus domains and conserved regions can be identified NO:196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID by homologous polypeptide sequence analysis as described NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID above. The suitability of polypeptides for use as regulatory NOs:221-236, and the consensus sequences set forth in FIGS. proteins can be evaluated by functional complementation 1-22. In some cases, a recombinant nucleic acid construct can studies. 60 include a nucleic acid comprising less than the full-length Alternatively, a regulatory protein can be a fragment of a coding sequence of a regulatory protein. In some cases, a naturally occurring regulatory protein. In certain cases, such recombinant nucleic acid construct can include a nucleic acid as transcription factor regulatory proteins, a fragment can comprising a coding sequence, a gene, or a fragment of a comprise the DNA-binding and transcription-regulating coding sequence or gene in an antisense orientation so that the domains of the naturally occurring regulatory protein. 65 antisense strand of RNA is transcribed. Additional information on regulatory protein domains is It will be appreciated that a number of nucleic acids can provided below. encode a polypeptide having a particular amino acid US 7,795,503 B2 29 30 sequence. The degeneracy of the genetic code is well known 1995. Generally, sequence information from the ends of the to the art; i.e., for many amino acids, there is more than one region of interest or beyond is employed to design oligonucle nucleotide triplet that serves as the codon for the amino acid. otide primers that are identical or similar in sequence to For example, codons in the coding sequence for a given opposite strands of the template to be amplified. Various PCR regulatory protein can be modified such that optimal expres strategies also are available by which site-specific nucleotide sion in a particular plant species is obtained, using appropri sequence modifications can be introduced into a template ate codon bias tables for that species. nucleic acid. Isolated nucleic acids also can be chemically A nucleic acid also can comprise a nucleotide sequence synthesized, either as a single nucleic acid molecule (e.g., corresponding to any of the regulatory regions as set forth in using automated DNA synthesis in the 3' to 5' direction using SEQ ID NOs:237-330 and SEQ ID NOs:365-371. In some 10 phosphoramidite technology) or as a series of oligonucle cases, a nucleic acid can comprise a nucleotide sequence corresponding to any of the regulatory regions as set forth in otides. For example, one or more pairs of long oligonucle SEQID NOS:237-330 and SEQID NOS:365-371 and a cod otides (e.g., >100 nucleotides) can be synthesized that con ing sequence that encodes any of the regulatory proteins as set tain the desired sequence, with each pair containing a short forth in SEQID NOs:2-5, SEQID NOs:7-16, SEQ ID NOs: 15 segment of complementarity (e.g., about 15 nucleotides) Such 18-33, SEQ ID NOS:35-42, SEQ ID NOs:44-61, SEQ ID that a duplex is formed when the oligonucleotide pair is NOs:63-70, SEQID NOs:72-79, SEQID NOs:81-86, SEQ annealed. DNA polymerase is used to extend the oligonucle ID NOs:88-99, SEQ ID NOs: 101-113, SEQ ID NOs: 115 otides, resulting in a single, double-stranded nucleic acid 122, SEQID NOs: 124-136, SEQID NOs: 138-150, SEQID molecule per oligonucleotide pair, which then can be ligated NOs: 152-156, SEQID NOs: 158-167, SEQID NOs: 169-175, into a vector. Isolated nucleic acids of the invention also can SEQID NOs: 177-188, SEQID NOS:190-191, SEQID NOs: be obtained by mutagenesis of e.g., a naturally occurring 193-194, SEQ ID NO:196, SEQID NO.198, SEQID NOs: DNA. 200-204, SEQID NOs:206-215, SEQ ID NO:217, SEQ ID As used herein, the term “percent sequence identity” refers NO:219, SEQID NOS:221-236, and the consensus sequences to the degree of identity between any given query sequence set forth in FIGS. 1-22. 25 and a Subject sequence. A Subject sequence typically has a The terms “nucleic acid' and “polynucleotide are used length that is more than 80 percent, e.g., more than 82,85, 87, interchangeably herein, and refer both to RNA and DNA, 89,90, 93, 95, 97,99, 100, 105, 110, 115, or 120 percent, of including cDNA, genomic DNA, synthetic DNA, and DNA the length of the query sequence. A query nucleic acid or (or RNA) containing nucleic acid analogs. Polynucleotides amino acid sequence is aligned to one or more Subject nucleic can have any three-dimensional structure. A nucleic acid can 30 be double-stranded or single-stranded (i.e., a sense Strand or acid or amino acid sequences using the computer program an antisense strand). Non-limiting examples of polynucle ClustalW (version 1.83, default parameters), which allows otides include genes, gene fragments, exons, introns, messen alignments of nucleic acid or protein sequences to be carried ger RNA (mRNA), transfer RNA, ribosomal RNA, siRNA, out across their entire length (global alignment). Chenna et micro-RNA, ribozymes, cDNA, recombinant polynucle 35 al., Nucleic Acids Res., 31(13):3497-500 (2003). otides, branched polynucleotides, plasmids, vectors, isolated ClustalW calculates the best match between a query and DNA of any sequence, isolated RNA of any sequence, nucleic one or more subject sequences, and aligns them so that iden acid probes, and primers, as well as nucleic acid analogs. tities, similarities and differences can be determined. Gaps of An isolated nucleic acid can be, for example, a naturally one or more residues can be inserted into a query sequence, a occurring DNA molecule, provided one of the nucleic acid 40 Subject sequence, or both, to maximize sequence alignments. sequences normally found immediately flanking that DNA For fast pairwise alignment of nucleic acid sequences, the molecule in a naturally-occurring genome is removed or following default parameters are used: word size: 2; window absent. Thus, an isolated nucleic acid includes, without limi size: 4; scoring method: percentage; number oftop diagonals: tation, a DNA molecule that exists as a separate molecule, 4; and gap penalty: 5. For multiple alignment of nucleic acid independent of other sequences (e.g., a chemically synthe 45 sequences, the following parameters are used: gap opening sized nucleic acid, or a cDNA or genomic DNA fragment penalty: 10.0; gap extension penalty: 5.0; and weight transi produced by the polymerase chain reaction (PCR) or restric tions: yes. For fast pairwise alignment of protein sequences, tion endonuclease treatment). An isolated nucleic acid also the following parameters are used: word size: 1; window size: refers to a DNA molecule that is incorporated into a vector, an 5; scoring method: percentage; number of top diagonals: 5; autonomously replicating plasmid, a virus, or into the 50 gap penalty: 3. For multiple alignment of protein sequences, genomic DNA of a prokaryote or eukaryote. In addition, an the following parameters are used: weight matrix: blosum: isolated nucleic acid can include an engineered nucleic acid gap opening penalty: 10.0; gap extension penalty: 0.05; such as a DNA molecule that is part of a hybrid or fusion hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, nucleic acid. A nucleic acid existing among hundreds to mil ASn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap pen lions of other nucleic acids within, for example, cDNA librar 55 alties: on. The output is a sequence alignment that reflects the ies or genomic libraries, or gel slices containing a genomic relationship between sequences. ClustalW can be run, for DNA restriction digest, is not to be considered an isolated example, at the Baylor College of Medicine Search Launcher nucleic acid. site (searchlauncher.bcm.tmc.edu/multi-align/multi Isolated nucleic acid molecules can be produced by stan align.html) and at the European Bioinformatics Institute site dard techniques. For example, polymerase chain reaction 60 on the World Wide Web (ebi.ac.uk/clustalw). (PCR) techniques can be used to obtain an isolated nucleic To determine a percent identity between a query sequence acid containing a nucleotide sequence described herein. PCR and a subject sequence, ClustalW divides the number of iden can be used to amplify specific sequences from DNA as well tities in the best alignment by the number of residues com as RNA, including sequences from total genomic DNA or pared (gap positions are excluded), and multiplies the result total cellular RNA. Various PCR methods are described, for 65 by 100. The output is the percent identity of the subject example, in PCR Primer: A Laboratory Manual, Dieffenbach sequence with respect to the query sequence. It is noted that and Dveksler, eds., Cold Spring Harbor Laboratory Press, the percent identity value can be rounded to the nearest tenth. US 7,795,503 B2 31 32 For example, 78.11, 78.12, 78.13, and 78.14 are rounded inserted segment. Generally, a vector is capable of replication down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are when associated with the proper control elements. Suitable rounded up to 78.2. vector backbones include, for example, those routinely used The term “exogenous” with respect to a nucleic acid indi in the art Such as plasmids, viruses, artificial chromosomes, cates that the nucleic acid is part of a recombinant nucleic acid 5 BACs, YACs, or PACs. The term “vector” includes cloning construct, or is not in its natural environment. For example, an and expression vectors, as well as viral vectors and integrat exogenous nucleic acid can be a sequence from one species ing vectors. An 'expression vector” is a vector that includes a introduced into another species, i.e., a heterologous nucleic regulatory region. Suitable expression vectors include, with acid. Typically, Such an exogenous nucleic acid is introduced out limitation, plasmids and viral vectors derived from, for into the other species via a recombinant nucleic acid con 10 example, bacteriophage, baculoviruses, and retroviruses. struct. An exogenous nucleic acid can also be a sequence that Numerous vectors and expression systems are commercially is native to an organism and that has been reintroduced into available from Such corporations as Novagen (Madison, cells of that organism. An exogenous nucleic acid that Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, includes a native sequence can often be distinguished from Calif.), and Invitrogen/Life Technologies (Carlsbad, Calif.). the naturally occurring sequence by the presence of non 15 The vectors provided herein also can include, for example, natural sequences linked to the exogenous nucleic acid, e.g., origins of replication, Scaffold attachment regions (SARs), non-native regulatory sequences flanking a native sequence in and/or markers. A marker gene can confer a selectable phe a recombinant nucleic acid construct. In addition, stably notype on a plant cell. For example, a marker can confer transformed exogenous nucleic acids typically are integrated biocide resistance. Such as resistance to an antibiotic (e.g., at positions other than the position where the native sequence kanamycin, G418, bleomycin, or hygromycin), or an herbi is found. It will be appreciated that an exogenous nucleic acid cide (e.g., chlorosulfuron or phosphinothricin). In addition, may have been introduced into a progenitor and not into the an expression vector can include a tag sequence designed to cell under consideration. For example, a transgenic plant facilitate manipulation or detection (e.g., purification or containing an exogenous nucleic acid can be the progeny of a localization) of the expressed polypeptide. Tag sequences, cross between a stably transformed plant and a non-trans 25 Such as green fluorescent protein (GFP), glutathione S-trans genic plant. Such progeny are considered to contain the exog ferase (GST), polyhistidine, c-myc, hemagglutinin, or Flag TM enous nucleic acid. tag (Kodak, New Haven, Conn.) sequences typically are Similarly, a regulatory protein can be endogenous or exog expressed as a fusion with the encoded polypeptide. Such tags enous to a particular plant or plant cell. Exogenous regulatory can be inserted anywhere within the polypeptide, including at proteins, therefore, can include proteins that are native to a 30 either the carboxyl or amino terminus. plant or plant cell, but that are expressed in a plant cell via a As described herein, plant cells can be transformed with a recombinant nucleic acid construct, e.g., a California poppy recombinant nucleic acid construct to express a polypeptide plant transformed with a recombinant nucleic acid construct of interest. The polypeptide can then be extracted and purified encoding a California poppy transcription factor. using techniques known to those having ordinary skill in the Likewise, a regulatory region can be exogenous or endog 35 art. enous to a plant or plant cell. An exogenous regulatory region is a regulatory region that is part of a recombinant nucleic acid Regulatory Regions construct, or is not in its natural environment. For example, a Particular regulatory regions were examined for their abil Nicotiana promoter present on a recombinant nucleic acid ity to associate with regulatory proteins described herein. The construct is an exogenous regulatory region when a Nicotiana 40 sequences of these regulatory regions are set forth in SEQID plant cell is transformed with the construct. NOs:237-252. These regulatory regions were initially chosen A transgenic plant or plant cell in which the amount and/or for investigation because they were thought to be regulatory rate of biosynthesis of one or more sequences of interest is regions involved in alkaloid biosynthetic pathways in plants modulated includes at least one recombinant nucleic acid Such as Arabidopsis, California poppy, Papaver SOmniferum, construct, e.g., a nucleic acid construct comprising a nucleic 45 and Catharanthus. Using the methods described herein, regu acid encoding a regulatory protein or a nucleic acid construct latory proteins that can associate with some of these regula comprising a regulatory region as described herein. In certain tory regions were identified, and Such associations are listed cases, more than one recombinant nucleic acid construct can in Table 4 (under Example 5 below). In turn, knowledge of a be included (e.g., two, three, four, five, six, or more recom regulatory protein-regulatory region association facilitates binant nucleic acid constructs). For example, two recombi 50 the modulation of expression of sequences of interest that are nant nucleic acid constructs can be included, where one con operably linked to a given regulatory region by the associated struct includes a nucleic acid encoding one regulatory regulatory protein. The regulatory protein associated with the protein, and another construct includes a nucleic acid encod regulatory region operably linked to the sequence of interest ing a second regulatory protein. Alternatively, one construct is itself operably linked to a regulatory region. The amount can include a nucleic acid encoding one regulatory protein, 55 and specificity of expression of a regulatory protein can be while another includes a regulatory region. In other cases, a modulated by selecting an appropriate regulatory region to plant cell can include a recombinant nucleic acid construct direct expression of the regulatory protein. For example, a comprising a nucleic acid encoding a regulatory protein and regulatory protein can be broadly expressed under the direc further comprising a regulatory region that associates with the tion of a promoter such as a CaMV 35S promoter. Once regulatory protein. In Such cases, additional recombinant 60 expressed, the regulatory protein can directly or indirectly nucleic acid constructs can also be included in the plant cell, affect expression of a sequence of interest operably linked to e.g., containing additional regulatory proteins and/or regula another regulatory region, which is associated with the regu tory regions. latory protein. In some cases, a regulatory protein can be Vectors containing nucleic acids such as those described expressed under the direction of a cell type- or tissue-prefer herein also are provided. A "vector” is a replicon, Such as a 65 ential promoter, Such as a cell type- or tissue-preferential plasmid, phage, or cosmid, into which another DNA segment promoter described below. In some embodiments, a regula may be inserted so as to bring about the replication of the tory region useful in the methods described herein has 80% or US 7,795,503 B2 33 34 greater, e.g., 85%, 90%, 95%, 97%, 98%, 99%, or 100%, (1989); Bustos et al., Plant Cell, 1:839-854 (1989); Green et sequence identity to a regulatory region set forth in SEQID al., EMBO.J., 7:4035-4044 (1988); Meier et al., Plant Cell, NOS:237-252. 3:309-316 (1991); and Zhang et al., Plant Physiology, 110: The methods described herein can also be used to identify 1069-1079 (1996). new regulatory region-regulatory protein association pairs. Examples of various classes of promoters are described For example, an ortholog to a given regulatory protein is below. Some of the promoters indicated below are described expected to associate with the associated regulatory region in more detail in U.S. Patent Application Ser. Nos. 60/505, for that regulatory protein. 689: 60/518,075; 60/544,771; 60/558,869; 60/583,691; It should be noted that for a given regulatory protein listed 60/619, 181; 60/637,140; 10/950,321; 10/957,569; 11/058, in Table 4 (under Example 5 below), a regulatory region 10 689; 11/172,703; 11/208,308; and PCT/US05/23639. Nucle construct that includes one or more regulatory regions is set otide sequences of promoters are set forth in SEQID NOs: forth. A regulatory protein is expected to associate with either 253-330 and SEQID NOS:365-371. It will be appreciated that one or both such regulatory regions. Similarly, FIGS. 1-22 a promoter may meet criteria for one classification based on provide ortholog/homolog sequences and consensus its activity in one plant species, and yet meet criteria for a sequences for corresponding regulatory proteins. It is con 15 different classification based on its activity in another plant templated that each Such ortholog/homolog sequence and species. each polypeptide sequence that corresponds to the consensus Broadly Expressing Promoters sequence of the regulatory protein would also associate with A promoter can be said to be “broadly expressing when it the regulatory regions associated with the given regulatory promotes transcription in many, but not necessarily all, plant protein as set forth in Table 4 (under Example 5 below). tissues. For example, a broadly expressing promoter can pro The term “regulatory region” refers to nucleotide mote transcription of an operably linked sequence in one or sequences that influence transcription or translation initiation more of the shoot, shoot tip (apex), and leaves, but weakly or and rate, and Stability and/or mobility of a transcription or not at all in tissues such as roots or stems. As another example, translation product. Regulatory regions include, without limi a broadly expressing promoter can promote transcription of tation, promoter sequences, enhancer sequences, response 25 an operably linked sequence in one or more of the stem, shoot, elements, protein recognition sites, inducible elements, pro shoot tip (apex), and leaves, but can promote transcription teinbinding sequences, 5' and 3' untranslated regions (UTRs), weakly or not at all in tissues such as reproductive tissues of transcriptional start sites, termination sequences, polyadeny flowers and developing seeds. Non-limiting examples of lation sequences, and introns. broadly expressing promoters that can be included in the As used herein, the term “operably linked’ refers to posi 30 nucleic acid constructs provided herein include the p326 tioning of a regulatory region and a sequence to be transcribed (SEQID NO:328),YPO144 (SEQID NO:307), YPO190(SEQ in a nucleic acid so as to influence transcription or translation ID NO:311), p.13879 (SEQID NO:327), YP0050 (SEQ ID of Such a sequence. For example, to bring a coding sequence NO:287), p32449 (SEQ ID NO:329), 21876 (SEQ ID under the control of a promoter, the translation initiation site NO:253), YP0158 (SEQ ID NO:309), YP0214 (SEQ ID of the translational reading frame of the polypeptide is typi 35 NO:313), YPO380 (SEQ ID NO:322), PTO848 (SEQ ID cally positioned between one and about fifty nucleotides NO:278), and PTO633 (SEQID NO:259) promoters. Addi downstream of the promoter. A promoter can, however, be tional examples include the cauliflower mosaic virus (CaMV) positioned as much as about 5,000 nucleotides upstream of 35S promoter, the mannopine synthase (MAS) promoter, the the translation initiation site, or about 2,000 nucleotides 1' or 2 promoters derived from T-DNA of Agrobacterium upstream of the transcription start site. A promoter typically 40 tumefaciens, the figwort mosaic virus 34S promoter, actin comprises at least a core (basal) promoter. A promoter also promoters such as the rice actin promoter, and ubiquitin pro may include at least one control element, such as an enhancer moters such as the maize ubiquitin-1 promoter. In some cases, sequence, an upstream element or an upstream activation the CaMV 35S promoter is excluded from the category of region (UAR). For example, a Suitable enhancer is a cis broadly expressing promoters. regulatory element (-212 to -154) from the upstream region 45 Root Promoters of the octopine synthase (ocs) gene. Fromm et al. The Plant Root-active promoters confer transcription in root tissue, Cell, 1:977-984 (1989). The choice of promoters to be e.g., root endodermis, root epidermis, or root vascular tissues. included depends upon several factors, including, but not In some embodiments, root-active promoters are root-prefer limited to, efficiency, selectability, inducibility, desired ential promoters, i.e., confer transcription only or predomi expression level, and cell- or tissue-preferential expression. It 50 nantly in root tissue. Root-preferential promoters include the is a routine matter for one of skill in the art to modulate the YPO128 (SEQ ID NO:304), YP0275 (SEQ ID NO:315), expression of a coding sequence by appropriately selecting PTO625 (SEQ ID NO:258), PTO660 (SEQ ID NO:261), and positioning promoters and other regulatory regions rela PTO683 (SEQ ID NO:266), and PTO758 (SEQ ID NO:274) tive to the coding sequence. promoters. Other root-preferential promoters include the Some Suitable promoters initiate transcription only, or pre 55 PTO613 (SEQ ID NO:257), PTO672 (SEQ ID NO:263), dominantly, in certain cell types. For example, a promoter that PTO688 (SEQ ID NO:267), and PTO837 (SEQ ID NO:276) is active predominantly in a reproductive tissue (e.g., fruit, promoters, which drive transcription primarily in root tissue ovule, pollen, pistils, female gametophyte, egg cell, central and to a lesser extent in ovules and/or seeds. Other examples cell, nucellus, Suspensor, synergid cell, flowers, embryonic of root-preferential promoters include the root-specific sub tissue, embryo sac, embryo, Zygote, endosperm, integument, 60 domains of the CaMV 35S promoter (Lam et al., Proc. Natl. or seed coat) can be used. Thus, as used herein a cell type- or Acad. Sci. USA, 86:7890-7894 (1989)), root cell specific pro tissue-preferential promoter is one that drives expression moters reported by Conkling et al., Plant Physiol., 93:1203 preferentially in the target tissue, but may also lead to some 1211 (1990), and the tobacco RD2 promoter. expression in other cell types or tissues as well. Methods for Maturing Endosperm Promoters identifying and characterizing promoter regions in plant 65 In some embodiments, promoters that drive transcription genomic DNA include, for example, those described in the in maturing endosperm can be useful. Transcription from a following references: Jordano et al., Plant Cell, 1:855-866 maturing endosperm promoter typically begins after fertili US 7,795,503 B2 35 36 Zation and occurs primarily in endosperm tissue during seed (SEQ ID NO:312). Other promoters that may be useful development and is typically highest during the cellulariza include the following rice promoters: p530c 10, posFIE2-2, tion phase. Most Suitable are promoters that are active pre pOsMEA, pCSYp 102, and pCsYp285. dominantly in maturing endosperm, although promoters that Embryo Promoters are also active in other tissues can sometimes be used. Non Regulatory regions that preferentially drive transcription in limiting examples of maturing endosperm promoters that can Zygotic cells following fertilization can provide embryo-pref be included in the nucleic acid constructs provided herein erential expression. Most suitable are promoters that prefer include the napin promoter, the Arcelin-5 promoter, the entially drive transcription in early stage embryos prior to the phaseolin promoter (Bustos et al., Plant Cell, 1 (9):839-853 heart stage, but expression in late stage and maturing embryos (1989)), the soybean trypsin inhibitor promoter (Riggs et al., 10 is also suitable. Embryo-preferential promoters include the Plant Cell, 1(6):609-621 (1989)), the ACP promoter (Baer barley lipid transfer protein (Ltp1) promoter (Plant Cell Rep son et al., Plant Mol. Biol., 22(2):255-267 (1993)), the (2001) 20:647-654), YP0097 (SEQ ID NO:292), YP0107 stearoyl-ACP desaturase promoter (Slocombe et al., Plant (SEQID NO:296), YP0088 (SEQID NO:289),YPO143 (SEQ Physiol., 104(4): 167-176 (1994)), the soybean O' subunit of ID NO:306), YP0156 (SEQ ID NO:308), PTO650 (SEQ ID B-conglycinin promoter (Chen et al., Proc. Natl. Acad. Sci. 15 NO:260), PTO695 (SEQ ID NO:268), PTO723 (SEQ ID USA, 83:8560-8564 (1986)), the oleosin promoter (Hong et NO:271), PTO838 (SEQ ID NO:277), PTO879 (SEQ ID al., Plant Mol. Biol., 34(3):549-555 (1997)), and Zein pro NO:280), and PTO740 (SEQID NO:272). moters, such as the 15 kD Zein promoter, the 16 kD Zein Photosynthetic Tissue Promoters promoter, 19 kD Zein promoter, 22 kD Zein promoter and 27 Promoters active in photosynthetic tissue confer transcrip kDZein promoter. Also suitable are the Osgt-1 promoter from tion in green tissues such as leaves and stems. Most Suitable the rice glutelin-1 gene (Zheng et al., Mol. Cell Biol., are promoters that drive expression only or predominantly in 13:5829-5842 (1993)), the beta-amylase promoter, and the Such tissues. Examples of Such promoters include the ribu barley hordein promoter. Other maturing endosperm promot lose-1,5-bisphosphate carboxylase (RbcS) promoters such as ers include the YP0092 (SEQID NO:290), PTO676 (SEQID the RbcS promoter from eastern larch (Larix laricina), the NO:264), and PTO708 (SEQ ID NO:269) promoters. 25 pine cab6 promoter (Yamamoto et al., Plant Cell Physiol., Ovary Tissue Promoters 35:773-778 (1994)), the Cab-1 promoter from wheat (Fejes et Promoters that are active in ovary tissues such as the ovule al., Plant Mol. Biol., 15:921-932 (1990)), the CAB-1 pro wall and mesocarp can also be useful, e.g., a polygalactur moter from spinach (Lubberstedt et al., Plant Physiol., 104: onidase promoter, the banana TRX promoter, and the melon 997-1006 (1994)), the cab1R promoter from rice (Luanet al., actin promoter. Examples of promoters that are active prima 30 Plant Cell, 4:971-981 (1992)), the pyruvate orthophosphate rily in ovules include YP0007 (SEQ ID NO:282), YPO111 dikinase (PPDK) promoter from corn (Matsuoka et al., Proc. (SEQIDNO:298), YP0092 (SEQID NO:290), YP0103 (SEQ Natl. Acad. Sci. USA, 90:9586-9590 (1993)), the tobacco ID NO:295), YP0028 (SEQ ID NO:285), YP0121 (SEQ ID Lhcb1*2 promoter (Cerdan et al., Plant Mol. Biol., 33:245 NO:303), YP0008 (SEQ ID NO:283), YP0039 (SEQ ID 255 (1997)), the Arabidopsis thaliana SUC2 sucrose-H+ NO:286), YPO115 (SEQ ID NO:299), YPO119 (SEQ ID 35 symporter promoter (Truernit et al., Planta, 196:564-570 NO:301), YP0120 (SEQID NO:302), and YPO374 (SEQ ID (1995)), and thylakoid membrane protein promoters from NO:320). spinach (psal), psaE, psaE, PC, FNR, atpC, atpl), cab, rbcS). Embryo Sac/Early Endosperm Promoters Other photosynthetic tissue promoters include PTO535 (SEQ To achieve expression in embryo Sac/early endosperm, ID NO:255), PTO668 (SEQ ID NO:254), PTO886 (SEQ ID regulatory regions can be used that are active in polar nuclei 40 NO:281), YP0144 (SEQ ID NO:307), YPO380 (SEQ ID and/or the central cell, or in precursors to polar nuclei, but not NO:322), and PTO585 (SEQID NO:256). in egg cells or precursors to egg cells. Most Suitable are Vascular Tissue Promoters promoters that drive expression only or predominantly in Examples of promoters that have high or preferential activ polar nuclei or precursors thereto and/or the central cell. A ity in vascular bundles include YP0087 (SEQ ID NO:365), pattern of transcription that extends from polar nuclei into 45 YP0093 (SEQ ID NO:366), YP0108 (SEQ ID NO:367), early endosperm development can also be found with embryo YP0022 (SEQID NO:368), and YPO080 (SEQID NO:369). sac/early endosperm-preferential promoters, although tran Other vascular tissue-preferential promoters include the gly Scription typically decreases significantly in later endosperm cine-rich cell wall protein GRP 1.8 promoter (Keller and development during and after the cellularization phase. Baumgartner, Plant Cell, 3(10): 1051-1061 (1991)), the Com Expression in the Zygote or developing embryo typically is 50 melina yellow mottle virus (COYMV) promoter (Medberry not present with embryo Sac/early endosperm promoters. et al., Plant Cell, 4(2): 185-192 (1992)), and the rice tungro Promoters that may be suitable include those derived from bacilliform virus (RTBV) promoter (Dai et al., Proc. Natl. the following genes: Arabidopsis viviparous-1 (see, Gen Acad. Sci. USA, 101(2):687-692 (2004)). Bank No. U93215); Arabidopsis atmycl (see, Urao (1996) Poppy Capsule Promoters Plant Mol. Biol., 32:571-57; Conceicao (1994) Plant, 5:493 55 Examples of promoters that have high or preferential activ 505); Arabidopsis FIE (GenBank No. AF 129516); Arabidop ity in siliques/fruits, which are botanically equivalent to cap sis MEA: Arabidopsis FIS2 (GenBank No. AF096096); and sules in opium poppy, include PTO565 (SEQID NO:370) and FIE 1.1 (U.S. Pat. No. 6,906,244). Other promoters that may YP0015 (SEQID NO:371). be suitable include those derived from the following genes: Inducible Promoters maize MAC1 (see, Sheridan (1996) Genetics, 142:1009 60 Inducible promoters confer transcription in response to 1020); maize Cat3 (see, GenBank No. L05934: Abler (1993) external stimuli such as chemical agents or environmental Plant Mol. Biol., 22:10131-1038). Other promoters include stimuli. For example, inducible promoters can confer tran the following Arabidopsis promoters: YP0039 (SEQ ID Scription in response to hormones such as gibberellic acid or NO:286), YP0101 (SEQ ID NO:293), YP0102 (SEQ ID ethylene, or in response to light or drought. Examples of NO:294), YPO110 (SEQ ID NO:297), YPO117 (SEQ ID 65 drought-inducible promoters include YP0380 (SEQ ID NO:300), YPO119 (SEQ ID NO:301), YP0137 (SEQ ID NO:322), PTO848 (SEQ ID NO:278), YPO381 (SEQ ID NO:305), DME, YP0285 (SEQ ID NO:316), and YP0212 NO:323), YPO337 (SEQ ID NO:318), PTO633 (SEQ ID US 7,795,503 B2 37 38 NO:259), YPO374 (SEQ ID NO:320), PTO710 (SEQ ID interest is present in a plant, e.g., two, three, four, five, six, NO:270), YPO356 (SEQ ID NO:319), YP0385 (SEQ ID seven, eight, nine, or ten sequences of interest. Each sequence NO:325),YPO396 (SEQIDNO:326), YP0388,YPO384 (SEQ of interest can be present on the same nucleic acid constructin ID NO:324), PTO688 (SEQ ID NO:267), YP0286 (SEQ ID Such embodiments. Alternatively, each sequence of interest NO:317), YPO377 (SEQ ID NO:321), PD1367 (SEQ ID can be present on separate nucleic acid constructs. The regu NO:330), PD0901, and PD0898. Nitrogen-inducible promot latory region operably linked to each sequence of interest can ers include PTO863 (SEQ ID NO:279), PTO829 (SEQ ID be the same or can be different. In addition, one or more NO:275), PTO665 (SEQID NO:262), and PTO886 (SEQ ID nucleotide sequences encoding a regulatory protein can be NO:281). included on a nucleic acid construct that is the same as or Basal Promoters 10 separate from that containing an associated regulatory A basal promoter is the minimal sequence necessary for region(s) operably linked to a sequence(s) of interest. The assembly of a transcription complex required for transcrip regulatory region operably linked to each sequence encoding tion initiation. Basal promoters frequently include a “TATA a regulatory protein can be the same or different. box' element that may be located between about 15 and about A sequence of interest that encodes a polypeptide can 35 nucleotides upstream from the site of transcription initia 15 encode a plant polypeptide, a non-plant polypeptide, e.g., a tion. Basal promoters also may include a “CCAAT box' mammalian polypeptide, a modified polypeptide, a synthetic element (typically the sequence CCAAT) and/or a GGGCG polypeptide, or a portion of a polypeptide. A sequence of sequence, which can be located between about 40 and about interest can be endogenous, i.e., unmodified by recombinant 200 nucleotides, typically about 60 to about 120 nucleotides, DNA technology from the sequence and structural relation upstream from the transcription start site. ships that occur in nature and operably linked to the unmodi Other Promoters fied regulatory region. Alternatively, a sequence of interest Other classes of promoters include, but are not limited to, can be an exogenous nucleic acid. leaf-preferential, stem/shoot-preferential, callus-preferen Alkaloid Biosynthesis Sequences tial, guard cell-preferential, such as PTO678 (SEQ ID In certain cases, a sequence of interest can be an endog NO:265), and senescence-preferential promoters. Promoters 25 enous or exogenous sequence associated with alkaloid bio designated YP0086 (SEQ ID NO:288), YP0188 (SEQ ID synthesis. For example, a transgenic plant cell containing a NO:310), YP0263 (SEQ ID NO:314), PTO758 (SEQ ID recombinant nucleic acid encoding a regulatory protein can NO:274), PTO743 (SEQ ID NO:273), PTO829 (SEQ ID be effective for modulating the amount and/or rate of biosyn NO:275), YPO119 (SEQID NO:301), and YP0096 (SEQ ID thesis of one or more alkaloid compounds. Such effects on NO:291), as described in the above-referenced patent appli 30 alkaloid compounds typically occur via modulation of tran cations, may also be useful. Scription of one or more endogenous or exogenous sequences Other Regulatory Regions of interest operably linked to an associated regulatory region, A 5' untranslated region (UTR) can be included in nucleic e.g., endogenous sequences involved in alkaloid biosynthe acid constructs described herein. A 5' UTR is transcribed, but sis, such as native enzymes or regulatory proteins in alkaloid is not translated, and lies between the start site of the tran 35 biosynthesis pathways, or exogenous sequences involved in Script and the translation initiation codon and may include the alkaloid biosynthesis pathways introduced via a recombinant +1 nucleotide. A 3' UTR can be positioned between the trans nucleic acid construct into a plant cell. lation termination codon and the end of the transcript. UTRs In some embodiments, the coding sequence can encode a can have particular functions such as increasing mRNA sta polypeptide involved in alkaloid biosynthesis, e.g., an bility or attenuating translation. Examples of 3' UTRs 40 enzyme involved in biosynthesis of the alkaloid compounds include, but are not limited to, polyadenylation signals and described herein, or a regulatory protein (such as a transcrip transcription termination sequences, e.g., a nopaline synthase tion factor) involved in the biosynthesis pathways of the alka termination sequence. loid compounds described herein. Other components that It will be understood that more than one regulatory region may be present in a sequence of interest include introns, may be presentina recombinant polynucleotide, e.g., introns, 45 enhancers, upstream activation regions, and inducible ele enhancers, upstream activation regions, transcription termi mentS. nators, and inducible elements. Thus, more than one regula A Suitable sequence of interest can encode an enzyme tory region can be operably linked to the sequence of a poly involved intetrahydrobenzylisoquinoline alkaloid biosynthe nucleotide encoding a regulatory protein. sis, e.g., selected from the group consisting of those encoding Regulatory regions, such as promoters for endogenous 50 for tyrosine decarboxylase (YDC or TYD; EC 4.1.1.25), nor genes, can be obtained by chemical synthesis or by Subclon coclaurine synthase (EC 4.2.1.78), coclaurine N-methyl ing from a genomic DNA that includes such a regulatory transferase (EC 2.1.1.140), (R, S)-norcoclaurine 6-O-methyl region. A nucleic acid comprising Such a regulatory region transferase (NOMT, EC 2.1.1.128), S-adenosyl-L-methion can also include flanking sequences that contain restriction ine:3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase 55 1 (HMCOMT1: EC 2.1.1.116); S-adenosyl-L-methionine:3'- enzyme sites that facilitate Subsequent manipulation. hydroxy-N-methylcoclaurine 4'-O-methyltransferase 2 Sequences of Interest and Plants and Plant Cells Containing (HMCOMT2: EC 2.1.1.116); monophenol monooxygenase the Same (EC1.14.18.1), N-methylcoclaurine 3'-hydroxylase (NMCH Plant cells and plants described herein are useful because EC 1.14.13.71), (R.S)-reticuline 7-O-methyltransferase expression of a sequence of interest can be modulated to 60 (ROMT); berbamunine synthase (EC 1.14.21.3), columbam achieve a desired amount and/or specificity in expression by ine O-methyltransferase (EC 2.1.1.118), berberine bridge selecting an appropriate association of regulatory region and enzyme (BBE; (EC 1.21.3.3), reticuline oxidase (EC regulatory protein. A sequence of interest operably linked to 1.21.3.4), dehydro reticulinium ion reductase (EC 1.5.1.27), a regulatory region can encode a polypeptide or can regulate (RS)-1-benzyl-1,2,3,4-tetrahydroisoquinoline N-methyl the expression of a polypeptide. In some embodiments, a 65 transferase (EC 2.1.1.115), (S)-scoulerine oxidase (EC sequence of interest is transcribed into an anti-sense mol 1.14.21.2), (S)-cheilanthifoline oxidase (EC 1.14.21.1), (S)- ecule. In some embodiments, more than one sequence of tetrahydroprotoberberine N-methyltransferase (EC US 7,795,503 B2 39 40 2.1.1.122), (S)-canadine synthase (EC 1.14.21.5), tetrahy can be generated by deducing the disulfide bridges of F(ab') droberberine oxidase (EC 1.3.3.8), columbamine oxidase fragments. Single chain FV antibody fragments are formed by (EC 1.21.3.2), and other enzymes, such as protopine-6-mo linking the heavy and light chain fragments of the Fv region nooxygenase, related to the biosynthesis of tetrahydrobenzyl viaanamino acid bridge (e.g., 15 to 18 amino acids), resulting isoquinoline alkaloids. in a single chain polypeptide. Single chain FV antibody frag In other cases, a sequence of interest can be an enzyme ments can be produced through standard techniques, such as involved in benzophenanthridine alkaloid biosynthesis, e.g., those disclosed in U.S. Pat. No. 4,946,778. U.S. Pat. No. selected from the group consisting of those encoding for 6.303.341 discloses immunoglobulin receptors. U.S. Pat. No. dihydrobenzophenanthridine oxidase (EC 1.5.3.12), dihyd 6,417,429 discloses immunoglobulin heavy- and light-chain rosanguinarine 10-hydroxylase (EC 1.14.13.56), 10-hy 10 polypeptides. droxydihydrosanguinarine 10-O-methyltransferase (EC 2.1.1.119), dihydrochelirubine 12-hydroxylase (EC TABLE 1 1.14.13.57), 12-hydroxydihydrochelirubine 12-O-methyl transferase (EC 2.1.1.120), and other enzymes, including Human Therapeutic Proteins dihydrobenzophenanthridine oxidase and dihydrosangui 15 Bromelain Humatrope (R) Proleukin (R) narine 10-monooxygenase, related to the biosynthesis of ben Chymopapain Humulin (R) (insulin) Protropin (R) Zophenanthridine alkaloids. Papain (R) Infergen (R) Recombivax-HB (R) Activase (R) Interferon-gamma-1a Recormon (R) In yet other cases, a sequence is involved in morphinan Albutein (R) Interleukin-2 Remicade (R) (s-TNF-r) alkaloid biosynthesis, e.g., selected from the group consisting Angiotensin II Intron (R) ReoPro (R) of salutaridinol 7-O-acetyltransferase (SAT; EC 2.3.1.150), Asparaginase Leukine (R) (GM-CSF) Retavase (R) (TPA) Avonex (R) Nartogastrim (R) Roferon-A (R) salutaridine synthase (EC 1.14.21.4), salutaridine reductase Betaseron (R) Neumega (R) Pegaspargas (EC 1.1.1.248), morphine 6-dehydrogenase (EC 1.1.1.218); BioTropin (R) Neupogen (R) Prandin (R) and codeinone reductase (CR; EC 1.1.1.247); and other Cerezyme (R) Norditropin (R) Procrit (R) sequences related to the biosynthesis of morphinan/opiate Enbrel (R) (s-TNF-r) Novolin (R) (insulin) Filgastrim (R) alkaloids. 25 Engerix-B (R) Nutropin (R) Genotropin (R) Epogen (R) Oncaspar (R) Gerefor) In other embodiments, a suitable sequence encodes an Sargramostrim Tripedia (R) Trichosanthin enzyme involved in purine alkaloid (e.g., Xanthines, such as TrEHIBit (R) Venoglobin-S (R) (HIG) caffeine) biosynthesis Such as Xanthosine methyltransferase, 7-N-methylxanthine methyltransferase (theobromine syn thase), or 3,7-dimethylxanthine methyltransferase (caffeine 30 A sequence of interest can encode a polypeptide or result in synthase). a transcription product anti-sense molecule that confers In some embodiments, a suitable sequence encodes an insect resistance, bacterial disease resistance, fungal disease enzyme involved in biosynthesis of indole alkaloids com resistance, viral disease resistance, nematode disease resis pounds such as tryptophane decarboxylase, strictosidine Syn tance, herbicide resistance, enhanced grain composition or thase, strictosidine glycosidase, dehydrogeissosshizine oxi 35 quality, enhanced nutrient composition, nutrient transporter doreductase, polyneuridine aldehyde esterase, Sarpagine functions, enhanced nutrient utilization, enhanced environ bridge enzyme, Vinorine reductase, vinorine synthase, Vino mental stress tolerance, reduced mycotoxin contamination, rine hydroxylase, 17-O-acetylajmalan acetylesterase, or female sterility, a selectable marker phenotype, a screenable norajamaline N-methyl transferase. In other embodiments, a marker phenotype, a negative selectable marker phenotype, Suitable sequence of interest encodes an enzyme involved in 40 or altered plantagronomic characteristics. Specific examples biosynthesis of vinblastine, Vincristine and compounds include, without limitation, a chitinase coding sequence and a derived from them, such as tabersonine 16-hydroxylase, glucan endo-1,3-B-glucosidase coding sequence. In some 16-hydroxytabersonine 16-O-methyl transferase, desacetox embodiments, a sequence of interest encodes a bacterial yvindoline 4-hydroxylase, or desacetylvindoline O-acetyl ESPS synthase that confers resistance to glyphosate herbicide transferasesynthase. 45 or a phosphinothricinacetyltransferase coding sequence that In still other embodiments, a Suitable sequence encodes an confers resistance to phosphinothricin herbicide. enzyme involved in biosynthesis of pyridine, tropane, and/or A sequence of interest can encode a polypeptide involved pyrrolizidine alkaloids such as arginine decarboxylase, sper in the production of industrial or pharmaceutical chemicals, midine synthase, ornithine decarboxylase, putrescine N-me modified and specialty oils, enzymes, or renewable non thyl transferase, tropinone reductase, hyoscyamine 6-beta 50 foods such as fuels and plastics, vaccines and antibodies. U.S. hydroxylase, diamine oxidase, and tropinone dehydrogenase. Pat. No. 5,824,779 discloses phytase-protein-pigmenting Other Sequences of Interest concentrate derived from green plant juice. U.S. Pat. No. Other sequences of interest can encode a therapeutic 5,900.525 discloses animal feed compositions containing polypeptide for use with mammals such as humans, e.g., as phytase derived from transgenic alfalfa. U.S. Pat. No. 6,136, set forth in Table 1, below. In certain cases, a sequence of 55 320 discloses vaccines produced in transgenic plants. U.S. interest can encode an antibody or antibody fragment. An Pat. No. 6,255,562 discloses insulin. U.S. Pat. No. 5,958,745 antibody or antibody fragment includes a humanized or chi discloses the formation of copolymers of 3-hydroxybutyrate meric antibody, a single chain FV antibody fragment, an Fab and 3-hydroxy valerate. U.S. Pat. No. 5,824,798 discloses fragment, and an F(ab')2 fragment. A chimeric antibody is a starch synthases. U.S. Pat. No. 6,087.558 discloses the pro molecule in which different portions are derived from differ 60 duction of proteases in plants. U.S. Pat. No. 6,271,016 dis ent animal species, such as those having a variable region closes an anthranilate synthase gene for tryptophan overpro derived from a mouse monoclonal antibody and a human duction in plants. immunoglobulin constant region. Antibody fragments that Methods of Inhibiting Expression of a Sequence of Interest have a specific binding affinity can be generated by known The polynucleotides and recombinant vectors described techniques. Such antibody fragments include, but are not 65 herein can be used to express or inhibit expression of a gene, limited to, F(ab')2 fragments that can be produced by pepsin Such as an endogenous gene involved in alkaloid biosynthe digestion of an antibody molecule, and Fab fragments that sis, e.g., to alter alkaloid biosynthetic pathways in a plant US 7,795,503 B2 41 42 species of interest. The term “expression” refers to the pro dictated by flanking regions that form complementary base cess of converting genetic information of a polynucleotide pairs with the target mRNA. The sole requirement is that the into RNA through transcription, which is catalyzed by an target RNA contain a 5'-UG-3' nucleotide sequence. The con enzyme, RNA polymerase, and into protein, through transla struction and production of hammerhead ribozymes is known tion of mRNA on ribosomes. “Up-regulation' or “activation in the art. See, for example, U.S. Pat. No. 5.254,678 and WO refers to regulation that increases the production of expres 02/46449 and references cited therein. Hammerhead sion products (mRNA, polypeptide, or both) relative to basal ribozyme sequences can be embedded in a stable RNA such or native states, while “down-regulation” or “repression' as a transfer RNA (tRNA) to increase cleavage efficiency in refers to regulation that decreases production of expression vivo. Perriman et al., Proc. Natl. Acad. Sci. USA, 92(13): products (mRNA, polypeptide, or both) relative to basal or 10 native states. 6175-6179 (1995); de Feyter and Gaudron, Methods in “Modulated level of gene expression' as used herein refers Molecular Biology, Vol. 74, Chapter 43, “Expressing to a comparison of the level of expression of a transcript of a Ribozymes in Plants”, Edited by Turner, P.C., Humana Press gene or the amount of its corresponding polypeptide in the Inc., Totowa, N.J. RNA endoribonucleases which have been presence and absence of a regulatory protein described 15 described, such as the one that occurs naturally in Tetrahy herein, and refers to a measurable or observable change in the mena thermophila, can be useful. See, for example, U.S. Pat. level of expression of a transcript of a gene or the amount of Nos. 4,987,071 and 6,423,885. its corresponding polypeptide relative to a control plant or RNAi can also be used to inhibit the expression of a gene. plant cell under the same conditions (e.g., as measured For example, a construct can be prepared that includes a through a suitable assay such as quantitative RT-PCR, a sequence that is transcribed into an interfering RNA. Such an “northern blot,” a “western blot' or through an observable RNA can be one that can anneal to itself, e.g., a double change in phenotype, chemical profile, or metabolic profile). stranded RNA having a stem-loop structure. One strand of the A modulated level of gene expression can include up-regu Stemportion of a double stranded RNA comprises a sequence lated or down-regulated expression of a transcript of a gene or that is similar or identical to the sense coding sequence of the polypeptide relative to a control plant or plant cell under the 25 polypeptide of interest, and that is from about 10 nucleotides same conditions. Modulated expression levels can occur to about 2,500 nucleotides in length. The length of the under different environmental or developmental conditions sequence that is similar or identical to the sense coding or in different locations than those exhibited by a plant or sequence can be from 10 nucleotides to 500 nucleotides, from plant cell in its native state. 15 nucleotides to 300 nucleotides, from 20 nucleotides to 100 A number of nucleic acid based methods, including anti 30 nucleotides, or from 25 nucleotides to 100 nucleotides. The sense RNA, co-suppression, ribozyme directed RNA cleav other strand of the stem portion of a double stranded RNA age, and RNA interference (RNAi) can be used to inhibit comprises a sequence that is similar or identical to the anti protein expression in plants. Antisense technology is one sense strand of the coding sequence of the polypeptide of well-known method. In this method, a nucleic acid segment interest, and can have a length that is shorter, the same as, or from a gene to be repressed is cloned and operably linked to 35 longer than the corresponding length of the sense sequence. a promoter so that the antisense strand of RNA is transcribed. The loop portion of a double stranded RNA can be from 10 The recombinant vector is then transformed into plants, as nucleotides to 5,000 nucleotides, e.g., from 15 nucleotides to described above, and the antisense strand of RNA is pro 1,000 nucleotides, from 20 nucleotides to 500 nucleotides, or duced. The nucleic acid segment need not be the entire from 25 nucleotides to 200 nucleotides. The loop portion of sequence of the gene to be repressed, but typically will be 40 the RNA can include an intron. A construct including a Substantially complementary to at least a portion of the sense sequence that is transcribed into an interfering RNA is trans Strand of the gene to be repressed. Generally, higher homol formed into plants as described above. Methods for using ogy can be used to compensate for the use of a shorter RNAi to inhibit the expression of a gene are known to those of sequence. Typically, a sequence of at least 30 nucleotides is skill in the art. See, e.g., U.S. Pat. Nos. 5,034.323; 6,326,527: used, e.g., at least 40, 50, 80, 100, 200, 500 nucleotides or 45 6,452,067; 6,573,099: 6,753,139; and 6,777,588. See also O. WO 97/01952; WO98/53083; WO99/32619, WO 98/36083: Constructs containing operably linked nucleic acid mol and U.S. Patent Publications 20030175965, 20030175783, ecules in the sense orientation can also be used to inhibit the 20040214330, and 20030180945. expression of a gene. The transcription product can be similar In some nucleic-acid based methods for inhibition of gene or identical to the sense coding sequence of a polypeptide of 50 expression in plants, a suitable nucleic acid can be a nucleic interest. The transcription product can also be unpolyadeny acid analog. Nucleic acid analogs can be modified at the base lated, lack a 5' cap structure, or contain an unsplicable intron. moiety, Sugar moiety, or phosphate backbone to improve, for Methods of co-suppression using a full-length cDNA as well example, stability, hybridization, or solubility of the nucleic as a partial cDNA sequence are known in the art. See, e.g., acid. Modifications at the base moiety include deoxyuridine U.S. Pat. No. 5,231,020. 55 for deoxythymidine, and 5-methyl-2'-deoxycytidine and In another method, a nucleic acid can be transcribed into a 5-bromo-2'-deoxycytidine for deoxycytidine. Modifications ribozyme, or catalytic RNA, that affects expression of an of the sugar moiety include modification of the 2' hydroxyl of mRNA. (See, U.S. Pat. No. 6,423,885). Ribozymes can be the ribose sugar to form 2'-O-methyl or 2'-O-allyl sugars. The designed to specifically pair with virtually any target RNA deoxyribose phosphate backbone can be modified to produce and cleave the phosphodiesterbackbone at a specific location, 60 morpholino nucleic acids, in which each base moiety is linked thereby functionally inactivating the target RNA. Heterolo to a six-membered morpholino ring, or peptide nucleic acids, gous nucleic acids can encode ribozymes designed to cleave in which the deoxyphosphate backbone is replaced by a particular mRNA transcripts, thus preventing expression of a pseudopeptide backbone and the four bases are retained. See, polypeptide. Hammerhead ribozymes are useful for destroy for example, Summerton and Weller, 1997, Antisense Nucleic ing particular mRNAS, although various ribozymes that 65 Acid Drug Dev, 7:187-195: Hyrup et al., Bioorgan. Med. cleave mRNA at site-specific recognition sequences can be Chem., 4:5-23 (1996). In addition, the deoxyphosphate back used. Hammerhead ribozymes cleave mRNAs at locations bone can be replaced with, for example, a phosphorothioate US 7,795,503 B2 43 44 or phosphorodithioate backbone, a phosphoroamidite, or an about 1-3 days. The use of transient assays is particularly alkyl phosphotriester backbone. convenient for rapid analysis in different species, or to con firm expression of a heterologous regulatory protein whose Transgenic Plant Cells and Plants expression has not previously been confirmed in particular Provided herein are transgenic plant cells and plants com recipient cells. prising at least one recombinant nucleic acid construct or Techniques for introducing nucleic acids into monocoty exogenous nucleic acid. A recombinant nucleic acid construct ledonous and dicotyledonous plants are known in the art, and or exogenous nucleic acid can include a regulatory region as include, without limitation, Agrobacterium-mediated trans described herein, a nucleic acid encoding a regulatory protein formation, viral vector-mediated transformation, electropo as described herein, or both. In certain cases, a transgenic 10 ration and particle gun transformation, e.g., U.S. Pat. Nos. plant cellor plant comprises at least two recombinant nucleic 5,538,880, 5,204,253, 6,329,571 and 6,013,863. If a cell or acid constructs or exogenous nucleic acids, one including a tissue culture is used as the recipient tissue for transforma regulatory region, and one including a nucleic acid encoding tion, plants can be regenerated from transformed cultures if the associated regulatory protein. desired, by techniques known to those skilled in the art. See, A plant or plant cell used in methods of the invention 15 e.g., Allen et al., “RNAi-mediated replacement of morphine contains a recombinant nucleic acid construct as described with the normarcotic alkaloid reticuline in opium poppy.” herein. A plant or plant cell can be transformed by having a Nature Biotechnology 22(12): 1559-1566 (2004); Chitty et construct integrated into its genome, i.e., can be stably trans al., “Genetic transformation in commercial Tasmanian cul formed. Stably transformed cells typically retain the intro tures of opium poppy, Papaver somniferum, and movement duced nucleic acid with each cell division. A plant or plant of transgenic pollen in the field.” Funct. Plant Biol. 30:1045 cell can also be transiently transformed Such that the con struct is not integrated into its genome. Transiently trans 1058 (2003); and Park et al., J. Exp. Botany 51(347): 1005 formed cells typically lose all or some portion of the intro 1016 (2000). duced nucleic acid construct with each cell division such that Plant Species the introduced nucleic acid cannot be detected in daughter 25 The polynucleotides and vectors described herein can be cells after a sufficient number of cell divisions. Both tran used to transform a number of monocotyledonous and dicoty siently transformed and stably transformed transgenic plants ledonous plants and plant cell systems. A Suitable group of and plant cells can be useful in the methods described herein. plant species includes dicots, such as poppy, safflower, Typically, transgenic plant cells used in methods described alfalfa, Soybean, cotton, coffee, rapeseed (high erucic acid herein constitute part or all of a whole plant. Such plants can 30 and canola), or Sunflower. Also suitable are monocots such as be grown in a manner Suitable for the species under consid corn, wheat, rye, barley, oat, rice, millet, amaranth or Sor eration, eitherina growth chamber, agreenhouse, or in a field. ghum. Also suitable are vegetable crops or root crops such as Transgenic plants can be bred as desired for a particular lettuce, carrot, onion, broccoli, peas, Sweet corn, popcorn, purpose, e.g., to introduce a recombinant nucleic acid into tomato, potato, beans (including kidney beans, lima beans, other lines, to transfer a recombinant nucleic acid to other 35 dry beans, green beans) and the like. Also suitable are fruit species or for further selection of other desirable traits. Alter crops such as grape, Strawberry, pineapple, melon (e.g., natively, transgenic plants can be propagated vegetatively for watermelon, cantaloupe), peach, pear, apple, cherry, orange, those species amenable to such techniques. Progeny includes lemon, grapefruit, plum, mango, banana, and palm. descendants of a particular plant or plant line. Progeny of an Thus, the methods and compositions described herein can instant plant include seeds formed on F, F, F, F, Fs, F and 40 be utilized with dicotyledonous plants belonging to the orders Subsequent generation plants, or seeds formed on BC, BC, Magniolales, Illiciales, Laurales, Piperales, Aristochiales, BC, and Subsequent generation plants, or seeds formed on Nymphaeales, , Papeverales, Sarraceniaceae, FBC, FBC, FBC, and Subsequent generation plants. Trochodendrales, Hamamelidales, Eucomiales, Leitneriales, Seeds produced by a transgenic plant can be grown and then Myricales, Fagales, Casuarinales, Caryophyllales, Batales, selfed (or outcrossed and selfed) to obtain seeds homozygous 45 Polygonales, Plumbaginales, Dilleniales. Theales, Malvales, for the nucleic acid construct. Urticales, Lecythidales, Violales, Salicales, Capparales, Eri Transgenic plant cells growing in Suspension culture, or cales, Diapensales, Ebenales, Primulales, Rosales, Fabales, tissue or organ culture, can be useful for extraction of alkaloid Podostemales, Haloragales, Myrtales, Cornales, Proteales, compounds. For the purposes of this invention, Solid and/or Santales, Rafflesiales, Celastrales, Euphorbiales, Rhamnales, liquid tissue culture techniques can be used. When using Solid 50 , Juglandales, Geraniales, Polygalales, Umbella medium, transgenic plant cells can be placed directly onto the les, Gentianales, Polemoniales, Lamiales, Plantaginales, medium or can be placed onto a filter film that is then placed Scrophulariales, Campanulales, Rubiales, Dipsacales, and in contact with the medium. When using liquid medium, Asterales. Methods described herein can also be utilized with transgenic plant cells can be placed onto a floatation device, monocotyledonous plants belonging to the orders Alis e.g., a porous membrane that contacts the liquid medium. 55 matales, Hydrocharitales, Najadales, Triuridales, Commeli Solid medium typically is made from liquid medium by add males, Eriocaulales, Restionales, Poales, Juncales, Cyperales, ing agar. For example, a solid medium can be Murashige and Typhales, Bromeliales, Zingiberales, Arecales, Cyclanthales, Skoog (MS) medium containing agar and a suitable concen Pandanales, Arales, Lilliales, and Orchidales, or with plants tration of an auxin, e.g., 2,4-dichlorophenoxyacetic acid (2,4- belonging to Gymnospermae, e.g., Pinales, Ginkgoales, D), and a Suitable concentration of a cytokinin, e.g., kinetin. 60 Cycadales and Gnetales. When transiently transformed plant cells are used, a The invention has use over a broad range of plant species, reporter sequence encoding a reporter polypeptide having a including species from the genera Allium, Alseodaphne, reporter activity can be included in the transformation proce Anacardium, Arachis, Asparagus, Atropa, Avena, dure and an assay for reporter activity or expression can be Beilschmiedia, Brassica, Citrus, Citrullus, Capsicum, Catha performed at a suitable time after transformation. A suitable 65 ranthus, Carthamus, Cocculus, Cocos, Coffea, Croton, Cucu time for conducting the assay typically is about 1-21 days mis, Cucurbita, Daucus, Duguetia, Elaeis, Eschscholzia, after transformation, e.g., about 1-14 days, about 1-7 days, or Ficus, Fragaria, Glaucium, Glycine, Gossypium, Helianthus, US 7,795,503 B2 45 46 Heterocallis, Hevea, Hordeum, Hyoscyamus, Lactuca, to a control plant or cell not transgenic for the particular Landolphia, Linum, Litsea, Lolium, Lupinus, Lycopersicon, regulatory protein using the methods described herein. In Malus, Manihot, Majorana, Medicago, Musa, Nicotiana, certain cases, therefore, more than one alkaloid compound Olea, Oryza, Panicum, Pannesetum, Papaver, Parthenium, (e.g., two, three, four, five, six, seven, eight, nine, ten or even Persea, Phaseolus, Pinus, Pistachia, Pisum, Pyrus, Prunus, more alkaloid compounds) can have its amount modulated Raphanus, Rhizocarya, Ricinus, Secale, Senecio, Sinome relative to a control plant or cell that is not transgenic for a nium, Sinapis, Solanum, Sorghum, Stephania, Theobroma, regulatory protein described herein. Trigonella, Triticum, Vicia, Vinca, Vitis, Vigna, and Zea. Alkaloid compounds can be grouped into classes based on Particularly suitable plants with which to practice the chemical and structural features. Alkaloid classes described invention include plants that are capable of producing one or 10 herein include, without limitation, tetrahydrobenzyliso more alkaloids. A plant that is capable of producing one or quinoline alkaloids, morphinan alkaloids, benzophenanthri more alkaloids' refers to a plant that is capable of producing dine alkaloids, monoterpenoid indole alkaloids, bisbenzyl one or more alkaloids even when it is not transgenic for a isoquinoline alkaloids, pyridine alkaloids, purine alkaloids, regulatory protein described herein. For example, a plant tropane alkaloids, quinoline alkaloids, terpenoid alkaloids, from the Solanaceae or Papaveraceae family is capable of 15 betaine alkaloids, Steroid alkaloids, acridone alkaloids, and producing one or more alkaloids when it is not transgenic for phenethylamine alkaloids. Other classifications may be a regulatory protein described herein. In certain cases, a plant known to those having ordinary skill in the art. Alkaloid or plant cell may be transgenic for sequences other than the compounds whose amounts are modulated relative to a con regulatory protein sequences described herein, e.g., growth trol plant can be from the same alkaloid class or from different factors or stress modulators, and can still be characterized as alkaloid classes. “capable of producing one or more alkaloids, e.g., a Solan In certain embodiments, a morphinan alkaloid compound aceae family member transgenic for a growth factor but not that is modulated is salutaridine, salutaridinol, salutaridinol transgenic for a regulatory protein described herein. acetate, thebaine, isothebaine, papaverine, narcotine, narce Useful plant families that are capable of producing one or ine, hydrastine, oripavine, morphinone, morphine, codeine, more alkaloids include the Papaveraceae, Berberidaceae, 25 codeinone, and neopinone. Other morphinan analog alkaloid Lauraceae, Menispermaceae, Euphorbiaceae, Leguminosae, compounds of interest include sinomenine, flavinine, ore Boraginaceae, Apocynaceae, Asclepiadaceae, Liliaceae, obeiline, and Zipperine. Gnetaceae, Erythroxylaceae, Convolvulaceae, Ranuncu In other embodiments, a tetrahydrobenzylisoquinoline laeceae, Rubiaceae, Solanaceae, and Rutaceae families. The alkaloid compound that is modulated is 2'-norberbamunine, Papaveraceae family, for example, contains about 250 species 30 S-coclaurine, S-norcoclaurine, R-N-methyl-coclaurine, S-N- found mainly in the northern temperate regions of the world methylcoclaurine, S-3'-hydroxy-N-methylcoclaurine, aro and includes plants such as California poppy and Opium marine, S-3-hydroxycoclaurine, S-norreticuline, R-norreti poppy. Useful genera within the Papaveraceae family include culine, S-reticuline, R-reticuline, S-scoulerine, the Papaver (e.g., Papaver bracteatum, Papaver Orientate, S-cheilanthifoline, S-stylopine, S-cis-N-methyl-stylopine, Papa versetigerum, and Papaver SOmniferum), Sanguinaria, 35 protopine, 6-hydroxy-protopine, 1,2-dehydro-reticuline, Dendromecon, Glaucium, Meconopsis, Chelidonium, S-tetrahydrocolumbamine, columbamine, palmatine, tet Eschscholzioideae (e.g., Eschscholzia, Eschscholzia Califor rahydropalmatine, S-canadine, berberine, noscapine, S-nor nia), and (e.g., Argemone hispida, Argemone mexi laudenosoline, 6-O-methylnorlaudanosoline, and nororienta cana, and Argemone munita) genera. Other alkaloid produc line. ing species with which to practice this invention include 40 In some embodiments, a benzophenanthridine alkaloid Croton salutaris, Croton balsamifera, Sinomenium acutum, compound can be modulated, which can be dihydrosangui Stephania cepharantha, Stephania Zippeliana, Litsea seb narine, sanguinarine, dihydroxy-dihydro-sanguinarine, iferea, Alseodaphne perakensis, Cocculus laurifolius, Dug 12-hydroxy-dihydrochelirubine, 10-hydroxy-dihydro-san uetia obovata, Rhizocarya racemifera, and Beilschmiedia guinarine, dihydro-macarpine, dihydro-chelirubine, dihydro Oreophila, or other species listed in Table 2, below. 45 sanguinarine, chelirubine, 12-hydroxy-chelirubine, or mac arpine. Alkaloid Compounds In yet other embodiments, monoterpenoid indole alkaloid Compositions and methods described herein are useful for compounds that are modulated include vinblastine, Vincris producing one or more alkaloid compounds. Alkaloid com tine, yohimbine, ajmalicine, ajmaline, and Vincamine. In pounds are nitrogenous organic molecules that are typically 50 other cases, a pyridine alkaloid is modulated. A pyridine derived from plants. Alkaloid biosynthetic pathways often alkaloid can be piperine, coniine, trigonelline, arecaidine, includeamino acids as reactants. Alkaloid compounds can be guvacine, pilocarpine, cytosine, nicotine, and sparteine. A mono-, bi-, or polycyclic compounds. Bi- or poly-cyclic com tropane alkaloid that can be modulated includes atropine, pounds can include bridged structures or fused rings. In cer cocaine, tropacocaine, hygrine, ecgonine, (-) hyoscyamine, tain cases, an alkaloid compound can be a plant secondary 55 (-) scopolamine, and pelletierine. A quinoline alkaloid that is metabolite. modulated can be quinine, Strychnine, brucine, Veratrine, or The regulatory proteins described previously can modulate cevadine. Acronycine is an example of an acridone alkaloid. transcription of sequences involved in the biosynthesis of In some cases, a phenylethylamine alkaloid can be modu alkaloid compounds. Thus, a transgenic plant or cell compris lated, which can be MDMA, methamphetamine, mescaline, ing a recombinant nucleic acid expressing Such a regulatory 60 and ephedrine. In other cases, a purine alkaloid is modulated, protein can be effective for modulating the amount and/or rate Such as the Xanthines caffeine, theobromine, theacrine, and of biosynthesis of one or more of Such alkaloids in a plant theophylline. containing the associated regulatory region, either as a Bisbenzylisoquinoline alkaloids that can be modulated in genomic sequence or introduced in a recombinant nucleic amount include (+)tubocurarine, dehatrine, (+)thalicarpine, acid construct. 65 aromoline, guatteguamerine, berbamunine, and isotetradine. An amount of one or more of any individual alkaloid com Yet another alkaloid compound that can be modulated in pound can be modulated, e.g., increased or decreased, relative amount is 3,4-dihydroxyphenylacetaldehyde. US 7,795,503 B2 47 48 Certain useful alkaloid compounds, with associated plant species that are capable of producing them, are listed in Table TABLE 2-continued 2, below. Alkaloid Compound Table TABLE 2 Alkaloid Name Plant Source(s) Alkaloid Compound Table Physostigmine Physostigma venenostin Pilocarpine microphyllus, Philocarpus spp. Alkaloid Name Plant Source(s) Oxandrin Pseudoxandra iticida Sarpagine Rati wolfia & Vinca spp. Apomorphine Papaver somniferum 10 Deserpidine Rati wolfia canescens, Rauwolfia spp. Hemsleyadine Aconii in hemsleyanum, Hemsleya anabilis Rescinnamine Rati wolfia spp. Anabasine Anabasis sphylia Reserpine Rati wolfia Serpentina, Rauwolfia spp. Aconitine Aconium spp. Ajmaline Rati wolfia Serpentina, Rauwolfia spp., Melodinus Anisodamine Anisodus tangiiticits balansae, Tonduzia longifolia Anisodine Dattira sangutinera Ajmalicine Rati wolfia spp., Vinca rosea Arecoline Areca Catechni 15 Sanguinarine Sanguinaria Canadensis, Eschscholtzia Atropine Atropa belladonna, Datura Stomonium Californica Homatropine Atropa belladonna Matrine Sophora spp. Berberine Berberis spp. and Mahonia spp. Tetrandrine Stephania tetrandra Caffeine Camelia sinensis, Theobroma cacao, Coffea Strychnine Strychnos nux-vonica, Strychnos spp. arabica, Cola spp. Brucine Strychnos spp. Camptothecin Camptotheca actiminata Protoveratrines A, B Veratrum spp. Orothecin Camptotheca actiminata Cyclopamine Vertatrum spp. 9-amino camptothecin Camptotheca actiminata Veratramine Veratrum spp. Topotecan Camptotheca actiminata Vasicine Vinca minor, Gaiega officinalis Irinotecan Camptotheca actiminata Vindesline Vinca rosea Castanospermine Castanosperma australe, Alexa spp. Vincamine Vinca spp. Vinblastine Catharanthus roseus Buprenorphine Papaver somniferum Vincristine Catharanthus roseus 25 Cimetropium Bromide Atropa, Datura, Scopolia, Hyoscyamits spp. Vinorelbine Catharanthus roseus Levallorphan Papaver somniferum Emetime Alangium iamarkii, Cephaelis ipecacuanha, Serpentine Rati wolfvia spp. and Catharanthus spp. Psychotria spp. NoScapine Papaver somniferum Homoharringtonine Cephalotaxi is spp. Scopolamine Atropa, Datura, Scopoia, Hyoscyamits spp. Harringtonine Cephalotaxi is spp. Salutaridine Croton salutaris, Croton balsamifera, Papaver Tubocurarine Chondodendron to mentosum 30 spp. and Giaticium spp. Quinine Cinchona officinalis, Cinchona spp., Remijia Sinomenine Sniomenium actitum and Stephania cepharantha pedunculata Flavinine Litsea sebiferea, Alseodaphne periakensis, Quinidine Cinchona spp., Remiia pedunculata Coccitius lat-trifolius, Duguetia obovata and Cissampareine Cissanpeios pareira Rhizocarya racemifera Cabergoline Claviceps pupi trea Oreobeiline Beilschmiedia oreophila Colchicine Coichicum autumnaie 35 Zippeline Stephania zippeiiana Demecolcine Colchicum spp., Merendera spp. Palmatine Copiisiaponica, Berberis spp., Mahonia spp. Tetrahydropalmatine Copiisiaponica, Berberis spp., Mahonia spp. Monocrotaline Crotaiaria spp. The amount of one or more alkaloid compounds can be Sparteine Cytistis scoparitis, Sophora pschycarpa, increased or decreased in transgenic cells or tissues express Ammodendron spp. ing a regulatory protein as described herein. An increase can Changrolin Dichroa febrifuga 40 Ephedrine Ephedra Sinica, Ephedra spp. be from about 1.5-fold to about 300-fold, or about 2-fold to Cocaine Erythroxylum coca about 22-fold, or about 50-fold to about 200-fold, or about Rotundine Eschsholtzia californica, Stephania sinica, 75-fold to about 130-fold, or about 5-fold to about 50-fold, or Eschsholtzia spp., Argemone spp. about 5-fold to about 10-fold, or about 10-fold to about Galanthamine Gaianthus wonorii 20-fold, or about 150-fold to about 200-fold, or about 20-fold Gelsemin Geisemium sempervivens 45 Glaucine Glaucium flavum, Berberis spp. and to about 75-fold, or about 10-fold to about 100-fold, or about Mahonia spp. 40-fold to about 150-fold, about 100-fold to about 200-fold, Indicine Heliotropium indicum & Messerschmidia about 150-fold to about 300-fold, or about 30-fold to about argentea 50-fold higher than the amount in corresponding control cells Hydrastine Hydrastis Canadensis Hyoscyamine Hyoscyamits, Atropa, Datura, Scopoia spp. 50 or tissues that lack the recombinant nucleic acid encoding the a-Lobeline Lobelia spp. regulatory protein. Huperzine A Lycopodium Serratum (= Hiperzia Serrata), Lycopodium spp. In other embodiments, the alkaloid compound that is Ecteinascidin 743 Marine tunicate - Ecteinascidia turbinata increased in transgenic cells or tissues expressing a regulatory Nicotine Nicotiana tabacum protein as described herein is either not produced or is not Ellipticine Ochrosia spp., Aspidospera Subincantin, 55 detectable in corresponding control cells or tissues that lack Bleekeria viiiensis the recombinant nucleic acid encoding the regulatory protein. 9-Methoxyellipticine Ochrosia spp., Excavatia coccinea, Bleekeria viiiensis Thus, in Such embodiments, the increase in Such an alkaloid Codeine Papaver somniferum compound is infinitely high as compared to corresponding Hydrocodone Papaver somniferum control cells or tissues that lack the recombinant nucleic acid Hydromorphone Papaver somniferum encoding the regulatory protein. For example, in certain Morphine Papaver somniferum 60 Narceline Papaver somniferum cases, a regulatory protein described herein may activate a Oxycodone Papaver somniferum biosynthetic pathway in a plant that is not normally activated Oxymorphone Papaver somniferum or operational in a control plant, and one or more new alka Papaverine Papaversonniferum, Rauwolfia Serpentina loids that were not previously produced in that plant species Thebaine Papaver bracteatin, Papaver spp. Yohimbine Pausinystalia yohimbe, Rauwolfia, Vinca, & 65 can be produced. Catharanthus spp. The increase in amount of one or more alkaloids can be restricted in Some embodiments to particular tissues and/or US 7,795,503 B2 49 50 organs, relative to other tissues and/or organs. For example, a the reporter. If reporter expression or activity is observed, it transgenic plant can have an increased amount of an alkaloid can be concluded that the regulatory protein increases tran in leaf tissue relative to root or floral tissue. Scription of the reporter coding sequence, Such as by binding In other embodiments, the amounts of one or more alka the regulatory region. A positive result indicates that expres loids are decreased in transgenic cells or tissues expressing a sion of the regulatory protein being tested in a plant would be regulatory protein as described herein. A decrease ratio can be effective for increasing the in planta amount and/or rate of expressed as the ratio of the alkaloid in Such a transgenic cell biosynthesis of one or more sequences of interest operably or tissue on a weight basis (e.g., fresh or freeze dried weight linked to the associated regulatory region. basis) as compared to the alkaloid in a corresponding control Similarly, a method of determining whether or not a regu cellor tissue that lacks the recombinant nucleic acid encoding 10 latory region is activated by a regulatory protein can include the regulatory protein. The decrease ratio can be from about determining whether or not reporter activity is detected in a 0.05 to about 0.90. In certain cases, the ratio can be from plant cell transformed with a recombinant nucleic acid con about 0.2 to about 0.6, or from about 0.4 to about 0.6, or from struct comprising a regulatory region as described herein about 0.3 to about 0.5, or from about 0.2 to about 0.4. operably linked to a reporter nucleic acid, and with a recom In certain embodiments, the alkaloid compound that is 15 binant nucleic acid construct comprising a nucleic acid decreased in transgenic cells or tissues expressing a regula encoding a test regulatory protein. Detection of reporter tory protein as described herein is decreased to an undetect activity indicates that the regulatory region is activated by the able level as compared to the level in corresponding control test regulatory protein. In certain cases, the regulatory protein cells or tissues that lack the recombinant nucleic acid encod is a regulatory protein as described herein, e.g., comprising a ing the regulatory protein. Thus, in Such embodiments, the polypeptide sequence having 80% or greater sequence iden decrease ratio in Such an alkaloid compound is Zero. tity to a polypeptide sequence set forth in any of SEQ ID The decrease in amount of one or more alkaloids can be NOs:2-5, SEQ ID NOs:7-16, SEQ ID NOs: 18-33, SEQ ID restricted in Some embodiments to particular tissues and/or NOs:35-42, SEQID NOs:44-61, SEQID NOs:63-70, SEQ organs, relative to other tissues and/or organs. For example, a ID NOs:72-79, SEQ ID NOs:81-86, SEQ ID NOs:88-99, transgenic plant can have a decreased amount of an alkaloid 25 SEQID NOs: 101-113, SEQID NOs: 115-122, SEQID NOs: in leaf tissue relative to root or floral tissue. 124-136, SEQID NOs: 138-150, SEQID NOs: 152-156, SEQ In some embodiments, the amounts of two or more alka ID NOs:158-167, SEQID NOs: 169-175, SEQID NOs: 177 loids are increased and/or decreased, e.g., the amounts of two, 188, SEQID NOS:190-191, SEQID NOs: 193-194, SEQID three, four, five, six, seven, eight, nine, ten (or more) alkaloid NO:196, SEQ ID NO:198, SEQ ID NOs:200-204, SEQ ID compounds are independently increased and/or decreased. 30 NOs:206-215, SEQ ID NO:217, SEQ ID NO:219, SEQ ID The amount of an alkaloid compound can be determined by NOs:221-236, or a consensus sequence set forth in FIGS. known techniques, e.g., by extraction of alkaloid compounds 1-22. followed by gas chromatography-mass spectrometry (GC A transformation can be a transient transformation or a MS) or liquid chromatography-mass spectrometry (LC-MS). stable transformation, as discussed previously. The regula If desired, the structure of the alkaloid compound can be 35 tory region and the nucleic acid encoding a test regulatory confirmed by GC-MS, LC-MS, nuclear magnetic resonance protein can be on the same or different nucleic acid con and/or other known techniques. StructS. A reporteractivity, such as an enzymatic or optical activity, Methods of Screening for Associations and Modulating can permit the detection of the presence of the reporter Expression of Sequences of Interest 40 polypeptide in situ or in vivo, either directly or indirectly. For Provided herein are methods of screening for novel regu example, a reporter polypeptide can itself be bioluminescent latory region-regulatory protein association pairs. The upon exposure to light. As an alternative, a reporter polypep described methods can thus determine whether or not a given tide can catalyze a chemical reaction in vivo that yields a regulatory protein can activate a given regulatory region (e.g., detectable product that is localized inside or that is associated to modulate expression of a sequence of interest operably 45 with a cell that expresses the chimeric polypeptide. Exem linked to the given regulatory region). plary bioluminescent reporter polypeptides that emit light in A method of determining whether or not a regulatory the presence of additional polypeptides, Substrates or cofac region is activated by a regulatory protein can include deter tors include firefly luciferase and bacterial luciferase. Biolu mining whether or not reporter activity is detected in a plant minescent reporter polypeptides that fluoresce in the absence cell transformed with a recombinant nucleic acid construct 50 of additional proteins, Substrates or cofactors when exposed comprising a test regulatory region operably linked to a to light having a wavelength in the range of 300 nm to 600 nm nucleic acid encoding a polypeptide having the reporteractiv include, for example: amFP486, Mut 15-amFP486, Mut32 ity and with a recombinant nucleic acid construct comprising amFP486, CNFP-MODCd1 and CNFP-MODCd2; asFP600, a nucleic acid encoding a regulatory protein described herein. mut1-RNFP, NE-RNFP, d1RNFP and d2RNFP; cFP484, Detection of the reporter activity indicates that the test regu 55 A 19-cFP484 and A38-cFP484: dgFP512; dmFP592: latory region is activated by the regulatory protein. In certain drFP583, E5 drFP583, E8 drEP583, E5UP drFP583, E5down cases, the regulatory region is a regulatory region as described drFP583, E57 drEP583, AG4 drPP583 and AG4H drFP583; herein, e.g., comprising a nucleic acid sequence having 80% drFP583/dmFP592, drFP583/dmFP592-2G and drFP583/ or greater sequence identity to a regulatory region as set forth dmFP592-Q3;dsFP483; ZFP506, N65M-ZFP506, d1zFP506 in SEQID NOs:237-252. 60 and d2ZFP506; ZFP538, M128V-ZFP538, YNFPM128V For example, a plant can be made that is stably transformed MODCd1 and YNFPM128V-MODCd2; GFP; EGFP, ECFP, with a sequence encoding a reporter operably linked to the EYFP, EBFP, BFP2; d4EGFP, d2EGFP, and d1EGFP; and regulatory region under investigation. The plant is inoculated DsRed and DsRed 1. See WO 00/34318: WO 00/34320; WO with Agrobacterium containing a sequence encoding a regu 00/34319; WO 00/34321; WO00/34322; WO 00/34323; WO latory protein on a Tiplasmid vector. A few days after inocu 65 00/34324; WO 00/34325; WO 00/34326; GenBank Acces lation, the plant tissue is examined for expression of the sion No. AAB57606: Clontech User Manual, April 1999, reporter, or for detection of reporter activity associated with PT2040-1, version PR94845; Li et al., J. Biol. Chem. 1998, US 7,795,503 B2 51 52 273:34970-5; U.S. Pat. No. 5,777,079; and Clontech User acid construct comprising a nucleic acid encoding a regula Manual, October 1999, PT34040-1, version PR9x217. tory protein as described herein, where the regulatory region Reporter polypeptides that catalyze a chemical reaction that and the regulatory protein are associated as indicated in Table yields a detectable product include, for example, B-galactosi 4 (under Example 5 below) and as described herein. Accord dase or B-glucuronidase. Other reporter enzymatic activities ingly, an orthologous sequence and a polypeptide corre for use in the invention include neomycin phosphotransferase sponding to the consensus sequence of a given regulatory activity and phosphinotricin acetyl transferase activity. protein would also be considered to be associated with the In some cases, it is known that a particular transcription regulatory region shown in Table 4 (under Example 5 below) factor can activate transcription from a particular alkaloid to be associated with the given regulatory protein. A method regulatory region(s), e.g., a regulatory region involved in 10 for expressing an endogenous sequence of interest can alkaloid biosynthesis. In these cases, similar methods can include growing Such a plant cell under conditions effective also be useful to Screen other regulatory regions, such as other for the expression of the regulatory protein. An endogenous regulatory regions involved in alkaloid biosynthesis, to deter sequence of interest can in certain cases be a nucleic acid mine whether they are activated by the same transcription encoding a polypeptide involved in alkaloid biosynthesis, factor. Thus, the method can comprise transforming a plant 15 Such as an alkaloid biosynthesis enzyme or a regulatory pro cell with a nucleic acid comprising a test regulatory region tein involved in alkaloid biosynthesis. operably linked to a nucleic acid encoding a polypeptide In other cases, knowledge of an associated regulatory having reporter activity. The plant cell can include a recom region-regulatory protein pair can be used to modulate binant nucleic acid encoding a regulatory protein operably expression of exogenous sequences of interest by endog linked to a regulatory region that drives transcription of the enous regulatory proteins. Such a method can include trans regulatory protein in the cell. If reporter activity is detected, it forming a plant cell that includes a nucleic acid encoding a can be concluded that the regulatory protein activates tran regulatory protein as described herein, with a recombinant Scription mediated by the test regulatory region. nucleic acid construct comprising a regulatory region Provided herein also are methods to modulate expression described herein, where the regulatory region is operably of sequences of interest. Modulation of expression can be 25 linked to a sequence of interest, and where the regulatory expression itself, an increase in expression, or a decrease in region and the regulatory protein are associated as shown in expression. Such a method can involve transforming a plant Table 4 (under Example 5 below) and described herein. A cell with, or growing a plant cell comprising, at least one method of expressing a sequence of interest can include recombinant nucleic acid construct. A recombinant nucleic growing such a plant cell under conditions effective for the acid construct can include a regulatory region as described 30 expression of the endogenous regulatory protein. above, e.g., comprising a nucleic acid having 80% or greater Also provided are methods for producing one or more sequence identity to a regulatory region set forth in SEQID alkaloids. Such a method can include growing a plant cell that NOs:237-252, where the regulatory region is operably linked includes a nucleic acid encoding an exogenous regulatory to a nucleic acid encoding a sequence of interest. In some protein as described herein and an endogenous regulatory cases, a recombinant nucleic acid construct can further 35 region as described herein operably linked to a sequence of include a nucleic acid encoding a regulatory protein as interest. The regulatory protein and regulatory region are described above, e.g., comprising a polypeptide sequence associated, as described previously. A sequence of interest having 80% or greater sequence identity to a polypeptide can encode a polypeptide involved in alkaloid biosynthesis. A sequence set forth in any of SEQID NOS:2-5, SEQID NOs: plant cell can be from a plant capable of producing one or 7-16, SEQID NOs: 18-33, SEQID NOS:35-42, SEQID NOs: 40 more alkaloids. The plant cell can be grown under conditions 44-61, SEQ ID NOs: 63-70, SEQ ID NOs:72-79, SEQ ID effective for the expression of the regulatory protein. The one NOs:81-86, SEQID NOs:88-99, SEQID NOs: 101-113, SEQ or more alkaloids produced can be novel alkaloids, e.g., not ID NOs: 115-122, SEQID NOs: 124-136, SEQID NOs: 138 normally produced in a wild-type plant cell. 150, SEQID NOs: 152-156, SEQ ID NOs: 158-167, SEQID Alternatively, a method for producing one or more alka NOs: 169-175, SEQID NOs: 177-188, SEQID NOs:190-191, 45 loids can include growing a plant cell that includes a nucleic SEQ ID NOs: 193-194, SEQ ID NO:196, SEQ ID NO.198, acid encoding an endogenous regulatory protein as described SEQ ID NOS:200-204, SEQ ID NOS:206-215, SEQ ID herein and a nucleic acid including an exogenous regulatory NO:217, SEQID NO:219, SEQID NOS:221-236, or a con region as described herein operably linked to a sequence of sensus sequence set forth in FIGS. 1-22. In other cases, the interest. A sequence of interest can encode a polypeptide nucleic acid encoding the described regulatory protein is 50 involved in alkaloid biosynthesis. A plant cell can be grown contained on a second recombinant nucleic acid construct. In under conditions effective for the expression of the regulatory either case, the regulatory region and the regulatory protein protein. The one or more alkaloids produced can be novel are associated, e.g., as shown in Table 4 (under Example 5 alkaloids, e.g., not normally produced in a wild-type plant below) or as described herein (e.g., all orthologs of a regula cell. tory protein are also considered to associate with the regula 55 Provided herein also are methods for modulating (e.g., tory regions shown to associate with a given regulatory pro altering, increasing, or decreasing) the amounts of one or tein in Table 4, under Example 5 below). A plant cell is more alkaloids in a plant cell. The method can include grow typically grown under conditions effective for the expression ing a plant cell as described above, e.g., a plant cell that of the regulatory protein. includes a nucleic acid encoding an endogenous or exog As will be recognized by those having ordinary skill in the 60 enous regulatory protein, where the regulatory protein asso art, knowledge of an associated regulatory region-regulatory ciates with, respectively, an exogenous or endogenous regu protein pair can also be used to modulate expression of latory region operably linked to a sequence of interest. In Such endogenous sequences of interest that are operably linked to cases, a sequence of interest can encode a polypeptide endogenous regulatory regions. In Such cases, a method of involved in alkaloid biosynthesis. Alternatively, a sequence of modulating expression of a sequence of interest includes 65 interest can result in a transcription product such as an anti transforming a plant cell that includes an endogenous regu sense RNA or interfering RNA that affects alkaloid biosyn latory region as described herein, with a recombinant nucleic thesis pathways, e.g., by modulating the steady-state level of US 7,795,503 B2 53 54 mRNA transcripts available for translation that encode one or Plants were infected by mildly wounding the surface of a more alkaloid biosynthesis enzymes. leaf using the tip of a Syringefneedle containing a suspension The invention will be further described in the following of one of the Agrobacterium transformants. A small droplet of examples, which do not limit the scope of the invention the Agrobacterium Suspension was placed on the wound area described in the claims. 5 after wounding. Each leaf was wounded approximately 10 times at different positions on the same leaf. Each leaf was EXAMPLES wounded using one Agrobacterium transformant. The syringe needle preferably did not pierce through the leaf to Example 1 increase the likelihood of Agrobacterium infection on the 10 wounded site. Treated leaves were left attached to the mother Generation of Arabidopsis Plants Containing plant for at least 5 days prior to analysis. Alkaloid Regulatory Region::Luciferase Constructs Example 3 T-DNA binary vector constructs were made using standard Screening of Regulatory Proteins in Nicotiana molecular biology techniques. A set of constructs were made 15 that contained a luciferase coding sequence operably linked Stable Nicotiana tabacum screening lines, cultivar Sam to one or two of the regulatory regions set forth in SEQ ID Sun, were generated by transforming Nicotiana leaf explants NO:237, SEQID NOS:239-247, SEQID NOs:249-250, and with the T-DNA binary vector containing regulatory region SEQ ID NO:252. Each of these constructs also contained a and luciferase reporter construct as described in Example 1, marker gene conferring resistance to the herbicide Finale R. following the transformation protocol essentially described Each construct was introduced into Arabidopsis ecotype by Rogers, S. G. et al., Methods in Enzymology 118:627 Wassilewskija (WS) by the floral dip method essentially as (1987). Leaf disks were cut from leaves of the screening lines described in Bechtold et al., C.R. Acad. Sci. Paris, 316:1194 using a paper puncher and were transiently infected with 1199 (1993). The presence of each reporter region:luciferase Agrobacterium clones prepared as described in Example 2. In construct was verified by PCR. At least two independent 25 addition, leaf disks from wild-type Nicotiana tabacum plants, events from each transformation were selected for further cultivar SR1, were transiently infected with Agrobacterium study; these events were referred to as Arabidopsis thaliana containing a binary vector comprising a CaMV 35S consti screening lines. T (first generation transformant) seeds were tutive promoter operably linked to a luciferase reporter cod germinated and allowed to self-pollinate. T. (second genera ing sequence. These leaf disks were used as positive controls tion, progeny of self-pollinated T plants) seeds were col 30 to indicate that the method of Agrobacterium infection was lected and a portion were germinated and allowed to self working. Some leaf disks from Nicotiana screening plants pollinate. T (third generation, progeny of self-pollinated T. were transiently infected with Agrobacterium containing a plants) seeds were collected. binary construct of a CaMV S constitutive promoter oper ably linked to a GFP coding sequence. These leaf disks served Example 2 35 as reference controls to indicate that the luciferase reporter activity in the treated disks was not merely a response to Screening of Regulatory Proteins in Arabidopsis treatment with Agrobacterium. Transient infection was performed by immersing the leaf T or T. Seeds of the Arabidopsis thaliana Screening lines 40 disks in about 5 to 10 mL of a suspension of Agrobacterium described in Example 1 were planted in Soil comprising Sun culture, prepared as described in Example 2, for about 2 min. shine LP5 Mix and Thermorock Vermiculite Medium #3 at a Treated leaf disks were briefly and quickly blot-dried in tissue ratio of 60:40, respectively. The seeds were stratified at 4°C. paper and then transferred to a plate lined with paper towels for approximately two to three days. After stratification, the sufficiently wet with 1x MS solution (adjusted to pH 5.7 with seeds were transferred to the greenhouse and covered with a 45 1N KOH and supplemented with 1 mg/L BAP and 0.25 mg/L plastic dome and tarp until most of the seeds had germinated. NAA). The leaf disks were incubated in a growth chamber Plants were grown under long day conditions. Approximately under long-day light/dark cycle at 22°C. for 5 days prior to seven to ten days post-germination, plants were sprayed with analysis. Finale Rherbicide to confirm that the plants were transgenic. Example 4 Between three to four weeks after germination, the plants 50 were used for screening. Co-infection Experiments in Nicotiana T-DNA binary vector constructs comprising a CaMVS constitutive promoter operably linked to one of the regulatory In some cases, a mixture of two different Agrobacterium protein coding sequences listed in Table 4 (under Example 5 cultures were used in transient co-infection experiments in below) were made and transformed into Agrobacterium. One 55 wild-type Nicotiana plants. One of the Agrobacterium cul colony from each transformation was selected and main tures contained a vector comprising a regulatory region of tained as a glycerol stock. Two days before the experiment interest operably linked to a luciferase reporter gene, and the commenced, each transformant was inoculated into 150 uL of other contained a vector that included the CaMV 35S consti YEB broth containing 100 g/mL spectinomycin, 50 ug/mL tutive promoter operably linked to a nucleotide sequence that rifampicin, and 20 uMacetosyringone; grown in an incuba 60 coded for a regulatory factor of interest. The Agrobacterium tor-shaker at 28°C.; and harvested by centrifugation at 4,000 culture and Suspension were prepared as described in rpm for at least 25 minutes. The Supernatant was discarded, Example 2. The two different Agrobacterium suspensions and each pellet was resuspended in a solution of 10 mM were mixed to a final optical density (ODoo) of approxi MgCl; 10 mM MES, pH 5.7; and 150 uMacetosyringone to mately 0.1 to 0.5. The mixture was loaded into a 1 mL syringe an optical density (OD) of approximately 0.05 to 0.1. Each 65 with a 30 gauge needle. Suspension was transferred to a 1 mL Syringe outfitted with a Depending on the size of a Nicotiana leaf, it can be divided 30 gauge needle. arbitrarily into several sectors, with each sector accommodat US 7,795,503 B2 55 56 ing one type of Agrobacterium mixture. Transient infection of luminescence signal was higher in the treated leaf than in the a wild-type tobacco leaf sector was done by mildly wounding 35S-GFP-treated reference control (considered the back the Surface of a leaf using the tip of a syringefneedle contain ground activity of the regulatory region), and (2) if the #1 ing a mixture of Agrobacterium culture Suspensions. A Small criterion occurred in at least two independent transformation droplet of the Agrobacterium Suspension was placed on the events carrying the regulatory region-luciferase reporter con wound area after wounding. Each leaf sector was wounded struct. Results of the visual inspection were noted according approximately 20 times at different positions within the same to the rating system given in Table 3, and with respect to both leaf sector. Treated Nicotiana leaves were left intact and the positive and negative controls. attached to the mother plant for at least 5 days prior to analy sis. A leaf sector treated with Agrobacterium that contained a 10 TABLE 3 binary construct including a CaMV 35 S constitutive pro moter operably linked to a GFP coding sequence was used as Luciferase activity Scoring System a reference control. Score Score Comment

15 ++ signal in the treated leaf is much stronger than in reference Example 5 background + signal in the treated leaf is stronger than in reference background Luciferase Assay and Results weak signal but still relatively higher than reference background - no response Treated intact leaves from Examples 2 and 4, and leaf disks from Example 3, were collected five days after infection and Alkaloid regulatory region/regulatory protein combina placed in a square Petri dish. Each leaf was sprayed with 10 tions that resulted in a score of +/-, + or ++ in both indepen uM luciferin in 0.01% Triton X-100. Leaves were then incu dent Arabidopsis transformation events were scored as hav bated in the dark for at least a minute prior to imaging with a ing detectable luciferase reporter activity. Combinations that Night OwlTM CCD camera from Berthold Technology. The resulted in a score of +/-, + or ++ in one independent Arabi exposure time depended on the screening line being tested; in 25 dopsis transformation event were also scored as having most cases the exposure time was between 2 to 5 minutes. detectable reporter activity if similar ratings were observed in Qualitative scoring of luciferase reporter activity from each the Nicotiana experiment. Combinations (also referred to as infected leaf was done by visual inspection and comparison of associations herein) having detectable luciferase reporter images, taking into account the following criteria: (1) if the activity are shown in Table 4, below.

TABLE 4 Combinations of regulatory regions and regulatory proteins producing expression of a reporter gene operably linked to each regulatory region Regulatory Regulatory Regulatory Regulatory Region Protein Protein Protein Construct SEQ ID NO: Gemini ID cDNA ID Screening Organism AtCR2-L-AtROX6-L 35 532E7 233601 14 Arabidopsis thaliana and Tobacco AtCR2-L-AtROX6-L 44 S32A7 23366941 Arabidopsis thaliana and Tobacco AtCR2-L-AtROX6-L 124 531F11 23401690 Arabidopsis thaliana and Tobacco AtSS1-L-AtWDC-K 193 S11OE8 24365511 Arabidopsis thaliana AtSS3-L-AtROX7-L 193 S11OE8 24365511 Arabidopsis thaliana and Tobacco AtSS3-L-AtROX7-L 190 S110C8 23655935 Arabidopsis thaliana and Tobacco AtSS3-L-AtROX7-L 177 511 OHS 23522373 Arabidopsis thaliana and Tobacco AtSS3-L-AtROX7-L 152 552G6 23419038 Arabidopsis thaliana and Tobacco CrSS-L-AtSLS6-K 2 531E4 23356923 Arabidopsis thaliana and Tobacco Ec 115 531B3 23387900 Arabidopsis thaliana Ec and Tobacco Ec 72 531D2 23383878 Arabidopsis thaliana Ec and Tobacco Ec 7 531E7 23357249 Arabidopsis thaliana Ec and Tobacco Ec 158 531H6 23427553 Arabidopsis thaliana Ec and Tobacco Ec 88 532C5 23.385649 Arabidopsis thaliana Ec and Tobacco Ec 198 533D3 23462512 Arabidopsis thaliana Ec and Tobacco Ec 169 531B1 23472397 Arabidopsis thaliana Ec and Tobacco EcBBE 18 531H7 23358452 Arabidopsis thaliana EcNMCH3-L and Tobacco US 7,795,503 B2 57 58

TABLE 4-continued Combinations of regulatory regions and regulatory proteins producing expression of a reporter gene operably linked to each regulatory region Regulatory Regulatory Regulatory Regulatory Region Protein Protein Protein Construct SEQ ID NO: Gemini ID cDNA ID Screening Organism

C BBE-L- 196 Zap1 23468313 Arabidopsis thaliana cNMCH3-L and Tobacco BBE-L- 2OO 555A3 23377122 Arabidopsis thaliana cNMCH3-L and Tobacco BBE-L- 221 555C4 23704869 Arabidopsis thaliana EcNMCH3-L and Tobacco PSBBE-L 138 23416527 O)3CCO PSBBE-L 81 2338.5144 80CO PSBBE-L 101 23387851 80CO PSBBE-L 124 234O1690 80CO PSBBE-L 2O6 23388445 80CO PSBBE-L 2OO 23377122 80CO PSHMCOMT2-L. 101 23387851 80CO PSHMCOMT2-L. 63 23371050 80CO PSHMCOMT2-L. 190 23655935 80CO PSHMCOMT2-L. 219 23447935 80CO PSHMCOMT2-L. 2O6 23388445 80CO PSROMTL 7 23357249 80CO PSROMTL 158 23427553 80CO PSROMTL 169 23472397 80CO PSROMTL 101 23387851 80CO PSROMTL 63 23371050 80CO PSROMTL 217 2339S214 80CO PSROMTL 219 23447935 80CO PSROMTL 2O6 23388445 80CO PSROMTL 221 23704869 80CO Legend: L = Luciferase K=Kanamycin (neomycin phosphotransferase) AtCR2 = Arabidopsis putative codeinone reductase gene 2 promoter AtROX6 = Arabidopsis putative reticuline oxidase gene 6 promoter AtROX7 = Arabidopsis putative reticuline oxidase gene 7 promoter CrSS = Catharanthus roseus strictosidine synthase gene promoter AtSLS6 = Arabidopsis putative secologanin synthase gene 6 promoter EcBBE = Eschscholzia californica berberine bridge enzyme gene promoter EcNMCH3 = Eschscholzia californica N-methylcoclaurine 3'-hydroxylase gene promoter AtSS1 = Arabidopsis putative strictosidine synthase gene 1 promoter AtSS3 = Arabidopsis putative strictosidine synthase gene 3 promoter AtWDC = Arabidopsis putative tryptophan decarboxylase gene promoter PsBBE = Papaver somniferum berberine bridge enzyme promoter PsHMCOMT2 = Papaver somniferum hydroxy N-methylS-coclaurine 4-O-methyltransferase 2 gene promoter PsROMT = Papaver somniferum (R, S)-reticuline 7-O-methyltransferase (PsROMT) gene promoter

Example 6 45 TABLE 5-continued Transformation of Opium poppy and Analysis of Transcriptional Activation Regulatory proteins expressed in Opium poppy plants Regulatory Regulatory Regulatory 50 Opium poppy (Papaver SOmniferum) was transformed Protein Protein Protein with the eight cDNA clones and one genomic clone listed in Gemini ID SEQID NO: Clone ID Type of Insert Table 5, below. These clones were selected because they were 532E7 35 343.63 At-cDNA S32A7 44 18663 At-cDNA able to activate alkaloid-related promoters in primary and 531A3 138 1007869 At-cDNA secondary transactivation screens in Arabidopsis and ss 531B1 169 2S1343 At-cDNA tobacco. Transformation of Opium poppy was performed as 531H6 158 40SO1 At-cDNA described below. 532G5 63 2SO132 At-cDNA 531F11 124 603410 Gm-cDNA TABLE 5 Regulatory proteins expressed in Opium poppy plants 60 Ex-Plant Preparation and Embryogenic Callus Induction Regulatory Regulatory Regulatory Seed Sterilization and Germination: Protein Protein Protein Seeds of Papaver somniferum cv. Bea's Choice (Source: Gemini ID SEQID NO: Clone ID Type of Insert The Basement Shaman, Woodstock, Ill.) were surface-steril 196 Zap1 At-genomic 65 ized in 20% Clorox (commercial bleach) plus 0.1% Liqui 2 19578 At-cDNA Nox (surfactant) for 20 min. and rinsed 3 times with sterile MilliOHO. Seeds were allowed to germinate in Germination US 7,795,503 B2 59 60 Medium (GM; /2 strength of MS basal salts supplemented L-Glutamine--200 mg/L L-Cysteine) and incubated at 25°C. with B5 vitamins, 1.5% sucrose and 4 g/L Phytagar, pH 5.7) under highlight in Percival growth chamber with 16 hr photo in Magenta boxes by incubating in Percival growth chamber period. with 16 hr/8 hr light/dark photo period at 25°C. After 10-15 days, bialaphos resistant calli were transferred Preparation of Embryogenic Callus Highly Competent for 5 Transformation: to Regeneration Medium 2 (RM2=CM+250 mg/L carbenicil Hypocotyls, roots, and young leaves of 10 to 20 day old lin+0.5 mg/L Zeatin--0.05 mg/L IBA+100 mg/L seedlings were cut and placed on Callus Induction Medium L-Glutamine--200 mg/L L-Cysteine). Bialaphos resistant EC (CIM; MS basal medium with B5 vitamins, 1 g/L Casamino will continue to grow and differentiate into embryos. These acid, 2 mg/L 2.4 D, 0.5 mg/L BA, and 6.5 g/L Phytagar) and 10 embryos developed into plantlets after 15-20 days. incubated at low light at 25°C. in Percival growth chamber. Small plantlets with roots were transferred to Rooting Callus was initiated from the cut surface of the explants Medium (RtM=CM--250 mg/L carbenicillin+0.2 mg/LIBA+ within 20 days. Callus was subcultured onto fresh CIM. 50 mg/L L-Glutamine-4 g/L Phytagar) insterile Sundae Cup. Thereafter, subculture was done every 10 to 15 days. After 2-3 subcultures compact light yellow to white spherical embryo 15 Fully-regenerated plants are transferred to soil at appropri genic callus (EC) usually emerged from the Surface of trans ate time. lucent friable non-embryogenic callus (NEC). EC was sepa Protocol for qRT-PCR Analysis rated from NEC and subcultured in CIM every 10 to 12 days. In most cases, five (5) independent transgenic events for Transformation each TF clone were used for qRT-PCR analysis. In a few Preparation of Agrobacterium: cases, three independent events were used. Tissues collected Agrobacterium contained a T-DNA construct using the from wild-type and transgenic lines were of two types: callus binary vector CRS338 with a DNA insert corresponding to a and embryogenic callus. The control for transgenic regular clonelisted in Table 5 above driven by a CaMV35S promoter. callus was the corresponding wild-type regular callus. Simi The T-DNA also contained a synthetic gene encoding phos 25 larly, the control for the transgenic embryogenic callus was phinotricin-acetyltransferase under the control of 28716 pro the wild-type embryogenic tissue. The difference between the moter (gDNA ID: 7418782). The Agrobacterium was then two tissue types is morphological, i.e., the presence of inoculated into 2 mL of YEB liquid medium with appropriate antibiotics and incubated overnight at 28°C. with appropriate embryo-like structures surrounded by callus cells. shaking. Agrobacterium cells were spun down at 10,000 rpm 30 Total RNA was isolated from the tissue samples using in 1.5 mL Eppendorf tube at room temperature (RT) using a Trizol Reagent (Invitrogen). RNA was converted to cDNA micro-centrifuge. Cells were resuspended in 6 mL of liquid using the reagents included in the iScript kit (BioRad). Quan co-cultivation medium (liquid CCM-CM with 100 uM titative RT-PCR was performed using BioRad iCycler Acetosyringone) in 50 mL conical tube to get a final ODoo of 35 reagents and iCycler PCR machine. O.O6-0.08. Opium poppy CAB (chlorophyll-a/b binding protein) gene Transfection of EC: was used to normalize the expression of different alkaloid Approximately 0.5 to 1 gram of EC was infected with related genes in the samples. The expression level of CAB Agrobacterium Suspension for 5 min with gentle agitation. gene appeared to be similar in all wild-type and transgenic Transfected EC was blotted-dry with sterile Kimwipe paper 40 in a Petri plate before transfer on top of sterile Whatman filter tissues analyzed. The extent of transcription of the transgenes paper contained in co-cultivation Medium (CCM). Trans relative to non-transgenic wild-type was calculated to a cer fected EC was incubated at 22°C. under low light in Percival tain degree using any measurable threshold cycle (Ct). If there growth chamber for 3 days for co-cultivation. was no measurable Ct, the samples were given an arbitrary, Callus Recovery: 45 conservative estimate number of 35 to have an estimate of the Transfected EC were washed 3 times with 20-30 mL of expression relative to wild type. sterile MilliO-HO with moderate shaking. The last wash was Aside from the transcription of the corresponding trans done in the presence of 500 mg/L Carbenicillin. Washed EC gene, the transcription of the genes listed in Table 6, below, was briefly dried in sterile Kimwipe paper prior to transfer in was monitored for each of the transgenic events using the Recovery Medium (RRM=CIM+500 mg/L carbenicillin). 50 corresponding set of primers, also shown below (Table 7). Transfected EC was incubated at 25°C. under low light in Percival growth chamber for 7-9 days. TABLE 6 Selection for Transformed Calli: Genes monitored for transcription in transgenic opium poppy plants After the recovery period, all calli were transferred to Cal 55 lus Selection Medium (CSM=CM+500 mg/L carbenicillin+5 Gene Code Identity mg/L bialaphos) and incubated at 25°C. under low light in CR Codeinone reductase (EC 1.1.1.247) Percival growth chamber. Subculture of transfected EC was BBE Berberine bridge enzyme (EC 1.21.3.3) done every 10 to 12 days. After the second subculture, only HMCOMT1 S-adenosyl-L-methionine:3'-hydroxy-N-methylcoclaurine bialaphos resistant calli were transferred to fresh CSM. The 4'-O-methyltransferase 1 (EC 2.1.1.116) resistant embryogenic calli typically had light yellow color. 60 HMCOMT2 S-adenosyl-L-methionine:3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase 2 (EC 2.1.1.116) Non-resistant calli typically were light to dark brown in color NOMT (RS)-norcoclaurine 6-O-methyltransferase and were dead or dying. (EC 2.1.1.128) Regeneration: ROMT (RS)-reticuline 7-O-methyltransferase SAT Salutaridinol 7-O-acetyltransferase (EC 2.3.1.150) After 3 subcultures, bialaphos resistant calli were trans 65 YDC (or TYD) Tyrosine decarboxylase (EC 4.1.1.25) ferred to Regeneration Medium 1 (RM1=CM--250 mg/L car benicillin+2 mg/L Zeatin--0.05 mg/L IBA+100 mg/L US 7,795,503 B2

TABLE 7 Primers used to monitor gene expression in transgenic opium poppy plants

SEQ SEQ ID ID Gene Name Sense Primer NO: Anti-sense Primer NO :

PSBBE AAACGTGTTTAGACATGCATTAGA 331 CCTCCGAAACCATTTAGAGCTATA 332

PSCR TTACTGCATACTCGCCTTTGG 333 AGATTTTCCTCTGACCTGGGA 334

PSSAT GGTAAAATGTGTGAGTTCATGTCG 33 AACGATCACCAGTGCTTCCTT 336

PSHMCOMT1 TTGGTGAATACAGGTGGTAGAGA 337 GCGAATCGGTCTGATCTTATGA 338

PSHMCOMT2 AAAACAATGATGGGGCAATCAC 339 GGTGTACCAAGTATCTTACCATTC 34 O

PSROMT TACAACAATAATCAACGAGATGCT 341 TAACTATTTCAGCGATTATCGACC 342

PSNOMT ATGGCTTGTGATACTAGATTGGTT 343 GCTTTAGATATGGCTTTCACTGC 344

PSYDC GTTTCTATGTGCTACTGTGGGTAC 34 is GCCTGAACTCCGGGCAAA 346

At - 531F11 GCCAGCACGCAAACTTCAG 347 GCTTGAGGTGGTGTTAGAATTGTT 348

At - 531E4 ATCTCATTCACCGACGCAGAAG 349 CGGCGAGTCTGGGATGCT 350

At - 532 G5 CGAGCCACGGAGGAACAA 351 CCCCTCTCGTTGATGAGGAA 352

At - 531A3 ATTACAGAGGAGTGAGGCAGAGA 353 CGCATCCTAAAAGCAGCTATATCG 354

At - 532Ef GCTGCACAGTATGGAGTTCTC 35. CCCTTGATCCTGACAGCTCTAA 356

At - 532A7 TTCTGCCAAAAGTGTTGTGCTA 357 TGTTGAGTCTTCCAGTTGTTGTAA 38

At - 531H6 AAGAAGCTAATCCACTGGCATG 359 CGATGGTTGGTAATGTGAATTGTT 360

At-Zap CAAACAAAGATCACGCAGCCA 361 TGCCTTCTCTTGACAAACAGTG 362

PsCAB AAGAATGGTAGACTTGCTATGTTG 363 ATGTTGCAGTGACCAGGATC 364

Summary of the qRT-PCR Results The values for each gene shown in Table 8, below, are normalized to CAB gene expression for each of the transfor- 40 mation events relative to the averaged value of non-transgenic wild-type.

TABLE 8 qRT-PCR results

Gemini Clone ID ID Tissue Line CR BBE HMCOMT1 HMCOMT2 NOMT ROMT SAT YDC

Zap Zap Average 8.46 9.10 4.69 1.11 1.25 O.OS 0.55 3.68 Callus Line1 8.22 S.10 4.79 4.79 O.71 O.11 0.57 4.08 Callus Line2 16.45 25.11 4.96 4.96 2.08 O.19 18O 9.06 Callus Line3 10.48 S.28 5.31 5.31 1.08 O.OS O.28 4.69 Callus Line4 6.92 9.19 7.01 7.01 2.16 O.O2 0.97 1.84 Callus Line5 8.22 O.84 2.57 2.57 O.88 O.O2 O.18 2.11 Zap Zap Average 64.60 Not O.O2 149.09 66.72 O.19 8.94 O.93 performed Embryonic 162O2 Not O.OO 1009.90 421.68 1.32 69.SS 3.92 Line1 performed Embryonic 15.35 Not O.19 89.26 25.46 0.73 4.66 O.14 Line2 performed Embryonic 16.45 Not O.04 36.25 22.94 1.37 2.17 1.54 Line3 performed 531E4 19578 Average 4.98 6.08 7.73 1.02 O.78 O.28 O42 23.51 Callus Line1 11.08 13.93 18.51 3.53 2.39 O.SO O.62. 140.07 Callus Line2 3.18 6.06 4.96 O.69 O.66 0.27 O.28. 18.13 Callus Line3 4.99 2.64 4.96 1.25 O.74 0.14 O.S.S 29.45 Callus Line4 O.65 1.68 7.78 O.36 O.32 0.35 O.34 4.08 US 7,795,503 B2 63 64

TABLE 8-continued QRT-PCR results Gemini Clone ID ID Tissue Line CR BBE HMCOMT1 HMCOMT2 NOMT ROMT SAT YDC 532E7 34363 Average 1.98 148 39 2.73 3.94 120 O.70 1.92 Embryonic O.18 1.13 .44 O.83 1.01 1.41 O.S1 3.07 Line1 Embryonic 6.92 3.18 30 7.89 30.27 1.37 O.82 2.87 Line2 Embryonic 140 O.80 17 3.43 3.66 O.81 0.67 1.21 Line3 Embryonic O.68 1.OS 17 1.39 3.41 O.81 O.88 1.59 Line4 Embryonic 0.73 1.25 2.04 4.86 2.50 .93 O.69 1.54 Line5 S32A7 18663 Average 13.77 11.07 54 39.40 43.11 37 1.75 5.43 Embryonic 28.64 1997 2.35 160.90 210.84 27 1.39 9.00 Line1 Embryonic 20.25 22.94 30 106.15 159.79 .46 5.17 7.31 Line2 Embryonic 14.32 6.82 17 63.12 44.32 O.84 3.92 10.34 Line3 Embryonic 3.71 3.18 2.27 9.71 10.70 .87 0.72 2.87 Line4 Embryonic 1.92 2.41 O6 9.06 9.32 62 O.82 2.41 Line5 531A3 1007869 Average 12.75 6.67 O6 16.34 38.32 61 1.36 19.03 Embryonic 5.43 3.66 16 25.63 27.28 .74 1.16 16.22 Line1 Embryonic 32.90 9.32 72 9.71 20.68 O.90 1.21 91.77 Line2 Embryonic 9.13 7.84 .21 3S.O2 45.89 1S 1.6S 18.00 Line3 Embryonic 2.45 1.09 O.S1 3.43 16.8O 2.55 O.82 6.59 Line4 Embryonic 13.83 11.47 55 38.85 194O1 2.38 2.41 14.12 Line5 531B1 251343 Average 1948 5.86 17 2.22 1.72 1.04 0.62 4.OO Callus Line1 41.36 11.71 2.31 8.40 5.31 S.82 3.14 18.13 Callus Line2 14.12 2.93 O.6O 1.34 4.63 S.82 O.47 14.72 Callus Line3 2.97 2.93 16 O.98 O.20 O.O3 O16 O.24 531H6 40501 Average 19.62 24.10 S1 42.52 11143 O.65 2.17 3.61 Embryonic 24.08 31.78 17 41.64 95.01 1.00 2.25 4SO Line1 Embryonic 40...SO 35.26 .44 198.09 177.29 O.23 7.84 8.40 Line2 Embryonic 7.67 24.93 91 51.27 SO.91 0.57 2.77 4SO Line3 Embryonic 6.23 4.41 60 7.89 1514 1.37 O46 1.01 Line4 532G5 250132 Average 21.01 12.85 O9 38.05 60.13 O.81 3.34 3.25 Embryonic 28.64 10.85 26 47.84 69.55 1.41 3.78 S.S4 Line1 Embryonic 3.12 1.99 O.8O 4.86 10.34 1.32 0.77 1.71 Line2 Embryonic 29.65 21.71 26 95.67 139.10 O.90 8.11 3.78 Line3 Embryonic 19.56 15.89 26 63.12 101.83 0.55 4SO 3.41 Line4 Embryonic 24.08 13.83 O.95 56.89 77.17 O.45 3.92 2.97 Line5 531F11 603410 Average 10.06 27.76 O.86 49.87 92.41 O.31 3.89 1.68 Embryonic 8.22 34.06 O.95 95.67 154.34 0.15 7.06 O.82 Line1 Embryonic 16.45 S1.63 1.60 145.01 134.36 O.34 4.50 2.33 Line2 Embryonic 10.48 15.35 1.13 27.47 SO.91 1.15 3.07 3.07 Line3 Embryonic 6.92 32.90 0.44 75.06 154.34 0.04 3.41 O.62 Line4 Embryonic 8.22 4.89 O.61 10.78 41.36 1.32 2.68 3.66 Line5

Six of the nine regulatory protein clones tested were able to poppy. These clones correspond to Zap. 531A3, 532G5, transcriptionally activate at least two genes that belong to the 532A7, 531H6, and 531F11. In all of the six cases, the trans isoquinoline/morphinan biosynthetic pathways in opium activated genes are the HMCOMT2 S-adenosyl-L-methion US 7,795,503 B2 65 66 ine. 3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase 2 and NOMT (R.S)-norcoclaurine 6-O-methyltransferase. TABLE 9-continued Depending on the transformation event, the transcription of Regulatory proteins expressed in California poppy plants these genes is increased between 10x to 1000x. In some cases, expression of CR (codeinone reductase) and BBE (ber- 5 Gemini ID SEQID NO: Clone ID Type of insert berine bridge enzyme) genes are also enhanced by at least 10- 531 HT 18 16204 At-cDNA to 162-fold. Zap 196 Zap At-genomic It is interesting to highlight that the effect of the Zap gene on transcriptional transactivation is manifested differently 10 The clones listed in Table 9, above, were selected because depending on the developmental state of the tissue. Transac- they were able to activate alkaloid-related promoters in pri tivation is greater in embryogenic tissue than in the callus. mary and secondary transactivation screens in Arabidopsis This implies that in callus, transcriptional repression of alka- S. East h City E. loid-related genes possiblv exists and that 35S-driven over- ormed essenually Iollowing une procedures publisned by expression For like a TF may not be enough 15 Park and Facchini, 1999 (Plant Cell Rep 19: 421-426) and to bring the transcriptional transactivation of relevant genes to 2000 (Plant Cell Rep 19:1006-1012). a level present in embryogenic tissues. Stical Analysis of Alkaloid Production in California oppy: Example 7 Tissues from at least three independent transformation 20 events per clone were used. Twenty mg freeze-dried Califor nia po callus tissue was extracted in methanol using a Transformation of California Poppy and Chemical E. 4 hours. Reserpine was included during E. Analysis for Alkaloids tion to serve as internal standard. The crude extract was clarified using a syringe filter and the resulting methanol California poppy (Eszcholtzia Californica) was trans- 25 filtrate was analyzed using LC-MS. LC-MS analysis was formed with the ten cDNA clones and one genomic clone performed on the resuspended methanol extract using listed in Table 9, below: Waters-Micromass ZMD (single quadrupole, benchtop MS detector with positive electrospray ionization). The area of TABLE 9 the signature peaks from LC-MS data for known alkaloid 30 intermediate was normalized to the internal standard. Regulatory proteins expressed in California poppy plants LC-MS Conditions: Gemini ID SEQID NO: Clone ID Type of insert A gradient of 20% to 95% acetonitrile (in 0.1% formic acid) for 55 min. followed by a 5 min. isocratic run in 95% E; s S. ARNA acetonitrile (in 0.1% formic acid) using an Alltima C 18 S32A7 44 18663 NSNA 35 column (5um; 150x4.6 mm). 531A3 138 1OO7869 At-cDNA Summary of Chemical Analysis for Selected Benzophenan 531B1 169 251343 At-cDNA thridine Alkaloids in California poppy : 3. ARNA The values shown in Table 10, below, indicate the fold 531F11 124 603410 Gm-cDNA 40 increase or fold-decrease in the amount of selected alkaloids S33B1 81 111598 At-cDNA for each transgenic plant relative to average values of the non-transgenic wild-type.

TABLE 10 Fold change in benzophenanthridine alkaloids in transgenic California poppy plants Fold Increase or Decrease in the Amount of Indicated Alkaloid Intermediate Trans- Relative to Non-transgenic Wild-type

O- 12- 10 Gemini mation Dihydroxy- hydroxy- hydroxy- Dihydro- Dihydro- Dihydro Clone ID ID Event # Sanguinarine dihydrosanguinarine dihydrochelirubine dihydrosanguinarine macarpine chelirubine sanguinarine 2S143 531B1 Event 1 3.53 O.O2 O.OO 2.60 O.17 O.OS 2.73 Event 2 2.38 O.31 O.19 7.OO 1.35 O.S3 10.18 Event 3 2.41 O.64 0.55 65.32 18O 0.97 35.55 1OO7869 531A3 Ewent 1 3.76 O.87 O.S8 29.31 2.21 2.32 40.74 Event 2 3.81 2.15 12.63 135.74 1.57 3.44 1OS.O.S Event 3 3.63 O.9S 1.52 100.84 1.69 1.75 14473 162O4 S31H7 Event 2 2.76 O.S1 0.27 24.76 O.OO O.45 116.06 Event 3 2.55 O.77 O.18 52.53 3.73 O.60 63.76 111598 S33B1 Event 1 2.10 3.32 13.75 183.83 3.16 2.63 102.49 Event 2 2.25 1.18 O.94 79.54 0.67 1.18 128.44 Event 3 2.48 1.26 O.25 85.09 3.15 O.88 37.63 19578 S31E4 Event 1 2.23 O.29 O.12 1989 O40 O.21 9.09 Event 2 O.98 O.26 O.12 73.99 4.SS 0.57 74.64 Event 3 1.13 O.28 O16 31.51 O.80 O.26 30.90 18663 532A7 Ewent 1 1.06 O.29 O.21 46.24 5.37 O42 124.64 2S1466 S32HS Event 1 1.83 O.71 O4O 25.30 2.41 1.47 33.10 Event 2 1.91 O.70 O.96 22.21 1.19 1.59 1474 Event 3 1.70 O.34 O.2O 7.92 0.67 1.72 12.73 US 7,795,503 B2 67 68

TABLE 10-continued Fold change in benzophenanthridine alkaloids in transgenic California poppy plants Fold Increase or Decrease in the Amount of Indicated Alkaloid Intermediate Trans- Relative to Non-transgenic Wild-type

O- 12- 10 Gemini mation Dihydroxy- hydroxy- hydroxy- Dihydro- Dihydro- Dihydro Clone ID ID Event # Sanguinarine dihydrosanguinarine dihydrochelirubine dihydrosanguinarine macarpine chelirubine sanguinarine 34363 S32E7 Event 1 1.51 O.20 O.37 12.55 1.68 1.63 13.29 Event 2 2.02 O.35 O.34 33.84 3.21 2.50 30.10 Event 3 1.75 O.69 4.96 40.36 1.06 2.95 22:42 603410 531F11 Ewent 1 O.88 O.S6 O16 0.67 O.24 O.94 140 Event 2 O.65 O.34 0.71 1.92 O.39 3.61 O.81 Event 3 O.74 0.67 8.24 O.78 OSO 1.86 4.27 250132 S32GS Event 1 1.42 O.71 O.91 5.52 3.00 1.44 27.86 Event 2 O.93 O.48 O.18 1.52 2.12 O.83 35.05 Event 3 1.16 O.60 0.55 13.15 2.73 1.13 SO.62 Zap Zap Event 1 O42 2.72 2.27 23.82 1.14 0.37 1129 Event 2 O.94 6.23 4.22 2.OO 1.17 1.92 O.9S Event 3 1.OO 4.33 3.30 34.88 2.03 O.S6 30.26 Event 4 3.15 4.16 O.S3 6.03 1.33 O.83 2.75

Nine of the 11 regulatory protein clones tested were able to immediately frozen in liquid nitrogen. Samples were lyo significantly increase the production of some of the ben 25 philized prior to analysis. Zophenanthridine alkaloid intermediates in California poppy LC-MS Analysis: plants. These clones were Zap. 531A3, 531 H7, 533B1, Freeze-dried samples were extracted using methanol as 531E4,532A7,532H5,532E7, and 532G5. Depending on the solvent with sonication for 30 min. with shaking. An internal intermediate, the extent of the increase was between 2-fold standard (either reserpine or oxycodone) was included during and 183-fold. Most of the benzophenanthridine alkaloid 30 extraction. Analysis of morphinan alkaloids was performed intermediates that were Substantially increased were sangui using an LC-MS Ion Trap (Thermo-Finnigan) with a step narine and its derivatives 10-hydroxy-dihydrosanguinarine gradient mobile phase of 5% to 100% methanol for 60 min. and dihydrosanguinarine. Retention times of signature ions for alkaloid intermediates were checked against reference standards. Areas of the peaks Example 8 35 corresponding to the signature ions were integrated using the software programs associated with the LC-MS system. The Chemical Analysis of Transgenic Opium Poppy areas of the alkaloid peaks were divided by the area of the Plants for Alkaloids peak corresponding to the internal standard. Peak areas rela tive to the internal standard are listed in columns 3 to 5 of Rosette leaves from transgenic Opium poppy lines contain 40 Table 11 below. These values were divided by the average ing selected regulatory factors from the transactivation screen values from the wild-type to calculate the normalized values were analyzed. Leaf samples were collected from pre-flow listed in columns 6 to 8 of Table 11 below. Normalized values ering plants (first generation transgenic lines) and were of 1.5 or greater are highlighted in bold. TABLE 11 Fold change in norphinan alkaloids in transgenic opium poppy plants Morphine Codeine Peak Thebaine Normalized Normalized Normalized Plant Line Event Peak Area Area Peak Area Morphine Codeine Thebaine Gemini ID Code (Relative to Internal Std) (Peak AreaWTAverage) 531A3 PS-531A3-1-04 3.307 O564 7.581 O.686 1.120 1046 PS-531A3-2-O1 3.467 0.377 2.552 O.719 O.749 O.352 PS-531A3-2-03 1577 O.196 2.2O6 0.327 O.388 O.304 PS-531A3-2-04 4.369 O.299 3.31.8 O.906 O.S94 O.458 PS-531A3-2-OS 2.563 O.28O 11.382 O.S32 0.556 1571 PS-531A3-2-07 3.352 O.062 O.467 O.695 O.122 O.064 PS-531A3-6-O1 S.286 O.336 8.66O 1.096 O.667 1.195 PS-531A3-6-O2 3.386 O686 13.582 O.702 1.363 1874 PS-531A3-6-03 3.889 O.474 4.363 O.807 O941 O. 602 PS-531A3-6-04 S.283 0.570 20.956 1.096 1.132 2.892 PS-531A3-6-07 3.366 O409 1758 O.698 O812 O.243 PS-531A3-6-08 5.947 O.S96 8.459 1234 1184 1.167 531E4 PS-531E4-4-O1 2.814 0.550 7.239 O.S84 1.092 O.999 PS-531E4-4-03 2.785 0.357 1632 0.578 O.709 O.225 PS-531E4-4-04 4.275 O.479 6.966 O.887 O.9SO O.961 PS-531E4-4-0S 4.176 O.458 6.148 O.866 O.909 O.848 PS-531E4-4-09 3.008 O431 3.923 O.624 0.855 O.541 531F11 PS-531F11-1-01 O.933 O.339 O.490 O.194 O.673 O.O68 PS-531F11-2-01 2164 0.177 6.028 O449 O.351 O.832 US 7,795,503 B2 69 70 TABLE 1 1-continued Fold change in morphinan alkaloids in transgenic opium poppy plants Morphine Codeine Peak Thebaine Normalized Normalized Normalized Plant Line Event Peak Area Area Peak Area Morphine Codeine Thebaine Gemini ID Code (Relative to Internal Std) (Peak Area/WTAverage) PS-531F11-2-03 6.569 O.728 9.308 1.363 1445 1284 PS-531F11-2-04 4.100 O.286 6.518 O.8SO O.S68 O.899 PS-531F11-2-OS 2.708 O.310 1668 O.S62 O616 O.230 PS-531F11-3-O1 3.952 O.475 9.788 O.82O O943 1.351 PS-531F11-3-03 3.794 O.722 4...106 0.787 1434 0.567 PS-531F11-3-OS 4.906 OS46 13.788 1.018 1.084 1903 PS-531F11-6-02 1.591 O.S2O 34.951 O.330 1.032 4.823 PS-531F11-6-03 3.719 0.157 0.379 0.771 O.312 O.OS2 PS-531F11-6-04 3.782 O.381 2.921 O.784 0.756 O403 PS-531F11-6-06 4.366 O.S13 2.058 O.906 1.019 O.284 531H6 PS-531H6-1-02 2.538 O.331 3.260 O.S26 0.657 O4SO PS-531H6-1-04 O.679 O840 26.468 O.141 1668 3.652 PS-531H6-3-O1 1923 O.239 882 O.399 O.475 O.260 PS-531H6-3-O2 3.874 O.233 13.239 O.803 O462 1827 PS-531H6-4-O2 3.S60 O.330 20.288 O.738 0.655 2.800 PS-531H6-4-03 2.316 O438 12.670 O.48O O869 1748 PS-531H6-4-04 2.886 O.399 .293 O.S.99 O.792 O.178 PS-531H6-4-06 2.763 O.SS4 8.698 0.573 1.101 1200 PS-531H6-5-03 2.81.1 O.261 469 O.S83 O.S18 O.2O3 PS-531H6-6-01 4.576 O.215 4.846 O.949 O426 O669 PS-531H6-6-02 2.724 O.16S 388 0.565 0.327 O.192 PS-531H6-6-03 5.376 O417 7.319 1.115 O828 1.010 PS-531H6-6-04 3.155 O497 0.737 O.654 O.986 O.102 PS-531H6-6-0S 4.1.16 O.760 9.58O 0.854 1509 1322 S32A7 PS-532A7-10-01 5.878 O460 11130 1.219 O.914 1536 PS-532A7-10-02 5.637 O.935 19.229 1.169 1857 2.653 PS-532A7-10-05 3.968 O.171 2.044 O.823 O.340 O.282 PS-532A7-2-01 3.356 O.239 229 O.696 O.475 O.170 PS-532A7-3-05 6.047 O.362 11.154 1.254 O.718 1539 532E7 PS-532E7-2-03 9.846 0.077 0.377 2042 O.153 O.OS2 PS-532E7-3-03 3.091 O.62O 8.993 O641 1231 1.241 PS-532E7-4-O2 8.2O6 2.308 23.385 1702 4.582 3.227 PS-532E7-4-06 7.522 O.972 12.66S 1S60 1930 1748 PS-532E7-7-01 3.778 O.S96 26.885 O.784 1184 3.710 532G5 PS-532GS-3-02 1.166 0.675 9.948 O.242 1340 1.373 PS-532GS-4-O1 1.547 O.772 2.692 O.321 1533 O.371 PS-532GS-4-O2 3.725 O916 4.939 0.773 1818 O681 PS-532GS-4-03 2.774 O427 2.387 0.575 O849 O.329 PS-532GS-4-04 1681 0.565 8.744 O.349 1.121 1.2O7 PS-532GS-6-01 1.370 O.S89 5.140 O.284 1.170 O.709 PS-532GS-6-02 1.726 O.264 O.317 O.358 0.525 O.044 Zap1 Ps-Zap-2-02 5.359 O.260 32.042 1112 0.517 4.421 Ps-Zap-3-02 3.149 O497 16.780 O.653 O.987 2315 Ps-Zap-3-04 3.975 O.335 11.842 O.825 O.66S 1634 Ps-Zap-4-01 3.670 O.S24 13.268 O.761 1040 1831 Ps-Zap-4-04 2.08O O.197 20.363 O431 O.391 2.810 Wild Type Ps-WT-01 S.342 O437 10.361 PS-WTO2 4.211 0.595 6.891 PS-WTO3 4.910 O.479 4.489 Ps-WT (Average) 4.821 OSO4 7.247 1.OOO 1.OOO 1.OOO

Example 9 50 polypeptide and an alignment length of 85% or greater along the shorter sequence in the alignment. The query polypeptide Determination of Functional Homolog and/or and any of the aforementioned identified polypeptides were Ortholog Sequences designated as a cluster. The main Reciprocal BLAST process consists of two A subject sequence was considered a functional homolog 55 rounds of BLAST searches; forward search and reverse or ortholog of a query sequence if the Subject and query search. In the forward search step, a query polypeptide sequences encoded proteins having a similar function and/or sequence, “polypeptide A. from Source species SA was activity. A process known as Reciprocal BLAST (Rivera et BLASTed against all protein sequences from a species of al., Proc. Natl. Acad. Sci. USA, 95:6239-6244 (1998)) was interest. Top hits were determined using an E-value cutoff of used to identify potential functional homolog and/or ortholog 60 10-5 and an identity cutoff of 35%. Among the top hits, the sequences from databases consisting of all available public sequence having the lowest E-value was designated as the and proprietary peptide sequences, including NR from NCBI best hit, and considered a potential functional homolog or and peptide translations from Ceres clones. ortholog. Any other top hit that had a sequence identity of Before starting a Reciprocal BLAST process, a specific 80% or greater to the best hit or to the original query polypep query polypeptide was searched against all peptides from its 65 tide was considered a potential functional homolog or source species using BLAST in order to identify polypeptides ortholog as well. This process was repeated for all species of having sequence identity of 80% or greater to the query interest. US 7,795,503 B2 71 72 In the reverse search round, the top hits identified in the forward search from all species were BLASTed against all TABLE 13-continued protein sequences from the Source species SA. A top hit from the forward search that returned a polypeptide from the afore Percent identity to Ceres cDNA ID 23357249 (SEQ ID NO: 7 mentioned cluster as its best hit was also considered as a 5 SEQID 9% potential functional homolog or ortholog. Designation Species NO: Identity e-value Functional homologs and/or orthologs were identified by Public GI no. 7439995 Nicoiiana 10 66.6 5.19E-39 manual inspection of potential functional homolog and/or Sylvestris ortholog sequences. Representative functional homologs 10 Public GI no. 7489099 Noting 11 65.8 7.1 OE-3S Sylvestris and/or orthologs for SEQID NO:2, SEQ ID NO:7, SEQID Public GI no. 34906972 Oryza sativa 12 646 1.59E-35 NO:18, SEQ ID NO:35, SEQ ID NO:44, SEQ ID NO:63, Subsp.japonica SEQ ID NO:72, SEQ ID NO:81, SEQ ID NO:88, SEQ ID Ceres CLONEID no: Glycine max 13 61.6 18OE-38 NO:101, SEQ ID NO:115, SEQ ID NO.124, SEQ ID (CONE ID Givoli 14 61 7.49E-40 NO:138, SEQ ID NO:152, SEQ ID NO:158, SEQ ID 15 .. O. ycine max a - NO: 169, SEQ ID NO:177, SEQ ID NO: 190, SEQ ID " Cees CLONE, ID . It 15 604 17OE-39 NO: 193, SEQ ID NO:200, SEQ ID NO:206, and SEQ ID 579861 aestivain NO:221 are shown in FIGS. 1-22, respectively. The percent Public GI no. 21388662 Physcomitrella 16 544 1.79E-29 identities of functional homologs and/or orthologs to SEQID patens NO:2, SEQID NO:7, SEQID NO:18, SEQID NO:35, SEQ ID NO:44, SEQID NO:63, SEQID NO:72, SEQID NO:81, 2O SEQID NO:88, SEQID NO:101, SEQID NO:115, SEQ ID TABLE 1.4 NO:124, SEQ ID NO: 138, SEQ ID NO:152, SEQ ID NO:158 s SEQ ID NO:169 s SEQ ID NO:177 s SEQ ID Percent identity to Ceres cDNA ID 23358452 (SEQ ID NO: 18 NO:190, SEQ ID NO.193, SEQ ID NO:200, SEQ ID 25 SEQ ID % NO:206, and SEQ ID NO:221 are shown below in Tables Designation Species NO: Identity e-value 12-33, respectively. Ceres CLONEID no: Brassica 19 65.3 S.7OE-42 873 113 naptiS TABLE 12 Ceres CLONE ID no. Brassica 2O 64.5 2.7OE-42 956.177 naptiS Percent identity to Ceres cDNA ID 23356923 (SEQ ID NO. 2) 30 Ceres CLONE ID no. Glycine max 21 52.4 8.99E-28 721511 SEQID Ceres CLONEID no. Glycine max 22 52.4 8.99E-28 Designation Species NO: % Identity e-value 641329 Ceres CLONEID no: Glycine max 23 S1.6 1.90E-27 Public GI no. 51970702 Arabidopsis 3 98.7 4.5OE-81 782784 thaliana 35 Public GI no. 1864.5 Glycine max 24 S1.6 1.90E-27 Ceres CLONE ID no. Brassica 4 81.8 1.59E-51 Public GI no. 1052956 Ipomoea nil 25 48.7 119E-23 871060 naptiS Public GI no. 436424 Pistin 26 48 1.7OE-26 Ceres CLONEID no: Brassica 5 78.9 449E-49 Sativlin 1069147 naptiS Public GI no. 2894.109 Soianum 27 47.6 109E-22 tuberosum 40 Ceres CLONE ID no. Trictim 28 46.8 4.69E-22 686294 aestivain TABLE 13 Public GI no. 50726318 Oryza sativa 29 45.6 3.19E-23 Subsp. Percent identity to Ceres cDNA ID 23357249 (SEQ ID NO: 7) japonica Public GI no. 729737 Vicia faba 30 44 3.6OE-24 SEQID % 45 Public GI no. 729736 Ipomoea nil 31 43.5 109E-22 Designation Species NO: Identity e-value Ceres CLONEID no: Zea mayS 32 43 2.49E-23 O6O767 Ceres CLONEID no: Zea mays 8 75.3 5.19E-48 Public GI no. 7446231 Canavalia 33 42.8 2.OOE-23 1388.283 gladiata Public GI no. 1778.374 Pistin Saivain 9 66.9 2.7OE-42

TABLE 1.5

Percent identity to Ceres cDNA ID 2336O114 (SEQID NO:35) SEQ ID 9% Designation Species NO: Identity e-value

Ceres CLONEID no: Zea mays 36 87.8 6.6OE-103 1382382 Ceres CLONEID no: Zea mays 37 61.5 4.2OE-53 1561543 Public GI no. 51964362 Oryza sativa subsp. 38 59.5 2.1OE-60 japonica Ceres CLONEID no: Glycine max 39 57.3 1.90E-57 557109 Public GI no. 50912679 Oryza sativa subsp. japonica 40 56.9 2.1OE-42 US 7,795,503 B2 73

TABLE 15-continued Percent identity to Ceres cDNA ID 2336O114 (SEQ ID NO:35 SEQ ID % Designation Species NO: Identity e-value Public GI no. 51535177 Oryza sativa subsp. 41 52.6 1.2OE-46 japonica Ceres CLONEID no: Trictim aestivum 42 35.2 8.2OE-34 888753

TABLE 16

Percent identity to Ceres cDNAID 23366941 (SEOID NO:44 Designation Species SEQ ID NO: % Identity e-value Public GI no. 12324817 Arabidopsis thaliana 45 98.75 1.79E-45 Public GI no. 55584076 Pelagonium zonale 46 69.3 9.99E-36 Ceres CLONEID no: Zea mays 47 68.3 7.6OE-31 3O3971 Ceres CLONEID no: Zea mays 48 68.3 9.69E-31 633647 Ceres CLONEID no: Zea mays 49 68.3 7.6OE-31 3.14456 Public GI no. 16516825 Petuniax hybrida 50 65.3 3.89E-34 Ceres CLONEID no: Trictim aestivum 51 65 4.99E-34 780025 Ceres CLONEID no: Trictim aestivum 52 65 4.99E-34 OOO657 Public GI no. 16516823 Petuniax hybrida 53 64.3 1.7OE-33 Public GI no.2982285 Picea mariana S4 64.3 1.79E-31 Ceres CLONEID no: Brassica naptis 55 64.3 5.8OE-33 963426 Ceres CLONEID no: Glycine max 56 64 2.79E-33 68.2557 Ceres CLONEID no: Glycine max 57 6.3.3 3.09E-34 646744 Public GI no. 59042581 Pelangonium zonale 58 63.2 9.69E-31 Ceres CLONEID no: Glycine max 59 63 2.2OE-33 6O2368 Ceres CLONEID no: Glycine max 60 61.3 2.39E-34 S66082 Ceres CLONEID no: Brassica naptis 61 58 6.59E-32 1114184

TABLE 17 Percent identity to Ceres cDNA ID 23371050 (SEQ ID NO: 63 % Designation Species SEQID NO: Identity e-value Ceres CLONEID no: Brassica naptis 64 65 24OE-50 96.2327 Ceres CLONEID no: Glycine max 65 S4.9 5.39E-37 11 O1577 Ceres CLONEID no: Trictim aestivum 66 49.6 4.09E-32 634261 Public GI no. 5031281 Prints armeniaca 67 47 3.79E-29 Public GI no. 35187687 Oryza sativa subsp. indica 68 38.7 3.6OE-31 Public GI no. 34978689 Oryza sativa subsp. 69 51.3 2.2OE-33 japonica Public GI no. 34909836 Oryza sativa subsp. 70 41.67 5.19E-06 japonica

TABLE 1.8

Percent identity to Ceres cDNAID 23383878 (SEOID NO: 72 Designation Species SEQ ID NO: % Identity e-value Ceres CLONEID no: Arabidopsis thaliana 73 98.3 6.99E-28 948SO US 7,795,503 B2 75 76

TABLE 18-continued

Percent identity to Ceres cDNAID 23383878 (SEOID NO: 72 Designation Species SEQ ID NO: % Identity e-value Public GI no. 21689807 Arabidopsis thaliana 74 98.1 8.8OE-53 Public GI no. 18391322 Arabidopsis thaliana 75 97.8 S.79E-40 Ceres CLONEID no: Arabidopsis thaliana 76 97.7 11 OE-40 17426 Ceres CLONEID no: Arabidopsis thaliana 77 97.6 2.3OE-38 11593 Ceres CLONEID no: Brassica naptis 78 81.4 14OE-31 1087844 Ceres CLONEID no: Brassica naptis 79 79.6 1.19E-34 963628

TABLE 19

Percent identity to Ceres cDNAID 2338.5144 (SEOID NO: 81 Designation Species SEQ ID NO: % Identity e-value Ceres CLONEID no: Glycine max 82 96.53 O 473126 Public GI no. 5428.7494 Oryza sativa subsp. 83 95.37 3.39E-130 japonica Ceres CLONEID no: Zea mays 84 95.37 2.6OE-130 238.614 Public GI no. 34903124 Oryza sativa subsp. 85 94.98 1.39E-129 japonica Public GI no. 53791918 Oryza sativa subsp. 86 67.18 4.99E-88 japonica

TABLE 20

Percent identity to Ceres cDNAID 23385649 (SEOID NO: 88

% Designation Species SEQ ID NO: Identity e-value Ceres CLONEID no: Glycine max 89 S.O.3 12OE-32 474636 Ceres CLONEID no: Glycine max 90 51.8 3.19E-32 1057375 Ceres CLONEID no: Glycine max 91 48.8 4.09E-32 102.7534 Public GI no. 1632831 Ricinits communis 92 51.8 5.8OE-33 Public GI no. 5669634 Lycopersicon escientiin 93 68.2 4.8OE-45 Public GI no. 8895787 Soianum tuberosum 94 49.6 5.19E-32 Ceres CLONEID no: Trictim aestivum 95 604 3.29E-37 638899 Ceres CLONEID no: Zea mays 96 63.1 6.89E-37 348434 Ceres CLONEID no: Parthenium argentatum 97 67.6 3.49E-42 16O7224 Public GI no. 50725389 Oryza sativa subsp. 98 61.9 2.59E-37 japonica Public GI no. 19225065 Reiana raetan 99 75.1 9.09E-51

TABLE 21 Percent identity to Ceres cDNA ID 23387851 (SEQ ID NO: 101 % Designation Species SEQ ID NO: Identity e-value Public GI no. 50253268 Oryza sativa subsp. 102 43.57 S2OE-22 japonica Public GI no. 45826359 Lycopersicon esculentum 103 38.46 5.7OE-16 Public GI no. 45826360 Lycopersicon esculentum 104 36.23 1.2OE-15 Public GI no. 37993864 Gossypium hirsutum 105 38.17 189E-15 Ceres CLONEID no: Glycine max 106 S1.67 2.59E-2O 707 775