US 20160177404A1
(19) United States
(12) Patent Application Publication
McKernan
(10) Pub. No.: US 2016/0177404 A1
(43) Pub. Date: Jun. 23, 2016

(54) CANNABIS GENOMES AND USES THEREOF

(71) Applicant: Courtagen Life Sciences Inc., Woburn, MA (US)

(72) Inventor: Kevin McKernan, Marblehead, MA (US)

(21) Appl. No.: 14/545,122

(22) Filed: Mar. 27, 2015

Related U.S. Application Data

(63) Continuation-in-part of application No. 13/588,935, filed on Aug. 17, 2012, now abandoned.

(60) Provisional application No. 61/600,436, filed on Feb. 17, 2012, provisional application No. 61/575,329, filed on Aug. 18, 2011.

Publication Classification

(51) Int. Cl.
C12Q 1/68 (2006.01)

(52) U.S. Cl.
CPC ...... C12Q 1/6895 (2013.01); C12Q 1/6806 (2013.01); C12Q 2600/13 (2013.01)

(57) ABSTRACT

Using the efficiency of next generation sequencing, a draft de novo reference sequence for the Cannabis (C.) Sativa and C. Indica genomes has been generated, as well as four full-length contiguous sequences with homology to THCA and CBDA synthases and 10 partially homologous contigs with truncated ORFs. In particular aspects, the invention is directed to one or more isolated sequences (e.g., nucleic acid sequence, DNA, RNA, genomic sequence, polypeptide) of a Cannabis genome and uses thereof.

Patent Application Publication Jun. 23, 2016 Sheet 1 of 58 US 2016/0177404 A1
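The abstract distinguishes full-length contigs from contigs with truncated ORFs. As a minimal sketch only, assuming a simple definition that is not taken from the application (an ORF begins at ATG and ends at the first in-frame TAA, TAG or TGA, and counts as "truncated" when the contig ends before any in-frame stop), a six-frame scan could look like the following. The names `find_orfs` and `Orf` and the `min_codons` cutoff are hypothetical; minus-strand coordinates are reported on the reverse complement.

```python
from __future__ import annotations

from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class Orf:
    strand: str      # "+" for the given sequence, "-" for its reverse complement
    frame: int       # 0, 1 or 2 on the strand being scanned
    start: int       # 0-based position of the ATG on that strand
    end: int         # exclusive end on that strand (includes the stop codon)
    truncated: bool  # True if no in-frame stop occurs before the contig ends


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[Orf]:
    """Scan all six reading frames for ATG-initiated ORFs.

    Hypothetical helper: after an ORF is found, scanning resumes after it,
    so ATGs nested inside it in the same frame are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Walk codon by codon until a stop codon or the end of the contig.
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                truncated = j + 3 > len(s)
                end = j if truncated else j + 3
                if (end - i) // 3 >= min_codons:
                    orfs.append(Orf(strand, frame, i, end, truncated))
                i = end  # stays in frame: end - i is a multiple of 3
    return orfs
```

For example, `find_orfs("ATGAAACCC", min_codons=2)` reports a single forward-strand ORF flagged as truncated, because no stop codon follows the ATG before the sequence ends.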


Figure 2B
[Hairpin structure diagram of the putative mPIF transposon sequence; base-pairing layout not recoverable from the extracted text.]
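The mPIF element is described as forming a hairpin. For miniature inverted-repeat transposons, such a hairpin can form because the element carries terminal inverted repeats (TIRs): its 5' end reads as the reverse complement of its 3' end. A minimal sketch, assuming perfect Watson-Crick pairing (real TIRs usually tolerate mismatches, and this is not the method used in the application), measures how far a perfect TIR extends inward from both ends. The function name `perfect_tir_length` is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def perfect_tir_length(seq: str) -> int:
    """Length of the perfect terminal inverted repeat of `seq`.

    Position k from the 5' end must pair (A-T, G-C) with position k from the
    3' end. Scanning stops at the first mismatch, at any non-ACGT character,
    or at the midpoint so that the two arms never overlap.
    """
    s = seq.upper()
    n = len(s)
    k = 0
    while k < n // 2 and PAIR.get(s[k]) == s[n - 1 - k]:
        k += 1
    return k
```

For instance, `perfect_tir_length("GACTAGTTTTTCTAGTC")` returns 6: the 5' arm GACTAG pairs with the 3' arm CTAGTC, and the pairing breaks at the spacer.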


Figure 6A >contig2004 GACCATEGCCCCTCCACCAAAAGTCACATAAAAACACTAAAGGGAGTATTGGG AAATTCTAAATCCCGGCCAATCCCGACATTCCCAATGTCTAAAACCCGTCCCCAAACTA CTA ACATACTAAGTTGTGATTCTACTGAGCCAAACGCCGAGTTCCAA AATACCGGGCAC CGGAATTACAAAATAGGCAAGCTACTGAATAACATAAATAACAGTATAATATATATTTA AATAGCTATAAAAATCATAATGAATCAAAACAACTGCAATTTCCAAATTAACTAA GCGGGCTTTACAGAGGGATATGTGGATCTCAATCTTTTGTATGATTAATCTAATTGAACC AAGGATTCTTTAAGAAAGTAAAATAAGATTATAACTTAACTACATATACCCAAAATTTTA AAAGTAACTCTAGATTATAATTAAAGGAAGTCACTACACAACACTCTTTACTATAATAA CAAGTAAAAGAATAACAAGTACCCAAATTTTATTAATGAATAATTATAATTCAAT GCATGTAAAATAAAACTATTCCATAAACATGTAAGTTAAAAGAATATTAATTTATTCTCC ATAAATAAGAAAGAAATTATATCCATATATCTTAGATTGTCTACACAGTTGT ACTGTAGATOTCATACTTAGGAGCATACATAGTACAGGTAAACAAATTATATAATAAC AGATACATGTATGGTATAAGGATTAACTTAG TATA AGAGAAATAGATATATTAAAAAT AATTAATGACGTCGTGGCGGAAGAGGGGGATGCTTTGCTCGTTCTAAAAAAATTATTG GGATCAACTTTGGTTTCACCTTAACTAACTTGTCAAAGTTTTTACCAAAGTACTTTCA CCCCAGATACGTGCTTGGGTGTAATTATTAGGACTCTTGGGATCAGTTTCCTAAATCA AGGTCCCAAATTGACAACGCCATCGGATTGGGACACAAAGGCGTCAGAAA TTATAAACACTCAAACCCAGTATAGCTTCATATCTTCTTGCTTCTCCCAGGTA GCTGCGTATCAAACTTCGTACATGATTCCAGCTCGATGAGGGAAAGGAATTGTTCATECT GAGATCTCTCCATTATACCACCGAAGGGTACAATACA ACACCCCAACCCTACACT TCTTCATACAATTTCTCCAAAATTTTGACAATTGCAGTTTCTGGAATTGGTTTCTAACG TAGCTAACTTAACTGAGAAAGCTACTTCTGCCCAGCTGATCTATCAAGCAAAATTTCC TTGAAAAT AGAGTGTE ATAATTACA ACACCACTGTAGAAGATGGTAGACAATC CAGCCAATCTTGCAATCAGTTTEAATACCCAACTCAGGAAAGCTCTTGTTCATC AAGTTGACTAGACTATCTACTCCACCATGGAAAATGCAAGAGAAGTAACCGTGTACTGTA GTCTATCCCGATACTATAAACCGGATGAACGGAGCAGAGAA AAATCTTTGTCATACTTGTAAGCAATATTGCCATTATTAAATAACTTGACAAGCCCA TGTATCTCCAATTCCTTAACACTGAATATAGAGCCCTGAGGGACAGCAACCAGT CTAATTTTCCACGCTGCAATGATTCCAAAGTTTCACCCCACCACCACGTATAGCCCAA AAT AGATCTCCCCCATGGATSTCGACAGAACTTTCCACAA CATTGACAAGTGT GCATCAATGARATTATCAGCTGCGAGGCCATAATTCGCATTAATGCTCTATAGCCTCCT CCACTAAAGTGTCCACCTACGCCAACAG TAGGGCAATACCCACCAGGAAAACTAAGATTC TCATAATCCAATAATAAACTTCTCCAAGGGTAGCTCCGGCTTCA ACCCACGCAATTGG CTACGAATATCTATTTCACCGAATGCATGTECTCAAGTCTACTATAACAAAGGGACT 
TGAGATGTGTAGGACAAACCCTCAGCATCATGGCCACCGCTTCGAGTCGAATTTGCAAG CCCACTCTTGGAGCATAGAATAGTGGCTTGGATATGGGAGACATTTGAAGGAGTGACA ATAACGAGTGGTTTGGGGTTGTATCAGAGGTAAACTAAGATTTGTATTGTCAAATTC AGGACAGACATATACAATTGGTCGTGTTGACTGTATACGAGTTTAGATTTGCTGGATTG TTATGAATATATTGGGAGAAGCATTTTAGGAAATTTTCTTGAGGATTAGCTATTGAAATT TTGATATTGAATGAGAGAAAGAAAAATAAATTTTGTAAACAAACCTAAAGGAGAATGTT GAGTAGTACTCATTTTTTAGTCTTAATGATTTTGAATTACTATGAATTAAGAATTG TCCTTCTATAACTTATATATTGGAGCCGGGACTGAGAAATTGACTTAAATAATTCTAT AATGGTAGTCATAAATAGCGACAATGGTAAAAAAATCCACAAAAATTAGGCACATT TGTTAGGGAAGCCCTTAATGAAAATAAATCTAAATTTTACATTGAATAGTTTCGCATCA TTATTTATGGAAAGAAGTTAAGCTGCCAAAATTGACTATTGGAAAAAAATAAAAAACAAA AAATTAGATATGAAAAGAAAGGACACGCAAATATATCTATGGATCTATTTGTCTTCA AACTATAGCTAGTGTGGGATATAGGTGCCTATGGCACTTTCCACCTTTTTTTTCA TAATATATATTTTTTGGGTAAAAATAAGAACTATGCAAGTATACAAAATTACTTTA TAGTTAACAATATCATCATTATCATTCCCATAAAAACTAAATATGAACTACCAATA TATATETTATTTTTATTGGTAATCAAAATTAGAATTTTTGAAAAGTAAAATAACATAT AGCAA AATTAAAGAAGAATAATAAAGAACAGAAACAAACAAAAATAAAATGAAA AATTATGAGTTCTAATCTCCTCTGCTATCTACTTTATTATTCAATATTGATGATGGATAT Patent Application Publication Jun. 23, 2016 Sheet 50 of 58 US 2016/0177404 A1

Figure 6B

TCAAAAGCATTAATTATTCGGATTAAAAAAAACAACTTGACCTATTGAAATCTCTA ATCTCTCTCSAGAAATAAAAATAGGAAGCAAGAACTCAA CTCAAAATCGCATAGAC CAATCATGCTCTGATTCTAAATTAAAATCTATGTCATTTTTATCATATTAAAATCT ACTTCACGCTTAGTCGAATTCATAAATAGCATGTAAGAGGTGCGTAATAAAT ATTCACAATAAAACAAATTTAATAGATCTTTTAAATTTTGTATTTTTTTTATTAAAT TAAA TATAAAATCAGAAAAAGGATAGAAAATCATTATTGTAAAATAACATGAA AT"TTAAAAGATAGE EAGCATTTTTEAAATGATATTTAAGTTAGT"TGAAACCTAATTTT CAAATGTAGGTTAATAAATAAAAATAAATTTATAAAAAAA CCGAAAATATTTTTTATTGTTCATTATTTCGATTTTTTTAAATTAATTAATTAATTTA AAAAAATAAATCCACACTAACTATCCAACAACTTGGCAGGAGTAGTGTTT AGCTTTGTGTAAGTTTTATAAAACCTGTTATTGCTGATCTACATTGGCATGGTTAACT GTGACAGACCAATGACTGATAACCCAGGTAAATGTCAAAGGCAAGAA ATAATTGGTAACAGGTAAATTTACATTCTCTGTTGGAAATATTTTACCAGGATCTTA GATTACAACAAGTATGTTGTEAACACCCTAAATATGAACTTCTAAAACGATAATTA AACACATAAAAGAAGAAAACCTTACATGATGGCAGCGGAATAATGTCCCCCAC TCAGATCTCTAACCATTGTATCCTTTCTGTCGCGGAGTATTACAAGATCTGAGCCCGAA TGTCCTTCCCTTGGTGGTCCCACAGTCTCCAATCRA GATTGAGGEACCACT GCGGGTGGGCACTACTCTATCACTAAAGTTGATATTGAA AGAGGAAAAGAGAGA GGGATAGGGTCGGCTATAGAGAGAAAAGTGGAAGGCTCAATTTTCTGAAAAAAAAAATC TCTGATTTCGACCAAAAGGCATATAAAACCTATATTGACTAAGCCACCACTTTCTA TATAGGCA ACACAGGTTAGGTTAGCAAATTGGCATAAAATAATGAAAATAT CAACGGCAATATCGCTAAGTGCCGGCCACAGGGTA"TGGGCTCACGGATE GCAGTCACAATTACACTCAAAAACGACAATTCCAATCTAACC TTTCAAGCCAAAACAATTATAAAACTAAAAAGATTATAAATAATATTGCCA AATTAATAAATTAGACAAAGAGTCCTAATAATAAAAAACCAGAACT CTTCTTTACAATTTCACCCCTGCTTAGTGAAAATTCACAAATTAGACTAGC AACE TAGAATTATAATGATAAAAAAATCAAATTGACCACAAGAGTAAGTCTC AACTAGAAGGGGACCATGGATCTATATGCTGACCTCCAATAAGGAACTGAATAC AAGTAAACCCAA CATTAATTCCTCGGAACCACTCTTAGAACTTAGAATCCACA ATATCAATATGCTATCTCATTTAACCATTGTTATAATCTTATTGFGATTTAAAGATCCTC TATATAGATGATCTACATCAAGATGGGATTCTTACCGTCCACCCCTCAATGATT GCCCCEAAAACACAGCACCEGTAAATCGGTGEAGTCACTAATAATAGTCAGT TAAACAAGAGCTCATCC ATTTACTTCTATTTGCTAAGTTCGAAGGGAATCATCACTTGAC TCTA ACACCAGAGAAGCTATAGACCAAATGTAGCACTCCCACCAATCA TACTATCATCTCCCAAAATATACGTATCACCCTGACCTAAAAGTAGGTTT. (SEQ ID NO: 407,642) Patent Application Publication Jun. 
23, 2016 Sheet 51 of 58 US 2016/0177404 A1

Figure 6C >contig34396 TGCATGTTTATATATAGGACCGAATTATATAGTGTGTTTCTATAATACTTCCT AAAAGGGATGCATTATGCATTGATACACCACACIATAGCATCGAGGATAATTT TTTAATTTTTTTTATATATTCCTGTATGTATAGTATTAATATATTTATAAAATT TAAAAAAATTAAAATAATTACAATATAGAAAACA GTGTTCAACCAATCATTTCACACG CGTACAAAATAAAATAATCATGCGTGCAACATAATGAGTGAACCCTGTETCAGCGTATT AAATTTTTTTTGAATTTTTTAAAAATTGCAGGATATCTTAAATAATTATAATGTACAA ACCATAAAAAAATAAATTAACTAAAATATTACTTCAAATATCAAAAT AGATAAGATACA TGCATCCCTTAAAAAGGTGAATGTAGAATTCATATTATATTATATCAGTACAA ATATTAGAGGCGTCCCGACAGAAGTGGTGGGATGCTTTGTTCGTTTCTAAAGAAATAC TTGGATCGACCTTGGTTTCACATAACTAACCTCTTAAAGTTCATACCGAAAACTTTT TACCCCAAACACTTGCTTGTGAGTAACTAATAGGACCCTTCTCGTATTTGTCCCAAAT CAAGGTCCCTATAATGAGATATGTAGCTCTTGGATTGAGACACATAAGGAGACATGA AATTATATGCACTTCGGACCCAATTCATATGCTTTTCACTTTCTCCTTCTTTCTCCCATG CAGACAAGTACAAAATTTTGTACATAAATCCAGCTCGATGAGGGAATGGAATTGCTGATT CTGAAATCTCATCCATTTTACCACCATAAGGGTACATCATAAGAAACCCCACACCACAT CTCTTCATATAACTTCTCCAAAAGTTEACCACACAACTECTGGAACTGGCTCTBTA CATAGTCTAATTCACCTTGAAAAATACATTTTGTTGAAATCGTATAAGTAAAGATT CTGGTGTATCCCCACTTGAAAAACCATCAAAGTAAACGACAGTTTCAATCCAGCTCTTTT CGAGGCAATCTTTCT CAATTCCCAACCAGGAAATTCTTGCATCAA AGAGGA GACTCTCAACCCACCAAGAAATAAGAAGAGAATGAACCTTGTATAGTTCGTCTGTATT CCCTTGGAACGGTAGAATAAA GTCATGAAGCEAACAAAGAGAACAAACTTGT CAAACTTGTAAGCAGTATTTTGCCATTTGTTATATATCTTCACAGTCTCATTCATCTCCA ACTTACAACAGAGAA CATAGTAACTTAGATGGAACTCCAACCAATCTGATTCC AAGCGAGAATGATTCCAAAGCGCTCCACCACCACCACGTATGGCCCAAAACAAATCTT CTCCCATAGATCGGTCAAGGAATTCCCATCTCCGTGACTAAGGAGCACAATGA TATTATCAGCTCCAAGGCCATATTTCGCA ACAGTGCTCCATAGCCTCCTCCACTGAAAT GCCCACCACCCCAACAGTACGACAG ACCCAGCAGGAAAACTCAGATCCCATTTECT CAGCAATCCTATAATAAAGTCACCAATGGTAGCTCCAGCTCAACCCATGCAGTTTGG TATCACGTCTACAGTGATTGAACGTAGATTCTCATACTAATATCACAAATGGGACT TAGACACATAGGAGGCACCCTCAAAGTCATGGCCACCGCTTCGAGTTCGAATCTGCAAGC CATATTTCTTGGAGCATAAAACACAGGCTGGACATGGGATGTATTTGAAGGTGTGATGA TAACTAGTGGTTTGGTGTTGATGGAGAAGCGAATCTAAGGTTTTGTATAGTTGAATTTA GGACAGAGATATATGATGGATCATTTGGAGTGTGTATGACTTTACAATTGATTTATG TATTGTTGTTGATATGTTTGGAGAAGCATGAAGAAAGTTGTCATGTGGATTAGCTCGAG 
AAGTTTGGATAGAGAATGAGAGAAGTGAAACTAATAETTTGCAAAGAAACCAAAAAGAGA ATGTTGACTACTTCATTTTTTTTGCAATTAACATGAGTTTAGACCTTTCTTATATATAGC ATTATATGTGGAAAAGGAAACAAAACTAAATTAAAATAAATGGGTGAGGAAGGAACCT AAACAGATTCACACCAAAAAACGTACTGACAATAA CATTATAGTTTCATGTATAAAATAATAGTCATGATGATGCCAA CGTTATCTTCTAAA AAAAACAATAATATGACTATTGTAATAATTATATTGAATGGATACGTACT TGTTGATTTTAGAATTATATGAATATCTTCTGTGTGTGATTTTTTTTTTTCAGAA AATGAAAATTGGCCTCATGAAAATCGCAAAAATAAG ACTAGGGGAACTCGAGGGGGATAA AAGAGAGTTAATATCATTGATGGAGGAGGTCGATGAGGGTCTCAATATAAATATCTCATG GACTTCTTACTATEGAAACCAAAGTGTATCATACAATACTTAGTGTGCACCT CAGAAAAATGTGTATTGATGCACTTTTTATFATTTTGGCATAAAAAAATTGTTTAATTFT TTATAATAGTGTACGTGAGTTATAAAATATCATATAAAATATTAAAAATTCAAAA ATCATAATCATAAATGTTGTAGTTGATAAATTATAAAAAAATTTCAAAATTTGAAAA ATATETAAATAATTACAACCTATTTACAAAGAAAAAAAAAACATATACATTTACAA AAGTTTACATTTAGGGTGTCCATCAAAGTTAGTCCAAATCGTAGATCTCAAC TCTCTCCATCAAAATTACFTCTATATAGCCAACTTATTCAATTTCCTACCTTGAA TTGFTACTTTTCTCCCACTCCATCCTCATTCCTCCCCTAAACACCATGAGATCACCAATG AAATTCCATTTTTATAATTACAAATTTATTAETTAATTAATCAATTAATTAAACAGTGT ACAATAAA GTCAGTACAGAAATAGATAATTTGTAACTATAGA ACACGATTETGTCACTAC Patent Application Publication Jun. 23, 2016 Sheet 52 of 58 US 2016/0177404 A1

Figure 6 D AAGAAATGGAACCTTTTATCTCATTTTATGCTTAGAAAAATAAAAAGTGGGATAAAAA GTAAGCGAAAATTTACCCCACTTGTATAAG ACACAAGGGCCATGTTTGGTTTAAC ACTATCAAAAGGAAAATATTAAACAAGAAAAAGGTAAATATATTTTTTATTGACTA GCCAAACACACTCCTAGTGGGTCTCTACTACACTTTGGAGCAAGGGCCTTGTTGGGTA CGAAAAAATAAGTTTTCACTCTTGATATTGCGAAACCAAACGAGGTAAAAGGGAA CCCTTCCGGGATAAAAGGGTTTCCCTAGTCCTTTTATAAATACTTTTAGGTTTAGGGTT ATCTCTCCATCCAAGTTCTCTCTCTTGCGCCCTCTCTCTTGTTTCTCCCAATCT TTTGCATTTCCGGATTCTAGGCCGCGATCTCAGCCACGCACGAAGGGCCTTCTCTCC TCTCCACC ATCTTACCCACACATCGGAGAAAACTCGCTGAAAAAGGCCTGAA GCTCTCTCTCCATTGCTTCTCTCTCTCTCTCTCTCTC (SEQ ID NO: 407,644) Patent Application Publication Jun. 23, 2016 Sheet 53 of 58 US 2016/0177404 A1

Figure 6E >contig3207 ATTGACGATGTTAAATTAGGACTTGTCTAATTTCATTCAATTACTATTTFCAGTGA TTGTTTATGTACTAAACAATGCTCCCTAGACTTGATATCTACCAAATCATGCTCCTAA ACTTGATATGTACTAAATCATGCCCTTGAACTTCATCCATGTAGAATTTTTACT AAAATTAGACAAAAGTCCTTAAATCTAACAATCTCAATAGTTCACGGGGGAATTTTATTA ACGGCCAAAAAAATTAGGGGCATGATTGGTACATGTCAAAATTCGGGGGATTACTAAT TAGCCAAATAAACAAAAAAAGAAAGAATTACGAAATTTAGTTAGAAATATTTTTATTA ATTGAAATAATAAAAAAAAATAAATAAAACTA AGATTGTTATGTTATATACCCT CAAATCAATATATATATATTGTTGAATCTGAATTAGAATATTAGTGGACTTTTGGAATT AATAAAGGTAATAAAAAAAATTGTAACTAAATTATTATAAAAATAAGTACAATGG CTGTTCA AACAAATCGTCCTAATTTAATATGAAATAAATAGACTTAATATATACACACT ATTGTAGAGTGAGGAACTATACGTAAGTTGGTATTCTCCCTTAATGCTGGTTGT AATCAATATTACGAAACTTCATTTCTAATGACACAATAAAGTAGTAAAACTGCAGCTA TTAGAAAGGTAGGGTCAATTATATCTCTAATAAAATAACTATAAAAAAAG ACTAAT TTGTATGTAAG ACCATTGAACTGAAAGCAAACATTTTTATTTTTTATAAATCATGACG AAGAAACACAAATTAATTTATTGTATAGATACATGAAGAAACAAACCGTACGTATAAATA TTACAATACAATACAACAAATAATAGTAAATTATA ACATAATATATATATATATAT ATGCAAGAGAAACGGGAGAAAAGGGGAAGCTCTCGTCCGTCCAAAGAAA AGTCGGGTCA ACCTTGGTTTCACTTGAACTAATTCTTAAAATTTTTATTGAAGTACTT TTTCCCCCAAATGCTGCTGTTCATAACTTATAGGACCTTCATATTGTCCCAA ATCAGGTCCTATAATAATATATGAACGCCGGATTTCCGGACACGTAGGAGTCA GAATAAAACACTTCGAACCCAATTCATATGCCTTCACTCTCTCTCCCA CTGAGACCAATAAAAATTGACATAACTCCAGCTCTATGGGGAAGGAATTGCAGA TTCAGGAATCTCGCTCATTTEACCACCATAAGGGTACATTTGAATAATGCCAATCCTAC ATCCTTCATATAACTTTCCAAAAGTTTGACCATAACAATTTCTGAAATTGGCTTCCT AACGTAGTCCAGTTTCCCCTTGAAATAACCCTTACTACATTGTGGACGAGCAAAAC CTCCAATTATCCTCACTTGACGAATTCATTGAAGAAAAAGATCATTCAATCCAACTCGT TTCAAAACAATCTTTTCTTACTACACCCAACTCAGGAAAGTTAGTGCATCAAGGAAAG AAAATTACCACTCTACCAAGAAAAATAGAAAAGAATGTAGCTTGTATTGTTGTCATATT CCCCAGCCATCAGTAGA ACACAGAAGAACCAACACCACAACAAACTG GTCAAACTTGTGAGCAATATTTTGCCACTTATATAAATTCACAGTTTCATTATCGG CAAGTECTACTAACGATAATACAGTAACCTTAGATGGCACAGGAACCAATCTAATTT CCAAGCAAGAACAATTCCAAAGCTFGCTCCACCACCACCACGAATGGCCCAAAACAAATC TCTCCCATAGACGGCAA CGAATCTCCACAGCGGACAAGTGAGCACAAT GACATTATCAGCTGCAAGGCCATAETCGCATCAATGCTCCATAGCCTCCTCCACTGAA 
ATGCCCACCAACCCCAACACTATGGCAATACCCACCAGGAAAACTAAGATTCTCATTTT CTCAGCAATCCTATAATAAAGTTCTCCAAGGGTAGCTCCAGATTCA ACCCATGCAGTTTT GTTATCTACGTCACAGTGATTGAACGTAGATTCTCATATCTAATATCACAAATGGGAC TTTAGACACATAGGAGACACCCTCAAAGTCATGGCCGCCGCTCGAGTTCGAATCTGTAA GCCATATETCETGGAACAAAAACACAGGCTGGACATGGGATGTATTGAAGGTGTGAT GATAACTAGTGGTTTTGGTGTTGTTGAACTGAATCTAAGGTTCTGTATGGTTGAATTTAG AAGGGAGATATATGAAAAGTCATTTGGAGTGTGTATGAGTGGCAAGTGATCATGT GTTGGAGATATGTTTGGAGAAGCATTGAAGAAAGTTGTTGTGAGGATTAGCTTGAGAAGT TTGGATAGAGAATGAAAGAAGATAAACTAATAATATTTGCAAAGAAACCAAGATACTGA TGAGTACTTCATTTTCTTAAGCCTTATATGTTTTGCATTAGCATGAATTGAGATCTTTCT TATATACAACTTCACGCAAGGAAAAGTACACAAATCATTTTTCTATATATATAAAAAA TTAAAAATCAGGAAGCAGCTTACACCTTTCATTCGTGAATGATTCTCTTATATTTTA TTTTAAAACAATAGGGTTCCTTAGTATGCCGTATTACACCGTATTGTATTGTATAAG ATTATATTGAATTGTATTTCATGTAATATTTEATGTAAAACTATACGTGGTATTGATTT TTAGGGCACTAAATAATAATATTTAGTATAAATAAAAAACATTGTATATAT AAAAATATATATATAATCAATAATGTAATACAATACAAATAACTAATACAACGTA CCACACGAGCTCTAAATAAATTATATGAACAACTTTGTTGAATTCAAACTTATAATA ATAATCTTTATCTTTGACTAATTTTAAAAATTGAATTTTTCACAAATATTACCTTGCAT CAATTTTTCCTTCATATTTAATCACTACCGCCGTTGCCCAATTGATCCATCAAACT Patent Application Publication Jun. 23, 2016 Sheet 54 of 58 US 2016/0177404 A1

Figure 6F CACGAAAAGAAAATGAATAATGACTTCACAGTGGTTCGGAAAAAAAATCATCTGTCCC AATTATTCTATCACAATCACTCCTTATTAGATAACAATCTTCTTCGAGACAGTATAAT TAAATTCTAAAATAATTTTTCGAGTCAGTAATAAACT CAATTGAACAAAAGTTCTA TTTTTATTTCCATATCCAATAATAACTACACAATTATATATAAATTTCACACCCTTTAC AAATTACAATAAGCCTCATAGTATTTTCATAGTCTAAACACAATGTGCAAATTGGG TTAAATAATCCAAAATATACCATAAAATTATATTTTGTATTTTTTGTAGTTGTGCT CAATGAAGAAGATGCAAATTAGCTATAATAAAAGTAATAACTATAATGCTGCTTG TTEGGTATTATATATGGTAGTTCGTGAGATGATTTTAATCAATTTTATTTTGATATTGA TTCGGTTAATTAATTATTTGTGTATTATTAGGACTGAACATCGAGCCGGTTTACCGA GTTTCACATTGGCGGGTTTCGGGTTCACGGGTTTAGGTTCAAAAAATTGAAACCAAACC CGGACCCGAATAGTGCACGGGGGCGGGTTGGCAGGTCACGGGTTGGCGGGTTTCGGTT GGTCGGGTTTAGCGGGTTGGCGGGTTGACCCGGTCAA CTCAGTGAACTCGGTCAGCTCTT TAAAAAAATTAATTTTTTATTCAAATAA (SEQ ID NO: 407,646) Patent Application Publication Jun. 23, 2016 Sheet 55 of 58 US 2016/0177404 A1

Figure 6G >contig2087 GTTGTTTTGTTGTATATTTATGAAAAAAGCCCAATTTTTTGAGGTGTATGGTTCTETT ATGTTTTTCTAAGTGCTTATATCTGCTTATTGGTTTACATTTATTTTTGTGTTGTTGTAG TAATTTTTTTGTTCTGATTTTTTAAGATTGGGCTTGCAAATGTAACTGTTATTGCATTG CGTATTGAGGTAGTGTAGAACATTTTTTTAGTTTTTTTTAATCATGTTTTCCTCTCTTT TGTTGATTGITAGTGTATTACAGTGTTTGCCATGATAGTATTGACGTC AATTTCACTTTTTTTTTCTCAATCTCACCGGAATTTATGGTTGTTTTCCAATTGTTCTAA TGTTCCTGGGTGGTTCTATCCGTGAGTGTAGAAGATGGAGGCCTCATTTTTTTTTGGCG TTTACAGTATAAAAGAAAAAAAATEGGCCAAGACAGTATAAAAATAATTCTGCCGCGTG ACAGTATTTTTGTAAAAATTAAGCCTAAAATCAGTATTGTAAGTCCCAAAAACTT ATTAGTAAAAGATTTTATTTAGGCCTAATAGAACTAGGGGTGTTCAATGTCACATCCAA TGTTAGGTCTAGCAGATCCGAFCCAATCATAATGGATGTTAGATTTTACCATCC GATCCGATCCAATAATTTCCATCATCCAATTGATCTAACGTCCATTGGATGTEGGATTG GATCGGTTCTACCATGGATGACAATATATITATTGTAATTTTGATTFATC TAATATTTTAGGCCTAGAATTTTTTGGATCAGATCGGATCCATCAAATATTTTAGGTCCC ATATGTATTGGATTGGACTGGATCC ATCCAATCCAAGAAAAAATAGTAAAATTTAATGT TAATTTTCATATATACATATAAATTGACCTATAACAATATAAAATAATACTCATCCA AACTAAAAAAAAACAATAGTAATGGATGTTAGATATATTGGATTAGACAC CCCATCCACCATCCGATCTCATCCAATTGGATATAAAAATAACATCCAATCCGATCCG ATCACAATGGATATCCAATGTTTGACGGTCAGTATAATTGGATCGATCAGTTCTCATT GAATTGACGATTTATACACACCCTAAATAAAACATTAAGTGAGGTTGTAATTAAGATA AATCTAGTGCACAAATACAATAAGATTTATTATATGTGAAAAAGTTAAGATATAATT TCTATTTTTATCAAAAGATAATAAAATTACATAAAATGAGTFTCATTTCAAGCATAAAA TCCCAACACATGTTTACATAAAATTCCTTACAATAACTACAEACAATAGAATGAAAT AGTCCACGTGAACATGATTTTATACACAATAAATATATAATTATACATAAACAAAGATGA AAAAAAATAGTGGAAAGATCAATGGTACTATCCATTATCTTAGACTGGCAGT GGCAGTGGAAGAGGTGGAATGCTCTGTTCATTCCTGAAGAAATTTTGGGGATCAACTTTA GTCTTCACATGAACTAACCTCT CAAATTGTCTTTCCCAAAATACTTTCTACCCCAAATA CTTGCTTGTGTGTAACTCGTGGGTCCCTTGTCGTATTTTTACCCAAGTCAAGATCTCTA TAGTFTACAATGCACTCTGGGCTEEGGACACAAAGGAGTCAGAGITATAAACA CTTCTAGGCCAATTGATACTCCCATCATCATCTCCTGGTTTTTGAACCCATTGAGCATAG TAGAGAATGTAGAGGECCCGCTCATGGGAGAAGGAATCCGA"CAGAAATC TCATTCATTTTCCCACCATAAGGAAACAACTGAAAGAATCCCACTCCAACATCTTCTTCA TATAACTTTCAAGCATTGTTTCAATACATTCTCTCGTATAGTTTTCTCACGTAGTCA AGTTTACCTTTGAATGAAAATAACCATTGTTGAGTTCTATTAAGCAAGTTTCCATCTCA 
GCTCCAAAGGGAATCCAGCAAAATAAGAAACAGACTCAATCCAACTCATTTCATTGCAA TCTTCTCTTTCA AACCAAACTCAGGAAAACTCTTTTCCATTAATGAAAGAAGATTATCC ACTCCACCAAGAAACAATGACAAGAATTGGGCTTGGAGTATAACCTTATTATTGAGA GAATTCACAGTCGAGAACCTAATAAGATAACCAAATCATCATCCAACTTGTCAGCAATG TATTGCCACTTGTTCACAAGCTTCATGGTCTCATTTTGGCCCAAATCCCTATAACAATG AATGTTGECACTGGATGGGACAGGAACCAATCTGATTTCCAAGCGAGAA CGACCA AAGCTIGCTGCTCCACCACCACGTATGGCCCAAAACAAATCCTCCCCCATTGATTCTCGA TCAAGAATTTTCCCATCA ACATTAACAATGTAAGCATCAATGATATATCAGCTGCTAGG CCATATTCGCACCAAAGGTCCATAGCCACCCCACAAAGTGTCCACCAACGCCTACC GTAGGGCAAAATCCAGCAGGAAAGCCGAGATTTCGACTTTTCTCGGCAATCCTATAATAA ACTTCACCAATGGAGCCCGGCTCAACCCAAGCTGTTTTTTCATCCACGTTTACGTTG ATAGAACTTAGGTTTCTCAAGTCTATATAACGAATGGGACTTCAGAAA CGTAAGAGAGA CCTTCGAAATCATGGCCACCCCTTCGTGTTCGGATCTCTAACCCATGTATCTTGAGCAG TAGACAGAGGCTTGGACATGGGAGGCGTTTCAAGGTGTAACGATAACGAGTGGTTTGGA GTTGAAGGAAAGGAAAATTAGGTTTGTATGTTGGAGTTCATTACAGACATAAACGAC GAGTCGTTCGAGTGTATGTGAGTTCAGCCAA AATGGTTGTGTTTGAGATATTGTGGGAC AGACATGAAGGAAGCCTCATGGGTGAGAAGCATGAGTAACGATAATGCAAAAAGAT AACAATGAAAAGAAAGCTAAGTAGTACTTCATTTTTCTGGTAAAGTTTTGCCTTTTTG GTATGTACAATATCATATAAGTCTGCCATATATATAAAGTATTATATGCTGCATG Patent Application Publication Jun. 23, 2016 Sheet 56 of 58 US 2016/0177404 A1

Figure 6H ACAGAGTAGTATTGTATGAATATATTAATATATATATCTATGAAATTTCAACGCGTATA AAGTTCAGATTACCATATTGATCTAGTTAAATGCTTACTCGGATGATGAACTACCAG TTCAACAATTATAAATTAATACTAACAAGTGTGTAATTTGTTTTTTTAAAAAGGATTTAT CTGTATATCTTGAATTCTCATATTATCATCTTATTTTAATAATATAAGCTTTACGTA GGTACTTATTCTTATGTAAAAAGGGTACGCAAAATTGAAGCCGACAAAGCCGACAAGAA AAATTGACTTTTTTGAAATCAAGAAAATATTGACTATATATATACATATAAATGA ACACACATTAETGTGTGTATATATAACATATAGTAGTCAAATAAAAATTTAGTAATATTA TCGTTAAA CTCAATTATGGTOGATTGGTAGAAGAGTTATATATACATATTATATATATT TTCGAGAAATATATATATATATTAAAACTAGACGGTTTGAACACATAAGATCACGC TCTATAATATATTAAAATTTGTTAAAAAAATAGCTGGATTATTTTTAGATATCTCTC TGTTCACAAAGTGAATTAATCAACAACTAATGATTATAAAACATTTAETTAGAAAGAC ACGTAATCGAAACATCAACGAAGCTGAAATTCTTTTAAATTGTTCAATTATGCAAATATA TGTETACATATATAATATACTGTCCTATATATATATATGTATATAAATATATAA ATACATCTGATCAAACATTGTTATAAATACTAGAAATTTGGAACTGGGTCAGCCGTATT TATATTTTAATTGAACAAGCTACGTACAAGTTACATGCTTTTTATAAAATTATATT TATCTATTTTAAA CGAAAATTCAATATCAATGAATTATCTAGTTGAACTCTTTTT TTTGACGAAAAAAACTGCTATATAAACAAACCAGAGAGTCATTTACAATAACAGAA TTATAGGCGGAGGAATACAATCCGACCAAAAACAATCTCATCTACCCCTAGTGCAAAT TAGCCAAGCCATCAGCAGCCATATTAGCTGTGCGCTTGACATGAGTTACATTTACATA GGGAAAAAAGAAAGAAG ACTAGAAACATCCATGATAATATCATAAAAACTGAGGAAGCT ACTGTGGGGAGCAAATTGCATTCTCACTCGAAGTGCATCTGTCTCTATAAGCAGATGA GGAAATTGAAAGTTTAGAGCATAATGCACACCATGGAACATAGCAAGAGCTCCATTCA TGAGGTTCAAAGCTGCCTAG CAATTTCTGGAGAAAGCGGCTATGACCATACCAGAGTTG TCCCGTAGAAGAGCGCCCAAACCAGTAGCAGGTTGCGGTCCACTGCAGCATCTATA TCAGCTEGTAAGCACCTGTGGGAGGTGGTTTCCATGGGACGTCGTFACGATTCGAAGCA GAGGGAGTGGCATTCTGCATTGGTGCTGCTGCACGAGCTGACAAGGTTGCTGATATTG GATATAGCE GATCTGTAATTCACGCAGGTAGTFTCCAGCAAAGGAAGACAACTCCATGGCC TTTTTGGCTTTGTGTCCATGTATAACTCTGTTTCGTTCAGACCAGATAGCCCACAAAATA CATACAATTGTTCCATTCAGCCTTAGAGTGAATACTGTAAGGTGAACCAAGTAATCC CCTTTATGCATTGAAGAGGCTATACCAA CAAAGTTGAAACCAGAAATTCTCCACACA GCTTGGCATACTACAACCAAAAAGAGCATGTCCCACAGTCTCCCACGCTGTTACAT AATGAGCAAGTGGAATCCGTAATAACTTTCCGTGACAAGGCCAGTGG (SEQ ID NO: 407,648)
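The sequences in FIGS. 6A-6H (SEQ ID NOs: 407,642 to 407,648) are the templates against which paragraph 0020 describes complementary primers and amplicons. As a minimal sketch, assuming exact primer matches only (no mismatch tolerance, melting-temperature or primer-dimer checks), the expected amplicons on either strand of a template can be predicted as follows. The function names and the `max_size` limit are hypothetical, and any primers used with it are illustrative rather than primers disclosed in the application. Non-ACGT characters in a template (such as OCR residue in the figures) simply never match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list:
    """All start positions of `pattern` in `text`, overlapping hits included."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def predict_amplicons(template: str, fwd: str, rev: str, max_size: int = 2000) -> list:
    """Exact-match in-silico PCR on both strands of a double-stranded template.

    Primers are written 5'->3'. On a given strand, the forward primer must
    occur verbatim and the reverse complement of the reverse primer must occur
    downstream of it, with the whole product no longer than `max_size` bases.
    """
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)
    products = []
    for strand in (template.upper(), reverse_complement(template)):
        for f in find_all(strand, fwd):
            for r in find_all(strand, rev_site):
                end = r + len(rev_site)
                if r >= f + len(fwd) and end - f <= max_size:
                    products.append(strand[f:end])
    return products
```

Scanning both strands means a target is found whichever orientation it has in the assembled contig; each product is reported 5'->3' starting at the forward primer.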

US 2016/0177404 A1 Jun. 23, 2016

CANNABIS GENOMES AND USES THEREOF

RELATED APPLICATIONS

0001. This application is a continuation-in-part of U.S. application Ser. No. 13/588,935, filed Aug. 17, 2012, which claims the benefit of U.S. Provisional Application No. 61/600,436, filed on Feb. 17, 2012, and U.S. Provisional Application No. 61/575,329, filed on Aug. 18, 2011. The entire teachings of the above applications are incorporated herein by reference.

INCORPORATION BY REFERENCE OF MATERIAL IN ASCII TEXT FILE

0002 This application contains sequences (SEQ ID NOs: 1-407,689) and information concerning the sequences (annotated genome and single nucleotide polymorphisms) that are contained on one computer readable form (CRF) disk and two duplicate copies (Copy 1 and Copy 2) of three (3) compact disks, all of which are herein incorporated by reference. Each disk contains a sequence listing for SEQ ID NOs: 1-407,689, and the disks are identical.

0003. Each disk is identified as follows:
0004 Disk CRF contains the following:
0005 File name:
0006 4747.1000-003 SL.TXT, created Mar. 23, 2015; 814,928,661 Bytes in size.
0007 Copy 1 contains the following:
0008 File name:
0009 4747.1000-003 SL.TXT, created Mar. 23, 2015; 814,928,661 Bytes in size.
0010 Copy 2 contains the following:
0011 File name:
0012 4747.1000-003 SL.TXT, created Mar. 23, 2015; 814,928,661 Bytes in size.

BACKGROUND OF THE INVENTION

0013 The non-psychoactive cannabinoid [...] has recently been shown to promote apoptosis in tumor cells. Eighty-four (84) other cannabinoids have been measured in Cannabis, but the genetics governing the synthesis of all of these compounds are only partially known.

SUMMARY OF THE INVENTION

0014 Described herein is a de novo assembly of the medicinal plants Cannabis Sativa and Cannabis Indica. These diploid assemblies range in size from 280 Mb to 303 Mb, are 67% AT, and have mitochondrial genomes of up to 366 kb. Of particular interest is an mPIF transposon-mediated copy number variation in the synthase genes responsible for the conversion of cannabigerolic acid (CBGA) to tetrahydrocannabinol (THC). Also evident is high diversity in the limonene and alpha-pinene synthases. In total, the data provided herein increase the available sequence knowledge for this plant more than 70,000-fold, and over 98.6% of the Cannabis sequence in GenBank is covered by the 300 Mb assemblies described herein. These data provide selective breeding strategies to maximize medicinal expression and attenuate psychoactive content, while also providing a tool for genetic prediction of cannabinoid expression and chemotypes at seedling stages.

0015. Accordingly, in one aspect, the invention is directed to a nucleic acid comprising a nucleotide sequence that has about 82% identity to SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646 or SEQ ID NO: 407,648, or a portion thereof that encodes a biologically active synthase, or a complement thereof. In a particular aspect, the invention is directed to a nucleic acid comprising SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646 or SEQ ID NO: 407,648, or a portion thereof that encodes a biologically active cannabinoid synthase, or a complement thereof.

0016. In another aspect, the invention is directed to a polypeptide comprising an amino acid sequence that has about 67% identity to SEQ ID NO: 407,643, SEQ ID NO: 407,645, SEQ ID NO: 407,647 or SEQ ID NO: 407,649, or a biologically active portion thereof, such as a biologically active portion that functions as a cannabinoid synthase. In a particular aspect, the invention is directed to a polypeptide comprising SEQ ID NO: 407,643, SEQ ID NO: 407,645, SEQ ID NO: 407,647 or SEQ ID NO: 407,649, or a biologically active portion thereof, such as a biologically active portion that functions as a cannabinoid synthase.

0017. Other aspects of the invention include an antibody that specifically binds one or more polypeptides described herein. Also encompassed by the invention are vectors comprising the nucleic acid sequences provided herein and cells comprising the vectors.

0018. In another aspect, the invention is directed to a method of producing a Cannabinoid synthase comprising maintaining a cell comprising a vector comprising the nucleic acid sequences provided herein under conditions in which the Cannabinoid synthase gene is produced. The method can further comprise isolating the Cannabinoid synthase produced by the cell. In another aspect, the invention is directed to a Cannabinoid synthase gene produced by the method.

0019. In yet another aspect, the invention is directed to a method of detecting a Cannabinoid in a sample comprising detecting the nucleic acid sequences described herein in the sample, wherein if the nucleic acid is detected, then a Cannabinoid is detected in the sample. The invention also encompasses a method of detecting Cannabis in a sample comprising detecting the polypeptides provided herein, wherein if the polypeptide is detected, then a Cannabinoid is detected in the sample.

0020. In still other aspects, the invention is directed to a method of detecting one or more cannabinoid genes in a Cannabis plant. The method comprises contacting all or a portion of a genomic sequence of the Cannabis plant with one or more primers that are complementary to SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646, SEQ ID NO: 407,648 or a combination thereof, thereby producing a reaction mixture. The reaction mixture is maintained under conditions in which one or more sequences in the genomic sequence of the Cannabis plant that are complementary to one or more of the primers hybridize to the one or more primers. The one or more sequences that hybridize to the one or more primers are amplified, thereby producing one or more amplicons, and all or a portion of the sequence of the one or more amplicons is determined, thereby detecting one or more cannabinoid genes in the Cannabis plant. The method can further comprise quantifying the one or more Cannabinoid genes; measuring the Cannabinoid messenger ribonucleic acid (mRNA) of the plant; detecting whether fungal nucleic acid, bacterial nucleic acid, or a combination thereof is present in the plant; quantifying the fungal nucleic acid, bacterial nucleic acid, or a combination thereof if fungal nucleic acid, bacterial nucleic acid, or a combination thereof is present; and/or comparing the quantified fungal nucleic acid,

bacterial nucleic acid, or a combination thereof to the quantified cannabinoid nucleic acid.

BRIEF DESCRIPTION OF THE DRAWINGS

0021 FIG. 1 shows the preliminary 2x assembly of 750 bp 454 GS FLX reads in the THC synthase gene.

0022 FIGS. 2A-2B show a hairpin sequence (SEQ ID NO: 407,650) of a putative miniature P element inverted repeat family (mPIF) transposon sequence 5' to the gene in the Sativa assembly.

0023 FIGS. 3A and 3B show the target site for PIF insertion (Zhang et al., PNAS, 98(22): 12572-12577 (2001)) and the cannabis sativa gene for tetrahydrocannabinolic acid synthase (SEQ ID NO: 407,643).

0024 FIGS. 4A-4D show a Multiple Sequence Alignment and amino acid confirmation of MGC-s3 or LA Contig #34396 vs PK contig PK23203.1 (LA contig34396 ORF THCAS like 3 (SEQ ID NO: 407,645); PK23203.1 THCAS like 3 (SEQ ID NO: 407,655); CD contig27237 ORF THCAS like 3 (SEQ ID NO: 407,656); THC Synthase translation (SEQ ID NO: 407,657); Consensus (SEQ ID NO: 407,658)).

0025 FIGS. 5A-5AN show a Multiple Sequence Alignment and conservation charts of peptide sequences from LAC, CD, PK and Mexican or "CSA" sequences. Divergent 5' and 3' ends can be seen, with internal changes from LAC & PK to CD & CSA at position 287 (FIG. 5C). Several internal amino acid changes can be seen in the Sativa to Indica alignments in FIG. 5B. LAC & PK are Indica dominant, and CD & CSA are Sativa dominant.

0026 (FIGS. 5A-5D: LA contig20041 ORF THCAS like 1 (SEQ ID NO: 407,659); PK20093.1 THCAS like 1 (SEQ ID NO: 407,660); THC Synthase translation (SEQ ID NO: 407,661); Consensus (SEQ ID NO: 407,662))

0027 (FIGS. 5E-5H: LA contig32071 ORF THCAS like 2 (SEQ ID NO: 407,663); CD contig32295 ORF THCAS like 2 (SEQ ID NO: 407,664); PK09375.1 THCAS like 2 (SEQ ID NO: 407,665); THC Synthase translation (SEQ ID NO: 407,661); Consensus (SEQ ID NO: 407,666))

0028 (FIGS. 5I-5L: LA contig20817 ORF THCAS like 4 (SEQ ID NO: 407,667); PK11708.1 THCAS like 4 (SEQ ID NO: 407,668); THC synthase translation (SEQ ID NO: 407,661); Consensus (SEQ ID NO: 407,669))

0029 FIGS. 5M-5AN show nucleic acid multiple sequence alignments and conservation charts of many of the other THC-like sequences in the LA Confidential assembly with homology to THCA synthase, together with the closest Purple Kush ("PK") and Chemdawg ("CD") contigs.

0030 (THC Synthase (SEQ ID NO: 407,670); LA contig 60432 (SEQ ID NO: 407,671); LA contig 20041 (SEQ ID NO: 407,672); LA contig 23755 (SEQ ID NO: 407,673); CBD Synthase (SEQ ID NO: 407,674); LA contig 27956 (SEQ ID NO: 407,675); LA contig 46083 (SEQ ID NO: 407,676); LA contig 24266 (SEQ ID NO: 407,677); LA contig 86540 (SEQ ID NO: 407,678); LA contig 66523 (SEQ ID NO: 407,679); CD contig 27237 rev (SEQ ID NO: 407,680); PK RNA 23203.1 (SEQ ID NO: 407,681); LA contig 54324 (SEQ ID NO: 407,682); LA contig 163104 (SEQ ID NO: 407,683); Consensus (SEQ ID NO: 407,684))

0031 FIGS. 6A-6H show the nucleotide sequences of contig #20041 (SEQ ID NO: 407,642), contig #34396 (SEQ ID NO: 407,644), contig #32071 (SEQ ID NO: 407,646) and contig #20817 (SEQ ID NO: 407,648).

0032 FIGS. 7A-7D show the amino acid sequences of contig #20041 (SEQ ID NO: 407,643), contig #34396 (SEQ ID NO: 407,645), contig #32071 (SEQ ID NO: 407,647) and contig #20817 (SEQ ID NO: 407,649).

DETAILED DESCRIPTION OF THE INVENTION

0033 In recent years the pharmacology related to medicinal cannabis use has been transformed by the discovery of the human endocannabinoid pathways and the endogenous human neurotransmitter (Devane et al. 1992, Science, 258(5090): 1946-1949; Fride and Mechoulam 1993, Eur J Pharmacol, 231(2):401-409). Two human G-protein coupled receptors (GPCRs), known as CB1 and CB2, have been extensively characterized and are encoded by the CNR1 and CNR2 genes on chromosomes 6 and 1, respectively. Three other GPCRs (GPR55, GPR18 and GPR119) are showing evidence as other potential endocannabinoid receptors (Begg et al. 2005, Pharmacol Ther, 106(2):133-145; Brown 2007, Br J Pharmacol, 152(2):567-575). Eighty-five phytocannabinoids have been discovered in the Cannabis plant (El-Alfy et al., Pharmacol Biochem Behav 95(4):434-442). Only one is known to be independently psychoactive (tetrahydrocannabinol, or THC). Non-psychoactive cannabinoids such as cannabidiol (CBD) and cannabidiolic acid (CBDA) have shown impressive medical benefits. These include tumor-specific apoptosis in 9 different cancer types (Guzman 2003, Nat Rev Ca, 3(10):745-755), pain management via cox-2 inhibition (Takeda et al. 2008, Drug Metab Dispos 36(9): 1917-1921), and effectiveness with antiemesis in HIV or chemotherapy related nausea. They also include improved muscle spasm control in patients with MS (Sarfaraz et al. 2008, Ca Res 68(2):339-342; Lakhan and Rowland 2009, BMC Neurol, 9:59). In addition the FDA has approved the use of and for glaucoma. Combined with an extremely low therapeutic index, these reported medical benefits have led to a "compassionate use exemption", with 16 states and the District of Columbia decriminalizing medical use of cannabis in the United States. They have also led pharmaceutical companies to invest actively in cannabinoid research. This has resulted in approved cannabinoid therapeutics such as Marinol™ and Sativex™.

0034 Due in part to recreational demand, the cannabis plant has been selectively bred in the last 30 years to express very high THC levels (above 20% of flower weight) (Miller Coyle et al. 2003, Croat Med J, 44(3):315-321). This has come at the cost of most plants available today having very low CBD content (below 1% of flower weight). It has also prompted considerable interest in the genetics controlling chemotype (Kojoma et al. 2006). To this end, de Meijer et al. have demonstrated that the cannabinoid contents are under strict genetic control and can be predicted from DNA sequence information before the plant has expressed active compounds (de Meijer et al. 2003, Genetics, 163(1):335-346). The de Meijer study used PCR and Sanger sequencing to genotype CBD synthase and THC synthase in many drug and fiber strains. It has, however, raised many questions regarding the genetics controlling the other 83 cannabinoids.

0035 In addition to cannabinoids, the plant is reported to have up to 140 terpenes (Ross and ElSohly 1996, J Nat Prod, 59(1):49-51; ElSohly 2007, Marijuana and the Cannabinoids, Humana Press, Totowa, N.J.). At least one of these terpenes (beta-caryophyllene) is reported to be a volatile CB2 agonist with anti-inflammatory effects (Gertsch et al. 2008, Proc Natl Acad Sci USA, 105(26):9099-9104).
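FIGS. 4A-4D and 5A-5L (paragraphs 0024-0028) each list a "Consensus" sequence beside the aligned contigs, and paragraph 0025 points to specific divergent positions (e.g., position 287). The text does not say how the consensus rows or the conservation charts were computed. The following is a minimal sketch of one common convention, a majority-rule consensus. It assumes the sequences are already aligned to equal length, with "-" marking gaps. The function names are hypothetical, and the example peptides are toy strings, not sequences from this disclosure.

```python
from collections import Counter
from typing import List, Tuple


def consensus_and_conservation(aligned: List[str]) -> Tuple[str, List[float]]:
    """Majority-rule consensus of pre-aligned sequences (illustrative sketch).

    `aligned` holds equal-length strings in which gaps are written as "-".
    Returns the consensus string and, for each column, the fraction of
    sequences carrying the consensus symbol. A gap can win a column if it is
    the most common symbol there.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus: List[str] = []
    conservation: List[float] = []
    for column in zip(*aligned):
        # most_common(1) returns the first-encountered symbol among ties.
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol)
        conservation.append(count / len(column))
    return "".join(consensus), conservation


def variable_positions(conservation: List[float]) -> List[int]:
    """1-based alignment positions where not every sequence agrees."""
    return [i + 1 for i, frac in enumerate(conservation) if frac < 1.0]
```

For the toy alignment ["MKCST", "MKCAT", "MRCST"], the consensus is "MKCST" and the variable positions are 2 and 4. A charting tool could plot the conservation fractions directly. Real alignment software may weight residues by similarity rather than by identity alone.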

0036 As described herein, using the efficiency of next generation sequencing, a draft de novo reference sequence for the Cannabis (C.) Sativa and C. Indica genomes has been generated. This makes it possible to sequence and resequence many more cannabis cultivars, to better understand the diversity of the genes encoding cannabinoid and terpene synthesis, or the "cannabinome". In addition, as shown herein, the LAC Indica assembly had four full length contiguous sequences with homology to THCA and CBDA synthases, referred to herein as "contigs": #20041 (SEQ ID NOS: 407,642 and 407,643), #32071 (SEQ ID NOS: 407,646 and 407,647), #34396 (SEQ ID NOS: 407,644 and 407,645) and #20817 (SEQ ID NOS: 407,648 and 407,649). It also had 10 partially homologous contigs with truncated ORFs. One full length contig in particular, #34396, with 81% sequence similarity to both synthases, was highly expressed in the PK Indica RNA-Seq data but was absent from the PK Indica Cansata genomic assembly.

0037 Accordingly, in one aspect the invention is directed to an (one or more) isolated sequence (e.g., nucleic acid sequence, DNA, RNA, genomic sequence, polypeptide, protein) of a Cannabis genome.

0038 In a particular aspect, the invention is directed to an isolated nucleic acid comprising SEQ ID NOs: 1-175,268 (Cannabis sativa genome). In another particular aspect, the invention is directed to an isolated nucleic acid comprising SEQ ID NOs: 175,269-407,641 (Cannabis indica genome). In other aspects, the invention is directed to an isolated sequence that has about (at least about, at least) 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NOs: 1-175,268 and SEQ ID NOs: 175,269-407,641.

0039 In another aspect, the invention is directed to a nucleic acid comprising a nucleotide sequence that has about 82% identity to SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646 or SEQ ID NO: 407,648, or a portion thereof that encodes a biologically active cannabinoid synthase, or a complement thereof. In a particular aspect, the invention is directed to a nucleic acid comprising SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646 or SEQ ID NO: 407,648, or a portion thereof that encodes a biologically active cannabinoid synthase, or a complement thereof. In other aspects, the invention is directed to an isolated sequence that has about (at least about; at least) 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NOS: 407,642, 407,644, 407,646 or 407,648.

0040 In another aspect, the invention is directed to a polypeptide comprising an amino acid sequence that has about 67% identity to SEQ ID NO: 407,643, SEQ ID NO: 407,645, SEQ ID NO: 407,647 or SEQ ID NO: 407,649, or a biologically active portion thereof, such as a biologically active portion that functions as a cannabinoid synthase. In a particular aspect, the invention is directed to a polypeptide comprising SEQ ID NO: 407,643, SEQ ID NO: 407,645, SEQ ID NO: 407,647 or SEQ ID NO: 407,649, or a biologically active portion thereof, such as a biologically active portion that functions as a cannabinoid synthase. In other aspects, the invention is directed to an isolated sequence that has about (at least about; at least) 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NOS: 407,643, 407,645, 407,647 or 407,649.

0041 As will be apparent to those of skill in the art, all or a portion of a biologically active cannabinoid synthase is a full length cannabinoid synthase, or a portion of a full length cannabinoid synthase, that has one or more activities of a cannabinoid synthase (e.g., catalyzes the oxidocyclization of to cannabidiolic acid).

0042 Other aspects of the invention include an antibody that specifically binds one or more polypeptides described herein. For example, the invention includes an antibody or antigen-binding fragment thereof that specifically binds to all or a portion of polypeptides having the amino acid sequence of SEQ ID NOs: 407,643, 407,645, 407,647, and/or 407,649. That is, the antibody can bind to all of the polypeptide, or to from about 8 amino acids to about 450 amino acids of the polypeptide. In particular embodiments, the antibody can bind to about 10, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, or 425 amino acids of the polypeptide.

0043 As used herein, the term "specific", when referring to an antibody-antigen interaction, indicates that the antibody can selectively bind to the polypeptide. In one embodiment, the antibody inhibits the activity of the polypeptide. An antibody that is specific for polypeptides described herein is a molecule that selectively binds to the polypeptide but does not substantially bind to other molecules in a sample, e.g., in a biological sample from a Cannabis plant. The term "antibody", as used herein, refers to an immunoglobulin or a part thereof. It encompasses any polypeptide comprising an antigen-binding site, regardless of the source, method of production, and other characteristics. The term includes but is not limited to polyclonal, monoclonal, monospecific, polyspecific, humanized, human, single-chain, chimeric, synthetic, recombinant, hybrid, mutated, conjugated and CDR-grafted antibodies. The term "antigen-binding site" refers to the part of an antibody molecule that comprises the area specifically binding to, or complementary to, a part or all of an antigen. An antigen-binding site may comprise an antibody light chain variable region (VL) and an antibody heavy chain variable region (VH). An antigen-binding site may be provided by one or more antibody variable domains, e.g.:
- an Fd antibody fragment consisting of a VH domain;
- an Fv antibody fragment consisting of a VH domain and a VL domain; or
- an scFv antibody fragment consisting of a VH domain and a VL domain joined by a linker.

0044 The various antibodies and portions thereof can be produced using known techniques (Kohler and Milstein, Nature 256:495-497 (1975); Current Protocols in Immunology, Coligan et al., (eds.) John Wiley & Sons, Inc., New York, N.Y. (1994); Cabilly et al., U.S. Pat. No. 4,816,567; Cabilly et al., European Patent No. 0,125,023 B1; Boss et al., U.S. Pat. No. 4,816,397; Boss et al., European Patent No. 0,120,694 B1; Neuberger, M. S. et al., WO 86/01533; Neuberger, M. S. et al., European Patent No. 0,194,276 B1; Winter, U.S. Pat. No. 5,225,539; Winter, European Patent No. 0,239,400 B1; Queen et al., European Patent No. 0,451,216 B1; Padlan, E. A. et al., EPO 519596 A1; Newman, R. et al., BioTechnology, 10: 1455-1460 (1992); Ladner et al., U.S. Pat. No. 4,946,778; Bird, R. E. et al., Science, 242: 423-426 (1988)).

0045 Also encompassed by the invention are vectors comprising the nucleic acid sequences provided herein, and cells comprising the vectors. As will be apparent to those of skill in the art, a number of cells and/or vectors can be used in conjunction with the nucleic acid sequences provided herein. For example, a suitable plant cell includes a Cannabis plant cell, and a suitable vector includes an Agrobacterium vector.
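Paragraphs 0038-0040 define the claimed sequences by percent identity thresholds, such as at least 82% to SEQ ID NO: 407,642 or about 67% to SEQ ID NO: 407,643. They do not state how identity is calculated. The following is a minimal sketch of one convention. It assumes the two sequences have already been aligned (for example by a global aligner) and padded with "-" gaps. Identity is then the number of identical columns divided by the number of columns that are not gaps in both sequences. Other conventions, such as dividing by the shorter sequence length or using BLAST-style local identity, give different numbers. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (one illustrative convention).

    Columns where both sequences have a gap are ignored; a column where only
    one sequence has a gap counts as a mismatch. Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # a gap-gap column says nothing about either sequence
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g. 82.0 or 67.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, percent_identity("ACGTACGTAC", "ACGTACGTAA") returns 90.0. That value would satisfy an "at least 82%" threshold but not a 95% one. Because the result depends on the alignment and the denominator, two tools can disagree near a threshold for the same pair of sequences.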

0046 In another aspect, the invention is directed to a method of producing a cannabinoid synthase. The method comprises maintaining a cell comprising a vector comprising the nucleic acid sequences provided herein under conditions in which the cannabinoid synthase gene is produced. The method can further comprise isolating the cannabinoid synthase produced by the cell. In another aspect, the invention is directed to a cannabinoid synthase gene produced by the method.

0047 In yet another aspect, the invention is directed to a method of detecting a cannabinoid in a sample comprising detecting the nucleic acid sequences described herein in the sample, wherein if the nucleic acid is detected, then a cannabinoid is detected in the sample. The invention also encompasses a method of detecting Cannabis in a sample comprising detecting the polypeptides provided herein, wherein if the polypeptide is detected, then a cannabinoid is detected in the sample. The sample can be a plant sample (e.g., root tissue, leaf tissue) and/or a mammalian sample such as tissue (e.g., skin, hair) or fluid (e.g., urine, blood).

0048 In still other aspects, the invention is directed to a method of detecting one or more cannabinoid genes in a Cannabis plant. The method comprises the following steps:
- All or a portion of a genomic sequence of the Cannabis plant is contacted with one or more primers that are complementary to SEQ ID NO: 407,642, SEQ ID NO: 407,644, SEQ ID NO: 407,646, SEQ ID NO: 407,648 or a combination thereof, thereby producing a reaction mixture.
- The reaction mixture is maintained under conditions in which one or more sequences in the genomic sequence of the Cannabis plant that are complementary to one or more of the primers hybridize to the one or more primers.
- The one or more sequences that hybridize to the one or more primers are amplified, thereby producing one or more amplicons.
- All or a portion of the sequence of the one or more amplicons is determined, thereby detecting one or more cannabinoid genes in the Cannabis plant.

0049 The method can further comprise quantifying the one or more cannabinoid genes. In addition, the method can further comprise measuring the cannabinoid messenger ribonucleic acid (mRNA) of the plant.

0050 In a particular aspect, the method can further comprise detecting whether fungal nucleic acid, bacterial nucleic acid, or a combination thereof is present in the plant. As will be appreciated by those of skill in the art, if fungal nucleic acid, bacterial nucleic acid, or a combination thereof is present, then the fungal nucleic acid, bacterial nucleic acid or the combination thereof can also be quantified. The method can further comprise comparing the quantified fungal nucleic acid, bacterial nucleic acid, or a combination thereof to the quantified cannabinoid nucleic acid.

0051 As will be apparent to those of skill in the art, a number of methods can be used to detect and/or quantify one or more cannabinoid genes in a Cannabis plant. Examples include polymerase chain reaction (PCR; quantitative PCR), real time PCR (rtPCR), and/or reverse transcription PCR. In addition, a variety of methods can be used to detect and/or quantify bacterial and/or fungal nucleic acid in a Cannabis plant (e.g., SEQ™ Bacterial and Fungal Detection System, Life Technologies).

0052 As will also be appreciated by those of skill in the art, the cannabinoid, fungal and/or bacterial content can be compared to a control. Any suitable control can be used. For example, a suitable control can be established by assaying one or more plants (e.g., a large sample of plants) which do and/or do not have a cannabinoid gene, and using a statistical model to obtain a control value (standard value; known standard). See, for example, models described in Knapp, R. G. and Miller M. C. (1992) Clinical Epidemiology and Biostatistics, Williams and Wilkins, Harwal Publishing Co., Malvern, Pa., which is incorporated herein by reference. Thus, as used herein, a "control" or "known standard" can refer to an amount and/or distribution characteristic of a plant that does or does not have a cannabinoid gene.

0053 As shown herein, sequencing of the Cannabis sativa genome revealed that the THC synthase gene has replicated itself throughout the genome via a mobile genetic element, also referred to herein as a transposable element. As used herein, mobile genetic elements or transposable elements are elements or regions in a sequence that allow replication and insertion of a sequence into one or more additional places in a sequence, such as a genomic sequence. See Jiang, N., et al., Nature, 421:163-167 (2003); Zhang, X., et al., PNAS, 98(22):12572-12577 (2001); and Wessler, S., Miniature Inverted-repeat Transposable Elements (MITEs) and their Relationship with Established DNA Transposons, University of Georgia, Dept. Botany and Genetics, Athens, Ga. All of these references are incorporated herein by reference.

0054 Knowing that this genome is tolerant of the copia and miniature inverted-repeat transposable element (MITE) replication machinery enables the use of these sequences to replicate other desired synthase genes throughout the plant. Of particular interest is the CBD synthase gene that produces the anti-cancer compound cannabidiol.

0055 Knowledge of the transposon systems tolerated by this species opens up avenues for improving the production of other cannabinoids. Specifically, using these transposons to increase the % CBD (cannabidiol) expressed would aid in, for example, fighting cancer. More specifically, one could synthesize a DNA fragment that has the leader sequence identical to the THC synthase gene and its transposon signal, with the THC synthase gene replaced by CBD synthase. One could then use Agrobacteria or other plant transfection tools, such as a Gene Gun, to introduce many more CBD synthase genes into the plant. This would result in a plant that expresses increased levels of CBD.

0056 Accordingly, in another aspect, the invention is directed to a method of increasing the copy number of one or more sequences in a Cannabis genome. The method comprises operably linking the one or more sequences to one or more mobile genetic elements, thereby increasing the copy number of the one or more sequences in a Cannabis genome. In yet another aspect, the invention provides methods of introducing such sequences, operably linked to one or more mobile genetic elements, into a plant (e.g., a Cannabis plant) using, for example, a plant transfection tool, e.g., Agrobacteria. The plant is then maintained under conditions in which the copy number of the one or more sequences is increased in the plant. These are also conditions in which the expression of the polypeptide encoded by the sequence is increased in the plant, for example, as compared to a plant that does not comprise the sequence operably linked to the mobile genetic element. The invention is also directed to plants produced by the methods.

0057 Thus, examples of sequences whose copy number could be increased include sequences that encode one or more polypeptides involved in the biosynthesis of one or more cannabinoids and/or one or more terpenes. Specific examples include sequences that encode a cannabidiol (CBD) synthase, a (CBC) synthase or other cannabinoids in place of THC synthase, olivetol acid synthase, divarinic acid synthase, limonene synthase, and alpha pinene synthase. Specific examples of other such sequences include the following:

0058 Example of a Sequence that Encodes an Olivetol Synthase

0059 >gi|171363646|dbj|AB164375.1| Cannabis sativa OLS mRNA for Olivetol Synthase, Complete Cds

0060 Example of a Sequence that Encodes a Limonene Synthase

0061 >gi|112790154|gb|DQ839404.1| Cannabis sativa (-)-Limonene Synthase mRNA, Complete Cds

(SEQ ID NO: 407,653)

ATGCAGTGCATAGCTTTTCACCAATTTGCTTCATCATCATCCCTCCCTAT TTGGAGTAGTATTGATAATC

(SEQ ID NO: 4 O7, 652) GTTTTACACCAAAAACTTCTATTACTTCTATTTCAAAACCAAAACCAAAA ATGAATCATCTTCGTGCTGAGGGTCCGGCCTCCGTTCTCGCCATTGGCAC CTAAAATCAAAATCAAACTT CGCCAATCCGGAGAACATTT GAAATCGAGATCGAGATCAAGTACTTGCTACTCCATACAATGTACTGTGG TATTACAAGATGAGTTTCCTGACTACTATTTTCGCGTCACCAAAAGTGAA TCGATAACCCTAGTTCTACG CACATGACTCAACT CAAAGA ATTACTAATAATAGTGATCGAAGATCAGCCAACTATGGACCTCCCATTTG AAAGTTTCGAAAAATATGTGACAAAAGTATGATAAGGAAACGTAACTGTT GTCTTTTGATTTTGTTCAAT TCTTAAATGAAGAACACCTA CTCTTCCAATCCAATATAAGGGTGAATCTTATACAAGTCGATTAAATAAG AAGCAAAACCCAAGATTGGTGGAGCACGAGATGCAAACTCTGGATGCACG TTGGAGAAAGATGTGAAAAG TCAAGACATGTTGGTAGTTG GATGCTAATTGGAGTGGAAAACTCTTTAGCCCAACTTGAACTAATTGATA AGGTTCCAAAACTTGGGAAGGATGCTTGTGCAAAGGCCATCAAAGAATGG CAATACAAAGACTTGGAATA GGTCAACCCAAGTCTAAAAT TCTTATCGTTTTGAAAATGAAATCATTTCTATTTTGAAAGAAAAATTCAC CACTCATTTAATCTTCACTAGCGCATCAACCACTGACATGCCCGGTGCAG CAATAATAATGACAACCCTA ACTACCATTGCGCTAAGCTT ATCCTAATTATGATTTATATGCTACTGCTCTCCAATTTAGGCTTCTACGC CTCGGACTGAGTCCCTCAGTGAAGCGTGTGATGATGTATCAACTAGGCTG CAATATGGATTTGAAGTACC TTATGGTGGTGGAACCGTTC TCAAGAAATTTTCAATAATTTTAAAAATCACAAGACAGGAGAGTTCAAGG TACGCATTGCCAAGGACATAGCAGAGAATAACAAAGGCGCACGAGTTCTC CAAATATAAGTAATGATATT GCCGTGTGTTGTGACATAAT ATGGGAGCATTGGGCTTATATGAAGCTTCATTCCATGGGAAAAAGGGTGA GGCTTGCTTGTTTCGTGGGCCTTCAGAGTCTGACCTCGAATTACTAGTGG AAGTATTTTGGAAGAAGCAA GACAAGCTATCTTTGGTGAT GAATTTTCACAACAAAATGTCTCAAAAAATACAAATTAATGTCAAGTAGT GGGGCTGCTGCGGTGATTGTTGGAGCTGAACCCGATGAGTCAGTTGGGGA AATAATAATAATATGACATT AAGGCCGATATTTGAGTTGG AATATCATTATTAGTGAATCATGCTTTGGAGATGCCACTTCAATGGAGAA TGTCAACTGGGCAAACAATCTTACCAAACTCGGAAGGAACTATTGGGGGA TCACAAGATCAGAAGCTAAA CATATAAGGGAAGCAGGACT TGGTTTATTGAAGAAATATATGAAAGAAAACAAGACATGAATCCAACTTT GATATTTGATTTACATAAGGATGTGCCTATGTTGATCTCTAATAATATTG ACTTGAGTTTGCCAAATTGG AGAAATGTTTGATTGAGGCA ATTTCAATATGCTGCAATCAACATATCAAGAGGAGCTCAAAGTACTCTCT TTTACTCCTATTGGGATTAGTGATTGGAACTCCATATTTTGGATTACACA AGGTGGTGGAAGGATTCTAA CCCAGGTGGGAAAGCTATTT ACTTGGAGAGAAATTGCCTTTCGTTAGAGATAGATTGGTGGAGTGTTTC TGGACAAAGTGGAGGAGAAGTTGCATCTAAAGAGTGATAAGTTTGTGGAT 
TATGGCAAGTTGGAGTAAGA TCACGTCATGTGCTGAGTGA TTTGAGCCACAATTCAGTTACTTTAGAATAATGGATACAAAACTCTATG GCATGGGAATATGTCTAGCTCAACTGTCTTGTTTGTTATGGATGAGTTGA TCTATTAACAATAATTGATG

GGAAGAGGTCGTTGGAGGAA ATATGCATGACATTTATGGAACATTGGAGGAACTACAACTTTTCACTAAT

GGGAAGTCTACCACTGGAGATGGATTTGAGTGGGGTGTTCTTTTTGGGTT GCTCTTCAAAGATGGGATTT

TGGACCAGGTTTGACTGTCG GAAAGAATTAGATAAATTACCAGATTATATGAAGACAGCTTTCTACTTTA

AAAGAGTGGTCGTGCGTAGTGTTCCCATCAAATATTAA CATACAATTTCACAAATGAA

- Continued - Continued TTAAGGGAGTATTGGCTTTA TTGGCATTTGATGTATTACAAGAACATGGTTTTGTTCACATTGAATACTT TATGAAGCTTCATTCTATGTGAAAAATGGTGAAAATATTTTGGAGGAAGC CAAGAAACTGATGGTAGAGT TAGGGTTTTCACAACAGAAT TGTGTAAACATCATTTGCAAGAGGCAAAATGGTTTTATAGTGGATACAAA ATCTCAAAAGATATGTAATGATGATTGATCAAAACATAATATTAAATGAT CCAACATTGCAAGAATATGT AATATGGCAATATTAGTGAG TGAGAATGGATGGTTGTCTGTGGGAGGACAAGTTATTCTTATGCATGCAT ACATGCCTTGGAGATGCCACTTCATTGGAGGACTATAAGAGCAGAAGCTA ATTTCGCTTTTACAAATCCT AGTGGTTCATTGAAGAATAT GTTACCAAAGAGGCATTGGAATGTCTAAAAGACGGTCATCCTAACATAGT GAGAAGACACAAGACAAGAATGGCACTTTGCTTGAATTTGCGAAATTGGA TCGCCATGCATCGATAATAT TTTCAACATGCTTCAATCAA TACGACTTGCAGATGATCTAGGAACATTGTCGGATGAACTGAAAAGAGGC TATTTCAAGAAGATCTAAAACATGTCTCGAGGTGGTGGGAACATTCTGAG GATGTTCCTAAATCAATTCA CTTGGAAAGAATAAAATGGT ATGTTATATGCACGATACTGGTGCTTCTGAAGATGAAGCTCGTGAGCACA TTATGCTAGAGATAGATTGGTAGAGGCTTTTCTATGGCAGGTTGGAGTAA TAAAATATTTAATAAGTGAA GATTTGAGCCACAATTCAGC TCATGGAAGGAGATGAATAATGAAGATGGAAATATTAACTCTTTTTTCTC CACTTTAGGAGAATATCTGCAAGAATATATGCTCTAATTACAATCATAGA AAATGAATTTGTTCAAGTTT TGACATATATGATGTGTATG GCCAAAATCTTGGTAGAGCGTCACAATTCATATACCAGTATGGCGATGGA GAACATTGGAAGAGTTAGAGCTTTTCACCAAGGCTGTTGAGAGATGGGAT CATGCTTCTCAGAATAATCT GCGAAGACCATACACGAGTT ATCGAAAGAGCGCGTTTTAGGGTTGATTATTACTCCTATCCCCATGTAA ACCAGATTATATGAAGTTGCCTTTCTTTACTTTATTTAACACCGTAAATG 0062) Example of a Sequence that Encodes an Alpha Pinene Synthase AAATGGCGTATGATGTATTA 0063 >Gil 1127901561Gb|DQ8394.05.1 | Cannabis sativa GAAGAGCATAATTTTGTCACCGTTGAATAC CTCAAGAACTCGTGGGCAGA (+)-Alpha-Pinene Synthase mRNA, Complete Cds GTTATGTAGGTGCTATTTGG

AAGAGGCAAAATGGTTCTATAGCGGATACAAACCAACCTTGAAAAAATAT (SEQ ID NO: 407,654) ATGCATTGCATGGCTGTTCGCCATTTCGCTCCATCGTCATCGCTCTCCAT ATTGAGAACGCCTCGCTTTC

ATTTTCGAGTACTAATATTA AATAGGAGGACAAATTATTTTTGTATATGCTTTTTTCTCTCTTACAAAGT

ATAATCATTTTTTTGGTAGAGAAATTTTTACACCAAAAACATCTAATATT CCATAACAAACGAGGCCTTA

ACAACAAAAAAATCAAGATC GAGTCCTTGCAAGAGGGTCATCACGCTGCATGTCGCCAAGGATCCTTAAT

AAGACCTAATTGCAATCCAATCCAATGTAGTTTGGCCAAAAGCCCTAGTA GTTACGACTTGCAGATGATC

GTGATACTAGTACAATTGTT TAGGAACATTGTCGGATGAAATGAAAAGAGGCGATGTTCCTAAATCAATT

AGAAGATCAGCCAACTATGATCCTCCCATTTGGTCTTTTGATTTCATTCA CAATGTTATATGCACGATAC

GTCTCTTCCATGCAAATATA TGGTGCTTCTGAAGATGAAGCTCGTGAGCACATCAAATTTTTGATAAGTG

AGGGAGAACCCTATACAAGTCGATCGAATAAGCTAAAAGAAGAAGTGAAA AAATATGGAAGGAGATGAAT

AAGATGTTAGTTGGAATGGA GATGAAGATGAATATAACTCTATTTTCTCTAAAGAGTTTGTTCAAGCTTG

AAACTCTTTAGTCCAACTTGAGTTGATTGATACATTACAAAGACTTGGAA CAAAAATCTTGGTAGGATGT

TATCTTATCATTTTGAGAAT CATTATTTATGTATCAACATGGAGATGGACATGCTTCTCAAGATAGCCAT

GAAATCATTTCTATTTTGAAAGAATATTTCACTAATATTAGTACTAATAA TCAAGGAAACGTATTTCAGA

AAACCCTAAATATGATTTAT TTTAATTATTAATCCTATTCCTTTATAA

ATGCCACTGCTCTCGAATTTAGGCTTTTACGCGAATATGGATATGCAATA CCTCAAGAAATATTTAATGA TTTTAAGGACGAGACGGGAAAGTTCAAAGCGAGTATTAAAAATGATGATA

0064 In other aspects, the invention is directed to a method of sequencing a genome of a target species within a genus, wherein the genomes of the species within the genus vary by about 1 in about 100 bases. Next generation sequencers drop the cost of sequencing genomes 100,000-fold by using one clever trick: they know what they are looking for. Most of these massively parallel short read (<400 bp) sequencing systems are successful at sequencing humans because there is a reference genome to compare short reads to. The human genome is not very polymorphic: only 1 in 1,000 letters is different. This means that most reads from a next generation sequencer map to the genome perfectly, and when there is a variant, there is most likely only one in that 100 bp read.

0065 Each human genome sequenced on SOLiD or Illumina usually generates 4M SNPs, 400,000 deletion or insertion polymorphisms, and 40,000 large copy number variations or structural variations larger than 1,000 bases. Since humans diverged so recently, we are mostly the same, which makes resequencing the human genome a very easy analysis problem. With commodity hardware, one can load the 3 billion bases into RAM and scan every read across this index. This finds the locations where all the reads should be placed and the regions where mutations occur. This is described as an algorithmic problem that scales linearly with N, the number of reads in the analysis. More reads take linearly more time, but the reference genome is always hg19 (the human genome in GenBank). This is all possible because the Human Genome Project spent billions of dollars first making this reference with expensive tools that generate long reads.

0066 The process is very different without a reference. When there is no reference genome to work with, one must compare every read to all other reads. If you have 20 million reads, the computation problem is now 20M reads x 20M reads, or 400 trillion comparisons. This is called an N² (N squared) problem, as it is not linear but multiplicative in the number of reads. Some advancements in algorithms have made this an N log N problem by sorting reads and using small word sizes. This is still substantially more computationally intensive than resequencing and alignment to a reference. In other words, it is computationally a much more difficult problem than matching reads to a 3 billion letter sequence. This is known as "de novo" sequencing, as opposed to the "resequencing" used for most humans today.

0067 There are some examples of people using de novo assembly on humans despite its excessive costs, as it is thought to be more thorough. However, it is still very bleeding edge in terms of its completeness next to re-alignment. Some have suggested performing a hybrid approach to get the best of both methods.

0068 With the costs of DNA sequencing plummeting, the easier re-alignment process still accounts for at least half the cost of a genomics experiment, and de novo assembly is likely 90% of the cost of the sequencing project. Efficient use of the computational architecture is therefore now more important than cheaper sequencing methods.

0069 Until now, cannabis has never had its entire genome sequenced. As shown herein, in sequencing Cannabis it was discovered that the polymorphism rate in the plant was 10x higher than in humans. This means the re-alignment problem needed to be re-invented to even work and to enable a non de novo assembly approach. To this end, a method was devised to generate not 1 reference sequence but 2 or more references. In a particular aspect, 3 reference sequences are used, one for each of the known cultivars in the field. Cannabis has 3 known species: Sativa, Indica and Ruderalis. These 3 have been interbred. The strategy devised herein involved back crossing each of these strains to be pure species and then making a reference genome from each of them. With 3 reference genomes, the reads were aligned to all 3 references and variants were called on all 3. A Venn diagram of the variation within all three species was then generated for novel strains being sequenced. This was computationally much cheaper than a full blown de novo assembly for each strain. It also provided important information that a de novo assembly may miss, because it leverages what is already known about the plants and is more tolerant to repeat structures.

0070 In the method of sequencing a genome of a target species within a genus, wherein genomes of species within the genus vary by about 1 base in about 100 bases, the method comprises the following steps:
- obtaining sequencing reads of the genome of the target species (e.g., using massively parallel sequencing);
- aligning the sequencing reads to at least two different reference sequences, wherein each reference sequence is a known sequence of a species within the genus; and
- obtaining a consensus of variation between the sequence of the target species and each reference sequence, thereby sequencing the genome of the target species.
In a particular aspect, the sequencing reads are aligned to at least three reference sequences (e.g., Cannabis sativa, Cannabis indica, ).

0071 The genetics governing the synthesis of the 85 phytocannabinoids found in Cannabis sativa L. are only known for the tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA) synthase pathways. The Cannabis sativa sequence of Purple Kush has recently been compared to hemp. Less is known, however, about how individual medicinal strains of cannabis vary with respect to each other. To this end, presented herein is a de novo assembly of the medicinal plants Cannabis Sativa and Cannabis Indica. These diploid assemblies range in size from 300 Mb to 727 Mb, are 65% AT, and have mitochondrial genomes up to 415 Kb. The following single nucleotide variants (SNVs) are detailed:
- over 1.5 million SNVs for the Sativa genome;
- 925,602 SNVs for the Indica genome; and
- approximately 4M SNVs compared to the recently published Purple Kush, 30% of which are found in both our Sativa and Indica references.
These assemblies cover over 85% of the Cannabis RNA-Seq sequence in GenBank. Of particular interest is a copy number variation in the synthase genes responsible for cannabigerolic acid (CBGA) conversion to THCA. Also evident are flower-to-root differential expression of this expanded gene family and novel synthase homologs not found in the Purple Kush assembly. These data provide selective breeding strategies to alter medicinal expression.

0072 Non-psychoactive cannabinoids such as cannabidiol (CBD) and cannabidiolic acid (CBDA) show evidence of several medical benefits:
- tumor-specific apoptosis in 9 different cancer cell types;
- pain management via cox-2 inhibition;
- effectiveness with antiemesis from chemotherapy; and
- enhanced muscle spasm control in patients with MS.
Separately, the FDA has approved the use of the cannabinoid drugs Dronabinol and Nabilone for chemotherapy related nausea and HIV related appetite stimulation. 84 other cannabinoids have been measured in Cannabis, and their expression varies tremendously from plant to plant. The pharmacology of cannabinoids has been transformed by the discovery of the human endocannabinoid pathways and the endogenous human neurotransmitters anandamide and 2-AG. Two human G-protein coupled receptors (GPCRs), known as CB1 and CB2, have been extensively characterized. They are encoded by the CNR1 and CNR2 genes on chromosomes 6 and 1, respectively. Mutations in these human receptor genes are associated with increased addiction and extreme body mass index. Three additional GPCRs (GPR55, GPR18 and GPR119) are showing evidence as potential endocannabinoid receptors. Combined with an extremely low therapeutic index, these reported medical benefits have led to a "compassionate use exemption", with 16 states and the District of Columbia decriminalizing medical use of cannabis in the United States for non-FDA approved "off label" indications. Despite the popular medicinal use, the genetics of the GPCR targets and of the genes governing cannabinoid expression remain only partially characterized.

0073 Due in part to prohibition, the cannabis plant has been selectively bred in the last 30 years to express very high tetrahydrocannabinol (THC) levels (above 20% of flower weight). THCA and CBDA synthases compete for their shared pathway precursor CBGA. As a result, this selective pressure has come at the cost of most strains available today containing very low cannabidiol (CBD) content (below 1% of flower weight). This in turn has prompted considerable interest in the genetics controlling chemotype. To this end, others have demonstrated that the cannabinoid contents are under strict genetic control and can be predicted from DNA sequence information before the plant has expressed active compounds. This study has raised many questions regarding the genetics controlling the other cannabinoids, as well as the 140 terpenes reportedly expressed in the plant. These terpenes also compete for an IPP cannabinoid precursor. At least one of these terpenes (Beta-) is reported to be a volatile CB2 receptor agonist with anti-inflammatory effects.

0074 Described herein is the generation of a draft de novo reference sequence for the C. Sativa and C. Indica genomes, with a focus on resolving the high polymorphism rates in the synthase genes. This provides a view of drug-type strain differences, along with a complementary tool for many ongoing investigations in other cultivars.

EXEMPLIFICATION

Example 1

Methods

0075 DNA was purified with Qiagen Mini and Maxi plant DNA purification kits. Sativa cultivar "Chemdawg" and Indica cultivar "L.A. Confidential" were used as the first reference genomes (DNA Genetics). CBD and THC levels were measured with HPLC and GC analysis by Steep Hills Lab. Results were verified with thin layer chromatography prior to sequencing (Montana Biotech). Sequencing of the Indica reference genome was accomplished with twelve 454 GS FLX 700 bp runs, delivering an estimated 12x coverage. Genome sequencing and assembly were performed by the 454 Sequencing Center in Branford, Conn. with Newbler. The Sativa strain used a hybrid assembly approach. It combined 100x of 2x100 ILMN HiSeq sequencing reads (651M reads, 131 Gb of PF filtered data) with an additional four 454 FLX 400 bp runs. These reads were assembled with CLCbio Genomics Workbench 4.7.1.

was added and heated to 65°C for 10 minutes while inverting and vortexing for a minute every 3 minutes. Plant material was placed into an IKA Turrax tissue homogenizer tube mixer prefilled with 5 ml of AP1 and vortexed at top speed for 10 seconds and for 2 minutes at 2000 rpm. Mortar and pestle homogenization with liquid nitrogen was used, but yields can vary. Apart from the 3x increase in AP1, the protocol followed Qiagen's plant mini prep volume suggestions (part number in 2011 is 69104). All volumes were increased 3x accordingly, with the exception of the final elution step. Qiagen MaxiPrep columns can also be used to handle the increased 3x volume recommendation. Lower volumes gave lower yields, as the plant oils seem to interfere with the prep, but this depended on how dry the sample was. Fresh plant clippings used 2x volume recommendations, and 1x delivered DNA. DNA purified with this method was predominantly more than 10,000 bases in length for 10 different cultivars according to E-Gel 1% gel analysis. Fragments could be larger, given the gel's resolution.

0077 After Qiagen isolation, the DNA most likely did not freeze, due to glycols, terpenes and other pigments in the isolate. Beckman Genomics Ampure (formerly known as Agencourt Ampure) was used to clean these samples up. A ratio of 100 ul of Ampure to 100 ul of sample was used, instead of the manufacturer's instruction of 180 ul of Ampure to 100 ul of sample. This saved on reagents and kept the reaction within the volume of a 96 well plate and the magnetic field of a 96 well magnet plate.

0078 Lower ratios of Ampure (50 ul to 100 ul) were tested and worked well. This lowered cost, but quantitative yields across many cultivars may vary. This DNA was clean enough to freeze and to use in most next generation sequencing library construction kits, such as the SPRIworks system from Beckman. Multiple different libraries can be made, from fragment libraries to jumping libraries or even RNA libraries. Described below is the simplest library, but those skilled in the art will know how to apply an RNA or DNA prep to a kit that converts this DNA or RNA to sequenceable material. What is important is being able to purify DNA from a plant high in oil, cannabinoid and terpene content, so that it is pure enough to be enzymatically active.

0079 Fragment libraries are short (less than 1000 bases and usually less than 600 bp). To get DNA this small after isolation from a plant, a Covaris or nebulization device from Life Technologies was used to shear the high molecular weight (HMW) DNA into smaller fragments. These fragments were amenable to the next generation sequencers (Illumina, SOLiD, 454, Ion Torrent, Pacific Biosciences, Helicos and others).

0080 Purified DNA was nebulized, sonicated, subjected to acoustic bombardment (Covaris Corp) or hydrodynamically sheared to break it down into more manageable pieces. Large DNA acts like a viscous polymer, which is difficult to manage and inefficient in ligation. Once HMW DNA was broken into smaller pieces, known sequences or "Primers" (also known as
High quality reads not “Adaptors') were added to both ends of the DNA fragment. mapping to the assembly were retained for separate de novo These known sequence sites can be any sequence a person assembly. desires but are preferable sequences the popular DNA 0076. To PCR or Sequence DNA from Cannabis, Plant sequencing platforms utilize for sequencing. Once "Adapted DNA material was purified from the plant. 100-300 mg of dry the distribution was measured with an Agilent Bioanalyzer or plant material was first diced into fine plant fragments with a other gel eletrcophoresis device and decide if size selection is knife or razor. This material was then added to Qiagen Plant needed to narrow the library size distribution. The Agilent gel Lysis buffer or AP1 was added. 2x more lysis buffer than the was size selected as its distribution was large but this is very manufacturer recommended was added as the plant flowers dependent on the sequencing platform and strategy. The size are very lipophilic. For each 1 g of plant material 10 ml of AP1 range of DNA for sequencing was selected. It’s preferable to US 2016/0177404 A1 Jun. 23, 2016 have a very tight size distribution, e.g., much tighter than the governing the dominant phenotype monitored with selective initial HMW prep where fragments range from 50 bp to 1500 breeding. With short reads alone, phasing the sequence to bp. A fraction of this material in the 300-400 bp range was provide accurate amino acid prediction was challenging, collected and a Polymerase Chain Reaction performed to however many SNPs in the THC synthase gene are nicely make many copies of the molecules in this size range. Once phased with the 750 bp 454 data. Evidence for a gene expan many copies were made they were put on a NeXt Generation sion can be seen in this data with the increased genome Sequencer for Massively Parallel Sequencing. The fragment coverage in this location (FIG. 1). 
One can see more phased distribution for the sheared library DNA measured was alleles than expected with a diploid plant. On the boundaries obtained on an Agilent Bioanalyzer for the ChemDawg cul of this gene a sequence with homology to the mPIF transpo tivar sequenced to over 350x coverage on the Illumina HiSeq son family (evalue of 2e-6) was observed that likely explains 2000 platform by Beckman Genomics. The distribution after the expansion. This region has coverage 100 fold higher than size selection and PCR was also obtained. average and is likely an assembly knot but multiple 700 bp 0081. To address the polymorphism rate in the genome, a reads with THC synthase sequence read into the mPIF triple backcrossed pure Indica cultivar named LA Confiden homologous sequence implying copies of THC synthase tial (DNA Genetics, NL) was chosen to build a reference were in tight linkage with this putative transposable element. genome with over 12 million 454 GS FLX+750 bp reads (6.4 As with other mPIF transposons, a long inverted sequence is Gb). The genome was assembled with three different align present 5' to the THC synthase gene (FIG. 2B). The Hairpin ment stringencies on CLCbio workbench (0.8 or default, 0.9 seen using mFold in the putative mPIF transposon sequence 5' and 0.95). N50 contigs of 1500-1600 bp and genome sizes to the gene in the Sativa Assembly. Also observed in the 454 ranging from 280 Mb to 303 Mb were obtained. An outbred sequence on reads which map to THC but have frayed high Sativa cultivar known as "Chemdawg” was also sequenced quality ends. with 131 Gb from Illumina's HiSeq platform with 2x100 0085 >ALT-THC SYNTHASE 83553 reads from 250 bp inserts. 164M paired reads (single lane of 7) were assembled with the CLCbio workbench and resulted in N50s of 2.2 Kb and a genome size of 288 Mb. 
(SEO ID NO: 4 O7, 65O) 0082 To assess genome completeness, all Cannabis DNA ACAATATTCTTTTACTATAAAACTTCAATTATCATTTTAAGAACACGTAC sequence in Genbank were aligned to the Indica reference and CAAAAATTTTAATAATAAATATATTATAATGTTCTAATCCATTGAACATG significant blast hits for over 98.3% of the entries were found. Many of these entries were mRNA sequences and thus TAAACTAAAATTGTTCCATAAACATATAAGCTCAAATAATATTATTTTAT enriched for euchromatic sequence. To assess the heterochro TTGCTATTGAAATAAGAAAGACAATTTATTTTATTACATATATCTTATGA matic coverage the number of reads (filtered of dots and polyclonals) not mapped in the varying assemblies was mea TAGTCTACACAGTTGTAATGTAGATTTTCATACTTGGGAGCATACATAGT sured. These ranged from 9.8% of the reads at the default alignment Stringency to 33% of the reads at the most stringent ATGGGT. assembly conditions. To complement this all of the Sativa I0086 DNA sequence of the THCA synthase gene reported reads were mapped to the Indica references where non by Kojoma et al. unique sequence was left unmapped and only 22% of the I0087 Highlighted and underlined section, CTC reads were found to not map to the 0.95 stringent Indica GAAGCGGTGGCC, is the FAD binding domain. High reference. The Indica reads with the 0.9 mapping stringency lighted region, CACTTAGT, is the mPIF signal described by were mapped back to the stringent Indica assemble and 14% Zhang et al. 2001 Proc Natl Acad Sci., USA 98(22): 12572 of the reads were found to not map indicating a genome size 12577 of 346 Mb. Using the methods described by Xuetal (Xu et al. 2001, Natl Biotech, 29(8):73741) a 396 Mb genome size was I0088 >Gi81158005|Db|AB212841.1 Cannabis sativa estimated using the total kmer number/kmer volume of the Gene for Tetrahydrocannabinolic Acid Synthase, Partial Cds, Sativa assembly. 
This differs from prior published reports on Strain:078 the genome size (Sakamoto) of 1.4 pg per diploid genome but flow sorting technique can be very sensitive to GC content (SEQ ID NO: 4 O7, 651) based on the stains used (Greilhuber 2005, Ann Bot, 95(1): ATGAATTGCTCAGCATTTTCCTTTTGGTTTGTTTGCAAAATAATATTTTT 91-98) and male plants are known to have larger genomes than female cannabis genome sequenced in this study. Reads CTTTCTCTCATTCAATATCCAAATTTCATTAGCTAATCCTCAAGAAAACT that don't assemble have a GC content of Y% and consist of low complexity sequence. TCCTTAAATGCTTCTCGGAATATATTCCTAACAATCCAGCAAATCCAAAA 0083) To assess polymorphisms on a draft genome, reads TTCATATACACTCAACACGACCAATTGTATATGTCTGTCCTGAATTCGAC to the consensus assemblies were remapped to look for single nucleotide polymorphisms (SNPs) and deletion/insertion AATACAAAATCTTAGATTCACCTCTGATACAACCCCAAAACCACTCGTTA polymorphisms (DIPs) (Indels). This produces heterozygous TTGTCACTCCTTCAAATGTCTCCCATATCCAGGCCAGTATTCTCTGCTCC SNPs for self mappings but heterozygous and homozygous SNPs for cross cultivar mappings. As expected, the more AAGAAAGTTGGTTTGCAGATTCGAACTCGAAGCGGGGCCATGATGCTGA outbred Sativa cultivar had more variation than the triple GGGTTTGTCCTACATATCTCAAGTCCCATTTGCTATAGTAGACTTGAGAA backcrossed Indica and both cultivars exhibited a high degree of polymorphism as compared to the variation content seen ACATGCATACGGTCAAAGTAGATATTCATAGCCAAACTGCGTGGGTTGAA the human genome. GCCGGAGCTACCCTTGGAGAAGTTTATTATTGGATCAATGAGATGAATGA 0084. The THC synthase genes display a polymorphism rate closer to 5% perhaps explained by this being a gene US 2016/0177404 A1 Jun. 23, 2016

- Continued mately 90% monoterpenes and 7% sesquiterpenes and vari GAATTTTAGTTTTCCTGGTGGGTATTGCCCTACTGTTGGCGTAGGTGGAC ous other ketones and esters. One of the closest relatives to cannabis, Humulus lupulus or Hops has sequenced EST ACTTTAGTGGAGGAGGCTATGGAGCATTGATGCGAAATTATGGCCTTGCG libraries extracted from the glandular trichomes (Wang et al. GCTGATAATATCATTGATGCACACTTAGTCAATGTTGATGGAAAAGTTCT 2008, Plant Physiol, 148(3):1254-1266) identifying over 22 unigenes encoding terpene biosynthesis. AGATCGAAAATCCATGGGAGAAGATCTATTTTGGGCTATACGTGGTGGAG

GAGGAGAAAACTTTGGAATCATTGCAGCATGGAAAATCAAACTTGTTGTT Polymorphisms in the Human Endocannabinoid Pathways 0092. To understand the variation found in the cannabi GTCCCATCAAAGGCTACTATATTCAGTGTTAAAAAGAACATGGAGATACA nome and the impact of phyto-cannabinoids, the polymor TGGGCTTGTCAAGTTATTTAACAAATGGCAAAATATTGCTTACAAGTATG phism in the human endocannabinoid pathways are of equal and relevant interest. Harismendy et al demonstrate SNPs ACAAAGATT'TAATGCTCACGACT CACTTCAGAACTAGGAATATTACAGAT which impact body mass index (BMI) in the Fatty Acid amide AATCATGGGAAGAATAAGACTACAGTACATGGTTACTTCTCTTCCATTTT hydrolase (FAAH) and the monoglyceride lipase (MGLL) genes (Harismendy et al. Genome Biol, 11(11):R118). These TCTTGGTGGAGTGGATAGTCTAGTTGACTTGATGAACAAGAGCTTTCCTG genes encode that catabolize endocannabinoids, anandamide (AEA) and 2-arachidonyl glycerol (2-AG) AGTTGGGTATTAAAAAAACTGATTGCAAAGAATTGAGCTGGATTGATACA respectively. The commonly used and thermoregu ACCATCTTCTACAGTGGTGTTGTAAATTACAACACTGCTAATTTTAAAAA latory prodrug is known to require FAAH to metabolize paracetamol with anandamide to form AM404. GGAAATTTTGCTTGATAGATCAGCTGGGAAGAAGACGGCTTTCTCAATTA This metabolite is thought to be an endocannabinoid re-up AGTTAGACTATGTTAAGAAACTAATACCTGAAACTGCAATGGTCAAAATT take inhibitor preventinganandamide clearance from the Syn aptic cleft analogous to SSRI drugs regulation of TTGGAAAAATTATATGAAGAAGAGGTAGGAGTTGGGATGTATGTGTTGTA reuptake. This helps to explain one of the cannabinoids CCCTTACGGTGGTATAATGGATGAGATTTCAGAATCAGCAATTCCATTCC reported benefits in pain management (Hogestatt et al. 2005, JBiol Chem, 280(36):31405-31412). In addition, AM404 has CTCATCGAGCTGGAATAATGTATGAACTTTGGTACACTGCTACCTGGGAG been shown to be an agonist of the TRPV1 or vanilloid recep tors much like capsaicinfound in many cayenne and other red AAGCAAGAAGATAACGAAAAGCATATAAACTGGGTTCGAAGTGTTTATAA peppers and an inhibitor of cyclooxigenase COX-1 and COX TTTCACAACGCCTTATGTGTCCCAAAATCCAAGATTGGCGTATCT CAATT 2. 
These findings prioritize a more thorough understanding of the 85 cannabinoids and the polymorphic diversity of the ATAGGGACCTTGATTTAGGAAAAACTAATCCTGAGAGTCCTAATAATTAC FAAH, MGLL, TRPV1 receptors and the genes encoding ACACAAGCACGTATTTGGGGTGAAAAGTATTTTGGTAAAAATTTTAACAG human cyclooxigenases. 0093. The findings of Harismendy suggest that polymor GTTAGTTAAGGTGAAAACCAAAGCTGATCCCAATAATTTTTTTAGAAACG phism content in the human endocannabinoid pathway can AACAAAGTATCCCACCTCTTCCACCGCATCATCAT better guide patients to cultivars with more favorable cannab inoid content. Independent isolation of cannabinoids has I0089 Interestingly the THC synthase gene has a CWCT resulted in FDA approved drugs (THC or MarinolTM) but TAGWC (Zhang et al. 2001, Proc Natl Acad Sci., USA, studies have shown a 330% increase in efficacy with com 98(22): 12572-12577) motif at base 630. This is one base bined CBD and THC delivery resulting in the European different from the motifs seen in different plants for mPIF approved SativeXTM (Fairbairn and Pickens 1981, Br J Phar integration (CWCTTAGWG) although Zhang etal report the macol, 72(3):401-409). Patients still report better outcomes outer base has only 61% conservation. Integration events mid from the whole plant extracts suggesting synergistic effects of gene (1635 bp full length) would be expected to multiply a the shotgun therapy and an interest in how each popular truncated peptide but the active site including the FAD bind cultivar may vary in expression of active content. Cultivars ing domain would remain un-altered at base 165. that express THCV as another therapeutic cannabinoid are now being pursued. 
This genome sequence provides a tool to Homologs of the Cannabinoid Synthase Genes help selectively breed higher expression levels of various 0090 The increased coverage of the THC synthase gene cryptic cannabinoids into plants to better study the impact of and its 90% homology to CBD synthase could be a result of the cannabinoid and terpene repertoire. many other novel synthase genes being collapsed in assem bly. Description of ClustalW and Medicinal Genomics THC Synthase Sequences. Terpene Biosynthesis 0094 ClustalW is a tool which takes similar Sequences 0091 Terpenes are another class of molecules expressed and “clusters' them together so one can see them aligned and in plants that exhibit antifungal, antibiotic and other medici compared to each other. As an example provided herein is a nal properties like vitamin A and Taxol. Gallucci et al dem ClustalW of the 16 known THC Synthase sequences which onstrate the benefits of combination therapy of penicillin and were in Genbank to date. various terpenes on MRSA. Vitis Vinifera or grapes have 40 0.095 Areas where polymorphisms existed were deter unigenes related to the terpene synthesis (Martin et al., BMC mined. Other Java based viewers can also be used. These can Plant Biol, 10:226) and Cannabis has reports of at least 68 be very helpful tool for comparing new sequences and finding Terpenes using headspace gas chromatography and up to 140 amino acid altering differences. This was done for multiple terpenes (Ross and ElSohly 1996) consisting of approxi sequences from C. Indica genome which have some variation US 2016/0177404 A1 Jun. 23, 2016

in the THC synthase DNA sequence, and some of this sequence variance is amino acid altering, making these very important variations as they impact the synthesis of THC and probably CBD and a variety of other cannabinoids.

Discussion

[0096] Gregor Mendel pioneered genetics working with Pisum sativum, an angiosperm with a 10x larger genome and an 8x longer breeding cycle. The recently sequenced date palm genome highlighted the challenging genetics presented by a 7 year reproductive cycle (Al-Dous et al., Nat Biotechnol, 29(6):521-527). Cannabis cultivars flower in 40-90 days, making the plant an ideal candidate for genome directed selective breeding once many of the cannabis genomes are sequenced. Prior to this sequence, dbEST, dbGSS, dbPLN and dbHTG had a combined sequence for Cannabis of just over 2.05 Mb in 3944 entries. This study represents over a 65,000 fold increase in genomic data publicly available for this plant and brings light to the polymorphism content and structure governing the medicinal synthase genes.

[0097] One of the challenges in embarking on such a study is maintaining a strong chain of custody from the plant matter to the DNA, as few countries have legal mechanisms to obtain plant material, and legally sold cannabis has few quality and tracking standards to afford a properly designed genetic study. Material accessible through NIDA has been deemed less relevant as it fails to represent the THC levels present in most strains used medicinally today.

[0098] As a result, the study described herein was aimed at sequencing one of the more popular C. sativa cultivars ("Chemdawg"), which has a controversial folklore over its origin, to help drive a genetics based standard in the industry. Complementing this is the sequence of a triple back-crossed C. Indica strain ("L.A. Confidential") whose seed line is maintained by legal commercial entities (DNA Genetics, Netherlands). This sequence can better aid the understanding of the genetics which govern cannabinoid expression and help build tracking and standardization tools to enable Cannabis extracts as a more measured therapeutic.

Example 2

Methods

[0099] DNA was purified with Qiagen Mini and Maxi plant DNA purification kits in Holland. Briefly, 500 mg of plant tissue was carefully diced with a razor and, after addition of AP1 lysis solution, homogenized with an IKA Turrax tissue homogenizer for 45 seconds on speed 10. Centrifugation steps were replaced with positive pressure filtration. Eluents from the final columns were re-purified with Ampure using a 1:1 volume of Ampure to sample (Beckman Genomics) and eluted from the magnetic particles with 65 C ddH2O for 5 [...] accomplished with sixteen 454 GS FLX+ 700 bp runs delivering 14x coverage. Genome sequencing and assembly were performed by the 454 Sequencing Service Center in Branford, Conn., using Newbler. The Sativa strain was sequenced to 327x coverage with 2x100 ILMN HiSeq sequencing reads (651M reads, 131 Gb of PF filtered data) by Beckman Genomics. The Illumina and 454 assemblies 10, 11 and 12 were assembled with CLCbio Genomics Workbench 4.7.1. SNP calling was performed with CLCbio Genomics Workbench 4.7.2. For Illumina data a minimum of 2 pairs was required to call a SNP, and the default Neighborhood Quality Scores (NQS) were used. SNP lists were exported as csv files and compared with Perl scripts for overlapping coordinates.

Results

[0100] The outbred Sativa cultivar Chemdawg, or "CD Sativa", was sequenced to over 320x coverage with Illumina 2x100 paired end reads. Single lane and multi lane assemblies produced very similar fragmented assemblies and demonstrated both a high AT content (65.6%) and a high polymorphism rate (0.5% intra-cultivar, 0.63% inter-cultivar). To address the polymorphism rate in the genome, a triple backcrossed pure Indica cultivar named LA Confidential, or "LAC Indica" (DNA Genetics, NL), was chosen to build a high-quality reference genome with over 19.5 million 454/Roche GS FLX+ System 700 bp reads. The Indica genome was assembled with three different alignment stringencies on CLCbio workbench and with Newbler. Genome assembly size estimates of 286-340 Mb for the CD Sativa cultivar were obtained based upon the Illumina-CLC assembly, and of 676-727 Mb for the LAC Indica cultivar based upon the 454 sequencing assembly, with N50s of 2.6 Kb. The variation in genome size estimates is a result of the high polymorphism rate in the genome collapsing, or occasionally splitting, the maternal and paternal alleles in assembly, and is a known challenge for modern DNA assemblers. Therefore, the CD Sativa assembly is likely smaller as a result of the shorter reads' inability to phase highly polymorphic branch points in the assembly, despite the 20 fold higher coverage. The LAC Indica results are supported by van Bakel's genome assembly size estimates for Purple Kush (PK Indica) and by flow sorting experiments suggesting 1.4 pg per diploid genome (Sakamoto).

[0101] To assess genome completeness, all cannabis DNA sequences in GenBank were aligned to the Indica reference, and significant BLAST hits were found for over 98.3% of the entries. An RNA-Seq assembly is publicly available (medicinalplantgenomics.msu.edu) for a different Sativa cultivar ("Mexican" or CSA), and BLAST results confirmed that over 89% and 85% of the 69,557 transcripts from the CSA cultivar were present in the LAC Indica reference (Any E score, E score

SEQUENCE LISTING

The patent application contains a lengthy "Sequence Listing" section. A copy of the "Sequence Listing" is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20160177404A1). An electronic copy of the "Sequence Listing" will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

What is claimed is:

1. A method of detecting one or more cannabinoid genes in a Cannabis plant comprising:
a) contacting all or a portion of a genomic sequence of the Cannabis plant with one or more primers that are complementary to SEQ ID NO: 407,644, thereby producing a reaction mixture;
b) maintaining the reaction mixture under conditions in which one or more sequences in the genomic sequence of the Cannabis plant that are complementary to one or more of the primers hybridize to the one or more primers;
c) amplifying the one or more sequences that hybridize to the one or more primers, thereby producing one or more amplicons; and
d) determining all or a portion of the sequence of the one or more amplicons,
thereby detecting one or more cannabinoid genes in the Cannabis plant.
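Steps a) through d) of claim 1 can be mimicked in silico as a quick sanity check when choosing primers. The sketch below is a minimal illustration under stated assumptions, not the claimed wet-lab method: hybridization is modeled as an exact string match, so mismatch tolerance, melting temperature and specificity are ignored. SEQ ID NO: 407,644 is not reproduced in this section, so the template here is the ALT-THC SYNTHASE fragment (SEQ ID NO: 407,650) printed in paragraph [0085]; the primer pair and the helpers `reverse_complement` and `in_silico_amplicon` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Complement table for unambiguous DNA bases (IUPAC ambiguity codes are not handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the product spanned by a primer pair, or None if a primer does not bind.

    Hypothetical helper: "hybridization" (step b) is an exact match of the
    forward primer on the top strand and of the reverse primer on the bottom
    strand, i.e. its reverse complement appearing downstream on the top strand.
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    # Step c): the amplicon runs from the 5' end of the forward primer
    # to the 5' end of the reverse primer (read on the opposite strand).
    return template[start:end + len(rev_site)]


# Template: SEQ ID NO: 407,650 as printed in paragraph [0085] (stand-in for 407,644).
TEMPLATE = (
    "ACAATATTCTTTTACTATAAAACTTCAATTATCATTTTAAGAACACGTAC"
    "CAAAAATTTTAATAATAAATATATTATAATGTTCTAATCCATTGAACATG"
    "TAAACTAAAATTGTTCCATAAACATATAAGCTCAAATAATATTATTTTAT"
    "TTGCTATTGAAATAAGAAAGACAATTTATTTTATTACATATATCTTATGA"
    "TAGTCTACACAGTTGTAATGTAGATTTTCATACTTGGGAGCATACATAGT"
    "ATGGGT"
)

# Hypothetical primer pair for illustration only (not validated for Tm or specificity).
FWD = "ACAATATTCTTTTACTATAA"
REV = "ATAAAATAATATTATTTGAG"

if __name__ == "__main__":
    amplicon = in_silico_amplicon(TEMPLATE, FWD, REV)
    # Step d): report the product that would be sequenced.
    if amplicon is None:
        print("no product")
    else:
        print(len(amplicon), amplicon)
```

Exact string matching keeps the four claimed steps visible in a few lines; a practical primer check would instead rely on a dedicated primer-design or in-silico PCR tool that scores mismatches and melting temperature.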