Genome

The genetics of Cannabis – genomic variations of key synthases and their effect on cannabinoids content

Journal: Genome

Manuscript ID gen-2020-0087.R1

Manuscript Type: Mini Review

Date Submitted by the Author: 17-Sep-2020

Complete List of Authors: Singh, Aparna; University of Lethbridge, Biological Sciences
Bilichak, Andriy; Morden Research and Development Centre
Kovalchuk, Igor; University of Lethbridge

Keyword: Cannabis sativa L., hemp, marijuana, THCAS, CBDAS

Is the invited manuscript for consideration in a Special Issue? : Genome Biology

© The Author(s) or their Institution(s) Page 1 of 43 Genome

The genetics of Cannabis – genomic variations of key synthases and their effect on cannabinoids content

Aparna Singh1, Andriy Bilichak2 and Igor Kovalchuk1*

1 – Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
2 – Morden Research and Development Center, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
* Corresponding author: [email protected]


Abstract

Despite being a controversial crop, Cannabis sativa L. has a long history of cultivation throughout the world. Following recent legalisation in Canada, it is emerging as an important plant for both medicinal and recreational purposes. Recent progress in genome sequencing of both cannabis and hemp varieties allows for systematic analysis of genes coding for enzymes involved in the cannabinoid biosynthesis pathway. Single nucleotide polymorphisms in the coding regions of cannabinoid synthases play an important role in determining plant chemotype. A deep understanding of how these variants affect enzyme activity and the accumulation of cannabinoids will allow breeding of novel cultivars with desirable cannabinoid profiles. Here we present a short overview of the major cannabinoid synthases and present data on the analysis of their genetic variants and their effect on cannabinoid content using several in-house sequenced Cannabis cultivars.

Keywords: Cannabis sativa L., hemp, marijuana, THCAS, CBDAS


Introduction

Cannabis sativa L. (including marijuana and hemp) is a herbaceous plant belonging to the Cannabaceae family (Vavilov and Freier 1951, Brizicky 1966). As one of the major sources of medicine, oil and fibre, it has been extensively cultivated in many countries (Camp 1936, Godwin 1967, Quimby, Doorenbos et al. 1973, Schultes, Klein et al. 1974, Kriese, Schumann et al. 2004, Laverty, Stout et al. 2019). Since ancient times, the Cannabis plant has been valued for its medicinal properties and used for treating pain, nausea, depression, glaucoma, asthma, insomnia, etc. (Mechoulam, Lander et al. 1976, Duke and Wain 1981). Although the therapeutic properties of cannabinoids have been extensively studied, the role of phytocannabinoids within plants is poorly understood.

Cannabis is diploid, and its karyotype consists of nine pairs of autosomes and a pair of sex chromosomes (2n = 18 + XX for female or XY for male) (Flemming, Muntendam et al. 2007, Divashuk, Alexandrov et al. 2014, Vyskot and Hobza 2015). The haploid genome size of female and male plants is approximately 818 Mb and 843 Mb, respectively (Sakamoto, Akiyama et al. 1998).

The medicinal properties of Cannabis are owed to the presence of terpenophenolic compounds known as cannabinoids. They can modulate the human endocannabinoid system and are therefore relevant to various physiopathological processes (Izzo, Borrelli et al. 2009). They are named cannabinoids because they typically exhibit a C21 terpenophenolic structure (Hillig 2004, Brenneisen 2007, De Meijer 2014). To date, more than 120 cannabinoids (a class of metabolites specific to the Cannabis plant), including cannabidiol (CBD), tetrahydrocannabinol (THC), cannabichromene (CBC), cannabigerol (CBG) and their propyl homologs CBDV, THCV, CBCV and CBGV, have been identified, counting both those that occur in the plant and their derivatives (ElSohly 2007, Radwan, ElSohly et al. 2009, de Meijer and Pertwee 2014). Two of the most common ones are THCA and CBDA, the acidic forms of THC and CBD, which are present in major quantities in the plant at levels that vary among cultivars (De Meijer, Hammond et al. 2009, Swift, Wong et al. 2013). Δ9-tetrahydrocannabinol (THC) is the main psychoactive cannabinoid responsible for therapeutic and hallucinogenic effects and has therefore been extensively studied (Brenneisen, Egli et al. 1996, Long, Malone et al. 2005, Sirikantaramas, Taura et al. 2007). CBG was the first compound isolated from C. sativa in a pure form and is considered an intermediate precursor of most phytocannabinoids. Recently, seven more CBG-type cannabinoids have been isolated from the buds of mature female C. sativa plants (Appendino, Giana et al. 2008, Flores-Sanchez and Verpoorte 2008, Radwan, Ross et al. 2008, Radwan, ElSohly et al. 2009, Pollastro, Taglialatela-Scafati et al. 2011).

In this review, we discuss sequence variations in the synthase enzymes involved in cannabinoid biosynthesis, the causes of their occurrence and their effect on cannabinoid content leading to chemotype diversity. We also cover the significance of these variations for distinguishing Cannabis varieties and for establishing novel cultivars with unique chemotypes that could meet future pharmaceutical demands.

Historical perspective of use of Cannabis-derived products

Cannabis is one of the earliest plants cultivated by humankind and is native to western, central and eastern Asia (Li 1974, Small 2015). It was used traditionally as a herbal medicine in ancient times by Chinese, Tibetan and Indian civilizations (Mechoulam and Parker 2013). Cannabis has also been associated with religious practices in Southern Asia, especially in India, where written records of its holy use were found (Hasan 1974). Evidence of its cultivation for fibre, medicine and food for humans and cattle dates from as early as 4000 B.C. (Small and Cronquist 1976, Jiang, Li et al. 2006). Several reports support shamanistic uses of Cannabis, suggesting that the ancient Chinese were well aware of its psychotropic properties (Touw 1981, Farag and Kayser 2017).

Hemp, a type of Cannabis sativa, is presumed to be one of the oldest sources of fibre. It has been valued for its strength and durability and was therefore used for manufacturing ropes and clothes in earlier times (Allegret 2013). Nowadays, it is grown mainly for industrial purposes to obtain derivatives such as oil, fibre and food. Around 2000 B.C., hemp was introduced as a fibre crop to Egypt, Europe and western Asia (Schultes 1979). Hemp seeds were among the five most important grains in ancient China, where they were considered a staple food until the tenth century (Cheatham, Johnston et al. 2009). In the modern world, hemp is also grown for its medicinal and nutritional value (Farag and Kayser 2017).

Apart from food, the Chinese used plant extracts and seeds of Cannabis to treat various illnesses including constipation, malaria, rheumatic pain and female reproductive system disorders. They also used different parts of the plant, such as roots and foliage, as medicine for various treatments (Wang and Wei 2012). Historically, hemp seeds were used for the treatment of jaundice, sores, pain, skin diseases, blood-related illnesses and constipation (Callaway 2004). A popular beverage in Scandinavia known as “Maltos-Cannabis” was used in the early twentieth century for treating anemia, asthenia, emaciation and pulmonary diseases (Dahl and Frank 2011). Considerable evidence of Cannabis use as a medicine and recreational drug in different forms was also reported from ancient India approximately 1000 years ago. It is considered a sacred plant in the Hindu religion and was used in several religious rituals and ceremonies (Hasan 1974). It has also been actively used as an analgesic, tranquilizer, anticonvulsant, anti-inflammatory, aphrodisiac, antispasmodic and antibiotic. Cannabis was also used in Tibet for religious, medicinal and meditation purposes (Touw 1981). In Africa, Cannabis has been known since the fifteenth century and has been used for the treatment of snake bite and diseases like malaria, asthma, fever and dysentery. In South America, Cannabis use for recreational and medicinal purposes presumably started during the seventeenth and eighteenth centuries (Zuardi 2006, Rubin 2011). Cannabis was extensively grown for use as fibre in Europe and north Asia, whereas in Africa and Southern Asia it has mostly been used as a recreational, medicinal and cultural drug. Due to its classification as a narcotic, very limited research on Cannabis and its effects on the human body was conducted previously. Even today, Cannabis is one of the major illicitly cultivated plants in the world. Only recently has it been made legal in several parts of the globe, and great advances have since been made in understanding how cannabinoids affect the human brain and nervous system and in developing new Cannabis-based therapeutic products.

Taxonomical Classification

Historically, vernacular taxonomy differentiated three Cannabis groups: C. sativa (high CBD-containing plant), C. indica (high THC-containing plant) and C. ruderalis (wild-type, equal levels of THC and CBD) (McPartland 2018). DNA barcode analysis provides evidence for separation of the first two taxa at the subspecies level into C. sativa subsp. sativa and C. sativa subsp. indica. At the same time, historical records reveal that field botanists were not scrupulous when differentiating these subspecies. Furthermore, “Sativa” and “Indica” varieties were extensively interbred throughout the domestication process, making their distinction impossible (McPartland 2018). The third type, C. ruderalis, is not a popular variety and is adapted to the extreme environments of the Indian Himalayan ranges, Siberia and Eastern Europe. The plant is small and bushy, with a low-THC and high-CBD cannabinoid profile that is often not enough to produce any psychological effects, and hence it is not widely used.

Biosynthesis of Cannabinoids

Cannabigerolic acid (CBGA) is one of the main precursors in the cannabinoid biosynthesis pathway. CBGA is formed by the condensation of geranyl diphosphate (GPP) and olivetolic acid (OLA). GPP originates from the plastidial non-mevalonate pathway, also known as the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. OLA is derived from hexanoic acid, which is first converted to hexanoyl-CoA by hexanoyl-CoA synthetase (Stout, Boubakir et al. 2012). Hexanoyl-CoA is then converted to OLA using three molecules of malonyl-CoA in a reaction catalyzed by a polyketide synthase (PKS) and olivetolic acid cyclase (OAC) (Gagne, Stout et al. 2012). Geranylpyrophosphate:olivetolate geranyltransferase, also known as prenyltransferase or cannabigerolic acid synthase (CBGAS), catalyzes the alkylation reaction between OLA and GPP to form CBGA (Fellermeier and Zenk 1998). Downstream in the pathway, three oxidocyclases are responsible for establishing structural diversity among cannabinoids: tetrahydrocannabinolic acid synthase (THCAS), cannabidiolic acid synthase (CBDAS) and cannabichromenic acid synthase (CBCAS). These enzymes catalyze stereoselective cyclization of CBGA to THCA, CBDA and CBCA, respectively (Figure 1C) (Sirikantaramas, Morimoto et al. 2004, Sirikantaramas, Taura et al. 2005, Taura, Sirikantaramas et al. 2007, Degenhardt, Stehle et al. 2017). In addition to these pentyl-alkyl-cannabinoids, which are dominant in plants, propyl-alkyl-cannabinoids such as Δ9-tetrahydrocannabivarinic acid (THCVA) and cannabidivarinic acid (CBDVA) have also been reported from Cannabis plants of some geographical regions (Baker, Fowler et al. 1980, Hillig and Mahlberg 2004) and are synthesized from cannabigerovarinic acid (CBGVA) (Taura, Sirikantaramas et al. 2007, Flores-Sanchez and Verpoorte 2008). All cannabinoids are synthesized in their carboxylic acid form in the plant and can be converted to their neutral forms via thermal decarboxylation (Dussy, Hamberg et al. 2005, Happyana, Agnolet et al. 2013, Happyana and Kayser 2013).

Synthases involved in the Cannabinoids Biosynthesis Pathway

The three synthases responsible for biosynthesis of the major cannabinoids are THCAS, CBDAS and CBCAS. THCAS catalyzes stereoselective oxidative cyclization of CBGA into THCA (the acidic precursor of THC) using molecular oxygen. THCA eventually undergoes non-enzymatic decarboxylation to form the psychoactive agent, Δ9-THC. THCAS has a 1635-nucleotide open reading frame encoding a 545 amino acid (AA) polypeptide chain with a 24 AA signal peptide and belongs to the p-cresol methyl-hydroxylase superfamily. Like THCAS, CBDAS is a single-exon gene; it encodes a protein of 516 AA, including a 28 AA-long signal peptide. Both synthases contain two major domains: an FAD-binding domain and a berberine bridge enzyme (BBE)-like domain (Figure 1A and B). The FAD coenzyme binds to His114 and Cys176, which are present in domain I. Mutation of the active site residues (His292 and Tyr417) results in decreased enzymatic activity (Shoyama, Tamada et al. 2012).

CBDAS catalyzes oxidative cyclization of CBGA to form CBDA, which can be further decarboxylated to cannabidiol (CBD) (Taura, Morimoto et al. 1996, Taura, Dono et al. 2007). Like THCAS, CBDAS possesses the His114 and Cys176 flavin-binding residues, and its FAD-binding site is composed of the amino acid sequence Arg-Ser-Gly-Gly-His. Similarly, mutation of the His114 residue results in loss of CBDAS activity (Taura, Sirikantaramas et al. 2007).

Cannabichromenic acid synthase (CBCAS) catalyzes stereoselective cyclization of CBGA to CBCA. Unlike THCAS and CBDAS, CBCAS does not require molecular oxygen for the oxidocyclization of CBGA (Morimoto, Komatsu et al. 1998).

All three synthases involved in cannabinoid synthesis share high sequence similarity. For example, at the amino acid level, THCAS and CBDAS are 84% identical, whereas THCAS and CBCAS are approximately 96% identical (Taura, Sirikantaramas et al. 2007, Laverty, Stout et al. 2019). The structural and biochemical properties of THCAS and CBDAS are also similar. Both are soluble enzymes with a 28 amino acid long signal peptide and possess an FAD domain. The high level of sequence identity suggests that both synthases evolved from a common ancestor (Taura, Sirikantaramas et al. 2007). It was proposed that THCAS evolved from CBDAS by gene duplication (Taura, Sirikantaramas et al. 2007, Shoyama, Tamada et al. 2012, Onofri, de Meijer et al. 2015).
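Identity figures such as those quoted above are computed over aligned sequences. The following is a minimal sketch of that calculation, assuming two sequences that have already been aligned to the same length with '-' marking gaps. The function name and the toy peptide strings are hypothetical and are not real THCAS or CBDAS sequences; other conventions for the denominator (for example, the full alignment length) exist and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned peptides (hypothetical, for illustration only)
seq_a = "MNCSAFSFWF-VCKI"
seq_b = "MKCSTFSFWFAVCKI"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical")
```

Here 12 of the 14 gap-free columns match, so the toy pair is about 85.7% identical.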


The reaction mechanism of both enzymes is similar, as both require molecular oxygen for CBGA oxidation and produce hydrogen peroxide. The domain present in these two enzymes shares a striking similarity with the berberine bridge enzyme (BBE) domain. BBE is a crucial enzyme of the alkaloid biosynthesis pathway of Eschscholzia californica (Onofri, de Meijer et al. 2015). The enzymatic reactions catalyzed by both enzymes start with the transfer of a hydride ion from the substrate CBGA to the isoalloxazine ring of the FAD coenzyme. Interestingly, only a small difference in amino acid sequence between the two enzymes is responsible for determining their product specificity (Taura, Sirikantaramas et al. 2007). Biochemically, THCAS and CBDAS are monomeric, with a native protein mass of 74 kDa and similar pI, Vmax and Km for the CBGA substrate.

Phylogenetic analysis of the amino acid sequences of THCAS and CBDAS from in-house sequenced cultivars demonstrated a lower level of divergence among cultivars for THCAS than for CBDAS (Figures 2A and B). Among 29 analyzed cultivars, we detected only 8 and 18 unique sequences for THCAS and CBDAS, respectively. This agrees with a previous study suggesting a recent evolution of THCAS from the CBDAS group (Onofri, de Meijer et al. 2015). Alternatively, artificial selection of plants with the highest level of THCA and the most active THCAS may have reduced the number of THCAS variants in the Cannabis population. In the case of CBDAS, whereas most of the drug-type cultivars clustered together, hemp varieties such as Finola and CFX2 were placed separately on the tree (Figure 2, Maximum Likelihood, 100 bootstraps).

Cannabis Chemotypes

A detailed screening of germplasm collections is necessary for successful breeding of Cannabis cultivars with the desired level and ratio of cannabinoids, both for pharmaceutical application and for setting up breeding strategies (Welling, Liu et al. 2016). High performance liquid chromatography (HPLC) offers an unequivocal way to analyze the cannabinoid profile of an examined plant and to assign its chemotype (De Backer, Debrus et al. 2009). It can be further supported by functional DNA markers associated with the genes coding for THCAS and CBDAS. Chemotaxonomically, Cannabis sativa is divided into three chemotypes based on the THC:CBD ratio (Small and Beckstead 1973), which is generally used to distinguish high- and low-THC plants. The first study to discriminate between fibre- and drug-type plants using the THC:CBD ratio was performed by Fetterman et al. in 1971 (Fetterman, Keith et al. 1971, Turner and Elsohly 1979). Chemotype I, also known as drug-type, has a THC:CBD ratio of more than 1 and a THC content higher than 0.3% of total dry weight. Chemotype II is an intermediate type with a THC:CBD ratio of around 1. Chemotype III, also known as fibre-type, typically has a low THC content and a THC:CBD ratio of less than 1; it is therefore non-psychoactive. Two more chemotypes were added later: chemotype IV, characterized by CBG (>0.30%) and CBD (<0.50%) content, and chemotype V, with undetectable levels of cannabinoids (Fournier, Richez-Dumanois et al. 1987, Mandolino and Carboni 2004).
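The chemotype definitions above can be sketched as a simple decision rule. This is an illustration under stated assumptions, not a published classifier: the text gives no numeric meaning for a ratio "around 1" and no detection limit, so the `band` and `detection_limit` parameters below are hypothetical, as is the extra requirement that CBG be the predominant cannabinoid in chemotype IV.

```python
def assign_chemotype(thc, cbd, cbg=0.0, detection_limit=0.01, band=(0.5, 2.0)):
    """Assign chemotype I-V from cannabinoid content given in % of dry weight.

    detection_limit and band (the range of THC:CBD ratios treated as
    "around 1") are illustrative assumptions, not values from the text.
    """
    # Chemotype V: no cannabinoid above the detection limit
    if max(thc, cbd, cbg) < detection_limit:
        return "V"
    # Chemotype IV: CBG > 0.30% and CBD < 0.50%; requiring CBG to be the
    # predominant cannabinoid is an added assumption
    if cbg > 0.30 and cbd < 0.50 and cbg > max(thc, cbd):
        return "IV"
    ratio = thc / cbd if cbd >= detection_limit else float("inf")
    low, high = band
    if ratio > high:
        # Chemotype I also requires THC above 0.3% of dry weight
        return "I" if thc > 0.3 else "unassigned"
    if ratio < low:
        return "III"
    return "II"


# Hypothetical cannabinoid profiles (% of dry weight)
for profile in [(18.0, 0.1, 0.2), (5.0, 6.5, 0.1), (0.2, 10.0, 0.1), (0.05, 0.1, 1.0)]:
    print(profile, assign_chemotype(*profile))
```

Widening or narrowing `band` moves samples between chemotype II and its neighbours, so any real use would need a threshold justified by the data at hand.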

We analyzed cannabinoid content in flowers of individual plants representing 31 cultivars using HPLC [2] and assigned the chemotype based on the ratio of total THC equivalent to total CBD equivalent (Table 1). Overall, the total level of all cannabinoids did not exceed 21% of the flower’s dry weight (Figure 3C). Depending on the cultivar, the percentage of CBGA varied from 0 to 1.1% (e.g., NF). Similarly, the level of CBN, which is an oxidized metabolite of THC [5], ranged from 0 to 0.19%. Overall, among the 31 cultivars examined, 26 were chemotype I, 3 were chemotype II and 2 were chemotype III.
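The text does not state how the total THC and CBD equivalents were computed. A common convention, assumed here, is to add the neutral form to the acidic form scaled by the molar mass ratio of the two (about 0.877 for both THC/THCA and CBD/CBDA), which accounts for the CO2 lost on decarboxylation. The sample values below are hypothetical.

```python
# Molar masses in g/mol; THC and CBD are isomers (C21H30O2), as are
# THCA and CBDA (C22H30O4), so one factor serves both pairs.
NEUTRAL_MOLAR_MASS = 314.46
ACID_MOLAR_MASS = 358.47
DECARBOXYLATION_FACTOR = NEUTRAL_MOLAR_MASS / ACID_MOLAR_MASS  # about 0.877


def total_equivalent(neutral_pct: float, acid_pct: float) -> float:
    """Total neutral-cannabinoid equivalent (% dry weight) after full decarboxylation."""
    return neutral_pct + DECARBOXYLATION_FACTOR * acid_pct


# Hypothetical HPLC readings (% of dry weight)
total_thc = total_equivalent(neutral_pct=1.0, acid_pct=20.0)
total_cbd = total_equivalent(neutral_pct=0.05, acid_pct=0.5)
thc_to_cbd = total_thc / total_cbd
print(f"total THC = {total_thc:.2f}%, total CBD = {total_cbd:.2f}%, ratio = {thc_to_cbd:.1f}")
```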

Interestingly, some genotyping studies have revealed that hemp and marijuana differ significantly at the genome level (Sawler, Stout et al. 2015), and it has also been shown that both environmental and genetic factors are responsible for such chemotype diversity in Cannabis (Bócsa, Máthé et al. 1997, de Meijer, Bagatta et al. 2003, Hillig 2005). Environmental factors such as the amount and quality of light received by the plant, nutrients and temperature have been shown to modulate cannabinoid accumulation in a plant. At the same time, it has been reported that the CBD/THC ratio remains constant irrespective of plant development and ambient conditions (Pacifico, Miselli et al. 2008). Therefore, analysis of the cannabinoid profile in leaves of developing plants allows the chemotype of a plant to be deduced before its maturity. To examine accumulation of cannabinoids throughout plant development, we crossed hemp variety X59 to cannabis cultivar HC. The segregating population contained two chemotypes, II and III, with a consistent ratio of CBD to THC in leaves and flowers at different stages of plant development regardless of the chemotype, as measured by HPLC (e.g., CBD/THC = 1.29, 1.53 and 1.23 for 4-week-old plants, 7-week-old plants and flowers of X59HC9a#12, respectively) (Figure 3A). Therefore, the chemotype of potential mother plants can be determined as early as in 4-week-old plants. We also examined accumulation of CBGA throughout plant development; however, we did not detect a gradual increase in the level of this cannabinoid in the leaves and flowers of the progeny of X59 x HC crosses, although inflorescences accumulated a higher level of this cannabinoid in most of the plants. At the same time, we observed a modest positive correlation between accumulation of total CBD equivalent and CBGA in the progeny (r = 0.69) (Figure 3B).
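One way to put a number on the "consistent ratio" reported for X59HC9a#12 is the coefficient of variation of the three CBD/THC values quoted above. The sketch below uses only those three values; treating a low coefficient of variation as evidence of stability is an illustrative choice and not a test the authors describe.

```python
from statistics import mean, stdev

# CBD/THC ratios quoted in the text for X59HC9a#12
ratios = {"4-week leaves": 1.29, "7-week leaves": 1.53, "flowers": 1.23}

values = list(ratios.values())
ratio_mean = mean(values)
ratio_cv = stdev(values) / ratio_mean  # sample standard deviation / mean

print(f"mean CBD/THC = {ratio_mean:.2f}, coefficient of variation = {ratio_cv:.1%}")
```

The ratios average 1.35 with a spread of roughly 12% across the three stages.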

Genetics of Cannabis

Genomic studies of several new synthase variants have been carried out recently (Weiblen, Wenger et al. 2015, Grassa, Wenger et al. 2018, Laverty, Stout et al. 2019, Gao, Wang et al. 2020). Initially, CBDAS and THCAS were identified as codominant alleles at a single locus, where BT/BT and BD/BD homozygous plants are THC- and CBD-dominant, respectively. However, recent advances in genomic studies have suggested the involvement of multiple linked loci. Weiblen et al. proposed this based on several factors, such as the presence of diverse THCA and CBDA synthase sequences in test samples, expression pattern and position of the loci on the chromosome map (Weiblen, Wenger et al. 2015). Onofri et al. suggested that THCA/CBDA variation is due to sequence variations at the BT and/or BD loci (Onofri, de Meijer et al. 2015). However, Grassa et al. report that divergence at the CBDAS loci is mainly responsible for determining the THCA/CBDA ratio of cultivars, resulting in the cannabinoid profile differences between marijuana and hemp (Grassa, Wenger et al. 2018). Interestingly, variation in gene copy number of THCAS and CBDAS has also contributed to varied cannabinoid content in cultivars and is responsible for phytochemical diversity, which helps the plant in adaptation (Vergara, Huscher et al. 2019).

To identify duplications of cannabinoid synthases in the in-house sequenced cultivars, the genBlastA program was used (She, Chu et al. 2009). As input, we generated scaffolds of genomic sequences of the target cultivars and used the gene sequences of THCAS, CBDAS and CBCAS. The E-value threshold was set to 0.00001 and the minimum percentage of query gene coverage in the output to 80%. Overall, the number of THCAS gene duplicates varied among cultivars from one to four (e.g., BC and Skywalker, respectively; Supplementary data File S2), with no correlation to the level of total THC equivalent (r = 0.01). Similar numbers were observed for CBDAS and CBCAS, with the only exception being Zambiah, which had five duplicates of CBDAS. We cannot exclude the possibility that some of the detected copies are pseudogenes (Kojoma, Seki et al. 2006), since we did not examine their sequences in detail. Therefore, the lack of an observed correlation between synthase copy number and accumulation of cannabinoids needs to be examined further.
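The copy-number step described above amounts to filtering homology hits by the two stated thresholds and counting what remains per cultivar and gene. The sketch below shows that logic on hypothetical, already-parsed hit records. It does not reproduce genBlastA's actual output format or options, the cultivar names and values are invented, and whether the thresholds are inclusive is an assumption.

```python
from collections import Counter

E_VALUE_MAX = 1e-5    # E-value threshold stated in the text
MIN_COVERAGE = 80.0   # minimum query gene coverage (%) stated in the text

# Hypothetical parsed hits; field names are illustrative only
hits = [
    {"cultivar": "cultivar_A", "gene": "THCAS", "evalue": 1e-120, "coverage": 99.0},
    {"cultivar": "cultivar_A", "gene": "THCAS", "evalue": 3e-80, "coverage": 91.5},
    {"cultivar": "cultivar_A", "gene": "THCAS", "evalue": 2e-3, "coverage": 95.0},   # fails E-value
    {"cultivar": "cultivar_A", "gene": "CBDAS", "evalue": 1e-90, "coverage": 62.0},  # fails coverage
    {"cultivar": "cultivar_B", "gene": "THCAS", "evalue": 1e-110, "coverage": 100.0},
]


def count_copies(records):
    """Count hits per (cultivar, gene) that pass both thresholds."""
    counts = Counter()
    for record in records:
        if record["evalue"] <= E_VALUE_MAX and record["coverage"] >= MIN_COVERAGE:
            counts[(record["cultivar"], record["gene"])] += 1
    return counts


copies = count_copies(hits)
for (cultivar, gene), n in sorted(copies.items()):
    print(cultivar, gene, n)
```

A count produced this way says nothing about whether each copy is functional, which is the pseudogene caveat raised in the text.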

Mutation analysis of the enzymes shows how substitutions of targeted amino acids can affect cannabinoid production in vitro. Notably, glycosylation sites are not essential for optimum THCAS enzyme activity, whereas an increase in disulphide bonds improved CBDAS enzyme activity (Zirpel, Kayser et al. 2018).

Variation in Cannabis Synthases

A large number of Cannabis strains have been developed over the centuries through breeding and selection. Modern breeders have used different types of DNA marker tools, such as random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), inter simple sequence repeat amplification (ISSR), expressed sequence tag simple sequence repeats (EST-SSRs), single nucleotide polymorphisms (SNPs) and short tandem repeats (STRs), to assess genetic diversity among C. sativa accessions (Gillan, Cole et al. 1995, de Meijer, Bagatta et al. 2003, Miller, Shutler et al. 2003, Hu, Guo et al. 2012, Shirley, Allgeier et al. 2013, Gao, Xin et al. 2014, Sawler, Stout et al. 2015, De Meijer and Hammond 2016). These DNA marker tools also enabled discrimination between drug-type and non-drug-type Cannabis cultivars (Rotherham and Harbison 2011). Two sets of sequence characterised amplified region (SCAR) DNA markers have been used for prediction of chemotype at the early stages of plant development: the dominant D589 and the co-dominant B1080/B1192 (Pacifico, Miselli et al. 2006, Staginnus, Zörntlein et al. 2014). Whereas the D589 marker provides information only on the presence of active THCAS, defined as the BT allele, the B1080/B1192 marker can be used to assess both synthases, THCAS and CBDAS, indicated as the BT and BD alleles, respectively. Both markers map at or near the protein domains (Figure 1).

Comparison of HPLC data for the in-house sequenced cultivars with the presence of DNA SCAR markers revealed 100% correlation only for the D589 marker, whereas B1080/B1192 gave inconclusive results, which was consistent with a previous study (Brenneisen 2007). Moreover, although all analyzed cultivars carried the BD marker, we detected a single nucleotide polymorphism at position 583 (C -> T) in the CBDAS coding sequence in a number of cultivars (e.g., BC Kush, Black Jack, etc.), resulting in a premature stop codon that renders the synthase inactive. At the same time, several cultivars carried both the BD marker and a complete ORF but had a very low level of total CBD equivalent (less than 1%, e.g., Bon Homme, Canadian Cheese, etc.). The opposite result was observed as well: absence of active CBDAS with a relatively high level of CBD equivalent (e.g., Jungle Wreck and Zambiah). This suggests that further analysis of the CBDAS gene sequence is required to detect critical SNPs, in both coding and promoter regions, that are responsible for accumulation of CBD in plants.
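How a single C -> T change can truncate the protein follows from codon arithmetic. The sketch below assumes that position 583 is counted from the first base of the start codon (1-based); the text does not give the affected codon, so the candidate codons listed are illustrative rather than the actual CBDAS sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_of(position: int) -> tuple:
    """Map a 1-based CDS position to (1-based codon number, 0-based offset in codon)."""
    return (position - 1) // 3 + 1, (position - 1) % 3


def creates_stop(codon: str, offset: int, new_base: str) -> bool:
    """True if substituting new_base at offset turns the codon into a stop codon."""
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return mutated in STOP_CODONS


codon_number, offset = codon_of(583)

# Illustrative codons starting with C: which become stop codons after C -> T
# at the affected position?
candidates = [c for c in ("CAA", "CAG", "CGA", "CTG") if creates_stop(c, offset, "T")]
print(codon_number, offset, candidates)
```

Under this numbering the SNP falls on the first base of codon 195, where CAA (Gln), CAG (Gln) or CGA (Arg) would each become a stop codon, truncating the protein well short of its full length.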


Several studies of SNPs in synthase genes have helped to determine genetic differences between hemp and drug-type Cannabis and their effect on enzyme activity (Borna, Salami et al. 2017, Cascini, Farcomeni et al. 2019). For instance, Onofri et al. (2015) genotyped inbred lines from different geographical backgrounds and revealed a single SNP at position 706 in the THCAS gene that changes an amino acid from glutamic acid to glutamine, resulting in a strain that accumulated CBG(V)A as the major cannabinoid and produced a low amount of THCA, whereas active THCAS was found in a cultivar in which a single nucleotide change did not translate into a different amino acid. Notably, SNPs did not always alter enzyme activity, as seen in the case of CBDAS: despite amino acid substitutions, CBDA content remained high, suggesting that mutations occurring near the FAD-binding site or catalytic site are mostly responsible for altered enzyme activity (Onofri, de Meijer et al. 2015). Rotherham et al. developed an SNP assay system to differentiate drug-type and non-drug-type Cannabis for commercial production of fibre and seed oil cultivars and for analysis of confiscated samples. The assay was capable of characterizing active THCAS from homozygous drug-type plants, and inactive THCAS from both heterozygous drug-type and homozygous non-drug-type Cannabis varieties (Rotherham and Harbison 2011). In addition, SNP variations among marijuana-type (Purple Kush and Chemdawg) and hemp-type cultivars (Finola and ‘USO-31’) supported their phylogenetic separation (van Bakel, Stout et al. 2011). In fact, SNPs were found to be responsible for establishing different genetic clusters among Cannabis populations (Soorni, Fatahi et al. 2017).

Characterization of cis-elements in the CBDAS and THCAS promoters

Cis-elements in promoters are at the core of the regulation of gene expression during plant development and in response to environmental stimuli (Hernandez-Garcia and Finer 2014). Manipulation of THCAS and CBDAS gene expression can potentially result in increased yield of target cannabinoids or an altered ratio of CBD to THC. To analyze upstream cis-regulatory elements, we extracted 768 and 1,000 bp 5' regions representing promoter sequences for THCAS and CBDAS, respectively, from the Purple Kush and Finola genomes (PK scaffold 19603:6668-7668 and FN scaffold 14546436:3508-4508 for THCAS and CBDAS, respectively) (Laverty, Stout et al. 2019). Cis-regulatory elements were obtained from Korkuc, Schippers et al. (2014), and promoter sequences were scanned for the presence of motifs against 325 and 1496 promoters for CBDAS and THCAS, respectively, using the Find Individual Motif Occurrences (FIMO) program (Grant, Bailey et al. 2011). Sixteen and 14 significantly enriched (p<0.0001) motifs were identified for the THCAS and CBDAS promoters, respectively (Supplementary data File S1). Ten cis-regulatory elements were common to both promoters, and a number of motifs are associated with regulation of gene expression under abiotic stresses such as dehydration, cold and light (e.g., ATHB6, AtMYB2, RAV1-A, GATA, Ibox, etc.). We also identified the circadian clock-responsive motif Evening Element in the promoter of the CBDAS gene, suggesting possible circadian regulation of its expression. Further examination of THCAS and CBDAS gene expression under different stress conditions will reveal the role of the identified cis-regulatory elements and may make it possible to increase accumulation of the corresponding cannabinoids.
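As a rough illustration of motif scanning, the sketch below searches a sequence for exact matches to the Evening Element consensus (AAAATATCT) on both strands. This is a deliberately simpler substitute for FIMO, which scores position weight matrices and reports p-values. The promoter string is hypothetical and is not taken from the Purple Kush or Finola scaffolds.

```python
import re

EVENING_ELEMENT = "AAAATATCT"  # Evening Element consensus sequence


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(promoter: str, motif: str) -> list:
    """Return sorted (0-based start, strand) pairs for exact motif matches."""
    promoter = promoter.upper()
    found = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead lets overlapping occurrences be reported as well
        for match in re.finditer(f"(?={pattern})", promoter):
            found.append((match.start(), strand))
    return sorted(found)


promoter = "GGCTAAAATATCTCCGTAGATATTTTGCA"  # hypothetical sequence
motif_hits = find_motif(promoter, EVENING_ELEMENT)
print(motif_hits)
```

Exact matching misses degenerate sites that a weight-matrix scan would score, so it should be read as a teaching aid rather than a replacement for the analysis described above.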


Conclusion and Future Prospects

Sequence variations provide insight into the complexity of cannabinoid biosynthesis in the plant, which results in chemotype diversity and in altered gene expression and enzymatic activity. Therefore, deeper analysis of regulatory mechanisms, as well as of sequence variants of the synthases, is needed for developing novel cultivars with diverse cannabinoid profiles. To this end, synthetic biosynthesis pathways for cannabinoid production can be established in heterologous hosts (e.g., yeast, tobacco, etc.) (Sirikantaramas, Taura et al. 2005, Luo, Reiter et al. 2019), both for pharmaceutical applications and for dissection of the regulatory elements involved in synthase activity.

LIST OF FIGURE CAPTIONS

Figure 1. Schematic representation of THCAS (A) and CBDAS (B) domains and annealing regions of genotyping primers D589 and B1080/B1192. (C) Biosynthesis pathway of major cannabinoids. The schematic view is derived and modified from the pathway reviewed in (Degenhardt, Stehle et al. 2017). Chemical structures were generated with ChemDraw. GPP - geranyl diphosphate; OLA - olivetolic acid; CBGA - cannabigerolic acid; THCA - Δ9-tetrahydrocannabinolic acid; THC - Δ9-tetrahydrocannabinol; CBCA - cannabichromenic acid; CBC - cannabichromene; CBDA - cannabidiolic acid; CBD - cannabidiol; CBGAS - cannabigerolic acid synthase; THCAS - tetrahydrocannabinolic acid synthase; CBCAS - cannabichromenic acid synthase; CBDAS - cannabidiolic acid synthase.


Figure 2. Phylogenetic trees of amino acid sequences of (A) THCAS and (B) CBDAS from the examined Cannabis cultivars. The phylogenetic trees were built using Geneious Prime 2020.2.3 software. Amino acid sequences were aligned using MUSCLE, and a consensus tree was inferred using the maximum likelihood method with 100 bootstrap replicates. Values on the branches indicate bootstrap proportions.

Figure 3. HPLC analysis of in-house cannabis cultivars. Chemotypes were deduced from the ratio of total equivalent THC to total equivalent CBD. (A) HPLC profile of total THC and CBD and (B) of CBGA throughout plant development in the progeny of X59 x HC crosses. WOP – weeks old plant. (C) Percentage of pentyl-cannabinoids in individual plants of Cannabis sativa L. representing 31 cultivars. Levels of cannabinoids were measured in dry inflorescences using HPLC.
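The chemotype assignment mentioned in the Figure 3 caption can be illustrated as follows. This is a hedged sketch rather than the authors' procedure. It assumes the commonly used decarboxylation factor of 0.877 for converting the acidic forms (THCA, CBDA) to their neutral equivalents. The ratio thresholds (`high`, `low`), the function names and the sample values are hypothetical placeholders chosen only for illustration.

```python
# Assumed convention: mass ratio of the neutral cannabinoid to its acidic
# precursor after decarboxylation (about 0.877 for THCA->THC and CBDA->CBD).
DECARB_FACTOR = 0.877


def total_equivalent(neutral_pct, acid_pct):
    """Total equivalent content (% dry weight) = neutral + 0.877 * acid."""
    return neutral_pct + DECARB_FACTOR * acid_pct


def chemotype(total_thc, total_cbd, high=10.0, low=0.1):
    """Assign chemotype I (THC-dominant), II (intermediate) or III
    (CBD-dominant) from the THC:CBD ratio. Thresholds are hypothetical."""
    if total_cbd == 0:
        return "I" if total_thc > 0 else "undetermined"
    ratio = total_thc / total_cbd
    if ratio >= high:
        return "I"
    if ratio <= low:
        return "III"
    return "II"


# Made-up HPLC readings (% dry weight): (THC, THCA, CBD, CBDA)
samples = {
    "drug-type example": (0.5, 20.0, 0.05, 0.2),
    "intermediate example": (0.3, 6.0, 0.3, 6.0),
    "fiber-type example": (0.1, 0.3, 1.0, 12.0),
}

for name, (thc, thca, cbd, cbda) in samples.items():
    t = total_equivalent(thc, thca)
    c = total_equivalent(cbd, cbda)
    print(f"{name}: total THC={t:.2f}, total CBD={c:.2f}, chemotype {chemotype(t, c)}")
```

Working with total equivalents rather than the neutral forms alone matters because fresh or dried inflorescences contain cannabinoids mostly in their acidic forms.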


References

Allegret, S. (2013). "The history of hemp." Hemp: industrial production and uses: 4-26.

Appendino, G., A. Giana, S. Gibbons, M. Maffei, G. Gnavi, G. Grassi and O. Sterner (2008). "A polar cannabinoid from Cannabis sativa var. Carma." Natural Product Communications 3(12): 1934578X0800301207.

Baker, P. B., R. Fowler, K. R. Bagon and T. A. Gough (1980). "Determination of the distribution of cannabinoids in cannabis resin using high performance liquid chromatography." Journal of analytical toxicology 4(3): 145-152.

Bócsa, I., P. Máthé and L. Hangyel (1997). "Effect of nitrogen on tetrahydrocannabinol (THC) content in hemp (Cannabis sativa L.) leaves at different positions." J Int Hemp Assoc 4(2): 78-79.

Borna, T., S. A. Salami and M. Shokrpour (2017). "High resolution melting curve analysis revealed SNPs in major cannabinoid genes associated with drug and non-drug types of cannabis." Biotechnology & Biotechnological Equipment 31(4): 839-845.

Brenneisen, R. (2007). Chemistry and analysis of phytocannabinoids and other Cannabis constituents. Marijuana and the Cannabinoids, Springer: 17-49.

Brenneisen, R., A. Egli, M. Elsohly, V. Henn and Y. Spiess (1996). "The effect of orally and rectally administered delta 9-tetrahydrocannabinol on spasticity: a pilot study with 2 patients." International journal of clinical pharmacology and therapeutics 34(10): 446-452.

Brizicky, G. K. (1966). Cultivated Plants and Their Wild Relatives. Taxonomy, Geography, Cytogenetics, Ecology, Origin, Utilization, JSTOR.

Callaway, J. (2004). "Hempseed as a nutritional resource: An overview." Euphytica 140(1-2): 65-72.

Camp, W. (1936). "The antiquity of hemp as an economic plant." J. NY Bot. Gard 37: 110-114.

Cascini, F., A. Farcomeni, D. Migliorini, L. Baldassarri, I. Boschi, S. Martello, S. Amaducci, L. Lucini and J. Bernardi (2019). "Highly Predictive Genetic Markers Distinguish Drug-Type from Fiber-Type Cannabis sativa L." Plants 8(11): 496.

Cheatham, S., M. Johnston and L. Marshall (2009). "The useful wild plants of Texas, the Southeastern and Southwestern United States, the Southern Plains, and Northern Mexico, vol 3. Useful Wild Plants." Inc, Austin (Treatment of Cannabis: pp 13–126).

Dahl, H. V. and V. A. Frank (2011). "Medical marijuana–exploring the concept in relation to small scale cannabis growers in Denmark." World wide weed–Global trends in and its control: 116-141.

De Backer, B., B. Debrus, P. Lebrun, L. Theunis, N. Dubois, L. Decock, A. Verstraete, P. Hubert and C. Charlier (2009). "Innovative development and validation of an HPLC/DAD method for the qualitative and quantitative determination of major cannabinoids in cannabis plant material." J Chromatogr B Analyt Technol Biomed Life Sci 877(32): 4115-4124.

De Meijer, E. and K. Hammond (2016). "The inheritance of chemical phenotype in Cannabis sativa L. (V): regulation of the propyl-/pentyl cannabinoid ratio, completion of a genetic model." Euphytica 210(2): 291-307.

De Meijer, E., K. Hammond and A. Sutton (2009). "The inheritance of chemical phenotype in Cannabis sativa L. (IV): cannabinoid-free plants." Euphytica 168(1): 95-112.

de Meijer, E. and R. Pertwee (2014). "Handbook of Cannabis. Handbooks in Psychopharmacology."

De Meijer, E. P. (2014). "The chemical phenotypes (chemotypes) of Cannabis." Handbook of Cannabis: 89-110.

de Meijer, E. P., M. Bagatta, A. Carboni, P. Crucitti, V. C. Moliterni, P. Ranalli and G. Mandolino (2003). "The inheritance of chemical phenotype in Cannabis sativa L." Genetics 163(1): 335-346.

Degenhardt, F., F. Stehle and O. Kayser (2017). The biosynthesis of cannabinoids. Handbook of Cannabis and related pathologies, Elsevier: 13-23.

Divashuk, M. G., O. S. Alexandrov, O. V. Razumova, I. V. Kirov and G. I. Karlov (2014). "Molecular cytogenetic characterization of the dioecious Cannabis sativa with an XY chromosome sex determination system." PloS one 9(1).

Duke, J. and K. Wain (1981). "Medicinal plants of the world. Computer index with more than 85000 entries." Handbook of Medicinal Herbs (Ed. Duke JA), CRC press, Boca Raton, Florida: 96.

Dussy, F. E., C. Hamberg, M. Luginbühl, T. Schwerzmann and T. A. Briellmann (2005). "Isolation of Δ9-THCA-A from hemp and analytical aspects concerning the determination of Δ9-THC in cannabis products." Forensic science international 149(1): 3-10.

ElSohly, M. A. (2007). Marijuana and the Cannabinoids, Springer Science & Business Media.

Farag, S. and O. Kayser (2017). The cannabis plant: botanical aspects. Handbook of Cannabis and Related Pathologies, Elsevier: 3-12.

Fellermeier, M. and M. H. Zenk (1998). "Prenylation of olivetolate by a hemp transferase yields cannabigerolic acid, the precursor of tetrahydrocannabinol." FEBS Letters 427(2): 283-285.

Fetterman, P. S., E. S. Keith, C. W. Waller, O. Guerrero, N. J. Doorenbos and M. W. Quimby (1971). "Mississippi-grown Cannabis sativa L.: Preliminary observation on chemical definition of phenotype and variations in tetrahydrocannabinol content versus age, sex, and plant part." Journal of Pharmaceutical Sciences 60(8): 1246-1249.

Flemming, T., R. Muntendam, C. Steup and O. Kayser (2007). Chemistry and biological activity of tetrahydrocannabinol and its derivatives. Bioactive Heterocycles IV, Springer: 1-42.

Flores-Sanchez, I. J. and R. Verpoorte (2008). "Secondary metabolism in cannabis." Phytochemistry reviews 7(3): 615-639.

Fournier, G., C. Richez-Dumanois, J. Duvezin, J.-P. Mathieu and M. Paris (1987). "Identification of a new chemotype in Cannabis sativa: cannabigerol-dominant plants, biogenetic and agronomic prospects." Planta Medica 53(03): 277-280.

Gagne, S. J., J. M. Stout, E. Liu, Z. Boubakir, S. M. Clark and J. E. Page (2012). "Identification of olivetolic acid cyclase from Cannabis sativa reveals a unique catalytic route to plant polyketides." Proceedings of the National Academy of Sciences 109(31): 12811-12816.

Gao, C., P. Xin, C. Cheng, Q. Tang, P. Chen, C. Wang, G. Zang and L. Zhao (2014). "Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers." PloS one 9(10).

Gao, S., B. Wang, S. Xie, X. Xu, J. Zhang, L. Pei, Y. Yu, W. Yang and Y. Zhang (2020). "A high-quality reference genome of wild Cannabis sativa." Horticulture Research 7(1): 73.

Gillan, R., M. Cole, A. Linacre, J. Thorpe and N. Watson (1995). "Comparison of Cannabis sativa by random amplification of polymorphic DNA (RAPD) and HPLC of cannabinoids: a preliminary study." Science & justice: journal of the Forensic Science Society 35(3): 169-177.

Godwin, H. (1967). "The ancient cultivation of hemp." Antiquity 41(161): 42-49.

Grant, C. E., T. L. Bailey and W. S. Noble (2011). "FIMO: scanning for occurrences of a given motif." Bioinformatics 27(7): 1017-1018.

Grassa, C. J., J. P. Wenger, C. Dabney, S. G. Poplawski, S. T. Motley, T. P. Michael, C. Schwartz and G. D. Weiblen (2018). "A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content." BioRxiv: 458083.

Happyana, N., S. Agnolet, R. Muntendam, A. Van Dam, B. Schneider and O. Kayser (2013). "Analysis of cannabinoids in laser-microdissected trichomes of medicinal Cannabis sativa using LCMS and cryogenic NMR." Phytochemistry 87: 51-59.

Happyana, N. and O. Kayser (2013). "Monitoring metabolites production and cannabinoids analysis in medicinal Cannabis trichomes during flowering period by 1H NMR-based metabolomics." Planta Medica 79(13): SL44.

Hasan, K. A. (1974). "Social aspects of the use of ." Cannabis and culture: 235-246.

Hernandez-Garcia, C. M. and J. J. Finer (2014). "Identification and validation of promoters and cis-acting regulatory elements." Plant Sci 217-218: 109-119.

Hillig, K. W. (2004). "A chemotaxonomic analysis of terpenoid variation in Cannabis." Biochemical systematics and ecology 32(10): 875-891.

Hillig, K. W. (2005). "Genetic evidence for speciation in Cannabis (Cannabaceae)." Genetic Resources and Crop Evolution 52(2): 161-180.

Hillig, K. W. and P. G. Mahlberg (2004). "A chemotaxonomic analysis of cannabinoid variation in Cannabis (Cannabaceae)." American journal of botany 91(6): 966-975.

Hu, Z.-G., H.-Y. Guo, X.-L. Hu, X. Chen, X.-Y. Liu, M.-B. Guo, Q.-Y. Zhang, Y.-P. Xu, L.-F. Guo and M. Yang (2012). "Genetic diversity research of hemp (Cannabis sativa L) cultivar based on AFLP analysis." Journal of Plant Genetic Resources 13(4): 555-561.

Izzo, A. A., F. Borrelli, R. Capasso, V. Di Marzo and R. Mechoulam (2009). "Non-psychotropic plant cannabinoids: new therapeutic opportunities from an ancient herb." Trends in pharmacological sciences 30(10): 515-527.

Jiang, H.-E., X. Li, Y.-X. Zhao, D. K. Ferguson, F. Hueber, S. Bera, Y.-F. Wang, L.-C. Zhao, C.-J. Liu and C.-S. Li (2006). "A new insight into Cannabis sativa (Cannabaceae) utilization from 2500-year-old Yanghai Tombs, , China." Journal of ethnopharmacology 108(3): 414-422.

Jin, J., F. Tian, D. C. Yang, Y. Q. Meng, L. Kong, J. Luo and G. Gao (2017). "PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants." Nucleic Acids Res 45(D1): D1040-D1045.

Kojoma, M., H. Seki, S. Yoshida and T. Muranaka (2006). "DNA polymorphisms in the tetrahydrocannabinolic acid (THCA) synthase gene in "drug-type" and "fiber-type" Cannabis sativa L." Forensic Sci Int 159(2-3): 132-140.

Korkuc, P., J. H. Schippers and D. Walther (2014). "Characterization and identification of cis-regulatory elements in Arabidopsis based on single-nucleotide polymorphism information." Plant Physiol 164(1): 181-200.

Kriese, U., E. Schumann, W. Weber, M. Beyer and L. Brühl (2004). "Oil content, tocopherol composition and fatty acid patterns of the seeds of 51 Cannabis sativa L. genotypes." Euphytica 137(3): 339-351.

Laverty, K. U., J. M. Stout, M. J. Sullivan, H. Shah, N. Gill, L. Holbrook, G. Deikus, R. Sebra, T. R. Hughes, J. E. Page and H. van Bakel (2019). "A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci." Genome Res 29(1): 146-156.

Li, H.-L. (1974). "The origin and use of Cannabis in eastern Asia linguistic-cultural implications." Economic Botany 28(3): 293-301.

Ling, Y., Z. Du, Z. Zhang and Z. Su (2010). "ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome." BMC Genomics 11: 580.

Long, L. E., D. T. Malone and D. A. Taylor (2005). "The pharmacological actions of cannabidiol." Drugs of the Future 30(7): 747.

Luo, X., M. A. Reiter, L. d’Espaux, J. Wong, C. M. Denby, A. Lechner, Y. Zhang, A. T. Grzybowski, S. Harth, W. Lin, H. Lee, C. Yu, J. Shin, K. Deng, V. T. Benites, G. Wang, E. E. K. Baidoo, Y. Chen, I. Dev, C. J. Petzold and J. D. Keasling (2019). "Complete biosynthesis of cannabinoids and their unnatural analogues in yeast." Nature 567(7746): 123-126.

Mandolino, G. and A. Carboni (2004). "Potential of marker-assisted selection in hemp genetic improvement." Euphytica 140(1-2): 107-120.

McPartland, J. M. (2018). "Cannabis Systematics at the Levels of Family, Genus, and Species." Cannabis and Cannabinoid Research 3(1): 203-212.

Mechoulam, R., N. Lander, S. Dikstein, E. Carlini and M. Blumenthal (1976). On the therapeutic possibilities of some cannabinoids. The Therapeutic potential of marihuana, Springer: 35-45.

Mechoulam, R. and L. A. Parker (2013). "The endocannabinoid system and the brain." Annual review of psychology 64: 21-47.

Miller, H. C., G. Shutler, S. Abrams, J. Hanniman, S. Neylon, C. Ladd, T. Palmbach and H. C. Lee (2003). "A simple DNA extraction method for marijuana samples used in amplified fragment length polymorphism (AFLP) analysis." Journal of forensic sciences 48(2): 343-347.

Morimoto, S., K. Komatsu, F. Taura and Y. Shoyama (1998). "Purification and characterization of cannabichromenic acid synthase from Cannabis sativa." Phytochemistry 49(6): 1525-1529.

Onofri, C., E. P. M. de Meijer and G. Mandolino (2015). "Sequence heterogeneity of cannabidiolic- and tetrahydrocannabinolic acid-synthase in Cannabis sativa L. and its relationship with chemical phenotype." Phytochemistry 116: 57-68.

Pacifico, D., F. Miselli, A. Carboni, A. Moschella and G. Mandolino (2008). "Time course of cannabinoid accumulation and chemotype development during the growth of Cannabis sativa L." Euphytica 160(2): 231-240.

Pacifico, D., F. Miselli, M. Micheler, A. Carboni, P. Ranalli and G. Mandolino (2006). "Genetics and Marker-assisted Selection of the Chemotype in Cannabis sativa L." Molecular Breeding 17(3): 257-268.

Pollastro, F., O. Taglialatela-Scafati, M. Allara, E. Munoz, V. Di Marzo, L. De Petrocellis and G. Appendino (2011). "Bioactive prenylogous cannabinoid from fiber hemp (Cannabis sativa)." Journal of natural products 74(9): 2019-2022.

Quimby, M. W., N. J. Doorenbos, C. E. Turner and A. Masoud (1973). "Mississippi-Grown Marihuana: Cannabis sativa Cultivation and Observed Morphological Variations." Economic botany: 117-127.

Radwan, M. M., M. A. ElSohly, D. Slade, S. A. Ahmed, I. A. Khan and S. A. Ross (2009). "Biologically active cannabinoids from high-potency Cannabis sativa." Journal of natural products 72(5): 906-911.

Radwan, M. M., S. A. Ross, D. Slade, S. A. Ahmed, F. Zulfiqar and M. A. ElSohly (2008). "Isolation and characterization of new cannabis constituents from a high potency variety." Planta medica 74(03): 267-272.

Rotherham, D. and S. Harbison (2011). "Differentiation of drug and non-drug Cannabis using a single nucleotide polymorphism (SNP) assay." Forensic science international 207(1-3): 193-197.

Rubin, V. (2011). Cannabis and culture, Walter de Gruyter.

Sakamoto, K., Y. Akiyama, K. Fukui, H. Kamada and S. Satoh (1998). "Characterization; genome sizes and morphology of sex chromosomes in hemp (Cannabis sativa L.)." Cytologia 63(4): 459-464.

Sawler, J., J. M. Stout, K. M. Gardner, D. Hudson, J. Vidmar, L. Butler, J. E. Page and S. Myles (2015). "The genetic structure of marijuana and hemp." PloS one 10(8).

Schultes, R. E. (1979). The Species Problem in Cannabis: Science and Semantics, by Ernest Small, published by Corpus, Toronto, Canada, 2 volumes; soft cover, price $28. (Vol. 1, soft cover, $10.95; hard cover, $16.95. Vol. 2, soft cover, $9.95; hard cover, $14.95.), Elsevier.

Schultes, R. E., W. M. Klein, T. Plowman and T. E. Lockwood (1974). "Cannabis: an example of taxonomic neglect." Botanical Museum Leaflets, Harvard University 23(9): 337-367.

She, R., J. S. Chu, K. Wang, J. Pei and N. Chen (2009). "GenBlastA: enabling BLAST to identify homologous gene sequences." Genome Res 19(1): 143-149.

Shirley, N., L. Allgeier, T. LaNier and H. M. Coyle (2013). "Analysis of the NMI01 Marker for a Population Database of Cannabis Seeds." Journal of Forensic Sciences 58(s1): S176-S182.

Shoyama, Y., T. Tamada, K. Kurihara, A. Takeuchi, F. Taura, S. Arai, M. Blaber, Y. Shoyama, S. Morimoto and R. Kuroki (2012). "Structure and function of Δ1-tetrahydrocannabinolic acid (THCA) synthase, the enzyme controlling the psychoactivity of Cannabis sativa." Journal of molecular biology 423(1): 96-105.

Sirikantaramas, S., S. Morimoto, Y. Shoyama, Y. Ishikawa, Y. Wada, Y. Shoyama and F. Taura (2004). "The gene controlling marijuana psychoactivity: molecular cloning and heterologous expression of Δ1-tetrahydrocannabinolic acid synthase from Cannabis sativa L." Journal of Biological Chemistry 279(38): 39767-39774.

Sirikantaramas, S., F. Taura, S. Morimoto and Y. Shoyama (2007). "Recent advances in Cannabis sativa research: biosynthetic studies and its potential in biotechnology." Current pharmaceutical biotechnology 8(4): 237-243.

Sirikantaramas, S., F. Taura, Y. Tanaka, Y. Ishikawa, S. Morimoto and Y. Shoyama (2005). "Tetrahydrocannabinolic acid synthase, the enzyme controlling marijuana psychoactivity, is secreted into the storage cavity of the glandular trichomes." Plant and Cell Physiology 46(9): 1578-1582.

Small, E. (2015). "Evolution and Classification of Cannabis sativa (Marijuana, Hemp) in Relation to Human Utilization." The Botanical Review 81(3): 189-294.

Small, E. and H. Beckstead (1973). "Common cannabinoid phenotypes in 350 stocks of Cannabis." Lloydia.

Small, E. and A. Cronquist (1976). "A practical and natural taxonomy for Cannabis." Taxon: 405-435.

Soorni, A., R. Fatahi, D. C. Haak, S. A. Salami and A. Bombarely (2017). "Assessment of Genetic Diversity and Population Structure in Iranian Cannabis Germplasm." Scientific Reports 7(1): 15668.

Staginnus, C., S. Zörntlein and E. de Meijer (2014). "A PCR marker linked to a THCA synthase polymorphism is a reliable tool to discriminate potentially THC-rich plants of Cannabis sativa L." J Forensic Sci 59(4): 919-926.

Stout, J. M., Z. Boubakir, S. J. Ambrose, R. W. Purves and J. E. Page (2012). "The hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an acyl-activating enzyme in Cannabis sativa trichomes." The Plant Journal 71(3): 353-365.

Swift, W., A. Wong, K. M. Li, J. C. Arnold and I. S. McGregor (2013). "Analysis of cannabis seizures in NSW, Australia: cannabis potency and cannabinoid profile." PloS one 8(7).

Taura, F., E. Dono, S. Sirikantaramas, K. Yoshimura, Y. Shoyama and S. Morimoto (2007). "Production of Δ1-tetrahydrocannabinolic acid by the biosynthetic enzyme secreted from transgenic Pichia pastoris." Biochemical and biophysical research communications 361(3): 675-680.

Taura, F., S. Morimoto and Y. Shoyama (1996). "Purification and characterization of cannabidiolic-acid synthase from Cannabis sativa L. Biochemical analysis of a novel enzyme that catalyzes the oxidocyclization of cannabigerolic acid to cannabidiolic acid." Journal of Biological Chemistry 271(29): 17411-17416.

Taura, F., S. Sirikantaramas, Y. Shoyama, K. Yoshikai, Y. Shoyama and S. Morimoto (2007). "Cannabidiolic-acid synthase, the chemotype-determining enzyme in the fiber-type Cannabis sativa." FEBS letters 581(16): 2929-2934.

Touw, M. (1981). "The religious and medicinal uses of , India and Tibet." Journal of psychoactive drugs 13(1): 23-34.

Turner, C. E. and M. A. Elsohly (1979). "Constituents of cannabis sativa L. XVI. A possible decomposition pathway of Δ9-tetrahydrocannabinol to ." Journal of heterocyclic chemistry 16(8): 1667-1668.

van Bakel, H., J. M. Stout, A. G. Cote, C. M. Tallon, A. G. Sharpe, T. R. Hughes and J. E. Page (2011). "The draft genome and transcriptome of Cannabis sativa." Genome Biology 12(10): R102.

Vavilov, N. I. and F. Freier (1951). "Studies on the origin of cultivated plants." Studies on the origin of cultivated plants.

Vergara, D., E. L. Huscher, K. G. Keepers, R. M. Givens, C. G. Cizek, A. Torres, R. Gaudino and N. C. Kane (2019). "Gene copy number is associated with phytochemistry in Cannabis sativa." AoB PLANTS 11(6).

Vyskot, B. and R. Hobza (2015). "The genomics of plant sex chromosomes." Plant Science 236: 126-135.

Wang, H. and Y. Wei (2012). "Survey on the germplasm resources of Cannabis sativa L." Medicinal Plant 3(7): 11-14.

Weiblen, G. D., J. P. Wenger, K. J. Craft, M. A. ElSohly, Z. Mehmedic, E. L. Treiber and M. D. Marks (2015). "Gene duplication and divergence affecting drug content in Cannabis sativa." New Phytologist 208(4): 1241-1250.

Welling, M. T., L. Liu, T. Shapter, C. A. Raymond and G. J. King (2016). "Characterisation of cannabinoid composition in a diverse Cannabis sativa L. germplasm collection." Euphytica 208(3): 463-475.

Zirpel, B., O. Kayser and F. Stehle (2018). "Elucidation of structure-function relationship of THCA and CBDA synthase from Cannabis sativa L." Journal of biotechnology 284: 17-26.

Zuardi, A. W. (2006). "History of cannabis as a medicine: a review." Brazilian Journal of Psychiatry 28(2): 153-157.

LIST OF TABLES

Table 1. Genotyping of examined Cannabis sativa L. cultivars

Cultivar                Chemotype (HPLC)   B1080/B1192 marker phenotype   D589 marker phenotype   Complete CBDAS ORF
BC Kush                 I                  BT/BD                          BT present              -
Black Jack              I                  BT/BD                          BT present              -
Bon Homme               I                  BT/BD                          BT present              +
Brasil KC               I                  BT/BD                          BT present              -
Canadian Cheese         I                  BT/BD                          BT present              +
Candy                   I                  BT/BD                          BT present              -
CBD Chemdog             I                  BT/BD                          BT present              +
CBD God Bud             I                  BD                             BT present              -
CBD Haze                I                  BT/BD                          BT present              -
CBD Rene                I                  BT/BD                          BT present              -
Chemdaws                I                  BT/BD                          BT present              +
Cherry                  I                  BD                             BT present              -
Crystal Limit           I                  BT/BD                          BT present              -
Doctor G                I                  BT/BD                          BT present              -
Girl Scout              I                  BT/BD                          BT present              +
Haze                    I                  BD                             BT present              -
Malawi Gold             I                  BT/BD                          BT present              +
NF                      I                  BT/BD                          BT present              -
Pink Rush               I                  BT/BD                          BT present              +
Pink Rush x Head Band   I                  BT/BD                          BT present              +
PP2 x Maui              I                  BD                             BT present              -
RKA                     I                  BT/BD                          BT present              +
RKE                     I                  BT/BD                          BT present              +
Skywalker               I                  BT/BD                          BT present              -
Trainwreck              I                  BT/BD                          BT present              -
White Grapefruit        I                  BT/BD                          BT present              +
Jungle Wreck            II                 BT/BD                          BT present              -
RIO                     II                 BT/BD                          BT present              +
Zambiah                 II                 BT/BD                          BT present              -
CFX2                    III                BD                             BT absent               +
Finola                  III                BD                             BT absent               +


LIST OF FIGURES

Figure 1.

Figure 2.

Figure 3. Panels A, B and C.

