bioRxiv preprint doi: https://doi.org/10.1101/332320; this version posted May 28, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Genetic tools weed out misconceptions of strain reliability in Cannabis sativa: Implications for a

2 budding industry.

3

4

5 Anna L. Schwabe1*¶ and Mitchell E. McGlaughlin1*¶

6

7 1School of Biological Sciences, University of Northern Colorado, Greeley, Colorado, United

8 States of America

9 *Corresponding Authors

10

11 Email

12 Anna Schwabe: [email protected] (970) 217-3300

13 Mitchell McGlaughlin: [email protected] (970) 351-2139

14 ¶These authors contributed equally to this work

15

16 Date of Submission: May 27, 2018

17 Number of tables: 3

18 Number of Figs: 4 (total), 2 (color in print), 2 (color online only)

19 Supplementary: 3 Figs, 2 tables

20 Word count: 6239

21

22 Highlight: Genetic analyses provide evidence of genetic variation within clonal and stable seed

23 strains of commercially available Cannabis, indicating the potential for inconsistent

24 products for medical patients and recreational users.

25 Abstract

26 Cannabis sativa is listed as a Schedule I substance by the United States Drug Enforcement

27 Administration and has been federally illegal in the United States since 1937. However, the majority of

28 states in the United States, as well as several countries, now have various levels of legal

29 Cannabis. Products are labeled with identifying strain names, but there is no official mechanism

30 to register Cannabis strains; the potential therefore exists for incorrect identification or labeling.

31 This study uses genetic analyses to investigate strain reliability from the consumer point of view.

32 Ten microsatellite regions were used to examine samples from strains obtained from dispensaries

33 in three states. Samples were examined for genetic similarity within strains, and also for a possible

34 genetic distinction among Sativa, Indica, and Hybrid types. The analyses revealed genetic

35 inconsistencies within strains. Additionally, although there was strong statistical support dividing

36 the samples into two genetic groups, the groups did not correspond to commonly reported

37 Sativa/Hybrid/Indica types. Genetic differences have the potential to lead to phenotypic

38 differences and unexpected effects, which could be surprising for the recreational user, but have

39 more serious implications for patients relying on strains that alleviate specific medical

40 symptoms.

41

42

43

44 Keywords: – Cannabis sativa – consumer – genotype – –

45 medical – microsatellite – phenotype – strain

46

47 List of abbreviations

48 US: United States; HIV: human immunodeficiency virus; AIDS: acquired immune deficiency

49 syndrome; PTSD: post-traumatic stress disorder; THC: Δ⁹-tetrahydrocannabinol; USDA: United

50 States Department of Agriculture; PVPA: The Plant Variety Protection Act; PVPO: Plant Variety

51 Protection Office; SLO: San Luis Obispo; DNA: deoxyribonucleic acid; CTAB: cetyl

52 trimethylammonium bromide; PCR: polymerase chain reaction; HWE: Hardy–Weinberg

53 equilibrium; PCoA: Principal Coordinates Analysis; SD: standard deviation; IA: identical alleles

54

55 Introduction

56 Cannabis sativa L. is one of the most useful plants (Clarke & Merlin, 2013) with

57 evidence of human cultivation dating back thousands of years (Abel, 2013). Cannabis

58 prohibition in the United States began with the Marihuana Tax Act in 1937 (The Marihuana Tax

59 Act of 1937), and the Controlled Substances Act of 1970 classified Cannabis as a Schedule I

60 drug with no “accepted medical use in treatment in the United States” (Controlled Substances

61 Act, 1970). Cannabis is largely illegal worldwide, but laws allowing Cannabis for use as hemp,

62 medicine, and some adult recreational use are emerging (ProCon, 2016a). Cannabis is a multi-

63 billion dollar crop, but global restrictions have limited Cannabis related research. The origins

64 and genetic identities of many Cannabis strains are largely unknown, as there are relatively few

65 genetic studies focused on strains (Lynch et al., 2016).

66 The World Drug Report estimates that ~4.5% of the global population consumes Cannabis

67 regularly (United Nations Office on Drugs and Crime, 2010), and there are an estimated 3.5 million

68 medical marijuana patients in the US (, 2017). Recent legalization has

69 led to a surge of new strains as breeders produce new plant varieties with novel chemical

70 profiles that yield various psychotropic effects and relief for an array of symptoms associated with

71 medical conditions including (but not limited to): chronic pain, depression, anxiety, PTSD,

72 autism, fibromyalgia, epilepsy, Crohn’s disease, and glaucoma (Ogborne et al., 2000; Tomida et

73 al., 2004; Borgelt et al., 2013; Naftali et al., 2013; ProCon, 2016b).

74 Research using a variety of techniques consistently finds drug-types and hemp are

75 genetically distinct (de Meijer et al., 1996; Small, 1997; Sawler et al., 2015; Lynch et al., 2016;

76 Dufresnes et al., 2017). Variation within the drug-types is higher than within hemp (Small, 1997;

77 Sawler et al., 2015; Lynch et al., 2016; Vergara et al., 2016). There is limited genetic research on

78 variation within strains, but in studies with multiple accessions of a particular strain, variation is

79 observed (Sawler et al., 2015; Lynch et al., 2016; Soler et al., 2017).

80 There are generally two Cannabis usage groups (hemp and drug-types) although the

81 scientific and common nomenclature is conflicted. The current Flora of North America

82 recognizes all forms of Cannabis as Cannabis sativa L. (Small, 1997), but many breeders and

83 botanists support the polytypic taxonomy of Cannabis based on morphological (de Lamarck &

84 Poiret, 1789; Schultes, 1970; Emboden, 1974; Anderson, 1980), chemical (de Meijer et al., 2003;

85 Hillig & Mahlberg, 2004; Hillig, 2005; Hazekamp & Fischedick, 2012) and psychotropic (de

86 Meijer et al., 2003; Hillig & Mahlberg, 2004; Hazekamp & Fischedick, 2012; Clarke & Merlin,

87 2013) differences. However, the suggested putative species are presumed to readily interbreed

88 and therefore violate species concepts that are applicable to plants (De Queiroz, 2007). The

89 common terminology for Cannabis products holds that (1) hemp types have < 0.3% Δ9-

90 tetrahydrocannabinol (THC), (2) plants of broad and narrow leaf drug-types as well as hybrid

91 variants with moderate to high THC concentrations are referred to as marijuana, (3) drug-type

92 strains of Cannabis are commonly divided into three categories: Sativa, Indica and Hybrid type

93 strains, (4) drug-type strains with low THC and high cannabidiol (CBD) are sought after for

94 medicinal use, and (5) there are thousands of variants of Cannabis referred to as strains. Genetic

95 analyses have not provided a clear consensus for higher taxonomic distinction among these

96 commonly described Cannabis types (Sawler et al., 2015; Lynch et al., 2016), but both the

97 recreational and medical communities claim there are distinct differences in effects

98 between Sativa and Indica type strains (Smith, 2012; Leaf Science, 2014). Sativa type strains are

99 associated with tall, loosely branched plants with long, narrow leaflets, and are reported to have

100 energizing or uplifting psychotropic effects (Russo, 2007; Fischedick et al., 2010; Hillig, 2004).

101 Indica type strains are associated with shorter plants with dense branching and broad leaflets, and

102 reportedly exhibit sedating effects and pain relieving properties (Russo, 2007; Fischedick et al.,

103 2010; Hillig, 2004). Hybrid types are a mix of varying degrees of the reported effects of Sativa

104 and Indica types.

105 Morphological variation is typically used to categorize species, sub-species, and varieties.

106 However, morphological identification can be difficult with closely related taxa and hybrid

107 organisms (Rieseberg, 1995; Rieseberg, 1997; Cattell & Karl, 2004; Mallet, 2005; Zha et al.,

108 2008; Schwabe et al., 2015). Sexual reproduction generally results in offspring with a blend of

109 traits from both parents. On the other hand, clonal offspring or progeny produced from self-

110 fertilization should be virtually identical to the parent. Unique physical differences (phenotypes)

111 and varying chemical profiles (chemotypes) may result when plants with the same genetic profile

112 (genotype) are impacted by environmental factors (phenotypic plasticity) (Schlichting, 1986;

113 Elzinga et al., 2015). Phenotypic plasticity is commonly observed in Cannabis, and therefore

114 chemical profiles and other physical characteristics are not ideal for precisely identifying

115 Cannabis variants (Schultes, 1970; Clarke & Merlin, 2013; Small, 2017).

116 Female flowers of predominantly dioecious Cannabis plants produce the majority of

117 cannabinoids and terpenes in glandular trichomes. Female plants are selected based on desirable

118 characters (mother plants) and are reproduced through cloning and, in some cases, self-

119 fertilization to produce seeds (Green, 2005). The offspring will be identical (from clone), or

120 nearly identical (from seed), to the mother plant. Cross-pollination allows for genetic variability

121 and novel strain creation, but generally Cannabis growers use cloning to produce consistent

122 products of established and popular strains. Whether propagated through cloning or from

123 germination of self-fertilized seed, genetic variation within strains should be minimal no matter

124 the source of origin.

125 There are an overwhelming number of Cannabis strains that vary widely in appearance,

126 taste, smell and psychotropic effects (de Lamarck & Poiret, 1789; Schultes, 1970; Emboden,

127 1974; Anderson, 1980; de Meijer et al., 2003; Hillig & Mahlberg, 2004; Hillig, 2005; Hazekamp

128 & Fischedick, 2012; Clarke & Merlin, 2013). Strains are generally categorized as Indica, Sativa

129 or Hybrid types. Online databases such as Leafly (Leafly, 2018) and Wikileaf (Wikileaf, 2018)

130 provide consumers with information about strains but lack the scientific basis that the Cannabis

131 industry would need to regulate the consistency of strains. To our knowledge, there have not been any

132 published scientific studies specifically investigating the genetic consistency of strains at

133 multiple points of sale for Cannabis consumers.

134 Of particular interest is how the genetic integrity of named Cannabis strains has been maintained

135 over time in the absence of regulation (Green, 2014; Stockton, 2015). Other crop varieties

136 are protected by certification through the United States Department of Agriculture (USDA) and

137 The Plant Variety Protection Act of 1970 (PVPA), or similar mechanisms in other countries.

138 This system protects against commercial exploitation, allows for trademarking, and recognizes

139 intellectual property for developers of new plant cultivars (United States Department of

140 Agriculture, 1989). Traditionally, morphological characters were used to define new varieties in

141 crops such as grapes (Vitis vinifera L.), olives (Olea europaea L.) and apples (Malus domestica

142 Borkh.). With the rapid development of new varieties in these types of crops, morphological

143 characters have become increasingly difficult to distinguish. Currently, quantitative and/or

144 molecular characters are often used to demonstrate uniqueness among varieties to obtain a plant

145 variety protection certificate from the Plant Variety Protection Office (PVPO) of the Agricultural

146 Marketing Service, USDA (United States Department of Agriculture, 2015). Microsatellite

147 genotyping enables growers and breeders of new cultivars to demonstrate uniqueness through

148 variable genetic profiles (Rongwen et al., 1995). Microsatellite genotyping has been used to

149 distinguish cultivars and hybrid varieties of multiple crop varietals within species (Guilford et

150 al., 1997; Hokanson et al., 1998; Cipriani et al., 2002; Belaj et al., 2004; Sarri et al., 2006;

151 Baldoni et al., 2009; Štajner et al., 2011; Costantini et al., 2015; Pellerone et al., 2015).

152 Multiple crop studies have found that 3-12 microsatellite loci are sufficient to accurately identify

153 varietals and detect misidentified individuals (Cipriani et al., 2002; Belaj et al., 2004; Sarri et al.,

154 2006; Poljuha et al., 2008; Baldoni et al., 2009; Muzzalupo et al., 2009). Cannabis varieties,

155 however, are not afforded any legal protections, as the USDA considers Cannabis an “ineligible

156 commodity” (United States Department of Agriculture, 2016), but this system provides a model

157 by which Cannabis strains could also be developed, identified, registered, and protected.

158 Currently, the Cannabis industry has no way to verify strains. Consequently, suppliers

159 are unable to provide confirmation of strains. Reports of inconsistencies, along with the history

160 of underground trading and growing in the absence of a verification system, reinforce the

161 likelihood that strain names may be unreliable identifiers for Cannabis products at the present

162 time. Without verification systems in place, there is the potential for misidentification and

163 mislabeling of plants, creating names for plants of unknown origin, and even re-naming or re-

164 labeling plants with prominent names for better sale. Cannabis taxonomy is complex, but given

165 the success of microsatellites in determining varieties in other crops, we suggest using genetic

166 approaches to provide identification information for strains in the medical and recreational

167 marketplace.

168 Variable microsatellite markers were developed using the Cannabis sativa ‘Purple Kush’

169 draft genome (National Center for Biotechnology Information, accession AGQN00000000.1).

170 These regions were compared within commercially available C. sativa strains to determine if

171 products with the same name purchased from different sources have the genetic congruence we

172 expect from propagation of clones or self-fertilized seeds. The unique approach for this study

173 was that of the common retail consumer. Flower samples were purchased legally from

174 dispensaries based on what was available at the time of purchase. All products were purchased

175 as-is, with no additional information provided by the facility, other than the identifying label

176 (strain name). This study aimed to determine if: (1) any genetic distinction separates the common

177 perception of Sativa, Indica and Hybrid types; (2) purported proportions for Sativa, Indica and

178 Hybrid type strains are reflected in the genotypes of multiple strains; (3) consistent genetic

179 identity is found within a variety of different strain accessions obtained from different facilities; and

180 (4) there is evidence of misidentification or mislabeling.

181

182 Materials and Methods

183 Genetic Material

184 Cannabis samples for 30 strains were acquired from 20 dispensaries or donors in three

185 states: Colorado - Denver (4), Boulder (3), Fort Collins (3), Garden City (4), Greeley (1),

186 Longmont (1); California - San Luis Obispo (4); and Washington - Union Gap (1) (Table 1). All

187 samples used in this study were obtained legally from retail (Colorado and Washington) or

188 medical (California) dispensaries, or as a donation from legally obtained samples (Greeley 1).

189 DNA was extracted using a modified CTAB extraction protocol (Doyle, 1987) with 0.035-0.100

190 grams of dried flower tissue per extraction. Proportions of Sativa and Indica phenotypes for each

191 strain were retrieved from Wikileaf (Wikileaf, 2018). Analyses were performed on the full 122-

192 sample dataset (Table 1). A subset of twelve strains in high demand was used throughout the

193 study to emphasize various genetic anomalies and patterns (Table 2). The twelve strains were

194 chosen based on popularity (Leafly, 2018; Wikileaf, 2018) and availability.

195

196 Microsatellite Development

197 The Cannabis draft genome from ‘Purple Kush’ (GenBank accession AGQN00000000.1)

198 was scanned for microsatellite repeat regions using MSATCOMMANDER-1.0.8-beta (Faircloth,

199 2008). Primers were developed de novo flanking thirty microsatellites with 3-6 nucleotide repeat

200 units (Table S1). One primer in each pair was tagged with a 5’ universal sequence (M13, CAGT

201 or T7) so that a matching sequence with a fluorochrome tag could be incorporated via PCR

202 (Schwabe et al., 2013). Ten of the thirty primer pairs produced consistent peaks within the

203 predicted size range and were used for the genetic analyses herein.
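
The repeat scan described above can be illustrated with a minimal sketch. This is not MSATCOMMANDER itself; it only assumes perfect tandem repeats of 3-6 nucleotide motifs, and the minimum repeat count (`min_repeats=4`) and the toy sequence are hypothetical choices, since the manuscript does not state the search thresholds.

```python
import re

def find_microsatellites(seq, min_unit=3, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats is an assumed threshold, not a value taken from the study.
    """
    seq = seq.upper()
    # A motif of min_unit..max_unit bases (shortest tried first) followed by
    # at least min_repeats - 1 further copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip motifs that are themselves built from a shorter repeated unit
        # (e.g. AAA or ATAT), which are not true 3-6 nucleotide repeats.
        if (motif + motif).find(motif, 1) < len(motif):
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Toy sequence (hypothetical): five ATG repeats flanked by unique sequence.
toy = "TTGACC" + "ATG" * 5 + "CCGTA"
print(find_microsatellites(toy))
```

In practice, primers would then be designed in the unique sequence flanking each reported start position; that step is outside this sketch.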

204 205 PCR and Data Scoring

206 Microsatellite loci were amplified in 12 µL reactions using 1.0 µL DNA (10-20 ng/µL),

207 0.6 µL fluorescent tag (5 µM; FAM, VIC, or PET), 0.6 µL non-tagged primer (5 µM), 0.6 µL

208 tagged primer (0.5 µM), 0.7 µL dNTP mix (2.5 mM), 2.4 µL GoTaq Flexi Buffer (Promega,

209 Madison, WI, USA), 0.06 µL GoTaq Flexi DNA polymerase (Promega), 0.06 µL BSA (Bovine Serum

210 Albumin 100X), 0.5-6.0 µL MgCl2 or MgSO4, and 0.48-4.98 µL dH2O. Amplified products were

211 combined into multiplexes and diluted with water. Hi-Di formamide and LIZ 500 size standard

212 (Applied Biosystems, Foster City, CA, USA) were added before electrophoresis on a 3730

213 Genetic Analyzer (Applied Biosystems) at Arizona State University. Fragments were sized using

214 GENEIOUS 8.1.8 (Biomatters Ltd).

215

216 Genetic Statistical Analyses

217 GENALEX ver. 6.4.1 (Peakall & Smouse, 2006; Peakall & Smouse, 2012) was used to

218 calculate deviation from Hardy–Weinberg equilibrium (HWE). Linkage disequilibrium was

219 tested using GENEPOP ver. 4.0.10 (Raymond & Rousset, 1995; Rousset, 2008). The possibility

220 of null alleles was assessed using MICRO-CHECKER (Van Oosterhout et al., 2004). Genotypes

221 were analyzed using the Bayesian cluster analysis program STRUCTURE ver. 2.4.2 (Pritchard et

222 al., 2000). Burn-in and run-lengths of 50,000 generations were used with ten independent

223 replicates for each STRUCTURE analysis. STRUCTURE HARVESTER (Earl, 2012), which

224 implements the Evanno method (Evanno et al., 2005), was used to determine the K value that

225 best describes the number of genetic groups for the data set. GENALEX was used to conduct a

226 Principal Coordinate Analysis (PCoA) to examine variation in the dataset. Lynch & Ritland

227 (1999) pairwise genetic relatedness (r) values were reported for each sample

228 within a strain using GENALEX. Mean pairwise relatedness (r) statistics were calculated

229 between all 122 samples resulting in 7381 pairwise r-values showing degrees of relatedness. A

230 genetic pairwise relatedness heat map of the data set was generated in Microsoft EXCEL. For all

231 strains, the r-mean and standard deviation (SD) were calculated by averaging among all samples.

232 Obvious outliers were determined by identifying the samples with the lowest r-mean and iteratively removing

233 those samples to determine the relatedness among the remaining samples in the subset. A graph

234 was generated for the twelve popular strains to show how the r-mean values change within a

235 strain when outliers were removed.
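
The Evanno criterion used to choose K can be sketched as follows. This is one common formulation, assumed here for illustration: the absolute second-order difference of the mean ln P(D) across replicates, divided by the standard deviation of ln P(D) at that K. The replicate values below are invented for the example and are not results from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Map each interior K to delta K.

    lnp_by_k: dict of K -> list of replicate ln P(D) values.
    delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K needs both neighbouring K values
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when all replicates are identical
        delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return delta

# Invented replicate likelihoods, three runs per K (the study used ten).
runs = {
    1: [-4000, -4002, -3998],
    2: [-3500, -3501, -3499],
    3: [-3450, -3470, -3430],
    4: [-3420, -3460, -3380],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

The K with the largest delta K is taken as the best-supported number of genetic groups; delta K cannot be computed for the smallest or largest K tested.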

236

237 Results

238 The microsatellite analyses show genetic inconsistencies in Cannabis strains acquired

239 from different facilities. The samples used in this study are drug-type strains and are categorized

240 as Sativa, Indica and Hybrid type according to Wikileaf (Wikileaf, 2018). While some popular

241 strains were widely available, some strains were found only at two dispensaries (Tables 1 & 2).

242 Since the aim of the research was not to identify specific locations where strain inconsistencies

243 were found, the names for each dispensary are coded to protect the identity of businesses.

244 There was no evidence of linkage-disequilibrium when all the samples were treated as a

245 single population. All loci deviate significantly from HWE when all samples were treated as a

246 single population, and all but one locus was monomorphic in at least two strains. All but one

247 locus had excess homozygosity and therefore possibly null alleles. Given the inbred nature and

248 extensive hybridization of Cannabis, deviations from neutral expectations are not surprising, and

249 the lack of linkage-disequilibrium indicates that the markers are spanning multiple regions of the

250 genome. There was no evidence of null alleles due to scoring errors.
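
What "excess homozygosity" means at a single locus can be shown with a small sketch comparing observed heterozygosity with the value expected under HWE. This is not the GENALEX procedure and involves no significance test; the genotypes and allele sizes are invented for illustration.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid sample.
    Expected heterozygosity under HWE is 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Invented data: two alleles at equal frequency, but few heterozygotes.
locus = [(150, 150)] * 4 + [(156, 156)] * 4 + [(150, 156)] * 2
ho, he = heterozygosity(locus)
print(ho, he, "excess homozygosity" if ho < he else "no excess homozygosity")
```

When observed heterozygosity falls well below the expected value, as in this toy locus, inbreeding, clonal propagation, or null alleles are among the possible explanations.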

251 STRUCTURE HARVESTER calculated high support (∆K=146.56) for two genetic

252 groups, K=2 (Fig. 1). STRUCTURE assignment for all samples is shown in Fig. 2 with the

253 strains ordered by the purported proportions of Sativa phenotype (Wikileaf, 2018) and then

254 alphabetically within each strain by city. A clear genetic distinction between Sativa and Indica

255 types would assign 100% Sativa strains (‘Durban Poison’) to one genotype, and assign 100%

256 Indica strains (‘Purple Kush’) to the other genotype (Table 2, Fig. 2). Division of the genotypes

257 into two genetic groups does not support the commonly described Sativa and Indica phenotypes.

258 For the assigned 100% Sativa type strain ‘Durban Poison’, seven of nine samples show greater

259 than 96% assignment to genotype 1 (blue; Fig. 2). For the assigned 100% Indica type ‘Purple

260 Kush’, three of four samples show greater than 89% assignment to genotype 2 (yellow; Fig. 2).

261 However, samples of ‘Hawaiian’ (90% Sativa) and ‘Grape Ape’ (100% Indica) do not show

262 consistent patterns of predominant assignment to genotype 1 or 2. Interestingly, ‘Durban Poison’

263 (100% Sativa, n = 9) and ‘’ (90% Sativa, n = 7) have 86% and 14% average

264 assignment to genotype 1, respectively. Hybrid strains should result in some proportion of shared

265 ancestry, with assignment to both genotype 1 and 2. The strains ‘Blue Dream’ and ‘Tahoe OG’

266 are reported as 50-50% Sativa-Indica Hybrid strains, but eight of nine samples of ‘Blue Dream’

267 show > 80% assignment to genotype 1, and three of four samples of ‘Tahoe OG’ show < 7%

268 assignment to genotype 1.

269 Principal Coordinate Analyses (PCoA) were conducted using GENALEX for (1) all

270 samples (Fig. 2) and (2) twelve popular strains (Fig. S2). The samples in the PCoA of all 30

271 strains are organized from 100% Sativa types (red), through all levels of Hybrid types, to 100%

272 Indica types (purple; Fig. 4). Strain types with the same reported proportions are the same color

273 but have different symbols. The PCoA of all strains represents 14.90% of the variation in the

274 data on coordinate axis 1, 9.56% on axis 2, and 7.07% on axis 3 (not shown). The second PCoA

275 of twelve popular strains specifically examines the genetic relationship within strains that are in

276 high demand (Fig. S2). The results from this analysis found that 15.30% of the variation in the

277 data is explained by coordinate axis 1, 12.98% on axis 2, and 7.96% on axis 3 (not shown).

278 Lynch & Ritland (1999) pairwise genetic relatedness (r) between all

279 122 samples was calculated in GENALEX. The resulting 7381 pairwise r-values were converted

280 to a heat map using purple to indicate the lowest pairwise relatedness value (-1.09) and green to

281 indicate the highest pairwise relatedness value (1.00; Fig. S3). Comparisons are detailed for six

282 popular strains (Fig. 3) to illustrate the relationship of samples from different sources and the

283 impact of outliers. Values of close to 1.00 indicate a high degree of relatedness (Lynch &

284 Ritland, 1999), which could be indicative of clones or seeds from the same mother (Green, 2005;

285 SeedFinder, 2017). First order relatives (full siblings or mother-daughter) share 50% genetic

286 identity (r-value = 0.50), second order relatives (half siblings or cousins) share 25% genetic

287 identity (r-value = 0.25), and unrelated individuals are expected to have an r-value of 0.00 or

288 lower. Negative values arise when individuals are less related than expected under normal

289 panmictic conditions (Moura et al., 2013; Norman et al., 2017). Values ranged from -1.09

290 (between ‘Purple Haze’ Greeley 1 and ‘Girl Scout Cookies’ Union Gap 1) indicating low levels

291 of relatedness, to 1.00 (e.g., between ‘Durban Poison’ samples from Boulder 3 and Fort Collins

292 3).
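
The number of pairwise comparisons follows from the sample count as n(n - 1)/2. A short check, using placeholder sample labels rather than the study's sample names:

```python
from itertools import combinations

def n_pairwise(n_samples):
    """Number of unordered sample pairs: n * (n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2

# Placeholder labels standing in for the 122 genotyped samples.
samples = [f"sample_{i}" for i in range(122)]
pairs = list(combinations(samples, 2))

print(n_pairwise(122), len(pairs))  # 7381 7381, matching the Methods
print(n_pairwise(9))                # 36 comparisons among nine samples of one strain
```

The same formula gives the size of each within-strain comparison set, for example 36 comparisons for a strain represented by nine samples.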

293 Individual pairwise r-values were averaged within strains to calculate the overall r-mean

294 as a measure of genetic similarity within strains. The overall r-means within strains ranged from

295 -0.22 (‘Tangerine’) to 0.68 (‘Island Sweet Skunk’) (Table 3). Standard deviations ranged from

296 0.04 (‘) to 0.51 (‘Bruce Banner’). The strains with higher standard deviation values

297 indicate a wide range of genetic relatedness within a strain, while low values indicate that

298 samples within a strain share similar levels of genetic relatedness. In order to determine how

299 outliers impact the overall relatedness in a strain, the farthest outlier (lowest pairwise r-mean

300 value) was removed and the overall r-means and SD values within strains were recalculated

301 (Table 3). In all strains, the overall r-means increased when outliers were removed. In strains

302 with more than three samples, a second outlier was removed and the overall r-means and SD

303 values were recalculated. Overall r-means were used to determine degree of relatedness as clonal

304 (or from stable seed; overall r-means > 0.9), first or higher order relatives (overall r-means 0.46

305 – 0.89), second order relatives (overall r-means 0.26 - 0.45), low levels of relatedness (overall r-

306 means 0.00 - 0.25), and not related (overall r-means <0.00). Initial overall r-means indicate only

307 three strains are first or higher order relatives (Table 3). Removing outliers revealed samples

308 within ten of the remaining 22 strains are first or higher order relatives. After outliers were

309 removed, 15 of the 30 strains are comprised of first or higher order relatives, indicating outliers

310 are often responsible for variability within strains. Removing outliers revealed samples within

311 seven of the twelve popular strains are of first or higher order relatives (Table 3, Fig. 4). Three

312 strains are comprised of second order relatives with overall r-means ranging from 0.22 - 0.25.

313 Two strains show low levels of relatedness with overall r-means ranging from 0.13 - 0.16 even

314 after outliers are removed (Table 3). The impact of outliers can be clearly seen in the heat map

315 for ‘Durban Poison’ which shows the relatedness for 36 comparisons (Fig. 3A), six of which are

316 nearly identical (r-value 0.90 - 1.0), six of which are first order siblings (r-value 0.46 - 0.89), six

317 of which are second order relatives (r-value 0.26 - 0.45), five of which have low levels of

318 relatedness (r-value 0.00 - 0.25), and 13 which are not related (r-value <0.00). However, removal

319 of two outliers, Denver 1 and Garden City 2, reduces the number of comparisons ranked as not

320 related from 13 to zero, and low level of relatedness from five to one.
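
The relatedness categories above amount to a threshold rule on the r-value. The Python sketch below is illustrative only and is not the authors' code: the function names `classify_relatedness` and `n_pairwise` are hypothetical, values are rounded to two decimals before binning so that they fall into one of the published ranges, and an r-value of exactly 0.90 is assumed to count as clonal. It also shows that 36 pairwise comparisons are consistent with nine samples, and that removing two samples leaves 21 comparisons.

```python
from math import comb


def classify_relatedness(r_value: float) -> str:
    """Bin an r-value (or overall r-mean) using the ranges quoted in the text."""
    r = round(r_value, 2)  # assumption: round first so gaps such as 0.451 are binned
    if r >= 0.90:
        return "clonal or stable seed"
    if r >= 0.46:
        return "first or higher order relatives"
    if r >= 0.26:
        return "second order relatives"
    if r >= 0.00:
        return "low relatedness"
    return "not related"


def n_pairwise(n_samples: int) -> int:
    """Number of unordered sample pairs, n * (n - 1) / 2."""
    return comb(n_samples, 2)


print(classify_relatedness(0.37))  # falls in the 0.26 - 0.45 range
print(n_pairwise(9), n_pairwise(7))
```
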

321

322 Discussion

323 The legal status and social attitudes toward Cannabis are changing worldwide, with more

324 than half the states in the U.S. having sanctioned medical Cannabis use (ProCon, 2016a).

325 Cannabis types and strains are an increasingly frequent topic of discussion, so it is

326 important that scientists and the public can discuss Cannabis in a similar manner. Currently, not

327 only are Sativa and Indica types disputed, but experts are also at odds about nomenclature for

328 Cannabis (Clarke & Merlin, 2015; Small, 2015b). We investigated the possibility of a genetic

329 distinction in commonly described Sativa and Indica strains. Previous genetic research found

330 genetic variability among seeds from the same strain supplied from a single source, indicating

331 genotypes within strains are variable (Soler et al., 2017). However, it was unclear if the seeds in

332 the study were produced from multiple parent plants, which could have introduced a source of

333 genetic variation. The premise of this study is that genetic profiles from strains with the same

334 identifying name should have identical, or at least highly similar, genotypes no matter the source

335 of origin. It is important that strain names reflect consistent genetic identity, especially for those

336 who rely on Cannabis to alleviate specific medical symptoms. An important element for this

337 study is that samples were acquired from multiple locations to maximize the potential for

338 variation among samples. The multiple genetic analyses used here address important questions

339 and bring scientific evidence to support claims that inconsistent products are being distributed.

340 Genotype analysis can be used to ensure higher levels of consistency within strains. Maintenance

341 of the genetic integrity of strains is possible only following evaluation of genetic consistency,

342 and continuing to overlook this aspect will promote variability and phenotypic variation.

343 Addressing strain variability at the molecular level is of the utmost importance while the industry

344 is still relatively new.

345 Genetic analyses have consistently found genetic distinction between hemp and

346 marijuana, but no clear distinction has been shown between the common description of Sativa

347 and Indica types (de Meijer & Keizer, 1996; Small, 1997; Lynch et al., 2016; Sawler et al., 2015;

348 Vergara et al., 2016; Dufresnes et al., 2017; Soler et al., 2017). We found high support for two

349 genetic groups in the data (Fig. 1) but no discernible distinction or pattern between the described

350 Sativa and Indica strains. The color-coding of strains in the PCoA for all 122 samples allows for

351 visualization of clustering among similar phenotypes by color: Sativa (red/orange), Indica

352 (blue/purple), and Hybrid (green) type strains (Fig. 2). However, there is no evidence of

353 clustering among the three commonly described types. If genetic differentiation of the commonly

354 perceived Sativa and Indica types previously existed, it is no longer detectable in the neutral

355 genetic markers used here. Extensive hybridization and selection have presumably created a

356 homogenizing effect and erased evidence of potentially divergent historical genotypes.

357 Wikileaf maintains that the proportions of Sativa and Indica reported for strains are

358 largely based on genetics and lineage (Dan Nelson, Wikileaf, personal communication). This has

359 seemingly become convoluted over time (Russo, 2007; Small, 2015a; Clarke & Merlin, 2013;

360 Small, 2017). Our results show that commonly reported levels of Sativa, Indica and Hybrid type

361 strains are often not reflected in the average genotype. For example, two sought-after Sativa

362 strains, ‘Durban Poison’ and ‘Sour Diesel’, were found to have contradictory genetic

363 assignments (Fig. 1, Table 2). ‘Durban Poison’, described as 100% Sativa, has an 86% average

364 assignment to genotype 1, while ‘Sour Diesel’, described as 90% Sativa, has a 14% average

365 assignment to genotype 1. This analysis indicates strains with similar reported proportions of

366 Sativa or Indica may have differing genetic assignments. Further illustrating this point is that

367 ‘Bruce Banner’, ‘Flo’, ‘Jillybean’, ‘Pineapple Express’, ‘Purple Haze’, and ‘Tangerine’ are all

368 reported to be 60/40 Hybrid type strains, but clearly have differing levels of admixture both

369 within and among these reportedly similar strains (Table 2, Fig. 1). From these results, we can

370 conclude that reported ratios or differences between Sativa and Indica phenotypes are not

371 discernible using these genetic markers. Given the lack of genetic distinction between Indica and

372 Sativa types, it is not surprising that reported ancestry proportions are also not supported.
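
The ‘Durban Poison’ and ‘Sour Diesel’ comparison can be restated as simple arithmetic. The Python sketch below uses only the four percentages quoted in this paragraph; the dictionary layout and the helper name `gap` are hypothetical and chosen for illustration. It shows that the two strains differ by 10 percentage points in reported Sativa proportion but by 72 percentage points in average assignment to genotype 1.

```python
# Percentages quoted in the text: reported Sativa proportion and
# average STRUCTURE assignment to genotype 1.
strains = {
    "Durban Poison": {"reported_sativa": 100, "genotype1": 86},
    "Sour Diesel": {"reported_sativa": 90, "genotype1": 14},
}


def gap(strain_a: str, strain_b: str, key: str) -> int:
    """Absolute difference, in percentage points, between two strains for one field."""
    return abs(strains[strain_a][key] - strains[strain_b][key])


reported_gap = gap("Durban Poison", "Sour Diesel", "reported_sativa")
assigned_gap = gap("Durban Poison", "Sour Diesel", "genotype1")
print(reported_gap, assigned_gap)
```
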

373 To accurately address reported variation within strains, samples were purchased from

374 various locations, as a customer, with no information about the strains other than publicly available

375 online information. Evidence for genetic inconsistencies is apparent within many strains and

376 supported by multiple genetic analyses. In our analyses of 30 strains, only four strains had

377 consistent STRUCTURE genotype assignment and admixture among all samples: ‘Chemdawg’

378 (n=7), ‘Island Sweet Skunk’ (n=3), ‘Larry OG’ (n=3) and ‘Jack Flash’ (n = 2; Fig. 2). However,

379 it is clear that many strains contained one or more obvious genetic outliers (e.g., ‘Durban Poison’ –

380 Denver 1; Figs. 1, 3A). With the removal of one obvious outlier, the remaining samples of eleven

381 strains were classified as first order relatives based on pairwise genetic relatedness r-values

382 (overall r-mean >0.45; Table 3, Fig. 4). The removal of a second outlier resulted in 15 of the 30

383 strains having an overall r-mean >0.45 (Table 3, Fig. 4). Together, these results indicate that half

384 of the strains used in this analysis showed relatively stable genetic identity among most samples

385 within a strain. Six of the strains with inconsistent patterns had only two samples, both of which

386 were different (e.g., ‘Trainwreck’ and ‘Headband’). The remaining nine strains in the analysis

387 had more than one obvious outlier (e.g., ‘Sour Diesel’) or had no consistent genetic pattern

388 among the samples within the strain (e.g., ‘Girl Scout Cookies’; Table 3, Fig. 1, Fig. 2, Fig. S2).

389 It is noteworthy that many of the strains used here fell into a range of genetic relatedness

390 indicative of first order siblings (r-value 0.46 - 0.89) when samples with high genetic divergence

391 were isolated and removed from the data set (Table 3; Figs. 3, 4).

392 Relationships within the twelve popular strains were analyzed separately to determine if

393 (1) strains with more samples show a higher degree of clustering, and (2) strains in higher

394 demand have a higher degree of genetic relatedness. The analysis of genetic variation for the

395 subset of twelve popular strains shows some clustering within strains (Fig. S2), but clustering is

396 not seen for all strains, and outliers are apparent. This analysis explains more of the variation in

397 the data than the PCoA for all 30 strains and shows clustering of some strains, such as

398 ‘Durban Poison’, ‘Golden Goat’ and ‘Blue Dream’. However, all clusters have at least one

399 sample that is removed from the other samples in the group. From this we argue that samples

400 representing the popular strains may be slightly more likely to have a higher degree of genetic

401 relatedness, but more sampling would be required to determine this with confidence.

402 A heat map based on the Lynch & Ritland (1999) estimator of

403 pairwise genetic relatedness (r-values) was generated to visualize genetic relatedness throughout

404 the data set (Fig. S3). Values of 1.00 (or close to it) are assumed to be clones or plants from self-

405 fertilized seed. Six examples of within-strain pairwise comparison heat maps were examined to

406 illustrate common patterns (Fig. 3). The heat maps show that many strains contain samples that

407 are first order relatives or higher (r-value > 0.49). For example, ‘Sour Diesel’ (Fig. 3) has 12

408 comparisons of first order or above, and six with low or no relationship. There are also values that

409 could be indicative of clones or plants from a stable seed source, such as in ‘Blue Dream’ (Fig.

410 3), which has 10 nearly identical comparisons (r-value 0.90-1.00), and no comparisons in

411 ‘Blue Dream’ have negative values. While ‘Blue Dream’ has an initial overall r-mean indicating

412 first order relatedness within the samples (Table 3, Fig. 4), it still contains more variation than

413 would be expected from a clone-only strain (SeedFinder, 2017). Other clone-only strains

414 (SeedFinder, 2017), e.g., ‘Girl Scout Cookies’ (Table 3, Fig. 3) and ‘Golden Goat’ (Table 3,

415 Fig. 3), have a high degree of genetic variation resulting in low overall relatedness values.

416 Outliers were calculated and removed iteratively to demonstrate how they affected the overall r-

417 mean within the twelve popular strains (Table 3, Fig. 4). In all cases, removing outliers increased

418 the mean r-value, as illustrated by ‘Bruce Banner’, which increased substantially, from 0.3 to 0.9,

419 when samples with two outlying genotypes were removed. The outliers are evidence of

420 inconsistencies within strains and, when they are removed, genetic relatedness greatly improves. There are

421 unexpected areas in the heat map that indicate high degrees of relatedness between different

422 strains (Fig. S3). For example, comparisons between ‘Golden Goat’ and ‘Island Sweet Skunk’

423 (overall r-mean 0.37) are higher than within samples of ‘Sour Diesel’. Interestingly, ‘Golden

424 Goat’ is reported to be a hybrid descendant of ‘Island Sweet Skunk’ (Leafly, 2018), which

425 explains the high genetic relatedness between these strains. However, most of the between-strain

426 overall r-means are negative (e.g., ‘Golden Goat’ to ‘Durban Poison’ -0.03 and ‘Chemdawg’ to

427 ‘Durban Poison’ -0.22; Fig. S3), indicative of limited recent genetic relationship.
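
The effect of iterative outlier removal on the overall r-mean can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the outlier rule (drop the sample with the lowest mean r-value to the other samples), the function names `overall_r_mean` and `drop_outlier`, and all r-values and sample labels in the example are hypothetical.

```python
from itertools import combinations
from statistics import mean


def overall_r_mean(r, samples):
    """Mean of the pairwise r-values over all unordered pairs of samples."""
    return mean(r[frozenset(pair)] for pair in combinations(samples, 2))


def drop_outlier(r, samples):
    """Assumed rule: remove the sample with the lowest mean r-value to the others."""
    def mean_to_others(s):
        return mean(r[frozenset((s, t))] for t in samples if t != s)

    worst = min(samples, key=mean_to_others)
    return [s for s in samples if s != worst]


# Hypothetical pairwise r-values for four samples of one strain (not study data).
r = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.90,
    frozenset(("B", "C")): 0.92,
    frozenset(("A", "D")): -0.20,
    frozenset(("B", "D")): -0.10,
    frozenset(("C", "D")): -0.15,
}
samples = ["A", "B", "C", "D"]

before = overall_r_mean(r, samples)
kept = drop_outlier(r, samples)
after = overall_r_mean(r, kept)
print(round(before, 2), kept, round(after, 2))
```

In this toy case a single divergent sample pulls the overall r-mean into the second order range, and removing it leaves three samples in the clonal range, which mirrors the pattern described for several strains above.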

428 While collecting samples from various dispensaries, it was noted that strains of

429 ‘Chemdawg’ had various spellings of the strain name, as well as numbers and/or letters

430 attached to the name. Without knowledge of the history of ‘Chemdawg’, the assumption was that

431 these were local variations. These were acquired to include in the study to determine if and how

432 these variants were related. Upon investigation of possible origins of ‘Chemdawg’, an interesting

433 history was uncovered, especially in light of the results (Backes & Weil, 2014). Legend has it

434 that someone named “Chemdog” (a person) grew the variations (‘Chemdawg 91’, ‘Chemdawg

435 D’, ‘Chemdawg 4’, ‘Chemdog 1’) from seeds he found in an ounce he purchased at a Grateful

436 Dead concert. This illustrates how Cannabis strains may have come to market in a non-

437 traditional manner. The history of ‘Chemdawg’ is currently unverifiable, but the analysis

438 supports that these variations could be from seeds of the same plant. Genetic analyses can add

439 scientific support to the stories behind vintage strains and possibly help clarify the history of

440 specific strains.

441 Inconsistencies may originate with both suppliers and growers of

442 Cannabis clones and stable seed, because currently they can only assume the strains they possess

443 are true to name. There is a chain of events from seed to sale that relies heavily on the supplier,

444 grower, and dispensary to provide the correct product, but there is currently no reliable way to

445 verify Cannabis strains. The possibility exists for errors in plant labeling, misplacement,

446 misspelling, and/or relabeling along the entire chain of production. Although the expectation is

447 that plants are labeled carefully and not re-labeled with a more desirable name for a quick sale,

448 these possibilities must be considered. Identification by genetic markers has largely eliminated

449 these types of mistakes in other widely cultivated crops such as grapes, olives and apples.

450 Modern genetic applications can accurately identify varieties and can clarify ambiguity in closely

451 related and hybrid species (e.g., Rongwen et al., 1995; Guilford et al., 1997; Belaj et al., 2004;

452 Muzzalupo et al., 2009; Štajner et al., 2011).

453 Matching genotypes within the same strains were expected, but highly similar genotypes

454 between samples of different strains could be the result of mislabeling or misidentification,

455 especially when acquired from the same source. The pairwise genetic relatedness r-values were

456 examined for incidence of possible mislabeling or re-labeling. There were instances in which

457 different strains had r-values = 1.0 (Fig. S3), indicating clonal genetic relationships. Two

458 samples with matching genotypes were obtained from the same location (‘Larry OG’ and ‘Tahoe

459 OG’ from San Luis Obispo 3). This could be evidence for mislabeling or misidentification

460 because these two samples have similar names. It is unlikely that these samples from reportedly

461 different strains have identical genotypes, and more likely that these samples were mislabeled at

462 some point. Misspelling may also be a source of error, especially when facilities are handwriting

463 labels. An example of possible misspelling may have occurred in the sample labeled ‘Chemdog

464 1’ from Garden City 1. ‘Chemdawg 1’, a described strain, could have easily been misspelled, but

465 it is unclear whether this instance is evidence for mislabeling or renaming a local variant.

466 Inadvertent mistakes may carry through to scientific investigation where strains are spelled or

467 labeled incorrectly. For example, Vergara et al. (2016) report genome assemblies for

468 ‘Chemdog’ and ‘Chemdog 91’ as they are reported in GenBank (GCA_001509995.1), but

469 neither of these labels is a recognized strain name. It is likely that these are ‘Chemdawg’ and

470 ‘Chemdawg 91’ (Leafly, 2018; Wikileaf, 2018) although it is possible these strains are

471 unreported variants. Another example that may lead to confusion is how information is reported

472 in public databases. For example, data are available for the reported monoisolate of ‘Pineapple

473 Banana Bubba Kush’ in GenBank (SAMN06546749), and while ‘Pineapple Kush’, ‘Banana

474 Kush’ and ‘Bubba Kush’ are known strains (Leafly, 2018; Wikileaf, 2018), the only record of

475 ‘Pineapple Banana Bubba Kush’ is in GenBank. This study has highlighted several possible

476 sources of error and how genotyping can serve to uncover sources of variation. Although this

477 study was unable to confirm sources of error, it is important that producers, growers and

478 consumers are aware that errors occur and that they should be documented and corrected

479 whenever possible.
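
The screen for possible mislabeling described above can be sketched as a scan of the pairwise r-values for near-clonal pairs that carry different strain names. The Python below is a hedged illustration, not the authors' workflow: the function name `flag_possible_mislabels`, the sample identifiers, the 0.99 cut-off, and the negative r-values are assumptions; only the ‘Larry OG’ and ‘Tahoe OG’ pairing from San Luis Obispo 3 with matching genotypes comes from the text.

```python
from itertools import combinations


def flag_possible_mislabels(samples, r, threshold=0.99):
    """Return cross-strain pairs whose r-value is at or above the threshold.

    samples: sample id -> (strain name, source location)
    r:       frozenset({id_a, id_b}) -> pairwise r-value
    Each result is (id_a, id_b, r-value, same_source).
    """
    flagged = []
    for a, b in combinations(sorted(samples), 2):
        strain_a, source_a = samples[a]
        strain_b, source_b = samples[b]
        value = r.get(frozenset((a, b)))
        if value is not None and value >= threshold and strain_a != strain_b:
            flagged.append((a, b, value, source_a == source_b))
    return flagged


# Hypothetical sample ids; the two negative r-values are invented for illustration.
samples = {
    "s1": ("Larry OG", "San Luis Obispo 3"),
    "s2": ("Tahoe OG", "San Luis Obispo 3"),
    "s3": ("Durban Poison", "Denver 1"),
}
r = {
    frozenset(("s1", "s2")): 1.0,
    frozenset(("s1", "s3")): -0.2,
    frozenset(("s2", "s3")): -0.2,
}
flagged = flag_possible_mislabels(samples, r)
print(flagged)
```

The `same_source` flag separates pairs acquired from one location, where a labeling mix-up at that facility is plausible, from pairs acquired at different locations.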

480

481 Conclusion

482 Over the last decade, the legal status of Cannabis has shifted, and it is now legal for medical

483 use, and in some cases recreational adult use, in the majority of the United States; several other

484 countries have also legalized or decriminalized Cannabis. The recent legal changes have led to

485 an unprecedented increase in the number of strains available to consumers. There are currently

486 no baseline genotypes for any strains, but steps should be taken to ensure products marketed as a

487 particular strain are genetically congruent. Although the sampling in this study was not

488 exhaustive, the results are clear: strain inconsistency is evident and is not limited to a single

489 source, but rather exists among dispensaries across cities in multiple states. Various suggestions

490 for naming the genetic variants do not seem to align with the current widespread definitions of

491 Sativa, Indica, Hybrid, and Hemp (Hillig, 2005; Clarke & Merlin, 2013). As our Cannabis

492 knowledge base grows, so does the communication gap between scientific researchers and the

493 public. Currently, there is no way for Cannabis suppliers, growers or consumers to definitively

494 verify strains. Exclusion from protection, due to the federal status of Cannabis as a Schedule I

495 drug, has created avenues for error and inconsistencies. Presumably, the genetic inconsistencies

496 will often manifest as differences in overall effects (Backes & Weil, 2014). Differences in characteristics

497 within a named strain may be surprising for a recreational user, but differences may be more

498 serious for a medical patient who relies on a particular strain for alleviation of specific

499 symptoms.

500 This study shows that in neutral genetic markers, there is no consistent genetic

501 differentiation between the widely held perceptions of Sativa and Indica Cannabis types.

502 Moreover, the genetic analyses do not support the reported proportions of Sativa and Indica

503 within each strain, which is expected given the lack of genetic distinction between Sativa and

504 Indica. Instances were found where samples within strains are not genetically similar, which is

505 unexpected given the manner in which Cannabis plants are propagated. Although it is impossible

506 to determine the source of these inconsistencies as they can arise at multiple points throughout

507 the chain of events from seed to sale, we theorize that misidentification, mislabeling, misplacement,

508 misspelling, and/or relabeling are all possible. Especially where names are similar, there is the

509 possibility for mislabeling, as was shown here. In many cases genetic inconsistencies within

510 strains were limited to one or two samples. We feel that there is a reasonable amount of genetic

511 similarity within many strains, but currently there is no way to verify the “true” genotype of any

512 strain. Although the sampling here includes merely a fragment of the available Cannabis strains,

513 our results give scientific merit to claims that strains can be unpredictable.

514

515 Supplementary Data

516 Table S1: Primer information used in this research.

517

518 Fig. S1: STRUCTURE HARVESTER graph indicating K=2 is highly supported.

519

520 Fig. S2: Principal Coordinates Analysis (PCoA) for twelve popular strains.

521

522 Fig. S3: Pairwise genetic relatedness (r) heat table with values for 122 samples.

523

524

525 Acknowledgements

526 We thank Gerald Bresowar and Nolan Kane for comments on an earlier draft of this manuscript.

527 The University of Northern Colorado School of Biological Sciences supported this research, and

528 we are grateful to the Graduate Student Association and the Gerald Schmidt Memorial Biology

529 Scholarship for providing partial funding to carry out this research.

Figure Legends

Fig. 1 Bar plot graphs generated from STRUCTURE analysis for 122 individuals from 30 strains dividing genotypes into two genetic groups, K=2. Samples were arranged by purported proportions from 100% Sativa to 100% Indica (Wikileaf, 2018) and then alphabetically within each strain by city. Each strain includes reported proportion of Sativa in parentheses (Wikileaf, 2018) and each sample includes the coded location and city from where it was acquired. Each bar indicates proportion of assignment to genotype 1 and genotype 2.

Fig. 2 Principal Coordinates Analysis (PCoA) generated in GENALEX. Samples are a color-coded continuum by proportion of Sativa (Table 2) with the strain name given for each sample: Sativa type (red: 100% Sativa proportion), Hybrid type (dark green: 50% Sativa proportion), and Indica type (purple: 0% Sativa proportion). Different symbols are used to indicate different strains within reported phenotype. Coordinate axis 1 explains 14.29% of the variation, coordinate axis 2 explains 9.56% of the variation, and coordinate axis 3 (not shown) explains 7.07%.

Fig. 3 Heat maps of six prominent strains using Lynch & Ritland (1999) pairwise genetic relatedness (r) values: purple indicates no genetic relatedness (minimum value -1.09) and green indicates a high degree of relatedness (maximum value 1.0). Sample strain names and location of origin are indicated along the top and down the left side of the chart. Pairwise genetic relatedness (r) values are given in each cell and cell color reflects the degree to which two individuals are related.

Fig. 4 This graph indicates the mean pairwise genetic relatedness (r) initially (light gray) and after the removal of one (medium gray) or two (dark gray) outlying samples in 12 prominent strains.

References

Abel EL. 2013. Marihuana: the first twelve thousand years. Springer Science & Business Media.

Anderson LC. 1980. Leaf variation among Cannabis species from a controlled garden. Botanical Museum Leaflets, Harvard University 28, 61-9.

Backes M, Weil A. 2014. Cannabis pharmacy: the practical guide to medical marijuana. Black Dog & Leventhal.

Baldoni L, et al. 2009. A consensus list of microsatellite markers for olive genotyping. Molecular Breeding. 24, 213-31.

Belaj A, Cipriani G, Testolin R, Rallo L, Trujillo I. 2004. Characterization and identification of the main Spanish and Italian olive cultivars by simple-sequence-repeat markers. HortScience. 39, 1557-61.

Borgelt LM, Franson KL, Nussbaum AM, Wang GS. 2013. The pharmacologic and clinical effects of medical cannabis. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy. 33, 195-209.

Cattell MV, Karl SA. 2004. Genetics and morphology in a Borrichia frutescens and B. arborescens (Asteraceae) hybrid zone. American Journal of Botany. 91, 1757-66.

Cipriani G, Marrazzo MT, Marconi R, Cimato A, Testolin R. 2002. Microsatellite markers isolated in olive (Olea europaea L.) are suitable for individual fingerprinting and reveal polymorphism within ancient cultivars. Theoretical and Applied Genetics. 104, 223-8.

Clarke R, Merlin M. 2013. Cannabis: Evolution and Ethnobotany. University of California Press.

Clarke R, Merlin M. 2015. Letter to the Editor: Small, Ernest. 2015. Evolution and Classification of Cannabis sativa (Marijuana, Hemp) in Relation to Human Utilization. The Botanical Review. 81, 295-305.

Controlled Substances Act. 1970. Pub. L. 91–513, title II, § 101, Oct 27, 1970, 84 Stat. 1242.

Costantini LA, Monaco A, Vouillamoz JF, Forlani M, Grando MS. 2015. Genetic relationships among local Vitis vinifera cultivars from Campania (Italy). VITIS-Journal of Grapevine Research. 44, 25.

De Queiroz K. 2007. Species concepts and species delimitation. Systematic biology. 56, 879-86.

Doyle JJ. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin. 19, 11-5.

Dufresnes C, Jan C, Bienert F, Goudet J, Fumagalli L. 2017. Broad-Scale Genetic Diversity of Cannabis for Forensic Applications. PLoS ONE. 12, e0170522.

Earl DA. 2012. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources. 4, 359-61.

Elzinga S, Fischedick J, Podkolinski R, Raber JC. 2015. Cannabinoids and terpenes as chemotaxonomic markers in cannabis. Natural Products Chemistry & Research. 3.

Emboden WA. 1974. Cannabis—a polytypic genus. Economic Botany. 28, 304-10.

Evanno G, Regnaut S, Goudet J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 14, 2611-20.

Faircloth BC. 2008. Msatcommander: detection of microsatellite repeat arrays and automated, locus‐specific primer design. Molecular Ecology Resources. 8, 92-4.

Fischedick JT, Hazekamp A, Erkelens T, Choi YH, Verpoorte R. 2010. Metabolic fingerprinting of Cannabis sativa L., cannabinoids and terpenoids for chemotaxonomic and drug standardization purposes. Phytochemistry. 71, 2058-73.

Green G. 2005. The Cannabis Breeder’s Bible. San Francisco: Green Candy Press.

Green J. 2014. How Many Marijuana Strain Names Are There? Marijuana Business News. http://www.theweedblog.com/how-many-marijuana-strains-are-there/. Accessed July 14 2016.

Guilford P, Prakash S, Zhu JM, Rikkerink E, Gardiner S, Bassett H, Forster R. 1997. Microsatellites in Malus x domestica (apple): abundance, polymorphism and cultivar identification. Theoretical and Applied Genetics. 94, 249-54.

Hazekamp A, Fischedick JT. 2012. Cannabis‐from cultivar to chemovar. Drug Testing and Analysis. 4, 660-7.

Hillig KW. 2004. A chemotaxonomic analysis of terpenoid variation in Cannabis. Biochemical Systematics and Ecology. 32, 875-91.

Hillig KW, Mahlberg PG. 2004. A chemotaxonomic analysis of variation in Cannabis (Cannabaceae). American Journal of Botany. 91, 966-75.

Hillig KW. 2005. Genetic evidence for speciation in Cannabis (Cannabaceae). Genetic Resources and Crop Evolution. 52, 161-80.

Hokanson SC, Szewc-McFadden AK, Lamboy WF, McFerson JR. 1998. Microsatellite (SSR) markers reveal genetic identities, genetic diversity and relationships in a Malus × domestica Borkh. core subset collection. Theoretical and Applied Genetics. 97, 671-83.

de Lamarck JB, Poiret JL. 1789. Encyclopédie méthodique: botanique. chez Panckoucke.

Leaf Science. 2014. Indica vs. Sativa: Understanding The Differences. http://www.leafscience.com/2014/06/19/indica-vs-sativa-understanding-differences/. Accessed June 19 2016.

Leafly. 2017. and Infused Product Explorer. https://www.leafly.com Accessed May 31 2017.

Lynch M, Ritland K. 1999. Estimation of pairwise relatedness with molecular markers. Genetics. 152, 1753-66.

Lynch RC, Vergara D, Tittes S, White K, Schwartz CJ, Gibbs MJ, Ruthenburg TC, deCesare K, Land DP, Kane NC. 2016. Genomic and chemical diversity in Cannabis. Critical Reviews in Plant Sciences. 35, 349-63.

Mallet J. 2005. Hybridization as an invasion of the genome. Trends in ecology & evolution. 20, 229-37.

Marijuana Policy Project. 2017. Medical Marijuana Patient Numbers. https://www.mpp.org/issues/medical-marijuana/state-by-state-medical-marijuana-laws/medical-marijuana-patient-numbers/. Accessed May 30 2017.

de Meijer EP, Bagatta M, Carboni A, Crucitti P, Moliterni VC, Ranalli P, Mandolino G. 2003. The inheritance of chemical phenotype in Cannabis sativa L. Genetics. 163, 335-46.

de Meijer ED, Keizer LC. 1996. Patterns of diversity in Cannabis. Genetic resources and crop evolution. 43, 41-52.

Moura AE, Natoli A, Rogan E, Hoelzel AR. 2013. Atypical panmixia in a European dolphin species (Delphinus delphis): implications for the evolution of diversity across oceanic boundaries. Journal of Evolutionary Biology. 26, 63-75.

Muzzalupo I, Stefanizzi F, Perri E. 2009. Evaluation of olives cultivated in southern Italy by simple sequence repeat markers. HortScience. 44, 582-8.

Naftali T, Schleider LB, Dotan I, Lansky EP, Benjaminov FS, Konikoff FM. 2013. Cannabis induces a clinical response in patients with Crohn's disease: a prospective placebo-controlled study. Clinical Gastroenterology and Hepatology. 11, 1276-80.

Norman AJ, Stronen AV, Fuglstad GA, Ruiz-Gonzalez A, Kindberg J, Street NR, Spong G. 2017. Landscape relatedness: detecting contemporary fine-scale spatial structure in wild populations. Landscape Ecology. 32, 181-94.

Ogborne AC, Smart RG, Weber T, Birchmore-Timney C. 2000. Who is using cannabis as a medicine and why: an exploratory study. Journal of Psychoactive Drugs. 32, 435-43.

Peakall RO, Smouse PE. 2006. GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular ecology notes. 6, 288-95.

Peakall RO, Smouse PE. 2012. GENALEX 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update. Bioinformatics. 28, 2537-9.

Pellerone FI, Edwards KJ, Thomas MR. 2015. Grapevine microsatellite repeats: isolation, characterisation and use for genotyping of grape germplasm from Southern Italy. VITIS-Journal of Grapevine Research. 40, 179.

Poljuha D, Sladonja B, Šetić E, Milotić A, Bandelj D, Jakše J, Javornik B. 2008. DNA fingerprinting of olive varieties in Istria (Croatia) by microsatellite markers. Scientia Horticulturae. 115, 223-30.

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics. 155, 945-59.

ProCon (a). 2016. States with Pending Legislation or Ballot Measures to Legalize Medical Marijuana – Medical Marijuana – ProCon.org. http://medicalmarijuana.procon.org Accessed 31 May 2017.

ProCon (b). 2016. For Which Symptoms or Conditions Might Marijuana Provide Relief? http://medicalmarijuana.procon.org. Accessed August 6 2016.

Raymond M, Rousset F. 1995. GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. Journal of heredity. 86, 248-9.

Rieseberg LH. 1995. The role of hybridization in evolution: old wine in new skins. American Journal of Botany. 82, 944-53.

Rieseberg LH. 1997. Hybrid origins of plant species. Annual review of Ecology and Systematics. 28, 359-89.

Rongwen J, Akkaya MS, Bhagwat AA, Lavi U, Cregan PB. 1995. The use of microsatellite DNA markers for soybean genotype identification. Theoretical and Applied Genetics. 90, 43-8.

Rousset F. 2008. genepop’007: a complete re‐implementation of the genepop software for Windows and Linux. Molecular Ecology Resources. 8, 103-6.

Russo EB. 2007. History of cannabis and its preparations in saga, science, and sobriquet. Chemistry & Biodiversity. 4, 1614-48.

Sarri V, Baldoni L, Porceddu A, Cultrera NG, Contento A, Frediani M, Belaj A, Trujillo I, Cionini PG. 2006. Microsatellite markers are powerful tools for discriminating among olive cultivars and assigning them to geographically defined populations. Genome. 49, 1606-15.

Sawler J, Stout JM, Gardner KM, Hudson D, Vidmar J, Butler L, Page JE, Myles S. 2015. The genetic structure of marijuana and hemp. PloS one. 10, e0133292.

Schlichting CD. 1986. The evolution of phenotypic plasticity in plants. Annual Review of Ecology and Systematics. 17, 667-93.

Schultes RE. 1970. The botanical and chemical distribution of hallucinogens. Annual Review of Plant Physiology. 21, 571-98.

Schwabe AL, Hubbard AR, Neale JR, McGlaughlin ME. 2013. Microsatellite loci development for rare Colorado Sclerocactus (Cactaceae). Conservation Genetics Resources. 5, 69-72.

Schwabe AL, Neale JR, McGlaughlin ME. 2015. Examining the genetic integrity of a rare endemic Colorado cactus (Sclerocactus glaucus) in the face of hybridization threats from a close and widespread congener (Sclerocactus parviflorus). Conservation Genetics. 16, 443-57.

SeedFinder. 2017. Clone Only Strains. http://en.seedfinder.eu/database/strains/cloneonly/. Accessed May 31 2017.

Small E. 1997. Cannabaceae. Flora of North America Editorial Committee, editors. Flora of North America North of Mexico. New York and Oxford. vol. 3, p. 381-387.

Small E. (a) 2015. Evolution and classification of Cannabis sativa (marijuana, hemp) in relation to human utilization. The Botanical Review. 81, 189-294.

Small E. (b) 2015. Response to the erroneous critique of my Cannabis monograph by RC Clarke and MD Merlin. The Botanical Review. 81, 306-16.

Small E. 2017. Cannabis: A Complete Guide. CRC Press: Taylor and Francis.

Smith MH. 2012. Heart of Dankness: Underground Botanists, Outlaw Farmers, and the Race for the Cannabis Cup. Broadway Books.

Soler S, Gramazio P, Figàs MR, Vilanova S, Rosa E, Llosa ER, Borràs D, Plazas M, Prohens J. 2017. Genetic structure of Cannabis sativa var. indica cultivars based on genomic SSR (gSSR) markers: Implications for breeding and germplasm management. Industrial Crops and Products. 104, 171-8.

Štajner N, Rusjan D, Korosec-Koruza Z, Javornik B. 2011. Genetic characterization of old Slovenian grapevine varieties of Vitis vinifera L. by microsatellite genotyping. American Journal of Enology and Viticulture. ajev-2011.

Stockton N. 2015. Sorry, but the names for weed strains are kinda meaningless. Wired - Science. http://www.wired.com/2015/08/sorry-names-weed-strains-kinda-meaningless/. Accessed August 15 2016.

The Marihuana Tax Act of 1937. 1937. Pub. 238, 75th Congress, Aug 2, 1937, 50 Stat. 55.

Tomida I, Pertwee RG, Azuara-Blanco A. 2004. Cannabinoids and glaucoma. British Journal of Ophthalmology. 88, 708-13.

United Nations Office on Drugs and Crime. 2010. World Drug Report. United Nations Publications. https://www.unodc.org/documents/wdr/WDR_2010/World_Drug_Report_2010_lo-res.pdf. Accessed 31 May 2017.

United States Department of Agriculture. 1989. United States Plant Variety Protection Act of 24 December 1970. USDA. https://www.ams.usda.gov/sites/default/files/media/Plant%20Variety%20Protection%20Act.pdf. Accessed May 31 2017.

United States Department of Agriculture. 2015. Agricultural Marketing Service, Plant Variety Protection Office Application Requirements, Guidelines Exhibit B- Statement of Distinctness. https://www.ams.usda.gov/sites/default/files/media/Exhibt%20B.pdf. Accessed August 8 2016.

United States Department of Agriculture. 2016. Agricultural Marketing Service, What is a Specialty Crop? https://www.ams.usda.gov/services/grants/scbgp/specialty-crop Accessed August 2 2016.

Van Oosterhout C, Hutchinson WF, Wills DP, Shipley P. 2004. MICRO‐CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Molecular Ecology Notes. 4, 535-8.

Vergara D, Baker H, Clancy K, Keepers KG, Mendieta JP, Pauli CS, Tittes SB, White KH, Kane NC. 2016. Genetic and genomic tools for Cannabis sativa. Critical Reviews in Plant Sciences. 35, 364-77.

Wikileaf. 2018. Cannabis Strain Research Center. http://www.wikileaf.com. Accessed April 30 2018.

Zha HG, Milne RI, Sun H. 2008. Morphological and molecular evidence of natural hybridization between two distantly related Rhododendron species from the Sino-Himalaya. Botanical Journal of the Linnean Society. 156, 119-29.

Table 1 Cannabis samples (122) from 30 strains with the reported proportion of Sativa from Wikileaf (Wikileaf, 2018) and the city location and state where each sample was acquired (SLO: San Luis Obispo).

Name                     Sativa  City            State
Durban Poison            100     Boulder 1       CO
Durban Poison            100     Boulder 3       CO
Durban Poison            100     Denver 1        CO
Durban Poison            100     Denver 2        CO
Durban Poison            100     Fort Collins 3  CO
Durban Poison            100     Fort Collins 4  CO
Durban Poison            100     Garden City 1   CO
Durban Poison            100     Garden City 2   CO
Durban Poison            100     Union Gap 1     WA
Hawaiian                 90      Boulder 1       CO
Hawaiian                 90      Fort Collins 2  CO
Sour Diesel              90      Boulder 1       CO
Sour Diesel              90      Boulder 3       CO
Sour Diesel              90      Greeley 1       CO
Sour Diesel              90      Denver 4        CO
Sour Diesel              90      Fort Collins 3  CO
Sour Diesel              90      Garden City 1   CO
Sour Diesel              90      Garden City 2   CO
Trainwreck               90      Denver 1        CO
Trainwreck               90      Garden City 1   CO
Island Sweet Skunk       80      Boulder 1       CO
Island Sweet Skunk       80      Garden City 1   CO
Island Sweet Skunk       80      Garden City 2   CO
AK-47                    65      Boulder 1       CO
AK-47                    65      Denver 3        CO
AK-47                    65      SLO 2           CA
Golden Goat              65      Boulder 1       CO
Golden Goat              65      Boulder 2       CO
Golden Goat              65      Boulder 3       CO
Golden Goat              65      Denver 1        CO
Golden Goat              65      Garden City 1   CO
Golden Goat              65      Garden City 1   CO
Golden Goat              65      Garden City 2   CO
Green Crack              65      Fort Collins 2  CO
Green Crack              65      Garden City 1   CO
Green Crack              65      SLO 2           CA
Bruce Banner             60      Boulder 1       CO
Bruce Banner             60      Denver 1        CO
Bruce Banner             60      Denver 4        CO
Bruce Banner             60      Fort Collins 3  CO
Bruce Banner             60      Fort Collins 4  CO
Bruce Banner             60      Garden City 1   CO
Flo                      60      Boulder 1       CO
Flo                      60      Denver 1        CO
Flo                      60      Fort Collins 2  CO
Flo                      60      Garden City 1   CO
Jillybean                60      Garden City 1   CO
Jillybean                60      Garden City 2   CO
Jillybean                60      Greeley 1       CO
Pineapple Express        60      Boulder 1       CO
Pineapple Express        60      Denver 1        CO
Pineapple Express        60      Garden City 2   CO
Pineapple Express        60      Longmont 1      CO
Pineapple Express        60      Union Gap       WA
Purple Haze              60      Denver 4        CO
Purple Haze              60      Greeley 1       CO
Purple Haze              60      Fort Collins 1  CO
Tangerine                60      Denver 1        CO
Tangerine                60      Garden City 1   CO
Jack Herer               55      Garden City 3   CO
Jack Herer               55      SLO 1           CA
Jack Herer               55      Union Gap 1     WA
OG Kush                  55      Denver 3        CO
OG Kush                  55      Fort Collins 3  CO
OG Kush                  55      Garden City 2   CO
OG Kush                  55      SLO 1           CA
Blue Dream               50      Boulder 1       CO
Blue Dream               50      Boulder 2       CO
Blue Dream               50      Boulder 3       CO
Blue Dream               50      Denver 1        CO
Blue Dream               50      Garden City 4   CO
Blue Dream               50      Garden City 4   CO
Blue Dream               50      SLO 2           CA
Blue Dream               50      SLO 3           CA
Blue Dream               50      SLO 4           CA
Tahoe OG                 50      Boulder 1       CO
Tahoe OG                 50      Denver 1        CO
Tahoe OG                 50      Fort Collins 4  CO
Tahoe OG                 50      SLO 3           CA
ChemdawgD                40      Boulder 1       CO
ChemDawg                 45      Boulder 2       CO
ChemDawg                 45      Boulder 3       CO
ChemdawgD                40      Denver 1        CO
Chemdawg 91              40      Denver 5        CO
Chemdog 1                40      Garden City 1   CO
ChemDawg                 45      Garden City 2   CO
Headband                 45      Garden City 1   CO
Headband                 45      Greeley 1       CO
Banana Kush              40      Denver 1        CO
Banana Kush              40      Garden City 1   CO
Banana Kush              40      Garden City 2   CO
Banana Kush              40      Greeley 1       CO
Girl Scout Cookies       40      Boulder 1       CO
Girl Scout Cookies       40      Denver 1        CO
Girl Scout Cookies       40      Fort Collins 2  CO
Girl Scout Cookies       40      Garden City 2   CO
Girl Scout Cookies       40      Garden City 3   CO
Girl Scout Cookies       40      SLO 3           CA
Girl Scout Cookies       40      SLO 4           CA
Girl Scout Cookies       40      Union Gap 1     WA
Jack Flash               55      Boulder 1       CO
Jack Flash               55      Denver 3        CO
Larry OG                 40      Boulder 1       CO
Larry OG                 40      Denver 4        CO
Larry OG                 40      SLO 3           CA
G-13                     30      Boulder 3       CO
G-13                     30      Fort Collins 3  CO
G-13                     30      Garden City 2   CO
Lemon Diesel             30      Boulder 1       CO
Lemon Diesel             30      Garden City 2   CO
Hash Plant               20      Fort Collins 3  CO
Hash Plant (Australian)  20      Garden City 1   CO
Hash Plant               20      Garden City 1   CO
Hash Plant               20      Garden City 2   CO
Bubba Kush 98            20      Denver 1        CO
Pre-98 Bubba Kush        15      Fort Collins 3  CO
Grape Ape                0       Boulder 1       CO
Grape Ape                0       Union Gap 1     WA
Purple Kush              0       Denver 1        CO
Purple Kush              0       Garden City 3   CO
Purple Kush              0       Garden City 4   CO

Table 2 Cannabis samples (122) from 30 strains with the reported proportion of Sativa retrieved from Wikileaf (Wikileaf, 2018). Strains are arranged by proportion of Sativa, from reported pure Sativa to pure Indica (which has no reported proportion of Sativa). The proportions of membership for genotype 1 and genotype 2 from the STRUCTURE analysis (Fig. 2) are reported as percentages according to the proportion of inferred ancestry. An asterisk (*) indicates the twelve popular strains used in further analyses. A diamond (♦) indicates clone-only strains (SeedFinder, 2018).

Strain                # Samples  Sativa Percentage  Genotype 1 (% average)  Genotype 2 (% average)  Standard Deviation
Durban Poison*        9          100                86                      14                      9.9
Hawaiian              2          90                 61                      39                      27.58
Sour Diesel*          7          90                 14                      86                      53.74
Trainwreck            2          90                 59                      41                      21.92
Island Sweet Skunk    3          80                 93                      7                       9.19
AK-47                 3          65                 55                      45                      7.07
Golden Goat*♦         7          65                 68                      32                      2.12
Green Crack♦          3          65                 60                      40                      3.54
Bruce Banner*         6          60                 19                      81                      28.99
Flo*                  4          60                 38                      62                      15.56
Jillybean             3          60                 73                      27                      9.19
Pineapple Express*    5          60                 62                      38                      1.41
Purple Haze           3          60                 77                      23                      12.02
Tangerine             2          60                 53                      47                      4.95
Jack Herer            3          55                 66                      34                      7.78
OG Kush*♦             4          55                 28                      72                      19.09
Blue Dream*♦          9          50                 80                      20                      21.21
Tahoe OG              4          50                 26                      74                      16.97
Chemdawg*             7          45                 9                       91                      25.46
Headband              2          45                 57                      43                      8.49
Banana Kush*          4          40                 52                      48                      8.49
Girl Scout Cookies*♦  8          40                 25                      75                      10.61
Jack Flash            2          40                 96                      4                       39.6
Larry OG              3          40                 7                       93                      23.33
G-13                  3          30                 50                      50                      14.14
Lemon Diesel♦         2          30                 85                      15                      38.89
Hash Plant            4          20                 37                      63                      12.02
Pre98-Bubba Kush      2          15                 7                       93                      5.66
Grape Ape             2          0                  55                      45                      38.89
Purple Kush*♦         4          0                  29                      71                      20.51

Table 3 Lynch & Ritland (1999) pairwise relatedness comparisons of overall r-means (Mean) and standard deviations (SD) for samples of 30 strains, including r-mean and SD after the first and second (where possible) outliers were removed. Outliers were samples with the lowest r-mean. The twelve popular strains are indicated with an asterisk (*). Diamonds (♦) indicate clone-only strains (SeedFinder, 2018).

Strain                # Samples  Measure  All samples  Outlier 1 removed  Outlier 2 removed
Durban Poison*        9          Mean     0.31         0.43               0.58
                                 SD       0.40         0.37               0.30
Hawaiian              2          Mean     -0.115       -                  -
                                 SD
Sour Diesel*          7          Mean     0.44         0.57               0.60
                                 SD       0.29         0.22               0.18
Trainwreck            2          Mean     -0.001       -                  -
                                 SD
Island Sweet Skunk    3          Mean     0.682        1.000              -
                                 SD
AK-47                 3          Mean     0.158        0.446              -
                                 SD
Golden Goat*♦         7          Mean     0.25         0.31               0.46
                                 SD       0.32         0.36               0.36
Green Crack♦          3          Mean     0.375        0.885              -
                                 SD
Bruce Banner*         6          Mean     0.30         0.51               0.90
                                 SD       0.51         0.50               0.05
Flo*                  4          Mean     0.29         0.55               -
                                 SD       0.38         0.39               -
Jillybean             3          Mean     -0.033       0.039              -
                                 SD
Pineapple Express*    5          Mean     0.02         0.04               0.13
                                 SD       0.16         0.17               0.19
Purple Haze           3          Mean     0.041        0.263              -
                                 SD
Tangerine             2          Mean     -0.219       -                  -
                                 SD
Jack Herer            3          Mean     0.102        0.127              -
                                 SD
OG Kush*♦             4          Mean     0.13         0.25               -
                                 SD       0.19         0.22               -
Blue Dream*♦          9          Mean     0.50         0.63               0.76
                                 SD       0.39         0.34               0.24
Tahoe OG              4          Mean     0.210        0.406              0.539
                                 SD
Chemdawg*             7          Mean     0.42         0.51               0.64
                                 SD       0.31         0.31               0.28
Headband              2          Mean     0.107        -                  -
                                 SD
Banana Kush*          4          Mean     0.13         0.24               -
                                 SD       0.20         0.13               -
Girl Scout Cookies*♦  8          Mean     0.08         0.13               0.22
                                 SD       0.27         0.30               0.32
Jack Flash            2          Mean     0.621        -                  -
                                 SD
Larry OG              3          Mean     0.316        0.671              -
                                 SD
G-13                  3          Mean     0.286        0.562              -
                                 SD
Lemon Diesel♦         2          Mean     0.102        -                  -
                                 SD
Hash Plant            4          Mean     0.250        0.250              0.427
                                 SD
Pre98-Bubba Kush      2          Mean     -0.024       -                  -
                                 SD
Grape Ape             2          Mean     -0.050       -                  -
                                 SD
Purple Kush*♦         4          Mean     0.03         0.16               -
                                 SD       0.21         0.22               -
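
The outlier-removal summary described in the Table 3 caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' workflow: the function names are hypothetical, the pairwise values are transcribed from the Durban Poison panel of Fig. 3, and the use of the sample standard deviation is an assumption, since the caption does not say which form of SD was used.

```python
from itertools import combinations
from statistics import mean, stdev

# Sample labels and lower-triangle pairwise relatedness values transcribed
# from the Durban Poison panel of Fig. 3. Row i holds the values of sample i
# against samples 0..i-1.
LABELS = [
    "Boulder 1", "Boulder 3", "Denver 1", "Denver 2", "Fort Collins 3",
    "Fort Collins 4", "Garden City 1", "Garden City 2", "Union Gap 1",
]
LOWER = [
    [],
    [0.49],
    [-0.26, -0.12],
    [0.35, 0.67, -0.13],
    [0.35, 1.00, -0.14, 0.95],
    [0.49, 1.00, -0.12, 0.67, 1.00],
    [0.07, 0.25, -0.02, 0.34, 0.39, 0.25],
    [-0.02, -0.07, -0.13, 0.09, -0.06, -0.07, -0.04],
    [0.35, 0.67, -0.13, 1.00, 0.95, 0.67, 0.34, 0.09],
]


def build_pairs(labels, lower):
    """Map each unordered pair of sample labels to its relatedness value."""
    pairs = {}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            pairs[frozenset((labels[i], labels[j]))] = value
    return pairs


def summarize(samples, pairs):
    """Overall r-mean and SD (sample SD: an assumption) over all pairs."""
    values = [pairs[frozenset(p)] for p in combinations(samples, 2)]
    return mean(values), stdev(values)


def lowest_sample(samples, pairs):
    """Return the sample whose mean relatedness to the others is lowest."""
    def sample_mean(s):
        return mean(pairs[frozenset((s, t))] for t in samples if t != s)
    return min(samples, key=sample_mean)


def outlier_series(labels, pairs, n_removals=2):
    """Summaries for all samples, then after each successive outlier removal."""
    samples = list(labels)
    results = [summarize(samples, pairs)]
    for _ in range(n_removals):
        samples.remove(lowest_sample(samples, pairs))
        results.append(summarize(samples, pairs))
    return results


if __name__ == "__main__":
    pairs = build_pairs(LABELS, LOWER)
    for step, (m, sd) in enumerate(outlier_series(LABELS, pairs)):
        print(f"outliers removed: {step}  r-mean: {m:.2f}  SD: {sd:.2f}")
```

Under these assumptions, the rounded r-means for the transcribed matrix agree with the Durban Poison row of Table 3 (0.31, 0.43, 0.58), with Denver 1 and then Garden City 2 dropped as the lowest-mean samples.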

[Figure: per-sample plot on an axis from 0 to 1. Samples are grouped by strain, ordered by reported Sativa percentage from Durban Poison (100) to Purple Kush (0), and labeled by acquisition location.]

Fig. 1

[Figure: Principal Coordinates (PCoA), All Strains. Points are keyed by strain (30 strains, Durban Poison through Purple Kush); the axis labels report 14.29% and 9.56%.]

Fig. 2

[Figure: pairwise relatedness matrices among samples, labeled by acquisition location, for six strains in panels A to F: Durban Poison, Blue Dream, Sour Diesel, Chemdawg, Golden Goat, and Girl Scout Cookies.]

Fig. 3

[Figure: Change in r-mean Genetic Relatedness. y-axis: overall r-mean genetic relatedness value (0.00 to 1.00); x-axis: strain (Flo, OG Kush, Sour Diesel, Blue Dream, Chemdawg, Purple Kush, Golden Goat, Banana Kush, Durban Poison, Bruce Banner, Pineapple Express, Girl Scout Cookies).]

Fig. 4