Genetics: Early Online, published on September 21, 2017 as 10.1534/genetics.117.300310

TITLE:

Insights into DDT resistance from the Drosophila melanogaster Genetic Reference Panel

AUTHORS:

Joshua M. Schmidt1,2,*, Paul Battlay2,*, Rebecca S. Gledhill-Smith2, Robert T. Good2, Chris Lumb2, Alexandre Fournier-Level2, Charles Robin2

1. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Saxony, Germany.

2. School of BioSciences and the Bio21 Institute, University of Melbourne, Melbourne, 3010, Victoria, Australia.

* These authors contributed equally.

Copyright 2017.

Short running title:

DDT resistance in the DGRP

Keywords:

Drosophila Genetic Reference Panel (DGRP), DDT, CG10737, Cyp6w1, triallele

Corresponding author:

Charles Robin, The Bio21 Institute, 30 Flemington Road, Parkville, Melbourne Victoria, 3010. Australia.

Ph: 61 3 8344 2349

Email: [email protected]

ABSTRACT

Insecticide resistance is considered a classic model of microevolution, where a strong selective agent is applied to a large natural population, resulting in a change in frequency of alleles that confer resistance. While many insecticide resistance variants have been characterized at the gene level, they are typically single genes of large effect identified in highly resistant pest species. In contrast, multiple variants have been implicated in DDT resistance in Drosophila melanogaster; however, only the Cyp6g1 locus has previously been shown to be relevant to field populations. Here we use genome-wide association studies to identify DDT-associated polygenes and use selective sweep analyses to assess their adaptive significance. We identify and verify two candidate DDT resistance loci. A largely uncharacterized gene, CG10737, has a function in muscles that ameliorates the effects of DDT, while a putative detoxifying P450, Cyp6w1, shows compelling evidence of positive selection.

INTRODUCTION

Many insights into the genetic basis of adaptation have been gained by genetic analyses of resistance to man-made chemicals, whether that be bacteria to antibiotics (Zhang et al. 2011), weeds to herbicides (Baucom 2016), fungi to fungicides (Schoustra et al. 2005), or insects to insecticides (Crow 1957, McKenzie 1996). Typically, the genetics of such traits are thought to be strongly monogenic and thus distinct from most other traits whose variation is governed by the combined effect of multiple loci of small effect (Lande 1983, Roush and McKenzie 1987, Allen et al. 2010). This contention is central to the debate over the genetics of DDT resistance in D. melanogaster. As it is not considered a pest, this model insect may have never been exposed to the high selection intensities thought necessary for the evolution of major-gene-based resistance (MacNair 1991, McKenzie and Batterham 1994). Despite most genetic investigations supporting a polygenic architecture (Crow 1954, Dapkus and Merrell 1977, King and Sømme 1958, Shepanski et al. 1977), or more precisely weak polygenicity (Dapkus 1992; i.e. at most one or two loci per chromosome arm), a gene of major effect, Cyp6g1, was identified and described as ‘necessary and sufficient’ for DDT resistance (Daborn et al. 2002). Proposed solutions to this apparent contradiction between ‘polygenic’ and ‘monogenic’ schools include delineating a difference between field and laboratory evolved resistance (Ffrench-Constant 2013) or suggesting that Cyp6g1 is only inconsistently associated with resistance (Kuruganti et al. 2007).

In the specific case of the long-term DDT-selected strain 91-R, there is a strong argument for the polygenicity of DDT resistance. 91-R is over 1500 times more resistant to DDT than another lab strain, Canton-S (Strycharz et al. 2013); levels far beyond those attributable to Cyp6g1 alone. Much of the recent work aimed at identifying polygenes associated with DDT resistance has focussed on contrasting these resistant and susceptible lab strains, and has allowed the generation of transcriptomic (Pedra et al. 2004), proteomic (Festucci-Buselli et al. 2005), physiological (Strycharz et al. 2013) and genomic (Steele et al. 2014, Steele et al. 2015) datasets of differentiated factors. These candidates, however, remain to be validated, and their contribution to either the selection response in the lab, or to field resistance, has yet to be quantified. Until now Cyp6g1 has remained the only DDT-resistance locus with molecularly defined alleles that contribute to DDT-related genetic variation observed in field populations.

Previously Schmidt et al. (2010) demonstrated that an allelic series at the Cyp6g1 locus is sweeping through D. melanogaster populations. The resistant Cyp6g1 variant described by Daborn et al. (2002), which harbored a partial Accord transposable element insertion upstream of the Cyp6g1 locus, was subsequently found to be invariably associated with a duplication of the Cyp6g1 locus (Schmidt et al. 2010). It is still unclear to what extent the upregulation of Cyp6g1 observed at this allele is due to the Accord element (Chung et al. 2007) or to the presence of multiple Cyp6g1 copies, or both. However, it is clear that the ancestral Cyp6g1 allele (Cyp6g1-M) is at very low frequencies throughout the world and has largely been replaced by haplotypes carrying the duplication and various transposable element insertions. The Cyp6g1-AA allele (which has two copies of the gene both bearing the Accord element) and the Cyp6g1-BA allele (where one of the duplicated copies contains an additional HMS-Beagle transposable element insertion) both confer resistance to DDT relative to the Cyp6g1-M allele, although they show only subtle phenotypic differences between one another. A fourth allele, Cyp6g1-BP, is at high frequencies in populations in northeastern Australia, and provides a further increase in resistance, in both a set of isochromosomal lines and in a field population (Schmidt et al. 2010). In the latter, the Cyp6g1-BP allele accounts for approximately 16% of the total variation in the DDT resistance phenotype.

The analysis of field variation at the Cyp6g1 locus provides two arguments suggesting that other loci contribute to DDT resistance in the field. Firstly, as Cyp6g1 allelic variation does not account for all the 20-50% heritability expected for insecticide resistance (McKenzie 2000), there is much genetic variation that is not accounted for (20-68%). Secondly, as a succession of Cyp6g1 alleles provides a series of adaptive events at one locus, there has been time and selective pressure for other adaptive events to arise elsewhere in the genome.

DDT affects the nervous system by targeting the voltage-gated sodium channel, resulting in uncontrolled nerve firing. In D. melanogaster, the alpha subunit of the voltage-gated sodium channel is encoded by para, and target-site resistance to DDT has been described in para mutants by Pittendrigh et al. (1997) and Lindsay et al. (2008). However, para variation affecting DDT resistance has not yet been identified in outbred populations of D. melanogaster, nor has it been implicated in the phenotypes of well-studied DDT-resistant laboratory strains (Dapkus and Merrell 1977; Steele et al. 2014).

DDT-induced mortality is typically preceded by the loss of peripheral muscle control and manifests as a ‘knockdown’ phenotype in which the fly lies paralyzed with appendages twitching, or exhibits uncontrolled flight. Here we use the Drosophila Genetic Reference Panel (DGRP; Mackay et al. 2012; Mackay and Huang 2017), a resource for the dissection of quantitative traits, to characterize the genetic architecture of both knockdown and mortality caused by DDT exposure using predominantly field-derived genetic variation from a single population. This population has shown great phenotypic diversity for, and furnished genetic associations with, a diverse range of traits including insecticide resistance (Mackay et al. 2012, Huang, Massouras et al. 2014, Battlay et al. 2016). We perform genome-wide association studies on two DDT-related traits to gain insights into insecticide biology and insecticide-driven selection using a classic insecticide model.

MATERIALS AND METHODS

Fly stocks: DGRP lines were obtained from the Bloomington Stock Center, and were maintained on corn meal media within a 22-25 degree Celsius temperature range. For the DDT assays, larval density was controlled by setting up vials with 10 mated females that were allowed to lay for three days, then transferred to fresh media for a further three days to establish an additional brood to provide enough progeny for replication. At this stage 2-3 replicates per line were established. Adults were allowed to eclose for 2-3 days, and females were collected to food-containing holding vials after brief CO2 anaesthetization. After 2-3 days recovery adult females were assayed, thus the age range was 4-6 days for flies assayed.

DDT assays: Glass scintillation vials were inoculated with 200 µL of acetone/DDT solution at a concentration of 0.5 µg/µL, giving a final contact assay amount of 100 µg of DDT. Cotton wool moistened with 10% sucrose solution was used to stopper the scintillation vials. Assays were set up between 7 and 8 am, roughly corresponding with the onset of the light phase of the diurnal cycle. Time of setup was recorded, and each vial was scored for the presence of flies exhibiting either knockdown or mortality at approximately one-hour intervals. Knockdown is defined here as flies either seen to be permanently in a prone position with jolting movement of legs or wings, or showing a prolonged inability to right themselves from a prone position after tapping of the scintillation vial on the lab bench. Mortality was determined as the point at which all external movement ceased. Data were recorded in data sheets with time points as columns and DGRP line number as rows. Assay vials contained 10 females, and there were at least 3 replicates per DGRP line. A total of 184 DGRP lines were assayed. The total assay time was 30 hours. Phenotypes were scored hourly from 1 to 15 hours, and at 24 and 30 hours.

Estimation of heritability: At least three replicates of ten flies per vial were analyzed per DGRP line, which allowed the calculation of broad-sense heritability (H2). Following the method of Clowers et al. (2010), heritability was estimated from the variance components of a linear model of the form: Phenotype = Population mean + Line effect + error. As these lines are inbred, the components of the genetic variance (e.g. additive or dominance) cannot be separated, thus these estimates are of broad-sense heritability (H2). An ANOVA was performed in SPSS, and the variance components were estimated as the Mean Sum of Squares. Total phenotypic variance was estimated as Genetic Variance + Environmental Variance. H2 was thus estimated as Gv/(Gv+Ev). This was calculated for each discrete time point.
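The calculation H2 = Gv/(Gv+Ev) can be illustrated with a minimal Python sketch for a single time point. This is not the SPSS procedure used here. It assumes a balanced design (equal replicates per line) and uses the textbook method-of-moments estimator for a random line effect, Gv = (MS_between - MS_within)/r, which is an assumption on our part; the function name and the input layout are hypothetical.

```python
def broad_sense_heritability(groups):
    """Estimate H2 = Gv / (Gv + Ev) from a balanced one-way layout.

    groups maps a line identifier to its list of replicate phenotype values.
    """
    k = len(groups)                        # number of inbred lines
    r = len(next(iter(groups.values())))   # replicates per line (assumed equal)
    values = [x for reps in groups.values() for x in reps]
    grand_mean = sum(values) / len(values)
    line_means = {name: sum(reps) / r for name, reps in groups.items()}

    # One-way ANOVA sums of squares and mean squares.
    ss_between = sum(r * (m - grand_mean) ** 2 for m in line_means.values())
    ss_within = sum((x - line_means[name]) ** 2
                    for name, reps in groups.items() for x in reps)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (r - 1))

    # Assumption: method-of-moments variance components, truncated at zero.
    g_var = max((ms_between - ms_within) / r, 0.0)
    e_var = ms_within
    return g_var / (g_var + e_var)


# Hypothetical percent-knockdown values for three lines, three replicates each.
example = {"line_A": [10.0, 20.0, 10.0],
           "line_B": [60.0, 70.0, 50.0],
           "line_C": [90.0, 100.0, 90.0]}
print(round(broad_sense_heritability(example), 3))
```

Repeating the call on the phenotypes scored at each time point would give a heritability value per time point, which is how the peak at four hours and the maximum at 24 hours could be located.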

Genome-wide Association Studies: Phenotypes were submitted to the Mackay Lab DGRP2 website for GWAS (http://dgrp.gnets.ncsu.edu/; Huang, Massouras et al. 2014). To perform our own GWAS for DDT resistance traits, we used the PLINK GWAS program (PLINK v1.9, https://www.cog-genomics.org/plink2; Chang et al. 2015). For genotypes, we initially downloaded the DGRP freeze 1 SNP tables (http://www.hgsc.bcm.tmc.edu/projects/dgrp/freeze1_July_2010/snp_calls/). These tables encoded ‘haploid’ variant calls, with heterozygous positions encoded with the relevant ambiguity codes. To make PLINK-compatible files, we used perl scripts to convert these SNP tables into diploid genotypes. We were surprised by the large number of sites that contained more than two alleles. The initial DGRP freeze 1 GWAS tool removed all such sites, but assuming that many of these represented low-frequency sites or errors, we ranked the variants at each site by frequency and kept the two most frequent ones. The one major difference between our initial results and those of the DGRP freeze 1 GWAS tool was the tri-allelic variant at Cyp6w1.
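The handling of multi-allelic sites described above can be sketched as follows. This is an illustrative Python version, not the perl scripts used here. The function name, the "N" missing-data code, and the choice to recode discarded alleles as missing are all assumptions.

```python
from collections import Counter


def keep_two_most_frequent(calls, missing="N"):
    """Reduce one site to its two most frequent alleles.

    calls is the list of allele calls at a single site, one per line.
    Alleles outside the top two are recoded as missing (an assumption).
    """
    counts = Counter(c for c in calls if c != missing)
    keep = {allele for allele, _ in counts.most_common(2)}
    return [c if c in keep else missing for c in calls]


# Hypothetical tri-allelic site: A (3 lines), C (2 lines), G (1 line), one missing.
site = ["A", "A", "A", "C", "C", "G", "N"]
print(keep_two_most_frequent(site))
```

Under this scheme a genuinely tri-allelic site loses the genotypes of its third allele, which is why such a site has a reduced genotyping rate after filtering.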

However, our analyses here are based on the newer DGRP freeze 2 release. For this we downloaded the dgrp2.bed file (PLINK format) from the DGRP2 website (http://dgrp2.gnets.ncsu.edu/data/website/dgrp2.bed). Thus, these genotypes are identical to those used in the DGRP webtool, except for choices in variant filtering as indicated in the PLINK command line below. It should also be noted that now no heterozygous genotypes are present, and that the two most common variants at tri-allelic sites are also present in this bed file.

We used the following base PLINK command to perform GWAS (linear regression of genotype-phenotype association assuming an additive model of allelic effects) after initially filtering for SNPs with MAF >= 0.05 and at least 70% genotyping rate (given that a tri-allelic site with equal proportions of all three genotypes would have a rate of ~66% in the DGRP freeze 2 data).

plink --allow-extra-chr --allow-no-sex --bfile dgrp2 --covar dgrp2_ESTRAT_PCA20.txt --linear --map3 --no-fid --no-parents --no-pheno --no-sex --aperm 5 1000000

The ‘aperm’ option was used to generate empirical estimates of the p-value for each SNP. To calculate the family-wise error rate (FWER) for a given p-value or level of significance, we generated 5,000 random permutations of the 4hrkd and 24hrm phenotype data and performed the same linear association model as for the observed phenotypes. For each permutation we recorded the lowest observed p-value. The FWER for each SNP is then the proportion of random permutations that generated a p-value lower than or equal to that observed, i.e. the chance of at least one false positive at this level of significance. After filtering, 1,776,058 SNPs were tested for each PLINK GWAS.
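The permutation logic behind the FWER can be illustrated with a small Python sketch. It is a simplification, not the PLINK-based procedure used here. It assumes complete genotype data and no covariates, and it replaces the minimum p-value with the maximum absolute genotype-phenotype correlation; for a single-predictor linear regression on a fixed sample size, a larger |r| always gives a smaller p-value, so the two rankings agree. All function names are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def fwer_by_permutation(genotypes, phenotype, n_perm=1000, seed=0):
    """Per-SNP FWER from the permutation distribution of the best statistic.

    genotypes is a list of SNPs, each a list of allele counts per line.
    """
    rng = random.Random(seed)
    observed = [abs_corr(g, phenotype) for g in genotypes]
    shuffled = list(phenotype)
    null_best = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        null_best.append(max(abs_corr(g, shuffled) for g in genotypes))
    # Fraction of permutations whose best SNP matches or beats each observed SNP.
    return [sum(best >= obs for best in null_best) / n_perm for obs in observed]


# Hypothetical data: ten lines, two SNPs, the first tracking the phenotype.
snps = [[0, 0, 0, 0, 0, 2, 2, 2, 2, 2],
        [0, 2, 0, 2, 0, 2, 0, 2, 0, 2]]
trait = [1, 2, 1, 2, 1, 9, 10, 9, 10, 9]
print(fwer_by_permutation(snps, trait, n_perm=500))
```

Because each permutation contributes only its single best statistic across all SNPs, the resulting threshold accounts for the number of tests and for LD among them without an explicit multiple-testing correction.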

We performed Principal Components Analysis on the DGRP data using PLINK after pre-filtering genotypes for minor allele frequency (>0.05), missing rate (<0.70) and linkage disequilibrium (LD) pruning (r2 < 0.2) using the --indep-pairwise command, with both window and step size of 500 variants. To control for confounding cryptic relatedness in the DGRP, we used the first 20 Principal Components as covariates in all GWAS, as indicated above with the ‘--covar dgrp2_ESTRAT_PCA20.txt’ option in the PLINK command. As inversions and Wolbachia infection status can also influence the phenotypes of the DGRP lines, we used the phenotypes adjusted for these factors output by the DGRP2 website. Thus, our results for the DGRP freeze 1 and DGRP freeze 2 differ because we now correct for population structure, adjust phenotypes for inversion and Wolbachia infection status, and use DGRP freeze 2 genotypes, from which heterozygous genotypes have been removed. Finally, for annotating variants we used the annotations made available on the DGRP2 website (http://dgrp2.gnets.ncsu.edu/data/website/dgrp.fb557.annot.txt).

To test specific variants at Cyp6g1 we used PCR primers described in Schmidt et al. (2010) and/or local alignments of Illumina and 454 reads to determine the Cyp6g1 genotypes. For Cyp6w1, our DGRP freeze 1 GWAS indicated the presence of a triallelic site at 2R:6174944[V6]. We used the re-processed DGRP genotype data from the Drosophila Genome Nexus (Lack et al. 2015) files to extract genotype data for this position (described further under the iHS and nSL tests methods), as the genotypes for the third variant state are removed from the DGRP freeze 2 data. For both these genes, the sites of interest are triallelic in the DGRP. We generated files with the ancestral allele and derived alleles, both singly and combined after collapsing to the same allelic code, to test for associations with DDT phenotypes. In the particular case of Cyp6g1 we introduced a variable site to code Cyp6g1-M, Cyp6g1-AA, or Cyp6g1-BA alleles.

iHS and nSL tests of recent positive selection: We downloaded the haploid FASTA alignments for 205 DGRP2 genomes, described by Lack et al. (2015), that are contained in the Drosophila Genome Nexus (DGN; http://johnpool.net/genomes.html). We applied the supplied masking filter for identity by descent (IBD) regions, but not the masks for admixture. For ease of downstream processing we converted these FASTA alignments into VCF files using custom bash and perl scripts. Note that these scripts kept the genotype information at triallelic sites.

After IBD masking and the pre-filtering of excessively heterozygous regions in the DGN data (using the scripts available from http://johnpool.net/genomes.html), the genomes varied drastically in the proportion of sites with missing data. We excluded chromosomes with more than 20% missing data, leaving us with 134 lines for chromosome 3L. For other chromosomes, we randomly sampled lines without replacement to n=134. In the compilation of VCF files, we excluded sites with greater than 15% missing genotypes in the DGRP2. Missing data nevertheless remained after these two steps, still rendering the files unsuitable for haplotype tests. We filled in missing genotypes by randomly sampling from called genotypes, in proportion to their population frequency. In the main this procedure will reduce local linkage disequilibrium, and thus is conservative vis-à-vis LD-based tests of selection, and is similar to the method employed by Garud et al. (2015).
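The frequency-proportional filling of missing genotypes can be sketched in a few lines of Python. This is an illustration only and is not the custom scripts used here; the function name and the "N" missing-data code are assumptions. Drawing uniformly from the list of called alleles at a site is equivalent to sampling each allele in proportion to its frequency among called genotypes.

```python
import random


def fill_missing(site_calls, missing="N", rng=None):
    """Replace missing calls at one site by sampling from the called alleles.

    Sites with no called genotype at all are returned unchanged.
    """
    rng = rng or random.Random(0)
    called = [c for c in site_calls if c != missing]
    if not called:
        return list(site_calls)
    return [c if c != missing else rng.choice(called) for c in site_calls]


# Hypothetical site across five haplotypes, one of them missing.
print(fill_missing(["A", "A", "N", "A", "T"]))
```

Each site is filled independently of its neighbours, so the imputed alleles carry no information about the local haplotype. That independence is the reason the procedure tends to lower, rather than inflate, local LD.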

We used two tests of selection that both utilize signatures of LD around a beneficial allele. This LD can be measured by the extended haplotype homozygosity (EHH) statistic (Sabeti et al. 2002). EHH summarises the probability that two chromosomes bearing the beneficial allele are also identical by state at a nearby neutral variant. Increased LD will thus also increase the EHH statistic. The integrated haplotype score (iHS; Voight et al. 2006) extends EHH by calculating the area under the EHH curve (iHH, and thus integrated) for both the derived and ancestral variant, and then finding the ratio iHH_derived/iHH_ancestral. If LD for both the derived and ancestral allele is approximately the same this ratio should be close to 1, whereas large departures could indicate the action of recent positive selection. As LD is also influenced by the age of mutations, iHS scores are standardised in bins of derived allele frequency, as under a neutral model frequency is indicative of allele age (on average). nSL (number of segregating sites by length; Ferrer-Admetlla et al. 2014) is similar in spirit to iHS, but instead of EHH the base statistic is the average count of SNPs for which two haplotypes are identical. We use it here because this statistic may be less biased towards regions of low recombination, and may have an increased ability to detect soft sweeps (i.e. the beneficial allele is segregating on more than one distinct haplotype at the time of selection). To calculate both iHS and nSL we used the versions implemented in selscan with default parameters for each (Szpiech and Hernandez 2014). We kept bi-allelic SNPs with 5% < DAF < 95%. The ancestral state was estimated as the homologous Drosophila simulans reference genome allele (aligned to D. melanogaster, also downloaded from http://johnpool.net/genomes.html). Both statistics were normalised in 1% frequency bins per chromosome, and a p-value per site was calculated from the empirical distribution of normalised scores (|iHS|, |nSL|) per chromosome. Genetic map positions were estimated using the recombination maps generated by Comeron et al. (2012).
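To make the EHH and iHS definitions above concrete, the following Python sketch computes EHH, the integrated iHH, and an unstandardised log ratio for a toy set of haplotypes. It is a teaching illustration and not the selscan implementation: it omits the EHH truncation cutoff, the gap handling, and the frequency-bin normalisation, and it integrates over whatever positions are supplied. Haplotypes are assumed to be strings of "0" (ancestral) and "1" (derived), and all function names are hypothetical.

```python
from itertools import combinations
from math import log


def ehh(haplotypes, core, site):
    """Fraction of haplotype pairs identical at every site from core to site."""
    lo, hi = min(core, site), max(core, site)
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    same = sum(a[lo:hi + 1] == b[lo:hi + 1] for a, b in pairs)
    return same / len(pairs)


def ihh(haplotypes, core, positions):
    """Area under the EHH curve by the trapezoid rule over the given positions."""
    curve = [ehh(haplotypes, core, s) for s in range(len(positions))]
    return sum((positions[i + 1] - positions[i]) * (curve[i] + curve[i + 1]) / 2
               for i in range(len(positions) - 1))


def unstandardized_ihs(haplotypes, core, positions, derived="1"):
    """ln(iHH_derived / iHH_ancestral) at the core site, before normalisation."""
    der = [h for h in haplotypes if h[core] == derived]
    anc = [h for h in haplotypes if h[core] != derived]
    return log(ihh(der, core, positions) / ihh(anc, core, positions))


# Toy example: the two derived haplotypes are identical, the ancestral ones are not.
haps = ["010", "010", "000", "101"]
print(unstandardized_ihs(haps, core=1, positions=[0, 10, 20]))
```

In the toy example the derived haplotypes stay identical across the whole window while the ancestral ones diverge at both flanking sites, so iHH is larger on the derived background and the log ratio is positive, which is the pattern expected around a recently selected derived allele.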

For the triallelic sites at Cyp6g1 and Cyp6w1, we manually constructed haplotype files containing the set of one of the derived haplotypes and the ancestral haplotypes. These sets are a subset of those for biallelic sites, and p-values for each of these were calculated as above. For the Cyp6w1_GLY allele, the extended haplotype homozygosity (EHH) curve does not decay below 0.05 within the default window size, a reflection of both increased haplotype structure and a region of increased heterozygosity downstream of Cyp6w1 which is filtered out in most DGRP2 individuals and introduces a gap >200 kb. To compute an iHS value for this variant we relaxed the default parameters, so that the max gap is 250 kb. Manual inspection of the EHH curves (figure 4) suggests that the high iHS statistic for this variant is due to haplotype structure and not this gap in the estimated haplotypes.

CG10737 RNAi: Line CG10737-KK (VDRC ID 106383), which has a UAS followed by a hairpin sequence matching CG10737 with no recorded off-targets, was obtained from the VDRC. It was crossed to Mef2-GAL4 (Bloomington ID 27390). Line y, w[1118]; P(attP,y+,w[3’]) (VDRC ID 60100) was used as a control and also crossed to the Mef2-GAL4 driver. The offspring of these crosses were evaluated for the DDT resistance phenotypes, 3-hour knockdown and 24-hour mortality. The screens for each cross were performed over a range of DDT concentrations using replicates of 10 females aged 4 to 6 days. For every DDT concentration, a minimum of 80 females were screened for 3-hour knockdown and a minimum of 50 females were screened for 24-hour mortality. The relationship between the 3-hour knockdown resistance phenotypes and DDT concentration was evaluated using XLSTAT to perform a Probit analysis, a log-normal regression of the knockdown data (Sakuma 1998). Dosage mortality curves were constructed for the relevant crosses and their controls and the ED50, the dose that will theoretically knock down 50% of the population, was estimated. To confirm that the gene had indeed been knocked down, total RNA was extracted from 20-30 adult females using the TRIsure reagent. The extracted RNA was treated with RNase-free DNase (NEB) and residual genomic DNA contamination was assessed for each sample by attempting to amplify a PCR product from the RNA. The samples were converted to cDNA using 1 µg of RNA in a 20 µl reaction with Anchored Oligo(dT)20 and MuLV Reverse Transcriptase (NEB) according to the manufacturer’s instructions.

Cyp6w1 transgenic overexpression: UAS-Cyp6w1 constructs were designed using the y; cn bw sp; reference strain Cyp6w1 coding sequence as a template. Constructs containing each state of the Cyp6w1 triallele identified in the 24hrm GWAS were created by mutating the VAL370 codon in the reference sequence to ALA370 and GLY370. All three constructs were manufactured by Biomatik USA (Delaware, USA), and were supplied cloned into plasmid pUASTattB.

The UAS-Cyp6w1 constructs were transformed into the y1 M{vas-int.Dm}ZH-2A w*; (EPS) M{3xP3-RFP.attP}ZH-86Fb recipient strain, which has a defined integration site on Chromosome 3, using the attP–attB system and ΦC31 integrase (Bischof et al. 2007). To generate heterozygous Cyp6w1/tubulin-GAL4 or serrate flies, virgin female homozygous UAS-Cyp6w1 flies were collected under light CO2, stored at 22-25 degrees Celsius for four days to ensure virginity, and then 10 females per vial were crossed to tubulin-GAL4/serrate males. At the same time vials with either 10 UAS-Cyp6w1 or tubulin-GAL4/serrate females were established. Adults for all these fly lines were allowed to eclose for 2-3 days, and females were collected to food-containing holding vials after brief CO2 anesthesia.

Each line was tested on a minimum of 5 doses of DDT. A minimum of 5 replicates of 20 individuals per dose per line were treated in the bio-assay. Glass scintillation vials were inoculated with 200 µL of acetone/DDT solution with concentrations ranging from 5×10^-5 to 1.0 µg/µL, giving a final contact assay range of 0.01 to 200 µg of DDT. Cotton wool moistened with 10% sucrose solution was used to stopper the scintillation vials. Flies were assayed at 25 degrees Celsius for 24 hours. Data for each tested line were individually analyzed using PriProbit (Sakuma 1998) to estimate the LC50 and generate data for plotting dosage mortality curves and 95 per cent confidence intervals.
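The idea behind a probit estimate of the LC50 (or ED50) can be illustrated with a short Python sketch. This is a deliberate simplification and not what PriProbit or XLSTAT compute: those programs fit the probit model by maximum likelihood and report confidence intervals, whereas the sketch below fits an ordinary least-squares line to probit-transformed proportions against log10 dose and skips doses with 0% or 100% response. The function name and the example data are hypothetical.

```python
from math import log10
from statistics import NormalDist


def lc50_probit_ls(doses, n_tested, n_affected):
    """Approximate LC50 from a least-squares line through probit(p) vs log10(dose)."""
    xs, ys = [], []
    for dose, n, k in zip(doses, n_tested, n_affected):
        p = k / n
        if 0 < p < 1:  # probit is undefined at exactly 0 or 1
            xs.append(log10(dose))
            ys.append(NormalDist().inv_cdf(p))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # probit(0.5) = 0, so the LC50 is the dose where the fitted line crosses zero.
    return 10 ** (-intercept / slope)


# Hypothetical assay: dose in µg per vial, 100 flies per dose.
print(lc50_probit_ls([1, 10, 100], [100, 100, 100], [16, 50, 84]))
```

The log10 transformation of dose is what makes this a log-normal dose-response model. A steeper fitted slope indicates a more uniform response among individuals, while a shift of the line along the dose axis changes the LC50.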

91-R and 91-C genome examination: .bam files containing alignments of 91-R and 91-C sequencing reads to the y; cn bw sp; reference genome were obtained from the sequence read archive (Leinonen et al. 2010; SRR1237973; SRR1237974). Alignments at relevant loci were visualized using IGV software (Thorvaldsdóttir, Robinson & Mesirov 2013) to determine presence of SNPs and discordant reads consistent with structural variation.

Raw phenotypic data are available in the Supplementary material file S1.

RESULTS

Genome-wide association studies: 184 lines of the Drosophila Genetic Reference Panel (DGRP) were exposed to a dose of 100 µg of DDT and monitored over 30 hours. Two phenotypes were recorded: percentage knocked down and percentage dead. The median of the 50% knockdown time for the DGRP lines was 15 hours, while the median of the 50% mortality time was 18 hours.

We chose to focus on 24-hour mortality (24hrm), as it is a standard assay in the D. melanogaster insecticide resistance literature (Daborn et al. 2001, Schmidt et al. 2010). We also observed minimal zero-dose control mortality and maximal broad-sense heritability (H2 = 0.8) at this time point. Of our knockdown time points, we chose four-hour knockdown (4hrkd) for further analysis, as the H2 of our assay reached an early peak at this time point and it thus provides a robust yet contrasting dataset to the 24-hour mortality (fig. 1A; file S1). Based on the number of lines phenotyped and genotyped (n=179), the number of replicates (r=3) and the broad-sense heritability of the trait (H2 = 0.8), the method of Mackay and Huang (2017) was used to estimate the power to detect variants for a range of effect sizes. Variants with an effect of 12% of H2 (~0.1 units of the trait's standard deviation) would be detected with 50% chance and variants with an effect of 22% of H2 (~0.18 units of the trait's standard deviation) with 95% chance (File S4).

The 4hrkd phenotype may reveal the intrinsic variation in insecticide uptake routes and nervous system sensitivities, while the later 24hrm phenotype may encompass variation in the physiological response to the insecticide such as inducible detoxification mechanisms. 4hrkd and 24hrm show distinctly different distributions within the DGRP. The population mean for 4hrkd is 21 per cent with standard deviation of 19 per cent, while 24hrm has a mean mortality of 58 per cent, with standard deviation 29 per cent. The genetic correlation between the two traits was estimated to be 0.42 (fig. 1B).

[Figure 1. Panel A: 4hrkd (percent) against 4hrkd rank, and 24hrm (percent) against 24hrm rank. Panel B: 24hrm rank against 4hrkd rank.]

Figure 1: DDT Knockdown and Mortality. (A) The mean percentage four-hour knockdown and 24-hour mortality for each DGRP line (based on at least three replicates of 10 individuals each) with lines ordered from lowest to highest for each trait. (B) The scatterplot shows the correlation between 24-hour mortality and four-hour knockdown. Flies that are knocked down early can recover and exhibit late mortality.

We performed GWAS on our four-hour knockdown and 24-hour mortality phenotypes using both the DGRP2 webtool (Huang, Massouras et al. 2014) and a custom implementation using PLINK (version 1.9). We did not observe a significant (ANOVA p > 0.05) effect on DDT-induced knockdown or mortality of Wolbachia infection (knockdown: r2 = 0.013 and mortality: r2 = 0.010), inversions with a frequency greater than 5% (knockdown: r2 range = -0.145 to 0.176 and mortality: r2 range = -0.264 to 0.136) or genome size (knockdown: r2 = 0.005 and mortality: r2 = 0.004). All four GWAS detected variants with p-values less than the arbitrary genome-wide significance threshold (1×10^-5; file S2; Mackay et al. 2012). None of the variants associated with 24hrm were also associated with the 4hrkd phenotype. Only 24hrm with the PLINK pipeline showed a variant with family-wise error rate (FWER) < 0.05, in a cytochrome P450 gene, Cyp6w1. Using a less conservative cutoff of p < 1×10^-6, 4hrkd showed 8 variants. Of note, one of these is in CG10737, a gene that has previously been implicated in DDT resistance.

411

412 CG10737: Two SNPs separated by 30 nucleotides occur in the intron of CG10737, and are

413 highly associated with 4hrkd in both PLINK and DGRP2 GWAS; they are in complete LD

414 with each other and have slightly different p-values of association because of missing data

415 (2R:19169399[V6], 2R:19169429[V6]; file S2). CG10737 is predicted to encode a protein

416 with C1 and C2 domains that are capable of diacylglycerol binding and calcium dependent

417 targeting respectively, and is therefore postulated to be involved in intracellular signal

418 transduction (Attrill et al. 2015; Mitchell et al. 2014). A previous study highlighted that

419 CG10737 is one of five ‘lipid metabolism’ genes shown to be differentially expressed among

420 two DDT-resistant lines and a susceptible lab line. Specifically, CG10737 was expressed at

421 significantly lower levels in the DDT-resistant Wisc-1 line than the susceptible Canton-S line

422 (Pedra et al. 2004). This led us to hypothesize that a reduction of CG10737 transcript would

423 lead to increased resistance to DDT-knockdown. To test this hypothesis, we crossed a VDRC

424 UAS-CG10737-KK line to knock down this gene using a Mef2-GAL4 driver, which targets the

425 muscles. The justification for this driver is four-fold: (i) according to the modENCODE

426 datasets CG10737 is highly expressed in the muscle associated tissues (crop, hindgut, heart,

427 carcass and male accessory gland) and is considered muscle-enriched in two microarray

428 analyses (Zhan et al. 2007, Weake et al. 2008), (ii) genes that are co-expressed with

429 CG10737 are known muscle genes (Myosin heavy chain, myosin light chain 1 and 2,

430 troponin and myosuppressins), (iii) chromatin co-immunoprecipitation experiments show that

431 MEF2 binds to the CG10737 cis regulatory region (Sandmann et al. 2006, Cunha et al. 2010)

432 and (iv) Bonn (2010) demonstrated that antibodies to CG10737 stained less intensely in the

433 somatic and visceral mesoderm when in a mef2 null background.

434

435 The progeny of the Mef2-GAL4 x UAS-CG10737-KK cross were assayed for DDT

436 knockdown resistance over a range of doses. The dose of DDT required to knock down 50%

437 of flies (ED50) was more than twice as high for UAS-CG10737-KK lines as it was for the

438 control lines (109 ug/vial versus <47ug/vial; fig. 2). These findings support the hypothesis

439 that CG10737 transcript reduction increases resistance to DDT knockdown.
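
The ED50 comparison above can be made concrete with a small sketch. The Python below is an illustration only and not the authors' analysis (the source does not state how the ED50 values were fitted): it estimates an ED50 by linear interpolation on log10(dose) between the two tested doses that bracket 50% knockdown, and computes the fold difference implied by the reported values. The function names are hypothetical. Only 109 and 47 ug/vial come from the text, and because the control value is reported as an upper bound (<47 ug/vial), the ratio is a lower bound.

```python
import math


def ed50_by_interpolation(doses, fraction_knocked_down):
    """Estimate the dose giving 50% knockdown.

    Uses linear interpolation on log10(dose) between the two adjacent
    tested doses whose responses bracket 0.5. Returns None when the
    tested doses do not bracket a 50% response.
    """
    pairs = sorted(zip(doses, fraction_knocked_down))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(pairs, pairs[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_ed50 = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ed50
    return None


def fold_difference(ed50_test, ed50_control):
    """Ratio of two ED50 values (test line over control line)."""
    return ed50_test / ed50_control


if __name__ == "__main__":
    # Values reported in the text; the control is an upper bound (<47),
    # so the true fold difference is at least this large.
    print(round(fold_difference(109.0, 47.0), 2))
```

With the reported values the ratio is a little over 2.3, which agrees with the statement that the ED50 was more than twice as high in the knockdown cross.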

440

[Figure 2 plot. y-axis: ED50 (ug/mL), 0 to 125. x-axis: CG10737-KK × w1118, 60100 × mef2-GAL4, CG10737-KK × mef2-GAL4.]

441

442 Figure 2: RNAi knockdown confirms that lowering CG10737 transcript abundance increases

443 DDT resistance, and that this effect can be mediated by manipulating muscle expression.

444 Flies in which CG10737 is knocked down in muscles using the mef2-GAL4 driver crossed to

445 the VDRC CG10737-KK line are significantly more resistant to DDT than control flies which

446 are the result of substituting the relevant genetic background line for either mef2-GAL4

447 (w1118) or the CG10737-KK line (60100) in the cross.

448

449 Cyp6w1: This gene is the standout candidate as judged by significance of association with

450 variation in 24-hour mortality. The associated SNP (2R:6174944[V6]) results in a coding

451 sequence change in an enzyme belonging to a family capable of detoxification and which is

452 most abundantly expressed in the fly fat body, a known detoxification tissue. It is also

453 expressed in the spermatheca, heart, head and carcass (Chintapalli et al. 2007), and Pedra et

454 al. (2004) found it to be overexpressed in the DDT-resistant 91-R line relative to the Canton-

455 S susceptible line. The variant site associated with DDT 24hrm in Cyp6w1 is particularly

456 interesting as three different allelic states at this site are observed within the DGRP, and each

457 state encodes a different amino acid. Such triallelic sites, which represent 2.5% of variable

458 sites in the DGRP, get excluded by many analyses (either explicitly, by excluding sites with

459 more than two states, or implicitly, because the most abundant two states combined are not

460 enough to pass abundance thresholds). We surmise that this latter factor is the reason we do

461 not recover this variant using the Mackay DGRP2 webtool (Huang, Massouras et al. 2014;

462 http://dgrp.gnets.ncsu.edu/), as less than 75 per cent of lines have a called genotype for this

463 position. In contrast, we used a lower missing genotype threshold of 60 per cent in our

464 PLINK GWAS pipeline. We initially identified 2R:6174944[V6] as a bi-allelic T/C transition

465 resulting in a Val370Ala substitution. By investigating this site in the Drosophila genome

466 DGRP2 data, we found a third state, G, that is at a frequency of 14% in the DGRP, and

467 results in a Val370Gly substitution. Combining the effect of the C and G alleles, the p-value

468 for association with DDT mortality is 7.88×10-7. Individually the G allele is not significant

469 genome-wide (p=0.015), but taken together these results suggest that it too could influence

470 DDT resistance.
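
The three allelic states can be checked against the standard genetic code. The sketch below assumes the codon sequences given for site 370 elsewhere in this manuscript (Valine GTG, Alanine GCG, Glycine GGG) and that the variable base is the second position of the codon. The dictionary holds only the three codons needed, and all names are illustrative.

```python
# Subset of the standard genetic code covering the three observed codons.
CODON_TO_AMINO_ACID = {"GTG": "Val", "GCG": "Ala", "GGG": "Gly"}

# Val370 is taken as the reference state for this illustration.
REFERENCE_CODON = "GTG"


def codon_with_second_base(codon, base):
    """Return the codon with its second nucleotide replaced by `base`."""
    return codon[0] + base + codon[2]


def amino_acid_states(alleles=("T", "C", "G")):
    """Map each allele at the second codon position to its amino acid."""
    return {
        allele: CODON_TO_AMINO_ACID[codon_with_second_base(REFERENCE_CODON, allele)]
        for allele in alleles
    }


if __name__ == "__main__":
    for allele, amino_acid in amino_acid_states().items():
        print(allele, codon_with_second_base(REFERENCE_CODON, allele), amino_acid)
```

Each of the three nucleotides at this position gives a different residue, which is why a filter that keeps only the two most common states discards information at this site.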

471

472 To test what effect the three amino acid states at site 370 of CYP6W1 have on DDT

473 resistance, three transgenic lines were generated such that they differed only at that single site.

474 The three lines were crossed to a GAL4 driver line under the control of a tubulin promoter

475 sequence, so that the UAS-Cyp6w1 isoforms were expressed in a ubiquitous manner. All

476 three of the Cyp6w1 transgenic lines were at least 12 times more resistant to DDT than their

477 intra-cross sibs when under the control of the tubulin-GAL4 driver (fig. 3; LD50 [the dose

478 predicted to result in 50% mortality] for Cyp6w1 overexpression lines: Cyp6w1_VAL =

479 14ug/vial, Cyp6w1_GLY = 12ug/vial and Cyp6w1_ALA = 28ug/vial). This indicates that

480 overexpression of Cyp6w1 in itself can result in DDT resistance. However, an additional

481 effect is seen for the Ala370 amino acid substitution, as flies overexpressing this isoform are

482 at least two times more resistant than either Val370 or Gly370 counterparts (fig. 3). Thus, the

483 Ala370 substitution appears to increase DDT resistance, supporting the results of our 24-hour

484 mortality GWAS.
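
The "at least two times" comparison follows directly from the LD50 values reported above. A minimal arithmetic sketch, using only those three reported values (ug/vial); the dictionary and function names are illustrative:

```python
# LD50 values (ug/vial) reported in the text for the overexpression lines.
LD50_UG_PER_VIAL = {
    "Cyp6w1_VAL": 14.0,
    "Cyp6w1_GLY": 12.0,
    "Cyp6w1_ALA": 28.0,
}


def resistance_ratio(ld50_by_line, focal, reference):
    """LD50 of the focal line divided by LD50 of the reference line."""
    return ld50_by_line[focal] / ld50_by_line[reference]


if __name__ == "__main__":
    for reference in ("Cyp6w1_VAL", "Cyp6w1_GLY"):
        ratio = resistance_ratio(LD50_UG_PER_VIAL, "Cyp6w1_ALA", reference)
        print(f"Cyp6w1_ALA vs {reference}: {ratio:.2f}-fold")
```

The Ala370 line has exactly twice the LD50 of the Val370 line and a somewhat larger ratio relative to the Gly370 line.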

485

[Figure 3 plot. y-axis: LD50 (ug/mL), 0 to 40. x-axis groups: UAS-Cyp6w1 parent, intra-cross control, UAS-Cyp6w1 × Tub-GAL4. Series: Cyp6w1_VAL, Cyp6w1_GLY, Cyp6w1_ALA. Bars carry the letters a, b and c.]

486

487 Figure 3: Toxicology of transgenic lines overexpressing the three allelic forms of Cyp6w1

488 (far right) and various controls. All three allelic forms of Cyp6w1 confer DDT resistance

489 relative to controls, and Cyp6w1_ALA confers a further significant increase in DDT

490 resistance relative to the other two alleles.

491

492 As our transgenic analyses showed that Cyp6w1 overexpression can yield resistance and

493 Cyp6w1 transcripts are more abundant in the DDT-resistant 91-R line in Pedra et al. (2004), we examined Cyp6w1

494 transcription in females from the DGRP transcriptome dataset (Huang et al. 2015). However,

495 there was no strong correlation between Cyp6w1 transcription level and DDT mortality or

496 knockdown data, nor between Cyp6w1 transcription level and the triallelic states of Cyp6w1.

497 Similarly, we did not find any of the other eight detoxification or four lipid metabolism

498 genes that were focused on by Pedra et al. (2004) to have strong correlations between the

499 DGRP transcription data sets and either of the DDT traits we examined (file S3).

500

501 Selective sweep tests at DDT-associated loci: To assess whether 4hrkd and 24hrm-

502 associated (p<1×10-5) variants from the PLINK GWAS exhibited evidence of being the

503 targets of selection we used the integrated Haplotype Score (iHS) test (Voight et al. 2006,

504 Sabeti et al. 2002). We also calculated the related statistic nSL (number of segregating sites

505 by length; Ferrer-Admetlla et al. 2014), which defines haplotype lengths via pairwise

506 differences and may have higher power to detect soft selective sweeps. The only derived

507 variant yielding a significant iHS or nSL score was the Glycine allele of Cyp6w1 (|iHS| =

508 3.82, p-value = 0.00085; fig. 4; |nSL| = 3.04, p-value = 0.0028). The other derived variants

509 did not show significant departures from expectation, including the second derived allele at

510 Cyp6w1 – the alanine allele that confers increased DDT resistance in our transgenic

511 experiments.

512

513 To ascertain the power of these iHS and nSL approaches with this dataset we also examined

514 Cyp6g1, the genomic region around which Catania et al. (2004), Schlenke and Begun (2004)

515 and Garud et al. (2015) noted a reduction in genetic diversity indicative of a selective sweep.

516 While there is an allelic series at this locus (Schmidt et al. 2010), the two predominant alleles

517 in the DGRP are Cyp6g1-AA and Cyp6g1-BA, and the EHH for both of these alleles and for

518 the ancestral Cyp6g1-M allele were determined. Clearly both derived Cyp6g1 alleles exhibit

519 large extended haplotypes. Both of these are significant compared to the ancestral Cyp6g1-M

520 allele (fig. 4; Cyp6g1-AA: |iHS| = 2.45, p-value = 0.018; |nSL| = 2.61, p-value = 0.011;

521 Cyp6g1-BA: |iHS| = 1.96, p-value = 0.027; |nSL| = 2.47, p-value = 0.012).

[Figure 4 plots. Panel A: EHH against genetic distance from focal variant (cM, -0.05 to 0.15) for Cyp6w1-Val, Cyp6w1-Ala and Cyp6w1-Gly. Panel B: EHH against genetic distance from focal variant (cM, -0.2 to 0.3) for Cyp6g1-M, Cyp6g1-AA and Cyp6g1-BA.]

522

523 Figure 4: Extended Haplotype Homozygosity (EHH) plots, (A) around the Cyp6w1 locus

524 using the triallelic second site of codon 370 as the focal variant, and (B) around the Cyp6g1

525 locus using transposable element insertion sites as the focal variant.
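
Extended Haplotype Homozygosity, as plotted in Figure 4, is commonly defined (following Sabeti et al. 2002) as the probability that two randomly chosen chromosomes carrying the same core allele are identical across the whole interval out to a given distance. The sketch below implements that definition under stated assumptions: phased haplotypes are encoded as equal-length strings, one character per site. The function name, the encoding and the toy data are hypothetical, and this is not the software used in the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, left, right):
    """EHH among haplotypes carrying `core_allele` at `core_index`.

    The interval is the inclusive site range [left, right], which should
    contain the core site. Returns None if fewer than two haplotypes
    carry the core allele.
    """
    carriers = [h[left:right + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None
    # Count pairs of identical haplotypes within each distinct class.
    class_sizes = Counter(carriers).values()
    return sum(comb(size, 2) for size in class_sizes) / comb(n, 2)


if __name__ == "__main__":
    toy = ["AAT", "AAT", "ACT", "GAG"]  # hypothetical phased haplotypes
    print(ehh(toy, core_index=2, core_allele="T", left=2, right=2))
    print(ehh(toy, core_index=2, core_allele="T", left=0, right=2))
```

EHH starts at 1 at the core site and decays as the interval widens. A slow decay for a derived allele relative to the ancestral allele is the pattern that iHS summarizes.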

526

527 Further evidence that insecticide resistance alleles may be adaptive comes from elevated

528 levels of population differentiation (Taylor et al. 1995). We examined Cyp6w1 allele

529 frequencies in African, European, Australian and North American populations (Pool et al.

530 2012, Martins et al. 2014, Bergman and Haddrill 2015, Reinhardt et al. 2014; table 1); most

531 populations indicate segregation of each of the triallelic states, except for Portugal, which

532 does not exhibit Ala370. The frequency of the variants, however, shows marked population

533 differentiation, perhaps indicating geographical variation in selection intensity by DDT or

534 other insecticides.

535

536 Table 1: Population frequencies of CYP6W1 codon 370 tri-alleles

                                                                           Allele Frequency
Continent       Population       Data Source                 Sample Size   VAL    ALA    GLY

Africa          Combined         Pool et al. 2012            139           0.9    0.07   0.3
Australia       Queensland       Reinhardt et al. 2014       17            0.53   0.03   0.46
Australia       Tasmania         Reinhardt et al. 2014       15            1      0      0
Europe          France - A       Haddrill and Bergman 2012   50            0.78   0.02   0.2
Europe          France - B       Pool et al. 2012            8             0.5    0      0.5
Europe          Portugal         Martins et al. 2014         12            0.95   0      0.05
North America   Maine            Reinhardt et al. 2014       16            0.69   0.25   0.06
North America   North Carolina   Mackay et al. 2012          162           0.54   0.32   0.14
North America   Florida          Reinhardt et al. 2014       16            0.63   0.31   0.06

537
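
One simple way to put a number on the differentiation visible in Table 1 is Nei's G_ST for a pair of populations at a multi-allelic site. The sketch below is an illustration and not an analysis reported in the source: it weights the two populations equally, ignores sample size and sampling error, and uses the Tasmania and North Carolina rows of Table 1 as example input. Function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_a, pop_b):
    """Unweighted two-population G_ST = (H_T - H_S) / H_T.

    H_S is the mean within-population heterozygosity and H_T is the
    heterozygosity of the averaged allele frequencies.
    """
    h_s = (expected_heterozygosity(pop_a) + expected_heterozygosity(pop_b)) / 2
    mean_freqs = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    # (VAL, ALA, GLY) frequencies taken from Table 1.
    tasmania = (1.0, 0.0, 0.0)
    north_carolina = (0.54, 0.32, 0.14)
    print(round(nei_gst(tasmania, north_carolina), 3))
```

With samples as small as several of those in Table 1, a value from this sketch should be read as descriptive only.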

538

539 DISCUSSION

540 The Drosophila genetic reference panel (DGRP) is a powerful resource for identifying natural

541 variation contributing to a range of phenotypes (Mackay and Huang 2017). However, it is of

542 particular interest to the study of insecticide resistance due to the important role insecticides

543 have apparently played in the evolutionary history of this population; two of the strongest

544 peaks of selection in the DGRP, genome wide, are Ace and Cyp6g1 (Garud et al. 2015).

545

546 Ace is the molecular target of organophosphate and carbamate insecticides, and within the

547 DGRP segregate three of the four substitutions in Ace known to confer resistance to these

548 insecticide classes in D. melanogaster (Menozzi et al. 2004, Battlay et al. 2016). Cyp6g1 has

549 likewise been demonstrated to provide resistance to insecticides, including the

550 and nitenpyram (Daborn et al. 2001, Daborn et al. 2007, Joussen et al. 2008)

551 and the organophosphate azinphos-methyl (Battlay et al. 2016). There is also evidence that

552 Cyp6g1 may contribute to resistance in other including ,

553 , and (Kikkawa 1961, Pyke 2004). But the best studied of Cyp6g1’s

554 insecticide substrates remains DDT (Daborn et al. 2002, Daborn et al. 2007, Joussen et al.

555 2008, Schmidt et al. 2010); derived alleles of Cyp6g1 have previously been shown to

556 contribute significantly to DDT resistance, with the Cyp6g1-BP allele explaining ~16% of the

557 total variance in DDT resistance in a field population from Queensland, Australia (Schmidt et

558 al. 2010).

559

560 While DDT is not used in much of the world today, DDT resistance is a trait of interest for

561 two important reasons: Firstly, it may help to explain the strong and recent sweeps at the

562 Cyp6g1 locus (Schmidt et al. 2010, Kolaczkowski et al. 2011, Garud et al. 2015), and

563 secondly it was the subject of the classic work by Crow (1956), Sokal and Hunter (1954) and

564 others into microevolution. We performed GWAS of two DDT resistance phenotypes in

565 hopes of dissecting the quantitative nature of a selective response to DDT exposure.

566

567 Cyp6g1 was not identified in the GWAS: The association of Cyp6g1 allelic variation with

568 either the 4hrkd or 24hrm DDT phenotypes is not significant, even at the genome-wide

569 significance threshold. This is partly attributable to the Cyp6g1 sweep itself; resistant

570 Cyp6g1-AA and Cyp6g1-BA alleles are present in all but nine DGRP lines (all nine were

571 phenotyped in this study), while the Cyp6g1-BP allele, which confers the most drastic

572 increase in DDT resistance (Schmidt et al. 2010), is absent from the population. Although

573 much of the power to detect the effect of Cyp6g1 resistance is gone from the DGRP, Battlay

574 et al. (2016) have previously shown that a haplotype in linkage disequilibrium with the

575 ancestral Cyp6g1-M allele was strongly associated with susceptibility to a low dose of the

576 organophosphate insecticide azinphos-methyl in DGRP larvae. One reason Cyp6g1 was

577 associated with the azinphos-methyl phenotype and not our DDT phenotypes could be the

578 relative contributions of Cyp6g1 overexpression to these traits: Daborn et al. (2007)

579 demonstrated a 3-fold increase in adult DDT LD50 when Cyp6g1 was overexpressed using the

580 GAL4-UAS system, while Battlay et al. (2016) observed a 6.5-fold increase in larval

581 azinphos-methyl LD50 in the same crosses. It is also possible that the dose of the DDT used in

582 this study did not maximize the chances of recovering the Cyp6g1 variation present in the

583 DGRP. Steele et al. (2014) also reported an absence of Cyp6g1 signal in their comparison of

584 91-R and 91-C genome sequences. This is due, however, to Steele et al. (2014) limiting their

585 study to analysis of open reading frames. We revisited genome sequence alignments

586 generated by Steele et al. (2014), and found that evidence of the Accord LTR insertion into

587 the 5’ UTR of Cyp6g1 (Daborn et al. 2002) as well as copy number variation in both Cyp6g1

588 and Cyp6g2 (Schmidt et al. 2010), both identifiers of Cyp6g1 resistance alleles, are present in

589 91-R but not 91-C.

590

591 The two detoxification enzymes implicated in DDT mortality, Cyp6g1 and Cyp6w1, could

592 interact in parallel or in series. Epistatic interactions between alleles at the two loci were

593 considered but, unfortunately, we did not have the power to test for such interaction in the

594 DGRP because there is only one DGRP line (Ral-486) that is homozygous for the susceptible

595 Cyp6g1-M allele and homozygous for the resistant Cyp6w1-Ala allele. We have however

596 assessed pairwise epistatic interactions of the GWAS nominally associated variants listed in

597 the combined mortality and knockdown lists, and none were sufficiently extreme to pass the

598 Bonferroni correction threshold (~0.000015) for either the knockdown (lowest value: 0.0002)

599 or the mortality (lowest value: 0.0015) traits.
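
The Bonferroni threshold quoted above follows from dividing the family-wise alpha by the number of tests. The sketch below assumes alpha = 0.05 and that every unordered pair of variants is tested once. The count of 82 variants is a hypothetical value chosen only because it gives a threshold near the reported ~0.000015; the source does not state how many variants were in the combined lists.

```python
def number_of_pairs(n_variants):
    """Number of unordered pairs among n_variants."""
    return n_variants * (n_variants - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    n_variants = 82  # hypothetical; not stated in the source
    n_tests = number_of_pairs(n_variants)
    threshold = bonferroni_threshold(0.05, n_tests)
    print(n_tests, threshold)
    # Lowest interaction p-values reported in the text.
    for trait, p_value in (("knockdown", 0.0002), ("mortality", 0.0015)):
        print(trait, p_value < threshold)
```

Under these assumptions neither reported minimum p-value passes the threshold, which matches the conclusion in the text.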

600

601 Variation at CG10737 contributes to four-hour knockdown but not 24-hour mortality:

602 The GWAS presented here demonstrate that the two DDT-associated traits under

603 examination have distinct genetic architectures, and for each trait we have generated

604 transgenic evidence to support the involvement of a top candidate in DDT phenotype

605 variation. CG10737 has previously been implicated as a DDT resistance candidate, being one

606 of 158 genes shown to be significantly differentially regulated between DDT-resistant 91-R

607 and a standard susceptible lab stock (Canton-S; Pedra et al. 2004). Moreover, inspection of

608 the 91-R and 91-C genome sequences generated by Steele et al. (2014) shows that both

609 CG10737 GWAS SNPs, associated with increased knockdown susceptibility, are present in

610 91-C but absent in 91-R. Using a completely independent approach we have found that these

611 non-coding variants in CG10737, present in ~10% of the DGRP, are strongly associated with

612 4hrkd. Our RNAi knockdown of CG10737 increased DDT resistance, consistent with Pedra

613 et al.’s (2004) observation that CG10737 transcript was less abundant in the resistant 91-R

614 line. We note that the significantly associated GWAS variants also lie in a region that

615 modENCODE found to bind the Trithorax-like (Trl) GAGA factor, suggesting an untested

616 mechanism linking the naturally occurring polymorphisms to transcriptional abundance.

617

618 An important insight gained through these experiments is that they demonstrate a role of

619 CG10737 in the muscles of flies, and a relationship between DDT and muscle biology.

620 CG10737 knockdown in the muscles resulted in a two to four-fold increase in resistance,

621 demonstrating that DDT perturbation can be ameliorated by genetic manipulation limited to

622 the muscles.

623

624 The molecular function of CG10737 has not been well characterized. It is a gene with 14

625 coding exons and produces at least 12 alternatively spliced mRNA isoforms. It putatively

626 encodes a transmembrane protein with C1 and C2 calcium/lipid binding domains and it has

627 been annotated as ‘protein kinase c-like’ although it lacks a catalytic domain. C1 domains

628 have cysteine motifs that form zinc fingers and typically bind diacylglycerol (Colón-

629 González and Kazanietz 2006) while C2 domains typically bind calcium ions and are

630 associated with protein-protein and protein-membrane interactions (Ochoa 2001). Given the

631 important role that calcium plays in synapse and muscle signaling perhaps a decrease in the

632 expression level of CG10737 makes flies less sensitive to Ca2+ ion signaling. It thereby may

633 reduce the expression of the DDT-knockdown phenotype by mediating the effect of over-

634 stimulation of muscle cells caused by the uncontrolled firing of nerves due to DDT’s effect

635 on the voltage gated sodium channel, PARA, rather than influencing nerve functionality

636 directly.

637

638 While CG10737 is associated with DDT-knockdown in the GWAS it was not significantly

639 associated with 24-hour mortality. Furthermore, a re-examination of the DDT-

640 knockdown phenotype for DGRP lines over the initial time course of the experiment found

641 that variants in this gene were only strongly associated with knockdown over a narrow time

642 window (3-5 hours), suggesting that the gene’s role in knockdown resistance is transient.

643 Additionally, in the RNAi experiments the dose that effectively interrogated the temporally

644 defined knockdown was so high that the flies did not live long enough to accurately score 24-

645 hour mortality. This is attributable to genetic background differences between the lines going

646 into the RNAi crosses and the DGRP lines. These issues highlight the differences in genetic

647 architecture of subtly different DDT-related traits. While the knockdown trait may give us

648 insight into the perturbations of DDT onto insect physiology, and the function of previously

649 uncharacterized genes, one has to ask how relevant this variation is to the field exposure to

650 DDT. We were unable to detect any signature of selection around variants in this gene, or

651 indeed any high frequency derived knockdown-associated variants, that would indicate

652 microevolutionary change in the recent history of the Raleigh population of D. melanogaster

653 – the source of the DGRP. However, this may reflect an issue of power to detect subtle

654 changes in frequency of pre-existing variants rather than an irrelevance of the phenotype to

655 survivorship in the field. We also note that in early DDT selection experiments in D.

656 melanogaster by Hunter and Sokal (1954) pupation height was correlated with DDT

657 resistance and that Riedl et al. (2007) mapped a pupal height QTL to a narrow region

658 encompassing CG10737.

659

660 Triallelic variation at Cyp6w1 separates DDT-resistant allele from a selective sweep

661 signal: Like CG10737, Cyp6w1 was previously identified as a DDT resistance candidate by

662 Pedra et al. (2004), who found it had high transcriptional abundance in 91-R relative to the

663 susceptible Canton-S. Again, inspection of the Steele et al. (2014) genomes revealed that 91-

664 R also carries the DDT resistance-associated Cyp6w1_ALA allele, whereas 91-C carries the

665 Cyp6w1_GLY allele. While our transgenic manipulation of this gene shows that increased

666 expression of three different alleles of Cyp6w1 can indeed lead to increased resistance to

667 DDT, it is amino acid variants that appear to explain the difference in resistance between

668 DGRP lines. The associated variant is a rare triallelic site where each state encodes a

669 different amino acid (Valine GTG, Alanine GCG or Glycine GGG). No sites are in linkage

670 disequilibrium with it that are also significantly associated with 24-hour mortality, and the

671 sharp and immediate decrease in Extended Haplotype Homozygosity for both the Val370 and

672 Ala370 variants makes it highly unlikely that this variant is a proxy for an undetected, linked

673 regulatory polymorphism. Amino acid site 370 of CYP6W1 is predicted to occur in

674 Substrate Recognition Site 5 (SRS5; Gotoh 1992, Zawaira et al. 2011) of an enzyme in a

675 gene family frequently associated with insecticide resistance. It is therefore completely

676 credible that it acts to metabolize DDT, perhaps in a similar way to its paralog CYP6G1

677 (Joussen et al. 2008, Hoi et al. 2014).

678

679 Inspection of CYP6W1 site 370 in related species suggests that Val370 is the ancestral state of

680 this site; the Ala370 and Gly370 substitutions are therefore both derived. The Ala370 variant

681 is strongly associated (p=5×10-10) with reduced 24-hour mortality in our GWAS, and this is

682 supported by our transgenic work, which found overexpression of Cyp6w1_ALA increased

683 DDT resistance relative to Cyp6w1_VAL and Cyp6w1_GLY. Calculation of iHS and nSL did

684 identify a significant signal of selection at Cyp6w1, but it was associated with the other

685 derived allele, Gly370. This allele is not significantly associated with 24-hour mortality in

686 our GWAS, and transgenic overexpression of Cyp6w1_GLY does not confer resistance to DDT

687 relative to Cyp6w1_VAL. There is no evidence, therefore, that DDT is responsible for the

688 selective footprint identified at Cyp6w1, and we propose that the Gly370 mutation increases

689 fitness under a different selective pressure, possibly by increasing the affinity of CYP6W1 for

690 another insecticide. The extreme population differentiation observed (table 1) is also

691 consistent with multiple selective agents acting on this site.

692

693 GWAS-associated variants in 91-R and 91-C: Interestingly, resistance-associated variants

694 in CG10737 and Cyp6w1 from our GWAS were present in 91-R but not 91-C. To see if any

695 of our other DGRP resistance candidates were differentiated between 91-R and 91-C, we

696 manually searched the genome sequences of these lines generated by Steele et al. (2014) for

697 each of our GWAS candidates (p<1×10-5; file S2). 50 distinct GWAS candidate variants

698 varied between 91-R and 91-C. Of these, 17 (including the two SNPs in CG10737 and the

699 Cyp6w1 triallele) varied in the expected direction, with the ‘resistant’ allele in 91-R and the

700 ‘susceptible’ allele in 91-C (table 2). Thus, these variants may have been segregating in the

701 population used to found 91-R and 91-C and potentially contributed to the strong selection

702 response observed in 91-R.

703

721 Table 2: GWAS resistance variants enriched in 91-R but not 91-C

Phenotype   Pipeline       Location [V6]   p-value              Gene      Site class       91-C   91-R

24hrm       PLINK          2R:6,174,944    1.72E-09             Cyp6w1    non-synonymous   T      C
4hrkd       DGRP2, PLINK   2R:19,169,429   4.28E-07; 7.58E-07   CG10737   intron           T      G
4hrkd       DGRP2          2L:14,188,350   1.64E-06             smi35A    intron           G      G/C
4hrkd       PLINK          3L:4,059,748    1.75E-06             -         intergenic       T      C
4hrkd       PLINK          2L:5,600,056    1.97E-06             -         intergenic       C      A/C
4hrkd       PLINK          3L:12,404,415   3.00E-06             CG32103   synonymous       C      C/T
4hrkd       DGRP2          3R:30,968,896   3.70E-06             -         intergenic       A      ATA
4hrkd       DGRP2, PLINK   2R:19,169,399   5.45E-06; 7.62E-06   CG10737   intron           T      C
24hrm       PLINK          2L:10,546,377   5.68E-06             Trim9     intron           A      A/G
4hrkd       DGRP2          3L:7,102,962    6.79E-06             form3     intron           C      C/T
4hrkd       PLINK          3L:6,935,976    7.88E-06             -         intergenic       A      A/G
24hrm       PLINK          3L:15,854,279   8.76E-06             pHCl      intron           G      A
4hrkd       DGRP2          2L:8,232,517    9.49E-06             Pvr       intron           C/A    C
4hrkd       DGRP2          2L:4,086,206    9.57E-06             ed        intron           T      G
4hrkd       DGRP2          2L:19,231,491   1.66E-05             -         intergenic       A      G
4hrkd       DGRP2, PLINK   3L:7,495,781    2.02E-05; 7.38E-06   Cyp4d8    intron           G      T
4hrkd       DGRP2          3R:25,716,964   2.15E-05             LpR2      intron           A      C

722

723

724 Other DDT resistance genes: Three substitutions in Cyp6a2 increasing its DDT

725 metabolism capacity have been described in RalDDTR (Amichot et al. 2004), a lab strain

726 selected for DDT resistance over 50 generations (Cuany et al. 1990). The first two

727 substitutions (R335S and L336V) are absent from the DGRP, as well as population samples

728 from Africa (DPGP; Pool et al. 2012), France and Portugal in Europe, Florida and Maine in

729 the USA, and Queensland and Tasmania in Australia (Bergman and Haddrill 2015; Pool et al.

730 2012; Martins et al. 2014; Reinhardt et al. 2014). The third substitution, V476L, is found in

731 around a quarter of DGRP and DPGP sequences (24.1% of DGRP, 26.5% of screened

732 DGRP, 32.2% of DPGP), but this variant does not affect DDT metabolism by itself (Amichot

733 et al. 2004), and was not associated with either 4hrkd or 24hrm at p<1×10-5. Similarly, the

734 resistance-linked Cyp6a2 promoter variant described by Wan et al. (2013) is present in the

735 DGRP (68.8% of DGRP, 68.6% of screened DGRP), but was not associated with either of

736 our phenotypes at p<1×10-5.

737

738 Another P450, Cyp12d1, is overexpressed in the DDT-resistant Wisc-1 strain relative to

739 Canton-S (Pedra et al. 2004), and has been shown to increase DDT resistance when

740 overexpressed transgenically (Daborn et al. 2007), as well as reducing resistance when

741 knocked down with RNAi (Gellatly et al. 2015). We did not find associations with Cyp12d1

742 in any of our four GWAS, nor did we find strong correlations between our phenotypes and

743 Cyp12d1 copy number variation or transcription level (file S3). Steele et al. (2014) likewise

744 found no differentiation in Cyp12d1 coding regions between 91-R and 91-C, and our

745 inspection of these genomes found no differentiation in copy number between the two strains.

746

747 Cyp12d1 has been shown to be inducible by DDT (Brandt et al. 2002, Festucci-Buselli et al.

748 2005, Willoughby et al. 2006), as well as other xenobiotics (Willoughby et al. 2006, Misra et

749 al. 2011). Our results do not rule out the involvement of this gene in DDT resistance in the

750 DGRP, but show that genomic variation at Cyp12d1, and its transcription level without

751 induction, do not correlate with the DDT-resistance phenotypes measured in this study.

752

753 In the case of para, neither the kdr nor super-kdr resistance mutations are present in the

754 DGRP, nor any of the analogous sites conferring resistance in mutagenized lines

755 characterized by Pittendrigh et al. (1997) or Lindsay et al. (2008).

756

757

758 CONCLUSIONS

759 There are three main lines of evidence used in the literature to associate genes with DDT

760 resistance in D. melanogaster: differentiation between resistant and susceptible laboratory

761 strains, association of natural alleles with resistance in outbred populations, and transgenic

762 manipulation in a controlled genetic background. Until now, only Cyp6g1 has bridged the

763 divides between these approaches. With this study, Cyp6w1 and CG10737 can be added to

764 this list. These results alone support a polygenic model and are reinforced by the possibility

765 that other genes from our GWAS, or other studies, also contribute to DDT resistance but have

766 yet to be verified. We have focused on two specific DDT-related traits; four-hour knockdown

767 and 24-hour mortality, both in adult females with a single dose of DDT, and they give only

768 partially correlated results. Which of these has greater relevance in the field, however, is not

769 clear. Furthermore, there are many alternate parameter values and indeed parameters (e.g.

770 temperature; Fournier-Level et al. 2016) that may make lab-based DDT assays more

771 accurately reflect the variation most relevant to field survivorship. Perhaps one of the most

772 compelling diagnostics for a relevant assay would be if the loci uncovered showed the

773 molecular signs of selection. Previous studies have revealed that the major DDT-resistance

774 locus of some populations, Cyp6g1, shows the hallmarks of a selective sweep. We did not,

775 however, observe signs of selection around our DGRP GWAS candidates. While we did

776 detect signs of a sweep at Cyp6w1 it was associated with a second derived variant, not with

777 the variant associated with DDT resistance. Given that Cyp6g1 is also capable of providing

778 resistance to other insecticides, the failure to find sweeps at DDT-specific resistance loci

779 strengthens the possibility that other insecticides have contributed to driving the Cyp6g1

780 selective sweeps in field populations of Drosophila melanogaster. In the case of DDT

781 resistance alleles that negatively impact fitness, relaxed DDT selection since the insecticide’s

782 deregulation may explain their absence from the population. Finally, it is also a possibility

783 that statistics such as iHS and nSL are insensitive to the slight changes in allele frequency that

784 might occur if a trait is largely polygenic.

785

786

787 ACKNOWLEDGEMENTS

788 This work was supported by ARC Discovery Grant DP0985013. Jason Somers contributed to

789 the cloning and transgenic delivery of the Cyp6w1 constructs. We thank Trudy Mackay for

790 her advice and support for this work, and the anonymous reviewers for suggestions that

791 improved this manuscript.

792

793 LITERATURE CITED

794 Allen, H. L., Estrada, K., Lettre, G., Berndt, S. I., Weedon, M. N., Rivadeneira, F., ... &

795 Ferreira, T. (2010). Hundreds of variants clustered in genomic loci and biological pathways

796 affect human height. Nature, 467(7317), 832-838.

797

798 Amichot, M., Tares, S., Brun‐Barale, A., Arthaud, L., Bride, J. M., & Bergé, J. B. (2004).

799 Point mutations associated with insecticide resistance in the Drosophila cytochrome P450

800 Cyp6a2 enable DDT metabolism. European Journal of Biochemistry, 271(7), 1250-1257.

801

802 Attrill, H., Falls, K., Goodman, J. L., Millburn, G. H., Antonazzo, G., Rey, A. J., ... &

803 FlyBase Consortium. (2015). FlyBase: establishing a Gene Group resource for Drosophila

804 melanogaster. Nucleic acids research, gkv1046.

805

806 Battlay, P., Schmidt, J. M., Fournier-Level, A., & Robin, C. (2016). Genomic and

807 Transcriptomic Associations Identify a New Insecticide Resistance Phenotype for the

808 Selective Sweep at the Cyp6g1 Locus of Drosophila melanogaster. G3: Genes|Genomes|

809 Genetics, 6(8), 2573-2581.

810

811 Baucom, R. S. (2016). The remarkable repeated evolution of herbicide resistance. American

812 journal of botany, 103(2), 181-183.

813

814 Bergman, C. M., & Haddrill, P. R. (2015). Strain-specific and pooled genome sequences for

815 populations of Drosophila melanogaster from three continents. F1000Research, 4.

816

817 Bischof, J., Maeda, R. K., Hediger, M., Karch, F., & Basler, K. (2007). An optimized

818 transgenesis system for Drosophila using germ-line-specific φC31 integrases. Proceedings of

819 the National Academy of Sciences, 104(9), 3312-3317.

820

821 Bonn, B. Myosin heavy chain like (Mhcl) agiert während der Embryonalentwicklung und

822 Myogenese von Drosophila melanogaster in Redundanz zu Zipper, die Funktion des C2-

823 Domänen-Proteins CG10737-P während der Muskelentwicklung bleibt unklar. PhD Thesis,

824 Marburg: Philipps-Universität, 2010

825

826 Brandt, A., Scharf, M., Pedra, J. H. F., Holmes, G., Dean, A., Kreitman, M., & Pittendrigh,

827 B. R. (2002). Differential expression and induction of two Drosophila cytochrome P450

828 genes near the Rst(2)DDT locus. Insect molecular biology, 11(4), 337-341.

829

830 Browning, S. R., & Browning, B. L. (2007). Rapid and accurate haplotype phasing and

831 missing-data inference for whole-genome association studies by use of localized haplotype

832 clustering. The American Journal of Human Genetics, 81(5), 1084-1097.

833

834 Catania, F., Kauer, M. O., Daborn, P. J., Yen, J. L., ffrench‐Constant, R. H., & Schlötterer,

835 C. (2004). World‐wide survey of an Accord insertion and its association with DDT resistance

836 in Drosophila melanogaster. Molecular ecology, 13(8), 2491-2504.

837

838 Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015).

839 Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience,

840 4(1), 7.

841

842 Chintapalli, V. R., Wang, J., & Dow, J. A. (2007). Using FlyAtlas to identify better

843 Drosophila melanogaster models of human disease. Nature genetics, 39(6), 715-720.

844

845 Chung, H., Bogwitz, M. R., McCart, C., Andrianopoulos, A., Batterham, P., & Daborn, P. J.

846 (2007). Cis-regulatory elements in the Accord retrotransposon result in tissue-specific

847 expression of the Drosophila melanogaster insecticide resistance gene Cyp6g1. Genetics,

848 175(3), 1071-1077.

849

850 Clowers, K. J., Lyman, R. F., Mackay, T. F., & Morgan, T. J. (2010). Genetic variation in

851 senescence marker protein-30 is associated with natural variation in cold tolerance in

852 Drosophila. Genetics research, 92(02), 103-113.

853

854 Colón-González, F., & Kazanietz, M. G. (2006). C1 domains exposed: from diacylglycerol

855 binding to protein–protein interactions. Biochimica et Biophysica Acta (BBA)-Molecular and

856 Cell Biology of Lipids, 1761(8), 827-837.

857

858 Comeron, J. M., Ratnappan, R., & Bailin, S. (2012). The many landscapes of recombination

859 in Drosophila melanogaster. PLoS genetics, 8(10), e1002905.

860

861 Crow, J. F. (1954). Analysis of a DDT-resistant strain of Drosophila. Journal of Economic

862 Entomology, 47(3), 393-398.

863

864 Crow, J. F. (1956). Genetics of DDT resistance in Drosophila. In Cytologia Proc. Int. Genet.

865 Symp (pp. 408-409).

866

867 Crow, J. F. (1957). Genetics of insect resistance to chemicals. Annual review of entomology,

868 2(1), 227-246.

869

870 Cuany, A., Pralavorio, M., Pauron, D., Berge, J. B., Fournier, D., Blais, C., ... & Lange, R.

871 (1990). Characterization of microsomal oxidative activities in a wild-type and in a DDT

872 resistant strain of Drosophila melanogaster. Pesticide Biochemistry and Physiology, 37(3),

873 293-302.

874

875 Cunha, P. M., Sandmann, T., Gustafson, E. H., Ciglar, L., Eichenlaub, M. P., & Furlong, E.

876 E. (2010). Combinatorial binding leads to diverse regulatory responses: Lmd is a tissue-

877 specific modulator of Mef2 activity. PLoS Genet, 6(7), e1001014.

878

879 Daborn, P., Boundy, S., Yen, J., & Pittendrigh, B. (2001). DDT resistance in Drosophila

880 correlates with Cyp6g1 over-expression and confers cross-resistance to the

881 neonicotinoid imidacloprid. Molecular Genetics and Genomics, 266(4), 556-563.

882

883 Daborn, P. J., Yen, J. L., Bogwitz, M. R., Le Goff, G., Feil, E., Jeffers, S., ... & Feyereisen,

884 R. (2002). A single P450 allele associated with insecticide resistance in Drosophila. Science,

885 297(5590), 2253-2256.

886

887 Daborn, P. J., Lumb, C., Boey, A., Wong, W., & Batterham, P. (2007). Evaluating the

888 insecticide resistance potential of eight Drosophila melanogaster cytochrome P450 genes by

889 transgenic over-expression. Insect biochemistry and molecular biology, 37(5), 512-519.

890

891 Dapkus, D., & Merrell, D. J. (1977). Chromosomal analysis of DDT-resistance in a long-term

892 selected population of Drosophila melanogaster. Genetics, 87(4), 685-697.

893

894 Dapkus, D. (1992). Genetic localization of DDT resistance in Drosophila melanogaster

895 (Diptera: Drosophilidae). Journal of economic entomology, 85(2), 340-347.

896

897 Ferrer-Admetlla et al. 2014. On detecting incomplete soft or hard selective sweeps using

898 haplotype structure. Mol. Biol. Evol. doi: 10.1093/molbev/msu077.

899

900 Festucci‐Buselli, R. A., Carvalho‐Dias, A. S., Oliveira‐Andrade, D., Caixeta‐Nunes, C., Li,

901 H. M., Stuart, J. J., ... & Pittendrigh, B. R. (2005). Expression of Cyp6g1 and Cyp12d1 in

902 DDT resistant and susceptible strains of Drosophila melanogaster. Insect molecular biology,

903 14(1), 69-77.

904

905 ffrench-Constant, R. H. (2013). The molecular genetics of insecticide resistance. Genetics,

906 194(4), 807.

907

908 Fournier-Level, A., Neumann-Mondlak, A., Good, R. T., Green, L. M., Schmidt, J. M., &

909 Robin, C. (2016). Behavioural response to combined insecticide and temperature stress in

910 natural populations of Drosophila melanogaster. Journal of evolutionary biology, 29(5),

911 1030-1044.

912

913 Garud, N. R., Messer, P. W., Buzbas, E. O., & Petrov, D. A. (2015). Recent selective sweeps

914 in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet,

915 11(2), e1005004.

916

917 Gellatly, K. J., Yoon, K. S., Doherty, J. J., Sun, W., Pittendrigh, B. R., & Clark, J. M. (2015).

918 RNAi validation of resistance genes and their interactions in the highly DDT-resistant 91-R

919 strain of Drosophila melanogaster. Pesticide biochemistry and physiology, 121, 107-115.

920

921 Gotoh, O. (1992). Substrate recognition sites in cytochrome P450 family 2 (CYP2) proteins

922 inferred from comparative analyses of amino acid and coding nucleotide sequences. Journal

923 of Biological Chemistry, 267(1), 83-90.

924

925 Hoi, K. K., Daborn, P. J., Battlay, P., Robin, C., Batterham, P., O’Hair, R. A., & Donald, W.

926 A. (2014). Dissecting the insect metabolic machinery using twin ion mass spectrometry: a

927 single P450 enzyme metabolizing the insecticide imidacloprid in vivo. Analytical chemistry,

928 86(7), 3525-3532.

929

930 Huang, W., Massouras, A., Inoue, Y., Peiffer, J., Ràmia, M., Tarone, A. M., ... & Magwire,

931 M. M. (2014). Natural variation in genome architecture among 205 Drosophila melanogaster

932 Genetic Reference Panel lines. Genome research, 24(7), 1193-1208.

933

934 Huang, W., Carbone, M. A., Magwire, M. M., Peiffer, J. A., Lyman, R. F., Stone, E. A., ... &

935 Mackay, T. F. (2015). Genetic basis of transcriptome diversity in Drosophila melanogaster.

936 Proceedings of the National Academy of Sciences, 112(44), E6010-E6019.

937

938 Joußen, N., Heckel, D. G., Haas, M., Schuphan, I., & Schmidt, B. (2008). Metabolism of

939 imidacloprid and DDT by P450 CYP6G1 expressed in cell cultures of Nicotiana tabacum

940 suggests detoxification of these insecticides in Cyp6g1‐overexpressing strains of Drosophila

941 melanogaster, leading to resistance. Pest management science, 64(1), 65-73.

942

943 Kikkawa, H. (1961). Genetical studies on the resistance to parathion in Drosophila

944 melanogaster. Annu Rep Sci Wks Osaka Univ, 9, 1-20.

945

946 King, J. C., & Sømme, L. (1958). Chromosomal analyses of the genetic factors for resistance

947 to DDT in two resistant lines of Drosophila melanogaster. Genetics, 43(3), 577.

948

949 Kolaczkowski, B., Kern, A. D., Holloway, A. K., & Begun, D. J. (2011). Genomic

950 differentiation between temperate and tropical Australian populations of Drosophila

951 melanogaster. Genetics, 187(1), 245-260.

952

953 Kuruganti, S., Lam, V., Zhou, X., Bennett, G., Pittendrigh, B. R., & Ganguly, R. (2007).

954 High expression of Cyp6g1, a cytochrome P450 gene, does not necessarily confer DDT

955 resistance in Drosophila melanogaster. Gene, 388(1), 43-53.

956

957 Lack, J. B., Cardeno, C. M., Crepeau, M. W., Taylor, W., Corbett-Detig, R. B., Stevens, K.

958 A., ... & Pool, J. E. (2015). The Drosophila genome nexus: a population genomic resource of

959 623 Drosophila melanogaster genomes, including 197 from a single ancestral range

960 population. Genetics, 199(4), 1229-1241.

961

962 Lande, R. (1983). The response to selection on major and minor mutations affecting a

963 metrical trait. Heredity, 50(1), 47-65.

964

965 Leinonen, R., Sugawara, H., & Shumway, M. (2010). The sequence read archive. Nucleic

966 acids research, 39(suppl_1), D19-D21.

967

968 Lindsay, H. A., Baines, R., Lilley, K., Jacobs, H. T., & O'Dell, K. M. (2008). The dominant

969 cold-sensitive Out-cold mutants of Drosophila melanogaster have novel missense mutations

970 in the voltage-gated sodium channel gene paralytic. Genetics, 180(2), 873-884.

971

972 Low, W. Y., Feil, S. C., Ng, H. L., Gorman, M. A., Morton, C. J., Pyke, J., ... & Gooley, P.

973 R. (2010). Recognition and detoxification of the insecticide DDT by Drosophila

974 melanogaster glutathione S-transferase D1. Journal of molecular biology, 399(3), 358-366.

975

976 Mackay, T. F., Richards, S., Stone, E. A., Barbadilla, A., Ayroles, J. F., Zhu, D., ... &

977 Richardson, M. F. (2012). The Drosophila melanogaster genetic reference panel. Nature,

978 482(7384), 173-178.

979 Mackay, T. F. C., & Huang, W. (2017). Charting the genotype-phenotype map: lessons from

980 the Drosophila melanogaster Genetic Reference Panel. WIREs Dev. Biol. e289 doi:

981 10.1002/wdev.289.

982

983 Macnair, M. R. (1991). Why the evolution of resistance to anthropogenic toxins normally

984 involves major gene changes: the limits to natural selection. Genetica, 84(3), 213-219.

985

986 Martins, N. E., Faria, V. G., Nolte, V., Schlötterer, C., Teixeira, L., Sucena, É., & Magalhães,

987 S. (2014). Host adaptation to viruses relies on few genes with different cross-resistance

988 properties. Proceedings of the National Academy of Sciences, 111(16), 5938-5943.

989

990 McKenzie, J. A., & Batterham, P. (1994). The genetic, molecular and phenotypic

991 consequences of selection for insecticide resistance. Trends in Ecology & Evolution, 9(5),

992 166-169.

993

994 McKenzie, J. A. (1996). Ecological and evolutionary aspects of insecticide resistance. RG

995 Landes.

996

997 McKenzie, J. A. (2000). The character or the variation: the genetic analysis of the insecticide-

998 resistance phenotype. Bulletin of entomological research, 90(01), 3-7.

999

1000 Menozzi, P., Shi, M. A., Lougarre, A., Tang, Z. H., & Fournier, D. (2004). Mutations of

1001 acetylcholinesterase which confer insecticide resistance in Drosophila melanogaster

1002 populations. BMC evolutionary biology, 4(1), 1.

1003

1004 Misra, J. R., Horner, M. A., Lam, G., & Thummel, C. S. (2011). Transcriptional regulation of

1005 xenobiotic detoxification in Drosophila. Genes & development, 25(17), 1796-1806.

1006

1007 Mitchell, A., Chang, H. Y., Daugherty, L., Fraser, M., Hunter, S., Lopez, R., ... & Sangrador-

1008 Vegas, A. (2014). The InterPro protein families database: the classification resource after 15

1009 years. Nucleic acids research, gku1243.

1010

1011 Mueller, J. C., & Andreoli, C. (2004). Plotting haplotype-specific linkage disequilibrium

1012 patterns by extended haplotype homozygosity. Bioinformatics, 20(5), 786-787.

1013

1014 Ochoa, W. F., Garcia-Garcia, J., Fita, I., Corbalan-Garcia, S., Verdaguer, N., & Gomez-

1015 Fernandez, J. C. (2001). Structure of the C2 domain from novel protein kinase Cϵ. A

1016 membrane binding model for Ca2+-independent C2 domains. Journal of molecular biology,

1017 311(4), 837-849.

1018

1019 Pedra, J. H. F., McIntyre, L. M., Scharf, M. E., & Pittendrigh, B. R. (2004). Genome-wide

1020 transcription profile of field- and laboratory-selected dichlorodiphenyltrichloroethane (DDT)-

1021 resistant Drosophila. Proceedings of the National Academy of Sciences of the United States of

1022 America, 101(18), 7034-7039.

1023

1024 Pittendrigh, B., Reenan, R., & Ganetzky, B. (1997). Point mutations in the Drosophila

1025 sodium channel gene para associated with resistance to DDT and pyrethroid insecticides.

1026 Molecular and General Genetics MGG, 256(6), 602-610.

1027

1028 Pool, J. E., Corbett-Detig, R. B., Sugino, R. P., Stevens, K. A., Cardeno, C. M., Crepeau, M.

1029 W., ... & Langley, C. H. (2012). Population genomics of sub-Saharan Drosophila

1030 melanogaster: African diversity and non-African admixture. PLoS Genet, 8(12), e1003080.

1031

1032 Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., ... & Sham,

1033 P. C. (2007). PLINK: a tool set for whole-genome association and population-based linkage

1034 analyses. The American Journal of Human Genetics, 81(3), 559-575.

1035

1036 Pyke, F. M., Bogwitz, M. R., Perry, T., Monk, A., Batterham, P., & McKenzie, J. A. (2004).

1037 The genetic basis of resistance to diazinon in natural populations of Drosophila melanogaster.

1038 Genetica, 121(1), 13-24.

1039

1040 Reinhardt, J. A., Kolaczkowski, B., Jones, C. D., Begun, D. J., & Kern, A. D. (2014). Parallel

1041 geographic variation in Drosophila melanogaster. Genetics, 197(1), 361-373.

1042

1043 Riedl, C. A., Riedl, M., Mackay, T. F., & Sokolowski, M. B. (2007). Genetic and behavioral

1044 analysis of natural variation in Drosophila melanogaster pupation position. Fly, 1(1), 23-32.

1045

1046 Roush, R. T., & McKenzie, J. A. (1987). Ecological genetics of insecticide and acaricide

1047 resistance. Annual review of entomology, 32(1), 361-380.

1048

1049 Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., ... &

1050 Ackerman, H. C. (2002). Detecting recent positive selection in the human genome from

1051 haplotype structure. Nature, 419(6909), 832-837.

1052

1053 Sakuma, M. (1998). Probit analysis of preference data. Applied entomology and zoology,

1054 33(3), 339-347.

1055

1056 Sandmann, T., Jensen, L. J., Jakobsen, J. S., Karzynski, M. M., Eichenlaub, M. P., Bork, P.,

1057 & Furlong, E. E. (2006). A temporal map of transcription factor activity: mef2 directly

1058 regulates target genes at all stages of muscle development. Developmental cell, 10(6), 797-

1059 807.

1060

1061 Schlenke, T. A., & Begun, D. J. (2004). Strong selective sweep associated with a transposon

1062 insertion in Drosophila simulans. Proceedings of the National Academy of Sciences of the

1063 United States of America, 101(6), 1626-1631.

1064

1065 Schmidt, J. M., Good, R. T., Appleton, B., Sherrard, J., Raymant, G. C., Bogwitz, M. R., ... &

1066 Robin, C. (2010). Copy number variation and transposable elements feature in recent,

1067 ongoing adaptation at the Cyp6g1 locus. PLoS Genet, 6(6), e1000998.

1068

1069 Schoustra, S. E., Slakhorst, M., Debets, A. J., & Hoekstra, R. F. (2005). Comparing artificial

1070 and natural selection in rate of adaptation to genetic stress in Aspergillus nidulans. Journal of

1071 evolutionary biology, 18(4), 771-778.

1072

1073 Shepanski, M. C., Glover, J., & Kuhr, R. J. (1977). Resistance of Drosophila melanogaster to

1074 DDT. Journal of economic entomology, 70(5), 539-543.

1075

1076 Sokal, R. R., & Hunter, P. E. (1954). Reciprocal selection for correlated quantitative

1077 characters in Drosophila. Science, 119(3097), 649-651.

1078

1079 Szpiech, Z. A., & Hernandez, R. D. (2014). selscan: an efficient multithreaded program to

1080 perform EHH-based scans for positive selection. Molecular Biology and Evolution, 31,

1081 2824-2827.

1082

1083 Steele, L. D., Muir, W. M., Seong, K. M., Valero, M. C., Rangesa, M., Sun, W., ... &

1084 Pittendrigh, B. R. (2014). Genome-wide sequencing and an open reading frame analysis of

1085 dichlorodiphenyltrichloroethane (DDT) susceptible (91-C) and resistant (91-R) Drosophila

1086 melanogaster laboratory populations. PloS one, 9(6), e98584.

1087

1088 Steele, L. D., Coates, B., Valero, M. C., Sun, W., Seong, K. M., Muir, W. M., ... &

1089 Pittendrigh, B. R. (2015). Selective sweep analysis in the genomes of the 91-R and 91-C

1090 Drosophila melanogaster strains reveals few of the ‘usual suspects’ in

1091 dichlorodiphenyltrichloroethane (DDT) resistance. PloS one, 10(3), e0123066.

1092

1093 Strycharz, J. P., Lao, A., Li, H., Qiu, X., Lee, S. H., Sun, W., ... & Clark, J. M. (2013).

1094 Resistance in the highly DDT-resistant 91-R strain of Drosophila melanogaster involves

1095 decreased penetration, increased metabolism, and direct excretion. Pesticide biochemistry

1096 and physiology, 107(2), 207-217.

1097

1098 Tang, A. H., & Tu, C. P. D. (1995). Pentobarbital-induced changes in Drosophila glutathione

1099 S-transferase D21 mRNA stability. Journal of Biological Chemistry, 270(23), 13819-13825.

1100

1101 Taylor, M. F. J., Shen, Y., & Kreitman, M. E. (1995). A population genetic test of selection at

1102 the molecular level. Science, 270(5241), 1497.

1103

1104 Thorvaldsdóttir, H., Robinson, J. T., & Mesirov, J. P. (2013). Integrative Genomics Viewer

1105 (IGV): high-performance genomics data visualization and exploration. Briefings in

1106 bioinformatics, 14(2), 178-192.

1107

1108 Voight, B. F., Kudaravalli, S., Wen, X., & Pritchard, J. K. (2006). A map of recent positive

1109 selection in the human genome. PLoS Biol, 4(3), e72.

1110

1111 Wan, H., Liu, Y., Li, M., Zhu, S., Li, X., Pittendrigh, B. R., & Qiu, X. (2014). Nrf2/Maf‐

1112 binding‐site‐containing functional Cyp6a2 allele is associated with DDT resistance in

1113 Drosophila melanogaster. Pest management science, 70(7), 1048-1058.

1114

1115 Wang, K., Li, M., & Hakonarson, H. (2010). ANNOVAR: functional annotation of genetic

1116 variants from high-throughput sequencing data. Nucleic acids research, 38(16), e164-e164.

1117

1118 Weake, V. M., Lee, K. K., Guelman, S., Lin, C. H., Seidel, C., Abmayr, S. M., & Workman,

1119 J. L. (2008). SAGA‐mediated H2B deubiquitination controls the development of neuronal

1120 connectivity in the Drosophila visual system. The EMBO journal, 27(2), 394-405.

1121

1122 Willoughby, L., Chung, H., Lumb, C., Robin, C., Batterham, P., & Daborn, P. J. (2006). A

1123 comparison of Drosophila melanogaster detoxification gene induction responses for six

1124 insecticides, caffeine and phenobarbital. Insect biochemistry and molecular biology, 36(12),

1125 934-942.

1126

1127 Zawaira, A., Ching, L. Y., Coulson, L., Blackburn, J., & Chun Wei, Y. (2011). An expanded,

1128 unified substrate recognition site map for mammalian cytochrome P450s: analysis of

1129 molecular interactions between 15 mammalian CYP450 isoforms and 868 substrates. Current

1130 drug metabolism, 12(7), 684-700.

1131

1132 Zhan, M., Yamaza, H., Sun, Y., Sinclair, J., Li, H., & Zou, S. (2007). Temporal and spatial

1133 transcriptional profiles of aging in Drosophila melanogaster. Genome research, 17(8), 1236-

1134 1243.

1135

1136 Zhang, Q., Lambert, G., Liao, D., Kim, H., Robin, K., Tung, C. K., ... & Austin, R. H.

1137 (2011). Acceleration of emergence of bacterial antibiotic resistance in connected

1138 microenvironments. Science, 333(6050), 1764-1767.

1139

1140

1141 FIGURE LEGENDS

1142 Figure 1: DDT Knockdown and Mortality. (A) The mean percentage four-hour knockdown

1143 and 24-hour mortality for each DGRP line (based on at least three replicates of 10 individuals

1144 each) with lines ordered from lowest to highest for each trait. (B) The scatterplot shows the

1145 correlation between 24-hour mortality and four-hour knockdown. Flies that are knocked

1146 down early can recover and exhibit late mortality.

1147

1148 Figure 2: RNAi knockdown confirms that lowering CG10737 transcript abundance increases

1149 DDT resistance, and that this effect can be mediated by manipulating muscle expression.

1150 Flies in which CG10737 is knocked down in muscles, using the mef2-GAL4 driver crossed to

1151 the VDRC CG10737-KK line, are significantly more resistant to DDT than control flies, which

1152 are produced by substituting the relevant genetic background line for either mef2-GAL4

1153 (w1118) or the CG10737-KK line (60100) in the cross.

1154

1155 Figure 3: Toxicology of transgenic lines overexpressing the three allelic forms of Cyp6w1

1156 (far right) and various controls. All three allelic forms of Cyp6w1 confer DDT resistance

1157 relative to controls, and Cyp6w1_ALA confers a further significant increase in DDT

1158 resistance relative to the other two alleles.

1159

1160 Figure 4: Extended Haplotype Homozygosity (EHH) plots, (A) around the Cyp6w1 locus

1161 using the triallelic second site of codon 370 as the focal variant, and (B) around the Cyp6g1

1162 locus using transposable element insertion sites as the focal variant.
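
As an illustration of the statistic plotted in Figure 4, the following Python sketch computes EHH in the sense of Sabeti et al. (2002): among chromosomes carrying a chosen allele at the focal variant, the proportion of chromosome pairs that are identical across the interval from the focal variant to a given marker. The function name `ehh`, its arguments, and the toy haplotypes are hypothetical and are not taken from the software used in this study. Because the focal allele is passed as a parameter, the same calculation can be applied separately to each state of a triallelic site.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity for one focal allele.

    haplotypes  : list of equal-length strings, one phased chromosome each
                  (hypothetical input format, assumed for this sketch)
    core_index  : position of the focal variant within each string
    core_allele : allele at the focal variant defining the carrier set
    end_index   : marker to which homozygosity is extended (inclusive)
    """
    lo, hi = sorted((core_index, end_index))
    # Keep only chromosomes carrying the focal allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # no pairs of carrier chromosomes to compare
    # Count identical extended haplotypes, then the fraction of identical pairs.
    counts = Counter(carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (invented): five chromosomes, focal variant at position 0.
toy = ["AACG", "AACG", "AACT", "GACG", "AATT"]
for end in range(4):
    print(end, ehh(toy, 0, "A", end))
```

Evaluating the function at successive markers on either side of the focal variant gives the decay curve that an EHH plot displays for each allele.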

1163

1164 File S1: PLINK-formatted phenotypes generated in this study. Knockdown and mortality

1165 were measured hourly at 1-24 hours, and at 30 hours.

1166

1167 File S2: Variants associated with 4hrkd and 24hrm phenotypes using DGRP2 and custom

1168 PLINK GWAS pipelines, p < 1×10⁻⁵. Also includes calls for each variant in 91-R and 91-C

1169 genomes (generated by Steele et al. [2014]).

1170

1171 File S3: Phenotype to transcription level tests performed using data from the DGRP

1172 transcriptome (Huang et al. 2015).

1173

1174 File S4: The power to detect association given the number of lines, the heritability and the

1175 minimum number of replicates per line (Mackay and Huang 2017).

1176

1184 Table 1: Population frequencies of CYP6W1 codon 370 tri-alleles

                                                                          Allele frequency
Continent      Population      Data source                Sample size   VAL    ALA    GLY
Africa         Combined        Pool et al. 2012           139           0.9    0.07   0.3
Australia      Queensland      Reinhardt et al. 2014      17            0.53   0.03   0.46
Australia      Tasmania        Reinhardt et al. 2014      15            1      0      0
Europe         France - A      Haddrill and Bergman 2012  50            0.78   0.02   0.2
Europe         France - B      Pool et al. 2012           8             0.5    0      0.5
Europe         Portugal        Martins et al. 2014        12            0.95   0      0.05
North America  Maine           Reinhardt et al. 2014      16            0.69   0.25   0.06
North America  North Carolina  Mackay et al. 2012         162           0.54   0.32   0.14
North America  Florida         Reinhardt et al. 2014      16            0.63   0.31   0.06

1195 Table 2: GWAS resistance variants enriched in 91-R but not 91-C

Phenotype  Pipeline      Location [V6]  p-value             Gene     Site class      91-C  91-R
24hrm      PLINK         2R:6,174,944   1.72E-09            Cyp6w1   non-synonymous  T     C
4hrkd      DGRP2, PLINK  2R:19,169,429  4.28E-07; 7.58E-07  CG10737  intron          T     G
4hrkd      DGRP2         2L:14,188,350  1.64E-06            smi35A   intron          G     G/C
4hrkd      PLINK         3L:4,059,748   1.75E-06            -        intergenic      T     C
4hrkd      PLINK         2L:5,600,056   1.97E-06            -        intergenic      C     A/C
4hrkd      PLINK         3L:12,404,415  3.00E-06            CG32103  synonymous      C     C/T
4hrkd      DGRP2         3R:30,968,896  3.70E-06            -        intergenic      A     ATA
4hrkd      DGRP2, PLINK  2R:19,169,399  5.45E-06; 7.62E-06  CG10737  intron          T     C
24hrm      PLINK         2L:10,546,377  5.68E-06            Trim9    intron          A     A/G
4hrkd      DGRP2         3L:7,102,962   6.79E-06            form3    intron          C     C/T
4hrkd      PLINK         3L:6,935,976   7.88E-06            -        intergenic      A     A/G
24hrm      PLINK         3L:15,854,279  8.76E-06            pHCl     intron          G     A
4hrkd      DGRP2         2L:8,232,517   9.49E-06            Pvr      intron          C/A   C
4hrkd      DGRP2         2L:4,086,206   9.57E-06            ed       intron          T     G
4hrkd      DGRP2         2L:19,231,491  1.66E-05            -        intergenic      A     G
4hrkd      DGRP2, PLINK  3L:7,495,781   2.02E-05; 7.38E-06  Cyp4d8   intron          G     T
4hrkd      DGRP2         3R:25,716,964  2.15E-05            LpR2     intron          A     C

1196
