1 Supplementary Materials for

2

3 Aishwarya Korgaonkar, Clair Han, Andrew L. Lemire, Igor Siwanowicz, Djawed Bennouna, 4 Rachel Kopec, Peter Andolfatto, Shuji Shigenobu, David L. Stern 5 6 Correspondence to: [email protected] 7 8 9 This PDF file includes: 10 11 Materials and Methods 12 Supplementary Text 13 Figs. S1 to S16 14 Tables S1 to S11 15 16 Other Supplementary Materials for this manuscript include the following: 17 18 Movie S1 19

20

1

21 Materials and Methods

22

23 Imaging of leaves and fundatrices inside developing galls

24

25 Young Hamamelis virginiana (witch hazel) leaves or leaves with early stage galls of

26 Hormaphis cornu were fixed in Phosphate Buffered Saline (PBS: 137 mM NaCl, 2.7 mM KCl,

27 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4) containing 0.1% Triton X-100, 2%

28 paraformaldehyde and 0.5% glutaraldehyde (paraformaldehyde and glutaraldehyde were EM

29 grade from Electron Microscopy Services) at room temperature for two hours without agitation

30 to prevent the disruption of the aphid stylet inserted into leaf tissue. Fixed leaves or galls were

31 washed in PBS containing 0.1% Triton X-100, hand cut into small sections (~10 mm2), and

32 embedded in 7% agarose for subsequent sectioning into 0.3 mm thick sections using a Leica

33 Vibratome (VT1000s). Sectioned plant tissue was stained with Calcofluor White (0.1 mg/mL,

34 Sigma-Aldrich, F3543) and Congo Red (0.25 mg/mL, Sigma-Aldrich, C6767) in PBS

35 containing 0.1% Triton X-100 with 0.5% DMSO and Escin (0.05 mg/mL, Sigma-Aldrich,

36 E1378) at room temperature with gentle agitation for 2 days. Stained sections were washed with

37 PBS containing 0.1% Triton X-100. Soft tissues were digested and cleared to reduce light

38 scattering during subsequent imaging using a mixture of 0.25 mg/mL collagenase/dispase (Roche

39 #10269638001) and 0.25 mg/mL hyaluronidase (Sigma Aldrich #H3884) in PBS containing

40 0.1% Triton X-100 for 5 hours at 37°C. To avoid artifacts and warping caused by osmotic

41 shrinkage of soft tissue and agarose, samples were gradually dehydrated in glycerol (2% to 80%)

42 and then ethanol (20% to 100 %) (Ott 2008) and mounted in methyl salicylate (Sigma-Aldrich,

43 M6752) for imaging. Serial optical sections were obtained at 2 µm intervals on a Zeiss 880

2

44 confocal microscope with a Plan-Apochromat 10x/0.45 NA objective, at 1 µm with a LD-LCI

45 25x/0.8 NA objective or at 0.5 µm with a Plan-Apochromat 40x/0.8 NA objective. Maximum

46 projections of confocal stacks or rotation of images were carried out using FIJI (Schindelin et al.

47 2012).

48

49 Hormaphis cornu genome sequencing, assembly, and annotation

50

51 We collected H. cornu aphids from a single gall for genome sequencing (Supplemental

52 Table 3). All aphids within the gall were presumed to be clonal offspring of a single fundatrix,

53 because all H. cornu galls we have ever examined contain only a single fundatrix and the ostiole

54 of the galls was closed at the time we collected this gall, so there is little chance of inter-gall

55 migration. High molecular weight (HMW) DNA was prepared by gently grinding aphids with a

56 plastic pestle against the inside wall of a 2 mL Eppendorf tube in 1 mL of 0.5% SDS, 200 mM

57 Tris-HCl pH 8, 25 mM EDTA, 250 mM NaCl with 10 uL of 1 mg/mL RNAse A. Sample was

58 incubated at 37°C for 1 hour and then 30uL of 10 mg/mL ProteinaseK was added and the sample

59 was incubated for an additional 1 hr at 50°C with gentle agitation at 300 RPM. 1 mL of

60 Phenol:Chloroform:Isoamyl alcohol (25:24:1) was added and the sample was centrifuged at

61 16,000 RCF for 10 min. The supernatant was removed to a new 2mL Eppendorf tube and the

62 Phenol:Chloroform:Isoamyl alcohol extraction was repeated. The supernatant was removed to a

63 new 2 mL tube and 2.5 X volumes of absolute ethanol were added. The sample was centrifuged

64 at 16,000 RCF for 15 min and then washed with fresh 70% ethanol. All ethanol was removed

65 with a pipette and the sample was air dried for approximately 15 minutes and DNA was

3

66 resuspended in 50 uL TE. This sample was sent to HudsonAlpha Institute for Biotechnology for

67 genome sequencing.

68 DNA quality control, library preparation, and Chromium 10X linked read sequencing

69 were performed by HudsonAlpha Institute for Biotechnology. Most of the mass of the HMW

70 DNA appeared greater than approximately 50 kb on a pulsed field gel and paired end sequencing

71 on an Illumina HiSeq X10 yielded 816M reads. The genome was assembled using Supernova

72 (Weisenfeld et al. 2017) using 175M reads, which generated the best genome N50 of a range of

73 values tested. This 10X genome consisted of 21,072 scaffolds of total length 319.352 MB. The

74 genome scaffold N50 was 839.101 KB and the maximum scaffold length was 3.495 MB.

75 We then contracted with Dovetail Genomics to apply HiC and Chicago to generate larger

76 scaffolds. We submitted HMW gDNA from the same sample used for 10X genome sequencing

77 for Chicago and a separate sample of frozen aphids for HiC (Supplemental Table 3). The

78 Dovetail genome consisted of 11,244 scaffolds of total length 320.34 MB with a scaffold N50 of

79 36.084 Mb. This genome, named hormaphis_cornu_26Sep2017_PoQx8, contains 9 main

80 scaffolds, each longer than 17.934 Mb, which appear to represent the expected 9 chromosomes

81 of H. cornu (Blackman and Eastop 1994). This assembly also includes the circular genome of the

82 bacterial endosymbiont Buchnera aphidicola of 643,259 bp. BUSCO analysis (Simao et al. 2015)

83 using the gVolante web interface (Nishimura et al., 2017) with the Arthropod gene set revealed

84 that the genome contains 1026 of 1066 (96.25%) complete core genes and 1038 of 1066

85 (97.37%) partial core genes.

86 We annotated this genome for protein-coding genes using RNA-seq data collected from

87 salivary glands and carcasses of many stages of the H. cornu life cycle (Supplemental Table 4)

88 using BRAKER (Altschul et al. 1990; Barnett et al. 2011; Camacho et al. 2009; Hoff et al. 2016,

4

89 2019; Li 2013; Lomsadze, Burns, and Borodovsky 2014; Stanke et al. 2006). To increase the

90 efficiency of mapping RNA-seq reads for differential expression analysis, we predicted 3’ UTRs

91 using UTRme (Radío et al. 2018). We found that UTRme sometimes predicted UTRs within

92 introns. We therefore applied a custom R script to remove such predicted UTRs. Later, after

93 discovering the bicycle genes, we manually annotated all predicted bicycle genes, including 5’

94 and 3’ UTRs, in APOLLO (Lewis et al. 2002) by examining evidence from RNAseq reads

95 aligned to the genome with the Integrative Genomics Viewer (Robinson et al. 2011;

96 Thorvaldsdóttir, Robinson, and Mesirov 2013). We found that the start sites of many bicycle

97 genes were incorrectly annotated by BRAKER at a downstream methionine, inadvertently

98 excluding predicted putative signal peptides from these genes. RNAseq evidence often supported

99 transcription start sites that preceded an upstream methionine and these exons were corrected in

100 APOLLO. The combined collection of 18,895 automated and 687 manually curated gene models

101 (19,582 total) were used for all subsequent analyses of H. cornu genomic data.

102

103 Genome-wide association study of aphids inducing red and green galls

104

105 Galls produced by H. cornu were collected in the early summer (Table S5) and dissected

106 by making a single vertical cut down the side of each gall with a razor blade to expose the aphids

107 inside. DNA was extracted using the Zymo ZR-96 Quick gDNA kit from the foundress of each

108 gall, because foundresses do not appear to travel between galls. We performed tagmentation of

109 genomic DNA derived from 47 individuals from red galls and 43 from green galls using

110 barcoded adaptors compatible with the Illumina sequencing platform (Hennig et al. 2018).

111 Tagmented samples were pooled without normalization, PCR amplified for 14 cycles, and

5

112 sequenced on an Illumina NextSeq 500 to generated paired end 150 bp reads to an average depth

113 of 2.9X genomic coverage. The average sequencing depth before filtering was calculated by

114 multiplying the number of read pairs generated by SAMtools flagstat version 1.3 (Li et al. 2009)

115 by the read length of 150bp, then dividing by the total genome size (323,839,449bp).

116 We performed principle component analysis on the genome-wide polymorphism data to

117 detect any potential population structure that might confound GWAS study. Reads were mapped

118 using bwa mem version 0.7.17-r1188 (Li 2013) and joint genotyped using SAMtools mpileup

119 version 1.3, with the flag -ugf, followed by BCFtools call version 1.9 (Li 2011), with the flag -m.

120 Genotype calls were then filtered for quality and missingness using BCFtools filter and view

121 version 1.9, where only SNPs with MAF > 0.05, QUAL > 20, and genotyped in at least 80% of

122 the individuals were kept. To limit the number of SNPs for computational efficiency, the SNPs

123 were additionally thinned using VCFtools –thin version 0.1.15 to exclude any SNPs within

124 1000bp of each other. PCA was performed using the snpgdsPCA function from the R package

125 SNPRelate version 1.20.1 in R version 3.6.1 (Zheng et al. 2012).

126 We performed genome-wide association mapping with these low coverage data by

127 mapping reads with bwa mem version 0.7.17-r1188, and then calculated the likelihood of

128 association with gall color with SAMtools mpileup version 0.1.19 and BCFtools view -vcs

129 version 0.1.19 using BAM files as the input. Association for each SNP was measured by the

130 likelihood-ratio test (LRT) value in the INFO field of the output VCF file, which is a one-degree

131 association test p-value. This method calculates association likelihoods using genotype

132 likelihoods rather than hard genotype calls, ameliorating the issue of low-confidence genotype

133 calls resulting from low-coverage data (Li 2011). The false discovery rate was set as the

6

134 Bonferroni corrected value for 0.05, which was calculated as 0.05 / 50,957,130 (the total number

135 of SNPs in the genome-wide association mapping).

136

137 Enrichment and sequencing of the genomic region containing highly significant GWAS hits

138

139 The low coverage GWAS identified multiple linked SNPS on chromosome 1 that were

140 strongly associated with gall color (Figure 2A). To identify all candidate SNPs in this genomic

141 region and to generate higher-confidence GWAS calls, we enriched this genomic region from a

142 library of pooled tagmented samples of fundatrix DNA from red and green galls using custom

143 designed Arbor Bioscience MyBaits for a 800,290 bp region on chromosome 1 spanning the

144 highest scoring GWAS SNP (40,092,625 - 40,892,915 bp). This enriched library was sequenced

145 on an Illumina NextSeq 550 generating paired-end 150 bp reads and resulted in usable re-

146 sequencing data for 48 red gall-producing individuals and 42 green gall-producing individuals,

147 with average pre-filtered sequencing depth of 58.2X.

148 We mapped reads with bwa mem version 0.7.17-r1188 and sorted bam files with

149 SAMtools sort version 1.7, marked duplicate reads with Picard MarkDuplicates version 2.18.0

150 (http://broadinstitute.github.io/picard/), re-aligned indels using GATK IndelRealigner version 3.4

151 (McKenna et al. 2010), and called variants using SAMtools mpileup version 1.7 and BCFtools

152 call version 1.7 (https://github.com/SAMtools/bcftools). This genotyping pipeline is available at

153 https://github.com/YourePrettyGood/PseudoreferencePipeline (thereafter referred to as

154 PseudoreferencePipeline). SNPs were quality filtered from the VCF file using BCFtools view

155 version 1.7 at DP > 10 and MQ > 40 and merged using BCFtools merge.

7

156 For PCA analysis, the joint genotype calls were filtered for quality and missingness using

157 BCFtools filter and view version 1.9, where only SNPs with MAF > 0.05 and genotyped in at

158 least 80% of the individuals were retained. PCA was performed using the snpgdsPCA function

159 from the R package SNPRelate version 1.20.1 in Rstudio version 3.6.1.

160 Association testing was performed using PLINK version 1.90 (Purcell et al. 2007) with

161 minor allele frequency filtered at MAF > 0.2. We did not apply a Hardy-Weinberg equilibrium

162 filter because the samples were not randomly collected from nature. Red galls are rare in our

163 population and we oversampled fundatrices from red galls to roughly match the number of

164 fundatrices sampled from green galls. Results of the GWAS were plotted using the

165 plotManhattan function of Sushi version 1.24.0 (Phanstiel et al. 2014).

166 To calculate LD for the 45kbp region surrounding the 11 GWAS SNPs in Fig. S5B, we

167 extracted positions chromosome1:40,475,000 – 40,520,000 from the merged VCF and retained

168 SNPs that both exhibited MAF > 0.05 and were genotyped in at least 80% of the samples using

169 BCFtools view and filter version 1.9.

170 To plot LD for the entire target enrichment in Fig. S6D, we filtered the VCF for only

171 SNPs with MAF>0.2 and genotyped in at least 80% of the samples using bcftools view and filter

172 version 1.9, and thinned the resulting SNP set using VCFtools–thin version 0.1.15 to exclude any

173 SNPs within 500bp of each other. We further merged back the 11 significant GWAS SNPs using

174 bcftools concat, since the thinning process could have removed one or more of these SNPs. We

175 also removed SNPs in regions where the H. cornu reference genome did not align with the

176 genome of the sister species H. hamamelidis using BEDTools intersect version 2.29.2 (Quinlan

177 and Hall 2010).

8

178 The LD heatmaps were generated using the R packages vcfR version 1.10.0 (Knaus and

179 Grünwald 2017), snpStats version 1.36.0 (Clayton 2019) and LDheatmap version 0.99.8 (Shin et

180 al. 2006) in Rstudio version 3.6.1. The R code used to generate the LD heatmap figure was

181 adapted from code provided at sfustatgen.github.io/LDheatmap/articles/vcfOnLDheatmap.html.

182 The gene models were plotted using the plotGenes function from the R package Sushi version

183 1.24.0.

184

185 Lack of evidence for chromosomal aberrations

186

187 To identify possible chromosomal rearrangements or transposable elements that might be

188 linked to the GWAS SNPs, we first trimmed adapters from the H. cornu target enrichment data

189 using Trim Galore! version 0.6.5 and cutadapt version 2.7 (Martin 2011). The trimming pipeline

190 is available at github.com/YourePrettyGood/ParsingPipeline. We then mapped the reads to the

191 H. cornu reference genome with bwa mem version 0.7.17, sorted BAM files with SAMtools sort

192 version 1.9, and marked duplicate reads with Picard MarkDuplicates version 2.22.7

193 (http://broadinstitute.github.io/picard/), all done with the MAP function of the

194 PseudoreferencePipeline. The analysis includes 43 high coverage red individuals and 42 high

195 coverage green individuals. The five individuals isolated from red galls that did not carry the

196 associated GWAS SNPs in dgc were excluded since the genetic basis for their gall coloration is

197 unknown.

198 We subset the BAM file for each individual to contain only the target enrichment region

199 on chromosome 1 from 40,092,625 to 40,892,915 bp using SAMtools view version 1.9 and

200 generated merged BAM file for each color using SAMtools merge. The discordant reads were

9

201 then extracted from each BAM file using SAMtools view with flag -F 1286 and the percentage of

202 discordant reads was calculated as the ratio of the number of discordant reads over the total

203 number of mapped reads for each 5000 bp window.

204 To further explore the possibility that chromosomal aberrations near the GWAS signal

205 might differ between red- and green-gall producing individuals, we plotted the mapping

206 locations of discordant reads in the 100 kbp region near the 11 GWAS hits (40,450,000 -

207 40,550,000 bp) for red individuals, since the H. cornu reference was made from a green

208 individual. We obtained the read ID for all the discordant reads within the 100kbp region and

209 extracted all occurrences of these reads from the whole genome BAM file, regardless of their

210 mapping location. We then extracted the paired-end reads from the discordant reads BAM file

211 using bedtools bamtofastq version 2.29.2 and used bwa mem to map these reads as single-end

212 reads for read 1 and read 2 separately to a merged reference containing the H. cornu genome and

213 the 343 Acyrthosiphon pisum transposable elements annotated in RepBase (Jurka 1998). We then

214 removed duplicates and sorted the BAM file using SAMtools rmdup and sort and determined the

215 mapping location of all discordant reads using SAMtools view. We masked the windows from

216 chromosome 1: 40,400,000 - 40,499,999 bp and 40,500,000 - 40,599,999 bp in the genome-wide

217 scatter plot of discordant reads mapping because the majority of the discordant reads are

218 expected to map to these regions and displaying their counts would obscure potential signals in

219 the rest of the genome.

220

221 Large scale survey of 11 dgc SNPs associated with gall color

222

10

223 Aphids were collected from red and green galls as described above for the GWAS study

224 directly into Zymo DNA extraction buffer and ground with a plastic pestle. DNA was prepared

225 using the ZR-96 Quick gDNA kit. We developed qPCR assays and amplicon-seq assays to

226 genotype all individuals at all 11 SNPs (primers, etc below). PCR amplicon products were

227 barcoded and samples were pooled for sequencing on an Illumina platform.

228 Adaptors were trimmed from amplicon reads using Trim Galore! version 0.6.5 and

229 cutadapt version 2.7. The wrapper pipeline is available at

230 github.com/YourePrettyGood/ParsingPipeline. We mapped reads to a 34 kbp region of

231 chromosome 1 of the H. cornu genome that includes the amplicon SNPs (40,477,000 – 40,

232 511,000 bp) with bwa mem version 0.7.17, sorted BAM files with SAMtools sort version 1.9, and

233 re-aligned indels using GATK IndelRealigner version 3.4. No marking of duplicates was done

234 given the nature of amplicon sequencing data. To maximize genotyping efficiency and improve

235 accuracy, we performed variant calling with two distinct pipelines: SAMtools mpileup version

236 1.7 (Li et al., 2009) plus BCFtools call version 1.7, and GATK HaplotypeCaller version 3.4. The

237 mapping and indel re-alignment pipelines are available as part of the MAP (with flag only_bwa)

238 and IR functions of the PseudoreferencePipeline. Using the same PseudoreferencePipeline,

239 variant calling was performed using the MPILEUP function of BCFtools and HC function of

240 GATK. FASTA sequences for each individual, where the genotyped SNPs were updated in the

241 reference space, were then generated for both BCFtools and GATK variant calls using the above

242 PseudoreferencePipeline’s PSEUDOFASTA function, with flags “MPILEUP, no_markdup” and

243 “HC, no_markdup” respectively. The BCFtools SNP updating pipeline used bcftools filter,

244 query, and consensus version 1.9, and we masked all sites where MQ <= 20 or QUAL <= 26 or

245 DP <= 5. The HC SNP updating pipeline used GATK SelectVariants and

11

246 FastaAlternateReferenceMaker version 3.4, and we masked all sites where MQ < 50, DP <= 5,

247 GQ < 90 or RGQ < 90.

248 We then merged the variant calls from BCFtools and GATK, as well as the qPCR

249 genotyping results, and manually identified all missing or discrepant genotypes. We manually

250 curated these missing or discrepant genotype calls from the indel realigned BAM files using the

251 following criteria: for heterozygous calls, the site has to have at least two reads supporting each

252 allele, and for homozygous calls, the site has to have at least ten reads supporting the allele and

253 no reads supporting alternative alleles.

254

255 RNA-seq of salivary glands from aphids inducing red and green galls

256

257 We dissected salivary glands from fundatrices isolated from green and red galls in PBS,

258 gently pipetted the salivary glands from the dissection tray in < 0.5uL volume of PBS, and

259 deposited glands into 3 uL of Smart-seq2 lysis buffer (0.2%Triton-X 100, 0.1 U/uL RNasin®

260 Ribonuclease Inhibitor). RNA-seq libraries were prepared with a single-cell RNA-seq method

261 developed by the Janelia Quantitative Genomics core facility and described previously

262 (Cembrowski et al. 2018).

263 RNAseq libraries were prepared as described above for red and green gall samples except

264 that the entire 3uL sample of salivary glands in lysis buffer was provided as input. Barcoded

265 samples were pooled and sequenced on an Illumina NextSeq 550. We detected 9.0 million reads

266 per sample on average. We replaced the original oligonucleotides with the following

267 oligonucleotides to generate unstranded reads from the entire transcript:

Name Sequence Scale Purification WGB_RTr1 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN 250nm PAGE

12

WGB_PCRr1 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG 250nm HPLC WGB_PCRr2 GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG 250nm HPLC WGB_TSOr2 /5Biosg/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACArGrGrG 250nmR RNASE 268

269 Samples were PCR amplified for 18 cycles and the library was prepared using ¼ of the standard

270 Nextera XT sample size and 150 pg of cDNA.

271

272 Differential expression analyses of fundatrix salivary glands from red and green galls

273

274 All differential expression analyses for plant and aphid samples were performed in R

275 version 3.6.1 (R Core Team 2018) and all R Notebooks are provided on Github (will be posted

276 prior to publication). Adapters were trimmed using cutadapt version 2.7 and read counts per

277 transcript were calculated by mapping reads to the genome with hisat2 version 2.1.0 (Kim,

278 Langmead, and Salzberg 2015) and counting reads per gene with htseq-count version 0.12.4

279 (Anders, Pyl, and Huber 2015). In R, technical replicates were examined and pooled, since all

280 replicates were very similar to each other. We performed exploratory data analysis using

281 interactive multidimensional scaling plots, using the command glMDSPlot from the package

282 Glimma (Su et al. 2017), and outlier samples were excluded from subsequent analyses.

283 Differentially expressed genes were identified using the glmQLFTest and associated functions of

284 the package edgeR (Robinson, McCarthy, and Smyth 2009). Volcano plots were generated using

285 the EnhancedVolcano command from the package EnhancedVolcano version 1.4.0 (Blighe,

286 Rana, and Lewis 2020). Mean-Difference (MD) plots and multidimensional scaling plots were

287 generated using the plotMD and plotMDS functions, respectively, from the package limma

288 version 3.42.0 (Ritchie et al. 2015).

289

13

290 Hamamelis virginiana genome sequencing, assembly, and annotation

291

292 Leaves from a single tree of Hamamelis virginiana were sampled from the Janelia

293 Research Campus forest as follows. Branches containing leaves that were less than 50%

294 expanded were wrapped with aluminum foil and harvested after 40 hours. Leaves were cleared

295 of obvious contamination, including aphids and other insects, and then plunged into liquid N2.

296 Samples were stored at -80°C and sent to the Arizona Genomics Institute, University of Arizona

297 on dry ice, which prepared HMW DNA from nuclei isolated from the frozen leaves. The Janelia

298 Quantitative Genomics core facility generated a 10X chromium linked-read library from this

299 DNA and sequenced the library on an Illumina NextSeq 550 to generate 608M linked reads.

300 The H. virginiana genome was assembled with the supernova commands run and

301 mkoutput version 2.1.1, with options minsize=1000 and style=pseudohap (Weisenfeld et al.

302 2017). We used 332M reads in the assembly to achieve raw coverage of 56X as recommended

303 by the supernova instruction manual. The assembled genome was 886Mb and the scaffold N50

304 was 472Kb. BUSCO completeness analysis (Simao et al. 2015) was performed using the

305 gVolante Web interface (Nishimura, Hara, and Kuraku 2017) using the plants database.

306 BUSCO_v2/v3 showed 1309 of 1440 (90.9%) completely detected core genes and 1365 of 1440

307 (94.8%) partially detected core genes in the assembled H. virginiana reference genome.

308 The assembled genome reference was repeat masked with soft masking using

309 RepeatMasker version 4.0.9 (Smit n.d.). Twenty-five RNA-seq libraries from galls and leaves

310 were used for genome annotation. RNA-seq reads were adapter trimmed using cutadapt version

311 2.7 and mapped to the genome using HISAT2 version 2.1.0. Genome annotation was performed

312 with BRAKER version 2.1.4 using the RNAseq data to provide intron hints (Altschul et al. 1990;

14

313 Barnett et al. 2011; Camacho et al. 2009; Hoff et al. 2016, 2019; Li et al. 2009; Lomsadze et al.

314 2014; Ritchie et al. 2015; Stanke et al. 2006, 2008) and 3’ UTRs were predicted using UTRme

315 (Radío et al. 2018).

316

317 H. virginiana RNA extraction and RNA-seq library preparation

318

319 RNA was extracted from frozen H. virginiana leaf or gall tissue as follows. Plant tissue

320 frozen at -80°C was placed into ZR BashingBead Lysis Tubes (pre-chilled in liquid N2) and

321 pulverized to a fine powder in a Talboys High Throughput homogenizer (Troemer) with minimal

322 thawing. Powdered plant tissue was suspended in extraction buffer (100 mM Tris-HCl, pH 7.5,

323 25 mM EDTA, 1.5 M NaCl, 2% (w/v) Hexadecyltrimethylammonium bromide, 10%

324 Polyvinylpyrrolidone (w/v) and 0.3% (v/v) β-mercaptoethanol) and heated to 55°C for 8 min

325 followed by centrifugation at 13,000 x g for 5 minutes at room temperature to remove insoluble

326 debris (Jordon-Thaden et al. 2015). Total RNA was extracted from the supernatant using the

327 Quick-RNA Plant Miniprep Kit (Zymo Research) with the inclusion of in-column DNAse I

328 treatment. RNA-seq libraries were prepared with the Universal Plus mRNA-Seq kit (Nugen).

329

330 Differential expression analysis of red versus green galls

331

332 We performed differential expression analysis on red and green galls by collecting paired

333 red and green gall samples from the same leaves. In total, we collected 17 red galls and 22 green

334 galls from 17 leaves. RNA was prepared as described above for plant material and RNA-seq

15

335 libraries were prepared with a single-cell RNA-seq method modified for plant material described

336 above.

337 These RNAseq libraries contained on average 4.4 million mapped reads per sample.

338 Reads were quality trimmed and mapped to the transcriptome as described above. Only genes

339 with greater than 1 count per million in at least 15 samples were included in subsequent analyses.

340 The expression analysis model included the effect of leaf blocking.

341

342 Gall pigment extraction and analysis

343

344 Frozen gall tissue was ground to a powder under liquid nitrogen. Approximately 20 mg

345 of ground gall tissue was suspended in 100 µL methanol (Optima grade, Fisher Scientific) and

346 400 µl of 5% aqueous formic acid (Optima grade, Fisher Scientific), vortexed for 30 seconds and

347 centrifuged at 8000 x g for 2 min at 10° C. The supernatant containing pigment was filtered

348 using a 0.2 µm, 13 mm diameter PTFE syringe filter to remove debris. Colorless pellet was

349 discarded. Authentic anthocyanin pigment standards for malvidin 3,5-diglucoside chloride

350 (Sigma Aldrich, St. Louis, MO, USA) and peonidin-3,5-diglucoside chloride (Carbosynth

351 LLC,San Diego, CA, USA) were prepared at 1mg/mL in 5% aqueous formic acid.

352 Pigment separation and identification alongside standards was performed on a reverse

353 phase C18 column (Acquity Plus BEH, 50 mm × 2.1 mm, 1.7 μm particle size, Waters, Milford,

354 MA) using an Agilent 1290 UHPLC coupled to an Agilent 6545 quadrupole time-of-flight mass

355 spectrometer (Agilent Technologies, Santa Clara, CA, USA) using an ESI probe in positive ion

356 mode. Five µl of filtered pigment extract or a 1:100 dilution of anthocyanin standard was

357 injected Solvent (A) consisted of 5% aqueous formic acid and (B) 1:99 water/acetonitrile

16

358 acidified with 5% aqueous formic acid (v/v) (B). The gradient conditions were as follows: 1 min

359 hold at 0% B, 4 min linear increase to 20% B, 5 min linear increase to 40% B, ramp up to 95% B

360 in 0.1 min, hold at 95% B for 5 min, return to 0% B and hold for 0.9 min. The column flow rate

361 was 0.3 mL/min, and the column temperature 30° C. The MS source parameters for initial

362 anthocyanin detection were as follows: capillary = 4000 V, nozzle = 2000 V, gas temperature =

363 350° C, gas flow = 13 L/min, nebulizer = 30 psi, sheath gas temperature = 400° C, sheath gas

364 flow = 12 L/min. DAD detection at 300 nm and 520 nm, and MS scanning from 50-1700 m/z at

365 a rate of 2 spectra per second. Iterative fragmentation, followed by targeted MS/MS experiments,

366 were performed using a collision energy = 35. Authentic standards confirmed the presence of

367 peonidin-3,5-diglucoside and malvidin 3,5-diglucoside. The remaining anthoycyanin species

368 were identified using UV-Vis spectra, retention time relative to the other species in the sample,

369 [M]+ precursor ions and aglycone fragment ions matching the respective entries in the RIKEN

370 database (Sawada et al. 2012).

371

372 Differential expression analysis of galls versus leaves

373

374 We performed RNA seq on 36 gall samples and 17 adjacent leaf samples. These gall

375 samples did not overlap with the gall samples used in red versus green gall comparison described

376 earlier. For larger galls, RNA was isolated separately from basal, medial, and apical gall regions

377 (Table S10). Libraries were sequenced on an Illumina NextSeq 550 to generate 150 bp paired-

378 end reads with an average of 8.1 million mapped reads per sample. Only genes expressed at

379 greater than 1 count per million in at least 18 samples were included in subsequent analysis. We

17

380 included only samples where there are paired gall and leaf samples from the same leaf and

381 potential leaf effects were modeled in the different expression analysis.

382 To facilitate Gene Ontology (GO) analysis, the UniProt IDs of the differentially

383 expressed genes were obtained by mapping the coding sequence of the H. virginana genome to

384 the UniProt / Swiss-Prot database (Bateman 2019) using Protein-Protein BLAST 2.7.1 (Altschul

385 et al. 1990; Camacho et al. 2009) and extracting the differentially expressed genes. We used the

386 WebGestalt 2019 webtool (Liao et al. 2019) to perform GO analysis on the differentially

387 expressed genes.

388

389 Differential expression analysis of H. cornu organs and life stages

390

391 RNAseq libraries were generated for fundatrix salivary glands (N = 20) and whole bodies

392 (N = 8), G2 salivary glands (N = 6) and carcasses (N = 3), G5 salivary glands (N = 6) and

393 carcasses (N = 2), and G7 salivary glands (N = 5). Libraries were generated as described above

394 for salivary glands except that RNA samples of carcasses and whole bodies were prepared using

395 the Arcturus PicoPure RNA Isolation Kit (Applied Biosystems). Only genes expressed with at

396 least 1 count per million in at least 29 samples were included in subsequent analyses.

397

398 Bioinformatic identification of bicycle genes in H. cornu

399

400 Genes that were upregulated specifically in the salivary glands of the fundatrix generation

401 were prime candidates for inducing galls. We therefore identified genes that were upregulated

402 both in salivary glands of fundatrices versus sexuals and in fundatrix salivary glands. These

18

403 differentially expressed genes were then separated into genes with and without homologs

404 containing some functional annotation. Homologs with previous functional annotations were

405 identified using three methods: we performed (1) translated query-protein (blastx) and (2)

406 protein-protein (blastp) based homology searches using BLAST 2.7.1 (Altschul et al. 1990;

407 Camacho et al. 2009) against the UniProt / Swiss-Prot database (Bateman 2019) (Bateman

408 2019), and (3) Hidden-Markov based searches with the predicted proteins using hmmscan in

409 HMMER version 3.1b2 (Eddy 2011) against the pfam database (Finn et al. 2014). For all

410 predicted proteins, we also searched for secretion signal peptides using SignalP-5.0 (Almagro

411 Armenteros et al. 2019) and for transmembrane domains using tmhmm version 2.0 (Krogh et al.

412 2001). Gene Ontology analysis of genes with annotations that were enriched in fundatrix salivary

413 glands was performed by searching for Drosophila melanogaster homologs of differentially

414 expressed genes and using these D. melanogaster homologs as input into gene ontology analysis.

415 To determine whether any of these genes without detectable homologs in existing protein

416 databases were homologous to each another, we performed sensitive homology searches of all-

417 against-all of these genes using jackhmmer in HMMER version 3.1b2. We performed

418 hierarchical clustering on the quantitative results of the jackhmmer analysis by first calculating

419 distances amongst genes with the dist function using method canberra and clustering using the

420 hclust function with method ward.D2, both from the library stats in R (R Core Team 2018). We

421 aligned sequences of the clustered homologs using MAFFT version 7.419 with default

422 parameters (Katoh 2002; Katoh and Standley 2013), trimmed aligned sequences using trimAI

423 (Capella-Gutiérrez, Silla-Martínez, and Gabaldón 2009) with parameters -gt 0.50, and generated

424 sequence logos by importing alignments using the functions read.alignment and ggseqlogo in the

425 R packages seqinr (Charif and Lobry 2007) and ggseqlogo (Wagih 2017). After identification of

19

426 the bicycle genes, we searched for additional bicycle genes in the entire H. cornu genome, which

427 might not have been enriched in fundatrix salivary glands, using jackhmmer followed by

428 hierarchical clustering to identify additional putative homologs. As described above, we

429 manually annotated all these putative bicycle genes.

430

431 Differential expression analysis of a single H. cornu fundatrix with a dgcGreen genotype inducing

432 a red gall

433

434 Approximately 2.1% of fundatrices inducing red galls were homozygous for ancestral

435 “green” alleles at all of the 11 dgc SNPs (Fig. 2E). Five such individuals were found in our

436 original GWAS study and we did not observe any variants in the dgc gene region that were

437 associated specifically with these individuals, suggesting that they carried variants elsewhere in

438 the genome that caused them to generate red galls. Since isolating salivary glands from

439 fundatrices is challenging and time consuming, we were unable to systematically examine

440 transcriptome changes in the salivary glands of the rare individuals homozygous for dgcGreen that

441 induced red galls. However, we fortuitously isolated salivary glands from one fundatrix from a

442 red gall that was homozygous for dgcGreen. We performed whole-transcriptome sequencing of the

443 salivary glands from this one individual and compared expression levels of all genes in the

444 bicycle gene paralog group to which dgc belongs. We also examined the dgc transcripts

445 produced by this individual and found no exonic SNPs in the dgc transcripts produced by this

446 individual, indicating that this individual probably expressed a functional dgc transcript.

447

448 H. cornu and H. hamamelidis polymorphism and divergence measurements in GWAS region

20

449

450 To summarize the polymorphism and divergence patterns in H. cornu and H.

451 hamamelidis in the target enrichment region, we generated SNP updated FASTA sequences for

452 each individual by mapping the H. cornu reads to the H. cornu genome reference and the H.

453 hamamelidis reads to a H. hamamelidis SNP-updated reference genome in H. cornu genome

454 coordinate space. We used the same genotyping pipeline as described in the enrichment region

455 GWAS, using the PseudoreferencePipeline, and masked sites were DP <= 10, MQ <= 20.0, or

456 QUAL <= 29.5. High coverage individuals of H. cornu (n=90) and H. hamamelidis (N=92) were

457 included in this analysis, including both of the color phenotypes. Polymorphism within each

458 species and divergence (Dxy) between species were calculated using a custom script

459 calculateDiversity. Windowed measurements of each of these statistics are generated with the

460 custom script nonOverlappingWindows.cpp, using window size 3000bp. Both custom scripts are

461 available at github.com/YourePrettyGood/RandomScripts. Sites that are missing genotypes due

462 to masking in 50% or more of the samples are not included in the windowed average and

463 windows where 50% or more of the sites are missing are excluded from the plot in Fig. S13E.

464

465 Whole-genome polymorphism and divergence measurements in H. cornu and H. hamamelidis

466

467 To examine genome-wide pattern of polymorphism and divergence, we used the same

468 whole-genome sequencing data as in the genome-wide association study for gall color. This data

469 set includes samples from fundatrices isolated from 43 green galls and 47 red galls for H. cornu

470 and 48 galls for H. hamamemLidis. Reads were mapped using bwa mem version 0.7.17-r1188.

471 The H. cornu reads were mapped to the H. cornu reference genome and the H. hamamelidis

21

472 reads were mapped to the H. hamamelidis SNPs updated reference genome, which are in the

473 same coordinate space. We joint genotyped each species separately using bcftools version 1.9

474 mpileup and call -m to generate raw, multi-sample VCF for each species. Sites filtering for

475 QUAL >= 20 was then done using bcftools filter, and insertions and deletions sites were

476 removed using VCFtools version 0.1.16 –remove-indels. To generate a BED file of sites lacking

477 genotypes for each sample, bcftools view with flag -g ^miss was used to extract only the

478 genotyped sites for each sample, bcftools query was used to generate the location of all SNPs and

479 reference calls, and bedtools version 2.29.2 complement was used to generate the complementary

480 positions to these genotyped calls as the positions to be masked. Consensus sequences were

481 generated for each sample using bcftools consensus, with the flag –iupac-codes.

482 Polymorphism within each species and divergence (Dxy) between species were calculated

483 using a custom script calculateDiversity. Windowed measurements of each of these statistics

484 were generated with the custom script nonOverlappingWindows.cpp, using window size 1000bp.

485 Both scripts are available at github.com/YourePrettyGood/RandomScripts. Windows were

486 designated as “bicycle” if they overlapped in coordinates with any bicycle genes; otherwise they

487 were designated as “non-bicycle”. Sites masked in 80% or more of the samples were excluded in

488 the windowed average and windows with 80% or more of the sites missing were excluded from

489 genome-wide summary statistic calculations. This genotyping rate filter is more lenient to

490 accommodate the higher amount of missing data due to the shallower sequencing depth.

491

492 Bioinformatic analysis of bicycle gene structure and evolution

493

22

494 The H. cornu median exon size and number of exons per bicycle gene were calculated

495 from the GFF annotation file Augustus.updated_w_annots.21Aug20.gff3. The bicycle genes

496 alignment for the divergence tree was generated using FastTree version 2.1.11 (Price, Dehal, and

497 Arkin 2010) and plotted using ggtree version 2.2.2 (Yu 2020) in R.

498 To calculate DnDs between H. cornu and H. hamamelidis, we first generated a masked

499 H. hamamelidis reference genome by updating the H. cornu reference genome with H.

500 hamamelidis SNPs. We randomly subset H. hamamelidis reads from 150bp PE 10X linked reads

501 sequencing data to roughly 65X coverage using seqtk version 1.3 sample -s100

502 (https://github.com/lh3/seqtk) and trimmed adapters using Long Ranger version 2.2.2 basic (10X

503 Genomics). We then mapped the H. hamamelidis reads to the H. cornu reference and updated the

504 SNPs using the PseudoreferencePipeline. Sites where MQ <= 20, QUAL <= 26 or DP <= 5 were

505 masked, resulting in 21.0% of the genome being masked. This H. hamamelidis reference genome

506 was used in the DnDs calculation.

507 DnDs between all orthologous genes in H. cornu and H. hamamelidis was calculated

508 using the codeml function from the PAML package, version 4.9j (Yang 2007). The CDS

509 sequences were extracted from both reference genomes for each gene using

510 constructCDSesFromGFF3.pl and we split all degenerate bases into the two alleles to generate

511 two pseudo-haplotypes using fakeHaplotype.pl -s 42 (both scripts available at

512 github.com/YourePrettyGood/RandomScripts). Haplotype 1 was used as input into codeml for

513 both species. The settings used in codeml control file are: runmode=-2, seqtype=1,

514 CodonFreq=0, clock=0, model=0, NSsites=0, fix_kappa=0, kappa=2, fix_omega=0, omega=1,

515 Small_Diff=0.5e-6, method=0, fix_blength=0. Then, to evaluate whether DnDs is significantly

516 different from 1 for each gene, we additionally ran a null model with the same control file,

23

517 except with fix_omega=1. We then calculated the likelihood scores between the two models as

518 2×(lnL1-lnL2) and compared it to the 95th percentile of the chi-square distribution with 1 degree

519 of freedom. Mann-Whitney U test comparing DnDs of bicycle and non-bicycle genes was

520 performed using the wilcox.test function in R version 3.6.1.

521 To perform McDonald-Kreitman tests for bicycle genes, we generated population sample

522 of the bicycle coding regions for H. cornu by mapping RNA-seq data from 21 fundatrix salivary

523 gland samples, which provided high coverage. We performed genotyping using the STAR,

524 IRRNA, and HC functions in the PseudoreferencePipeline. This pipeline uses STAR version

525 2.7.3a for mapping and IndelRealigner and HaplotypeCaller in GATK version 3.4 for indel

526 realignment and variant calling. SNP-updated FASTA sequences were generated for each

527 individual from the genotype VCF using the PSEUDOFASTA function of the

528 PseudoreferencePipeline without any additional masking. Then, for each individual, we split all

529 degenerate bases into the two alleles to generate two pseudo-haplotypes using fakeHaplotype.pl -

530 s 42. We calculated the number of synonymous and nonsynonymous polymorphisms (PS and PN,

531 respectively) and divergent substitutions (DS and DN, respectively) using Polymorphorama

532 version 6

533 (https://ib.berkeley.edu/labs/bachtrog/data/polyMORPHOrama/polyMORPHOrama.html)

534 (Andolfatto 2007) for multiple categories of genes significantly over-expressed in the fundatrix

535 salivary gland. The fraction of non-synonymous divergent substitutions that were fixed by

536 positive selection (a; Fay, Wyckoff, and Wu 2002; Smith and Eyre-Walker 2002) was estimated

537 using the Cochran-Mantel-Haenszel test framework (Shapiro et al. 2007). Using the

! " 538 mantelhaen.test function in R version 3.6.1, we estimated a common odds ratio ( ! ") across a !""!

539 series of McDonald-Kreitman tables (J H McDonald and Kreitman 1991), and estimated � =

24

! " 540 1 − ! ". A MAF filter of 0.03 was applied to the polymorphism counts to reduce the downward !""!

541 bias in the estimator due to segregating weakly deleterious amino acid polymorphisms(Fay et al.

542 2002).

543

544 Genome-wide tests of selection

545

546 To scan for genome-wide signatures of positive selection in H. cornu and H.

547 hamamelidis, we used the joint-genotyped VCF for each species as described above. We filtered

548 for QUAL >= 20 using bcftools filter, and 80% maximum missing genotype and biallelic sites

549 using VCFtools version 0.1.16 -max-missing 0.2, -max-alleles 2, and -min-alleles 2. We then

550 performed the selection scan using SweeD-P version 3.1 (Pavlidis et al. 2013), with one grid

551 point per kilobase and -folded flag.

552 To determine the significance cutoff for the composite likelihood ratio (CLR) output by

553 SweeD, we simulated 10 Mbp regions under neutrality for 100 haplotypes for each species using

554 MaCS version 0.4f (Chen, Marjoram, and Wall 2008). The effective population size was set to

555 1,174,297 and 1,115,822 for H. cornu and H. hamamelidis, respectively. The mutation and

556 recombination rates were set to 2.8e-9 and 1.1e-8, respectively, per bp per generation. The

557 simulation output was converted to multi-sample VCF format using a custom script and run

558 through SweeD-P using the same parameters as for the observed data. We ran ten iterations of

559 the simulation for each species and combined the results to account for stochasticity of the

560 simulations. The CLR cutoffs for the observed data were chosen as the 99th quantiles of the CLR

561 values from the neutral simulations of their respective species.

562

25

563 Supplementary Text

564

565 Chromosomal aberrations cannot explain association of 11 SNPs with gall color

566

567 We considered the possibility that DNA aberrations linked to the 11 associated SNPs

568 might explain the red-green gall polymorphism. We found no evidence for large insertions,

569 deletions, or other chromosomal aberrations in this region (Fig. S1E-I). In addition, we

570 performed a genome-free association study (Rahman et al. 2018) and found strong linkage only

571 for a subset of the 11 SNPs identified in the original GWAS (Fig. S3), further supporting the

572 inference that these SNPs, and not other genetic changes, are associated with gall color.

573

574 A small fraction of fundatrices likely carry variants not linked to dgc that induce red galls

575

576 Two percent of fundatrices from red galls were homozygous for the dgcGreen alleles at all

577 11 SNPs (Fig. 2E). These individuals do not carry exonic SNPs in dgc that would alter dgc

578 function. We fortuitously performed RNA-seq on the salivary glands from one of these rare

579 fundatrices and found that dgc was expressed at high levels (Fig. S7C), which suggests that the

580 rare individuals that make red galls and are homozygous for dgcGreen alleles harbor variants at

581 loci other than dgc that generate red galls.

582

583 Gene ontology analysis of plant genes differentially expression in galls is consistent with the

584 observed cell biology of gall development

585

26

586 Genes associated with cell division were strongly upregulated in galls (Fig. S9B),

587 consistent with the extensive growth observed in gall tissue (Figure 1G-I). Genes associated with

588 chloroplasts and photosynthesis were strongly downregulated in gall tissue (Fig. S9B), probably

589 reflecting the reprogramming of plant leaf tissue from nutrient exporters to nutrient sinks (Inbar,

590 Eshel, and Wool 1995). We also observed downregulation of genes related to immune response

591 and multiple defense mechanisms in gall tissue (Fig. S9B), which may reflect suppression of the

592 plant’s defense responses to infection. The pattern of differential expression does not appear to

593 reflect a plant wound response, because wound response genes are not significantly over-

594 expressed in gall tissue (Chi-square P-value = 0.367) and are biased toward under-represented.

595

596 Evidence for roles of homologs of genes previously proposed as gall effector genes in H. cornu

597

598 Previously, the SSGP-71s effector gene family was described from the Hessian fly

599 genome (Zhao et al. 2015) and it was suggested that these encode E3-ligase-mimicking effector

600 proteins. We identified four homologs of the SSGP-71s effector gene family in H. cornu by

601 searching H. cornu protein sequences using a hidden marker model search with a protein profile

602 generated using the 426 SSGP-71 genes from Document S2, SSGP-71s tab, from Zhao et al.

603 2015 (hmmsearch with HMMER v3.2.1). However, none of these genes were differentially

604 expressed in the fundatrix salivary gland. We also identified seven ubiquitin E3-ligase genes

605 (Zhao et al. 2015) in H. cornu, two of which are upregulated in fundatrix salivary gland.

606 However, none of these seven genes contain N-terminal secretion signals, making it unlikely that

607 they could be injected into plants. Thus, these genes are unlikely to contribute to H. cornu gall

608 development.

27

609 Ring domain proteins usually mediate E2 ubiquitin ligase activity, which can modify

610 protein function or target proteins for proteosomal degradation (Metzger et al. 2014). The

611 genome of a galling phylloxeran, a species from the sister family to aphids, has been shown to

612 encode secreted RING domain proteins (Zhao, Rispe, and Nabity 2019), although it is not yet

613 clear how the phylloxeran secreted RING domain proteins act and whether they are involved in

614 gall-specific processes. We identified two H. cornu secreted RING domain proteins based on

615 their similarity to proteins in the PFAM database (Finn et al. 2014), but only one was

616 upregulated weakly in fundatrix salivary glands. Thus, secreted RING domain proteins are

617 unlikely to contribute to H. cornu gall development.

618

619 Estimation of divergence time and effective population sizes of H. cornu and H. hamamelidis

620

621 The sister species H. cornu and H. hamamelidis produce similar looking galls in adjacent

622 geographic areas and were long confused as populations of a single species (von Dohlen and

623 Stoetzel 1991). However, H. cornu exhibits a life cycle where aphids alternate between

624 Hamamelis virginiana and Betula nigra (Fig. 4A), whereas H. hamamelidis does not host

625 alternate to B. nigra and displays a truncated life cycle, where the offspring of the fundatrix

626 develop as sexuparae (the equivalent of G6 in Fig. 4A) and deposit sexuals on H. virginiana in

627 the autumn. Populations of H. cornu tend to live in the lowlands with prolonged summers and H.

628 hamamelidis tend to live in the highlands and more northern latitudes, which experience shorted

629 summers. In some locations, both species can be found making galls on the same trees.

630 Previous mtDNA sequencing supported the hypothesis that these are distinct species (von

631 Dohlen, Kurosu, and Aoki 2002). Genome-wide divergence (Dxy) between these species was

28

632 estimated to be 0.0189 and polymorphism was 0.0132 and 0.0125 for H. cornu and H.

633 hamamelidis, respectively. Assuming a mutation rate �=2.8e-9 (Keightley et al. 2014), we

634 estimated effective population sizes using � = 4�#� for H. cornu and H. hamamelidis as

635 1,174,297 and 1,115,822, respectively. Following the neutral theory of molecular evolution and

636 assuming a simple isolation model, we estimated the time of divergence between these two

!$% 637 species to be approximately 1.20 million generations, using ( − 1) × 2� , where � and � & # #

638 are calculated as the mean of the two species (Kimura 1968). Since these species experience one

639 sexual generation per year, we estimate the divergence time as 1.2 MYA.

640

641 References

642 Almagro Armenteros, José Juan, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas 643 Nordahl Petersen, Ole Winther, Søren Brunak, , and Henrik Nielsen. 644 2019a. “SignalP 5.0 Improves Signal Peptide Predictions Using Deep Neural Networks.” 645 Nature Biotechnology 37(4):420–23. doi: 10.1038/s41587-019-0036-z.

646 Almagro Armenteros, José Juan, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas 647 Nordahl Petersen, Ole Winther, Søren Brunak, Gunnar von Heijne, and Henrik Nielsen. 648 2019b. “SignalP 5.0 Improves Signal Peptide Predictions Using Deep Neural Networks.” 649 Nature Biotechnology 37(4):420–23. doi: 10.1038/s41587-019-0036-z.

650 Altschul, Stephen F., Warren Gish, , Eugene W. Myers, and David J. Lipman. 1990. 651 “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215(3):403–10. doi: 652 10.1016/S0022-2836(05)80360-2.

653 Anders, S., P. T. Pyl, and W. Huber. 2015. “HTSeq--a Python Framework to Work with High- 654 Throughput Sequencing Data.” Bioinformatics 31(2):166–69. doi: 655 10.1093/bioinformatics/btu638.

656 Andolfatto, Peter. 2007. “Hitchhiking Effects of Recurrent Beneficial Amino Acid Substitutions 657 in the Drosophila Melanogaster Genome.” Genome Research 17(12):1755–62. doi: 658 10.1101/gr.6691007.

659 Barnett, Derek W., Erik K. Garrison, Aaron R. Quinlan, Michael P. Str̈ mberg, and Gabor T. 660 Marth. 2011. “Bamtools: A C++ API and Toolkit for Analyzing and Managing BAM 661 Files.” Bioinformatics 27(12):1691–92. doi: 10.1093/bioinformatics/btr174.

29

662 Bateman, Alex. 2019. “UniProt: A Worldwide Hub of Protein Knowledge.” Nucleic Acids 663 Research 47(D1):D506–15. doi: 10.1093/nar/gky1049.

664 Blackman, R. L., and V. F. Eastop. 1994. Aphids on the World’s Trees: An Identification and 665 Information Guide. Wallingford: CAB International.

666 Blighe, Kevin, Sharmila Rana, and Myles Lewis. 2020. “EnhancedVolcano: Publication-Ready 667 Volcano Plots with Enhanced Colouring and Labeling.”

668 Camacho, Christiam, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, 669 Kevin Bealer, and Thomas L. Madden. 2009. “BLAST+: Architecture and Applications.” 670 BMC Bioinformatics 10:1–9. doi: 10.1186/1471-2105-10-421.

671 Capella-Gutiérrez, Salvador, José M. Silla-Martínez, and Toni Gabaldón. 2009. “TrimAl: A Tool 672 for Automated Alignment Trimming in Large-Scale Phylogenetic Analyses.” 673 Bioinformatics 25(15):1972–73. doi: 10.1093/bioinformatics/btp348.

674 Cembrowski, Mark S., Matthew G. Phillips, Salvatore F. DiLisio, Brenda C. Shields, Johan 675 Winnubst, Jayaram Chandrashekar, Erhan Bas, and Nelson Spruston. 2018. “Dissociable 676 Structural and Functional Hippocampal Outputs via Distinct Subiculum Cell Classes.” 677 Cell 173(5):1280-1292.e18. doi: 10.1016/j.cell.2018.03.031.

678 Charif, D., and J. R. Lobry. 2007. “Seqin{R} 1.0-2: A Contributed Package to the {R} Project 679 for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis.” Pp. 680 207–32 in Structural approaches to sequence evolution: Molecules, networks, 681 populations, Biological and Medical Physics, Biomedical Engineering, edited by U. 682 Bastolla, M. Porto, H. E. Roman, and M. Vendruscolo. New York: Springer Verlag.

683 Chen, G. K., P. Marjoram, and J. D. Wall. 2008. “Fast and Flexible Simulation of DNA 684 Sequence Data.” Genome Research 19(1):136–42. doi: 10.1101/gr.083634.108.

685 Clayton, David. 2019. “SnpStats: SnpMatrix and XSnpMatrix Classes and Methods.”

686 von Dohlen, C. D., and M. B. Stoetzel. 1991. “Separation and Redescription of Hormaphis 687 Hamamelidis (Fitch 1851) and Hormaphis Cornu (Shimer 1867) (Homoptera: Aphididae) 688 on Witch-Hazel in the Eastern United States.” Proc. Entomol. Soc. Wash. 93:533–48.

689 von Dohlen, Carol D., Utako Kurosu, and Shigeyuki Aoki. 2002. “Phylogenetics and Evolution 690 of the Eastern Asian-Eastern North American Disjunct Aphid Tribe, Hormaphidini 691 (Hemiptera: Aphididae).” Molecular Phylogenetics and Evolution 23(2):257–67. doi: 692 10.1016/S1055-7903(02)00025-8.

693 Drozdetskiy, Alexey, Christian Cole, James Procter, and Geoffrey J. Barton. 2015. “JPred4: A 694 Protein Secondary Structure Prediction Server.” Nucleic Acids Research 43(W1):W389– 695 94. doi: 10.1093/nar/gkv332.

696 Eddy, Sean R. 2011. “Accelerated Profile HMM Searches.” PLoS Computational Biology 7(10). 697 doi: 10.1371/journal.pcbi.1002195.

30

698 Fay, Justin C., Gerald J. Wyckoff, and Chung-I. Wu. 2002. “Testing the Neutral Theory of 699 Molecular Evolution with Genomic Data from Drosophila.” Nature 415:1024–26.

700 Finn, Robert D., , Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. 701 Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. 702 Sonnhammer, John Tate, and Marco Punta. 2014. “Pfam: The Protein Families 703 Database.” Nucleic Acids Research 42(D1):222–30. doi: 10.1093/nar/gkt1223.

704 Hennig, Bianca P., Lars Velten, Ines Racke, Chelsea Szu Tu, Matthias Thoms, Vladimir Rybin, 705 Hüseyin Besir, Kim Remans, and Lars M. Steinmetz. 2018. “Large-Scale Low-Cost NGS 706 Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol.” G3: 707 Genes, Genomes, Genetics 8(1):79–89. doi: 10.1534/g3.117.300257.

708 Hoff, Katharina J., Simone Lange, Alexandre Lomsadze, Mark Borodovsky, and Mario Stanke. 709 2016. “BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark- 710 ET and AUGUSTUS.” Bioinformatics 32(5):767–69. doi: 711 10.1093/bioinformatics/btv661.

712 Hoff, Katharina J., Alexandre Lomsadze, Mark Borodovsky, and Mario Stanke. 2019. Whole- 713 Genome Annotation with BRAKER. Vol. 1962.

714 Inbar, M., A. Eshel, and D. Wool. 1995. “Interspecific Competition among Phloem-Feeding 715 Insects Mediated by Induced Host-Plant Sinks.” Ecology 76(5):1506–15. doi: 716 10.2307/1938152.

717 Jurka, Jerzy. 1998. “Repeats in Genomic DNA: Mining and Meaning.” Current Opinion in 718 Structural Biology 8(3):333–37. doi: 10.1016/S0959-440X(98)80067-5.

719 Katoh, K. 2002. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on 720 Fast Fourier Transform.” Nucleic Acids Research 30(14):3059–66. doi: 721 10.1093/nar/gkf436.

722 Katoh, Kazutaka, and Daron M. Standley. 2013. “MAFFT Multiple Sequence Alignment 723 Software Version 7: Improvements in Performance and Usability.” Molecular Biology 724 and Evolution 30(4):772–80. doi: 10.1093/molbev/mst010.

725 Keightley, Peter D., Rob W. Ness, Daniel L. Halligan, and Penelope R. Haddrill. 2014. 726 “Estimation of the Spontaneous Mutation Rate per Nucleotide Site in a Drosophila 727 Melanogaster Full-Sib Family.” Genetics 196(1):313–20. doi: 728 10.1534/genetics.113.158758.

729 Kim, Daehwan, Ben Langmead, and Steven L. Salzberg. 2015. “HISAT: A Fast Spliced Aligner 730 with Low Memory Requirements.” Nature Methods 12(4):357–60. doi: 731 10.1038/nmeth.3317.

732 Kimura, Motoo. 1968. “Evolutionary Rate at the Molecular Level.” Nature 217(5129):624–26. 733 doi: 10.1038/217624a0.

31

734 Knaus, Brian J., and Niklaus J. Grünwald. 2017. “VCFR : A Package to Manipulate and Visualize 735 Variant Call Format Data in R.” Molecular Ecology Resources 17(1):44–53. doi: 736 10.1111/1755-0998.12549.

737 Krogh, Anders, Björn Larsson, Gunnar Von Heijne, and Erik L. L. Sonnhammer. 2001. 738 “Predicting Transmembrane Protein Topology with a Hidden Markov Model: 739 Application to Complete Genomes.” Journal of Molecular Biology 305(3):567–80. doi: 740 10.1006/jmbi.2000.4315.

741 Lewis, S. E., S. M. Searle, N. Harris, M. Gibson, V. Lyer, J. Richter, C. Wiel, L. Bayraktaroglir, 742 E. Birney, M. A. Crosby, J. S. Kaminker, B. B. Matthews, S. E. Prochnik, C. D. Smithy, 743 J. L. Tupy, G. M. Rubin, S. Misra, C. J. Mungall, and M. E. Clamp. 2002. “Apollo: A 744 Sequence Annotation Editor.” Genome Biology 3(12):1–14. doi: 10.1186/gb-2002-3-12- 745 research0082.

746 Li, Heng. 2011. “A Statistical Framework for SNP Calling, Mutation Discovery, Association 747 Mapping and Population Genetical Parameter Estimation from Sequencing Data.” 748 Bioinformatics 27(21):2987–93. doi: 10.1093/bioinformatics/btr509.

749 Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA- 750 MEM.” ArXiv 1303.3997.

751 Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, 752 Goncalo Abecasis, and Richard Durbin. 2009. “The Sequence Alignment/Map Format 753 and SAMtools.” Bioinformatics 25(16):2078–79. doi: 10.1093/bioinformatics/btp352.

754 Liao, Yuxing, Jing Wang, Eric J. Jaehnig, Zhiao Shi, and Bing Zhang. 2019. “WebGestalt 2019: 755 Gene Set Analysis Toolkit with Revamped UIs and APIs.” Nucleic Acids Research 756 47(W1):W199–205. doi: 10.1093/nar/gkz401.

757 Lomsadze, Alexandre, Paul D. Burns, and Mark Borodovsky. 2014. “Integration of Mapped 758 RNA-Seq Reads into Automatic Training of Eukaryotic Gene Finding Algorithm.” 759 Nucleic Acids Research 42(15):1–8. doi: 10.1093/nar/gku557.

760 Martin, Marcel. 2011. “Cutadapt Removes Adapter Sequences from High-Throughput 761 Sequencing Reads.” EMBnet.Journal 17(1):10–10. doi: 10.14806/ej.17.1.200.

762 McDonald, J H, and M. Kreitman. 1991. “Adaptive Protein Evolution at the Adh Locus in 763 Drosophila.” Nature 351(6328):652–54. doi: 10.1038/351652a0.

764 McDonald, John H., and Martin Kreitman. 1991. “Adaptive Protein Evolution at the Adh Locus 765 in Drosophila.” Nature 351(6328):652–54. doi: 10.1038/351652a0.

766 McKenna, Aaron, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew 767 Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, and Mark A. 768 DePristo. 2010. “The Genome Analysis Toolkit: A MapReduce Framework for 769 Analyzing next-Generation DNA Sequencing Data.” Genome Research 20(9):1297– 770 1303. doi: 10.1101/gr.107524.110.

32

771 Metzger, Meredith B., Jonathan N. Pruneda, Rachel E. Klevit, and Allan M. Weissman. 2014. 772 “RING-Type E3 Ligases: Master Manipulators of E2 Ubiquitin-Conjugating Enzymes 773 and Ubiquitination.” Biochimica et Biophysica Acta - Molecular Cell Research 774 1843(1):47–60. doi: 10.1016/j.bbamcr.2013.05.026.

775 Nishimura, Osamu, Yuichiro Hara, and Shigehiro Kuraku. 2017. “GVolante for Standardizing 776 Completeness Assessment of Genome and Transcriptome Assemblies.” Bioinformatics 777 33(22):3635–37. doi: 10.1093/bioinformatics/btx445.

778 Ott, Swidbert R. 2008. “Confocal Microscopy in Large Insect Brains: Zinc-Formaldehyde 779 Fixation Improves Synapsin Immunostaining and Preservation of Morphology in Whole- 780 Mounts.” Journal of Neuroscience Methods 172(2):220–30. doi: 781 10.1016/j.jneumeth.2008.04.031.

782 Pavlidis, Pavlos, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis. 2013. 783 “SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes.” 784 Molecular Biology and Evolution 30(9):2224–34. doi: 10.1093/molbev/mst112.

785 Phanstiel, Douglas H., Alan P. Boyle, Carlos L. Araya, and Michael P. Snyder. 2014. “Sushi.R: 786 Flexible, Quantitative and Integrative Genomic Visualizations for Publication-Quality 787 Multi-Panel Figures.” Bioinformatics 30(19):2808–10. doi: 788 10.1093/bioinformatics/btu379.

789 Price, Morgan N., Paramvir S. Dehal, and Adam P. Arkin. 2010. “FastTree 2 – Approximately 790 Maximum-Likelihood Trees for Large Alignments.” PLOS ONE 5(3):e9490. doi: 791 10.1371/journal.pone.0009490.

792 Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, 793 David Bender, Julian Maller, Pamela Sklar, Paul I. W. De Bakker, Mark J. Daly, and Pak 794 C. Sham. 2007. “PLINK: A Tool Set for Whole-Genome Association and Population- 795 Based Linkage Analyses.” American Journal of Human Genetics 81(3):559–75. doi: 796 10.1086/519795.

797 Quinlan, Aaron R., and Ira M. Hall. 2010. “BEDTools: A Flexible Suite of Utilities for 798 Comparing Genomic Features.” Bioinformatics 26(6):841–42. doi: 799 10.1093/bioinformatics/btq033.

800 R Core Team. 2018. “R: A Language and Environment for Statistical Computing.”

801 Radío, Santiago, Rafael Sebastián Fort, Beatriz Garat, José Sotelo-Silveira, and Pablo Smircich. 802 2018. “UTRme: A Scoring-Based Tool to Annotate Untranslated Regions in 803 Trypanosomatid Genomes.” Frontiers in Genetics 9(December):671–671. doi: 804 10.3389/fgene.2018.00671.

805 Rahman, Atif, Ingileif Hallgrímsdóttir, Michael Eisen, and . 2018. “Association 806 Mapping from Sequencing Reads Using K-Mers.” ELife 7:e32920. doi: 807 10.7554/eLife.32920.

33

808 Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and 809 Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA- 810 Sequencing and Microarray Studies.” Nucleic Acids Research 43(7):e47–e47. doi: 811 10.1093/nar/gkv007.

812 Robinson, James T., Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, 813 Gad Getz, and Jill P. Mesirov. 2011. “Integrative Genomics Viewer.” Nature 814 Biotechnology 29(1):24–26. doi: 10.1038/nbt0111-24.

815 Robinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. 2009. “EdgeR: A Bioconductor 816 Package for Differential Expression Analysis of Digital Gene Expression Data.” 817 Bioinformatics 26(1):139–40. doi: 10.1093/bioinformatics/btp616.

818 Sawada, Yuji, Ryo Nakabayashi, Yutaka Yamada, Makoto Suzuki, Muneo Sato, Akane Sakata, 819 Kenji Akiyama, Tetsuya Sakurai, Fumio Matsuda, Toshio Aoki, Masami Yokota Hirai, 820 and Kazuki Saito. 2012. “RIKEN Tandem Mass Spectral Database (ReSpect) for 821 Phytochemicals: A Plant-Specific MS/MS-Based Data Resource and Database.” 822 Phytochemistry 82:38–45. doi: 10.1016/j.phytochem.2012.07.007.

823 Schindelin, Johannes, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, 824 Tobias Pietzsch, Stephan Preibisch, Curtis Rueden, Stephan Saalfeld, Benjamin Schmid, 825 Jean-Yves Yves Tinevez, Daniel James White, Volker Hartenstein, Kevin Eliceiri, Pavel 826 Tomancak, and Albert Cardona. 2012. “Fiji: An Open-Source Platform for Biological- 827 Image Analysis.” Nat Methods 9(7):676–82. doi: 10.1038/nmeth.2019.

828 Shapiro, Joshua A., Wei Huang, Chenhui Zhang, Melissa J. Hubisz, Jian Lu, David A. Turissini, 829 Shu Fang, Hurng-Yi Wang, Richard R. Hudson, Rasmus Nielsen, Zhu Chen, and Chung- 830 I. Wu. 2007. “Adaptive Genic Evolution in the Drosophila Genomes.” Proceedings of the 831 National Academy of Sciences 104(7):2271–76. doi: 10.1073/pnas.0610385104.

832 Shin, Ji-Hyung, Sigal Blay, Jinko Graham, and Brad McNeney. 2006. “LDheatmap : An R 833 Function for Graphical Display of Pairwise Linkage Disequilibria Between Single 834 Nucleotide Polymorphisms.” Journal of Statistical Software 16(Code Snippet 3). doi: 835 10.18637/jss.v016.c03.

836 Simao, Felipe A., Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and 837 Evgeny M. Zdobnov. 2015. “BUSCO: Assessing Genome Assembly and Annotation 838 Completeness with Single-Copy Orthologs.” Bioinformatics 31(19):3210–12. doi: 839 10.1093/bioinformatics/btv351.

840 Smit, A. F. A., Hubley, R. ,. Green, P. n.d. RepeatMasker Open-4.0.

841 Smith, N. G., and A. Eyre-Walker. 2002. “Adaptive Protein Evolution in Drosophila.” Nature 842 415(6875):1022–24. doi: 10.1038/4151022a.

843 Stanke, Mario, Mark Diekhans, Robert Baertsch, and . 2008. “Using Native and 844 Syntenically Mapped CDNA Alignments to Improve de Novo Gene Finding.” 845 Bioinformatics 24(5):637–44. doi: 10.1093/bioinformatics/btn013.

34

846 Stanke, Mario, Oliver Schöffmann, Burkhard Morgenstern, and Stephan Waack. 2006. “Gene 847 Prediction in Eukaryotes with a Generalized Hidden Markov Model That Uses Hints 848 from External Sources.” BMC Bioinformatics 7:1–11. doi: 10.1186/1471-2105-7-62.

849 Su, Shian, Charity W. Law, Casey Ah-Cann, Marie-Liesse Asselin-Labat, Marnie E. Blewitt, and 850 Matthew E. Ritchie. 2017. “Glimma: Interactive Graphics for Gene Expression 851 Analysis.” Bioinformatics 33(13):2050–52.

852 Thorvaldsdóttir, Helga, James T. Robinson, and Jill P. Mesirov. 2013. “Integrative Genomics 853 Viewer (IGV): High-Performance Genomics Data Visualization and Exploration.” 854 Briefings in Bioinformatics 14(2):178–92. doi: 10.1093/bib/bbs017.

855 Wagih, Omar. 2017. “Ggseqlogo: A Versatile R Package for Drawing Sequence Logos.” 856 Bioinformatics 33(22):3645–47. doi: 10.1093/bioinformatics/btx469.

857 Weisenfeld, Neil I., Vijay Kumar, Preyas Shah, Deanna M. Church, and David B. Jaffe. 2017. 858 “Direct Determination of Diploid Genome Sequences.” Genome Research 27(5):757–67. 859 doi: 10.1101/gr.235812.118.

860 Yang, Z. 2007. “PAML 4: Phylogenetic Analysis by Maximum Likelihood.” Molecular Biology 861 and Evolution 24(8):1586–91. doi: 10.1093/molbev/msm088.

862 Yu, Guangchuang. 2020. “Using Ggtree to Visualize Data on Tree-Like Structures.” Current 863 Protocols in Bioinformatics 69(1):e96. doi: 10.1002/cpbi.96.

864 Zhao, Chaoyang, Lucio Navarro Escalante, Hang Chen, Thiago R. Benatti, Jiaxin Qu, Sanjay 865 Chellapilla, Robert M. Waterhouse, David Wheeler, Martin N. Andersson, Riyue Bao, 866 Matthew Batterton, Susanta K. Behura, Kerstin P. Blankenburg, Doina Caragea, James C. 867 Carolan, Marcus Coyle, Mustapha El-Bouhssini, Liezl Francisco, Markus Friedrich, 868 Navdeep Gill, Tony Grace, Cornelis J. P. Grimmelikhuijzen, Yi Han, Frank Hauser, 869 Nicolae Herndon, Michael Holder, Panagiotis Ioannidis, Laronda Jackson, Mehwish 870 Javaid, Shalini N. Jhangiani, Alisha J. Johnson, Divya Kalra, Viktoriya Korchina, 871 Christie L. Kovar, Fremiet Lara, Sandra L. Lee, Xuming Liu, Christer Löfstedt, Robert 872 Mata, Tittu Mathew, Donna M. Muzny, Swapnil Nagar, Lynne V. Nazareth, Geoffrey 873 Okwuonu, Fiona Ongeri, Lora Perales, Brittany F. Peterson, Ling Ling Pu, Hugh M. 874 Robertson, Brandon J. Schemerhorn, Steven E. Scherer, Jacob T. Shreve, Denard 875 Simmons, Subhashree Subramanyam, Rebecca L. Thornton, Kun Xue, George M. 876 Weissenberger, Christie E. Williams, Kim C. Worley, Dianhui Zhu, Yiming Zhu, Marion 877 O. Harris, Richard H. Shukle, John H. Werren, Evgeny M. Zdobnov, Ming Shun Chen, 878 Susan J. Brown, Jeffery J. Stuart, and Stephen Richards. 2015. “A Massive Expansion of 879 Effector Genes Underlies Gall-Formation in the Wheat Pest Mayetiola Destructor.” 880 Current Biology 25(5):613–20. doi: 10.1016/j.cub.2014.12.057.

881 Zhao, Chaoyang, Claude Rispe, and Paul D. Nabity. 2019. “Secretory RING Finger Proteins 882 Function as Effectors in a Grapevine Galling Insect.” BMC Genomics 20(1):1–12. doi: 883 10.1186/s12864-019-6313-x.

35

884 Zheng, X., D. Levine, J. Shen, S. M. Gogarten, C. Laurie, and B. S. Weir. 2012. “A High- 885 Performance Computing Toolset for Relatedness and Principal Component Analysis of 886 SNP Data.” Bioinformatics 28(24):3326–28. doi: 10.1093/bioinformatics/bts606.

887

36

888

889 Fig. S1. Supporting genetic evidence that the 11 SNPs located near dgc are the only genetic 890 variants strongly associated with gall color. 891 (A-D) The first two components of a principal components analysis (PCA) of genome-wide 892 variation (A) and of variation in an 800 kbp window centered on dgc (C) reveal no large scale 893 association of genetic variation with gall color. The top ten principal components of the genome- 894 wide (B) and targeted (D) PCA each explain less than 2% and 5%, respectively, of the genetic 895 variance.

37

896 (E-I) Distribution of mapped, unpaired Illumina sequencing reads provides no evidence for 897 common large-scale genomic aberrations in aphids from either green (above) or red (below) 898 galls. Discordant reads from fundatrices making red (E) and green (F) galls from the entire 899 ~800kbp resequenced region are rare and are not specifically associated with gall color (G). 900 Close examination of the genomic region harboring SNPs strongly associated with gall color (red 901 vertical lines) shows that discordant reads are rare and not linked to the associated SNPs. 902 Discordant reads do not map preferentially anywhere in the genome (I).

38

903 904 Fig. S2. Genotypes of SNPs in the dgc gene region. 905 (A) Gene models for the region ~40.475-40.52 of Chromosome 1. Dgc is labelled in red, other 906 bicycle genes are labelled in yellow. 907 (B) Genotypes for each of the 90 individuals (42 from green galls, 48 from red galls; 4 low 908 coverage individuals were excluded) from the original GWAS inferred from approximately 60X 909 sequencing coverage for this region. Each row represents the genotypes at each of 698 SNPs. 910 Genotypes are color coded as homozygous reference alleles (green), heterozygous (yellow), 911 homozygous non-reference allele (orange). Data were thinned to exclude any SNPs within 500bp 912 of each other. 913 (C) Frequencies of each genotype at each variable position.

39

914

915 916 917 Fig. S3. An alignment-free association test identifies most of the same associated SNPs as 918 the GWAS. 919 (A) Distribution of significant kmers associated with gall color detected by the alignment-free 920 association test in the approximately 800 kb re-sequenced region centered on dgc (approximately 921 40.48-40.5 Mbp). The most significant kmers are found only located within and close to dgc. 922 Non-significant kmers are not shown. 923 (B) Region from approximately 40.475-40.51 of plot (A), showing the finer-scale distribution of 924 significant kmers. The red lines indicate SNPs detected in the original GWAS. All of the GWAS 925 SNPs are confirmed by the alignment-free association test. The absence of significant kmers 926 elsewhere in this region suggests that there are no genome rearrangements of transposon 927 insertions that are associated with gall color.

40

928 929 930 Fig. S4. Predicted amino-acid sequence of g16073. 931 Several secondary structure predictions are shown, including SignalP 5.0 prediction (Almagro 932 Armenteros et al. 2019) of an N-terminal secretion signal from positions 1-24 (bold, underlined) 933 and jpred4 secondary structure prediction (Drozdetskiy et al. 2015) below primary sequence (H 934 = helical, E = extended). Cysteines and tyrosines are colored red and blue, respectively, and the 935 two CYC motifs are underlined. Orange triangles above protein sequence indicate intron-exon 936 boundaries. 937

41

938 939

940 Fig. S5. Linkage disequilibrium in dgc gene region. 941 (A) Gene models for the region ~40.475-40.52 of Chromosome 1. Dgc is labelled in red, other 942 bicycle genes are labelled in yellow.

943 (B) Linkage disequilibrium between all sites. Strong linkage disequilibrium is observed between 944 the 11 SNPs linked to gall color (labelled as “Significant GWAS SNPs” above plot), but not 945 between these SNPs and other variable sites.

42

946

947 Fig. S6. No evidence for long-distance linkage disequilibrium among variable sites in an 948 800 kbp region centered on dgc.

43

949 (A) Gene models for the region ~40.1-40.9 of Chromosome 1. Dgc is labelled in red, other 950 bicycle genes are labelled in yellow. 951 (B) Genotypes for each of the 90 individuals (42 from green galls, 48 from red galls; 4 low 952 coverage individuals were excluded) from the original GWAS inferred from approximately 60X 953 sequencing coverage for this region. Each row represents the genotypes at each of 698 SNPs. 954 Genotypes are color coded as homozygous reference alleles (green), heterozygous (yellow), 955 homozygous non-reference allele (orange). Data were thinned to exclude any SNPs within 500bp 956 of each other. 957 (C) Frequencies of each genotype at each variable position. 958 (D) Linkage disequilibrium between all sites. Strong linkage disequilibrium is observed between 959 the 11 SNPs linked to gall color (labelled as “Significant GWAS SNPs” above plot), but not 960 between these SNPs and other variable sites.

44

961

962 Fig. S7. Expression of dgc 963 (A) Expression of dgc in salivary glands, whole bodies, or carcasses (body minus salivary 964 glands) throughout the H. cornu life cycle. Salivary glands were dissected from multiple 965 nymphal stages and adults of four generations representing major morphs of the life cycle, G1 966 (fundatrix), G2, G5, and sexuals. Dgc is expressed at highest levels in salivary glands of 967 fundatrices. Dgc expression declines in salivary glands through the instars of G2 animals and 968 later generations and was not detected in salivary glands of sexuals. Expression observed in full 969 bodies of G1 animals (fundatrices) probably reflect expression in the salivary glands and 970 expression was not observed in carcasses. 971 (B) MD plot showing that dgc is the only gene differentially expressed in salivary glands of 972 fundatrices sampled from red versus green galls and is strongly downregulated. The mean 973 expression level of dgc illustrates that it is among the most highly expressed genes in salivary 974 glands. Aphids from red and green galls were genotyped and carried dgcRed/dgcGreen and 975 dgcGreen/dgcGreen genotypes at all 11 SNPs associated with gall color, respectively. 976 (C) Expression levels of genes in the bicycle gene paralog group that includes dgc from a single 977 individual with a dgcGreen/dgcGreen genotype at all 11 SNPs that inhabited a red gall. Dgc is 978 expressed at similar levels as Horco_16072, similar to the pattern observed for individuals that 979 make green galls (Fig. 3B). Thus, it is likely that this individual, and possibly the remaining rare 980 fundatrices that induce red galls with dgcGreen/dgcGreen genotypes, do not do so through reduction

45

981 in dgc expression, but instead carry genetic variation at one or more other genes that induce red 982 galls. 983 (D) Number of fundatrix salivary gland RNA-seq reads containing alternative dgc exonic SNPs 984 at two genomic location for aphids collected from green (blue font) and red (red font) galls. Four 985 out of eight green and one out of two red samples deviate from the expectation of equal 986 expression from both alleles in an individual. Thus, there is no evidence for allelic imbalance 987 between fundatrices making green and red galls. In addition, both alternative alleles in red 988 samples are expressed at substantially lower levels than the same alleles in green samples. Thus, 989 a single copy of the SNPs that repress dgc expression (Fig. 3B) appear to repress dgc from both 990 alleles in heterozygous individuals. 991

46

992 993 Fig. S8. MS/MS spectra of anthocyanins detected in the red gall extract, which clearly 994 demonstrate the aglycone product ions previously reported for cyanidin-3,5-diglucoside 995 (A), delphinidin-3,5-diglucoside (B), malvidin-3,5-diglucoside (C), peonidin-3,5-diglucoside 996 (D), and petunidin-3,5-diglucoside (E). 997

47

998

999 Fig. S9. Genome-wide differential expression analysis of H. virginiana galls versus leaves 1000 (A) Genome-wide differential expression analysis of H. virginiana transcripts isolated from galls 1001 (N = 36) versus leaves (N = 17). Approximately 60% of expressed genes are differentially 1002 expressed between gall and leaf tissue at FDR < 0.05. 1003 (B) Gene ontology analysis of GO terms down (left) and up-regulated (right) in galls, presented 1004 as volcano plots. Genes involved in cell division and morphogenesis were strongly upregulated 1005 in galls and genes involved in photosynthesis were strongly down-regulated in galls.

48

1006

1007 Fig. S10. Further analysis of H. cornu fundatrix salivary gland enriched and bicycle genes.

49

1008 (A, B) Differential expression comparing gene expression in fundatrix salivary glands versus 1009 fundatrix whole bodies (A) and salivary glands of fundatrices versus sexuals (B) shown as 1010 volcano plots. Both comparisons reveal that a large number of genes are strongly over-expressed 1011 in fundatrix salivary glands. Intersection of the genes over-expressed in these two comparisons 1012 yielded the focal collection of genes over-expressed specifically in fundatrix salivary glands 1013 (Fig. 4B). Genes differentially over- or under-expressed at FDR < 0.05 are shown in blue or red, 1014 respectively. 1015 (C) Volcano plot of Gene Ontology over-representation analysis of “Annotated” genes from Fig. 1016 6B showing KEGG pathway terms for biological processes that were over-represented with a 1017 raw P value < 0.05. No GO terms were significantly under-represented with a raw P value < 1018 0.05. Analyses performed with WebGestalt (Liao et al. 2019). 1019 (D) MD plot of differential expression results shown in panel (B) with “Unannotated” and 1020 “Annotated” genes with signal peptides over-expressed in fundatrix salivary glands labeled in 1021 purple and yellow, respectively. 1022 (E) Distribution of singleton bicycle genes and paralog clusters in the H. cornu genome. Number 1023 of genes per cluster and genomic range is indicating by height and width, respectively, of blue 1024 bars above chromosomes. Histogram of number of bicycle genes per paralog cluster is shown in 1025 inset. 1026 (F) Example of part of a paralog cluster of bicycle genes from chromosome 7 of the H. cornu 1027 genome, illustrating abundance of small exons in each gene. 1028 (G) Number of exons per gene versus median exon size for H. cornu bicycle (blue) and non- 1029 bicycle (red) genes. Bicycle genes possess an unusually large number of unusually small exons. 1030 (H) Maximum likelihood phylogenetic tree of H. cornu bicycle gene amino acid sequences 1031 reveals extensive sequence divergence of bicycle genes. 1032 1033

50

1034

1035 Fig. S11. Further analysis of unannotated fundatrix salivary gland enriched genes. 1036 (A) Logo plot of the aligned predicted protein sequences from CWG genes. Sequence alignment 1037 used for logo was filtered to highlight conserved positions.

51

1038 (B, E) Volcano plots of differential expression of fundatrix versus sexual salivary glands with 1039 CWG genes (B) and Unclustered genes (E) labelled in purple for clusters identified through 1040 hierarchical clustering (Fig. 6C). 1041 (C, F) Expression levels of CWG (C) and Unclustered (F) genes in the salivary glands of 1042 individuals across four generations of the H. cornu life cycle. Lines connect the same gene 1043 across generations and are color coded by relative expression levels in G1, with blue to red 1044 representing most to least strongly expressed, respectively. 1045 (D) bicycle genes are, on average, the most strongly differentially expressed category of 1046 unannotated genes.

52

1047 1048 Figure S12. Comparison of rate of non-synonymous to synonymous substitutions in bicycle 1049 versus non-bicycle genes 1050 (A) The majority of bicycle genes display dN/dS values great than 1, with few showing strong 1051 sequence conservation (dN/dS <<1). Dashed vertical red line indicates dN/dS = 1. 1052 (B) Non-bicycle genes are more conserved, on average, than bicycle genes (Mann-Whitney U 1053 test p=2.6e-76. Dashed vertical red line indicates dN/dS = 1. 1054 (C) Mean number of adaptive non-synonymous substitutions scaled by protein length for 1055 different categories of genes over-expressed in fundatrix salivary glands. As a proportion of 1056 protein length, bicycle genes display the fastest rate of adaptive evolution of any category of 1057 these genes. Note that the four categories on the right include all genes shown on the left, but 1058 categorized by whether genes were annotated and included a signal peptide. Thus, for example, 1059 the category “No annotation with signal peptide” is composed mostly of bicycle and CWG genes.

53

1060 1061 1062 Fig. S13. Polymorphism is depressed relative to divergence in bicycle gene regions. 1063 (A) Gene models in 800 kb region including bicycle gene cluster containing dgc shown above. 1064 Divergence between (black line) and polymorphism within H. cornu (blue line) and H. 1065 hamamelidis (pink line) in 3000bp windows shown below. 1066 (B-H) Gene models and population genomic statistics for seven additional genomic regions 1067 containing bicycle gene clusters. 1068 1069

54

1070 1071

1072 1073 Fig. S14. Strongly reduced polymorphism to divergence levels are found in bicycle gene 1074 regions 1075 (A) Genome-wide non-overlapping window moving averages of polymorphism (Pi) divided by 1076 divergence (Dxy) between H. cornu (black lines) and H. hamamelidis (purple lines). Gene 1077 models in the 800 kbp region centered on dgc. Bicycle genes are located within ranges indicated 1078 in orange. 1079 (B and C) Ratio of Pi to Dxy for bicycle and non-bicycle gene regions in H. cornu (B) and H. 1080 hamamelidis (C). 1081 (D and E) The observed difference in Pi/Dxy between non-bicycle and bicycle genes (dashed red 1082 line) is much larger than the expectation generated by permuting the locations of Pi/Dxy values 1083 relative to gene locations for both H. cornu (D) and H. hamamelidis (E).

55

1084 1085 Fig. S15. Genome-wide distribution of signatures of recent selective sweeps detected by 1086 SweeD in H. cornu. 1087 (A) Genome-wide distribution of the loge of the composite likelihood ratio (CLR) for evidence of 1088 a recent selective sweep is shown in purple. The location of significant signals (P < 0.01) 1089 calculated after simulations assuming neutrality (Supplementary text) is indicated with red 1090 circles. The locations of bicycle genes are indicated with orange rectangles. 1091 (B) The average distance from each bicycle gene to the closest significant selective sweep signal 1092 is shown with dashed red line and the values after 1000 permutations of sweep signals relative to 1093 gene locations are shown as grey histogram. The observed sweep signals are closer to bicycle 1094 genes than expected by chance.

56

1095 1096 Fig. S16. Genome-wide distribution of signatures of recent selective sweeps detected by 1097 SweeD in H. hamamelidis. 1098 (A) Genome-wide distribution of the loge of the composite likelihood ratio (CLR) for evidence of 1099 a recent selective sweep is shown in purple. The location of significant signals (P < 0.01) 1100 calculated after simulations assuming neutrality (Supplementary text) is indicated with red 1101 circles. The locations of bicycle genes are indicated with orange rectangles. 1102 (B) The average distance from each bicycle gene to the closest significant selective sweep signal 1103 is shown with dashed red line and the values after 1000 permutations of sweep signals relative to 1104 gene locations are shown as grey histogram. The observed sweep signals are closer to bicycle 1105 genes than expected by chance. 1106 1107 1108 1109

57

1110 Table S1. dN/dS test for significant positive selection on genes diverged between H. cornu 1111 and H. hamamelidis. Only genes estimated to have experienced significant positive or 1112 negative are included. 1113 dN/dS <1 >1 Proportion >1 non-bicycle 5716 229 0.039 bicycle 10 58 0.853

Proportion bicycle 0.002 0.202 1114 1115 1116 Table S2. McDonald-Kreitman alpha and gene statistics for different classes of genes 1117 overexpressed in H. cornu fundatrix salivary glands. 1118 non- synonymous synonymous CMH substitutions ± substitutions ± alpha CMH alpha CI SD SD Mean # CDS bicycle 0.33 0.242-0.408 8.0 ± 6.3 1.7 ± 1.5 683 bicycle dnds > 1 0.45 0.362-0.533 9.8 ± 6.7 1.3 ± 1.2 689 bicycle dnds significantly > 1 0.62 0.450-0.733 15.9 ± 8.8 1.1 ± 1.3 692 CWG 0.38 0.246-0.498 15.4 ± 7.8 6.3 ± 3.4 2337 unclustered 0.27 0.110-0.410 3.4 ± 4.0 1.4 ± 2.0 710 Annotated with signal peptide 0.42 0.243-0.557 3.8 ± 4.8 2.5 ± 2.2 1414 No annotation with signal peptide 0.34 0.262-0.404 8.7 ± 6.9 2.2 ± 2.3 848 Annotated without signal peptide 0.41 0.349-0.461 3.1 ± 4.8 3.2 ± 4.1 1750 No annotation without signal peptide 0.33 0.189-0.439 4.5 ± 4.8 1.4 ± 1.8 710 1119

1120 Table S3. Details of biological samples, sequence files, and SRA accession numbers for 1121 genome sequencing. 1122 Sample # Collection Collection Species Library Sequenci # Read pairs Accession # Date Location Prep ng Method Method MP2 15-Jun- Morven Park, Hormaphis 10X Illumina 408,093,791 2016 Leesburg, VA cornu Chromium PE151 MP2 15-Jun- Morven Park, Hormaphis Chicago Illumina 234,423,934 2016 Leesburg, VA cornu PE151 80 31-May- Gap Road, Hormaphis HiC Illumina 281,999,702 2017 Rt. 651, cornu PE151 Leesburg, VA ML97Red 7-Jun- Mountain Hormaphis 10X Illumina 91,611,428 2018 Lakes hamamelidis Chromium PE151

58

Biological Station, Pembroke, VA Hamamelis 17-Apr- Janelia Hamamelis 10X Illumina 303,926,536 virginiana 2019 Research virginiana Chromium PE150 nuclei Campus, Ashburn, VA 1123

59

1124 Table S4. Details of biological samples, sequence files, and SRA accession numbers for H. 1125 cornu RNA-sequencing. WGB refers to Whole Gene Body library preparation method 1126 described in Supplemental Methods. SS refers to the Smart-SCRB library preparation 1127 methods described in (Cembrowski et al., 2018). JRC = Janelia Research Campus, 1128 Ashburn, VA. 1129 Sample ID Collec Collect Stage Gener Organ Library Sequen # Read Accession tion ion ation Prep cing pairs # Locati Date Method Method on 1st_I_FundS JRC 6-4- 1 1 SG WGB Illumina 12643792, G 2016 PE100 33584393 1st_II_Fund JRC 6-4- 1 1 SG WGB Illumina 12412280, SG 2016 PE100 29986671 1st_III_Fund JRC 6-4- 1 1 SG WGB Illumina 9566012, SG 2016 PE100 28830029 early2nd_I_ JRC 6-4- 2 1 SG WGB Illumina 11846803, FundSG 2016 PE100 31258786 early2nd_II_ JRC 6-4- 2 1 SG WGB Illumina 14097211, FundSG 2016 PE100 33244826 early2nd_III JRC 6-4- 2 1 SG WGB Illumina 10610077, _FundSG 2016 PE100 26423617 late2nd_I_F JRC 6-4- 2 1 SG WGB Illumina 11783417, undSG_rep1 2016 PE100 29867606 late2nd_II_F JRC 6-4- 2 1 SG WGB Illumina 14189826, undSG_rep1 2016 PE100 37094462 late2nd_III_ JRC 6-4- 2 1 SG WGB Illumina 12137125, FundSG_rep 2016 PE100 30412534 1 late2nd_IV_ JRC 6-4- 2 1 SG WGB Illumina 34021474 FundSG 2016 PE100 late2nd_V_F JRC 6-4- 2 1 SG WGB Illumina 33255973 undSG 2016 PE100 late2nd_VI_ JRC 6-4- 2 1 SG WGB Illumina 29588804 FundSG 2016 PE100 3rd_I_FundS JRC 29-4- 2 1 SG WGB Illumina 30833668 G 2016 PE100 3rd_II_Fund JRC 29-4- 2 1 SG WGB Illumina 33651454 SG 2016 PE100 3rd_III_Fund JRC 29-4- 2 1 SG WGB Illumina 40510300 SG 2016 PE100 4th_I_FundS JRC 3-5- 2 1 SG WGB Illumina 36267453 G 2016 PE100 4th_II_Fund JRC 3-5- 2 1 SG WGB Illumina 37712709 SG 2016 PE100 4th_III_Fund JRC 3-5- 2 1 SG WGB Illumina 37200793 SG 2016 PE100 adult_I_Fun JRC 3-5- 2 1 SG WGB Illumina 37826115 dSG 2016 PE100

60

adult_II_Fun JRC 3-5- 2 1 SG WGB Illumina 31970558 dSG 2016 PE100 adult_III_Fu JRC 4-5- 2 1 SG WGB Illumina 29795163 ndSG 2016 PE100 whole_1st_I JRC 6-4- 1 1 Whole WGB Illumina 36313971 2016 PE100 whole_early JRC 6-4- 2 1 Whole WGB Illumina 44067290 2nd_I_Fund 2016 PE100 whole_late2 JRC 6-4- 2 1 Whole WGB Illumina 32843185 nd_I_Fund 2016 PE100 whole_v_lat JRC 6-4- 2 1 Whole WGB Illumina e2nd_I_Fun 2016 PE100 35509383 d whole_3rd_I JRC 29-4- 3 1 Whole WGB Illumina 39418696 _Fund 2016 PE100 whole_3rd_I JRC 29-4- 3 1 Whole WGB Illumina 40421752 I_Fund 2016 PE100 whole_4th_I JRC 3-5- 4 1 Whole WGB Illumina 38585734 _Fund 2016 PE100 whole_adult JRC 4-5- Ad 1 Whole WGB Illumina 42742613 _I_Fund 2016 PE100 2nd_gen_1s JRC 2-6- 1 2 SG WGB Illumina 11405653 tinst_SG1 2018 PE150 2nd_gen_1s JRC 2-6- 1 2 SG WGB Illumina 10794548 tinst_SG2 2018 PE150 2nd_gen_1s JRC 2-6- 1 2 SG WGB Illumina 12574061 tinst_SG3 2018 PE150 2ndgen_3rdi JRC 19-5- 3 2 SG WGB Illumina nst_sg_I_1 2016 PE100 20922005 2ndgen_3rdi JRC 19-5- 3 2 SG WGB Illumina nst_sg_II_1 2016 PE100 19238796 2ndgen_3rdi JRC 19-5- 3 2 SG WGB Illumina nst_sg_III_1 2016 PE100 19740444 2ndgen_3rdi JRC 19-5- 3 2 Whole WGB Illumina nst_whole_I 2016 PE100 _1 18321283 2ndgen_3rdi JRC 19-5- 3 2 Whole WGB Illumina nst_whole_I 2016 PE100 I_1 18971169 2ndgen_3rdi JRC 19-5- 3 2 Whole WGB Illumina nst_whole_I 2016 PE100 II_1 19463547 2nd_gen_Ad JRC 2-6- 3 Ad SG WGB Illumina 12714848 _SG4 2018 PE150 2nd_gen_Ad JRC 2-6- 3 Ad SG WGB Illumina 10858738 _SG5 2018 PE150 2nd_gen_Ad JRC 2-6- 3 Ad SG WGB Illumina 13120645 _SG6 2018 PE150 ex_bet_AdS JRC 8-7- Ad Ex SG WGB Illumina 4914246, G8 2018 Betula PE150 13175533 ex_bet_4thS JRC 8-7- 4 Ex SG WGB Illumina 6611299, G9 2018 Betula PE150 18243185 61

ex_bet_4thS JRC 8-7- 4 Ex SG WGB Illumina 6578419, G10 2018 Betula PE150 17834081 ex_bet_AdS JRC 8-7- Ad Ex SG WGB Illumina 5627200, G11 2018 Betula PE150 10425494 ex_bet_AdS JRC 8-7- Ad Ex SG WGB Illumina 6915842, G12 2018 Betula PE150 13798518 ex_bet_AdS JRC 8-7- Ad Ex SG WGB Illumina 7365598, G13 2018 Betula PE150 14951729 ex_bet_nym JRC 8-7- 4 Ex SG WGB Illumina 11106332 phs_15 2018 Betula PE150 ex_bet_nym JRC 8-7- 4 Ex SG WGB Illumina 12481764 phs_16 2018 Betula PE150 sex_SG_19_ JRC 22-10- Nymph Sexual SG WGB Illumina 9662029 1 2019 PE150 sex_SG_19_ JRC 22-10- Nymph Sexual SG WGB Illumina 12690319 2 2019 PE150 sex_SG_19_ JRC 22-10- Nymph Sexual SG WGB Illumina 10875978 3 2019 PE150 sex_SG_19_ JRC 22-10- Nymph Sexual SG WGB Illumina 9540662 4 2019 PE150 sex_SG_19_ JRC 22-10- Nymph Sexual SG WGB Illumina 9632389 5 2019 PE150 D02 JRC 18-11- Nymph Sexual 3 pooled SS Illumina 23182206 2018 heads SE125 E02 JRC 18-11- Nymph Sexual 3 pooled SS Illumina 2018 bodies SE125 23157666 F02 JRC 18-11- Nymph Sexual 4 pooled SS Illumina 2018 salivary SE125 glands 24353653 G02 JRC 18-11- Nymph Sexual 3 pooled SS Illumina 2018 salivary SE125 glands 21591985 H02 JRC 18-11- Nymph Sexual 3 pooled SS Illumina 2018 salivary SE125 glands 17705013 1130 1131

62

1132 Table S5. Details of biological samples, sequence files, and SRA accession numbers for H. 1133 cornu GWAS whole-genome re-sequencing. 1134 Sample # Collection Collection Gall Library Sequencing # Read Accession Location Date Color Prep Method pairs # Method Moore St & 10-May- Tn5 Illumina B02L02-gr Guyot Ave., 2017 Green PE150 Princeton, NJ 6366491 Moore St & 10-May- Tn5 Illumina B03L03 Guyot Ave., 2017 Green PE150 Princeton, NJ 4593475 Goose Creek 31-May- Tn5 Illumina GCL74 Lane, Leesburg, 2017 Green PE150 VA 4809727 Herrontown 31-May- Tn5 Illumina HW03L02 Woods, 2017 Green PE150 Princeton, NJ 3884184 Herrontown 31-May- Tn5 Illumina HW03L07 Woods, 2017 Green PE150 Princeton, NJ 8469155 Herrontown 31-May- Tn5 Illumina HW04L01 Woods, 2017 Green PE150 Princeton, NJ 4589629 Herrontown 31-May- Tn5 Illumina HW04L07 Woods, 2017 Green PE150 Princeton, NJ 6432189 Herrontown 31-May- Tn5 Illumina HW05L02 Woods, 2017 Green PE150 Princeton, NJ 5734918 Herrontown 31-May- Tn5 Illumina HW06L01G01 Woods, 2017 Green PE150 Princeton, NJ 5168885 Herrontown 31-May- Tn5 Illumina HW07L01 Woods, 2017 Green PE150 Princeton, NJ 5796597 Herrontown 31-May- Tn5 Illumina HW08L03 Woods, 2017 Green PE150 Princeton, NJ 5326728 Herrontown 31-May- Tn5 Illumina HW09L05 Woods, 2017 Green PE150 Princeton, NJ 3783083 Herrontown 31-May- Tn5 Illumina HW10L01 Woods, 2017 Green PE150 Princeton, NJ 3390296 Herrontown 31-May- Tn5 Illumina HW11L01 Woods, 2017 Green PE150 Princeton, NJ 6969888

63

Herrontown 31-May- Tn5 Illumina HW02L05 Woods, 2017 Green PE150 Princeton, NJ 4714807 Herrontown 31-May- Tn5 Illumina HWL04 Woods, 2017 Green PE150 Princeton, NJ 5643363 Herrontown 31-May- Tn5 Illumina HWL07 Woods, 2017 Green PE150 Princeton, NJ 4917264 Janelia Research 3-Apr- Tn5 Illumina JRC03 Campus, 2017 Green PE150 Ashburn, VA 4863866 Janelia Research 3-Apr- Tn5 Illumina JRC04 Campus, 2017 Green PE150 Ashburn, VA 4523085 Janelia Research 3-Apr- Tn5 Illumina JRC10 Campus, 2017 Green PE150 Ashburn, VA 5182279 Janelia Research 3-Apr- Tn5 Illumina JRC12 Campus, 2017 Green PE150 Ashburn, VA 4178879 Janelia Research 3-Apr- Tn5 Illumina JRC13 Campus, 2017 Green PE150 Ashburn, VA 1985664 Janelia Research 3-Apr- Tn5 Illumina JRC15 Campus, 2017 Green PE150 Ashburn, VA 4191677 Janelia Research 3-Apr- Tn5 Illumina JRC17 Campus, 2017 Green PE150 Ashburn, VA 5230834 Janelia Research 3-Apr- Tn5 Illumina JRC19 Campus, 2017 Green PE150 Ashburn, VA 4844552 Janelia Research 3-Apr- Tn5 Illumina JRC20 Campus, 2017 Green PE150 Ashburn, VA 4750713 Janelia Research 3-Apr- Tn5 Illumina JRC27 Campus, 2017 Green PE150 Ashburn, VA 4868486 Man Lu Ridge, 16-May- Tn5 Illumina MLR36 Green Maryland 2017 PE150 3547891 Man Lu Ridge, 16-May- Tn5 Illumina MLR37 Green Maryland 2017 PE150 3401601 Morven Park, 20-May- Tn5 Illumina MP55 Green Leesburg, VA 2017 PE150 4030936 Morven Park, 20-May- Tn5 Illumina MP56 Green Leesburg, VA 2017 PE150 5264763 Old Waterford 20-May- Tn5 Illumina OWR61 Road, Leesburg, 2017 Green PE150 VA 4227643

64

Old Waterford 30-May- Tn5 Illumina OWR73 Road, Leesburg, 2017 Green PE150 VA 4276012 Point of Rocks, 16-May- Tn5 Illumina POR29 Green Maryland 2017 PE150 3962098 St. Michael's 1-Jun- Tn5 Illumina SM01L01 Reserve, 2017 Green PE150 Princeton, NJ 4883070 St. Michael's 1-Jun- Tn5 Illumina SM02L01 Reserve, 2017 Green PE150 Princeton, NJ 4381921 St. Michael's 1-Jun- Tn5 Illumina SM03L01 Reserve, 2017 Green PE150 Princeton, NJ 4311021 St. Michael's 1-Jun- Tn5 Illumina SM05L01 Reserve, 2017 Green PE150 Princeton, NJ 4506282 St. Michael's 1-Jun- Tn5 Illumina SM05L05 Reserve, 2017 Green PE150 Princeton, NJ 3972013 Woodfield 26-May- Tn5 Illumina W02L01G01 Reservation, 2017 Green PE150 Princeton, NJ 5347819 Woodfield 26-May- Tn5 Illumina W03L01 Reservation, 2017 Green PE150 Princeton, NJ 4536267 Woodfield 26-May- Tn5 Illumina W05L03G02 Reservation, 2017 Green PE150 Princeton, NJ 6397623 Woodfield 26-May- Tn5 Illumina W06L02 Reservation, 2017 Green PE150 Princeton, NJ 6315684 Moore St & 10-May- Tn5 Illumina B01L01 Guyot Ave., 2017 Red PE150 Princeton, NJ 8135750 Moore St & 10-May- Tn5 Illumina B01L03 Guyot Ave., 2017 Red PE150 Princeton, NJ 4429167 Moore St & 10-May- Tn5 Illumina B02L01 Guyot Ave., 2017 Red PE150 Princeton, NJ 4620277 Moore St & 10-May- Tn5 Illumina B02L02-red Guyot Ave., 2017 Red PE150 Princeton, NJ 5625715 Moore St & 10-May- Tn5 Illumina B03L01 Guyot Ave., 2017 Red PE150 Princeton, NJ 10312327 Herrontown 31-May- Tn5 Illumina HW02L02 Woods, 2017 Red PE150 Princeton, NJ 4039449

65

Herrontown 31-May- Tn5 Illumina HW03L01 Woods, 2017 Red PE150 Princeton, NJ 4130302 Herrontown 31-May- Tn5 Illumina HW03L03 Woods, 2017 Red PE150 Princeton, NJ 3384712 Herrontown 31-May- Tn5 Illumina HW03L04 Woods, 2017 Red PE150 Princeton, NJ 3904224 Herrontown 31-May- Tn5 Illumina HW06L01G02 Woods, 2017 Red PE150 Princeton, NJ 5403059 Herrontown 31-May- Tn5 Illumina HW07L02 Woods, 2017 Red PE150 Princeton, NJ 8799194 Herrontown 31-May- Tn5 Illumina HW09L02 Woods, 2017 Red PE150 Princeton, NJ 3673337 Herrontown 31-May- Tn5 Illumina HW09L03 Woods, 2017 Red PE150 Princeton, NJ 5235539 Herrontown 31-May- Tn5 Illumina HW09L07 Woods, 2017 Red PE150 Princeton, NJ 8256988 Herrontown 31-May- Tn5 Illumina HW10L02 Woods, 2017 Red PE150 Princeton, NJ 5492996 Herrontown 31-May- Tn5 Illumina HWL03 Woods, 2017 Red PE150 Princeton, NJ 7814556 Janelia Research 3-Apr- Tn5 Illumina JRC01 Campus, 2017 Red PE150 Ashburn, VA 4617289 Janelia Research 3-Apr- Tn5 Illumina JRC06 Campus, 2017 Red PE150 Ashburn, VA 5709657 Janelia Research 3-Apr- Tn5 Illumina JRC11 Campus, 2017 Red PE150 Ashburn, VA 4441152 Janelia Research 3-Apr- Tn5 Illumina JRC14 Campus, 2017 Red PE150 Ashburn, VA 4255537 Janelia Research 3-Apr- Tn5 Illumina JRC16 Campus, 2017 Red PE150 Ashburn, VA 7587374 Janelia Research 3-Apr- Tn5 Illumina JRC18 Campus, 2017 Red PE150 Ashburn, VA 4944638 Janelia Research 3-Apr- Tn5 Illumina JRC22 Campus, 2017 Red PE150 Ashburn, VA 6692470

66

Janelia Research 3-Apr- Tn5 Illumina JRC23 Campus, 2017 Red PE150 Ashburn, VA 4475288 Janelia Research 3-Apr- Tn5 Illumina JRC25 Campus, 2017 Red PE150 Ashburn, VA 5506970 Janelia Research 21-May- Tn5 Illumina JRC84 Campus, 2017 Red PE150 Ashburn, VA 7362956 Janelia Research 21-May- Tn5 Illumina JRC85 Campus, 2017 Red PE150 Ashburn, VA 7150172 Janelia Research 21-May- Tn5 Illumina JRC86 Campus, 2017 Red PE150 Ashburn, VA 6592107 Janelia Research 21-May- Tn5 Illumina JRC87 Campus, 2017 Red PE150 Ashburn, VA 6071483 Janelia Research 21-May- Tn5 Illumina JRC88 Campus, 2017 Red PE150 Ashburn, VA 3476862 Janelia Research 21-May- Tn5 Illumina JRC89 Campus, 2017 Red PE150 Ashburn, VA 6474867 Janelia Research 21-May- Tn5 Illumina JRC90 Campus, 2017 Red PE150 Ashburn, VA 7186265 Janelia Research 21-May- Tn5 Illumina JRC91 Campus, 2017 Red PE150 Ashburn, VA 6643361 Janelia Research 21-May- Tn5 Illumina JRC92 Campus, 2017 Red PE150 Ashburn, VA 4674510 Janelia Research 21-May- Tn5 Illumina JRC93 Campus, 2017 Red PE150 Ashburn, VA 5182706 Janelia Research 21-May- Tn5 Illumina JRC94 Campus, 2017 Red PE150 Ashburn, VA 4946068 Janelia Research 21-May- Tn5 Illumina JRC95 Campus, 2017 Red PE150 Ashburn, VA 7026186 Janelia Research 21-May- Tn5 Illumina JRC96 Campus, 2017 Red PE150 Ashburn, VA 4946848 Janelia Research 21-May- Tn5 Illumina JRC97 Campus, 2017 Red PE150 Ashburn, VA 6115771 Old Waterford 30-May- Tn5 Illumina OWR71 Road, Leesburg, 2017 Red PE150 VA 5324101

67

Point of Rocks, 16-May- Tn5 Illumina POR30 Red Maryland 2017 PE150 3709000 Point of Rocks, 16-May- Tn5 Illumina POR31 Red Maryland 2017 PE150 6515377 St. Michael's 1-Jun- Tn5 Illumina SM05L02 Reserve, 2017 Red PE150 Princeton, NJ 5328410 St. Michael's 1-Jun- Tn5 Illumina SM05L04 Reserve, 2017 Red PE150 Princeton, NJ 4428657 Woodfield 31-May- Tn5 Illumina W02L02 Reservation, 2017 Red PE150 Princeton, NJ 4068904 Woodfield 31-May- Tn5 Illumina W02L06 Reservation, 2017 Red PE150 Princeton, NJ 3336967 Woodfield 31-May- Tn5 Illumina W05L05 Reservation, 2017 Red PE150 Princeton, NJ 4073188 1135

68

1136 Table S6. Details of biological samples, sequence files, and SRA accession numbers for H. 1137 cornu GWAS targeted re-sequencing of approximately 800 kbp of Chromosome 1. 1138 Sample # Collection Collection Gall Library Sequencing # Read Accession Location Date Color Prep Method pairs # Method B01L02 Moore St & 10-May- Green Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 214163 B02L02-gr Moore St & 10-May- Green Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 190639 B03L03 Moore St & 10-May- Green Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 212907 GCL74 Goose Creek 31-May- Green Tn5 Illumina Lane, Leesburg, 2017 PE150 VA 306230 HW03L02 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 291669 HW03L07 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 306531 HW04L07 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 270871 HW05L02 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 262310 HW06L01G01 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 269379 HW07L01 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 237951 HW08L03 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 257010 HW09L05 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 211921 HW10L01 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 241629 HW11L01 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 237027

69

HWL02L05 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 272737 HWL04 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 249525 HWL07 Herrontown 31-May- Green Tn5 Illumina Woods, 2017 PE150 Princeton, NJ 304184 JH01L04G01 Jockey Hollow, 12-Jun- Green Tn5 Illumina Morristown, NJ 2017 PE150 225593 JH02L04G02 Jockey Hollow, 3-Apr- Green Tn5 Illumina Morristown, NJ 2017 PE150 241035 JRC03 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 270579 JRC04 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 291029 JRC12 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 291716 JRC15 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 278157 JRC19 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 296693 JRC27 Janelia Research 3-Apr- Green Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 289882 MLR36 Man Lu Ridge Rd., 16-May- Green Tn5 Illumina Jefferson, MD 2017 PE150 237264 MLR37 Man Lu Ridge Rd., 16-May- Green Tn5 Illumina Jefferson, MD 2017 PE150 215201 MP55 Morven Park, 20-May- Green Tn5 Illumina Leesburg, VA 2017 PE150 233325 MP56 Morven Park, 20-May- Green Tn5 Illumina Leesburg, VA 2017 PE150 201670 OWR61 Old Waterford 20-May- Green Tn5 Illumina Road, Leesburg, 2017 PE150 VA 175533 OWR73 Old Waterford 20-May- Green Tn5 Illumina Road, Leesburg, 2017 PE150 VA 132423 POR29 Point of Rocks, 16-May- Green Tn5 Illumina MD 2017 PE150 236262 SM01L01 St. Michael’s 1-Jun- Green Tn5 Illumina Reserve, 2017 PE150 Princeton, NJ 325832

70

SM02L01 St. Michael’s 1-Jun- Green Tn5 Illumina Reserve, 2017 PE150 Princeton, NJ 313663 SM03L01 St. Michael’s 1-Jun- Green Tn5 Illumina Reserve, 2017 PE150 Princeton, NJ 257124 SM05L01 St. Michael’s 1-Jun- Green Tn5 Illumina Reserve, 2017 PE150 Princeton, NJ 292603 SM05L05 St. Michael’s 1-Jun- Green Tn5 Illumina Reserve, 2017 PE150 Princeton, NJ 231169 W01L01 Woodfield 26-May- Green Tn5 Illumina Reservation, 2017 PE150 Princeton, NJ 300257 W02L01G01 Woodfield 26-May- Green Tn5 Illumina Reservation, 2017 PE150 Princeton, NJ 315314 W03L01 Woodfield 26-May- Green Tn5 Illumina Reservation, 2017 PE150 Princeton, NJ 313024 W05L03G02 Woodfield 26-May- Green Tn5 Illumina Reservation, 2017 PE150 Princeton, NJ 223027 W06L02 Woodfield 26-May- Green Tn5 Illumina Reservation, 2017 PE150 Princeton, NJ 303327 B01L01 Moore St. & 10-May- Red Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 303803 B01L03 Moore St & 10-May- Red Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 323593 B02L01 Moore St & 10-May- Red Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 256759 B02L02 Moore St & 10-May- Red Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 305206 B03L01 Moore St & 10-May- Red Tn5 Illumina Guyot Ave., 2017 PE150 Princeton, NJ 319941 HW02L02 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 333873 HW03L01 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 258120 HW03L03 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 275293 HW03L04 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 299356 HW06L01G02 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 260553

71

HW06L02G03 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 300112 HW07L02 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 259311 HW09L02 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 260464 HW09L03 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 247150 HW09L07 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 252787 HW10L02 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 254752 HWL03 Herrontown 31-May- Red Tn5 Illumina woods, NJ 2017 PE150 254968 JRC01 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 252369 JRC06 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 317542 JRC11 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 304971 JRC14 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 327441 JRC16 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 253464 JRC18 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 362357 JRC22 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 226260 JRC23 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 338219 JRC25 Janelia Research 3-Apr- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 235476 JRC84 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 244087 JRC85 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 242809 JRC86 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 277938

72

JRC87 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 294687 JRC88 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 283813 JRC89 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 293816 JRC90 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 307410 JRC91 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 301578 JRC92 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 274484 JRC93 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 251875 JRC94 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 246812 JRC95 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 255648 JRC96 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 228158 JRC97 Janelia Research 21-May- Red Tn5 Illumina Campus, 2017 PE150 Ashburn, VA 242492 OWR71 Old Waterford 30-May- Red Tn5 Illumina Road, Leesburg, 2017 PE150 VA 271725 POR30 Point of Rocks, 16-May- Red Tn5 Illumina MD 2017 PE150 264484 POR31 Point of Rocks, 16-May- Red Tn5 Illumina MD 2017 PE150 132066 SM05L02 St. Michael’s 1-Jun- Red Tn5 Illumina preserve, NJ 2017 PE150 317520 SM05L04 St. Michael’s 1-Jun- Red Tn5 Illumina preserve, NJ 2017 PE150 327054 W02L02 Woodfield 31-May- Red Tn5 Illumina reservation, NJ 2017 PE150 272076 W02L06 Woodfield 31-May- Red Tn5 Illumina reservation, NJ 2017 PE150 266667 W05L05 Woodfield 31-May- Red Tn5 Illumina reservation, NJ 2017 PE150 233816 1139

73

1140 Table S7. Details of biological samples, sequence files, and SRA accession numbers for H. 1141 hamamelidis whole-genome re-sequencing. MLBS = Mountain Lakes Biological Station, 1142 Pembroke, Virginia. 1143 Sample # Collection Collection Library Sequencing # Read pairs Accession # Location Date Prep Method Method Goose Creek Lane, 31-May- Tn5 Illumina GCL77 Leesburg, VA 2017 PE150 7327275 MLBS 7-Jul-2018 Tn5 Illumina ML10_Gr PE150 7333060 MLBS 7-Jul-2018 Tn5 Illumina ML14_Gr PE150 7868181 MLBS 7-Jul-2018 Tn5 Illumina ML15_Gr PE150 6656988 MLBS 7-Jul-2018 Tn5 Illumina ML17_Gr PE150 7010451 MLBS 7-Jul-2018 Tn5 Illumina ML18_Gr PE150 6481897 MLBS 7-Jul-2018 Tn5 Illumina ML2_Gr PE150 7344032 MLBS 7-Jul-2018 Tn5 Illumina ML21_Gr PE150 7166893 MLBS 7-Jul-2018 Tn5 Illumina ML23_Gr PE150 7503826 MLBS 7-Jul-2018 Tn5 Illumina ML24_Gr PE150 4641363 MLBS 7-Jul-2018 Tn5 Illumina ML27_Gr PE150 6152152 MLBS 7-Jul-2018 Tn5 Illumina ML28_Gr PE150 7763943 MLBS 7-Jul-2018 Tn5 Illumina ML30_Gr PE150 6873459 MLBS 7-Jul-2018 Tn5 Illumina ML33_Gr PE150 5378458 MLBS 7-Jul-2018 Tn5 Illumina ML34_Gr PE150 6891338 MLBS 7-Jul-2018 Tn5 Illumina ML36_Gr PE150 228271 MLBS 7-Jul-2018 Tn5 Illumina ML38_Gr PE150 7212647 MLBS 7-Jul-2018 Tn5 Illumina ML40_Gr PE150 6162598 MLBS 7-Jul-2018 Tn5 Illumina ML41_Gr PE150 6170559 MLBS 7-Jul-2018 Tn5 Illumina ML43_Gr PE150 7580980 MLBS 7-Jul-2018 Tn5 Illumina ML44_Gr PE150 7912117 MLBS 7-Jul-2018 Tn5 Illumina ML48_Gr PE150 5620965

74

MLBS 7-Jul-2018 Tn5 Illumina ML49_Gr PE150 4744505 MLBS 7-Jul-2018 Tn5 Illumina ML50_Gr PE150 7383271 MLBS 7-Jul-2018 Tn5 Illumina ML51_Gr PE150 7607244 MLBS 7-Jul-2018 Tn5 Illumina ML53_Gr PE150 5317613 MLBS 7-Jul-2018 Tn5 Illumina ML55_Gr PE150 7580879 MLBS 7-Jul-2018 Tn5 Illumina ML57_Gr PE150 1605322 MLBS 7-Jul-2018 Tn5 Illumina ML6_Gr PE150 7702032 MLBS 7-Jul-2018 Tn5 Illumina ML60_Gr PE150 7457247 MLBS 7-Jul-2018 Tn5 Illumina ML61_Gr PE150 7359768 MLBS 7-Jul-2018 Tn5 Illumina ML62_Gr PE150 7278081 MLBS 7-Jul-2018 Tn5 Illumina ML65_Gr PE150 7448093 MLBS 7-Jul-2018 Tn5 Illumina ML70_Gr PE150 6912997 MLBS 7-Jul-2018 Tn5 Illumina ML71_Gr PE150 8049317 MLBS 7-Jul-2018 Tn5 Illumina ML73_Gr PE150 7124219 MLBS 7-Jul-2018 Tn5 Illumina ML77_Gr PE150 8407556 MLBS 7-Jul-2018 Tn5 Illumina ML79_Gr PE150 8352011 MLBS 7-Jul-2018 Tn5 Illumina ML82_Gr PE150 6974944 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR38 Maryland 2017 PE150 7732998 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR40 Maryland 2017 PE150 8121428 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR50 Maryland 2017 PE150 6605756 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR51 Maryland 2017 PE150 7442782 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR52 Maryland 2017 PE150 7346231 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR53 Maryland 2017 PE150 7679601 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR54 Maryland 2017 PE150 6710199 Old Waterford 22-May- Tn5 Illumina Road, Leesburg, 2017 PE150 OWR57 VA 7307248

75

Old Waterford 30-May- Tn5 Illumina Road, Leesburg, 2017 PE150 OWR70 VA 6821266 MLBS 7-Jul-2018 Tn5 Illumina ML10_Red PE150 7157138 MLBS 7-Jul-2018 Tn5 Illumina ML12_Red PE150 135505 MLBS 7-Jul-2018 Tn5 Illumina ML13_Red PE150 8118063 MLBS 7-Jul-2018 Tn5 Illumina ML16_Red PE150 9041960 MLBS 7-Jul-2018 Tn5 Illumina ML18_Red PE150 8826653 MLBS 7-Jul-2018 Tn5 Illumina ML19_Red PE150 8143654 MLBS 7-Jul-2018 Tn5 Illumina ML2_Red PE150 7175142 MLBS 7-Jul-2018 Tn5 Illumina ML20_Red PE150 8225262 MLBS 7-Jul-2018 Tn5 Illumina ML22_Red PE150 8664565 MLBS 7-Jul-2018 Tn5 Illumina ML23_Red PE150 8102497 MLBS 7-Jul-2018 Tn5 Illumina ML24_Red PE150 7884498 MLBS 7-Jul-2018 Tn5 Illumina ML25_Red PE150 8547869 MLBS 7-Jul-2018 Tn5 Illumina ML27_Red PE150 7831540 MLBS 7-Jul-2018 Tn5 Illumina ML29_Red PE150 7930016 MLBS 7-Jul-2018 Tn5 Illumina ML3_Red PE150 7532536 MLBS 7-Jul-2018 Tn5 Illumina ML32_Red PE150 7411318 MLBS 7-Jul-2018 Tn5 Illumina ML35_Red PE150 7957062 MLBS 7-Jul-2018 Tn5 Illumina ML36_Red PE150 8051736 MLBS 7-Jul-2018 Tn5 Illumina ML37_Red PE150 8015006 MLBS 7-Jul-2018 Tn5 Illumina ML38_Red PE150 8549973 MLBS 7-Jul-2018 Tn5 Illumina ML39_Red PE150 7014367 MLBS 7-Jul-2018 Tn5 Illumina ML41_Red PE150 7008886 MLBS 7-Jul-2018 Tn5 Illumina ML42_Red PE150 6255046 MLBS 7-Jul-2018 Tn5 Illumina ML44_Red PE150 8196265

76

MLBS 7-Jul-2018 Tn5 Illumina ML45_Red PE150 7370732 MLBS 7-Jul-2018 Tn5 Illumina ML46_Red PE150 7101764 MLBS 7-Jul-2018 Tn5 Illumina ML47_Red PE150 7935806 MLBS 7-Jul-2018 Tn5 Illumina ML52_Red PE150 8092411 MLBS 7-Jul-2018 Tn5 Illumina ML54_Red PE150 5607371 MLBS 7-Jul-2018 Tn5 Illumina ML56_Red PE150 7892227 MLBS 7-Jul-2018 Tn5 Illumina ML57_Red PE150 6630514 MLBS 7-Jul-2018 Tn5 Illumina ML58_Red PE150 7561525 MLBS 7-Jul-2018 Tn5 Illumina ML59_Red PE150 7637387 MLBS 7-Jul-2018 Tn5 Illumina ML62_Red PE150 7912262 MLBS 7-Jul-2018 Tn5 Illumina ML63_Red PE150 6922716 MLBS 7-Jul-2018 Tn5 Illumina ML64_Red PE150 7470732 MLBS 7-Jul-2018 Tn5 Illumina ML65_Red PE150 7195503 MLBS 7-Jul-2018 Tn5 Illumina ML66_Red PE150 7596370 MLBS 7-Jul-2018 Tn5 Illumina ML67_Red PE150 7351721 MLBS 7-Jul-2018 Tn5 Illumina ML68_Red PE150 6448176 MLBS 7-Jul-2018 Tn5 Illumina ML69_Red PE150 7633376 MLBS 7-Jul-2018 Tn5 Illumina ML70_Red PE150 8179649 MLBS 7-Jul-2018 Tn5 Illumina ML9_Red PE150 6874469 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR32 Maryland 2017 PE150 5102731 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR34 Maryland 2017 PE150 6204711 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR43 Maryland 2017 PE150 7370217 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR49 Maryland 2017 PE150 6120506 Old Waterford 22-May- Tn5 Illumina Road, Leesburg, 2017 PE150 OWR58 VA 7305751 1144 1145

77

1146 Table S8. Details of biological samples, sequence files, and SRA accession numbers for H. 1147 hamamelidis targeted genome re-sequencing of approximately 800 kbp of Chromosome 1. 1148 MLBS = Mountain Lakes Biological Station, Pembroke, Virginia. Sample # Collection Collection Library Sequencing # Read pairs Accession # Location Date Prep Method Method Goose Creek Lane, 31-May- Tn5 Illumina GCL77 Leesburg, Virginia 2017 PE150 314257 MLBS 7-Jul-2018 Tn5 Illumina ML10_Gr PE150 334744 MLBS 7-Jul-2018 Tn5 Illumina ML14_Gr PE150 325866 MLBS 7-Jul-2018 Tn5 Illumina ML15_Gr PE150 298479 MLBS 7-Jul-2018 Tn5 Illumina ML17_Gr PE150 313916 MLBS 7-Jul-2018 Tn5 Illumina ML18_Gr PE150 290694 MLBS 7-Jul-2018 Tn5 Illumina ML2_Gr PE150 331929 MLBS 7-Jul-2018 Tn5 Illumina ML21_Gr PE150 329166 MLBS 7-Jul-2018 Tn5 Illumina ML23_Gr PE150 327831 MLBS 7-Jul-2018 Tn5 Illumina ML24_Gr PE150 178226 MLBS 7-Jul-2018 Tn5 Illumina ML27_Gr PE150 275723 MLBS 7-Jul-2018 Tn5 Illumina ML28_Gr PE150 346212 MLBS 7-Jul-2018 Tn5 Illumina ML30_Gr PE150 300476 MLBS 7-Jul-2018 Tn5 Illumina ML33_Gr PE150 235769 MLBS 7-Jul-2018 Tn5 Illumina ML34_Gr PE150 313858 MLBS 7-Jul-2018 Tn5 Illumina ML36_Gr PE150 6195 MLBS 7-Jul-2018 Tn5 Illumina ML38_Gr PE150 312165 MLBS 7-Jul-2018 Tn5 Illumina ML40_Gr PE150 262504 MLBS 7-Jul-2018 Tn5 Illumina ML41_Gr PE150 266728 MLBS 7-Jul-2018 Tn5 Illumina ML43_Gr PE150 334740 MLBS 7-Jul-2018 Tn5 Illumina ML44_Gr PE150 355990 MLBS 7-Jul-2018 Tn5 Illumina ML48_Gr PE150 76870

78

MLBS 7-Jul-2018 Tn5 Illumina ML49_Gr PE150 219732 MLBS 7-Jul-2018 Tn5 Illumina ML50_Gr PE150 319064 MLBS 7-Jul-2018 Tn5 Illumina ML51_Gr PE150 335990 MLBS 7-Jul-2018 Tn5 Illumina ML53_Gr PE150 240432 MLBS 7-Jul-2018 Tn5 Illumina ML55_Gr PE150 334902 MLBS 7-Jul-2018 Tn5 Illumina ML57_Gr PE150 89944 MLBS 7-Jul-2018 Tn5 Illumina ML6_Gr PE150 340415 MLBS 7-Jul-2018 Tn5 Illumina ML60_Gr PE150 283159 MLBS 7-Jul-2018 Tn5 Illumina ML61_Gr PE150 317924 MLBS 7-Jul-2018 Tn5 Illumina ML62_Gr PE150 312738 MLBS 7-Jul-2018 Tn5 Illumina ML65_Gr PE150 332910 MLBS 7-Jul-2018 Tn5 Illumina ML70_Gr PE150 313079 MLBS 7-Jul-2018 Tn5 Illumina ML71_Gr PE150 353342 MLBS 7-Jul-2018 Tn5 Illumina ML73_Gr PE150 309115 MLBS 7-Jul-2018 Tn5 Illumina ML77_Gr PE150 374346 MLBS 7-Jul-2018 Tn5 Illumina ML79_Gr PE150 367693 MLBS 7-Jul-2018 Tn5 Illumina ML82_Gr PE150 280482 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR38 Maryland 2017 PE150 331238 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR40 Maryland 2017 PE150 356281 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR50 Maryland 2017 PE150 290972 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR51 Maryland 2017 PE150 327369 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR52 Maryland 2017 PE150 327149 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR53 Maryland 2017 PE150 323702 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR54 Maryland 2017 PE150 285992 Old Waterford Road, 22-May- Tn5 Illumina OWR57 Leesburg, VA 2017 PE150 306545

79

Old Waterford Road, 30-May- Tn5 Illumina OWR70 Leesburg, VA 2017 PE150 302774 MLBS 7-Jul-2018 Tn5 Illumina ML10_Red PE150 329939 MLBS 7-Jul-2018 Tn5 Illumina ML12_Red PE150 3091 MLBS 7-Jul-2018 Tn5 Illumina ML13_Red PE150 365121 MLBS 7-Jul-2018 Tn5 Illumina ML16_Red PE150 402478 MLBS 7-Jul-2018 Tn5 Illumina ML18_Red PE150 395314 MLBS 7-Jul-2018 Tn5 Illumina ML19_Red PE150 369502 MLBS 7-Jul-2018 Tn5 Illumina ML2_Red PE150 323168 MLBS 7-Jul-2018 Tn5 Illumina ML20_Red PE150 360596 MLBS 7-Jul-2018 Tn5 Illumina ML22_Red PE150 380225 MLBS 7-Jul-2018 Tn5 Illumina ML23_Red PE150 355840 MLBS 7-Jul-2018 Tn5 Illumina ML24_Red PE150 360261 MLBS 7-Jul-2018 Tn5 Illumina ML25_Red PE150 381256 MLBS 7-Jul-2018 Tn5 Illumina ML27_Red PE150 354745 MLBS 7-Jul-2018 Tn5 Illumina ML29_Red PE150 360523 MLBS 7-Jul-2018 Tn5 Illumina ML3_Red PE150 341628 MLBS 7-Jul-2018 Tn5 Illumina ML32_Red PE150 332324 MLBS 7-Jul-2018 Tn5 Illumina ML35_Red PE150 353534 MLBS 7-Jul-2018 Tn5 Illumina ML36_Red PE150 369572 MLBS 7-Jul-2018 Tn5 Illumina ML37_Red PE150 349811 MLBS 7-Jul-2018 Tn5 Illumina ML38_Red PE150 393709 MLBS 7-Jul-2018 Tn5 Illumina ML39_Red PE150 304153 MLBS 7-Jul-2018 Tn5 Illumina ML41_Red PE150 312756 MLBS 7-Jul-2018 Tn5 Illumina ML42_Red PE150 281446 MLBS 7-Jul-2018 Tn5 Illumina ML44_Red PE150 360792

80

MLBS 7-Jul-2018 Tn5 Illumina ML45_Red PE150 331967 MLBS 7-Jul-2018 Tn5 Illumina ML46_Red PE150 306892 MLBS 7-Jul-2018 Tn5 Illumina ML47_Red PE150 312341 MLBS 7-Jul-2018 Tn5 Illumina ML52_Red PE150 355827 MLBS 7-Jul-2018 Tn5 Illumina ML54_Red PE150 220263 MLBS 7-Jul-2018 Tn5 Illumina ML56_Red PE150 341262 MLBS 7-Jul-2018 Tn5 Illumina ML57_Red PE150 287334 MLBS 7-Jul-2018 Tn5 Illumina ML58_Red PE150 337310 MLBS 7-Jul-2018 Tn5 Illumina ML59_Red PE150 342148 MLBS 7-Jul-2018 Tn5 Illumina ML62_Red PE150 360053 MLBS 7-Jul-2018 Tn5 Illumina ML63_Red PE150 303988 MLBS 7-Jul-2018 Tn5 Illumina ML64_Red PE150 326164 MLBS 7-Jul-2018 Tn5 Illumina ML65_Red PE150 308072 MLBS 7-Jul-2018 Tn5 Illumina ML66_Red PE150 334547 MLBS 7-Jul-2018 Tn5 Illumina ML67_Red PE150 313769 MLBS 7-Jul-2018 Tn5 Illumina ML68_Red PE150 285472 MLBS 7-Jul-2018 Tn5 Illumina ML69_Red PE150 335029 MLBS 7-Jul-2018 Tn5 Illumina ML70_Red PE150 368484 MLBS 7-Jul-2018 Tn5 Illumina ML9_Red PE150 311326 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR32 Maryland 2017 PE150 206644 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR34 Maryland 2017 PE150 271798 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR43 Maryland 2017 PE150 320317 Man Lu Ridge Rd., 16-May- Tn5 Illumina MLR49 Maryland 2017 PE150 256744 Old Waterford Road, 22-May- Tn5 Illumina OWR58 Leesburg, VA 2017 PE150 320996 1149

81

1150 Table S9. Details of biological samples, sequence files, and SRA accession numbers for 1151 Hormaphis cornu salivary gland RNA-sequencing from fundatrices from red and green 1152 galls. JRC = Janelia Research Campus. 1153 Sample Collection Collection Gall Library Sequencing # Read pairs Accession # ID Location Date color Prep Method Method AKG_4 JRC 28-Apr- Green WGB Illumina PE 11607393 2019 150 DSG_6 JRC 28-Apr- Green WGB Illumina PE 12179326 2019 150 DSG_7 JRC 30-Apr- Green WGB Illumina PE 2019 150 13321660 DSG_8 JRC 30-Apr- Green WGB Illumina PE 10433415 2019 150 DSG_9 JRC 30-Apr- Green WGB Illumina PE 14776468 2019 150 DSG_10 JRC 30-Apr- Green WGB Illumina PE 13105789 2019 150 DSG_11 JRC 01-May- Green WGB Illumina PE 12867296 2019 150 DSG_13 JRC 01-May- Green WGB Illumina PE 10772840 2019 150 DSR_11 JRC 01-May- Red WGB Illumina PE 15042624 2019 150 DSR_12 JRC 01-May- Red WGB Illumina PE 11051413 2019 150 DSR_13 JRC 01-May- Red WGB Illumina PE 15396180 2019 150 DSR_15 JRC 01-May- Red WGB Illumina PE 18975841 2019 150 DSR_16 JRC 01-May- Red WGB Illumina PE 22465588 2019 150 DSR_17 JRC 01-May- Red WGB Illumina PE 15618318 2019 150 DSR_18 JRC 01-May- Red WGB Illumina PE 16229009 2019 150 AKR1 JRC 28-Apr- Red WGB Illumina PE 2019 150 14353107 AKR2 JRC 28-Apr- Red WGB Illumina PE 2019 150 17302776 AKR3 JRC 28-Apr- Red WGB Illumina PE 2019 150 24214993 AKR4 JRC 28-Apr- Red WGB Illumina PE 2019 150 12914802 DSR5 JRC 28-Apr- Red WGB Illumina PE 2019 150 16153051 DSR6 JRC 28-Apr- Red WGB Illumina PE 2019 150 10998399 DSR7 JRC 28-Apr- Red WGB Illumina PE 2019 150 10264369

82

DSR_8 JRC 28-Apr- Red WGB Illumina PE 2019 150 19769187 DSR_9 JRC 28-Apr- Red WGB Illumina PE 2019 150 18487599 DSR JRC 28-Apr- Red WGB Illumina PE 2019 150 9649227

1154

83

1155 Table S10. Details of biological samples, sequence files, and SRA accession numbers for 1156 Hamamelis virginiana RNA-sequencing from galls and leaf around galls. JRC = Janelia 1157 Research Campus, Ashburn, VA. 1158 Sample ID Collection Collection Library prep Sequencing # Read Accession # Location Date method method pairs Gall_9 JRC 11-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 14453904 Gall_11 JRC 11-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7224527 Early_gall_18 JRC 12-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 25279196 Gall_20 JRC 12-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11692754 Gall_22 JRC 12-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10990345 Gall_26 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 6419861 Gall_27 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7086413 Gall_28 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7618914 Gall_29 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5633325 Gall_31 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 12324002 Early_gall_33 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 31823857 gall_35 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11104190 Gall_39 Morven 15-Apr- Nugen Illumina Park, 2017 Universal Plus PE150 Leesburg mRNA-seq 10931442 Gall_42 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 6199468

84

Gall_44 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11104190 Gall_46 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11965583 Gall_47 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 18509404 Bottom_gall_51 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 13291659 Top_gall_52 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11679869 Bottom_gall_53 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11661968 Top_gall_54 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 12277949 Bottom_gall_55 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10705594 Top_gall_56 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 12245151 Bottom_gall_57 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5688881 Top_gall_58 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 8195240 Gall_60 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7075731 Gall_61 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5226728 Gall_62 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 3985629 Gall_63 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 6194899 Gall_65 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10541245 Gall_66 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5324437

85

Apical_gall_118 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 8305160 Apical_gall_122 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 3971025 Basal_gall_116 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10405803 Basal_gall_120 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4887393 Medial_gall_117 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 3452858 Medial_gall_121 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5465679 Basal_gall_132 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 12071443 Medial_gall_133 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11699167 Apical_gall_134 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4455419 Basal_gall_135 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 58608022 Medial_gall_136 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 1262243 Apical_gall_137 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7961621 Basal_gall_138 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10164815 Medial_gall_139 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 8975097 Pink_apical_gall_140 JRC 23-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7155908 Medial_gall_173 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 14175668 Basal_gall_175 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 7226842

86

Medial_gall_176 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 9636190 Apical_gall_177 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 8921020 Medial_gall_179 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4426575 Apical_gall_180 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 8103716 Leaf_around_gall_10 JRC 11-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 3055859 Leaf_around_gall_12 JRC 11-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 2924715 Leaf_around_gall_19 JRC 11-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 39720279 Leaf_around_gall_21 JRC 12-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4346238 Leaf_around_gall_23 JRC 12-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 9527411 Leaf_around_gall_30 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 10162178 Leaf_around_gall_34 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 34938887 Leaf_around_gall_38 JRC 14-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5195676 Leaf_around_gall_43 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 11911791 Leaf_around_gall_45 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 9012995 Leaf_around_gall_49 JRC 15-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 2405452 Leaf_around_gall_50 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4591989 Leaf_around_gall_59 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 6517483

87

Leaf_around_gall_64 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4871569 Leaf_around_gall_67 JRC 17-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4352513 Leaf_around_gall_119 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 4505696 Leaf_around_gall_123 JRC 22-Apr- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 2655968 Leaf_around_gall_181 JRC 01-May- Nugen Illumina 2017 Universal Plus PE150 mRNA-seq 5869984 1159

88

1160 Table S11. Details of biological samples, sequence files, and SRA accession numbers for 1161 Hamamelis virginiana RNA-sequencing from green and red galls. 1162 Sample ID Collection Collection Leaf Gall Library Sequencing # Read Accession Location Date Sample color Prep Method pairs # Method JRC 08-May- 1 Red WGB Illumina 20JRC105R 2020 SE75 5813855 JRC 08-May- 1 Green WGB Illumina 20JRC104G 2020 SE75 6618273 JRC 08-May- 2 Red WGB Illumina 20JRC107R 2020 SE75 7330505 JRC 08-May- 2 Green WGB Illumina 20JRC106G 2020 SE75 6999471 JRC 08-May- 3 Red WGB Illumina 20JRC108R 2020 SE75 7444078 JRC 08-May- 3 Green WGB Illumina 20JRC109G 2020 SE75 7711014 JRC 08-May- 4 Red WGB Illumina 20JRC110R 2020 SE75 6961512 JRC 09-May- 4 Green WGB Illumina 20JRC111G 2020 SE75 6765543 JRC 08-May- 5 Red WGB Illumina 20JRC113R 2020 SE75 6707106 JRC 09-May- 5 Green WGB Illumina 20JRC112G 2020 SE75 7274181 JRC 08-May- 7 Red WGB Illumina 20JRC116R 2020 SE75 12806892 JRC 08-May- 8 Red WGB Illumina 20JRC1076R 2020 SE75 9172629 JRC 08-May- 8 Red WGB Illumina 20JRC1077R 2020 SE75 9280201 JRC 08-May- 8 Green WGB Illumina 20JRC1074G 2020 SE75 9200050 JRC 08-May- 8 Green WGB Illumina 20JRC1075G 2020 SE75 9001911 JRC 08-May- 9 Red WGB Illumina 20JRC1079R 2020 SE75 8765890 JRC 08-May- 9 Green WGB Illumina 20JRC1078G 2020 SE75 8534470 JRC 08-May- 10 Red WGB Illumina 20JRC1081R 2020 SE75 7555292 JRC 08-May- 10 Green WGB Illumina 20JRC1080G 2020 SE75 8840709 JRC 08-May- 11 Red WGB Illumina 20JRC1083R 2020 SE75 9657276 JRC 08-May- 11 Green WGB Illumina 20JRC1082G 2020 SE75 8984115 JRC 08-May- 12 Red WGB Illumina 20JRC1085R 2020 SE75 9542403

1

JRC 08-May- 13 Red WGB Illumina 20JRC1086R 2020 SE75 9508095 JRC 08-May- 13 Green WGB Illumina 20JRC1087G 2020 SE75 9552857 JRC 08-May- 14 Red WGB Illumina 20JRC1089R 2020 SE75 8580538 JRC 08-May- 14 Green WGB Illumina 20JRC1088G 2020 SE75 9348175 JRC 08-May- 15 Red WGB Illumina 20JRC1091R 2020 SE75 10193238 JRC 09-May- 15 Green WGB Illumina 20JRC1090G 2020 SE75 9782456 JRC 09-May- 15 Green WGB Illumina 20JRC1092G 2020 SE75 10354726 JRC 08-May- 16 Red WGB Illumina 20JRC1093R 2020 SE75 10632771 JRC 09-May- 16 Green WGB Illumina 20JRC1094G 2020 SE75 9767489 JRC 09-May- 16 Green WGB Illumina 20JRC1095G 2020 SE75 9957641 JRC 09-May- 16 Green WGB Illumina 20JRC1096G 2020 SE75 9938437 JRC 09-May- 16 Green WGB Illumina 20JRC1097G 2020 SE75 8222414 JRC 08-May- 17 Red WGB Illumina 20JRC1100R 2020 SE75 5354681 JRC 09-May- 17 Green WGB Illumina 20JRC1098G 2020 SE75 5890682 JRC 09-May- 17 Green WGB Illumina 20JRC1099G 2020 SE75 6057780 JRC 09-May- 17 Green WGB Illumina 20JRC1101G 2020 SE75 7237058 JRC 08-May- 18 Green WGB Illumina 20JRC1102G 2020 SE75 4943324 JRC 08-May- 18 Green WGB Illumina 20JRC1103G 2020 SE75 7407432

1163

2