bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1

1 Divergent gene expression levels between diploid and autotetraploid Tolmiea

2 () relative to the total transcriptome, the cell, and biomass.

3

4 Clayton J. Visger1, Gane K-S. Wong2,3, Yong Zhang4,5, Pamela S. Soltis6,7, and Douglas E.

5 Soltis6,7,8

6 7 Author Affiliation: 8 9 1Department of Biological Sciences, California State University Sacramento, Sacramento, CA

10 95819, USA

11 2Department of Biological Sciences, University of Alberta, Edmonton AB, T6G 2E9, Canada

12 3Department of Medicine, University of Alberta, Edmonton AB, T6G 2E1, Canada

13 4Bejing Genomics Institute-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen

14 518083, China

15 5HuaHan Gene Co. Ltd., 7F JianAnShanHai Building, No. 8000, Shennan Road, Futian District,

16 Shenzhen 518040, China

17 6Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA

18 7Genetics Institute, University of Florida, Gainesville, FL 32610, USA

19 8Department of Biology, University of Florida, Gainesville, FL 32611, USA

20

21

22

23 Corresponding Author: Clayton J. Visger, [email protected]

24 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2

25 Summary:

26 • Studies of gene expression and polyploidy are typically restricted to characterizing

27 differences in transcript concentration. Integrating multiple methods of transcript

28 analysis, we document a difference in transcriptome size, and make multiple

29 comparisons of transcript abundance in diploid and autotetraploid Tolmiea.

30 • We use RNA spike-in standards to identify and correct for differences in

31 transcriptome size, and compare levels of gene expression across multiple scales:

32 per transcriptome, per cell, and per biomass.

33 • In total, ~17% of all loci were identified as differentially expressed (DEGs) between

34 the diploid and autopolyploid species. A shift in total transcriptome size resulted in

35 only ~58% of the total DEGs being identified as differentially expressed following a

36 per transcriptome normalization. When transcript abundance was normalized per

37 cell, ~82% of the total DEGs were recovered. The discrepancy between per-

38 transcriptome and per-cell recovery of DEGs occurs because per-transcriptome

39 normalizations are concentration-based and therefore blind to differences in

40 transcriptome size.

41 • While each normalization enables valid comparisons at biologically relevant scales,

42 a holistic comparison of multiple normalizations provides additional explanatory

43 power not available from any single approach. Notably, autotetraploid loci tend to

44 conserve diploid-like transcript abundance per biomass through increased gene

45 expression per cell, and these loci are enriched for photosynthesis-related functions.

46 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 3

47 Introduction

48 Polyploidy (whole-genome duplication; WGD), a process now recognized to be of major

49 importance across many eukaryotic lineages (Van de Peer et al., 2009; Jiao et al., 2011; Van

50 de Peer, 2011; Jiao and Paterson, 2014), was long considered an evolutionary dead-end by

51 some, including several of the most prominent evolutionary biologists of the past century

52 (Stebbins, 1950; Wagner, 1970). However, during the last several decades there has been a

53 resurgence of interest in the study of polyploid evolution, particularly the genetic and

54 genomic consequences of polyploidy (e.g., Ainouche et al., 2012; Barker et al., 2016;

55 Canestro, 2012; Chen and Birchler, 2013; Doyle, 2012; Gaeta et al., 2007; Doyle et al., 2008;

56 Soltis and Soltis, 2009, 2012; Salmon et al., 2010; Greilhuber et al., 2012; Shi et al., 2012;

57 Madlung and Wendel, 2013; Renny-Byfield and Wendel, 2014; Soltis et al., 2014; Wendel,

58 2015; Spoelhof et al., accepted).

59 The role of polyploidy in facilitating changes in gene expression, through expression

60 level divergence, altered expression patterns (e.g., across tissue-types), and/or the

61 generation of unique splice variants, is arguably one of the most important research topics

62 in the field today (e.g., Liu et al., 2001; Adams et al., 2003; Adams and Wendel, 2012;

63 Chelaifa et al., 2010; Dong and Adams, 2011; Ainouche et al., 2012; Buggs, 2012; Buggs et al.,

64 2014; Rambani et al., 2014; see Yoo et al., 2014 for review). For example, autotetraploid

65 Solanum phureja expresses ~10% of all loci at a different levels relative to diploid S.

66 phureaja (Stupar et al., 2007). In Paulownia fortunei, the autotetraploid differentially

67 expresses ~6% of all loci relative to the diploid progenitor (Zhang et al., 2014), and in

68 Arabidopsis thaliana ~4% of all loci are differentially expressed between the diploid and

69 autotetraploid derivative (Del Pozo and Ramirez-Parra, 2014). However, the rapid increase

70 in studies of patterns of gene expression in polyploids appear to have outpaced our bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 4

71 fundamental understanding of the transcriptome’s response to polyploidy per se and how

72 this response might influence our interpretation of gene expression measures.

73 One major concern in comparisons of gene expression between diploids and polyploids

74 is that any transcriptome amplification or transcriptome-wide effects induced following

75 polyploidy are rarely, if ever, investigated (see Coate and Doyle, 2010, 2015).

76 Transcriptional amplification is a biological phenomenon in which the total mRNA

77 produced per cell is increased up to several fold in one treatment group compared to

78 another, resulting in unequal total transcriptome sizes (Nie et al., 2012). Transcriptional

79 amplification would be anticipated in a polyploid compared to a diploid progenitor, as

80 polyploidy globally alters gene/genome copy number and often influences cell size, which,

81 in turn, has been shown to correlate with transcriptome size (Fomina-yadlin et al., 2014).

82 Recent work has shed light on some of the biases inherent to expression level

83 comparisons between treatment groups that differ in transcriptome size. Loven et al.

84 (2012) showed that inferences drawn from a typical RNAseq workflow can be confounded

85 when treatment groups have transcriptomes of differing size. Because transcriptome size

86 variation has rarely been explored, RNAseq studies involving transcriptome amplification

87 are likely to underestimate the proportion of the transcriptome being differentially

88 expressed. Surprisingly, only a single investigation into gene expression change following

89 polyploidy has accounted for variation in transcriptome size (Coate and Doyle, 2010). Coate

90 and Doyle (2010) found that allotetraploid G. dolicocarpa has a total mRNA transcriptome

91 ~1.4 times greater than its diploid progenitors, demonstrating that polyploidy can induce

92 an increase in transcriptome size. However, similar data from additional polyploid systems

93 are badly needed to achieve a broader understanding of the role that polyploidy plays in

94 transcriptome-wide changes in gene expression. Current methods for comparing

95 differences in levels of gene expression across the transcriptome rely on the detection of bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 5

96 statistically different RNAseq read abundances at each locus to generate a summary of

97 differential gene expression. RNAseq libraries are usually sequenced to different depths by

98 chance, resulting in library size (the total number of reads per sample) varying across the

99 final dataset. To prevent variation in library size from influencing analyses of differential

100 expression, library size is typically normalized across all samples (e.g., Mortazavi et al.,

101 2008). Commonly used library normalization methods quantify expression level on a

102 concentration basis by dividing read abundance by a factor of the whole library (e.g.,

103 transcripts per million or reads per kilobase per million) (see Coate and Doyle, 2010, 2015;

104 Lovén et al., 2012). Concentration-based normalizations place gene expression in the

105 context of expression level per transcriptome. Using a concentration-based normalization

106 approach to infer differences in expression level requires that total transcriptome size does

107 not vary between ploidal levels. If transcriptome size varies by ploidy, then loci identified as

108 differentially expressed (differentially expressed genes; DEGs) are not necessarily

109 expressed at different levels, but instead are maintained in different concentrations relative

110 to the transcriptome (Figure 1A).

111 To overcome these problems, the use of synthetic spike-in RNA standards has been

112 proposed (Lovén et al., 2012). These standards, developed by the External RNA Controls

113 Consortium (External RNA Controls Consortium, 2005; Baker et al., 2005; referred to as

114 Spike-in RNA standards, henceforth spike-ins), facilitate comparison of absolute expression

115 level across treatments with different transcriptome sizes (see Pine et al., 2016). Spike-ins

116 allow for normalization of transcript abundance independent of transcriptome size. To date,

117 spike-ins have been largely restricted to use in model systems (e.g., hamster – Fomina-

118 yadlin et al., 2014; human – Xu et al., 2014; zebrafish – Schall et al., 2017), single cell

119 sequencing (e.g., Krishnaswami et al., 2016; Liu et al., 2016), and methods development and

120 validation (e.g., Lovén et al., 2012; Gu et al., 2014; Germain et al., 2016); they remain a bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 6

121 promising but unutilized method for cross-ploidy comparisons.

122 Most polyploids exhibit differences in cell size relative to their parents, which may

123 result in an alteration of cell density (Stebbins, 1971; Masterson, 1994). Because

124 normalization per transcriptome is based on concentration of transcripts, these

125 comparisons are therefore robust to variation in cell size and density. Differences in cell

126 number and cell size might be expected to influence levels of gene expression on certain

127 scales, but cell size and density are rarely investigated prior to studying differences in gene

128 expression levels. Conversely, normalizing by the abundance of an internal standard within

129 a library (e.g., spike-ins) does not account for variation in the number of contributing cells

130 across treatments, and the inferred transcript abundance may be biased (Fomina-yadlin et

131 al., 2014). It is therefore critical to have information on cell density differences between

132 treatment groups when normalizing to an internal standard.

133 Following polyploidy, a balance of cell size and density effects, as well as differences in

134 gene/allelic dosage, could have profound effects on all facets of physiology. Across a

135 ploidal series in Atriplex confertifolia, cell density decreases with increasing ploidy (Warner

136 and Edwards, 1989). Despite harboring fewer cells per unit area, higher ploidal levels of A.

137 confertifolia are capable of higher photosynthesis per unit leaf area, due to increased

138 photosynthesis of tetraploid cells relative to diploid cells. Conversely, in tetraploid

139 Medicago sativa, which also exhibits lower cell density than does its diploid progenitor,

140 photosynthesis per cell is increased in the tetraploid to yield diploid-like photosynthetic

141 output per unit leaf area (Warner and Edwards, 1993). In A. confertifolia and M. sativa, cell

142 size and density alone are not sufficient for predicting physiological change following

143 polyploidy. Instead, information on the interaction between cell density and cell efficiency is

144 needed to understand fully the physiological impact of polyploidy. In light of these

145 observations, studies of changes in gene expression levels following polyploidy should bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 7

146 routinely investigate the interaction between gene expression across a unit of biomass and

147 per cell.

148 When discussing transcript abundance between samples of differing transcriptome

149 sizes and/or differing cell density, a clear nomenclature is critical. We illustrate three

150 separate ways of defining transcript abundance when both transcriptome size and cell

151 density vary between treatment groups (Figure 1). When RNAseq data are normalized

152 without an external standard (Figure 1A), differences in abundance observed between two

153 treatments reflect a difference in transcript concentration. We follow Coate and Doyle

154 (2010) in referring to this type of comparison as ‘per transcriptome’.

155 Differences in expression level between treatments normalized to the abundance of

156 spike-ins reflect differences in transcript abundance relative to the abundance of spike-ins.

157 When spike-ins are added in equal amounts to samples derived from equivalent volumes of

158 tissue, then differences observed following a spike-in-based normalization indicate that

159 transcripts differ in abundance within a given volume of tissue (Figure 1B). We refer to this

160 comparison as ‘per biomass’. Finally, if the spike-in abundance is scaled by a factor equal to

161 differences in cell density, differences in expression level will reflect changes in transcript

162 abundance relative to the cell (Figure 1C), and we refer to this case as ‘per cell’ (Figure 1).

163 Tolmiea (Saxifragaceae) is an excellent evolutionary model for investigating the

164 transcriptional impact of polyploidy because: 1) it is a clear diploid (T. diplomenziesii) and

165 autotetraploid (T. menziesii) system; and 2) there is strong support for a single origin of the

166 autotetraploid (Soltis and Soltis, 1988; Soltis et al., 1989; Visger et al., 2016). In addition,

167 examination of changes in expression level following autopolyploidy is less problematic

168 than similar investigations within allopolyploid systems for several reasons. First,

169 autopolyploidy results in a duplication of a single genome, rather than the merger and

170 duplication of two divergent genomes (as in allopolyploidy). Hence, a single diploid-based bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 8

171 mapping reference can be used for both the diploid and the autotetraploid. Additionally, by

172 definition, there are no homeologues in an autopolyploid, substantially reducing the

173 potential for biased mismapping of paralogous reads. A single origin of the polyploid is also

174 an advantage for exploring issues pertaining to transcriptional amplification and reduces

175 analytical complexity. For example, multiple, independent origins of a polyploid may have

176 differing effects on transcriptome size. Thus, an investigation accounting for biases in

177 transcriptome size should start with an autopolyploid arising from a single origin.

178 Here we leverage spike-in RNA standards and multiple normalization methods to

179 collectively characterize for the first-time gene expression across three biological scales in a

180 natural system. Through this multi-scale comparison, we investigate how polyploidy has

181 influenced transcriptional change in gene pathway stoichiometry, cellular expression levels,

182 and transcript abundance across leaf organ tissue. Finally, we synthesize the differences

183 represented across all three scales and place our findings within the context of

184 ecophysiological data for Tolmiea.

185

186 Materials and Methods

187 Sampling— were collected from three geographically separate natural

188 populations of both T. diplomenziesii and T. menziesii (Figure 2). Taking advantage of the

189 ability of Tolmiea to reproduce asexually via plantlet formation, each individual collected in

190 the field was subsequently propagated in quadruplicate in a greenhouse at the University of

191 Florida. The resulting plantlets were then grown to maturity under standardized conditions

192 within a common garden greenhouse at the University of Florida. One of four replicate

193 plants from two T. menziesii populations died prior to sampling.

194 Each of the 22 mature plants was sampled for equivalent volumes of leaf tissue using an

195 8-mm-diameter cork borer. The tissue was then flash frozen in liquid N2 and stored at -80C bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 9

196 until RNA extraction. Total RNA was extracted from this tissue using the CTAB and Trizol

197 method of Jordon-Thaden et al. (2015; protocol number 2) with the addition of 20%

198 sarkosyl. DNA was removed using a Turbo DNA-free kit (Invitrogen, USA). Following the

199 manufacturer's recommendations, the total RNA was spiked with 4 ul of 1:100 diluted ERCC

200 RNA Spike-in mix (Ambion, USA). RNAseq libraries were then built using the TruSeq kit

201 (Illumina, USA); 100-bp paired-end sequencing was performed using an Illumina HiSeq at

202 the Beijing Genomics Institute.

203 Quantifying cell density--To quantify differences in levels of gene expression per cell, it

204 was necessary to account for differences in both transcriptome size and cell density

205 between diploid and autotetraploid Tolmiea. A potential source of uncertainty is our

206 estimation of differences in cell density that were used to normalize our reads on a relative

207 per-cell basis. To decrease the likelihood of misrepresenting differences in cell density, we

208 characterized cell density using both DNA/RNA co-extraction and cell counting as described

209 below.

210 Duplicate leaf punches from each sample were used for co-extraction of DNA and RNA.

211 We first followed the Jordon-Thaden et al. (2015) method #2 with 20% sakrosyl. Following

212 the CTAB incubation, the supernatant was split into two equal aliquots, one of which was

213 used for RNA extraction (following Jordon-Thaden et al., 2015) and the other for DNA

214 extraction (following Doyle and Doyle, 1987). DNA concentrations were quantified with

215 dsDNA broad-range chemicals using a Qubit (Life Technologies). DNA concentration was

216 placed into a 1C context by dividing by ploidal level. The 1C DNA concentration was used to

217 infer the relative difference in cell density of the leaf tissue contributing to RNA extraction

218 between T. menziesii and T. diplomenziesii. To validate this approach, we also directly

219 estimated cell density per unit area. Leaf punches two cm in diameter collected from 10

220 diploids and 11 tetraploids were digested in 500 ul of 10% chromic acid until cells were bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 10

221 fully dissociated (Brown and Rickless, 1949; Ilut et al., 2012). Each individual was assayed

222 twice, and the number of cells in the suspension was counted twice for each assay using 10-

223 ul aliquots in a hemocytometer. Statistical analyses of 1C DNA concentration and cell

224 density were performed using a linear mixed-effects model implemented in JMP (version

225 12; SAS Institute, Cary, NC, USA) with individual as a random effect. All datasets were tested

226 for normality using a goodness of fit test, and if normality was rejected, the data were log

227 transformed.

228 Differential expression analysis— Raw reads were cleaned using CutAdapt (Martin,

229 2011) and Sickle (Joshi and Fass, 2011). A Tolmiea reference transcriptome was generated

230 from concatenated reads taken from all samples (with the spike-in reads removed), and in

231 silico read normalization was employed using Trinity) (Grabherr et al., 2011). Extremely

232 low-expressed isoforms were removed (< 1 transcript per million), and the remaining

233 transcriptome was annotated using the Trinotate pipeline

234 (http://trinotate.sourceforge.net/). Trimmed reads for each sample were mapped to a

235 concatenation of the Tolmiea transcriptome with isoforms clustered together (using the

236 Trinity ‘gene’ option) and the publically available ERCC spike-in reference using Bowtie2

237 (Langmead and Salzberg, 2012), and read counts were extracted using eXpress (Roberts,

238 2013).

239 We provide an iPython notebook including the entire code for the normalization,

240 differential expression, and figure generation described below (Appendix 1). In short, read

241 count normalization and differential expression analyses were conducted using Limma-

242 Voom (Ritchie et al., 2015). A per-transcriptome normalization was implemented using the

243 total library following the removal of spike-in count data to compute normalization factors.

244 A per-biomass normalization used only spike-in count data to compute normalization

245 factors. A per-cell normalization used the spike-in count data following in silico adjustment bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 11

246 of tetraploid spike-in abundance using the difference of diploid versus tetraploid cell

247 density (65%; see methods on quantifying cell size and density and Results). Following

248 each normalization, using the plotMDS function, a multidimensional scaling plot was

249 generated from 500 loci exhibiting the highest variation in expression level among samples.

250 Differences in transcriptome size were approximated using the sum of normalized read

251 counts per cell; however, Limma-Voom normalizations use log-counts, which are not

252 applicable to straight summation, so we used the DEseq package (Anders and Huber, 2010)

253 to compute per-cell normalized counts for this purpose only. Next, a differential expression

254 analysis was run using the normalization approaches above described, with loci identified

255 as differentially expressed (DE) using a 0.05 p-value, 0.05 false discovery rate, and cutoff of

256 1 log-fold change (logFC). Differentially expressed loci were binned both broadly across the

257 three normalization methods and more finely to characterize interplay between the three

258 normalization results using gplots (Warnes et al., 2009). Gene ontology (GO) terms were

259 extracted from the Trinotate annotations of the mapping reference. Each of the fine-scale

260 bins of DEGs was tested for functional enrichment using GOSeq (Young et al., 2010).

261

262 Results

263 Quantifying cell size and density -- The mean diploid and tetraploid 1C DNA

264 concentrations per leaf punch extraction were 1.32591 +/- 0.09439 ug/ml and 0.76432 +/-

265 0.0619 ug/ml, respectively; these differed significantly (p < 0.0001) (Figure 3A). The mean

266 tetraploid 1C DNA per punch was 57.6% of the diploid value. Following tissue digestion, the

267 estimated values of mean number of cells per leaf punch in diploids and tetraploids were

268 412,146 +/- 38,002 and 268,958 +/- 20,522, respectively. These values differed significantly

269 (p = 0.0129). The tetraploids on average had 65.3% as many cells per leaf punch as found in

270 the diploids (Figure 3B). bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 12

271 Our two methods for characterizing differences in cell density both revealed similar

272 reductions in tetraploid cell density; tetraploids possessed ~58% and 65% of the diploid

273 density, based on the DNA/RNA co-extraction and cell-count results, respectively. We

274 elected to normalize our per-cell analysis by applying a 0.65 factor to the per-cell

275 normalization factor of the tetraploid samples to reflect the more conservative estimate of

276 differences in cell density; however, both estimates of cell density reveal the same the major

277 conclusions of this study (the alternative cell density difference can be run using the

278 supplemental ipython notebook – Appendix 1).

279 Differential expression analysis—We obtained an average of 25 million reads per sample

280 after cleaning low-quality reads (see appendix 2 for total reads per sample). The Tolmiea

281 reference transcriptome assembly resulted in 58,046 isoforms binned within 28,467

282 clusters (henceforth genes) with an N50 of 1,821 bp. After read mapping, 26,816 genes had

283 at least 5 non-zero counts and were used for downstream differential expression analyses.

284 In all, 15,205 genes were annotated according to GO using Trinotate.

285 A multidimensional scaling plot of the 500 loci with the highest variation in expression

286 level revealed that all three normalization methods (per transcriptome, per cell, and per

287 biomass) performed well at clustering members from the same population with one

288 another (Figure 4A-C). After summing the read counts normalized per cell for each sample,

289 we found that the mean transcription per cell (henceforth transcriptome size) for the

290 tetraploid was 2.1 times higher than the mean for the diploid (Figure 5). Additionally, the

291 total transcriptome size per cell was highly variable, more so in the tetraploids than the

292 diploids (20,815,579 and 11,799,791 normalized counts, respectively -- see appendix 1 for

293 standard deviation within each population). Across the three normalization methods, the

294 differential expression analysis found the tetraploid relative to the diploid had 1,559 up-

295 and 1,071 down-regulated genes per transcriptome, 1,440 up- and 1,550 down-regulated bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 13

296 genes per biomass, and 3,005 up- and 751 down-regulated genes per cell (Figure 4D-F).

297 Across the three different normalization methods, we found 4,555 unique loci were DE

298 under one or more methods (Figure 6). Finer binning of the interactions between

299 normalization methods revealed the majority of DEGs to be either up-regulated in the

300 tetraploid across all normalizations (1,392 – Figure 7A) or only up-regulated per cell (1,398

301 Figure 7D).

302

303 Discussion

304 Differences in gene expression following autopolyploidy: This is the first study to

305 leverage synthetic RNA standards to characterize differences in gene expression level per

306 transcriptome, per biomass, and on a relative per-cell basis between ploidal levels. Through

307 the use of this ‘three normalization approach’ and characterization of gene expression levels

308 in diploid and autotetraploid Tolmiea, we found 4,555 out of 26,816 loci were DE (~17% of

309 the transcriptome) and identified four notable trends. First, the per-transcriptome

310 normalization, the method researchers typically use, captured the fewest DEGs and failed to

311 detect any DEGs not found by the other two methods. Second, most differential expression

312 occurs on a per-cell basis, and there is a clear unbalanced distribution of up- versus down-

313 regulation in the autotetraploid relative to the diploid, with 3,005 up- versus 751 down-

314 regulated DEGs per cell. Third, in the tetraploid, transcripts up-regulated per cell appear to

315 compensate for a decreased cell density, resulting in a conservation of expression level per

316 biomass relative to the diploid (see Figure 8). Loci exhibiting conservation per biomass

317 were significantly enriched for functions related to photosynthesis and the chloroplast.

318 Fourth, transcriptome size varied substantially across our dataset, with a significant

319 increase in the inferred transcriptome size of the tetraploid relative to the diploid (Figure

320 5). Below we discuss each of these four trends in greater detail. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 14

321 Autopolyploids have rarely been compared to their diploid progenitors with respect to

322 divergence in expression level (e.g., Stupar et al., 2007; Del Pozo and Ramirez-Parra, 2014;

323 Zhang et al., 2014). These previous diploid-autopolyploid comparisons found that

324 autopolyploids tend to deviate from diploid-like gene expression levels across 1-10% of the

325 transcriptome (~6% in Paulownia fortunei – Zhang et al., 2014; ~10% in Solanum phureja –

326 Stupar et al., 2007; ~1-4% in Arabidopsis – Del Pozo and Ramirez-Parra, 2014). However,

327 none of these comparisons was normalized using spike-in standards; instead they used

328 what is referred to here as a per-transcriptome comparison. Unlike the spike-in-derived

329 per-biomass and per-cell normalized transcript counts, the results of per-transcriptome

330 normalizations reflect concentration differences and are not a proxy for absolute

331 expression. Therefore, the results of previous autopolyploid expression studies must be

332 interpreted as differences in transcript concentration rather than absolute abundance.

333 Spike-in standards have rarely been used in any evolutionary comparisons and have

334 primarily been adopted for use in studies of model fish (zebrafish), mammals (human,

335 mouse, and hamster), and plants (Arabidopsis) (e.g., Brennecke et al., 2013; Fomina-yadlin

336 et al., 2014; Xu et al., 2014; Schall et al., 2017). Despite the use of spike-ins in these

337 investigations of model systems, in methods development, and in single-cell sequencing,

338 spike-in approaches have not yet been applied to evolutionarily motivated studies of

339 natural populations. The lack of spike-in usage in non-model studies is surprising, because

340 a reference genome is not a requirement for a spike-in-based normalization. In fact, spike-

341 ins may be more important in non-model versus model systems because comparisons

342 within non-model systems are often broad (e.g., interspecific comparisons) and may be

343 more likely to result in unequal transcriptome size between treatment groups.

344 Considering only the results of our per-transcriptome normalization, approximately 9%

345 of the Tolmiea transcriptome was differentially expressed on a concentration basis between bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 15

346 the diploid and autotetraploid species, in line with the results for other diploid--

347 autopolyploid pairs cited above. This finding suggests that, in general, only a small fraction

348 of the transcriptome, less than 10%, responds to autopolyploidization through novel

349 alterations to transcript concentration. Despite differences in transcript abundance per

350 transcriptome representing less than 10% of all loci, concentration-based differences could

351 have important consequences for the stoichiometry of gene expression pathways.

352 Unfortunately, although Tolmiea is a good evolutionary model, it is not a genetic model, and

353 the ability to investigate specific pathways and make inferences regarding physiological

354 impact is severely hampered. However, it would be important for future studies to test

355 whether members of a given pathway respond similarly with respect to the maintenance of

356 transcript abundances relative to the transcriptome, cell, or biomass.

357 It is also notable that although ~9% of the loci examined in Tolmiea were differentially

358 expressed per transcriptome, none of these DEGs were uniquely recovered only from the

359 per-transcriptome analysis. The majority of per-transcriptome DEGs (2,106 of the 2,630)

360 were differentially expressed at a sufficiently high magnitude to be detected by all three

361 normalization methods. These results suggest that the loci typically identified as DE in

362 previous studies of polyploid gene expression represent only differences in expression level

363 extreme enough to be detected through a significantly altered concentration. Therefore,

364 examination of more subtle differences in expression level requires the use of normalization

365 approaches that allow for quantitative comparisons of transcript abundance.

366 When we compared expression level per cell, we found that ~14% (3,756 loci) of the

367 transcriptome was maintained in different abundances per cell between diploid and

368 tetraploid Tolmiea. Importantly, the direction of differences in expression level in T.

369 menziesii relative to the diploid was extremely unbalanced. Nearly all per-cell DEGs were

370 up-regulated in T. menziesii (3,005 loci), with only 751 down-regulated. In addition, 1,398 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 16

371 of the 3,005 per-cell DEGs up-regulated in the tetraploid were unique to the per-cell

372 normalization and not recovered under either per-transcriptome or per-biomass

373 normalizations (Figure 7D). Taken together, it appears that most differential expression in

374 Tolmiea represents a pattern consistent with dosage sensitivity, with increased levels of

375 gene expression in the tetraploid correlating with an increase in allele copy number. This

376 result would not have been detected under a typical library-size normalization method, as

377 the expression of these loci is not significantly altered in concentration relative to the

378 whole.

379 The over-abundance of up-regulation per cell in the tetraploid may serve as a

380 mechanism to compensate for reduced cell density. In other words, there is tendency in T.

381 menziesii for conserved gene expression levels per biomass through novelty at the cellular

382 level (henceforth per-biomass conservation) (Figure 8). Approximately 1,398 loci, or 5.2%

383 of all loci, in T. menziesii exhibit per-biomass conservation (Figure 7D). This buffering effect

384 is achieved by per-cell up-regulation in T. menziesii mirroring the decrease in cell density

385 relative to T. diplomenziesii. Fifteen functional categories were significantly over-

386 represented among the loci exhibiting per-biomass conservation. Of these, seven were

387 related to either the chloroplast or photosynthesis. Whether there is selective pressure to

388 conserve expression per biomass is unclear, but alteration of photosynthesis either per cell

389 or per biomass appears to be a recurring theme across diploid/polyploid comparisons (e.g.,

390 Warner and Edwards, 1993; Vyas et al., 2007; Coate et al., 2013). For example, Warner and

391 Edwards (1993) revealed photosynthetic conservation per biomass in M. sativa, but an

392 overall increase in photosynthesis per biomass following polyploidy in A. confertifolia. A

393 clear trend in the effects of polyploidization on photosynthesis per biomass has yet to

394 emerge, and like many aspects of polyploidy, it may be lineage-specific and/or require

395 additional study systems (Soltis et al., 2016). bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 17

396 Diploid and tetraploid Tolmiea occur under similar light regimes in nature (Visger et al.,

397 2016), and previously collected physiological data revealed no significant difference in

398 photosynthetic rate per leaf area (Visger et al., 2016) (Figure 9). In Tolmiea, conservation of

399 expression level per biomass may be a mechanism for the maintenance of optimal

400 photosynthesis per biomass, facilitating the ecological conservation of light preference in

401 Tolmiea. To determine if photosynthesis-related functional enrichment of conservation per

402 biomass is indeed a key underlying molecular mechanism for buffering cell density as it

403 pertains to photosynthesis, additional autopolyploid systems should be similarly studied.

404 Revisiting the work of Warner and Edwards (1993) using our spike-in standard-based gene

405 expression approach should also show a similar conservation of gene expression per

406 biomass in polyploid M. sativa. Conversely, in A. confertifolia where polyploidy increases

407 photosynthesis per leaf area, we might expect photosynthesis-related gene expression per

408 cell to be increased by a factor greater than the difference in cell density between diploid

409 and polyploid plants (Warner and Edwards, 1993).

410 An initial motivation for utilizing a spike-in approach to read-count normalization was

411 to tease apart variation in library size from differences in transcriptome size between

412 diploid and tetraploid Tolmiea. We found that when normalizing read counts per cell, the

413 mean transcriptome size of the tetraploid is over twice that of the diploid (Figure 5). This

414 difference in transcriptome size should be qualified, as the variation within and between

415 populations is quite large, though the mean transcriptome size still significantly differed

416 with ploidal level. Notably, we also observed that the variability of transcriptome size was

417 greater in the tetraploids. The diploid populations all exhibited a similar degree of variation

418 in transcriptome size, while tetraploid populations represented both the least and greatest

419 variation in transcriptome size (Figure 5; see Appendix 1 for standard deviation by

420 population). Excluding all other results presented in this study, the variability of bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 18

421 transcriptome size alone should be motivation enough for researchers of polyploidy to

422 adopt a spike-in-based approach.

423

424 The benefits of spike-in RNA standards: This study is not the first to apply synthetic

425 RNA spike-in standards in dealing with a transcriptome-wide effect (e.g., see Lovén et al.,

426 2012; Fomina-yadlin et al., 2014). However, this is the first study to: 1) leverage spike-in

427 standards in a cross-ploidy comparison, and 2) quantify expression level on three different,

428 biologically relevant scales. By using multiple read-count normalizations, with and without

429 spike-in standards, we investigated the interaction of expression level per cell and per

430 biomass between diploid and autotetraploid Tolmiea. Had this study been performed

431 without the use of spike-in standards, the results would have been limited to those of the

432 per-transcriptome normalization.

433 Spike-in normalization is most valuable when evaluating differences in gene expression

434 levels between two groups that have differing cell density and/or transcriptome size. By

435 normalizing RNAseq count data by the abundance of spike-in standards, sequencing depth

436 and transcriptome size are effectively disentangled. Methods employed by previous

437 comparative studies of diploid/polyploid pairs were limited to quantifying transcriptional

438 differences on a concentration-basis only. Concentration-based comparisons, while useful

439 for inferring alterations of pathway stoichiometry, are effectively blind to large proportions

440 of the transcriptome exhibiting additive expression levels associated with polyploidy. That

441 is, if the expression of many genes is increased in a single direction and is commensurate

442 with the increase in ploidal level, then the impact on any single gene’s concentration will be

443 minimal. An additional advantage of employing a spike-in normalization is that transcript

444 abundance can be independently quantified relative to biomass and relative to cell. Spike-in bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 19

445 reads can also be removed for some downstream analyses, allowing for a typical library-size

446 normalization so that concentration-based differences may be characterized as well.

447 The three normalization approaches presented here are all individually informative,

448 and the decision to include any or all of them should be guided by the research question.

449 For example, if the research question revolves around the bulk production of a compound,

450 evaluating differences in expression level per biomass may be the best approach. Research

451 questions focusing on complex gene pathways may be better served by an analysis of

452 expression level per cell. Information on potential differences in expression level

453 stoichiometry can be gained using traditional comparisons per transcriptome. Additionally,

454 as demonstrated by our study of Tolmiea, the interaction among multiple normalization

455 approaches can be even more informative than any single approach.

456 While the use of non-concentration-based normalizations can enable researchers to

457 address new questions, there is an important caveat that could lead to potential biases or

458 increased uncertainty. Differences in RNA extraction efficiency between treatment groups

459 are difficult to tease apart from variation in total transcriptome size. In much the same way

460 that variation in transcriptome size influences estimates of expression level, differences in

461 extraction efficiency could bias calculations of expression level for per-cell and per-biomass

462 analyses. For example, if RNA extraction were half as efficient in one treatment group

463 versus another, then differential expression analyses per cell would only consider half of the

464 actual transcript abundance of one group relative to the other. Future approaches should

465 consider partially accounting for this issue through the addition of a second unique set of

466 spike-ins prior to RNA extraction. Comparing the ratio of pre- versus post-extraction spike-

467 in abundance should highlight differences in extraction efficiency. However, even the use of

468 a second spike-in set will not account for different extraction efficiencies if those differences

469 arise from variation in cell lysis. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 20

470 In summary, this study has demonstrated that the use of synthetic RNA spike-in

471 standards can be used to explore previously uninvestigated aspects of divergence in gene

472 expression levels in a comparison of a diploid and its autotetraploid derivative. To our

473 knowledge, this multiple normalization approach has recovered the largest fraction of a

474 transcriptome as DE in a diploid/autopolyploid species pair ever reported (~17% in

475 Tolmiea vs. up to ~10% in several other plant systems; Stupar et al., 2007; Del Pozo and

476 Ramirez-Parra, 2014; Zhang et al., 2014). Further, our methodology allowed for a fine-scale

477 examination of how divergence in gene expression level interacts with cell density,

478 revealing a mosaic of difference in transcript concentration, abundance per cell, and

479 abundance across tissue.

480 While we compared a diploid-autopolyploid pair, in which a difference in transcriptome

481 size might be predicted, variation in transcriptome size is rarely investigated and could be

482 widespread in biological systems at diverse scales due to factors other than shifts in ploidy.

483 To date, nearly all studies of global gene expression have used normalization methods that

484 implicitly assume transcriptome size is invariable, yet this assumption is not empirically

485 supported. Examples of studies where variation in transcriptome size might be likely

486 include (but are by no means limited to) comparisons between related species, between

487 different developmental stages, and across stress treatments. Yet even in these cases, the

488 potential for variation in transcriptome size remains ignored and uninvestigated. If

489 transcriptome size is in fact invariable between two experimental treatment groups, then

490 following our proposed methodology, the results of the per-transcriptome and per-cell

491 comparisons should be identical. It is of critical importance that researchers making

492 comparisons using RNAseq data, particularly in the broad suite of examples noted above,

493 avoid making the assumption that transcriptome size is invariable by contrasting multiple

494 normalization approaches. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 21

495 Acknowledgements:

496 We are thankful to J.E. Coate for insightful discussions on normalization. Clayton J. Visger was

497 supported by NSF Graduate Research Fellowship DGE- 1315138. This research was supported in

498 part by an NSF Doctoral Dissertation Improvement Grant DEB-1501803 and by the 1KP project.

499

500 Author’s contributions:

501 C.J.V., G.K.W., P.S.S., and D.E.S. planned and designed the research. C.J.V. and Y.Z. performed

502 the experiments. C.J.V. analyzed the data. C.J.V., P.S.S., and D.E.S. wrote the manuscript.

503

504 References: 505 Adams KL, Cronn R, Percifield R, Wendel JF. 2003. Genes duplicated by polyploidy show 506 unequal contributions to the transcriptome and organ-specific reciprocal silencing. 507 Proceedings of the National Academy of Sciences of the United States of America 100: 508 4649–4654. 509 Adams KL, Wendel JF. 2013. Dynamics of Duplicated Gene Expression in Polyploid Cotton. 510 Polyploid and Hybrid Genomics. John Wiley and Sons, Inc., 187–194. 511 Ainouche M, Chelaifa H, Ferreira J, Bellot S, Ainouche A, Salmon A. 2012. Polyploid 512 Evolution in Spartina: Dealing with Highly Redundant Hybrid Genomes. In: Soltis PS,, 513 In: Soltis DE, eds. Polyploidy and Genome Evolution. Berlin, Heidelberg: Springer Berlin 514 Heidelberg, 225–243. 515 Anders S, Huber W. 2010. Differential expression analysis for sequence count data. 516 Genome Biology 11: R106. 517 Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, 518 Elespuru R, Fero M, et al. 2005. The External RNA Controls Consortium: a progress 519 report. Nature methods 2: 731–734. 520 Barker MS, Husband BC, Pires JC. 2016. Spreading Winge and flying high: The 521 evolutionary importance of polyploidy after a century of study. American Journal of 522 Botany 103: 1139–1145. 523 Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, Baying B, 524 Benes V, Teichmann SA, Marioni JC, et al. 2013. Accounting for technical noise in 525 single-cell RNA-seq experiments. Nature Methods 10: 1093–1095. 526 Brown R, Rickless P. 1949. A new method for the study of cell division and cell extension 527 with some preliminary observations on the effect of temperature and of nutrients. 528 Proceedings of the Royal Society of London. Series B, Biological Sciences 136: 110–125. 529 Buggs RJA. 2012. Monkeying around with ploidy. Molecular Ecology 21: 5159–5161. 530 Buggs RJA, Wendel JF, Doyle JJ, Soltis DE, Soltis PS, Coate JE. 2014. The legacy of diploid 531 progenitors in allopolyploid gene expression patterns. Philosophical Transactions of the 532 Royal Society of London. Series B, Biological Sciences 369: 20130354. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 22

533 Cañestro C. 2012. Two rounds of whole genome duplication: Evidence and impact on the 534 evolution of vertebrate innovations. In: Soltis PS, Soltis DE, eds. Polyploidy and Genome 535 Evolution. Springer, New York. 536 Chelaifa H, Monnier A, Ainouche M. 2010. Transcriptomic changes following recent 537 natural hybridization and allopolyploidy in the salt marsh species Spartina x townsendii 538 and Spartina anglica (Poaceae). The New Phytologist 186: 161–174. 539 Chen JZ, Birchler JA. 2013. Polyploid and Hybrid Genomics. New York: Wiley-Blackwell. 540 Coate JE, Doyle JJ. 2010. Quantifying whole transcriptome size, a prerequisite for 541 understanding transcriptome evolution across species: An example from a plant 542 allopolyploid. Genome Biology and Evolution 2: 534–546. 543 Coate JE, Doyle JJ. 2015. Variation in transcriptome size: Are we getting the message? 544 Chromosoma 124: 27–43. 545 Coate JE, Powell AF, Owens TG, Doyle JJ. 2013. Transgressive physiological and 546 transcriptomic responses to light stress in allopolyploid Glycine dolichocarpa 547 (Leguminosae). Heredity 110: 160–170. 548 Del Pozo JC, Ramirez-Parra E. 2014. Deciphering the molecular bases for drought 549 tolerance in Arabidopsis autotetraploids. Plant, Cell and Environment: 2722–2737. 550 Dong S, Adams KL. 2011. Differential contributions to the transcriptome of duplicated 551 genes in response to abiotic stresses in natural and synthetic polyploids. The New 552 Phytologist 190: 1045–1057. 553 Doyle JJ. 2012. Polyploidy in legumes. In: Soltis PS, Soltis DE, eds. Polyploidy and Genome 554 Evolution. Springer, New York. 555 Doyle JJ, Doyle JL. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf 556 tissue. Phytochemistry Bulletin: 11–15. 557 Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, Soltis PS, Wendel JF. 2008. 558 Evolutionary genetics of genome merger and doubling in plants. Annual Review of 559 Genetics 42: 443–461. 560 External RNA Controls Consortium. 2005. Proposed methods for testing and selecting the 561 ERCC external RNA controls. BMC Genomics 6: 150. 562 Fomina-yadlin D, Du Z, Mcgrew JT. 2014. Gene expression measurements normalized to 563 cell number reveal large scale differences due to cell size changes, transcriptional 564 amplification and transcriptional repression in CHO cells. Journal of Biotechnology 189: 565 58–69. 566 Gaeta RT, Pires JC, Iniguez-Luy F, Leon E, Osborn TC. 2007. Genomic changes in 567 resynthesized Brassica napus and their effect on gene expression and phenotype. The 568 Plant Cell 19: 3403–3417. 569 Germain P-L, Vitriolo A, Adamo A, Laise P, Das V, Testa G. 2016. RNAontheBENCH: 570 computational and empirical resources for benchmarking RNAseq quantification and 571 differential expression methods. Nucleic Acids Research 44: 5054–5067. 572 Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, 573 Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA- 574 Seq data without a reference genome. Nature Biotechnology 29: 644–652. 575 Greilhuber J, Dolezel J, Wendel J, Leitch IJ. 2012. Plant Genome Diversity, Volume 2: 576 Phsyical Structure, Behaviour and Evolution of Plant Genomes. Springer Vienna. 577 Gu J, Wang X, Halakivi-Clarke L, Clarke R, Xuan J. 2014. BADGE: A novel Bayesian model 578 for accurate abundance quantification and differential analysis of RNA-Seq data. BMC 579 Bioinformatics 15: S6. 580 Ilut DC, Coate JE, Luciano AK, Owens TG, May GD, Farmer A, Doyle JJ. 2012. A 581 comparative transcriptomic study of an allotetraploid and its diploid progenitors 582 illustrates the unique advantages and challenges of RNA-Seq in plant species. American bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 23

583 Journal of Botany 99: 383–396. 584 Jiao Y, Paterson AH. 2014. Polyploidy-associated genome modifications during land plant 585 evolution. Philosophical transactions of the Royal Society of London. Series B, Biological 586 Sciences 369. 587 Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho 588 LP, Hu Y, Liang H, Soltis PS, et al. 2011. Ancestral polyploidy in seed plants and 589 angiosperms. Nature 473: 97–100. 590 Jordon-Thaden IE, Chanderbali AS, Gitzendanner MA, Soltis DE. 2015. Modified CTAB 591 and TRIzol protocols improve RNA extraction from chemically complex Embryophyta. 592 Applications in Plant Sciences 3. 593 Joshi NA, Fass JN. 2011. Sickle: A sliding-window, adaptive, quality-based trimming tool for 594 FastQ files (Version 1.33)[Software]. 595 Krishnaswami SR, Grindberg R V, Novotny M, Venepally P, Lacar B, Bhutani K, Linker 596 SB, Pham S, Erwin JA, Miller JA, et al. 2016. Using single nuclei for RNA-seq to capture 597 the transcriptome of postmortem neurons. Nature Protocols 11: 499–524. 598 Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature 599 Methods 9: 357–359. 600 Liu B, Brubaker CL, Mergeai G, Cronn RC, Wendel JF. 2001. Polyploid formation in cotton 601 is not accompanied by rapid genomic changes. Genome 44: 321–330. 602 Liu SJ, Nowakowski TJ, Pollen AA, Lui JH, Horlbeck MA, Attenello FJ, He D, Weissman 603 JS, Kriegstein AR, Diaz AA, et al. 2016. Single-cell analysis of long non-coding RNAs in 604 the developing human neocortex. Genome Biology 17: 67. 605 Lovén J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee TI, Young 606 RA. 2012. Revisiting global gene expression analysis. Cell 151: 476–482. 607 Madlung a, Wendel JF. 2013. Genetic and epigenetic aspects of polyploid evolution in 608 plants. Cytogenetic and Genome Research 140: 270–285. 609 Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing 610 reads. EMBnet.journal 17: 10. 611 Masterson J. 1994. Stomatal size in fossil plants: evidence for polyploidy in majority of 612 angiosperms. Science 264: 421–424. 613 Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying 614 mammalian transcriptomes by RNA-Seq. Nature methods 5: 621–628. 615 Nie Z, Hu G, Wei G, Cui K, Yamane A, Resch W, Wang R, Green DR, Tessarollo L, 616 Casellas R, et al. 2012. c-Myc is a universal amplifier of expressed genes in 617 lymphocytes and embryonic stem cells. Cell 151: 68–79. 618 Pine SP, Munro SA, Parsons JR, Mcdaniel J, Bergstrom Lucas A, Lozach J, Myers TG, Su 619 Q, Jacobs-Helber SM, Salit M. 2016. Evaluation of the External RNA Controls 620 Consortium (ERCC) reference material using a modified Latin square design. BMC 621 Biotechnology 16. 622 Rambani A, Page JT, Udall JA. 2014. Polyploidy and the petal transcriptome of Gossypium. 623 BMC Plant Biology 14: 3. 624 Renny-Byfield S, Wendel JF. 2014. Doubling down on genomes: Polyploidy and crop 625 plants. American Journal of Botany 101: 1–15. 626 Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. 2015. limma powers 627 differential expression analyses for RNA-sequencing and microarray studies. Nucleic 628 Acids Research 43: e47. 629 Roberts A. 2013. Thesis: Ambiguous fragment assignment for high-throughput sequencing 630 experiments. EECS Department, University of California, Berkeley. 631 Salmon A, Flagel L, Ying B, Udall JA, Wendel JF. 2010. Homoeologous nonreciprocal 632 recombination in polyploid cotton. New Phytologist 186: 123–134. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 24

633 Schall KA, Thornton ME, Isani M, Holoyda KA, Hou X, Lien C-L, Grubbs BH, Grikscheit 634 TC. 2017. Short bowel syndrome results in increased gene expression associated with 635 proliferation, inflammation, bile acid synthesis and immune system activation: RNA 636 sequencing a zebrafish SBS model. BMC Genomics 18: 23. 637 Shi X, Ng DW-K, Zhang C, Comai L, Ye W, Chen ZJ. 2012. Cis- and trans-regulatory 638 divergence between progenitor species determines gene-expression novelty in 639 Arabidopsis allopolyploids. Nature Communications 3: 950. 640 Soltis DE, Soltis PS. 1988. Electrophoretic evidence for tetrasomic segregation in Tolmiea 641 menziesii (Saxifragaceae). Heredity 60: 375–382. 642 Soltis PS, Soltis DE. 2009. The role of hybridization in plant speciation. Annual review of 643 plant biology 60: 561–88. 644 Soltis DE, Soltis PS. 2012. Polyploidy and Genome Evolution. Berlin, Heidelberg: Springer- 645 Verlag. 646 Soltis D, Soltis P, Ranker T, Ness B. 1989. Chloroplast DNA variation in a wild plant, 647 . Genetics 121: 819–826. 648 Soltis DE, Visger CJ, Marchant DB, Soltis PS. 2016. Polyploidy: Pitfalls and paths to a 649 paradigm. American Journal of Botany 103: 1–21. 650 Soltis DE, Visger CJ, Soltis PS. 2014. The polyploidy revolution then...and now: Stebbins 651 revisited. American Journal of Botany 101: 1057–1078. 652 Spoelhof JP, Soltis PS, Soltis DE. (Accepted). Pure polyploidy: Closing the gaps in 653 autopolyploid research. Journal of Systematics and Evolution. 654 Stebbins GL. 1950. Variation and Evolution in Plants. Columbia University Press. 655 Stebbins Jr. GL. 1971. Chromosomal Evolution in Higher Plants. Addison-Wesley. 656 Stupar RM, Hermanson PJ, Springer NM. 2007. Nonadditive expression and parent-of- 657 origin effects identified by microarray and allele-specific expression profiling of maize 658 endosperm. Plant Physiology 145: 411–425. 659 Van de Peer Y. 2011. A mystery unveiled. Genome Biology 12: 113. 660 Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K. 2009. The flowering world: 661 A tale of duplications. Trends in Plant Science 14: 680–688. 662 Visger CJ, Germain-Aubrey CC, Patel M, Sessa EB, Soltis PS, Soltis DE. 2016. Niche 663 divergence between diploid and autotetraploid Tolmiea. American Journal of Botany 664 103: 1396–1406. 665 Vyas P, Bisht MS, Miyazawa S-I, Yano S, Noguchi K, Terashima I, Funayama-Noguchi S. 666 2007. Effects of polyploidy on photosynthetic properties and anatomy in leaves of Phlox 667 drummondii. Functional Plant Biology 34: 673. 668 Wagner W. 1970. Biosystematics and Evolutionary Noise. Taxon 19: 146–151. 669 Warner DA, Edwards GE. 1989. Effects of polyploidy on photosynthetic rates, 670 photosynthetic enzymes, contents of DNA, chlorophyll, and sizes and numbers of 671 photosynthetic cells in the C(4) dicot Atriplex confertifolia. Plant Physiology 91: 1143– 672 1151. 673 Warner DA, Edwards GE. 1993. Effects of polyploidy on photosynthesis. Photosynthesis 674 Research 35: 135–147. 675 Warnes GR, Bolker B, Bonebakker L, Gentleman R., Huber W, Liaw A, Lumley T, 676 Maechler M, Magnusson A, Moeller S, Schwartz M. 2009. gplots: Various R 677 programming tools for plotting data. R Package Version, 2(4). 678 Wendel JF. 2015. The wondrous cycles of polyploidy in plants. American Journal of Botany 679 102: 1753–1756. 680 Xu J, Su Z, Hong H, Thierry-Mieg J, Thierry-Mieg D, Kreil DP, Mason CE, Tong W, Shi L. 681 2014. Cross-platform ultradeep transcriptomic profiling of human reference RNA 682 samples by RNA-Seq. Scientific Data 1: 140020. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 25

683 Yoo M-J, Liu X, Pires JC, Soltis PS, Soltis DE. 2014. Nonadditive gene expression in 684 polyploids. Annual Review of Genetics 48: 485–517. 685 Young MD, Wakefield MJ, Smyth GK, Oshlack A, Fu X, Fu N, Guo S, Yan Z, Xu Y, Hu H, et 686 al. 2010. Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome 687 Biology 11: R14. 688 Zhang X, Deng M, Fan G. 2014. Differential transcriptome analysis between Paulownia 689 fortunei and its synthesized autopolyploid. International Journal of Molecular Sciences 690 15: 5079–5093. 691 692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 26

713 Figures:

"diploid" "polyploid"

Transcript of interest

Non-target transcript

Spike-in standard

Per-transcriptome normalization: A DE analysis tests whether the transcript of interest is per per maintained in a different abundence relative to the total = transcriptome. Per-biomass normalization: B DE analysis tests whether the transcript of interest is per per maintained at different levels within a given biomass <

Per-cell normalization: DE analysis tests whether a given transcript is per per C maintained at different levels within a cell ÷ < ÷

714 715 Figure 1: A simplified example of how spike-in standards can be used during read

716 normalization to enable comparisons of expression level at different biological scales

717 between a hypothetical diploid-polyploid pair with differing cell density. The large circles

718 represent a unit of biomass and contain a number of cells (green squares). Beneath each

719 circle is a depiction of how the read normalizations are calculated. Under a per

720 transcriptome normalization, the ratio of target transcripts to the total transcriptome is

721 compared. While the per biomass normalization uses the ratio of the transcript of interest

722 to the spike-in transcripts. The per cell normalization also uses the ratio of the transcript of

723 interest to spike-in transcripts, but scales the spike-in transcript abundance by cell density,

724 represented here by multiplying the spike-in abundance by the number of contributing bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 27

725 cells. Whether the transcript of interest would be found as not differentially expressed or

726 higher/lower expressed in the polyploid under each normalization is indicted using ‘=’, ‘<’,

727 or ‘>’ respectively.

728

729

730

731

732

733

734

735

736

737

738

739

740

741

742

743

744 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 28

745

746 Figure 2: Generalized distributions of Tolmiea menziesii and T. diplomenziesii. The

747 population sources for plants used in this study are represented as red triangles (T.

748 diplomenziesii) and blue squares (T. menziesii).

749 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 29

A h B 6 8e+05 1. 4 1. 6e+05 2 1. 0 1. 4e+05 1C DNA ug/ml 8 0. 6 2e+05 0. Estimated cells per 2-cm diameter leaf punc

T. diplomenziesii T. menziesii T. diplomenziesii T. menziesii

750 751 Figure 3: Results of ploidy variation in leaf cell density using, A: 1C DNA concentration

752 following a DNA/RNA co-extraction, and B: estimated cell counts per 2cm diameter leaf

753 punch.

754

755

756

757

758

759

760

761

762

763

764

765

766

767 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 30

Per transcriptome Per biomass Per cell 3 3 A 3 B C 2 2 2 1 1 1 2 0 0 0 1 - 1 Leading logFC dim - 1 - 2 - 2 - 2 - 3 - 3 - 3 -

-4 -2 0 2 -4 -2 0 2 -4 -2 0 2 Leading logFC dim 1 Leading logFC dim 1 Leading logFC dim 1 Log Fold Change

Average Log Expression Average Log Expression Average Log Expression 768 769 Figure 4: Results from multiple differential expression analysis. MDS plots A-C cluster

770 individual based on the 500 most variable loci, with color indicating ploidal level and shape

771 reflecting population of origin. MA plots D-F show every locus in the Tolmiea transcriptome

772 (represented as dots), with log fold expression level differences in the polyploid relative to

773 the diploid on the y-axis and average expression level on the x-axis-- red indicates statistical

774 significance.

775

776

777

778

779

780

781

782

783

784 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 31

1e7 Total reads normalized per Cell 8 7 Tolmiea menziesii

6

5

4

3

2 Sum of normalized counts

1

0 Population 785 786 Figure 5: Sum of read counts normalized per cell, and clustered by population of origin.

787 Diploid and tetraploid mean significantly differed (p < 0.008).

788

789

790

791

792

793

794

795

796

797 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 32

Differentially expressed genes by normalization method

Per Cell Per Biomass 85 1,398 442

2,106 167 357

Per Transcriptome 0

n =4,555 798 799 Figure 6: Venn diagram contrasting the three normalization methods. Numbers within the

800 different sections indicate loci that were identified as being differentially expressed

801 between Tolmiea menziesii and T. diplomenziesii.

802

803

804

805

806

807

808

809

810

811

812 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 33

s

ranscriptome T Per Per Cell Per Biomas

Up-regulated in 4x

Not DE Down-regulated in 4x A B C D Number/Proportion of loci: 1,392/5.19%: None 167/0.623%: None 48/0.179%: None 1,398/5.21%: Photosystem I, Photosynthesis, Signifcantly enriched GO categories light harvesting, Chlorophyll binding, Protein- chromophore linkage, Photosystem II, Auxin polar transport, Protein xylosyltransferase activity, Chloroplast thylakoid membrane, Anthocyanin accumulation in tissues, Auxin- s activated signaling pathway, Lateral root formation, Positive gravitropism, Auxin infux ranscriptome transmembrane transporter, Chondroitin T sulfate biosynthetic, Chloroplast thylakoid Per Per Cell Per Biomas

Up-regulated in 4x E F G H

Not DE

Down-regulated in 4x

Number/Proportion of loci: 714/2.66%: None 37/0.138%: None 442/1.65%: None 357/1.33%: Drug transmembrane transport Signifcantly enriched GO categories activity, Hydrogen ion transmembrane transport, Proanthocyanidin biosynthetic process, Solute: proton antiporter activity, Maintenance of seed dormancy

813

814 Figure 7: Loci binned by their DE categorization across the three normalization

815 approaches. The number of loci belonging to each bin and the results of GO enrichment

816 analyses are reported below the corresponding bin. Bins containing no loci are not shown.

817

818

819

820

821

822

823 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 34 "diploid" "polyploid" Leaf punch Cell Transcript

Conservation of gene expression level per biomass (per biomass conservation)

per per 10x per 10x per

824 825 Figure 8: A simplified example of an interaction between expression level and cell density.

826 =Conservation of gene expression per biomass occurs when expression level per cell in

827 samples with lower cell density is up-regulated enough to yield equivalent levels of

828 transcript per unit biomass.

829

830

831

832

833

834

835

836 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 35

Photosynthetic rate 10 8 1 6 µmol CO2 m-2 s- 4 2

Tolmiea diplomenziesii Tolmiea menziesii

837 838 Figure 9: Adapted from data collected from Visger et al. 2016. Tolmiea diplomenziesii and

839 T. menziesii did not significantly differ in photosynthetic output under common garden

840 conditions in the green houses of University of Florida.