bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1
1 Divergent gene expression levels between diploid and autotetraploid Tolmiea
2 (Saxifragaceae) relative to the total transcriptome, the cell, and biomass.
3
4 Clayton J. Visger1, Gane K-S. Wong2,3, Yong Zhang4,5, Pamela S. Soltis6,7, and Douglas E.
5 Soltis6,7,8
6 7 Author Affiliation: 8 9 1Department of Biological Sciences, California State University Sacramento, Sacramento, CA
10 95819, USA
11 2Department of Biological Sciences, University of Alberta, Edmonton AB, T6G 2E9, Canada
12 3Department of Medicine, University of Alberta, Edmonton AB, T6G 2E1, Canada
13 4Bejing Genomics Institute-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen
14 518083, China
15 5HuaHan Gene Co. Ltd., 7F JianAnShanHai Building, No. 8000, Shennan Road, Futian District,
16 Shenzhen 518040, China
17 6Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
18 7Genetics Institute, University of Florida, Gainesville, FL 32610, USA
19 8Department of Biology, University of Florida, Gainesville, FL 32611, USA
20
21
22
23 Corresponding Author: Clayton J. Visger, [email protected]
24 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2
25 Summary:
26 • Studies of gene expression and polyploidy are typically restricted to characterizing
27 differences in transcript concentration. Integrating multiple methods of transcript
28 analysis, we document a difference in transcriptome size, and make multiple
29 comparisons of transcript abundance in diploid and autotetraploid Tolmiea.
30 • We use RNA spike-in standards to identify and correct for differences in
31 transcriptome size, and compare levels of gene expression across multiple scales:
32 per transcriptome, per cell, and per biomass.
33 • In total, ~17% of all loci were identified as differentially expressed (DEGs) between
34 the diploid and autopolyploid species. A shift in total transcriptome size resulted in
35 only ~58% of the total DEGs being identified as differentially expressed following a
36 per transcriptome normalization. When transcript abundance was normalized per
37 cell, ~82% of the total DEGs were recovered. The discrepancy between per-
38 transcriptome and per-cell recovery of DEGs occurs because per-transcriptome
39 normalizations are concentration-based and therefore blind to differences in
40 transcriptome size.
41 • While each normalization enables valid comparisons at biologically relevant scales,
42 a holistic comparison of multiple normalizations provides additional explanatory
43 power not available from any single approach. Notably, autotetraploid loci tend to
44 conserve diploid-like transcript abundance per biomass through increased gene
45 expression per cell, and these loci are enriched for photosynthesis-related functions.
46 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 3
47 Introduction
48 Polyploidy (whole-genome duplication; WGD), a process now recognized to be of major
49 importance across many eukaryotic lineages (Van de Peer et al., 2009; Jiao et al., 2011; Van
50 de Peer, 2011; Jiao and Paterson, 2014), was long considered an evolutionary dead-end by
51 some, including several of the most prominent evolutionary biologists of the past century
52 (Stebbins, 1950; Wagner, 1970). However, during the last several decades there has been a
53 resurgence of interest in the study of polyploid evolution, particularly the genetic and
54 genomic consequences of polyploidy (e.g., Ainouche et al., 2012; Barker et al., 2016;
55 Canestro, 2012; Chen and Birchler, 2013; Doyle, 2012; Gaeta et al., 2007; Doyle et al., 2008;
56 Soltis and Soltis, 2009, 2012; Salmon et al., 2010; Greilhuber et al., 2012; Shi et al., 2012;
57 Madlung and Wendel, 2013; Renny-Byfield and Wendel, 2014; Soltis et al., 2014; Wendel,
58 2015; Spoelhof et al., accepted).
59 The role of polyploidy in facilitating changes in gene expression, through expression
60 level divergence, altered expression patterns (e.g., across tissue-types), and/or the
61 generation of unique splice variants, is arguably one of the most important research topics
62 in the field today (e.g., Liu et al., 2001; Adams et al., 2003; Adams and Wendel, 2012;
63 Chelaifa et al., 2010; Dong and Adams, 2011; Ainouche et al., 2012; Buggs, 2012; Buggs et al.,
64 2014; Rambani et al., 2014; see Yoo et al., 2014 for review). For example, autotetraploid
65 Solanum phureja expresses ~10% of all loci at a different levels relative to diploid S.
66 phureaja (Stupar et al., 2007). In Paulownia fortunei, the autotetraploid differentially
67 expresses ~6% of all loci relative to the diploid progenitor (Zhang et al., 2014), and in
68 Arabidopsis thaliana ~4% of all loci are differentially expressed between the diploid and
69 autotetraploid derivative (Del Pozo and Ramirez-Parra, 2014). However, the rapid increase
70 in studies of patterns of gene expression in polyploids appear to have outpaced our bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 4
71 fundamental understanding of the transcriptome’s response to polyploidy per se and how
72 this response might influence our interpretation of gene expression measures.
73 One major concern in comparisons of gene expression between diploids and polyploids
74 is that any transcriptome amplification or transcriptome-wide effects induced following
75 polyploidy are rarely, if ever, investigated (see Coate and Doyle, 2010, 2015).
76 Transcriptional amplification is a biological phenomenon in which the total mRNA
77 produced per cell is increased up to several fold in one treatment group compared to
78 another, resulting in unequal total transcriptome sizes (Nie et al., 2012). Transcriptional
79 amplification would be anticipated in a polyploid compared to a diploid progenitor, as
80 polyploidy globally alters gene/genome copy number and often influences cell size, which,
81 in turn, has been shown to correlate with transcriptome size (Fomina-yadlin et al., 2014).
82 Recent work has shed light on some of the biases inherent to expression level
83 comparisons between treatment groups that differ in transcriptome size. Loven et al.
84 (2012) showed that inferences drawn from a typical RNAseq workflow can be confounded
85 when treatment groups have transcriptomes of differing size. Because transcriptome size
86 variation has rarely been explored, RNAseq studies involving transcriptome amplification
87 are likely to underestimate the proportion of the transcriptome being differentially
88 expressed. Surprisingly, only a single investigation into gene expression change following
89 polyploidy has accounted for variation in transcriptome size (Coate and Doyle, 2010). Coate
90 and Doyle (2010) found that allotetraploid G. dolicocarpa has a total mRNA transcriptome
91 ~1.4 times greater than its diploid progenitors, demonstrating that polyploidy can induce
92 an increase in transcriptome size. However, similar data from additional polyploid systems
93 are badly needed to achieve a broader understanding of the role that polyploidy plays in
94 transcriptome-wide changes in gene expression. Current methods for comparing
95 differences in levels of gene expression across the transcriptome rely on the detection of bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 5
96 statistically different RNAseq read abundances at each locus to generate a summary of
97 differential gene expression. RNAseq libraries are usually sequenced to different depths by
98 chance, resulting in library size (the total number of reads per sample) varying across the
99 final dataset. To prevent variation in library size from influencing analyses of differential
100 expression, library size is typically normalized across all samples (e.g., Mortazavi et al.,
101 2008). Commonly used library normalization methods quantify expression level on a
102 concentration basis by dividing read abundance by a factor of the whole library (e.g.,
103 transcripts per million or reads per kilobase per million) (see Coate and Doyle, 2010, 2015;
104 Lovén et al., 2012). Concentration-based normalizations place gene expression in the
105 context of expression level per transcriptome. Using a concentration-based normalization
106 approach to infer differences in expression level requires that total transcriptome size does
107 not vary between ploidal levels. If transcriptome size varies by ploidy, then loci identified as
108 differentially expressed (differentially expressed genes; DEGs) are not necessarily
109 expressed at different levels, but instead are maintained in different concentrations relative
110 to the transcriptome (Figure 1A).
111 To overcome these problems, the use of synthetic spike-in RNA standards has been
112 proposed (Lovén et al., 2012). These standards, developed by the External RNA Controls
113 Consortium (External RNA Controls Consortium, 2005; Baker et al., 2005; referred to as
114 Spike-in RNA standards, henceforth spike-ins), facilitate comparison of absolute expression
115 level across treatments with different transcriptome sizes (see Pine et al., 2016). Spike-ins
116 allow for normalization of transcript abundance independent of transcriptome size. To date,
117 spike-ins have been largely restricted to use in model systems (e.g., hamster – Fomina-
118 yadlin et al., 2014; human – Xu et al., 2014; zebrafish – Schall et al., 2017), single cell
119 sequencing (e.g., Krishnaswami et al., 2016; Liu et al., 2016), and methods development and
120 validation (e.g., Lovén et al., 2012; Gu et al., 2014; Germain et al., 2016); they remain a bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 6
121 promising but unutilized method for cross-ploidy comparisons.
122 Most polyploids exhibit differences in cell size relative to their parents, which may
123 result in an alteration of cell density (Stebbins, 1971; Masterson, 1994). Because
124 normalization per transcriptome is based on concentration of transcripts, these
125 comparisons are therefore robust to variation in cell size and density. Differences in cell
126 number and cell size might be expected to influence levels of gene expression on certain
127 scales, but cell size and density are rarely investigated prior to studying differences in gene
128 expression levels. Conversely, normalizing by the abundance of an internal standard within
129 a library (e.g., spike-ins) does not account for variation in the number of contributing cells
130 across treatments, and the inferred transcript abundance may be biased (Fomina-yadlin et
131 al., 2014). It is therefore critical to have information on cell density differences between
132 treatment groups when normalizing to an internal standard.
133 Following polyploidy, a balance of cell size and density effects, as well as differences in
134 gene/allelic dosage, could have profound effects on all facets of plant physiology. Across a
135 ploidal series in Atriplex confertifolia, cell density decreases with increasing ploidy (Warner
136 and Edwards, 1989). Despite harboring fewer cells per unit area, higher ploidal levels of A.
137 confertifolia are capable of higher photosynthesis per unit leaf area, due to increased
138 photosynthesis of tetraploid cells relative to diploid cells. Conversely, in tetraploid
139 Medicago sativa, which also exhibits lower cell density than does its diploid progenitor,
140 photosynthesis per cell is increased in the tetraploid to yield diploid-like photosynthetic
141 output per unit leaf area (Warner and Edwards, 1993). In A. confertifolia and M. sativa, cell
142 size and density alone are not sufficient for predicting physiological change following
143 polyploidy. Instead, information on the interaction between cell density and cell efficiency is
144 needed to understand fully the physiological impact of polyploidy. In light of these
145 observations, studies of changes in gene expression levels following polyploidy should bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 7
146 routinely investigate the interaction between gene expression across a unit of biomass and
147 per cell.
148 When discussing transcript abundance between samples of differing transcriptome
149 sizes and/or differing cell density, a clear nomenclature is critical. We illustrate three
150 separate ways of defining transcript abundance when both transcriptome size and cell
151 density vary between treatment groups (Figure 1). When RNAseq data are normalized
152 without an external standard (Figure 1A), differences in abundance observed between two
153 treatments reflect a difference in transcript concentration. We follow Coate and Doyle
154 (2010) in referring to this type of comparison as ‘per transcriptome’.
155 Differences in expression level between treatments normalized to the abundance of
156 spike-ins reflect differences in transcript abundance relative to the abundance of spike-ins.
157 When spike-ins are added in equal amounts to samples derived from equivalent volumes of
158 tissue, then differences observed following a spike-in-based normalization indicate that
159 transcripts differ in abundance within a given volume of tissue (Figure 1B). We refer to this
160 comparison as ‘per biomass’. Finally, if the spike-in abundance is scaled by a factor equal to
161 differences in cell density, differences in expression level will reflect changes in transcript
162 abundance relative to the cell (Figure 1C), and we refer to this case as ‘per cell’ (Figure 1).
163 Tolmiea (Saxifragaceae) is an excellent evolutionary model for investigating the
164 transcriptional impact of polyploidy because: 1) it is a clear diploid (T. diplomenziesii) and
165 autotetraploid (T. menziesii) system; and 2) there is strong support for a single origin of the
166 autotetraploid (Soltis and Soltis, 1988; Soltis et al., 1989; Visger et al., 2016). In addition,
167 examination of changes in expression level following autopolyploidy is less problematic
168 than similar investigations within allopolyploid systems for several reasons. First,
169 autopolyploidy results in a duplication of a single genome, rather than the merger and
170 duplication of two divergent genomes (as in allopolyploidy). Hence, a single diploid-based bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 8
171 mapping reference can be used for both the diploid and the autotetraploid. Additionally, by
172 definition, there are no homeologues in an autopolyploid, substantially reducing the
173 potential for biased mismapping of paralogous reads. A single origin of the polyploid is also
174 an advantage for exploring issues pertaining to transcriptional amplification and reduces
175 analytical complexity. For example, multiple, independent origins of a polyploid may have
176 differing effects on transcriptome size. Thus, an investigation accounting for biases in
177 transcriptome size should start with an autopolyploid arising from a single origin.
178 Here we leverage spike-in RNA standards and multiple normalization methods to
179 collectively characterize for the first-time gene expression across three biological scales in a
180 natural system. Through this multi-scale comparison, we investigate how polyploidy has
181 influenced transcriptional change in gene pathway stoichiometry, cellular expression levels,
182 and transcript abundance across leaf organ tissue. Finally, we synthesize the differences
183 represented across all three scales and place our findings within the context of
184 ecophysiological data for Tolmiea.
185
186 Materials and Methods
187 Sampling— Plants were collected from three geographically separate natural
188 populations of both T. diplomenziesii and T. menziesii (Figure 2). Taking advantage of the
189 ability of Tolmiea to reproduce asexually via plantlet formation, each individual collected in
190 the field was subsequently propagated in quadruplicate in a greenhouse at the University of
191 Florida. The resulting plantlets were then grown to maturity under standardized conditions
192 within a common garden greenhouse at the University of Florida. One of four replicate
193 plants from two T. menziesii populations died prior to sampling.
194 Each of the 22 mature plants was sampled for equivalent volumes of leaf tissue using an
195 8-mm-diameter cork borer. The tissue was then flash frozen in liquid N2 and stored at -80C bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 9
196 until RNA extraction. Total RNA was extracted from this tissue using the CTAB and Trizol
197 method of Jordon-Thaden et al. (2015; protocol number 2) with the addition of 20%
198 sarkosyl. DNA was removed using a Turbo DNA-free kit (Invitrogen, USA). Following the
199 manufacturer's recommendations, the total RNA was spiked with 4 ul of 1:100 diluted ERCC
200 RNA Spike-in mix (Ambion, USA). RNAseq libraries were then built using the TruSeq kit
201 (Illumina, USA); 100-bp paired-end sequencing was performed using an Illumina HiSeq at
202 the Beijing Genomics Institute.
203 Quantifying cell density--To quantify differences in levels of gene expression per cell, it
204 was necessary to account for differences in both transcriptome size and cell density
205 between diploid and autotetraploid Tolmiea. A potential source of uncertainty is our
206 estimation of differences in cell density that were used to normalize our reads on a relative
207 per-cell basis. To decrease the likelihood of misrepresenting differences in cell density, we
208 characterized cell density using both DNA/RNA co-extraction and cell counting as described
209 below.
210 Duplicate leaf punches from each sample were used for co-extraction of DNA and RNA.
211 We first followed the Jordon-Thaden et al. (2015) method #2 with 20% sakrosyl. Following
212 the CTAB incubation, the supernatant was split into two equal aliquots, one of which was
213 used for RNA extraction (following Jordon-Thaden et al., 2015) and the other for DNA
214 extraction (following Doyle and Doyle, 1987). DNA concentrations were quantified with
215 dsDNA broad-range chemicals using a Qubit (Life Technologies). DNA concentration was
216 placed into a 1C context by dividing by ploidal level. The 1C DNA concentration was used to
217 infer the relative difference in cell density of the leaf tissue contributing to RNA extraction
218 between T. menziesii and T. diplomenziesii. To validate this approach, we also directly
219 estimated cell density per unit area. Leaf punches two cm in diameter collected from 10
220 diploids and 11 tetraploids were digested in 500 ul of 10% chromic acid until cells were bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 10
221 fully dissociated (Brown and Rickless, 1949; Ilut et al., 2012). Each individual was assayed
222 twice, and the number of cells in the suspension was counted twice for each assay using 10-
223 ul aliquots in a hemocytometer. Statistical analyses of 1C DNA concentration and cell
224 density were performed using a linear mixed-effects model implemented in JMP (version
225 12; SAS Institute, Cary, NC, USA) with individual as a random effect. All datasets were tested
226 for normality using a goodness of fit test, and if normality was rejected, the data were log
227 transformed.
228 Differential expression analysis— Raw reads were cleaned using CutAdapt (Martin,
229 2011) and Sickle (Joshi and Fass, 2011). A Tolmiea reference transcriptome was generated
230 from concatenated reads taken from all samples (with the spike-in reads removed), and in
231 silico read normalization was employed using Trinity) (Grabherr et al., 2011). Extremely
232 low-expressed isoforms were removed (< 1 transcript per million), and the remaining
233 transcriptome was annotated using the Trinotate pipeline
234 (http://trinotate.sourceforge.net/). Trimmed reads for each sample were mapped to a
235 concatenation of the Tolmiea transcriptome with isoforms clustered together (using the
236 Trinity ‘gene’ option) and the publically available ERCC spike-in reference using Bowtie2
237 (Langmead and Salzberg, 2012), and read counts were extracted using eXpress (Roberts,
238 2013).
239 We provide an iPython notebook including the entire code for the normalization,
240 differential expression, and figure generation described below (Appendix 1). In short, read
241 count normalization and differential expression analyses were conducted using Limma-
242 Voom (Ritchie et al., 2015). A per-transcriptome normalization was implemented using the
243 total library following the removal of spike-in count data to compute normalization factors.
244 A per-biomass normalization used only spike-in count data to compute normalization
245 factors. A per-cell normalization used the spike-in count data following in silico adjustment bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 11
246 of tetraploid spike-in abundance using the difference of diploid versus tetraploid cell
247 density (65%; see methods on quantifying cell size and density and Results). Following
248 each normalization, using the plotMDS function, a multidimensional scaling plot was
249 generated from 500 loci exhibiting the highest variation in expression level among samples.
250 Differences in transcriptome size were approximated using the sum of normalized read
251 counts per cell; however, Limma-Voom normalizations use log-counts, which are not
252 applicable to straight summation, so we used the DEseq package (Anders and Huber, 2010)
253 to compute per-cell normalized counts for this purpose only. Next, a differential expression
254 analysis was run using the normalization approaches above described, with loci identified
255 as differentially expressed (DE) using a 0.05 p-value, 0.05 false discovery rate, and cutoff of
256 1 log-fold change (logFC). Differentially expressed loci were binned both broadly across the
257 three normalization methods and more finely to characterize interplay between the three
258 normalization results using gplots (Warnes et al., 2009). Gene ontology (GO) terms were
259 extracted from the Trinotate annotations of the mapping reference. Each of the fine-scale
260 bins of DEGs was tested for functional enrichment using GOSeq (Young et al., 2010).
261
262 Results
263 Quantifying cell size and density -- The mean diploid and tetraploid 1C DNA
264 concentrations per leaf punch extraction were 1.32591 +/- 0.09439 ug/ml and 0.76432 +/-
265 0.0619 ug/ml, respectively; these differed significantly (p < 0.0001) (Figure 3A). The mean
266 tetraploid 1C DNA per punch was 57.6% of the diploid value. Following tissue digestion, the
267 estimated values of mean number of cells per leaf punch in diploids and tetraploids were
268 412,146 +/- 38,002 and 268,958 +/- 20,522, respectively. These values differed significantly
269 (p = 0.0129). The tetraploids on average had 65.3% as many cells per leaf punch as found in
270 the diploids (Figure 3B). bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 12
271 Our two methods for characterizing differences in cell density both revealed similar
272 reductions in tetraploid cell density; tetraploids possessed ~58% and 65% of the diploid
273 density, based on the DNA/RNA co-extraction and cell-count results, respectively. We
274 elected to normalize our per-cell analysis by applying a 0.65 factor to the per-cell
275 normalization factor of the tetraploid samples to reflect the more conservative estimate of
276 differences in cell density; however, both estimates of cell density reveal the same the major
277 conclusions of this study (the alternative cell density difference can be run using the
278 supplemental ipython notebook – Appendix 1).
279 Differential expression analysis—We obtained an average of 25 million reads per sample
280 after cleaning low-quality reads (see appendix 2 for total reads per sample). The Tolmiea
281 reference transcriptome assembly resulted in 58,046 isoforms binned within 28,467
282 clusters (henceforth genes) with an N50 of 1,821 bp. After read mapping, 26,816 genes had
283 at least 5 non-zero counts and were used for downstream differential expression analyses.
284 In all, 15,205 genes were annotated according to GO using Trinotate.
285 A multidimensional scaling plot of the 500 loci with the highest variation in expression
286 level revealed that all three normalization methods (per transcriptome, per cell, and per
287 biomass) performed well at clustering members from the same population with one
288 another (Figure 4A-C). After summing the read counts normalized per cell for each sample,
289 we found that the mean transcription per cell (henceforth transcriptome size) for the
290 tetraploid was 2.1 times higher than the mean for the diploid (Figure 5). Additionally, the
291 total transcriptome size per cell was highly variable, more so in the tetraploids than the
292 diploids (20,815,579 and 11,799,791 normalized counts, respectively -- see appendix 1 for
293 standard deviation within each population). Across the three normalization methods, the
294 differential expression analysis found the tetraploid relative to the diploid had 1,559 up-
295 and 1,071 down-regulated genes per transcriptome, 1,440 up- and 1,550 down-regulated bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 13
296 genes per biomass, and 3,005 up- and 751 down-regulated genes per cell (Figure 4D-F).
297 Across the three different normalization methods, we found 4,555 unique loci were DE
298 under one or more methods (Figure 6). Finer binning of the interactions between
299 normalization methods revealed the majority of DEGs to be either up-regulated in the
300 tetraploid across all normalizations (1,392 – Figure 7A) or only up-regulated per cell (1,398
301 Figure 7D).
302
303 Discussion
304 Differences in gene expression following autopolyploidy: This is the first study to
305 leverage synthetic RNA standards to characterize differences in gene expression level per
306 transcriptome, per biomass, and on a relative per-cell basis between ploidal levels. Through
307 the use of this ‘three normalization approach’ and characterization of gene expression levels
308 in diploid and autotetraploid Tolmiea, we found 4,555 out of 26,816 loci were DE (~17% of
309 the transcriptome) and identified four notable trends. First, the per-transcriptome
310 normalization, the method researchers typically use, captured the fewest DEGs and failed to
311 detect any DEGs not found by the other two methods. Second, most differential expression
312 occurs on a per-cell basis, and there is a clear unbalanced distribution of up- versus down-
313 regulation in the autotetraploid relative to the diploid, with 3,005 up- versus 751 down-
314 regulated DEGs per cell. Third, in the tetraploid, transcripts up-regulated per cell appear to
315 compensate for a decreased cell density, resulting in a conservation of expression level per
316 biomass relative to the diploid (see Figure 8). Loci exhibiting conservation per biomass
317 were significantly enriched for functions related to photosynthesis and the chloroplast.
318 Fourth, transcriptome size varied substantially across our dataset, with a significant
319 increase in the inferred transcriptome size of the tetraploid relative to the diploid (Figure
320 5). Below we discuss each of these four trends in greater detail. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 14
321 Autopolyploids have rarely been compared to their diploid progenitors with respect to
322 divergence in expression level (e.g., Stupar et al., 2007; Del Pozo and Ramirez-Parra, 2014;
323 Zhang et al., 2014). These previous diploid-autopolyploid comparisons found that
324 autopolyploids tend to deviate from diploid-like gene expression levels across 1-10% of the
325 transcriptome (~6% in Paulownia fortunei – Zhang et al., 2014; ~10% in Solanum phureja –
326 Stupar et al., 2007; ~1-4% in Arabidopsis – Del Pozo and Ramirez-Parra, 2014). However,
327 none of these comparisons was normalized using spike-in standards; instead they used
328 what is referred to here as a per-transcriptome comparison. Unlike the spike-in-derived
329 per-biomass and per-cell normalized transcript counts, the results of per-transcriptome
330 normalizations reflect concentration differences and are not a proxy for absolute
331 expression. Therefore, the results of previous autopolyploid expression studies must be
332 interpreted as differences in transcript concentration rather than absolute abundance.
333 Spike-in standards have rarely been used in any evolutionary comparisons and have
334 primarily been adopted for use in studies of model fish (zebrafish), mammals (human,
335 mouse, and hamster), and plants (Arabidopsis) (e.g., Brennecke et al., 2013; Fomina-yadlin
336 et al., 2014; Xu et al., 2014; Schall et al., 2017). Despite the use of spike-ins in these
337 investigations of model systems, in methods development, and in single-cell sequencing,
338 spike-in approaches have not yet been applied to evolutionarily motivated studies of
339 natural populations. The lack of spike-in usage in non-model studies is surprising, because
340 a reference genome is not a requirement for a spike-in-based normalization. In fact, spike-
341 ins may be more important in non-model versus model systems because comparisons
342 within non-model systems are often broad (e.g., interspecific comparisons) and may be
343 more likely to result in unequal transcriptome size between treatment groups.
344 Considering only the results of our per-transcriptome normalization, approximately 9%
345 of the Tolmiea transcriptome was differentially expressed on a concentration basis between bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 15
346 the diploid and autotetraploid species, in line with the results for other diploid--
347 autopolyploid pairs cited above. This finding suggests that, in general, only a small fraction
348 of the transcriptome, less than 10%, responds to autopolyploidization through novel
349 alterations to transcript concentration. Despite differences in transcript abundance per
350 transcriptome representing less than 10% of all loci, concentration-based differences could
351 have important consequences for the stoichiometry of gene expression pathways.
352 Unfortunately, although Tolmiea is a good evolutionary model, it is not a genetic model, and
353 the ability to investigate specific pathways and make inferences regarding physiological
354 impact is severely hampered. However, it would be important for future studies to test
355 whether members of a given pathway respond similarly with respect to the maintenance of
356 transcript abundances relative to the transcriptome, cell, or biomass.
357 It is also notable that although ~9% of the loci examined in Tolmiea were differentially
358 expressed per transcriptome, none of these DEGs were uniquely recovered only from the
359 per-transcriptome analysis. The majority of per-transcriptome DEGs (2,106 of the 2,630)
360 were differentially expressed at a sufficiently high magnitude to be detected by all three
361 normalization methods. These results suggest that the loci typically identified as DE in
362 previous studies of polyploid gene expression represent only differences in expression level
363 extreme enough to be detected through a significantly altered concentration. Therefore,
364 examination of more subtle differences in expression level requires the use of normalization
365 approaches that allow for quantitative comparisons of transcript abundance.
366 When we compared expression level per cell, we found that ~14% (3,756 loci) of the
367 transcriptome was maintained in different abundances per cell between diploid and
368 tetraploid Tolmiea. Importantly, the direction of differences in expression level in T.
369 menziesii relative to the diploid was extremely unbalanced. Nearly all per-cell DEGs were
370 up-regulated in T. menziesii (3,005 loci), with only 751 down-regulated. In addition, 1,398 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 16
371 of the 3,005 per-cell DEGs up-regulated in the tetraploid were unique to the per-cell
372 normalization and not recovered under either per-transcriptome or per-biomass
373 normalizations (Figure 7D). Taken together, it appears that most differential expression in
374 Tolmiea represents a pattern consistent with dosage sensitivity, with increased levels of
375 gene expression in the tetraploid correlating with an increase in allele copy number. This
376 result would not have been detected under a typical library-size normalization method, as
377 the expression of these loci is not significantly altered in concentration relative to the
378 whole.
379 The over-abundance of up-regulation per cell in the tetraploid may serve as a
380 mechanism to compensate for reduced cell density. In other words, there is tendency in T.
381 menziesii for conserved gene expression levels per biomass through novelty at the cellular
382 level (henceforth per-biomass conservation) (Figure 8). Approximately 1,398 loci, or 5.2%
383 of all loci, in T. menziesii exhibit per-biomass conservation (Figure 7D). This buffering effect
384 is achieved by per-cell up-regulation in T. menziesii mirroring the decrease in cell density
385 relative to T. diplomenziesii. Fifteen functional categories were significantly over-
386 represented among the loci exhibiting per-biomass conservation. Of these, seven were
387 related to either the chloroplast or photosynthesis. Whether there is selective pressure to
388 conserve expression per biomass is unclear, but alteration of photosynthesis either per cell
389 or per biomass appears to be a recurring theme across diploid/polyploid comparisons (e.g.,
390 Warner and Edwards, 1993; Vyas et al., 2007; Coate et al., 2013). For example, Warner and
391 Edwards (1993) revealed photosynthetic conservation per biomass in M. sativa, but an
392 overall increase in photosynthesis per biomass following polyploidy in A. confertifolia. A
393 clear trend in the effects of polyploidization on photosynthesis per biomass has yet to
394 emerge, and like many aspects of polyploidy, it may be lineage-specific and/or require
395 additional study systems (Soltis et al., 2016). bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 17
396 Diploid and tetraploid Tolmiea occur under similar light regimes in nature (Visger et al.,
397 2016), and previously collected physiological data revealed no significant difference in
398 photosynthetic rate per leaf area (Visger et al., 2016) (Figure 9). In Tolmiea, conservation of
399 expression level per biomass may be a mechanism for the maintenance of optimal
400 photosynthesis per biomass, facilitating the ecological conservation of light preference in
401 Tolmiea. To determine if photosynthesis-related functional enrichment of conservation per
402 biomass is indeed a key underlying molecular mechanism for buffering cell density as it
403 pertains to photosynthesis, additional autopolyploid systems should be similarly studied.
404 Revisiting the work of Warner and Edwards (1993) using our spike-in standard-based gene
405 expression approach should also show a similar conservation of gene expression per
406 biomass in polyploid M. sativa. Conversely, in A. confertifolia where polyploidy increases
407 photosynthesis per leaf area, we might expect photosynthesis-related gene expression per
408 cell to be increased by a factor greater than the difference in cell density between diploid
409 and polyploid plants (Warner and Edwards, 1993).
410 An initial motivation for utilizing a spike-in approach to read-count normalization was
411 to tease apart variation in library size from differences in transcriptome size between
412 diploid and tetraploid Tolmiea. We found that when normalizing read counts per cell, the
413 mean transcriptome size of the tetraploid is over twice that of the diploid (Figure 5). This
414 difference in transcriptome size should be qualified, as the variation within and between
415 populations is quite large, though the mean transcriptome size still significantly differed
416 with ploidal level. Notably, we also observed that the variability of transcriptome size was
417 greater in the tetraploids. The diploid populations all exhibited a similar degree of variation
418 in transcriptome size, while tetraploid populations represented both the least and greatest
419 variation in transcriptome size (Figure 5; see Appendix 1 for standard deviation by
420 population). Excluding all other results presented in this study, the variability of bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 18
421 transcriptome size alone should be motivation enough for researchers of polyploidy to
422 adopt a spike-in-based approach.
423
424 The benefits of spike-in RNA standards: This study is not the first to apply synthetic
425 RNA spike-in standards in dealing with a transcriptome-wide effect (e.g., see Lovén et al.,
426 2012; Fomina-yadlin et al., 2014). However, this is the first study to: 1) leverage spike-in
427 standards in a cross-ploidy comparison, and 2) quantify expression level on three different,
428 biologically relevant scales. By using multiple read-count normalizations, with and without
429 spike-in standards, we investigated the interaction of expression level per cell and per
430 biomass between diploid and autotetraploid Tolmiea. Had this study been performed
431 without the use of spike-in standards, the results would have been limited to those of the
432 per-transcriptome normalization.
433 Spike-in normalization is most valuable when evaluating differences in gene expression
434 levels between two groups that have differing cell density and/or transcriptome size. By
435 normalizing RNAseq count data by the abundance of spike-in standards, sequencing depth
436 and transcriptome size are effectively disentangled. Methods employed by previous
437 comparative studies of diploid/polyploid pairs were limited to quantifying transcriptional
438 differences on a concentration-basis only. Concentration-based comparisons, while useful
439 for inferring alterations of pathway stoichiometry, are effectively blind to large proportions
440 of the transcriptome exhibiting additive expression levels associated with polyploidy. That
441 is, if the expression of many genes is increased in a single direction and is commensurate
442 with the increase in ploidal level, then the impact on any single gene’s concentration will be
443 minimal. An additional advantage of employing a spike-in normalization is that transcript
444 abundance can be independently quantified relative to biomass and relative to cell. Spike-in bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 19
445 reads can also be removed for some downstream analyses, allowing for a typical library-size
446 normalization so that concentration-based differences may be characterized as well.
447 The three normalization approaches presented here are all individually informative,
448 and the decision to include any or all of them should be guided by the research question.
449 For example, if the research question revolves around the bulk production of a compound,
450 evaluating differences in expression level per biomass may be the best approach. Research
451 questions focusing on complex gene pathways may be better served by an analysis of
452 expression level per cell. Information on potential differences in expression level
453 stoichiometry can be gained using traditional comparisons per transcriptome. Additionally,
454 as demonstrated by our study of Tolmiea, the interaction among multiple normalization
455 approaches can be even more informative than any single approach.
456 While the use of non-concentration-based normalizations can enable researchers to
457 address new questions, there is an important caveat that could lead to potential biases or
458 increased uncertainty. Differences in RNA extraction efficiency between treatment groups
459 are difficult to tease apart from variation in total transcriptome size. In much the same way
460 that variation in transcriptome size influences estimates of expression level, differences in
461 extraction efficiency could bias calculations of expression level for per-cell and per-biomass
462 analyses. For example, if RNA extraction were half as efficient in one treatment group
463 versus another, then differential expression analyses per cell would only consider half of the
464 actual transcript abundance of one group relative to the other. Future approaches should
465 consider partially accounting for this issue through the addition of a second unique set of
466 spike-ins prior to RNA extraction. Comparing the ratio of pre- versus post-extraction spike-
467 in abundance should highlight differences in extraction efficiency. However, even the use of
468 a second spike-in set will not account for different extraction efficiencies if those differences
469 arise from variation in cell lysis. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 20
470 In summary, this study has demonstrated that the use of synthetic RNA spike-in
471 standards can be used to explore previously uninvestigated aspects of divergence in gene
472 expression levels in a comparison of a diploid and its autotetraploid derivative. To our
473 knowledge, this multiple normalization approach has recovered the largest fraction of a
474 transcriptome as DE in a diploid/autopolyploid species pair ever reported (~17% in
475 Tolmiea vs. up to ~10% in several other plant systems; Stupar et al., 2007; Del Pozo and
476 Ramirez-Parra, 2014; Zhang et al., 2014). Further, our methodology allowed for a fine-scale
477 examination of how divergence in gene expression level interacts with cell density,
478 revealing a mosaic of difference in transcript concentration, abundance per cell, and
479 abundance across tissue.
480 While we compared a diploid-autopolyploid pair, in which a difference in transcriptome
481 size might be predicted, variation in transcriptome size is rarely investigated and could be
482 widespread in biological systems at diverse scales due to factors other than shifts in ploidy.
483 To date, nearly all studies of global gene expression have used normalization methods that
484 implicitly assume transcriptome size is invariable, yet this assumption is not empirically
485 supported. Examples of studies where variation in transcriptome size might be likely
486 include (but are by no means limited to) comparisons between related species, between
487 different developmental stages, and across stress treatments. Yet even in these cases, the
488 potential for variation in transcriptome size remains ignored and uninvestigated. If
489 transcriptome size is in fact invariable between two experimental treatment groups, then
490 following our proposed methodology, the results of the per-transcriptome and per-cell
491 comparisons should be identical. It is of critical importance that researchers making
492 comparisons using RNAseq data, particularly in the broad suite of examples noted above,
493 avoid making the assumption that transcriptome size is invariable by contrasting multiple
494 normalization approaches. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 21
495 Acknowledgements:
496 We are thankful to J.E. Coate for insightful discussions on normalization. Clayton J. Visger was
497 supported by NSF Graduate Research Fellowship DGE- 1315138. This research was supported in
498 part by an NSF Doctoral Dissertation Improvement Grant DEB-1501803 and by the 1KP project.
499
500 Author’s contributions:
501 C.J.V., G.K.W., P.S.S., and D.E.S. planned and designed the research. C.J.V. and Y.Z. performed
502 the experiments. C.J.V. analyzed the data. C.J.V., P.S.S., and D.E.S. wrote the manuscript.
503
504 References: 505 Adams KL, Cronn R, Percifield R, Wendel JF. 2003. Genes duplicated by polyploidy show 506 unequal contributions to the transcriptome and organ-specific reciprocal silencing. 507 Proceedings of the National Academy of Sciences of the United States of America 100: 508 4649–4654. 509 Adams KL, Wendel JF. 2013. Dynamics of Duplicated Gene Expression in Polyploid Cotton. 510 Polyploid and Hybrid Genomics. John Wiley and Sons, Inc., 187–194. 511 Ainouche M, Chelaifa H, Ferreira J, Bellot S, Ainouche A, Salmon A. 2012. Polyploid 512 Evolution in Spartina: Dealing with Highly Redundant Hybrid Genomes. In: Soltis PS,, 513 In: Soltis DE, eds. Polyploidy and Genome Evolution. Berlin, Heidelberg: Springer Berlin 514 Heidelberg, 225–243. 515 Anders S, Huber W. 2010. Differential expression analysis for sequence count data. 516 Genome Biology 11: R106. 517 Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, 518 Elespuru R, Fero M, et al. 2005. The External RNA Controls Consortium: a progress 519 report. Nature methods 2: 731–734. 520 Barker MS, Husband BC, Pires JC. 2016. Spreading Winge and flying high: The 521 evolutionary importance of polyploidy after a century of study. American Journal of 522 Botany 103: 1139–1145. 523 Brennecke P, Anders S, Kim JK, Kołodziejczyk AA, Zhang X, Proserpio V, Baying B, 524 Benes V, Teichmann SA, Marioni JC, et al. 2013. Accounting for technical noise in 525 single-cell RNA-seq experiments. Nature Methods 10: 1093–1095. 526 Brown R, Rickless P. 1949. A new method for the study of cell division and cell extension 527 with some preliminary observations on the effect of temperature and of nutrients. 528 Proceedings of the Royal Society of London. Series B, Biological Sciences 136: 110–125. 529 Buggs RJA. 2012. Monkeying around with ploidy. Molecular Ecology 21: 5159–5161. 530 Buggs RJA, Wendel JF, Doyle JJ, Soltis DE, Soltis PS, Coate JE. 2014. The legacy of diploid 531 progenitors in allopolyploid gene expression patterns. Philosophical Transactions of the 532 Royal Society of London. Series B, Biological Sciences 369: 20130354. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 22
533 Cañestro C. 2012. Two rounds of whole genome duplication: Evidence and impact on the 534 evolution of vertebrate innovations. In: Soltis PS, Soltis DE, eds. Polyploidy and Genome 535 Evolution. Springer, New York. 536 Chelaifa H, Monnier A, Ainouche M. 2010. Transcriptomic changes following recent 537 natural hybridization and allopolyploidy in the salt marsh species Spartina x townsendii 538 and Spartina anglica (Poaceae). The New Phytologist 186: 161–174. 539 Chen JZ, Birchler JA. 2013. Polyploid and Hybrid Genomics. New York: Wiley-Blackwell. 540 Coate JE, Doyle JJ. 2010. Quantifying whole transcriptome size, a prerequisite for 541 understanding transcriptome evolution across species: An example from a plant 542 allopolyploid. Genome Biology and Evolution 2: 534–546. 543 Coate JE, Doyle JJ. 2015. Variation in transcriptome size: Are we getting the message? 544 Chromosoma 124: 27–43. 545 Coate JE, Powell AF, Owens TG, Doyle JJ. 2013. Transgressive physiological and 546 transcriptomic responses to light stress in allopolyploid Glycine dolichocarpa 547 (Leguminosae). Heredity 110: 160–170. 548 Del Pozo JC, Ramirez-Parra E. 2014. Deciphering the molecular bases for drought 549 tolerance in Arabidopsis autotetraploids. Plant, Cell and Environment: 2722–2737. 550 Dong S, Adams KL. 2011. Differential contributions to the transcriptome of duplicated 551 genes in response to abiotic stresses in natural and synthetic polyploids. The New 552 Phytologist 190: 1045–1057. 553 Doyle JJ. 2012. Polyploidy in legumes. In: Soltis PS, Soltis DE, eds. Polyploidy and Genome 554 Evolution. Springer, New York. 555 Doyle JJ, Doyle JL. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf 556 tissue. Phytochemistry Bulletin: 11–15. 557 Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, Soltis PS, Wendel JF. 2008. 558 Evolutionary genetics of genome merger and doubling in plants. Annual Review of 559 Genetics 42: 443–461. 560 External RNA Controls Consortium. 2005. Proposed methods for testing and selecting the 561 ERCC external RNA controls. BMC Genomics 6: 150. 562 Fomina-yadlin D, Du Z, Mcgrew JT. 2014. Gene expression measurements normalized to 563 cell number reveal large scale differences due to cell size changes, transcriptional 564 amplification and transcriptional repression in CHO cells. Journal of Biotechnology 189: 565 58–69. 566 Gaeta RT, Pires JC, Iniguez-Luy F, Leon E, Osborn TC. 2007. Genomic changes in 567 resynthesized Brassica napus and their effect on gene expression and phenotype. The 568 Plant Cell 19: 3403–3417. 569 Germain P-L, Vitriolo A, Adamo A, Laise P, Das V, Testa G. 2016. RNAontheBENCH: 570 computational and empirical resources for benchmarking RNAseq quantification and 571 differential expression methods. Nucleic Acids Research 44: 5054–5067. 572 Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, 573 Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA- 574 Seq data without a reference genome. Nature Biotechnology 29: 644–652. 575 Greilhuber J, Dolezel J, Wendel J, Leitch IJ. 2012. Plant Genome Diversity, Volume 2: 576 Phsyical Structure, Behaviour and Evolution of Plant Genomes. Springer Vienna. 577 Gu J, Wang X, Halakivi-Clarke L, Clarke R, Xuan J. 2014. BADGE: A novel Bayesian model 578 for accurate abundance quantification and differential analysis of RNA-Seq data. BMC 579 Bioinformatics 15: S6. 580 Ilut DC, Coate JE, Luciano AK, Owens TG, May GD, Farmer A, Doyle JJ. 2012. A 581 comparative transcriptomic study of an allotetraploid and its diploid progenitors 582 illustrates the unique advantages and challenges of RNA-Seq in plant species. American bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 23
583 Journal of Botany 99: 383–396. 584 Jiao Y, Paterson AH. 2014. Polyploidy-associated genome modifications during land plant 585 evolution. Philosophical transactions of the Royal Society of London. Series B, Biological 586 Sciences 369. 587 Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho 588 LP, Hu Y, Liang H, Soltis PS, et al. 2011. Ancestral polyploidy in seed plants and 589 angiosperms. Nature 473: 97–100. 590 Jordon-Thaden IE, Chanderbali AS, Gitzendanner MA, Soltis DE. 2015. Modified CTAB 591 and TRIzol protocols improve RNA extraction from chemically complex Embryophyta. 592 Applications in Plant Sciences 3. 593 Joshi NA, Fass JN. 2011. Sickle: A sliding-window, adaptive, quality-based trimming tool for 594 FastQ files (Version 1.33)[Software]. 595 Krishnaswami SR, Grindberg R V, Novotny M, Venepally P, Lacar B, Bhutani K, Linker 596 SB, Pham S, Erwin JA, Miller JA, et al. 2016. Using single nuclei for RNA-seq to capture 597 the transcriptome of postmortem neurons. Nature Protocols 11: 499–524. 598 Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature 599 Methods 9: 357–359. 600 Liu B, Brubaker CL, Mergeai G, Cronn RC, Wendel JF. 2001. Polyploid formation in cotton 601 is not accompanied by rapid genomic changes. Genome 44: 321–330. 602 Liu SJ, Nowakowski TJ, Pollen AA, Lui JH, Horlbeck MA, Attenello FJ, He D, Weissman 603 JS, Kriegstein AR, Diaz AA, et al. 2016. Single-cell analysis of long non-coding RNAs in 604 the developing human neocortex. Genome Biology 17: 67. 605 Lovén J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee TI, Young 606 RA. 2012. Revisiting global gene expression analysis. Cell 151: 476–482. 607 Madlung a, Wendel JF. 2013. Genetic and epigenetic aspects of polyploid evolution in 608 plants. Cytogenetic and Genome Research 140: 270–285. 609 Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing 610 reads. EMBnet.journal 17: 10. 611 Masterson J. 1994. Stomatal size in fossil plants: evidence for polyploidy in majority of 612 angiosperms. Science 264: 421–424. 613 Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying 614 mammalian transcriptomes by RNA-Seq. Nature methods 5: 621–628. 615 Nie Z, Hu G, Wei G, Cui K, Yamane A, Resch W, Wang R, Green DR, Tessarollo L, 616 Casellas R, et al. 2012. c-Myc is a universal amplifier of expressed genes in 617 lymphocytes and embryonic stem cells. Cell 151: 68–79. 618 Pine SP, Munro SA, Parsons JR, Mcdaniel J, Bergstrom Lucas A, Lozach J, Myers TG, Su 619 Q, Jacobs-Helber SM, Salit M. 2016. Evaluation of the External RNA Controls 620 Consortium (ERCC) reference material using a modified Latin square design. BMC 621 Biotechnology 16. 622 Rambani A, Page JT, Udall JA. 2014. Polyploidy and the petal transcriptome of Gossypium. 623 BMC Plant Biology 14: 3. 624 Renny-Byfield S, Wendel JF. 2014. Doubling down on genomes: Polyploidy and crop 625 plants. American Journal of Botany 101: 1–15. 626 Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. 2015. limma powers 627 differential expression analyses for RNA-sequencing and microarray studies. Nucleic 628 Acids Research 43: e47. 629 Roberts A. 2013. Thesis: Ambiguous fragment assignment for high-throughput sequencing 630 experiments. EECS Department, University of California, Berkeley. 631 Salmon A, Flagel L, Ying B, Udall JA, Wendel JF. 2010. Homoeologous nonreciprocal 632 recombination in polyploid cotton. New Phytologist 186: 123–134. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 24
633 Schall KA, Thornton ME, Isani M, Holoyda KA, Hou X, Lien C-L, Grubbs BH, Grikscheit 634 TC. 2017. Short bowel syndrome results in increased gene expression associated with 635 proliferation, inflammation, bile acid synthesis and immune system activation: RNA 636 sequencing a zebrafish SBS model. BMC Genomics 18: 23. 637 Shi X, Ng DW-K, Zhang C, Comai L, Ye W, Chen ZJ. 2012. Cis- and trans-regulatory 638 divergence between progenitor species determines gene-expression novelty in 639 Arabidopsis allopolyploids. Nature Communications 3: 950. 640 Soltis DE, Soltis PS. 1988. Electrophoretic evidence for tetrasomic segregation in Tolmiea 641 menziesii (Saxifragaceae). Heredity 60: 375–382. 642 Soltis PS, Soltis DE. 2009. The role of hybridization in plant speciation. Annual review of 643 plant biology 60: 561–88. 644 Soltis DE, Soltis PS. 2012. Polyploidy and Genome Evolution. Berlin, Heidelberg: Springer- 645 Verlag. 646 Soltis D, Soltis P, Ranker T, Ness B. 1989. Chloroplast DNA variation in a wild plant, 647 Tolmiea menziesii. Genetics 121: 819–826. 648 Soltis DE, Visger CJ, Marchant DB, Soltis PS. 2016. Polyploidy: Pitfalls and paths to a 649 paradigm. American Journal of Botany 103: 1–21. 650 Soltis DE, Visger CJ, Soltis PS. 2014. The polyploidy revolution then...and now: Stebbins 651 revisited. American Journal of Botany 101: 1057–1078. 652 Spoelhof JP, Soltis PS, Soltis DE. (Accepted). Pure polyploidy: Closing the gaps in 653 autopolyploid research. Journal of Systematics and Evolution. 654 Stebbins GL. 1950. Variation and Evolution in Plants. Columbia University Press. 655 Stebbins Jr. GL. 1971. Chromosomal Evolution in Higher Plants. Addison-Wesley. 656 Stupar RM, Hermanson PJ, Springer NM. 2007. Nonadditive expression and parent-of- 657 origin effects identified by microarray and allele-specific expression profiling of maize 658 endosperm. Plant Physiology 145: 411–425. 659 Van de Peer Y. 2011. A mystery unveiled. Genome Biology 12: 113. 660 Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K. 2009. The flowering world: 661 A tale of duplications. Trends in Plant Science 14: 680–688. 662 Visger CJ, Germain-Aubrey CC, Patel M, Sessa EB, Soltis PS, Soltis DE. 2016. Niche 663 divergence between diploid and autotetraploid Tolmiea. American Journal of Botany 664 103: 1396–1406. 665 Vyas P, Bisht MS, Miyazawa S-I, Yano S, Noguchi K, Terashima I, Funayama-Noguchi S. 666 2007. Effects of polyploidy on photosynthetic properties and anatomy in leaves of Phlox 667 drummondii. Functional Plant Biology 34: 673. 668 Wagner W. 1970. Biosystematics and Evolutionary Noise. Taxon 19: 146–151. 669 Warner DA, Edwards GE. 1989. Effects of polyploidy on photosynthetic rates, 670 photosynthetic enzymes, contents of DNA, chlorophyll, and sizes and numbers of 671 photosynthetic cells in the C(4) dicot Atriplex confertifolia. Plant Physiology 91: 1143– 672 1151. 673 Warner DA, Edwards GE. 1993. Effects of polyploidy on photosynthesis. Photosynthesis 674 Research 35: 135–147. 675 Warnes GR, Bolker B, Bonebakker L, Gentleman R., Huber W, Liaw A, Lumley T, 676 Maechler M, Magnusson A, Moeller S, Schwartz M. 2009. gplots: Various R 677 programming tools for plotting data. R Package Version, 2(4). 678 Wendel JF. 2015. The wondrous cycles of polyploidy in plants. American Journal of Botany 679 102: 1753–1756. 680 Xu J, Su Z, Hong H, Thierry-Mieg J, Thierry-Mieg D, Kreil DP, Mason CE, Tong W, Shi L. 681 2014. Cross-platform ultradeep transcriptomic profiling of human reference RNA 682 samples by RNA-Seq. Scientific Data 1: 140020. bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 25
683 Yoo M-J, Liu X, Pires JC, Soltis PS, Soltis DE. 2014. Nonadditive gene expression in 684 polyploids. Annual Review of Genetics 48: 485–517. 685 Young MD, Wakefield MJ, Smyth GK, Oshlack A, Fu X, Fu N, Guo S, Yan Z, Xu Y, Hu H, et 686 al. 2010. Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome 687 Biology 11: R14. 688 Zhang X, Deng M, Fan G. 2014. Differential transcriptome analysis between Paulownia 689 fortunei and its synthesized autopolyploid. International Journal of Molecular Sciences 690 15: 5079–5093. 691 692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 26
713 Figures:
"diploid" "polyploid"
Transcript of interest
Non-target transcript
Spike-in standard
Per-transcriptome normalization: A DE analysis tests whether the transcript of interest is per per maintained in a different abundence relative to the total = transcriptome. Per-biomass normalization: B DE analysis tests whether the transcript of interest is per per maintained at different levels within a given biomass <
Per-cell normalization: DE analysis tests whether a given transcript is per per C maintained at different levels within a cell ÷ < ÷
714 715 Figure 1: A simplified example of how spike-in standards can be used during read
716 normalization to enable comparisons of expression level at different biological scales
717 between a hypothetical diploid-polyploid pair with differing cell density. The large circles
718 represent a unit of biomass and contain a number of cells (green squares). Beneath each
719 circle is a depiction of how the read normalizations are calculated. Under a per
720 transcriptome normalization, the ratio of target transcripts to the total transcriptome is
721 compared. While the per biomass normalization uses the ratio of the transcript of interest
722 to the spike-in transcripts. The per cell normalization also uses the ratio of the transcript of
723 interest to spike-in transcripts, but scales the spike-in transcript abundance by cell density,
724 represented here by multiplying the spike-in abundance by the number of contributing bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 27
725 cells. Whether the transcript of interest would be found as not differentially expressed or
726 higher/lower expressed in the polyploid under each normalization is indicted using ‘=’, ‘<’,
727 or ‘>’ respectively.
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 28
745
746 Figure 2: Generalized distributions of Tolmiea menziesii and T. diplomenziesii. The
747 population sources for plants used in this study are represented as red triangles (T.
748 diplomenziesii) and blue squares (T. menziesii).
749 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 29
A h B 6 8e+05 1. 4 1. 6e+05 2 1. 0 1. 4e+05 1C DNA ug/ml 8 0. 6 2e+05 0. Estimated cells per 2-cm diameter leaf punc
T. diplomenziesii T. menziesii T. diplomenziesii T. menziesii
750 751 Figure 3: Results of ploidy variation in leaf cell density using, A: 1C DNA concentration
752 following a DNA/RNA co-extraction, and B: estimated cell counts per 2cm diameter leaf
753 punch.
754
755
756
757
758
759
760
761
762
763
764
765
766
767 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 30
Per transcriptome Per biomass Per cell 3 3 A 3 B C 2 2 2 1 1 1 2 0 0 0 1 - 1 Leading logFC dim - 1 - 2 - 2 - 2 - 3 - 3 - 3 -
-4 -2 0 2 -4 -2 0 2 -4 -2 0 2 Leading logFC dim 1 Leading logFC dim 1 Leading logFC dim 1 Log Fold Change
Average Log Expression Average Log Expression Average Log Expression 768 769 Figure 4: Results from multiple differential expression analysis. MDS plots A-C cluster
770 individual based on the 500 most variable loci, with color indicating ploidal level and shape
771 reflecting population of origin. MA plots D-F show every locus in the Tolmiea transcriptome
772 (represented as dots), with log fold expression level differences in the polyploid relative to
773 the diploid on the y-axis and average expression level on the x-axis-- red indicates statistical
774 significance.
775
776
777
778
779
780
781
782
783
784 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 31
1e7 Total reads normalized per Cell 8 Tolmiea diplomenziesii 7 Tolmiea menziesii
6
5
4
3
2 Sum of normalized counts
1
0 Population 785 786 Figure 5: Sum of read counts normalized per cell, and clustered by population of origin.
787 Diploid and tetraploid mean significantly differed (p < 0.008).
788
789
790
791
792
793
794
795
796
797 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 32
Differentially expressed genes by normalization method
Per Cell Per Biomass 85 1,398 442
2,106 167 357
Per Transcriptome 0
n =4,555 798 799 Figure 6: Venn diagram contrasting the three normalization methods. Numbers within the
800 different sections indicate loci that were identified as being differentially expressed
801 between Tolmiea menziesii and T. diplomenziesii.
802
803
804
805
806
807
808
809
810
811
812 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 33
s
ranscriptome T Per Per Cell Per Biomas
Up-regulated in 4x
Not DE Down-regulated in 4x A B C D Number/Proportion of loci: 1,392/5.19%: None 167/0.623%: None 48/0.179%: None 1,398/5.21%: Photosystem I, Photosynthesis, Signifcantly enriched GO categories light harvesting, Chlorophyll binding, Protein- chromophore linkage, Photosystem II, Auxin polar transport, Protein xylosyltransferase activity, Chloroplast thylakoid membrane, Anthocyanin accumulation in tissues, Auxin- s activated signaling pathway, Lateral root formation, Positive gravitropism, Auxin infux ranscriptome transmembrane transporter, Chondroitin T sulfate biosynthetic, Chloroplast thylakoid Per Per Cell Per Biomas
Up-regulated in 4x E F G H
Not DE
Down-regulated in 4x
Number/Proportion of loci: 714/2.66%: None 37/0.138%: None 442/1.65%: None 357/1.33%: Drug transmembrane transport Signifcantly enriched GO categories activity, Hydrogen ion transmembrane transport, Proanthocyanidin biosynthetic process, Solute: proton antiporter activity, Maintenance of seed dormancy
813
814 Figure 7: Loci binned by their DE categorization across the three normalization
815 approaches. The number of loci belonging to each bin and the results of GO enrichment
816 analyses are reported below the corresponding bin. Bins containing no loci are not shown.
817
818
819
820
821
822
823 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 34 "diploid" "polyploid" Leaf punch Cell Transcript
Conservation of gene expression level per biomass (per biomass conservation)
per per 10x per 10x per
824 825 Figure 8: A simplified example of an interaction between expression level and cell density.
826 =Conservation of gene expression per biomass occurs when expression level per cell in
827 samples with lower cell density is up-regulated enough to yield equivalent levels of
828 transcript per unit biomass.
829
830
831
832
833
834
835
836 bioRxiv preprint doi: https://doi.org/10.1101/169367; this version posted July 28, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 35
Photosynthetic rate 10 8 1 6 µmol CO2 m-2 s- 4 2
Tolmiea diplomenziesii Tolmiea menziesii
837 838 Figure 9: Adapted from data collected from Visger et al. 2016. Tolmiea diplomenziesii and
839 T. menziesii did not significantly differ in photosynthetic output under common garden
840 conditions in the green houses of University of Florida.