bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 Dark period transcriptomic and metabolic profiling of two diverse 2 Eutrema salsugineum accessions 3 4 Jie Yin1 5 Email: [email protected] 6 7 Michael J. Gosney2 8 Email: [email protected] 9 10 Brian P. Dilkes3* 11 * Corresponding author 12 Email: [email protected] 13 14 Michael V. Mickelbart1,2* 15 * Corresponding author 16 Email: [email protected] 17 18 1 Department of Horticulture and Landscape Architecture, Purdue University, West 19 Lafayette, Indiana 47907, USA 20 21 2 Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana 22 47907, USA 23 24 3 Department of Biochemistry, Purdue University, West Lafayette, Indiana 47907, USA 25
1
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
26 27 Abstract 28 29 Eutrema salsugineum is a model species for the study of plant adaptation to abiotic stresses. Two 30 accessions of E. salsugineum, Shandong (SH) and Yukon (YK), exhibit contrasting morphology, 31 biotic, and abiotic stress tolerance. Transcriptome and metabolic profiling from tissue samples 32 collected during the dark period were used to investigate the molecular and metabolic bases of 33 these contrasting phenotypes. RNA sequencing identified 17,888 expressed genes, of which 157 34 were not in the published reference genome and 65 were detected for the first time. Differential 35 expression was detected for only 31 genes. The RNA sequencing data contained 14,808 single 36 nucleotide polymorphisms (SNPs) in transcripts, 3,925 of which are newly identified. Among the 37 differentially expressed genes, there were no obvious candidates for the physiological or 38 morphological differences between SH and YK. Metabolic profiling indicated that YK 39 accumulates free fatty acids and long-chain fatty acid derivatives as compared to SH; whereas 40 sugars are more abundant in SH. Metabolite levels suggest that carbohydrate and respiratory 41 metabolism, including starch degradation, is more active during the first half of the dark period 42 in SH. These metabolic differences may explain the greater biomass accumulation in YK over 43 SH. The accumulation of 56% of the identified metabolites was lower in F1 hybrids than the mid- 44 parent averages and the accumulation of 17% of the metabolites in F1 plants transgressed the 45 level in both parents. Concentrations of several metabolites in F1 hybrids agree with previous 46 studies and suggest a role for primary metabolism in heterosis. The improved annotation of the E. 47 salsugineum genome and newly-identified high-quality SNPs will permit accelerated studies 48 using the standing variation in this species to elucidate the mechanisms of its diverse adaptations 49 to the environment. 50 51 Keywords: extremophile, differentially expressed genes, transcriptome, single nucleotide 52 polymorphisms, metabolomics, fatty acid metabolism, transgressive variation
2
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
53 54 55 Introduction 56 57 Eutrema salsugineum (formerly Thellungiella halophila) is a model species for the study of plant 58 stress tolerance (Amtmann, 2009; Bressan et al., 2001; Griffith et al., 2007; Oh et al., 2010; 59 Orsini et al., 2010; Pilarska et al., 2016; Taji et al., 2004; Volkov et al., 2003; Wong et al., 2005). 60 The two most commonly studied accessions, Shandong (SH) and Yukon (YK), are native to the 61 Yellow River region of China (Bressan et al., 2001; Inan et al., 2004) and the Yukon territories 62 of Canada (Wong et al., 2005), respectively. These accessions contrast in cold tolerance (Lee et 63 al., 2012), water stress tolerance (MacLeod et al., 2014; Xu et al., 2014), and disease resistance 64 (Yeo et al., 2014). In response to water stress for example, YK accumulates more cuticular wax 65 (Xu et al., 2014), exhibits delayed wilting due to higher leaf water content, and maintains a 66 higher leaf area, as compared to SH (MacLeod et al., 2014). Some differences in adaptive 67 mechanisms have been linked to metabolism, such as a more pronounced increase in fructose 68 and proline content after cold acclimation in YK compared to SH (Lee et al., 2012). 69 70 RNA-seq has been used to identify differentially expressed genes (DEGs) that contribute to 71 genotypic variation in response to physiological conditions (Bazakos et al., 2012; Huang et al., 72 2012a; Huber-Keener et al., 2012; Niederhuth et al., 2013; Stein and Waters, 2012; Wang et al., 73 2009). When comparing expression between polymorphic accessions of a species, RNA-seq 74 allows for the simultaneous identification and quantification of single nucleotide polymorphisms 75 (SNPs) (Nielsen et al., 2011; Wang et al., 2009). These data can later be utilized as genetic 76 markers for linkage mapping (Chen et al., 2011; Hancock et al., 2011; Hyten et al., 2009; Lisec 77 et al., 2008; Trick et al., 2012) and to assess allele-specific expression (Oshlack et al., 2010; 78 Pickrell et al., 2010). 79 80 Plant metabolic profiling permits the simultaneous measurement of multiple intermediates and 81 the end-products of biochemical pathways. Similar to RNA-seq, metabolic profiling can be used 82 to investigate the metabolic and physiological status of biological systems (Fiehn et al., 2000) 83 and may provide biochemical bases for differences in growth and physiology (Meyer et al., 84 2007). Metabolite profiling has revealed correlations between particular metabolites and growth 85 in Arabidopsis thaliana (Meyer et al., 2007), but it is not clear whether these metabolites are 86 more universally linked to heterosis. 87 88 The use of combined transcriptome and metabolome analyses have been successfully utilized to 89 identify novel genes and enzymes in plant metabolite biosynthesis pathways (Boke et al., 2015; 90 Sumner et al., 2015). In addition, this approach has also provided insights into pathways that are 91 affected by gene mutation (Etalo et al., 2013; Masclaux-Daubresse et al., 2014; Page et al., 2016; 92 Satou et al., 2014), that are related to responses to biotic stresses (Gurkok et al., 2015; Liu et al., 93 2016), and abiotic stresses (Bielecka et al., 2014; Hamanishi et al., 2015). 94 95 Together, transcriptome and metabolite profiling provide complementary experimental evidence 96 to guide the construction of rational hypotheses for the biochemical basis of variation in growth. 97 The goal of this study is to identify the metabolic and transcription bases for the growth 98 differences between the two E. salsugineum accessions. We utilized contrasting genotypes to
3
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
99 identify genetic differences, expression divergence and metabolic compounds associated with 100 observed phenotypic variation. We obtained gene expression and metabolite concentration data 101 from SH and YK accessions during the dark period by RNA-seq and gas chromatography mass 102 spectrometry (GC/MS), respectively. We utilized RT-PCR to identify DEGs implicated by our 103 RNA-seq experiment, validating 23 of 25 candidates. By mining the RNA-seq experiment, we 104 validated previously identified SNPs and discovered additional SNPs that differentiate these E. 105 salsugineum accessions. We propose that observed differences in metabolite accumulation could 106 contribute to differences in biomass. 107 108 Materials and Methods 109 110 Plant material and growth conditions 111 112 Seeds of E. salsugineum SH and YK accessions were obtained from individual selfed plants. To 113 generate SH x YK F1 seed, closed flower buds of YK plants were manually opened, anthers 114 removed, and pollen from SH plants was applied to the stigma. Multiple crosses (ca. 20) were 115 made on a single plant, which was kept in isolation until seed set. Seeds from this plant were 116 used for F1 experiments. 117 118 Seeds were stratified in the dark for 10 days at 4°C then sown in a 4:1 mix of Fafard 52 Mix 119 (transcriptome and first metabolome experiments) or Fafard 2 Mix (second metabolome 120 experiment) soilless media (Conrad Fafard, Inc., Agawam, MA, USA) and Turface calcined clay 121 (Profile Products LLC., Buffalo Grove, IL, USA). Plants were grown in 50 mL conical tubes 122 (USA Scientific, Inc., Ocala, FL, USA). A 0.5 cm hole was drilled at the bottom of each tube 123 where seeds were sown. Tubes were then closed with a mesh cap and placed cap-side-down to 124 allow for sub-irrigation. Tubes were placed in a mist room for 12 days, at which point seedlings 125 had reached the 4-leaf stage. Plants were then grown in a growth chamber (E15; Conviron, 126 Pembina, ND, USA) set at 60% relative humidity under a 12 h photoperiod (approximately 230 127 μmol m-2 s-1 provided by both fluorescent and incandescent bulbs) with light and dark 128 temperatures of 22 and 20°C, respectively. Irrigation water was a 3:1 mix of two water-soluble 129 fertilizers (15N–2.2P–12.5K and 21N–2.2P–16.6K; The Scotts Co., Marysville, OH, USA) to 130 supply (in mg L-1): 200 N, 26 P, 163 K, 50 Ca, 20 Mg, 1.0 Fe, 0.5 Mn and Zn, 0.24 Cu and B, 131 and 0.1 Mo. 76% of the nitrogen was provided as nitrate. 93% sulfuric acid (Brenntag North 132 America, Inc., Reading, PA, USA) at 0.08 mg L-1 was mixed in irrigation water to maintain pH 133 between 5.8 and 6.2. All plants were grown in the Purdue University Horticulture Plant Growth 134 Facility (https://ag.purdue.edu/hla/Hort/Greenhouse/Pages/Default.aspx).
135 136 Transcriptome sequencing and analysis 137 138 For RNA extraction, tissue was collected in the middle of the dark period. Four biological 139 replicates consisting of five plants each were collected. Whole rosettes of 4-week-old plants were 140 frozen in liquid nitrogen and stored at −80°C. Tissue was finely ground in liquid nitrogen using a 141 mortar and pestle. Approximately 800 mg ground tissue was combined with 1 ml of TRIzol 142 reagent (Gibco/BRL Life Technologies, Invitrogen, Carlsbad, CA, USA) and RNA was extracted 143 according to the manufacturer’s instructions. Quality of RNA was estimated by the ratio of 144 absorbance at 260 to 280 nm and 260 to 230 nm by spectrophotometer (DU 730; Beckman 4
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
145 Coulter, Inc., Indianapolis, IN, USA) with both ratios between 1.8 and 2.2. RNA was reverse- 146 transcribed into cDNA using the Poly(A) Purist protocol (Ambion, Austin, TX, USA). 147 148 The cDNA samples were fragmented into 300–500 bp molecules and sequencing libraries were 149 constructed for the 454 GS-FLX instrument (454 Life Sciences, Roche Company, Branford, CT, 150 USA) at the Purdue University Genomics Facility (West Lafayette, IN, USA). One sequencing 151 run was performed (Margulies et al., 2005). Raw sequences were trimmed to remove adaptor 152 sequences, and an initial quality trimming was performed using GS De Novo Newbler (v2.5.3; 153 default parameters) (Part C – GS De Novo Assembler, GS Reference Mapper, SFF Tools, 2010). 154 Trimmed reads were further trimmed using the FASTX-Toolkit (Gordon and Hannon, 2010), 155 with a minimum quality value of 12 and minimum read length of 50 bp. Trimmed reads were 156 aligned to the Joint Genome Institute (JGI) E. salsugineum genome (Phytozome v9.1: 157 Thellungiella halophila: http://www.phytozome.net/thellungiella.php; Yang et al., 2013) using 158 the splice-aware aligner GMAP v2012-11-27 with default parameters (Wu and Watanabe, 2005). 159 Only uniquely mapped reads were used for transcript abundance estimates and single nucleotide 160 polymorphism (SNP) calling. 161 162 SNPs were detected using mpileup from SAMtools v0.1.18 with mapping quality ≥ 15, and 163 depth ≥ 3 (Li et al., 2009). 454 sequencing has a high error rate for detecting indels (Margulies et 164 al., 2005), so only SNPs resulting from substitutions were retained. The two accessions are 165 substantially inbred lines and should be homozygous at each base position. Hence, only 166 monomorphic base positions within each accession were considered for detection of differences 167 between the two accessions. Custom Perl scripts were used to remove SNPs 1) that were 168 heterozygous within either accession, 2) that were not biallelic between accessions, 3) that were 169 supported by fewer than 3 sequence reads, 4) for which the alternative allele accounts for fewer 170 than 10% of aligned reads, and 5) that were heterozygous between the SH accession and the JGI 171 SH reference. If more than 4 SNPs were detected within a 100 bp region using the 172 VariantFiltration module from GATK v2.4.9 (McKenna et al., 2010), they were not included in 173 the final SNP dataset. SNPs that had mpileup quality score 999 based SAMtools were deemed 174 “high quality” SNPs. on are Sanger sequencing data of the YK accession available from the 175 National Center for Biotechnology Information (NCBI) (Wong et al., 2005) was also aligned to 176 the reference genome using SSAHA2 (Ning et al., 2001). SNPs were called using SAMtools and 177 filtered for clustered SNPs (4 SNPs within 100 bp region) using GATK as indicated. SNPs that 178 were not biallelic or were heterozygous within YK were removed. 179 180 Genes were identified via a reference annotation-based transcript assembly method using the 181 Cufflinks package (Roberts et al., 2011; Trapnell et al., 2010). Reads from SH and YK were 182 assembled separately and then merged using the cuffmerge command (Roberts et al., 2011; 183 Trapnell et al., 2010). The intersect function within BEDTools v2.17.0 was used to identify 184 genes not annotated in the JGI E. salsugineum genome (newly annotated genes). Same- 185 strandedness was not enforced when identifying newly annotated genes because of the non- 186 strand-specific protocol for 454 library preparation. Newly annotated genes that are unique from 187 or overlap genes annotated by Champigny et al. (2013) but are present in the JGI reference 188 genome were also identified using the same method (Supplementary Table 1). 189 190 The number of reads uniquely aligned to each gene was determined with htseq-count within
5
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
191 HTSeq v0.5.4p5 (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html) using union 192 mode. The bioconductor package “DESeq” v.1.14.0 was used to identify genes likely to be 193 differentially expressed between SH and YK without biological replicates (Anders and Huber, 194 2010). Gene expression was normalized, and the significance threshold for differential 195 expression was based on a 0.2 false discovery rate (FDR) (Benjamini and Hochberg, 1995). 196 197 Genes were annotated by the best BLAST (Altschul et al., 1990) hit of A. thaliana. For E. 198 salsugineum genes predicted by the JGI genome, annotation was taken from Phytozome v9.1 199 (Phytozome v9.1: Thellungiella halophila: http://www.phytozome.net/thellungiella.php Yang et 200 al., 2013). For newly identified genes, nucleotide databases of A. thaliana version TAIR 10 201 (Lamesch et al., 2012; TAIR10: 202 ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/), A. lyrata (Hu et 203 al., 2011; Phytozome v9.1: Arabidopsis lyrata: http://www.phytozome.net/alyrata.php), and S. 204 parvula (Dassanayake et al., 2011; Thellungiella parvula genome: 205 http://www.thellungiella.org/data) were built and used for similarity searching by BLASTN 206 v2.2.28+ (Altschul et al., 1990). Genes were annotated by the best BLAST hit with the following 207 threshold parameters: E ≤ 1-30; sequence identity ≥ 30%; sequence aligned ≥ 30% of query 208 sequence. 209 210 Quantitative real-time PCR 211 212 For quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR), tissue was 213 collected as described for 454 sequencing, RNA was extracted using the RNeasy Plant Mini Kit 214 (QIAGEN, Valencia, CA, USA), and genomic DNA was removed using the TURBO DNA- 215 free™ Kit (Ambion, Grand Island, NY, USA). The quality and quantity of mRNA was assessed 216 using a NanoDrop 2000 (Thermo Fisher Scientific Inc., Wilmington, DE, USA). All samples 217 were diluted to 200 ng μl-1, and 260/280 and 260/230 ratios were between 1.8 and 2.2. cDNA 218 was synthesized using a High-Capacity cDNA Reverse Transcription Kit (Invitrogen, Grand 219 Island, NY, USA). Primers were designed using Primer Express Software (v3.0.1). Primer 220 specificity was then estimated by BLASTN using the E. salsugineum genome with all primer 221 pairs. Primer efficiency was tested for all pairs of primers. cDNA was diluted 5 times by a 5-fold 222 gradient, and then used as template for qRT-PCR, the threshold cycles (CT) were regressed 223 against cDNA concentration (log). Slope of the regression line was estimated and the efficiency 224 was calculated as 10-(1/slope)-1. For genes expressed in both accessions, primer efficiency was 225 between 80 and 110% in both accessions. For genes that were only expressed in one accession 226 based on RNA-seq data, primer efficiency was tested on both accessions, but only the accession 227 with detected expression exhibited efficiency between 80 and 110%. Supplementary Table 2 228 contains all primer sequences except gene XLOC_004723, for which acceptable qRT-PCR 229 primers could not be designed. 230 231 All qRT-PCR reactions were conducted in StepOnePlus™ Real-Time PCR Systems (Applied 232 Biosystems, Invitrogen, Grand Island, NY, USA). Relative gene expression of target genes was 233 quantified by the ΔCT method (Livak and Schmittgen, 2001). Relative gene expression was 234 calculated as: 235 236 Relative expression 1 č
6
bioRxiv preprint doi: https://doi.org/10.1101/163980; this version posted July 15, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.