bioRxiv preprint doi: https://doi.org/10.1101/037077; this version posted January 18, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Genetic regulation of transcriptional variation in wild-collected Arabidopsis thaliana accessions

Yanjun Zan1, Xia Shen1,2,3,4, Simon K. G. Forsberg1, Örjan Carlborg1*

1Department of Clinical Sciences, Division of Computational Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
2Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
3Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
4MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

*To whom correspondence should be addressed: [email protected]

Abstract

An increased understanding of how genetics contributes to expression variation in natural Arabidopsis thaliana populations is of fundamental importance to understand adaptation. Here, we reanalyse data from two publicly available datasets with genome-wide data on genetic and transcript variation from whole-genome and RNA-sequencing in populations of wild-collected A. thaliana accessions. We found transcripts from more than half of all genes (55%) in the leaf of all accessions. In the population with higher RNA-sequencing coverage, transcripts from nearly all annotated genes were present in the leaf of at least one of the accessions.
Thousands of genes, however, were found to have high transcript levels in some accessions and no detectable transcripts in others. The presence or absence of particular gene transcripts within the accessions was correlated with the genome-wide genotype, suggesting that part of this variability was due to genetically controlled, accession-specific expression. This was confirmed using the data from the largest collection of accessions, where cis-eQTLs with a major influence on the presence or absence of transcripts were detected for 349 genes. Transcripts from 172 of these genes were present in the second, smaller collection of accessions and there, 81 of the eQTLs for these genes could be replicated. Twelve of the replicated genes, including HAC1, are particularly interesting candidate adaptive loci, as earlier studies have shown that lack-of-function alleles at these genes have measurable phenotypic effects on the plant. In the larger collection, we also mapped 2,320 eQTLs regulating the expression of 2,240 genes that were expressed in nearly all accessions, and 636 of these replicated in the smaller collection. This study thus provides new insights into the genetic regulation of global gene-expression diversity in the leaf of wild-collected A. thaliana accessions and, in particular, illustrates that strong cis-acting polymorphisms are an important genetic mechanism leading to the presence or absence of transcripts in individual accessions.

Key words: eQTL mapping, RNA sequencing, Gene Expression, Adaptation.
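
As a purely illustrative sketch of what testing a cis-acting variant for an effect on the presence or absence of a transcript can look like, the Python example below calls a transcript "present" when its expression exceeds a threshold and tests the resulting 2x2 table (allele by presence) with a two-sided Fisher's exact test. This is not the procedure used in the study: the choice of test, the detection threshold of 0, the toy genotypes and expression values, and all function names are assumptions made for illustration.

```python
from math import comb


def presence_absence_table(genotypes, expression, threshold=0.0):
    """Count accessions by allele (0/1) and transcript status (present/absent).

    Returns [[allele0_present, allele0_absent], [allele1_present, allele1_absent]].
    The threshold is a hypothetical detection cutoff, not a value from the paper.
    """
    table = [[0, 0], [0, 0]]
    for allele, level in zip(genotypes, expression):
        table[allele][0 if level > threshold else 1] += 1
    return table


def hypergeom_pmf(a, row1, col1, n):
    """Probability of observing `a` in the top-left cell given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = hypergeom_pmf(a, row1, col1, n)
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p
        for p in (hypergeom_pmf(k, row1, col1, n) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Toy data: ten inbred accessions, one biallelic marker near the gene.
toy_genotypes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_expression = [5.2, 3.1, 4.8, 6.0, 2.2, 0.0, 0.0, 0.0, 0.0, 1.5]

toy_table = presence_absence_table(toy_genotypes, toy_expression)
toy_p_value = fisher_exact_two_sided(toy_table)
print(toy_table, toy_p_value)
```

In a genome-wide setting such a test would be repeated over many genes and markers, so multiple-testing correction and control for population structure would be needed; neither is shown here.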
Introduction

Several earlier studies have utilized genome-wide genotype data to explore the genetic regulation of adaptive phenotypes in populations of wild-collected Arabidopsis thaliana accessions (Atwell, Huang et al. 2010, Cao, Schneeberger et al. 2011). Some of the phenotypic variation in these populations is likely to be due to regulatory genetic variants in the genome. A way to potentially identify putative adaptive mutations is therefore to map genetic variants that lead to differences in gene expression (Expression Level Polymorphisms, or ELPs for short), as has been done earlier in various organisms (Wang, Stec et al. 1999, Carroll 2000, Brem, Yvert et al. 2002, Kliebenstein, Pedersen et al. 2002, Carneiro, Piorno et al. 2015). In A. thaliana, many naturally occurring ELPs with significant phenotypic effects have been reported, including semi-dwarfs (Barboza, Effgen et al. 2013), changes in flowering-time (Schwartz, Balasubramanian et al. 2009), changes in seed flotation (Saez-Aguayo, Rondeau-Mouro et al. 2014) and changes in self-incompatibility (Nasrallah, Liu et al. 2004). Extending such efforts to a whole-genome and transcriptome level in large collections of natural A. thaliana accessions might provide useful insights into the link between genetic and expression-level variation, in order to ultimately reveal how this contributes to adaptation to the natural environment.

Only a few studies have explored the expression variation in natural A. thaliana populations. A microarray-based study on shoot tissue from two accessions, Sha and Bay-0, found that (i) 15,352 genes (64% of all tested) were expressed in the shoot in the field, (ii) 3,344 genes (14% of all tested) were differentially expressed between the accessions and (iii) 53 genes were uniquely expressed in Sha/Bay-0, respectively (Richards, Rosas et al. 2012). Using RNA-sequencing, Gan et al. (Gan, Stegle et al.
2011) studied the seedling transcriptome from 18 wild-collected accessions to reveal that 75% (20,550) of the protein-coding genes, 21% of the non-coding RNAs and 21% of the pseudogenes were expressed in the seedling tissue of at least one of the accessions. Further, they also found that 46% (9,360) of the expressed protein-coding genes were differentially expressed between at least one pair of accessions (Gan, Stegle et al. 2011). Although these studies suggest that there is considerable transcriptional variation between natural A. thaliana accessions, further studies are needed to obtain a more complete picture of the actual variation in the worldwide A. thaliana population.

Expression quantitative trait loci (eQTL) mapping is a useful approach to link genetic and expression variation and a starting point to identify the regions containing the functional ELPs. It has been successfully applied in many organisms, including yeast (Brem, Yvert et al. 2002), plants (West, Kim et al. 2007), as well as animals and humans (Schadt, Monks et al. 2003). Most eQTLs are detected in the close vicinity of the gene itself (cis-eQTLs) and these often explain a large proportion of the observed expression variation (Li, Álvarez et al. 2006, Petretto, Mangion et al. 2006, Wentzell, Rowe et al. 2007). Fewer eQTLs (from 20-50% reported in various organisms (Morley, Molony et al. 2004, Li, Álvarez et al. 2006, Keurentjes, Fu et al. 2007)) are located in the remainder of the genome (trans-eQTLs). Using eQTL mapping, the genetic regulation of expression variation has been dissected also in A. thaliana using recombinant inbred lines (RIL) (Keurentjes, Fu et al. 2007, West, Kim et al.
2007) and other experimental crosses (Zhang, Cal et al. 2011). These initial studies, however, had a limited ability to dissect the genetic basis of adaptive expression variation. First, they studied experimental populations that captured only a small part of the genetic diversity present in the natural A. thaliana population. Second, the measures of gene expression were from microarray data, making them indirect, as they were based on quantifying the light intensity emitted after hybridizing the extracted mRNA with predesigned probes on the array. As a consequence, these studies were limited by the inherent drawbacks of this technology, including the low specificity of probes, the lack of specificity for different isoforms of a gene, as well as experimental batch effects (Kothapalli, Yoder et al. 2002). Further, in the context of eQTL mapping, the limited detection range of the microarray measurements (i.e. the upper and lower bounds for detection of expression) makes it unlikely that these data captured the full range of expression differences present in nature. Power was therefore lost in cases where the biological range of expression exceeded the detection boundaries of the array. It is now possible to quantify transcript levels using RNA sequencing (RNA-seq) instead. This facilitates eQTL mapping where expression levels are quantified with higher precision and specificity than in the hybridization-based microarray approaches. Although RNA-seq has its own shortcomings, such as sequence-specific bias (Schwartz, Oren et al. 2011), fragment bias and amplification bias (Roberts, Trapnell et al.
2011), it is an attractive method for scoring expression in eQTL analyses as it overcomes many of the fundamental shortcomings of microarrays by directly counting reads mapped to a certain transcript (Wang, Gerstein et al.