
Mora-Ortiz et al. BMC Genomics (2016) 17:756
DOI 10.1186/s12864-016-3083-6

RESEARCH ARTICLE Open Access

De-novo transcriptome assembly for gene identification, analysis, annotation, and molecular marker discovery in Onobrychis viciifolia

Marina Mora-Ortiz1,3, Martin T. Swain2, Martin J. Vickers2,4, Matthew J. Hegarty2, Rhys Kelly2, Lydia M. J. Smith1 and Leif Skøt2*

Abstract

Background: Sainfoin (Onobrychis viciifolia) is a highly nutritious tannin-containing forage legume. In the diet of ruminants sainfoin can have anti-parasitic effects and reduce methane emissions under in vitro conditions. Many of these benefits have been attributed to condensed tannins or proanthocyanidins in sainfoin. A combination of increased use of industrially produced nitrogen fertilizer, issues with establishment and productivity in the first year and more reliable alternatives, such as red clover led to a decline in the use of sainfoin since the middle of the last century. In recent years there has been a resurgence of interest in sainfoin due to its potential beneficial nutraceutical and environmental attributes. However, genomic resources are scarce, thus hampering progress in genetic analysis and improvement. To address this we have used next generation RNA sequencing technology to obtain the first transcriptome of sainfoin. We used the library to identify gene-based simple sequence repeats (SSRs) and potential single nucleotide polymorphisms (SNPs).

Results: One genotype from each of five sainfoin accessions was sequenced. Paired-end (PE) sequences were generated from cDNA libraries of RNA extracted from 7 day old seedlings. A combined assembly of 92,772 transcripts was produced de novo using the Trinity programme. About 18,000 transcripts were annotated with at least one GO (gene ontology) term. A total of 63 transcripts were annotated as involved in the tannin biosynthesis pathway. We identified 3786 potential SSRs.
SNPs were identified by mapping the reads of the individual assemblies against the combined assembly. After stringent filtering a total of 77,000 putative SNPs were identified. A phylogenetic analysis of single copy number genes showed that sainfoin was most closely related to red clover and Medicago truncatula, while Lotus japonicus, bean and soybean are more distant relatives.

Conclusions: This work describes the first transcriptome assembly in sainfoin. The 92 K transcripts provide a rich source of SNP and SSR polymorphisms for future use in genetic studies of this crop. Annotation of genes involved in the condensed tannin biosynthesis pathway has provided the basis for further studies of the genetic control of this important trait in sainfoin.

Keywords: Transcriptome assembly, RNA-seq, Onobrychis viciifolia, Condensed tannins, Proanthocyanidins, SSR, Single nucleotide polymorphism

* Correspondence: [email protected]
2Aberystwyth University, IBERS, Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK
Full list of author information is available at the end of the article

© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Onobrychis viciifolia or sainfoin is a perennial forage legume crop which contains condensed tannins or proanthocyanidins (PAs). Multiple benefits to animal nutrition and health have been attributed to the PAs present in sainfoin. These benefits include anthelminthic properties, in vitro methane emission reduction in ruminants fed on this forage and prevention of the potentially life-threatening bloat associated with other non-PA producing forage legumes [1–5]. Sainfoin is also highly drought tolerant, due partly to its deep taproot, and is resistant to most common pests and diseases. It also contributes to improving soil nitrogen levels due to atmospheric nitrogen fixation in root nodules by rhizobia [6, 7].

These benefits suggest that sainfoin could be an alternative to Medicago sativa (alfalfa) as a valuable forage crop. There are, however, a number of qualitative and agronomic issues that need to be addressed before this potential can be realised. Sainfoin has on average a 20 % lower yield than alfalfa. This is associated with poor establishment and a smaller leaf area. Also, if the drill date is delayed until late spring, this normally prevents harvest in the first year. All these factors have discouraged growers from cultivating sainfoin more widely [8, 9] and its use has therefore declined. Another reason for its decline is the widespread use of inexpensive industrially produced nitrogen fertilizer. This has had a negative impact more generally on the use of forage legumes, not just sainfoin. This is compounded by the lack of systematic breeding or agronomic improvements in sainfoin. There is also a scarcity of basic genetic information available. The almost complete lack of molecular markers available has hampered the development of genetic diversity information in germplasm, as well as analysis of the genetic basis of complex traits from mapping families.

Next generation sequencing has revolutionized the potential for systematic crop genetic improvement, facilitating the study of genomes and transcriptomes [10–12]. RNA-seq can be used for gene identification, annotation, gene ontology, expression level analysis and SSR and SNP mining [13–15]. A significant advantage of this strategy is that it does not require previous knowledge of the genetic sequence of the organism. It is expected that RNA-seq will overtake other alternative methodologies for gene expression analysis due to the larger range of expression, base-pair resolution and higher sensitivity [16–18].

The primary aim of this work was to use next generation sequencing technology to develop molecular resources that will facilitate the development of genetic diversity analyses of germplasm and provide a platform for studying the genetic basis of PA biosynthesis in sainfoin. Sainfoin can be a diploid (2n = 2x = 14) or tetraploid (2n = 4x = 28) species; the former occurs rarely and is poorly characterized in the literature, whereas the latter can be considered representative of the majority of sainfoin accessions. Polyploidy has been associated with the domestication process of sainfoin in which more productive plants were selected [9, 19–21]. Both diploid and tetraploid accessions have a basic set of seven chromosomes [9, 20]. Tetraploid lines have been characterized as autopolyploids or allopolyploids. However, it is unclear whether the inheritance is tetrasomic or disomic in nature [19, 22–24]. A few EST-SSR (expressed sequence tag-simple sequence repeat) markers from Medicago truncatula have been validated in sainfoin, and some phylogenetic studies have been performed using sequence information from the Internal Transcribed Spacer Region (ITS) and matK markers [25]. Genomic and molecular resources in sainfoin are, however, still under-developed [25–27]. To our knowledge, there are no molecular markers derived directly from sainfoin, nor have any de novo studies been conducted in this species.

Our knowledge of the content, structure and complexity of PAs in sainfoin germplasm is growing [28, 29], but little is known about the genetics of PA biosynthesis and its regulation. PAs are formed by polymerisation of flavan-3-ols, which in turn are products of a branch of the flavonoid biosynthesis pathway. The latter is well documented in many species [30, 31]. While a lot of progress has been made in recent years in Arabidopsis thaliana and forage legumes such as Medicago truncatula, the mechanism and genetic regulation of polymerisation of the flavan-3-ols to PAs is still not fully understood [30, 32]. Furthermore, PAs in the above model species are produced primarily in the seed coat [32, 33], and not, as in sainfoin, in vegetative tissue. In sainfoin 12 cDNAs encoding genes involved in the flavonoid biosynthesis pathway were cloned and sequenced [34]. A better understanding of the regulation of PA accumulation in vegetative tissue is needed to facilitate breeding of sainfoin with improved PA content benefitting ruminant nutrition. Here we take a step in this direction by reporting the first annotated transcriptome library from sainfoin. It was used to identify genes involved in the PA biosynthesis pathways. We also provide data to demonstrate the potential for mining the transcriptome for simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs).

Methods

Plant materials

We selected a set of five accessions representing a range of diversity [25–27]. The accessions are listed in Table 1. Seeds were germinated in standard potting compost M2 under controlled glasshouse conditions with a long-day photoperiod (16/8 h light/dark). Seven day old whole seedlings of each sainfoin accession were collected and used for RNA extraction.

Table 1 Onobrychis viciifolia accessions selected for sequencing
Accession

The five cDNA libraries were sequenced with a HiSeq