Philippine Journal of Science 148 (S1): 267-273, Special Issue on Genomics
ISSN 0031 - 7683
Date Received: 14 May 2019

Genome-wide Profiling and Characterization of Terpene Synthase-linked Simple Sequence Repeats in Coffea canephora Pierre ex A. Froehner for Identification of Potential Markers for Aroma in Philippine-grown Coffee Varieties

Daisy May C. Santos1, Angelo Joshua A. Victoria1,2, Carla Francesca F. Besa2, and Ernelea P. Cao1

1Institute of Biology, College of Science
2Philippine Genome Center (PGC)
University of the Philippines (UP) Diliman, Quezon City 1101 Philippines

Aroma plays an important role in determining the market value of coffee. The Philippines produces four coffee varieties for commercial use, namely Arabica, Robusta, Liberica, and Excelsa. In this study, simple sequence repeats (SSRs) linked to terpene synthase (TPS) genes were used to characterize and differentiate the Philippine coffee varieties. SSR loci were mined from the existing sequenced genome of Coffea canephora. Data mining yielded 759,747 perfect SSRs, of which 317 were linked to aroma. Two hundred of these were screened using Philippine Arabica, Robusta, Liberica, and Excelsa samples. Twenty-seven loci were successfully amplified for all specimens and showed clear polymorphisms. These were used to characterize the coffee varieties based on number of alleles and heterozygosity. The number of alleles ranged from 2 to 8, while the expected heterozygosity ranged from 0.18 to 0.85. There were 26 bands from 17 loci that could differentiate among varieties, of which 15 bands were unique to C. arabica (Arabica variety), three to C. canephora (Robusta variety), three to C. liberica (Liberica and Excelsa varieties), and five were shared by C. arabica and C. canephora. This study is the first to characterize TPS-linked SSR markers in Philippine-grown coffee varieties.
Keywords: aroma, coffee, microsatellites, terpene synthase

INTRODUCTION

The Philippines is one of the few countries that cultivate four coffee varieties for commercial use. These are Arabica, Robusta, Liberica, and Excelsa coffee. Coffea arabica (Arabica) is considered the world's best coffee due to its superior taste and aroma (Vieira et al. 2010). Coffea canephora (Robusta) accounts for 69% of Philippine coffee production (2017–2022 Philippine Coffee Industry Roadmap). Although inferior to Arabica in terms of taste and aroma, it has the advantage of being high-yielding and tolerant to diseases (Butt and Sultan 2011). Coffea liberica has two varieties: Liberica and Excelsa. Coffea liberica var. liberica (Liberica) is locally known as "kapeng barako." It is characterized by its strong, woody, and bitter taste with a pungent aroma (N'Diaye et al. 2005). Coffea liberica var. dewevrei (Excelsa), on the other hand, has a sweet and fruity aroma.

In the coffee industry, bean quality is valued for its taste and aroma. The coffee bean contains a complex mixture of different substances such as lipids, trigonelline, chlorogenic acid, and volatile compounds. The emitted volatile compounds give coffee its characteristic aroma. These volatile organic compounds can be classified according to their metabolic origins. Examples are terpenes, benzenoids, and small organic compounds such as alcohols, ketones, aldehydes, and esters (Del Terra et al. 2013). Among the classes of volatile compounds, terpenes were determined to be the major component of ripe coffee berries (Mathieu et al. 1998). These terpenes are synthesized from prenyl diphosphates by TPSs.

Volatile compounds in coffee are usually determined through gas chromatography/mass spectrometry analysis. Volatile profiling of coffee beans can be used to discriminate between species, determine place of origin, and determine bean quality (Agresti et al. 2008, Cheong et al. 2013, Scholz et al. 2014). Indirect aroma profile characterization of coffee varieties may be done by characterizing TPS-linked SSR markers. SSR markers are short, tandem repeats of DNA present in the coding and non-coding portions of the genome (Wang et al. 2009). The abundance and highly polymorphic nature of SSRs make them good markers for plant genetic studies, identification of cultivars, and evaluation of varieties with a narrow genetic base (Vieira et al. 2010, Wang et al. 2009). SSRs have already been used for varietal identification and evaluation of genetic diversity in coffee (Anthony et al. 2001, 2002; Geleta et al. 2012; Lashermes et al. 1999; Teressa et al. 2010; Vieira et al. 2010). In other studies, DNA fingerprinting using SSR markers has also been developed as a method to distinguish Arabica from the Robusta variety to ensure the authenticity of the coffee product sold in the market (Tornincasa et al. 2010). SSRs have also been used to evaluate certain traits such as leaf miner resistance in Arabica coffee (Pereira et al. 2011).

The sequencing of the diploid (2n = 2x = 22) C. canephora Pierre ex A. Froehner genome by Denoeud and co-workers (2014) has provided information on the genes associated with the biosynthesis of terpenes. The elucidation of the whole genome sequence allows for the development of markers using bioinformatics analysis, namely genome-wide SSR mining and primer design. Mining of SSRs in C. canephora has been previously reported by Ogutu et al. (2016) using a single motif-finding tool for use in Kenyan C. canephora. This study aimed to: (1) utilize the sequenced genome of C. canephora to mine SSR loci using three SSR-mining tools, (2) design primers for the loci, (3) use the annotation information of the reference genome to guide the selection of SSR loci that are likely to be related to genes contributing to aroma, and (4) characterize the designed SSR primers using Philippine-grown coffee varieties. Since very little information is available in the literature regarding the aroma of Philippine coffee, this study will provide baseline information on the genetic aspect of Philippine coffee aroma. SSR analysis provides a cheaper and quicker method of differentiating TPS genes of Philippine coffee varieties.

MATERIALS AND METHODS

Reference Genome Retrieval, Pre-processing, and SSR Loci Mining

The Coffea canephora reference genome (Denoeud et al. 2014) was downloaded from the Coffee Genome Hub database (Dereeper et al. 2015). The genome was split into 12 files: one for each of the 11 chromosomes (Chr1, ..., Chr11), with an additional file for the non-mapped reads (Chr0). The files were pre-processed by cleaving the reads wherever there were ambiguous bases (N).

Mining of SSR loci consisting of at least eight repeats from each of the C. canephora chromosomes was done using three SSR-mining tools with different search algorithms to cover as many SSR loci as possible: MISA with a dictionary approach (Thiel et al. 2003), MREPS with a two-phased algorithm (Kolpakov et al. 2003), and SciRoKo with a sliding window approach (Kofler et al. 2007). The results of the three search tools were then collated into a non-redundant list of SSR loci for the C. canephora reference genome.

Prior to primer design, the predicted SSR loci were filtered according to the following criteria: (a) the locus must be perfect/pure (no interruptions by any base in the motif), (b) the locus must not have a mononucleotide repeating unit, and (c) the locus must be at least 20 bases away from either end of a contig. Sequences of SSR loci that passed these criteria were then extracted to generate the input files for primer design.

Primer Design

Primer3 (Untergasser et al. 2012) was used to design primer pairs for the amplification of the predicted perfect SSR loci, which were the candidate loci for marker development. The designed primers were set to have the following properties: (a) the resulting amplicon must be 100–500 bases, (b) primer length must be between 17 and 25 bases with an optimal length of 20 bases, (c) primer melting temperature must be between 53 and 60 °C, with an optimal melting temperature of 55 °C, (d) primer G-C content must be between 20 and 80% with an optimal G-C content of 50%, and (e) a primer must not have a homopolymer stretch longer than three bases. The output of Primer3 was collated into a tab-separated matrix with information on the chromosome number, primer sequence, melting temperature, G-C content, and estimated product length.

Filtering of TPS-linked Primer Pairs

The output of Primer3 was then filtered to collect primer pairs that hit, or have at least one base-pair overlap with, exons as indicated in the Generic Feature Format version 3 (GFF3) annotation file of the C. canephora genome. Primers were filtered according to the annotations by Denoeud et al. (2014). Primers with hits for TPS genes were classified for the aroma trait. The output tab-separated matrix preserved the information from the original output of Primer3. The primers were synthesized by a commercial company.

Plant Material and DNA Extraction

Young leaf samples from four Philippine-grown coffee varieties (Arabica, Robusta, Liberica, and Excelsa) were

RESULTS AND DISCUSSION

This study reports a genome-wide analysis of SSR motifs in C. canephora, with primer design guided by gene annotations for aroma. After dereplication of results from the three SSR-mining tools MISA, MREPS, and SciRoKo, a total of 2,640,056 SSR loci were mined from the C. canephora reference genome. Among these, 1,880,309 (71.22%) were found to be ambiguous and 759,747 (28.78%) were perfect. These results were contingent on the search criteria used and may not reflect the actual distribution of SSRs in the Coffea canephora reference genome. However, the use of the three tools with differing search algorithms aimed to ensure that as many SSR loci as possible were included.
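
Because the counts above depend on the search criteria, it may help to see those criteria written out. The following is a minimal Python sketch of the pre-processing and filtering rules stated in the Materials and Methods (cleaving at ambiguous bases, at least eight repeats, no mononucleotide units, at least 20 bases from either contig end). It uses a plain regular-expression search and is not the algorithm of MISA, MREPS, or SciRoKo. The function names, the `SSR` record, and the maximum motif length of six bases are assumptions made for illustration only.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple

MIN_REPEATS = 8   # from the text: at least eight repeats
MAX_MOTIF = 6     # assumption: longest repeating unit considered
EDGE_MARGIN = 20  # from the text: at least 20 bases from either contig end

# A motif of 2..MAX_MOTIF bases (shortest tried first) followed by at least
# MIN_REPEATS - 1 exact copies of itself, i.e. an uninterrupted (perfect) repeat.
SSR_PATTERN = re.compile(r"([ACGT]{2,%d}?)\1{%d,}" % (MAX_MOTIF, MIN_REPEATS - 1))


class SSR(NamedTuple):
    contig_id: str
    start: int    # 0-based, relative to the contig
    end: int      # exclusive, relative to the contig
    motif: str
    repeats: int


def split_on_ambiguous(seq_id: str, seq: str) -> Iterator[Tuple[str, str]]:
    """Cleave a sequence wherever a non-ACGT base (such as N) occurs."""
    for i, match in enumerate(re.finditer(r"[ACGT]+", seq.upper())):
        yield f"{seq_id}_contig{i}", match.group(0)


def mine_perfect_ssrs(seq_id: str, seq: str) -> List[SSR]:
    """Return perfect, non-mononucleotide SSRs away from contig ends."""
    hits = []
    for contig_id, contig in split_on_ambiguous(seq_id, seq):
        for match in SSR_PATTERN.finditer(contig):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. "AA": a mononucleotide repeat in disguise
            if match.start() < EDGE_MARGIN:
                continue  # too close to the contig start
            if len(contig) - match.end() < EDGE_MARGIN:
                continue  # too close to the contig end
            repeats = (match.end() - match.start()) // len(motif)
            hits.append(SSR(contig_id, match.start(), match.end(), motif, repeats))
    return hits


# Toy sequence (not real coffee data): one valid (AG)10 locus, one (CT)9 locus
# sitting at a contig start, and one poly-A run.
FLANK_A = "ACGTTGCAAGCTTGACCTGAATCCT"
FLANK_B = "TCCATGGTACGATTCAGCTAGGCAT"
demo_seq = (FLANK_A + "AG" * 10 + FLANK_B + "NNNNN"
            + "CT" * 9 + FLANK_A + "A" * 20 + FLANK_B)
hits = mine_perfect_ssrs("Chr1", demo_seq)
for hit in hits:
    print(hit)
```

In this toy input only the (AG)10 locus passes: the (CT)9 locus lies at the start of its contig, and the poly-A run is rejected as a mononucleotide repeat. Coordinates are reported relative to each cleaved contig, so a real pipeline would also need to track each contig's offset in the original chromosome.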