Assembly, Gene Annotation and Marker Development Using 454 Floral Transcriptome Sequences in Ziziphus Celata (Rhamnaceae), a Highly Endangered, Florida Endemic Plant
Total Page:16
File Type:pdf, Size:1020Kb
DNA Research Advance Access published October 27, 2011 DNA RESEARCH pp. 1–9, (2011) doi:10.1093/dnares/dsr037 Assembly, Gene Annotation and Marker Development Using 454 Floral Transcriptome Sequences in Ziziphus Celata (Rhamnaceae), a Highly Endangered, Florida Endemic Plant CHRISTINE E. Edwards1,*,†,THOMAS L. Parchman2,†, and CARL W. Weekley3 USACE ERDC, Environmental Laboratory, 3909 Halls Ferry Road, Vicksburg, MS 39180, USA1; Department of Botany, University of Wyoming, Laramie, WY 82071, USA2 and Archbold Biological Station, PO Box 2057, Lake Placid, FL 33862, USA3 *To whom correspondence should be addressed. Email [email protected] (C.E.E.); [email protected] (T.L.P.). Downloaded from Edited by Satoshi Tabata (Received 27 July 2011; accepted 26 September 2011) Abstract http://dnaresearch.oxfordjournals.org/ Large-scale DNA sequence data may enable development of genetic resources in endangered species, thereby facilitating conservation efforts. Ziziphus celata, a federally endangered, self-incompatible plant species occurring in Florida, USA, is one species for which genetic resources are necessary to facilitate new introductions and augmentations essential for recovery of the species. We used 454 pyrosequencing of a Z. celata normalized floral cDNA library to create a genomic resource for gene and marker discovery. A half-plate GS-FLX Titanium run yielded 655 337 reads averaging 250 bp. A total of 474 025 reads were assembled de novo into 84 645 contigs averaging 408 bp, while 181 312 reads remained unassembled. Forty-seven and 43% of contig consensus sequences had BLAST matches to known proteins in the Uniref50 and TAIR9 annotated protein databases, respectively; many contigs fully represented ortholo- gous proteins in TAIR9. A total of 22 707 unique genes were sequenced, indicating substantial coverage by guest on November 22, 2011 of the Z. celata transcriptome. We detected single-nucleotide polymorphisms and simple sequence repeats (SSRs) and developed thousands of SSR primers for use in future genetic studies. As a first step towards understanding self-incompatibility in Z. celata, we identified sequences belonging to the gene family encoding self-incompatibility. This study demonstrates the efficacy of 454 transcriptome sequen- cing for rapid gene and marker discovery in an endangered plant. Key words: transcriptome; conservation genetics; microsatellites; S-locus; Ziziphus 1. Introduction quantify levels of genetic diversity and identify factors such as founder effects, genetic drift, genetic While genetic information is just one of many tools bottlenecks, and inbreeding that may threaten that may be used to conserve endangered species, the endangered species.4 – 6 However, because of the use of such information is often valuable to facilitate difficulty of isolating genetic markers and the recovery efforts and the management of surviving considerable cost and time associated with extensive populations.1 Genetic data from endangered species genotyping of individuals in populations, many con- have been used for multiple conservation appli- servation-genetic studies have employed relatively cations, e.g. to carry out parentage analysis in ex situ small numbers of genetic loci, such as RFLPs, AFLPs, breeding and translocation programmes,2,3 and to and SSRs (simple sequence repeats, or microsatellites). Increasing the number of sampled loci may increase the precision and accuracy of estimates of genetic † These authors contributed equally to this work structure and population genetic parameters such as # The Author 2011. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http:// creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Page 2 of 9 Transcriptome Sequencing and Assembly in Ziziphus Celata levels of genetic diversity and inbreeding coefficients.7 markers, e.g. as of April 2011, only six DNA sequences Furthermore, large numbers of genetic loci are had been deposited at NCBI for Z. celata, and none of necessary to unequivocally assign parentage, which these represented expressed genes. Furthermore, the may be very useful for establishing genetically entire Rhamnaceae family, which contains 900 diverse populations of endangered species through species,27 was represented by fewer than 2000 the augmentation of existing genetically depauperate sequences at NCBI. The development of a genome- populations or the creation of new populations. level resource for Z. celata stands to improve conserva- Recently, however, the number of genomic-scale tion-genetic approaches in Z. celata and to provide a sequence collections has increased, enabling rapid future basis for comparative genomic studies in the gene and marker discovery for an increased number Rhamnaceae, which contains economically important of taxa.8 Previously, the development of such crop and ornamental tree and shrub species. genomic resources was largely limited to model Given the small numbers of populations and geno- organisms; however, recent advances in DNA sequen- types of Z. celata, its recovery relies on the augmenta- cing technology have reduced the cost and time tion of uniclonal populations and the introduction of required for the development of genomic resources, multiple genotypes to publicly protected sites contain- 28,29 resulting in a rapid growth in the availability of such ing appropriate habitat; however, the ability to Downloaded from data, particularly in non-model organisms.9– 11 establish self-sustaining, sexually reproducing, translo- Transcriptome sequencing, which is DNA sequencing cated populations has been complicated by the repro- of the mRNA pool of a given tissue, has allowed ductive failure of most experimental crosses. Such sequencing efforts to focus on the protein-coding reproductive failure has been attributed to self-incom- portion of the genome, and has become a valuable patibility in Z. celata. Self-incompatibility has been http://dnaresearch.oxfordjournals.org/ approach for developing genomic-level resources of documented or proposed in at least five genera in the the transcribed portion of the genome.8 Next-gener- Rhamnaceae, including Colletia,30 Discaria,31,32 ation sequencing technologies, such as 454 pyrose- Frangula,33 Trevoa,34 and Ziziphus.25,35,36 quencing, remove many time-consuming steps Gametophytic self-incompatibility (GSI) has been involved in Sanger sequencing and now facilitate tran- proposed as the system of self-incompatibility in scriptome sequencing at a minute fraction of the pre- Z. celata based on results from hundreds of exper- viously required time and cost.10 – 15 A single-plate imental hand pollinations, analysis of RAPD data, run on the 454 GS-FLX titanium pyrosequencing plat- and studies of pollen tube inhibition.25,35,37 GSI is a form typically produces around a million reads aver- system whereby pollen is rejected when its S-haplo- aging 400 bp in length, often resulting in near type is the same as either of the S-haplotypes by guest on November 22, 2011 complete transcriptome coverage.11,16,17 The gener- present in the pistil.38,39 In four of the five plant ation of such large-scale sequence data is enabling families with well-characterized GSI systems, GSI is gene discovery, molecular marker development, and mediated by the S-RNase gene.38,39 Previous research comparative analyses in ecologically important, non- suggests that a S-RNase-based system of GSI likely model plant taxa,11,18 – 22 including those of conser- evolved only once in Angiosperms,38,39 and the pres- vation concern.23 ence of S-RNase-based GSI in the Rosaceae, a plant One species for which conservation efforts family which is closely related to the Rhamnaceae, would be greatly assisted by the establishment of a suggests that this is the most likely mechanism genome-level resource is Ziziphus celata causing GSI in Rhamnaceae. (Rhamnaceae), a federally endangered, diploid shrub An understanding of the genetic basis of self- endemic to central Florida, which is currently known incompatibility and the ability to assess mating from only 14 populations along the Lake Wales types in Z. celata would greatly facilitate recovery Ridge in Polk and Highlands Counties. The species is efforts. Previous research has shown that as few as highly clonal and previous genetic analyses using allo- two mating types may be present in Z. celata wild zymes,24 RAPDs,25 AFLPs, and SSRs26 found that 9 of populations,25,35 the minimum necessary to main- the 14 extant wild populations are uniclonal (i.e. tain sexual reproduction in a GSI species; however, each population comprises a single genetic individ- more mating types may exist, as the compatibility of ual). Depending on the method of determining the most genotypes has not yet been tested. The most number of genotypes, the remaining five populations efficient approach to determine cross-compatibility comprise between 22 and 32 genotypes;26 however, is to directly sequence the genes that encode the unambiguous quantification of the number of geno- GSI reaction. Another efficient approach is parentage types in these populations will require the employ- analysis of progeny to provide indirect evidence of ment of a larger number of genetic markers. Very compatible mating types; however, more genetic little genetic