Low-Copy Nuclear Genes for Plant Phylogenies: a Preliminary
Selecting Single-copy Nuclear Genes for Plant Phylogenetics: A Preliminary Analysis for the Senecioneae (Asteraceae)

Inés Álvarez*, Andrea Costa, Gonzalo Nieto Feliner

Real Jardín Botánico de Madrid, CSIC, Plaza de Murillo, 2, E-28014 Madrid, Spain. Phone: 34-91-4203017, Fax: 34 91 4200157, e-mail: [email protected]

*Corresponding author

ABSTRACT

Compared to organelle genomes, the nuclear genome comprises a vast reservoir of genes that potentially harbour phylogenetic signal. Despite the valuable data that sequencing projects of model systems offer, relatively few single-copy nuclear genes are being used in systematics. In part this is due to the challenges inherent in generating orthologous sequences, a problem that is ameliorated when the gene family in question has been characterized in related organisms. Here we illustrate the utility of diverse sequence databases within the Asteraceae as a framework for developing single-copy nuclear genes useful for inferring phylogenies in the tribe Senecioneae. We highlight the process of searching for informative genes by using data from Helianthus annuus, Lactuca sativa, Stevia rebaudiana, Zinnia elegans, and Gerbera cultivar. Emerging from this process were several candidate genes; two of these were used for a phylogenetic assessment of the Senecioneae, and were compared to other genes previously used in Asteraceae phylogenies. Based on the preliminary sampling used, one of the genes selected during the searching process was more useful than the two previously used in Asteraceae. The search strategy described is valid for any group of plants but its efficiency is dependent on the phylogenetic proximity of the study group to the species represented in sequence databases.
Key words: Single-copy nuclear genes; phylogenetic markers; cellulose synthase; chalcone synthase; deoxyhypusine synthase; Asteraceae; Senecioneae

INTRODUCTION

Over the last two decades molecular data have become the most powerful and versatile source of information for revealing the evolutionary history among organisms (Van de Peer et al. 1990; Chase et al. 1993; Van de Peer and De Wachter 1997; Baldauf 1999; Mathews and Donoghue 1999; Soltis et al. 1999; Graham and Olmstead 2000; Brown 2001; Nozaki et al. 2003; Schlegel 2003; Hassanin 2006). In most cases, however, only a few molecular markers are employed for phylogeny reconstruction; in plants, for example, the predominant tools are chloroplast genes and multicopy rDNA genes and spacers such as ITS (Álvarez and Wendel 2003). Because of the limitations inherent in cpDNA and rDNA markers, and because of the enormous phylogenetic potential of single-copy nuclear genes, the latter are increasingly being used in systematic studies (Strand et al. 1997; Hare 2001; Sang 2002; Zhang and Hewitt 2003; Mort and Crawford 2004; Small et al. 2004; Schlüter et al. 2005). Among the main advantages of single-copy nuclear genes are: 1) bi-parental inheritance, 2) co-occurrence of introns and exons within the same gene, yielding characters that evolve at different rates and thus can provide phylogenetic signal at different levels, and 3) a very large number of independent markers. This potential has yet to be fully realized, in part because developing single-copy nuclear genes requires previously generated sequence information from related groups. When sequence availability is high (e.g., from genomic libraries or sequencing projects of closely related taxa), it may be possible to screen thousands of sequences for potential use through comparisons with homologous sequences in other taxa (Fulton et al. 2002). Here we use this approach and the recommendations in Small et al.
(2004) to design a selection strategy for identifying single-copy nuclear genes of potential phylogenetic value in the tribe Senecioneae (Asteraceae). A recently published study pursues similar aims, although it establishes different criteria for gene selection (Wu et al. 2006).

Senecioneae is the largest tribe (~3,000 species and around 150 genera) of one of the largest families of seed plants (Asteraceae) and yet, relative to the remaining tribes, it is rather poorly known from a systematic point of view. All molecular phylogenetic analyses of the Senecioneae are based on chloroplast markers (Jansen et al. 1990, 1991; Kim et al. 1992; Kim and Jansen 1995; Kadereit and Jeffrey 1996) and only a small portion of the tribe is represented. Currently several teams are collaborating to analyze available Senecioneae sequences (for about 600 species representing 115 genera) of several chloroplast markers (i.e., ndhF, psbA-trnH, trnK, trnT-L, trnL, and trnL-F) plus the ITS region of the nuclear ribosomal DNA, to generate a supertree of the tribe (Pelser et al. 2006, see also http://www.compositae.org/); this will provide an essential preliminary phylogenetic hypothesis of the tribe. Although supertrees are employed for phylogenetic analyses of large taxonomic groups (see http://tolweb.org/tree/), these methods are not free of criticism (Bininda-Emonds 2004). In addition, in the Senecioneae, only the maternally inherited chloroplast genome and ITS, with its unpredictable evolutionary behavior (Álvarez and Wendel 2003), have been widely used. Thus, there is a need to employ additional independent nuclear markers, both to test previous phylogenetic hypotheses and to complement supertree datasets. At present there are no genomic libraries or sequencing projects available for any member of the Senecioneae.
However, two genomic libraries from model organisms belonging to different tribes, Helianthus annuus (Heliantheae) and Lactuca sativa (Lactuceae), provide a framework for selecting potentially homologous genes. Since Lactuca is relatively distant from Helianthus (see http://www.compositae.org/), comparisons of homologous sequences from these two genera may prove fruitful in designing tools for phylogenetic use in the Senecioneae. Thus, primers designed on regions conserved between Helianthus and Lactuca should also work for members of the Senecioneae and presumably for most members of the Asteraceae. These assumptions need to be tested, of course, as genes can vary in copy number or presence among taxa, and because primer sites for PCR amplification might be polymorphic. To minimize these problems it is helpful to compare as many sequence databases as possible. Within Asteraceae we had available for the present study sequence databases from genomic libraries of organisms from other genera, such as Stevia (Eupatorieae) and Zinnia (Heliantheae), thereby allowing us to use members from three different tribes (Eupatorieae, Heliantheae, and Lactuceae). The approach we detail here is applicable to any group of organisms belonging to or related to taxonomic groups well represented in public nucleotide databases.

While comparisons among sequence databases are relatively straightforward, the selection of the best candidate genes may be challenging due to: 1) difficulty in diagnosing paralogy, and 2) the need to assess variation and its phylogenetic utility. The latter, especially required at low taxonomic levels, can only be ascertained when a good representation of taxa and sequences (clones) is analyzed. Although some approaches such as that of Wu et al. (2006) are successful for deep phylogenies, the lack of a phylogenetic analysis to assess all candidates selected during the search process might limit their usefulness at lower taxonomic levels.
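As a rough illustration of the idea of placing primers in regions conserved between Helianthus and Lactuca, the following minimal Python sketch scans a pairwise alignment for gap-free windows of high identity. The function name, the window length, and the identity threshold are assumptions chosen for illustration and are not taken from the procedure used in this study; the toy sequences are invented.

```python
def conserved_windows(seq_a, seq_b, window=20, min_identity=0.9):
    """Return (start, identity) for each gap-free window of an aligned
    sequence pair whose fraction of identical positions is at least
    min_identity. Both sequences must already be aligned (same length,
    gaps written as '-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    hits = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # Skip windows with alignment gaps: a primer cannot span an indel.
        if "-" in a or "-" in b:
            continue
        identity = sum(x == y for x, y in zip(a, b)) / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    # Invented toy alignment, not real Helianthus or Lactuca data.
    helianthus_like = "ATGGCTAGCTAGGATCC-ATCGATTACGGATCCA"
    lactuca_like = "ATGGCTAGCTAGGTTCCAATCGATTACGGTTCCA"
    for start, identity in conserved_windows(
        helianthus_like, lactuca_like, window=10, min_identity=0.9
    ):
        print(start, round(identity, 2))
```

Windows reported by such a scan are only candidates: as noted above, primer sites may still be polymorphic in the target group, so any primer would need to be tested on Senecioneae DNA.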
Polyploidy contributes additional complications, since multiple diverse sequences representing homoeologs and paralogs may be present in the same genome (Fortune et al. 2007), but polyploids are difficult to avoid in many plant groups, including the Senecioneae, where polyploidy is known to be prevalent in most lineages (Nordenstam 1977; Lawrence 1980; Knox and Kowal 1993; Liu 2004; López et al. 2005).

MATERIAL AND METHODS

Plant Material

Fresh leaf tissue of plants from the living collection of Real Jardín Botánico in Madrid, collected in the field and preserved in silica-gel or cultivated from seeds received from other botanical institutions (Table 1), was used to isolate total DNA with the Plant DNeasy kit (Qiagen), following the manufacturer’s instructions. Since sampling was aimed at assessing the phylogenetic utility of the markers tested within the Senecioneae, we selected 8 species from the main taxonomic groups within the tribe (Pelser et al. 2006) spanning different ploidy levels (from x = 5 to 2x, 4x, 6x, and unknown), and distributed in different biogeographical areas. In addition, sequences from 3 species belonging to other tribes in Asteraceae were included as outgroups (see Table 1).

DNA Sequence Databases

The main sources of DNA sequences used were the on-line databases (DDBJ, EMBL, and GenBank). These databases are interconnected, making all data available from any of their web sites. We arbitrarily chose the GenBank web site for our searches (http://www.ncbi.nlm.nih.gov/). At present, around 63 million sequences are available for Eukaryota, of which 14 million are from plants. Focussing on single-copy nuclear genes for Senecioneae, and to accelerate searches within this database, we excluded sequences from plastid genomes as well as ribosomal DNA and microsatellites from the Asteraceae. A total of 180,747 sequences were downloaded in a file named “Asteraceae NCBI” that was the main database for our searches (Fig. 1).
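The exclusion step described above (removing plastid, ribosomal DNA, and microsatellite sequences before searching) can be sketched as a simple filter over FASTA records. This is only a minimal sketch under stated assumptions: the text does not say how the exclusion was actually carried out, so matching keywords in the record header, the keyword list itself, and all function names below are hypothetical.

```python
# Hypothetical keyword list; the terms actually used in the study are not given.
EXCLUDE_TERMS = ("chloroplast", "plastid", "ribosomal", "microsatellite")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_record(header, exclude_terms=EXCLUDE_TERMS):
    """Return True if none of the exclusion terms occurs in the header."""
    lowered = header.lower()
    return not any(term in lowered for term in exclude_terms)


def filter_records(text, exclude_terms=EXCLUDE_TERMS):
    """Return the (header, sequence) pairs that pass the keyword filter."""
    return [
        (header, seq)
        for header, seq in parse_fasta(text)
        if keep_record(header, exclude_terms)
    ]


if __name__ == "__main__":
    # Invented records for illustration only.
    sample = (
        ">rec1 Helianthus annuus chalcone synthase mRNA\nATGC\nGGTA\n"
        ">rec2 Lactuca sativa chloroplast rbcL gene\nATGA\n"
        ">rec3 Senecio microsatellite locus\nCACACA\n"
    )
    for header, seq in filter_records(sample):
        print(header, len(seq))
```

Keyword matching on headers is crude and depends on how records are annotated; a filter applied at query time on the database web site would serve the same purpose.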
Another database used was generated from genomic libraries of Helianthus annuus lines RHA801 and RHA280, H. paradoxus, H. argophyllus,