Supplemental Material

PSEUDOGENES AND EXCLUDED SEQUENCES

Results–Pseudogenes of ndhF were found via direct sequencing of PCR products in Rhodolirium speciosum (Herb.) Ravenna (GB: KC217409) and Phycella australis Ravenna (GB: KC217397). These sequences exhibit a stop codon at position 145-147 and share a 97-aa deletion spanning positions 1,483-1,773 of our ndhF nucleotide alignment. Two pseudogenized copies of ndhF were identified through cloning in Famatina maulensis, Fmau-ndhF1 (GB: KC217380) and Fmau-ndhF2 (GB: KC217381). Fmau-ndhF1 exhibits a stop codon at position 649-651 and a deletion at 717-722, while Fmau-ndhF2 has a stop codon at position 1,057-1,059 of the ndhF alignment. An unusual 3′ndhF sequence was obtained by direct sequencing of Rhodophiala ananuca-1 (GB: KC217413), which does not show a stop codon in the amino acid alignment, but has an unusually high substitution rate, as well as numerous non-synonymous substitutions and a deletion between positions 1,614-1,619 of the alignment. Cloning of the ndhF PCR product from this sample and sequencing of 4 colonies revealed the non-pseudogenized copy only, which was used in subsequent analyses. Through direct sequencing of the trnL(UAA)-F(GAA) PCR product from the same R. ananuca sample, we obtained a symplesiomorphic sequence (GB: KC217491), which contains unusual features such as insertions between positions 50-53 (shared with Cyrtanthus), 7 bp after position 149 (autapomorphic), and 780-785 (shared with Cyrtanthus, Lycoris radiata (L’Hér.) Herb., Pancratium canariense Ker Gawl., Griffinia Ker Gawl., Worsleya procera (Lem.) Traub, Phycella, Placea, Rhodolirium, and Traubia modesta), and a 110-bp autapomorphic deletion between positions 192-301 of the trnL(UAA)-F(GAA) alignment.
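The ndhF coordinates reported above can be checked with simple arithmetic. The following Python sketch is illustrative only and was not part of the original analysis: it assumes inclusive, 1-based alignment coordinates, a reading frame that begins at alignment position 1, and an ungapped toy sequence. The function names and the toy sequence are hypothetical.

```python
# Standard stop codons (DNA alphabet), as used in the plastid genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def span_length(start, end):
    """Length in nucleotides of an inclusive, 1-based alignment span."""
    return end - start + 1


def first_stop_codon(seq, frame_start=1):
    """Return the 1-based (start, end) of the first in-frame stop codon, or None.

    Assumes an ungapped sequence whose reading frame begins at `frame_start`.
    """
    seq = seq.upper()
    for i in range(frame_start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i + 1, i + 3)
    return None


# Deletion spans reported in the Results (inclusive alignment coordinates).
deletions = {
    "R. speciosum / P. australis": (1483, 1773),
    "Fmau-ndhF1": (717, 722),
    "R. ananuca-1": (1614, 1619),
}
for name, (start, end) in deletions.items():
    n = span_length(start, end)
    status = "length is a multiple of 3" if n % 3 == 0 else "length shifts the frame"
    print(f"{name}: {n} nt, {n // 3} codons, {status}")

# Hypothetical toy sequence: ATG GCT TGG TAA GCT
print(first_stop_codon("ATGGCTTGGTAAGCT"))  # (10, 12)
```

Under these assumptions, the 1,483-1,773 span is 291 nt, which corresponds to the 97 codons reported, and each of the two 6-bp deletions has a length that is a multiple of three.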
Finally, the 3′ycf1 sequence from Rhodophiala splendens-1 (GB: KC207537) has an unusually high substitution rate with numerous autapomorphic non-synonymous substitutions; however, no stop codon was detected in the amino acid alignment. These sequences possessed extremely long branches relative to the other sequences in ML trees of each region (not shown). Moreover, these seven excluded sequences appear in basal positions relative to sequences from related species or from different samples of the same species. The only exception to this observation is Fmau-ndhF1, which appears embedded in the clade formed by Phycella and Placea, in agreement with the position of Famatina maulensis based on other markers.

Discussion–Pseudogenes are usually defined as sequences of genomic DNA that are derived from functional genes and exhibit degenerative features such as premature stop codons and frameshift mutations that prevent their expression, although some highly divergent sequences are functional by being involved in the regulation of gene expression and by generating genetic diversity (Balakirev and Ayala 2003). In the case of the unusual trnL(UAA)-F(GAA) sequence from R. ananuca, we hesitate to call it a pseudogene, because most of this region is non-coding. Nonetheless, pseudogenized trnF genes, although extremely rare, have been reported in other groups, including Annonaceae, Brassicaceae, Asteraceae, Solanaceae, and Juncaceae (reviewed in Poczai and Hyvönen 2011). We were unable to detect a stop codon in the R. splendens-1 3′ycf1 sequence, perhaps because this is just a partial sequence and does not span the whole length of this gene.

On the other hand, ndhF pseudogenes have frequently been reported in a diverse array of angiosperms, including Orchidaceae subfamily Epidendroideae (Neyland and Urbatsch 1996), Anacampseros L. (Anacampserotaceae; Applequist and Wallace 2001), Catalpa Scop. (Bignoniaceae; Li 2008), and Erodium L’Hér. ex Aiton (Geraniaceae; Blazier et al.
2011). The complete absence of the ndhF gene from the chloroplast genome has been reported in Gnetales and Pinaceae (Wakasugi et al. 1994; Braukmann et al. 2009), in several parasitic lineages (e.g., de Pamphilis and Palmer 1990; Haberhausen and Zetsche 1994), and in Amaryllidaceae tribe Eucharideae (Meerow 2010). The latter example suggests that the American lineage of Amaryllidaceae is susceptible to the loss of this gene. Within Clade A, ndhF might have been pseudogenized in two independent events given the pattern of deletions, the first involving the most recent common ancestor of Phycella australis and Rhodolirium speciosum, and the second involving Famatina maulensis, where there are two ndhF pseudogenes. The unusual 3′ndhF sequence obtained from Rhodophiala ananuca-1 might constitute a third event of ndhF pseudogenization within Hippeastreae; however, because a “normal” copy was also detected in this sample, we might be dealing with a duplication of the gene and its transfer to the nucleus (Baldauf and Palmer 1990; Gantt et al. 1991; Millen et al. 2001). We currently do not have an explanation for these odd cpDNA sequences and hesitate to provide one without further experimental evidence. Other studies have explained the presence of pseudogenes in the chloroplast genome by invoking functional transference to the nucleus, functional replacement by another gene, intermolecular recombination, and complex processes of structurally mediated illegitimate recombination (e.g., Ansell et al. 2007; Blazier et al. 2011).

POSITIVE SELECTION TESTS

Materials and Methods–An analysis to detect positive selection was performed for both coding cpDNA regions in DataMonkey (Delport et al. 2010a). Codon alignments for 3′ycf1 and ndhF were determined in Geneious Pro, considering only complete, non-identical sequences.
After removal of indels, these alignments were used to infer a maximum likelihood (ML) tree for selection analysis for each locus, using the following approach. Each codon alignment, its respective ML tree, and the codon model of evolution selected by the CodonTest method (Delport et al. 2010b) were used to detect alignment-wide evidence for positive selection with the PARRIS method (Scheffler et al. 2006) at p = 0.05.

Results–The final codon alignments consisted of 56 sequences and 582 codons for ndhF, and 72 sequences and 691 codons for 3′ycf1. CodonTest selected the K81 model (Kimura 1981) with substitution code 012210 for the ndhF data set and a model with substitution code 012010 for 3′ycf1. Trees inferred by RAxML for these codon alignments agreed in topology and had similar branch lengths to those estimated for the complete matrices. Neither codon alignment showed evidence of positive selection at p < 0.05 (ndhF: LRT = 0.1, p = 0.951; 3′ycf1: LRT = 3.32, p = 0.190). Under the null (‘nearly neutral’) model (M1), DataMonkey estimated a mean dN/dS ratio of 0.366 for ndhF and 0.725 for 3′ycf1.

RECOMBINATION DETECTION ANALYSES

Materials and Methods–Tests to detect recombination in the expanded ITS alignment of Hippeastreae were conducted with the Recombination Detection Program 4 (RDP4; Heath et al. 2006; Martin et al. 2010). RDP4 implements several non-parametric recombination detection methods for identifying and analyzing recombination signals (Martin et al. 2010). Besides identifying putative recombinant sequences, RDP4 also infers the most likely recombination breakpoints in the sequence, and putative major (i.e., a sequence closely related to that from which the greater part of the recombinant’s sequence may have been derived) and minor (i.e., a sequence closely related to that from which sequences in the proposed recombinant region may have been derived) parental sequences for each recombination event.
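The distinction between major and minor parental sequences can be illustrated with a toy calculation. The Python sketch below is not the procedure implemented in RDP4, which infers breakpoints and parents statistically. It only shows the idea under strong assumptions: the breakpoints are already known, the recombinant region is the smaller part of the sequence, and similarity is measured as a simple proportion of differing sites. All names and sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_parents(recombinant, candidates, start, end):
    """Return (major, minor) candidate names for a recombinant sequence.

    `start` and `end` are 0-based, half-open breakpoints of the recombinant
    region. The major parent is the candidate closest to the recombinant
    outside that region; the minor parent is the candidate closest inside it.
    """
    def outside(seq):
        return seq[:start] + seq[end:]

    def inside(seq):
        return seq[start:end]

    major = min(
        candidates,
        key=lambda name: p_distance(outside(recombinant), outside(candidates[name])),
    )
    minor = min(
        candidates,
        key=lambda name: p_distance(inside(recombinant), inside(candidates[name])),
    )
    return major, minor


# Hypothetical toy data: sites 12-17 of the recombinant come from parent_B.
parent_a = "A" * 20
parent_b = "C" * 20
recombinant = parent_a[:12] + parent_b[12:18] + parent_a[18:]

print(assign_parents(recombinant, {"parent_A": parent_a, "parent_B": parent_b}, 12, 18))
# ('parent_A', 'parent_B')
```

In this toy case 14 of 20 sites match parent_A, so it is the major parent, while the 6-site recombinant region matches parent_B, the minor parent.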
The breakpoint polishing, misalignment checking, and phylogenetic evidence options were invoked. The highest acceptable p value was set to 0.05 with Bonferroni correction applied for multiple comparisons. Two data sets were analyzed: 1) the complete matrix with 91 non-identical sequences including outgroups (Table 2); and 2) a reduced set of 69 sequences restricted to Clade B, where a recombination signal was detected by Meerow (2010). When a signal was detected in the preliminary scan step, the guidelines to test and refine recombination hypotheses found in the RDP3 manual (Martin et al. 2010) were followed.

Results–No signal of recombination was detected for the complete data set. However, after the refinement of the preliminary hypotheses, two putative recombination events were inferred in the data set restricted to Clade B (Fig. S9) by the Sister-Scanning Method (SiScan; Gibbs et al. 2000). The first event involves Rhodophiala bifida subsp. granatiflora as the major parent, and possibly Hippeastrum cipoanum as the minor parent, although the program noted the possibility of misidentification of the second parent. This event gave rise to the sequences of Eithea blumenavia and most core-Rhodophiala. The program suggested that Zephyranthes cearensis, Famatina herbertiana, and Rhodophiala ananuca were also daughter sequences of this event, but these were rejected based on their p-values and the guidelines to refine recombination hypotheses in the RDP3 manual. However, it is still possible that these sequences are also involved in this event, which could have been masked by more recent recombination events (Martin et al. 2010). The second recombination event gave rise to Hippeastrum cipoanum; an unknown sequence, possibly R. bifida subsp. granatiflora, was identified as the major parent, and Hippeastrum striatum as the minor parent.

LITERATURE CITED

Ansell, S. W., H. Schneider, N. Pedersen, M. Grundmann, S. J. Russell, and J. C.
Vogel. 2007. Recombination diversifies chloroplast trnF pseudogenes in Arabidopsis lyrata. Journal of Evolutionary Biology 20: 2400–2411.
Applequist, W. L. and R. S. Wallace. 2001. Phylogeny of the portulacaceous cohort based on ndhF sequence data. Systematic Botany 26: 406–419.
Balakirev, E. S. and F. J. Ayala. 2003. Pseudogenes: are they “junk” or functional DNA? Annual Review of Genetics 37: 123–151.
Baldauf, S. L. and J. D. Palmer. 1990. Evolutionary transfer of the chloroplast tufA gene to the nucleus. Nature 344: 262–265.
Blazier, J. C., M. M. Guisinger, and R. K. Jansen. 2011. Recent loss of plastid-encoded ndh genes within Erodium (Geraniaceae). Plant Molecular Biology 76: 263–272.
Braukmann, T. W. A., M.