Chapter 3

Preliminary Application of matK Gene Sequences to Grass Systematics1

3.1 Introduction

The grass family (Poaceae) is the fourth largest family of flowering plants, ecologically the most dominant, and economically the most important. Disagreement regarding systematic treatments and phylogenetic trends in the family persists because of the reduction in reproductive and vegetative structures, a suspected high degree of homoplasy, and the prevalence of hybridization and polyploidy. The number of recognized subfamilies varies from 2 to 13, and the circumscription and taxonomic position of numerous tribes and subfamilies are far more problematic (Caro 1982; Hilu 1985; Hilu and Wright 1982; Watson et al. 1985; Clayton and Renvoize 1986). Our knowledge of grass systematics and evolution has advanced considerably as a result of contributions from interdisciplinary fields such as cytology, anatomy, biochemistry, and molecular biology (reviewed in Hilu and Wright 1982; Pohl 1987). The accumulating data have been analyzed phenetically and phylogenetically to reveal the systematic relationships and phylogenetic trends of the Poaceae (Hilu and Wright 1982; Watson et al. 1985; Baum 1987; Kellogg and Campbell 1987). In spite of these interdisciplinary contributions, our understanding of the systematic and evolutionary relationships among members of the grass family still suffers from a number of shortcomings. For instance, the phylogenetic position and monophyly of some subfamilies are being questioned, delimitations of some major tribes (e.g., Aveneae and Chlorideae) and smaller tribes (e.g., Meliceae and Pappophoreae) are still controversial,

1 An early version of this chapter was published in Canadian Journal of Botany, 74: 125-134, 1996.

and the taxonomic position of others (e.g., Aristideae, Thysanolaeneae and Stipeae) remains disputable.

With the advances in molecular biology, approaches such as DNA reassociation, restriction site analysis, RFLP (Restriction Fragment Length Polymorphism) analysis, RAPD (Random Amplified Polymorphic DNA), and DNA sequencing have become valuable in addressing questions and testing hypotheses in plant systematics (Conti et al. 1993; Donoghue et al. 1993; Duvall et al. 1993; Cummings et al. 1993). DNA sequencing has emerged as one of the most utilized of the molecular approaches for inferring phylogenies because nucleotide sequences can be compared directly and the sequence information is relatively easy to interpret. The sequence data are cumulative, the potential sizes of informative data sets are immense, and the data are available in public computer databases. The rbcL gene is the most widely used in plant systematics, and about 2,000 sequences are available for this gene. The origin of angiosperms, the relationship of dicotyledons and monocotyledons, and phylogenetic reconstruction at the class, order, family and genus levels have been studied using rbcL sequence data (Chase et al. 1993). In addition to rbcL, other genes such as matK, rpo4, ndhF, and the ITS region of the ribosomal genes have been used in addressing similar systematic and evolutionary questions.

3.1.1 DNA Sequencing in the Poaceae

DNA sequencing has emerged as one of the most utilized molecular approaches for inferring phylogenies, especially utilizing chloroplast genes (Clegg et al. 1994). There have been a limited number of DNA sequencing studies in the grass family. Hamby and Zimmer (1988) examined portions of the coding region of the 18S and 26S nuclear ribosomal DNA (rDNA) in nine species from the bambusoid, oryzoid, pooid and panicoid grasses. Their analysis suggested that the bambusoid species are basal in grasses and related to Oryza. The pooid group diverged after Oryza, and the panicoids were terminal in their cladogram. Doebley et al. (1990) analyzed rbcL sequences from six grass species representing the subfamilies Pooideae, Oryzoideae and Panicoideae, using Spinacia oleracea L. (Chenopodiaceae) as an outgroup. Their study showed the Pooideae as a basal clade (the Bambusoideae was not represented), with the oryzoid genus Oryza diverging next as a sister group to the terminal Panicoideae taxon. They concluded that the amount of information was too small to differentiate statistically among alternative topologies in grasses. The recent work of Barker et al. (in press), with a larger sample size and emphasis on the Arundinoideae, showed more variation in the rbcL gene and resulted in a useful phylogenetic resolution. Nevertheless, the resolution at certain nodes was still unsatisfactory.

Nadot et al. (1994) examined rps4 sequences in 28 species of the Poaceae. Using Iris (Iridaceae) as an outgroup, they found a dichotomy in the tree between the Pooideae and the remaining subfamilies. The use of a species from the relatively distant Iridaceae makes topology inference a bit difficult. In addition, the rps4 is a short gene (600 bp long) that provided only 63 informative sites for the study. Hsiao et al. (1994) examined the utility of the internal transcribed spacer region (ITS) in grass phylogeny. Based on 10 sequences from the subfamilies Pooideae, Panicoideae and Oryzoideae, they concluded that the ITS sequences are useful for inferring phylogenetic relationships among closely related species.

3.1.2 Characteristics of the matK Gene

The matK gene, previously called orfK, is approximately 1500 bp in length. It is a maturase-coding gene located within the intron of the chloroplast gene trnK (Fig. 3.1); these genes are located upstream of the psbA gene, close to the inverted repeat. A homology search indicates that the carboxyl terminus (amino acid positions 369-471) is structurally related to portions of maturase-like polypeptides and might be involved in splicing group II introns (Neuhaus and Link 1987). The open reading frame of 509 codons of this gene was first characterized in tobacco, Nicotiana tabacum L. (Sugita et al. 1985). A longer reading frame (524 codons) was later found in mustard, Sinapis alba L. (Neuhaus and Link 1987). Comparing the amino acid sequences of tobacco and mustard, Neuhaus and Link (1987) found an overall homology of 66%. Only a few small length mutations were found, none of which caused major problems in alignment of the two sequences.

[Fig. 3.1 graphic: gene map in the order rps16, trnK 5' exon, matK, trnK 3' exon, psbA, with primer positions labeled MATK8, MATK7, MG15, MATK5 and MG1]

Fig. 3.1. Relative position of the PCR amplification primers (MG1: 5' CTA CTG CAG AAC TAG TCG GAT GGA GTA GAT; MG15: 5' ATC TGG GTT GCT AAC TCA ATG) and sequencing primers (matK5: 5'CGA TCC TTT CAT GCA TT, matK7: 5'GTA TTA GGG CAT CCC ATT, and matK8: 5'CTT CGA CTT TCT TGT GCT) used in this study. Boxed areas represent coding regions.

The matK gene has evolved at a higher rate than several other genes currently used in systematic studies. Olmstead and Palmer (1994) reported that among 20 genes used in molecular systematics, the matK gene has the highest overall nucleotide substitution rate. In the Saxifragaceae, the matK gene has been found to evolve approximately three fold faster than the rbcL gene (Johnson and Soltis 1994). The sequences of the matK in the Polemoniaceae displayed an overall rate twice that of rbcL sequences (Steele and Vilgalys 1994).

3.1.3 Contrasting matK, rbcL, psbA, and rps4 genes in grasses

In order to compare the amount of nucleotide substitution of the matK gene in grasses to those of other genes, we used the sequences of Oryza sativa L. (rice) and Hordeum vulgare L. (barley). These two species were chosen because they are intensely studied and the data are accessible from computer databases. Four chloroplast genes, rbcL, psbA, rps4 and matK, were found to encompass enough data to be used for comparative studies. The sequences were aligned with the PILEUP program of the GCG Package (Devereux et al. 1984). Among these genes, matK has the highest degree of nucleotide substitution, showing 8.94% variable sites compared with 5.72% or lower for the other genes (Table 3.1).

Table 3.1. Comparison of DNA sequences of four chloroplast genes from rice and barley for nucleotide variation, length, G+C content and transition / transversion ratios (tr/tv).

Gene    Similarity (%)   Length (bp)   G+C (%)   Variable sites   Variable (%)   tr/tv
rbcL    94.28            1725          40.8      95               5.72           2.276
psbA    97.27            1062          42.0      29               2.73           3.833
rps4    96.26            588           36.9      28               3.74           1.200
matK    91.06            1560          33.9      135              8.94           1.250

To determine the patterns of variation within these four genes, each gene was divided into ten sections, and the percentages of nucleotide substitutions in each section were calculated. The percentage value, instead of absolute value, was used because the genes have different lengths. The matK gene displayed the highest level of sequence variation in all sectors, except for sector 3 where both rbcL and matK genes were equally variable (Fig. 3.2). The mean for percent nucleotide substitution for the matK was 8.65 compared with 5.47, 2.71, and 3.74 for the rbcL, psbA and rps4, respectively. To evaluate the pattern of variation within the genes, the standard deviations were calculated. The standard deviation for the matK, 3.53, was higher than the 3.01, 1.37, and 1.93 of the other three genes. It also appears that in all four genes, the variation increases towards the 5' end of the gene (Fig. 3.2b). This pattern is even more evident when the genes are divided into three sectors (Fig. 3.2a).
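The per-sector calculation described above can be sketched as follows. This is a minimal illustration, not the procedure actually run with MEGA or MacClade; the function name `sector_variation`, the gap handling, and the toy sequences are assumptions made for the example.

```python
def sector_variation(seq_a, seq_b, n_sectors):
    """Percent of differing sites in each of n_sectors slices of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    length = len(seq_a)
    percents = []
    for k in range(n_sectors):
        # Integer arithmetic keeps the sectors contiguous and nearly equal in size.
        start = k * length // n_sectors
        end = (k + 1) * length // n_sectors
        # Assumption: columns with a gap in either sequence are ignored.
        pairs = [(a, b) for a, b in zip(seq_a[start:end], seq_b[start:end])
                 if a != "-" and b != "-"]
        diffs = sum(1 for a, b in pairs if a != b)
        percents.append(100.0 * diffs / len(pairs) if pairs else 0.0)
    return percents


# Toy 20-bp alignment (made-up sequences, not the rice or barley data).
toy_a = "ATGCATGCATGCATGCATGC"
toy_b = "ATGAATGCATGCTTGCATGC"
print(sector_variation(toy_a, toy_b, 2))
```

The mean and standard deviation across sectors could then be taken with `statistics.mean` and `statistics.stdev`; whether the original values are sample or population standard deviations is not stated in the text.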

The transition-transversion ratio (tr/tv) for the matK gene sequences between O. sativa and H. vulgare was calculated to be 1.25 (Table 3.1). This ratio is lower than the expected value of 2.0 for relatively recently diverged sequences and exceeds the value of 0.4 for highly substitution-saturated sequences (Holmquist 1983). Compared with the other three genes, matK had a tr/tv value similar to that of rps4 and almost half that of rbcL (Table 3.1). It was noted that the tr/tv ratios and G+C content show a rank-order correlation (Table 3.1). It remains to be seen if this trend is universal, and if so, what underlying factors promote this correlation.
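As an illustration of how a tr/tv ratio is obtained from a pairwise alignment, the sketch below counts purine-purine and pyrimidine-pyrimidine changes as transitions and all other changes as transversions. The function name and the example sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped, which is an assumption rather than the documented behavior of the software used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_tr_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned DNA sequences."""
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites and anything that is not an unambiguous base.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        if same_class:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up example: two transitions (A/G, C/T) and two transversions (G/T, T/A).
tr, tv = count_tr_tv("AACCGGTT", "GACTGTTA")
print(tr, tv, tr / tv)
```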

The relatively high rate of substitution, reasonable size, and low tr/tv underscore the usefulness of the matK gene in systematic studies. To further address the potential application of this gene to resolving patterns of evolution in the grass family, the 3' end of the gene was sequenced from across the Poaceae, and the sequences were analyzed cladistically and phenetically.

3.2 Materials and Methods

3.2.1 DNA Extraction and Amplification

Seventeen species representing 13 tribes and 6 subfamilies were selected for this study (Table 3.2). In addition, Joinvillea (Joinvilleaceae) was chosen as an outgroup based on recent analyses that reveal a close relationship between the Joinvilleaceae and Poaceae (Campbell and Kellogg 1987; Doyle et al. 1992; Linder and Rudall 1993). The subfamily and tribal classification of the grass species followed Hilu and Wright (1982). Leaves were harvested from plants grown in the greenhouse or collected in the field and were stored at -80°C until DNA extraction.

[Fig. 3.2 graphic: nucleotide variation (%) plotted against sectors of the genes for rbcL, psbA, rps4 and matK; panel a, 3 sections; panel b, 10 sections]

Fig. 3.2. Variability within matK and other chloroplast genes between Oryza sativa L. and Hordeum vulgare L. X-axis is the sectors of the genes from 5' end to 3' end; Y-axis is the percentage of nucleotide substitution between O. sativa and H. vulgare. All four genes were divided into 3 sections (a) and 10 sections (b).

Table 3.2. Eighteen taxa and their respective tribes and subfamilies used in the sequence analysis. *Sequence available in GenBank.

Subfamily, Tribe, Species                              Abbreviation   Sources of Seeds

Pooideae
  Triticeae
    Hordeum vulgare L.*                                Hord
  Aveneae
    Avena sativa L.                                    Aven           Hilu, KH9406
  Agrostideae
    Phleum pratense L.                                 Phle           USDA, PI303130
Chloridoideae
  Cynodonteae
    Chloris distichophylla Lag.                        Chlo           Hilu, KH5538
    Dactyloctenium aegypticum (L.) Beauv.              Dact           USDA, PI271559
  Eragrosteae
    Eleusine indica (L.) Gaertn.                       Eleu           USDA, PI408801
    Eragrostis capensis (Thunb.) Trin.                 Erag           Hilu, KH5539
  Pappophoreae
    Pappophorum bicolor Fourn.                         Papp           Hilu, KH5542
  Zoysieae
    Perotis patens Grand.                              Pero           USDA, PI364995
Panicoideae
  Andropogoneae
    Zea diploperennis Iltis, Doebley & Guzman          Zea            Hugh Iltis, Univ. of Wisconsin
    Sorghum bicolor (L.) Moench                        Sorg           Hilu, KH9408
  Aristideae
    Aristida adscensionis L.                           Aris           KH5516
  Paniceae
    Digitaria sanguinalis (L.) Scop.                   Digi           Sharp Brothers Seed Co.
    Paspalum almum Chase                               Pasp           USDA, PI303958
Arundinoideae
  Arundineae
    Arundo donax L.                                    Arun           KH5546
Bambusoideae
  Bambuseae
    Phyllostachys aurea Riv.                           Phyl           KH9418
Oryzoideae
  Oryzeae
    Oryza sativa L.*                                   Oryz
Joinvilleaceae
    Joinvillea plicata (Hooker f.) Newell & Stone      Join           C. S. Christopher

Total cellular DNA was isolated from leaf material of individual plants following Hilu (1994). Since the matK gene is nested between the two exons of the trnK gene, which contain conserved sites suitable for amplification (Fig. 3.1), we designed two primers, MG1 and MG15, based on sequences in the two conserved exons of the trnK gene. These primers were used to amplify the whole trnK region, including the matK gene, from total DNA via the polymerase chain reaction (PCR). The amplified products were used directly as templates for sequencing, since matK is a single-copy gene and the cycle sequencing method was used in this study. For the PCR amplification, each reaction mixture (100 µl) contained 71.5 µl sterile water, 10 µl of 10× PCR reaction buffer (Promega), 6.0 µl of 25 mM magnesium chloride, 4 µl of 10 mM dNTP, 2 µl of each of the two primers (20 µM), 0.5 µl (2.5 units) of Taq polymerase (Promega), and 4 µl of genomic DNA template (10-100 ng). The amplification was done in a PTC 100 thermocycler. The first cycle was 96°C for 1 min 30 sec, 64°C for 1 min, and 72°C for 3 min, followed by 34 cycles of 95°C for 30 sec, 64°C for 1 min, and 72°C for 3 min, and a final extension at 72°C for 5 min.
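The reaction recipe can be checked with a few lines of arithmetic. The sketch below assumes that all listed volumes are in microliters (the component volumes sum to the stated reaction size of 100 only under that reading) and uses the stock concentrations given in the text; the variable and function names are illustrative only.

```python
# Component volumes in microliters (assumed unit), as listed for one PCR reaction.
components = {
    "sterile water": 71.5,
    "10x PCR buffer": 10.0,
    "MgCl2 (25 mM stock)": 6.0,
    "dNTP (10 mM stock)": 4.0,
    "primer MG1": 2.0,
    "primer MG15": 2.0,
    "Taq polymerase": 0.5,
    "genomic DNA template": 4.0,
}

total_volume = sum(components.values())


def final_concentration(stock, volume, total):
    """Dilution formula C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume / total


mgcl2_mM = final_concentration(25.0, components["MgCl2 (25 mM stock)"], total_volume)
dntp_mM = final_concentration(10.0, components["dNTP (10 mM stock)"], total_volume)
buffer_x = final_concentration(10.0, components["10x PCR buffer"], total_volume)

print(total_volume, mgcl2_mM, dntp_mM, buffer_x)
```

Under these assumptions the mixture works out to 1x buffer, 1.5 mM magnesium chloride and 0.4 mM dNTP.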

3.2.2 DNA Sequencing

The cycle sequencing method used in this study sequences the DNA directly by the PCR method. The cycle sequencing kit and procedure of the Bethesda Research Laboratories (BRL) were used. The three primers used in sequencing were provided by Vilgalys from his joint study with Dr. K. P. Steele on the Polemoniaceae (Steele and Vilgalys 1994). The PCR-amplified fragments were purified following the procedure designated in the kit. Labeling was carried out at 37°C for 10 min in a premixed cocktail that contained 2 µl of 0.5 µM sequencing primer, 1 µl 5X kinase buffer, 1 µl T4 polynucleotide kinase and 1 µl 33P-labeled dATP (NEN), followed by 55°C for 5 min to terminate the reaction. The sequencing reaction contained 21 µl sterile water, 5 µl of labeled primer, 4.5 µl 10X sequencing buffer, 5 µl purified DNA template (40-50 fmol) and 0.5 µl (2.5 units) of Taq polymerase. Eight microliters of this mixture were used for each ddNTP termination reaction. The cycle sequencing program included 20 cycles of 95°C for 30 sec, 52°C for 30 sec, and 70°C for 60 sec, followed by 10 cycles of 95°C for 30 sec and 70°C for 60 sec.

The sequencing reaction products were denatured by diluting 1:1 in formamide-dye solution (final formamide concentration was 50%) at 90°C for 5 min, and then 4 µl of the denatured mix were loaded on a 0.35 mm 5% polyacrylamide sequencing gel. The gels were run in 1X TBE (Sambrook et al. 1989) at 1800 V and 60 W. When the dye of the first load was approximately two-thirds of the way down the gel, a second load of 4 µl was applied, and the gel was run until the dye was near the bottom. The gels were fixed in a 10% acetic acid solution for 10 min, oven-dried, and exposed to Kodak Biomax films for 48 h to 72 h.

3.2.3 Analysis of Sequence Data

The DNA sequences were aligned with the PILEUP program of the Genetics Computer Group (GCG) software package (Devereux et al. 1984), using a VAX computer system. The gap weight and gap length weight were set to 5.0 and 0.3, respectively. The basic sequence statistics, including nucleotide frequencies, tr/tv ratio and variability in different regions of the sequences, were computed by MEGA (Kumar et al. 1993) and MacClade 3.0 (Maddison and Maddison 1992). The aligned sequences were used as the input data for PAUP 3.0 (Swofford 1990), PHYLIP 3.4 (Felsenstein 1992) and MEGA. The data set was transformed into NEXUS format using MacClade 3.0 and then analyzed by the Wagner parsimony method of PAUP. The parsimony analyses were conducted using a heuristic search with MULPARS, TBR branch swapping, and CLOSEST addition to estimate relationships and tree topology. Four different consensus trees were generated by the CONSENSUS option based on the equally parsimonious trees produced by the heuristic search. Both bootstrapping and decay analyses (Bremer 1988; Donoghue et al. 1993; Johnson and Soltis 1994) were performed in PAUP to determine relative support for various clades found in the parsimony analysis. The same data set was analyzed with PHYLIP, using the DNAPARS and CONSENSE programs to construct the most parsimonious trees and consensus trees. The DNABOOT program was then used to assess the confidence in each clade (Felsenstein 1985). The sequence data were also analyzed with the neighbor-joining (NJ) method (Saitou and Nei 1987) using the MEGA program. The Jukes-Cantor distance measure was used and the confidence probability values were calculated.
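For readers unfamiliar with the Jukes-Cantor distance, it corrects the observed proportion of differing sites p for multiple substitutions as d = -(3/4) ln(1 - 4p/3). The sketch below is a standalone illustration of that formula and not the MEGA implementation; the function name is hypothetical, and columns containing a gap are skipped, matching the statement in the Fig. 3.4 caption that gapped sites were excluded from the distance analyses.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped columns."""
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 0.75:
        # The logarithm is undefined once p reaches the saturation value of 3/4.
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten sites: the corrected distance is slightly above p = 0.1.
print(jukes_cantor_distance("AAAAAAAAAA", "AAAAAAAAAC"))
```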

3.3 Results and Discussion

3.3.1 matK Sequence Comparison

The 583 bp of matK sequenced from grasses in this study represent about 40% of the gene and cover the 3' end. The G + C content of this section ranged from 36.0% in Digitaria to 38.2% in Arundo, with an average of 36.8%. The G + C content of this part of matK appears to be higher than that for the whole gene as calculated from the barley and rice sequences (Table 3.1). This difference implies that the 5' end of the gene is more A + T rich. The G + C content is a consideration in designing primers and in estimating annealing temperatures for PCR amplification and cycle sequencing. Among the six subfamilies of Poaceae, the Arundinoideae had the highest G + C content (38.2%), followed by the Chloridoideae with an average of 37.0%. The two subfamilies are considered to be closely related (Clayton 1981; Hilu and Esen 1990). Interestingly, the G + C content in Aristida was 37.5%; this genus has been placed in the Arundinoideae or the Chloridoideae, or considered an isolated tribe allied to these two subfamilies (see Esen and Hilu 1991).
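G + C content is a simple proportion, sketched here for concreteness. The function name is hypothetical, and the choice to ignore gaps and ambiguous characters is an assumption; the primer string is the matK5 sequencing primer from Fig. 3.1, used only as convenient input.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Sequencing primer matK5 (from Fig. 3.1), written without spaces.
matk5 = "CGATCCTTTCATGCATT"
print(round(gc_percent(matk5), 1))
```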

The distribution of the number of variable sites along the matK 3' section was calculated with the MEGA program by dividing the sequences into 50-bp sectors. For the most part, an even distribution of variable sites was observed among the 18 taxa studied (Fig. 3.3). The average number of variable sites is 13.6 per 50-bp sector. The exception is the 151-200 sector, where the number of variable sites was 24 (Fig. 3.3). The region with the lowest number of variable sites (451 to 500, with 9 variable sites) is flanked by two sectors with average and above average numbers of variable sites (Fig. 3.3). Therefore, it appears that the whole 3' area is likely to provide sequence information in the Poaceae.
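The windowed count of variable sites can be sketched as below for a multiple alignment. This is an illustrative reimplementation under stated assumptions, not MEGA's routine: the function name is hypothetical, gaps and N are ignored when deciding whether a column varies, and the three short sequences are made up.

```python
def variable_sites_per_window(alignment, window=50):
    """Count variable columns in consecutive windows of a multiple alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    counts = []
    for start in range(0, length, window):
        n_variable = 0
        for col in range(start, min(start + window, length)):
            # Assumption: gaps and N do not count as character states.
            states = {seq[col] for seq in alignment} - {"-", "N"}
            if len(states) > 1:
                n_variable += 1
        counts.append(n_variable)
    return counts


# Made-up alignment of three 6-bp sequences, scanned in 3-bp windows.
toy_alignment = ["AAAAAA", "AACAAA", "AACAAT"]
print(variable_sites_per_window(toy_alignment, window=3))
```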

[Fig. 3.3 graphic: number of nucleotide and amino acid substitutions (y-axis, 0 to 30) in sectors 1 to 12 of the matK gene (x-axis); series: Nucleotide, Amino Acid]

Fig. 3.3. Variability of the matK gene in the grass family. The matK sequences refer to the 3' section that begins at primer matK5, illustrated in Figure 3.1. The low value of sector 8 results from incomplete sequences for this region.

The ratios of transition and transversion varied within and among the six subfamilies (Table 3.3). When the overall picture of tr/tv is considered, two major trends are evident. One trend was displayed by the bambusoid and oryzoid groups, where the tr/tv value between them was 3.25, the highest among grass subfamilies. This value indicates three times as much transition as transversion. The tr/tv ratios between the oryzoid-bambusoid groups and the other subfamilies also show a tendency toward higher tr/tv values, averaging 2.40 (Table 3.3). The other tr/tv trend was characteristic of the Pooideae, in which the ratios were consistently low when their sequences were compared with those of the other subfamilies. The multivariate analysis of variance revealed significant differences between the Pooideae and other grass subfamilies (Wilks' lambda = 0.813, P < 0.0001). The natural logarithm of the tr/tv ratios was also found to be significantly different (Wilks' lambda = 0.821, P < 0.0001). Univariate analysis of variance showed no significant differences among subfamilies in transition, but the differences were significant in transversion (log-transformed F = 10.7, P < 0.0001) and tr/tv ratios (log-transformed F = 14.67, P < 0.0001). Because of the small sample size, the Bambusoideae trend could not be analyzed statistically.

Table 3.3. The ratios of transition/transversion among six subfamilies and outgroup

                     1      2      3      4      5      6      7
1. Pooideae          ----
2. Chloridoideae     1.44   ----
3. Panicoideae       1.25   1.69   ----
4. Arundinoideae     1.60   1.95   2.07   ----
5. Bambusoideae      1.74   2.28   2.06   2.67   ----
6. Oryzoideae        1.60   1.94   2.07   2.00   3.25   ----
7. Joinvilleaceae    1.32   1.46   1.39   1.33   1.67   1.70   ----

The nucleotide substitutions between the Pooideae and other subfamilies required fewer transitions compared to the other subfamilies, as the tr/tv ratios averaged 1.53. Interestingly, in pair-wise comparisons between the Pooideae and other subfamilies, the tr/tv ratio was highest with the Bambusoideae, a subfamily that represents the high tr/tv trend. Similarly, low tr/tv values were observed when comparing the sequences of the outgroup Joinvillea (Joinvilleaceae) with those of the subfamilies of the Poaceae. The tr/tv values of the other subfamilies fall between these two major trends.
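The Pooideae average quoted above can be reproduced directly from Table 3.3. In the sketch below the values are copied from the table; the dictionary layout and the helper `mean_ratio` are illustrative choices, and the outgroup is excluded from the Pooideae average on the assumption that the figure of 1.53 refers to comparisons among grass subfamilies only.

```python
from statistics import mean

# Pairwise tr/tv ratios copied from Table 3.3, keyed as (row taxon, column taxon).
TR_TV = {
    ("Chloridoideae", "Pooideae"): 1.44,
    ("Panicoideae", "Pooideae"): 1.25,
    ("Panicoideae", "Chloridoideae"): 1.69,
    ("Arundinoideae", "Pooideae"): 1.60,
    ("Arundinoideae", "Chloridoideae"): 1.95,
    ("Arundinoideae", "Panicoideae"): 2.07,
    ("Bambusoideae", "Pooideae"): 1.74,
    ("Bambusoideae", "Chloridoideae"): 2.28,
    ("Bambusoideae", "Panicoideae"): 2.06,
    ("Bambusoideae", "Arundinoideae"): 2.67,
    ("Oryzoideae", "Pooideae"): 1.60,
    ("Oryzoideae", "Chloridoideae"): 1.94,
    ("Oryzoideae", "Panicoideae"): 2.07,
    ("Oryzoideae", "Arundinoideae"): 2.00,
    ("Oryzoideae", "Bambusoideae"): 3.25,
    ("Joinvilleaceae", "Pooideae"): 1.32,
    ("Joinvilleaceae", "Chloridoideae"): 1.46,
    ("Joinvilleaceae", "Panicoideae"): 1.39,
    ("Joinvilleaceae", "Arundinoideae"): 1.33,
    ("Joinvilleaceae", "Bambusoideae"): 1.67,
    ("Joinvilleaceae", "Oryzoideae"): 1.70,
}


def mean_ratio(taxon, exclude=()):
    """Mean tr/tv over all pairs involving taxon, skipping pairs with excluded taxa."""
    values = [ratio for pair, ratio in TR_TV.items()
              if taxon in pair and not set(pair) & set(exclude)]
    return mean(values)


pooid_mean = mean_ratio("Pooideae", exclude=("Joinvilleaceae",))
highest_pair = max(TR_TV, key=TR_TV.get)
print(round(pooid_mean, 2), highest_pair)
```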

Holmquist (1983) reported that the commonly observed tr/tv is 2:1. As DNA sequences are more likely to undergo transition than transversion, the tr/tv ratio is commonly above unity. This tendency exists in spite of the presence of eight possible ways for transversion compared with four possibilities for transition. For this reason, transversions are considered a more reliable type of mutation in constructing phylogenies (Quicke 1993). Consequently, some studies either gave more weight to transversions in phylogenetic analyses or based the analysis on transversions alone, resulting in what is called transversion parsimony (Lake 1987; Quicke 1993). The low frequency of transversion compared with transition implies that transversions accumulate over a longer time span, giving a lower rate of substitution compared with transitions. If this is true, then taxa that have higher numbers of transversions (lower tr/tv values) would be older, diverging early in the evolutionary history of the group. Considering this scenario, it is logical to see the low tr/tv values between the outgroup Joinvillea and the grass subfamilies (Table 3.3). The low tr/tv values between the Pooideae and the other subfamilies should also mean an early split between the Pooideae and the rest of the grasses. On the other hand, the high tr/tv values between the oryzoid and bambusoid grasses signify a recent evolutionary split between these two taxa. This latter point is not supported by the cladistic analysis.
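The count of four transition and eight transversion pathways can be verified by enumerating the twelve possible ordered base changes, as in the short sketch below. It follows that if every change were equally likely the expected tr/tv would be 0.5, which is why observed ratios above 1 indicate a bias toward transitions. The names used here are illustrative.

```python
from itertools import permutations

PURINES = {"A", "G"}


def is_transition(a, b):
    """A change is a transition when both bases are purines or both are pyrimidines."""
    return (a in PURINES) == (b in PURINES)


# All 12 ordered changes from one base to a different base.
changes = list(permutations("ACGT", 2))
transitions = [c for c in changes if is_transition(*c)]
transversions = [c for c in changes if not is_transition(*c)]

print(len(transitions), len(transversions), len(transitions) / len(transversions))
```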

The distribution of transitions and transversions among the three codon positions was examined in two cases: Oryza vs. Phyllostachys and Joinvillea vs. all 17 grass species. These two cases represent the two extremes in tr/tv ratios (averages 3.25 and 1.53; Table 3.3). The tr/tv ratios for Oryza vs. Phyllostachys were 1.3, 2.5, and 8.0, and for Joinvillea vs. the grass species were 1.3, 1.4 and 1.7, for the first, second and third codon positions, respectively. It is apparent that the third codon position, the most variable and the least influenced by functional constraints, has more transitions than the other two codon positions. This situation is well exemplified by the Oryza vs. Phyllostachys comparison, where there are more than three times as many transitions as transversions. The data thus imply that in the species we examined, transversions, being less frequent and more difficult to explain, are more common in codon positions that are more likely to produce nonsynonymous mutations. However, due to the functional constraints imposed on the first and second codon positions, stemming from the nonsynonymous mutations generated by them, phylogenetic trees based on these codon positions might not be as informative as those generated from the third codon position or from all codon positions. This prediction was supported when the sequence data were analyzed by the NJ method using one codon position at a time, and the resulting trees were contrasted with that based on all codon positions. The trees based on the first and second codon positions were less informative than the one based on the third codon position. Since the number of phylogenetically informative sites is smaller in the first and second codon positions, a tree was generated on the basis of both codon positions combined. This tree, which used 52% of the informative sites, did not allocate some genera to their respective subfamilies. Thus, it is evident that to take advantage of both transversions and the maximum number of informative sites, all three codon positions ought to be used in constructing phylogenies based on matK sequences in the Poaceae.
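Partitioning substitutions by codon position can be sketched as follows. The example assumes an in-frame alignment whose first column is a first codon position (adjustable through the hypothetical `frame_offset` parameter) and in which any gaps occur in multiples of three; neither the function nor the toy sequences come from the original analysis.

```python
def tr_tv_by_codon_position(seq_a, seq_b, frame_offset=0):
    """Return {codon position: [transitions, transversions]} for an aligned pair."""
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Skip identical sites, gaps and ambiguous bases.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        position = (i - frame_offset) % 3 + 1
        # Purine/purine or pyrimidine/pyrimidine changes are transitions.
        is_transition = (a in "AG") == (b in "AG")
        counts[position][0 if is_transition else 1] += 1
    return counts


# Made-up 9-bp example: one change at each codon position.
result = tr_tv_by_codon_position("ATGGCACCA", "GTGGTACCC")
print(result)
```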

3.3.2 Phylogenetic Analysis

Out of the 583 bp sequenced, 183 were variable among taxa. Removing strictly autapomorphic base positions resulted in 87 potentially informative sites. Thus, the variable and informative sites represent 30% and 14.9% of the sequences, respectively. These data are quite comparable to those obtained by Johnson and Soltis (1994) and Steele and Vilgalys (1994) in their matK sequence studies of 31 and 20 taxa from the Saxifragaceae and Polemoniaceae, respectively. The percentage of informative sites is higher than the 11.5% obtained from the rbcL gene sequences in grasses (Barker et al. 1995).
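The distinction between variable and potentially informative sites can be made concrete with a small sketch. It applies the usual parsimony criterion, under which a site is informative when at least two character states each occur in at least two taxa, so that changes unique to a single taxon are excluded. The function name and the four-taxon alignment are made up for illustration.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (variable, parsimony_informative) column counts for an alignment."""
    variable = 0
    informative = 0
    for column in zip(*alignment):
        # Assumption: only unambiguous bases are counted as character states.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue
        variable += 1
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return variable, informative


# Made-up alignment: column 1 is constant, columns 2 and 3 are informative,
# column 4 is variable but uninformative.
toy = ["AAGT", "AAGC", "ACTC", "ACTG"]
print(classify_sites(toy))

# Share of informative sites reported in the text: 87 of 583 positions.
print(round(100 * 87 / 583, 1))
```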

Fig. 3.4. Dendrogram produced by the neighbor-joining method with Jukes-Cantor distance. Confidence probability (CP) values are indicated on the nodes. Sites containing gaps were not included in the distance analyses.

The data set of 17 grass taxa and the outgroup was tested for phylogenetic signal by skewness analysis, using the random tree option in PAUP 3.0. The g1 value for the entire data set with 100,000 random parsimony trees is -0.618, which is similar to the value (-0.626) in the Saxifragaceae (Johnson and Soltis 1994). This value is well beyond the p = 0.001 level of significance and indicates a high probability of phylogenetic signal in the data set (Hillis and Huelsenbeck 1992).
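The g1 statistic is the standard skewness coefficient of the tree-length distribution, the third central moment divided by the second central moment raised to the power 3/2. A strongly negative value means that a few trees are much shorter than the bulk of random trees. The sketch below computes it for a list of tree lengths; the function name and the lengths are invented for illustration, and the exact moment estimator used by PAUP is assumed rather than confirmed here.

```python
def g1_skewness(tree_lengths):
    """Skewness g1 = m3 / m2**1.5 of a list of tree lengths (population moments)."""
    n = len(tree_lengths)
    if n == 0:
        raise ValueError("need at least one tree length")
    mean_length = sum(tree_lengths) / n
    m2 = sum((x - mean_length) ** 2 for x in tree_lengths) / n
    m3 = sum((x - mean_length) ** 3 for x in tree_lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical")
    return m3 / m2 ** 1.5


# Invented lengths: most trees are long and one is much shorter (left-skewed).
lengths = [300, 340, 345, 350, 350, 352]
print(g1_skewness(lengths))
```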

The four consensus trees generated by the strict, semi-strict, Adams, and 50% majority-rule methods were identical in topology, and thus only the first one will be discussed. Parsimony analyses with PAUP provided five most parsimonious trees of 302 steps and a consistency index (CI) of 0.74. The strict consensus tree of the most parsimonious trees, constructed by the CONSENSUS program, had a length of 304, a CI of 0.73, and a retention index (RI) of 0.64 (Fig. 3.5). The topology of this consensus tree was identical to that of the strict consensus tree obtained from the DNAPARS and CONSENSE programs in PHYLIP 3.4 (Felsenstein 1992). The bootstrap values ranged from 25% to 100%, but most were relatively high, indicating high levels of support.

The topology of the strict consensus tree, rooted by Joinvillea, is similar to that of the NJ tree (Figs. 3.4-3.5). The two differ in the positions of Oryza and Aristida, and in minor branching patterns at the tribal level. The bambusoid Phyllostachys appeared as a basal clade in both the strict consensus and NJ trees. The basal group for the Poaceae is one of the unsettled evolutionary questions. The Bambusoideae was traditionally considered the most primitive group of grasses based on reproductive characters (Stebbins 1956; Soderstrom and Calderon 1979). However, the presence of derived anatomical, vegetative and some reproductive characters is indicative of a certain degree of evolutionary advancement. Traditional studies and recent phylogenetic analyses disagree on whether or not the Bambusoideae is a basal lineage. Soderstrom (1981) stated that the bambusoid line, with its complex leaf and epidermis anatomy, is itself specialized and not to be regarded as the precursor of the other groups of grasses. In a cladistic study of morphological and anatomical characters, Kellogg and Watson (1993) maintained that the Bambusoideae cannot be both basal and monophyletic. Sequence data from rDNA and rbcL genes (Hamby and Zimmer 1988; Barker et al. in press) showed a basal position for the Bambusoideae. Information from the matK gene sequences is in complete congruence with the data from these nuclear and chloroplast genes.

Fig. 3.5. The strict consensus tree derived from matK sequence analysis for Poaceae. This tree was generated from the five most parsimonious trees found through a heuristic search with MULPARS, TBR branch swapping and Simple addition. Numbers above each branch indicate the number of base substitutions. Below branches, bootstrap support is indicated as percentages based on 100 bootstrap replications; the number of additional steps required to collapse each branch (decay index) is indicated in parentheses.

Oryza appeared in the strict consensus tree as a sister group to Phyllostachys (Fig. 3.5). The clade was supported by 10 unique substitutions. On the other hand, in the NJ tree, Oryza was a sister clade to the bambusoid and pooid clades (Fig. 3.4). Similar phylogenetic positions for Oryza have been found when ndhF sequences were used in a study of grass phylogeny (Clark et al. in press). The position of Oryza in the NJ tree is supported by the results of the DNA sequence studies from the chloroplast rbcL and nuclear rDNA genes (Hamby and Zimmer 1988; Doebley et al. 1990). The sequence studies of the rps4 and rbcL genes (Barker et al. in press; Nadot et al. 1994) showed Oryza within the bambusoid clade. However, Barker et al. stated that the Bambusoideae are unresolved. The numerical study of Hilu and Wright (1982) showed the Oryzoideae grouping with the Pooideae in a cluster that was distinct from the Bambusoideae. The oryzoid grasses have either been treated as a distinct subfamily (Stebbins and Crampton 1961; Hilu and Wright 1982) or been included in the Bambusoideae, implying monophyly of the two groups (Clayton and Renvoize 1986; Kellogg and Watson 1993). The cladistic study of Baum (1987) and the DNA sequence data from the matK and rDNA genes do not support a monophyletic origin for the bambusoid and oryzoid grasses. Similar findings were also reported in the prolamin and immunological studies of Hilu and Esen (1988) and Esen and Hilu (1989). The bambusoid and oryzoid taxa displayed a characteristic 15-20 kd polypeptide; the immunological similarity between the two, however, was low. This information indicates pronounced divergence between the bambusoid and oryzoid grasses.

The three members of the Pooideae diverged after the Oryza clade in the strict consensus tree, while in the NJ tree they appeared as a sister group to the bambusoid genus (Figs. 3.4-3.5). The clade is strongly supported by 93% of the bootstrap replicates, 15 nucleotide substitutions, and a relatively high decay index. Hordeum vulgare (tribe Triticeae) appeared as a sister taxon to the two members of the Aveneae (Avena sativa and Phleum pratense) in the strict consensus tree. The NJ tree did not resolve this tribal position (Fig. 3.4). The position of the pooid lineage in the Poaceae has been controversial. Based on a cpDNA restriction site study (Davis and Soreng 1993), the Pooideae was basal in the family. This phylogenetic position is not supported by this study or by the results of the rDNA, rbcL, and ndhF sequence analyses (Hamby and Zimmer 1988; Doebley et al. 1990; Barker et al. 1995; Clark et al. 1995). The low transition/transversion (tr/tv) ratios calculated from comparing the Pooideae sequences with those of the other subfamilies might indicate their divergence from a separate ancestral stock and provide further support for their distinctness as an evolutionary line in the Poaceae.
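The tr/tv comparison mentioned above can be illustrated with a minimal sketch. It assumes two already-aligned nucleotide sequences and simply counts differences; the function names and the toy fragments are hypothetical and are not taken from the matK data set, and the sketch is not meant to reproduce the software used in this study.

```python
# A transition stays within one base class (purine <-> purine or
# pyrimidine <-> pyrimidine); a transversion switches between classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_tr_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Positions with a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in BASES or b not in BASES:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def tr_tv_ratio(seq_a, seq_b):
    """Return the tr/tv ratio, or None when no transversions are observed."""
    tr, tv = count_tr_tv(seq_a, seq_b)
    return tr / tv if tv else None


# Toy aligned fragments (invented for illustration, not matK sequences).
toy_a = "ACGTACGT-A"
toy_b = "GCGTTCAT-C"
print(count_tr_tv(toy_a, toy_b), tr_tv_ratio(toy_a, toy_b))
```

A low ratio in such a pairwise comparison means that transversions make up a comparatively large share of the observed differences.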

The matK-based tree shows Arundo (Arundinoideae) as basal to the Panicoideae and Chloridoideae. The affinities among members of this clade (as well as the Centostecoideae) were first revealed in the study of Hilu and Wright (1982) and were also apparent in the cladistic study of Baum (1987). The presence of this evolutionary lineage in grasses was then established on the basis of prolamin (seed storage protein) size and immunological affinities, as well as cpDNA sequence similarities measured by DNA reassociation (Hilu and Esen 1988; Esen and Hilu 1989; Hilu and Johnson 1991). The clade later gained support from the cpDNA restriction site study of Davis and Soreng (1993) and was named the PACC clade. Further support for this clade comes from the rbcL sequence study of Barker et al. (1995).

Aristida is a member of the Aristideae, a tribe that has been placed in the Arundinoideae or the Chloridoideae, treated as a separate subfamily, or left unclassified (reviewed in Esen and Hilu 1991). The genus appeared within the Panicoideae in the strict consensus tree, but the clade was weakly supported by bootstrapping (Fig. 3.5). In the NJ tree, on the other hand, Aristida was resolved as a sister group to the arundinoid genus (Fig. 3.4). Prolamin and immunological studies (Hilu and Esen 1990; Esen and Hilu 1991) demonstrated the distinctness of the Aristideae from the Arundinoideae and Chloridoideae. In their rbcL sequence study, Barker et al. (1995) showed the Aristideae as a sister group to the Chloridoideae in the PACC clade, whereas Aristida diverged as a sister group to the Arundinoideae in the ndhF sequence study of Clark et al. (1995). Thus, accumulating evidence points to the isolated position of Aristida in the Poaceae. The question remains whether this tribe ought to be given subfamily status. Although a subfamilial rank is justifiable, such a treatment would contribute to a trend of dividing the grass family into an unmanageable number of subfamilies. An isolated tribal position will reflect the phylogenetic status of this group, at least until the trends of evolution in the Poaceae are better resolved.

The Panicoideae species formed two lineages corresponding to the major tribes Paniceae and Andropogoneae in the strict consensus tree (Fig. 3.5), while the NJ tree (Fig. 3.4) showed Sorghum and Zea separated by a short genetic distance (0.5). The two tribes appeared highly diverged, which is in agreement with the numerical study of Hilu and Wright (1982). Additional taxa need to be included from both tribes and from the Arundinelleae, a tribe regarded as a sister group to the Andropogoneae (Clayton 1981). The Chloridoideae is represented by six taxa that formed a terminal lineage. The monophyly of the Chloridoideae is supported by a 100% bootstrap value and a genetic distance of 6.85 (Figs. 3.4-3.5). The chloridoid genera formed an unresolved clade in the consensus tree. In their cladistic study of the Poaceae, Kellogg and Campbell (1987) found that the monophyly of the Chloridoideae is not strongly supported. However, the monophyly of the subfamily was demonstrated in the prolamin and immunological study of Hilu and Esen (1993) and in the rbcL and ndhF sequence studies of the grass family (Barker et al. 1995; Clark et al. 1995). Eleusine and Dactyloctenium appeared in one clade in the consensus tree and as sister groups separated by a genetic distance of 0.66 in the NJ tree. The two genera are considered morphologically allied and are linked by E. multiflora (Clayton and Renvoize 1986). Pappophorum and Eragrostis formed a clade in both analyses. Pappophorum is either placed in the tribe Pappophoreae, based on the unusual feature of lemmas with nine or more lobes and nerves, or considered a member of the Eragrostideae. The phylogenetic relationship between Pappophorum and Eragrostis is well demonstrated by the 100% bootstrap value, 7 unique nucleotide substitutions, a decay index of 4, and a genetic distance of 7.48.
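Bootstrap values such as those quoted above express the percentage of pseudoreplicate data sets in which a given clade is recovered (Felsenstein 1985). The resampling step alone can be sketched as follows, under stated assumptions: the alignment is held as a dictionary of equal-length strings, the taxon labels and sequences are invented, and the tree search that would be run on every replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: alignment columns drawn with replacement.

    `alignment` maps a taxon label to its aligned sequence; all sequences
    must have the same length. The same sampled columns are applied to
    every taxon so that each resampled site stays a true alignment column.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (not matK data).
toy = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGTAC",
    "taxon_C": "ACGAACGTTC",
}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Each replicate would then be analyzed in the same way as the original matrix, and the support for a clade is the fraction of replicate trees that contain it.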

The matK coding region, therefore, shows good potential for advancing our knowledge of grass systematics and evolution. Extending the sequences toward the 5' end of the gene and expanding the number of representative species, as proposed in this study, will provide important data that will help us establish a robust phylogeny for the Poaceae. Our sequence alignment of the whole coding region of the matK gene between barley and rice showed an even distribution of mutations along the region and, consequently, the potential of the 5' region to provide yet more informative sites.

3.4 Literature Cited

BARKER, N. P., LINDER, H. P., AND HARLEY, E. H. 1995. Polyphyly of Arundinoideae (Poaceae): evidence from rbcL sequence data. Systematic Botany 20:423-435.

BREMER, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42:795-803.

CAMPBELL, C. S., AND KELLOGG, E. A. 1987. Sister group relationships of the Poaceae. In Grass systematics and evolution. Edited by T. R. Soderstrom, K. W. Hilu, C. S. Campbell, and M. E. Barkworth. Smithsonian Institution Press, Washington, DC. pp. 217-224.

CARO, J. A. 1982. Sinopsis taxonomica de las gramineas argentinas. Dominguezia 4:1-4.

CHASE, M. W., SOLTIS, D. E., OLMSTEAD, R. G., MORGAN, D., LES, D. H., MISHLER, B. D., DUVALL, M. R., PRICE, R. A., HILLS, H. G., QIU, Y.-L., KRON, K. A., RETTIG, J. H., CONTI, E., PALMER, J. D., MANHART, J. R., SYTSMA, K. J., MICHAELS, H. J., KRESS, W. J., KAROL, K. G., CLARK, W. D., HEDREN, M., GAUT, B. S., JANSEN, R. K., KIM, K.-J., WIMPEE, C. P., SMITH, J. F., FURNIER, G. R., STRAUSS, S. H., XIANG, Q.-Y., PLUNKETT, G. M., SOLTIS, P. S., SWENSEN, S. M., WILLIAMS, S. E., GADEK, P. A., QUINN, C. J., EGUIARTE, L. E., BARRETT, S. C. H., DAYANANDAN, S., AND ALBERT, V. A. 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80:528-580.

CLARK, L. G., ZHANG, W., AND WENDEL, J. F. 1995. A phylogeny of the grass family (Poaceae) based on ndhF sequence data. Systematic Botany 20:436-460.

CLAYTON, W. D. 1981. Evolution and distribution of grasses. Annals of the Missouri Botanical Garden 68:5-14.

CLAYTON, W. D., AND RENVOIZE, S. A. 1986. Genera Graminum. HMSO publication, London.

CONTI, E., FISCHBACK, A., AND SYTSMA, K. J. 1993. Tribal relationships in Onagraceae: implications from rbcL sequence data. Annals of the Missouri Botanical Garden 80:672-685.

CUMMINGS, M. P., KING, L. M., AND KELLOGG, E. A. 1994. Slipped-strand mispairing in a plastid gene: rpoC2 in grasses (Poaceae). Molecular Biology and Evolution 11:1-8.

DAVIS, J. I., AND SORENG, R. J. 1993. Phylogenetic structure in the grass family (Poaceae) as inferred from chloroplast DNA restriction site variation. American Journal of Botany 80:1444-1454.

DEVEREUX, J., HAEBERLI, P., AND SMITHIES, O. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12(1):387-395.

DOEBLEY, J., DURBIN, M., GOLENBERG, E. M., CLEGG, M. T., AND MA, D. P. 1990. Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence among grasses (Gramineae). Evolution 44:1097-1108.

DONOGHUE, M. J., OLMSTEAD, R. G., SMITH, J. F., AND PALMER, J. D. 1993. Phylogenetic relationships of Dipsacales based on rbcL sequences. Annals of the Missouri Botanical Garden 79:333-345.

DOYLE, J. A., DONOGHUE, M. J., AND ZIMMER, E. A. 1994. Integration of morphological and ribosomal RNA data on the origin of angiosperms. Annals of the Missouri Botanical Garden 81:419-450.

DUVALL, M. R., CLEGG, M. T., CHASE, M. W., CLARK, W. D., KRESS, W. J., HILLS, H. G., EGUIARTE, L. E., SMITH, J. F., GAUT, B. S., ZIMMER, E. A., AND LEARN, G. H. 1993. Phylogenetic hypotheses for the monocotyledons constructed from rbcL sequence data. Annals of the Missouri Botanical Garden 80:607-619.

ESEN, A., AND HILU, K. W. 1989. Immunological affinities among subfamilies of the Poaceae. American Journal of Botany 76:196-203.

ESEN, A., AND HILU, K. W. 1991. Electrophoretic and immunological studies of prolamins in the Poaceae. II. Phylogenetic affinities of the Aristideae. Taxon 40:5-17.

FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

FELSENSTEIN, J. 1992. PHYLIP: Phylogeny inference package. Version 3.4. University of Washington, Seattle.

HAMBY, R. K., AND ZIMMER, E. A. 1988. Ribosomal RNA sequences for inferring phylogeny within the grass family (Poaceae). Plant Systematics and Evolution 160:29-37.

HILLIS, D. M., AND HUELSENBECK, J. P. 1992. Signal, noise, and reliability in molecular analyses. Journal of Heredity 83:189-195.

HILU, K. W. 1985. Trends of variation and systematics of Poaceae. Taxon 34:102-114.

HILU, K. W. 1994. Evidence from RAPD markers in the evolution of Echinochloa millets (Poaceae). Plant Systematics and Evolution 189:247-257.

HILU, K. W., AND ESEN, A. 1988. Prolamin size diversity in the Poaceae. Biochemical Systematics and Ecology 16:457-465.

HILU, K. W., AND ESEN, A. 1990. Prolamin and immunological similarities in subfamilies Panicoideae and Chloridoideae. In Abstracts of the Annual Meeting of the Botanical Society of America, Richmond, Va., 1990. American Journal of Botany 77:137. (Abstr. No. 354).

HILU, K. W., AND ESEN, A. 1993. Prolamin and immunological studies in the Poaceae. III. Subfamily Chloridoideae. American Journal of Botany 80:104-113.

HILU, K. W., AND JOHNSON, J. L. 1991. Chloroplast DNA reassociation and grass phylogeny. Plant Systematics and Evolution 176:21-31.

HILU, K. W., AND WRIGHT, K. 1982. Systematics of Gramineae: a cluster analysis study. Taxon 31:9-36.

HOLMQUIST, R. 1983. Transitions and transversions in evolutionary descent: an approach to understanding. Journal of Molecular Evolution 19:134-144.

HSIAO, C., CHATTERTON, N. J., ASAY, K. H., AND JENSEN, K. B. 1994. Phylogenetic relationships of 10 grass species: an assessment of phylogenetic utility of the internal transcribed spacer region in nuclear ribosomal DNA in monocots. Genome 37:112-120.

JOHNSON, L. A., AND SOLTIS, D. E. 1994. matK DNA sequences and phylogenetic reconstruction in Saxifragaceae s. str. Systematic Botany 19:143-156.

KUMAR, S., TAMURA, K., AND NEI, M. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.01. Pennsylvania State University, University Park.

LAKE, J. A. 1987. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Molecular Biology and Evolution 4:167-191.

LI, W.-H., AND GRAUR, D. 1991. Fundamentals of molecular evolution. Sinauer Associates, Inc., Sunderland, Massachusetts.

LINDER, H. P., AND RUDALL, P. J. 1993. The megagametophyte in Anarthria (Anarthriaceae, Poales) and its implications for the phylogeny of Poales. American Journal of Botany 80:1455-1464.

MADDISON, W. P., AND MADDISON, D. R. 1992. MacClade: analysis of phylogeny and character evolution. Version 3.0. Sinauer, Sunderland, Massachusetts.

NADOT, S., BAJON, R., AND LEJEUNE, B. 1994. The chloroplast gene rps4 as a tool for the study of Poaceae phylogeny. Plant Systematics and Evolution 191:27-38.

NEUHAUS, H., AND LINK, G. 1987. The chloroplast tRNALys(UUU) gene from mustard (Sinapis alba) contains a class II intron potentially coding for a maturase-related polypeptide. Current Genetics 11:251-257.

OLMSTEAD, R. G., AND PALMER, J. D. 1994. Chloroplast DNA systematics: a review of methods and data analysis. American Journal of Botany 81:1205-1224.

POHL, R. W. 1987. Man and the grasses: a history. In Grass systematics and evolution. Edited by T. R. Soderstrom, K. W. Hilu, C. S. Campbell, and M. E. Barkworth. Smithsonian Institution Press, Washington, DC.

QUICKE, D. L. J. 1993. Principles and techniques of contemporary taxonomy. Chapman & Hall, Glasgow, UK.

SAITOU, N., AND NEI, M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4:406-425.

SAMBROOK, J., FRITSCH, E. F., AND MANIATIS, T. 1989. Molecular cloning: a laboratory manual. Vol. 1-3. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.

SODERSTROM, T. R. 1981. The grass subfamily Centostecoideae. Taxon 30:614-616.

SODERSTROM, T. R., AND CALDERON, C. E. 1979. A commentary on the bamboos (Poaceae: Bambusoideae). Biotropica 11:161-172.

STEBBINS, G. L. 1956. Cytogenetics and evolution of the grass family. American Journal of Botany 43:890-905.

STEELE, K. P., AND VILGALYS, R. 1994. Phylogenetic analyses of Polemoniaceae using nucleotide sequences of the plastid gene matK. Systematic Botany 19:126-142.

SUGITA, M., SHINOZAKI, K., AND SUGIURA, M. 1985. Tobacco chloroplast tRNALys(UUU) gene contains a 2.5-kilobase-pair intron: an open reading frame and a conserved boundary sequence in the intron. Proceedings of the National Academy of Sciences, USA 82:3557-3561.

SWOFFORD, D. L. 1990. PAUP: phylogenetic analysis using parsimony. Version 3.0. Illinois Natural History Survey, Champaign, Illinois.

SWOFFORD, D. L., OLSEN, G. J., WADDELL, P. J., AND HILLIS, D. M. 1996. Phylogenetic inference. In Molecular systematics. 2nd ed. Edited by D. M. Hillis, C. Moritz, and B. K. Mable. Sinauer Associates, Sunderland, Massachusetts. Chapter 11, pp. 405-514.

WATSON, L., CLIFFORD, H. T., AND DALLWITZ, M. J. 1985. The classification of Poaceae: subfamilies and supertribes. Australian Journal of Botany 33:433-486.

WOLFE, K. H., MORDEN, C. W., AND PALMER, J. D. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proceedings of the National Academy of Sciences, USA 89:10648-10652.
