AN ABSTRACT OF THE DISSERTATION OF

John V. Syring for the degree of Doctor of Philosophy in Botany and Pathology, presented on May 5. 2006.

Title: Phylogenetics and Genomic Patterns of Speciation in Pinus withan Emphasis on Subgenus Strobus.

Abstract approved:

Redacted for Privacy Redacted for Privacy

Data from nuclear ribosomal internal transcribed spacer regions (nrITS) and chioroplast DNA (cpDNA) have failed to resolve phylogenetic relationships in Pinus. To provide greater interspecific resolution, five low-copy nuclear genes were developed from mapped anchor loci. Four genes were sequenced from species representing all Pinus subsections. Individual loci do not uniformly support the nrITS or cpDNA hypotheses. Combined analysis of low-copy nuclear loci produces a well-supported subsectional topology. The phylogenetic positionsofF. nelsonii and P. krempfui are of systematic interest. Results stronglysupport P. nelsonii as sister to sect. Parrya, and suggest a moderately well-supported position of P. krempfiias sister to the remaining sect. Quinquefoliae. The most informative locus,a Late Embryogenesis Abundant-like gene, was used to explore phylogenetic relationships among closely related species in subg. Strobus. Thirty-nine species were sequenced, with twoor more alleles representing 33 species. Nineteen of 33 species exhibited allelic nonmonophyly in the strictconsensus tree. Nucleotide diversity was strongly associated (P<0.000 1) with the degree of species monophyly.While species nonmonophyly complicates phylogenetic interpretations, this locus offersgreater topological support than cpDNA or nrITS. Lacking evidence for hybridization,recombination, or imperfect taxonomy, incomplete lineage sorting remains the bestexplanation for trans-species polymorphisms. The absence of allelic coalescence isa severe constraint in the application of phylogenetic methods in Pinus, and taxa sharing similar life history traitsmay show analogous patterns. While lack of coalescence may limit their utility in traditional phylogenetics,nuclear genes remain highly informative in describing speciation events. Pinus chiapensis isa threatened species originally described as a variety ofF. strobus. Prior morphological work suggests P. chiapensis is a distinct species, but that taxonomy is not universally accepted. Multiple accessions of three probable progenitors were sequenced at three nuclear loci. No interspecific allele sharing occurs with P. chiapensis, and its alleles are monophyletic at two loci. Results demonstrate that P. chiapensis is a distinct species. However, determination of the sister species is complicated by lack of species monophyly and interlocus variability. Pinus ayacahuite is the least likely progenitor, but the relationship of P. chiapensis to P. monticola or P. strobus is unclear. © Copyright by John V. Syring May 5, 2006 All Rights Reserved PHYLOGENETICS AND GENOMIC PATTERNS OF SPECIATION IN PINUS WITH AN EMPHASIS ON SUBGENUS STROBUS

by John V. Syring

A DISSERTATION

submitted to

Oregon State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Presented May 5, 2006 Commencement June 2006 Doctor of Philosophy dissertation of John V. Syring presented on May 5, 2006.

APPROVED:

Redacted for Privacy

Co-Major Professor, representing Botany and Plant Pathology

Redacted for Privacy

Co-Major Professor, representing Botany and Plant Pathology

Redacted for Privacy

Chair of the Departme(t of Botany and Plant Pathology

Redacted for Privacy Dean of the GrLiduat School

I understand that my dissertation will become part of the permanent collection ofOregon State University libraries. My signature below authorizes release ofmy dissertation to any reader upon request. Redacted for privacy

Author CONTRIBUTION OF AUTHORS

Drs. Aaron Liston and Richard Cronn were intensively involved in the writing, analyzing, and editing of each of the three manuscripts. Ann Willyardwas involved in collecting sequencing data and editing of the first manuscript as well as providing timely and constructivecomments on each of the additional manuscripts. Dr. Roman Businsk provided much of the Asian material used in this dissertation, and collaborated in the analysis, writing, and editing of the second manuscript. Kathleen Farrell was an undergraduate doing research inour lab, and contributed significantly to both the collection and analysis of data for the second manuscript.Dr. Rafael del Castillo collaborated on the data analysis, writing, and editing of the third manuscript.

Dr. Castillo also provided much of the Mexican material from P.chiapensis,and provided guidance and assistance for my fieldwork in southern Mexico. TABLE OF CONTENTS

Chapter 1. General Introduction .1

THE GENUS PIN US...... I THE SYSTEMATIC AND PHYLOGENETIC HISTORY OF PIN US (IN PART) AND THE PURPOSE OF THIS RESEARCH...... 2

THE COMPOSITION OF THE DISSERTATION...... 5

REFERENCES...... 7 Chapter 2. Evolutionary relationships amongPinus() subsections inferred from multiple low-copy nuclear loci...... 13 ABSTRACT...... 14 iNTRODUCTION...... 14 MATERIALS AND METHODS...... 17 Plantmaterials...... 17 Locus selection, amplification, and sequencing...... 17 Sequence alignment and data analysis...... 19 RESULTS...... 21 Locusselection...... 21

Sequence characteristics of low-copy nuclear loci inPinus...... 21 Phylogenetic analyses...... 24 DISCUSSION...... 29 Pine phylogeny and taxonomic sampling...... 29

Which loci have the greatest phylogenetic utility inPinus...... 29 Molecular data and prospects for resolving phylogenetic relationships within ...... 32 ACKNOWLEDGEMENTS...... 35 REFERENCES...... 35 Chapter 3. Widespread genealogical nonmonophyly in species of Pinussubgenus Strobus...... 53 ABSTRACT...... 54 INTRODUCTION...... 54 MATERIALS AND METHODS...... 58 Plantmaterials...... 58 Choiceof outgroup...... 58 Locus amplification, sequencing, alignment and recombination...... 59 TABLE OF CONTENTS (Continued)

MATERIALS AND METHODS (Continued) Phylogenetic and statistical analysis...... 60 Coalescent expectations of genic and genomic monophyly...... 60 RESULTS...... 61 Sequences, alignment characteristics, allelic diversity and Recombination...... 61 Phylogenetic analyses...... 63 Species monophyly and toplogical conflict arising from genealogical nonmonophyly...... 65 DISCUSSION...... 68 Phylogenetic implications...... 68 Possible factors leading to species-level nonmonophyly in Pinus...... 70 How do pine species attainmonophyly7...... 75 Broader implications for phylogenetic studies of gymnosperms and angiospenns...... 77 ACKNOWLEDGEMENTS...... 79 REFERENCES...... 80 Chapter 4: Multiple nuclear loci reveal the distinctiveness of the threatened, neotropical Pinus chiapensis (Pinaceae)...... 102

ABSTRACT...... 103

iNTRODUCTION...... 103 MATERIALS AND METHODS...... 107 Plant materials and distribution maps...... 107 Locus amplification, sequencing, alignment, and data analysis...... 108 RESULTS...... 110 Sequence characteristics of low-copy nuclear loci in Pinus...... 110

Recombination...... 113

Phylogenetic analyses...... 113

Topological constraints on the sister relationship of Pinus chiapensis...... 115 DISCUSSION...... 116 Recognition of Pinus chiapensis at the taxonomic rank of species...... 116 TABLE OF CONTENTS (Continued)

DISCUSSION (Continued)

IdentiQ'ing a progenitor of Pinus chiapensis...... 117 Genetic diversity of Pinus chiapensis and patterns of population Subdivision...... 119 ACKNOWLEDGEMENTS...... 121 REFERENCES...... 122 Chapter 5: CONCLUSIONS...... 133 GENERAL CONCLUSIONS...... 133 FUTURERESEARCH...... 135 REFERENCES...... 137 BIBLIOGRAPHY...... 140 LIST OF FIGURES

Figure

2.1 Gene diagrams for the nuclear loci included in this study...... 48

2.2 Pairwise intron alignments within and between subgenera of Pinus...... 49

2.3 Most-parsimonious trees derived from individual loci...... 50

2.4 Most-parsimonious trees derived from four low-copy nuclear loci combined (A), and cpDNA [matK+rbcL], nrITS, and four low-copy nuclear loci combined(B)...... 51

2.5 Percent variable characters by coding and non-coding regions for the six data setsexamined in this study...... 52

3.1 One of 20 most-parsimonious trees for section Quinquefoliae...... 98

3.2 One of 4495 most-parsimonious trees for section Parrya...... 100

4.1 Distribution and sample locations for all accessions at all loci...... 129

4.2 Graphic respresentation of within-taxa and between-taxa nucleotide diversity as measured by p-distances...... 130

4.3 Most parsimonious frees derived from individual loci...... 131 LIST OF TABLES

Table

2.1 Sampled Pinus species and Picea outgroup...... 42

2.2 Variability of low-copy nuclear genes in Pinus, with comparisons to nrITS andcpDNA...... 44

2.3 Pairwise incongruence length difference test (ILD) results. Picea included in theanalysis...... 45

2.4 Templeton test results of six loci and two combined data sets (one including all low-copy nuclear genes and the other a concatenation of all six loci)...... 46

2.5 Comparison of three data sets used in reconstructing Pinus phylogeny. Picea not included in statistics...... 47

3.1 Representative studies from the recent literature taken from four prominent journals (Systematic Biology, Systematic Botany, American Journal of Botany, Plant Systematics and Evolution)...... 89

3.2 Sampled Pinus subg. Strobus representatives...... 90

3.3 Interspecific nucleotide diversity(it),ranges, and monophyletic status of the species included in this study...... 94

3.4 Wilcoxon Signed Rank (WSR) tests of species monophyly...... 95

3.5 Nucleotide diversity and geographic range of strongly nonmonophyletic versus monophyletic to weakly nonmonophyletic species of pines...... 96

3.6 Population and genetic parameters for select Pinus species (seetext for details on calculations)...... 97

4.1 Sampled Pinus representatives...... 126 LIST OF TABLES (Continued)

Table

4.2 Summary of the number of sequences (N) and the number of unique alleles (A) acquired per species across all three loci...... 127

4.3 Wilcoxon Signed Rank (WSR) results from the topological constraint of Monophyly enforced on either individual species or species pairs...... 128 I dedicate this work to my wife,Lou,our future together, and to our extended community of friends and family. Phylogenetics and Genomic Patterns of Speciation in Pinus withan Emphasis on Subgenus Stro bus

CHAPTER 1

General Introduction

THE GENUS PIN US With over 100 extant species, Pinus represents the most widespreadgenus of woody in the Northern Hemisphere (Price et al., 1998; Critchfield and Little, 1966). Pinus is the largest of the eleven genera of the Pinaceae. It contains almost half of the 225 species composing the Pinaceae (Farjon, 1998), a family that is monophyleticamong gymnosperms (Hart, 1987; Chase et al., 1993; Chaw et al., 1997; Gugerli etal., 2001). Within the Pinaceae, Pinus displaysa number of unique morphological features, such as needle-like leaves clustered in fascicles(shoot dimorphism) of 2-5 (rarely 1 or 6-8) and woody ovulatecone scales with specialized apical regions that represent the areas left exposed in the firstyear and mature cone (umbos). These features, in combination with molecular evidence (Wang et al., 2000a; ListonCt aL, 1999, 2003), indicate that members of the genus Pinus form a distinct monophyletic lineage thatis strongly differentiated from related genera such as Cathaya and Picea. Pinus is a diverse and relatively ancient genus with origins that date to the late Cretaceous (145125 million years ago; Alvin, 1960). Paleontologica! and molecular data suggest that the first major divergence event separating extant lineages occurred perhaps8545 million years ago (Miller, 1973; Meijer, 2000; Magallon and Sanderson,2002; Willyard et al., in review), giving rise to two distinct lineages recognized todayas subg. Pinus and subg. Strobus, the "hard" and "soft" pines, respectively (Liston et al., 1999; Gernandtet al., 2005; Syring etal. 2005). While there is extensive morphological variation in thegenus, most character states exhibit homoplasy across subgenera and sections (Malusa, 1992; Gernandtet al., 2005). The number of fibrovascular bundles per needle is the only diagnostic characterthat is non- homoplastic for the two subgenera (one for subg. Stro bus, two for subg.Pinus). Recent classifications recognize 7172 species of subg. Pinus, and 36-40 species of subg. Strobus (Farjon, 2005; Gernandt et al., 2005). Specific treatments for regions withhigh endemism, e.g., 2

Mexico (Peny, 1991) and East Asia (Businsk, 1999, 2004), distinguishnarrower taxonomic limits and thus recognize several additional species. Among coniferous genera, the importance of Pinus is difficult to overestimate. Species from this genus account for a large percentage of the world's annual fiber productionas well as providing many secondary products of economic importance (e.g., edible , resins, turpentine, ornamentals; Le Maitre, 1998). Ecologically, species of Pinus function ina myriad of roles (Richardson and Rundel, 1998), ranging from early-successional species that colonize disturbed habitats (P. contorta, P. banksiana, P. slyvestris, P. muricata; Mirov, 1967), toa critical food source for corvid birds and many species of mammals (subsect. Cembroides and members of subsect. Quinquefolius; Tomback and Linhart 1990, Lanner 1998, Pease and Mattson 1999), to an important fuel source in fire-prone and fire-dependent ecosystems (P. coulteri, P. jeffreyi; P. sabiniana; P. banksiana, P. pal ustris, Agee, 1998; Schwilk and Ackerly, 2001).

THE SYSTEMATIC AND PHYLOGENETIC HISTORY OF FINUS (IN PART) AND THE PURPOSE OF THIS RESEARCH The extensive utility (Le Maitre, 1998) and ecological significance (Richardson and Rundel, 1998) of Pinus has made it the focus of numerous systematic (reviewed in Mirov, 1967; Little and Critchfield, 1969) and molecular-evolutionary studies (reviewed in Priceet al., 1998; Liston et al., 2003). The history of pine systematics includesan extensive body of research and has been well documented in other references (see Mirov, 1967; Little and Critchfield,1969). The earliest influential work of the 20th century was by Shaw (1914), whichwas the first monograph to consider the classification of pines in a framework of character evolution (Mirov, 1967). Throughout the 20th century there have been many modifications of Shaw's (1914) classification as well as unique circumscriptions (reviewed in Mirov, 1967). Significant among these is the widely cited classification of Little and Crichfield (1969). Prior to theonset of the molecular revolution, pine taxonomy was based on morphological and chemical analyses(reviewed in Mirov, 1967; Price et al., 1998), or on studies of intrageneric hybridization(Critchfield, 1986). In the age of molecular phylogenetics, the interest in Pinus continuesto lure researchers, and ca. 30 molecular phylogenetic studies have been publishedon pine phylogenetics in the past two decades. Among these studies, a variety of methods have been used to infer relationships, including allozymes (Conkle et al., 1988; Millar et al., 1988; Karalamangala,1989; Parker et al., 1997), restriction fragment length polymorphisms of DNA (Strauss and Doerksen,1990; Perez de la Rosa et al, 1995; Krupkin et al., 1996),anonymous DNA markers (Dvorak et al., 2000; 3

Nkongolo et al., 2002), direct gene sequencing (Liston Ct al., 1999; Wang et al., 1999; Zhang and Li, 2004; Gernandt et al., 2005; Ma et al., 2006), and various combinations of the above (Wu et at., 1999). The most inclusive studies to date have utilized chioroplast DNA (Wang and Schimdt, 1994; Krupkin et at., 1996; Wang et al., 1999; Lopez et al., 2002; Gernandt et al., 2003, 2005; Eckert and Hall, in press); when combined, they provide an evolutionary hypothesis for 101 of the approximately 110 species in the genus. Phylogenetic analyses using nuclear ribosomal DNA (primarily the ITS region) have also been completed for approximately 45% of thegenus (Govindaraju et al., 1992; Liston et al., 1999, 2003; Gernandt et al., 2001).

Given the wealth of information available forPinus,it is perhaps surprising that many evolutionary relationships remain to be resolved. Direct comparisons of the most complete chioroplast (Gernandt et at., 2005) and nuclear ribosomal ITS (Liston et at., 1999, 2003) data sets for pines provide an example of the limits to our current understanding. Phylogenetic estimates derived from cpDNA include 3060 bp of matK and 1460 bp of rbcL sequence representing 101 species (Gernandt et al., 2005), while estimates from nrDNA-ITS are derived from 650 bp

spanning ITS2 (complete) and 233 bp ofITSi from 47 species inPinus(Liston et al., 1999, 2003). In a direct comparison of these data sources, 12 of 17 commonly-resolved clades (subsections, approximately) appear congruent, although it is important to note that nearly all clades have low bootstrap support. Major topological incongruence between these phylogenetic estimates occurs in three cases: the position of subsect. Contortae, the position of P. krempfii (subsect. KrempJIanae in full), and the monophyly or paraphyly of subsect. Pinaster. However, even in regions of topological agreement, both sources of data are marked by a nearly ubiquitous lack of resolution within the terminal clades. Resolving relationships among recently diverged taxa isa pervasive problem in molecular systematics, and it may be the most daunting challenge ofan infrageneric phylogeny of Pinus since studies based on ITS and cpDNA return numerous polytomies at this level of divergence. For example, a combined analysis of four cpDNA loci (>3500 bp) yieldeda 6-species polytomy in a lO-taxa sample of sect. Strobus (Wang et al., 1999), anda combined analysis of four cpDNA loci (>3200 bp) yielded a 7-species polytomy for the 7 sampled membersof subsects. Australes, Le!ophyllae, and Oocarpae (Lopez Ct al., 2002). Similarly,over 2400 bp of nrDNA sequence fail to resolve relationships among subsects. Cembroides, Nelsoniae,and Balfourianae, and produce a 13-species polytomy for the 15 species in subsect. Cembroides (Gernandt et al., 2001). The lack of resolution in cpDNA-based studies resultsfrom insufficient 4

sequence variation (Gernandt Ct al., 2005), while excessive intragenomic polymorphism in ITS (apparently resulting from a lack of ITS coalescence within species; Gernandtet al., 2001) places a limit on the utility of nrDNA. Cumulative evidence clearly shows that cpDNA and nrDNAare unlikely to provide sufficient character support fora robust phylogeny of Pinus. Even if either data source were to provide satisfactory resolution, conflict between these two sources of data will impose additional uncertainty that is not easily resolved. This

observation of cpDNA / nrDNA incongruence is a widely cited phenomenon (reviewedin Wendel and Doyle 1998), and the underlying basis for this incongruenceranges from biological (e.g., unsuspected reticulation events, lineage sorting) to methodological (e.g.,inappropriate match between mode of sequence evolution and analytical approach). In addition, resolutionof lower-level taxonomic relationships based on cpDNA and nrDNAcan be limited (Clegg et al., 1994; Small et al., 2004) due to the conservative nature of cpDNA and the frequentpositional homoplasy in nrDNA (Alvarez and Wendel, 2003). If molecular phylogenetic approaches are to deliveron the promise of a limitless supply of characters (Avise, 1994), reasonable strategies need to be adoptedto exploit the untapped evolutionary signal residing in low-copy nuclear genes. Recent studies using low-copynuclear loci (Harris and Disotell, 1998; Springer et al., 2001; Cronnet al., 2002; Malcomber, 2002; Cronn et al., 2003; Rokas et al., 2003; Alvarez et al., 2005; Ma et al., 2006) reveal thepower of using multiple low-copy nuclear loci for disentangling phylogenetic relationshipsat low taxonomic levels. These studies show that multiple independent markerscan sample a range of substitution rates and patterns (e.g., Gaut and Doebley, 1997; Cronn et al., 1999), and that they havethe potential to provide greater resolution than cpDNA and nrDNA (aloneor combined) for identifying temporally-compressed radiation events, relationshipsamong closely-related species complexes, and historical hybridization events (Harris and Disotell, 1998;Springer et al., 2001; Cronn et al., 2002; Malcomber, 2002; Cronn et al., 2003; Smallet al., 2004). Perhaps most significantly, these studies show that low-copy nuclear locican converge upon strongly-supported phylogenetic conclusions that statistically reject hypotheses derived from eitherorganellar sequences (Springer et al., 2001; Cronn et al., 2002; Cronn et al., 2003),orfrom bothorganellar DNA and nrDNA (Cronn et al., 2003).

The large nuclear genome of pines (22 37 pg/2C; Grotkoppet al., 2004) presents a challenge in the development and application of low-copy nuclear markersfor phylogenetic applications, as multigene families (Kinlaw and Neale, 1997) andrepetitive and retrotransposon DNA (Friesen Ct al., 2001) are abundant. Despite these obstacles, genomics andgenetic linkage

mapping efforts directed towards P.taeda(loblolly pine; Brown et al., 2001) andP. radiata (Monterey pine, Devey et al., 2004) have revealeda large number of low-copy markers that show orthology across pine species (Brown et at., 2001). Because these conifer anchorloci show strong evidence for positional orthology, they make promising candidates for populationand phylogenetic applications in pines and other members of Pinaceae (Brownet al., 2004). The purpose of this dissertation is to determine the utility of low-copynuclear loci in resolving relationships among species of pines, and to initiatea multiple-gene, low-copy nuclear phylogeny of the genusPinus.A complete, multiple-gene, low-copy nuclear phylogenywas beyond the scope of this research, but it is a goal that could be accomplished in thenear future, in part, as a result of the work contained in this dissertation. In addition, thepurpose of my dissertation was to characterize the behavior of nuclear loci ina phylogenetic context, and to document how low-copy nuclear genes track speciation events. Iexpect this research to have a sizeable impact on a diversity of fields. First, the results of this body of workwill either resolve or will begin to resolve questions regarding the taxonomic affinities formany of the species in this notable genus. The results will further facilitate studies of pine morphologicalevolution and biogeography. A resolution of relationships among theCa. 110 species will provide a solid foundation for ecological, evolutionary, and molecular genetic studies ofPinus,and be valuable to researchers studying the many organisms associated with pines, eitheras ecological associates, or as obligatory commensals and parasites. In addition, data from this study will provide insights into the temporal and spatial radiation of extant members of the pinegenus. The low-copy nuclear genes developed for phylogenetic analysis of the genetically diverse genusPinuswill have application in phylogenetic and population-genetic investigations throughoutthe Pinaceae. Finally, this study will have broad significance to the systematicscommunity, as it provides yet another successful example of exploiting the wealth of genetic markersemerging from eDNA and genomic libraries, and comparative genome mapping studies.

THE COMPOSITION OF THE DISSERTATION

This dissertation includes three manuscripts that outline the hierarchicalmethodology that was employed to explore the use of low-copy nucleargenes to provide greater resolution of phylogenetic relationships Pinus. The first manuscript (Chapter 2)outlines the procedures used in the screening of low-copy nuclear loci that have been mapped invarious pedigrees from P.taeda, P. radiata and other important pine species (Syring et al., 2005; Brown et al., 2001; Devey etal., 2004). Four of these 'conifer anchor loci' are used to address phylogenetic relationshipsamong exemplar species selected from the ca. twelve major subsections of Pinus. Resu!ts from this study indicate that individual loci do not uniformly support either the nrITSor cpDNA hypotheses and in some cases produce unique topo!ogies. Combined analysis of low-copy nuc!ear loci producesa well-supported subsectional topology of two subgenera, each divided into two sections, congruent with prior hypotheses of deep divergence in Pinus. The placements of P. nelsonii, P. krempfii, and P. contorta have been of continued systematic interest. Results presented in this manuscript strongly support the placement of P. nelsonii as sister to the remaining members of sect. Parrya, suggest a moderately well-supported and consistent position of P. krempJIi as sister to the remaining members of sect. Quinquefoliae, and are ambiguous about the placement of P. contorta. Chapter 2 addresses the challenging issue of combining molecular data across loci to achieve finer phylogenetic resolution. This approach has not been without controversy, and the intricacies involved in assessing character congruence across data setsare outlined in this chapter. The locus screening and evaluation strategy presented herecan be broadly applied in other taxa to facilitate the development of phylogenetic markers from the increasing number of available genomic resources. The second manuscript (Chapter 3) explores the phylogenetic potential of themost informative of the developed loci to resolve relationshipsamong closely related species in subgenus Strobus, the soft pines. To provide greater resolutionamong these taxa, a 900-bp intron from a Late Embryogenesis Abundant (LEA)-likegene (1FG86 12) was sequenced from 39 pine species, with two or more alleles representing 33 species. In phylogenetic studies thatuse a single individual to represent a species, an implicit assumption is made that thespecies is monophyletic (Shaw and Small, 2005; Funk and Omland, 2003). Ina test of this assumption, 19 of 33 species exhibited allelic nonmonphyly in the strict consensus tree, and 10 deviated significantlyfrom allelic monophyly based on topology incongruence tests. Intraspecific nucleotidediversity ranged from 0.00.0211, and analysis of variance shows that nucleotide diversitywas strongly associated (P<0.000 1) with the degree of species monophyly. While speciesnonmonophyly complicates phylogenetic interpretations, this nuclear locus offersgreater topological support than previously observed for cpDNA or nrITS. Lacking evidence for hybridization, recombination, or imperfect taxonomy, we feel that incomplete lineage sortingremains the best explanation for the polymorphisms shared among species. The absence of alleliccoalescence was 7

determined to be a severe constraint in the application of phylogenetic methods in Finus, and I suggest that taxa sharing similar life history traits with Pinus are likely to show species nonmonophyly using nuclear markers. The final manuscript (Chapter 4) focuses narrowly on a specific question in pine systematics, the taxonomic status of the threatened, neotropical Pinus chiapensis. This species was first described as a disjunct variety of the widespread P. strobus from the eastern United States and southeastern Canada. Prior morphological work (Andresen, 1966) has suggested that P. chiapensis is a distinct species, but not all authors and pine systematists have accepted that taxonomy. In this study we sequenced multiple accessions from across the range of each of the three most probable progenitors of P. chiapensis at three nuclear loci. Our results showed that P. chiapensis had the lowest combined nucleotide diversity of any of the four species, and had only a single allele across its entire range at one of the loci. Pinus chiapensis does not share alleles with any of the possible progenitors we tested and the alleles for this pineare monophyletic at two of the three loci. While the results are conclusive that P. chiapensis is at leastas distinct as the remaining three widely accepted species, determination of the most recentcommon ancestor was complicated by lack of species monophyly and interlocus variability. Lacking knowledge of the sister species, any phylogeographic story is premature. Rangewide population sampling of P. chiapensis failed to uncover any consistent population structureas has been supported in other studies. Data from this chapter suggest that while allelic nonmonophylymay detract from the utility of low-copy nuclear genes in phylogenetics, that these lociare highly informative and contain information on patterns of speciation among closely related taxa. My dissertation finishes (Chapter 5) with conclusions covering each of the manuscripts, and outlines the future role of my findings in pine phylogenetics.

REFERENCES

Alvarez, I. and J. F. Wendel. 2003. Ribosomal ITSsequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29:417-34.

Alvarez, I., R. C. Cronn, and J. F. Wendel. 2005. Phylogenyof the New World diploid cottons (Gossypium L., Malvaceae) based on sequences of three low-copy nucleargenes. Plant Systematics and Evolution 252:199-214.

Alvin, K. 1960. Further of the Pinaceae from the Wealden Formation of Belgium. Memoirs of the Royal Institute of Natural Sciences, Belgium 146:1-39. Andresen, J. W. 1966. A multivariate analysis of the Pinus chiapensismonticolastrobus phylad. Rhodora 68:1-24.

Avise, J. C. 1994. Molecular markers, natural history and evolution. Chapman and Hall, New York.

Brown, 0. R., E. E. Kadel Ill, D. L. Bassoni, K. L. Kiehne, B. Temesgen, J. P.van Buijtenen, M. M. Sewell, K. A. Marshall, and D. B. Neale. 2001. Anchored reference loci in loblolly pine (Pinus taeda L.) for integrating pine genomics. Genetics 159:799-809.

Brown, G. R., G. P. Gill, R. J. Kuntz, C. fl. Langley, and D. B. Neale. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences 10 1:15255-15260.

Businsk', R. 1999. Taxonomic revision of Eurasian pines (genus Pinus L.)Survey of species and infraspecific taxa according to latest knowledge. Acta Pruhoniciana 68: 7-86.

Businsk', R. 2004. A revision of the Asian Pinus subsection Strobus (Pinaceae). Willdenowia 34: 209-257.

Chase, M. W., ET AL. 1993. Phylogenetics of plants: An analysis of nucleotidesequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80: 528-548+550-580.

Chaw, S. M., A. Zharkikh, H. M. Sung, T. C. Lau, and W. H. Li. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear 18S rRNAsequences. Molecular Biology and Evolution 14:56-68.

Clegg, M. T., B. S. Gaut, G. H. Learn Jr., and B. R. Morton. 1994. Rates andpatterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences 91:6795-6801.

Critchfield, W. B. 1986. Hybridization and classification of the white pines (Pinus section Strobus). Taxon 35:647-656.

Critchfield, W. B., and E. L. Little, Jr. 1966. Geographic distribution of the pines of theworld. USDA Forest Service Miscellaneous Publication 991. Washington, DC.

Cronn, R. C., R. L. Small, and J. F. Wendel. 1999. Duplicatedgenes evolve independently after polyploid formation in cotton. Proceedings of the National Academy of Sciences,USA 96: 14406-14411.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2002. Rapid diversificationof the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chioroplast genes. American Journal of Botany 89:707-725.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2003. Crypticrepeated genomic recombination during speciation in Gossypium gossypioides. Evolution57:2475-2489.

Devey, M. E., S. D. Carson, M. F. Nolan, A. C. Matheson, C. Te Riini, and J.Hohepa. 2004. QTL associations for density and diameter in Pinus radiata and the potential formarker-aided selection. Theoretical Applied Genetics 108:51 6-524.

Dvorak, W. S., A. P. Jordan, G. P. Hodge, and J. L.Romero. 2000. Assessing evolutionary relationships of pines in the Oocarpae and Australes subsections using RAPD markers. New Forests 20:163-192.

Eckert, A. J., and B. D. Hall. Phylogeny, historical biogeography, and patterns of diversification for Pinus (Pinaceae): Phylogenetic tests of fossil-based hypotheses. Molecular Phylogentics and Evolution, in press.

Farjon, A. 1998. World Checklist and Bibliography of Conifers. Royal Botanical Gardens at Kew, Richmond, U.K.

Farjon, A. 2005. Drawings and descriptions of the genus Pinus,2ded. E. J. Brill and W. Backhuys, Leiden, Netherlands.

Friesen, N., A. Brandes, and J. S. Heslop-Harrison. 2001. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Molecular Biology and Evolution 18:1176- 1188.

Funk, D. J., and K. E. Omland. 2003. Species-level paraphyly and polyphyly: Frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annual Review of Ecology, Evolution, and Systematics 34:397-423.

Gaut, B. S., and J. F. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proceedings of the National Academy of Sciences, USA 94:6809-6814.

Geada Lopez, G., K. Kamiya, and K. Harada. 2002. Phylogenetic relationships of diploxylon pines (subgenus Pinus) based on plastid sequence data. International Journal of Plant Science 163:737-747.

Gernandt, D. S., A. Liston, and D. Piflero. 2001. Variation in the nrDNA ITS of Pinus subsection Cembroides: implications for molecular systematic studies of pine species complexes. Molecular Phylogenetics and Evolution 21:449-467.

Gernandt, D. S., A. Liston, and D. Piñero. 2003. Phylogenetics of Pinus subsections Cembroides and Nelsoniae inferred from cpDNA sequences. Systematic Botany 4:657-673.

Gernandt, D. S., G. Geada Lopez,s.0. Garcia, and A. Liston. 2005. Phylogeny and classification of Pinus. Taxon 54:29-42.

Govindaraju, D., P. Lewis, and C. Cullis. 1992. Phylogenetic analysis of pines using ribosomal DNA restriction fragment polymorphisms. Plant Systematics and Evolution 179: 141-153.

Grotkopp, E., M. Rejmanek, M. J. Sanderson, and T. L. Rost. 2004. Evolution ofgenome size in pines (Pinus) and its life-history correlates: Supertree analyses. Evolution 58:1705-1729.

Gugerli, F., J. Senn, M. Anzidei, A. Madaghiele, U. Buchier, C. Sperisen, and G. G. Vendramin. 2001. Chloroplast microsatellites and mitochondrial nadi intron 2 sequences indicate congruent 10

phylogenetic relationships among Swiss stone pine (Pinus cembra), Siberian stone pine (Pinus sibirica), and Siberian dwarf pine (Pinuspumila). Molecular Ecology 10:1489-1497.

Harris, E. E. and T. R. Disotell. 1998. Nuclear gene trees and the phylogenetic relationships of the mangabeys (Primates: Papionini). Molecular Biology and Evolution 15:892-900.

Hart, J. A. 1987. A cladistic analysis of conifers: Preliminary results. Journal of the Arnold Arboretum 68:269-307.

Karalamangala, R. R. and D. L. Nickrent. 1989. An electrophoretic study of representatives of subgenus Diploxylon of Pinus. Canadian Journal of Botany 67:1750-1759.

Kinlaw, C. S. and D. B. Neale. 1997. Complex gene families in pine genomes. Trends in Plant Science 2:356-359.

Krupkin, A. B., A. Liston, and S. H. Strauss. 1996. Phylogenetic analysis of the hard pines (Pinus subgenus Pinus, Pinaceae) from chloroplast DNA restriction site analysis. American Journal of Botany 83:489-498.

Lanner, R. M. 1998. Seed dispersal in Pinus. In D. M. Richardson [ed., Ecology and biogeography of Pinus, 281-295. Cambridge University Press, New York.

Le Maitre, D. C. 1998. Pines in cultivation: A global view. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 407-431. Cambridge University Press, New York, New York, USA.

Liston, A., W. A. Robinson, D. Piflero, and E. R. Alvarez-Buylla. 1999. Phylogenetics ofPinus (Pinaceae) based on nuclear ribosomal DNA internal transcribedspacer region sequences. Molecular Phylogenetics and Evolution 11:95-109.

Liston, A., D. S. Gernandt, T. F. Vining, C. S. Campbell, and D. Piflero. 2003. Molecular phylogeny of Pinaceae and Pinus. Acta Horticulturae 615:107-114.

Little, E. L., Jr., and W. B. Critchfield. 1969. Subdivisions of thegenus Pinus. USDA Forest Service Miscellaneous Publication 1144. Washington, DC.

Lopez, G. G., K. Kamiya, and K. Harada. 2002. Phylogenetic relationships ofdiploxylonpines (subgenusPinus)based on plastid sequence data. International Journal of Plant Science 163:737-747.

Ma, X.-F., A. E. Szmidt, and X.-R. Wang. 2006. Genetic structure and evolutionary historyof a diploid hybrid pine Pinus densata inferred from the nucleotide variationat seven gene loci. Molecular Biology and Evolution 23:807-8 16.

Magallon, S., and M. J. Sanderson. 2002. Relationshipsamong seed plants inferred from highly conserved genes: Sorting conflicting phylogenetic signalsamong ancient lineages. American Journal of Botany 89:1991-2006.

Malcomber, S. T. 2002. Phylogeny of Gaertnera Lam. (Rubiaceae) basedon multiple DNA 11

markers: Evidence of rapid radiation in a widespread, morphologically diverse genus. Evolution 56:42-57.

Malusa, J. 1992. Phylogeny and biogeography of the pinyon pines (Pinus subsect. Cembroides). Systematic Botany 17:42-66.

Meijer, J. 2000. Fossil woods from the late Cretaceous Aachen Formation. Review of Palaeobotany and Palynology 112: 297-336.

Millar, C. I., S. H. Strauss, M. T. Conkle, and R. D. Westfall. 1988. Allozyme differentiation and biosystematics of the Californian closed-cone pines (Pinus subsect. Oocarpae). Systematic Botany 13:351-370.

Miller, C. 1973. Silicified cones and vegetative remains of Pinus from the Eocene of British Columbia. Contributions from the University of Michigan Museum of Paleontology 24:10 1- 118.

Mirov, N. T. 1967. The Genus Pinus. The Ronald Press Company, New York, New York, USA.

Nkongolo, K. K., P. Michael, and W. S. Gratton. 2002. Identification and characterization of RAPD markers inferring genetic relationships among pine species. Genome 45:51-58.

Parker, K. C., J. L. Hamrick, A. J. Parker, and E. A. Stacy. 1997. Allozyme diversity in Pinus virginiana (Pinaceae): Intraspecific and interspecific comparisons. American Journal of Botany 84:1372-1382.

Pease, C. M. and D. J. Mattson. 1999. Demography of the yellowstone grizzly bears. Ecology 80:957-975.

Perez de la Rosa, J., S. A. Harris, and A. Farjon. 1995. Noncoding chloroplast DNA variation in Mexican pines. Theoretical and Applied Genetics 91:1101-1106.

Perry, J. P. 1991. The Pines of Mexico and Central America. Timber Press, Portland, OR.

Price, R. A., A. Liston, and S. Fl. Strauss. 1998. Phylogeny and systematics of Pinus. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 49-68. Cambridge University Press, New York, New York, USA.

Richardson, D. M., and P. W. Rundel. 1998. Ecology and biogeography of Pinus: An introduction. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 3-46. Cambridge University Press, New York, New York, USA.

Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.

Shaw, G. R. 1914. The genus Pinus. Publications of the Arnold Arboretum No. 5, Riverside Press, Cambridge, MA.

Shaw, J., and R. L. Small. 2005. Chloroplast DNA phylogeny and phylogeography of the North 12

American plums (Prunus subgenus Prunus section Prunocerasus, Rosaceae). American Journal of Botany 92:2011-2030.

Small, R. L., R. C. Cronn, and J. F. Wendel. 2004. Use of nuclear genes for phylogeny reconstruction in plants. Australian Systematic Botany 17:145-170.

Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, 0. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Molecular Biology and Evolution 18:1 32-43.

Strauss, S. H. and A. H. Doerksen. 1990. Restriction fragment analysis of pine phylogeny. Evolution 44:1081-1096.

Syring, J., A. Willyard, R. Cronn, and A. Liston. 2005. Evolutionary relationshipsamong Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92:2086-2 100.

Tomback, D. F., and Y. B. Linhart. 1990. The evolution of bird-dispersed pines. Evolutionary Ecology 4:185-219.

Wang, X.-R., and A.E. Szmidt. 1994. Hybridization and chloroplast DNA variation ina Pinus species complex from Asia. Evolution 48:1020-1031.

Wang, X. R., Y. Tsumura, H. Yoshimaru, K. Nagasaka, and A. E. Szmidt. 1999. Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) based on chloroplast rbcL, matK, rpl2O-rpsl8 spacer, and tm V intron sequences. American Journal of Botany 86:1742-1753.

Wang, X. Q., D. C. Tank, and T. Sang. 2000a. Phylogeny and divergence times in Pinaceae: Evidence from three genomes. Molecular Biology and Evolution 17:773-81.

Wendel, J. F. and J. J. Doyle 1998. Phylogenetic incongruence: window intogenome history and molecular evolution. In D. Soltis, P. Soltis, and J. Doyle [eds.], Molecular Systematicsof Plants II, 265-296. Chapman and Hall, New York, New York, USA.

Willyard, A., J. Syring, D. Gernandt, A. Liston, and R. Cronn. Unpublished manuscript. Molecular evolutionary rates indicate a recent and rapid diversification for modern pine lineages.

Wu, J., K. V. Krutovskii, and S. H. Strauss. 1999. Nuclear DNA diversity,population differentiation, and phylgenetic relationships in the California closed-cone pines basedon RAPD and allozyme markers. Genome 42:893-908.

Zhang, Z. Y. and D. Z. Li. 2004. Molecular phylogeny of section Parryaof Pinus (Pinaceae) based on chioroplast matK gene sequence data. Acta Botanica Sinica46:17 1-179. 13

CHAPTER 2

Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low- copy nuclear loci

John Syring', Ann Willyard', Richard Cronn2, and Aaron Liston'

'Department of Botany and Plant Pathology, Oregon State University,Corvallis, Oregon 97331 USA

2Pacflc Northwest Research Station, USDA Forest Service, 3200SW Jefferson Way, Corvallis, Oregon 97331 USA

American JournalofBotany 92(12): 2086-2100 14

ABSTRACT Sequence data from nrITS and cpDNA have failed to fully resolve phylogenetic relationships among Pinus species. Four low-copy nuclear genes, developed from the screening of 73 mapped conifer anchor loci, were sequenced from 12 species representing all subsections. Individual loci do not uniformly support either the nrITS or cpDNA hypotheses and insome cases produce unique topologies. Combined analysis of low-copy nuclear loci producesa well-supported subsectional topology of two subgenera, each divided into two sections, congruent with prior hypotheses of deep divergence in Pinus. The placements of P. nelsonii, P. krempfii, and P. contorta have been of continued systematic interest. Results strongly support the placement ofF. nelsonii as sister to the remaining members of sect. Parrya, suggesta moderately well-supported and consistent position of P. krempfii as sister to the remaining members of sect. Quinquefoliae, and are ambiguous about the placement ofF. contorta. A successful phylogenetic strategy in Pinus will require many low-copy nuclear loci that include a high proportion of silent sites and derive from independent linkage groups. The locus screening and evaluationstrategy presented here can be broadly applied to facilitate the development of phylogenetic markers from the increasing number of available genomic resources.

INTRODUCTION Pinus L. is one of the eleven genera of Pinaceae,a family that is monophyletic among gymnosperms (Hart, 1987; Chaw et al., 1997). Pinus is by far the largest genus in the family, and its approximately 110 species comprise Ca. 50% of the Pinaceae (Farjon,1998). Unique morphological features such as needle-like leaves clustered in fascicles (shoot dimorphism)and woody ovulate cone scales with specialized apical regions (umbos), incombination with molecular evidence (Wang et al., 2000a; Liston et al., 2003), indicate thatmembers of the genus Pinus are well-differentiated from related genera suchas Cathaya Chun and Kuang and Picea A. Dietrich.

The extensive utility (Le Maitre, 1998) and ecological significance (Richardsonand Rundel, 1998) of Pinus has made it the focus ofnumerous molecular evolutionary studies (reviewed in Price et al., 1998; Liston et al., 2003). Among these studies,a variety of data types have been used to infer relationships, including allozymes (Karalamangalaand Nickrent, 1989), restriction fragment length polymorphisms (Strauss and Doerksen,1990; Krupkin et al., 1996), anonymous DNA markers (Dvorak et at., 2000; Nkongoto et at., 2002), DNA sequencing (Liston 15

et al., 1999; Wang et al., 1999; Gemandt et aL, 2005), and various combinations of the above (Wu et al., 1999). The most inclusive studies have utilized chioroplast DNA sequencing (Krupkin et al., 1996; Wang et al., 1999; Geada Lopez etal., 2002), and a cpDNA-based (1555 bp of matK, 1262 bp of rbcL) phylogenetic hypothesis for 101 species of Pinus has recent!y been pub!ished (Gemandt et al., 2005). Phylogenetic analyses using the nuclear ribosomal DNA ITS region have also sampled nearly half of the species (Liston et al., 1999, 2003; Gemandt et al., 2001). With the wealth of data availab!e for Pinus, it is perhaps surprising that key phylogenetic relationships remain unreso!ved. Comparisons between the most comprehensive chlorop!ast (Gernandt et al., 2005) and nuclear ribosoma! ITS (Liston et al., 2003) data sets highlight the !imits to our current understanding. In these ana!yses, 12 of 17 common!y-reso!ved c!adesare congruent, although many of the c!ades show !ow bootstrap support. Incongruence between these estimates occurs in severa! cases, some marked by strong branch support and others being poor!y supported. Examp!es include: (1) the conflicting yet robust resolutions of subsect. Contortae, either as the sister !ineage to the remaining members of sect. Trfoliae (cpDNA),or as a more derived lineage within sect. Trfoliae (nrDNA); (2) the poor resolution among subsects. Gerardianae, Krempfianae and Nelsoniae (Wang et al. 2000b; Gernandt et al., 2003; Zhang and Li, 2004); and (3) the monophyly or paraphy!y of sect. Pinus. Even in regions ofcoarse topological agreement, both analyses are plagued by a nearly universal !ack of resolutionamong termina! taxa in the species-rich subsections Ausirales, Pinus, Ponderosae, and Stro bus. Despite Ca. 30 published studies over the past two decades (reviewed in Price et al., 1998; Liston et al., 2003), a wel! resolved phy!ogeny of Pinus remains a work in progress. Resolving relationships among recent!y diverged taxa isa pervasive prob!em in molecular systematics, and it is complicated by an array of bio!ogical phenomena (use of inappropriate molecu!es for recent divergence events, rapid radiations, reticulation) and analytica! shortcomings (e.g., inability of cladistic models to accommodate species non-coalescence)(Sma!! et al., 2004). In Pinus, nrITS and cpDNA gene trees are largely unresolved at this scale of divergence. For examp!e, a combined analysis of four cpDNA !oci (>3 500 bp) yie!deda 6-species polytomy in a lO-taxa sample of subsect. Strobus (Wang etal., 1999), anda combined analysis of four cpDNA loci (>3 200 bp) yie!ded a 7-species polytomy for the 7 samp!ed membersof subsects. Australes, Leiophyllae, and Oocarpae (Geada Lopez et al., 2002). Similarly,2400 bp of nrDNA sequence fai!ed to resolve re!ationships among subsects. Cembroides, Nelsoniae,and Balfourianae, and the nrDNA gene tree contains a 13-species po!ytomy for the15 species in Ir

subsect. Cembroides (Gernandt et al., 2001). The lack of resolution in cpDNA-based studies results from insufficient sequence variation (Gernandt et al., 2005), while intragenomic polymorphism in nrITS (arising from weak concerted evolution or a failure of intraspecific coalescence; Gernandt et al., 2001) limits the utility of nrDNA. Thus, cpDNA and nrDNA are unlikely to provide sufficient character support for a robust phylogeny of Pinus. Even if either data source were to provide satisfactory resolution, there is conflict between these two sources of data (such as the placement of subsect. Contortae) that is not easily resolved. Recent studies using low-copy nuclear loci (Harris and Disotell, 1998; Springer Ct al., 2001; Cronn et al., 2002; Malcomber, 2002; Cronn et al., 2003; Rokas et al., 2003; Alvarez et al., 2005) reveal the power of multiple independent markers for disentangling phylogenetic relationships at low taxonomic levels. These studies suggest that independent markers that sample a range of substitution rates and patterns (Yang, 1998) can provide greater resolution than cpDNA and nrDNA (alone or combined) for resolving rapid radiations, relationships among closely-related species, and complex historical hybridization events (Harris and Disotell, 1998; Springer et al., 2001; Cronn et al., 2002; Malcomber, 2002; Cronn et al., 2003; Small Ct al., 2004). The large nuclear genome of pines (22-3 7 pg/2C; Grotkopp et al., 2004) hinders the development and application of low-copy nuclear markers for phylogenetic applications, as multi- gene families (Kinlaw and Neale, 1997) and repetitive and retrotransposon DNA (Friesen et al., 2001) are abundant. Despite these obstacles, genomics and genetic linkage mapping efforts directed towards P. taeda (loblolly pine; Brown et al., 2001) and P. radiata (Monterey pine, Devey et al., 2004) have revealed a large number of low-copy markers that show orthology across pine species (Brown et aT., 2001). Since these conifer anchor loci show strong evidence for positional orthology, they make promising candidates for population and phylogenetic applications in pines and other members of Pinaceae (Brown Ct al., 2004). In this paper, we explore the potential of four conifer anchor loci to resolve evolutionary relationships among the 11 subsections recognized by Gernandt et al. (2005). Each subsection forms a well-supported dade in the intensively sampled cpDNA-based study. Species selected for this study represent the most ancient divergence events (e.g., subgenera, sections) and more recent divergence events (e.g., among subsections). The phylogenetic resolution and support provided by these low-copy nuclear loci (individually and combined) are compared with existing hypotheses derived from chloroplast DNA (Gernandt et al., 2005) and nuclear ribosomal DNA (Liston et al., 2003). Analysis of the concatenated cpDNA, nrITS, and low-copy nuclear data sets 17

provides a revised estimate of pine subsectional phylogeny, and is used to analyze how conflicts among data sets are resolved in a combined analysis.

MATERIALS AND METhODS Plant materials - Twelve species of Pinus (Table 1) were sampled as "exemplars" to represent all 11 subsections recognized by Gernandt et al. (2005). Two of these subsections, subsects. Nelsoniae and Krempfianae, are monotypic. Species from the remaining subsections were chosen because of their economic and ecological importance (e.g. P. taeda, P. radiata, P. ponderosa, P. contorta, P. inonticola) or because of the availability of fresh tissue. All analyses were rooted with Picea based on morphological (Hart, 1987; Farjon, 1990) and molecular (Wang et al., 2000a) phylogenetic studies of Pinaceae. DNA was isolated from fresh or dried needles using the FastPrep DNA isolation kit (Qbiogene, Carlsbad, CA). Haploid megagametophyte tissue was used as the DNA source for select amplifications (P. roxburghii P. taeda, P. gerardiana; Table 1), particularly in cases where heterozygosity for length polymorphisms resulted in poor quality sequences.

Locus selection, amplflcation, and sequencing Over 150 low-copy conifer anchor loci are available as possible candidates for cross- species amplifications in Pinus (Brown et al., 2001; Brown et al., 2004; Temesgen et al., 2001; Krutovsky et al., 2004), all of which have been mapped in P. taeda (Brown et al., 2001; Temesgen et al., 2001). From this list, we screened 73 loci using three selective criteria. First, primers had to readily amplify the target locus across the diversity of the genus,as respresented by P. taeda L. (sect. Trfoliae), P. merkusii Jungh. & deVriese s. 1. or P. thunbergii Parl. (sect. Pinus), P. monticola D. Don (sect. Quinquefoliae), and P. nelsonii Shaw (sect. Parrya). Second, the amplified product needed to be -500 bp or larger in length so that they could containa reasonable number of variable sites. To provide a check for orthology, our third criterionwas that amplification products needed to be homogeneous, with only a single prominent bandupon amplification, and a single nucleotide sequence if amplified from haploid megagametophte tissue. Once sequences were obtained, our final criterion was to exclude all loci showing frame shiftsor exonlintron deletions (evidence of pseudogenization), or unexpectedly high rates of divergence (evidence of potentially paralogous comparisons). 18

Among the loci meeting these criteria, we chose four that show considerable diversity in length, ratio of non-coding/coding sequence, and function. The first, 1FG1934, is derived from a loblolly pine eDNA clone that maps to the distal end of linkage group 3 in loblolly pine (Krutovsky et al., 2004; GenBank X67714). This locus corresponds to a chlorophyll A/B binding protein type Ill B precursor gene that was first characterized in P. sylvestris (Jansson and Gustafsson, 1990; EMBL X14506). The second locus, AGP6, shows high sequence identity to an arabinogalactan-like protein that maps to linkage group 5 from P. taeda (Krutovsky et al., 2004; GenBank AF1O 1785), and is associated with secondary cell wall formation in differentiating xylem (Zhang et al. 2003). As a class, AGP-like proteins are proteoglycans rich in hydroxyproline (HyP) and proline (Pro), which serve as targets for 0-glycosylation (Showalter, 2001). The third locus is 4-coumarate: CoA ligase (4CL), an enzyme in the phenylpropanoid pathway that serves as a precursor pathway to lignin biosynthesis (Zhang and Chiang, 1997), and that maps to linkage group 7 ofF. taeda (Krutovsky et al., 2004). For the purpose of aligning and describing our sequences, we used the full length 4CL sequence from P. taeda (GenBank U39405; Zhang and Chiang, 1997) as a reference. The fourth locus, 1FG8612, is derived from a loblolly pine cDNA clone that maps to linkage group 3 in P. taeda (Krutovsky et al., 2004; C.S. Kinlaw, Institute of Forest Genetics, unpublished; GenBank AA739606), approximately 64 cM distant from IFG 1934. This locus shows high identity with a Late Embryogenesis Abundant (LEA)-like gene identified in Pseudotsuga menzeisii (Iglesias and Babiano, 1999; GenBank AJ012483). Primers used to amplify loci were published previously (Brown et al., 2001; Temesgen et al., 2001; Zhang et al., 2003), and are shown in Fig. 1. In some cases it was possible to amplify universally across Pinus with a single primer combination; for most loci, however, itwas necessary to develop subgenus-specific primers. Amplification protocols generally followed those outlined in Brown et al. (2001) and Temesgen et al. (2001). This involved an initial preheating step at 80C for 2 mm, followed by 35 cycles of 94CC for denaturation, a 62CC52CC "Touch- Down" annealing step (-1C /cycle for 10 cycles; Don et al., 1991), and 72CC extension for 1 mm per kb of amplification product. PCR reactions (25 p1) utilized -50 200 ng of DNA template, 0.4 1iM each primer, 0.2 mM dNTP, 1.5 to 2.0mM MgCl2, 0.13 j.tg/j.tL BSA, and 2 units of Taq polymerase (Fisher Scientific, Pittsburgh, PA) in 1X buffer (10 mM Tris-HCI [pH 9.0], 50 mM KCI). Amplification products were resolved on 1.5% TAE agarose gels containing 0.2% crystal violet; bands were visualized in white light, excised, and purified using QlAquick Gel Extraction II,

kits (QIAGEN, Valencia, CA). Purified products (-4 OOng) were sequenced using the ABI-Prism Big Dye Terminator Mix (ver. 3.1; Foster City, CA). Products were resolved on an Applied Biosystems 3100 Genetic Analyzer capillary sequencer at the Oregon State University Center for Gene Research and Biotechnology Central Services Laboratory. Automated traces were evaluated using the BioEdit Sequence Alignment Editor 5.0.6 (Hall, 1999). In some instances, length polymorphism heterozygosity and/or poor amplification precluded direct sequencing of PCR products. These products were cloned into pGem-T Easy (Promega, Madison, Wisconsin) following manufacturers recommendations. The current study incorporates two previously published data sets. The first includes two cpDNA regions, maturase K (matK) and ribulose 1,5-bisphosphate carboxylase large subunit (rbcL), which have been sequenced for 101 pine species and Picea (Gernandt et al., 2005). The second data set includes a 650-bp portion of the nuclear ribosomal DNA internal transcribed spacer (nrDNA-ITS) region that was sequenced from 47 pine species and Picea (Liston et al., 1999; Liston et al., 2003). The nrITS data set was modified from the original study by adding a sequence from P. monticola (amplified and sequenced using methods described in Liston et al., 1999). Two sequences included in the original reference were replaced; these include P. krempfii (replaced with GenBank AF305061; Williams and Auckland, unpublished) and P. echinata (replaced with GenBank AF367378; Chen et al., 2002). Three species used in our low copy gene analysis are replaced by closely-related taxa for nrITS; Subsect. Balfourianae is represented by P. aristata (P. longaeva in our study; GenBank AF03700); Subsect. Australes is represented by P. attenuata (P. radiata in our study; GenBank AFO3 702); and Picea sitchensis is replaced by Picea rubens (from Germano and Klein, 1999; GenBank AF305061). Due to uncertain homology between subgenera within nrITSl, a stretch of ca. 114 nucleotides (representing positions 95-32 1 in the global alignment) was aligned by subgenus.

Sequence alignment and data analysis Sequences were aligned by ClustalW (in BioEdit) using default parameters. Exon regions aligned with ease, as ingroup and outgroup sequences showed minimal length variation. In contrast, genus-wide alignments of non-coding sequences were non-trivial. For these, separate alignments were carried out for each subgenus using an outgroup from the alternative subgenera (P. monticola for alignment of subg. Pinus; P. roxburghii for subg. Strobus). Regions of shared homologous sequence were analyzed across sectional exemplars using the "align two sequences" 20

(bl2seq) option of NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/). For this analysis the following criteria were set: word sizel5, dropoff value for percentage of shared nucleotides=50%, open gap penalty=5, and gap extension penalty=2. These subgeneric alignments were used as a guide in attempting to construct global alignments. Final adjustments were made by eye to minimize the number of required indels. Alignments are available at TreeBase (SN2289; http://www.treebase.org). Data from individual genes were analyzed independently so that we could evaluate variation on a locus-by-locus basis. Alignment statistics (determined using MEGA 2.1; Kumar et al., 2001) included the average number of characters, number of variable and parsimony informative characters, average within-group p-distance, and average base composition. Indels were counted for each species and averaged across taxa. Phylogenetic analyses were performed on individual data sets using maximum parsimony (MP; PAUP* 4.OblO; Swofford, 2002). Most parsimonious trees were found from branch-and- bound searches, with all characters weighted equally and treated as unordered. Branch support was evaluated using the nonparametric bootstrap (Felsenstein, 1985), with 1000 replicates and tree bisection reconstruction (TBR) branch swapping. In cases when two or more most parsimonious trees were recovered, the MP tree that most closely (exactly, except forJFG 1934) matched the maximum likelihood tree topology was presented. In these cases, phylogenies were inferred using maximum-likelihood estimates with PAUP*. Maximum-likelihood parameters, such as substitution model, base frequencies, shape of gamma distribution, and proportion of invariant sites, were optimized using ModelTest 3.5 (Posada and Crandall, 1998). The nuclear genes used in this study evolve in an environment characterized by recombination and independent assortment. Additionally, three of the four genes (AGP6, 1FG1934, 4CL)contain a high proportion of non-synonymous sites so they may respond to selection in addition to mutation, drift and migration. For these reasons, congruent phylogenetic signal among loci is important to evaluate prior to combining data sets for simultaneous analysis (Bull et al., 1993). To assess congruence among pairs of individual data sets, we used the MP- based incongruence length difference test (ILD; Farris et al., 1994) in a pairwise fashionon all six loci. Null distributions were generated using 1,000 replicates of data sets pruned of invariant and uninformative characters. This test was implemented using PAUP* ("partition homogeneity" test) with the threshold for significance at P<0.Ol, following the recommendations of Cunningham (1997). Since the ILD test may be prone to Type I errors (see Cunningham, 1997; Yoder et al., 21

2001; but see Hipp et al., 2004), we also used the Templeton or Wilcoxon signed-ranks test ('WSR'; Templeton, 1983) to further explore incongruence inferred in the ILD test. For this test, all most-parsimonious trees recovered for individual genes were used as constraint topologies. P- values were adjusted using sequential Bonferroni correction (Rice, 1989), and results were interpreted conservatively, as the test returning the highest P-value (least significant; Miyamoto, 1996; Johnson and Soltis, 1998) is accepted as the experiment-wide P-value. Results from Templeton tests were considered indicative of heterogeneity among data sets only if significance was observed in both directions (Johnson and Soltis, 1998). Since a clear consensus has yet to be reached on the relative merits of 'total evidence' versus 'conditional' prior agreement approaches for data combination (Huelsenbeck et al., 1996; Sanderson and Shaffer, 2002), we chose to explore both of these approaches.

RESULTS Locus selection Of the 73 loci initially screened, 61(84% of the total surveyed) were excluded because they failed to meet the criteria outlined in Methods. Failure to amplify the target locus genus-wide was the most common problem. Loblolly pine (P. taeda) was the source for nearly all conifer anchor loci, so amplification success was highest for subg. Pinus (P. taeda and P. merkusii or P. thunbergii), and failure was most frequent for members of subg. Strobus (P. monticola and P. nelsonii). In total, twelve loci met our selection criteria (detailed in Willyard et al., unpublished manuscript; and Cronn, unpublished). Among these, four loci were selected for phylogenetic analyses. Two loci were composed entirely of exon sequence (1FG1934, AGP6), and the remaining loci either contained a high proportion (4CL) or a small proportion (1FG8612) ofexon sequence.

Sequence characteristicsof low-copynuclear loci in Pinus

1FG1934 The sequenced region corresponds to amino acids 4- 260 (256 of 274 total) of the P. sylvestris chlorophyll A/B binding protein gene. This exon encodes an N-terminal signal transit peptide (nucleotide positions 1 108) and three ct-helix motifs (aligned positions 285-375; 465- 510; 630-726), and exhibits a G+C content of 60.2% (33.6% G + 26.6% C). Stop codons and insertions were absent in this alignment, although P. krempJIi showeda three base pair deletion (positions 145-147). The apparent structural conservation and low average divergence (mean P- 22

distance = 0.043 5 within Pinus) led to an unambiguous alignment of 1FG1934. The alignment was 770 bp long and it included 99 variable and 55 parsimony informative positions within Pinus (Table 2). Among variable sites, 23 were restricted to first codon positions, 11 to second positions, and 65 to third positions. Amino acid replacements were observed in 26 of the 256 amino acid sites (10.2%).

AGP6 We obtained an aligned sequence of 609 bp for AGP6 corresponding to positions 31 - 640 of P. taedaAGP6. This exonic region shows a high G C content (24.2% G + 42.3% C), and it includes 202 codons with an abundance of proline (3 1.5% across pines), alanine (22.6%), valine (11.9 %) and threonine (11.4%). The random-coil conformation adopted by this glycosylated protein appears prone to indel events, presumably through the expansionlcontraction of HyP-P- (AIV/T) motifs. The first of these repetitive motifs, "TAPAAPTTAKP" (residues 31-41 of the P. taeda AGP6 translation AAF75 821) is present in three (P. nelsonii) or two (P. monophylla, P. radiata, P. roxburghii) near-perfect repeats, with one unit for the remaining species. A second repeat, "PVAPAAAPTKP[A/T]P" (residues 61-73 ofAAF75821), follows the first motif and is present as two tandem units (P. merkusii, P. longaeva) or one unit (all remaining species). A third repeat, "PPVA[T/V/A}" (residues 109-138 ofAAF75821), is present as seven near-perfect repeats in Picea sitchensis, but is fixed at six tandem repeats in pines. Due to the complexity of these repeats, we first translated AGP6 sequences, aligned the amino acids using ClustalW, then converted amino acids back to the nucleotide sequence. Of the 609 bp included in our analysis, 66 nucleotide positions were variable (10.8% overall) and 29 were parsimony informative in Pinus (Table 2). Among variable positions, 9 were restricted to first codon positions, 9 to second positions, and 48 to third codon positions. Amino acid replacements were observed at 16 of the 202 amino acid sites (7.9%). While this gene showed a relatively high non-synonymous rate of substitution, eight of the 16 amino acid replacements involved four common amino acids (P +- A = 2; V *-* L = 2; V A = 4), and no missense frame shifts were detected.

4CL We obtained an aligned sequence of 1315 bp for 4CL, corresponding to nucleotide positions 461-1513 of 4CL from P. taeda. This sequence included 675 bp of exon I that contained 225 codons (residues 107 -331 of 537 total). Indels in the 4CL exon were restricted to a single species, P. nelsonii. This deletion removed 44 bp of exon 1 (aligned positions 631675, 23

inclusive), including the exon 1intron 1 splice site present in all other species. Despite this disruption, the gene may be functional if a GC dinucleotide pair at positions 627-628 is usedas the splice site; this would result in a 19 amino acid deletion that retains the normal reading frame. Of the 675 exon nucleotides included in our analysis, 75 positions were variable and 36were parsimony informative within Pinus (Table 2). Among variable positions, 13were found in first codon positions, 9 in second positions, and 53 in third codon positions. Amino acid replacements were observed at 22 of the 225 amino acid sites (9.8%). Also included in 4CL was intron 1, a region characterized by eight-fold length variation across the included taxa. The shortest 4CL intron originated from Picea sitchensis and was 38 bp in length. Within Pinus, the shortest introns were found in members of Sect. Quinquefoliae, and were 227 - 229 bp in length. The largest introns were present in members of Sect. Parrya (particularly P. monophylla, 539 bp), with much of the expansion due to the gain ofan A/T- microsatellite and regions of low complexity (runs of A or T). Length differences in the 4CL intron were accompanied by substantial variation in nucleotide composition,as %A+T ranged from 70.2% in P. taeda to 76.3% in P. nelsonii. An insertion in the 4CL intronwas observed at 895 bp in the global alignment, and it was restricted to members of sect. Parrya (P. nelsonii, P. longaeva, and P. monophylla). The region varies in length from 252 (P. nelsonii)to 323 bp (P. monophylla) and it appears to be composed of superimposed indels of uncertain homology. Alignment of this stretch was sufficiently problematic to warrant its exclusion from all analyses. As might be expected from such a dynamic region, intron alignmentsacross divergent taxa were difficult. The effect of indels and low sequence complexity is apparent in pairwise BLAST comparisons between P. taeda (subg. Pinus), P. monticola (subg. Sirobus),and four species representing the sections of the genus (Fig. 2). Local alignments between speciesfrom a common subgenus (e.g., P. taeda vs. P. ponderosa or P. merkusii) show relatively long stttches of easily aligned high-quality sequence (cut-off criteria defined bya word size of 15 and sequence identity> 5 0%), with only short regions of non-homologous indels. In contrast, alignments across sub genera returned short stretches of alignablesequence (<40 bp), or else failed entirely to identif' blocks meeting cut-off criteria (e.g., P. taedavs. P. nelsonii; P. monticola vs. P. ponderosa). Due to the uncertain homology between subgenericintrons, we aligned 4CL exon sequences genus-wide, but aligned intronsequences only within subgenus (except adjacent to the flanking sequence where homologywas apparent). Of the 472 aligned intron bases included in our subg. Pinus alignment, 32 positionswere variable and 8 parsimony 24

informative (Table 2). Of the 365 aligned intron bases included in our subg. Strobus alignment, 38 were variable and 15 were parsimony informative.

1FG8612 We obtained an aligned sequence of 2645 bp for 1FG8612, corresponding to nucleotide positions 222-525 of the LEA mRNA described from Pseudotsuga menzeisii. This sequence includes two partial exons that span 304 bp and code for 101 amino acids (28.3 from exon A, 71.7 from exon B). These exons show 37 variable positions within Pinus, of which 20 are parsimony informative. Among variable positions, 11 were restricted to first codon positions, 10 to second positions, and 16 to third positions. Amino acid replacementswere observed at 17 of 101 amino acid sites (16.8%), of which 7 were from exon A, and 10 from exon B. Nucleotide frequencies were relatively balanced in the 1FG8612 exon (24.1%G, 20.4%C, 29.0%A, 26.5%T), and indels and stop codons were absent. The intron from 1FG8612 ranged in absolute length from 765 bp in P. gerardiana to 1024 bp in P. radiata. As observed with the 4CL intron, alignment of the 1FG8612 intron could only be accomplished within members of the same subgenus. Local alignments between members of the same subgenus (e.g., P. taeda vs. P. ponderosa or P. merkusii) showed long blocks of easily aligned high-quality sequence interspersed with short repeats that showed similarityto several regions (e.g., P. taeda vs. P. merkusii; P. monticola vs. P. longaeva) (Fig 2). Alignmentsacross subgenera returned short blocks (<70 bp) of similarity that were composed ofrepeats and stretches of low sequence complexity. We aligned 1FG8612 exonsequences genus-wide, but aligned intron sequences by subgenus. Intron alignments for members of subg. Pinuswere 1177 bp long and included 80 variable positions (11 parsimony informative), and aligned intronsfrom subg. Strobus were 1263 bp long and included 124 variable sites (18 parsimony informative, Table 2).

Phylogenetic analyses -

Individual loci Most-parsimonious trees derived from individual loci show general patterns of conformity across the four low-copy nuclear loci, as wellas differences reminiscent of conflicting resolutions obtained from nrITS and cpDNA (Fig. 3). For example, monophyly ofsubgenera Strobus and Pinus is supported by all low-copy nucleargenes (bootstrap support [BS] values from 89 to 100%), and is in agreement with both nrITS and cpDNA. Within subg.Strobus, sect. Parrya (P. nelsonii, P. longaeva, and P. monophylla) is supportedas monophyletic in all low- 25

copy gene trees except 1FG8612. Data from 1FG8612 resolve P. nelsonii in a novel position sister to sect. Quinquefoliae (P. monticola, P. gerardiana, P. krempfii), but this resolution shows low support (73% BS). The monophyly of sect Quinquefoliae is also supported in three of the four low-copy nuclear genes. Only 1FG1934 provides a clear resolution ofF. krempfii, although this resolutionsister to the remaining members of subg. Strobus- has low support (71% BS). In general, the consensus resolution emerging from three of four low-copy nuclear loci is that these taxonomic sections are monophyletic. This resolution has strong support from the cpDNA data set, and the apparently conflicting resolution from nrITS involves the unique placement of P. gerardiana with its low character support. Equivocal resolutions also occur in trees derived from low copy markers, and the conflict between these markers does little to resolve discrepancies between nrITS and cpDNA. For example, the conflict between nrITS and cpDNA regarding the position of P. contorta remains unresolved. Likewise, within subg. Pinus, P. merkusii and P. roxburghii, are resolvedas sister taxa by 1FG1934 with high support (90% BS). This resolution is supported by cpDNA data (87% BS), although the nrITS does not resolve the relationships of these two species. The remaining low-copy nuclear genes resolve P. merkusii and P. roxburghii as paraphyletic, with 4CL showing moderate bootstrap support (73%).

Conflict among low-copy nuclear genes, nrITS and cpDNA Results from the ILD test (Table 3) show that significant incongruence among the low-copy nuclear loci is not indicated. If these comparisons are extended to include nrITS, only the nrITS vs. 1FG8612 comparison ranksas significant (p=O.O03), indicating a general lack of incongruence between most nuclear loci and nrITS. Comparisons between nuclear genes and cpDNA reveals that 1FG8612 shows significant incongruence with plastid data (p0.003) and that 4CL is near the threshold of significance (p0.O45). For an alternative assessment of conflict, we evaluated the sensitivity of each dataset to topological change using the parsimony-based WSR test. Results from thistest (Table 4) show a number of one-way comparisons that are significant, including comparisons involving1FG8612 (5 pairwise comparisons), nrITS (4 comparisons), cpDNA (4 comparisons), and AGP6and 4CL (1 comparison each). Of these comparisons, only 1FG8612vs. cpDNA was significant intwo directions. For example, constraining cpDNA data to alternative topologies ledto inferences of topological conflict with three data sets (1FG8612, P0.004; 4CL, P= 0.023; nrITS, P = 0.0 10); 26

conflict was only reflected in the alternative test when 1FG8612 data are constrained to the cpDNA topology (P0.027). While a significant test in one direction may be indicative of potential conflict, there have been recommendations to recognize conflict as 'significant' only when it is observed in two directions (Johnson and Soltis, 1998). Using these criteria, the sole case of clear partition conflict involves the nuclear locus 1FG8612 and cpDNA. In total, results from both tests (ILD, WSR) show that the low-copy nuclear gene data sets do not exhibit strong conflict, and that they represent a reasonable partition for combination. The merit of combining the low-copy genes with other partitions, such as nrITS or cpDNA is debatable since both of these data sets show conflict with one of the four low copy nuclear loci (1FG8612). To explore the impact of potentially conflicting data in our phylogenetic analyses, we combined the low copy data with nrITS and cpDNA.

Topological comparisons between low-copy nuclear data, nrITS and cpDNA Combining the four low-copy nuclear genes into a single data set (5,338 aligned characters) yields three MP trees that fully resolve Pinus subsectional representatives (Fig. 4A). In contrast to trees derived from individual genes, many clades exhibit strong character support (> 90% BS), and eight often potential nodes show BS values over 70%. Subgenera Pinus and Strobus are strongly supported (100% BS), as are three of the four sections. The sole exception among subgenera is support for sect. Pinu,s, which is low (71% BS) in this comparison. At the level of subsectional representatives, sect. Trfoliae remains unresolved (Fig 4A) with P. contorta, P. ponderosa, and P. taeda/P. radiata forming a trichotomy in the strict consensus of 3 MP trees (not shown). This strongly supported resolution provides an opportunity to reassess the outstanding questions raised in previous analyses of Pinus nrITS and cpDNA. For example, resolution of subg. Pinus based on nrITS and cpDNA differs in two key aspects. First, the basal divergence inferred from previous analyses of nrITS is a grade, whereby sect Pinus is paraphyletic due to the absence of monophyly between P. merkusii and P. roxburghii. This is not the case in our pruned nrITS analysis where P. merkusii and P. roxburghii are monophyletic, however, both alternatives have low support values. In contrast, cpDNA shows sect Pinus to be the monophyletic sister group to sect Trfoliae (87% BS). The combined low-copy nuclear data (Fig. 4A) indicate that sect Pinus has moderately low support (71% BS) as a monophyletic lineage, a finding that supports the cpDNA topology (but does not explicitly contradict the pruned nrITS tree; Fig. 3). The second point of disagreement in subg. Pinus involves the resolution of subsect. Contortae (P. contorta), either as sister to subsect. Australes (P. taeda, P. attenuata) with nrITS, or as the sister lineage to the remainder of sect Trfoliae with cpDNA. The combined low-copy nuclear data (Fig. 4A) clearly resolves P. contorta as a member of sect. Trfo1iae, but it is one lineage in a trichotomy that includes the remaining samples from subsects. Ponderosae, Australes, and Attenuatae. Among the low-copy nuclear loci, only 1FG8612 provides a clear resolution of P. contorta, placing it sister to the remaining members of sect. Trfoliae as is observed with nrITS (Fig. 3). The remaining relationships within sect Trfoliae of the 1FG8612 tree contradict a substantial body of prior molecular evidence (Strauss and Doerksen, 1990; Liston et al., 1999; Geada Lopez et al., 2002; Gernandt et al., 2005), principally the paraphyly of subsect Australes relative to P. ponderosa. Due to this lack of consistency, we consider the relationship between subsect. Contortae and the remaining lineages within sect. Trfoliae unresolved. Resolution within subg. Strobus based on nrITS and cpDNA differs substantially with respect to relationships within sect Quinquefoliae, specifically the placement of subsects. Gerardianae (P. gerardiana) and Krempfianae (P. krempfii). Whereas nrITS resolves subsect. Krempfianae sister to subsect. Strobus (P. monticola) and subsect. Gerardianae in a weakly supported dade with sect Parrya, cpDNA resolves subsect. Strobus as sister to a weakly supported dade of subsects. Gerardianae and KrempfIanae. Neither of these resolutions have strong support. Indeed, the subg. Strobus dade inferred from nrITS collapses to a polytomy if one only considers nodes with >65% bootstrap support, and the same is true for sect Quinquefoliae relationships in the cpDNA tree. In contrast, the combined low-copy phylogeny reflects the consensus of individual loci by placing subsects. Strobus, Gerardianae, and Krempfianae in a well supported dade (94% BS) with subsect Krempfianae sister to the other two lineages (70% BS). This result adds support to recent phylogenetic and taxonomic investigations based on cpDNA that identify subsects. Strobus, Gerardianae, and Krempfianae as a distinct monophyletic lineage. It differs markedly, however, from the cpDNA resolution in that the morphologically distinct subsect. Krempfianae is resolved as sister to subsects. Strobus and Gerardianae. While support for this node is moderately low (70% BS), the fact that individual analysis of all low copy loci show this same resolution (individual BS = 5 1-81%; Fig. 3) adds measurably toour confidence in this finding. The low-copy nuclear resolution of subsect. Nelsoniae (P. nelsonii) in Fig. 4A underscores the power (and the unexpected outcome) of combining data. The placement of this taxon in the low-copy tree is unambiguous, as it is included in a dade with (94% BS), and sister 28

to (98% BS), the combined lineages of subsects. Cembroides (P. monophylla) and Balforianae (P. longaeva). This resolution conflicts with cpDNA and nrITS trees, which show uncharacteristic agreement (but low support) for subsect. Nelsoniae as sister to subsect Balfourianae. Critically, the topology recovered in the low-copy nuclear data set is supported by only one of the individual nuclear loci, 1FG1934 (59% BS). In contrast, AGP6 and 4CL show these lineages as an unresolved trichotomy, and 1FG8612 places subsect. Nelsoniae in an unusual location sister to sect Quinquefoliae. The final placement and enhanced support of the resolution of subsect. Nelsoniae through combined analysis is somewhat unexpected, arising through the accumulation of modest phylogenetic signal across all individual loci.

Analysis of combined nrITS, cpDNA, and low-copy nuclear data sets As a final analysis, we combined data sets consisting of low-copy nuclear genes, nrITS, and cpDNA to explore how a global analysis would be resolved, with particular attention focused on the possible sources of conflict revealed by ILD and WSR tests (Tables 2, 3). Analysis of these five nuclear loci and one cpDNA partition as a single data set (8886 aligned characters) produces one MP tree that fully resolves Pinus subsectional representatives (Fig. 4B). The most striking feature of this analysis is that adding nrITS and cpDNA data did not change the topology of the combined low-copy tree. However, BS support values show that the addition of nrITS and cpDNA data improve the nuclear resolution (increased BS values) at some nodes, while adding to character conflict (decreased BS values) at other nodes. The addition of nrITS and cpDNA data uniformly increased support for sectional nodes, with nodal BS values increasing by 4 to 21%. Relationships within sect Trfoliaeunresolved in the combined low-copy data setare enhanced by the addition of cpDNA, resulting in a cpDNA-like resolution of these taxa, including the position of subsect Contortae. The support values within sect Trjfoliae are reduced from the cpDNA tree (a result of the strong disagreement with nrITS), particularly at the node separating subsect. Contortae from the remainder of sect. Trfoliae. Topological relationships within sects. Quinquefoliae and Parrya do not change with the addition of nrITS and cpDNA data, but support values for the clades of P. monticolalP. gerardiana and P. monophyllalP. longaeva are reduced from the combined low- copy analysis due to the inclusion of conflicting characters from both nrITS and cpDNA. DISCUSSION Pine phylogeny and taxonomic sampling This is the first study to confidently resolve the relationships among the subsections of subg. Strobus, including the positions ofF. krempfii and P. nelsonii. In subg. Pinus the monophyly of sect. Pinus is weakly supported and the position of sect. Contortae remains equivocal. Additional taxon sampling across the genus may stabilize uncertain relationships in deeper divergence events by breaking up long branches and adding additional synapomorphic sites (Pollock Ct al., 2002; but see Rosenburg and Kumar 2001). A coarse estimate of the influence of taxon density can be investigated by studying the topological differences in the cpDNA and nrITS trees presented here (Fig. 3) and their original, taxonomically-dense analyses (Liston et al., 1999; Gernandt Ct al., 2005). In the case of cpDNA (Fig. 3), the topology is in good agreement with the topology of Gernandt et al. (2005) following pruning of 89 species. The same is true of the nrITS subsectional topology following the removal of 35 species. Minor differences in topology between trees in Fig. 3 and their original citations are restricted to nodes with low support (<6 1% BS). These results suggest that our limited taxonomic sampling does not bias the resolution of subsectional phylogeny.

Which loci have the greatest phylogenetic utility in Pinus? - Molecular studies of PiniLs using cpDNA and nrDNA have returned conflicting topologies at deeper nodes, and largely failed to resolve the numerous and possibly rapid radiations that comprise the terminal clades. These results highlight the need for alternative data sources that can be used to reduce phylogenetic uncertainty by increasing resolution and support. The data used in this study represent six independent molecular sources, each of which have different attributesboth positive and negativewith regard to molecular phylogenetic analysis (Tables 1,4; Fig. 5). Among the loci screened in this study, nrITS shows the highest average p-distance among pine species (0.064), and a higher proportion of phylogenetically informative sites (10.1%) than either the low-copy (6.0%) or cpDNA sequences (3.1%, Tables 1, 4). While this high degree of variation appears favorable by divergence measures, nrITS resolves pines with generally low bootstrap support values (Table 5, Fig. 3); for these exemplars, only three nodes show greater than 65% bootstrap support, and no nodes show greater than 80% support. It could be argued that additional sampling of nrITS could improve resolution and support, especially sinceour sample 30

of-630 bp represents one-fifth of the 3000+ bp ITS 1-ITS2 region inPinus(Liston et al., 1999; Gernandt et al., 2001). Despite the promise of additional characters, Gernandt etal. (2001) showed that concerted evolution is weak in the 5' region of ITS!. The absence of complete concerted evolution is increasingly reported in phylogenetic studies (summarized in Alvarez and Wendel, 2003; Campbell et al., 2005). In such cases, allelic heterozygosity and non-allelic (paralogous) diversity can accumulate, placing practical limitations on the use and interpretation of ITS variation in a phylogenetic context. For these reasons, we believe that nrITS is unlikely to provide new insights into thePinuphylogeny. Chloroplast coding DNA generally evolves at a conservative rate, making it useful for the resolution of more ancient divergence events (Clegg etal. 1994; Small et al., 2004). Average p-

distances among the species in this study were 0.0 17 for two cpDNA loci(matK, rbcL),a value that is about one-fourth the rate of nrITS and one-half the rate of the four nuclear genes. Where this genome fails to provide information is at cladogenic events involving the most diverse groups (Krupkin et al., 1996; Geada Lopez et al., 2002; Gernandt et al., 2005). Increased sampling of non-coding regions has the potential to increase resolution (Shaw et al., 2005; Gernandt, pers. comm.). However, even if a resolved topology is obtained with cpDNA, this genome represents a single linkage group. As has been shown, over-reliance on cpDNA (or any single data set) can provide strongly supported yet erroneous phylogenetic inferences since it is susceptible to lineage sorting of ancestral polymorphisms, as well as ch!oroplast capture through introgression (reviewed in Wendel and Doyle, 1998). Both phenomena can lead to inaccurate organisma! phylogenies that are difficult to verify in the absence of alternative hypotheses. The four nuclear genes used in this study diverge at rates intermediate to nrITS and cpDNA, with locus structure being highly influential in determining overall variability. Among low-copy nuclear loci, there is an eight-fold difference between the slowest and fastest evolving regions (AGP6 in subg.Pinusand4CLintron in subg.Strobus,respectively). Average pairwise distances (Table 2) across appropriate clades (genus-wide for coding regions, within subgenera for non-coding regions) indicate that exons diverge 2.1 times faster than cpDNA, and introns diverge 1.3 times faster than nrITS. Introns from4CLand 1FG8612 are 1.9 to 2.3 times more variable than their respective coding regions (Table 2). In general, introns are attractive targets for evaluating relationships among closely-related taxa since they diverge at relatively rapid rates (Small et al., 2004). This will be desirable (and necessary) as we focus on reconstructing relationships among closely related species, such as 31

those in subsections Australes and Ponderosae. It should be noted that inferred rates of divergence alone can be misleading with respect to phylogenetic utility or informativeness. For example, the 4CL intron shows the highest rate of divergence and greatest number of phylogenetically informative sites among all sequences examined (Table 2) yet it shows72% A+T. This distorted base composition may limit the resolution of taxa due to the reduced genetic alphabet and increased incidence of homoplasy. In addition, the high lability of this region leads to alignment difficulties due to AlT microsatellites and single sequence repeats (Fig. 2). We predict that introns with roughly equal base frequencies (such as 1FG8612) will be preferredeven if they show a lower rate of change. For gaining insight into the overall phylogenetic pattern of Pinus subsections (agroup containing both ancient and recent divergence events), a combination of coding and non-coding regions is required to obtain a (relatively) well-supported phylogeny across thegenus (Fig. 4A). For example, the ancient divergence event leading to the formation of subgenera cannot be revealed using intron sequences alone (such as 1FG8612) due to the absence of recognizable homology in these regions (Fig. 2). Exon sequences also show limitations, although in thiscase selective constraints limit the degree of character change in terminal taxa. Considering that the greatest phylogenetic uncertainty in Pinus resides in terminal lineages (e.g., sections and subsections, down to species), we suggest that future emphasis be placedon rapidly evolving non-coding regions. While length variation among more divergent taxa is likely to limit the breadth of phylogenetic comparisons (e.g., across subgenera or sections), this limitationcan be circumvented by the judicious use of slower evolving exon regions. The greatest value of adding additional nuclear genes for studying relationshipsamong pines is that they present multiple evolutionarily independent perspectives thatcan be compared to hypotheses derived from nrITS and cpDNA. These loci reside on three different nuclear chromosomes, and the two loci located on the same chromosome 1FG8612 and 1FG1934are sufficiently distant (> 60 centiMorgans; Krutovsky et al., 2004) that theyare unlikely to show linkage disequilibrium at the species level (Brown et al., 2004). The generalagreement between these independent nuclear and cpDNA loci strengthens earlier findings that Pinus includestwo main lineages (subg. Pinus and Strobus), both of which can be divided intotwo sublineages (sects. Pinus and Trfo1iae within subg. Pinus; sects. Quinquefoliae and Parrya withinsubg. Strobus). Agreement between low copy nuclear loci and cpDNA that differs withnrITS (e.g., monophyly of subsects. Stro bus, Gerardianae, and Krempfianae; Figs. 3, 4A)provides support 32

for earlier inferences that have been identified by cpDNA alone (Gernandt Ct al., 2005). Critically, disagreement between low-copy genes, cpDNA and nrITS (e.g., ilationships in sects. Parrya and Trfoliae) highlights the most problematic nodes, and underscores the need for focused sampling with respect to taxa and characters. In light of the overall lack of resolution within subsections in prior molecular studies of pines, we suggest that useful molecular information is likely limited to non-coding regions of cpDNA and low-copy nuclear loci with a high proportion of silent sites, such as 1FG8612. These genomic resources are readily available (Brown et al., 2004; Krutovsky et al., 2004; Small et al., 2004), and they are certain to be applied with increasing frequency in Pinus and related conifers. Traditionally, one of the most challenging aspects of working with low-copy nuclear genes is assessing orthology in the presence of high heterozygosity (Small et al., 2004). The availability of haploid megagametophyte tissue in Pinus seeds (and most gymnosperms) simplifies this task, making it possible to amplify single haplotypes (e.g., Brown et al., 2004). If amplification products from megagametophyte tissue show 'heterozygosity', it is a clear indication that paralogous loci are being amplified.

Molecular data and prospects for resolving phylogenetic relationships within pines The use of multiple independent loci to explore phylogenetic relationships in coniferous plants is a recent advancement (Wang et al., 2000a; Kusumi et al., 2002), although the use of mapped nuclear loci has only been accomplished in angiosperm groups (Chee et al., 1995; Cronn et al., 2002; Alvarez et al., 2005). Studies of this nature that compare and contrast the properties and utility of a set of loci are important in establishing cost effective approaches for advancing the resolution of evolutionary relationships within any group of taxa (see Small et al., 2004). With the burgeoning list of nuclear loci available for molecular genetic applications in economically- important groups (Fulton et al., 2002; Krutovsky et al., 2004), two hurdles need to be overcome in order to gain a finer evolutionary perspective of pines: (1) reconciling the inevitable conflict that will arise as more genes are added, and (2) resolving widespread, possibly rapid radiations that characterize many subsections. The first issuehow to interpret a phylogeny in the presence of conflictis a topic of continuing debate in molecular phylogenetics (Bull et al., 1993; Miyamoto, 1996; Suchard et al., 2003). Increasingly, phylogenetic studies are being performed on very large samples ranging from tens of genes (Cronn et al., 2002; Takezaki et al., 2004) to entire genomes (Rokas et al., 33

2003; reviewed in Eisen and Fraser, 2003). Results show that incongruence among gene trees is to be expected when estimates are based on a small number of genetically independent data sources; indeed, strongly supported incorrect phylogenies can be derived from a few genes if they show non-independence (e.g., linkage or similar functional constraints). Studies in yeast (Rokas et al., 2003) show that concatenation of a relatively small number of genesperhaps 20can produce stable phylogenies that minimize the influence of evolutionary noise and selective effects. In light of these findings it seems clear that taxonomic conclusions based on a single source of molecular phylogenetic data (e.g., cpDNA; Gernandt et al., 2005) should be considered working hypotheses awaiting confirmation, rather than a final product. The number of loci that will be required to produce a stable pine phylogeny remains unknown, but given the abundance of pine ESTs, the number of comparative linkage maps in Pinus, and the extensive history of seed collection/gene conservation activities in the forest genetics community, a genome-wide phylogenetic estimate of the pine phylogeny based on multiple nuclear loci is a realistic goal. The growing number of cross-genera comparisons in the Pinaceae (Wang et al., 2000a; Krutovsky et al., 2004) shows that similar studies are equally tractable in other coniferous genera. The second obstacle to a resolved pine phylogeny is the numerous, apparently rapid radiations in Sect. Trfoliae and subsects. Pinus, Cembroides, and Strobus (Krupkin et al., 1996; Wang et al., 1999; Gernandt et al., 2001; Geada Lopez Ct al., 2002; Gernandt et al., 2005; Willyard et al., unpublished manuscript). The problem is illustrated by the 48 species of sect. Tr?foliae. Among the four representatives of this section (Table 1), average pairwise divergence at four low-copy nuclear genes is 1.5% over 5,338 aligned bases, and there are 102 variable and 6 parsimony informative sites. Including additional species in future analyses will certainly shift variable sites to the 'phylogenetically informative' category. Nevertheless, there are fewer than 16 synapomorphies uniting these taxa, and the terminal branches leading to P. taeda and P. ponderosa (representing the species rich subsections Australes and Ponderosae) are only 34 and 14 characters long, respectively (Fig. 4A). This amount of variation is unlikely to be sufficient to provide robust character support for resolving the 24 remaining species of subsect. Australes and 17 species of subsect. Ponderosae. As noted, adding nrITS (Liston et al., 1999, 2003) and cpDNA (Gernandt et al., 2005) could add resolution, but it seems likely that many of thesegroups will remain multi-species polytomies even after the acquisition of more data. Fromour current temporal vantage point, such divergences appear nearly instantaneous (e.g., Fig. 4A). We have calculated estimated divergence times for the four lineages of sect. Trfo1iae using nine nuclear 34

gene sequences (Willyard et al., unpublished manuscript); irrespective of the calibration method, the four lineages of sect. Trfoliae appear to have diverged in a brief time period, perhaps as little as 2 to 5 million years during the Miocene. This perspective is valuable, as it suggests that many species groups diversified over a relatively narrow time frame, pethaps in response to climate change or adaptive ecotypic divergence. The near-absence of interspecific crossing barriers within these subsections (Little and Righter, 1965; Garrett, 1979; Critchfield, 1986) attests to their limited genetic divergence and close phylogenetic affinity. A relatedyet possibly more vexingobstacle to resolving recently diverged pine groups is the apparent longevity of allele lineages in conifers. Standard phylogenetic methods rely on the assumption that intraspecific divergence is low relative to interspecific divergence; restated, the branches derived from gene geneologies need to be long compared to within-species coalescence times. This "long branch assumption" has recently been shown to be violated in species of Picea across three low-copy genes (Bouillé and Bousquet, 2005), and at nrITS in Pinus and Picea (Gernandt et al., 2001; Campbell et al., 2005). Comparisons of three nuclear loci from Picea abies, Picea glauca and Picea mariana show that non-coalescence among species was commonplace (Bouillé and Bousquet, 2005). The estimated coalescence time between randomly selected alleles from any two species ranged from 1018 million years ago, and these estimates overlap with estimated species divergence times (13 20 mya). Similarly, preliminary genetic data gathered from the comparably aged North American five-needle pines (Willyard et al., unpublished manuscript; Syring, unpublished) shows a similar pattern for two of the loci used in this study (1FG8612, AGP6). As might be expected, narrowly restricted species (e.g., P. albicaulis, a timberline endemic) exhibit species-level allelic coalescence, but widespread species (e.g., P. lambertiana, sugar pine) share allele lineages with other North American (even East Asian) pine species. If trans-species shared polymorphisms are commonplace inPinus and Picea, itseems likely that similar trends will be revealed in other plant groups with long fossil representation, similar life histories, and large population sizes. Bouillé and Bousquet (2005) cautioned that widespread lack of species-level coalescence may impede phylogenetic estimation in conifers and other trees with similar ecological and reproductive traits. These authors "call for caution in estimating congeneric species phylogenies from nuclear gene sequences" in conifers. Whilewe acknowledge that their results are likely to apply to closely related pine species (within subsections), the congruence of results from four low-copy independent datasources presented 35

herein shows that these markers confidentally resolve phylogenetic relationshipsamong more distantly related pines. This finding underscores the importance of historical eventson the utility of nuclear markers, and in the interpretation of phylogenies derived from theiruse. Considering the dramatic impact intraspecific polymorphism can have on phylogenetic results,we predict that a final resolution of the pine phylogeny will incorporate many sources of data (such as those described in this paper), and a mix of traditional phylogenetic approaches (parsimony, likelihood) and coalescent analyses to evaluate the relative age of species and their complex genealogical relationships.

ACKNOWLEDGEMENTS We thank David Neale, Garth Brown, and Konstantin Krutovsky (USDA Forest Service, Institute of Forest Genetics, Davis, CA) for assistance selecting conifer anchor loci, David Gernandt (Universidad Autónoma del Estado de Hidalgo, Mexico) for sharing cpDNA data prior to publication; and Katie Farrell for laboratory assistance. Roman Businsky (Silva Tarouca Research Institute for Landscape and Ornamental Gardening, Prühonice, Czech Republic), MartinGardner, Fiona Inches, and Philip Thomas (Royal Botanic Garden Edinburgh, Scotland), David Gernandt (Universidad Autónoma de Hidalgo, Mexico), Dave Johnson (USDA Forest Service, Institute of Forest Genetics, Placerville, CA), Richard Sneizko (USDA Forest Service, Dorena Genetic Resource Center, Cottage Grove, OR), and Don Zobel (Oregon State University) madegenerous contributions of seed and needle tissue. Funding for this studywas provided by the National Science Foundation grant DEB 0317103 to Aaron Liston and Richard Cronn, and theUSDA Forest Service Pacific Northwest Research Station.

REFERENCES

Alvarez, I. and J. F. Wendel. 2003. Ribosomal ITSsequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29:417-34.

Alvarez, I., R. C. Cronn, and J. F. Wendel. 2005. Phylogenyof the New World diploid cottons (Gossypium L., Malvaceae) based on sequences of three low-copy nucleargenes. Plant Systematics and Evolution 252:199-2 14.

Bouillé, M. and J. Bousquet. 2005. Trans-species shared polymorphismsat orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): Implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany 92:63-73. T1

Brown, G. R., E. E. Kadel Ill, D. L. Bassoni, K. L. Kiehne, B. Temesgen, J. P.van Buijtenen, M. M. Sewell, K. A. Marshall, and D. B. Neale. 2001. Anchored reference loci in loblolly pine (Pinus taeda L.) for integrating pine genomics. Genetics 159:799-809.

Brown, G. R., G. P. Gill, R. J. Kuntz, C. H. Langley, and D. B. Neale. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences 101: 15255-15260.

Bull, J. J., J. P. Huelsenbeck, C. W. Cunningham, D. L. Swofford, and P. J. Waddell. 1993. Partitioning and combining data in phylogenetic analysis. Systematic Biology 42:384-397.

Campbell, C. S., W. A. Wright, M. Cox, T. F. Vining, C. S. Major, and M. P. Arsenault. 2005. Nuclear ribosomal DNA internal transcribed spacer 1 (ITS 1) in Picea (Pinaceae):sequence divergence and structure. Molecular Phylogenetics and Evolution 35:165-185.

Chaw, S. M., A. Zharkikh, H. M. Sung, T. C. Lau, and W. H. Li. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear l8S rRNA sequences. Molecular Biology and Evolution 14:56-68.

Chee P.W, M. Lavin, L.E. Talbert. 1995. Molecular analysis of evolutionary patterns in U genome wild wheats. Genome 38:290-297.

Chen, J., C. G. Tauer, and Y. Huang. 2002. Nucleotidesequences of the internal transcribed spacers and 5.8S region of nuclear ribosomal DNA in Pinus taeda L. and Pinus echinata Mill. DNA Sequence 13:129-31.

Clegg, M. T., B. S. Gaut, G. H. Learn Jr., and B. R. Morton. 1994. Rates andpatterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences 91:6795-6801.

Critchfield, W. B. 1986. Hybridization and classification of the white pines (Pinus section Strobus). Taxon 35:647-656.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. American Journal of Botany 89:707-725.

Cronn, R. C., R. L. Small, T. Flaselkorn, and J. F. Wendel. 2003. Cryptic repeatedgenomic recombination during speciation in Gossypium gossypioides. Evolution 57:2475-2489.

Cunningham, C. W. 1997. Can three incongruence tests predict when data shouldbe combined? Molecular Biology and Evolution 14:733-740.

Devey, M. E., S. D. Carson, M. F. Nolan, A. C. Matheson, C. Te Riini, and J.Hohepa. 2004. QTL associations for density and diameter in Pinus radiata and the potential formarker-aided selection. Theoretical Applied Genetics 108:516-524.

Don, R. H., P. T. Cox, B. J. Wainwright, K. Baker, and J. S. Mattick. 1991.'Touchdown' PCR to circumvent spurious priming during gene amplification. Nucleic AcidsResearch 19:4008. 37

Dvorak, W. S., A. P. Jordan, G. P. Hodge, and J. L.Romero. 2000. Assessing evolutionary relationships of pines in the Oocarpae and Australes subsections using RAPD markers. New Forests 20:163-192.

Eisen, J. A. and C. M. Fraser. 2003. Phylogenomics: intersection of evolution and genomics. Science 300:1706-1707.

Farjon, A. 1990. A Bibliography of Conifers. Regnum Vegetabile. 122. Koeltz Scientific Books, Koenigstein, Germany.

Farjon, A. 1998. World checklist and bibliography of conifers. Royal Botanical Gardens Kew, Richmond, U.K.

Farris, J. S., M. Källersjo, A. G. Kiuge, and C. Bult. 1994. Testing significance of incongruence. Cladistics 10:315-319.

Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

Friesen, N., A. Brandes, and J. S. Heslop-Harrison. 2001. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Molecular Biology and Evolution 18:1176- 1188.

Fulton, T. M., R, Van der Hoeven, N. T. Eannetta, and S. D. Tanksley. 2002. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. The Plant Cell 14: 1457-1467.

Garrett, P. W. 1979. Species hybridization in the genus Pinus. USDA Forest Service Research Paper NE-436. U.S.D.A. Forest Service Northeastern Forest Experiment Station, Broomall, Pennsylvania, USA.

Geada Lopez, G., K. Kamiya, and K. Harada. 2002. Phylogenetic relationships ofdiploxylon pines (subgenus Pinus) based on plastid sequence data. International Journal ofPlant Science 163:737-747.

Germano, J. and A. S. Klein. 1999. Using species-specific nuclear and chioroplastsingle nucleotide polymorphisms to distinguish between Picea glauca, P. mariana, andP. rubens from single needle samples. Theoretical and Applied Genetics 99:37-49.

Gernandt, D. S., A. Liston, and D. Piflero. 2001. Variation in the nrDNA ITS of Pinussubsection Cembroides: implications for molecular systematic studies of pine speciescomplexes. Molecular Phylogenetics and Evolution 21:449-467.

Gernandt, D. S., A. Liston, and D. Piflero. 2003. Phylogenetics of Pinus subsectionsCembroides and Nelsoniae inferred from cpDNA sequences. Systematic Botany 4:657-673.

Gernandt, D. S., G. Geada Lopez, S. 0. Garcia, and A. Liston. 2005. Phylogenyand classification of Pinus. Taxon 54:29-42. 38

Grotkopp, E., M. Rejmanek, M. J. Sanderson, and T. L. Rost. 2004. Evolution ofgenome size in pines (Pinus) and its life-history correlates: Supertree analyses. Evolution 58:1705-29.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98.

Harris, E. E. and T. R. Disotell. 1998. Nuclear gene trees and the phylogenetic relationships of the mangabeys (Primates: Papionini). Molecular Biology and Evolution 15:892-900.

Hart, J. A. 1987. A cladistic analysis of conifers: preliminary results. Journal of the Arnold Arboretum 68:269-307.

Hipp, A. L., J. C. Hall, and K. Sytsma. 2004. Congruence versus phylogeneticaccuracy: revisiting the incongruence length difference test. Systematic Biology 53:81-89.

Huelsenbeck, J. P., J. J. Bull, and C. W. Cunningham. 1996. Combining data in phylogenetic analysis. Trends in Ecology and Evolution 11:152-158.

Iglesias, R. G. and M. J. Babiano. 1999. Late-embryogenesis-abundant (LEA) protein from Pseudotsuga menziesii seeds. Plant Physiology 1 19:806-806.

Jansson, S. and P. Gustafsson. 1990. Type I and Type II genes for the chlorophyll a/b-binding protein in the gymnosperm Pinus sylvestris (Scots pine): eDNA cloning andsequence analysis. Plant Molecular Biology 14:287-296.

Johnson, L. A. and D. E. Soltis 1998. Assessing congruence: empirical examples frommolecular data. In D. E. Soltis, P. 5. Soltis, and J. J. Doyle [eds.], Molecular Systematics of PlantsII: DNA Sequencing, 297-348. Kluwer Academic Publishers, Norwell, Massachusetts,USA.

Karalamangala, R. R. and D. L. Nickrent. 1989. An electrophoretic study of representativesof subgenus Diploxylon of Pinus. Canadian Journal of Botany 67:1750-1759.

Kinlaw, C. S. and D. B. Neale. 1997. Complexgene families in pine genomes. Trends in Plant Science 2:356-359.

Krupkin, A. B., A. Liston, and S. H. Strauss. 1996. Phylogenetic analysis of the hardpines (Pinus subgenus Pinus, Pinaceae) from chioroplast DNA restriction site analysis. AmericanJournal of Botany 83:489-498.

Krutovsky, K. V., M. Troggio, G. R. Brown, K. D. Jermstad, and D. B. Neale.2004. Comparative mapping in the Pinaceae. Genetics 168:447-461.

Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecularevolutionary genetics analysis software. Bioinformatics 17:1244-1245.

Kusumi, J., Y. Tsumura, H. Yoshimaru, and H. Tachida. 2002. Molecularevolution of nuclear genes in Cupressaceae, a group of conifer trees. Molecular Biology and Evolution 19:736-747.

Le Maitre, D. C. 1998. Pines in cultivation: a global view. In D. M. Richardson[ed.], Ecology and biogeography of Pinus, 407-431. Cambridge University Press, New York, New York, USA.

Liston, A., W. A. Robinson, D. Piñero, and E. R. Alvarez-Buylla. 1999. Phylogenetics of Pinus (Pinaceae) based on nuclear ribosomal DNA internal transcribed spacer regionsequences. Molecular Phylogenetics and Evolution 11:95-109.

Liston, A., D. S. Gernandt, T. F. Vining, C. S. Campbell, and D. Piflero. 2003. Molecular phylogeny of Pinaceae and Pinus. Acta Horticulturae 615:107-1 14.

Little, E. L.JR.and F. I. Righter. 1965. Botanical descriptions of forty artificial pine hybrids. USDA Forest Service Technical Bulletin 1345. U.S.D.A. Forest Service, Washington, D.C., USA.

Malcomber, S. T. 2002. Phylogeny ofGaertneraLam. (Rubiaceae) based on multiple DNA markers: evidence of rapid radiation in a widespread, morphologically diversegenus. Evolution 56:42-57.

Miyamoto, M. M. 1996. A congruence study of molecular and morphological data for Eutherian mammals. Molecular Phylogenetics and Evolution 6:373-390.

Nkongolo, K. K., P. Michael, and W. S. Gratton. 2002. Identification and characterization of RAPD markers inferring genetic relationships among pine species. Genome 45:51-58.

Pollock, D. D., D. J. Zwickl, J. S. McGuire, and D. M. Hillis. 2002. Increasedtaxon sampling is advantageous for phylogenetic inference. Systematic Biology 51:664-671.

Posada, D. and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.

Price, R. A., A. Liston, and S. H. Strauss. 1998. Phylogeny and systematics of Pinus. InD. M. Richardson [ed.], Ecology and biogeography of Pinus, 49-68. Cambridge UniversityPress, New York, New York, USA.

Rice, W. R. 1989. Analyzing tables of statistical tests. Evolution 43 :223-225.

Richardson, D. M., and P. W. Rundel. 1998. Ecology and biogeography of Pinus:an introduction. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 3-46. Cambridge University Press, New York, New York, USA.

Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome scale approachesto resolving incongruence in molecular phylogenies. Nature 425:798-804.

Rosenberg, M. S. and S. Kumar. 2001. Incomplete taxon sampling isnot a problem for phylogenetic inference. Proceedings of the National Academy of Sciences98:10751-10756.

Sanderson, M. 3. and H. B. Shaffer. 2002. Troubleshooting molecular phylogeneticanalyses. Annual Review of Ecology and Systematics 33:49-72. 40

Shaw, J., E. B. Lickey, J. T. Beck, S. B. Farmer, W. Liu, J. Miller, K. C. Siripun, C. T. Winder,E. E. Schilling, and R. L. Small. 2005. The tortoise and the hare II: relative utility of 21 noncoding chiorop last DNA sequences for phylogenetic analysis. American Journal of Botany 92:142- 166.

Showalter, A. M. 2001. Arabinogalactan-proteins: structure, expression, and function. Cellular and Molecular Life Sciences 58:1399-14 17.

Small, R. L., R. C. Cronn, and J. F. Wendel. 2004. Use of nucleargenes for phylogeny reconstruction in plants. Australian Systematic Botany 17:145-170.

Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, 0. Madsen, W. W. de Jong, and M.J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Molecular Biology and Evolution 18:1 32-43.

Strauss, S. H. and A. H. Doerksen. 1990. Restriction fragment analysis of pine phylogeny. Evolution 44:1081-1096.

Suchard, M. A., C.M.R. Kitchen, J. S. Sinsheimer, and R. E. Weiss. 2003. Hierarchical phylogenetic models for analyzing multipartite sequence data. Systematic Biology 52:649-664.

Swofford, D. L. 2002. PAUP*4.OblO: phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, Massachusetts, USA.

Takezaki, N., F. Figueroa, Z. Zaleska-Rutczynska, N. Takahata, and J. Klein. 2004. The phylogenetic relationship of tetrapod, coelacanth, and lungfish revealed by thesequences of forty-four nuclear genes. Molecular Biology and Evolution 21:1512-1524.

Temesgen, B., G. R. Brown, D. E. Harry, C. S. Kinlaw, M. M. Sewell, and D. B. Neale.2001. Genetic mapping of expressed sequence tag polymorphism (ESTP) markers in loblollypine (Pinus taeda). Theoretical Applied Genetics 102:664-675.

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavagesite maps with particular reference to the evolution of humans and theapes. Evolution 37:221-244.

Wang, X. R., Y. Tsumura, H. Yoshimaru, K. Nagasaka, and A. E. Szmidt. 1999.Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) basedon chloroplast rbcL, matK, rpl2O-rpsl8 spacer, and tm V intron sequences. American Journal of Botany 86:1742-1753.

Wang, X. Q., D. C. Tank, and T. Sang. 2000a. Phylogeny and divergencetimes in Pinaceae: evidence from three genomes. Molecular Biology and Evolution 17:773-781.

Wang, X. R., A. E. Szmidt, and H. N. Nguyen. 2000b. The phylogeneticposition of the endemic flat-needle pine Pinus kmempfii (Pinaceae) from Vietnam, basedon PCR-RFLP analysis of chioroplast DNA. Plant Systematics and Evolution 220:21-36.

Wendel, J. F. and J. J. Doyle 1998. Phylogenetic incongruence: windowinto genome history and molecular evolution. In D. Soltis, P. Soltis, and J. Doyle [eds.], MolecularSystematics of Plants II, 265-296. Chapman and Hall, New York, New York, USA. 41

Willyard, A., J. Syring, D. Gernandt, A. Liston, and R. Cronn. Unpublished manuscript. Molecular evolutionary rates indicate a recent and rapid diversification for modem pine lineages.

Wu, J., K. V. Krutovskii, and S. H. Strauss. 1999. Nuclear DNA diversity, population differentiation, and phylgenetic relationships in the California closed-cone pines based on RAPD and allozyme markers. Genome 42:893-908.

Yang, Z. 1998. On the best evolutionary rate for phylogenetic analysis. Systematic Biology 47:125-133.

Yoder, A. D., J. A. Irwin, and B. A. Payseur. 2001. Failure of the ILD to determine data combinability for slow loris phylogeny. Systematic Biology 50:408-424.

Zhang, X. H. and V. L. Chiang. 1997. Molecular cloning of 4-coumarate:coenzyme A ligase in loblolly pine and the roles of this enzyme in the biosynthesis of lignin in compression wood. Plant Physiology 113:65-74.

Zhang, Y., G. Brown, R. Whetten, C. A. Loopstra, D. B. Neale, M. J. Kieliszewski, and R. R. Sederoff. 2003. An arabinogalactan protein associated with secondary cell wall formation in differentiating xylem of loblolly pine. Plant Molecular Biology 52:91-102.

Zhang, Z. Y. and D. Z. Li. 2004. Molecular phylogeny of section Parrya of Pinus (Pinaceae) based on chloroplast matK gene sequence data. Acta Botanica Sinica 46:17 1-179. 42

Table 2.1. Sampled Pinus species andPiceaoutgroup. Taxa are listed by subgenus with information in the following format: taxona, section, subsection, voucher information", GenBank accession number (1FG1934, AGP6, 4CL, 1FG8612). A dash indicates the region was not sampled

(1FG8612 Picea),that sequences were taken from GenBank (P.taeda;see footnote), or that an

alternative voucher source was used (P.roxburghii).Information for nrITS and cpDNA are reported in the original citations with modifications noted in the methods.

Taxona; Section; Subsection; Voucher informationt'; 1FG1934, AGP6, 4CL, 1FG8612

Subgenus Pinus P. merkusi! Junghuhn & de Vriese; Pinus; Pinus; Thailand, X.R. Wang 956 (no voucher); AY6 17085, AY634320, AY634352, AY634337 P. roxburghii Sargent; Pinus; Pinaster; Nepal, Kew 1979.06 113 (K);, -, AY634353, - P. roxburghii Sargent; Pinus; Pinaster; India, S.C. Garkofij s.n., RMPC 0416e (no voucher); AY6 17086, AY634321, -, AY634338 P. radiala D. Don; Trfo1iae; Australes; California, USA, RMPC 0418 (OSC); AY617083, AY634318, AY634350, AY634335 P. taeda Linnaeus; Trfo1iae; Australes; Georgia, USA, R. Price s.n. (no voucher)e.f; -, AY634332 P. contorta Douglas ex Loudon; Trfoliae; Contortae; Oregon, USA, A. Liston 1219 (OSC); AY6 17084, AY634319, AY634351, AY634336 P. ponderosa Douglas cx P. & C. Lawson; Trfo1iae; Ponderosae; Oregon, USA, F.C. Sorenson s.n., RMPC 0415 (OSC); AY6 17082, AY6343 17, AY634349, AY634334 43

Table 2.1. Sampled Pinus species and Picea outgroup (Continued).

Subgenus Strobus P. Iongaeva D.K. Bailey; Parrya; Balfourianae; California, USA, D.S. Gernandt 03099 (OSC); AY6 17094, AY634329, AY634361, AY634346 P. monophylla Torrey & Fremont; Parrya; Cembroides; California, USA, D.S. Gernandt 404 (OSC); AY6 17092, AY634327, AY6343 59, AY634344 P. nelsonil Shaw; Parrya; Nelsoniae; Mexico, D.S. Gemandt & S. Ortiz 11298 (OSC & MEXU); AY6 17095, AY634330, AY634362, AY634347 P. gerardiana Wallich ex D. Don; Quinquefoliae; Gerardianae; Pakistan, R. Businslq' 41105 (MLOGdf; DQ018375, DQ018373, DQ018377, DQ018379 P. krempfii Lecomte; Quinquefoliae; Krempflanae; Vietnam, P. Thomas, First Darwin Expedition 242 (E); DQ018376, DQ018374, DQ018378, DQ018380 P. monticola Douglas ex D. Don; Quinquefoliae; Strobus; Oregon, USA, J. Syring 1001 (OSC); AY6 17090, AY634325, AY634357, AY634342

Outgroup Picea sitchensis (Bong.) Can.; Oregon, USA, J. Syring 1002 (OSC); AY6 17096, AY63433 1, AY634363,

Taxonomy follows Gemandt et al. 2005. Some taxonomis restrict P. merkusil to populations from the Phillipines and Sumatra, in which case the plant used here would be consided P. latteri Mason. Herbarium acronyms follow Index Herbariorum: http://sciweb.nybg.orscience2'IndexHerbariorum.a RMP is the Resource Management and Production Division of the Pacific Northwest Forest Science Center in Corvallis. Oregon RILOG is the Silva Tarouca Research Institute for Landsce and Ornamental Gardening, 252 43 Prhonice, Czech Republic DNA extracted from megagametopl'te rather than leaf tissue Sequences for P. taeda fromIFG 1934, AGP6,and4CLwere taken from GenBank (H75 103, AF 101785, U39405, respectively) Table 2.2. Variability of low-copy nuclear genes in Pinus, with comparisons to nrITS and cpDNA. !FG1934 Gene Region exon subg. Strobussub. Pinus Taxon LengthbAligned 770 769.5 (767, 770) Avg. Lengthc 770 (na)(range) %G +60.459.9%C Variable sites35 (7/3/25) (1/2/3)d Sitese 1710P1 Average DT (SE)0.01 72 (.003) Indels9 0 (0) AGP6 exon subg. Strobusgenussubg. Pinus Pinus 609582540770 769.8490.5503.5477.5 (767, (462-543)(465-543)(462-498) 770) 66.566.166.860.2 99(23/11/65)55(11/8/36)6644(4/8/32)15 (9/9/48) (3/2110) 2911554 0.00940.03750.03520.04350.0269(004) (.003) (.006)(.005) 2.41.7 (37.8) (47.1)(35.7) 0.10.2 (3) 4CL intronexon 1 1 subQ. Strobusgenussubg.subp. Pinus Pinus 472675 671.3667.5358.7 (630, (328-417) 675) 675 (na) 28.154.754.355.1 75(13/9/53)3222 (5/4/23)(3/5/14) 32 36 864 0.03630.01960.01220.036 (.007)(.004)(.003) (.005) 10.5 (7.0)0.10.2 (45) 0 (0) 1FG8612 exonAB suba.subq. Strobusgenussubc. Pinus Pinus 304365 201.7 (110-229)" 304 (na) 44.544.427.5 15 19(5/4/6) (4/7/8) 38 201562 0.02720.0759(.017)0.01 86(.009) (.004) 2.5(11.3) 0 (0) nrlTS intron A 5.8S,ITS1, subg. StrobusStmbussubg.subQ. Pinus Pinus 12631177635626 963.3610.7 (869-1024) (609-612)602908 (586-611) (765-976) 54.855.940.042.8 37(11/10/16) 124735180 171811 9 0.03070.04180.02980.0311 (.004)(.007) (.004)(.009) 13.0 (27.5)9.7 (11.0)3.3(1.3) 0(0) cpDNA rbcL matKITS2 + subg.genus Stmbus Pinus°subci. Pinus 2806515 515 (497-508) 2806 (na) 40.540.255.6 75(11/23/38)59 (16/8/34) 99 202155 0.00860.01050.06440.0418 (.001)(.007)(.005) 11.87.8 (2.5) (1.7) 00(0) (0) ba Excludes positions 95-32 1 of ITS! in the global alignment which were aligned by subgenus genus Pinus 2806 2806 (na) 40.4 as ned in the methods. 162 (43/42/77) 88 0.0177 (.002) 0 (0) dC not used in determiningTotalNumberAligned all variable other of lengthcharacters statistics positions in base perpresented in pairsspeciesalignment determined in averaged this (parsed table. usingover by codon all outgroups species position infrom the 1, alternative2alignment, subgenera"na" refers (P. to monticola or 3). no variability across species. or P. merkusii) or using Picea in the case of genus totals. Outgroups parenthesesg areAverage the NotP-distance,averageNumber number incorporating length oftheand parsimony ofproportion length thethe indelssection of informativeindels. of in nucleotideof base excludedNumber pairs. sites. sites ofalignment; inferred at which indelssee paired Methods were sequences counted for details are for different. each species in an alignment and averaged over total number of species. Numbers in 45

Table 2.3. Pairwise incongruence length difference test (ILD) results. Picea included in the analysis.

1FG1934AGF64CL 1FG8612 nrITS AGP6 0.812 4CL 0.610 0.702 1FG86120.390 0.143 0.079 nrITS 0.358 0.438 0.313 0.003** cpDNA 0.231 0.323 QØ45*0.003** 0.060

Notes: * significant at P0.05; ** significant at P0.01; 46

Table 2.4. Templeton test results of six loci and two combined data sets (one including all low- copy nuclear genes and the other a concatenation of all six loci). Outgroup rooting was used for all topologies.

Constraint Topology Low All Data Set #Trees 1FG1934 AGP6 4CL 1FG8612 nrlTS cpDNA Copy Loci 1FG1934 50 0469 0 542 0.503 0.491 0.635 0.629 0.629 AGP6 6 0 152 0 180 0.170 0.057 0.182 0.393 0.393 4CL 90 0205 0292 0.164 0.294 0.164 0.564 0.292

1 0.024* 1FG8612 0.102 0.101 0 012* 1 0 027* I 0.180 0.118 nrlTS 4 0.791 0.774 0.791 0 415 0 944 0.992 0.828 cpDNA 2 0.078 0.137 0.023* 0 004 0 010** 0.846 0.846

LowCopy 1 0.826 0.787 1.000 0.719 0.039* 0.206 1 000 All loci 3 0.058 0.0632 0.145 0.011* 0.019* 0.234 1 000

Notes: * significant at P0.05; ** significant at P0.01; double boxes are significant in both directions following sequential Bonferroni correction. 47

Table 2.5. Comparison of three data sets used in reconstructing Pinus phylogeny. Picea not included in statistics.

cpDNA nrITS low-copy tandem Number of genes 2 arrays 4 Average base pairs 2806 606 3451

Phylogenetically 88 61 206 informative characters (3.1%)(10.1%) (6.0%)

Linkage groups 1 1 4

Consistency Index 0.86 0.86 0.90 Resolved nodes in strict consensus 8 9 8 Nodes wI >70% bootstrap support (10 nodes max) 8 3 8 Nodes w/ >90% bootstrap support (10 nodes max) 5 0 6 %G+C 40.4 55.4 51.0 Average P-distance 0.017 0.064 0.038 48

/ CL 300bp a b,c, o -. 277-589 Exon 2 I Exon 3) listron2 - d'V

a. 4clpF Ib: GAGAGAGTAGCGGAATFCGCAGBiand on dclpFlb/4dRStrob: 4CL Primer Combinationa and length of amplicons: b Mon1F. GccAxrccTrrrrAcAAGc Total Length: 1085-1397 bp 4clpFlb/4clRSlrob: TAED 1236, ECHI 1186, POND I I86 MERK 1186, ROXB 1185 EclsF: GCSAATCC1TrCTACAAGC tnlron Variability: 4cIpFIb/MontR: CHIA 1037. FLEX 1037, LAMB 1036, MONT 1037, NELS 1126 d MoOR CTGTIAGATGCAAATCAGGTAG Th,ms 377-474 bp MontF/MonR STRO 885. MONO 1195. PEMO I 182, LONG 1150 e. 4clRSlrob: CTGCTFCTGTCATGCCGTA Strobas: 277-589 bp EchF/4clRSlrob: RADI 1055, CONT 1130 PICEA 896 bp Alignment Uted m Analyam: 885-1236 bp

j[g8612

858-1024 l__->214-217

b

a 861 2F1. TGT1'AGCATOCAATCAATCAC Total Length: 1162-1328 bp b. 8612R2: CTGACAGAGTGGGCAGCTTCAT Intron Variability: ifg1934 Picas: 869-1024 bp

Slrobao: 858-1020bp PIcEA. No Sequence

agp6 a. 1934F2: CTACTGCTGCAACAATGGCAA b. 1934112: TAGGCCCAGGCGTIS3UGTr PICEA: 770bp

b nrITS a agpOFl; TCAGGGTCAACAATGGCGTFC 17 b agp6Rl. GGGC1TVItAGTGCGGACG a 162 241-244

Eson Variability: Total Length: 520-604 bp I 5.8S I Picas. 520-556 bp Alignment Uced in Analyom: 462-543 bp Subrepeals° ITSI ITS2 ' Stroba 529-604 bp a. P1N2451 GGCG/CACCCTAGTCCTIC1t PIcEA: 505 bp b. 26S25R: TATGCTrAAACTCAGC000T

Intron Variability: Total Length: 625-641 bp ITS! Picas: 207-210 bp Alignment Used in Analyam: 600-616 bp

ITSI Slrobus: 197-208 bp for details of tandem subrepeats see Gemand!CIa! 2001 ITS2 Picas: 241-244 bp

ITS2 Slrobas: 244 bp PICEA: partial sequence only

Figure 2.1. Gene diagrams for the nuclear loci included in this study. Shown is the size (bp) of each locus and the location of exons (open bars) and intronsor noncoding regions (lines). Arrow heads on exons and introns indicate that the locusor domain extends beyond our sample. Primers used for amplification and sequencingare indicated by small arrows above and below gene diagrams; primer names andsequences are associated with each diagram. Amplicon lengths for Picea amplicon andranges for Pinus subgenera are given.

50

ifg1934 27 trees C -770 L -185 CI -0.83 RI = 0.85 agp6 3 trees C609 L -115 CI -0.90 RI -0.88 MERKUSH 98rTAEDADIATA ROXBURGHII n-H PONDEROSA 64 TAEDA *tfO!ORTA 62 RADIATA 67 90 MERKUSII 65 PONDEROSA ROXBURGHII CONTORTA KREMPFH 94 KREMPFII 100 65 MONTICOLA 72 MONTICOLA ERARDIANA 91 GERARDIANA 83 NELSONII MONOPHYLLA 73 MONOPHYLLA - I LONGAEVA * LONGAEVA

4CL. 12 trees C -1315 L -214 CI -0.97 RI = 0.96 ifg8612 2 trees C = 2644 L -274 C10.94 RI -0.86 ROXBtJRGHII ROXBURGHII 91 100 MERKUSII MERKUSII 61 CONTORTA [___NDEoNDERoSA 88 80 TAEDA CONTORTA 89 PONDEROSA 50* TAEDA RADIATA RADIATA 83 KREMPFII NELSONII 81 73 j- MONTICOLA Ii 73 KREMPFII 89 MONTICOLA L_RDIANA I 92 LONGAEVA * L___ GERARDIANA MONOPHYLLA 90 MONOPHYLLA 1 NELSONII - 1 LONGAEVA

nrITS 3treesC742L=247CI-O.86RI-0.83 cpDNA 3 trees C -2806 L -272 CI -0.86 RI =0.88

52 PONDEROSA CONTORTA 98 r-1 CONTORTA PONDEROSA HTAEDA r-1 72 I 6t__ 100 i-H ATTENUATA r-H TAEDARADIATA ROXBURGHII 53. MONTICOLA MONTICOLA KREMPFII GERARDIANA I GERARDIANA 100 53* KREMPFII LI MONOPHYLLA 85 MONOPHYLLA - 1 HARISTATA LONGAEVA NELSONII l_ NELSONII

Figure 2.3. Most-parsimonious trees derived from individual loci. Trees selected for presentation are those that most closely match the topology of the maximum likelihood tree. Bootstrap values from 1000 replicates and TBR branch swapping are shownnear nodes. All trees are rooted with Picea except 1FG8612 which is midpoint rooted; for clarity the outgroup branch has been omitted. Pinus aristata and P attenuata are used in the nrITS phylogeny as placeholders for P longaeva and P radiata, respectively. Cnumber of characters, L=length of trees, Clconsistency index, RI=retention index. Asterisks (*) indicate nodes that collapse in the strict consensus tree. 51

A. Combined Low-Copy Nuclear B. Low-Copy Nuclear, cpDNA, nrITS

CONTORTA CONTORTAE

96 80 100 PONDEROSA PONDEROSAE

91 * TAEDA AUSTRALES 100 100 * RADIATA AUSTRALES

71 MERKUSII PINUS 92 ROXBURGHII PINASTER

KREMPFH KREMPFIANAE 94 99 MONTICOLA STROBUS 70 61 100 GERARDIANA GERARDIANAE 100

NELSONI! NELSONIAE 100

98 MONOPHYLLA CEMBROITIES 91 LONGAEVA BALFOURIANAE

3 trees C = 5338 L = 801 CI = 0.90 RI = 0.87 I tree C = 8886 L = 1326 CI = 0.88 RI = 0.86

Figure 2.4. Most-parsimonious trees derived from four low-copy nuclear loci combined (A), and cpDNA [matK+rbcL], nrITS, and four low-copy nuclear loci combined (B). Species names are used in (A) and their respective subsections are used in (B); the topologies are identical. Bootstrap values from 1000 replicates and TBR branch swapping are shown near nodes. Both trees are rooted with Picea; for clarity the outgroup branch has been omitted. Crnumber of characters, Llength of trees, Clconsistency index, Rlretention index. Asterisks (*) indicate nodes that collapse in the strict consensus tree. 52

35 I IsiWhhL 30 E 25 V

C., 20

15 a)

10

5

[SI 4 , \ , ,\f \f,, I I \Q I I (y \(

Locus

Figure 2.5. Percent variable characters by coding and non-coding regions forthe six data sets examined in this study. Percent variable charactersare shown at the top of each bar; average number of characters indicated in parentheses following the locusname. 53

CHAPTER 3

Widespread genealogical nonmonophyly in species of Pinus subgenus Strobus

John Syring', Kathleen Farrell1, Roman Businsk2, Richard Cronn3, and AaronListon1

'Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331

USA 2Silva Tarouca Research Inst it ute for Landscape and Ornamental Gardening, Przhonice, Czech Republic 3Pacflc Northwest Research Station, USDA Forest Service, 3200 SW Jefferson Way, Corvallis, Oregon 97331 USA

In press at Systematic Biology 54

ABSTRACT

Phylogenetic relationships amongPinusspecies from subgenus Strobus remain unresolved despite combined efforts based on nrITS and cpDNA. To provide greater resolution among these taxa, a 900-bp intron from a Late Embryogenesis Abundant (LEA)-like gene (1FG86 12) was sequenced from 39 pine species, with two or more alleles representing 33 species. Nineteen of 33 species exhibited allelic nonmonphyly in the strict consensus tree, and 10 deviated significantly from allelic monophyly based on topology incongruence tests. Intraspecific nucleotide diversity ranged from 0.0 0.0211, and analysis of variance shows that nucleotide diversity was strongly associated (P<0.0001) with the degree of species monophyly. While species nonmonophyly complicates phylogenetic interpretations, this nuclear locus offers greater topological support than previously observed for cpDNA or nrITS. Lacking evidence for hybridization, recombination, or imperfect taxonomy, we feel that incomplete lineage sorting remains the best explanation for the polymorphisms shared among species. Depending on the species, coalescent expectations indicate that reciprocal monophyly will be more likely than paraphyly in 1.71 to 24.0x106years, and that complete genome wide coalescence in these species may require up to 76.3x106years. The absence of allelic coalescence is a severe constraint in the application of phylogenetic methods inPinus,and taxa sharing similar life history traits withPinusare likely to show species nonmonophyly using nuclear markers.

INTRODUCTION Whenever a phylogenetic study uses a single individual to represent a species, an implicit assumption is made that the species is monophyletic (Shaw and Small, 2005; Funk and Omland, 2003). To the extent that complicating factors (e.g., reticulation) are rare in the divergence history of terminal taxa, this assumption offers a convenient simplification for sampling. However, sampling a single individual per species provides no opportunity to test the hypothesis of allelic monophyly (coalescence) within species. In molecular phylogenetics, this issuecan become acute, since gene trees are used to infer organismal phylogenies. These gene trees are subject to processes that can result in the nonmonophyly of sequences sampled from a single species (lineage sorting, reticulate evolution; Nei, 1987, Wendel and Doyle, 1998). Errors in phylogenetic estimation can occur when gene tree I species tree incongruence exists, but species sampling is 55

insufficient to detect the responsible phenomena. Many species remain poorly known, and the presence of cryptic taxa or an inadequate taxonomic treatment can be recognized when nonmonophyletic species are discovered in a phylogenetic analysis. Awareness of the existence and complications arising from intraspecific polymorphism is growing. This is illustrated by recently published plant molecular phylogenetic studies (Table 1) that increasingly include sampling to evaluate species level monophyly. Not surprisingly, varying levels of nonmonophyly are encountered as more intensive population-level sampling is included in phylogenetic analyses (Table 1). Despite the prevalence of nonmonophyly, many recent studies in plants do not include multiple samples per species, even in the species rich genera Rhododendron (86 included species; Goetsch et al., 2005), Aconitum (54 included species; Luo et at., 2005), Utricularia (31 included species; Muller and Borsch, 2005), Solanum (14 included species; Levin et al., 2005), Viburnum (41 included species; Winkworth and Donaghue), Silene (16 included species; Popp and Oxelman, 2004), and Dioscorea (67 included species; Wilkin et al., 2005). While many examples of species nonmonophyly are the direct result of inadequate phylogenetic signal (Cross et at., 2002; Roalson and Friar, 2004; Levin and Miller, 2005; Shaw and Small, 2005), recent publications across the plant kingdom have demonstrated that species- level paraphyly and polyphyly can be well supported (Roalson and Friar, 2004; Alvarez et al., 2005; Church and Taylor, 2005; Kamiya et al., 2005; Oh and Potter, 2005; Yuan Ct al., 2005). Causative factors responsible for species nonmonophyly are often difficult to establish. Factors commonly cited include introgressive hybridization (Roalson and Friar, 2004; Kamiya et al., 2005; Mason-Gamer, 2005; Shaw and Small, 2005), incomplete lineage sorting (Chiang et al., 2004; Bouillé and Bousquet, 2005; Kamiya et al., 2005), unrecognized amplification of a paralogous locus (Roalson and Friar, 2004; Alvarez et al., 2005), recombination among divergent alleles (Schierup and Hem, 2000), and imperfect taxonomy including the occurrence of cryptic species (Goodwillie and Stiller, 2001; Treutlein et al., 2003; Roalson and Friar, 2004; Shaw and Small, 2005). Further, paraphyletic species may be the direct result of certain evolutionary processes, as suggested for recent progenitor-derivative speciation (Rieseberg and Brouillet 1994; Rosenberg, 2003). Funk and Omland (2003) reported a common trend of species-level coalescence failure from mitochondrial DNA studies in animals. Their survey of 584 studies and 2319 species found that 23.1% of the studies showed species-level paraphyly or polyphyly. Bouillé and Bousquet 56

(2005) recently demonstrated a striking case of trans-species allelic polymorphism in three low- copy nuclear genes in different species of spruce (Picea). Allelic coalescence times between randomly selected alleles from these spruce species were estimated at 1018 million years ago, values that overlapped with estimated divergence times (1320 million years ago) for the species studied. Because spruces share many life history traits with other temperate zone gymnosperm and angiosperm trees (e.g. highly outcrossing, long-lived perennials with large effective population sizes), Bouillé and Bousquet (2005) suggest that the incomplete lineage sorting phenomenon detected in Picea could hinder the utility of the nuclear markers in phylogenetic analyses of conifers and other trees. The similarities in life history traits between Picea and Pinus (both genera of Pinaceae) suggest that incomplete lineage sorting could be a common feature of the 100+ species in this genus, thus creating a potential obstacle to phylogenetic analyses based on low-copy nuclear genes. Pinus is a diverse and relatively ancient genus with origins that date to the late Cretaceous (145 125 million years ago; Alvin, 1960). Paleontological and molecular data suggest that the first major divergence event separating extant lineages occurred perhaps 85 45 million years ago (Miller, 1973; Meijer, 2000; Magallon and Sanderson, 2002; Willyard et al., in review), giving rise to two distinct lineages recognized today as subg. Pinus and subg. Strobus, the "hard" and "soft" pines, respectively (Liston et al., 1999; Gernandt et al., 2005; Syring et al. 2005). While there is extensive morphological variation in the genus, most character states exhibit homoplasy across subgenera and sections (Gernandt et al., 2005). The number of fibrovascular bundles per needle is the only diagnostic character that is non-homoplastic for the two subgenera (one for subg. Stro bus, two for subg. Pinus). Recent classifications of subg. Strobus recognize 36 40 species (Farjon, 2005; Gernandt et al., 2005), with complete agreement on 34 species and alternative treatments for the remaining taxa. Specific treatments for regions with high endemism, e.g., Mexico (Perry, 1991) and East Asia (Businsk, 1999, 2004), distinguish narrower taxonomic limits and thus recognize several additional species. Within subg. Strobus the recent classification of Gernandt el al. (2005) recognizes two sections, Quinquefoliae and Parrya, each containing three subsections (Table 2). Despite the relative antiquity of the subgenus Strobussubgenus Pinus split, molecular evidence suggests that extant species within sections from these subgenera showa high degree of genetic similarity. For example, average pairwise nucleotide divergence (iç) of species from subg. Strobus ranges from 0.84% for two chioroplast genes (Gernandt et al., 2005) and 5.6% for eleven 57

nuclear genes (Willyard et al., in review). By integrating this information with fossil calibrations, Willyard et al. (in review) showed that the lineages harboring the greatest number of soft pine species (Subsects. Strobus [21 species] and Cembroides [11 species]) arose between 10 and 20 million years ago. The combination of recent divergence, generally very large effective

population sizes (Ne),outcrossing reproduction, and large geographic ranges makes it likely that mutations have spread and become fixed slowly across a species' range. This presents the potential for long-lived allelic diversity that spans one or more speciation events. Pinus subg. Strobus has a long history of systematic inquiry (reviewed in Critchfield 1986; Price et al., 1998; Wang et al., 1999; Gernandt et al., 2005; Syring et al., 2005). To date, however, relationships among the terminal taxa remain nearly unresolved (reviewed in Syring et al., 2005). In pines, low-copy nuclear genes are an untapped resource for clarif'ing these terminal relationships, especially when genetic variation is interpreted within a framework where species monophyly can be assessed, and where the impact of incomplete lineage sorting can be determined. Data from multiple low-copy nuclear loci in pines (Syring et al., 2005) provides initial evidence that intraspecific diversity is confined within Pinus subsections, although the frequency of non-coalescence at the species level has not been previously addressed. In this paper, we present a phylogenetic analysis of subg. Strobus using the most informative nuclear locus identified in a recent survey (Syring et al., 2005), a ca. 900-bp intron localized within a Late Embryogenesis Abundant (LEA)-like gene. The goal of this study is to intensively sample the remaining species of subg. Strobus, particularly the species-rich subsects. Strobus and Cembroides, and to place them within the phylogenetic framework developed in prior studies (Gernandt et al., 2005; Syring et al. 2005) with multiple markers. In addition, we seek to examine the impact of intraspecific variation and patterns of allelic coalescence on the accuracy of the derived phylogeny. To achieve this goal, we sequenced a minimum of two alleles for 33 of the 39 species used in this analysis. This sampling strategy allows us to address three important questions: (1) How frequently do species show complete allelic coalescence at this locus?; (2) In the absence of species monophyly, at what taxonomic rank do the alleles coalesce?; and (3) If species-level non-coalescence is common, what insights can this provide regarding the nature of pine species, the process of speciation in Pinus, or operational species definitions within this group? 58

MATERIALS AND METHODS Plant materials Thirty-nine species of Pinus subgenus Strobus were sampled (Table 2). Haploid megagametophyte tissue was used as the DNA source for most amplifications and extracted using the FastPrep DNA isolation kit (Qbiogene, Carlsbad, California, USA). In the select cases where needle tissue was used, direct sequencing identified homozygotes and heterozygotes. Homozygous sequences were directly incorporated into the alignment, while heterozygous sequences were cloned into pGem-T Easy (Promega, Madison, Wisconsin, USA). From each of the 39 sampled species, an effort was made to obtain a minimum of two unique alleles. Depending on availability, individuals were selected that spanned the geographic range of the species. In cases of narrow endemism or where collections were limited, two alleles were sequenced from the same individual. Estimates of species distribution area were based on published maps (Critchfield and Little, 1966; Malusa, 1992; Farjon and Styles, 1997; Delgado et al., 1999, Ledig et al., 1999). For the eight representatives of North American subsection Stro bus, three alleles were chosen from a more extensive data set (four to seven alleles per species; Cronn et al., in prep). The three alleles selected represent the most divergent sequences based on calculated p-distances.

Choiceofoutgroup As described in Syring et al. (2005), introns between pine subgenera are frequently unalignable due to numerous and overlapping indels, repeated motifs, and uncertain sequence homology. This precludes the use of members from subgenus Pinus or more distantly related genera as outgroups in this analysis. Independent evidence from five nuclear genes and cpDNA (Gemandt et al., 2005; Syring et al., 2005) shows that the sections of subg. Strobus are monophyletic and sister to each other; for this reason, members of one section were used as the outgroup for analyzing the alternative section. Pinus nelsonii (Sect. Parrya, subsect. Nelsoniae) is exceptional. Evidence from three nuclear genes (Syring et al. 2005) and cpDNA (Gernandt et al., 2005) resolve P. nelsonii as the sister lineage to the remaining members of sect. Parrya. In contrast, the LEA-like locus used in this study places P. nelsonii in a unique, moderately supported (71% BS) position sister to sect. Quinquefoliae when midpoint rooting is employed. Because of this uncertainty, we chose to include P. nelsonii as a member of the outgroup in 59

analyzing both sects. Quinquefoliae and Parrya. Therefore, P. monticola, P. krempfii, P. gerardiana, and P. nelsonii were used as outgroup taxa for analyzing sect. Parrya, and P. aristata, P. monophylla, and P. nelsonii were used as outgroup taxa for analyzing sect. Quinquefoliae. While the LEA-like locus is too labile to provide insight into the placement of P. nelsonii, future studies using more evolutionarily constrained molecules will be used to investigate the position of this lineage.

Locus amp! jflcation, sequencing, alignment, and recombination analysis - Description of the LEA-like locus and the protocols for amplification, sequencing, and alignment are given in Syring et al. (2005). Gaps were coded as phylogenetic characters using the method of Simmons and Ochoterena (2000) and the on-line program Gap Recoder (R. Ree, http://maen.huh.harvard.edu: 8080/services/gap_recoder); all coded gaps were verified manually. Four previously published sequences (DQ018379, DQ018380, AY634346, AY634347) are incorporated in this study (Table 2). The alignment is available at TreeBase. Statistics calculated from the alignment include the average number of characters, number of variable and parsimony informative characters, average within-group p-distance, and average base composition (determined using MEGA 2.1; Kumar et al., 2001).

Recombination was assessed using sequence-based (maximumx2method) and

topological (difference in sum of squares; DSS) approaches. The maximumx2method (Smith, 1992; Posada and Crandall, 2001) identifies recombinant segments by testing for significant differences among proportions of variable and non-variable polymorphic positions in adjacent regions of aligned sequences. For all possible sequence pairs, a sliding window containing 10% of the variable positions was divided into two equal partitions and moved in one bp increments

x along the alignment. At each increment, a 2 2x2was calculated as an expression of the difference in the number of variable sites on each side of the partition for pairs of sequences.

Putative recombination points were identified by plottingx2values along the length of the alignment. The difference in sum of squares method (DSS; McGuire et al., 1997; Milne et al., 2004) identifies recombinants by using phylogenetic trees constructed from adjacent regions of an alignment. For all sequences, a 150 bp window was divided into two partitions and moved in 20 bp increments. For each increment, a distance matrix (Fitch 84) was calculated and a least squares tree constructed for each partition. Branch lengths were estimated using least squares, and sum of squares for each partition were recorded. Topologies for partitions were then swapped, branch lengths for the forced topology were determined, and sum of squares recorded. DSS values were plotted along alignments, and recombinant segments were identified by peaks in DSS values. Significance for both methods was determined by 1000 permutations (parametric bootstrapping) using a = 0.01 as a threshold of significance and the program RDP2 (Martinet al., 2005). Analyses were conducted on data sets where shared indels were discarded to prevent false positives.

Phylogenetic and statistical analysis Phylogenetic analyses were performed by taxonomic section using maximum parsimony (MP; PAUP* version 4.0b10; Swofford, 2003). Most parsimonious trees were found from branch-and-bound searches, with all characters weighted equally and treated as unordered. Branch support was evaluated using the nonparametric bootstrap (Felsenstein, 1985), with 1000 replicates and TBR branch swapping. Alternative phylogenetic hypotheses and statistical strength for species nonmonophyly were analyzed using constraints on tree topologies in PAUP*. The Wilcoxon signed-rank test (WSR; Templeton, 1983) was employed to test for significant differences among topologies. For this test, up to 1000 of the most-parsimonious trees were used as constraint topologies. The range of P values across all topologies is reported for every test. Statistical associations between intraspecific nucleotide diversity, geographic range (a proxy for species abundance and census population size), and extent of monophyly were explored using analysis of variance (ANOVA). For these tests, species were categorized into two "PI-JYLY" classes, either strongly nonmonophyletic (i.e., intraspecific allelic coalescence resulted in statistically significant topological distortion as shown by the WSR constraint test) or weakly nonmonophyletic to monophyletic (i.e., WSR constraint tests were insignificant). Nucleotide diversity was analyzed directly, and geographic ranges (km2; Critchfield and Little, 1966) were log 10-transformed to minimize skewness and kurtosis. Differences in univariate measures between PHYLY classes were tested using one-factor ANOVA. Statistical analyses were performed using SAS (SAS Institute Inc., 1999).

Coalescent expectations of genic and genomic monophyly- According to Rosenberg (2003) reciprocal monophyly is predicted to be more likely than

X paraphyly under conditions of neutrality at 1.67 2Ne diploid generations, and complete 61

genome-wide coalescence may require5.30x 2Nediploid generations. In order to make these calculations for species of Pinus, Watterson's estimate of Theta (0) was detennined for select species using DnaSP v.4.00.6 (Rozas et al., 2004). Effective population sizes were estimated

using the formulaNe= 01(4x tG),where11Gis the absolute silent mutation rate per nucleotide, adjusted for generation time. The absolute silent mutation rate for Pinus was recently estimated from eleven nuclear genes (Willyard et al., in review) to average 7.0x10.10 substitutions! site/year, assuming an 85 million year divergence of pine subgenera. Generation times for individual species were taken as the average for the range of years to seed bearing age cited in Krugman and Jenkinson (Woody Plant Seed Manual2nded., R. G. Nisley (ed), http://www.nsl.fs.fed.us/wpsm!). In this study we test the extent of monophyly for LEA-like allele lineages within species. For simplicity, we abbreviate this as 'species monophyly' or 'species nonmonophyly', but want to emphasize that we are referring to the coalescence of genealogies of the LEA-like gene and not the extent of monophyly for actual species.

RESULTS Sequences, alignment characteristics, allelic diversity, and recombination We acquired 86 LEA-like alleles from 39 species of Pinus subgenus Strobus (Table 2). From 14 of the 39 species we sequenced three unique alleles, and from 19 species we sequenced two unique alleles. From the remaining six species we obtained a single allele, although in three of these cases multiple sequences returned an identical allele; these include P. peuce (N = 3 different seeds), P.maximartinezii(N = 2), P. squamata (N = 2), and P. kwangtungensis, P. krempfii, and P. rzedowskii (N = I template each). In Pinus krempfii alone, the locus was amplified directly from diploid needle tissue; direct sequencing indicated this individual was homozygous. For three species having two unique alleles, further seed sampling recovered one of the known alleles: albicaulisAl (allele Al, N = 2),chiapensisAl(allele Al; N = 2), and parvjfioraA2 (allele A2; N = 3). For P.bungeana,P. culminicola, and P. morrisonicola both alleles were sequenced from the same individual; for P. discolor (Al and A3), P.koraiensis(Al and A2), and P. lambertiana (Al and A2) two of the three alleles were from the same individual.

Ten sequences included in this data set were cloned:koraiensisA1, koraiensisA2, longaevaA 1, nelsoniiA2, parvfloraA 1, pumilaA 1, remotaA3, and strobusA2. Based on this sample, our phylogenetic estimates of allelic monophyly can be assessed in 33 of 39 included species. 62

Our aligned sequence for the LEA-like locus was 1554 bp in length, with individual sequences averaging 898.5 bp from sect. Quinquefoliae (range = 821 971 bp) and 987.3 bp from sect. Parrya (range = 9371132 bp). The aligned sequence includes a partial exon on the 3' end with 44 complete and two partial codons (135 bp). The remaining sequence is intron and has an aligned length of 1419 bp. The exon has 19 variable positions within subg. Strobus (12 in sect. Quinquefoliae, eight in sect. Parrya), of which three were localized in first codon positions, four in second positions, and 12 in third positions. Within subg. Strobus, inferred amino acid replacements occur at seven of 44 sites. The intron segment included 249 variable sites and 136 parsimony informative (PT) sits within subg. Stro bus. This included 157 variable sites (88 P1) in sect. Quinquefoliae, and 110 variable sites (41 PT) in sect. Parrya. Nucleotide frequencies were relatively AT-rich (25.9% A, 33.5% T, 20.2% G, 20.6% C). Complex and simple indels are frequent across the length of the intron and range in length from a single nucleotide to a 110 bp deletion inarmandiiA1. In total, seventy-four gaps were scored and appended to the alignment.

Average interspecificitfor subg. Strobu.s at the LEA-like locus is 0.0330 ± 0.0029. Estimates of intraspecific iv show that soft pine species display a wide range of genetic diversity and differentiation. Across the 33 species with non-identical alleles, intraspecific nucleotide diversity ranged from 0.0 in P. albicaulis (alleles differ by one indel) to 0.02 11 in P. strobformis, and averaged 0.0085 (Table 3). Other species showing high nucleotide diversity in our sample include P. lambertiana (iv = 0.0196), P. johannis (0.0186), P. bhutanica (0.0185), P. monticola (0.0 155), P. edulis (0.0 158) and P. aristata (0.0149). Members of sect. Parrya showed a trend toward higher intraspecific nucleotide diversity relative to sect. Quinquefoliae, with 10 of 13 species (76.9%) versus 11 of 20 species (55.0%) showing iv? 0.005, respectively (Table 3). Only one allele was shared between two species, namelyayacahuiteAland flexilisAl. This observation was striking, since these samples came from wild collections separated by -3600 km (P. ayacahuite originated from the state of Mexico, Mexico and P. flexilis from Alberta, Canada). This allele has also been sequenced for P. sfrobformis from both New Mexico and Texas (J. Syring, unpublished). All other comparisons between P. ayacahuite and P. flexilis alleles show pairwise distances between 0.00 1 10.0043, suggesting a close genetic affinity. While not containing identical alleles, seven interspecific comparisons yielded highly similar alleles with p-distances0.0011: ayacahuiteAl /flexilisA3, ayacahuiteA2 IfiexilisAl, ayacahuiteA3 I strobformisA2, cembraA 1 / parv?fioraA2, bhutanicaA2 / wallichianaA2, bhutanicaA2 / wallichi anaA3,and culminicolaA2 / remotaA3. 63

Of the 86 LEA-like sequences used in this analysis, only one allele showed evidence of

recombination, despite a relatively lax threshold for significance (a = 0.01). The maximumx2 method identified one recombinant region from strobformisAl between positions 388399. Close inspection shows that strobformisA 1 contains an eleven nucleotide motif (CTTTTGTAGCC) with 37% identity to the same region (e.g., TT-CYACAGGA) from all other species of sect. Quinquefoliae. If this segment is recombinant, it likely arose via xenologous recombination or the PCR process because it lacks similarity to other homologous alleles. The topology-based DSS method provides no evidence for recombination in strobformisA 1 or other sequences. Since the single putatively recombinant segment from strobformisA 1 only adds autapomorphic steps to this lineage and does not distort the topology (data not shown), we included this sequence in all analyses (the potentially recombinant 12 bases were omitted from

estimates of7r).

Phylogenelic analyses Section Quinquefoliae From the branch and bound search of the sect. Quinquefoliae data set, twenty most parsimonious trees were recovered (Fig. 1). Trees were 350 steps in length, had a consistency index (CI) of 0.8229, and a retention index (RI) of 0.8835. Subsection Krempfianae is sister to a dade of subsects. Strobus and Gerardianae (67% bootstrap support, BS). Support for the topology within subsect. Gerardianae is relatively high with 81% BS for the monophyly of the subsection, and 90% BS for the resolution of the rare endemic P. squamata as sister to P. gerardiana and P. bungeana. Alleles from species from subsect. Strobus resolved into five major groups (Fig. 1), all of which were present in the strict consensus tree. Bootstrap support ranged from lacking (< 50%; clades A and D), or moderate (74%; dade E), to very strong (> 98%; clades B and C). Clade A includes alleles from five North America species (P. strobformis, P. albicaulis, P. strobus, P. lambertiana, P. monticola) and two east Asian species (P. koraiensis, P. pumila). Phylogenetic relationships within this group are unresolved, and bootstrap support is primarily limited to uniting alleles from the same species. The sole exception is the 100% BS uniting alleles lam bertianaA 1 and monticolaA 1. Despite the sister relationship among these alleles,a sister relationship among these species is contradicted by the lack of allelic coalescence within these two species (described below). 64

Sister to dade A are two strongly supported and exclusively North American groups, one of which includes alleles from five species (dade B; P. ayacahuite, P. flexilis, P. strobformis, P. monticola, P. lambertiana) and the second of which includes only P. chiapensis alleles (dade C). Clade C is the only group where species do not share alleles with another dade. While clades A and B are depicted as sister groups in Fig. 1, relationships between clades A, B, and C are essentially unresolved, and the node supporting dade C as sister to clades A / B collapses in the strict consensus tree. Relationships within dade B generally show little resolution with the exception of alleles from P. ayacahuite, P. flexilis, and strobformisA2, all of which form a dade with moderate support (76% BS). This association follows taxonomic interpretations for these three species, which have been interpreted as a dine of similar morphologies (Critchfield and Little, 1966) running from north (P. flexilis) to south (P. ayacahuite). In addition, the remaining alleles from P. monticola (A2, A3) both resolved within dade B, as did one allele from P. lambertiana (A2). The remaining Eurasian species are found in clades D and E. Clade D is composed entirely of alleles sampled from European (P. cembra, P. peuce) and Asian (P. bhutanica, P. wallichiana, P. sibirica, P. parvflora) species, with the exception of lambertianaA3 which is in an unsupported position as the sister to the remaining members of this dade. Pinus bhutanica and P. wallichiana have alleles in both dade D and dade E. The latter dade is composed strictly of alleles from species distributed in southeastern Asia (P. armandii, P. dalatensis, P. kwangtungensis, and P. morrisonicola).

Section Parrya - Branch and bound searches on the sect. Parrya data recovered 4495 most parsimonious trees (Fig. 2) 292 steps in length (CI=0.843, RI=0.883). Following Syring et al. (2005), subsects. Balfourianae and Cembroides were monophyletic sister lineages, with 97% BS each. Support for relationships within subsect. Balfourianae is high (97% BS), with the placement of P. longaeva as sister to the dade of P. aristata / P. balfouriana, although the latter species is nonmonophyletic in the strict consensus tree. Species of subsect. Cembroides resolved into four clades (F thru I; Fig. 2), all of which were present in the strict consensus tree with the exception of the culminicolaAl and discolorA3 alleles (these collapse out of dade F in the strict consensus tree). With the exclusion of these two poorly supported alleles and johannisA2 from dade I, each of the four major clades within subsect. Cembroides have moderate to strong support (7890% BS), though relationships among and within the clades remain largely unresolved. C lade F contains all alleles from P. cembroides and P. discolor, along with some of the alleles for P. culminicola, P. edulis, and P. remota, the remainder of which are found in dade G. The only bootstrap support found in dade F supports the exclusion of culminicolaAl and discolorA3 from the remaining members (79%). Aside from containing the remainder of the alleles for P. culminicola, P. edulis, and P. remota that are shared with dade F, dade G also contains all three alleles of P. monophylla and one of two P.johannis alleles. Support for the monophyly and relationships among the three alleles of P. monophylla is weak (57 and 55%, respectively), and all remaining support within dade 0 unites alleles from differing species. Clade H contains only the two alleles of P. quadrfolia (90% BS), and similar to dade C of sect. Quinquefoliae, is unique in being the only dade from this section where species do not share alleles with another dade. Clade I contains both alleles of P. pinceana and the singleton alleles of P. maximartinezii and P. rzedowskii. The support for the node uniting these three species is strong (89%), but the node supporting the sister relationship of P. maximartinezii and P. pinceana is only weakly supported (68%). Sister to these three species is the poorly supported johannisA2 (62%), whose other allele is found in a strongly supported position within dade G.

Species monophyly and topological conflict arising from genealogical nonmonophyly- Section Quinquefoliae The strict consensus tree indicates that of the 20 species for which multiple unique alleles were sequenced, 9 species (45%) were monophyletic (Table 3). For species exhibiting allelic monophyly, seven showed moderate to very high support (82100% BS). For species exhibiting paraphyly, eight had at least one well supported allele (83 100% BS) ensuring nonmonophyly. Most severe in this regard were the five species that possess alleles in two or more of the major clades: these include P. bhutanica (clades D, E), P. lambertiana (clades A, B, D), P. monticola (clades A, B), P. strobformis (clades A, B), and P. wallichiana (clades D, E). For species including three alleles, the proportion of monophyletic species was nearly identical to the broader sample (50%; 6 of 12 species), suggesting that small sample sizes generally capture enough intraspecfic heterogeneity to be indicative of allelic monophyly in this section.

Species-level monophyly in the strict consensus tree is not necessarily dependenton having low nucleotide diversity (Table 3). While intraspecific mean p-distancesare low for P. flexilis (0.001) and P. armandii (0.005), these species are not monophyletic. In contrast, P. strobus has a relatively high mean intraspecific p-distance (0.008) and is well supported as monophyletic (88% BS). However, of the seven species having mean intraspecific p-distances> 0.008, only P. pumila was monophyletic in the strict consensus tree, and none had bootstrap support> 50%. Enforcing topological constraints for species monophyly shows that 6 of the 11 nonmonophyletic members of sect. Quinquefoliae return significant results in the WSR test (Table 4), indicating statistical support for their nonmonophyly (a = 0.05). For example, constraining topologies for the monophyly of P. flexilis or P. ayacahuite led to trees with lengths 351 and 352, respectively (one step and two steps longer than unconstrained topologies). These topologies returned insignificant WSR results (P0.3 1730.6547 for P.flexilis and 0.1573 0.4 142 for P. ayacahuite) suggesting that the nonmonophyly of these species is not strongly supported. In contrast, results from P. lambertiana (369 steps, P<0.000 I 0.0003), P. monticola (368 steps, P=<0.00010.0001), and P. bhutanica (367 steps, P=<0.00010.0002) strongly support the polyphyly of alleles from these species.

Section Parrya The strict consensus tree indicates that of the 13 species for which multiple alleles were sequenced, only 5 species (38.5%) were monophyletic (Table 3). The remaining 8 species (61.5%) were either paraphyletic or polyphyletic. For those species showing allelic monophyly, four were strongly supported (85100% BS; the monophyly of P. longaeva has 70% BS). For those species showing paraphyly or polyphyly, five had at least one allele in a moderately or strongly supported position (7990% BS) ensuring nonmonophyly. Alleles from four species (P. culmicola, P. edulis, P. johannis, P. remota) are spread across two or more of the major clades. None of the five species represented with three sequences were monophyletic in the strict consensus tree, suggesting that, in contrast to sect. Quinquefoliae, additional sampling may reduce the number of monophyletic species in sect. Parrya. Following the trend in sect. Quinquefoliae, allelic monophyly is not a simple function of intraspecific genetic diversity (Table 3). While nucleotide diversity is low for the three alleles of P. monophylla (r =0.006), this species is not monophyletic in the strict consensus tree. In contrast, P. aristata (ir =0.0 15) and P. longaeva (iv =0.0 13), once considered to be a single species (see Bailey, 1970), show high nucleotide diversity yet are each monophyletic. 67

Enforcing topological constraints for species monophyly shows that four of the eight nonmonophyletic members of sect. Parrya return significant results in the WSR test (Table 4), and 335 of the 951 trees (35.2%) indicate that the nonmonophyly of P. discolor is significant. It is noteworthy that in both sects. Quinquefoliae and Parrya, all species whose alleles fall into two or more of the major clades return significant WSR results, thereby providing further confidence in the distinctivenessofthese clades.

Subgenus summary Across 33 species from subgenus Strobus, we found that 14 were monophyletic (42.4%) and 19 (57.6%) were nonmonphyletic at the LEA-like locus in the strict consensus tree (Table 3). Tenof33 species (30.3%) were supported as nonmonophyletic both in

the strict consensus tree and in the WSR constraint analyses (referred to as casesof"strong" nonmonophyly; Table 4). Data for the remaining nine species that were nonmonophyletic in the strict consensus tree, but did not return significant results in the WSR tests are considered ambiguous (including P. discolor; referred to as casesof"weak" nonmonophyly). Excluding casesofweak nonmonophyly, sixof15 species from sect. Quinquefoliae (40.0%), and fourof nine species (44.4%) from sect. Parrya show strong allelic nonmonophyly.

Across both pine sections, thePIIYLYofa sampleofalleles from a species (e.g., strongly nonmonophyletic versus weakly nonmonophyletic or monophyletic) was evaluated for statistical associations with two dependent variables, ir and log 10[geographic range]. These two variables are uncorrelated with each other (Pearson's R = 0.0554; P = 0.759). Individually,PHYLYshows a statistically significant association with nucleotide diversity. Average nucleotide diversities in species showing strong deviation from monophyly are significantly higher (ct=0.0 149) than comparable values from monophyletic and weakly nonmonophyletic species combined (a=0.0056; one-way ANOVA, P<0.0000; Table 5). Monophyletic and weakly nonmonophyletic species are not significantly different (2r =0.0049 and 0.0067 respectively; one-way ANOVA,

P'0.339; results not shown). A relationship between PIIYLY and geographic rangeofa species, in contrast, was not detected. Average geographic ranges for these two classes are essentially indistinguishable (one-way ANOVA, P<0.8007; Table 5), with the average geographicrange for strongly nonmonophyletic species at 40,615 km2, and the average range for weakly nonmonophyletic-to-monophyletic species at 52,516 km2. In combination, these analyses indicate that the most important determinantofallelic monophyly is intraspecific variation (and possibly 68

apportionment of genetic variation across species), rather than the geographic range of a species or their census population size.

Estimating Ne, the Time to Reciprocal Monophyly, and Genome- Wide Coalesence Estimates of N, the number of years for reciprocal monophyly to be more likely than paraphyly, and the number of years for complete genome-wide coalescence were calculated for three species of pines (Table 6). These species were chosen because they represent a range of nucleotide diversities (Table 3), as well as the full phylogenetic spectrum from essentially monophyletic (P. flexilis, iv = 0.0014), to weakly non-monophyletic (P. discolor; iv = 0.0112), to

strongly non-monophyletic (P. lambertiana;it= 0.0196). Estimates of 0 are not significantly

different than values ofit(based on Tajima's D statistic; data not shown), so our estimates of 0 are unlikely to reflect errors due to selection. Estimated values for Ne range from Ca. 1.7x1 for P. discolor to 12x iofor P. lambertiana (Table 6). Based on these estimates, the time for reciprocal monophyly to become more likely than paraphyly at this locus is substantially different across species, ranging from 1.7 million years for the nearly-monophyletic species P. flexilis, to 24 million years for the strongly nonmonophyletic P. lambertiana. The estimated time for complete genome-wide coalescence ranges from 5.4 million years to 76 million years.

DISCUSSION Phylogenetic implications - The subsectional topology of the strict consensus of 20 trees matched that presented in Syring et al. (2005). While there are nine detected cases of species having alleles in two or more of the major clades in subsects. Strobus and Cembroides, this study has detected no intraspecific variability that crosses the taxonomic subsections defined by Gernandt et al. (2005). In this respect, the subsectional exemplar sampling strategy used in Syring et al. (2005) is supported. The phylogeny presented in this paper resolves the same monophyletic subsections previously observed with both cpDNA (Gernandt et al., 2005) and nrDNA ITS (Liston et al., 1999). While species nonmonophyly complicates many phylogenetic interpretations within subsects. Strobus and Cembroides, the LEA-like intron offers greater support for the relationships among and within subsections than previously observed for cpDNA (Wang et al., 1999; Gernandt et al., 2005), and nrITS (Liston et al., 1999). Average p-distances across four subsections range from 0.0131 (subsect. Cembroides) to 0.0177 (subsect. Strobus). These values are 4.5 times (subsects. Strobus, Cembroides) to 6.1 times (subsect. Balfourianae) greater than comparable values for cpDNA (Gernandt et al., 2005). To date, infraspecific sampling remains insufficient to test for species-level monophyly in cpDNA. In additicii, despite greater divergence for nrITS (Liston et al., 1999) relative to the LEA-like locus in the range of 1.6 (subsect. Cembroides) to 2.9fold (subsect. Gerardianae), orthological complexity in rDNA precludes its use for assessing species-level monophyly. Considering that widespread allelic nonmonophyly was observed and the fact that this is a single locus estimate of relationships, phylogenetic conclusions are premature except in those cases where support is strong and species monophyly is clear. The two sections of subg. Stro bus are similar in having a species-rich subsection (subsects. Cembroides and Strobus), a species-poor subsection (subsects. Balfourianae and Gerardianae), and one monotypic subsection (subsects. Nelsonianae and KrempfIanae). In both sections the relationships within the species-poor subsections are resolved and well supported, while the relationships within the species-rich subsections are ambiguous. For example, this is the first study to find resolution among the three species of subsect. Balfourianae. Interestingly, P. aristata and P. longaeva are well differentiated at the LEA-like locus. The alleles from P. balfouriana, representing both the northern and southern subspecies (Oline et al., 2000), are an example of weak nonmonophyly (Table 4), but are both strongly supported (98% BS) in a dade with P. aristata (Fig. 2). In sect. Quinquefoliae the moderate support for subsection Gerardianae (81% BS) and the strong support for the sister relationship of P. bungeana and P. gerardiana (90% BS) are in agreement with the cpDNA data of Gernandt et al. (2005) and offer additional support for those relationships (Fig. 1). Two clades deserve mention, the first being Clade C (Fig. 1) which contains all of the allelic diversity of P. chiapensis. The isolated resolution of P. chiapensis was unexpected based on morphological assessments and observations that hypothesize an affinity to either P. strobus (Martinez, 1940, Farjon and Styles, 1997) or P. monticola (Andresen, 1966; Rzedowski and Vela, 1966; Wright et al., 1996). While the relationship of P. chiapensis to the remainder of the North American sect. Strobus is ambiguous, the distinctiveness of this species at the LEA-like locus is noteworthy. The second noteworthy dade is subsection Rzedowskianae sensu Farjon (2005) consisting of P. maximartinezii, P. pinceana, and P. rzedowskii, within dade I of subsection Cembroides (Fig. 2). Pinus rzedowskii is unique among subsect. Cembroides in having seed wings. The placement of P. rzedowskii in a well supported dade (89% BS) with P. "C

maximartinezii and P. pinceana is consistent with some nrITS topologies (Liston et al., 1999, but not Gernandt et al., 2001) and morphological analysis (Malusa, 1992). The sister relationship of P. maximartinezii and P. pinceana observed here (68% BS) is concordant with Gernandt Ct al. (2003) and Perez de la Rosa (1995). The cpDNA phylogeny of Gernandt et al. (2005) is ambiguous about the placement ofF. rzedowskii, but shows strong support (98% BS) for the sister relationship between P. maximartinezii and P. pinceana. The strong support (97% BS) for P. rzedowskii inside subsect. Cembroides further supports the decision not to recognize a monotypic subsect. Rzedowskianae (Price et al., 1998; Gernandt et al., 2005). Likewise, recognition of these three species as subsect. Rzedowskianae sensu Farjon (2005) could possibly create a paraphyletic subsect. Cembroides.

Possible factors leading to species-level nonmonophyly in Pinus A number of factors have been suggested as potential mechanisms for apparent species nonmonophyly based on cytoplasmic and nuclear markers (Pamilo and Nei, 1988; Rieseberg and Broulliet, 1994; Moore, 1995; Crisp and Chandler, 1996; Doyle, 1997; Shaw, 2001; Hudson and Coyne, 2002; Funk and Omland, 2003; Rosenberg, 2003; Sites and Marshall, 2003; Bouillé and Bosquet, 2005). Here, we review the main factors in reference to the LEA-like topologies (Figs. 1 and 2), and consider how these factors may have played a role in species level monophyly and nonmonophyly in subg. Strobus. As Funk and Omland (2003) point out, definitive causes for species polyphyly are difficult to prove, however, from the observed topological patterns causal inferences can be made.

(a) Inadequate phylogenetic signal The signature of insufficient phylogenetic signal is apparent, but not universal, in this data set. Inadequate information almost certainly plays a role in the nine cases of weak allelic nonmonophyly across the subgenus where insufficient data was available to determine the status of species monophyly. Two of the clearest examples are P. balfouriana and P. monophylla, which have no alleles separated by supported nodes in the strict consensus tree, and show insignificant nonmonophyly in WSR tests (Table 4). However, inadequate phylogenetic information cannot explain many of the cases of nonmonophyly. There were 14 species across the subgenus which had one or more alleles in a supported (79 100% BS) topological position that ensured allelic nonmonophyly. Further, of these 14 species, constrained topologies resulted in 10 species being significantly nonmonophyletic (WSR tests; 71

Table 4). The fact that 10 species are resolved as nonmonophyletic, show strong nodal support, and have significant WSR results provides compelling evidence that the species level polyphyly uncovered in this study is the product of true biological signal and not simply inadequate phylogenetic information.

(b) Imperfect taxonomy In pines, over-reliance on labile characters, such as number of needles per fascicle or the misinterpretation of intraspecific variation has caused imperfect taxonomic circumscriptions in subg. Strobus (reviewed in Farjon and Styles 1997; Businsk, 2004; Gernandt et al., 2005). While issues of taxonomic uncertainty remain, particularly in the Asiatic members of subsect. Strobus and the Mexican members of subsect. Cembroides, it appears that imperfect taxonomy plays only a minor role in explaining the observed species nonmonophyly. For taxa in subsect. Cembroides, broader species circumscription (e.g., synonymizing P. cembroides and P. remota (Farjon and Styles, 1997), synonymizing P. cembroides, P. discolor, and P. remota (Kral, 1993), or synonymizing P. cembroides, P. discolor, and P. johannis (Farjon and Styles, 1997)) does not result in monophyly due to allele sharing across the major clades or well-supported nodes within one of the major clades. Even the broadest species concept, which treats all taxa of clades F, G, and H as varieties of P. cembroides (Shaw, 1914, with the caveat that several of these taxa were described subsequent to his publication), would fail to yield a monophyletic P. cembroides sensu lato due to the position of johannisA2 in dade I (Fig. 2). The situation in sect. Quinquefoliae is similar to that of sect. Parrya. Pinus bhutanica was recently recognized as a subspecies of P. wallichiana (Businsk, 1999, 2004). Although their alleles are closely related, they appear in two separate clades (D and E), and thus even a broader species concept does not result in a monophyletic allele lineage. Pinus kwangtungensis, recently considered to be conspecific with P. wangii Hu & W. C. Cheng and related to P. parvfiora (Businsk', 2004), is found in the well supported (98% BS) dade E, several steps removed from P. parvflora. Pinus chiapensis, first described as a variety of P. strobus (Martinez, 1940), was very strongly supported (100% BS) as monophyletic in a monotypic dade C, and with no allele sharing between this species and P. stro bus. Further, in the 20 recovered trees for sect. Quinquefoliae, the alternative placement for P. chiapensis (dade C) was sister to dade B, which contains two of three alleles from P. monticola, but none of the diversity of P. strobus. In a final example, Critchfield and Little (1966) recognize P. strobformis as a morphological and geographical link between P. flexilis to the north and P. ayacahuite to the south. Neither a 72

broadly circumscribed P. ayacahuite nor P. flexilis would restore species monophyly in this taxonomic complex due to the diversity within P. strobformis.

(c) Hybridization and introgression The ability for pine species to interbreed under artificial (Little and Righter, 1965; Garrett, 1979; Critchfield, 1986) and natural conditions (P. monophylla

P. edulis: Lanner 1 974a; Lanner and Phillips, 1992; P. parvflorax P.pumila: Watano et al., 2004) has been well documented. In addition, there are at least three cases where hypothesized hybridization has given rise to a named taxon in subg. Sirobus, namely P. hakkodensis (Farjon, 1998), P. quadrfolia (Lanner, 1 974b), and P. edulis var. fallax (Little, 1968). However, none of these cases have been supported with molecular evidence, in contrast to the well-documented hybrid origin of P. densata in subg. Pinus (Wang et al., 2001; Song et al. 2003). In Pinus, natural hybridization is usually of local occurrence in areas of sympatry, and often associated with disturbance (Ledig, 1998). The two subgenera, Pinus and Strobus, are completely isolated, and crossing among subsections is very rare (Little and Critchfield, 1969). In order to determine whether hybridization could explain the distribution of alleles in the nonmonophyletic species, we looked specifically at the groups where hybridization has been documented either in the wild or under artificial conditions. In subsect. Strobus, hybridization between P. pumilax P.parv/lora (Watano et al., 2004) is not reflected in Fig. 1, where both species are monophyletic and found in unique clades (A and D, respectively). Gernandt et al. (2005) resolve Pinusparvfiora in a dade of North American species based on a cpDNA analysis. The discrepancy between cpDNA and the LEA-like locus may be a result of cpDNA

introgression. In subsect. Parrya, the documented hybridization between P. edulisx P. monophylla is widely recognized (Lanner, 1974a). In this case, the introgression of alleles in regions of sympatry could be observed, but we uncovered no cases of allele sharing among these species (Fig. 2). The hybrid origin of P. quadrfolia could not be evaluated, since we did not sample populations of putative parents "P. juarazensis" Lanner from Baja California and P. monophylla from southern California. Pinus lambertiana serves to illustrate the potential for allelic nonmonophyly in the absence of hybridization. This species is easily distinguished from all other members of subg. Strobus, and it is reproductively isolated from all North American pines (Critchfield, 1986; Fernando et al., 2005). In our study, P. lambertiana is polyphyletic, with three alleles appearing in as many clades in the strict consensus tree. The most recent common ancestor for all P. 73

lambertiana alleles is at the node supporting the divergence of dade D from clades A / B / C (Fig. 1, marked with an "L"). The allele lambertianaAl (dade A) is sister to monticolaAl with 100% BS, and could be interpreted as a potentially introgressant allele between these species. However, Critchfield (1975, 1986) demonstrated that P. lambertiana can hybridize only with Eurasian

members of subsect. Strobus (Critchfield, 1986), and the failure ofF. lambertianax P.monticola crosses traces to pre-fertilization barriers (Fernando et al., 2005). It is noteworthy that lambertianaA3 is found associated with dade D, an otherwise strictly Eurasian dade. There remains the possibility that allele A3 represents an ancient hybridization event, but this seems unlikely given the geographic and reproductive isolation of P. lambertiana.

(d) Paralogy versus orthology We have no evidence to suggest that the observed pattern of species nonmonophyly is the result of paralogous gene amplification. On the contrary, evidence suggests that the sequences amplified from across subg. Strobus were orthologous. The LEA-like locus has been determined to have low-copy number (Krutovsky et al., 2004), all amplifications yielded a single band when visualized, and sequences from haploid megagametophyte tissue never detected cases of nucleotide polymorphism.

(e) Lineage sorting Lineage sorting is the process by which ancestral polymorphism is 'sorted' and later fixed in daughter lineages, either by drift or by selection. Incomplete lineage sorting, by contrast, is the persistence and retention of ancestral polymorphisms through multiple speciation events. Incomplete lineage sorting can potentially impact any single locus gene tree in any taxon. The theoretical impact of incomplete lineage sorting on gene trees and inferred species trees has been known for some time (Nei, 1987; Doyle, 1992). Incomplete lineage sorting can be promoted by biological conditions that encourage the retention of genetic variability in a species, e.g., long life-spans, large Ne, and outcrossing. The retention of shared ancestral polymorphisms is also affected by natural selection (Broughton and Harrison, 2003), since balancing selection works to oppose directional selection and maintain genetic diversity. The timing of speciation events is another critical factor impacting the process of lineage sorting, since rapid radiations or multiple speciation events in quick succession reduce the chances for lineage sorting to reach completion before cladogenesis. If the time intervals between species divergence events are short relative to the time intervals between lineage-branching events in each species, ancestral polymorphisms may be carried through successive rounds of divergence. 74

Recent studies implicate incomplete lineage sorting as a major factor in the retention of polymorphism in plants (loerger et al., 1990; Comes and Abbott, 2001; Chiang et al., 2004; Bouillé and Bousquet, 2005) and animals (Nagl et al., 1998; Hare et al., 2002; Broughton and Harrison, 2003; citations in Funk and Omland, 2003). Estimates for the retention of ancestral polymorphisms range from 27 36 million years for SI alleles under balancing selection in the Solanaceae (Joerger et al., 1990), to 10 18 million years for genes of unknown function in three species of Picea (Bouillé and Bousquet, 2005). In Pinus, ancestral retention on this order could explain all cases of species nonmonophyly within each of the subsections. According to the molecular clock divergence estimates of Willyard et al. (in review), the species-rich subsections Cembroides and Strobus diverged from their sister lineages (subsects. Balfourianae and Gerardianae, respectively) between 20 and 10 million years ago, depending upon the calibration date used. Given the rapidity with which lineages radiated in these subsections (e.g., clades A E in Fig. 1, clades FI in Fig. 2), divergence events may have occurred so rapidly that lineage sorting rarely approached allelic fixation within daughter lineages.

7)Recombination The effect of recombination on reconstructing genealogies has been well- documented (reviewed in Posada et al., 2002). Since recombination produces sequence segments that have different genealogical histories, organismal history cannot be accurately depicted by a single phylogenetic 'tree', but rather a set of correlated trees across recombinant segments in the alignment. Under scenarios of ancient recombination, recombination between highly similar segments, or low recombination rates, the majority of positions sampled will accurately reflect phylogenetic history and the impact of recombination will be limited to alleles within species or recently diverged species (Schierup and Hem, 2000; Posada and Crandall, 2002). In contrast, recombination between divergent sequences or high recombination rates can produce inaccurate phylogenies that show artifactually long terminal branches, apparent trans-specific polymorphism (as shown here and the study of Bouillé and Bousquet, 2005), and phylogenies that are significantly different from the true histories underlying the data (Posada and Crandall, 2002). The locus examined in this study has certainly experienced recombination during the divergence of sections of subg. Strobus, but different methods employed (maximum x2; DSS) fail to detect recombination in our data. We chose these methods because they use different approaches to infer recombination, and because the maximumx2method shows high sensitivity and a low false error rate in detecting recombination from empirical data sets (Posada, 2002). The 75

low nucleotide diversity characteristic of pines from subg. Strobus (r = 0.02 for Sect. Quinquefoliae and 0.03 for Sect. Parrya) may be the primary obstacle in detecting recombination since iç values of-j 5% are required to obtain statistical power (Posada and Crandall, 2002). The added combination of high haplotype diversity and large effective population sizes for these species makes the detection of parental sequences and daughter recombinants unlikely given our sampling strategy. For pine species exhibiting monophyly across multiple LEA-like alleles (e.g., P. albicaulis, P. chiapensis; Figs 1, 2), undetected recombination will have mimimal impact on phylogenetic resolution since recombinants between 'divergent' alleles will show congruent phylogenetic signal. For species deviating significantly from monophyly, recombination among divergent alleles could have a pronounced impact. Simulations by Schierup and Hem (2000) show that low levels of recombination can produce trees that underestimate the amount of true divergence between parental alleles, and yield more 'star-like' phylogenies than would be obtained with non-recombined sequences. Clearly, the presence of divergent allelic polymorphism is primarily responsible for the pattern of nonmonophyly in pines; recombination simply adds complexity and uncertainty to the pattern.

How do pine species attain monophyly? - Rieseberg and Brouillet (1994) suggest that species concepts based on monophyly are inadequate, because paraphyletic species should be expected from progenitor-derivative speciation. Under that mode of speciation, paraphyly should be expected as a direct result of incomplete lineage sorting. Given enough time following a speciation event, drift or directional selection should theoretically lead to genome-wide monophyly via the sorting and extinction of lineages (Rieseberg and Brouillet, 1994; Rosenberg, 2003), so long as balancing selection does not maintain polymorphisms that predate speciation events. At the point when lineage sorting is complete all new mutation will result from the same lineage, and intraspecific variation will reflect post-speciation mutation. The situation in Pinus is more complex because multiple speciation events appear to have occurred before lineage sorting was completed in any single bifurcation event. As a consequence, ancestral polymorphisms have been retained through multiple speciation events. One potential outcome of ancestral allelic retention on this order is that allele lineages within species become polyphyletic. ,Li

The occurrence of paraphyletic and polyphyletic species in both Figs. 1 and 2 appears consistent with our estimates for the number of years until reciprocal monophyly is expected to be more likely than paraphyly (Table 6; Rosenberg, 2003). Even though our estimates of 0 are based on a small sample, the values in Table 6 are instructive. For example, the estimated time for reciprocal monophyly to be more likely than paraphyly is 13.4 million years for P. discolor. This value is bounded by the estimated age of sect. Cembroides, which is calculated at 19 million years (Willyard et al., in review). Topological analyses suggest that P. discolor is weakly nonmonophyletic (35% of WSR tests were significant; Table 4) at the LEA-like locus, a finding that shows surprisingly good agreement with the approximations of allele coalescence and molecular evolutionary age of the Cembriodes lineage. Nevertheless, the estimate for genome- wide coalescence in this species is ca. 43 million years, suggesting that portions of the genome may harbor deep trans-species polymorphisms, even under neutrality. For species with large values of 0, such as P. lambertiana, the phylogenetic implications are striking. A calculated value of ca. 24 million years until reciprocal monophyly is expected to be more likely than paraphyly would predate the divergence of subsects. Strobus and Gerardianae. If this estimate is correct, it suggests that the potential exists for trans-species polymorphisms to be shared among pine species from different subsections. Further, a calculated value of ca. 76 million years for complete genome-wide coalescence would extend far beyond the divergence of the two sections from subg. Strobus. Clearly, the accuracy of these estimates depends upon many assumptions; nevertheless, they highlight the fallacy of assuming 'species monophyly' in groups characterized by large population sizes, and the complexity of resolving phylogenetic relationships among pine species. In light of this information, the historic effective population size, as reflected by nucleotide diversity within contemporary species, seems to be the driving factor in determining whether pine species are genetically unique, and whether genes can accurately trace a species phylogenetic history. Noteworthy in this regard is that current geographic ranges (a proxy for census population sizes and global abundance) are uncorrelated with nucleotide diversity, and show no association with the extent of monophyly or nonmonophyly of a species (Table 5). The multiple cycles of glaciation in North America, Europe, and Asia have had a pronounced impact on conifer genetic diversity (MacDonald et al., 1998; Petit et al., 2003), and it's likely that contemporary ranges reflect non-equilibrium processes of recent expansion (e.g., species with low genetic diversity and large geographic ranges like P.koraiensis and P. strobus), recent range 77

contraction (e.g., species with high genetic diversity and small ranges like P. bhutanica and P. johannis), and possibly geographic or ecological isolation coupled with long distance dispersal events (e.g., P. chiapensis and P. albicaulis). This is not entirely unexpected, since studies of pines have shown that geographically-widespread species can lack genetic diversity, while narrowly distributed pines can show ample genetic diversity (Ledig, 1998; Ledig et al., 1999; Delgado et al., 1999). This observation has implications for molecular phylogenetic studies focusing on recently diverged taxa (e.g., tribes, genera, and species complexes) since it highlights the importance of considering the magnitude of intraspecific diversity within the overall pattern of phyletic divergence. Until the correlates of interspecific nonmonophyly are better understood, the only way to quantif' the impact of allelic nonmonophyly on phylogenetic accuracy is by simulation (e.g., Rosenberg, 2003) or through the acquisition of empirical data. Based on this analysis, it seems that effective population size either has an unpredictable association with monophyly in pines, or perhaps more likely, that geographic range (or census count) is a poor predictor of the effective population size of a species. We note, for instance, that P. chiapensis (a narrow endemic of Mexico, limited to ca. 15,000 km2) shows far greater nucleotide diversity than P. albicaulis (r = 0.0031 vs. 0.0000), even though the latter species is dispersed across ca. 400,000km2of western North America. The multiple cycles of glaciation in North America, Europe, and Asia have had a pronounced impact on conifer genetic diversity (MacDonald et al., 1998; Petit et al., 2003), and it is likely that contemporary ranges reflect non- equilibrium processes of recent expansion (e.g., species with low nucleotide diversity and large ranges like P. sibirica, P. strobus, or P.koraiensis), recent range contraction (e.g., species with high diversity but small ranges like P. bhutanica, P. johannis, and P. longaeva), and possibly even long distance dispersal. Until accurate estimates of relative or absolute effective population sizes become available, the connection between monophyly and Necan only be inferred from simulation studies (e.g., Rosenberg, 2003).

Broader implications for phylogenic studiesofgymnosperms and angiosperms - Coalescence of allele lineages is dependent on a suite of interacting processes that occurred prior to and during speciation, and continue to the present. Processes responsible for genic nonmonophyly in species include hybridization with subsequent introgression, incomplete lineage sorting, and recombination. These processes may become superimposed, and their present-day genealogical patterns may reflect ancient and recent events. Sequences of this nature 78

track the mutational history of an allele, but they are unlikely to track the comparatively simple cladistic history of recently diverged species. It is noteworthy that these processes are found to varying degrees in cpDNA and mtDNA, thus the lack of species monophyly cannot be dismissed as exclusively a nuclear gene phenomenon. Nonetheless, coalescent simulations by Rosenberg (2003) show that when three coalescent units of time have passed for haploid organellar genes, the probability of reciprocal monophyly exceeds 0.8; this same length of time equates to 0.75 coalescent units for nuclear loci, at which time the probability of genic monophyly is less than 0.1. In the absence of complicating factors (e.g., organellar introgression, nuclear recombination), the prevalence of non-coalescence is expected to be much more problematic for nuclear loci than organellar loci.

Given the long time frame predicted by coalescent theory for species ofPinusto attain monophyly (Table 6), genealogical-based species concepts (de Queiroz and Donoghue, 1988; Baum and Donoghue, 1995; Shaw, 1998) derived from molecular markers may be inappropriate for pines and other species sharing similar life history traits. Distinct morphological and ecological differences are readily apparent between most of the species of subgenus Strobus included in this study, yet species-level paraphyly or polyphyly appears in nearly half of the

species examined. Evidence fromPinus isconsistent with the theoretical expectation that a large portion of the genome frequently remains common to closely related species well after speciation has occurred (Rosenberg, 2003). Wu (2001) outlined a theory of speciation whereby "differentiation loci" become fixed within a pair of species while regions of "neutral divergence" in between these fixed sites are free to have unrestricted gene flow. Under this theory, the fixed regions in both genomes become larger over time as gene flow is further restricted. Our data suggest the possibility for retention of ancestral polymorphisms in these regions of neutral divergence, regardless of whether gene flow is ongoing. In other words, even after interspecific mating barriers become fixed at many loci, pines can be expected to harbor ancestral polymorphisms at a large fraction of their genome. The demonstration of widespread species-level nonmonophyly appears to be a severe constraint in the application of nuclear genes in resolving organismal evolutionary history of pines at low taxonomic ranks. Similar conclusions were reported by Bouillé and Bousquet (2005) and Broughton and Harrison (2003), all of whom have suggested that nuclear gene genealogies have limited potential to reconstruct evolutionary histories among closely related species. While we generally agree with this conclusion for our work in deciphering relationships among the tenninal species ofPinus,it is difficult to extrapolate these findings to dissimilar taxa with

different life history traits (e.g., smallNe,short life spans, higher inbreeding rates). Even with the limitation of noncoalescence, discrete nuclear genes can provide important insights into the historical, demographic, and possibly even selective processes that forge new species. This information offers new perspectives (relative to organellar DNA or nrITS) as to the biological basis for the presence (or absence) of phylogenetic patterns. Perhaps the most important general finding of this work is that coalescence failure can lead to significantly different phylogenetic interpretations that are only detectable by sampling

multiple individuals per species, and perhaps multiple loci. InPinus,largeNe,long generation times, and high outcrossing rates combine to make allelic noncoalescence a readily-detected pattern, even with a small sample size. Given this combination of traits, it seems reasonable to expect that other species-rich tree genera with large, widespread populations, e.g. Quercus (400 species), Salix (450 species), Ficus (750 species) and Eucalyptus (680 species), should be prone to similar levels of nonmonophyly if examined by nuclear markers. Less clear is the impact noncoalescence will have on less widespread, more genetically uniform taxa. In order to ensure the robustness of future studies, we recommend that researchers explicitly test the monophyly of species and lower-level taxa by including multiple individuals across the range of a species, especially prior to proposing new classifications or delimiting species. In the absence of such sampling, and working under the assumption of species monophyly, it seems likely that many more cases of species nonmonophyly will remain undetected.

ACKNOWLEDGEMENTS We thank Martin Gardner, Fiona Inches, and Philip Thomas (Royal Botanic Garden Edinburgh, Scotland), David Gernandt (Universidad Autónoma de Hidalgo, Mexico), Dave Johnson (USDA Forest Service, Institute of Forest Genetics, Placerville, CA), Richard Sneizko (USDA Forest Service, Dorena Genetic Resource Center, Cottage Grove, OR), Michael Wall (Rancho Santa Ana Botanic Garden, Claremont, CA), Jesus Vargas Hernandez (Instituto de Recursos Naturales, Mexico), Randall Hitchin (Washington Park Arboretum, Seattle), Dale Simpson and Jean Beaulieu (Natural Resources Canada), Steven Roelof (Native Plant Society of Oregon), Jim Buck (University of Michigan), Richard Halse (Oregon State University), Bill Dvorak (CAMCORE), Forest Tree Breeding Center (Japan), Sanderson McArthur (USDAFS Rocky Mt. Research Station), Paul Halladin (Iseli Nursery, Oregon), Dennis Ringnes (USDAFS Central Zone Genetic 80

Resource Program, California), Carrie Sweeney (USDAFS Oconto River Seed Orchard, Minnesota), Joe Myers (USDAFS Coeur d'Alene Nursery, Idaho), Yasayuki Watano (Chiba

University, Japan),Joan Blada (Forest Research and Management Institute of Bucharest, Romania), Frank Hammond (U.S. Army, Fort Huachuca, AZ), Kirsten Winter (Cleveland National Forest, San Diego, CA), James C. Zech (Sull Ross State University, Alpine, TX), and Konstantin Krutovsky (Texas A&M University) for the generous contributions of seed and needle tissue. Funding for this study was provided by the National Science Foundation grant DEB 0317103 to Aaron Liston and Richard Cronn, and the USDA Forest Service Pacific Northwest Research Station.

RJFERENCES

Alvarez, I., R. Cronn, and J. F. Wendel. 2005. Phylogeny of the New World diploid cottons (Gossypium L., Malvaceae) based on sequences of three low-copy nuclear genes. Plant Systematics and Evolution 252:199-214.

Alvin, K. 1960. Further conifers of the Pinaceae from the Wealden Formation of Belgium. Memoirs of the Royal Institute of Natural Sciences, Belgium 146:1-39.

Andresen, J. W. 1966. A multivariate analysis of the Pinus chiapensismonticolasirobus phylad. Rhodora 68:1-24.

Bailey, D. K. 1970. Phytogeography and taxonomy of Pinus subsection Balfourianae. Annals of the Missouri Botanical Garden 57:210-249.

Baum, D. A., and M. J. Donoghue. 1995. Choosing among alternative "phylogenetic" species concepts. Systematic Botany 20:560-573.

Berry, P. E., A. L. Hipp, K. J. Wurdack, B. Van Ee, and R. Riina. 2005. Molecular phylogenetics of the giant genus Croton and tribe Crotoneae (Euphorbiaceae sensu stricto) using ITS and TRNL-TRNF DNA sequence data. American Journal of Botany 92:1520-1534.

Bouillé, M., and J. Bousquet. 2005. Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): Implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany 92:63-73.

Broughton, R. E., and R. G. Harrison. 2003. Nuclear gene genealogies reveal historical, demographic and selective factors associated with speciation in field crickets. Genetics 163:1389-140 1.

Businsk, R. 1999. Taxonomic revision of Eurasian pines (genus PiniLs L.)Survey of species and infraspecific taxa according to latest knowledge. Acta Pruhoniciana 68: 7-86. 81

Businsk, R. 2004. A revision of the Asian Pinus subsection Strobus (Pinaceae). Wilidenowia 34: 209-257.

Chiang, T.-Y., K.-H. Hung, T.-W. Hsu, and W.-L. Wu. 2004. Lineage sorting and phylogeography in Lithocarpusformosanus and L. dodonaefolius (Fagaceae) from Taiwan. Annals of the Missouri Botanical Garden 91:207-222.

Church, S. A., and D. R. Taylor. 2005. Speciation and hybridization among Houstonia (Rubiaceae) species: The influence of polyploidy on reticulate evolution. American Journal of Botany 92:1372-1380.

Comes, H. P., and R. J. Abbott. 2001. Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae). Evolution 55:1943-1962.

Crisp, M. D., and G. T. Chandler. 1996. Paraphyletic species. Telopea 6:813-844.

Critchfield, W. B. 1975. Interspecific hybridization in Pinus: A summaly review. Pages 99-105 in: Proceedings 14th Canadian Tree Improvement Association, Part 2, Symposium on interspecific hybridization in forest trees, NB, 28-3 0 August 1973 (Fowler, D. P., and C. W. Yeatman, eds.). Canadian Forestry Service, Ottawa, ON.

Critchfield, W. B. 1986. Hybridization and classification of the white pines (Pinus section Strobus). Taxon 35:647-656.

Critchfield, W. B., and E. L. Little, Jr. 1966. Geographic distribution of the pines of the world. USDA Forest Service Miscellaneous Publication 991. Washington, DC.

Cross, E. W., C. J. Quinn, and S. J. Wagstaff. 2002. Molecular evidence for the polyphyly of Olearia (Astereae: Asteraceae). Plant Systematics and Evolution 23 5:99-120. de Queiroz, K., and M. J. Donoghue. 1988. Phylogenetic systematics and the species problem. Cladistics 4:317-338.

Delgado, P., D Piñero, A. Chaos, N. Perez-Nasser, and E. R. Alavarez-Buylla. 1999. High population differentiation and genetic variation in the endangered Mexican pine Pinus rzedowskii (Pinaceae). American Journal of Botany 86:669-676.

Doyle, J. J. 1992. Gene trees and species trees: Molecular systematics as one-character taxonomy. Systematic Botany 17:144-163.

Doyle, J. J. 1997. Trees within trees: Genes and species, molecules and morphology. Systematic Biology 46:53 7-553.

Farjon, A. 1998. World Checklist and Bibliography of Conifers. Royal Botanical Gardens at Kew, Richmond, U.K. 82

Farjon, A. 2005. Drawings and descriptions of the genus Pinus, 2ed. E. J. Brill and W. Backhuys, Leiden, Netherlands.

Farjon, A., and B. T. Styles. 1997. Pinus (Pinaceae). Flora Neotropica Monograph 75. The New York Botanical Garden, New York.

Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783-791.

Feng, Y., S.-H. Oh, and P. S. Manos. 2005. Phylogeny and historical biogeography of the genus Platanus as inferred from nuclear and chloroplast DNA. Systematic Botany 30:786-799.

Fernando, D. D., S. M. Long, and R. A. Sniezko. 2005. Sexual reproduction and crossing barriers in white pines: The case between Pinus lambertiana (sugar pine) and P. monticola (western white pine). Tree Genetics and Genomes 1:143-150.

Funk, D. J., and K. E. Omland. 2003. Species-level paraphyly and polyphyly: Frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annual Review of Ecology, Evolution, and Systematics 34:397-423.

Garrett, P. W. 1979. Species hybridization in the genus Pinus. USDA Forest Service, Northeast Forest Experiment Station, Station Research Paper 436.

Gernandt, D. S., A. Liston, and D. Pifiero. 2001. Variation in the nrDNA ITS of Pinus subsection Cembroides: Implications for molecular systematic studies of pine species complexes. Molecular Phylogenetics and Evolution 21:449-467.

Gernandt, D. S., A. Liston, and D. Pifiero. 2003. Phylogenetics of Pinus subsections Cembroides and Nelsoniae inferred from cpDNA sequences. Systematic Botany 4:657-673.

Gernandt, D. S., G. G. Lopez, 5. 0. Garcia, and A. Liston. 2005. Phylogeny and classification of Pinus. Taxon 54:29-42.

Gilbert, C., J. Dempcy, C. Ganong, R. Patterson, and G. S. Spicer. 2005. Phylogenetic relationships within Phacelia subgenus Phacelia (Hydrophyllaceae) inferred from nuclear rDNA ITS sequence data. Systematic Botany 30:627-634.

Goetsch, L., A. J. Eckert, and B. D. Hall. 2005. The molecular systematics of Rhododendron (Ericaceae): A phylogeny based upon RPB2 gene sequences. Systematic Botany 30:616-626.

Goodwillie, C., and J. W. Stiller. 2001. Evidence for polyphyly in a species of Linanthus (Polemoniaceae): Convergent evolution in self-fertilizing taxa. Systematic Botany 26:273- 282.

Hare, M. P., F. Cipriano, and S. R. Palumbi. 2002. Genetic evidence on the demography of speciation in allopatric dolphin species. Evolution 56:804-8 16. 83

Hudson, R. R., and J. A. Coyne. 2002. Mathematical consequences of the genealogical species concept. Evolution 56:1557-1565.

loerger, T. R., A. G. Clark, and T. H. Kao. 1990. Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proceedings of the National Academy of Sciences, USA 87:9732-9735.

Kamiya, K., K. Harada, H. Tachida, and P. S. Ashton. 2005. Phylogeny of PgiC gene in Shorea and its closely related genera (Dipterocarpaceae), the dominant trees in southeast Asian tropical rain forests. American Journal of Botany 92:775-788.

Kral, R. 1993. Pinus. Flora of North America Editorial Committee (ed.): Flora of North America (North of Mexico), vol. 2. Oxford University Press, New York.

Krutovsky, K. V., M. Troggio, G. R. Brown, K. D. Jermstad, and D. B. Neale. 2004. Comparative mapping in the Pinaceae. Genetics 168:447-461.

Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: Molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

Lanner, R. M. 1 974a. Natural hybridization between Pinus edulis and Pinus monophylla in the American Southwest. Silvae Genetica 23:108-116.

Lanner, R. M. 1974b. A new pine from Baja California and the hybrid origin of Pinus quadrfolia. Southwest Naturalist 19: 75-95.

Lanner, R. M., and A. M. Phillips III. 1992. Natural hybridization and introgression of pinyon pines in northwestern Arizona. International Journal of Plant Sciences 153:250-257.

Ledig, F. T. 1998. Genetic variation in Pinus. Pages 25 1-280 in: Ecology and Biogeography of Pinus (Richardson, D. M., ed.). Cambridge Univeiity Press, Cambridge, U.K.

Ledig, F. T., M. T. Conkle, B. Bermejo-Velazaquez, T. Eguiluz-Piedra, P. D. Hodgskiss, D. R. Johnson, W. S. Dvorak. 1999. Evidence for an extreme bottleneck in a rare Mexican pinyon: Genetic diversity, disequilibrium, and the mating system in Pinus maximartinezii. Evolution 53 :91-99.

Lee, C., S.-C. Kim, K. Lundy, and A. Santos-Guerra. 2005. Chloroplast DNA phylogeny of the woody Sonchus alliance (Asteraceae: Sonchinae) in the Macaronesian Islands. American Journal of Botany 92:2072-2085.

Levin, R. A., K. Watson, and L. Bohs. 2005. A four-gene study of evolutionary relationships in Solanum section Acanthophora. American Journal of Botany 92:603-612.

Levin, R. A., and J. S. Miller. 2005. Relationships within tribe Lycieae (Solanaceae): Paraphyly of Lycium and multiple origins of gender dimorphism. American Journal of Botany 92:2044- 2053. 84

Liston, A., W. A. Robinson, D. Piflero, and E. R. Alvarez-Buylla. 1999. Phylogenetics of Pinu.s (Pinaceae) based on nuclear ribosomal DNA internal transcribed spacer region sequences. Molecular Phylogenetics and Evolution 11:95-109.

Little, E. L., Jr. 1968. Two new pinyon varieties from Arizona. Phytologia 17:329-342.

Little, E. L., Jr., and F. I. Righter. 1965. Botanical descriptions of forty artificial pine hybrids. USDA Forest Service Technical Bulletin 1345:1-47. Washington, DC.

Little, E. L., Jr., and W. B. Critchfield. 1969. Subdivisions of the genus Pinus. USDA Forest Service Miscellaneous Publication 1144. Washington, DC.

Luo, Y., F. Zhang, and Q.-E. Yang. 2005. Phylogeny of Aconitum subgenus Aconitum (Ranunculaceae) inferred from ITS sequences. Plant Systematics and Evolution 252:11-25.

MacDonald, G. M., L. C. Cwynar, and C. Whitlock. 1998. The late Quaternary dynamics of pines in northern North America. Pages 122-149 in: Ecology and Biogeography ofPinus (Richardson, D. M., ed.). Cambridge University Press, Cambridge, U.K.

Magallon, S., and M. J. Sanderson. 2002. Relationships among seed plants inferred from highly conserved genes: Sorting conflicting phylogenetic signals among ancient lineages. American Journal of Botany 89:1991-2006.

Malusa, J. 1992. Phylogeny and biogeography of the pinyon pines(Pinussubsect. Cembroides). Systematic Botany 17:42-66.

Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: Recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262.

Martinez, M. 1940. Pinaceas Mexicanas. Anales del Instituto de Biologia de Mexico 11:57-84.

Mason-Gamer, R. J. 2005. The {beta}-amylase genes of grasses and a phylogenetic analysis of the Triticeae (Poaceae). American Journal of Botany 92:1045-1058.

McGuire, G., F. Wright, and M. J. Prentice. 1997. A graphical method for detecting recombination in phylogenetic data sets. Molecular Biology and Evolution 14:1125-1131.

McKown, A. D., J.-M. Moncalvo, and N. G. Dengler. 2005. Phylogeny of Flaveria (Asteraceae) and inference of C4 photosynthesis evolution. American Journal of Botany 92:1911-1928.

Meijer, J. 2000. Fossil woods from the late Cretaceous Aachen Formation. Review of Palaeobotany and Palynology 112: 297-3 36.

Miller, C. 1973. Silicified cones and vegetative remains ofPinusfrom the Eocene of British Columbia. Contributions from the University of Michigan Museum of Paleontology 24:10 1- 118. 85

Mime, I., F. Wright, G. Rowe, D. F. Marshall, D. Husmeier, and G. McGuire. 2004. TOPALi: Software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics 20:1806-1807.

Moore, W. S. 1995. Inferring phylogenies fmm mtDNA variation: Mitochondrial-gene trees versus nuclear-gene trees. Evolution 49:718-726.

MUller, K., and T. Borsch. 2005. Phylogenetics of Utricularia (Lentibulariaceae) and molecular evolution of the trnK intron in a lineage with high substitutional rates. Plant Systematics and Evolution 250:39-67.

Nagl, S., H. Tichy, W. E. Mayer, N. Takahata, and J. Klein. 1998. Persistence of neutral polymorphisms in Lake Victoria cichlid fish. Proceedings of the National Academy of Sciences, USA 95:14238-14243.

Nei, M. 1987. Molecular Evoltionary Genetics. Columbia University Press, New York.

Oh, S.-FI., and D. Potter. 2005. Molecular phylogenetic systematics and biogeography of tribe Neillieae (Rosaceae) using DNA sequences of cpDNA, rDNA, and LEAFY. American Journal of Botany 92:179-192.

Oline, D. K., J. B. Mitton, and M. C. Grant. 2000. Population and subspecific genetic differentiation in the foxtail pine (Pinus balfouriana). Evolution 54: 1813-1819.

Pamilo, P., and M. Nei. 1988. Relationships between gene trees and species trees. Molecular Biology and Evolution 5:568-583.

Perez de Ia Rosa, J., S. A. Harris, and A. Farjon. 1995. Noncoding chloroplast DNA variation in Mexican pines. Theoretical and Applied Genetics 91:1101-1106.

Perry, J. P. 1991. The Pines of Mexico and Central America. Timber Press, Portland, OR.

Petit, R. J., I. Aguinagalde, J. L. de Beaulieu, C. Bittkau, S. Brewer, R. Cheddadi, R. Ennos, S. Fineschi, D. Grivet, M. Lascoux, A. Mohanty, G. Müller-Starck, B. Demesure-Musch, A. Palmé, J. P. MartIn, S. Rendell, and G. G. Vendramin. 2003. Glacial refugia: Hotspots but not melting pots of genetic diversity. Science 300:1563-1565.

Popp, M., and B. Oxelman. 2004. Evolution of a RNA polymerase gene family inSilene (Caryophyllaceae)incomplete concerted evolution and topological congruence among paralogues. Systematic Biology 53:914-932.

Posada, D. 2002. Evaluation of methods for detecting recombination from DNA sequences: Empirical data. Molecular Biology and Evolution 19:708-7 17.

Posada, D., and K. A. Crandall. 2001. Evaluation of methods for detecting recombination from 86

DNA sequences: Computer simulations. Proceedings of the National Academy of Sciences USA 98:13757-13762.

Posada, D., and K. A. Crandall. 2002. The effect of recombination on the accuracy of phylogeny estimation. Journal of Molecular Evolution 54:396-402.

Price, R. A., A. Liston,and S. H. Strauss. 1998. Phylogeny and systematics of Pinus. Pages 49- 68 in: Ecology and Biogeography of Pinus (Richardson, D. M., ed.). Cambridge University Press, Cambridge, U.K.

Rieseberg, L. H., and L. Broulliet. 1994. Are many plant species paraphyletic? Taxon 43 :21-32.

Roalson, B. H., and E. A. Friar. 2004. Phylogenetic relationships and biogeographic patterns in North American members of Carex section Acrocystis (Cyperaceae) using nrDNA ITS and ETS sequence data. Plant Systematics and Evolution 243:175-187.

Robba, L., M. A. Carine, S. J. Russell, and F. M. Raimondo. 2005. The monophyly and evolution of Cynara L. (Asteraceae) sensu lato: Evidence from the internal transcribed spacer region of nrDNA. Plant Systematics and Evolution 253:53-64.

Rosenberg, N. A. 2003. The shapes of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution 57:1465-1477.

Rozas, J., J. Sánchez-DelBarrio, X. Messeguer, and R. Rozas. 2004. DnaSP: DNA sequence polymorphism v4.O0.6. Bioinformatics 19:2496-2497.

Rzedowski, J., and L. Vela. 1966. Pinus strobus var. chiapensis en la Sierra Madre del Sur de Mexico. Ciencia 24:211-216.

SAS Institute, Inc. 1999. SAS/STAT user's guide, ver. 8, vol. 1. SAS Institute, Inc. Cary, NC. 1,243 pg.

Schierup, M. H., and J. Hem. 2000. Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879-891.

Schneeweiss, G., P. Schonswetter, S. Kelso, and H. Nikifeld. 2004. Complex biogeographic patterns in Androsace (Primulaceae) and related genera: Evidence from phylogenetic analyses of nuclear internal transcribed spacer and plastid trnL-F sequences. Systematic Biology 53:856-876.

Shaw, G. R. 1914. The genus Pinus. Publications of the Arnold Arboretum No. 5, Riverside Press, Cambridge, MA.

Shaw, K. L. 1998. Species and the diversity of natural groups. Pages 44-56 in: Endless fonns: Species and speciation (Howard, D. J., and S. J. Berlocher, eds.). Oxford University Press, Oxford, U.K. 87

Shaw, J. 2001. Biogeographic patterns and cryptic speciation in bryophytes. Journal of Biogeography 28:253-261.

Shaw, J., and R. L. Small. 2005. Chloroplast DNA phylogeny and phylogeography of the North American plums (Prunus subgenus Prunus section Prunocerasus, Rosaceae). American Journal of Botany 92:2011-2030.

Simmons, M. P., and H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49:369-381.

Sites, J. W., and J. C. Marshall. 2003. Delimiting species: A Renaissance issue in systematic biology. Trends in Ecology and Evolution 18:462-470.

Smith, J. M. 1992. Analyzing the mosaic structure of genes. Journal of Molecular Evolution 34:126-129.

Song, B. H., X. Q. Wang, X. R. Wang, K. Y. Ding, and D. Y. Hong. 2003. Cytoplasmic composition in Pinus densata and population establishment of the diploid hybrid pine. Molecular Ecology 12:2995-3001.

Swofford, D. L. 2003. PAUP*4.OblO: Phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, Massachusetts, USA.

Syring, J., A. Willyard, R. Cronn, and A. Liston. 2005. Evolutionary relationshipsamong Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92:2086-2 100.

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage sitemaps with particular reference to the evolution of humans and the apes. Evolution 37:221-244.

Treutlein, J., G. F. Smith, B.-E. van Wyk, and M. Wink. 2003. Evidence for the polyphyly of Haworthia (Asphodelaceae Subfamily Alooideae; Asparagales) inferred from nucleotide sequences of rbcL, matK, ITS 1 and genomic fingerprinting with ISSR-PCR. Plant Biology 5 :5 13-521.

Wang, X. R., A. E. Szmidt, and 0. Savolainen. 2001. Genetic composition and diploid hybrid speciation of a high mountain pine, Pinus densata, native to the Tibetan plateau. Genetics 159:337-346.

Wang, X. R., Y. Tsumura, H. Yoshimaru, K. Nagasaka, and A. E. Szmidt. 1999. Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) basedon chloroplast rbcL, MA 1K, RPL2O- RPS1 8 spacer, and TRNV intron sequences. American Journal of Botany 86:1742-1753.

Watano, Y., A. Kanai, and N. Tani. 2004. Genetic structure of hybridzones between Pinus pumila and P. parvfiora var. pentaphylla (Pinaceae) revealed by molecular hybridindex analysis. American Journal of Botany 9 1:65-72. 88

Wendel, J. F., and J. J. Doyle. 1998. Phylogenetic incongruence: Window into genome history and molecular evolution. Pages 265-296 in Molecular Systematics of Plants II: DNA sequencing (P. S. Soltis, D. E. Soltis, and J. J. Doyle, eds.). Kiuwer Academic Publishers, Dordrecht, Netherlands.

Wilkin, P., P. Schols, M. W. Chase, K. Chavamarit, C. A. Furness, S. Huysmans, F. Rakotonasolo, E. Smets, and C. Thapyai. 2005. A plastid gene phylogeny of the yam genus, Dioscorea: Roots, fruits and Madagascar. Systematic Botany 30:736-749.

Winkworth, R. C., and M. J. Donoghue. 2005. Viburnum phylogeny based on combined molecular data: Implications for taxonomy and biogeography. American Journal of Botany 92:653-666.

Wright, J. A., A. M. Mann V, and W. S. Dvorak. 1996. Conservation and use of the Pinus chiapensis genetic resource in Columbia. Forest Ecology and Management 88:283-288.

Wu, C.-I. 2001. The genie view of the process of speciation. Journal of Evolutionary Biology 14: 85 1-865.

Yamane, K., and T. Kawahara. 2005. Intra- and interspecific phylogenetic relationships among diploid Triticum-Aegilops species (Poaceae) based on base-pair substitutions, indels, and microsatellites in chloroplast noncoding sequences. American Journal of Botany 92:1887- 1898.

Yuan, Y. M., S. Wohlhauser, M. Moller, J. Klackenberg, M. Callmander, and P. Kupfer. 2005. Phylogeny and biogeography of Exacum (Gentianaceae): A disjunctive distribution in the Indian ocean basin resulted from long distance dispersal and extensive radiation. Systematic Biology 54:21-34. studiesBotany,Table 3.1. that American Representative sampled Journal closely studies of related Botany, from species Plant the recent withinSystematics literature a genus and aretaken Evolution). included. from four WhereOnly prominent studies possible, that journals determinations included (Systematic multiple of speciesBiology, samples monophyly Systematic per species and were based on strict consensus trees (or ML trees). Numberaccessions of species (total with multiple speciesmonophyletic, that were (% Number of Number of Berry et al., 2005 Authors Taxonomic Group Loci species in study, % of speciestested) of species testedmonophyetic) that were included per spp. accessions (range) Schneeweiss et aL, 2004 Robba etFengetal., al., 2005 2005 Androsace,Croton,Platanus, Euphorbiaceae PlatanaceaeCynara, Asteraceae Primulaceae ITSITS + trnLF+ trnT-L ITS1 15(47, 31.9%)57 (8,(78, 62.5%) 9.0%) 7 (100%)14(93%)4 (80%) 22-42-3 4 Oh and Potter,Gilbert 2005et al., 2005 Hydrophyllaceaetribe Neillieae, Phacelia, trnLF + trnDT + psbAtrnK + ITS 17(5332.1%)4 (5, 80.0%) 13(76%)4 (80%) 2-42 6 Yamane and Kawahara, 2005 TriticumAegilops, RosaceaePoaceae trnCrpoBmatKtrnK + trnFndhJ + ITS + +ETS ndhF + leaf rp1132 + atplatpH + 7 microsatellites 12 (16, 75.0%) 8 (66%) 2-4 Yuan et Leeal., 2005 etal., 2005 Exacum,Sonchus, Gentianaceae Asteraceae ITS + trnTLF + matK ITS + trnL intron 13(13,5 (30, 100%) 16.7%) 38(62%) (60%) 4-132-4 Church and Taylor, 2005AlvarezKamiya et al.,et al., 2005 2005 Gossypium,Houstonia, Malvaceae Rubiaceae Shorea, trnL intron + trnStmG spacer AdhC 1412 6(17,(27, (14, 82.4%)444%) 42.9%) 63 (50%)(43%) 2-132-52-4 Levin McKownand Miller, et 2005al., 2005 Flaveria,Lycium, Asteraceae SolanaceaeDipterocarpaceae waxy+ trnTF P9iC 17 (48,6(47, 35.4%) 12.8%) 26 (33%)(35%) 2 5 RoalsonShaw and and Friar, Small, 2004 2005 Ca,ex,Prunus, Cyperaceae Rosaceae ITS tmLFrpLl6 + ETS 13(14,92.9%)20 (21,4 (22,95.2%) 18.2%) 15 (25%) 0 3-532-422-3 7 00 andfromTable 2. GenBank the 3.2. same Sampled accessionsindividual, Pinus are orsubg. thegiven collectionsStrobus for sequences representatives. were madetaken infrom Whenbulk Syring and two etalleles al. (2005). are marked with a subscript. Clade information refers to Figures 1 are listed for the same collection then either both alleles came Section Quinquefoliae Taxona Allele GenBank Clade Collection Information Collector or Source (Voucherb) Subsection GerardianaeP. gerardiana Wallich ex D. bungeanaDon Zuccarini ex Endlicher Al, A2 Al DQ01 8379 Gilgit,Shanxi Pakistan Province, Chinae Businsk'USDAFS 41105 Institute (RILOGC) of Forest Genetics (OSC) Subsection KrempfianaeP. squamata X.W. Li AlA2 Yunnan,Gilgit, ChinaPakistan BusinskyBusinskBusinsk' 46118 4112346120 (RILOGC)(RILOGC) Subsection StrobusP. albicauliskrempfii Lecomte Engelmann Al DQO1 8380 A Montana,California,Lam Dong, USA USA Vietnam USDAFSRanchoFirst Darwin Santa Coeur Expedition Ana dAlene Botanic 242 Nursery Garden(RBGE) (OSC) (OSC) P. armandii Franchet Franchet subsp. mastersiana AlA2Al EA Anhui,Oregon, China USA BusinskyUSDAFS 46136Dorena (RILOGC) Genetic Resource Center (OSC) veitchiiP.(Hayata) ayacahuito (Roezl) Businsk Ehrenberg Shaw ex Schlechtendal var. AlA2 BE Mexico,Kaohsiung, Mexico Taiwan USDAFSBusinsk 32163 Institute (RILOGC) of Forest Genetics (OSC) P.veitchii ayacahuite (Roezi) Ehrenberg Shaw exbhutanica Schlechtendal var. Grierson,Ehrenberg Long & Page ex Schlechtendal A3A2 B LaMichoacan, Paz, Honduras Mexico USDAFS Institute of Forest Genetics (OSC) P. cembra Linnaeus A2Al DDE NorthWestArunachal Kameng,Carpathians, Pradesh, India Romania India BladaBusinsky (OSC) 5712157112 (RILOGC) P. chiapensis (Martinez) Andresen A2Al C Veracruz,AustriaGuatemalaChiapas, Mexico Mexico HernandezUSDAFSDvorak, CAMCORE InstituteDorena (OSC) Geneticof (noForest voucher) Resource Genetics Center (OSC) (OSC) Table 3.2. Sampled SubsectionPinus subg. Strobus Strobus (Continued) representatives (Continued). P. dalatensis Ferré subsp. procerakwangtungensis Businsk' Ferré subsp. procera Businsk' Chun ex Tsiang A2Al E ChinaKon Turn Province, Vietnam Businsky 4411444116b (RILOGC) (RILOGC) P. flexills James A3A2Al B Colorado,California,Alberta, Cananda USAUSA RoelofUSDAFSNaturalWashington (OSC) Resources Institute Park Arboretumof Canada Forest Genetics(OSC) 1943-40 (OSC)(OSC) P. lambertianakoraiensis Siebold & ZuccariniDouglas Ald, pdAl, A2 A3 A, BA A California,Saitama,Khabarovsky, Japan USAe Russia USDAFSForestKrutovsky Tree CentralInstitute (no Breeding voucher) Zone of Forest Center, Genetic Genetics Japan Resource (OSC)(OSC) Program P. monticola Douglas ex D. Don A2AlA3 AB0 Oregon,California, USA USA USDAFS(OSC) DorenaCentral ZoneGenetic Genetic Resource Resource Center Program (OSC) P. P.par,iflora momsonicola Siebold hayata& Zuccarini subsp. Al, A2 A3 E, E B TaiwaneCalifornia, USA USDAFS(OSC) DorenaCentral ZoneGenetic Genetic Resource Resource Center Program (OSC) pentaphyllaP.pa,vif parv'iflorapa,viflora lore SieboldSiebold & Zuccarini subsp. Al'A2 D JapanShikoku, Japan BusinskyIseli Nursery 45155 (OSC)(RILOGC) P.pentaphylla pariiflorapeuce (Mayr)Siebold Businsk & Zuccarini subsp. Grisebach (Mayr) Businsk AlA2A2 D BulgariaHokkaido, Japan USDAFSForest Tree Dorena Breeding Genetic Center, Resource Japan Center(OSC) (OSC) P. pumila (Pallas) Regel AIdA2Al AD Hokkaido,Byelorussia,Macedonia, Japan YugoslaviaRussia WatanoKrutovskyUSDAFS (OSC) Institute (no voucher) of Forest Genetics (OSC) P. sibirica Du Tour A2AlA3 AD0 KrasnoyarskKemerovoHokkaido, JapanDistrict, Krai, RussiaRussia BuckNaturalWatano (OSC) Resources(OSC) Canada (OSC) Table 3.2. Sampled Pinus subg. Strobus representatives (Continued). Subsection StmbusP. (Continued) strobiformissfrobiformis Engelmann A2Al AB Coahuila,Texas, USA Mexico USDAFSGernandt InstituteDSG599 of (OSC) Forest Genetics (OSC) P. strobus Linnaeus AlA3 AB Wisconsin,Nuevo Leon, USA Mexico GemandtUSDAFS DSGOO500 DorenaInstitute Geneticof (OSC) Forest Resource Genetics Center (OSC) (OSC) P. wallichiana A. B. Jackson A2AlA3 AE0 Gilgit,Hutu,Minnesota,New Jersey, Rara,Pakistan USANepal USA BusinskyNaturalUSDAFS Resources 41101Oconto (RILOGC) River Canada Seed (OSC) Orchard (OSC) Section Parrya Subsection Balfounanae A3 E Himalayas Natural Resources Canada (OSC) P. anstata Engelmann balfouriana Balfour AlA2 S.Arizona,Colorado, California, USA USA USA WisuraUSDAFSFarrell (OSC) 36 CentralInstitute(OSC) Zone of Forest Genetic Genetics Resource (OSC) Program Subsection CembroidesP. longaeva Bailey Al A2 d AY634346 Utah,California,N. California, USA USA USA GernandtMcArthur(OSC) 03099 (OSC) (OSC) P. cembroides Zuccarini A3A2Al F Hidalgo,Texas,San Luis USA MexicoPotosi, Mexico GernandtUSDAFS InstituteDSG593 of (OSC) Forest Genetics (OSC) P. culminicoladiscolor Bailey Andresen & Hawksworth & Beaman Aid, A3dAl, A2 A2 F, GF F SanNuevo Luis Leon, Potosi, Mexico Mexico GernandtVelazquezHernandez 2101 (OSC) (OSC) (MEXU) P. edulis Engelmann A3A2Al G0F NewUtah,Arizona, Mexico, USA USA USA USDAFSGemandtHammond InstituteDSG4890SG488 (OSC) of (OSC) Forest Genetics (OSC) Pinus subg. Strobus Table 3.2. SampledSubsection CembroidesP. johannis (Continued) Robert-Passini representatives (Continued). G Nuevo Leon, Mexico Frankis, M.P. 179 (E) P. maximartinezii Rzedowski A2Al I San Luis Potosi, Mexico Gernandt DSG5O1 (MEXU) P. monophylla Torrey & Fremont A2Al G I California,Zacatecas, USA Mexico USDAFSRanchoUSDAFS Institute Santa Institute Anaof Forestof Botanic Forest Genetics GardenGenetics (OSC) (OSC) (OSC) A3 G Utah,Nevada, USA USA GernandtHalse 6668 DSG48O (OSC) (OSC) P. quadrifoliapinceana Gordon Parlatore ex Sudworth A2Al HI Coahuila,Queretaro, Mexico Mexico USDAFS Institute of Forest Genetics (OSC) P. remota (Little) Bailey & Hawksworth A2Al FH Texas,California, USA USA ZechWinterGernandt 382-4 (OSC) (OSC) DSG588 (OSC) Subsection NelsoniaeP. rzedowskii Madrigal & Caballero A3dAl G I Michoacán,Coahuila, Mexico Mexico BusinskyGernandt 47131 DSG 0601(RILOGC) (MEXU) a P. nelson/i Shaw Al d AY634347 Nuevo Leon, Mexico Gernandt & Ortiz D5G11298DSG10398 (OSC, MEXU) Cb HerbariumTaxonomy acronymsfollows Gernandt follow Index et al. Herbariorum:(2005). http://sciweb.nybg.org/science2/IndexHerbariorum.asp. ed cloned product,bulkRILOG collections = Silva Tarouca from numerous Research individuals Institute forat theLandscape same location. and Ornamental Gardening, 252 43 PrCihonice, Czech Republic see methods for details. 94

Table 3.3. Interspecific nucleotide diversity(it),ranges, and monophyletic status of the

species included in this study. N = number of unique alleles upon whichitwas based,itwas

determined using p-distances, SD = standard deviation of theitcalculations. Approximate ranges for each species were determined from several sources and represent best estimates (see Methods).

Monophyletic Approx. Range in the Strict Species N it (SD) (km2) Consensus Tree?

P. strobformis 3 0.0211 (0.0041)a 160 000 N P. lambertiana 3 0.0 196 (0.0038) 110 000 N P. johannis 2 0.0186 (0.0040) 18000 N P. bhutanica 2 0.0185 (0.0045) 8 000 N P. edulis 3 0.0158 (0.0033) 260 000 N P. monticola 3 0.0155 (0.0033) 370 000 N P. aristata 2 0.0149 (0.0040) 15 000 Y P. longaeva 2 0.0128 (0.0038) 10 000 Y P. discolor 3 0.0112 (0.0026) 27 000 N P. remota 3 0.0112 (0.0026) 27 000 N P. wallichiana 3 0.0112 (0.0028) 150 000 N P. balfouriana 2 0.0 107 (0.0034) 2500 N P. cembra 2 0.0100 (0.0032) 80 000 N P. pumila 3 0.0092 (0.0025) 6 000 000 Y P. sibirica 2 0.0088 (0.0033) 5 000 000 N P. strobus 3 0.0078 (0.0024) 1 800 000 Y P. cembroides 3 0.0076 (0.0023) 190 000 N P. culminicola 2 0.0075 (0.0022) 50 N P. dalatensis 2 0.0077 (0.0028) 50 N P. monophylla 3 0.0058 (0.0021) 75 000 N P. armandii 2 0.005 1 (0.0027) 600 000 N P. pinceana 2 0.0047 (0.0023) 25 000 Y P. nelsonii 2 0.0044 (0.0022) 6 000 Y P. koraiensis 3 0.0038 (0.0018) 600 000 Y P. gerardiana 2 0.0035(0.0019) 25000 Y P. chiapensis 2 0.003 1 (0.0017) 5 000 Y P. ayacahuite 3 0.0022 (0.00 12) 50 000 N P. parvflora 2 0.0022 (0.00 15) 140 000 Y P.flexilis 3 0.0014 (0.0010) 250 000 N P. albicaulis 2 0.0000 (0.0000) 400 000 Y P. bungeana 2 0.00 12 (0.00 12) 50 000 Y P. morrisonicola 2 0.0011(0.0011) 5 000 Y P. quadrfolia 2 0.0010(0.0011) 5000 Y P. kwangtungensis I n/a 500 n/a P. k.rempfii 1 n/a 300 n/a P. maximartinezii 1 n/a 4 n/a P. peuce 1 n/a 2000 n/a P. rzedowskii 1 n/a 100 n/a P. sauamata 1 n/a 0.05 n/a a value excludes the region where tests indicated potential recombination between sites 388- 399, see text for details. Table 3.4. Wilcoxon Signed Rank (WSR) tests of species nonmonophyly. All species from the strict consensus tree that were either para- or polyphyletic were constrained to monophyly and the resulting trees were tested for topological incongruence against the unconstrained trees. Up to 1000 most parsimonious trees were saved in the constraint analysis. Significant results at the a = 0.05 level are marked with an asterisk (*).

#of trees saved from Length Clades in which constraint of Range of P

Species alleles appeara analysis Treesb values sect. Quinquefoliae Fig. I P. armandii E-2 114 351 0.3173-0.6547 P. ayacahuite B-3 150 352 0.1573-0.4142 P. bhutanica D-1, Ei 1000 367 <0.000i0.0002* P. cembra D-2 397 359 0.00270.0i26* P. dalatensis E-2 61 351 0.3173-0.6547 P. flexilis 6-3 487 351 0.31 73-0.6547 P. lambertiana AI, BI, Di 1000 369 <0.00010.0003* P. monticola AI, B-2 831 368 <0.00010.0001 P. siberica D-2 35 353 0.2568-0.3173 P. strobiformis Ai, B-2 154 364 0.00050.0010* P. wallichiana Di, E-2 70 364 0.00050.00I7* sect. ParryaFig. 2 P. balfouriana sect. Balfourianae 1000 292 1.0000 P. cembroides F-3 1000 294 0.1573-0.5637 P. culminicola FI, GI 1000 302 0.00I60.0352* P. discolor F-3 951 298 0.0339*0.I336c P. edulis Fi, G-2 594 304 0.00470.0073* P. johannis GI, II 1000 302 0.00I60.0I24* P. monophylla G-3 1000 292 1.0000 P. remote F-2, G-1 1000 308 0.000i0.0036* aLetters refer to clades in Figures 1 and 2, numbers following letters indicate the number of sequences in each dade. bLength of most parsimonious trees without constraints is 350 steps for sect. Quinquefoliae and 292 steps for sect. Parrya 35% of the topologies for P. discolor returned significant results in the WSR test. Table 3.5. Nucleotide diversity and geographic range of strongly nonmonophyletic versus monophyletic to weakly nonmonophyletic species of pines. Significant differences between phyly categories were tested using one-way ANOVA.

Dependent Variable Phyly N Mean SD

Nucleotide diversity

Strongly 10 0.01494 0.00471

nonmonophyletic

Monophyletic to weakly 23 0.00566 0.004 16

nonmonophyletic

F = 32.02; df= 1 between groups, 31 within groups; P < 0.0000

Logi 0 (geographic range)

Strongly 10 4.6087 1.1613

nonmonophyletic

Monophyletic to weakly 23 4.7203 1.1554

nonmonophyletic

F = 0.06; df= 1 between groups, 31 within groups; P = 0.8007 Table 3.6. Population and genetic parameters for select Pinus species (see text for details on calculations).

Years for monophyly to be Years to reach genome- ga Species 0(!il Ne more likely than paraphyly wide coalescence P. lambertiana 600.02016 12.0x104 24.0x106 76.3x106 P. discolor 500.01124 8.03x104 13.4x106 42.6x106 P.flexilis 300.00143 1.70x104 1.71x106 5.41x106

Generation time in years, taken as the average number of years to seed bearing (Krugman and Jenkinson, online Woody Plant Seed Manual). Estimates for P. discolor are not available, so we used values from P. edulis,which is highly similar. bEstimates of theta are calculated using Watterson's approximation (Rozas et al., 2004) based on silent positions. 98

Figure 3.1. One of 20 most-parsimonious trees for section Quinquefoliae. Trees are rooted with P. aristata, P. edulis, and P. nelsonii. Bootstrap values from 1000 replicates and TBR branch swapping are shown near nodes. Number of characters = 1554, length of trees = 358, consistency index = 0.827, retention index = 0.889. Asterisks (*) indicate nodes that collapse in the strict consensus tree. Numbers in parentheses following alleles refer to the number of additional times the same allele was sequenced. The letters "L" and "M" are placed on the node of the most recent common ancestors for all alleles ofF. lam bertiana and P. monticola, respectively. 93 aIblcaullsAl (2) alblcau!1sA2

* 82 koraienslsAl 86 koralenslsA2 * koralenslsA3 100 lambe,lianaAl monticolaAl CladeA * 99 strobusM 88 sttobusA2 strobusA3 pumIIaAl pumIlaA2 pumlIaA3 sfroblfovmlsAl ayacahuiteAl ayacahulteA2 M* 50 86 ayacahulteA3 1111tb!fonnisA2 76

flexIIIsA2 Clade B 67 flexlIIsA3 monticolaA2 subsect. Strobus 98 monticolaA3 stroblformisA3 lambesllanaA2 chlapenslsAl (2) 100 Glade C L chiapensisA2 99 bhutanlcaAl walllchlanaAl peuceAl (3) 5 changes 83 cembraA2 sib! rIcaAl Glade D sib! ricaA2 cembraAl 74 parvlfloraAl 6 parvlfloraA2 (3) IambertianaA3 I bhutanlcaA2 waII!chianaA2 wall!chlanaA3

62 dalalenslsA2 74 kwangtungensisAl * Glade E 65 da!atens!sAl a,mand!1A2 94 monlson!coIaAl moirlsonftolaA2 97 armandl!A1 squamataAl (2) 81 100 bungeanaA2 bun geanaAl 85 subsect. Gerardianae gerardlanaAl gerard!anaA2

krempThAl Isubsect. Krempfianae

Figure 3.1. 100

Figure 3.2. One of 4495 most-parsimonious trees for section Parrya. Trees are rooted with P. gerardiana, P. krempfii, P. monticola, and P. nelsonii. Bootstrap values from 1000 replicates and TBR branch swapping are shown near nodes. Number of characters = 1554, length of trees = 300, consistency index = 0.843, retention index = 0.883. Asterisks (*) indicate nodes that collapse in the strictconsensus tree. Numbers in parentheses following alleles refer to the number of additional times the same allele was sequenced. 101

aristataAl

L______aristataA2 9 balfourianaAl subsect. Balfourianae balfourianaA2 IongaevaAl

* cembroidesAl * edu!isAl cembroidesA2

cembroidesA3 * 79 dscoIorA1 remotaAl Clade F * dlscolorA2 * remotaA2

100 I culminicolaAl discolorA3

culminlcolaA2 90 johannisAl subsect. Cembroides 55* remotaA3

78 edulisA2 Clade G edul!sA3 * monophyl!aAl

monophylla.A2 97 57* monophyllaA3

90 quadrifoliaAl Clade H quadrifoliaA2 /ohannisA2

62 maximartineziiAl (2) 68 Clade I 85 pinceanaAl

89 pinceanaA2 - 5 changes rzedowskllAl

Figure 3.2. 102

CHAPTER 4

Multiple nuclear loci reveal the distinctiveness of the threatened, neotropical Pinus chiapensis (Pinaceae)

John Syring', Rafael del Castillo2, Richard Cronn3, and Aaron Liston'

'Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331 USA 2Departamento de Recursos Naturales, CIIDIR IPN Oaxaca, Hornos 1003, Xoxocotlán, 71230, Oaxaca, Mexico 3Pacfic Northwest Research Station, USDA Forest Service, 3200 SW Jefferson Way, Corvallis, Oregon 97331 USA

To be submitted to Systematic Botany 103

ABSTRACT Pinus chiapensis is a threatened species of pine from southern Mexico and northwestern Guatemala. It was first described as a disjunct variety of the widespread P. strobus from the eastern United States and southeastern Canada. Prior morphological work has suggested that P. chiapensis is a distinct species, but not all authors and pine systematists have accepted that taxonomy. In this study we sequenced multiple accessions from across the range of each of the three most probable progenitors of P. chiapensis at three nuclear loci. Pinus chiapensis had the lowest combined nucleotide diversity of any of the four species (0.0031), and had only a single allele across its entire range at one of the loci. Pinus chiapensis does not share alleles with any of the possible progenitors and the alleles for this pine are monophyletic at two of the three loci. Constraint topologies forcing allelic monophyly at the third locus are not statistically different from the unconstrained trees. While the results are conclusive that P. chiapensis is at least as distinct as the remaining three widely accepted species, determination of the sister species is complicated by lack of species monophyly and interlocus variability. The sympatrically occurring P. ayacahuite, which has been cited as having the potential to hybridize with P. chiapensis, was determined as the least likely progenitor with no evidence for introgression uncovered. The data are conflicting as to whether P. monticola or P. strobus is more closely related to P. chiapensis. Lacking knowledge of the sister species, any phylogeographic story is premature. Rangewide population sampling ofF. chiapensis failed to uncover any consistent population structureas has been supported in other studies. The loss of habitat and overexploitation of thisresource, in combination with extremely limited genetic diversity, make conservation of P. chiapensisa high priority.

INTRODUCTION Pinus strobus var. chiapensis Mart. (Pinus chiapensis (Mart.) Andresen, Pinaceae) ranges from the Mexican state of Veracruz south into northwestern Guatemala (Fig. 1). This pine occurs in small remnant populations at elevations ranging from 150 2300 m (Dvorak et al., 1996) in habitats receiving 1500-3000 mm of precipitation (Perry, 1991; Farjon and Styles, 1997) in the mountains of both the Sierra Madre Occidental and the Sierra Madre del Sur. Associates include other species of pines as well as several species of broadleaved trees,some of which are considered disjunct from temperate counterparts in the eastern United States (Sharp, 104

1953; Dressier, 1954; Martin and Harreil, 1957). Examples of the latter include Liquidambar styracflua and Carpinus caroliniana. The conservation status of Pinus strobus var. chiapensis is of increasing concern due to habitat loss and population fragmentation associated with human population expansion into the highlands, exploitation of this tree as a timber resource, and agricultural practices that clear land for maize farming (del Castillo, 2002). In addition, this species has a restricted range that is naturally fragmented due to past climatic change and discontinuous distribution of suitable habitat. The declining status of this taxon prompted the IUCN to list Pinus strobus var. chiapensis as vulnerable (Farjon and Page, 1999) and the FAO (Food and Agriculture Organization of the United Nations; 1981) to list it as rare (small populations that are at risk) and endangered (extinction likely if causal factors continue unabated). While there have been no data reported on the specific loss of acreage for this taxa, its importance as a genetic resource has not been overlooked. For example, CAMCORE (Central America and Mexico Coniferous Resources Cooperative, North Carolina State University; Donahue et at., 1991) has established ex-situ germplasm banks and provenance tests outside of the native range of this species beginning in 1984. Pinus strobus var. chiapensis is one of ca. 23 members of sect. Quinquefoliae, eight of which are native to North America. Of the eight North American members of sect. Quinquefoliae, three grow in Mexico: P. ayacahuite and P. strobus var. chiapensis (southern Mexico into Central America; Fig. 1), and P. strobformis (predominantly south of the US border in northern Mexico) (there is disagreement as to whether P. flexilis occurs in Mexico (Perry, 1991; Farjon and Styles, 1997). Of the remaining five North American members, four (P. albicaulis, P.flexilis, P. lambertiana, and P. monticola) grow in the western United States and southwestern Canada. Pinus strobus, which grows in the eastern United States and southeastern Canada, isa notable outlier to this primarily western North American group (Fig. 1). Pinus strobus and P. monticola are considered to have the most similar morphology to Pinus strobus var. chiapensis (Andresen, 1966; Farjon and Styles, 1997). Pinus ayacahuite is morphologically distinct but is sympatric and could potentially hybridize with Pinus strobus var. chiapensis (Perry, 1991). All remaining North American pines in sect. Quinquefoliae are allopatric relative to Pinus strobusvar. chiapensis and are morphologically distinct. Other pines that grow in association with P. strobus var. chiapensis are from the alternative subgenus, subg. Pinus. 105

Pinus strobus var. chiapensis was first described by MartInez in 1940 as a variety of P. strobus. The closest distance between occurrences in the United States and Mexico is Ca. 2000 km, and as such P. strobus var. chiapensis would represent a disjunct population of the more expansive northern population. P. strobus var. chiapensis shares a general morphological resemblance to P. strobus, having similar form, bark characteristics (furrowed bark breaking into long rectangular plates versus the distinctly square bark plates of P. monticola), and cone size and general cone morphology. However, all of these characters are either homoplasious or sympleisiomorphic among members of subsection Strobus. Aside from morphological resemblance between P. strobus var. chiapensis and P. strobus, the previously described floristic affinities between Mexico and the eastern U.S. appear to have influenced Martinez's decision to include Chiapas pine within P. strobus (Sharp, 1953; Dressler, 1954; Martin and Harrell, 1957). In contrast, several authors (including Andresen, 1966; Rzedowski and Vela, 1966; Wright et al., 1996) make mention of the morphological similarities between P. chiapensis and P. monticola. In 1960, Gaussen was the first to propose elevating P. chiapensis to the level of species, but in failing to follow the conventional rules of nomenclature it was Andresen (1964) who is credited with the change in rank. Andresen's evidence came from a multivariate morphometric analysis between P. strobus, P. monticola, and P. chiapensis, where it was determined that the latter was sufficiently distinct from the other species on the basis of multiple characters, including the number of reflexed scales contiguous to the peduncle, the number of needle serrations along the leaf edge, leaf width, and ratio of leaf width to leaf length (Andresen, 1966). Andresen also stressed the ecological differences between the tropically occurring P. chiapensis and the temperate P. strobus and P. monticola. Interestingly, Andresen (1966) reached the conclusion that the eastern P. strobus has a much closer morphological affinity to the western P. monticola than either has to P. chiapensis. Not all taxonomists have followed Andresen in recognizing P. chiapensis as a distinct species. Among the major taxonomic treatments that have considered these taxa, Perry (1991), Landry (1989), Price et al. (1998), and Gernandt et al. (2005) recognize P. chiapensis, but Standley and Steyermark (1958), Critchfield and Little (1966), Little and Critchfield (1969), Kral (1993), and Farjon and Styles (1997) recognize its varietal status. In response to Andresen (1966), Farjon and Styles (1997) argue that the absence of discontinuities in traits (individuallyor combined) among taxa in the morphometric analysis, and the lack of unique traits to P. strobus var. chiapensis presents a weak case for the recognition of this taxa as a distinct species. 106

To date, molecular data regarding both the rank and affinity of P. strobus var. chiapensis are conflicting and incomplete. Results from nrITS (Liston et al., 1999) do not support P. chiapensis and P. strobus as sister taxa, but statistical support for species-level relationships are essentially absent. In addition, other potential progenitors (including P. monticola) were not sampled in that study. However, Liston Ct al. (1999) concluded that the substantial sequence divergence between P. chiapensis and P. strobus suggests that their isolation predates the Pleistocene. Results from the cpDNA analysis of Gernandt et al. (2005) showed a weak sister relationship (52% bootstrap support [BS]) between P. chiapensis and P. strobus, a resolution that could be consistent with either the rank of variety (e.g., P. strobus var. chiapensis) or species (P. chiapensis) (it should be noted that the bootstrap support published in Gernandt et al. (2005) uniting P. chiapensis and P. strobus is incorrect and should be 52%, with the 99% value supporting the node uniting P. ayacahuite and P. flexilis). However, without greater intraspecific sampling it is not possible to interpret the relationship between these two species. Significantly, cpDNA (Gernandt et al., 2005) shows P. chiapensis to resolve outside a well supported (99% BS) dade that contains P. monticola. Additional phylogenetic information gleaned from Late Embryogenesis Abundant (LEA)-like nuclear gene sequences has recently been published (Syring et al., in review). In that study, lack of allelic monophyly has been shown for three of the eight north American representatives of sect. Quinquefoliae, based on a sample of two to three alleles per species. Syring et al. (in review) concluded that incomplete lineage sorting was the primaly factor responsible for the lack of monophyly. Of relevance to the present study, P. chiapensis, and P. strobus both showed allelic monophyly at the LEA-like locus, while P. monticola showed significant nonmonophyly that is indicative of the retention of substantial ancestral diversity. At this locus, both alleles sequenced from P. chiapensis were strongly supported as monophyletic (100% BS), showed minimal sequence divergence (p-distance = 0.0046), and were divergent from both P. strobus and P. monticola. Indeed, individual parsimony trees indicate that P. chiapensis may form a lineage sister to the remaining North American members of sect. Quinquefoliae + P. koraiensis and P. pumila (Syring et al., in review), although this relationship lacks support (< 50% BS). It is important to note that these results (Syring Ct al., in review) derive from a single locus, and do not include broad, rangewide sampling for P. chiapensis, P. stro bus, or P. monticola. Whether allelic monophyly is a general condition across the nuclear genome of P. chiapensis and P. strobus remains to be demonstrated. While allelic monophyly at a single 107

locus is not necessarily an indication of species monophyly (Rosenberg, 2003), allelic monophyly across multiple loci could provide evidence of the genetic distinctiveness of a taxon (Baum and Donoghue, 1995; Cronn et al., 2003; Rokas et al., 2003; Small et al., 2004; Peters et al., 2005), and perhaps a clearer picture of it's ancestral relationships with other pine species. In this study we present data from three nuclear genes amplified across multiple accessions of four species from North American sect. Quinquefoliae; P. ayacahuite, P. chiapensis, P. monticola, and P. strobus. The goals of this effort are to address three specific questions: (1) Is the pattern of monophyly indicated by LEA-like mirrored across other unlinked regions of the nuclear genome?; (2) Based on combined evidence, can molecular data reveal the most likely ancestral gene pooi leading to Pinus strobus var. chiapensis, and what can be said about the phylogeography amongst the North American members of sect. Quinquefoliae?; and (3) How much genetic diversity is contained within Pinus strobus var. chiapensis, and do these three loci indicate a pattern of population subdivision across the geographic range of this taxa?

MATERIALS AND METHODS Plant materials and distribution maps - Seven species of Pinus subgenus Strobus were sampled (Table 4.1). This includes four species of the North American members of subsect. Quinquefoliae and three species from subsect. Gerardianae which were used as the outgroup (P. bungeana, P. gerardiana, and P. squamata; P bungeana was not sequenced for AGP6). Members from subsect. Gerardianae were used as the outgroup because data from Syring et al. (in review) and unpublished data (Syring) indicate that the relationships amongst members of subsect. Quinquefoliae are unresolved. The number of sequences obtained for each species varies by locus, but ranges from three to ten (Table 4.1). For the four ingroup species every effort was made to sample across their full geographic range (Fig. 1). Sequences published in earlier studies (Syring et al., 2005; Syring Ct al., in review) are indicated in Table 4.1. Distribution maps of ingroup species were acquired from multiple sources. Ranges for Pinus monticola and P. strobus were taken from the U.S. Geological Survey (1999) and Critchfield and Little (1966). Distributions for P. ayacahuite and P. chiapensis were digitized from Perry (1991). For P. ayacahuite, distributions given by Perry (1991) for P ayacahuitevar. veitchii and P. ayachahuite were combined into a single map following Farjon and Styles (1997). 108

All ranges are projected onto a standard North American base map (U.S. Geological Survey, 1999).

Locus amplification, sequencing, alignment, and data analysis Three low-copy nuclear loci were sequenced in this study. The first, AGP6, has high sequence identity to an arabinogalactan-like protein that localizes to linkage group 5 in P. taeda (Krutovsky et al., 2004; GenBank AF101785) and is associated with secondary cell wall formation in differentiating xylem (Zhang et al., 2003). Primer sequences are modified from Syring et al. (2005) (AGP6 F2, 5'-GGTCAACAATGGCGTTCAAT; AGP6 R2, 5'- TCTAACGTGAAGCGACAGGA) to include a larger portion of the gene. None of the sequences from this data set were cloned. The second locus sequenced in this study, cesAl, is part of the large cellulose synthase gene family that codes for the membrane-bound cellulose synthase (Richmond and Somerville, 2000). Gene structure was inferred from comparisons between Arabidopsis thaliana (GenBank IRX1) and Pinus taeda (GenBank AY789650), and primers were developed based on P. taeda sequence. The forward primer (CXF 10: 5'-ATCCAAGGGCCAGTGTATGT) is located in the putative exon 10, and is 204 bp upstream of intron 10; the reverse primer (CX1 1R1: 5'- CAAACGACTTCTCAAAGCTTCTCT) lies 36 bp downstream of the 5' end of exon 11. None of the sequences from this data set were cloned. The third locus shows high identity with a Late Embryogenesis Abundant (LEA)-like gene identified in Pseudotsuga menzeisii (Iglesias and Babiano, 1999; GenBank AJO 12483). LEA-like is derived from a loblolly pine eDNA clone that maps to linkage group 3 in P. taeda (Krutovsky et al., 2004; C. S. Kinlaw, Institute of Forest Genetics, unpublished data; GenBank AA739606). Additional details of the AGP6 and LEA-like (maps of their structure, primer sequences used in amplification) are given in Syring et al. (2005). DNA was extracted from haploid megagametophyte tissue using the FastPrep DNA isolation kit (Qbiogene, Carlsbad, California, USA). PCR amplification used an initial preheating step at 80°C for 2 mm, followed by 35 cycles of 94°C for denaturation, a 60°C annealing step (52°C for AGP6), and 72°C extension for 1 mm per kb of amplification product. Rreactions (40 jtL) utilized -50-200 ng of DNA template, 0.4 mM each primer, 0.2 mM dNTP, 1.5-2.0 mM MgCl2, 0.13 g.tg4tL BSA, and 2 U of Taq polymerase (Fisher Scientific, Pittsburgh, Pennsylvania, USA) inlxbuffer (10 mM Tris-HC1 [pH 9.0], 50 mM KCI). Amplification products were either run on 1.5% TAE agarose gels and isolated using QlAquick Gel Extraction kits (QIAGEN, Valencia, California, USA), or directly precipitated and desalted by ethanol precipitation. Cleaned PCR products (-100 ng) were sequenced using the ABI-Prism Big Dye Terminator Mix (ver. 3.1; Applied Biosystems, Foster City, California, USA). Products were resolved on an Applied Biosystems 3100 Genetic Analyzer capillary sequencer at the Oregon State University Center for Gene Research and Biotechnology Central Services Laboratory. In rare cases when needle tissue was used for DNA, direct sequencing identified homozygotes and heterozygotes. Homozygous sequences were directly incorporated into the alignment, while Heterozygous sequences were cloned into pGem-T Easy (Promega, Madison, Wisconsin, USA). Automated traces were evaluated using the BioEdit Sequence Alignment Editor 7.0.5.2 (Hall, 1999). Sequences were aligned by ClustalW (in BioEdit) using default parameters. Final adjustments were made by eye to minimize the number of required indels. Alignments are available at TreeBase. Statistics calculated from the alignment include the average number of characters, number of variable and parsimony informative chamcters, average within-group p- distance, and average base composition (determined using MEGA 2.1; Kumar et al., 2001). Gaps were coded as phylogenetic characters using the method of Simmons and Ochoterena (2000) and the on-line program Gap Recoder (R. Ree, http://maen.huh.harvard.edu: 8080/services! gap_recoder); all coded gaps were verified manually.

Recombination was assessed using two approaches. The maximumx2method (Smith, 1992; Posada and Crandall, 2001) identifies recombinants by testing for differences among proportions of variable and non-variable polymorphic positions in adjacent regions of aligned sequences. For all sequence pairs, a sliding window containing 10% of the variable positions is divided into two equal partitions and moved in one bp increments along the alignment. At each increment, a 2x 2 x2is calculated as an expression of the difference in the number of variable sites on each side of the partition for pairs of sequences. Recombination points are identified by plotting 2values along the length of the alignment. The difference in sum of squares method (DSS; McGuire Ct al., 1997; Milne et al., 2004) identifies recombinants using phylogenetic trees constructed from adjacent regions of an alignment. For all sequences, a 150 bp windowwas divided into two partitions and moved in 20 bp increments. For each increment, a distance matrix (Fitch 84) was calculated and a least squares tree constructed for each partition. Branch lengths were estimated using least squares, and sum of squares for each partition were recorded. Topologies for partitions were then swapped, branch lengths for the forced topologywere 110

determined, and sum of squares recorded. DSS values are plotted along alignments, and recombinant segments are identified by peaks in DSS values. Significance for both methodswas determined by 1000 permutations (parametric bootstrapping) using a = 0.01 as a threshold of

significance. The maximumx2method was implemented in the program RDP2 (MaxChi; Martin et at., 2005), while the DSS method was implemented in TOPALi vO.28 (Mime et al., 2004). Analyses were conducted on data sets where shared indels were discarded to prevent false positives. Phylogenetic analyses were performed by locus using maximum parsimony (MP; PAUP* version 4.Ob 10; Swofford, 2002). Branch support was evaluated using the nonparametric bootstrap (Felsenstein, 1985), with 1000 replicates and TBR branch swapping. Constraints on topologies were applied in PAUP* and the Wilcoxon signed-rank test of Templeton (WSR; Templeton, 1983) was employed to test for significant differences among topologies. For this test, up to 1000 most-parsimonious trees recovered were used as constraint topologies. The range of P values across all topologies is reported in every case. When testing the constraint of allelic monophyly, the lack of significance in the WSR tests implies that nonmonophyly could be the result of insufficient signal, and may not represent a true evolutionary relationship.

RESULTS

Sequence characteristicsof low-copynuclear loci in Pinus AGP6 From the AGP6 locus we acquired 34 sequences for the ingroup species, representing 19 unique alleles (Table 4.2). The number of unique alleles ranges from three to six per species. From the Washington population ofF. monticola we acquired two alleles from one maternal parent. Aligned AGP6 sequences are 774 bp in length and correspond to nucleotide positions 55-69 1 ofF. taeda (Krutovsky et al., 2004; GenBank AF1O 1785). Individualsequences of the four ingroup species average 731 bp in length (range = 729735 bp), while outgroup sequences are 774 bp in length. The aligned sequence includes 227 complete codons and one partial codon at the 5' end. The exon has 46 variable positions, of which eightare localized in first codon positions, eight in second positions, and 30 in third positions. Twenty-two of the variable positions are parsimony informative (P1). Inferred amino acid replacementsoccur at 16 of 227 sites. The alignment includes a 92-bp intron spanning nucleotide positions 632723. The intron segment includes eight variable sites and five P1 sites. Nucleotide frequencieswere relatively GC-rich (17.7% A, 19.5% T, 23.2% G, 39.6% C). 111

The intron has a single inferred indel at nucleotide position 642 that is shared between alleles of P. ayacahuite (Jal) and P. monticola (OR-i, WA, BC-i). Although AGP6 is predominantly exonic, four indels occur within exons. One mdcl is fixed across the ingroup and outgroup members of subsect. Gerardianae. Indels in the exon range from three to 39 nucleotides in length. In total, five gaps were scored and appended to the AGP6 alignment. Allele sharing across species was restricted to P. ayacahuite (Guatemala, Honduras, and Oaxaca, Mexico) and P. strobus (Michigan, North Carolina, New Jersey, Nova Scotia, and Quebec) that share a single, geographically widespread allele.

Estimates of within-taxa nucleotide diversity(it)in the AGP6 data set range from 0.0016 (P. strobus) to 0.013 (P. monticola) (Table 4.2). The relationship of within-taxa irto between-taxa it is shown in Figure 4.2 for P. chiapensis, P. monticola, and P. stro bus. AGP6 is the only locus in this study where P. chiapensis is not monophyletic, and it is also the locus that contains the greatest within-taxaitfor P. chiapensis. This is depicted in Figure 4.2, where P. chiapensis and P. monticola show overlapping allele lineages as a result of both large within-taxa and small between-taxa allelic diversity. The between-taxa it is smallest for P. chiapensisP. monticola (0.0160), and greatest for P. chiapensisP. strobus (0.0196). However, the between-taxa iv for the comparison of P. monticolaP. strobus (0.0 164) is nearly as small as for the comparison of P. chiapensisP. monticola. The between-taxa it for P. ayacahuiteP. chiapensis (not shown in Fig. 2) is greater than all other comparisons at 0.0221.

cesAl - From cesAl, we isolated 22 sequences for the ingroup taxa, representing 9 unique alleles (Table 4.2). The number of unique alleles ranges from one to four per species, and none of the sequences showed evidence of recombination. Our aligned cesAl sequences are 1036 bp in length, and the average length of individual alleles for the four ingroup species is 1000 bp (range = 881 1032 bp), while outgroup species range from 1026 1029 bp. The alignment includes 166 bp of exon, including 155 bp from exon 10 (54 complete codons) and 11 bp from exon 11(3 complete codons). Exons show three variable positions, all of which are localized to silent (third codon) positions. Two of the three variable positions are PT. No amino acid replacements are inferred in the exon. The intron includes 36 variable sites and 33 PT sites. Nucleotide frequencies are AT-rich (31.8% A, 34.8% T, 16.6% G, 16.8% C). 112

Simple indels are moderately frequent across the length of the intron, and range from 1 151 nucleotides in length. No indels are present in the exon. In total, 20 gaps were scored and appended to the alignment. Three of the indels are shared across the species boundaries of the ingroup members. One indel is shared among all members ofF. chiapensis and P. strobus, except the southernmost sample of P. strobus from North Carolina. No interspecific allele sharing was detected.

Estimates of within-taxaitrange from 0.000 (P. ayacahuite and P. chiapensis) to 0.007

(P. monticola) (Table 4.2). The between-taxaitis smallest for P. chiapensisP. strobus (0.0119), and greatest for P. monticolaP. strobus (0.0163) (Figure 4.2). The comparison of P. chiapensis P. monticola falls in between these values (0.0135). The between-taxaitfor P. ayacahuiteP. chiapensis (not shown in Fig. 2) is greater than both other comparisons with P. chiapensis (0.0153).

LEA-like - From the LEA-like locus we acquired 30 sequences for the ingroup species, representing 18 unique alleles (Table 4.2). The number of unique alleles ranges from three to seven per species. From the southern Oregon (OR-i) and northern Californian (CA-3) samples of P. monticola we acquired both alleles from the maternal parents. The P. strobus sequence from New Jersey (NJ) was cloned from needle tissue (Syring et al., 2005). Our aligned sequence for the LEA-like locus is 990 bp in length. Average ingroup lengths were 894 bp (range = 858 941 bp), and outgroup sequences ranged from 847 980 bp in length. The alignment includes 147 bp of exon on the 3' end of the amplicons (positions 844-990), including 48 complete and two partial codons. The exon has four variable positions, of whichone is localized at a second codon position and results in an amino acid replacement, while the remaining 3 are in silent (third) positions; none of these variable positions are P1. The intron segment includes 51 variable sites and 30 PT sites. Nucleotide frequencies are moderately AT-rich (26.2% A, 33.1% T, 20.4% G, 20.3% C). Complex and simple indels are moderately frequent across the length of the intron, and range from 165 nucleotides in length. No indels are present in the exon. In total, 18 gaps were scored and appended to the alignment. Two of the indels are shared across species boundaries of the ingroup members. A single indel is specific to all members of P. chiapensis, and another indel is shared between P. bungeana and all members ofF. chiapensis. No interspecific allele sharing was detected. 113

Estimates of within-taxaitrange from 0.0015 (P. ayacahuite) to 0.0085 (P. monticola) (Table 4.2). The between-taxaitis smaller for P. chiapensisP. monticola (0.0190), than for P. chiapensisP. strobus (0.0235) (Figure 4.2). However, the meanitbetween P. monticola - P. strobus (0.0 169) is 11% smaller than the P. chiapensisP. monticola value. The between-taxait for P. ayacahuiteP. chiapensis (not shown in Fig. 2) is on the same order as the P. chiapensis P. strobus comparison (0.0227).

Recombination No evidence of recombination among the included sequences was detected using either the maximumx2method or the DSS method. As a result, no sequences were excluded from the analyses on the basis of being recombinant products.

Phylogenetic analyses - AGP6 From the branch and bound search of the AGP6 data set, one most parsimonious tree was recovered (Fig. 3). The tree was 107 steps in length, had a consistency index (CI) of 0.785, and a retention index (RI) of 0.872. AGP6 is the only locus in which the alleles for P. chiapensis are not exclusively monophyletic, but instead are paraphyletic with one allele from P. monticola (CA-i; southern extent of the range) nested within the weakly supported (68% bootstrap support; BS) "P. chiapensis" dade. Pinus chiapensis alleles appear to be divided intotwoclades, one with moderate support (76% BS) including alleles from Oaxaca and Chiapas, and the other with identical support (76% BS) including P. chiapensis alleles from populations in Guatemala, Guerrero, and Oaxaca, and the sole allele from P. monticola (CA-I). In a constraint analysis forcing the monophyly of P. chiapensis, WSR results were insignificant (Table 4.3), implying that thetwoP. chiapensis clades are essentially not significantly different than a single lineage. In contrast, the nonmonophyly of all P. monticola alleles is statistically supported in the AGP6 phylogeny (Table 4.3). Interestingly, three of the six P. monticola alleles (representing six of the nine populations) are strongly supported as monophyletic (91% BS) in the most parsimonious tree. These three alleles represent the northern diversity of P. monticola, range from Oregon through British Columbia, and are weakly supported (55% BS) as sister to the "P. chiapensis" dade. The three P. monticola alleles not in this dade are all from Californian populations. Aside from the P. monticola allele found in the "P. chiapensis" dade, the othertwo 114

Californian alleles are in a weakly supported (54% BS) or unsupported position as sister to either alleles of P. ayacahuite or P. ayacahuite + P. stro bus, respectively. The only detected case of allele sharing between species occurs at the AGP6 locus between P. strobus and P. ayacahuite. In this case, the allele in common was shared among five populations of P. strobus and three populations of P. ayacahuite that have a combined geographic range spanning Nova Scotia, Canada to Honduras (26504450 km apart). All of the diversity of P. strobus is contained within a very sirongly supported dade (99% BS) that includes three of five P. ayacahuite alleles. The ir among all eight of the alleles from the P. strobus + P. ayacahuite dade is a minimal 0.26%, compared to 1.6% for all five of the P. ayacahuite alleles, 1.0% for all three P. chiapensis alleles, or 1.8% for all six of the P. monticola alleles. Constraining the alleles from P. strobus to be monophyletic results in a tree of equivalent length to the unconstrained tree, while constraining the alleles from P. ayacahuite to be monophyletic results in a highly significant WSR result (Table 4.3).

cesA 1 - A branch and bound search of the cesA 1 data set produced two most parsimonious trees, that differed in the resolution amongst outgroup taxa (Fig. 3). Trees were 84 steps in length, had a consistency index (CI) of 0.905, and a retention index (RI) of 0.929. No allelic diversity was detected among the six Pinus chiapensis sequences taken from populations across its range. The recovered allele has five unique (autapomorphic) mutations relative to a shared node with three of the four alleles of P. strobus. The P. chiapensis lineage is weakly supported as sister to three of the four P. strobus alleles, which are very strongly supported (99%) as monophyletic. However, the sister relationship of P. chiapensis + P. strobus is based on a single synapomorphic position and is poorly supported (66% BS). Collapsing this poorly supported node yields a strongly supported (88% BS) trichotomy that includes P. chiapensis, three monophyletic alleles from P. strobus (99% BS), and the two monophyletic (98% BS) alleles from Californian P. monticola. The fourth P. strobus allele (NC), representing the southernmost sample from this study, is moderately supported (70%) as the sister to a single allele shared between the three populations of P. ayacahuite. The northern samples of P. monticola are very strongly supported as monophyletic (100% BS) and moderately supported as sister to the remaining diversity of the ingroup taxa (79% BS). The lack of species-level monophyly for the alleles of either P. monticola or P. strobus is not statistically supported in WSR constraint analyses (Table 4.3). 115

LEA-like - From the branch and bound search of the LEA-like data set, 84 most parsimonious trees were recovered (Fig. 3). Trees were 119 steps in length, had a consistency index (CI) of 0.882, and a retention index (RI) of 0.932. The topology of the strict consensus of the 84 most parsimonious trees matches the LEA-like tree shown in Figure 4.3, except that nodes with less than 58% BS collapse. The three alleles representing seven populations of P. chiapensis are strongly supported (100% BS) as monophyletic, and share eight synapormorphies that separate them from a moderately supported (70% BS) dade of P. ayacahuite, P. monticola, and P. strobus. Within the P. chiapensis dade, the alleles from Veracruz and Guerrero are identical. Of the two remaining alleles, accessions from Chiapas-1, Chiapas-2, Guatemala, and Oaxaca-2 shared the same allele, and are weakly supported (62% BS) in a dade with the Oaxaca-1 allele. The alleles for P. ayacahuite are very strongly supported (95% BS) as monophyletic and are nested in a strongly supported (90% BS) dade that contains the northern-most accessions ofF. monticola. The alleles for P. strobus are strongly resolved as monophyletic (90% BS), and are nested within a dade defined by Californian P. monticola (71% BS). Constraining the alleles of P. monticola to be monophyletic results in a significant WSR result (Table 4.3).

Topological constraints on the sister relationshipofPinus chiapensis - Across all data sets topological constraints were enforced for the monophyly of all P. chiapensis alleles with the alleles for each of the three other ingroup species (Table 4.3). Forcing the monophyly of P. chiapensis with each alternative species yielded only two insignificant results (a = 0.05), implying that a sister relationship cannot be statistically rejected in those two cases. The insignificant results both involved P. strobus, one at the cesAl locus and the other at LEA-like. However, the latter result is insignificant in only 12 of the 52 total trees (23.1%), while the remaining 40 constrained trees are significant. Further, constraining LEA-like topologies for the monophyly of P. monticolaP. strobus results in 12 most parsimonious trees that are all highly insignificant in a WSR analysis. If significance in Table 4.3 is assessed at the a = 0.01 level, then a sister relationship cannot be rejected for P. chiapensis and either P. monticola or P. strobus at AGP6, all three alternative species and P. chiapensis at cesAl (P. monticola for I of 2 trees), and P. chiapensis and P. strobus for all recovered topologies at the LEA-like locus. iU1

DISCUSSION Recognition of Pinus chiapensis at the taxonomic rank of species - Based on the combined evidence, the distinctiveness of P. chiapensis warrants its recognition as a species. If P. chiapensis were a variety of any of the alternative species, the expectation is that the alleles from P. chiapensis would either be shared across species boundaries or would be consistently supported in a dade that would render the alternative species paraphyletic across the majority of the loci. In summary, P. chiapensis is monophyletic at two of the three loci, and lack of monophyly at the AGP6 locus is highly, statistically insignificant in a WSR constraint analysis (Table 4.3). In comparison, P. monticola is polyphyletic at each locus (statistically significant at AGP6 and LEA-like, but insignificant at cesAl); P. strobus is monophyletic at the LEA-like locus, polyphyletic at the cesAl locus (although insignificantly), and shares allelic diversity with P. ayacahuite at AGP6; P. ayacahuite has a single allele at cesAl that is sister to the southernmost allele sequenced from P. strobus (NC), is monophyletic at the LEA-like locus, and polyphyletic at the AGP6 locus. Therefore, at these three randomly selected loci, P. chiapensis is at least as distinct as each of the remaining ingroup taxa, all of which are recognized at the specific rank. In addition to the molecular evidence, Andresen's (1966) multivariate analysis adds morphological characters to the distinction of this species. The two morphological characters that Andresen determined to be the most significant in distinguishing P. chiapensis from either of P. monticola or P. strobus are the number of reflexed basal-scales contiguous to the peduncle (P. chiapensis has no scales that are fully reflexed), and the number of serrations per 5 mm interval of leaf edge at the leaf center (P. chiapensis has the greatest number of serrations). The third most significant character differs by species. For differentiating P. chiapensis and P. monticola, the third most significant character is the leaf width, with P. chiapensis having leaves nearly one- third more narrow than P. monticola. The third most significant character for differentiating between P. chiapensis and P. strobus is the number of involutions of edges of central scales, with P. chiapensis having 2.5 compared to 1.1 for P. strobus. The fourth most significant charcter found to differentiate P. chiapensis from either of P. monticola or P. strobus is the ratio of leaf width to leaf length, with P. chiapensis having the smallest value and therefore the "finest" or "thinnest" needles of the triad. One additional character outside of Andresen's study that might be 117 used to differentiate between P. chiapensis and P. strobus is a chemical analysis of the turpentine of these two species. Perry (1991; p. 57) mentions that he found significant differences in an interspecies comparison, but does not elaborate on his findings.

Identifying a progenitor of Pinuschiapensis - Lack of species monophyly, most likely resulting from incomplete lineage sorting (Syring et al., in review), and interlocus variability are both confounding factors in determining the progenitor of P. chiapensis. Based on the data, the least likely progenitor from the remaining species is P. ayacahuite. In phylogenetic analyses (Fig. 3), the alleles from P. chiapensis are never found in a sister relationship with alleles from P. ayacahuite. At both the cesAl and LEA- like loci, P. chiapensis alleles are separated from P. ayacahuite alleles by multiple nodes with greater than 70% BS; at the AGP6 locus the alleles from thetwospecies are mutually exclusive and separated by five nodes (with the clades containing the P. chiapensis alleles having 7576%

BS). Between-taxa estimates ofitfor P. ayacahuiteP. chiapensis are greater than remaining comparisons involving P. chiapensis at the AGP6 and cesAl loci, while at LEA-like this comparison would rank the second largest behind P. chiapensisP. strobus (0.0227 versus 0.023 5, respectively). Further, constraining topologies for the monophyly of P. ayacahuiteP. chiapensis results in significant WSR results at all three loci (Table 4.3). The allele sharing at the AGP6 locus between populations of P. strobus (from such disparate locations as Newfoundland and North Carolina) and populations of P. ayacahuite (from Oaxaca through Honduras) could reflect either ancestral retention of polymorphism (i.e. incomplete lineage sorting) or ancient introgression, but not ongoing hybridization between these two geographically isolated species. Interestingly, this pattern is not uncovered between the sympatrically occurring P. ayacahuite and P. chiapensis. This strongly suggests that hybridization between the latter two species is not rampant, if it is occurring at all. The relationship between P. flexilis, P. strobformis, and P. ayacahuite (all remaining Mexican and Central American members of subsection Strobus) has been interpreted by Critchfield and Little (1966) as clinal variation running from north to south, respectively, while molecular data confirm the close relationship of these three species (Syring et al., in review; Syring, unpublished data). Taken together, the combined evidence indicates that the progenitor of P. chiapensis is not an extant Mexican or Central American species. 118

Relationships between P. chiapensis,P. monticola, and P. strobus are more difficult to establish due to conflicting results across tests and loci. At the LEA-like locus, tree topologies place P. chiapensis sister to all remaining species. Between-taxa comparisons of iv (Figure 4.2) show P. chiapensis being slightly more similar to P. monticola (iv = 0.0190) than to P. strobus (iv = 0.023 5). However, constraining the topologies for the monophyly of P. chiapensisP. strobus resulted in 23.1% of the topologies being insignificant, and thus not statistically different from the most parsimonious trees. In results that mirror the conclusions of Andresen (1966), P. monticola and P. strobus appear to be more closely related at LEA -like than either is to P. chiapensis. The between taxa comparison for these two species (iv = 0.0 169; Fig. 2) is smaller than for any comparison involving P. chiapensis, and constraining topologies for the monophyly of P. monticolaP. strobus is insignificant for all recovered trees in a WSR test (data not shown). At cesAl, between-taxa comparisons of iv (Fig. 2) place P. chiapensis closer toP. strobus (ir = 0.0119) than to P. monticola (iv = 0.0 135), and constraining topologies for the monophyly of P. chiapensisP. strobus results in all six recovered trees being insignificant in a WSR analysis (Table 4.3). In contrast to the LEA-like locus, cesAl shows P. monticola and P. stro bus to be more divergent from each other than either is to P. chiapensis. This result is supported in a WSR analysis where topologies constrained to the monophyly of P. monticolaP. strobus were significant for all recovered trees (Table 4.3). While this may seem like potential evidence for a P. chiapensisP. strobus relationship, it was noted earlier that if nodes with less than 70% BS were collapsed in the most parsimonious trees that a trichotomy of P. chiapensis, P. stro bus (minus the NC allele), and the Californian P. monticola alleles (CA-i, CA-2) would form a single well supported (88% BS) dade. One finding of this study is that genetic diversity for P. monticola populations from California is consistently greater across all three loci compared to populations north of the California/Oregon border. These results mirror the allozyme data of Steinhoff (1983). As a consequence, it may be more appropriate to consider only the southern populations of P. monticola as sharing a recent common ancestor to P. chiapensis, rather than P. monticola range- wide. In this light, it is interesting that the population from the southernmost extent of the range of P. monticola (CA-i) renders P. chiapensis paraphyletic at AGP6. At this locus, between-taxon comparisons of iv were greater for P. chiapensisP. strobus than for either P. chiapensisP. monticola or P. monticolaP. strobus. Looking at AGP6 alone, one might reach the conclusion that P. monticola is the progenitor ofF. chiapensis. 119

However, when data from all loci are compared it seems apparent that insufficient information is available to make a determination with any degree of certainty as to the shared recent ancestor of P. chiapensis and other North American pines from Subsect. Strobus. Having clearly ruled out P. ayacahuite as the nearest relative of P. chiapensis is an important first step given their geographic proximity and potential for historical contact. Given the abundance of genomic resources available for pines, it is presently feasible to address the phylogenetic history of P. chiapensis by exploring a greater number of loci with correspondingly high population-level sampling. Future sampling should also include highly variable organellar genes (chloroplast and mitochondrial DNA). By exploring patterns of organellar evolution, it will be possible to compare and contrast results across genomes. For example, it is possible that the phylogenetic position of P. chiapensis in the cpDNA analysis of Gernandt et al. (2005) reflects the 'true' relationship for these genomes, but greater genomic and population-level sampling is required before this relationship is considered anything more than a hypothesis. Due to the uncertainty in the progenitor ofF. chiapensis, the phylogeographic question of how this pine came to occupy its current range in southern Mexico and Guatemala, whether by long-distance dispersal, or via connections with a northern or extinct relative during periods of favorable climate (Sharp, 1953; Dressler, 1954; Martin and Harrell, 1957), remains a topic for future exploration.

Genetic diversity of Pinus chiapensis and patterns of population subdivision - Pinus chiapensis has the lowest nucleotide diversity across all three loci as compared to its near-relatives P. strobus, P. monticola, and P. ayachauite (Table 4.2). In fact, P. chiapensis has the lowest recorded diversity of all eight North American Pinus subsection Strobus species (Syring, unpublished data; similar geographic and population-level sampling), and it is among the least genetically diverse species in a broader survey of Subg. Strobus species at the LEA-like locus (Syring, in review). These estimates may be relevant to conservation concerns for P. chiapensis and other rare pines. In Syring et al. (in review), nucleotide diversity at LEA-like was estimated for 33 species from 23 alleles per species, and these estimates were compared to current species ranges (a proxy for census population sizes and global abundance). The conclusion was that genetic diversity and geographic ranges (or census sizes, approximately) were uncorrelated. For example, P. bhutanica has a relatively small geographic range for pines (ca. 8000km2across SE Asia), but is ranks in the top third with respect to genetic diversity (ir = 120

0.0.0185). Similarly, P. culmicola (from Mexico) and P. dalatensis (from Mexico) are known to occupy extremely small ranges (50 km2), but they rank near the median of 33 species for genetic diversity (r = 0.0075 and 0.0077, respectively) (Syring Ct al., in review). In contrast, P. albicaulis has a very large range (400,000 km2), yet seven samples from across the range of the species show no diversity at the LEA-like locus (Syring et al., in review; Syring, unpublished data). While conservation efforts often focus on species with small or fragmented ranges, retention of genetic diversity may be of paramount importance in genetically depauparate species, particularly if they show strong inbreeding depression as has been described for pines (Sorenson, 1999). In this light, Pinus chiapensis may be of particular concern since it is potentially threatened both by a small (-5000 km2) and decreasing natural range, as well as unusually low genetic diversity relative to other pines. The complete protection of the P. chiapensis genetic resource is dependent on understanding the structure of the genetic diversity across the range of this species, such that all populations of significant genetic divergence can be conserved. It is possible that successful restoration efforts across the native range of P. chiapensis may be dependent on planting locally adapted genotypes. In this regard, Newton et al. (2002) reported the occurrence of two mitotypes based on mtDNA RFLP markers with one mitotype being isolated towards the center of the species' range. Intermediate populations containing both mitotypes were interpreted as supporting evidence for the hypothesis that populations were more contiguous in the past and have since been fragmented as a result of human impact and climate change. However, the overall population structure based on mtDNA RFLP's (Newton et al., 2002) is difficult to assess because the westernmost and northernmost populations, in the state of Guerrero and Veracruz, were not included in the analysis. In the same study, RAPD data revealed an alternative population structure, suggesting a tendency for the eastern populations of Guatemala and the Mexican state of Chiapas to be differentiated from the more northern and western populations. The percentage polymorphism of RAPD loci (24.5%) was substantially lower than in any of the other tree species for which comparable data are available (Newton et al., 2002), matching the reported findings of the current study. An AMOVA analysis conducted on the RAPD data of Newton et al. (2002) indicated that a significant proportion of the variation was attributable to differences between populations, suggesting reproductive isolation as a result of inbreeding and genetic drift that would be expected of a highly fragmented distribution. 121

The results from the current study offer little in the way of supportor contradiction to the findings of Newton et al. (2002), mostly as a result of the limited genetic diversity of P. chiapensis, but also has a result of limited population-level sampling. At the LEA-like locus, populations from Guerrero and Veracruz (the west and north of the range, respectively)were poorly supported (62% BS) as sister to the remaining diversity to the south and east. This pattern is not supported at AGP6, where one allele was sequenced from populations in Guatamala, Oaxaca, and Guerrero. That allele was moderately supported (75% BS) ina dade with P. monticola (CA-i). No information on genetic structure is available from the cesA 1 locus. The determination of population-genetic structure from sequence information will require several to many highly informative loci and perhaps denser population-level sampling than was employed in this study. Although, it should be noted that with similar sampling a population structure for P. monticola was uncovered (Californian versus northern groups), suggesting that the population structure for P. chiapensis may not be well defined.

ACKNOWLEDGEMENTS We thank David Gernandt (Universidad Autónoma de Hidalgo, Mexico), Dave Johnson and Tom Ledig (USDA Forest Service, Institute of Forest Genetics, Placerville, CA), Richard Sneizko and Jerry Berdeen (USDA Forest Service, Dorena Genetic Resource Center, Cottage Grove, OR), Dennis Ringnes (USDAFS Central Zone Genetic Resource Program, California), Roman Businsky (Silva Tarouca Research Institute for Landscape and Ornamental Gardening, Prühonice, Czech Republic), Jesus Vargas Hernandez (Instituto de Recursos Naturales, Mexico), Dale Simpson (Natural Resources Canada), Jean Beaulieu (Canadian Forest Service), Bill Dvorak (CAMCORF), Carrie Sweeney (USDAFS Oconto River Seed Orchard, Minnesota), and Victor Vankus (USDAFS National Tree Seed Laboratory) for their kind help in acquiring seed and needle collections. Kathleen Farrell and Mariah Parker contributed many hours of laboratory assistance. Special thanks to Raul Rivera for risking his life climbing trees to get us fresh seed. Funding for this study was provided by the National Science Foundation grant DEB 0317103 to Aaron Liston and Richard Cronn, and the USDA Forest Service Pacific Northwest Research Station. 122

REFERENCES

Andresen, J. W. 1964. The taxonomic status of Pinus chiapensis. Phytologia 10:417-421.

Andresen, J. W. 1966. A multivariate analysis of the Pinus chiapensis-monticola-strobus phylad. Rhodora 68:1-24.

Baum, D. A., and M. J. Donoghue. 1995. Choosing among alternative "phylogenetic" specied concepts. Systematic Botany 20:560-573.

Critchfield, W. B., and E. L. Little Jr. 1966. Geographic distribution of the pines of the world. USDA Forest Service Miscellaneous Publication 991. Washington, D.C.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2003. Cryptic repeated genomic recombination during speciation in Gossypium gossypioides. Evolution 57:2475-2489. del Castillo, R. F., and S. Acosta. 2002. Ethnobotanical notes on Pinus strobus var. chiapensis. Anales del Instituto de BiologIa, Universidad Nacional Autónoma de Mexico, Serie Botánica 73:319-327.

Delgado, P., D Piflero, A. Chaos, N. Perez-Nasser, and E. R. Alavarez-Buyila. 1999. High population differentiation and genetic variation in the endangered Mexican pine Pinus rzedowskii (Pinaceae). American Journal of Botany 86:669-676.

Donahue, J. K., W. S. Dvorak, and E. A. Gutierrez. 1991. The distribution, ecology, and gene conservation of Pinus ayacahuite and P. chiapensis in Mexico and Central America. CAMCORE, Raleigh, North Carolina.

Dressier, R. L. 1954. Some floristic relationships between Mexico and the United States. Rhodora 56:81-96.

Dvorak, W. S., J. K. Donahue, and J. A. Vasquez. 1996. Provenance and progeny results for the tropical white pine, Pinus chiapensis, at five and eight years of age. New Forests 12:125-140.

FAO. 1981. Report of the fifth session of the FAO panel of experts on forest gene resources. FAO, Rome, Italy.

Farjon, A., and B. T. Styles. 1997. Pinus (Pinaceae). Flora Neotropica Monograph 75. The New York Botanical Garden, New York.

Farjon, A. and C. N. Page. 1999. Conifersstatus survey and conservation action plan. IUCN/SSC Conifer Specialist Group, Cambridge, U.K.

Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

Gaussen, H. 1960. Pinus. In "Les gymnospermes actuelles et fossiles," Tome II, Section I, Vol. I, Chap. XI, Fascicle VI, pp. 1-272, Travaux du Laboratorie Forestier de Toulouse, Toulouse, France. 123

Gernandt, D. S., G. G. Lopez, S. 0. Garcia, and A. Liston. 2005. Phylogeny and classification of Pinus. Taxon 54:29-42.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98.

Iglesias, R. G., and M. J. Babiano. 1999. Isolation and characterization of a eDNA encoding a late-embryogenesis-abundant protein (Accession No. AJO 12483) from Pseudotsuga menziesii seeds. Plant Physiology 119: 806.

Kral, R. 1993. Pinus. Flora of North America (North of Mexico), vol 2 (Flora of North America editorial committee (ed.), ed.). Oxford Univerisity Press, New York.

Krutovsky, K. V., M. Troggio, G. R. Brown, K. D. Jermstad, and D. B. Neale. 2004. Comparative mapping in the Pinaceae. Genetics 168:447-461.

Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: Molecular evolutionary genetics analysis software. Bioinfonnatics 17:1244-1245.

Landry, P. 1989. A revised synopsis of the white pines (Pinus, section Quinquefoliis). Phytologia 65:467-474.

Liston, A., W. A. Robinson, D. Pinero, and E. R. Alvarez-Buylla. 1999. Phylogenetics of Pinus (Pinaceae) based on nuclear ribosomal DNA internal transcribed spacer region sequences. Molecular Phylogenetics and Evolution 11:95-109.

Little, E. L. Jr., and W. B. Critchfield. 1969. Subdivisions of the genus Pinus. USDA Forest Service Miscellaneous Publication 1144.

Martin, P. 5., and B. E. Harrell. 1957. The pleistocene history of temperate biotas in Mexico and eastern United States. Ecology 38:468-480.

Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: Recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262.

Martinez, M. 1940. Pinaceas Mexicanas. Anales del Instituto de Biologla de Mexico 11:57-84.

McGuire, G., F. Wright, and M. J. Prentice. 1997. A graphical method for detecting recombination in phylogenetic data sets. Molecular Biology and Evolution 14:1125-1 131.

Milne, I., F. Wright, G. Rowe, D. F. Marshall, D. Husmeier, and G. McGuire. 2004. TOPALi: Software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinfonnatics 20:1806-1807.

Newton, A. C., T. R. Allnutt, W. S. Dvorak, R. F. del Castillo, and R. A. Ennos. 2002. Patterns of genetic variation in Pinus chiapensis, a threatened Mexican pine, detected by RAPD and mitochondrial DNA RFLP markers. Heredity 89:191-198. 124

Perry, J. P. 1991. The Pines of Mexico and Central America. Timber Press, Portland, OR.

Posada, D., and K. A. Crandall. 2001. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proceedings of the National Academy of Sciences, USA 98:13757-13762.

Price, R. A., A. Liston,and S. H. Strauss. 1998. Phylogeny and systematics of Pinus. Pages 49- 68 in: Ecology and Biogeography of Pinus (Richardson, D. M., ed.). Cambridge University Press, Cambridge, U.K.

Richmond, T. A. and Somerville, C. R. 2000. The cellulose synthase superfamily. Plant Physiology 124:495-498.

Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome scale approaches to resolving incongruence in molecular phylogenies. Nature 425: 798-804.

Rzedowski, J. and L. Vela. 1966. Pinus strobus var. chiapensis en la Sierra Madre del Sur de Mexico. Ciencia 24:211-216.

Sharp, A. J. 1953. Notes on the flora of Mexico: World distribution of the woody dicotyledonous families and the origin of the modern vegetation. Journal of Ecology 41:374-380.

Simmons, M. P., and H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49:369-38 1.

Small, R. L., R. C. Cronn, and J. F. Wendel. 2004. Use of nuclear genes for phylogeny reconstruction in plants. Australian Systematic Botany 17:145-170.

Smith, J. M. 1992. Analyzing the mosaic structure of genes. Journal of Molecular Evolution 34:126-129.

Sorenson, F.C. 1999. Relationship between self-fertility, allocation of growth, and inbreeding depression in three coniferous species. Evolution 53: 417-425.

Standley, P. C. and J. A. Steyermark. 1958. Flora of Guatemala, Part I. Chicago Natural History Museum, Fieldiana, Botany, 24. 476 p.

Steinhoff, R.J., D.G. Joyce, and L. Fins. 1983. Isozyme variation in Pinus monticola. Canadian Journal of Forest Resources 13:1122-1 132.

Swofford, D. L. 2003. PAUP*4.OblO: Phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, Massachusetts, USA.

Syring, J., A. Willyard, R. Cronn, and A. Liston. 2005. Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92:2086-2 100. 125

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:22 1-244.

U.S. Geological Survey. 1999. Digital representation of Atlas of United States Trees by Elbert L. Little, Jr.

Wright, J. A., A. M. Mann V, and W. S. Dvorak. 1996. Conservation and use of the Pinus chiapensis genetic resource in Columbia. Forest Ecology and Management 88:283-288.

Zhang, Y., G. Brown, R. Whetten, C. A. Loopstra, D. Neale, M. J. Kieliszewski, and R. R. Sederoff. 2003. An arabinogalactan protein associated with secondary cell wall formation in differentiating xylem of loblolly pine. Plant Molecular Biology 52:91-102. Table 4.1. Sampled Pinus representatives.P. bungeana Zuccanni en Taxon Endlicher P. bungeana Code Shanxi Province, China Collection USDAFSCollector Institute or Source of Forest Genetics(Voucher (OSC) 5) P. sqriamatagerardisna X.W. Wallich Li cx D Don P. gerardiana-l'P. gerardiana-25squamata-15 Vunnan,Gilgil,Gjlgit PakistanPakistan China BueinskBusinskiBusin5k 461184110541123 (RILOG) P. ayacshuife var veifchui (Roez!) Show Ehrenberg ox Sctrlechtendal P. squamata-25 Max° Jat Mexico,jalioco,Vunrsarr, MexicoMexico China USDAFS Institute of Forest Genetics (OSC) Businsky 46120 (RILOGI P. ayacahuife Ehr.nberg en Schlechfendal Mic-15Hon5Gua Totonicapa, GuatemalaMichoacan,La Paz, Mexico Honduras USDAFS Institute of Forest Genetics (OSC) P. driapensis (Martinez) Mdresen Chi-15Mic-2Oax Michoacan, MexicoChiapas,Oaxaca, MexicoMexico USDAFS tnstitute of Forest GerreticaJeandra (OSC) Dvorak. Moreno-LetelierCAMCORE (no voucher) (no voutheq) Gua5Chi-2Gue Guerrero,Chiapas, Mexico Mexico Guatemala USDAFS Institute of Forest Genetics (OSC) SyringSynng 10121022 (OSC) Oax-3Oax-2Oax-1 Oaxaca, Mexico Jesus Vargas SyringSynng 10201018 (OSC) Hernandez (OSC) P. mo,'rticola Douglas cx D. Don BrC-2BrC-1Ver° British Columbia.Columbia, CanadaCanada Veraauz, Mexico Natural Resources Canada (OSC) Hemandez (OSC) CA-35CA-I5CA-2 Catifornia,Catifomia,California, USA USDAFS Central Zone Genetic Resource Program (OSC) USDAFS RB Cannno Seed Orchard (O$C) OR-ISOR-2 WA Washington, USA Oregon.Oregon, USA USDAFS DorensDorena Genetic Resource Center (OSC) P. strubus Linnaeus MN5NBc Ml New Brunswick, Canada Minnesota,Michigan, USA USA USDAFS Oconto River SeedSend Orchard (OSC) Natural Resources Canada (OSC) NJ5NflNC Newfoundtand,North Carolina,New USA Jersey, USA Canada USDA Forest Service National Tree Seed Laboratory (OSC) Naturel Reaources Canada (OSC)G.mandt DSGOO500 (OSC) QuoNSc Nova Scotia, CanadaQuebec,Wisconsrn, Canada USA USDAFS Dorena Genetic Resource Center (OSC) Natural ResourcesCanadxin Canada (OSC) Forest Service (OSC) LEA-likeTaxononly sequences follows Gemandt published at previouslyal. (2005). in Syring at at. (in review) Wt-25 Wsconsin, USA USDAFS Dorena Genetic Resource Center (OSC) 5Herberium acronymsRILOGBoth AGP6 follow and LEA-likeIndex Herbariorurn: sequences hltp:llsciweb.nybg.orgfscience2/lndexHerbariorum.aap.published previously in Synng at at. (2005) Silva Tarouca Research Institute for Landscape and Ornamental Gardening, 25243 Prirrhonice, Czech Republic. Table 4.2. Summary of the number of sequences (N) and the number of unique alleles (A) acquired per species across all three loci. Within averageCombinedgroup estimates iv). of nucleotide diversity (iv) and standard deviation (SD) were calculated from p-distances using 500iv bootstrap replicates.was weighted by average sequence length across all three loci, and a rank was assigned from I (highest average iv) to 4 (lowest Combined P.ayacahuite Species N8 A5 0.011AGP6 m SD N3 A 1 cesAl 0 r SD N A3 LEA-like SD Rank P. strobusmonticolachiapensis 1079 635 0.00160.00740.013 0.00070.00220.0021 67 341 0.0060.007 0 0.00160.0017 0 1076 573 0.00850.00290.00150.005 0.00140.00180.00120.0008 0.00440.00920.00310.0036 4213 128

Table 4.3. Wilcoxon signed rank (WSR) results from the topological constraint of monophyly enforced on either individual species or species pairs. All alleles from each species are used in each test. The level of significance was assessed at alpha = 0.05. Insignificant values indicate that the constrained topologies are not statistically different from the unconstrained trees. Boxed values are at least partially insignificant.

Locus Monophyly Constraint LUa Lcb NC Significance AGP6 P.ayacahuite 107 117 3 0.0016 P.chiapensis 107 110 2 0.3173,0.3657 I I P.monticola 107 117 5 0.0323-0.0388

P.strobus 107 107 1 I 1.0000 I P.chiapensisP.ayacahu!te 107 122 9 0.0011-0.0046 P.chiapensis-P.monticola 107 112 1 0.0253 P.chiapensis-P. strobus 107 118 2 0.0165 P.monticola-P. strobus 107 123 17 0.0007-0.0011

cesAl P. monticola 84 89 1 0.0588 P.strobus 84 86 2 0.1573,0.3173 P.chiapensis -P.ayacahu!te 84 90 16 0.0143-0.0339 P.chiapensis P.monticola 84 91 2 0.0082,0.0196 P.chiapensis-P. strobus 84 86 6 - I 0.15730.3173 I P. monticola-P.stro bus 84 91 5 0.0196

LEA-like P. monticola 119 127 4 0.0325-0.0455 P.chiapensis-P. ayacahuite 119 128 84 0.0027-0.0067 P.chiapensis P.monticola 119 131 36 0.0005-0.0013 P.chiapensis P.strobus 119 125 52 00143_00578d P.montico!aP. strobus 119 123 12 0.2482-0.2850

Lu is the length of the unconstrained topologies shown in Figure 2 Lc is the length of the topology constrained for the monophyly of P.chiapensiswith the species of interest N is the number of constrained trees recovered from a heuristic search d76.9% of the recovered topologies are significant, while 23.1% are insignificant. Gulf of Mexico Mtchoacar,i.z-jcj (.. Guerrero Oaxac - Il .-' A. 0 600km Gua B. Guatetrela z :. Idaho Iii'.iCA.3JJY I ( %. CA-2 Nevada o AGP6&cesAlAll loci 0 CA-I U 900km 0 2000 km A cesAl&LEAAGP6&LEA strobusFigureC. 4.1. (D). Distribution Population codesand sample are given locations in the forcollection all accessions table (Table at all loci.2). Pinus ayacahuite (A), P. chiapensis (B), P. monticola (C), and P. D. 10 * AGP6 t'J 130

AGP6 cesA I LEA-like

MONT I4013° T900O85)

In nn

STRO (0.005) (0.0016)

Figure 4.2. Graphic representation of within-taxa and between-taxa nucleotide diversity as measured by p-distances. Size of the circles represents within-taxa means (values in parentheses), lines connecting circles are between-taxa means (values given). CHIA = P chiapensis, MONT = P monticola, and STRO = P strobus. All three loci are at the same relative scale. 131

Figure 4.3 Most parsimonious frees derived from individual loci. Bootstrap values from 1000 replicates and TBR branch swapping are shown near nodes. A = aligned length of data set, L = length of trees, N = number of trees, CI = consistency index, and RI = retention index. Asterisks (*) indicate nodes that collapse in the strictconsensus tree. Population codes refer to Table 12 and Figure 8. 132

A. AGP6 P. gerardiana-1 -P. squamata-1 r P. ayacahulte (Mex) P. ayacahuite (Jal, Mb-i) P. 8yacahulte (Gua, Hon, Oax) 99 P.strobus (WI-i) A=779 P.strobus (WI-2,NBk) L=107 66 P. strobus (Nf I) P. strobus (MI, NC, NJ, NSc, Quo) N=I P. strobus (MN) CI=0.7850 J P. mont/cola (CA-2) 100 P. syacahulte (Jab) RI=0.8722 P ayacahuite (Mic-2) P. ,nonflcola (CA.3) 76 P. chla pens/s (Chi- 1> 68 P. chiapensis (Oax-2) 75 P. chiapensls (Gua, Gue, Oax-3) P. mont/cola (CA-i) P. monticola (OR-i, WA) P. mont icola (BrC-2. OR-2, WA) 5 P. monticola (BrC-1) B. cesAl P. bungeana 152 P. gerardlana.2 L;.I psquamata-1 P chlaponsis (Chb.2, Gua, Gue, Oax-i, Oax-2, Ver) P. strobus (WI-i) A= 1056 P. strobus(MI, NSc) pstrobus(MN, Nfl) L= P. monticola (CA-i) N=2 mont/cola (CA-2) CI = 0.9048 P. strobus (NC) P. ayacahuite (Gua, Mex, Oax) RI = 0.9292 100 1P. mont/cola (BrC-i, OR-i, OR-2, WA) L..P.monficola(BrC-2) 5 C. LEA-like P. bun geana P. gerardiana-2 A = 1008 P. P. strobus (WI-2) L=119 P. strobus (Nfl) 98 N=84 90 (NJ) P. strobus Que) 58 P. strobus IMI, MN) Cl = 0.8824 P. monticola (CA-3 (bA-i) RI = 0.9327 P mnntl,!nk ''"' P Mx P. ayacahuite(Mc-1) * P. ayacahulte (Gua. Hon. JaI, Oax) * P. mont/cola (CA.3) 90 P. monticola (8rC-i) P. monticola (BrC-2, OR-i, OR-2, WA) P. monticola (OR-i) - P. chlapensls(Gue, Ver) P. chiapensis (Oax- 1) 5 P. chiapensis(Chi.i, Chi-2, Gua, Oax-2)

Figure 4.3. 133

CONCLUSIONS

GENERAL CONCLUSIONS This dissertation represents a contribution to the field of pine systematics and phylogenetics. The included work does not represent an end to the pursuit of understanding the process or pattern of pine evolution, but rather a new beginning in this line of study. With the large amount of genomics work that has been published on Pinus taeda, P. radiata, P. pinaster (Plomion et al., 1995; Costa Ct al., 2000; Brown et at., 2001; Mariette et al., 2001; Temesgen et at., 2001, Brown Ct at. 2004; Devey et al., 2004), and to a lesser extent on a handful of other species representing both subgenera (Nelson et at., 1993; Nelson et al., 1994; Echt and Nelson, 1997; Travis et al., 1998), pines offer a robust system for the study of plant evolution. Research into association and comparative mapping in conifers (Krutovsky Ct al., 2004; Neale and Salvolainen, 2004) continues to greatly increase our knowledge of the pine genome. In addition, the recently awarded funding for the development of a BAC library scaffold in Pinus (Daniel Peterson, MGEL, unpublished data; NSF Grant DBI-042 1717) will add a tremendous wealth of data and will further advance the field of comparative genomics, allowing comparisons across such distant groups as the gymnosperms and angiosperms. Taken as whole, research on the patterns and processes of genetic differentiation and speciation in Pinus have made this genus a model taxa amongst conifers and perhaps among other long-lived, woody seed plants that share similar life history traits. The research of this dissertation represents the first effort to develop mapped nuclear loci for the purpose of phylogenetics in gymnosperm s, and as such has opened the doors for other researchers to either use our developed low-copy markers or to screen additional markers following our methods. It is my hope that this work will lay the groundwork for additional phylogenetic and population genetic research in pines and other gymnosperms. As a general conclusion to my first manuscript, I stated that, "a successful phylogenetic strategy in Pinus will require many low-copy nuclear loci that include a high proportion of silent sites and derive from independent linkage groups." While I still believe this statement to be accurate, the importance of this remark has taken on new meaning in light of new findings presented in this dissertation. Perhaps the most important general finding of this body of work is that coalescence failure at the species level can lead to significantly different phylogenetic 134

interpretations that are only detectable by sampling multiple individuals per species across multiple loci. Further, conflicting patterns of allelic coalescence and species relationships across loci complicate (possibly eliminate) the potential for the combination of sequence data from multiple genes that is needed to acquire large enough data sets to resolve relationships among terminal species. Combining of sequence data from across loci represents an area of phylogenetics where a clear consensus has yet to be reached, and this new perspective from pines may add a new dimension to this debate. The demonstration of widespread species-level nonmonophyly appears to be a severe constraint in the application of nuclear genes in resolving organismal evolutionary history of pines at low taxonomic ranks. Similar conclusions were reported by Bouillé and Bousquet (2005) and Broughton and Harrison (2003), all of whom have suggested that nuclear gene genealogies have limited potential to reconstruct evolutionary histories among closely related species. While this may prove to be true in deciphering relationships among the terminal species of Pinus, it is not prudent to extrapolate these findings to dissimilar taxa with different life history traits (e.g., small Ne,short life spans, higher inbreeding rates). Even with the limitation of noncoalescence, discrete nuclear genes can provide important insights into the historical, demographic, and possibly even selective processes that forge new species. This information offers perspectives that may contrast with organellar DNA or nrITS as to the biological basis for the presence (or absence) of phylogenetic patterns. While lack of allelic monophyly in pines appears to limit the potential for nuclear gene phylogenetics, two useful examples of the power of nuclear genes in addressing systematic questions are presented in Chapters 2 and 4 of this dissertation. In Chapter 2, a combined analysis of low-copy nuclear loci produced a well-supported subsectional topology and offers insight into the placements of the two monotypic subsections Nelsonianae and Krempfianae, whose evolutionary positions were previously unresolved by cpDNA or rDNA (Gernandt et al., 2005; Liston et al., 1999, 2003). The exemplar sampling strategy employed in Chapter 2 was supported in Chapter 3 by data indicating that intraspecific variation is contained within subsectional boundaries. In Chapter 4, three nuclear genes were used to clearly demonstrate the distinctiveness of P. chiapensis and its recognition at the specific rank. The use of nuclear genes in Pinus to address narrower systematic questions requiring population sampling across multiple loci appears to be an area that offers a great deal of future research potential. 135

FUTURE RESEARCH The data on the lack of allelic monophyly within the nuclear genome of Pinus is strongly suggestive that future approaches will need to move beyond traditional cladistic methods and into the rapidly developing field of coalescent theory. Coalescent based approaches exploit variation among loci to estimate population parameters such as effective population size, ancestral population sizes, divergence times, rates of gene flow, expectations of genic and genomic monophyly, and information on historical biogeography (Beerli and Felsenstein, 1999; Rannala and Yang, 2003; Rosenberg, 2003; Dean and Ballard, 2004; Dolman and Moritz, 2005; Walsh et al., 2005). Coalescent methods "treat the gene tree as a nuisance parameter" (Dolman and Moritz, 2005) from which to extract data, and are increasingly pushing the boundaries of available analysis tools and helping to turn potential conflicts (i.e. incomplete lineage sorting) into constructive data (Hey and Machado, 2003; Maddison and Knowles, 2006). Research from this dissertation can be applied to questions more traditionally considered in the field of population genetics. For example, a distinctive population structure was uncovered for P. monticola in Chapter 4 of this dissertation, with the results mirroring those of Ste inhoff et al. (1983). Steinhoffet al. (1983), using isozyme variation among populations of P. monticola, found that the population could be divided into two geographic groups, a broad northern group and a southern group consisting of populations from roughly south of the OregonCalifornian border. While this conclusion was based on a limited number of populations and failed to include samples from remote populations, Steinhoff et al. (1983) hypothesized that the southern group was the ancestral group, based on reduced levels of heterozygosity within the northern population. Using sequence data from low-copy nuclear genes instead of isozymes, it will be possible to place both the diversity of the southern population and the lack of diversity of the northern population into a phylogenetic framework, thereby giving a more complete picture of the genetic diversity of P. monticola. A population level study that samples populations from across the range of P. monticola, multiple accessions from each population, and multiple loci will provide insights into the Pleistocene history ofF. monticola (i.e. refugia, dispersal centers, bottleneck events), and will provide important information on the genetic diversity and structure of Western white pine. Further, I expect results from a study of this nature to provide dramatic insight into conservation management and restoration strategies for P. monticola. Although a population structure ofF. chiapensis was not uncovered from the data in Chapter 4, it is possible 136 that a population structure will become apparent as more loci are developed, screened, and sequenced across this species and closely related relatives. Both the methods and the developed nuclear loci from this dissertation can also be used in the more traditional phylogenetic sense. Two additional unresolved systematic questions in Pinus that have attracted the attention of researchers are, 1) the monophyly of subsect. Cembrae ("stone pines", Shaw, 1914; Little and Critchfield, 1969), and 2) the clinal variation in the P. flexilis, P. strobformis, P. ayacahuite complex of subsect. Quinquefoliae (Critchfield and Little, 1966; T. Ledig, unpublished data). Both of these outstanding phylogenetic questions would be easily addressed following the methods and sampling design presented in Chapter 4. In a preliminary test of the monophyly of the proposed subsection Cembrae, the topology shown in Figure 3.1 was taken as a one-gene line of evidence for testing this evolutionary hypothesis. The monophyly of this subsection was tested under constraints using all twelve sequences from P. albicaulis, P. cembra, P. koraiensis, P. pumila, and P. sibirica. A heuristic search enforcing a "subsect. Cembrae" monophyly constraint returned 42 trees of length 359, 9 steps longer than the unconstrained trees, resulting in significant topological conflict using the WSR test (P=0.0027- 0.0067). This preliminary analysis suggests that subsection Cembrae is an artificial taxa, resulting from the convergent evolution of cones in response to selective pressure for bird dispersal (Tomback, 1982; Tomback and Linhart, 1990). Future work will require more intense population- level sampling across multiple loci, and will lead to valuable insights into the pressures driving speciation and the evolution of homoplastic morphological characters in Pinus. Such research will most likely require international collaboration with researchers in Japan, Korea, China, and/or eastern Russia. In a final example, Critchfield and Little (1966) recognize P. strobformis as a morphological and geographical link between P. flexilis to the north and P. ayacahuite to the south. In Chapter 3, I found that either a broadly circumscribed P. ayacahuite or P. flexilis, to include the diversity ofF. strobformis, would fail to restore allelic monophyly in this taxonomic complex. At the LEA-like locus (Chapter 3), the only case of interspecific allele sharing that was uncovered was between P. ayacahuire and P. flexilis. This observation was striking, since these samples came fmm wild collections separated by -3600 km (P. ayacahuite originated from the state of Mexico, Mexico and P. flexilis from Alberta, Canada). This allele was also sequenced for P. strobformis from both New Mexico and Texas, although it was not included in this disseration. In Chapter 4 the only interspecific allele sharing that was uncovered was between 137

populations of P. ayacahuite and P. strobus at the AGP6 locus. To determine if this common allele was present in P. flexilis and P. strobformis would require intense population sampling across the range of those species. This research would provide key insights into how alleles are partitioned across the landscape. Determining the relationship between P. flexilis, P. strob(formis, and P. ayacahuite, their potential for interspecific hybridization, the movement of alleles along this north-south chain, and the geographic boundaries of their ranges could all be studied using intense population sampling across multiple low-copy nuclear genes, following the methods presented within this dissertation.

REFERENCES Beerli, P., and J. Felsenstein. 1999. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763-773.

Bouillé, M. and J. Bousquet. 2005. Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): Implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany 92:63-73.

Broughton, R. E., and R. G. Harrison. 2003. Nuclear gene genealogies reveal historical, demographic and selective factors associated with speciation in field crickets. Genetics 163:1389-140 1.

Brown, G. R., E. E. Kadel Ill, D. L. Bassoni, K. L. Kiehne, B. Temesgen, J. P. van Buijtenen, M. M. Sewell, K. A. Marshall, and D. B. Neale. 2001. Anchored reference loci in loblolly pine (Pinus taeda L.) for integrating pine genomics. Genetics 159:799-809.

Brown, G. R., G. P. Gill, R. J. Kuntz, C. H. Langley, and D. B. Neale. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences 10 1:15255-15260.

Costa, P., D. Pot, C. Dubos, J. M. Frigerio, C. Pionneau, C. Bodenes, E. Bertocchi, M. T. Cervera, D. L. Remington, and C. Plomion. 2000. A genetic map of maritime pine based on AFLP, RAPD and protein markers. Theoretical and Applied Genetics 100:39-48.

Critchfield, W. B., and E. L. Little, Jr. 1966. Geographic distribution of the pines of the world. USDA Forest Service Miscellaneous Publication 991. Washington, DC.

Dean, M. D., and J. W. Ballard. 2004. Linking phylogenetics with population genetics to reconstruct the geographic origin of a species. Molecular Phylogenetics and Evolution 32:998- 1009.

Devey, M. E., S. D. Carson, M. F. Nolan, A. C. Matheson, C. Te Riini, and J. Hohepa. 2004. 138

QTL associations for density and diameter in Pinus radiata and the potential for marker-aided selection. Theoretical Applied Genetics 108:516-524.

Dolman, G., and C. Moritz. 2006. A multilocus perspective on refugial isolation and divergence in rainforest skinks (Carlia). Evolution 60:573-582.

Echt, C. S., and C. D. Nelson. 1997. Linkage mapping and genome length in eastern white pine Pinus strobus. Theoretical and Applied Genetics 94:1031-1037.

Gernandt, D. S., G. Geada Lopez, S. 0. Garcia, and A. Liston. 2005. Phylogeny and classification of Pinus. Taxon 54:29-42.

Hey, J. and C. A. Machado. 2003. The study of structured populationsnew hope for a difficult and divided science. Nature Reviews Genetics 4:535-543.

Krutovsky, K. V., M. Troggio, G. R. Brown, K. D. Jermstad, D. B. Neale. 2004. Comparative Mapping in the Pinaceae. Genetics 168:447-461.

Liston, A., W. A. Robinson, D. Piflero, and E. R. Alvarez-Buylla. 1999. Phylogenetics of Pinus (Pinaceae) based on nuclear ribosomal DNA internal transcribed spacer region sequences. Molecular Phylogenetics and Evolution 11:95-109.

Liston, A., D. S. Gernandt, T. F. Vining, C. S. Campbell, and D. Pifiero. 2003. Molecular phylogeny of Pinaceae and Pinus. Acta Horticulturae 615:107-1 14.

Little, E. L., Jr., and W. B. Critchfield. 1969. Subdivisions of the genus Pinus. USDA Forest Service Miscellaneous Publication 1144. Washington, DC.

Maddison, W. P., and L. L. Knowles. 2006. Inferring phylogeny despite incomplete lineage sorting. Systematic Biology 55:21-30.

Mariette, S., D. Chagne, C. Lezier, P. Pastuszka, A. Raffm, C. Plomion, and A. Kremer. 2001. Genetic diversity within and among Pinus pinaster populations: Comparison between AFLP and microsatellite markers. Heredity 86:469-479.

Neale, D. B. and 0. Savolainen. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9:325-330.

Nelson CD, Nance WL, Doudrick RL (1993) A partial genetic linkage map of slash pine (Pinus elliottii Engeim. var. elliottii) based on random amplified polymorphic DNAs. Theoretical Applied Genetics 87: 145-151.

Nelson CD, Kubisiak TL, Stine M, Nance WL (1994) A genetic linkage map of longleaf pine (P/ntis pal ustris Mill.) based on random amplified polymorphic DNAs. Journal of Heredity 85:433-439.

Plomion, C., D. M. O'Malley, and C. E. Durel. 1995. Genomic analysis in maritime pine Pinus pinaster. Comparison of two RAPD maps using selfed and open-pollinated seeds of the same individual. Theoretical and Applied Genetics 90:1028-1 034. 139

Rannala, B., and Z. Yang. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645-1656.

Rosenberg, N. A. 2003. The shapes of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution 57:1465-1477.

Shaw, G. R. 1914. The genus Pinus. Publications of the Arnold Arboretum No. 5, Riverside Press, Cambridge, MA.

Steinhoff, R.J., D.G. Joyce, and L. Fins. 1983. Isozyme variation in Pinus monticola. Canadian Journal of Forest Resources 13:1122-1132.

Temesgen, B., G. R. Brown, D. E. Harry, C. S. Kinlaw, M. M. Sewell, and D. B. Neale. 2001. Genetic mapping of expressed sequence tag polymorphism (ESTP) markers in loblolly pine (Pinus taeda). Theoretical Applied Genetics 102:664-675.

Tomback, D. F., and Y. B. Linhart. 1990. The evolution of bird-dispersed pines. Evolutionary Ecology 4:185-2 19.

Tomback, D. F. 1982. Dispersal of whitebark pine seeds by Clark's nutcracker A mutualism hypothesis. The Journal of Animal Ecology 51:451-467.

Travis, S. E., K. Ritland, T. G. Whitham, and P. Keim. 1998. A genetic linkage map of Pinus edulis based on amplified fragment length polymorphisms. Theoretical and Applied Genetics 97:871-880.

Walsh, H. E., I. L. Jones, and V. L. Friesen. 2005. A test of founder effect speciation using multiple loci in the auklets (Aethia spp.). Genetics 171:1885-1894. 140

BIBLIOGRAPHY

Alvarez, I. and J. F. Wendel. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29:417-34.

Alvarez, I., R. C. Cronn, and J. F. Wendel. 2005. Phylogeny of the New World diploid cottons (Gossypium L., Malvaceae) based on sequences of three low-copy nuclear genes. Plant Systematics and Evolution 252:199-2 14.

Alvin, K. 1960. Further conifers of the Pinaceae from the Wealden Formation of Belgium. Memoirs of the Royal Institute of Natural Sciences, Belgium 146:1-39.

Avise, J. C. 1994. Molecular markers, natural history and evolution. Chapman and Hall, New York.

Andresen, J. W. 1964. The taxonomic status of Pinus chiapensis. Phytologia 10:417-421.

Andresen, J. W. 1966. A multivariate analysis of the Pinus chiapensismonticolastrobus phylad. Rhodora 68:1-24.

Bailey, D. K. 1970. Phytogeography and taxonomy of Pinus subsection Balfourianae. Annals of the Missouri Botanical Garden 57:210-249.

Baum, D. A., and M. J. Donoghue. 1995. Choosing among alternative "phylogenetic" species concepts. Systematic Botany 20:560-573.

Beerli, P., and J. Felsenstein. 1999. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763-773.

Beriy, P. E., A. L. Hipp, K. J. Wurdack, B. Van Ee, and R. Riina. 2005. Molecular phylogenetics of the giant genus Croton and tribe Crotoneae (Euphorbiaceae sensu stricto) using ITS and TRNL-TRNF DNA sequence data. American Journal of Botany 92:1520-1 534.

Bouillé, M. and J. Bousquet. 2005. Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): Implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany 92:63-73.

Broughton, R. E., and R. G. Harrison. 2003. Nuclear gene genealogies reveal historical, demographic and selective factors associated with speciation in field crickets. Genetics 163: 1389-1401.

Brown, G. R., E. E. Kadel Ill, D. L. Bassoni, K. L. Kiehne, B. Temesgen, J. P. van Buijtenen, M. M. Sewell, K. A. Marshall, and D. B. Neale. 2001. Anchored reference loci in loblolly pine (Pinus taeda L.) for integrating pine genomics. Genetics 159:799-809.

Brown, G. R., G. P. Gill, R. J. Kuntz, C. H. Langley, and D. B. Neale. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences 101: 15255-15260. 141

Bull, J. J., J. P. Huelsenbeck, C. W. Cunningham, D. L. Swofford, and P. J. Waddell. 1993. Partitioning and combining data in phylogenetic analysis. Systematic Biology 42:384-397.

Businsk', R. 1999. Taxonomic revision of Eurasian pines (genus Pinus L.)- Survey of species and infraspecific taxa according to latest knowledge. Acta Pruhoniciana 68:7-86.

Businsk)2, R. 2004. A revision of the Asian Pinus subsection Strobus (Pinaceae). Willdenowia 34: 209-257.

Campbell, C. S., W. A. Wright, M. Cox, 1. F. Vining, C. S. Major, and M. P. Arsenault. 2005. Nuclear ribosomal DNA internal transcribed spacer 1 (ITS!) in Picea (Pinaceae): sequence divergence and structure. Molecular Phylogenetics and Evolution 35:165-1 85.

Chase, M. W., ET AL. 1993. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80:528-548+550-580.

Chaw, S. M., A. Zharkikh, H. M. Sung, T. C. Lau, and W. H. Li. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: Analysis of nuclear !8S rRNA sequences. Molecular Biology and Evolution 14:56-68.

Chee P.W, M. Lavin, L.E. Talbert. 1995. Molecular analysis of evolutionary patterns in U genome wild wheats. Genome 38:290-297.

Chen, J., C. G. Tauer, and Y. Huang. 2002. Nucleotide sequences of the internal transcribed spacers and 5.8S region of nuclear ribosomal DNA in Pinus taeda L. and Pinus echinata Mill. DNA Sequence 13:129-13 1.

Chiang, T.-Y., K.-H. Hung, T.-W. Hsu, and W.-L. Wu. 2004. Lineage sorting and phylogeography in Lithocarpusformosanus and L. dodonaefolizLs (Fagaceae) from Taiwan. Annals of the Missouri Botanical Garden 91:207-222.

Church, S. A., and D. R. Taylor. 2005. Speciation and hybridization among Houstonia (Rubiaceae) species: The influence of polyploidy on reticulate evolution. American Journal of Botany 92:1372-1380.

Clegg, M. T., B. S. Gaut, G. H. Learn Jr., and B. R. Morton. 1994. Rates and patterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences 91:6795-6801.

Comes, H. P., and R. J. Abbott. 2001. Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae). Evolution 55:1943-1962.

Costa, P., D. Pot, C. Dubos, J. M. Frigerio, C. Pionneau, C. Bodenes, E. Bertocchi, M. T. Cervera, D. L. Remington, and C. Plomion. 2000. A genetic map of maritime pine basedon AFLP, RAPD and protein markers. Theoretical and Applied Genetics 100:39-48.

Crisp, M. D., and G. T. Chandler. 1996. Paraphyletic species. Telopea 6:813-844.

Critchfield, W. B. 1975. Interspecific hybridization in Pinus: A summary review. Pages 99-1 05 in: Proceedings 14th Canadian Tree Improvement Association, Part 2, Symposium on 142

interspecific hybridization in forest trees, NB, 28-30 August 1973 (Fowler, D. P., and C. W. Yeatman, eds.). Canadian Forestry Service, Ottawa, ON.

Critchfield, W. B. 1986. Hybridization and classification of the white pines (Pinus section Stro bus). Taxon 35:647-656.

Critchfield, W. B., and E. L. Little, Jr. 1966. Geographic distribution of the pines of the world. USDA Forest Service Miscellaneous Publication 991. Washington, DC.

Cronn, R. C., R. L. Small, and J. F. Wendel. 1999. Duplicated genes evolve independently after polyploid formation in cotton. Proceedings of the National Academy of Sciences, USA 96:14406-14411.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. American Journal of Botany 89:707-725.

Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2003. Cryptic repeated genomic recombination during speciation in Gossypium gossypioides. Evolution 57:2475-2489.

Cross, E. W., C. J. Quinn, and S. J. Wagstaff. 2002. Molecular evidence for the polyphyly of Olearia (Astereae: Asteraceae). Plant Systematics and Evolution 235:99-120.

Cunningham, C. W. 1997. Can three incongruence tests predict when data should be combined? Molecular Biology and Evolution 14:733-740. de Queiroz, K., and M. J. Donoghue. 1988. Phylogenetic systematics and the species problem. Cladistics 4:317-338.

Dean, M. D., and J. W. Ballard. 2004. Linking phylogenetics with population genetics to reconstruct the geographic origin of a species. Molecular Phylogenetics and Evolution 32:998- 1009. del Castillo, R. F., and S. Acosta. 2002. Ethnobotanical notes on Pinus strobus var. chiapensis. Anales del Instituto de Biologla, Universidad Nacional AutOnoma de Mexico, Serie Botánica 73 :3 19-327.

Delgado, P., D Piflero, A. Chaos, N. Perez-Nasser, and E. R. Alavarez-Buylla. 1999. High population differentiation and genetic variation in the endangered Mexican pine Pinus rzedowskii (Pinaceae). American Journal of Botany 86:669-676.

Devey, M. E., S. D. Carson, M. F. Nolan, A. C. Matheson, C. Te Riini, and J. Hohepa. 2004. QTL associations for density and diameter in Pinus radiata and the potential for marker-aided selection. Theoretical Applied Genetics 108:516-524.

Don, R. H., P. T. Cox, B. J. Wainwright, K. Baker, and J. S. Mattick. 1991. Touchdowns PCR to circumvent spurious priming during gene amplification. Nucleic Acids Research 19:4008.

Donahue, J. K., W. S. Dvorak, and E. A. Gutierrez. 1991. The distribution, ecology, and gene 143

conservation of Pinus ayacahuite and P. chiapensis in Mexico and Central America. CAMCORE, Raleigh, North Carolina.

Doyle, J. J. 1992. Gene trees and species trees: Molecular systematics as one-character taxonomy. Systematic Botany 17:1 44i 63.

Doyle, J. J. 1997. Trees within trees: Genes and species, molecules and morphology. Systematic Biology 46:537-553.

Dressler, R. L. 1954. Some floristic relationships between Mexico and the United States. Rhodora 56:81-96. Dvorak, W. S., J. K. Donahue, and J. A. Vasquez. 1996. Provenance and progeny results for the tropical white pine, Pinus chiapensis, at five and eight years of age. New Forests 12:125-140.

Dvorak, W. S., A. P. Jordan, G. P. Hodge, and J. L.Romero. 2000. Assessing evolutionary relationships of pines in the Oocarpae and Australes subsections using RAPD markers. New Forests 20:163-192.

Echt, C. S., and C. D. Nelson. 1997. Linkage mapping and genome length in eastern white pine Pinus strobus. Theoretical and Applied Genetics 94:1031-1037.

Eckert, A. J., and B. D. Hall. Phylogeny, historical biogeography, and patterns of diversification for Pinus (Pinaceae): Phylogenetic tests of fossil-based hypotheses. Molecular Phylogentics and Evolution, in press.

Eisen, J. A. and C. M. Fraser. 2003. Phylogenomics: intersection of evolution and genomics. Science 300:1706-1707.

FAO. 1981. Report of the fifth session of the FAO panel of experts on forest gene resources. FAO, Rome, Italy.

Farjon, A. 1990. A Bibliography of Conifers. Regnum Vegetabile. Koeltz Scientific Books, Koenigstein, Germany.

Farjon, A. 1998. World Checklist and Bibliography of Conifers. Royal Botanical Gardens at Kew, Richmond, U.K.

Farjon, A. 2005. Drawings and descriptions of the genus Pinus, 2d ed. E. J. Brill and W. Backhuys, Leiden, Netherlands.

Farjon, A., and B. T. Styles. 1997. Pinus (Pinaceae). Flora Neotropica Monograph 75. The New York Botanical Garden, New York.

Farjon, A. and C. N. Page. 1999. Conifersstatus survey and conservation action plan. IUCN/SSC Conifer Specialist Group, Cambridge, U.K.

Farris, J. S., M. Källersjo, A. G. Kluge, and C. Bult. 1994. Testing significance of incongruence. Cladistics 10:315-319. 144

Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

Feng, Y., S.-H. Oh, and P. S. Manos. 2005. Phylogeny and historical biogeography of the genus Platanus as inferred from nuclear and chloroplast DNA. Systematic Botany 30:786-799.

Fernando, D. D., S. M. Long, and R. A. Sniezko. 2005. Sexual reproduction and crossing barriers in white pines: The case between Pinus lambertiana (sugar pine) and P. monticola (western white pine). Tree Genetics and Genomes 1:143-150.

Friesen, N., A. Brandes, and J. S. Heslop-Harrison. 2001. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Molecular Biology and Evolution 18:1176- 1188.

Fulton, T. M., R, Van der Hoeven, N. T. Eannetta, and S. D. Tanksley. 2002. Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. The Plant Cell 14:1457-1467.

Funk, D. J., and K. E. Omland. 2003. Species-level paraphyly and polyphyly: Frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annual Review of Ecology, Evolution, and Systematics 34:397-423.

Garrett, P. W. 1979. Species hybridization in the genus Pinus. USDA Forest Service Research Paper NE-436. U.S.D.A. Forest Service Northeastern Forest Experiment Station, Broomall, Pennsylvania, USA.

Gaussen, H. 1960. Pinus. In "Les gymnospermes actuelles et fossiles," Tome II, Section I, Vol. I, Chap. XI, Fascicle VI, pp. 1-272, Travaux du Laboratorie Forestier de Toulouse, Toulouse, France.

Gaut, B. S., and J. F. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proceedings of the National Academy of Sciences, USA 94:6809-6814.

Geada LOpez, G., K. Kamiya, and K. Harada. 2002. Phylogenetic relationships of diploxylon pines (subgenus Pinus) based on plastid sequence data. International Journal of Plant Science 163:737-747.

Germano, J. and A. S. Klein. 1999. Using species-specific nuclear and chloroplast single nucleotide polymorphisms to distinguish between Picea glauca, P. mariana, and P. rubens from single needle samples. Theoretical and Applied Genetics 99:37-49.

Gernandt, D. S., A. Liston, and D. Pinero. 2001. Variation in the nrDNA ITS of Pinus subsection Cembroides: implications for molecular systematic studies of pine species complexes. Molecular Phylogenetics and Evolution 21:449-467.

Gernandt, D. S., A. Liston, and D. Piñero. 2003. Phylogenetics of Pinus subsections Cembroides and Nelsoniae inferred from cpDNA sequences. Systematic Botany 4:657-673.

Gernandt, D. S., G. Geada Lopez, S. 0. GarcIa, and A. Liston. 2005. Phylogeny and classification 145

of Pinus. Taxon 54:29-42.

Goetsch, L., A. J. Eckert, and B. D. Hall. 2005. The molecular systematics of Rhododendron (Ericaceae): A phylogeny based upon RPB2 gene sequences. Systematic Botany 30:616-626.

Goodwillie, C., and J. W. Stiller. 2001. Evidence for polyphyly in a species of Linanthus (Polemoniaceae): Convergent evolution in self-fertilizing taxa. Systematic Botany 26:273-282.

Govindaraju, D., P. Lewis, and C. Cuilis. 1992. Phylogenetic analysis of pines using ribosomal DNA restriction fragment polymorphisms. Plant Syslematics and Evolution 179:141-1 53.

Grotkopp, E., M. Rejmanek, M. J. Sanderson, and T. L. Rost. 2004. Evolution of genome size in pines (Pinus) and its life-history correlates: Supertree analyses. Evolution 58:1705-1729.

Gugerli, F., J. Senn, M. Anzidei, A. Madaghiele, U. Buchler, C. Sperisen, and G. G. Vendramin. 2001. Chloroplast microsatellites and mitochondrial nad 1 intron 2 sequences indicate congruent phylogenetic relationships among Swiss stone pine (Pinus cembra), Siberian stone pine (Pinus sibirica), and Siberian dwarf pine (Pinuspumila). Molecular Ecology 10:1489-1497.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95-98.

Hare, M. P., F. Cipriano, and S. R. Palumbi. 2002. Genetic evidence on the demography of speciation in allopatric dolphin species. Evolution 56:804-816.

Harris, E. E. and T. R. Disotell. 1998. Nuclear gene trees and the phylogenetic relationships of the mangabeys (Primates: Papionini). Molecular Biology and Evolution 15:892-900.

Hart, J. A. 1987. A cladistic analysis of conifers: Preliminary results. Journal of the Arnold Arboretum 68:269-307.

Hey, J. and C. A. Machado. 2003. The study of structured populationsnew hope for a difficult and divided science. Nature Reviews Genetics 4:535-543.

Hipp, A. L., J. C. Hall, and K. Sytsma. 2004. Congruence versus phylogenetic accuracy: revisiting the incongruence length difference test. Systematic Biology 53:81-89.

Hudson, R. R., and J. A. Coyne. 2002. Mathematical consequences of the genealogical species concept. Evolution 56:1557-1565.

Huelsenbeck, J. P., J. J. Bull, and C. W. Cunningham. 1996. Combining data in phylogenetic analysis. Trends in Ecology and Evolution 11:152-158.

Iglesias, R. G. and M. J. Babiano. 1999. Late-embryogenesis-abundant (LEA) protein from Pseudotsuga menziesii seeds. Plant Physiology 119:806-806. loerger, T. R., A. G. Clark, and T. H. Kao. 1990. Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proceedings of the National Academy of Sciences, USA 87:9732-9735. 146

Jansson, S. and P. Gustafsson. 1990. Type I and Type II genes for the chlorophyll a/b-binding protein in the gymnosperm Pinus sylvestris (Scots pine): cDNA cloning and sequence analysis. Plant Molecular Biology 14:287-296.

Johnson, L. A. and D. E. Soltis 1998. Assessing congruence: empirical examples from molecular data. In D. E. Soltis, P. 5. Soltis, and J. J. Doyle [eds.], Molecular Systematics of Plants II: DNA Sequencing, 297-348. Kluwer Academic Publishers, Norwell, Massachusetts, USA.

Kamiya, K., K. Harada, H. Tachida, and P. S. Ashton. 2005. Phylogeny of PgiC gene in Shorea and its closely related genera (Dipterocarpaceae), the dominant trees in southeast Asian tropical rain forests. American Journal of Botany 92:775-788.

Karalamangala, R. R. and D. L. Nickrent. 1989. An electrophoretic study of representatives of subgenus Diploxylon of Pinus. Canadian Journal of Botany 67:1750-1759.

Kinlaw, C. S. and D. B. Neale. 1997. Complex gene families in pine genomes. Trends in Plant Science 2:356-359.

Kral, R. 1993. Pinus. Flora of North America Editorial Committee (ed.): Flora of North America (North of Mexico), vol. 2. Oxford University Press, New York.

Krupkin, A. B., A. Liston, and S. H. Strauss. 1996. Phylogenetic analysis of the hard pines(Pinus subgenusPinus,Pinaceae) from chloroplast DNA restriction site analysis. American Journal of Botany 83:489-498.

Krutovsky, K. V., M. Troggio, G. R. Brown, K. D. Jermstad, and D. B. Neale. 2004. Comparative mapping in the Pinaceae. Genetics 168:447-461.

Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

Kusumi, J., Y. Tsumura, H. Yoshimaru, and H. Tachida. 2002. Molecular evolution of nuclear genes in Cupressaceae, a group of conifer trees. Molecular Biology and Evolution 19:736-747.

Landry, P. 1989. A revised synopsis of the white pines(Pinu.s,section Quinquefoliis). Phytologia 65:467-474.

Lanner, R. M. 1 974a. Natural hybridization between Pinus edulis and Pinus monophylla in the American Southwest. Silvae Genetica 23:108-116.

Lanner, R. M. 1974b. A new pine from Baja California and the hybrid origin of Pinus quadrjfolia. Southwest Naturalist 19: 75-95.

Lanner, R. M. 1998. Seed dispersal in Pinus. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 28 1-295. Cambridge University Press, New York.

Lanner, R. M., and A. M. Phillips III. 1992. Natural hybridization and introgression of pinyon pines in northwestern Arizona. International Journal of Plant Sciences 153:250-257. 147

Le Maitre, D. C. 1998. Pines in cultivation: A global view. Pages 407-431 in D. M. Richardson [ed.], Ecology and biogeography of Pinus. Cambridge University Press, New York, New York, USA.

Ledig, F. T. 1998. Genetic variation in Pinus. Pages 251-280 in: Ecology and Biogeography of Pinus (Richardson, D. M., ed.). Cambridge University Press, Cambridge, U.K.

Ledig, F. T., M. T. Conkle, B. Bermejo-Velazaquez, 1. Eguiluz-Piedra, P. D. Hodgskiss, D. R. Johnson, W. S. Dvorak. 1999. Evidence for an extreme bottleneck in a rare Mexican pinyon: Genetic diversity, disequilibrium, and the mating system in Pinus maximartinezii. Evolution 53 :91-99.

Lee, C., S.-C. Kim, K. Lundy, and A. Santos-Guerra. 2005. Chloroplast DNA phylogeny of the woody Sonchus alliance (Asteraceae: Sonchinae) in the Macaronesian Islands. American Journal of Botany 92:2072-2085.

Levin, R. A., K. Watson, and L. Bohs. 2005. A four-gene study of evolutionary relationships in Solanum section Acanthophora. American Journal of Botany 92:603-612.

Levin, R. A., and J. S. Miller. 2005. Relationships within tribe Lycieae (Solanaceae): Paraphyly of Lycium and multiple origins of gender dimorphism. American Journal of Botany 92:2044- 2053.

Liston, A., W. A. Robinson, D. Piñero, and E. R. Alvarez-Buylla. 1999. Phylogenetics of Pinus (Pinaceae) based on nuclear ribosomal DNA internal transcribed spacer region sequences. Molecular Phylogenetics and Evolution 11:95-109.

Liston, A., D. S. Gernandt, T. F. Vining, C. S. Campbell, and D. Piñero. 2003. Molecular phylogeny of Pinaceae and Pinus. Acta Horticulturae 615:107-1 14.

Little, E. L., Jr. 1968. Two new pinyon varieties from Arizona. Phytologia 17:329-342.

Little, E. L. Jr. and F. I. Righter. 1965. Botanical descriptions of forty artificial pine hybrids. USDA Forest Service Technical Bulletin 1345. U.S.D.A. Forest Service, Washington, D.C., USA.

Little, E. L., Jr., and W. B. Critchfield. 1969. Subdivisions of the genus Pinus. USDA Forest Service Miscellaneous Publication 1144. Washington, DC.

Lopez, G. G., K. Kamiya, and K. Harada. 2002. Phylogenetic relationships of diploxylon pines (subgenus Pinus) based on plastid sequence data. International Journal of Plant Science 163 :737-747.

Luo, Y., F. Zhang, and Q.-E. Yang. 2005. Phylogeny of Aconitum subgenus Aconitum (Ranunculaceae) inferred from ITS sequences. Plant Systematics and Evolution 252:11-25.

Ma, X.-F., A. E. Szmidt, and X.-R. Wang. 2006. Genetic structure and evolutionary history ofa diploid hybrid pine Pinus densata inferred from the nucleotide variation atseven gene loci. Molecular Biology and Evolution 23:807-8 16. 148

MacDonald, G. M., L. C. Cwynar, and C. Whitlock. 1998. The late Quaternary dynamics of pines in northern North America. Pages 122-149 in: Ecology and Biogeography of Pinus (Richardson, D. M., ed.). Cambridge University Press, Cambridge, U.K.

Maddison, W. P., and L. L. Knowles. 2006. Inferring phylogeny despite incomplete lineage sorting. Systematic Biology 55:21-30.

Magallon, S., and M. J. Sanderson. 2002. Relationships among seed plants inferred from highly conserved genes: Sorting conflicting phylogenetic signals among ancient lineages. American Journal of Botany 89:1991-2006.

Malcomber, S. 1. 2002. Phylogeny of Gaertnera Lam. (Rubiaceae) based on multiple DNA markers: Evidence of rapid radiation in a widespread, morphologically diverse genus. Evolution 56:42-57.

Malusa, J. 1992. Phylogeny and biogeography of the pinyon pines (Pinus subsect. Cembroides). Systematic Botany 17:42-66.

Mariette, S., D. Chagne, C. Lezier, P. Pastuszka, A. Raffin, C. Plomion, and A. Kremer. 2001. Genetic diversity within and among Pinus pinaster populations: Comparison between AFLP and microsatellite markers. Heredity 86:469-479.

Martin, P. 5., and B. E. Harrell. 1957. The pleistocene history of temperate biotas in Mexico and eastern United States. Ecology 38:468-480.

Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: Recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262.

Martinez, M. 1940. Pinaceas Mexicanas. Anales del Instituto de Biologla de Mexico 11:57-84.

Mason-Gamer, R. J. 2005. The {beta}-amylase genes of grasses and a phylogenetic analysis of the Triticeae (Poaceae). American Journal of Botany 92:1045-1058.

McGuire, G., F. Wright, and M. J. Prentice. 1997. A graphical method for detecting recombination in phylogenetic data sets. Molecular Biology and Evolution 14:1125-1131.

McKown, A. D., J.-M. Moncalvo, and N. G. Dengler. 2005. Phylogeny of Flaveria (Asteraceae) and inference of C4 photosynthesis evolution. American Journal of Botany 92:1911-1928.

Meijer, J. 2000. Fossil woods from the late Cretaceous Aachen Formation. Review of Palaeobotany and Palynology 112: 297-336.

Millar, C. I., S. H. Strauss, M. T. Conkle, and R. D. Westfall. 1988. Allozyme differentiation and biosystematics of the Californian closed-cone pines (Pinus subsect. Oocarpae). Systematic Botany 13:351-370.

Miller, C. 1973. Silicified cones and vegetative remains of Pinus from the Eocene of British Columbia. Contributions from the University of Michigan Museum of Paleontology 24:101- 118. 149

Mime, I., F. Wright, G. Rowe, D. F. Marshall, D. Husmeier, and G. McGuire. 2004. TOPALi: Software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics 20:1806-1807.

Mirov, N. T. 1967. The Genus Pinus. The Ronald Press Company, New York, New York, USA.

Miyamoto, M. M. 1996. A congruence study of molecular and morphological data for Eutherian mammals. Molecular Phylogenetics and Evolution 6:373-390.

Moore, W. 5. 1995. Inferring phylogenies from mtDNA variation: Mitochondrial-gene trees versus nuclear-gene trees. Evolution 49:718-726.

MUller, K., and T. Borsch. 2005. Phylogenetics of Utricularia (Lentibulariaceae) and molecular evolution of the trnK intron in a lineage with high substitutional rates. Plant Systematics and Evolution 250:39-67.

Nagl, S., H. Tichy, W. E. Mayer, N. Takahata, and J. Klein. 1998. Persistence of neutral polymorphisms in Lake Victoria cichlid fish. Proceedings of the National Academy of Sciences, USA 95:14238-14243.

Neale, D. B. and 0. Savolainen. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9:325-330.

Nei, M. 1987. Molecular Evoltionary Genetics. Columbia University Press, New York.

Nelson CD, Nance WL, Doudrick RL (1993) A partial genetic linkage map of slash pine (Pinus elliottii Engelm. var. elliottii) based on random amplified polymorphic DNAs. Theoretical Applied Genetics 87:145-15 1

Nelson CD, Kubisiak TL, Stine M, Nance WL (1994) A genetic linkage map of longleaf pine (Pinus palustris Mill.) based on random amplified polymorphic DNAs. Journal of Heredity 85:433-439

Newton, A. C., T. R. Allnutt, W. S. Dvorak, R. F. del Castillo, and R. A. Ennos. 2002. Patterns of genetic variation in Pinus chiapensis, a threatened Mexican pine, detected by RAPD and mitochondrial DNA RFLP markers. Heredity 89:191-198.

Nkongolo, K. K., P. Michael, and W. S. Gratton. 2002. Identification and characterization of RAPD markers inferring genetic relationships among pine species. Genome 45:51-58.

Oh, S.-H., and D. Potter. 2005. Molecular phylogenetic systematics and biogeography of tribe Neillieae (Rosaceae) using DNA sequences of cpDNA, rDNA, and LEAFY. American Journal of Botany 92:179-192.

Oline, D. K., J. B. Mitton, and M. C. Grant. 2000. Population and subspecific genetic differentiation in the foxtail pine (Pinus balfouriana). Evolution 54:1813-1819.

Pamilo, P., and M. Nei. 1988. Relationships between gene trees and species trees. Molecular 150

Biology and Evolution 5:568-583.

Parker, K. C., J. L. Hamrick, A. J. Parker, and E. A. Stacy. 1997. Allozyme diversity in Pinus virginiana (Pinaceae): Intraspecific and interspecific comparisons. American Journal of Botany 84: 1372-1382.

Pease, C. M. and D. J. Mattson. 1999. Demography of the yellowstone grizzly bears. Ecology 80:957-975.

Perez de la Rosa, J., S. A. Harris, and A. Farjon. 1995. Noncoding chloroplast DNA variation in Mexican pines. Theoretical and Applied Genetics 91:1101-1106.

Perry, J. P. 1991. The Pines of Mexico and Central America. Timber Press, Portland, OR.

Peters, J. L., K. G. McCracken, Y. N. Zhuravlev, Y. Lu, R. E. Wilson, K. P. Johnson, and K. E. Omland. 2005. Phylogenetics of wigeons and allies (Anatidae: Anas): The importance of sampling multiple loci and multiple individuals. Molecular Phylogenetics and Evolution 35:209-224.

Petit, R. J., I. Aguinagalde, J. L. de Beaulieu, C. Bittkau, S. Brewer, R. Cheddadi, R. Ennos, S. Fineschi, D. Grivet, M. Lascoux, A. Mohanty, G. MUller-Starck, B. Demesure-Musch, A. Palmé, J. P. MartIn, S. Rendell, and G. G. Vendramin. 2003. Glacial refugia: Hotspots but not melting pots of genetic diversity. Science 300:1563-1565.

Plomion, C., D. M. O'Malley, and C. E. Durel. 1995. Genomic analysis in maritime pine Pinus pinaster. Comparison of two RAPD maps using selfed and open-pollinated seeds of the same individual. Theoretical and Applied Genetics 90:1028-1034.

Pollock, D. D., D. J. Zwickl, J. S. McGuire, and D. M. Hillis. 2002. Increased taxon sampling is advantageous for phylogenetic inference. Systematic Biology 51:664-671.

Popp, M., and B. Oxelman. 2004. Evolution of a RNA polymerase gene family inSilene (Caryophyllaceae)incomplete concerted evolution and topological congruence among paralogues. Systematic Biology 53:914-932.

Posada, D. 2002. Evaluation of methods for detecting recombination from DNA sequences: Empirical data. Molecular Biology and Evolution 19:708-7 17.

Posada, D. and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.

Posada, D., and K. A. Crandall. 2001. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proceedings of the National Academy of Sciences USA 98:13757-13762.

Posada, D., and K. A. Crandall. 2002. The effect of recombination on the accuracy of phylogeny estimation. Journal of Molecular Evolution 54:396-402.

Price, R. A., A. Liston, and S. H. Strauss. 1998. Phylogeny and systematics of Pinus. In D. M. 151

Richardson [ed.], Ecology and biogeography of Pinus, 49-68. Cambridge University Press, New York, New York, USA.

Rannala, B., and Z. Yang. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645-1656.

Rice, W. R. 1989. Analyzing tables of statistical tests. Evolution 43 :223-225.

Richardson, D. M., and P. W. Rundel. 1998. Ecology and biogeography of Pinus: An introduction. Pages 3-46 in D. M. Richardson [ed.], Ecology and biogeography of Pinus. Cambridge University Press, New York, New York, USA.

Richmond, T. A. and Somerville, C. R. 2000. The cellulose synthase superfamily. Plant Physiology 124:495-498.

Rieseberg, L. H., and L. Broulliet. 1994. Are many plant species paraphyletic? Taxon 43 :21-32.

Roalson, E. H., and E. A. Friar. 2004. Phylogenetic relationships and biogeographic patterns in North American members of Carex section Acrocystis (Cyperaceae) using nrDNA ITS and ETS sequence data. Plant Systematics and Evolution 243:175-187.

Robba, L., M. A. Carine, S. J. Russell, and F. M. Raimondo. 2005. The monophyly and evolution of Cynara L. (Asteraceae) sensu lato: Evidence from the internal transcribed spacer region of nrDNA. Plant Systematics and Evolution 253:53-64.

Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.

Rosenberg, N. A. 2003. The shapes of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution 57:1465-1477.

Rosenberg, M. S. and S. Kumar. 2001. Incomplete taxon sampling is not a problem for phylogenetic inference. Proceedings of the National Academy of Sciences 98:10751-10756.

Rozas, J., J. Sánchez-DelBarrio, X. Messeguer, and R. Rozas. 2004. DnaSP: DNAsequence polymorphism v4.00.6. Bioinformatics 19:2496-2497.

Rzedowski, J., and L. Vela. 1966. Pinus strobus var. chiapensis en la Sierra Madre del Sur de Mexico. Ciencia 24:211-2 16.

Sanderson, M. J. and H. B. Shaffer. 2002. Troubleshooting molecular phylogenetic analyses. Annual Review of Ecology and Systematics 33:49-72.

SAS Institute, Inc. 1999. SAS/STAT user's guide, ver. 8, vol. 1. SAS Institute, Inc. Cary, NC. 1,243 pg.

Schierup, M. H., and .Hem. 2000. Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879-891. 152

Schneeweiss, G., P. Schonswetter, S. Kelso, and H. Niklfeld. 2004. Complex biogeographic patterns in Androsace (Primulaceae) and related genera: Evidence from phylogenetic analyses of nuclear internal transcribed spacer and plastid trnL-F sequences. Systematic Biology 53:856-876.

Sharp, A. J. 1953. Notes on the flora of Mexico: World distribution of the woody dicotyledonous families and the origin of the modern vegetation. Journal of Ecology 41:374-380.

Shaw, G. R. 1914. The genus Pinus. Publications of the Arnold Arboretum No. 5, Riverside Press, Cambridge, MA.

Shaw, K. L. 1998. Species and the diversity of natural groups. Pages 44-56 in Endless forms: Species and speciation (Howard, D. J., and S. J. Berlocher, eds.). Oxford University Press, Oxford, U.K.

Shaw, J. 2001. Biogeographic patterns and cryptic speciation in bryophytes. Journal of Biogeography 28:253-261.

Shaw, J., E. B. Lickey, J. T. Beck, S. B. Farmer, W. Liu, J. Miller, K. C. Siripun, C. T. Winder, E. E. Schilling, and R. L. Small. 2005. The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal of Botany 92:142- 166.

Shaw, J., and R. L. Small. 2005. Chloroplast DNA phylogeny and phylogeography of the North American plums (Prunus subgenus Prunus section Prunocerasus, Rosaceae). American Journal of Botany 92:2011-2030.

Showalter, A. M. 2001. Arabinogalactan-proteins: structure, expression, and function. Cellular and Molecular Life Sciences 58:1399-1417.

Simmons, M. P., and H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49:369-38 1.

Sites, J. W., and J. C. Marshall. 2003. Delimiting species: A Renaissance issue in systematic biology. Trends in Ecology and Evolution 18:462-470.

Small, R. L., R. C. Cronn, and J. F. Wendel. 2004. Use of nuclear genes for phylogeny reconstruction in plants. Australian Systematic Botany 17:145-170.

Smith, J. M. 1992. Analyzing the mosaic structure of genes. Journal of Molecular Evolution 34:126-129.

Song, B. H., X. Q. Wang, X. R. Wang, K. Y. Ding, and D. Y. Hong. 2003. Cytoplasmic composition in Pinus densata and population establishment of the diploid hybrid pine. Molecular Ecology 12:2995-300 1.

Sorenson, F.C. 1999. Relationship between self-fertility, allocation of growth, and inbreeding depression in three coniferous species. Evolution 53: 417-425. 153

Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, 0. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Molecular Biology and Evolution 18:132-143.

Standley, P. C. and J. A. Steyermark. 1958. Flora of Guatemala, Part I. Chicago Natural History Museum, Fieldiana, Botany, 24. 476 p.

Steinhoff, R.J., D.G. Joyce, and L. Fins. 1983. Isozyme variation in Pinus monticola. Canadian Journal of Forest Resources 13:1122-1132.

Strauss, S. H. and A. H. Doerksen. 1990. Restriction fragment analysis of pine phylogeny. Evolution 44:1081-1096.

Suchard, M. A., C.M.R. Kitchen, J. S. Sinsheimer, and R. E. Weiss. 2003. Hierarchical phylogenetic models for analyzing multipartite sequence data. Systematic Biology 52:649-664.

Swofford, D. L. 2003. PAUP*4.OblO: Phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, Massachusetts, USA.

Syring, J., A. Willyard, R. Cronn, and A. Liston. 2005. Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92:2086-2 100.

Takezaki, N., F. Figueroa, Z. Zaleska-Rutczynska,N. Takahata, and J. Klein. 2004. The phylogenetic relationship of tetrapod, coelacanth, and Iungfish revealed by the sequences of forty-four nuclear genes. Molecular Biology and Evolution 21:1512-1524.

Temesgen, B., G. R. Brown, D. E. Harry, C. S. Kinlaw, M. M. Sewell, and D. B. Neale. 2001. Genetic mapping of expressed sequence tag polymorphism (ESTP) markers in loblolly pine (Pinustaeda).Theoretical Applied Genetics 102:664-675.

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage sitemaps with particular reference to the evolution of humans and the apes. Evolution 37:221-244.

Tomback, D. F. 1982. Dispersal of whitebark pine seeds by Clark's nutcracker: A mutualism hypothesis. The Journal of Animal Ecology 51:451-467.

Tomback, D. F., and Y. B. Linhart. 1990. The evolution of bird-dispersed pines. Evolutionary Ecology 4:185-219.

Travis, S. E., K. Ritland, T. G. Whitham, and P. Keim. 1998. A genetic linkagemap of pinyon pine Pinusedulisbased on amplified fragment length polymorphisms. Theoretical and Applied Genetics 97:871-880.

Treutlein, J., G. F. Smith, B.-E. van Wyk, and M. Wink. 2003. Evidence for the polyphyly of Haworthia (Asphodelaceae Subfamily Alooideae; Asparagales) inferred from nucleotide sequences of rbcL, matK, ITS 1 and genomic fmgerprinting with ISSR-PCR. Plant Biology 5 :5 13-521. 154

U.S. Geological Survey. 1999. Digital representation of Atlas of United States Trees by Elbert L. Little, Jr.

Walsh, H. E., I. L. Jones, and V. L. Friesen. 2005. A test of founder effect speciation using multiple loci in the auklets (Aethia spp.). Genetics 171:1885-1894.

Wang, X.-R., and A.E. Szmidt. 1994. Hybridization and chloroplast DNA variation in a Pinus species complex from Asia. Evolution 48:1020-1031.

Wang, X. R., Y. Tsumura, H. Yoshimaru, K. Nagasaka, and A. E. Szmidt. 1999. Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) based on chloroplast rbcL, matK, rpl2O-rpsl8 spacer, and tm V intron sequences. American Journal of Botany 86:1742-1753.

Wang, X. Q., D. C. Tank, and T. Sang. 2000a. Phylogeny and divergence times in Pinaceae: Evidence from three genomes. Molecular Biology and Evolution 17:773-78 1.

Wang, X. R., A. E. Szmidt, and H. N. Nguyen. 2000b. The phylogenetic position of the endemic flat-needle pine Pinus krempfii (Pinaceae) from Vietnam, based on PCR-RFLP analysis of chloroplast DNA. Plant Systematics and Evolution 220:21-36.

Wang, X. R., A. E. Szmidt, and 0. Savolainen. 2001. Genetic composition and diploid hybrid speciation of a high mountain pine, Pinus densata, native to the Tibetan plateau. Genetics 159:337-346.

Watano, Y., A. Kanai, and N. Tani. 2004. Genetic structure of hybrid zones between Pinus pumila and P. parvflora var. pentaphylla (Pinaceae) revealed by molecular hybrid index analysis. American Journal of Botany 91:65-72.

Wendel, J. F. and J. J. Doyle 1998. Phylogenetic incongruence: window intogenome history and molecular evolution. In D. Soltis, P. Soltis, and J. Doyle [eds.], Molecular Systematics of Plants II, 265-296. Chapman and Hall, New York, New York, USA.

Wilkin, P., P. Schols, M. W. Chase, K. Chavamarit, C. A. Furness, S. Huysmans, F. Rakotonasolo, E. Smets, and C. Thapyai. 2005. A plastid gene phylogeny of theyam genus, Dioscorea: Roots, fruits and Madagascar. Systematic Botany 30:736-749.

Willyard, A., J. Syring, D. Gernandt, A. Liston, and R. Cronn. Unpublished manuscript. Molecular evolutionary rates indicate a recent and rapid diversification for modern pine lineages.

Winkworth, R. C., and M. J. Donoghue. 2005. Viburnum phylogeny basedon combined molecular data: Implications for taxonomy and biogeography. American Journal of Botany 92:653-666.

Wright, J. A., A. M. Mann V, and W. S. Dvorak. 1996. Conservation anduse of the Pinus chiapensis genetic resource in Columbia. Forest Ecology and Management 88:283-288.

Wu, C.-I. 2001. The genic view of the process of speciation. Journal of Evolutionary Biology 14:851-865. 155

Wu, J., K. V. Krutovskii, and S. H. Strauss. 1999. Nuclear DNA diversity, population differentiation, and phylgenetic relationships in the California closed-cone pines based on RAPD and allozyme markers. Genome 42:893-908.

Yamane, K., and T. Kawahara. 2005. Intra- and interspecific phylogenetic relationships among diploid Triticum-Aegilops species (Poaceae) based on base-pair substitutions, indels, and microsatellites in chloroplast noncoding sequences. American Journal of Botany 92:1887- 1898.

Yang, Z. 1998. On the best evolutionary rate for phylogenetic analysis. Systematic Biology 47: 125-133.

Yoder, A. D., J. A. Irwin, and B. A. Payseur. 2001. Failure of the ILD to determine data combinability for slow loris phylogeny. Systematic Biology 50:408-424.

Yuan, Y. M., S. Wohihauser, M. Moller, J. Klackenberg, M. Callmander, and P. Kupfer. 2005. Phylogeny and biogeography of Exacum (Gentianaceae): A disjunctive distribution in the Indian ocean basin resulted from long distance dispersal and extensive radiation. Systematic Biology 54:21-34.

Zhang, X. H. and V. L. Chiang. 1997. Molecular cloning of 4-coumarate:coenzyme A ligase in loblolly pine and the roles of this enzyme in the biosynthesis of lignin in compression wood. Plant Physiology 113:65-74.

Zhang, Y., G. Brown, R. Whetten, C. A. Loopstra, D. B. Neale, M. J. Kieliszewski, and R. R. Sederoff. 2003. An arabinogalactan protein associated with secondary cell wall formation in differentiating xylem of loblolly pine. Plant Molecular Biology 52:91-102.

Zhang, Z. Y. and D. Z. Li. 2004. Molecular phylogeny of section Parrya of Pinus (Pinaceae) based on chloroplast matK gene sequence data. Acta Botanica Sinica 46:171-179.