THE PHYLOGENY OF POGOGYNE (LAMIACEAE): UTILIZING TWO

HIGH THROUGHPUT SEQUENCING METHODS IN THE ANALYSIS

OF A RAPID AND RECENT RADIATION

______

A Thesis

Presented to the

Faculty of

San Diego State University

______

In Partial Fulfillment

of the Requirements for the Degree

Master of Science in Biology

with a Concentration in

Evolutionary Biology

______

by

Amanda Blue Everett

Summer 2017


Copyright © 2017 by Amanda Blue Everett All Rights Reserved


ABSTRACT OF THE THESIS

The Phylogeny of Pogogyne (Lamiaceae): Utilizing Two High Throughput Sequencing Methods in the Analysis of a Rapid and Recent Radiation

by

Amanda Blue Everett

Master of Science in Biology with a Concentration in Evolutionary Biology

San Diego State University, 2017

The genus Pogogyne is a member of the mint family, Lamiaceae, a very diverse clade of angiosperms. Recent phylogenetic research recovers Pogogyne in the sub-tribe Menthinae, tribe Mentheae, of subfamily Nepetoideae. Previous research suggested that Pogogyne underwent a relatively rapid diversification following its adaptation to vernal pool or “temporary wetland” conditions 0.9 – 1.9 mya. In this study, Pogogyne is used as a model for the phylogenomic study of rapid divergence within closely related taxa. To accomplish this, two high throughput sequencing technologies are compared: 1) a method utilizing Illumina technology to sequence the high copy organellar fraction of genomic DNA, termed ‘genome skimming,’ and 2) a method that relies on Illumina sequencing of an enzymatically reduced sample of genomic DNA, known as restriction-site associated DNA sequencing (RADseq). Both types of data were analyzed using maximum likelihood and Bayesian algorithms. The topologies inferred from both types of data were similar to previous findings, but were recovered with much higher support than those obtained in earlier studies. Results support a monophyletic genus Pogogyne, a monophyletic subgenus Hedeomoides, and the suggestion that two novel species exist in the genus.


TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
2 MATERIALS AND METHODS
    Genome Skimming
        Taxon Sampling and DNA Isolation
        Library Preparation, Sequencing, and Pre-processing
        Phylogenetic Analysis
    Double-Digest RADseq
        Taxon Sampling and DNA Isolation
        Library Preparation, Sequencing, and Pre-processing
        Sequence Assembly and Alignment
        Sensitivity Analysis
        Phylogenetic Analyses
3 RESULTS
    Genome Skimming
        Sequencing and Processing
        Assembly and Alignment
        Phylogenetic Analysis
    Double-Digest RADseq
        Sequencing and Processing
        Assembly and Alignment
        Sensitivity Analysis
        Phylogenetic Analysis
4 DISCUSSION
    Comparison of Genome Skimming and ddRADseq
    Phylogenetic Hypotheses based on ddRADseq Data
    Geographical Correlations
    Taxonomic Considerations
    Conclusions and Future Directions
ACKNOWLEDGEMENTS
LITERATURE CITED


LIST OF TABLES

Table 1. Currently Recognized Subgenera and Species of Pogogyne, with Synonyms in Brackets and Environmental Listing shown (from CNPS Rare Plant Program 2016)
Table 2. Taxa Sampled for Genome Skimming Analysis
Table 3. Taxa Sampled for ddRADseq Analysis
Table 4. Final Alignment Lengths, Variable Characters, and Parsimony Informative Characters from Genome Skimming Analysis
Table 5. PEAR Results Summary


LIST OF FIGURES

Figure 1. A) P. abramsii, showing decumbent stems and vestiture. B) P. douglasii, showing inflorescence. C) P. nudiuscula, corolla showing bilabiate, zygomorphic corolla with four stamens and trichomes on the style. D) P. sp. nov., showing punctate, oil secreting glands. E) P. serpylloides, showing a single nutlet from the schizocarp of nutlets. F) P. nudiuscula, showing corolla exserted relative to the calyx. G) P. serpylloides, showing corolla included relative to the calyx
Figure 2. A) Vernal pool schematic. B) Dry vernal pool in San Diego, California. C) Partially flooded vernal pool in San Diego, California. D) A large northern California vernal pool in spring. Photos used with permission
Figure 3. A) Time calibrated phylogeny of Silveira and Simpson (2013) generated in BEAST using chloroplast DNA and nuclear ribosomal DNA rates of molecular substitution for angiosperms. B) Tree from Bayesian analysis of concatenated ITS, ETS, and trnQ-rps16 data
Figure 4. Pogogyne populations sampled for genome skimming analysis
Figure 5. Pogogyne populations sampled for ddRADseq analysis
Figure 6. A) The cistron phylogeny. B) The genus Pogogyne expanded. Maximum likelihood bootstrap support values are listed for each node, followed by the Bayesian posterior probability for each node
Figure 7. A) The plastome phylogeny. B) The genus Pogogyne expanded. Maximum likelihood bootstrap support values are listed for each node, followed by the Bayesian posterior probability for each node. C) The coastal clade from the Bayesian analysis. In this analysis, the relationships within the clade differed at two nodes
Figure 8. Results of the sensitivity analysis using 24 supermatrices analyzed in RAxML
Figure 9. Maximum likelihood topology produced using the c94d10m7p3 dataset
Figure 10. Bayesian topology of the c95d10m15p3 alignment, with duplicate samples removed
Figure 11. The SVD Quartets topology from the c80d4m15p3 (A), and c95d10m15p3 (B) datasets, with duplicate samples removed. Bootstrap values over 70 are considered strong support


CHAPTER 1

INTRODUCTION

Phylogenetic inference of closely related plant species poses a challenge for evolutionary biologists despite progress made in obtaining molecular data through Sanger sequencing (Sanger et al. 1977). An ideal Sanger dataset is derived from aligned, PCR amplified marker sequences that evolve at different rates and originate from many unlinked regions of the organismal genome, including those of the organelles. These datasets have been the foundation of phylogenetic studies for decades, and have prompted the development of novel conceptual and analytical approaches to understanding speciation based on molecular evolution. However, many lineages undergoing rapid divergence remain recalcitrant because of the lack of resolution using these types of data. Common approaches to phylogenetic reconstruction are often based on the theory of coalescence (Kingman 1982). Here, a model is used to generate a species tree based on a consensus derived from the independent genealogy of the respective markers sequenced from each sample. In the case of rapid and recent speciation, the branches leading to the tips of the phylogeny are short, because branch length represents time elapsed as a function of nucleotide substitution. In these scenarios, sufficient time for markers to evolve may not have elapsed; thus, many markers do not confer phylogenetic signal in support of a bifurcating model (Knowles and Chan 2008). As intervals between divergences decrease and effective population size increases, gene trees increasingly tend to conflict with species trees. Topological resolution can be improved through the use of carefully selected, rapidly evolving, unlinked markers and increased sampling of each terminal species tree lineage (Degnan and Rosenberg 2006).
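The relationship between internode length, effective population size, and gene tree discordance can be illustrated with a small calculation. The sketch below is not part of the analyses in this thesis; it uses the standard three-taxon result from coalescent theory, in which the probability that a gene tree differs in topology from the species tree is (2/3)e^(-T), where T is the internode length in coalescent units (generations divided by 2Ne for a diploid population). The function name and the example parameter values are hypothetical and chosen only for illustration.

```python
import math


def discordance_probability(internode_generations: float, effective_pop_size: float) -> float:
    """Probability that a three-taxon gene tree disagrees with the species tree.

    Uses the standard coalescent result P = (2/3) * exp(-T), where
    T = t / (2 * Ne) is the internal branch length in coalescent units
    (diploid population assumed).
    """
    if internode_generations < 0 or effective_pop_size <= 0:
        raise ValueError("internode length must be >= 0 and Ne must be > 0")
    t_coalescent = internode_generations / (2.0 * effective_pop_size)
    return (2.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical scenarios: (internode length in generations, Ne).
scenarios = [(1_000_000, 10_000), (100_000, 10_000), (10_000, 10_000), (10_000, 100_000)]
for generations, ne in scenarios:
    p = discordance_probability(generations, ne)
    print(f"t = {generations:>9,} generations, Ne = {ne:>7,}: P(discordant) = {p:.3f}")
```

Under these assumptions, shortening the internode or increasing Ne both push the discordance probability toward its upper limit of 2/3, at which point the three possible gene tree topologies are equally likely.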
Appropriate markers with sufficient variation to resolve relationships in a system that is the product of a rapid and recent radiation are difficult and laborious to develop in non-model organisms using the PCR/Sanger technique (Curto et al. 2012). In addition, it is highly unlikely that all alleles of an ancestral population will sort completely within daughter lineages, especially when time since divergence is short; this phenomenon is called incomplete lineage sorting.

The majority of plant phylogenetic studies to date do not apply coalescent methodology to speciation questions. The markers used in studies of plant evolution are almost always derived from chloroplast DNA (cpDNA) and the DNA of the nuclear ribosomal cistron (nrDNA). These markers occur in multiple copies in the cells of plants and can be reliably isolated using standard extraction protocols. Additionally, because at least the flanking regions of these DNA markers are highly conserved, universal primers have been developed that reliably amplify genes and intergenic regions. When appropriate markers do happen to be available to amplify nuclear genes for a taxon, PCR amplification of these genes tends to amplify particular loci and alleles disproportionately. Thus, when using plant nuclear genes it is also necessary to clone the PCR products, a step that greatly increases laboratory effort (Tao 2002).

With the introduction of high throughput genomic sequencing, also known as Next Generation Sequencing (NGS), a deluge of technologies has become available, allowing access to an unprecedented scale of genomic sequence data from non-model organisms. Current sequencing methods can now target specific gene regions, entire organellar genomes, transcriptomes, and even substantial fractions of the nuclear genome for any organism from which DNA (or RNA) can be extracted. Concomitantly, analytical methods have been developed to process the voluminous sequence data from these methods. Contemporary phylogenetic research requires prudent decision making with regard to sequencing technology and analytical/bioinformatic pipelines. In evolutionary studies of terrestrial plants, many new directions are under exploration, and significant progress has been made toward the resolution of many difficult phylogenies.
In this study, Pogogyne is used as a model for the phylogenomic study of rapid and ongoing divergence within closely related taxa. To accomplish this, two high throughput sequencing technologies are compared: 1) a method utilizing Illumina technology to sequence the high copy organellar fraction of genomic DNA, termed ‘genome skimming’ (Straub et al. 2012), and 2) a method that relies on Illumina sequencing of an enzyme reduced sample of genomic DNA, known as restriction-site associated DNA sequencing (RADseq) (Baird et al. 2008).


In genome skimming, genomic DNA is sequenced with shallow coverage. While this allows limited access to the nuclear genome, it provides a wealth of information about the chloroplast genome, the mitochondrial genome, and the nuclear ribosomal cistron (Straub et al. 2012). These genomic sequences are abundant in standard extractions derived from leaf tissue and are thus easily sequenced with shallow coverage to yield up to 200 kb of edited sequence data after assembly and alignment. Several factors make genome skimming attractive for use in plant evolutionary studies. Genome skimming can be done using dried herbarium material as a source of genomic DNA. Because input DNA is fragmented during library preparation, the method is highly forgiving of degraded DNA, which is common in extractions derived from this type of sample. The input DNA requirement for this sequencing method is low relative to that of other sequencing methods because mtDNA, cpDNA, and nrDNA are present in high abundance in standard extractions (Särkinen et al. 2012; Dodsworth 2015; Bakker et al. 2016). Plant phylogenetic analyses utilizing genome-skimming datasets usually address deep evolutionary divergences (Barrett et al. 2014; Huang et al. 2014; Jones et al. 2014; Liu et al. 2014; Straub et al. 2014; Cotton et al. 2015; Lu et al. 2015; Barrett et al. 2016; Gardner et al. 2016); however, at least one study addresses a rapid and recent divergence (Welch et al. 2016).

Double-digest Restriction Site-Associated DNA Sequencing (ddRADseq) involves the dual enzymatic digestion of genomic DNA utilizing a pair of restriction enzymes that are selected based on the data needs of the study. The technique samples thousands of unlinked loci that, when taken together, contain enough single nucleotide polymorphisms (SNPs) to generate a robust phylogenetic signal. Increased success has been demonstrated using input DNA extracted from herbarium specimens.
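The double-digest step can be illustrated with a minimal in silico sketch. It is not the laboratory protocol or software used in this study. It assumes, purely for illustration, the recognition sequences of EcoRI (GAATTC) and MseI (TTAA), which are not necessarily the enzymes used here; it simplifies each cut to the first base of the recognition site; and the function name and size window are hypothetical. Only fragments flanked by one cut from each enzyme and falling within the size window are retained, which is the property that reduces the genome to a repeatable subset of loci.

```python
import re


def double_digest(sequence, site_a, site_b, min_len, max_len):
    """Return (start, end) coordinates of fragments flanked by one cut from
    each enzyme whose length lies within [min_len, max_len].

    Simplification: each enzyme is assumed to cut at the first base of its
    recognition site.
    """
    seq = sequence.upper()
    cuts = []
    for label, site in (("A", site_a), ("B", site_b)):
        # A lookahead pattern finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={re.escape(site.upper())})", seq):
            cuts.append((match.start(), label))
    cuts.sort()

    fragments = []
    # Adjacent cut sites delimit the fragments of a complete digestion.
    for (pos1, enzyme1), (pos2, enzyme2) in zip(cuts, cuts[1:]):
        length = pos2 - pos1
        if enzyme1 != enzyme2 and min_len <= length <= max_len:
            fragments.append((pos1, pos2))
    return fragments


# Toy sequence with two EcoRI-like sites and two MseI-like sites.
toy = "GAATTC" + "C" * 10 + "TTAA" + "G" * 10 + "TTAA" + "C" * 30 + "GAATTC"
print(double_digest(toy, "GAATTC", "TTAA", min_len=10, max_len=40))
```

In this toy sequence the fragment between the two TTAA sites is discarded because both of its ends were cut by the same enzyme, and narrowing the size window would discard further fragments.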
A key advantage of ddRADseq is that data are available from the entire genome, including nuclear regions. Thus, the support values that result from phylogenetic analysis can be interpreted within a biological context: rather than reflecting uncertainty caused by a paucity of data, mixed or weak support values can be read as the signature of biological phenomena such as incomplete lineage sorting or horizontal gene transfer. Additionally, read processing pipelines are available for the generation of input alignments for many different methods of evolutionary analysis, making it possible to explore the biological context of phylogenetic discordance analytically. This and similar restriction enzyme based methods are increasingly being used to answer evolutionary questions about plants (Eaton and Ree 2013; Hipp et al. 2013; Escudero et al. 2014; Hipp et al. 2014; Cavender-Bares et al. 2015; Eaton et al. 2015; Mort et al. 2015; Nowak et al. 2015; Massatti et al. 2016; Paun et al. 2016; Wessinger et al. 2016).

The genus Pogogyne is in the mint family, Lamiaceae, a very diverse clade of angiosperms that is split into multiple sub-families, tribes, and sub-tribes (Wagstaff and Olmstead 1997; Moon et al. 2010; Scheen et al. 2010; Bendiksby et al. 2011; Pastore et al. 2011; Salmaki et al. 2012; Roy and Lindqvist 2015). Recent phylogenetic research recovers Pogogyne in the sub-tribe Menthinae, tribe Mentheae, of subfamily Nepetoideae (Drew and Sytsma 2011, 2012). Pogogyne species are small, decumbent to erect annual herbs that are hairy, bristly, or ciliate on most parts (except P. nudiuscula) (Figure 1A, B). The gynobasic style is hairy below the stigma lobes (Figure 1C), inspiring the name ‘Pogogyne’, from the Greek meaning ‘bearded female’ (Forbes 1961). Distinct punctate glands that secrete pungent oil occur on many portions of the plant (Jokerst and Hickman 1993; Silveira et al. 2012) (Figure 1D). The fruit is hairy to glabrous (Figure 1E) and is a typical schizocarp of nutlets.

Pogogyne is usually separated into two infrageneric groups, subgenus Pogogyne [subgen. Eupogogyne J. T. Howell] and subgenus Hedeomoides A. Gray (Table 1). Fertile stamen number and corolla exsertion relative to the calyx distinguish the two subgenera. Subgenus Pogogyne has four fertile stamens (Figure 1C), whereas subgenus Hedeomoides has two fertile stamens and two stamens that are vestigial or absent. In subgenus Pogogyne, the corolla is large and exserted relative to the calyx (Figure 1F); in Hedeomoides, it is small and mostly included within the calyx (Figure 1G) (Jokerst and Hickman 1993; Silveira et al. 2012).


Figure 1. A) P. abramsii, showing decumbent stems and vestiture. B) P. douglasii, showing inflorescence. C) P. nudiuscula, corolla showing bilabiate, zygomorphic corolla with four stamens and trichomes on the style. D) P. sp. nov., showing punctate, oil secreting glands. E) P. serpylloides, showing a single nutlet from the schizocarp of nutlets. F) P. nudiuscula, showing corolla exserted relative to the calyx. G) P. serpylloides, showing corolla included relative to the calyx. All photos were used with permission. Photo credits: A) Kier Morse, B) Steven Perry, C) Scott McMillan, D) Lee Simpson, E) Lee Simpson, F) Lee Simpson, G) G. McDonald.


Table 1. Currently Recognized Subgenera and Species of Pogogyne, with Synonyms in Brackets and Environmental Listing shown (from CNPS Rare Plant Program 2016)

Note: Primary Environmental Listing Sources are: CNPS = California Native Plant Society Inventory Listing; CA = California Endangered Species Act (CESA) listing; Fed = Federal Endangered Species Act (FESA) listing (see CNPS Rare Plant Program 2016). Symbols: 1B = Rare, threatened, or endangered in California and elsewhere; 4 = Limited distribution (Watch List); .1 = Seriously endangered in California (over 80% of occurrences threatened / high degree and immediacy of threat); .2 = Fairly endangered in California (20-80% of occurrences threatened); CBR = Considered but rejected; CE = California endangered; FE = Federally endangered; † = Type for genus

Subgenus Pogogyne includes four described species: P. abramsii J. T. Howell, P. clareana J. T. Howell, P. douglasii Benth., and P. nudiuscula A. Gray (Table 1). Pogogyne abramsii is a vernal pool endemic, occurs exclusively within northern San Diego County, and is state and federally endangered (CNPS Rare Plant Program 2016; Department of Fish and Wildlife 2016). Pogogyne clareana inhabits moist creekside habitats and is restricted to a very narrow range within the Santa Lucia Mountains of the Coast Ranges. It is also listed as endangered by state and federal agencies (CNPS Rare Plant Program 2016; Department of Fish and Wildlife 2016). Pogogyne douglasii is widespread and phenotypically variable relative to the other species within subgenus Pogogyne. Its primary historical distribution was in vernal pools of the Central Valley, extending into vernal pools and swales of the Sierra Nevada foothills. Due to habitat fragmentation within the Central Valley, it is now most abundant in the foothills of the Sierra Nevada but is not considered endangered or threatened. Unlike the other members of the genus, P. nudiuscula has no vegetative trichomes. This species is narrowly distributed within southern regions of San Diego County and has also been observed in Tijuana. It is a vernal pool obligate that, due to habitat fragmentation, is now listed as endangered by state and federal agencies (CNPS Rare Plant Program 2016; Department of Fish and Wildlife 2016). It is allopatrically distributed relative to P. abramsii.

Subgenus Hedeomoides has been treated as genus Hedeomoides Briquet, but this classification has not been recognized in recent treatments (Jokerst and Hickman 1993; Silveira 2010; Silveira et al. 2012). Subgenus Hedeomoides (also spelled Hediomoides in Howell 1931 and referred to as section Hediomoides in Jokerst 1992) has four described species: P. floribunda Jokerst, P. serpylloides A. Gray, P. tenuiflora A. Gray, and P. zizyphoroides Bentham (Jokerst and Hickman 1993; Silveira et al. 2012) (Table 1). Pogogyne tenuiflora is known from a single collection (E. Palmer 65, 1875, GH 00001496) from Guadalupe Island, Baja California, Mexico. This species was listed by Watson (1875), who cited the collection notes as “very rare, among sagebrush, on the eastern side.” Pogogyne tenuiflora has not been found since, and the species is presumed extinct. Pogogyne serpylloides is found in vernal pools and moist grasslands in the Coast Ranges, San Francisco Bay Area, and Sierra Nevada foothills. There is a single historical collection (Orcutt 1361, 21 Apr 1886, UC 25599) from San Quintin, Baja California, Mexico; however, this species has not been found in Mexico since and is presumed extirpated there. Pogogyne zizyphoroides is morphologically similar to P. serpylloides, except that the nutlet is twice as large. It is distributed in vernal pools and moist depressions in the Central Valley, northern Coast Ranges, and the Sierra Nevada foothills. The more geographically restricted P. floribunda inhabits vernal pools and seasonal lakes in Lassen County, Modoc County, and the eastern Modoc Plateau in Oregon. Unlike the other Pogogyne species, P. floribunda has white corollas. A detailed taxonomic review of Pogogyne is found in Silveira (2010).

Extant Pogogyne species are either vernal pool obligates (some of which are narrowly endemic to a handful of pools) or more widespread specialists of ephemeral wetlands. Ephemeral wetlands, which include environments such as seasonally wet meadows and intermittent streams, exist in most Mediterranean regions of the world (Zedler 2003). The vernal pools of the California Floristic Province (Figure 2) are geologically recent, and are distinguished from these other globally distributed wetlands by a largely endemic flora (Solomeshch et al. 2007). Vernal pools have an impermeable layer beneath their surface (Figure 2A), which, combined with the winter-wet and summer-dry Mediterranean climate of the California Floristic Province, has resulted in severe selective pressures.

Figure 2. A) Vernal pool schematic. B) Dry vernal pool in San Diego, California. C) Partially flooded vernal pool in San Diego, California. D) A large northern California vernal pool in spring. Photos used with permission. Photo credits: B) Ellen Bauder, C) Scott McMillan, D) U.S. Dept. of Fish and Wildlife.

Winter precipitation creates seasonal pools that, due to the impermeable layer of substrate, do not drain by percolating through the soil but rather dry by evaporation throughout the spring and summer to a completely desiccated condition (Figure 2A, 2B). Consequently, the endemic plants of vernal pools are primarily annuals that grow vegetatively during rainy season inundation, then rapidly progress to flowering and fruiting during various stages of the drying phase (Figure 2C, 2D). These and other adaptations allow vernal pool plants to survive in a habitat that excludes competitors from directly adjacent environments (Zedler 1987).

In the first study to conduct a divergence dating analysis on a vernal pool plant, Silveira and Simpson (2013) suggested that Pogogyne underwent a relatively rapid diversification following its adaptation to vernal pool or “temporary wetland” conditions. They inferred a time calibrated Bayesian phylogeny of Pogogyne from the concatenated sequence data of ribosomal ITS, ETS, and a single intergenic chloroplast region. The stem node of the Pogogyne clade was estimated at 5.1 – 7.7 mya, whereas the age of the crown node was estimated at 0.9 – 1.9 mya (Silveira and Simpson 2013) (Figure 3A). The latter age corresponds to an approximate age for the geologic formation of vernal pool soils of 0.6 – 4 mya (Harden 1987). In their analysis, Pogogyne is monophyletic and is divided into two primary clades, both weakly supported. One clade contains P. abramsii, P. clareana, P. nudiuscula, and some P. douglasii specimens; the other clade contains additional P. douglasii specimens and the species of a strongly supported subgenus Hedeomoides. Notably, within this phylogeny P. douglasii is recovered as paraphyletic with adequate to low support (Figure 3B). Thus, subgenus Pogogyne was recovered as paraphyletic, but generally with weak support. These results further reflect the need to reassess the status of P. douglasii, which has in the past been divided into several subgenera and varieties (Howell 1931), but is now treated as a single phenotypically variable unit (Silveira 2010; Silveira et al. 2012; Silveira and Simpson 2013). Of the 24 nodes present in the final tree of Pogogyne from Silveira and Simpson (2013), exactly half are recovered with weak support (Figure 3B). The results of that study empirically demonstrate the practical challenge of obtaining appropriate markers for phylogenetic analyses at shallow levels.


Figure 3. A) Time calibrated phylogeny of Silveira and Simpson (2013) generated in BEAST using chloroplast DNA and nuclear ribosomal DNA rates of molecular substitution for angiosperms. B) Tree from Bayesian analysis of concatenated ITS, ETS, and trnQ-rps16 data. Bayesian posterior probabilities are indicated above the branches, and maximum likelihood bootstrap support values greater than 65% are indicated below branches.


In the current study, the evolutionary history of Pogogyne is revisited with an emphasis on establishing a current for the genus using molecular data derived from genome skimming and from ddRADseq. The first goal of this study is to refine the phylogenetic relationships of Pogogyne using more data. Results from these analyses will be used to evaluate current taxonomic circumscriptions. The second goal is to compare genome skimming and ddRADseq, two high throughput sequencing methods, for their utility in a system with a rapid and recent radiation. The specific objectives employed to reach these goals are: 1) obtain sequence data via genome skimming, process these data using the appropriate computer programs, and analyze them using maximum likelihood and Bayesian inference; 2) obtain sequence data via ddRADseq, process these data using the appropriate programs, and analyze them using maximum likelihood, Bayesian inference, and quartet analysis; 3) compare the results of these analyses; and 4) draw taxonomic conclusions as to the classification of species and subgenera within Pogogyne.


CHAPTER 2

MATERIALS AND METHODS

GENOME SKIMMING

Taxon Sampling and DNA Isolation

Sampling commenced after a permit was obtained authorizing the collection of material for all endangered Pogogyne species through the duration of this study. For the genome skimming analysis, samples obtained in the field were silica dried and collected with voucher specimens that were deposited at the San Diego State University Herbarium (SDSU), a member of the Consortium of California Herbaria (ucjeps.berkeley.edu/consortium). Additional material was sampled from existing herbarium specimens from the SDSU herbarium to generate a preliminary data set composed of 18 samples, representative of three outgroups and each of the seven extant and described species within Pogogyne, plus a putative novel species (Table 2, Figure 4).

Table 2. Taxa Sampled for Genome Skimming Analysis

Sample ID                   Accession  Taxon             Collector / Coll. #  Date     Location         Source
A. ilicifolia SDSU17138     SDSU17138  A. ilicifolia     M. Simpson 2795      4/20/07  San Diego        Herbarium
A. lanceolata SDSU17310     SDSU17310  A. lanceolata     M. Simpson 2866      6/1/07   Santa Clara      Herbarium
M. villosa SDSU20737        SDSU20737  M. villosa        M. Simpson 3812      5/20/14  San Benito       Field
P. abramsii SDSU20496       SDSU20496  P. abramsii       A. Everett 3         5/23/13  San Diego        Field
P. clareana SDSU19277       SDSU19277  P. clareana       M. Silveira 10       5/24/07  Sacramento       Herbarium
P. douglasii SDSU20800      SDSU20800  P. douglasii      A. Everett 17        5/21/14  San Luis Obispo  Field
P. douglasii SDSU19273      SDSU19273  P. douglasii      M. Silveira 18       6/1/07   Santa Barbara    Herbarium
P. douglasii SDSU19275      SDSU19275  P. douglasii      M. Silveira 19       6/1/07   Santa Barbara    Herbarium
P. douglasii SDSU19285      SDSU19285  P. douglasii      S. McMillan 10VI93B  6/10/93  Lake             Herbarium
P. douglasii SDSU19289      SDSU19289  P. douglasii      S. McMillan 8VI93B   6/8/93   San Luis Obispo  Herbarium
P. douglasii SDSU19290      SDSU19290  P. douglasii      S. McMillan 8VI93B   6/8/93   San Luis Obispo  Herbarium
P. floribunda SDSU19279     SDSU19279  P. floribunda     M. Silveira 20       7/24/07  Modoc            Herbarium
P. nudiuscula SDSU19274     SDSU19274  P. nudiuscula     M. Silveira 7        4/14/07  San Diego        Herbarium
P. nudiuscula SDSU20500     SDSU20500  P. nudiuscula     A. Everett 7         5/23/13  San Diego        Field
P. serpylloides SDSU20796   SDSU20796  P. serpylloides   A. Everett 13        5/20/14  Monterey         Field
P. sp. nov. SDSU19883       SDSU19883  P. sp. nov.       M. Guilliams 1505    6/14/11  Baja California  Herbarium
P. zizyphoroides SDSU18438  SDSU18438  P. zizyphoroides  B. Meinke            6/15/05  Klamath          Herbarium
P. zizyphoroides SDSU19277  SDSU19277  P. zizyphoroides  M. Silveira 10       5/24/07  Sacramento       Herbarium

[Figure 4 map; population labels: P. floribunda SDSU18438; P. floribunda SDSU19279; P. douglasii SDSU19285; P. zizyphoroides SDSU19277; A. lanceolata SDSU17310; P. serpylloides SDSU20796; M. villosa SDSU20737; P. douglasii SDSU20800; P. clareana SDSU19272; P. douglasii SDSU19290; P. douglasii SDSU19273; P. douglasii SDSU19275; P. abramsii SDSU20496; A. ilicifolia SDSU17138; P. nudiuscula SDSU19274; P. nudiuscula SDSU20500; P. “mexicana” SDSU19883]

Figure 4. Pogogyne populations sampled for genome skimming analysis. Some populations are represented by multiple samples in the analysis.

DNA was extracted from all sampled and vouchered leaf material using a version of the cetyltrimethylammonium bromide (CTAB) protocol of Doyle and Doyle (Doyle 1991; Friar 2005). This procedure was modified with an additional phenol extraction (Friar 2005) to remove the abundant secondary compounds that interfere with DNA purification in members of the Lamiaceae, and with an RNAse incubation step to remove any RNA present in samples prior to library preparation. Success of extractions was evaluated using agarose gel electrophoresis of the isolated material. Any samples observed to produce an adequate band, representing high molecular weight DNA, were submitted for library preparation and sequencing.

Library Preparation, Sequencing, and Pre-processing

Global Biologics (Columbia, Missouri, USA) conducted the library preparation for genome skimming according to the standard chemistry and protocols for Illumina high throughput sequencing (details of genome skimming library preparation methods are published in Ripma et al. 2014). PRINSEQ (Schmieder and Edwards 2011) was used for quality control and to filter raw reads prior to assembly and alignment, following the methods of Ripma et al. (2014). All duplicate sequences, reverse complement duplicate sequences, reads with a mean Phred score less than 30, and reads with more than one N were removed. The 3’ and 5’ ends were trimmed to a Phred score of 30 using a window size of 1 (Straub et al. 2013). Reads less than 50 bp were removed along with demultiplexing barcodes. Output files were generated in fastq format by PRINSEQ for subsequent import to Geneious v8.0.3 (Kearse et al. 2012).

Sequence Assembly and Alignment – After importing edited read pools to Geneious, a reference-guided assembly was conducted for the ribosomal cistron based on the protocol established by Ripma et al. (2014). A nrDNA contig from Tectona grandis L.f. (GenBank#LN714775), a taxon in Lamiaceae containing sequence data for the 18S, ITS1, 5.8S, ITS2, 28S, and ETS regions, was used as a reference against which reads were assembled under default settings over 100 iterations for the single read pool of P. douglasii (SDSU19275). A consensus sequence was generated for P. douglasii (SDSU19275) using a 75% masking threshold (Whittall et al. 2010; Straub et al. 2012; Ripma et al. 2014). Regions without coverage and with coverage less than 25x were represented as gaps. The most complete annotations available through GenBank for the 18S, 5.8S, 28S, ITS1, and ITS2 regions were transferred from Salvia miltiorrhiza Bunge (GenBank#DQ132863), also a taxon in Lamiaceae. An external transcribed spacer annotation was also obtained from GenBank for piperita A. Gray (GenBank#JF301313). The annotations from these two GenBank accessions were transferred to the P. douglasii (SDSU19275) consensus sequence based on 70% sequence similarity. The resulting contig served as a reference sequence for the assembly of the remaining read pools. With a masking threshold of 75%, areas with coverage under 25x converted to gaps, and ambiguity codes retained, a separate consensus sequence was generated for each read pool. Consensus sequences were aligned using the MAFFT plugin in Geneious. Because sequence alignment and editing recommendations for this scale of data are still advancing (Blair and Murphy 2011), this study follows a conservative approach that has been suggested by other researchers (Parks et al. 2012; Ripma et al. 2014), whereby all alignment columns containing gaps or ambiguity codes are removed.

A reference-guided assembly was also conducted in Geneious v8.0.3 to prepare sequence data for the entire chloroplast. A complete, annotated plastome sequence from Salvia miltiorrhiza was downloaded from GenBank (GenBank#JX880022) and used as a reference for the assembly of reads from P. douglasii (SDSU19275). A consensus sequence was generated using a 75% masking threshold, and regions with less than 25x coverage were replaced with gaps. Genomic regions of the consensus plastome were designated using annotations from Origanum vulgare L. subsp. vulgare (GenBank#JX880022). Sequences were grouped into categories for coding regions, introns, and intergenic regions. Assembly of the remaining read pools was conducted for each category of sequences using the aforementioned masking thresholds and base calling parameters. Consensus sequences within each category were aligned using the MAFFT plugin in Geneious and subsequently stripped of alignment columns containing ambiguities and gaps.
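The conservative column-stripping step described above can be sketched as follows. This is a minimal illustration, not the Geneious procedure used in this study: the function name is hypothetical, and the sketch assumes that any character other than an unambiguous A, C, G, or T (gaps, N, and other IUPAC ambiguity codes) disqualifies a column.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def strip_ambiguous_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every alignment column in which any sequence has a gap or an
    ambiguity code, keeping only columns of unambiguous A/C/G/T calls."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop() if lengths else 0

    upper = {name: seq.upper() for name, seq in alignment.items()}
    rows = list(upper.values())
    # Indices of columns in which every row holds an unambiguous base.
    keep = [i for i in range(n_columns)
            if all(row[i] in UNAMBIGUOUS_BASES for row in rows)]
    return {name: "".join(seq[i] for i in keep) for name, seq in upper.items()}


# Toy alignment: column 2 holds an N and column 4 holds a gap.
toy_alignment = {
    "sample_1": "ACGT-A",
    "sample_2": "ACNTTA",
    "sample_3": "ACGTTA",
}
print(strip_ambiguous_columns(toy_alignment))
```

The trade-off of this approach is that a single low-coverage sample can remove a column for every taxon, so alignment length depends on the least complete sequence.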

Phylogenetic Analysis

The data partitions for both genomic regions were tested independently in JModelTest (Darriba et al. 2012) using the Akaike Information Criterion (Akaike 1974). For each subsequent analysis, the nucleotide substitution model implemented was the available model closest to the one selected by JModelTest. RAxML (Stamatakis 2006) was implemented in Geneious v8.0.3 to conduct separate maximum likelihood analyses on the partitioned cistron and plastome. A GTR + Γ model was implemented for all of these analyses. A series of 1000 rapid bootstrap replicates was used to assess support, and clades with a value over 70 were considered highly supported. Trees were imported to FigTree v 1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/) for annotation and final formatting.

Bayesian inference was conducted separately on the plastome and the cistron using BEAST (Drummond and Rambaut 2007). Individual analyses were run for 150 million generations, sampling trees every 12,500 generations. Models of nucleotide evolution were selected based on JModelTest results: GTR + Γ was the optimal model choice for the BEAST analysis of both the plastome and the cistron. After removing a burn-in fraction of 20%, the set of posterior trees was used to create a maximum clade credibility tree, in which a posterior probability of 0.95 or greater was considered strong support. Log files were evaluated for convergence of runs in Tracer v 1.5 (Drummond and Rambaut 2007). Trees resulting from each Bayesian analysis were imported to FigTree v 1.4.0 for final annotation and formatting.
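The model selection step in this section ranks candidate substitution models by the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The sketch below shows the ranking only. The log-likelihood values are invented for illustration and are not results from this study, and the parameter counts cover the substitution model alone (branch lengths are omitted on the assumption that they are the same for every model on a fixed topology).

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


# Hypothetical candidates: model name -> (ln L, substitution-model parameters).
# JC has no free parameters; HKY+G has kappa, 3 base frequencies and a gamma
# shape; GTR+G has 5 exchange rates, 3 base frequencies and a gamma shape.
candidates = {
    "JC": (-10250.0, 0),
    "HKY+G": (-10120.0, 5),
    "GTR+G": (-10100.0, 9),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
# Difference from the best model; a delta of 0 marks the selected model.
deltas = {name: score - scores[best] for name, score in scores.items()}

for name in sorted(scores, key=scores.get):
    print(f"{name:6s} AIC = {scores[name]:.1f}  delta = {deltas[name]:.1f}")
```

With these made-up numbers the richer model wins because its gain in log-likelihood outweighs the penalty of 2 per extra parameter; with a smaller likelihood gain the ranking would reverse.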

DOUBLE-DIGEST RADSEQ

Taxon Sampling and DNA Isolation

For the ddRADseq analysis, additional silica-dried leaf material and voucher specimens were collected in the field. These were supplemented with samples of plants grown from nutlets in a controlled setting and with samples previously obtained from herbarium specimens for the genome skimming analysis. Five samples represented the outgroup taxa Monardella villosa, Acanthomintha lanceolata, and Acanthomintha ilicifolia. Ingroup sampling included the geographically restricted P. clareana, P. nudiuscula, and P. abramsii; the northernmost species, P. floribunda, found on the Modoc Plateau; the slightly more widespread P. serpylloides and P. zizyphoroides; the most widespread P. douglasii; and the putative novel taxon only observed at a locality in Baja California, Mexico (Table 3, Figure 5). Of the 46 samples obtained for this part of the study, twelve were replicates and not included; therefore, 34 specimens were sampled in total for ddRADseq.

Table 3. Taxa Sampled for ddRADseq Analysis

Sample ID    Accession  Taxon             Collector / Coll. #  Date     Location         Source
AI17138      SDSU17138  A. ilicifolia     M. Simpson 2795      4/20/07  San Diego        Herbarium
AI17138b     SDSU17138  A. ilicifolia     M. Simpson 2795      4/20/07  San Diego        Herbarium
AL17310      SDSU17310  A. lanceolata     M. Simpson 2866      6/1/07   Santa Clara      Herbarium
AL17310b     SDSU17310  A. lanceolata     M. Simpson 2866      6/1/07   Santa Clara      Herbarium
MV20737      SDSU20737  M. villosa        M. Simpson 3812      5/20/14  San Benito       Field
PA20496      SDSU20496  P. abramsii       A. Everett 3         5/23/13  San Diego        Herbarium
PA20496b     SDSU20496  P. abramsii       A. Everett 3         5/23/13  San Diego        Herbarium
PA21401      SDSU21401  P. abramsii       A. Everett 20        4/2/15   San Diego        Field
PA21401b     SDSU21401  P. abramsii       A. Everett 20        4/2/15   San Diego        Field
PA21402      SDSU21402  P. abramsii       A. Everett 21        4/2/15   San Diego        Field
PA21402b     SDSU21402  P. abramsii       A. Everett 21        4/2/15   San Diego        Field
PC19272      SDSU19272  P. clareana       M. Silveira 28       5/29/08  Monterey         Herbarium
PD19273      SDSU19273  P. douglasii      M. Silveira 18       6/1/07   Santa Barbara    Herbarium
PD19273b     SDSU19273  P. douglasii      M. Silveira 18       6/1/07   Santa Barbara    Herbarium
PD19275      SDSU19275  P. douglasii      M. Silveira 19       6/1/07   Santa Barbara    Herbarium
PD19275b     SDSU19275  P. douglasii      M. Silveira 19       6/1/07   Santa Barbara    Herbarium
PD20800      SDSU20800  P. douglasii      A. Everett 17        5/21/14  San Luis Obispo  Field
PD20800b     SDSU20800  P. douglasii      A. Everett 17        5/21/14  San Luis Obispo  Field
PD21396      SDSU21396  P. douglasii      A. Everett 35        5/15/15  Santa Barbara    Field
PD21398      SDSU21398  P. douglasii      A. Everett 24        4/17/15  San Luis Obispo  Field
PD21405      SDSU21405  P. douglasii      C. Witham 1473       5/8/15   Yolo             Field
PD21406      SDSU21406  P. douglasii      C. Witham 1475       5/8/15   Sacramento       Field
PF19279      SDSU19279  P. floribunda     M. Silveira 20       7/24/07  Modoc            Herbarium
PF21415      SDSU21415  P. floribunda     A. Everett 40        8/31/15  Klamath          Cultivated
PF21416      SDSU21416  P. floribunda     A. Everett 41        8/31/15  Klamath          Cultivated
PF21416b     SDSU21416  P. floribunda     A. Everett 41        8/31/15  Klamath          Cultivated
PN19269      SDSU19269  P. nudiuscula     M. Silveira 6        4/14/07  San Diego        Herbarium
PN19271      SDSU19271  P. nudiuscula     M. Silveira 8        4/14/07  San Diego        Herbarium
PN20500      SDSU20500  P. nudiuscula     A. Everett 7         5/23/13  San Diego        Field
PN20500b     SDSU20500  P. nudiuscula     A. Everett 7         5/23/13  San Diego        Field
PN21403      SDSU21403  P. nudiuscula     A. Everett 19        4/2/15   San Diego        Field
PN21403b     SDSU21403  P. nudiuscula     A. Everett 19        4/2/15   San Diego        Field
PS20794      SDSU20794  P. serpylloides   A. Everett 11        5/20/14  Monterey         Field
PS20795      SDSU20795  P. serpylloides   A. Everett 12        5/20/14  Monterey         Field
PS20796      SDSU20796  P. serpylloides   A. Everett 13        5/20/14  Monterey         Field
PS20797      SDSU20797  P. serpylloides   A. Everett 14        5/20/14  Monterey         Field
PS20798      SDSU20798  P. serpylloides   A. Everett 15        5/20/14  Monterey         Field
PS20799      SDSU20799  P. serpylloides   A. Everett 16        5/20/14  Monterey         Field
PS21397      SDSU21397  P. serpylloides   A. Everett 23        4/17/15  San Luis Obispo  Field
PS21404      SDSU21404  P. serpylloides   A. Everett 22        4/5/15   Monterey         Field
Pspnov21393  SDSU21393  P. sp. nov.       A. Everett 25        4/24/15  Baja California  Field
PZ18438      SDSU18438  P. zizyphoroides  B. Meinke            6/15/05  Klamath          Herbarium
PZ19277      SDSU19277  P. zizyphoroides  M. Silviera 10       5/24/07  Sacramento       Herbarium
PZ19277b     SDSU19277  P. zizyphoroides  M. Silviera 10       5/24/07  Sacramento       Herbarium
PZD188666    DAV188666  P. zizyphoroides  CNPS Field Team      5/14/08  Merced           Cultivated
PZU168804    UCR168804  P. zizyphoroides  D. Charlton 8027B    5/16/05  Yuba             Cultivated

Figure 5. Pogogyne populations sampled for ddRADseq analysis. Some populations are represented by multiple samples in the analysis.

DNA was extracted from all sampled leaf material using a version of the cetyltrimethylammonium bromide (CTAB) protocol (Doyle 1991). The protocol was modified with an additional phenol extraction (Friar 2005) to counter the abundant secondary compounds that interfere with DNA purification in members of the Lamiaceae, and with an RNase incubation step to remove any RNA present in samples prior to library preparation.

Library Preparation, Sequencing, and Pre-processing

Genomic DNA samples were submitted to Global Biologics (Columbia, Missouri, USA) for preparation and sequencing of ddRADseq libraries. Purification of DNA was followed by restriction enzyme digestion using EcoRI and MspI. Both P1 and P2 adapters were added to each digested sample. Pooling, purification, and enrichment were performed specifically to obtain fragments in the 150-450 bp range containing both MspI and EcoRI cut sites. A PCR master mix was added to each DNA pool, followed by PCR cycles. Size selection was performed with the PippinHT Prep System using internal standards to target an insert size of 300 bp +/- 150 bp. Qubit and Fragment Analysis were conducted to determine library concentration, and qPCR was carried out to determine appropriate loading concentrations for sequencing. Sequencing was done in a single lane of an Illumina HiSeq2000, producing multiplexed read files in FASTQ format. These files were demultiplexed and reads were assigned to samples using pyRAD (Eaton 2014). Each barcode was allowed one base pair mismatch, and barcodes were trimmed from sorted reads.

Paired-end sequencing results in both a forward and a reverse read for each sequenced fragment. The degree of overlap between forward and reverse reads depends upon both the final read length and the target fragment length from the size selection step. PEAR (Paired-End reAd mergeR) (Zhang et al. 2014) was used to calculate the proportion of overlapping reads, to edit sequences before processing, and to merge each forward/reverse read overlap into a single, longer read. For each set of paired reads, PEAR scores the possible overlaps and selects the overlap with the highest assembly score. The assembly score is based on the probability of matches and mismatches given the observed data. PEAR then assesses the statistical significance of each merge by calculating a p-value for the null hypothesis that the two reads do not overlap. Reads not passing this test are discarded, while reads passing this test are merged with error correction using Illumina quality scores. Simulations comparing alternative read merging software showed PEAR to have the lowest false positive rate and the highest percentage of correctly merged reads when the statistical test is disabled. When the statistical test is enabled, PEAR yielded the second highest percentage of correctly merged reads (Zhang et al. 2014).

Reads were merged in PEAR with the OES statistical test enabled and a p-value cutoff of 0.01. The conservative nature of the statistical test made it possible to set the minimum allowed overlap between reads to be merged at 1 bp. A Phred score of 20 was used as the threshold for trimming the low quality part of a read: where the scores for two consecutive bases fall below this threshold, the remainder of the read is trimmed. Reads with more than 33% uncalled bases were discarded. Bases with a quality score lower than 20 were converted to N’s. Output files for merged and non-merged reads were then generated in FASTQ format for subsequent processing steps. More than 80% of the paired-end reads overlapped in every sample; thus, only merged and assembled reads were used for subsequent steps.
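The three read-editing rules stated above (trim at two consecutive low-quality bases, convert remaining low-quality bases to N, discard reads with too many uncalled bases) can be sketched in Python as follows. This is an illustration of the rules as worded in the text, not PEAR's implementation; the function name, the order in which the rules are applied, and the example reads are assumptions.

```python
def trim_and_mask(seq, quals, threshold=20, max_uncalled=0.33):
    """Apply the quality rules to one read.

    seq   -- base calls as a string
    quals -- Phred scores, one integer per base
    Returns the edited read, or None if the read is discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # Rule 1 (assumed to run first): cut the read at the first pair of
    # consecutive bases that both fall below the quality threshold.
    cut = len(seq)
    for i in range(len(quals) - 1):
        if quals[i] < threshold and quals[i + 1] < threshold:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # Rule 2: convert any remaining low-quality base to N.
    masked = "".join("N" if q < threshold else base
                     for base, q in zip(seq, quals))

    # Rule 3: discard reads that are empty or exceed the uncalled fraction.
    if not masked or masked.count("N") / len(masked) > max_uncalled:
        return None
    return masked


# Hypothetical reads: the first is trimmed at positions 7-8 and keeps one N;
# the second has no consecutive low-quality pair but is 50% N after masking.
kept = trim_and_mask("ACGTACGTAC", [30, 30, 10, 30, 30, 30, 12, 8, 30, 30])
dropped = trim_and_mask("ACGT", [10, 30, 10, 30])
print(kept, dropped)
```

The defaults mirror the settings reported above (Phred 20 and 33% uncalled bases). Applying the rules in a different order could change which reads survive, which is why the order is flagged as an assumption.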

Sequence Assembly and Alignment

Demultiplexed, edited, merged reads were prepared for downstream analyses using pyRAD (Eaton 2014). Consensus sequences for putative loci were first generated for each individual barcoded sample. Reads were clustered at a set similarity threshold using the uclust function in VSEARCH (Edgar 2010). These clusters represent putative loci. Next, clusters formed by fewer than a set number of reads were discarded to ensure accurate downstream base calls. Observed base counts over all sites in all clusters were used to jointly estimate sequencing error rate and heterozygosity. Using the mean sequencing error rate, consensus genotypes were assigned for each site in every cluster. Bases that could not be assigned with confidence were replaced with N’s, with ambiguity codes retained. The within-sample clustering similarity threshold was also used to cluster the randomized consensus loci generated for each sample. All consensus loci shared by fewer than a set number of individuals were then discarded. It is assumed that a heterozygous site shared by many samples is more likely to reflect clustered paralogs than a shared ancestral polymorphism, so loci appearing heterozygous at the same site in more than 3 samples were conservatively discarded as clusters of paralogs. The remaining clusters represented putative loci that were shared by all samples. Shared loci were aligned with MUSCLE (Edgar 2004).

After merging and clustering, samples having too few reads for inclusion in subsequent analyses were identified. It was also possible to identify which of two technical replicates yielded fewer reads in the sequencing step. The final step in formatting an alignment was to remove these samples, which are not useful for the analysis. Remaining clusters formed the alignments of orthologous sequences that were assembled into concatenated supermatrices for downstream analysis.
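The two across-sample filters in this section (minimum number of samples sharing a locus, and the paralog filter on shared heterozygous sites) can be sketched as below. This is not pyRAD code. It assumes each locus is a mapping from sample name to an aligned consensus sequence, that heterozygous calls appear as two-base IUPAC codes, and that a locus is rejected when more than `max_shared_het` samples are heterozygous at the same site, which is how this sketch reads the paralog filter. All names and sequences are hypothetical.

```python
# Two-base IUPAC ambiguity codes, taken here to mark heterozygous calls.
HET_CODES = set("RYSWKM")


def max_shared_heterozygosity(locus):
    """Largest number of samples that are heterozygous at any one site."""
    seqs = list(locus.values())
    return max(
        (sum(base in HET_CODES for base in column) for column in zip(*seqs)),
        default=0,
    )


def keep_locus(locus, min_samples, max_shared_het):
    """True if the locus passes both across-sample filters."""
    enough_samples = len(locus) >= min_samples
    not_paralogous = max_shared_heterozygosity(locus) <= max_shared_het
    return enough_samples and not_paralogous


# Hypothetical loci.
locus_a = {"s1": "ACGT", "s2": "ACGT", "s3": "ACRT", "s4": "ACGT", "s5": "ACGT"}
locus_b = {"s1": "ACRT", "s2": "ACRT", "s3": "ACRT", "s4": "ACRT", "s5": "ACGT"}
locus_c = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGT"}

results = {
    name: keep_locus(locus, min_samples=4, max_shared_het=3)
    for name, locus in [("a", locus_a), ("b", locus_b), ("c", locus_c)]
}
print(results)
```

Locus a passes, locus b is rejected because four samples are heterozygous at the third site, and locus c is rejected because only three samples share it.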

Sensitivity Analysis

To evaluate the effect of different parameter settings on phylogenetic results, combinations of parameters controlling the clustering similarity threshold, the minimum read depth, and the minimum number of taxa sharing a given locus were varied, and a total of 24 supermatrices were generated. The clustering similarity threshold parameter was assigned values ranging from 80% to 94%. For each similarity threshold, a minimum read depth of 3 or 10 was assigned, and the minimum number of individuals was set to 7 or 4. Each dataset was named according to its parameter settings. For each dataset, a maximum likelihood analysis was conducted in RAxML v 8.2.8 (Stamatakis 2014) using the CIPRES Science Gateway (Miller et al. 2010). Analyses were run under the GTR+I+Γ substitution model, assessing support with 1000 bootstrap replicates. To visualize the effect of parameter settings on topology, bootstrap support percentages were plotted for each dataset for the single node that differed between topologies.
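Because each dataset is named after its parameter settings, a name such as c94d10m7p3 can be decoded mechanically. The sketch below assumes the letters stand for clustering similarity threshold (c), minimum read depth (d), minimum number of individuals sharing a locus (m), and the paralog filter on shared heterozygous sites (p); that mapping is an interpretation of the naming scheme, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class Params(NamedTuple):
    clustering_pct: int   # c: clustering similarity threshold, in percent
    min_depth: int        # d: minimum read depth per cluster
    min_samples: int      # m: minimum individuals sharing a locus
    max_shared_het: int   # p: paralog filter setting


_NAME = re.compile(r"c(\d+)d(\d+)m(\d+)p(\d+)")


def parse_dataset_name(name):
    """Decode a dataset name such as 'c94d10m7p3' into its settings."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    return Params(*map(int, match.groups()))


def dataset_name(params):
    """Inverse of parse_dataset_name."""
    return (f"c{params.clustering_pct}d{params.min_depth}"
            f"m{params.min_samples}p{params.max_shared_het}")


parsed = parse_dataset_name("c94d10m7p3")
print(parsed)
```

If the design was fully crossed, 24 supermatrices built from two depth settings and two minimum-individual settings would imply six similarity thresholds between 80% and 94%; the text does not list the individual threshold values.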

Phylogenetic Analyses

After analyzing the topologies produced by the different datasets, the dataset with the most conservative parameters (c94d10m7p3) was selected for a final maximum likelihood analysis in RAxML v 8.2.8 (Stamatakis 2014). Because partitioning thousands of individual loci is impractical, the GTR+Γ model of nucleotide evolution was implemented for the entire aligned supermatrix. Support was assessed with 1000 bootstrap replicates, and as with the genome skimming analysis, clades with a support value greater than 70 were considered well supported.

Bayesian inference was conducted using BEAST 2 (Drummond and Rambaut 2007) in the CIPRES Science Gateway (Miller et al. 2010). The dataset c95d15m10p3 was created specifically for this analysis because BEAUti required more computational power to format the alignments of the other datasets. Ten separate analyses were run for 500 million generations, sampling every 25,000 generations. The GTR + Γ substitution model was implemented for the entire alignment in each case. Log files were evaluated in Tracer v 1.5 (Drummond and Rambaut 2007) for convergence. The analyses that reached convergence were summarized with LogCombiner (Drummond and Rambaut 2007), discarding the first 40% of samples as burn-in. The maximum clade credibility tree was calculated from the combined posterior distribution of 24,000 samples, where posterior probabilities greater than 0.95 were considered strong support.

An analysis was also conducted on two separate datasets using SVD Quartets (Chifman and Kubatko 2014). This program utilizes a quartet sampling method that accounts for sequence variability due to mutational and coalescent variance, and for differences in the genealogical histories of individual loci. SVD Quartets uses unlinked SNP data directly, bypassing the need for long reads to estimate a gene tree (Chifman and Kubatko 2014). The c80d4m4p3 and c95d10m15p3 datasets were imported to PAUP* 4.0 (Swofford 2003) under the SVD Quartets analysis setting. The QFM quartet assembly method was used with exhaustive sampling, and 100 bootstrap replicates were used to assess support for the relationships recovered.
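The sampling figures reported for the Bayesian analyses can be cross-checked with simple arithmetic. The sketch below assumes that one sample is logged every `sample_every` generations, that the initial state is not counted, and that burn-in is removed as a whole-number percentage of the logged samples; the function name is hypothetical.

```python
def retained_samples(generations, sample_every, burnin_percent):
    """Posterior samples left in one run after burn-in is removed."""
    total = generations // sample_every
    discarded = total * burnin_percent // 100  # integer arithmetic only
    return total - discarded


# ddRADseq BEAST runs: 500 million generations, sampled every 25,000,
# with the first 40% discarded as burn-in.
per_run = retained_samples(500_000_000, 25_000, 40)

# The combined posterior is reported as 24,000 samples.
implied_runs = 24_000 // per_run

# Genome skimming BEAST runs: 150 million generations, sampled every
# 12,500, with 20% burn-in.
skimming = retained_samples(150_000_000, 12_500, 20)

print(per_run, implied_runs, skimming)
```

Under these assumptions each ddRADseq run contributes 12,000 post-burn-in samples, so a combined posterior of 24,000 samples is consistent with two of the ten runs being combined. The text does not state how many runs converged, so this is an inference from the reported numbers rather than a documented figure.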

CHAPTER 3

RESULTS

GENOME SKIMMING

Sequencing and Processing

Final alignment lengths, variable characters, and phylogenetically informative characters are tabulated in Table 4. A total of 35,712,495 reads were obtained for the taxa involved in this study. The number of raw sequencing reads recovered per sample ranged from 1,380,258 to 2,848,928 with a mean of 1,984,028 and a standard deviation of 443,376. After quality control was executed in PRINSEQ, the number of reads per sample ranged from 1,231,148 to 2,582,995 with a mean of 1,787,996 and a standard deviation of 398,946.

Assembly and Alignment

For the cistron analysis, the length of the final alignment was 11,129 bp. The complete data set had a total of 553 phylogenetically informative characters and the ingroup had 33 phylogenetically informative characters. Aligned lengths of the ETS, 18S, ITS1, 5.8S, ITS2, and 28S regions, and the number of PICs for each region, are tabulated in Table 4 for both the full genome skimming dataset and the ingroup only. For the plastome analysis, the length of the annotated P. douglasii (SDSU19275) chloroplast consensus sequence was 150,374 bp. The alignment lengths of the concatenated exons, introns, and intergenic regions, and the number of PICs for each region, are also tabulated in Table 4 for both the full genome skimming dataset and the ingroup only.

Phylogenetic Analysis

The topologies produced by the cistron data in the maximum likelihood and Bayesian analyses were congruent at all nodes, but with mixed support (Figure 6B). The taxa belonging to subgenus Hedeomoides formed a strongly supported clade. Within subgenus Hedeomoides, the two samples of P. floribunda are also monophyletic. The taxa of San Diego County and northern Baja California, Mexico formed a monophyletic group, termed the “Southern Clade,” with mixed support. Within the Southern Clade, the two P. nudiuscula samples are monophyletic with weak support, and are sister to the putative novel taxon from Baja, P. “mexicana”. The single P. abramsii sample is sister to P. nudiuscula + P. “mexicana” with strong support. Sister to the Southern Clade is a clade grouping all of the P. douglasii specimens and the single P. clareana specimen. This clade is recovered with weak support, and the relationships of the terminal taxa within this clade have mixed support values.

Table 4. Final Alignment Lengths, Variable Characters, and Parsimony Informative Characters from Genome Skimming Analysis

Note: UVC = uninformative variable characters. PIC = parsimony informative characters.

Gene/Genome       Aligned length  UVC: all taxa  PIC: all taxa  %PIC: all taxa  UVC: ingroup  PIC: ingroup  %PIC: ingroup
Cistron           11,129          728            553            4.969%          16            33            0.30%
Plastome          150,374         972            748            0.497%          165           97            0.06%
nrDNA regions
  ETS             412             30             42             10.19%          4             6             1.46%
  18S             41              0              0              0.00%           0             0             0.00%
  ITS1            217             9              12             5.53%           1             2             0.92%
  5.8S            164             0              1              0.61%           0             0             0.00%
  ITS2            228             8              12             5.26%           0             0             0.00%
  28S             46              0              0              0.00%           0             0             0.00%
  non-coding      10246           675            490            4.78%           11            25            0.24%
cpDNA regions
  exons           12,503          23             30             0.24%           5             5             0.04%
  introns         18,877          113            85             0.45%           8             10            0.05%
  non-coding      45,844          537            339            0.74%           114           50            0.11%
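The %PIC columns in Table 4 are the number of parsimony informative characters divided by the aligned length, expressed as a percentage. A minimal Python check of the Cistron and Plastome rows, using only values from the table (the function and variable names are hypothetical):

```python
def percent_pic(pic, aligned_length):
    """Parsimony informative characters as a percentage of alignment length."""
    return 100.0 * pic / aligned_length


# (aligned length, PIC: all taxa, PIC: ingroup), copied from Table 4.
rows = {
    "Cistron": (11_129, 553, 33),
    "Plastome": (150_374, 748, 97),
}

for name, (length, pic_all, pic_ingroup) in rows.items():
    print(f"{name:9s} all taxa: {percent_pic(pic_all, length):.3f}%  "
          f"ingroup: {percent_pic(pic_ingroup, length):.2f}%")
```

The same calculation makes the contrast between the two genomic sources explicit: although the plastome alignment is more than ten times longer, its proportion of informative characters is roughly one tenth that of the cistron.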

The topologies produced in the maximum likelihood and Bayesian analyses of the plastome data differed relative to the cistron topology, and also differed relative to one another (Figure 7B). In both the maximum likelihood and Bayesian topologies, subgenus Hedeomoides is paraphyletic with strong support. In these topologies, P. floribunda is still monophyletic with strong support, and it is still also strongly supported as sister to the single P. serpylloides sample. However, P. zizyphoroides is sister to a P. douglasii specimen in a strongly supported clade that is sister to the rest of the genus. The Southern Clade is monophyletic and the terminal relationships within this clade match those recovered using the cistron data with strong support. The remaining P. douglasii samples form a clade with the single P. clareana specimen that is sister to the Southern Clade. The terminal relationships within this clade differ between the maximum likelihood and Bayesian analyses of the plastome data, yet the differing relationships are strongly supported in both analyses.

Figure 6. A) The cistron phylogeny. B) The genus Pogogyne expanded. Maximum likelihood bootstrap support values are listed for each node, followed by the Bayesian posterior probability for each node.

Figure 7. A) The plastome phylogeny. B) The genus Pogogyne expanded. Maximum likelihood bootstrap support values are listed for each node, followed by the Bayesian posterior probability for each node. C) The coastal clade from the Bayesian analysis. In this analysis, the relationships within the clade differed at two nodes.

DOUBLE-DIGEST RADSEQ

Sequencing and Processing

After demultiplexing in pyRAD, the number of reads for each sample ranged from 48,709 to 9,533,782. The proportion of overlapping sequences in each sample read pool ranged from 83% to 93% (Table 5). Of the seventeen samples that were derived from herbarium specimens, sixteen were among the samples that yielded fewer raw reads. A total of four samples, two of which were replicates, were removed from the analysis for this reason. The only sample of P. clareana available was one of the eliminated samples, so this taxon was not represented in subsequent analyses.

Assembly and Alignment

The number of loci recovered varied for each sample in each dataset. The same four samples had to be removed from each alignment prior to phylogenetic analysis because they contained too few loci.

Sensitivity Analysis

Twenty-four supermatrices were created in order to evaluate the effects of different parameter settings on the final topology. Analysis of these alignments indicated that two different topologies are supported by the data. In one, P. douglasii specimens are resolved as sister to the rest of the genus. In the other, P. douglasii specimens are resolved as sister to subgenus Hedeomoides (Figure 8).

Phylogenetic Analysis

In the maximum likelihood analysis of the c94d10m7p3 ddRADseq dataset, three of the 27 ingroup nodes were weakly supported (Figure 9). Subgenus Hedeomoides is monophyletic. Within subgenus Hedeomoides, P. zizyphoroides, P. floribunda, and P. serpylloides are each monophyletic, and P. zizyphoroides is sister to P. floribunda + P. serpylloides. Two nodes relating samples of P. serpylloides to each other are weakly supported, likely due to sequence similarity. The Southern Clade is recovered with strong support and is sister to a clade containing all but two samples of P. douglasii. Within the Southern Clade, P. nudiuscula and P. abramsii are both monophyletic, and P. “mexicana” is sister to P. nudiuscula + P. abramsii. The Southern Clade and the clade containing five of the seven P. douglasii samples form a monophyletic group that is sister to subgenus Hedeomoides. Lastly, the two remaining P. douglasii samples form a clade that is sister to all of the other groupings within Pogogyne.

Table 5. PEAR Results Summary

  Sample       No. Reads  Assembled  %       Not assembled  %       Discarded  %
* AI17138      2,070,889  1,926,369  93.021  140,137        6.767   4,383      0.212
* AI17138b     48,709     41,763     85.740  6,836          14.034  110        0.226
* AL17310      678,089    633,035    93.356  43,520         6.418   1,534      0.226
* AL17310b     2,223,853  2,073,197  93.225  146,050        6.567   4,606      0.207
  MV20737      3,300,610  2,944,033  89.197  350,312        10.614  6,265      0.190
* PA20496      3,536,977  3,143,465  88.874  385,466        10.898  8,046      0.227
* PA20496b     3,754,887  3,379,422  90.001  368,102        9.803   7,363      0.196
  PA21401      6,971,157  6,075,918  87.158  879,462        12.616  15,777     0.226
  PA21401b     8,154,682  7,051,735  86.475  1,084,934      13.304  18,013     0.221
  PA21402      5,864,412  5,051,139  86.132  801,220        13.662  12,053     0.206
  PA21402b     8,154,682  7,051,735  86.475  1,084,934      13.304  18,013     0.221
* PC19272      69,588     59,318     85.242  10,130         14.557  140        0.201
* PD19273      167,284    139,349    83.301  27,580         16.487  355        0.212
* PD19273b     217,097    188,248    86.711  28,376         13.071  473        0.218
* PD19275      2,267,275  2,043,415  90.126  219,045        9.661   4,815      0.212
* PD19275b     2,719,100  2,382,597  87.624  330,564        12.157  5,939      0.218
  PD20800      4,780,762  4,182,098  87.478  587,500        12.289  11,164     0.234
  PD20800b     442,961    401,993    90.751  40,074         9.047   894        0.202
  PD21396      7,162,833  6,119,676  85.437  1,027,337      14.343  15,820     0.221
  PD21398      5,318,583  4,549,639  85.542  756,315        14.220  12,629     0.237
  PD21405      5,332,869  4,744,678  88.970  576,183        10.804  12,008     0.225
  PD21406      7,387,978  6,513,981  88.170  858,714        11.623  15,283     0.207
* PF18438      2,202,713  2,026,732  92.011  170,640        7.747   5,341      0.242
* PF19279      6,517,750  5,786,510  88.781  717,385        11.007  13,855     0.213
† PF21415      7,441,053  6,573,860  88.346  849,480        11.416  17,713     0.238
† PF21416      7,554,390  6,736,325  89.171  800,233        10.593  17,832     0.236
† PF21416b     7,200,224  6,376,251  88.556  809,620        11.244  14,353     0.199
* PN19269      1,114,489  996,474    89.411  115,542        10.367  2,473      0.222
* PN19271      2,551,227  2,288,512  89.702  257,646        10.099  5,069      0.199
  PN20500      5,008,781  4,395,229  87.750  602,248        12.024  11,304     0.226
  PN20500b     1,295,562  1,106,594  85.414  186,351        14.384  2,617      0.202
  PN21403      6,358,683  5,523,677  86.868  821,575        12.921  13,431     0.211
  PN21403b     7,332,886  6,328,562  86.304  989,285        13.491  15,039     0.205
  PS20794      4,806,522  4,151,206  86.366  645,076        13.421  10,240     0.213
  PS20795      5,882,317  5,243,135  89.134  626,630        10.653  12,552     0.213
  PS20796      9,456,620  8,007,923  84.681  1,426,885      15.089  21,812     0.231
  PS20797      887,671    789,320    88.920  96,532         10.875  1,819      0.205
  PS20798      4,981,368  4,390,245  88.133  581,265        11.669  9,858      0.198
  PS20799      4,911,655  4,279,572  87.131  621,907        12.662  10,176     0.207
  PS21397      5,826,261  5,057,396  86.803  756,274        12.980  12,591     0.216
  PS21404      4,077,337  3,548,978  87.042  518,928        12.727  9,431      0.231
  Pspnov21393  647,623    578,780    89.370  67,449         10.415  1,394      0.215
* PZ19277      3,735,921  3,375,483  90.352  352,219        9.428   8,219      0.220
* PZ19277b     520,888    463,756    89.032  56,106         10.771  1,026      0.197
† PZD188666    795,224    694,473    87.330  99,106         12.463  1,645      0.207
† PZU168804    9,533,782  8,271,207  86.757  1,241,667      13.024  20,908     0.219

Note: * = Sample derived from herbarium material. † = Sample derived from cultivated material.

The topology recovered in the Bayesian analysis of the c95d10m15p3 dataset was strongly supported at all nodes except one (Figure 10). In this tree, subgenus Hedeomoides is monophyletic. The species within subgenus Hedeomoides are also each recovered as monophyletic. The relationships within the P. serpylloides clade differed from those of the maximum likelihood topology. The Southern Clade is monophyletic; P. nudiuscula and P. abramsii are both monophyletic within the Southern Clade; P. “mexicana” is sister to P. nudiuscula; and P. “mexicana” + P. nudiuscula is sister to P. abramsii. Five of the seven P. douglasii samples form a monophyletic group that is sister to the Southern Clade. The remaining two P. douglasii specimens form a clade that is sister to subgenus Hedeomoides.

The topologies produced through the SVD Quartets analyses of the c80d4m15p3 and the c95d10m15p3 datasets are largely congruent. Subgenus Hedeomoides is monophyletic and the species within the subgenus are also all monophyletic. The relationships between the P. serpylloides samples differed between trees at terminal nodes. The Southern Clade is also monophyletic in both analyses, and the relationships of the monophyletic species within this clade are concordant with one another, and with the results of the maximum likelihood and Bayesian analyses. The placement of a clade composed of P. douglasii SDSU21405 + P. douglasii SDSU21406 is different in each of the SVD Quartets topologies. For the c80d4m15p3 dataset, this group is sister to a grouping that includes the other P. douglasii samples and the Southern Clade. For the c95d10m15p3 dataset, this clade is sister to the rest of the genus.

Figure 8. Results of the sensitivity analysis using 24 supermatrices analyzed in RAxML.

Figure 9. Maximum likelihood topology produced using the c94d10m7p3 dataset.

Figure 10. Bayesian topology of the c95d10m15p3 alignment, with duplicate samples removed. Posterior probability estimates over 0.95 are considered strong support.

Figure 11. The SVD Quartets topology from the c80d4m15p3 (A), and c95d10m15p3 (B) datasets, with duplicate samples removed. Bootstrap values over 70 are considered strong support.

CHAPTER 4

DISCUSSION

COMPARISON OF GENOME SKIMMING AND DDRADSEQ

Three main features of genome skimming and ddRADseq stand out when the two types of sequencing methods are compared: 1) the genomic origin of the loci available for analysis, 2) access to low copy nuclear regions of the genome, and 3) the number of SNPs present in final datasets.

The origin of loci available for phylogenetic analysis differed substantially between the genome skimming data and the ddRADseq data. The genome skimming data came from a full lane of samples and resulted in enough information to produce an alignment for the entire plastome and the ribosomal cistron. While these regions have been employed for phylogenetic inference in plants over several decades, there are specific caveats to their use. Results from chloroplast DNA, even when it comprises the entire plastome, must be interpreted with caution and should always be compared with other genomic sources of data. Because the plastome is a single, non-recombinant, linked unit that is uniparentally inherited, it can be thought of as a single very large locus that tracks a genetic history that may be different from that of the nuclear genome. The ribosomal cistron is a single unit that represents many tandem arrays of a repeated series of sequences. The individual repeats are generally assumed to undergo concerted evolution, a process that works to homogenize the sequences of the tandem repeats. However, in recently diverged species, it is possible that the homogenization of these sequences has not reached completion within a given species (Álvarez and Wendel 2003). This results in the presence of intra-individual polymorphism, a problem in phylogenetic analyses using cistron data, even when derived from genome skimming (Straub et al. 2012). Additionally, the cistron comprises a series of genes and spacers that are linked; thus, like the plastome, it is treated as a single locus.

These caveats to the use of plastome and cistron data present challenges for phylogenetic analysis. For example, in this study, the plastome and cistron trees conflicted. Specifically, subgenus Hedeomoides and the samples of P. douglasii were non-monophyletic in the analyses of the plastome data, but all nodes except one were recovered with very high support in both maximum likelihood and Bayesian analyses. These highly supported, yet seemingly spurious results are most likely due to the fact that the chloroplast genome tracks a different evolutionary history than the nuclear genome. The maximum likelihood and Bayesian analyses of the cistron alignment resulted in the recovery of a monophyletic subgenus Hedeomoides, a monophyletic Southern Clade, and a monophyletic grouping of P. douglasii that also included P. clareana. Though the results from the cistron data appeared to make more sense, additional phylogenetic investigation of the relationships recovered and the reasons for discordance is not possible. Thus, with respect to this study, the data obtained from genome skimming are inadequate because, in practice, they constitute only two loci, thereby rendering the construction of a species tree using many loci impossible. Concatenation of the cistron data with the plastome data was inappropriate in this study because the plastome alignment was roughly ten times longer than the cistron alignment, so it dominated the phylogenetic signal. Thus, the results of a concatenated analysis (not shown) were biased by the plastome data, and the resulting tree was identical in topology to the plastome tree, but with lower support values at the nodes where the plastome and cistron trees conflicted. In contrast to the genome skimming dataset, the ddRADseq datasets were all supermatrices of concatenated, SNP-containing loci from all genomic regions.
Analytical methods were available to investigate phylogeny with ddRADseq data using both concatenation and coalescent approaches.

In the investigation of rapidly and recently diverged clades, it is prudent to seek data from the low-copy fraction of the nuclear genome because of the previously described caveats concerning phylogenetic analysis based solely on plastome and cistron sequences. When genome skimming was used with the intention of directly obtaining phylogenetic data, low-copy nuclear sequences were not readily available. The low-copy fraction of the nuclear genome can be accessed using prior information from genome skimming data to design probes for targeted sequencing methods (Straub et al. 2012), yet determining the optimal multiplexing level when using genome skimming in this way requires prior knowledge of the size of the organismal genome, which is highly variable in plants and was, in this case, not available (Straub et al. 2012). In contrast, SNPs were obtained for this study from throughout the entire genome, including low-copy nuclear regions, in a single ddRADseq run.

The number of SNPs obtained using each sequencing method differed greatly. There were 748 informative SNPs present within the plastome alignment and 553 informative SNPs present within the cistron alignment (outgroups included in both cases). The number of SNPs present in the ddRADseq data was much greater. This number varied based on parameter settings, and ranged from approximately 80,000 in the most conservative dataset of the sensitivity analysis (c94d10m7p3) to approximately 160,000 in the least conservative dataset of the sensitivity analysis (c80d4m3p3). The number of SNPs available in each analysis influenced the support values obtained for each topology. Of the fourteen nodes recovered in the maximum likelihood and Bayesian analyses of the cistron data, 64% were recovered with strong support. The support values recovered in the analyses of the ddRADseq data varied depending on parameter settings, but a greater number of nodes were strongly supported in each analysis of the ddRADseq data. For example, in the maximum likelihood analysis of dataset c94d10m7p3, 86% of the nodes recovered were strongly supported. In the Bayesian analysis of dataset c95d10m15p3, 96% of the nodes recovered were strongly supported.

In an ideal study involving a comparison of results from two different sequencing methods, the sampling would be the same for all analyses. The major factor complicating comparisons of the genome skimming and ddRADseq analyses is the lower sampling in the analyses using genome skimming data. Specifically, a comparison of phylogenetic results would be improved if the two P. douglasii samples from the Central Valley localities were included in all analyses using both types of data. However, due to widespread drought conditions and time constraints, it was not possible to carry out adequate sampling for the construction of the genome skimming dataset. Despite this, the results of this study indicate that for an evolutionary scenario involving a rapid and recent divergence, ddRADseq is the more appropriate method for generating sequence data.
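The percentages of strongly supported nodes reported above are simple proportions, and the arithmetic can be sketched in a few lines of Python. This is an illustration only: the support values and the cutoff for "strong" support below are hypothetical placeholders, not values taken from this study. With fourteen nodes, a reported 64% corresponds to nine strongly supported nodes (9/14 ≈ 64.3%).

```python
def strongly_supported_percent(support_values, threshold):
    """Return the percentage of nodes whose support meets or exceeds the threshold."""
    if not support_values:
        raise ValueError("at least one node is required")
    strong = sum(1 for value in support_values if value >= threshold)
    return 100.0 * strong / len(support_values)


# Hypothetical bootstrap values for fourteen nodes (illustrative, not study data).
example_bootstraps = [100, 98, 95, 91, 88, 84, 79, 75, 70, 62, 55, 48, 33, 21]

# Assumed cutoff for "strong" support; the threshold used in the study may differ.
ASSUMED_CUTOFF = 70

percent = strongly_supported_percent(example_bootstraps, ASSUMED_CUTOFF)
print(f"{percent:.0f}% of nodes strongly supported")
```

The same function applies to Bayesian posterior probabilities if the values and cutoff are given on a 0 to 1 scale, which makes it easy to compare topologies from different datasets on a common footing.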

PHYLOGENETIC HYPOTHESES BASED ON DDRADSEQ DATA

Nearly all species included in the analyses of the ddRADseq data were recovered as monophyletic units with high support. A clade of taxa currently recognized as P. douglasii was recovered as sister to the Southern Clade. The monophyly of this group, and the fact that it is phylogenetically distinct from the other two P. douglasii samples, indicate that these samples may represent a novel species. Within the Southern Clade, P. abramsii and P. nudiuscula are sister to one another, forming a clade that is sister to P. “mexicana” in all analyses except the Bayesian analysis, in which P. “mexicana” is recovered as sister to P. nudiuscula. These results do not refute the hypothesis that P. “mexicana” should be treated as a new species; however, more sampling may be necessary to provide phylogenetic evidence for this conclusion. Subgenus Hedeomoides is recovered as monophyletic in all analyses. Within subgenus Hedeomoides, P. zizyphoroides is monophyletic and is sister to a clade composed of a P. floribunda clade plus a P. serpylloides clade.

An interesting feature of all recovered topologies is the placement of a clade composed of P. douglasii SDSU21405 + P. douglasii SDSU21406, two samples collected in the Central Valley of California. This clade is also recovered as monophyletic in all analyses, but its placement differs depending on the parameter settings used during the formulation of the datasets in pyRAD. The results of the sensitivity analysis showed that, using maximum likelihood, this clade was recovered either as sister to the rest of genus Pogogyne or as sister to subgenus Hedeomoides. In the Bayesian analysis, these two Central Valley P. douglasii samples are also sister to subgenus Hedeomoides. In the quartet analysis of dataset c80d4m15p3 they are sister to the remainder of the genus, while in the quartet analysis of dataset c95d10m15p3 they are sister to a clade comprising the Southern Clade and the samples that are currently recognized as P. douglasii but may represent a novel taxon. The loci recovered by pyRAD for the different datasets supported different results with respect to the major clades recovered and their relationship to the Central Valley P. douglasii samples.

As mentioned previously, phylogenetic discordance among topologies obtained using ddRADseq data may indicate that biological processes such as introgression or incomplete lineage sorting have affected the taxa involved in the discordance. In the case of the Central Valley P. douglasii samples, two evolutionary scenarios are likely. In the first, P. douglasii may have been involved in one or multiple instances of introgression with the ancestors of the extant lineages. This would result in extant lineages possessing some of the same genetic variation as P. douglasii, thereby creating a phylogenetic signal that would relate the lineages arising from an introgressed common ancestor as sister to P. douglasii. Another possible interpretation is that the Central Valley P. douglasii may be involved in a case of progenitor-derivative speciation (Crawford 2010), in which P. douglasii was the progenitor of multiple taxa in the genus. In this scenario, the ancestral species does not split into descendants, but rather remains intact and largely unchanged. The derivative species would contain part of the progenitor's variation in addition to novel features (Gottlieb 1973). This mode of speciation is sometimes invoked when a recently diverged species has a restricted range at the edge of that of a more widespread hypothetical progenitor (Gottlieb 1973).
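Because the placement of the Central Valley samples depends on the pyRAD parameter settings, it can help to see exactly which settings differ between two dataset labels such as c80d4m15p3 and c95d10m15p3. The following Python sketch parses such labels and reports the differences. The expansion of each letter (c, d, m, p) into a parameter name is an assumption made for illustration, since the labels are not defined in this section, and the function names are hypothetical.

```python
import re

# Assumed meaning of each letter in a dataset label. These names are an
# interpretation for illustration only and are not defined in this section.
ASSUMED_FIELDS = {
    "c": "clustering_threshold_percent",
    "d": "minimum_depth",
    "m": "minimum_samples_per_locus",
    "p": "max_shared_polymorphic_sites",
}

# A label is the four letters c, d, m, p in order, each followed by an integer.
LABEL_PATTERN = re.compile(r"^c(\d+)d(\d+)m(\d+)p(\d+)$")


def parse_dataset_label(label):
    """Split a label such as 'c80d4m15p3' into a dict of integer settings."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized dataset label: {label!r}")
    values = (int(group) for group in match.groups())
    return dict(zip(ASSUMED_FIELDS.values(), values))


def differing_settings(label_a, label_b):
    """Return the settings whose values differ between two dataset labels."""
    a = parse_dataset_label(label_a)
    b = parse_dataset_label(label_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# The two quartet-analysis datasets discussed above differ in two settings.
print(differing_settings("c80d4m15p3", "c95d10m15p3"))
```

Listing the differences in this way makes it clear that the two quartet analyses share the m and p values, so any topological discordance between them traces to the c and d settings.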

GEOGRAPHICAL CORRELATIONS

A formal phylogeographical analysis of the genus Pogogyne would require increased sampling for all taxa. However, several correlations can be made with respect to the geography of the populations sampled for the ddRADseq part of this study. In all cases, populations within each species that were more geographically proximate to one another were recovered as sister lineages. Within subgenus Hedeomoides, P. floribunda SDSU21415 + P. floribunda SDSU18438 represent a single population that is sister to the sample representing the nearest population, P. floribunda SDSU19279. These three samples form a clade that is sister to a more distant population that is represented by P. floribunda SDSU21416. Pogogyne floribunda is sister to P. serpylloides in all recovered topologies. Within P. serpylloides, the samples SDSU20795 through SDSU20799 all represent a single population that is sister to the next nearest P. serpylloides population, represented by the sample SDSU21404. These two populations are sister to P. serpylloides SDSU21397, which represents the most distant of the three P. serpylloides populations sampled. Pogogyne zizyphoroides was sister to P. floribunda + P. serpylloides. Three populations of P. zizyphoroides were sampled, and the two more proximate populations (represented by UCR198804 and SDSU19277) were sister to one another. These two populations were sister to P. zizyphoroides DAV188666, which was sampled from a more distant population.

The two P. douglasii samples collected in the Central Valley are sister to one another and phylogenetically distinct from the P. douglasii samples collected outside of San Luis Obispo and Santa Barbara; however, the relationship of this Central Valley clade to the other species differs depending on the dataset used for each analysis. The P. douglasii samples that were collected near San Luis Obispo are sister to the samples collected near Santa Barbara. These samples form a clade that is sister to the Southern Clade in all analyses. Within the Southern Clade, the samples collected represent a single population each of P. abramsii, P. nudiuscula, and the putative novel taxon, P. “mexicana”. Additional sampling is necessary to draw geographical conclusions about the populations of these three species.

TAXONOMIC CONSIDERATIONS

I propose the following taxonomic considerations based on the results of this study. First, as in previous studies, subgenus Hedeomoides is monophyletic with strong support. Thus, the continued recognition of this group as a subgenus, verified by distinctive morphological apomorphies (Silveira and Simpson 2013), is supported by the results of these analyses. Second, in past studies, subgenus Pogogyne was recovered as paraphyletic due to the placement of several P. douglasii samples. In this study, P. douglasii was also paraphyletic in all analyses but one: the c80d4m15p3 quartet analysis (Fig. 11), in which it is monophyletic. However, in all ddRADseq analyses, a novel clade of Central Valley P. douglasii occurs within this paraphyletic assemblage; these results are not incompatible with those of the genome skimming analyses. Given the monophyly of the Central Valley P. douglasii samples, subgenus Pogogyne can be recircumscribed as restricted to this clade, because the type species of the genus occurs within this geographic distribution. A third clade within the genus, including the coastal P. douglasii specimens plus P. nudiuscula, P. abramsii, and P. “mexicana” (all of which form a strongly supported monophyletic group), warrants the designation of a third subgenus inclusive of these taxa alone. These results resolve the issue of paraphyly within subgenus Pogogyne and account for the distinctiveness of the Central Valley and coastal P. douglasii samples, the latter likely requiring a new species name.

CONCLUSIONS AND FUTURE DIRECTIONS

This study demonstrates that high throughput sequencing is a powerful tool for elucidating phylogenetic relationships resulting from a recent and rapid divergence. This type of evolutionary scenario was challenging to analyze prior to the introduction of high throughput sequencing because even a large multilocus alignment derived from Sanger sequencing is limited in its ability to provide information about closely related species. The methods of high throughput sequencing used in this study provided enough data to resolve a recalcitrant phylogeny that previously lacked support due to a paucity of sequence data. The data derived from ddRADseq are superior in their ability to confer a strong phylogenetic signal for the monophyletic recovery of most of the species within Pogogyne. To better understand the basis for the changing placement of the Central Valley P. douglasii samples, additional analyses should be conducted that include a broader representation of this species through increased sampling. However, the analyses conducted do support the proposal of several taxonomic changes in the genus.

ACKNOWLEDGEMENTS

I would first like to thank several funding sources for the financial support they provided throughout the execution of this project. They are the California Native Plant Society, the Northern California Botanists, the American Society of Plant Taxonomists, and the Southern California Botanical Society. I would also like to thank my thesis adviser, Michael Simpson, for his support throughout the entire process. I am also grateful to the members of the Simpson lab for their advice and guidance during all stages of this work. Additional thanks go to my partner, Joao Costa, my family, my friends, and the students and faculty of the Evolutionary Biology program at San Diego State University.

LITERATURE CITED

Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716-723. Álvarez, I. and J. F. Wendel. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molecular Phylogenetics and Evolution 29: 417-434. Baird, N. A., P. D. Etter, T. S. Atwood, M. C. Currey, A. L. Shiver, Z. A. Lewis, E. U. Selker, W. A. Cresko, and E. A. Johnson. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3: e3376. Bakker, F. T., D. Lei, J. Yu, S. Mohammadin, Z. Wei, S. van de Kerke, B. Gravendeel, M. Nieuwenhuis, M. Staats, D. E. Alquezar-Planas, and R. Holmer. 2016. Herbarium genomics: Plastome sequence assembly from a range of herbarium specimens using an Iterative Organelle Genome Assembly pipeline. Biological Journal of the Linnean Society 117: 33-43. Barrett, C. F., C. D. Specht, J. Leebens-Mack, D. W. Stevenson, W. B. Zomlefer, and J. I. Davis. 2014. Resolving ancient radiations: Can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)? Annals of Botany 113: 119-133. Barrett, C. F., W. J. Baker, J. R. Comer, J. G. Conran, S. C. Lahmeyer, J. H. Leebens-Mack, J. Li, G. S. Lim, D. R. Mayfield-Jones, L. Perez, et al. 2016. Plastid genomes reveal support for deep phylogenetic relationships and extensive rate variation among palms and other commelinid monocots. New Phytologist 209: 855-870. Bendiksby, M., L. Thorbek, A.-C. Scheen, C. Lindqvist, and O. Ryding. 2011. An updated phylogeny and classification of Lamiaceae subfamily Lamioideae. Taxon 60: 471- 484. Blair, C. and R. W. Murphy. 2011. Recent trends in molecular phylogenetic analysis: Where to next? Journal of Heredity 102: 130-138. Cavender-Bares, J., A. González-Rodríguez, D. A. R. Eaton, A. A. L. Hipp, A. Beulke, and P. S. Manos. 2015. Phylogeny and biogeography of the American live oaks (Quercus subsection Virentes): A genomic and population genetics approach. 
Molecular Ecology 24: 3668-3687. Chifman, J. and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics 30: 3317-3324. CNPS Rare Plant Program. 2016. Inventory of rare and endangered plants (online edition, v8-02). http://www.rareplants.cnps.org (accessed 25 April 2016).

Cotton, J. L., W. P. Wysocki, L. G. Clark, S. A. Kelchner, J. C. Pires, P. P. Edger, D. Mayfield-Jones, and M. R. Duvall. 2015. Resolving deep relationships of PACMAD grasses: A phylogenomic approach. BMC Plant Biology 15: 1. Crawford, D. J. 2010. Progenitor-derivative species pairs and plant speciation. Taxon 59: 1413-1423. Curto, M. A., P. Puppo, D. Ferreira, M. Nogueira, and H. Meimberg. 2012. Development of phylogenetic markers from single-copy nuclear genes for multi locus, species level analyses in the mint family (Lamiaceae). Molecular Phylogenetics and Evolution 63: 758-767. Darriba, D., G. L. Taboada, R. Doallo, and D. Posada. 2012. jModelTest 2: More models, new heuristics and parallel computing. Nature Methods 9: 772-772. Degnan, J. H. and N. A. Rosenberg. 2006. Discordance of species trees with their most likely gene trees. PLoS Genetics 2: e68. Department of Fish and Wildlife. 2016. State and federally listed endangered, threatened, and rare plants of California. http://www.dfg.ca.gov/wildlife/nongame/t_e_spp/ (accessed 25 April 2016). Dodsworth, S. 2015. Genome skimming for next-generation biodiversity analysis. Trends in Plant Science 20: 525-527. Doyle, J. 1991. DNA protocols for plants. Pp. 283-293 in Molecular Techniques in Taxonomy, vol. 57, NATO ASI Series, eds. G. Hewitt, A. B. Johnston, and J. P. Young. Heidelberg, Germany: Springer Berlin Heidelberg. Drew, B. T. and K. J. Sytsma. 2011. Testing the monophyly and placement of in the tribe Mentheae (Lamiaceae). Systematic Botany 36: 1038-1049. Drew, B. T. and K. J. Sytsma. 2012. Phylogenetics, biogeography, and staminal evolution in the tribe Mentheae (Lamiaceae). American Journal of Botany 99: 933-953. Drummond, A. J. and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 214. Eaton, D. A. R. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30: 1844-1849. 
https://doi.org/10.1093/bioinformatics/btu121 Eaton, D. A. R. and R. H. Ree. 2013. Inferring phylogeny and introgression using RADseq data: An example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology 62: 689-706. https://doi.org/10.1093/sysbio/syt032 Eaton, D., A. Gonzalez-Rodriguez, A. Hipp, and J. Cavender-Bares. 2015. Introgression obscures and reveals historical relationships among the American live oaks. bioRxiv. https://doi.org/10.1101/016238 Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797. Edgar, R. C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461.

Escudero, M., D. A. Eaton, M. Hahn, and A. L. Hipp. 2014. Genotyping-by-sequencing as a tool to infer phylogeny and ancestral hybridization: A case study in Carex (Cyperaceae). Molecular Phylogenetics and Evolution 79: 359-367. Forbes, T. R. 1961. Dictionary of word roots and combining forms. The Yale Journal of Biology and Medicine 33: 338. Friar, E. A. 2005. Isolation of DNA from plants with large amounts of secondary metabolites. Methods in Enzymology 395: 1-12. Gardner, A. G., E. B. Sessa, P. Michener, E. Johnson, K. A. Shepherd, D. G. Howarth, and R. S. Jabaily. 2016. Utilizing next-generation sequencing to resolve the backbone of the Core Goodeniaceae and inform future taxonomic and floral form studies. Molecular Phylogenetics and Evolution 94: 605-617. Gottlieb, L. D. 1973. Genetic differentiation, sympatric speciation, and the origin of a diploid species of Stephanomeria. American Journal of Botany 60: 545-553. Harden, J. W. 1987. Soils developed in granitic alluvium near Merced, California (U.S. Geological Survey Bulletin 1590-A). Washington DC: United States Government Printing Office. Hipp, A. L., D. A. Eaton, J. Cavender-Bares, E. Fitzek, R. Nipper, and P. S. Manos. 2014. A framework phylogeny of the American oak clade based on sequenced RAD data. PLoS ONE 9: e93975. Hipp, A. L., P. Manos, J. Cavender-Bares, D. Eaton, and R. Nipper. 2013. Using phylogenomics to infer the evolutionary history of oaks. International Oak Journal 24: 61-71. Howell, J. T. 1931. The genus Pogogyne. Proceedings of the California Academy of Sciences 20: 105-128. Huang, D. I., C. A. Hefer, N. Kolosova, C. J. Douglas, and Q. C. Cronk. 2014. Whole plastome sequencing reveals deep plastid divergence and cytonuclear discordance between closely related balsam poplars, Populus balsamifera and P. trichocarpa (Salicaceae). New Phytologist 204: 693-703. Jokerst, J. 1992. Pogogyne floribunda (Lamiaceae), a new species from the Great Basin in Northeastern California. 
California Academy of Sciences 13: 347-353. Jokerst, J. and J. Hickman. 1993. Pogogyne. P. 724 in The Jepson Manual: Higher Plants of California, ed. J. C. Hickman. Berkeley, CA: University of California, Berkeley. Jones, S. S., S. V. Burke, and M. R. Duvall. 2014. Phylogenomics, molecular evolution, and estimated ages of lineages from the deep phylogeny of Poaceae. Plant Systematics and Evolution 300: 1421-1436. Kearse, M., R. Moir, A. Wilson, S. Stones-Havas, M. Cheung, S. Sturrock, S. Buxton, A. Cooper, S. Markowitz, C. Duran, T. Thierer, B. Ashton, P. Meintjes, and A. Drummond. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28: 1647-1649.

Kingman, J. F. C. 1982. The coalescent. Stochastic Processes and their Applications 13: 235- 248. Knowles, L. L. and Y.-H. Chan. 2008. Resolving species phylogenies of recent evolutionary radiations. Annals of the Missouri Botanical Garden 95: 224-231. Liu, Y., C. J. Cox, W. Wang, and B. Goffinet. 2014. Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias. Systematic Biology 63: 862-878. Lu, J. M., N. Zhang, X. Y. Du, J. Wen, and D. Z. Li. 2015. Chloroplast phylogenomics resolves key relationships in ferns. Journal of Systematics and Evolution 53: 448-457. Massatti, R., A. A. Reznicek, and L. L. Knowles. 2016. Utilizing RADseq data for phylogenetic analysis of challenging taxonomic groups: A case study in Carex sect. Racemosae. American Journal of Botany 103: 337-347. Miller, M. A., W. Pfeiffer, and T. Schwartz. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Gateway Computing Environments Workshop (GCE) 2010: 1-8. Moon, H.-K., E. Smets, and S. Huysmans. 2010. Phylogeny of tribe Mentheae (Lamiaceae): The story of molecules and micromorphological characters. Taxon 59: 1065-1076. Mort, M. E., D. J. Crawford, J. K. Kelly, A. Santos-Guerra, M. M. de Sequeira, M. Moura, and J. Caujapé-Castells. 2015. Multiplexed-shotgun-genotyping data resolve phylogeny within a very recently derived insular lineage. American Journal of Botany 102: 634-641. Nowak, M. D., G. Russo, R. Schlapbach, C. N. Huu, M. Lenhard, and E. Conti. 2015. The draft genome of Primula veris yields insights into the molecular basis of heterostyly. Genome Biology 16: 1-17. Parks, M., R. Cronn, and A. Liston. 2012. Separating the wheat from the chaff: Mitigating the effects of noise in a plastome phylogenomic data set from Pinus L.(Pinaceae). BMC Evolutionary Biology 12: 100. Pastore, J. F. B., R. M. Harley, F. Forest, A. Paton, C. van den Berg. 2011. 
Phylogeny of the subtribe Hyptidinae (Lamiaceae tribe Ocimeae) as inferred from nuclear and plastid DNA. Taxon 60: 1317-1329. Paun, O., B. Turner, E. Trucchi, J. Munzinger, M. W. Chase, and R. Samuel. 2016. Processes driving the adaptive radiation of a tropical tree (Diospyros, Ebenaceae) in New Caledonia, a biodiversity hotspot. Systematic Biology 65: 212-227. Ripma, L. A., M. G. Simpson, and K. Hasenstab-Lehman. 2014. Geneious! Simplified genome skimming methods for phylogenetic systematic studies: A case study in Oreocarya (Boraginaceae). Applications in Plant Sciences 2: 1400062. Roy, T. and C. Lindqvist. 2015. New insights into evolutionary relationships within the subfamily Lamioideae (Lamiaceae) based on pentatricopeptide repeat (PPR) nuclear DNA sequences. American Journal of Botany 102: 1721-1735.

Salmaki, Y., S. Zarre, O. Ryding, C. Lindqvist, A. Scheunert, C. Bräuchler, G. Heubl. 2012. Phylogeny of the tribe Phlomideae (Lamioideae: Lamiaceae) with special focus on Eremostachys and Phlomoides: New insights from nuclear and chloroplast sequences. Taxon 61: 161-179. Sanger, F., S. Nicklen, A. R. Coulson 1977. DNA sequencing with chain terminating inhibitors. Proceedings of the National Academy of Sciences 74: 5463-5467. Särkinen, T., M. Staats, J. E. Richardson, R. S. Cowan, and F. T. Bakker. 2012. How to open the treasure chest? Optimising DNA extraction from herbarium specimens. PLoS ONE 7: e43808. Scheen, A.-C., M. Bendiksby, O. Ryding, C. Mathiesen, V. A. Albert, and C. Lindqvist. 2010. Molecular phylogenetics, character evolution, and suprageneric classification of Lamioideae (Lamiaceae). Annals of the Missouri Botanical Garden 97: 191-217. Schmieder, R. and R. Edwards. 2011. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27: 863-864. Silveira, M. A. 2010. The phylogenetic systematics of Pogogyne (Lamiaceae). M. S. thesis. San Diego, CA: San Diego State University. Silveira, M. A. and M. G. Simpson. 2013. Phylogenetic systematics of the Mesa Mints: Pogogyne (Lamiaceae). Systematic Botany 38: 782-794. Silveira, M., M. G. Simpson, and J. D. Jokerst. 2012. Pogogyne. Pp. 1400 in The Jepson Manual, Higher Plants of California, ed. J. C. Hickman. London, England: University of California Press. Solomeshch, A. I., M. G. Barbour, and R. Holland. 2007. Vernal Pools. Pp. 515-533 in Terrestrial Vegetation of California, ed. 3, eds. M. G. Barbour, T. Keeler-Wolf, and A. A. Schoenherr. Berkeley, CA: University of California Press. Stamatakis, A. 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688-2690. Stamatakis, A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313. Straub, S. C. K., M. 
Parks, K. Weitemier, M. Fishbein, R. C. Cronn, and A. Liston. 2012. Navigating the tip of the genomic iceberg: Next generation sequencing for plant systematics. American Journal of Botany 99: 349-364. Straub, S. C. K., R. C. Cronn, C. Edwards, M. Fishbein, and A. Liston. 2013. Horizontal transfer of DNA from the mitochondrial to the plastid genome and its subsequent evolution in Milkweeds (Apocynaceae). Genome Biology and Evolution 5: 1872- 1885. Straub, S. C., M. J. Moore, P. S. Soltis, D. E. Soltis, A. Liston, and T. Livshultz. 2014. Phylogenetic signal detection from an ancient rapid radiation: Effects of noise reduction, long-branch attraction, and model selection in crown clade Apocynaceae. Molecular Phylogenetics and Evolution 80: 169-185.

Swofford, D. L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods), version 4. Sunderland: Sinauer Associates. Tao, S. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Critical Reviews in Biochemistry & Molecular Biology 37: 121. Wagstaff, S. J. and R. G. Olmstead. 1997. Phylogeny of Labiatae and Verbenaceae inferred from rbcL sequences. Systematic Botany 22: 165-179. Watson, S. 1875. Botanical contributions: On the Flora of Guadalupe Island, lower California; list of a collection of plants from Guadalupe Island, made by Dr. Edward Palmer, with his notes upon them; descriptions of new species of plants, chiefly Californian, with revisions of certain genera. Proceedings of the American Academy of Arts and Sciences 11: 105-148. Welch, A. J., K. Collins, A. Ratan, D. I. Drautz-Moses, S. C. Schuster, and C. Lindqvist. 2016. The quest to resolve recent radiations: Plastid phylogenomics of extinct and endangered Hawaiian endemic mints (Lamiaceae). Molecular Phylogenetics and Evolution 99: 16-33. Wessinger, C. A., C. C. Freeman, M. E. Mort, M. D. Rausher, and L. C. Hileman. 2016. Multiplexed shotgun genotyping resolves species relationships within the North American genus Penstemon. American Journal of Botany 103: 912-922. Whittall, J. B., J. Syring, M. Parks, J. Buenrostro, C. Dick, A. Liston, and R. Cronn. 2010. Finding a (pine) needle in a haystack: Chloroplast genome sequence divergence in rare and widespread pines. Molecular Ecology 19: 100-114. Zedler, P. H. 1987. The ecology of southern California vernal pools: A community profile. Biological Report 85: 1-136. Zedler, P. H. 2003. Vernal pools and the concept of “isolated wetlands.” Wetlands 23: 597- 607. Zhang, J., K. Kobert, T. Flouri, and A. Stamatakis. 2014. PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30: 614-620.